intro
module1 : Your first database
Introduction
This is an introduction to relational databases with practical case we are going to investigate to get our hands dirty and not just rely on the theortical side. In the resources section below I mentioned some of the cool resources to get started learning and working with sql and relational databases.
our case:
Why to use relational database?
real life entites becomes table: A database models real-life entites like professors, students, universities by storying them on tables
reduces redundancy: each table contains data from a single entity type, this reduces redundnacy by storing entites only once
data integrity by relationships: the database can be used to model relationships between entities, for example: a professor can work on a mutliple universities, a company can have mutiple workers we can define how excatly data relates to each others
what you're going to learn:
creating and manipulating with the data
creating a relational database from scratch
learning the following three concepts, which help preserve data quality on databases:
constraints
keys
referential integrity
Query information_schema with SELECT
Information_schema: is a meta-database that holds information about your current database, it has multiple tables you can query with the select * from syntax:
tables: information about all tables in your current database
columns: information about all columns in all of the tables in your current database
and many more.
public schema holds information about user-defined tables and databases
tables: the core of relational databases:
tables is the most important concept of relational database, In this section we are going to see how deal with redundancy in tables let's see what's inside the first three records of 'university_professors' tables

Here we will begin to notice that professor 'karl' is repeated on the first three column and his university 'ETH Lausanne' also repeated couple of times because he only works for this university, However he has a affliations with at least three differenet organization as seen So there is a certain redundancy in this table, and the reason for this redundancy is that table contains entites of at least different types let's look at them

professors highlighted in green
universities highlighted in blue
organization highlighted in brown
let's look at the database once again, the graphic shown here is called entity-relationship diagram

squares denotes entity types, while circles denote attributes(columns) as we said before that the table 'university_professor' contains at least three different entities, so we need to model them now, each one at it's own table

Now after we've created the entity-relationship diagram, the next step is to actually make those tables and connected them together But before we actually do this, we need a refresher of the syntax to do so:
CREAT TABLE syntax
Your first task:
create table professors table with two text column: firstname, lastname
create table university table with three text column: university_shortname, university, university_city
solving the task:
oops we forget to add university_shortname to professor table! don't worry to do so you need to alter the table then add the column the syntax to do so:
Let's to it
updating the tables
only the university_professor holds data. Now we've 5 different tables so far, it's time to migrate data from the university_professors table to the 4-other tables
If we have another look at the data on the university_professor table, we see:

but if we see the unique organizations stored on the university_professor table:

So before coping the data from the original table, we need first to make sure we copy the distinct data to reduce redundancy, in order to this we use the ""INSERT INTO SELECT DISTINCT" patterns the syntax as follow:
so out query will be like this:
In the last section when we create the tables, in the affliations table, instead of typing 'organization' we've typed 'organisation, so we need to fix this now, to do so the syntax as follow:
also we don't need the university_shortname column, so are going to delete(drop) it from affliations table:
but why did we delete the university_shortname at the first place? querying the university_professors table for uniquefirstname, lastname and university_shortname results of 551 unique records

and querying only the firstname and lastname results of 551 unique recordss 
so the firstname and lastname uniquely identify a professor
Now after migrating the data from the university_professor table to other table, it is not needed anymore, so we are going to delete it.
module2 : Enforce data consistency with attribute constraints
Better data quality with constraints
You are going to learn:
what are constraints? and why to use them?
making our data consistent using those constrains
Althought we have setup a simple database, we haven't used most of the features database provides the idea of database it push data into certain structure, a pre-defined model where you enforce data types, relationships, and other rules these rules are called 'Integrity constrains'
Integrity constrains:
Integrity constraints can be divided into three types:
Attribute constraints: e.g data type
key constraints: e.g primary data type
Referential integrity constraints: e.g foreign keys
In this section we are going to explore the first type, and in the next two sections, we will look at the other two
why constraints?
Constrains enforce a certain data form: things can get messy when people are in charge of entering data, so we need to make sure that data entered are consistent and have a rigid form!
Help enforce data qualiy, and data quality is important for both users and developers
here are the common data types on postgreSQL that work as attributes constraints: 
Data types also restrict possbile sql operations on stored data: for example
this will produce error, exactly as almost programming language will do
the solution for this is to use CASTing as follows:
This is on-the-fly type conversion
working with data type:
Enforced on columns (i.e. attributes)
Define the so-called 'domain' values of a column: meaning what form these values can take and what not
Define what operations are possible -Enforece consistent storage of values
PostgreSQL datatypes(some):
text : character strings of any length
varchar [ (x) ] : a maximum of n characters
char [ (x) ] : a
fxed-length string of n characters
boolean : can only take three states, e.g. TRUE , FALSE and NULL (unknown)
specifying types while creation:
specifying types after creation:
Now let's get back to our database tables, we need to do some Constrains
Task:
in professor table specify a fixed-length character type with the correct length for university_shortname.
change the type of firstname to varchar(64)
solution
Convert types USING a function
If you don't want to reserve too much space for a certain varchar column, you can truncate the values before converting its type.
The not-null and unique constraints
those are the most important two attributes constraints
Not-null constraints:
Disallow any NULL values
Must hold true for the current state and the future state so you can't insert null on that columns
what does null means?
Nothing xD, HAHAHAHA actually it does represent 'non exising value':
Does not exist or
unknown
adding or removing not-null constraints
While creation
After creation:
Unique constraint:
Disallow duplicate values on a column
Must hold true for the current and future
what does 'holding true' means? meaning that you couldn't apply unique constrains if the column does hold actually duplicates, because think of it, how the DBMS know now what to remove from what to persist?
Adding or removing unique constraints:
TASK:
add not-null constraints on professors' firstname and lastname
make universities' shortname unique
Make organizations.organization unique
solution:
module3: Uniquely identify records with key constraints
keys and superkeys
let's take a look at our current database: 
After completing this section, our database will look like this: 
what is that underlyed id? where did it come from? and why? we will examine all those question right now
what is key:
Attribute(s) that identify a record uniquely: Normally, a table, as whole, only contain unique records, meaning that the combination of all attributes is a key in itself, However it's not called a key, but a superkey if attributes from that combination can be removed, and attributes still uniquely identify records If all possible attributes have been removed but the records are still uniquely identifed by the remaing attributes, we speak of minimal superkey
As long as attributes can be removed: superkey
if no more attributes can be removed: minimal superkey or key: this is a actual key, so the key is always minimal
Let's look at an example: a database from the textbook: Fundementals of Database Systems

the table shows 6 different cars, so the combination of all attributes is attributes
if we remove the year from the the superkey, the six records are still unique, so it's still a super key
Actually there are alot of superkeys in this example
However there are only 4 minimal superkeys
remember that superkeys are minimal when no attributes can be removed without losing the uniqueness property, this is trivival for k1 to k3 because they only consist of single attribute
also if we remove year from k4, 'make' would contain duplicates, and would, therefore be no longer suited as key.
there four minimal superkeys are also called candidate keys why candidate keys? In the end, there can only be one key for the table, which has to be chosen from the candidate keys
Identify keys with SELECT COUNT DISTINCT:
How to know if a given column could be qualified to be a key? There are some steps you can follow to know:
Count the distinct records for all possible combinations of columns. If the resulting number x equals the number of all rows in the table for a combination, you have discovered a superkey.
Then remove one column after another until you can no longer remove columns without seeing the number x decrease. If that is the case, you have discovered a (candidate) key.
The table professors has 551 rows. It has only one possible candidate key, which is a combination of two attributes, so let's try to find that candidate key
output: 551, let's try to remove firstname
output: 546, so let's try to remove lastname and use firstname instead
output: 479, so let's try again removing the university_shortname
output: 551 , that's it Indeed, the only combination that uniquely identifies professors is {firstname, lastname}. {firstname, lastname, university_shortname} is a superkey, and all other combinations give duplicate values. Hopefully, the concept of superkeys and keys is let's move to primary key!
primar key:
primary key is one of the most important concepts in database design, almost every database shoud have primary key chosen by you from the set of candidate keys. the main purpose as already explained, is uniquely identifying records in a table this makes it easier to reference these records from other tables. you may have seen that primary keys need to be defined on columns that don't accept duplicat or null values
the syntax of specifying primary key:
upon creation
to exising table:
so back to your database tables, we now need to make a primary key for every table we have, so let's do it:
Surrogate keys: (_id)
surrogate keys are sort of an artificial primry key. they are not based on a native column on your data, but on a column that just exists for the sake of having a primary key.
Primary keys should be build from as few columns as possible
Primary keys should never change over time.
let's look at this table:
in this table, the 'license_no' column would be suited as the primary key - the license_no is unlikely to change over time, not like the color column, for example, which might change if the car is repainted. so there's no need for surrogate key here but if we have this table:
the only sensible primary key here for this table is the combination of {make, model} but this are two column for a primary key we can add a surrogate key column, called "id" to solve this problem
Adding a surrogate key with serial data type
using serial data type in PostgreSQL which allow us to add auto incrementing number column
Another type of surrogate key
back to our database tables, we need to add surrogate key to professors table, because it's attributes can't make a suitable primary key, we could have two professor with the name working for different universites which does make duplicates
an example of combining two table together:
module4: Glue together tables with foreign keys
Model 1:N relationships with foreign keys
a foreign key(FK) points to the primary key(PK) of another table
Domain of FK must be equal to domain of PK: there are some resitrictions to the foreign keys: first the domain and the datatyp must be the same as one of the primary key. seconldy
Each value of FK must exist in PK pf the other table(FK constraint or 'referential integrity'): only foreign keys values are allowed that exist as values un the primary key of the referenced table
FKs are not actual keys: because duplicates are allowe
our database model will be like this: 
A more than just one professor can work into one univeristy not multiple universities at the same times
example of the syntax used to make foriegn keys:
upon creation
after creation
now let's implement this on our database tables
the result is 
Model more complex relationships
how to implement many to many relationship?
create ordinary table that contain two foriegn keys that points to both connected entites
in our tables: we have this relationship now
as professor can work for multiple organizations and an organization could have mutliple professors working for it
in this case we need to add additional attributes such as function
notice that: no primary keyis defined here, because theoritcaly a professor can have mutliple functions in one organization
now we will be populating the professor id column on affliations from professors.id column directly here is the syntax:
now let's implement this:
let's drop firstname and lastname from affliations, we don't need it anymore
Referntial integrity
round up
count afflitions per university
Join all the tables together
In this last exercise, your task is to find the university city of the professor with the most affiliations in the sector "Media & communication".
For this,
you need to join all the tables, group by some column, and then use selection criteria to get only the rows in the correct sector. Let's do this in three steps!
Last updated