Lecture 5 Announcements#
Lecture exercise to be posted later today (often due in 48 hours after being released)
Hw #1 to be released later today, due on next week thursday
No hand written homeworks! Use ASCII or PDF (or challenge yourself and do them in Latex)
Normalization#
Artist |
DJoin |
DLeft |
DOB |
Group |
GFormed |
Genre |
|---|---|---|---|---|---|---|
John Lennon |
1960 |
1969 |
1940 |
Beatles |
1960 |
British Invasion |
Paul McCartney |
1960 |
1970 |
1942 |
Beatles |
1960 |
British Invasion |
George Harrison |
1960 |
1970 |
1943 |
Beatles |
1960 |
British Invasion |
Ringo Starr |
1962 |
1970 |
1940 |
Beatles |
1960 |
British Invasion |
Stuart Sutcliffe |
1960 |
1961 |
1940 |
Beatles |
1960 |
British Invasion |
Pete Best |
1960 |
1962 |
1941 |
Beatles |
1960 |
British Invasion |
John Lennon |
1960 |
1969 |
1940 |
Beatles |
1960 |
Pop/Rock |
Paul McCartney |
1960 |
1970 |
1942 |
Beatles |
1960 |
Pop/Rock |
George Harrison |
1960 |
1970 |
1943 |
Beatles |
1960 |
Pop/Rock |
Ringo Starr |
1962 |
1970 |
1940 |
Beatles |
1960 |
Pop/Rock |
Stuart Sutcliffe |
1960 |
1961 |
1940 |
Beatles |
1960 |
Pop/Rock |
Pete Best |
1960 |
1962 |
1941 |
Beatles |
1960 |
Pop/Rock |
George Harrison |
1988 |
1992 |
1943 |
The Traveling Wilburys |
1988 |
Pop/Rock |
Roy Orbison |
1988 |
1992 |
1936 |
The Traveling Wilburys |
1988 |
Pop/Rock |
Bob Dylan |
1988 |
1992 |
1941 |
The Traveling Wilburys |
1988 |
Pop/Rock |
Tom Petty |
1988 |
1992 |
1950 |
The Traveling Wilburys |
1988 |
Pop/Rock |
Key is a minimal set of attributes such that no two tuples can have the same value for the key.
Key: Group, Artist, Genre
Lots of repeated information, if you change it in one place, you need to change it everywhere
Coupled information, I cannot enter a group without knowing at least one artist in it.
If I delete someone from a group, I may loose other information about them like DOB of an artist.
Functional dependencies#
Given a relation R, a functional dependency (fd) is an expression of the form
X -> Y
where
X and Y are a set of attributes in R Means: whenever X->Y is true, then if two tuples have the same values for all attributes in X, then it must have the same value for all attributes in Y.
Artist -> DOB
Group -> DFormed
Artist Group -> DJoin DLeft #assuming artist cannot join back
Artist -> Artist
Artist Group -> Group ##trivial
Artist Group -> DFormed
Artist Group -> DJoin
Artist Group -> DLeft
Closure of a set of functional dependencies (F+)#
If I have a set of functional dependencies F for a relation R, the closure (F+) is the set of all functional dependencies in F or are implied by F (i.e. obtained by applying the inference rules to F).
MusicGroup(Artist, DJoin, Dleft, DoB, DFormed, Group, Genre)
F= {Artist -> DOB, Group -> DFormed, Artist Group -> DJoin DLeft }
A superkey for a relation R and a set of fds F is a set of attributes X such that X->Y is in F+ and Y is the set of all attributes in R.
A key is a superkey that is minimal.
Functional dependency (fd) set F#
Given a relation R(A1,…,An) and a set of F functional dependencies
-> find keys
-> tell me if this relation is a good one
-> if it is not good, how can I make it into a good one
Given a relation R(A1,…,An) and a set of F functional dependencies, it is possible to infer new functional dependencies.
Closure of a set of fds, F+ is the set of all possible f.d. that are implied by F.
If a functional dependency is implied by F, then it is in F+.
Inference rules:
If \(Y\subseteq X\) then X->Y is true. (Trivial, true for any relation)
If X->Y and Y->Z then X->Z is also true. (Transitivity)
If X->YZ then X->Y and X->Z is also true. (Decomposition, only for the right hand side!)
If X->Y then XZ->YZ is also true. (Augmentation)
If X->Y and X->Z, then X->YZ (Accumulation)
Ex: Given R(A,B,C,D,E) and F = {AB->C, C->D, BC->E, E->A}
Is AB->C in F+? Yes, AB->C is in F
Is ABD->E in F+?
AB->C is in F
ABD->BCD is in F (augmentation with BD)
ABD->BC
ABD->D (decomposition)
Given ABD->BC and BC->E (given), I can conclude: ABD->E (transitivity)
ABD->BD (trivial)
Ex: Given R(A,B) and F = {A->B}
F+ = {A->A, B->B, AB->A, AB->B, AB->AB, A->B, A->AB}
Equivalence of functional dependencies: Given a relation R and two sets of functional dependences, F1 and F2 are equivalence \(F1\equiv F2\) if F1+=F2+ (if they have the same closure).
Ex: Given R(A,B) and F1 = {A->B} and F2 = {A->AB, B->B}
F1+ = {A->A, B->B, AB->A, AB->B, AB->AB, A->B, A->AB}
F2+ = {A->A, B->B, AB->A, AB->B, AB->AB, A->B, A->AB}
F2 = {A->AB, B->B}
A->AB
Decompose: A->A and A->B
B->B
Check if \(F1\equiv F2\)
Check if all X->Y in F1 is implied by F2
Check if all X->Y in F2 is implied by F1
If both are true, then \(F1\equiv F2\)* is true.
Ex: Given R(A,B) and F1 = {A->B} and F2 = {A->AB, B->B}
Given F1:
Is A->B implied by F2? A->AB in F2, decompose A->BGiven F2:
If A->AB implied by F1? A->B in F1, A->AB (augmentation)!
If B->B implied by F1? Yes, trivial
Ex: Given R(A,B,C) and F1 = {A->B, B->C, A->C} and F2 = {A->B, B->C} and F3={A->B, A->C}
F1 equivalent to F2
\(F2\subset F1\), no need to check condition 2
Is A->C implied by F2? Yes, \(F1 \equiv F2\)
F1 equivalent to F3
\(F3\subset F1\), no need to check condition 2
Is B->C implied by F3? No. \(F1 \not\equiv F3\).
Closure of a set of attributes \(X^+\)
Given a relation a relation R and a set of functional dependencies F, \(X^+\) (closure of X) is the set of all attributes that are implied by X given F.
It also means that \(X \rightarrow X^+\) is implied by F.
Algorithm to find the closure:
Initialize \(X^+ = X\)
Repeat until no new attributes are added to \(X^+\)
Find a functional dependency \(A\rightarrow B\), if \(A\subseteq X^+\), then add B to \(X^+\) (\(X+=X^+\cup B\))
Checking if a functional dependency is in \(F^+\) Given a relation a relation R and a set of functional dependencies F, \(X\rightarrow Y\) is in \(F^+\) if \(Y\subseteq X^+\).
R(A,B,C,D,E,F,G) F = {AB->C, C->D, BC->E, E->A, BEF->G}
BC+
BC+ = {B,C} BC+ = {B,C,D} (C->D) BC+ = {B,C,D,E} (BC->E) BC+ = {A,B,C,D,E} (E->A)
BC->ABCDE
ABD+
ABD+ = {A,B,D} ABD+ = {A,B,C,D,E}
If ABD->C implied by F?
ABD->ABCDE (given ABD+ = {A,B,C,D,E})
ABD->C (decomposition)
YES, ABD->C implied by F
YES, ABD->C IS IN F+.
Given a relation a relation R and a set of functional dependencies F, all possible other functional dependencies implied by F is the closure F+. The closure is the meaning of F (what is true in F).
F1 and F2 can have the same closure, if all fd in F1 are implied by F2 and all fds in F2 are implied by F1
You can check if a functional dependency X->Y is implied by F, you can
Use F and apply inference rules, or
Use X+, and check if Y is in X+.
Superkeys#
Given a relation R and a set of fds F, the set X is a superkey if X+ contains all the attributes in R.
R(A,B,C,D,E) F = {AB->C, C->D, AC->E, E->A}
ABCDE+ = {A,B,C,D,E}
ABCDE->ABCDE
ABCDE is a superkey
Is ABC a superkey? ABC+ ={A,B,C,D,E} ABC is a superkey
Is AB a superkey? AB+ = {A,B,C,D,E} Yes
Is AB a key? A+ = {A} B+ = {B} Yes, A or B is a not key, AB is minimal
Is EB a key: EB+ = {E,B,A,C,D}, (E+ = {E,A}, B+ ={B})
Keys: AB, EB
Key#
Given a relation R and a set of fds F, the set X is a key if
X+ contains all the attributes in R, and
X is minimal (no subset X1 of X has the property that X1+ contains all the attributes)
A key is the minimal set of attributes such that if you have the same value for the attributes in X for two tuples, all the rest of the attributes should be the same.
A key is the minimal set of attributes such that no two different tuples can have the same values for the attributes in key.
R1(A,B,C,D,E) F = {A->B, B->CDE}
Key: A
R2(A,B,C,D,E) F = {A->BC, BD->E}
ABD - superkey Key: AD
R3(A,B,C,D,E) F = {A->BC, BD->E, BE->A}
Key: AD, BD
BED - superkey
BD
R4(A,B,C,D,E) F = {A->BC, B->D}
Key: AE
MG(Artist, DOB, Group, DFormed, DJoin, DLeft, Genre)
Artist -> DOB
Group -> DFormed
Artist Group -> DJoin DLeft
Key: Artist Group Genre
Normal Forms#
Boyce-Codd Normal Form (BCNF)#
Given a relation R and a set of functional dependencies F, we say that R is in BCNF (R is in Boyce-Codd Normal Form) iff for every functional dependency X->Y in F, one of the following is true:
X is a superkey, or
X->Y is trivial.
R1(A,B,C,D,E) F = {A->B, B->CDE}
Key: A
A->B ok! (A is a superkey) B->CDE no (B is not a superkey, B->CDE is not trivial)
R1 is not in BCNF
MG(Artist, DOB, Group, DFormed, DJoin, DLeft, Genre)
Artist -> DOB violates (Artist is not superkey)
Group -> DFormed violates
Artist Group -> DJoin DLeft violates
Key: Artist Group Genre
not in BCNF
R2(A,B,C,D,E) F = {AB->C, ABC->DE, DE->D}
Key: AB
AB->C, ok because AB is a superkey ABC->DE ok because ABC is a superkey DE->D ok trivial
R3(A,B,C,D,E) F = {A->BCDE, BD->D, ABD->C}
Key:A (In BCNF)
MG2(Artist, DOB, Group, DFormed, DJoin, DLeft)
Artist -> DOB
Group -> DFormed
Artist Group -> DJoin DLeft
Key: Artist Group