Lecture 5 Announcements#

Lecture exercise to be posted later today (often due in 48 hours after being released)
Hw #1 to be released later today, due on next week thursday
No hand written homeworks! Use ASCII or PDF (or challenge yourself and do them in Latex)

Normalization#

Artist	DJoin	DLeft	DOB	Group	GFormed	Genre
John Lennon	1960	1969	1940	Beatles	1960	British Invasion
Paul McCartney	1960	1970	1942	Beatles	1960	British Invasion
George Harrison	1960	1970	1943	Beatles	1960	British Invasion
Ringo Starr	1962	1970	1940	Beatles	1960	British Invasion
Stuart Sutcliffe	1960	1961	1940	Beatles	1960	British Invasion
Pete Best	1960	1962	1941	Beatles	1960	British Invasion
John Lennon	1960	1969	1940	Beatles	1960	Pop/Rock
Paul McCartney	1960	1970	1942	Beatles	1960	Pop/Rock
George Harrison	1960	1970	1943	Beatles	1960	Pop/Rock
Ringo Starr	1962	1970	1940	Beatles	1960	Pop/Rock
Stuart Sutcliffe	1960	1961	1940	Beatles	1960	Pop/Rock
Pete Best	1960	1962	1941	Beatles	1960	Pop/Rock
George Harrison	1988	1992	1943	The Traveling Wilburys	1988	Pop/Rock
Roy Orbison	1988	1992	1936	The Traveling Wilburys	1988	Pop/Rock
Bob Dylan	1988	1992	1941	The Traveling Wilburys	1988	Pop/Rock
Tom Petty	1988	1992	1950	The Traveling Wilburys	1988	Pop/Rock

Key is a minimal set of attributes such that no two tuples can have the same value for the key.

Key: Group, Artist, Genre

Lots of repeated information, if you change it in one place, you need to change it everywhere
Coupled information, I cannot enter a group without knowing at least one artist in it.
If I delete someone from a group, I may loose other information about them like DOB of an artist.

Functional dependencies#

Given a relation R, a functional dependency (fd) is an expression of the form

X -> Y

where

X and Y are a set of attributes in R Means: whenever X->Y is true, then if two tuples have the same values for all attributes in X, then it must have the same value for all attributes in Y.

Artist -> DOB
Group -> DFormed
Artist Group -> DJoin DLeft #assuming artist cannot join back

Artist -> Artist
Artist Group -> Group ##trivial
Artist Group -> DFormed
Artist Group -> DJoin
Artist Group -> DLeft

Closure of a set of functional dependencies (F+)#

If I have a set of functional dependencies F for a relation R, the closure (F+) is the set of all functional dependencies in F or are implied by F (i.e. obtained by applying the inference rules to F).

MusicGroup(Artist, DJoin, Dleft, DoB, DFormed, Group, Genre)

F= {Artist -> DOB, Group -> DFormed, Artist Group -> DJoin DLeft }

A superkey for a relation R and a set of fds F is a set of attributes X such that X->Y is in F+ and Y is the set of all attributes in R.

A key is a superkey that is minimal.

Functional dependency (fd) set F#

Given a relation R(A1,…,An) and a set of F functional dependencies

-> find keys
-> tell me if this relation is a good one
-> if it is not good, how can I make it into a good one

Given a relation R(A1,…,An) and a set of F functional dependencies, it is possible to infer new functional dependencies.

Closure of a set of fds, F+ is the set of all possible f.d. that are implied by F.

If a functional dependency is implied by F, then it is in F+.

Inference rules:

If \(Y\subseteq X\) then X->Y is true. (Trivial, true for any relation)
If X->Y and Y->Z then X->Z is also true. (Transitivity)
If X->YZ then X->Y and X->Z is also true. (Decomposition, only for the right hand side!)
If X->Y then XZ->YZ is also true. (Augmentation)
If X->Y and X->Z, then X->YZ (Accumulation)

Ex: Given R(A,B,C,D,E) and F = {AB->C, C->D, BC->E, E->A}

Is AB->C in F+? Yes, AB->C is in F

Is ABD->E in F+?

AB->C is in F
ABD->BCD is in F (augmentation with BD)
ABD->BC
ABD->D (decomposition)
Given ABD->BC and BC->E (given), I can conclude: ABD->E (transitivity)

ABD->BD (trivial)

Ex: Given R(A,B) and F = {A->B}

F+ = {A->A, B->B, AB->A, AB->B, AB->AB, A->B, A->AB}

Equivalence of functional dependencies: Given a relation R and two sets of functional dependences, F1 and F2 are equivalence \(F1\equiv F2\) if F1+=F2+ (if they have the same closure).

Ex: Given R(A,B) and F1 = {A->B} and F2 = {A->AB, B->B}

F1+ = {A->A, B->B, AB->A, AB->B, AB->AB, A->B, A->AB}
F2+ = {A->A, B->B, AB->A, AB->B, AB->AB, A->B, A->AB}

F2 = {A->AB, B->B}

A->AB
Decompose: A->A and A->B
B->B

Check if \(F1\equiv F2\)

Check if all X->Y in F1 is implied by F2
Check if all X->Y in F2 is implied by F1

If both are true, then \(F1\equiv F2\)* is true.

Ex: Given R(A,B) and F1 = {A->B} and F2 = {A->AB, B->B}

Given F1:
Is A->B implied by F2? A->AB in F2, decompose A->B
Given F2:
If A->AB implied by F1? A->B in F1, A->AB (augmentation)!
If B->B implied by F1? Yes, trivial

Ex: Given R(A,B,C) and F1 = {A->B, B->C, A->C} and F2 = {A->B, B->C} and F3={A->B, A->C}

F1 equivalent to F2
\(F2\subset F1\), no need to check condition 2
Is A->C implied by F2? Yes, \(F1 \equiv F2\)

F1 equivalent to F3
\(F3\subset F1\), no need to check condition 2
Is B->C implied by F3? No. \(F1 \not\equiv F3\).

Closure of a set of attributes \(X^+\)

Given a relation a relation R and a set of functional dependencies F, \(X^+\) (closure of X) is the set of all attributes that are implied by X given F.

It also means that \(X \rightarrow X^+\) is implied by F.

Algorithm to find the closure:

Initialize \(X^+ = X\)
Repeat until no new attributes are added to \(X^+\)
- Find a functional dependency \(A\rightarrow B\), if \(A\subseteq X^+\), then add B to \(X^+\) (\(X+=X^+\cup B\))

Checking if a functional dependency is in \(F^+\) Given a relation a relation R and a set of functional dependencies F, \(X\rightarrow Y\) is in \(F^+\) if \(Y\subseteq X^+\).

R(A,B,C,D,E,F,G) F = {AB->C, C->D, BC->E, E->A, BEF->G}

BC+

BC+ = {B,C} BC+ = {B,C,D} (C->D) BC+ = {B,C,D,E} (BC->E) BC+ = {A,B,C,D,E} (E->A)

BC->ABCDE

ABD+

ABD+ = {A,B,D} ABD+ = {A,B,C,D,E}

If ABD->C implied by F?
ABD->ABCDE (given ABD+ = {A,B,C,D,E})
ABD->C (decomposition)
YES, ABD->C implied by F
YES, ABD->C IS IN F+.

Given a relation a relation R and a set of functional dependencies F, all possible other functional dependencies implied by F is the closure F+. The closure is the meaning of F (what is true in F).

F1 and F2 can have the same closure, if all fd in F1 are implied by F2 and all fds in F2 are implied by F1

You can check if a functional dependency X->Y is implied by F, you can

Use F and apply inference rules, or
Use X+, and check if Y is in X+.

Superkeys#

Given a relation R and a set of fds F, the set X is a superkey if X+ contains all the attributes in R.

R(A,B,C,D,E) F = {AB->C, C->D, AC->E, E->A}

ABCDE+ = {A,B,C,D,E}
ABCDE->ABCDE
ABCDE is a superkey

Is ABC a superkey? ABC+ ={A,B,C,D,E} ABC is a superkey

Is AB a superkey? AB+ = {A,B,C,D,E} Yes

Is AB a key? A+ = {A} B+ = {B} Yes, A or B is a not key, AB is minimal

Is EB a key: EB+ = {E,B,A,C,D}, (E+ = {E,A}, B+ ={B})

Keys: AB, EB

Key#

Given a relation R and a set of fds F, the set X is a key if

X+ contains all the attributes in R, and
X is minimal (no subset X1 of X has the property that X1+ contains all the attributes)

A key is the minimal set of attributes such that if you have the same value for the attributes in X for two tuples, all the rest of the attributes should be the same.

A key is the minimal set of attributes such that no two different tuples can have the same values for the attributes in key.

R1(A,B,C,D,E) F = {A->B, B->CDE}

Key: A

R2(A,B,C,D,E) F = {A->BC, BD->E}

ABD - superkey Key: AD

R3(A,B,C,D,E) F = {A->BC, BD->E, BE->A}

Key: AD, BD

BED - superkey

R4(A,B,C,D,E) F = {A->BC, B->D}

Key: AE

MG(Artist, DOB, Group, DFormed, DJoin, DLeft, Genre)
Artist -> DOB
Group -> DFormed
Artist Group -> DJoin DLeft

Key: Artist Group Genre

Normal Forms#

Boyce-Codd Normal Form (BCNF)#

Given a relation R and a set of functional dependencies F, we say that R is in BCNF (R is in Boyce-Codd Normal Form) iff for every functional dependency X->Y in F, one of the following is true:

X is a superkey, or
X->Y is trivial.

R1(A,B,C,D,E) F = {A->B, B->CDE}

Key: A

A->B ok! (A is a superkey) B->CDE no (B is not a superkey, B->CDE is not trivial)

R1 is not in BCNF

MG(Artist, DOB, Group, DFormed, DJoin, DLeft, Genre)
Artist -> DOB violates (Artist is not superkey)
Group -> DFormed violates
Artist Group -> DJoin DLeft violates

Key: Artist Group Genre
not in BCNF

R2(A,B,C,D,E) F = {AB->C, ABC->DE, DE->D}

Key: AB

AB->C, ok because AB is a superkey ABC->DE ok because ABC is a superkey DE->D ok trivial

R3(A,B,C,D,E) F = {A->BCDE, BD->D, ABD->C}

Key:A (In BCNF)

MG2(Artist, DOB, Group, DFormed, DJoin, DLeft)
Artist -> DOB
Group -> DFormed
Artist Group -> DJoin DLeft

Key: Artist Group

Lecture 5 Announcements

Contents