Lecture 5 Announcements#

  • Lecture exercise to be posted later today (often due in 48 hours after being released)

  • Hw #1 to be released later today, due on next week thursday

  • No hand written homeworks! Use ASCII or PDF (or challenge yourself and do them in Latex)

Normalization#

Artist

DJoin

DLeft

DOB

Group

GFormed

Genre

John Lennon

1960

1969

1940

Beatles

1960

British Invasion

Paul McCartney

1960

1970

1942

Beatles

1960

British Invasion

George Harrison

1960

1970

1943

Beatles

1960

British Invasion

Ringo Starr

1962

1970

1940

Beatles

1960

British Invasion

Stuart Sutcliffe

1960

1961

1940

Beatles

1960

British Invasion

Pete Best

1960

1962

1941

Beatles

1960

British Invasion

John Lennon

1960

1969

1940

Beatles

1960

Pop/Rock

Paul McCartney

1960

1970

1942

Beatles

1960

Pop/Rock

George Harrison

1960

1970

1943

Beatles

1960

Pop/Rock

Ringo Starr

1962

1970

1940

Beatles

1960

Pop/Rock

Stuart Sutcliffe

1960

1961

1940

Beatles

1960

Pop/Rock

Pete Best

1960

1962

1941

Beatles

1960

Pop/Rock

George Harrison

1988

1992

1943

The Traveling Wilburys

1988

Pop/Rock

Roy Orbison

1988

1992

1936

The Traveling Wilburys

1988

Pop/Rock

Bob Dylan

1988

1992

1941

The Traveling Wilburys

1988

Pop/Rock

Tom Petty

1988

1992

1950

The Traveling Wilburys

1988

Pop/Rock

Key is a minimal set of attributes such that no two tuples can have the same value for the key.

Key: Group, Artist, Genre

  1. Lots of repeated information, if you change it in one place, you need to change it everywhere

  2. Coupled information, I cannot enter a group without knowing at least one artist in it.

  3. If I delete someone from a group, I may loose other information about them like DOB of an artist.

Functional dependencies#

Given a relation R, a functional dependency (fd) is an expression of the form

X -> Y

where

  • X and Y are a set of attributes in R Means: whenever X->Y is true, then if two tuples have the same values for all attributes in X, then it must have the same value for all attributes in Y.

Artist -> DOB
Group -> DFormed
Artist Group -> DJoin DLeft #assuming artist cannot join back

Artist -> Artist
Artist Group -> Group ##trivial
Artist Group -> DFormed
Artist Group -> DJoin
Artist Group -> DLeft

Closure of a set of functional dependencies (F+)#

If I have a set of functional dependencies F for a relation R, the closure (F+) is the set of all functional dependencies in F or are implied by F (i.e. obtained by applying the inference rules to F).

MusicGroup(Artist, DJoin, Dleft, DoB, DFormed, Group, Genre)

F= {Artist -> DOB, Group -> DFormed, Artist Group -> DJoin DLeft }

A superkey for a relation R and a set of fds F is a set of attributes X such that X->Y is in F+ and Y is the set of all attributes in R.

A key is a superkey that is minimal.

Functional dependency (fd) set F#

Given a relation R(A1,…,An) and a set of F functional dependencies

-> find keys
-> tell me if this relation is a good one
-> if it is not good, how can I make it into a good one

Given a relation R(A1,…,An) and a set of F functional dependencies, it is possible to infer new functional dependencies.

Closure of a set of fds, F+ is the set of all possible f.d. that are implied by F.

If a functional dependency is implied by F, then it is in F+.

Inference rules:

  1. If \(Y\subseteq X\) then X->Y is true. (Trivial, true for any relation)

  2. If X->Y and Y->Z then X->Z is also true. (Transitivity)

  3. If X->YZ then X->Y and X->Z is also true. (Decomposition, only for the right hand side!)

  4. If X->Y then XZ->YZ is also true. (Augmentation)

  5. If X->Y and X->Z, then X->YZ (Accumulation)

Ex: Given R(A,B,C,D,E) and F = {AB->C, C->D, BC->E, E->A}

Is AB->C in F+? Yes, AB->C is in F

Is ABD->E in F+?

AB->C is in F
ABD->BCD is in F (augmentation with BD)
ABD->BC
ABD->D (decomposition)
Given ABD->BC and BC->E (given), I can conclude: ABD->E (transitivity)

ABD->BD (trivial)

Ex: Given R(A,B) and F = {A->B}

F+ = {A->A, B->B, AB->A, AB->B, AB->AB, A->B, A->AB}

Equivalence of functional dependencies: Given a relation R and two sets of functional dependences, F1 and F2 are equivalence \(F1\equiv F2\) if F1+=F2+ (if they have the same closure).

Ex: Given R(A,B) and F1 = {A->B} and F2 = {A->AB, B->B}

F1+ = {A->A, B->B, AB->A, AB->B, AB->AB, A->B, A->AB}
F2+ = {A->A, B->B, AB->A, AB->B, AB->AB, A->B, A->AB}

F2 = {A->AB, B->B}

A->AB
Decompose: A->A and A->B
B->B

Check if \(F1\equiv F2\)

  1. Check if all X->Y in F1 is implied by F2

  2. Check if all X->Y in F2 is implied by F1

If both are true, then \(F1\equiv F2\)* is true.

Ex: Given R(A,B) and F1 = {A->B} and F2 = {A->AB, B->B}

  1. Given F1:
    Is A->B implied by F2? A->AB in F2, decompose A->B

  2. Given F2:
    If A->AB implied by F1? A->B in F1, A->AB (augmentation)!
    If B->B implied by F1? Yes, trivial

Ex: Given R(A,B,C) and F1 = {A->B, B->C, A->C} and F2 = {A->B, B->C} and F3={A->B, A->C}

F1 equivalent to F2
\(F2\subset F1\), no need to check condition 2
Is A->C implied by F2? Yes, \(F1 \equiv F2\)

F1 equivalent to F3
\(F3\subset F1\), no need to check condition 2
Is B->C implied by F3? No. \(F1 \not\equiv F3\).

Closure of a set of attributes \(X^+\)

Given a relation a relation R and a set of functional dependencies F, \(X^+\) (closure of X) is the set of all attributes that are implied by X given F.

It also means that \(X \rightarrow X^+\) is implied by F.

Algorithm to find the closure:

  1. Initialize \(X^+ = X\)

  2. Repeat until no new attributes are added to \(X^+\)

    • Find a functional dependency \(A\rightarrow B\), if \(A\subseteq X^+\), then add B to \(X^+\) (\(X+=X^+\cup B\))

Checking if a functional dependency is in \(F^+\) Given a relation a relation R and a set of functional dependencies F, \(X\rightarrow Y\) is in \(F^+\) if \(Y\subseteq X^+\).

R(A,B,C,D,E,F,G) F = {AB->C, C->D, BC->E, E->A, BEF->G}

BC+

BC+ = {B,C} BC+ = {B,C,D} (C->D) BC+ = {B,C,D,E} (BC->E) BC+ = {A,B,C,D,E} (E->A)

BC->ABCDE

ABD+

ABD+ = {A,B,D} ABD+ = {A,B,C,D,E}

If ABD->C implied by F?
ABD->ABCDE (given ABD+ = {A,B,C,D,E})
ABD->C (decomposition)
YES, ABD->C implied by F
YES, ABD->C IS IN F+.

Given a relation a relation R and a set of functional dependencies F, all possible other functional dependencies implied by F is the closure F+. The closure is the meaning of F (what is true in F).

F1 and F2 can have the same closure, if all fd in F1 are implied by F2 and all fds in F2 are implied by F1

You can check if a functional dependency X->Y is implied by F, you can

  • Use F and apply inference rules, or

  • Use X+, and check if Y is in X+.

Superkeys#

Given a relation R and a set of fds F, the set X is a superkey if X+ contains all the attributes in R.

R(A,B,C,D,E) F = {AB->C, C->D, AC->E, E->A}

ABCDE+ = {A,B,C,D,E}
ABCDE->ABCDE
ABCDE is a superkey

Is ABC a superkey? ABC+ ={A,B,C,D,E} ABC is a superkey

Is AB a superkey? AB+ = {A,B,C,D,E} Yes

Is AB a key? A+ = {A} B+ = {B} Yes, A or B is a not key, AB is minimal

Is EB a key: EB+ = {E,B,A,C,D}, (E+ = {E,A}, B+ ={B})

Keys: AB, EB

Key#

Given a relation R and a set of fds F, the set X is a key if

  1. X+ contains all the attributes in R, and

  2. X is minimal (no subset X1 of X has the property that X1+ contains all the attributes)

A key is the minimal set of attributes such that if you have the same value for the attributes in X for two tuples, all the rest of the attributes should be the same.

A key is the minimal set of attributes such that no two different tuples can have the same values for the attributes in key.

R1(A,B,C,D,E) F = {A->B, B->CDE}

Key: A

R2(A,B,C,D,E) F = {A->BC, BD->E}

ABD - superkey Key: AD

R3(A,B,C,D,E) F = {A->BC, BD->E, BE->A}

Key: AD, BD

BED - superkey

BD

R4(A,B,C,D,E) F = {A->BC, B->D}

Key: AE

MG(Artist, DOB, Group, DFormed, DJoin, DLeft, Genre)
Artist -> DOB
Group -> DFormed
Artist Group -> DJoin DLeft

Key: Artist Group Genre

Normal Forms#

Boyce-Codd Normal Form (BCNF)#

Given a relation R and a set of functional dependencies F, we say that R is in BCNF (R is in Boyce-Codd Normal Form) iff for every functional dependency X->Y in F, one of the following is true:

  1. X is a superkey, or

  2. X->Y is trivial.

R1(A,B,C,D,E) F = {A->B, B->CDE}

Key: A

A->B ok! (A is a superkey) B->CDE no (B is not a superkey, B->CDE is not trivial)

R1 is not in BCNF

MG(Artist, DOB, Group, DFormed, DJoin, DLeft, Genre)
Artist -> DOB violates (Artist is not superkey)
Group -> DFormed violates
Artist Group -> DJoin DLeft violates

Key: Artist Group Genre
not in BCNF

R2(A,B,C,D,E) F = {AB->C, ABC->DE, DE->D}

Key: AB

AB->C, ok because AB is a superkey ABC->DE ok because ABC is a superkey DE->D ok trivial

R3(A,B,C,D,E) F = {A->BCDE, BD->D, ABD->C}

Key:A (In BCNF)

MG2(Artist, DOB, Group, DFormed, DJoin, DLeft)
Artist -> DOB
Group -> DFormed
Artist Group -> DJoin DLeft

Key: Artist Group