Lecture 5: Also known as: Everything you wanted to know about functional dependencies but were afraid to ask Announcements: ----------- - Hw 1 is due this thursday at midnight. - Office hours have been updated, please check course website. - Lecture 5 exercise after class -------------------------------- FUNCTIONAL DEPENDENCY: Given a relation R, a functional dependency (fd) is an expression of the form: X -> Y where X and Y are sets of attributes from R. If X -> Y, then whenever you have two tuples in R with the same values for attributes in X, then they must have the same values for attributes in Y. -------------------------- INFERENCE RULES FOR FUNCTIONAL DEPENDENCIES: Given a relation R and a set of F of fds, the following are inference rules: 1. Trivial: If Y is a subset or equal to X, then X->Y is implied by F. 2. Combining rule: If X->Y and X->Z are in F, then X->YZ is implied by F. 3. Decomposition: If X->YZ is in F, then X->Y and X->Z are both implied by F. 4. Transitivity: If X->Y, Y->Z are in F, then X->Z is implied by F. 5. Augmentation: If X->Y is in F, then XA->YA is implied by F. ------------------------------------------------------ THE CLOSURE OF A SET OF FUNCTIONAL DEPENDENCIES (F+): Given a relation R and a set of F of fds, the closure of F given by F+ is the set of all functional dependencies implied by F. F+ = F union All trivial FDs union all other fds implied by F ---------------------------------------- THE CLOSURE OF A SET OF ATTRIBUTES (X+):Given a relation R and a set of F of fds, the closure of a set X of attributes, given by X+ is the set of all atttributes Y that are implied by X. To compute of X+: - Initialize X+ = {X} - Repeat until X+ does not change: Find Y->Z such that Y is a subseteq of X+, then add Z to X+ Then, we have that X-> X+ is implied by F (or it is in F+) F = {AB->CD, D->E, EF->G} ABC->ABC ABC+ = {A,B,C} ABC+ = {A,B,C,D,E} ABC->ABCDE is implied by F (or is in F+) ------------------------------------- SUPERKEY: Given a relation R and a set of F of fds, X is a superkey if X+ is the set of all attributes in R. ------------------------------------- KEY: A Key is a minimal set of attributes X that imply all the attributes in R. Every key is a superkey by definition. Key: A set of attributes X such that no two tuples can have same values for X. R(A,B,C,D,E) F={AB->CDE} Key: AB a1 b1 c1 d1 e1 a1 b2 c1 d1 e1 AB->CDE AB ->AB AB->ABCDE AB+ = {A,B,C,D,E} ------------------------------------- PRIME ATTRIBUTE: If an attribute is in a key (any key), it is called a prime attribute. ------------------------------------- BOYCE CODD NORMAL FORM (BCNF): Given a relation R and a set of F of fds. R is in BCNF iff and only one of the following is true for every functional dependency X->Y in F: a) either X->Y is trivial (Y is a subseteq of X), b) or X is a superkey of R. ------------------------------------ THIRD NORMAL FORM (3NF): Given a relation R and a set of F of fds. R is in 3NF iff and only one of the following is true for every functional dependency X->Y in F: a) either X->Y is trivial (Y is a subseteq of X), b) or X is a superkey of R, c) all of Y are prime attributes. R5(A,B,C,D) F = {AB->CD, BC->B, D->A} Key: AB, BD In 3NF? Yes. AB->CD , AB is a superkey (condition b), OK BC->B , trivial (condition a), ok D->A , D is not a superkey, but A is a prime attribute (condition c). OK. ------------------------------------------ F1 = {AB->CD, BC->B, D->A} F2 = {AB->CD, D->A} ------------------------------------- Two fd sets F1 and F2 are equivalent (F1 ≡ F2) have the same meaning), iff F1+ = F2+. To check if F1 ≡ F2 you can check: a) everything in F1 is implied by F2, and b) everything in F2 is implied by F1. F1 = {AB->CD, D->A} F2 = {AB->BCD, D->A, BD->C} a) everything in F1 is implied by F2, and True! b) everything in F2 is implied by F1. AB->BCD , using F1: AB+={A,B,C,D}, so it is implied by F1. D->A, in F1 so it is implied by F1. BD->C, using F1, BD+={B,D,A,C} since C is in the closure, this is implied by F1. Hence: F1 ≡ F2. -------------- MINIMAL COVER: A set F of fds is a minimal cover if nothing can be removed from it without changing F+. --------------- BASIS: A set F is in a basis form if every fd has a single attribute on the right hand side. ----------------- Algorithm to compute the minimal basis: -------------------------------------- Given F: Step 1) Convert it to basis form Step 2) Remove all trivial functional dependencies Step 3) Given F, remove X->Y from F to get F'. If X+ is the same in F and F', then I can remove X->Y from F. Step 4) Given F, we can replace XZ->Y in F with X->Y to get F'. If X+ is the same in F and F', then we can change XZ->Y to X->Y. Optional Step 5) Use the combining rule to combine fds with the same left hand side. ----------------------------- Given: F = {AB->BCD, D->AE, DEF->DG, ABCF->G} Step 1: F = {AB->B, AB->C, AB->D, D->A, D->E, DEF->D, DEF->G, ABCF->G} Step 2: Remove: AB->B, DEF->D F = {AB->C, AB->D, D->A, D->E, DEF->G, ABCF->G} Step 3: F = {AB->C, AB->D, D->A, D->E, DEF->G, ABCF->G} AB+ = {A,B,C,D,E} F' = {AB->D, D->A, D->E, DEF->G, ABCF->G} AB+ = {A,B,D,E} Hence, cannot remove AB->C ----- F = {AB->C, AB->D, D->A, D->E, DEF->G, ABCF->G} DEF+ = {D,E,F,G,A} F' = {AB->C, AB->D, D->A, D->E, ABCF->G} DEF+ = {D,E,F,A} Cannot remove DEF->G ------- F = {AB->C, AB->D, D->A, D->E, DEF->G, ABCF->G} ABCF+ = {A,B,C,F,G,D,E} F' = {AB->C, AB->D, D->A, D->E, DEF->G} ABCF+ = {A,B,C,F,D,E,G} Can remove ABCF->G Step 4: F = {AB->C, AB->D, D->A, D->E, DEF->G} A+ = {A} F' = {A->C, AB->D, D->A, D->E, DEF->G} A+ = {A,C} I cannot remove B in AB->C ------ F = {AB->C, AB->D, D->A, D->E, DEF->G} DF+ = {D,F,A,E,G} F' = {AB->C, AB->D, D->A, D->E, DF->G} DF+ = {D,F,G,A,E} Same, so F' becomes F. (I can remove E from DEF->G) Result: F = {AB->C, AB->D, D->A, D->E, DF->G} Step 5) F = {AB->CD, D->AE, DF->G} ----------- PROJECTION OF FUNCTIONAL DEPENDENCIES Given a relation R and a set F of fds, if R2 = PROJECT_{A1...AN} (R) the projection of F into F2 is the set of all functional dependencies in F+ that only contain attributes in R2 (simplify by finding the minimal basis). To compute: find X+ for every subset of attributes in R2, add X->X+ to the set F2, then find the minimal basis. ------------ R1(A,B,C,D,E,F) F1 = {AB->CD, AC->EF} R11(A,B,C,D) F11={AB->CD} R12(A,C,E,F) F12={AC->EF} ---------------- R2(A,B,C,D,E,F,G,H) F2 = {AB->C, C->D, BE->F, F->G, F->H} R21(A,B,D,E,G) F21={AB->D, BE->G} R22(B,C,D,F,H) F22={C->D, F->H} ------------------------- DECOMPOSITION: A decomposition of R into R1,...,Rn is given by: If A1,...,An are the attributes in R, then: R1 = PROJECT_{A1,...,An} (R) VALID: A decomposition is valid if R1,...,Rn contains all the attributes in R. LOSSLESS DECOMPOSITION: If it is guaranteed that R1 join R2 join .... join Rn = R, then the decomposition is lossless. (or R1 * R2 * .... * Rn = R0 DEPENDENCY PRESERVING: If F1, ..., Fn are the projection of F onto R1, ..,Rn then F1 union ... union Fn is equivalent to F, then the decomposition is dependency preserving. ------------------------------ TVShows(Showname, StartYear, EndYear, Actor, Role, DateOfBirth, JoinedCast, LeftCast) Showname -> StartYear EndYear Actor -> DateOfBirth ShowName Actor Role -> JoinedCast LeftCast T1(Showname, StartYEar, Endyear) Farscape 1999 2003 Bab 5 1993 1998 T2(Actor, DateOfBirth) Ben Browder 1962 Lani Tupu 1955 .... T3(ShowName, Actor, Role, JoinedCast, LeftCast) Bab 5 Lani Tupu Pilot 1999 2003 Bab 5 Lani Tupu Captain Crais 1999 2002 ...... T1 * T2 * T3 ------------------ CHASE Algorithm to check if a decomposition is lossless GIven a set F of fds and a decomposition R1, ..., Rn of a relation R. Compute a canonical relation R as follows: -> A tuple for each Ri, and each attribute in Ri will have no subscripts (known value) and the other attributes will have a unique subscript (unknown value) -> Use the functional dependencies to recover unknowns If: X->Y and two tuples have the same X values then make Y values the same If one Y value is known, then make the other one known, otherwise make them the same value. -> Repeat until no row changes or you find a row with no subscripts. If there is a tuple with no subscripts, then the relation is LOSSLESS. If there is no tuples with no subscripts, then the relation is not lossless AND the resulting relation is a counter example! -------------- R(A,B,C,D,E,F,G,H) F ={AB->C, C->D, BE->F, F->G, F->H} R1(A,B,C,D) R2(A,B,E,F) R3(F,G,H) A B C D E F G H ----------------------- a b c d e1 f1 g1 h1 a b c2 d2 e f g2 h2 a3 b3 c3 d3 e3 f g h AB->C A B C D E F G H ----------------------- a b c d e1 f1 g1 h1 a b c d2 e f g2 h2 a3 b3 c3 d3 e3 f g h C->D A B C D E F G H ----------------------- a b c d e1 f1 g1 h1 a b c d e f g2 h2 a3 b3 c3 d3 e3 f g h F ={AB->C, C->D, BE->F, F->G, F->H} F->G A B C D E F G H ----------------------- a b c d e1 f1 g1 h1 a b c d e f g h2 a3 b3 c3 d3 e3 f g h F->H A B C D E F G H ----------------------- a b c d e1 f1 g1 h1 a b c d e f g h <--- no subscript, lossless. a3 b3 c3 d3 e3 f g h