Lecture 5: ------------- Normalization Theory --------------------- Key: Given a relation R and fds F, a key is a set of attributes X such that: (a) X+ includes all attributes in R (uniqueness) (b) and no subset of X satisfies (a) (minimality) SuperKey(superset of key): Given a relation R and fds F, a superkey is a set of attributes X such that: (a) X+ includes all attributes in R (uniqueness) By definition, any key is a superkey. Prime Attribute: A prime attribute is an attribute in any key. Equivalence of functional dependencies: Given a set F of fds, F+ is the set of all functional dependencies I can obtain from F using all the inference rules. Two sets F1, and F2, are equivalent iff F1+ = F2+. Minimal Basis: A set of functional dependencies is said to be minimal if we cannot remove anything from them and still have the same closure! If a set F of fds is minimal, then we cannot: - remove a functional dependency - remove an attribute from the left or the right hand side of an f.d. without altering its meaning (i.e. its closure). Algorithm to compute a minimal basis: Input: a set F of fds F = {A->B, B->BC, ABC->D, B->D} 1. First put the set of fds in a basis form using the decomposition rule F = {A->B, B->B, B->C, ABC->D, B->D} 2. Remove all trivial fds F = {A->B, B->C, ABC->D, B->D} 3. Suppose X->Y is in F and F'=F-{X->Y}. Compute X+ in F and F', if they are the same, then we can remove X->Y. F = {A->B, B->C, ABC->D, B->D} F' = {A->B, ABC->D, B->D} --removed B->C In F, B+ = {B,C,D} In F', B+ = {B,D}, not the same, canot remove B->C F = {A->B, B->C, ABC->D, B->D} F' = {A->B, B->C, ABC->D} -- removed B->D in F, B+ = {B,C,D} in F', B+ = {B,C} Cannot remove B->D F = {A->B, B->C, ABC->D, B->D} F' = {A->B, B->C, B->D} -- removed ABC->D in F, ABC+={A,B,C,D} in F', ABC+={A,B,C,D} Can remove ABC->D 4. Suppose XZ->Y is in F. Construct, F'= F - {XZ->Y} union {X->Y} Check if X+ is the same in F and F', if so, then F' becomes F. F = {A->B, B->C, B->D} -- done, because no extra attribute on the left 5. Use combining rule to return a set of fds F = {A->B, B->CD} ---------- New example: F2 = {A->B, B->C, AB->D} F2' = {A->B, B->C, B->D} -- removed A from AB->D in F, B+={B,C} in F', B+={B,C,D} -- cannot remove A F2 = {A->B, B->C, AB->D} F2' = {A->B, B->C, A->D} -- removed B from AB->D in F, A+={A,B,C,D} in F', A+={A,B,C,D} -- Yes, can remove B Final: F2 = {A->B, B->C, A->D} F2 = {A->BD, B->C} ---------- R(A,B,C,D,E,F,G) F = { AB->C, BD->BEF, CDF->AG, ABC->G, ABC->D } Step 1: F = { AB->C, BD->B, BD->E, BD->F, CDF->G, CDF->A, ABC->G, ABC->D } Step 2: F = { AB->C, BD->B, BD->E, BD->F, CDF->G, CDF->A, ABC->G, ABC->D } F = { AB->C, BD->E, BD->F, CDF->G, CDF->A, ABC->G, ABC->D } Step 3: F = { AB->C, BD->E, BD->F, CDF->G, CDF->A, ABC->G, ABC->D } F' = { AB->C, BD->E, BD->F, CDF->G, CDF->A, ABC->D } -- remove ABC->G in F, ABC+ = {A,B,C,D,E,F,G} in F', ABC+ = {A,B,C,D,E,F,G} Step 4: F = { AB->C, BD->E, BD->F, CDF->G, CDF->A, ABC->D } F' = { AB->C, BD->E, BD->F, CDF->G, CDF->A, AB->D } == changed ABC->D to AB->D in F, AB+ = {A,B,C,D,E,F,G} in F', AB+ = {A,B,C,D,E,F,G} Step 5: F = { AB->CD, BD->EF, CDF->AG } This is a minimal set. -------------- Decomposition: Given a relation R and a set F of fds, a decomposition is given by R1(X), R2(Y), ... where X,Y are sets of attributes of R, computed by: R1 = project_{X} R R2 = project_{Y} R ... such that X union Y union ... make up all the attributes in R. ------ Lossless Decomposition: Given R and F, a decomposition R1, R2, ...,Rn is lossless if it is guaranteed that R1 * R2 * ... * Rn = R i.e. the natural join is guaranteed to return the same results as the original relation. All decompositions must be lossless! Algorithm to check if a decomposition is lossless ---------------------------------------------------- Given R and F, a decomposition R1, R2, ...,Rn construct a relation R such that For each decomposed relation, there is a tuple in R where attributes in Ri has no subscripts (known values) and the rest with a new subscript for each tuple (unknown values) Apply functional dependencies like X-Y, if two tuples have the same values for X, then make Y values the same (if the value for one tuple is known, make the other known, otherwise make them the same unknown value) Continue until no rules can be applied. If in the resulting relation, there is a tuple with no subscript, then the decomposition is lossless. If in the resulting relation, there is NO tuple with no subscript, then the decomposition is lossy and the resulting relation is a counter example. R(A,B,C,D,E) F = {AB->C, B->D, C->E} R1(A,B,C) R2(A,B,E) R3(D,E) A B C D E a b c d1 e1 a b c2 d2 e a3 b3 c3 d e Apply AB->C A B C D E a b c d1 e1 a b c d2 e a3 b3 c3 d e Apply C->E A B C D E a b c d1 e a b c d2 e a3 b3 c3 d e Apply B->D A B C D E a b c d1 e a b c d1 e a3 b3 c3 d e Lossy decomposition! No tuple with no subscripts. ----------- R(A,B,C) F={A->B} R1(A,C), R2(B,C) A B C a b1 c a2 b c Can't apply any of the rules! Lossy R3(A,B), R4(A,C) A B C a b c1 a b2 c Apply A->B A B C a b c1 a b c <-- no subscript, lossless decomposition ------------- Projection of functional dependencies to a decomposition Given a relation R and a set F of fds and a decomposed relation R1. The projection of F into R1 is the set of all functional dependencies in F+ that only contain the attributes of R1. R(A,B,C,D,E,F,G) F = {AB->CD, BD->EF, CDF->AG } R1(A,B,D,G) F1 = {AB->DG} AB+ = {A,B,C,D,E,F,G} AB -> ABDG simplify to find minimal basis AB -> DG R2(A,B,C,E,F) F2 = {AB->CEF} AB->CEF F1 union F2 = {AB->CDEFG}, not equivalent to F CDF+ = {C,D,F} F = {AB->CD, BD->EF, CDF->AG } CDF+ = {C,D,F,A,G} ------------- S(A,B,C) F = {A->B, B->C} S1(A,B) F1 = {A->B} S2(B,C) F2 = {B->C} Then, F1 union F2 = {A->B, B->C}, is equal to F. This decomposition is dependency preserving! Dependency Preserving Decompositions -------------------------------------- Given a relation R and a set F of fds., and a decomposition R1, R2,...,Rn and suppose F1, F2, ... Fn, are the projection of F onto R1, R2, .. Rn respectively. If F1 union F2 union .... Fn equivalent to F, then we say that this is a dependency preserving decomposition. Showname, Startyear, Creator, StreamSite, URL, ShowSeasons, StreamSeasons Showname -> StartYear, Creator, ShowSeasons (assuming a single creator) StreamSite -> URL Showname StreamSite -> StreamSeasons URL -> StreamSite Showname URL -> StreamSeasons Student(RIN, SSN, FirstName, LastName, Email, Major, Advisor) RIN->SSN, FirstName, LastName, Email SSN->RUN RIN, Major -> Advisor ----------