Lecture 4: ------------- Normalization Theory --------------------- Functional Dependencies (fd): Given a relation R is given by A1..An -> B1...Bm Where A1..An, B1...Bm are attributes in R. (X -> Y where X and Y are sets of attributes in R) It means that whenever you have two tuples in R with the same values for attributes A1..An, then these tuples must also have the same values for attributes B1...Bm. ----------------- Showname, Startyear, Creator, StreamSite, URL, ShowSeasons, StreamSeasons Showname -> StartYear, Creator, ShowSeasons (assuming a single creator) StreamSite -> URL Showname StreamSite -> StreamSeasons URL -> StreamSite URL -> URL Showname URL -> StreamSeasons ------------- Functional Dependency Inference Rules Given a relation R and a set F of functional dependencies, the following inference rules allows you to find new functional dependencies (fds) 1. Trivial (always true for any relation) If X is a subseteq of Y, then Y->X e.g. A->A AB->A 2. Transitivity: If X -> Y, Y->Z, then X->Z e.g. AB-> C C -> D then: AB->D 3. Decomposition If X -> YZ, then X->Y and X->Z Showname -> StartYear, Creator, ShowSeasons Showname -> StartYear Showname -> Creator Showname -> ShowSeasons 4. Combining rule If X->Y and X->Z then X->YZ e.g. Given: Showname -> StartYear Showname -> Creator Then, Showname -> StartYear, Creator 5. Augmentation rule: If X->Y, then XZ -> YZ e.g. Given: AB->CD , ABE->CDE --------- Given a relation R and a set F of functional dependencies, X->Y is implied by F if we can obtain X->Y from F using inference rules. F = { Showname -> StartYear, Creator, ShowSeasons StreamSite -> URL Showname StreamSite -> StreamSeasons URL -> StreamSite } See if URL -> URL is true given F? Yes, URL-> URL is trivial. See if URL -> StreamSite is true given F? Yes, it is in F. See if Showname URL -> StreamSeasons is true given F? Given URL -> StreamSite Showname URL -> Showname StreamSite Augmentation Showname StreamSite -> StreamSeasons Transitivity: Showname URL -> StreamSeasons ------------- Closure of a set of attributes: Given a relation R and a set F of functional dependencies, and a set X of attributes in R, the closure of X (denoted by X+) is the set of all attributes implied by X with respect to F. Then, X -> X+ is implied by F. Algorithm to compute the closure of X: Step 1: X+ = {X} Initialize (X->X) Step 2: Do until no new attributes are added: find a functional dependency Y->Z such that Y is a subset of X then add Z to X+. R(A,B,C,D,E,F,G) F={ A->BCD E->F F->E AE->G } A+ = {A} A+ = {A,B,C,D} (A->BCD) done! B+ ={B} E+ ={E} E+ = {E,F} AE+ = {A,E} AE+ = {A,B,C,D,E} (A->BCD) AE+ = {A,B,C,D,E,F} (E->F) AE+ = {A,B,C,D,E,F,G} (AE->G) AE-> ABCDEFG ------------------------------------------------------ Key: Given a relation R and fds F, a key is a set of attributes X such that: (a) X+ includes all attributes in R (uniqueness) (b) and no subset of X satisfies (a) (minimality) R(A,B,C,D,E,F,G) F={A->BCD, E->F, F->E, AE->G} AE is a key for R. AF is a key for R. A+ = {A,B,C,D}, F+ = {E,F} AF+ = {A,B,C,D,E,F,G} ---------------------- R1(A,B,C,D,E,F,G) F1 = {AB->CD, DE->F, F->G} Key: ABE ABE+ = {A,B,C,D,E,F,G} R2(A,B,C,D,E,F,G) F2 = {AB->C, BCD->F, F->AEG} Keys: ABD, BCD, BDF R3(A,B,C,D,E,F,G) F3 = {AB->CDEF, F-> AG} Keys: AB, FB R4(A,B,C,D,E,F,G) F4 ={AB->CDEF, F->A} Keys: ABG, FBG ---------- Students(RIN, SSN, Name, Address, Major, Advisor, Email, Hobby) RIN->SSN, Name, Address, Email SSN-> RIN RIN, Major -> Advisor Key: RIN Major Hobby, SSN Major Hobby ------------- SuperKey (superset of key): Given a relation R and fds F, a superkey is a set of attributes X such that: (a) X+ includes all attributes in R (uniqueness) By definition, any key is a superkey. R(A,B,C,D) F={A->BCD} Key: A Superkey: A, AB, AC, AD, ABC, ABD, ACD, ABCD Normal Forms: ------------- Boyce-Codd Normal Form (BCNF): Given a relation R and a set of fds F, R is in Boyce-Codd Normal Form iff, every functional dependency X->Y in F satisfies one of the following conditions: (a) X->Y is trivial, or (b) X is a superkey! R1(A,B,C,D) F={A->BCD} Key: A, in BCNF A->BCD A is a superkey, R1 is in BCNF R2(A,B,C,D) F={AD->B, ABD->C, AC->A} Key: AD, in BCNF AD->B ok, AD is a superkey ABD->C ok, ABD is a superkey AC->A ok, trivial R3(A,B,C,D) F={AD->B, AB->C} Key: AD, is not in BCNF AD->B ok, AD is a superkey AB->C no, AB is not a superkey R4(A,B,C,D) F={AD->B, B->C} Key: AD, is not IN BCNF B->C B is not a superkey R5(A,B,C,D) F={AD->BC, B->A} Key: AD, BD (not in BCNF) AD->BC ok, AD is a superkey B->A not ok, B is not a superkey R6(A,B,C,D) F={A->BC} Key: AD Not in BCNF A->BC, no A is not a superkey --------- Showname, Startyear, Creator, StreamSite, URL, ShowSeasons, StreamSeasons Showname -> StartYear, Creator, ShowSeasons (assuming a single creator) StreamSite -> URL Showname StreamSite -> StreamSeasons URL -> StreamSite Key: Showname Streamsite, Showname, URL Showname -> StartYear, Creator, ShowSeasons not ok, Showname is not a superkey StreamSite -> URL not ok, Streamsite is not a superkey URL -> StreamSite not ok, URL is not a superkey Showname StreamSite -> StreamSeasons, Not in BCNF ----- If a relation is not in BCNF, then it can have -> duplicate data, must be updated all for a single change -> deleting something may cause a loss of data for another (not being available in a streaming site, means we will loose data about the show) ---------- BASIS FORM: We say that a set of functional dependencies is in a basis form if there is only one attribute on the right hand side. (You can easily put a set in basis form by using the decomposition rule!) {A->BCD} put in basis form: {A->B, A->C, A->D} PRIME ATTRIBUTES: A prime attribute is an attribute in any key. R1(A,B,C,D,E,F,G) F1 = {AB->CD, DE->F, F->G} Key: ABE Prime attributes: A,B,E R2(A,B,C,D,E,F,G) F2 = {AB->C, BCD->F, F->AEG} Keys: ABD, BCD, BDF Prime attributes: A,B,C,D,F ---------- Third Normal Form: Given a relation R and a set of fds F in basis form, R is in Third Normal Form iff, every functional dependency X->Y in F satisfies one of the following conditions: (a) X->Y is trivial, or (b) X is a superkey, or (c) Y is a prime attribute. If a relation is in BCNF, then it is also in 3NF. R1(A,B,C,D) F={A->BCD} Key: A, in BCNF -> also in 3NF A->BCD A is a superkey, R1 is in BCNF R2(A,B,C,D) F={AD->B, ABD->C, AC->A} Key: AD, in BCNF -> also in 3NF AD->B ok, AD is a superkey ABD->C ok, ABD is a superkey AC->A ok, trivial R3(A,B,C,D) F={AD->B, AB->C} Key: AD, is not in BCNF, not in 3NF AD->B ok, AD is a superkey AB->C no, AB is not a superkey and C is not a prime attribute R4(A,B,C,D) F={AD->B, B->C} Key: AD, is not IN BCNF, not in BCNF B->C B is not a superkey, C is not a prime attribute R5(A,B,C,D) F={AD->BC, B->A} Key: AD, BD (not in BCNF), but in 3NF! AD->BC ok, AD is a superkey B->A not ok for BCNF, B is not a superkey, but ok for 3NF A is a prime attribute R6(A,B,C,D) F={A->BC} Key: AD Not in BCNF, not in 3NF A->B, no A is not a superkey, B is not a prime attribute A->C, no A is not a superkey, C is not a prime attribute ----------------------- Minimal Basis: A set of functional dependencies is said to be minimal if we cannot remove anything from them and still get the same meaning! Given a set F of fds, F+ is the set of all functinal dependencies I can obtain from F using all the inference rules. Two sets F1, and F2, are equivalent iff F1+ = F2+. Alternatively: If every functional dependency X->Y in F1, is implied by F2 (or X+ in F2 and check if Y is in it!) and If every functional dependency X->Y in F2, is implied by F1 (or X+ in F1 and check if Y is in it!) Then, F1 and F2 are equivalent. F1={A->B, B->C} F2={A->B,B->C,A->C} A->C ? A+={A,B,C} C is in A+ F3={A->B, A->C} F2={A->B,B->C,A->C} B->C ? B+ = {}