------------------------ Decomposition: A decomposition of R into R1,R2,...,Rn is valid if R1,..,Rn make up all the attributes in R and is given by: R1 = project_{attributes of R1} (R) R2 = project_{attributes of R2} (R) ... Rn = project_{attributes of Rn} (R) A good decomposition is: 1. lossless: required property [all decompositions should be lossless!] 2. dependency preserving [desired property!] Lossless -------- A decomposition of R into R1,R2,..Rn is lossless iff we are guaranteed for every possible instance of R that R = R1*R2*...*Rn Students(RIN, Name, Email, Major) RIN->Name Email ---- violates BCNF RIN Name Email Major a1 b1 c1 d1 a1 b1 c1 d2 a2 b1 c2 d1 S1(RIN, Name, Email) RIN Name Email a1 b1 c1 a2 b1 c2 S2(Name, Major) Name Major b1 d1 b1 d2 Result = S1 * S2 RIN Name Email Major a1 b1 c1 d1 a1 b1 c1 d2 a2 b1 c2 d1 a2 b1 c2 d2 -- extra tuple, not in the original student relation This decomposition is not lossless! Chase Algorithm For Checking Lossless Decompositions ---------------------------------------------------- Given a relation R, a set fds F, and its decomposition R1, ...,Rn, First construct a canonical relation that has a tuple for each relation in the decomposition: --- for each tuple for Ri, if the attribute is in Ri, we represent it as an unsubscripted variable (known value!) otherwise, we are going to have a subscript (a distinct number for each tuple!) Given a canonical relation, apply all possible fds in F, X->Y such that if you have two tuples with the same X values, then make the Y values the same: -- if one tuple has no subscript for an attribute in Y, then change all the variables to no subscript, otherwise make all of them the same (same subscript). Continue until no more changes are possible. When finished, if there is a tuple with no subscripts, then this relation is lossless. If there is no such tuple, then this resulting canonical relation is a counter example of why this is a lossy decomposition. (no subscript = known value, has subscript = unknown value/variable!) Ex: Students(RIN, Name, Email, Major) RIN->Name Email S1(RIN, Name, Email) S2(Name, Major) RIN Name Email Major r n e m1 -- for S1 r2 n e2 m -- for S2 No two tuples have the same RIN, so this is the final form. Lossy decomposition! S1: (r,n,e), (r2,n,e2) S2: (n,m1), (n,m) S1*S2: RIN Name Email Major r n e m1 r n e m r2 n e2 m1 r2 n e2 m Ex: Students(RIN, Name, Email, Major) RIN->Name Email S1(RIN, Name, Email) S2(RIN, Major) RIN Name Email Major r n e m1 r n2 e2 m Apply RIN -> Name Email RIN Name Email Major r n e m1 r n e m Last row has no subscript, so it is lossless! ------------------------ R(A,B,C,D,E,F,G) AB->C CD->EF F->A R1(A,B,C,D,E) R2(C,D,F) R3(A,B,D,G) R4(A,F) A B C D E F G a b c d e f1 g1 a2 b2 c d e2 f g2 a b c3 d e3 f3 g a b4 c4 d4 e4 f g4 Apply: AB->C A B C D E F G a b c d e f1 g1 a2 b2 c d e2 f g2 a b c d e3 f3 g a b4 c4 d4 e4 f g4 Apply: CD->EF A B C D E F G a b c d e f g1 a2 b2 c d e f g2 a b c d e f g a b4 c4 d4 e4 f g4 Third row has no subscript, this is a lossless decomposition! Decomposition: A decomposition of R into R1,R2,...,Rn is valid if R1,..,Rn make up all the attributes in R and is given by: R1 = project_{attributes of R1} (R) R2 = project_{attributes of R2} (R) ... Rn = project_{attributes of Rn} (R) A good decomposition is: 1. lossless: required property [all decompositions should be lossless!] 2. dependency preserving [desired property!] Lossless -------- A decomposition of R into R1,R2,..Rn is lossless iff we are guaranteed for every possible instance of R that R = R1*R2*...*Rn Projection of functional dependencies to a decomposition -------------------------------------------------------- Given a relation R and a set F of fd, the projection of F into a decomposed relation R1 is F1 and F1 is given by the set of all functional dependencies in F+ that only contain the attributes in R1. R(A,B,C,D) F = {A->B, B->C, C->D} Ex. R1(A,B,C) F1 = {A->B, B->C} R2(C,D) F2 = {C->D} Ex2. R1(A,B,D) F1 = {A->B, A->D, B->D} R2(C,D) F2 = {C->D} "Algorithm" for finding projection: Find all subsets X of attributes of R1, compute closure w.r.t. F, add into F1 all non-trivial fds X -> Y, s.t., Y is both in X+ and an attribute of R1. Find the minimal basis. Functional dependency preserving decomposition ----------------------------------------------- A decomposition of R and a set of fds F into R1,R2,..Rn is functional dependency preserving if, say F1,...,Fn are projections of F into R1,...,Rn then: F is equivalent to F1 union F2 union ... union Fn. (meaning that they have the same closure!) Check for fd preservation: R(A,B,C,D) F = {A->B, B->C, C->D} Ex. R1(A,B,C) F1 = {A->B, B->C} R2(C,D) F2 = {C->D} F1 union F2 = {A->B, B->C, C->D}, this is equal to F. This is fd preserving. Ex2. R1(A,B,D) F1 = {A->B, A->D, B->D} R2(C,D) F2 = {C->D} F1 union F2 = {A->B, A->D, B->D, C->D} F = {A->B, B->C, C->D} (Since F1 union F2 is obtained by projection, then everything in F1 and F2 is implied by F) Check if F1 union F2 implies everything in F: A->B: in F1 union F2 B->C: compute w.r.t. F1 union F2: B+ = {B,D}, C is not implied. C->D: in F1 union F2 B->C is not preserved. So this is not a dependency preserving decomposition. Third Normal Form Decompositions --------------------------------- Given a relation R and a minimal set F of fds is not in 3NF, we compute the following decomposition: - Create a new relation for each functional dependency - If no relation has all the attributes in one of the keys, then add one more relation with attributes of one of the keys - If one relation has a subset of the attributes of another relation then remove the smaller relation. 3NF decomposition is: - always lossless - dependency preserving - all resulting relations are in 3NF Ex: R(A,B,C,D,E,F) AB->C CD->EF F->A Keys: ABD, BDF, BCD 3NF decomposition: R1(A,B,C) F1 = {AB->C} Key: AB R2(C,D,E,F) F2 = {CD->EF} Key: CD R3(A,F) F3 = {F->A} Key: F R4(A,B,D) F4 = {} Key: ABD S(A,B,C,D) AB->C A->D C->A Key:AB S1(A,B,C) F1 = {AB->C, C->A} Keys: AB, BC (in 3NF, not in BCNF) S2(A,D) F2 = {A->D} Keys: A (in 3NF, BCNF) BCNF Decomposition ------------------- Given a relation R and a minimal set F of fds. R is not in BCNF, we compute the following decomposition: - Find any X->Y in F that violates BCNF, compute X+ in F. Create two new relations instead of R: R1 contains attributes in X+, and R2 contains all attributes in R except for attributes in (X+ - X). - Compute projection of fds for R1 and R2, and if either one is not in BCNF, then apply BCNF decomposition recursively until all resulting relations are in BCNF. BCNF Decompositions are: - Guaranteed to be lossless! - Not guaranteed to be dependency preserving Ex: S(A,B,C,D) AB->C A->D violates BCNF C->A violates BCNF Key:AB A->D A+ = {A,D} S1(A,D) F1={A->D} in BCNF S2(A,B,C) (everything in S except for A+ - {A} = {D}) F2 = {AB->C, C->A}, not in BCNF Decompose further using C->A: S21(A,C) F21 ={C->A} in BCNF S22(B,C) F22 ={} in BCNF Done! Resulting functional dependencies: S1(A,D) F1={A->D} in BCNF S21(A,C) F21 ={C->A} in BCNF S22(B,C) F22 ={} in BCNF ---> F1 union F21 union F22 = {A->D, C->A} clearly: AB->C in F is lost. ------------------------------- Additional Exercise for BCNF Decomposition! Ex: R(A,B,C,D,E,F) AB->C CD->EF F->A Keys: ABD, BDF, BCD Decompose using: CD->EF CD+ = {C,D,E,F,A} R1(A,C,D,E,F) F1 = {CD->EF, F->A}, Keys: CD, F->A violates BCNF R2(B,C,D) F2 = {} Key: BCD in BCNF everything in R, except CD+ - {CD} = {E,F,A} Decompose using F->A R11(A,F) F11 = {A->F} Key: A , in BCNF R12(C,D,E,F) F12 = {CD->EF} Key: CD, in BCNF Resulting relations: AF CDEF BCD Check if using different sequence of fds, do I get different results? Fourth Normal Form ------------------------------------ Students(RIN, Name, Email, Hobby, PhoneNumber) RIN-> Name Email x not in BCNF Key: RIN, Hobby, PhoneNumber Student can have multiple hobbies Student can have multiple phones BCNF Decomposition: Students(RIN, Name, Email) {RIN-> Name, Email}, Key: RIN, in BCNF StudentInfo(RIN, Hobby, PhoneNumber) {}, Key: all attributes, in BCNF RIN Hobby PhoneNumber 1 Running 1234 1 Running 4567 1 Baking 1234 1 Baking 4567 2 Joomchi 3456 2 Joomchi 0954 2 Lampworking 0954 2 Lampworking 3456 Multi-valued attributes RIN ->> Hobby RIN ->> PhoneNumber (RIN, Hobby) (RIN, PhoneNumber) Inference rules Trivial MVDs ------------ A1 … An =>> B1 … Bm is true if {B1, …,Bm } ⊆ {A1,…,An} If all the attributes of relation R are A1 … An, B1 … Bm, then A1 … An =>> B1 … Bm holds in R. Transitive rule --------------- A1 … An =>> B1 … Bm and B1 … Bm =>> C1 … Ck implies A1 … An =>> C1 … Ck (where A, B, C are sets of attributes). Any C's that are also A's must be deleted from the right side. Note that the splitting part of the splitting/combining rule does not apply to MVDs. Combining rule -------------- If A1 … An =>> B1 … Bm and A1 … An =>> =>> C1 … Ck then A1 … An =>> B1 … Bm C1 … Ck Augmentation rule ----------------- If A1 … An =>> B1 … Bm then If A1 … An C1 … Ck =>> B1 … Bm FD promotion ------------ Every FD is also an MVD. If A1 … An -> B1 … Bm then A1 … An =>> B1 … Bm is also true. Complementation rule -------------------- If A1 … An =>> B1 … Bm is true and C1 … Ck are all the attributes in R that are not As or Bs then A1 … An =>> C1 … Ck is also true. Fourth Normal Form ------------------ A relation is in fourth normal form iff whenever A1 … An =>> B1 … Bm is a non-trivial MVD, then A1 … An is a superkey. The notions of keys and superkeys depend on f.d.s only; adding MVDs does not change the definition of "key". To decompose a relation into fourth normal form, use an algorithm similar to BCNF decomposition algorithm using MVDs. Relations in 4NF ⊆ Relations in BCNF ⊆ Relations in 3NF. 4NF decomposition ----------------- Given a relation R where A1 … An =>> B1 … Bm violates the 4NF, decompose R into: R1(A1,…,An,B1,..,Bm) R2 contains all attributes except for B1,…,Bm. If the resulting relations are not in 4NF, then continue decomposing until they are. Discovering MVDs ---------------- Similar to chase algorithm for regular f.d.s. Relation R and a set of f.d.s F. Checking if X->Y is a regular f.d. 1. Construct a canonical relation with two tuples that agree on X. 2. Run the chase algorithm on the canonical relation using F. 3. If in the end two tuples agree on all attributes of Y, then X->Y holds; otherwise it does not. R(A, B, C, D, E, F) F = {AB->C, BC->AD, D->E, CF->B} Does AB->D hold? A B C D E F a b c1 d1 e1 f1 a b c2 d2 e2 f2 AB->C: A B C D E F a b c1 d1 e1 f1 a b c1 d2 e2 f2 BC->AD: A B C D E F a b c1 d1 e1 f1 a b c1 d1 e2 f2 D->E: A B C D E F a b c1 d1 e1 f1 a b c1 d1 e1 f2 Two tuples agree on the D attribute, so AB->D holds Extension for MVDs: ------------------- R(A,B,C,D) F = {A->B} MVD: B->>C Check if A->>C holds: A B C D a b1 c d1 a b c2 d Target tuple: (a, b, c, d) Apply A->B: A B C D a b c d1 a b c2 d Apply MVD B->>C: A B C D a b c d1 a b c2 d a b c2 d1 a b c d A->>C holds! ------------- Decompositions: R(A,B,C,D,E,F,G) ACD->E EC->F AE->BC R1(A,B,C,D,E) R2(A,E,C,F) R3(B,C,G) Rnew = R1*R2*R3 A B C D E F G a b c d e f1 g1 a b2 c d2 e f g2 a3 b c d3 e3 f3 g EC->F A B C D E F G a b c d e f g1 a b2 c d2 e f g2 a3 b c d3 e3 f3 g AE->BC A B C D E F G a b c d e f g1 a b c d2 e f g2 a3 b c d3 e3 f3 g This is a lossy decomposition! and this is a counter example. R(A,B,C) A->B R1(A,B) R2(B,C) A B C a b c1 a2 b c Lossy! R1 (a,b) (a2,b) R2 (b,c) (b,c1) R1*R2 A B C a b c1 a b c a2 b c a2 b c1 R(A,B,C) A->B R1(A,B) R2(A,C) A B C a b c1 a b2 c A->B A B C a b c1 a b c <- no subscript! Not lossy R1: (a,b) R2: (a,c1), (a,c) R1*R2 = R ---------------- 3NF Decomposition: R(A,B,C,D,E,F,G) ACD->E EC->F x AE->BC x Keys: ACDG, AEDG in 3NF? R1(A,C,D,E) {ACD->E, AE->C} keys: ACD, ADE R2(E,C,F) {EC->F} key: EC R3(A,B,C,E) {AE->BC} key: AE R4(A,C,D,G) {} key: ACDG Chase! A B C D E F G a b1 c d e f1 g1 a2 b2 c d2 e f g2 a b c d3 e f3 g3 a b4 c d e4 f4 g ACD->E A B C D E F G a b1 c d e f1 g1 a2 b2 c d2 e f g2 a b c d3 e f3 g3 a b4 c d e f4 g EC->F A B C D E F G a b1 c d e f g1 a2 b2 c d2 e f g2 a b c d3 e f g3 a b4 c d e f g AE->BC A B C D E F G a b c d e f g1 a2 b2 c d2 e f g2 a b c d3 e f g3 a b c d e f g <--- no subscript! Lossless decomposition! ---------------- BCNF Decomposition! Students(RIN, Name, Email, Major) RIN->Name Email x violates BCNF Key: RIN, Major RIN+ = {RIN, Name, Email) S1(RIN, Name, Email) {RIN->Name Email} Key: RIN, in BCNF S2(RIN, Major) {} Key: RIN Major, in BCNF ----------- MusicGroups(GroupName, ArtistName, DoB, YearFounded, YearJoined) GroupName -> YearFounded x violates BCNF ArtistName -> DoB x violates BCNF GroupName ArtistName -> YearJoined ok Key: GroupName ArtistName GroupName -> YearFounded GroupName+ = {GroupName, YearFounded} MG1(GroupName, YearFounded) GroupName -> YearFounded Key: GroupName, in BCNF MG2(GroupName, ArtistName, DoB, YearJoined) ArtistName -> DoB X violates BCNF GroupName ArtistName -> YearJoined ok Key: ArtistName, GroupName Not in BCNF ArtistName -> DoB MG21(ArtistName, DoB) ArtistName -> DoB Key: ArtistName, in BCNF MG22(GroupName, ArtistName, YearJoined) GroupName ArtistName -> YearJoined Key: GroupName ArtistName in BCNF Result: MG1(GroupName, YearFounded) MG21(ArtistName, DoB) MG22(GroupName, ArtistName, YearJoined) ------------------- R(A,B,C,D,E,F,G) AC->DEF AF->BG x violates BCNF BG->D x violates BCNF Keys: AC Take out! AF->BG AF+ = {A,F,B,G,D} R1(A,B,D,F,G) {AF->BG, BG->D}, Key: AF, not in BCNF R2(A,C,E,F) {AC->EF}, Key: AC, in BCNF Take out BG->D R11(B,D,G) BG->D in BCNF R12(A,B,F,G) AF->BG in BCNF Result: R2(A,C,E,F) {AC->EF}, Key: AC, in BCNF R11(B,D,G) {BG->D} in BCNF R12(A,B,F,G) {AF->BG} in BCNF Is this dependency preserving? Original F = {AC->DEF, AF->BG, BG->D} FD = {AC->EF,BG->D,AF->BG} In FD, AC+ = {A,C,E,F,B,G,D} since DEF in AC+, then AC->DEF is preserved! ------- A multi-valued dependency of the form A1 ... An =>> B1 ... Bm means that for all pairs of tuples t1 and t2 that agree on A, we can find a tuple v in R such that v agrees with t1 and t2 on As v agrees with t1 on B’s v agrees with t2 on the remaining attributes (not As or Bs) hero weaponsused hobby tuple lara ice axe relic collecting t1 lara compound bow motorcycling t2 lara ice axe motorcycling lara compound bow relic collecting v hosts(hostid, hostname, hosturl) ------ listings(listingid, name, hostid, neighbordhood, latitde, longitude, roomtype, --------- price, minimumnights, numberofreviews) calendar(listingid, date, available, price) --------------- reviews(reviewid, listingid, date, reviewerid, reviewername, comments) --------