Exam #1 Review Lecture ---------------------------- Relational Data Model: - First normal form - Basic definition of a key ------------------------------------------------------------------- Relational Algebra: Operations: - Select - Project - Cartesian Product/join/natural join - Rename - Set union - Set intersection - Set difference Basic queries: - Simple join queries - Simple set queries (union/intersection) - Set subtraction (never did something) - Complex joins (a pair, at least two/three of something, exactly one of something, max value, etc.) hosts((hostid), hostname, hosturl, hostsince, hostlocation, hostabout,hostresponsetime, hostacceptancerate, hostissuperhost, hostidentityverified) neighborhoods((neighbourhoodgroup,neighbourhood)) listings((listingid),name, hostid,neighbourhood,latitude,longitude,roomtype, price, minimumnights,numberofreviews, scoresrating, scoresaccuracy,scorescleanliness, scorescheckin, scorescommunication,scoreslocation) calendar((listingid, date), available, price) reviews((reviewid), listingid, date, reviewerid, reviewername, comments) (a) Return id, name, latitude and longitude of listings that are available in two consecutive days in November 2019. Return their name and ID. (Note: you can compare two days as follows:day1 = day2+1to check that they are consecutive.) listings c = select_{ date>= '1-11-2019' and date<= '31-11-2019'} (calendar c1(l1, d1, a1, p1) = c R1 = c1 join_{l1=listingid and d1 = date+1} c R2 = project_{listingid, name, latitute, longitude} (R1 * listings) (a1) Return id, name, latitude and longitude of listings that are not available in any two consecutive days in November 2019. c = select_{ date>= '1-11-2019' and date<= '31-11-2019'} (calendar c1(l1, d1, a1, p1) = c R1 = c1 join_{l1=listingid and d1 = date+1} c R2 = project_{listingid} (listings) --> all listings R3 = project_{listingid} (R1) --> all listings with a 1 day apart availability R4 = R2 - R3 Result = project_{listingid, name, latitute, longitude} (R4 * listings) -------- Find names of all people who host homes or write reviews R1[name] = project_{hostname} (Hosts) R2[name] = project_{reviewername} (Reviews) Result = R1 union R1 ---------------- Find name of all hosts who are verified or live in NY. R1[name] = project_{hostname} (select_{address='NY'} Hosts) R2[name] = project_{hostname} (select_{isverified=True} Hosts) Result = R1 union R1 Result = project_{hostname} (select_{address='NY' or isverified=True} Hosts) ------------------- Find hosts who have no listings R1 = project_{hostid}(listings) -- Find hosts who have listing Result = project_{hostid} (Hosts) - R1 Listings: h1 and h2 has listings, but not h3 R1[hostid1] = project_{hostid}(listings) hostid1 h1 h2 R2 = Project_{hostid} (Hosts) hostid h1 h2 h3 ------------------------ R3 = R1 join_{hostid<>hostid1} R2 R1 x R2 h1 h1 h1 h2 h1 h3 h2 h1 h2 h2 h2 h3 select_{hostid<>hostid1} h1 h2 h1 h3 h2 h1 h2 h3 project_{hostid} (select_{hostid<>hostid1}) h1 h2 h3 -------- R1[hostid1] = project_{hostid}(listings) R2= Hosts join_{hostid<>hostid1} R1 Result = project_{hostid} (R2) incorrect!!!!! ------ Find hosts who are not verified R1 = select_{isverified=True} Hosts Result = Hosts - R1 Result = select_{isverified=False} Hosts ------------------------------------------------------------------- Normalization - Definition of functional dependencies (fds) - Inference rules for fds - Closure of a set of fds, equivalence of two sets of fds - Closure of a set of attributes - Finding keys/superkeys/prime attributes given a set of fds - Finding minimal basis - Checking if a relation is in 3NF - Checking if a relation is in BCNF - Basic definitions of lossless and dependency preserving decompositions - Finding projection of a set of fds to a decomposed relation - Checking if a decomposition is lossless - Checking if a decomposition is dependency preserving - 3NF Decomposition - BCNF Decomposition - Basic idea of 4NF -------------------------------------------- X->Y If X->Y Y->Z then X->Z If X->YZ then X->Y, X->Z If X->Y the AX->AY If X->Y, X->Z then X->YZ If X subset of Y, then Y->X Set F of fds, F+ closure of F if F1 and F2 have F1+ = F2+, F1 and F2 are equivalent Closure of a set of attributes X, X+={X} Is X->Y implied by F (is X->Y in F+), check if Y is in X+. Given a relation R and set of fds F, X is key if: 1. - X+ is all attributes (uniqueness) 2. - no subset of X is a key. (minimality) Given a relation R and set of fds F, X is superkey if: 1. - X+ is all attributes (uniqueness) (All keys are superkeys as well.) Prime attribute: is an attribute in one key. R(A,B,C,D,E,F,G) F={AC->BD, ACD->AEFG, BC->B, EFG->AC} F={AC->BD, AC->EFG, EFG->AC} Key: AC, EFG AC+ = {A,C,B,D,E,F,G} Prime attribues: A,C,E,F,G BCNF: All non-trivial fds, should have a superkey on the left. R(A,B,C,D) {AB->C} Key:ABD, not in BCNF because ABD is not a superkey 3NF: All non-trivial fds, should have a superkey on the left or all prime attributes on the right. If a relation is in BCNF, then it is in 3NF. R(A,B,C,D) {AB->C, BC->A} Keys: ABD, BCD Prime attributes: all of them Not in BCNF because AB (and BC) is not a superkey But in 3NF because rhs is a prime attribute in both. Lossless decomposition: R-> R1, and R2 (decompose using projection), R1 * R2 = R R(A,B,C,D) {AB->C} R1(A,B,D) R2(B,C) R3(C,D) (Chase algorithm) A B C D a b c1 d a2 b c d2 a3 b3 c d R(A,B,C,D) {AB->C} R1(A,B,D) R2(A,B,C) A B C D a b c1 d a b c d2 Use AB->C A B C D a b c d <- no subscript, this is a lossless a b c d2 Dependency Preserving Decomposition ----------------------------------- R(A,B,C,D) F = {A->B, BC->D} R1(A,B,D) F1={A->B} -- projection of fds A+={A,B} B+={B} D+={D} AB+={A,B} AD+={A,D} R2(A,B,C) F2 ={A->B} A+={A,B} B+={B} C+={C} AB+={A,B} AC+={A,B,C,D} BC+={B,C,D} Is F1 union F2 equivalent to F R(A,B,C,D) F = {A->B, BC->D} R3(A,C,D) F3={AC->D} A+={A,B} C+={C} D+={D} AC+={A,B,C,D} AD+={A,D} CD+={C,D} R4(A,B,C) F4={A->B} F3 union F4 = {A->B, AC->D} equivalent F= {A->B, BC->D} ???? (already know everything in F3 union F4 implied by F) Is everything in F implied by F3 union F4: A->B true, already in F3 union F4 BC->D, BC+ (using F3 union F4): BC+ = {B,C}. Since D is not BC+, then BC->D is lost This is not a dependency preserving decomposition. --------------------------- 3NF Decomposition (set of fds in minimal basis) R(A,B,C,D,E,F) {AB->C, BC->A, AD->E} Key: ABDF, BCDF R1(A,B,C) AB->C -> remove R2(A,B,C) BC->A R3(A,D,E) AD->E R4(A,B,D,F) {} R1(A,B,C) {AB->C, BC->A} Key: AB, BC R2(A,D,E) {AD->E} Key: AD R4(A,B,D,F) {} Key: ABDF 3NF decomposition: guaranteed to be lossless and dependency preserving -------------------------------------------- BCNF Decomposition R(A,B,C,D,E) {AB->C, C->D} Key: ABE Pick AB->C AB+={A,B,C,D} R1(A,B,C,D) {AB->C, C->D}, Key: AB not in BCNF, C->D violates it R11(C,D) {C->D} Key: C, in BCNF R12(A,B,C) {AB->C} Key: AB in BCNF R2(A,B,E) {} Key: ABE in BCNF BCNF decomposition: guaranteed to be lossless -------------------------------------------- ER Diagrams - Entities: basic rules (key/simple attributes) - Relationships: basic rules (what to connect to) - Participation constraints - Ternary (or higher order) relationhips (including checking whether they can be decomposed further) - Weak entities - Hierarchies - Converting basic ER diagrams to relational data model - Converting models with weak entities to relational data model - Converting models with hierarchies to relational data model ---------- -> Converting N to 1 to 1 or N to N to 1 ternary relationship in ER diagram to a relation. Basically just when the relationship isn't 1 to 1 to 1 or N to N to N -> In a ER diagram, if a weak entity A connects to a weak entity B connects to a normal entity C, what is the key for A? -> Can you go over creating a relational model from an ER diagram when there are hierarchies? -> What is a relationship/entity? -> BCNF decomp -> Complex Relational Algebra Queries. We are trying to express Multiple Values between Two Entities (e.g., Multiple Comments between Users and Posts), then must we always create this as a Ternary Relationship, since using a Binary Relationship would limit you to one (i.e., a Given User would only be able comment on a Given Post once). -> Minimality of keys R4(A,B,C,D,E,F,G) F={AC->BD, ACD->AEFG, BC->B, EFG->AC} ----------------------- 3NF Decomposition vs BCNF Decomposition bottom up top down R(A,B,C,D,E) AB->C C->ABD Key: ABE, CE (not in BCNF, or in 3NF) ------------ 3NF Decomposition R2(A,B,C,D) C->ABD, AB->C Key:C, AB R3(A,B,E) Key: ABE ------------ BCNF Decomposition R(A,B,C,D,E) AB->C x C->ABD x Key: ABE, CE ---- AB->C AB+={A,B,C,D} R1(A,B,C,D) AB->C, C->ABD R2(A,B,E) Nothing --------------- R2(A,B,E) Students(RIN,Advisor,Email) Multiple advisors, multiple emails RIN Advisor Email 1 X E1 1 Y E2 ------------- S1RIN, Advisor RIN, Email ----------------- F = {AC->D,AC->E,BE->F,AFG->B} R1(A,B,C,F,G) A+ = {A} B+ = {B} C+ = {C} F+ = {F} G+ = {G} AC+ = {A,C,D,E} AF+ = AG BC BF BG CF CG FG R2(A,B,C,D,E) ---------------------------- people(id, name, fatherid, motherid, dob) Pair of ids of biological siblings: r1[id1,name1,f1,m1, dob1] = people r2[id2,name2,f2,m2, dob2] = people r3 = r1 join_{id1<>id2 and f1=f2 and m1=m2} r2 Twin or higher....: r3 = r1 join_{id1<>id2 and f1=f2 and m1=m2 and dob1=dob2} r2 r32 = project_{id1,id2} (r3) --> id1,id2 have the same mother,father and born on the same r4[id4,name4,f4,m4, dob4] = people r5 = r3 join_{f4=f2 and m4=m2 and dob4=dob2 and id4<>id1 and id4<>id2} r4 r6 = project_{id1,id2} (r5) ---> id1,id2 has a third sibling born on the same day Result = r32 - r6 --> id1,id2 have the same mother,father and born on the same have no third sibling born on the same day = Twins ----------------