Test #1 Review Lecture ---------------------------- Relational Data Model: - First normal form - Basic definition of a key ------------------------------------------------------------------- Relational Algebra: Operations: - Select - Project - Cartesian Product/join/natural join - Rename - Set union - Set intersection - Set difference Basic queries: - Simple join queries - Simple set queries (union/intersection) - Set subtraction (never did something) - Complex joins (a pair, at least two/three of something, exactly one of something, max value, etc.) hosts((hostid), hostname, hosturl, hostsince, hostlocation, hostabout,hostresponsetime, hostacceptancerate, hostissuperhost, hostidentityverified) neighborhoods((neighbourhoodgroup,neighbourhood)) listings((listingid),name, hostid,neighbourhood,latitude,longitude,roomtype, price, minimumnights,numberofreviews, scoresrating, scoresaccuracy,scorescleanliness, scorescheckin, scorescommunication,scoreslocation) calendar((listingid, date), available, price) reviews((reviewid), listingid, date, reviewerid, reviewername, comments) (a) Return id, name, latitude and longitude of listings that are available in two consecutive days in November 2019. Return their name and ID. (Note: you can compare two days as follows:day1 = day2+1 to check that they are consecutive.) listings c = select_{ date>= '1-11-2019' and date<= '31-11-2019'} (calendar) c1(l1, d1, a1, p1) = c R1 = c1 join_{l1=listingid and d1 = date+1} c R2 = project_{listingid, name, latitute, longitude} (R1 * listings) (a1) Return id, name, latitude and longitude of listings that are not available in any two consecutive days in November 2019. c = select_{ date>= '1-11-2019' and date<= '31-11-2019'} (calendar c1(l1, d1, a1, p1) = c R1 = c1 join_{l1=listingid and d1 = date+1} c R2 = project_{listingid} (listings) --> all listings R3 = project_{listingid} (R1) --> all listings with a 1 day apart availability R4 = R2 - R3 Result = project_{listingid, name, latitute, longitude} (R4 * listings) -------- Find names of all people who host homes or write reviews R1[name] = project_{hostname} (Hosts) R2[name] = project_{reviewername} (Reviews) Result = R1 union R1 ---------------- Find name of all hosts who are verified or live in NY. R1[name] = project_{hostname} (select_{address='NY'} Hosts) R2[name] = project_{hostname} (select_{isverified=True} Hosts) Result = R1 union R1 Result = project_{hostname} (select_{address='NY' or isverified=True} Hosts) ------------------- Find hosts who have no listings R1 = project_{hostid}(listings) -- Find hosts who have listing Result = project_{hostid} (Hosts) - R1 Listings: h1 and h2 has listings, but not h3 R1[hostid1] = project_{hostid}(listings) hostid1 h1 h2 R2 = Project_{hostid} (Hosts) hostid h1 h2 h3 ------------------------ R3 = R1 join_{hostid<>hostid1} R2 R1 x R2 h1 h1 h1 h2 h1 h3 h2 h1 h2 h2 h2 h3 select_{hostid<>hostid1} h1 h2 h1 h3 h2 h1 h2 h3 project_{hostid} (select_{hostid<>hostid1}) h1 h2 h3 -------- R1[hostid1] = project_{hostid}(listings) R2= Hosts join_{hostid<>hostid1} R1 Result = project_{hostid} (R2) incorrect!!!!! ------ Find hosts who are not verified R1 = select_{isverified=True} Hosts Result = Hosts - R1 Result = select_{isverified=False} Hosts ------------------------------------------------------------------- Normalization - Definition of functional dependencies (fds) - Inference rules for fds - Closure of a set of fds, equivalence of two sets of fds - Closure of a set of attributes - Finding keys/superkeys/prime attributes given a set of fds - Finding minimal basis - Checking if a relation is in 3NF - Checking if a relation is in BCNF - Basic definitions of lossless and dependency preserving decompositions - Finding projection of a set of fds to a decomposed relation - Checking if a decomposition is lossless - Checking if a decomposition is dependency preserving - 3NF Decomposition - BCNF Decomposition - MVDs, 4NF, 4NF decomposition -------------------------------------------- X->Y If X->Y Y->Z then X->Z If X->YZ then X->Y, X->Z If X->Y the AX->AY If X->Y, X->Z then X->YZ If X subset of Y, then Y->X Set F of fds, F+ closure of F if F1 and F2 have F1+ = F2+, F1 and F2 are equivalent Closure of a set of attributes X, X+={X} Is X->Y implied by F (is X->Y in F+), check if Y is in X+. Given a relation R and set of fds F, X is key if: 1. - X+ is all attributes (uniqueness) 2. - no subset of X is a key. (minimality) Given a relation R and set of fds F, X is superkey if: 1. - X+ is all attributes (uniqueness) (All keys are superkeys as well.) Prime attribute: is an attribute in one key. R(A,B,C,D,E,F,G) F={AC->BD, ACD->AEFG, BC->B, EFG->AC} F={AC->BD, AC->EFG, EFG->AC} Key: AC, EFG AC+ = {A,C,B,D,E,F,G} Prime attribues: A,C,E,F,G BCNF: All non-trivial fds, should have a superkey on the left. R(A,B,C,D) {AB->C} Key:ABD, not in BCNF because ABD is not a superkey 3NF: All non-trivial fds, should have a superkey on the left or all prime attributes on the right. If a relation is in BCNF, then it is in 3NF. R(A,B,C,D) {AB->C, BC->A} Keys: ABD, BCD Prime attributes: all of them Not in BCNF because AB (and BC) is not a superkey But in 3NF because rhs is a prime attribute in both. Lossless decomposition: R-> R1, and R2 (decompose using projection), R1 * R2 = R R(A,B,C,D) {AB->C} R1(A,B,D) R2(B,C) R3(C,D) (Chase algorithm) A B C D a b c1 d a2 b c d2 a3 b3 c d R(A,B,C,D) {AB->C} R1(A,B,D) R2(A,B,C) A B C D a b c1 d a b c d2 Use AB->C A B C D a b c d <- no subscript, this is a lossless a b c d2 Dependency Preserving Decomposition ----------------------------------- When computing dependency projections, it only makes sense to consider attributes that appear in the left hand side and their combinations that are in the given relation. R(A,B,C,D) F = {A->B, BC->D} R1(A,B,D) F1={A->B} -- projection of fds A+={A,B} B+={B} D+={D} AB+={A,B} AD+={A,B,D} BD+={B,D} R2(A,B,C) F2 ={A->B} A+={A,B} B+={B} C+={C} AB+={A,B} AC+={A,B,C,D} BC+={B,C,D} Is F1 union F2 equivalent to F R(A,B,C,D) F = {A->B, BC->D} R3(A,C,D) F3={AC->D} A+={A,B} C+={C} D+={D} AC+={A,B,C,D} AD+={A,D} CD+={C,D} R4(A,B,C) F4={A->B} F3 union F4 = {A->B, AC->D} equivalent F= {A->B, BC->D} ???? (already know everything in F3 union F4 implied by F) Is everything in F implied by F3 union F4: A->B true, already in F3 union F4 BC->D, BC+ (using F3 union F4): BC+ = {B,C}. Since D is not BC+, then BC->D is lost This is not a dependency preserving decomposition. --------------------------- 3NF Decomposition (set of fds in minimal basis) R(A,B,C,D,E,F) {AB->C, BC->A, AD->E} Key: ABDF, BCDF R1(A,B,C) AB->C -> remove R2(A,B,C) BC->A R3(A,D,E) AD->E R4(A,B,D,F) {} R1(A,B,C) {AB->C, BC->A} Key: AB, BC R2(A,D,E) {AD->E} Key: AD R4(A,B,D,F) {} Key: ABDF 3NF decomposition: guaranteed to be lossless and dependency preserving -------------------------------------------- BCNF Decomposition R(A,B,C,D,E) {AB->C, C->D} Key: ABE Pick AB->C AB+={A,B,C,D} R1(A,B,C,D) {AB->C, C->D}, Key: AB not in BCNF, C->D violates it R11(C,D) {C->D} Key: C, in BCNF R12(A,B,C) {AB->C} Key: AB in BCNF R2(A,B,E) {} Key: ABE in BCNF BCNF decomposition: guaranteed to be lossless -------------------------------------------- ER Diagrams - Entities: basic rules (key/simple attributes) - Relationships: basic rules (what to connect to) - Participation constraints - Ternary (or higher order) relationships (including checking whether they can be decomposed further) - Weak entities - Hierarchies - Converting basic ER diagrams to relational data model - Converting models with weak entities to relational data model - Converting models with hierarchies to relational data model ---------- -> If you could possibly go over 3NF and BCNF decomposition and checking for lossless and dependencies -> If we have a ternary relationship with two entities that have 0..N participation and a third that has 1..1 participation, how would we convert that to relations? See example from course notes: Airports(code, name) Key: code Flights(id, depAirportCode, depDate, depTime, arrAirportCode, arrDate, ArrTime) Key: id PricingPlans(flightId, name, conditions) Key: flightid, name Passengers(id, name, phone, seatPref) Key: id Reservations(passengerId, flightid, seat) Key: passengerId, flightid Or from Lecture 7.1: Students(RIN, Name, Class, Email) Key: RIN Classes(CN, Semester, Year, Section, CourseCode) Key: CRN Major(Name, offeredby_dcode) Key:Name Departments(dcode, name, phone, location, head_RIN, headStartDate, termLength) Key: dcode FacultyStaff(RIN, Name, Email, Title, WorkIn_dcode, family_RIN) Key: RIN Courses(code, title) Key: code StudentsHaveMajors(RIN, MajorName) Key: RIN, MajorName TakesClasses(RIN, CRN, Grade) Key: RIN, CRN AdvisedBy(StudentRIN, FacultyRIN, MajorName) Key: StudentRIN, MajorName -> Could we get an example of how a 3NF relation could cause problems, versus a BCNF relation? ----------------------- 3NF Decomposition vs BCNF Decomposition bottom up top down R(A,B,C,D,E) AB->C C->ABD Key: ABE, CE (not in BCNF, or in 3NF) ------------ 3NF Decomposition R2(A,B,C,D) C->ABD, AB->C Key:C, AB R3(A,B,E) Key: ABE ------------ BCNF Decomposition R(A,B,C,D,E) AB->C x C->ABD x Key: ABE, CE ---- AB->C AB+={A,B,C,D} R1(A,B,C,D) AB->C, C->ABD R2(A,B,E) Nothing --------------- R2(A,B,E) Students(RIN,Advisor,Email) Multiple advisors, multiple emails RIN Advisor Email 1 X E1 1 Y E2 ------------- S1RIN, Advisor RIN, Email ----------------- F = {AC->D,AC->E,BE->F,AFG->B} R1(A,B,C,F,G) A+ = {A} B+ = {B} C+ = {C} F+ = {F} G+ = {G} AC+ = {A,C,D,E} AF+ = AG BC BF BG CF CG FG R2(A,B,C,D,E) ---------------------------- people(id, name, fatherid, motherid, dob) Pair of ids of biological siblings: r1[id1,name1,f1,m1, dob1] = people r2[id2,name2,f2,m2, dob2] = people r3 = r1 join_{id1<>id2 and f1=f2 and m1=m2} r2 Twin or higher....: r3 = r1 join_{id1<>id2 and f1=f2 and m1=m2 and dob1=dob2} r2 r32 = project_{id1,id2} (r3) --> id1,id2 have the same mother,father and born on the same r4[id4,name4,f4,m4, dob4] = people r5 = r3 join_{f4=f2 and m4=m2 and dob4=dob2 and id4<>id1 and id4<>id2} r4 r6 = project_{id1,id2} (r5) ---> id1,id2 has a third sibling born on the same day Result = r32 - r6 --> id1,id2 have the same mother,father and born on the same have no third sibling born on the same day = Twins ---------------- MVDs and 4NF ============ R(name, street, city, title, year) FD = { } Key: name, street, city, title name street city title year C. Fisher 123 Maple St. Hollywood Star Wars 1977 C. Fisher 5 Locust St. Malibu Star Wars 1977 C. Fisher 123 Maple St. Hollywood Empire Strikes Back 1980 C. Fisher 5 Locust St. Malibu Empire Strikes Back 1980 C. Fisher 123 Maple St. Hollywood Return of the Jedi 1983 C. Fisher 5 Locust St. Malibu Return of the Jedi 1983 MVDs = { name ->> street city name ->> title year } name ->> street city violates 4NF, so not in 4NF R1(name, street, city) FD1 = { } Keys: { name street city } MVD1 = { name ->> street city is trivial } R2(name, title, year) FD2 = { } Keys: { name title year } MVD2 = { name ->> title year is trivial } in 4NF.