Exam #1 Review Lecture
----------------------------
Relational Data Model:
- First normal form
- Basic definition of a key
-------------------------------------------------------------------
Relational Algebra:
Operations:
- Select
- Project
- Cartesian Product/join/natural join
- Rename
- Set union
- Set intersection
- Set difference
Basic queries:
- Simple join queries
- Simple set queries (union/intersection)
- Set subtraction (never did something)
- Complex joins (a pair, at least two/three of something, exactly one of something, max value, etc.)
hosts((hostid), hostname, hosturl, hostsince, hostlocation, hostabout,hostresponsetime, hostacceptancerate, hostissuperhost, hostidentityverified)
neighborhoods((neighbourhoodgroup,neighbourhood))
listings((listingid),name, hostid,neighbourhood,latitude,longitude,roomtype,
price, minimumnights,numberofreviews, scoresrating,
scoresaccuracy,scorescleanliness, scorescheckin,
scorescommunication,scoreslocation)
calendar((listingid, date), available, price)
reviews((reviewid), listingid, date, reviewerid, reviewername, comments)
(a) Return id, name, latitude and longitude of listings that are available
in two consecutive days in November 2019. Return their name and
ID. (Note: you can compare two days as follows:day1 = day2+1to check
that they are consecutive.)
listings
c = select_{ date>= '1-11-2019' and date<= '31-11-2019'} (calendar
c1(l1, d1, a1, p1) = c
R1 = c1 join_{l1=listingid and d1 = date+1} c
R2 = project_{listingid, name, latitute, longitude} (R1 * listings)
(a1) Return id, name, latitude and longitude of listings that are not
available in any two consecutive days in November 2019.
c = select_{ date>= '1-11-2019' and date<= '31-11-2019'} (calendar
c1(l1, d1, a1, p1) = c
R1 = c1 join_{l1=listingid and d1 = date+1} c
R2 = project_{listingid} (listings) --> all listings
R3 = project_{listingid} (R1) --> all listings with a 1 day apart availability
R4 = R2 - R3
Result = project_{listingid, name, latitute, longitude} (R4 * listings)
--------
Find names of all people who host homes or write reviews
R1[name] = project_{hostname} (Hosts)
R2[name] = project_{reviewername} (Reviews)
Result = R1 union R1
----------------
Find name of all hosts who are verified or live in NY.
R1[name] = project_{hostname} (select_{address='NY'} Hosts)
R2[name] = project_{hostname} (select_{isverified=True} Hosts)
Result = R1 union R1
Result = project_{hostname} (select_{address='NY' or isverified=True} Hosts)
-------------------
Find hosts who have no listings
R1 = project_{hostid}(listings) -- Find hosts who have listing
Result = project_{hostid} (Hosts) - R1
Listings: h1 and h2 has listings, but not h3
R1[hostid1] = project_{hostid}(listings)
hostid1
h1
h2
R2 = Project_{hostid} (Hosts)
hostid
h1
h2
h3
------------------------
R3 = R1 join_{hostid<>hostid1} R2
R1 x R2
h1 h1
h1 h2
h1 h3
h2 h1
h2 h2
h2 h3
select_{hostid<>hostid1}
h1 h2
h1 h3
h2 h1
h2 h3
project_{hostid} (select_{hostid<>hostid1})
h1
h2
h3
--------
R1[hostid1] = project_{hostid}(listings)
R2= Hosts join_{hostid<>hostid1} R1
Result = project_{hostid} (R2) incorrect!!!!!
------
Find hosts who are not verified
R1 = select_{isverified=True} Hosts
Result = Hosts - R1
Result = select_{isverified=False} Hosts
-------------------------------------------------------------------
Normalization
- Definition of functional dependencies (fds)
- Inference rules for fds
- Closure of a set of fds, equivalence of two sets of fds
- Closure of a set of attributes
- Finding keys/superkeys/prime attributes given a set of fds
- Finding minimal basis
- Checking if a relation is in 3NF
- Checking if a relation is in BCNF
- Basic definitions of lossless and dependency preserving decompositions
- Finding projection of a set of fds to a decomposed relation
- Checking if a decomposition is lossless
- Checking if a decomposition is dependency preserving
- 3NF Decomposition
- BCNF Decomposition
- Basic idea of 4NF
--------------------------------------------
X->Y
If X->Y Y->Z then X->Z
If X->YZ then X->Y, X->Z
If X->Y the AX->AY
If X->Y, X->Z then X->YZ
If X subset of Y, then Y->X
Set F of fds, F+ closure of F
if F1 and F2 have F1+ = F2+, F1 and F2 are equivalent
Closure of a set of attributes X, X+={X}
Is X->Y implied by F (is X->Y in F+), check if Y is in X+.
Given a relation R and set of fds F, X is key if:
1. - X+ is all attributes (uniqueness)
2. - no subset of X is a key. (minimality)
Given a relation R and set of fds F, X is superkey if:
1. - X+ is all attributes (uniqueness)
(All keys are superkeys as well.)
Prime attribute: is an attribute in one key.
R(A,B,C,D,E,F,G) F={AC->BD, ACD->AEFG, BC->B, EFG->AC}
F={AC->BD, AC->EFG, EFG->AC}
Key: AC, EFG
AC+ = {A,C,B,D,E,F,G}
Prime attribues: A,C,E,F,G
BCNF: All non-trivial fds, should have a superkey on the left.
R(A,B,C,D) {AB->C} Key:ABD, not in BCNF because ABD is not a superkey
3NF: All non-trivial fds, should have a superkey on the left or all
prime attributes on the right.
If a relation is in BCNF, then it is in 3NF.
R(A,B,C,D) {AB->C, BC->A} Keys: ABD, BCD Prime attributes: all of them
Not in BCNF because AB (and BC) is not a superkey
But in 3NF because rhs is a prime attribute in both.
Lossless decomposition:
R-> R1, and R2 (decompose using projection), R1 * R2 = R
R(A,B,C,D) {AB->C}
R1(A,B,D)
R2(B,C)
R3(C,D)
(Chase algorithm)
A B C D
a b c1 d
a2 b c d2
a3 b3 c d
R(A,B,C,D) {AB->C}
R1(A,B,D)
R2(A,B,C)
A B C D
a b c1 d
a b c d2
Use AB->C
A B C D
a b c d <- no subscript, this is a lossless
a b c d2
Dependency Preserving Decomposition
-----------------------------------
R(A,B,C,D) F = {A->B, BC->D}
R1(A,B,D) F1={A->B} -- projection of fds
A+={A,B}
B+={B}
D+={D}
AB+={A,B}
AD+={A,D}
R2(A,B,C) F2 ={A->B}
A+={A,B}
B+={B}
C+={C}
AB+={A,B}
AC+={A,B,C,D}
BC+={B,C,D}
Is F1 union F2 equivalent to F
R(A,B,C,D) F = {A->B, BC->D}
R3(A,C,D) F3={AC->D}
A+={A,B}
C+={C}
D+={D}
AC+={A,B,C,D}
AD+={A,D}
CD+={C,D}
R4(A,B,C) F4={A->B}
F3 union F4 = {A->B, AC->D} equivalent F= {A->B, BC->D} ????
(already know everything in F3 union F4 implied by F)
Is everything in F implied by F3 union F4:
A->B true, already in F3 union F4
BC->D, BC+ (using F3 union F4): BC+ = {B,C}. Since D is not BC+, then BC->D is lost
This is not a dependency preserving decomposition.
---------------------------
3NF Decomposition (set of fds in minimal basis)
R(A,B,C,D,E,F) {AB->C, BC->A, AD->E} Key: ABDF, BCDF
R1(A,B,C) AB->C
-> remove R2(A,B,C) BC->A
R3(A,D,E) AD->E
R4(A,B,D,F) {}
R1(A,B,C) {AB->C, BC->A} Key: AB, BC
R2(A,D,E) {AD->E} Key: AD
R4(A,B,D,F) {} Key: ABDF
3NF decomposition: guaranteed to be lossless and dependency preserving
--------------------------------------------
BCNF Decomposition
R(A,B,C,D,E) {AB->C, C->D} Key: ABE
Pick AB->C AB+={A,B,C,D}
R1(A,B,C,D) {AB->C, C->D}, Key: AB not in BCNF, C->D violates it
R11(C,D) {C->D} Key: C, in BCNF
R12(A,B,C) {AB->C} Key: AB in BCNF
R2(A,B,E) {} Key: ABE in BCNF
BCNF decomposition: guaranteed to be lossless
--------------------------------------------
ER Diagrams
- Entities: basic rules (key/simple attributes)
- Relationships: basic rules (what to connect to)
- Participation constraints
- Ternary (or higher order) relationhips (including checking whether they can be decomposed further)
- Weak entities
- Hierarchies
- Converting basic ER diagrams to relational data model
- Converting models with weak entities to relational data model
- Converting models with hierarchies to relational data model
----------
-> Converting N to 1 to 1 or N to N to 1 ternary relationship in ER
diagram to a relation. Basically just when the relationship isn't 1
to 1 to 1 or N to N to N
-> In a ER diagram, if a weak entity A connects to a weak entity B
connects to a normal entity C, what is the key for A?
-> Can you go over creating a relational model from an ER diagram when
there are hierarchies?
-> What is a relationship/entity?
-> BCNF decomp
-> Complex Relational Algebra Queries. We are trying to express
Multiple Values between Two Entities (e.g., Multiple Comments
between Users and Posts), then must we always create this as a
Ternary Relationship, since using a Binary Relationship would limit
you to one (i.e., a Given User would only be able comment on a
Given Post once).
-> Minimality of keys
R4(A,B,C,D,E,F,G) F={AC->BD, ACD->AEFG, BC->B, EFG->AC}
-----------------------
3NF Decomposition vs BCNF Decomposition
bottom up top down
R(A,B,C,D,E)
AB->C
C->ABD
Key: ABE, CE (not in BCNF, or in 3NF)
------------
3NF Decomposition
R2(A,B,C,D) C->ABD, AB->C Key:C, AB
R3(A,B,E) Key: ABE
------------
BCNF Decomposition
R(A,B,C,D,E)
AB->C x
C->ABD x
Key: ABE, CE
----
AB->C
AB+={A,B,C,D}
R1(A,B,C,D) AB->C, C->ABD
R2(A,B,E) Nothing
---------------
R2(A,B,E)
Students(RIN,Advisor,Email)
Multiple advisors, multiple emails
RIN Advisor Email
1 X E1
1 Y E2
-------------
S1RIN, Advisor
RIN, Email
-----------------
F = {AC->D,AC->E,BE->F,AFG->B}
R1(A,B,C,F,G)
A+ = {A} B+ = {B} C+ = {C} F+ = {F} G+ = {G}
AC+ = {A,C,D,E}
AF+ =
AG
BC
BF
BG
CF
CG
FG
R2(A,B,C,D,E)
----------------------------
people(id, name, fatherid, motherid, dob)
Pair of ids of biological siblings:
r1[id1,name1,f1,m1, dob1] = people
r2[id2,name2,f2,m2, dob2] = people
r3 = r1 join_{id1<>id2 and f1=f2 and m1=m2} r2
Twin or higher....:
r3 = r1 join_{id1<>id2 and f1=f2 and m1=m2 and dob1=dob2} r2
r32 = project_{id1,id2} (r3) --> id1,id2 have the same mother,father and born on the same
r4[id4,name4,f4,m4, dob4] = people
r5 = r3 join_{f4=f2 and m4=m2 and dob4=dob2 and id4<>id1 and id4<>id2} r4
r6 = project_{id1,id2} (r5) ---> id1,id2 has a third sibling born on the same day
Result = r32 - r6 -->
id1,id2 have the same mother,father and born on the same
have no third sibling born on the same day
= Twins
----------------