Data Model: MarvelHeroes(hid, hero, realname) DCHeroes(hid, hero, realname) Movies(mid, moviename, year) HeroInMovie(hid, mid) Relational Algebra ---------------------- Set Operations: R Union S R Intersection S R Set Difference S Rename: Given R(A,B,C) T(A1, B1, C1) = R SELECTION SELECT_C R = {t in R such that t satisfies the boolean condition C} PROJECTION PROJECT_{A1,...AN} (R) = {t in R but only the values for the attributes A1,...,AN} CARTESIAN PRODUCT Assume R and S have no common attributes R x S = { t such that t has all the attributes in R and all the attributes in S, such that there is a tuple r in R and a tuple s in S where t is equal to r for attributes in R and to s for attributes in S } R(A,B) = {, } S(C,D) = {, , } T = R x S T(A,B,C,D) = {, , , , , } T1(hid1, hero, realname) = Marvel_Heroes HID1 HERO REALNAME ------------------------------------ m1 Spiderman Peter Parker m2 Black Panther T’Challa m3 Thor Thor m4 Jessica Jones Jessica Jones m5 Iron Man Tony Stark HeroInMovie hid Mid ------------- d1 1 m1 2 d3 3 d4 4 d5 5 m2 6 m3 7 m3 8 m3 9 m5 9 T2 = HeroInMovie x T1 hid Mid HID1 HERO REALNAME ------------------------------------------------- d1 1 m1 Spiderman Peter Parker m1 2 m1 Spiderman Peter Parker d3 3 m1 Spiderman Peter Parker d4 4 m1 Spiderman Peter Parker d5 5 m1 Spiderman Peter Parker m2 6 m1 Spiderman Peter Parker m3 7 m1 Spiderman Peter Parker m3 8 m1 Spiderman Peter Parker m3 9 m1 Spiderman Peter Parker m5 9 m1 Spiderman Peter Parker d1 1 m2 Black Panther T’Challa m1 2 m2 Black Panther T’Challa d3 3 m2 Black Panther T’Challa d4 4 m2 Black Panther T’Challa d5 5 m2 Black Panther T’Challa m2 6 m2 Black Panther T’Challa m3 7 m2 Black Panther T’Challa m3 8 m2 Black Panther T’Challa m3 9 m2 Black Panther T’Challa m5 9 m2 Black Panther T’Challa d1 1 m3 Thor Thor m1 2 m3 Thor Thor d3 3 m3 Thor Thor d4 4 m3 Thor Thor d5 5 m3 Thor Thor m2 6 m3 Thor Thor m3 7 m3 Thor Thor m3 8 m3 Thor Thor m3 9 m3 Thor Thor m5 9 M3 Thor Thor d1 1 m4 Jessica Jones Jessica Jones m1 2 m4 Jessica Jones Jessica Jones d3 3 m4 Jessica Jones Jessica Jones d4 4 m4 Jessica Jones Jessica Jones d5 5 m4 Jessica Jones Jessica Jones m2 6 m4 Jessica Jones Jessica Jones m3 7 m4 Jessica Jones Jessica Jones m3 8 m4 Jessica Jones Jessica Jones m3 9 m4 Jessica Jones Jessica Jones m5 9 m4 Jessica Jones Jessica Jones d1 1 m5 Iron Man Tony Stark m1 2 m5 Iron Man Tony Stark d3 3 m5 Iron Man Tony Stark d4 4 m5 Iron Man Tony Stark d5 5 m5 Iron Man Tony Stark m2 6 m5 Iron Man Tony Stark m3 7 m5 Iron Man Tony Stark m3 8 m5 Iron Man Tony Stark m3 9 m5 Iron Man Tony Stark m5 9 m5 Iron Man Tony Stark Name of all marvel heroes who have been in a movie in the database! T3 = SELECT_{hid=hid1} ( HeroInMovie x T1 ) PROJECT_{HERO} (T3) hid Mid HID1 HERO REALNAME ------------------------------------------------- m1 2 m1 Spiderman Peter Parker m2 6 m2 Black Panther T’Challa m3 7 m3 Thor Thor m3 8 m3 Thor Thor m3 9 m3 Thor Thor m5 9 m5 Iron Man Tony Stark PROJECT_{HERO} (T3) Spiderman Black Panther Thor Iron Man ------------- ID all movies who featured a DC comic hero T1(hid1, hero, realname) = DC_Heroes T2 = T1 x HeroInMovie All movies that feature a DC comic hero T3 = SELECT_{hid1 = hid} (T2) T3 schema? hid1, hero, realname, hid, mid Results = PROJECT_{mid} (T3) ---------------------------------------------- Data Model: MarvelHeroes(hid, hero, realname) DCHeroes(hid, hero, realname) Movies(mid, moviename, year) HeroInMovie(hid, mid) ---------------------------------------------- - Find the name of the hero for each movie made in year 2017 Heroes = MarvelHeroes union DCHeroes T1(hid1, mid1) = HeroInMovie T2 = select_{year=2017} (Movies) Result = project_{hero, moviename} (select_{mid=mid1 and hid1=hid} (T2 x T1 x Heroes)) - Return the id movies that star more than one hero. ## find two tuples for the same movie but different hero T1(hid1, mid1) = HeroInMovie T2 = project_{mid1} (select_{mid=mid1 and hid<>hid1} (HeroInMovie x T1)) Result = project_{moviename} (select_{mid1= mid} (T2 x Movies )) ------------------------------ THETA JOIN (any theta condition C) R Join_{C} S = {all tuples in R x S that satisfy the join condition C} A join condition is a condition that refers to comparisons between attributes of R and attributes of S Example: - Find the name of the hero for each movie made in year 2017 Heroes = MarvelHeroes union DCHeroes T1(hid1, mid1) = HeroInMovie T2 = select_{year=2017} (Movies) T3 = ( T2 join_{mid=mid1} T1 ) join_{hid1=hid} Heroes Result = project_{hero, moviename} T3 - Return the id movies that star more than one hero. ## find two tuples for the same movie but different hero T1(hid1, mid1) = HeroInMovie T2 = T1 join_{mid1=mid and hid1<>hid} HeroInMovie Result = project_{moviename} ((project_{mid1} T2) join_{mid1=mid} Movies) ----------------------- NATURAL JOIN (equality condition of the same attribute) R * S = {set of all tuples in RxS that have the same value for the same attribute, return only one attribute of the same name} Summary of operators: R union S R intersect S R - S T(A1, B1, C1) = R select_{C} (R) project_{A1,...,An} (R) R x S |><| == natural join |><|_C == theta join Properties of relational algebra operations ------------------------------------------- Selection is commutative: SELECT_{C}(SELECT_{D}(R)) = SELECT_{D}(SELECT_{C}(R)) Projection can be refined: PROJECT_{p}(PROJECT_{p, q}(R)) Selection and projection are commutative: PROJECT_{p}(SELECT_{C}(R)) = SELECT_{C}(PROJECT_{p}(R)) Union and cross product are commutative and associative. Theta join and natural join is commutative: R JOIN_{C} S = S JOIN_{C} R R * S = S * R Join is associative For any database relations R, S, and T such that (1) R and S have at least one common attribute, (2) S and T have at least one common attribute, and (3) no attribute is common to R, S, and T, we have: (R * S) * T = R * (S * T) Selection pushing: For any database relations R and S, any predicate C, and any operator # (union, intersection, or set difference), we have: SELECT_{C}(R # S) = SELECT_{C} (R) # SELECT_{C} (S) For any database relations R and S, any predicate C that depends only on attributes of R, and any operator @ (Cartesian product or join), we have: SELECT_{C}(R @ S) = SELECT_{C} (R) # S Projection pushing: PROJECT_{C} (R union S) = PROJECT_{C}(R) union PROJECT_{C}(S) Distributivity of Projection over Join For any database relations R and S with associated sets of attributes r and s, respectively, and any sets of attributes a, b, and c such that a is a subset of r union s, b = (a intersection r) union d, and c = (a intersection s) union d where d = r intersection s, we have: PROJECT_{a} (R * S) = PROJECT_{a} (PROJECT_{b} (R) * PROJECT_{c} (S)) MarvelHeroes(hid, hero, realname) DCHeroes(hid, hero, realname) Movies(mid, moviename, year) HeroInMovie(hid, mid) Exercises --------- T = Movies * HeroInMovie T(mid, moviename, year, hid) T1 = Movies * HeroInMovie * MarvelHeroes T(mid, moviename, year, hid, hero, realname) - Find the name of the hero for each movie made in year 2017 Heroes = MarvelHeroes union DCHeroes T1 = select_{year=2017} (Movies) T2 = project_{hero, moviename} (T1 * HeroInMovie * Heroes) - Find the name of all DC Heroes who starred in the movie 'Justice League' project_{hero} (select_{moviename = 'Justice League'} (DCHeroes * HeroInMovie * Movies)) ----------- MarvelHeroes * DCHeroes = MarvelHeroes intersect DCHeroes ---------------------------------- 1. Find id of Marvel heroes who have starred in a movie 2. Find id of all heroes who have starred in a movie 3. Find all DC comic heroes who NEVER starred in a movie For Queries 1, 2, 3: Return the hero and real name of the hero ----------------- 3. Find all DC comic heroes who NEVER starred in a movie T1 = project_{hid} (DC_Heroes) ## all DC heroes T2 = project_{hid} (HeroInMovie) ## heroes who starred in a movie T3(hid1) = T1 - T2 T4 = T3 x DC_Heroes ##(hid1, hid, hero, realname) Result = project_{hero, realname} ( select_{hid1 = hid} (T4) ) 4. Find id of all movies that starred a DC comic hero T1(hid1, mid1) = HeroInMovie Result = project_{mid} (select_{hid=hid1 and mid1=mid} (DCHeroes x Movies x T1)) 4alt. Find name, year of all movies that starred a DC comic hero T1(hid1, mid1) = HeroInMovie Result = project_{moviename, year} (select_{hid=hid1 and mid1=mid} (DCHeroes x Movies x T1)) 5. Find the id of all movies made after 2016 which star Marvel Heroes T1(mid1) = project_{mid} (select_{year > 2016} (Movies)) ##movies made after 2016 T2(hid1) = project_{hid} (MarvelHeroes) ##all Marvel heroes T3 = select_{mid1= mid} (T1 x HeroInMovie) ##all heroes in a movie T4 = select_{hid=hid1} ( T3 x T2 ) ### T4(mid1, hid, mid, hid1) Results = project_{mid} (T4) ---- 6. Find the id of all movies that does "not" star Thor T1(hid1, mid) = HeroInMovie T2 = select_{hero='Thor'} (MarvelHeroes) T3 = select_{hid1=hid} (T2 x T1) T4 = project_{mid} (T3) ###id of all movies with Thor Result = project_{mid} (Movies} - T4 ## all movies without Thor, includes movies with no superheroes Result1 = project_{mid} (HeroInMovie} - T4 ## all movies with superheroes but not Thor - Review of theta join/natural join - Example queries 7. Find name of Marvel heroes who have starred in a movie T1(hid1, mid1) = HeroInMovie Result = project_{hero} (select_{hid=hid1} (MarvelHeroes x T1)) T1(hid1, mid1) = HeroInMovie Result = project_{hero} (MarvelHeroes join_{hid=hid1} T1)) T1 = MarvelHeroes * HeroInMovie ##T1(hid, hero, realname, mid) Result = project_{hero} (T1) |><| == join 8. Find name of all heroes who did not star in a movie T1 = project_{hid} (HeroInMovie) ##all heroes who starred in a movie T2 = project_{hid} (MarvelHeroes union DCHeroes) ## all heroes in the database Result = project_{hero} ((T2-T1) * (MarvelHeroes union DCHeroes)) T1 = project_{hid} (HeroInMovie) ##all heroes who starred in a movie T2 = MarvelHeroes union DCHeroes ## all heroes in the database Result = project_{hero} ( project_{hid}(T2) - T1 ) * T2 T1 = project_{hid} (HeroInMovie) ##all heroes who starred in a movie T2 = MarvelHeroes union DCHeroes ## all heroes in the database Result = project_{hero} (T2 - (T2*T1)) 9. Find the name of all movies made after 2016 that stars a Marvel Hero Result = project_{moviename} (select_{year>2016} ( (MarvelHeroes * HeroInMovie * Movies) ) Result = project_{moviename} (select_{year>2016} ( (MarvelHeroes * Movies * HeroInMovie) ) Result = project_{moviename} ( ( (MarvelHeroes * HeroInMovie * select_{year>2016} (Movies)) ) 10. What is the result of this? T1 = Movies * MarvelHeroes = Marvel x MarvelHeroes Result = T1 * HeroInMovie 11. Find the title of a pair of different movies that were made in the same year and star the same hero ## two heroinmovie tuples with different movie id but same hero T1(hid1, mid1) = HeroInMovie T2(hid2, mid2) = HeroInMovie T3(mid3, moviename3, year3) = Movies T4 = T1 join_{hid1=hid2 and mid1<>mid2} T2 ## hid1,mid1,hid2,mid2 T5 = T4 join_{mid1=mid} Movies T6 = (T5 join_{mid2=mid3 and year=year3} T3) Result = project_{moviename,moviename3} (T6)