Lecture 2 Relational Data Model --------------------- Data model: type of data, rules about the data (Database Schema) Data: actual information/facts satisfying data model Database = Database Schema: fixed model + Database Instance: changing facts, what is true at the data at the moment + Application Logic + Constraints/rules Database: Relational Data Model Database: a set of relations. A relational database: A set of relations: A relation: A class of objects we want to store information about A relation instance contains a set of tuples: Each tuple is an object of this class A tuple: a set of attributes and a value for each attribute Each attribute comes from a domain: set of valid values for this attribute integer, string, boolean, date, money, timestamp, 9 digit number Data model: For each relation: attributes and their domains (types) Data instance: For each relation: a set of tuples where each attribute has a value from its domain Relation: Books Tuple: A single book Attributes: Title, Author, Price, Publisher, ISBN Relation: Students in DBS Tuple: A student in DBS class right now Attributes: RIN, SSN, FirstName, LastName, Major, Class, Email Relational data model: Rule 1: Attribute domains are simple values! OK: integer, string Not OK: set of values Rule 2: Relations contain "set" of tuples, and each tuple contains a "set" of attributes. Model: Book(Title, Author) Book('War and Peace', 'Leo Tolstoy') Book('1984', 'George Orwell') Book('Three comrades', 'Erich Maria Remarque') Logical Physical Relations Table Tuple Row Attribute Column R1 = {t1, t2, t3} R2 = {t1, t2, t2, t3, t3} R1 and R2 are the same relations! Key of a relation: ------------------ A set of attributes is a key for a relation if no two tuples in the relation will have the same value for these attributes and no subset of these attributes is a key (minimal set of attributes). A relation can have more than one key. Students: RIN is a key SSN is a key Email is a key Is First Name and Last Name a key? No, two students can have same first and last name RIN, First Name, Last Name a key? No, not minimal. RIN alone is a key. All relations must have a key. ----------------------------------------------------- Book(Title, Author) Is Title a key? Let's assume book can only have a single title. If title is a key, then I cannot store multiple authors. I want to store such books, so no title is not a key. Is Author a key? If I want to store two separate books by the same author, then no author is not a key. Title, Author together is a key. Other attributes to include: ISBN ----------------------------------------------------- Check all the keys for the following relation for managing your own life and store your passwords (like keychain on Macs). Given a media site, you can have multiple accounts, you can use the same password in two different sites, you can create two accounts on the same date and you can use the same username in two different sites. OnlineAccounts(site, username, encrypted_password, creation_date) Key is minimal: no subset of the set of attributes is also a key site username encrypted_password creation_date a K 0xf0 1/1/2015 a K2 0xf0 1/1/2016 b K 0xf1 1/1/2020 a S 0xf9 12/30/2020 Not keys: username site Not necessarily minimal keys (superkeys): { username, site } { username, site, encrypted_password } { username, site, creation_date } { username, site, encrypted_password, creation_date } { username, site, creation_date, encrypted_password } Candidate key: a minimal key (might be more than one) from superkeys Primary key: one of candidate keys Relational Algebra ---------------------- Basic Definitions: Input: set of tuples, a relation, or two relations Output: set of tuples Set Compatibility: Two relations are set compatible, if they have the same schema: the same set of attributes Set compatible operations. Same schema! R1(r1:int, r2:float) R2(r1:int, r2:datetime) R3(r2:float, r1:int) Set Operations: R Union S R Intersection S R Set Difference S are defined for two relations R and S, only if R and S are set compatible. R Union S = { tuples t such t is either in relation R or in relation S } R Intersection S = { tuples t such that t is in both relation R and S } R Set Difference S = { tuples t such that t is in R but not in S } Interesting facts: R ∩ S = R - (R - S) R ∩ S = (R U S) - ((R - S) U (S - R)) A − B = A ∩ B' De Morgan's: (A ∩ B)' = A' U B' (A U B)' = A' ∩ B' Data Model: Marvel_Heroes(hid, hero, realname) ___ DC_Heroes(hid, hero, realname) ___ Movies(mid, moviename, year) ___ HeroInMovie(hid, mid) ___ ___ -- Find all heroes in either Marvel or DC Universe T = Marvel_Heroes union DC_Heroes -- Find all heroes that are both in Marvel and DC Universes T = Marvel_Heroes intersect DC_Heroes -- Find all heroes that in the Marvel Universe but not in the DC Universe T = Marvel_Heroes - DC_Heroes --------------- RIN is a key. Assume that a student can be both grad and undergrad. Ugrad(RIN, Name) Grad(RIN, Name) Find students who are coterm, but not have graduated with their BS Ugrad intersect Grad Find undergraduate students who are not coterm Ugrad - Grad Find graduate students who are not also undergraduates Grad - Ugrad Find all students in the database Grad union Ugrad More Relational Algebra ------------------------ Rename ====== T(A1, B1, C1) = R T1 = MarvelHeroes T1 has the same schema as MarvelHeroes T2(hid1, hero1, realname1) = T1 T3(hid1, hero, realname) = T1 The number of attributes and the number of tuples does not change. Ugrad(RIN, Name) Grad(GradRIN, FullName) Ugrad and Grad are not set compatible. T1(RIN, Name) = Grad T2 = T1 union Ugrad ---------------- Selection ========= select_{C}(R) [[slice operation]] select_C R = { t in R such that t satisfies the Boolean condition C } The result of selection has the same schema as R, but has only those tuples that satisfy the Boolean condition Movies mid moviename year -------------------------------------------- 1 Superman: Man of Tomorrow 2020 2 Spiderman: Homecoming 2017 3 Wonder Woman 2017 4 Deadpool 2016 5 Green Lantern 2011 6 Black Panther 2018 7 Thor 2011 8 Thor: The Dark World 2013 9 Avengers 2012 -- Find all movies that are made after year 2011. T1 = select_{year > 2011}(Movies) T1 mid moviename year -------------------------------------------- 1 Superman: Man of Tomorrow 2020 2 Spiderman: Homecoming 2017 3 Wonder Woman 2017 4 Deadpool 2016 6 Black Panther 2018 8 Thor: The Dark World 2013 9 Avengers 2012 -- Find all movies with Thor in its title T2 = select_{moviename like '%Thor%'}(Movies) T2 mid moviename year -------------------------------------------- 7 Thor 2011 8 Thor: The Dark World 2013 -- Find movies made after 2011 with title that starts with Thor T3 = select_{moviename like 'Thor%' AND year > 2011}(Movies) mid moviename year -------------------------------------------- 8 Thor: The Dark World 2013 Projection ========== project_{A1,...,An}(R) project_A1,...AN R = { t in R but only the values for the attributes A1,...,AN } where A1,...,AN are attributes in R T4 = project_{mid, moviename}(Movies) T4 mid moviename --------------------------------- 1 Superman: Man of Tomorrow 2 Spiderman: Homecoming 3 Wonder Woman 4 Deadpool 5 Green Lantern 6 Black Panther 7 Thor 8 Thor: The Dark World 9 Avengers -- All years a movie was made in! T5 = project_{year}(Movies) [[A relation is a set of tuples]] year ---- 2020 2017 2016 2018 2011 2013 2012 -- Find all heroes who have not been featured in a movie -- T1: all heroes who have been featured in a movie T1 = project_{hid}(HeroInMovie) -- T2: all heroes in my database T2 = Marvel_Heroes UNION DC_Heroes -- T2 is not set compatible with T1 T3 = project_{hid}(T2) -- All heroes who have not been in a movie Result = T3 - T1 Alternative form: Result = project_{hid}(Marvel_Heroes UNION DC_Heroes) - project_{hid}(HeroInMovie) -- Find id of Marvel heroes who have starred in a movie T1 = project_{hid}(Marvel_Heroes) -- all Marvel heroes T2 = project_{hid}(HeroInMovie) -- heroes who starred in a movie Result = T1 intersect T2 -- Find id of all heroes who have starred in a movie T1 = project_{hid}(Marvel_Heroes) -- all Marvel heroes T2 = project_{hid}(DC_Heroes) -- all DC heroes T3 = T1 union T2 T4 = project_{hid}(HeroInMovie) -- heroes who starred in a movie Result = T3 intersect T4 Actually, I don't need to union after all. This one works too: Result = project_{hid}(HeroInMovie) -- heroes who starred in a movie --- Find all DC comic heroes who NEVER starred in a movie T1 = project_{hid}(DC_Heroes) -- all DC heroes T2 = project_{hid}(HeroInMovie) -- heroes who starred in a movie Result = T1 - T2