Lecture 4: Announcements: ----------- - Hw 1 is out - Office hours have started. The only in person office hour after class today: 4pm-6pm JRowl 1C08 - Lecture 4 exercise after class -------------------------------- Normalization ------------------------ Given a relation R, a functional dependency (fd) is an expression of the form: X -> Y where X and Y are sets of attributes from R. If X -> Y, then whenever you have two tuples in R with the same values for attributes in X, then they must have the same values for attributes in Y. TVShows(Showname, StartYear, EndYear, Actor, Role, DateOfBirth, JoinedCast, LeftCast) Showname -> StartYear EndYear Actor -> DateOfBirth ShowName Actor Role -> JoinedCast LeftCast Student(RIN, FirstName, LastName, GPA, Major, ClassYear, Address, RPIEmail, DoB) RIN -> FirstName LastName GPA ClassYear Address DoB RPIEmail RIN -> RIN RIN -> RIN FirstName Book(ISBN, Title, Author, Publisher, Edition) ISBN -> Title Edition Publisher StudentPhones(RIN, PhoneNumber, IsPreferred) Can two people have the same number? Yes. There can be only preferred phone number RIN PhoneNumber -> IsPreferred Class(CRN, CourseCode, SectionNo, Semester, Year, Classroom, Day, Time, Professor, Textbook) CRN -> CourseCode SectionNo Semester Year CourseCode SectionNo Semester Year -> CRN CRN Day Time -> Classroom CourseCode SectionNo Semester Year Day Time -> Classroom Inference Rules: -------------------------- Given a relation R and a set of F of fds, the following are inference rules: 1. Trivial 2. Combining rule 3. Decomposition 4. Transitivity 5. Augmentation Given: CourseCode SectionNo Semester Year -> CRN CRN Day Time -> Classroom Augmentation: CourseCode SectionNo Semester Year Day Time -> CRN Day Time Transitivity: CourseCode SectionNo Semester Year Day Time -> Classroom The closure of a set of functional dependencies ------------------------------------------------- Given a relation R and a set of F of fds, the closure of F given by F+ is the set of all functional dependencies implied by F. The closure of a set of attributes X -------------------------------------- Given a relation R and a set of F of fds, the closure of a set X of attributes, given by X+ is the set of all atttributes Y that are implied by X. To compute of X+: - Initialize X+ = {X} - Repeat until X+ does not change: Find Y->Z such that Y is a subseteq of X+, then add Z to X+ Then, we have that X-> X+ is implied by F (or it is in F+) ------------ R(A,B,C,D,E,F,G) F = {AB->C, C->D, BC->E, BE->F} AB+ = {A,B} AB->C , AB+ = {A,B,C) C->D, AB+ = {A,B,C,D} BC->E, AB+ = {A,B,C,D,E} BE->F, AB+ = {A,B,C,D,E,F} -------------- SUPERKEY: Given a relation R and a set of F of fds, X is a superkey if X+ is the set of all attributes in R. A Key is a minimal set of attributes that imply all the attributes in R. Every key is a superkey by definition. ----------- R(A,B,C,D,E,F,G) F1 = {AB->CD, AB->EFG} A+={A} B+={B} BC+ = {B,C} AB+ = {A,B,C,D,E,F,G} AB is a key ABC+ = {A,B,C,D,E,F,G} ABC is a superkey R1(A,B,C,D,E,F,G) F1 = {AB->C, CD->EF, F->G} AB+ = {A,B,C} ABD+ = {A,B,C,D,E,F,G} Key: ABD R2(A,B,C,D,E,F,G) F1 = {AB->C, CD->EF, CF-> AG} Keys: ABD ABD+ = {A,B,C,D,E,F,G} BCD+ = {B,C,D,E,F,A,G} R3(A,B,C,D,E,F,G) F1 = {AB->C, CD->EF, CF-> A} Keys: ABDG, BCDG R4(A,B,C,D,E,F,G) F1 = {AB->D, DE->G, DF-> A} TVShows(Showname, StartYear, EndYear, Actor, Role, DateOfBirth, JoinedCast, LeftCast) Showname -> StartYear EndYear Actor -> DateOfBirth ShowName Actor Role -> JoinedCast LeftCast Key: ShowName Actor Role Student(RIN, FirstName, LastName, GPA, Major, ClassYear, Address, RPIEmail, DoB) RIN -> FirstName LastName GPA ClassYear Address DoB RPIEmail RIN -> RIN RIN -> RIN FirstName Key: RIN Major ----------------------- Why do I care? TVShows(Showname, StartYear, EndYear, Actor, Role, DateOfBirth, JoinedCast, LeftCast) Showname -> StartYear EndYear Actor -> DateOfBirth ShowName Actor Role -> JoinedCast LeftCast Key: ShowName Actor Role Bad bad relation! 1. Information is repeated: startyear/endyear is repeated for each actor or role or repeat Actor dateofbirth for every role they play 2. Complicatd programming: To maintain data consistency, I need to make a single change in many places 3. Coupled information unnecessarily: I cannot enter information about a show until I have some actors cast in it If I delete a show, I may loose information about actors that was in that show only. ---------- Boyce Codd Normal Form (BCNF) ------------------------------- Given a relation R and a set of F of fds. R is in BCNF iff and only one of the following is true for every functional dependency X->Y in F: a) either X->Y is trivial (Y is a subseteq of X), b) or X is a superkey of R. ------------------------------------ R(A,B,C,D,E,F,G) F1 = {A->A, AB->CD, ABC->EFG, BC->C} Key: AB Is R in BCNF? Yes, R is in BCNF. A->A ok, trivial AB->CD, ok, AB is a superkey ABC->EFG, ok, ABC is a superkey BC->C, ok, trivial ------------------------------------ R3(A,B,C,D,E,F,G) F1 = {AB->C, CD->EF, CF-> A} Keys: ABDG, BCDG In BCNF? AB->C, not trivial, AB is not a superkey, not in BCNF ------------------------------------ TVShows(Showname, StartYear, EndYear, Actor, Role, DateOfBirth, JoinedCast, LeftCast) Showname -> StartYear EndYear (violates BCNF, showname is not a superkey) Actor -> DateOfBirth (violates BCNF, actor is not a superkey) ShowName Actor Role -> JoinedCast LeftCast Key: ShowName Actor Role Is TVShows in BCNF? Not in BCNF StudentPhones(RIN, PhoneNumber, IsPreferred) RIN PhoneNumber -> IsPreferred Key: RIN PhoneNumber IN BCNF -------------------------------- Prime attribute: ---------------- If an attribute is in a key (any key), it is called a prime attribute. R5(A,B,C,D) F = {AB->CD, D->A} Key: AB, BD In BCNF? AB->CD , AB is a superkey, OK D->A , D is not a superkey. Not in BCNF. Prime attributes: A,B,D Third Normal Form (3NF): ---------------------- Given a relation R and a set of F of fds. R is in 3NF iff and only one of the following is true for every functional dependency X->Y in F: a) either X->Y is trivial (Y is a subseteq of X), b) or X is a superkey of R, c) all of Y are prime attributes. R5(A,B,C,D) F = {AB->CD, BC->B, D->A} Key: AB, BD In 3NF? Yes. AB->CD , AB is a superkey, OK BC->B , trivial, ok D->A , D is not a superkey, but A is a prime attribute. OK.