SELECT R.B, S.E FROM R, S WHERE R.A=10 AND R.B <=200 AND S.C = R.D; Number of tuples expected in the output of this query? Tuples(R)=10,000 PAGES(R)=1,000 (Each page can have 10 tuples) Tuples(S)=20,000 PAGES(S)= 800 NDistinct(R.A)=500 NDistinct(R.D)=10,000 NDistinct(S.C)=12,000 MinVal(R.B)=1 MinVal(R.B)=1000 SEL(R.A=10) = 1/NDISTINCT(R.A) SEL(R.B <=200) = 200-minval(R.B)/(maxval(R.B)-minVal(R.B) SEL(S.C = R.D) = 1/max(NDISTINCT(S.C), NDISTINCT(R.D)) TUPLES(R)*TUPLES(S)* SEL(R.A=10)*SEL(R.B <=200)*SEL(S.C = R.D) 10,000*20,000*1/500*200/1000*1/12,000 ------------- Query Plan for: Index scan on I1 on R(B,A) with 500 leaf nodes and 3 level Join the result of this with S using block nested loop join with M=30 Index scan using: R.A=10 AND R.B <=200 Cost of the index scan (find all tuples satisfying R.A=10 AND R.B <=200) Scan the index for R.B<=200 (200/1000, 1/5) 500/5=100 leaf nodes + 2 Read 4 tuples to find R.D Index cost = 2 + 100 + 4 = 106 --> Which condition am I going to scan the index by? --> How many leaf nodes? ---> How many matching tuples am I going to find after scanning? ---> Do I need to read these tuples from data pages due to missing attributes needed for the query? -------------- # tuples in the output and how many disk pages they take? TUPLES(R)*SEL(R.A=10)*SEL(R.B <=200) = 10,000 * 1/500*200/1000 = 4 tuples (1 page) Due to pipelining, there is no additional cost of reading R (it is already in memory after index scan). How many times do I have to read S? Since R after selection fits in 1 block, I only have to read S once! Total cost = 106 (for index scan) + 800 (PAGES(S)) ------------------------ R1 = (SELECT R) Sequential scan: cost = PAGES(R) = 1,000 R2 = (SELECT R) join S 2*500 R3 = SORT (R2) 4000 (write 40 sorted groups) + 4000 (read/merge/output) ------------------------ Cost of joining R1 = 200 pages and S= 500 pages using M=101 R1 once, S two times Cost = 200 + 2*500 ------------------------ Sorting a relation of 4,000 pages M = 100 4000 + 4000 ( 40 sorted groups) 4000 and output Cost of sort = 3*4000 ------------ A. Which condition am I going to scan the index by? B. How many matching tuples am I going to find after scanning? C. Do I need to read these tuples from data pages due to missing attributes needed for the query? Index scan examples: Q1: SELECT B,E FROM R WHERE B<=200 AND B>=10 AND C=10 AND A=5 Q2: SELECT C,D FROM R WHERE C<10 AND B=10 Index I1 on R(A,B,C,D) Scan for Q1: A. scan for A and B conditions B. find matching for A,B,C conditions C. yes, I need to read for the E attribute Scan for Q2: A. scan the whole leaf B. find matching B,C conditions C. no Index I2 on R(C,A,E) Scan for Q1: A. Scan for A,C conditions B. Find tuples matching A,C conditions C. Yes, to check the B condition Scan for Q2: A. Scan for C condition and read matching tuples from Data pages Q1: SELECT B,E FROM R WHERE B<=200 AND B>=10 AND C=10 AND A=5 Q2: SELECT C,D FROM R WHERE C<10 AND B=10 Index I3 on R(C,D,A) Scan for Q1: A. Scan for C=10 B. Find matching tuples for A,C attributes C. Yes, for B condition and attribute E Scan for Q2: A. Scan for C and read matching tuples ----------- R(A,B,C,D,E,F,G) {AB->CD, D->EG, G->A} Key: ABF, GBF In BCNF? AB->CD No, AB is not a superkey D->EG D is not a superkey G->A G is not a superkey In 3NF? AB->CD No, AB is not a superkey and CD are not prime ----------- 3NF decomposition {AB->CD, D->EG, G->A} ABCD AB->CD Key: AB DEG D->EG Key: D AG G->A Key: G ABF Key: ABF AB=>F ------------------ BCNF decomposition R(A,B,C,D,E,F,G) {AB->CD, D->EG, G->A, AB=>F} Key: ABF, GBF In BCNF? AB->CD No, AB is not a superkey D->EG D is not a superkey G->A G is not a superkey ---- D->EG D+ = {D,E,G,A} D->EGA R1(A,D,E,G) D->EG, G->A, in BCNF? Key: D, G->A violates BCNF R2(B,C,D,F) BD->C Key: BDF, BD is not a superkey, not IN BCNF R1(A,D,E,G) D->EG, G->A, in BCNF? Key: D, G->A violates BCNF R11(A,G) G->A R12(D,E,G) D->EG R2(B,C,D,F) BD->C Key: BDF, BD is not a superkey, not IN BCNF R21(B,C,D) BD->C R22(B,D,F) {} The result: AG DEG BCD BDF --------- 4NF decomposition ABFH AB=>F (ABF) (ABH) ---------------- People: id, name Students: class Staff: rank Converting to relational data model: Model 1: People(id, name, class, rank, isstudent, isstaff) Model 2: People(id, name) Student(id, class) Staff(id, rank) To query students: People join Student To query staff: People join Staff create table people .... create table students under people ... create table staff under people ... Model 3: Student(id, name, class) Staff(id, name, rank) --------- Weak entities AwardType({Type,Name}) AwardYear({AwardType,AwwardName,Year}) If awardyear was a strong entity AwardYear({id}, year,awardtype,awardname) --------------------- Which of the following schedule and sequence of locks are possible under two phase locking (Note: sl, xl, ul stand for shared lock, xlock and unlock of a data item. Assume that xl works like an upgrade lock.)? sl1(x) sl2(x) sl2(y) r1(x) r2(x) r2(y) ul2(y) ul2(x) xl1(x) xl1(y) w1(x) w1(y) [not possible! xl2(z)] w2(z) sl2(x) sl2(y) sl1(x) r1(x) r2(x) r2(y) ul2(x) ul2(y) xl1(x) w1(x) [not possible! w1(y)] xl1(y) xl2(z) w2(z) T1: xl x T2: sl2(x) sl2(y) sl1(x) r1(x) r2(x) r2(y) ul2(x) ul2(y) xl1(x) [not possible! xl2(z)] w1(x) w1(y) xl1(y) w2(z) T1: xl x T2: sl2(x) sl2(y) sl1(x) r1(x) r2(x) r2(y) xl2(z) ul2(x) ul2(y) [not possible! w1(x)] w1(y) xl1(y) w2(z) T1: sl x T2: xl z sl1(x) sl2(x) r1(x) r2(x) sl2(y) r2(y) xl2(z) ul2(y) ul2(x) xl1(x) xl1(y) w1(x) w1(y) w2(z) Possible! T1: xl x xl y T2: xl z xl1(x) sl2(x) r1(x) r2(x) sl2(y) r2(y) xl2(z) ul2(y) xl1(y) w1(x) w1(y) w2(z) ------------------- Question 3. For the following schedule, check all that is correct: r1(x) r3(y) w3(y) r1(t) w1(x) r3(z) r3(z) r2(w) w2(y) w2(z) Conflicts: r3(y) w2(y) w3(y) w2(y) r3(z) w2(z) T3->T2 T1 T3 T2 T3 T1 T2 T3 T2 T1 -------------- books(isbn13, title, published_date) authors(isbn13, author) For each author, return the first book they have written >> such that for this author and book, there is no earlier book by the same author select a.author , a.isbn13 , b.published_date from authors a , books b where a.isbn13 = b.isbn13 and not exists (a book by the same author written earlier, i.e. before b.published_date) select a.author , a.isbn13 , b.published_date from authors a , books b where a.isbn13 = b.isbn13 and not exists (SELECT * from authors a2, books b2 WHERE a2.author = a.author -- same author and a2.isbn13 <> a.isbn13 --- different book and a2.isbn13 = b2.isbn13 and b2.published_date < b.published_date -- written earlier )