Announcements: --------------------------- Hw8, given out on thursday, due next thursday Lecture exercise: 23, 24, 25 Today: Size Estimation + Serializability Thursday: TwoPhase locking + Concurrency control + Durability Next monday: Durability Next thursday: Tuning (no prerecorded lectures) + overview [start lecture 2:30] Final exam: in the final exam week [comprehensive] --------------------------------------------------------------- --------------------------- Example: Tuples(R)= 8,000 --students Tuples(S)= 150,000 --transcript Attribute N_DISTINCT Minval Maxval R.A 8,000 298382370 23129803912 R.B 6 A E R.C 10 17 44 R.D 700 198382370 3129803912 S.A 7,800 298382370 23129803912 S.E 200 dsljdslkaj werpewurpowe Q1: SELECT * FROM R WHERE A=340238404 ; Sel(Q1) = 1/8000 Exp(Q1) = 1 Q2: SELECT * FROM R WHERE B='B'; Sel(Q2) = 1/6 Exp(Q2) = 8000/6 = 1333 Q3: SELECT * FROM R WHERE B<>'B'; Sel(Q2) = 1-1/6 = 5/6 Exp(Q2) = 8000*5/6 = 6666 Q4: SELECT * FROM R WHERE C>= 20 AND C<=24 ; Sel(Q4)= (24-19)/(44-16)=0.18 Exp(Q4)= 1429 Q5: SELECT * FROM R WHERE C>= 20 AND C<=24 AND B='B'; Sel(Q5)=Sel(Q4)*Sel(Q2) = 0.18/6 = 0.03 Q6: SELECT * FROM R,S WHERE R.A=S.A AND (R.B='A' OR S.E= 'fsdfsdf') ; R.A=S.A => 1/8000 ## Size of the join; 150,000 * 8,000 * 1/8,000 = 150,000 R.B='A' => 1/6 S.E= 'fsdfsdf' => 1/200 Sel: 1/8,000 * ((1- ((1-1/6)*(1- 1/200)) ) Exp: 150,000 * 8,000 * 1/8,000 * ((1- ((1-1/6.)*(1- 1/200.)) ) Statistics: ---------------- N_DISTINCT(Attr) MinVal MaxVal Sel(R.A=x) = 1/N_DISTINCT(R.A) Sel(c1<=R.B<=c2) = (c2-c1)/(maxval-minval) Sel(R.A=S.B) = 1/max(N_DISTINCT(R.A), N_DISTINCT(S.B)) sel(not Cond) = 1 - sel(Cond) sel(Cond1 and Cond2) = sel(Cond1) * sel(Cond2) sel(Cond1 or Cond2) = sel( not ((not Cond1) and (not Cond2)) ) = 1 - ((1 - sel(Cond1)) * (1 - sel(Cond2))) ==================================================================== Concurrency Control: ------------------------- Concurrent operations: Schedule: S0: r1(x) r2(x) w1(x) w2(x) S1: r1(x) w1(x) r2(x) w2(x) T1 -> T2 S2: r2(x) w2(x) r1(x) w1(x) T2 -> T1 S1,S2: serial schedules Conflicting operations ------------------------ Two operations conflict with each other, if they are by two different transactions and one of them is a write operation r1(x) w2(x) w1(x) r2(x) w1(x) w2(x) Serializability: ----------------------------- A schedule S is serializable (equal to a serial schedule) iff all conflicting operations in S occur in the same order as a serial schedule. Checking Serializability: ---------------------------- Create a conflict graph for a schedule S: -- Each transaction is a node -- For each conflict in S of the form: ri(x) wj(x) wi(x) rj(x) wi(x) wj(x) Draw an edge from Ti to Tj. T1->T2, T2->T1 If the resulting graph has a cycle, then this schedule is NOT serializable. If the resulting graph has no cycles, then this schedule is serializable. I can order transaction using topological sort on the conflict graph. Examples: S2: r1(x) w2(x) w1(y) r3(y) w3(z) r2(z) r3(w) w2(z) Conflicts: r1(x) w2(x) w1(y) r3(y) w3(z) r2(z) w3(z) w2(z) Is it serializable? -> No cycles, it is serializable. Equivalent serial order: T1, T3, T2 ---------------------------------- S3: r1(x) w1(x) w1(z) w3(z) r4(z) w2(w) r4(w) r2(w) r3(x) r2(z) w4(w) w4(z) Conflicts: w1(x) r3(x) w1(z) r4(z) w1(z) w3(z) w1(z) w4(z) w1(z) r2(z) w3(z) r2(z) w3(z) r4(z) w3(z) w4(z) w2(w) w4(w) w2(w) r4(w) Serializable, no cycles. T1, T3, T2, T4