Lecture 24
------------

- No lecture exercise today
- HW#7 is due today

- Final exam: Wednesday December 14, 3:00PM-6:00PM (West 220)
  -> PLEASE REPORT ANY CONFLICTS NOW!


Query Optimization
--------------------------
Given a query, find the fastest implementation of it
given the database model, storage and its current contents.

Step 1: Parse query

Step 2: Look at conditions, remove redundant conditions and
check for inconsistencies

Step 3: Find alternate logical query plans: join ordering/pushing
selections down/adding projections

Step 4: Query size estimation: estimate number of tuples matching a
specific condition (before executing a query)

Step 5: Enumerate all possible physical query plans
(operators+implementaiton), estimate its cost and find the cheapest
one.


Step 4: Size estimation
------------------------------------

SEL(C) = the proportion of tuples that will pass the condition C

Given:

T = SELECT_{C} (R)     TUPLES(T) = TUPLES(R) * SEL(C)

T2 = R JOIN_{C} S      TUPLES(T2) = TUPLES(R) * TUPLES(S) * SEL(C)


DISTINCT(R.A): number of distinct R.A values currently stored
MINVAL(R.A), MAXVAL(R.A): min and max values for R.A

SEL(R.A=c) = 1/DISTINCT(R.A)

SEL(R.A>=c1 AND R.A<=c2) =  (c2-c1)/(MAXVAL(R.A)-MINVAL(R.A))

SEL(R.A=S.B) = 1/{MAX(DISTINCT(R.A), DISTINCT(S.B))}


SEL(C1 AND C2) =  SEL(C1) * SEL(C2)
SEL (NOT C1)   =  1-SEL(C1)
SEL(C1 OR C2)  =  1 - ((1-SEL(C1)) * (1-SEL(C2)))

-----------------

Join ordering

Index selectivity:
-------------------
Given an index, how much does it reduce the number of
truples that need to be retrieved for a query?

R(A,B,C,K,L,M)  10,000,000 tuples
    Distinct(R.A)= 200    
    Distinct(R.B)=5,000   
          MinVal(R.B)=500   MaxVal(R.B)=10,500 
    Distinct(R.C)=6,000  
    Distinct(R.D)=500  
          MinVal(R.B)=100   MaxVal(R.B)=2100


Index I1 on R.A,   R.A=10

Sel(R.A=10) = 1/200

Index I2 on R.B,   R.B=400

Sel(R.b=400) = 1/5000

Index I3 on R.C,   R.C='ABC'

Sel(R.C='ABC') = 1/6000

Index I4 on R.D, R.D=400

Sel(R.D=400) = 1/500

1. Find most selective indexes

2. Find less selective indexes that have the attributes needed
for a query

SELECT F FROM R WHERE R.A = 200 AND R.E<= 500;


F,A,E -> scan all leaf
E,A,F -> Scan for R.E<=500
E,F,A -> Scan for R.E<=500

A,E,F -> best index (scan leaf from R.A=200 and min (R.E) up to
R.A=200 and R.E>500)

-----------------


TRANSACTION (XACT) PROCESSING
================================

Concurrency control

Abstract sequence of operations in a database as read/write by
transactions of data items

Schedule S1: r1(X) r2(X) w1(X) w2(X)

Schedule S2: r1(X) w1(X) r2(X) w2(X)

Schedule S3: r2(X) w2(X) r1(X) w1(X)

Schedule S4: r2(X) r1(X) w2(X) w1(X)

A serial schedule is assumed to be always correct
(S2 and S3 above are both serial!)

A schedule is serializable if its final results are guaranteed to be
equal to a serial schedule.

If a xact in a schedule always reads the same values and writes values
in the same order as a serial schedule, then it is serializable.


S1: r1(x) r2(y) w1(x) r2(z) w2(y) r1(y) w1(y)

S2: r2(y)  r2(z) w2(y) r1(x) w1(x) r1(y) w1(y)