Sel(Cond) = percentage of tuples that will pass this condition
High selectivity, low Sel(Cond) value
0.001 is more selective than 0.1
Q1:
Sel(A in range condition) = (between 1K-11K), 10/12
Sel(C=8) = 1/50
Sel(D=50) = 1/1000
Attribute D is the most selective index, R(A)
Sel(Q1= Cond1 and Cond2 and Cond3) = (10/12 * 1/50 * 1/1000)
Exp(Q1) = Sel(Q1) * Tuples® = (10/12) * (1/50) * (1/1000) * 100000
PAGES(Q1) = PAGES® * Sel(Q1)
The best index for an index only search: R(CDAB) or R(DCAB)
Suppose I have indices
I1 on R(A)
Scan 2+ leaf nodes for A condition: Cost = 2+ (10/12) * #leafnodes(I1)
Read the matching tuples to check C,D conditions and return the B attribute: 100,000*(10/12)
I2 on R(A,C)
Index scan cost = 2+ #leaf(I2) * (10/12) (A condition only)
After scanning, we find 100000*(10/12)*(1/50)=1666 tuples that satisfy A and C conditions, read tuples from disk. Read 1666 tuples from 1666 pages in the worst case.
I3 on R(D)
Index scan cost: Scan 2+ #leaf(I3)/1000 leaf nodes
Relation cost: 100000/1000=100 matching tuples to read from disk
I4 on R(B,C,D,A)
Scan all leaf nodes and no relation tuples
Cost = 2+ #leaf(I4)
I5 on R(C,D,A,B)
Scan for C,D and A conditions
Cost = 2 + #leaf(I5) * Sel(Q1) + 0 (relation cost) because all attributes are in the relation
Tuples® = 100,000
Pages® = 4,000
Index I5, #Leaf(I5) = 2,000
Index I2, #Leaf(I2) = 1,000
R has 100000/4000=25 tuples per page
I5 has 100000/2000= 50 tuples per page
I5 has 100000/2000= 100 tuples per page
Scan for C,D and A conditions
Exp(Q1) = 100000 * (10/12) * (1/50) * (1/1000) = 2 tuplesHow many leaf nodes? 2
I2 on R(A,C)
- Index scan for A condition:
Exp(A range) = 100000*(10/12) = 83,334 tuples
For I2, 83,334/100 = 834 pages
- #leaf(I2) = 1000, 10/12 of it is scanned: 100010/12
- After scanning, we find 100000(10/12)*(1/50)=1666 tuples that satisfy A and C conditions, read tuples from disk. Read 1666 tuples from 1666 pages in the worst case.
Block nested loop join (M number of memory blocks available):
R join S (read left relation once ®, and the right relation a number of times?
PAGES®= 1,000
PAGES(S)= 500
M=1001, Cost = 1,000+ 500 (Read R once and Read S once)
M=101, Cost = 1,000 + 500 * (1000/100) = 6,000
M=51, Cost = 1,000 + 500 * (1000/50) = 11,000
S join R
M=1001, Cost = 1,500
M=101, Cost = 500 + (500/100) * 1,000 = 5,500
M=51, Cost = 500 + (500/50)*1,000 = 10,500
Lecture exercise 24:
PAGES®= 1,000
PAGES(S)= 500
PAGES(SELECT®)= 200 (size after selection)
M=101
Cost of reading R, 1,000
Cost of reading S: 500* (200/100) = 1,000
Cost of this join: 2,000
Sort:
PAGES(SELECT(R JOIN S))= 4,000 (size after join and selection)
Sorting, M=100
Step 1: Read T in 100 pages at a time, sort and write
Cost = Read once (0 cost since it is output by the join), write once = 4,000 (produce 4000/100=40 sorted groups)
Step 2: Merge 40 groups and output (Cost = read T once, 4,000)
Total cost = 2*4,000
Sorting
PAGES(T) = 4000, M=100
Step 1: Read T in 100 pages at a time, sort and write
Cost = Read once, write once = 8,000 (produce 4000/100=40 sorted groups)
Step 2: Merge 40 groups and output (Cost = read T once, 4,000)
Total cost = 3*4,000
PAGES(T)=4,000, M= 50
Step 1: Cost = 8,000 pages, produce 4,000/50=80 sorted groups
Step 2, part 1: Read once,write once, reduce 80 sorted groups to 2
Cost = 8,000
Step 2, part 2: Read once, merge 2 groups, Cost = 4,000
Total cost= 4,000*5