Contents

Sel(Cond) = percentage of tuples that will pass this condition

High selectivity, low Sel(Cond) value

0.001 is more selective than 0.1

  • Q1:

    • Sel(A in range condition) = (between 1K-11K), 10/12

    • Sel(C=8) = 1/50

    • Sel(D=50) = 1/1000

    • Attribute D is the most selective index, R(A)

    • Sel(Q1= Cond1 and Cond2 and Cond3) = (10/12 * 1/50 * 1/1000)

    • Exp(Q1) = Sel(Q1) * Tuples® = (10/12) * (1/50) * (1/1000) * 100000

    • PAGES(Q1) = PAGES® * Sel(Q1)

    • The best index for an index only search: R(CDAB) or R(DCAB)

    • Suppose I have indices

      • I1 on R(A)

        • Scan 2+ leaf nodes for A condition: Cost = 2+ (10/12) * #leafnodes(I1)

        • Read the matching tuples to check C,D conditions and return the B attribute: 100,000*(10/12)

      • I2 on R(A,C)

        • Index scan cost = 2+ #leaf(I2) * (10/12) (A condition only)

        • After scanning, we find 100000*(10/12)*(1/50)=1666 tuples that satisfy A and C conditions, read tuples from disk. Read 1666 tuples from 1666 pages in the worst case.

      • I3 on R(D)

        • Index scan cost: Scan 2+ #leaf(I3)/1000 leaf nodes

        • Relation cost: 100000/1000=100 matching tuples to read from disk

      • I4 on R(B,C,D,A)

        • Scan all leaf nodes and no relation tuples

        • Cost = 2+ #leaf(I4)

      • I5 on R(C,D,A,B)

        • Scan for C,D and A conditions

        • Cost = 2 + #leaf(I5) * Sel(Q1) + 0 (relation cost) because all attributes are in the relation

Tuples® = 100,000
Pages® = 4,000

Index I5, #Leaf(I5) = 2,000
Index I2, #Leaf(I2) = 1,000

R has 100000/4000=25 tuples per page

I5 has 100000/2000= 50 tuples per page
I5 has 100000/2000= 100 tuples per page

  • Scan for C,D and A conditions
    Exp(Q1) = 100000 * (10/12) * (1/50) * (1/1000) = 2 tuples

    • How many leaf nodes? 2

I2 on R(A,C)
- Index scan for A condition:
Exp(A range) = 100000*(10/12) = 83,334 tuples
For I2, 83,334/100 = 834 pages
- #leaf(I2) = 1000, 10/12 of it is scanned: 100010/12
- After scanning, we find 100000
(10/12)*(1/50)=1666 tuples that satisfy A and C conditions, read tuples from disk. Read 1666 tuples from 1666 pages in the worst case.

  • Block nested loop join (M number of memory blocks available):

    • R join S (read left relation once ®, and the right relation a number of times?

    • PAGES®= 1,000

    • PAGES(S)= 500

    • M=1001, Cost = 1,000+ 500 (Read R once and Read S once)

    • M=101, Cost = 1,000 + 500 * (1000/100) = 6,000

    • M=51, Cost = 1,000 + 500 * (1000/50) = 11,000

    • S join R

    • M=1001, Cost = 1,500

    • M=101, Cost = 500 + (500/100) * 1,000 = 5,500

    • M=51, Cost = 500 + (500/50)*1,000 = 10,500

Lecture exercise 24:

PAGES®= 1,000

PAGES(S)= 500

PAGES(SELECT®)= 200 (size after selection)

M=101

  • Cost of reading R, 1,000

  • Cost of reading S: 500* (200/100) = 1,000

  • Cost of this join: 2,000

  • Sort:

PAGES(SELECT(R JOIN S))= 4,000 (size after join and selection)

  • Sorting, M=100

    • Step 1: Read T in 100 pages at a time, sort and write

    • Cost = Read once (0 cost since it is output by the join), write once = 4,000 (produce 4000/100=40 sorted groups)

    • Step 2: Merge 40 groups and output (Cost = read T once, 4,000)

    • Total cost = 2*4,000

  • Sorting

    • PAGES(T) = 4000, M=100

    • Step 1: Read T in 100 pages at a time, sort and write

    • Cost = Read once, write once = 8,000 (produce 4000/100=40 sorted groups)

    • Step 2: Merge 40 groups and output (Cost = read T once, 4,000)

    • Total cost = 3*4,000

  • PAGES(T)=4,000, M= 50

    • Step 1: Cost = 8,000 pages, produce 4,000/50=80 sorted groups

    • Step 2, part 1: Read once,write once, reduce 80 sorted groups to 2

    • Cost = 8,000

    • Step 2, part 2: Read once, merge 2 groups, Cost = 4,000

    • Total cost= 4,000*5