<no title> — CSCI 4380 Database Systems 1 documentation

Sel(Cond) = percentage of tuples that will pass this condition

High selectivity, low Sel(Cond) value

0.001 is more selective than 0.1

Q1:
- Sel(A in range condition) = (between 1K-11K), 10/12
- Sel(C=8) = 1/50
- Sel(D=50) = 1/1000
- Attribute D is the most selective index, R(A)
- Sel(Q1= Cond1 and Cond2 and Cond3) = (10/12 * 1/50 * 1/1000)
- Exp(Q1) = Sel(Q1) * Tuples® = (10/12) * (1/50) * (1/1000) * 100000
- PAGES(Q1) = PAGES® * Sel(Q1)
- The best index for an index only search: R(CDAB) or R(DCAB)
- Suppose I have indices
  - I1 on R(A)
    - Scan 2+ leaf nodes for A condition: Cost = 2+ (10/12) * #leafnodes(I1)
    - Read the matching tuples to check C,D conditions and return the B attribute: 100,000*(10/12)
  - I2 on R(A,C)
    - Index scan cost = 2+ #leaf(I2) * (10/12) (A condition only)
    - After scanning, we find 100000*(10/12)*(1/50)=1666 tuples that satisfy A and C conditions, read tuples from disk. Read 1666 tuples from 1666 pages in the worst case.
  - I3 on R(D)
    - Index scan cost: Scan 2+ #leaf(I3)/1000 leaf nodes
    - Relation cost: 100000/1000=100 matching tuples to read from disk
  - I4 on R(B,C,D,A)
    - Scan all leaf nodes and no relation tuples
    - Cost = 2+ #leaf(I4)
  - I5 on R(C,D,A,B)
    - Scan for C,D and A conditions
    - Cost = 2 + #leaf(I5) * Sel(Q1) + 0 (relation cost) because all attributes are in the relation

Tuples® = 100,000
Pages® = 4,000

Index I5, #Leaf(I5) = 2,000
Index I2, #Leaf(I2) = 1,000

R has 100000/4000=25 tuples per page

I5 has 100000/2000= 50 tuples per page
I5 has 100000/2000= 100 tuples per page

Scan for C,D and A conditions
Exp(Q1) = 100000 * (10/12) * (1/50) * (1/1000) = 2 tuples
- How many leaf nodes? 2

I2 on R(A,C)
- Index scan for A condition:
Exp(A range) = 100000*(10/12) = 83,334 tuples
For I2, 83,334/100 = 834 pages
- #leaf(I2) = 1000, 10/12 of it is scanned: 100010/12
- After scanning, we find 100000(10/12)*(1/50)=1666 tuples that satisfy A and C conditions, read tuples from disk. Read 1666 tuples from 1666 pages in the worst case.

Block nested loop join (M number of memory blocks available):
- R join S (read left relation once ®, and the right relation a number of times?
- PAGES®= 1,000
- PAGES(S)= 500
- M=1001, Cost = 1,000+ 500 (Read R once and Read S once)
- M=101, Cost = 1,000 + 500 * (1000/100) = 6,000
- M=51, Cost = 1,000 + 500 * (1000/50) = 11,000
- S join R
- M=1001, Cost = 1,500
- M=101, Cost = 500 + (500/100) * 1,000 = 5,500
- M=51, Cost = 500 + (500/50)*1,000 = 10,500

Lecture exercise 24:

PAGES®= 1,000

PAGES(S)= 500

PAGES(SELECT®)= 200 (size after selection)

M=101

Cost of reading R, 1,000
Cost of reading S: 500* (200/100) = 1,000
Cost of this join: 2,000
Sort:

PAGES(SELECT(R JOIN S))= 4,000 (size after join and selection)

Sorting, M=100
- Step 1: Read T in 100 pages at a time, sort and write
- Cost = Read once (0 cost since it is output by the join), write once = 4,000 (produce 4000/100=40 sorted groups)
- Step 2: Merge 40 groups and output (Cost = read T once, 4,000)
- Total cost = 2*4,000

Sorting
- PAGES(T) = 4000, M=100
- Step 1: Read T in 100 pages at a time, sort and write
- Cost = Read once, write once = 8,000 (produce 4000/100=40 sorted groups)
- Step 2: Merge 40 groups and output (Cost = read T once, 4,000)
- Total cost = 3*4,000
PAGES(T)=4,000, M= 50
- Step 1: Cost = 8,000 pages, produce 4,000/50=80 sorted groups
- Step 2, part 1: Read once,write once, reduce 80 sorted groups to 2
- Cost = 8,000
- Step 2, part 2: Read once, merge 2 groups, Cost = 4,000
- Total cost= 4,000*5

Contents