Lecture 24#

Today’s topics:#

Remaining lectures:
- Today: Query Processing
- Next week: Concurrency in Transactions and Database Tuning
Remaining assignments:
- Hw6: Due on monday midnight
- Lecture Exercise 23: already out, due on Friday 12/5
- Lecture Exercise 24: out today, due on Friday 12/12
- Optional Lecture Exercise 3: out today, due on Friday 12/12
- Lecture Exercise 25: out on monday 12/8, due on Friday 12/12
We will drop the lowest 5 lecture exercises (despite the weird naming convention I used!). So two of these exercises can be treated as optional.
Use lecture exercises to study! Just because we drop some does not mean the material in them is optional. They are there for you to study.
Final Exam is on 12/18, 11:30AM-2:30PM on DCC 308
- If you need extra time, expect to take the exam between 11:30AM-4PM
- Notify me of any conflicts if you have not done so

Suppose PAGES®=1,000 and M=10,000 blocks available to the query,

Cost of external sort: 1,000 reads

Suppose PAGES®=1,000 and M=1,000 blocks available to the query,

Cost of external sort: 1,000 reads

Suppose PAGES®=1,000 and M=100 blocks available to the query

External sort, step 1:
- Repeat as long as more pages of R to read
  - Read 100 pages of R, sort and write to disk in a temporary location
- Cost of step 1: 2*PAGES® = 2,000 pages
- Resulting in 10 sorted groups of 100 pages each
Step 2: sort-merge phase
- Read page with the smallest values from each group in memory, until all pages from all sorted groups are
- Merge tuples in sorted order and output
- When a page becomes empty, read the next page from the same group
- I need to be able to read one page from each sorted group, for which I need 10 blocks (M=100)
- Cost= PAGES® = 1,000
Total cost = 3,000

Suppose PAGES®=1,000 and M=20 blocks available to the query

Step 1: Read/sort 20 blocks of data at a time
- Cost = 2,000
- Number of sorted groups = 1000/20= 50 groups
Step 2: (Read and merge 20 groups and write to disk)*(Repeat 3 times)
- Cost = 2,000
- Resulted in 3 sorted groups. (400, 400, 200 pages)
Step 2: Read and merge the 3 blocks (need 3 blocks), and output
- Cost = 1,000
Total cost = 5,000

PAGES®=1,000
PAGES(S)=5,000

Cost of Plan A-1:

PAGES( PROJECT_(R.A,R.B,R.C) ® = 500

Cost of block-nested-loop join, M=19
- R join S (18 blocks for R after the projection, 1 block for S):
  - Read R once (18 blocks at a time)
  - Read S, ceil(500/18) times
  - Total cost = 1,000 + 28*5,000 = 141,000 pages

No additional cost to select, so = 141,000 pages

Cost of Plan B-1:

PAGES( PROJECT_(R.B,R.C) (SELECT_(R.A=10) R) = 15

Index on R(A): 200 left nodes Selectivity (R.A=10) = 1/100 TUPLES® = 20,000

Select on R.A=10 using index on R(A)

Scan index 200/100 leaf nodes (for A=10) plus 2 internal
Expect: 20000/100 = 200 matching tuples in 200 pages in the worst case
Cost = 200+4 = 204
- Takes up 15 pages according to the given statistics
Block-nested-loop join (18 blocks for R after selection and 1 block for S)
- Can fit all matching tuples from the index scan in memory (15<18)
- Read S once
- Cost = 5,000 blocks (read R is in the index search)
Total cost = 5,204