Lecture 24#
Today’s topics:#
External Sort
Query Processing and Pipelining
Query Optimization
Announcements#
Remaining lectures:
Today: Query Processing
Next week: Concurrency in Transactions and Database Tuning
Remaining assignments:
Hw6: Due on monday midnight
Lecture Exercise 23: already out, due on Friday 12/5
Lecture Exercise 24: out today, due on Friday 12/12
Optional Lecture Exercise 3: out today, due on Friday 12/12
Lecture Exercise 25: out on monday 12/8, due on Friday 12/12
We will drop the lowest 5 lecture exercises (despite the weird naming convention I used!). So two of these exercises can be treated as optional.
Use lecture exercises to study! Just because we drop some does not mean the material in them is optional. They are there for you to study.
Final Exam is on 12/18, 11:30AM-2:30PM on DCC 308
If you need extra time, expect to take the exam between 11:30AM-4PM
Notify me of any conflicts if you have not done so
External Sort#
Suppose PAGES®=1,000 and M=10,000 blocks available to the query,
Cost of external sort: 1,000 reads
Suppose PAGES®=1,000 and M=1,000 blocks available to the query,
Cost of external sort: 1,000 reads
Suppose PAGES®=1,000 and M=100 blocks available to the query
External sort, step 1:
Repeat as long as more pages of R to read
Read 100 pages of R, sort and write to disk in a temporary location
Cost of step 1: 2*PAGES® = 2,000 pages
Resulting in 10 sorted groups of 100 pages each
Step 2: sort-merge phase
Read page with the smallest values from each group in memory, until all pages from all sorted groups are
Merge tuples in sorted order and output
When a page becomes empty, read the next page from the same group
I need to be able to read one page from each sorted group, for which I need 10 blocks (M=100)
Cost= PAGES® = 1,000
Total cost = 3,000
Suppose PAGES®=1,000 and M=20 blocks available to the query
Step 1: Read/sort 20 blocks of data at a time
Cost = 2,000
Number of sorted groups = 1000/20= 50 groups
Step 2: (Read and merge 20 groups and write to disk)*(Repeat 3 times)
Cost = 2,000
Resulted in 3 sorted groups. (400, 400, 200 pages)
Step 2: Read and merge the 3 blocks (need 3 blocks), and output
Cost = 1,000
Total cost = 5,000
PAGES®=1,000
PAGES(S)=5,000
Cost of block-nested-loop join, M=19
R join S (18 blocks for R, 1 block for S):
Read R once (18 blocks at a time)
Read S, ceil(1,000/18) times
Total cost = 1,000 + 56*5,000 = 281,000 pages
S join R (18 blocks of R, 1 block of S)
Read S once
Read R 5000/18 = 278 times
Total cost = 5000 + 278*1000 = 283,000
Cost of Plan A-1:
PAGES( PROJECT_(R.A,R.B,R.C) ® = 500
Cost of block-nested-loop join, M=19
R join S (18 blocks for R after the projection, 1 block for S):
Read R once (18 blocks at a time)
Read S, ceil(500/18) times
Total cost = 1,000 + 28*5,000 = 141,000 pages
No additional cost to select, so = 141,000 pages
Cost of Plan B-1:
PAGES( PROJECT_(R.B,R.C) (SELECT_(R.A=10) R) = 15
Index on R(A): 200 left nodes Selectivity (R.A=10) = 1/100 TUPLES® = 20,000
Select on R.A=10 using index on R(A)
Scan index 200/100 leaf nodes (for A=10) plus 2 internal
Expect: 20000/100 = 200 matching tuples in 200 pages in the worst case
Cost = 200+4 = 204
Takes up 15 pages according to the given statistics
Block-nested-loop join (18 blocks for R after selection and 1 block for S)
Can fit all matching tuples from the index scan in memory (15<18)
Read S once
Cost = 5,000 blocks (read R is in the index search)
Total cost = 5,204