Lecture 24#

Today’s topics:#

  • External Sort

  • Query Processing and Pipelining

  • Query Optimization

Announcements#

  • Remaining lectures:

    • Today: Query Processing

    • Next week: Concurrency in Transactions and Database Tuning

  • Remaining assignments:

    • Hw6: Due on monday midnight

    • Lecture Exercise 23: already out, due on Friday 12/5

    • Lecture Exercise 24: out today, due on Friday 12/12

    • Optional Lecture Exercise 3: out today, due on Friday 12/12

    • Lecture Exercise 25: out on monday 12/8, due on Friday 12/12

  • We will drop the lowest 5 lecture exercises (despite the weird naming convention I used!). So two of these exercises can be treated as optional.

  • Use lecture exercises to study! Just because we drop some does not mean the material in them is optional. They are there for you to study.

  • Final Exam is on 12/18, 11:30AM-2:30PM on DCC 308

    • If you need extra time, expect to take the exam between 11:30AM-4PM

    • Notify me of any conflicts if you have not done so

External Sort#

Suppose PAGES®=1,000 and M=10,000 blocks available to the query,

Cost of external sort: 1,000 reads

Suppose PAGES®=1,000 and M=1,000 blocks available to the query,

Cost of external sort: 1,000 reads

Suppose PAGES®=1,000 and M=100 blocks available to the query

  • External sort, step 1:

    • Repeat as long as more pages of R to read

      • Read 100 pages of R, sort and write to disk in a temporary location

    • Cost of step 1: 2*PAGES® = 2,000 pages

    • Resulting in 10 sorted groups of 100 pages each

  • Step 2: sort-merge phase

    • Read page with the smallest values from each group in memory, until all pages from all sorted groups are

    • Merge tuples in sorted order and output

    • When a page becomes empty, read the next page from the same group

    • I need to be able to read one page from each sorted group, for which I need 10 blocks (M=100)

    • Cost= PAGES® = 1,000

  • Total cost = 3,000

Suppose PAGES®=1,000 and M=20 blocks available to the query

  • Step 1: Read/sort 20 blocks of data at a time

    • Cost = 2,000

    • Number of sorted groups = 1000/20= 50 groups

  • Step 2: (Read and merge 20 groups and write to disk)*(Repeat 3 times)

    • Cost = 2,000

    • Resulted in 3 sorted groups. (400, 400, 200 pages)

  • Step 2: Read and merge the 3 blocks (need 3 blocks), and output

    • Cost = 1,000

  • Total cost = 5,000

PAGES®=1,000
PAGES(S)=5,000

  • Cost of block-nested-loop join, M=19

    • R join S (18 blocks for R, 1 block for S):

      • Read R once (18 blocks at a time)

      • Read S, ceil(1,000/18) times

      • Total cost = 1,000 + 56*5,000 = 281,000 pages

    • S join R (18 blocks of R, 1 block of S)

      • Read S once

      • Read R 5000/18 = 278 times

      • Total cost = 5000 + 278*1000 = 283,000

Cost of Plan A-1:

PAGES( PROJECT_(R.A,R.B,R.C) ® = 500

  • Cost of block-nested-loop join, M=19

    • R join S (18 blocks for R after the projection, 1 block for S):

      • Read R once (18 blocks at a time)

      • Read S, ceil(500/18) times

      • Total cost = 1,000 + 28*5,000 = 141,000 pages

No additional cost to select, so = 141,000 pages

Cost of Plan B-1:

PAGES( PROJECT_(R.B,R.C) (SELECT_(R.A=10) R) = 15

Index on R(A): 200 left nodes Selectivity (R.A=10) = 1/100 TUPLES® = 20,000

Select on R.A=10 using index on R(A)

  • Scan index 200/100 leaf nodes (for A=10) plus 2 internal

  • Expect: 20000/100 = 200 matching tuples in 200 pages in the worst case

  • Cost = 200+4 = 204

    • Takes up 15 pages according to the given statistics

  • Block-nested-loop join (18 blocks for R after selection and 1 block for S)

    • Can fit all matching tuples from the index scan in memory (15<18)

    • Read S once

    • Cost = 5,000 blocks (read R is in the index search)

  • Total cost = 5,204