Announcements
---------------
Hw#7 to be assigned later today (friday morning to latest)
Due on Monday November 29th
Expect another homework after that
FINAL EXAM on: Thursday, 12/16 6:30pm-9:30pm
Please email me all conflicts
-> What course does it conflict with and who is the professor
Query Processing
-----------------
Cost of queries: Number of disk pages read/written
Each relational algebra operator implements the same interface
Iterator interface:
---------------------
Initialize: Allocate memory space to the operator, initialize
relations to be read
Get_Next: Do processing needed to produce 1 block of output
Close: Deallocate memory space and release any resources
---------------
For each operator: let M is the number of blocks of memory allocated
Relations R ->
PAGES(R) is the number of pages for relation R
TUPLES(R) is the numebr of tuples in the relation R
Often, TUPLES(R) >> PAGES(R)
Single pass operations:
------------------------
Sequential scan (R)
PAGES(R) and only needs M=1 blocks
Duplicate Removal (R)
Group by (R)
-Depends on whether M-1 blocks is sufficient to hold the
temporary results in memory
R union all S
R union S
R except all S
R except S
R intersect all S
R intersect S
-Depends on whether M-1 blocks is sufficient to hold the
temporary results in memory
See costs in hand written notes
Index Scan
-----------
Relation R
PAGES(R) = 2,000
TUPLES(R) = 20,000
(How many tuples of R per page on average? 10 )
Index I on R(C)
with 3 levels (root, internal, leaf)
and 400 nodes at leaf level
( Each leaf node of I1 on average indexes how many tuples? -20000/400= 50)
TUPLES(R.C=10) = 120
Q1: SELECT A,B FROM R WHERE C=10;
Cost of query Q1:
Sequential scan: Cost = 2,000 pages
Using index scan with I1 (M=2):
From index = Root + Internal + 3 or 4 leaf nodes = 5 or 6 pages (nodes)
From relation = 120 pages in the worst case
(each tuple is in a different page)
Total = 126 (or 125)
PAGES(R) = 2,000
TUPLES(R) = 20,000
(How many tuples of R per page on average? 10 )
Index I2 on R(B,C)
with 3 levels (root, internal, leaf)
and 800 nodes at leaf level
(Each leaf node of I2 on average indexes how many tuples? -20000/800= 25)
TUPLES(R.C=10) = 120
Q1: SELECT A,B FROM R WHERE C=10;
Cost of Q1 using I2:
Index scanning cost + Relation scanning cost
2+ 800 + 120 (worst case) = 922
Sequantial scan = 2000
Using I1 = 126
Using I2 = 922
Index I3 on R(C,A,B)
with 3 levels (root, internal, leaf)
and 1200 nodes at leaf level
(Each leaf node of I3 on average indexes how many tuples? -20000/1200= 16 tuples)
Q1: SELECT A,B FROM R WHERE C=10;
120 tuples are stored in how many leaf nodes? 120/16= 8 approx
Cost of Q1 using I3:
Index scanning cost + Relation scanning cost
2 + 8 + 0 (index already stores A and B)
Index types:
1. Selective index for a condition C: only a few tuples match condition C
(very low relation scan cost)
2. Indices that enable index only searches (storing attributes needed
for a query)
Q2: SELECT A,B FROM R WHERE C=10 AND D > 5;
Index I3 on R(C,A,B)
Scan index I3 using C=10, then read matching tuples to check on D>5
Cost of answering Q2 using I3?
Index scanning cost + Relation scanning cost
2 + 8 + 120 (worst case, all tuples for C=10)