Lecture 22#
Lecture exercise out today, due on saturday at midnight
Hope to release another optional lecture exercise, timing is unknown
Hope to release Hw#6 before the week ends
Exam#2 grading to be completed very soon (though probably not today)
Hw#5 grading is next
B-tree indices and index scan#
Review of search/insertion/deletion
Handling indices with multiple attributes
Handling indices with duplicate key values
Index search for answering select queries
Generalizing B-tree indices

Suppose we have the following query:
SELECT A,B,C,D FROM R WHERE R.A=10 AND R.B>5 AND R.B<10 AND R.C > 20;
And indices:
I1 on R(A)
I2 on R(B)
I3 on R(B,A,C)
I4 on R(A,B,C,D)
Possible access paths:
Sequential scan: read all pages of R
Use index I1:
find all tuples with A=10,
read the matching tuples from disk and check on B and C conditions,
output A,B,C,D
Use index I2:
find all tuples with B<10,
read the matching tuples from disk and check on A and C conditions,
output A,B,C,D
Use index I1, find all tuples with A=10
Use index I2, find all tuples with B<10,
Find the intersection of tuples to find those that satisfy both
read the matching tuples from disk and check on C condition,
output A,B,C,D
Use index I3, on R(B,A,C) :
Start the search at A=10 and B>5, end the scan at B>=10 (more or less same as the search for B>5 and B<10).
Find all tuples that satisfy conditions on A,B and C
Read the tuples to returnt the missing D value
Use index I4 on R(A,B,C,D)
Scan the index for A and B conditions
Return the A,B,C,D values of the found tuples that satisfy A,B,C conditions
This is called an index only scan
Assume a page has 8000 bytes of addressable space, a tuple pointer is 10 bytes
movieroles(actorid, movieid, role, info1, info2)
TUPLES(movieroles) = 265K PAGES(movieroles) = 2048
`select actorid, role from movieroles where actorid = 123;’
Cost = 2048
Index I1 on mr(actorid)
4+10= 14 bytes per entry
Index can store 570 entries per page. (8000/14)
Leaf level has 465 nodes (if all nodes are 100% full) – the tree will have 2 levels (root and leaf)
Leaf level has 665 nodes (if all nodes are 70% full) – the tree will have 3 levels (root, internal, leaf)
Index I2 on mr(movieid)
Same as I1
Index I3 on mr(movieid, actorid)
4+4+10= 18 bytes per entry
Index can store 445 entries per page. (8000/18)
Leaf level has 851 nodes (if all nodes are 70% full) – the tree will have 3 levels (root, internal, leaf)
Index I4 on mr(actorid, movieid, role)
4+4+15+10=23 bytes per entry
Index can store 347 entries per page
1091 leaf nodes at 70% full – the tree will have 3 levels (root, internal, leaf)
Assume indices are 70% full:
`select movieid, role from movieroles where actorid = 1234;’
Sequential scan: Cost = 2048
Index I1, (assume about 5 movies per actor), 1 or 2 leaf nodes to scan
Index scan: 1 root, 1 internal, 1 or 2 (worst case) leaf nodes = 4 disk pages
Read the matching 5 tuples from disk (worst case 5 different pages)
Total cost = 4 (index scan) + 5 (reading the tuples) = 9 pages
I2 not usable for this query
I3:
Index scan:Scan root, internal and then all leaf: 2+ 851 nodes
Read the matching 5 tuples from disk for the “role” attribute (worst case 5 different pages)
Total cost: 853+5= 858
I4:
Index scan: 1 root, 1 internal, 1 or 2 (worst case) leaf nodes = 4 disk pages
Return movieid, role directly from the index
Total cost = 4 (index scan) = 4 pages
1091/(347*.70)
4.491560312885961