Announcements ----------------- - Hw#6 due next monday, next homework due on thursday after Thanksgiving - Lecture exercise today Sequential scan: ----------------- Read every data page and check condition Cost of query: PAGES(R) Ex: Movies(id, title, years, ep_num, ep_name) Index I1 on Movies(title, years) Cost of answering queries: -------------------------- q1: select id from movies where ep_name is not null ; I1 is not applicable Cost = sequential scan, pages(R) q2: select id from movies where title like 'A%' ; Cost: 1. Sequential scan: pages(B) 2. Index scan: Scan the index (internal modes + leaf nodes) to find title like 'A%' + Read data pages for the found tuples (in the worst case as many as number of tuples that match the query) q2': select title , years from movies where title like 'A%' ; Cost: 1. Sequential scan: pages(B) 2. Index scan: Scan the index (internal modes + leaf nodes) to find title like 'A%' (Index only scan!) q2'': select count(*) from movies where title like 'A%' ; Same as q2' q3: select title from movies where title like 'A%' and ep_num is not null ; Index scan cost: same as q2, scan for title and then read found tuples from data pages to check the ep_num condition q4: select id from movies where years = 2010 ; Index scan: internal nodes+ all leaf nodes + read all matching tuples from data pages (worst case: one data page per matching tuple) q5: select title from movies where years = 2010 ; Movies(id, title, years, ep_num, ep_name) Index I1 on Movies(title, years) Index scan: internal nodes+ all leaf nodes (index only scan) Index Types -------------------------- Secondary: 1. B-tree indices that point to the data pages Primary: 1. Hashing -> hashed the data pages on title select id from movies where title like 'A%' ; Given the data is stored hashed on title, the cost of this query is 1 (or the number of pages in the bucket for hash(title=A) select title from movies where years = 2010 ; Hash is not useful, but the whole relation can be scanned 2. Clustering Sort the data pages on disk based on an attribute select DISTINCT m2.actorid , a2.name , a2.surname from actors a1 --find KB id , movieroles m1 -- find movies for KB , movieroles m2 , actors a2 where a1.id = m1.actorid and m1.movieid = m2.movieid and a1.name like 'Kevin%' and a1.surname = 'Bacon' and m2.actorid <> a1.id and a2.id = m2.actorid; Cluster by movierole(actorid) and actors(actorid) ----------------------- Creating B-tree indices: Create index idx1 on movies(title) ; ==================== Query Processing Memory (buffer) available to a query: M blocks (same size as M pages) Sequential Scan: