Announcements
-----------------

- Hw#6 due next monday, next homework due on thursday after
  Thanksgiving

- Lecture exercise today


Sequential scan:
-----------------

Read every data page and check condition

Cost of query: PAGES(R)

Ex:

Movies(id, title, years, ep_num, ep_name)

Index I1 on Movies(title, years)

Cost of answering queries:
--------------------------

q1:

select
   id
from
   movies
where
   ep_name is not null ;

I1 is not applicable
Cost = sequential scan, pages(R)

q2:

select
   id
from
   movies
where
   title like 'A%' ;

Cost:

1. Sequential scan: pages(B)
2. Index scan:
    Scan the index (internal modes + leaf nodes) to find
        title like 'A%'
    + Read data pages for the found tuples
      (in the worst case as many as number of tuples that match the query)

q2':

select
   title
   , years
from
   movies
where
   title like 'A%' ;

Cost:

1. Sequential scan: pages(B)
2. Index scan:
    Scan the index (internal modes + leaf nodes) to find
        title like 'A%'

(Index only scan!)

q2'':

select
   count(*)
from
   movies
where
   title like 'A%' ;

Same as q2'

q3:

select title
from   movies
where  title like 'A%'
       and ep_num is not null ;

Index scan cost: same as q2, scan for title and then read found tuples
from data pages to check the ep_num condition

q4:

select id
from movies
where  years = 2010 ;

Index scan: internal nodes+ all leaf nodes + read all matching tuples
from data pages (worst case: one data page per matching tuple)

q5:

select title
from movies
where years = 2010 ;

Movies(id, title, years, ep_num, ep_name)
Index I1 on Movies(title, years)

Index scan: internal nodes+ all leaf nodes (index only scan)


Index Types
--------------------------

Secondary:

1. B-tree indices that point to the data pages


Primary:

1. Hashing  -> hashed the data pages on title

select
   id
from
   movies
where
   title like 'A%' ;

Given the data is stored hashed on title, the cost of this query is 1
(or the number of pages in the bucket for hash(title=A)

select title
from movies
where years = 2010 ;

Hash is not useful, but the whole relation can be scanned


2. Clustering

Sort the data pages on disk based on an attribute

select DISTINCT
    m2.actorid
    , a2.name
    , a2.surname
from
    actors a1  --find KB id
    , movieroles m1  -- find movies for KB
    , movieroles m2
    , actors a2
where
    a1.id = m1.actorid
    and m1.movieid = m2.movieid
    and a1.name like 'Kevin%'
    and a1.surname = 'Bacon'
    and m2.actorid <> a1.id
    and a2.id = m2.actorid;

Cluster by movierole(actorid) and actors(actorid)

-----------------------

Creating B-tree indices:

Create index idx1  on movies(title) ;

====================

Query Processing


Memory (buffer) available to a query: M blocks (same size as M pages)

Sequential Scan: