Lecture 17 ----------------- 1. Disk access is slow 2. All data may not fit in memory 3. Optimize number of disk pages read/written for each query -> Query Cost: number of disk pages read/written Data page/Data block: 8KB ------------------------------- moviescast | 38 pages SELECT COUNT(*) FROM moviescast; SELECT 6075/38.0; A single page of moviescast, stores 160 tuples. SELECT COUNT(*) FROM series; SELECT 945/85.0; SELECT AVG(LENGTH(castname)) FROM moviescast; ------------ A single table spans many disk pages, each disk page stores multiple tuples. ----------- Integer: 8 bytes SELECT current_setting('block_size'); SELECT relname, relpages FROM pg_class pc, pg_user pu WHERE pc.relowner = pu.usesysid and pu.usename = user ORDER BY relpages desc; ----------- What is the cost a query: SELECT * FROM moviescast ; Cost: 38 pages SELECT * FROM moviescast WHERE movieid = 30; Cost: 38 pages [Sequential Scan] Index on movieid? Secondary Access Structures: search index for a query + find matching tuples + read those tuples from disk Relation is stored on Disk ---------------- Dense index: a single node for each tuple Sparse index: a single node for a range of tuples B-tree indices ------------------ A node in a B-tree -> A disk page Leaf nodes: Point to tuples Internal nodes: Point to index nodes below B-tree of order n Leaf nodes: max n tuples + 1 sibling pointer // min: floor((n+1)/2) tuples plus one additional pointer to the next sibling node (nodes to be half full) Internal nodes: max n+1 pointers , n key values// min: floor((n+1)/2) (and one less key value) pointers and 1 less key Exception root: Min: 2 pointers and 1 key value Searching a B-tree N=6: equality search, # of nodes scanned: 3, one for each level N=54: equality search, # of nodes scanned: 3, one for each level Range query 12<=N<=45: # of nodes scanned: 6 Indexing ----------------------------------- 1. Primary vs. Secondary indices 2. Sparse vs Dense indices B-trees --------- B-tree order of n ~~~~~~~~~~~~~~~~~~~ Each node: a disk page Rule: each node should have between n/2 and n entries (except for root) Leaf nodes: point to tuples in the relation Internal nodes: point to nodes in the level below 3. B-tree order example: max/avg/min <#nodes/levels> 8K - a disk page R(A,B,C,D) -> 2,000,000 tuples Index I1 on R(C), and C is 50 bytes long, a page address is 15 bytes long Each index node has 1,200 bytes of header info n -> Maximum number of entries I can store in a single node: n = (8*1024 - 1200 - 15)/(50+15) = 107 53 - 107 entries in each ------- How big is this B-tree? Max//Smallest b-tree: Assume each node has 107 entries ceil(2000000/107) = 18692 leaf nodes ceil(18692/107) = 175 internal nodes above leaf ceil(175/107) = 2 internal nodes 1 root (depth 3, 4 levels) Min//Largest b-tree: Assume each node has 53 entries ceil(2000000/53) = 37736 leaf nodes ceil(37736/53) = 712 internal nodes above leaf ceil(712/53) = 14 internal nodes 1 root (4 levels) number of levels is in the order of log_n(K) where K is the number of tuples in the relation