Announcements ------------------ - No new hw till monday - Lecture exercise today Space, the final frontier --------------------------- RAM << Hard disk Data resides on hard disk and is brought to memory Disks are much slower than memory SSD: fast but expensive Magnetic disks: cheap but slow, but requires back up solutions or RAID Virtual machines (running on magnetic disk): more expensive but incorporates stable storage, but even slower than magnetic disks For any large data, hard disk is much slower than memory DB Cost = DISK I/O + CPU cost (but assume CPU cost is negligible for most large data) Sequential vs random I/O ioping: ------------- us -> microseconds, 1/1 000 000 ms -> miliseconds, 1/1 000 --------- Data Page: A single unit of data that I can read from hard disk (eg. 8KB) Memory block - assume it is the same unit of data -------- SELECT * FROM authors WHERE author = 'Stephen King' ; Cost of this query? = Read every page in the authors relation and check if author is Stephen King Sequential read: Cost = number of pages in the author relation Multilevel indices --------------------- Relation is stored in data pages SELECT oid::regclass AS tbl, relpages FROM pg_class WHERE relname = 'books'; Relation ratings: 38K pages, but many attributes Index on ISBN13 For each tuple: 13 chars (ISBN13) + a link to the page taht stores that tuple (suppose 20 chars) = 33 chars 8000/33 = 242 index tuples per page Relation ratings has 396,844w tuples Index at the leaf level: 396,844/242 = 1640 pages (leaf) 1640/242 = 7 pages at the next level (level 1) 7./242 = 1 page at the next level (root) Given a B-Tree of order n: --------------- 1. Each node of a B-tree is a disk page 2. Each node can store at most n key values (n+1 pointers) and at least n/2 values (each node is at least half full, with the exception of root) 3. At leaf level, we store (in sorted order of indexed attribute), for each tuple in the relation, the indexed key value and a pointer to the page containing the tuple A leaf node can address at most n tuples and has n+1 pointers (1 pointer to the next leaf node in the B-tree) 4. Internal nodes point to index nodes at the level below. An internal node can have at most n key values and n+1 pointers 5. Root can have anywhere between 2 to n+1 pointers (1 to n key values) Example: 100,000 tuples B-tree is of order 100 (suppose all nodes are full!, except for root!) (Each node indexes 100 tuples) How large is this B-tree? How many nodes in the leaf level? ceil(100,000/100) = 1,000 nodes At the next internal level: ceil(1,000/100) = 10 nodes At the next internal level: ceil(10/100) = 1