Query Execution: ---------------------- How to speed up queries: - better queries - buy more memory - change your data model - create indices// remove indices --------------------- Query plan for any SQL query -> relational algebra operators // implementation How is data stored: relation clustered/hashed/distributed? - what indices are available What are the resources available for the query: memory allocation: M ----------------------------- Unit of cost: Number of pages read/written Size of a relation R, PAGES(R) Memory allocated M determine the cost ------- Iterator interface: open/close/get_next ----------------------- Sequential Scan: ----------------- SELECT * FROM R WHERE ; Memory M = 1 Cost = PAGES(R) Ex. SELECT A,B,C FROM R WHERE B=10; Pages(R) = 8,000 Tuples(R) = 600,000 Cost of sequential scan = 8,000 Index scan: ------------- Cost of index scan: Cost of scanning the index + cost of reading tuples from relation as needed Query 1: SELECT A,B,C FROM R WHERE B=10; [Query returns 200 tuples] Pages(R) = 8,000 Tuples(R) = 600,000 Index RI1 ON R(A) (3 levels, 400 pages at the leaf) Cost// don't use this index/not applicable Index RI2 ON R(B) (3 levels, 800 pages at the leaf) Cost = 4 (scanning the index) + 200 (worst case: each tuple is in a different page) RI2 stores: 600000/ 800 = 750 tuples/node Index RI3 ON R(A,B) (3 levels, 1,000 pages at the leaf) Cost = 2+1000+200 Index RI4 ON R(B,C,A) (3 levels, 1,500 pages at the leaf) RI4 stores: 600000/1500 = 400 tuples/node Cost = 4+ 0 (Index only scan!) ----------------------- Query 2: SELECT A,B,C FROM R WHERE B=10 AND A='romney'; [Query returns 15 tuples] B=10 returns 200 tuples A='romney' returns 1,000 tuples Index RI1 ON R(A) (3 levels, 400 pages at the leaf) R11 stores = 1500 tuples/node Scan for: A='romney' returns 1,000 tuples Cost of only scanning the index: 4 Multi-index scan: ---------------------- Index RI2 ON R(B) (3 levels, 800 pages at the leaf) RI2 stores: 600000/ 800 = 750 tuples/node Scan for: B= 10 Cost of only scanning the index: 4 Intersect the found tuple addresses in memory [no additional disk cost] For the matching 15 tuples: read from disk = 15 disk pages Cost = 4 (RI1) + 4 (RI2) + 15 pages. Cost of using Index RI2 to answer query 2? ------------------------------------------- Index RI2 ON R(B) (3 levels, 800 pages at the leaf) Total Cost = 4 + 200 Cost of using Index RI1 to answer query 2? ------------------------------------------- Total Cost = 4 + 1,000 Selectivity of an index on attr A Lower this value is: Tuples(R.A=c)/Tuples(R), the more selective attribute A is..... Creating indices: 1. Create single attribute indices on more selective attributes. 2. If there is a frequent range query (c1