Lecture 20 - Secondary Storage Management#
Announcements#
Hw#5 is due tonight, I will extend the deadline to monday night. We will grade it over the Thanksgiving break.
No new hw or lecture exercise today
We need to discuss the next hw assignment.
Secondary Storage#
Data is stored on disk for stable storage.
Query execution, to run a query
Parse query, validate, optimize and generate query plan
Execute the query:
Read the data from disk into memory
Compute operations (such as joins, group by, sort, etc.)
Output the results
Reading data from disk is much much more expensive than doing operations in memory.
Unit of data to be read is a disk page, often 8K in size, read into a memory block. We will assume a disk page and a memory block are the same size.
For any relation, we will use PAGES® as the total number of pages the relation is stored in
A page stores many tuples, how many tuples depends on the size of a tuple
Types of storage#
Magnetic disks: to read a page, we need to move a disk head (seek) and wait for the disk to spin to the correct place (rotational latency) and read the page (data transfer)
Cost of reading a page = seek time + rotational latency + transfer time
Cost of reading 100 pages:
100 pages in random locations: = 100 * (seek time + rotational latency + transfer time)
100 pages in a sequential location (same track)
= seek time + rotational latency + 100* transfer timeThere is a big difference in cost of random and sequential I/O
Solid state disks: no movable parts and faster than random I/O for sure
Virtual machines (access disk through a virtualization layer that has additional costs)
ioping . -warmup 5 -c 10
us -> microseconds, 1/1 000 000 ms -> miliseconds, 1/1 000
Solid state: min/avg/max/mdev = 51 us / 56 us / 60 us / 3.52 us Magnetic disk: min/avg/max/mdev = 158.4 us / 170.9 us / 180.2 us / 8.82 us Virtual machine: min/avg/max/mdev = 472.6 us / 504.5 us / 588.1 us / 43.7 us
Cost of a query: Number of pages read from disk (or written to disk) to answer a query
SELECT * FROM R WHERE R.A = 5 ;
What is the cost of this query?
Sequential scan:
Read all the data pages into memory
Read all the tuples and check if R.A=5
If yes, output
Cost of sequential scan: PAGES®
The main cost in sequential scan is driven by how many pages the relation spans, not how many tuples it has