Lecture 21 - B-tree indices#

Announcements#

Homework 5 due tonight
Lecture Exercise 21 to be posted at 2pm today, due on wednesday midnight
Initial (unofficial) version of the grade db is up

Today’s lecture#

Secondary access methods
- Tree indices
B-tree indices
Insertion/deletion to B-trees
Searching in B-trees

Two variables:

Pages ®
Tuples®
So if X = Tuples®/Pages® then X is the number of tuples per disk page
Note that X depends on how large the tuples are
Primary access method:read every page of a relation to answer a query
- Sequential scan
- Cost: Pages®
Secondary access method: an additional access method for a query
- An index on a number of attributes
Each node in a dense index contains an entry for each tuple
- An entry is the key value (if I index on R(A), then A is the key) and a pointer to the tuple
Each node in a sparse index stores an entry for a range of values. Often the upper levels of tree indices are sparse.

load_ext sql

%config SqlMagic.displaylimit = None
%sql --section parks

displaylimit: Value None will be treated as 0 (no limit)

Connecting to 'parks'

%%sql 

SELECT 
   pt.tablename
    , pc.reltuples
    , pc.relpages
FROM   
    pg_class pc
    , pg_tables pt
WHERE  
    pt.tableowner=user
    and pt.tablename not like 'pg%'
    and pt.tablename = pc.relname
order by pc.relpages desc;

Running query in 'parks'

22 rows affected.

tablename	reltuples	relpages
places	531.0	112
place_tags	3261.0	35
people	222.0	32
people_facts	1130.0	21
place_relatedparks	565.0	19
people_tags	1593.0	17
people_relatedparks	249.0	12
park_topics	541.0	9
sql_features	755.0	8
people_images	149.0	7
park_images	151.0	7
parks	32.0	6
park_activities	394.0	5
park_hours	137.0	2
park_emails	32.0	1
park_phoneno	41.0	1
park_addresses	64.0	1
sql_implementation_info	12.0	1
sql_sizing	23.0	1
sql_parts	11.0	1
park_states	75.0	1
output_lines	0.0	0

%%sql


select 
    relpages
    , relname 
from 
    pg_class 
where 
    relname not like 'pg%' 
    and relpages>0;

Running query in 'parks'

36 rows affected.

relpages	relname
2	parks_pkey
13	park_topics_pkey
6	parks
9	park_topics
1	park_addresses
7	park_images
5	pimpk
2	park_phoneno_pkey
1	park_phoneno
2	park_emails_pkey
1	park_emails
2	park_states_pkey
1	park_states
5	park_activities
4	park_hours_pkey
2	park_hours
19	place_relatedparks
35	place_tags
7	places_pkey
112	places
44	place_tags_pkey
10	plrpk
5	people_pkey
22	people_tags_pkey
32	people
17	people_tags
12	people_relatedparks
6	perpk
22	people_facts_pkey
21	people_facts
7	people_images
6	pipk
1	sql_parts
8	sql_features
1	sql_implementation_info
1	sql_sizing

Suppose I create an index on places(placeid)

where placeid is 36 bytes a tuple pointer is 10 bytes

For each tuple, I need 46 bytes

TUPLES(places) = 531

A disk page is 8000 bytes (assume)

8000/(46) approx= 170 tuples per page

531/170 approx= 4 pages of index

B-tree indices#

Secondary access method on a key value, R(A) or R(A,B)
Each node in the B-tree is a disk page, stored on disk
Leaf nodes of the B-tree are dense, leaf nodes store on entry per tuple
- An entry: a key value + a disk page pointer (points to a tuple)
Internal nodes of the B-tree are sparse, they point to the B-tree nodes in the level below.
- Each entry is a key value and a disk page pointer (points to a B-tree node page)
Leaf nodes have an extra pointer to the next node in the leaf to the right (sibling pointers)
Each node in the B-tree has a maximum capacity, which we call n: maximum number of key value and pointer pairs I can store in a page
Each node (except root) has a minimum capacity, floor((n+1)/2) entries

Searching in B-trees#

Searching key = value

Start from root, find the internal node in the next level until you reach the leaf level
If the value is found, return the key value and tuple address

Searching key in range [A,B]

Start from root searching for the first leaf node >=A, find the internal node in the next level until you reaach the leaf level.
Scan the leaf level using sibling pointers until you find the first leaf node greater than B.
Output all found key values and associated tuple pointers

Lecture 21 - B-tree indices

Contents

Lecture 21 - B-tree indices#

Announcements#

Today’s lecture#

B-tree indices#

Searching in B-trees#