Announcements
------------------

- Monday December 12 at 3PM: review session and final office hours
  Will post link on Submitty

- Final exam is comprehensive! I will post the self-study topic list
  and the final exam schema by later today or tomorrow morning the
  latest.

- Exam is on Wednesday December 14, 3:00PM-6:00PM (West 220)

  - Those needing extra accommadations will get an email from me soon.


Database Tuning
------------------


- Workload: A collection of DB operations: queries/inserts/updates/deletes
and frequency of each operation

- Response time expectations: time to first result and time to all results

- Benchmarking


Hardware/Systems Tuning
------------------------

- Database administration
- Better hardware: more memory, faster disks, CPU with fast L3 caches
- Memory per process, max run time, and other adjustable parameters

- tune to operating system, change page size
- distribute data to multiple disks using tablespaces
- parallelism to make use of multiple CPUs

Query Tuning
----------------

- track slow queries
- analyze ; - periodically recreate db statistics
- explain;  - check query plans

- write them in multiple ways and test


1. select * from t where t.a_timestamp + interval ‘3 days’ < current_timestamp; 
2. select * from t where t.a_timestamp < current_timestamp - interval ‘3 days’;

1. select ..... where extract(year from published_date) = 2022;
2. select ..... where published_date >= '01/01/2022' and published_date <='12/31/2022';


1.  select isbn13 from books where title like '%Game%';
2.  select isbn13 from books where lower(title) like 'a game%';
3.  select isbn13 from books where title like 'A Game%';

Index on title, will be scanned fully for 1 and 2, but partially for 3.

- Optimizer may think certain queries are not selective

-- Suppose A only takes values 1,2,3
A<>3 - not very selective!
A in (1,2) - more selective

- Order by/distinct are costly, avoid if not necessary
- Nested queries/subqueries: be careful, avoid correlated subqueries is possible

- Balance cost of a query with the cost of other processing
  (operations and data shipping)

------------------------------

Indices
-------

- Indexes speed up queries, slow down insert/update/delete
  - assess overall workload cost

- Clustered index/sort the data itself (only 1 per relation,
  periodically reorganize it)

MovieRoles(actorid, movieid)

Cluster by: MovieRoles(actorid, movieid)

--> extremely useful for queries: actorid = 1000
    all movies of an actor
--> not so useful for quries: movieid = 5000
    all actors in a movie


- Create all reasonable selective indices
  Indices that reduce the number of matching tuples considerably for a query
  Attributes where NDISTINCT(A) is large, A= c

   Index on R(A)

- Less selective indices: add frequently queried attributes

   Suppose B is not selective but I frequently ask:
   select C from R where B=c1;

   Index on R(B,C) -- allows index only search


Index types:

create index mr_idx on movieroles(movieid,actorid) ;
cluster movieroles using mr_idx ;

-- recluster using

cluster movierolesl


CREATE INDEX ON films ((lower(title)));

CREATE INDEX title_idx_german ON films (title COLLATE "de_DE");

CREATE INDEX title_idx_nulls_low ON films (title NULLS FIRST);

CREATE INDEX code_idx ON films (code) TABLESPACE indexspace;


-- partial indices
CREATE INDEX orders_unbilled_index ON orders (order_nr)
    WHERE billed is not true;
    
CREATE UNIQUE INDEX tests_success_constraint ON tests (subject, target)
    WHERE success;

- Consider other types of indices

Text keyword query: inverted file indices (GIN)
Spatial data query: spatial index: R-tree, quad-trees, kd-trees (GIST)

Normalization
-----------------

- Good normalized data simplifies application logic, easier to modify

A->B, B->C

R1(A,B)
R2(B,C)

- Denormalize

  - Add redundant data or aggregates to your data model to improve performance
    Actor(id, ... , nummovies) --need to implement a workflow to keep this updated

- Column store: Extreme normalization (key stores)


S(A,B,C,D,E)  A is the key

Store as: (A,B) (A,C), (A,D) (A,E)