Database Systems -- LAST LECTURE Office hours next week: monday and tuesday Tuning up a database! ------------------------- - Find out about your workload - Queries/update and their cost/frequency - Insert/update and their cost/frequency - Choose interventions: - One intervention may impact many queries in your cost (both positively and negatively) - Load test and revise! Interventions ------------------ Data modeling: - Why normalize? - Easier to maintain and use A->BCD R(A,B,C,D) R1(A,B) R2(A,C) R3(A,D) - Why denormalize? - Improve joins etc. for infrequently modified data A->B B->C R1(A,B) R2(B,C) BCNF Denormalize! R(A,B,C) ---> Query writing: - Nested queries, correlated queries, subqueries can be costly to run and harder to optimize - miss out which indices can be used - Avoid sorts if not needed, avoid conditions that will estimated as low selectivity - Use views purposefully, do not introduce more complexity through views - All DBMSs and different versions of the same DBMS may have different peculiarities, be flexible and learn to do the same thing in multiple ways. Application programming: - Ship data or ship processed data - Maintainability - Ease of debugging, catching errors, implementing atomic transactions, etc. System Level: - Store unused portions of data in different locations - Understand how to use DBMS memory effectively - Query optimization: learn how good your DBMS performs is in query size estimation - Learn distributed and parallel computing components of DBMSs - Optimize query and update heavy systems differently - Faster hard disk, more hard disks (RAID), more RAM \/ Indexing: Find and create most useful indices - A clustered index allows fast access to a range query - There can be only one clustered index per relation, and it needs to be maintained regularly. - Prioritize the most important queries (cost*frequency) --> Consider other index types: Inverted files (text fields), R-trees (spatial data) - Find the most selective conditions in your queries -> A = C and A has many distinct values - Make sure the DB size estimation agrees with you! - Create the index and see if it is being used - If multiple attributes are indexed, the ordering matters - After selective indices are exhausted, consider less selective indices that facilitate index only searches - You may consider merging multiple indices to one if updates are also frequent as a trade off \/ Some thoughts for the future ------------------------------ - Data governance: 1. Data Regulation at a Global Scale 2. Data Trust for users 3. Quality and transparency in data \/ - Data and Society Institute https://datasociety.net \/ - Ethics considerations: Weapons of Math Destruction, Cathy O'Neil Automating Inequality, Virginia Eubanks Race After Technology, Ruha Benjamin Algorithms of Oppression, How Search Engines Reinforce Racism, Safiya Umoja Noble Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass by Mary L. Gray, Siddharth Suri The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power, Shoshana Zuboff \/ ----> Please fill out course evaluations! Your voice is important. \/ Thanks for all your positive attitude in this most difficult of times: - towards the course material and engaging with it - towards others in the class, helping with material and making this class (hopefully) a positive experience I am always delighted to see your generosity of spirit. \/ If you did not have a positive attitute and have been super cranky, that is perfectly ok too. - This is not a normal time and we all cope with it differently. - Just remember to be kind and nice to one person that matters. Yourself! \/ I wish you good luck with the final exam and a very nice break. Consider being a mentor next semester or next Fall!