.. role:: underline :class: underline Introduction to Relational Databases ===================================== Course Notes ------------- - These notes are meant as a study guide. They will often be an outline of the material discussed in class. - The notes are meant to complement the book, not replace. Terminology ------------ - *DBMS (Database Management System)* is a software tool for storing and managing large amounts of data. A database server is a specific installation of a DBMS. - *Database* is a collection of data organized for a specific application, often stored in a DBMS. - *Database application* is a software product that uses DBMSs to store one or more databases to accomplish a specific purpose. What makes something a DBMS? ----------------------------- - Can we call any tool for storing data a DBMS? - Is Excel a DBMS? - Is your file system a DBMS? - What are desired properties of a DBMS? - what type of data can be stored (data model) - store massive amounts of data - allow access (read/write/update) to stored data easily (query language) - allow durable data storage, even when there are power failures (durability) - allow multiple users to read and write the same data without compromising data integrity (concurrent access) DBMS Components ---------------- A DBMS is a complex software system with many components: - Storage manager - Index/file manager - Database language tools - Data query/manipulation language compiler (DML) - Data definition language compiler (DDL) - Query execution engine - Buffer manager - Transaction manager - Logging and recovery - Concurrency control - *A database administrator* is responsible for designing the data model. - *A database programmer* is responsible for writing application software that stores the database. - *A DBMS systems administrator* is responsible for installation and tuning of the DBMS system. A C I D ------- A program that changes data is called a *transaction*. A DBMS is *generally* expected to support ACID properties for transactions that can be implemented on them: - Atomicity: transactions must either complete fully or leave no effect in the database. - Consistency: DMBS must not allow the programmers to violate the consistency rules for the database schema. - Isolation: multiple transactions executing at the same time should result in a database state identical to transactions executing one at a time. - Durability: when a transaction completes, DBMS must guarantee its results are recorded and not lost. There may be different ways to define these properties in a DBMS as we will see. Databases ---------- - *Database* is given by rules regarding data (data schema or data model) and the data (database instance) - *Database schema (or data model)* describes what types of data are valid to store - *Database instance* is the actual data that satisfies the rules of the database schema. - We often use the term database to refer to the database instance assuming the data model is encapsulated within the data - Relational data model is the most popular way to describe data schema, but many others are possible: RDF, graph data, object-oriented, object-relational Data Model ---------- - Logical data model - Relations and attributes - Constraints: what is valid data and what is not - Physical data model - Where to store the data: which file systems, distributed, replicated - How to store the data: which indices to create - Application logic - Built on top of database queries: declarative, write once and optimize on top of the logical data model Relational Data Model ----------------------- - *Relations (or tables)* store information about the world - *Attribute (or column)* is a property of a specific object represented by a relation - *Tuple (or row)* is a specific object stored in a relation. - *Domain* is a set of valid values. - Simple domains are integers, strings. - More complex domains can be defined with restrictions over these domains: a RIN is an 8 digit integer, starts with 6. - *Schema* for a relation defined the names of the attributes and the domain for the attributes. - Note: logical vs. physical names are used interchangeably, but remember the distinction: - Logical terms refer to the mathematical definition of the relational data model: based on set semantics. - Physical terms refer to the storage/implementation of the same data model. Sometimes the implementation is not identical to the logical model. For now, we care about the logical model. Relational Data Model ----------------------- - A database is given by a set of relations. - Each relation has a name and stores a set of tuples. - Each relation schema consists of a set of attributes, the ordering of the attributes is not relevant. - Each attribute has a domain, the set of valid values. Relation Instance ----------------- - A relation contains a set of tuples - In a valid relation instance each tuple contains values for all the attributes in the relation schema that are drawn from the domain of that attribute. - We can represent a relation in one of many ways: - A table: **Hero** ============== =============== Alias Name ============== =============== Black Panther T'Challa Flash Barry Allen Jessica Jones Jessica Jones ============== =============== - A logical representation of tuples using predicates where the attributes are arguments of the predicate. Each tuple is a fact about the world. :: Hero('Black Panther', 'T''Challa') Hero('Flash', 'Barry Allen') Hero('Jessica Jones', 'Jessica Jones') - A set representation: :: Hero = { <'Black Panther':Alias, 'T''Challa':Name>, <'Flash':Alias, 'Barry Allen':Name>, <'Jessica Jones':Alias, 'Jessica Jones':Name> } - All representations are equivalent after we agree on the convention. Example relation: Avengers --------------------------- ========================== ============ ======= ======= ========= ======================================================== Name/Alias Appearances Gender Year NumYears URL ========================== ============ ======= ======= ========= ======================================================== Peter Benjamin Parker 4333 MALE 1990 25 http://marvel.wikia.com/Peter_Parker_(Earth-616)# Steven Rogers 3458 MALE 1964 51 http://marvel.wikia.com/Steven_Rogers_(Earth-616) James "Logan" Howlett 3130 MALE 2005 10 http://marvel.wikia.com/James_Howlett_(Earth-616)# Anthony "Tony" Stark 3068 MALE 1963 52 http://marvel.wikia.com/Anthony_Stark_(Earth-616) Thor Odinson 2402 MALE 1963 52 http://marvel.wikia.com/Thor_Odinson_(Earth-616) Reed Richards 2125 MALE 1989 26 http://marvel.wikia.com/Reed_Richards_(Earth-616)# Robert Bruce Banner 2089 MALE 1963 52 http://marvel.wikia.com/Robert_Bruce_Banner_(Earth-616) Clinton Francis Barton 1456 MALE 1965 50 http://marvel.wikia.com/Clint_Barton_(Earth-616) Henry Jonathan "Hank" Pym 1269 MALE 1963 52 http://marvel.wikia.com/Henry_Pym_(Earth-616) Natalia Alianovna Romanova 1112 FEMALE 1973 42 http://marvel.wikia.com/Natalia_Romanova_(Earth-616)# Victor Shade (alias) 1036 MALE 1968 47 http://marvel.wikia.com/Vision_(Earth-616) Carol Susan Jane Danvers 935 FEMALE 1979 36 http://marvel.wikia.com/Carol_Danvers_(Earth-616)# Jennifer Walters 933 FEMALE 1982 33 http://marvel.wikia.com/Jennifer_Walters_(Earth-616)# Jessica Miriam Drew 525 FEMALE 2008 7 http://marvel.wikia.com/Jessica_Drew_(Earth-616)# Roberto da Costa 491 MALE 2013 2 http://marvel.wikia.com/Roberto_da_Costa_(Earth-616)# Maria Hill 359 FEMALE 2010 5 http://marvel.wikia.com/Maria_Hill_(Earth-616)# Jessica Jones 205 FEMALE 2010 5 http://marvel.wikia.com/Jessica_Jones_(Earth-616)# Black Panther MALE 1966 https://marvel.fandom.com/wiki/T'Challa_(Earth-616) ========================== ============ ======= ======= ========= ======================================================== Rules of Relational Data Model ------------------------------- - The domain of attributes have to be simple: integer, float, decimal, string, boolean, date, time, timestample or restrictions of these (9 digit integer). - This restriction is called the first normal form (1NF): attributes are indivisible pieces of information (so no sets or lists for example).! - It says that relations are simple flat pieces of information. - Each relation contains a set of attributes: the ordering of attributes is not important for the meaning of the relation. - Each relation instance contains a set of tuples. No two tuples can repeat, because we are making a logical statement: - Jessica Jones is an avenger. This does not change even if we repeat this value multiple times. - All relations have at least one key. Key --- - A key is a set of attributes in a relation such that: - no two different tuples may have the same value for the attributes in the key, - and no subset of these attributes is a key (i.e. it is minimal). - In different terms: a key is a way to identify a specific tuple. - Keys define the meaning of the relation. - All relations have a key. Some relations may have multiple keys. - We will discuss some basic relations: student, class, section, book Student(rin, name, major, year) Key: rin Movies(title, year, studio, boxofficevalue) Key: title, year - We generally underline the key attributes. See below: .. math:: Student(\underline{rin}, name, major, year) Movies(\underline{title, year}, studio, boxofficevalue) - The key really depends on what the specific data model and its attributes are representing. - In the above example, we are modeling an RPI student for whom RIN is unique. - For movies, we are assuming no two different movies with the same title will come out in the same year, because that would be confusing. But, if our model was about the specific release of a movie, then the box office value would change from country to country. So, our key would need to reflect that: .. math:: MovieReleases(\underline{title, year, country}, studio, boxofficevalue) Defining relations in SQL ------------------------- - To store a relation, we create a Table in a relational database system. - Examples of attribute types are following: - CHAR(n), VARCHAR(n), BIT(n) - BOOLEAN - INT - FLOAT, DOUBLE ; floating point precision - NUMERIC(n,p) ; fixed point precision - DATE - Create table command :: CREATE TABLE tablename ( attribute1 datatype , attribute2 datatype , ... , attributen datatype constraints ) ; - Example :: CREATE TABLE student ( id int , name varchar(255) , major char(4) , enrolledDate date , constraint student_pk primary key (id) -- student_pk is the name we have given to the primary key constraint )