Introduction to Relational Databases

Course Notes

  • These notes are meant as a study guide. They will often be an outline of the material discussed in class.
  • The notes are meant to complement the book, not replace.

Terminology

  • DBMS (Database Management System) is a software tool for storing and managing large amounts of data.
  • Database is a collection of data organized for a specific application, often stored in a DBMS.
  • Database application is a software product that uses DBMSs to store one or more databases to accomplish a specific purpose.

What makes something a DBMS?

  • Can we call any tool for storing data a DBMS?
    • Is Excel a DBMS?
    • Is your file system a DBMS?
  • What are desired properties of a DBMS?
    • what type of data can be stored (data model)
    • find and change stored data easily (query language)
    • store massive amounts of data
    • allow durable data storage, even when there are power failures (durability)
    • allow multiple users to read and write the same data (consistency)

DBMS Components

A DBMS is a complex software system with many components:

  • Storage manager
    • Index/file manager
  • Database language tools
    • Data query/manipulation language compiler (DML)
    • Data definition language compiler (DDL)
  • Query execution engine
    • Buffer manager
  • Transaction manager
    • Logging and recovery
    • Concurrency control
  • A database administrator is responsible for designing the data model.
  • A database programmer is responsible for writing application software that stores the database.
  • A DBMS systems administrator is responsible for installation and tuning of the DBMS system.

A C I D

A program that changes data is called a transaction.

A DBMS is generally expected to support ACID properties for transactions that can be implemented on them:

  • Atomicity: transactions must either complete fully or leave no effect in the database.
  • Consistency: DMBS must not allow the programmers to violate the consistency rules for the database schema.
  • Isolation: multiple transactions executing at the same time should result in a database state identical to transactions executing one at a time.
  • Durability: when a transaction completes, DBMS must guarantee its results are recorded and not lost.

There may be different ways to define these properties in a DBMS as we will see.

Databases

  • Database
    is given by rules regarding data (data schema or data model) and the data (database instance)
  • Database schema (or data model) describes what types of data are valid to store
  • Database instance is the actual data that satisfies the rules of the database schema.
  • We often use the term database to refer to the database instance assuming the data model is encapsulated within the data
  • Relational data model is the most popular way to describe data schema, but many others are possible: RDF, graph data, object-oriented, object-relational

Data Model

  • Logical data model
    • Relations and attributes
    • Constraints: what is valid data and what is not
  • Physical data model
    • Where to store the data: which file systems, distributed, replicated
    • How to store the data: which indices to create
  • Application logic
    • Built on top of database queries: declarative, write once and optimize on top of the logical data model

Relational Data Model

  • Relations (or tables) store information about the world

  • Attribute (or column) is a property of a specific object represented by a relation

  • Tuple (or row) is a specific object stored in a relation.

  • Domain is a set of valid values.

    • Simple domains are integers, strings.
    • More complex domains can be defined with restrictions over these domains: a RIN is an 8 digit integer, starts with 6.
  • Schema for a relation defined the names of the attributes and the domain for the attributes.

  • Note: logical vs. physical names are used interchangeably, but remember the distinction:

    • Logical terms refer to the mathematical definition of the relational data model: based on set semantics.
    • Physical terms refer to the storage/implementation of the same data model. Sometimes the implementation is not identical to the logical model.

    For now, we care about the logical model.

Relational Data Model

  • A database is given by a set of relations.
  • Each relation has a name and stores a set of tuples.
  • Each relation schema consists of a set of attributes, the ordering of the attributes is not relevant.
  • Each attribute has a domain, the set of valid values.

Relation Instance

  • A relation contains a set of tuples

  • In a valid relation instance each tuple contains values for all the attributes in the relation schema that are drawn from the domain of that attribute.

  • We can represent a relation in one of many ways:

    • A table:

    Hero

    Alias Name
    Flash Barry Allen
    Arrow Oliver Queen
    Jessica Jones Jessica Jones
    • A logical representation of tuples using predicates where the attributes are arguments of the predicate. Each tuple is a fact about the world.

      Hero('Flash', 'Barry Allen')
      Hero('Arrow', 'Oliver Queen')
      Hero('Jessica Jones', 'Jessica Jones')
      
    • A set representation:

      Hero = { <'Flash':Alias, 'Barry Allen':Name>,
               <'Arrow':Alias, 'Oliver Queen':Name>,
               <'Jessica Jones':Alias, 'Jessica Jones':Name> }
      
    • All representations are equivalent after we agree on the convention.

Example relation: Avengers

Name/Alias Appearances Gender Year NumYears URL
Peter Benjamin Parker 4333 MALE 1990 25 http://marvel.wikia.com/Peter_Parker_(Earth-616)#
Steven Rogers 3458 MALE 1964 51 http://marvel.wikia.com/Steven_Rogers_(Earth-616)
James “Logan” Howlett 3130 MALE 2005 10 http://marvel.wikia.com/James_Howlett_(Earth-616)#
Anthony “Tony” Stark 3068 MALE 1963 52 http://marvel.wikia.com/Anthony_Stark_(Earth-616)
Thor Odinson 2402 MALE 1963 52 http://marvel.wikia.com/Thor_Odinson_(Earth-616)
Reed Richards 2125 MALE 1989 26 http://marvel.wikia.com/Reed_Richards_(Earth-616)#
Robert Bruce Banner 2089 MALE 1963 52 http://marvel.wikia.com/Robert_Bruce_Banner_(Earth-616)
Clinton Francis Barton 1456 MALE 1965 50 http://marvel.wikia.com/Clint_Barton_(Earth-616)
Henry Jonathan “Hank” Pym 1269 MALE 1963 52 http://marvel.wikia.com/Henry_Pym_(Earth-616)
Natalia Alianovna Romanova 1112 FEMALE 1973 42 http://marvel.wikia.com/Natalia_Romanova_(Earth-616)#
Victor Shade (alias) 1036 MALE 1968 47 http://marvel.wikia.com/Vision_(Earth-616)
Carol Susan Jane Danvers 935 FEMALE 1979 36 http://marvel.wikia.com/Carol_Danvers_(Earth-616)#
Jennifer Walters 933 FEMALE 1982 33 http://marvel.wikia.com/Jennifer_Walters_(Earth-616)#
Jessica Miriam Drew 525 FEMALE 2008 7 http://marvel.wikia.com/Jessica_Drew_(Earth-616)#
Roberto da Costa 491 MALE 2013 2 http://marvel.wikia.com/Roberto_da_Costa_(Earth-616)#
Maria Hill 359 FEMALE 2010 5 http://marvel.wikia.com/Maria_Hill_(Earth-616)#
Jessica Jones 205 FEMALE 2010 5 http://marvel.wikia.com/Jessica_Jones_(Earth-616)#

Rules of Relational Data Model

  • The domain of attributes have to be simple: integer, float, decimal, string, boolean, date, time, timestample or restrictions of these (9 digit integer).
    • This restriction is called the first normal form (1NF): attributes are indivisible pieces of information.!
    • It says that relations are simple flat pieces of information.
  • Each relation contains a set of attributes: the ordering of attributes is not important for the meaning of the relation.
  • Each relation instance contains a set of tuples. No two tuples can repeat, because we are making a logical statement:
    • Jessica Jones is an avenger. This does not change even if we repeat this value multiple times.
  • All relations have at least one key.

Key

  • A key is a set of attributes in a relation such that no two different tuples may have the same value for the attributes in the key.

  • In different terms: a key is a way to identify a specific tuple.

  • Keys define the meaning of the relation.

  • All relations have a key. Some relations may have multiple keys.

  • We will discuss some basic relations: student, class, section, book

    Student(rin, name, major, year) Key: rin

    Movies(title, year, studio, boxofficevalue) Key: title, year

  • We generally underline the key attributes. See below:

    Student(\underline{rin}, name, major, year)

Movies(\underline{title, year}, studio, boxofficevalue)

Defining relations in SQL

  • To store a relation, we create a Table in a relational database system.

  • Examples of attribute types are following:

    • CHAR(n), VARCHAR(n), BIT(n)
    • BOOLEAN
    • INT
    • FLOAT, DOUBLE ; floating point precision
    • NUMERIC(n,p) ; fixed point precision
    • DATE
  • Create table command

    CREATE TABLE tablename
    (
        attribute1  datatype
        , attribute2  datatype
    
        , ...
    
        , attributen  datatype
        constraints
    ) ;
    
  • Example

    CREATE TABLE student
    (
      id int
      , name  varchar(255)
      , major  char(4)
      , enrolledDate   date
      , constraint student_pk primary key (id)
      --  student_pk is the name we have given to the primary key constraint
    )
    

SQL for changing tables

  • Delete a table (with all the tuples in it):

    DROP TABLE tablename ;
    
  • Add a new attribute to a table:

    ALTER TABLE tableName add attributeName attributeType ;
    
  • Remove an attribute from a table:

    ALTER TABLE tableName drop attributeName ;