Introduction to Relational Databases

Course Notes

  • These notes are meant as a study guide. They will often be an outline of the material discussed in class.
  • The notes are meant to complement the book, not replace.


  • DBMS (Database Management System) is a software tool for storing and managing large amounts of data.
  • Database is a collection of data organized for a specific application, often stored in a DBMS.
  • Database application is a software product that uses DBMSs to store one or more databases to accomplish a specific purpose.

What makes something a DBMS?

  • Can we call any tool for storing data a DBMS?
    • Is Excel a DBMS?
    • Is your file system a DBMS?
  • What are desired properties of a DBMS?
    • what type of data can be stored (data model)
    • find and change stored data easily (query language)
    • store massive amounts of data
    • allow durable data storage, even when there are power failures (durability)
    • allow multiple users to read and write the same data (consistency)

DBMS Components

A DBMS is a complex software system with many components:

  • Storage manager
    • Index/file manager
  • Database language tools
    • Data query/manipulation language compiler (DML)
    • Data definition language compiler (DDL)
  • Query execution engine
    • Buffer manager
  • Transaction manager
    • Logging and recovery
    • Concurrency control
  • A database administrator is responsible for designing the data model.
  • A database programmer is responsible for writing application software that stores the database.
  • A DBMS systems administrator is responsible for installation and tuning of the DBMS system.


A program that changes data is called a transaction.

A DBMS is generally expected to support ACID properties for transactions that can be implemented on them:

  • Atomicity: transactions must either complete fully or leave no effect in the database.
  • Consistency: DMBS must not allow the programmers to violate the consistency rules for the database schema.
  • Isolation: multiple transactions executing at the same time should result in a database state identical to transactions executing one at a time.
  • Durability: when a transaction completes, DBMS must guarantee its results are recorded and not lost.

There may be different ways to define these properties in a DBMS as we will see.


  • Database

    is given by rules regarding data (data schema or data model) and the data (database instance)

  • Database schema (or data model) describes what types of data are valid to store

  • Database instance is the actual data that satisfies the rules of the database schema.

  • We often use the term database to refer to the database instance assuming the data model is encapsulated within the data

  • Relational data model is the most popular way to describe data schema, but many others are possible: RDF, graph data, object-oriented, object-relational

Data Model

  • Logical data model
    • Relations and attributes
    • Constraints: what is valid data and what is not
  • Physical data model
    • Where to store the data: which file systems, distributed, replicated
    • How to store the data: which indices to create
  • Application logic
    • Built on top of database queries: declarative, write once and optimize on top of the logical data model

Relational Data Model

  • Relations (or tables) store information about the world

  • Attribute (or column) is a property of a specific object represented by a relation

  • Tuple (or row) is a specific object stored in a relation.

  • Domain is a set of valid values.

    • Simple domains are integers, strings.
    • More complex domains can be defined with restrictions over these domains: a RIN is an 8 digit integer, starts with 6.
  • Schema for a relation defined the names of the attributes and the domain for the attributes.

  • Note: logical vs. physical names are used interchangeably, but remember the distinction:

    • Logical terms refer to the mathematical definition of the relational data model: based on set semantics.
    • Physical terms refer to the storage/implementation of the same data model. Sometimes the implementation is not identical to the logical model.

    For now, we care about the logical model.

Relational Data Model

  • A database is given by a set of relations.
  • Each relation has a name and stores a set of tuples.
  • Each relation schema consists of a set of attributes, the ordering of the attributes is not relevant.
  • Each attribute has a domain, the set of valid values.

Relation Instance

  • A relation contains a set of tuples

  • In a valid relation instance each tuple contains values for all the attributes in the relation schema that are drawn from the domain of that attribute.

  • We can represent a relation in one of many ways:

    • A table:


    Alias Name
    Flash Barry Allen
    Arrow Oliver Queen
    Jessica Jones Jessica Jones
    • A logical representation of tuples using predicates where the attributes are arguments of the predicate. Each tuple is a fact about the world.

      Hero('Flash', 'Barry Allen')
      Hero('Arrow', 'Oliver Queen')
      Hero('Jessica Jones', 'Jessica Jones')
    • A set representation:

      Hero = { <'Flash':Alias, 'Barry Allen':Name>,
               <'Arrow':Alias, 'Oliver Queen':Name>,
               <'Jessica Jones':Alias, 'Jessica Jones':Name> }
    • All representations are equivalent after we agree on the convention.

Example relation: Avengers

Name/Alias Appearances Gender Year NumYears URL
Peter Benjamin Parker 4333 MALE 1990 25
Steven Rogers 3458 MALE 1964 51
James “Logan” Howlett 3130 MALE 2005 10
Anthony “Tony” Stark 3068 MALE 1963 52
Thor Odinson 2402 MALE 1963 52
Reed Richards 2125 MALE 1989 26
Robert Bruce Banner 2089 MALE 1963 52
Clinton Francis Barton 1456 MALE 1965 50
Henry Jonathan “Hank” Pym 1269 MALE 1963 52
Natalia Alianovna Romanova 1112 FEMALE 1973 42
Victor Shade (alias) 1036 MALE 1968 47
Carol Susan Jane Danvers 935 FEMALE 1979 36
Jennifer Walters 933 FEMALE 1982 33
Jessica Miriam Drew 525 FEMALE 2008 7
Roberto da Costa 491 MALE 2013 2
Maria Hill 359 FEMALE 2010 5
Jessica Jones 205 FEMALE 2010 5

Rules of Relational Data Model

  • The domain of attributes have to be simple: integer, float, decimal, string, boolean, date, time, timestample or restrictions of these (9 digit integer).
    • This restriction is called the first normal form (1NF): attributes are indivisible pieces of information.!
    • It says that relations are simple flat pieces of information.
  • Each relation contains a set of attributes: the ordering of attributes is not important for the meaning of the relation.
  • Each relation instance contains a set of tuples. No two tuples can repeat, because we are making a logical statement:
    • Jessica Jones is an avenger. This does not change even if we repeat this value multiple times.
  • All relations have at least one key.


  • A key is a set of attributes in a relation such that no two different tuples may have the same value for the attributes in the key.

  • In different terms: a key is a way to identify a specific tuple.

  • Keys define the meaning of the relation.

  • All relations have a key. Some relations may have multiple keys.

  • We will discuss some basic relations: student, class, section, book

    Student(rin, name, major, year) Key: rin

    Movies(title, year, studio, boxofficevalue) Key: title, year

  • We generally underline the key attributes. See below:

    Student(\underline{rin}, name, major, year)

Movies(\underline{title, year}, studio, boxofficevalue)

Defining relations in SQL

  • To store a relation, we create a Table in a relational database system.

  • Examples of attribute types are following:

    • CHAR(n), VARCHAR(n), BIT(n)
    • INT
    • FLOAT, DOUBLE ; floating point precision
    • NUMERIC(n,p) ; fixed point precision
    • DATE
  • Create table command

    CREATE TABLE tablename
        attribute1  datatype
        , attribute2  datatype
        , ...
        , attributen  datatype
    ) ;
  • Example

    CREATE TABLE student
      id int
      , name  varchar(255)
      , major  char(4)
      , enrolledDate   date
      , constraint student_pk primary key (id)
      --  student_pk is the name we have given to the primary key constraint

SQL for changing tables

  • Delete a table (with all the tuples in it):

    DROP TABLE tablename ;
  • Add a new attribute to a table:

    ALTER TABLE tableName add attributeName attributeType ;
  • Remove an attribute from a table:

    ALTER TABLE tableName drop attributeName ;