Introduction to Relational Databases

Course Notes

  • These notes are meant as a study guide. They will often be an outline of the material discussed in class.

  • The notes are meant to complement the book, not replace.

Terminology

  • DBMS (Database Management System) is a software tool for storing and managing large amounts of data.

    A database server is a specific installation of a DBMS.

  • Database is a collection of data organized for a specific application, often stored in a DBMS.

  • Database application is a software product that uses DBMSs to store one or more databases to accomplish a specific purpose.

What makes something a DBMS?

  • Can we call any tool for storing data a DBMS?

    • Is Excel a DBMS?

    • Is your file system a DBMS?

  • What are desired properties of a DBMS?

    • what type of data can be stored (data model)

    • store massive amounts of data

    • allow access (read/write/update) to stored data easily (query language)

    • allow durable data storage, even when there are power failures (durability)

    • allow multiple users to read and write the same data without compromising data integrity (concurrent access)

DBMS Components

A DBMS is a complex software system with many components:

  • Storage manager

    • Index/file manager

  • Database language tools

    • Data query/manipulation language compiler (DML)

    • Data definition language compiler (DDL)

  • Query execution engine

    • Buffer manager

  • Transaction manager

    • Logging and recovery

    • Concurrency control

  • A database administrator is responsible for designing the data model.

  • A database programmer is responsible for writing application software that stores the database.

  • A DBMS systems administrator is responsible for installation and tuning of the DBMS system.

A C I D

A program that changes data is called a transaction.

A DBMS is generally expected to support ACID properties for transactions that can be implemented on them:

  • Atomicity: transactions must either complete fully or leave no effect in the database.

  • Consistency: DMBS must not allow the programmers to violate the consistency rules for the database schema.

  • Isolation: multiple transactions executing at the same time should result in a database state identical to transactions executing one at a time.

  • Durability: when a transaction completes, DBMS must guarantee its results are recorded and not lost.

There may be different ways to define these properties in a DBMS as we will see.

Databases

  • Database

    is given by rules regarding data (data schema or data model) and the data (database instance)

  • Database schema (or data model) describes what types of data are valid to store

  • Database instance is the actual data that satisfies the rules of the database schema.

  • We often use the term database to refer to the database instance assuming the data model is encapsulated within the data

  • Relational data model is the most popular way to describe data schema, but many others are possible: RDF, graph data, object-oriented, object-relational

Data Model

  • Logical data model

    • Relations and attributes

    • Constraints: what is valid data and what is not

  • Physical data model

    • Where to store the data: which file systems, distributed, replicated

    • How to store the data: which indices to create

  • Application logic

    • Built on top of database queries: declarative, write once and optimize on top of the logical data model

Relational Data Model

  • Relations (or tables) store information about the world

  • Attribute (or column) is a property of a specific object represented by a relation

  • Tuple (or row) is a specific object stored in a relation.

  • Domain is a set of valid values.

    • Simple domains are integers, strings.

    • More complex domains can be defined with restrictions over these domains: a RIN is an 8 digit integer, starts with 6.

  • Schema for a relation defined the names of the attributes and the domain for the attributes.

  • Note: logical vs. physical names are used interchangeably, but remember the distinction:

    • Logical terms refer to the mathematical definition of the relational data model: based on set semantics.

    • Physical terms refer to the storage/implementation of the same data model. Sometimes the implementation is not identical to the logical model.

    For now, we care about the logical model.

Relational Data Model

  • A database is given by a set of relations.

  • Each relation has a name and stores a set of tuples.

  • Each relation schema consists of a set of attributes, the ordering of the attributes is not relevant.

  • Each attribute has a domain, the set of valid values.

Relation Instance

  • A relation contains a set of tuples

  • In a valid relation instance each tuple contains values for all the attributes in the relation schema that are drawn from the domain of that attribute.

  • We can represent a relation in one of many ways:

    • A table:

    Hero

    Alias

    Name

    Black Panther

    T’Challa

    Flash

    Barry Allen

    Jessica Jones

    Jessica Jones

    • A logical representation of tuples using predicates where the attributes are arguments of the predicate. Each tuple is a fact about the world.

      Hero('Black Panther', 'T''Challa')
      Hero('Flash', 'Barry Allen')
      Hero('Jessica Jones', 'Jessica Jones')
      
    • A set representation:

      Hero = { <'Black Panther':Alias, 'T''Challa':Name>,
               <'Flash':Alias, 'Barry Allen':Name>,
               <'Jessica Jones':Alias, 'Jessica Jones':Name> }
      
    • All representations are equivalent after we agree on the convention.

Example relation: Avengers

Name/Alias

Appearances

Gender

Year

NumYears

URL

Peter Benjamin Parker

4333

MALE

1990

25

http://marvel.wikia.com/Peter_Parker_(Earth-616)#

Steven Rogers

3458

MALE

1964

51

http://marvel.wikia.com/Steven_Rogers_(Earth-616)

James “Logan” Howlett

3130

MALE

2005

10

http://marvel.wikia.com/James_Howlett_(Earth-616)#

Anthony “Tony” Stark

3068

MALE

1963

52

http://marvel.wikia.com/Anthony_Stark_(Earth-616)

Thor Odinson

2402

MALE

1963

52

http://marvel.wikia.com/Thor_Odinson_(Earth-616)

Reed Richards

2125

MALE

1989

26

http://marvel.wikia.com/Reed_Richards_(Earth-616)#

Robert Bruce Banner

2089

MALE

1963

52

http://marvel.wikia.com/Robert_Bruce_Banner_(Earth-616)

Clinton Francis Barton

1456

MALE

1965

50

http://marvel.wikia.com/Clint_Barton_(Earth-616)

Henry Jonathan “Hank” Pym

1269

MALE

1963

52

http://marvel.wikia.com/Henry_Pym_(Earth-616)

Natalia Alianovna Romanova

1112

FEMALE

1973

42

http://marvel.wikia.com/Natalia_Romanova_(Earth-616)#

Victor Shade (alias)

1036

MALE

1968

47

http://marvel.wikia.com/Vision_(Earth-616)

Carol Susan Jane Danvers

935

FEMALE

1979

36

http://marvel.wikia.com/Carol_Danvers_(Earth-616)#

Jennifer Walters

933

FEMALE

1982

33

http://marvel.wikia.com/Jennifer_Walters_(Earth-616)#

Jessica Miriam Drew

525

FEMALE

2008

7

http://marvel.wikia.com/Jessica_Drew_(Earth-616)#

Roberto da Costa

491

MALE

2013

2

http://marvel.wikia.com/Roberto_da_Costa_(Earth-616)#

Maria Hill

359

FEMALE

2010

5

http://marvel.wikia.com/Maria_Hill_(Earth-616)#

Jessica Jones

205

FEMALE

2010

5

http://marvel.wikia.com/Jessica_Jones_(Earth-616)#

Black Panther

MALE

1966

https://marvel.fandom.com/wiki/T’Challa_(Earth-616)

Rules of Relational Data Model

  • The domain of attributes have to be simple: integer, float, decimal, string, boolean, date, time, timestample or restrictions of these (9 digit integer).

    • This restriction is called the first normal form (1NF): attributes are indivisible pieces of information (so no sets or lists for example).!

    • It says that relations are simple flat pieces of information.

  • Each relation contains a set of attributes: the ordering of attributes is not important for the meaning of the relation.

  • Each relation instance contains a set of tuples. No two tuples can repeat, because we are making a logical statement:

    • Jessica Jones is an avenger. This does not change even if we repeat this value multiple times.

  • All relations have at least one key.

Key

  • A key is a set of attributes in a relation such that:

    • no two different tuples may have the same value for the attributes in the key,

    • and no subset of these attributes is a key (i.e. it is minimal).

  • In different terms: a key is a way to identify a specific tuple.

  • Keys define the meaning of the relation.

  • All relations have a key. Some relations may have multiple keys.

  • We will discuss some basic relations: student, class, section, book

    Student(rin, name, major, year) Key: rin

    Movies(title, year, studio, boxofficevalue) Key: title, year

  • We generally underline the key attributes. See below:

    Student(\underline{rin}, name, major, year)

Movies(\underline{title, year}, studio, boxofficevalue)

  • The key really depends on what the specific data model and its attributes are representing.

    • In the above example, we are modeling an RPI student for whom RIN is unique.

    • For movies, we are assuming no two different movies with the same title will come out in the same year, because that would be confusing. But, if our model was about the specific release of a movie, then the box office value would change from country to country. So, our key would need to reflect that:

    MovieReleases(\underline{title, year, country}, studio, boxofficevalue)

Defining relations in SQL

  • To store a relation, we create a Table in a relational database system.

  • Examples of attribute types are following:

    • CHAR(n), VARCHAR(n), BIT(n)

    • BOOLEAN

    • INT

    • FLOAT, DOUBLE ; floating point precision

    • NUMERIC(n,p) ; fixed point precision

    • DATE

  • Create table command

    CREATE TABLE tablename
    (
        attribute1  datatype
        , attribute2  datatype
    
        , ...
    
        , attributen  datatype
        constraints
    ) ;
    
  • Example

    CREATE TABLE student
    (
      id int
      , name  varchar(255)
      , major  char(4)
      , enrolledDate   date
      , constraint student_pk primary key (id)
      --  student_pk is the name we have given to the primary key constraint
    )