SQL - Part 3: Data Definition and Manipulation#

  • A lot of the mechanics for manipulating data builds on SQL select syntax. It is crucial to understand SELECT well before learning the material in this section.

Overview#

  • Many data manipulation statements refer to a single table only: you can only change one table at a time.

    • If your query needs to access data from another table Y while you are modifying table X, you must use a subquery.

    • Review the syntax and examples below carefully to see how this is done.

  • Whenever you run a statement that will change the database contents, you are running a transaction.

    • If the transaction violates some constraint in the database, it will not succeed and your statement will fail.

    • Unsuccessful statements should not change the database. See more about transactions below.

Transactions#

  • A transaction is a series of database operations executed as a logical unit. They may involve zero, one or more operations that change the data.

  • Example:

    Withdraw 100 dollars from account X:
      Read account value of X into memory
      If greater than 100 then
         subtract 100 from X
         write X back to disk
         commit
     else
         abort
    
  • Transaction have multiple properties that are highly desirable in their implementations. We will talk about two in this section:

  • Atomicity: refers to all-or-nothing approach to transactions. Either all steps of the transaction succeed, or the transaction leaves the database unchanged.

    To accomplish atomicity, the database transaction management system keeps track of all changes made and either makes them permanent if transaction succeeds (commit) or rolls them back if transactions fails or aborts (rollback).

  • Concurrency: Multiple transactions executing at the same time should not have bad effects on each other. Programmers write programs as if it is the only program executing at a time.

Transactions - Atomicity#

  • Define a transaction start point with START TRANSACTION or BEGIN.

  • A transaction either ends successfully with a COMMIT or aborted completely with a ROLLBACK.

  • When a transaction is rolled back, all its operations are reversed. It is as if it never executed.

  • A transaction may involve multiple statements and multiple tuples.

    • Whenever an update/delete/insert statement is executed, all tuples changed by the statement is part of the same transaction.

    • Sometimes an update statement triggers other operations, for example due to constraints attached to foreign keys (see below).

      All these operation become part of the same transaction.

    • If any part of a transaction fails, the whole transaction fails.

  • It is also possible to explicitly specify the beginning and end point of transactions by explicit BEGIN (or BEGIN TRANSACTION) and COMMIT/ROLLBACK statements.

Insert statement#

  • Insert a single tuple into a table:

    INSERT INTO results
    VALUES(10, 'Rahul', 'winner');
    
  • Attributes must appear in the values statement in the same order they appear in the “create table” statement.

  • All attributes must be given a value, even if is null.

  • If you want to only populate some attributes with a value, they must be listed:

    INSERT INTO episodes(id, title, signature, technical, showstopper)
    VALUES(11,'The Great Christmas Bake Off','12 Iced Biscuits','6 Laufabrauð','Hidden Design Christmas Present Cake') ;
    
  • Any missing attribute is assumed to have a NULL value.

Insert results of a query to a table#

  • You can also replace the values part of insert with a query. In this case, all the tuples returned by a query are inserted to a table.

    CREATE TABLE eliminated (
        episodeid  int
        , baker    character varying(100)
        , PRIMARY KEY (episodeid, baker)
        , FOREIGN KEY (episodeid) REFERENCES episodes(id)
        , FOREIGN KEY (baker) REFERENCES bakers(baker)
    ) ;
    
    INSERT INTO eliminated (episodeid, baker)
    SELECT
       episodeid, baker
    FROM
       results
    WHERE
       result = 'eliminated';
    
  • There is a shorthand for creating a copy of a table by copying both the schema and the tuples from a different table that basically extends the above operation:

    CREATE TABLE eliminated as
        SELECT episodeid, baker FROM results WHERE result = 'eliminated';
    

    However, this query will not create primary and foreign key constraints and these must be added explicitly:

    ALTER TABLE eliminated ADD PRIMARY KEY(episodeid, baker) ;
    ALTER TABLE eliminated ADD FOREIGN KEY (episodeid) REFERENCES episodes(id);
    ALTER TABLE eliminated ADD FOREIGN KEY (baker) REFERENCES bakers(baker);
    

Delete statement#

  • You can delete tuples from a table (but leave the table definition in the database) using the delete statement.

    DELETE FROM episodes WHERE firstaired is NULL ;
    
  • This is an example of a WHERE statement that can only refer to the tuples in the table we are deleting from:

    • For each tuple in the table, if the WHERE statement is True, then delete the tuple.

  • The following will then delete all the tuples in the table:

    DELETE FROM favorites ;
    
  • If you wanted to completely remove the table, you will then need to use the drop statement:

    DROP TABLE favorites ;
    
  • If you wanted to delete based on a different table, you need a subquery:

    DELETE FROM
        favorites f
    WHERE
        EXISTS (SELECT 1
                FROM results r
                WHERE f.episodeid = r.episodeid
                      AND f.baker=r.baker
                      AND r.result = 'star baker');
    
    • Favorites should only contain bakers who were favorites but did not win star baker in that eposide. This is a case where EXISTS is needed because you cannot use join here.

Update statement#

  • Update statement is very similar to delete.

    • Update tuples by changing the value of one or more attributes.

    • WHERE statement describes which tuples should be updated.

    ALTER TABLE episodes ADD season int ;
    ALTER TABLE episodes ADD year int ;
    
    UPDATE episodes SET season = 9 ;  -- all tuples from season 9
    
    UPDATE
        episodes
    SET
        year = extract(year from firstaired)
    WHERE
        firstaired is not null ; --only for tuples with an airdate value
    
  • If there is no WHERE statement, then all tuples are updated.

  • If you need to update based on a different table, you must use a subquery:

    ALTER TABLE bakers ADD numwins ;
    
     UPDATE
         bakers b
     SET
         b.numwins = (SELECT count(*)
                      FROM results r
                      WHERE r.baker = b.baker AND r.result='star baker') ;
    

    For all bakers, the number of wins is computed with a correlated query that returns the wins for that baker.

Foreign keys#

  • A foreign key is a referential integrity constraint:

    • R.A is a foreign key references S.B means that non-null values of R.A must be stored in S.B,

    • S.B is a unique attribute or a primary key.

  • Example

    CREATE TABLE ABC (
        X int
        , Y int,
        , PRIMARY KEY(X,Y)
    ) ;
    
    CREATE TABLE DEF (
        Z int
        , W int
        , Q int
        , PRIMARY KEY Z
        , FOREIGN KEY (Z,W)
          REFERENCES ABC(X,Y)
          ON DELETE CASCADE
          ON UPDATE SET NULL
    ) ;
    
  • This means that DEF(Z,W) can be null (as there is no not null constraint), but if they have a value, the value must exit in DEF.

  • When a tuple from ABC is deleted, tuples that reference this tuple are also deleted (CASCADE).

  • If the primary key for a tuple in ABC is updated, then the corresponding tuples in DEF are set to null.

  • If there is no corresponding ON DELETE or ON UPDATE actions, the default behavior is “RESTRICT”. In this case, an update/delete from ABC will fail if there are any tuples in DEF that reference it.

  • All these cascade and set null events become part of the same transaction as the triggering update/delete/insert.

Constraint checking#

  • All constraints are checked immediately, i.e. as soon as the relation they are attached to is changed (foreign keys are attached to the referenced relation).

  • This is not desirable for cyclic constraints:

    Example: check for egg a chicken exists and check for chicken an egg exists

  • You can defer checking of constraints to the end of the transaction:

    FOREIGN KEY (Z,W) REFERENCES ABC(X,Y)
    DEFERRABLE INITIALLY DEFERRED
    

Other constraints#

  • NOT NULL: checks that the values stored for an attribute should not be null

  • CHECK: checks for an attribute or tuple, the values satisfy a condition (anything that can be written in the WHERE clause of a query that refers to the attributes in the given table only)

  • Example

    CREATE TABLE class (
       id int PRIMARY KEY
       , code  CHAR(4)
       , name VARCHAR(50) NOT NULL
       , semester VARCHAR(5) CHECK (semester in ('Fall', 'Spring','Summer'))
       , year INT CHECK (year > 1990)
       , CHECK (code IS NOT NULL OR name = 'MISC')
    ) ;
    
  • The constraint:

    CHECK (code IS NOT NULL OR name = 'MISC')
    

    is checked when a new tuple is inserted into class or when it is updated.

Database Level Constraints#

  • Assertions: Integrity constraints can be expressed in SQL using assertions for a database, not a specific table.

    CREATE ASSERTION assertionName CHECK  ( … )
    
  • Even though assertions are an SQL standard, they are rarely implemented in production level DBMS systems.

  • Triggers: Triggers are actually programs that are executed when triggered by certain insert/update/delete statements. We will revisit triggers after discussing procedural elements of SQL programming.

    CREATE TRIGGER xyz AFTER INSERT ON transcript
    

Transactions - Isolation#

  • Isolation principle says that if one transaction executes completely before the other, than its result is acceptable.

  • Hence, any serial ordering of transactions results in an acceptable database state.

  • There is an implicit assumption that transactions are complete units of execution and are implemented correctly.

  • Let us see an example of what can go wrong if the database does not guarantee serializability.

    Given two transactions, T1 and T2 that access the same data X (tuple or attribute)

    T1: read(x), x++, write(x), read(y), y--, write(y)
    T2: read(x), x--, write(x)
    

    Assume read/write are disk operations, reading/writing data to the database. The increment/decrement are operation that take place in memory.

  • Suppose X=10, Y=10

  • Isolation says that if one transaction executes completely before the other, than its result is acceptable.

  • After: T1->T2 or T2->T1, we have: X=10,Y=9. See for example:

    Time

    T1

    T2

    1

    read(x)

    2

    x++

    3

    write(x)

    4

    read(y)

    5

    y–

    6

    write(y)

    7

    read(x)

    8

    x–

    9

    write(x)

    Which gives x=10, y=9.

  • Instead, assume the following set of operations take place in the following order of time:

    Time

    T1

    T2

    1

    read(x)

    2

    x++

    3

    read(x)

    4

    x–

    5

    write(x)

    6

    read(y)

    7

    write(x)

    8

    y–

    9

    write(y)

    Since T2 reads the value of x before it is written, T1 and T2 read the same value of x.

    The final result of this database is X=9, Y=9.

    There is no equivalent serial execution that gives this result, which is a problem.

  • Let us see a different execution order:

    Time

    T1

    T2

    1

    read(x)

    2

    x++

    3

    write(x)

    4

    read(x)

    5

    x–

    6

    read(y)

    7

    write(x)

    8

    y–

    9

    write(y)

    This one gives the result x=10, y=9, which is equivalent to a serial execution.

More on Transactions - Serializability#

  • Make sure that even though operations of different transactions may be interleaved, the resulting state is equivalent to the result of some serial execution.

  • Serializability is a desirable but costly level of isolation for implementing a transactions. This may not be necessary for different transactions. In SQL, it is possible to define for each transaction the level of isolation desirable for a transaction.

Dirty Read#

  • Dirty read: dirty read is a value written by an uncommitted transaction.

    Time

    T1

    T2

    1

    read(x)

    2

    x++

    3

    write(x)

    4

    read(x)

    Here, value read by T2 is written by T1. If T1 is not yet committed, we must not allow T2 to commit either.

    If T1 aborts, then T2 must also be aborted (leading to cascading aborts).

SQL Levels of isolation#

  • Four levels, each one overcomes a problem that may happen in the previous level

Isolation Level

Prevents Dirty Reads

Prevents Non-Repeatable Reads

Prevents Phantom Reads

READ UNCOMMITTED

No

No

No

READ COMMITTED

Yes

No

No

REPEATABLE READ

Yes

Yes

No

SERIALIZABLE

Yes

Yes

Yes

  • As each level is more restrictive, fewer transactions may run concurrently.

  • Most DBMSs do not allow READ UNCOMMITTED or force a transaction at this level to be read only. (Microsoft SQL Server is one of the rare DBMS that allows it.) It is useful only for transactions that need approximate results as it is reading potentially incorrect data.

  • A READ COMMITTED transaction allows other transactions to read/write an item after the transaction is done reading/writing it. Hence, if the same value is read again during the same transaction, its value may be different.

  • For most databases, READ COMMITTED is the default isolation level.

  • REPEATABLE READ does not allow the existing tuples in a table to be changed by another transaction while the current transaction is executing.

    But, it is possible that new tuples that may be relevant to a transaction are inserted by another transaction while the transaction is executing.

    Example: Find number of students in class CSCI-4380.

    While counting, none of the existing students may drop the class.

    But, new students may be added. This is called phantom update.

  • SERIALIZABLE does not allow phantom updates either. It is the most restrictive method, often requiring monitoring the tuples that may effect a query/insert/update condition.

  • Here is an example with transaction isolation levels:

    START TRANSACTION ;
    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    INSERT INTO T1 SELECT NAME FROM T2 ;
    INSERT INTO T1 SELECT NAME FROM T2 ;
    COMMIT ;
    
  • How isolation is implemented will be discussed in more detail when we discuss transaction management.

  • When a transaction tries to execute an action that is not allowed by their current isolation level, they may be forced to wait until it is possible or be cancelled, i.e. rolled back.

  • Sometimes a transaction may wait forever (deadlock!) and needs to be cancelled by another program or times out, also resulting in a roll back.

  • Note that transactions may commit or rollback also programmatically such as in the case of table constraint violations.

  • Atomicity means that if any part of a transaction fails, then all the changes made by the program will be erased.

    What is the result of the following transaction?

    START TRANSACTION ;
    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    INSERT INTO T1 SELECT NAME FROM T2 ;
    INSERT INTO T1 SELECT NAME FROM T2 ;
    ROLLBACK ;