Transactions: A tomicity C onsistency I solation D urability - Atomicity: execute all operations or none - Consistency: transform data from one consistent state to another consistent state - Isolation: execute correctly even in the presence of concurrent operations - Durability: changes made by transactions are not lost Concurrency Control: ------------------------- Transaction: T1: r1(x) w1(x) T2: r2(x) w2(x) Concurrent operations: Time flows down: Assume x = 20 | T1 T2 | r1(x) // x = 20 | x = x + 5 // x = 25 | | r2(x) // x = 20 | x = x - 5 // x = 15 | w2(x) writes 15 | | w1(x) // writes 25 | | Not serializable!!! v Time Schedule: S0: r1(x) r2(x) w1(x) w2(x) S1: r1(x) w1(x) r2(x) w2(x) S2: r2(x) w2(x) r1(x) w1(x) S1: Time flows down: Assume x = 20 | T1 T2 | r1(x) // x = 20 | x = x + 5 // x = 25 | w1(x) // writes 25 | r2(x) //x = 25 | x = x - 5 // x = 20 | w2(x) writes 20 | | v Time S2: Time flows down: Assume x = 20 | T1 T2 | r2(x) //x = 20 | x = x - 5 // x = 15 | w2(x) writes 15 | r1(x) // x = 15 | x = x + 5 // x = 20 | w1(x) // writes 20 | | v Time What is a good schedule? -------------------------- A serial schedule (where one transaction executes at a time) is a good schedule: if I execute one transaction at a time, the resulting database state is always correct. A serializable schedule is one that is guaranteed to provide the same results as a serial schedule. Conflicting operations ------------------------ Two operations conflict with each other, if they are by two different transactions and at least one of them is a write operation: r1(x) w2(x) w1(x) r2(x) w1(x) w2(x) If in two schedules, the conflicting operations occur in the same order, then their result is guaranteed to be the same: then these two schedules are equivalent S0: r1(x) [ r2(x) w1(x) ] w2(x) --> r2(x) reads values before it is changed S0': r1(x) [ w1(x) r2(x)] w2(x) --> r2(x) read the value after T1 writes S0 and S0' are not guaranteed to produce the same result A schedule S is serializable (equal to a serial schedule) iff all conflicting operations in S occur in the same order as a serial schedule. ---------- Conflicting operations ------------------------ Two operations conflict with each other, if they are by two different transactions and one of them is a write operation r1(x) w2(x) w1(x) r2(x) w1(x) w2(x) Serializability: ----------------------------- A schedule S is serializable (equal to a serial schedule) iff all conflicting operations in S occur in the same order as a serial schedule. Checking Serializability: ---------------------------- Create a conflict graph for a schedule S: -- Each transaction is a node -- For each conflict in S of the form: ri(x) wj(x) wi(x) rj(x) wi(x) wj(x) Draw an edge from Ti to Tj. T1->T2, T2->T1 If the resulting graph has a cycle, then this schedule is NOT serializable. If the resulting graph has no cycles, then this schedule is serializable. I can order transaction using topological sort on the conflict graph. Examples: S1: r1(x) r1(y) w2(y) w2(z) r3(z) w3(k) r1(k) w2(l) w1(x) S1': r1(x) r1(y) w2(y) w2(z) r3(z) r1(k) w3(k) w2(l) w1(x) Concurrency Control: ------------------------- Concurrent operations: Schedule: S0: r1(x) r2(x) w1(x) w2(x) S1: r1(x) w1(x) r2(x) w2(x) T1 -> T2 S2: r2(x) w2(x) r1(x) w1(x) T2 -> T1 S1,S2: serial schedules Conflicting operations ------------------------ Two operations conflict with each other, if they are by two different transactions and one of them is a write operation r1(x) w2(x) w1(x) r2(x) w1(x) w2(x) Serializability: ----------------------------- A schedule S is serializable (equal to a serial schedule) iff all conflicting operations in S occur in the same order as a serial schedule. Checking Serializability: ---------------------------- Create a conflict graph for a schedule S: -- Each transaction is a node -- For each conflict in S of the form: ri(x) wj(x) wi(x) rj(x) wi(x) wj(x) Draw an edge from Ti to Tj. T1->T2, T2->T1 If the resulting graph has a cycle, then this schedule is NOT serializable. If the resulting graph has no cycles, then this schedule is serializable. I can order transaction using topological sort on the conflict graph. Examples: S2: r1(x) w2(x) w1(y) r3(y) w3(z) r2(z) r3(w) w2(z) Conflicts: r1(x) w2(x) w1(y) r3(y) w3(z) r2(z) w3(z) w2(z) Is it serializable? -> No cycles, it is serializable. Equivalent serial order: T1, T3, T2 ---------------------------------- S3: r1(x) w1(x) w1(z) w3(z) r4(z) w2(w) r4(w) r2(w) r3(x) r2(z) w4(w) w4(z) Conflicts: w1(x) r3(x) w1(z) r4(z) w1(z) w3(z) w1(z) w4(z) w1(z) r2(z) w3(z) r2(z) w3(z) r4(z) w3(z) w4(z) w2(w) w4(w) w2(w) r4(w) Serializable, no cycles. T1, T3, T2, T4 Concurrency Control: ---------------------- Each transaction must obtain a lock for each operation, read (shared lock) to read an item and write (exclusive) lock to write an item. Read/Shared locks: Multiple transactions can hold shared lock on the same item Write/Exclusive locks: Only one transaction can hold a write lock on a specific item 2PL: Two phase locking: ------------------------- Any transaction can be in only one of the two phases: Growing phase: A transaction can get new locks. As soon as xact releases a lock, then it enters shrinking phase. Shrinking phase: A transaction can only release locks (but cannot get new locks). If a transaction manager implements two phase locking, then the resulting schedules are serializable. Proof: showing no cycles are possible using Two phase locking. ------------ Strict 2PL: No shrinking phase: Get locks and keep them until commit time. All schedules resulting from strict 2PL are also guaranteed to be serializable. -------------- r1(x) .... w1(x) shared_lock1(x) r1(x) .... upgrade_lock1(x) w1(x) Concurrency control with locks ---------------------------------- Two phase locking/Strict two phase locking | T1 T2 | sl1(x) | r1(x) | sl2(y) | r2(y) | request xl1(y) to w1(y) | wait | request xl2(x) to w2(x) | wait | | v Time Wait for /-----------| / v /----\ /----\ | T1 | | T2 | \----/ \----/ ^ / |----------/ Wait for Deadlocks: -> Periodically check for deadlocks: wait-for graph -> Avoid deadlocks: --------------------- What are we locking? --------------------- Relation Page Tuple Trade off: high concurrency - high overhead -------------------------- Lock Granularity ----------------- IS IX S X PHANTOM UPDATE ----------------- Read uncommitted -> possible to read uncommitted data Read committed -> possible to read the same data twice and get different results Repeatable read -> possible to have phantom updates Serializable -> not possible to have phantom updates T1: SELECT count(*) FROM episodes; Locked all tuples using S (or IS) T2: INSERT INTO episodes VALUES(....) ; << possible since no lock on a new tuple!>> To implement serializability: lock a condition! SELECT count(*) FROM episodes WHERE year = 2020;