SELECT R.B, S.E
FROM R, S
WHERE R.A=10 AND R.B <=200
      AND S.C = R.D;
Number of tuples expected in the output of this query?

Tuples(R)=10,000   PAGES(R)=1,000    (Each page can have 10 tuples)
Tuples(S)=20,000   PAGES(S)= 800     

NDistinct(R.A)=500
NDistinct(R.D)=10,000
NDistinct(S.C)=12,000
MinVal(R.B)=1
MinVal(R.B)=1000

SEL(R.A=10) = 1/NDISTINCT(R.A)
SEL(R.B <=200) = 200-minval(R.B)/(maxval(R.B)-minVal(R.B)
SEL(S.C = R.D) = 1/max(NDISTINCT(S.C), NDISTINCT(R.D))

TUPLES(R)*TUPLES(S)* SEL(R.A=10)*SEL(R.B <=200)*SEL(S.C = R.D)
10,000*20,000*1/500*200/1000*1/12,000

-------------
Query Plan for:

Index scan on I1 on R(B,A) with 500 leaf nodes and 3 level


Join the result of this with S using block nested loop join with M=30


Index scan using: R.A=10 AND R.B <=200

Cost of the index scan (find all tuples satisfying R.A=10 AND R.B <=200)

Scan the index for R.B<=200  (200/1000, 1/5)

500/5=100 leaf nodes + 2

Read 4 tuples to find R.D

Index cost = 2 + 100 + 4 = 106

--> Which condition am I going to scan the index by?
    --> How many leaf nodes?

---> How many matching tuples am I going to find after scanning?
---> Do I need to read these tuples from data pages due to missing attributes
     needed for the query?

--------------
# tuples in the output
and how many disk pages they take?

TUPLES(R)*SEL(R.A=10)*SEL(R.B <=200) = 
10,000 * 1/500*200/1000 =  4 tuples (1 page)

Due to pipelining, there is no additional cost of reading R (it is already
in memory after  index scan).

How many times do I have to read S? Since R after selection fits in 1
block, I only have to read S once!

Total cost = 106 (for index scan) + 800 (PAGES(S)) 


------------------------

R1 = (SELECT R)
      Sequential scan: cost = PAGES(R) = 1,000

R2 = (SELECT R) join S
      2*500
      
R3 = SORT (R2)
     4000 (write 40 sorted groups) + 4000 (read/merge/output)

------------------------
Cost of joining

 R1 = 200 pages and S= 500 pages using M=101

  R1 once, S two times
  Cost = 200 + 2*500 
------------------------
Sorting a relation of 4,000 pages
  M = 100

4000 + 4000  ( 40 sorted groups)
4000 and output

Cost of sort = 3*4000

------------
A. Which condition am I going to scan the index by?
B. How many matching tuples am I going to find after scanning?
C. Do I need to read these tuples from data pages due to missing attributes
   needed for the query?


Index scan examples:

Q1: SELECT B,E FROM R WHERE B<=200 AND B>=10 AND C=10 AND A=5
Q2: SELECT C,D FROM R WHERE C<10 AND B=10

Index I1 on R(A,B,C,D)

Scan for Q1:
    A. scan for A and B conditions
    B. find matching for A,B,C conditions
    C. yes, I need to read for the E attribute
Scan for Q2:
    A. scan the whole leaf
    B. find matching B,C conditions
    C. no
Index I2 on R(C,A,E)

Scan for Q1:
    A. Scan for A,C conditions
    B. Find tuples matching A,C conditions
    C. Yes, to check the B condition
Scan for Q2:
    A. Scan for C condition and read matching tuples from Data pages

Q1: SELECT B,E FROM R WHERE B<=200 AND B>=10 AND C=10 AND A=5
Q2: SELECT C,D FROM R WHERE C<10 AND B=10

Index I3 on R(C,D,A)

Scan for Q1:
     A. Scan for C=10
     B. Find matching tuples for A,C attributes
     C. Yes, for B condition and attribute E
Scan for Q2:
     A. Scan for C and read matching tuples

-----------

R(A,B,C,D,E,F,G)

{AB->CD, D->EG, G->A}

Key: ABF, GBF

In BCNF?

AB->CD  No, AB is not a superkey
D->EG   D is not a superkey
G->A    G is not a superkey

In 3NF?

AB->CD  No, AB is not a superkey and CD are not prime
-----------

3NF decomposition

{AB->CD, D->EG, G->A}


ABCD  AB->CD   Key: AB
DEG   D->EG    Key: D
AG    G->A     Key: G
ABF            Key: ABF AB=>F

------------------

BCNF decomposition

R(A,B,C,D,E,F,G)

{AB->CD, D->EG, G->A, AB=>F}

Key: ABF, GBF

In BCNF?

AB->CD  No, AB is not a superkey
D->EG   D is not a superkey
G->A    G is not a superkey

---- D->EG

D+ = {D,E,G,A}   D->EGA

R1(A,D,E,G)   D->EG, G->A,  in BCNF? Key: D, G->A violates BCNF

R2(B,C,D,F)   BD->C   Key: BDF, BD is not a superkey, not IN BCNF

R1(A,D,E,G)   D->EG, G->A,  in BCNF? Key: D, G->A violates BCNF

R11(A,G)   G->A
R12(D,E,G) D->EG


R2(B,C,D,F)   BD->C   Key: BDF, BD is not a superkey, not IN BCNF


R21(B,C,D)  BD->C
R22(B,D,F)  {}

The result:

AG
DEG
BCD
BDF

---------

4NF decomposition

ABFH

AB=>F

(ABF)
(ABH)

----------------


People:  id, name
Students: class
Staff:   rank

Converting to relational data model:

Model 1:

People(id, name, class, rank, isstudent, isstaff)

Model 2:

People(id, name)

Student(id, class)

Staff(id, rank)

To query students: People join Student
To query staff: People join Staff

create table people ....

create table students under people ...
create table staff under people ...


Model 3:

Student(id, name, class)
Staff(id, name, rank)

---------

Weak entities

AwardType({Type,Name})
AwardYear({AwardType,AwwardName,Year})

If awardyear was a strong entity

AwardYear({id}, year,awardtype,awardname)

---------------------


Which of the following schedule and sequence of locks are possible
under two phase locking (Note: sl, xl, ul stand for shared lock, xlock
and unlock of a data item. Assume that xl works like an upgrade
lock.)?

sl1(x) sl2(x) sl2(y) r1(x) r2(x) r2(y) ul2(y) ul2(x) xl1(x) xl1(y) w1(x) w1(y) [not possible! xl2(z)] w2(z)


sl2(x) sl2(y) sl1(x) r1(x) r2(x) r2(y) ul2(x) ul2(y) xl1(x) w1(x) [not possible! w1(y)] xl1(y) xl2(z) w2(z)

T1: xl x  
T2:  <shrinking phase>

sl2(x) sl2(y) sl1(x) r1(x) r2(x) r2(y) ul2(x) ul2(y) xl1(x) [not possible! xl2(z)] w1(x) w1(y) xl1(y) w2(z)


T1: xl x
T2: <shrinking phase>

sl2(x) sl2(y) sl1(x) r1(x) r2(x) r2(y) xl2(z) ul2(x) ul2(y) [not possible! w1(x)] w1(y) xl1(y) w2(z)

T1: sl x
T2:  xl z <shrinking phase>

sl1(x) sl2(x) r1(x) r2(x) sl2(y) r2(y) xl2(z) ul2(y) ul2(x) xl1(x) xl1(y) w1(x) w1(y) w2(z)
Possible!

T1: xl x xl y
T2:  xl z <shrinking phase>

xl1(x) sl2(x) r1(x) r2(x) sl2(y) r2(y) xl2(z) ul2(y)  xl1(y) w1(x) w1(y) w2(z)


-------------------

Question 3. For the following schedule, check all that is correct:

r1(x) r3(y) w3(y) r1(t) w1(x) r3(z) r3(z) r2(w) w2(y) w2(z)

Conflicts:

r3(y) w2(y)
w3(y) w2(y)
r3(z) w2(z)

T3->T2

T1 T3 T2
T3 T1 T2
T3 T2 T1

--------------

books(isbn13, title, published_date)
authors(isbn13, author)

For each author, return the first book they have written
  >> such that for this author and book, there is no earlier
     book by the same author

select
    a.author
    , a.isbn13
    , b.published_date
from
    authors a
    , books b
where
    a.isbn13 = b.isbn13
    and not exists (a book by the same author written earlier,
                    i.e. before b.published_date)


select
    a.author
    , a.isbn13
    , b.published_date
from
    authors a
    , books b
where
    a.isbn13 = b.isbn13
    and not exists
        (SELECT * from authors a2, books b2
	 WHERE a2.author = a.author  -- same author
	       and a2.isbn13 <> a.isbn13  --- different book
	       and a2.isbn13 = b2.isbn13
	       and b2.published_date < b.published_date  -- written earlier )