Query Size Estimation ------------------------- How many tuples will be returned by a query? Sel() = the percentage of tuples that will pass the condition C = selectivity of condition C SELECT * FROM R WHERE ; Expected tuples in the result = Tuples(R) * Sel() Statistics: ---------------- N_DISTINCT(Attr) MinVal MaxVal ----------------- Sel(X = a) = 1/N_DISTINCT(Attr) Class (FR/So/Jr/Se) Class = 'Junior': 1/4 Sel(Age = 20) = 1/30 ------------------ Sel(a1 <= X and X <= a2) = (a2-a1)/ (maxval-minval) -------------------- Q: SELECT * FROM R WHERE R.A=S.B; sel(R.A = S.B) = 1/max(N_DISTINCT(R.A), N_DISTINCT(S.B)) Movies=> movieid primary key, R 1,000 tuples (N_DISTINCT(movies.movieid)=1000) MovieGenres=> 8,000 tuples (N_DISTINCT(MovieGenres.movieid)=950) sel(Movies.movieid = MovieGenres.movieid) = 1/max{1000,950} Tuples(Q) = Tuples(Movies)*Tuples(MovieGenres)* 1/1000 = 1000 * 8000 * 1/1000 = 8,000 -------------------- sel(not Cond) = 1 - sel(Cond) sel(Cond1 and Cond2) = sel(Cond1) * sel(Cond2) sel(Cond1 or Cond2) = sel( not ((not Cond1) and (not Cond2)) ) = 1 - ((1 - sel(Cond1)) * (1 - sel(Cond2))) Class = 'Junior' and Age = 24 -------------------- select avg_width , histogram_bounds from pg_stats where tablename = 'movies' and attname = 'duration'; select avg_width , histogram_bounds from pg_stats where tablename = 'series' and attname = 'imdbrating'; select avg_width , histogram_bounds from pg_stats where tablename = 'series' and attname = 'yearreleased'; ----------