\documentclass{article}
\usepackage{techexpl}

\input{control}
%\renewcommand{\privatenotes}[1]{{\em Comment: #1}}

\begin{document}

\thispagestyle{empty}

\begin{center}
\vspace{1.0in}
\Large\bf
Data Structures and Algorithms ---   CSCI 230 \\
Algorithm Analysis
\end{center}

\privatenotes{Presentation order differs from that of the book.
Motivating examples are presented first.}


\hdr{Motivating Examples}

Here are three standard algorithms --- two searches and one sort ---
that we will analyze to determine their computational
efficiency.

\privatenotes{Discuss briefly why we might want to do this.}

\bigskip
\bigskip

\noindent
\textbf{Sequential Search:}

\begin{verbatim}

//  Sequentially search an array of n elements to 
//  determine if a given value is there.  If so, set loc to 
//  be the first array location containing it and return true.  
//  Otherwise, return false.

bool
SeqSearch(float arr[], int n, float value, int & loc)
{
  loc=0;
  while (loc<n && arr[loc] != value) {
    ++ loc;
  }
  return loc<n;
}
\end{verbatim}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\bigskip
\bigskip
\noindent
\textbf{Insertion Sort:}

\begin{verbatim}
//  Sort an array of n elements using insertion sort.

void
InsertSort(float arr[], int n)
{
  for (int i=1; i<n; i++) {
    float temp = arr[i];
    int j = i-1;
    while (j>=0 && arr[j] > temp) {
      arr[j+1] = arr[j];
      j -- ;
    }
    arr[j+1] = temp;
  }
}
\end{verbatim}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\noindent
\textbf{Binary Search:}

\begin{verbatim}

// Use binary search to determine if a given value is
// somewhere in an ordered array of n elements.  If so,
// set loc to be the first array location containing it
// and return true.  Otherwise, set loc to be the array
// location where it should be inserted and return false.
// The array ordering is assumed to be what's called
// "non-decreasing" order, which means that
//     arr[0] <= arr[1] <= ... <= arr[n-1]
//  or, more precisely,
//     for 0 <= i < n-1, arr[i] <= arr[i+1]

bool
BinSearch(float arr[], int n, float value, int & loc)
{
  int low = 0, high = n-1, mid;
    
  //  Before each iteration of the loop, the following
  //  conditions hold:
  //     0 <= low <= high < n, 
  //     for each j, 0 <= j < low, arr[j] < value
  //     for each j, high < j < n, value <= arr[j]
  //
  while (low < high) {
    mid = (low + high) / 2;
    if (value <= arr[mid])
      high = mid;
    else
      low = mid+1;
  }

  loc = low;
  if (arr[loc] == value)
    return true;
  else {
    if (loc == n-1 && arr[n-1] < value) loc = n;
    return false;
  }
}
\end{verbatim}

\privatenotes{Explain the comments in the loop conditions}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\hdr{Exercise}

How many operations, as a function of the array size $n$, are required
by \verb$SeqSearch$?  If you finish this, try to answer the same
question for \verb$InsertSort$.  What issues arose in your discussion?

\privatenotes{I am looking for several confusions / controversies: the
definition of an operation; different actual functions as answers
(which motivates order notation); questions about how to handle nested
loops; germination of best case, worst case, average case issues in
insertion sort.}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\hdr{Algorithm Analysis Rules}
\begin{itemize}
\item The goal is to determine the worst-case or average-case time
required by an algorithm, generally as a function of the ``size'' of
the data.  (Sometimes even the ``best-case'' is considered!)

\item Assumptions:  sequential execution, simple statements cost 1
unit of time, infinite memory, integers and reals represented in a
fixed amount of memory.

\overnew
\item Generally, statements are counted to form a function $f(n)$,
where $n$ is the size of the data.  Sometimes only special operations
such as comparisons or exchanges are counted.

\item We will discuss in class rules for counting when algorithms include:

 \begin{itemize}
 \item Consecutive statements.

 \item If-then-else.

 \item Loops and nested loops.

  \privatenotes{Illustrate two styles with insertion sort: (1)
  deriving a function to count the number of operations in the inner
  loop and summing over the outer loop; (2) placing an immediate upper
  bound for the inner loop and using it to simplify the outer loop.}

 \end{itemize}
\end{itemize}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\hdr{Order Notation}

Order notation is a mathematical formalism used to summarize the
computation time required by an algorithm, simplifying the function
derived to count the number of operations.  On the positive side, this
avoids quibbling over the number of operations (and cost) involved in simple
algorithmic steps.  On the negative side, this does result in some
loss of precision.

\privatenotes{I am dropping discussion of little-oh and big omega as
too confusing and not terribly important at this stage in the
students' careers.}

\begin{itemize}
\item $T(n) = O(f(n))$ if there are constants $c$ and $n_0$ such that
$T(n) \leq c f(n)$ for all $n \geq n_0$.

\item $T(n) = \Theta(f(n))$ if and only if $T(n) = O(f(n))$ and $f(n)
= O(T(n))$.

\privatenotes{Use some of the different answers for SeqSearch
to illustrate this.}

\item Limits may be used to establish these relationships.  Suppose
\[
   \lim_{n\rightarrow \infty} \frac{f(n)}{g(n)} = L.
\]
Then
  \begin{itemize}
  \item If $L=0$, $f(n) = O(g(n))$.
  \item If $0 < L < \infty$, $f(n) = \Theta(g(n))$.
  \item If $L = \infty$, $g(n) = O(f(n))$.
  \end{itemize}
If the limit does not exist, nothing can be concluded from this test.
L'H\^opital's rule may be
used to evaluate the limit.  This requires converting $f$ and $g$ from
functions of integers to functions of real numbers, which is usually
straightforward.

\privatenotes{Show another example here.  Then skip ahead
to the first exercise.}

\end{itemize}
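For example (a sketch of the kind of limit computation we will do in
class), compare $f(n) = n \log n$ with $g(n) = n^2$:
\[
   \lim_{n\rightarrow \infty} \frac{n \log n}{n^2}
 = \lim_{n\rightarrow \infty} \frac{\log n}{n}
 = \lim_{n\rightarrow \infty} \frac{1/n}{1} = 0,
\]
where the last step uses L'H\^opital's rule (with natural logarithms;
a different base changes only a constant factor).  Hence
$n \log n = O(n^2)$, and since the limit is $0$ rather than a positive
constant, the test does not give $\Theta(n^2)$.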


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\hdr{Order Notation --- Rules for Manipulation}

\begin{itemize}
\item  If $T_1(n) = O(f(n))$ and $T_2(n) = O(g(n))$, then
  \begin{itemize}
  \item $T_1(n) + T_2(n) = \max(O(f(n)), O(g(n)))$, and
  \item $T_1(n) T_2(n) = O(f(n) \cdot g(n))$
  \end{itemize}
The same rules hold when $\Theta$ is used throughout.

\item  If $T(n)$ is a polynomial of degree $k$, then $T(n) = O(n^k)$
(in fact, $T(n) = \Theta(n^k)$).

\item $(\log n)^k = O(n)$ for any constant $k>0$, but $(\log n)^k \neq
\Theta(n)$.  Also, if $a>0$ is any \emph{fixed} constant, then $a =
O((\log n)^k)$, but $a \neq \Theta((\log n)^k)$.

\privatenotes{Work through a combined example.}

\overnew
\item  ``$O$'' estimates for summations are done in two ways.
  \begin{itemize}
  \item Evaluate the summation using techniques from Chapter~1 and
  then determine an ``$O$'' (or ``$\Theta$'') estimate from the
  resulting function.
  \item Place upper bounds on terms in the summation to simplify it
  and eliminate the summation.
  \end{itemize}

  \privatenotes{Demonstrate each of these.}

\end{itemize}
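As a small illustration of the two approaches (using $\sum_{i=1}^n i$
as the example), evaluating the summation exactly gives
\[
   \sum_{i=1}^n i = \frac{n(n+1)}{2} = \Theta(n^2),
\]
while simply bounding each term by $n$ gives
\[
   \sum_{i=1}^n i \leq \sum_{i=1}^n n = n^2 = O(n^2),
\]
which is less work but yields only the upper bound.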


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


\hdr{Order Notation Exercises}

\begin{enumerate}

\item Show that $5n^2 + 6n = O(n^2)$ using the original definition of
``$O$'' and then using limits.

\overnew
\item
For each pair of functions, $T(n)$ and $f(n)$, determine which of the
following hold:
\[
T(n) = O(f(n)) \qquad
T(n) = \Theta(f(n))
\]
Justify your answer.
(Assume $k$, $a$ and $b$ are unspecified constants greater than 1 and
$a > b$.)
\begin{enumerate}
\item
$T(n) = n^2 \displaystyle \log n + 5 n$,
$f(n) = n^3$

\item
$T(n) = \displaystyle \log (n^k)$,
$f(n) = (\displaystyle \log n)^k$

\item
$T(n) = \displaystyle \log_a n$,
$f(n) = \displaystyle \log_b n$

\item
$T(n) = 2^n$,
$f(n) = 2^{(2n)}$.
\end{enumerate}

\overnew
\item Give the best possible ``$O$'' estimate for each of the
following functions $T(n)$.
\begin{enumerate}
\item
$T(n) = (n^3 + 10 n^2) \cdot (n^3 \log n + 20 n^4)$

\answer{\begin{eqnarray*}
T(n)& = &(n^3 + 10 n^2) \cdot (n^3 \log n + 20 n^4)\\
    & = &(n^6 \log n + 10 n^5 \log n + 20 n^7 + 200 n^6)\\
    & = & O(n^7)
\end{eqnarray*}
}

\item
$T(n) = n 3^n + n^{10} + 1500 n^3 \displaystyle \log n$.

\answer{\begin{eqnarray*}
T(n)& = & n 3^n + n^{10} + 1500 n^3 \log n\\
    & = & O(n 3^n)
\end{eqnarray*}
}



\item 
$T(n) = \sum_{i=1}^n 5 i(i-1)$

\answer{\begin{eqnarray*}
T(n) & = & \sum_{i=1}^n 5 i(i-1) \\
     & = & 5 \sum_{i=1}^n i^2 - 5 \sum_{i=1}^n i\\
     & = & \frac{5}{6}n(n + 1)(2n+1) - \frac{5}{2}n(n+1)\\
     & = & \frac{5}{3}n^3 + \mbox{lower order terms}\\
     & = & O(n^3)
\end{eqnarray*}

Or, even more simply:

{\begin{eqnarray*}
T(n) & = & \sum_{i=1}^n 5 i(i-1) \\
     & = & \sum_{i=1}^n O(i^2)\\
     & = & O(n^3)
\end{eqnarray*}
}}


\end{enumerate}

\item Derive an ``$O$'' estimate for the worst-case of \verb$InsertSort$
based on the function we derived in class.

 \answer{We actually got a variety of answers for the number
of operations, depending on what we counted.
If we count the two simple statements in the body of the
outer loop and the two simple statements in the body of the inner
loop, and assume the inner loop makes the maximum number of iterations,
we get 
\begin{eqnarray*}
T(n) & = & \sum_{i=1}^{n-1} (2 + \sum_{j=0}^{i-1} 2) 
\end{eqnarray*}
Try evaluating this yourself.

\continue{First evaluating the inner sum and then the outer one,
\begin{eqnarray*}
T(n) & = & \sum_{i=1}^{n-1} (2 + 2i) \\
     & = & 2 (n - 1) + (n - 1) n \\
     & = & n^2 + n - 2\\
\end{eqnarray*}

What's this in $O$ notation?

\continue{
\begin{eqnarray*}
 T(n)  & = & O(n^2)
\end{eqnarray*}
We could count other operations, such as incrementing $i$ and $j$,
and we would get different constants in the exact formula for $T(n)$, but
constants don't matter in the $O$ notation.
}}}
\end{enumerate}



%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\hdr{Algorithm Analysis Exercises}

\begin{enumerate}

\item
Count the number of operations in each of the following two code
fragments as a function of $n$, the length of the array.  Each should
yield a summation.  Then, analyze each summation to give the best
possible ``$O$'' estimate for each fragment.
\begin{enumerate}
\overnew
\item
\begin{verbatim}
    //  assume  arr  is an array containing n integers
    int k = 5;
    for (int i=0; i<=n-k; i++) {
      sum = 0;
      for (int j=i; j<i+k; j++) {
        sum += arr[j];
      }
      cout << "Sum of elements " << i << " through "
           << i+k-1 << " is " << sum << "\n";
    }
\end{verbatim}

\answer{Counting each assignment and output statement as one
operation, we have
\begin{eqnarray*}
T(n) & = & 1 + \sum_{i=0}^{n-5} (2 + \sum_{j=i}^{i+4} 1)\\
\end{eqnarray*}
Try evaluating this yourself.  

\continue{
\begin{eqnarray*}
T(n) & = & 1 + \sum_{i=0}^{n-5} (2 + \sum_{j=i}^{i+4} 1)\\
     & = & 1 + \sum_{i=0}^{n-5} (2 + 5)\\
     & = & 1 + 7 \sum_{i=0}^{n-5} 1\\
     & = & 1 + 7 (n - 4)\\
     & = & 7n - 27\\
     & = & O(n)
\end{eqnarray*}

Note: Often when there are nested loops, the number of operations
is quadratic.  In this case, though, the inner loop
only iterates a constant number of times (5), so the total
time is linear, not quadratic.
}}

\bigskip

\overnew
\item
\begin{verbatim}
    //  assume  arr  is an array containing n integers
    int k = n/2;
    for (int i=0; i<=n-k; i++) {
      sum = 0;
      for (int j=i; j<i+k; j++) {
        sum += arr[j];
      }
      cout << "Sum of elements " << i << " through "
           << i+k-1 << " is " << sum << "\n";
    }
\end{verbatim}
\end{enumerate}

\answer{Again counting each assignment and output statement as one
operation, and taking $n$ to be even, we have 
\begin{eqnarray*}
T(n) & = & 1 + \sum_{i=0}^{n/2} (2 + \sum_{j=i}^{i+n/2-1} 1)\\
     & = & 1 + \sum_{i=0}^{n/2} (2 + n/2)\\
     & = & 1 + (n/2 + 1)(2 + n/2)\\
     & = & n^2/4 + 3n/2 + 3\\
     & = & O(n^2)
\end{eqnarray*}
}
\item
Rewrite the second code fragment to make it as efficient as possible.
Start by thinking carefully about what it actually does!  What is the
complexity of your new code fragment?

 \hint{It computes a series of sums, but they are closely related to
each other, so it's not necessary to compute each sum from scratch.}


\item In the analysis of \verb$InsertSort$ we assumed that the
worst case would occur at all times.  What must be the state of the
array for the absolute maximum number of operations to occur?  Repeat
the analysis of \verb$InsertSort$ to derive average-case and best-case
estimates.  What state of the array causes the best case to occur?

\privatenotes{Foreshadow Quick Sort.}
\end{enumerate}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\hdr{More Advanced Analysis}

\begin{itemize}

\item In analyzing recursive algorithms, usually a recursive equation
(also called a ``recurrence relation'') is derived modeling the number
of steps required, which is then solved to yield a non-recursive
formula.  We will examine this using the factorial function and then,
later, a solution to the max subsequence sum problem.

\item Logarithmic times in analysis usually arise from algorithms,
such as Binary Search and Merge Sort, that break an array or data set
in half and then consider one or both halves separately.

\privatenotes{Use binary search.}

\end{itemize}
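As a preview of the factorial example: if $T(n)$ counts the steps
needed to compute $n!$ recursively, then under our unit-cost
assumptions $T(n)$ satisfies, for some constant $c$ (its exact value
depends on what we count),
\[
   T(n) = T(n-1) + c, \qquad T(0) = c,
\]
and back-substitution unwinds the recurrence:
\[
   T(n) = T(n-1) + c = T(n-2) + 2c = \cdots = T(0) + nc = (n+1)c = O(n).
\]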


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\hdr{Max Subsequence Sum}

\privatenotes{Obviously, define the problem and give an example first.

If there is time, have the students try all of these.

The analysis of the divide-and-conquer algorithm must be done.  Show
the derivation of the recursive equation and then solve using
back-substitution.}

\begin{itemize}
\item The simplest solution:  algorithm 1 and its analysis.

\item An easy refinement: algorithm 2 and its analysis.

\item Divide-and-conquer:  algorithm 3 and its analysis.

\item A simple, fast solution:  algorithm 4 and its analysis.

\item We will confirm the analysis results experimentally.

\end{itemize}

\privatenotes{At this point do two things.  First, run the code
timing the 4 different algorithms.  Second, put up the algorithm
running time chart.}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\hdr{Exercises}
\begin{enumerate}
\item Derive a recursive equation to analyze \verb$MergeSort$ and then solve
this equation.  Assume the array is of size $n=2^k$ for integer $k
\geq 0$.  For completeness, here is the algorithm (combining material
from Ch~1 and the Ch~1 review):
\begin{verbatim}
    template <class T>
    void MergeSort(T * pts, int n)
    {
      MergeSort(pts, 0, n-1);
    }

    template <class T>
    void MergeSort(T * pts, int low, int high)
    {
      if (low >= high) return;

      int mid = (low + high) / 2;
      MergeSort(pts, low, mid);
      MergeSort(pts, mid+1, high);

      //  At this point the lower and upper halves
      //  of "pts" are sorted. All that remains is
      //  to merge them into a single sorted list.
      T* temp = new T[high-low+1];  
         // scratch array for merging
      int i=low, j=mid+1, loc=0;

      //  while neither the left nor the right half is exhausted, 
      //  take the next smallest value into the temp array
      while (i<=mid && j<=high) {
        if (pts[i] < pts[j]) temp[loc++] = pts[i++];
        else                   temp[loc++] = pts[j++];
      }

      //  copy the remaining values --- only one of 
      //  these will iterate
      for (; i<=mid; i++, loc++) temp[loc] = pts[i];
      for (; j<=high; j++, loc++) temp[loc] = pts[j];

      //  copy back from the temp array
      for (loc=0, i=low; i<=high; loc++, i++) pts[i]=temp[loc];
      delete [] temp;
    }
\end{verbatim}


\item Find an efficient algorithm (along with a running time analysis)
to find the \emph{minimum} subsequence sum.
\end{enumerate}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\hdr{Review Problems}

Here are a few review problems which have appeared on homeworks or
tests in previous semesters.  Practice writing solutions carefully and
then compare to solutions provided on-line.  If you can solve these
problems and the problems we worked on in class then you are ready for
the chapter quiz!

\begin{enumerate}

%% From spring 96, test 1
\item  Show that ${\displaystyle \sum_{i=1}^n 2 i^3 = O(n^4)}$.

\item For each of the following, find $f(n)$ such
that $t(n) = O(f(n))$.  Make $f(n)$ as small and simple as possible,
i.e. don't write $t(n) = O(n^4)$ when $t(n) = O(n^3)$.  Justify your
answers.
\begin{enumerate}
\item $ \displaystyle t(n) = 13 n^2 +  2^n $

\item $ \displaystyle t(n) = 5 (n + 3 \log n) (n \log n + 13) \log n +
13 n^2$
\item $ \displaystyle t(n) =  \sum_{i=3}^{n} \sum_{j=i}^n i (n - j)$
\end{enumerate}

\item Exercise 2.6a from the text.  Try to derive summations first.
Note program fragment~(6) is quite difficult.

\item
Derive a summation to count, as a function of $n$, the number of times
\verb$Hello$ is output by each of the following code fragments.
Obtain an accurate ``$O$'' estimate from the summation.

\begin{enumerate}
\item
\begin{verbatim}
    for (i=1; i<=n; i++) 
      for (j=1; j<=i; j++)
        for (k=j+1; k<=n; k++)
          cout << "Hello\n";
\end{verbatim}


\item For this part, assume $n = 2^k$ and assume the notation
\verb$2^i$ means $2^i$.
\begin{verbatim}
    for (i=0; i<=k; i++) 
      for (j=2^i+1; j<=n; j++)
        cout << "Hello\n";
\end{verbatim}

\end{enumerate}

\item Exercise 2.11 of the text.

\item Write an algorithm that takes an unsorted list (provided as an
array) of $n$ floating point values and returns the smallest
difference between any two values in the list.  For example, for the
list
\begin{verbatim}
        2.9, 3.5, 1.1, 6.1, 2.3, 1.8, 8.7, 3.0, 2.4
\end{verbatim}
the algorithm should return 0.1, which is the difference between 3.0
and 2.9.  Make your algorithm as efficient as you can and give the
worst-case running time of your algorithm as a function of $n$,
briefly justifying your answer.  Hint: you may change or reorganize
the contents of the array.

\end{enumerate}


\end{document}
