DSA Lab 9

CSCI 2300

Data Structures and Algorithms

Spring 2000

Lab 3
Sorting Algorithms

The components in the standard library will be used frequently, so it is worth making an extra effort to provide the best implementation possible for a given algorithm or data structure. In this lab we will explore the implementation of the standard library sorting algorithm, sort. The sort algorithm has a prototype that looks like this:

          template <class Iterator>
          void sort(Iterator first, Iterator last);

where first and last are random access iterators to a container. (There is another prototype for sort that allows the programmer to specify an arbitrary comparison function, which we ignore in this lab.) The function is required to sort the container in O(n log n) time in the average case, and the natural choice of algorithm to use is Quicksort. However, as we will see, the garden variety Quicksort can be improved upon significantly.

Copy the files:

insertion_sort.h

rng.h

quicksort.h

quicksort

test.cpp

the amount of time it takes to execute;
the number of comparison operations performed; and
the number of assignments performed (similar to the number of swaps).

timer.h

counter.h

operator<

operator=

Run the program. Build and run the test program, using the above methodology (running it three times). Note that the main program is currently using the standard library sort as the tested algorithm. Jot down the results that you get so that you can use them as a baseline for comparison with your own Quicksort implementation.

Note: You may want to adjust the size of the test vectors; you want it to be big enough that the timing results are accurate, but not so big that each test takes too long. A good target is for the average time to sort the vector to be around a tenth of a second.

Implement and test Quicksort. Edit the file quicksort.h and fill in the body of the function quicksort. Don't do anything fancy (yet); recall that Quicksort is most easily specified as a recursive algorithm, and follow this pseduo-code:

          if the size of array is greater than 1
            partition the array (using the supplied function)
            recursively sort the left half
            recursively sort the right half
          endif

Comparison to other algorithms. What about other sorting algorithms? Could they produce a faster library sort? Let's try them, using functions found in the standard library:

Merge sort: replace the calls to quicksort in the main program with calls to std::stable_sort.
Heap sort: replace the calls to quicksort in the main program with calls to std::partial_sort (this function takes a third argument, which here should be the same as the second; i.e., the end() iterator).
Insertion sort (try this on a small containers, say, of size less than 10 elements): use the implementation of insertion sort given in insertion_sort.h.

Quicksort seems to be the best algorithm (except for very small containers), but our simple implementation can be improved. Let's look at some optimizations and try to get our version of Quicksort to be as fast as the one in the standard library.

Improving the base case. We can speed things up by increasing the size of the base case, and using insertion sort to handle "small" arrays less than a certain cutoff in length. (Remember that insertion sort is more efficient for small arrays.) A good value for the cutoff is anywhere between 9 and 16. An implementation of insertion sort can be found in insertion_sort.h.

Change your code to match the pseduo-code below and build and test your program, again jotting down the results. Do you notice anything?

Pseduo-code:

          if the size of array is greater than the base case cutoff length
            partition the array (using the supplied function)
            recursively sort the left half
            recursively sort the right half
          else (base case)
            insertion sort the array
          endif

A temporary setback. We are making good progress, but danger lurks just beneath the surface. Test your function again, but first modify the main program to comment out the call to the random_shuffle function. (We are therefore sorting a vector that is already sorted; surprisingly, this case is actually quite common, and so we should expect a standard sort function to handle it well). What happens? (You may want to temporarily reduce the size of the test vectors here.) Change the main program again to use the standard library sort; does it have the same problem?

How can we fix this major flaw in our implementation? If the vector is not in random order to begin with, we can provide some randomization ourselves by choosing as the pivot for the partitioning function a random element from the container, rather than just the first. The correct way to do this is to swap the first element with another element in the container chosen at random. Modify the partitioning code in your implementation to do this now (you will want to use the standard function std::iter_swap). Make use of the code in rng.h, which provides a function object to generate random numbers. You use the function object like this:

          rng random_number;     // The function object; this should be
                                 // a global variable, not a local.

  ...

          template <class Iterator, class T>
          Iterator partition(Iterator first, Iterator last, T*)
          {
           ...
            int N = 10;
            int r = random_number(N);  // Returns a number between 0 and N-1.

quicksort

A better pivot choice. While choosing the pivot at random is ideal from a theoretical standpoint, as a practical matter generating the random numbers is fairly time consuming. There is a better way to choose the pivot, known as the median-of-3 strategy. The pivot is chosen as the median of the first, middle, and last elements in the container. Here is some pseudo-code that will swap the median of the first, middle, and last elements to the first position in the container:

          if the middle is greater than the last, swap them
          if the first is greater than the last, swap them
          (note that the last element is now the largest of the 3)
          if the middle element is greater than the first, swap them
          (the first element is now the median of the 3)

(last-1)

(first + (last - first) / 2)

Build and test your program; it should still handle the presorted vector case with good performance. In fact, even on random inputs you should notice better performance, due to the fact that the median-of-3 is a better choice for the pivot than just a single random choice. Change the main program back so that it randomly mixes up the test vectors and test your function one more time, recording the results.

Using sentinels. Most of the time used by Quicksort is spent executing the "inner loops" of the partitioning algorithm. Take a look at your partitioning code; notice how each inner loop contains two tests: one to make sure we don't go off the end of the container, and another to check for an element greater (or less than) the pivot. This former test takes a considerable fraction of the time spent in the inner loops, and if we could remove it, we would expect to see a marked improvement in performance.

The way we can remove the extra test is through the use of sentinels. A sentinel is simply an element in the container that we know will prevent the loop from going off the end; for example, for the first inner loop, a sentinel would be any value greater than the pivot that we know is located before the end of the container. As another example, look at the code for insertion sort; it uses a special case to ensure that there will always be a "smaller" element found before the begining of the container, and thus its inner loop does not have to check for the begining of the array.

Since we have just chosen the pivot to be the median of three elements, the other two elements provide ideal sentinels; all we have to do is make sure they are in the right locations in the container. In fact, one of them is already in the right spot; the largest of the three elements was swapped to the last position of the container, and so serves as a sentinel for the first inner loop. The smallest of the three elements is located in the middle position; we need to swap it with the element in the second position of the container (we want to keep the pivot in the first position).

We can make one more small optimization; since we know the two sentinel elements are less than and greater than the pivot, respectively, we can start the "scans" of the partitioning function at first + 1 and last - 1. Make these changes, rebuild, and test your program, recording the results. How does your function compare to the standard library's now?

Summary of changes:

Swap the middle element (smallest of 3) with the 2nd element in the array.

Remove the test for the end of the array from the two inner loops in the partitioning algorithm.

Start the partitioning "scan" from the 2nd and 2nd-to-last elements, instead of the first and last elements.

Using an explicit stack. The recursion in Quicksort serves to "remember" the subarrays that still need to be sorted; by keeping track of these ourselves with an explicit stack data structure, we can gain some more performance.

The simplest (and fastest) way to implement this stack is simply as an array of pairs of iterators, with a variable to keep track of the "top" position, i.e.:

          std::pair<Iterator, Iterator> stack[30];
          int top = 0;

smaller

Remember that you can determine the size of a container between two iterators by subtracting them; also remember that remember that "recursively sorting the smaller half first" means that we push it onto the stack last!

          initialize stack
          push <first, last> onto stack
          while the stack is not empty
            pop <first, last> off the stack (the "current portion")
            if the size of the current portion is greater than the cutoff
              partition the current portion
              push the larger half onto the stack
              push the smaller half onto the stack
            else
              insertion sort the entire container
            endif
          endwhile

Removing tail recursion. Look at the pseudo-code above, and notice that we often push the smaller half onto the stack, followed immediately by popping it off again. We can save some time by eliminating the extraneous push/pop pair; this is known as eliminating tail recursion.

We can also move the insertion sort outside the loop, so that the entire array is insertion sorted at the end (this will run in linear time, since due to the partitioning each element in the array will be close to its final position).

          initialize stack
          push <first, last> onto stack
          while the stack is not empty
            pop <first, last> off the stack (the "current portion")
            while the size of the current portion is greater than the cutoff
              partition the current portion
              push the larger half onto the stack
              current portion is now the smaller half
            endwhile
          endwhile

          insertion sort entire container (one pass)

Eliminating still more push/pop pairs. Notice that if both the smaller half and the larger half following partitioning are smaller than the base case cutoff, we will push the larger half, ignore the smaller half, and then immediately pop the larger half back off the stack.

          initialize stack
          push <first, last> onto stack
          while the stack is not empty
            pop <first, last> off the stack (the "current portion")
            while the size of the current portion is greater than the cutoff
              partition the current portion
              if the larger half is larger than the base case cutoff
                push the larger half onto the stack
                current portion is now the smaller half
              else (both halves are below cutoff for base case)
                break
              endif
            endwhile
          endwhile

          insertion sort entire container (one pass)

Additional optimizations? If you get this far and have time, try tinkering with the code to make it even faster. Can you make it any better than it already is? (One thing to try is to notice that there is an extra push/pop done at the very begining of the main loop, when the entire container is pushed onto the stack initially and then immediately popped off. This can be eliminated by recoding the loop using a goto statement. Another possible improvement is to eliminate the overhead of doing both the partitioning and the final insertion sort pass as a function call by moving the code into the quicksort function proper.) Remember there will be some statistical variation in the timings, so to count as a real optimmization, your change must consistently result in faster results.

At the conclusion of this lab, take a moment to review the following:

You have explored some optimizations to the basic Quicksort algorithm, resulting in significant improvements in performance. You have also seen that this cause the code to more than double in size and become much harder to understand.

You should have an appreciation for the amount of effort that goes into developing a good standard library component, and for the value of knowing and using the tools that a library makes available.

Last updated: 2 Nov 1998 by valoisj@cs.rpi.edu