CSCI  2300 Data Structures and Algorithms Spring 2000

Lab 3
Sorting Algorithms

The components in the standard library will be used frequently, so it is worth making an extra effort to provide the best implementation possible for a given algorithm or data structure. In this lab we will explore the implementation of the standard library sorting algorithm, sort. The sort algorithm has a prototype that looks like this:

          template <class Iterator>
          void sort(Iterator first, Iterator last);
where first and last are random access iterators to a container. (There is another prototype for sort that allows the programmer to specify an arbitrary comparison function, which we ignore in this lab.) The function is required to sort the container in O(n log n) time in the average case, and the natural choice of algorithm to use is Quicksort. However, as we will see, the garden variety Quicksort can be improved upon significantly.
  1. Copy the files:
  2. Create a new project as usual. For the time being, ignore insertion_sort.h and rng.h. The file quicksort.h contains an implementation of the partitioning algorithm discussed in lecture, and a stub for a quicksort function that you will implement. The main program, in test.cpp, tests a sorting algorithm using three measures: The files timer.h and counter.h provide a timer class and a special class that acts like an int type but counts the number of times operator< and operator= are called. Briefly take a look at these files, but don't worry too much about their details. Spend a little more time with the main program. There are several factors that can affect the performance of an algorithm, one being the actual input case, and another being the environment it runs in (for example, are other processes in the operating system executing concurrently). We will account for the former by running multiple trials and taking the mean of all the measurements. For the latter (which will only affect the running time, not the number of comparisons and assignments), you should run the test program three times and select the median (middle) result.
     
  3. Run the program. Build and run the test program, using the above methodology (running it three times). Note that the main program is currently using the standard library sort as the tested algorithm. Jot down the results that you get so that you can use them as a baseline for comparison with your own Quicksort implementation.
  4. Note: You may want to adjust the size of the test vectors; you want it to be big enough that the timing results are accurate, but not so big that each test takes too long. A good target is for the average time to sort the vector to be around a tenth of a second.
  5. Implement and test Quicksort. Edit the file quicksort.h and fill in the body of the function quicksort. Don't do anything fancy (yet); recall that Quicksort is most easily specified as a recursive algorithm, and follow this pseduo-code:
  6.           if the size of array is greater than 1
                partition the array (using the supplied function)
                recursively sort the left half
                recursively sort the right half
              endif
    Modify the main program to call your version of Quicksort rather than the standard library's (you need to do this in two places), and build and test the main program. Jot down the results; you should notice that your simple version of Quicksort is somewhere in the neighborhood of 30% slower than the standard library's.
     
  7. Comparison to other algorithms. What about other sorting algorithms? Could they produce a faster library sort? Let's try them, using functions found in the standard library:
  8. Are any of them faster?

    Quicksort seems to be the best algorithm (except for very small containers), but our simple implementation can be improved. Let's look at some optimizations and try to get our version of Quicksort to be as fast as the one in the standard library.
     

  9. Improving the base case. We can speed things up by increasing the size of the base case, and using insertion sort to handle "small" arrays less than a certain cutoff in length. (Remember that insertion sort is more efficient for small arrays.) A good value for the cutoff is anywhere between 9 and 16. An implementation of insertion sort can be found in insertion_sort.h.

  10.  

     

    Change your code to match the pseduo-code below and build and test your program, again jotting down the results. Do you notice anything?

    Pseduo-code:

              if the size of array is greater than the base case cutoff length
                partition the array (using the supplied function)
                recursively sort the left half
                recursively sort the right half
              else (base case)
                insertion sort the array
              endif
  11. A temporary setback. We are making good progress, but danger lurks just beneath the surface. Test your function again, but first modify the main program to comment out the call to the random_shuffle function. (We are therefore sorting a vector that is already sorted; surprisingly, this case is actually quite common, and so we should expect a standard sort function to handle it well). What happens? (You may want to temporarily reduce the size of the test vectors here.) Change the main program again to use the standard library sort; does it have the same problem?

  12.  

     

    How can we fix this major flaw in our implementation? If the vector is not in random order to begin with, we can provide some randomization ourselves by choosing as the pivot for the partitioning function a random element from the container, rather than just the first. The correct way to do this is to swap the first element with another element in the container chosen at random. Modify the partitioning code in your implementation to do this now (you will want to use the standard function std::iter_swap). Make use of the code in rng.h, which provides a function object to generate random numbers. You use the function object like this:

              rng random_number;     // The function object; this should be
                                     // a global variable, not a local.
    
      ...
    
              template <class Iterator, class T>
              Iterator partition(Iterator first, Iterator last, T*)
              {
               ...
                int N = 10;
                int r = random_number(N);  // Returns a number between 0 and N-1.
    Build and test your new function, recording the results (remember to first change the main program back to call quicksort rather than std::sort). Does this fix the problem with already sorted containers? Do you notice anything about the performance of your function?
     
  13. A better pivot choice. While choosing the pivot at random is ideal from a theoretical standpoint, as a practical matter generating the random numbers is fairly time consuming. There is a better way to choose the pivot, known as the median-of-3 strategy. The pivot is chosen as the median of the first, middle, and last elements in the container. Here is some pseudo-code that will swap the median of the first, middle, and last elements to the first position in the container:
  14.           if the middle is greater than the last, swap them
              if the first is greater than the last, swap them
              (note that the last element is now the largest of the 3)
              if the middle element is greater than the first, swap them
              (the first element is now the median of the 3)
    Remember that the "last" element is actually at position (last-1), and the middle element is at position
    (first + (last - first) / 2).)

    Build and test your program; it should still handle the presorted vector case with good performance. In fact, even on random inputs you should notice better performance, due to the fact that the median-of-3 is a better choice for the pivot than just a single random choice. Change the main program back so that it randomly mixes up the test vectors and test your function one more time, recording the results.
     

  15. Using sentinels. Most of the time used by Quicksort is spent executing the "inner loops" of the partitioning algorithm. Take a look at your partitioning code; notice how each inner loop contains two tests: one to make sure we don't go off the end of the container, and another to check for an element greater (or less than) the pivot. This former test takes a considerable fraction of the time spent in the inner loops, and if we could remove it, we would expect to see a marked improvement in performance.

  16.  

     

    The way we can remove the extra test is through the use of sentinels. A sentinel is simply an element in the container that we know will prevent the loop from going off the end; for example, for the first inner loop, a sentinel would be any value greater than the pivot that we know is located before the end of the container. As another example, look at the code for insertion sort; it uses a special case to ensure that there will always be a "smaller" element found before the begining of the container, and thus its inner loop does not have to check for the begining of the array.

    Since we have just chosen the pivot to be the median of three elements, the other two elements provide ideal sentinels; all we have to do is make sure they are in the right locations in the container. In fact, one of them is already in the right spot; the largest of the three elements was swapped to the last position of the container, and so serves as a sentinel for the first inner loop. The smallest of the three elements is located in the middle position; we need to swap it with the element in the second position of the container (we want to keep the pivot in the first position).

    We can make one more small optimization; since we know the two sentinel elements are less than and greater than the pivot, respectively, we can start the "scans" of the partitioning function at first + 1 and last - 1. Make these changes, rebuild, and test your program, recording the results. How does your function compare to the standard library's now?

    Summary of changes:
  17. Using an explicit stack. The recursion in Quicksort serves to "remember" the subarrays that still need to be sorted; by keeping track of these ourselves with an explicit stack data structure, we can gain some more performance.

  18.  

     

    The simplest (and fastest) way to implement this stack is simply as an array of pairs of iterators, with a variable to keep track of the "top" position, i.e.:

              std::pair<Iterator, Iterator> stack[30];
              int top = 0;
    How big should we make the stack? We could use a vector, which will resize itself, and not worry about it, but there is a more efficient way. If we are careful and always recursively sort the smaller half first after partitioning, the stack will never be more than log n (intuitively, this is because the smaller half will result in less stuff being put on the stack than the larger half). Since 2^30 is more than one billion (larger than any array we are likely to sort; one billion integers would take up 4 Gigabytes of memory), 30 should be more than enough.

    Remember that you can determine the size of a container between two iterators by subtracting them; also remember that remember that "recursively sorting the smaller half first" means that we push it onto the stack last!

              initialize stack
              push <first, last> onto stack
              while the stack is not empty
                pop <first, last> off the stack (the "current portion")
                if the size of the current portion is greater than the cutoff
                  partition the current portion
                  push the larger half onto the stack
                  push the smaller half onto the stack
                else
                  insertion sort the entire container
                endif
              endwhile
    Make the changes, build, test, and record the results as usual.
     
  19. Removing tail recursion. Look at the pseudo-code above, and notice that we often push the smaller half onto the stack, followed immediately by popping it off again. We can save some time by eliminating the extraneous push/pop pair; this is known as eliminating tail recursion.

  20.  

     

    We can also move the insertion sort outside the loop, so that the entire array is insertion sorted at the end (this will run in linear time, since due to the partitioning each element in the array will be close to its final position).

              initialize stack
              push <first, last> onto stack
              while the stack is not empty
                pop <first, last> off the stack (the "current portion")
                while the size of the current portion is greater than the cutoff
                  partition the current portion
                  push the larger half onto the stack
                  current portion is now the smaller half
                endwhile
              endwhile
    
              insertion sort entire container (one pass)
    Make the changes, build, test, and record the results as usual.
     
  21. Eliminating still more push/pop pairs. Notice that if both the smaller half and the larger half following partitioning are smaller than the base case cutoff, we will push the larger half, ignore the smaller half, and then immediately pop the larger half back off the stack.
  22.           initialize stack
              push <first, last> onto stack
              while the stack is not empty
                pop <first, last> off the stack (the "current portion")
                while the size of the current portion is greater than the cutoff
                  partition the current portion
                  if the larger half is larger than the base case cutoff
                    push the larger half onto the stack
                    current portion is now the smaller half
                  else (both halves are below cutoff for base case)
                    break
                  endif
                endwhile
              endwhile
    
              insertion sort entire container (one pass)
    Make the changes, build, test, and record the results as usual.
     
  23. Additional optimizations? If you get this far and have time, try tinkering with the code to make it even faster. Can you make it any better than it already is? (One thing to try is to notice that there is an extra push/pop done at the very begining of the main loop, when the entire container is pushed onto the stack initially and then immediately popped off. This can be eliminated by recoding the loop using a goto statement. Another possible improvement is to eliminate the overhead of doing both the partitioning and the final insertion sort pass as a function call by moving the code into the quicksort function proper.) Remember there will be some statistical variation in the timings, so to count as a real optimmization, your change must consistently result in faster results.

At the conclusion of this lab, take a moment to review the following:

You have explored some optimizations to the basic Quicksort algorithm, resulting in significant improvements in performance. You have also seen that this cause the code to more than double in size and become much harder to understand.
You should have an appreciation for the amount of effort that goes into developing a good standard library component, and for the value of knowing and using the tools that a library makes available.

Last updated: 2 Nov 1998 by valoisj@cs.rpi.edu