The C++ Standard Library specification of generic algorithms
places requirements not only on the syntax (interfaces) and semantics
(input/output relations) of the algorithms, but also on the computing
time the algorithms are allowed to take. Most of the fundamental data
processing algorithms specified have O(N) (linear time) or
O(N logN) bounds, but in a few cases
O(N^{2}) (quadratic) worst case time behavior is
permitted. Such bounds were specified wherever there were algorithms
known, such as quicksort, that were significantly faster than other
algorithms in the average case, and the cases that could cause
quadratic behavior were rarely encountered in practice. It would be
much more satisfactory, however, if these bounds could be tightened,
disallowing quadratic behavior for these generic algorithms entirely.
The following papers show how this goal can be achieved.
 David R. Musser, Introspective Sorting and Selection
Algorithms (draft
April 1996, revised December 1996, final version published in
SoftwarePractice and Experience, (8): 983993 (1997)),
presents a new, hybrid sorting algorithm, introsort (for
introspective sort), that behaves almost exactly like
medianof3 quicksort for most inputs (and is just as fast) but which
is capable of detecting when partitioning is tending toward quadratic
behavior. By switching to heapsort in those situations, introsort
achieves the same O(N logN) time bound as heapsort
but is almost always faster than just using heapsort in the first
place. Detailed performance measurements reported in the paper were
conducted with the aid of the new counting components discussed
in a later section.
 The same paper describes a new selection algorithm,
introselect (for selecting the kth largest element of
a sequence) based on an introspective approach similar to that of
introsort. Its worst case computing time is O(N) instead of
the O(N^{2}) bound of Hoare's find algorithm.
 David R. Musser and Gor V. Nishanov, A Fast Generic
Sequence Matching
Algorithm
(December 1997; also available, without appendices, as TR
9711), describes a string matchingand
more generally, sequence matchingalgorithm that has a linear
worstcase computing time bound, a low worstcase bound on the number
of comparisons (2N), and sublinear averagecase behavior that is
better than that of the fastest versions of the BoyerMoore
algorithm. The algorithm retains its efficiency advantages in a wide
variety of sequence matching problems of practical interest, including
traditional string matching; largealphabet problems (as in Unicode
strings); and smallalphabet, longpattern problems (as in DNA
searches). Since it is expressed as a generic algorithm for searching
in sequences over an arbitrary type T, it is well suited for
use in generic software libraries such as the C++ Standard Template
Library. The algorithm was obtained by adding to the
KnuthMorrisPratt algorithm one of the patternshifting techniques
from the BoyerMoore algorithm, with provision for use of hashing in
this technique. In situations in which a hash function or random
access to the sequences is not available, the algorithm falls back to
an optimized version of the KnuthMorrisPratt algorithm. The source
files for the algorithms, test programs, and test data are
available.
Based on these results, quadratic computing time behavior could now
be banished from the STL components! Unfortunately it is too late to
change the C++ Standard, but perhaps compiler and library vendors will
adopt the new algorithms voluntarily. Silicon Graphics has already
adopted introsort instead of quicksort for the implementation
of sort
in SGI STL,
and it has propagated from there to GNU C++ and to
Boris Fomitchev's
STLport
(which makes SGI STL available on many different platforms).
