\documentclass{report} \usepackage{color} \definecolor{light}{gray}{.93} \usepackage[plainpages=false,pdftex,colorlinks,backref]{hyperref} \usepackage{graphicx} \title{Generic Distribution and Filtering Algorithms} \author{Larry Bush $<$bushl2@@cs.rpi.edu$>$\\Dan Jemiolo $<$jemiod@@rpi.edu$>$\\Kyle Ross $<$rossk2@@rpi.edu$>$\\Dave Siebecker $<$siebed@@cs.rpi.edu$>$\\David R. Musser $<$musser@@cs.rpi.edu$>$} \date{8 December 2002} \newcommand{\first}{\mbox{\texttt{first}}} \newcommand{\last}{\mbox{\texttt{last}}} \newcommand{\predTRUE}{\mbox{\texttt{predTRUE}}} \newcommand{\predFALSE}{\mbox{\texttt{predFalse}}} %created by KR ... this way we can switch how we represent code later, if we need to; I like this better than hard-coding \texttt \newcommand{\code}[1]{\texttt{#1}} % Following command is for comments or questions by me. % leave this command in for my future use but remove all uses of it % (after adding or correcting text to take care of the noted problem) \newcommand{\dm}[1]{{\bf {$\langle$DM: #1$\rangle$}}} \begin{document} \maketitle \begin{abstract} A generic software library is presented to solve the problem of classifying objects based on user-defined criteria and distributing the objects based on this classfication. Along with documented versions of the source code, a comprehensive discussion of the library is made from several perspectives: The user's guide \& tutorial and the reference manual help the reader to quickly begin to use the library. A discussion of the design issues involved in the creation of the library provides future software designers the insight required to extend the library. A lengthy set of tests and results assures the user that the library operates correctly and efficiently. \end{abstract} \tableofcontents \newpage \chapter{User's Guide} \label{sec-user-guide} \section{Introduction} \subsection{Overview} \label{sec-overview} It is often useful to take a set of items and distribute the elements of this set into two or more destinations based on some sort of decision schema. If this schema is phrased as a sequence of predicates paired with \code{OutputIterator}s then a generalised view of this problem can be solved by a small set of components that work together to filter elements from a batch of data into one or more destinations based on a collection of user-defined predicates. Input is specified with either a single \code{InputIterator} (to specify a single element) or a pair of \code{InputIterator}s (to signify a range of elements); the various output destinations and their associated predicates will be supplied by creating one or more \code{functor} classes derived from an abstract base class within the framework. Our components work seamlessly with C++ Standard Library containers and streams, since all communication with them will be through \code{Input}- and \code{OutputIterator}s. In order to be as generic and flexible as possible, the interfaces will have to make extensive use of member template functions --- this means that our work will be limited to the GNU C++ compiler, as Microsoft's Visual Studio does not yet support this important language feature.\footnote{Use of a compiler front-end --- like those developed by Comeau Computing --- would allow use of compilers such as Microsoft's that are grossly non-compliant with the C++ standard.} \subsection{Document Conventions} Since English words like ``destination'' and ``router'' have numerous meanings (the latter, particularly, in the computing sciences), explicit textual formatting will be used to differentiate between abstract ideas and concrete C++ types or concepts based on normal text for the former and \code{special text} for the latter. For example: in the context of this discussion, it is quite useful to differentiate between an abstract destination (a place where output is being sent: possibly a member of a sub-type of the \code{destination} class, possibly an \code{OutputIterator}, etc.) and the \code{destination} class as defined by the Generic Distribution and Filtering Algorithms Library. This should make it significantly easier to understand the present report. \subsection{The Generic Distribution and Filtering Algorithms Library} To solve the problem described in Section \ref{sec-overview}, the Generic Distribution and Filtering Algorithms Library (henceforth referenced as ``GDFA Library'') has been developed. Conceptually, the GDFA Library consists in three main components: the trivial \code{distribute} function, the \code{destination} base class, and the \code{router} class. As mentioned supra, each of these is templated to allow for the greatest possible generality, making the library useful over a varied domain of computing problems. \subsubsection{The \code{distribute} Function} The \code{distribute} function solves the trivial case of the general filtering problem: an input range is to be distributed to two output destinations based upon element-wise application of a single predicate --- elements for which the predicate returns \code{true} are output to one destination; all other elements are output to a second destination. \subsubsection{The \code{destination} Base Class} The \code{destination} class is an abstract class from which users of the GDFA Library may inherit in defining their own destinations, which conceptually model the pairing of a predicate and an \code{OutputIterator}, as described in Section \ref{sec-overview}. \code{destinations} have a \code{distribute} method (not to be confused with the trivial \code{distribute} function) which takes an element, applies a predicate, takes an appropriate action, and returns \code{false} if and only if the element should be passed to the following \code{destination}. This is explained in much greater detail in Section \ref{sec-apiref} and is demonstrated by example in Section \ref{sec-tutorial}. \subsubsection{The \code{router} Class} For the more general case of the problem, the GDFA Library defines a \code{router} class. This class stores and prioritises \code{destination}s and provides a convenient and re-usable way to apply an arbitrarily large number of destinations in a specified order to an input sequence. The \code{distribute} method takes a range of elements and applies the filtering order implicitly defined by the \code{destination}s; hence a single \code{router} can be defined and used to filter \emph{multiple} sequences of elements. The general schema for doing so is: create a \code{router}, add appropriate \code{destination}s to that \code{router}, call the \code{distribute} method for each sequence to be filtered. Again, a more detailed explanation can be found in Section \ref{sec-apiref} and some practical examples in Section \ref{sec-tutorial}. \section{Tutorial} \label{sec-tutorial} This tutorial contains a number of ``learn by example'' programmes that are explained in some detail, each of which illustrates a different aspect of using the GDFA Library. The database example in Section \ref{sec-database-example} illustrates a use of the trivial \code{distribute} function. The sporting goods store example in Section \ref{sec-sports-store-example} demonstrates creation of \code{destination}s and use of the \code{router} class in performing a fairly simple single-destination\footnote{``Single-destination'' refers to the fact that each element will be placed in at most (or, in the case, exactly) on \code{destination}.} filtering task. Finally, the criminal profiling example in Section \ref{sec-criminal-example} demonstrates use of a complex multi-tier\footnote{``Multi-tier'' refers to the fact that the input data are filtered and then the results of this filtering are \emph{again} filtered and so on.} filtering application. \subsection{The Database Example} \label{sec-database-example} A simple example using the trivial \code{distribute} function is to filter database jobs by priority: database jobs each have a priority from the set $\lbrace 1,2,3,4,5\rbrace$ where $1$ is low priority and $5$ is high priority. We want to separate out just the jobs that are designated high-priority. These jobs will be run first, followed by all of the other jobs. In this small example, we, of course, are not actually going to do anything with our jobs, but the idea could easily be extended and implemented for a real database. Here is an overview of the programme we will use to solve the problem: @O run-database-jobs.cpp @{#include #include #include #include #include #include #include "router.hpp" @< define database job @> int main() { std::vector urgentJobs; std::vector normalJobs; @< read and filter the jobs @> @< execute the jobs @> return EXIT_SUCCESS; } @} A simple class to represent a database job is required. It must provide storage of the job's name, its priority, and its PID. We need functionality to access the priority-level, check if the job is high-priority or not, check if the job has finished running, compare two jobs by priority, and run the job. We define an input operator as a \code{friend}. @d define database job @{class DatabaseJob { public: @< priority-level accessor @> @< is the job high-priority? @> @< is the job finished? @> @< run the job @> @< compare jobs by priority @> friend std::istream& operator>>(std::istream& in, DatabaseJob& job); private: std::string m_scriptName; int m_priorityLevel; std::string m_processID; }; @< input a job @>@} To get the job's priority-level we create an accessor; by convention we define priority-levels in the range $[0..5]$. @d priority-level accessor @{int PriorityLevel() const { return m_priorityLevel; }@} Is the job high-priority? We can easily check this by comparing to see if the priority is greater than $4$. @d is the job high-priority? @{bool IsHighPriority() const { return m_priorityLevel>4; }@} A job is finished when the process that is executing its tasks exits. For the sake of our example, we just return \code{true}. @d is the job finished? @{bool IsFinished() const { return true; }@} We create a new process that performs the tasks this job entails; for the sake of the example, we just display the job's information. @d run the job @{void Execute() { std::cout< (const DatabaseJob& rhs) const { return m_priorityLevel>rhs.m_priorityLevel; }@} In order to read jobs from our input file, we create a simple input operator. It reads a job's PID and then its priority-level. @d input a job @{std::istream& operator>>(std::istream& in, DatabaseJob& job) { in>>job.m_processID>>job.m_priorityLevel; return in; }@} Now that we have defined a simple class to hold a database job, we are ready to filter them! We open the database jobs file and filter the jobs as we read them. @d read and filter the jobs @{std::ifstream inputFile("./nightly-database-routines.txt"); @< handle file error @> @< filter the jobs @>@} If the file listing the jobs is unavailable, we display an error message and exit from the programme, indicating failure to the operating system. @d handle file error @{if (!inputFile) { std::cerr<<"No ./nightly-database-routines.txt file"<(inputFile), std::istream_iterator(), std::back_inserter(urgentJobs), std::back_inserter(normalJobs), std::mem_fun_ref(&DatabaseJob::IsHighPriority));@} Now that we have the jobs filtered by priority, we can execute them: first the high-priority jobs, then all the other jobs. For the sake of this example programme, we just print the job information. @d execute the jobs @{std::cout << "\n\nUrgentJobs\n\n"; for(std::vector::iterator i=urgentJobs.begin(); i!=urgentJobs.end(); ++i) { (*i).Execute(); } std::cout << "\n\nNormalJobs\n\n"; for(std::vector::iterator i=normalJobs.begin(); i!=normalJobs.end(); ++i) { (*i).Execute(); }@} To complete our discussion of this example, all we need is a data file. @O nightly-database-routines.txt @{JobTypeA 2 JobTypeA 2 JobTypeH 5 JobTypeH 5 JobTypeA 2 JobTypeA 2 JobTypeB 1 JobTypeA 2 JobTypeA 2 JobTypeH 5 JobTypeH 5 JobTypeA 2 JobTypeA 2 JobTypeB 1 @} Finally, we can take a look at the output from our programme, where we not that the jobs have been filtered properly: @D output from database example @{UrgentJobs JobTypeH 5 JobTypeH 5 JobTypeH 5 JobTypeH 5 NormalJobs JobTypeA 2 JobTypeA 2 JobTypeA 2 JobTypeA 2 JobTypeB 1 JobTypeA 2 JobTypeA 2 JobTypeA 2 JobTypeA 2 JobTypeB 1@} \subsection{The Sporting Goods Store Example} \label{sec-sports-store-example} We now look at a simple example of more robust filtering using the \code{router} class: suppose we own a sporting goods store ``Bob's Big Bad Sports Store'' and we want to send a mailing to customers telling them of the sale that is currently being held at the store. We have a database of the customers and their past purchases and want to send a \emph{single} mailing to each customer to try and encourage him or her to come into the store and to make a purchase. Here is an overview of our code: @O sports-store.cpp @{#include #include #include #include #include #include "router.hpp" @< define customer record @> @< define sales offer destination @> int main() { std::vector store_database; @< create the groups of items for each sport @> @< create the customers @> @< set up purchases for each customer @> @< add each customer to the store database @> @< set up the sales items offers @> @< create the junk-mailer @> @< let the junk-mail begin @> return EXIT_SUCCESS; } @} We define a record for a \code{customer}: we need to store the customer's name, his or her address, and the items that he or she has purchased from our store. @d define customer record @{struct customer { @< define customer constructor @> std::string name, address_line_1, address_line_2; std::set items_purchased; };@} Every \code{customer} must have a name and an address; hence we force values to be set for each of these fields in our \code{customer} constructor. @d define customer constructor @{customer(const std::string &new_name, const std::string &new_address_line_1, const std::string &new_address_line_2) :name(new_name), address_line_1(new_address_line_1), address_line_2(new_address_line_2) {}@} What do we need to determine if a user should get a specific mailing? Well, we are going to base advertising on past purchases: if a customer has purchased basketball equipment in the past then he or she will receive an advertisement for basketball equipment now! Our predicate must keep track of what is on sale, what items are from the same sport, and where to output the advertisement letter (e.g. to some sort of file that will be routed to a printer where the junk mail will be printed, addressed, and mailed). @d define sales offer destination @{template class sale_offer : public gdfa::destination { public: @< sales\_offer constructor @> @< sales\_offer distribute function @> private: std::set items; std::string item_on_sale; OutputStream &where; };@} The constructor for the group of items pertaining to a particular sport takes an \code{OutputStream} indicating the destination for the propaganda letters, two \code{InputIterator}s which are assumed to refer to a range of \code{string}s that specify the paraphernalia related to that sport, and a \code{std::string} indicating the item on sale. @d sales\_offer constructor @{template sale_offer(OutputStream &new_where, std::string new_item_on_sale, InputIterator first, InputIterator last) :where(new_where), item_on_sale(new_item_on_sale), items(first, last) {}@} Here we do the main work of the \code{purchased\_type}'s filtering: see if the customer is a ``fan'' of the given sport by checking to see if the customer has purchased any paraphernalia relating to the current sport. If so, we print out the letter and return \code{true} to indicate that an advertisement has been sent to this customer; hence, another advert will \emph{not} be sent to that customer. @d sales\_offer distribute function @{bool distribute(const customer& recipient) { std::vector items_of_type_purchased; std::set_intersection(recipient.items_purchased.begin(), recipient.items_purchased.end(), items.begin(), items.end(), std::back_inserter( items_of_type_purchased)); if(!items_of_type_purchased.empty()) { where<<"Bob's Big Bad Sports Store"<(where, ", ")); where<<"we think you would be interested in this "; where<<"fantastic promotion. Please visit our store today "; where<<"to take advantage of this limited-time offer!\n"; where<( &_mr_jones_purchases[0], &_mr_jones_purchases[sizeof(_mr_jones_purchases)/sizeof(char*)]); mrs_hobbins.items_purchased=std::set( &_mrs_hobbins_purchases[0], &_mrs_hobbins_purchases[ sizeof(_mrs_hobbins_purchases)/sizeof(char*)]); ms_smith.items_purchased=std::set( &_ms_smith_purchases[0], &_ms_smith_purchases[sizeof(_ms_smith_purchases)/sizeof(char*)]); mr_jasper.items_purchased=std::set( &_mr_jasper_purchases[0], &_mr_jasper_purchases[sizeof(_mr_jasper_purchases)/ sizeof(char*)]);@} The customers need to be inserted into the store database. @d add each customer to the store database @{store_database.push_back(mr_jones); store_database.push_back(mrs_hobbins); store_database.push_back(ms_smith); store_database.push_back(mr_jasper);@} Next, we set up the advertisements for each sport. For simplicity, we dump all of the letters-to-customers to \code{std::cout}. The second parameter in each of the constructor calls indicates the name of the item that is on sale; the final two parameters indicate the items which relate to the same sport as the one on sale. @d set up the sales items offers @{sale_offer golf_sale(std::cout, "titanium driver", &golf_stuff[0], &golf_stuff[sizeof(golf_stuff) /sizeof(char*)]); sale_offer lacrosse_sale(std::cout, "lacrosse stick", &lacrosse_stuff[0], &lacrosse_stuff[ sizeof(lacrosse_stuff) /sizeof(char*)]); sale_offer basketball_sale(std::cout, "hoop", &basketball_stuff[0], &basketball_stuff[ sizeof(basketball_stuff) /sizeof(char*)]);@} Add each of the sale adverts to the junk mailer. Since golf equipment is more expensive than lacrosse equipment which, in turn, is more expensive than basketball equipment, we prioritise the advertisements appropriately: if a customer has purchased both golf and lacrosse equipment, we want to send him or her the golf advertisement and \emph{not} the lacrosse advertisement since we, at Big Bob's, are trying to make the most money off of the customers and we can best do so by advertising expensive products to them. @d create the junk-mailer @{typedef gdfa::router mailer; mailer junk_mailer; junk_mailer.add_destination(golf_sale, gdfa::router::high_priority); junk_mailer.add_destination(lacrosse_sale); junk_mailer.add_destination(basketball_sale, gdfa::router::low_priority);@} Finally, we produce the items to-be-mailed using the junk-mailer that we have created and the store's database: we pass the database to the mailer and the mailer determines the most expensive sale item which fits the tastes of each customer and then creates a proper mailing for that advertisement, addressed to the customer and making mention of the previously-purchased products for that sport. @d let the junk-mail begin @{junk_mailer.distribute(store_database.begin(), store_database.end());@} In the interest of space, we present here one of the advertisements rather than all of them, but this should give a reasonable idea of the style of the output:\footnote{Line breaks were added to the output so as to make it readable.} @D sample of output from sporting store example @{Bob's Big Bad Sports Store 112 42nd St. Troy, NY 12180 Mr. Charles Jones 123 Harcourt Terrace Troy, NY 12180 Dear Mr. Charles Jones: Bob's Big Bad Sports Store is pleased to announce that we currently are running a sale on titanium drivers. Since you have purchased items such as golf bag, set of clubs, we think you would be interested in this fantastic promotion. Please visit our store today to take advantage of this limited-time offer! Yours sincerely, Big Bob@} \subsection{The Criminal Profiling Example} \label{sec-criminal-example} We now look at a slightly more complicated use of the \code{router} class. Suppose that we know that two suspects were spotted by eyewitnesses leaving a crime scene matching the following descriptions: Suspect 1 is between 1.65 and 1.75m tall, weighs between 65 and 75kg, and has brown hair; Suspect 2 is between 1.75 and 1.85m tall, weights between 80 and 90kg and also has brown hair. Given that we have a database of criminals, we want to find out if any criminals in the database match these descriptions so that we can round up the usual suspects and interrogate them. The purpose of this programme is to illustrate a multi-tier use of the \code{router} class: we will first filter based on height, next filter based on weight, and, finally, filter based on hair-colour in a three-step process. This example is, of course, very contrived, but this could be used for advanced data-mining or analysis of complex data in a real programme. @O criminal.cpp @{#include #include #include #include #include #include "router.hpp" @< define criminal profile @> @< define classes for filtering @> int main() { std::vector criminal_database; std::vector suspect1, suspect2; @< create the criminal records @> @< run the search @> @< display the results @> return EXIT_SUCCESS; } @} Since we need a way to store a criminal's profile, we create a structure in which we can store vital information such as name, height, weight, build, hair-colour, eye-colour, sex, and race. @d define criminal profile @{struct criminal { @< define useful enumerations for criminal data @> @< criminal constructor @> std::string name; double height; unsigned int weight; build my_build; hair_colour my_hair_colour; eye_colour my_eye_colour; sex my_sex; race my_race; }; @< criminal output operator @>@} A few useful enumerations are needed to help us deal with the data in an abstract sense. \code{``\_hair''} is suffixed on each of the hair-colours because of conflicts that would otherwise arise between the constants in the \code{hair\_colour} enumeration and the \code{eye\_colour} and \code{race} enumerations. @d define useful enumerations for criminal data @{enum build {thin, medium, stocky, heavy}; enum hair_colour {black_hair, brown_hair, blonde_hair, white_hair, red_hair, grey_hair}; enum eye_colour {brown, blue, green, hazel}; enum sex {male, female}; enum race {black, white, hispanic, arab};@} The constructor simply initialises the right values. @d criminal constructor @{criminal(const std::string &new_name, double new_height, unsigned int new_weight, build new_build, hair_colour new_hair_colour, eye_colour new_eye_colour, sex new_sex, race new_race) :name(new_name), height(new_height), weight(new_weight), my_build(new_build), my_hair_colour(new_hair_colour), my_eye_colour(new_eye_colour), my_sex(new_sex), my_race(new_race) {}@} It would be nice to have a convenient way to display a criminal's statistics; pursuant to this goal, we create an output operator for a \code{criminal} object. @d criminal output operator @{std::ostream& operator<<(std::ostream &out, const criminal &to_print) { @< create string mappings for the enumerations @> out< build_labels; static std::map hair_labels; static std::map eye_labels; static std::map sex_labels; static std::map race_labels; if(first) { build_labels[criminal::thin]="thin"; build_labels[criminal::medium]="medium"; build_labels[criminal::stocky]="stocky"; build_labels[criminal::heavy]="heavy"; hair_labels[criminal::black_hair]="black"; hair_labels[criminal::brown_hair]="brown"; hair_labels[criminal::blonde_hair]="blonde"; hair_labels[criminal::white_hair]="white"; hair_labels[criminal::red_hair]="red"; hair_labels[criminal::grey_hair]="grey"; eye_labels[criminal::brown]="brown"; eye_labels[criminal::blue]="blue"; eye_labels[criminal::green]="green"; eye_labels[criminal::hazel]="hazel"; sex_labels[criminal::male]="male"; sex_labels[criminal::female]="female"; race_labels[criminal::black]="black"; race_labels[criminal::white]="white"; race_labels[criminal::hispanic]="hispanic"; race_labels[criminal::arab]="arab"; first=false; }@} Next, we create several ways of filtering criminals. @d define classes for filtering @{@< filter by height @> @< filter by weight @> @< filter by hair-colour @>@} We need a way to filter the criminals in our database by their heights. Since witness testimony is often inaccurate, we want to be able to search for an height range. This \code{destination} will allow us to do this. @d filter by height @{template class height_filter : public gdfa::destination { public: @< height\_filter constructor @> @< height\_filter distribute @> private: double min, max; OutputIterator where; };@} The constructor just takes an \code{OutputIterator} and two \code{double}s which define the range of height values that we seek. @d height\_filter constructor @{height_filter(OutputIterator new_where, double new_min, double new_max) :where(new_where), min(new_min), max(new_max) {}@} Next, we compare a criminal's height against the range that we are looking for; output his or her information if an height match is found. \code{false} is vacuously returned because a criminal may match the description of one or more suspects. @d height\_filter distribute @{bool distribute(const criminal &suspect) { if(min<=suspect.height && suspect.height<=max) { *(where++)=suspect; } return false; }@} We also want to be able to filter by weight. We need a separate \code{weight\_filter} class because in the \code{distribute} function, we need to access different members of the \code{criminal} structure. @d filter by weight @{template class weight_filter : public gdfa::destination { public: @< weight\_filter constructor @> @< weight\_filter distribute @> private: unsigned int min, max; OutputIterator where; };@} We create another trivial constructor. @d weight\_filter constructor @{weight_filter(OutputIterator new_where, unsigned int new_min, unsigned int new_max) :where(new_where), min(new_min), max(new_max) {}@} We distribute exactly like we did for \code{filter\_height} except that we change \code{suspect.height} to \code{suspect.weight}. @d weight\_filter distribute @{bool distribute(const criminal &suspect) { if(min<=suspect.weight && suspect.weight<=max) { *(where++)=suspect; } return true; }@} Since we also have information about the suspects' hair-colour, we create a \code{destination} to filter by hair-colour. @d filter by hair-colour @{template class colour_filter : public gdfa::destination { public: typedef criminal::hair_colour colour; @< colour\_filter constructor @> @< colour\_filter distribute @> private: colour to_match; OutputIterator where; };@} Yet another fairly simple constructor. @d colour\_filter constructor @{colour_filter(OutputIterator new_where, colour new_to_match) :where(new_where), to_match(new_to_match) {}@} Use a simple equality comparison against a known hair-colour and the criminal's to see if he or she matches the suspect's description. @d colour\_filter distribute @{bool distribute(const criminal &suspect) { if(suspect.my_hair_colour==to_match) { *(where++)=suspect; } return true; }@} We look again at the main programme. The database is populated with a set of criminals,\footnote{This information is pulled from the U.S. FBI's Most-Wanted database; this example is not intended to indicate the authors' support of profiling in investigations.} each created using the \code{criminal} constructor. @d create the criminal records @{criminal_database.push_back(criminal("Usama Bin Laden", 1.96, 72, criminal::thin, criminal::brown_hair, criminal::brown, criminal::male, criminal::arab)); criminal_database.push_back(criminal("Hopeton Eric Brown", 1.73, 80, criminal::medium, criminal::black_hair, criminal::brown, criminal::male, criminal::black)); criminal_database.push_back(criminal("James J. Bulger", 1.73, 70, criminal::medium, criminal::grey_hair, criminal::blue, criminal::male, criminal::white)); criminal_database.push_back(criminal("Robert William Fisher", 1.83, 86, criminal::medium, criminal::brown_hair, criminal::blue, criminal::male, criminal::white)); criminal_database.push_back(criminal("Victor Manuel Gerena", 1.68, 75, criminal::stocky, criminal::brown_hair, criminal::green, criminal::male, criminal::hispanic)); criminal_database.push_back(criminal("Glen Stewart Godwin", 1.83, 90, criminal::medium, criminal::grey_hair, criminal::green, criminal::male, criminal::white)); criminal_database.push_back(criminal("Richard Steve Goldberg", 1.83, 72, criminal::medium, criminal::brown_hair, criminal::brown, criminal::male, criminal::white)); criminal_database.push_back(criminal("Eric Robert Rudolph", 1.80, 78, criminal::medium, criminal::brown_hair, criminal::blue, criminal::male, criminal::white)); criminal_database.push_back(criminal("James Spencer Springette", 1.80, 109, criminal::heavy, criminal::black_hair, criminal::brown, criminal::male, criminal::white)); criminal_database.push_back(criminal("Donald Eugene Webb", 1.75, 75, criminal::medium, criminal::grey_hair, criminal::brown, criminal::male, criminal::white));@} In an overview of our search, as mentioned earlier, we filter based on height, then based on weight, then based on hair-colour. @d run the search @{std::vector temp1, temp2; typedef std::back_insert_iterator > BItype; @< filter based on height @> @< filter based on weight @> @< filter based on hair-colour @>@} First, we separate the criminals in the database based on height. @d filter based on height @{@< set up height filters @> @< run height filter @>@} We know that one of the suspects is between 1.65 and 1.75m and that the other is between 1.75 and 1.85 in height; we create a \code{destination} for each (more precisely, a \code{height\_filter}) and add these to a \code{router}. @d set up height filters @{height_filter suspect_1_height(std::back_inserter( suspect1), 1.65, 1.75); height_filter suspect_2_height(std::back_inserter( suspect2), 1.75, 1.85); gdfa::router search_by_height; search_by_height.add_destination(suspect_1_height); search_by_height.add_destination(suspect_2_height);@} Then, we distribute the criminals from the database into the vectors \code{suspect1} and \code{suspect2}; criminals who fit neither of the sets of height requirements are discarded. @d run height filter @{search_by_height.distribute(criminal_database.begin(), criminal_database.end());@} Next, we filter based on the weights described by the eyewitnesses. @d filter based on weight @{@< set up weight filters @> @< run weight filters @>@} We now set a \code{router} for each criminal's weight: 65 to 75kg for the first, 80 to 90kg for the second. It would be nice if we could use the trivial version of \code{distribute}, but there seems no logical location where the criminals whose records did not match the predicate would be output. This version does not output them to any iterator, which is exactly what is desired. @d set up weight filters @{weight_filter suspect_1_weight(std::back_inserter(temp1), 65, 75); gdfa::router filter_1_by_weight; filter_1_by_weight.add_destination(suspect_1_weight); weight_filter suspect_2_weight(std::back_inserter(temp2), 80, 90); gdfa::router filter_2_by_weight; filter_2_by_weight.add_destination(suspect_2_weight);@} Now, filter by weight the criminals who (so far) match the description of the respective suspects. The refined suspect lists are held in \code{temp1} and \code{temp2} for suspect1 and suspect2, respectively, we copy this back into \code{suspect1} and \code{suspect2} and clear the temporary \code{std::vector}s. @d run weight filters @{filter_1_by_weight.distribute(suspect1.begin(), suspect1.end()); filter_2_by_weight.distribute(suspect2.begin(), suspect2.end()); suspect1=temp1; suspect2=temp2; temp1.clear(); temp2.clear();@} Finally, we examine the hair-colour entries for each of the criminals in our database and eliminate those who do not match. @d filter based on hair-colour @{@< set up colour filters @> @< run colour filters @>@} We set up a filter for each suspect to eliminate all potential suspects that do not have brown hair. @d set up colour filters @{colour_filter suspect_1_colour(std::back_inserter(temp1), criminal::brown_hair); gdfa::router filter_1_by_colour; filter_1_by_colour.add_destination(suspect_1_colour); colour_filter suspect_2_colour(std::back_inserter(temp2), criminal::brown_hair); gdfa::router filter_2_by_colour; filter_2_by_colour.add_destination(suspect_2_colour);@} Run these filters and copy the results back from the temp \code{std::vector}s. We are done at last! @d run colour filters @{filter_1_by_colour.distribute(suspect1.begin(), suspect1.end()); filter_2_by_colour.distribute(suspect2.begin(), suspect2.end()); suspect1=temp1; suspect2=temp2;@} Calls to \code{std::copy} get the data printed, and, voil\'a, we have a list of all criminals matching each of the two suspects described by the eyewitnesses who saw them fleeing the scene of the crime. @d display the results @{std::cout<<"--- Criminals matching the description "; std::cout<<"of Suspect1 ---"; std::cout<(std::cout, "\n")); std::cout<<'\n'<(std::cout, "\n"));@} Here is the output of our search: @D output from criminal profiling example @{--- Criminals matching the description of Suspect1 --- Victor Manuel Gerena height: 1.68m weight: 75kg build : stocky hair : brown eyes : green sex : male race : hispanic --- Criminals matching the description of Suspect2 --- Robert William Fisher height: 1.83m weight: 86kg build : medium hair : brown eyes : blue sex : male race : white@} \chapter{Reference Manual} \label{sec-ref-manual} \section{API Reference} \label{sec-apiref} \subsection{Trivial \code{distribute} Function} \subsubsection{Prototype} @d prototype for the trivial distribute function @{template void distribute(InputIterator first, InputIterator last, OutputIteratorIfTrue out1, OutputIteratorIfFalse out2, Predicate p)@} \subsubsection{Description} Evaluates each element in the input range against the predicate specified; the element is copied to one of the two output iterators based on the result of the test.\footnote{This implies that the value types of the input iterator and each of the output iterators should be equal; failing that, the output iterator types should be copy constructible from objects of the input iterator's value type.} All elements will be placed in one of the destinations. \subsubsection{Definition} Defined in the header file \code{router.hpp}. \subsubsection{Type Requirements} \begin{itemize} \item \code{InputIterator} models the \code{Input Iterator} concept, as defined in \cite{sgi-stl}. \item \code{OutputIteratorIfTrue} and \code{OutputIteratorIfFalse} model the \code{Output Iterator} concept, as defined in \cite{sgi-stl}. \item \code{Predicate} models the \code{Predicate} concept, as defined in \cite{sgi-stl}. \end{itemize} \subsubsection{Preconditions} \begin{itemize} \item $[\code{first},\code{last})$ is a valid range. \end{itemize} \subsubsection{Complexity} $O(N)$ predicate evaluations and assignments (both average and worst-case) where $N$ is $\code{last}-\code{first}$. \subsection{\code{destination} Base Class} \subsubsection{Template Signature} @d template signature for destination base class @{template destination@} \subsubsection{Description} \code{destination} defines an abstract base class from which users of the GDFA Library can derive wrapper classes for the predicate-destination pairs used by the \code{router} class. \subsubsection{Definition} Defined in the header file \code{router.hpp}. \subsubsection{Template Parameters} \begin{center} \begin{tabular}{|p{2.25in}|p{2.25in}|} \hline Parameter & Description\\ \hline \code{T} & The \code{destination}'s value type: the type of object filtered by the \code{destination}.\\ \hline \end{tabular} \end{center} \subsubsection{Type Requirements} None, except those imposed by derived classes. \subsubsection{Refinement of} \code{Copy Constructable} (as defined in \cite{sgi-stl}). \subsubsection{Valid Expressions} \begin{center} \begin{tabular}{|p{1.75in}|p{1in}|p{1.75in}|} \hline Expression & Return Type & Semantics\\ \hline \code{distribute(o)} & \code{bool} & When this function is called on an object \code{o} that is being examined. The function should return \code{true} if and only if the given destination is the last to evaluate the item \code{o} (that is, the router should stop processing it), \code{false} else.\\ \hline \end{tabular} \end{center} \subsection{\code{router} Class} \subsubsection{Prototype} @d template signature for router class @{template router@} \subsubsection{Definition} Defined in the header file \code{router.hpp}. \subsubsection{Template Parameters} \begin{center} \begin{tabular}{|p{2.25in}|p{2.25in}|} \hline Parameter & Description\\ \hline \code{T} & The \code{router}'s value type: the type of object distributed by the \code{router}.\\ \hline \end{tabular} \end{center} \subsubsection{Refinement of} \code{Default Constructable}, \code{Copy Constructable}, \code{Assignable}. \subsubsection{Members} \begin{center} \begin{tabular}{|p{2.25in}|p{2.25in}|} \hline Member & Description\\ \hline \code{priority\_level} & Internal class for specifying the relative priority of destinations managed by the router. Instances of this class can be compared with the less-than, greater-than, and equality operators.\\ \hline \code{low\_priority}, \code{normal\_priority}, \code{high\_priority}, \code{urgent\_priority} & The four allowable priority levels for destinations added to the router. Each is a static instance of \code{router::priority\_level}.\footnotemark\\ \hline \code{router()} & Default constructor; creates an empty \code{router}.\\ \hline \code{template $<$typename Iterator$>$ router(Iterator first, Iterator last, priority\_level priority=normal\_priority)} & Constructor; creates a \code{router} that manages the \code{destination} objects held in the range specified. The destinations found in the range must be instantiated from a sub-class of \code{destination$<$T$>$}. The priority selected will be assigned to every \code{destination} in the range.\\ \hline \code{$\sim$router()} & Destructor.\\ \hline \code{template $<$typename Iterator$>$ void distribute(Iterator item)} & Allows each destination to evaluate the object at the location referenced by the given iterator until a match is found or all destinations have been considered.\footnotemark\\ \hline \code{template $<$typename Iterator$>$ void distribute(Iterator first, Iterator last)} & Invokes \code{distribute} for each iterator in the range specified.\\ \hline \code{template $<$typename Destination$>$ void add\_destination(const Destination\& d, priority\_level priority = normal\_priority)} & Adds the given \code{destination} to the group managed by the \code{router}: its place in the group will be determined by the assigned priority. The \code{destination} given must be instantiated from a sub-class of \code{destination$<$T$>$}. The priority level must be one of the four defined by \code{router$<$T$>$}.\\ \hline \end{tabular} \end{center} \footnotetext[2]{The implementation of the priority levels is trivial and thus, not expensive; the only reason for their existence is to limit the range of priorities.} \footnotetext[3]{The time bound on this operation is $O(N)$, where $N$ is the number of destinations.} \begin{center} \begin{tabular}{|p{2.25in}|p{2.25in}|} \hline Member & Description\\ \hline \code{template $<$typename Destination$>$ void set\_default(const Destination\& d)} & Adds a default \code{destination} to the \code{router}; the default will be used whenever no match is found for an input. The \code{destination} given must be instantiated from a sub-class of \code{destination$<$T$>$}. No priority is assigned to the default \code{destination}.\footnotemark\\ \hline \code{void remove\_default()} & Removes a default destination; if there is no default, no action is taken.\\ \hline \end{tabular} \end{center} \footnotetext[4]{The actual priority can be said to be less than \code{router::low\_priority.}} \chapter{Design Issues and Source Code} This chapter will discuss the design issues involved in the creation of the GDFA Library; for each topic presented, the relevant source code will be included for reference. We will first review the ideas and concepts that led to the development of the library, and then revisit each individually as we show how ideas were transformed into implementation. Extra attention should be given to the explanation of terminology, as some words used here have definitions that are quite different from their more common meanings. \section{Concept of Distribution and Filtering} The foundation for the GDFA Library is in the simple \code{distribute} function documented earlier. To see how this utility function serves as the premise for the more advanced \code{router} and \code{destination} classes, we need to revisit the first example presented in the User's Guide (Chapter \ref{sec-user-guide}). Recall that the example programme was managing batch jobs that ran as part of daily database maintenance procedures; jobs were split into two categories---\code{normal} and \code{urgent}---and then executed in terms of their relative priority. The \code{distribute} function was used to filter the jobs into the correct container directly from the input stream, so that no unnecessary copying was performed. \subsection{Introduction to Distribution and Filtering} To see how this generic utility offers a real value to the programmer, let us first consider an applications-specific solution to this I/O problem---it takes a file name and two \code{std::vector}s in which to copy the data: @d database distribute example 1 @{void distribute(const string& fileName, vector& urgent, vector& normal) { ifstream inputFile(fileName); DatabaseJob job; while(inputFile>>job) { if(job.IsHighPriority()) { urgent.push_back(job); } else { normal.push_back(job); } } }@} \subsection{Generic Input Range} The first thing to note is that we have restricted our source of input to a formatted file---we can never use this code to filter objects in an existing container or another stream (such as the standard input stream, \code{std::cin}). We would like to have the extensibility found in STL algorithms and allow users to transfer data similarly to the way \code{std::copy} moves elements from one range to another. C++ templates can be used to parameterise the type of input and remove this restriction. A generic range of input, then, is the first definitive aspect of our filtering component. The range is defined by two parameterised input iterators (specified by the \code{InputIterator} parameter), which may traverse either streams or containers. Note that in the case where the input iterator is a \code{std::istream\_iterator}, the \code{operator>>} method will still be necessary to read in the \code{DatabaseJob} objects. @d database distribute example 2 @{template void distribute(InputIterator begin, InputIterator end, vector& urgent, vector& normal) { while(begin!=end) { DatabaseJob job=*begin++; if(job.IsHighPriority()) { urgent.push_back(job); } else { normal.push_back(job); } } }@} \subsection{Generic Output} The next concern is that the function names a specific container for the filtered \code{DatabaseJob} objects. Given the standard iterator interface used by all STL containers and streams, there is no reason this code should force users to put their elements in a \code{std::vector}. Users should be allowed to specify any destination for the \code{DatabaseJob} objects in the same way they are allowed to specify a generic source of input; this output destination might be a container like \code{std::set} or a stream such as \code{std::cout}. A generic destination, however, is only part of the answer to the question of how to parameterise the output range. Why does the output have to be composed of \code{DatabaseJob} objects? We are looking for an efficient, generic solution that can be easily reused, and there is no reason that this algorithm cannot be applied to other types of data. Our function is reading in objects and putting them in one of two containers given the results of a conditional, which is not database-specific at all. By replacing all references to a vector of \code{DatabaseJobs} with a set of output iterators, the \code{distribute} function takes on a completely generic I/O interface. \code{OutputIteratorIfTrue} and \code{OutputIteratorIfFalse} are the two output iterator types associated with the destinations that are chosen given the Boolean result of the predicate---they are independent of each other and may be of different types. @d database distribute example 3 @{template void distribute(InputIterator begin, InputIterator end, OutputIteratorIfTrue out1, OutputIteratorIfFalse out2) { while(begin!=end) { typename iterator_traits::value_type item; item=*begin++; item.IsHighPriority() ? *out1++=item : *out2++=item; } }@} \subsection{Generic Predicate} There is one glaring problem left in the algorithm, and that is the call to \code{DatabaseJob::IsHighPriority}. Our function refers to no specific type, and even if the type in question is guaranteed to be a class (which it is certainly not), it is not reasonable to demand that the class have an \code{IsHighPriority} member. The correct approach to making this solution completely reusable is to use predicates in the same way that the standard library algorithms do. In the final version of \code{distribute}, a fifth parameter has been added to allow programmers to name the comparator that should be used to analyse the data. This component now accepts a generic range of input, evaluates it with any user-defined predicate (specified by the \code{Predicate} parameter), and places it in one of two independent destinations. @d the distribute function @{template void distribute(InputIterator begin, InputIterator end, OutputIteratorIfTrue out1, OutputIteratorIfFalse out2, Predicate p) { while (begin != end) { typename std::iterator_traits::value_type item=*begin++; p(item) ? *out1++=item : *out2++=item; } }@} \section{A More Generic Phrasing of Distribution and Filtering} In order to further generalise the algorithm and add more power to its filtering capabilities, it was necessary to remove the restriction on the number of destinations to which the input elements could be written. With this limitation eliminated, users could filter data to one or more locations based on a set of criteria that was more complex than a single Boolean check---elements could be evaluated against multiple, independent predicates and have compound actions performed against them. This was by far the most complicated issue in the development of the library, as it required the construction of a full supporting framework to offer flexibility without decreasing usability. It was clear that the resulting set of classes and/or functions would have to pair the multiple predicates and output destinations. The \code{router} class, along with the helper \code{destination} class, offers this advanced functionality by building on the ideas presented so far. \subsection{Arbitrary Number of Destinations} Our \code{router} class manages a set of predicate-output pairs through which it filters incoming data. Our \code{router} class manages a set of predicate-output pairs through which it filters incoming data. The most complex aspect of \code{router} is not the implementation of its methods but rather the parameterisation of their parameters---the class makes extensive use of member templates in order to provide the desired flexibility. The aforementioned pairs are really instances of the \code{destination} class. The first building block in the GDFA framework is the \code{destination} class for evaluating input and performing some action against it. Users may extend the class any way they wish, but they must override the pure virtual function \code{distribute} or the derived class will remain abstract. @d the destination class @{template class destination { public: virtual ~destination() {} virtual bool distribute(const T& element)=0; };@} \subsection{Creating Derived Classes from \code{destination}} The purpose of the \code{distribute} function is to perform an action against the parameter it receives. Technically, there is no limit or restriction on what the implementation can or should do, but the problems that the framework hopes to solve tend to follow a common pattern: the element is tested against some criteria and written to an output based on the result of that test. The Boolean value returned by the function should be true if the router is to stop evaluation of the current element---that is, if it should move on to the next element in the input range without passing the current element to other destinations. The User's Guide (Chapter \ref{sec-user-guide}) has multiple examples to demonstrate how the \code{destination} class can be used as a base for programmer tests, but the basic idea behind most implementations can be described with the following pseudo-code: @d deriving from destination @{template class new_destination : destination { private: Output out; public: new_destination(Output _out) : out(_out) {} bool distribute(const type& element) { if(some test against element) write element to output return true to stop evaluation } };@} \subsection{The \code{destination\_handle} Class} For any range of input, users can create multiple \code{destination}s through which to filter the elements---for each \code{destination} created, an instance is added to the \code{router}, which will pass the input through the \code{destination}s until it is told to stop or until it has considered every one. Because users must be able to create and use multiple different destination sub-classes to be placed in a single \code{router}, the internal mechanism for handling these destination objects is more complicated than initially planned. Fortunately, the user interface is not affected by this complexity, and users who can understand the above pseudo-derivation from \code{destination} can use the GDFA Library effectively. \subsubsection{Introduction to Handles} The \code{router} class manages the destination objects created by the user in a single container. Holding many objects of different sub-types in one container without complicating the interface was a challenge, but one that was necessary since we did not want to limit the number of destinations that a router could manage. Applications that used the library to filter large batches of data might have hundreds of predicates to evaluate the many possible attributes associated with their data; imposing a limit on the number of destinations would be imposing a limit on the acceptance of the library in mainstream software development. Furthermore, a class interface that only accepted $N$ destinations but without the aid of member template functions would be extremely cumbersome to programme, with much of the parameterisation information having to be specified explicitly. Programming \code{router} to hold a single container of destinations so that all actions could be performed against them in a generic way was essential to both interface and implementation simplicity---the aforementioned increase in complexity came with the added layer of abstraction within \code{router} to allow for the generic treatment of \code{destination} objects. The \code{destination}s are held indirectly by the \code{router} class through an handle---a wrapper that holds a pointer to a \code{destination} and performs all of the dynamic memory management for that pointer. This handle class, \code{destination\_handle}, is a member of \code{router}, and is internal to the framework---it is used in combination with member template functions to provide seamless interaction with the user-defined \code{destination}s. The basic functionality of the handle---its memory management---is demonstrated below; \code{destination\_handle} uses reference counting to offer copying and assignment semantics similar to pointers while preventing memory leaks and/or data corruption. It is assumed the reader is familiar with the basic ideas behind reference counting, and we will not review them here. The key aspect to note is that the handle holds a pointer to a \code{destination} and takes advantage of that class's virtual functions to invoke the correct method(s) in the concrete sub-class. Thus, object-oriented polymorphism has provided a simple solution to the problem of managing objects of different types in a single container. @D handle class used to store destinations @{class destination_handle { private: destination *dest; std::size_t *referenceCount; void destroy() { if(--*referenceCount==0) { delete referenceCount; delete dest; } } priority_level priority; public: @< member template constructor @> destination_handle(const destination_handle& copy) :dest(copy.dest), referenceCount(copy.referenceCount) { ++*referenceCount; priority=copy.priority; } destination_handle& operator= (const destination_handle& rhs) { ++*rhs.referenceCount; destroy(); dest=rhs.dest; referenceCount=rhs.referenceCount; priority=rhs.priority; return *this; } bool operator< (const destination_handle& rhs) const { return priority(const destination_handle& rhs) const { return rhs<*this; } ~destination_handle() { destroy(); } @< destination accessor function @> };@} \subsubsection{Abstracting Away from Pointers} One thing it did not solve, however, was the requirement that users pass a pointer to each \code{destination} they added to the \code{router}. This was undesirable both because it decreased usability (users should be able to use the \code{router} without having to do any dynamic memory management) and increased the potential for bugs (\code{destination}s declared locally on the stack which went out of scope after being added to the router would raise exceptions and corrupt programme results). We wanted the insertion methods of the \code{router} class to be similar to those of STL containers---the user would pass a reference to the given destination and know that an independent copy was being kept; such an interface is as simple as possible in terms of C++ language constructs. The \code{destination\_handle} class uses a member template constructor to create a dynamically allocated copy of the given destination; this method is central to the \code{router}'s ability to accept any class built off the GDFA \code{destination} without having to include any reflection-like code. @d member template constructor @{template destination_handle(const Destination& d, priority_level _priority) :priority(_priority), dest(new Destination(d)), referenceCount(new std::size_t(1)){}@} \subsubsection{Access to Functionality of \code{destination}} The \code{router} accesses the distribution functionality written by the user with a simple wrapper function that passes along the given element to the \code{destination} class. We will revisit the handle class later when discussing the priority of \code{destination}s vis-\'a-vis input evaluation. @d destination accessor function @{bool distribute(const T& element) { return dest->distribute(element); }@} \subsection{The \code{router} Class} With a generic handle class to take away the issues of polymorphism presented by the different \code{destination} sub-classes, the implementation of \code{router} becomes fairly simple. Most of the functionality is similar to that of an ordinary STL container, with the significant addition being the processing of an input ranges through the set of destinations according to user preference. The most basic methods of router are presented below to give an idea of its internal structure before presenting the input processing methods. @D the router class @{template class router { public: typedef unsigned int priority_level; static const priority_level low_priority=0; static const priority_level normal_priority=1; static const priority_level high_priority=2; static const priority_level urgent_priority=3; private: @< handle class used to store destinations @> std::vector destinations; destination_handle *defaultDestination; // not implemented: prohibited router(const router& copy); router& operator=(const router& rhs); public: router() : defaultDestination(NULL) {} template router(OutputIterator first, OutputIterator last) :defaultDestination(NULL) { std::copy(first, last, std::back_inserter(destinations)); std::make_heap(first, last); } ~router() { remove_default(); } template void distribute(InputIterator item) { typename std::vector::iterator i; for (i=destinations.begin(); i!=destinations.end(); ++i) { if(i->distribute(*item)) { break; } } if(i==destinations.end() && defaultDestination) { defaultDestination->distribute(*item); } } template void distribute(InputIterator first, InputIterator last) { while(first!=last) { distribute(first++); } } template void add_destination(const Destination& d, int _priority=normal_priority) { destinations.push_back(destination_handle(d,_priority)); std::push_heap(destinations.begin(), destinations.end()); } template void set_default(const Destination& d) { remove_default(); defaultDestination= new destination_handle(d,normal_priority); } void remove_default() { if(defaultDestination) { delete defaultDestination; defaultDestination=NULL; } } };@} \subsubsection{Default \code{destination}} The default \code{destination} is an optional component that provides a way to catch any elements in the input range that did not pass any of the tests conducted by the \code{router}'s \code{destination}s. If no default is set and an element is passed to every destination without one of them returning true (to tell the router to stop evaluation), no action will be taken---the router will simply move on to the next element. When a default is specified, however, an element that does not match up against any regular \code{destination} will be given to the default for further processing. We decided to keep the default \code{destination} completely separate from the other \code{destination} objects in order to simplify the implementation---the only other option would be to keep it at the end of the container of \code{destination}s, and this would be cumbersome to maintain even without the existing priority scheme that will be discussed later. Storing the default as a dynamically allocated resource (that is, in terms of a pointer) allowed us to easily tell whether a default existed (by comparing the pointer to NULL) and to access it without using any obtuse iterator trickery. From the programmer's perspective, a default \code{destination} class need not be any different from other sub-classes of \code{destination}. The only difference in its use is that instances of the default \code{destination} must be added or removed with special member functions (\code{set\_default} and \code{remove\_default}) rather than the normal \code{add\_destination} method. Regarding implementation, the only complication that arises from our choices is that \code{router} now owns a dynamic resource, and must implement a destructor to prevent memory leaks. \subsubsection{Priorities} The \code{router} class offers the ability to assign priorities to the \code{destination}s it manages. Four built-in priorities (\code{low\_priority}, \code{normal\_priority}, \code{high\_priority}, and \code{urgent\_priority}) are available to users so that they can ensure that some \code{destination}s are always considered before others, no matter the order in which they were added. The user specifies the \code{destination}'s priority when adding it to the \code{router}, with the default being \code{normal\_priority} (this means that if the user has no interest in using the priority scheme, he or she will not be affected by it). As each \code{destination} is inserted, the \code{router} uses the binary heap functions of the Standard Library to adjust the \code{std::vector} of \code{destination} objects it keeps; by storing the elements as a binary heap, the \code{router} is able to keep them prioritised very efficiently (the amount of copying that will be done for each heap adjustment will be equal to $O(\lg N)$, and the copies themselves will only be of the lightweight handles, not the actual \code{destination} objects). \section{Trivial \code{distribute}, \code{destination}, and \code{router} Code} To finish everything up, we combine everything and define the header file which will form the basis of the GDFA Library. All that we need to do is add the standard \code{\#ifdef}-\code{\#endif} block to avoid issues with multiple inclusions of the file, include some standard library components, and define the \code{gdfa} namespace. @O router.hpp @{#ifndef _router_ #define _router_ #include #include #include #include namespace gdfa { @< the distribute function @> @< the destination class @> @< the router class @> } #endif @} \chapter{Testing Approach and Results} \label{appendix-test-code} \section{Overview} This section includes a description and motivation behind all of the tests performed on the Generic Distribution and Filtering Algorithms. The results of those tests and a discussion of the results are also presented here. Our testing strategy consists of tests for performance and correctness. The performance tests consist of timed tests and operation counting. Correctness and Performance testing was performed on all key features of the Generic Distribution and Filtering Algorithms. Correctness testing was performed on many conceivable use cases implemented on a thorough set of built-in types and STL types. Timed tests and Operation Counts were performed in various configurations in order to time (and, likewise, to count) different aspects of the algorithms. \subsection{Notation} In some of the following tests, the order, or $O$-bound (asymptotic complexity), of the algorithm is tested. The algorithm is initially assumed to operate in $O(MN)$ time. $N$ represents the algorithm input size or the number of elements fed through the algorithm. These are the elements to be distributed. $M$ represents the number of destination classes used (including the default destination). The term ``growth rate'' used here is defined as the increase in measured execution time when $N$ is doubled. Hence, we can define growth rate by the formula $G(N)=T(N)/T(N/2)$ where $G(N)$ is the growth function and $T(N)$ is the measured execution time of an $N$-element input. The growth rate for an algorithm of linear complexity would be $2$. This measurement allows us to look more closely at the time increase relative to $N$. The increase from $N/2$ to $N$ is precisely calculated and graphed. In all cases the calculated growth rate is approximately $2$, which indicates a linear time complexity. \subsection{Computer Testing Configuration} The performance tests presented here were performed on an IBM ThinkPad Computer with $1.8$ GHz Mobile Pentium $4$ with Intel SpeedStep technology and $256$MB of RAM. All code was compiled with g++ version $3.2$ with the ``-O3'' flag. Performance on additional platforms, in particular the Rensselaer Polytechnic Institute CS Department Solaris Server (sparc-sun-solaris$2.8$)(compiled under g++ $3.2$) was tested and relative ($O$-bound) performance was comparable to that generated by the Pentium machine. The correctness tests presented here were also performed on an IBM ThinkPad Computer with $1.8$ GHz Mobile Pentium $4$ with Intel SpeedStep technology and $256$MB of RAM. All code was compiled with g++ version $3.2$ with the ``-O3'' flag. The correctness tests were run on additional platforms, in particular a Macintosh ibook with a $700$MHz PowerPC G3 processor, $640$MB RAM, running OSX $10.2.2$ and the results were identical. \section{Timed Tests} \subsection{Running Time Tests: (Time vs. N)} The following $3$ timed tests attempt to test for $3$ different configurations of the algorithm. All $3$ tests are intended to measure the running time of the algorithm with respect to input $N$ (the number of elements in the input range). \subsubsection{Timed Test 1a (Time vs. N): Each Element Distributed to m Destinations} \paragraph{Description} This test runs the full router class using $3$ destinations plus $1$ default destination. This configuration attempts to capture a worst-case for the algorithm. As such, each destination object will insert the element into their respective destination, and then pass the element on to the next destination object (by returning \code{false}). The code implementing this test can be found in Appendix \ref{sec-code-test1a}. \paragraph{Test Objective} This test determines the order of the algorithm with respect to the size of the input, $N$. The input is the number of elements fed in through the algorithm via an iterator range. \paragraph{Results} \begin{figure} \begin{center} \includegraphics[width=5in]{timed_test_1a} \caption{Results of Timed Test $1a$}\label{fig-r1a} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=5in]{timed_test_1a_e} \caption{Growth Rate of Timed Test $1a$}\label{fig-r1a_e} \end{center} \end{figure} The results of this test, as represented in Figure \ref{fig-r1a}, show that the algorithm runs in linear time with respect to $N$. The growth rate results results, as represented in Figure \ref{fig-r1a_e} confirm these findings; in fact, the growth rate graph shows the temporal complexity of the algorithm to be consistently linear. This test is very useful in showing the variation in running time because it emphasises the most significant (time-wise) part of the algorithm, which is inserting the element into the destination via an iterator. As explained above, this test forces a worst-case scenario and likely does not represent a typical implementation. This atypical configuration combined with the observation that element insertion dominates the running time (and masks the decision making time) results in the need for additional testing. \subsubsection{Timed Test 1b (Time vs. N): Each Element Distributed to Exactly 1 Destination} \label{sec-test-1b} \paragraph{Description} This test runs the full router class using $3$ destinations plus $1$ default destination. This configuration attempts to capture a typical case for the algorithm. As such, the elements read in on the input iterator will each be distributed to $1$ and only $1$ destination. This is accomplished as follows. The primary destinations are set up so that the elements are always passed through to the next destination. Each element is then distributed by the default destination. The code implementing this test can be found in Appendix \ref{sec-code-test1b}. \paragraph{Test Objective} The purpose of this test is to determine the order of the algorithm with respect to the size of the input, $N$. The input is the number of elements fed through the algorithm via an iterator range. \paragraph{Results} \begin{figure} \begin{center} \includegraphics[width=5in]{timed_test_1b} \caption{Results of Timed Test $1b$}\label{fig-r1b} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=5in]{timed_test_1b_e} \caption{Growth Rate of Timed Test $1b$}\label{fig-r1b_e} \end{center} \end{figure} The results of this test, as presented in Figure \ref{fig-r1b}, show that the algorithm runs in linear time with respect to $N$. The growth rate results, as represented in Figure \ref{fig-r1b_e} confirm these findings. The growth rate graph, in fact, demonstrates that the temporal complexity of the algorithm is consistently linear. This test reflects a different case than Test $1a$ and is arguably a more typical case. This test de-emphasises, to some extent, the most significant (time-wise) part of the algorithm, which is inserting the element into the destination via an iterator. This test creates a worst-case scenario regarding the number of decisions concerned because, although each element is distributed by just one destination object, it is always the default destination. This typical configuration complements and confirms the above results. \subsubsection{Timed Test 1c: (Time vs. N) - Each Element Distributed to 0 Destinations} \paragraph{Description} This test runs the full router class using $3$ destinations plus $1$ default destination. This configuration attempts to capture the decision making time complexity of the algorithm. To do this, the destination predicates were designed so that every element fit into every category and every destination class passed the element to each succeeding destination class. Therefore, $M$ decisions were made for each element. However, in order to measure this, the time of the insertion of the element into the input iterator had to minimise. Therefore, each destination does not actually insert the element, it just decides if it should and passes the element on to the next destination (by returning \code{false}). Each destination object will insert the element into their respective destination, and then pass the element on to the next destination object (by returning \code{false}). The code implementing this test can be found in Appendix \ref{sec-code-test1c}. \paragraph{Test Objective} The purpose of this test is to determine the order of the algorithm with respect to the size of the input, $N$. The input is the number of elements fed through the algorithm via an iterator range. This test is designed to factor out the time component associated with the insertion of the element via the input iterator in the destination object because this time is independent of this algorithm. This enables us to more accurately measure the other aspects of this algorithm. \paragraph{Results} \begin{figure} \begin{center} \includegraphics[width=5in]{timed_test_1c} \caption{Results of Timed Test $1c$}\label{fig-r1c} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=5in]{timed_test_1c_e} \caption{Growth Rate of Timed Test $1c$}\label{fig-r1c_e} \end{center} \end{figure} The results of this test, as represented in Figure \ref{fig-r1c}, show that the algorithm runs in linear time with respect to $N$. The growth rate results, as represented in Figure \ref{fig-r1c_e} confirm these findings. In fact, this growth rate graph illustrates that the complexity of the algorithm is perfectly linear. This test reflects a different case than Test $la$ and Test $1b$. This test entirely de-emphasises a large time component of the algorithm, which is inserting the element into the destination via an iterator. This test creates a worst-case scenario regarding the number of decisions. This configuration complements and confirms the above results. \subsection{Running Time compared to M destinations (Time vs. M)} \label{sec-tests-2} The following $3$ timed tests attempt to test for $3$ different configurations of the algorithm. All $3$ tests are intended to measure the running time of the algorithm with respect to the number of destination objects added to the router object ($M$). \subsubsection{Timed Test 2a (Time vs. M): Each Element Distributed to m Destinations} \paragraph{Description} This test runs the full router class using $M$ destinations plus $1$ default destination (where $M$ varies from $0$ to $1048576$). This configuration attempts to capture a worst-case for the algorithm. As such, each destination object will insert the element into their respective destination, and then pass the element on to the next destination object (by returning \code{false}). In this test $N$ is held constant. The code for this test may be found in Appendix \ref{sec-code-test2a}. \paragraph{Test Objective} The purpose of this test is to determine the order of the algorithm with respect to the number of destination objects added to the router object ($M$). \paragraph{Results} \begin{figure} \begin{center} \includegraphics[width=5in]{timed_test_2a} \caption{Results of Timed Test $2a$}\label{fig-r2a} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=5in]{timed_test_2a_e} \caption{Growth Rate of Timed Test $2a$}\label{fig-r2a_e} \end{center} \end{figure} The results of this test show that the algorithm runs in linear time with respect to $M$. The growth rate results, as represented in Figure \ref{fig-r2a_e} confirm these findings.\footnote{See Section \ref{sec-efficiency-disclaimer} for a possible explanation of the graph's behaviour.} This test is very useful in showing the variation in running time because it emphasises the most significant (time-wise) part of the algorithm, which is inserting the element into the destination via an iterator. This test forces a worst-case scenario and likely does not represent a typical implementation. This atypical configuration combined with the observation that element insertion dominates the running time (and masks the decision making time) results in the need for additional testing. \subsubsection{Timed Test 2b (Time vs. M)} \paragraph{Description} This test runs the full router class using $M$ destinations plus $1$ default destination (where $M$ varies from $0$ to $1048576$). This configuration attempts to capture a typical case for the algorithm. As such, each element is distributed by exactly $1$ destination object. In this test $N$ is held constant. The code for this test may be found in Appendix \ref{sec-code-test2b}. \paragraph{Test Objective} The purpose of this test is to determine the order of the algorithm with respect to the number of destination objects added to the router object ($M$). \paragraph{Results} \begin{figure} \begin{center} \includegraphics[width=5in]{timed_test_2b} \caption{Results of Timed Test $2b$}\label{fig-r2b} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=5in]{timed_test_2b_e} \caption{Growth Rate of Timed Test $2b$}\label{fig-r2b_e} \end{center} \end{figure} The results of this test, as represented in Figure \ref{fig-r2b}, show that the algorithm runs in linear time with respect to $M$. The growth rate results, as represented in Figure \ref{fig-r2b_e} confirm these findings.\footnote{See Section \ref{sec-efficiency-disclaimer} for a possible explanation of the graph's behaviour.} This typical configuration confirms the results of Test $2a$. \subsubsection{Timed Test 2c (Time vs. M)} \paragraph{Description} This test runs the full router class using $M$ destinations plus $1$ default destination (where $M$ varies from $0$ to $1048576$). This configuration attempts to measure the decision making time component and de-emphasising the insertion component. This is accomplished by having destinations (for the purpose of testing) that do not actually insert the element into the input iterator. They just make a decision and pass the element on to the next destination object. In this test $N$ is held constant. The code for this test may be found in Appendix \ref{sec-code-test2c}. \paragraph{Test Objective} The purpose of this test is to determine the order of the algorithm with respect to the number of destination objects added to the router object, $M$. This test is designed to factor out the time component associated with the insertion of the element via the input iterator in the destination object because this time is independent of this algorithm. This enables us to more accurately measure the other aspects of this algorithm. \paragraph{Results} \begin{figure} \begin{center} \includegraphics[width=5in]{timed_test_2c} \caption{Results of Timed Test $2c$}\label{fig-r2c} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=5in]{timed_test_2c_e} \caption{Growth Rate of Timed Test $2c$}\label{fig-r2c_e} \end{center} \end{figure} The results of this test, as represented in Figure \ref{fig-r2c}, show that the algorithm runs in linear time with respect to $M$. The growth rate results, as represented in Figure \ref{fig-r2c_e} confirm these findings.\footnote{See Section \ref{sec-efficiency-disclaimer} for a possible explanation of the graph's behaviour.} This test reflects a different case than Test $1a$ and Test $1b$. This test entirely de-emphasises a large time component of the algorithm, which is inserting the element into the destination via an iterator. This test creates a worst-case scenario regarding the number of decisions. This configuration complements and confirms the above results. \subsubsection{Note Regarding Growth Rate} \label{sec-efficiency-disclaimer} The growth rate graphs for Section \ref{sec-tests-2} reveal some deviation from strict linearity. This behaviour is probably due to the re-allocation of memory for the \code{std::vector}'s array sub-structure. The re-allocation may have occurred toward the end of the test run and the time required to perform this costly operation was not equally amortised over runs of different input sizes ($N$). \subsection{Running Time Test (Time vs. N): Simple Version} The test presented in this section is intended to measure the running time of the simple ($2$-predicate) functional version of the algorithm with respect to input $N$ (the number of elements in the input range). \subsubsection{Description} In this configuration, each element is distributed to exactly $1$ \code{destination}. This test uses the simple $2$-predicate function version of distribution. The \code{destination}s are set up so that the elements are distributed to exactly $1$ of the \code{destination}s. The code for this test is located in \ref{sec-code-test3}. \subsubsection{Test Objective} The purpose of this test is to determine the order of the algorithm with respect to the size of the input $N$. The input is the number of elements fed through the algorithm via an iterator range. \subsubsection{Results} \begin{figure} \begin{center} \includegraphics[width=5in]{timed_test_3} \caption{Results of Timed Test $3$}\label{fig-r3} \end{center} \end{figure} \begin{figure} \begin{center} \includegraphics[width=5in]{timed_test_3_e} \caption{Growth Rate of Timed Test $3$}\label{fig-r3_e} \end{center} \end{figure} The results of this test, as represented in Figure \ref{fig-r3}, show that the algorithm runs in linear time with respect to $N$. The growth rate results, as represented in Figure \ref{fig-r3_e} confirm these findings.\footnote{See Section \ref{sec-efficiency-disclaimer-3} for an explanation of the slightly anomalous behaviour of the graph.} This test is designed similarly to Test $1b$ (see Section \ref{sec-test-1b}). In this test, however, each element is distributed to the first (of $2$) destinations. \subsubsection{Note Regarding Growth Rate} \label{sec-efficiency-disclaimer-3} The sub-linear time increases are probably due to the memory being re-allocated toward the end of the previous run. Then, the next run may not have had to re-allocate memory as many times or at all, resulting in sub-linear time increase for that run. \section{Operation Counting} This section presents a group of tests performed on the GDFA. The tests are constructed to reinforce the results derived from the timed test in {section timed tests}. As such, these tests run the GDFA in $5$ of the configurations used in the timed tests. These tests use operation counting to measure the complexity of the GDFA with respect to the size of the input range ($N$) and the number of \code{destination}s ($M$). Two of the timed tests that were performed were specifically designed to overcome the limitations of timing generic algorithms. However, since the motivation behind operation counting is also to overcome the limitations of timing generic algorithms, those $2$ previous tests do not need to be performed again. The tests in this section are based on the testing method presented in \cite{opcounting}. These evaluation methods are useful for evaluation generic algorithms such as GDFA. These tests make use of the counting adaptors implemented \cite{opcounting}.\footnote{The code for this library is included in Appendix \ref{sec-code-counters}.} These libraries provide adaptors which count the operations supported by a given type. In particular, the tests presented in this section make use of the \code{iterator\_counter} and \code{value\_counter} for assignment, comparison and other operations. The graphs presented here compare the sum of the assignment operations, comparison operations, other operations and total operations against the GDFA variable parameters, specifically the size of the input range ($N$) and the number of \code{destination}s ($M$). \subsection{Operation Counts vs. N} The following $2$ timed tests perform operation counting tests for $2$ different configurations of the algorithm. Both tests are intended to measure the assignment operations, comparison operations, other operations and total operations of the algorithm with respect to Input $N$ (the number of elements in the input range), where N varies from $0$ to $1048576$. \subsubsection{Operation Counting Test 1a: Each Element Distributed to M=4 Destinations} \paragraph{Description} This test runs the full router class using $3$ \code{destination}s plus $1$ default \code{destination}. This configuration attempts to capture a worst-case for the algorithm. As such, each \code{destination} object will insert the element into its respective \code{OutputIterator}, then pass the element on to the next \code{destination} object (by returning its Boolean value \code{false}). The element type in this test is \code{int}. \paragraph{Test Objective} The objective of this test is to determine the complexity of the algorithm with respect to the size of the input range ($N$). The input is the number of elements fed through the algorithm via an iterator range. The code implementing this test may be found in Appendix \ref{sec-code-count-1a}. \paragraph{Results} \begin{figure} \begin{center} \includegraphics[width=5in]{opcount1a} \caption{Operation Counts for Test $1a$}\label{fig-count-1a} \end{center} \end{figure} The results of this test, as represented in Figure \ref{fig-count-1a} show that the algorithm runs in linear time with respect to $N$. This typical configuration confirms the results of Timed Test 1a. \subsubsection{Operation Counting Test 1b: Each Element Distributed 1 Destination} \paragraph{Description} This test runs the full \code{router} class using $3$ \code{destination}s plus $1$ default \code{destination}. This configuration attempts to capture a typical case for the algorithm. As such, the elements read in via the input iterator will each be distributed to $1$ (and only $1$) \code{destination}. This is accomplished as follows: The primary \code{destination}s are set up so that the elements are always passed through to the next \code{destination} object (by returning the Boolean value \code{false}). Each element is then distributed by the default \code{destination}. The element type in this test is also \code{int}. The code for this section may be found in Appendix \ref{sec-code-count-1b}. \paragraph{Test Objective} The purpose of this test is to determine the complexity of the algorithm with respect to the size of the input ($N$). The input is the number of elements fed through the algorithm via an iterator range. \paragraph{Results} \begin{figure} \begin{center} \includegraphics[width=5in]{opcount1b} \caption{Operation Counts for Test $1b$}\label{fig-count-1b} \end{center} \end{figure} The results of this test, as represented in Figure \ref{fig-count-1b}, show that the algorithm runs in linear time with respect to $N$. This typical configuration confirms the results of Operation Counting Test $1a$ as well as Timed Test $1b$. \subsection{Operation Counts vs. M} The following $2$ Operation Counting tests are run on $2$ different configurations of the algorithm. Both tests are intended to measure the assignment operations, comparison operations, other operations and total operations of the algorithm with respect to the number of \code{destination} objects added to the router object $M$. \subsubsection{Operation Counting Test 2a: Each Element Distributed to M Destinations} \paragraph{Description} This test runs the full \code{router} class using $M$ \code{destination}s plus $1$ default \code{destination} (where $M$ varies from $0$ to $1048576$). This configuration attempts to capture a worst-case scenario for the algorithm. As such, each \code{destination} object will insert the element into their respective \code{destination}, and then pass the element on to the next \code{destination} object (by returning the Boolean value \code{false}). In this test $N$ is held constant. The element type in this test is also \code{int}. The code for this test may be found in Appendix \ref{sec-code-count-2a}. \paragraph{Test Objective} This test aims to determine the complexity of the algorithm with respect to the number of destination objects ($M$) added to the \code{router} object. \paragraph{Results} \begin{figure} \begin{center} \includegraphics[width=5in]{opcount2a} \caption{Operation Counts for Test $2a$}\label{fig-count-2a} \end{center} \end{figure} The results of this test, as represented in Figure \ref{fig-count-2a}, show that the algorithm runs in linear time with respect to $M$. This typical configuration confirms the results of Timed Test $2a$. \subsubsection{Operation Counting Test 2b: Each Element Distributed 1 Destination} \paragraph{Description} This test runs the full \code{router} class using $M$ \code{destination}s plus $1$ default \code{destination} (where $M$ varies from $0$ to $1048576$). This configuration attempts to capture a typical case for the algorithm. As such, each element is distributed by exactly $1$ \code{destination} object. In this test $N$ is held constant. The element type in this test is int. The code for this test may be found in Appendix \ref{sec-code-count-2b}. A notable characteristic of the results of this test is the number of assignment operations. While the overall results of the test show that the algorithm is linear with respect to $M$, the number of assignment operations (\code{iterator\_counter} + \code{value\_counter}) is constant with respect to $M$. This is because each element is distributed by exactly $1$ destination object while the size of the input range ($N$) is held constant. \paragraph{Test Objective} Here, we want to determine the order (asymptotic complexity) of the algorithm with respect to the number of \code{destination} objects ($M$) added to the \code{router} object. \paragraph{Results} \begin{figure} \begin{center} \includegraphics[width=5in]{opcount2b} \caption{Operation Counts for Test $2b$}\label{fig-count-2b} \end{center} \end{figure} The results of this test, as represented in Figure \ref{fig-count-2b}, show that the algorithm runs in linear time with respect to $M$. This typical configuration confirms the results of Operation Counting Test $2a$ as well as Timed Test $2b$. \subsection{Simple Version: Operation Counts vs. N} \subsubsection{Description} In this configuration, each element is distributed to exactly $1$ \code{destination}. This test uses the simple $2$-predicate function version of \code{distribute}. The \code{destination}s are set up so that the elements are distributed to exactly $1$ of the \code{destination}s. The element type in this test is \code{int}. The code for this test may be found in Appendix \ref{sec-code-count-3}. \subsubsection{Test Objective} We wish to determine the order of the algorithm with respect to the size of the input range ($N$) (where $N$ varies from $0$ to $1048576$). The input range is the number of elements in fed through the algorithm via an iterator range. \subsubsection{Results} \begin{figure} \begin{center} \includegraphics[width=5in]{opcount3} \caption{Operation Counts for Test $3$}\label{fig-count-3} \end{center} \end{figure} The results of this test, as represented in Figure \ref{fig-count-3} show that the algorithm runs in linear time with respect to $N$. This test is designed similarly to Test $1b$. In this test, however, each element is distributed to the first (of $2$) \code{destination}s. This test also confirms the results of Timed Test $3$. \subsection{Operation Counts vs. N and M} \paragraph{Description} To more accurately describe the behavior of the GDFA \code{router} functionality, we now vary both the number of \code{destination}s ($M$) and the size of the input to be filtered ($N$). This test runs the full \code{router} class using $M$ \code{destination}s plus $1$ default \code{destination} (where $M$ varies from $0$ to $16384$) and $N$ input objects (where $N$ varies from $1$ to $16384$). This configuration attempts to capture the trends that emerge as both $M$ and $N$ vary. To do this, we measure the number of total operations of the algorithm with respect to $M$ and $N$. The code for this test may be found in Appendix \ref{sec-code-count-4}. \paragraph{Test Objective} In this test we attempt to demonstrate that the behaviour of the \code{router} class has $O(MN)$ asymptotic complexity with respect to the number of operations performed given $M$ destinations and and input of range size $N$. \paragraph{Results} \begin{figure} \begin{center} \includegraphics[width=5in]{opcount4} \caption{Operation Counts for Test $4$}\label{fig-count-4} \end{center} \end{figure} The results of this test, as represented in Figure \ref{fig-count-4}, confirm by the fact that the graph is planer that the performance of the \code{router} class is as expected---$O(MN)$. \section{Correctness Tests} This section presents a large group of tests performed on the \code{router} class. The tests are constructed to validate the performance of the Generic Distribution and Filtering Algorithms. As such, the tests run the algorithm in various configurations using various built in and standard types. The output of the GDFA along with the input to the GDFA is given to a correctness valid method for that particular test. The correctness validate method performs checks, calculations and comparisons to determine if the output from the GDFA is correct for the given input. \subsection{Common Validation Methods} \label{sec-common-validation-methods} Many of the tests presented in this section use a common set of validate methods. (Other tests use validate methods specialised for those particular tests.) Code for the common set of validate methods is presented in Appendix \ref{sec-code-validate-common}. \subsubsection{Common Validation 1} The first validate method iterates through 2 groups of data and determines if they are identical or not. The tests for which they are used have a baseline data set. The baseline data set is the result of distributing the data using a previously validated configuration. The types and priorities used are different; however, the desired result is known to be the same. Therefore, this method can be used to validate configurations for which these relationships exist. The code for this method is located in Appendix \ref{sec-code-validate-first}. \subsubsection{Common Validation 2 (Size Validation)} The second validate method (validate size) works on an associative container of elements (\code{set}, \code{multiset}). The code for this method is located in Appendix \ref{sec-code-validate-second}. \subsection{Helper Classes} \label{sec-helper-classes} The tests presented in this section all instantiate \code{router} objects. Most of the tests also instantiate \code{destination} objects which are added to the \code{router} objects used in the tests. The \code{destination} objects are types derived from the \code{destination} base class provided as part of the GDFA. The tests presented here use $10$ different derived \code{destination} types. The following tests use the first set of helper classes (defined in Appendix \ref{sec-code-test_classes.hpp}): \begin{center} \begin{tabular}{|l|l|} \hline Test & Sub-test\\ \hline word\_sorter & (fall through)\\ length\_counter & (fall through)\\ contains\_string & (fall through)\\ contains\_letter & (fall through)\\ My\_default & (fall through)\\ \hline \end{tabular} \end{center} The following tests use the second set of helper classes (defined in Appendix \ref{sec-code-test_classes_fallthrough.hpp}): \begin{center} \begin{tabular}{|l|l|} \hline Test & Sub-test\\ \hline word\_sorter & (no fall through)\\ length\_counter & (no fall through)\\ contains\_striing & (no fall through)\\ contains\_letter & (no fall through)\\ my\_default & (no fall through)\\ word\_sorter\_map & (no fall through)\\ my\_default\_map & (no fall through)\\ \hline \end{tabular} \end{center} The \code{word\_sorter} classes distribute words that begin with a particular letter. The \code{length\_counter} classes distribute words of a particular length. The \code{contains\_string} classes distribute words that contain a particular letter. The \code{contains\_letter} classes distribute words that contain a particular letter. The \code{my\_default} classes distribute all words. It is to be used as the default destination. The classes that are designated as ``fall through'' will always pass the element (word) on to the next destination handled by the router. The classes that are designated as ``no fall through'' will not pass the element (word) on to the next destination handled by the router if the element fits its distributing requirements. In other words, a ``no fall through'' destination object will either distribute the element or pass it on the next destination object handled by the router, but not both. The classes that are designated as ``map" (such as \code{word\_sorter\_map} and \code{my\_default\_map}) are specialised versions which works with pairs. These are needed for distributing from an \code{std::map}. \subsection{Test Files} Many of the tests presented in this section perform operations on an input range containing \code{char}s or \code{std::string}s. The algorithms that are tested require access to the input range via \code{InputIterator}s. The following tests all require a text input file to be used as the source of \code{char}s and \code{std::string}s: \begin{itemize} \item \code{nodestinations} \item \code{noinput} \item \code{testreadcont} \item \code{testarray} \item \code{test\_inserters} \item \code{test\_char\_istream} \item \code{test\_string\_istream} \item \code{testrwfiles} \end{itemize} This file is redirected to standard input in all cases, except for \code{testrwfile.cpp}. In the \code{testrwfile.cpp} test, the file is indicated as a command-line argument. These tests are run on the text file \cite{dickens}. Some of the tests presented in this section perform operations on an input range containing numbers. The following tests all require an input file of numbers to be used as a data source: \begin{itemize} \item \code{test\_int\_istream} \item \code{test\_double\_istream} \end{itemize} These files are generated by the following test helper programmes: \begin{itemize} \item \code{genints} \item \code{gendoubles} \end{itemize} These programmes are all found in Appendix \ref{sec-code-correctness}. \subsection{Testing with Built-In Types} \subsubsection{Test Objective} This test is designed to check that the GDFA works properly when computing on an input range of type \code{char}, \code{std::string}, \code{int}, and \code{double}. This set is comprised of $4$ tests, $1$ for each type. \subsubsection{Description of Tests} All of the tests described in this section use an \code{std::vector} as the input container. Therefore, the use of a \code{std::vector} and its associated interator is also tested. \paragraph{\code{char} Type Test} In the first test, a \code{router} object operates on a range of \code{char}. The \code{router} is set up to sort the range of \code{char}s into a set of containers. There are $27$ containers: one for each letter of the alphabet and a default container for all other characters. Each of these containers is associated with a \code{destination} object via an iterator. The desired output of the test is known based on the purposeful construction of the input range. An input range of $27000$ characters is generated. The range consists of $1000$ instances of each letter of the alphabet as well a $1000$ instances of the character `\{'. Once the input is generated, it is shuffled using the \code{std::random\_shuffle} function. Then the test is run. After testing, each container should contain (and only contain) $1000$ instances of the character associated with that container. The validate method for this test checks the number of elements in each container and then makes sure that there are no misplaced characters. It checks for misplaced characters by iterating through each container to check that only the correct character is in that container. The code for this test may be found in Appendix \ref{sec-code-test-char}. \paragraph{\code{std::string} Type Test} The second test uses the word sorter helper class (described in Section \ref{sec-helper-classes}) to sort \code{std::string}s in the input range based on its first character (insensitive to case). As in the above test, the validating of this test is dependent on the purposeful construction of the data in the input range. An input range consisting of $3$-letter words (character sequences) is generated. The sequence consists of every $3$ letter combination of the letters of the $26$-character English alphabet as well as the non-alphabetical character `\{'. This results in $27^3$ input strings and $27^2$ strings beginning with each character. The generated data is shuffled using the \code{std::random\_shuffle} function. Then the test is run. After testing, the validation test iterates through each container to check that each string in the container does in fact start with the correct letter and that each container has $27^2$ strings. The code for this test may be found in Appendix \ref{sec-code-test-string}. \paragraph{\code{int} Type Test} The third test uses a \code{destination} called range checker. This \code{destination} distributes elements based on a value range. For this test, $10$ range checker destination objects are instantiated. Each checks a sub-range in the range $[0,1000000)$. The sub-ranges are equal in size and are constructed as follows: \begin{itemize} \item $(0,100000)$ \item $(100000,200000)$ \item $(200000,300000)$ \item $(300000,400000)$ \item $(400000,500000)$ \item $(500000,600000)$ \item $(600000,700000)$ \item $(700000,800000)$ \item $(800000,900000)$ \item $(900000,1000000)$ \end{itemize} Like the above test, the validating of this test is dependent on the purposeful construction of the data in the input range. For this test data for the input range are generated, consisting of \code{int}s in the range $[0,1000000)$. The input data are shuffled using the \code{std::random\_shuffle} function. Based on the construction of the destination ranges and the overall range of the input data, after distribution, there should be $99999$ elements in each range and $10$ elements that get caught by the default case $(0,100000,200000, \ldots , 900000)$. Therefore, the validate method for this test checks the size of the respective resultant containers. The validate method also checks the containers to be sure that all elements contained in them fall within their respective ranges. This is quickly accomplished by using the \code{std::min\_element} and \code{std::max\_element} functions from the Standard Library. Type \code{int} works in the same way as type \code{long} and type \code{short}. In fact, on most computers, \code{int}s are equivalent either to \code{short} or to \code{long}. The range of values for an \code{int} is at least the same as the range for \code{short int}s and no larger than the range for \code{long}s. Therefore, it is assumed that if GDFA works properly with \code{ints}, it will also work properly with these other similar types. The code for this test may be found in Appendix \ref{sec-code-test-int}. \paragraph{\code{double} Type Test} The forth test also uses a destination called range checker to distributes elements based on a specified value range. For this test, $10$ range checker destination objects are instantiated. Each checks a sub-range in the range $[0,1)$. The sub-ranges are equal in size and are constructed as follows: \begin{itemize} \item $(0.0,0.1)$ \item $(0.1,0.2)$ \item $(0.2,0.3)$ \item $(0.3,0.4)$ \item $(0.4,0.5)$ \item $(0.5,0.6)$ \item $(0.6,0.7)$ \item $(0.7,0.8)$ \item $(0.8,0.9)$ \item $(0.9,1.0)$ \end{itemize} Like the above test, validation of this test is dependent on the purposeful construction of the data in the input range. For this test data for the input range is generated, consisting of integers in the range $[0,1)$. The input range is constructed starting with $0.0$. Each subsequent value is incremented by $0.000001$. The input data is shuffled using the \code{std::random\_shuffle} function. Based on the construction of the destination ranges and the overall range of the input data, after distribution, there should be $99999$ elements in each range and $10$ elements that get caught by the default case $(0.0, 0.1, 0.2, \ldots , 0.9)$. Therefore, the validate method for this test checks the size of the respective resultant containers. The validate method also checks the containers to be sure that all elements contained in them fall within their respective ranges. This is quickly accomplished by using the \code{std::min\_element} and \code{std::max\_element} functions from the Standard Library. Type \code{double} functions in the same way as type \code{float} (although the range and precision differ). Therefore, it is assumed that if GDFA works properly with type \code{double}, it will also work properly with type \code{float}. Note that floating point accuracy causes inconsistencies in the comparison of elements of type \code{double}. Therefore, the input generator, \code{destination}s, and \code{validate} method make use of a variable \code{epsilon} which is the precision that we use for distribution and validation purposes. Specifically, two \code{double}s, \code{one} and \code{two}, are considered to be the ``equal'' if \code{abs(one - two)