CSCI4380 Spring 2004
Project 3: Sort-Merge Join
Due: Friday April 30th at 11:59pm

Introduction

In this assignment, you will implement the sort-merge join algorithm. This is a group project.

Available Documentation

You should begin by reading the chapter on Implementation of Relational Operations, in particular, the section on Sort-Merge Join.

What You Have to Implement

class sortMerge {
  public:

    sortMerge(
      char       *filename1,    // Name of heapfile for relation R.

      int         len_in1,      // # of columns in R.

      AttrType    in1[],        // Array containing field types of R.

      short       t1_str_sizes[], // Array containing size of columns in R.

      int         join_col_in1, // The join column number of R.

      char       *filename2,    // Name of heapfile for relation S

      int         len_in2,      // # of columns in S.

      AttrType    in2[],        // Array containing field types of S.

      short       t2_str_sizes[], // Array containing size of columns in S.

      int         join_col_in2, // The join column number of S.

      char*       filename3,    // Name of heapfile for merged results

      int         amt_of_mem,   // Number of pages available for sorting

      TupleOrder  order,        // Sorting order: Ascending or Descending

      Status&     s             // Status of constructor

    );
   ~sortMerge();
};
The sortMerge constructor joins two relations R and S, represented by the heapfiles filename1 and filename2, respectively, using the sort-merge join algorithm. Note that the columns for relation R (S) are numbered from 0 to len_in1 - 1 (len_in2 - 1). You are to concatenate each matching pair of records and write it into the heapfile filename3.

You will need to use the following classes which are given: Sort, HeapFile, and Scan. You will call the Sort constructor to sort the input heapfiles (which means your primary responsibility will be to implement the merging phase of the algorithm). To compare the join columns of two tuples, you will call the function tupleCmp (declared in sort.h). Once a scan is opened on a heapfile, the scan cursor can be positioned to any record within the heapfile calling the Scan method position with an RID argument. The next call to the Scan method getNext will proceed from the new cursor position.

Where to Find Makefiles, Code, etc.

Copy all the hw6.tar.gz files into your own local directory. The main files are:
You will also find in the project directory the implentation of the external sort algorithm in the files sort.C and sort.h and the interface to the Scan and HeapFile classes in the include directory.