
CSCI.1200 Computer Science II
Spring, 2001
Worksheet 10
The copy constructor and overloading the assignment operator

Reading: Deitel & Deitel, Sections 8.8, 7.1, 7.5

Alert: This is the hardest worksheet. Read it slowly and ponder every sentence.

It often happens that you want to create a new object and initialize it with the same values as another instance of the class which was created previously. This calls for a special constructor called a copy constructor.

class someclass {
...
};
int main()
{
   someclass A(...);
   someclass B(A);  // use of the copy constructor
   ...
}

C++ automatically provides a default copy constructor which simply copies all data members to the new instance of the class from the existing one (i.e., from A to B). For many classes this is exactly what you want.

A very similar situation arises with the assignment operator if you wish to assign to one instance of the class the values of another instance of the class.

     int main()
     {
        ...
        someclass A(...);
        ...
        someclass B;
        B = A;
        ...
     }

As with the copy constructor, C++ automatically overloads the assignment operator = for new classes, and its action is to copy the member values of the object on the right side of the operator to those on the left side.

We have seen that there are problems when you use the = operator on strings or other pointers when you really wanted to use the strcpy function or the equivalent. Similar problems arise when you use the default copy constructor and assignment operator for classes which use pointers. Here is an example, found in /dept/cs/cs2/download/ex1bug.cpp:

#include <iostream>
#include <cstring>                  // for strlen and strcpy
using namespace std;
class simple {
 private:
   char *data;
 public:
   simple() {data = NULL;}          // default constructor
   simple(const char *s) {          // constructor
     data = new char[strlen(s)+1];
     strcpy (data, s);
   }
   ~simple() {                      // destructor
     if (data != NULL)
        delete [] data;
   }
   char *getdata() {return data;}
   void setdata(const char *s) {
     if (data != NULL)              // free up memory from previous string,
        delete [] data;             //   if there was one
     data = new char[strlen(s)+1];  // allocate new memory
     strcpy(data,s);
   }
};

int main()
{
    simple A("Mickey Mouse");
    simple B;
    B = A;
    simple C(A);
    cout << A.getdata() << endl;
    cout << B.getdata() << endl;
    cout << C.getdata() << endl;
    A.setdata("Donald Duck");
    cout << A.getdata() << endl;    // should print Donald Duck
    cout << B.getdata() << endl;    // should print Mickey Mouse
    cout << C.getdata() << endl;    // should print Mickey Mouse
    return 0;
}

Study this code closely; there is a very subtle error in it which will plague you throughout your programming career if you do not understand it. The program creates three instances of the class simple, called A, B, and C. A is initialized using a constructor, B is set using the assignment operator, and C is initialized using the copy constructor. Note that there is no code to overload the assignment operator (=) and there is no code for a copy constructor. In both cases the defaults are used. The problem is that this code does not behave properly. The value of data is changed from Mickey Mouse to Donald Duck in object A, but this has the unwanted effect of changing the value seen through all three objects. In addition, the program crashes in the destructor function for object B at the end of main. (It is reported in Visual Studio as a "Debug Assertion Error".) Here is the output of this program before it crashes:

Mickey Mouse
Mickey Mouse
Mickey Mouse
Donald Duck
Donald Duck
Donald Duck

The reason this program does not work properly is that the default assignment operator and the default copy constructor set the value of data in instances B and C by copying the pointer itself, which means that data is pointing to the same memory address in all three objects. When the call to A.setdata occurs, that memory is deleted and new memory is allocated. On our systems it just happens that the same memory which formerly held Mickey Mouse is reallocated to hold Donald Duck. Objects B and C are still pointing to this memory even though it no longer belongs to them; on another system the last two lines of output could just as easily be garbage. The following diagram represents what has happened.

[Figure: memory diagram, captioned "after call to A.setdata()"]

At the end of main, the destructor is called for each of the three objects. Since they all point to the same memory, only the first destructor called will succeed in deleting the memory. When the second destructor call tries to delete the same memory, the crash occurs.

These copies of instance A are called shallow copies: only the pointer values themselves were duplicated. What is needed here is a deep copy, so that each new instance receives its own copies of the dynamically allocated data members.

C++ allows you to write your own copy constructor and to overload the assignment operator for classes if you need to in order to correct this problem. For this example, adding the following two functions to the simple class will cause the program to behave correctly.

class simple {
...
  simple(const simple &copy) {           // copy constructor
    if (copy.data != NULL) {
      data = new char[strlen(copy.data)+1];
      strcpy(data,copy.data);
    } else
      data = NULL;
  }
                                         // rhs stands for "right hand side"
  simple& operator=(const simple &rhs) {  // overload assignment operator
    if (this != &rhs) {                  // do not copy to yourself
      if (data!=NULL) delete [] data;    // free up memory if needed
      if (rhs.data != NULL) {            // copy string from rhs if it exists
        data = new char[strlen(rhs.data)+1];
        strcpy(data,rhs.data);
      } else
        data = NULL;
    }
    return *this;                        // always end with this line
  }
...

The corrected program is in /dept/cs/cs2/download/ex1fix.cpp. The output and the corresponding memory diagrams now look like this:

Mickey Mouse
Mickey Mouse
Mickey Mouse
Donald Duck
Mickey Mouse
Mickey Mouse

[Figure: memory diagram, captioned "after call to A.setdata()"]

Copy constructors and const declarations

Notice that the parameter for both the copy constructor and the overloaded = operator is a const reference (&) parameter. A copy constructor must take its argument by reference. If its parameter were passed by value instead, the program would have to make a copy of the object for use inside the function, and making that copy would itself require a call to the copy constructor, which would need another copy, and so on without end. The keyword const means that the function is not permitted to change the values of any data members of the object copy being passed in.

When an object is passed into a function as a const reference parameter, any statements which could change the values stored in that object are caught as errors. We clearly don't want a copy constructor to change the values stored in the object being copied. For example, if you add the statement copy.data = NULL; to the copy constructor above, you will get a compiler error. In Visual Studio 6, the error will read:
error C2166: l-value specifies const object

What if you call a member function of a const reference object? How can the compiler be sure that the other function won't change the object's values? For example, what if we changed the simple copy constructor to look like this:

  simple(const simple &x)  {
    if (x.data != NULL) {
      data = new char[strlen(x.getdata())+1];
      strcpy (data, x.getdata());
    } else
      data = NULL;
  }

The compiler can't guarantee that the call to getdata won't change the values of object x, so it gives an error message. The Visual Studio 6 message reads:

error C2662: 'getdata' : cannot convert 'this' pointer from 'const class simple' to 'class simple &'

To tell the compiler that a member function will not change the object calling it, you must add the const keyword to that function's prototype and definition. In this case, we have to change the getdata member function to the following:
char * getdata() const {return data;}

The addition of the keyword const after the parameter list is how you specify that the function will not modify the object.

Assignment operators and the this pointer

The code for an overloaded assignment operator should contain all of the lines in the copy constructor, but it has a few more things to do because it is operating on an object that has already been instantiated. (A copy constructor, in contrast, is creating a new instance.)

First check to make sure that the program is not trying to assign an object to itself, e.g., the statement B = B; How can this be checked? In C++, the keyword this in any class member function is a pointer to the object on which the function has been invoked. The first line of the assignment operator checks to see if this and the address of the object being passed in (&rhs) are the same memory location. If they are, the function skips the rest of its statements until the return statement, because nothing is necessary to make an object equal to itself.
The second task is to check if any pointers in the object on the left hand side of the = are pointing to memory that needs to be recovered using delete. This test prevents memory leaks.
Next, copy each of the members of the object, just as is done for the copy constructor. You should be able to reuse the same code.
Finally, return a reference to the object that has been assigned. This is accomplished by making the return type of the function a reference to an object of the class, and by having the last statement of the function be return *this;. Why do we need to do this? The following short example shows two cases where this feature is used:

1  int x, y, z;
2  x = y = z = 3;     
3  while (x=y) {
4     cout << x << ' ';
5     y--;
6  }

The = operator associates right to left (most other operators associate left to right). This means that the rightmost assignment is done first, and the return value is then used as the right operand for the next operation. You can see this in line 2. The operation z = 3 is done first. This operation returns 3, so this value is then assigned to y. This second assignment also returns 3, and this value is assigned to x.

In line 3, the expression inside the parentheses looks like it is wrong, but in fact it is assigning the value of y to x, and the resulting value is tested as the while loop condition. Thus, this is a strange way of looping until y has the value zero (remember that zero means false and nonzero means true). This code would print

3 2 1

and after the loop, both x and y would have the value zero.

Exercise 1: Write code for a class Bag which starts out as follows:

class Bag {
private:
   double *thedata[100]; // note that this is an array of pointers
   int size;             // counter of how many items are stored in the bag
public:
   Bag();
   Bag(const Bag&);      // copy constructor
   Bag& operator=(const Bag &);  // assignment operator
   int BagSize();        // returns the number of elements in the bag
   void Insert(double);  // inserts a new element into the bag
   void PrintBag();      // prints each element in the bag
   ~Bag();               // a destructor
};

A main program has been provided for you in /dept/cs/cs2/wksht10/bagmain.cpp.

Implicit calls to the copy constructor and destructor

Even if there are no statements in your code which explicitly call a copy constructor or a destructor for a class, these are implicitly called whenever an instance of a class is passed by value into a function. This is because a function call makes a copy of whatever data is passed when the data is call-by-value (the default), and for a class, this is done by calling the copy constructor for the class. Likewise, when a function terminates, the destructor is called for all of its local variables and by-value parameters.

Consider this program, which you can find in /dept/cs/cs2/download/ex2bug.cpp:

#include <iostream>
#include <cstring>                  // for strlen and strcpy
using namespace std;
class buggyclass {
private:
   char *aString;
public:
  buggyclass() {aString = NULL;}
  buggyclass(const char *s) {
     aString = new char[strlen(s)+1];
     strcpy(aString,s);
  }
  buggyclass(const buggyclass& b)            // copy constructor, identical to
  {                                          // default provided by C++
    cout << " In copy constructor " << endl;
    aString = b.aString;                     // what's wrong with this?
  }
  ~buggyclass() {
    cout << " In buggyclass destructor, deleting memory at address " << 
      (int)aString << endl;
     if (aString != NULL) delete [] aString;
  }

  char *getstring() {            // eventually needs to be a const function
      char *retval = new char[strlen(aString)+1];
      strcpy(retval, aString);
      return retval;
  }
};

void afunction(buggyclass b) 
{
    cout << "  In afunction, b is " << b.getstring() << endl;
}

int main()
{
   buggyclass rpi("Rensselaer");  // create an instance of the class
   cout << "Before first call" << endl;
   afunction(rpi);  
   cout << "Before second call" << endl;
   afunction(rpi);
   cout << "About to exit" << endl;
   return 0;
}

Note that nowhere in the program is either the copy constructor or the destructor for the class buggyclass ever explicitly called. If you compile and run the program on Visual Studio 6, it will crash. You will see output that looks like this:

Before first call
 In copy constructor 
  In afunction, b is Rensselaer
 In buggyclass destructor, deleting memory at address 264288
Before second call
 In copy constructor 
  In afunction, b is Rensselaer
 In buggyclass destructor, deleting memory at address 264288
<DEBUG ASSERTION ERROR>

If this program worked correctly, it would call the copy constructor twice, at the beginning of each function call when a new copy of the object rpi is made. It would call the destructor three times, at the termination of each function call and at the termination of main. As it is written, it attempts to delete the same memory location twice, resulting in a crash.

There are two ways to avoid and correct these problems, and it is important to understand both of them. You will often need to use both in your programs.

Method 1: Write the copy constructor correctly for the class buggyclass, since it uses pointers. (While you're at it, write a correct assignment operator for the class as well.)

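  buggyclass(const buggyclass& b)  // copy constructor
  {
    cout << " In copy constructor " << endl;
     if (b.aString != NULL) {
       // Copy from b.aString directly. Calling b.getstring() here would
       // leak the new array it returns, and getstring is not yet const.
       aString = new char[strlen(b.aString)+1];
       strcpy(aString,b.aString);
     } else
       aString = NULL;
  }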

Method 2: When you pass an instance of a class to a function, pass it as a const reference parameter. In this example, the function header for afunction would be changed to the following:
void afunction(const buggyclass& b)
When objects are passed in this way, no copy of the class instance needs to be made. Remember, though, that some member functions will need to be changed to const functions (such as getstring in this example). Passing objects by reference also gives better performance, since the function call does not create and then destroy a temporary object.

  void afunction(const buggyclass& b) 
  {
     cout << "  In afunction, b is " << b.getstring() << endl;
  }

The Moral of the Story

If you write a class which uses any pointers, it should always include the following member functions:

copy constructor
overloaded assignment operator
destructor

In addition, any member functions that manipulate pointers (such as setdata in the first example) must avoid memory leaks by properly deleting memory which is no longer being used.

If your class does not use any pointers, do not write these functions. Rely on the ones built into the language.

Pass objects to functions as const & unless you need to be able to change the object being passed in.