CompOrg Fall 2005 Lab

Lab #11 - C and Assembly Optimization

11/30/2005

All the files for this lab are located in ~hollingd/public/lab12.


Using time command and performance tuning

The C program shown below calls the function count_to_n with a count specified as a command line parameter. You also specify the number of times that count_to_n is called, for example, to run the program and tell it to call count_to_n(10000000) 100 times:

> ./cnt 10000000 100

The program does not generate any output.

cnt.c
#include <stdio.h>

int count_to_n(int n) {
  int i;
  int tot;
  for (i=0;i<n;i++) {
	tot++;
  }
  return(tot);
}


int main(int argc, char **argv) {
  int n,times;
  int i;

  if (argc<3) {
	printf("Usage: cnt n number_of_times\n");
	exit(1);
  }

  n = atoi(argv[1]);
  times = atoi(argv[2]);

  for (i=0;i<times;i++) {
	count_to_n(n);
  }
  return(1);
}

You will use the time command to measure how long the program runs. The time command will run a program for you and keep track of how long it takes the program to run. The time command will output three times:

Below is sample use of the time command on the above program:

> gcc -o cnt cnt.c
> time ./cnt 10000000 100
 
real    0m12.600s
user    0m11.122s
sys     0m0.039s

As shown, this program took 11.122 seconds when running with the parameters shown (counts to 10,000,000 100 times).

Your job is to make some measurements of this program using the time command, then apply some optimizations and see how fast you can get it to run.

  1. Try modifying the command line parameters to the cnt program to determine what the overhead of the function call contributes to the total execution time. For example, if you switch the parameters the program will count to 100 10,000,000 times (same total number of counts, but lots more subroutine calls).

  2. The function count_to_n doesn't do much, but your jobs is to speed it up as much as possible by changing the C code (using compiler optimizations not allowed!). Use the parameters 10000000 100 to make your measurements.

  3. Beat this time (on monica.cs.rpi.edu): 5 seconds (C), 4.3 seconds (Assembly)

  4. Get serious! Generate assembly code (gcc -S cnt.c), see what is happening at the assembly level, and now see how fast you can get the program by modifying the assembly code for count_to_n. Your code must actually have the loop (and do all the actual work involved, you can't just return n)

  5. Compile the original source with optimizations turned on: gcc -O3 -o cnt cnt.c. Run (and time) this version. Go back to previous step and try harder! (or look at the code generated by gcc when optimization is turned on, and yell "it cheats!").


Using time to measure various operations.

Use the time command to make some measurements that can be used to compare the relative time of various integer and floating point operations. Below is some code to get you started (it supports only integer addition and subtraction). You need to make measurements of the following operations:

Your results should be relative (how much slower is division that addition), absolute times are not important. Don't trust any timings less than 1 second (modify parameters to make sure the program takes at least one second to run).

ops.c
#include <stdio.h>

int main(int argc, char **argv) {
  int times,i;
  int a,b,c;
  char op;

  if (argc<3) {
	printf("Usage: ops operation number_of_times\n");
	printf("   valid operations: + - \n");
	exit(1);
  }

  op = argv[1][0];
  times = atoi(argv[2]);

  switch(op) {
  case '+':
	for (i=0;i<times;i++)  a = b + c; break;
  case '-':
	for (i=0;i<times;i++)  a = b - c; break;
  default:
	printf("invalid operation. Must be + or - \n");
  }
  return(1);
}
  

Using gprof

This exercise involves using gprof to profile a program (so that you can see where the program can be improved). Profiling using gprof requires that you use a special option when compiling (-pg), this includes profiling code in your application that will create a data file named gmon.out when you run the program. gmon.out includes information about how many times each function got called, and how much time was spent in each function. The gprof program presents this information in a human readable format.

You can find complete documentation on using gprof at gnu.org gprof documentation. Below is shown a sample session to get you started.

> gcc -pg -o cnt cnt.c
>  ./cnt 1000000 100
>  gprof -b -T cnt
 
granularity: each sample hit covers 4 byte(s) for 4.35% of 0.23
seconds
 
                                  called/total       parents
index  %time    self descendants  called+self    name           index
                                  called/total       children
 
                0.23        0.00     100/100         main [2]
[1]    100.0    0.23        0.00     100         count_to_n [1]
 
-----------------------------------------------
 
                                                     <spontaneous>
[2]    100.0    0.00        0.23                 main [2]
                0.23        0.00     100/100         count_to_n [1]
 
-----------------------------------------------
 

 
granularity: each sample hit covers 4 byte(s) for 4.35% of 0.23
seconds
 
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
100.0       0.23     0.23      100     2.30     2.30  count_to_n [1]

Index by function name
 
   [1] count_to_n

Your job is to profile the code shown below that searches a string for a substring. There are a things you need to know before you can get this to work:

Here is the code you need to profile:

substring.c
#include <stdio.h>

/* substring locates a substring sub in a string s.
   If the substring is not found, it returns -1.
   If the substring is found, it returns the index in s
   where the substring is found.

   strncmp(s1,s2,n) compares the first n characters of two strings
   and returns 0 if they match.
*/
int substring( char *s, char *sub) {
  int i;

  for (i=0;i<strlen(s);i++) {
	if (strncmp(s+i,sub,strlen(sub))==0) 
	  return(i);
  }
  return(-1);
}

int main(int argc, char **argv) {
  int pos;

  if (argc<3) {
	printf("Usage: substring string sub\n");
	exit(1);
  }
  printf("Looking for %s in %s\n",argv[2],argv[1]);
  pos = substring(argv[1],argv[2]);
  if (pos==-1) 
	printf("Not found\n");
  else
	printf("found at position %d\n",pos);
  return(1);
}

Once you have profiling working, start trying to optimize the substring function and see how much faster you can make it for your test case(s). Profile the new code and find out where the time is spent as well!