CompOrg Spring 2005 Lab 13

Cache Architecture

All the files for this lab are located on d29-170.dyn.cs.rpi.edu in ~hollingd/public/lab13.


Cache Simulator Program

The program cachesim.pl does some simple caching simulations. The program allows you to set some memory/caching parameters via the command line arguments:

Note that the simulator insists that cachesize, blocksize and memorysize are all powers of 2 and that the entire design makes sense (it won't let you use a block size larger than the memory!).

You can also turn on "verbose" mode by including the -v flag on the command line. For example, to start up the simulator on a simulated system that includes 2K main memory, 256 byte cache, block size 16 bytes, direct mapped block placement and with lots of messages printed out:

code/cachesim.pl -m 2k -s 256 -b 16 -a 1 -v
Simulated Cache Parameters:
  Main memory is 2048 bytes (11 bit addresses)
  Total cache size is 256 bytes
  Block size is 16 bytes
  Number of cache slots is 16
  Direct Mapped
  Block offset mask is 00000001111
  Index mask is        00011110000
  Block number mask is 11100000000

With the verbose flag turned on you see the output shown above, this includes information about the simulated memory system/cache including the masks that define which part of an address correspond to the offset, index and block address.

There are two general mode of operation: you can use it to interactively follow what happens during a sequence of memory references. To do this simply start up the simulator and then type in a sequence of addresses (one per line) in decimal. The addresses you enter need to fall within the range of possible addresses, for example in the example above the possible addresses are 0 - 2047.If you have the verbose flag turned on then each time you enter a new address you see information about that address including what set/slot it will be in (or be put in if it's a miss) and whether the memory access is a hit or a miss.

You can type the command 'show' and the simulator will display the current contents of the cache. The slot ages displayed are used by the simulator to use a least-recently-used displacement policy (determined who is kicked out of the cache in a set-associative cache). You can also type 'help' to get a list of valid commands. This interactive mode is for exploring what happens during a sequence of memory accesses - you can see every change to the cache as it happens.

Below is a sample session using the simulator in this interactive mode:

~/public/vlab  ./cachesim.pl -m 256 -s 32  -b 4 -a 1 -v
Simulated Cache Parameters:
  Main memory is 256 bytes (8 bit addresses)
  Total cache size is 32 bytes
  Block size is 4 bytes
  Number of cache slots is 8
  Direct Mapped
  Block offset mask is 00000011
  Index mask is        00011100
  Block number mask is 11100000
show cache
  Slot: 0        Tag: -----      Age: --
  Slot: 1        Tag: -----      Age: --
  Slot: 2        Tag: -----      Age: --
  Slot: 3        Tag: -----      Age: --
  Slot: 4        Tag: -----      Age: --
  Slot: 5        Tag: -----      Age: --
  Slot: 6        Tag: -----      Age: --
  Slot: 7        Tag: -----      Age: --
13
Address: 00001101  block 0 (000) set 3 (011) offset 1 (01)  MISS
44
Address: 00101100  block 1 (001) set 3 (011) offset 0 (00)  MISS
0
Address: 00000000  block 0 (000) set 0 (000) offset 0 (00)  MISS
show cache
  Slot: 0        Tag:     0      Age: 0         Last Address 0
  Slot: 1        Tag: -----      Age: --
  Slot: 2        Tag: -----      Age: --
  Slot: 3        Tag:     1      Age: 0         Last Address 44
  Slot: 4        Tag: -----      Age: --
  Slot: 5        Tag: -----      Age: --
  Slot: 6        Tag: -----      Age: --
  Slot: 7        Tag: -----      Age: --
22
Address: 00010110  block 0 (000) set 5 (101) offset 2 (10)  MISS
17
Address: 00010001  block 0 (000) set 4 (100) offset 1 (01)  MISS
2
Address: 00000010  block 0 (000) set 0 (000) offset 2 (10)  HIT
1
Address: 00000001  block 0 (000) set 0 (000) offset 1 (01)  HIT
0
Address: 00000000  block 0 (000) set 0 (000) offset 0 (00)  HIT
2
Address: 00000010  block 0 (000) set 0 (000) offset 2 (10)  HIT
show
  Slot: 0        Tag:     0      Age: 0         Last Address 2
  Slot: 1        Tag: -----      Age: --
  Slot: 2        Tag: -----      Age: --
  Slot: 3        Tag:     1      Age: 0         Last Address 44
  Slot: 4        Tag:     0      Age: 0         Last Address 17
  Slot: 5        Tag:     0      Age: 0         Last Address 22
  Slot: 6        Tag: -----      Age: --
  Slot: 7        Tag: -----      Age: --
quit

The other way to use the program is to generate a sequence of memory accesses that correspond to a real program. For example, you could write a C program that outputs the address of each array element at the time the element is accessed, then feed this sequence to the simulator and find out the resulting hit rate. The simulator simply reads commands one line at a time from STDIN, so if you arrange for this to be connected to the output of your program you can easily measure the hit rate for a fixed program on a variety of cache designs.

For example, the following code generates the same memory access pattern that a program that steps through an array of char in stride-1 would generate:

s1char.c
#include <stdio.h>

// stride-1 access pattern for an array of size 100

int main() {
  int i;
  for (i=0;i<100;i++) {
	printf("%d\n",i);
  }
}

Compiling an running the code with the output piped to the simulator looks like this:

./s1char | ./cachesim.pl -m 256 -s 32 -b 4 -a 1
hit rate was 75.0000%

If we change the block size to be larger, we will benefit from the support for spatial locality:

 ./s1char | ./cachesim.pl -m 256 -s 32 -b 16 -a 1
hit rate was 93.0000%

You can run the simulator as ~hollingd/public/vlab/cachesim.pl or you can just make a copy of the program in your own directory:

cp ~hollingd/public/vlab/cachesim.pl .


Direct Mapped Cache Designs

Modify the programs we used in lab #12 (transpose.c and mm.c) to output addresses so that they can be used with the simulator. The simulator is written in Perl and is pretty slow, so you should use a SIZE that is not too big (50, 100, ?) and run a few loops (but not millions!). You need to have these programs output the address of each array element instead of actually accessing the array element.

Run the programs with various direct mapped cache designs and see if how well you can do (maximize the hit rate). Keep in mind that you can modify block size while keeping cachesize fixed (if you modify both at the same time you don't really know which change effects the resulting hit rate!).

Set-Associative Mapped Cache Designs

Try to improve on your best cache design by playing with 2-way and 4-way set associative caches. Once again, don't change two cache design parameters at the same time!