Project

The project consists of implementing several algorithms for analysis of Java. It is written by Nasko Rountev at Ohio State University. The implementation itself will be in Java. All work should be done on one of CS UNIX machines. Use tcsh (i.e., tc shell), and add the following lines to your .cshrc:

setenv JAVAVERSION 1.2.2
setenv CLASSPATH .':'/projects/proganalysis/SEII-project/soot-1.0.0/soot/classes':'/projects/proganalysis/SEII-project/soot-1.0.0/jasmin/classes

For this project we will use Soot — a framework for analysis and optimization of Java bytecode developed at McGill University. All your code will be written on top of Soot. Successive project phases will introduce aspect of Soot that are relevant to the goals of the project.

Phase 1: due March 7

The first phase of the project aims at introducing the internal representation used by Soot. Given the bytecode for some Java classes (A.class, B.class, etc), Soot creates its own representation referred to as JIMPLE. The easiest way to think about JIMPLE is as "simplified Java". Conceptually, the complexity of JIMPLE is somewhere between Java source code (A.java) and Java bytecode (A.class). All analyses in this project will process JIMPLE as input. This approach has several advantages: In this phase you will learn more about JIMPLE. I have prepared several simple Java programs for which you will have to generate JIMPLE and examine it. This will be a warm-up for the next project phase, in which you will implement an algorithm that analyzes the JIMPLE for these programs.

Detailed instructions for phase 1 are available here.

Phase 2: due March 28

In this phase of the project you will implement Class Hierarchy Analysis (CHA) on top of Soot. Your code will traverse the JIMPLE representation and will implement the algorithm for CHA presented in class. A large part of this algorithm is already coded; you just have to finish it.

The CHA implementation consists of five Java classes: ChaMain, Loader, ChaAnalysis, ChaWriter, and Hierarchy. To invoke this implementation, you have to use a Perl script similar to the one from phase 1. As output, the analysis creates several text files in the appropriate directory - for example, if you are analyzing p3, the files are stored in phase2/p3. These files contain information about the call graph computed by CHA.

At this point, the 5 Java classes (ChaMain, ...) compile and run successfully on the 7 programs from phase 1. However, since the implementation is incomplete, the computed call graph info is incorrect. You have to finish the implementation and make sure that the output for the 7 programs is correct, using the results from the manual simulation of CHA from phase 1.

Detailed instructions for phase 2 are available here.

Testing: For testing, you may want to have a look at the results from phase 1

Phase 3: due April 8

In this phase of the project you will insert instrumentation in the JIMPLE code. This instrumentation will keep track of the run-time behavior of the program (e.g., which methods are called, and from which call sites). Such information is essential for performing coverage analysis of test suites. Only coverage for the toy programs will be measured at this point.

As before, most of the code for this phase is already in place; your job is to finish the implementation. To get started, please do the following: