Performance Comparisons

Performance Comparison of Sequential and Parallel Fortran 77, Fortran 90 and C++ Programs

A small set of programs written in Fortran 77, Fortran 90 and C++ is compared on run-time performance. The Fortran 90 and C++ programs are object-oriented and are derived from the original Fortran 77 programs. Simulations have been developed in one, two and three dimensions. For illustration, we have selected test cases from a two-stream instability experiment.

The benchmark code is a plasma particle-in-cell (PIC) code based on the General Concurrent PIC algorithm [2]. The Fortran 77 codes have been well benchmarked [1]. The Fortran 90 and C++ versions [3,4,5] were derived from the original Fortran 77 codes.

IBM RS/6000 (AIX 4.1) Sequential Performance Comparison

Machine    Language      Compiler     Particles    Time (sec)

One-Dimensional Program
RS/6000    Fortran 77    IBM xlf        450,000        245.49
RS/6000    Fortran 90    IBM xlf90      450,000        364.25
RS/6000    C++           IBM xlC        450,000        508.00

Two-Dimensional Program
RS/6000    Fortran 90    IBM xlf90      327,680        526.71
RS/6000    Fortran 77    IBM xlf        327,680        549.23
RS/6000    C++           IBM xlC        327,680        667.00

Functions that access private data without inlining contributed to the Fortran 90 program's overhead in the one-dimensional case. A different object model, with better abstractions, allowed the Fortran 90 program to outperform the Fortran 77 and C++ versions in the two-dimensional case, as seen in the graph below.
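The inlining point can be illustrated with a minimal C++ sketch. It is not taken from the benchmarked codes: the class Particles, its members and the function mean_position are hypothetical names chosen for illustration. The inner loop makes one accessor call per particle, so if the compiler does not inline the accessor, every iteration pays a function call just to read one private value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical particle container (not from the benchmarked codes).
// Positions are private and can only be reached through accessors.
class Particles {
public:
    explicit Particles(std::size_t n) : x_(n, 0.0) {}

    std::size_t size() const { return x_.size(); }

    // Defined inside the class body, so the function is implicitly inline.
    // A compiler that inlines it can reduce the call to a plain load.
    double x(std::size_t i) const {
        assert(i < x_.size());  // debug-only bounds check, removed by NDEBUG
        return x_[i];
    }

    void set_x(std::size_t i, double value) {
        assert(i < x_.size());
        x_[i] = value;
    }

private:
    std::vector<double> x_;
};

// Inner loop with one accessor call per particle. Without inlining,
// each iteration incurs call overhead for a single array read.
double mean_position(const Particles& p) {
    double sum = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        sum += p.x(i);
    }
    return p.size() ? sum / static_cast<double>(p.size()) : 0.0;
}
```

Whether a given compiler actually inlines such an accessor depends on the compiler and its optimization level, so the cost is best confirmed by timing the loop rather than assumed.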

IBM SP2 Parallel Performance Comparison

The table below compares the performance of a two-dimensional parallel Fortran 90 program with the corresponding Fortran 77 and C++ programs. All use the MPI message-passing library.

Machine    PEs    Language      Compiler     Particles    Time (sec)

Two-Dimensional Program
SP2        32     Fortran 77    IBM xlf      3,571,712        159.08
SP2        32     Fortran 90    IBM xlf90    3,571,712        202.88
SP2        32     C++           IBM xlC      3,571,712        359.00

Two-Dimensional Program
SP2         4     Fortran 77    IBM xlf        327,680        114.31
SP2         4     Fortran 90    IBM xlf90      327,680        117.49
SP2         4     C++           IBM xlC        327,680        249.00

Much more extensive performance comparisons are available in the publications, including comparisons among various machines and compilers from additional vendors. A plot of the 32-processor experiment is shown below.

Performance results for a three-dimensional parallel Fortran 90 program using MPI are also available. Details of this work can be found in [5].

Machine    PEs    Language      Compiler     Particles    Time (sec)

Three-Dimensional Program
SP2        32     Fortran 77    IBM xlf90    7,962,624       1548.71
SP2        32     Fortran 77    IBM xlf      7,962,624       1550.14
SP2        32     Fortran 90    IBM xlf90    7,962,624       1339.91
SP2        32     C++           IBM xlC      7,962,624       2797.00

The Fortran 90 version outperformed the Fortran 77 versions because of improved cache utilization of the field components. The Fortran 90 (and C++) version encapsulates the components in a single derived type, whereas the Fortran 77 versions store the field elements in separate arrays.
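The two storage layouts can be sketched in C++ as follows. This is an illustration under stated assumptions rather than the benchmarked code: the names SeparateArrays, FieldPoint, GroupedField and the two sum_squares functions are hypothetical, and a three-component field of doubles is assumed. When particles visit grid cells in an irregular order, the grouped layout reads all components of a cell from one contiguous location, while the separate-array layout reads from three distant locations per cell.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Layout A (hypothetical): one array per field component, as in the
// separate-array storage described for the Fortran 77 versions.
struct SeparateArrays {
    std::vector<double> ex, ey, ez;
    explicit SeparateArrays(std::size_t n) : ex(n), ey(n), ez(n) {}
};

// Layout B (hypothetical): all components of one grid point stored
// together, analogous to a single derived type per grid point.
struct FieldPoint {
    double x, y, z;
};
using GroupedField = std::vector<FieldPoint>;

// Visit cells in the given (possibly irregular) order.
// Each visit reads from three different arrays.
double sum_squares_separate(const SeparateArrays& f,
                            const std::vector<std::size_t>& cells) {
    double total = 0.0;
    for (std::size_t c : cells) {
        assert(c < f.ex.size());
        total += f.ex[c] * f.ex[c] + f.ey[c] * f.ey[c] + f.ez[c] * f.ez[c];
    }
    return total;
}

// Same computation; each visit reads three adjacent values, which are
// likely to share a cache line.
double sum_squares_grouped(const GroupedField& f,
                           const std::vector<std::size_t>& cells) {
    double total = 0.0;
    for (std::size_t c : cells) {
        assert(c < f.size());
        const FieldPoint& p = f[c];
        total += p.x * p.x + p.y * p.y + p.z * p.z;
    }
    return total;
}
```

Both functions compute the same result; only the memory access pattern differs. Which layout is faster on a given machine depends on the cache sizes and the access order, so it should be measured.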

Comparison against the KAI Optimizing C++ Compiler

The chart below shows results for a 3D code on the Cornell SP, recently upgraded with P2SC chips. The C++ code was compiled with the KAI C++ compiler.

The most aggressive optimizations produced the fastest timings; these are the timings shown in the table. The KAI C++ compiler with K3 -O3 --abstract_pointer took over 2 hours to compile the code. The IBM F90 compiler with -O3 -qlanglvl=90std -qstrict -qalias=noaryovrlp took 5 minutes. (The KAI compiler generated faster executables than the IBM xlC C++ compiler.)

Times in yellow use the -qarch=pwr2 -qtune=pwr2 hardware optimization switches.

3D Parallel Plasma PIC Experiments - CPU Times for Various Compilers
(KAI C++, IBM F90, and IBM F77 with IBM MPI)

References

  1. Skeleton PIC Codes for Parallel Computers
    V. K. Decyk
    Computer Physics Communications, 87(1&2):87-94, May II, 1995.

  2. A General Concurrent Algorithm for Plasma Particle-in-Cell Simulation Codes
    P. C. Liewer and V. K. Decyk
    J. of Computational Physics, 85:302-322, 1989.

  3. On Parallel Object Oriented Programming in Fortran 90
    C. D. Norton, V. K. Decyk, and B. K. Szymanski
    ACM SIGAPP Applied Computing Review, 4(1):27-31, Spring 1996.

  4. Object Oriented Parallel Computation for Plasma Simulation
    C. Norton, B. Szymanski and V. Decyk
    Communications of the ACM, 38(10):88-100, Oct. 1995.

  5. High Performance Object-Oriented Programming in Fortran 90
    C. D. Norton, V. K. Decyk, and B. K. Szymanski
    To appear in Proc. Eighth SIAM Conference on Parallel Processing for Scientific Computing, March 14-17, 1997.