The benchmark code is a plasma particle-in-cell (PIC) code based on the General Concurrent PIC (GCPIC) algorithm. The Fortran 77 codes have been well benchmarked. The Fortran 90 and C++ [3,4,5] versions were developed from the original Fortran 77 codes.
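For readers new to PIC, the particle side of one time step can be sketched as follows: charge is deposited from the particles onto a grid, and the grid field is interpolated back to the particles to push them. This is a minimal serial 1D sketch, assuming linear (cloud-in-cell) weighting, a periodic grid, and a field already produced by some solver. The names `Particles`, `deposit`, and `push` are hypothetical and do not come from the benchmark codes, and the GCPIC decomposition across processors is omitted.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical 1D particle container (layout is an assumption).
struct Particles {
    std::vector<double> x;  // positions in [0, length)
    std::vector<double> v;  // velocities
};

// Deposit charge q per particle onto a periodic grid of nx cells of
// width dx, using linear (cloud-in-cell) weighting.
std::vector<double> deposit(const Particles& p, std::size_t nx, double dx, double q) {
    std::vector<double> rho(nx, 0.0);
    for (std::size_t k = 0; k < p.x.size(); ++k) {
        double s = p.x[k] / dx;                   // position in grid units
        std::size_t j = static_cast<std::size_t>(std::floor(s));
        double w = s - std::floor(s);             // fractional offset from node j
        rho[j % nx] += q * (1.0 - w);
        rho[(j + 1) % nx] += q * w;               // periodic wrap
    }
    return rho;
}

// Interpolate the grid field e to each particle with the same linear
// weights, then advance v and x by one leapfrog step of size dt.
void push(Particles& p, const std::vector<double>& e, double dx,
          double qm, double dt, double length) {
    const std::size_t nx = e.size();
    for (std::size_t k = 0; k < p.x.size(); ++k) {
        double s = p.x[k] / dx;
        std::size_t j = static_cast<std::size_t>(std::floor(s));
        double w = s - std::floor(s);
        double ek = e[j % nx] * (1.0 - w) + e[(j + 1) % nx] * w;
        p.v[k] += qm * ek * dt;                   // accelerate
        p.x[k] += p.v[k] * dt;                    // move
        p.x[k] = std::fmod(p.x[k], length);       // periodic boundary
        if (p.x[k] < 0.0) p.x[k] += length;
    }
}
```

Because each particle's charge is split between two nodes with weights that sum to one, the total deposited charge always equals the number of particles times q, which is a convenient sanity check.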
| Machine | Language | Compiler | Particles | Time |
|---|---|---|---|---|
| RS/6000 | Fortran 77 | IBM xlf | 450,000 | 245.49 |
| RS/6000 | Fortran 90 | IBM xlf90 | 450,000 | 364.25 |
| RS/6000 | Fortran 90 | IBM xlf90 | 327,680 | 526.71 |
| RS/6000 | Fortran 77 | IBM xlf | 327,680 | 549.23 |
In the one-dimensional program, calls to functions that access private data, which were not inlined, contributed to the Fortran 90 overhead. A different object model with better abstractions allowed the Fortran 90 program to outperform the Fortran 77 and C++ versions in the two-dimensional case, as seen in the graph below.
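The contrast between the two object models can be illustrated, in C++ rather than Fortran 90, by comparing per-element accessor calls with an operation defined on the whole particle array. This is a hypothetical sketch assuming a simple drift update x += v*dt; the class and method names are invented, and the source describes the actual object models only in the sentences above. In Fortran 90 terms, the first class corresponds roughly to a module that exposes private components through accessor procedures.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-element design: data is private and each element is
// reached through an accessor call. If the compiler does not inline
// these calls, every particle update pays function-call overhead.
class ParticleByElement {
public:
    explicit ParticleByElement(std::size_t n) : x_(n, 0.0), v_(n, 0.0) {}
    std::size_t size() const { return x_.size(); }
    double position(std::size_t i) const { return x_[i]; }
    double velocity(std::size_t i) const { return v_[i]; }
    void set_position(std::size_t i, double x) { x_[i] = x; }
    void set_velocity(std::size_t i, double v) { v_[i] = v; }
private:
    std::vector<double> x_, v_;
};

// Client loop: three accessor calls per particle per step.
void advance(ParticleByElement& p, double dt) {
    for (std::size_t i = 0; i < p.size(); ++i)
        p.set_position(i, p.position(i) + p.velocity(i) * dt);
}

// Hypothetical whole-array design: the update is a method of the
// species, so the loop runs inside the class directly on the private
// arrays and needs no per-element call across the abstraction.
class ParticleSpecies {
public:
    explicit ParticleSpecies(std::size_t n) : x_(n, 0.0), v_(n, 0.0) {}
    void set_velocity(std::size_t i, double v) { v_[i] = v; }
    double position(std::size_t i) const { return x_[i]; }
    void advance(double dt) {
        for (std::size_t i = 0; i < x_.size(); ++i) x_[i] += v_[i] * dt;
    }
private:
    std::vector<double> x_, v_;
};
```

Both designs compute the same result. C++ member functions defined inside the class body are implicitly inline, so a modern optimizer will usually remove the accessor calls; the sketch is meant to show where the cost appears when that inlining does not happen, as in the situation described above.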
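| Machine | Processors | Language | Compiler | Particles | Time |
|---|---|---|---|---|---|
| SP2 | 32 | Fortran 77 | IBM xlf | 3,571,712 | 159.08 |
| SP2 | 32 | Fortran 90 | IBM xlf90 | 3,571,712 | 202.88 |
| SP2 | 4 | Fortran 77 | IBM xlf | 327,680 | 114.31 |
| SP2 | 4 | Fortran 90 | IBM xlf90 | 327,680 | 117.49 |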
Much more extensive performance comparisons, including other machines and compilers from additional vendors, are available in the publications. A plot of the 32-processor experiment is shown below.
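| Machine | Processors | Language | Compiler | Particles | Time |
|---|---|---|---|---|---|
| SP2 | 32 | Fortran 77 | IBM xlf90 | 7,962,624 | 1548.71 |
| SP2 | 32 | Fortran 77 | IBM xlf | 7,962,624 | 1550.14 |
| SP2 | 32 | Fortran 90 | IBM xlf90 | 7,962,624 | 1339.91 |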
The Fortran 90 version outperformed the Fortran 77 versions because of better cache utilization of the field components. The Fortran 90 (and C++) versions encapsulate the field components in a single derived type, whereas the Fortran 77 version stores each component in a separate array.
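The two layouts can be sketched in C++, where a Fortran 90 derived type maps naturally onto a struct. This is a hypothetical illustration assuming three field components per grid point and a gather over a data-dependent list of grid indices, as happens when particles read the field at their own cells; the type and function names are invented and do not reflect the benchmark code's actual declarations.

```cpp
#include <cstddef>
#include <vector>

// Fortran 77-style layout: each component in its own array, so the
// three components of one grid point live in three separate regions.
struct FieldSeparateArrays {
    std::vector<float> ex, ey, ez;
    explicit FieldSeparateArrays(std::size_t n) : ex(n), ey(n), ez(n) {}
};

// Derived-type-style layout: the components of one grid point are
// stored together as one contiguous record.
struct FieldComponents { float ex, ey, ez; };
using FieldGrouped = std::vector<FieldComponents>;

// Sum |E|^2 over the grid points listed in idx (e.g. cells occupied by
// particles, visited in a data-dependent order).
double energy_at(const FieldSeparateArrays& f, const std::vector<std::size_t>& idx) {
    double s = 0.0;
    for (std::size_t j : idx)  // three loads from three different arrays
        s += double(f.ex[j]) * f.ex[j] + double(f.ey[j]) * f.ey[j] +
             double(f.ez[j]) * f.ez[j];
    return s;
}

double energy_at(const FieldGrouped& f, const std::vector<std::size_t>& idx) {
    double s = 0.0;
    for (std::size_t j : idx) {  // three loads from one record
        const FieldComponents& e = f[j];
        s += double(e.ex) * e.ex + double(e.ey) * e.ey + double(e.ez) * e.ez;
    }
    return s;
}
```

In the grouped layout the three components of a grid point are adjacent and usually fall in the same cache line, so one miss brings in all of them; with separate arrays each component can cost its own miss. Loops that touch only one component at a time can favor the separate-array layout instead, so the better choice depends on the access pattern.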
The most aggressive optimizations produced the fastest timings, and these are the ones shown in the table. With the options K3 -O3 --abstract_pointer, the KAI C++ compiler spent over two hours compiling, while the IBM F90 compiler with -O3 -qlanglvl=90std -qstrict -qalias=noaryovrlp took 5 minutes. (The KAI compiler generated faster executables than the IBM xlC C++ compiler.)
3D Parallel Plasma PIC Experiments: CPU Times for Various (KAI C++, IBM F90, and IBM F77 with IBM MPI)