Performance Comparison of Data Structure Implementations in Fortran 90

A question often asked is: "What is the performance penalty of using Fortran 90 data abstraction features?" To address this issue, we collaborated with Gunnar Ledfelt of the Department of Numerical Analysis and Computing Science (NADA) at KTH, who was working on a computational electromagnetics code. We studied a series of benchmarks that explore abstraction techniques and their performance.

Comparison of F90 and F77 Compilers on F77 Style Code

In the first benchmark we compare the performance of each vendor's F90 and F77 compilers on standard F77 style loops and data structures. The test problem solves Maxwell's Equations using the Yee scheme.

In the F77 style version static arrays are used:

dimension Ex(n,n,n), Ey(n,n,n), Ez(n,n,n)
dimension Hx(n,n,n), Hy(n,n,n), Hz(n,n,n)

These arrays are passed to:

STANDARD: call update77(Ex,Ey,Ez,Hx,Hy,Hz,n)
Fortran 77 style loop on Fortran 77 data.

Fortran 77 Source Code For: Ledfelt Examples
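
As a rough illustration of what an F77 style loop on F77 data can look like, the sketch below applies the E-field half of a Yee-style step to static arrays. It is not the original update77 source: the routine name update_e_sketch, the coefficient c, the grid size, and the initial field are all assumptions made for this example.

```fortran
! Hypothetical sketch only: not the original update77 source.
program yee_f77_sketch
  implicit none
  integer, parameter :: n = 8
  real :: Ex(n,n,n), Ey(n,n,n), Ez(n,n,n)
  real :: Hx(n,n,n), Hy(n,n,n), Hz(n,n,n)
  integer :: j

  Ex = 0.0; Ey = 0.0; Ez = 0.0
  Hx = 0.0; Hy = 0.0
  ! Let Hz vary linearly in y so that curl(H) is nonzero
  do j = 1, n
    Hz(:,j,:) = real(j)
  end do

  call update_e_sketch(Ex, Ey, Ez, Hx, Hy, Hz, n)
  print *, 'max |Ex| after one E update:', maxval(abs(Ex))
end program yee_f77_sketch

! E-field half of a Yee-style step: E = E + c*curl(H),
! using backward differences on the interior points.
subroutine update_e_sketch(Ex, Ey, Ez, Hx, Hy, Hz, n)
  implicit none
  integer :: n, i, j, k
  real :: Ex(n,n,n), Ey(n,n,n), Ez(n,n,n)
  real :: Hx(n,n,n), Hy(n,n,n), Hz(n,n,n)
  real, parameter :: c = 0.5   ! assumed time-step coefficient (arbitrary)

  do k = 2, n
    do j = 2, n
      do i = 2, n
        Ex(i,j,k) = Ex(i,j,k) + c*((Hz(i,j,k) - Hz(i,j-1,k)) &
                                 - (Hy(i,j,k) - Hy(i,j,k-1)))
        Ey(i,j,k) = Ey(i,j,k) + c*((Hx(i,j,k) - Hx(i,j,k-1)) &
                                 - (Hz(i,j,k) - Hz(i-1,j,k)))
        Ez(i,j,k) = Ez(i,j,k) + c*((Hy(i,j,k) - Hy(i-1,j,k)) &
                                 - (Hx(i,j,k) - Hx(i,j-1,k)))
      end do
    end do
  end do
end subroutine update_e_sketch
```

The dummy arguments are explicit-shape arrays, so the routine sees plain contiguous storage however the caller declared the data.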

In the F90 version, allocatable arrays are used:

integer, parameter :: rfp=kind(0e0)
real(kind=rfp), dimension(:,:,:), allocatable :: Ex, Ey, Ez, Hx, Hy, Hz

These arrays are passed to:

F90: call update90(Ex,Ey,Ez,Hx,Hy,Hz,n)
Fortran 77 style loop on Fortran 90 data.

Fortran 90 Source Code For: Ledfelt Examples
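
The following sketch shows one way allocatable arrays can be created at run time and then handed to a routine that treats them as ordinary explicit-shape arrays. It uses only two of the six fields, and the routine name add_dhz_dy, the size n = 8, and the initial values are hypothetical choices for illustration, not taken from the benchmark source.

```fortran
! Hypothetical sketch: F77 style loop applied to F90 allocatable data.
program alloc_sketch
  implicit none
  integer, parameter :: rfp = kind(0e0)
  real(kind=rfp), dimension(:,:,:), allocatable :: Ex, Hz
  integer :: n, j, ierr

  n = 8   ! size chosen at run time (arbitrary here)
  allocate(Ex(n,n,n), Hz(n,n,n), stat=ierr)
  if (ierr /= 0) stop 'allocation failed'

  Ex = 0.0_rfp
  do j = 1, n
    Hz(:,j,:) = real(j, kind=rfp)
  end do

  call add_dhz_dy(Ex, Hz, n)
  print *, 'max |Ex|:', maxval(abs(Ex))

  deallocate(Ex, Hz)

contains

  ! Adds the backward difference of H in the y direction to E.
  subroutine add_dhz_dy(E, H, m)
    integer, intent(in) :: m
    real(kind=rfp), intent(inout) :: E(m,m,m)
    real(kind=rfp), intent(in)    :: H(m,m,m)
    integer :: i, j, k

    do k = 1, m
      do j = 2, m
        do i = 1, m
          E(i,j,k) = E(i,j,k) + (H(i,j,k) - H(i,j-1,k))
        end do
      end do
    end do
  end subroutine add_dhz_dy

end program alloc_sketch
```

Inside the routine nothing distinguishes allocatable data from static data, which is why identical performance is the natural expectation.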

We expect the performance of these examples to be identical:


Vendor	F90/F77 Performance Ratio
SUN	1.80
IBM	1.00
HP/Convex	5.75
DEC	0.96
Cray T3E	1.07
SGI Origin 2000	1.08
Apple Absoft Mac	1.07

The table shows that the HP (5.75) and SUN (1.80) F90 compilers have substantial performance problems, while the other ratios are within 8% of 1.00.
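
A ratio of this kind could be measured with a harness along the following lines. This is a minimal sketch under stated assumptions: it treats the ratio as elapsed time on allocatable data divided by elapsed time on static data, uses a stand-in kernel rather than the Yee update, and the names time_kernel and kernel, the grid size, and the repetition count are all hypothetical. It also ignores possible wrap-around of the system clock counter.

```fortran
! Hypothetical timing harness; kernel and sizes are illustrative only.
program ratio_sketch
  implicit none
  integer, parameter :: n = 64, nrep = 50
  real :: a_static(n,n,n), b_static(n,n,n)              ! F77 style data
  real, allocatable :: a_alloc(:,:,:), b_alloc(:,:,:)   ! F90 style data
  real :: t_static, t_alloc

  allocate(a_alloc(n,n,n), b_alloc(n,n,n))
  a_static = 0.0; b_static = 1.0
  a_alloc  = 0.0; b_alloc  = 1.0

  t_static = time_kernel(a_static, b_static)
  t_alloc  = time_kernel(a_alloc,  b_alloc)

  ! Print checksums so the compiler cannot discard the work
  print *, 'checksums:', sum(a_static), sum(a_alloc)
  if (t_static > 0.0) then
    print *, 'allocatable/static time ratio:', t_alloc / t_static
  else
    print *, 'run too short to time; increase n or nrep'
  end if
  deallocate(a_alloc, b_alloc)

contains

  ! Returns the elapsed seconds for nrep calls of the kernel.
  function time_kernel(a, b) result(seconds)
    real, intent(inout) :: a(n,n,n)
    real, intent(in)    :: b(n,n,n)
    real :: seconds
    integer :: c0, c1, rate, r

    call system_clock(c0, rate)
    do r = 1, nrep
      call kernel(a, b)
    end do
    call system_clock(c1)
    seconds = 0.0
    if (rate > 0) seconds = real(c1 - c0) / real(rate)
  end function time_kernel

  ! Stand-in F77 style loop (not the benchmark's update routine).
  subroutine kernel(a, b)
    real, intent(inout) :: a(n,n,n)
    real, intent(in)    :: b(n,n,n)
    integer :: i, j, k

    do k = 1, n
      do j = 2, n
        do i = 1, n
          a(i,j,k) = a(i,j,k) + 0.5*(b(i,j,k) - b(i,j-1,k))
        end do
      end do
    end do
  end subroutine kernel

end program ratio_sketch
```

Timings this short are noisy, so a real measurement would repeat the run several times and use a problem size large enough to dominate the clock resolution.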

Abstract Data Type with Array Pointers

One useful abstract data type consists of bundling together multiple arrays in a single type. This simplifies argument passing for arrays which logically belong together.

In the second benchmark, we create a data structure which consists of six pointers to 3D arrays of reals that represent the electromagnetic field in Maxwell's Equations.

integer, parameter :: rfp=kind(0e0)
type adt_type
    real(kind=rfp), dimension(:,:,:), pointer :: Ex, Ey, Ez, Hx, Hy, Hz
end type adt_type
type (adt_type) :: adt

After initialization, this abstract type is passed to:

ADT: call adt_update(adt,n)
Pointer components are referenced directly within the procedure.
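
To make "referenced directly" concrete, here is a minimal sketch of how such a type might be initialized and updated. The module name, adt_init_sketch, adt_update_sketch, the coefficient c, and the initial field are hypothetical, and only the Ex update is shown; the original adt_update may differ.

```fortran
! Hypothetical sketch: pointer components used directly in the loop.
module adt_sketch_mod
  implicit none
  integer, parameter :: rfp = kind(0e0)
  type adt_type
    real(kind=rfp), dimension(:,:,:), pointer :: Ex, Ey, Ez, Hx, Hy, Hz
  end type adt_type

contains

  ! Allocates the six fields and gives them simple initial values.
  subroutine adt_init_sketch(adt, n)
    type (adt_type), intent(inout) :: adt
    integer, intent(in) :: n
    integer :: j

    allocate(adt%Ex(n,n,n), adt%Ey(n,n,n), adt%Ez(n,n,n))
    allocate(adt%Hx(n,n,n), adt%Hy(n,n,n), adt%Hz(n,n,n))
    adt%Ex = 0.0_rfp; adt%Ey = 0.0_rfp; adt%Ez = 0.0_rfp
    adt%Hx = 0.0_rfp; adt%Hy = 0.0_rfp
    do j = 1, n
      adt%Hz(:,j,:) = real(j, kind=rfp)
    end do
  end subroutine adt_init_sketch

  ! Every array reference in the loop goes through a pointer component.
  subroutine adt_update_sketch(adt, n)
    type (adt_type), intent(inout) :: adt
    integer, intent(in) :: n
    real(kind=rfp), parameter :: c = 0.5_rfp   ! assumed coefficient
    integer :: i, j, k

    do k = 2, n
      do j = 2, n
        do i = 2, n
          adt%Ex(i,j,k) = adt%Ex(i,j,k) &
                        + c*((adt%Hz(i,j,k) - adt%Hz(i,j-1,k)) &
                           - (adt%Hy(i,j,k) - adt%Hy(i,j,k-1)))
        end do
      end do
    end do
  end subroutine adt_update_sketch

end module adt_sketch_mod

program adt_sketch
  use adt_sketch_mod
  implicit none
  type (adt_type) :: adt
  integer, parameter :: n = 8

  call adt_init_sketch(adt, n)
  call adt_update_sketch(adt, n)
  print *, 'max |Ex|:', maxval(abs(adt%Ex))
end program adt_sketch
```

The caller now passes a single argument instead of six arrays, which is the convenience this abstraction is meant to provide.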

The table below compares the performance of the abstract data type (ADT) procedure with that of the Fortran 77 (STANDARD) procedure update77.

In all cases we observe a performance penalty when using pointers to arrays within a derived type.

Some performance loss with these ADTs is expected because of pointer aliasing (multiple pointers referring to the same memory location). The compiler cannot always determine whether two different pointer references refer to the same memory location, and must therefore produce safer, slower code. With Fortran 77 style array arguments, the standard forbids a procedure from modifying arguments that overlap, which allows the compiler to produce faster code. The same issue arises in the C programming language, where excessive use of pointers can degrade performance due to aliasing.
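
The aliasing hazard can be seen in a small, hypothetical example that is not part of the benchmarks. Two pointers are attached to overlapping sections of one array, so a store through p changes a value that is later read through q.

```fortran
! Hypothetical illustration of pointer aliasing (not benchmark code).
program alias_sketch
  implicit none
  real, target  :: field(5)
  real, pointer :: p(:), q(:)
  integer :: i

  field = (/ 1.0, 2.0, 3.0, 4.0, 5.0 /)
  p => field(2:5)   ! p(i) is field(i+1)
  q => field(1:4)   ! q(i) is field(i), so q(i+1) and p(i) share storage

  ! Each store to p(i) changes the value read as q(i+1) next iteration.
  do i = 1, 4
    p(i) = p(i) + q(i)
  end do

  ! Executed in order this gives 1 3 6 10 15. Had the loop used the
  ! old q values (as if p and q were disjoint) it would give 1 3 5 7 9.
  print *, field
  if (any(field /= (/ 1.0, 3.0, 6.0, 10.0, 15.0 /))) stop 1
end program alias_sketch
```

Because the compiler generally cannot rule out this kind of overlap for pointer components, it has to preserve the strict order of loads and stores, which limits optimizations such as vectorization and keeping values in registers.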

However, with some compilers (especially HP at 11.5 and Absoft/Mac at 3.13) the degradation is very large.


Vendor	ADT/Standard Performance Ratio
SUN	1.80
IBM	2.04
HP/Convex	11.5
DEC	1.71
Cray T3E	1.41
SGI Origin 2000	1.57
Apple Absoft Mac	3.13

The next Fortran standard (then called Fortran 2000, later published as Fortran 2003) introduces a language feature to overcome the aliasing problem: derived types can contain allocatable arrays instead of pointers.
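
With a compiler that supports allocatable components, the same bundle of fields could be declared roughly as follows. This is a sketch only: the names field_mod and field_type and the size n = 8 are hypothetical, and the code will not compile with a strict Fortran 90 or Fortran 95 compiler.

```fortran
! Hypothetical sketch: allocatable components instead of pointers.
! Requires a compiler supporting allocatable derived-type components.
module field_mod
  implicit none
  integer, parameter :: rfp = kind(0e0)
  type field_type
    real(kind=rfp), dimension(:,:,:), allocatable :: Ex, Ey, Ez, Hx, Hy, Hz
  end type field_type
end module field_mod

program alloc_component_sketch
  use field_mod
  implicit none
  type (field_type) :: f
  integer, parameter :: n = 8

  allocate(f%Ex(n,n,n), f%Ey(n,n,n), f%Ez(n,n,n))
  allocate(f%Hx(n,n,n), f%Hy(n,n,n), f%Hz(n,n,n))
  f%Ex = 0.0_rfp; f%Ey = 0.0_rfp; f%Ez = 0.0_rfp
  f%Hx = 0.0_rfp; f%Hy = 0.0_rfp; f%Hz = 1.0_rfp

  print *, 'Ex allocated:', allocated(f%Ex)
  print *, 'Ex shape:', shape(f%Ex)

  deallocate(f%Ex, f%Ey, f%Ez, f%Hx, f%Hy, f%Hz)
end program alloc_component_sketch
```

Each allocatable component owns its storage and cannot be associated with another component's memory, so the compiler can treat the six fields as distinct.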

Layering with Abstract Data Types

One way to overcome the aliasing problem, while retaining the simple user interface, is to construct a layer. The ADT is passed to the layer, but its components are dereferenced within the layer. With this approach the compiler can infer that the ADT components do not overlap. The layer approach should perform as well as the Fortran 77 approach.

ADT_LAYER:
    subroutine adt_layer(adt,n)
        ! Pass the components so update90 sees ordinary array arguments
        call update90(adt%Ex,adt%Ey,adt%Ez,adt%Hx,adt%Hy,adt%Hz,n)
    end subroutine adt_layer
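
A more complete version of the layering idea might look like the sketch below. The names layer_sketch_mod, adt_layer_sketch, and kernel_sketch are hypothetical, the kernel updates only Ex for brevity, and the coefficient and initial values are arbitrary assumptions; the original update90 is not reproduced here.

```fortran
! Hypothetical sketch of the layer: the ADT is unpacked once, and the
! inner routine works on explicit-shape array arguments.
module layer_sketch_mod
  implicit none
  integer, parameter :: rfp = kind(0e0)
  type adt_type
    real(kind=rfp), dimension(:,:,:), pointer :: Ex, Ey, Ez, Hx, Hy, Hz
  end type adt_type

contains

  ! The layer: user code passes the ADT, the kernel receives arrays.
  subroutine adt_layer_sketch(adt, n)
    type (adt_type), intent(inout) :: adt
    integer, intent(in) :: n

    call kernel_sketch(adt%Ex, adt%Ey, adt%Ez, adt%Hx, adt%Hy, adt%Hz, n)
  end subroutine adt_layer_sketch

  ! Dummy arguments are not pointers, so the compiler may assume that
  ! the arrays modified here do not overlap the others.
  subroutine kernel_sketch(Ex, Ey, Ez, Hx, Hy, Hz, n)
    integer, intent(in) :: n
    real(kind=rfp), intent(inout) :: Ex(n,n,n), Ey(n,n,n), Ez(n,n,n)
    real(kind=rfp), intent(in)    :: Hx(n,n,n), Hy(n,n,n), Hz(n,n,n)
    real(kind=rfp), parameter :: c = 0.5_rfp   ! assumed coefficient
    integer :: i, j, k

    ! Only Ex is updated here; Ey and Ez would follow the same pattern.
    do k = 2, n
      do j = 2, n
        do i = 2, n
          Ex(i,j,k) = Ex(i,j,k) + c*((Hz(i,j,k) - Hz(i,j-1,k)) &
                                   - (Hy(i,j,k) - Hy(i,j,k-1)))
        end do
      end do
    end do
  end subroutine kernel_sketch

end module layer_sketch_mod

program layer_sketch
  use layer_sketch_mod
  implicit none
  type (adt_type) :: adt
  integer, parameter :: n = 8
  integer :: j

  allocate(adt%Ex(n,n,n), adt%Ey(n,n,n), adt%Ez(n,n,n))
  allocate(adt%Hx(n,n,n), adt%Hy(n,n,n), adt%Hz(n,n,n))
  adt%Ex = 0.0_rfp; adt%Ey = 0.0_rfp; adt%Ez = 0.0_rfp
  adt%Hx = 0.0_rfp; adt%Hy = 0.0_rfp
  do j = 1, n
    adt%Hz(:,j,:) = real(j, kind=rfp)
  end do

  call adt_layer_sketch(adt, n)
  print *, 'max |Ex|:', maxval(abs(adt%Ex))

  deallocate(adt%Ex, adt%Ey, adt%Ez, adt%Hx, adt%Hy, adt%Hz)
end program layer_sketch
```

The cost of the layer is one extra procedure call per update, which is negligible next to the triple loop it wraps.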

The table below compares the layered version (ADT_LAYER), which passes the components of the abstract data type, with the Fortran 77 (STANDARD) procedure update77.

Most compilers performed as expected with the layer, but SUN, HP, and DEC did not.


Vendor	ADT_Layer/Standard Performance Ratio
SUN	3.00
IBM	1.00
HP/Convex	3.28
DEC	*Core Dump*
Cray T3E	1.07
SGI Origin 2000	1.12
Apple Absoft Mac	1.06

Conclusions

The results vary dramatically by compiler. The penalty for using ADTs directly was substantial (more than 40% in every case measured). The layering approach worked well most of the time, and was almost always better than using ADTs directly. (The DEC compiler produced a core dump when the ADT_LAYER example was run, so no results are available for that case.)

Notes

The compiler options used were:


SUN f77 -O3 (version 4.0)
SUN f90 -O3 (version 1.1)
IBM xlf -O3 -qarch=pwr2 (version 3.2.5)
IBM xlf90 -O3 -qarch=pwr2 (version 3.2.5)
HP/Convex f77 -O (version 1.2.6)
HP/Convex f90 -O3 (version 2.0)
DEC f77 -O3 (version 5.1-156)
DEC f90 -O3 (version 5.1-594)
Cray T3E f90 -O3 (version 3.2.0)
SGI Origin f77 -O3 -IPA (version 7.2.1.3m)
SGI Origin f90 -O3 -IPA (version 7.2.1.2m)
Absoft/Mac f77 -O -Q92 (version Pro Fortran 6.0)
Absoft/Mac f90 -O -Q92 (version Pro Fortran 6.0)

* = Non-standard array size.