

Compiling Programs

Note: the procedures described in this section are only available when the compiler is loaded, as in the `compiler.com' world image. Furthermore, cf is only available on machines that support native-code compilation.

Compilation Procedures

procedure+: cf filename [destination]
This is the program that transforms a source-code file into native-code binary form. If destination is not given, as in

(cf "foo")

cf compiles the file `foo.scm', producing the file `foo.com' (incidentally it will also produce `foo.bin', `foo.bci', and possibly `foo.ext'). If you later evaluate

(load "foo")

`foo.com' will be loaded rather than `foo.scm'.

If destination is given, it says where the output files should go. If this argument is a directory, they go in that directory, e.g.:

(cf "foo" "../bar/")

will take `foo.scm' and generate the file `../bar/foo.com'. If destination is not a directory, it is the root name of the output:

(cf "foo" "bar")

takes `foo.scm' and generates `bar.com'.

About the `.bci' files: these files contain the debugging information that Scheme uses when you call debug to examine compiled code. When you load a `.com' file, Scheme remembers where it was loaded from, and when the debugger (or pp) looks at the compiled code from that file, it attempts to find the `.bci' file in the same directory from which the `.com' file was loaded. Thus it is a good idea to leave these files together.

`.bci' files are stored in a compressed format. The debugger has to uncompress the files when it looks at them, and on a slow machine this can take a noticeable time. The system takes steps to reduce the impact of this behavior: debugging information is cached in memory, and uncompressed versions of `.bci' files are kept around. The default behavior is that a temporary file is created and the `.bci' file is uncompressed into it. The temporary file is kept around for a while afterwards, and during that time, if the uncompressed `.bci' file is needed, the temporary file is used. Each such reference updates an `access time' that is associated with the temporary file. The garbage collector checks the access times of all such temporary files, and deletes any that have not been accessed in five minutes or more. All of the temporaries are deleted automatically when the Scheme process is killed.

Two other behaviors are available. One of them uncompresses the `.bci' file each time it is referenced, and the other uncompresses the `.bci' file and writes it back out as a `.bif' file. The `.bif' file remains after Scheme exits. The time interval and the behavior are controlled by variables.

variable+: *save-uncompressed-files?*
This variable affects what happens when `.bci' files are uncompressed. It allows a trade-off between performance and disk space. There are three possible values:

#f
The uncompressed versions of `.bci' files are never saved. Each time the information is needed the `.bci' file is uncompressed. This option requires the minimum amount of disk space and is the slowest.
automatic
Uncompressed versions of `.bci' files are kept as temporary files. The temporary files are deleted when Scheme exits, or earlier if they have not been used for a while. This is the default.
#t
The `.bci' files are uncompressed to permanent `.bif' files. These files remain on disk after Scheme exits, and are rather large -- about twice the size of the corresponding `.bci' files. If you choose this option and you are running out of disk space, you may delete the `.bif' files; they will be regenerated as needed.

variable+: *uncompressed-file-lifetime*
The minimum length of time that a temporary uncompressed version of a `.bci' file will stay on disk after it is last used. The time is in milliseconds; the default is `300000' (five minutes).

variable+: load-debugging-info-on-demand?
If this variable is `#f', then printing a compiled procedure will print the procedure's name only if the debugging information for that procedure is already loaded. Otherwise, it will force loading of the debugging information. The default value is `#f'.
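
For example, something like the following could be placed in an init file to lengthen the lifetime of the temporaries and to force debugging information to be loaded when procedures are printed (a sketch; these are ordinary variables assigned with set!):

(set! *uncompressed-file-lifetime* 600000)   ; ten minutes, in milliseconds
(set! load-debugging-info-on-demand? #t)     ; load debug info when printing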

procedure+: sf filename [destination]
sf is the program that transforms a source-code file into binary SCode form; it is used on machines that do not support native-code compilation. It performs numerous optimizations that can make your programs run considerably faster than unoptimized interpreted code. Also, the binary files that it generates load very quickly compared to source-code files.

The simplest way to use sf is just to say:

(sf filename)

This will cause your file to be transformed, and the resulting binary file to be written out with the same name, but with pathname type "bin". If you do not specify a pathname type on the input file, "scm" is assumed.

Like load, the first argument to sf may be a list of filenames rather than a single filename.

sf takes an optional second argument, which is the filename of the output file. If this argument is a directory, then the output file has its normal name but is put in that directory instead.
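
For example (the filenames here are hypothetical):

(sf "foo")              ; produces `foo.bin'
(sf "foo" "../lib/")    ; produces `../lib/foo.bin'
(sf '("foo" "bar"))     ; processes both files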

Declarations

Several declarations can be added to your programs to help cf and sf make them more efficient.

Standard Names

Normally, all files have a line

(declare (usual-integrations))

near their beginning, which tells the compiler that free variables whose names are defined in system-global-environment will not be shadowed by other definitions when the program is loaded. If you redefine some global name in your code, for example car, cdr, and cons, you should indicate it in the declaration:

(declare (usual-integrations car cdr cons))

You can obtain an alphabetically-sorted list of the names that the usual-integrations declaration affects by evaluating the following expression:

(eval '(sort (append usual-integrations/constant-names
                     usual-integrations/expansion-names)
             (lambda (x y)
               (string<=? (symbol->string x)
                          (symbol->string y))))
      (->environment '(scode-optimizer)))

In-line Coding

Another useful facility is the ability to in-line code procedure definitions. In fact, the compiler will perform full beta conversion, with automatic renaming, if you request it. Here are the relevant declarations:

declaration+: integrate name ...
The variables named must be defined in the same file as this declaration. Any reference to one of the named variables that appears in the same block as the declaration, or one of its descendant blocks, will be replaced by the corresponding binding's value expression.
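
For example, a variable bound to a constant can be integrated so that references to it compile as the literal value (a minimal sketch; the name tau is arbitrary):

(declare (integrate tau))

(define tau 6.283185307179586)

(define (circumference r)
  (* tau r))    ; compiles as (* 6.283185307179586 r)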

declaration+: integrate-operator name ...
Similar to the integrate declaration, except that it only substitutes for references that appear in the operator position of a combination. All other references are ignored.

declaration+: integrate-external filename
Causes the compiler to use the top-level integrations provided by filename. filename should not specify a file type, and the source-code file that it names must have been previously processed by the compiler.

If filename is a relative filename (the normal case), it is interpreted as being relative to the file in which the declaration appears. Thus if the declaration appears in file `/usr/cph/foo.scm', then the compiler looks for a file called `/usr/cph/filename.ext'.

Note: When the compiler finds top-level integrations, it collects them and outputs them into an auxiliary file with extension `.ext'. This `.ext' file is what the integrate-external declaration refers to.
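
For example, assuming `defs.scm' in the same directory has already been compiled (so that `defs.ext' exists), a client file might begin like this (the filenames are hypothetical):

(declare (usual-integrations))
(declare (integrate-external "defs"))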

Note that the most common use of this facility, in-line coding of procedure definitions, requires a somewhat complicated use of these declarations. Because this is so common, there is a special form, define-integrable, which is like define but performs the appropriate declarations. For example:

(define-integrable (foo-bar foo bar)
  (vector-ref (vector-ref foo bar) 3))

Here is how you do the same thing without this special form: there should be an integrate-operator declaration for the procedure's name, and (internal to the procedure's definition) an integrate declaration for each of the procedure's parameters, like this:

(declare (integrate-operator foo-bar))

(define foo-bar
  (lambda (foo bar)
    (declare (integrate foo bar))
    (vector-ref (vector-ref foo bar) 3)))

The reason for this complication is as follows: the integrate-operator declaration finds all the references to foo-bar and replaces them with the lambda expression from the definition. Then, the integrate declarations take effect because the combination in which the reference to foo-bar occurred supplies code that is substituted throughout the body of the procedure definition. For example:

(foo-bar (car baz) (cdr baz))

First use the integrate-operator declaration:

((lambda (foo bar)
   (declare (integrate foo bar))
   (vector-ref (vector-ref foo bar) 3))
 (car baz)
 (cdr baz))

Next use the internal integrate declaration:

((lambda (foo bar)
   (vector-ref (vector-ref (car baz) (cdr baz)) 3))
 (car baz)
 (cdr baz))

Next notice that the variables foo and bar are not used, and eliminate them:

((lambda ()
   (vector-ref (vector-ref (car baz) (cdr baz)) 3)))

Finally, remove the ((lambda () ...)) to produce:

(vector-ref (vector-ref (car baz) (cdr baz)) 3)

Useful tip

To see the effect of integration declarations (and of macros) on a source file, pretty-print the `.bin' file like this (be prepared for a lot of output):

(sf "foo.scm")
(pp (fasload "foo.bin"))

Operator Replacement

The replace-operator declaration is provided to inform the compiler that certain operators may be replaced by other operators depending on the number of arguments. For example:

Declaration:

(declare (replace-operator (map (2 map-2) (3 map-3))))

Replacements:

(map f x y z) ==> (map f x y z)
(map f x y) ==> (map-3 f x y)
(map f x) ==> (map-2 f x)
(map f) ==> (map f)
(map) ==> (map)

Presumably map-2 and map-3 are efficient versions of map that are written for exactly two and three arguments respectively. All the other cases are not expanded, but are handled by the original, general map procedure, which is less efficient because it must handle a variable number of arguments.
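
For illustration, here is one way map-2 and map-3 might be written (a sketch; these procedures are not supplied by the system, and the declaration only replaces the calls):

(define (map-2 f lst)
  (if (pair? lst)
      (cons (f (car lst)) (map-2 f (cdr lst)))
      '()))

(define (map-3 f lst1 lst2)
  (if (and (pair? lst1) (pair? lst2))
      (cons (f (car lst1) (car lst2))
            (map-3 f (cdr lst1) (cdr lst2)))
      '()))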

declaration+: replace-operator name ...

The syntax of this declaration is

(replace-operator
  (name
    (nargs1 value1)
    (nargs2 value2)
    ...))

where name is a symbol naming the operator to be replaced, each nargsN is a number of arguments, and each valueN names the operator to substitute for name in calls with exactly nargsN arguments. Calls whose argument count matches none of the nargsN entries are left unchanged.

Operator Reduction

The reduce-operator declaration is provided to inform the compiler that certain names are n-ary versions of binary operators. Here are some examples:

Declaration:

(declare (reduce-operator (cons* cons)))

Replacements:

(cons* x y z w) ==> (cons x (cons y (cons z w)))
(cons* x y) ==> (cons x y)
(cons* x) ==> x
(cons*) error--> too few arguments

Declaration:

(declare (reduce-operator (list cons (null-value '() any))))

Replacements:

(list x y z w) ==> (cons x (cons y (cons z (cons w '()))))
(list x y) ==> (cons x (cons y '()))
(list x) ==> (cons x '())
(list) ==> '()

Declaration:

(declare (reduce-operator (- %- (null-value 0 single) (group left))))

Replacements:

(- x y z w) ==> (%- (%- (%- x y) z) w)
(- x y) ==> (%- x y)
(- x) ==> (%- 0 x)
(-) ==> 0

Declaration:

(declare (reduce-operator (+ %+ (null-value 0 none) (group right))))

Replacements:

(+ x y z w) ==> (%+ x (%+ y (%+ z w)))
(+ x y) ==> (%+ x y)
(+ x) ==> x
(+) ==> 0

Note: This declaration does not cause an appropriate definition of %+ (in the last example) to appear in your code. It merely informs the compiler that certain optimizations can be performed on calls to + by replacing them with calls to %+. You should provide a definition of %+ as well, although it is not required.
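
For example, a minimal definition that would satisfy the declaration above (a sketch; %+ is not supplied by the system):

(define %+ +)    ; %+ need only handle exactly two arguments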

Declaration:

(declare (reduce-operator (apply (primitive cons)
                                 (group right)
                                 (wrapper (global apply) 1))))

Replacements:

(apply f x y z w) ==> ((access apply ()) f (cons x (cons y (cons z w))))
(apply f x y) ==> ((access apply ()) f (cons x y))
(apply f x) ==> (apply f x)
(apply f) ==> (apply f)
(apply) ==> (apply)

declaration+: reduce-operator name ...
The general format of the declaration is (brackets denote optional elements):

(reduce-operator
  (name
    binop
    [(group ordering)]
    [(null-value value null-option)]
    [(singleton unop)]
    [(wrapper wrap [n])]
    [(maximum m)]
  ))

where name is the n-ary operator to be reduced and binop is the binary operator into which calls to it are expanded. The optional fields adjust the expansion: (group ordering) selects left or right association (compare the - and + examples above); (null-value value null-option) supplies value for calls with too few arguments, with null-option selecting the cases in which it is used (in the examples above, none uses it only for zero arguments, single also for one argument, and any additionally as the terminal element of the expansion); (singleton unop) names a unary operator for the one-argument case; (wrapper wrap [n]) wraps the expansion in a call to wrap, passing the first n arguments through unreduced (as in the apply example); and (maximum m) limits the reduction to calls with at most m arguments.

Efficiency Tips

How you write your programs can have a large impact on how efficiently the compiled program runs. The most important thing to do, after choosing suitable data structures, is to put the following declaration near the beginning of the file.

(declare (usual-integrations))

Without this declaration the compiler cannot recognize any of the common operators and compile them efficiently.

The usual-integrations declaration is usually sufficient to get good quality compiled code.

If you really need to squeeze more performance out of your code, then we hope that you find the following grab-bag of tips, hints, and explanations useful.

Coding style

Better predicates

Consider the following implementation of map as might be found in any introductory book on Scheme:

(define (map f lst)
  (if (null? lst)
      '()
      (cons (f (car lst)) (map f (cdr lst)))))

The problem with this definition is that at the points where car and cdr are called we still do not know that lst is a pair. The compiler must insert a type check, or if type checks are disabled, the program might give wrong results. Since one of the fundamental properties of map is that it transforms lists, we should make the relationship between the input pairs and the result pairs more apparent in the code:

(define (map f lst)
  (cond ((pair? lst)
         (cons (f (car lst)) (map f (cdr lst))))
        ((null? lst)
         '())
        (else
         ...)))   ; You decide - '() or an error?

Note also that the pair? case comes first because we expect that map will be called on lists which have, on average, length greater than one.

Internal procedures

Calls to internal procedures are faster than calls to global procedures. There are two things that make internal procedures faster: First, the procedure call is compiled to a direct jump to a known location, which is more efficient than jumping `via' a global binding. Second, there is a knock-on effect: since the compiler can see the internal procedure, the compiler can analyze it and possibly produce better code for other expressions in the body of the loop too:

(define (map f original-lst)
  (let walk ((lst original-lst))
    (cond ((pair? lst)
           (cons (f (car lst)) (walk (cdr lst))))
          ((null? lst)
           '())
          (else
           (error "Not a proper list:"  original-lst)))))

Internal defines

Internal definitions are a useful tool for structuring larger procedures. However, certain internal definitions can thwart compiler optimizations. Consider the following two procedures, where compute-100 is some unknown procedure that we just know returns `100'.

(define (f1)
  (define v 100)
  (lambda () v))

(define (f2)
  (define v (compute-100))
  (lambda () v))  

The procedure returned by f1 will always give the same result and the compiler can prove this. The procedure returned by f2 may return different results, even if f2 is only called once. Because of this, the compiler has to allocate a memory cell to v. How can the procedure return different results?

The fundamental reason is that the continuation may escape during the evaluation of (compute-100), allowing the rest of the body of f2 to be executed again:

(define keep)

(define (compute-100)
  (call-with-current-continuation
   (lambda (k)
     (set! keep k)
     100)))

(define p (f2))

(p)                => 100
(keep -999)        => p     ; re-defines v and p
(p)                => -999

To avoid the inefficiency introduced to handle the general case, the compiler must prove that the continuation cannot possibly escape. The compiler knows that lambda expressions and constants do not let their continuations escape, so order the internal definitions so that definitions of the following forms come first:

(define x 'something)
(define x (lambda (...) ...))
(define (f u v) ...)
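
For example, reusing compute-100 from above, the computed definition should come last (a sketch):

(define (f3)
  (define (get) v)            ; lambda: safe to put first
  (define v (compute-100))    ; computed: put last
  get)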

Global variables

Compiled code usually accesses variables in top-level first-class environments via variable caches. Each compiled procedure has a set of variable caches for the global variables that it uses. There are three kinds of variable cache - read caches for getting the value of a variable (referencing the variable), write caches for changing the value, and execute caches for calling the procedure assigned to that variable.

Sometimes the variable caches contain special objects, called reference traps, that indicate that the operation cannot proceed normally and must either be completed by the system (in order to keep the caches coherent) or must signal an error. For example, the assignment

(set! newline my-better-newline)

will cause the system to go to each compiled procedure that calls newline and update its execute cache to call the new procedure. Obviously you want to avoid updating hundreds of execute caches in a critical loop. Using fluid-let to temporarily redefine a procedure has the same inefficiency (but twice!).
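
For example, in the following sketch (run-quietly stands for arbitrary user code), every execute cache for newline is updated when the fluid-let is entered, and again when it exits:

(fluid-let ((newline (lambda () #f)))
  (run-quietly))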

To behave correctly in all situations, each variable reference or assignment must check for the reference traps.

Sometimes you can prove that the variable (a) will always be bound, (b) will always be assigned, and (c) there will never be any compiled calls to that variable. The compiler can't prove this because it assumes that other, independently compiled files might be loaded that invalidate these assumptions. If you know that these conditions hold, the following declarations can speed up and reduce the size of a program that uses global variables.

declaration+: ignore-reference-traps variables
This declaration tells the compiler that it need not check for reference-trap objects when referring to the given variables. If any of the variables is unbound or unassigned then a variable reference will yield a reference-trap object rather than signaling an error. This declaration is relatively safe: the worst that can happen is that a reference-trap object finds its way into a data structure (e.g. a list) or into interpreted code, in which case it will probably cause some `unrelated' variable to mysteriously become unbound or unassigned.

declaration+: ignore-assignment-traps variables
This declaration tells the compiler that it need not check for reference-trap objects when assigning to the given variables. An assignment to a variable that ignores assignment traps can cause a great deal of trouble. If there is a compiled procedure call anywhere in the system to this variable, the execute caches will not be updated, causing an inconsistency between the value used for the procedure call and the value seen by reading the variable. This mischief is compounded by the fact that the assignment can cause other assignments that were compiled with checks to behave this way too.

The variables are specified with expressions from the following set language:

variable-specification: set name ...
All of the explicitly listed names.

variable-specification: all
variable-specification: none
variable-specification: free
variable-specification: bound
variable-specification: assigned
These expressions name sets of variables. all is the set of all variables, none is the empty set, free is all of the variables bound outside the current block, bound is all of the variables bound in the current block, and assigned is all of the variables for which there exists an assignment (i.e. set!).

variable-specification: union set1 set2
variable-specification: intersection set1 set2
variable-specification: difference set1 set2

For example, to ignore reference traps on all the variables except x, y, and any variable that is assigned to:

(declare (ignore-reference-traps
          (difference all (union assigned (set x y)))))

Fixnum arithmetic

The usual arithmetic operations like + and < are called generic arithmetic operations because they work for all (appropriate) kinds of number.

A fixnum is an exact integer that is small enough to fit in a machine word. In MIT Scheme, fixnums are typically 24 or 26 bits, depending on the machine; it is reasonable to assume that fixnums are at least 24 bits. Fixnums are signed; they are encoded using 2's complement.

All exact integers that are small enough to be encoded as fixnums are always encoded as fixnums -- in other words, any exact integer that is not a fixnum is too big to be encoded as such. For this reason, small constants such as 0 or 1 are guaranteed to be fixnums. In addition, the lengths of and valid indexes into strings and vectors are also always fixnums.

If you know that a value is always a small fixnum, you can substitute the equivalent fixnum operation for the generic operation. However, care should be exercised: if used improperly, these operations can return incorrect answers, or even malformed objects that confuse the garbage collector. The Scheme Reference Manual lists all the fixnum operations.

A fruitful area for inserting fixnum operations is in the index operations in tight loops.
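
For example, an index loop over a string can safely use fixnum operations for its index arithmetic, because valid string indexes and lengths are always fixnums (a sketch):

(define (count-spaces s)
  (let ((n (string-length s)))
    (let loop ((i 0) (count 0))
      (cond ((fix:= i n) count)
            ((char=? (string-ref s i) #\space)
             (loop (fix:+ i 1) (fix:+ count 1)))
            (else (loop (fix:+ i 1) count))))))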

Flonum arithmetic

Getting efficient flonum arithmetic is considerably harder than getting efficient fixnum arithmetic.

Flonum consing

One of the main disadvantages of generic arithmetic is that not all kinds of number fit in a machine register. Flonums have to be boxed because a 64-bit IEEE floating-point number (the representation that MIT Scheme uses) does not fit in a regular machine word. This is true even on 64-bit architectures because some extra bits are needed to distinguish floating-point numbers from other objects like pairs and strings. Values are boxed by storing them in a small record in the heap. Every floating-point value that you see at the REPL is boxed. Floating-point values are unboxed only for short periods of time when they are in the machine's floating-point unit and actual floating-point operations are being performed.

Numerical calculations that use floating-point numbers allocate many temporary floating-point numbers. It is not uncommon for numerical programs to spend over half of their time creating and garbage-collecting the boxed flonums.

Consider the following procedure for computing the distance of a point (x,y) from the origin.

(define (distance x y)
  (sqrt (+ (* x x) (* y y))))

The call (distance 0.3 0.4) returns a new, boxed flonum, 0.5. The calculation also generates three intermediate boxed flonums. This next version works only for flonum inputs, generates only one boxed flonum (the result) and runs eight times faster:

(define (flo:distance x y)
  (flo:sqrt (flo:+ (flo:* x x) (flo:* y y))))

Note that flo: operations are usually effective only within a single arithmetic expression. If the expression contains conditionals or calls to procedures then the values tend to get boxed anyway.

Flonum vectors

Flonum vectors are vectors that contain only floating-point values, in much the same way as a string is a `vector' containing only character values.

Flonum vectors have the advantage of compact storage (about half that of a conventional vector of flonums), and judicious use of flonum vectors can decrease flonum consing.

The disadvantages are that flonum vectors are incompatible with ordinary vectors, and if not used carefully, can increase flonum consing. Flonum vectors are a pain to use because they require you to make a decision about the representation and stick with it, and it might not be easy to ascertain whether the advantages in one part of the program outweigh the disadvantages in another.

The flonum vector operations are:

procedure+: flo:vector-cons n
Create a flonum vector of length n. The contents of the vector are arbitrary and might not be valid floating-point numbers. The contents should not be used until initialized.

procedure+: flo:vector-ref flonum-vector index
procedure+: flo:vector-set! flonum-vector index value
procedure+: flo:vector-length flonum-vector
These operations are analogous to the ordinary vector operations.

Examples

The following operation causes no flonum consing because the flonum is loaded directly from the flonum vector into a floating-point machine register, added, and stored again. There is no need for a temporary boxed flonum.

(flo:vector-set! v 0 (flo:+ (flo:vector-ref v 0) 1.2))

In this next example, every time g is called, a new boxed flonum has to be created so that a valid Scheme object can be returned. If g is called more often than the elements of v are changed then an ordinary vector might be more efficient.

(define (g i)
  (flo:vector-ref v i))

Common pitfalls

Pitfall 1: Make sure that your literals are floating-point constants:

(define (f1 a) (flo:+ a 1))
(define (f2 a) (flo:+ a 1.))

f1 will most likely cause a hardware error, or at best give a wrong answer; f2 is correct.

Pitfall 2: It is tempting to insert calls to exact->inexact to coerce values into flonums. This does not always work because complex numbers may be exact or inexact too. Also, the current implementation of exact->inexact is slow.

Pitfall 3: A great deal of care has to be taken with the standard math procedures. For example, when called with a flonum, both sqrt and asin can return a complex number (e.g. for the argument -1.5).
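
For example:

(sqrt -1.5)    ; returns a complex number, not a flonum
(asin -1.5)    ; returns a complex number, not a flonum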

