
A Simple Scheme Compiler

[ The example compiler in compiler.scm is the skeleton of a simple compiler for a subset of Scheme, whose structure corresponds fairly closely to the example interpreter in eval.scm. ]

[ this is out of place, or needs more introductory intro first: ]

Where the interpreter has a basic dispatch routine called eval, which can evaluate any kind of expression, the compiler has a basic dispatch routine called compile, which can compile any kind of expression. Like eval, compile examines the expression to be compiled, and dispatches to an appropriate routine for that kind of expression. The routine that compiles an expression may recursively call compile to compile subexpressions, just as the interpretive evaluator may call eval recursively to evaluate subexpressions.

What is a Compiler?

[ This is somewhat redundant with earlier stuff, but more concrete. Should I cut it down?]

Before answering what a compiler is, let's backtrack and talk about interpreters.

What is an Interpreter?

An interpreter really does two things:

  1. it examines expressions and dispatches to the appropriate code for that kind of expression
  2. it performs the actual operations specified by the program

Typically, most of the work done by an interpreter is in the first category--our example interpreter, for example, spends a lot of time examining expressions to see if they're self-evaluating or symbols or lists, and dispatching to the appropriate procedure to evaluate that kind of expression. This dispatching is interleaved with the actual work that evaluates the expressions.

One of the problems with an interpreter is that every time an expression is encountered, it is analyzed again. Consider an expression like (+ foo bar) embedded in a loop that iterates many times. Our interpreter will encounter this expression at each iteration of the loop, and at each iteration of the loop it will do mostly the same things: it will examine the expression and find out it's a list, then call eval-list, which will further examine it to find out it's a combination (not a special form or macro), and call eval-combo. Then eval-combo will call eval recursively to evaluate the subexpressions, and each call to eval will examine the subexpressions and dispatch to the appropriate specialized evaluation routine. Only then do we start actually computing the value of the expression, by computing the values of the subexpressions +, foo, and bar, i.e., looking up the values of those variables.

OK, so what's a compiler?

We would rather factor out most of this redundant work, and examine the expression just once to see what to do. Then each time we "evaluate" the expression, we can just do those things. For the expression (+ foo bar), the set of actions an interpreter will execute (leaving out all of the analysis and dispatching) is:

     look up variable bar
     look up variable foo
     look up variable +
     apply

(Here we've assumed that we evaluate subexpressions of a combination from right to left, rather than the more intuitive left-to-right order; that's a legal way to do it in Scheme, and it turns out to be handy in a very simple compiler, as we'll explain in a minute.)

[ maybe I should change this to do args left-to-right, but the operator last, like RScheme. ]

For code like this, which doesn't have any conditionals in it, we can convert an interpreter into a compiler very easily. We just modify the interpreter so that instead of actually evaluating the expressions, it just records what operations it would execute if it were interpreting the expression. I'm intentionally being vague as to how exactly these simple operations (like look-up-variable) work, but you should be able to see that each of them should be translatable into a handful of machine instructions. That's how most compilers work: they first translate a program into an intermediate code representation, like our look-up-variable operations, and then translate that representation into machine instructions. (In between there may be one or more steps that "optimize" the intermediate code, and each step may represent the code in a different way.)

So this simple compiler just "pretends" to evaluate the expression, but whenever it gets to an actual action (like looking up a variable, or calling a procedure), it simply records what action it would take if it were just an interpreter. The result is a list of actions which, if taken, will have the same effect as interpreting the expression.
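To make the idea concrete, here's a tiny sketch of such a "recording" compiler in Scheme, handling only variables, numeric literals, and combinations. The names (toy-compile and so on) are made up for this sketch; this is not the compile from compiler.scm.

;; A toy "recording" compiler for variables, numeric literals, and
;; combinations only.
(define (toy-compile expr)
  (cond ((symbol? expr) (list (list 'look-up-variable expr)))
        ((number? expr) (list (list 'fetch-literal expr)))
        ((pair? expr)   (toy-compile-combo expr))
        (else (error "Can't compile:" expr))))

;; "Evaluate" the subexpressions right to left, recording the work,
;; then record the apply itself.
(define (toy-compile-combo expr)
  (append (apply append (map toy-compile (reverse expr)))
          '((call-procedure))))

;; (toy-compile '(+ foo bar))
;;   => ((look-up-variable bar) (look-up-variable foo)
;;       (look-up-variable +) (call-procedure))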

Here's another example:

(let ((x 22)
      (y 15))
   (+ x (* x y)))

Supposing that our intermediate code representation is a sequence of lists that represent operations and their operands, the code that our simple compiler will generate is:

  (fetch-literal 22)
  (fetch-literal 15)
  (bind x y)
  (look-up-variable y)
  (look-up-variable x)
  (look-up-variable *)
  (call-procedure)
  (look-up-variable x)
  (look-up-variable +)
  (call-procedure)
  (unbind)

Later on, we'll talk in more concrete detail about where values are temporarily stored when they're looked up, and various tweaks to make it possible to translate intermediate code into smaller and faster sequences of machine instructions. For now just notice that we can string together sequences of these intermediate code operations, and if we just translate each of them into some machine instructions, we can string those sequences of machine instructions together and get a larger sequence that we can execute to evaluate the whole expression. We can execute it as many times as we want, and all of the expression analysis and dispatching will already have been done--the only work done each time it's executed is the work that actually binds variables, looks up values, calls procedures, and so on.

It's not much harder to compile conditional expressions like if. When we compile an if, we need to generate code for the condition expression, the consequent expression, and the alternative expression. (The "consequent" is the code executed if the condition is true, and the "alternative" is the code executed if it's false.) Then we need to string the code for those expressions together appropriately with some conditional branches:

   <code for condition>
   (branch-if-false "else-action-label")
   <code for consequent>
   (branch-unconditionally "end-label")
"else-action-label"
   <code for alternative>
"end-label"

The labels here will actually be translated into the addresses of the code they label, and the branches will be filled in with those addresses. (We have to be careful to use a unique pair of new labels for each if we compile, or some other trick like that, so that we can nest if expressions and keep their labels straight.)

(One way of generating machine code from this representation is to translate each of the statements into a short sequence of assembly instructions and each label into an assembly label, stringing them together as shown. Then the assembly code can be assembled into machine code.)

Note that for an if, the control structure of the compiler is actually simpler than the control structure of an interpreter. The interpreter will evaluate the condition expression, and then decide at run time whether to evaluate the consequent ("then") expression or the alternative ("else") expression. The compiler will always compile all three subexpressions, and string them together with conditional branches that will do the deciding at run time, based on the runtime value computed by the code for the condition expression.
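Here's a sketch of that stringing-together for if, in the same toy style as the earlier sketch. It assumes the toy-compile sketched above and a generate-label procedure like the one described under "Generating Unique Labels" below; it is not the compile-if from compiler.scm.

;; A sketch of compiling if: compile all three subexpressions, then string
;; them together with fresh labels and conditional branches.
(define (toy-compile-if expr)
  (let ((else-label (generate-label "else"))
        (end-label  (generate-label "end")))
    (append (toy-compile (cadr expr))                  ; condition
            (list (list 'branch-if-false else-label))
            (toy-compile (caddr expr))                 ; consequent
            (list (list 'branch-unconditionally end-label)
                  else-label)                          ; "else-action-label"
            (toy-compile (cadddr expr))                ; alternative
            (list end-label))))                        ; "end-label"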

Here's a slightly more complicated example:

(let ((x 15))
   (if x
       (let ((y (* 2 x)))
          (+ x y))
       #t))

translates to intermediate code roughly like:

   (fetch-literal 15)
   (bind x)
   (lookup-variable x)
   (branch-if-false "else22")
   (lookup-variable x)
   (fetch-literal 2)
   (lookup-variable *)
   (call-procedure)
   (bind y)               ; create and enter envt that binds y
   (lookup-variable y)
   (lookup-variable x)
   (lookup-variable +)
   (call-procedure)
   (unbind)               ; exit envt that binds y
   (branch "end22")
 "else22"
   (fetch-literal #t)
 "end22"          

There are actually a couple of minor things wrong with the code we've generated, but this is pretty close to a workable intermediate representation.

What Does a Compiler Generate?

[ Talk about machine code, interpretive virtual machines, etc. ]

Basic Structure of the Compiler

The main function of the compiler is compile, which generates intermediate code for an expression, and which may call itself recursively to generate code for subexpressions.

Calls to compile hand it an expression and some bookkeeping information we'll discuss later. Compile returns intermediate code, plus updated versions of some of the bookkeeping information.

To start this process off, top-level forms (like the ones you type into the read-compile-run-print loop, or definitions of top-level procedures) are massaged a little, then intermediate code for them is generated. Then the intermediate code is converted into real executable code and packaged up as a closure that can be called.

We will discuss these issues of massaging top-level forms and generating executable closures later; for now, the main thing to understand is the recursive generation of intermediate code for nested expressions.

Here's the main dispatch routine of the compiler, which is analogous to the interpreter's eval:

(define (compile expr c-t-envt literal-state c-t-cont)
   (cond ((symbol? expr)
          (compile-symbol expr c-t-envt literal-state c-t-cont))
         ((pair? expr)
          (compile-list expr c-t-envt literal-state c-t-cont))
         ((self-evaluating? expr)
          (compile-self-eval expr c-t-envt literal-state c-t-cont))
         (#t
          (syntax-error "Illegal expression form" expr))))

For now, ignore most of the arguments to compile; we'll explain them later. The main thing to notice is that it looks a lot like eval.
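For example, self-evaluating? might be defined something like this (one plausible definition; the version in compiler.scm may differ):

;; Literals that evaluate to themselves.
(define (self-evaluating? expr)
  (or (number? expr)
      (string? expr)
      (char? expr)
      (boolean? expr)))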

[ blah blah blah...]

[ Somewhere, it's important to bring out the difference between the mutual recursion of eval and apply in the interpreter and the way the compiler works. Eval recurs locally, but just generates code for apply... The control structure of the compiler is actually simpler than for the interpreter, because the hairy stuff just happens at run time... ]

Data Representations, Calling Convention, etc.

Before trying to understand the compiler itself in more detail, it is probably best to have a concrete idea of what the representations of data are, how procedure calls work, and how registers are used. That is, you have to understand the "abstract machine" that the compiler compiles for.

An abstract machine is an abstraction of low-level operations and objects. The compiler first compiles code from the source language into this lower-level representation, and then translates the "abstract machine language" into actual executable code. (The executable code may be machine code that runs directly on the hardware, or an interpretive executable code such as bytecodes, which are interpreted by a fast interpreter.)

You can think of an abstract machine as being more like an assembler than an interpreter, but maybe a little smarter than most assemblers.

I will describe one particular set of features to make things concrete; this is not quite how RScheme works, or Scheme-48, or any particular other system that I know of, but there's nothing unusual about it except maybe its simplicity.

In fleshing out our example compiler, let's suppose our system works this way:

  1. We have several important registers used in stereotyped ways, e.g., to hold a pointer to the current binding environment.
  2. We have an evaluation stack that's used to store intermediate values while evaluating nested expressions.
  3. We use a continuation chain to represent the saved state of callers, their callers, and so on, so that they can be resumed after a procedure returns.

The registers of the abstract machine may represent hardware registers, or just storage locations that are used in these stereotyped ways. (For example, if compiling to C, the registers might be C global variables, and the C compiler might or might not let you specify that the variables should be put in hardware registers.)

The Registers

We'll assume that there are several important registers that can be used by the code our compiler generates:

  1. The VALUE register, where an expression leaves a value so that it can be used by an enclosing expression. In the case of a procedure, this is where the value is left for the caller when the procedure returns. The value register is also used when calling a procedure.
  2. The ENVT register, which holds a pointer to the environment that code is currently executing in.
  3. The CONT register, which holds a pointer to the chain of saved continuations of callers.
  4. The TEMPLATE register, which holds a pointer to a special data structure associated with the procedure that is currently executing, and
  5. The PC (program counter) register, which says which instruction we are currently executing. (If we're compiling to normal machine code, this is a special register built into the CPU for this purpose, and we use it pretty much like any other code would. If we're compiling to an interpretive executable code, this is probably a variable in the interpreter.)
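To have something concrete to refer to in later sketches, we can model these registers as ordinary Scheme variables. (A real implementation would use hardware registers or fixed storage locations; these names are just for the sketches, not from compiler.scm.)

;; The abstract machine's registers, modeled as Scheme variables.
(define value-register    #f)   ; result of the last expression / closure to call
(define envt-register     #f)   ; current binding environment
(define cont-register     #f)   ; chain of saved partial continuations
(define template-register #f)   ; template of the currently executing procedure
(define pc-register       #f)   ; "program counter" of the abstract machine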

The Evaluation Stack (or Eval Stack, for short)

The eval stack is used for holding values that have been computed by evaluating subexpressions, but not yet used or bound.

In evaluating the expression (+ foo 22), the three values will be computed. When each value is computed, it will be left in the VALUE register. We evaluate right to left, and after evaluating each argument, we perform a PUSH operation on the eval stack, which copies the value in the value register onto the eval stack. When we get to the first subexpression (the one that's supposed to return a function to call), we leave the value in the value register, because that's where we put the closure pointer for a procedure call.

The eval stack is used for two main purposes:

  1. storing intermediate values for nested expressions, and
  2. passing arguments to procedures.

The eval stack is not used to hold intermediate values or local variables for suspended procedures--it isn't like the activation stack in a conventional implementation of C or Pascal. The values in the eval stack at any given time are only the intermediate values stored for the currently executing procedure. Intermediate values for suspended procedures are saved in the continuation chain as necessary.

When we call a procedure, the only values on the eval stack are the arguments to the procedure. Any other values used by the caller are moved from the eval stack into a continuation before calling.
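Continuing the register sketch above, the eval stack and the PUSH operation can be modeled like this (again, purely illustrative):

;; The eval stack, modeled as a Scheme list; push! copies the VALUE
;; register onto it, pop! removes and returns the top value.
(define eval-stack '())

(define (push!)
  (set! eval-stack (cons value-register eval-stack)))

(define (pop!)
  (let ((top (car eval-stack)))
    (set! eval-stack (cdr eval-stack))
    top))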

The Continuation Chain

The continuation chain is a data structure that fills roughly the role of an activation stack in the implementation of a conventional programming language. The continuation chain is a linked list of partial continuations, each of which is a record that stores the saved state of a procedure.

When a procedure performs a non-tail procedure call, it packages its important state information up into a partial continuation; this record saves the values of the environment, template, PC, and continuation registers, and any temporary values on the eval stack.

Once a caller has saved its state in a partial continuation, then the callee can do whatever it wants with the important registers and the evaluation stack. (This is called a caller-saves register usage convention, because the caller of a procedure is obliged to save any values that it will need when it resumes.)

Remember that continuations are allocated on the garbage collected heap and are immutable--we never modify a continuation once it's created. When we resume from a saved partial continuation, we copy the values from the partial continuation into the registers and eval stack, but that doesn't modify the partial continuation itself--it's still sitting out there on the heap. This is important for being able to implement call-with-current-continuation: it's what allows us to resume from the same continuation any number of times.
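Using the register and eval-stack sketches above, a partial continuation can be modeled as an immutable heap record, roughly like this (illustrative names, not the ones in compiler.scm):

;; A partial continuation saves the resume PC, the important registers,
;; and whatever was on the eval stack.
(define (make-partial-cont pc envt template cont stack)
  (vector 'partial-cont pc envt template cont stack))

(define (save-continuation! resume-pc)
  (set! cont-register
        (make-partial-cont resume-pc envt-register template-register
                           cont-register eval-stack))
  (set! eval-stack '()))          ; the callee sees only its arguments

(define (restore-continuation!)   ; copy saved state back; the record itself
  (let ((k cont-register))        ; is never modified
    (set! pc-register       (vector-ref k 1))
    (set! envt-register     (vector-ref k 2))
    (set! template-register (vector-ref k 3))
    (set! cont-register     (vector-ref k 4))
    (set! eval-stack        (vector-ref k 5))))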

Environments

The compiler assumes that a binding environment is a chain of frames, each of which is a vector of slots which are the variable bindings. Each frame also has a static link or scope link field, which points to the frame representing the next lexically enclosing environment.

Top-level environments are represented specially, as hash tables that map names to bindings. We'll use a hash table instead of the association lists we used in our simple interpreter, because hash tables are faster if you have a lot of bindings. A binding object for a top-level environment is pretty much the same as in the interpreter: a little vector with two important slots, one for its name and one that is the actual value field.

Local environments are represented very differently from toplevel environments: each frame is a vector of slots, and does not store the names of the bindings. It turns out that the names are only needed at compile time, so they don't actually have to be stored in the runtime environment. (The compiler also turns out to be able to do most of the work for looking up a toplevel variable at compile time, so the speed of our hash tables is not going to be critical to our runtime speed.)
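As a sketch, a local environment frame might be represented as a vector whose slot 0 is the scope link and whose remaining slots hold the values; this is just one plausible representation.

;; A local environment frame: slot 0 is the scope link, slots 1..n are
;; the variable values (no names stored at run time).
(define (make-envt-frame enclosing values)
  (list->vector (cons enclosing values)))

(define (envt-scope-link frame)        (vector-ref frame 0))
(define (envt-slot-ref frame slot)     (vector-ref frame slot))
(define (envt-slot-set! frame slot v)  (vector-set! frame slot v))

;; e.g., entering (let ((x 22) (y 15)) ...) in environment outer-envt:
;;   (make-envt-frame outer-envt (list 22 15))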

Closure Representation and Calling

Closures in our system are represented as objects with two fields: a pointer to the environment captured by the closure, and a pointer to an object called a template, which in turn contains a pointer to the code for the procedure.

When we evaluate the following expression

(let ((foo 22)
      (bar 15))
   (lambda (...)
       ...))

we'll create an environment frame to hold the bindings of foo and bar, and initialize them to 22 and 15, respectively. This environment frame will have a scope link pointing to the environment we were executing in when we entered the let. Inside this environment, we'll create a closure. The closure will hold a pointer to the new environment, and a pointer to a template object representing the anonymous procedure being closed. The template will have a pointer to the actual executable code. All of these things will be heap-allocated objects, and in our implementation we'll give each one a header field showing what kind of object it is:

                                          <to envt. frame
                                           for enclosing
                                              scope>
                                               
                                                ^
                                                |
                                +----------+    |
                                | envt. fr.|    |
                        ,------>+==========+    |
                       /  scope |     +----+----+       
     +----------+     /         +----------+
     | closure  |    /      foo |       22 |
     +==========+   /           +----------+
envt |     +----+--'        bar |       15 |
     +----------+               +----------+     
proc |     +----+--.    
     +----------+   \      +----------+   
                     \     | template |       +----------+
                      `--->+==========+       |   code   |
                      code |     +----+-----> +==========+
                           +----------+       |executable|
                           |          |       + code for +
                           +----------+       |procedure |
                           |          |       +          +
                           +----------+       |   ...    |
                           |   ...    |       +----------+
                           +----------+

The template object holds not only the pointer to the actual code, but any other handy values that the compiler can compute or look up at compile time, and which should be available to the procedure at run time. We'll discuss that more later.

When we want to apply a procedure to some argument values, we put the argument values on the eval stack, and a pointer to the closure we want to call in the VALUE register. Then we execute a short sequence of instructions that does the call:

  1. extract the environment pointer from the closure (which is in the VALUE register) and put it in the ENVT register,
  2. extract the template pointer from the closure and put it in the TEMPLATE register, and
  3. extract the code pointer from the template and branch to it (i.e., put it in the PC register).

Thus the actual machine code for our "apply" operation in our intermediate representation is just a handful of instructions that do this stuff--a stereotyped little instruction sequence that destructures a closure and puts the appropriate values in registers before beginning execution of the procedure.

Because this is the way the procedure calling convention works, we know that when we begin executing the code for a procedure, the environment register will point to the right environment (where the procedure was defined) and the template register will point to the template for that procedure. Any values stored in the template by the compiler can be fetched at run time by doing an indexed load with the template register as a base.

Consider this procedure:

(define (foo x y)
   (list 'bar x y))

Here, the literal bar is needed by the procedure--there must be some code in foo that will somehow fetch a pointer to the symbol bar. That's what the template object is for. When this procedure is compiled, the compiler accumulates a list of such literals, and when the template object for the procedure is created, all of those values will be stored into it. When the compiler generates code to fetch the symbol bar, it just looks at the symbol's position in the literal list (and thus its offset in the template object), and generates code to do an indexed load to fetch that value from the template at run time.
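Here's a sketch of one plausible representation of closures and templates as tagged vectors, with slot 0 playing the role of the header (illustrative only):

;; A template holds the code pointer in slot 1 and the literals after it;
;; a closure holds an environment and a template.
(define (make-template code literals)
  (list->vector (cons 'template (cons code literals))))

(define (template-code template)       (vector-ref template 1))
(define (template-literal template n)  (vector-ref template (+ n 1)))

(define (make-closure envt template)   (vector 'closure envt template))
(define (closure-envt closure)         (vector-ref closure 1))
(define (closure-template closure)     (vector-ref closure 2))

;; For the foo example above, the template might be built roughly as
;;   (make-template <code for foo> (list 'bar))
;; and the generated code would fetch the symbol with (template-literal t 1).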

Continuations

Applying a Procedure Doesn't Save the Caller's State

Remember that when we do a procedure call, we do not necessarily save the state of the caller. For a non-tail call, the compiler must generate code to save the caller's state plus code for the actual call. For a tail call, there is no need to save the state. Because of this, there isn't really a single "procedure call" operation that saves the caller's state and invokes the callee. There are two separate operations, save-continuation and apply.

As mentioned above, the code sequence that performs a procedure application assumes that the pointer to the closure to be called is in the VALUE register. The procedure will leave its value in that register when it returns.

Continuation Saving

save-continuation is the operation that saves the state of the currently executing procedure in a partial continuation, and pushes it onto the continuation chain.

When pushing a continuation, it is important to save all of the values on the eval stack, except for the arguments to the procedure being called. Therefore, when generating code for a combination (procedure call) expression, the code to save the caller's state does not come just before the actual code to call the procedure--that would sweep the procedure's arguments off the eval stack as well. Instead, the continuation save comes just before the code that generates the argument values that will be passed to the procedure:

  (save-continuation <label>) ; save everything else before computing args
  <compute argn>
    ...
  <compute arg1>
  <compute callee>
  (apply)
<label>

That way, the arguments to the call (and nothing else) will be on the eval stack when the procedure is called, and when the procedure returns, it will restore the other values from the partial continuation onto the eval stack.

This separation of the saving and calling code looks especially funny for nested expressions that call procedures, but it makes perfect sense.

save-continuation takes an argument which is the address of the code to execute when the continuation is resumed. This address is saved in the partial continuation, and when the continuation is resumed, it will be branched to (put in the PC register).

An Example

Now that we have a more detailed idea how the registers, eval stack, and continuations work, here's an example:

   (+ (- a b) (/ c d))

compiles to intermediate code something like:

  (push-continuation "resume23") ; save cont for call to +
  (push-continuation "resume24") ; save cont for call to -
  (lookup-variable d)   ; get value of d into value reg
  (push)                ; push value of d on eval stack
  (lookup-variable c)   ; get value of c into value reg
  (push)                ; push value of c on eval stack
  (lookup-variable /)   ; look up /
  (apply)               ; call /, which is in value reg after lookup
  (push)                 
"resume24" 
  (push-continuation "resume25") ;save cont for -, incl. value of (/ c d)
  (lookup-variable b)   ; get value of b
  (push) 
  (lookup-variable a)   ; get value of a
  (push)
  (lookup-variable -)   ; get value of -
  (apply)               ; call -
  (push)                ; push returned value on top of restored e stack     
"resume25"
  (lookup-variable +)   ; look up + 
  (apply)               ; tail call +
"resume23"  

Things to notice:

  1. after the first apply, the called routine (or something it directly or indirectly tail calls) will eventually do a procedure return, and pop the latest continuation we pushed, restoring anything that was on the eval stack at that point, and resuming execution at "resume24".
  2. after the second apply, the called routine will eventually (directly or indirectly) do a procedure return, which will pop the second continuation we pushed, restoring the already-computed value of the subexpression (/ c d) to the eval stack and resuming execution at "resume25".
  3. we generated code for the expression (+ (- a b) (/ c d)) to be used in tail position. This code doesn't save a continuation before the final call to +. If the expression is to be used in non-tail position, we must generate slightly different code, which will save a continuation that will resume after this expression.

Generating Unique Labels

[ where does this go? ]

Like compile-if, compile-combo generates labels as necessary, to be able to name the code where execution should resume after a call--in the code it generates, it puts the label just before the intermediate code instruction at which execution should resume, and it puts the same label in the save-continuation that should resume there.

It is easy to generate unique labels for every resumable point in a program. We just keep a counter of labels we've used so far, and to create a new one we append the digits representing this number to the string "resume", so that we get "resume1", "resume2", and so on.

We can write a Scheme procedure, generate-label, which keeps a counter, and when given a string as an argument, returns a new string with the same characters plus the digits representing the number in the counter. That way, we can use labels that start with "else" and "end" to label the branch targets of an if expression, and labels that start with "resume" to represent the resumption points for continuation saving. This makes the intermediate code we generate fairly understandable, while ensuring that labels are still unique and easy to use as assembler labels when translating intermediate code to machine language.
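A sketch of such a generate-label procedure (the one in compiler.scm may differ in details):

;; Keep a counter in a closure; append its digits to the given prefix.
(define generate-label
  (let ((counter 0))
    (lambda (prefix)
      (set! counter (+ counter 1))
      (string-append prefix (number->string counter)))))

;; (generate-label "resume") => "resume1"
;; (generate-label "else")   => "else2"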

More on Representations of Environments

To get reasonable performance for our system, we'll want to treat the top-level environment differently from local variable binding environments. We'll use a trick involving lexical scope to precompute most of the work done in looking up a local variable binding, and a different trick to make it fast to look up top-level variables.

As we said before, each local variable binding contour (e.g., the bindings introduced by a let, or by binding the args to a procedure) is represented at run time as a frame with slots for each variable, plus a scope link that points to the frame representing the enclosing contour.

A top-level environment is likely to be large, and we will want to be able to access it in special ways. We will represent it as a hash table that maps symbols (variable names) to their toplevel bindings. The bindings themselves will be represented as objects, whose main function is to have one field that holds the current value of the variable. For simplicity, we'll make them self-identifying as well: not only will the names be used as keys in the hash table, but the binding objects will hold pointers to their names as well as values.

Keep in mind that this representation is just one that's convenient. A toplevel environment could be represented as any kind of table (e.g., an association list), but we want it to be reasonably fast to access even if there are thousands of top-level variables. (We'll use a special trick to make normal accesses to top-level variable bindings very fast at run time, but we want to make them reasonably fast at compile time as well, and hash tables are good for that.)
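Here's a sketch of toplevel binding objects and a toplevel environment, assuming SRFI 69 hash tables; any table that maps symbols to binding objects would do. The names are illustrative, not the ones in compiler.scm.

;; Toplevel bindings are self-identifying little vectors: a header tag,
;; the current value, and the name.
(define (make-toplevel-binding name value) (vector 'binding value name))
(define (binding-value binding)            (vector-ref binding 1))
(define (set-binding-value! binding value) (vector-set! binding 1 value))
(define (binding-name binding)             (vector-ref binding 2))

;; A toplevel environment maps symbols to binding objects.
;; (Assumes SRFI 69, e.g. (import (srfi 69)) in R7RS.)
(define (make-toplevel-envt) (make-hash-table))

(define (toplevel-lookup envt name)        ; => binding object, or #f
  (hash-table-ref/default envt name #f))

(define (toplevel-define! envt name value)
  (let ((binding (toplevel-lookup envt name)))
    (if binding
        (set-binding-value! binding value)
        (hash-table-set! envt name (make-toplevel-binding name value)))))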

Suppose we evaluate the following expressions at the top level, to define and initialize a couple of variables:

(define quux "fubar")

(define (double x) (+ x x))

This will modify the toplevel environment by adding bindings for quux and double, in addition to what's already there:

   +-------------------------------------------------------------------+
   |                                                                   |
   |                                                                   |
  \|/                                                                  |
   +------------------+                                                |
   |   toplevel env.  |                                                |
   +==================+         +----------+                           |
   |   cons |    *----+-------->| binding  |                           |
   +--------+---------+         +==========+                           |
   |        |         |   value |     *----+------><closure for cons>  |
        .        .              +----------+                           |
        .        .         name |     cons |                           |
        .        .              +----------+                           |
                                                                       |
   |        |         |         +----------+                           |
   +--------+---------+         | binding  |                           |
   |   quux |    *----+-------->+==========+                           |
   +--------+---------+   value |     *----+------>"fubar"             |
   |        |         |         +----------+                           |
        .        .         name |     quux |                           |
        .        .              +----------+                           |
        .        .                                                     |
                                                                       |
   |        |         |         +----------+                           |
   +--------+---------+         | binding  |          +----------+     |
   | double |    *----+-------->+==========+          | closure  |     |
   +--------+---------+   value |     *----+--------->+==========+     |   
   |        |         |         +----------+     envt |     *----+-----+
        .        .         name |  double  |          +----------+
        .        .              +----------+     proc |     *----+---> ...
        .        .                                    +----------+

Several things to note:

Compiling Code for Literals

When the compiler compiles code for a literal, like 'foo or "foo" or 22 in the following expression,

(list 'foo "foo" 22)

it notices which literals the expression will need at run time, and ensures that those literals will appear in the template of the procedure. It keeps a list of literals needed by the procedure it's compiling, and after compiling the code for the procedure, it uses that list to construct the template that goes with the code.

If foo is the first literal encountered, it might be put into the list first, and assigned the first free slot in the template (after the code pointer). "foo" might be assigned the second slot, and so on. In the terminology of language implementation, the template acts as a literal frame, as well as holding the pointer to the procedure's code.

After assigning a literal a position in the template, the compiler can generate one or two instructions that can fetch the value out of the template, by using the address of the template, adding the offset of that slot, and loading from the resulting address. Since the template address is guaranteed to be in the TEMPLATE register, this is probably just a single indexed load instruction. In pseudo-C, it might look like:

  value_register = *(template_register + offset)

where offset is computed by the compiler and therefore is probably an immediate operand to the load instruction that loads the value into the value register.

Notice that here we're taking advantage of the fact that the compiler runs in our system, and generates code that's just data in our system. The code will run in the same heap, and the compiler can therefore just compute values and put them in the template, and they'll stay around until that code is executed. (Life gets a little more complicated if you want to generate code that will be loaded into a different system and executed there.)

[ Now we should explain that the literal-state argument to compile is the list of literals seen so far in compiling a procedure. The return value of compile is intermediate code that includes an updated literal-state. ]

Compiling Code for Top-Level Variable References

Because of lexical scoping, it is easy for the compiler to tell the difference between a reference to a top-level variable binding and a reference to a local variable. We'll talk about that in detail later, but for now just assume that the compiler knows the difference at the point where it generates code for a variable reference.

When the compiler generates code for a top-level variable, it can usually look up the binding of that variable in the environment that the code is being generated for--the expression that defines the variable has already been executed, so the binding already exists.

The compiler can therefore do the actual lookup at compile time, e.g., hashing into the hash-table that implements a toplevel environment and getting a pointer to the actual binding object that will be referenced at run time.

To make references to this object fast, the compiler can simply put this pointer in the template for the procedure being compiled, as though it were a literal value.

Be clear on what's going on here: the compiler can't look up the value of the variable, because that's not known until the moment the variable is referenced at run time. (Before the code is executed, some other piece of code might run and change the value stored in the binding.) But the identity of the binding itself is known, and can be stashed in the literal frame.

Actually, it's just slightly more complicated than this. A variable can be used in a procedure definition before the variable itself is defined. (This is called a "forward reference.") To get around this, the compiler "cheats," and goes ahead and creates the binding object and its entry in the toplevel environment before the definition of the variable is actually encountered. At the language level, the variable hasn't been bound or given a value, but we go ahead and create the unique binding object and use it in compiling other expressions. For error checking, we put a special value in the binding to indicate that the binding isn't "real" yet--we put a reference to some object we consider "not a real value," so that we can detect uses of a variable that isn't really bound.

(In a system designed for maximum safety and early error checking, we could ensure that each reference to a toplevel variable would check for this value, and signal an error if it's found. If we're not quite so concerned with early error checking, we can wait until somebody attempts to use such a value, e.g., by adding it to something, or taking the car of it, and we rely on the normal type checking of the language to tell us something's wrong at the point that operation occurs.)
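Here's a sketch of that "cheat," building on the toplevel-environment sketch above; a unique "unbound" object marks bindings the compiler has created before seeing a definition.

;; A unique object meaning "no real value yet."
(define the-unbound-marker (vector 'unbound))

;; Return the existing binding for name, or create a placeholder binding
;; so forward references can be compiled against it.
(define (toplevel-get-or-create-binding! envt name)
  (or (toplevel-lookup envt name)
      (let ((binding (make-toplevel-binding name the-unbound-marker)))
        (hash-table-set! envt name binding)
        binding)))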

Now consider compiling a procedure like

(define (make-foo-list)
   (list 'foo "foo"))

The compiler will accumulate a list of toplevel bindings and literals needed for the procedure, namely a string "foo", the symbol foo, and the toplevel binding of the symbol list. These will be put in the template for the procedure, in the first, second, and third slots after the code pointer. The code generated for this procedure (assuming right-to-left evaluation) will be something like:

  (fetch-literal 1) ; get string "foo" from template slot 1
  (push)            ; push it on eval stack
  (fetch-literal 2) ; get symbol foo from template slot 2
  (push)            ; push it on eval stack
  (fetch-literal 3) ; get toplevel binding of list from template slot 3
  (t-l-bdg-get)     ; extract value from binding
  (apply)           ; (tail-)call list

Notice, of course, that we've made our intermediate code representation more concrete now--we use slot numbers as the arguments to fetch-literal, and we explicitly get the value of the toplevel variable from the toplevel binding object in the value register. (t-l-bdg-get expects the value register to hold a pointer to a toplevel binding object; it extracts the value from the binding, and leaves that value in the value register.) For setting the value in a binding, we'll use a different intermediate code instruction, t-l-bdg-set!.

[ Now we can explain more about literal states--we accumulate a list of literal values and top-level variable bindings that must be accessible when the procedure runs. ]

By now it should be very clear how you would translate each of these little operations in our intermediate representation into a few assembly-language instructions.

[ need picture? ]

Precomputing Local Variable Lookups using Lexical Scope

We can't really look up local variable bindings at compile time the way we can toplevel bindings--local variable bindings don't exist yet when we're compiling the expressions that create and use them. (Consider the fact that every time you enter a let, or call a procedure that binds arguments, a new binding environment frame is created. Code that executes in such environments will see a different binding environment frame in the environment register each time it runs.)

What we can do is take advantage of lexical scope to precompute most of the search for a variable in an environment.

Consider the execution of this procedure:

  (lambda (x y)
     (let ((a <some-expression>)
           (b <some-expression>))
        (list a b x y)))

When we compute the arguments to the call to list, it's obvious that the first and second variables (a and b) will be in the first and second slots of the first binding environment frame, pointed directly to by the ENVT register. This is the environment created by the let. The third and fourth variables (x and y) will be in the next environment frame, pointed to by the scope link of the first.

The compiler can compute the lexical address of each variable binding at the point where a reference to it is compiled--that's the relative location of the variable, starting from the environment register. A lexical address has two parts: the number of environment frames to skip to find the right frame, and the offset of the binding in that frame. In the above example, the lexical addresses are:

   a:  0,1
   b:  0,2
   x:  1,1
   y:  1,2

(We use the convention that frame numbers start at zero, but slot numbers appear to start at 1 because the scope link is in slot 0.)

The code generated for the reference to a can simply be an indexed load instruction, using the environment register plus an offset to grab the value in the first variable binding slot. In pseudo-C, that's something like

  value_register = *(envt_register + offset)

where offset is probably 4 (bytes) to index past the scope link slot. Slightly more abstractly, its lexical address is 0,1: zero frames back from the environment register, and slot 1 within that frame.

The code for the reference to the variable y would involve one level of indirection--first the scope link pointer must be extracted from the first environment frame, and then that can be used for an indexed load to get the value of the second slot of the second frame:

  value_register = *(envt_register)            ; get ptr to 2nd envt frame
  value_register = *(value_register + offset)  ; get value of y from that frame

where offset is probably 8 (bytes) to index past the scope link and the binding of x.

Given this scheme, accessing a local variable takes time proportional to the number of environment frames intervening between the expression being compiled and the environment where the referenced variable is defined. That's usually fairly fast, for two reasons:

  1. The depth of lexical nesting is usually small--it corresponds to the nesting of binding expressions in the program, and is usually between one and three, and only rarely greater than five or so.
  2. Most references that are executed at run time are to variables in the current scope, or maybe a level or two back from that. (Consider references to variables in inner loops, which constitute the most frequently-executed code in most programs.)

For these reasons, most references to local variables will take between one and five instructions. To a first approximation, the time to reference local variables can be regarded as a small constant. (A slightly snazzier compiler can reduce this by using more registers, and leaving many values in registers instead of pushing and popping them from the eval stack, but that's a more advanced technique than we want to address here.)
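Using the environment-frame sketch from earlier, the runtime work for a local variable reference at lexical address (frame, slot) is just the following; since the compiler knows frame and slot at compile time, real generated code is one or two indexed loads, as in the pseudo-C above.

;; Follow the scope link `frame' times, then index into the frame.
(define (local-var-ref envt frame slot)
  (if (zero? frame)
      (vector-ref envt slot)
      (local-var-ref (vector-ref envt 0) (- frame 1) slot)))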

Lexical Addressing and Compile-Time Environments

Computing lexical addresses is very easy for the compiler. All it needs to do is maintain a data structure called a compile-time environment, which records the structure of the runtime environment.

Each time the compiler compiles an expression that creates new bindings, it extends the compile-time environment to reflect the change to the environment structure, and when compiling expressions that will execute in that environment, it hands the new compile-time environment to the recursive call to compile which will compile that expression.

For example, when compiling a let, the compiler dispatches to compile-let, the analogue of eval-let, which does four things:

  1. Compiles code for the initial value expressions. This code executes in the environment outside the let, so compile-let uses the environment it was given when making the recursive calls to compile that generate the initial value code.
  2. Generates code to create a binding environment and initialize it with those values.
  3. Extends the compile-time environment with a new frame, reflecting the fact that the body of the let will execute in a new scope including the new bindings.
  4. Calls compile-sequence to compile the body of the let, passing it the new compile-time environment.

Just as the overall recursive structure of the compiler closely resembles the recursive structure of the interpreter, the role of the compile-time environment is very much like the role of the environment in the interpreter.

When the interpreter (compiler) evaluates (compiles) subexpressions that execute in the same environment as their parent expressions, it hands the recursive invocation the same environment it was given. When the interpreter (compiler) evaluates (compiles) an expression in a new environment, it hands the recursive call the new (compile-time) environment.

The structure of the compile time environment at any point in the compilation process mirrors the structure of the runtime environment where the code will execute. Unlike the interpreter's representation of the environment, however, the compile-time environment doesn't contain actual bindings--it can't, and it doesn't need to.

In effect, we split the interpreter's environment into two parts with parallel structure. Where the interpreter's environments are chains of frames holding name-binding pairs, the compiler splits those into two chains of frames: the runtime environment, whose frames hold the actual bindings, and the compile-time environment, whose frames hold the corresponding names.

Consider the expression

(let ((x 1)
      (y 2))
   (let ((foo 3)
         (bar 4))
      (list foo bar x y)))

At the point where (list foo bar x y) is compiled or executed, the environment for an interpreted system appears as shown on the left, below, while the environments for a compiled system appear as shown on the right:

     INTERPRETED                   COMPILED              COMPILED
                                (compile time)          (run time)
                    /\                         /\                   /\
                     |                          |                    |
 +---------------+   |           +----------+   |     +----------+   |
 |   envt. frame |   |           | c.t.e.fr.|   |     | envt. fr.|   |
 +===============+  /            +==========+  /      +==========+  /
 |          +----+-'             |     +----+-'       |     +----+-'
 +-------+-------+               +----------+         +----------+
 |     x |     1 |               |        x |         |        1 |
 +-------+-------+               +----------+         +----------+
 |     y |     2 |               |        y |         |        2 |
 +-------+-------+               +----------+         +----------+
                  _                         _                    _
                 |\                        |\                   |\
                   \                         \                    \
                    \                         \                    \
 +---------------+   |           +----------+  |      +----------+  | 
 |   envt. frame |   |           | c.t.e.fr.|  |      | envt. fr.|  |
 +===============+  /            +==========+  /      +==========+  /
 |          +----+-'             |     +----+-'       |     +----+-'
 +-------+-------+               +----------+         +----------+
 |   foo |     3 |               |      foo |         |        3 |
 +---------------+               +----------+         +----------+
 |   bar |     4 |               |      bar |         |        4 |
 +-------+-------+               +----------+         +----------+

Note that there is a many-to-one relationship between the compile-time environments and the run-time environments: a let or lambda expression is compiled once, and the corresponding environment frame is created and passed to the recursive calls that compile subexpressions. The code may be executed many times, however, and each time a run-time environment frame will be created so that the code for subexpressions can be executed in that environment.
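Here's a sketch of a compile-time environment and of computing lexical addresses from it, with each compile-time frame represented simply as a list of names chained onto the enclosing frames. (The toplevel environment is represented here by the empty chain; the real compiler.scm structures may differ.)

;; Extend a compile-time environment with a new frame of names.
(define (extend-ct-envt names enclosing-ct-envt)
  (cons names enclosing-ct-envt))

;; Return (frame . slot) for a local variable, or #f for a name that must
;; be a toplevel variable.  Slot numbers start at 1 because slot 0 of a
;; runtime frame is the scope link.
(define (lexical-address ct-envt name)
  (let frame-loop ((frames ct-envt) (frame 0))
    (if (null? frames)
        #f
        (let slot-loop ((names (car frames)) (slot 1))
          (cond ((null? names) (frame-loop (cdr frames) (+ frame 1)))
                ((eq? (car names) name) (cons frame slot))
                (else (slot-loop (cdr names) (+ slot 1))))))))

;; For the nested let example above:
;; (lexical-address (extend-ct-envt '(foo bar) (extend-ct-envt '(x y) '())) 'y)
;;   => (1 . 2)    ; one frame back, slot 2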

A Detailed Example

Taking it step by step, let's look at how the compilation of the expression

(let ((x 1234)
      (y 3456))
   (let ((foo z))
      (+ (- foo x)
         (+ bar y))))

goes. (We'll assume that this expression occurs at the top level.)

Since we're compiling a top-level expression, we compile it in a compile-time environment that corresponds to the top-level environment. Toplevel environments are treated specially, because they're not created dynamically the way local binding environments are. There's a one-to-one relationship between top-level compile-time environments and the corresponding run-time environments. We'll represent the top-level compile-time environment as a special kind of environment frame, which really just holds a pointer to the top-level runtime environment so that top-level variables can be looked up.

So we start in a top-level environment, which we'll depict as [top-level]; we hand this to compile along with the expression to compile. compile is the main dispatch routine; it analyzes the expression and dispatches (via compile-list and compile-special-form) to compile-let, the specialized procedure for compiling let expressions.

compile-let compiles the initial value expressions for x and y using compile-multi, which in turn calls compile recursively; they are compiled in the (top-level) environment, which is just passed along because no new environments have been created yet. In this case it doesn't matter, though, because they're just literals. (The values 1234 and 3456 get added to the literal list at this point.) Then compile-let generates the code to bind x and y.

So far, the code generated looks like:

  (fetch-literal 1)     ; fetch 1234
  (push)                ; push it on eval stack
  (fetch-literal 2)     ; fetch 3456
  (push)                ; push it on eval stack
  (bind 2)              ; bind 2 vars (x and y), w/values from eval stack

and the literals list holds 1234 and 3456.

compile-let then calls compile-sequence to compile the body of the let, but first it creates a new compile-time environment, to represent the fact that the body sequence will execute in the new runtime environment after x and y have been bound. The structure of this environment is

   [ x y ] -> [toplevel]

(This is our new, terse way of drawing the box-and-arrow data structure for compile time environments. I got tired of drawing ascii art.)

compile-sequence calls compile recursively to compile a sequence of expressions; in this case, there's only one expression in the body.

The recursive call to compile dispatches (again via compile-list and compile-special-form) to compile-let, to compile the inner let.

compile-let compiles the initial value expressions using compile-multi. compile-multi calls compile recursively to compile the one expression in the list, the symbol z. (Again, the same environment is just passed along, because we haven't created a new environment.)

The recursive call to compile now dispatches to compile-symbol, which looks up the binding information for the symbol z in the compile-time environment. There's no binding in the first frame (containing x and y), so the search goes to the second frame, which is the top-level environment, and the top-level binding is returned. This causes a dispatch to compile-toplevel-var-ref, which adds the toplevel binding of z to the literals list and generates code to get it from the template and extract its value at run time.

Then compile-let generates code to bind the fetched value as the local variable foo.

The code generated so far is:

  (fetch-literal 1)    ; fetch 1234
  (push)
  (fetch-literal 2)    ; fetch 3456
  (push)
  (bind 2)             ; bind 2 values (x and y)
  (fetch-literal 3)       ; get toplevel binding (of z)
  (t-l-bdg-get)           ; get value from (z's) binding
  (push)
  (bind 1)                ; bind one variable (foo)

and the literals list contains 1234, 3456, and the binding of z. Now compile-let creates a new compile-time environment to represent the environment created by the inner let; its structure is

 [ foo ] -> [ x y ] -> [toplevel]

and it passes this to compile-sequence to compile the body of the let. compile-sequence calls compile recursively once, handing it the new environment, to compile the one body expression, (+ (- foo x) (+ bar y)).

The recursive call to compile dispatches (through compile-list) to compile-combo, which recursively calls compile three times, to generate code for the subexpressions (+ bar y), (- foo x), and +. Since no new bindings are being created, the recursive calls are given the same environment.

The recursive call to compile (+ bar y) similarly dispatches to compile-combo and compiles y, bar, and +. Each of these calls dispatches to compile-symbol and the variables are looked up in the compile-time environment. The lookup for y returns a lexical address of 1,2, so the intermediate code generated is

  (local-var-ref 1 2)

The lookup for bar doesn't find any local bindings and returns the toplevel binding, so the binding is added to the literal list and the intermediate code is

  (fetch-literal 4)
  (t-l-bdg-get)

Similarly, the lookup for + doesn't find any local bindings and returns the toplevel binding, so the binding is added to the literal list and the intermediate code is

  (fetch-literal 5)
  (t-l-bdg-get)

now the call to compile-combo that compiles (+ bar y) can string these three fragments together to get

  (save-continuation "resume26") ; save state for call to +
  (local-var-ref 1 2)            ; look up y
  (push)
  (fetch-literal 4)              ; get toplevel binding (of bar)
  (t-l-bdg-get)                  ; get value from bdg (of bar)
  (push)
  (fetch-literal 5)              ; get toplevel binding (of +)
  (t-l-bdg-get)                  ; get value from binding (of +)
  (apply)                        ; call +
"resume26"                    

and return that. Notice that for the argument subexpressions, compile-combo inserts (push)es to save the values on the eval stack. For the first (function position) subexpression, it leaves the value in the value register, which is where it's expected (by apply).

The recursive call to compile-combo to compile (- foo x) goes pretty similarly to the one for (+ bar y), the main difference being that both foo and x are found to be local variables and compiled appropriately, with the result being the sequence

  (save-continuation "resume27") ; save state for call to -
  (local-var-ref 1 1)            ; look up x
  (push)
  (local-var-ref 0 1)            ; look up foo
  (push)
  (fetch-literal 6)              ; get toplevel binding (of -)
  (t-l-bdg-get)                  ; get value from binding (of -)
  (apply)                     ; call -
"resume27"

The recursive call to compile the symbol + goes straightforwardly to compile-symbol, which looks up + and finds that it's a toplevel variable; the binding is already on the literals list, so the code generated is:

  (fetch-literal 5)    ; get toplevel binding (of +)
  (t-l-bdg-get)        ; get value from binding (of +)

and this is returned to the outer invocation of compile-combo. It can then string together the code for the outer + expression, inserting a (push) after each argument fragment and adding an apply at the end; since this outer call is a tail call, no continuation needs to be saved for it. This code is returned to the inner invocation of compile-let, which appends it to its code and returns it to the outer invocation of compile-let, which returns the entire code sequence

  (fetch-literal 1)    ; fetch 1234
  (push)
  (fetch-literal 2)    ; fetch 3456
  (push)
  (bind 2)             ; bind 2 values (x and y)
  (fetch-literal 3)       ; get toplevel binding (of z)
  (t-l-bdg-get)           ; get value from (z's) binding
  (push)
  (bind 1)                ; bind one variable (foo)
  (save-continuation "resume26") ; save state for call to +
  (local-var-ref 1 2)            ; look up y
  (push)
  (fetch-literal 4)              ; get toplevel binding (of bar)
  (t-l-bdg-get)                  ; get value from bdg (of bar)
  (push)
  (fetch-literal 5)              ; get toplevel binding (of +)
  (t-l-bdg-get)                  ; get value from binding (of +)
  (apply)                        ; call +
"resume26"
  (push)                         ; push value of (+ bar y) on restored eval stack
  (save-continuation "resume27") ; save state for call to -
  (local-var-ref 1 1)            ; look up x
  (push)
  (local-var-ref 0 1)            ; look up foo
  (push)
  (fetch-literal 6)              ; get toplevel binding (of -)
  (t-l-bdg-get)                  ; get value from binding (of -)
  (apply)                        ; call -
"resume27"
  (push)                         ; push value of (- foo x) on restored eval stack
  (fetch-literal 5)              ; get toplevel binding (of +)
  (t-l-bdg-get)                  ; get value from binding (of +)
  (apply)                        ; (tail-)call +

Preserving Tail-Recursiveness using Compile-Time Continuations

One very important thing we glossed over just now when describing the workings of the compiler was when exactly to save continuations, and when not to. It is important to save continuations before calling procedures, if the callee must return and resume the execution of the caller. This is not necessary for tail calls, and in fact Scheme requires that continuations not be saved for tail calls--if we save continuations before tail-calls that happen to implement loops, we may end up with an unbounded accumulation of unnecessary continuations.

Another important question is when returns should be executed. If a procedure ends in a tail-call, it is assumed that the callee will do a return. But eventually something actually has to do a return, and pass control back to its caller (or the caller of its caller... whatever). This situation occurs when the tail expression of a procedure is not another procedure call, e.g., returning the value of a variable or a literal.

When Should We Save Continuations?

The general rule is that if a procedure call is the last thing a procedure does, no continuation should be saved--we can just jump into the callee, and since our state was not saved, the callee's return will resume our caller. To get this right, we just have to keep track of which expressions are being compiled in "tail position."

In the procedure

(lambda (x)
   (if (foo x)
       (bar (quux x))
       (baz)))

the if expression is in tail position, because the value of the if will be returned as the value of the procedure. The condition expression (foo x) is not in tail position, because after it is executed, control must return to this procedure so that either the consequent expression (bar (quux x)) or the alternative expression (baz) can be executed.

Note that both the consequent and the alternative expressions are in tail-position; whichever is executed, that will be the last thing this procedure does, and the value computed will be the result of this procedure.

On the other hand, if we modify the procedure to always return #f, none of these expressions is in tail position.

(lambda (x)
   (if (foo x)
       (bar (quux x))
       (baz))
   #f)                                                  

That's because now the expression #f is in tail position, not the if expression; whatever the if does, control must come back to this procedure so that the value #f can be returned.

Notice that the expressions that compute the arguments of a combination (procedure call) are never in tail position--after computing them, control must always come back so that the procedure can be applied. The combination itself may be a tail call, of course, in which case once the arguments are computed, the apply may happen and control may never return.

To get this kind of thing right, all that is necessary is that each recursive call to compile know whether the code being compiled is going to be used in tail position or not; for this we use a compile-time continuation. (Fear not--it's simpler than compile-time environments. It's really just a flag that gets passed along to recursive calls to compile, sometimes getting turned off along the way.) Keep in mind that tails of tails are in tail position, but non-tail subexpressions are not. So in the case of nested if's where the outer if is in tail position, only the consequents and alternatives of the outer if's consequent and alternative are in tail position, e.g., in

(lambda ()
   (if (if (a)
           (b)
           (c))
       (if (d)
           (e)
           (f))
       (if (g)
           (h)
           (i)))

the tail calls are (e), (f), (h), and (i). All of the calls in the first inner if must return, because the value returned must be used by the outer if. The calls to the condition expressions in the other two inner ifs must also return, because the values must be used to tell which of their alternative and consequent to use.

For each basic kind of expression, we can tell which of its subexpressions should be considered tails when the overall expression is itself in tail position; the rules for the main compiler routines are described below.

When we compile something in tail position, we pass compile a flag saying so. The flag will be examined, and passed along to subexpressions if appropriate for compiling the kind of subexpression in question.

For example, if compile-sequence is handed a flag saying it should compile for tail position, it will pass the tail flag along when calling compile recursively on its last subexpression. For its other subexpressions, however, it will always pass the non-tail flag, because they must always return to execute the next expression in the sequence.

Similarly, compile-if will pass along whatever flag it is given when calling compile for its consequent and alternative subexpressions, but will always pass the non-tail flag when compiling its condition expression.

compile-combo will always pass along a non-tail flag when calling compile on its subexpressions, but will examine the flag it's given to tell whether it should save a continuation before evaluating all of them.

compile-lambda will always compile body expressions in non-tail position, except for the last one, which is always compiled in tail position. (For simplicity, compile-lambda just hands the whole body to compile-sequence, with a tail flag.)

compile-let always compiles its initial value expressions in non-tail position, and its body expressions like a sequence. (For simplicity, it just hands the whole body to compile-sequence, with whatever flag it's given.)
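
To make this concrete, here is a minimal sketch of how the tail flag might be threaded through two of these routines. It assumes that compile takes the flag as an extra argument and returns a list of intermediate-code instructions (with labels appearing as strings in the list, as in the listings above); those conventions, and the gen-label helper, are assumptions for illustration, not the real compiler's interface.

   ;; Assumed helper: generate a fresh label string like "else42".
   (define gen-label
     (let ((count 0))
       (lambda (prefix)
         (set! count (+ count 1))
         (string-append prefix (number->string count)))))

   (define (compile-sequence exprs cenv lits tail?)
     (if (null? (cdr exprs))
         (compile (car exprs) cenv lits tail?)        ; last expression inherits tail?
         (append (compile (car exprs) cenv lits #f)   ; the others must return
                 (compile-sequence (cdr exprs) cenv lits tail?))))

   (define (compile-if expr cenv lits tail?)
     (let ((else-label (gen-label "else"))
           (end-label  (gen-label "end")))
       (append
        (compile (cadr expr) cenv lits #f)        ; condition: never in tail position
        `((branch-on-false ,else-label))
        (compile (caddr expr) cenv lits tail?)    ; consequent inherits the flag
        `((branch ,end-label) ,else-label)
        (compile (cadddr expr) cenv lits tail?)   ; alternative inherits the flag
        `(,end-label))))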

Compiling Returns

As mentioned above, when an expression other than a procedure call is a tail of a procedure, it must be accompanied by a return. That is, every tail of a procedure must be either an apply (which will call something which will return, perhaps indirectly because of tail calling) or a return.

The compiler can handle this by ensuring that wherever we generate intermediate code that is a leaf of the expression graph (e.g., in compile-variable-ref and compile-literal), we check the compile-time continuation flag to see if the expression occurs in tail position. If so, rather than simply leaving the value in the value register, we also execute a return sequence--a series of instructions that will grab the values out of the first partial continuation on the chain, and restore them into the registers and evaluation stack to resume the suspended procedure. We have a special intermediate code instruction that stands for this sequence, called return.
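
For example, a leaf-compiling routine might look roughly like the following sketch; the calling convention and the add-literal! helper (which records a literal and returns its index) are assumptions for illustration.

   (define (compile-literal expr cenv lits tail?)
     (let ((index (add-literal! lits expr)))     ; assumed helper: record the literal
       (if tail?
           `((fetch-literal ,index) (return))    ; leaf in tail position: also return
           `((fetch-literal ,index)))))          ; otherwise just leave it in the value register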

Consider the following procedure:

(lambda (a b c)
   (if (if a
           (b)
           c)
       d
       (e)))

When compiling its body, we dispatch through compile-sequence and recursively call compile to compile the if in tail position. It recursively calls compile to compile the nested if in non-tail position, which in turn recursively calls compile to compile a, (b) and c in non-tail position.

Note that a is a leaf expression, and since it's in non-tail position, it can just leave its value in the value register. The subsequent code (the test for false and conditional branch that's part of the code for the inner if) will expect that value there, so that's fine.

The expression (b) is not in tail position, because it inherits non-tail position from the inner if, so a continuation must be saved before the call to b. When b returns, its value will be in the value register and execution will resume at the branch that is part of the if.

Similarly, the expression c is in non-tail position (which it also inherited from the inner if); it can just leave its value in the value register where subsequent code can find it. (In this case, it's the value returned by the inner if, and tested by the outer if's test for false and conditional branch.)

The expression d is different. It's in tail position, and it's a leaf (not a call). It can't just leave its value in the register, because it's the end of the procedure; it must therefore have a return sequence tacked onto it.

The expression (e) is just a tail call, which can just call e without saving a continuation. Whatever e calls can do whatever it wants, and probably something will eventually leave something in the value register and pop the caller's continuation.

The code generated for the above procedure is:

  (bind 3)                     ; bind args (a, b, and c)
  (local-var-ref 0 1)             ; get value of a
  (push)
  (branch-on-false "else32")      ; compare and maybe br to inner else
  (save-continuation "resume33")
  (local-var-ref 0 2)             ; get value of b
  (apply)                         ; call b
"resume33"
  (branch "end32")
"else32"
  (local-var-ref 0 3)                ; get value of c
"end32"
  (branch-on-false "else31")      ; compare and maybe br to outer else
  (fetch-literal 1)               ; get toplevel binding of d
  (t-l-var-get)                   ; get value of d from binding
  (return)                        ; and return it
  (branch "end31")
"else31"
  (fetch-literal 2)               ; get toplevel binding of e
  (t-l-var-get)                   ; get value of e from binding
  (apply)                         ; and tail-call it
"end31"

(Notice that when we generated the code for the outer if, we generated a branch after its consequent that can never be taken. compile-if generates a label for the end of the code, so that after executing the consequent, control will resume at whatever code follows the if. In the case of this if, the consequent always executes a return before reaching the branch. A slightly smarter compiler would recognize this situation and eliminate the branch.)

Compiling Top-Level Expressions

We said earlier that the compiler mainly uses recursion to generate intermediate code for nested expressions. To make this useful, though, at some point the intermediate code for a top-level expression must be converted into actual executable code and packaged up so that it can be called.

Suppose we interact with our system via a read-eval-print loop where eval is really implemented by compiling the expression and executing the resulting compiled code.

To make it easy to implement this, it's nice if there aren't very many kinds of top-level expressions that the compiler has to generate code for and be able to actually call. In particular, it's convenient if different kinds of expression can be transformed into the same kind of expression. The easy way to do this is to make all different kinds of executable expressions into expressions that generate procedures, and then call those procedures.

If we type

(let ((x 2))
  (+ x x))

to the r.e.p. loop, the r.e.p. loop can simply wrap this up in a procedure expression, compile that, package it up as something executable, and call it. In effect the read-eval-print loop will convert it to

(lambda ()
   (let ((x 2))
      (+ x x)))

before compiling it, and call the resulting closure to execute it.

Likewise, expressions like

(set! foo quux)

and

(if bar baz)

can be wrapped up as

(lambda () (set! foo quux))

and

(lambda () (if bar baz))

Now when we start compiling, we only have to deal with one kind of thing--a whole procedure, and when we get the resulting code back and package it up to run it, we'll always be dealing with the code for a whole procedure. That makes it easy to create an actual closure to call.

The main routine we use to start off compilation is compile-procedure, which takes an expression, a compile-time environment, a compile-time continuation, and a literal list as arguments. It returns intermediate code and an updated literal list for the procedure.

We take the intermediate code and hand it to the procedure intermediate-code->executable-code, which generates the executable code object. (This may be done by translating the sequence of intermediate code instructions into the equivalent sequences of assembly language instructions and running that through an assembler. Before doing the assembly, it may run the intermediate code through one or more optimization phases.)

We take the resulting executable code and the literals list, and hand them to make-template to create the template object.

Now we can hand the appropriate runtime environment and the template to make-closure and get back a callable closure for the procedure.
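
Putting those steps together, the "eval" part of the read-eval-print loop might look roughly like this sketch. compile-procedure, intermediate-code->executable-code, make-template, and make-closure are the routines named above, but their exact argument conventions, the 'tail compile-time continuation, the two-element-list return value, and the invoke-closure runtime hook are all assumptions for illustration.

   (define (compile-and-run expr toplevel-cenv toplevel-envt)
     (let* ((thunk-expr `(lambda () ,expr))                      ; wrap as a no-argument procedure
            (result (compile-procedure thunk-expr toplevel-cenv
                                       'tail '()))               ; -> (intermediate-code literals)
            (code (intermediate-code->executable-code (car result)))
            (template (make-template code (cadr result)))
            (closure (make-closure toplevel-envt template)))
       (invoke-closure closure)))                                ; assumed hook that runs the closure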

Compiling lambda Expressions Inside Procedures

When we compile a lambda expression, we must generate code that will create a closure at run time. A very naive way to do this would be to generate code that would call the compiler at runtime to compile the lambda expression into a procedure, plus a little code to create a closure of that object in the runtime environment.

Luckily, this is not necessary, and the compiler can do all of the real compilation at compile time--since the code for the lambda expression will be the same every time it's executed, and since lexical scope guarantees that it will always execute in an environment with the same structure, only one version of the code is needed, and it can be shared among all closures of that procedure. The template can be shared as well.

The compiler therefore generates code and a template for the lambda procedure; at run time, the actual code for the lambda expression just makes a closure on the heap and initializes its environment pointer and template pointer. This code will get the environment pointer from the environment register (and put it in the environment field of the new closure); the template pointer will be the pointer to the template for the lambda procedure.

To allow this little code sequence to quickly grab the template for the procedure being closed, the compiler stores a pointer to that template in the template of the procedure which executes the lambda expression. For example, if a lambda expression is encountered while compiling procedure foo, the compiler will compile the lambda procedure and store its template in the template of foo. (While compiling foo, it simply records the pointer to the new lambda procedure's template as another literal. Then it will end up in foo's template like other literals.)

So the code generated for

   ...
   (lambda (x)
      (...))
   ...

looks like

   ...
   (envt-reg-get)     ; primitive to copy envt. reg. onto eval stack
   (push)
   (fetch-literal 15) ; grab template pointer for lambda proc
   (push)
   (make-closure)     ; code that will create closure w/those values
   ...

The real trick is in compiling the lambda procedure and stuffing its template into the template of the procedure that contains the lambda expression. The compiler just calls itself to generate the code and template then saves the template in the literal list and generates code like the above to reference the right literal.
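
In outline, compiling a nested lambda expression might go something like the sketch below; it reuses the routine names from the previous section, and the argument conventions and add-literal! helper are assumptions for illustration.

   (define (compile-lambda expr cenv lits tail?)
     (let* ((result (compile-procedure expr cenv 'tail '()))   ; compile the nested procedure
            (code (intermediate-code->executable-code (car result)))
            (template (make-template code (cadr result)))
            (index (add-literal! lits template)))              ; record it in the ENCLOSING literals
       (append `((envt-reg-get)                                ; push the current environment
                 (push)
                 (fetch-literal ,index)                        ; push the nested proc's template
                 (push)
                 (make-closure))                               ; build the closure at run time
               (if tail? '((return)) '()))))                   ; a lambda is a leaf, so maybe return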

Compiling Top-level Definitions

We said above that we can deal with top-level expressions by turning them all into lambda expressions, which will have the effect of evaluating those expressions when called.

This is a little bit tricky when dealing with top-level definitions, because top-level definitions can't be nested inside lambda expressions in the obvious way--they'd just become local definitions.

We therefore have to treat them specially. We use a special procedure, environment-define!, which operates on top-level environments and allows us to create top-level bindings. This is not a standard Scheme procedure--it's a special procedure that the compiler can generate calls to, but normal portable Scheme code cannot use directly.

The read-eval-print-loop will therefore recognize top-level definitions and treat them specially. When it encounters one, the initial-value expression will be wrapped up as a lambda and compiled, and the results turned into code, a template, and a closure. (The closure is given the runtime toplevel environment pointer.)

The closure will be called to get a result for the initial value expression, and environment-define! will be used to create and initialize the toplevel variable.

(This might appear at first to cause a scoping problem: if the binding isn't created until after the initial value expression is compiled, the compiler won't see the binding. But recall that if we compile an expression that uses an undefined variable, we assume it's a toplevel variable and create a binding object for it, and mark that object invalid. If the binding has already been created by a forward reference in this way, environment-define! will just overwrite the marker with a real value.)

Of course, if the top-level definition uses procedure definition syntax, it is necessary to massage it into a plain definition whose initial-value expression is a lambda expression before doing the wrapping and compiling described above.
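
Here is a rough sketch of how the read-eval-print loop might handle a top-level definition, reusing the compile-and-run sketch from the section on top-level expressions; the destructuring of the define form follows standard Scheme syntax, but everything else about the interface is an assumption for illustration.

   (define (eval-toplevel-definition defn toplevel-cenv toplevel-envt)
     (let* ((name (if (pair? (cadr defn))                     ; (define (f arg ...) body ...) ?
                      (caadr defn)
                      (cadr defn)))
            (value-expr (if (pair? (cadr defn))
                            `(lambda ,(cdadr defn) ,@(cddr defn))  ; massage into a lambda
                            (caddr defn)))
            (value (compile-and-run value-expr toplevel-cenv toplevel-envt)))
       (environment-define! toplevel-envt name value)))        ; create/initialize the binding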

Interfacing to the Runtime System

In order to support garbage collection (as is required for Scheme), there must be some agreement between the compiler and the garbage collector, so that the collector can find the pointers that the running program might find, and ensure that all objects the program could reach from them are preserved.

A common way of doing this (used in RScheme and Scheme-48) is to have a fixed set of registers (plus maybe an eval stack) that hold all of the root values that the collector needs to know about, and guarantee that whenever garbage collection occurs, all pointers will be identifiable as such. Any given register must be known to never contain pointers, to always contain a pointer, or to contain self-identifying (tagged) values that are decodable to see if they're pointers.

For example, in the straightforward compiled system we've described in detail, the VALUE register and the EVAL stack only contain normal Scheme values: tagged values that can be checked to see if they're pointers. On the other hand, the template and procedure registers would probably always contain raw pointers, since each can only point at one kind of thing, and the tags would slow some things down.

There might also be some other registers, which always contain nonpointers.

Garbage Collection

Safe Points

Many systems (like RScheme and Scheme-48) ensure that garbage collection can only happen when a program is at a well-defined "safe point", not at an arbitrary point in the code. At a safe point, all pointer values are known to be recognizable. In between safe points, it's okay for the code to use values that aren't properly decipherable by the GC. (For example, a register that normally contains only tagged values might briefly hold a raw pointer.)

This is easy in a single-threaded system; the GC just keeps some space in reserve, so that it never runs out of memory between safe points. If an allocation requires dipping into this reserve, a flag is set so that a GC will occur at the next safe point.

The usual trick is to ensure that each procedure call and backward branch is a safe point. This ensures that a program (or thread) reaches safe points frequently, since it can't run for long without either making a call or branching backwards to repeat a loop.

It's a little bit trickier in a multithreaded system--you have to make sure that you suspend threads only at safe points, so that if one thread forces a GC while the others are suspended, the collector can still make sense of every suspended thread's registers and stack.

GC at Any Time

Some systems do not use safe points, and in effect make every point in the code a safe point for collection. They ensure that no matter where a GC occurs (or where a thread was suspended before the GC occurred), there will be enough information lying around so that the collector can find all of the pointers it needs to find.

Some compilers do this by restricting the way registers are used and code is generated. (For example, the Orbit compiler only uses certain registers to hold pointers, and only uses certain others to hold nonpointers. In addition, all pointers in registers must point directly to the beginning of an object; array indexing cannot be converted into arbitrary pointer arithmetic by the optimizing compiler.)

Other compilers allow more use of odd representations and more flexible use of registers, recording enough information that what the registers hold can be figured out at run time. For example, a register might be assumed to hold nonpointers, except at points in the code flagged by the compiler, based on its having register-allocated a variable there.

Interrupts

Advanced Compiler and Runtime System Techniques

Inlining Small Procedures

The system we've described so far generates fairly slow code, and a major culprit is the frequency of continuation saving and procedure calls. Even very small, frequently-executed procedures like eq?, car, cdr, and + require a handful of instructions to call and another handful to return, plus another handful to save a continuation if it's a non-tail call. This is much slower than the cost of similar operations in conventional languages like C or Pascal; in those languages, these simple little "operations" don't have the semantics of calls to first-class procedures.
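
To see where the time goes, compare the intermediate code for a full (non-tail) call to car with what an inlined version could look like. The call sequence follows the notation used earlier; the car-of-value instruction in the second version is purely hypothetical.

   (save-continuation "resume40")  ; save state for the call
   (local-var-ref 0 1)             ; look up x
   (push)
   (fetch-literal 1)               ; get toplevel binding of car
   (t-l-bdg-get)                   ; get car's value from the binding
   (apply)                         ; call car
 "resume40"

versus, with inlining,

   (local-var-ref 0 1)             ; look up x
   (car-of-value)                  ; hypothetical: type-check and take the car directly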

Sometimes it is desirable to trade away some of the purity and elegance of a language like Scheme, accepting reduced flexibility in exchange for better performance. One way of doing this is by declaring frequently-used small procedures not to be redefinable, and allowing the compiler to compile those operations inline rather than as procedure calls. In some systems this only works for built-in procedures that the compiler understands, but in others the compiler is smart enough to inline user-defined procedures if so directed.

In some Scheme systems, you can declare procedures to be inlinable, or use a compiler flag that says you promise not to redefine the common little procedures that are most valuable to inline. This means that you can't change the definition of something like + on the fly, but you seldom want to. A common tradeoff is to avoid inlining any but the most frequently-called procedures during program development, and once the program is finished, recompile with lots of inlining. This gives you the flexibility to modify procedure definitions on the fly during debugging, while getting maximum speed once it's clear which procedures won't ever be redefined in normal operation.

Some high-tech compilers use advanced techniques to do lots of inlining when it's safe, without reducing flexibility much or requiring the user to supply a lot of declarations.

The Self compiler aggressively inlines code, and automatically recompiles the code that is invalidated by changes to procedure definitions. (This compiler is for the language Self, not Scheme, but similar techniques could be applied to Scheme.)

Some compilers currently in development have a special mode for compiling finished programs which will not be used with a read-eval-print loop. Such a compiler takes advantage of the fact that if it can look at the whole program (rather than having parts typed in by the user interactively), it can tell which variables could ever be modified at run time. (As long as there are no calls to eval at run time, the compiler can tell that all of the code for the program exists at compile-time; new closures may be created at run time, but not totally new procedures.) After globally determining that there is no code in the program that could change the definition of a procedure, it is free to inline the code for that procedure into its callers.

Type Declarations and Type Analysis

Another key performance problem with naive implementations of Scheme (or other dynamically typed languages) is that basic operations are generally slow relative to their execution in conventional statically-typed languages. For example, the Scheme procedure + must check the types of its arguments and (depending on those types) execute any of several possible code sequences to add two numbers. Usually, the checking overhead alone is several times greater than the cost of the actual addition.

One way of reducing this cost is by extending Scheme to allow the user to declare the types of some variables. The compiler may be able to use this information to compile fast versions of operations for values of known types. (This is especially true if common operations are inlined--the compiler can choose to inline the appropriate version rather than the more general code.)

Another way of reducing type checking cost is for the system to automatically infer the types of some expressions. For example, consider the expression (+ a 22). Since 22 is a literal, its type is known at compile time. If the compiler can inline the + procedure, it may at least omit the type check of that argument.

A combination of declarations and inferencing can work well. For example, if the user has declared variable a to be of type <integer>, then the compiler can tell that (+ a 22) is an expression whose arguments are integers (so no run-time type tests are necessary there) and whose result is an integer, which may eliminate the need for type checks by the expression that uses the value.
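
For instance, with some (nonstandard, hypothetical) declaration syntax, the source and the code for the addition might look something like this; both the declare form and the fixnum-add instruction are made up for illustration.

   (lambda (a)
      (declare (<integer> a))      ; hypothetical declaration form
      (+ a 22))

   ...
   (fetch-literal 1)     ; fetch 22 (a known integer)
   (push)
   (local-var-ref 0 1)   ; look up a (declared <integer>)
   (fixnum-add)          ; hypothetical add with no run-time type checks
   ...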

More aggressive schemes are possible for reducing the frequency of dynamic type checks. For example, the Self compiler aggressively inlines and transforms code so that multiple dynamic type checks can be collapsed into a single one.

Using More Hardware Registers

[ blah blah ]

For example, it's very likely a good idea to use more registers, and either not have an eval stack or not use it as often. Our simple abstract machine requires arguments to be passed on the eval stack, which means storing into memory at least once for each argument, and loading back from memory when arguments are used. Most modern machines have several hardware registers available for argument passing, and more for holding intermediate values of computations.

If we have a few more registers that can be used for argument passing, we could just leave the argument values in those known registers, and procedures could expect them there. In many cases, argument values could be computed in a way that the result is left in the appropriate argument-passing register, without having to copy it there from somewhere else. Similarly, in many cases, procedures could leave their arguments in the argument passing registers and use them there, without actually copying them into a binding environment on the heap. (Even if only a few registers can be devoted to this, it will account for the large majority of arguments passed, since most procedure calls are to procedures that take between one and three arguments.)

Similarly, in many cases a temporary value generated by evaluating a subexpression could be left in a register, and then used by another expression, without pushing and popping the eval stack.

This can be a big performance win--it is much faster to operate on arguments and temporary values that are already in registers, rather than copying them to and from memory all of the time.

Using more registers can make the compiler and runtime system more complicated. If variables are in registers when continuations are saved, their values must be saved in the continuations and restored at procedure returns. This requires the compiler to keep track of which registers are in use at which points, and generate appropriate code. It also complicates the interface between the compiled code and the garbage collector; the garbage collector must be able to find all of the pointer values that are stored in registers, so that it can find all of the reachable objects. The compiler must therefore record sufficient information that all pointer values can be found at garbage collection time. (Alternatively, the compiler may record a safe approximation of the information, and require the collector to make conservative guesses about what's what.)

Closure Analysis

One of the performance problems with a naive implementation of Scheme is that in the general case, variable bindings must be allocated on the garbage-collected heap, and procedure calls must be via pointers to closures. This is often much slower than the usual implementation of conventional programming languages, which don't have to support lambda. Allocating closures and environments on the heap is mainly slow because creating and accessing variable bindings is slower than if the variables were allocated on a stack or in registers.

A smart Scheme compiler can get rid of most of this overhead by analyzing programs and noticing that many closures are used in stereotyped ways, and calls to them can be implemented more cheaply than the naive implementation. Similarly, analysis of expressions may reveal that most binding environments can't possibly be captured by closures, and therefore don't need to be allocated on the garbage-collected heap. The bindings can be saved in continuations along with temporary values, or a more conventional stack may be used, or (best of all), the bindings can be register-allocated.

A simple example of a language-level closure that doesn't need the fully general naive implementation is a closure created by a lambda expression that appears in the function position of a combination:

((lambda (x)
    (+ x x))  
 2)

Recall that constructs like this are often generated by macros that implement binding constructs like let---this one is equivalent to

(let ((x 2))
   (+ x x))

In this case, we can tell from the fact that the lambda expression appears in the function position that the closure can't "escape" and have anything weird done with it. That is, no pointer to the closure is assigned into a variable binding, or passed to a procedure call, or inserted into a data structure. It's clear that the only thing that can happen to this closure is that it will be called, and then the pointer to it will be "dropped," i.e., not passed anywhere else. The closure will therefore become garbage immediately after it's executed.

A smart compiler will therefore recognize that all the closure really does is bind its variable and execute its body; it will leave out the code to create the closure and just compile in the equivalent code--in this case, it will generate the obvious code for a let expression.

(Some compilers always transform let's and letrec's into lambda combinations, and rely on their optimizers to recognize the unnecessary lambda's and remove them. This may seem backwards, but it's nice because the same optimizations work whether the lambda combinations were the result of transforming a let, or macroexpanding a user-defined macro, or written directly by the user, or whatever. The more sophisticated the optimizer, the more simply the user can write macros and procedures, and expect the compiler to sort it all out and generate efficient code.)
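
The recognition step itself can be a simple check in compile-combo, sketched below; compile-ordinary-combo stands for the usual closure-call case, and the argument conventions are assumptions for illustration.

   (define (compile-combo expr cenv lits tail?)
     (let ((operator (car expr))
           (args     (cdr expr)))
       (if (and (pair? operator)
                (eq? (car operator) 'lambda))
           ;; ((lambda (var ...) body ...) arg ...)  ==>  (let ((var arg) ...) body ...)
           (compile `(let ,(map list (cadr operator) args)
                       ,@(cddr operator))
                    cenv lits tail?)
           (compile-ordinary-combo expr cenv lits tail?))))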

Another simple case for closure and environment analysis is binding environments that don't have any closures created in their scopes. Suppose that our compiler inlines calls to car, eq?, and cdr, and consider the expression

(let ((x (car a)))
   (if (eq? (car x) target)
       (car (cdr x))
       #f))

In this case, the body of the let can be compiled into entirely inline code, and it is clear that there is no possible path of execution that can create a closure that captures x. x can therefore be allocated in a register for its whole lifetime, making this code much faster.

[ separate section? Figure out structure here... ]

Actually, some of these analyses are trickier than they appear, due to the presence of side effects and call/cc.

[ Haven't talked about call/cc yet! ]

Consider the following expression, where we don't assume any inlining:

(let ((x (car a)))
   (if (eq? (car x) target)
       (car (cdr x))
       (set! x (foo))))

At first it appears that since there are no lambda's in the expression, x can be allocated in a register, and saved in continuations across calls. (E.g., when calling car, we could just save the value of x in the continuation and have it restored when car returns, right?) Unfortunately, if we don't have any guarantees that car won't be redefined in weird ways, then it's possible that the call will be to a procedure that will (directly or indirectly) call call/cc, and capture a continuation that could be used to return into this procedure any number of times. In that case, we can't be sure that we won't return into this code and modify x. If we did, then each time we returned into this environment, we should see the latest value of x. This will happen if the value of x is in a normal binding environment on the heap, but not if it's in a register that gets saved in a continuation. Recall that when we restore a continuation, we just copy the values out into the registers. If we restore the same continuation multiple times, we'll just keep copying the same value of x back out.

To get this right, we have to ensure that if there are any assignments to x, then all references to x go through a pointer to a heap-allocated binding. Then when we save a continuation, we save this pointer to the binding of x, not the state (value) of the binding of x. Every time we set or read the value of x, we go through this indirection to the same binding, and see the latest value.

Because of this, high-tech Scheme compilers keep track of which variables are ever set! anywhere in their scopes, and make sure to allocate those variables' bindings on the heap.
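
One common way to arrange this is sometimes called assignment conversion: the compiler gives each set! variable a little one-slot heap object, and rewrites reads and writes of the variable to go through it. The cell procedures below (make-cell, cell-ref, cell-set!) are hypothetical names for illustration; a pair or a one-element vector works fine.

   ;; The earlier expression, after (hypothetical) assignment conversion:
   (let ((x (make-cell (car a))))           ; x's binding is boxed on the heap
      (if (eq? (car (cell-ref x)) target)
          (car (cdr (cell-ref x)))
          (cell-set! x (foo))))

   ;; One possible implementation of the cells:
   (define (make-cell v)   (vector v))
   (define (cell-ref c)    (vector-ref c 0))
   (define (cell-set! c v) (vector-set! c 0 v))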

In Scheme, it is a common idiom to code iteration as recursion; macros for different looping constructs often compile into letrec's with tail-calling lambda expressions.

While this is a very powerful framework for expressing various patterns of iteration, a naive implementation is slow. In most cases, loops created in this way are actually just used as loops, and it is desirable to compile away the overhead of closure creation and calling. For example, consider a named let like

(let loop ((x 0))
   <body>
   (if (< x 10)
       (loop (+ x 1))))

that has been transformed to

(letrec ((loop (lambda (x)
                  <body>
                  (if (< x 10)
                      (loop (+ x 1))))))
   (loop 0))

We can look at this expression, and if no reference to the variable loop occurs in the <body> expression, we can tell that we can compile it as a loop.

The analysis here is just slightly more complicated than the one that allows us to optimize closures that are produced by lambda expressions in function position of a combination.

When compiling the letrec, we can keep track of each of its variables and see whether it is ever used as anything but the name of a procedure to tail-call--if the value of loop is never assigned, and never read except to call it, then we know that the "calls" to loop don't really need to be full-blown closure calls at all. We can inline the code for the body of the loop and compile these calls as jumps directly to that code.

FOOD FOR THOUGHT--does it matter whether the calls are tail-calls or not, if we just treat them as procedure calls to a known address, and go ahead and save a continuation with the right label?

Register Allocating Loop Variables for Loops

Notice that closure analysis and register allocation are particularly important for loop control variables and the variables of little let's inside loops. Because Scheme requires that a loop variable be bound again (to fresh memory) at each iteration of a loop, actually allocating such variables on the heap is expensive. If it can be determined that the variable is dead at the end of the loop, however, then the variable can be re-bound at each iteration by simply re-using the same register. (We're binding the name to a particular piece of memory--the register--over and over again, and it just happens that these conceptual rebindings incur no runtime cost.)

With good closure analysis, loop conversion, and register allocation, a Scheme compiler can compile "normal" loops into code that's just as efficient as any compiler's.

Conventional Optimizations

Besides the optimizations described above, conventional compiler optimizations are also applicable to languages like Scheme.

Just as in a FORTRAN or C compiler, data flow analysis and control flow analysis can let the compiler simplify intermediate code and produce better machine code.

Stack Caches

Inlining and closure analysis can greatly reduce the amount of heap allocation in a Scheme implementation. Allocating all binding environments and continuations on the heap may inflate allocation rates by an order of magnitude over the rate of allocation of normal data structures like pairs and vectors. With a simple compiler and garbage collector, this can greatly inflate garbage collection costs. Despite the high rate at which continuations and environments are allocated, there are typically relatively few of them live at any given time--the vast majority of them are used very, very briefly and then become garbage.

Inlining procedure calls may greatly reduce the allocation of continuations, and closure analysis may allow most bindings to be allocated in registers instead of on the heap.

Still, it may be desirable to keep most of the continuations and environments from making it to the normal garbage-collected heap.

A stack cache is an area of memory (or pool of discontiguous chunks of memory) that's used for initial allocation of continuations and/or binding environments, in the expectation that most of them will die quickly. A stack cache caches part of the continuation chain; it's called a stack cache because it behaves mostly like a stack. Stack caches may be used for continuations, with environments still being allocated on the heap, or a more complex design may be used to keep most environments from making it to the heap as well.

For the most part, a stack cache is treated like a stack, in that continuations are pushed and popped as though it were a stack. When a continuation is captured by call/cc, however, the continuation chain is first moved to the heap so that it can be preserved in the usual way. This is generally a good tradeoff, because call/cc is not typically executed very often, and the stack cache can behave like a stack most of the time. The large majority of continuations will be reclaimed very quickly, by popping the stack cache, while a small minority will be moved out to the normal heap.

Caching binding environments is a little trickier, but the basic principle is the same; most environments are created in the stack cache, and only moved to the garbage-collected heap when necessary, i.e., when a closure is created on the heap. At that moment, the environment is moved to the heap, one frame at a time, until a frame is reached that is already on the heap. (The code that does this must ensure that an environment is never copied to the heap twice, destroying the sharing of outer environments by inner environments created in their scope.)

It is not clear how desirable a stack cache for environments is, given a compiler that does a reasonably good job of closure analysis. Using a stack cache for environments makes closure creation slower, and if most of the short-lived environments have been eliminated by closure analysis and register allocation, it may not be worth it.

There is also some controversy about whether stack caches are worthwhile in general, or whether a generational garbage collector will take care of the large volume of short-lived data efficiently.

One interesting point is that a stack cache really is a kind of generational garbage collection scheme, which exploits the typically short lifetimes of particular kinds of data. (When environments and continuations are moved to the normal heap, that can be viewed as moving objects from one generation to the next. This special generation is cheaper than a normal generational scheme, however, because of the stereotyped structures of continuation chains and binding environments.)

A stack cache, because it's small, can reduce the amount of memory that is used very frequently, compared to a generational GC without a stack cache. (A stack cache may only be a few kilobytes, but the youngest generation of a generational GC may be hundreds of kilobytes, or megabytes.) For some cache architectures, frequent reuse of this large an area causes significant cache miss penalties. (For some other architectures, the misses still occur but the cost is surprisingly low. I believe that stack caches are nonetheless a good idea, because they never hurt much and may sometimes help a lot.)

Scheme-48 has a stack cache that caches both continuations and binding environments. RScheme has a stack cache for continuations only, and relies on the compiler to compile away most heap allocation of binding environments. (This may not currently be as effective as it should be--the compiler needs more testing and improvement before it will generate really good code.)

