An Introduction to Scheme and its Implementation

Go to the first, previous, next, last section, table of contents.

Lists (Hunk F)

==================================================================
Hunk F starts here:
==================================================================

We usually use pairs in Scheme in a particular, stereotyped way, to build lists.

Pairs are really like binary tree nodes--you can use the car and cdr fields in the same ways. The normal way of using them treats the car and the cdr differently, however.

The cdr field of a pair is used to hold a pointer to another pair, or a pointer to the empty list, i.e., a null pointer. This lets you string pairs to gether to make linked lists of pairs. The car fields of the pairs hold pointers to any kind of object you want to put in a list.

We can therefore define the term list recursively as

an empty list, i.e., the null pointer object (), or
a pair whose cdr value is a list.

Think about this, and make sure that you understand why this covers all null-terminated lists strung together by the cdrs, and nothing else. Lists of this form are also called proper lists, and that's usually what we mean when we say "list." The important fact about a proper list is that it is a linear sequence of pairs ending with the empty list.

We usually think of lists as holding a sequence of values--we ignore the actual pairs, and think about their cdr values.

Because this is how lists are usually used, Scheme has a special way of printing lists. In the earlier examples, I showed that the result of (cons 1 2) prints as (1 . 2).

You might think that the result of (cons 1 (cons 2 '())) would print as (1 . (2 . '())), but it doesn't. It prints as (1 2).

The reason is that when Scheme encounters a pair whose cdr points to another pair or the empty list, it assumes you want to think of it as a list of pairs strung together by the cdrs, and it only shows you the car values. This is because we usually ignore the actual structure of a list--the sequence of pairs--and think about the values the list holds.

Try this in your system:

Scheme>'()
()
Scheme>(cons 1 '())
(1)
Scheme>(cons 1 (cons 2 '()))
(1 2)
Scheme>(cons 1 (cons 2 (cons 3 '())))
(1 2 3)

Notice that the data structure that prints as (1 2 3) is really a binary tree, and we could draw it like this:

      \
       \
        +---+---+
        | * | * |
        +/--+---\
        /        \
       1          +---+---+
                  | * | * |
                  +/--+---\
                  /        \
                 2          +---+---+
                            | * | * |
                            +/--+---+
                            /
                           3

We generally wouldn't, though, because we think of it as a sequence of numbers, and the pairs are just there to string them together in order. We'd draw it more like this, using the box-and-arrow notation from the previous chapter:

       +---+---+    +---+---+   +---+---+
   --->| * | *-+--->| * | *-+-->| * | * |
       +-+-+---+    +-+-+---+   +-+-+---*
         |            |           |
         |            |           |
         1            2           3

We've really just rotated the picture 45 degrees, so that "down and to the right" in the tree goes straight right, and looks more like "next" in a linear list.

(The arrow coming in from the left represents the pointer value that was returned, which the read-eval-print loop handed to write so that we could see the printed representation of the data structure.)

Drawing things this way lets us show shared structure, if a list overlaps with another list, e.g, if one list joins with the other because some car in each list points at the same object.

Note that a list of this form always ends with a pair whose cdr is (), (i.e., the empty list, a.k.a. the null pointer).

If we had forgotten that, we might have tried to construct the list this way, with the innermost cons just consing two numbers together:

Scheme>(cons 1 (cons 2 3))
(1 2 . 3)

This is a common beginning mistake. We have constructed an improper list---one which is not null-terminated. It doesn't end with ().

We could draw the list this way:

      +---+---+      +---+---+  
  --->| * | *-+----->| * | *-+---->3
      +-+-+---+      +-+-+---+
        |              |    
       \|/            \|/  
        1              2

Notice the dot in (1 2 . 3)---that's like the dot in (2 . 3), saying that the cdr of a pair points to 3, not another pair or '(). That is, it's an improper list, not just a list of pairs. It doesn't fit the recursive definition of a list, because when we get to the second pair, its cdr isn't a list--it's an integer.

Scheme printed out the first part of the list as though it were a normal cdr-linked list, but when it got to the end, it couldn't do that, so it used "dot notation."

You generally shouldn't need to worry about dot notation, because you should use normal lists, not improper list. But if you see an unexpected dot when Scheme prints out a data structure, it's a good guess that you used cons and gave it a non-list as its second argument--something besides another pair or ().

Scheme provides a handy procedure that creates proper lists, called list. list can take any number of arguments, and constructs a proper list with those elements in that order. You don't have to remember to supply the empty list---list automatically terminates the list that way.

Scheme>(list 1 2 3 4)
(1 2 3 4)

We could draw the result like this:

    +---+---+      +---+---+      +---+---+      +---+---+
--->| * | *-+----->| * | *-+----->| * | *-+----->| * | * |
    +-+-+---+      +-+-+---+      +-+-+---+      +-+-+---+
      |              |              |              |
     \|/            \|/            \|/            \|/
      1              2              3              4

Like any other procedure, list can be used with arguments that are procedure calls, such as calls to list itself.

Scheme>(list (list 1 2) (list 3 4))
((1 2) (3 4))

We can draw the result like this:

    +---+---+                     +---+---+
--->| * | *-+-------------------->| * | * |
    +-+-+---+                     +-+-+---+ 
      |                             | 
     \|/                           \|/ 
    +---+---+      +---+---+      +---+---+      +---+---+
    | * | *-+----->| * | * |      | * | *-+----->| * | * |
    +-+-+---+      +-+-+---+      +-+-+---+      +-+-+---+
      |              |              |              |
     \|/            \|/            \|/            \|/
      1              2              3              4

While Scheme prints lists in normal list notation when it can (and only uses dot notation when it has to), it can read either one.

We can type in literal lists using the quote special form, which just returns a list of the form we typed:

Scheme>(quote (1 2 3 4))
(1 2 3 4)

Since Scheme can read dot notation, we can do this in an equivalent way, using parentheses around the contents of each pair, and a dot to separate the car and cdr values:

Scheme> (quote (1 . (2 . (3 . ( 4 . ())))))
(1 2 3 4)

The difference between list and quote is that list is just a procedure, and each time you call it, it creates a new list. The arguments to list can be any expressions you like, and their results are what's put in the list.

Scheme>(list (double 1) (double 2) (double 3) (double 4))
(2 4 6 8)

On the other hand, quote is a special form. It always takes exactly takes one argument, which is not evaluated at all---it's just a textual representation of a data structure.

Scheme>(quote (double 1))
(double 1)

What happened here is that quote just returned a data structure, the list (double 1). It did not try to interpret it as an expression and give its value.

(The first item in the list is the symbol double. A symbol is just another kind of data object, roughly like a string, which we'll discuss later. It's not the same thing as a variable, even though it prints like a variable name.)

Quoting is so common that Scheme provides a special bit of syntactic sugar to make it easier. Instead of writing out (quote before an expression, and a closing parenthesis after, you can just use the special character '. Whatever follows should be the textual representation of a data structure, and Scheme constructs that literal data structure.

Scheme>'(1 2 3 4)
(1 2 3 4)

Scheme>'((1 2) (3 4) (5 6))
((1 2) (3 4) (5 6))

Scheme>'(#f #t)
(#f #t)

Notice that you only need one quote character at the beginning of a whole literal--you don't need to separately quote the subparts, and you shouldn't.

Later, I'll talk about quoting things besides lists. Quoted lists are enough for now--we'll use them a lot in examples.

[ Should demonstrate list, length, append, reverse, and member here, combining them in various ways. ]

==================================================================
This is the end of Hunk F.

At this point, you should go back to the previous chapter and
read Hunk G before returning here and continuing this tutorial.
==================================================================

(Go BACK to read Hunk G, which starts at section Type and Equality Predicates (Hunk G).)

Go to the first, previous, next, last section, table of contents.