[Scheme] Values are always pointers to objects

Values

You can define a variable in Scheme using a define:

(define my-variable 5)

This tells Scheme to allocate space for my-variable, and initialize that storage with the value 5.In Scheme, you always give a variable an initial value, so there's no such thing as an uninitialized variable or an unininitialized variable error.

Scheme values are always pointers to objects, so when we use the literal 5, Scheme interprets that as meaning a pointer to the object 5. Numbers are objects you can have pointers to, just like any other kind of data structure. (Actually, most Scheme implementations use a couple of tricks to avoid pointer overheads on numbers, but that doesn't show up at the language level. You don't have to be aware of it.)

After the above definition, we can draw the resulting situation like this:

    +-------+ 
foo |   *---+--->5 
    +-------+

In the picture, the box represents the fact that Scheme has allocated storage for a variable. The name foo beside the box means that we've given that storage the name foo. The arrow says that the value in the box is a pointer to the integer object 5. (Don't worry about how the integer object is actually represented. It doesn't really matter.)

Pointer semantics

As I said earlier, all values are conceptually pointers to objects on a heap, and you don't ever have to explicitly free memory.

By "object," I don't necessarily mean object in the object-oriented sense. I just mean data objects like Pascal records or C structs, which can be referenced via pointers and may (or may not) hold state information.

Conceptually, all Scheme objects are allocated on the heap, and referred to via pointers. This actually makes life simple, because you don't have to worry about whether you should dereference a pointer when you want to use a value--you always do. Since pointer dereferencing is uniform, procedures always dereference a pointer to a value when they really use the value, and you never have to explicitly force the dereferencing.

For example, the predefined Scheme procedure + takes two pointers to numbers, and automatically dereferences both pointers before doing the addition. It returns a pointer to the number that's the result of the addition. So when we evaluate the expression (+ 2 3) to add two to three, we are taking a pointer to the integer 2 and a pointer to integer 3, and passing those as arguments to the procedure +. + returns a pointer to the integer 5.

When you think about it, it doesn't make any sense to change the value of an integer, in a mathematical sense. For example, what would it mean to change the integer 6's value to be 7? It wouldn't mean anything sensible, for sure. 6 is a unique, abstract mathematical object that doesn't have any state that can be changed---6 is 6, and behaves like 6, forever.

What's going on in conventional programming languages is not really changing the value of an integer-- it's replacing one (copy of an) integer value with (a copy of) another. That's because most programming languages have both pointer semantics (for pointer variables) and value semantics (for nonpointer variables, like integers). You make multiple copies of values, and then clobber the copies when you perform an assignment.

In Scheme, we don't need to clobber the value of an integer, because we get the effect we want by replacing pointers with other pointers. An integer in Scheme is a unique entity, just as it is in mathematics. We don't have multiple copies of a particular number, just multiple references to it. (Actually, Scheme's treatment of numbers is not quite this simple and pretty, for efficiency reasons I'll explain later, but it's close.)

As we'll see later, an implementation is free to optimize away these pointers if it doesn't affect the programmer's view of things--but when you're trying to understand a program, you should always think of values as pointers to objects.

It is sometimes said that languages like Scheme (and Lisp, Smalltalk, Eiffel, and Java) "don't have pointers." It's at least as reasonable to say that the opposite is true--everything's a pointer. What they don't have is a distinction between pointers and nonpointers that you have to worry about.

The Implementation and Optimization

You might think that making every value a pointer to an object would be expensive, because you'd have to have space for all of the pointers as well as the things they point to, and you'd have to use extra instructions to access things via pointers.

Everything's a pointer at the language level--i.e., from the programmer's point of view--but a Scheme system doesn't actually have to represent things the way they appear at the languages level.

Most Scheme implementations optimize away a lot of pointers. For example, it's inefficient to actually represent integer values as pointers to integer objects on the heap. Scheme implementations therefore use tricks to represent integers without really using pointers. (Again, keep in mind that this is just an implementation trick that's hidden from the programmer. Integer values have the semantics of pointers, even if they're represented differently from other things.)

Rather than putting integer values on the heap, and then passing around pointers to them, most implementations put the actual integer bit pattern directly into variables--after all, a reasonable-sized integer will fit in a machine word.

A short value (like a normal integer) stored directly into a variable is called an immediate value, in contrast to pointers which are used to refer to objects indirectly.

The problem with putting integers or other short values into variables is that Scheme has to tell them apart from each other, and from pointers which might have the same bit patterns. The solution to this is tagging. The value in each variable actually has a few bits devoted to a type tag which says what kind of thing it is--e.g., whether it's a pointer or not. The use of a few bits for a tag slightly reduces the amount of storage available for the actual value, but as we'll see next, that usually isn't a problem.

It might seem that storing integer bit patterns directly in variables would break the abstraction that Scheme is supposed to present--the illusion that all values are pointers to objects on the heap. That's not so, though, because the language enforces restrictions that keep programmers from seeing the difference.

Objects on the Heap

Most Scheme objects only have fields that are general-purpose value cells---any field can hold any Scheme value, whether it's a tagged immediate value or a tagged pointer to another heap-allocated object. (Of course, conceptually they're all pointers, so the type of a field is just "pointer to anything.")

So, for example, a pair (also known in Lisp terminology as a "cons cell") is a heap-allocated object with two fields. Either field can hold any kind of value, such as a number, a text character, a boolean, or a pointer to another heap object.

The first field of a pair is called the car field, and the second field is called the cdr field. These are among the dumbest names for anything in all of computer science. (They are just a historical artifact of the first Lisp implementation and the machine it ran on.)

Pairs can be created using the procedure cons. For example, to create a pair with the number 22 as the value of its car field, and the number 15 as the value of its cdr field, you can write the procedure call (cons 22 15).

The fields of a pair are like variable bindings, in that they can hold any kind of Scheme value. Both bindings and fields are called value cells---i.e., they're places you can put any kind of value.

In most implementations, each heap-allocated object has a hidden "header" field that you, as a Scheme programmer, are not supposed to know about. This extra field holds type information, saying exactly what kind of heap allocated object it is. So, laid out in memory, the pair looks something like this:

      +-----------+ 
header|  | 
      +===========+ 
   car|     *-----+----->22
      +-----------+ 
   cdr|     *-----+----->15 
      +-----------+

In this case, the car field of the pair (cons cell) holds the integer 22, and the cdr field holds the integer 15. The values stored in the fields of the pair are drawn as arrows, because they are pointers to the numbers 22 and 15.

(The actual representation of these values might be a 30-bit binary number with a two-bit tag field used to distinguish integers from real pointers, but you don't have to worry about that.)

Suppose we have a top-level variable binding for the variable foo, and its value is a pointer to the above pair. We would draw that situation something like this:

                             +---------+ 
    +---------+        header|   | 
foo |    *----+------------->+=========+ 
    +---------+           car|    *----+---->22 
                             +---------+ 
                          cdr|    *----+---->15 
                             +---------+

Most other objects in Scheme are represented similarly. For example, a vector (one-dimensional array) is typically represented as a linear array of value cells, which can hold any kind of value.

Even objects that aren't actually represented like this can be thought of this way, since conceptually, everything's on the heap and referred to via a pointer.

Reference

An Introduction to Scheme and its Implementation