Garbage Collection

Garbage collection is a pretty critical decision when building a virtual machine. It determines a bunch of other decisions such as object layout in memory. For a good overview of different garbage collection techniques checkout wikipedia. To see some of the other issues that you face when actually building a GC, there is a really good FAQ on the I.E.C.C GC List FAQ.

One of the central issues that a garbage collector has to figure out is how to distinguish between what is a pointer and an integer? Given any value, is it a real reference or just a really big number? It's difficult to tell. One method to tell if a value is a reference is to check if the pointer resides in an area of allocated memory. If so, we just assume that this value is really a pointer, but there is no real proof that it isn't an actual integer. Another method is to use some of the bits in all values as a type tag. This is what SquirrelFish does.

The lowest 2 bits in a 32 bit field are for the type tag. If the two least significant bits are 0, the next 30 bits refer to a pointer address. If the least significant bit is a 1, the next 31 bits are used as a signed integer. For example, consider the signed integer 4.

Normal binary would be:
0000 0100

SquirrelFish internal representation:
0000 1001

There are other tags in SquirrelFish as well. The implementation details as well as the other comments are in JSImmediate.h..

[ high 30 bits: pointer address ] [ low 2 bits -- always 0 ]

[ high 31 bits: signed integer ] [ lowest bit -- always 1 ]

000000000000000000000000000 01 10
[ boolean value ] [ bool ] [ tag 'other' ]

000000000000000000000000000 10 10
[ zero ] [ undefined ] [ tag 'other' ]

000000000000000000000000000 00 10
[ zero ] [ zero ] [ tag 'other' ]

* Tip from Aaron. Pointers are guaranteed to point to aligned word boundaries.