SXEmacs Internals Manual: How Lisp Objects Are Represented in C

Lisp objects are represented in C using a 32-bit or 64-bit machine word, depending on the processor (e.g. DEC Alphas use 64-bit Lisp objects, while most other processors use 32-bit Lisp objects). The representation stuffs a pointer together with a tag, as follows:

 [ 3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ]
 [ 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 ]

   <---------------------------------------------------------> <->
            a pointer to a structure, or an integer            tag

A tag of 00 is used for all pointer object types, a tag of 10 is used for characters, and the other two tags, 01 and 11, are joined together to form the integer object type, so that the lowest bit alone identifies an integer. On a 32-bit word this representation gives us 31-bit integers and 30-bit characters, while pointers are represented directly without any bit masking or shifting. The representation does assume, though, that pointers to structs are always aligned to multiples of 4, so that the lower 2 bits are always zero.
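
The tag tests described above can be sketched in a few lines of C. This is only an illustration of the bit layout, not the actual SXEmacs definitions: the type lisp_word and the word_is_*() and word_from_pointer() helpers are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Lisp_Object: one machine word. */
typedef intptr_t lisp_word;

/* Tags 01 and 11 both mean "integer", so only the low bit is tested. */
static int word_is_int(lisp_word w)     { return (w & 1) != 0; }

/* Tag 10 in the low two bits marks a character. */
static int word_is_char(lisp_word w)    { return (w & 3) == 2; }

/* Tag 00 marks a pointer, which is stored unmodified. */
static int word_is_pointer(lisp_word w) { return (w & 3) == 0; }

/* Wrapping a pointer needs no masking or shifting, but it relies on
   the pointed-to struct being aligned to a multiple of 4. */
static lisp_word word_from_pointer(void *p)
{
    lisp_word w = (lisp_word)p;
    assert((w & 3) == 0);
    return w;
}
```

Because the pointer case is the all-zero tag, a pointer object can be dereferenced after a plain cast, which is the property the text above describes.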

Lisp objects use the typedef Lisp_Object, but the actual C type used for the Lisp object can vary. It is a simple type (long on the DEC Alpha, int on other machines).

Various macros are used to convert between Lisp_Objects and the corresponding C type. Macros of the form XINT(), XCHAR(), XSTRING() and XSYMBOL() do any required bit shifting and/or masking and cast the result to the appropriate type. XINT() needs to be a bit tricky so that negative numbers are properly sign-extended. Since integers are stored left-shifted, if the right-shift operator does an arithmetic shift (i.e. it leaves the most-significant bit as-is rather than shifting in a zero, so that it mimics a divide-by-two even for negative numbers), the shift to remove the tag bit is enough. This is the case on all the systems we support.
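
The sign-extension argument can be checked with a small sketch. The names lisp_word, sketch_make_int() and sketch_xint() are hypothetical and are not the real macros; the code also assumes, as the text does, that the compiler implements right shift of a negative signed value as an arithmetic shift (the C standard leaves this implementation-defined).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Lisp_Object: one machine word. */
typedef intptr_t lisp_word;

/* Store an integer: shift it left by one bit and set the low tag bit.
   One bit of range is lost, so the value must fit in the word size
   minus one bit. The shift is done on the unsigned type to avoid
   left-shifting a negative value. */
static lisp_word sketch_make_int(intptr_t n)
{
    assert(n >= INTPTR_MIN / 2 && n <= INTPTR_MAX / 2);
    return (lisp_word)(((uintptr_t)n << 1) | 1u);
}

/* Recover the integer: an arithmetic right shift drops the tag bit and
   copies the sign bit back in, so negative numbers survive intact. */
static intptr_t sketch_xint(lisp_word w)
{
    return w >> 1;
}
```

With a logical shift instead, sketch_xint() would return a large positive number for any negative input, which is why the real XINT() has to be careful on such systems.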

Note that when ERROR_CHECK_TYPECHECK is defined, the converter macros become more complicated: they check the tag bits and/or the type field in the first four bytes of a record type to ensure that the object is really of the correct type. This is great for catching places where an object of the wrong type is being converted. Without the check, such a mistake typically results in a pointer being dereferenced as the wrong type of structure, with unpredictable (and sometimes not easily traceable) results.
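
A checked converter of this kind might look roughly as follows. Everything here is an assumption made for illustration: the SKETCH_TYPECHECK switch stands in for ERROR_CHECK_TYPECHECK, and the record layout, the sketch_* names and the SKETCH_XSTRING() macro are hypothetical rather than taken from the SXEmacs sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Lisp_Object: one machine word. */
typedef intptr_t lisp_word;

enum sketch_type { SKETCH_STRING, SKETCH_SYMBOL };

/* Every record starts with a header whose first field names its type. */
struct sketch_header { enum sketch_type type; };
struct sketch_string { struct sketch_header header; const char *data; };

/* Check the tag bits first, then the type field inside the record. */
static int sketch_is_string(lisp_word w)
{
    return (w & 3) == 0 && w != 0
        && ((struct sketch_header *)w)->type == SKETCH_STRING;
}

#ifdef SKETCH_TYPECHECK
/* Checking build: verify the type before handing out the pointer. */
#define SKETCH_XSTRING(w) \
    (assert(sketch_is_string(w)), (struct sketch_string *)(w))
#else
/* Normal build: a plain cast with no run-time cost. */
#define SKETCH_XSTRING(w) ((struct sketch_string *)(w))
#endif
```

In the checking build a wrong-type object stops the program at the conversion itself, rather than at some later, unrelated-looking memory access.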

There are similar XSETTYPE() macros that construct a Lisp object. These macros are of the form XSETTYPE (lvalue, result), i.e. they have to be a statement rather than just used in an expression. The reason for this is that standard C doesn’t let you “construct” a structure (but GCC does). Granted, this sometimes isn’t too convenient; for the case of integers, at least, you can use the function make_int(), which constructs and returns an integer Lisp object. Note that the XSETTYPE() macros are also affected by ERROR_CHECK_TYPECHECK and make sure that the structure is of the right type in the case of record types, where the type is contained in the structure.
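
The difference between the statement form and the function form can be sketched as below. The sketch assumes, as the remark about constructing a structure suggests, a build in which the Lisp object is wrapped in a struct rather than being a bare integer; sketch_obj, SKETCH_XSETINT(), sketch_make_int() and sketch_xint() are hypothetical names, not the real definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Lisp object wrapped in a struct. */
typedef struct { intptr_t bits; } sketch_obj;

/* Statement-style setter: it assigns into an existing lvalue, so it
   cannot appear in the middle of an expression. */
#define SKETCH_XSETINT(lvalue, n) \
    do { (lvalue).bits = (intptr_t)(((uintptr_t)(n) << 1) | 1u); } while (0)

/* Function form: builds the object in a local and returns it by value,
   which makes it usable inside expressions. */
static sketch_obj sketch_make_int(intptr_t n)
{
    sketch_obj obj;
    assert(n >= INTPTR_MIN / 2 && n <= INTPTR_MAX / 2);
    SKETCH_XSETINT(obj, n);
    return obj;
}

/* Extract the integer again (assumes an arithmetic right shift). */
static intptr_t sketch_xint(sketch_obj obj)
{
    return obj.bits >> 1;
}
```

A caller would write SKETCH_XSETINT (obj, 10); on a line of its own, but could pass sketch_make_int (10) directly as a function argument.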

The C programmer is responsible for guaranteeing that a Lisp_Object is the correct type before using the XTYPE macros. This is especially important in the case of lists. Use XCAR and XCDR if a Lisp_Object is certainly a cons cell, else use Fcar() and Fcdr(). Trust other C code, but not Lisp code. On the other hand, if SXEmacs has an internal logic error, it’s better to crash immediately, so sprinkle assert()s and “unreachable” abort()s liberally about the source code. Where performance is an issue, use type_checking_assert, bufpos_checking_assert, and gc_checking_assert, which do nothing unless the corresponding configure error checking flag was specified.
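
The division of labour between unchecked accessors, checked functions and optional assertions can be illustrated with a simplified model. All names below are hypothetical: sketch_xcar() and sketch_fcar() only mirror the roles of XCAR and Fcar(), SKETCH_ERROR_CHECK_TYPES stands in for the configure error checking flag, and returning NULL replaces whatever error reporting the real Fcar() performs.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_type { SKETCH_CONS, SKETCH_OTHER };

struct sketch_obj {
    enum sketch_type type;
    struct sketch_obj *car, *cdr;
};

/* Does nothing unless the (hypothetical) checking flag is defined,
   in the spirit of type_checking_assert. */
#ifdef SKETCH_ERROR_CHECK_TYPES
#define sketch_type_checking_assert(cond) assert(cond)
#else
#define sketch_type_checking_assert(cond) ((void)0)
#endif

/* Unchecked accessor: the caller guarantees that o is a cons cell. */
static struct sketch_obj *sketch_xcar(struct sketch_obj *o)
{
    sketch_type_checking_assert(o != NULL && o->type == SKETCH_CONS);
    return o->car;
}

/* Checked accessor for values that may come from Lisp code: validate
   first, and only then use the fast unchecked path. */
static struct sketch_obj *sketch_fcar(struct sketch_obj *o)
{
    if (o == NULL || o->type != SKETCH_CONS)
        return NULL;
    return sketch_xcar(o);
}
```

Code that has just created or type-tested an object can use the unchecked accessor safely; anything handed in from Lisp goes through the checked one.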
