SXEmacs Internals Manual: The SXEmacs Object System (Abstractly Speaking)

6 The SXEmacs Object System (Abstractly Speaking)

At the heart of the Lisp interpreter is its management of objects. SXEmacs Lisp contains many built-in objects: some are simple and others can be very complex; some are very common, and others are rarely used or are used only internally. (Since the Lisp allocation system, with its automatic reclamation of unused storage, is so much more convenient than malloc() and free(), the C code makes extensive use of it in its internal operations.)

The basic Lisp objects are

integer: 28 or 31 bits of precision, or 60 or 63 bits on 64-bit machines; the reason for this is described below when the internal Lisp object representation is described.
float: Same precision as a double in C.
cons: A simple container for two Lisp objects, used to implement lists and most other data structures in Lisp.
char: An object representing a single character of text; chars behave like integers in many ways but are logically considered text rather than numbers and have a different read syntax. (The read syntax for a char contains the char itself or some textual encoding of it, rather than the numerical representation of the char; for example, a Japanese Kanji character might be encoded as ‘^[$(B#&^[(B’ using the ISO-2022 encoding standard. This way, if the mapping between chars and integers changes, which is quite possible for Kanji characters and other extended characters, the same character will still be created. Note that some primitives confuse chars and integers. The worst culprit is eq, which makes a special exception and considers a char to be eq to its integer equivalent, even though in no other case are objects of two different types eq. The reason for this monstrosity is compatibility with existing code; the separation of char from integer came fairly recently.)
symbol: An object that contains Lisp objects and is referred to by name; symbols are used to implement variables and named functions and to provide the equivalent of preprocessor constants in C.
vector: A one-dimensional array of Lisp objects providing constant-time access to any of the objects; access to an arbitrary object in a vector is faster than for lists, but the operations that can be done on a vector are more limited.
string: Self-explanatory; behaves much like a vector of chars but has a different read syntax and is stored and manipulated more compactly.
bit-vector: A vector of bits; similar to a string in spirit.
compiled-function: An object containing compiled Lisp code, known as byte code.
subr: A Lisp primitive, i.e. a Lisp-callable function implemented in C.
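
As a rough illustration of the list above, each basic type has a predicate that can be applied to a sample object. This is only a sketch using standard Emacs Lisp predicates; the variable name example-basic-objects is hypothetical, and the char and bit-vector types are left out because their predicates differ between Emacs variants.

```elisp
;; Pair each basic type's predicate with a sample object of that type.
;; `example-basic-objects' is a hypothetical name used only for illustration.
(defvar example-basic-objects
  (list (integerp 17297)                 ; integer
        (floatp 1.983e-4)                ; float
        (consp '(foo . bar))             ; cons
        (symbolp 'foobar)                ; symbol
        (vectorp [1 a 2.5])              ; vector
        (stringp "foobar")               ; string
        (subrp (symbol-function 'car)))) ; subr: `car' is implemented in C

;; A vector is indexed directly with `aref'; a list has to be walked
;; from the front, which is what `nth' does.
(aref [1 a 2.5] 1)   ; the symbol a
(nth 1 '(1 a 2.5))   ; the symbol a
```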

Note that there is no basic “function” type, as in more powerful versions of Lisp (where it’s called a closure). SXEmacs Lisp does not provide the closure semantics implemented by Common Lisp and Scheme. The guts of a function in SXEmacs Lisp are represented in one of four ways: a symbol specifying another function (when one function is an alias for another), a list (whose first element must be the symbol lambda) containing the function’s source code, a compiled-function object, or a subr object. (In other words, given a symbol specifying the name of a function, calling symbol-function to retrieve the contents of the symbol’s function cell will return one of these types of objects.)
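
Three of the four representations can be observed directly with symbol-function. The following is a minimal sketch, assuming only the standard primitives defalias, fset, and symbol-function; the names example-first and example-identity are hypothetical. The compiled-function case is omitted because it needs the byte compiler.

```elisp
;; A subr: `car' is implemented in C.
(subrp (symbol-function 'car))              ; non-nil

;; An alias: the function cell holds another symbol.
(defalias 'example-first 'car)
(symbol-function 'example-first)            ; the symbol car

;; Source code: the function cell holds a list whose first element
;; is the symbol lambda.
(fset 'example-identity '(lambda (x) x))
(car (symbol-function 'example-identity))   ; the symbol lambda

;; All three are callable through their names.
(funcall 'example-first '(1 2))             ; 1
(funcall 'example-identity 42)              ; 42
```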

SXEmacs Lisp also contains numerous specialized objects used to implement the editor:

buffer: Stores text like a string, but is optimized for insertion and deletion and has certain other properties that can be set.
frame: An object with various properties whose displayable representation is a window in window-system parlance.
window: A section of a frame that displays the contents of a buffer; often called a pane in window-system parlance.
window-configuration: An object that represents a saved configuration of windows in a frame.
device: An object representing a screen on which frames can be displayed; equivalent to a display in the X Window System and a TTY in character mode.
face: An object specifying the appearance of text or graphics; it has properties such as font, foreground color, and background color.
marker: An object that refers to a particular position in a buffer and moves around as text is inserted and deleted to stay in the same relative position to the text around it.
extent: Similar to a marker but covers a range of text in a buffer; can also specify properties of the text, such as a face in which the text is to be displayed, whether the text is invisible or unmodifiable, etc.
event: Generated by calling next-event and contains information describing a particular event happening in the system, such as the user pressing a key or a process terminating.
keymap: An object that maps from events (described using lists, vectors, and symbols rather than with an event object because the mapping is for classes of events, rather than individual events) to functions to execute or other events to recursively look up; the functions are described by name, using a symbol, or using lists to specify the function’s code.
glyph: An object that describes the appearance of an image (e.g. pixmap) on the screen; glyphs can be attached to the beginning or end of extents and in some future version of SXEmacs will be able to be inserted directly into a buffer.
process: An object that describes a connection to an externally-running process.
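
The marker entry above describes a position that follows the surrounding text. A small sketch of that behavior, assuming the standard functions with-temp-buffer, copy-marker, and marker-position (the function name example-marker-shift is hypothetical):

```elisp
(defun example-marker-shift ()
  "Return the position of a marker after text is inserted before it."
  (with-temp-buffer
    (insert "hello world")
    ;; Position 7 is just before the \"w\" of \"world\".
    (let ((m (copy-marker 7)))
      (goto-char (point-min))
      ;; Insert four characters at the start of the buffer.
      (insert "oh, ")
      ;; The marker has moved with the text it was attached to.
      (marker-position m))))

(example-marker-shift)   ; 11, i.e. 7 plus the 4 inserted characters
```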

There are some other, less-commonly-encountered general objects:

hash-table: An object that maps from an arbitrary Lisp object to another arbitrary Lisp object, using hashing for fast lookup.
obarray: A limited form of hash-table that maps from strings to symbols; obarrays are used to look up a symbol given its name and are not actually their own object type but are kludgily represented using vectors with hidden fields (this representation derives from GNU Emacs).
specifier: A complex object used to specify the value of a display property; a default value is given and different values can be specified for particular frames, buffers, windows, devices, or classes of device.
char-table: An object that maps from chars or classes of chars to arbitrary Lisp objects; internally char tables use a complex nested-vector representation that is optimized to the way characters are represented as integers.
range-table: An object that maps from ranges of integers to arbitrary Lisp objects.
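
Of these, the hash table is the one most often used directly from Lisp. The sketch below assumes the keyword form of make-hash-table together with puthash and gethash; the variable name example-table and the stored values are made up for illustration.

```elisp
;; A table that compares keys with `equal', so that strings and lists
;; with the same contents count as the same key.
(defvar example-table (make-hash-table :test 'equal))

;; Keys and values can be arbitrary Lisp objects.
(puthash "foobar" 'a-symbol example-table)
(puthash '(1 a 2.5) [some vector] example-table)

(gethash "foobar" example-table)            ; the symbol a-symbol
(gethash '(1 a 2.5) example-table)          ; the vector [some vector]
(gethash 'missing example-table 'default)   ; the symbol default
```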

And some strange special-purpose objects:

charset
coding-system: Objects used when MULE, or multi-lingual/Asian-language, support is enabled.
color-instance
font-instance
image-instance: An object that encapsulates a window-system resource; instances are mostly used internally but are exposed on the Lisp level for cleanness of the specifier model and because it’s occasionally useful for a Lisp program to create or query the properties of instances.
subwindow: An object that encapsulates a subwindow resource, i.e. a window-system child window that is drawn into by an external process; this object should be integrated into the glyph system but isn’t yet, and may change form when this is done.
toolbar-button: An object used in conjunction with the toolbar.

And objects that are only used internally:

opaque: A generic object for encapsulating arbitrary memory; this allows you the generality of malloc() and the convenience of the Lisp object system.
lstream: A buffering I/O stream, used to provide a unified interface to anything that can accept output or provide input, such as a file descriptor, a stdio stream, a chunk of memory, a Lisp buffer, a Lisp string, etc.; it’s a Lisp object to make its memory management more convenient.
char-table-entry: Subsidiary objects in the internal char-table representation.
extent-auxiliary
menubar-data
toolbar-data: Various special-purpose objects that are basically just used to encapsulate memory for particular subsystems, similar to the more general “opaque” object.
symbol-value-forward
symbol-value-buffer-local
symbol-value-varalias
symbol-value-lisp-magic: Special internal-only objects that are placed in the value cell of a symbol to indicate that there is something special with this variable – e.g. it has no value, it mirrors another variable, or it mirrors some C variable; there is really only one kind of object, called a symbol-value-magic, but it is sort-of halfway kludged into semi-different object types.

Some types of objects are permanent, meaning that once created, they do not disappear until explicitly destroyed, using a function such as kill-buffer, delete-window, delete-frame, etc. Others will disappear once they are no longer used, through the garbage collection mechanism. Buffers, frames, windows, devices, and processes are among the objects that are permanent. Some objects can go both ways: faces can be created either way; extents are normally permanent, but detached extents (extents not referring to any text, as happens to some extents when the text they are referring to is deleted) are temporary. Some permanent objects, such as faces and coding systems, cannot be deleted. Windows are unique in that they can be undeleted after having previously been deleted. (This happens as a result of restoring a window configuration.)
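
The lifetime of a permanent object can be observed from Lisp. The sketch below uses a buffer and assumes the standard functions generate-new-buffer, buffer-live-p, and kill-buffer (the usual Lisp-level way to destroy a buffer); the function name and buffer name are hypothetical.

```elisp
(defun example-buffer-lifetime ()
  "Return a list (LIVE-BEFORE LIVE-AFTER) for a buffer created and then killed."
  (let* ((buf (generate-new-buffer "example-scratch"))
         ;; The buffer exists until it is explicitly destroyed.
         (live-before (buffer-live-p buf)))
    (kill-buffer buf)
    ;; The Lisp object is still reachable through `buf', but it no
    ;; longer refers to a live buffer.
    (list live-before (buffer-live-p buf))))

(example-buffer-lifetime)   ; (t nil)
```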

Note that many types of objects have a read syntax, i.e. a way of specifying an object of that type in Lisp code. When you load a Lisp file, or type in code to be evaluated, what really happens is that the function read is called, which reads some text and creates an object based on the syntax of that text; then eval is called, which possibly does something special; then this loop repeats until there’s no more text to read. (eval only actually does something special with symbols, which causes the symbol’s value to be returned, similar to referencing a variable; and with conses [i.e. lists], which cause a function invocation. All other values are returned unchanged.)
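
The two steps can be tried separately. This is a minimal sketch that uses read on a string and then eval on the result; example-read-eval and example-answer are hypothetical names.

```elisp
(defvar example-answer 42)

(defun example-read-eval (text)
  "Read one Lisp object from the string TEXT, then evaluate it."
  (eval (read text)))

;; `read' alone only builds the object described by the text.
(read "(+ 1 2)")                      ; the three-element list (+ 1 2)

;; A cons causes a function invocation.
(example-read-eval "(+ 1 2)")         ; 3
;; A symbol evaluates to its value as a variable.
(example-read-eval "example-answer")  ; 42
;; Everything else is returned unchanged.
(example-read-eval "17297")           ; 17297
(example-read-eval "\"foobar\"")      ; "foobar"
```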

The read syntax

17297

converts to an integer whose value is 17297.

1.983e-4

converts to a float whose value is 1.983e-4, or .0001983.

?b

converts to a char that represents the lowercase letter b.

?^[$(B#&^[(B

(where ‘^[’ actually is an ‘ESC’ character) converts to a particular Kanji character when using an ISO2022-based coding system for input. (To decode this goo: ‘ESC’ begins an escape sequence; ‘ESC $ (’ is a class of escape sequences meaning “switch to a 94x94 character set”; ‘ESC $ ( B’ means “switch to Japanese Kanji”; ‘#’ and ‘&’ collectively index into a 94-by-94 array of characters [subtract 33 from the ASCII value of each character to get the corresponding index]; ‘ESC (’ is a class of escape sequences meaning “switch to a 94 character set”; ‘ESC (B’ means “switch to US ASCII”. It is a coincidence that the letter ‘B’ is used to denote both Japanese Kanji and US ASCII. If the first ‘B’ were replaced with an ‘A’, you’d be requesting a Chinese Hanzi character from the GB2312 character set.)

"foobar"

converts to a string.

foobar

converts to a symbol whose name is "foobar". This is done by looking up the string equivalent in the global variable obarray, whose contents should be an obarray. If no symbol is found, a new symbol with the name "foobar" is automatically created and added to obarray; this process is called interning the symbol.
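
Interning can also be done explicitly. The sketch below assumes the standard functions intern-soft (look up without creating), intern (look up or create), and make-symbol (create without interning); the function name and the symbol names are hypothetical.

```elisp
(defun example-intern-twice (name)
  "Return a list (BEFORE FIRST SECOND) describing how NAME is interned.
BEFORE is the result of looking NAME up without creating a symbol."
  (let* ((before (intern-soft name))
         (first (intern name))     ; creates the symbol if necessary
         (second (intern name)))   ; finds the same symbol again
    (list before first second)))

;; For a name that has never been used, BEFORE is nil, and FIRST and
;; SECOND are the very same symbol object.
(example-intern-twice "example-never-seen-symbol-1")

;; `make-symbol' creates a symbol that is not added to the obarray, so
;; it is distinct from the interned symbol of the same name.
(eq (make-symbol "foobar") 'foobar)   ; nil
```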

(foo . bar)

converts to a cons cell containing the symbols foo and bar.

(1 a 2.5)

converts to a three-element list containing the specified objects (note that a list is actually a set of nested conses; see the SXEmacs Lisp Reference).
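
The nesting can be made explicit by building the same list by hand with cons. This is a small sketch using only cons, car, cdr, and equal; example-list is a hypothetical name.

```elisp
;; Each cons holds one element in its car and the rest of the list in
;; its cdr; the final cdr is nil.
(defvar example-list (cons 1 (cons 'a (cons 2.5 nil))))

(equal example-list '(1 a 2.5))   ; t
(cdr '(1 a 2.5))                  ; the list (a 2.5)

;; The dotted syntax writes a single cons directly.
(car '(foo . bar))                ; the symbol foo
(cdr '(foo . bar))                ; the symbol bar
```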

[1 a 2.5]

converts to a three-element vector containing the specified objects.

#[... ... ... ...]

converts to a compiled-function object (the actual contents are not shown since they are not relevant here; look at a file that ends with .elc for examples).

#*01110110

converts to a bit-vector.

#s(hash-table ... ...)

converts to a hash table (the actual contents are not shown).

#s(range-table ... ...)

converts to a range table (the actual contents are not shown).

#s(char-table ... ...)

converts to a char table (the actual contents are not shown).

Note that the #s() syntax is the general syntax for structures, which are not really implemented in SXEmacs Lisp but should be.

When an object is printed out (using print or a related function), the read syntax is used, so that the same object can be read in again.
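
This round trip can be checked for the simpler types. The sketch below assumes prin1-to-string for printing and read for reading back; example-round-trip is a hypothetical name.

```elisp
(defun example-round-trip (object)
  "Print OBJECT to a string using its read syntax, then read it back."
  (read (prin1-to-string object)))

;; Lists, vectors, and strings come back as new objects with the same
;; contents, so they are compared with `equal'.
(equal (example-round-trip '(1 a 2.5)) '(1 a 2.5))   ; t
(equal (example-round-trip [1 a 2.5]) [1 a 2.5])     ; t
(equal (example-round-trip "foobar") "foobar")       ; t

;; An interned symbol comes back as the very same object.
(eq (example-round-trip 'foobar) 'foobar)            ; t
```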

The other objects do not have read syntaxes, usually because it does not really make sense to create them in this fashion (e.g. processes, where it doesn’t make sense to have a subprocess created as a side effect of reading some Lisp code), or because they can’t be created at all (e.g. subrs). Permanent objects, as a rule, do not have a read syntax; nor do most complex objects, which contain too much state to be easily initialized through a read syntax.
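
Such objects can still be printed, but the printed form cannot be read back. A minimal sketch, assuming that printing a buffer yields text beginning with ‘#<’ and that read signals an error on it; example-readable-p is a hypothetical helper.

```elisp
(defun example-readable-p (text)
  "Return t if the string TEXT can be read as a Lisp object, else nil."
  (condition-case nil
      (progn (read text) t)
    (error nil)))

;; A buffer prints as an unreadable form beginning with "#<".
(prin1-to-string (current-buffer))

;; Reading that text back signals an error instead of creating a buffer.
(example-readable-p (prin1-to-string (current-buffer)))   ; nil

;; A vector, by contrast, has a read syntax.
(example-readable-p "[1 a 2.5]")                          ; t
```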