SXEmacs Internals Manual: lrecords

11.7 lrecords

[see lrecord.h]

All lrecords have at the beginning of their structure a struct lrecord_header. This just contains a type number and some flags, including the mark bit. All builtin type numbers are defined as constants in enum lrecord_type, to allow the compiler to generate more efficient code for typeP. The type number, through the lrecord_implementations_table, gives access to a struct lrecord_implementation, which is a structure containing method pointers and such. There is one of these for each type, and it is a global, constant, statically-declared structure that is declared in the DEFINE_LRECORD_IMPLEMENTATION() macro.
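To make the lookup path concrete, here is a minimal standalone sketch of a header, an implementation table, and a type predicate. Every demo_* name is invented for illustration; the real declarations are in lrecord.h and differ in detail (the field widths, the set of flags, and the single printer method shown here are assumptions).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type numbers; the real ones are in enum lrecord_type. */
enum demo_lrecord_type {
  demo_type_float,
  demo_type_extent,
  demo_type_count
};

/* Model of the header: a type number plus flags, including the mark bit. */
struct demo_lrecord_header {
  unsigned int type : 8;
  unsigned int mark : 1;
};

/* Model of the per-type structure holding a name and method pointers. */
struct demo_lrecord_implementation {
  const char *name;
  void (*printer)(const struct demo_lrecord_header *);  /* may be NULL */
};

/* One global, constant, statically-declared structure per type. */
static const struct demo_lrecord_implementation demo_impl_float = { "float", NULL };
static const struct demo_lrecord_implementation demo_impl_extent = { "extent", NULL };

/* Table indexed by type number. */
static const struct demo_lrecord_implementation *
demo_implementations_table[demo_type_count] = {
  &demo_impl_float,
  &demo_impl_extent
};

/* Go from a header to its implementation structure via the type number. */
static const struct demo_lrecord_implementation *
demo_header_implementation(const struct demo_lrecord_header *h)
{
  assert(h->type < demo_type_count);
  return demo_implementations_table[h->type];
}

/* Because the type number is a compile-time constant, a type predicate
   reduces to a single integer comparison. */
static int demo_floatp(const struct demo_lrecord_header *h)
{
  return h->type == demo_type_float;
}
```

The point of the design is that the object itself carries only a small integer; everything type-specific is reached through the table.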

Simple lrecords (of type (b) above) just have a struct lrecord_header at their beginning. lcrecords, however, actually have a struct lcrecord_header. This, in turn, has a struct lrecord_header at its beginning, so sanity is preserved; but it also has a pointer used to chain all lcrecords together, and a special ID field used to distinguish one lcrecord from another. (This field is used only for debugging and could be removed, but the space gain is not significant.)

Simple lrecords are created using ALLOCATE_FIXED_TYPE(), just like for other frob blocks. The only change is that the implementation pointer must be initialized correctly. (The implementation structure for an lrecord, or rather the pointer to it, is named lrecord_float, lrecord_extent, lrecord_buffer, etc.)

lcrecords are created using alloc_lcrecord(). This takes a size to allocate and an implementation pointer. (The size needs to be passed because some lcrecords, such as window configurations, are of variable size.) This basically just malloc()s the storage, initializes the struct lcrecord_header, and chains the lcrecord onto the head of the list of all lcrecords, which is stored in the variable all_lcrecords. The calls to alloc_lcrecord() generally occur in the lowest-level allocation function for each lrecord type.
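The allocation steps described above can be sketched roughly as follows. This is a simplified standalone model, not the actual alloc_lcrecord(): all demo_* names, the field layout, the uid counter, and the type number 1 are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for struct lrecord_header: a type number and the mark bit. */
struct demo_lrecord_header {
  unsigned int type : 8;
  unsigned int mark : 1;
};

/* Stand-in for struct lcrecord_header: it begins with an lrecord header,
   then adds the chain pointer and a debugging ID. */
struct demo_lcrecord_header {
  struct demo_lrecord_header lheader;
  struct demo_lcrecord_header *next;
  unsigned int uid;
};

/* Stand-in for an implementation structure; only the type number is used. */
struct demo_implementation {
  const char *name;
  unsigned int type_index;
};

/* Head of the chain of every lcrecord allocated so far. */
static struct demo_lcrecord_header *demo_all_lcrecords = NULL;
static unsigned int demo_next_uid = 0;

/* malloc() the storage, initialize the header, and push the new object
   onto the head of the chain.  Returns NULL if malloc() fails. */
static void *demo_alloc_lcrecord(size_t size, const struct demo_implementation *imp)
{
  struct demo_lcrecord_header *h;

  assert(size >= sizeof *h);
  h = malloc(size);
  if (h == NULL)
    return NULL;
  h->lheader.type = imp->type_index;
  h->lheader.mark = 0;
  h->uid = demo_next_uid++;
  h->next = demo_all_lcrecords;
  demo_all_lcrecords = h;
  return h;
}

/* A variable-size object: only the caller knows how much storage it
   needs, which is why the allocator takes a size argument. */
struct demo_window_config {
  struct demo_lcrecord_header header;
  size_t n_saved;
  int saved[];  /* n_saved elements follow the fixed part */
};

static const struct demo_implementation demo_impl_window_config =
  { "window-configuration", 1 };

/* Lowest-level allocation function for this type. */
static struct demo_window_config *demo_make_window_config(size_t n_saved)
{
  size_t size = sizeof(struct demo_window_config) + n_saved * sizeof(int);
  struct demo_window_config *c = demo_alloc_lcrecord(size, &demo_impl_window_config);

  if (c != NULL)
    c->n_saved = n_saved;
  return c;
}
```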

Whenever you create a new lrecord type, you need to invoke either DEFINE_LRECORD_IMPLEMENTATION() or DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION() at the top level of a .c file. This defines and initializes the implementation structure for the lrecord type. (It may also declare a function error_check_foo() that implements the XFOO() macro when error-checking is enabled.) The arguments to the macros are the actual type name (this is used to construct the C variable name of the lrecord implementation structure and related structures using the ‘##’ macro concatenation operator), a string that names the type on the Lisp level (this may not be the same as the C type name; typically, the C type name has underscores, while the Lisp string has dashes), various method pointers, and the name of the C structure that contains the object. The methods are used to encapsulate type-specific information about the object, such as how to print it or mark it for garbage collection, so that it’s easy to add new object types without having to add a specific case for each new type in a bunch of different places.

The difference between DEFINE_LRECORD_IMPLEMENTATION() and DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION() is that the former is used for fixed-size object types and the latter is for variable-size object types. Most object types are fixed-size; some complex types, however (e.g. window configurations), are variable-size. Variable-size object types have an extra method, which is called to determine the actual size of a particular object of that type. (Currently this is only used for keeping allocation statistics.)
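One way to picture the fixed-size versus variable-size distinction is an implementation structure that carries either a static size or a size method. The sketch below is a standalone model under that assumption; the demo_* names and the field names static_size and size_in_bytes are illustrative and not taken from lrecord.h.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the size-related part of an implementation structure. */
struct demo_implementation {
  const char *name;
  size_t static_size;                          /* used by fixed-size types */
  size_t (*size_in_bytes)(const void *object); /* non-NULL for variable-size types */
};

/* A fixed-size type: every instance has the same size. */
struct demo_marker {
  int position;
};

/* A variable-size type: the payload length differs per instance. */
struct demo_opaque {
  size_t length;
  unsigned char data[];  /* length bytes follow the fixed part */
};

/* The extra method for the variable-size type. */
static size_t demo_opaque_size_in_bytes(const void *object)
{
  const struct demo_opaque *o = object;
  return offsetof(struct demo_opaque, data) + o->length;
}

static const struct demo_implementation demo_impl_marker =
  { "marker", sizeof(struct demo_marker), NULL };
static const struct demo_implementation demo_impl_opaque =
  { "opaque", 0, demo_opaque_size_in_bytes };

/* Size of one particular object: ask the method if there is one,
   otherwise fall back to the static size. */
static size_t demo_object_size(const struct demo_implementation *imp,
                               const void *object)
{
  return imp->size_in_bytes ? imp->size_in_bytes(object) : imp->static_size;
}
```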

For the purpose of keeping allocation statistics, the allocation engine keeps a list of all the different types that exist. Note that, since DEFINE_LRECORD_IMPLEMENTATION() is a macro that is specified at top-level, there is no way for it to initialize the global data structures containing type information, like lrecord_implementations_table. For this reason a call to INIT_LRECORD_IMPLEMENTATION must be added to the same source file containing DEFINE_LRECORD_IMPLEMENTATION, not at the top level but inside one of the init functions, typically syms_of_foo(). INIT_LRECORD_IMPLEMENTATION must be called before an object of this type is used.
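The split between a top-level definition and a run-time registration can be sketched with two toy macros. These are hypothetical stand-ins, not the real DEFINE_LRECORD_IMPLEMENTATION() and INIT_LRECORD_IMPLEMENTATION(): the real macros take more arguments (the method pointers, among others), and the bread_toaster type is invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type numbers. */
enum demo_type {
  demo_type_bread_toaster,
  demo_type_count
};

struct demo_implementation {
  const char *lisp_name;  /* name on the Lisp level, with dashes */
  size_t size;
};

/* Global table of type information; static storage starts out all NULL. */
static const struct demo_implementation *demo_implementations_table[demo_type_count];

/* Used at top level: defines the constant implementation structure.
   The C type name is pasted into the variable name with ##. */
#define DEMO_DEFINE_IMPLEMENTATION(lname, cname, stype)        \
  static const struct demo_implementation demo_impl_##cname =  \
    { lname, sizeof(stype) }

/* Used inside an init function: a top-level definition cannot run code,
   so the table entry has to be filled in here. */
#define DEMO_INIT_IMPLEMENTATION(cname) \
  (demo_implementations_table[demo_type_##cname] = &demo_impl_##cname)

struct demo_bread_toaster {
  int slots;
};

DEMO_DEFINE_IMPLEMENTATION("bread-toaster", bread_toaster, struct demo_bread_toaster);

/* Stand-in for a syms_of_foo() style init function. */
static void demo_syms_of_bread_toaster(void)
{
  DEMO_INIT_IMPLEMENTATION(bread_toaster);
}
```

Until the init function has run, the table entry is still NULL, which is why the registration must happen before any object of the type is used.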

The type number is also used to index into an array holding the number of objects of each type and the total memory allocated for objects of that type. The statistics in this array are computed during the sweep stage. These statistics are returned by the call to garbage-collect.
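As a rough illustration of statistics gathered during a sweep, the following standalone model walks a chain of objects, frees the unmarked ones, and tallies the survivors in an array indexed by type number. The demo_* names and the decision to count only surviving objects are assumptions of this sketch; the actual bookkeeping is in gc_sweep() in alloc.c and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_TYPE_COUNT 3u

struct demo_object {
  unsigned int type;
  unsigned int mark;
  size_t size;
  struct demo_object *next;
};

/* Per-type counters, indexed by type number. */
struct demo_type_stats {
  size_t instances;
  size_t bytes;
};

static struct demo_type_stats demo_stats[DEMO_TYPE_COUNT];

/* Push a new object onto the chain and return the new head. */
static struct demo_object *demo_push(struct demo_object *head, unsigned int type,
                                     size_t size, unsigned int mark)
{
  struct demo_object *obj = malloc(sizeof *obj);

  if (obj == NULL)
    abort();
  assert(type < DEMO_TYPE_COUNT);
  obj->type = type;
  obj->mark = mark;
  obj->size = size;
  obj->next = head;
  return obj;
}

/* Sweep the chain: free unmarked objects, clear the mark bit on the
   survivors, and recompute the per-type statistics.  Returns the new head. */
static struct demo_object *demo_sweep(struct demo_object *head)
{
  struct demo_object **link = &head;
  unsigned int i;

  for (i = 0; i < DEMO_TYPE_COUNT; i++) {
    demo_stats[i].instances = 0;
    demo_stats[i].bytes = 0;
  }
  while (*link != NULL) {
    struct demo_object *obj = *link;

    if (obj->mark) {
      obj->mark = 0;
      demo_stats[obj->type].instances++;
      demo_stats[obj->type].bytes += obj->size;
      link = &obj->next;
    } else {
      *link = obj->next;
      free(obj);
    }
  }
  return head;
}
```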

Note that for every type defined with a DEFINE_LRECORD_*() macro, there needs to be a DECLARE_LRECORD_IMPLEMENTATION() somewhere in a .h file, and this .h file needs to be included by inline.c.

Furthermore, there should generally be a set of XFOOBAR(), FOOBARP(), etc. macros in a .h (or occasionally .c) file. To create one of these, copy an existing model and modify as necessary.

Please note: If you define an lrecord in an external dynamically-loaded module, you must use DECLARE_EXTERNAL_LRECORD, DEFINE_EXTERNAL_LRECORD_IMPLEMENTATION, and DEFINE_EXTERNAL_LRECORD_SEQUENCE_IMPLEMENTATION instead of the non-EXTERNAL forms. These macros will dynamically add new type numbers to the global enum that records them, whereas the non-EXTERNAL forms assume that the programmer has already inserted the correct type numbers into the enum’s code at compile-time.

The various methods in the lrecord implementation structure are:

A mark method. This is called during the marking stage and passed a function pointer (usually the mark_object() function), which is used to mark an object. All Lisp objects that are contained within the object need to be marked by applying this function to them. The mark method should also return a Lisp object, which should be either nil or an object to mark. (This can be used in lieu of calling mark_object() on the object, to reduce the recursion depth, and consequently should be the most heavily nested sub-object, such as a long list.)

Please note: When the mark method is called, garbage collection is in progress, and special precautions need to be taken when accessing objects; see section (B) above.

If your mark method does not need to do anything, it can be NULL.

A print method. This is called to create a printed representation of the object, whenever princ, prin1, or the like is called. It is passed the object, a stream to which the output is to be directed, and an escapeflag which indicates whether the object’s printed representation should be escaped so that it is readable. (This corresponds to the difference between princ and prin1.) Basically, escaped means that strings will have quotes around them and confusing characters in the strings such as quotes, backslashes, and newlines will be backslashed; and that special care will be taken to make symbols print in a readable fashion (e.g. symbols that look like numbers will be backslashed). Other readable objects should perhaps pass escapeflag on when sub-objects are printed, so that readability is preserved when necessary (or if not, always pass in a 1 for escapeflag). Non-readable objects should in general ignore escapeflag, except that some use it as an indication that more verbose output should be given.

Sub-objects are printed using print_internal(), which takes exactly the same arguments as are passed to the print method.

Literal C strings should be printed using write_c_string(), or write_string_1() for non-null-terminated strings.

Print methods for objects that do not have a readable representation should check the print_readably flag and signal an error if it is set.

If you specify NULL for the print method, the default_object_printer() will be used.

A finalize method. This is called at the beginning of the sweep stage on lcrecords that are about to be freed, and should be used to perform any extra object cleanup. This typically involves freeing any extra malloc()ed memory associated with the object, releasing any operating-system and window-system resources associated with the object (e.g. pixmaps, fonts), etc.

The finalize method can be NULL if nothing needs to be done.

WARNING #1: The finalize method is also called at the end of the dump phase; this time with the for_disksave parameter set to non-zero. The object is not about to disappear, so you have to make sure to not free any extra malloc()ed memory if you’re going to need it later. (Also, signal an error if there are any operating-system and window-system resources here, because they can’t be dumped.)

Finalize methods should, as a rule, set to zero any pointers after they’ve been freed, and check to make sure pointers are not zero before freeing. Although I’m pretty sure that finalize methods are not called twice on the same object (except for the for_disksave proviso), we’ve gotten nastily burned in some cases by not doing this.

WARNING #2: The finalize method is only called for lcrecords, not for simple lrecords. If you need a finalize method for simple lrecords, you have to stick it in the ADDITIONAL_FREE_foo() macro in alloc.c.

WARNING #3: Things are in an extremely bizarre state when ADDITIONAL_FREE_foo() is called, so you have to be incredibly careful when writing one of these functions. See the comment in gc_sweep(). If you ever have to add one of these, consider using an lcrecord or dealing with the problem in a different fashion.

An equal method. This compares the two objects for similarity, when equal is called. It should compare the contents of the objects in some reasonable fashion. It is passed the two objects and a depth value, which is used to catch circular objects. To compare sub-Lisp-objects, call internal_equal() and bump the depth value by one. If this value gets too high, a circular-object error will be signaled.

If this is NULL, objects are equal only when they are eq, i.e. identical.

A hash method. This is used to hash objects when they are to be compared with equal. The rule here is that if two objects are equal, they must hash to the same value; i.e. your hash function should use some subset of the sub-fields of the object that are compared in the “equal” method. If you specify this method as NULL, the object’s pointer will be used as the hash, which will fail if the object has an equal method, so don’t do this.

To hash a sub-Lisp-object, call internal_hash(). Bump the depth by one, just like in the “equal” method.

To convert a Lisp object directly into a hash value (using its pointer), use LISP_HASH(). This is what happens when the hash method is NULL.

To hash two or more values together into a single value, use HASH2(), HASH3(), HASH4(), etc.

getprop, putprop, remprop, and plist methods. These are used for object types that have properties. I don’t feel like documenting them here. If you create one of these objects, you have to use different macros to define them, i.e. DEFINE_LRECORD_IMPLEMENTATION_WITH_PROPS() or DEFINE_LRECORD_SEQUENCE_IMPLEMENTATION_WITH_PROPS().

A size_in_bytes method, when the object is of variable size (i.e. declared with a _SEQUENCE_IMPLEMENTATION macro). This should simply return the object’s size in bytes, exactly as you might expect. For an example, see the methods for window configurations and opaques.
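The contract between the equal and hash methods (equal objects must hash alike, and both must bump a depth value when descending into sub-objects) can be sketched as a small standalone model. Everything here is hypothetical: demo_pair is an invented object type, demo_hash2() only stands in for HASH2(), the depth limits are arbitrary, and returning -1 replaces the circular-object error that the real code would signal.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_EQUAL_DEPTH 200  /* arbitrary limit for this sketch */
#define DEMO_MAX_HASH_DEPTH 5     /* arbitrary limit for this sketch */

/* A toy object with one scalar field and one sub-object. */
struct demo_pair {
  int value;
  const struct demo_pair *child;  /* sub-object, may be NULL */
};

/* Stand-in for HASH2(): combine two hash values into one. */
static unsigned long demo_hash2(unsigned long a, unsigned long b)
{
  return a * 33u + b;
}

/* Equal method: returns 1 if equal, 0 if different, and -1 if the depth
   limit is exceeded, which suggests a circular structure. */
static int demo_equal(const struct demo_pair *a, const struct demo_pair *b,
                      int depth)
{
  if (depth > DEMO_MAX_EQUAL_DEPTH)
    return -1;
  if (a == b)
    return 1;
  if (a == NULL || b == NULL)
    return 0;
  if (a->value != b->value)
    return 0;
  /* Compare the sub-object, bumping the depth by one. */
  return demo_equal(a->child, b->child, depth + 1);
}

/* Hash method: uses only fields that demo_equal() compares, so two
   objects that are equal always hash to the same value.  The depth cap
   keeps hashing of circular structures from recursing forever. */
static unsigned long demo_hash(const struct demo_pair *p, int depth)
{
  if (p == NULL || depth > DEMO_MAX_HASH_DEPTH)
    return 0;
  return demo_hash2((unsigned long) p->value, demo_hash(p->child, depth + 1));
}
```

Hashing the pointer instead would give the two structurally equal objects different hash values, which is the failure the text warns about.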