As mentioned above, strings are a special case. A string logically consists of
two parts: a fixed-size object (containing the length, property list,
and a pointer to the actual data) and the actual data in the string.
The fixed-size object is a struct Lisp_String and is allocated in
frob blocks, as usual. The actual data is stored in special
string-chars blocks, which are 8K blocks of memory.
Currently allocated strings are simply laid end to end in these
string-chars blocks, with a pointer back to the struct Lisp_String
stored before each string in the string-chars block. When a new string
needs to be allocated, the remaining space at the end of the last
string-chars block is used if there’s enough; otherwise a new string-chars
block is created.
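The allocation scheme just described can be sketched as a simple bump allocator. This is a minimal model under stated assumptions, not the XEmacs source: `struct model_string` and `struct model_block` are hypothetical stand-ins for struct Lisp_String and a string-chars block, and the padding rule (round each chunk up to pointer size) is an assumption chosen so that every back pointer stays aligned.

```c
#include <stdlib.h>
#include <string.h>

/* 8K comes from the text; all names below are hypothetical, not alloc.c's. */
#define STRING_CHARS_BLOCK_SIZE 8192

struct model_string {            /* stand-in for struct Lisp_String */
  size_t size;                   /* length of the string data in bytes */
  unsigned char *data;           /* points just past the back pointer */
};

struct model_block {             /* stand-in for a string-chars block */
  struct model_block *next;
  size_t pos;                    /* offset of the first free byte */
  unsigned char chars[STRING_CHARS_BLOCK_SIZE];
};

static struct model_block *first_block, *current_block;

/* Bytes one chunk occupies: back pointer + data, rounded up (assumed rule)
   so the next chunk's back pointer is aligned. */
static size_t chunk_size(size_t len) {
  size_t a = sizeof(struct model_string *);
  return (a + len + a - 1) / a * a;
}

/* Lay the new string end to end after the previous one, starting a fresh
   block when the last block lacks room. Returns NULL when the chunk cannot
   fit in any block (the "big string" case, not modelled here). */
static unsigned char *alloc_string_data(struct model_string *s, size_t len) {
  size_t need = chunk_size(len);
  if (need > STRING_CHARS_BLOCK_SIZE)
    return NULL;
  if (!current_block || STRING_CHARS_BLOCK_SIZE - current_block->pos < need) {
    struct model_block *b = calloc(1, sizeof *b);
    if (!b)
      return NULL;
    if (current_block)
      current_block->next = b;
    else
      first_block = b;
    current_block = b;
  }
  unsigned char *chunk = current_block->chars + current_block->pos;
  memcpy(chunk, &s, sizeof s);   /* back pointer to the owning object */
  current_block->pos += need;
  s->size = len;
  s->data = chunk + sizeof s;
  return s->data;
}
```

Writing the back pointer with `memcpy` avoids relying on the byte array's alignment; the padding in `chunk_size` is what keeps the layout regular enough to walk later.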
Because of the string compaction and relocation that happens at the end of
garbage collection, there are never any holes left in the string-chars blocks.
During the sweep stage of garbage collection, when objects are
reclaimed, the garbage collector goes through all string-chars blocks,
looking for unused strings. Each chunk of string data is preceded by a
pointer to the corresponding struct Lisp_String, which indicates
both whether the string is used and how big the string is, i.e. how to
get to the next chunk of string data. Holes are compressed by
block-copying the next string into the empty space and relocating the
pointer stored in the corresponding struct Lisp_String.
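This means you have to be careful with strings in your code: because string
data can move during garbage collection, a C pointer into a string’s data
may become invalid across anything that can trigger a collection.
See the relevant section above.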
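The sweep-and-compact pass can be sketched over a single block as follows. This is a simplified model, not the XEmacs compactor: it assumes a hypothetical `marked` flag on `struct model_string` set by the mark phase, uses an assumed pointer-size padding rule, and only slides strings within one block (the real collector walks all string-chars blocks).

```c
#include <stdbool.h>
#include <string.h>

#define BLOCK_SIZE 8192

struct model_string {         /* hypothetical stand-in for struct Lisp_String */
  bool marked;                /* assumed: set by the mark phase if reachable */
  size_t size;                /* string length in bytes */
  unsigned char *data;        /* points into a block, past the back pointer */
};

struct model_block {          /* hypothetical string-chars block */
  size_t pos;                 /* bytes in use */
  unsigned char chars[BLOCK_SIZE];
};

/* Assumed padding rule: back pointer + data rounded up to pointer size. */
static size_t chunk_size(size_t len) {
  size_t a = sizeof(struct model_string *);
  return (a + len + a - 1) / a * a;
}

/* Append a string (back pointer, then bytes) at the end of the block. */
static void append(struct model_block *b, struct model_string *s,
                   const char *text) {
  size_t len = strlen(text);
  unsigned char *chunk = b->chars + b->pos;
  memcpy(chunk, &s, sizeof s);
  memcpy(chunk + sizeof s, text, len);
  s->size = len;
  s->data = chunk + sizeof s;
  b->pos += chunk_size(len);
}

/* Walk the chunks via their back pointers. The owner says whether the chunk
   is live and how big it is; live chunks slide down over any hole and the
   owner's data pointer is relocated to the new position. */
static void compact_block(struct model_block *b) {
  size_t from = 0, to = 0;
  while (from < b->pos) {
    struct model_string *s;
    memcpy(&s, b->chars + from, sizeof s);
    size_t sz = chunk_size(s->size);
    if (s->marked) {
      if (to != from)
        memmove(b->chars + to, b->chars + from, sz);
      s->data = b->chars + to + sizeof s;
      to += sz;
    }
    from += sz;
  }
  b->pos = to;
}
```

`memmove` rather than `memcpy` is needed because a chunk may slide by less than its own length, so source and destination can overlap.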
Note that there is one situation not handled: a string that is too big
to fit into a string-chars block. Such strings, called big
strings, are all malloc()ed as their own block. (#### It
would make more sense for the threshold for big strings to be somewhat
lower, e.g. 1/2 or 1/4 the size of a string-chars block. It seems that
this was indeed formerly the case (the threshold was set at 1/8),
but Mly forgot about this when rewriting things for 19.8.)
Note also that the string data in string-chars blocks is padded as
necessary so that proper alignment constraints on the
Lisp_String back pointers are maintained.
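The padding arithmetic and the big-string test can be illustrated with a few helper functions. They are hypothetical: the sketch assumes back pointers need pointer-size alignment and ignores any terminator byte the real implementation may store, and `is_big_string` uses the "does not fit in a block at all" threshold described above rather than the lower threshold suggested in the #### note.

```c
#include <stddef.h>

#define STRING_CHARS_BLOCK_SIZE 8192   /* from the text */

/* Round n up to a multiple of align. */
static size_t round_up(size_t n, size_t align) {
  return (n + align - 1) / align * align;
}

/* Hypothetical: bytes a chunk occupies = back pointer + data, padded so
   the next back pointer lands on a pointer-aligned offset. */
static size_t chunk_bytes(size_t len) {
  return round_up(sizeof(void *) + len, sizeof(void *));
}

/* Padding after the data; this slack is what an in-place resize can use. */
static size_t padding_bytes(size_t len) {
  return chunk_bytes(len) - sizeof(void *) - len;
}

/* Hypothetical big-string test: the chunk cannot fit in one block. */
static int is_big_string(size_t len) {
  return chunk_bytes(len) > STRING_CHARS_BLOCK_SIZE;
}
```

For example, with 8-byte pointers a 5-byte string needs 8 + 5 = 13 bytes, which is padded to 16, leaving 3 bytes of slack.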
Finally, strings can be resized. This happens in Mule when a
character is substituted with a different-length character, or during
modeline frobbing. (You could also export this to Lisp, but this is not
currently done.) Resizing a string is a potentially tricky process.
If the change is small enough that the padding can absorb it, nothing
other than a simple memory move needs to be done. Keep in mind,
however, that the string can’t shrink too much, because the offset to the
next string in the string-chars block is computed by looking at the
length and rounding up to a multiple of four or eight. If the
new length would require a different padded size, new string
data needs to be allocated at the end of the last string-chars block and
the data moved appropriately. This leaves some dead string data. It is
marked by putting a special marker of 0xFFFFFFFF in the
struct Lisp_String back pointer before the data (there’s no real
Lisp_String to point to and relocate), and by storing the size of the dead
string data (which would normally be obtained from the now-non-existent
struct Lisp_String) at the beginning of the dead string data gap.
The string compactor recognizes this special 0xFFFFFFFF marker and
handles it correctly.
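Resizing and the dead-data marker can be modelled together. This is a sketch under several assumptions, not the XEmacs code: the types and functions are hypothetical; every chunk's data area is assumed to be at least `sizeof(size_t)` so a dead chunk has room for its size; `resize_string` assumes the block has room at its end (a real allocator would fall back to a new block); and the compactor treats every non-dead chunk as live, leaving out the mark check shown earlier.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 8192
#define DEAD_MARKER ((uintptr_t)0xFFFFFFFFu)   /* marker value from the text */

struct model_string { size_t size; unsigned char *data; };   /* hypothetical */
struct model_block { size_t pos; unsigned char chars[BLOCK_SIZE]; };

/* Back pointer + data, padded to pointer size. Assumed: data area is at
   least sizeof(size_t) so a dead chunk can record its own size. */
static size_t chunk_size(size_t len) {
  size_t a = sizeof(uintptr_t);
  size_t body = len < sizeof(size_t) ? sizeof(size_t) : len;
  return (a + body + a - 1) / a * a;
}

/* Carve a chunk at the end of the block (assumes there is room). */
static unsigned char *new_chunk(struct model_block *b, struct model_string *s,
                                size_t len) {
  unsigned char *chunk = b->chars + b->pos;
  uintptr_t back = (uintptr_t)s;
  memcpy(chunk, &back, sizeof back);
  b->pos += chunk_size(len);
  return chunk + sizeof back;
}

/* Change S's length. Same padded size: update in place (the caller does the
   memory move of the characters). Otherwise: copy to a new chunk and turn
   the old one into dead data (marker, then the dead chunk's size). */
static void resize_string(struct model_block *b, struct model_string *s,
                          size_t new_len) {
  size_t old_chunk = chunk_size(s->size);
  if (chunk_size(new_len) == old_chunk) {
    s->size = new_len;
    return;
  }
  unsigned char *old_data = s->data;
  unsigned char *fresh = new_chunk(b, s, new_len);
  memcpy(fresh, old_data, s->size < new_len ? s->size : new_len);
  uintptr_t dead = DEAD_MARKER;
  memcpy(old_data - sizeof dead, &dead, sizeof dead);
  memcpy(old_data, &old_chunk, sizeof old_chunk);
  s->size = new_len;
  s->data = fresh;
}

/* Compactor that recognizes dead data: its size comes from the gap itself
   because there is no owning object to ask. */
static void compact_block(struct model_block *b) {
  size_t from = 0, to = 0;
  while (from < b->pos) {
    uintptr_t back;
    size_t sz;
    memcpy(&back, b->chars + from, sizeof back);
    if (back == DEAD_MARKER) {
      memcpy(&sz, b->chars + from + sizeof back, sizeof sz);
    } else {
      struct model_string *s = (struct model_string *)back;
      sz = chunk_size(s->size);
      if (to != from)
        memmove(b->chars + to, b->chars + from, sz);
      s->data = b->chars + to + sizeof back;
      to += sz;
    }
    from += sz;
  }
  b->pos = to;
}
```

The in-place branch compares padded sizes rather than raw lengths, which mirrors the constraint above: the compactor derives the next chunk's offset from the current length, so a resize must not change the padded size of a chunk that stays where it is.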