

11.14 String

As mentioned above, strings are a special case. A string logically consists of two parts: a fixed-size object (containing the length, property list, and a pointer to the actual data), and the actual data in the string. The fixed-size object is a struct Lisp_String and is allocated in frob blocks, as usual. The actual data is stored in special string-chars blocks, which are 8K blocks of memory. Currently-allocated strings are simply laid end to end in these string-chars blocks, with a pointer back to the struct Lisp_String stored before each string in the string-chars block. When a new string needs to be allocated, the remaining space at the end of the last string-chars block is used if it is large enough; otherwise a new string-chars block is created.
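The layout described above can be illustrated with a small sketch. This is not the SXEmacs allocator: the names toy_string, toy_block, chunk_fullsize and toy_alloc_string_data are hypothetical, the block size is assumed to be exactly 8192 usable bytes, and chunks are assumed to be padded to the size of a pointer. It only shows the idea of appending a back pointer plus data at the end of the last block and opening a new block when the remaining space is too small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_DATA_SIZE 8192   /* assumed usable bytes per string-chars block */

/* Hypothetical stand-in for the fixed-size struct Lisp_String. */
struct toy_string {
    size_t size;               /* length of the data in bytes */
    unsigned char *data;       /* points into a string-chars block */
};

/* Hypothetical stand-in for a string-chars block. */
struct toy_block {
    struct toy_block *next;
    size_t pos;                /* offset of the first free byte in data[] */
    unsigned char data[BLOCK_DATA_SIZE];
};

struct toy_block *first_block; /* oldest block */
struct toy_block *last_block;  /* block currently being filled */

/* Bytes occupied by one chunk: back pointer + data + NUL, rounded up
   to a multiple of the pointer size (an assumed alignment). */
size_t chunk_fullsize(size_t len)
{
    size_t align = sizeof(struct toy_string *);
    size_t raw = sizeof(struct toy_string *) + len + 1;
    return (raw + align - 1) / align * align;
}

/* Gives S room for LEN bytes of data. Returns 0 on success, -1 if the
   string is too big for a block (the big-string case is left out of
   this sketch) or if memory runs out. */
int toy_alloc_string_data(struct toy_string *s, size_t len)
{
    assert(s != NULL);
    size_t full = chunk_fullsize(len);
    if (full > BLOCK_DATA_SIZE)
        return -1;
    if (last_block == NULL || last_block->pos + full > BLOCK_DATA_SIZE) {
        /* Not enough room at the end of the last block: start a new one. */
        struct toy_block *b = calloc(1, sizeof *b);
        if (b == NULL)
            return -1;
        if (last_block != NULL)
            last_block->next = b;
        else
            first_block = b;
        last_block = b;
    }
    unsigned char *chunk = last_block->data + last_block->pos;
    memcpy(chunk, &s, sizeof s);      /* back pointer stored before the data */
    s->data = chunk + sizeof s;
    s->size = len;
    s->data[len] = '\0';
    last_block->pos += full;
    return 0;
}
```

Allocating a 5-byte string and then a 3-byte string with this sketch places the second string's data exactly chunk_fullsize(5) bytes after the first, with the back pointer in the bytes immediately preceding the data.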

There are never any holes in the string-chars blocks due to the string compaction and relocation that happens at the end of garbage collection. During the sweep stage of garbage collection, when objects are reclaimed, the garbage collector goes through all string-chars blocks, looking for unused strings. Each chunk of string data is preceded by a pointer to the corresponding struct Lisp_String, which indicates both whether the string is used and how big the string is, i.e. how to get to the next chunk of string data. Holes are compressed by block-copying the next string into the empty space and relocating the pointer stored in the corresponding struct Lisp_String. Because the data of a string can move during any garbage collection, you have to be careful with strings in your code: a raw pointer to string data must not be held across anything that can trigger garbage collection. See the section above on GCPROing.
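The compaction pass can be sketched roughly as follows. This is a simplified model over a single flat buffer, not the actual collector: toy_string, its live flag, toy_place and toy_compact are hypothetical names, and the pointer-size padding is an assumption. The point it illustrates is that the back pointer in front of each chunk yields both the liveness and the size of the chunk, and that a moved chunk is followed by an update of the data pointer in its owner.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct Lisp_String. */
struct toy_string {
    int live;                  /* set by a (not shown) mark phase */
    size_t size;               /* length of the data in bytes */
    unsigned char *data;       /* points just past the back pointer */
};

/* Back pointer + data + NUL, rounded up to the pointer size (assumed). */
size_t chunk_fullsize(size_t len)
{
    size_t align = sizeof(void *);
    size_t raw = sizeof(struct toy_string *) + len + 1;
    return (raw + align - 1) / align * align;
}

/* Test helper: appends a chunk for S holding TEXT at buf + *used.
   The caller must make sure the buffer is large enough. */
void toy_place(unsigned char *buf, size_t *used,
               struct toy_string *s, const char *text)
{
    size_t len = strlen(text);
    unsigned char *chunk = buf + *used;
    memcpy(chunk, &s, sizeof s);          /* back pointer */
    s->live = 1;
    s->size = len;
    s->data = chunk + sizeof s;
    memcpy(s->data, text, len + 1);
    *used += chunk_fullsize(len);
}

/* Slides every live chunk in buf[0..used) down over the holes left by
   dead ones and returns the new number of bytes in use. */
size_t toy_compact(unsigned char *buf, size_t used)
{
    size_t from = 0, to = 0;
    while (from < used) {
        struct toy_string *owner;
        memcpy(&owner, buf + from, sizeof owner);
        size_t full = chunk_fullsize(owner->size);  /* distance to next chunk */
        assert(to <= from);
        if (owner->live) {
            if (to != from) {
                memmove(buf + to, buf + from, full);
                owner->data = buf + to + sizeof owner;  /* relocate pointer */
            }
            to += full;
        }
        from += full;
    }
    return to;
}
```

With three strings placed back to back and the middle one marked dead, toy_compact moves the third string down to directly follow the first and updates only that string's data pointer; any copy of the old pointer held elsewhere would now be stale, which is the reason for the caution about garbage collection.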

Note that there is one situation not handled by this scheme: a string that is too big to fit into a string-chars block. Such strings, called big strings, are all malloc()ed as their own block. (#### Although it would make more sense for the threshold for big strings to be somewhat lower, e.g. 1/2 or 1/4 the size of a string-chars block. It seems that this was indeed the case formerly (the threshold was set at 1/8), but Mly forgot about this when rewriting things for 19.8.)

Note also that the string data in string-chars blocks is padded as necessary so that proper alignment constraints on the struct Lisp_String back pointers are maintained.

Finally, strings can be resized. This happens in Mule when a character is substituted with a different-length character, or during modeline frobbing. (You could also export this to Lisp, but it’s not done so currently.) Resizing a string is a potentially tricky process. If the change is small enough that the padding can absorb it, nothing other than a simple memory move needs to be done. Keep in mind, however, that the string can’t shrink too much either, because the offset to the next string in the string-chars block is computed by looking at the length and rounding up to a multiple of four or eight. If the string would shrink or expand beyond the correct padding, new string data needs to be allocated at the end of the last string-chars block and the data moved appropriately. This leaves some dead string data, which is marked by putting a special marker of 0xFFFFFFFF in the struct Lisp_String pointer before the data (there’s no real struct Lisp_String to point to and relocate), and storing the size of the dead string data (which would normally be obtained from the now-non-existent struct Lisp_String) at the beginning of the dead string data gap. The string compactor recognizes this special 0xFFFFFFFF marker and handles it correctly.
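The two resizing cases can be sketched as below, again over a single flat buffer rather than real string-chars blocks. All names (toy_string, toy_place, toy_resize, toy_chunk_size, DEAD_MARKER) are hypothetical. The sketch assumes an all-ones pointer value as the marker, which is 0xFFFFFFFF when pointers are 32 bits wide, and assumes that a size_t fits into the data area of even the smallest chunk. It only changes the length at the end of the string; substituting a character in the middle would additionally need a memory move inside the data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEAD_MARKER UINTPTR_MAX   /* 0xFFFFFFFF with 32-bit pointers */

/* Hypothetical stand-in for struct Lisp_String. */
struct toy_string {
    size_t size;
    unsigned char *data;
};

/* Back pointer + data + NUL, rounded up to the pointer size (assumed). */
size_t chunk_fullsize(size_t len)
{
    size_t align = sizeof(uintptr_t);
    return (sizeof(uintptr_t) + len + 1 + align - 1) / align * align;
}

/* Size of the chunk starting at CHUNK, whether it is live or dead.
   This is the check a compactor would make before stepping on. */
size_t toy_chunk_size(const unsigned char *chunk)
{
    uintptr_t slot;
    memcpy(&slot, chunk, sizeof slot);
    if (slot == DEAD_MARKER) {
        size_t full;              /* size stored at the start of the gap */
        memcpy(&full, chunk + sizeof slot, sizeof full);
        return full;
    }
    return chunk_fullsize(((const struct toy_string *)slot)->size);
}

/* Appends an empty chunk of LEN data bytes for S at buf + *used.
   The caller must make sure the buffer is large enough. */
void toy_place(unsigned char *buf, size_t *used,
               struct toy_string *s, size_t len)
{
    unsigned char *chunk = buf + *used;
    uintptr_t slot = (uintptr_t)s;
    memcpy(chunk, &slot, sizeof slot);
    s->size = len;
    s->data = chunk + sizeof slot;
    s->data[len] = '\0';
    *used += chunk_fullsize(len);
}

/* Returns 1 if the padding absorbed the change, 0 if S was relocated. */
int toy_resize(unsigned char *buf, size_t *used,
               struct toy_string *s, size_t newlen)
{
    size_t oldfull = chunk_fullsize(s->size);
    if (chunk_fullsize(newlen) == oldfull) {
        s->size = newlen;         /* offset to the next chunk is unchanged */
        s->data[newlen] = '\0';
        return 1;
    }
    assert(sizeof(size_t) <= oldfull - sizeof(uintptr_t));
    unsigned char *olddata = s->data;
    unsigned char *oldchunk = olddata - sizeof(uintptr_t);
    size_t keep = s->size < newlen ? s->size : newlen;
    toy_place(buf, used, s, newlen);          /* new data at the end */
    memcpy(s->data, olddata, keep);
    uintptr_t marker = DEAD_MARKER;           /* turn the old chunk into a gap */
    memcpy(oldchunk, &marker, sizeof marker);
    memcpy(oldchunk + sizeof marker, &oldfull, sizeof oldfull);
    return 0;
}
```

In this model, growing a 1-byte string to 2 bytes stays within the padding and leaves the data where it is, while growing it to 40 bytes relocates the data to the end of the buffer and leaves behind a dead chunk whose size toy_chunk_size reads from the gap itself.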

