When an external function, such as a C library function, returns a
char pointer, you should almost never treat it as Bufbyte.
This is because these returned strings may contain 8-bit characters, which
can be misinterpreted by SXEmacs and cause a crash. Likewise, when
exporting a piece of internal text to the outside world, you should
always convert it to an appropriate external encoding, lest the internal
stuff (such as the infamous \201 characters) leak out.
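To make the risk concrete, here is a minimal sketch of a toy converter. It is not the SXEmacs implementation: the names toy_to_internal, toy_to_external, TOY_LEAD_LATIN1 and TOY_ERROR are hypothetical, and the model assumes that a Latin-1 character in the range 0xA0-0xFF is stored internally as the lead byte \201 (0x81) followed by the character byte. Bytes in the range 0x80-0x9F are deliberately not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for this sketch: lead byte that marks a Latin-1 character
   in internal text.  0x81 is \201 in octal. */
#define TOY_LEAD_LATIN1 0x81
#define TOY_ERROR ((size_t) -1)

/* Convert external Latin-1 bytes to the toy internal form.
   Returns the number of bytes written to dst, or TOY_ERROR if dst is
   too small or a byte falls in the unmodelled range 0x80-0x9F. */
static size_t
toy_to_internal (const unsigned char *src, size_t len,
                 unsigned char *dst, size_t cap)
{
  size_t out = 0;
  assert (src != NULL && dst != NULL);
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = src[i];
      if (c < 0x80)
        {
          if (out + 1 > cap)
            return TOY_ERROR;
          dst[out++] = c;               /* ASCII is stored unchanged */
        }
      else if (c >= 0xA0)
        {
          if (out + 2 > cap)
            return TOY_ERROR;
          dst[out++] = TOY_LEAD_LATIN1; /* one external byte becomes two */
          dst[out++] = c;
        }
      else
        return TOY_ERROR;               /* not handled by this toy model */
    }
  return out;
}

/* Convert toy internal text back to external Latin-1 by dropping the
   lead bytes.  Returns the number of bytes written, or TOY_ERROR. */
static size_t
toy_to_external (const unsigned char *src, size_t len,
                 unsigned char *dst, size_t cap)
{
  size_t out = 0;
  assert (src != NULL && dst != NULL);
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = src[i];
      if (c == TOY_LEAD_LATIN1)
        {
          if (i + 1 >= len)
            return TOY_ERROR;           /* truncated character */
          c = src[++i];
        }
      if (out + 1 > cap)
        return TOY_ERROR;
      dst[out++] = c;
    }
  return out;
}
```

Under this model, a raw external byte such as 0xE9 is not a valid internal character on its own, and internal text written out without conversion carries the extra \201 bytes with it.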
Conversion between the internal and external representations of text
is done through the numerous conversion macros defined in
buffer.h. There used to be a fixed set of external formats
supported by these macros, but now any coding system can be used with
them. The coding system alias mechanism is used to create the
following logical coding systems, which replace the fixed external
formats. The (dontusethis-set-symbol-value-handler) mechanism was
enhanced to make this possible (more work on that is needed, such as
removing the dontusethis- prefix).
Qbinary
     Converts according to the binary coding system.

Qfile_name
     Converts according to the file-name-coding-system or
     pathname-coding-system (now obsolete) variables.

Qnative
     Used for data from the external environment: argv[], stuff
     from getenv(), stuff from the /etc/passwd file, etc.
     Currently this is the same as Qfile_name. The two should be
     distinguished for clarity and possible future separation.

Qctext
     Compound-text format.

There are two fundamental macros to convert between external and internal format.
TO_INTERNAL_FORMAT converts external data to internal format, and
TO_EXTERNAL_FORMAT converts the other way around. The arguments
each of these receives are a source type, a source, a sink type, a sink,
and a coding system (or a symbol naming a coding system).
A typical call looks like
TO_EXTERNAL_FORMAT (LISP_STRING, str, C_STRING_MALLOC, ptr, Qfile_name);
which means that the contents of the Lisp string str are written
to a malloc'ed memory area, which ptr will point to once the macro
has completed. The conversion is done using the
file-name coding system, which the user controls
indirectly by setting or binding the variable
file-name-coding-system.
Some sources and sinks require two C variables to specify. We use some preprocessor magic to allow different source and sink types, and even different numbers of arguments to specify different types of sources and sinks.
So we can have a call that looks like
     TO_INTERNAL_FORMAT (DATA, (ptr, len),
                         MALLOC, (ptr, len),
                         coding_system);
The parenthesized argument pairs are required to make the preprocessor magic work.
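The general preprocessor technique can be sketched in isolation. The following is not the code from buffer.h; toy_span, TOY_SOURCE and the helper functions are hypothetical names chosen for illustration. The type keyword is pasted onto a macro prefix with ##, and a parenthesized pair travels through the outer macro as a single argument because the parentheses protect the comma.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical (ptr, len) description of a source. */
typedef struct
{
  const void *ptr;
  size_t len;
} toy_span;

static toy_span
toy_span_from_data (const void *ptr, size_t len)
{
  toy_span s = { ptr, len };
  return s;
}

static toy_span
toy_span_from_c_string (const char *str)
{
  assert (str != NULL);
  /* Same convention as C_STRING: the length includes the final '\0'. */
  toy_span s = { str, strlen (str) + 1 };
  return s;
}

/* One helper macro per source type.  The DATA variant receives the
   whole parenthesized pair and uses it as the argument list. */
#define TOY_SOURCE_DATA(pair)     toy_span_from_data pair
#define TOY_SOURCE_C_STRING(ptr)  toy_span_from_c_string (ptr)

/* Dispatch on the type keyword by token pasting. */
#define TOY_SOURCE(type, src)     TOY_SOURCE_##type (src)
```

With these definitions, TOY_SOURCE (DATA, (buf, n)) expands to toy_span_from_data (buf, n), while TOY_SOURCE (C_STRING, s) expands to toy_span_from_c_string (s). Without the parentheses around the pair, the preprocessor would see three arguments to a two-argument macro and reject the call.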
Here are the different source and sink types:
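DATA, (ptr, len),
     input data is a buffer of size len at address ptr
ALLOCA, (ptr, len),
     output data is placed in a stack-allocated buffer of size len
     pointed to by ptr
MALLOC, (ptr, len),
     output data is placed in an xmalloc()ed buffer of size len
     pointed to by ptr
C_STRING_ALLOCA, ptr,
     equivalent to ALLOCA (ptr, len_ignored) on output
C_STRING_MALLOC, ptr,
     equivalent to MALLOC (ptr, len_ignored) on output
C_STRING, ptr,
     equivalent to DATA, (ptr, strlen (ptr) + 1) on input
LISP_STRING, string,
     a Lisp string
LISP_BUFFER, buffer,
     output is written at (point) in lisp buffer buffer
LISP_LSTREAM, lstream,
     a Lisp lstream object
LISP_OPAQUE, object,
     a Lisp opaque object

Often, the data is being converted to a '\0'-byte-terminated string,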
which is the format required by many external system C APIs. For these
purposes, a source type of C_STRING or a sink type of
C_STRING_ALLOCA or C_STRING_MALLOC is appropriate.
Otherwise, we should try to keep SXEmacs '\0'-byte-clean, which means
using (ptr, len) pairs.
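The difference between the two conventions is easy to demonstrate in plain C. This small sketch uses hypothetical helper names (toy_visible_as_c_string, toy_count_byte) and no SXEmacs code; it only shows what is lost when data containing an embedded '\0' byte is treated as a C string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Number of bytes an API that expects a '\0'-terminated string sees:
   everything after the first '\0' is invisible to it. */
static size_t
toy_visible_as_c_string (const char *ptr)
{
  assert (ptr != NULL);
  return strlen (ptr);
}

/* Count occurrences of BYTE in a (ptr, len) buffer.  Since the length
   is explicit, embedded '\0' bytes are treated like any other byte. */
static size_t
toy_count_byte (const char *ptr, size_t len, char byte)
{
  size_t n = 0;
  assert (ptr != NULL);
  for (size_t i = 0; i < len; i++)
    if (ptr[i] == byte)
      n++;
  return n;
}
```

For the five data bytes 'a', 'b', '\0', 'c', 'd', the first function reports only two visible bytes, whereas the second still finds the 'd' when it is given the full length.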
The sinks to be specified must be lvalues, unless they are the lisp
object types LISP_LSTREAM or LISP_BUFFER.
For the sink types ALLOCA and C_STRING_ALLOCA, the
resulting text is stored in a stack-allocated buffer, which is
automatically freed on returning from the function. However, the sink
types MALLOC and C_STRING_MALLOC return xmalloc()ed
memory. The caller is responsible for freeing this memory using
xfree().
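The two ownership rules can be modelled with standard C alone. The sketch below is an analogy, not SXEmacs code: it uses malloc() and free() in place of xmalloc() and xfree(), a fixed-size local array in place of an alloca()ed buffer, and the hypothetical names toy_copy_to_heap and toy_use_on_stack.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Heap-style sink: the result outlives this function, so the caller
   owns it and must release it.  Returns NULL if allocation fails. */
static char *
toy_copy_to_heap (const char *src, size_t len)
{
  char *out;
  assert (src != NULL);
  out = malloc (len + 1);
  if (out == NULL)
    return NULL;
  memcpy (out, src, len);
  out[len] = '\0';
  return out;
}

/* Stack-style sink: the buffer lives in this function's frame, so the
   result has to be consumed before returning and never escapes. */
static size_t
toy_use_on_stack (const char *src, size_t len)
{
  char tmp[64];
  assert (src != NULL && len < sizeof tmp);
  memcpy (tmp, src, len);
  tmp[len] = '\0';
  return strlen (tmp);          /* tmp ceases to exist after this */
}
```

A caller of the heap variant pairs every successful call with exactly one free(), just as a caller of a MALLOC or C_STRING_MALLOC sink pairs it with xfree(). A pointer into a stack-style buffer, by contrast, must not be stored anywhere that survives the function.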
Note that it doesn't make sense for LISP_STRING to be a source
for TO_INTERNAL_FORMAT or a sink for TO_EXTERNAL_FORMAT.
You'll get an assertion failure if you try.