

66.5.1 Coding System Types

The coding system type determines the basic algorithm SXEmacs will use to decode or encode a data stream. Character encodings will be converted to the MULE encoding, escape sequences processed, and newline sequences converted to SXEmacs’s internal representation. There are three basic classes of coding system type: no-conversion, ISO-2022, and special.

No conversion allows you to look at the file's internal representation. Since SXEmacs is basically a text editor, "no conversion" still converts newline conventions by default. (Use the binary coding system if this is not desired.)
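The difference between a newline-converting read and a binary read can be sketched as follows. This is an illustration in Python, not SXEmacs Lisp, and the helper name normalize_eol is hypothetical; it only assumes that newline conversion means mapping CRLF and bare CR to LF.

```python
def normalize_eol(data: bytes) -> bytes:
    """Map CRLF and bare CR line endings to LF; every other byte is kept."""
    # CRLF must be handled first so that it becomes one LF, not two.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


raw = b"line one\r\nline two\rline three\n"

converted = normalize_eol(raw)  # what a newline-converting read would keep
untouched = raw                 # what a binary read would keep
```

Only the line terminators differ between the two results; all other bytes pass through unchanged.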

ISO 2022 (see ISO 2022) is the basic international standard regulating use of "coded character sets for the exchange of data", i.e., text streams. ISO 2022 contains functions that make it possible to encode text streams to comply with restrictions of the Internet mail system and de facto restrictions of most file systems (e.g., use of the separator character in file names). Coding systems which are not ISO 2022 conformant can be difficult to handle. Perhaps more importantly, they are not adaptable to multilingual information interchange, with the obvious exception of ISO 10646 (Unicode). (Unicode is partially supported by SXEmacs with the addition of the Lisp package ucs-conv.)

The special class of coding systems includes automatic detection, CCL (a "little language" embedded as an interpreter, useful for translating between variants of a single character set), non-ISO-2022-conformant encodings like Unicode, Shift JIS, and Big5, and MULE internal coding. (NB: this list is based on XEmacs 21.2. Terminology may vary slightly for other versions of SXEmacs, XEmacs and for GNU Emacs 20.)

no-conversion

No conversion. Used for binary files, and for a few special cases of non-ISO-2022 coding systems where conversion is done by hook functions (usually implemented in CCL). On output, graphic characters that are not in ASCII or Latin-1 will be replaced by a ‘?’. (For a no-conversion-encoded buffer, these characters will only be present if you explicitly insert them.)
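The replacement behaviour on output can be imitated with Python's Latin-1 codec. This is only an analogy under the assumption that each unrepresentable character becomes a single ‘?’; it does not call any SXEmacs code.

```python
# "café 日本": ASCII letters, one Latin-1 letter, and two kanji.
text = "caf\u00e9 \u65e5\u672c"

# errors="replace" substitutes "?" for each character outside Latin-1.
written = text.encode("latin-1", errors="replace")
```

The Latin-1 letter survives as the single byte 0xE9, while each kanji is written as one ‘?’.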

iso2022

Any ISO-2022-compliant encoding. Among others, this includes JIS (the Japanese encoding commonly used for e-mail), national variants of EUC (the standard Unix encoding for Japanese and other languages), and Compound Text (an encoding used in X11). You can specify more specific information about the conversion with the flags argument.
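As a rough illustration of two encodings in this family, the Python codecs iso2022_jp and euc_jp can be compared on the same text. Python is used here only because its codecs are easy to inspect; the sketch says nothing about how SXEmacs implements the conversion.

```python
text = "\u65e5\u672c\u8a9e"  # 日本語, three kanji from JIS X 0208

jis = text.encode("iso2022_jp")  # 7-bit, switches character sets with escapes
euc = text.encode("euc_jp")      # 8-bit, no escapes needed for these kanji

designate_kanji = b"\x1b$B"  # ESC $ B: switch to JIS X 0208
designate_ascii = b"\x1b(B"  # ESC ( B: switch back to ASCII

seven_bit_clean = all(byte < 0x80 for byte in jis)
```

The JIS form stays within 7 bits by bracketing the kanji with escape sequences, which is what makes it suitable for mail transport; the EUC form sets the high bit on every kanji byte instead.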

ucs-4

ISO 10646 UCS-4 encoding. A 31-bit fixed-width superset of Unicode.

utf-8

ISO 10646 UTF-8 encoding. A “file system safe” transformation format that can be used with both UCS-4 and Unicode.
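The contrast between the fixed-width and the “file system safe” forms can be seen with a short Python sketch. Python's utf-32-be codec is used as a stand-in for big-endian UCS-4, which is an assumption of this example; for characters in the Unicode range the two produce the same bytes.

```python
text = "a\u00e9\u65e5"  # "aé日": code points U+0061, U+00E9, U+65E5

ucs4 = text.encode("utf-32-be")  # always four bytes per character
utf8 = text.encode("utf-8")      # one, two, and three bytes respectively

# UCS-4 output contains NUL bytes; UTF-8 output for non-ASCII characters
# uses only bytes with the high bit set, so it never produces NUL or "/".
ucs4_has_nul = 0 in ucs4
utf8_has_nul_or_slash = 0 in utf8 or ord("/") in utf8
```

This is why UTF-8 text can be used in file names on systems that treat NUL and the separator character specially, while UCS-4 text cannot.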

undecided

Automatic conversion. SXEmacs attempts to detect the coding system used in the file.
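One simple way to think about detection is trial decoding against an ordered list of candidates. The sketch below is written in Python and is not the algorithm SXEmacs uses; the function name guess_coding_system and the candidate list are hypothetical.

```python
def guess_coding_system(data, candidates=("ascii", "utf-8", "shift_jis")):
    """Return the first candidate codec that decodes data without error."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        return name
    return None  # nothing matched


sample = "\u65e5\u672c\u8a9e"  # 日本語
guess_plain = guess_coding_system(b"hello")
guess_utf8 = guess_coding_system(sample.encode("utf-8"))
guess_sjis = guess_coding_system(sample.encode("shift_jis"))
```

The order of the candidates matters: a byte stream that is valid in several encodings is attributed to whichever comes first, so any such detection is a guess rather than a proof.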

shift-jis

Shift-JIS (a Japanese encoding commonly used in PC operating systems).

big5

Big5 (the encoding commonly used for Traditional Chinese, notably in Taiwan).
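Shift-JIS and Big5 are both double-byte encodings that do not use ISO 2022 escape sequences. A small Python sketch, using Python's own shift_jis and big5 codecs rather than anything from SXEmacs, shows one reason they fall outside ISO 2022: the second byte of a character may land in the ASCII range.

```python
sjis = "\u65e5\u672c\u8a9e".encode("shift_jis")  # 日本語
big5 = "\u4e2d\u6587".encode("big5")             # 中文

# Each character takes two bytes. The lead byte has the high bit set,
# but a trail byte may be an ordinary ASCII value.
ascii_range_bytes = [byte for byte in sjis if byte < 0x80]
```

Here the trail byte of 本 is 0x7B, the ASCII code for ‘{’, so a program scanning the bytes for ASCII punctuation without decoding first can be misled.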

ccl

The conversion is performed using a user-written pseudo-code program. CCL (Code Conversion Language) is the name of this pseudo-code. For example, CCL is used to map KOI8-R characters (an encoding for Russian Cyrillic) to ISO 8859-5 (the form used internally by MULE).
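The KOI8-R example amounts to a byte-for-byte table lookup between two single-byte character sets. The following Python sketch shows that idea only; it is not CCL, and the helper build_table is hypothetical. Characters with no counterpart in the target set are assumed to map to ‘?’.

```python
def build_table(src: str, dst: str) -> bytes:
    """Build a 256-entry table mapping each src byte to its dst byte."""
    table = bytearray(256)
    for byte in range(256):
        char = bytes([byte]).decode(src, errors="replace")
        # Characters missing from dst become "?".
        table[byte] = char.encode(dst, errors="replace")[0]
    return bytes(table)


KOI8_TO_ISO = build_table("koi8_r", "iso8859_5")

koi8 = "Привет".encode("koi8_r")
iso = koi8.translate(KOI8_TO_ISO)  # one table lookup per input byte
```

Because both encodings use one byte per character, the whole conversion reduces to a single translate call once the table is built.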

internal

Write out or read in the raw contents of the memory representing the buffer’s text. This is primarily useful for debugging purposes, and is only enabled when SXEmacs has been compiled with DEBUG_XEMACS set (the ‘--debug’ configure option). Warning: Reading in a file using internal conversion can result in an internal inconsistency in the memory representing a buffer’s text, which will produce unpredictable results and may cause SXEmacs to crash. Under normal circumstances you should never use internal conversion.

