18.3 Internal Mule Encodings

In SXEmacs/Mule, each character set is assigned a unique number, called a leading byte, which is used in the encodings of every character in that set. Leading bytes are in the range 0x80 - 0xFF (except for ASCII, which has a leading byte of 0), although some leading bytes in that range are reserved.

Charsets whose leading byte is in the range 0x80 - 0x9F are called official and are used for built-in charsets. Charsets whose leading byte is in the range 0xA0 - 0xFF are called private; these are user-defined charsets.

More specifically:

     Character set           Leading byte
     -------------           ------------
     ASCII                   0
     Composite               0x80
     Dimension-1 Official    0x81 - 0x8D
                               (0x8E is free)
     Control-1               0x8F
     Dimension-2 Official    0x90 - 0x99
                               (0x9A - 0x9D are free;
                                0x9E and 0x9F are reserved)
     Dimension-1 Private     0xA0 - 0xEF
     Dimension-2 Private     0xF0 - 0xFF
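
The table above can be read as a simple range lookup. The following is a minimal sketch in Python, not SXEmacs code: the function name `classify_leading_byte` and the table name `LEADING_BYTE_RANGES` are hypothetical, chosen only for illustration, and the ranges are copied directly from the table.

```python
# Illustrative sketch only; these names are hypothetical and do not
# correspond to anything in the SXEmacs source.

# (low, high, category) triples, copied from the leading-byte table.
LEADING_BYTE_RANGES = [
    (0x00, 0x00, "ASCII"),
    (0x80, 0x80, "Composite"),
    (0x81, 0x8D, "Dimension-1 Official"),
    (0x8E, 0x8E, "free"),
    (0x8F, 0x8F, "Control-1"),
    (0x90, 0x99, "Dimension-2 Official"),
    (0x9A, 0x9D, "free"),
    (0x9E, 0x9F, "reserved"),
    (0xA0, 0xEF, "Dimension-1 Private"),
    (0xF0, 0xFF, "Dimension-2 Private"),
]


def classify_leading_byte(lb: int) -> str:
    """Return the table category that a leading byte falls into."""
    for low, high, category in LEADING_BYTE_RANGES:
        if low <= lb <= high:
            return category
    # Values 0x01 - 0x7F and anything above 0xFF are not leading bytes.
    raise ValueError(f"not a leading byte: {lb:#x}")
```

For example, `classify_leading_byte(0x8F)` returns `"Control-1"`, and any value from 0x01 to 0x7F raises `ValueError`, since only ASCII uses a leading byte below 0x80.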

There are two internal encodings for characters in SXEmacs/Mule. One is called string encoding and is an 8-bit encoding that is used for representing characters in a buffer or string. It uses 1 to 4 bytes per character. The other is called character encoding and is a 19-bit encoding that is used for representing characters individually in a variable.

(In the following descriptions, we'll ignore composite characters for the moment. We also give a general (structural) overview first, followed later by the exact details.)