18.3 Internal Mule Encodings

In SXEmacs/Mule, each character set is assigned a unique number, called a leading byte. The leading byte is used in the encodings of a character. Leading bytes lie in the range 0x80 - 0xFF (ASCII is the exception, with a leading byte of 0), and some values in that range are reserved.

Charsets whose leading byte is in the range 0x80 - 0x9F are called official and are used for built-in charsets. Other charsets are called private and have leading bytes in the range 0xA0 - 0xFF; these are user-defined charsets.

More specifically:

Character set           Leading byte
-------------           ------------
ASCII                   0
Composite               0x80
Dimension-1 Official    0x81 - 0x8D
                          (0x8E is free)
Control-1               0x8F
Dimension-2 Official    0x90 - 0x99
                          (0x9A - 0x9D are free;
                           0x9E and 0x9F are reserved)
Dimension-1 Private     0xA0 - 0xEF
Dimension-2 Private     0xF0 - 0xFF
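
The table can be read as a simple range lookup. The following is a minimal sketch of that lookup in Python, for illustration only; the function name classify_leading_byte is hypothetical and is not part of the SXEmacs sources, which are written in C.

```python
def classify_leading_byte(lb: int) -> str:
    """Return the charset category for a leading byte, following the table.

    Hypothetical helper for illustration; not an SXEmacs API.
    """
    if not 0 <= lb <= 0xFF:
        raise ValueError("a leading byte must fit in one byte")
    if lb == 0x00:
        return "ASCII"
    if lb == 0x80:
        return "Composite"
    if 0x81 <= lb <= 0x8D:
        return "Dimension-1 Official"
    if lb == 0x8E:
        return "free"
    if lb == 0x8F:
        return "Control-1"
    if 0x90 <= lb <= 0x99:
        return "Dimension-2 Official"
    if 0x9A <= lb <= 0x9D:
        return "free"
    if lb in (0x9E, 0x9F):
        return "reserved"
    if 0xA0 <= lb <= 0xEF:
        return "Dimension-1 Private"
    if 0xF0 <= lb <= 0xFF:
        return "Dimension-2 Private"
    # 0x01 - 0x7F never occur as leading bytes.
    return "not a leading byte"
```

For instance, classify_leading_byte(0x92) falls in the Dimension-2 Official range, and classify_leading_byte(0xA0) is the first Dimension-1 Private value.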

There are two internal encodings for characters in SXEmacs/Mule. One is called string encoding and is an 8-bit encoding that is used for representing characters in a buffer or string. It uses 1 to 4 bytes per character. The other is called character encoding and is a 19-bit encoding that is used for representing characters individually in a variable.
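
To make the "1 to 4 bytes" figure concrete, here is a rough sketch of how the length of a character in the string encoding could follow from its leading byte. It rests on assumptions not stated in this section: that ASCII characters are stored without a leading byte, that an official charset stores its leading byte followed by one position byte per dimension, and that a private charset is additionally preceded by one of the reserved bytes (0x9E or 0x9F). Composite characters are left out. The function name string_encoding_length is hypothetical.

```python
def string_encoding_length(lb: int) -> int:
    """Bytes needed for one character in the string encoding, by leading byte.

    Illustrative sketch under the assumptions stated in the text;
    not an SXEmacs API. Composite characters (0x80) are not handled.
    """
    if lb == 0x00:
        return 1  # ASCII: the position byte only (assumed: no leading byte stored)
    if lb == 0x8F or 0x81 <= lb <= 0x8D:
        return 2  # Control-1 / Dimension-1 Official: leading byte + 1 position byte
    if 0x90 <= lb <= 0x99:
        return 3  # Dimension-2 Official: leading byte + 2 position bytes
    if 0xA0 <= lb <= 0xEF:
        return 3  # Dimension-1 Private: assumed prefix byte + leading byte + 1 position byte
    if 0xF0 <= lb <= 0xFF:
        return 4  # Dimension-2 Private: assumed prefix byte + leading byte + 2 position bytes
    raise ValueError("free, reserved, composite, or invalid leading byte: 0x%02X" % lb)
```

Under these assumptions the shortest character is a 1-byte ASCII character and the longest is a 4-byte Dimension-2 Private character, which matches the range given above.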

(In the following descriptions, we’ll ignore composite characters for the moment. We also give a general (structural) overview first, followed later by the exact details.)