SXEmacs User’s Manual: Syntax Entry

27.5.1 Information About Each Character

The syntax table entry for a character is a number that encodes six pieces of information:

The syntactic class of the character, represented as a small integer
The matching delimiter, for delimiter characters only (the matching delimiter of ‘(’ is ‘)’, and vice versa)
A flag saying whether the character is the first character of a two-character comment starting sequence
A flag saying whether the character is the second character of a two-character comment starting sequence
A flag saying whether the character is the first character of a two-character comment ending sequence
A flag saying whether the character is the second character of a two-character comment ending sequence
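
To make the idea of one number carrying six pieces of information concrete, here is a minimal Python sketch. The bit layout, the class number, and the names SyntaxEntry, encode, and decode are all assumptions made for illustration; this section does not document the layout SXEmacs actually uses.

```python
from typing import NamedTuple

# Hypothetical layout (NOT SXEmacs's real one):
#   bits 0-3   syntactic class, a small integer
#   bits 4-11  character code of the matching delimiter (0 = none)
#   bits 12-15 the four two-character comment flags
CLASS_SHIFT, CLASS_MASK = 0, 0xF
MATCH_SHIFT, MATCH_MASK = 4, 0xFF
FLAGS_SHIFT, FLAGS_MASK = 12, 0xF

# One bit per comment flag, in the order the list above gives them.
START_FIRST, START_SECOND, END_FIRST, END_SECOND = 1, 2, 4, 8


class SyntaxEntry(NamedTuple):
    syntax_class: int  # small integer naming the class
    match: str         # matching delimiter, or "" if there is none
    flags: int         # bitwise OR of the four comment flags


def encode(entry: SyntaxEntry) -> int:
    """Pack the six pieces of information into a single integer."""
    match_code = ord(entry.match) if entry.match else 0
    return (
        ((entry.syntax_class & CLASS_MASK) << CLASS_SHIFT)
        | ((match_code & MATCH_MASK) << MATCH_SHIFT)
        | ((entry.flags & FLAGS_MASK) << FLAGS_SHIFT)
    )


def decode(number: int) -> SyntaxEntry:
    """Recover the six pieces of information from the packed integer."""
    match_code = (number >> MATCH_SHIFT) & MATCH_MASK
    return SyntaxEntry(
        syntax_class=(number >> CLASS_SHIFT) & CLASS_MASK,
        match=chr(match_code) if match_code else "",
        flags=(number >> FLAGS_SHIFT) & FLAGS_MASK,
    )


# Usage: an opening delimiter whose partner is ')'.
OPEN_CLASS = 4  # hypothetical class number
open_paren = SyntaxEntry(OPEN_CLASS, ")", 0)
packed = encode(open_paren)
```

Decoding the packed value gives back the original entry, which is all the sketch is meant to show: the class, the matching delimiter, and the four flags each occupy their own field of the number.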

The syntactic classes are stored internally as small integers, but are usually described to or by the user with characters. For example, ‘(’ is used to specify the syntactic class of opening delimiters. Here is a table of syntactic classes, with the characters that specify them.

‘-’: The class of whitespace characters. Please don’t use the formerly advertised space character (‘ ’), which is not supported by GNU Emacs.
‘w’: The class of word-constituent characters.
‘_’: The class of characters that are part of symbol names but not words. This class is represented by ‘_’ because the character ‘_’ has this class in both C and Lisp.
‘.’: The class of punctuation characters that do not fit into any other special class.
‘(’: The class of opening delimiters.
‘)’: The class of closing delimiters.
‘'’: The class of expression-adhering characters. These characters are part of a symbol if found within or adjacent to one, and are part of a following expression if immediately preceding one, but are like whitespace if surrounded by whitespace.
‘"’: The class of string-quote characters. They match each other in pairs, and the characters within the pair all lose their syntactic significance except for the ‘\’ and ‘/’ classes of escape characters, which can be used to include a string-quote inside the string.
‘$’: The class of self-matching delimiters. This is intended for TeX’s ‘$’, which is used both to enter and leave math mode. Thus, a pair of matching ‘$’ characters surrounds each piece of math mode TeX input. A pair of adjacent ‘$’ characters acts like a single one for purposes of matching.
‘/’: The class of escape characters that simply deny the following character its special syntactic significance. The character after one of these escapes is always treated as alphabetic.
‘\’: The class of C-style escape characters. In practice, these are treated just like ‘/’-class characters, because the extra possibilities for C escapes (such as being followed by digits) have no effect on where the containing expression ends.
‘<’: The class of comment-starting characters. Only single-character comment starters (such as ‘;’ in Lisp mode) are represented this way.
‘>’: The class of comment-ending characters. Newline has this syntax in Lisp mode.
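
The interaction between the string-quote class and the two escape classes can be sketched in Python. This is a toy scanner, not SXEmacs code: the function name string_end and its parameters are hypothetical, and it assumes a string is closed by the same quote character that opened it.

```python
def string_end(text: str, start: int, quotes: str = '"', escapes: str = "\\") -> int:
    """Return the index just past the string opening at text[start].

    `quotes` lists the characters in the string-quote class and `escapes`
    those in the escape classes. Returns -1 if the string is unterminated.
    """
    opener = text[start]
    if opener not in quotes:
        raise ValueError("text[start] is not a string-quote character")
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch in escapes:
            # The escaped character loses its special significance,
            # so skip it even if it is a string-quote.
            i += 2
            continue
        if ch == opener:
            return i + 1
        i += 1
    return -1


# Usage: the escaped quote in the middle does not end the string.
sample = r'say "a \" b" end'
end = string_end(sample, 4)
```

Here sample[4:end] covers the whole quoted string, including the escaped quote inside it.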

The characters flagged as part of two-character comment delimiters can have other syntactic functions most of the time. For example, ‘/’ and ‘*’ in C code, when found separately, have nothing to do with comments. The comment-delimiter significance takes precedence only when the two characters occur together in the proper order.

Only the list and sexp commands use the syntax table to find comments; the commands specifically for comments have other variables that tell them where to find comments. Moreover, the list and sexp commands notice comments only if parse-sexp-ignore-comments is non-nil. This variable is set to nil in modes where comment-terminator sequences are liable to appear where there is no comment, for example, in Lisp mode, where the comment terminator is a newline but not every newline ends a comment.
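
How the four two-character comment flags combine can be illustrated with a small Python sketch. The flag values, the C_FLAGS table, and the function comment_spans are hypothetical names chosen for this example. The scanner is a simplified model of the idea and makes no claim about how the list and sexp commands are implemented.

```python
# One bit per flag (values are arbitrary, chosen for this sketch).
START_FIRST, START_SECOND, END_FIRST, END_SECOND = 1, 2, 4, 8

# Hypothetical flag table for C-style /* ... */ comments:
# '/' is first in the starter and second in the ender,
# '*' is second in the starter and first in the ender.
C_FLAGS = {
    "/": START_FIRST | END_SECOND,
    "*": START_SECOND | END_FIRST,
}


def comment_spans(text, flags=C_FLAGS):
    """Return (start, end) index pairs for each two-character-delimited comment."""
    spans = []
    n = len(text)
    i = 0
    while i < n - 1:
        first, second = text[i], text[i + 1]
        if flags.get(first, 0) & START_FIRST and flags.get(second, 0) & START_SECOND:
            end = n  # an unterminated comment runs to the end of the text
            j = i + 2
            while j < n - 1:
                if flags.get(text[j], 0) & END_FIRST and flags.get(text[j + 1], 0) & END_SECOND:
                    end = j + 2
                    break
                j += 1
            spans.append((i, end))
            i = end
        else:
            # A flagged character on its own keeps its ordinary syntax.
            i += 1
    return spans


# Usage: the lone '/' and '*' are ignored; only the adjacent pair opens a comment.
line = "a / b * c /* note */ d"
spans = comment_spans(line)
```

In this example the only span found is the one covering "/* note */"; the division and multiplication operators earlier in the line are left alone because they do not occur as an adjacent pair in the proper order.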