charmap — character symbols to define character encodings
A character set description (charmap) defines the characters available in a character set and their encodings. All supported character sets should have the portable character set as a proper subset.
The charmap file starts with a header that may contain the following keywords:
<codeset>
is followed by the name of the codeset.
<mb_cur_max>
is followed by the maximum number of bytes for a multibyte character. Multibyte characters are currently not supported. The default value is 1.
<mb_cur_min>
is followed by the minimum number of bytes for a character. This value must be less than or equal to mb_cur_max. If not specified, it defaults to mb_cur_max.
<escape_char>
is followed by a character that should be used as the escape character for the rest of the file to mark characters that should be interpreted in a special way. It defaults to the backslash (\).
<comment_char>
is followed by a character that will be used as the comment character for the rest of the file. It defaults to the number sign (#).
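The header rules above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name parse_charmap_header, the whitespace-separated "keyword value" line layout, and the skipping of blank and comment lines in the header are all assumptions made for the example.

```python
# Hypothetical helper; not part of any library.
DEFAULTS = {"escape_char": "\\", "comment_char": "#", "mb_cur_max": 1}


def parse_charmap_header(lines):
    """Collect header keywords that appear before the CHARMAP line."""
    header = dict(DEFAULTS)
    for line in lines:
        if line.startswith("CHARMAP"):
            break  # the charmap definition itself starts here
        stripped = line.strip()
        # Assumption: blank lines and comment lines are ignored in the header.
        if not stripped or stripped.startswith(header["comment_char"]):
            continue
        parts = stripped.split(None, 1)
        if len(parts) != 2:
            continue  # keyword without a value; ignored in this sketch
        keyword, value = parts[0], parts[1].strip()
        if not (keyword.startswith("<") and keyword.endswith(">")):
            continue  # not a header keyword
        name = keyword[1:-1]
        if name in ("mb_cur_max", "mb_cur_min"):
            header[name] = int(value)
        else:
            header[name] = value
    # mb_cur_min defaults to mb_cur_max and must not exceed it.
    header.setdefault("mb_cur_min", header["mb_cur_max"])
    if header["mb_cur_min"] > header["mb_cur_max"]:
        raise ValueError("mb_cur_min must be less than or equal to mb_cur_max")
    return header
```

Applying the defaults first and overriding them as keywords are read keeps the "defaults to" rules in one place. The mb_cur_min check runs only after the whole header has been read, so the two keywords may appear in either order.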
The charmap definition itself starts with the keyword CHARMAP in column 1.
The following lines may have one of the following two forms to define the character encodings:
<symbolic-name> <encoding> <comments>
This form defines exactly one character and its encoding.
<symbolic-name>...<symbolic-name> <encoding> <comments>
This form defines a range of characters. This is only useful for multibyte characters, which are currently not implemented.
The last line in a charmap-definition file must contain END CHARMAP.
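As a rough sketch of how the definition body could be read, the Python function below collects single-character definitions between CHARMAP and END CHARMAP. It assumes that each definition line holds a symbolic name, an encoding, and optional comments separated by whitespace, and that symbolic names contain no whitespace. The function name parse_charmap_body is hypothetical, and encodings are kept as opaque strings because their syntax is not described here.

```python
# Hypothetical helper; not part of any library.
def parse_charmap_body(lines, comment_char="#"):
    """Return {symbolic_name: encoding} for the CHARMAP ... END CHARMAP section."""
    mapping = {}
    in_body = False
    for line in lines:
        if not in_body:
            # The CHARMAP keyword must start in column 1.
            in_body = line.startswith("CHARMAP")
            continue
        stripped = line.strip()
        if stripped == "END CHARMAP":
            return mapping
        # Assumption: blank lines and comment lines are ignored.
        if not stripped or stripped.startswith(comment_char):
            continue
        # Split into name, encoding, and an optional trailing comment.
        fields = stripped.split(None, 2)
        if len(fields) < 2:
            raise ValueError("definition line needs a name and an encoding")
        mapping[fields[0]] = fields[1]
    raise ValueError("charmap definition is not terminated by END CHARMAP")
```

A call such as parse_charmap_body(open(path).read().splitlines()) would yield a dictionary keyed by the bracketed symbolic names. The range form is deliberately left out of this sketch.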
A symbolic name for a character contains only characters of the portable character set. The name itself is enclosed between angle brackets. Characters following an <escape_char> are interpreted as themselves; for example, the sequence '<\\\>>' represents the symbolic name '\>' enclosed in angle brackets.
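The escaping rule can be illustrated with a short Python sketch. The function name parse_symbolic_name is hypothetical, and treating a trailing escape character or a missing closing bracket as an error is an assumption made for the example.

```python
# Hypothetical helper; not part of any library.
def parse_symbolic_name(text, escape_char="\\"):
    """Return the name inside the angle brackets with escapes resolved."""
    if not text.startswith("<"):
        raise ValueError("symbolic name must start with '<'")
    name = []
    i = 1
    while i < len(text):
        ch = text[i]
        if ch == escape_char:
            i += 1
            if i >= len(text):
                raise ValueError("dangling escape character")
            name.append(text[i])  # the escaped character stands for itself
        elif ch == ">":
            return "".join(name)  # an unescaped '>' closes the name
        else:
            name.append(ch)
        i += 1
    raise ValueError("missing closing '>'")
```

With the default escape character, parse_symbolic_name(r"<\\\>>") returns the two-character name \>, which matches the example above: the first backslash escapes the second, and the third escapes the '>'.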