16.5. Multibyte Characters

In multibyte character sets, each character is coded as a sequence of one or more bytes (see "Wide Characters and Multibyte Characters" in Chapter 1). Unlike wide characters, each of which is represented by a single object of the type wchar_t, individual multibyte characters may be represented by different numbers of bytes. However, the number of bytes that represent a multibyte character, including any necessary state-shift sequences, is never more than the value of the macro MB_CUR_MAX, which is defined in the header stdlib.h.

C provides standard functions to obtain the wide-character code, or wchar_t value, that corresponds to any given multibyte character, and to convert any wide character to its multibyte representation.

Some multibyte encoding schemes are stateful: the interpretation of a given multibyte sequence may depend on its position with respect to control characters, called shift sequences, that are used in the multibyte stream or string. In such cases, the conversion of a multibyte character to a wide character, or of a multibyte string to a wide string, depends on the current shift state at the point where the first multibyte character is read. For the same reason, converting a wide character to a multibyte character, or a wide string to a multibyte string, may entail inserting appropriate shift sequences in the output.

Conversions between wide and multibyte characters or strings may be necessary when you read or write characters from a wide-oriented stream (see "Byte-Oriented and Wide-Oriented Streams" in Chapter 13). Table 16-17 lists all of the standard library functions for handling multibyte characters.
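As a rough illustration of the conversions described above, the following sketch uses the stdlib.h functions mbtowc() and wctomb(). The helper names mb_char_count() and wide_to_mb() are hypothetical, chosen only for this example. The sketch assumes the encoding of the current locale; a program that never calls setlocale() runs in the "C" locale, in which every multibyte character occupies a single byte.

```c
#include <assert.h>   /* assert */
#include <limits.h>   /* MB_LEN_MAX */
#include <stdlib.h>   /* mbtowc, wctomb */
#include <string.h>   /* strlen */

/* Hypothetical helper: counts the characters (not the bytes) in the
   multibyte string s, using the encoding of the current locale.
   Returns (size_t)-1 if s contains an invalid multibyte sequence. */
size_t mb_char_count(const char *s)
{
    assert(s != NULL);

    size_t remaining = strlen(s);   /* bytes not yet converted */
    size_t count = 0;

    mbtowc(NULL, NULL, 0);          /* reset to the initial shift state */

    while (remaining > 0) {
        wchar_t wc;                 /* receives the wide-character code */
        int len = mbtowc(&wc, s, remaining);
        if (len <= 0)               /* -1: invalid or incomplete sequence */
            return (size_t)-1;
        s += len;
        remaining -= (size_t)len;
        ++count;
    }
    return count;
}

/* Hypothetical helper: stores the multibyte representation of wc in buf
   as a null-terminated string. Returns the number of bytes stored before
   the appended terminator, or -1 if wc has no multibyte representation. */
int wide_to_mb(wchar_t wc, char buf[MB_LEN_MAX + 1])
{
    assert(buf != NULL);

    wctomb(NULL, 0);                /* reset to the initial shift state */

    int len = wctomb(buf, wc);
    if (len >= 0)
        buf[len] = '\0';
    return len;
}
```

The buffer is sized with MB_LEN_MAX rather than MB_CUR_MAX because MB_CUR_MAX depends on the current locale and is not a constant expression, whereas MB_LEN_MAX is an upper bound that holds for every supported locale.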
The letter r in the names of functions declared in wchar.h stands for "restartable." In contrast to the functions declared in stdlib.h, which have no r in their names, the restartable functions take an additional argument: a pointer to an object of the type mbstate_t that stores the shift state of the multibyte character or string argument.
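The following sketch shows how the extra shift-state argument might be used with the restartable function mbrtowc(). The helper name mb_to_wide() and its parameters are hypothetical, introduced only for this illustration. Because the caller owns the mbstate_t object, the conversion does not depend on any hidden state inside the library.

```c
#include <assert.h>   /* assert */
#include <string.h>   /* memset, strlen */
#include <wchar.h>    /* mbrtowc, mbstate_t */

/* Hypothetical helper: converts the multibyte string src to a wide string
   in dst, which has room for dst_len wide characters including the
   terminator. Output is truncated if dst is too small.
   Returns the number of wide characters stored, not counting the
   terminator, or (size_t)-1 on an invalid or incomplete sequence. */
size_t mb_to_wide(wchar_t *dst, size_t dst_len, const char *src)
{
    assert(dst != NULL && src != NULL && dst_len > 0);

    mbstate_t state;
    memset(&state, 0, sizeof state);    /* zeroed: initial shift state */

    size_t remaining = strlen(src);     /* bytes not yet converted */
    size_t n = 0;                       /* wide characters stored */

    while (remaining > 0 && n + 1 < dst_len) {
        size_t len = mbrtowc(&dst[n], src, remaining, &state);
        if (len == (size_t)-1 || len == (size_t)-2)
            return (size_t)-1;          /* invalid or incomplete sequence */
        if (len == 0)
            break;                      /* converted a null character */
        src += len;
        remaining -= len;
        ++n;
    }
    dst[n] = L'\0';
    return n;
}
```

Because the shift state lives in a local object rather than inside the library, two independent conversions can be interleaved, each with its own mbstate_t object, without disturbing one another.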