3.3. Character Constants

A character constant consists of one or more characters enclosed in single quotation marks. Some examples:

    'a'   'XY'   '0'   '*'

All the characters of the source character set are permissible in character constants, except the single quotation mark ', the backslash \, and the newline character. To represent these characters, you must use escape sequences:

    '\''   '\\'   '\n'

All the escape sequences that are permitted in character constants are described in the upcoming section "Escape Sequences."

3.3.1. The Type of Character Constants

Character constants have the type int, unless they are explicitly defined as wide characters, with type wchar_t, by the prefix L. If a character constant contains one character that can be represented in a single byte, then its value is the character code of that character in the execution character set. For example, the constant 'a' in ASCII encoding has the decimal value 97. The value of character constants that consist of more than one character can vary from one compiler to another.

The following code fragment tests whether the character read is a digit between 1 and 5, inclusive:

    #include <stdio.h>
    int c = 0;
    /* ... */
    c = getchar( );                          // Read a character.
    if ( c != EOF && c > '0' && c < '6' )    // Compare input to character
                                             // constants.
    {
      /* This block is executed if the user entered a digit from 1 to 5. */
    }

If the type char is signed, then the value of a character constant can also be negative, because the constant's value is the result of a type conversion of the character code from char to int. For example, ISO 8859-1 is a commonly used 8-bit character set, also known as the ISO Latin 1 or ANSI character set.
In this character set, the currency symbol for pounds sterling, £, is coded as hexadecimal A3:

    int c = '\xA3';                            // Symbol for pounds sterling
    printf("Character: %c Code: %d\n", c, c);

If the execution character set is ISO 8859-1, and the type char is signed, then the printf statement in the preceding example generates the following output:

    Character: £ Code: -93

In a program that uses characters that are not representable in a single byte, you can use wide-character constants. Wide-character constants have the type wchar_t, and are written with the prefix L, as in these examples:

    L'a'   L'12'   L'\012'   L'\u03B2'

The value of a wide-character constant that contains a single multibyte character is the value that the standard function mbtowc( ) ("multibyte to wide character") would return for that multibyte character.
3.3.2. Escape Sequences

An escape sequence begins with a backslash \, and represents a single character. Escape sequences allow you to represent any character in character constants and string literals, including nonprintable characters and characters that otherwise have a special meaning, such as ' and ". Table 3-3 lists the escape sequences recognized in C.
In the table, the active position refers to the position at which the output device prints the next output character, such as the position of the cursor on a console display. The behavior of the output device is not defined in the following cases: if the escape sequence \b (backspace) occurs at the beginning of a line; if \t (tab) occurs at the end of a line; or if \v (vertical tab) occurs at the end of a page.

As Table 3-3 shows, universal character names are also considered escape sequences. Universal character names allow you to specify any character in the extended character set, regardless of the encoding used. See "Universal Character Names" in Chapter 1 for more information.

You can also specify any character code in the value range of the type unsigned char, or any wide-character code in the value range of wchar_t, using the octal and hexadecimal escape sequences, as shown in Table 3-4.
There is no equivalent octal notation for the last constant in the table, L'\xF82', because octal escape sequences cannot have more than three octal digits. For the same reason, the wide-character constant L'\3702' consists of two characters: L'\370' and L'2'.