Categories of Data ValuesMySQL knows about several general categories in which data values can be represented. These include numbers, string values, temporal values such as dates and times, spatial values, and the NULL value. Numeric ValuesNumbers are values such as 48 or 193.62. MySQL understands numbers specified as integers (which have no fractional part) and floating-point or fixed-point values (which may have a fractional part). Integers can be specified in decimal or hexadecimal format. An integer consists of a sequence of digits with no decimal point. In numeric contexts, an integer can be specified as a hexadecimal constant and is treated as a 64-bit integer. For example, 0x10 is 16 decimal. Hexadecimal values are treated as strings by default, so their syntax is given in the next section, "String Values." A floating-point or fixed-point number consists of a sequence of digits, a decimal point, and another sequence of digits. The sequence of digits before or after the decimal point may be empty, but not both. MySQL understands scientific notation. This is indicated by immediately following an integer or floating-point number with 'e' or 'E', a sign character ('+' or '-'), and an integer exponent. 1.34E+12 and 43.27e-1 are legal numbers in scientific notation. The number 1.34E12 is also legal even though it is missing an optional sign character before the exponent. Hexadecimal numbers cannot be used in scientific notation; the 'e' that begins the exponent part is also a legal hex digit and thus would be ambiguous. Any number can be preceded by a plus or minus sign character ('+' or '-'), to indicate a positive or negative value. As of MySQL 5.0.3, bit-field values can be written as b'val', where val consists of one or more binary digits (0 or 1). For example, b'1001' is 9 decimal. This notation coincides with the introduction of the BIT data type, but bit-field values can be used more generally in other contexts. mysql> SELECT b'101010' + 0; +---------------+ | b'101010' + 0 | +---------------+ | 42 | +---------------+ mysql> SELECT CHAR(b'1100001'); +------------------+ | CHAR(b'1100001') | +------------------+ | a | +------------------+ String ValuesStrings are values such as 'Madison, Wisconsin', 'patient shows improvement', or even '12345' (which looks like a number, but isn't). Usually, you can use either single or double quotes to surround a string value, but there are two reasons to stick with single quotes:
For the examples that use the double quote as a string quoting character in the discussion that follows, assume that ANSI_QUOTES mode is not enabled. MySQL recognizes several escape sequences within strings that indicate special characters, as shown in Table 3.1. Each sequence begins with a backslash character ('\') to signify a temporary escape from the usual rules for character interpretation. Note that a NUL byte is not the same as the SQL NULL value; NUL is a zero-valued byte, whereas NULL in SQL signifies the absence of a value.
The escape sequences shown in the table are case sensitive, and any character not listed in the table is interpreted as itself if preceded by a backslash. For example, \t is a tab, but \T is an ordinary 'T' character. The table shows how to escape single or double quotes using backslash sequences, but you actually have several options for including quote characters within string values:
To turn off the special meaning of backslash and treat it as an ordinary character, enable the NO_BACKSLASH_ESCAPES SQL mode, which is available as of MySQL 5.0.2. As an alternative to using quotes for writing string values, you can use two forms of hexadecimal notation. The first consists of '0x' followed by one or more hexadecimal digits ('0' through '9' and 'a' through 'f'). For example, 0x0a is 10 decimal, and 0xffff is 65535 decimal. The non-decimal hex digits ('a' through 'f') can be specified in uppercase or lowercase, but the leading '0x' cannot be given as '0X'. That is, 0x0a and 0x0A are legal hexadecimal values, but 0X0a and 0X0A are not. In string contexts, pairs of hexadecimal digits are interpreted as 8-bit numeric byte values in the range from 0 to 255, and the result is used as a string. In numeric contexts, a hexadecimal constant is treated as a number. The following statement illustrates the interpretation of a hex constant in each type of context:
mysql> SELECT 0x61626364, 0x61626364+0;
+------------+--------------+
| 0x61626364 | 0x61626364+0 |
+------------+--------------+
| abcd | 1633837924 |
+------------+--------------+
If a hexadecimal value written using 0x notation has an odd number of hex digits, MySQL treats it as though the value has a leading zero. For example, 0xa is treated as 0x0a. String values may also be specified using the standard SQL notation X'val', where val consists of pairs of hexadecimal digits. As with 0x notation, such values are interpreted as strings, but may be used as numbers in a numeric context:
mysql> SELECT X'61626364', X'61626364'+0;
+-------------+---------------+
| X'61626364' | X'61626364'+0 |
+-------------+---------------+
| abcd | 1633837924 |
+-------------+---------------+
Unlike 0x notation, the leading 'X' is not case sensitive:
mysql> SELECT X'61', x'61';
+-------+-------+
| X'61' | x'61' |
+-------+-------+
| a | a |
+-------+-------+
Properties of Binary and Non-Binary StringsString values fall into two general categories, binary and non-binary:
Character units vary in their storage requirements. A single-byte character set such as latin1 uses one byte per character, but there also are multi-byte character sets in which some or all characters require more than one byte. For example, both of the Unicode character sets available in MySQL are multi-byte. ucs2 is a double-byte character set in which each character requires two bytes. utf8 is a variable-length multi-byte character set with characters that take from one to three bytes. To find out which character sets and collations are available in your server as it currently is configured, use these two statements: mysql> SHOW CHARACTER SET; +----------+-----------------------------+---------------------+--------+ | Charset | Description | Default collation | Maxlen | +----------+-----------------------------+---------------------+--------+ | big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 | | dec8 | DEC West European | dec8_swedish_ci | 1 | | cp850 | DOS West European | cp850_general_ci | 1 | | hp8 | HP West European | hp8_english_ci | 1 | | koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 | | latin1 | ISO 8859-1 West European | latin1_swedish_ci | 1 | ... | utf8 | UTF-8 Unicode | utf8_general_ci | 3 | | ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 | ... +----------+-----------------------------+---------------------+--------+ mysql> SHOW COLLATION; +----------------------+----------+-----+---------+----------+---------+ | Collation | Charset | Id | Default | Compiled | Sortlen | +----------------------+----------+-----+---------+----------+---------+ | big5_chinese_ci | big5 | 1 | Yes | Yes | 1 | | big5_bin | big5 | 84 | | Yes | 1 | ... | latin1_german1_ci | latin1 | 5 | | | 0 | | latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 | | latin1_danish_ci | latin1 | 15 | | | 0 | | latin1_german2_ci | latin1 | 31 | | Yes | 2 | | latin1_bin | latin1 | 47 | | Yes | 1 | | latin1_general_ci | latin1 | 48 | | | 0 | | latin1_general_cs | latin1 | 49 | | | 0 | | latin1_spanish_ci | latin1 | 94 | | | 0 | ... +----------------------+----------+-----+---------+----------+---------+ As shown by the output from SHOW COLLATION, each collation is specific to a given character set, but a given character set might have several collations. Collation names usually consist of a character set name, a language, and an additional suffix. For example, utf8_icelandic_ci is a collation for the utf8 Unicode character set in which comparisons follow Icelandic sorting rules and characters are compared in case-insensitive fashion. Collation suffixes have the following meanings:
The sorting properties for binary and non-binary strings differ as follows:
Because collations are used for comparison and sorting, they affect many operations:
To determine the character set or collation of a string, you can use the CHARSET() and COLLATION() functions. Quoted string literals are interpreted according to the current server settings. The default character set and collation are latin1 and latin1_swedish_ci:
mysql> SELECT CHARSET('abcd'), COLLATION('abcd');
+-----------------+-------------------+
| CHARSET('abcd') | COLLATION('abcd') |
+-----------------+-------------------+
| latin1 | latin1_swedish_ci |
+-----------------+-------------------+
MySQL treats hexadecimal constants as binary strings by default:
mysql> SELECT CHARSET(0x0123), COLLATION(0x123);
+-----------------+------------------+
| CHARSET(0x0123) | COLLATION(0x123) |
+-----------------+------------------+
| binary | binary |
+-----------------+------------------+
Two forms of notation can be used to force a string literal to be interpreted with a given character set:
Introducer notation works for literal quoted strings or hexadecimal constants, but not for string expressions or column values. However, any string or string expression can be used to produce a string in a designated character set using the CONVERT() function: CONVERT(str USING charset); Introducers and CONVERT() are not the same. An introducer does not change the string value; it merely modifies how the string is interpreted. CONVERT() takes a string argument and produces a new string in the desired character set. To see the difference between introducers and CONVERT(), consider the following two statements that refer to the ucs2 double-byte character set: mysql> SET @s1 = _ucs2 'ABCD'; mysql> SET @s2 = CONVERT('ABCD' USING ucs2); Assume that the default character set is latin1 (a single-byte character set). The first statement interprets each pair of characters in the string 'ABCD' as a single double-byte ucs2 character, resulting in a two-character ucs2 string. The second statement converts each character of the string 'ABCD' to the corresponding ucs2 character, resulting in a four-character ucs2 string. What is the "length" of each string? It depends. If you measure with CHAR_LENGTH(), you get the length in characters. If you measure with LENGTH(), you get the length in bytes:
mysql> SELECT CHAR_LENGTH(@s1), LENGTH(@s1), CHAR_LENGTH(@s2), LENGTH(@s2);
+------------------+-------------+------------------+-------------+
| CHAR_LENGTH(@s1) | LENGTH(@s1) | CHAR_LENGTH(@s2) | LENGTH(@s2) |
+------------------+-------------+------------------+-------------+
| 2 | 4 | 4 | 8 |
+------------------+-------------+------------------+-------------+
Here is a somewhat subtle point: A binary string is not the same thing as a non-binary string that has a binary collation. The binary string has no character set. It is interpreted with byte semantics and comparisons use single-byte numeric codes. A non-binary string with a binary collation has character semantics and comparisons use numeric character values that might be based on multiple bytes per character. Here's one way to see the difference between binary and non-binary strings with regard to lettercase. Create a binary string and a non-binary string that has a binary collation, and then pass each string to the UPPER() function: mysql> SET @s1 = BINARY 'abcd'; mysql> SET @s2 = _latin1 'abcd' COLLATE latin1_bin; mysql> SELECT UPPER(@s1), UPPER(@s2); +------------+------------+ | UPPER(@s1) | UPPER(@s2) | +------------+------------+ | abcd | ABCD | +------------+------------+ Why doesn't UPPER() convert the binary string to uppercase? This occurs because it has no character set, so there is no way to know which byte values correspond to uppercase or lowercase characters. To use a binary string with functions such as UPPER() and LOWER(), you must first convert it to a non-binary string:
mysql> SELECT @s1, UPPER(CONVERT(@s1 USING latin1));
+------+----------------------------------+
| @s1 | UPPER(CONVERT(@s1 USING latin1)) |
+------+----------------------------------+
| abcd | ABCD |
+------+----------------------------------+
Character SetRelated System VariablesThe server maintains several system variables that are involved in various aspects of character set support. Six of these variables refer to character sets and three refer to collations. Each of the collation variables is linked to a corresponding character set variable.
Very likely you'll find that most character set and collation variables are set to the same value by default. For example, the following output indicates that client/server communication takes place using the latin1 character set: mysql> SHOW VARIABLES LIKE 'character\_set\_%'; +--------------------------+--------+ | Variable_name | Value | +--------------------------+--------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_database | latin1 | | character_set_results | latin1 | | character_set_server | latin1 | | character_set_system | utf8 | +--------------------------+--------+ mysql> SHOW VARIABLES LIKE 'collation\_%'; +----------------------+-------------------+ | Variable_name | Value | +----------------------+-------------------+ | collation_connection | latin1_swedish_ci | | collation_database | latin1_swedish_ci | | collation_server | latin1_swedish_ci | +----------------------+-------------------+ A client that wants to talk to the server using another character set can change the communication-related variables. For example, if you want to use utf8, change three variables: mysql> SET character_set_client = utf8; mysql> SET character_set_results = utf8; mysql> SET character_set_connection = utf8; However, it's more convenient to use a SET NAMES statement for this purpose. The following statement is equivalent to the preceding three SET statements:
mysql> SET NAMES 'utf8';
One restriction on setting the communication character set is that you cannot use ucs2. Many client programs support a --default-character-set option that produces the same effect as a SET NAMES statement by informing the server of the desired communication character set. For variables that come in pairs (a character set variable and a collation variable), the members of the pair are linked in the following ways:
For example, setting character_set_connection to utf8 sets collation_connection to utf8_general_ci. Setting collation_connection to latin1_spanish_ci sets character_set_connection to latin1. Date and Time (Temporal) ValuesDates and times are values such as '2005-06-17' or '12:30:43'. MySQL also understands combined date/time values, such as '2005-06-17 12:30:43'. Take special note of the fact that MySQL represents dates in year-month-day order. This often surprises newcomers to MySQL, although this is standard SQL format (also known as "ISO 8601" format). You can display date values any way you like using the DATE_FORMAT() function, but the default display format lists the year first, and input values must be specified with the year first. Spatial ValuesMySQL 4.1 and up supports spatial values, although currently only for MyISAM tables. This capability allows representation of values such as points, lines, and polygons. For example, the following statement uses the text representation of a point value with X and Y coordinates of (10, 20) to create a POINT and assigns the result to a user-defined variable: SET @pt = POINTFROMTEXT('POINT(10 20)'); The NULL ValueNULL is something of a "typeless" value. Generally, it's used to mean "no value," "unknown value," "missing value," "out of range," "not applicable," "none of the above," and so forth. You can insert NULL values into tables, retrieve them from tables, and test whether a value is NULL. However, you cannot perform arithmetic on NULL values; if you try, the result is NULL. Also, many functions return NULL if you invoke them with a NULL argument. |