< Day Day Up > |
4.10 Column TypesMySQL can work with many different kinds of data. Generally speaking, data values can be grouped into three categories:
When you create a table, the declaration for each of its columns includes the column name, a datatype specification that indicates what kind of values the column may hold, and possibly some options that more specifically define how MySQL should handle the column. For example, the following statement creates a table named people, which contains a numeric column named id and two 30-byte string columns named first_name and last_name: CREATE TABLE people ( id INT, first_name CHAR(30), last_name CHAR(30) ); The column definitions in this CREATE TABLE statement contain only names and column datatype specifications. To control the use of a column more specifically, options may be added to its definition. For example, to disallow negative values in the id column, add the UNSIGNED option. To disallow NULL (missing or unknown) values in any of the columns, add NOT NULL to the definition of each one. The modified CREATE TABLE statement looks like this: CREATE TABLE people ( id INT UNSIGNED NOT NULL, first_name CHAR(30) NOT NULL, last_name CHAR(30) NOT NULL ); For each of the general datatype categories (number, string, date, and time), MySQL has several specific column types from which to choose. It's important to properly understand the datatypes that are available for representing data, to avoid choosing a column type that isn't appropriate. The following sections provide a general description of the column datatypes and their properties. For additional details, the MySQL Reference Manual provides an extensive discussion on column datatypes. 4.10.1 Numeric Column TypesMySQL provides numeric column types for integer values, values with a fixed number of decimal places, and floating-point values that have a variable number of decimal places. When you choose a numeric column datatype, consider the following factors:
4.10.1.1 Integer Column TypesInteger datatypes include TINYINT, SMALLINT, MEDIUMINT, INT, and BIGINT. Smaller datatypes require less storage space, but are more limited in the range of values they represent. For example, a TINYINT column has a small range (–128 to 127), but its values take only one byte each to store. INT has a much larger range (–2,147,483,648 to 2,147,483,647) but its values require four bytes each. The integer datatypes are summarized in the following table, which indicates the amount of storage each type requires as well as its range. For integer values declared with the UNSIGNED option, negative values are not allowed, and the high end of the range shifts upward to approximately double the maximum value.
Integer datatypes may be declared with a display width, which affects the number of characters used to display column values in query output. For example, assume that you declare an INT column with a display width of 4 like this: century INT(4) The result is that values in the century column will usually be displayed four digits wide. It's important to remember that the display width is unrelated to the range of the datatype. The display width you define for a column affects only the maximum number of digits MySQL will use to display column values. Values shorter than the display width are padded with spaces as necessary. Note also that the display width is not a hard limit; it won't cause output truncation of a value that's too long to fit within the width. Instead, the full value will be shown. For example, assume that you've inserted the number 57622 into the century column. When you SELECT the column in a query, MySQL will display the entire value (57622) rather than just the first four digits of the value. If you don't specify a display width for an integer type, MySQL chooses a default based on the number of characters needed to display the full range of values for the type (including the minus sign). For example, SMALLINT has a default display width of 6 because the widest possible value is -32768. 4.10.1.2 Floating-Point and Fixed-Decimal Column TypesThe floating-point datatypes include FLOAT and DOUBLE. The fixed-point datatype is DECIMAL. Each of these types may be used to represent numeric values that have a scale, or fractional part. FLOAT and DOUBLE datatypes represent floating-point values in the native binary format used by the server host's CPU. This is a very efficient type for storage and computation, but values are subject to rounding error. DECIMAL uses a fixed-decimal storage format: All values in a DECIMAL column have the same number of decimal places. Values are stored in string format using one byte per digit. Numbers represented as strings cannot be processed as quickly as numbers represented in binary, so operations on DECIMAL columns are slower than operations on FLOAT and DOUBLE columns. DECIMAL values are not subject to rounding error when stored, which makes the DECIMAL column type a popular choice for financial applications involving currency calculations. However, be aware that currently MySQL does internal calculations using floating-point arithmetic, which can produce rounding error in the result. FLOAT and DOUBLE are used to represent single-precision and double-precision floating-point values. They use 4 and 8 bytes each for storage, respectively. By default, MySQL represents values stored in FLOAT and DOUBLE columns to the maximum precision allowed by the hardware, but you can specify a display width and precision in the column definition. The following single-precision column definition specifies a display width of 10 digits, with a precision of 4 decimals: avg_score FLOAT(10,4) DECIMAL columns may also be declared with a display width and scale. If you omit them, the defaults are 10 and 0, so the following declarations are equivalent: total DECIMAL total DECIMAL(10) total DECIMAL(10,0) If you want to represent values such as dollar-and-cents currency figures, you can do so using a two-digit scale: total DECIMAL(10,2) The amount of storage required for DECIMAL column values depends on the type. Normally, the number of bytes of storage required per value is equal to the display width plus 2. For example, DECIMAL(6,3) requires 8 bytes: the display width is 6 and 2 bytes are needed to store the sign and decimal point. If the scale is 0, no decimal point needs to be stored, so one fewer byte is required. If the column is UNSIGNED, no sign character needs to be stored, also requiring one fewer byte. 4.10.2 String Column TypesThe string column types are listed in the following table:
When you choose a string datatype, consider the following factors:
The following discussion first describes the general differences between binary and nonbinary strings, and then the specific characteristics of each of the string column datatypes. 4.10.2.1 Binary and Nonbinary String CharacteristicsStrings in MySQL may be treated as binary or nonbinary. The two types are each most suited to different purposes. Binary strings have the following characteristics:
Nonbinary strings are associated with a character set. The character set affects interpretation of string contents and sorting as follows:
The preceding remarks regarding case and accent sensitivity are not absolute, just typical. A given character set can be defined with a collating order that's case or accent sensitive, or both. MySQL takes care to create character sets that correspond to the sorting order rules of different languages. String comparison rules are addressed in more detail in section 6.1.1, "Case Sensitivity in String Comparisons." The different treatment of binary and nonbinary strings in MySQL is important when it comes to choosing datatypes for table columns. If you want column values to be treated as case and accent insensitive, you should choose a nonbinary column type. Conversely, if you want case and accent sensitive values, choose a binary type. You should also choose a binary type for storing raw data values that consist of untyped bytes. The CHAR and VARCHAR string column types are nonbinary by default, but can be made binary by including the keyword BINARY in the column definition. Other string types are inherently binary or nonbinary. BLOB columns are always binary, whereas TEXT columns are always nonbinary. You can mix binary and nonbinary string columns within a single table. For example, assume that you want to create a table named auth_info, to store login name and password authorization information for an application. You want login names to match in any lettercase but passwords to be case sensitive. This statement would accomplish the task: CREATE TABLE auth_info ( login CHAR(16), # not case sensitive password CHAR(16) BINARY # case sensitive ); 4.10.2.2 The CHAR and VARCHAR Column TypesThe CHAR and VARCHAR column types hold strings up to the maximum length specified in the column definition. To define a column with either of these datatypes, provide the column name, the keyword CHAR or VARCHAR, the maximum length of acceptable values in parentheses, and possibly the keyword BINARY. The maximum length should be a number from 0 to 255. (One of the sample exercises at the end of this chapter discusses why you might declare a zero-length column.) By default, CHAR and VARCHAR columns contain nonbinary strings. The BINARY modifier causes the values they contain to be treated as binary strings. The CHAR datatype is a fixed-length type. Values in a CHAR column always take the same amount of storage. A column defined as CHAR(30), for example, requires 30 bytes for each value, even empty values. In contrast, VARCHAR is a variable-length datatype. A VARCHAR column takes only the number of bytes required to store each value, plus one byte per value to record the value's length. For MySQL 4.0, the length for CHAR and VARCHAR columns is measured in bytes, not characters. There's no difference for single-byte character sets, but the two measures are different for multi-byte character sets. In MySQL 4.1, this will change; column lengths will be measured in characters. For example, CHAR(30) will mean 30 characters, even for multi-byte character sets. 4.10.2.3 The BLOB and TEXT Column TypesThe BLOB and TEXT datatypes each come in four different sizes, differing in the maximum length of values they can store. All are variable-length types, so an individual value requires storage equal to the length (in bytes) of the value, plus 1 to 4 bytes to record the length of the value. The following table summarizes these datatypes; L represents the length of a given value.
BLOB column values are always binary and TEXT column values are always nonbinary. When deciding which of the two to choose for a column, you would normally base your decision on whether you want to treat column values as case sensitive or whether they contain raw bytes rather than characters. BLOB columns are more suitable for case-sensitive strings or for raw data such as images or compressed data. TEXT columns are more suitable for case-insensitive character strings such as textual descriptions. 4.10.2.4 The ENUM and SET Column TypesTwo of the string column types, ENUM and SET, are used when the values to be stored in a column are chosen from a fixed set of values. You define columns for both types in terms of string values, but MySQL represents them internally as integers. This leads to very efficient storage, but can have some surprising results unless you keep this string/integer duality in mind. ENUM is an enumeration type. An ENUM column definition includes a list of allowable values; each value in the list is called a "member" of the list. Every value stored in the column must equal one of the values in the list. A simple (and very common) use for ENUM is to create a two-element list for columns that store yes/no or true/false choices. The following table shows how to declare such columns: CREATE TABLE booleans ( yesno ENUM('Y','N'), truefalse ENUM('T','F') ); Enumeration values aren't limited to being single letters or uppercase. The columns could also be defined like this: CREATE TABLE booleans ( yesno ENUM('yes','no'), truefalse ENUM('true','false') ); An ENUM column definition may list up to 65,535 members. Enumerations with up to 255 members require one byte of storage per value. Enumerations with 256 to 65,535 members require two bytes per value. The following table contains an enumeration column continent that lists continent names as valid enumeration members: CREATE TABLE Countries ( name char(30), continent ENUM ('Asia','Europe','North America','Africa', 'Oceania','Antarctica','South America') ); The values in an ENUM column definition are given as a comma-separated list of quoted strings. Internally, MySQL stores the strings as integers, using the values 1 through n for a column with n enumeration members. The following statement assigns the enumeration value 'Africa' to the continent column; MySQL actually stores the value 4 because 'Africa' is the fourth continent name listed in the enumeration definition: INSERT INTO Countries (name,continent) VALUES('Kenya','Africa'); MySQL reserves the internal value 0 as an implicit member of all ENUM columns. It's used to represent illegal values assigned to an enumeration column. For example, if you assign 'USA' to the continent column, MySQL will store the value 0, rather than any of the values 1 through 7, because 'USA' is not a valid enumeration member. If you select the column later, MySQL displays 0 values as '' (the empty string). The SET datatype, like ENUM, is declared using a comma-separated list of quoted strings that define its valid members. But unlike ENUM, a given SET column may be assigned a value consisting of any combination of those members. The following definition contains a list of symptoms exhibited by allergy sufferers: CREATE TABLE allergy ( symptom SET('sneezing','runny nose','stuffy head','red eyes') ); A patient may have any or all (or none) of these symptoms, and symptom values therefore might contain zero to four individual SET members, separated by commas. The following statements set the symptom column to the empty string (no SET members), a single SET member, and multiple SET members, respectively: INSERT INTO allergy (symptom) VALUES(''); INSERT INTO allergy (symptom) VALUES('stuffy head'); INSERT INTO allergy (symptom) VALUES('sneezing,red eyes'); MySQL represents SET columns as a bitmap using one bit per member, so the elements in the symptom definition have internal values of 1, 2, 4, and 8 (that is, they have the values of bits 0 through 3 in a byte). Internally, MySQL stores the values shown in the preceding INSERT statements as 0 (no bits set), 4 (bit 2 set), and 9 (bits 0 and 3 set; that is, 1 plus 8). A SET definition may contain up to 64 members. The internal storage required for set values varies depending on the number of SET elements (1, 2, 3, 4, or 8 bytes for sets of up to 8, 16, 24, 32, or 64 members). If you try to store an invalid list member into a SET column, it's ignored because it does not correspond to any bit in the column definition. For example, setting a symptom value to 'coughing,sneezing,wheezing' results in an internal value of 1 ('sneezing'). The 'coughing' and 'wheezing' elements are ignored because they aren't listed in the column definition as legal set members. As mentioned earlier in this section, the conversion between string and numeric representations of ENUM and SET values can result in surprises if you aren't careful. For example, although you would normally refer to an enumeration column using the string forms of its values, you can also use the internal numeric values. The effect of this can be very subtle if the string values look like numbers. Suppose that you define a table t like this: CREATE TABLE t (age INT, siblings ENUM('0','1','2','3','>3')); In this case, the enumeration values are the strings '0', '1', '2', '3', and '>3', and the matching internal numeric values are 1, 2, 3, 4, and 5, respectively. Now suppose that you issue the following statement: INSERT INTO t (age,siblings) VALUES(14,'3'); The siblings value is specified here as the string '3', and that is the value assigned to the column in the new record. However, you can also specify the siblings value as a number, as follows: INSERT INTO t (age,siblings) VALUES(14,3); But in this case, 3 is interpreted as the internal value, which corresponds to the enumeration value '2'! The same principle applies to retrievals. Consider the following two statements: SELECT * FROM t WHERE siblings = '3'; SELECT * FROM t WHERE siblings = 3; In the first case, you get records that have an enumeration value of '3'. In the second case, you get records where the internal value is 3; that is, records with an enumeration value of '2'. 4.10.3 Date and Time Column TypesMySQL provides column types for storing different kinds of temporal information. In the following descriptions, the terms YYYY, MM, DD, hh, mm, and ss stand for a year, month, day of month, hour, minute, and second value, respectively. The storage requirements and ranges for the date and time datatypes are summarized in the following table:
For TIMESTAMP, MySQL 4.0 displays values such as '1970-01-01 00:00:00' as 19700101000000. In MySQL 4.1, this changes so that TIMESTAMP display format is the same as DATETIME. Each of these temporal datatypes also has a "zero" value that's used when you attempt to store an illegal value. The "zero" value is represented in a format appropriate for the type (such as '0000-00-00' for DATE and '00:00:00' for TIME). 4.10.3.1 The DATE and TIME Column TypesThe DATE datatype represents date values in 'YYYY-MM-DD' format. This representation corresponds to the ANSI SQL date format, also known as ISO 8601 format. The supported range of DATE values is '1000-01-01' to '9999-12-31'. You might be able to use earlier dates than that, but it's better to stay within the supported range to avoid unexpected behavior. MySQL always represents DATE values in ISO 8601 format when it displays them. If necessary, you can reformat DATE values into other display formats using the DATE_FORMAT() function. For date entry, MySQL also expects to receive dates in ISO format, or at least close to ISO format. That is, date values must be given in year-month-day order, but some deviation from strict ISO format is allowed:
Instead of attempting to load values that aren't in an acceptable format into a DATE column, you should convert them into ISO format. An alternative approach that's useful in some circumstances is to load the values into a string column and perform reformatting operations using SQL string functions to produce ISO format values that can be assigned to a DATE column. The TIME datatype represents time values in 'hh:mm:ss' format. TIME values may represent elapsed time, and thus might be outside the range of time-of-day values. They may even be negative values. (The actual range of TIME values is '-838:59:59' to '838:59:59'.) MySQL represents TIME values in 'hh:mm:ss' format when displaying them. If necessary, you can reformat TIME values into other display formats using the TIME_FORMAT() function. For TIME value entry, some variation on this format is allowed. For example, leading zeros on TIME parts may be omitted. 4.10.3.2 The TIMESTAMP and DATETIME Column TypesThe DATETIME column type stores date-and-time values in 'YYYY-MM-DD hh:mm:ss' format. It's similar to a combination of DATE and TIME values, but the TIME part represents time of day rather than elapsed time and has a range limited to '00:00:00' to '23:59:59'. The date part of DATETIME columns has the same range as DATE columns; combined with the TIME part, this results in a DATETIME range from '1000-01-01 00:00:00' to '9999-12-31 23:59:59'. The TIMESTAMP type, like DATETIME, stores date-and-time values, but has a different range and some special properties that make it especially suitable for tracking data modification times. TIMESTAMP also has a different display format from DATETIME prior to MySQL 4.1:
The range of TIMESTAMP values begins at 1970-01-01 00:00:00 (GMT) and extends partway into the year 2037. TIMESTAMP values actually represent the number of seconds elapsed since the beginning of 1970 and are stored using four bytes. This provides room for sufficient seconds to represent a date in the year 2037. Note that TIMESTAMP values are stored using the server's local timezone. TIMESTAMP columns have the following special properties:
It's important to know about the automatic-update property. It's what makes TIMESTAMP columns useful for tracking record modification times, but is a source of confusion if you're not aware of it. People who choose TIMESTAMP for a column on the basis of the fact that it stores date-and-time values become dismayed and mystified when they discover that the column's values change unexpectedly. 4.10.3.3 The YEAR Column TypeThe YEAR column type represents year-only values. You can declare such columns as YEAR(4) or YEAR(2) to obtain a four-digit or two-digit display format. If you don't specify any display width, the default is four digits. If you don't need a full date and the range of values you need to store falls into the YEAR range, consider using YEAR to store temporal values. It's a very space-efficient datatype because values require only one byte of storage each. 4.10.4 Column OptionsThe final part of a column definition (following the datatype) can include optional modifiers. These options are described in the following list. Note that many of them apply only to certain column types.
4.10.5 Using the AUTO_INCREMENT Column OptionThe AUTO_INCREMENT option may be added to an integer column definition to create a column for which MySQL automatically generates a new sequence number each time you create a new row. The option is used in conjunction with an index (usually a primary key) and provides a mechanism whereby each value is a unique identifier that can be used to refer unambiguously to the row in which it occurs. MySQL also provides a LAST_INSERT_ID() function that returns the most recently generated AUTO_INCREMENT value. This function is useful for determining the identifier when you need to look up the record just created, or when you need to know the identifier to create related records in other tables. The following scenario illustrates how you can set up and use an AUTO_INCREMENT column. Assume that you're organizing a conference and need to keep track of attendees and the seminars for which each attendee registers. (When someone submits a registration form for the conference, the form must indicate which of the available seminars the person wants to attend.) Your task is to record seminar registrations and associate them with the appropriate attendee. Unique ID numbers provide a way to keep track of attendees and an AUTO_INCREMENT column makes the implementation for the task relatively easy:
The preceding description shows how to use an AUTO_INCREMENT column—how to declare the column, how to generate new ID values when inserting new records, and how to use the ID values to tie together related tables. However, the description glosses over some of the details. These are presented in the following discussion, beginning with declaration syntax and then providing further information about how AUTO_INCREMENT columns work. The att_id-related declarations in the attendee table look like this: att_id INT UNSIGNED NOT NULL AUTO_INCREMENT, PRIMARY KEY (att_id) These declarations involve the following factors, which you should consider when creating an AUTO_INCREMENT column:
After setting up an AUTO_INCREMENT column, use it as follows:
The MyISAM storage engine supports composite indexes that include an AUTO_INCREMENT column. This allows creation of independent sequences within a single table. Consider the following table definition: CREATE TABLE multisequence ( name CHAR(10) NOT NULL, name_id INT UNSIGNED NOT NULL AUTO_INCREMENT, PRIMARY KEY (name, name_id) ); Inserting name values into the multisequence table generates separate sequences for each distinct name: mysql> INSERT INTO multisequence (name) -> VALUES('Petr'),('Ilya'),('Ilya'),('Yuri'),('Ilya'),('Petr'); mysql> SELECT * FROM multisequence ORDER BY name, name_id; +------+---------+ | name | name_id | +------+---------+ | Ilya | 1 | | Ilya | 2 | | Ilya | 3 | | Petr | 1 | | Petr | 2 | | Yuri | 1 | +------+---------+ Note that for this kind of AUTO_INCREMENT column, values deleted from the high end of any sequence are reused. This differs from MyISAM behavior for single-column AUTO_INCREMENT sequences. 4.10.6 Automatic Type Conversion and Value ClippingFor historical reasons, MySQL is forgiving about signaling an error if a given value doesn't match the datatype of the column that is the insert target. Instead, MySQL does its best to perform automatic type conversion. For example, if you attempt to store a negative value in an UNSIGNED integer column, MySQL silently converts it to zero, which is the nearest legal value for the column. In other words, the MySQL server converts input values to the types expected from the column definitions, inserts the result, and continues on its way. If you need to prevent attempts to insert invalid values into a table, you should first validate the values on the client side; however, because that isn't an exam topic, it isn't discussed further here. This section describes the kinds of conversions that MySQL performs and the circumstances under which they occur. After you know these principles, you'll know what types of validation are necessary before trying to store your data in a MySQL database. In many cases, type conversion affords you the flexibility to write a statement different ways and get the same result. For example, if i is an integer column, the following statements both insert 43 into it, even though the value is specified as a number in one statement and as a string in the other: INSERT INTO t (i) VALUES(43); INSERT INTO t (i) VALUES('43'); MySQL performs automatic string-to-number conversion for the second statement. In other cases, the effects of type conversion might be surprising, particularly if you're unaware that these conversions occur. You can avoid such surprises by understanding the conditions under which conversion takes place. In general, MySQL performs type conversion based on the constraints implied by a column's definition. These constraints apply in several contexts:
The following list discusses some of the conversions that MySQL performs. It isn't exhaustive, but is sufficiently representative to provide you with a good idea of how MySQL treats input values and what you'll be tested on in the exam. Circumstances under which automatic type conversion occurs include the following:
Using ALTER TABLE to change a column's datatype maps existing values to new values according to the constraints imposed by the new datatype. This might result in some values being changed. For example, if you change a TINYINT to an INT, no values are changed because all TINYINT values fit within the INT range. However, if you change an INT to a TINYINT, any values that lie outside the range of TINYINT are clipped to the nearest endpoint of the TINYINT range. Similar effects occur for other types of conversions, such as TINYINT to TINYINT UNSIGNED (negative values are converted to zero), and converting a long string column to a shorter one (values that are too long are truncated to fit the new size). If a column is changed to NOT NULL using ALTER TABLE, MySQL converts NULL values to the default value for the column type. The following table shows how several types of string values are handled when converted to date or numeric datatypes. It demonstrates several of the points just discussed. Note that only string values that look like dates or numbers convert properly without loss of information. Note too that leading zeros are retained for the DECIMAL column during conversion.
|
< Day Day Up > |