9.3 Datafile Format Specifiers
LOAD DATA INFILE and SELECT … INTO OUTFILE both assume by default a datafile format in which column values are separated by tab characters and records are terminated by newlines. If a datafile to be read by LOAD DATA INFILE has different column separators or line terminators, you must indicate what the format is so that MySQL can read the file contents correctly. Similarly, if you want SELECT … INTO OUTFILE to write a file with different separators or terminators, you'll need to indicate the format to use. It's also possible to control quoting and escaping behavior.
LOAD DATA INFILE and SELECT … INTO OUTFILE don't allow you to specify anything about the characteristics of specific individual columns in the datafile, such as that column 3 is numeric or that column 17 contains dates. Instead, you define the general characteristics that apply to all column values: What characters separate column values in data rows, whether values are quoted, and whether there is an escape character that signifies special character sequences.
For LOAD DATA INFILE, any format specifiers given are listed after the table name. For SELECT … INTO OUTFILE, they follow the output filename. The syntax for format specifiers is the same for both statements and looks like this:
FIELDS
TERMINATED BY 'string'
ENCLOSED BY 'char'
ESCAPED BY 'char'
LINES TERMINATED BY 'string'
The FIELDS clause defines the formatting of data values within a line and the LINES clause defines the line-ending sequence. In other words, FIELDS indicates the structure of column values within records and LINES indicates where record boundaries occur.
The TERMINATED BY, ENCLOSED BY, and ESCAPED BY parts of the FIELDS clause may be given in any order. You need not specify all three parts. Defaults are used for any that are missing (or if the FIELDS clause itself is missing):
Data values are assumed to be terminated by (that is, separated by) tab characters. To indicate a different value, include a TERMINATED BY option. Data values are assumed not to be quoted. To indicate a quoting character, include an ENCLOSED BY option. For LOAD DATA INFILE, enclosing quotes are stripped from input values if they're found. For SELECT … INTO OUTFILE, output values are written enclosed within quoting characters. A variation on ENCLOSED BY is OPTIONALLY ENCLOSED BY. For LOAD DATA INFILE, this is the same as ENCLOSED BY. For SELECT … INTO OUTFILE, it is different: The presence of OPTIONALLY causes output value quoting only for CHAR and VARCHAR columns, not for all values. The default escape character is backslash ('/'). Any occurrence of this character within a data value modifies interpretation of the character that follows it. To indicate a different escape character, include an ESCAPED BY option. MySQL understands the following special escape sequences: Sequence | Meaning |
---|
\N | NULL value | \0 | ASCII NUL byte | \b | Backspace | \n | Newline (linefeed) | \r | Carriage return | \s | Space | \t | Tab | \' | Single quote | \" | Double quote | \\ | Backslash |
All these sequences except \N are understood whether they appear alone or within a longer data value. \N is understood as NULL only when it appears alone.
The default line terminator is the newline (linefeed) character. To indicate a line-ending sequence explicitly, use a LINES clause. Common line-terminator specifiers are newline, carriage return, and carriage return/newline pairs. These are specified as follows:
LINES TERMINATED BY '\n'
LINES TERMINATED BY '\r'
LINES TERMINATED BY '\r\n'
Because newline is the default line terminator, it need be specified only if you want to make the line-ending sequence explicit. Newline terminators are common on Unix systems, carriage returns are common on Mac OS and Mac OS X, and carriage return/newline pairs are common on Windows.
The ESCAPED BY option controls only the handling of values in the datafile, not how you specify the statement itself. If you want to specify a datafile escape character of @, you'd write ESCAPED BY '@'. That doesn't mean you then use @ to escape special characters elsewhere in the statement. For example, you'd still specify carriage return as the line termination character using LINES TERMINATED BY '\r', not using LINES TERMINATED BY '@r'.
Suppose that a file named data.txt contains information in comma-separated values (CSV) format, with values quoted by double quote characters and lines terminated by carriage returns. To load the file into a table t, use this LOAD DATA INFILE statement:
LOAD DATA INFILE 'data.txt' INTO TABLE t
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r';
To write information in that same format, use this SELECT … INTO OUTFILE statement:
SELECT * INTO OUTFILE 'data.txt'
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r'
FROM t;
|