Team LiB
Previous Section Next Section

Internationalization and Localization Issues

Internationalization refers to the capability of software to be used according to local convention, for any of a variety of locations. Localization refers to selecting a particular set of local conventions from among those sets that are supported. The following aspects of MySQL configuration relate to internationalization and localization:

  • The server default time zone

  • The language used for displaying diagnostic and error messages

  • The available character sets and the default character set

Configuring Time Zone Support

Your server's time zone capabilities depend on what version of MySQL you are running. In all versions of MySQL, the server sets its default time zone by examining its environment. Most often, this will be the local time zone of the server host, but you can specify the time zone explicitly at server startup. In addition, as of MySQL 4.1.3, the server allows each client that connects to override the default setting and set its own time zone. This enables applications to use time settings that depend on where the client program is running rather than where the server is running. The following discussion describes the current capabilities for multiple time zones, and then briefly describes how to set the time zone for older single time zone servers.

From MySQL 4.1.3 on, two system variables hold time zone information:

  • The system_time_zone variable represents the time zone that the server determines to be the server host time zone at startup time. This variable exists only as a global system variable and cannot be reset at runtime. You can influence how the server sets system_time_zone at startup time by setting the TZ environment variable to the desired time zone before starting the server. However, it may not be easy to guarantee that TZ will be set in some contexts, such as when the server is started during the system boot sequence. On Unix, another way to set the time zone is by specifying a --timezone option to the mysqld_safe startup script (not to mysqld, which does not understand the option). It's probably best to list this option in the [mysqld_safe] group of an option file, especially if you invoke mysqld_safe indirectly through mysql.server, which does not support command-line options. For example, to specify the U.S. Central time zone for mysqld_safe, add the following to your server option file:

    [mysqld_safe]
    timezone=US/Central
    

    The example shows one widely used syntax (it works on Solaris, Linux, and Mac OS X, for example). Another common syntax is as follows:

    [mysqld_safe]
    timezone=CST6CDT
    

    Use whatever syntax is indicated in your system documentation.

  • The time_zone variable represents the MySQL server's default time zone. By default, this variable is set to SYSTEM, which means "use the system_time_zone setting." You can set time_zone at startup time by using the --default-time-zone option for mysqld. As the server runs, it uses the global value of time_zone to set the session time_zone value for each client the connects, which becomes the client's default time zone. Any client can reset the time zone for its own connection by setting the session time_zone variable. An administrative client that has the SUPER privilege can set the global time_zone variable to change the default for clients that connect thereafter.

To determine the current values of the global and session time zone variables, use this statement:

SELECT @@GLOBAL.time_zone, @@SESSION.time_zone;

You can set the value of time_zone using three kinds of variables, although one of them requires additional administrative action. The statements shown here set the session value. Substitute GLOBAL to set the global value. (However, be aware that you cannot set the GLOBAL value at runtime if binary logging is enabled, because that would cause replication difficulties.)

  • You can use the value SYSTEM to set time_zone to the value of system_time_zone:

    SET SESSION time_zone = 'SYSTEM';
    

  • You can use values that specify a signed hour and minute offset from UTC:

    SET SESSION time_zone = '+00:00';    # UTC
    SET SESSION time_zone = '+03:00';    # 3 hours ahead of UTC
    SET SESSION time_zone = '-11:00';    # 11 hours behind UTC
    

  • You can use named time zones that refer to a locale:

    SET SESSION time_zone = 'US/Central';
    SET SESSION time_zone = 'CST6CDT';
    SET SESSION time_zone = 'Asia/Jakarta';
    

To use the third method (setting the time zone by name), you must enable the server to understand time zone names by loading information from the operating system's time zone files into a set of tables in the mysql database. This does not happen automatically during MySQL installation. You must populate the tables manually by using the mysql_tzinfo_to_sql program. This program reads the time zone files and constructs SQL statements from their contents. You can feed these statements to the mysql program to execute them.

To set up the time zone tables on a system that has time zone files, determine where they are installed. If this location is /usr/share/zoneinfo, the command to load the files into the mysql database looks like this:

% mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -p -u root mysql

That should suffice for most versions of Unix. For Windows and for Unix systems that do not have a set of time zone files, you can obtain a package containing a set of pre-built MyISAM tables containing time zone information from this location:

http://dev.mysql.com/downloads/timezones.html

Download the package and unpack it. With the MySQL server stopped, copy the .frm, .MYD, and .MYI files into the mysql database directory under your data directory. Then restart the server.

Before MySQL 4.1.3, the server always operates in a single time zone. This time zone applies to all client connections even if the client happens to be in a different time zone. The timezone system variable indicates the time zone in which the server operating. (This variable was renamed to system_time_zone in MySQL 4.1.3.) You can cause the server to set its timezone variable by setting the TZ environment variable or by using the --timezone option to mysqld_safe, as described earlier.

Selecting the Language for Error Messages

The MySQL server can produce diagnostic and error messages in any of several languages. The default is english, but you can specify others. To see which are available, look under the share/mysql directory of your MySQL installation. The directories that have language names correspond to the available languages. To change the message language, use the --language startup option. The argument can be either the language name or the pathname to the language directory. For example, to use French if your installation is located under /usr/local/mysql, you can use either --language=french or --language=/usr/local/mysql/share/mysql/french.

Configuring Character Set Support

Beginning with version 4.1, MySQL supports multiple character sets simultaneously. You can select character sets at the server, database, table, column, and string constant levels. Character sets determine which characters are allowed in string values. MySQL also supports multiple collating sequences per character set. Collations affect string comparison and sorting operations.

This section describes how to configure MySQL's character set support. For general background on the server's character set capabilities, see Chapter 2, "MySQL SQL Syntax and Use." For details on creating character columns and using them, see Chapter 3, "Working with Data in MySQL."

Note: Prior to MySQL 4.1, the server supports only a single character set at a time, although you can select which one to use at server startup. If you upgrade a pre-4.1 installation to 4.1 or later, you should convert tables that contain character columns to the current format. See "Upgrading Older Tables to Enable MySQL 4.1 Character Set Support" for details.

When you configure the server at build time, you can specify which character sets the server should support, as well as the default character set and collation. Use the following options to the configure script:

  • To specify which of the available character sets the server should support, use the --with-extra-charsets option. Its argument is a comma-separated list of character set names. For example, you can include support for the latin1, big5, and hebrew character sets like this:

    % ./configure --with-extra-charsets=latin1,big5,hebrew
    

    Two special character set names select groups of character sets: all includes all available character sets, and complex includes all complex character sets. A set is complex if it is either a multi-byte character set or it requires special rules for sorting.

    To determine which character sets can be selected, use the following command and look for the description of the --with-charset option in the output:

    % ./configure --help
    

  • The default character set is latin1. To select a different default, use the --with-charset option.

  • The default collation is latin1_swedish_ci. To select a different default, use the --with-collation option. The collation must be compatible with the default character set. (That is, the beginning part of the collation name must be the same as the character set name.)

Here is a sample configuration command that uses all three options:

% ./configure --with-charset=utf8 \
      --with-collation=utf8_icelandic_ci \
      --with-extra-charsets=all

At runtime, the server sets its default character set and collation to the built-in defaults unless you specify otherwise. To select different values, use the --character-set-server and --collation-server startup options. The collation must be compatible with the character set.

When you run a client program, you can specify the character set that you want the program to use by giving the --default-character-set option. If the character set you want isn't available as part of your MySQL installation, but you do have the necessary character set files installed under another directory, you can tell the client program their location by specifying the --character-sets-dir option.

Upgrading Older Tables to Enable MySQL 4.1 Character Set Support

Before MySQL 4.1, the server always uses a single character set. The default is latin1 but that can be changed if you start the server with the --default-character-set option. If your tables all use the default character set, you can upgrade them to MySQL 4.1 format in straightforward fashion with no character set conversion necessary.

However, some people use 4.0 servers to store data in character sets other than the default. This requires that applications know the server's limitations and compensate for them. A common technique is to "fake it" by storing information that really isn't in the server's default character set and then interpreting it using the proper character set on the client side. For example, your server might use latin1 as the default character set, but you have applications that need to work with UTF-8 information. One workaround is to store the raw character information in a latin1 CHAR or TEXT column, and have the application interpret column values as UTF-8 on the client side. In MySQL 4.1 and up, you should convert your tables so that their columns definitions explicitly indicate that they have a character set of utf8. However, you must be careful when doing so to avoid loss of information due to inappropriate conversions.

The following sections describe table upgrading procedures to be used when upgrading to MySQL 4.1 or later. Use the first procedure if your tables all use the default character set, and the second if they don't. It's a good idea to make a backup first.

Upgrading Tables When Character Set Conversion Is Unnecessary

If your tables use the default character set and you just want to upgrade them to the current table format, use this procedure:

1.
Before upgrading MySQL to version 4.1 or later, make a backup of your databases using mysqldump:

% mysqldump -p -u root --all-databases --opt > dumpfile.sql

--all-databases causes all databases to be dumped and --opt optimizes the dump file to be smaller and so that it can be processed more quickly when reloaded. (mysqldump is discussed further in Chapter 13, "Database Backups, Maintenance, and Repair.")

2.
Stop the server.

3.
Upgrade your MySQL installation and restart the server, but do not change the server's default character set.

4.
Convert your tables to 4.1 format by reloading them from the backup file:

% mysql -p -u root < dumpfile.sql

This procedure re-creates your tables in the current format, which has two important effects:

  • Each table is assigned the server's character set as its own. This means you can change the server character set later and each table column will retain its character set, unaffected by the change. (That is, tables become "insulated" from modifications to the server character set and thus indexes on its column retain their proper collation.)

  • Columns in converted tables no longer need all have the same character set, so you can change the character set for individual columns should you want to do so. And if you do modify a column's character set or collation, the server automatically reorders any indexes of which it is a part to reflect the new collating sequence.

If you don't want to dump and reload all your tables, you can upgrade MySQL and then convert individual tables by using statements of this form:

ALTER TABLE tbl_name DEFAULT CHARACTER SET charset;

tbl_name is the name of the table to convert, and charset is the name of the server's default character set.

This latter individual-table upgrade method does have one caveat if you run the server with a multi-byte default character set: The server will appear to believe that the lengths of character columns have decreased after you upgrade to 4.1. This occurs due to the change from byte-oriented semantics in 4.0 to character-oriented semantics in 4.1. Suppose that you have a table created using a MySQL 4.0 server with the following definition:

CREATE TABLE t (c CHAR(100));

If you use a multi-byte character set, the server will show a different table definition after you upgrade to MySQL 4.1 or up. For example, if the server uses the sjis double-byte character set, a SHOW CREATE TABLE t statement will display the following table definition after the MySQL upgrade:

CREATE TABLE t (c CHAR(50)) ENGINE=MyISAM DEFAULT CHARSET=sjis;

This happens because when the 4.1 server sees a 4.0 table, it knows that it must be conservative and assume that the column can hold only as many instances of the widest character from the character set that will fit. sjis characters take two bytes, so the server knows it can fit at most 50 two-byte characters into 100 bytes. Your tables still contain the same data, but if you want to make sure that you can insert 100-character strings into the column in the future, convert it with ALTER TABLE:

ALTER TABLE t MODIFY c CHAR(100);

Upgrading Tables When Character Set Conversion Is Necessary

If your tables contain string columns in which you've been storing data that is not really in the server's default character set, you should convert the tables to assign the proper character set explicitly to those columns. Then the server will handle their contents properly with no special handling. However, you must be careful when assigning character sets to avoid having the server improperly convert column contents. The following conversion procedure should be performed for each table individually after upgrading MySQL to 4.1 or later.

Suppose that you have a table with two character columns, where c1 stores utf8 values and c2 stores ujis values:

CREATE TABLE t
(
    c1 CHAR(75),
    c2 CHAR(125)
);

To convert columns in the table to have explicit character set information, use these statements:

ALTER TABLE t
    MODIFY c1 CHAR(75) CHARACTER SET binary,
    MODIFY c2 CHAR(125) CHARACTER SET binary;
ALTER TABLE t
    MODIFY c1 CHAR(75) CHARACTER SET utf8,
    MODIFY c2 CHAR(125) CHARACTER SET ujis;,

The reason that two steps are necessary is that if you try to convert the columns directly to utf8 and ujis, the server will think it should convert them from latin1 to those character sets. That won't work because the values are not really latin1 and the conversion would mangle them. Instead, use one step that converts the columns to the binary character set, and a second step to convert from binary to the desired character sets. This works properly because conversions to and from the binary character set result in no attempt by the server to interpret the column contents.

There is a special case for which only the second step is necessary. If you were storing strings in a column that already had a binary string data type, you can convert the column directly to the desired character set. For example, if you were storing utf8 strings in a BLOB column, you can convert it directly to TEXT CHARACTER SET utf8. If you were using a 4.0 CHAR(M) BINARY column, 4.1 considers it to have the BINARY(M) data type. You can convert it directly to CHAR(M) CHARACTER SET utf8.

    Team LiB
    Previous Section Next Section