Processing SQL StatementsThe purpose of connecting to the server is to conduct a conversation with it while the connection is open. This section shows how to communicate with the server to process statements. Each statement you execute involves the following steps:
Prior to MySQL 4.1, the client library included a single set of routines for statement execution. These are based on sending each statement as a string to the server and retrieving the results with all columns returned in string format. MySQL 4.1 introduces a binary protocol that allows non-string data values to be sent and returned in native format without conversion to and from string format. This section discusses the original method for processing SQL statements. The section "Using Server-Side Prepared Statements" later in the chapter covers the newer binary protocol. One factor to consider in constructing statements is which function to use for sending them to the server. The more general statement-issuing routine is mysql_real_query(). With this routine, you provide the statement as a counted string (a string plus a length). You must keep track of the length of your statement string and pass that to mysql_real_query(), along with the string itself. Because the statement is treated as a counted string rather than as a null-terminated string, it may contain anything, including binary data or null bytes. The other statement-issuing function, mysql_query(), is more restrictive in what it allows in the statement string but often is easier to use. Any statement passed to mysql_query() should be a null-terminated string. This means the statement text cannot contain null bytes because those would cause it to be interpreted erroneously as shorter than it really is. Generally speaking, if your statement can contain arbitrary binary data, it might contain null bytes, so you shouldn't use mysql_query(). On the other hand, when you are working with null-terminated strings, you have the luxury of constructing statements using standard C library string functions that you're probably already familiar with, such as strcpy() and sprintf(). Another factor to consider in constructing statements is whether you need to perform any character-escaping operations. This is necessary if you want to construct statements using values that contain binary data or other troublesome characters, such as quotes or backslashes. This is discussed in "Working with Strings That Contain Special Characters." A simple outline of statement handling looks like this: if (mysql_query (conn, stmt_str) != 0) { /* failure; report error */ } else { /* success; find out what effect the statement had */ } mysql_query() and mysql_real_query() both return zero for statements that succeed and non-zero for failure. To say that a statement "succeeded" means the server accepted it as legal and was able to execute it. It does not indicate anything about the effect of the statement. For example, it does not indicate that a SELECT statement selected any rows or that a DELETE statement deleted any rows. Checking what effect the statement actually had involves additional processing. A statement may fail for a variety of reasons. Common causes of failure include the following:
Statements may be grouped into two broad categories: those that do not return a result set (a set of rows) and those that do. Statements such as INSERT, DELETE, and UPDATE fall into the "no result set returned" category. They don't return any rows, even for statements that modify your database. What you get back is a count of the number of rows affected. Statements such as SELECT and SHOW fall into the "result set returned" category; after all, the purpose of issuing those statements is to get something back. In the MySQL C API, the result set returned by such statements is represented by the MYSQL_RES data type. This is a structure that contains the data values for the rows, and also metadata about the values (such as the column names and data value lengths). Is it legal for a result set to be empty (that is, to contain zero rows). Handling Statements That Return No Result SetTo process a statement that does not return a result set, issue it with mysql_query() or mysql_real_query(). If the statement succeeds, you can find out how many rows were inserted, deleted, or updated by calling mysql_affected_rows(). The following example shows how to handle a statement that returns no result set: if (mysql_query (conn, "INSERT INTO my_tbl SET name = 'My Name'") != 0) { print_error (conn, "INSERT statement failed"); } else { printf ("INSERT statement succeeded: %lu rows affected\n", (unsigned long) mysql_affected_rows (conn)); } Note how the result of mysql_affected_rows() is cast to unsigned long for printing. This function returns a value of type my_ulonglong, but attempting to print a value of that type directly does not work on some systems. (For example, I have observed it to work under FreeBSD but to fail under Solaris.) Casting the value to unsigned long and using a print format of %lu solves the problem. The same principle applies to any other functions that return my_ulonglong values, such as mysql_num_rows() and mysql_insert_id(). If you want your client programs to be portable across different systems, keep this in mind. mysql_affected_rows() returns the number of rows affected by the statement, but the meaning of "rows affected" depends on the type of statement. For INSERT, REPLACE, or DELETE, it is the number of rows inserted, replaced, or deleted. For UPDATE, it is the number of rows updated, which means the number of rows that MySQL actually modified. MySQL does not update a row if its contents are the same as what you're updating it to. This means that although a row might be selected for updating (by the WHERE clause of the UPDATE statement), it might not actually be changed. This meaning of "rows affected" for UPDATE actually is something of a controversial point because some people want it to mean "rows matched"that is, the number of rows selected for updating, even if the update operation doesn't actually change their values. If your application requires such a meaning, you can request that behavior when you connect to the server by passing a value of CLIENT_FOUND_ROWS in the flags parameter to mysql_real_connect(). Handling Statements That Return a Result SetStatements that return data do so in the form of a result set that you retrieve after issuing the statement by calling mysql_query() or mysql_real_query(). It's important to realize that in MySQL, SELECT is not the only statement that returns rows. Statements such as SHOW, DESCRIBE, EXPLAIN, and CHECK TABLE do so as well. For all of these statements, you must perform additional row-handling processing after issuing the statement. Handling a result set involves these steps: The following example outlines how to process a statement that returns a result set: MYSQL_RES *res_set; if (mysql_query (conn, "SHOW TABLES FROM sampdb") != 0) print_error (conn, "mysql_query() failed"); else { res_set = mysql_store_result (conn); /* generate result set */ if (res_set == NULL) print_error (conn, "mysql_store_result() failed"); else { /* process result set, and then deallocate it */ process_result_set (conn, res_set); mysql_free_result (res_set); } } The example hides the details of result set processing within another function, process_result_set(), which we have not yet defined. Generally, operations that handle a result set are based on a loop that looks something like this: MYSQL_ROW row; while ((row = mysql_fetch_row (res_set)) != NULL) { /* do something with row contents */ } mysql_fetch_row() returns a MYSQL_ROW value, which is a pointer to an array of values. If the return value is assigned to a variable named row, each value within the row may be accessed as row[i], where i ranges from 0 to one less than the number of columns in the row. There are several important points about the MYSQL_ROW data type to note:
What you do with each row depends on the purpose of your application. For purposes of illustration, let's just print each row as a set of column values separated by tabs. To do that, it's necessary to know how many column values rows contain. That information is returned by another client library function, mysql_num_fields(). Here's the code for process_result_set(): void process_result_set (MYSQL *conn, MYSQL_RES *res_set) { MYSQL_ROW row; unsigned int i; while ((row = mysql_fetch_row (res_set)) != NULL) { for (i = 0; i < mysql_num_fields (res_set); i++) { if (i > 0) fputc ('\t', stdout); printf ("%s", row[i] != NULL ? row[i] : "NULL"); } fputc ('\n', stdout); } if (mysql_errno (conn) != 0) print_error (conn, "mysql_fetch_row() failed"); else printf ("%lu rows returned\n", (unsigned long) mysql_num_rows (res_set)); } process_result_set() displays the contents of each row in tab-delimited format (displaying NULL values as the word "NULL"), and then prints a count of the number of rows retrieved. That count is available by calling mysql_num_rows(). Like mysql_affected_rows(), mysql_num_rows() returns a my_ulonglong value, so you should cast its value to unsigned long and use a %lu format to print it. But note that unlike mysql_affected_rows(), which takes a connection handler argument, mysql_num_rows() takes a result set pointer as its argument. The code that follows the loop includes an error test as a precautionary measure. If you create the result set with mysql_store_result(), a NULL return value from mysql_fetch_row() always means "no more rows." However, if you create the result set with mysql_use_result(), a NULL return value from mysql_fetch_row() can mean "no more rows" or that an error occurred. Because process_result_set() has no idea whether its caller used mysql_store_result() or mysql_use_result() to generate the result set, the error test allows it to detect errors properly either way. The version of process_result_set() just shown takes a rather minimalist approach to printing column valuesone that has certain shortcomings. Suppose that you execute this query: SELECT last_name, first_name, city, state FROM president ORDER BY last_name, first_name You will receive the following output, which is not so easy to read: Adams John Braintree MA Adams John Quincy Braintree MA Arthur Chester A. Fairfield VT Buchanan James Mercersburg PA Bush George H.W. Milton MA Bush George W. New Haven CT Carter James E. Plains GA ... We could make the output prettier by providing information such as column labels and making the values line up vertically. To do that, we need the labels, and we need to know the widest value in each column. That information is available, but not as part of the column data valuesit's part of the result set's metadata (data about the data). After we generalize our statement handler a bit, we'll write a nicer display formatter in the section "Using Result Set Metadata." A General Purpose Statement HandlerThe preceding statement-handling examples were written using knowledge of whether the statement should return any data. That was possible because the statements were hardwired into the code: We used an INSERT statement, which does not return a result set, and a SHOW TABLES statement, which does. However, you might not always know what kind of statement a given statement represents. For example, if you execute a statement that you read from the keyboard or from a file, it might be anything. You won't know ahead of time whether to expect it to return rows, or even whether it's legal. What then? You certainly don't want to try to parse the statement to determine what kind of statement it is. That's not as simple as it might seem. For example, it's not sufficient to check whether the first word is SELECT because the statement might begin with a comment, as follows: /* comment */ SELECT ... Fortunately, you don't have to know the statement type in advance to be able to handle it properly. The MySQL C API makes it possible to write a general purpose statement handler that correctly processes any kind of statement, whether it returns a result set, and whether it executes successfully or fails. Before writing the code for this handler, let's outline the procedure that it implements: The following listing shows a function that processes any statement, given a connection handler and a null-terminated statement string: void process_statement (MYSQL *conn, char *stmt_str) { MYSQL_RES *res_set; if (mysql_query (conn, stmt_str) != 0) /* the statement failed */ { print_error (conn, "Could not execute statement"); return; } /* the statement succeeded; determine whether it returned data */ res_set = mysql_store_result (conn); if (res_set) /* a result set was returned */ { /* process rows and then free the result set */ process_result_set (conn, res_set); mysql_free_result (res_set); } else /* no result set was returned */ { /* * does the lack of a result set mean that the statement didn't * return one, or that it should have but an error occurred? */ if (mysql_field_count (conn) == 0) { /* * statement generated no result set (it was not a SELECT, * SHOW, DESCRIBE, etc.); just report rows-affected value. */ printf ("%lu rows affected\n", (unsigned long) mysql_affected_rows (conn)); } else /* an error occurred */ { print_error (conn, "Could not retrieve result set"); } } } Alternative Approaches to Statement ProcessingThe version of process_statement() just shown has these three properties:
Alternative approaches are possible for all three of these aspects of statement handling:
Any or all of these approaches can be used instead of those used in process_statement(). Here is a process_real_statement() function that is analogous to process_statement() but that uses all three alternatives: void process_real_statement (MYSQL *conn, char *stmt_str, unsigned int len) { MYSQL_RES *res_set; if (mysql_real_query (conn, stmt_str, len) != 0) /* the statement failed */ { print_error (conn, "Could not execute statement"); return; } /* the statement succeeded; determine whether it returned data */ res_set = mysql_use_result (conn); if (res_set) /* a result set was returned */ { /* process rows and then free the result set */ process_result_set (conn, res_set); mysql_free_result (res_set); } else /* no result set was returned */ { /* * does the lack of a result set mean that the statement didn't * return one, or that it should have but an error occurred? */ if (mysql_errno (conn) == 0) { /* * statement generated no result set (it was not a SELECT, * SHOW, DESCRIBE, etc.); just report rows-affected value. */ printf ("%lu rows affected\n", (unsigned long) mysql_affected_rows (conn)); } else /* an error occurred */ { print_error (conn, "Could not retrieve result set"); } } } mysql_store_result() and mysql_use_result() ComparedThe mysql_store_result() and mysql_use_result() functions are similar in that both take a connection handler argument and return a result set. However, the differences between them actually are quite extensive. The primary difference between the two functions lies in the way rows of the result set are retrieved from the server. mysql_store_result() retrieves all the rows immediately when you call it. mysql_use_result() initiates the retrieval but doesn't actually get any of the rows. These differing approaches to row retrieval give rise to all other differences between the two functions. This section compares them so that you'll know how to choose the one that's most appropriate for a given application. When mysql_store_result() retrieves a result set from the server, it fetches the rows, allocates memory for them, and stores them in the client. Subsequent calls to mysql_fetch_row() never return an error because they simply pull a row out of a data structure that already holds the result set. Consequently, a NULL return from mysql_fetch_row() always means you've reached the end of the result set. By contrast, mysql_use_result() doesn't retrieve any rows itself. Instead, it simply initiates a row-by-row retrieval, which you must complete yourself by calling mysql_fetch_row() for each row. In this case, although a NULL return from mysql_fetch_row() normally still means the end of the result set has been reached, it may mean instead that an error occurred while communicating with the server. You can distinguish the two outcomes by calling mysql_errno() or mysql_error(). mysql_store_result() has higher memory and processing requirements than does mysql_use_result() because the entire result set is maintained in the client. The overhead for memory allocation and data structure setup is greater, and a client that retrieves large result sets runs the risk of running out of memory. If you're going to retrieve a lot of rows in a single result set, you might want to use mysql_use_result() instead. mysql_use_result() has lower memory requirements because only enough space to handle a single row at a time need be allocated. This can be faster because you're not setting up as complex a data structure for the result set. On the other hand, mysql_use_result() places a greater burden on the server, which must hold rows of the result set until the client sees fit to retrieve all of them. This makes mysql_use_result() a poor choice for certain types of clients:
In both of these types of situations, the client fails to retrieve all rows in the result set quickly. This ties up the server and can have a negative impact on other clients, particularly if you are using a storage engine like MyISAM that uses table locks: Tables from which you retrieve data are read-locked for the duration of the query. Other clients that are trying to update those tables will be blocked. Offsetting the additional memory requirements incurred by mysql_store_result() are certain benefits of having access to the entire result set at once. All rows of the set are available, so you have random access into them: The mysql_data_seek(), mysql_row_seek(), and mysql_row_tell() functions allow you to access rows in any order you want. With mysql_use_result(), you can access rows only in the order in which they are retrieved by mysql_fetch_row(). If you intend to process rows in any order other than sequentially as they are returned from the server, you must use mysql_store_result() instead. For example, if you have an application that allows the user to browse back and forth among the rows selected by a query, you'd be best served by using mysql_store_result(). With mysql_store_result(), you have access to certain types of column information that are unavailable when you use mysql_use_result(). The number of rows in the result set is obtained by calling mysql_num_rows(). The maximum widths of the values in each column are stored in the max_width member of the MYSQL_FIELD column information structures. With mysql_use_result(), mysql_num_rows() doesn't return the correct value until you've fetched all the rows; similarly, max_width is unavailable because it can be calculated only after every row's data have been seen. Because mysql_use_result() does less work than mysql_store_result(), it imposes a requirement that mysql_store_result() does not: The client must call mysql_fetch_row() for every row in the result set. If you fail to do this before issuing another statement, any remaining records in the current result set become part of the next statement's result set and an "out of sync" error occurs. (You can avoid this by calling mysql_free_result() before issuing the second statement. mysql_free_result() will fetch and discard any pending rows for you.) One implication of this processing model is that with mysql_use_result() you can work only with a single result set at a time. Sync errors do not happen with mysql_store_result() because when that function returns, there are no rows yet to be fetched from the server. In fact, with mysql_store_result(), you need not call mysql_fetch_row() explicitly at all. This can sometimes be useful if all that you're interested in is whether you got a non-empty result, rather than what the result contains. For example, to find out whether a table mytbl exists, you can execute this statement: SHOW TABLES LIKE 'mytbl' If, after calling mysql_store_result(), the value of mysql_num_rows() is non-zero, the table exists. mysql_fetch_row() need not be called. Result sets generated with mysql_store_result() should be freed with mysql_free_result() at some point, but this need not necessarily be done before issuing another statement. This means that you can generate multiple result sets and work with them simultaneously, in contrast to the "one result set at a time" constraint imposed when you're working with mysql_use_result(). If you want to provide maximum flexibility, give users the option of selecting either result set processing method. mysql and mysqldump are two programs that do this. They use mysql_store_result() by default but switch to mysql_use_result() if the --quick option is given. Using Result Set MetadataResult sets contain not only the column values for data rows but also information about the data. This information is called the result set "metadata," which includes:
Metadata availability is partially dependent on your result set processing method. As indicated in the previous section, if you want to use the row count or maximum column length values, you must create the result set with mysql_store_result(), not with mysql_use_result(). Result set metadata is helpful for making decisions about how to process result set data:
Earlier, in the section "Handling Statements That Return a Result Set," we wrote a version of process_result_set() that printed columns from result set rows in tab- delimited format. That's good for certain purposes (such as when you want to import the data into a spreadsheet), but it's not a nice display format for visual inspection or for printouts. Recall that our earlier version of process_result_set() produced this output: Adams John Braintree MA Adams John Quincy Braintree MA Arthur Chester A. Fairfield VT Buchanan James Mercersburg PA Bush George H.W. Milton MA Bush George W. New Haven CT Carter James E. Plains GA ... Let's write a different version of process_result_set() that produces tabular output instead by titling and "boxing" each column. This version will display those same results in a format that's easier to interpret: +------------+---------------+---------------------+-------+ | last_name | first_name | city | state | +------------+---------------+---------------------+-------+ | Adams | John | Braintree | MA | | Adams | John Quincy | Braintree | MA | | Arthur | Chester A. | Fairfield | VT | | Buchanan | James | Mercersburg | PA | | Bush | George H.W. | Milton | MA | | Bush | George W. | New Haven | CT | | Carter | James E. | Plains | GA | ... +------------+---------------+---------------------+-------+ The general outline of the display algorithm is as follows:
This exercise provides a good demonstration showing how to use result set metadata because it requires knowledge of quite a number of things about the result set other than just the values of the data contained in its rows. You may be thinking to yourself, "Hmm, that description sounds suspiciously similar to the way mysql displays its output." Yes, it does, and you're welcome to compare the source for mysql to the code we end up with for process_result_set(). They're not the same, and you might find it instructive to compare the two approaches to the same problem. First, it's necessary to determine the display width of each column. The following listing shows how to do this. Observe that the calculations are based entirely on the result set metadata, and make no reference whatsoever to the row values: MYSQL_FIELD *field; unsigned long col_len; unsigned int i; /* determine column display widths -- requires result set to be */ /* generated with mysql_store_result(), not mysql_use_result() */ mysql_field_seek (res_set, 0); for (i = 0; i < mysql_num_fields (res_set); i++) { field = mysql_fetch_field (res_set); col_len = strlen (field->name); if (col_len < field->max_length) col_len = field->max_length; if (col_len < 4 && !IS_NOT_NULL (field->flags)) col_len = 4; /* 4 = length of the word "NULL" */ field->max_length = col_len; /* reset column info */ } This code calculates column widths by iterating through the MYSQL_FIELD structures for the columns in the result set. We position to the first structure by calling mysql_field_seek(). Subsequent calls to mysql_fetch_field() return pointers to the structures for successive columns. The width of a column for display purposes is the maximum of three values, each of which depends on metadata in the column information structure:
Notice that after the display width for a column is known, we assign that value to max_length, which is a member of a structure that we obtain from the client library. Is that allowable, or should the contents of the MYSQL_FIELD structure be considered read-only? Normally, I would say "read-only," but some of the client programs in the MySQL distribution change the max_length value in a similar way, so I assume that it's okay. (If you prefer an alternative approach that doesn't modify max_length, allocate an array of unsigned long values and store the calculated widths in that array.) The display width calculations involve one caveat. Recall that max_length has no meaning when you create a result set using mysql_use_result(). Because we need max_length to determine the display width of the column values, proper operation of the algorithm requires that the result set be generated using mysql_store_result(). In programs that use mysql_use_result() rather than mysql_store_result(), one possible workaround is to use the length member of the MYSQL_FIELD structure, which tells you the maximum length that column values can be. When we know the column widths, we're ready to print. Titles are easy to handle. For a given column, we simply use the column information structure pointed to by field and print the name member, using the width calculated earlier: printf (" %-*s |", (int) field->max_length, field->name); For the data, we loop through the rows in the result set, printing column values for the current row during each iteration. Printing column values from the row is a bit tricky because a value might be NULL, or it might represent a number (in which case we print it right justified). Column values are printed as follows, where row[i] holds the data value and field points to the column information: if (row[i] == NULL) /* print the word "NULL" */ printf (" %-*s |", (int) field->max_length, "NULL"); else if (IS_NUM (field->type)) /* print value right-justified */ printf (" %*s |", (int) field->max_length, row[i]); else /* print value left-justified */ printf (" %-*s |", (int) field->max_length, row[i]); The value of the IS_NUM() macro is true if the column data type indicated by field->type is one of the numeric types, such as INT, FLOAT, or DECIMAL. The final code to display the result set is as follows. Because we're printing lines of dashes multiple times, it's easier to write a print_dashes() function to do so rather than to repeat the dash-generation code several places: void print_dashes (MYSQL_RES *res_set) { MYSQL_FIELD *field; unsigned int i, j; mysql_field_seek (res_set, 0); fputc ('+', stdout); for (i = 0; i < mysql_num_fields (res_set); i++) { field = mysql_fetch_field (res_set); for (j = 0; j < field->max_length + 2; j++) fputc ('-', stdout); fputc ('+', stdout); } fputc ('\n', stdout); } void process_result_set (MYSQL *conn, MYSQL_RES *res_set) { MYSQL_ROW row; MYSQL_FIELD *field; unsigned long col_len; unsigned int i; /* determine column display widths -- requires result set to be */ /* generated with mysql_store_result(), not mysql_use_result() */ mysql_field_seek (res_set, 0); for (i = 0; i < mysql_num_fields (res_set); i++) { field = mysql_fetch_field (res_set); col_len = strlen (field->name); if (col_len < field->max_length) col_len = field->max_length; if (col_len < 4 && !IS_NOT_NULL (field->flags)) col_len = 4; /* 4 = length of the word "NULL" */ field->max_length = col_len; /* reset column info */ } print_dashes (res_set); fputc ('|', stdout); mysql_field_seek (res_set, 0); for (i = 0; i < mysql_num_fields (res_set); i++) { field = mysql_fetch_field (res_set); printf (" %-*s |", (int) field->max_length, field->name); } fputc ('\n', stdout); print_dashes (res_set); while ((row = mysql_fetch_row (res_set)) != NULL) { mysql_field_seek (res_set, 0); fputc ('|', stdout); for (i = 0; i < mysql_num_fields (res_set); i++) { field = mysql_fetch_field (res_set); if (row[i] == NULL) /* print the word "NULL" */ printf (" %-*s |", (int) field->max_length, "NULL"); else if (IS_NUM (field->type)) /* print value right-justified */ printf (" %*s |", (int) field->max_length, row[i]); else /* print value left-justified */ printf (" %-*s |", (int) field->max_length, row[i]); } fputc ('\n', stdout); } print_dashes (res_set); printf ("%lu rows returned\n", (unsigned long) mysql_num_rows (res_set)); } The MySQL client library provides several ways of accessing the column information structures. For example, the code in the preceding example accesses these structures several times using loops of the following general form: mysql_field_seek (res_set, 0); for (i = 0; i < mysql_num_fields (res_set); i++) { field = mysql_fetch_field (res_set); ... } However, the mysql_field_seek() / mysql_fetch_field() combination is only one way of getting MYSQL_FIELD structures. See the entries for the mysql_fetch_fields() and mysql_fetch_field_direct() functions in Appendix G, for other ways of accessing column information structures.
Encoding Special Characters and Binary DataIf a program executes statements entered by the user, you can assume either that those statements are legal or that the program can simply report an error to the user. For example, a user who wants to include a quote character within a quoted string must either double the quote or precede it by a backslash: 'O''Malley' 'O\'Malley' Applications that construct their own statements must take the same precautions. This section describes how to handle quoting issues in string values and how to work with binary data. Working with Strings That Contain Special CharactersIf inserted literally into a statement, data values containing quotes, nulls, or backslashes can cause problems when you try to execute the statement. The following discussion describes the nature of the difficulty and how to solve it. Suppose that you want to construct a SELECT statement based on the contents of the null-terminated string pointed to by the name_val variable: char stmt_buf[1024]; sprintf (stmt_buf, "SELECT * FROM mytbl WHERE name='%s'", name_val); If the value of name_val is something like O'Malley, Brian, the resulting statement is illegal because a quote appears inside a quoted string: SELECT * FROM mytbl WHERE name='O'Malley, Brian' You need to treat the quote specially so that the server doesn't interpret it as the end of the name. The standard SQL convention for doing this is to double the quote within the string. MySQL understands that convention, and also allows the quote to be preceded by a backslash, so you can write the statement using either of the following formats: SELECT * FROM mytbl WHERE name='O''Malley, Brian' SELECT * FROM mytbl WHERE name='O\'Malley, Brian' To deal with this problem, use mysql_real_escape_string(), which encodes special characters to make them usable in quoted strings. Characters that mysql_real_escape_string() considers special are the null character, single quote, double quote, backslash, newline, carriage return, and Ctrl-Z. (The last one is special on Windows, where it often signifies end-of-file.) When should you use mysql_real_escape_string()? The safest answer is "always." However, if you're sure of the format of your data and know that it's okayperhaps because you have performed some prior validation check on ityou need not encode it. For example, if you are working with strings that you know represent legal phone numbers consisting entirely of digits and dashes, you don't need to call mysql_real_escape_string(). Otherwise, you probably should. mysql_real_escape_string() encodes problematic characters by turning them into two-character sequences that begin with a backslash. For example, a null byte becomes '\0', where the '0' is a printable ASCII zero, not a null. Backslash, single quote, and double quote become '\\', '\'', and `\"'. To use mysql_real_escape_string(), invoke it like this: to_len = mysql_real_escape_string (conn, to_str, from_str, from_len); mysql_real_escape_string() encodes from_str and writes the result into to_str. It also adds a terminating null, which is convenient because you can use the resulting string with functions such as strcpy(), strlen(), or printf(). from_str points to a char buffer containing the string to be encoded. This string may contain anything, including binary data. to_str points to an existing char buffer where you want the encoded string to be written; do not pass an uninitialized or NULL pointer, expecting mysql_real_escape_string() to allocate space for you. The length of the buffer pointed to by to_str must be at least (from_len*2)+1 bytes long. (It's possible that every character in from_str will need encoding with two characters; the extra byte is for the terminating null.) from_len and to_len are unsigned long values. from_len indicates the length of the data in from_str; it's necessary to provide the length because from_str may contain null bytes and cannot be treated as a null-terminated string. to_len, the return value from mysql_real_escape_string(), is the actual length of the resulting encoded string, not counting the terminating null. When mysql_real_escape_string() returns, the encoded result in to_str can be treated as a null-terminated string because any nulls in from_str are encoded as the printable '\0' sequence. To rewrite the SELECT-constructing code so that it works even for name values that contain quotes, we could do something like this: char stmt_buf[1024], *p; p = strcpy (stmt_buf, "SELECT * FROM mytbl WHERE name='"); p += strlen (p); p += mysql_real_escape_string (conn, p, name_val, strlen (name_val)); *p++ = '\''; *p = '\0'; Yes, that's ugly. To simplify the code a bit, at the cost of using a second buffer, do this instead: char stmt_buf[1024], buf[1024]; (void) mysql_real_escape_string (conn, buf, name_val, strlen (name_val)); sprintf (stmt_buf, "SELECT * FROM mytbl WHERE name='%s'", buf); It's important to make sure that the buffers you pass to mysql_real_escape_string() really exist. Consider the following example, which violates that principle: char *from_str = "some string"; char *to_str; unsigned long len; len = mysql_real_escape_string (conn, to_str, from_str, strlen (from_str)); What's the problem? to_str must point to an existing buffer, and it doesn'tit's not initialized and may point to some random location. Don't pass an uninitialized pointer as the to_str argument to mysql_real_escape_string() unless you want it to stomp merrily all over some random piece of memory. Working with Binary DataAnother problematic situation involves the use of arbitrary binary data in a statement. This happens, for example, in applications that store images in a database. Because a binary value may contain any character (including null bytes, quotes, or backslashes), it cannot be considered safe to put into a statement as is. mysql_real_escape_string() is essential for working with binary data. This section shows how to do so, using image data read from a file. The discussion applies to any other form of binary data as well. Suppose that you want to read images from files and store them in a table named picture, along with a unique identifier. The MEDIUMBLOB type is a good choice for binary values less than 16MB in size, so you could use a table specification like this: CREATE TABLE picture ( pict_id INT NOT NULL PRIMARY KEY, pict_data MEDIUMBLOB ); To actually get an image from a file into the picture table, the following function, load_image(), does the job, given an identifier number and a pointer to an open file containing the image data: int load_image (MYSQL *conn, int id, FILE *f) { char stmt_buf[1024*1000], buf[1024*10], *p; unsigned long from_len; int status; /* begin creating an INSERT statement, adding the id value */ sprintf (stmt_buf, "INSERT INTO picture (pict_id,pict_data) VALUES (%d,'", id); p = stmt_buf + strlen (stmt_buf); /* read data from file in chunks, encode each */ /* chunk, and add to end of statement */ while ((from_len = fread (buf, 1, sizeof (buf), f)) > 0) { /* don't overrun end of statement buffer! */ if (p + (2*from_len) + 3 > stmt_buf + sizeof (stmt_buf)) { print_error (NULL, "image is too big"); return (1); } p += mysql_real_escape_string (conn, p, buf, from_len); } *p++ = '\''; *p++ = ')'; status = mysql_real_query (conn, stmt_buf, (unsigned long) (p - stmt_buf)); return (status); } load_image() doesn't allocate a very large statement buffer (1MB), so it works only for relatively small images. In a real-world application, you might allocate the buffer dynamically based on the size of the image file. Getting an image value (or any binary value) back out of a database isn't nearly as much of a problem as putting it in to begin with. The data value is available in raw form in the MYSQL_ROW variable, and the length is available by calling mysql_fetch_lengths(). Just be sure to treat the value as a counted string, not as a null-terminated string. |