Section 6.4. Extending PMD

6.4. Extending PMD

PMD's default rulesets serve as a solid foundation for developing new rulesets to find common insecure coding practices. Initial rules should target code that's high-risk and easily exploitable. Dynamically building SQL statements with user-controllable input is a good example of high-risk code commonly vulnerable to SQL injection. Rule implementations should be adaptable to new or previously unseen custom code. This is an important feature, as web applications differ in how they implement common functionality, such as authentication, authorization, and data access.

What follows is a walkthrough of a web application security rule that flags symptom code commonly vulnerable to SQL injectionSQL select statements concatenated with user-controllable input. The DynSqlSelectStmts class implements the rule logic and is located in the net.sourceforge.pmd.rules.web.security package. This implementation doesn't cover every potential instance of dynamic SQL. It serves only as a guide for writing future security rules that target a variety of symptom code.

6.4.1. Objectives

The primary objective of DynSqlSelectStmts is to identify and report dynamically built SQL statements embedded in Java code. For each instance of dynamic SQL, the class identifies and reports any concatenated expressions, such as variables and method calls that return data. Then the rule traces each expression to determine whether they are linked to sources of user-controllable input. Ultimately, the rule generates a list of PMD security violations that serve as a test plan for manually verifying SQL Injection vulnerabilities.

For the purposes of this rule, method arguments/parameters in the source are considered user-controllable input. Therefore, data tracing from identified symptom code to sources of user-controllable input is limited to the scope of a single method.

Consider the simple doGet method in Example 6-1. Based on the outlined objectives, the class should accomplish the following, in the order shown:

Identify and report the dynamic SQL statement.
Identify the concatenated variable id, and trace it back to the request object (i.e., user-controllable input).
Halt the trace upon reaching the request object and report the finding.

The request object is a source of user-controllable input and therefore warrants close manual inspection, especially if it's related to a security violation PMD has reported. In the big picture of code review, do not forget to investigate the response object for potential security vulnerabilities.

Example 6-1. Simplified doGet method

public void doGet(HttpServletRequest request, HttpServletResponse response)
                               throws ServletException, IOException {
 
    ...  
    String id = request.getParameter("id");
    ...  
    String strSql = "SELECT * FROM USERS WHERE ID = '" + id + "'";

6.4.2. Code Walkthrough

The following code walkthrough includes only the code most relevant to the functionality of the DynSqlSelectStmts class. To view the source code in its entirety, see Example 6-6 later in this chapter.

The SqlInjectionExample.java example source file (refer to Example 6-5) is referenced throughout the code walkthrough to help you better understand the class implementation of DynSqlSelectStmts.

The DynSqlSelectStmts class imports the net.sourceforge.pmd.ast package, much like other PMD rules. Where this rule differs is the org.apache.regexp package, which provides an API for building regular expressions.

package net.sourceforge.pmd.rules.web.security;

import net.sourceforge.pmd.ast.*;

import org.apache.regexp.*;
import java.util.*;
import java.text.MessageFormat;

You can download the regexp package from the Apache Jakarta Project web site at http://jakarta.apache.org. You can substitute this package for any other available Java API supporting regular expressions.

Similar to other PMD rules, DynSqlSelectStmts extends the AbstractRule base class. The debug class variable controls the printing of debug statements to standard out at runtime:

public class DynSqlSelectStmts extends AbstractRule {
    
    private static boolean debug = true;

If you're working with PMD for the first time, you'll want to leave debug set to true.

The PATTERN variable is initialized with the select.+from stringa regular expression to identify SQL select statements. The syntax of this regular expression matches strings containing the words select and from, with one or more of any character in between. You can adapt this regular expression to match other types of dynamic SQL, such as insert into or update statements.

    private static final String PATTERN = "select.+from";

At first glance, the select.+from pattern seems like it will generate a large number of false positives. For example, commented code or HTML could contain strings with a similar pattern. Unlike pattern-matching tools, PMD can minimize these false positives. By generating an AST from the source file, our class can drill down on specific code (such as string concatenation), look at surrounding nodes for additional symptoms, and more accurately report whether a pattern is potentially vulnerable to SQL injection.

The next set of variables store data about the current method being visited in the AST. For simplicity these are prefixed with the description currMeth:

    private String currMethName;
    private int currMethXsVis;
    private Map currMethParams;
    private String currMethSymptomCode;
    private List currMethExprsToChase;
    private List currMethVarsChased;

Here is a brief description of each variable:

currMethName

String representing the method name.

currMethXsVis

Maintains the number of visits to the method.

currMethParams

Stores the name and type of each method parameter as a set of key/value pairs.

currMethSymptomCode

String representing the SQL select statement potentially vulnerable to SQL Injection.

currMethExprsToChase

List of expressions (variables, method calls returning data, etc.) concatenated to the identified SQL statement.

currMethVarsChased

Maintains an ordered list of expressions that are initialized when tracing from symptom code to sources of user-controllable input.

Refer back to the doGet method in Example 6-1. After tracing from the SQL select statement to request.getParameter (user-controllable input), currMethVarsChased would contain the following:

Position

Value

0

request.getParameter

1

id

The DynS qlSelectStmts class implements visit methods for each node of interest when traversing the AST. The remaining code walkthrough focuses on the visit method for each of the following AST nodes:

ASTCompilationUnit
ASTClassBodyDeclaration
ASTMethodDeclaration
ASTMethodDeclarator
ASTAdditiveExpression

Relevant sections of the SqlInjectionExample.java AST are shown with each visit method to help you follow the logic (refer to Example 6-5 for the full source code of this file). In addition, you might want to have the following:

EBNF grammar reference^[2] (useful for understanding the structure of the AST)
^[2] You can review the EBNF grammar reference at http://cvs.sourceforge.net/viewcvs.py/pmd/pmd/etc/grammar/Java1.4-c.jjt?rev=1.3&content-type=text/vnd.viewcvs-markup.
AST Viewer and Designer utilities included with your downloaded PMD distribution (paste the Java source into these utilities to see the full AST)
PMD API documentation

6.4.2.1 ASTCompilationUnit

Here is the relevant node for ASTCompilationUnit from the SqlInjectionExample.java AST:

     CompilationUnit

The ASTCompilationUnit node is always the first node and the code contained in its visit method is executed for each source file scanned:

public Object visit(ASTCompilationUnit node, Object data)
{
         getInfo(node);
         printDebug("Rule: " + this.getName( ) + "\n\n");
         return super.visit(node,data);
}

Every visit method in the class begins with a call to getInfo, which retrieves the class name and scope of the node and prints this information to standard out. The printDebug method prints debug statements to standard out (if debug is set to true). To programmatically enable and disable debugging code, the setDebug convenience method is provided. By returning with a call to visit of the super class, DynSqlSelectStmts continues analyzing child nodes.

To make the code more readable, subsequent debug statements have been removed.

6.4.2.2 ASTClassBodyDeclaration

Here are the relevant nodes for ASTClassBodyDeclaration from the SqlInjectionExample.java AST:

     ClassBodyDeclaration
      MethodDeclaration:(public)

Because the identification of symptom code is on a per-method basis, visit looks at the immediate child node of ASTClassBodyDeclaration to check whether it is an instance of ASTMethodDeclaration. If it is not a method declaration, it returns null to avoid unnecessary visits to children nodes.

    public Object visit(ASTClassBodyDeclaration node, Object data) 
    {
        getInfo(node);
        if (!(node.jjtGetChild(0) instanceof ASTMethodDeclaration))
        {
            return null;
        } 
        this.init( );

When traversing the AST, calls to getInfo are handy if you're unsure of the current node. To see this information printed to standard out, you must set debug to true.

Otherwise it continues with a call to init in preparation for the upcoming method. This init( ) method is shown in Example 6-2.

Example 6-2. init( ) method

private void init ( )
{
currMethName = "";
    currMethXsVis = 0;
    currMethParams = new HashMap( );
    currMethSymptomCode = "";        
    currMethExprsToChase = new ArrayList( );    
    currMethVarsChased = new LinkedList( );
}

By returning with a call to visit of the super class, DynSqlSelectStmts continues analyzing child nodes. The previous conditional statement guarantees the next visit is to ASTMethodDeclaration:

         return super.visit(node,data);
    }

6.4.2.3 ASTMethodDeclaration

Here are the relevant nodes for ASTMethodDeclaration from the SqlInjectionExample.java AST:

      MethodDeclaration:(public)
       ResultType
       MethodDeclarator:doGet
        FormalParameters
         FormalParameter:(package private)
          Type:
           Name:HttpServletRequest
          VariableDeclaratorId:request
         FormalParameter:(package private)
          Type:
           Name:HttpServletResponse
          VariableDeclaratorId:response

The ASTMethodDeclaration node marks the beginning of the current method in the source:

public Object visit(ASTMethodDeclaration node, Object data) 
{
        getInfo(node);

On every visit to this node, currMethXsVis is incremented by 1. If it's the first visit the name of the current method is retrieved:

        currMethXsVis++;
        
        if (currMethXsVis == 1)
        {
            currMethName = ((ASTMethodDeclarator)node.jjtGetChild(1)).getImage( );
        }

When currMethXsVis is greater than 1, the class is in the midst of its data-chasing logici.e., tracing from symptom code to sources of user input. To avoid confusion, let's assume this is the first visit to the current method and defer an explanation of the else block until later:

        else 
        {    
            List locVarDecList =                 
        (ArrayList)node.findChildrenOfType(ASTLocalVariableDeclaration.class);
            for (Iterator j = locVarDecList.iterator( ); j.hasNext( );) 
            {
                if (currMethExprsToChase.size( ) > 0)
                   chkLocVarsForUCI((ASTLocalVariableDeclaration)j.next( ),data);
                else
                   break;
            }
            
            return null;
        }

As mentioned before, a call to visit of the super class ensures our class traverses the children of this node:

        return super.visit(node,data);
    }

6.4.2.4 ASTMethodDeclarator

Here are the relevant nodes for ASTMethodDeclarator from the SqlInjectionExample.java AST:

       MethodDeclarator:doGet
        FormalParameters
         FormalParameter:(package private)
          Type:
           Name:HttpServletRequest
          VariableDeclaratorId:request
         FormalParameter:(package private)
          Type:
           Name:HttpServletResponse
          VariableDeclaratorId:response

On the first visit to ASTMethodDeclarator, visit retrieves a list of method arguments/parameters with a call to getCurrMethParams:

    public Object visit(ASTMethodDeclarator node, Object data) 
    {
        getInfo(node);
        
        if (currMethXsVis == 1)
        {
            getCurrMethParams(node);
        }

getCurrMethParams begins with a call to getParameterCount, which returns the parameter count for the method in scope. If this number is greater than zero, the code retrieves each parameter represented by the ASTFormalParameter class and stores the name and type as key/value pairs in currMethParams. This list of parameters represents the sources of user-controllable input for the current method.

private void getCurrMethParams (ASTMethodDeclarator node)
{ 
 if (node.getParameterCount( ) > 0) 
 {
  List methodParams = node.findChildrenOfType(ASTFormalParameter.class);
  for (Iterator i = methodParams.iterator( );i.hasNext( );) 
  {
   ASTFormalParameter p = (ASTFormalParameter)i.next( );    
   ASTName pType = (ASTName)p.jjtGetChild(0).jjtGetChild(0);
   ASTVariableDeclaratorId pName = (ASTVariableDeclaratorId)p.jjtGetChild(1);
currMethParams.put(pName.getImage( ),pType.getImage( ));
  }
 }
}

After calling getCurrMethParams, the visit method resumes execution with a call to visit of the super class:

    return super.visit(node,data);
    }

With a list of all user-controllable input for the current method, the class could trace each parameter through the AST. The data paths for some of the parameters might lead to symptom code. Methods with a large number of parameters and the likelihood for multiple complex data paths make this approach inefficient. Instead, the class takes a more direct approach by targeting indicators of high-risk code firsti.e., the symptomatic code approach. It navigates down the AST, visiting ASTAdditiveExpression nodes, because these are indicators of string concatenation and, more specifically, the dynamic building of SQL statements. Further analysis is required to confirm this assumption as well as to chase any expressions concatenated to sources of user-controllable input. A benefit of this approach is that methods without parameters are still analyzed for the presence of dynamic SQL.

Pursuing indicators of high-risk code, as opposed to vulnerabilities themselves, enables the tester (and not the tool) to decide if code is vulnerable and exploitable. The added functionality of tracing from symptom code to user-controllable input is valuable because:

It saves the tester from having to perform this task manually.
It provides information about the data path to further assist the tester when making decisions about the exploitability of a piece of code.

6.4.2.5 ASTAdditiveExpression

Here are relevant nodes for ASTAdditiveExpression from the SqlInjectionExample.java AST:

AdditiveExpression:+
  PrimaryExpression
   PrimaryPrefix
    Literal:"SELECT * FROM USERS WHERE ID = '"
  PrimaryExpression
   PrimaryPrefix
    Name:id
  PrimaryExpression
   PrimaryPrefix
    Literal:"'"

To hone in on dynamic SQL, the class visits ASTAdditiveExpression nodes and its children ASTLiteral and ASTName nodes. The visit method begins by searching down the AST for ASTLiteral nodes becausethey are likely to contain SQL strings. If it finds any, the code extracts the string stored by the node and passes this value to isMatch:

public Object visit(ASTAdditiveExpression node, Object data) {
        
        getInfo(node);
        
        List literals = node.findChildrenOfType(ASTLiteral.class);
        
        for (Iterator l = literals.iterator( ); l.hasNext( );) 
        {
            ASTLiteral astLiteral = (ASTLiteral)l.next( );
            String literal = astLiteral.getImage( );
            if (literal != null && isMatch(literal))
            {

The isMatch method relies on the regular expression stored in the PATTERN variable to detect the presence of SQL select statements. The org.apache.regexp.RE class creates the regular expression and matches it against each literal. The method sets the case-independent flag because the case sensitivity of SQL statements often varies with code implementation. A successful match returns the Boolean TRue, indicating the existence of dynamic SQL in the source:

private boolean isMatch(String literal)
{
    boolean match = false;
     
    RE sql = new RE(PATTERN);
     
    sql.setMatchFlags(RE.MATCH_CASEINDEPENDENT);
        
    return sql.match(literal);
}

When isMatch returns true, the class prepares to add a security violation to the PMD report. The SQL literal is stored for future reference and is added to the message of the current security violation:

                    RuleContext ctx = (RuleContext) data;
                    currMethSymptomCode = literal;
                    String msg = MessageFormat.format(getMessage( ), new
                        Object[]{"SQL select statement detected: " + 
                            currMethSymptomCode});

The format method of java.text.MessageFormat customizes the generic message in dynamicsql.xml, as in Example 6-3, by including the identified symptom code, which in this case is an SQL select statement.

Example 6-3. Snippet from dynamicsql.xml

<ruleset>
    <rule name="DynSqlSelectStmts" message="'' {0} ''" class="net.sourceforge.pmd.
rules.web.security.DynSqlSelectStmts">
...

The next line of code actually adds the security violation to the PMD report:

ctx.getReport( ).addRuleViolation(createRuleViolation(ctx, 
   astLiteral.getBeginLine( ), msg));

At this point the class implementation satisfies its primary objective: to identify and report dynamically built SQL statements. The next task is to identify expressions concatenated to the dynamic SQL and determine whether they contain user-controllable input. Examples of these expressions include method parameters, local variables, and methods calls that return data, each a potential source of user-controllable input. Examples from the doGet method (Example 6-1) include the following:

Method parameter

Request

Local variable

id

Method that returns data

request.getParameter

In the AST, ASTName nodes represent these expressions and are therefore retrieved for analysis:

List names = node.findChildrenOfType(ASTName.class);

If the list size is greater than zero, the entire list is passed to chkForUCI to determine whether any of the expressions are a source of user-controllable input:

if ( names.size( ) > 0 )
{
     ArrayList uci = chkForUCI(names);

The chkForUCI method, shown in Example 6-4, compares each ASTName node to those stored in the currMethParams class variable. Although the nodes can refer to the same instance of an object, they are not always identical expressions. For example, a method parameter named request of type HttpServletRequest could appear in an ASTName node in these forms: request.getParameter, request.getQueryString, request.getCookies, request.getHeader, etc. To determine whether these represent sources of user-controllable input, the class could compare them against a list of HttpServletRequest methods known to retrieve user-controllable input from an HTTP request. While exact-match comparisons are ideal for well-known objects (such as HttpServletRequest), the technique falls short when looking for representations of user-controllable input with unfamiliar or custom objects. Instead, the org.apache.regexp.RE regular expression evaluator class is used to compare method parameters (i.e., request) to specific uses of those objects (i.e., request.getParameter). The most effective approach is a combination of exact match and regular expression comparisons.

Example 6-4. chkForUCI( )

private ArrayList chkForUCI(List names) 
{ 
 ArrayList uci = new ArrayList( );
 for (Iterator i = names.iterator( );i.hasNext( );) 
 {
  ASTName name = (ASTName)i.next( );
  for (Iterator j = currMethParams.keySet( ).iterator( );                                 
    j.hasNext( );) 
  {
   String currMethParam = (String)j.next( );
   RE re = new RE (currMethParam);
   if ( re.match(name.getImage( )) ) 
   {
    uci.add(name);
    break;
   }
  }
 }
  
 return uci;
}

chkForUCI returns a list of ASTName nodes that represent user-controllable input linked to the previously identified SQL select statement. These symptoms point to the existence of a potentially exploitable SQL Injection vulnerability in the source.

Next, the code reports the security violation along with the appended user-controllable input, which is similar to that already described. At this point, the rule has satisfied the objective: to identify and report user-controllable input concatenated to dynamic SQL statements.

if ( ! uci.isEmpty( ) )
{
  // Report the violation
}

The following SQL statement would be reported as a potentially exploitable SQL Injection vulnerability.

String strSql = "select * from user where USER_ID = '" + request.getParameter("id") + "'";

If chkForUCI returns an empty list, none of the expressions concatenated to the SQL statement represents immediate sources of user-controllable input (for example, id in the previous AST). However, these expressions might be on a data path that traces back to user-controllable input. To kick off the data-tracing logic, the code stores the expressions (ASTName nodes) into currMethExprsToChase and revisits the ASTMethodDeclaration node (refer to the next section, Section 6.4.2.6, to step through this code):

 else 
 {
   currMethExprsToChase = new ArrayList(names);
   visit( (ASTMethodDeclaration)         
node.getFirstParentOfType(ASTMethodDeclaration.class),data);

When visit returns, the data-chasing logic is complete for the expressions in currMethExprsToChase. The calls to init() and super.visit mark the end of analysis for the method in scope and allow the class to visit the next available ASTClassBodyDeclaration node:

                            this.init( );
                        }
                    }
                }
            }
        }
        
        return super.visit(node,data);
    }

6.4.2.6 Data tracing

The data-tracing logic presented in this section follows data paths that are linked by consecutive variable initializations. Demonstrating this technique should give you an idea of how to implement data tracing for other potential scenarios.

Here are the relevant nodes from the SqlInjectionExample.java AST:

 LocalVariableDeclaration:(package private)
  Type:
   Name:String
  VariableDeclarator
   VariableDeclaratorId:id
   VariableInitializer
    Expression
     PrimaryExpression
      PrimaryPrefix
       Name:request.getParameter

Revisiting the ASTMethodDeclaration method diverts execution to the else block, which retrieves a list of ASTLocalVariableDeclaration nodes and passes each to chkLocVarsForUCI:

    public void chkLocVarsForUCI(ASTLocalVariableDeclaration node, Object data) 
    {

This method retrieves the name of the local variable declaration from ASTVariableDeclaratorId (id in the AST) and stores it in varName:

        ASTVariableDeclarator varDec = (ASTVariableDeclarator)
                                                 node.jjtGetChild(1);
        String varName =
                ((ASTVariableDeclaratorId)varDec.jjtGetChild(0)).getImage( );

Then the code looks for the expression initializing the local variable (such as request.getParameter in the AST). If an ASTName node is found, the method stores the expression into initExp; otherwise, it returns to visit to analyze the remaining ASTLocalVariableDeclaration nodes:

        ASTVariableInitializer varInit =
                         (ASTVariableInitializer)varDec.jjtGetChild(1);
        
        If (varInit.findChildrenOfType(ASTName.class).get(0) instanceof
                                                            ASTName) 
        {
            ASTName initExp = (ASTName) 
                           varInit.findChildrenOfType(ASTName.class).get(0);
        } else {
            return;
        }

Assuming an ASTName node is retrieved, the code iterates over currMethExprsToChase (which would contain id after visiting ASTAdditiveExpression), comparing each expression to the local variable stored in varName (id in this case). A match means the class found the initialization of the expression concatenated to the dynamic SQL:

        boolean chase = false;
        boolean srcOfUCI = false;
        int cnt = 0;
        int index = 0;
        for (Iterator i = currMethExprsToChase.iterator( ); i.hasNext( );) 
        {
            ASTName currNode = (ASTName)i.next( );
            if ( currNode.getImage( ).matches(varName) )
            {

The chase Boolean variable controls whether additional data tracing is required (i.e., the initializing expression for the local variable is not user-controllable input) and srcOfUCI TRiggers the reporting code if the initializing expression is a source of user-controllable input. The cnt integer tracks the current position in the currMethExprsToChase array. index stores the value of cnt when either chase or srcOfUCI is set to true.

If varName matches the name of an expression in currMethExprsToChase, the variable is added to the end of currMethVarsChased and the initializing expression initExp (request.getParameter in this case) is checked as a source of user-controllable input. This implementation of chkForUCI is an overloaded version of the previously discussed chkForUCI. It takes a single ASTName node as an argument and returns a string containing the user-controllable input, if the passed-in node matches one in currMethParams (as a result of visiting ASTMethodDeclarator, currMethParams would contain the request object and match the initializing expression request.getParameter identifying it as a source of user-controllable input):

                ((LinkedList)currMethVarsChased).addLast(currNode.getImage( ));
                String uci = chkForUCI(initExp);

Given that uci is not null, srcOfUCI is set to true, triggering the following block of code that reports initExp as user-controllable input. The index integer stores the current position in the currMethExprsToChase array so that the previously matched expression (id in this case) can be removed, as it no longer needs to be chased. The break keyword exits the loop.

                if (uci != null)
                {        
                    srcOfUCI = true;
                    index = cnt;
                    break;
                }

If uci is null (i.e., initExp is not a source of user-controllable input), chase is set to true, which repeats the data-tracing code for initExp. Similar to the preceding if block, the index integer stores the current position in the currMethExprsToChase array so that its contents can be replaced with initExp, as this initializing expression now needs to be chased. The break keyword exits the loop.

                else 
                {
                    chase = true;
                    index = cnt;
                    break;                }
            }
        }
        cnt++;
    }

If srcOfUCI is true, the local variable initialized with initExp is removed from currMethExprsToChase and initExp is added to the end of currMethVarsChased. The initializing expression is also added to the PMD report as a source of user-controllable input, making the previously identified dynamic SQL statement a likely SQL Injection candidate.

Remember, you can verify this vulnerability with access to a live instance of the application.

    if (srcOfUCI)
    {
        ((ArrayList)currMethExprsToChase).remove(index);
                
        ((LinkedList)currMethVarsChased).addLast(initExp.getImage( ));
        
        // Report the violation
        
        currMethVarsChased = new LinkedList( );
            
    }

If chase is TRue, currMethExprsToChase is updated with initExp and the data-chasing logic is repeated with a new call to the ASTMethodDeclaration visit method. This last method call of the data-chasing routine ensures that the rule continues to trace variable initializations until the original source of user-controllable input is found.

    else if (chase)
    {
        ((ArrayList)currMethExprsToChase).remove(index);

        ((ArrayList)currMethExprsToChase).add(index,initExp);

        visit( 
            (ASTMethodDeclaration)node.getFirstParentOfType
                                (ASTMethodDeclaration.class), data);
    }
}

To illustrate this new rule in action, Figure 6-2 shows the report PDM generated when scanning SqlInjectionExample.java.

Figure 6-2. PMD report for SqlInjectionExample.java

In summary, the DynSqlSelectStmts class is designed to help testers find exploitable SQL Injection vulnerabilities by flagging instances of dynamic SQL and tracing backward to determine whether the symptom code is tied to sources of user-controllable input. The concepts, ideas, and code examples provided in this chapter should supply the groundwork for building future security rules that target a variety of symptom code, regardless of the static code analysis tool you use.

6.4.3. SqlInjectionExample.java

Example 6-5 provides the full source code of the SqlInjectionExample.java example discussed in this chapter.

Example 6-5. Source code for SqlInjectionExample.java

import java.io.*;
import java.sql.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class SqlInjectionExample extends HttpServlet {

  public void doGet(HttpServletRequest request, HttpServletResponse response)
                               throws ServletException, IOException {
    Connection con = null;
    Statement stmt = null;
    ResultSet rs = null;

    response.setContentType("text/html");
    PrintWriter out = response.getWriter( );

    String id = request.getParameter("id");

    try {

      Class.forName("oracle.jdbc.driver.OracleDriver");

      con = DriverManager.getConnection(
        "jdbc:oracle:thin:@dbhost:1521:ORCL", "user", "passwd");

      String strSql = "SELECT * FROM USERS WHERE ID = '" + id + "'";

      stmt = con.createStatement( );

      rs = stmt.executeQuery(strSql);

      out.println("<HTML><HEAD><TITLE>SqlInjectionExample</TITLE></HEAD>");
      out.println("<BODY>");
      while(rs.next( )) {
        out.println(rs.getString("firstname") + "&nbsp;" + rs.getString("lastname"));
      }
      out.println("</BODY></HTML>");
    }
    catch(ClassNotFoundException e) {
      out.println("Couldn't load database driver: " + e.getMessage( ));
    }
    catch(SQLException e) {
      out.println("SQLException caught: " + e.getMessage( ));
    }
    finally {

      try {
        if (con != null) con.close( );
      }
      catch (SQLException ignored) { }
    }
  }
}

6.4.4. DynSqlSelectStmts.java

Example 6-6 provides the full source code of the DynSqlSelectStmts.java example discussed in this chapter.

Example 6-6. Source code for DynSqlSelectStmts.jav

package net.sourceforge.pmd.rules.web.security;

import net.sourceforge.pmd.AbstractRule;
import net.sourceforge.pmd.ast.*;
import net.sourceforge.pmd.RuleContext;
import org.apache.regexp.*;
import java.util.*;
import java.text.MessageFormat;

public class DynSqlSelectStmts extends AbstractRule {

  private static boolean debug = true;

  private static final String PATTERN = "select.+from";

  private String currMethName;
  private int currMethXsVis;
  private Map currMethParams;
  private String currMethSymptomCode;
  private List currMethExprsToChase;
  private List currMethVarsChased;

  private void init ( )
  {
    currMethName = "";
    currMethXsVis = 0;
    currMethParams = new HashMap( );
    currMethSymptomCode = "";
    currMethExprsToChase = new ArrayList( );
    currMethVarsChased = new LinkedList( );
  }

  public void setDebug (boolean x)
  {
    debug = x;
  }

  public void printDebug (String str)
  {
    if (debug)
      System.out.print(str + "\n");
  }
  public Object visit(ASTCompilationUnit node, Object data)
  {
    getInfo(node);
    printDebug("Rule: " + this.getName( ) + "\n\n");
    return super.visit(node,data);
  }
  public Object visit(ASTClassBodyDeclaration node, Object data)
  {
    getInfo(node);

    if (!(node.jjtGetChild(0) instanceof ASTMethodDeclaration))
    {
      return null;
    }

    this.init( );

        return super.visit(node,data);
  }

  public Object visit(ASTMethodDeclaration node, Object data)
  {
      getInfo(node);
      currMethXsVis++;
      printDebug ("Number of visits to " + node.getClass( ).getName( ) + ": " + currMethXsVis +
        "\n");

      if (currMethXsVis == 1)
      {
        currMethName = ((ASTMethodDeclarator)node.jjtGetChild(1)).getImage( );
        printDebug ("Current Method: " + currMethName + "\n");
      }

      else
      {
        List locVarDecList = (ArrayList)node.findChildrenOfType
           (ASTLocalVariableDeclaration.class);
        for (Iterator j = locVarDecList.iterator( ); j.hasNext( );)
        {
          if (currMethExprsToChase.size( ) > 0)
            chkLocVarsForUCI((ASTLocalVariableDeclaration)j.next( ),data);
          else
            break;
        }

        return null;
      }

      return super.visit(node,data);
    }

  public Object visit(ASTMethodDeclarator node, Object data)
  {
    getInfo(node);

    if (currMethXsVis == 1)
    {
      getCurrMethParams(node);
      printCurrMethParams( );
    }
    return super.visit(node,data);
  }


  public Object visit(ASTAdditiveExpression node, Object data)
  {

    getInfo(node);

    List literals = node.findChildrenOfType(ASTLiteral.class);

      for (Iterator l = literals.iterator( ); l.hasNext( );)
      {
        ASTLiteral astLiteral = (ASTLiteral)l.next( );
        String literal = astLiteral.getImage( );
        printDebug("Literal: " + literal + "\n");

        if (literal != null && isMatch(literal))
        {
          RuleContext ctx = (RuleContext) data;
          currMethSymptomCode = literal;
          String msg = MessageFormat.format(getMessage( ), new Object[]
            {"SQL select statement detected: " + currMethSymptomCode});
          printDebug("Report message: " + msg + "\n");
          ctx.getReport( ).addRuleViolation(createRuleViolation
            (ctx, astLiteral.getBeginLine( ), msg));

          // Look for expression(s) other than literals appended to SQL
          List names = (ArrayList) node.findChildrenOfType(ASTName.class);
          if ( names.size( ) > 0 )
          {
            // Check whether the appended expression(s) are UCI
            List uci = chkForUCI(names);
            if ( ! uci.isEmpty( ) )
            {
              for (Iterator i = uci.iterator( );i.hasNext( );)
              {
                ASTName n = (ASTName)i.next( );
                msg = MessageFormat.format(getMessage( ), new Object[]
                 {"SQL select statement detected with UCI: " + n.getImage( )});
                printDebug("Report message: " + msg + "\n");
                ctx.getReport( ).addRuleViolation
                  (createRuleViolation(ctx, astLiteral.getBeginLine( ), msg));
              }
            }

            /*
             * Expression(s) appended to SQL are not immediate source of UCI
             * Re-visit method declaration to begin logic for finding initializer of UCI
             */

            else
            {
              printDebug ("Expression(s) appended to SQL are not immediate source of 
                UCI\n\n");
              currMethExprsToChase = new ArrayList(names);
              printDebug("*** Begin expression chasing routine *** \n\n");
              visit( (ASTMethodDeclaration) node.getFirstParentOfType
                  (ASTMethodDeclaration.class),data);
              printDebug("... Exiting from visit - ASTAdditiveExpression ...\n");
              printDebug("*** Returning from expression chasing routine ... 
                        Done with this ASTAdditiveExpression ... any more?? ***\n\n");
              this.init( );
            }
          }

        }
      }

      return super.visit(node,data);
  }

  public void chkLocVarsForUCI(ASTLocalVariableDeclaration node, Object data)
  {
    getInfo(node);

    printCurrMethExprsToChase( );

    ASTVariableDeclarator varDec = (ASTVariableDeclarator)node.jjtGetChild(1);
    String varName = ((ASTVariableDeclaratorId)varDec.jjtGetChild(0)).getImage( );
    printDebug("Local Variable Name: " + varName + "\n");

    ASTVariableInitializer varInit = (ASTVariableInitializer)varDec.jjtGetChild(1);

    ASTName initExp = null;
    if (varInit.findChildrenOfType(ASTName.class).size( ) 
        > 0 && varInit.findChildrenOfType(ASTName.class).get(0) instanceof ASTName)
    {
      initExp = (ASTName) varInit.findChildrenOfType(ASTName.class).get(0);
      printDebug("Local Variable Initializer: " + initExp.getImage( ) + "\n");
    } else {
      return;
    }

    boolean chase = false;
    boolean srcOfUCI = false;
    int cnt = 0;
    int index = 0;
    for (Iterator i = currMethExprsToChase.iterator( ); i.hasNext( );)
    {
      ASTName currNode = (ASTName)i.next( );
      printDebug("Checking: " + currNode.getImage( ) + "\n");
      if ( currNode.getImage( ).matches(varName) )
      {
        printDebug("Loc var: " + varName + " matches '" + currNode.getImage( ) + "', which is
           an expression we are currently chasing\n");
        ((LinkedList)currMethVarsChased).addLast(currNode.getImage( ));
        String uci = chkForUCI(initExp);
        if (uci != null)
        {
          printDebug("Initializing expression: " + initExp.getImage( ) + " is a source of UCI:
            [" + uci + "]\n");
          srcOfUCI = true;
          index = cnt;
          break;
        }
        else
        {
          printDebug("Need to chase the local var initializer: '" 
                    + initExp.getImage( ) + "'\n");
          chase = true;
          index = cnt;
          break;
        }
      }
      cnt++;
    }

    if (srcOfUCI)
    {
      ((ArrayList)currMethExprsToChase).remove(index);

      /* Add uci - Appending the ASTLiteral node with the expectation that the source
       * of uci is from HttpServletRequest ( i.e. something like req.getParameter("id") ).
       * This will not always be the case, and so will have to make this 
         a little more generic.
       */

      ASTLiteral lit = (ASTLiteral)node.findChildrenOfType(ASTLiteral.class).get(0);
      ((LinkedList)currMethVarsChased).addLast(initExp.getImage( ) 
        + "(" + lit.getImage( ) + ")");
      String uciChased = printCurrMethVarsChased( );

      RuleContext ctx = (RuleContext) data;
      String msg = MessageFormat.format(getMessage( ), new Object[]
        {"SQL select statement detected with UCI: " + uciChased });
      printDebug("Report message: " + msg + "\n");
      ctx.getReport( ).addRuleViolation(createRuleViolation(ctx, lit.getBeginLine( ), msg));
      currMethVarsChased = new LinkedList( );

    } else if (chase)
    {
      ((ArrayList)currMethExprsToChase).remove(index);

      ((ArrayList)currMethExprsToChase).add(index,initExp);

      visit( (ASTMethodDeclaration)node.getFirstParentOfType
        (ASTMethodDeclaration.class),data);
      printDebug("... Exiting from chkLocVarsForUCI\n");
    }


  }

  public void getInfo (SimpleNode node)
    {
    printDebug ("\n====================");

    Object o = node;
    Class c = o.getClass( );
    printDebug ("Class Name: " + c.getName( ));

    int begLine = node.getBeginLine( );
    if (begLine != 0)
    {
      printDebug("Line #: " + begLine);
    }

    }

  private void getCurrMethParams (ASTMethodDeclarator node)
  {
   if (node.getParameterCount( ) > 0)
   {
    List methodParams = node.findChildrenOfType(ASTFormalParameter.class);
    for (Iterator i = methodParams.iterator( );i.hasNext( );)
    {
     ASTFormalParameter p = (ASTFormalParameter)i.next( );
     ASTName pType =   (ASTName)p.jjtGetChild(0).jjtGetChild(0);
     ASTVariableDeclaratorId pName =   (ASTVariableDeclaratorId)p.jjtGetChild(1);
     currMethParams.put(pName.getImage( ),pType.getImage( ));
    }
   }
  }

  private void printCurrMethParams ( )
  {
    for (Iterator i = currMethParams.keySet( ).iterator( ); i.hasNext( );)
    {
        String key = (String)i.next( );
        String value = (String)currMethParams.get(key);
        printDebug ("Param Name: " + key + ", Param Type: " + value);
    }
  }

  private void printCurrMethExprsToChase ( )
  {
    printDebug ("Chasing the following expressions:\n");
    for (Iterator i = currMethExprsToChase.iterator( ); i.hasNext( );)
    {
        String value = ((ASTName)i.next( )).getImage( );
        printDebug (value + "\n");
    }
  }

  private String printCurrMethVarsChased ( )
  {
    printDebug ("Chased the following variables to UCI: " + currMethVarsChased.size( ) 
+ "\n");
    String str = "";
    for (Iterator i = currMethVarsChased.iterator( ); i.hasNext( );)
    {
        String value = (String)i.next( );
        if (i.hasNext( ))
        {
          str = str + (value + " --> ");
        }
        else
        {
          str = str + value;
        }
    }

    printDebug(str + "\n");
    return str;
  }

  private boolean isMatch(String literal)
  {
   boolean match = false;

   RE sql = new RE(PATTERN);

   sql.setMatchFlags(RE.MATCH_CASEINDEPENDENT);

   return sql.match(literal);

  }

  private List chkForUCI(List names)
  {
   List uci = new ArrayList( );
   for (Iterator i = names.iterator( );i.hasNext( );)
   {
    ASTName name = (ASTName)i.next( );
    for (Iterator j = currMethParams.keySet( ).iterator( ); j.hasNext( );)
    {
     String currMethParam = (String)j.next( );
     RE re = new RE (currMethParam);
     if ( re.match(name.getImage( )) )
     {
      uci.add(name);
      break;
     }
    }
   }
   return uci;
  }

    private String chkForUCI(ASTName name)
    {
      for (Iterator j = currMethParams.keySet( ).iterator( );                     
  j.hasNext( );)
      {
       String currMethParam = (String)j.next( );
       RE re = new RE (currMethParam);
       if ( re.match(name.getImage( )) )
       {
        return currMethParam;
       }
      }
      return null;
     }
}

6.4.5. dynamicsql.xml

Example 6-7 provides the rule file that is used with Example 6-6.

Example 6-7. Rule file used with DynSqlSelectStmts.java

<?xml version="1.0"?>

<ruleset name="Dynamic SQL Ruleset">
  <description>
This ruleset contains a collection of rules that find instances of potentially 
exploitable dynamic SQL.
  </description>

  <rule name="DynamicSqlSelectStmts"
        message="'' {0} ''"
        class="net.sourceforge.pmd.rules.web.security.DynSqlSelectStmts">
    <description>
Dynamic SQL or "string building" techniques that rely on unsanitized input values 
are potentially vulnerable to SQL Injection.
    </description>
      <priority>1</priority>
    <example>
<![CDATA[

int id = request.getParameter("id");

String sql = "select * from employees where employeeid = " + id;

]]>
    </example>
  </rule>

<!-- MORE RULES -->

</ruleset>