5.2. Text

Most programs manipulate text in one form or another, and the Java platform defines a number of important classes and interfaces for representing, formatting, and scanning text. The sections that follow provide an overview.

5.2.1. The String Class

Strings of text are a fundamental and commonly used data type. In Java, however, strings are not a primitive type, like char, int, and float. Instead, strings are represented by the java.lang.String class, which defines many useful methods for manipulating strings. String objects are immutable: once a String object has been created, there is no way to modify the string of text it represents. Thus, each method that operates on a string typically returns a new String object that holds the modified string.

This code shows some of the basic operations you can perform on strings:

// Creating strings
String s = "Now";               // String objects have a special literal syntax
String t = s + " is the time."; // Concatenate strings with + operator
String t1 = s + " " + 23.4;     // + converts other values to strings
t1 = String.valueOf('c');       // Get string corresponding to char value
t1 = String.valueOf(42);        // Get string version of integer or any value
t1 = object.toString();         // Convert objects to strings with toString()

// String length
int len = t.length();           // Number of characters in the string: 16

// Substrings of a string
String sub = t.substring(4);    // Returns char 4 to end: "is the time."
sub = t.substring(4, 6);        // Returns chars 4 and 5: "is"
sub = t.substring(0, 3);        // Returns chars 0 through 2: "Now"
sub = t.substring(x, y);        // Returns chars between pos x and y-1
int numchars = sub.length();    // Length of substring is always (y-x)

// Extracting characters from a string
char c = t.charAt(2);           // Get the 3rd character of t: w
char[] ca = t.toCharArray();    // Convert string to an array of characters
t.getChars(0, 3, ca, 1);        // Put 1st 3 chars of t into ca[1]-ca[3]

// Case conversion
String caps = t.toUpperCase();  // Convert to uppercase
String lower = t.toLowerCase(); // Convert to lowercase

// Comparing strings
boolean b1 = t.equals("hello");         // Returns false: strings not equal
boolean b2 = t.equalsIgnoreCase(caps);  // Case-insensitive compare: true
boolean b3 = t.startsWith("Now");       // Returns true
boolean b4 = t.endsWith("time.");       // Returns true
int r1 = s.compareTo("Pow");            // Returns < 0: s comes before "Pow"
int r2 = s.compareTo("Now");            // Returns 0: strings are equal
int r3 = s.compareTo("Mow");            // Returns > 0: s comes after "Mow"
r1 = s.compareToIgnoreCase("pow");      // Returns < 0 (Java 1.2 and later)

// Searching for characters and substrings
int pos = t.indexOf('i');         // Position of first 'i': 4
pos = t.indexOf('i', pos+1);      // Position of the next 'i': 12
pos = t.indexOf('i', pos+1);      // No more 'i's in string, returns -1
pos = t.lastIndexOf('i');         // Position of last 'i' in string: 12
pos = t.lastIndexOf('i', pos-1);  // Search backwards for 'i' from char 11

pos = t.indexOf("is");            // Search for substring: returns 4
pos = t.indexOf("is", pos+1);     // Only appears once: returns -1
pos = t.lastIndexOf("the ");      // Search backwards for a string
String noun = t.substring(pos+4); // Extract word following "the"

// Replace all instances of one character with another character
String exclaim = t.replace('.', '!');  // This overload works with chars, not substrings

// Strip blank space off the beginning and end of a string
String noextraspaces = t.trim();

// Obtain unique instances of strings with intern() 
String s1 = s.intern();        // Returns s1 equal to s
String s2 = "Now";             // String literals are automatically interned
boolean equals = (s1 == s2);   // Now can test for equality with ==

5.2.2. The Character Class

As you know, individual characters are represented in Java by the primitive char type. The Java platform also defines a Character class, which contains useful class methods for checking the type of a character and for converting the case of a character. For example:

char[] text;  // An array of characters, initialized somewhere else
int p = 0;    // Our current position in the array of characters
// Skip leading whitespace
while((p < text.length) && Character.isWhitespace(text[p])) p++;  
// Capitalize the first word of text
while((p < text.length) && Character.isLetter(text[p])) {
  text[p] = Character.toUpperCase(text[p]);
  p++;
}

5.2.3. The StringBuffer Class

Since String objects are immutable, you cannot manipulate the characters of an instantiated String. If you need to do this, use a java.lang.StringBuffer or java.lang.StringBuilder instead. These two classes are identical except that StringBuffer has synchronized methods. StringBuilder was introduced in Java 5.0 and you should use it in preference to StringBuffer unless it might actually be manipulated by multiple threads. The following code demonstrates the StringBuffer API but could be easily changed to use StringBuilder:

// Create a string buffer from a string
StringBuffer b = new StringBuffer("Mow");

// Get and set individual characters of the StringBuffer
char c = b.charAt(0);        // Returns 'M': just like String.charAt()
b.setCharAt(0, 'N');         // b holds "Now": can't do that with a String!

// Append to a StringBuffer
b.append(' ');               // Append a character
b.append("is the time.");    // Append a string
b.append(23);                // Append an integer or any other value

// Insert Strings or other values into a StringBuffer
b.insert(6, "n't");          // b now holds: "Now isn't the time.23"

// Replace a range of characters with a string (Java 1.2 and later) 
b.replace(4, 9, "is");       // Back to "Now is the time.23"

// Delete characters
b.delete(16, 18);            // Delete a range: "Now is the time."
b.deleteCharAt(2);           // Delete the char at index 2: "No is the time."
b.setLength(5);              // Truncate by setting the length: "No is"

// Other useful operations
b.reverse();                 // Reverse characters: "si oN"
String s = b.toString();     // Convert back to an immutable string
s = b.substring(1,2);        // Or take a substring: "i"
b.setLength(0);              // Erase buffer; now it is ready for reuse

5.2.4. The CharSequence Interface

As of Java 1.4, both the String and the StringBuffer classes implement the java.lang.CharSequence interface, which is a standard interface for querying the length of and extracting characters and subsequences from a readable sequence of characters. This interface is also implemented by the java.nio.CharBuffer class, which is part of the New I/O API that was introduced in Java 1.4. CharSequence provides a way to perform simple operations on strings of characters regardless of the underlying implementation of those strings. For example:

/** 
 * Return a prefix of the specified CharSequence that starts at the first
 * character of the sequence and extends up to (and includes) the first 
 * occurrence of the character c in the sequence. Returns null if c is 
 * not found. s may be a String, StringBuffer, or java.nio.CharBuffer.
 */
public static CharSequence prefix(CharSequence s, char c) {
  int numChars = s.length();          // How long is the sequence?
  for(int i = 0; i < numChars; i++) { // Loop through characters in sequence
    if (s.charAt(i) == c)             // If we find c,
      return s.subSequence(0,i+1);    // then return the prefix subsequence
  }
  return null;                        // Otherwise, return null
}

5.2.5. The Appendable Interface

Appendable is a Java 5.0 interface that represents an object that can have a char or a CharSequence appended to it. Implementing classes include StringBuffer, StringBuilder, java.nio.CharBuffer, java.io.PrintStream, and java.io.Writer and all of its character output stream subclasses, including PrintWriter. Thus, the Appendable interface represents the common appendability of the text buffer classes and the text output stream classes. As we'll see below, a Formatter object can send its output to any Appendable object.
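As a minimal sketch of this common appendability, the following hypothetical writeList() method (the class and method names are illustrative, not part of the platform) sends the same text to a StringBuilder and to System.out through the Appendable interface:

```java
import java.io.IOException;

public class AppendableDemo {
    // Append a comma-separated list of words to any Appendable.
    // Appendable.append() is declared to throw IOException.
    public static void writeList(Appendable out, String[] words)
        throws IOException
    {
        for (int i = 0; i < words.length; i++) {
            if (i > 0) out.append(", ");   // append(CharSequence)
            out.append(words[i]);
        }
        out.append('\n');                  // append(char)
    }

    // Write the same list to a text buffer and to an output stream.
    public static String demo() {
        String[] words = { "alpha", "beta", "gamma" };
        StringBuilder sb = new StringBuilder();
        try {
            writeList(sb, words);          // A text buffer class
            writeList(System.out, words);  // A PrintStream
        }
        catch (IOException e) { throw new RuntimeException(e); }
        return sb.toString();
    }
}
```

Because the method is written against the interface, callers can choose the destination without any change to the formatting code.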

5.2.6. String Concatenation

The + operator concatenates two String objects or one String and one value of some other type, producing a new String object. Be aware that each time a string concatenation is performed and the result stored in a variable or passed to a method, a new String object has been created. In some circumstances, this can be inefficient and can result in poor performance. It is especially important to be careful when doing string concatenation within a loop. The following code is inefficient, for example:

// Inefficient: don't do this
public String join(List<String> words) {
    String sentence = "";
    // Each iteration creates a new String object and discards an old one.
    for(String word: words) sentence += word;
    return sentence;
}

When you find yourself writing code like this, switch to a StringBuffer or a StringBuilder and use the append() method:

// This is the right way to do it
public String join(List<String> words) {
    StringBuilder sentence = new StringBuilder();
    for(String word: words) sentence.append(word);
    return sentence.toString();
}

There is no need to be paranoid about string concatenation, however. Remember that string literals are concatenated by the compiler rather than the Java interpreter. Also, when a single expression contains multiple string concatenations, these are compiled efficiently using a StringBuilder (or StringBuffer prior to Java 5.0) and result in the creation of only a single new String object.
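The two points above can be illustrated with a short sketch. The class and method names here are hypothetical; the first method relies on the rule that a concatenation of string literals is a compile-time constant:

```java
public class ConcatDemo {
    // Literals joined with + are concatenated by the compiler, so both
    // variables refer to the same interned String object.
    public static boolean literalsFolded() {
        String a = "Now " + "is " + "the time.";
        String b = "Now is the time.";
        return a == b;
    }

    // Several + operators in one expression: the compiler builds the
    // result in a single buffer and creates one new String.
    public static String singleExpression(String user, int n) {
        return user + " has " + n + " messages";
    }
}
```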

5.2.7. String Comparison

Since strings are objects rather than primitive values, they cannot, in general, be compared for equality with the == operator. == compares references and can determine if two expressions evaluate to a reference to the same string. It cannot determine if two distinct strings contain the same text. To do that, use the equals() method. In Java 5.0 you can compare the content of a string to any other CharSequence with the contentEquals() method.

Similarly, the < and > relational operators do not work with strings. To compare the order of strings, use the compareTo() method, which is defined by the Comparable<String> interface and is illustrated in the sample code above. To compare strings without taking the case of the letters into account, use compareToIgnoreCase().

Note that, as of Java 5.0, StringBuffer and StringBuilder do not implement Comparable and do not override the default versions of equals() and hashCode() that they inherit from Object. This means that it is not possible to compare the text held in two StringBuffer or StringBuilder objects directly for equality or for order. To compare their text, first convert them to String objects with toString(), or pass one of them to String.contentEquals().
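One way to compare the text held in two buffers is to go through String. The helper class below is a hypothetical sketch, not a platform API:

```java
public class BufferCompare {
    // Compare the text of two builders for equality.
    // String.contentEquals(CharSequence) was added in Java 5.0.
    public static boolean sameText(StringBuilder a, StringBuilder b) {
        return a.toString().contentEquals(b);
    }

    // Compare the text of two builders for order, using String.compareTo().
    public static int order(StringBuilder a, StringBuilder b) {
        return a.toString().compareTo(b.toString());
    }
}
```

Calling a.equals(b) on two distinct builders returns false even when they hold the same characters, because the inherited Object.equals() compares references.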

One important but little-understood method of the String class is intern(). When invoked on a string s, it returns a string t that is guaranteed to have the same content as s. What's important, though, is that for any given string content, it always returns a reference to the same String object. That is, if s and t are two String objects such that s.equals(t), then:

s.intern() == t.intern()

This means that the intern() method provides a way of doing fast string comparisons using ==. Importantly, string literals are always implicitly interned by the Java VM, so if you plan to compare a string s against a number of string literals, you may want to intern s first and then do the comparison with ==.
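The following sketch (with a hypothetical class name) shows the difference between a string created at runtime and its interned form:

```java
public class InternDemo {
    public static boolean[] demo() {
        String literal = "Now";            // Literals are interned automatically
        String copy = new String("Now");   // A distinct object, equal content
        return new boolean[] {
            copy == literal,               // Different objects
            copy.equals(literal),          // Same text
            copy.intern() == literal       // Same object after interning
        };
    }
}
```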

The compareTo() and equals( ) methods of the String class allow you to compare strings. compareTo( ) bases its comparison on the character order defined by the Unicode encoding while equals( ) defines string equality as strict character-by-character equality. These are not always the right methods to use, however. In some languages, the character ordering imposed by the Unicode standard does not match the dictionary ordering used when alphabetizing strings. In Spanish, for example, the letters "ch" are considered a single letter that comes after "c" and before "d." When comparing human-readable strings in an internationalized application, you should use the java.text.Collator class instead:

import java.text.*;

// Compare two strings; results depend on where the program is run
// Return values of Collator.compare() have same meanings as String.compareTo()
Collator c = Collator.getInstance();      // Get Collator for current locale
int result = c.compare("chica", "coche"); // Use it to compare two strings
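Because Collator implements Comparator, it can also be passed to the sorting methods of java.util.Arrays. The sketch below uses hypothetical helper names and requests an explicit locale so that its results do not depend on where it is run:

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CollatorDemo {
    // Sort a copy of the array using the collation rules of the locale.
    public static String[] sorted(String[] words, Locale locale) {
        String[] copy = (String[]) words.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    // Compare two strings at PRIMARY strength, which ignores case.
    public static boolean sameLetters(String a, String b, Locale locale) {
        Collator c = Collator.getInstance(locale);
        c.setStrength(Collator.PRIMARY);
        return c.compare(a, b) == 0;
    }
}
```

Sorting "banana", "apple", and "Cherry" with String.compareTo() places "Cherry" first because uppercase letters precede lowercase ones in Unicode; the U.S. English collator places it last.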

5.2.8. Supplementary Characters

Java 5.0 has adopted the Unicode 4.0 standard, which, for the first time, has defined codepoints that fall outside the 16-bit range of the char type. When working with these "supplementary characters" (which are primarily Han ideographs), you must use int values to represent the individual character. In String objects, or for any other type that represents text as a sequence of char values, these supplementary characters are represented as a series of two char values known as a surrogate pair.

Although readers of the English edition of this book are unlikely to ever encounter supplementary characters, you should be aware of them if you are working on programs that might be localized for use in China or another country that uses Han ideographs. To help you work with supplementary characters, the Character, String, StringBuffer, and StringBuilder classes have been extended with new methods that operate on int codepoints rather than char values. The following code illustrates some of these methods. You can find other, similar methods in the reference section and read about them in the online javadoc documentation.

int codepoint = 0x10001;  // This codepoint doesn't fit in a char
// Get the UTF-16 surrogate pair of chars for the codepoint
char[] surrogatePair = Character.toChars(codepoint); 
// Convert the chars to a string.
String s = new String(surrogatePair);

// Print string length in characters and codepoints
System.out.println(s.length());
System.out.println(s.codePointCount(0, s.length()));  // End index is exclusive

// Print encoding of first character, then encoding of first codepoint.
System.out.println(Integer.toHexString(s.charAt(0)));
System.out.println(Integer.toHexString(s.codePointAt(0)));

// Here's how to safely loop through a string that may contain
// supplementary characters
String tricky = s + "Testing" + s + "!";
int i = 0, n = tricky.length();
while(i < n) {
    // Get the codepoint at the current position
    int cp = tricky.codePointAt(i);
    if (cp <= 0xFFFF) System.out.println((char) cp);  // Fits in a single char
    else System.out.println("\\u" + Integer.toHexString(cp));

    // Increment the string index by one codepoint (1 or 2 chars).
    i = tricky.offsetByCodePoints(i, 1);
}

5.2.9. Formatting Text with printf() and format()

A common task when working with text output is to combine values of various types into a single block of human-readable text. One way to accomplish this relies on the string-conversion power of Java's string concatenation operator. It results in code like this:

System.out.println(username + " logged in after " + numattempts +
                   " attempts. Last login at: " + lastLoginDate);

Java 5.0 introduces an alternative that is familiar to C programmers: a printf() method. "printf" is short for "print formatted" and it combines the printing and formatting functions into one call. The printf() method has been added to the PrintWriter and PrintStream output stream classes in Java 5.0. It is a varargs method that expects one or more arguments. The first argument is the "format string." It specifies the text to be printed and typically includes one or more "format specifiers," which are escape sequences beginning with the % character. The remaining arguments to printf() are values to be converted to strings and substituted into the format string in place of the format specifiers. The format specifiers constrain the types of the remaining arguments and specify exactly how they are converted to strings. The string concatenation shown above can be rewritten as follows in Java 5.0:

System.out.printf("%s logged in after %d attempts. Last login at: %tc%n",
                  username, numattempts, lastLoginDate);

The format specifier %s simply substitutes a string. %d expects the corresponding argument to be an integer and displays it as such. %tc expects a Date, Calendar, or number of milliseconds and converts that value to a text representation of the full date and time. %n performs no conversion: it simply outputs the platform-specific line terminator, just as the println() method does.

The conversions performed by printf() are all properly localized. Times and dates are displayed with locale-appropriate punctuation, for example. And if you request that a number be displayed with a thousands separator, you'll get locale-specific punctuation there, too (a comma in England and a period in Germany, for example).
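To see the localization at work without changing the default locale, you can pass an explicit Locale to String.format(). This is a small sketch with a hypothetical helper method:

```java
import java.util.Locale;

public class LocaleFormatDemo {
    // Format an integer with a thousands separator for the given locale.
    public static String grouped(Locale locale, int n) {
        return String.format(locale, "%,d", n);
    }

    public static void main(String[] args) {
        System.out.println(grouped(Locale.US, 1234567));       // 1,234,567
        System.out.println(grouped(Locale.GERMANY, 1234567));  // 1.234.567
    }
}
```

When no Locale argument is given, the formatting methods use the default locale of the Java VM.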

In addition to the basic printf( ) method, PrintWriter and PrintStream also define a synonymous method named format() : it takes exactly the same arguments and behaves in exactly the same way. The String class also has a format() method in Java 5.0. This static String.format() method behaves like PrintWriter.format( ) except that instead of printing the formatted string to a stream, it simply returns it:

// Format a string, converting a double value to text using two decimal
// places and a thousands separator.
double balance = getBalance();
String msg = String.format("Account balance: $%,.2f", balance);

The java.util.Formatter class is the general-purpose formatter class behind the printf() and format( ) utility methods. It can format text to any Appendable object or to a named file. The following code uses a Formatter object to write a file:

public static void writeFile(String filename, String[] lines)
    throws IOException
{
    Formatter out = new Formatter(filename);  // format to a named file
    for(int i = 0; i < lines.length; i++) {
        // Write a line of the file
        out.format("%d: %s%n", i, lines[i]);
        // Check for exceptions
        IOException e = out.ioException();
        if (e != null) throw e;
    }
    out.close();
}
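A Formatter can just as easily send its output to an Appendable such as a StringBuilder. The following sketch, with hypothetical names, numbers an array of lines in memory instead of writing them to a file:

```java
import java.util.Formatter;

public class FormatterDemo {
    // Number each line and collect the result in a StringBuilder.
    public static String numberLines(String[] lines) {
        StringBuilder sb = new StringBuilder();
        Formatter out = new Formatter(sb);     // Format to an Appendable
        for (int i = 0; i < lines.length; i++) {
            out.format("%d: %s; ", i, lines[i]);
        }
        out.close();                           // No effect on a StringBuilder
        return sb.toString();
    }
}
```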

When you concatenate an object to a string, the object is converted to a string by calling its toString() method. This is what the Formatter class does by default as well. Classes that want more precise control over their formatting can implement the java.util.Formattable interface in addition to implementing toString().
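Here is a minimal sketch of a class that implements Formattable. The GridPoint class is hypothetical, and for brevity its formatTo() method ignores the flags, width, and precision arguments that a complete implementation would honor:

```java
import java.util.Formattable;
import java.util.Formatter;

public class GridPoint implements Formattable {
    private final int x, y;

    public GridPoint(int x, int y) { this.x = x; this.y = y; }

    // Used by string concatenation
    public String toString() { return "GridPoint[" + x + "," + y + "]"; }

    // Used by the %s conversion of Formatter, printf(), and format()
    public void formatTo(Formatter f, int flags, int width, int precision) {
        f.format("(%d, %d)", x, y);
    }
}
```

With this class, String.format("%s", new GridPoint(1, 2)) produces "(1, 2)", while concatenating the same object to a string produces "GridPoint[1,2]".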

We'll see additional examples of formatting with printf( ) when we cover the APIs for working with numbers, dates, and times. See java.util.Formatter for a complete list of available format specifiers and options.

5.2.10. Logging

Simple terminal-based programs can send their output and error messages to the console with System.out.println() or System.out.print( ). Server programs that run unattended for long periods need a different solution for output: the hardware they run on may not have a display terminal attached, and, if it does, there is unlikely to be anyone looking at it. Programs like this need logging functionality in which output messages are sent to a file for later analysis or through a network socket for remote monitoring. Java 1.4 provides a logging API in the java.util.logging package.

Typically, the application developer uses a Logger object associated with the class or package of the application to generate log messages at any of seven severity levels (see java.util.logging.Level). These messages may report errors and warnings or provide informational messages about interesting events in the application's life cycle. They can include debugging information or even trace the execution of important methods within the program.

The system administrator or end user of the application is responsible for setting up a logging configuration file that specifies where log messages are directed (the console, a file, a network socket, or a combination of these), how they are formatted (as plain text or XML documents), and at what severity threshold they are logged (log messages with a severity below the specified threshold are discarded with very little overhead and should not significantly impact the performance of the application). The logging level severity threshold can be configured independently so that Logger objects associated with different classes or packages can be "tuned in" or "tuned out." Because of this end-user configurability, you should feel free to use logging output liberally in your program. In normal operation, most log messages will be discarded efficiently and automatically. During program development, or when diagnosing a problem in a deployed application, however, the log messages can prove very valuable.

For most applications, using the Logging API is quite simple. Obtain a named Logger object whenever necessary by calling the static Logger.getLogger( ) method, passing the class or package name of the application as the logger name. Then, use one of the many Logger instance methods to generate log messages. The easiest methods to use have names that correspond to severity levels, such as severe() , warning( ), and info(). Here is some sample code:

import java.util.logging.*;

// Get a Logger object named after the current package
Logger logger = Logger.getLogger("com.davidflanagan.servers.pop");
logger.info("Starting server.");       // Log an informational message
ServerSocket ss;                       // Do some stuff
try { ss = new ServerSocket(110); }
catch(Exception e) {                   // Log exceptions
  logger.log(Level.SEVERE, "Can't bind port 110", e); // Complex log message
  logger.warning("Exiting");                          // Simple warning 
  return;
}
logger.fine("got server socket"); // Fine-detail (low-severity) debug message
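Severity thresholds are normally set in a configuration file, but they can also be set in code, which makes their effect easy to observe. The sketch below is illustrative only: the logger name is hypothetical, and the anonymous Handler simply collects the messages it receives instead of writing them anywhere.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class LogLevelDemo {
    public static List<String> demo() {
        final List<String> seen = new ArrayList<String>();
        Logger logger = Logger.getLogger("com.example.demo"); // Hypothetical name
        logger.setUseParentHandlers(false);  // Keep this demo off the console
        logger.addHandler(new Handler() {    // Collect records in a list
            public void publish(LogRecord r) {
                if (isLoggable(r)) seen.add(r.getLevel() + ": " + r.getMessage());
            }
            public void flush() {}
            public void close() {}
        });

        logger.setLevel(Level.INFO);   // Threshold: INFO and above
        logger.fine("discarded");      // Below the threshold: dropped
        logger.info("kept");
        logger.setLevel(Level.FINE);   // "Tune in" more detail
        logger.fine("now kept");
        return seen;
    }
}
```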

5.2.11. Pattern Matching with Regular Expressions

In Java 1.4 and later, you can perform textual pattern matching with regular expressions. Regular expression support is provided by the Pattern and Matcher classes of the java.util.regex package, but the String class defines a number of convenient methods that allow you to use regular expressions even more simply. Regular expressions use a fairly complex grammar to describe patterns of characters. The Java implementation uses the same regex syntax as the Perl 5 programming language. See the java.util.regex.Pattern class in the reference section for a summary of this syntax or consult a good Perl programming book for further details. For a complete tutorial on Perl-style regular expressions, see Mastering Regular Expressions (O'Reilly).

The simplest String method that accepts a regular expression argument is matches( ); it returns true if the string matches the pattern defined by the specified regular expression:

// This string is a regular expression that describes the pattern of a typical
// sentence. In Perl-style regular expression syntax, it specifies
// a string that begins with a capital letter and ends with a period,
// a question mark, or an exclamation point.
String pattern = "^[A-Z].*[\\.?!]$";  
String s = "Java is fun!";
s.matches(pattern);  // The string matches the pattern, so this returns true.

The matches( ) method returns true only if the entire string is a match for the specified pattern. Perl programmers should note that this differs from Perl's behavior, in which a match means only that some portion of the string matches the pattern. To determine if a string or any substring matches a pattern, simply alter the regular expression to allow arbitrary characters before and after the desired pattern. In the following code, the regular expression characters .* match any number of arbitrary characters:

s.matches(".*\\bJava\\b.*"); // True if s contains the word "Java" anywhere
                             // \\b specifies a word boundary

If you are already familiar with Perl's regular expression syntax, you know that it relies on the liberal use of backslashes to escape certain characters. In Perl, regular expressions are language primitives and their syntax is part of the language itself. In Java, however, regular expressions are described using strings and are typically embedded in programs using string literals. The syntax for Java string literals also uses the backslash as an escape character, so to include a single backslash in the regular expression, you must use two backslashes. Thus, in Java programming, you will often see double backslashes in regular expressions.
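The following sketch, with hypothetical method names, shows the doubling in practice. The regular expression \d+\.\d+ is written with two backslashes for each one in the pattern, and a pattern that matches a literal backslash requires four:

```java
public class BackslashDemo {
    // Regex \d+\.\d+ : digits, a literal period, more digits
    public static boolean isDecimal(String s) {
        return s.matches("\\d+\\.\\d+");
    }

    // Regex .*\\.* : any text that contains a literal backslash
    public static boolean containsBackslash(String s) {
        return s.matches(".*\\\\.*");
    }
}
```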

In addition to matching, regular expressions can be used for search-and-replace operations. The replaceFirst( ) and replaceAll( ) methods search a string for the first substring or all substrings that match a given pattern and replace the string or strings with the specified replacement text, returning a new string that contains the replacements. For example, you could use this code to ensure that the word "Java" is correctly capitalized in a string s:

s.replaceAll("(?i)\\bjava\\b",// Pattern: the word "java", case-insensitive
             "Java");         // The replacement string, correctly capitalized

The replacement string passed to replaceAll() and replaceFirst( ) need not be a simple literal string; it may also include references to text that matched parenthesized subexpressions within the pattern. These references take the form of a dollar sign followed by the number of the subexpression. (If you are not familiar with parenthesized subexpressions within a regular expression, see java.util.regex.Pattern in the reference section.) For example, to search for words such as JavaBean, JavaScript, JavaOS, and JavaVM (but not Java or Javanese) and to replace the Java prefix with the letter J without altering the suffix, you could use code such as:

s.replaceAll("\\bJava([A-Z]\\w+)",  // The pattern
             "J$1");      // J followed by the suffix that matched the
                          // subexpression in parentheses: [A-Z]\\w+

The other String method that uses regular expressions is split(), which returns an array of the substrings of a string, separated by delimiters that match the specified pattern. To obtain an array of words in a string separated by any number of spaces, tabs, or newlines, do this:

String sentence = "This is a\n\ttwo-line sentence";
String[] words = sentence.split("[ \t\n\r]+");

An optional second argument specifies the maximum number of entries in the returned array.
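As a brief sketch of the limit argument (the class and method names are hypothetical): a positive limit caps the number of entries, and a negative limit keeps the trailing empty strings that split() otherwise discards.

```java
public class SplitDemo {
    // Split into at most two pieces: the first field and the remainder.
    public static String[] firstAndRest(String s) {
        return s.split(":", 2);
    }

    // A negative limit preserves trailing empty strings.
    public static String[] keepEmpties(String s) {
        return s.split(":", -1);
    }
}
```

For the input "a:b:c:d", firstAndRest() returns the two strings "a" and "b:c:d".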

The matches( ), replaceFirst(), replaceAll( ), and split() methods are suitable for when you use a regular expression only once. If you want to use a regular expression for multiple matches, you should explicitly use the Pattern and Matcher classes of the java.util.regex package. First, create a Pattern object to represent your regular expression with the static Pattern.compile() method. (Another reason to use the Pattern class explicitly instead of the String convenience methods is that Pattern.compile( ) allows you to specify flags such as Pattern.CASE_INSENSITIVE that globally alter the way the pattern matching is done.) Note that the compile() method can throw a PatternSyntaxException if you pass it an invalid regular expression string. (This exception is also thrown by the various String convenience methods.) The Pattern class defines split() methods that are similar to the String.split() methods. For all other matching, however, you must create a Matcher object with the matcher() method and specify the text to be matched against:

import java.util.regex.*;

Pattern javaword = Pattern.compile("\\bJava(\\w*)", Pattern.CASE_INSENSITIVE);
Matcher m = javaword.matcher(sentence);
boolean match = m.matches();  // True if text matches pattern exactly

Once you have a Matcher object, you can compare the string to the pattern in various ways. One of the more sophisticated ways is to find all substrings that match the pattern:

String text = "Java is fun; JavaScript is funny.";
m.reset(text);  // Start matching against a new string
// Loop to find all matches of the string and print details of each match
while(m.find()) {
  System.out.println("Found '" + m.group(0) + "' at position " + m.start(0));
  if (m.start(1) < m.end(1)) System.out.println("Suffix is " + m.group(1));
}
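The three basic matching methods of Matcher differ in how much of the text must match. This sketch (the class name and describe() method are hypothetical) reports the result of each for the same pattern:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherModes {
    private static final Pattern JAVAWORD =
        Pattern.compile("\\bJava(\\w*)", Pattern.CASE_INSENSITIVE);

    // Return the results of matches(), lookingAt(), and find() for text.
    public static String describe(String text) {
        Matcher m = JAVAWORD.matcher(text);
        boolean whole = m.matches();      // Entire text must match
        boolean prefix = m.lookingAt();   // Match must start at the beginning
        m.reset();                        // Start the search over
        boolean anywhere = m.find();      // Match may occur anywhere
        return whole + " " + prefix + " " + anywhere;
    }
}
```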

The Matcher class has been enhanced in several ways in Java 5.0. The most important of these is the ability to save the results of the most recent match in a MatchResult object. The previous algorithm that finds all matches in a string could be rewritten in Java 5.0 as follows:

import java.util.regex.*;
import java.util.*;

public class FindAll {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile(args[0]);
        String text = args[1];

        List<MatchResult> results = findAll(pattern, text);
        for(MatchResult r : results) {
            System.out.printf("Found '%s' at (%d,%d)%n",
                              r.group(), r.start(), r.end());
        }
    }

    public static List<MatchResult> findAll(Pattern pattern, CharSequence text)
    {
        List<MatchResult> results = new ArrayList<MatchResult>();
        Matcher m = pattern.matcher(text);
        while(m.find()) results.add(m.toMatchResult());
        return results;
    }
}

5.2.12. Tokenizing Text

java.util.Scanner is a general purpose text tokenizer, added in Java 5.0 to complement the java.util.Formatter class described earlier in this chapter. Scanner takes full advantage of Java regular expressions and can take its input text from a string, file, stream, or any object that implements the java.lang.Readable interface. Readable is also new in Java 5.0 and is the opposite of the Appendable interface.

A Scanner can break its input text into tokens separated by whitespace or any desired delimiter character or regular expression. It implements the Iterator<String> interface, which allows for simple looping through the returned tokens. Scanner also defines a variety of convenience methods for parsing tokens as boolean, integer, or floating-point values, with locale-sensitive number parsing. It has skip( ) methods for skipping input text that matches a specified pattern and also has methods for searching ahead in the input text for text that matches a specified pattern.

Here's how you could use a Scanner to break a String into space-separated words:

public static List<String> getTokens(String line) {
    List<String> result = new ArrayList<String>();
    for(Scanner s = new Scanner(line); s.hasNext(); )
        result.add(s.next());
    return result;
}

Here's how you might use a Scanner to break a file into lines:

public static void printLines(File f) throws IOException {
    Scanner s = new Scanner(f);
    // Use a regex to specify line terminators as the token delimiter
    s.useDelimiter("\r\n|\n|\r");
    while(s.hasNext()) System.out.println(s.next());
}
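The convenience methods for parsing numbers can be combined with the iterator-style methods. The following sketch, with a hypothetical method name, adds up the integers in a string and skips every other token. It sets the locale explicitly so that the treatment of thousands separators does not depend on the default locale:

```java
import java.util.Locale;
import java.util.Scanner;

public class ScannerSum {
    // Sum the integer tokens in text, ignoring tokens that are not integers.
    public static int sumInts(String text) {
        Scanner s = new Scanner(text);
        s.useLocale(Locale.US);            // Comma is the thousands separator
        int sum = 0;
        while (s.hasNext()) {
            if (s.hasNextInt()) sum += s.nextInt();
            else s.next();                 // Not an integer: skip the token
        }
        return sum;
    }
}
```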

The following method uses Scanner to parse an input line in the form x + y = z. It demonstrates the ability of a Scanner to scan numbers. Note that Scanner does not just parse Java-style integer literals: it supports thousands separators and does so in a locale-sensitive way. For example, it would parse the integer 1,234 for an American user and 1.234 for a German user. This code also demonstrates the skip() method and shows that a Scanner can scan text directly from an InputStream.

public static boolean parseSum() {
    System.out.print("enter sum> "); // Prompt the user for input
    System.out.flush();          // Make sure prompt is visible immediately

    try {
        // Read and parse the user's input from the console
        Scanner s = new Scanner(System.in);
        s.useDelimiter("\\s*[+=]\\s*|\\s+"); // Tokens end at + or =; spaces optional
        int x = s.nextInt();   // Parse an integer
        s.skip("\\s*\\+\\s*"); // Skip optional space and literal +  
        int y = s.nextInt();   // Parse another integer
        s.skip("\\s*=\\s*");   // Skip optional space and literal =
        int z = s.nextInt();   // Parse a third integer

        return x + y == z;
    }
    catch(InputMismatchException e) { // pattern does not match
        throw new IllegalArgumentException("syntax error");
    }
    catch(NoSuchElementException e) { // no more input available 
        throw new IllegalArgumentException("syntax error");
    }
}

5.2.13. StringTokenizer

A number of other Java classes operate on strings and characters. One notable class is java.util.StringTokenizer, which you can use to break a string of text into its component words:

String s = "Now is the time";
java.util.StringTokenizer st = new java.util.StringTokenizer(s);
while(st.hasMoreTokens()) {
  System.out.println(st.nextToken());
}

You can even use this class to tokenize words that are delimited by characters other than spaces:

String s = "a:b:c:d";
java.util.StringTokenizer st = new java.util.StringTokenizer(s, ":");

java.io.StreamTokenizer is another tokenizing class. It has a more complicated API and has more powerful features than StringTokenizer.
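As a rough sketch of the StreamTokenizer API, the following hypothetical method reads tokens from a string with the tokenizer's default syntax settings and labels each one as a word, a number, or an ordinary character:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

public class StreamTokenizerDemo {
    public static String describe(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        StringBuilder sb = new StringBuilder();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                switch (st.ttype) {
                case StreamTokenizer.TT_WORD:      // Text is in st.sval
                    sb.append("word:").append(st.sval);
                    break;
                case StreamTokenizer.TT_NUMBER:    // Value is in st.nval
                    sb.append("number:").append(st.nval);
                    break;
                default:                           // An ordinary character
                    sb.append("char:").append((char) st.ttype);
                }
                sb.append(' ');
            }
        }
        catch (IOException e) { throw new RuntimeException(e); }
        return sb.toString();
    }
}
```

Unlike StringTokenizer, StreamTokenizer distinguishes token types and parses numbers as double values, which is why 42 is reported as 42.0 for the input "x = 42".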