5.2. Text
Most
programs manipulate text in one form or another, and the Java
platform defines a number of important classes and interfaces for
representing, formatting, and scanning text. The sections that follow
provide an overview.
5.2.1. The String Class
Strings of
text are a fundamental and commonly used
data type. In Java, however, strings are not a primitive type, like
char, int, and
float. Instead, strings are represented by the
java.lang.String class, which defines many useful
methods for manipulating strings. String objects
are immutable: once a String
object has been created, there is no way to modify the string of text
it represents. Thus, each method that operates on a string typically
returns a new String object that holds the
modified string.
This code shows some of the basic
operations you can perform on strings:
// Creating strings
String s = "Now"; // String objects have a special literal syntax
String t = s + " is the time."; // Concatenate strings with + operator
String t1 = s + " " + 23.4; // + converts other values to strings
t1 = String.valueOf('c'); // Get string corresponding to char value
t1 = String.valueOf(42); // Get string version of integer or any value
t1 = object.toString(); // Convert objects to strings with toString()
// String length
int len = t.length(); // Number of characters in the string: 16
// Substrings of a string
String sub = t.substring(4); // Returns char 4 to end: "is the time."
sub = t.substring(4, 6); // Returns chars 4 and 5: "is"
sub = t.substring(0, 3); // Returns chars 0 through 2: "Now"
sub = t.substring(x, y); // Returns chars between pos x and y-1
int numchars = sub.length(); // Length of substring is always (y-x)
// Extracting characters from a string
char c = t.charAt(2); // Get the 3rd character of t: w
char[] ca = t.toCharArray(); // Convert string to an array of characters
t.getChars(0, 3, ca, 1); // Put 1st 3 chars of t into ca[1]-ca[3]
// Case conversion
String caps = t.toUpperCase(); // Convert to uppercase
String lower = t.toLowerCase(); // Convert to lowercase
// Comparing strings
boolean b1 = t.equals("hello"); // Returns false: strings not equal
boolean b2 = t.equalsIgnoreCase(caps); // Case-insensitive compare: true
boolean b3 = t.startsWith("Now"); // Returns true
boolean b4 = t.endsWith("time."); // Returns true
int r1 = s.compareTo("Pow"); // Returns < 0: s comes before "Pow"
int r2 = s.compareTo("Now"); // Returns 0: strings are equal
int r3 = s.compareTo("Mow"); // Returns > 0: s comes after "Mow"
r1 = s.compareToIgnoreCase("pow"); // Returns < 0 (Java 1.2 and later)
// Searching for characters and substrings
int pos = t.indexOf('i'); // Position of first 'i': 4
pos = t.indexOf('i', pos+1); // Position of the next 'i': 12
pos = t.indexOf('i', pos+1); // No more 'i's in string, returns -1
pos = t.lastIndexOf('i'); // Position of last 'i' in string: 12
pos = t.lastIndexOf('i', pos-1); // Search backwards for 'i' from char 11
pos = t.indexOf("is"); // Search for substring: returns 4
pos = t.indexOf("is", pos+1); // Only appears once: returns -1
pos = t.lastIndexOf("the "); // Search backwards for a string
String noun = t.substring(pos+4); // Extract word following "the"
// Replace all instances of one character with another character
String exclaim = t.replace('.', '!'); // char version; Java 5.0 adds a CharSequence version
// Strip blank space off the beginning and end of a string
String noextraspaces = t.trim();
// Obtain unique instances of strings with intern()
String s1 = s.intern(); // Returns s1 equal to s
String s2 = "Now"; // String literals are automatically interned
boolean equals = (s1 == s2); // Now can test for equality with ==
5.2.2. The Character Class
As you know, individual characters are
represented in Java by the primitive char type.
The Java platform also defines a Character class,
which contains useful class methods for checking the type of a
character and for converting the case of a character. For example:
char[] text; // An array of characters, initialized somewhere else
int p = 0; // Our current position in the array of characters
// Skip leading whitespace
while((p < text.length) && Character.isWhitespace(text[p])) p++;
// Capitalize the first word of text
while((p < text.length) && Character.isLetter(text[p])) {
    text[p] = Character.toUpperCase(text[p]);
    p++;
}
5.2.3. The StringBuffer Class
Since String objects
are immutable, you cannot manipulate the characters of an
instantiated String. If you need to do this, use a
java.lang.StringBuffer or
java.lang.StringBuilder instead. These two classes
are identical except that StringBuffer has
synchronized methods.
StringBuilder was introduced in Java 5.0 and you
should use it in preference to StringBuffer unless
it might actually be manipulated by multiple threads. The following
code demonstrates the StringBuffer API but could
be easily changed to use StringBuilder:
// Create a string buffer from a string
StringBuffer b = new StringBuffer("Mow");
// Get and set individual characters of the StringBuffer
char c = b.charAt(0); // Returns 'M': just like String.charAt()
b.setCharAt(0, 'N'); // b holds "Now": can't do that with a String!
// Append to a StringBuffer
b.append(' '); // Append a character
b.append("is the time."); // Append a string
b.append(23); // Append an integer or any other value
// Insert Strings or other values into a StringBuffer
b.insert(6, "n't"); // b now holds: "Now isn't the time.23"
// Replace a range of characters with a string (Java 1.2 and later)
b.replace(4, 9, "is"); // Back to "Now is the time.23"
// Delete characters
b.delete(16, 18); // Delete a range: "Now is the time."
b.deleteCharAt(2); // Delete char at index 2: "No is the time."
b.setLength(5); // Truncate by setting the length: "No is"
// Other useful operations
b.reverse(); // Reverse characters: "si oN"
String s = b.toString(); // Convert back to an immutable string
s = b.substring(1,2); // Or take a substring: "i"
b.setLength(0); // Erase buffer; now it is ready for reuse
5.2.4. The CharSequence Interface
As of Java 1.4, both
the String and the StringBuffer
classes implement the
java.lang.CharSequence interface, which is a standard
interface for querying the length of and extracting characters and
subsequences from a readable sequence of characters. This interface
is also implemented by the java.nio.CharBuffer
class, which is part of the New I/O API that was introduced in
Java 1.4. CharSequence provides a way to perform
simple operations on strings of characters regardless of the
underlying implementation of those strings. For example:
/**
* Return a prefix of the specified CharSequence that starts at the first
* character of the sequence and extends up to (and includes) the first
* occurrence of the character c in the sequence. Returns null if c is
* not found. s may be a String, StringBuffer, or java.nio.CharBuffer.
*/
public static CharSequence prefix(CharSequence s, char c) {
    int numChars = s.length(); // How long is the sequence?
    for(int i = 0; i < numChars; i++) { // Loop through characters in sequence
        if (s.charAt(i) == c) // If we find c,
            return s.subSequence(0,i+1); // then return the prefix subsequence
    }
    return null; // Otherwise, return null
}
5.2.5. The Appendable Interface
Appendable is a Java 5.0 interface that represents an object that can
have a char or a CharSequence appended to it. Implementing
classes include StringBuffer,
StringBuilder,
java.nio.CharBuffer,
java.io.PrintStream, and
java.io.Writer and all of its character output
stream subclasses, including PrintWriter. Thus,
the Appendable interface represents the common
appendability of the text buffer classes and the text output stream
classes. As we'll see below, a
Formatter object can send its output to any
Appendable object.
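As a rough sketch of what this common appendability allows, the following code writes the same text to a StringBuilder and to System.out through a single method. The class name AppendableDemo and the method appendList() are hypothetical names chosen for this illustration:

```java
import java.io.IOException;

class AppendableDemo {
    // Append the items, separated by ", ", to any Appendable.
    // Appendable.append() is declared to throw IOException.
    static void appendList(Appendable out, String[] items) throws IOException {
        for (int i = 0; i < items.length; i++) {
            if (i > 0) out.append(", ");
            out.append(items[i]);
        }
    }

    public static void main(String[] args) throws IOException {
        String[] words = { "red", "green", "blue" };
        StringBuilder sb = new StringBuilder();
        appendList(sb, words);          // Append to a text buffer
        System.out.println(sb);         // Prints: red, green, blue
        appendList(System.out, words);  // Append directly to an output stream
        System.out.println();
    }
}
```

Because the method depends only on the interface, it does not need to know whether the text ends up in memory or on a stream.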
5.2.6. String Concatenation
The +
operator concatenates two
String objects or one String
and one value of some other type, producing a new
String object. Be aware that each time a string
concatenation is performed and the result stored in a variable or
passed to a method, a new String object has been
created. In some circumstances, this can be inefficient and can
result in poor performance. It is especially important to be careful
when doing string concatenation within a loop. The following code is
inefficient, for example:
// Inefficient: don't do this
public String join(List<String> words) {
    String sentence = "";
    // Each iteration creates a new String object and discards an old one.
    for(String word: words) sentence += word;
    return sentence;
}
When you find yourself writing code like this, switch to a
StringBuffer or a StringBuilder
and use the append() method:
// This is the right way to do it
public String join(List<String> words) {
    StringBuilder sentence = new StringBuilder();
    for(String word: words) sentence.append(word);
    return sentence.toString();
}
There is no need to be paranoid about string concatenation, however.
Remember that string literals are concatenated by the compiler rather
than the Java interpreter. Also, when a single expression contains
multiple string concatenations, these are compiled efficiently using
a StringBuilder (or
StringBuffer prior to Java 5.0) and result in the
creation of only a single new String object.
5.2.7. String Comparison
Since strings are objects rather than primitive values, they cannot, in
general, be compared for equality with the == operator. == compares
references and can determine if two expressions evaluate to
a reference to the same string. It cannot determine if two distinct
strings contain the same text. To do that, use the equals() method. In
Java 5.0 you can compare the content of a string to any other
CharSequence with the contentEquals() method.
Similarly, the < and > relational operators do not work with
strings. To compare the order of strings, use the
compareTo() method, which is defined by the
Comparable<String> interface and is
illustrated in the sample code above. To compare strings without
taking the case of the letters into account, use
compareToIgnoreCase().
Note that StringBuffer and
StringBuilder do not implement
Comparable and do not override the default
versions of equals( ) and
hashCode() that they inherit from
Object. This means that it is not possible to directly
compare the text held in two StringBuffer or
StringBuilder objects for equality or for order; convert them to
String objects with toString() first.
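The following sketch shows the effect with two StringBuilder objects that hold the same text. The class name BufferCompareDemo and the helper sameText() are hypothetical and exist only for this illustration:

```java
class BufferCompareDemo {
    // Compare the text of any two character sequences by converting to String
    static boolean sameText(CharSequence x, CharSequence y) {
        return x.toString().equals(y.toString());
    }

    public static void main(String[] args) {
        StringBuilder a = new StringBuilder("Now");
        StringBuilder b = new StringBuilder("Now");
        System.out.println(a.equals(b));                // false: Object.equals() compares references
        System.out.println(sameText(a, b));             // true: compares the text
        System.out.println("Now".contentEquals(a));     // true: String compared to a CharSequence
        System.out.println(a.toString().compareTo("Mow") > 0); // true: "Now" sorts after "Mow"
    }
}
```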
One important but little-understood method of the String class is
intern(). When invoked on a string s, it returns a string t that
is guaranteed to have the same content as s.
What's important, though, is that for any given
string content, it always returns a reference to the same
String object. That is, if s
and t are two String objects
such that s.equals(t), then:
s.intern() == t.intern()
This means that the intern( ) method provides a
way of doing fast string comparisons using ==.
Importantly, string literals are always implicitly interned by the
Java VM, so if you plan to compare a string s
against a number of string literals, you may want to intern
s first and then do the comparison with ==.
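A minimal sketch of this behavior follows. The class name InternDemo and the helper sameInterned() are hypothetical; the string built from a char array stands in for any string created at runtime:

```java
class InternDemo {
    // True if the two strings have the same content, tested via intern() and ==
    static boolean sameInterned(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String literal = "Now";  // Literals are interned automatically
        String built = new String(new char[] { 'N', 'o', 'w' }); // Distinct object, same text
        System.out.println(built == literal);           // false: different objects
        System.out.println(built.equals(literal));      // true: same content
        System.out.println(built.intern() == literal);  // true: same interned instance
    }
}
```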
The compareTo() and equals() methods of the String class allow you to compare
strings. compareTo( ) bases its comparison on the
character order defined by the Unicode encoding while
equals( ) defines string equality as strict
character-by-character equality. These are not always the right
methods to use, however. In some languages, the character ordering
imposed by the Unicode standard does not match the dictionary
ordering used when alphabetizing strings. In Spanish, for example,
the letters "ch" are considered a
single letter that comes after "c"
and before "d." When comparing
human-readable strings in an internationalized application, you
should use the java.text.Collator class instead:
import java.text.*;
// Compare two strings; results depend on where the program is run
// Return values of Collator.compare() have same meanings as String.compareTo()
Collator c = Collator.getInstance(); // Get Collator for current locale
int result = c.compare("chica", "coche"); // Use it to compare two strings
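Because Collator implements the Comparator interface, it can also be passed to Collections.sort(). The sketch below contrasts the two orderings for a small list; the class name CollatorSortDemo, the helper sortForLocale(), and the choice of Locale.US are assumptions made for this illustration:

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class CollatorSortDemo {
    // Return a copy of the list sorted by the collation rules of the locale
    static List<String> sortForLocale(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<String>(words);
        Collections.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("banana", "apple", "Cherry");
        List<String> byCompareTo = new ArrayList<String>(words);
        Collections.sort(byCompareTo);  // Natural order: String.compareTo()
        System.out.println(byCompareTo);                      // [Cherry, apple, banana]
        System.out.println(sortForLocale(words, Locale.US));  // [apple, banana, Cherry]
    }
}
```

String.compareTo() places every uppercase ASCII letter before every lowercase one, which is rarely what a human reader expects from an alphabetized list.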
5.2.8. Supplementary Characters
Java 5.0 has adopted the Unicode 4.0 standard, which defines
codepoints that fall outside the 16-bit range of the
char type. When working with these
"supplementary characters" (which
are primarily Han ideographs), you must use int
values to represent the individual character. In
String objects, or for any other type that
represents text as a sequence of char values,
these supplementary characters are represented as a series of two
char values known as a surrogate
pair.
Although readers of the English edition of this book are unlikely to
ever encounter supplementary characters, you should be aware of them
if you are working on programs that might be localized for use in
China or another country that uses Han ideographs. To help you work
with supplementary characters, the Character,
String, StringBuffer, and
StringBuilder classes have been extended with new
methods that operate on int codepoints rather than
char values. The following code illustrates some
of these methods. You can find other, similar methods in the
reference section and read about them in the online javadoc
documentation.
int codepoint = 0x10001; // This codepoint doesn't fit in a char
// Get the UTF-16 surrogate pair of chars for the codepoint
char[] surrogatePair = Character.toChars(codepoint);
// Convert the chars to a string.
String s = new String(surrogatePair);
// Print string length in characters and codepoints
System.out.println(s.length());
System.out.println(s.codePointCount(0, s.length())); // End index is exclusive
// Print encoding of first character, then encoding of first codepoint.
System.out.println(Integer.toHexString(s.charAt(0)));
System.out.println(Integer.toHexString(s.codePointAt(0)));
// Here's how to safely loop through a string that may contain
// supplementary characters
String tricky = s + "Testing" + s + "!";
int i = 0, n = tricky.length();
while(i < n) {
    // Get the codepoint at the current position
    int cp = tricky.codePointAt(i);
    // Codepoints up to and including 0xFFFF fit in a single char
    if (cp <= '\uffff') System.out.println((char) cp);
    else System.out.println("\\u" + Integer.toHexString(cp));
    // Increment the string index by one codepoint (1 or 2 chars).
    i = tricky.offsetByCodePoints(i, 1);
}
5.2.9. Formatting Text with printf() and format( )
A common task when working with text output
is to combine values of various types into a single block of
human-readable text. One way to accomplish this relies on the
string-conversion power of Java's string
concatenation operator. It results in code like this:
System.out.println(username + " logged in after " + numattempts +
                   " attempts. Last login at: " + lastLoginDate);
Java 5.0 introduces an alternative that is familiar to C programmers:
a printf( ) method.
"printf" is short for
"print formatted" and it combines
the printing and formatting functions into one call. The
printf( ) method
has been added to the PrintWriter and
PrintStream output stream classes in
Java 5.0.
It is a varargs method that expects one or more arguments. The first
argument is the "format string." It
specifies the text to be printed and typically includes one or more
"format specifiers," which are
escape sequences beginning with the character %. The
remaining arguments to printf( ) are values to be
converted to strings and substituted into the format string in place
of the format specifiers. The format specifiers constrain the types
of the remaining arguments and specify exactly how they are converted
to strings. The string concatenation shown above can be rewritten as
follows in Java 5.0:
System.out.printf("%s logged in after %d attempts. Last login at: %tc%n",
username, numattempts, lastLoginDate);
The format specifier %s simply substitutes a
string. %d expects the corresponding argument to
be an integer and displays it as such. %tc expects
a Date, Calendar, or number of
milliseconds and converts that value to a text representation of the
full date and time. %n performs no conversion: it
simply outputs the platform-specific line terminator, just as the
println( ) method does.
The conversions performed by printf() are all
properly localized. Times and dates are displayed with
locale-appropriate punctuation, for example. And if you request that
a number be displayed with a thousands separator,
you'll get locale-specific punctuation there, too (a
comma in England and a period in Germany, for example).
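To see the localization at work without changing the default locale, you can pass an explicit Locale to String.format(). This is a small sketch; the class name LocalizedFormatDemo, the helper grouped(), and the two locales are choices made for this illustration:

```java
import java.util.Locale;

class LocalizedFormatDemo {
    // Format an integer with a thousands separator for the given locale
    static String grouped(Locale locale, int n) {
        return String.format(locale, "%,d", n);
    }

    public static void main(String[] args) {
        System.out.println(grouped(Locale.US, 1234567));       // 1,234,567
        System.out.println(grouped(Locale.GERMANY, 1234567));  // 1.234.567
    }
}
```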
In addition to the basic printf() method, PrintWriter and PrintStream
also define a synonymous method named format(): it takes exactly the
same arguments and behaves in exactly the same way. The String class
also has a format() method in Java 5.0. This static String.format()
method behaves like PrintWriter.format() except that instead of
printing the formatted string to a stream, it simply returns it:
// Format a string, converting a double value to text using two decimal
// places and a thousands separator.
double balance = getBalance();
String msg = String.format("Account balance: $%,.2f", balance);
The java.util.Formatter class is the general-purpose formatter class
behind the printf() and format() utility methods. It can format text to
any Appendable object or to a named file. The following code uses a
Formatter object to write a file:
public static void writeFile(String filename, String[] lines)
    throws IOException
{
    Formatter out = new Formatter(filename); // format to a named file
    try {
        for(int i = 0; i < lines.length; i++) {
            // Write a line of the file
            out.format("%d: %s%n", i, lines[i]);
            // Check for exceptions
            IOException e = out.ioException();
            if (e != null) throw e;
        }
    }
    finally {
        out.close(); // Close the file even if an exception is thrown
    }
}
When you concatenate an object to a string, the object is converted
to a string by calling its toString() method. This
is what the Formatter class does by default as
well. Classes that want more precise control over their formatting
can implement the
java.util.Formattable interface in addition to implementing
toString().
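As an illustration, here is a minimal sketch of a class that implements both toString() and Formattable. The Temperature class is hypothetical, and for brevity its formatTo() method ignores the flags, width, and precision arguments that a complete implementation would normally honor:

```java
import java.util.Formattable;
import java.util.Formatter;
import java.util.Locale;

// Temperature is a hypothetical class used only for illustration
class Temperature implements Formattable {
    private final double celsius;

    Temperature(double celsius) { this.celsius = celsius; }

    // Used by string concatenation and String.valueOf()
    public String toString() { return celsius + " C"; }

    // Used by Formatter (and so by printf() and format()) for the %s conversion
    public void formatTo(Formatter formatter, int flags, int width, int precision) {
        // This sketch ignores flags, width, and precision and always shows
        // one decimal place, using the locale of the calling Formatter.
        String text = String.format(formatter.locale(), "%.1f C", celsius);
        formatter.format("%s", text);
    }

    public static void main(String[] args) {
        Temperature t = new Temperature(21.456);
        System.out.println("Reading: " + t);                            // Reading: 21.456 C
        System.out.println(String.format(Locale.US, "Reading: %s", t)); // Reading: 21.5 C
    }
}
```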
We'll see additional examples of formatting with
printf( ) when we cover the APIs for working with
numbers, dates, and times. See java.util.Formatter
for a complete list of available format specifiers and options.
5.2.10. Logging
Simple terminal-based programs can
send their output and error messages to the console with
System.out.println() or System.out.print(). Server programs that run unattended for long periods
need a different solution for output: the hardware they run on may
not have a display terminal attached, and, if it does, there is
unlikely to be anyone looking at it. Programs like this need
logging functionality in which output messages
are sent to a file for later analysis or through a network socket for
remote monitoring. Java 1.4 provides a logging API in the
java.util.logging package.
Typically, the application developer uses a
Logger object
associated with the class or package of the application to generate
log messages at any of seven severity levels (see
java.util.logging.Level). These messages may
report errors and warnings or provide informational messages about
interesting events in the application's life cycle.
They can include debugging information or even trace the execution of
important methods within the program.
The system administrator or end user of the application is
responsible for setting up a logging
configuration
file that specifies where log messages are directed (the console, a
file, a network socket, or a combination of these), how they are
formatted (as plain text or XML documents), and at what severity
threshold they are logged (log messages with a severity below the
specified threshold are discarded with very little overhead and
should not significantly impact the performance of the application).
The logging level severity threshold can be configured independently
so that Logger objects associated with different
classes or packages can be "tuned
in" or "tuned
out." Because of this end-user configurability, you
should feel free to use logging output liberally in your program. In
normal operation, most log messages will be discarded efficiently and
automatically. During program development, or when diagnosing a
problem in a deployed application, however, the log messages can
prove very valuable.
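The threshold can also be set in code, which makes its effect easy to observe. The following sketch is only an illustration: the logger name com.example.demo and the helper loggerWithThreshold() are hypothetical. In a configuration file, the corresponding setting would be a line of the form com.example.demo.level = WARNING.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class LogLevelDemo {
    // Obtain the named logger and set its severity threshold
    static Logger loggerWithThreshold(String name, Level threshold) {
        Logger logger = Logger.getLogger(name);
        logger.setLevel(threshold);
        return logger;
    }

    public static void main(String[] args) {
        Logger logger = loggerWithThreshold("com.example.demo", Level.WARNING);
        System.out.println(logger.isLoggable(Level.INFO));    // false: below the threshold
        System.out.println(logger.isLoggable(Level.SEVERE));  // true: above the threshold
        logger.info("This message is discarded");
        logger.warning("This message is passed on to the handlers");
    }
}
```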
For
most applications, using the Logging API is quite simple. Obtain a
named Logger object whenever necessary by calling
the static Logger.getLogger( ) method, passing the
class or package name of the application as the logger name. Then,
use one of the many Logger instance methods to
generate log messages. The easiest methods to use have names that
correspond to severity levels, such as
severe(), warning(), and
info(). Here is some sample code:
import java.util.logging.*;
// Get a Logger object named after the current package
Logger logger = Logger.getLogger("com.davidflanagan.servers.pop");
logger.info("Starting server."); // Log an informational message
ServerSocket ss; // Do some stuff
try { ss = new ServerSocket(110); }
catch(Exception e) { // Log exceptions
    logger.log(Level.SEVERE, "Can't bind port 110", e); // Complex log message
    logger.warning("Exiting"); // Simple warning
    return;
}
logger.fine("got server socket"); // Fine-detail (low-severity) debug message
5.2.11. Pattern Matching with Regular Expressions
In Java 1.4 and later, you can perform
textual pattern matching with regular expressions. Regular expression
support is provided by the Pattern and
Matcher classes of the
java.util.regex package, but the
String class defines a number of convenient
methods that allow you to use regular expressions even more simply.
Regular expressions use a fairly complex grammar to describe patterns
of characters. The Java implementation uses the same regex syntax as
the Perl 5 programming language. See the
java.util.regex.Pattern class in the reference section for a
summary of this syntax or consult a good Perl programming book for
further details. For a complete tutorial on Perl-style regular
expressions, see Mastering Regular Expressions
(O'Reilly).
The
simplest String method that accepts a regular
expression argument is matches( ); it returns
true if the string matches the pattern defined by
the specified regular expression:
// This string is a regular expression that describes the pattern of a typical
// sentence. In Perl-style regular expression syntax, it specifies
// a string that begins with a capital letter and ends with a period,
// a question mark, or an exclamation point.
String pattern = "^[A-Z].*[\\.?!]$";
String s = "Java is fun!";
s.matches(pattern); // The string matches the pattern, so this returns true.
The matches( ) method
returns true only if the entire string is a match
for the specified pattern. Perl programmers should note that this
differs from Perl's behavior, in which a match means
only that some portion of the string matches the pattern. To
determine if a string or any substring matches a pattern, simply
alter the regular expression to allow arbitrary characters before and
after the desired pattern. In the following code, the regular
expression characters .* match any number of
arbitrary characters:
s.matches(".*\\bJava\\b.*"); // True if s contains the word "Java" anywhere
// The \b specifies a word boundary
If you are already familiar with
Perl's regular expression syntax, you know that it
relies on the liberal use of backslashes to escape certain
characters. In Perl, regular expressions are language primitives and
their syntax is part of the language itself. In Java, however,
regular expressions are described using strings and are typically
embedded in programs using string literals. The syntax for Java
string literals also uses the backslash as an escape character, so to
include a single backslash in the regular expression, you must use
two backslashes. Thus, in Java programming, you will often see double
backslashes in regular expressions.
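The following sketch makes the two levels of escaping visible by printing the regular expression that a string literal actually produces. The class name BackslashDemo and the constant DECIMAL are hypothetical names used for this illustration:

```java
class BackslashDemo {
    // The Java literal "\\d+\\.\\d+" holds the eight-character regex \d+\.\d+
    static final String DECIMAL = "\\d+\\.\\d+";

    public static void main(String[] args) {
        System.out.println(DECIMAL);                  // Prints: \d+\.\d+
        System.out.println("3.14".matches(DECIMAL));  // true
        System.out.println("3x14".matches(DECIMAL));  // false: \. matches only a literal dot
        // Matching one literal backslash takes \\ in the regex,
        // and therefore four backslashes in the Java string literal.
        System.out.println("C:\\temp".matches("C:\\\\temp"));  // true
    }
}
```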
In addition to matching, regular
expressions can be used for search-and-replace operations. The
replaceFirst( ) and
replaceAll( )
methods search a string for the first substring or all substrings
that match a given pattern and replace the string or strings with the
specified replacement text, returning a new string that contains the
replacements. For example, you could use this code to ensure that the
word "Java" is correctly
capitalized in a string s:
s.replaceAll("(?i)\\bjava\\b",// Pattern: the word "java", case-insensitive
"Java"); // The replacement string, correctly capitalized
The replacement string passed to replaceAll() and
replaceFirst( ) need not be a simple literal
string; it may also include references to text that matched
parenthesized subexpressions
within the pattern. These references take the form of a
dollar sign followed by the number
of the subexpression. (If you are not familiar with parenthesized
subexpressions within a regular expression, see
java.util.regex.Pattern in the reference section.)
For example, to search for words such as JavaBean, JavaScript,
JavaOS, and JavaVM (but not Java or Javanese) and to replace the Java
prefix with the letter J without altering the suffix, you could use
code such as:
s.replaceAll("\\bJava([A-Z]\\w+)", // The pattern
"J$1"); // J followed by the suffix that matched the
// subexpression in parentheses: [A-Z]\\w+
The other
String method that uses regular expressions is
split(), which returns an array of the substrings
of a string, separated by delimiters that match the specified
pattern. To obtain an array of words in a string separated by any
number of spaces, tabs, or newlines, do this:
String sentence = "This is a\n\ttwo-line sentence";
String[] words = sentence.split("[ \t\n\r]+");
An optional second argument specifies the maximum number of entries
in the returned array.
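For instance, a limit of 2 splits a string at the first delimiter only and leaves the rest of the text intact. In this sketch, the class name SplitLimitDemo and the helper firstAndRest() are hypothetical:

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Split at the first colon only: at most two entries are returned
    static String[] firstAndRest(String s) {
        return s.split(":", 2);
    }

    public static void main(String[] args) {
        String s = "a:b:c:d";
        System.out.println(Arrays.toString(s.split(":")));     // [a, b, c, d]
        System.out.println(Arrays.toString(firstAndRest(s)));  // [a, b:c:d]
    }
}
```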
The
matches( ), replaceFirst(),
replaceAll( ), and split()
methods are suitable for when you use a regular expression only once.
If you want to use a regular expression for multiple matches, you
should explicitly use the
Pattern and
Matcher classes of the
java.util.regex package. First, create a
Pattern object to represent your regular
expression with the static
Pattern.compile() method.
(Another reason to use the Pattern class
explicitly instead of the String convenience
methods is that Pattern.compile( ) allows you to
specify flags such as
Pattern.CASE_INSENSITIVE that globally alter the way
the pattern matching is done.) Note that the
compile() method can throw a
PatternSyntaxException
if you pass it an invalid regular expression string. (This exception
is also thrown by the various String convenience
methods.) The Pattern class defines
split() methods that are similar to the
String.split() methods. For all other matching,
however, you must create a Matcher object with the
matcher() method and specify the text to be
matched against:
import java.util.regex.*;
Pattern javaword = Pattern.compile("\\bJava(\\w*)", Pattern.CASE_INSENSITIVE);
Matcher m = javaword.matcher(sentence);
boolean match = m.matches(); // True if text matches pattern exactly
Once you have a Matcher object, you can compare
the string to the pattern in various ways. One of the more
sophisticated ways is to find all substrings that match the pattern:
String text = "Java is fun; JavaScript is funny.";
m.reset(text); // Start matching against a new string
// Loop to find all matches of the string and print details of each match
while(m.find()) {
    System.out.println("Found '" + m.group(0) + "' at position " + m.start(0));
    if (m.start(1) < m.end(1)) System.out.println("Suffix is " + m.group(1));
}
The Matcher class has been enhanced in several ways
in Java 5.0. The most important of
these is the ability to save the results of the most recent match in
a MatchResult object. The previous algorithm that
finds all matches in a string could be rewritten in Java 5.0 as
follows:
import java.util.regex.*;
import java.util.*;
public class FindAll {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile(args[0]);
        String text = args[1];
        List<MatchResult> results = findAll(pattern, text);
        for(MatchResult r : results) {
            System.out.printf("Found '%s' at (%d,%d)%n",
                              r.group(), r.start(), r.end());
        }
    }
    public static List<MatchResult> findAll(Pattern pattern, CharSequence text)
    {
        List<MatchResult> results = new ArrayList<MatchResult>();
        Matcher m = pattern.matcher(text);
        while(m.find()) results.add(m.toMatchResult());
        return results;
    }
}
5.2.12. Tokenizing Text
java.util.Scanner is a general-purpose text tokenizer, added in
Java 5.0 to complement the
java.util.Formatter class described earlier in
this chapter. Scanner takes full advantage of Java
regular expressions and can take its input text from a string, file,
stream, or any object that implements the
java.lang.Readable
interface. Readable is also new in Java 5.0 and is
the opposite of the Appendable interface.
A Scanner can break its input text into tokens
separated by whitespace or any desired delimiter character or regular
expression. It implements the
Iterator<String> interface, which allows for
simple looping through the returned tokens.
Scanner also defines a variety of convenience
methods for parsing tokens as boolean, integer, or
floating-point values, with locale-sensitive number parsing. It has
skip( ) methods
for skipping input text that matches a specified pattern and also has
methods for searching ahead in the input text for text that matches a
specified pattern.
Here's how you could use a
Scanner to break a String into
space-separated words:
public static List<String> getTokens(String line) {
    List<String> result = new ArrayList<String>();
    for(Scanner s = new Scanner(line); s.hasNext(); )
        result.add(s.next());
    return result;
}
Here's how you might use a
Scanner to break a file into lines:
public static void printLines(File f) throws IOException {
    Scanner s = new Scanner(f);
    // Use a regex to specify line terminators as the token delimiter
    s.useDelimiter("\r\n|\n|\r");
    while(s.hasNext()) System.out.println(s.next());
}
The following method uses Scanner to parse an
input line in the form x + y = z. It demonstrates
the ability of a Scanner to scan numbers. Note
that Scanner does not just parse Java-style
integer literals: it supports thousands separators and does so in a
locale-sensitive way. For example, it would parse the integer
1,234 for an American user and 1.234 for a German user. This code
also demonstrates the skip() method and shows that
a Scanner can scan text directly from an
InputStream.
public static boolean parseSum() {
    System.out.print("enter sum> "); // Prompt the user for input
    System.out.flush(); // Make sure prompt is visible immediately
    try {
        // Read and parse the user's input from the console
        Scanner s = new Scanner(System.in);
        // Tokens end at whitespace or at a + or = sign,
        // so spaces are not required between tokens
        s.useDelimiter("\\s*[+=]\\s*|\\s+");
        int x = s.nextInt(); // Parse an integer
        s.skip("\\s*\\+\\s*"); // Skip optional space and literal +
        int y = s.nextInt(); // Parse another integer
        s.skip("\\s*=\\s*"); // Skip optional space and literal =
        int z = s.nextInt(); // Parse a third integer
        return x + y == z;
    }
    catch(InputMismatchException e) { // pattern does not match
        throw new IllegalArgumentException("syntax error");
    }
    catch(NoSuchElementException e) { // no more input available
        throw new IllegalArgumentException("syntax error");
    }
}
5.2.13. StringTokenizer
A
number of other Java classes operate on strings and characters. One
notable class is java.util.StringTokenizer, which
you can use to break a string of text into its component words:
String s = "Now is the time";
java.util.StringTokenizer st = new java.util.StringTokenizer(s);
while(st.hasMoreTokens()) {
    System.out.println(st.nextToken());
}
You can even use this class to tokenize words that are delimited by
characters other than spaces:
String s = "a:b:c:d";
java.util.StringTokenizer st = new java.util.StringTokenizer(s, ":");
java.io.StreamTokenizer is another tokenizing
class. It has a more complicated API and has more powerful features than
StringTokenizer.
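As a rough sketch of that API, the following code uses a StreamTokenizer with its default settings to classify the tokens of a short string as words, numbers, or single characters. The class name StreamTokenizerDemo, the helper tokens(), and the "word:", "number:", and "char:" labels are inventions of this example:

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class StreamTokenizerDemo {
    // Return a labeled description of each token found in the text
    static List<String> tokens(String text) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        List<String> result = new ArrayList<String>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            switch (st.ttype) {
            case StreamTokenizer.TT_WORD:    // Word tokens are stored in sval
                result.add("word:" + st.sval);
                break;
            case StreamTokenizer.TT_NUMBER:  // Numbers are parsed into the double nval
                result.add("number:" + st.nval);
                break;
            default:                         // Otherwise ttype is the character itself
                result.add("char:" + (char) st.ttype);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Prints: [word:total, char:=, number:42.0, char:+, word:x]
        System.out.println(tokens("total = 42 + x"));
    }
}
```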