[ Team LiB ] |
3.7 Filtering Character Streams

FilterReader is an abstract class that defines a null filter: it reads characters from a specified Reader and returns them unmodified. In other words, each Reader method that FilterReader implements simply passes the call through to the underlying stream. A subclass that performs real filtering must override at least the two read() methods, read() and read(char[], int, int); some subclasses override other methods as well.

Example 3-6 shows RemoveHTMLReader, a custom subclass of FilterReader that reads HTML text from a stream and strips all the HTML tags out of the text it returns. The example implements tag filtering in the three-argument version of read() and then implements the no-argument version in terms of that more complicated one. It also includes a static nested Test class whose main() method shows how you might use RemoveHTMLReader.

Note that we could also define a RemoveHTMLWriter class by performing the same filtering in a FilterWriter subclass. Or, to filter a byte stream instead of a character stream, we could subclass FilterInputStream or FilterOutputStream.

RemoveHTMLReader is only one example of a filter stream. Other possibilities include streams that count the characters or bytes processed, convert characters to uppercase, extract URLs, perform search-and-replace operations, convert Unix-style LF line terminators to Windows-style CRLF line terminators, and so on.

Example 3-6. RemoveHTMLReader.java

package je3.io;
import java.io.*;

/**
 * A simple FilterReader that strips HTML tags (or anything between
 * pairs of angle brackets) out of a stream of characters.
 * Note: skip() and mark()/reset() are inherited from FilterReader and
 * act on the raw, unfiltered underlying stream.
 **/
public class RemoveHTMLReader extends FilterReader {
    /** A trivial constructor. Just initialize our superclass */
    public RemoveHTMLReader(Reader in) { super(in); }

    boolean intag = false;  // Used to remember whether we are "inside" a tag

    /**
     * This overrides the pass-through read(char[], int, int) of FilterReader.
     * It calls in.read() to get a buffer full of characters, then strips
     * out the HTML tags. (in is a protected field of the superclass.)
     **/
    @Override
    public int read(char[] buf, int from, int len) throws IOException {
        if (len == 0) return 0;  // Nothing requested; without this check the
                                 // loop below would spin forever
        int numchars = 0;        // how many characters have been read
        // Loop, because we might read a bunch of characters, then strip them
        // all out, leaving us with zero characters to return.
        while (numchars == 0) {
            numchars = in.read(buf, from, len);  // Read characters
            if (numchars == -1) return -1;       // Check for EOF and handle it.

            // Loop through the characters we read, stripping out HTML tags.
            // Characters not in tags are copied over previous tags.
            int last = from;  // Where the next non-HTML char is copied to
            for (int i = from; i < from + numchars; i++) {
                if (!intag) {                          // If not in an HTML tag
                    if (buf[i] == '<') intag = true;   // check for tag start
                    else buf[last++] = buf[i];         // and copy the character
                }
                else if (buf[i] == '>') intag = false; // check for end of tag
            }
            numchars = last - from;  // Figure out how many characters remain
        }                            // and loop again if none are left;
        return numchars;             // otherwise return that number.
    }

    /**
     * This overrides the other pass-through read() method. We implement
     * it in terms of the method above. The inherited Reader methods
     * implement the remaining read() variants in terms of these two.
     **/
    @Override
    public int read() throws IOException {
        char[] buf = new char[1];
        int result = read(buf, 0, 1);
        if (result == -1) return -1;
        else return (int)buf[0];
    }

    /** This class defines a main() method to test the RemoveHTMLReader */
    public static class Test {
        /** The test program: read a text file, strip HTML, print to console */
        public static void main(String[] args) {
            try {
                if (args.length != 1)
                    throw new IllegalArgumentException("Wrong number of args");
                // Create a stream to read from the file and strip tags from it.
                // try-with-resources closes it even if an exception occurs.
                try (BufferedReader in = new BufferedReader(
                         new RemoveHTMLReader(new FileReader(args[0])))) {
                    // Read line by line, printing lines to the console
                    String line;
                    while ((line = in.readLine()) != null)
                        System.out.println(line);
                }
            }
            catch (Exception e) {
                System.err.println(e);
                System.err.println("Usage: java je3.io.RemoveHTMLReader$Test"
                                   + " <filename>");
            }
        }
    }
}
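The RemoveHTMLWriter mentioned above is not part of Example 3-6. Here is one way it might look; the class is a hypothetical sketch, not code from the original. Unlike FilterReader, FilterWriter has three methods that pass data straight through (write(int), write(char[], int, int), and write(String, int, int)), so this sketch overrides all three and funnels the two bulk methods into the single-character one. As in RemoveHTMLReader, the intag flag survives between calls, so a tag split across two write() calls is still removed.

```java
import java.io.*;

/**
 * Hypothetical RemoveHTMLWriter: a FilterWriter that drops everything
 * between '<' and '>' (inclusive) before passing characters on to the
 * underlying Writer. A sketch, not part of the original example.
 */
class RemoveHTMLWriter extends FilterWriter {
    private boolean intag = false;  // Are we currently inside a tag?

    RemoveHTMLWriter(Writer out) { super(out); }

    /** Filter one character; the other write() methods funnel into this one. */
    @Override
    public void write(int c) throws IOException {
        if (intag) {
            if (c == '>') intag = false;  // End of tag: drop the '>' too
        } else if (c == '<') {
            intag = true;                 // Start of tag: drop the '<'
        } else {
            out.write(c);                 // Ordinary text: pass it through
        }
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) write(cbuf[i]);
    }

    @Override
    public void write(String str, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) write(str.charAt(i));
    }

    /** Tiny demonstration: strip tags while writing to a StringWriter. */
    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        try (Writer w = new RemoveHTMLWriter(sw)) {
            w.write("<p>Hello, <b>world</b>!</p>");
        }
        System.out.println(sw);  // Prints: Hello, world!
    }
}
```

Writing one character at a time keeps the sketch short; a production version would probably copy runs of non-tag characters to out in bulk.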
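Converting characters to uppercase, one of the other possibilities listed above, needs no state at all, which makes it a compact illustration of overriding the two read() methods. The UpperCaseReader class below is a hypothetical sketch. It relies on Character.toUpperCase(char), which maps each char to exactly one char; it therefore cannot produce multi-character results such as the "SS" that String.toUpperCase() yields for "ß", and it only handles characters in the Basic Multilingual Plane.

```java
import java.io.*;

/**
 * Hypothetical UpperCaseReader: a FilterReader that converts each
 * character to upper case. It overrides the same two read() methods as
 * RemoveHTMLReader; the inherited Reader methods build on these.
 */
class UpperCaseReader extends FilterReader {
    UpperCaseReader(Reader in) { super(in); }

    @Override
    public int read() throws IOException {
        int c = in.read();
        return (c == -1) ? -1 : Character.toUpperCase((char) c);
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);  // n is -1 at EOF, so the loop is skipped
        for (int i = off; i < off + n; i++)
            buf[i] = Character.toUpperCase(buf[i]);
        return n;
    }

    /** Tiny demonstration: BufferedReader pulls data through read(char[], int, int). */
    public static void main(String[] args) throws IOException {
        try (BufferedReader r = new BufferedReader(
                 new UpperCaseReader(new StringReader("Hello, World")))) {
            System.out.println(r.readLine());  // Prints: HELLO, WORLD
        }
    }
}
```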
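For byte streams the same pattern applies to FilterInputStream. The hypothetical CountingInputStream below sketches the "count the bytes processed" idea: it passes data through unchanged and tallies bytes as they go by. FilterInputStream.read(byte[]) already delegates to read(byte[], int, int), so overriding the single-byte and array versions covers ordinary reads. Overriding skip() as well reflects an assumption of this sketch (skipped bytes count as consumed), and because reset() would rewind the data without rewinding the counter, the sketch disables mark/reset.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical CountingInputStream: passes bytes through unchanged while
 * counting how many have been read or skipped.
 */
class CountingInputStream extends FilterInputStream {
    private long count = 0;  // Bytes read or skipped so far

    CountingInputStream(InputStream in) { super(in); }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b != -1) count++;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) count += n;  // -1 (EOF) and 0 add nothing
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = in.skip(n);
        count += skipped;       // Assumption: skipped bytes count as consumed
        return skipped;
    }

    // reset() would rewind the data but not the counter, so refuse mark/reset.
    @Override
    public boolean markSupported() { return false; }

    @Override
    public void mark(int readlimit) { }  // No-op, as in InputStream

    @Override
    public void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }

    long getCount() { return count; }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello, world".getBytes(StandardCharsets.US_ASCII);
        try (CountingInputStream in =
                 new CountingInputStream(new ByteArrayInputStream(data))) {
            while (in.read() != -1) { }                     // Drain the stream
            System.out.println(in.getCount() + " bytes");  // Prints: 12 bytes
        }
    }
}
```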
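The LF-to-CRLF conversion listed above fits naturally on the output side as a FilterWriter. The LfToCrlfWriter below is a hypothetical sketch: it writes '\r' before every '\n' unless the previous character was already '\r', and it remembers that previous character across calls so that a CRLF pair split over two write() calls is not doubled.

```java
import java.io.*;

/**
 * Hypothetical LfToCrlfWriter: a FilterWriter that turns Unix-style "\n"
 * line terminators into Windows-style "\r\n". A "\n" that already follows
 * a "\r" is left alone, so existing CRLF pairs are not doubled.
 */
class LfToCrlfWriter extends FilterWriter {
    private char prev = 0;  // Last character written, remembered across calls

    LfToCrlfWriter(Writer out) { super(out); }

    @Override
    public void write(int c) throws IOException {
        char ch = (char) c;  // Writer.write(int) uses only the low 16 bits
        if (ch == '\n' && prev != '\r') out.write('\r');
        out.write(ch);
        prev = ch;
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) write(cbuf[i]);
    }

    @Override
    public void write(String str, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) write(str.charAt(i));
    }

    public static void main(String[] args) throws IOException {
        StringWriter sw = new StringWriter();
        try (Writer w = new LfToCrlfWriter(sw)) {
            w.write("one\ntwo\r\nthree\n");
        }
        // Make the control characters visible when printed
        System.out.println(sw.toString().replace("\r", "\\r").replace("\n", "\\n"));
        // Prints: one\r\ntwo\r\nthree\r\n
    }
}
```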