Recipe 26.1 Parsing an HTML Page Using the javax.swing.text Subpackages

Problem

You want to use the classes the Java 2 Standard Edition (J2SE) makes available for parsing HTML.

Solution

Use the various subpackages of the javax.swing.text package to create a parser for HTML.

Discussion

The J2SE 1.3 and 1.4 versions include the necessary classes for sifting through web pages in search of information. The Java programs these recipes use import the following classes:

    javax.swing.text.html.HTMLEditorKit.ParserCallback;
    javax.swing.text.MutableAttributeSet;
    javax.swing.text.html.parser.ParserDelegator;

The design pattern that these classes use to read web pages involves three main elements:
The servlet and JavaBean defined in this chapter use an inner class to implement the callback. Example 26-1 shows the callback, which extends javax.swing.text.html.HTMLEditorKit.ParserCallback. The htmlText and stockVal variables that it uses are instance variables of the enclosing class.

Example 26-1. A callback class for sifting through web pages

    class MyParserCallback extends ParserCallback {

        //bread crumbs that lead us to the stock price
        private boolean lastTradeFlag = false;
        private boolean boldFlag = false;

        public MyParserCallback( ){
            //Reset the enclosing class's stock-price instance variable
            if (stockVal != 0)
                stockVal = 0f;
        }

        //A method that the parser calls each time it confronts a start tag
        public void handleStartTag(javax.swing.text.html.HTML.Tag t,
                MutableAttributeSet a, int pos) {
            if (lastTradeFlag && (t == javax.swing.text.html.HTML.Tag.B)){
                boldFlag = true;
            }
        }//handleStartTag

        //A method that the parser calls each time it reaches nested text content
        public void handleText(char[] data, int pos){
            htmlText = new String(data);
            if (htmlText.indexOf("No such ticker symbol.") != -1){
                throw new IllegalStateException(
                    "Invalid ticker symbol in handleText( ) method.");
            } else if (htmlText.equals("Last Trade:")){
                lastTradeFlag = true;
            } else if (boldFlag){
                try{
                    stockVal = Float.parseFloat(htmlText);
                } catch (NumberFormatException ne) {
                    try{
                        // tease out any commas in the number using NumberFormat;
                        // parse( ) returns a Number that may be a Long or a
                        // Double, so do not cast the result to Double
                        java.text.NumberFormat nf =
                            java.text.NumberFormat.getInstance( );
                        stockVal = nf.parse(htmlText).floatValue( );
                    } catch (java.text.ParseException pe){
                        throw new IllegalStateException(
                            "The extracted text " + htmlText +
                            " cannot be parsed as a number!");
                    }//try
                }//try
                //Reset the inner class's instance variables
                lastTradeFlag = false;
                boldFlag = false;
            }//if
        } //handleText
    }//MyParserCallback

A callback includes methods that the parser calls when it reaches a particular kind of element in a web page during the parsing process.
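For example, the parser (the object to which you pass the callback object) calls handleStartTag( ) whenever it runs into an opening tag as it traverses the web page. Examples of opening tags are <html>, <title>, and <body>. Therefore, when you implement the handleStartTag( ) method in the code, you control what your program does when it finds an opening tag, such as "prepare to grab the text that appears between the opening and closing title tags."

Example 26-1 uses a particular algorithm to search a web page for an updated stock quote, split across the two methods of the MyParserCallback class. The handleText( ) method sets lastTradeFlag to true when it encounters the text "Last Trade:". Once that flag is set, handleStartTag( ) sets boldFlag to true when the parser reaches a <b> tag. On its next call, handleText( ) treats the text it receives as the stock price, converts it to a float, stores the result in stockVal, and resets both flags. If the page contains the text "No such ticker symbol.", handleText( ) throws an IllegalStateException instead.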
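The flag-based search in Example 26-1 can be tried outside a servlet. The following is a minimal sketch, not the chapter's code: the class names LastTradeSketch and QuoteCallback, the extractQuote( ) helper, and the sample HTML string are all assumptions made for illustration, and the page is read from a StringReader instead of a live web site. The sketch keeps the extracted text as a String and trims surrounding whitespace, which the original callback does not do.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit.ParserCallback;
import javax.swing.text.html.parser.ParserDelegator;

public class LastTradeSketch {

    // Hypothetical callback: remembers the first bold text that
    // follows a "Last Trade:" label, mirroring the two flags of Example 26-1
    static class QuoteCallback extends ParserCallback {
        private boolean lastTradeFlag = false;
        private boolean boldFlag = false;
        String quoteText = null;

        @Override
        public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
            // Only a <b> tag that appears after the label is of interest
            if (lastTradeFlag && t == HTML.Tag.B) {
                boldFlag = true;
            }
        }

        @Override
        public void handleText(char[] data, int pos) {
            String text = new String(data).trim();
            if (text.equals("Last Trade:")) {
                lastTradeFlag = true;
            } else if (boldFlag) {
                quoteText = text;
                // Reset the flags so later bold text is ignored
                lastTradeFlag = false;
                boldFlag = false;
            }
        }
    }

    // Hypothetical helper: parses an in-memory HTML string and returns the
    // text found after the label, or null if the label/bold pair is absent
    static String extractQuote(String html) {
        QuoteCallback callback = new QuoteCallback();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback.quoteText;
    }

    public static void main(String[] args) {
        // Made-up markup that imitates the label-then-bold layout
        String html = "<html><body><table><tr><td>Last Trade:</td>"
                + "<td><b>1,234.50</b></td></tr></table></body></html>";
        System.out.println(extractQuote(html));
    }
}
```

Running main( ) should print the text inside the bold tag. A page without the "Last Trade:" label leaves quoteText as null, because boldFlag is never set.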
Example 26-2 shows a snippet of code that uses the ParserDelegator and MyParserCallback objects, just to give you an idea of how they fit together before we move on to the servlet and JSP.

Example 26-2. A code snippet shows the parser and callback classes at work

    //Imports that the snippet relies on
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import javax.swing.text.html.parser.ParserDelegator;

    //Instance variables
    private ParserDelegator htmlParser = null;
    private MyParserCallback callback = null;

    //Initialize a BufferedReader and a URL inside of a method for connecting
    //to and reading a web page
    BufferedReader webPageStream = null;
    URL stockSite = new URL(BASE_URL + symbol);

    //Connect inside of a method
    webPageStream = new BufferedReader(
        new InputStreamReader(stockSite.openStream( )));

    //Create the parser and callback
    htmlParser = new ParserDelegator( );
    callback = new MyParserCallback( );//ParserCallback

    //Call parse( ), passing in the BufferedReader and callback objects;
    //the third argument (ignoreCharSet) is true
    htmlParser.parse(webPageStream, callback, true);

The parse( ) method of ParserDelegator is what triggers the calls to the callback's methods; the callback is passed in as an argument to parse( ). Now let's see how these classes work in a servlet, JavaBean, and JSP.

See Also

A Javadoc link for ParserDelegator: http://java.sun.com/j2se/1.4.1/docs/api/javax/swing/text/html/parser/ParserDelegator.html; Chapter 27 on using web services APIs to grab information from web servers.