Perl in a Nutshell


17. The LWP Library

Contents:
LWP Overview
The LWP Modules
The HTTP Modules
The HTML Module
The URI Module

LWP, the library for web access in Perl, is a bundle of modules that provide a consistent, object-oriented approach to creating web applications. The library, distributed as a single package named libwww-perl, contains the following groups of classes:

File

Parses directory listings.

Font

Handles Adobe Font Metrics.

HTML

Parses HTML files and converts them to printable or other forms.

HTTP

Provides client requests, server responses, and protocol implementation.

LWP

The core of all web client programs. It creates network connections and manages the communication and transactions between client and server.

URI

Creates, parses, and translates URLs.

WWW

Implements standards used for robots (automatic client programs).

Each module provides building blocks for one part of a whole web transaction, from connection to request to response and returned data. Each part is encapsulated by an object, which gives a standard interface to every web program you write. The following section gives an overview of how LWP works to create a web client.

17.1 LWP Overview

Any web transaction requires an application that can establish a TCP/IP network connection and send and receive messages using the appropriate protocol (usually HTTP). TCP/IP connections are established using sockets, and messages are exchanged via socket filehandles. See Chapter 13, Sockets, for information on how to manually create socket applications. LWP provides an object for this application with LWP::UserAgent for clients; HTTP::Daemon provides a server object. The UserAgent object acts as the browser: it connects to a server, sends requests, receives responses, and manages the received data. This is how you create a UserAgent object:

use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
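Before it sends anything, an agent can also be given an identity and a timeout. The following is a minimal sketch: the agent string MegaBrowser/1.0 is borrowed from the header example later in this section, and the 30-second timeout is an arbitrary value chosen for illustration.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
$ua->agent('MegaBrowser/1.0');  # value sent in the User-Agent header (illustrative)
$ua->timeout(30);               # stop waiting for the server after 30 seconds (arbitrary)

print "Agent: ", $ua->agent, "\n";
print "Timeout: ", $ua->timeout, " seconds\n";
```

Called without an argument, agent and timeout return the current setting, which is why the same methods serve for both setting and printing.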
The UserAgent now needs to send a message to a server requesting a URL (Uniform Resource Locator), which it does with the request method. This method forms an HTTP request from the object given as its argument; that request object is created by HTTP::Request.

An HTTP request message contains three parts. The first line always contains an HTTP command called a method, a Uniform Resource Identifier (URI) identifying the file or resource the client is querying, and the HTTP version number. The lines that follow contain header information, which describes the client and any data it is sending to the server. The third part is the entity body, which is the data being sent to the server (for the POST method). The following is a sample HTTP request:

GET /index.html HTTP/1.0
User-Agent: Mozilla/1.1N (Macintosh; I; 68K)
Accept: */*
Accept: image/gif
Accept: image/jpeg
LWP::UserAgent->request forms this message from an HTTP::Request object. A request object requires a method for the first argument. The GET method asks for a file, while the POST method supplies information such as form data to a server application. There are other methods, but these two are most commonly used.
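To see how a request object maps onto the message parts described above, it can help to build one and print it. This is only a sketch: www.example.com is a placeholder host, and the text produced by as_string is meant for inspection and may differ in detail from the bytes LWP actually sends on the wire.

```perl
use strict;
use warnings;
use HTTP::Request;

# www.example.com is a placeholder host used only for illustration.
my $req = HTTP::Request->new(GET => 'http://www.example.com/index.html');
$req->header(Accept => 'text/plain');   # one header line
$req->protocol('HTTP/1.0');             # version shown on the request line

print $req->method, "\n";   # the HTTP method
print $req->uri, "\n";      # the resource being requested
print $req->as_string;      # request line, headers, blank line, then the (empty) body
```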

The second argument is the URL for the request. The URL must contain the server name, for this is how the UserAgent knows where to connect. The URL argument can be represented as a string or as a URI::URL object, which allows more complex URLs to be formed and managed. Optional parameters for an HTTP::Request include your own headers, in the form of an HTTP::Headers object, and any POST data for the message. The following example creates a request object:
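As a sketch of the optional arguments, a POST request might carry its headers and form data like this. The URL and the form fields are invented for the example, and the body is written out by hand rather than generated by a helper.

```perl
use strict;
use warnings;
use HTTP::Headers;
use HTTP::Request;

# Placeholder URL and form fields, invented for this sketch.
my $url  = 'http://www.example.com/cgi-bin/search';
my $hdrs = HTTP::Headers->new(
    'Content-Type' => 'application/x-www-form-urlencoded',
);
my $body = 'query=perl&max=10';

# Arguments in order: method, URL, headers object, entity body
my $req = HTTP::Request->new(POST => $url, $hdrs, $body);

print $req->method, " ", $req->uri, "\n";
print $req->content, "\n";
```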

use HTTP::Request;

my $req = HTTP::Request->new(GET => $url, $hdrs);
The URL object is created like this:
use URI::URL;

my $url = URI::URL->new('http://www.ora.com/index.html');
And a header object can be created like this:
use HTTP::Headers;

my $hdrs = HTTP::Headers->new(Accept       => 'text/plain',
                              'User-Agent' => 'MegaBrowser/1.0');
Then you can put them all together to make a request:
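use LWP::UserAgent;  # also loads HTTP::Request and HTTP::Headers
use URI::URL;

# The header name is quoted because of its hyphen
my $hdrs = HTTP::Headers->new(Accept       => 'text/plain',
                              'User-Agent' => 'MegaBrowser/1.0');

# The URL needs a scheme and server name so the agent knows where to connect
my $url  = URI::URL->new('http://www.ora.com/index.html');
my $req  = HTTP::Request->new(GET => $url, $hdrs);
my $ua   = LWP::UserAgent->new;
my $resp = $ua->request($req);
if ($resp->is_success) {
    print $resp->content;    # the requested document
}
else {
    print $resp->message;    # text describing the server's response code
}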
Once the request has been made by the user agent, the response from the server is returned as another object, described by HTTP::Response. This object contains the status code of the request, the returned headers, and, if the request succeeded, the content you asked for. In the example, is_success checks whether the request was fulfilled without problems; if so, the content is printed. If not, a message describing the server's response code is printed instead.
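The parts of a response object can be explored without a network connection by constructing responses by hand. The sketch below does that; the status codes, header, and content are made up for illustration, whereas in a real client the object would come back from the user agent's request method.

```perl
use strict;
use warnings;
use HTTP::Response;

# Build two responses by hand so the sketch runs without a network.
my $ok = HTTP::Response->new(200, 'OK');
$ok->header('Content-Type' => 'text/html');
$ok->content('<html><body>Hello</body></html>');

my $missing = HTTP::Response->new(404, 'Not Found');

for my $resp ($ok, $missing) {
    if ($resp->is_success) {
        # code, headers, and content are all available on success
        print "Got ", length($resp->content), " bytes of ",
              $resp->header('Content-Type'), "\n";
    }
    else {
        # status_line combines the numeric code and the message
        print "Failed: ", $resp->status_line, "\n";
    }
}
```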

There are other modules and classes that create useful objects for web clients in LWP, but the above examples show the most basic ones. For server applications, many of the objects used above become pieces of a server transaction, which you either create yourself (such as response objects) or receive from a client (like request objects).

Additional functionality for both client and server applications is provided by the HTML module. This module provides many classes for both the creation and interpretation of HTML documents.
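As one small illustration of interpreting an HTML document, the sketch below pulls the links out of a page with HTML::LinkExtor. It assumes that module is installed (it may be packaged separately from libwww-perl), and the HTML string is invented for the example.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# A tiny made-up document with one hyperlink and one image.
my $html = '<html><body><a href="http://www.example.com/">Example</a>'
         . '<img src="logo.gif"></body></html>';

my $parser = HTML::LinkExtor->new;   # no callback given: links are collected internally
$parser->parse($html);
$parser->eof;                        # signal the end of the document

for my $link ($parser->links) {
    my ($tag, %attrs) = @$link;      # tag name followed by attribute/URL pairs
    print "$tag: $_ = $attrs{$_}\n" for sort keys %attrs;
}
```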

The rest of this chapter provides information for the LWP, HTTP, HTML, and URI modules.

