Content Negotiation (CGI Programming with Perl)

2.6.1. Media Type

Clients may include a header with their HTTP request indicating a list of preferred formats. The header for media type looks like this:

Accept: text/html;q=1, text/plain;q=0.8, 
        image/jpeg, image/gif, */*;q=0.001

The Accept header lists HTTP media types in the type/subtype format used by the Content-Type header, each followed by an optional quality factor (asterisks serve as wildcards). Quality factors are floating-point numbers between 0 and 1 that indicate a preference for a particular type; the default is 1. Servers are expected to examine the Accept media types and return data in the format the browser prefers. When multiple values have the same quality factor, the more specific one (i.e., where the quality factor is specified or the media type is not a wildcard) has higher priority.

In the previous example, documents would be returned with the following priority:

text/html
image/jpeg or image/gif
text/plain
*/* (anything else)
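The ranking described above can be sketched in a few lines of Python. This is only an illustration, not how any particular server does it: the function name parse_accept is hypothetical, and the tie-breaking order (explicit quality factor first, then fewer wildcards, then header order) is simply one reading of the rule stated in the text.

```python
def parse_accept(header):
    """Return the media types in an Accept header, most preferred first."""
    entries = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0]
        if not media_type:
            continue  # tolerate stray or trailing commas
        quality, explicit = 1.0, False  # the default quality factor is 1
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality, explicit = float(value), True
                except ValueError:
                    pass  # malformed q value: keep the default
        wildcards = media_type.count("*")
        # Sort key (an assumption based on the text): higher quality first,
        # then explicit q before implicit, then fewer wildcards, then the
        # order in which the types appeared in the header.
        entries.append((-quality, not explicit, wildcards, position, media_type))
    return [entry[-1] for entry in sorted(entries)]


header = "text/html;q=1, text/plain;q=0.8, image/jpeg, image/gif, */*;q=0.001"
for media_type in parse_accept(header):
    print(media_type)
```

Run against the example header, this prints the types in the same order as the list above, with image/jpeg ahead of image/gif only because it appears first in the header.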

In reality, media type negotiation is not often used because it is unwieldy for a browser to list the media types of all documents it supports each time it makes a request. The majority of browsers today specify only new or less common image formats in addition to */*. Examples of the newer formats are image/p-jpeg (progressive JPEG) and image/png. (PNG was created as an open alternative to GIF, which has patent issues; see Chapter 13, "Creating Graphics on the Fly".) Web servers generally do not support media type negotiation for static documents, but we will look at a CGI script that does this in the next chapter.

2.6.2. Internationalization

Although media type negotiation is becoming outdated, other forms of content negotiation are becoming more important. Internationalization is one area where content negotiation plays an important role. Providing a document to readers in other countries can mean two things: supporting other translations and possibly supporting other character sets. The Roman alphabet, the Cyrillic alphabet, and Kanji, for example, use different character sets. HTTP supports these forms of negotiation with the Accept-Language and Accept-Charset headers. Examples of these headers are:

Accept-Charset: iso-8859-5, iso-8859-1;q=0.5
Accept-Language: ru, en-gb;q=0.5, en;q=0.4

The first line indicates that the server should return the content in Cyrillic if possible or Western Roman otherwise. The second line specifies Russian as the first choice, with British English as the second, and other forms of English as the third. Note that a single asterisk can be used in place of any of these values to represent a wildcard match. The default character set, unless otherwise specified, is US-ASCII or ISO-8859-1 (US-ASCII is a subset of ISO-8859-1).
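As a rough illustration of how a script might act on these headers, the following Python sketch parses an Accept-Language or Accept-Charset value and picks the best available language. The names parse_preferences and choose_language are hypothetical, and the matching rule (a range such as en also matches tags like en-gb) is an assumption based on the usual prefix matching of language ranges; a real server may apply more elaborate rules.

```python
def parse_preferences(header):
    """Return (value, quality) pairs from an Accept-* header, best first."""
    prefs = []
    for item in header.split(","):
        value, _, params = item.strip().partition(";")
        value = value.strip().lower()
        if not value:
            continue  # tolerate stray commas
        quality = 1.0  # default quality factor
        name, _, q = params.partition("=")
        if name.strip().lower() == "q":
            try:
                quality = float(q)
            except ValueError:
                pass  # malformed q value: keep the default
        prefs.append((value, quality))
    # sorted() is stable, so values with equal quality keep header order.
    return sorted(prefs, key=lambda pair: -pair[1])


def choose_language(header, available):
    """Return the first available tag matching the client's preferences."""
    for wanted, quality in parse_preferences(header):
        if quality <= 0:
            continue  # a quality of 0 is treated here as "not acceptable"
        for tag in available:
            lower = tag.lower()
            if wanted == "*" or lower == wanted or lower.startswith(wanted + "-"):
                return tag
    return None  # nothing matched; the caller falls back to a default


print(parse_preferences("iso-8859-5, iso-8859-1;q=0.5"))
print(choose_language("ru, en-gb;q=0.5, en;q=0.4", ["en", "fr", "de"]))
```

With only English, French, and German documents available, the second call settles on en: Russian and British English are preferred but not offered, so the generic English range is the first one that matches.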

Most web servers support language negotiation automatically for static documents. For example, if you perform a new installation of Apache, it will install multiple copies of the "It Worked!" welcome file in /usr/local/apache/htdocs. The files all share the index.html base name but have different extensions indicating the language code: index.html.en, index.html.fr, index.html.de, etc. If you point your browser at index.html, change the preferred language in your browser, and then reload the page, you should see it in another language.

2.6.3. Encoding

The final form of content negotiation supports encoding. Options for encoding include gzip, compress, and identity (no encoding). Here is an example header specifying that the browser supports compress and gzip:

Accept-Encoding: compress, gzip

A server may be able to speed up the download of a large document to this client by sending an encoded version of the document. The browser should decode the document automatically for the user.
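A minimal sketch of the server side, assuming a script that already holds the response body as bytes: it compresses the body with gzip only when the Accept-Encoding header allows it. The function name encode_body is hypothetical, and only gzip is handled because Python's standard library has no module for the compress format.

```python
import gzip


def encode_body(body, accept_encoding):
    """Return (payload, content_encoding) for a response body given as bytes."""
    accepted = {}
    for item in accept_encoding.split(","):
        coding, _, params = item.strip().partition(";")
        coding = coding.strip().lower()
        if not coding:
            continue  # tolerate stray commas or an empty header
        quality = 1.0  # default quality factor
        name, _, q = params.partition("=")
        if name.strip().lower() == "q":
            try:
                quality = float(q)
            except ValueError:
                pass  # malformed q value: keep the default
        accepted[coding] = quality
    if accepted.get("gzip", 0.0) > 0:
        return gzip.compress(body), "gzip"
    # Fall back to sending the document unencoded.
    return body, "identity"


document = b"<p>Hello, world!</p>\n" * 500
payload, encoding = encode_body(document, "compress, gzip")
print(encoding, len(document), len(payload))
```

A caller would send a Content-Encoding: gzip header along with the payload only when the returned encoding is gzip, and omit the header for identity.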

