In addition to grabbing information off web pages, a shell script can also feed certain information to a website and scrape the data that the web page spits back. An excellent example of this technique is to implement a command that looks up the specified word in an online dictionary and returns its definition. There are a number of dictionaries online, but we'll use the WordNet lexical database that's made available through the Cognitive Science Department of Princeton University.
Learn more
You can read up on the WordNet project, which is quite interesting, by visiting its website directly at http://www.cogsci.princeton.edu/~wn/
#!/bin/sh

# define - Given a word, returns its definition.

url="http://www.cogsci.princeton.edu/cgi-bin/webwn1.7.1?stage=1&word="

if [ $# -ne 1 ] ; then
  echo "Usage: $0 word" >&2
  exit 1
fi

lynx -source "$url$1" | \
  grep -E '(^[[:digit:]]+\.| has [[:digit:]]+$)' | \
  sed 's/<[^>]*>//g' |
( while read line
  do
    # Header lines start with "The". A case pattern is used here because
    # the ${line:0:3} substring expansion is a bash extension and fails
    # when /bin/sh is a strict POSIX shell.
    case "$line" in
      The*)
        part="$(echo "$line" | awk '{print $2}')"
        echo ""
        echo "The $part $1:"
        ;;
      *)
        # Wrap each definition on its own, then indent it.
        echo "$line" | fmt | sed 's/^/  /'
        ;;
    esac
  done
)

exit 0
Because you can't simply pass fmt an input stream as structurally complex as a word definition without completely ruining the structure of the definition, the while loop attempts to make the output as attractive and readable as possible. Another solution would be a version of fmt that wraps long lines but never merges lines, treating each line of input distinctly, as shown in script #33, toolong.
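The per-line wrapping idea mentioned above can be sketched in a few lines. This is not the toolong script itself, only a minimal illustration: the function name wrap_each_line and its optional width argument are hypothetical, and it assumes the local fmt accepts -w to set the maximum line width (both GNU and BSD fmt do).

```shell
#!/bin/sh
# wrap_each_line - hypothetical helper: wraps every input line separately,
# so that adjacent short lines are never merged into one paragraph.
# Usage: some_command | wrap_each_line [width]

wrap_each_line()
{
  width="${1:-72}"          # assumed default width; pick whatever suits you

  # IFS= and -r keep leading spaces and backslashes in each line intact.
  while IFS= read -r line
  do
    # fmt sees only one line at a time, so it has nothing to merge.
    printf '%s\n' "$line" | fmt -w "$width"
  done
}

# Example: the two short lines stay separate, and the long one is wrapped.
printf '%s\n' "1. short entry" "2. short entry" \
  "3. a much longer entry that will certainly not fit within thirty columns" |
  wrap_each_line 30
```

The trade-off is one fmt process per input line, which is slow for large inputs but perfectly adequate for a handful of dictionary definitions.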
Worthy of note is the sed command that strips out all the HTML tags from the web page source code:
sed 's/<[^>]*>//g'
This command removes every match of a pattern that consists of an open angle bracket (<), followed by any combination of characters other than a close angle bracket (>), followed by the close angle bracket itself. It's a good example of how learning more about regular expressions can pay off handsomely when working with shell scripts.
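To see the substitution at work without fetching anything from the network, it can be run against a line of made-up HTML. The sample markup and the function name strip_tags below are illustrative assumptions, not the actual output of the WordNet server.

```shell
#!/bin/sh
# strip_tags - hypothetical wrapper around the sed expression discussed above.
# Reads HTML on stdin and writes the same text with the tags removed.

strip_tags()
{
  sed 's/<[^>]*>//g'
}

# Invented sample line, loosely shaped like a definition entry.
printf '%s\n' '<li><b>1.</b> delineate, <a href="x">limn</a>, outline</li>' |
  strip_tags
# prints: 1. delineate, limn, outline
```

Because sed processes its input one line at a time, a tag that is split across two lines is not matched by this expression, and a literal > inside an attribute value would end the match early. For the simple pages this script targets, that is usually an acceptable limitation.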
$ define limn

The verb limn:
  1. delineate, limn, outline -- (trace the shape of)
  2. portray, depict, limn -- (make a portrait of; "Goya wanted to portray his mistress, the Duchess of Alba")

$ define visionary

The noun visionary:
  1. visionary, illusionist, seer -- (a person with unusual powers of foresight)

The adjective visionary:
  1. airy, impractical, visionary -- (not practical or realizable; speculative; "airy theories about socioeconomic improvement"; "visionary schemes for getting rich")
WordNet is just one of the many places online where you can look up words in an automated fashion. If you're more of a logophile, you might appreciate tweaking this script to work with the online Oxford English Dictionary, or even the venerable Webster's. A good starting point for learning about online dictionaries (and encyclopedias, for that matter) is the wonderful Open Directory Project. Try http://dmoz.org/Reference/Dictionaries/ to get started.