6.4. Parsing Simple MIME Emails
Parsing email is not a terribly tricky thing, but we should think back to our founding virtues. Laziness tells us that we'll probably just want to use whatever's out there already, assuming it can do the job well enough. There's no sense reinventing a wheel that's been around for 20 years.
In PHP, we have a ready-made wheel in the form of PEAR's Mail_mimeDecode package. After having a poke about, you'll see that it nominally does what we want: it takes a raw email (headers and body) and parses it into chunks. A couple of tests will show that it can manage all the simple examples we throw at it. While PEAR might not be our idea of a good time, for whatever reason, we can't deny that it'll save us some time.
When we pass in a simple email with text and HTML bodies, we get the following structure:
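require_once 'Mail/mimeDecode.php';

// $buffer holds the raw email: headers and body
$decoder = new Mail_mimeDecode($buffer);
$mail = $decoder->decode(array(
    'include_bodies' => 1, // keep each chunk's body in the result
    'decode_bodies'  => 1, // undo base64 and quoted-printable encoding
    'decode_headers' => 1, // decode encoded words in headers
));
print_r($mail);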
stdClass Object
(
    [headers] => Array ...
    [ctype_primary] => multipart
    [ctype_secondary] => mixed
    [ctype_parameters] => Array
        (
            [boundary] => ----=_NextPart_000_7e3_7c0e_65a1
        )
    [parts] => Array
        (
            [0] => stdClass Object ...
            [1] => stdClass Object ...
        )
)
The object we're given back represents the top-level MIME chunk. It contains an array of headers, the content type information, and an array of subchunks. In the example, the top-level chunk is of the type multipart/mixed and contains two subchunks in the parts array. Each of these subchunks takes the same format as the parent chunk, with an array of headers, media type, and parts.
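Because every subchunk has the same shape as its parent, a short recursive function is enough to walk the whole tree. The following is a minimal sketch rather than anything the library provides: describe_chunk() is a hypothetical helper name, and the $mail object is built by hand to mimic the structure shown above so that the example runs without PEAR installed.

```php
<?php
// Hypothetical helper: return one "primary/secondary" line per chunk,
// indented by two spaces for each level of nesting.
function describe_chunk($chunk, $depth = 0)
{
    $lines = array(
        str_repeat('  ', $depth) . $chunk->ctype_primary . '/' . $chunk->ctype_secondary,
    );

    // Only multipart chunks carry a parts array; leaf chunks stop the recursion.
    if (!empty($chunk->parts)) {
        foreach ($chunk->parts as $part) {
            $lines = array_merge($lines, describe_chunk($part, $depth + 1));
        }
    }

    return $lines;
}

// Hand-built stand-in for decoder output (an assumption, not real parsed mail).
$mail = (object) array(
    'ctype_primary'   => 'multipart',
    'ctype_secondary' => 'mixed',
    'parts'           => array(
        (object) array('ctype_primary' => 'text', 'ctype_secondary' => 'plain'),
        (object) array('ctype_primary' => 'text', 'ctype_secondary' => 'html'),
    ),
);

$lines = describe_chunk($mail);
echo implode("\n", $lines), "\n";
```

For the sample object, this lists multipart/mixed at the top level with text/plain and text/html indented beneath it.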
For non-multipart chunks containing actual content, the structure looks a little different:
stdClass Object
(
    [headers] => Array ...
    [ctype_primary] => text
    [ctype_secondary] => plain
    [ctype_parameters] => Array
        (
            [charset] => utf-8
        )
    [body] => hello world
)
To extract the body text, we just need to copy the $chunk->body member. The content type and disposition headers give us clues as to how to treat each chunk: as body text or as an attached file.
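One way to act on those clues is to walk the chunks and sort each leaf into either body text or attachments. This is a rough sketch under a few assumptions: collect_chunks() is a hypothetical name, the disposition property is assumed to hold the parsed Content-Disposition value (check it against what your decoder actually returns), and the sample message is built by hand so the example runs without PEAR.

```php
<?php
// Hypothetical helper: sort leaf chunks into body text (keyed by media type)
// and attachments, descending through any multipart containers.
function collect_chunks($chunk, array &$bodies, array &$attachments)
{
    // Multipart chunks hold no content of their own; recurse into subchunks.
    if (!empty($chunk->parts)) {
        foreach ($chunk->parts as $part) {
            collect_chunks($part, $bodies, $attachments);
        }
        return;
    }

    $primary     = isset($chunk->ctype_primary) ? strtolower($chunk->ctype_primary) : 'text';
    $secondary   = isset($chunk->ctype_secondary) ? strtolower($chunk->ctype_secondary) : 'plain';
    $type        = $primary . '/' . $secondary;
    // Assumption: the decoder exposes the disposition as a plain string.
    $disposition = isset($chunk->disposition) ? strtolower($chunk->disposition) : '';
    $body        = isset($chunk->body) ? $chunk->body : '';

    if ($primary === 'text' && $disposition !== 'attachment') {
        $bodies[$type][] = $body;
    } else {
        $attachments[] = array('type' => $type, 'body' => $body);
    }
}

// Hand-built stand-in for decoder output (an assumption, not real parsed mail).
$mail = (object) array(
    'ctype_primary'   => 'multipart',
    'ctype_secondary' => 'mixed',
    'parts'           => array(
        (object) array('ctype_primary' => 'text', 'ctype_secondary' => 'plain',
                       'body' => 'hello world'),
        (object) array('ctype_primary' => 'text', 'ctype_secondary' => 'html',
                       'body' => '<p>hello world</p>'),
        (object) array('ctype_primary' => 'image', 'ctype_secondary' => 'png',
                       'disposition' => 'attachment', 'body' => 'fake image bytes'),
    ),
);

$bodies = array();
$attachments = array();
collect_chunks($mail, $bodies, $attachments);

echo $bodies['text/plain'][0], "\n";
echo count($attachments), "\n";
```

Treating any text chunk that isn't explicitly marked as an attachment as body text is a deliberately simple rule; real mail will throw up cases that need more care.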
If you're using Perl, then the MIME-Tools package provides the same services, taking either a file handle or string and parsing it into chunks. The MIME::Parser module in the package is a good place to start:
use MIME::Parser;

my $parser = MIME::Parser->new;
$parser->decode_headers(1);    # decode encoded words in headers
my $mail = $parser->parse(\*STDIN) or die "parse failed\n";