Sunday, May 18, 2008

Extracting $$!#**^% .mht files from email

Every once in a while someone sends me an attachment in Microsoft's MHT format, which is apparently how Internet Explorer archives a web page together with its images. I've read that Thunderbird can open these files, but that apparently isn't true under Linux; at least, I haven't been able to open one.

On the one hand, you can save the file to disk, use a utility such as munpack to pull out the pieces (an .mht file is just a MIME message), find the HTML file, and edit the heck out of it so it displays nicely in your browser.
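The unpacking is the easy part; the editing is what gets tedious. Here's a rough sketch of doing both with Python's standard `email` module. It assumes the .mht file is an ordinary multipart/related MIME message whose parts carry `Content-Location` headers, and that the HTML refers to the images by exactly those URLs. The script name `mht_extract.py` and the function `extract_mht` are made up for illustration. It does plain string substitution, so a page that uses relative links won't get fixed up.

```python
#!/usr/bin/env python3
"""Unpack an .mht (MHTML) archive and point the HTML at the local copies.

A rough sketch (hypothetical script): assumes the archive is a
multipart/related MIME message whose parts carry Content-Location headers,
and that the HTML refers to the images by exactly those URLs.
"""
import email
import email.policy
import mimetypes
import os
import sys
from urllib.parse import urlsplit


def extract_mht(mht_path, out_dir):
    """Write every MIME part of mht_path into out_dir.

    Returns the path of the first text/html part written, or None.
    """
    with open(mht_path, "rb") as f:
        msg = email.message_from_binary_file(f, policy=email.policy.default)
    os.makedirs(out_dir, exist_ok=True)

    # First pass: pick a local file name for every leaf part and remember
    # which original Content-Location URL it stands in for.
    parts = [p for p in msg.walk() if not p.is_multipart()]
    names = []
    locations = {}  # Content-Location URL -> local file name
    for i, part in enumerate(parts):
        location = str(part.get("Content-Location", ""))
        base = os.path.basename(urlsplit(location).path)
        if not base:
            ext = mimetypes.guess_extension(part.get_content_type()) or ".bin"
            base = "part" + ext
        name = f"{i:02d}-{base}"  # index prefix keeps names unique
        names.append(name)
        if location:
            locations[location] = name

    # Second pass: decode each part (base64, quoted-printable, ...) and
    # write it out, rewriting the archived URLs inside HTML parts.
    html_path = None
    for part, name in zip(parts, names):
        data = part.get_payload(decode=True) or b""
        if part.get_content_type() == "text/html":
            charset = part.get_content_charset() or "utf-8"
            text = data.decode(charset, errors="replace")
            # Longest URLs first, so a URL that is a prefix of another
            # one doesn't clobber it.
            for loc in sorted(locations, key=len, reverse=True):
                text = text.replace(loc, locations[loc])
            data = text.encode(charset, errors="replace")
            if html_path is None:
                html_path = os.path.join(out_dir, name)
        with open(os.path.join(out_dir, name), "wb") as f:
            f.write(data)
    return html_path


def main(argv):
    if len(argv) != 2:
        print("usage: mht_extract.py FILE.mht OUTPUT_DIR", file=sys.stderr)
        return 2
    page = extract_mht(argv[0], argv[1])
    print(page if page else "no text/html part found")
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

You'd run it as `python3 mht_extract.py page.mht page_files` and then point your browser at the HTML file it prints. The index prefix on each file name is just a cheap way to avoid collisions when two images in the archive have the same name.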

On the other hand, you can email your correspondent and say “Your tanj file won't open. Send it to me in an open format, frak it!” This is counterproductive if you and your correspondent are, say, on a Pastoral Nominating Committee (though if you want to get off of the committee this may be the way to go).

On the gripping hand, you can look for a utility that will do all the work for you. These turn out to be surprisingly few and far between. The only one I've been able to find is kmhtConvert, a (duh) KDE app that can extract the files from the archive into a new directory, convert the archive to KDE's WAR format (whatever happened to .tar.gz?), or display the file directly. I haven't tried the latter two, but the extraction works, and you can read the resulting HTML in a standard browser.

There really ought to be a command-line utility for this, or at least a Thunderbird plug-in for Linux, but I haven't found it yet. Anyone?