Thursday, January 31, 2008

doc2pdf

We all know that there are lots and lots of reasons to use the command line. But sometimes, believe it or not, there's a GUI program that does something that's not available as a command line utility.

Yet you want a command line utility, because you've got to do this thing 20-30 times, and sitting there clicking with the mouse is just boring.

So what'dya do? You hack the GUI program to do what you want it to do from the command line.

Let's back up a bit: Baen Books, the SF publishing house, has a long history of giving away its books. There's the online Baen Free Library, and every once in a while they'll put a CD full of old titles in the back of a book. You can copy the disk, give copies to friends, use them as Frisbees at a warez convention, whatever, just so long as you don't sell them. The disks have several recent books, and, though I'll never claim that Baen's writers are in the class of Lois McMaster Bujold (well, she does have one book in the library) or Terry Pratchett, there's a satisfying bunch of science fiction and fantasy mind-candy that can be downloaded onto a computer and read on the plane or in one of those cheap hotel rooms that physicists crash in when we're at a meeting.

Baen puts the books out in all sorts of formats: HTML, DOC, RTF, Palm, Kindle, etc.

Everything except PDF.

Why, I don't know. PDF is compact, can be read on just about every kind of computer, and leaves the text laid out exactly the way you want it. It just makes sense to distribute electronic books that include images in PDF format.* Yet Baen doesn't.

So last week I got hold of the disk in the back of Eric Flint and David Weber's% 1634: The Baltic War (the online version is only the first few chapters, alas). The disk has maybe 30 novels, some of which I actually want to read, some of which I'll read in a pinch, and all of them in the aforementioned everything-except-PDF formats.

The question before the house is how to convert all this mind-numbing SF into PDF files. The obvious way is to take the Word (.doc) or RTF version of the file, load it into OpenOffice.org, and hit the Export PDF button. A GUI solution, no doubt.

Given that it takes a few minutes to convert each novel, that's a really boring task. So let's look around for a command line solution. A look at the web shows that there aren't too many — make that there aren't any, in the sense that there is no open source, standalone, Linux command line program which takes a Word file and converts it into PDF.

There is, however, a way to do it. OpenOffice.org includes a powerful macro/scripting facility, and ways to call the macros from the command line. All you need is the macro.

Hey, don't look at me. I borrow, mostly, and point you to where I've done the borrowing. In this case, it's Convert MS/Word to PDF, posted by Graham Williams. He gives complete instructions for setting up the Macros and writing a short shell program to call them.

The only caveat is to heed Williams when he tells you that doc2pdf calls OpenOffice.org and then quits, which I take to mean that the shell command returns while the conversion is still grinding away. It takes 5-10 minutes to convert one of Baen's novels to PDF, so if you want to do a batch job you need to put a long sleep time between calls of the shell command. Best to start the thing off at night and check it in the morning, which is just what you want a batch job to do, anyway.
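For what it's worth, here is a rough sketch of that overnight batch job in Python. It is my own illustration, not Williams's script. It assumes his doc2pdf shell command is on your PATH and takes the document's file name as its only argument, and that the PDF lands next to the source file with a .pdf extension. The function names, the 600-second pause, and the skip-if-a-PDF-already-exists check are all made up for the example, so adjust to taste.

```python
import subprocess
import time
from pathlib import Path


def pending_documents(folder, suffix=".doc"):
    """List files in `folder` ending in `suffix` that have no PDF beside them yet.

    Assumption: the converter writes book.pdf next to book.doc.
    """
    folder = Path(folder)
    return sorted(
        p for p in folder.iterdir()
        if p.suffix.lower() == suffix and not p.with_suffix(".pdf").exists()
    )


def convert_batch(folder, command="doc2pdf", pause_seconds=600,
                  run=subprocess.run, sleep=time.sleep):
    """Call `command` once per pending document, sleeping between calls.

    `command` is assumed to be the doc2pdf shell script, invoked as
    `doc2pdf FILE`. The long pause is there so one conversion can finish
    before the next one is started. `run` and `sleep` are parameters only
    so the loop can be tried out without launching OpenOffice.org.
    """
    documents = pending_documents(folder)
    for index, doc in enumerate(documents):
        run([command, str(doc)], check=False)
        # No need to wait after the last book.
        if index < len(documents) - 1:
            sleep(pause_seconds)
    return documents


# Typical use (hypothetical path), started before going to bed:
#   convert_batch("/media/cdrom/books")
```

Because books that already have a PDF are skipped, you can rerun the job the next morning and it will pick up only whatever didn't get converted overnight.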

So add one more command line tool into your kit, and good reading.

* It makes sense to distribute text without pictures as an ASCII file, but Baen doesn't do that, either. Well, I wasn't expecting the Moon, you know.

% I'm not a big fan of Weber's style of writing. Here's why. For some reason I keep going back to check the books out of the library, though. If I found myself paying for them I'd be really upset with me.

The RTF and DOC files are bitwise identical. This says something about someone. I'm not sure what, or who, though.
