Alongside my large selection of "visionary" projects, there are a small
number of interesting side ideas. These get picked up whenever I have time,
or when I lack the necessary reference material to hand for my larger projects, as happens on the train home.
Yesterday I converted a Word document listing cocktails into a simple HTML file.
The process was simple: Word into plain text, plain text into XML courtesy of
Perl, and then XML into HTML with an XSL stylesheet.
I first wrote a very simple Perl script that relied completely on the
cocktail list's data format, as I saw it. When I ran the script against the real data:

perl drinksfix.pl <cocktails.txt >cocktails.xml

there were a couple of problems.
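The Perl script itself isn't reproduced here. As a rough illustration of the same text-to-XML step, this is a minimal Python sketch. The input format it expects (one cocktail per blank-line-separated block, with the name on the first line) and the element names cocktails, cocktail, name and line are assumptions for illustration, not the real format drinksfix.pl handles.

```python
from xml.sax.saxutils import escape


def text_to_xml(text: str) -> str:
    """Convert a blank-line-separated cocktail list into XML.

    Hypothetical input format: each cocktail is a block of non-blank
    lines; the first line is the name, the rest are ingredient or
    method lines. Blocks are separated by one or more blank lines.
    """
    out = ['<?xml version="1.0" encoding="UTF-8"?>', "<cocktails>"]
    block: list[str] = []

    def flush() -> None:
        # Emit the current block (if any) as one <cocktail> element.
        if not block:
            return
        out.append("  <cocktail>")
        out.append(f"    <name>{escape(block[0])}</name>")
        for line in block[1:]:
            out.append(f"    <line>{escape(line)}</line>")
        out.append("  </cocktail>")
        block.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if line:
            block.append(line)
        else:
            flush()
    flush()  # the last block may not be followed by a blank line
    out.append("</cocktails>")
    return "\n".join(out) + "\n"
```

Used as a filter in the same way as the Perl version, the call would be sys.stdout.write(text_to_xml(sys.stdin.read())). Note that escape() already takes care of &, < and >.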
The usual problem of & having to become &amp; was easy to fix; the conversion of
ã (and all the variations present) was not. This took about half an hour of to-ing and fro-ing. Oh, for Internet connectivity on South West Trains!
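One plausible way to handle both problems in Python is sketched below. It assumes the Word text export was in Windows-1252 (cp1252), a common default on Windows; that is an assumption, not something I recorded. Every non-ASCII character is turned into a numeric character reference, so the XML parser never has to guess the encoding.

```python
from xml.sax.saxutils import escape


def to_xml_safe(raw: bytes, source_encoding: str = "cp1252") -> str:
    """Decode text exported from Word and make it safe as XML content.

    Assumption: the export uses Windows-1252; pass a different
    source_encoding if it does not.
    """
    text = raw.decode(source_encoding)
    # escape() replaces &, < and > with &amp;, &lt; and &gt;.
    escaped = escape(text)
    # Replace each non-ASCII character with a numeric character
    # reference, e.g. "ã" becomes "&#227;".
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")
```

For example, the cp1252 bytes for "São Paulo & Rum" come out as "S&#227;o Paulo &amp; Rum", and Word's curly quotes become &#8220; and &#8221;.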
Then I found a couple of oddities in the format itself, which broke the simplistic
Perl code. Since this was a one-off job, I fixed the cocktails.txt source rather than the script. The XML was
soon well-formed.
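A quick way to confirm well-formedness, and to find the spot that breaks it, is sketched below using Python's standard xml.etree.ElementTree module. The helper name check_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def check_well_formed(xml_bytes: bytes) -> Optional[str]:
    """Return None if the document parses, else a short error message."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple for where parsing failed.
        line, column = err.position
        return f"not well-formed at line {line}, column {column}: {err}"
    return None
```

Run against the generated file with check_well_formed(open("cocktails.xml", "rb").read()); an unescaped & is reported with the line and column where the parser gave up.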
It was then a simple case of writing an XSL stylesheet to convert it into HTML. Here you must remember that you need:
<xsl:text disable-output-escaping="yes">
to output HTML tags, like anchors, that are built up as text rather than written as literal elements in the stylesheet.
Under Windows, the conversion process is simply:

msxsl cocktails.xml basic_cocktails.xsl >cocktail_list.html
The layout is simple and functional, and I added an index to the end, just for
fun. The complete source code (plus data files) is released under the GPL and held here. At 800K, the generated cocktail list is held here if you want to see the very unpretty output, but it will be quicker to download the zip and decompress it locally.
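The index is generated by the XSL stylesheet. Purely to illustrate the idea, here is a Python sketch that groups cocktail names by first letter and links each one back to its entry. The anchor scheme (id="c0", id="c1", and so on, in document order) and the function name build_index are hypothetical.

```python
from collections import defaultdict
from html import escape


def build_index(names: list[str]) -> str:
    """Return an HTML fragment listing names alphabetically, grouped by
    first letter, each linking to an anchor in the page.

    Hypothetical assumption: the i-th cocktail in the body carries
    id="c{i}", in the original document order.
    """
    groups: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for i, name in enumerate(names):
        groups[name[:1].upper()].append((name, i))

    parts = ["<h2>Index</h2>"]
    for letter in sorted(groups):
        parts.append(f"<h3>{escape(letter)}</h3>")
        parts.append("<ul>")
        # Sort case-insensitively within each letter group.
        for name, i in sorted(groups[letter], key=lambda item: item[0].lower()):
            parts.append(f'<li><a href="#c{i}">{escape(name)}</a></li>')
        parts.append("</ul>")
    return "\n".join(parts)
```

Keeping the original position as the anchor number means the index can be built without changing the order of the main list.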