html2latex is a Perl script designed to convert a properly formatted HTML file into a properly formatted LaTeX file.


Version 1.0 is out. It is mainly an installation fix for 0.9, where 'make test' failed, which could be a major headache for some people. It also adds the 'kill' tag type, which lets you do things such as remove any JavaScript. Version 0.9 was a minor release that added support for international characters and quote expansion, plus a few bug fixes. You can download the latest tar.gz here. If you already have 0.9 installed and aren't bothered by JavaScript, you don't need to bother with 1.0; it is otherwise the same.


  1. It can handle URLs on the command line and in the IMG tag.
  2. Converts pictures from JPEG or GIF to PNG, a format pdflatex can include.
  3. Renders nested tables correctly.
  4. Supports most international characters (umlauts, accents, etc.).
  5. Converts all headers into sections. This can be easily customized.
  6. Handles lists of any form.
  7. Endless configuration through command-line options or an XML config file.
  8. It is also very easy to extend by writing your own handlers.


If you try out the software, please take the survey on the feedback site, post comments in the forum, or email me. I'd welcome your suggestions.

All the modules listed below, and all of their dependencies, can be found here.

html2latex requires the following modules for basic operation:

  1. HTML::Tree - It requires HTML::Parser.
  2. XML::Simple - It requires XML::Parser.

html2latex can use the following modules for advanced operation:

  1. LWP::Simple - Used to download URLs. Has many dependencies; look for Bundle::LWP or libwww.
  2. URI - Comes with libwww or Bundle::LWP. Also required to grab URLs.
  3. Image::Magick - If you want to convert images to PNGs.

The easiest way to get these modules is to use the CPAN module. Try man CPAN.
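Before running html2latex, you may want to confirm which of these modules Perl can actually load. The script below is a hypothetical helper, not part of the html2latex distribution: it tries to load each module listed above and, if any required module is missing, prints a cpan command to install it. It assumes perl and the cpan client that ships with the CPAN module are on your PATH.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with html2latex): report which of the
# Perl modules named in this README can be loaded on this machine.

required="HTML::Tree XML::Simple"
optional="LWP::Simple URI Image::Magick"

# Print "ok" if Perl can load the module named in $1, "missing" otherwise.
# A missing perl binary also yields "missing".
module_status() {
    if perl -M"$1" -e 1 >/dev/null 2>&1; then
        echo ok
    else
        echo missing
    fi
}

missing=""
for m in $required; do
    status=$(module_status "$m")
    echo "required  $m: $status"
    if [ "$status" = missing ]; then
        missing="$missing $m"
    fi
done

for m in $optional; do
    echo "optional  $m: $(module_status "$m")"
done

# Suggest a single cpan command covering every missing required module.
if [ -n "$missing" ]; then
    echo "Missing required modules; try: cpan$missing"
fi
```

Saved as, say, check-deps.sh and run with sh check-deps.sh, it prints one line per module. Missing optional modules only disable the advanced features (URL fetching, image conversion), so the script suggests an install command only for the required ones.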