Tuesday, April 17, 2007

Thoughts in djvu format

As a reader of djvu ebooks, I got curious about how to make my own djvu documents. Making PDFs is trivial nowadays, with many free PDF printers available (PDFCreator, PrimoPDF, CutePDF). Starting from a printable document (created with OpenOffice or M$ Office), here's how I converted my .odt files to .djvu files with readable, searchable text (that is, text not treated as graphics).

Materials & Equipment

  1. djvudigital that comes with the djvulibre package. It converts pdf/ps to djvu.

  2. Ghostscript (usually comes with Linux)

  3. Linux: it is easier to compile and install djvulibre here. Cygwin might also work, but a real Linux is usually better.

  4. If printing in Windows: a pdf printer that uses GNU Ghostscript, such as CutePDF
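
Before starting, it may help to confirm that the tools are actually on the PATH. The following is a minimal sketch, not part of the original procedure: missing_tools is a helper name I made up, and I am assuming the Ghostscript executable is called gs, as it is on most Linux systems. It only checks that the programs exist. Depending on how your packages were built, djvudigital may also need a Ghostscript that includes the GSDjVu driver, which this check does not test for.

```shell
# Print the name of every tool in the argument list that is not on PATH.
# Prints nothing if all of them are found.
# (missing_tools is a hypothetical helper, not part of djvulibre.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check for djvudigital (from djvulibre) and gs (Ghostscript, assumed name).
missing_tools djvudigital gs
```

If the last line prints nothing, both programs were found.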



Methodology
  1. Convert the document to PDF. Unfortunately, direct pdf export from OpenOffice may not work optimally, because it encodes PDFs differently, which can result in text being treated as graphics in the resulting djvu. Since djvudigital uses Ghostscript, it is better to "print to pdf" using the system's printer devices. If you do export from OpenOffice, check the "Tagged PDF" option that appears after clicking the "Export..." button; this option makes the PDF file more readable to djvudigital. In Windows, I use CutePDF, since it uses the GNU Ghostscript converter.

  2. At the bash prompt, type:

    djvudigital --exact-color --words --lines -v input.pdf output.djvu

    Here input.pdf is converted to output.djvu. The --words option extracts the text and records the location of each word, so the text in the pdf remains searchable text in the djvu. The other options are explained by typing:

    djvudigital --help
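
For more than one document, the command in step 2 can be wrapped in a small loop. This is only a sketch under a few assumptions: the function names djvu_name, convert_all and has_text are my own, the input files are assumed to end in .pdf, and the check for searchable text relies on djvutxt, another tool from the djvulibre package that prints the hidden text of a djvu file.

```shell
# Derive the output name from the input name: thesis.pdf -> thesis.djvu
djvu_name() {
    printf '%s\n' "${1%.pdf}.djvu"
}

# Convert every PDF given as an argument, using the same options as step 2.
# Stops at the first file that fails to convert.
convert_all() {
    for pdf in "$@"; do
        djvudigital --exact-color --words --lines -v "$pdf" "$(djvu_name "$pdf")" || return 1
    done
}

# Succeed only if djvutxt prints something other than whitespace,
# i.e. the djvu file carries a text layer and not just page images.
has_text() {
    [ -n "$(djvutxt "$1" 2>/dev/null | tr -d '[:space:]')" ]
}
```

With these functions loaded in the shell, something like `convert_all *.pdf` converts everything in the current directory, and `has_text output.djvu && echo "text layer found"` gives a quick check that the text was not turned into graphics.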


I already had a djvu version of my undergraduate thesis (which is not supposed to be online yet). Here is my Lumban embroidery paper in djvu format.
