[Coco] Rainbow archives in DjVu
deek at d2dc.net
Tue Mar 17 02:04:05 EDT 2009
For the past week or so I've been putting together a system for converting the
Rainbow scans on excalibur1 to DjVu format; specifically, one that doesn't use
the 'pdf2djvu' program (which basically sucks for anything more than the
simplest "I want this PDF to be smaller" needs). The 'cpaldjvu' program might
have worked, but it doesn't seem to be all that controllable and I would
rather finish the project within the expected lifetime of the universe (it is
just a tad slow :) ). Luckily, I've come up with something using bash and the
GIMP that seems to do the job pretty well.
My processing script currently reduces the files to about 10% of their
original size. For example, the Jan 1992 issue has slimmed down from
103,045,775 bytes to 9,180,533 bytes (roughly 8.9%), and a 300dpi scanned page
usually compresses to somewhat less than 200K. Plus, unlike pdf2djvu, it
separates the page content (mainly the text) from the page background, which
is what allows high compression without reducing the resolution of the text
itself.
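To illustrate why separating the layers matters, here is a toy Python sketch of a DjVu-style split. It is not the bash/GIMP pipeline described above; the fixed threshold, the mean-tone fill, and the 3x background subsampling are assumptions chosen for illustration. Dark pixels go into a full-resolution bilevel mask, where the text stays sharp, and the remaining background is smoothed and stored at lower resolution, which is where most of the savings come from.

```python
# Toy foreground/background split of a grayscale scan, in the spirit of
# DjVu's layered model. ASSUMPTIONS: a fixed ink threshold, filling ink
# pixels with the mean paper tone, and block-averaged subsampling.
# This is an illustration, not the method used in the post.


def split_layers(page, threshold=128, subsample=3):
    """page: list of rows of 0-255 gray values (0 = black ink)."""
    height, width = len(page), len(page[0])

    # 1. Mask layer: 1 where a pixel is dark enough to count as ink.
    mask = [[1 if px < threshold else 0 for px in row] for row in page]

    # 2. Replace ink pixels with the mean paper tone so they leave no
    #    sharp edges in the background (sharp edges compress poorly).
    paper = [px for row, mrow in zip(page, mask)
             for px, m in zip(row, mrow) if not m]
    fill = sum(paper) // len(paper) if paper else 255
    filled = [[fill if m else px for px, m in zip(row, mrow)]
              for row, mrow in zip(page, mask)]

    # 3. Background layer: average each subsample x subsample block,
    #    so it is stored at a fraction of the scan resolution.
    background = []
    for y in range(0, height, subsample):
        out_row = []
        for x in range(0, width, subsample):
            block = [filled[yy][xx]
                     for yy in range(y, min(y + subsample, height))
                     for xx in range(x, min(x + subsample, width))]
            out_row.append(sum(block) // len(block))
        background.append(out_row)
    return mask, background


page = [[200, 0, 100, 100],
        [200, 200, 100, 100]]
mask, background = split_layers(page, subsample=2)
print(mask)        # [[0, 1, 1, 1], [0, 0, 1, 1]]
print(background)  # [[200, 200]]
```

Real DjVu encoders use far smarter segmentation, but the principle is the same: the text keeps full resolution in a cheap bilevel layer while the bulky color background is downsampled.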
A side benefit of being able to split each page into layers is that Google's
OCRopus software can do a really good job of making the magazines searchable
and indexable (and the OCR'ed text can be inserted right into the document
very easily).
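As a rough sketch of the insertion step, the hypothetical Python helpers below turn OCR word boxes into the s-expression format that djvulibre's djvused reads with its set-txt command. They assume word boxes arrive as (x0, y0, x1, y1, text) with a top-left origin, which is common in OCR output; DjVu hidden-text coordinates are measured from the bottom-left corner, so the y values are flipped. The helper names and input layout are illustrative and are not OCRopus's actual output format.

```python
# Hypothetical helpers (not part of OCRopus or djvulibre) that build the
# hidden-text s-expression read by "djvused ... set-txt".
# ASSUMED input: a list of text lines, each a list of word boxes
# (x0, y0, x1, y1, text) measured from the TOP-left corner of the page.


def djvu_quote(text):
    """Quote a string for djvused, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def page_text_sexpr(width, height, lines):
    """Return a (page ...) s-expression with nested line and word zones."""

    def flip(x0, y0, x1, y1):
        # DjVu text zones use a bottom-left origin, so mirror the y axis.
        return x0, height - y1, x1, height - y0

    out = [f"(page 0 0 {width} {height}"]
    for words in lines:
        if not words:
            continue
        # A line's bounding box is the union of its word boxes.
        box = (
            min(w[0] for w in words),
            min(w[1] for w in words),
            max(w[2] for w in words),
            max(w[3] for w in words),
        )
        out.append(" (line %d %d %d %d" % flip(*box))
        for x0, y0, x1, y1, text in words:
            out.append("  (word %d %d %d %d %s)"
                       % (*flip(x0, y0, x1, y1), djvu_quote(text)))
        out[-1] += ")"  # close the line zone
    out[-1] += ")"  # close the page zone
    return "\n".join(out)


print(page_text_sexpr(100, 50, [[(10, 5, 40, 15, "RAINBOW"),
                                 (45, 5, 70, 15, "1992")]]))
```

The printed text could be saved to a file (say, page1.txt) and attached to the first page with something like `djvused issue.djvu -e 'select 1; set-txt page1.txt' -s`; check the djvused manual for your djvulibre version before relying on that exact invocation.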
Anyway, is this something that I'm doing just for me, or is there wider
interest around here?
Note: I wouldn't consider this a complete substitute for having the full-size
scans out there somewhere; there are almost certainly better ways to do what
I'm doing, and future versions of djvulibre should allow the full-size scans to
be re-compressed with better methods, giving even better compression and/or
display quality. It's just that DjVu uses a LOT less memory and displays MUCH
faster.