[Coco] RE: Rainbow magazines
John R. Hogerhuis
jhoger at pobox.com
Wed Jun 8 13:07:29 EDT 2005
On Wed, 2005-06-08 at 06:03 -0500, Boisy G. Pitre wrote:
> I have some questions/thoughts:
> 1. What software tools would be used for scanning, and at what DPI?
Depends on what we want to do with it. For just raw information I think
150-300 dpi is good enough. If you want to OCR it, probably 600 dpi.
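To put rough numbers on that trade-off, here is a small sketch of the arithmetic. The 8.5 x 11 inch page size and 24-bit color are assumptions for illustration only; I haven't measured an actual issue, and real files would be smaller once compressed.

```python
def scan_size(width_in, height_in, dpi, bytes_per_pixel=3):
    """Return (width_px, height_px, uncompressed_bytes) for one scanned page.

    bytes_per_pixel=3 assumes 24-bit color; use 1 for 8-bit grayscale.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px, height_px, width_px * height_px * bytes_per_pixel


# Assumed page size: 8.5 x 11 inches (not measured from a real issue).
for dpi in (150, 300, 600):
    w, h, size = scan_size(8.5, 11, dpi)
    print(f"{dpi} dpi: {w} x {h} px, about {size / 1e6:.1f} MB uncompressed")
```

Doubling the DPI quadruples the pixel count, so going from 300 to 600 dpi for OCR costs about four times the raw storage per page.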
As far as tools go, it doesn't really matter as long as we fix the DPI. On
Linux I've used XSane, GIMP, ImageMagick, and other tools for this sort
of thing. There is no good OCR tool for Linux. I've used Transym OCR
under Windows ($40) with good results.
ISTR that Rainbow put halftone colors behind the listings, which might
cause problems for OCR.
> 2. Would some type of OCR be used in coordination with a word
> indexing scheme? This would be most valuable for searching through
> magazines for a particular keyword
Hopefully someone else knows more about this... I started and worked on
the "Thinking Forth" republication, and for it we actually retypeset the
book. I did the complete initial scan, but it was fewer than 300 pages, I
think. The scans were uploaded to SourceForge. Then we ran the whole thing
through OCR. Chapters were assigned to individuals to retypeset in
LaTeX. Some people cleaned up scanned images.
I'm not sure if there's a middle ground between retypesetting and just raw
scans compiled into a PDF. Anyone know? That would be ideal... I don't
think there's going to be much interest in retypesetting the Rainbow.
As you say, we could probably just do the OCR page-by-page and build a
master index out of it.
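As a rough sketch of what such a master index might look like, the snippet below builds a word-to-location lookup from per-page OCR text. The `build_index` name, the `(issue, page)` keys, and the sample text are all made up for illustration; nothing here is tied to a particular OCR tool.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to a sorted list of (issue, page) locations.

    `pages` is a dict {(issue, page_number): ocr_text}. This layout is an
    assumption made for the sketch, not something agreed on in the thread.
    """
    index = defaultdict(set)
    for location, text in pages.items():
        # Split on anything that is not a letter or digit.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(location)
    return {word: sorted(locations) for word, locations in index.items()}


# Made-up sample text standing in for real OCR output.
sample_pages = {
    ("1984-01", 12): "Disk BASIC listing for the CoCo",
    ("1984-01", 13): "Cassette loader listing",
    ("1984-02", 5): "OS-9 disk utilities",
}
master_index = build_index(sample_pages)
print(master_index["disk"])
```

Looking up a keyword then gives the issues and pages to pull up in the raw scans, so the OCR text only has to be good enough for searching, not for reading.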
> 3. If more than one person does the scanning, there needs to be a way
> to insure that the scanning quality is consistent throughout.
Some volunteers could check work. As long as the format is kept
consistent and the DPI is consistent I think this is not too critical.
The more scanners the better. I'd guess whole issues would be assigned
to each person.
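The checking itself could be partly mechanical. Here is a minimal sketch that flags scans whose reported DPI or file format differs from an agreed target. The record layout, the `find_inconsistent` name, and the 300 dpi / TIFF targets are placeholders I've assumed for the example; the actual values would be whatever the group settles on.

```python
def find_inconsistent(scans, target_dpi=300, target_format="tiff"):
    """Return the file names of scans that do not match the agreed settings.

    Each scan is a dict with "file", "dpi" and "format" keys (a hypothetical
    record layout; the values would come from whatever tool reads the files).
    """
    problems = []
    for scan in scans:
        if scan["dpi"] != target_dpi or scan["format"].lower() != target_format:
            problems.append(scan["file"])
    return problems


# Made-up submissions from three volunteers.
submitted = [
    {"file": "1984-01_p012.tif", "dpi": 300, "format": "TIFF"},
    {"file": "1984-01_p013.tif", "dpi": 150, "format": "TIFF"},
    {"file": "1984-02_p005.jpg", "dpi": 300, "format": "JPEG"},
]
print(find_inconsistent(submitted))
```

A volunteer could run something like this over a whole issue before uploading, which leaves the human checkers free to look at things a script can't judge, such as skew or cut-off margins.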