[Coco] Re: Rainbow on Disc - OCR

John R. Hogerhuis jhoger at pobox.com
Fri Jun 10 12:34:19 EDT 2005


On Fri, 2005-06-10 at 09:25 -0600, Michael Wayne Harwood wrote:
> John,
> 
> You make some excellent points!  Would you be willing to lead the charge
> in investigating and organizing what would be required to move forward
> with this?  I think that before we start scanning magazines en masse we
> should look into the minimal requirements needed for a successful OCR project.
> 

Yes I will, if we have a non-squishy OK from Lonnie. From his point of
view, the concept is this:

With each PDF on the disk, there will be a similarly named ASCII text
file. This text file will hold the raw ASCII text that OCR software
produced from the Rainbow scans, corrected by proofreading. The purpose
is to be able to do a text search through Rainbow to find articles and
even advertisements (you'd be surprised how often this comes up). Each
program listing may be further split out of this file into its own
program text file, identified by the volume/issue/listing name & number.
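
To make the search idea concrete, here is a rough Python sketch of what
a search over the paired files might look like. The layout is an
assumption, not something decided yet: each PDF is taken to have a text
file with the same base name and a .txt extension next to it, and the
function name search_issues is made up for illustration.

```python
from pathlib import Path


def search_issues(root, term):
    """Case-insensitive search of the text files that sit next to PDFs.

    Returns a list of (pdf_name, line_number, line_text) tuples.
    Assumes (hypothetically) that foo.pdf is paired with foo.txt.
    """
    needle = term.lower()
    hits = []
    for txt in sorted(Path(root).rglob("*.txt")):
        pdf = txt.with_suffix(".pdf")
        if not pdf.exists():
            # Skip text files that have no matching PDF beside them.
            continue
        # The proofread text is meant to be plain ASCII; replace any
        # stray bytes rather than failing on them.
        with open(txt, encoding="ascii", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if needle in line.lower():
                    hits.append((pdf.name, lineno, line.strip()))
    return hits
```

Calling search_issues("/path/to/disc", "disk drive") would list every
matching line together with the PDF it came from, so a reader can go
straight to the right scan. Separate listing files, if they exist, would
need their own naming rule before a search like this could cover them.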

-- John.



