[Coco] Re: Rainbow on Disc - OCR

Michael Wayne Harwood michael at musicheadproductions.org
Fri Jun 10 11:25:48 EDT 2005


John,

You make some excellent points! Would you be willing to lead the charge
in investigating and organizing what would be required to move forward
with this? I think that before we start scanning magazines en masse we
should look into the minimal requirements needed for a successful OCR project.


Regards,
Michael Harwood



> Actually, I don't think the OCR is a risk.
>
> Think of it this way: we have a bunch of scanned pages, I filter them to
> black and white, and run OCR on them. This is a batch operation, so it
> doesn't take anyone much time.
>
> Proofreading work can be done by people who don't even have a scanner,
> so we have the possibility of bringing in many more volunteers. That
> means the OCR effort is even more scalable than the scanning work. So
> we'd probably be done with OCR at about the same time as scanning work
> in general is done, and no work would be delayed by it.
>
> I really think it should be brought into scope, considering the clear
> utility of such a resource (grepable Rainbow, cool...) and the fact that
> it's not a hard thing to do (done it before on Thinking Forth).
>
> Just raw ASCII text. No doing a repub or anything seriously hard like
> that.
>
> There are probably a few of us who could take on this aspect of the
> project if you want to split the production work between OCR and
> scanning.
>
> -- John.