[Coco] Re: Rainbow on Disc - OCR

John R. Hogerhuis jhoger at pobox.com
Fri Jun 10 01:24:43 EDT 2005


On Thu, 2005-06-09 at 20:28 -0600, Michael Wayne Harwood wrote:

> I think that the decision to do full OCR should be made up front, and
> honestly I think that it's going to be such a huge amount of work in
> addition to what is already before us and should be out of scope for this
> project.
> 

Actually, I don't think the OCR is a risk.

Think of it this way: we have a bunch of scanned pages, I filter them to
black and white, and run OCR on them. This is a batch operation so it
doesn't take anyone much time.
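
A minimal sketch of that batch step, assuming each scanned page has already been loaded as a grid of 0-255 grayscale values. The names `to_black_and_white` and `batch_ocr` are made up for illustration, and `ocr` is a stand-in for whatever OCR engine ends up being used; none of this comes from the project itself.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Image = List[List[int]]  # rows of 0-255 grayscale pixel values (assumed layout)


def to_black_and_white(gray: Image, threshold: int = 128) -> Image:
    """Map every pixel to pure black (0) or pure white (255)."""
    return [[255 if px >= threshold else 0 for px in row] for row in gray]


def batch_ocr(
    pages: Iterable[Tuple[str, Image]],
    ocr: Callable[[Image], str],
    threshold: int = 128,
) -> Dict[str, str]:
    """Threshold each page, then hand it to the supplied OCR callable.

    `ocr` is a hypothetical hook: any function that takes a black-and-white
    image and returns the recognized text.
    """
    return {
        name: ocr(to_black_and_white(gray, threshold))
        for name, gray in pages
    }
```

The whole run is one loop over the pages, so nobody has to sit with it. The fixed threshold of 128 is only a starting guess and would likely need tuning per scan.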

Proofreading work can be done by people who don't even have a scanner,
so we have the possibility of bringing in many more volunteers. That
means we're even more scalable than the scanning work. So we'd probably
be done with OCR at about the same time as scanning work in general is
done, so no work would be delayed by it.

I really think it should be brought into scope, considering the clear
utility of such a resource (grepable Rainbow, cool...) and the fact that
it's not a hard thing to do (done it before on Thinking Forth).
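
To illustrate what "grepable" buys, here is a rough sketch of a search over the proofread text, assuming the OCR output is kept as one plain-text string per page. The function name `grep_pages` and the page names used with it are hypothetical.

```python
import re
from typing import Dict, Iterator, Tuple


def grep_pages(
    pages: Dict[str, str], pattern: str
) -> Iterator[Tuple[str, int, str]]:
    """Yield (page name, line number, line) for every matching line.

    `pages` maps an assumed page name to that page's OCR text.
    Matching is case-insensitive, like `grep -i`.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    for name in sorted(pages):
        for lineno, line in enumerate(pages[name].splitlines(), start=1):
            if rx.search(line):
                yield name, lineno, line.strip()
```

With plain text files on disk, ordinary `grep -ri` does the same job; the point is only that raw ASCII is enough to make the whole run searchable.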

Just raw ASCII text. No doing a repub or anything seriously hard like
that.

There are probably a few of us who could take on this aspect of the
project if you want to split the production work between OCR and
scanning.

-- John.




