[Coco] RainbowArchive . The Rainbow Archive Project

John R. Hogerhuis jhoger at pobox.com
Thu Jun 16 13:39:11 EDT 2005


Given that we can't add (a lot) more people, all you have to think about
is whether a software-only OCR would be sufficient. Personally, I think
it would add significant value. I volunteer to do such an OCR over all
the volumes... but no more than that unless (a lot) more volunteers can
be added. As to a study, we already have lots of examples of that. All
of these tools work reasonably well. Having used Transym OCR on Thinking
Forth, I believe I can let you see the raw OCR output from that project,
before it was cleaned up. Useful, but certainly not readable on its own.

My (informed) opinion, having done this before, is that an accurate OCR
without, say, one volunteer per issue to do cleanup is simply not possible.

So there is not really a need for any study of an accurate OCR... no
point in volunteering to lead a study I already know the answer to. The
problem is the constraint of not being able to add enough volunteers to
do it. Volunteers ready to proofread are available. If you want to ask a
question, ask how many people are willing to proofread a given issue. If
you would rather ask how many individuals are willing to completely proofread
War and Peace with random OCR errors, I don't think you'll get a lot of
takers.

Given that we can't add (a lot) more people, all you have to think about
is whether a software-only OCR would be sufficient. Personally, I think
it would add significant value. I volunteer to do such a first-pass OCR
over all the volumes... but no more than that unless (a lot) more
volunteers can be added.

-- John.




