[Coco] RainbowArchive . The Rainbow Archive Project

Thu Jun 16 10:02:20 EDT 2005

Here is a reply from Lonnie when I asked him for his thoughts in general
about the project, and if he would be able to provide any insight that
might be helpful...

  It depends on how you view the project. The main thing is to get it done
in a reasonable period of time. Every time you add an element to it, you
not only increase cost (whether it is volunteers' cost in time or actual
cost in money for having to buy something or pay someone) but you
increase the time it takes to have the finished product in hand.

  Let's just apply it to the OCR question. Certainly, having the whole
thing in machine-readable format would be great. But remember, when the
people who sell OmniPro say they have a 99 percent accuracy rate, that
means that one letter in every 100 is wrong. You not only have to fix
it, you have to find it first. May the Good Lord help the poor soul who
would have to proof those pages and pages of listings. We NEVER
rekeyboarded the listings, we just printed them out from the tapes or
disks we required the author send us. One character wrong out of every
100 in a DATA statement? I am betting the proofing process would double
the time in getting this project out. Is that acceptable? Is it that
important to that many people?
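
To put rough numbers on the 99 percent figure, here is a small Python
sketch of the arithmetic. It assumes errors are independent and evenly
spread, which real OCR errors are not, and the 60-character line length
and 3000-character page size are made-up values for illustration, not
measurements from any actual issue.

```python
def expected_errors(num_chars: int, accuracy: float) -> float:
    """Expected number of wrong characters in a text of num_chars characters."""
    return num_chars * (1.0 - accuracy)


def prob_line_clean(line_length: int, accuracy: float) -> float:
    """Chance that a line of line_length characters has no errors at all,
    assuming each character is right or wrong independently."""
    return accuracy ** line_length


if __name__ == "__main__":
    accuracy = 0.99         # the vendor figure quoted above
    chars_per_line = 60     # hypothetical length of a DATA line
    chars_per_page = 3000   # hypothetical size of a listing page

    print("errors per 100 characters:", expected_errors(100, accuracy))
    print("errors per page:", expected_errors(chars_per_page, accuracy))
    print("chance a line is clean:", prob_line_clean(chars_per_line, accuracy))
```

Under these assumptions a bit under half of all listing lines would
contain at least one bad character, and the proofreader would still have
to read every line to find out which ones.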

This reply and the constraints surrounding licensing led me to make the
statement that we should have one person step up and lead the OCR charge
and come up with an actual baseline we can use to determine how much work
will need to be done, how feasible it is, how much time would be needed,
etc.  Once that has been established we would have the data to determine
whether it's worth it.  A lot of people have been discussing OCR, and some
have said that without it the product would not be very attractive, so I
thought that I would immediately have a person step up and say "I'll prove
this will be workable", but so far that hasn't happened on this list.
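
One possible way to get such a baseline, offered only as a sketch: OCR a
few sample pages, hand-correct them once, and measure the character error
rate between the raw OCR text and the corrected text. The function names
below are my own and are not tied to any particular OCR package; the
sample strings are invented.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance divided by the length of the hand-corrected reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)


if __name__ == "__main__":
    corrected = "10 DATA 12,34,56"   # invented hand-proofed line
    raw_ocr = "10 DATA 12,34,S6"     # invented OCR output with one bad character
    print("character error rate:", char_error_rate(corrected, raw_ocr))
```

Multiplying the measured rate by the character count of an issue, and
timing how long the hand correction of the samples took, would give a
first estimate of the proofing effort.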

If OCR is important enough someone should step up and volunteer to execute
a feasibility study and document the process and time it will take for all
aspects of the project.  There would be two main deliverables:

1. Text files of the entire text OCR'd
2. A searchable PDF file

I'll ask again - are there any takers willing to do the study?

Regards,
Michael Harwood

> Sure, something can be done. But constraints are constraints. They will
> either affect the quality or the cost or timeliness of delivery of the
> product.
>
> Anyway, this does mean that we can't do the kind of OCR that's been
> described so far. The only way I think that can work cost effectively is
> if we distribute the work of doing corrections, say one issue for a
> given person.
>
> At least assuming we want a good quality OCR, anyway. If it's good
> enough to distribute a machine's first approximation of the text, then I
> think OCR is doable, otherwise, I'm thinking it isn't.
>
> Here's the way OCR work on volunteer projects is usually done: one
> person scans in the work. Then fragments are given to various people to
> provide corrections. Usually it's a lot of people, since it's a lot of
> work. In general though that's not a problem for the copyright holder
> since no one has the whole work... everyone interested enough to
> volunteer is going to come back and license a copy at the end of the
> day. Perhaps Lonnie's concerns are more the legal issues of having a lot
> of claims after the fact on the work. But there's no reason that stuff
> can't be handled completely by the legal agreement.
>
> -- John.

-- 
"The best place to be is here,
 the best time to be is now."
             -- Bill and Ted