[Coco] Help me digitize Color Computer Magazines
tim lindner
tlindner at macmess.org
Sat Sep 6 13:06:21 EDT 2008
DarrenA said:
> I have another question Tim.
>
> When article / filler text is formatted in multiple columns, should a
> separate box be placed on each column to guide the OCR process? I ask
> this because I just got a task which asked me to check some OCR text.
> The scanned image showed two columns of text and each line of the
> OCR'd text combined the two columns. This would have been a lot of
> work to clean up, so I bypassed it.
Article text should be boxed by the column. If someone puts a box
encompassing two or more columns of article text, then that is a mistake.
For filler, 95% of the time it is just a single column, so I wrote
the system to only accept a single box for filler. We'll see if this
becomes a problem.
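As a rough illustration of the boxing rule above (one box per column for article text, exactly one box for filler), here is a minimal Python sketch. It is not the actual code behind the system: `Box`, `check_boxes`, and the `max_column_width` heuristic are hypothetical names and assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    """A rectangle drawn on the scanned page, in pixels (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int


def check_boxes(kind: str, boxes: List[Box],
                max_column_width: Optional[int] = None) -> List[str]:
    """Return a list of problems; an empty list means the boxes look fine."""
    problems = []
    if not boxes:
        problems.append("no boxes drawn")
    # Filler is assumed to be a single column, so only one box is accepted.
    if kind == "filler" and len(boxes) > 1:
        problems.append("filler accepts exactly one box")
    # Assumption: a box wider than one column probably spans several columns.
    if kind == "article" and max_column_width is not None:
        for i, box in enumerate(boxes):
            if box.right - box.left > max_column_width:
                problems.append(f"box {i} is wider than one column")
    return problems
```

For example, `check_boxes("article", [Box(0, 0, 400, 50)], max_column_width=250)` would flag the box as too wide, while two narrow article boxes side by side would pass.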
> Perhaps there should be an option to allow the scanned image to be
> broken down further. Another option could be to reject the OCR'd text
> completely and force the page to be placed back in the queue so that
> the Find Text tasks can be performed again.
I plan (pretty soon, in fact) to add a "Problem!" button to each page
where a user can tell me that something doesn't seem correct.
The task will be taken out of the queue and I'll be notified.
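To make the planned "Problem!" flow concrete, here is a small Python sketch of one way it could work: the flagged task is pulled from the pending queue and a notification is sent. This is only an assumption about the design; `TaskQueue`, `report_problem`, and the `notify` callback are hypothetical names, not part of the real system.

```python
from collections import deque


class TaskQueue:
    """Toy task queue with a 'Problem!' report path (hypothetical design)."""

    def __init__(self, notify):
        self._pending = deque()   # task ids waiting to be handed out
        self._flagged = {}        # task id -> (reporter, note)
        self._notify = notify     # callable that delivers a message to the admin

    def add(self, task_id):
        self._pending.append(task_id)

    def next_task(self):
        """Hand out the next pending task, or None if the queue is empty."""
        return self._pending.popleft() if self._pending else None

    def report_problem(self, task_id, reporter, note):
        """Take the task out of the queue and notify the admin."""
        try:
            self._pending.remove(task_id)
        except ValueError:
            pass  # the task was already handed out, so nothing to remove
        self._flagged[task_id] = (reporter, note)
        self._notify(f"Problem reported on {task_id} by {reporter}: {note}")
```

In use, passing something like `messages.append` as `notify` collects the reports in a list; a real setup would more likely send an email.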
--
tim lindner
tlindner at macmess.org Bright