[Coco] Help me digitize Color Computer Magazines

tim lindner tlindner at macmess.org
Sat Sep 6 13:06:21 EDT 2008


DarrenA said:

> I have another question Tim.
> 
> When article / filler text is formatted in multiple columns, should a
> separate box be placed on each column to guide the OCR process?  I ask
> this because I just got a task which asked me to check some OCR text.
> The scanned image showed two columns of text and each line of the
> OCR'd text combined the two columns. This would have been a lot of
> work to cleanup so I bypassed it.

Article text should be boxed by column. If someone puts a box
encompassing two or more columns of article text, that is a mistake.

For filler, 95% of the time the text is just a single column, so I wrote
the system to accept only a single box for filler. We'll see if this
becomes a problem.

> Perhaps there should be an option to allow the scanned image to be
> broken down further. Another option could be to reject the OCR'd text
> completely and force the page to be placed back in the queue so that
> the Find Text tasks can be performed again.

I plan (pretty soon, in fact) to add a "Problem!" button to each page
so a user can tell me when something doesn't seem correct.

When the button is pressed, the task will be taken out of the queue and
I'll be notified.

-- 
tim lindner
tlindner at macmess.org