[Coco] Rainbow on Disc - OCR
adit at 1stconnect.com
Sun Jun 12 03:46:00 EDT 2005
>I personally think that including the text in this manner is a higher
>priority than making an all inclusive PDF. I am not saying we shouldn't
>do an all inclusive text enabled PDF, just that I think that it's lower on
>the list of priorities.
Well, I guess I'll put my 2 cents in here. Since I have older systems
(Mac OS 8.6-9.2), this should probably be taken with a grain of salt. I
have all 3 major OCR programs that were available on the Mac (pre-OS X)
and they all leave a lot to be desired. These types of programs have
hopefully gotten better since then, but I found that they are highly
sensitive to point size, font, skew and, obviously, quality of scan. I'm
sure color will throw in a whole new dynamic as well. Some of these
programs may also have a limit on the DPI of the scan/TIFF (something
to keep in mind). Expect to have all kinds of recognition problems with
the digit 0 and the letter O, and with i, l, and the number 1.
Interestingly, the letters 'in' together often OCR as 'm.' The fun is
never-ending. Programs will be especially "fun" to proof. I just
thought I'd mention this since everyone seems to think OCRing is going to
be this painless process. It'll probably be a nice thing to have, but
remember that it will take time and effort, most likely more than
scanning/'PDFing', if you include the proofreading.
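
To give a rough idea of what that proofreading could look like, here is a
minimal Python sketch that flags OCR output worth checking by hand. It is
only an illustration, not something any of the OCR programs provide: the
function name flag_suspect_tokens, the CONFUSABLE set, and the known_words
list are all made up for this example, and the rules cover only the
0/O, i/l/1 and 'in'-as-'m' mix-ups mentioned above.

```python
import re

# Characters OCR tends to mix up (0/O and i/l/1); an assumed, incomplete set.
CONFUSABLE = set("0Oo1lIi")


def flag_suspect_tokens(text, known_words):
    """Return (token, reason) pairs a proofreader should check by hand.

    known_words is a hypothetical set of lowercase words or keywords
    that are expected to appear in the scanned text.
    """
    suspects = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        has_digit = any(c.isdigit() for c in token)
        has_alpha = any(c.isalpha() for c in token)
        # Letters and digits mixed with a confusable character, e.g. "1OO".
        if has_digit and has_alpha and any(c in CONFUSABLE for c in token):
            suspects.append((token, "mixed digits/letters"))
            continue
        # Unknown token that becomes a known word when one 'm' is read as 'in'.
        lower = token.lower()
        if lower not in known_words and "m" in lower:
            for i, c in enumerate(lower):
                if c == "m" and lower[:i] + "in" + lower[i + 1:] in known_words:
                    suspects.append((token, "possible 'in' read as 'm'"))
                    break
    return suspects


if __name__ == "__main__":
    for token, reason in flag_suspect_tokens("PRMT 1OO lme", {"print", "line"}):
        print(token, "-", reason)
```

A check like this only points at likely trouble spots. It will also flag
perfectly good tokens (a variable name such as A1, for instance), so a
person still has to compare each hit against the scanned page.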