[Coco] Re: Rainbow on Disc

John R. Hogerhuis jhoger at pobox.com
Thu Jun 9 18:29:59 EDT 2005

On Thu, 2005-06-09 at 17:09 -0500, Shawn M. Hedgecorth wrote:
> I've been playing around with this a little bit today. You are right about
> having to re-typeset in order to be able to select text. Another way to do
> this is to scan the pages as TIF images to get them into Acrobat. Then OCR
> the articles that make sense to do so and save them as RTF. Within the PDF
> file, you can use hyperlinks to open the RTF files. I am proposing RTF
> because I think that it is pretty standard across platforms, and can look
> better than plain ASCII TXT files. I am open to other ideas. And maybe your
> idea of ASCII is better, so that you can use command line tools such as grep.
>
> Shawn

I like the link idea, assuming it's workable for whoever is building the
PDFs. Maybe we could create a script for building the PDFs that links
everything up painlessly.

I'd prefer plain ASCII for the OCRed text, since there are so many Unix
tools that can efficiently deal with masses of raw text. I don't think
anyone will get around to doing much markup on the text anyway, so I
don't know if RTF would buy us much. In any event, text->RTF is always
possible, and vice versa, so it's not too critical. But if we were going
with some markup format like RTF, we might as well use DocBook or
something else with parsers available to do the conversions to RTF,
ASCII, LaTeX, etc. en masse.
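
To make the two points above concrete, here is a rough sketch in Python,
not a finished tool. It assumes the OCRed articles are stored as plain
ASCII .txt files under one directory; the function names grep_texts and
text_to_rtf are made up for illustration. The first does a grep-style
search across all the files, and the second wraps plain text in a bare
RTF document to show that text->RTF really is a mechanical step.

```python
import re
from pathlib import Path


def grep_texts(root, pattern):
    """Return (file name, line number, line) for every matching line.

    Assumes OCR output lives in *.txt files somewhere under `root`.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(Path(root).rglob("*.txt")):
        text = path.read_text(encoding="ascii", errors="replace")
        for lineno, line in enumerate(text.splitlines(), 1):
            if rx.search(line):
                hits.append((path.name, lineno, line.strip()))
    return hits


def text_to_rtf(text):
    """Wrap plain text in a minimal RTF document, with no styling."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF control characters; escape them.
            out.append("\\" + ch)
        elif ch == "\n":
            # Each newline becomes an RTF paragraph break.
            out.append("\\par\n")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Assumption: input is ASCII, so anything else is replaced.
            out.append("?")
    return "{\\rtf1\\ansi " + "".join(out) + "}"
```

Something like grep_texts("ocr", "disk basic") would list every line that
mentions the phrase, whatever the case, and text_to_rtf(open(name).read())
gives a string that can be written out as a .rtf file. Going the other
way, or through DocBook, would need a real parser.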

-- John.
