[Coco] OT: stripping garbage from web pages

Lothan lothan at newsguy.com
Fri Nov 5 01:15:15 EST 2004


My tool of choice is HTML Tidy. I think you can download it from the W3C
website. It has a switch specifically for removing Microsoft-specific tags
and bloat, and it can reformat the HTML into a very clean version.
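
For anyone who wants to script it, here is a rough sketch of driving Tidy
from Python. I am assuming the switch meant above is Tidy's "word-2000"
configuration option, and that the executable is installed under the name
"tidy". The function names and file names are made up for illustration.

```python
import shutil
import subprocess


def tidy_word_command(src, dst):
    """Build an HTML Tidy command line for Word-generated HTML.

    Assumption: "word-2000" is the Microsoft-specific switch referred to
    in the post; "clean" is added to replace presentational markup.
    """
    return [
        "tidy",
        "-q",                  # quiet: suppress nonessential output
        "--word-2000", "yes",  # strip Word 2000 specific markup
        "--clean", "yes",      # replace presentational tags with style rules
        "-o", dst,             # write the cleaned HTML to this file
        src,                   # the Word-generated input file
    ]


def clean_word_html(src, dst):
    """Run Tidy if it is on the PATH.

    Returns Tidy's exit code, or None if no "tidy" executable was found.
    """
    if shutil.which("tidy") is None:
        return None
    return subprocess.run(tidy_word_command(src, dst)).returncode
```

Calling clean_word_html("report.html", "report-clean.html") would then write
the cleaned copy next to the original. Tidy normally exits with 1 when it
only has warnings to report, so a nonzero code is not necessarily a failure.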

-----Original Message-----
From: coco-bounces at maltedmedia.com [mailto:coco-bounces at maltedmedia.com] On
Behalf Of Brett K Heath
Sent: Wednesday, November 03, 2004 5:18 PM
To: CoCoList for Color Computer Enthusiasts
Subject: Re: [Coco] OT: stripping garbage from web pages



On Sat, 23 Oct 2004, Bob Devries wrote:

> Can anyone suggest a programme for windoze that will strip the unnecessary
> bloat from web pages created by M$Word? I find that Word puts in heaps of
> stuff that seems to be totally unrelated to the normal HTML code.

I feel your pain. I was once given the task of cleaning up a few pages
that had been generated by M$Word. In the end it was faster (and easier)
to cut and paste the ASCII text into a better editor and regenerate the
rest from scratch.

I ended up using LyX (a LaTeX front end, also freely available for
Windows) and converting to HTML with latex2html (a Perl script). LyX has
some fairly hefty prerequisites (LaTeX, for example) but they are all
freely available and it is well documented and easy to use.

Not sure whether it would match your requirements (I was doing lots of
math stuff) but it offers many handy facilities (like an automatically
generated TOC that can be used to navigate around the document while it's
being edited) and might be worth a look.

I don't have the URLs handy but Google knows.

Brett K. Heath

