[Coco] Resolution, size and usability
Dennis Bathory-Kitsz
dennis-ix at maltedmedia.com
Fri May 22 13:09:33 EDT 2009
Hi all,
I really hope this discussion is helpful as people work on archiving
what they have, and as they continue to make it available in the most
effective way -- for everybody. There was only one dismissive
comment, and it certainly doesn't reflect what most of us have been saying.
You know I rarely participate here, and only watch in the background
to make sure things are working smoothly.
This time, though, the discussion has been important to me because
since 1995 I have been a consultant in online accessibility. There
are lots of barriers to access, and in a case such as ours, an
effective balance between accuracy ('real' archiving) and usability
('library' archiving) is always welcome. Existing paper documents
pose an enormous challenge in this transitional phase between
low-speed, high-speed and the promised future internet (to say nothing
of processing power, disk space, and the time needed for hands-on
re-archiving to long-term storage media).
Jeff's comments are interesting.
At 12:26 PM 5/22/2009, Jeff Teunissen <deek at d2dc.net> wrote:
>Except for choosing the white and black points and maybe the scanner's
>gamma curve, all of that stuff is ALMOST completely useless these
>days. A monochrome page of a given resolution compresses to pretty-much
>exactly the same size as a full-color page, and is often larger than
>the color scan because it can't be compressed as well (too much
>"redundant" information has been thrown away).
However one prefers to get rid of noise or whiten the background, I'm
all for that. I wanted to put it to the test and, beyond that, I was
curious about the compression differences Jeff mentioned. So I just
did a few test pages from the same source with a small quantity of
large text on a noisy page. (TIFF used here is uncompressed.)
Sizes by format (percentages are relative to the 35363K noisy 24-bit TIFF):

                     TIFF            ZIP             PDF             Zipped PDF
Noisy page
  24-bit color       35363K (100%)   15910K (45%)    13122K (37%)    13133K (37%)
  8-bit grayscale    11784K (33%)     9774K (28%)     9789K (28%)     9780K (28%)
  4-bit grayscale     5895K (17%)     4389K (12%)     7892K (22%)     7885K (22%)
Cleaned (whitened) page
  24-bit color       22557K (64%)     1521K (4%)       522K (1.5%)     398K (1%)
  8-bit grayscale    11783K (33%)      942K (2.5%)     938K (2.5%)     935K (2.5%)
  4-bit grayscale     5895K (17%)      535K (1.5%)     478K (1.5%)     356K (1%)
As I'd mentioned in my earlier post, cleaning up the noise is the
most important factor in the compressible size. The color information
on a monochrome page turns out to be irrelevant -- thanks to Jeff for
teaching me that. Although the TIFFs are larger, the PDFs show
virtually no difference in size across color settings and bit depths.
If you look at many of the hundreds of documents scanned on our
maltedMedia site and others, you'll find that the handling of page
background noise is largely responsible for ballooning document size.
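
For anyone who wants to see the noise effect without a scanner, here is a minimal sketch. It is not the test described above: it assumes a synthetic 8-bit grayscale "page" built in memory and uses Python's zlib (Deflate, the same method ZIP normally uses) in place of real scans and real PDF tools. The page size, the noise range and the function names are all made up for illustration.

```python
import random
import zlib

WIDTH, HEIGHT = 400, 300  # hypothetical page size in pixels


def make_page(noisy, seed=0):
    """Build a synthetic 8-bit grayscale page: light background, dark text bars."""
    rng = random.Random(seed)
    rows = []
    for y in range(HEIGHT):
        row = bytearray()
        for x in range(WIDTH):
            # Crude stand-in for a few lines of large text.
            is_text = (y % 40) < 8 and 40 <= x < 360
            if is_text:
                row.append(0)                      # black "ink"
            elif noisy:
                row.append(rng.randint(200, 255))  # speckled off-white paper
            else:
                row.append(255)                    # whitened background
        rows.append(bytes(row))
    return b"".join(rows)


def compressed_size(data):
    """Size in bytes after lossless Deflate compression at the highest level."""
    return len(zlib.compress(data, 9))


noisy = make_page(noisy=True)
clean = make_page(noisy=False)
for name, page in (("noisy", noisy), ("clean", clean)):
    size = compressed_size(page)
    pct = 100 * size / len(page)
    print(f"{name}: {len(page)} bytes raw, {size} bytes compressed ({pct:.1f}%)")
```

Both pages hold the same "text" and the same number of raw bytes; only the background differs. The whitened page should compress to a small fraction of the noisy one, which is the same pattern as in the table above, though the exact ratios will not match real scans.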
My point is that documents can be archived for speedy downloads or
for accurate results. At my home (and in Bill's case) the documents
are archived for accuracy. The 'look' of the scanned page is exactly
the same as the 'look' of the paper document. That is a good thing --
with one exception: the person who has to download it.
(Archiving visual accuracy is another topic, one of great concern to
those of us with artistic requirements.)
>Throwing away page content by reducing bit depth is MUCH worse for
>the content than lossy compression, because at least the lossy
>compression of J2K (or even JFIF) drops bits that the human brain
>doesn't readily notice, while reducing bit depth loses bits across
>the whole range at arbitrary cutoff points.
Yes. Given the choice, though, I'll take lower bit depth. My aging
eyes find the fuzziness of compression more effortful because I tend
to magnify documents to read them. I find the 4-bit grayscale easier
overall than the artifacts of lossy compression.
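
To make the bit-depth trade-off concrete, here is a small sketch of what reducing 8-bit grayscale to 4-bit does to individual pixel values. It assumes the simplest possible method (dropping the low four bits, no dithering); real scanning software may quantize differently, and the function names are hypothetical.

```python
def to_4bit(pixel):
    """Reduce an 8-bit gray value (0-255) to a 4-bit level (0-15)."""
    return pixel >> 4  # drop the low four bits


def back_to_8bit(level):
    """Expand a 4-bit level back to the 8-bit range for display."""
    return level * 17  # 0 -> 0, 15 -> 255


def pack_row(pixels):
    """Store a row of 8-bit pixels as 4-bit values, two per byte."""
    levels = [to_4bit(p) for p in pixels]
    if len(levels) % 2:
        levels.append(0)  # pad an odd-length row
    return bytes((levels[i] << 4) | levels[i + 1]
                 for i in range(0, len(levels), 2))


samples = [0, 15, 16, 127, 128, 200, 255]
levels = [to_4bit(p) for p in samples]
print(levels)                             # [0, 0, 1, 7, 8, 12, 15]
print([back_to_8bit(v) for v in levels])  # [0, 0, 17, 119, 136, 204, 255]
print(len(pack_row([255] * 100)))         # 50
```

Values 0 and 15 merge into one level while 15 and 16 are split apart, which is the "arbitrary cutoff" effect Jeff describes. The edges stay hard rather than smeared, though, and the packed row is half the size, in line with the 4-bit TIFF being about half the 8-bit TIFF in the table.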
Dennis