On Mon, 7 Jul 2003, Errol Smith wrote:
> On Sun, 6 Jul 2003 04:47:58 +1000, Raymond Wan <rwan@cs.mu.oz.au>
> wrote:
> > I have seen a couple of papers that looked at lossy compression of
> >HTML files and those were easily justifiable -- the argument is to
> >produce something that the web browser does not notice. Otherwise, I've
> Raymond, you don't remember where you saw these papers do you? I am
> quite interested in this technology..
One of them is mine, but it isn't very technical; it's more of a
framework. I found the others through some painful searching --
basically, don't look at compression conferences/journals, which was my
mistake.
Here are some cites. I'm afraid you'll have to follow their reference
lists to find more. In a nutshell, the first looked at lossy compression of
HTML tags (case-folding to lower case) and removal of extraneous white
space. What I did was implement their idea in a set of experiments, so
nothing new. The last cite should be about compression of Pascal source
code -- I'm pulling these from memory, so if I'm wrong, ask me to double
check. Source code has more structure, and Cameron dropped or changed
comments as required, the justification being that the code itself is
more important than the comments. The programming style was also changed
as necessary.
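As a rough illustration of the HTML idea (lower-casing tag names and
collapsing extraneous white space), here is a minimal Python sketch. It is
not the code from any of the cited papers: the function name `lossy_html`
and both regular expressions are assumptions made for this example, and it
ignores cases a real tool would need to handle, such as <pre> blocks,
scripts and comments.

```python
import re

# Matches the name part of an opening or closing tag, e.g. "<TABLE" or "</P".
# (Hypothetical pattern for this sketch; "<!DOCTYPE" and "<!--" do not match.)
_TAG_NAME = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)")
# Matches any run of white space (spaces, tabs, newlines).
_WS_RUN = re.compile(r"\s+")


def lossy_html(text: str) -> str:
    """Lower-case tag names and collapse white-space runs to one space."""
    # Fold only the tag name. Attribute names and values are left alone,
    # since values such as URLs can be case-sensitive.
    folded = _TAG_NAME.sub(
        lambda m: "<" + m.group(1) + m.group(2).lower(), text
    )
    # Collapse white space everywhere. This would also alter <pre> blocks,
    # which a real tool would have to skip.
    return _WS_RUN.sub(" ", folded).strip()


if __name__ == "__main__":
    sample = "<HTML>\n  <BODY>\n    <P>Hello   <B>World</B></P>\n  </BODY>\n</HTML>\n"
    print(lossy_html(sample))
```

The output is "lossy" only in the sense that the original bytes cannot be
recovered; a browser should render both versions the same way, and the
more uniform text should suit a general-purpose compressor better.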
Again, I've yet to see a program like GZIP that implements
lossy compression. My guess is that it wouldn't sell well. Good luck in
your search!
Ray
-----
@inproceedings{ngbpwl97:sigcomm,
author = "H. F. Nielsen and J. Gettys and A. Baird-{S}mith and
E. {Prud'hommeaux} and H. W. Lie and C. Lilley",
title = "Network Performance Effects of {HTTP}/1.1, {CSS}1, and
{PNG}",
pages = "155--166",
editor = "M. Steenstrup",
booktitle = "Proc. {ACM} {SIGCOMM}'97 Conference:
Applications, Technologies, Architectures, and
Protocols for Computer Communication",
month = oct,
series = "Computer Communication Review",
volume = "27(4)",
year = 1997,
}
@inproceedings{wm01:adc,
author = "R. Wan and A. Moffat",
title = "Effective Compression for the Web: Exploiting Document
Linkages",
booktitle = "Proc. 12th Australasian Database Conference",
year = 2001,
editor = "M. E. Orlowska and J. F. Roddick",
pages = "68--75",
month = feb,
}
@article{Cameron88:ieeeit,
author = "R. D. Cameron",
title = "Source Encoding Using Syntactic Information Source Models",
journal = "{IEEE} Transactions on Information Theory",
month = jul,
year = 1988,
number = 4,
volume = 34,
pages = "843--850",
}