Q: lossy text compression. real program?

hi,

is there a program that uses lossy text compression?

i found some papers out there, but could not find a real
implementation.

i am thinking more of allowing errors in the text, so that the person
reading it can deduce the original himself, as with typos.
i don't find the "semantic" approach, which replaces each word
with a similar shorter one from a thesaurus, promising.
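
For anyone unsure what the thesaurus approach means in practice, here is a
minimal sketch in Python. It is not taken from any of the papers: the tiny
SHORTER table and the function name thesaurus_squeeze are made up for
illustration, and a real tool would need a proper lexical database plus some
way to judge whether a swap preserves the meaning.

```python
import re

# Hypothetical toy thesaurus (assumption); maps a word to a shorter near-synonym.
SHORTER = {
    "approximately": "about",
    "utilize": "use",
    "purchase": "buy",
    "difficult": "hard",
}

WORD = re.compile(r"[A-Za-z]+")


def thesaurus_squeeze(text, thesaurus=SHORTER):
    """Replace each known word with its shorter synonym (lossy, not reversible)."""

    def swap(match):
        word = match.group(0)
        short = thesaurus.get(word.lower())
        # Keep the word if it is unknown or the synonym saves nothing.
        if short is None or len(short) >= len(word):
            return word
        # Preserve a leading capital so sentence starts still look right.
        return short.capitalize() if word[0].isupper() else short

    return WORD.sub(swap, text)


print(thesaurus_squeeze("Utilize approximately ten"))
```

The output is shorter before any real compressor runs, but the original
wording cannot be recovered, which is exactly the lossy part.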

so, does anyone have a program?

tschau, towi.

towi
7/5/2003 5:35:33 PM
comp.compression

Hi,

On Sat, 5 Jul 2003, towi wrote:
> i am thinking more of allowing errors in the text, so that the person
> reading it can deduce the original himself, as with typos.
> i don't find the "semantic" approach, which replaces each word
> with a similar shorter one from a thesaurus, promising.

	I have seen a couple of papers that looked at lossy compression of
HTML files and those were easily justifiable -- the argument is to
produce something that the web browser does not notice.  Otherwise, I've
yet to see a paper which talks about lossy text compression (though, in
all honesty, I haven't looked very hard), as the idea hasn't been
accepted.  But the fact that you've found some papers proves that I
haven't searched very hard.

	I have seen one paper which used a thesaurus to achieve
compression, and based on how it is written and the paper acceptance date,
I'm quite certain it's just an April Fool's joke.  Not 100% certain, so
don't quote me on this.  ;)

Ray



Raymond
7/5/2003 6:47:58 PM
On Sun, 6 Jul 2003 04:47:58 +1000, Raymond Wan <rwan@cs.mu.oz.au>
wrote:
>	I have seen a couple of papers that looked at lossy compression of
>HTML files and those were easily justifiable -- the argument is to
>produce something that the web browser does not notice.  Otherwise, I've
>yet to see a paper which talks about lossy text compression (though, in
>all honesty, I haven't looked very hard), as the idea hasn't been
>accepted.  But the fact that you've found some papers proves that I
>haven't searched very hard.

 Raymond, you don't remember where you saw these papers, do you? I am
quite interested in this technology.
 Thanks!

Errol Smith
errol <at> ros (dot) com [period] au
Errol
7/6/2003 11:53:00 PM
On Mon, 7 Jul 2003, Errol Smith wrote:
> On Sun, 6 Jul 2003 04:47:58 +1000, Raymond Wan <rwan@cs.mu.oz.au>
> wrote:
> >	I have seen a couple of papers that looked at lossy compression of
> >HTML files and those were easily justifiable -- the argument is to
> >produce something that the web browser does not notice.  Otherwise, I've
>  Raymond, you don't remember where you saw these papers do you? I am
> quite interested in this technology..

	One of them is mine, but it isn't very technical and is more like a
framework.  I found the others through some painful searching --
basically, don't look at compression conferences/journals, which was my
mistake.  

	Here are some cites.  I'm afraid you'll have to look at what they
cite to get more.  In a nutshell, the first looked at lossy compression of
HTML tags (case-folding to lower case) and removal of extraneous white
space.  What I did was implement their idea in a set of experiments, so
nothing new.  The last cite should be about compression of Pascal source
code -- I'm pulling them out from memory so if I'm wrong, ask me to double
check.  Source code has more structure, and Cameron dropped or changed
comments as required, the justification being that the code itself is
more important.  The programming style was also changed as necessary.
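
	To make the HTML idea concrete, here is a rough sketch in Python of
the two transformations described above (folding tag names to lower case and
collapsing extraneous white space).  It is only an illustration under my own
assumptions, not the code from any of the cited papers; the name lossy_html
and the regular expressions are made up for this example.

```python
import re

# Matches the start of an opening or closing tag and captures the tag name.
TAG_NAME = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)")
WHITESPACE = re.compile(r"\s+")


def lossy_html(text):
    """Fold tag names to lower case and collapse runs of white space."""
    # Only the tag name is lower-cased; attributes and text are left alone.
    folded = TAG_NAME.sub(lambda m: "<" + m.group(1) + m.group(2).lower(), text)
    # Collapse every run of white space to a single space.  This is the lossy
    # step: it would also alter <pre> blocks, which a real tool must skip.
    return WHITESPACE.sub(" ", folded).strip()


page = "<HTML>\n  <BODY>\n    <P>Hello   world</P>\n  </BODY>\n</HTML>"
print(lossy_html(page))
```

The result should render the same in a browser, but the original bytes cannot
be restored, and the more regular text should then be a little easier for an
ordinary lossless compressor to handle.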

	Again, I've yet to see a program like GZIP, etc. which implements
lossy compression.  My guess is that it wouldn't sell well.  Good luck in
your search!

Ray

-----
@inproceedings{ngbpwl97:sigcomm,
  author =       "H. F. Nielsen and J. Gettys and A. Baird-{S}mith and 
                 E. {Prud'hommeaux} and H. W. Lie and C. Lilley",
  title =        "Network Performance Effects of {HTTP}/1.1, {CSS}1, and
                 {PNG}",
  pages =        "155--166",
  editor =       "M. Steenstrup",
  booktitle =    "Proc. {ACM} {SIGCOMM}'97 Conference:
                  Applications, Technologies, Architectures, and
                  Protocols for Computer Communication",
  month =        oct,
  series =       "Computer Communication Review",
  volume =       "27(4)",
  year =         1997,
}

@inproceedings{wm01:adc,
  author =    "R. Wan and A. Moffat",
  title =     "Effective Compression for the Web: Exploiting Document 
               Linkages",
  booktitle = "Proc. 12th Australasian Database Conference",
  year =      2001,
  editor =    "M. E. Orlowska and J. F. Roddick",
  pages =     "68--75",
  month =     feb,
}

@article{Cameron88:ieeeit,
  author =  "R. D. Cameron",
  title =   "Source Encoding Using Syntactic Information Source Models",
  journal = "{IEEE} Transactions on Information Theory",
  month =   jul,
  year =    1988,
  number =  4,
  volume =  34,
  pages =   "843--850",
}



Raymond
7/7/2003 12:41:12 AM