this is similar to an idea I once had when first messing with arithmetic
coding, but has not popped up again (as a possibly implementable idea). I am
not sure if there is any originality in this.
rough idea:
instead of continuously encoding/decoding symbols, updating a range, and
possibly updating a model, the entire process is done "at once" with
standalone "word values". when the range gets below a certain minimum,
effectively the whole range is written out and then reset for the next few
symbols.
in effect, since each possible word value is independent of previous values,
one can pull off encoding/decoding by viewing the whole set of possible
values to mapping to a number of possible strings.
encoding could thus work by matching the longest string (though using the
original encoder idea is likely to be faster).
decoding would consist of reading the value, fetching the appropriate
string, and writing it to the output. with luck, this may be comprable to
table-based huffman approaches.
this is unlikely to work well with an adaptive model. either the model will
need to be transmitted with the data, or be rebuilt periodically.
this is likely to have some overhead related to how the range is managed. a
larger range is likely to have less overhead, but will cost in terms of the
decode table size (or, in exceeding available memory). each value could
probably be as small as 12 bits or as large as 24, however, 16 or 24 make
the most sense as they fall nicely on byte boundaries. 24 would cost a great
deal of memory, and rebuilding the table is likely to be quite expensive, so
16 makes a lot more sense.
alternatively, a bigger range may be possible if it is decoded in steps, or
only decoded partly. however, I don't think there is much reason to do this
(speed costs).
a little fudging may allow reducing the overhead, for example, it could be
ensured that it is allways possible to encode a "terminal" symbol, rather
than requiring that all possible symbols can be encoded all the time. if a
value can't be encoded in the given range, the terminal is encoded instead
and the symbol is encoded after the range is reset. likely, the weight of
said terminal will be special, eg, in that it will always be 1 (having a
weight like other symbols does not make sense, as it will only be encoded
when there is not enough weight for another symbol). as an alternative
though, the terminal would not be required in the case where, after encoding
a given symbol, it would not be possible to encode any symbols.
the difficulty (speed wise) would be efficiently rebuilding the tables, but
this could probably be worked out by trying it.
any thoughts?
|
|
0
|
|
|
|
Reply
|
cr88192499 (303)
|
10/27/2005 2:49:54 AM |
|
Here is my naiya paisa idea,
Your idea is perfectly feasible if the estimated code range required
for encoding is only a small part of the long working region. IF so
,then one can always restart the arithmetic codec periodically. The
restart will cost you at most 2 digits for previous encoded string
termination.
for a binary alphabet one can take the minimum
distributable range as 2 (as in fpaq0 !).
for a input alphabet of size n obviously the minimum exepected range
to distribute >=n.
As a Patentable(not now, as ,it is revealed in this post) secret
idea, forget the multiplicaions and divisons provided by the top of the
profit cpus . Have a table driven approx . multiplier and divider.(they
need not be exact.) Only ensure that the decoder also uses the same
tables.
That will provide the range distributer speed probably rivalling
huffman codecs.
Mahesh Naik
|
|
0
|
|
|
|
Reply
|
Ignorant
|
10/27/2005 10:10:27 AM
|
|
"Ignorant" <shunya@vsnl.com> wrote in message
news:1130407827.875297.174880@g43g2000cwa.googlegroups.com...
> Here is my naiya paisa idea,
>
> Your idea is perfectly feasible if the estimated code range required
> for encoding is only a small part of the long working region. IF so
> ,then one can always restart the arithmetic codec periodically. The
> restart will cost you at most 2 digits for previous encoded string
> termination.
no, my idea was to have the entire range be, eg, in 16 bit or so chunks. the
idea is to cram whatever is possible in those 16 bits, and not worrying
about the rest.
as imagined, the overhead should only really be a problem in rather
sub-optimal cases, where likely it will inflate a bit more than normal
arithmetic coding, but oh well. in more optimal cases, this should be a lot
less of a problem.
likely, managing the range will be a little more of a hassle than in a
normal coder (one has to be a lot more careful that the space is utilized
efficiently). the actual management of weights and ranges is likely to be
somewhat different (no "minimum range" or "overflow handling" per se,
instead, what can be rammed in each box needs to be such).
the advantage, however, is that I am imagining possible very fast decoding.
as a general rule, the range will be ignored completely during decoding (no
multiplies, no divides, only indexing and copying).
eg:
void decode(ushort *in, byte *out, byte **tab)
{
byte *s, l;
while(*in)
{
s=tab[*in++];
l=*s++;
while(l--)*out++=*s++;
}
}
decoding should be very fast (the cost is rebuiling the table, but this only
has to be done once in a great while).
current idea is that rebuilding the table would be a recursive (attempting
to build codes for whatever will still fit). this will be like a tweaked
out, exploratory arithmatic coder (actually, it would be vaguely similar to
how I built the index tables for my huffman decoder).
now, this does somewhat constrain the design of the algo. I can't use much
larger ranges, as then I can't decode like this (say, 32 bits, can't be
indexed like this, much less for a bigger 'box').
of course, seeing how well this works will require implementing it and
seeing first-hand. I may get to this...
> for a binary alphabet one can take the minimum
> distributable range as 2 (as in fpaq0 !).
> for a input alphabet of size n obviously the minimum exepected range
> to distribute >=n.
> As a Patentable(not now, as ,it is revealed in this post) secret
> idea, forget the multiplicaions and divisons provided by the top of the
> profit cpus . Have a table driven approx . multiplier and divider.(they
> need not be exact.) Only ensure that the decoder also uses the same
> tables.
> That will provide the range distributer speed probably rivalling
> huffman codecs.
> Mahesh Naik
>
no, this will be string based...
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/27/2005 11:01:22 AM
|
|
Range coder(it can be called by any other name...) works by
distributing the available
range among the possible input symbols. It is compression only when the
range share
of a symbol is in proportion to its probability .
For a long working region , you may , work without "Normalisation"
till the remaining coderange (that is high-low+1) is >=N. this just
effects the share distribution being less fair as the trange to
distribute becomes less and less. At some point you will have to force
a "carry" resolution using Rubins rule. This carry is internal , within
the working region.
For a binary input alphabet range to distribute >=2 is enough. For
larger input alphabet
one has to force a similar internal carry resolution so the msb digits
in high and low are
reduced to being identical . If you have a wide enough working region
(e.g.width of low/high) you can do range distribution till you assure
that each symbol gets a share of at least one code. You may normalise
at any point before range becomes less than N.
At that point if no digits are yet resolved because of a virtual queue
within the working
region then you have to use Rubin's principle...ie. code a virtual
symbol from a binary alphabet and force the MSB digits to be same.[You
may force more digits to be same by
different methods. the easiest is by having high=low and all digits go
out.]
please post your codec (or email me ..) after you give it a go.
Mahesh Naik
ps: the whole idea of range coding is to distribute the available
coding range among the
input symbols(step dependent) in ratio of their expected
probailities.You can do what you
are trying but the share of the symbols as the distributable range
decreases with each
symbol coded becomes less and less proportinate to the symbol
probabilities. You cant do worse than allocating each input symbol a
share of only 1 code[in spite of a skewed distribution].You can still
decode but shannon would turn in his resting place. At some point you
will have to force a carry resolution by whatever method suits the
codec.In your
case the virtual queue (charles bloom's terminology) will be within the
wide working region.
|
|
0
|
|
|
|
Reply
|
Ignorant
|
10/27/2005 1:55:19 PM
|
|
"Ignorant" <shunya@vsnl.com> wrote in message
news:1130421319.823871.42130@g49g2000cwa.googlegroups.com...
>
> Range coder(it can be called by any other name...) works by
> distributing the available
> range among the possible input symbols. It is compression only when the
> range share
> of a symbol is in proportion to its probability .
> For a long working region , you may , work without "Normalisation"
> till the remaining coderange (that is high-low+1) is >=N. this just
> effects the share distribution being less fair as the trange to
> distribute becomes less and less. At some point you will have to force
> a "carry" resolution using Rubins rule. This carry is internal , within
> the working region.
> For a binary input alphabet range to distribute >=2 is enough. For
> larger input alphabet
> one has to force a similar internal carry resolution so the msb digits
> in high and low are
> reduced to being identical . If you have a wide enough working region
> (e.g.width of low/high) you can do range distribution till you assure
> that each symbol gets a share of at least one code. You may normalise
> at any point before range becomes less than N.
> At that point if no digits are yet resolved because of a virtual queue
> within the working
> region then you have to use Rubin's principle...ie. code a virtual
> symbol from a binary alphabet and force the MSB digits to be same.[You
> may force more digits to be same by
> different methods. the easiest is by having high=low and all digits go
> out.]
> please post your codec (or email me ..) after you give it a go.
I guess I can give a copy after I try it.
I know how arithmetic and range coding works, just, managing those ranges,
all those multiplies, and shift, ... are not free.
I want something "fast" (eg: hoping for decode speeds on the order of
hundreds of MB per second or so, or at least >100). I am very far from this
in anything I have done thus far (15MB/s decode, pitiful, may as well just
use my other algo that pulls off 6MB/s...).
this idea may allow pulling this off, given how little the decoder actually
does (no multiplies or shifts within the actual decoder loop). first off
though, I have to verify it actually works. there is a high probability that
it wont compress worth a damn given the overly small ranges.
it hinges on how effectively the box-packing works in general.
if the boxes were say, 64 or 256 bits, I could be much more confident that
compression would be good, but there is no way to decode this using table
lookups afaik.
oh well, I have some time now, so I may try this.
boxes:
I was considering that the lowest value in the range, eg:
min when range!=0, will be the escape code.
as noted previously, an escape in an empty box would thus be a literal value
of 0, and this could be used as an "end of encoded data" marker.
after each symbol is encoded, the range shrinks, and progressively lower
probability symbols drop out of the possibility range.
A 1/2 (M: 0000, R: 7FFF)
B 1/4 (M: 8000, R: 3FFF)
C 1/8 (M: C000, R: 1FFF)
D 1/16 (M: E000, R: 0FFF)
E+ (M: F000, R: ...)
rng=(rng*r)>>16;
min=((rng*b)>>16)+1;
AAAAAAAAAAAAAAA...
M: 0000 R: FFFF (empty box, all possible)
M: 0001 R: 7FFE (all possible)
M: 0002 R: 3FFE (all possible)
....
M: 000B R: 001E (escape, A, B, C, and D possible)
M: 000C R: 000E (escape, A, B, and C possible)
M: 000D R: 0006 (escape, A, and B possible)
M: 000E R: 0002 (escape and A possible)
M: 000F R: 0000 (box full)
thus, '000F' refers to a string of 15 A's in this model (values <15 will
thus refer to shorter runs of A).
but, yes, experimentation should reveal more.
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/27/2005 11:13:48 PM
|
|
first results:
not supprisingly, the compression ratio is a bit lame.
also not supprisingly, the decoder is quite fast.
currently, it pulls off about 170MB/s with -O3 for some text, and about
146MB/s with the gimp tarball.
decode speeds are closer to around 66MB/s for debug options.
the speed does not include building the decode table, which takes about
25-35ms (insignifigant for the gimp tarball, but quite signifigant for the
text file).
at -O3 encoding pulls off about 74MB/s (some text). encoding pulls off about
86MB/s for the gimp tarball.
ratios are fairly poor.
text file: 663043
this algo: 474902 (71.63%)
vitter: 387435 (58.43%)
fpaq0: 383760 (57.88%)
for the gimp tarball it was worse. my algo offered virtually no compression
(static for the whole file), fpaq0 pulled off 68.95%.
the poor ratio is likely due to the small box size (16 bits). at 24 or 32
bits it would likely do a lot better, but this would require a lot more
memory (and make rebuilding the table be a lot slower).
actually, a quick test seems to show that making the box bigger actually
makes the ratio worse. mystery...
I guess maybe it is workable, decent speed, crap ratio...
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/28/2005 6:08:41 AM
|
|
cr88192 wrote:
> void decode(ushort *in, byte *out, byte **tab)
> {
> byte *s, l;
>
> while(*in)
> {
> s=tab[*in++];
> l=*s++;
> while(l--)*out++=*s++;
> }
> }
This is LZW with a predefined table. The compression you get will
depend on how well you optimize the dictionary during compression.
Gather counts for all strings up to a certain length, compute the
savings for each and keep the best ones in an oversize table. Then
make another pass because the parse with this table will change the
counts, then weed out the losers until the table is the size you want.
Doing this well takes lots of time and memory. There will be the usual
3 way tradeoff between speed, memory, and compression. Of course there
is no need to have any two symbols map to the same string, which seems
to be what you are suggesting.
A 64K table is rather large thing to transmit. It might actually be
faster to use a 4K table (12 bit symbols) so that it all fits in cache.
If you want raw speed, even 8-bit symbols should give you some
compression. Experiment.
-- Matt Mahoney
|
|
0
|
|
|
|
Reply
|
Matt
|
10/28/2005 10:17:09 PM
|
|
"Matt Mahoney" <matmahoney@yahoo.com> wrote in message
news:1130537829.498929.13830@g14g2000cwa.googlegroups.com...
> cr88192 wrote:
>> void decode(ushort *in, byte *out, byte **tab)
>> {
>> byte *s, l;
>>
>> while(*in)
>> {
>> s=tab[*in++];
>> l=*s++;
>> while(l--)*out++=*s++;
>> }
>> }
>
> This is LZW with a predefined table. The compression you get will
> depend on how well you optimize the dictionary during compression.
> Gather counts for all strings up to a certain length, compute the
> savings for each and keep the best ones in an oversize table. Then
> make another pass because the parse with this table will change the
> counts, then weed out the losers until the table is the size you want.
> Doing this well takes lots of time and memory. There will be the usual
> 3 way tradeoff between speed, memory, and compression. Of course there
> is no need to have any two symbols map to the same string, which seems
> to be what you are suggesting.
>
> A 64K table is rather large thing to transmit. It might actually be
> faster to use a 4K table (12 bit symbols) so that it all fits in cache.
> If you want raw speed, even 8-bit symbols should give you some
> compression. Experiment.
>
well, this is where the arithmetic coder part comes in. all I send is the
order-0 probabilities, and the dictionaries are built using these
probabilities and a variation of arithmetic coding to "pack symbols into the
boxes".
I do not send the table itself.
I am thinking order-1 probabilities would probably do better, but then
sending the counts would become signifigant. likely, another coding would be
needed for this (potentially the order-0 variety, or at least good rle).
void gentab_r(ushort *rb, ushort *rr,
byte **tab, byte **strs, ushort m, ushort r,
byte *str, int len)
{
ushort b, sr;
int i;
**strs=len;
memcpy(*strs+1, str, len);
for(i=m; i<=(m+r); i++)tab[i]=*strs;
*strs=*strs+len+1;
for(i=0; i<512; i++)
{
b=((uint)rb[i]*r)>>16;
sr=((uint)rr[i]*r)>>16;
if(!sr)continue; //not possible
str[len]=i;
gentab_r(rb, rr, tab, strs, m+b+1, sr, str, len+1);
}
}
void gentab(ushort *rb, ushort *rr, byte **tab, byte *strs)
{
byte buf[256], *t;
int i;
t=strs;
gentab_r(rb, rr, tab, &t, 0x0000, 0xFFFF, buf, 0);
printf("gt %d\n", t-strs);
}
here is the encoder:
int encbuf(byte *ibuf, byte *obuf, int sz, ushort *rb, ushort *rr)
{
byte *cs, *ce, *ct;
uint m, n, r;
ushort b, sr;
m=0x0000; r=0xFFFF;
cs=ibuf; ce=cs+sz; ct=obuf;
while(cs<ce)
{
b=((uint)rb[*cs]*r)>>16;
sr=((uint)rr[*cs]*r)>>16;
if(!sr)
{
*ct++=m&0xFF;
*ct++=m>>8;
m=0x0000; r=0xFFFF;
continue;
}
m=m+b+1;
r=sr;
cs++;
}
*ct++=m&0xFF;
*ct++=m>>8;
*ct++=0;
*ct++=0;
return(ct-obuf);
}
> -- Matt Mahoney
>
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/28/2005 10:41:21 PM
|
|
this is not quite "complete" in that it doesn't actually generate any
output, but more just tests encoding and decoding in memory.
it could be useful though if anyone wants to mess with the algo or such.
note that counts are normalized at 223. this is partly because I had
considered sending the statistics with an rle coding scheme where 224..255
are used for rle runs. the exact weight doesn't matter that much (the
overhead is in the encoding itself).
0..223: sent as is
226..255: rle run (following the repeated value) of 2..31 values
224: rle run (followed by a byte length)
225: (possible) longer rle run (16 bit lenth), could possibly be useful if
algo was adapted for order-1 statistics.
one vaguely interesting property is that usually messing up decoding only
really effects a few bytes (though possibly effecting message length). it
depends probably on other factors though.
well, if anyone cares, here is the source:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#if defined(__CYGWIN__) || defined(__linux__)
#define TIMEOFDAY
#define CLOCKHZ 1000
#include <sys/time.h>
#else
#include <time.h>
#define CLOCKHZ CLOCKS_PER_SEC
#endif
typedef unsigned char byte;
typedef unsigned short ushort;
typedef unsigned int uint;
typedef unsigned long long ull;
//generate and normalize statistics
void statbuf(byte *buf, int sz, int *stat)
{
int i, j, m;
for(i=0; i<512; i++)stat[i]=0;
while(sz--)stat[*buf++]++;
m=0;
for(i=0; i<512; i++)
if(stat[i]>m)m=stat[i];
if(!m)return;
for(i=0; i<512; i++)
{
if(!stat[i])continue;
j=(223*stat[i])/m;
if(!j)j++;
stat[i]=j;
}
}
void genweights(int *stat, ushort *rb, ushort *rr)
{
int i, j, k, t, s;
t=0;
for(i=0; i<512; i++)t+=stat[i];
j=0;
for(i=0; i<512; i++)
{
k=(stat[i]<<16)/t;
rb[i]=j;
rr[i]=k;
j+=k;
}
}
void gentab_r(ushort *rb, ushort *rr,
byte **tab, byte **strs, ushort m, ushort r,
byte *str, int len)
{
ushort b, sr;
int i;
**strs=len;
memcpy(*strs+1, str, len);
for(i=m; i<=(m+r); i++)tab[i]=*strs;
*strs=*strs+len+1;
for(i=0; i<512; i++)
{
b=((uint)rb[i]*r)>>16;
sr=((uint)rr[i]*r)>>16;
if(!sr)continue; //not possible
str[len]=i;
gentab_r(rb, rr, tab, strs, m+b+1, sr, str, len+1);
}
}
void gentab(ushort *rb, ushort *rr, byte **tab, byte *strs)
{
byte buf[256], *t;
int i;
t=strs;
gentab_r(rb, rr, tab, &t, 0x0000, 0xFFFF, buf, 0);
printf("gt %d\n", t-strs);
}
int encbuf(byte *ibuf, byte *obuf, int sz, ushort *rb, ushort *rr)
{
byte *cs, *ce, *ct;
ushort m, n, r;
ushort b, sr;
m=0x0000; r=0xFFFF;
cs=ibuf; ce=cs+sz; ct=obuf;
while(cs<ce)
{
b=((uint)rb[*cs]*r)>>16;
sr=((uint)rr[*cs]*r)>>16;
if(!sr) //symbol wont fit
{
*ct++=m&0xFF;
*ct++=m>>8;
m=0x0000; r=0xFFFF;
continue;
}
m=m+b+1;
r=sr;
cs++;
}
*ct++=m&0xFF;
*ct++=m>>8;
*ct++=0;
*ct++=0;
return(ct-obuf);
}
int decbuf(byte *ibuf, byte *obuf, int sz, byte **tab)
{
byte *cs, *ce, *ct, *s, *t;
int i, j, k;
k=0;
cs=ibuf; ce=cs+sz; ct=obuf;
while(cs<ce)
{
i=cs[0]+(cs[1]<<8);
cs+=2;
s=tab[i];
j=*s++;
// if(k<4)
// printf("%04X %d\n", i, j);
k++;
while(j--)
{
// if(k<8)printf("%02X %02X\n", *s, *ct);
if((*s)!=(*ct))break;
s++; ct++;
}
if(j>0)
{
printf("fail at %d\n", ct-obuf);
break;
}
}
return(ct-obuf);
}
//int decbuf2(byte *ibuf, byte *obuf, int sz, byte **tab);
#if 1
int decbuf2(byte *ibuf, byte *obuf, int sz, byte **tab)
{
byte *cs, *ce, *ct, *s;
int i, j;
cs=ibuf; ce=cs+sz; ct=obuf;
while(cs<ce)
{
i=cs[0]+(cs[1]<<8);
cs+=2;
s=tab[i];
j=*s++;
while(j--)*ct++=*s++;
}
return(ct-obuf);
}
#endif
int bgb_clock()
{
#ifdef TIMEOFDAY
struct timeval tp;
static int secbase;
gettimeofday(&tp, NULL);
if(!secbase)secbase=tp.tv_sec;
return(((tp.tv_sec-secbase)*1000)+tp.tv_usec/1000);
#endif
return(clock());
}
int fail(char *msg)
{
printf(msg);
exit(-1);
}
byte *readfile(char *name, int *rsz)
{
FILE *fd;
byte *buf;
int i;
fd=fopen(name, "rb");
if(!fd)fail("can't open input\n");
fseek(fd, 0, 2);
i=ftell(fd);
fseek(fd, 0, 0);
buf=malloc(i);
fread(buf, 1, i, fd);
fclose(fd);
*rsz=i;
return(buf);
}
int writefile(char *name, byte *buf, int sz)
{
FILE *fd;
fd=fopen(name, "wb");
if(!fd)fail("can't open output\n");
fwrite(buf, 1, sz, fd);
fclose(fd);
return(0);
}
int main(int argc, char *argv[])
{
int st[512];
ushort bt[512], rt[512];
byte *ibuf, *obuf;
byte **dt, *dtb;
int i, j, k, sz, t;
float ft;
ibuf=readfile(argv[1], &sz);
statbuf(ibuf, sz, st);
#if 0
for(i=0; i<16; i++)
{
for(j=0; j<16; j++)
printf("%3d ", st[i*16+j]);
printf("\n");
}
j=0;
for(i=0; i<512; i++)j+=st[i];
printf("%d\n", j);
#endif
genweights(st, bt, rt);
#if 0
for(i=0; i<32; i++)
{
for(j=0; j<8; j++)
{
if(j)fputc(' ', stdout);
printf("%04X:%04X", bt[i*8+j], rt[i*8+j]);
}
printf("\n");
}
#endif
obuf=malloc(sz*2);
dt=malloc(65536*sizeof(byte *));
dtb=malloc(65536*4);
t=bgb_clock();
i=encbuf(ibuf, obuf, sz, bt, rt);
t=bgb_clock()-t;
ft=t/(float)CLOCKHZ;
printf("enc %fs %fkB/s %d\n", ft, sz/(ft*1000), t);
printf("%d->%d\n", sz, i);
t=bgb_clock();
gentab(bt, rt, dt, dtb);
t=bgb_clock()-t;
ft=t/(float)CLOCKHZ;
printf("gentab %fs %d\n", ft, t);
t=bgb_clock();
j=decbuf2(obuf, ibuf, i, dt);
t=bgb_clock()-t;
ft=t/(float)CLOCKHZ;
printf("dec %fs %fkB/s %d\n", ft, sz/(ft*1000), t);
printf("%d->%d\n", sz, j);
}
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/28/2005 11:16:14 PM
|
|
cr88192 wrote:
> "Matt Mahoney" <matmahoney@yahoo.com> wrote in message
> news:1130537829.498929.13830@g14g2000cwa.googlegroups.com...
> > cr88192 wrote:
> >> void decode(ushort *in, byte *out, byte **tab)
> >> {
> >> byte *s, l;
> >>
> >> while(*in)
> >> {
> >> s=tab[*in++];
> >> l=*s++;
> >> while(l--)*out++=*s++;
> >> }
> >> }
> >
> > This is LZW with a predefined table. The compression you get will
> > depend on how well you optimize the dictionary during compression.
> > Gather counts for all strings up to a certain length, compute the
> > savings for each and keep the best ones in an oversize table. Then
> > make another pass because the parse with this table will change the
> > counts, then weed out the losers until the table is the size you want.
> > Doing this well takes lots of time and memory. There will be the usual
> > 3 way tradeoff between speed, memory, and compression. Of course there
> > is no need to have any two symbols map to the same string, which seems
> > to be what you are suggesting.
> >
> > A 64K table is rather large thing to transmit. It might actually be
> > faster to use a 4K table (12 bit symbols) so that it all fits in cache.
> > If you want raw speed, even 8-bit symbols should give you some
> > compression. Experiment.
> >
>
> well, this is where the arithmetic coder part comes in. all I send is the
> order-0 probabilities, and the dictionaries are built using these
> probabilities and a variation of arithmetic coding to "pack symbols into the
> boxes".
>
> I do not send the table itself.
OK, I understand. You might get better compression on the gimp tarball
if you break it up into blocks of text and binary and compress them
separately.
> I am thinking order-1 probabilities would probably do better, but then
> sending the counts would become signifigant. likely, another coding would be
> needed for this (potentially the order-0 variety, or at least good rle).
That would help. Maybe you could have the decompressor collect
statistics and periodically rebuild the table. Of course that would be
slower.
-- Matt Mahoney
|
|
0
|
|
|
|
Reply
|
Matt
|
10/29/2005 12:10:35 AM
|
|
"Matt Mahoney" <matmahoney@yahoo.com> wrote in message
news:1130544635.061980.185760@o13g2000cwo.googlegroups.com...
> cr88192 wrote:
>> "Matt Mahoney" <matmahoney@yahoo.com> wrote in message
>> news:1130537829.498929.13830@g14g2000cwa.googlegroups.com...
>> > cr88192 wrote:
>> >> void decode(ushort *in, byte *out, byte **tab)
>> >> {
>> >> byte *s, l;
>> >>
>> >> while(*in)
>> >> {
>> >> s=tab[*in++];
>> >> l=*s++;
>> >> while(l--)*out++=*s++;
>> >> }
>> >> }
>> >
>> > This is LZW with a predefined table. The compression you get will
>> > depend on how well you optimize the dictionary during compression.
>> > Gather counts for all strings up to a certain length, compute the
>> > savings for each and keep the best ones in an oversize table. Then
>> > make another pass because the parse with this table will change the
>> > counts, then weed out the losers until the table is the size you want.
>> > Doing this well takes lots of time and memory. There will be the usual
>> > 3 way tradeoff between speed, memory, and compression. Of course there
>> > is no need to have any two symbols map to the same string, which seems
>> > to be what you are suggesting.
>> >
>> > A 64K table is rather large thing to transmit. It might actually be
>> > faster to use a 4K table (12 bit symbols) so that it all fits in cache.
>> > If you want raw speed, even 8-bit symbols should give you some
>> > compression. Experiment.
>> >
>>
>> well, this is where the arithmetic coder part comes in. all I send is the
>> order-0 probabilities, and the dictionaries are built using these
>> probabilities and a variation of arithmetic coding to "pack symbols into
>> the
>> boxes".
>>
>> I do not send the table itself.
>
> OK, I understand. You might get better compression on the gimp tarball
> if you break it up into blocks of text and binary and compress them
> separately.
>
yes, or at least partition on regular boundaries probably...
>> I am thinking order-1 probabilities would probably do better, but then
>> sending the counts would become signifigant. likely, another coding would
>> be
>> needed for this (potentially the order-0 variety, or at least good rle).
>
> That would help. Maybe you could have the decompressor collect
> statistics and periodically rebuild the table. Of course that would be
> slower.
>
yeah, the problem would be that, potentially, the statistics table could be
around 64kB or so (ok, maybe a bit less, because many combinations of
symbols are unlikely to occur). this would likely imply encoding rather
large chunks to try to avoid this cost.
however, I was not aiming for large file compression (this time), the data
in question may not be large enough to pay for the statistics table...
I don't know.
> -- Matt Mahoney
>
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/29/2005 4:31:27 AM
|
|
timings for fpaq0 /vitter ..?.
I have played with a algorithm called predator/predictor ...which is
extremely fast
and can work with the extreme approx. range coding of fapq0. The c
(that ancient language) source is at Charles Bloom's site. Decent speed
and acceptable compression.
If I try to temper that codec i will post the results of the tries.
I have this idea: In lossless codecs , the output tokens still carry
the same entropy.
[ Hidden within the modelling structures or the distribution of output
tokens]. So,
currently i am putting my centrino cpu to good use by trying to have a
fast backend
for lzrw1 to find a codec which will work on the transformed entropy
from pass 1 RW
and have a decent speed.
Mahesh Naik
|
|
0
|
|
|
|
Reply
|
Ignorant
|
10/29/2005 5:04:37 AM
|
|
"Ignorant" <shunya@vsnl.com> wrote in message
news:1130562277.415362.151310@g49g2000cwa.googlegroups.com...
>
> timings for fpaq0 /vitter ..?.
fpaq0, encode: 2.03MB/s (78.75MB in 38.72s)
fpaq0, decode: 1.71MB/s (78.75MB in 45.98s)
vitter, encode: 1.49MB/s (78.75MB in 52.72s)
vitter, decode: 11.50MB/s (78.75MB in 6.85s)
as can be noted, my algo is drastically faster than those above, but gives
signifigantly worse compression (the ratio was unexpectedly poor). likewise,
rebuilding the decode table takes long enough to somewhat effect the total
decode time for non-large files.
a possible idea could be to make the box size be 12 bits instead, and see
what all this does (less things fit per box, but the cost per box of having
things not fit is less). I may try this.
I was able to improve some on the speed of the fpaq0 algo, and gain a
several-times increase, via eliminating a division among a few other things.
I gan get somewhat better (40MB/s encode, 15MB/s decode, from what I
remember) from an order-0 range coder (increasing the order doesn't help
much with the speed, actually, it gets slow enough that I may as well just
use a slightly slower algo anyways...).
there is the idea of using shifts instead of multiplies/divides, which could
be effective wrt speed (at the cost of compression). however, presently I am
unsure exactly how this would be implemented (having not tried it).
> I have played with a algorithm called predator/predictor ...which is
> extremely fast
> and can work with the extreme approx. range coding of fapq0. The c
> (that ancient language) source is at Charles Bloom's site. Decent speed
> and acceptable compression.
>
errm, I write primarily in c.
> If I try to temper that codec i will post the results of the tries.
>
> I have this idea: In lossless codecs , the output tokens still carry
> the same entropy.
> [ Hidden within the modelling structures or the distribution of output
> tokens]. So,
> currently i am putting my centrino cpu to good use by trying to have a
> fast backend
> for lzrw1 to find a codec which will work on the transformed entropy
> from pass 1 RW
> and have a decent speed.
> Mahesh Naik
>
ok.
|
|
0
|
|
|
|
Reply
|
cr88192
|
10/29/2005 3:24:02 PM
|
|
|
12 Replies
113 Views
(page loaded in 0.212 seconds)
|