Re: A Random Data Compression method -draft #4

Another interesting idea of move to front is instead of encoding value's 
like zero's.... and then later passing huffman over it...

How about directly outputting codes...

Instead of a value table and position table and such... there would be an 
encoding table for the move to front...

the front positions would have the shortest codes... and the later positions 
longer codes...

Decoding happens in opposite... longer codes are looked up... the value 
outputted... and then moved to the front to get a shorter code next time...

Added benefit of this new idea is that outputted codes immediatly get 
shorter codes on the next try...

while with static huffman and perhaps even dynamic huffman it would still 
depend on the ultimate frequencies but not so with move to front...

Hmm kinda interesting... this could also be combined with bubble to front... 
for a quicker implementation but would probably be less compression 

Only problem remaining is what prefixes/codes to give the the element of the 
move to front ? that's where dynamic huffman uses a tree to adept to 

and that's why huffman works nicely normally...

With move to front the prefixes could be a simple list for starters:

always ends with a one.


Other idea:

reserve position 0 as a special pattern/compression pattern.

move to front position zero is outputted as 0.
move to front position x is outputted as 1 + value.

This way re-occuring patterns always get the short 0 code.

Other variations could be:

position 0 is outputted as 1
position 1 is outputted as 01
position 2 is outputted as 001
position 3 is outputted as 0001
position 4 is outputted as 00001
position 5 is outputted as 000001
position 6 is outputted as 0000001
position 7 and higher is outputted as 00000001 + value.

This Skybuck's Universal Code can be used for the prefixes.

Savings are achieved when positions 0 to 6 are hit.

Bits are doubled for 7 and higher.

For position 7 and higher cases... perhaps a huffman code could be used... 
but then again that might be used for position 1 to 6 as well... and 
scrapping the idea. and only using position 0.

Let's see original idea:

Random input:

1  2  3  4  5  6  7  8  9  10 11 12
11 10 10 01 01 10 11 01 01 01 01 01

2 bit
[0, 101, 110, 111]

First input:

First output:

Table changed:
[100,101,110, 0]

Remarkable discovery: move to front is not necessary to move table 
forward... why would you do that ? it apprently assumes values close to each 
other but for random probablt doesn't make sense... so I am just gonna swap 
;) :) like he says ;)
Now I am just gonna modify it... add a bit in front of it and zero the other 
one ;)

Second input:

Second output:

Table changed:

[100,101,0, 111]

Third input:

Third output:

Table stay same.

Fourth input:

Fouth output:

Table changed:
[100, 0, 110, 111]

Well it's easy to see where this goes:

Output5: 0
Output6: 110
Output7: 111
Output8: 101
Output9: 0
Output10: 0
Output11: 0
Output12: 0

Final result:

Compared to input:

No saving so far ;)

Perhaps table can be encoded more efficient:

0 is reserved for least occurence.
10 is for second
110 is for third
111 is for fourth...

using this new table... it should be possible to get some kind of saving.

Reading 0 means efficient code.
Reading 1 means read next, if next is zero stop.
Reading 1 means read next, if next is one continue.
Reading 1 means read next, if next is one continue, if next is zero stop.
Reading 1 means read next, if next is one continue, if next is one stop.

So decoding this without conflict should be possible.

Not gonna do a full example again... I've seen enough ;)

Maybe later I gotta go eat ;)


1/10/2011 7:26:07 PM
comp.compression 4696 articles. 0 followers. Post Follow

0 Replies

Similar Articles

[PageSpeed] 48