Special bitfield compression

  • Follow


Hello everyone.

I need ideas for a compression algorithm to compress a special kind of
bitfield. It contains always 400 bits and up to 30 bits are 1s, the
other bits are 0s.

My ideas are:
 -using RLE on bits
 -stop transmitting after 30 1s

The problem - only the worst case performance of this algorithm
matters. I'd be grateful for any additional ideas.

0
Reply iso8859-1 (1) 1/23/2007 6:51:13 AM

iso8859-1@gmx.de wrote:
) Hello everyone.
)
) I need ideas for a compression algorithm to compress a special kind of
) bitfield. It contains always 400 bits and up to 30 bits are 1s, the
) other bits are 0s.
)
) My ideas are:
)  -using RLE on bits
)  -stop transmitting after 30 1s
)
) The problem - only the worst case performance of this algorithm
) matters. I'd be grateful for any additional ideas.

For optimal worst-case performance, you could use an arbitrary precision
integer package, and calculate the index into the enumeration of all
possible strings.

I'll explain that last one further.  For starters, you need an accumulator
that can hold very large (arbitrary precision) integers.  Also, you need to
calculate the total number of possible combinations of bits, rounding up to
the next power of two.  This gives you how many bits you need to output.

Initialise the accumulator at 0.

At each step, you look at the current bit.  If it's a 0, do nothing.  If
it's a 1, you then calculate how many possible combinations of the
remaining bits there are if the current bit were 0.  Add this number to
the accumulator.

When you're done, output the value of the accumulator, using the number of
bits calculated previously.  (This will always be the same.)

You could use a lookup table for all the possible values from
400-choose-30 to 400-choose-1 to 30-choose-30, that's less than 12000
values.  I've no idea how fast it is to calculate those as opposed to
looking them up though.

I hope I explained it clearly enough ?  If you want, I can do an example in
smaller scale when I get the time.


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
0
Reply Willem 1/23/2007 9:49:23 AM


> I need ideas for a compression algorithm to compress a special kind of
> bitfield. It contains always 400 bits and up to 30 bits are 1s, the
> other bits are 0s.
If this is all you know about your data, then the best you can do is to
use an arithmetic coder with a static model using the fixed
probabilities p('1')=0.075 and p('0')=0.925. This should give you an
output of ~153.7 bits.

Note: many existing compressors will add additional information to the
output (e.g. in form of a header) which can be quite a lot in relation
to such small data sets. So you will either need to create your own
fixed (i.e. without the need for a header) compressor or you can try
David Scott's bijective arithmetic compressor.

> The problem - only the worst case performance of this algorithm
> matters. I'd be grateful for any additional ideas.
As long as you don't compress millions of such bitfields, I would not
care about the performance. However, you could also fall back to a
huffman coder with the probabilities recalculated for e.g. whole input
bytes...

I hope this helps.

   Christian

0
Reply Christian 1/23/2007 9:50:08 AM

Christian wrote:

> If this is all you know about your data, then the best you can do is to
> use an arithmetic coder with a static model using the fixed
> probabilities p('1')=0.075 and p('0')=0.925. This should give you an
> output of ~153.7 bits.

A coder with adaptive probabilities would do better. Ie. probability for 
1 is (30 - previous 1s) / (remaining number of bits). The difference in 
entropy with your model is only around 4 bits though.

Either way, arithmetic coders aren't perfect ... proving which sequence 
gives worst case performance is tricky. At a gamble I would say it's 341 
0s followed by 30 1s with the adaptive model, and 370 0s with your 
static one.

With enumerative coding like Willem suggests you might have an easier 
time proving worst case performance. I must admit I don't immediately 
see how to get back from the 150 bit number to a sequence though.

Marco
0
Reply Marco 1/23/2007 10:10:53 AM

Marco Al wrote:

Oops, nevermind ... missed the past that there could be less than 30 1s.

Marco
0
Reply Marco 1/23/2007 10:13:29 AM

Marco wrote:
) With enumerative coding like Willem suggests you might have an easier 
) time proving worst case performance. I must admit I don't immediately 
) see how to get back from the 150 bit number to a sequence though.

Well, basically you follow the same steps as when encoding:

You start with the encoded number, and at each step you calculate the
number of possibilities for the remaining bits, assuming a 0.  When the
accumulator is smaller than that, output a 0, otherwise output a 1 and
subtract it from the accumulator.


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
0
Reply Willem 1/23/2007 10:21:38 AM

Willem wrote:
> Also, you need to
> calculate the total number of possible combinations of bits, rounding up to
> the next power of two.  This gives you how many bits you need to output.

Just for kicks, here it is (took a few seconds to type into
Mathematica):

Sum[Binomial[400, n], {n, 0, 30}]
1549835246319296875396604627597789530400430472

Log[2, %] // N
150.119

So the best you can do would be 151 bits for the shortest maximum code
length.

Mark Adler

0
Reply Mark 1/23/2007 3:19:01 PM

Mark wrote:
) Willem wrote:
)> Also, you need to
)> calculate the total number of possible combinations of bits, rounding up to
)> the next power of two.  This gives you how many bits you need to output.
)
) Just for kicks, here it is (took a few seconds to type into
) Mathematica):
)
) Sum[Binomial[400, n], {n, 0, 30}]
) 1549835246319296875396604627597789530400430472
)
) Log[2, %] // N
) 150.119
)
) So the best you can do would be 151 bits for the shortest maximum code
) length.

152 bits is 19 bytes, and 160 bits is 20 bytes...
So if you absolutely need to store it in 19, go for the more difficult
exact approach, if you can handle 20, use an arithcoder with fixed 0/1
probabilities, to get a worst case of 154 bits, which means you have
6 bits of play (you need a few for arithcoding).


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
0
Reply Willem 1/23/2007 3:28:51 PM


On Jan 22, 11:51 pm, iso885...@gmx.de wrote:
> Hello everyone.
>
> I need ideas for a compression algorithm to compress a special kind of
> bitfield. It contains always 400 bits and up to 30 bits are 1s, the
> other bits are 0s.
>
> My ideas are:
>  -using RLE on bits
>  -stop transmitting after 30 1s
>
> The problem - only the worst case performance of this algorithm
> matters. I'd be grateful for any additional ideas.

It would be a trival matter to modify a very simple arithmetic coder to
 use for
a count of 370 zeroes and 30 ones it would use  150 bits max
  if  you allow for 0 to 30 ones then you could encode that fact at the
start.
let a 0 mean 30 bits are one that would give 151 bits max for the
encoding
a 10 would mean 29 bits max and so on.  The fewer the number of actual
ones
the shorter the encoded string.  You could search for a way with few
cases ending
in 151 bits but  why bother since  if you need to store this as data
you could 
use 19 byte fixed size records to handle the problem.

0
Reply davvid_a_scott 1/24/2007 6:02:31 AM

iso8859-1@gmx.de wrote:
> Hello everyone.
> 
> I need ideas for a compression algorithm to compress a special kind of
> bitfield. It contains always 400 bits and up to 30 bits are 1s, the
> other bits are 0s.
> 
> My ideas are:
>  -using RLE on bits
>  -stop transmitting after 30 1s
> 
> The problem - only the worst case performance of this algorithm
> matters. I'd be grateful for any additional ideas.
> 

I know I'm way late on this, but consider starting with merely encoding 
the positions of the 1 bits.  Then encode the difference - 1 between the 
bits.  You can use a variable length code depending on the remaining 
size of the data (initially 9 bits, 8 bits when you get to the halfway 
point, etc.).  I haven't run the numbers, but I suspect that this method 
would run close (on average) to the 150 bits option provided by Willem 
(assuming worst-case always 9 bits, you get 275 bits - 5 for the count, 
270 for the 9 bit deltas, or 14 bits - 5 for the count, 9 for the 1 entry).

  - Josiah
0
Reply Josiah 2/14/2007 8:29:38 AM

On Feb 14, 12:29 am, Josiah Carlson <josiah.carl...@sbcglobal.net>
wrote:
> iso885...@gmx.de wrote:
> > Hello everyone.
>
> > I need ideas for acompressionalgorithmto compress a special kind of
> > bitfield. It contains always 400 bits and up to 30 bits are 1s, the
> > other bits are 0s.
>
> > My ideas are:
> >  -using RLE on bits
> >  -stop transmitting after 30 1s
>
> > The problem - only the worst case performance of thisalgorithm
> > matters. I'd be grateful for any additional ideas.
>
> I know I'm way late on this, but consider starting with merely encoding
> the positions of the 1 bits.  Then encode the difference - 1 between the
> bits.  You can use a variable length code depending on the remaining
> size of the data (initially 9 bits, 8 bits when you get to the halfway
> point, etc.).  I haven't run the numbers, but I suspect that this method
> would run close (on average) to the 150 bits option provided by Willem
> (assuming worst-case always 9 bits, you get 275 bits - 5 for the count,
> 270 for the 9 bit deltas, or 14 bits - 5 for the count, 9 for the 1 entry).
>
>   - Josiah

Hi,
With input 400 bits ( max 30 bits '1'),
It is possible to compress to 245 bits or less.

Tam

0
Reply chutamhung 2/28/2007 6:35:18 AM

10 Replies
97 Views

(page loaded in 0.122 seconds)

6/16/2013 7:13:24 AM


Reply: