
### Special bitfield compression


```
Hello everyone.

I need ideas for a compression algorithm to compress a special kind of
bitfield. It contains always 400 bits and up to 30 bits are 1s, the
other bits are 0s.

My ideas are:
-using RLE on bits
-stop transmitting after 30 1s

The problem - only the worst case performance of this algorithm
matters. I'd be grateful for any additional ideas.

```
Reply iso8859-1 (9) 1/23/2007 6:51:13 AM


```
iso8859-1@gmx.de wrote:
) Hello everyone.
)
) I need ideas for a compression algorithm to compress a special kind of
) bitfield. It contains always 400 bits and up to 30 bits are 1s, the
) other bits are 0s.
)
) My ideas are:
)  -using RLE on bits
)  -stop transmitting after 30 1s
)
) The problem - only the worst case performance of this algorithm
) matters. I'd be grateful for any additional ideas.

For optimal worst-case performance, you could use an arbitrary precision
integer package, and calculate the index into the enumeration of all
possible strings.

I'll explain that further.  For starters, you need an accumulator
that can hold very large (arbitrary precision) integers.  Also, you need to
calculate the total number of possible combinations of bits, rounding up to
the next power of two.  This gives you how many bits you need to output.

Initialise the accumulator at 0.

At each step, you look at the current bit.  If it's a 0, do nothing.  If
it's a 1, you then calculate how many possible combinations of the
remaining bits there are if the current bit were 0.  Add this number to
the accumulator.

When you're done, output the value of the accumulator, using the number of
bits calculated previously.  (This will always be the same.)

You could use a lookup table for all the binomials you need - n-choose-k
for n up to 400 and k up to 30 - which is less than 12000 values.  I've
no idea how fast it is to calculate those as opposed to looking them up,
though.

I hope I explained it clearly enough?  If you want, I can do an example
on a smaller scale when I get the time.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
```
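Willem's encoding walk can be sketched in Python. This is my own sketch under the thread's parameters (400 bits, at most 30 ones), not code from the thread; `count` and `encode` are invented names.

```python
import math

N, K = 400, 30  # field width and the maximum number of 1s

def count(remaining, max_ones):
    """How many `remaining`-bit strings contain at most `max_ones` 1s."""
    return sum(math.comb(remaining, j) for j in range(min(remaining, max_ones) + 1))

def encode(bits):
    """Rank of `bits` in the lexicographic enumeration of all
    N-bit strings with at most K ones."""
    acc, ones_left = 0, K
    for i, b in enumerate(bits):
        if b:
            # every string with the same prefix but a 0 here sorts earlier:
            # skip past all of its possible completions
            acc += count(N - i - 1, ones_left)
            ones_left -= 1
    return acc

total = count(N, K)        # number of admissible fields
print(total.bit_length())  # 151 -> fixed output width in bits
```

The rank is always in `[0, total)`, so a fixed 151-bit output suffices, matching the numbers worked out later in the thread.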

```
> I need ideas for a compression algorithm to compress a special kind of
> bitfield. It contains always 400 bits and up to 30 bits are 1s, the
> other bits are 0s.
If this is all you know about your data, then the best you can do is to
use an arithmetic coder with a static model using the fixed
probabilities p('1')=0.075 and p('0')=0.925. This should give you an
output of ~153.7 bits.

However, most general-purpose compressors add some overhead to the
output (e.g. in form of a header) which can be quite a lot in relation
to such small data sets. So you will either need to create your own
fixed (i.e. without the need for a header) compressor or you can try
David Scott's bijective arithmetic compressor.

> The problem - only the worst case performance of this algorithm
> matters. I'd be grateful for any additional ideas.
As long as you don't compress millions of such bitfields, I would not
care about the performance. However, you could also fall back to a
Huffman coder with the probabilities recalculated for e.g. whole input
bytes...

I hope this helps.

Christian

```
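Christian's ~153.7 figure is 400 times the binary entropy of p = 0.075, which is easy to cross-check. This is my own verification, not code from the thread:

```python
import math

# worst-case fraction of ones in the 400-bit field
p = 30 / 400  # 0.075

# cost in bits per symbol under a static model with P('1') = p
h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
total_bits = 400 * h
print(round(total_bits, 1))  # 153.7
```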

```
Christian wrote:

> If this is all you know about your data, then the best you can do is to
> use an arithmetic coder with a static model using the fixed
> probabilities p('1')=0.075 and p('0')=0.925. This should give you an
> output of ~153.7 bits.

A coder with adaptive probabilities would do better, i.e. the probability
of a 1 is (30 - previous 1s) / (remaining number of bits). The difference in
entropy with your model is only around 4 bits though.

Either way, arithmetic coders aren't perfect ... proving which sequence
gives worst case performance is tricky. At a gamble I would say it's 341
0s followed by 30 1s with the adaptive model, and 370 0s with your
static one.

With enumerative coding like Willem suggests you might have an easier
time proving worst case performance. I must admit I don't immediately
see how to get back from the 150 bit number to a sequence though.

Marco
```

```
Marco Al wrote:

Oops, nevermind ... missed the part that there could be fewer than 30 1s.

Marco
```

```
Marco wrote:
) With enumerative coding like Willem suggests you might have an easier
) time proving worst case performance. I must admit I don't immediately
) see how to get back from the 150 bit number to a sequence though.

Well, basically you follow the same steps as when encoding:

You start with the encoded number, and at each step you calculate the
number of possibilities for the remaining bits, assuming a 0.  When the
accumulator is smaller than that, output a 0, otherwise output a 1 and
subtract it from the accumulator.

SaSW, Willem
```
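Willem's decoding walk can be sketched the same way. Again a sketch under the thread's parameters; `count` and `decode` are invented names.

```python
import math

N, K = 400, 30  # field width and the maximum number of 1s

def count(remaining, max_ones):
    """How many `remaining`-bit strings contain at most `max_ones` 1s."""
    return sum(math.comb(remaining, j) for j in range(min(remaining, max_ones) + 1))

def decode(code):
    """Invert the enumeration: turn a rank back into the bit string."""
    bits, ones_left = [], K
    for i in range(N):
        c = count(N - i - 1, ones_left)  # completions if this bit were 0
        if code < c:
            bits.append(0)
        else:
            bits.append(1)               # skip past the 0-branch...
            code -= c                    # ...and re-base the remaining rank
            ones_left -= 1
    return bits

print(decode(0) == [0] * N)  # True: rank 0 is the all-zero field
```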

```
Willem wrote:
> Also, you need to
> calculate the total number of possible combinations of bits, rounding up to
> the next power of two.  This gives you how many bits you need to output.

Just for kicks, here it is (took a few seconds to type into
Mathematica):

Sum[Binomial[400, n], {n, 0, 30}]
1549835246319296875396604627597789530400430472

Log[2, %] // N
150.119

So the best you can do would be 151 bits for the shortest maximum code
length.

```
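Mark's Mathematica computation can be reproduced in Python with `math.comb`. This is my own cross-check, not part of the thread:

```python
import math

# total number of 400-bit fields with at most thirty 1s
total = sum(math.comb(400, n) for n in range(31))
print(total)                     # matches the Mathematica sum quoted above
print(math.log2(total))          # ~150.119
print((total - 1).bit_length())  # 151 bits needed for a fixed-width code
```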

```
Mark wrote:
) Willem wrote:
)> Also, you need to
)> calculate the total number of possible combinations of bits, rounding up to
)> the next power of two.  This gives you how many bits you need to output.
)
) Just for kicks, here it is (took a few seconds to type into
) Mathematica):
)
) Sum[Binomial[400, n], {n, 0, 30}]
) 1549835246319296875396604627597789530400430472
)
) Log[2, %] // N
) 150.119
)
) So the best you can do would be 151 bits for the shortest maximum code
) length.

152 bits is 19 bytes, and 160 bits is 20 bytes...
So if you absolutely need to store it in 19, go for the more difficult
exact approach, if you can handle 20, use an arithcoder with fixed 0/1
probabilities, to get a worst case of 154 bits, which means you have
6 bits of play (you need a few for arithcoding).

SaSW, Willem
```

```
On Jan 22, 11:51 pm, iso885...@gmx.de wrote:
> Hello everyone.
>
> I need ideas for a compression algorithm to compress a special kind of
> bitfield. It contains always 400 bits and up to 30 bits are 1s, the
> other bits are 0s.
>
> My ideas are:
>  -using RLE on bits
>  -stop transmitting after 30 1s
>
> The problem - only the worst case performance of this algorithm
> matters. I'd be grateful for any additional ideas.

It would be a trivial matter to modify a very simple arithmetic coder:
for a count of 370 zeroes and 30 ones it would use 150 bits max.
If you allow for 0 to 30 ones, then you could encode that fact at the
start: let a 0 mean 30 bits are ones, which would give 151 bits max for
the encoding; a 10 would mean 29 ones max, and so on.  The fewer the
number of actual ones, the shorter the encoded string.  You could search
for a way with fewer cases ending in 151 bits, but why bother, since if
you need to store this as data you could use 19-byte fixed-size records
to handle the problem.

```
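The count-prefix idea above can be checked numerically. This sketch is my own; it assumes a plain unary prefix and an exact enumeration index for the placements, details the post leaves open:

```python
import math

def code_len(k):
    """Worst-case bits for a field with exactly k ones: a unary prefix
    giving the count ("0" -> 30 ones, "10" -> 29 ones, and so on),
    followed by an index selecting one of the C(400, k) placements."""
    prefix = 31 - k  # k = 30 costs 1 bit, k = 0 costs 31
    if k == 0:
        return prefix  # the count alone pins down the all-zero field
    index = (math.comb(400, k) - 1).bit_length()  # exact ceil(log2 C(400, k))
    return prefix + index

print(code_len(30))                         # 1 + 150 = 151
print(max(code_len(k) for k in range(31)))  # worst case over all counts: 151
```

The maximum is hit at exactly 30 ones, confirming the 151-bit worst case claimed above; fewer ones always yield shorter codes.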

```
iso8859-1@gmx.de wrote:
> Hello everyone.
>
> I need ideas for a compression algorithm to compress a special kind of
> bitfield. It contains always 400 bits and up to 30 bits are 1s, the
> other bits are 0s.
>
> My ideas are:
>  -using RLE on bits
>  -stop transmitting after 30 1s
>
> The problem - only the worst case performance of this algorithm
> matters. I'd be grateful for any additional ideas.
>

I know I'm way late on this, but consider starting with merely encoding
the positions of the 1 bits.  Then encode the difference minus 1 between
successive positions.  You can use a variable-length code depending on
the remaining size of the data (initially 9 bits, 8 bits when you get to
the halfway point, etc.).  I haven't run the numbers, but I suspect this
method would run close (on average) to the 150-bit option provided by
Willem (assuming worst-case always 9 bits, you get 275 bits - 5 for the
count, 270 for the 9-bit deltas; or 14 bits - 5 for the count, 9 for a
single entry).

- Josiah
```
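Josiah's position-list idea, in its simplest fixed-width form, looks like this. A sketch with invented names; his variable-width refinement shrinks the later fields, which this version omits:

```python
def encode_positions(bits):
    """5-bit count of 1s, then each 1-position as a fixed 9-bit field."""
    ones = [i for i, b in enumerate(bits) if b]
    out = format(len(ones), "05b")  # counts up to 30 fit in 5 bits
    for pos in ones:
        out += format(pos, "09b")   # positions 0..399 fit in 9 bits
    return out

def decode_positions(code):
    """Invert: read the count, then each 9-bit position."""
    n = int(code[:5], 2)
    bits = [0] * 400
    for i in range(n):
        bits[int(code[5 + 9 * i : 14 + 9 * i], 2)] = 1
    return bits

field = [0] * 370 + [1] * 30            # worst case: all 30 ones present
code = encode_positions(field)
print(len(code))                        # 5 + 30*9 = 275 bits
print(decode_positions(code) == field)  # True
```

The 275-bit worst case matches the figure above; the payoff of the scheme is on sparse inputs, where the output shrinks to 5 + 9k bits for k ones.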

```
On Feb 14, 12:29 am, Josiah Carlson <josiah.carl...@sbcglobal.net>
wrote:
> iso885...@gmx.de wrote:
> > Hello everyone.
>
> > I need ideas for a compression algorithm to compress a special kind of
> > bitfield. It contains always 400 bits and up to 30 bits are 1s, the
> > other bits are 0s.
>
> > My ideas are:
> >  -using RLE on bits
> >  -stop transmitting after 30 1s
>
> > The problem - only the worst case performance of this algorithm
> > matters. I'd be grateful for any additional ideas.
>
> I know I'm way late on this, but consider starting with merely encoding
> the positions of the 1 bits.  Then encode the difference - 1 between the
> bits.  You can use a variable length code depending on the remaining
> size of the data (initially 9 bits, 8 bits when you get to the halfway
> point, etc.).  I haven't run the numbers, but I suspect that this method
> would run close (on average) to the 150 bits option provided by Willem
> (assuming worst-case always 9 bits, you get 275 bits - 5 for the count,
> 270 for the 9 bit deltas, or 14 bits - 5 for the count, 9 for the 1 entry).
>
>   - Josiah

Hi,
With a 400-bit input (max 30 '1' bits), it is possible to compress to
245 bits or less.

Tam

```

10 Replies
111 Views


Hi all, I've been banging my head against a wall trying to use Image::Magick to generate jpeg compressed thumbnails. Here's latest iteration of the code I've been using; my \$image = Image::Magick->new; \$image->Read(filename=>\$source); \$image->Resize(width=>400, height=>300); \$image->Write(filename=>\$output); \$image->Resize(width=>100, height=>75); \$image->Set(Quality=>10); \$image->Write(filename=>\$thumb, compression=>'JPEG'); undef \$image; Basically, I'm trying to read a source image, resize to 400x300...'s latest iteration of the code I've been using; > my \$image = Image::Magick->new; > \$image->Read(filename=>\$source); > \$image->Resize(width=>400, height=>300); > \$image->Write(filename=>\$output); > \$image->Resize(width=>100, height=>75); > \$image->Set(Quality=>10); > \$image->Write(filename=>\$thumb, compression=>'JPEG'); > undef \$image; > > Basically, I'm trying to read a source image, resize to 400x300 and > write that, then resize to 100x75 and then write that. I notice