### Is ratio of set and reset bits an indicator of compressibility?

```Is ratio of set and reset bits an indicator of compressibility?

I'm working on and exploring a binary transform and am forming an
opinion on what qualities are important to test for.

So far I believe that a ratio between set and reset bits is a general
indicator of compressibility. This I have understood from reading this
group.

Is this a correct notion?

I'm generating files of length 415241 bytes and testing for ratios of
set and reset bits.  Currently I'm saving each file that generates a larger
ratio than the last; the difference is in the neighborhood of 8000 bits at
the moment.

```
It is, as it implies a bias in the input (and clearly would allow an
arithmetic compressor with two symbols to compress the file), but it's
very weak.  Consider, for example, a file full of uppercase Z's: the
number of zero and one bits will be identical, which would imply low
compressibility, while it's hard to imagine a file that would compress
better.

BTW, you're looking for the general concept of entropy.  See:

http://en.wikipedia.org/wiki/Entropy_(information_theory)
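
The "file full of Z's" case can be checked directly. The following is only a rough sketch, not part of the original reply: it assumes Python's standard `zlib` module as a stand-in for "a compressor", and the helper name `bit_counts` is made up for illustration. The file length is the one mentioned in the question.

```python
import zlib

def bit_counts(data):
    """Return (set bits, reset bits) for a byte string."""
    ones = sum(bin(b).count("1") for b in data)
    zeros = 8 * len(data) - ones
    return ones, zeros

# 'Z' is 0x5A = 01011010 in binary: four set bits and four reset bits.
data = b"Z" * 415241

ones, zeros = bit_counts(data)
compressed = zlib.compress(data, 9)  # zlib used here only as an example compressor

print("set bits:       ", ones)
print("reset bits:     ", zeros)
print("original size:  ", len(data))
print("compressed size:", len(compressed))
```

The bit counts come out exactly equal, yet the compressed output is a tiny fraction of the input, so a balanced bit ratio on its own says little about compressibility.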
```
```On 17.03.2011 20:07, Ernst Berg wrote:
>
> Is ratio of set and reset bits an indicator of compressibility?

Partially. Having a bias of 0's with respect to 1's indicates
compressibility, i.e. a non-maximal zero-order entropy. But the reverse does
not hold, i.e. just because a source has maximal zero-order entropy does
not mean it is not compressible. It could have non-maximal first-order
entropy. Consider a bit sequence generated by an unbiased Bernoulli source, and
transform it by duplicating each bit in the input sequence, gaining an
output twice as long. Clearly, the ratio of 1 to 0 should be
approximately 50/50, but yet this expanded sequence is compressible.

The proper notion for "compressibility" (in terms of the Shannon
theorem) is to consider blocks of length m, compute the entropy of these
blocks divided by the block length, and then make the blocks larger and
larger. If *that* limit is non-maximal (below one bit per input bit), the
source is compressible - if and only if. This is Shannon's lossless source
coding theorem.

In practical terms, you cannot, of course, make the block size
arbitrarily large, and still get some *useful* statistical information
from it.

Greetings,
Thomas

```
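
The bit-doubling example and the block-entropy idea from the post above can be tried numerically. This is a minimal sketch under some assumptions: Python's `random` module stands in for the unbiased Bernoulli source, blocks are taken non-overlapping and aligned to the start of the sequence, and `block_entropy_per_bit` is a hypothetical helper name, not something from the thread. The empirical estimate is only meaningful while the number of blocks is much larger than 2^m, which is the practical limit mentioned in the post.

```python
import math
import random
from collections import Counter

def block_entropy_per_bit(bits, m):
    """Empirical entropy of non-overlapping m-bit blocks, divided by m."""
    blocks = [tuple(bits[i:i + m]) for i in range(0, len(bits) - m + 1, m)]
    counts = Counter(blocks)
    total = len(blocks)
    h = -sum(c / total * math.log2(c / total) for c in counts.values())
    return h / m

random.seed(0)  # fixed seed so the run is repeatable

# Stand-in for an unbiased Bernoulli source.
source = [random.getrandbits(1) for _ in range(100_000)]

# Duplicate every bit: 0 -> 00, 1 -> 11. Output is twice as long.
doubled = [b for b in source for _ in range(2)]

for m in (1, 2, 4, 8):
    print(m,
          round(block_entropy_per_bit(source, m), 3),
          round(block_entropy_per_bit(doubled, m), 3))
```

For m = 1 both sequences look maximally random, close to 1 bit per bit, which is all the set/reset ratio can see. From m = 2 onward the doubled sequence drops to roughly 0.5 bits per bit, while the original source stays near 1.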
```On Thu, 17 Mar 2011 21:59:29 -0700, robertwessel2@yahoo.com wrote:

Thank you.

```
```On Fri, 18 Mar 2011 10:12:40 +0100, Thomas Richter wrote:

Thank you.

```
