
### okay, serious stuff now


```
I need a method to examine a vector of data and estimate its
compressibility.  Several months ago, I worked on this with one person
from here, but he went to China and, apparently, was lost to the
world.  Forever?  I don't know.

Anyway, here's what I need.

Assume a vector with known width and length.

Now, it's possible to examine the tokens and do a:

sort <ascii_tokens_as_nbrs | uniq -c | sort -rn

With this (or its equivalent in C), one can estimate
the cost by charging one bit for the most frequent
token, two bits for the next, and so on.

The problem is, this method isn't accurate.

The estimate wasn't 20% off from the actual compressed
size; it was typically off by a factor of two or so.  Plus
unreliable and unusable.

The need I have for this application component is such
that I can tolerate inaccuracy but need stability and
monotonicity;  i.e., figures of merit for one file compared
to another must retain the same size relationship when
those files are actually compressed.

Also, I have an ad on the internet now, looking for help
connecting bzip2 to my program.  Ditto gzip.

```
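For concreteness, the rank-based costing described above (one bit for the most frequent token, two for the next, and so on) might be sketched like this in Python; the function name and the per-rank charging are my reading of the post, not the original code:

```python
from collections import Counter

def rank_cost_bits(data: bytes) -> int:
    """Charge each occurrence of a byte as many bits as its frequency
    rank: 1 bit for the most frequent byte, 2 for the next, and so on
    (the scheme described in the post)."""
    counts = Counter(data)
    # Most frequent first, like `sort | uniq -c | sort -rn` would print.
    ranked = [n for _, n in counts.most_common()]
    return sum((rank + 1) * n for rank, n in enumerate(ranked))

print(rank_cost_bits(b"aaaaaaab"))  # 1*7 + 2*1 = 9 bits
```

Charging rank+1 bits per symbol amounts to a unary-style code; a Huffman or arithmetic coder approaches the entropy instead, which is consistent with the factor-of-two error the post reports.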

```
jules Gilbert wrote:
> I need a method to examine a vector of data and estimate its
> compressibility.

For that, please first identify the problem. Do you have a random source
that generates vectors in R^n, and you want to compress (on average) the
family of vectors generated by that source? In that case, H_n, the
n-th order block entropy, provides an estimate. It is accurate if
the source vectors are independent of each other; otherwise a
higher-order entropy must be used, and the source is more compressible
than H_n states.

Or are you talking about a *single* vector? Then, the question makes no
sense since you cannot define a data model from a single event. In order
to compress on average, compressible vectors must be "typical", but for
that it must be known how vectors "usually" look like. Without that
information, nothing can be said.

For example, if N=8 and vectors are built from a binary alphabet, then
the vectors 00000000 and 11111111 may "look" more compressible, and
runlength coding might seem feasible, though it might not be. If your
source is a first-order Markov chain such that the probability of
finding the *alternate* symbol next is higher than that of finding the
same symbol again, runlength coding is a very stupid idea, and "half" of
the first-order Markov sources look like this. Again, the *source* defines
the compressibility, not the samples. Samples don't matter.

So long,
Thomas
```
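Thomas's n-th order block entropy H_n can be estimated from sample vectors by counting non-overlapping n-symbol blocks; a minimal sketch (the function and parameter names are mine):

```python
import math
from collections import Counter

def block_entropy(samples, n):
    """Empirical order-n block entropy in bits per block: count
    non-overlapping n-symbol blocks across the samples and compute
    -sum p * log2(p) over the observed blocks."""
    blocks = Counter()
    for s in samples:
        for i in range(0, len(s) - n + 1, n):
            blocks[s[i:i + n]] += 1
    total = sum(blocks.values())
    return -sum((c / total) * math.log2(c / total) for c in blocks.values())

# Two equally frequent 8-symbol vectors give H_8 = 1 bit per block:
print(block_entropy([b"\x00" * 8, b"\xff" * 8], n=8))  # 1.0
```

As the post says, this is only accurate if the vectors are independent; dependencies between vectors make the source more compressible than H_n suggests.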

```
On May 22, 8:47 pm, jules Gilbert <jules.sto...@gmail.com> wrote:
> I need a method to examine a vector of data and estimate its
> compressibility.  Several months ago, I worked on this with one person
> from here, but he went to China and, apparently, was lost to the
> world.  Forever?  I don't know.
>

But I'm here; may I know the stuff you invented?

> Anyway, here's what I need.
>
> Assume a vector with known width and length.
>
> Now, it's possible to examine the tokens and do a:
>
> sort <ascii_tokens_as_nbrs | uniq -c | sort -rn
>
> With this (or its equivalent in C), one can estimate
> the cost by charging one bit for the most frequent
> token, two bits for the next, and so on.
>
> The problem is, this method isn't accurate.
>
> The estimate wasn't 20% off from the actual compressed
> size; it was typically off by a factor of two or so.  Plus
> unreliable and unusable.
>
> The need I have for this application component is such
> that I can tolerate inaccuracy but need stability and
> monotonicity;  i.e., figures of merit for one file compared
> to another must retain the same size relationship when
> those files are actually compressed.
>
> Also, I have an ad on the internet now, looking for help
> connecting bzip2 to my program.  Ditto gzip.

greetings
nimo
_____________

I. When a distinguished but elderly scientist states that some-
thing is possible, he is almost certainly right. When he states that
something is impossible, he is very probably wrong.

II. The only way of discovering the limits of the possible is to
venture a little way past them into the impossible.

III. Any sufficiently advanced technology is indistinguishable from
magic.

-- Arthur C. Clarke's three laws

```

```
On May 23, 12:47 am, jules Gilbert <jules.sto...@gmail.com> wrote:
> I need a method to examine a vector of data and estimate its
> compressibility.  Several months ago, I worked on this with one person
> from here, but he went to China and, apparently, was lost to the
> world.  Forever?  I don't know.

So... You can't code this, but coded a magic compression tool?

> Anyway, here's what I need.
> Assume a vector with known width and length.
> Now, it's possible to examine the tokens and do a:
> sort <ascii_tokens_as_nbrs | uniq -c | sort -rn
>
> With this (or its equivalent in C), one can estimate
> the cost by charging one bit for the most frequent
> token, two bits for the next, and so on.

Why not log2(p)?

See http://www.fourmilab.ch/random/

André
```
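André's log2(p) point: instead of charging tokens by rank, charge each token -log2 of its relative frequency; the sum is the zeroth-order entropy estimate. A sketch (the function name is mine):

```python
import math
from collections import Counter

def entropy_bits(data: bytes) -> float:
    """Estimated coded size in bits when each byte costs -log2(p) bits,
    p being its relative frequency in the data (zeroth-order model)."""
    total = len(data)
    return sum(-n * math.log2(n / total) for n in Counter(data).values())

# Four a's and four b's: each byte costs exactly 1 bit, 8 bits total.
print(entropy_bits(b"aaaabbbb"))  # 8.0
```

This matches what a Huffman or arithmetic coder achieves under a memoryless model, but it still misses repeated sequences, which is why real compressors can beat it on structured data.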

```
On May 25, 6:43 pm, André <andrebacci.lis...@gmail.com> wrote:
> On May 23, 12:47 am, jules Gilbert <jules.sto...@gmail.com> wrote:
>
> > I need a method to examine a vector of data and estimate its
> > compressibility.  Several months ago, I worked on this with one person
> > from here, but he went to China and, apparently, was lost to the
> > world.  Forever?  I don't know.
>
> So... You can't code this, but coded a magic compression tool?

Yes, please note.  This is such an interesting request from a person
who has publicly posted here that none of us are any good at
programming, that he is a better programmer than all of us combined.
Yet after more than ten years of posting these claims he has *NOT* been
able to supply a working compressor/decompressor in all that time.
```

```
On May 22, 11:47 pm, jules Gilbert <jules.sto...@gmail.com> wrote:
> I need a method to examine a vector of data and estimate its
> compressibility.  Several months ago, I worked on this with one person
> from here, but he went to China and, apparently, was lost to the
> world.  Forever?  I don't know.

Why don't you program it yourself?  You already claimed to be smarter
than everyone else here.  You have already claimed that you were a
better programmer than everyone else here.  And you have already
stated that people here have blinders on and don't know how to design
new code.

As I have been saying for the last ten-plus years, shut up and write the
code, then come back.  So far you have done NOTHING!
```

```
On May 26, 12:31 pm, earlcolby.pottin...@sympatico.ca wrote:
> On May 22, 11:47 pm, jules Gilbert <jules.sto...@gmail.com> wrote:
>
> > I need a method to examine a vector of data and estimate its
> > compressibility.  Several months ago, I worked on this with one person
> > from here, but he went to China and, apparently, was lost to the
> > world.  Forever?  I don't know.
>
> Why don't you program it yourself?  You already claimed to be smarter
> than everyone else here.  You have already claimed that you were a
> better programmer than everyone else here.  And you have already
> stated that people here have blinders on and don't know how to design
> new code.
>
> As I have been saying for the last ten-plus years, shut up and write the
> code, then come back.  So far you have done NOTHING!

First,

I have never claimed to have discovered new science or to have proven
existing science impossible.  You folks are much too casual in your
remarks.

What I have done is use our existing programming technology to develop
several classes of repeatable file/message compressors.

And for those who don't get it or for another reason can't make the
connection, here it is again.

I wanted a tool that would measure a program and estimate its
compressibility wrt conventional compressors.

Once I saw (one more time) that I wasn't going to get any real help
from you folks, I resorted to solving a different problem, one
sufficient for my needs.  I wrote a program that, given two files,
decides which program is more compressible compared to the other.
It was simple to write, and running it through my suite of test files
causes me to think I might actually have gotten it right!

I do want to do one more thing.  While Earl Colby doesn't believe
anything I say -- okay, sorry -- I wish he did, but he alone makes his
decisions.

However, Earl, a few years ago you wrote a memo describing what you
thought were necessary conditions for a satisfactory demonstration of
compression.  Could you point me to that memo, please?  Suddenly it's
important.

```

```
On 27 May, 11:22, jules Gilbert <jules.sto...@gmail.com> wrote:
>
> Once I saw (one more time) that I wasn't going to get any real help
> from you folks [...]

You got really good help from Thomas in this thread.  You just don't
know how to handle it.  In your first post you didn't link
"compressibility" to standard general-purpose compressors.  You left out
important information, and Thomas explained to you why what you asked
doesn't make much sense.  Standard general-purpose compressors try to
exploit repeating byte sequences and non-uniform distributions (a
simplified explanation).

Thomas was talking about "average compressibility" of a source's
output. This, of course, depends on the source.

Cheers!
SG
```

```
jules Gilbert <jules.stocks@gmail.com> writes:
[snip]
> I have never claimed to have discovered new science or to have proven
> existing science impossible.  You folks are much too casual in your
> remarks.
>
> What I have done is use our existing programming technology to develop
> several classes of repeatable file/message compressors.
>
> And for those who don't get it or for another reason can't make the
> connection, here it is again.
>
> I wanted a tool that would measure a program and estimate its
> compressibility wrt conventional compressors.
>
> Once I saw (one more time) that I wasn't going to get any real help
> from you folks, I resorted to solving a different problem, one
> sufficient for my needs.  I wrote a program that, given two files,
> decides which program is more compressible compared to the other.
> It was simple to write, and running it through my suite of test files
> causes me to think I might actually have gotten it right!

You seem to be conflating the terms "file" and "program".  I think you
mean "which *file* is more compressible".  If not, you should probably
explain what you mean by the compressibility of a program.

The solution to your problem, as stated, seems almost trivial: run the
files through a "conventional compressor" and see how much they were
actually compressed.  I'm guessing that's not what you had in mind.

--
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
Nokia
"We must do something.  This is something.  Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
```
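Keith's "almost trivial" solution can be sketched with zlib standing in for the conventional compressor; by construction the resulting comparison is monotone in that compressor's actual output size (the names are mine, and zlib is just one choice of backend):

```python
import zlib

def more_compressible(a: bytes, b: bytes) -> int:
    """Compare two byte strings by actually deflating them: return -1 if
    `a` compresses smaller, 1 if `b` does, 0 on a tie."""
    ca = len(zlib.compress(a, 9))
    cb = len(zlib.compress(b, 9))
    return (ca > cb) - (ca < cb)

# A long run compresses far better than a spread-out byte sequence:
print(more_compressible(b"a" * 1000, bytes(range(256)) * 4))  # -1
```

Because the figure of merit *is* a real compressor's output size, the stability and monotonicity requirements from the original post are satisfied for that compressor by definition.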

```
jules Gilbert wrote:

> I wanted a tool that would measure a program and estimate its
> compressibility wrt conventional compressors.

Different question, different answers. IOW, you want to estimate the
(operational) entropy of a stream. OK, there is a nice result (IIRC)
showing that the LZ algorithm (actually, both of them) compresses down
to H as the length of such streams goes to infinity -- or, if you flush
the statistics on block boundaries, converges to the block entropy for
the block size given by the LZ block size. That is, LZ followed by an
optimal entropy coder. LZ is, to be precise, more a method to build a
model than to compress.

Anyhow, compressors do not reach that limit, of course: first, because
real streams are not infinitely long; second, because they do not use an
optimal encoding; and third, because this is only a result in the limit.

But what all that boils down to is that you could either simply run an
LZ-based compressor to get an approximate result (the easiest choice)
or, alternatively, use at least the LZ model builder and from there use
the relative frequencies of the symbols the LZ estimates to get an
estimate of the entropy of the source, and from that derive the
compressibility. Again, this is only an approximation.

> I do want to do one more thing.  While Earl Colby doesn't believe
> anything I say -- okay, sorry -- I wish he did, but he alone makes his
> decisions.

As a note to the public: Earl is not alone.

> However, Earl, a few years ago you wrote a memo describing what you
> thought were necessary conditions for a satisfactory demonstration of
> compression.  Could you point me to that memo, please?  Suddenly it's
> important.

That note has been explained several times here, and it should be
obvious, but anyhow, the rules of the game haven't changed:

a) You provide a decompressor (executable),
b) they send you files,
c) you compress those files,
d) you send them the compressed files,
e) they decompress those files on (supposedly) isolated systems with your decompressor,
f) they compare the decompressed streams with the originals,
g) they compare the file sizes of the compressed and original files,
h) possibly repeat from b).

Note that the order matters: a) must come before b), and not vice versa.

So long,
Thomas

```

```On May 28, 11:43=A0am, Thomas Richter <t...@math.tu-berlin.de> wrote:
> a) You provide a decompressor (executable).
> b) they sent you files,
> c) You compress those files,
> d) you sent them the compressed files,
> e) they decompress such files on (supposedly) isolated systems with your =
decompressor,
> f) they compare the decompressed streams with the originals.
> g) comparison of the file sizes of compressed and original.
> h) possible repeat back to b).

I feel compelled to take this excellent proof-of-compression summary
and reword it (not because it's badly written, but because I feel the
point needs to be hammered home with an especially large hammer):

1. You (inventor) provide a decompressor.
2. We (testers) send you a file to compress.
3. You run the file through your compressor and send us the resulting
compressed output.
4. We run the compressed output through the decompressor and compare
to the original file we sent you.

If the files match, your decompression system works.
If the compressed file is smaller than the original file, then your
compression system works.
If the size of the compressed file + the size of the decompression
system is smaller than the original file, then you have achieved
actual compression of the data.

There should be no questions.  This is not a "challenge", this is a
very simple verification procedure.

For the truly paranoid: note that your compressor is never
transmitted to the testers, which keeps your process "safe" until
such time as you choose to reveal it.
```
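The four steps above reduce to a round-trip check that anyone can script; in this sketch zlib is a placeholder for the codec under test, not the claimed method:

```python
import zlib

def verify_roundtrip(original: bytes, compress, decompress) -> bool:
    """Steps 2-4 of the procedure: compress the tester's file, decompress
    the result, and require both a byte-for-byte match and a size
    reduction."""
    packed = compress(original)
    restored = decompress(packed)
    return restored == original and len(packed) < len(original)

# With a real codec this succeeds on compressible input:
print(verify_roundtrip(b"hello " * 100, zlib.compress, zlib.decompress))  # True
```

The remaining condition in the post -- compressed size plus decompressor size smaller than the original -- is what rules out hiding the data inside the decompressor itself.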

```On May 28, 11:43=A0am, Thomas Richter <t...@math.tu-berlin.de> wrote:
>
> a) You provide a decompressor (executable).
> b) they sent you files,
[etc]

Jules has been talking to me for the last year or so about
demonstrating his software. All the methods he proposes to me seem
pretty pointless.

One of his recurring plans is to do a demonstration using clean
machines in the presence of some presumably competent and impartial
authority, while recording the whole thing on video for an upload to

Jules doesn't see that this is of no value, because:

1) You aren't going to be able to read computer screens from a hand-
held video camera.
2) Anyone who thinks Jules is faking it is not going to be moved by a
demo that is so easy to manipulate.
3) The demo will be a lot less dramatic and exciting than he seems to
think.

I would suggest that if he feels he must demo something, he should use
a screen capture tool such as ScreenToaster to create a webcast. He
can then demonstrate the following things:

1) Compile his compressor and/or decompressor. Use directory listings
to show the size of the source and executables.
2) run md5 or sha1 or whatever on both source files to establish a
somewhat firm record of their existence at this point in space and
time.
3) compress the million digit file, do a listing to show its size, run
md5/sha1 on that
4) decompress, do a diff to show that the output is unchanged.

There are a few things that would enhance the value of the demo:

1) Write your program so that it can be run with ltrace and strace,
demonstrating that only very specific system calls are being made.
Ideally this would be almost none.
2) Repeat the demo on a version of the million digit file that has
been encrypted using a key supplied by a third party. Post the webcast
within minutes of receiving that key.

This demo would be nice and simple. Anyone who thinks Jules is a fake
and a liar will not be convinced by this or any other demo. But it
will require Jules (or anyone else who wants to do it) to put their
credibility on the line. Either they have something that works, or
they are publicly committing a fraud.

For perhaps 15 years, mostly what Jules has done is create reputedly
reversible transforms that take input data and create output data that
he claims is compressible. He then publishes long lists of numbers and
says "look, 55% zeros!" But he has never finished the job.

I would suggest now that he stop talking about what he has until he can
do a demo like the one described here. It poses no risk to any of his
intellectual property; it just requires honesty and completeness.

- Mark Nelson

```

```
On May 29, 9:30 am, Mark Nelson <snorkel...@gmail.com> wrote:

> For perhaps 15 years, mostly what Jules has done is create reputedly
> reversible transforms that take input data and create output data that
> he claims is compressible. He then publishes long lists of numbers and
> says "look, 55% zeros!" But he has never finished the job.

I have never seen anything from Jules that even hints that he has
created a working transform, much less a highly compressible one when
the compressed size is compared to the original size.

> I would suggest now that he stop talking about what he has until he can
> do a demo like the one described here. It poses no risk to any of his
> intellectual property; it just requires honesty and completeness.

To date he refuses even to try to write a working decompressor, so
it is a little hard to believe he could put in the effort needed to
create a video.

Finally, why believe the video?  There are tons of tools to fake such a
demo.

Simply put, I will never believe a word Jules says about his code
until I can run *HIS* decompressor on the computer of my choice using
the compressed files created by Jules from the files *I*, repeat,
*I* choose to send him.

Remember, if I can't run the code on my computer, it has no value to me
anyway.  The same applies to everyone else: if they can't use it, what
good is it?
```