This post may say nothing that's not been said
in this ng before, but we do have some newcomers
who are asking for enlightenment and respect.
I will pose five questions. Working on the answers
to these five questions may be a good first step
for those pursuing random compression.
(A) Method to reduce any large random file by,
on average, 2 bits.
There are 2 possible one-bit files,
4 possible two-bit files, 8 three-bit files,
and so on. The 256 eight-bit files
can be represented as one of (2+4+8+16+32+64+128)
files shorter than eight bits, with only two
left-overs that cannot be compressed.
As a definite example, a compressor for three-bit
files might apply the following encoding:
000 --> 0
001 --> 1
010 --> 10
011 --> 11
100 --> 00
101 --> 01
110 --> 110
111 --> 111
Question #1) Why do comp.compression gurus
consider this method inadmissible?
Question #2) Can the method be applied iteratively
to achieve greater compression?
Hint: If you understand why the answer to #2 is No,
it may help you answer #1.
Some attempts at random compression are actually
variations of this "inadmissible" method.
(B) Method to compress any large random by using
alternative sub-methods.
Suppose you have a method that allows *some*
million-byte random files to be compressed
to 999,998 bytes. Suppose you have another
254 such methods, which work on other random
files. The compressor works as follows:
1. Find the successful method, if any,
numbered 00 to FE (hex). Output that
method code, followed by the 999,998 bytes
of compressed data. You have saved 1 byte.
2. If there is no successful method,
output FF, followed by the 1,000,000 bytes
of uncompressed data. You have lost 1 byte.
This method may sound worth pursuing!
Question #3) Prove that for *any* such system,
fewer than 1% of random files will be compressed.
Mark Nelson offers a prize (one million dollars?)
for compressing a million-digit random file which
is documented on the Internet. It was built
from a combination of pseudo-random and physically
random numbers.
Suppose a "little birdie" tells you that the
Nelson-Challenge file viewed as a single very
large integer happens to be a prime number!
Suppose specifically it is the k'th prime
number; moreover suppose k is itself prime!
Question #4) With these assumptions, how small
a file could the Nelson-Challenge be reduced
to? Is this enough to win the contest?
Question #5) If you suspect the randomness in
the Nelson-Challenge file to be inadequate
and the file thus compressible, what is the first
step you need to take?
Hope these questions help.
James Dow Allen
|
|
0
|
|
|
|
Reply
|
jdallen2000 (489)
|
6/16/2010 7:07:54 PM |
|
James Dow Allen <jdallen2000@yahoo.com> wrote:
(snip)
> (A) Method to reduce any large random file by,
> on average, 2 bits.
> There are 2 possible one-bit files,
> 4 possible two-bit files, 8 three-bit files,
> and so on.
Oh, I can compress any file, reducing it by
about 10 bits. It seems usual for compressors
to add a new extension to the file, such as .gz
or .bz2. If you allow for, say, 1024 different
extensions using letter combinations. (Remove
those confusing extensions, such as .exe and
such from the list) then you can code 10 file
bits into the extension.
You can even repeat this process until you reach
the system specific maximum file length.
If you transfer the file to another system, be
sure not to change the name.
-- glen
|
|
0
|
|
|
|
Reply
|
glen
|
6/16/2010 7:34:13 PM
|
|
Question #1) Why do comp.compression gurus
consider this method inadmissible?
It doesn't remove redundancy!?
Question #2) Can the method be applied iteratively
to achieve greater compression?
I don't have any idea. Please help!
On Jun 16, 9:07=A0pm, James Dow Allen <jdallen2...@yahoo.com> wrote:
> This post may say nothing that's not been said
> in this ng before, but we do have some newcomers
> who are asking for enlightenment and respect.
> I will pose five questions. =A0Working on the answers
> to these five questions may be a good first step
> for those pursuing random compression.
>
> (A) Method to reduce any large random file by,
> on average, 2 bits.
>
> There are 2 possible one-bit files,
> 4 possible two-bit files, 8 three-bit files,
> and so on. =A0The 256 eight-bit files
> can be represented as one of (2+4+8+16+32+64+128)
> files shorter than eight bits, with only two
> left-overs that cannot be compressed.
> As a definite example, a compressor for three-bit
> files might apply the following encoding:
> =A0 =A0000 --> 0
> =A0 =A0001 --> 1
> =A0 =A0010 --> 10
> =A0 =A0011 --> 11
> =A0 =A0100 --> 00
> =A0 =A0101 --> 01
> =A0 =A0110 --> 110
> =A0 =A0111 --> 111
>
> Question #1) =A0Why do comp.compression gurus
> consider this method inadmissible?
>
> Question #2) =A0Can the method be applied iteratively
> to achieve greater compression?
>
> Hint: If you understand why the answer to #2 is No,
> it may help you answer #1.
>
> Some attempts at random compression are actually
> variations of this "inadmissible" method.
>
> (B) Method to compress any large random by using
> alternative sub-methods.
>
> Suppose you have a method that allows *some*
> million-byte random files to be compressed
> to 999,998 bytes. =A0Suppose you have another
> 254 such methods, which work on other random
> files. =A0The compressor works as follows:
> =A0 1. =A0Find the successful method, if any,
> numbered 00 to FE (hex). =A0Output that
> method code, followed by the 999,998 bytes
> of compressed data. =A0You have saved 1 byte.
> =A0 2. =A0If there is no successful method,
> output FF, followed by the 1,000,000 bytes
> of uncompressed data. =A0You have lost 1 byte.
>
> This method may sound worth pursuing!
>
> Question #3) Prove that for *any* such system,
> fewer than 1% of random files will be compressed.
>
> Mark Nelson offers a prize (one million dollars?)
> for compressing a million-digit random file which
> is documented on the Internet. =A0It was built
> from a combination of pseudo-random and physically
> random numbers.
>
> Suppose a "little birdie" tells you that the
> Nelson-Challenge file viewed as a single very
> large integer happens to be a prime number!
> Suppose specifically it is the k'th prime
> number; moreover suppose k is itself prime!
>
> Question #4) With these assumptions, how small
> a file could the Nelson-Challenge be reduced
> to? =A0Is this enough to win the contest?
>
> Question #5) If you suspect the randomness in
> the Nelson-Challenge file to be inadequate
> and the file thus compressible, what is the first
> step you need to take?
>
> Hope these questions help.
>
> James Dow Allen
|
|
0
|
|
|
|
Reply
|
Chi
|
6/17/2010 7:25:02 AM
|
|
On Jun 16, 1:07=A0pm, James Dow Allen <jdallen2...@yahoo.com> wrote:
> This post may say nothing that's not been said
> in this ng before, but we do have some newcomers
> who are asking for enlightenment and respect.
> I will pose five questions. =A0Working on the answers
> to these five questions may be a good first step
> for those pursuing random compression.
>
> (A) Method to reduce any large random file by,
> on average, 2 bits.
>
> There are 2 possible one-bit files,
> 4 possible two-bit files, 8 three-bit files,
> and so on. =A0The 256 eight-bit files
> can be represented as one of (2+4+8+16+32+64+128)
> files shorter than eight bits, with only two
> left-overs that cannot be compressed.
> As a definite example, a compressor for three-bit
> files might apply the following encoding:
> =A0 =A0000 --> 0
> =A0 =A0001 --> 1
> =A0 =A0010 --> 10
> =A0 =A0011 --> 11
> =A0 =A0100 --> 00
> =A0 =A0101 --> 01
> =A0 =A0110 --> 110
> =A0 =A0111 --> 111
>
> Question #1) =A0Why do comp.compression gurus
> consider this method inadmissible?
>
>
You actually can use this to "compress" a bit file.
If the input file is exactly 3 bits to begin with and all you care
about
are files that are exactly 3 bits in length.
However it is not a valid compression method for the
set of all bit files. The best you can do is a bijective
compressor and we can even use your table above
as part of it.
The trick is let every file 4 bits and longer map to themselves.
You have covered what 3 bit files map to all you need to do
is cover what 1 and 2 bit files map to. The easies fix is to
just reverse you staring tables
0 to 000
1 to 001
00 to 010
01 to 011
10 to 100
11 t0 101
now you have a complete mapping
Notice if every file is equally likely that is random you
have no net savings. That is there is not compression
on the average. If the input is not random but has a lot
of 3 bit files then on the average you may have some
compression.
Lossless compression at its best is nothing more than
a reordering of the files. Real compression occurs if you
can find a reordering which tends to favor likely files.
Random files if random don't have this property so they
can't be compressed.
What leads many people astray is they get a small number
of files and match some method that will reduce in size the
test files. And then they wrongly think it will work will all so
called random files. Well it doesn't.
David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"
|
|
0
|
|
|
|
Reply
|
biject
|
6/17/2010 6:57:51 PM
|
|
Hi,
> (A) Method to reduce any large random file by,
> on average, 2 bits.
>
> There are 2 possible one-bit files,
> 4 possible two-bit files, 8 three-bit files,
> and so on. The 256 eight-bit files
> can be represented as one of (2+4+8+16+32+64+128)
> files shorter than eight bits, with only two
> left-overs that cannot be compressed.
> As a definite example, a compressor for three-bit
> files might apply the following encoding:
> 000 --> 0
> 001 --> 1
> 010 --> 10
> 011 --> 11
> 100 --> 00
> 101 --> 01
> 110 --> 110
> 111 --> 111
>
> Question #1) Why do comp.compression gurus
> consider this method inadmissible?
>
> Question #2) Can the method be applied iteratively
> to achieve greater compression?
Sadly enough, I haven't seen a satisfactory answer to this, even though
I consider it fairly obvious what is wrong here. Hint: It is not related
to "bijectivity" at all. Think about "where does the information go".
Otherwise, I don't want to spoil the quiz. Actually, #2 is also a pretty
good hint since the "extra information" is already taken up at the first
step.
> (B) Method to compress any large random by using
> alternative sub-methods.
>
> Suppose you have a method that allows *some*
> million-byte random files to be compressed
> to 999,998 bytes. Suppose you have another
> 254 such methods, which work on other random
> files. The compressor works as follows:
> 1. Find the successful method, if any,
> numbered 00 to FE (hex). Output that
> method code, followed by the 999,998 bytes
> of compressed data. You have saved 1 byte.
> 2. If there is no successful method,
> output FF, followed by the 1,000,000 bytes
> of uncompressed data. You have lost 1 byte.
>
> This method may sound worth pursuing!
>
> Question #3) Prove that for *any* such system,
> fewer than 1% of random files will be compressed.
Count the number of files that can be represented by this method
by a string shorter than a million bytes. Again, I don't want to
spoil this, but it's fun working it out.
Folks, come on, this isn't hard!
> Suppose a "little birdie" tells you that the
> Nelson-Challenge file viewed as a single very
> large integer happens to be a prime number!
> Suppose specifically it is the k'th prime
> number; moreover suppose k is itself prime!
>
> Question #4) With these assumptions, how small
> a file could the Nelson-Challenge be reduced
> to? Is this enough to win the contest?
Hmm, here's my answer. Given I know the number is prime,
I would first need to encode this knowledge (for a general purpose
compressor), which takes up one bit. Furthermore, the number of
primes smaller than x, called pi(x) approximates x / log(x) for large x,
that is, instead of "Nelson's number" I would only need to encode the
prime number index of size x / log(x). Then compute how many digits that
number would take.
> Question #5) If you suspect the randomness in
> the Nelson-Challenge file to be inadequate
> and the file thus compressible, what is the first
> step you need to take?
Mail in the algorithm to collect the prize money. (-:
Oh well, patenting it is probably not a good idea, it's too special
purpose. (-:
So long,
Thomas
|
|
0
|
|
|
|
Reply
|
Thomas
|
6/19/2010 8:19:53 PM
|
|
On Jun 19, 9:19=A0pm, Thomas Richter <t...@math.tu-berlin.de> wrote:
> > (A) Method to reduce any large random file by,
> > on average, 2 bits.
>
> > There are 2 possible one-bit files,
> > 4 possible two-bit files, 8 three-bit files,
> > and so on. =A0The 256 eight-bit files
> > can be represented as one of (2+4+8+16+32+64+128)
> > files shorter than eight bits, with only two
> > left-overs that cannot be compressed.
> > As a definite example, a compressor for three-bit
> > files might apply the following encoding:
> > =A0 =A0000 --> 0
> > =A0 =A0001 --> 1
> > =A0 =A0010 --> 10
> > =A0 =A0011 --> 11
> > =A0 =A0100 --> 00
> > =A0 =A0101 --> 01
> > =A0 =A0110 --> 110
> > =A0 =A0111 --> 111
>
> > Question #1) =A0Why do comp.compression gurus
> > consider this method inadmissible?
>
> > Question #2) =A0Can the method be applied iteratively
> > to achieve greater compression?
>
> Sadly enough, I haven't seen a satisfactory answer to this, even though
> I consider it fairly obvious what is wrong here. Hint: It is not related
> to "bijectivity" at all. Think about "where does the information go".
> Otherwise, I don't want to spoil the quiz. Actually, #2 is also a pretty
> good hint since the "extra information" is already taken up at the first
> step.
>
> > Question #5) If you suspect the randomness in
> > the Nelson-Challenge file to be inadequate
> > and the file thus compressible, what is the first
> > step you need to take?
>
> Mail in the algorithm to collect the prize money. (-:
>
> Oh well, patenting it is probably not a good idea, it's too special
> purpose. (-:
Warm Greetings :)
A "little birdie" now recent confirms to you Method (A) now been
improved upon & similar proof to Method (A) here proven to 'general
purpose' reduce any large random file by, on average, at very least de
minimis 2 bits :( ..... or more :) ...... & now with no 'left-
overs' nowhere-to-go unable to be represented .....
HOW will the answer now differs (?)
Regards
(A) Method to reduce any large random file by,
on average, 2 bits.
There are 2 possible one-bit files,
4 possible two-bit files, 8 three-bit files,
and so on. The 256 eight-bit files
can be represented as one of (2+4+8+16+32+64+128)
files shorter than eight bits, with only two
left-overs that cannot be compressed.
|
|
0
|
|
|
|
Reply
|
LawCounsels
|
6/20/2010 9:21:23 AM
|
|
On Jun 20, 3:19=A0am, Thomas Richter <t...@math.tu-berlin.de> wrote:
> > Suppose a "little birdie" tells you that the
> > Nelson-Challenge file viewed as a single very
> > large integer happens to be a prime number!
> > Suppose specifically it is the k'th prime
> > number; moreover suppose k is itself prime!
>
> > Question #4) With these assumptions, how small
> > a file could the Nelson-Challenge be reduced
> > to? =A0Is this enough to win the contest?
>
> Hmm, here's my answer. Given I know the number is prime,
> I would first need to encode this knowledge (for a general purpose
> compressor), which takes up one bit. Furthermore, the number of
> primes smaller than x, called pi(x) approximates x / log(x) for large x,
> that is, instead of "Nelson's number" I would only need to encode the
> prime number index of size x / log(x). Then compute how many digits that
> number would take.
OK.
If N is a million-digit number and is the k'th prime,
then k probably has about 999,994 digits.
If, by extremely good luck, k is itself the j'th
prime, then j probably has about 999,988 digits.
To win the Million Dollar Prize, the code for
developing the (p_j)'th prime would have to fit
in the space of 12 decimal digits! Good luck
on that!
I like this example. It shows that one needs more than
a little help to win the Million Dollar Prize!
James
|
|
0
|
|
|
|
Reply
|
James
|
6/23/2010 8:05:44 PM
|
|
James Dow Allen <jdallen2000@yahoo.com> wrote:
(snip)
> If N is a million-digit number and is the k'th prime,
> then k probably has about 999,994 digits.
> If, by extremely good luck, k is itself the j'th
> prime, then j probably has about 999,988 digits.
> To win the Million Dollar Prize, the code for
> developing the (p_j)'th prime would have to fit
> in the space of 12 decimal digits! Good luck
> on that!
> I like this example. It shows that one needs more than
> a little help to win the Million Dollar Prize!
There may be prizes for factoring, or proving the primeness,
of large numbers.
The million digit number likely has some large factors,
so won't be easy to factor.
-- glen
|
|
0
|
|
|
|
Reply
|
glen
|
6/23/2010 8:55:12 PM
|
|
|
7 Replies
28 Views
(page loaded in 0.132 seconds)
|