Quantized Indexing Source Code Available #2

Quantized Indexing is a new entropy coding algorithm
based on enumerative combinatorics of symbol sequences.
It typically codes 10-20 times faster than the fastest
arithmetic coders (over 200 times faster in the low entropy
limit, and over 6 times faster in the high entropy limit),
while compressing more tightly for all input ranges.
The Quantized Indexing web page:

   http://www.1stWorks.com/ref/qi.htm

contains preprints describing the algorithm and the C source code
for algorithms researchers. The source code contains several
variants of QI coding, such as generic binary, sparse binary and mixed
radix (including factorial radix and an optimal permutations
encoder/decoder).

The sparse binary coder runs an additional 2-5 times faster on sparse
data than the generic QI coder (note that in our benchmarks against AC
we used only the generic QI coder). The sparse coding is useful for
compression of B&W images, outputs of other higher level coders (such
as BWT output), as well as for coding frequency tables of the multi-
block version of QI (or other entropy coders). It would be particularly
useful for encoding database and web search engine bit-map indexes
and keyword incidence maps, which, at least for the major engines
such as Google, Yahoo and MSN, are billions of bits long and extremely
sparse. With that type of data the generic QI, and especially the
sparse QI coder, run at the speeds of run-length coders (which is
2-3 orders of magnitude faster than the AC or LZ type coders that are
often used in this role), but without the fragility of RL or
adaptive AC against small clusters of denser data.

The source also contains a little tutorial program for general
enumerative coding (which in enumerative combinatorics is called
ranking & unranking), showing step by step encoding & decoding of
user-entered short binary or hex inputs, with all bit strings and
indexes shown in full as they transform. Another small tutorial
feature provides details of permutation coding and factorial radix.
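
For instance (a worked example in the colex convention used by the
enc32 listing further down the thread, not actual tutorial output):
the 6-bit input 101100, with ones at bit positions 2, 3 and 5, gets
the index C(2,1) + C(3,2) + C(5,3) = 2 + 3 + 10 = 15, i.e. it is the
16th (index 15, counting from 0) of the C(6,3) = 20 six-bit strings
with three ones, in colex order.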

Another set of functions included shows in detail the properties of the
quantized binomial tables, including all gaps in the output code space
(this was a subject of some controversy in the earlier thread) and
various types of redundancies due to quantization.

A recent thread here has additional discussion & links about the QI
algorithm:

"New combinatorial coder: tighter & faster than Arithmetic coding"
http://groups.google.com/group/comp.compression/browse_thread/thread/ffc7f7ca84c76378

A few posts with additional info:

http://groups.google.com/group/comp.compression/msg/ae53ee708dae058c
http://groups.google.com/group/comp.compression/msg/6418d7524322c6c3
http://groups.google.com/group/comp.compression/msg/d601691935c4c116
http://groups.google.com/group/comp.compression/msg/a656f2937e768fa3
http://groups.google.com/group/comp.compression/msg/57984b354bb2ec1a
http://groups.google.com/group/comp.compression/msg/368b484e5c26dce3
http://groups.google.com/group/comp.compression/msg/b89ea4829c82659c

nightlight
12/24/2005 12:36:18 PM

As an illustration of how simple enumerative coding is, below is
the actual source code for the exact EC (for 32-bit inputs only,
although one could extend it to 64 bits fairly easily). To display
it properly, use a fixed font for this post.

//-- Encode bit-string x to enum index I

dword enc32(dword x)
{ int k,n;
  dword I;
    k=1; I=0;
    while(x)	      // assumes that unused/higher bits in x are 0
      {
      n=loBit(x);     // get bit offset of the lowest bit set to 1
      x&=x-1;	      // clear the lowest bit=1 in "buffer" x
      I+=bc32(n,k++); // add binomial C(n,k) to the index, k=# of ones
      }		      // increment count of ones, k
    return I;	      // return enumerative index I
}

//-- Decode enum index I to bit-string

dword dec32(dword I,int n,int k)
{ dword x,b;
    x=0;
    do{
      x<<=1;	    // fill in decoded bit as 0	(at position 0)
      b=bc32(n,k);  // find the largest binomial coefficient C(n,k)<=I
      if (I>=b)	    // check if we can subtract b from I
	{	    // ==> yes, decoded bit is 1
	I-=b; ++x;  // reduce index I and set decoded bit=1
	if (!--k)   // decrement count of 1's and stop if no more 1's left
	  {
	  x<<=n;    // pad the rest of output with 0 bits (in the low bits)
	  break;    // leave the decoding loop
	  }
	}	    // ==> no, decoded bit is 0; try next smaller b(n,k)
      }
    while(--n>=0);  // this loop can be made to go faster using
    return x;	    // the binary (instead of sequential) search for n 
}
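
The listing above assumes the dword type and the loBit() & bc32()
helpers from the QIC package. Purely as an illustration (these are my
own minimal stand-ins, not the package's definitions), the following
could be combined with the listing above (helpers first, then the
enc32/dec32 from the post, then main) to compile it and check a
round trip:

// Hypothetical helpers (not from the QIC package): 32-bit dword,
// a Pascal triangle table for C(n,k), and loBit(); plus a round-trip
// test of enc32/dec32 above.

#include <stdio.h>

typedef unsigned int dword;        // assumes 32-bit unsigned int

static dword bctab[33][33];        // C(n,k) for n,k <= 32

static void initBc32(void)         // fill the table once before coding
{ int n,k;
    for(n=0; n<=32; n++)
      for(k=0; k<=n; k++)
        bctab[n][k] = (k==0 || k==n) ? 1 : bctab[n-1][k-1]+bctab[n-1][k];
}

static dword bc32(int n,int k)     // C(n,k); 0 when k>n, e.g. C(0,1)=0
{ return (k<0 || k>n) ? 0 : bctab[n][k]; }

static int loBit(dword x)          // bit offset of the lowest 1 in x!=0
{ int n=0;
    while(!(x&1)) { x>>=1; ++n; }
    return n;
}

// ... enc32() and dec32() as listed above go here ...

int main(void)
{ dword x=0x2C, I;                 // 101100b: ones at positions 2,3,5
    initBc32();
    I=enc32(x);                    // index = C(2,1)+C(3,2)+C(5,3) = 15
    printf("x=%08X I=%u decoded=%08X\n", x, I, dec32(I,31,3));
    return 0;                      // prints x=0000002C I=15 decoded=0000002C
}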

nightlight
12/24/2005 1:08:35 PM
I received some questions about the source and the algorithm via email,
and since the matter may be of interest to others who downloaded it, I
will reply here as well:

> Concerning the source, I am afraid it is of little use in this form.
> I don't run windows here, and I do not feel like I would really need it.
> Thus, if you'd like to present your work, it would be most useful to
> depend on ANSI-C only.

The main nonstandard VC6/win32 C use in the source has to do with the
nanosecond timer & thread control (in Qiutl.c). That was added in case
users wish to check the precise performance against other coders.
There are a few additional macros in Qitypes.h, such as hiBit(), loBit()
and shifts & mul/div for various 32/64 bit mixes (which VC6 has implemented
very poorly and which are used in the mixed radix code). For the next
release I'll try to add some control macros in the top level
application header "Intro.h" which control how these currently
nonstandard C elements are implemented, with an ANSI C variant supported.
The matter of endianness and word size is a bit more involved to extend,
and any future generalization there depends on requests from people
experimenting with the code. Right now the coders assume a 32-bit little
endian architecture. Extension to 64 bits should be fairly simple (and
may be included within the next few revs), while big endian support, if
it comes up, will require more line by line work through the routines.

> Concerning the algorithm itself: This pretty much reminds me
> of the ELS coder, and I think it might only be beneficial to check
> out some papers on this matter.

The ELS coder is a member of the AC family of algorithms. It is
essentially a rediscovered variant of Rissanen's original 1976 AC coder,
AC-76 (which presents that kind of additive integer AC in a mathematically
much cleaner and more rigorous way than the ELS paper from
Pegasus does for their version):

  5. J. Rissanen  Generalised Kraft inequality and arithmetic
     coding, IBM J. Res. Dev. 20, 198-203, 1976
     http://www.research.ibm.com/journal/rd/203/ibmrd2003B.pdf

The key differences between EC/QI and AC-76 (or ELS) type coding
are discussed in [T2] pp. 19-25. The obvious similarity to EC/QI is
not accidental -- Rissanen was trying to solve the very same EC
precision problem that QI solves (this story was discussed in
more detail in Rissanen's 2nd AC paper:

   27. J. Rissanen   Arithmetic codings as number representations
   Acta Polyt. Scand., Math. & Comp. Sc.  Vol. 31 pp 44-51, 1979
   http://www.1stworks.com/ref/ariNumRepr.pdf

as partially recounted in the intro section of [T3] and a chapter in
[T2]). As explained in [T2] pp. 22-23, he could not find the right
quantization for the exact enumeration. So he approximated the
unlimited precision enumeration first (Stirling, etc.), then found
how to quantize the latter. QI performs the quantization of exact
EC directly, without approximating the exact enumeration. The
benefits of the new procedure relative to AC-76 (or its rediscovery,
ELS) are described in detail in [T2] pp. 19-25. Both the coding
accuracy and the speed are improved by the new procedure (the speed
particularly dramatically).

Regarding the speed difference, you should note that QI performs
no coding operations on the most probable symbol, other than
skipping it (at the memory bus speed), while AC-76 & ELS have
to update a much larger coder state (beyond just the bare count of
symbols processed). As explained in [T2], AC-76 (or ELS) could
construct tables which would allow such skipping, but they would
need a separate table for each source probability.

> It seems to me that your algorithm is a special form of ELS tuned
> to multinomials.

QI is not a special case of ELS or AC-76 -- it is a
more accurate and a much faster solution of the same
problem. AC-76 and the later ACs can be obtained
via approximation of QI (cf. [T2] pp. 22-23). Note also that
multinomials are only a special case of EC/QI addends --
they occur for an order-0 Markov source. E.g. Cover's paper
[1] shows an exact EC order-1 binary Markov enumerator.
Oktem's thesis [23] has several other types of addends,
some of which don't even have closed forms (or at least they
didn't check the combinatorial literature well enough).
Constrained coding (cf. [26] and other Immink's work)
also has numerous other types of addends.

In short, the multinomials are just one type of EC/QI
addend. Even the QI sample source includes several
non-multinomial types of addends, such as powers and
factorials for radix & permutation coding. Similarly, coding
of trees would use Ballot numbers (which reduce to Catalan
numbers on the main lattice diagonal x=y).
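
As a concrete toy illustration of a non-binomial addend (my own sketch,
not the package's permutation coder): ranking a permutation of 0..n-1
into its factorial-radix index (the Lehmer code), where the addends are
digit * factorial instead of binomials:

#include <stdio.h>

// Rank permutation p[0..n-1] of {0..n-1} into its lexicographic index,
// built from factorial-radix addends: digit(i) * (n-1-i)!
static unsigned long permRank(const int *p, int n)
{ unsigned long idx=0, fact=1;     // fact = (n-1-i)! for the current i
  int i,j;
    for(i=n-2; i>=0; i--)          // rightmost position contributes 0
      {
      int smaller=0;               // digit = # of smaller items to its right
      for(j=i+1; j<n; j++)
        if (p[j]<p[i]) smaller++;
      idx += smaller*fact;         // addend = digit * (n-1-i)!
      fact *= (n-i);               // next weight: (n-i)!
      }
    return idx;
}

int main(void)
{ int p[3] = {2,0,1};              // "201": lexicographic rank 4 of 3! = 6
    printf("rank = %lu\n", permRank(p,3));   // prints: rank = 4
    return 0;
}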

>  Whether this can be exploited for general-purpose
> compression is not yet clear  to me.

I think the difficulty that this and several other of your
questions show is a result of using the AC modeling paradigm
to try to figure out how one would model with EC/QI. EC/QI
doesn't model that way at all. This was discussed
at some length in [T2] pp. 26-35 (see especially pp. 30-31
for a direct comparison with the AC modeling paradigm/pattern).

Briefly, the division of labor between the coder and the
modeler in EC/QI is different than for AC & its modeler.
EC/QI expects the modeler to perform a particular kind
of partition of the input, which hands the coder instances of
equiprobable "messages" for enumeration (computation
of the index). Only in simple cases (such as a quasi-stationary
order-0 source) is the "partition" or "segmentation" the literal
segmentation of the input sequence. Generally,
this is not the case. E.g. for a known order-M
quasi-stationary Markov source, one would partition
EC output into M separate streams -- each symbol is
coded via EC/QI into a stream selected based on the
last M symbols. For an unknown source, the BWT algorithm
performs such a partition in its output column (cf. [T2]
pp. 34-35; note that I use only the transform of BWT,
not the MTF or other entropy coding usually done in the
2nd phase of BWT). The BW transform can be used as a
general purpose segmentation module of the EC modeling
engine. The second half of an earlier reply to Matt Mahoney
gives some more detail on this:

http://groups.google.com/group/comp.compression/msg/2c769e3a278a62f4?hl=en&
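
To make the stream-partition idea concrete, here is a toy sketch (my
own illustration, not code from the QI package) for a binary alphabet:
each symbol is routed to the stream selected by its last-M-symbol
context, one stream per distinct context (up to 2^M of them for a
binary source; see the correction posted below), and each stream can
then be coded as an order-0 block:

#include <stdio.h>

#define M      2                   // context order
#define NCTX   (1 << M)            // distinct M-bit contexts
#define MAXLEN 1024

typedef struct { unsigned char sym[MAXLEN]; int len; } Stream;

// Split a binary input into per-context streams; the context before
// the first symbols is treated as all zeros.
static void partition(const unsigned char *in, int n, Stream ctx[NCTX])
{ int i, c=0;                      // c = last M symbols packed into an int
    for(i=0; i<NCTX; i++) ctx[i].len=0;
    for(i=0; i<n && i<MAXLEN; i++)
      {
      Stream *s = &ctx[c];
      s->sym[s->len++] = in[i];    // symbol goes into the stream for c
      c = ((c<<1) | in[i]) & (NCTX-1);   // slide the context window
      }
}

int main(void)
{ unsigned char in[10] = {0,1,1,0,1,0,0,1,1,1};
  Stream ctx[NCTX];
  int c;
    partition(in, 10, ctx);
    for(c=0; c<NCTX; c++)
      printf("context %d: %d symbols\n", c, ctx[c].len);
    return 0;
}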

The AC and EC modeling schemes belong to different schools of
thought in information theory: AC to the Shannon school, EC to the
Kolmogorov school. The latter, due to the absence of a practical coding
algorithm, has been living in an undeserved shadow. I think
that its potential has yet to be tapped and that it will prove
itself a much more powerful modeling scheme than what is done
today using AC (which functions through a modeling bottleneck,
as argued in [T2]). The future will tell.

References
--------------------------------------------------------------------------
  T1. R. V. Tomic "Fast, optimal entropy coder"
      1stWorks TR04-0815, 52p, Aug 2004
      http://www.1stworks.com/ref/TR/tr04-0815b.pdf

  T2. R. V. Tomic "Quantized indexing: Background information"
      1stWorks TR05-0625, 39p, Jun 2005
      http://www.1stworks.com/ref/TR/tr05-0625a.pdf

  T3. R. V. Tomic "Quantized Indexing: Beyond Arithmetic Coding"
      arXiv  cs.IT/0511057, 10p, Nov 2005 (also: 1stWorks TR05-1115)
      http://arxiv.org/abs/cs.IT/0511057

Additional relevant papers are at:

    Quantized Indexing RefLib page:
    http://www.1stworks.com/ref/RefLib.htm

    Quantized Indexing Home Page:
    http://www.1stworks.com/ref/qi.htm

nightlight
12/29/2005 11:05:18 AM
> for a known order-M quasi-stationary Markov source,
> one would partition EC output into M separate streams
> -- each symbol is coded via EC/QI into a stream selected
> based on the last M symbols.

Oops, there is a bad typo up there. The fragment:

 "into M separate streams"

should say:

 "into up to A^M separate streams (where A is alphabet size)"

nightlight
12/29/2005 11:20:01 AM
> Quantized Indexing RefLib page:
> http://www.1stworks.com/ref/RefLib.htm

There were two bad links in the RefLib.htm file. The missing files were
Ruskey's Combinatorial Generation [38] and Potapov's Theory of
Information [37] textbooks. Strangely, while there were many failed
attempts to download these files (404 errors), no one left any feedback
to fix the links. Anyway, the files should be OK now.

nightlight
12/31/2005 2:52:42 AM
There was a small update to the source. The main addition (requested
in several emails) was an option:

      #define _ASM 1      // Set to 0 to disable inline asm

in Qitypes.h which allows disabling the VC6 inline asm (used in
some macros). The speed of operations (mostly the radix codes decoder)
drops by a few percent without the inline asm. There were also some
minor compiler warnings that some people emailed about, which were
cleaned up in the latest code (I left the file name unchanged so the
old links keep working):

   http://www.1stworks.com/ref/C/QIC100.zip

There is also a thread about the algorithm's discovery in the Computer
Chess Club ( http://www.talkchess.com/  <-- signup screen), where the
seed for it was planted way back in the last century. Here is an
excerpt for those interested in that aspect:

------ Computer Chess Club thread excerpt (a few typos fixed) ----------

  "Number of positions in chess -- The rest of the story"
  http://www.talkchess.com/forums/1/message.html?476509

Posted by Ratko V. Tomic on January 03, 2006 at 08:12:12:

> Uri Blass:  There is a better upper bound see:
>
> http://chessprogramming.org/cccsearch/ccc.php?art_id=77068
>
> Uri

Hi Uri, that little exercise in enumeration you brought up
(and mentioned in the other thread) set me off back then to
try to make it work as a general purpose compression algorithm.
While a neat idea on paper, the problem was that the arithmetic
precision had to be of the size of the output.

After struggling for a while, I searched the literature and
it turned out such a compression algorithm already existed,
called "Enumerative Coding", since the 1960s (first in the Russian
literature, from Kolmogorov and his disciples, then shortly
thereafter here in the USA, from Lynch and Davisson). And, as
in my version, the precision problem was still unsolved after
over four decades of various attempts to make the algorithm
practical.

Since I arrived at it on my own, my conventions for
enumeration happened to be backwards from those that
existed in the literature (mine sorted right to left,
the so-called colex sorting of combinations, and built
up the enumerative addends bottom up, while the standard
scheme sorted lexicographically & worked recursively top
down,  plus all my pictures were rotated 45 degrees from
theirs).  Further, due to my playing with lattice methods
in QCD (in my physics graduate school days), I also had
my own visual representation of combinatorics as lattice
walks, which is a very intuitive, heuristically rewarding way
of looking at it, allowing one to see all of the combinatorial
identities at a glance (especially useful for tossing and
checking out algorithms in the head when going to sleep
or waking up,  without a pencil and paper).  The lattice
formulation turns out to have existed in the EC literature
as well (as Schalkwijk's Pascal triangle walks), although
not in as general or elegant a formalism as mine, lacking even
a notation for the lattice walks, key sets of paths, enumerative
classes, constraints... (stuff I worked out while doing physics).

Since that thread, I kept returning to the problem, on and
off, trying various ideas. Nothing worked. Then, in summer
2004, when my wife and kids went to a summer camp for
a week, I stayed home to work on a programming project
(a video codec). The first night home alone, at about 2AM,
while debugging the latest batch of code, out of nowhere
an idea popped into my head on that pesky enumeration
problem, something I hadn't yet tried. I quickly coded just a
toy version, allowing input buffers of 32 bits only, and by
dawn it worked -- a version using an arithmetic precision of only
8 bits encoded & decoded correctly all 2^32 possible inputs.

That same early Sunday morning, it must have been around 6AM,
I called and woke up the company owner (I am the CTO & chief
scientist), and he, still half awake, yielded to my enthusiasm and
agreed to suspend the original project, so I could try whether the
idea works on data of any size.

At the time I didn't have a proof, and wasn't sure even at the
heuristic level, that it can always be decoded. I also didn't
know what maximum or average redundancy would result
from the reduced precision. Within a week I had a simple
version of the code working on buffers of up to 4 kilobits, using only
16-bit arithmetic precision (instead of 4 kilobit precision).
It worked again, and it was very fast, even in that crude version.
The redundancy due to the limited precision arithmetic was
measured, and it was on average about 0.05 bits (and always
below 0.07 bits) for the entire 4k block.

Over the next couple of months I extended the algorithm to any input
size and to a general alphabet (from the original binary alphabet).
I also found a proof of general decodability and an expression
for the max redundancy due to finite precision. The max
redundancy is always below log(e)/2^(g-1) for g-bit arithmetic
precision (I now use g=32 bit precision). The forty-year-old
puzzle was finally cracked. The accidental backwards conventions
of my initial approach turned out to be the critical element
exposing the key to the solution, which is virtually impossible
to spot from within the conventional enumerative coding scheme.

I also developed a new modeling scheme for combinatorial methods
of coding (such as the new algorithm) which is quite promising
on its own. It is basically a scheme along the lines of Kolmogorov's
algorithmic approach to information theory (in contrast to Shannon's
probabilistic approach, which is dominant at present, and where
modeling for arithmetic coding consists in calculating the
probabilities of the next single symbol).

The algorithm, which I named "Quantized Indexing", turned out
pretty amazing. It codes always tighter than the present best
entropy coding algorithm, the Arithmetic Coding (which is only
a particular approximation of QI), yet it codes much faster than
AC due to using only a simple table add (of a machine size word)
for the less frequent symbol and no coding operations for the
most  frequent symbol (AC needs coding operations for both types
of  symbols, and more expensive operations at that, such as mul, div).

As a result, QI typically runs 10-20 times faster than the
fastest full Arithmetic Coder implementations (and always at
least 6 times faster, which occurs in the high entropy limit,
while for very sparse data, the low entropy limit, QI runs
well over 200 times faster).

Recently, I posted a preprint about the algorithm to arXiv:

   http://arxiv.org/abs/cs.IT/0511057

and also created a web page with additional more detailed
technical reports and the C source code:

   http://www.1stworks.com/ref/qi.htm

A nice little surprise came when I emailed about the
preprint to Jorma Rissanen, the inventor of arithmetic coding
himself. He had struggled for several years with the very
same enumerative coding precision problem, inventing in the
process arithmetic coding as a compromise solution (in 1976,
while working for IBM). Although busy with a lecture tour,
he read the paper right away and was quite pleased to see
his old problem solved at last and a bit surprised at how
"very clever" the solution turned out to be.

So, that's the chain of unlikely events triggered by your
original code snippet for enumerating the chess positions.

nightlight
1/4/2006 8:40:05 AM
"nightlight" <nightlight@omegapoint.com> wrote in 
news:1136364005.126391.156010@g43g2000cwa.googlegroups.com:

> 
> The algorithm, which I named "Quantized Indexing", turned out
> pretty amazing. It codes always tighter than the present best
> entropy coding algorithm, the Arithmetic Coding (which is only
> a particular approximation of QI), yet it codes much faster than
> AC due to using only a simple table add (of a machine size word)
> for the less frequent symbol and no coding operations for the
> most  frequent symbol (AC needs coding operations for both types
> of  symbols, and more expensive operations at that, such as mul, div).
> 
> 

   
 I know you claim to have this great code. But others, such as Matt,
the inventor of the PAQ coders, at least wrote a simple arithmetic coder,
FPAQ0, to show how his methods would compete on files where zero-order
arithmetic coding would come into play. Since your model is always
"tighter than the present best entropy coding algorithm", do you
have actual test code to compare with real files and test sets? Or is
the method not yet advanced enough to do real compression on files
yet?

  Don't get me wrong, Matt is not saying FPAQ0 is the best entropy
coder; he knows it isn't that good. He is just showing how it works
for a certain model. The same sort of thing could be done with your
method. At least, that is, if your method is comparable at all with
real world entropy coders. And since it would use the same model as
most, one could calculate the true entropy versus standard test models.

  If one does this one can see Matt's code does not produce on average
the shortest file. Maybe yours could do better if you ever actually
get real working code. But I find it hard to believe it could compress
as well as some current entropy encoders.

  Where Matt's code shines is his use of various models and a slick
way to combine them, which makes his family of PAQ coders among the
best on the net. The point is, if your code is half as good as you
claim, then his simple 2-state entropy coder could be replaced by
your faster and tighter 2-state coder, which would bring your name fame.
But I won't hold my breath.


David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer: I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptographic
system is only as strong as its weakest link"

David
1/4/2006 11:17:17 AM
> "tigther than the present best entropy coding algorithm" do you
> have actually test code to compare with real files and test sets.
> Or is the method not yet advanced enough to do real compression
> on files yet.

You can download the source code, which shows the coding aspects where
QI is essentially different from the existing entropy coding
algorithms. All that the source will show you is that QI is a more
accurate and much faster entropy coder than any AC you may have, tested
under the same conditions (e.g. same model, same coding of model
information, of data lengths, counts, output serialization or
packaging... etc). There are some quasi-arithmetic coders which run
faster than the full AC (paying for the speed in extra output
redundancy), but even these won't run 20 or 30 times faster, let alone
200 times faster than the full AC (they're usually 2-3 times faster),
as QI does. But if you have one such coder, you're welcome to test it.
I would be curious to know.

The source code itself is not a file archiver or video codec or any
such higher level application. Any differences for these higher level
applications, however interesting they may be otherwise, are a simple
consequence of the fundamental differences:

a) AC always codes with greater redundancy than QI (under the same
coding conditions, obviously; this is the result of AC being a direct
approximation of QI; see [T2] pp. 19-25, the chapter on exactly that
difference, how much and for which data, with additional
details on AC in [40],[41],[41a],[41b]) and

b) AC codes much more slowly than QI due to:

... b1) AC has to perform coding operations for all input symbols, while
QI can just skip over the most probable symbols at memory speed (you
can see the top of the coding loop in EncDec.c, where it merely scans
the memory for the less probable symbol, 32 symbols per loop step,
at basically memory scan speed; a toy sketch of such a scan loop is
shown below, after point b2), and
... b2) AC performs more complex operations for the least probable
symbols (which QI also needs to encode explicitly), i.e. mul/div vs. a
simple array lookup and add. This difference, which remains even for
incompressible data (where the least & most probable symbols are
approximately equally likely), allows QI to still code at least 6 times
faster than the full AC even in the high entropy limit.
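
Here is a toy sketch of that kind of scan loop (my own illustration,
not the actual EncDec.c code): the most probable symbol (0) is skipped
a whole 32-bit word at a time, and per-symbol work happens only at the
1 bits, where the real coder would add one quantized binomial from its
table:

#include <stdio.h>

typedef unsigned int dword;

static void scanLoop(const dword *buf, int nWords)
{ int i, ones=0, zeroWords=0;
    for(i=0; i<nWords; i++)
      {
      dword w = buf[i];
      if (!w) { zeroWords++; continue; }  // 32 zeros skipped with one test
      while (w)
        {
        w &= w-1;                // clear the lowest 1; the real coder does
        ones++;                  // one table add (quantized binomial) here
        }
      }
    printf("coded %d ones, skipped %d all-zero words\n", ones, zeroWords);
}

int main(void)
{ dword buf[4] = { 0, 0x80000001u, 0, 0x00010000u };
    scanLoop(buf, 4);      // prints: coded 3 ones, skipped 2 all-zero words
    return 0;
}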

All that is, of course, measurable using the source code provided
(which also includes very accurate timing functions). The above are not
religious claims or an invitation to believe or debate belief systems,
but a simple statement of easily verifiable empirical facts. If you need
to know also how it would do on a "real" file, and can't extrapolate from
how it does on memory buffers filled with arbitrary content, well, you
are welcome to add such file-related code and try it. Now, if you do
add file i/o which takes hundreds of times longer than the coding, I can
predict you won't see much of a speed difference.

-- References ( http://www.1stworks.com/ref/RefLib.htm )

T2. R. V. Tomic  "Quantized indexing: Background information" 1stWorks
TR05-0625, 39p, Jun 2005
http://www.1stworks.com/ref/TR/tr05-0625a.pdf

 40. J.C. Kieffer  "Second Order Analysis of Data Compression
Algorithms" (Preprint from J.C.K. lectures)
http://citeseer.ist.psu.edu/370131.html

41. M.Drmota, H-K. Hwang, W. Szpankowski  "Precise Average Redundancy
of an Idealized Arithmetic Coding" DCC 2002, 222-231.
http://citeseer.ist.psu.edu/drmota02precise.html

41a. P.A.J. Volf   "Weighting Techniques In Data Compression: Theory
and Algorithms" Ph.D. thesis, Eindhoven University of Technology, Dec
2002
http://alexandria.tue.nl/extra2/200213835.pdf

 41b.B. Ryabko, A. Fionov  "Fast and Space-Efficient Adaptive
Arithmetic Coding" Proc. 7th IMA Intern. Conf. on  Cryptography and
Coding, 1999
http://www.1stworks.com/ref/RyabkoAri99.pdf

nightlight
1/4/2006 1:00:59 PM
"nightlight" <nightlight@omegapoint.com> wrote in 
news:1136379659.121605.50640@g44g2000cwa.googlegroups.com:

>> "tigther than the present best entropy coding algorithm" do you
>> have actually test code to compare with real files and test sets.
>> Or is the method not yet advanced enough to do real compression
>> on files yet.
> 
> You can download the source code which shows the coding aspects where
> QI is essentially different from the existent entropy coding
> algorithms. All that the source will show you is that QI is more...

  I guess that means you don't yet have code where you can compare
it to even simple arithmetic file coders. I don't have faith in your
work from your earlier posts.

  A simple "No, you can't actually test it in any real applications yet"
would have been enough. Again, from the earlier thread it's not obvious
to me you have a full understanding of arithmetic coding methods.

 
> 
> All that is, of course, measurable using the source code provided
> (which also includes very accurate timing functions). The above are not
> religious claims or an invitation to believe or debate belief systems,
> but a simple statement of easily verifiable empirical facts. If you need
> to know also how it would do on a "real" file, and can't extrapolate from
> how it does on memory buffers filled with arbitrary content, well, you
> are welcome to add such file-related code and try it. Now, if you do
> add file i/o which takes hundreds of times longer than the coding, I can
> predict you won't see much of a speed difference.


  Very funny.
It's very strange you make a big deal of claiming you compared it
against what you claim is a good entropy coder, Moffat's. Yet you don't
even test against it on a level playing field. You think by modifying
Moffat's code you are giving an honest test. However, if you really
wanted an honest test, since you are the new kid on the block, you would
think you could easily convert your code to work on files like Moffat's.
Or are you afraid to test it on the same playing field Moffat and others
have picked so various methods can be tested against yours?

  I for one believe you shifted ground because you fear real arithmetic
coders, and that people could take existing software without modification
and show you directly that your coder does not lead to shorter compressed
output than already existing coders. I suspect this since most would have
tested the Moffat code on the same playground instead of modding it to
one of your choice, where it's not easy to compare against any other
standard codings.

 Look, maybe you have something. If your method is any good at all,
surely you could easily add the stuff you stripped out of Moffat's
to your code so that you can compare the compression results, or is
there some reason you can't? If you can't, then it would seem to be of
little use.

See:

http://groups.google.com/group/comp.compression/browse_frm/thread/ffc7f7ca84c76378/792884848866dc4c?q=moffat&rnum=4#792884848866dc4c

From the quote below, if it's true at all and your code works at all, you
should have the courage of your convictions to test it against Moffat's
and others where they were designed to run. Doesn't yours work there?
What's the problem?

YOUR QUOTE
"Note also that for the test we had stripped the Moffat et al. V3 binary
coder to its bare engine, replaced stream i/o with memory to memory
coding, no allocations were done within the timed loop, model decisions
were taken out (since only order 0 was tested) and all dynamical coding
options (alternatives that didn't matter)  were hardwired as static to
avoid time waste in the test. So the arithmetic coder tested was
already quite a bit faster than the out-of-a-box Moffat et al code.  We
wanted it to take its best shot, the best it possibly could (since QI
didn't need any such implementation/code level edge, being so much more
efficient at the fundamental, algorithmic level)." 

  Again, let me state it for the dense: if you wanted to take the best
shot, then test it like the others do, on the files Moffat's was made to
work on. If you don't, one can only wonder what you are hiding,
especially since you claim this is so much better than arithmetic.

  I thought about downloading it as you suggested and converting it to
work on files so it really can be honestly compared to Moffat's. But I
realize from your post that you would claim I did a poor job of
converting, so I will not do it. After all, you're the one making the
claim it's better than Moffat's. You're the one saying you compared it
to Moffat's. Yet you really only compared it to a modified version of
Moffat's code that you yourself modified.

David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer: I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptographic
system is only as strong as its weakest link"

David
1/4/2006 6:12:40 PM

NIGHTLIGHT what is wrong with you?

 You claim a coder better than the best arithmetic coders, yet
your only proof is your word, based on Moffat code you
modified yourself.

  Look, people here are willing to help those trying to
learn. Yet you never seem to answer real questions.

  Such as why you went to the trouble to modify Moffat's
code yourself and then proclaim your method is better
than the best present-day arithmetic encoders.

  Please stop with the ridiculous posts of references that
have nothing to do with this, unless you're trying to pull
the wool over people's eyes.

  Since we don't work for you and don't have to kiss
your ass for a job: if you really want to show it's better
and can compress to smaller sizes than Moffat's, then do
an honest test on unmodified code.

  Why is it you can change his code and put him down
with code that is not even completely his,
yet you seem unable to change your code to work with
the data his is designed for? Surely you have the
ability, with your team of people, to do that. Is it
that you're afraid other current arithmetic coders
can already compress most files better?

David
1/4/2006 11:07:16 PM
My tests on my time, your tests on your time.

nightlight
1/4/2006 11:08:59 PM
 You must be damn rich to afford time ownership.
   Can I share your secret to owning time?
  No entropy ..please.

Ignorant
1/6/2006 11:53:03 AM
> You claim a coder better than the best arithmetic,
> Yet your only proof is your word by using Moffat
> code you modifed yourself.

That's not the "only" proof. Analysis of the algorithms,
as sketched below (and covered in more detail in the
preprint & the tech reports), will give you the same answer,
showing it even more clearly. The source code which is
available allows anyone to test as they wish. The source
also allows examination, so that one can explain the results
from the code itself, rather than from the mathematical
description of the algorithms.

> Such as why you went to the trouble to modify Moffats
> code yourself and then proclaim your method is better
> than the best present day arithmetic encoders.

The main mod to their code was to make it code from memory
to memory instead of using the much slower C streams. The
alternative, adding stream in/out to the QI code, would simply
add a large and variable term and uncertainty to both coding
times, making the tests measure mostly irrelevant aspects (such
as how much time the C stream library takes under all OS &
buffering fluctuations).

Hence, regarding the speed tests, the removal of their
stream i/o has allowed much more accurate and reliable
measurements of the coding speed differences. If you wish
to measure the speed of your C stream i/o functions, with
some coding time added on, go ahead and test that. That
doesn't interest me.

The other mods were the selection of the coding mode and precision
via their own options (in the headers). The binary coder, order-0
mode was used. Their code for the higher order contexts was
commented out, so the coder could run faster (instead of checking
higher order model variables inside their coding loop). Also, all
their memory allocs were moved outside of their coding loop (just
their coding loop was timed, using the hi-res win32 timer).

As with the speed tests, the restriction to the order-0 model was
made because that is where the coders differ (since one can
always use the same model and the same encoding of the model
parameters with both coders on higher order models). Hence, the
optimum way to measure the coding efficiency _difference_ was to
remove any extra variables in which they don't differ. Again, if
you are interested in the modeling quality of the Moffat98 AC
implementation beyond order 0, go test that.

Of course, both aspects, the speed and the coding accuracy
advantage of QI, are self-evident to anyone who understands the
mathematical description of the two coding algorithms. QI is
always more accurate since AC is its direct approximation
(assuming you use both with 'everything else set the same way',
such as model, model parameter encoding, etc.). AC and QI use the
same type of addends, except that AC has a different normalization,
in which it rescales what is in QI an exact integer "path count"
at some point (x,y), by dividing it by QI's total "path count"
(e.g. the path count at the path endpoint B in Fig 1, p. 4). This
AC scaling turns QI's integers into unlimited binary fractions,
which AC then truncates to the given number of bits.

This truncation of the infinite fractions, even for a small number
of symbols (and which is absent in QI's integer format of addends),
is a loss of precision which leads to AC losing parts of its
coding interval in each step. If one were to use fractions of
infinite precision, all intervals would fit exactly next to each
other, without gaps. Since allowing intervals to overlap would
result in non-decodable output (by the Kraft inequality), any loss in
precision for specifying interval boundaries must leave unused
gaps in the output code space.

The basic arithmetic difference in coding is this extra loss of
precision for AC. A rough analogy would be as if two of us are
balancing some expenses and I use the exact integer number of cents
from the receipts, while you take the integers from the receipts
and divide them by some large total; then, since you will
generally get an infinite decimal fraction, you truncate it to
some number of places. Hence you're making an error even before
the first add, while my integer scheme won't have any error at
all (until the sum reaches a certain magnitude).

The QI.exe included with the source has a command "cbr" which
lists all such code space gaps for any n-symbol input, as well as
the cumulative redundancy in bits resulting from the gaps.
Another command, "ct", lists various types of redundancies for
the entire tables, examining every quantized binomial
coefficient, for blocks of up to 2^20 bits. In QI the Kraft
inequality is eq. (20) on page 8, and QI's loss of accuracy is
due to rounding up the integer addends once their magnitude
exceeds the number of bits QI uses. As explained on p. 8, the
error of QI's addends is the smallest one satisfying both the
given finite precision of g bits and the Kraft inequality
(eq. (20)).
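
As a toy illustration of that kind of round-up (my own sketch of the
idea, not QI's actual table representation): a value wider than g bits
is replaced by a g-bit mantissa times a power of two, rounded up, so
the quantized addend can only be >= the exact one:

#include <stdio.h>
#include <stdint.h>

#define G 8                          // toy precision, bits

// Round v up to the smallest m * 2^e >= v with m < 2^G.
static uint64_t quantizeUp(uint64_t v, int *e)
{ *e = 0;
    while (v >> G)                   // does not fit in G bits?
      {
      uint64_t dropped = v & 1;      // bit about to be shifted out
      v = (v >> 1) + dropped;        // halve, rounding up
      (*e)++;
      }
    return v;
}

int main(void)
{ int e;
  uint64_t m = quantizeUp(1000003, &e);
    printf("1000003 -> %llu * 2^%d = %llu (>= 1000003)\n",
           (unsigned long long)m, e, (unsigned long long)(m << e));
    return 0;
}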

Hence, I don't even need a test to know that, all else (model
etc.) set the same, QI will code at least as tightly as anything
else you may have now or in the future. The test is useful to
find out exactly how much the difference in coding accuracy
amounts to against some specific coder. Note also that AC has an
additional coding redundancy of about 1-2 bits (max is 2, avg
1.5) on the total output even for _infinite precision_ coders
(see [41] and [41a] I mentioned before).

For finite precision coders, and if AC is set to code in
"decrementing mode" (described in [34]), which is its most
accurate coding mode, the additional redundancy vs QI will be
about 2-3 bits on the total output (that's the best case when
using AC's "frugal bit" mode, which is how I have tested it).

For a non-decrementing finite precision AC, such as the usual
'adaptive AC', e.g. Moffat98, or for Rissanen's AC-76 (which
is a static AC), as shown in [T2] p. 22 for an order-0 source, there
is an additional AC redundancy which is approximately
(1/2)*log(2*Pi*p*q*n) bits, where p and q are the probabilities of
1's and 0's and n is the input size in bits. This redundancy is
due to AC approximating the enumeration itself (Stirling approx.
plus dropping of the square root factor), before any finite
precision is imposed. This is normally the dominant difference
and it is what all but the last row in the table on p. 10 of
[T3] show. For stationary order-0 binary inputs, an adaptive AC can
code using the Krichevsky-Trofimov (KT) estimator (see [41a], [40]),
which removes this worst case O(1/2*log(n)) redundancy by
largely accounting for the square root factor. That results in
lowering the adaptive AC redundancy for the low entropy inputs
(where the relative error was the greatest) and increasing it for
the higher entropy inputs (where the relative error was the
smallest), hence the KT estimator is a tradeoff.

Note that the last row on p. 10 compares entirely different
aspects of redundancy, the distinction between predictive vs.
descriptive methods of modeling (see [T2] pp. 30-31). Although
one might call it comparing apples and oranges, since it didn't
really compare the coders proper, the point was that each coder
was modeling the input as an order-0 source (while the input was more
complex), but each was modeling in its "native" mode -- QI in
descriptive and AC in predictive mode. The lesson of that
example, or any other with large and unpredictable changes in the
symbol frequencies, is that predictive coders pay a much greater
price than descriptive coders when the input is unpredictable
(relative to whatever model order they use, assuming they both
use the same order models).

We can similarly deduce the nature of the speed difference from
the mathematical descriptions of the algorithms. The QI index
recurrence is eq. (22), which for binary order-0 coding simplifies
to eq. (17), with quantized binomials C(n_j,j). Since QI keeps
these in a table, its coding via (17) consists in adding a table
value (a machine size word, 32 bits in the C code provided) to the
output buffer for every "least probable symbol" (1 by
convention), and no coding operation for the most probable symbol
(0 by convention).

In this same coding setup, the AC calculation does exactly the
same additions as (17), except that all its terms are rescaled
(normalized to total=1, as explained above, see [T2] pp. 22-23),
and that AC doesn't use a table to get its rescaled C(n,k) addends,
but computes them on the fly using multiplicative recurrences,
which in binary order-0 coding are of the form (these are regular
binomial identities):

    C(n,k) = C(n+1,k+1) * (k+1)/(n+1) ......  (1)

when symbol 1 is encoded and:

    C(n,k+1) = C(n+1,k+1) * (n-k)/(n+1) ......  (2)

when symbol 0 is encoded. The factor p=(k+1)/(n+1) in (1), which
is the ratio of the remaining count of ones, (k+1), to the total of
remaining symbols, (n+1), is interpreted within AC as the probability
of ones, and the factor q=(n-k)/(n+1) in (2) as the probability of
zeros at that same place. As explained in [T2] pp. 19-25, AC has
no choice here since its addends depend on the probabilities,
hence it can't have a single table (as QI does) which applies to
all probabilities. If you wanted to make it code faster using
tables to skip the most probable symbol, you would need a
separate table for each source probability. That makes such a
scheme quite impractical for AC (see [T2] pp. 19-21).
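
A quick numeric check of these two identities (a toy snippet of my own,
not from the package):

#include <stdio.h>

static double C(int n, int k)        // plain binomial coefficient
{ double r=1.0; int i;
    if (k<0 || k>n) return 0.0;
    for(i=1; i<=k; i++) r = r*(n-k+i)/i;
    return r;
}

int main(void)
{ int n=20, k=7;
    // identity (1): encoding a 1
    printf("%.0f %.0f\n", C(n,k),   C(n+1,k+1)*(k+1)/(n+1));
    // identity (2): encoding a 0
    printf("%.0f %.0f\n", C(n,k+1), C(n+1,k+1)*(n-k)/(n+1));
    return 0;                        // both lines print equal pairs
}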

Consequently, if both coders have processed n symbols containing
k ones, and have 20 zeros followed by a 1 ahead, QI simply picks
the next binomial C(n+20,k+1) from the table and adds it to the
index. The AC has to compute all 20 intervening binomials (for
the 0's), using the multiplicative recurrence of type (2), to
calculate the final rescaled addend for the last bit=1. That's
the basic difference in work between the coders which follows
from the math of the algorithms. You don't need any test to see
that if there were 500 zeros followed by a 1, AC would have done
500 multiplications via eq. (2), while QI has only formed the
position n+500 and picked the addend out of the table. The tests
only tell you how these obvious differences in the amounts of work
translate into differences in coding speeds. The table on page 10
shows, e.g., that for a 128 Kbit input with 8 ones, where you have on
average 16384 zeros between the ones, QI will execute 247
times faster (the result in the table is averaged over 500
inputs, each with a random placement of the 8 ones). As mentioned in
the readme.txt that comes with the source code, in this test the
generic QI was used, not the QI tuned for the sparse coding mode
(which is provided in the C source), which would have given it
another factor of 2-5 on such very sparse and long inputs.

In short, my tests were focused on the aspects in which the
coders differ _inherently_, as described above (since everything
else can be set the same if one wishes so). That's what I
measured and what I reported. If you're interested in something
else, such as the quality of the Moffat98 AC modeler of order 1, 2, ...,
or in the speed of your C stream i/o library, you can write the code
and test that. Neither question interests me, though.

Regarding the QI higher order modeling: while one can use the AC
modeling engine, that is a sub-optimal modeling scheme for QI, as
explained in [T2] pp. 26-35. The optimum division of labor
between the modeler and the coder is quite different in QI from
the one in AC (see [T2] pp. 30-31). QI's general purpose native
modeling engine is BWT (the bare BW transform, before MTF or
run-lengths or other entropy coding). That is all still research
in progress, so I'll leave it at that. No sense arguing
heuristics.

nightlight
1/6/2006 6:34:26 PM
-- Errata:

> 128 Kbit input, with 8 ones, where you have on
> average 16384 zeros between the ones

The figure 16384 should be replaced by 128*1024/9 = 14563.5... since the
8 ones produce 9 sections of zeros.

nightlight
1/6/2006 7:51:42 PM
nightlight wrote:
) That's not the "only" proof. Analysis of the algorithms,
) as sketched below (and covered in more detail in the
) preprint & the tech reports), will give you the same answer,
) showing it even more clearly.

This analysis is no different from any other analysis in that
you have to make lots of assumptions.  This means that if you use
such an analysis to make real-world predictions, then that depends
on how well your assumptions match the real world.



Because of the massive nature of your posts, I haven't been able to
find an answer to a few questions I have about the QI coder:

- if I read you correctly, it is not an adaptive coder,
so how do you transmit the model information for the QI coder ?

- how would you use a QI coder with an adaptive model ?

- assuming I have a stream of symbols, where at each position in
the stream, the probability distribution of the symbols is different,
then how does QI coder adapt itself to all those different distributions ?


(I have a few more questions, but stick to these for now.)


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
0
Willem
1/6/2006 8:01:39 PM
"nightlight" <nightlight@omegapoint.com> wrote in 
news:1136572988.625145.191290@z14g2000cwz.googlegroups.com:

> 
> Of course, both aspects, the speed and the coding accuracy
> advantage of QI, are self-evident to anyone who understands the
> mathematical description of the two coding algorithms. QI is
> always more accurate since AC is its direct approximation
> 

   I wish we had someone here who was an expert on both
algorithms, since your previous posts sure indicate to me that
you are no expert on arithmetic coding. I admit I know very
little about your QI; however, I know enough about arithmetic
coding to realize that proper optimal bijective file coding is
one of the best methods, and that it is an optimal method,
something you don't seem to understand from your various posts.
Even if QI could be used to make some sort of optimal file
compressor, it could never compress all files as well as an
optimal arithmetic coder. Since you can't grasp that simple
fact, you can't be an expert in arithmetic, so I doubt what you
think is self-evident has any relationship to reality.


   You seem to think your stuff is better than arithmetic when
used as an entropy encoder. Yet it appears this so-called neat
method of yours has yet to be tested in even a simple codec
that uses an entropy coder. Why is that?

   Even Matt did FPAQ0; can't you or your team do something
similar with QI, or is it too complex a task?


   I know you couldn't care less what I think. But many people
here would like to see real results. We get plenty of people
who can quote and talk about how good their stuff is, and
people here realize talk is cheap. They want to see REAL
RESULTS. Can you do that, or do you even care what the average
person here thinks of your method?

   Like I mentioned above, it appears you can modify Moffat,
which is not the best, but you picked what you wanted to pick
and then called it the best. I would continue asking new
questions, but for some reason you seem not to answer even the
simplest of questions.

David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/6/2006 10:39:48 PM
Willem wrote:
> nightlight wrote:
> ) That's not the "only" proof. Analysis of the algorithms,
> ) as sketched below (and covered in more detail in the the
> ) preprint & the tech reports) will give you the same answer,
> ) showing it even more clearly.
>
> This analysis is no different from any other analysis in that
> you have to make lots of assumptions.  This means that if you use
> such an analysis to make real-world predictions, then that depends
> on how well your assumptions match the real world.
>
>
>
> Because of the massive nature of your posts, I haven't been able to
> find an answer to a few questions I have about the QI coder:

I agree that anyone with enough time to read his massive posts
could instead read the paper and find the answers themselves.

I will try to answer some questions on the basis of my understanding of
what he has posted here (I have not read his papers yet).

> - if I read you correctly, it is not an adaptive coder,
> so how do you transmit the model information for the QI coder ?

He says that QI is based on enumerative coding, where models are
conceptually different from PPM models which we are more familiar with.


So it will probably mean that we will need to study EC from the
ground up and then see how QI fits in (at least I will have to,
as I am not familiar with EC) and how it relates to PPM models.

Or if someone can summarize the difference here for the people like me,
please do so.

> - how would you use a QI coder with an adaptive model ?

He said that QI is the "natural" choice for BWT post-processing. This
probably means that QI itself can't be used for higher order adaptive
coding, but that by using BWT, the higher-order-adaptive-modelling
problem can be reduced to something which QI can handle.

> - assuming I have a stream of symbols, where at each position in
> the stream, the probability distribution of the symbols is different,
> then how does QI coder adapt itself to all those different distributions ?

I don't know the answer to this one.

I hope this helps (my apologies to nightlight if I am mistaken
somewhere, feel free to correct me).

As for nightlight's comments on QI's speed, I am afraid that as the
modelling scheme for QI is different from the modelling scheme for
ArithCoding, we will need to compare the speed of "QI+its modelling
code" with "AC+its modelling code", where both models are of the same
order, or chosen to give the same compression ratio. (My doubt here is
"what if QI just *shifts* the computation burden to the modelling code
instead of reducing it".)

Sachin Garg [India]
http://www.sachingarg.com

0
Sachin
1/6/2006 10:54:36 PM
"Sachin Garg" <schngrg@gmail.com> wrote in
news:1136588076.767621.250110@g14g2000cwa.googlegroups.com: 

> 
> Willem wrote:
>> nightlight wrote:
>> ) That's not the "only" proof. Analysis of the algorithms,
>> ) as sketched below (and covered in more detail in the the
>> ) preprint & the tech reports) will give you the same answer,
>> ) showing it even more clearly.
>>
>> This analysis is no different from any other analysis in that
>> you have to make lots of assumptions.  This means that if you use
>> such an analysis to make real-world predictions, then that depends
>> on how well your assumptions match the real world.
>>
>>
>>
>> Because of the massive nature of your posts, I haven't been able to
>> find an answer to a few questions I have about the QI coder:
> 
> I agree that someone having enough time for reading his massive posts
> could rather read the paper and find answers himself.
> 

  If the paper is that small pdf file at his site, it does not answer
the simple questions being asked. So I don't see this as an answer.
And if it's some stuff where one would have to sign some nondisclosure
agreement, I think that too would be a waste of time, since the
questions asked of him are not that hard.

> I will try to answer some questions on the basis of my understanding
> of what he has posted here (I have not read his papers yet).
> 
>> - if I read you correctly, it is not an adaptive coder,
>> so how do you transmit the model information for the QI coder ?
> 
> He says that QI is based on enumerative coding, where models are
> conceptually different from PPM models which we are more familiar
> with. 
> 
> 
> So it will probably mean that we will need to study EC from ground up
> and then see how QI fits in (atleast I will have to as I am not
> familiar with EC) and how it relates to PPM models.
> 
> Or if someone can summarize the difference here for the people like
> me, please do so.
>

   He seems to believe arithmetic compression is a poor approximation
of EC compression. Here is a summary: take two symbols, one and zero;
suppose you have 2 ones and 2 zeros, then there are 4!/(2!*2!) = 6
combinations. His method would assign the 6 possible numbers to this
problem, so you could say it's exact, at least while the numbers are
small. But the paper really doesn't say how to do compression. You
have to read various other papers that show how to assign a number
to a combination.

  All he seems to do is get a number for a combination.
He has in his paper the example of 00101001, for which he calculates
the value as 2+6+35 = 43; this was done in the long post, message
59, at

http://groups.google.com/group/comp.compression/browse_frm/thread/ffc7f7ca84c76378/30566aec6aa7d363?q=nightlight&rnum=1#30566aec6aa7d363

   Big deal, that's not compression. He states there:

"The answer is on page 6, where it shows the string index
I(S8)=2+6+35=43 and how to calculate it (eq. 17). The actual size in
bits of the index is L=log(56)=5.807... bits since the valid values of
the index are 0..55 (there are 56 paths from A to B). The fractional
bits of L don't normally go to waste since they are coded in the mixed
radix with other items sent, usually with the fractional bits of
indices for other blocks (cf. p.9 [N4]). The encoder also sends the
count of 1's per block, which is 3 here and the length of the entire
string which is 8. The latter two items get coded in variety of ways in
different variants and different parts of the coder/s and experimental
prototypes (cf. p.9 [N6])."

  This means he never compares it to an arithmetic compressor. He only
claims that it's better. I think if he could actually compress the one
example he gives in the paper, then he might be more believable. But he
wants to say it's exact, yet he can't be pinned down on any application
that compares it to an arithmetic coder. Not having any simple example
done in a complete way saves him from being shown arithmetic coders
that beat his method. He leaves it to you to do the coding, at which
point he could claim you didn't do it the right way. Actually it's a
clever way to prevent it from being compared to any real-world entropy
compressor.

  So the only thing I got out of the paper, besides the fact that he
never completes anything, is that for a string made of only ones and
zeros the compressed encoded result is 3 items that are easy to combine
(just don't ask him how to combine them, since he seems unlikely to do so):
1) the length of the entire string
2) the number of ones
3) the index value.

He doesn't risk actually combining these into an output string;
it's possible he does not want to risk being laughed at, or he
fears one could show how a simple bijective string compressor gets
better results. If you can think of any other reason no examples
like this are done to the end, please Sachin tell us what you think
it is.
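
For reference, the ranking behind the quoted I(S8) = 2+6+35 = 43 is
easy to reproduce. Here is a minimal sketch (an illustration only, not
code from the QI package) that scans the string left to right and, at
the k-th one found after n earlier symbols, adds C(n,k):

//-- Minimal sketch (not code from the QI package): plain enumerative
//   ranking of a binary string, reproducing I(S8) = 2+6+35 = 43.

#include <stdio.h>
#include <string.h>

static unsigned long binomial(int n, int k)  /* exact C(n,k), small n  */
{ unsigned long c = 1; int i;
  if (k < 0 || k > n) return 0;
  for (i = 1; i <= k; i++)                   /* C(n,k)=prod (n-k+i)/i  */
    c = c * (unsigned long)(n - k + i) / (unsigned long)i;
  return c;
}

static unsigned long ec_rank(const char *bits, int *ones)
{ unsigned long I = 0; int n, k = 0;
  for (n = 0; bits[n]; n++)
    if (bits[n] == '1')
      I += binomial(n, ++k);                 /* add C(n,k) at k-th one */
  *ones = k;
  return I;
}

int main(void)
{ const char *s = "00101001"; int k;
  unsigned long I = ec_rank(s, &k);
  printf("index %lu of %lu, ones %d, length %d\n",
         I, binomial((int)strlen(s), k), k, (int)strlen(s));
  /* prints: index 43 of 56, ones 3, length 8 */
  return 0;
}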


   

>> - how would you use a QI coder with an adaptive model ?
> 
> He said that QI is the "natural" choice for BWT post-processing. This
> probably means that QI itself cant be used for higher order adaptive
> coding but by using BWT, the higher-order-adaptive-modelling problem
> can be reduced into something which QI can handle.
> 
>> - assuming I have a stream of symbols, where at each position in
>> the stream, the probability distribution of the symbols is different,
>> then how does QI coder adapt itself to all those different
>> distributions ? 
> 
> I dont know the answer to this one.
>

  Maybe it depends on the definition of "natural".
 
> I hope this helps (my apologies to nightlight if I am mistaken
> somewhere, feel free to correct me).
>

   I suspect he will paste in lots of links to various papers; that
seems to be his style. So don't hold your breath for useful answers.
They may or may not be related to your questions.
 
> As for nightlights's comments on QI's speed, I am afraid that as the
> modelling scheme for QI is different from modelling scheme for
> ArithCoding, we will need to compare speed of "QI+its modelling code"
> with "AC+its modelling code". Where both models should be of same
> order, or chosen to give same compression ratio. (My doubt here is
> "what if QI just *shifts* computation burden to modelling code instead
> of reducing it".)
> 
> Sachin Garg [India]
> http://www.sachingarg.com
> 



David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/7/2006 2:15:24 AM
nightlight wrote:
> This truncation of the infinite fractions even for a small number
> of symbols (which is absent in QI's integer format of addends),
> is a loss of precision which leads to AC losing parts of its
> coding interval in each step. If one were to use fractions of
> infinite precision, all intervals would fit exactly next to each
> other, without gaps. Since allowing intervals to overlap would
> result in non-decodable output (by Kraft inequality), any loss in
> precision for specifying interval boundaries must leave unused
> gaps in the output code space.

Discarding part of the range is one way to deal with finite precision,
for example the carryless rangecoder in ppmd.  However the various
coders in paq6, paq7, fpaq0, etc. do not have any gaps in the code
space.  These are carryless binary arithmetic coders with 32 bits
precision and 12 bit representation of probabilities.  They output a
byte at a time so most of the time the range is represented with 24
bits precision.  Coding loss occurs due to rounding of the probability.
 In the worst case the range is 2 and the probability is forced to 1/2,
but this is rare.  The practical effect is to increase the compressed
size by 0.0001 bpc on typical inputs.
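
For readers who have not looked inside such coders, here is a rough
sketch of that kind of carryless binary encode step (an illustration
only, not the actual fpaq0/paq source): the two subranges tile the
current range exactly, so there are no gaps, and the only precision
loss is the rounding of xmid.

#include <stdio.h>
#include <stdint.h>

static uint32_t x1 = 0, x2 = 0xffffffff;    /* current range [x1, x2]  */

static void encode_bit(int bit, uint32_t p1)  /* p1 = P(1) in 0..4095  */
{ uint32_t xmid = x1 + (uint32_t)(((uint64_t)(x2 - x1) * p1) >> 12);

  if (bit) x2 = xmid;           /* keep the lower part  [x1, xmid]     */
  else     x1 = xmid + 1;       /* keep the upper part  [xmid+1, x2]   */

  while ((x1 ^ x2) < (1u << 24)) {          /* leading byte is settled */
    putchar((int)(x2 >> 24));               /* emit it, keep >=24 bits */
    x1 <<= 8;
    x2 = (x2 << 8) | 0xff;
  }
}

int main(void)
{ int i;
  for (i = 0; i < 8; i++)
    encode_bit(i & 1, 3686);    /* code a few bits with P(1) ~ 0.9     */
  /* a real coder would also flush the remaining bytes of x1 at the end */
  return 0;
}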

The coding error can be made arbitrarily small with little effort.  In
paqar and pasqda the coder includes a carry counter and outputs a bit
at a time, so the range always has at least 30 bits of precision.
Redundancy due to rounding error e in the probability is O(e^2), or
about 2^-60.  If this is still too big, you could go to 64 bit
arithmetic and reduce the redundancy to about 2^-124.

I am sure there are applications for QI, but in the PAQ series even a
perfect coder would have negligible effect on both compression ratio
and speed, since CPU time is dominated by modeling.  Being restricted
to an order 0 model seems like a severe disadvantage.  How would you
transform a context mixing model to order 0?

-- Matt Mahoney

0
Matt
1/7/2006 3:42:31 AM
Hi David,

How is your health these days? I hope you are doing better now.

> >> Because of the massive nature of your posts, I haven't been able to
> >> find an answer to a few questions I have about the QI coder:
> >
> > I agree that someone having enough time for reading his massive posts
> > could rather read the paper and find answers himself.
> >
>
>   If the paper is that small pdf file at his site it does not anwser
> the simple questions being asked. So I don't see this as an anwser.
> And if its some stuff where one would have to sign some nondiscloser
> agreement I think that to would be a waste to time. Since the questions
> asked of him are not that hard.

Oh, I had presumed that hidden in all the papers he links to there
would be answers to the questions here. I didn't realize that they have
only incomplete examples (maybe complete from the QI perspective, but
incomplete from the compression perspective, which makes them useless
at least for us).


> >> - how would you use a QI coder with an adaptive model ?
> >
> > He said that QI is the "natural" choice for BWT post-processing. This
> > probably means that QI itself cant be used for higher order adaptive
> > coding but by using BWT, the higher-order-adaptive-modelling problem
> > can be reduced into something which QI can handle.
>
>   Maybe it depends on the defination of "natural"

I guess what he meant was "more efficient than MTF etc."... Anyway, we
can leave discussion of this; hopefully he will come up with a
BWT-based compressor implementation to prove his point.


Sachin Garg [India]
http://www.sachingarg.com

0
Sachin
1/7/2006 8:04:48 AM
["Followup-To:" header set to comp.compression.]
On 2005-12-29, nightlight <nightlight@omegapoint.com> wrote:
> I received some questions about the source and the algorithm via email
> and since the matter may be of interest to others who downloaded it, I
> will
> reply here as well:
>
>> Concerning the source, I am afraid it is of little use in this form.
>> I don't run windows here, and I do not feel like I would really need it.
>> Thus, if you'd like to present your work, it would be most useful to
>> depend on ANSI-C only.
>
> The main nonstandard VC6/win32 C use in the source has to do with the
> nanosecond timer & thread control (in Qiutl.c). 

the thread control functions are never called.

you have arrays with no length declared in the headers
(I'm guessing they should be declared extern)....
the timer stuff I replaced with calls to the POSIX gettimeofday()
(I figure microsecond precision is close enough)

I turned off the ASM, renamed the files to match the names given in the
source, produced stubs for the conio calls you make, wrote a makefile,
and replaced the integer types with names from <stdint.h>. It compiles
and does something.

what is all that output supposed to mean, or more to the point,
what do I do to get efficiency statistics, elapsed time, compressed size,
that sort of stuff?

Also I found some printf()s (for error conditions) with the wrong number
of arguments and a few other weirdnesses.


another problem with your code is that as it stands it seems that it only
tests your algorithm's ability to compress pseudo-random data...
pseudo-random data is theoretically extremely compressible.

Bye.
   Jasen
0
Jasen
1/7/2006 9:53:19 AM
"nightlight" <nightlight@omegapoint.com> wrote in 
news:1136572988.625145.191290@z14g2000cwz.googlegroups.com:


.....

> infinite precision, all intervals would fit exactly next to each
> other, without gaps. Since allowing intervals to overlap would
> result in non-decodable output (by Kraft inequality), any loss in
> precision for specifying interval boundaries must leave unused
> gaps in the output code space.
> 
> The basic arithmetic difference in coding is the extra loss of
> precision for AC. A rough analogy would be as if two of us are
> balancing some expenses and I use exact integer number of cents
> from the receipts, while you take integers from the receipts
> and divide them by some large total, then, since you will
> generally get an infinite decimal fraction, you terminate it to
> some number of places. Hence you're making an error even before
> the first add, while my integer scheme won't have any error at
> all (until the sum reaches certain magnitude).
> 
> The QI.exe included with the source has a command "cbr" which
> lists all such code space gaps for any n-symbol input, as well as
> the cumulative redundancy in bits resulting from the gaps.
> 

.....


   You don't seem to grasp the obvious. There are arithmetic coders
that have zero gaps. You don't know what you are talking about.
One such coder is arb255.exe. You seem to repeat useless stuff over
and over without actually thinking.  You can't answer simple
questions or provide simple answers. Who are you kidding?



David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/7/2006 2:19:31 PM
nightlight wrote:
) This truncation of the infinite fractions even for a small number
) of symbols (which is absent in QI's integer format of addends),
) is a loss of precision which leads to AC losing parts of its
) coding interval in each step. If one were to use fractions of
) infinite precision, all intervals would fit exactly next to each
) other, without gaps. Since allowing intervals to overlap would
) result in non-decodable output (by Kraft inequality), any loss in
) precision for specifying interval boundaries must leave unused
) gaps in the output code space.

This paragraph clearly demonstrates that you do not understand well
enough how Arith Encoding works.  Any decent AC does *not* lose parts
of its coding interval each step.  Try to get that through your head.
It *is* possible (hell, it's quite easy) to get the intervals to line
up exactly without infinite precision.


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
0
Willem
1/7/2006 2:39:04 PM
"Matt Mahoney" <matmahoney@yahoo.com> wrote in 
news:1136605351.691283.112110@g47g2000cwa.googlegroups.com:

> From: "Matt Mahoney" <matmahoney@yahoo.com>
> Newsgroups: comp.compression,sci.math
> 
> nightlight wrote:
>> This truncation of the infinite fractions even for a small number
>> of symbols (which is absent in QI's integer format of addends),
>> is a loss of precision which leads to AC losing parts of its
>> coding interval in each step. If one were to use fractions of
>> infinite precision, all intervals would fit exactly next to each
>> other, without gaps. Since allowing intervals to overlap would
>> result in non-decodable output (by Kraft inequality), any loss in
>> precision for specifying interval boundaries must leave unused
>> gaps in the output code space.
> 
> Discarding part of the range is one way to deal with finite precision,
> for example the carryless rangecoder in ppmd.  However the various
> coders in paq6, paq7, fpaq0, etc. do not have any gaps in the code
> space.  These are carryless binary arithmetic coders with 32 bits
> precision and 12 bit representation of probabilities. 

    
   Matt, I think you are correct in saying the coder used in fpaq0 has
no gaps.  But the overall code it produces does have gaps due to the
modeling. It has nothing to do with the argument with nightlight,
since I am not sure he has a grasp of arithmetic at all. I am not just
talking about the file endings; I am talking about gaps that exist
throughout the whole output file because of the model.
   In your model you use 9 bits for every 8 bits of data, where the
starting bit is a ZERO for each byte, and then for EOF you allow the
starting bit to be ONE and stop the compressor. This does not allow
all the code space to be used.
   As a result of this modeling every file compressed with FPAQ0 has
the first bit of the first byte set on output, so technically the
first bit out is a total waste of space. However it's not that bad as
far as the total number of extra bits goes. Most add the waste purely
at the end or with a count field at the beginning. Without all the
complications of a true bijective compressor you could drop back to 8
bits per symbol, and then when marking the compression done you just
start with a pretend bit for the EOF such that X2 changes, then flush
as before. On decompression you check for EOF at the start of each new
byte if all the bytes of the archive have been read. If not, just
continue in the loop. When checking you can calculate exactly when to
stop; in fact you can do a free check to see whether the file is
actually the result of the compression. You don't have a bijective
file compressor at this point, but it's within a byte or two even for
very long files.
   To go the extra step for a full bijective file compressor you would
have to do what arb255.exe did, which is basically to use the last bit
that is a one in the file as the EOF indicator, or, if the last byte is
all zeros or a zero followed by a tail of 100..'s, to add in
bijectively a last bit that is a one, which takes a lot of overhead
for just 2 bytes or so of savings, at a cost of time.

   Let me state that your model does not cost more than what other
people do; in fact it is slick. However the cost for an N byte file
being compressed is -lg(1/2) - lg(2/3) - ... - lg(N/(N+1)) bits for
the zero codings, plus the length of the 1 coding at the end, which is
-lg(1/(N+2)) bits; the zeroes add up to -lg(1/(N+1)) bits. For a file
100,000 bytes long this is 16.6096549013 bits due to the zeroes and
16.6096693280 bits due to the one, for a total of 4 extra bytes that
are not needed. The cost of the zeros can be totally eliminated with
very little change to FPAQ0; the cost of the One would be the current
cost of a one in the table, and could be no greater than what you are
currently using. This method would allow dense packing and would fully
use the compressed space without gaps, except for the very last bytes.
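
As a quick check of that arithmetic (an illustrative snippet, nothing
from fpaq0 itself): the overhead comes out to lg(N+1) + lg(N+2), about
33.2 bits for N = 100,000, i.e. about the 4 extra bytes noted above.

//-- Quick numeric check of the flag-bit overhead estimate above
//   (illustrative only; compile with -lm).

#include <stdio.h>
#include <math.h>

int main(void)
{
  double N     = 100000.0;            /* bytes in the uncompressed file */
  double zeros = log2(N + 1.0);       /* -lg(1/2)-lg(2/3)-..-lg(N/(N+1)) */
  double one   = log2(N + 2.0);       /* -lg(1/(N+2)) for the final flag */
  printf("zeros: %.10f bits\n", zeros);   /* 16.6096549013 */
  printf("one:   %.10f bits\n", one);     /* 16.6096693280 */
  printf("total: %.1f bits = %.1f bytes\n", zeros + one, (zeros + one) / 8);
  return 0;
}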


David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/7/2006 3:17:10 PM
Willem <willem@stack.nl> wrote in 
news:slrndrvkk7.188o.willem@toad.stack.nl:

> nightlight wrote:
> ) This truncation of the infinite fractions even for a small number
> ) of symbols (which is absent in QI's integer format of addends),
> ) is a loss of precision which leads to AC losing parts of its
> ) coding interval in each step. If one were to use fractions of
> ) infinite precision, all intervals would fit exactly next to each
> ) other, without gaps. Since allowing intervals to overlap would
> ) result in non-decodable output (by Kraft inequality), any loss in
> ) precision for specifying interval boundaries must leave unused
> ) gaps in the output code space.
> 
> This paragraph clearly demonstrates that you do not understand well
> enough how Arith Encoding works.  Any decent AC does *not* lose parts
> of its coding interval each step.  Try to get that through your head.
> It *is* possible (hell, it's quite easy) to get the intervals to line
> up exactly without infinite precision.
> 
> 
> SaSW, Willem

  It is his inability to grasp this simple, well understood fact that
makes one wonder if he understands the subject field at all. Of course
you can expect his usual reply with lines and lines of quoted text
having nothing to do with this fact, which he either will not or
cannot seem to grasp. He reminds me of the current crop of students
who are never corrected when they make common errors, so they just go
on making ever larger errors while never having learned how to learn
from mistakes.


David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/7/2006 3:25:13 PM
 > This analysis is no different from any other analysis
 > in that you have to make lots of assumptions. This
 > means that if you use such an analysis to make
 > real-world predictions, then that depends
 > on how well your assumptions match the real world.

I see now where you're misreading my redundancy
statements in this and in the earlier thread. The
redundancy I was deriving in [T3] at the end of page 8,
and talking about here, is simply the number of
excess bits that the finite precision arithmetic
used by QI will add to the unlimited precision
enumerative code. The paragraph on page 8, after all,
starts with "To obtain QI redundancy d(g) _due to
SW rounding_ in (21)..." The upper bound for
_this_ redundancy d(g) obtained there is:

       d(g) < log(e)/2^(g-1) .... (1)

No model or source assumptions are needed for (1), and
none is reflected in (1). The only parameter in (1) is
the assumed arithmetic precision of g bits (the SW
mantissa length). The eq. (1) is simply an upper bound
on any additional bits relative to the exact EC, produced
by QI due to its use of SW rounding up in eq. (21). It has
nothing to do with how well EC or QI will code in any
given setup (model, source, parameter coding method, etc).
If the exact EC model is perfect in a given setup, then
(1) shows the maximum by which QI can fall short of
that "perfect" output. If EC is coding using an imperfect
model, resulting in some redundancy of R bits per symbol
relative to the best model, (1) shows the maximum that
QI can add to R. But (1) doesn't care or predict what
R is. The two types of redundancy are completely unrelated.

The question of how well the exact EC and AC can code in
different coding setups, is an entirely different and a
much larger topic, well covered in the literature, starting
with Davisson's [28] and many other papers since.

Krichevsky-Trofimov's paper [33] provides great many
bounds for variety of coding setups. Some related later
results are in [40],[41],[41a]-[41d]. The basic result
is that even for the unlimited precision coders (and
coding under 'everything else set the same') the exact
EC has a slightly lower redundancy than the exact AC (by
approximately 1 bit for the entire input, for max & avg).
This is the same difference as between the Huffman and
the Shannon-Fano prefix codes. Even the origin of the
difference is the same: the  bottom-up addend construction,
as done by QI and Huffman, is tighter than the top down
addend construction, as done by AC & Shannon-Fano.

Now to your main questions, starting with the last one,
which is the most specific:

 > assuming I have a stream of symbols, where at each
 > position in the stream, the probability distribution
 > of the symbols is different, then how does QI coder
 > adapt itself to all those different distributions ?

This is the scenario of AC modeling engine feeding QI,
which was sketched in note N7 on p. 9, [T3]. Two ways
are described there:

a) --- QI coding AC style

QI can code the AC style by performing "Lattice
Jumps". It is simplest to see how this is done by looking
at the decoder, arriving at some point B=(x,y) (see
Fig. 1, p. 4, [T3]). The path count at B is N(x,y)=56.
The index at point B can have values 0..N(x,y)-1, hence
the length of the index at B is log(N(x,y))=log(56) bits.
If bit=1 gets decoded (as shown by path on Fig 1), the
decoder moves up, to point BA=(x,y-1), which has the
path count 21, hence the index has length log(21) bits.
Hence, upon decoding bit=1 at B, the index length has
dropped by log(56)-log(21)=log(8/3) bits, which is
precisely the ideal code length log(1/p) for bit=1
at B, where p=3/8=probability of 1 at B. If bit=0
gets decoded at B, decoder moves to point BL=(x-1,y)
where path count is N(x-1,y)=35, hence the index
length is log(35). In this case the index length
drops by log(56)-log(35)=log(8/5) which is exactly
same as the ideal code length log(1/q), where q=5/8
is probability of 0 at B. It is easy to see from
multiplicative recurrences for binomial coefficients
(eq's (1) & (2) from the previous post here) that
this pattern always holds - after every decode step,
the index length drops by exactly log(1/P), where P
is the probability of the decoded symbol. Analogous
relation holds for each encode step, where the index
length increases by the ideal code length of the
encoded symbol at that point. Note also that due to
integer arithmetic, this is not an approximate
optimality (such as one would get using truncated
infinite fractions, as AC does). With QI/EC, this
coding optimality at every point is built into
the table entries. { You can check the quantization
errors using e.g. QI.exe cbr n36, which shows no
quantization errors for n=36 (or below), and with
n=37, the 1st error for k=16, of just +1 in the
SW mantissa which adds 4e-10 bits to the index.}
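
A quick numeric check of those index-length drops (a sketch using
exact binomials rather than the quantized SW table):

#include <stdio.h>
#include <math.h>

static double binomial(int n, int k)
{ double c = 1.0; int i;
  for (i = 1; i <= k; i++)
    c = c * (n - k + i) / i;
  return c;
}

int main(void)
{
  int x = 5, y = 3;                        /* point B: 5 zeros, 3 ones  */
  double NB  = binomial(x + y, y);         /* path count at B   = 56    */
  double NBA = binomial(x + y - 1, y - 1); /* after decoding 1  = 21    */
  double NBL = binomial(x + y - 1, y);     /* after decoding 0  = 35    */
  double p   = (double)y / (x + y);        /* P(1) at B = 3/8           */

  printf("log(56)-log(21) = %f,  -log2(p) = %f\n",
         log2(NB) - log2(NBA), -log2(p));        /* both 1.415037       */
  printf("log(56)-log(35) = %f,  -log2(q) = %f\n",
         log2(NB) - log2(NBL), -log2(1.0 - p));  /* both 0.678072       */
  return 0;
}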

With QI, for a general point B=(x,y), the quantized
path count L(x,y) (computed via (21)) is an SW integer
with a g-bit mantissa w(x,y) and exponent e(x,y).
The ideal code lengths and ratios for the steps
from B described above still hold, but only within
the limits d(g). In particular, L(x,y-1)/L(x,y) is
approx. =p=y/(x+y) and L(x-1,y)/L(x,y)=q=x/(x+y).

The index at B will have for the leading g bits
at the bit offset e(x,y) some g-bit integer Iw(x,y)
which is in the interval [0,w(x,y)-1] (this is a
simple consequence of the index at any point
ranging from 0 to path count-1 and the fact that
quantized path count L(x,y) has trailing zeros
after the leading g bits given by w(x,y), hence
L(x,y)-1 will decrement w(x,y)). We can thus view
for any point B(x,y) and index I(x,y), the Iw(x,y)
as a digit in the radix w(x,y).

Suppose now, the decoder at B=(x,y) gets from the modeler
some probabilities p' and q' different from p,q. To
continue decoding, the decoder makes a jump to another
lattice point B'=(x',y') where y'/(x'+y')=p' and
x'/(x'+y')=q'. One can use Farey fractions (see
[F]) to obtain the optimum such point for any
given binomial table size. Alternatively, one
can simply jump to another point on the same
front i.e. one would keep n fixed, x+y=n=x'+y'
and select point B' using x'=n*p'. The path
count at B' is L(x',y') with mantissa w(x',y')
and exponent e(x',y'), which are different from
w(x,y) and e(x,y). The exponent is easy to adjust:
you simply change the presumed position of the
least significant bit of the index I(x',y') (this
is the origin, A on Fig 1., toward which decoder
is heading, but hasn't reached yet since there
are more symbols to decode; in the QI source
code in file EncDec.c this presumed origin of the
index is given as argument "sb" to function qiDec()).

The main work is with the difference in the path
count mantissas w(x,y) and w(x',y') at B and B'.
Namely at B' the leading g bits of index Iw(x',y')
have to be a digit in the radix w'=w(x',y'). But we
only have a g-bit digit left over from B which is
in the radix w=w(x,y). So, the problem here is
that of radix conversion -- we have a digit Iw
in radix w and we need a digit Iw' in radix w'.
There are several ways to do this. A conceptually
simple one is as follows: decoder extracts the
digit Iw and encodes it as digit of some mixed radix
output integer M, which serves as an accumulator or
recycler for all such 'orphaned' Iw digits. The bits
of M (which are an arbitrary bit pattern, being a
binary form of a mixed radix integer) can simply be
reused, e.g. by growing M at the unprocessed end
of the compressed input (or just having M as separate
component). At this stage the encoder would have
done the opposite - it would have "decoded" (see
file Radix.c, function dec_radix()) the far end
of the compressed data (which was an arbitrary
binary pattern) into a digit Iw in radix w and
concatenated it to the leading end of the index.
There are other similar ways to perform this
radix conversion, all of them using amount of
processing per symbol very similar to the
conventional AC algorithm. They all also have
to perform explicit coding/decoding operations
(which include mul/div) for both, the most and
the least probable symbols, just as AC does.

The summary of this is that if you want the AC
modeling plus AC coding style, you get the AC
speed and the AC accuracy. The AC scheme, with its
'single next symbol probability' bottleneck interface
between the modeler & the coder (where the modeler
micro-manages the coder, symbol by symbol, and where
the whole coder+ modeler processing and interaction
is traversed from top to bottom on every symbol)
is simply intrinsically a poor division of labor
to allow for any high performance coding.

It is analogous to organizing car manufacturing,
and requiring that the next car can be started
only after the current car is complete and out
the door. That's a kind of conceptual constraint
imposed by the AC modeling "paradigm" as its
so-called "online" coding requirement. This
online" is taken to mean some kind of analog,
memoryless, CPU-less, Morse telegraph device.

That has nothing to do with the actual online
as it is done, or any actual requirements or
inherent design constraints. One normally has
a fairly large buffer space and processor which
can access any of it, running programs of high
complexity. Internet would grind to a halt if
its protocols interpreted "online" as a constraint
to have to send or receive a single bit (or a
single symbol) at a time. Even the old style
point-to-point modems had several KB buffers
to accumulate, batch and compress the data.
And similarly for disk sectors & clusters.

The point of the above critique of the present
AC "paradigm" is that it is simply a gratuitous,
historically accidental conceptual bottleneck and
an intrinsic performance drain. Any algorithm that
follows its prescribed division of labor will bog
down. Once you snap out of its "online" spell,
many better possibilities open up. For example,
even retaining the AC modeling engine, with its
"probability of the next single symbol" bottleneck
parametrization of all the information about the
sequence being encoded, but just allowing the coder to
ignore the imagined "online" constraint, one can
get much better performance with QI as follows:

b) --- QI Modeling AC style

QI breaks the probabilities into classes, so that
each class includes an interval of probabilities of
size 1/sqrt(n), where n is the size of data to be
encoded. Since coder doesn't assume any more the Morse
telegraph kind of "online", it doesn't assume n is 1
but some much larger number. The modeler is still left
to work under the old "online" spell and imagine that
it has to convert, by hook or crook, all it knows
or that it could know about the input sequence into
the probabilities for the next single symbol p(c).

Consider now a binary sequence of n symbols, for which
the modeler produces, symbol by symbol probabilities
p of bit=1, with p in some interval D=[a,b), of size
d=b-a. We divide D into s=sqrt(n) equal subintervals
of lengths d/s. Each input symbol is assigned to
one of s enumerative classes (thus enumerated by a
separate index) based on the subinterval in which
the modeler's p at that point falls in. Hence, we're
using quantized probabilities to classify the symbols
as "equiprobable". The excess output E in bits per
symbol due to this 'p quantization' is about
(cf. [41c], p. 8):

     E = [dp^2/p + dq^2/q]*log(e)/2 .... (2)

where dp & dq are the quantization errors of p and q. Since
dp & dq are <= 1/s, E <= log(e)/(2npq) = O(1/n). Note
that since adaptive AC probability estimates also have
sampling errors dp, dq of order 1/sqrt(n), this redundancy
is of similar size to that of the adaptive AC. One can
further optimize this method (to reduce its worst case)
by selecting non-uniform partition of interval D, so that
the subintervals around smaller probabilities are shorter.
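
A minimal sketch of that classification step (an illustration only;
the per-class enumerative indexes and their encoding are omitted, and
all names here are made up rather than taken from the QI source):

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Map the modeler's probability p in [a,b) to a class id in 0..s-1. */
static int prob_class(double p, double a, double b, int s)
{
  int c = (int)((p - a) / (b - a) * s);
  if (c < 0)  c = 0;                    /* clamp rounding at the edges */
  if (c >= s) c = s - 1;
  return c;
}

/* Count how many of the n symbols land in each class; a real coder
   would instead append each symbol to that class's enumerative index. */
static int *classify_block(const double *p, int n, double a, double b)
{ int s = (int)ceil(sqrt((double)n));   /* s ~ sqrt(n) classes          */
  int *count = (int *)calloc((size_t)s, sizeof *count);
  int i;
  if (count != NULL)
    for (i = 0; i < n; i++)
      count[prob_class(p[i], a, b, s)]++;
  return count;                         /* caller frees                 */
}

int main(void)
{ double p[6] = { 0.11, 0.42, 0.40, 0.13, 0.12, 0.44 };
  int i, *cnt = classify_block(p, 6, 0.0, 0.5);
  if (cnt) {
    for (i = 0; i < 3; i++)             /* ceil(sqrt(6)) = 3 classes    */
      printf("class %d: %d symbols\n", i, cnt[i]);
    free(cnt);
  }
  return 0;
}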

In practical situations, the AC modeler would be producing
its predictions p(c) based on statistics from the processed
part of the input, hence its probabilities would already
have a built in sampling error interval (which decreases
as 1/sqrt(n)), which can be used by the QI coder as the
partition criteria for the enumerative classes (instead of
an ad hoc partition described above). Various existent
methods for growing and splitting the contexts based on
such past statistics, such as CTW or Rissanen's Context,
would transfer here as methods for generating enumerative
classes adaptively.

For the multi-alphabet case one would perform the decomposition
described in [T1] pp. 31-38, with the only difference that instead
of combining the symbol counts k(c) based on symbol's binary
code c, one would combine the modeler's probabilities p(c).

A special case of interest for this method are the finite
order Markov sources. Here, for order m, the probabilities
of the next symbol are defined by the m previous symbols.
For smaller m, one could simply bypass the computation
of probabilities (since QI doesn't need them) and simply
assign the enumerative class for the next input symbol
directly: using m previous symbols as the class tag
(hence there would be 2^m classes in binary case).
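
For the binary case that shortcut can be as simple as the following
sketch (illustration only; ctx and class_of_next_bit are made-up
names, not from the QI source):

//-- The enumerative class of the next bit is just the previous m bits,
//   giving 2^m classes, with no probabilities computed anywhere.

static unsigned ctx = 0;                /* last m bits seen, initially 0 */

static unsigned class_of_next_bit(int bit, int m)
{
  unsigned class_id = ctx;              /* class under which 'bit' codes */
  ctx = ((ctx << 1) | (unsigned)bit) & ((1u << m) - 1u);
  return class_id;
}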

In this case we can notice another advantage of QI/EC
coder for modeling over the AC: to encode a symbol QI
needs to know only whether the symbol has the same or
different probabilities as some other symbols, but unlike
AC, QI doesn't also need to know what values these
probabilities have. Hence, QI places much lower demand
on the modeling engine, since the modeler here can simply
pass on to QI the context ID (the last m symbols) and
QI will code the symbol into the index for that ID,
whatever its probability may be.

In conclusion for this method (b), QI can use AC modeling
engine, with its full speed advantage over AC, and with
the redundancy being same as that of an adaptive AC in
the same setup.

 > how would you use a QI coder with an adaptive model ?

Sections (a) and (b) above have couple answers.

 > if I read you correctly, it is not an adaptive coder,
 > so how do you transmit the model information for the
 > QI coder ?

It is not "adaptive AC", but as illustrated in (a),(b)
above, it can function that way. The native QI modeling
is descriptive, in the sense of Rissanen's MDL. So the
QI model information is a much wider type of information
than just a list of probabilities (although it can be
that, too).

Consider order-0 QI coding. The modeling analogous to
the order-0 adaptive AC, except more resilient against
"surprises", becomes here the selection of a
segmentation of the input into contiguous sections,
based on measured symbol frequencies. Its resilience
to surprises is illustrated in the row "Vary" (cf. table
on page 10 in [T3]). The adaptive order-0 modeler
for QI has entire input sequence available and it
does not have to rely on a possibly false assumption
that the symbol frequencies in the initial parts of
the sequence are predictive of the frequencies in the
later parts, or gamble on which way might they
be predictive. While AC can code this way, too,
all it would achieve with that would be to advance
into a literal role of a less accurate and a lot
slower imitation of QI.

QI order-0 adaptive modeler identifies contiguous
quasi-stationary sections of the input sequence and
uses them as enumerative classes. There are many ways
to do such segmentation and even more ways to encode
it, along with the corresponding section counts, into
the output. Some of these methods, especially the
encoding aspect, were developed already for conventional
EC, such as described in [11]-[15], [23]. I have also
developed several for QI (some of which were touched
upon in [T2], where the general QI/EC modeling pattern
is presented, pp. 26-35).

Due to the lack of a practical EC coder and the exaggerated
dominance of the AC modeling paradigm (hypertrophied
to the point of pathology by the absence of practical
competition), this entire field of EC modeling is
highly under-explored. With the precision & performance
problems finally solved by QI, an algorithmic gold mine
has opened, where just about anything you do, and there
is more to do than an eye can see, is a new algorithm,
maybe a great new discovery to be taught to kids ever after.



-- References ( http://www.1stworks.com/ref/RefLib.htm )

T1-T3 are on http://www.1stworks.com/ref/qi.htm

28. L.D. Davisson "Universal noiseless coding"
  IEEE Trans. Inform. Theory  IT-19 (6), 783-795, 1973
  http://cg.ensmp.fr/~vert/proj/bibli/local/Davisson1973Universal.pdf

33. R. Krichevsky, V. Trofimov
  "The performance of universal encoding"
  IEEE Trans. Inform. Theory  IT-27 (2), 199-207, 1981
  http://cg.ensmp.fr/~vert/proj/bibli/local/Krichevsky1981performance.pdf

41. M.Drmota, H-K. Hwang, W. Szpankowski
    "Precise Average Redundancy of an Idealized
     Arithmetic Coding"  DCC 2002, 222-231.
  http://citeseer.ist.psu.edu/drmota02precise.html

34. J.G. Cleary, I.H. Witten
  "A Comparison of Enumerative and Adaptive Codes"
  IEEE Trans. Inform. Theory  IT-30 (2), 306-315, 1984
  http://www.1stworks.com/ref/Cleary84Enum.pdf

F. Farey Fractions:
    http://www.cut-the-knot.org/ctk/PickToFarey.shtml

41c. P.G. Howard, J.S. Vitter
  "Practical Implementations of Arithmetic Coding"
   Tech. Rep.  No. 92-18, CS, Brown University, 1992
   http://www.1stworks.com/ref/Howard92PractAri.pdf
0
nightlight
1/7/2006 5:25:12 PM
nightlight wrote:
) > if I read you correctly, it is not an adaptive coder,
) > so how do you transmit the model information for the
) > QI coder ?
)
) It is not "adaptive AC", but as illustrated in (a),(b)
) above, it can function that way. The native QI modeling
) is descriptive, in the sense of Rissanen's MDL. So the
) QI model information is a much wider type of information
) than just a list of probabilities (although it can be
) that, too).
)
)  <snip theoretical discussion>

I have read through your entire description, and I haven't found even
a single hint of an answer to the practical question of what exactly
the QI coder needs as modeling information to do its job.

Can I assume that 'enumerative coder' means that you need the exact
symbol counts for each of the subclasses ?  And if not, then what do
you need ?  Please try to explain this as simply as possible, within
a single paragraph.


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
0
Willem
1/7/2006 5:55:21 PM
nightlight <nightlight.skip-this@and-this.omegapoint.com> wrote in 
news:SY6dnUrMLIWIZyLenZ2dnUVZ_s6dnZ2d@rcn.net:

> > This analysis is no different from any other analysis
> > in that you have to make lots of assumptions. This
> > means that if you use such an analysis to make
> > real-world predictions, then that depends
> > on how well your assumptions match the real world.
> 
> I see now where you're misreading my redundancy
> statements in this and in the earlier thread.

   Actually I think he is reading them correctly; he
has an understanding of arithmetic coding. It's not clear
you do.


....


> Now to your main questions, starting with the last one,
> which is the most specific:
> 
> > assuming I have a stream of symbols, where at each
> > position in the stream, the probability distribution
> > of the symbols is different, then how does QI coder
> > adapt itself to all those different distributions ?
> 
> This is the scenario of AC modeling engine feeding QI,
> which was sketched in note N7 on p. 9, [T3]. Two ways
> are described there:
> 
> a) --- QI coding AC style

.....

> 
> The summary of this is that if you want the AC
> modeling plus AC coding style, you get the AC
> speed and the AC accuracy. The AC scheme, with its
> 'single next symbol probability' bottleneck interface
> between the modeler & the coder (where the modeler
> micro-manages the coder, symbol by symbol, and where
> the whole coder+ modeler processing and interaction
> is traversed from top to bottom on every symbol)
> is simply intrinsically a poor division of labor
> to allow for any high performance coding.
> 

   I guess that means it doesn't work so hot; it took you
a long time to state it. So you are admitting the
arithmetic may beat the QI when you attempt to shoehorn
your QI method in where an arithmetic coder might be
a natural fit. No wonder you changed Moffat to whatever
problem you're doing instead of the other way around.


....

> 
> b) --- QI Modeling AC style
>

....
 

> QI breaks the probabilities into classes, so that
> each class includes an interval of probabilities of
> size 1/sqrt(n), where n is the size of data to be
> encoded. Since coder doesn't assume any more the Morse
> telegraph kind of "online", it doesn't assume n is 1
> but some much larger number. The modeler is still left
> to work under the old "online" spell and imagine that
> it has to convert, by hook or crook, all it knows
> or that it could know about the input sequence into
> the probabilities for the next single symbol p(c).
> 
> Consider now a binary sequence of n symbols, for which
> the modeler produces, symbol by symbol probabilities
> p of bit=1, with p in some interval D=[a,b), of size
> d=b-a. We divide D into s=sqrt(n) equal subintervals
> of lengths d/s. Each input symbol is assigned to
> one of s enumerative classes (thus enumerated by a
> separate index) based on the subinterval in which
> the modeler's p at that point falls in. Hence, we're
> using quantized probabilities to classify the symbols
> as "equiprobable". The excess output E in bits per
> symbol due to this 'p quantization' is about
> (cf. [41c], p. 8):

  Well it looks like you're having trouble, since you need to
break it up into smaller chunks. That's too bad, since a real
entropy compressor would take a file of millions of symbols
and still compress it to roughly the same length no matter how
the symbols are arranged. From what you wrote it seems like
you're only doing local considerations.

....
....

> 
> > how would you use a QI coder with an adaptive model ?
> 
> Sections (a) and (b) above have couple answers.
> 
> > if I read you correctly, it is not an adaptive coder,
> > so how do you transmit the model information for the
> > QI coder ?
> 
> It is not "adaptive AC", but as illustrated in (a),(b)
> above, it can function that way. The native QI modeling
> is descriptive, in the sense of Rissanen's MDL. So the
> QI model information is a much wider type of information
> than just a list of probabilities (although it can be
> that, too).
> 
> Consider an order-0 QI coding. The modeling, analogous to
> the order-0 adaptive AC except more resilient against
> the "surprises", becomes here the selection of the
> segmentation of the input into contiguous sections,
> based on measured symbol frequencies. Its resilience
> to surprises is illustrated in the row "Vary" (cf. table
> on page 10 in [T3]). The adaptive order-0 modeler
> for QI has entire input sequence available and it
> does not have to rely on a possibly false assumption
> that the symbol frequencies in the initial parts of
> the sequence are predictive of the frequencies in the
> later parts, or gamble on which way they might
> be predictive. While AC can code this way, too,
> all it would achieve with that would be to advance
> into a literal role of a less accurate and a lot
> slower imitation of QI.
> 

   Again, this shows your lack of understanding of just
what an order-0 adaptive AC coder does. If a file is made up
of 2 symbol types and one wanted to compress it by
the above method, it would get the same length no matter
how things are ordered. You could have all the zeros then the ones,
or any combination, and you would get roughly the same length file.
In fact, even in your own examples of small strings you use this
fact to get a combination number for each arrangement based
entirely on the length of the string and the number of ones. That is
what the adaptive order-0 arithmetic coder would do. If you
have to segment the file then local effects take over and you
would not get the same compression for different combinations
of the ones and zeros. The order of the symbols makes no
difference.
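
As an aside, here is a minimal C sketch of that order-independence,
assuming a Laplace estimator as the adaptive rule (an illustration
only, not code from either poster): the successive probabilities
telescope to 1/((n+1)*C(n,k)), so the ideal code length depends only
on the length n and the count of ones k, not on their order.

//-- Ideal adaptive order-0 code length with a Laplace estimator

#include <math.h>
#include <stdio.h>

double laplace_bits(const char *s)
{ int n0 = 0, n1 = 0;
  double bits = 0.0;
  const char *p;
    for(p = s; *p; p++) {
      double p1 = (n1 + 1.0) / (n0 + n1 + 2.0);  // Laplace estimate of P(next=1)
      bits += -log2(*p == '1' ? p1 : 1.0 - p1);  // ideal code length contribution
      if (*p == '1') n1++; else n0++;
    }
    return bits;
}

int main(void)
{ // same counts (n=6, k=3), different orders: all give log2(140) = 7.1293
  printf("%.4f\n", laplace_bits("000111"));
  printf("%.4f\n", laplace_bits("101010"));
  printf("%.4f\n", laplace_bits("110100"));
  return 0;
}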

....


  Here is a question I doubt you will answer, since you really
don't seem to answer anything. But I leave it so others can see
just how your code can't solve simple problems.

  I go back to your own example with a string where you use QI
in lattice jumps to get an index.  This truly is about as
easy as it gets, something even you could get your fingers around.
You claim you need only three things:
a) this combination index
b) the number of ones in the string
c) the length of the string

  How would you combine this information into a string so that decompression
could be done? This is something even you should be able to do.
Yet you will not. You may give several pages of references but
you will not give one complete example.


David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/7/2006 6:43:18 PM
 > Any decent AC does *not* lose parts of its coding
 > interval each step.  Try to get that through your
 > head. It *is* possible (hell, it's quite easy) to
 > get the intervals to line up exactly without
 > infinite precision.

It seems you and few others here are either restricting
the noun "gap" to a subset of cases where gaps do occur,
or you are making certain implicit assumptions about
the source and the sequences it produces, without being
aware of making them.

In order to clarify which is the case here, why don't
you demonstrate your easy and gapless finite precision
binary AC coding for the finite input sequences produced
by a binary source S3, which satisfy the following
conditions:

  a) Each sequence from S3 is exactly 3 bits long,

  b) each sequence has exactly two bits=0 and one bit=1 and

  c) each sequence is equally likely as any other.

What does your easy gapless AC coding for all possible
input sequences from S3 look like? Since the inputs
are pretty short, you should be able to show it all
using an AC working in 4 bit precision. But if you
need more, well, use more (as long as the bit strings
fit on a 72 char line).

NOTE 1: The S3 output needs to be coded using a binary
order-0 AC in incrementing or decrementing or static mode,
which codes bit by bit as if the sequences have arbitrary
lengths. Namely, the short sequences are given above
only to allow you to show explicitly the coding, not
to allow switching over to let you construct the small
Huffman tree for the entire input (or some equivalent
ad hoc codes).

For example, the exactly same coding algorithm should be
demonstrable (e.g. as a little sample exe coder that
will work on a regular PC with, say, 1G of RAM) with
inputs of, say, exactly 30 million bits long with exactly
10 million bits=1 set. In this case you couldn't code it
via the multi-alphabet AC which treats the entire input
as a single character of a gigantic alphabet. Hence,
you can't use any such methods for S3 either. So,
just show it for a plain bitwise binary AC.

NOTE 2: You can't feed your AC input sequences with
0, 2 or 3 ones in order to fill the gaps and declare
them as "used". The S3 specified does not produce
any such sequences.


0
nightlight
1/7/2006 7:32:11 PM
Can you email me a more detailed list of
changes (or headers etc) and fixes so I
can fix the copy on the web site?

A microsecond timer ought to be OK (the windows
hi-res timer is about 1/4 microsecond). Those test
loops need to be decoupled to measure encode and
decode times separately, and do the correctness
check outside of that, resetting the generator
to run 3 times with the same seed (it already
has a spare copy of the original table & a function
to do it). That way, even time() would allow
accurate speed measurement with enough iterations.
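
Roughly, the decoupled measurement would look like the sketch below
(illustration only: encode/decode here are trivial stand-ins, not the
QI calls, and the buffer size and iteration count are arbitrary):

//-- Decoupled speed measurement: 3 runs from the same seed

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define N     (1<<20)
#define ITERS 100

static unsigned char in[N], enc[N], dec[N];

void gen(unsigned seed)                      // reproducible test data
{ int i; srand(seed); for(i = 0; i < N; i++) in[i] = rand() & 1; }

void encode(void) { memcpy(enc, in, N); }    // stand-in for the coder
void decode(void) { memcpy(dec, enc, N); }   // stand-in for the decoder

int main(void)
{ unsigned seed = 1234;
  clock_t t; int i;

    gen(seed);                               // run 1: time encode only
    t = clock();
    for(i = 0; i < ITERS; i++) encode();
    printf("encode: %.2f ns/sym\n",
           (clock() - t) * 1e9 / CLOCKS_PER_SEC / ((double)N * ITERS));

    gen(seed);                               // run 2: time decode only
    t = clock();
    for(i = 0; i < ITERS; i++) decode();
    printf("decode: %.2f ns/sym\n",
           (clock() - t) * 1e9 / CLOCKS_PER_SEC / ((double)N * ITERS));

    gen(seed);                               // run 3: untimed correctness check
    encode(); decode();
    printf("%s\n", memcmp(in, dec, N) ? "MISMATCH" : "ok");
    return 0;
}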

 > what is all that output supposed to mean,
 > or more to the point, what do I do to get
 > efficiency statistics, elapsed time,
 > compressed size, that sort of stuff.

The readme.txt explains the main information commands,
such as ct (for tables statistics), cbr (binomial
rows full detail, code space gaps, redundancies, roundoffs,
mantissa excess, etc). The binary coding commands ci
and cc have very similar output, so only cc is described
in detail in the readme. The radix codes (fixed, mixed
and factorial) commands cr, cf, cm, show the same info
as cc, except that the count of 1's is replaced by the radix.
All times are given at the end of each line as ns/sym,
separately for encoder and decoder.


 > another problem with your code is that as it stands it
 > seems that it only tests your algorithm's ability
 > to compress pseudo-random data...
 > pseudo-random data is theoretically extremely
 > compressible.

The coders don't assume or look for a way to model the
random generator. If this were just an executable demo,
then one could think of some cheating like that. But
with the source, it should be clear that the coders are
just plain order-0 coders and nothing more.

Adding a modeling engine to do general file
compression or such is one of the items that will
get in eventually. The present source is only
research code for the few folks interested in
exploring the algorithm and its EC coding/modeling
potential. Having played quite a bit with the
algorithm and seen the advantages over the
alternatives, and the possibilities it opens in coding
and modeling, I think that in a few years this
algorithm and its offshoots will be running in
every portable device and serve as the bitmap
index and keyword incidence map coder in all the
search engines and data warehouses. At the moment,
though, the number of people who share this view
fits in about 2.3219... bits.




0
nightlight
1/7/2006 8:57:34 PM
nightlight wrote:
) It seems you and few others here are either restricting
) the noun "gap" to a subset of cases where gaps do occur,
) or you are making certain implicit assumptions about
) the source and the sequences it produces, without being
) aware of making them.

Neither.  See below.

) In order to clarify which is the case here, why don't
) you demonstrate your easy and gapless finite precision
) binary AC coding for the finite input sequences produced
) by a binary source S3, which satisfy the following
) conditions:
)
)   a) Each sequence from S3 is exactly 3 bits long,
)
)   b) each sequence has exactly two bits=0 and one bit=1 and
)
)   c) each sequence is equally likely as any other.
)
) How does your easy gapless AC coding for all possible
) input sequences from S3 look like? Since the inputs
) are pretty short, you should be able to show it all
) using an AC working in 4 bit precision. But if you
) need more, well use more (as long as the bit strings
) fit on a 72 char line).
)
) NOTE 1: The S3 output needs to be coded using a binary
) order-0 AC in incrementing or decrementing or static mode,
) which codes bit by bit as if the sequences have arbitrary
) lengths. Namely, the short sequences are given above
) only to allow you to show explicitly the coding, not
) to allow switching over to let you construct the small
) Huffman tree for the entire input (or some equivalent
) ad hoc codes).
)
) For example, the exactly same coding algorithm should be
) demonstrable (e.g. as a little sample exe coder that
) will work on a regular PC with, say, 1G of RAM) with
) inputs of, say, exactly 30 million bits long with exactly
) 10 million bits=1 set. In this case you couldn't code it
) via the multi-alphabet AC which treats the entire input
) as a single character of a gigantic alphabet. Hence,
) you can't use any such methods for S3 either. So,
) just show it for a plain bitwise binary AC.
)
) NOTE 2: You can't feed your AC input sequences with
) 0, 2 or 3 ones in order to fill the gaps and declare
) them as "used". The S3 specified does not produce
) any such sequences.

From this note, I assume the model is allowed to 'know'
that the sequence will have a single '1' and two '0' bits,
and update its probabilities accordingly.


I'll sketch out an AC encoding, that should be enough:

The starting range is [0..256)

The steps for encoding the sequence '100' are:

Step 1: Encode the symbol '1'.
  The probabilities are 1/3 for a '1' and 2/3 for a '0'.
  Therefore, the range is subdivided as follows:
  [0..170) for a '0', and [170..255) for a '1'.
  Thus, the range is reduced to [170..255)

Step 2: Encode the symbol '0'.
  The probabilities are 0 for a '1' and 1 for a '0'.
  Therefore, the range is subdivided as follows:
  [170..255) for a '0', and [255..255) for a '1'.
  Thus, the range is reduced to [170..255)
  
Step 3: Encode the symbol '0'.
  The probabilities are 0 for a '1' and 1 for a '0'.
  Therefore, the range is subdivided as follows:
  [170..255) for a '0', and [255..255) for a '1'.
  Thus, the range is reduced to [170..255)



The steps for encoding the sequence '010' are:

Step 1: Encode the symbol '0'.
  The probabilities are 1/3 for a '1' and 2/3 for a '0'.
  Therefore, the range is subdivided as follows:
  [0..170) for a '0', and [170..255) for a '1'.
  Thus, the range is reduced to [0..170)

Step 2: Encode the symbol '1'.
  The probabilities are 1/2 for a '1' and 1/2 for a '0'.
  Therefore, the range is subdivided as follows:
  [0..85) for a '0', and [85..170) for a '1'.
  Thus, the range is reduced to [85..170)
  
Step 3: Encode the symbol '0'.
  The probabilities are 0 for a '1' and 1 for a '0'.
  Therefore, the range is subdivided as follows:
  [85..170) for a '0', and [170..170) for a '1'.
  Thus, the range is reduced to [85..170)


The steps for encoding the sequence '001' are:

Step 1: Encode the symbol '0'.
  The probabilities are 1/3 for a '1' and 2/3 for a '0'.
  Therefore, the range is subdivided as follows:
  [0..170) for a '0', and [170..255) for a '1'.
  Thus, the range is reduced to [0..170)

Step 2: Encode the symbol '0'.
  The probabilities are 1/2 for a '1' and 1/2 for a '0'.
  Therefore, the range is subdivided as follows:
  [0..85) for a '0', and [85..170) for a '1'.
  Thus, the range is reduced to [0..85)
  
Step 3: Encode the symbol '1'.
  The probabilities are 1 for a '1' and 0 for a '0'.
  Therefore, the range is subdivided as follows:
  [0..0) for a '0', and [0..85) for a '1'.
  Thus, the range is reduced to [0..85)


As you can see, the three possible sequences each lead to a range of
approximately the same size, and the ranges of all possible sequences,
when put together, form the starting range without any gaps.

This scheme easily scales up to sequences with millions of bits.
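
For concreteness, the decrementing subdivision described above can be
sketched as below (my own illustration, not Willem's coder: it keeps
the range as plain integers over [0..256) with no renormalization, so
the endpoints come out slightly differently from the figures above):

//-- Decrementing order-0 subdivision for the S3 example

#include <stdio.h>

void encode(const char *bits, int n, int k)   // k = number of 1s in bits
{ unsigned lo = 0, hi = 256;                  // starting range [0..256)
  int i, r;
    printf("sequence %s:\n", bits);
    for(i = 0; i < n; i++) {
      r = n - i;                              // symbols left, this one included
      unsigned split = lo + (hi - lo) * (unsigned)(r - k) / (unsigned)r;
      if (bits[i] == '0')
        hi = split;                           // '0' takes the lower (r-k)/r share
      else {
        lo = split;                           // '1' takes the upper k/r share
        k--;
      }
      printf("  after '%c': [%u..%u)\n", bits[i], lo, hi);
    }
}

int main(void)
{ encode("100", 3, 1);   // ends at [170..256)
  encode("010", 3, 1);   // ends at [85..170)
  encode("001", 3, 1);   // ends at [0..85)
  return 0;              // the three final ranges tile [0..256) exactly
}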


There is a coding gap caused by having to terminate the encoder, but that
is a fixed cost of two or three bits, not a loss at each step as you have
claimed.  Furthermore, this termination cost can be avoided as well, with
careful handling.

I hope to have cleared up your misgivings about Arith Coding with this.



SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
0
Willem
1/7/2006 9:24:11 PM
Willem <willem@stack.nl> writes:
> )   a) Each sequence from S3 is exactly 3 bits long,

> There is a coding gap caused by having to terminate the encoder, 

Not in this particular case.

Phil
-- 
What is it: is man only a blunder of God, or God only a blunder of man?
-- Friedrich Nietzsche (1844-1900), The Twilight of the Gods
0
Phil
1/7/2006 10:18:07 PM
nightlight <nightlight.skip-this@and-this.omegapoint.com> wrote in 
news:HaudnSFNItBFil3eRVn-rg@rcn.net:

> > Any decent AC does *not* lose parts of its coding
> > interval each step.  Try to get that through your
> > head. It *is* possible (hell, it's quite easy) to
> > get the intervals to line up exactly without
> > infinite precision.
> 
> It seems you and few others here are either restricting
> the noun "gap" to a subset of cases where gaps do occur,
> or you are making certain implicit assumptions about
> the source and the sequences it produces, without being
> aware of making them.
> 
> In order to clarify which is the case here, why don't
> you demonstrate your easy and gapless finite precision
> binary AC coding for the finite input sequences produced
> by a binary source S3, which satisfy the following
> conditions:
> 
>   a) Each sequence from S3 is exactly 3 bits long,
> 
>   b) each sequence has exactly two bits=0 and one bit=1 and
> 
>   c) each sequence is equally likely as any other.
> 
> How does your easy gapless AC coding for all possible
> input sequences from S3 look like? Since the inputs
> are pretty short, you should be able to show it all
> using an AC working in 4 bit precision. But if you
> need more, well use more (as long as the bit strings
> fit on a 72 char line).
> 
> NOTE 1: The S3 output needs to be coded using a binary
> order-0 AC in incrementing or decrementing or static mode,
> which codes bit by bit as if the sequences have arbitrary
> lengths. Namely, the short sequences are given above
> only to allow you to show explicitly the coding, not
> to allow switching over to let you construct the small
> Huffman tree for the entire input (or some equivalent
> ad hoc codes).
> 
> For example, the exactly same coding algorithm should be
> demonstrable (e.g. as a little sample exe coder that
> will work on a regular PC with, say, 1G of RAM) with
> inputs of, say, exactly 30 million bits long with exactly
> 10 million bits=1 set. In this case you couldn't code it
> via the multi-alphabet AC which treats the entire input
> as a single character of a gigantic alphabet. Hence,
> you can't use any such methods for S3 either. So,
> just show it for a plain bitwise binary AC.
> 
> NOTE 2: You can't feed your AC input sequences with
> 0, 2 or 3 ones in order to fill the gaps and declare
> them as "used". The S3 specified does not produce
> any such sequences.
> 
> 

  If you have a sequence that is defined by 3-bit strings, you in
effect have 3 input symbols. With an adaptive coder you don't
need to know in advance that each of the 3 symbols is equally likely,
but what the hell, let's play the game and compress with a coder the
likes of arb2x or arb255, the only difference being you need two cells:
one cell is split equally in half, the other is split 1 to 2. Note
this split is not perfect, but when one is using 64 bit registers
the split is damn close and there are no gaps.

  To explain what is happening, let's use 1 bit registers: the first
cell splits so that a one is output when you have the 100 token,
the next is split 50-50, so the mappings would be
100 goes to 1
010 goes to 01
001 goes to 00
which in fact is what you would get in the Huffman case. And you can see
that since it's a complete tree, any sequence of 1's and 0's on
decompression gives you some sequence of the 3 symbols; no gaps.
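
A tiny sketch of that complete prefix code (an illustration only, not
arb2x/arb255 code): because the tree is complete, every bit string
decodes to some token sequence, which is the "no gaps" property.

//-- Decode an arbitrary bit string with the code 1, 01, 00

#include <stdio.h>

void decode(const char *bits)
{ printf("%-10s -> ", bits);
    while(*bits) {
      if (bits[0] == '1')       { printf("100 "); bits += 1; }
      else if (bits[1] == '\0') { printf("(dangling 0)"); break; }
      else if (bits[1] == '1')  { printf("010 "); bits += 2; }
      else                      { printf("001 "); bits += 2; }
    }
    printf("\n");
}

int main(void)
{ decode("1");         // 100
  decode("01");        // 010
  decode("00");        // 001
  decode("1100011");   // arbitrary bits still decode: 100 100 001 010 100
  return 0;
}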

  You might complain that if you had 30 million bits from
the source you're compressing, the compressed length would
vary: it could be as short as 10 million bits in the unlikely event
that you have 10 million 100's being output, even though 100 should
only occur roughly 1/3 of the time, and the compressed length
could also be as long as 20 million bits if 100 never occurred
at all, which is highly unlikely. However, using the Huffman
approximation, the average compressed length would be 16.666...
million bits.

  When you go to more bits it compresses better, again with no gaps.
The ideal length for the 30 million bit sequence is
actually 10 million times -lg(1/3), which is roughly
15849625.0072115618145373894394782 bits for the 10 million tokens,
while if using a 64 bit register
it's -lg(6148914691236517204/18446744073709551616) per 100,
which totals roughly
15849625.0072115618176657356346099 bits, and slightly less
for the other 2 symbols. In short, there would be no way
to tell you're not actually using 1/3, and there would be no
gaps, since you would need 15849625 bits, give or take a
bit, to express the compression.

  If you have high enough integer precision math you could
calculate the index. However, sooner or later when you do compression
you have to write that number out as ones and zeros. You gain
nothing in space savings by having the exact number, since the
arithmetic compression has no gaps. Sure, you might be one bit
shorter for some combinations, but then you will be a bit longer
for others. It's just a plain fact that the bijective arithmetic coder
can do this kind of compression in an optimal way. It is not
clear you understand how to do this with your method in general.

David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/7/2006 11:19:59 PM
["Followup-To:" header set to comp.compression.]
On 2006-01-07, nightlight <nightlight.skip-this@and-this.omegapoint.com> wrote:
> Can you email me a more detailed list of
> changes (or headers etc) and fixes so I
> can fix the copy on the web site?

I'm not sure that it's actually working, but I'll email
you a diff in two formats (machine readable and human
readable, in case the tools you have can't handle the
former).

I haven't implemented kbhit(), so that bit isn't working yet.
Is it used inside any timing loops, or can I do an inefficient
implementation?

> Microsecond timer ought to be Ok (the windows
> hi res is about 1/4 microsecond). Those test
>  loops need to be decoupled to measure separately
> encode and decode times, and do the correctness
> check outside of that, reseting the generator
> to run 3 times with the same seed (it already
> has a spare copy of original table & function
> to do it). That way, even the time() would allow
> accurate speed with enough iterations.

You didn't consider wasting a little memory by decoding 
into a third buffer?

> > what is all that output supposed to mean,
> > or more to the point, what do I do to get
> > efficiency statistics, elapsed time,
> > compressed size, that sort of stuff.
>
> The readme.txt explains the main information commands,

not in terms that I understand.

> Adding a modeling engine to do the general file
> compression or such is one of the items that will
> get in eventually. 

I feel that it would make it easier to understand, but
I haven't tried to understand the code much more than
to get it to compile.

> I think that in few years this
> algorithm and its ofshoots will be running in
> every portable device and serve as the bitmap
> index and keyword incidence map coder in all the
> search engines and data-warehouses. At the moment,
> though, the number people who share this view
> fits in about 2.3219... bits.

  :)

Bye.
   Jasen
0
Jasen
1/8/2006 4:23:27 AM
David A. Scott wrote:
> "Matt Mahoney" <matmahoney@yahoo.com> wrote in
> news:1136605351.691283.112110@g47g2000cwa.googlegroups.com:
>
>    In your model you use 9 bits for every 8 bits of data where starting
> bit is a ZERO for each byte and then for EOF you allow the starting
> bit to be ONE and stop the compressor. This does not allow all the
> code space to be used.
>    As a result of this modeling every compressed file with FPAQ0 has
> the first bit of the first byte set on output, so technically the first
> bit out is a total waste of space.

Actually all the code space is used because the first bit is 0 for an
empty file.

Of course it could be improved by making it bijective instead of using
2*log2(n) bits to encode the length in bytes.

-- Matt Mahoney

0
Matt
1/8/2006 5:57:59 PM
nightlight <nightlight.skip-this@and-this.omegapoint.com> wrote in 
news:HaudnSFNItBFil3eRVn-rg@rcn.net:

> In order to clarify which is the case here, why don't
> you demonstrate your easy and gapless finite precision
> binary AC coding for the finite input sequences produced
> by a binary source S3, which satisfy the following
> conditions:
> 
>   a) Each sequence from S3 is exactly 3 bits long,
> 
>   b) each sequence has exactly two bits=0 and one bit=1 and
> 
>   c) each sequence is equally likely as any other.
> 
> How does your easy gapless AC coding for all possible
> input sequences from S3 look like? Since the inputs
> are pretty short, you should be able to show it all
> using an AC working in 4 bit precision. But if you
> need more, well use more (as long as the bit strings
> fit on a 72 char line).
> 
> NOTE 1: The S3 output needs to be coded using a binary
> order-0 AC in incrementing or decrementing or static mode,
> which codes bit by bit as if the sequences have arbitrary
> lengths. Namely, the short sequences are given above
> only to allow you to show explicitly the coding, not
> to allow switching over to let you construct the small
> Huffman tree for the entire input (or some equivalent
> ad hoc codes).
> 
> For example, the exactly same coding algorithm should be
> demonstrable (e.g. as a little sample exe coder that
> will work on a regular PC with, say, 1G of RAM) with
> inputs of, say, exactly 30 million bits long with exactly
> 10 million bits=1 set. In this case you couldn't code it
> via the multi-alphabet AC which treats the entire input
> as a single character of a gigantic alphabet. Hence,
> you can't use any such methods for S3 either. So,
> just show it for a plain bitwise binary AC.
> 
> NOTE 2: You can't feed your AC input sequences with
> 0, 2 or 3 ones in order to fill the gaps and declare
> them as "used". The S3 specified does not produce
> any such sequences.
> 
> 
> 

 Nightlight, here is what I am willing to do so others,
and maybe even you, can see the difference. This is
basically your example, but as usual you are not clear
enough. I would like anybody to be able to test the code.

1) First of all, the input for the compressor has to be
files of ascii characters in groups of 3, namely
100, 010 or 001, so that the file will be exactly a multiple
of 3 bytes where only those 3 combinations are allowed. So a file
can be as short as 3 bytes or millions of bytes long, but always
a multiple of 3.

2) I will map that bijectively to a packed binary file. You
really don't need the format here, but it's for the arithmetic
coder; I don't wish to mod arb255 very much, so that it is easy
for you and others to follow the changes I put in.

3) The bijective arithmetic coder will compress with no gaps using
fixed and unchanging weights as close to 1/3 as it can
get. (One could design a custom way, but hey, powers of 2 are good enough.)
The output will be a binary file where each of the 3 sequences maps
to roughly 1.5849625 bits per use.

4) The reverse is also true. Take any file and do the reverse of
3, 2 and 1 above and you get a unique sequence of type 1, such that
if X is an ascii file as in step 1, then steps 2 and 3 would be
compression, and the reverse of 3 and reverse of 2 would be uncompression.
If Y is any file at all, then:
compress( uncompress (Y)) = Y
and
uncompress( compress(X)) = X

Even though this is your example, I am not sure you can do this with your
method. It is not even ideal for the arithmetic coder, but even so it would
be trivial to mod arb255 to do this.


David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/8/2006 9:20:37 PM
First, a few minor problems with your answer:

 > The starting range is [0..256) ...
 >  Therefore, the range is subdivided as follows:
 >  [0..170) for a '0', and [170..255) for a '1'. ...

You start with full "range" [0..256), but then you switch
the endpoints from there on to [*..255). You also meld together
two mutually exclusive notations, the Pascal range A..B
with the semi-open interval math notation [A,B) for a "range".
The Pascal "range" A..B includes into the interval both points
A and B, while the semi-open interval [A,B) doesn't include B.
That notational meld of two mutually exclusive prescriptions
was probably what led to your endpoint mixup of 256 vs. 255. The fact that
255 is divisible by 3, while 256 is not, gave it an extra wishful nudge
to go with 255. I will switch to a single coherent notation, [A,B),
and hence use e.g. [*,256).

 > There is a coding gap caused by having to terminate
 > the encoder, but that is a fixed cost of two or three
 > bits, not a loss at each step as you have claimed.

The AC doesn't normally build the complete, ready to go output code on 
each step. Part of the output code is implicit in the coder's state 
variables and the implementation conventions. If at any point the coder 
is told to wrap it up and provide output code, the excess of 1-2 bits on 
that total output code will be in that output in the case of an infinite 
precision AC coder (ACX). This excess will be 2-3 bits in the case of a 
finite precision AC coder (ACF). Hence, these 1-2 or 2-3 excess bits on 
the total output are part of the AC output throughout -- they are only 
represented differently at different stages (implicit in the algorithm 
conventions and the internal state variables until the last step, at 
which point the output code creation is finalized and they become 
explicitly part of that output). Hence, at best your argument above is 
vacuous (arguing about ambiguities in verbal conventions), and at worst 
it is incorrect. I will grant you the benefit of the best case.

Well, on the positive side, at least you recognize that these excess 
bits result in a "coding gap", which they do, of course.

There is an additional and unrelated misconception revealed in the last 
part of that sentence:

 > ... but that is a fixed cost of two or three
 > bits, not a loss at each step as you have claimed.

An infinite number of additive contributions can easily produce a sum 
with fixed upper bound, or within some fixed range, such as 1-2 excess 
for ACX or 2-3 for ACF. For example 1 + 1/3 + 1/3^2 +... = 1.5 and the 
partial sums after 2 terms are between 1.33 and 1.5. Hence, that the 
cost is "fixed" (or rather the cost interval is fixed) gives you no 
information on how many steps contributed to the cost.

 > Furthermore, this termination cost can be avoided as
 > well, with careful handling.

Not in the general AC coding setup. In special cases, e.g. when the
number of symbols is prescribed and known upfront to encoder and
decoder, you can avoid at least part of it. In a general AC setup,
without any such "side information" provided, the 1-2 bit excess for ACX
(or 2-3 for ACF) is a part of the standard ACX code length (needed to
select unambiguously the final codeword interval):

      L = ceiling(log(1/Pc)) + 1 .... (1)

whole bits (see [41a], pp. 13-14), where Pc is the 'coding probability' 
(the probabilities you compute with in AC) of the complete message. For 
a stationary order-0 AC model, Pc is simply p^k * (1-p)^(n-k), where 
p=AC's probability of 1's, k=count of 1's, n=count of all symbols. 
Hence, the upper bound for ACX redundancy is 2 bits (unlike the Huffman 
or exact EC code, where the +1 is absent and where the upper bound is 1, 
resulting solely from rounding up of the log(1/Pc) to the next whole bit).
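
As a small worked instance of (1) (my own numbers, using the S3
parameters from earlier in the thread as the assumed model: n=3
symbols, k=1 one, stationary p=1/3):

//-- Eq. (1) for a stationary order-0 model with p=1/3, n=3, k=1

#include <math.h>
#include <stdio.h>

int main(void)
{ double p = 1.0/3.0;
  int n = 3, k = 1;
  double Pc = pow(p, k) * pow(1.0 - p, n - k);  // Pc = p^k * (1-p)^(n-k) = 4/27
  int L = (int)ceil(log2(1.0/Pc)) + 1;          // eq. (1): 3 + 1 = 4 bits
    printf("Pc = %.5f, log2(1/Pc) = %.4f, L = %d bits\n",
           Pc, log2(1.0/Pc), L);
    return 0;
}

For comparison, the three S3 sequences themselves can be indexed in
ceiling(log2(3)) = 2 bits.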

Now to the fundamental problem of your answer -- the complete and 
hopeless conceptual meld of the three distinct concepts of "interval" 
relevant in arithmetic coding. Hence, before we can head toward any 
approximation of a coherent discussion, we need a finer resolution view
of these three kinds of "intervals":

1. IVF ==> the coder's internal variables in finite precision specifying 
finite precision intervals (these are your [0,85),
[85,170),... etc. variables).

2. IVX ==> the coder's internal variable specifying an infinite
precision interval (this type is implicit in your description, when you
are referring to your intervals as approximate). While one can't store
an infinite precision binary fraction in an internal variable, one can
easily store an equivalent rational fraction with unlimited precision
integers for the numerator and denominator (e.g. as done in [41a] pp.
13-14, 46).

3. CWI ==> Codeword interval - this is the interval defined by the AC 
codeword (the explicit codeword ready for output or the implicit 
codeword which could be produced at each step). The AC codeword, implicit
or explicit, is a binary string C of finite length L (for ACX L is given 
via (1)). The usual AC convention is to interpret C as representing 
fractional binary digits of some rational number Z, defined as: Z = 0.C. 
For example, if the AC codeword is C=1010 then L=4, Z=0.1010 = 
1/2+0/4+1/8+0/16 = 5/8 = 0.625. The mapping of codewords C to intervals 
CWI is used more generally than just for AC analysis e.g. in the Kraft 
inequality proofs (cf. Cover & Thomas IT textbook [24]). The common 
convention for this mapping is as follows ([24] eq. (24) which I use 
below as (2); also [41a] pp. 13-14 & eq. (1.7)):

       CWI = [Z, Z+G)  ....  (2)

where the "granularity" G (=the interval length) is defined as:

       G=1/2^L  .... (3)

Some examples of codewords C and their corresponding intervals CWI are 
shown below:

  C ... CWI
-------------------------
  0 ... [0, 0.5)
  1 ... [0.5, 1)
00 ... [0, 0.25)
01 ... [0.25, 0.5)
10 ... [0.5, 0.75)
11 ... [0.75, 1) ... etc.
-------------------------
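
The mapping is trivial to compute; a minimal sketch (illustration
only) that reproduces the table above:

//-- Codeword C -> CWI = [Z, Z+G), with Z = 0.C and G = 1/2^L

#include <stdio.h>
#include <string.h>

void cwi(const char *c)
{ double z = 0.0, g = 1.0;
  size_t i;
    for(i = 0; i < strlen(c); i++) {
      g /= 2.0;                  // each digit halves the granularity G
      if (c[i] == '1') z += g;   // and a '1' selects the upper half
    }
    printf("%2s ... [%.4f, %.4f)\n", c, z, z + g);
}

int main(void)
{ const char *codes[] = { "0", "1", "00", "01", "10", "11" };
  int i;
    for(i = 0; i < 6; i++) cwi(codes[i]);
    return 0;
}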

The soft & fuzzy pedagogical descriptions of the AC encoding algorithm 
usually describe construction of the nested IVX or IVF intervals from 
the input sequence, then state that the coder transmits binary digits of 
some point Z from within the final IVX or IVF interval as a 
specification for that interval and they'll prescribe that the number of 
digits should be given via (1) (sometimes omitting the +1 term). But
the point Z, given with L fractional bits C, can't all by itself specify
even the IVF, let alone IVX.

The only interval that Z can specify unambiguously, and only if further
supplemented with a precise convention, such as eqs. (2) and (3), is the
interval CWI. The higher resolution descriptions (such as [41a] pp.
13-14, 41-48) explain and derive code length (1) -- its origin is in the
requirement (in CWI terms used above) that the interval CWI has to fit
within the coder's final "internal variable" interval IVX or IVF (cf.
eq. (1.7) in [41a]).

We can look now at a set {C} = {C1,C2,...} containing all possible 
codewords that an AC can produce from a given source (this may be an 
infinite set). The corresponding intervals CWI(C1), CWI(C2),... form a 
set {CWI(C)} of non-overlapping intervals (consequence of the AC output 
decodability). Since
all CWI intervals fit within [0, 1), the non-overlapping property 
implies that their lengths G(C1), G(C2),... can add up to at most 1, i.e.

     G(C1)+G(C2)+G(C3)+... =< 1   ... (4)

The eq. (4), recalling the definition (3) of G's, is the Kraft 
inequality which must hold for any uniquely decodable codes (cf. [24] p. 
90, McMillan theorem), including our set {C}.

The concept of "coding gaps" (in code space or in code intervals), which 
is our topic here, is normally used in the context of Kraft inequality 
(4) as follows: The codes for which (4) holds as the exact equality are 
called "compact" codes and such codes don't have "coding gaps". The 
codes for which (4) holds only as the "<" part of the "=<" inequality 
have "coding space gaps" or "coding gaps" (unused codes). The term 
"interval gap" simply refers to these same gaps in terms of the CWI 
intervals whose lenghts L(C1), L(C2)... are used in (4).
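
A quick way to see the difference numerically (illustration only;
these two code sets are made-up examples, not anyone's coder output):

//-- Kraft sums: a compact code set vs. one with a coding gap

#include <stdio.h>
#include <string.h>

double kraft(const char **codes, int n)
{ double s = 0.0; int i;
    for(i = 0; i < n; i++)
      s += 1.0 / (1 << strlen(codes[i]));  // G = 1/2^L for each codeword
    return s;
}

int main(void)
{ const char *compact[] = { "1", "01", "00" };   // sums to 1.000, no gap
  const char *gapped[]  = { "1", "01", "000" };  // sums to 0.875, 1/8 unused
    printf("compact set: %.3f\n", kraft(compact, 3));
    printf("gapped set:  %.3f\n", kraft(gapped, 3));
    return 0;
}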

Even in the low res analysis you apparently did manage somehow to catch
a vague glimpse, however briefly, of what these gaps refer to, since
you said:

 > There is a coding gap caused by having to terminate
 > the encoder, but that is a fixed cost of two or three
 > bits, ...

Clearly, you are realizing above that adding 2 or 3 bits as a
termination cost to the codewords does create gaps somehow and
somewhere. But since that didn't seem to fit into your
conceptual meld, in which all three interval types are just the
"interval" and where you can make your internal variables' interval IVF
fit together as tightly as you wish, you couldn't quite see how
the "interval^3" could have this gap. Now, with the three higher res interval
concepts in front of you, you can look at (3) and see that adding 2 or 3 
to L will make the corresponding CWI interval length G smaller by 4 to 8 
times, hence a pretty big gap will be opened in the CWI coverage (4) of 
the interval [0,1) (since 75-87 percent of that CWI will turn into the 
unused code space, the interval gap).

But seeing it in the low res back then, you concluded:

 >... but that is a fixed cost of two or three
 > bits, not a loss at each step as you have claimed.

Now look at (1), which shows how the L is computed -- the smaller the
probability Pc of the whole symbol sequence, the larger the L in (1)
will be. But Pc in (1) is the coding probability for the particular
sequence, e.g. for an order 0 model (as we used), Pc is a series of
truncated products of the probabilities p & q of the individual symbols
encountered. Now what happens if you have the actual p=1/3, which is an
infinite binary fraction 1/3=0.01010101..., and use instead p'=0.010 for
AC coding? As you probably know from basic coding theory, the greater
the deviation of p' from the actual probability p, the longer on average
the codeword length L. In turn, the increase in avg. L shortens the
average CWI interval length G via (3), contributing to the increase in
the CWI interval gap (which is the shortfall from 1.0 in the Kraft
inequality (4)). Note also that the average increase in L is the result
of the accumulation of these contributions along the way, from the coding
probability's deviation from the actual probability at every single coding step.
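
The standard textbook quantification of that per-symbol cost is the
relative entropy D(p||p'); a small sketch for the p=1/3 vs. p'=0.010
(binary) = 1/4 case above (illustration only, not taken from the
thread's benchmarks):

//-- Average excess bits/symbol from coding with p'=1/4 instead of p=1/3

#include <math.h>
#include <stdio.h>

int main(void)
{ double p = 1.0/3.0, pp = 0.25;
  double d = p*log2(p/pp) + (1.0 - p)*log2((1.0 - p)/(1.0 - pp));
    printf("excess = %.5f bits/symbol\n", d);   // about 0.025 bits/symbol
    return 0;
}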

The exact same coding gaps happen with the QI coder due to its finite
precision. Since QI's construction of interval lengths (eq. (21) p. 8 in
[T3]) is bottom up, the accumulation is much more transparent here. You
can even see it as it accumulates step-by-step using the QI.exe program,
option "cbr" (to show the complete binomial row; set also option n50 since
the default is n=1024, which would scroll off). The QI errors are due to
rounding up to the g (g=32) bit SWI mantissa in (21). The "cbr" command
shows all quantized SWI binomials in row n, with an asterisk next to those
which were rounded up when computed via recurrence (21), which here, for
the binary coder, is simply the Pascal triangle recurrence: C(n,k) =
C(n-1,k) + C(n-1,k-1). The columns labeled dm16 and dm10 show how much
these increments of the mantissa have accumulated over all such adds (for
all prior n). The "Extra Bits" column shows how much these dm contribute to
excess bits (usually 1/10^9). If the mantissa were shorter, or if n is
very large, these accumulations of increments would eventually cause the
mantissa to overflow, which would then increase the SWI exponent by 1,
hence lengthening the output by 1 bit.
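
A toy version of that rounding-up recurrence (my own sketch, not the QI
source: the mantissa width, its representation and the rounding details
below are assumptions, with g kept deliberately small so the effect is
visible):

//-- Quantized binomials: Pascal recurrence with a g-bit mantissa,
//-- rounded up, so the quantized C(n,k) never falls below the exact one

#include <math.h>
#include <stdio.h>

#define G 8                                   // mantissa width in bits

typedef struct { unsigned m; int e; } swi;    // value = m * 2^e, m < 2^G

swi swi_add(swi a, swi b)
{ unsigned sh, bm, m; int e;
    if (a.e < b.e) { swi t = a; a = b; b = t; }
    sh = (unsigned)(a.e - b.e);               // align b to a's exponent
    if (sh >= 32) bm = b.m ? 1u : 0u;         // far smaller: rounds up to 1 ulp
    else bm = (b.m >> sh) + ((b.m & ((1u << sh) - 1u)) ? 1u : 0u);
    m = a.m + bm; e = a.e;
    while (m >= (1u << G)) {                  // renormalize, rounding up
      m = (m >> 1) + (m & 1u);
      e++;
    }
    { swi r; r.m = m; r.e = e; return r; }
}

int main(void)
{ enum { N = 20, K = 10 };
  swi one, row[N + 1];
  int n, k;
    one.m = 1u << (G - 1); one.e = 1 - G;     // represents 1.0
    row[0] = one;
    for(n = 1; n <= N; n++) {
      row[n] = one;                           // C(n,n) = 1
      for(k = n - 1; k >= 1; k--)             // C(n,k) = C(n-1,k) + C(n-1,k-1)
        row[k] = swi_add(row[k], row[k - 1]);
    }
    printf("quantized log2 C(%d,%d) = %.6f  (exact = %.6f)\n",
           N, K, log2((double)row[K].m) + row[K].e, log2(184756.0));
    return 0;
}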

Unlike QI, AC uses multiplicative binomial recurrences, described in
an earlier post [P1], eqs. (1), (2). Even more unlike QI, which does this
only during the table construction (which is a universal table, the same 
table for all source probabilities, hence it can be premade & loaded 
from a file when the program starts), AC computes them from scratch on 
every coding task and for every symbol along the way. The final 
difference from QI, is that AC works top down, from the largest binomial 
or addend (point B on Fig 1 in [T3], see also [T2] pp. 19), which in AC 
is scaled to 1, and all other addends are smaller than this since 
they're all divided by this largest binomial (which is at the endpoint 
of the path, cf [T2] pp. 19-25 for details on this very close 
correspondence between the two coders). Hence, the rounding down 
arithmetic of AC works by fixing the starting binomial to 1 (or an 
addend generally) and reducing the size of the lower order ones, 
resulting at the end in smaller Pc in (1), thus longer output L. With QI 
which computes from lower to higher binomials, the higher binomials are 
incremented when rounding (since SWI rounding in (21) is up). So, while 
QI's rounding up errors accumulate toward the larger addends making them 
larger, the AC's rounding down errors accumulate toward the smaller
addends making them smaller, which it eventually pays for via (1) in 1-2
excess bits for ACX or 2-3 excess bits for ACF (note that for ACF, the
excess can go beyond 2-3 if the input gets too long for the frequency 
counters).

As to your IVF intervals fitting perfectly together, that is irrelevant
for the gaps in the CWI intervals (which are the intervals defined by
the codeword C, the quantity that is transmitted and to which (4), the
gaps, the unused codes & the redundancy apply). IVFs are your internal
variables and you're managing them as you wish (e.g. what to do with
their fit when you need to drop their digits in finite precision). If
you want to fit the IVFs together, all tight and snug, that's fine
and you are free to do so, but that won't make the gaps in the CWI go
away or the average L become shorter. For example, you could reduce your AC
precision to 4 bits and your avg. excess bits, the unused code space and
the CWI interval gaps will balloon (see table 3.4 p. 52 in [41a] for
effects of the 4 bit AC precision), yet all your internal variable
IVFs are still happily covering the entire interval [0,1), as snugly as
ever, with no gaps in sight.

You're welcome, of course, to write out all the codes AC will produce on
that example (under the conditions described, i.e. no special ad hoc code
tweaks for the 3 short strings; or just run some AC in decrementing
mode, as you did in your description) and show how the CWI intervals for
them fit gaplessly, or equivalently, show how these actual AC codes use
the entire output code space without "coding gaps" (and, obviously, without
contradicting your earlier statement on "coding gaps" due to the 2-3 bit
termination overhead).

 > I hope to have cleared up your misgivings about Arith
 > Coding with this.

Well, I hope yours will clear up, too :)

-- References ( http://www.1stworks.com/ref/RefLib.htm )

41a. P.A.J. Volf  "Weighting Techniques In Data Compression: Theory and 
Algorithms" Ph.D. thesis, Eindhoven University of Technology, Dec 2002
http://alexandria.tue.nl/extra2/200213835.pdf

24. T.M. Cover, J.A. Thomas
"Elements of Information Theory" Wiley 1991

P1. Earlier post on AC multiplicative recurrences:

http://groups.google.com/group/comp.compression/msg/71847a32daa9a571
0
nightlight
1/8/2006 9:27:25 PM
nightlight <nightlight.skip-this@and-this.omegapoint.com> wrote in
news:yIGdndvhfe_GGVzenZ2dnUVZ_tqdnZ2d@rcn.net: 


  First, some minor problems with your whole line of thought.
We can easily code this, so most of your answer has nothing
to do with reality. Second, just what you want coded is unclear;
I have tried to ask for enough details so others can look at
real test results instead of the long rants. Third, are you
going to write code that would work on real files for this
example, or is that too hard, even though you designed this
yourself to make AC look bad and your stuff look good?



> > There is a coding gap caused by having to terminate
> > the encoder, but that is a fixed cost of two or three
> > bits, not a loss at each step as you have claimed.
> 
> The AC doesn't normally build the complete, ready to go output code on
> each step. Part of the output code is implicit in the coder's state 
> variables and the implementation conventions. If at any point the
> coder is told to wrap it up and provide output code, the excess of 1-2
> bits on that total output code will be in that output in the case of
> an infinite precision AC coder (ACX). This excess will be 2-3 bits in
> the case of a finite precision AC coder (ACF). Hence, these 1-2 or 2-3
> excess bits on the total output are part of the AC output throughout
> -- they are only represented differently at different stages (implicit
> in the algorithm conventions and the internal state variables until
> the last step, at which point the output code creation is finalized
> and they become explicitly part of that output). Hence, at best your
> argument above is vacuous (arguing about ambiguities in verbal
> conventions), and at worst it is incorrect. I will grant you the
> benefit of the best case. 
> 
> Well, on the positive side, at least you recognize that these excess 
> bits result in a "coding gap", which they do, of course.
> 

    Actually that's not entirely true. When one does a full bijective
arithmetic file encoder for this and writes out only what is done when
compressing, you realize that for each of the 3 possible outputs you need
roughly 1.58 bits, so naturally the coder is carrying the excess around
till the input terminates.  On termination sometimes you round up,
sometimes you round down. The interesting thing is this:
take the output file from a long set of inputs but change the last
byte of the compressed file. In fact, change it so that you have 256
files, each with a different last value.  In turn each of these files
will decompress back to a valid input that, if recompressed, goes back
to the same unique compressed file. THERE ARE NO GAPS. Can you do this with
your method? I don't think so.


  REST OF USELESS RANT CUT




David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/8/2006 9:55:55 PM
--- Errata:

The citation:

 > (cf. Cover & Thomas IT textbook [24]). The
 > common convention for this mapping is as follows
 > ([24] eq. (24)

should have in the last line:

([24] eq. 5.11 p. 84

0
nightlight
1/8/2006 9:56:53 PM
nightlight wrote:
) First, few minor problems with your answer:
)
) > The starting range is [0..256) ...
) >  Therefore, the range is subdivided as follows:
) >  [0..170) for a '0', and [170..255) for a '1'. ...
)
) You start with full "range" [0..256), but then you switch
) the endpoints from there on to [*..255). You also meld together
) two mutually exclusive notations, the Pascal range A..B
) with the semi-open interval math notation [A,B) for a "range".

Semantics, and a typo.  I trust it was obvious what I meant.

) > There is a coding gap caused by having to terminate
) > the encoder, but that is a fixed cost of two or three
) > bits, not a loss at each step as you have claimed.
)
) The AC doesn't normally build the complete, ready to go output code on 
) each step. Part of the output code is implicit in the coder's state 
) variables and the implementation conventions. If at any point the coder 
) is told to wrap it up and provide output code, the excess of 1-2 bits on 
) that total output code will be in that output in the case of an infinite 
) precision AC coder (ACX). This excess will be 2-3 bits in the case of a 
) finite precision AC coder (ACF). Hence, these 1-2 or 2-3 excess bits on 
) the total output are part of the AC output throughout

This does not follow.  As I see it, there are no excess bits in an Arith
Coder, until the coder is told to wrap it up.  Can you give some clear
arguments why you claim that the bits added by termination are present
before termination ?

To any sane person, the phrase 'a loss at each step' implies that this
loss will grow as the number of steps increases.


) Well, on the positive side, at least you recognize that these excess 
) bits result in a "coding gap", which they do, of course.

A coding gap of a whopping two bits.  If your claim is that AC is worse
than QI because of those two bits, I laugh in your face.

I have snipped your whole discussion on those two or three bits because
they are irrelevant, and your claim that a cost that does not grow as
the number of steps increases is nevertheless incurred at each step is
ridiculous and arbitrary.

) Now to the fundamental problem of your answer -- the complete and 
) hopeless conceptual meld of the three distinct concepts of "interval" 
) relevant in arithmetic coding. Hence, before we can head toward any 
) approximation of a coherent discussion, we need a finer resolution vew 
) of these three kinds of "intervals":

I have snipped your entire discussion below, because in the end it still
boils down to the fixed cost of two or three bits of excess.

) You're welcome, of course, to write out all the codes AC will produce on 
) that example (under conditions described i.e. no special ad hoc code 
) tweaks for the 3 short strings; or just run some AC, in decrementing 
) mode as you did in your description) and show how the CWI intervals for 
) them fit gaplessly or equivalently, show how these actual AC codes use 
) entire output code space without "coding gaps" (and, obviously, without 
) contradicting your earlier statement on "coding gaps" due to 2-3 bit 
) termination overhead).

As evident from your conclusion.  As I said before, I will now laugh
in your face.  You have written page upon page of explanation, all of
which in the end revolved around a fixed coding excess of a few bits.


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
0
Willem
1/8/2006 11:18:30 PM
> This analysis is no different from any other analysis
> in that you have to make lots of assumptions. This
> means that if you use such an analysis to make
> real-world predictions, then that depends
> on how well your assumptions match the real world.

I see now where you're misreading my redundancy
statements in this and in the earlier thread. The
redundancy I was deriving in [T3] at the end of page 8,
and talking about here, is simply the number of
excess bits that the finite precision arithmetic
used by QI will add to the unlimited precision
enumerative code. The paragraph on page 8, after all,
starts with "To obtain QI redundancy d(g) _due to
SW rounding_ in (21)..." The upper bound for
_this_ redundancy d(g) obtained there is:

      d(g) < log(e)/2^(g-1) .... (1)

No model or source assumptions are needed for (1), and
none is reflected in (1). The only parameter in (1) is
the assumed arithmetic precision of g bits (the SW
mantissa length). The eq. (1) is simply an upper bound
on any additional bits relative to the exact EC, produced
by QI due to its use of SW rounding up in eq. (21). It has
nothing to do with how well EC or QI will code in any
given setup (model, source, parameter coding method, etc).
If the exact EC model is perfect in a given setup, then
(1) shows the maximum by which QI can fall short of
that "perfect" output. If EC is coding using an imperfect
model, resulting in some redundancy of R bits per symbol
relative to the best model, (1) shows the maximum that
QI can add to R. But (1) doesn't care or predict what
R is. The two types of redundancy are completely unrelated.
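
As a quick numerical illustration of (1) (my own arithmetic, not a
figure from [T3]), the bound falls off geometrically with the mantissa
width g; at g=32, for instance, it is below 10^-9 bits per symbol:

#include <math.h>
#include <stdio.h>

int main(void)
{ int g;
    /* evaluate the bound d(g) < log2(e)/2^(g-1) for a few precisions g */
    for (g = 16; g <= 64; g += 16)
        printf("g = %2d  ->  d(g) < %.3e bits per symbol\n",
               g, 1.4426950408889634 / ldexp(1.0, g - 1));
    return 0;
}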

The question of how well the exact EC and AC can code in
different coding setups, is an entirely different and a
much larger topic, well covered in the literature, starting
with Davisson's [28] and many other papers since.

Krichevsky-Trofimov's paper [33] provides a great many
bounds for a variety of coding setups. Some related later
results are in [40],[41],[41a]-[41d]. The basic result
is that even for the unlimited precision coders (and
coding under 'everything else set the same') the exact
EC has a slightly lower redundancy than the exact AC (by
approximately 1 bit for the entire input, for max & avg).
This is the same difference as between the Huffman and
the Shannon-Fano prefix codes. Even the origin of the
difference is the same: the  bottom-up addend construction,
as done by QI and Huffman, is tighter than the top down
addend construction, as done by AC & Shannon-Fano.

Now to your main questions, starting with the last one,
which is the most specific:

> assuming I have a stream of symbols, where at each
> position in the stream, the probability distribution
> of the symbols is different, then how does QI coder
> adapt itself to all those different distributions ?

This is the scenario of AC modeling engine feeding QI,
which was sketched in note N7 on p. 9, [T3]. Two ways
are described there:

a) --- QI coding AC style

QI can code the AC style by performing "Lattice
Jumps". It is simplest to see how this is done by looking
at the decoder, arriving at some point B=(x,y) (see
Fig. 1, p. 4, [T3]). The path count at B is N(x,y)=56.
The index at point B can have values 0..N(x,y)-1, hence
the length of the index at B is log(N(x,y))=log(56) bits.
If bit=1 gets decoded (as shown by path on Fig 1), the
decoder moves up, to point BA=(x,y-1), which has the
path count 21, hence the index has length log(21) bits.
Hence, upon decoding bit=1 at B, the index length has
dropped by log(56)-log(21)=log(8/3) bits, which is
precisely the ideal code length log(1/p) for bit=1
at B, where p=3/8=probability of 1 at B. If bit=0
gets decoded at B, decoder moves to point BL=(x-1,y)
where path count is N(x-1,y)=35, hence the index
length is log(35). In this case the index length
drops by log(56)-log(35)=log(8/5) which is exactly
same as the ideal code length log(1/q), where q=5/8
is probability of 0 at B. It is easy to see from
multiplicative recurrences for binomial coefficients
(eq's (1) & (2) from the previous post here) that
this pattern always holds - after every decode step,
the index length drops by exactly log(1/P), where P
is the probability of the decoded symbol. Analogous
relation holds for each encode step, where the index
length increases by the ideal code length of the
encoded symbol at that point. Note also that due to
integer arithmetic, this is not an approximate
optimality (such as one would get using truncated
infinite fractions, as AC does). With QI/EC, this
coding optimality at every point is built into
the table entries. { You can check the quantization
errors using e.g. QI.exe cbr n36, which shows no
quantization errors for n=36 (or below), and with
n=37, the 1st error for k=16, of just +1 in the
SW mantissa which adds 4e-10 bits to the index.}
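
For the concrete numbers in the walk-through above, here is a quick
check (my own illustration; the reading B=(x,y)=(5,3), i.e. 5 zeros and
3 ones remaining with N(x,y)=C(8,3)=56, follows Fig 1 as described):

#include <math.h>
#include <stdio.h>

//-- exact binomial coefficient, small n only
double binom(int n, int k)
{ double c = 1.0;
  int i;
    for (i = 1; i <= k; i++) c = c * (n - k + i) / i;
    return c;
}

int main(void)
{ double NB  = binom(8, 3);   /* 56: path count at B                      */
  double NBA = binom(7, 2);   /* 21: at BA=(x,y-1), after decoding bit=1  */
  double NBL = binom(7, 3);   /* 35: at BL=(x-1,y), after decoding bit=0  */
    printf("bit=1 drop: %f  vs  log2(1/p) = log2(8/3): %f\n",
           log2(NB) - log2(NBA), log2(8.0/3.0));
    printf("bit=0 drop: %f  vs  log2(1/q) = log2(8/5): %f\n",
           log2(NB) - log2(NBL), log2(8.0/5.0));
    return 0;
}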

With QI, for a general point B=(x,y), the quantized
path count L(x,y) (computed via (21)) is an SW integer
with a g-bit mantissa w(x,y) and exponent e(x,y).
The ideal code lengths and ratios for the steps
from B described above still hold, but only within
the limits d(g). In particular, L(x,y-1)/L(x,y) is
approx. =p=y/(x+y) and L(x-1,y)/L(x,y)=q=x/(x+y).

The index at B will have for the leading g bits
at the bit offset e(x,y) some g-bit integer Iw(x,y)
which is in the interval [0,w(x,y)-1] (this is a
simple consequence of the index at any point
ranging from 0 to path count-1 and the fact that
quantized path count L(x,y) has trailing zeros
after the leading g bits given by w(x,y), hence
L(x,y)-1 will decrement w(x,y)). We can thus view
for any point B(x,y) and index I(x,y), the Iw(x,y)
as a digit in the radix w(x,y).

Suppose now the decoder at B=(x,y) gets from the modeler
some probabilities p' and q' different from p,q. To
continue decoding, the decoder makes a jump to another
lattice point B'=(x',y') where y'/(x'+y')=p' and
x'/(x'+y')=q' (keeping the convention above, where the
y coordinate counts the remaining ones). One can use
Farey fractions (see [F]) to obtain the optimum such
point for any given binomial table size. Alternatively, one
can simply jump to another point on the same
front, i.e. keep n fixed, x+y=n=x'+y',
and select point B' using y'=n*p'. The path
count at B' is L(x',y') with mantissa w(x',y')
and exponent e(x',y'), which are different from
w(x,y) and e(x,y). The exponent is easy to adjust:
you simply change the presumed position of the
least significant bit of the index I(x',y') (this
is the origin, A on Fig 1., toward which decoder
is heading, but hasn't reached yet since there
are more symbols to decode; in the QI source
code in file EncDec.c this presumed origin of the
index is given as argument "sb" to function qiDec()).

The main work is with the difference in the path
count mantissas w(x,y) and w(x',y') at B and B'.
Namely at B' the leading g bits of index Iw(x',y')
have to be a digit in the radix w'=w(x',y'). But we
only have a g-bit digit left over from B which is
in the radix w=w(x,y). So, the problem here is
that of radix conversion -- we have a digit Iw
in radix w and we need a digit Iw' in radix w'.
There are several ways to do this. A conceptually
simple one is as follows: decoder extracts the
digit Iw and encodes it as digit of some mixed radix
output integer M, which serves as an accumulator or
recycler for all such 'orphaned' Iw digits. The bits
of M (which are an arbitrary bit pattern, being a
binary form of a mixed radix integer) can simply be
reused, e.g. by growing M at the unprocessed end
of the compressed input (or just having M as separate
component). At this stage the encoder would have
done the opposite - it would have "decoded" (see
file Radix.c, function dec_radix()) the far end
of the compressed data (which was an arbitrary
binary pattern) into a digit Iw in radix w and
concatenated it to the leading end of the index.
There are other similar ways to perform this
radix conversion, all of them using an amount of
processing per symbol very similar to that of the
conventional AC algorithm. They all also have
to perform explicit coding/decoding operations
(which include mul/div) for both, the most and
the least probable symbols, just as AC does.
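
As a rough sketch of the simpler of the two jump choices mentioned
above, the "same front" jump (my own illustration; it uses the Fig 1
convention where the y coordinate counts the remaining ones, and plain
rounding instead of the Farey-optimal selection):

//-- pick a lattice point on the same front n = x+y matching the new p'
typedef struct { int x, y; } lattice_pt;

lattice_pt jump_same_front(int n, double p1)   /* p1 = new prob. of a 1  */
{ lattice_pt b;
    b.y = (int)(n * p1 + 0.5);                 /* remaining ones ~ n*p1  */
    if (b.y < 0) b.y = 0;                      /* keep the point on the  */
    if (b.y > n) b.y = n;                      /* front, with 0<=y<=n    */
    b.x = n - b.y;                             /* remaining zeros        */
    return b;
}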

The summary of this is that if you want the AC
modeling plus AC coding style, you get the AC
speed and the AC accuracy. The AC scheme, with its
'single next symbol probability' bottleneck interface
between the modeler & the coder (where the modeler
micro-manages the coder, symbol by symbol, and where
the whole coder + modeler processing and interaction
is traversed from top to bottom on every symbol)
is simply intrinsically a poor division of labor
to allow for any high performance coding.

It is analogous to organizing car manufacturing,
and requiring that the next car can be started
only after the current car is complete and out
the door. That's a kind of conceptual constraint
imposed by the AC modeling "paradigm" as its
so-called "online" coding requirement. This
online" is taken to mean some kind of analog,
memoryless, CPU-less, Morse telegraph device.

That has nothing to do with the actual online
as it is done, or any actual requirements or
inherent design constraints. One normally has
a fairly large buffer space and processor which
can access any of it, running programs of high
complexity. Internet would grind to a halt if
its protocols interpreted "online" as a constraint
to have to send or receive a single bit (or a
single symbol) at a time. Even the old style
point-to-point modems had several KB buffers
to accumulate, batch and compress the data.
And similarly for disk sectors & clusters.

The point of the above critique of the present
AC "paradigm" is that it is simply a gratuitous,
historically accidental conceptual bottleneck and
an intrinsic performance drain. Any algorithm that
follows its prescribed division of labor will bog
down. Once you snap out of its "online" spell,
many better possibilities open up. For example,
even retaining the AC modeling engine, with its
"probability of the next single symbol" bottleneck
parametrization of all the information about the
sequence being encoded, but just allowing the coder to
ignore the imagined "online" constraint, one can
get much better performance with QI as follows:

b) --- QI Modeling AC style

QI breaks the probabilities into classes, so that
each class includes an interval of probabilities of
size 1/sqrt(n), where n is the size of data to be
encoded. Since the coder no longer assumes the Morse
telegraph kind of "online", it doesn't assume n is 1
but some much larger number. The modeler is still left
to work under the old "online" spell and imagine that
it has to convert, by hook or crook, all it knows
or that it could know about the input sequence into
the probabilities for the next single symbol p(c).

Consider now a binary sequence of n symbols, for which
the modeler produces, symbol by symbol, probabilities
p of bit=1, with p in some interval D=[a,b) of size
d=b-a. We divide D into s=sqrt(n) equal subintervals
of lengths d/s. Each input symbol is assigned to
one of s enumerative classes (thus enumerated by a
separate index) based on the subinterval in which
the modeler's p at that point falls. Hence, we're
using quantized probabilities to classify the symbols
as "equiprobable". The excess output E in bits per
symbol due to this 'p quantization' is about
(cf. [41c], p. 8):

    E = [dp^2/p + dq^2/q]*log(e)/2 .... (2)

where dp & dq are the quantization errors of p and q. Since
dp & dq are =< 1/s, it follows that E =< log(e)/(2npq) = O(1/n). Note
that since the adaptive AC probability estimates also have a
sampling error dp, dq of order 1/sqrt(n), this redundancy
is of similar size to that of the adaptive AC. One can
further optimize this method (to reduce its worst case)
by selecting non-uniform partition of interval D, so that
the subintervals around smaller probabilities are shorter.
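
A small sketch of the class assignment just described (my own
illustration; the uniform partition and the function name are
assumptions, not code from the QI package):

#include <math.h>

//-- assign a symbol to one of s = sqrt(n) enumerative classes by its p
int enum_class(double p, double a, double b, int n)
{ int s = (int)ceil(sqrt((double)n));   /* number of classes             */
  double d = (b - a) / s;               /* subinterval width             */
  int c = (int)((p - a) / d);
    if (c < 0) c = 0;
    if (c >= s) c = s - 1;              /* clamp the p == b edge case    */
    return c;                           /* class index in 0..s-1         */
}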

In practical situations, the AC modeler would be producing
its predictions p(c) based on statistics from the processed
part of the input, hence its probabilities would already
have a built-in sampling error interval (which decreases
as 1/sqrt(n)), which can be used by the QI coder as the
partition criterion for the enumerative classes (instead of
the ad hoc partition described above). Various existing
methods for growing and splitting the contexts based on
such past statistics, such as CTW or Rissanen's Context,
would transfer here as methods for generating enumerative
classes adaptively.

For the multi-alphabet case one would perform the decomposition
described in [T1] pp. 31-38, with the only difference that instead
of combining the symbol counts k(c) based on symbol's binary
code c, one would combine the modeler's probabilities p(c).

A special case of interest for this method is that of finite
order Markov sources. Here, for order m, the probabilities
of the next symbol are defined by the m previous symbols.
For smaller m, one could simply bypass the computation
of probabilities (since QI doesn't need them) and simply
assign the enumerative class for the next input symbol
directly: using m previous symbols as the class tag
(hence there would be 2^m classes in binary case).
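
A minimal sketch of that direct class assignment (the names are mine;
this is just the obvious way of keeping the last m bits as a context
tag, not code from the QI package):

//-- order-m context tag: the last m input bits packed into an integer
static unsigned ctx = 0;

unsigned markov_class(unsigned m)      /* class of the next bit (2^m ids) */
{   return ctx & ((1u << m) - 1u); }   /* assumes m < 32                  */

void markov_update(unsigned bit)       /* call after coding each bit      */
{   ctx = (ctx << 1) | (bit & 1u); }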

In this case we can notice another advantage of the QI/EC
coder for modeling over the AC: to encode a symbol, QI
needs to know only whether the symbol has the same or
a different probability than some other symbols; unlike
AC, QI doesn't also need to know what values these
probabilities have. Hence, QI places much lower demand
on the modeling engine, since the modeler here can simply
pass on to QI the context ID (the last m symbols) and
QI will code the symbol into the index for that ID,
whatever its probability may be.

In conclusion for this method (b), QI can use AC modeling
engine, with its full speed advantage over AC, and with
the redundancy being same as that of an adaptive AC in
the same setup.

> how would you use a QI coder with an adaptive model ?

Sections (a) and (b) above give a couple of answers.

> if I read you correctly, it is not an adaptive coder,
> so how do you transmit the model information for the
> QI coder ?

It is not "adaptive AC", but as illustrated in (a),(b)
above, it can function that way. The native QI modeling
is descriptive, in the sense of Rissanen's MDL. So the
QI model information is a much wider type of information
than just a list of probabilities (although it can be
that, too).

Consider order-0 QI coding. The analogue of the
order-0 adaptive AC modeling, except more resilient against
"surprises", is here the selection of a
segmentation of the input into contiguous sections,
based on measured symbol frequencies. Its resilience
to surprises is illustrated in the row "Vary" (cf. table
on page 10 in [T3]). The adaptive order-0 modeler
for QI has entire input sequence available and it
does not have to rely on a possibly false assumption
that the symbol frequencies in the initial parts of
the sequence are predictive of the frequencies in the
later parts, or gamble on which way they might
be predictive. While AC can code this way, too,
all it would achieve by that is to turn itself
into a less accurate and a lot
slower imitation of QI.

The QI order-0 adaptive modeler identifies contiguous
quasi-stationary sections of the input sequence and
uses them as enumerative classes. There are many ways
to do such segmentation and even more ways to encode
it, along with the corresponding section counts, into
the output. Some of these methods, especially the
encoding aspect, were already developed for conventional
EC, as described in [11]-[15], [23]. I have also
developed several for QI (some of which were touched
upon in [T2], where the general QI/EC modeling pattern
is presented, pp. 26-35).

Due to the lack of a practical EC coder and the exaggerated
dominance of the AC modeling paradigm (hypertrophied
to the point of pathology by the absence of practical
competition), this entire field of EC modeling is
highly under-explored. With the precision & performance
problems finally solved by QI, an algorithmic gold mine
has opened, where just about anything you do (and there
is more to do than the eye can see) is a new algorithm,
maybe a great new discovery to be taught to kids ever after.



-- References ( http://www.1stworks.com/ref/RefLib.htm )

T1-T3 are on http://www.1stworks.com/ref/qi.htm

28. L.D. Davisson "Universal noiseless coding"
 IEEE Trans. Inform. Theory  IT-19 (6), 783-795, 1973
 http://cg.ensmp.fr/~vert/proj/bibli/local/Davisson1973Universal.pdf

33. R. Krichevsky, V. Trofimov
 "The performance of universal encoding"
 IEEE Trans. Inform. Theory  IT-27 (2), 199-207, 1981

http://cg.ensmp.fr/~vert/proj/bibli/local/Krichevsky1981performance.pdf

41. M. Drmota, H-K. Hwang, W. Szpankowski
 "Precise Average Redundancy of an Idealized Arithmetic Coding"
 DCC 2002, 222-231.
 http://citeseer.ist.psu.edu/drmota02precise.html

34. J.G. Cleary, I.H. Witten
 "A Comparison of Enumerative and Adaptive Codes"
 IEEE Trans. Inform. Theory  IT-30 (2), 306-315, 1984
 http://www.1stworks.com/ref/Cleary84Enum.pdf

F. Farey Fractions:
   http://www.cut-the-knot.org/ctk/PickToFarey.shtml

41c. P.G. Howard, J.S. Vitter
 "Practical Implementations of Arithmetic Coding"
  Tech. Rep.  No. 92-18, CS, Brown University, 1992
  http://www.1stworks.com/ref/Howard92PractAri.pdf

0
nightlight
1/9/2006 5:56:27 AM
> You have written page upon page of explanation,
> all of which in the end revolved around a fixed
> coding excess of a few bits.

The topic being discussed in this sub-thread was the difference in
the quantization errors and the resulting gaps. For ACX (infinite
precision AC), this excess is 1-2 bits in total. For ACF (finite
precision AC) the error is 2-3 bits, provided AC stays within its initial
bounds for the counts (usually 2^24-2^30 or similar). Once it gets beyond
those, unlike ACX, the error does grow beyond 2-3 bits, and becomes
dependent on how this case is handled.

In any case, this is only one component of the ACF excess, which was
the particular subtopic we were discussing. You can make ACF perform at
about that level of excess by using decrementing AC (see [34]). In that
role, though, while ACF will code nearly as optimally as QI (within
these few bits, which grow very slowly with N), all it achieves is to
become a slightly less accurate and much slower exact work-alike of QI.

For the regular adaptive or static ACs that one normally finds in
practical implementations, there will be an additional redundancy
relative to QI which, in the case of order-0 stationary sources, can be
derived exactly (as shown in [T2] pp. 20-24). That redundancy is, to
leading order, 1/2 log(2πnpq) for binary (for a general alphabet of size
A, there are A-1 such terms which are summed, so the 1/2
changes to approximately (A-1)/2). As shown in [T2], this error is due to the
approximate enumeration of AC, where the Stirling approximation and the
subsequent dropping of the sqrt() & other factors (each being <1)
cause the AC addends to increase relative to QI, leading to an excess of
O(log(n)) bits on the total output.
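
Taking the 1/2 log(2πnpq) expression at face value, a quick
calculation (my own numbers; the n and p values here are arbitrary, not
the parameters of the [T3] table) puts a scale on this leading-order
term:

#include <math.h>
#include <stdio.h>

int main(void)
{ double PI = 3.14159265358979323846, n = 1024.0, p, q;
    /* leading-order excess (1/2)*log2(2*pi*n*p*q) of a conventional
       order-0 AC over exact enumeration, for a few symbol densities   */
    for (p = 0.05; p < 0.55; p += 0.15)
    { q = 1.0 - p;
      printf("n=%.0f  p=%.2f  excess ~ %.2f bits total\n",
             n, p, 0.5 * log2(2.0 * PI * n * p * q));
    }
    return 0;
}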

This O(log(n)) order redundancy was the dominant effect shown in the
test table (p. 10 in [T3]) for all but the last row (which illustrates a
different effect). Depending, of course, on what the total output size is
(which depends on input size and density of 1's), this may amount to a
tiny fraction of 1% or to 5-6% for the parameter ranges shown in the
table.

Some superficial, low-res, short-fuse mind will stop right here,
concluding: 'a few percent' at best, it seems; what's the big deal, I can
do 35% better than jpeg with my new xyz coder for images,... or some
such.

To realize what's the big deal, you need to follow the 'white rabbit' a
bit further.

Since the enumeration excess of O(log(n)), or even the 2-3 bit
quantization excess, can be as large as or even larger than the output
size (e.g. in the low entropy limit, or for short enough inputs of any
entropy per symbol), this can be 50% or 100% or _any_ ratio you want,
provided you work with exact fractional bit sizes. You can encode to
such fractional sizes if you have several items of any kind (which may
belong to entirely different sources, different parts of the program or
different programs) to encode and all have some fractional bit sizes --
the mixed radix codes allow combining of these fractions so that only
one whole-bit rounding error is paid for the total. Hence, the
potential compression ratio in favor of QI is in principle unlimited.

The two kinds of "small" differences in redundancy are indeed small for
the usual things one codes with AC (or Huffman). The reason for that is
not because the differences are universally small, but because one
doesn't code with AC anything where it would be very poor. So, the
apparent "smallness" is a mere reflection of the narrowing of the
domain surveyed to what AC or Huffman algorithm code reasonably well.

But there is a whole realm of unexplored algorithms which would
require coding of many separate items, each of which amounts to a
tiny bit fraction, and which are ignored because AC won't do well here,
due to those "tiny" overheads of O(log(n)) or a fixed 2-3 bits per coding
task. Consider, for example, the BW-Transform output column R (Fig 5, p. 34,
[T2]). If you look at long enough contexts, the column R will fragment
into many small pieces. To account for longer contexts, one would want
to select for coding as fine a fragmentation of R as practical. The
adaptive AC was found to perform very poorly here, even worse than
various ad hoc schemes that are in use now. { The row "Vary" in the table
on p. 10 in [T3] illustrates this effect, where on a 4K input, adaptive
order-0 AC outputs more than twice as much as the descriptive order-0
QI. } General coding for long contexts is not the only area with
lots of highly fragmented small inputs. Differential frame updates for
video & audio codecs generate similar data as well.

In addition to the domain of highly fragmented data, there is an area where
the ultra-low entropy limit effects become significant. These are the
database or data-warehouse bitmap indexes and keyword incidence bitmaps
for search engines. The big search engines will have such bitmaps of 8+
billion bits (or whatever the latest Google vs Yahoo vs MSN figures are
these days). The compression methods in these fields are either ad hoc
methods or runlengths, both of which break down on highly clustered and
very low average entropy data.

In relation to the search engine & database domains (and a few others),
another unique aspect of QI becomes highly relevant:

a) the compressed output size is available without decompressing the
data -- for example, you know exactly the size of the binomial coefficient
C(n,k) from the count of 1's.

b) for a fixed input entropy rate, QI codes at _precisely_ a fixed
number of bits (which is also within log(e)/2^(g-1) bits of the
entropy, at precision g and for the model used), e.g. all n! permutations of n
items will always be encoded into _precisely_ the same size (you
can test that with QI.exe or see it in the source).
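
A small sketch of property (a) (my own illustration; the lgamma
shortcut gives a floating-point estimate of log2 C(n,k), whereas QI
itself works with its exact quantized tables):

#include <math.h>
#include <stdio.h>

//-- bits needed for an order-0 index of an n-bit block with k ones: the
//   index ranges over [0, C(n,k)), so its size is known from n,k alone
double index_bits(unsigned n, unsigned k)
{ return (lgamma(n + 1.0) - lgamma(k + 1.0) - lgamma(n - k + 1.0))
         / log(2.0);
}

int main(void)
{ printf("n=1024, k=64  ->  about %.1f bits\n", index_bits(1024, 64));
  return 0;
}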

The Huffman and AC don't have these properties. Huffman will have
fairly large variations (easily as much as 10-15%) in the output size,
AC less than Huffman, but still of the order of O(log(n)) bits even at
fixed input entropy rate. Hence, if one wishes to store the compressed
items into a designated space, known in advance, with AC and Huffman,
one would have to reserve space for the worst case of variation,
increasing their redundancy.

Similarly, if one is coding complex data structures, with lots of
compressed pieces that need later to be traversed quickly and accessed
randomly, it is essential that the sizes of compressed pieces can be
known exactly, without expanding the data, which QI property (a)
assures. With Huffman & AC, you either need to pad the space to cover
the worst case compressed size or include the size as a separate item.
Either way wastes additional space.

Although the above types of setups occur in many fields, the complex
data structure aspects are especially relevant when coding and
traversing large networks or graphs, the most popular example of these
being the web links. But the network modeling paradigm has been gaining
in recent years in many other fields, such as computational biology,
where ever more complex webs of bio-chemical reactions are being
constructed on top of the growing DNA maps & genetic databases. Another
field with mathematically similar problems is ad hoc routing
for wireless networks.

Finally, there is a domain of constrained coding (used for the low
level recording media codes and channel coding), where the exact EC,
however cumbersome and inefficient in the conventional form, is still
sufficiently advantageous over the Huffman & AC that they have been
using it here since 1950s (see Immink's work). The QI improves speed
and memory requirements vs exact EC by a huge factor O(n), so these
areas can benefit directly and right away.

In conclusion, the availability of an extremely fast and highly
accurate (optimal for any given arithmetic precision) coding algorithm to
perform such coding tasks opens an entire realm of unexplored
compression algorithms. The compression gains potentially available to
such algorithms are not limited at all to the few percent one can see
on the subset of coding tasks where Huffman or AC code well (which to
those over-conditioned to seeing only that domain, appears to be all
there is to compress).


-- References ( http://www.1stworks.com/ref/RefLib.htm )

T1-T3 are on http://www.1stworks.com/ref/qi.htm

34. J.G. Cleary, I.H. Witten  "A Comparison of Enumerative and Adaptive
Codes" IEEE Trans. Inform. Theory  IT-30 (2), 306-315, 1984
http://www.1stworks.com/ref/Cleary84Enum.pdf

0
nightlight
1/9/2006 1:38:45 PM
nightlight wrote:
) In any case, this is only one component of the ACF excess, which was
) the particular subtopic we were discussing. You can make ACF perform at
) about that level of excess by using decrementing AC (see [34]). In that
) role, though, while ACF will code nearly as optimally as QI (within
) these few bits, which grow very slowly with N), all it achieves is to
) become a slightly less accurate and much slower exact work-alike of QI.
)
) The regular adaptive or static AC's that one normally finds in
) practical implementations, there will be an additional redundancy
) relative to QI which, in case of order-0 stationary sources can be
) derived exactly, as shown in [T2] pp. 20-24). That redundancy is in the
) leading order 1/2 log(2πnpq) for binary (for general alphabet of size
) A, there are A-1 such terms which are summed, resulting in approx 1/2
) changing to (A-1)/2). As shown in [T2], this error is due to the
) approximate enumeration of AC, where the Stirling approximation and the
) subsequent dropping of the sqrt() & other factors (each being <1),
) causes AC addends to increase relative to QI, leading to an excess of
) O(log(n)) bits on the total output.

This redundancy is offset by the need to transmit extra bits for the
modeling information, which is something you have neatly excluded in
your calculations.

) This O(log(n)) order redundancy was the dominant effect shown in the
) test table (p.10 in [T3]) for all but the last row (which illustrates
) different effect). Depending of course on what the total output size is
) (which depends on input size and density of 1's), this may amount to a
) tiny fraction of 1% or to 5-6% for the parameter ranges shown in the
) table.

I assume your tests only counted the actual bits output by the encoder,
and not the bits needed for transmitting the model ?  That makes it
an unfair comparison.


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
0
Willem
1/9/2006 1:57:04 PM
"Matt Mahoney" <matmahoney@yahoo.com> wrote in 
news:1136681911.708860.197090@g47g2000cwa.googlegroups.com:

> David A. Scott wrote:
>> "Matt Mahoney" <matmahoney@yahoo.com> wrote in
>> news:1136605351.691283.112110@g47g2000cwa.googlegroups.com:
>>
>>    In your model you use 9 bits for every 8 bits of data where starting
>> bit is a ZERO for each byte and then for EOF you allow the starting
>> bit to be ONE and stop the compressor. This does not allow all the
>> code space to be used.
>>    As a result of this modeling every compressed file with FPAQ0 has
>> the first bit of the first byte set on output so technically the first
>> bit out is a total waste of space.
> 
> Actually all the code space is used because the first bit is 0 for an
> empty file.
> 

   You could leave the empty file empty.

   And yes, one could think of the code space there as being used when
one looks at compression. I guess I was also looking at decompression
of a long file and wondering what happens to the rest of the file
to be decompressed if what is returned is marked as an EOF, while the
uncompressed file has much wasted space since the following bytes
will not be looked at.


> Of course it could be improved by making it bijective instead of using
> 2*log2(n) bits to encode the length in bytes.
>

   You don't have to go to that extreme and make it hard for most to 
follow. You could use just log2(n) bits to encode the length. It still
would not be bijective but it would not complicate the code that much
more and a rough file integrity check could be done during 
decompression without any additional output to the compressed file.

   Matt, if I can get nightlight to commit to coding his example of
the 3 symbol types, I would like to play again with fpaq0, to see
how much better it can be made with as little change as possible.
I like your style, but I don't think I will go to the wall and make
it bijective. But the nine bits for each eight can be changed to
eight for eight, with a ninth only needed for the last byte.


   But it looks like nightlight is all talk and will not even
attempt to code his simple example. Maybe he has looked at the
example he himself made up and realizes that he can't beat a
real arithmetic coder that can be written to do his own example,
where QI was supposed to shine and arithmetic fail.

> -- Matt Mahoney
> 

  Again, your code is really neat for the other methods. Where this
model code is different is that in your other coding you have the
length field in front of the file, so you never add this extra cost
throughout the file. I think you can add the cost at the back end
and save a little extra space with very little change in the source
code.

David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/9/2006 2:42:10 PM
> This redundancy is offset by the need to transmit extra bits for
> the modeling information, which is something you have neatly
> excluded in your calculations.

Not quite. The Stirling approx. & dropping of the sqrt(n) factor is
purely an arithmetic error (which increases the AC addends) that you
can avoid in AC only by coding in decrementing mode, in which case both
coders have exactly the same parameter encoding. As explained a few
messages ago, in some special cases (such as a stationary Bernoulli
source) you can avoid the leading order of the sqrt() error via the KT
estimator (which is not available generally, and even when available
involves tradeoffs, as explained earlier in this thread). A simple intro
to the 2nd order AC redundancy is in ref. [40].

> I assume your tests only counted the actual bits output by the encoder,
> and not the bits needed for transmitting the model ?  That makes
> it an unfair comparison.

Of course not.  Each coder had to produce entirely self-contained
output, with data sizes and all counts packaged in. AC needed only to
encode its input size (which Moffat98 does using AC itself, as an outer
loop, called on every 8 bits of payload), while QI had several more
items. I already explained that in [T3] p. 9, other posts & the source
code package.  I coded the counts using a small Huffman code table
(prebuilt for the binomial distribution, just a few KB in size for n=1024; I
also have hypergeometric distrib. tables, but these were not used for
the tests in [T3]), with the total k sent separately (in log n bits, since n is
known at that point; the n itself was coded using a two-part
self-delimiting code, a Huffman code for the prefix specifying the length
of the rest & plain binary for the rest). The total k was used to calculate the
running block-to-block average (which selects the Huffman subtable for
that average) and also to extract the exact last block count as the
leftover k at that point. I also coded the leading 16 bits of the mantissa
for each output block using the mixed radix codes (these can also use
tapered Huffman codes if one doesn't care about a 2-3 percent loss on
these 16 bits, or about the artificial output size variability for fixed
entropy inputs, see items (a) & (b) in the previous post).
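
For illustration, here is a bare-bones two-part self-delimiting code of
that general kind (my own simplified stand-in: a unary length prefix
where the actual coder uses a Huffman code for the prefix, and
put_bit() here just prints characters):

#include <stdio.h>
#include <stdint.h>

void put_bit(int b) { putchar(b ? '1' : '0'); }   /* stand-in bit sink    */

//-- two-part code for n: length prefix, then the rest of n in binary
void put_two_part(uint32_t n)
{ int len = 0, i;
  uint32_t t;
    for (t = n; t; t >>= 1) len++;     /* bit-length of n (0 for n = 0)   */
    for (i = 0; i < len; i++) put_bit(1);
    put_bit(0);                        /* unary prefix: len ones then a 0 */
    for (i = len - 2; i >= 0; i--)     /* rest of n; the top bit is       */
        put_bit((n >> i) & 1);         /* implied by the prefix           */
}

int main(void) { put_two_part(1024); putchar('\n'); return 0; }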

The source is out there, along with tips in the readme.txt on how to
compare compression effectiveness & speed vs AC or other coders.  This
is not some secret compression algorithm with magic powers to which no
one is allowed access. The math alone should be enough for those
familiar enough with the relevant topics to know how the compression
will come out, even without any source or tests. You're welcome to
experiment with the source and report any aberrations.

0
nightlight
1/9/2006 2:53:18 PM
Willem <willem@stack.nl> wrote in news:slrnds4qtg.frs.willem@toad.stack.nl:

> nightlight wrote:
> ) In any case, this is only one component of the ACF excess, which was
> ) the particular subtopic we were discussing. You can make ACF perform at
> ) about that level of excess by using decrementing AC (see [34]). In that
> ) role, though, while ACF will code nearly as optimally as QI (within
> ) these few bits, which grow very slowly with N), all it achieves is to
> ) become a slightly less accurate and much slower exact work-alike of QI.
> )
> ) The regular adaptive or static AC's that one normally finds in
> ) practical implementations, there will be an additional redundancy
> ) relative to QI which, in case of order-0 stationary sources can be
> ) derived exactly, as shown in [T2] pp. 20-24). That redundancy is in the
> ) leading order 1/2 log(2πnpq) for binary (for general alphabet of size
> ) A, there are A-1 such terms which are summed, resulting in approx 1/2
> ) changing to (A-1)/2). As shown in [T2], this error is due to the
> ) approximate enumeration of AC, where the Stirling approximation and the
> ) subsequent dropping of the sqrt() & other factors (each being <1),
> ) causes AC addends to increase relative to QI, leading to an excess of
> ) O(log(n)) bits on the total output.
> 
> This redundancy is offset by the need to transmit extra bits for the
> modeling information, which is something you have neatly excluded in
> your calculations.
> 

    Would you really expect him to actually do the example he proposed?
This guy seems all talk and no action, even on the very example he proposed
to code.

> ) This O(log(n)) order redundancy was the dominant effect shown in the
> ) test table (p.10 in [T3]) for all but the last row (which illustrates
> ) different effect). Depending of course on what the total output size is
> ) (which depends on input size and density of 1's), this may amount to a
> ) tiny fraction of 1% or to 5-6% for the parameter ranges shown in the
> ) table.
> 
> I assume your tests only counted the actual bits output by the encoder,
> and not the bits needed for transmitting the model ?  That makes it
> an unfair comparison.
>

   Of course he tried to make an unfair comparison. He wants his method to
look best. However, I am sure I can change my code to do the example he
proposed in a bijective way. So go ahead, press him to write code that
anyone can check with files of their own. He will not do it, since I don't
think he has the ability to follow through even on his own example. He can
quote text that he seems to not understand when it comes to arithmetic, but
can he write real code for even the simple example he proposed? Don't
hold your breath; I think we have seen his kind here every year. So let's
cut to the chase and see if he will actually do anything.


   NIGHTLIGHT, where is your code? Let's nail the example down to files,
where one has to compress and then decompress back, since it's
not a real test unless one can go both ways and users of the group can
actually test it.
actually test it.

   WE ARE WAITING!!!!
 
> 
> SaSW, Willem



David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/9/2006 2:58:29 PM
"nightlight" <nightlight@omegapoint.com> wrote in 
news:1136818397.999924.283680@z14g2000cwz.googlegroups.com:

> 
> Of course, not.  Each coder had to produce entirely self-contained
> output, with data sizes and all counts packaged in. AC needed only to
> encode its input size (which Moffat98 does using AC itself, as an outer
> loop, called on every 8 bits of payload), while QI had several more
> items. I already explained that in [T3] p. 9, other posts & the source
> code package.  I coded the counts using a small Huffman code table
> (prebuilt for the binomial distribution, just few KB size for n=1024; I
> also have hypergeometric distrib. tables, but these were not used for
> tests in [T3]), using total k sent separately (in log n bits since n is
> known at that point; the n itself was coded using two part
> self-delimiting codes, Huffman code for prefix specifyingt the length
> of rest & binary for the rest). The total k was used to calculate the
> running block to block average (which selects the Hufman subtable for
> that average) and also to extracting the exact last block count as the
> leftover k at that point. I also coded the leading 16 bits of mantissa
> for each output block using the mixed radix codes (these can also use
> tapered huffman codes if one doesn't care about 2-3 percent loss on
> these 16 bits or about the artificial output size variability for fixed
> entropy inputs, see item (a) & (b) in previous post).
> 
> 

   I guess this is your way of saying you can't even code your own
example. What are you afraid of? I think you fear losing your own
contest??


David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/9/2006 3:02:23 PM
> Discarding part of the range is one way to deal with finite
> precision, for example the carryless rangecoder in ppmd.
> However the various coders in paq6, paq7, fpaq0, etc. do
> not have any gaps in the code space.  These are carryless
> binary arithmetic coders with 32 bits precision and 12
> bit representation of probabilities.

I responded to a similar objection from Willem in another post [M1] in
more detail, so I will just supplement the differential aspect here,
referring to the formulas & ref's there.

The AC carry problem, which is due to the usual FIFO coding mode, can be
avoided either by using LIFO mode (as Rissanen's AC-76 does) or by buffering
the whole data and propagating the carry. But the carry redundancy or
any explicit gaps in the IVF intervals (see [M1]) are only a part of
the quantization errors.

The unavoidable, fundamental quantization bit excess comes from the
reduction in size of Pc (see eq. (1) in [M1]), which is the coding
probability of the complete message. This reduction in the size of Pc as
it gets computed along the string (e.g. by multiplying with the single
symbol probabilities along the way) is intrinsic to the truncating
arithmetic of AC -- there is no way around it, since one can't just
round the product down along one branch and round it up along the
alternative branch, due to the Kraft inequality constraint in the
current step and all the steps upstream (the parts already encoded).

This exact same effect, which is a purely arithmetic rounding
phenomenon, is much more obvious with QI, where the Kraft inequality
constraint is given in eq. (20) ([T3] p. 8). The basic QI quantization
recurrence is given in eq. (21). Any time the rounding up occurs in
(21), the next higher order addends increase slightly (the upper bound
on that excess is log(e)/2^(g-1) bits per symbol). The AC addends are
exactly the same addends, except that they are rescaled for each coding
task separately, so that the largest addend is always exactly 1
(interpreted in AC as total probability=1, cf. [T2] pp. 19-20, Figs. 3
& 4). All other AC addends are simply the QI addends divided by this
largest addend (which is selected individually for each input).
Similarly, the AC equivalent of eq (20), the Kraft inequality
constraint, is divided by the largest addend, so it appears as a
constraint on the probabilities, but it is still the same constraint.
Hence, the exactly same relation between the higher and lower order
addends is maintained by AC and QI, where in both cases the
quantization enlarges the ratios between the higher order and lower
order addends. The difference is that QI quantization, which is bottom
up, is optimal for any given precision, while AC's, which is top down,
is not (this is the same difference as between the bottom-up Huffman
codes, which are optimal, and top-down Shannon-Fano codes, which are
not).

As explained in another message I posted today [M2], this particular
difference (which is unavoidable for the AC) is small for the tasks
that AC & Huffman are typically used for. As explained in [M2] in more
detail, that is almost a tautological observation, since AC & Huffman
are used only where they do a reasonably good job. But there are coding
domains (along with the corresponding, largely unexplored higher level
algorithms) beyond this one, in which the compression ratios could be
arbitrarily large, based solely on these "small" differences (the
quantization & enumeration errors; the latter, of O(log(n)), are
avoidable by AC, see [M2]).


> Being restricted to an order 0 model seems like a
> severe disadvantage.  How would you transform a
> context mixing model to order 0?

This was explained in detail in another message, [M3]. The appearance
of this limitation (which many people have expressed) is due to
viewing it through the AC modeling paradigm (ACMP), especially through
the so-called "online" constraint (which hasn't existed in real life
since the Morse telegraph was the main communication technology), where
one has to process the entire sequence symbol by symbol and output with
only log(n) latency. Also, modeling is a much larger field than
calculating probabilities of the next single symbol (note, though, that
method (b) in [M3] can handle arbitrary AC models, while still
retaining the full QI speed advantage).


-- References ( http://www.1stworks.com/ref/RefLib.htm )

T1-T3 are on http://www.1stworks.com/ref/qi.htm

M1. Post on quantization errors in AC & QI:

http://groups.google.com/group/comp.compression/msg/b157e1aa25e598d8

M2. Post on the apparent "smallness" of compression differences:

http://groups.google.com/group/comp.compression/msg/6ebbc078012c215c

M3. Post on QI modeling:

http://groups.google.com/group/comp.compression/msg/1314ff87da597fad

0
nightlight
1/9/2006 3:35:29 PM
Hi,

> The AC carry problem which is due to the usual FIFO coding mode can be
> avoided by either using LIFO mode (as Rissanen's AC-76) or buffering
> the whole data and propagating the carry. But the carry redundancy or
> any explicit gaps in the IVF (see [M1]) intervals are a only part of
> the quantization errors.

Ehem, a small correction: you don't have to buffer the whole data; see
for example Nelson's or Moffat's implementation. All you need to do is
count how many times you have "forgotten" to carry over, and
resolve this, as soon as the carry occurs, by propagating it
through all the "counted" (but never buffered) data.

Other methods use "bitstuffing", i.e. insert a redundant bit to
allow the carry to propagate in, but this is completely off-topic
here since it pessimises the code, and we're currently in "bit juggling"
mode here. (-;

> This was explained in detail in another message, [M3]. The appearance
> of this limitation (which many people have expressed), is due to
> viewing it through the AC modeling paradigm (ACMP), especially through
> the so-called "online" constraint (which doesn't exist in real life,
> since Morse telegraph were the main communication technology), where
> one has to process entire sequence symbol by symbol and output with
> only log(n) latency. 

I'm afraid this is pretty much an existing problem. This "online" constraint
is often a limitation imposed by the limited memory of telecommunications
hardware, and in this context it is called the "latency" of the coding model.
Consider for example audio compression over an IP network, using
a hardware based coder. A long memory in the device would mean that
the receiver has to wait a long time (in principle infinitely long)
before it might be able to play the sound. That is unacceptable for real-life
applications, and this is why folks play the "bitstuffing" game I mentioned
above, even though it reduces the coder performance. Thus, there *are*
definitely areas where one wants to have an online algorithm. However,
I would not consider this problem on-topic here right now.

I don't think that this latency question is right now
relevant for your work; getting past AC is interesting enough by itself,
be it on-line or off-line.

So long,
	Thomas
0
Thomas
1/9/2006 5:27:27 PM
> Consider for example an audio compression over an IP network, using
> a hardware based coder. A long memory in the device would mean that
> the receiver has to wait long (in principle infinitely long)
> before it might be able to play the sound. Unacceptable for real-life
> applications, and this is why folks play the "bitstuffing" game I mentioned
> above, even though it reduces the coder performance.

I did a few of these, all mods of existing audio codecs, for our
current communication product ( http://www.hotComm.com ). Even here,
though, take the most basic low quality voice at 8000 samples/sec,
and take low precision samples of just 1 byte per sample (this is not
what we normally use). The segments we typically get are 20 ms of audio
data. (Note that one could easily go to 60-80 ms blocks without a human
listener noticing any difference.) With 20 ms, 1 byte/sample and 8
samples/ms you get a block of data to encode that is 1280 bits long.
That is three orders of magnitude larger than the "online" constraint,
in the AC modeling/coding paradigm lingo. Any higher quality, and you
are several times larger. Another real time app which doesn't tolerate
latency would be a video codec, and even the differential frame data is
almost two orders of magnitude larger than audio.  Although the entropy
coder won't get the raw samples but some outputs from the transforms &
filtering, the above was reduced in all parameters to the bare minimum, so
in a more realistic case one does get at least as much data even for
the entropy coder.

Even assuming you can have few hundreds symbols at a time vs just a
single symbol (or just few, for AC latency) adds a great deal of
flexibility and opens space for new algorithms, for the modeling engine
and the coder, as BWT illustrates (or the so-called "offline"
dictionary methods) for the modeler or QI for the coder.

> Ehem, small correction: You don't have to buffer the whole data, see
> for example Nelson's or Moffat's implementation. All you need to do is
> to count how many times you have "forgotten" to carry over, and
> resolve this as soon as the carry is resolved by propagating it
> thru all the "counted" (but never buffered) data.

I think that depends on the constraints of the task. The AC is adding to a
common sum numbers which decrease in size. Hence, there is no way,
even in principle, to send out the higher digits while a carry can
still propagate into them, let the decoder decode them
incorrectly (and possibly act on them), and then issue a carry
signal to undo that decode.  That is what I meant above -- the encoder
either has to keep all the data that a carry can propagate into in the worst
case, or use blocking mechanisms (which add redundancy), or
use LIFO mode (where everything is held until the whole buffer is
encoded).

Note also that one pays the cost of FIFO coding not just in the cases when
one has to propagate a carry, or in a small extra redundancy: the
coder is burdened with checking & branching for such cases inside its
coding loop, so there is also a coding speed penalty for the AC "online"
constraint, with no practical benefit at all.

0
nightlight
1/9/2006 6:10:43 PM
>> the probability distribution of the symbols is different,
>> then how does QI coder adapt itself to all those different distributions ?
>
> I dont know the answer to this one.

There is a more detailed answer to this question in another post
[M1], see section (b), which describes QI style coding using the AC
modeling engine; it retains the QI speed advantage while coding at the
compression efficiency of the AC. That section fleshes out in more
detail a shorter comment about this method from note N7 on p. 9 in
[T3]. A much more detailed description of EC/QI modeling is given in
[T2], pp. 26-35.


> As for nightlights's comments on QI's speed, I am afraid
> that as the modelling scheme for QI is different from
> modelling scheme for ArithCoding, we will need to compare
> speed of "QI+its modelling code" with "AC+its modelling code".
> Where both models should be of same order, or chosen to give
> same compression ratio. (My doubt here is "what if QI just
> *shifts* computation burden to modelling code instead
> of reducing it".)

Well, you already have results for QI+Model and AC+Model for the
simplest case of modeling, order-0 adaptive AC vs order-0 "adaptive" QI
(where QI simply counts the bits and selects the addends based on
whether the symbol is 0 or 1, as shown in the source). The AC uses its
probability adaptation algorithm, which is quite a bit more work (since
it counts bits as well). You can also look at [M1], method (b), and see
that any model made for AC, which feeds probabilities to the coder, can
be used with QI and coded with the full coding speed advantage. In that
case the modeling engine is left as is, hence there is _no cost
shifting_.

QI simply codes faster because it has a much better division of
labor within the coder itself than AC (see [T2] pp. 19-21). You can see
that by comparing the AC coding loop with the QI coding loop. For
example, the computation of all enumerative addends is done separately,
by specialized code which does only that and saves the results in a
universal table (e.g. quantized binomials for a Bernoulli source,
factorials for permutations, powers & products for mixed radix codes).

The tables are universal in the sense that they encapsulate the generic
combinatorial properties of symbol sequences, hence no matter what the
source probabilities are, a single table suffices. The AC algorithm,
due to an intricate entanglement of the source probabilities with its
coding arithmetic, cannot have such universal tables (see [T2] pp.
19-21) -- it would need a separate table for each source probability.
Instead, AC's coding loop computes essentially the same addends
(rescaled so that the max addend is always 1, see [T2] p. 19) using
multiplicative recurrences. In the binary order-0 case, these are
simply the binomial coefficient recurrences:

   C(n,k) = C(n+1,k+1) * (k+1)/(n+1) ......  (1)

when symbol 1 is encoded and:

    C(n,k+1) = C(n+1,k+1) * (n-k)/(n+1) ......  (2)

when symbol 0 is encoded. The factor p=(k+1)/(n+1) in (1), which
is the ratio of the remaining count of ones, (k+1), to the total
remaining symbols, (n+1), is interpreted within AC as the probability
of a one, and the factor q=(n-k)/(n+1) in (2) as the probability of a
zero at that same place. The AC also uses a common scaling factor:
all the coefficients in (1) & (2) are divided by C(N,K), where N=total
bits and K=total 1's. This division by C(N,K) is implicit in AC, i.e.
AC starts coding with what in QI space is the path endpoint (N,K), and
sets its addend value to 1 (interpreted in AC as the starting
probability of an empty message); then, using (1) & (2), it computes the
next lower coefficients, C(N-1,K-1) if 1 is encountered first or
C(N-1,K) if 0 is encountered first, so it never needs to know the
absolute value of C(N,K) explicitly. Whenever it encounters a 1 (the
less frequent symbol by convention), it adds the current addend (the
interval belonging to 0) to the index (which is the same thing QI does
with its integer addends). The main difference is that AC needs to do
the multiplication in (1) or (2), and do it on every symbol. QI already
has all the addends in its table and it doesn't have to do anything for
the most frequent symbol.
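
To make the contrast concrete, here is a small self-contained C sketch of
the two inner loops for the order-0 binary case, using exact (unquantized)
binomials, plain doubles, and no renormalization or carry handling -- so it
is only an illustration of where the work sits, not a working QI or AC
coder. The table-driven loop touches the table only on 1's; the AC-style
loop does a multiplication on every symbol (eqs. (1)-(2) above, with the
implicit division by C(N,K) undone at the end so the two indices can be
compared):

  //-- Toy contrast of the two inner loops (order-0 binary; illustration only)
  #include <stdio.h>

  #define NMAX 32
  static double binom[NMAX+1][NMAX+1];   /* C(n,k), built once, outside the loops */

  static void build_table(void)
  { int n,k;
    for(n=0;n<=NMAX;n++)
    { binom[n][0] = 1.0;
      for(k=1;k<=NMAX;k++)               /* C(n,k)=0 for k>n falls out naturally  */
        binom[n][k] = (n==0) ? 0.0 : binom[n-1][k-1] + binom[n-1][k];
    }
  }

  /* Table-driven loop: work done only on 1's, addends read from the table.
     Computes the lexicographic index of x among all n-bit strings with K ones. */
  static double table_index(const int *x,int n,int K)
  { double I=0; int i,k=K;               /* k = ones still to be placed */
    for(i=0;i<n;i++)
      if(x[i]) { I += binom[n-i-1][k]; k--; }
    return I;
  }

  /* AC-style loop (idealized, no rounding): the addend ("range") is rescaled
     and recomputed by a multiplication on *every* symbol. */
  static double ac_style_index(const int *x,int n,int K)
  { double I=0, range=1.0; int i,k=K;
    for(i=0;i<n;i++)
    { int rem = n-i;                     /* symbols remaining, incl. current     */
      if(x[i]==0)
        range *= (double)(rem-k)/rem;    /* multiply even on the frequent symbol */
      else
      { I += range*(double)(rem-k)/rem;  /* add the interval belonging to 0      */
        range *= (double)k/rem; k--;
      }
    }
    return I*binom[n][K];                /* undo the implicit 1/C(N,K) scaling   */
  }

  int main(void)
  { int x[8] = {0,1,1,0,1,0,0,1};        /* example sequence, K=4 ones           */
    build_table();
    printf("table index = %.0f\n", table_index(x,8,4));
    printf("AC-style    = %.4f\n", ac_style_index(x,8,4));  /* same value, up to fp rounding */
    return 0;
  }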

As you can see (or check in the source), at the coder level itself,
QI has the labor divided much more cleanly -- all calculations of the
universal properties of the symbol sequences are outside of its coding
loop. They're done once and for all. The only thing done in the QI
coding loop is specific to the given sequence -- the particular
placement of the 1's in that sequence. The AC carries out all the
calculations, for both the universal and the particular sequence
properties, in its innermost coding loop, since the two are
irreversibly entangled in the AC scheme.

The same distortions in the division of labor propagate outward as
you wrap more layers around the coder, forcing the modeling engine to
deform its own division of labor around the inefficient logistics
inside (see more discussion on this point in [T2] pp. 30-31).

The QI modeling is much simpler. The coder does not force the modeler
to convert everything it knows, or can find out, about the sequence into
probabilities of the next symbol, as AC does with its modeling
engine. Since QI doesn't need probabilities to perform coding (but, as
shown in [M1], part (b), it can use them just fine if that is what the
modeler outputs), the QI modeling engine is less constrained, freer
to choose from a much larger space of possible algorithms. In essence,
what the modeler has to do for QI is to tell it which messages are
equiprobable with which other messages (so they can be enumerated in
the same enumerative class). The QI coder doesn't care what the actual
probabilities of the messages are, but only whether P(M1)==P(M2), which
is a much weaker load on the modeler than asking it for the value of
P(M1) and the value of P(M2), as AC does.

Another essential difference between the two modeling schemes is that
QI's modeler is trying to describe the sequence, while the AC's modeler
is trying to predict the next symbol in the sequence (i.e. calculate
all the possible odds for the next symbol). Paraphrasing Yogi Berra, it
is much easier to predict the past than to predict the future. Hence,
the QI's modeler does have a much easier job at the fundamental level.

The general QI/EC modeling pattern is described in [T2] p. 26, with
details fleshed out on pp. 27-35.



-- References ( http://www.1stworks.com/ref/RefLib.htm )

     QI source & tech. reports:
     http://www.1stworks.com/ref/qi.htm

M1. Post on QI modeling:

http://groups.google.com/group/comp.compression/msg/1314ff87da597fad

T3. R.V. Tomic "Quantized Indexing: Beyond Arithmetic Coding"
    arXiv  cs.IT/0511057, 10p, Nov 2005, 1stWorks TR05-1115
    http://arxiv.org/abs/cs.IT/0511057

T2. R.V. Tomic "Quantized indexing: Background information"
    1stWorks TR05-0625, 39p, Jun 2005
    http://www.1stworks.com/ref/TR/tr05-0625a.pdf

0
nightlight
1/9/2006 9:37:26 PM
David A. Scott wrote:
>    Matt if I can get nightlight to commit to coding his example of
> the 3 symols types. I would like to play again with fpaq0. To see
> how much better it can be made with as little change as possible.
> I like your style but I don't think I will go to the wall and make
> it bijective. But the nine times for each eight can be changed to
> eight for eight with a ninth only needed for the last byte.

There is some room for improvement.  I tried compressing 10,000,000
bytes of random characters A, B, C.  fpaq0 compresses it to 1,982,988
bytes.  The theoretical limit is 10,000,000 * (lg 3)/8 = 1,981,203
bytes, a difference of 1785 bytes.  For 1,000,000 bytes it compresses
to 198,322 bytes, a difference of 201.7 bytes.
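
For reference, a few lines of C that reproduce the limit quoted above
(N random symbols from a 3-letter alphabet at lg 3 bits per symbol);
just a sketch of the arithmetic:

  //-- theoretical limit for N random symbols from {A,B,C}: N*lg(3)/8 bytes
  #include <stdio.h>
  #include <math.h>

  int main(void)
  { double lg3 = log(3.0)/log(2.0);                         /* ~1.58496 bits/symbol */
    printf("10,000,000 symbols: %.1f bytes\n", 1e7*lg3/8);  /* ~1,981,203.1 */
    printf(" 1,000,000 symbols: %.1f bytes\n", 1e6*lg3/8);  /* ~198,120.3   */
    return 0;
  }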

-- Matt Mahoney

0
Matt
1/10/2006 1:17:52 AM
Matt Mahoney wrote:
> There is some room for improvement.  I tried compressing 10,000,000
> bytes of random charaters A, B, C.  fpaq0 compresses it to 1,982,988
> bytes.  The theoretical limit is 1/8 lg 3 = 1,981,203, a difference of
> 1785 bytes.  For 1,000,000 bytes it compresses to 198,322 bytes, a
> difference of 201.7 bytes.

Funny test. I repeated it quickly with 10 compressors already installed
on my PC:

Input (random A,B,C):
0.97 MB (1,024,000 bytes)

Output (compressed):
198 KB (203,401 bytes) PAQ 7
199 KB (204,731 bytes) WinRK beta 3.0 build 2
200 KB (205,119 bytes) WinUHA 2.0
202 KB (207,127 bytes) SBC 0.970
202 KB (207,355 bytes) Slim 0.021
206 KB (211,485 bytes) WinRAR 3.51
206 KB (211,632 bytes) Stuffit 9.0.0.21
216 KB (222,042 bytes) 7-ZIP 4.32
229 KB (234,886 bytes) WinZip 9.0
231 KB (237,390 bytes) BZIP2 1.0.2

0
Sportman
1/10/2006 2:46:34 AM
"Matt Mahoney" <matmahoney@yahoo.com> wrote in 
news:1136855872.776041.80650@g44g2000cwa.googlegroups.com:

> David A. Scott wrote:
>>    Matt if I can get nightlight to commit to coding his example of
>> the 3 symols types. I would like to play again with fpaq0. To see
>> how much better it can be made with as little change as possible.
>> I like your style but I don't think I will go to the wall and make
>> it bijective. But the nine times for each eight can be changed to
>> eight for eight with a ninth only needed for the last byte.
> 
> There is some room for improvement.  I tried compressing 10,000,000
> bytes of random charaters A, B, C.  fpaq0 compresses it to 1,982,988
> bytes.  The theoretical limit is 1/8 lg 3 = 1,981,203, a difference of
> 1785 bytes.  For 1,000,000 bytes it compresses to 198,322 bytes, a
> difference of 201.7 bytes.
> 
> -- Matt Mahoney
> 
> 

   That's interesting, but I suspect that even though arb255.exe would
compress it to a much smaller amount, it would not hit the limit, since
raw arb255 is for all possible 256 symbols instead of just 3. Still, I
suspect it will be much closer to the limit. Do you have a place to get
some of the zipped test files, so I can test arb255.exe? I think you
would be surprised at the difference.


   Your code does not use the full count -- you have a limit at 65534 --
so if the data is mixed well you start to approach what I am assuming
are the numbers you got. However, with your code, if there are roughly
equal numbers of A, B and C, it will compress smaller than what you are
calculating as the theoretical limit when all the A's are followed by
all the B's followed by all the C's.  My code, being a more true
arithmetic coder, does not have these reset points, so you should get
roughly the same length, within 2 bytes, no matter how the bytes are
ordered. My code does sort of what QI promises as far as compression
amount, but with real files.

   Both your code and mine are basically using the same tree. Yours is
laid out as 512 cells, mine as 255, but you are really only using half
the cells, since the first bit is a constant except at the EOF.


David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/10/2006 2:46:42 AM
"Sportman" <sportman@gmail.com> wrote in 
news:1136861194.808260.184880@f14g2000cwb.googlegroups.com:

> 
> Matt Mahoney wrote:
>> There is some room for improvement.  I tried compressing 10,000,000
>> bytes of random charaters A, B, C.  fpaq0 compresses it to 1,982,988
>> bytes.  The theoretical limit is 1/8 lg 3 = 1,981,203, a difference of
>> 1785 bytes.  For 1,000,000 bytes it compresses to 198,322 bytes, a
>> difference of 201.7 bytes.
> 
> Funny test I repeated it quickly with 10 compressors already located at
> my PC:
> 
> Input (random A,B,C):
> 0.97 MB (1,024,000 bytes)
> 
> Output (compressed):
> 198 KB (203,401 bytes) PAQ 7
> 199 KB (204,731 bytes) WinRK beta 3.0 build 2
> 200 KB (205,119 bytes) WinUHA 2.0
> 202 KB (207,127 bytes) SBC 0.970
> 202 KB (207,355 bytes) Slim 0.021
> 206 KB (211,485 bytes) WinRAR 3.51
> 206 KB (211,632 bytes) Stuffit 9.0.0.21
> 216 KB (222,042 bytes) 7-ZIP 4.32
> 229 KB (234,886 bytes) WinZip 9.0
> 231 KB (237,390 bytes) BZIP2 1.0.2
> 
> 

  Could you test with arb255.exe  
http://bijective.dogma.net/arb255.zip

Or do you have a test file. Again arb255 is not tuned for
just 3 characters but would work well if the A B C are
truely in a random order.




David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/10/2006 2:55:41 AM
David A. Scott wrote:
>   Could you test with arb255.exe
> http://bijective.dogma.net/arb255.zip

Do you have a readme file with command line instructions?

> Or do you have a test file. Again arb255 is not tuned for
> just 3 characters but would work well if the A B C are
> truely in a random order.

The test file was sent by email, did you receive it?

I tested the same 10 compressors with a more real-life file to compare
the rankings; note WinRK took around 1.5 hours and PAQ around 30 minutes
to compress the test file on a single core Pentium M.

Input (DB dump structure with mix between HTML tags, text and data):
32.4 MB (34,017,118 bytes)

Output (compressed):
2.54 MB (2,674,010 bytes) WinRK beta 3.0 build 2
2.56 MB (2,686,923 bytes) PAQ 7
2.81 MB (2,948,566 bytes) Slim 0.021
3.22 MB (3,379,942 bytes) WinUHA 2.0
3.52 MB (3,699,846 bytes) SBC 0.970
3.54 MB (3,723,094 bytes) 7-ZIP 4.32
3.62 MB (3,806,862 bytes) Stuffit 9.0.0.21
3.72 MB (3,910,233 bytes) WinRAR 3.51
4.03 MB (4,231,646 bytes) BZIP2 1.0.2
4.84 MB (5,082,559 bytes) WinZip 9.0

0
Sportman
1/10/2006 4:43:39 AM
"Matt Mahoney" <matmahoney@yahoo.com> wrote in 
news:1136855872.776041.80650@g44g2000cwa.googlegroups.com:

> David A. Scott wrote:
>>    Matt if I can get nightlight to commit to coding his example of
>> the 3 symols types. I would like to play again with fpaq0. To see
>> how much better it can be made with as little change as possible.
>> I like your style but I don't think I will go to the wall and make
>> it bijective. But the nine times for each eight can be changed to
>> eight for eight with a ninth only needed for the last byte.
> 
> There is some room for improvement.  I tried compressing 10,000,000
> bytes of random charaters A, B, C.  fpaq0 compresses it to 1,982,988
> bytes.  The theoretical limit is 1/8 lg 3 = 1,981,203, a difference of
> 1785 bytes.  For 1,000,000 bytes it compresses to 198,322 bytes, a
> difference of 201.7 bytes.
> 
> -- Matt Mahoney
> 
> 

   Matt AEB255 got 1,981,227 both for a random file of 10,000,000
and for a file where the first 3,333,334 symbols are A, followed by
3,333,333 B's, followed by 3,333,333 C's. That is only 24 bytes
different from what you call optimal. But the optimal value you quoted
assumes the coder knows that the only 3 symbols used are A, B and C,
and that each occurs equally likely. That is not what our models
assume, so the correct value is larger.  See the work of Paul Howard
and Jeffrey Scott Vitter.


Since we are using general arithmetic compressors, not ones operating
with fixed probabilities, there is a cost associated with learning the
statistics from the data. With a slight change to either FPAQ0 or
ARB255 you will see the excess drop a lot. So let's see if NIGHTLIGHT
ever produces working code for what was really the contest he wanted.
The value you quote is what a modified arb should get, or Nightlight's
code, where the coder knows that there are only 3 symbols, each
occurring at an equal rate -- but don't hold your breath.



David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/10/2006 4:53:20 AM
"Sportman" <sportman@gmail.com> wrote in 
news:1136868219.095473.58400@z14g2000cwz.googlegroups.com:

> David A. Scott wrote:
>>   Could you test with arb255.exe
>> http://bijective.dogma.net/arb255.zip
> 
> Do you have a readme file with command line instructions?
> 
  
 I thought there was one in it, but: to compress,
arb255.exe file.in file.out

to decompress,
unarb255.exe file.in file.out

Note this is very slow code; it writes stuff to the screen.
It's made to do pure bijective arithmetic file compression.



>> Or do you have a test file. Again arb255 is not tuned for
>> just 3 characters but would work well if the A B C are
>> truely in a random order.
> 
> The test file is send by email did you received it?

 
 Yes I got it, but I don't have code on this machine to
uncompress rar files. Yes, I know it's common -- could you send
a zip file?


> 
> I tested the same 10 compressors with a more real life file to compare
> the rankings, note WinRK used round 1,5 hour and PAQ round 30 minutes
> to compress the test file at a Pentium M single core.
> 
> Input (DB dump structure with mix between HTML tags, text and data):
> 32.4 MB (34,017,118 bytes)
> 
> Output (compressed):
> 2.54 MB (2,674,010 bytes) WinRK beta 3.0 build 2
> 2.56 MB (2,686,923 bytes) PAQ 7
> 2.81 MB (2,948,566 bytes) Slim 0.021
> 3.22 MB (3,379,942 bytes) WinUHA 2.0
> 3.52 MB (3,699,846 bytes) SBC 0.970
> 3.54 MB (3,723,094 bytes) 7-ZIP 4.32
> 3.62 MB (3,806,862 bytes) Stuffit 9.0.0.21
> 3.72 MB (3,910,233 bytes) WinRAR 3.51
> 4.03 MB (4,231,646 bytes) BZIP2 1.0.2
> 4.84 MB (5,082,559 bytes) WinZip 9.0
> 
> 



David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/10/2006 5:06:49 AM
"David A. Scott" <daVvid_a_scott@email.com> wrote in
news:Xns9746DEE5BBC64H110W296LC45WIN3030R@213.155.197.138: 

> "Matt Mahoney" <matmahoney@yahoo.com> wrote in 
> news:1136855872.776041.80650@g44g2000cwa.googlegroups.com:
> 
>> David A. Scott wrote:
>>>    Matt if I can get nightlight to commit to coding his example of
>>> the 3 symols types. I would like to play again with fpaq0. To see
>>> how much better it can be made with as little change as possible.
>>> I like your style but I don't think I will go to the wall and make
>>> it bijective. But the nine times for each eight can be changed to
>>> eight for eight with a ninth only needed for the last byte.
>> 
>> There is some room for improvement.  I tried compressing 10,000,000
>> bytes of random charaters A, B, C.  fpaq0 compresses it to 1,982,988
>> bytes.  The theoretical limit is 1/8 lg 3 = 1,981,203, a difference
>> of 1785 bytes.  For 1,000,000 bytes it compresses to 198,322 bytes, a
>> difference of 201.7 bytes.
>> 
>> -- Matt Mahoney
>> 
>> 
> 
>    Matt AEB255 got 1,981,227 both for a random file of 10,000,000

 I meant ARB255 the other was a typo

> and a file where first 3,333,334 symbols of A followed by 3,333,333 of
> B followed by 3,333,333 of C. Which is only  24 bytes different than
> what you call optimal. But that optimal value you quoted was assuming
> the coder knows that only the 3 symbols used are A B C  and that each
> occur equally likely. Which is not what our models assume so the 
> corrent value is larger.  See the work of Paul Howard and Jeffery
> Scott Vitter.
> 
> 
> Since we are using general arithmetic compressors not operating with
> fixed probabilites there is a cost associated with in the data. With a
> slight change to either FPAQ0 or ARB255 you will see the excess drop a
> lot. So lets see if NIGHTLIGHT ever gets a working code for what was
> really the contest he wanted. The value you quote is what a modifed
> arb should get or Nightlight code where the code knows that its only 3
> sybmbols each occurring at equal rate but don't hold you breath.
> 
> 
> 
> David A. Scott



David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/10/2006 5:10:19 AM
David A. Scott wrote:

>  Yes I got it but I don't have code on this machinge to
> uncompress rar files. Yes I know its common could you send
> a zip file?
Done

0
Sportman
1/10/2006 5:16:56 AM
"Sportman" <sportman@gmail.com> wrote in 
news:1136870215.955060.184180@o13g2000cwo.googlegroups.com:

> David A. Scott wrote:
> 
>>  Yes I got it but I don't have code on this machinge to
>> uncompress rar files. Yes I know its common could you send
>> a zip file?
> Done
> 
> 

 Your file 1,024,000 bytes
compressed to 202,894 bytes


David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/10/2006 5:23:10 AM
David A. Scott wrote:
> > Could you test with arb255.exe
> > http://bijective.dogma.net/arb255.zip
> >
> > Do you have a readme file with command line instructions?
> >
>
>  I thought there was one in it but to compress
> arb255.exe file.in file.out
>
> to decompress
> unarb255.exe file.in file.out

Thanks, this helped:

Result test file 1:
198 KB (202,894 bytes)

Result test file 2:
22.4 MB (23,500,615 bytes) Did I do something wrong?

0
Sportman
1/10/2006 5:27:42 AM
"Sportman" <sportman@gmail.com> wrote in 
news:1136870862.023091.297960@g44g2000cwa.googlegroups.com:

> David A. Scott wrote:
>> > Could you test with arb255.exe
>> > http://bijective.dogma.net/arb255.zip
>> >
>> > Do you have a readme file with command line instructions?
>> >
>>
>>  I thought there was one in it but to compress
>> arb255.exe file.in file.out
>>
>> to decompress
>> unarb255.exe file.in file.out
> 
> Thanks this helped:
> 
> Result test file 1:
> 198 KB (202,894 bytes)
> 
> Result test file 2:
> 22.4 MB (23,500,615 bytes) Did I something wrong?
> 
> 

  Maybe don't try unix style with the > or < -- you
have to type in the whole line.
Here is what I do, to compress a file a.txt:
arb255 a.txt  a.255
unarb255 a.255 a.25
fc /b a.txt a.25

The a.255 is the compressed form of a.txt,
a.25 is the uncompressed form, and fc checks
to see if they are the same.

Since it's bijective, if you run unarb255 on a raw file it will
uncompress to another file that is usually longer,
but when you compress that it comes back. The code
is completely bijective, but don't tell nightlight --
I'm not sure he could take it from a simple arithmetic
coder.


David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/10/2006 5:34:57 AM
nightlight wrote:
) The regular adaptive or static AC's that one normally finds in
) practical implementations, there will be an additional redundancy
) relative to QI which, in case of order-0 stationary sources can be
) derived exactly, as shown in [T2] pp. 20-24). That redundancy is in the
) leading order 1/2 log(2P npq) for binary (for general alphabet of size
) A, there are A-1 such terms which are summed, resulting in approx 1/2
) changing to (A-1)/2). As shown in [T2], this error is due to the
) approximate enumeration of AC, where the Stirling approximation and the
) subsequent dropping of the sqrt() & other factors (each being <1),
) causes AC addends to increase relative to QI, leading to an excess of
) O(log(n)) bits on the total output.

By the way, this excess you are talking about is *not* present as coding
gaps.  It is present as an imperfect mapping to the output domain,
resulting in some files compressing larger, but others compressing
*smaller* than they should, according to their probability distribution.

) a) the compressed output size is available without decompressing the
) data -- for example you know exactly the size of binomial coefficient
) C(n,k) from count of 1's.
)
) b) For fixed input entropy rate QI codes at _precisely_ the fixed
) number of bits (which is also within the log(e)/2^(g-1) bits from the
) entropy, at precision g and model used) e.g. all n! permutations of n
) items will always be encoded into the _precisely_ the same size (you
) can test that with QI.exe or see it in the source).

If the data fits these restrictions, then that is additional modeling
information that should be present in the model, and if so, AC will
take advantage of it as well as QI.

) In conclusion, the availability of an extremly fast and a highly
) accurate (optimal for any given aritmetic precsion) coding algorithm to
) perform such coding tasks, opens the entire realm of unexplored
) compression algorithms. The compression gains potentially available to
) such algorithms are not limited at all to the few percent one can see
) on the subset of coding tasks where Huffman or AC code well (which to
) those over-conditioned to seeing only that domain, appears to be all
) there is to compress).

I wasn't debating that.  I was debating your absolute 'always better'
claim, and your 'holy grail' attitude.


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
0
Willem
1/10/2006 8:23:35 AM
Hi,

> I did few of these, all mods from existent audio codecs, for our
> current communication product ( http://www.hotComm.com ). Even here,
> though, take the most basic low quality voice with 8000 samples/sec,
> and take low precision samples of just 1 byte per sample (this is not
> what we normally use). The segments we typically get are 20ms of audio
> data. (Note that one could easily go to 60-80 ms, blocks without human
> listener noticing any difference.) With 20ms, 1 byte/sample, 8
> samples/ms you get a block of data to encode that is 1280 bits long.
> That is three orders of magnitude larger than the "online" constraint
> in the AC modeling/coding paradigm lingo. Any higher quality, and you
> are several times larger. Another real time app which doesn't tolerate
> latency would be video codec, and even the differential frame data is
> almost two orders of magnitude larger than audio.  Although the entropy
> coder won't get the raw samples but some outputs from the transforms &
> filtering, the above was reduced in all parameters to bare minimum, so
> in more realistic case that one does get at least as much data even for
> the entropy coder.

> Even assuming you can have few hundreds symbols at a time vs just a
> single symbol (or just few, for AC latency) adds a great deal of
> flexibility and opens space for new algorithms, for the modeling engine
> and the coder, as BWT illustrates (or the so-called "offline"
> dictionary methods) for the modeler or QI for the coder.

A war-time story from my side: JPEG-1 has an arithmetic coder option,
the QM-coder, which does have a latency problem because the coder can,
in principle, delay the output arbitrarily long, depending on the
carry-over resolution. There was a long discussion over this issue in
the group because this actually disallows efficient hardware
designs. Thus, it *is* a problem, at least for some people.

> > Ehem, small correction: You don't have to buffer the whole data, see
> > for example Nelson's or Moffat's implementation. All you need to do is
> > to count how many times you have "forgotten" to carry over, and
> > resolve this as soon as the carry is resolved by propagating it
> > thru all the "counted" (but never buffered) data.

> I think that depends on the constraints the task. The AC is adding to a
> common sum the numbers which decrease in size. Hence, there is no way,
> even in principle to send data out from the higher digits if they are
> large enough for carry to propagate, and let decoder decode it
> incorrectly (and possibly take actions based on), then issue a carry
> signal to undo that decode.  That is what I meant above -- encoder
> either has to keep all the data that can propagate carry in the worst
> case or use blocking mechanisms  mechanisms (which add redundancy) or
> use LIFO mode (where everything is held until the whole buffer is
> encoded).

But that's not what you've written. Clearly, you cannot send the data
until the carry propagation has been resolved (unless you sacrifice
some coder efficiency, that is), but that's not "keeping" the data, as
I would call it. It is represented in the encoder, sure, but not "as
is" -- rather as a "carry-over" count. Thus, one doesn't require an
arbitrarily long buffer for that. Just an arbitrarily large counter. (-;

> Note also that one pays the cost of FIFO coding not just in cases when
> one has to propagate carry, or in small extra redundnancy, but the
> coder is burdened with checking & branching for such cases inside its
> coding loop, so there is a coding speed penalty of the AC "online"
> constraint, with no practical benefit at all.

I don't understand what you mean by "no practical benefit". AC online
coders without the carry-over problem (i.e. using bitstuffing) are *very*
practical, and they are all used in real applications,
e.g. JBIG, JPEG2000, etc., and that *because* the hardware folks
cannot get away with arbitrarily long delays. The intermediate
communication protocols don't allow it.

However, note that this drifts the communication away from the initial
discussion, thus we stop here.

So long,
	Thomas
0
Thomas
1/10/2006 9:13:05 AM
> As I see it, there are no excess bits in an Arith
> Coder, until the coder is told to wrap it up.  Can
> you give some clear arguments why you claim that
> the bits added by termination are present before
> termination ?

I thought the explanation was pretty clear. Well, perhaps a numeric
example showing the accumulation of errors may help.

I will use below results from my previous message [M1]. The ACX
(infinite precision AC) redundancy formula (from [M1]) is:

     L = ceiling(log(1/Pc)) + 1 .... (1)

which gives the length L of the ACX output in terms of Pc, the
"coding probability" for the entire message. AC computes Pc by
multiplying the coding probabilities of all symbols encountered along
the way. With ACX, the products are exact, while with ACF (finite
precision AC) the products are approximate, which adds 2-3 bits to L in
(1) (before ACF counts overflow; generally, ACF adds excess bits at the
approximate rate of log(e)/2^(g-2) bits per symbol). Note also that Pc
is always a product of probabilities generated by the AC model along
the way. For order-0 stationary models the factors used are constants,
but in a general model the factors may vary from step to step.

Let's now take a finite precision AC and watch its computation of Pc
used in (1). For simplicity we'll calculate in decimal notation (we'll
still use base 2 log in (1) to count bits). We'll use static binary
source with probabilities p(1)=0.3 and p(0)=0.7 and we will set ACF
precision to 1 decimal digit. Let's look at what happens to Pc when
coding string 00100:

0. Initial Pc=1.0 (prob=1 for empty message) ------- GAPS
1. In C=0 => Pc = 1.0 * 0.7 = 0.7   => Pc = 0.7 e+0  G=0.000
2. In C=0 => Pc = 0.7 * 0.7 = 0.49  => Pc = 0.4 e+0  G=0.090
3. In C=1 => Pc = 0.4 * 0.3 = 0.12  => Pc = 0.1 e+0  G=0.020
4. In C=0 => Pc = 0.1 * 0.7 = 0.07  => Pc = 0.7 e-1  G=0.000
5. In C=0 => Pc = 0.07* 0.7 = 0.049 => Pc = 0.4 e-1  G=0.009

The ACX's exact Pc is Pcx = 0.7^4 * 0.3 = 0.07203, while the ACF's
approximate Pc is 0.04, which is smaller than Pcx. The output sizes
(rounded to whole bits) are: L(Pcx)=[3.795]+1=5 bits, L(Pc)=[4.644]+1=6
bits. The gaps created in Pc (or in the "range" in ACF's
terminology) are shown next to each truncation.

Note, for example, that in step 2 the product 0.49 was truncated to 0.4
and a 0.09 gap was created in Pc. Via (1), but without rounding to the
next whole bit (since we're not outputting anything yet), this results
in an extra log(0.49/0.40) = 0.29 bits accumulated in the ACF's L
compared to the ACX's L at that point.

An important element to observe in step 2 is that the gap of 0.09 in
Pc (or equivalently in ACF's range), arising when encoding symbol '0' in
step 2, did not get reassigned to symbol '1': if '1' were the second
symbol, its product would have been truncated as well, as shown below
in the alternate step 2a:

2a. In C=1 => Pc = 0.7 * 0.3 = 0.21 => Pc = 0.2e+0  G=0.01

Hence, either symbol in step 2 would have created a gap in Pc
(range), i.e. part of the Pc coverage of the full interval [0,1)
over all messages is simply wasted. For example, the 2-symbol messages
are 00, 01, 10, 11. Their Pc values, as computed by ACF, add up to: 0.4
+ 0.2 + 0.2 + 0.9e-1 = 0.89, leaving a total gap of 0.11 in the
coverage of the interval [0,1) (or a shortfall from 1 in the Kraft
inequality (4) in [M1]). The exact Pcx for these 4 messages always adds
up to exactly 1 and thus has no such gap. Note that ACX does introduce
a gap in [0,1) when truncating the final Pcx to a finite fraction in
order to obtain a finite length codeword, which results in the baseline
1-2 bit redundancy it creates (see the CWI discussion in the earlier post
[M1]). But the extra gaps & the corresponding extra redundancy added by
ACF accumulate from symbol by symbol truncation errors, step by step,
as shown above.
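
If you want to reproduce the walk above without touching an actual coder,
here is a small stand-alone C sketch of the same 1-significant-digit
truncation (it models only the Pc product, not a full ACF; the small
epsilon only guards against the binary representation of the decimal
constants):

  //-- replicates the 1-decimal-digit Pc truncation walk above (illustration only)
  #include <stdio.h>
  #include <math.h>

  /* truncate x > 0 to one significant decimal digit (round down) */
  static double trunc1(double x)
  { double e = pow(10.0, floor(log10(x)));
    return floor(x/e + 1e-9)*e;          /* epsilon: 0.7 is not exactly 7/10 in binary */
  }

  int main(void)
  { const char *msg = "00100";           /* the example string used above        */
    double p1 = 0.3, p0 = 0.7;           /* static source, p(1)=0.3, p(0)=0.7    */
    double Pc = 1.0, Pcx = 1.0;          /* truncated and exact products         */
    int i;
    for(i=0; msg[i]; i++)
    { double p    = (msg[i]=='1') ? p1 : p0;
      double full = Pc*p;                /* product before truncation            */
      Pcx *= p;                          /* "infinite precision" product         */
      Pc   = trunc1(full);
      printf("%d. C=%c  Pc=%.2g  gap=%.3f\n", i+1, msg[i], Pc, full-Pc);
    }
    printf("exact  Pcx=%.5f -> %g bits\n", Pcx, ceil(log2(1.0/Pcx))+1);
    printf("trunc. Pc =%.5f -> %g bits\n", Pc,  ceil(log2(1.0/Pc ))+1);
    return 0;
  }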

If you have time and want to play with this more, you should be able
to replicate these same conclusions from this little '1 decimal digit
ACF' in an actual ACF by making it output Pc for all input messages of
a given length (note that the usual ACF "range" is an integer, a
mantissa of Pc, and ACF doesn't keep track of its exponent explicitly,
so you will need to keep track of that yourself to display the Pc's as
decimal numbers; the QI's SW formalism is a much cleaner and more tidy
way to do these kinds of computations; the ACF in SW formulation flows
much more coherently than the usual handwaving at its little pictures).

(Note: before you lose the context of the post again and bolt off onto
the 'who cares, it's so tiny' tangent, as you did in the previous
post, recall what this was replying to: you were insisting on teaching
me about the AC gaps and asserting that there are none.)

-- References ( http://www.1stworks.com/ref/RefLib.htm )

T1-T3 are on http://www.1stworks.com/ref/qi.htm

M1. Post on quantization errors in AC & QI:

http://groups.google.com/group/comp.compression/msg/b157e1aa25e598d8

41a. P.A.J. Volf  "Weighting Techniques In Data Compression: Theory and
Algorithms" Ph.D. thesis, Eindhoven University of Technology, Dec 2002
http://alexandria.tue.nl/extra2/200213835.pdf

0
nightlight
1/10/2006 12:31:29 PM
Hi Again,

> Let's now take a finite precision AC and watch its computation of Pc
> used in (1). For simplicity we'll calculate in decimal notation (we'll
> still use base 2 log in (1) to count bits). We'll use static binary
> source with probabilities p(1)=0.3 and p(0)=0.7 and we will set ACF
> precision to 1 decimal digit. Let's look at what happens to Pc when
> coding string 00100:

> 0. Initial Pc=1.0 (prob=1 for empty message) ------- GAPS
> 1. In C=0 => Pc = 1.0 * 0.7 = 0.7   => Pc = 0.7 e+0  G=0.000
> 2. In C=0 => Pc = 0.7 * 0.7 = 0.49  => Pc = 0.4 e+0  G=0.090
> 3. In C=1 => Pc = 0.4 * 0.3 = 0.12  => Pc = 0.1 e+0  G=0.020
> 4. In C=0 => Pc = 0.1 * 0.7 = 0.07  => Pc = 0.7 e-1  G=0.000
> 5. In C=0 => Pc = 0.07* 0.7 = 0.049 => Pc = 0.4 e-2  G=0.009

> The ACX's exact Pc is Pcx = 0.7^4 * 0.3 = 0.07203, while the APF's
> approximate Pc is 0.04 which is smaller than Pcx. The output sizes
> (rounded to whole bits) are: L(Pcx)=[3.795]+1=5 bits, L(Pc)=[4.644]+1=6
> bits. The gaps created in the Pc (or in the "range" in ACF's
> terminology) are shown next to each truncation.

> Note, for example in step 2, the product 0.49 was truncated to 0.4 and
> 0.09 gap was created in Pc, which via (1), but without rounding to the
> next whole bit since we're not outputing yet anything, results in the
> extra log(0.49/0.40) = 0.29 bits accumulated in the ACF's L compared to
> the ACX's L at that point.

And that's where the computation starts getting incorrect. It might
be that Pc gets rounded to 0.4, but that also means that the interval for
the other symbol gets larger, thus there is no gap. This means, then,
that the probabilities that are imposed by the AC coder no longer fit
the source (true), but it does not mean that there are gaps. The 0.090
here is the deviation between the "idealized" probability and a quantized
one, Qc, that differs from Pc due to the finite precision.

Note that this is different from ELS, where we do have gaps, indeed.

> Important element to observe in step 2 is that the gap of 0.09 in the
> Pc (or equiv. in ACF's range), arising when encoding symbol '0' in step
> 2 did not get reassigned to symbol '1', since if '1' were the second
> symbol, its product would have been truncated as well, as shown below
> in alternate step 2a:

> 2a. In C=1 => Pc = 0.7 * 0.3 = 0.21 => Pc = 0.2e+0  G=0.01

No: Pc = 0.3, G = -0.090, resulting in a total "gap" of zero. The
confusion arises because you look only at the code-word, where AC in
fact uses an interval I to represent the values.

0. Initial I = [0,1) = [0,0.7) U [0.7,1)
1. In C=0 => [0,0.7). At this time, the interval cannot yet be rescaled,
since the first digit (7) is not yet fixed.
2. In C=0 => [0,0.4). No digit written either.

Instead, with 2a:

2a. In C=1 => [0.4,0.7). No digit written either, Pc = 0.3, Gap = -0.090

Rescaling doesn't happen in your example because the end-points of
the interval are not yet "fixed enough" to write out data.
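
A tiny decimal toy of the same point, in C: only the *split point* is
rounded (down, to one digit of the current window), so the two
sub-intervals always partition the parent and rounding merely shifts code
space from one symbol to the other. This is just an illustration of the
boundary bookkeeping, not a real coder (it ignores the underflow/carry
case, which this short example never hits; the epsilon guards the binary
representation of the decimal constants):

  //-- rounding the split point: the sub-intervals always partition the parent
  #include <stdio.h>
  #include <math.h>

  int main(void)
  { double low = 0.0, high = 1.0, p0 = 0.7;   /* p(0)=0.7, p(1)=0.3 */
    const char *msg = "00100";
    int i;
    for(i=0; msg[i]; i++)
    { /* split point rounded down to 1 decimal digit of the current window */
      double split = floor((low + (high-low)*p0)*10.0 + 1e-9)/10.0;
      if(msg[i]=='0') high = split;           /* '0' takes the lower part  */
      else            low  = split;           /* '1' takes the upper part  */
      printf("%d. C=%c  [%.3f, %.3f)  width %.3f\n",
             i+1, msg[i], low, high, high-low);
      /* renormalize: emit a digit once the interval shares its leading digit */
      while(floor(low*10.0) == floor(high*10.0 - 1e-9))
      { int d = (int)floor(low*10.0);
        printf("   emit digit %d\n", d);
        low = low*10.0 - d;  high = high*10.0 - d;
      }
    }
    return 0;
  }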

> If you have time and want to play with this more, you should be able to
> replicate these same conclusions from this little '1 decimal digit ACF'.

Yes, please play with the above a bit more. (-;

So long,
	Thomas
0
Thomas
1/10/2006 1:25:21 PM
nightlight wrote:
)> As I see it, there are no excess bits in an Arith
)> Coder, until the coder is told to wrap it up.  Can
)> you give some clear arguments why you claim that
)> the bits added by termination are present before
)> termination ?
)
) I thought the explanation was pretty clear. Well, perhaps a numeric
) example showing the accumulation of errors may help.

Read my sentence again.  'the bits added at termination'.
I was talking about the fixed cost.

) Let's now take a finite precision AC and watch its computation of Pc
) used in (1). For simplicity we'll calculate in decimal notation (we'll
) still use base 2 log in (1) to count bits). We'll use static binary
) source with probabilities p(1)=0.3 and p(0)=0.7 and we will set ACF
) precision to 1 decimal digit. Let's look at what happens to Pc when
) coding string 00100:
)
) 0. Initial Pc=1.0 (prob=1 for empty message) ------- GAPS
) 1. In C=0 => Pc = 1.0 * 0.7 = 0.7   => Pc = 0.7 e+0  G=0.000
) 2. In C=0 => Pc = 0.7 * 0.7 = 0.49  => Pc = 0.4 e+0  G=0.090
) 3. In C=1 => Pc = 0.4 * 0.3 = 0.12  => Pc = 0.1 e+0  G=0.020
) 4. In C=0 => Pc = 0.1 * 0.7 = 0.07  => Pc = 0.7 e-1  G=0.000
) 5. In C=0 => Pc = 0.07* 0.7 = 0.049 => Pc = 0.4 e-2  G=0.009

Why are you rounding down all the time ?

In AC, some of the symbols' probabilities are rounded up,
in such a way that the total always adds up to the total range.
In other words, the 'gap' for one of the symbols gets added to
the code space for the other symbol.


) (Note: before you lose the context of the post again and bolt off onto
) that tangent 'who cares, it's so tiny', as you did in the previous
) post, recall what this was replying to: you were insisting to teach me
) on the AC gaps and asserting that there are none.)

You were asserting that there are gaps _at each step_, which is false.

It then looked as if you were suddenly talking about only the gaps
that occur at termination, which I discarded as irrelevant, but
which you jumped on with glee to point out I was wrong.


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
0
Willem
1/10/2006 1:32:01 PM
The QI.exe file which you may already have (from the source; the current
source version is 1.03) has a command line option to test it on that
same input (which is the high entropy limit for multi-alphabet coding,
and which I call radix codes):

  QI  cr3  n1000000  i100

which tells it to code inputs in radix 3 (this can be any 32 bit value
above 2), to use an input of 1 million symbols (there is a constant
MAXRDIG in Intro.h which limits the input size to max 2^20 or 1M
digits; you can change that to allow larger sizes, e.g. 16 MEG) and
to run the test 100 times on 100 random inputs (i100 for 100
iterations). The size it produces is 1584962.50... bits, which compared
to the exact N*log(3) entropy is an excess of 1.62e-04 bits on the
total of 10^6 symbols (i.e. the excess per symbol is 1.6e-10 bits).

To compare that with AC output size, one option is to make AC work in
static mode without adapting to probabilities and make it not count the
transmission of frequency table or number of symbols n (which is the
same condition that the QI.exe figure applies to).

Alternatively, you can add to QI's output the size needed to transmit N,
A and the frequency table. QI.exe has a command QI cl<int> which computes
the self-delimiting size for <int>, or just "QI cl" to list a table for
common values. There you get for N=10^6 a self-delimiting length
L(N)=27.543 bits and L(A)=2.49 bits. The cost for the frequency table
with QI/EC is the log of the binomial C(N+A-1,A-1), which for N=10^6 and
A=3 is log(C(1000002,2))=38.863 bits; that totals (each rounded
separately, which they don't need to be) 28+3+39=70 bits to be added to
QI's output to match the adaptive AC's coding conditions. Since the
QI's output was essentially the entropy, the QI's total is at most 70
whole bits above the "entropy" (note the "entropy" N*log(3) didn't
include N; also in the high entropy limit QI doesn't need to transmit the
freq. table, but one would need to modify AC to work in the high entropy
limit, so I added the table cost to QI, which distorts the comparison to
the entropy H a bit).

Now, running the Moffat98 coder (in 8 bit max symbol coding mode &
frugal bits enabled), it outputs 1588435.52 bits (avg. over
100 iterations), which is 3473 bits above the entropy, or 3403 bits
above the comparable QI output size. (Note that the Moffat98 coder
generally has a slight bias to code worse for max entropy inputs, but
it gains in return on very low entropy inputs.)

To compare speeds properly, one would need to modify QI's radix coder
to use an 8 bit alphabet size limit, instead of a 32 bit limit, otherwise
QI pays the cost of accessing 4 times larger memory (and a few other
related costs in the code, such as 32 bit multiplies or padding 4 times
larger output buffers, etc). Without adjusting the max alphabet size, QI
(on a 2G P4 laptop) codes at 22 ns/sym while Moffat98 codes at 96 ns/sym,
which is a ratio of 4.36. That is a smaller ratio vs Moffat98 than for
the binary coder in the high entropy limit vs Moffat98, which is about
6. I think that when both coders are normalized to the same max alphabet
(thus using the same buffer sizes for input & the same width of
multiplies), it would probably come out at the same high entropy ratio
of 6 as in the binary case.

0
nightlight
1/10/2006 1:51:49 PM
> Why are you rounding down all the time ?

> In AC, some of the symbols probabilities are rounded up,
> in such a way that the total always adds up to the total range.

Those are not probabilities being rounded down, but the update to the
range size (which is the mantissa of Pc, the coding probability of
the entire message up to that point). Check any AC source and watch the
div in the coder that calculates the new range size (which discards the
remainder), or take a look in (41a) p. 48, the formulas for the WNC
coder (see the 1st line with an asterisk in the pseudo code). That is
always rounded down.

> It then looked as if you were suddenly talking about only the gaps
> that occur at termination, which I discarded as irrelevant, but
> which you jumped on with glee to point out I was wrong.

The +1 in eq. (1) is the ACX (exact AC) truncation error cost for the
last interval, when it does need to produce a finite fraction. That last
interval has an interval gap as well; that is what those CWI
intervals were about. The excess above that +1 bit is the accumulated
error due to rounding along the way.  The ACF error produces a total of
2-3 bits on top of the +1.  You can watch it accumulate if you force
output at any point and subtract the final +1 (which, by definition,
goes with the last CWI interval specification). Anything above +1
in any intermediate stage is the accumulated excess. Note that the total
ACF output has a latency of approx. g bits, so until you make it flush
its internal state (and also adjust for the +1, since that one is by
definition on the last interval only) you can't see what the excess is.
0
nightlight
1/10/2006 2:11:03 PM
> look in (41a) p.  48, formulas for WNC coder (see the
> 1st line with asterisk in pseudo code).

That  should say:

> 3rd line with asterisk..

0
nightlight
1/10/2006 2:52:32 PM
nightlight wrote:
)> Why are you rounding down all the time ?
)
)> In AC, some of the symbols probabilities are rounded up,
)> in such a way that the total always adds up to the total range.
)
) Those are not probabilities being rounded down but the update to the
) range size (which is the mantissa of the Pc, the coding probability of
) the entire message up to that point). Check any AC source and watch div
) in the coder to calc new range size (which discards the reminder), or
) take a look in (41a) p.  48, formulas for WNC coder (see the 1st line
) with asterisk in pseudo code). That is always rounded down.

If the AC source you checked rounds down to calc the new range size,
then you have a very poorly written AC source.  No wonder you are
misguided about gaps.


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
0
Willem
1/10/2006 2:59:23 PM
Matt Mahoney wrote:
> David A. Scott wrote:
> >    Matt if I can get nightlight to commit to coding his example of
> > the 3 symols types. I would like to play again with fpaq0. To see
> > how much better it can be made with as little change as possible.
> > I like your style but I don't think I will go to the wall and make
> > it bijective. But the nine times for each eight can be changed to
> > eight for eight with a ninth only needed for the last byte.
>
> There is some room for improvement.  I tried compressing 10,000,000
> bytes of random charaters A, B, C.  fpaq0 compresses it to 1,982,988
> bytes.  The theoretical limit is 1/8 lg 3 = 1,981,203, a difference of
> 1785 bytes.  For 1,000,000 bytes it compresses to 198,322 bytes, a
> difference of 201.7 bytes.
>
> -- Matt Mahoney

I posted fpaq1.cpp to
http://www2.cs.fit.edu/~mmahoney/compression/#fpaq0
It is an improved order 0 arithmetic coder using 64 bit arithmetic.  On
a 10MB file which repeats "ABC" it is 25 bytes over the theoretical
limit, and I believe most of this is due to approximations made by the
model early in compression.

-- Matt Mahoney

0
Matt
1/10/2006 3:14:31 PM
> And that's where the computation starts getting incorrect.
> It might be that Pc gets rounded to 0.4, but that also means that the
> interval for the other symbol gets larger, thus there is no gap.

The Pc calculation, which computes the total message "coding
probability" (meaning it uses the AC model's probabilities, taking the
product of the p's of all symbols encountered along the way), rounds
down on every symbol. Check any AC source code and look at the integer
division used when the total range size is updated. The formulas in
[41a] p. 48 show how this is done in the WNC87 coder (look at the 3rd
line with the asterisk in the pseudo code).  The same goes for Moffat98,
except there they divide first, then multiply. In all cases, though, the
result is rounded down (the integer "div" discards the remainder). There
is no check on the symbol value and no branch for one or the other
symbol value. It is an unconditional loss of the remainder -- the
remainder gets discarded on every symbol, every time the new range is
calculated. These are pure gaps in the coding interval (or shortfalls
from 1 in the Kraft inequality). You can easily see them if you make AC
output codewords for all possible inputs M of a given length and add up
the Pc(M), the total message probabilities. The sum will come out
smaller than 1, just as the numeric examples show.

What you, Willem, are talking about is the division of the range among
the different symbols, in which case your compensation does apply. But
the total interval has shrunk during the total interval update (when
the integer div rounds down on every symbol).

Note also that the index computed by ACF is not rounded itself. This
works exactly the same way as QI's arithmetic, where eq. (21) computes
the path counts for a given point using rounding-up arithmetic, while
the index formulas (22)-(23) use exact arithmetic, with no rounding.
With QI these two calculations are separate; the rounding is done only
for the tables, to obtain the quantized binomials.

0
nightlight
1/10/2006 3:15:28 PM
Jo.

> The Pc calculation, which is the total message "coding probability"

Yes.

> (meaning it uses AC's model probabilities for its computation as a
> product for all p's of the symbols encountered along the way), rounds
> down on every symbol. 

No. Definitely. Not.

> Check any AC source code and look the integer divison when they update 
> the total range size. 

You typically do not update the total range size, but rather the
coding interval's low and high ends, and you have to do this consistently.
If you always round in the same direction here, some intervals will get
larger, and some smaller. I placed an example on top which is pretty
much realistic, except that real applications round to powers of two
instead of powers of ten. Otherwise, that's the code, sorry.

> The formulas in [41a]
> p.  48, show how this is done in the WNC87 coder (look at the 3rd line
> with the asterisk in the pseudo code).  The same goes for Moffat98,

No. You are confused. If you always round down *interval boundaries*,
you do not round down *interval sizes*, because the topmost interval
then gets larger. Pick for example Nelson's coder from the net (just
google for Nelson, Arithmetic Coding), then see for yourself.

> What you Willem are talking about is the division of the range among
> different symbols, in which case your compensation does apply. But the
> total interval has shrunk during the total interval update (when the
> integer div rounds down on every symbol).

No, see the example, compute it yourself. The *top* interval gets larger
than it should be if you round down its lower boundary. It is really
that simple. AC coding does not have coding gaps. ELS does. AC coding
has a quantization of probabilities, though.

So long,
	Thomas
0
Thomas
1/10/2006 4:14:08 PM
>> in the coder to calc new range size (which discards
>> the reminder), or take a look in (41a) p.  48, formulas
>> for WNC coder (see the 3rd line with asterisk in pseudo
>> code). That is always rounded down.
>
> If the AC source you checked rounds down to calc the
> new range size, then you have a very poorly written
> AC source.  No wonder you are misguided about gaps.

I gave you above the places to check, which are not some "poorly
written AC source" but the well known reference implementations of
AC coders. So, look at the Moffat98 or WNC87 source, or at the reference
[41a] above, which shows quite clearly how Pc is updated.

You are welcome to show a coder which doesn't round down the size of
the updated total range on every symbol. (It still has to be able to
decode, though.)

--- References:

41a. P.A.J. Volf  "Weighting Techniques In Data Compression: Theory and
Algorithms" Ph.D. thesis, Eindhoven University of Technology, Dec 2002
http://alexandria.tue.nl/extra2/200213835.pdf

0
nightlight
1/10/2006 4:17:12 PM
"Matt Mahoney" <matmahoney@yahoo.com> wrote in 
news:1136906071.456046.325470@g49g2000cwa.googlegroups.com:

> 
> Matt Mahoney wrote:
>> David A. Scott wrote:
>> >    Matt if I can get nightlight to commit to coding his example of
>> > the 3 symols types. I would like to play again with fpaq0. To see
>> > how much better it can be made with as little change as possible.
>> > I like your style but I don't think I will go to the wall and make
>> > it bijective. But the nine times for each eight can be changed to
>> > eight for eight with a ninth only needed for the last byte.
>>
>> There is some room for improvement.  I tried compressing 10,000,000
>> bytes of random charaters A, B, C.  fpaq0 compresses it to 1,982,988
>> bytes.  The theoretical limit is 1/8 lg 3 = 1,981,203, a difference of
>> 1785 bytes.  For 1,000,000 bytes it compresses to 198,322 bytes, a
>> difference of 201.7 bytes.
>>
>> -- Matt Mahoney
> 
> I posted fpaq1.cpp to
> http://www2.cs.fit.edu/~mmahoney/compression/#fpaq0
> It is an improved order 0 arithmetic coder using 64 bit arithmetic.  On
> a 10MB file which repeats "ABC" it is 25 bytes over the theoretical
> limit, and I believe most of this is due to approximations made by the
> model early in compression.
> 
> -- Matt Mahoney
> 


  I find this very interesting. The fact that you're coding 10,000,000
zeros should add at least roughly 5.8 bytes to my answer. Instead they
differ by 1 byte, since I got 1,981,227, which is 24 bytes over what
you're claiming as optimal, 1,981,203.


  Just compiled and ran your code on a 10,000,000 byte file that was
ABCABC...ABCA  note the one extra A

  My code compresses to 1,981,227, versus what you said was optimal,
1,981,203.

  Your code compressed to 1,981,232, which is only 5 bytes longer than
mine. So both, I hope, are doing what they should. I don't see
how you say this new one is 25 bytes over -- isn't it 29 bytes over?


David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/10/2006 4:53:33 PM
>> The formulas in [41a]
>> p.  48, show how this is done in the WNC87 coder
>> (look at the 3rd line with the asterisk in the
>> pseudo code).  The same goes for Moffat98,
>
> No. You are confused. If you always round down *interval
> boundaries*, you do not round down *interval sizes*
> because the topmost interval then gets larger.

Sorry, I gave you the wrong page & the wrong line number (that was for
similar code in the decoder).

Take a look in (41a) p. 47, at the formula in the 1st line with an
asterisk, which shows the interval size update. Here he updates (in
simplified notation) the integer quantity P(n-1), which is his integer
range after n-1 symbols, using (in abbreviated notation, see the paper):

      P(n) = floor[p * P(n-1)]   .... (1)

where p is the coding probability of the n-th symbol xn, the symbol
now being encoded (for p he uses the conditional probability that the
new symbol xn, just found, occurs after the given n-1 previous
symbols). There is no other statement which assigns a value to
(updates) the quantity P(n).

Note that at the top he initializes the value P(0) to 2^F (where F is
the mantissa precision in bits), i.e. the quantity P is the mantissa of
the actual Pc, meaning the initial width is 1.00.... The fraction
dropped in (1) is gone from the P(n) which is to be used in the next
step. There are no other updates of P(n). It simply lost the fraction
at its edges irreversibly -- no condition, no compensation for it
anywhere else.

The compensation occurs in the next line, where he calculates the new
lower boundary point of the interval, Q(n) as (again streamlined, see
paper):

    Q(n) = Q(n-1) + floor [p(x<xn) * P(n-1)]  .... (2)

where now p(x<xn) is conditional cumulative probability for symbols
ahead of xn. For example (this is binary coder) if xn=0, then (2)
doesn't add anything since no other symbol x is smaller than xn. If
xn=1 then (2) adds to the lower boundary Q(n-1), the product:
p(0)*P(n-1), where p(0) is probability of 0.

The same conclusion follows from the basic accounting of the precisions
and buffers used by AC: there are two real numbers < 1.0 in the coding
loop whose precision grows indefinitely for the unlimited precision AC:

a) the Pc, which is the product of all probabilities of the symbols
encountered in the input, and

b) the cumulative probability Qc (which is the exact full precision
rational number < 1.0; note that the Q(n) in (2) is just an integer
representing the trailing bits of Qc as seen from the current AC
window).

The Qc is the content of the AC's encoded output. The precision of that
number does grow indefinitely, since it is the output itself. The other
number, also large in the unlimited precision AC, the message probability
Pc, does not grow indefinitely. Its precision is reduced in each step in
(1). Its fraction beyond the F bits of precision is discarded
unconditionally and irreversibly after each new symbol -- in (1) we
multiply previous Pc with the probability of the new symbol, and
truncate the result to F significant bits. If you are saying that at
the end the AC has somehow also computed the product Pc in the full
precision, then where is it stored? That number is _independent_ of the
value Qc, hence you would need a second number of indefinite length,
besides Qc, which would be some kind of second work buffer of the size
of output. There is no such buffer in AC.
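
As a concrete illustration of what (1) and (2) do in integer arithmetic,
here is a minimal toy sketch (not any of the cited coders, and deliberately
left without the renormalization a real coder performs, so the truncation
is plainly visible):

#include <stdio.h>

#define F 16                          /* mantissa precision in bits */

/* One coding step per (1) and (2), with the probabilities given as
   fractions num/den; the integer division is the floor, and whatever
   fraction it drops is simply gone from P(n). */
static void step(unsigned *P, unsigned long long *Q,
                 unsigned p_num, unsigned cum_num, unsigned den)
{
    *Q += ((unsigned long long)*P * cum_num) / den;            /* (2) */
    *P  = (unsigned)(((unsigned long long)*P * p_num) / den);  /* (1) */
}

int main(void)
{
    unsigned P = 1u << F;             /* P(0) = 2^F, initial width 1.00... */
    unsigned long long Q = 0;
    int i;
    for (i = 0; i < 10; i++)          /* ten symbols, each with p = 1/3 */
        step(&P, &Q, 1, 0, 3);
    /* exact width would be 2^16 / 3^10 = 1.109..., but the repeatedly
       truncated integer P has contracted all the way down to 1 */
    printf("P after 10 steps: %u\n", P);
    return 0;
}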

0
nightlight
1/10/2006 5:20:59 PM
I gave you the incorrect page &  line number to look at (that was for a
similar code in the decoder, which still does the same, but it may
confuse the discussion).

The proper page for the encoder is 47 in [41a], the 1st line with
asterisk. See the details on that code in the post below to Thomas
Richter (you're both wrong the same way):

 http://groups.google.com/group/comp.compression/msg/39c25c38b882532e

0
nightlight
1/10/2006 5:28:33 PM
"nightlight" <nightlight@omegapoint.com> wrote in 
news:1136914113.774637.18840@f14g2000cwb.googlegroups.com:

> 
> I gave you the incorrect page &  line number to look at (that was for a
> similar code in the decoder, which still does the same, but it may
> confuse the discussion).
> 
> The proper page for the encoder is 47 in [41a], the 1st line with
> asterisk. See the details on that code in the post below to Thomas
> Richter (you're both wrong the same way):
> 
> 


   Actually you have yet to give a correct page and line number to
show what you claim. All you do is further show people how well read
you are, yet how little you seem to comprehend. Why is that?
You have proposed a test for an arithmetic coder to put it in a bad
light, yet even this simple test you can't seem to do with a coder of
your own design. Why is that?

  The fact that there are simple arithmetic coders that contain no gaps,
at the end of the file or anywhere else in the compression, shows that you
know nothing about arithmetic compression. Quoting or misquoting a paper
has little to do with reality when working code exists that shows you're
wrong. What is it about simple real world working code that you can't
comprehend?




David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/10/2006 5:47:39 PM
> Even this simple test you can't seem to do with a coder of
>  your own design. Why is that?

The QI source code (which has been public for a while now) and the compiled
executable QI.exe which comes with it compress the 1,000,000 symbol
input, alphabet size A=3, to a smaller size than any AC you can ever have
(even in principle). I already posted on that a few messages earlier. You
can run the program with the command line given there and see for
yourself (and then look in the source and see how it did it). If you
want a 10,000,000 symbol input, you need to change the upper limit in
the header to 10,000,000, as described in that post (the default limit
for max number of symbols in the source was arbitrarily set to 1M
symbols). If you want it to read/write to a file, you can play with
that, too.

That's why the code was released in the first place -- so that anyone
who is interested in specific questions that would require some work on
my part which isn't already on my list of things to do, can do it
himself and find out the answer. There is also the 1stWorks Corp.
contact email on the QI web page where you can send any other requests
for consideration.

--------
Earlier post on QI.exe coding performance on 1,000,000 symbol (A=3)
input:
http://groups.google.com/group/comp.compression/msg/ff1ee67d18b63f5a

0
nightlight
1/10/2006 6:19:50 PM
"nightlight" <nightlight@omegapoint.com> wrote in 
news:1136917190.294822.41540@g47g2000cwa.googlegroups.com:

> 
> The QI source code (which was public for a while now) and the compiled
> executable QI.exe which comes with it,  compress the 1,000,000 symbol
> input, alphabet size A=3, to smaller size than any AC you can ever have
> (even in principle). I already posted on that few messages earlier. You
> can run the program with the command line given there and see for
> yourself (and then look in the source and see how it did it). If you
> want a 10,000,000 symbol input, you need to change the upper limit in
> the header to 10,000,000, as described in that post (the default limit
> for max number of symbols in the source was arbitrarily set to 1M
> symbols). If you want it to read/write to a file, you can play with
> that, too.
> 
> 

  Again you show your incredible lack of intelligence. I write
bijective arithmetic coders. Even if your QI were any good (and
the more you give useless rants, the more likely it seems that it is not),
you don't have a basic understanding of arithmetic compression.


  This has been pointed out to you several times. Let me try to
explain it to you one more time. BIJECTIVE ARITHMETIC FILE CODERS
EXIST. Therefore the set of compressed files is nothing but a reordering
of every possible input file. There are no gaps, period.
Again, since you seem not to comprehend the obvious: THERE ARE NO GAPS.

  If the problem is such that you make the BIJECTIVE ARITHMETIC CODER
work only on files of type X, where in this case to be in X a file
has to contain only the letters A B C, let the output set be any
possible file and call that set Y. Then if x is any element of X
and y is any element of Y and the coding is bijective,

that is
compress( uncompress ( y) ) = y
and
uncompress ( compress (x) ) = x

for every possible x and y, you have an optimal compressor by definition.
Can you do this in a QI file compressor? The more you stall, the less
likely it is that you can. The point is that even if you could write an
optimal compressor for this class of files, at best it is a slightly
different reordering, and that is only if you could write such a
compressor. But in terms of compression it will never always
beat the equivalent bijective file compressor. It's not rocket
science; I am sure that if you have normal people working for you, even
they could tell you where you have gone wrong. Unless you would fire
them for daring to question your great knowledge.


   Again, this is not a real world test and you know it. But just
for laughs, how small does it compress a 1,000,000 symbol input
where A = 3?  Is this zero order static compression or not? You do
know the difference, don't you? And if you are only calculating the
size of the index, it's not a fair comparison, since you yourself state
you needed two other numbers to go along with it: ONE, a count field for the
number of inputs; TWO, the number of inputs where a one is used. So
that's 3 fields. Do you skip this important info in your compression
comparison with the highly modified arithmetic coder you chose to use?

  Look, you modified a general arithmetic coder to color your comparisons.
You stated a test for a real arithmetic file coder. You have been wrong
about what arithmetic compression does, so are you now saying you don't
have the time to do the test you yourself proposed? Do you really think
most people here believe anything you're saying when you will not do a real
test?

  You must really think people are stupid if you claim you have test code
yet it somehow seems incapable of using real world files. How would you
feel if we wrote the equivalent, changed your code to fit our needs,
and then people bad-mouthed your code? I can tell you now you would not
like it. I don't like it when you bad-mouth arithmetic coding and then refuse
to do a real file test. What are you hiding? Are you admitting in your
own way that it does not compress very well? I would mod your code to work
with real files, but then when it sucks you will scream, as we are screaming,
that it was not done correctly.

  Do you comprehend anything being said here, or is your mind so closed
that you will not take the time to back your own claims?

  You can be glad I am not reviewing your stuff, since the way you present
it, without a real test, I would reject it. You might have something,
you might not, but it's clear you have not really tested it in a real
world way.



David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/10/2006 7:31:52 PM
David A. Scott wrote:
> "Matt Mahoney" <matmahoney@yahoo.com> wrote in
> news:1136906071.456046.325470@g49g2000cwa.googlegroups.com:
>
> >
> > Matt Mahoney wrote:
> >> David A. Scott wrote:
> >> >    Matt if I can get nightlight to commit to coding his example of
> >> > the 3 symols types. I would like to play again with fpaq0. To see
> >> > how much better it can be made with as little change as possible.
> >> > I like your style but I don't think I will go to the wall and make
> >> > it bijective. But the nine times for each eight can be changed to
> >> > eight for eight with a ninth only needed for the last byte.
> >>
> >> There is some room for improvement.  I tried compressing 10,000,000
> >> bytes of random charaters A, B, C.  fpaq0 compresses it to 1,982,988
> >> bytes.  The theoretical limit is 1/8 lg 3 = 1,981,203, a difference of
> >> 1785 bytes.  For 1,000,000 bytes it compresses to 198,322 bytes, a
> >> difference of 201.7 bytes.
> >>
> >> -- Matt Mahoney
> >


> > I posted fpaq1.cpp to
> > http://www2.cs.fit.edu/~mmahoney/compression/#fpaq0
> > It is an improved order 0 arithmetic coder using 64 bit arithmetic.  On
> > a 10MB file which repeats "ABC" it is 25 bytes over the theoretical
> > limit, and I believe most of this is due to approximations made by the
> > model early in compression.
> >
> > -- Matt Mahoney
> >
>
>
>   I find this very interesting. The fact your coding 10,000,000 zeros
> should at least add roughly 5.8 bytes to my anwser. Instead they
> differ by 1 byte since I got 24 bytes 1,981,227 for what your claiming
> optimal at 1,981,203
>
>
>   Just complied and ran your code a 10,000,000 file that was
> ABCABC...ABCA  note one exta A
>
>   My code compress to 1,981,227  from what you should was opitmal
> of 1,981,203
>
>   You code compressed to 1,981,232 which is only 5 bytes longer
> mine. So both I hope are doing what they should. I don't see
> how you say this new one is 25 bytes over isn't it 29 bytes over?

Oops, you're right, it's 29 bytes over.  I also get 1981232 bytes.

For 1,000,000 bytes I get 198,145 bytes which is 25 bytes over.

fpaq1 compresses files of all zero bytes as follows:
0 -> 1
1 -> 2
10 -> 5
100 -> 9
1000 -> 13
10^4 -> 17
10^5 -> 21
10^6 -> 25
10^7 -> 29
10^8 -> 34

Here is a hex dump of the 34 byte compressed file.  I'm not sure where
the small inefficiency is.

  FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF
  FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FE 09 DE
  BD F7


fpaq1 is actually worse than fpaq0 on 10^6 zero bytes.
fpaq0: 10^6 -> 17
fpaq1: 10^6 -> 25

but better on all 1 bits (10^6 FF bytes):
fpaq0: 10^6 -> 446
fpaq1: 10^6 -> 25

Also, Fabio Buffoni posted a version of fpaq0b that uses the 30 bit
precision coder from paqar/paq6fb (carry counter and 1 bit at a time
I/O).  It also improves on fpaq0 using only 32 bit arithmetic.

-- Matt Mahoney

0
Matt
1/10/2006 7:36:32 PM
 Hi nightlight;

> ...
>
>       P(n) = floor[p * P(n-1)]   .... (1)
>
> ...
>
>     Q(n) = Q(n-1) + floor [p(x<xn) * P(n-1)]  .... (2)

 My, is this a nitpicking discussion. I think after this post
I got your idea, so I made this picture with a (im)possible
development of the probabilities.

 I think you mean that the limited precision arithmetic coder
does not code exactly at the probability of the given source
symbol.
 Also you say that because there is no backfeed correction of
the modified probability, as a result this quantization noise
adds up.

 I agree with that; in the picture I hope I got it right. The
blue one is the backfeed quantizer, the green the unlimited
precision quantizer, the red the AC and the yellow the QI.
 I didn't read your paper, so I don't know whether yours isn't
maybe the blue one.

 I agree with that (completely) only in the case of static
modeling! In the case of adaptive modeling it is in fact
not true that the exact probability is the best predictor.
 The difference between adaptive and static modeling is that
static knows it all and adaptive modeling is guessing. The
fixed precision quantized _guess_ in the adaptive case may
be(come) _better_ than the infinite precision _guess_.

 So there are sources for an adaptive modeled AC where rounding
down produces _smaller_ output. For example when you always
underestimate the wrongly predicted MPS.

 Ciao
	Niels

P.S.: the pic is in http://www.paradice-insight.us/pics/AC.png

0
niels
1/10/2006 7:41:19 PM
"Matt Mahoney" <matmahoney@yahoo.com> wrote in
news:1136921792.785730.261850@g43g2000cwa.googlegroups.com: 

> Oops, you're right, it's 29 bytes over.  I also get 1981232 bytes.
> 
> For 1,000,000 bytes I get 198,145 bytes which is 25 bytes over.
> 
> fpaq1 compresses files of all zero bytes as follows:
> 0 -> 1
> 1 -> 2
> 10 -> 5
> 100 -> 9
> 1000 -> 13
> 10^4 -> 17
> 10^5 -> 21
> 10^6 -> 25
> 10^7 -> 29
> 10^8 -> 34
> 
> Here is a hex dump of the 34 byte compressed file.  I'm not sure where
> the small inefficiencly is.
> 
>   FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF
>   FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FE 09 DE
>   BD F7
> 
> 
> fpaq1 is acutally worse than fpaq0 on 10^0 zero bytes.
> fpaq0: 10^6 -> 17
> fpaq1: 10^6 -> 25
> 
> but better on all 1 bits (10^6 FF bytes):
> fpaq0: 10^6 -> 446
> fpaq1: 10^6 -> 25
> 
> Also, Fabio Buffoni posted a version of fpaq0b that uses the 30 bit
> precision coder from paqar/paq6fb (carry counter and 1 bit at a time
> I/O).  It also improves on fpaq0 using only 32 bit arithmetic.
> 
> -- Matt Mahoney
> 

   First of all, this may be totally wrong, but it's my gut feeling.

In your coder X1 and X2 have to have the same bit patterns before
you dump anything out.  Look at these two sets of data using 32 bit
registers.

X1 0x7FFFFFFF X2 0x80000000 difference 1, and nothing written out

while 

X1 0x12345678 X2 0x12345679 difference 1, and 3 bytes written out.

I am sure that in practice it's not usually that bad, and for the
case where you did the three symbols nothing like the above popped up.

But when you try all zeros or all ones it may pop up, and then all of a
sudden you're actually using a lot less than 32 bits for the state
registers.
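
A rough sketch of the straddle effect being described (a toy byte-oriented
check, not the actual fpaq1 I/O):

#include <stdio.h>
#include <stdint.h>

/* A carry-less range coder can only flush a byte while the top byte of
   low (X1) and high (X2) agree.  When the interval straddles the midpoint,
   nothing can be flushed even though the interval itself is tiny. */
static int flushable_bytes(uint32_t x1, uint32_t x2)
{
    int n = 0;
    while (n < 4 && (x1 >> 24) == (x2 >> 24)) {   /* top bytes equal? */
        x1 <<= 8;                                 /* emit and shift out */
        x2 = (x2 << 8) | 0xFF;
        n++;
    }
    return n;
}

int main(void)
{
    /* straddles 1/2: difference is 1, yet no byte can be emitted */
    printf("%d\n", flushable_bytes(0x7FFFFFFFu, 0x80000000u));   /* 0 */
    /* same difference, but the top three bytes agree, so three bytes flush */
    printf("%d\n", flushable_bytes(0x12345678u, 0x12345679u));   /* 3 */
    return 0;
}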



David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/10/2006 8:18:01 PM
"Matt Mahoney" <matmahoney@yahoo.com> wrote in
news:1136921792.785730.261850@g43g2000cwa.googlegroups.com: 

> Oops, you're right, it's 29 bytes over.  I also get 1981232 bytes.
> 
> For 1,000,000 bytes I get 198,145 bytes which is 25 bytes over.
> 
> fpaq1 compresses files of all zero bytes as follows:
> 0 -> 1
> 1 -> 2
> 10 -> 5
> 100 -> 9
> 1000 -> 13
> 10^4 -> 17
> 10^5 -> 21
> 10^6 -> 25
> 10^7 -> 29
> 10^8 -> 34
> 
> Here is a hex dump of the 34 byte compressed file.  I'm not sure where
> the small inefficiencly is.
> 
>   FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FF FF FF
>   FF FF FF FF  FF FF FF FF  FF FF FF FF  FF FE 09 DE
>   BD F7
> 
> 
> fpaq1 is acutally worse than fpaq0 on 10^0 zero bytes.
> fpaq0: 10^6 -> 17
> fpaq1: 10^6 -> 25
> 
> but better on all 1 bits (10^6 FF bytes):
> fpaq0: 10^6 -> 446
> fpaq1: 10^6 -> 25
> 
> Also, Fabio Buffoni posted a version of fpaq0b that uses the 30 bit
> precision coder from paqar/paq6fb (carry counter and 1 bit at a time
> I/O).  It also improves on fpaq0 using only 32 bit arithmetic.
> 
> -- Matt Mahoney
> 
> 

 Here are the results of arb255, first on a file of
all zeros, then on a file of all 0xFF, each of
which is 1,000,000 bytes long:

0000  E0 2C 30 99 A0 52 8F ED 1A 14 41 67 B1 4C 1B B5  *.,0..R....Ag.L..*
0010  EC 4A E7 25 C2 D8 60  .  .  .  .  .  .  .  .  .  *.J.%..`*
 number of bytes is 23 
0000  1F 2C 30 99 A0 52 8F ED 1A 14 41 67 B1 4C 1B B5  *.,0..R....Ag.L..*
0010  EC 4A E7 25 C2 C7 A0  .  .  .  .  .  .  .  .  .  *.J.%...*
 number of bytes is 23 


  Note that only the first byte and the last few are different.
The lack of a string of ones and zeros in the output is due to how
I do the I/O; there is a mapping, with the hope of making it
more stable, so you will not see strings of FFs or 00s
for long repeats. Also, as an unexpected side effect, it
would be better as a last compression pass when getting
ready for an encryption pass.


 In this case only your leading 0, from the 9 bits for 8, caused
the expansion, since in this case the free end was so large it is
as if we both wrote out your last 1 for the count. At least in
the large register cases.

 Note that even if alternate I/O is used in the code, most people either
break the intervals so that a "one" is high or low,
or they break the interval so that the most probable symbol is either
high or low.

For some reason I chose to do it totally differently, again so
that it would compress better and, as a side effect, be better
for the file if used as a last pass in compression before the
encryption pass.


 I feel strongly that 23 is most likely optimal and that you should,
within a byte or so, get the same length from fpaq0 whether it's all zeros
or all ones. In your case it's as if it used too low a probability
when doing the all ones and too high when doing the all zeros.
It must have something to do with the fact that you don't carry, and that
for this case 32 bits without a carry is not enough.

David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/10/2006 9:23:01 PM
>-- Willem wrote:
>
> Well, yeah.  It is the 'always compresses better' that
> he keeps harping on about that just simply isn't true.
> And in his discussions, he keeps on demonstrating
> this misunderstanding of his.
> http://groups.google.com/group/comp.compression/msg/0f830d20dcd0ee50

There is no misunderstanding here. There are a few frivolous ways in
which "always compresses better" cannot obviously be true, and which are
not worth cluttering a discussion with by guarding against when talking
with presumably informed participants. Hence, a statement "always
compresses better" made in this newsgroup, where one can assume
informed readers, should naturally be understood to include implicit
allowances such as "excluding these frivolous cases (a),(b),(c)... to
which the statement does not apply".

It appears that, having no argument against the substance of the claim,
you have fallen back on clinging to some of these frivolous ways of
circumventing "always". Let me review a few of these.

Say, we have agreed to a data set of 256 input samples that we wish to
use to test the QI claims against. You can always "prove" the QI claims
false using any of the following "compression techniques" or
"enhancements" of AC:

a) You can write a QI work-alike (or if need be, just a clone, since
the QI source is publicly available), which will create exactly the
same output size as QI, at least on the given test. Hence it trivially
cannot be true that QI "always compresses better" than any other
entropy coder.

b) You may insist that the sample inputs must be fed to the coders as
"files", so that the OS will keep track of their lengths. Then you can
"enhance" any existent AC (or generally any other compressor which
assumes that its output must be a self-terminating code) by providing
the AC decoder with the compressed length you obtained from the OS,
which allows you to save approximately log(H) bits on the AC decode
termination cost. If perchance the competitor doesn't follow suit
and similarly "enhance" his coder, or if he is entirely unaware that
you are using this age old kind of "compressor helper", you've just
pocketed a log(H) bit edge.

c) You can compile the test set into your coder and decoder executables
and then "compress" any of the 256 test inputs by simply transmitting
to the decoder the 8 bit number telling it which of the samples to
retrieve from the built in list.

d) You can refine the technique (c), so that it is not as blatantly
obvious and which doesn't take as much room. Instead of including all
of the 256 test files into your coder, you just select one and make an
"enhancement" on top of regular AC, so that for that selected input the
"enhancement" outputs a single bit 0, which decoder uses to retrieve
the built in pre-selected sample, and for the remaining 255 samples, it
outputs 1 followed by the original output of the AC. You can further
refine this, so it is harder to detect, by not storing the entire input
sample, but just some section of it, and also by not going all the way
to a single bit, but maybe a few bits or some such. The AC with this
"enhancement" will come out worse off by 1 bit on average, but at least
it will "prove" the QI claim wrong. What's 1 little bit, compared to
the great "victory".

e) But even the "technology" (d), however subtle with its refinements
compared to (c), still leaves some vulnerabilities. It still kind of
looks like cheating, and one can get caught since there is an 'intent'
hiding in there. Plus it breaks down if the data set changes and you
don't get a chance to recompile for the new set. The breakthrough of the
present method is to do a randomized version of (d), where you
"enhance" the AC by letting a random number generator help select which
of the input message patterns, which may be just small bit-string
sections bound to occur by chance in any nontrivial test set, will get
shorter codewords and by how much. As in (d), at some cost to the
average performance, this kind of "compression technology" can "prove"
the QI claim wrong, although it may take much longer time than (d) to
find an actual example "disproving" the claim. But, in return for the
longer wait (possibly astronomically long) compared to (d), this
"compression technology" doesn't require any test set to be given
upfront and it is "good" not just against QI but against any other
genuine advance in the future, since it always can "beat them" at least
on some inputs. And you virtually cannot get caught, since the
deterministic intent of (d) has become a random fluctuation beyond
human will or control.

From your latest line of arguments on 'random quantization' it would
seem you have taken your final fallback position -- the "compression
technology" of class (e), which is the hardened version of technique
(d), which in turn was a subtler variant of method (c). The random
generator is simply the set of discarded fractional parts of the AC
'range', which in turn can always be used to select a small, random
fluctuation in the codeword lengths (at some cost to the average output
size), hence implement the "compression technology" (e). Well, you're
welcome to continue clinging onto that.

0
nightlight
1/11/2006 8:12:00 AM
nightlight wrote:
)>-- Willem wrote:
)>
)> Well, yeah.  It is the 'always compresses better' that
)> he keeps harping on about that just simply isn't true.
)> And in his discussions, he keeps on demonstrating
)> this misunderstanding of his.
)> http://groups.google.com/group/comp.compression/msg/0f830d20dcd0ee50
)
) There is no misunderstanding here. There are few frivolous ways in
) which "always compresses better" cannot obviously be true and which are
) not worth cluttering a discussion with by guarding against when talking
) with presumably informed participants. Hence, a statement "always
) compresses better" made in this newsgroup, where one can assume
) informed readers, should naturally be understood to include implicitly
) allowances such as "excluding these frivolus cases (a),(b),(c)... to
) which the statement does not apply".

The statement 'always compresses better' is simply false as such, and
should be replaced by 'compresses better on average'.  I have made clear
from the start that this was my argument.  Your statement was and is based
on a misunderstanding of how AC works.  It's as simple as that.

If you had read this newsgroup for longer instead of immediately starting
to post, as is described in the netiquette guidelines, you would have
realised that 'no compressor always outperforms another' is one of the
basics here, and as this is what your statement blatantly goes against,
this caused everyone to come down on you.

The wise move would be to learn from this, instead of clinging to your
claim and going up against everyone.


SaSW, Willem
-- 
Disclaimer: I am in no way responsible for any of the statements
            made in the above text. For all I know I might be
            drugged or something..
            No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
0
Willem
1/11/2006 8:31:09 AM
> The statement 'always compresses better' is simply false as such,
> and should be replaced by 'compresses better on average'.

The statement "better on average" does not capture the essential
relation between the two coders. They are almost twins regarding the
coding efficiency and the method of enumerating messages,  at least
when everything else is set the same (such as coding in  decerementing
AC mode), except  that QI quantization is optimal, while AC
quantization is sub-optimal. Hence, aside for the few frivolous
loopholes, the factual implication is that QI will always compress
better than AC simply because its addends, which are the same between
the two except for scaling factor, expand slower. The AC's truncated
infinite fractions (along with the top-down enforcement of the Kraft
inequality constraints), which don't exist in the integer EC/QI
formulation { whether you leave them as code space gaps arising from
the excess contraction of Pc or shift the burden randomly to the cost
of skewed coding probabilites, using the discarded fractions as the
random number generator to fluctuate probabilities, losing on average
but getting around "always"}  make the relation in compression
efficiencies entirely one sided (not to mention speed, where the main
practical difference is).  When AC is set to code optimally on average,
QI will always produce smaller output. That's why "always" [except the
trivial]  is a proper characterization. That you can trade AC
optimality to get around "always" is a frivolous observation as
illustrated in the listing of few such "techniques".

0
nightlight
1/11/2006 9:13:26 AM
Hi Again,

> > No. You are confused. If you always round down *interval
> > boundaries*, you do not round down *interval sizes*
> > because the topmost interval then gets larger.

> Sorry, I gave you the wrong page & the wrong line number (that was for
> similar code in the decoder).

> Take a look in (41a) p. 47, the formula in the 1st line with astersk.

Look, I prefer to look at working code, and I know what it does.

> Note that at the top he initializes value P(0) to 2^F (where F is the
> mantissa precision in bits), i.e. the quantity P is the mantissa of the
> actual Pc, meaning the initial width is 1.00.... The fraction dropped
> in (1) is gone from P(n) which is to be used on next step. There are no
> other updates of P(n). It simply lost the fraction at its edges
> irreversibly, no condition, no compensation for it anywhere else.

You are confused. The value P(n) is *nowhere* stored in a sane AC encoder;
it is implicit in the size of the interval used to encode the symbol n.

> The compensation occurs in the next line, where he calculates the new
> lower boundary point of the interval, Q(n) as (again streamlined, see
> paper):

>     Q(n) = Q(n-1) + floor [p(x<xn) * P(n-1)]  .... (2)

> where now p(x<xn) is conditional cumulative probability for symbols
> ahead of xn. For example (this is binary coder) if xn=0, then (2)
> doesn't add anything since no other symbol x is smaller than xn. If
> xn=1 then (2) adds to the lower boundary Q(n-1), the product:
> p(0)*P(n-1), where p(0) is probability of 0.

> The same conclusion follows from the basic accounting of the precisions
> and buffers used by AC: there are two real numbers < 1.0 in the coding
> loop whose precision grows indefinitely for the unlimited precsion AC:

> a) the  Pc, which is the product of all probabilities of the symbols
> encounterd in the input, 

No. Pc is the result of applying some kind of modelling of the input.
This *need not* relate to the relative frequencies of the symbols
found in an actual sequence. A static model would have Pc *fixed* once
and for all.

> b) the cummulative probability Qc (which is the exact full precision
> rational number < 1.0; note that the Q(n) in (2) is just an integer
> representing the trailing bits of Qc as seen from the current AC
> window).

What is typically kept as a model is Qc, and not Pc. If you cannot
keep Qc to full precision, you clearly lose coding efficiency because
the "model" implied by the Qc no longer fits the model you
intended (i.e. the coder no longer encodes the proposed model optimally,
but rather another model). But nevertheless, there are no gaps. If
you follow the formula (2) closely, you'd see that for the "topmost" symbol
the update rule says that the upper boundary of the coding interval
stays constant, whereas the lower boundary is updated and "rounded down",
making the interval larger than it should be, and thus making Pc larger
than it should be. This means a discrepancy between model and
implementation, but no gaps.

> The Qc is the content of the AC's encoded output. The precision of that
> number does grow indefinitely, since it is the output itself. The other
> number, also large with unlimited precsion AC, the message probability
> Pc does not grow indefinitely. 

Neither Qc nor Pc have infinite precision in a realistic implementation.

> Its precision is reduced in each step in
> (1). Its fraction beyond the F bits precision is discarded
> unconditionally and irreversibly after each new symbol -- in (1) we
> multiply previous Pc with the probability of the new symbol, 

No, *that* is a model update. You *can* do that, but there's no need
to drive the model like this.

> and
> truncate the result to F significant bits. If you are saying that at
> the end the AC has somehow also computed the product Pc in the full
> precision, then where is it stored? 

I never stated that Pc is kept in infinite precision. I stated that
there are no gaps. In fact, Pc is *nowhere* stored. Instead, high and
low interval counts are stored.

> That number is _independent_ of the
> value Qc, hence you would need a second number of indefinite length,
> besides Qc, which would be some kind of second work buffer of the size
> of output. There is no such buffer in AC.

So then, this doesn't prove that there are gaps. It only proves that
AC cannot implement all possible models. That is true in the first place
since the Qcs are quantized (and thus the Pcs) by the precision
limitation.

So long,
	Thomas
0
Thomas
1/11/2006 9:16:09 AM
Hi again,

> I gave you above the places to check, which are not some "poorly
> written AC source" but the well known reference implementations of the
> AC coders. So, look Moffat98 or WNC87 source or the reference [41a]
> above which shows it quite clearly how the Pc is updated.

Apparently, you don't read the sources correctly.

> You are welcome to show a coder which doesn't round down the size of
> the updated total range on every symbol. (It still has to be able to
> decode, though.)

Oh my, that should be your job. Ok, so here we go. The following is
code from a real, existing arithmetic coder students of mine wrote.
It works, is decodable and has no gaps, and it "doesn't round Pc down".

I added comments:

void ArithCod::Encode(UWORD low_count,
		      UWORD high_count,
		      UWORD total)
//
// This encodes a symbol i where high_count is the (scaled) probability
// of finding a symbol with index smaller or equal than the current one,
// and low_count is the (scaled) probability of finding a symbol whose index
// is exactly smaller than the current. total is the scaling factor.
// Specifically, low_count/total = Q[n-1], high_count/total = Q[n] in
// your notation. low_count for the lowest symbol is therefore zero,
// high_count for the topmost (last) symbol in the alphabet equals total.
// m_Low and m_High are the (scaled) borders of the coding interval.
{
  // compute scaling factor
  ULONG step = (m_High-m_Low+1)/total;

  // scale upper and lower interval borders
  m_High = m_Low + step*high_count - 1;
  m_Low = m_Low + step*low_count;

  // This is the update step. Now what happens for the first symbol
  // of the alphabet: low remains constant, high is scaled and
  // due to the finite precision of "step" rounded down.
  // For the last symbol, step * high_count = m_High - m_Low + 1
  // by a simple identity, thus m_High stays constant and m_Low
  // is rounded down. -> The implied probability grows.
  // For all other symbols between, m_Low of the symbol n scales
  // as m_High of the symbol n-1. (Compute!)
  // Thus, no coding gaps and the claim that Pc is always rounded
  // down is refuted.

  //
  // ensure that m_High and m_Low are not in the same half
  // nb: here we generate the output bits!
  while ((m_High & m_Half) == (m_Low & m_Half)) {
    m_Stream.Put(m_High & m_Half); // argument casted to bool
    while (m_UnderflowBits > 0) {
      // output saved underflow bits
      m_UnderflowBits--;
      m_Stream.Put(~m_High & m_Half);
    }
    
    //         before scaling | after scaling | output bit
    //         ===============+===============+============
    // m_Low:  00xy...        | 0xy...        | 0
    // m_High: 00ab...        | 0ab..1        |
    //   or
    // m_Low:  01xy...        | 0xy...        | 1
    // m_High: 01ab...        | 0ab..1        | 

    // m_Half is the representation of 0.5 in the precision
    // of the coder, namely 0x80000000 in the current implementation
    m_Low &= ~m_Half;  // strip of 2nd MSB (we use only 31 bits!)
    m_Low <<= 1;

    m_High &= ~m_Half;  // strip of 2nd MSB (we use only 31 bits!)
    m_High <<= 1;
    m_High |= 1;

    // Here low and high are updated and scaled.
  }

  // prevent underflow if m_Low and m_High are near to m_Half
  // This is the resolution of the carry-over problem.
  while ((m_Low & m_Quarter) && !(m_High & m_Quarter)) {
    m_UnderflowBits++;

    //         before scaling | after scaling
    //         ===============+==============
    // m_Low:  001xy...       | 00xy...
    // m_High: 010ab...       | 01ab..1

    m_Low &= ~m_Quarter;
    m_Low <<= 1;

    m_High &= ~m_Half;  // strip of 2nd MSB (we use only 31 bits!)
    m_High <<= 1;
    m_High |= 1|m_Half;
  }
}

So, that's it. Now, where is the "always rounds down" part you claim
to have, and where are the coding gaps?

So long,
	Thomas
0
Thomas
1/11/2006 9:26:34 AM
>> Take a look in (41a) p. 47, the formula in the
>> 1st line with asterisk.
>
> Look, I prefer to look at working code, and I know
> what it does.

I agree that code is ultimate judge, but only of its
own implementation. The mathematical formulation
given in [41a] captures more general properties, in
particular the excess in the contraction of Pc
due to reduced precision used for Pc computations
by any finite precision AC (which was the point
of contention). The reduction in Pc in turn
leads to the excess in output bits.

>> a) the  Pc, which is the product of all probabilities
>> of the symbols encountered in the input,
>
> No. Pc is the result of applying some kind of
> modelling of the input.

You are mistaking Pc for the "probability of the
next symbol x", p(x). The latter is the result of the
model computations and need not be constant or related
to the frequencies. The p(x) is generally a conditional
probability of the form p(xn|<x1..xn-1>) i.e. p(x)
depends in unrestricted way on the entire preceding
sequence. The p(x) is also "coding probability" but
only for symbol x with a given preceding string.

But the AC's total coding probability of the entire
message, labeled here as Pc, and which enters
the AC codeword length formula:

     L = ceiling(log(1/Pc)) + 1 .... (1)

is a plain arithmetic product of all these p(x|<>)
values for the symbols x encountered along the way.
That includes the most general case for any probabilistic
model. Check, for example [41a] p. 42, where he
describes Elias coder and uses Pc (the expression
in the 2nd  line of the 2nd bullet, my Pc is denoted
there as P(x1^n)).

>> There are no other updates of P(n). It simply
>> lost the fraction at its edges irreversibly,
>> no condition, no compensation for it anywhere else.
>
> You are confused. The value P(n) is *nowhere*
> stored in a sane AC encoder, it is implicit by size
> of the interval used to encode the symbol n.

No confusion above. I was talking there about the
mathematical formulation of the algorithm from the
cited thesis. Hence, except for your 1st sentence,
the rest is consistent with what I said.

>> ...on Qc, eq (2)...
>
> But nevertheless, there are no gaps.
> If you follow the formula (2) closely, you'd see that for
> the "topmost" symbol the update rule says that the upper
> boundary of the coding interval stays constant, whereas the
> lower boundary is updated and "rounded down", making the
> interval larger than it should, and thus making Pc larger
> than it should.

You can't make the entire interval larger (unless decodability
is not required). You can only partition the entire
interval non-proportionately to the probabilities of the
individual symbols. But the size of the entire interval
at that point is Pc(n-1), when the n-th symbol xn is being
encoded, and that can only be reduced or left the same if
you are to comply with decodability. And as the 1st
asterisk line shows, that value is reduced on each step
by truncation. Left at that, this would result in coding
interval gaps (or code space gaps) and the excess in output.
These same types of code space gaps occur in QI, whenever
the rounding up increments the mantissa of the next higher
order binomial.

It is a trivial observation that one can take any code
with "code space gaps", meaning its Kraft inequality sum
adds to less than 1, and remove the gaps by shortening
some codewords, to increase the sum until it reaches
exactly 1. In that sense, debating whether there are
"gaps" or not is vacuous, since any code with gaps
can be transformed into gapless code. AC coder may
choose to remove gaps, e.g. by using the discarded
fractional bits of the Pc as an equivalent of a random
number generator, to randomly pick which codeword lengths
to shorten in order to fill the gaps.
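
That Kraft sum is easy to check numerically; a tiny sketch with made-up
codeword lengths, chosen only so that a gap shows up:

#include <math.h>
#include <stdio.h>

/* Kraft sum for a set of codeword lengths; a sum below 1 means unused
   code space ("gaps") that shortening some codeword could reclaim. */
int main(void)
{
    int len[] = { 1, 2, 3, 4 };       /* 1/2 + 1/4 + 1/8 + 1/16 = 0.9375 */
    double sum = 0.0;
    int i, n = sizeof len / sizeof len[0];
    for (i = 0; i < n; i++)
        sum += pow(2.0, -len[i]);
    printf("Kraft sum = %.4f%s\n", sum, sum < 1.0 ? " (gap present)" : "");
    /* shortening the length-4 codeword to length 3 brings the sum to 1.0 */
    return 0;
}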

The Moffat98 coder is an example of such a gap-removing procedure, which
in each step biases up (relative to the exact AC) the coding probability
p(MPS) of the most probable symbol (MPS) and down p(LPS),
of the less probable symbol, while keeping their
sum fixed. That does avoid rounding down of the Pc shown
in [41a]. But that doesn't preclude excess contraction of Pc.
It merely pushes it to the renormalization procedure,
where they feed in 0s into the lower bits of their full
range R, mantissa of Pc (until it reaches at least a
"Quarter" of full AC window, 32 bits). When coding MPS,
whose sub-range R(MPS) was expanded by rounding, that
feeding of 0s doesn't help or hurt as far as R(MPS)'s
closeness to the exact arithmetic value xR(MPS). But
for the subrange R(LPS), which is already smaller than
its exact counterpart xR(LPS), feeding of 0s during
normalization is precisely the rounding down of the
Pc shown in [41a]. Since the R(LPS)<R(MPS) (by
definition), the requests for the normalization of R,
which get triggered when R drops below Quarter, will
be triggered more often by the LPS encoding than by
the MPS encoding, the net result of which is the excess
contraction of Pc, only done at a different point in the
coding loop than that shown in [41a]. A side-effect
of the Moffat98 choice is that their coder introduces
random fluctuations in the codeword lengths, which are
biased to kick up the codeword lengths of LPS and
down those of MPS. That type of random small deviations
in probabilities, dp and dq, from the "optimum" p
and q (for the given model) will generally increase
the output size as dp^2/p + dq^2/q, which can be
large when p or q -> 0.

Hence, while one can trivially get rid of the gaps
(which the explicit form of contraction of Pc shown
[41a] introduces), doing it via random fluctuations
driven by the discarded bits of the Pc as a pseudorandom
number generator, will normally cost in the increased
average code length.
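
For what it's worth, that quadratic growth is easy to check numerically.
The sketch below compares the exact per-symbol excess (the relative
entropy) against the dp^2/p + dq^2/q term; the exact second-order
expression carries an extra 1/(2 ln 2) factor, and the values of p and
dp are arbitrary examples:

#include <math.h>
#include <stdio.h>

/* Per-symbol excess (in bits) when a source with true probabilities (p, q)
   is coded with slightly skewed coding probabilities (p+dp, q-dp). */
int main(void)
{
    double p = 0.05, q = 1.0 - p, dp = 0.005;
    double exact  = p * log2(p / (p + dp)) + q * log2(q / (q - dp));
    double approx = (dp * dp / p + dp * dp / q) / (2.0 * log(2.0));
    printf("exact excess    = %.6f bits/symbol\n", exact);
    printf("quadratic term  = %.6f bits/symbol\n", approx);
    return 0;
}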


>> The precision of that number does grow indefinitely, since
>> it is the output itself. The other number, also large
>> with unlimited precision AC, the message probability Pc
>> does not grow indefinitely.
>
> Neither Qc nor Pc have infinite precision in a
> realistic implementation.

The Qc, which is the full value (as rational number < 1)
of  the interval base, grows as large as the output.
That is the output (with the redundant "0." omitted).
The Pc only determines the length L of the output via
eq. (1). Since the computed value Pc is constrained not
to increase above its exact value, any reduction in
its precision requires Pc to be kept on the safe side
of the decodability constraint, which means farther away
from the exact value (which is optimal relative to
the model being used).


>> Its precision is reduced in each step in
>> (1). Its fraction beyond the F bits precision is
>> discarded unconditionally and irreversibly after
>> each new symbol -- in (1) we multiply previous Pc
>> with the probability of the new symbol,
>
> No, *that* is a model update.

Adding one more label to a specific arithmetic
procedure doesn't change anything. I was describing
what happens in terms of the arithmetic operations
and bits discarded in Pc, which is correct - Pc
is kept within F significant bits of precision,
the rest of its bits (it gains approx. F bits on
each multiply with p(xn)) simply gets discarded on
each coding step. Call it "model update" or anything
else you wish, the arithmetic effect is a loss of
precision of Pc. Since Pc is not allowed to grow
(lest it violates decodability), the only place the
loss of precision has to go is down, which is the
excess contraction of Pc shown in [41a] as an explicit
truncation, or as done in a round about and spread
around way, during normalization, in Moffat98,
with its biased codeword lengths fluctuations.

>> If you are saying that at
>> the end the AC has somehow also computed the
>> product Pc in the full precision, then where
>> is it stored?
>
> I never stated that Pc is kept in infinite precision.
> I stated that there are no gaps. In fact, Pc is *nowhere*
> stored. Instead, high and low interval counts are stored.

Pc is being computed, in effect, as a floating point number
with a max F bit mantissa. The AC just doesn't keep
track of its exponent, using instead kludgey ad hoc
rules for scaling (which amount to a roundabout way of
keeping a 2 bit exponent modulo 4, stored as the Quarters
position of R in the AC window), and it allows the mantissa
width to fluctuate a few bits below F. The SWI formulation
of AC coding, as in EC/QI, with explicit exponent makes
the procedure much cleaner and more logical (and probably
a bit faster than the branching around kludgey Quarters
and zooming; I didn't implement it, though, so maybe it
won't turn out faster).


> So then, this doesn't prove that there are gaps.
> It only proves that AC cannot implement all possible
> models.

It proves that there is a mathematical formulation
of the AC coder algorithm which shows explicit creation
of the gaps arising from the excess contraction of Pc
(resulting from the explicit truncation of Pc after
the multiplication with p(xn)).

One can do it differently and explain the net contraction
of Pc some other way, e.g. by adjusting coder variables
so that it looks as if it wasn't lost precision but
rather a "random model fluctuation" which
accounts for the deviation of Pc from its exact counterpart
Pcx. It's the kind of "explanation" that reminds me of
a kid who trips and falls while a parent is watching,
and the kid starts pretending to be looking for something
on the floor, saying, I didn't really trip and fall,
I actually meant to come down here to look for something.

0
nightlight
1/11/2006 12:02:40 PM
> So, that's it. Now, where is the "always rounds down" part you claim
> to have, and where are the coding gaps?

As explained with the Moffat98 coder in the previous post, you can
easily avoid the "always" part (but not the net reduction in Pc; just
follow it through the normalization and check in which case the
shifting in of 0s occurs more often), but only at the expense of paying
an extra cost on the expected codeword lengths. That is a special case
of a general tradeoff in coding, and if you proceed just a few more steps
in that direction, you will be able to "beat" any coder on any message
that it encodes in more than one bit. Of course, you will be coding
with an average excess of 1 bit.

0
nightlight
1/11/2006 12:27:55 PM
"nightlight" <nightlight@omegapoint.com> wrote in 
news:1136967120.140527.123410@g49g2000cwa.googlegroups.com:

> 
> b) You may insist that the sample inputs must be fed to the coders as
> "files", so that the OS will keep track of their lengths. Then you can
> "enhance" any existent AC (or generally any other compressor which
> assumes that its output must be a self-terminating code) by providing
> the AC decoder with the compressed length you obtained from the OS,
> which allows you to save approximately log(H) bits on the AC decode
> termination cost. If per chance the competitor doesn't follow the suit
> and "enhance" similarly his coder, or if he is entirely unaware that
> you are using this age old kind of "compressor helper", you've just
> pocketed a log(H) bits edge.
> 
> 

  Yes, we here like to use files so the con man can't play games,
especially those who claim they have some method that always compresses
better. I guess this is as close as you will come to admitting
not only that your statement about always compressing better was flat wrong,
but that you can't, at this point in time, get QI to even code better
than an arithmetic coder on your own suggested test, which you wrongly
assumed arithmetic coding would fail. You failed because even though
you can quote texts you fail to understand them, and you failed to realize
that one can code arithmetic without gaps. I suspect at this point in time
you realize people here are not as dumb and as easily bullied as
your workmates, who don't want to waste the time to try to educate you.

  Yes, it is slick how an AC coder suits itself to what you are
calling this age old kind of "compressor helper". What I see
is someone crying that his code is not that good, so he attempts to put
real code down by calling it "age old"; that's a real laugh. Do you
have any references to this so called age old method, or are you just
pissed that you don't know how to use what you so nicely call an
"age old" method? First of all I doubt it's "age old", but I see why
you claimed it was. If you paid me enough cash I could fix your coder
to work properly without gaps, at least for small files. Can you do
that, since it's an "age old" method? It shouldn't be hard if you have
any real programming experience. But in the end, even if you do
get it working, it can't be better than an optimal bijective arithmetic
coder. It might be faster, but it will not be a better entropy
compressor; to think so is rather foolish.


  My guess is that you actually tried to beat the test you want people
to write an arithmetic coder for and realized that you can't beat it.
Why should you write code where the common man can actually test your
code? You seem to be some sort of control freak and want people to
honor you without actually running real world tests; maybe you can make
that work, but here many will just laugh at you. Or laugh along with you.



David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/11/2006 3:19:42 PM
> Mei is this a nitpicking discussion. I think after this post
> I got your idea, so I did this picture with a (im)possible
> development of the probabilities.

It is a bit nitpicky, although that makes everyone go over the things
they imagined they knew. It is a good refresher for the old thoughts, and it
brings in new thoughts which none of us would have come up with on our
own.  These kinds of "nitpicky" arguments take on a life of their own.

> I agree with that, in the picture I hope to got it right. The
> blue one is the backfeed-quantizer, the green the unlimited
> precision-quantizer, the red the AC and the yellow the QI.
>  I didn't read the paper of you so I don't know if yours is
> not maybe the blue one.

How was that graph made? Did you measure some errors?

Interesting observation, in any case. Indeed, even though QI provides
the optimum quantization of enumeration at any given _fixed_ coder
precision, that is not the best one can do in principle, since the
fixed precision is a mere programming convenience.  In the QI source
kit, the radix and the binary entropy coders use different kinds of
quantization, with the radix & permutation coders being more optimal
since they perform "delayed rounding" within any one quantized power of
radix (see [T3] p. 7).  That would be roughly equivalent of binary
coder only quantizing every m-th row, with m up to 32 (see [T3] p.9,
N2) and reconstructing the m-1 rows in between on the fly, without
additional quantization. As result the radix & permutation coders
redundancy is even lower than coding the same input using regular
multi-alphabet coder (which uses regular binary coder as its sub-task
coder). When D. Scott posted his "nightlight challenge"  I did run his
alpha=3 array setup on a million digit array, using the sample program
QI.exe from the QI source kit, and the redundancy was almost
undetectable, about 1.6e-10 bits/digit. The binary coder would have had
about 5e-8- redundancy on the similar size input (both still much
smaller than AC's redundnacy). That test was described in the post:

http://groups.google.com/group/comp.compression/msg/ff1ee67d18b63f5a

> So there are sources for an adaptive modeled AC where rounding
> down produces _smaller_ output. For example when you always
> underestimate the wrongly predicted MPS.

The Moffat98 AC uses quantization which, as a side effect, systematically
skews the prediction in favor of overpredicting the MPS and
underpredicting the LPS. That is a good feature if you're tuning the coder
for benchmarks on stationary Bernoulli sources, since the coder locks
into the stationary distribution quicker. It does cost it a bit in
average redundancy, which ends up above what it could be. It is also a
hard-wired "feature" in the very center of the arithmetic of its
coding, not something that a modeler can control, in case that is not
how it wishes to skew the probabilities. It is a bad idea, in my view,
for a coder to take it upon itself to make such a decision on how to bias
the codeword lengths so systematically.

QI Source & preprints are at:
http://www.1stworks.com/ref/qi.htm

0
nightlight
1/11/2006 3:55:26 PM
> I guess this is as close as you will come to not only admiting
> that your statement about always compressing better was flat wrong.

As explained in that post, I assumed participants informed enough that
I didn't bother spelling out what I thought was well understood by
everyone. That post was about the wrong assumption, not about the wrong
statement. The statement was and is fine, under the conditions
explained in the post. Although, as Willem helpfully noted,
comp.compression isn't a place where one ought to make it flat out in
such a form, due to particular sensitivities here. About that he may be
right. I do have a bit of a wooden ear, or so I am told every now and
then, for these kinds of vibes and social cues.

0
nightlight
1/11/2006 4:19:07 PM
Hi,

> >> a) the  Pc, which is the product of all probabilities
> >> of the symbols encountered in the input,
> >
> > No. Pc is the result of applying some kind of
> > modelling of the input.

> You are mistaking Pc with the "probabilities for the
> next symbol x" p(x). The latter is the result of the
> model computations and need not be constant or related
> to the frequencies.

There's no "mistaking" here; Modelling the input means
that you imply some model that generated the sequence,
and thus it is a probability.

> The p(x) is generally a conditional
> probability of the form p(xn|<x1..xn-1>) i.e. p(x)
> depends in unrestricted way on the entire preceding
> sequence. 

We are in agreement here, no need to discuss.

> But the AC's total coding probability of the entire
> message, labeled here as Pc, and which enters
> the AC codeword length formula:

>      L = ceiling(log(1/Pc)) + 1 .... (1)

> is a plain arithmetic product of all these p(x|<>)
> values for the symbols x encountered along the way.

Properly rounded/approximated, though.

> That includes the most general case for any probabilistic
> model. Check, for example [41a] p. 42, where he
> describes Elias coder and uses Pc (the expression
> in the 2nd  line of the 2nd bullet, my Pc is denoted
> there as P(x1^n)).

I'm still not arguing here. If Pc is the probability of
the output message, then fine (I would believe it was something
different in your example), but Pc itself is nowhere apparent
in the implementation, and thus need not be kept. What *is*
quantized are the terms that contribute to Pc --- no arguing about
that --- but still in a way such that there are no coding gaps. That
is, the individual estimates an AC model can make are *different*
from what the original model intended, due to quantization, and by
that you get an inefficiency I do not deny. What I *do* deny is
that there is unused code space due to "round down".

> > But nevertheless, there are no gaps.
> > If you follow the formula (2) closely, you'd see that for
> > the "topmost" symbol the update rule says that the upper
> > boundary of the coding interval stays constant, whereas the
> > lower boundary is updated and "rounded down", making the
> > interval larger than it should, and thus making Pc larger
> > than it should.

> You can't make the entire interval larger (unless decodability
> is not required). You can only partition the entire
> interval non-proportionately to the probabilities of the
> individual symbols.

Yes.

> But the size of the entire interval
> at that point is Pc(n-1), when the n-th symbol xn is being
> encoded, and that can only be reduced or left the same if
> you are to comply with decodability.

No, the size of the interval you divide and quantize into is
Pc(n-1) * 2^m, where m is the number of scaling steps performed so far
(if you scale by doubling, that is). This is the interval which
defines the coarseness of the "quantization" of the model
probabilities.

> And as the 1st
> asterisk line shows, that value is reduced on each step
> by truncation. Left at that, this would result in coding
> interval gaps (or code space gaps) and the excess in output.

No, and no. You make the coding interval smaller, but by making
it smaller, the scaling of the sub-intervals into the full interval
space gets coarser, and thus the quantization of probabilities gets
coarser, and this continues until the code space gets re-scaled.
However, the *subdivision* of the full interval into the sub-intervals
for each symbol is always such that the entire coding interval gets
used. Thus, no gaps here.

> These same types of code space gaps occur in QI, whenever
> the rounding up increments the mantissa of the next higher
> order binomial.

> It is a trivial observation that one can take any code
> with "code space gaps", meaning its Kraft inequality sum
> adds to less than 1, and remove the gaps by shortening
> some codewords, to increase the sum until it reaches
> exactly 1. In that sense, debating whether there are
> "gaps" or not is vacuous, since any code with gaps
> can be transformed into gapless code. 

Statement: The presented AC has no gaps.

This holds for quite a lot of AC implementations, including
Moffat's and Nelson's. It does not hold for ELS.

> AC coder may
> choose to remove gaps, e.g. by using the discarded
> fractional bits of the Pc as an equivalent of a random
> number generator, to randomly pick which codeword lengths
> to shorten in order to fill the gaps.

There are no bits of Pc discarded, simply because there's nothing
to discard. Pc is never kept/represented in an AC coder. It gets
represented/computed step by step.

> Moffat98 coder is example of such procedure, which in each
> step biases up (relative to exact AC) the coding probability
> p(MPS) of the most probable symbol (MPS) and down p(LPS),
> of the less probable symbol, while keeping their
> sum fixed. 

You are talking about a *different* thing here. The AC coder does not
care about p(LPS) and p(MPS), and there is no interpretation of
probabilities here. You can keep these numbers fixed if you know the
source you compress. It keeps working in any case, just not as a good
compressor. By biasing p(LPS) / p(MPS), Moffat builds up a Markov
model that might or might not be appropriate to the source.

> That does avoid rounding down of the Pc shown
> in [41a]. But that doesn't preclude excess contraction of Pc.

I don't see what you mean by "contraction" of Pc. There are two very
different things happening there: 1) The coding of symbols, for given
interval division. This is the AC coder. 2) Choosing the interval
division such that it fits to a given model of the source.

For 1), there are absolutely no coding gaps, regardless of limited
precision of the interval, the interval subdivision and so on.

For 2), the model that is implicitly used in 1) differs from the model
that is implied by keeping the symbol counts. *That* is a problem, but
it does not cause any unused bits in Pc. It is rather a "modelling
error": the desired model cannot be fit exactly because of quantization.

> It merely pushes it to the renormalization procedure,
> where they feed in 0s into the lower bits of their full
> range R, mantissa of Pc (until it reaches at least a
> "Quarter" of full AC window, 32 bits). 

It doesn't push "unused bits into Pc". The interval there is *NOT* Pc,
it is rather "the 32 low-order bits" of Pc, where the high-order bits
are already in a file, and the number of bits a carry can propagate into
is kept in a counter. Thus, the implied Pc at the start of the coding
procedure is 0, or to be more clear, an infinite string of zeros
representing this zero as an infinite binary fraction. By upscaling, all
you do is move - as a sliding window algorithm - the *still* zero
bits into the internal accumulator of the algorithm, ready to modify them,
and move the already computed bits out of Pc, writing them into the file
(or counting them to resolve carry-over propagation).

> When coding MPS, whose sub-range R(MPS) was expanded by rounding, that
> feeding of 0s doesn't help or hurt as far as R(MPS)'s
> closeness to the exact arithmetic value xR(MPS). But
> for the subrange R(LPS), which is already smaller than
> its exact counterpart xR(LPS), feeding of 0s during
> normalization is precisely the rounding down of the
> Pc shown in [41a]. 

But no way! All this implies is that the current subdivision of the
full coding interval is only precise to M bits, where M is the number
of bits written so far, plus the bits represented in the carry-over
counter, plus the bits kept in the accumulator representing the coding
interval. By multiplying this accumulator with a power of two, you
just alter the window "view" onto Pc, i.e. change the coordinate system,
but you do not at all alter any interval.

Example: Binary arithmetic encoder, Moffat/Nelson

Coding interval	  		[0,874]
Coding interval for the MPS:	[0,513)
Coding interval for the LPS:	[513,874]

After upscaling (note that the implicit m_High is scaled by two, then 1 added!)

Coding interval			[0,1749]
MPS subinterval			[0,1026)
LPS subinterval			[1026,1749]

*NOTE* the exclusive upper boundary between the intervals, and the
inclusive boundary for the total coding interval. That is, any value
of the accumulator that is smaller than 513 before the scaling,
or smaller than 1026 after the scaling, is an MPS, otherwise it is an
LPS. By upscaling, you *neither* alter the ratio of MPS to LPS, nor
the placement of the accumulator value, nor does any gap open
here. Note that the size of the interval is upper bound (exclusive!) -
lower bound. The interval [0,11] has 12 points, [0,11) just 11.
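
To make the no-gap claim concrete, here is a tiny stand-alone check (my
own toy program, not code from any of the coders under discussion) that
runs these numbers through one doubling step and counts how many integer
points of the coding interval fall into each sub-interval; every point
lands in exactly one of them, before and after the upscaling:

#include <stdio.h>

int main(void)
{ unsigned low = 0, high = 874, split = 513;  /* MPS: [low,split), LPS: [split,high] */
  int step;
    for (step = 0; step < 2; step++)
      { unsigned x, mps = 0, lps = 0;
        for (x = low; x <= high; x++)          /* classify every point of the interval */
          { if (x < split) mps++; else lps++; }
        printf("[%u,%u] split %u: %u MPS + %u LPS = %u of %u points\n",
               low, high, split, mps, lps, mps + lps, high - low + 1);
        low = 2*low; high = 2*high + 1; split = 2*split;  /* upscale; implicit high gets a 1 */
      }
    return 0;
}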

> Since the R(LPS)<R(MPS) (by
> definition), the requests for the normalization of R,
> which get triggered when R drops below Quarter, will
> be triggered more often by the LPS encoding than by
> the MPS encoding, the net result of which is the excess
> contraction of Pc, only done at different point in the
> coding loop than that shown in [41a]. A side-effect
> of Moffat98 choice is that their coder introduces
> a random fluctuations in codeword lengths, which are
> biased to kick up the codeword lengths of LPS and
> down those of MPS. That type of random small deviations
> in probabilities, dp and dq, from the "optimum" p
> and q (for the given model) will generally increase
> the output size as dp^2/p + dq^2/q, which can be
> large when p or q -> 0.

*That* effect is caused by the inability to represent the model
parameters (in order not to call them probabilities) exactly,
that is, you cannot subdivide any given interval exactly as you would
need to. I don't say that this cannot be large. What I say is that this
is not due to gaps in the subdivision of the interval.

> Hence, while one can trivially get rid of the gaps
> (which the explicit form of contraction of Pc shown
> [41a] introduces), doing it via random fluctuations
> driven by the discarded bits of the Pc as a pseudorandom
> number generator, will normally cost in the increased
> average code length.

*Sigh* No. There are no unused bits. Every number in the interval
subdivision belongs to either LPS or MPS, and if scaled consistently
(namely, multiply number by two) the same number belongs to the same
subinterval afterwards, and every number after upscaling belongs to
either the LPS or the MPS, and there is no single "unused" number
not assigned to any of the two.

> > So then, this doesn't prove that there are gaps.
> > It only proves that AC cannot implement all possible
> > models.

> It proves that there is a mathematical formulation
> of the AC coder algorithm which shows explicit creation
> of the gaps arising from the excess in contraction of Pc.
> (resulting from the explicit truncation of Pc after
> the multiplication with p(xn)).

In that case, if the implementation shows no gaps, but the
mathematical formulation does, this means that a) the mathematical
description is invalid, or b) that you failed to read it. (-:

> One can do it differently and explain the net contraction
> of Pc some other way, e.g. by adjusting coder variables
> so that it looks as if it wasn't lost precision but
> that there was a "random model fluctuation" which
> accounts for the deviation of Pc from its exact counterpart
> Pcx. It's a kind of "explanation" that reminds me of
> a kid who trips and falls while a parent is watching,
> and the kid starts pretending to be looking for something
> on the floor, saying: I didn't really trip and fall,
> I actually meant to come down here to look for something.

So what *is* a coding gap for you, then? For me it means
unused code space. And there simply is none.

So long,
	Thomas
0
Thomas
1/11/2006 4:52:57 PM
"nightlight" <nightlight@omegapoint.com> wrote in 
news:1136994926.763677.126950@g44g2000cwa.googlegroups.com:
.....

> coder). When D. Scott posted his "nightlight challenge"  I did run his
> alpha=3 array setup on a million digit array, using the sample program
> QI.exe from the QI source kit, and the redundancy was almost
> undetectable, about 1.6e-10 bits/digit. The binary coder would have had
> about 5e-8- redundancy on the similar size input (both still much
> smaller than AC's redundnacy). That test was described in the post:
> 
....

  Actually you never really described it fully in your post.
How many bits did this so called test of yours compress to?
And did you include the lengths of the extra fields for your
so called length and your so called number of ones? And was it
really a static zero entropy type of compression where each of
the 3 symbols assumed equally likely? That is what you wanted
others to do with the arithmetic. And last but not least what
are the results with different random million digit arrays?


  I know you suffer from a wooden ear, as you state, but are you
capable of actually answering these simple questions? Or is it
like arithmetic coding, in the sense that you're not really sure what
your own code does? You obviously don't seem to understand
arithmetic coding, based on your previous posts.






David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptographic
system is only as strong as its weakest link"

0
David
1/11/2006 6:06:56 PM
>> Moffat98 coder is example of such procedure, which in each
>> step biases up (relative to exact AC) the coding probability
>> p(MPS) of the most probable symbol (MPS) and down p(LPS),
>> of the less probable symbol, while keeping their
>> sum fixed.
>
> The AC coder does not care about p(LPS) and p(MPS)
> and there is no interpretation of probabilities here.
>

I am calling them (empirical) "probabilities" here. In
the Moffat98 code these are simply ratios of counts to the
total count. What I am saying above is that M98
systematically overvalues p(MPS) and always gives it
a relatively larger interval than p(LPS), which is
undervalued.


> There are no bits of Pc discarded, simply because
> there's nothing to discard.
....
> I don't see what you mean by "contraction" of Pc.
....
> It doesn't push "unused bits into Pc". The interval
> there is *NOT* Pc, it is rather "the 32 low-order
> bits" of Pc, where the high-order bits are already
> in a file, and the number of bits a carry can
> propagade into are kept in a counter. Thus, the
> implied Pc at the start of the coding procedure
> is 0, or to be more clear, an infinite string of
> zeros representing this zero as infinite binary fraction.
.....

The above shows a major confusion somewhere about what
Pc is. You started at the beginning of the post,
on the issue of Pc being the (coding) message
probability, apparently fine. But somewhere before
the Moffat98 section, you're suddenly talking of
Pc as if it were Qc, the cumulative probability
of the message, which is the AC output. In other places
you're interpreting Pc as the mantissa of Pc, the F-bit
integer Pc(n) (from the WNC encoder pseudo-code on page 47).

Yet you also have Qc involved all along as well,
so there is some major crossing of wires on these
two (or three) over there. Consequently, anything I
said after the 'Pc is a product' section (which we
now agree on, or so I thought) was interpreted and
responded to absolutely without any contact at all
with what I was saying. So I'll leave those replies
alone for the moment, until the basic definitions
of the two key symbols are in sync.

Since this may be only a minor fall out of sync
regarding Volf's AC formulation (which is the
same one as in his advisor Tjalkens' thesis, [36];
both have many details and highlight AC aspects
and perspectives you won't find anywhere else,
especially [36]), I will just point you to p. 43
in [41a] where the Qc and Pc arithmetic is shown, with
their bits aligned as they get added. It may be that
my use of Pc(n), which is the mantissa of Pc after
n symbols have been encoded (Pc = Pc(n) * 2^k, where
k is the exponent of Pc), has led to the confusion of Pc
and Pc(n). How Pc got crossed with Qc, as shown in the
quote above, I have no idea.

On p. 43, you see the bits of Pc being added
to the cumulative probability Qc (which is the
AC output). The only bits of Pc whose values
are nonzero are the most significant F bits of
Pc, which are held in the integer Pc(n). The
exponent of Pc is there integer k. Pc is simply
a floating point number, with mantissa Pc(n)
and exponent k. Qc is not a floating point number
but an unlimited precision binary fraction (its
bits are shown as q1,q2... on p. 43). I also label
throughout the exact values of Pc and Qc, as Pcx
and Qcx. Another relevant number is probability
p(xn) of the n-th symbol found, which is xn.
The p(xn) is another floating point number with
max F bits of precision. It has mantissa and
exponent, e.g. p(xn)= p.M * 2^p.E. We could
similarly denote Pc = Pc.M * 2^Pc.E, where
our integer Pc(n-1) is mantissa Pc.M, and k
from their figure on p. 43 is exponent Pc.E.

On page 47 (the WNC coder), the first line with
the asterisk shows the multiplication of the floating
point number Pc with another floating point number
p(xn), in the form of a truncated multiplication of
p(xn), which is left as an F-bit floating point
number, with the integer mantissa of Pc after n-1
symbols have been encoded, the integer Pc(n-1).
Note that p(xn) is written as a conditional probability
on p. 47, as we discussed and agreed on.

The first critical point of mixed-up symbolism was
what happens in the product. That product represents
the multiplication Pc * p(xn). Obviously, the actual
multiplication done there is done on the two mantissas,
while their exponents get added, i.e. the low-level
format (as if implementing the floating point mul yourself)
of the product in the first asterisk line on p. 47 is:

  Pc * p(xn) = (Pc.M * p.M) * 2^(Pc.E+p.E) ... (1)


Since the product of the mantissas (Pc.M * p.M) generates
2*F bits, the lowest F bits of the product are discarded.
The usual AC scaling, normalizations and windows...
are simply a roundabout way of doing this plain floating
point manipulation of mantissas and exponents in disguise.
You can now go back to the addition figure on page 43, and see
that it is simply the floating point number Pc being added,
now _without any approximations_ (ignoring for a moment
carry blocking), to the unlimited precision binary fraction
Qc. As with QI's SWI arithmetic, even though Qc is unlimited,
the sum on p. 43 is just a regular integer addition of
Pc.M at the right place (given by Pc.E, which is k)
into Qc. This is all exactly the same as done with QI's
SWI arithmetic, where eq. (21) in [T3] is the approximate
arithmetic, with rounding up to compute the addends (to be
stored into the table), while the index computation via
adding of table addends into the accumulated index I,
eq. (22), is exact. QI's index "I" is AC's index Qc
(a cumulative prob.). QI's quantized addends C(n,k)
are AC's truncated addends computed in eq. (1) (for
a binary coder there is just one addend, added
when the LPS is found, the same as in QI, except that AC
must calculate via (1) all the intermediate addends
which don't get added on MPS, while QI has them
precomputed in the universal table, which is independent
of the source probabilities).
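
To spell out that bookkeeping in code form, here is a minimal sketch of
the mantissa/exponent view only (my own illustration, not the WNC
pseudo-code or the QI source; F=16 is used just to keep the numbers
small, the real coders use around 30-32 bits):

#include <stdio.h>

#define F 16                                  /* mantissa precision in bits      */

typedef struct { unsigned m; int e; } fpn;    /* value = m * 2^e, m < 2^F        */

fpn mul_trunc(fpn a, fpn b)                   /* Pc * p(xn), rounded down        */
{ unsigned long long p = (unsigned long long)a.m * b.m;   /* up to 2F bits       */
  fpn r; r.e = a.e + b.e;
    while (p >> F) { p >>= 1; r.e++; }        /* drop the low bits: this is the  */
    r.m = (unsigned)p;                        /* truncation of Pc discussed here */
    return r;
}

int main(void)
{ fpn Pc = { 0xC000, -20 };                   /* some running product so far     */
  fpn px = { 0xAAAA, -17 };                   /* p(xn) ~ 1/3 at F-bit precision  */
  fpn r  = mul_trunc(Pc, px);
    printf("Pc*p(xn) ~= 0x%X * 2^%d\n", r.m, r.e);
    return 0;
}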

I will pause here and leave you to synchronize
your definitions and notation, after which you should
realize that all I said, to which you seemed to
object above, was perfectly correct, once the correct
definitions and semantics for the symbols are applied.



--- References

36. T.J. Tjalkens  "Efficient and fast data compression
codes for discrete sources with memory"
Ph.D. thesis, Eindhoven University of Technology, Sep 1987
http://alexandria.tue.nl/extra3/proefschrift/PRF5B/8709277.pdf

41a. P.A.J. Volf  "Weighting Techniques In Data Compression:
Theory and Algorithms" Ph.D. thesis, Eindhoven University of
Technology, Dec 2002
http://alexandria.tue.nl/extra2/200213835.pdf

0
nightlight
1/11/2006 7:46:32 PM
This is the post:
http://groups.google.com/group/comp.compression/msg/ff1ee67d18b63f5a

Everything you are asking about is right there, answered, plus much you
didn't ask. For example, you ask here:

> How many bits did this so called test of yours compress to?

There it answers:
----
 The size it produces is 1584962.50... bits, which compared
 to the exact N*log(3) entropy has an excess of 1.62 e-04 bits on the
 total of 10^6 symbols (i.e. the excess per symbol is 1.6e-10 bits).
---

> And did you include the lengths of the extra fields for your
> so called length and your so called number of ones?

There it says how these numbers were added to match the AC coding
conditions, starting from the numbers QI.exe gave (which wasn't running
a test against AC, but a standalone function) -- it says right below
that it added 70 bits (which was very generously rounded up, all pieces
separately):

--------
To compare that with AC output size, one option is to make AC work in
static mode without adapting to probabilities and make it not count the
transmission of frequency table or number of symbols n (which is the
same condition that the QI.exe figure applies to).

Alternatively, you can add to QI's output the size to transmit N, A and
the frequency table. QI.exe has a command QI cl<int> which computes
self-delimiting size for <int>, or just "QI cl" to list a table for
common values. There you get for N=10^6 its self-delimiting length
L(N)=27.543 and for L(A)=2.49 bits. The cost for frequency table with
QI/EC is the log of the binomial C(N+A-1,A-1), for N=10^6 and A=3,
which is log(C(1000002,2))=38.863 bits, which totals (each rounded
separately, which they don't need to) 28+3+39=70 bits to be added to
QI's output to match the adaptive AC's coding conditions. Since the
QI's output was essentially the entropy, the QI's total is 70 at most
whole bits above the "entropy" (note the "entropy" N*log(3) didn't
include N; also in high entropy limit QI doesn't need to transmit freq.
table, but one would need to modify AC to work in high entropy limit,
so I added table to QI, which distorts a bit comparison to entropy H).
-----------
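
As a quick cross-check of the 38.863-bit frequency table figure quoted
above, a few lines of my own arithmetic (not QI.exe output) recompute
the log of the binomial directly:

#include <math.h>
#include <stdio.h>

int main(void)
{ double N = 1e6, A = 3;
  double c = (N + A - 1)*(N + A - 2)/2.0;     /* C(N+A-1, A-1) = C(N+2, 2) for A=3 */
    printf("freq table cost: %.3f bits\n", log2(c));   /* ~38.863 */
    return 0;
}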


> And was it really a static zero entropy type of compression where
> each of the 3 symbols assumed equally likely?

That was answered right there in the top line: the equiprobable symbols
are the "high entropy limit" (as opposed to the "low entropy limit",
which is for highly sparse arrays, with one symbol vastly dominating the
others).

------------
The QI.exe file which you may already have (from the source; current
source version is 1.03) has a command line option to test it on that
same input (which is the high entropy limit for multi-alphabet coding,
and which I call radix codes):

  QI  cr3  n1000000  i100

which tells it to code inputs in radix 3 (this can be any 32 bit value
above 2), to use input of 1 million symbols (there is a constant
MAXRDIG in Intro.h which limits the input size to max 2^20 or 1M
digits, you can change that to allow larger sizes e.g. to 16 MEG) and
to run the test 100 times on 100 random inputs  (i100  for 100
iterations).
----------------------------

The comparison to Moffat98 answer (for 10^6 symbol file): QI output
(adjusted generously to favor AC) came out 3403 bits shorter than AC's:
----------------------------
Now, running the Moffat98 coder (in 8 bit max symbol coding mode &
frugal bits enabled), it outputs: 1588435.52 bits (this is avg. over
100 iterations), which is 3473 bits above the entropy, or 3403 bits
above the comparable QI output size. (Note that Mofat98 coder has
generally a slight bias to code worst for the max entropy inputs, but
it gains in return on very low entropy inputs.)
-------------------------------

All of that you can run right there, and you can look at the source to
see that there was no cheating. The test was run on 100 random inputs,
and each encode-decode cycle checked that the decoded data matches the
input. For QI, the outputs all come out to exactly the same size, i.e.
in the high entropy limit QI's size is the maximum one can get for
converting a number of 10^6 digits given in radix 3 into a binary number
(which is QI's output for high entropy limit coding, essentially radix
coding/decoding).

0
nightlight
1/11/2006 8:06:24 PM
> It is a bit nitpicky, although that makes everyone go over the things
> they imagined they knew. A good refresher for the old thoughts, and it
> brings in new thoughts which none of us would have come up with on our
> own. These kinds of "nitpicky" arguments take on a life of their own.

 I think this is not the group to discuss this in that form. Bloom, Ross
and the other 'giants' that were able to span the bridge between pure
information theory and implementation have left, or are not willing to
participate any more.
 Being patient, unprejudiced and curious is also a rare quality on
Usenet. With all due respect. :)

>>I agree with that, in the picture I hope I got it right. The
>>blue one is the backfeed-quantizer, the green the unlimited
>>precision-quantizer, the red the AC and the yellow the QI.
>> I didn't read your paper, so I don't know if yours is
>>not maybe the blue one.
> How was that graph made? Did you measure some errors?

 x-axis is symbol_at_pos(x)
 f(x) = -p(symbol_at_pos(x)) after/with quantization

 The infinite precision arithmetic coder doesn't have quantization
noise, in the context of static modeling, so it's the 'optimum' to
measure the other coders against. The bigger the difference between
green and any one of the others, the bigger the inefficiency introduced
by quantization.
 The y-axis has no legend, because it's only there to show the
difference between the real probability and the quantized probability,
and it should be considered as extremely zoomed.

 The blue backfeed somehow-coder is a quick idea; I guess modifying an
AC with self-correction in that way is horribly complex (in relation to
without).

> Interesting observation, in any case. Indeed, even though QI provides
> the optimum quantization of enumeration at any given _fixed_ coder
> precision, that is not the best one can do in principle, since the
> fixed precision is a mere programming convenience.  In the QI source
> kit, the radix and the binary entropy coders use different kinds of
> quantization, with the radix & permutation coders being more optimal
> since they perform "delayed rounding" within any one quantized power of
> the radix (see [T3] p. 7).  That would be roughly equivalent to a binary
> coder quantizing only every m-th row, with m up to 32 (see [T3] p. 9,
> N2), and reconstructing the m-1 rows in between on the fly, without
> additional quantization.

 So in principle you're providing a "virtually" bigger and non-linear
coding register? Even adaptively non-linear?

> As a result, the radix & permutation coders'
> redundancy is even lower than coding the same input using a regular
> multi-alphabet coder (which uses the regular binary coder as its sub-task
> coder).

 Hmm, technically/mathematically the decomposition of a multi-alphabet
coder into a multi-step binary coder is identical. But it is never done
that way, because then every step within the coding of a single symbol
adds quantization noise.
 With decomposition I mean binary choices: MPS yes/no, SecPS yes/no,
ThrPS yes/no, ..., LPS. It's easier to tune, nevertheless.

> When D. Scott posted his "nightlight challenge" I ran his
> alpha=3 array setup on a million digit array, using the sample program
> QI.exe from the QI source kit, and the redundancy was almost
> undetectable, about 1.6e-10 bits/digit. The binary coder would have had
> about 5e-8 redundancy on a similar size input (both still much
> smaller than AC's redundancy). That test was described in the post:

 Hehe, _much_ smaller here means whole percentages, if not more. :)

> ... It is a bad idea, in my view,
> for a coder to take it upon itself to make such decision on how to bias
> the codeword lengths so systematically.

 Yes, but it's an understandable approach in the context of
testing against small (and hand-selected) corpora. For my image
compression project I tried to compress a lot of very nice looking
fractals too; it's my over-adaptation testbed.

 Ciao
    Niels

0
niels
1/11/2006 9:27:00 PM
"nightlight" <nightlight@omegapoint.com> wrote in 
news:1137009984.700743.75690@g44g2000cwa.googlegroups.com:

> This is the post:
> http://groups.google.com/group/comp.compression/msg/ff1ee67d18b63f5a
> 
> All you are asking is right there, answered, plus many you didn't ask.
> For example, you ask here:
> 
>> How many bits did this so called test of yours compress to?
> 
> There it answers:
> ----
>  The size it produces is 1584962.50... bits, which compared
>  to the exact N*log(3) entropy has an excess of 1.62 e-04 bits on the
>  total of 10^6 symbols (i.e. the excess per symbol is 1.6e-10 bits).
> ---
> 

   I assume, if your coder is honest, that the number you got represents
the average of 100 runs; it's very close to the real number N*log(3) =
1584962.50072115618145373894394782. As you can tell, this number makes
your code shine, and it's better than the real entropy. You must have
got 50 cases where it took 1584962 bits and 50 where it got 1584963;
a change of just one bit in 1 out of the 100 cases and you would get
a different answer. I wonder what the odds of that are.

   Very interesting that your coder puts out fractional bits. Why do I
doubt that. And again you don't answer the whole of the questions. You
mention in the paper there are 3 parts. One part is the length of the
thing compressed, the other is the number of ones, and the third, which
I assume the above is, is the index.




>> And did you include the lengths of the extra fields for your
>> so called length and your so called number of ones?
> 
> There it says how it added these numbers to match the AC coding
> conditions from the numbers QI.exe gave it (which wasn't running
> a test for AC but stand alone function) -- It says righ below it added
> 70 bits (which was very generously rounded up, all pieces separately):
> 


   So let's not be generous, and round down. From the combination part
you get 1584962 bits; for the rest of the needed overhead you need, say,
69 bits. Would it be fair to say that you compressed the 1000000 symbols
to 1585031 bits? Let's be clear about this: the list is for 3 symbols;
the total bits only refer to the compressed combinations and file
length; they do not carry in those bits just exactly what those symbols
are. If that's not correct, could you give a straight answer? If it is
correct, could you confirm it with a yes.


 
> --------
> To compare that with AC output size, one option is to make AC work in
> static mode without adapting to probabilities and make it not count the
> transmission of frequency table or number of symbols n (which is the
> same condition that the QI.exe figure applies to).
> 
> Alternatively, you can add to QI's output the size to transmit N, A and
> the frequency table. QI.exe has a command QI cl<int> which computes
> self-delimiting size for <int>, or just "QI cl" to list a table for
> common values. There you get for N=10^6 its self-delimiting length
> L(N)=27.543 and for L(A)=2.49 bits. The cost for frequency table with
> QI/EC is the log of the binomial C(N+A-1,A-1), for N=10^6 and A=3,
> which is log(C(1000002,2))=38.863 bits, which totals (each rounded
> separately, which they don't need to) 28+3+39=70 bits to be added to
> QI's output to match the adaptive AC's coding conditions. Since the
> QI's output was essentially the entropy, the QI's total is 70 at most
> whole bits above the "entropy" (note the "entropy" N*log(3) didn't
> include N; also in high entropy limit QI doesn't need to transmit freq.
> table, but one would need to modify AC to work in high entropy limit,
> so I added table to QI, which distorts a bit comparison to entropy H).
> -----------
> 
> 

   I am just curious: you say you need 2.49 bits to code the number 3.
Where did you get the formula for L(X)? Is it based on some universal
number scheme where they give a neat way of calculating the bits needed
for universal coding of large numbers? Again, these are bits; you have
to code them with a whole number of bits, don't you?

>> And was it really a static zero entropy type of compression where
>> each of the 3 symbols assumed equally likely?
> 
> That was answered right there on the top line: the equiprobable symbols
> are "high entropy limit" (as opposed to "low entropy limit" which is
> for highly sparse arrays, which one symbol vastly dominating others).
> 

   Look, either it is or it is not?


David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptographic
system is only as strong as its weakest link"

0
David
1/11/2006 11:22:28 PM
> It is a bit nitpicky, although that makes everyone go over the things
> they imagined they knew. A good refresher for the old thoughts and it
> brings in new thoughts which none of us would have thought on their
> own.  These kinds of "nitpicky" arguments take a life on their own.

 After some thought about this (by me), I suggest you carefully
reformulate your 'always' message.
 I think it should be something like: you've "invented a more exact (in
the sense of error) or more precise algorithm for performing
(arithmetic) coding of unknown-length real numbers", which "(always)
produces less quantization noise in the coding operation than (for
example) the X-bit arithmetic coder", with the result that "the
(attempt) to code all possible real numbers would result (always) in a
shorter code than with the AC".

<context type="adaptive model">
 What's misleading, actually, is that you (maybe try to) say that you
always code shorter for _any_ _single_ message. To say that about the
_sum_ of _any_ of them is true, as far as I can work it out. To say that
it is true for nearly all of them is also true, because nearly all of
them don't fit well into the "correct over/underestimation" paradigm
(because of the holy /pig/eon).
</context>

 I'm trying to filter out true and false in all of these postings that
don't stay in one context and aren't nearly atomic. There is so much
mixing.

 Ciao
    Niels

0
iso
1/12/2006 2:18:11 AM
David A. Scott wrote:
> "Matt Mahoney" <matmahoney@yahoo.com> wrote in
> > fpaq1 compresses files of all zero bytes as follows:
> > 0 -> 1
> > 1 -> 2
> > 10 -> 5
> > 100 -> 9
> > 1000 -> 13
> > 10^4 -> 17
> > 10^5 -> 21
> > 10^6 -> 25
> > 10^7 -> 29
> > 10^8 -> 34

snip... for 10^6 zero bytes...

>  I feel strongly that 23 is most likely optimal and that you should,
> within a byte or so, get the same length for fpaq0 whether it's all
> zeros or all ones. In your case it's like it used too low a probability
> when doing the all ones and too high when doing all zeros.
> It must have something to do with the fact that you don't carry, and
> that for this case 32 bits without carry is not enough.

I think fpaq1 is behaving correctly for the model it uses.  The fpaq1
model is for 9 bit symbols so there are 9 contexts.  In 8 of these
there are 10^6 zero bits.  In the other (the EOF bit) there are 10^6
zero bits and a 1.

The model adds 1 to all counts.  The probability for n zero bits is
modeled as p = (1/2)(2/3)(3/4)...(n/(n+1)) = 1/(n+1), which codes in
log(n+1) = ~20 bits for n = 10^6.  The extra 1 bit in the EOF stream
has probability 1/(n+2) requiring log(n+2) = ~20 bits.  So the total is
~200 bits = 25 bytes.
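
To double-check the ~25 byte figure, the cost can be recomputed in a few
lines under the model as described above (a back-of-the-envelope sketch
of my own, not fpaq1 code; 8 data-bit contexts plus the EOF context):

#include <math.h>
#include <stdio.h>

int main(void)
{ double n = 1000000.0;                       /* zero bits seen per context      */
  double zero_ctx = log2(n + 1);              /* (1/2)(2/3)...(n/(n+1)) = 1/(n+1) */
  double eof_ctx  = log2(n + 1) + log2(n + 2);  /* n zeros, then the final 1      */
  double total    = 8*zero_ctx + eof_ctx;
    printf("~%.0f bits = ~%.0f bytes\n", total, total/8.0);  /* ~200 bits, ~25 bytes */
    return 0;
}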

Using a bijective coder, the best you can do is get log(25) = ~5 bits
of information from the length of the compressed data.  (I suppose it
depends on how you model the compressed length).

You could improve compression by using a different data model, such as
adding a constant less than 1 to the counts, or modeling a bit stream
instead of a byte stream.

-- Matt Mahoney

0
Matt
1/12/2006 5:29:36 AM
> I assume if your coder is honest that the number you
> got represents the average of 100 runs its very close
>  Very interesting your coder puts out fractional bits.

The fractional bit sizes used are not averages but actual fractional
bit sizes. Their meaning is that if you tell the decoder that an index
will take a value from a range 0..M-1, which is M values, then the index
itself has size log(M) bits. For example, if I tell the decoder: the next
index has M=3, then the size of the index is log(3)=1.58496... bits.

If you have to ship just that single index, the best you can do with it
is a tapered Huffman code, which is 0,10,11 for the 3 values of the
index. That way you will be sending either a 1 or 2 bit output, based on
the value of the index itself. The average cost, over multiple and
_separate_ transmissions of such an index, is 5/3=1.667 bits per
shipment, which is 0.0817 bits above the fractional bit index size
log(3), thus about 5% larger.
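
For completeness, one standard way to build such a tapered code for a
general range M (my own sketch; the QI kit may construct it differently)
is the "phase-in" code, where the first 2^ceiling(log M) - M values get
the shorter length:

#include <stdio.h>

void tapered_emit(unsigned v, unsigned M)     /* print the code for v in 0..M-1 */
{ unsigned k = 0, t; int b;
    while ((1u << k) < M) k++;                /* k = ceiling(log2 M)            */
    t = (1u << k) - M;                        /* number of (k-1)-bit codewords  */
    if (v < t)                                /* short codeword                 */
      { for (b = (int)k - 2; b >= 0; b--) putchar('0' + ((v >> b) & 1)); }
    else                                      /* long codeword                  */
      { v += t; for (b = (int)k - 1; b >= 0; b--) putchar('0' + ((v >> b) & 1)); }
    putchar('\n');
}

int main(void)
{ unsigned v;
    for (v = 0; v < 3; v++) tapered_emit(v, 3);   /* prints 0, 10, 11 */
    return 0;
}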

If you are shipping several items I1, I2,... (which need not be related
to each other), that have such fractional sizes (meaning there is some
max for each, M1, M2,...known to decoder) you can combine the indexes
into a single mixed radix value V, which in turn is also a fractional
bit item, since its range size is the product of range sizes as: M =
M1*M2*... In this case you would pay a little (such as 5% on avg.
above) _only on the total_ bit fraction for V, while all the individual
item fractions have now been added exactly to obtain fractional size of
V. To compute combined package index V, you interpret I1,I2,... as
digits in mixed radix M1,M2,... and then you convert that number into
binary format using:

     V = I1 + I2*M1 + I3*M1*M2 + ...        (1)

You also calculate M=M(V), which is total range size of V as:

     M = M1 * M2 * M3 * ...                 (2)


Obviously, (1) requires arithmetic precision which grows as much as the
size of the index V. I have run into this kind of little coding task
many times, and the best one could do with it were Huffman codes, which
would not only miss the ideal size of log(M) bits by a few percent (on
avg.), but the output size would also fluctuate from sequence to
sequence, so it didn't package well into pre-assigned fixed size space.
The ideal way, using (1), becomes impractical as soon as V grows beyond
64 bits.
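
Here is what the straightforward version of (1) and (2) looks like while
V still fits in a machine word (my own illustration; it is exactly this
approach that breaks down once M1*M2*... grows past 64 bits):

#include <stdio.h>

typedef unsigned long long u64;

/* eq. (1): V = I1 + I2*M1 + I3*M1*M2 + ...;  eq. (2): M = M1*M2*...   */
u64 pack(const unsigned *I, const unsigned *M, int n, u64 *range)
{ u64 V = 0, W = 1; int i;
    for (i = 0; i < n; i++)
      { V += (u64)I[i] * W;       /* I[i] scaled by M1*...*M(i-1)      */
        W *= M[i]; }              /* running product of ranges         */
    *range = W;
    return V;
}

void unpack(u64 V, const unsigned *M, int n, unsigned *I)
{ int i;
    for (i = 0; i < n; i++) { I[i] = (unsigned)(V % M[i]); V /= M[i]; }
}

int main(void)
{ unsigned M[3] = { 3, 5, 7 };              /* three independent index ranges */
  unsigned I[3] = { 2, 4, 6 }, J[3];
  u64 range, V = pack(I, M, 3, &range);
    unpack(V, M, 3, J);
    printf("V = %llu out of %llu values; decoded %u %u %u\n",
           V, range, J[0], J[1], J[2]);
    return 0;
}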

That is one type of problem on which this new quantization scheme, QI,
does its trick: with QI you can compute the combined index (1) and its
size (2), for a tiny cost in precision (a very small fraction of a bit),
using only N-1 regular-precision integer multiplies for N items.
Note that the whole array of 10^6 items in alphabet A=3 was computed by
the QI.exe test program as this packaging problem, with M1=M2=...=3.

Let me explain how QI does it for this case, with A=3 and N items being
combined. In that case (1) and (2) become:


     I = D1 + D2*3 + D3*3^2 + D4*3^3 +... DN*3^(N-1)   ... (3)

     M = 3^N     ... (4)

Above, D1,D2,D3,...,DN is the sequence of N digits in base 3, which is
our interpretation of the input sequence. The old-style radix
conversion using (3) would need arithmetic precision of log(M) = N
log(3) bits. With QI you do the same number of multiplies as in (3),
except in regular integer precision (the QI source is set to 32-bit
precision).

The basic tool QI uses for this is "Sliding Window Integer" (SWI),
which is a numeric type like a floating point (FP) number, except that
it is an integer (see [T3] p. 7). You can look at it as a hybrid of
regular FP numbers and unlimited precision integers. Or like an FP with
more flexibility in rounding and operations. Or like an unlimited
precision integer with some constraints on its bit pattern.

In any case, an SWI is specified via 3 items: g=precision in bits,
m=mantissa (sliding window, an integer of max width g bits) and
s=exponent (shift, regular integer). Since we'll use fixed g=32, we
don't need to drag g around explicitly any more.

With that, some SWI variable Q is given via a pair of integers Q=(m,s),
meaning Q=m*2^s, i.e. m shifted left s times. The value Q is used in
arithmetic as if it were some long integer m*2^s, about which we
know that only the 32 bits given in its component m are nonzero and
that there are s zeros following them. In the source code, the header
Qi.h has a structure SWI, which is:

typedef union _swi { // SW Integer type
  struct {
    dword m;    // mantissa: 32 bit unsigned
    int e;      // exponent: signed 32 bit int
    };
  qword q;      // 64-bit alias e.g. for 64-bit compares
} SWI;


Otherwise the arithmetic with Q is exactly the same as with a large
integer of that size (the result of such operations is generally not an
SWI, since extra significant bits may be produced), except that
computing with Q is faster and Q takes much less memory than a large
integer of similar magnitude (which would be g+s bits wide). Note that
for integers X < 2^32, the SWI mantissa is simply that number X
and the exponent s is 0. For larger numbers, s is nonzero and we keep
the 32-bit mantissa m normalized, i.e. 2^31 <= m < 2^32.

There is only one extra operator used for SW numbers which is lacking in
regular large integer arithmetic, and that is rounding, i.e. rounding up
to g bits of precision: any large integer X is converted into an SWI
format number Q by copying the leading g bits of X into the integer m
and placing into s the count of bits remaining in the tail of X (tail:
the bits we didn't copy into m); then, if the tail had any nonzero bit,
we increment m (on overflow we renormalize it and increment the exponent
s). Since this is an important operation below, I will denote the SW
rounding of X as {X}sw.
The QI source has a file Swi.c which implements the SWI arithmetic (as
needed by the rest).

The QI method applied to our problem in eq. (3) breaks into 2 phases:

 a) Compute quantized power array Q[i]=A^i for i=0,1,...N
    (its elements are of type SWI):

    SWI Q[N+1]={1,3, 3^2, 3^3,..., 3^N};

 b) Use array Q[] to compute I via radix expansion, eq. (3).


To compute the power Q[i+1] for radix A we use Q[i+1]=Q[i]*A, and we
initialize Q[0]=(1,0), Q[1]=(A,0)  (see also the function new_radix() in
Radix.c for an actual implementation). So far this is the same as if
one were building a regular power table for radix conversions.

The key new element QI brings in here is the handling of the
multiplication Q[i]*A. If Q[i] were just a 32 bit integer, the product
would be a 64 bit integer. Denote Q[i]=(m,s), where m=mantissa of Q[i]
and s=exponent of Q[i]. The SWI arithmetic works like large integer
arithmetic, hence Q[i]*A = (m*A)*2^s. The result is not an SWI any more.
We convert it to an SWI using the rounding up operator {}sw (which
yields an SWI variable):

       Q[i+1] = { m*A*2^s }sw    ...   (4)

The function implementing multiplication with rounding up in (4) is
given as: SWI swuMul(SWI x,dword y) in the file Swi.c.

** An important note is that (4) is the only place we will use rounding
up. From here on, _all_ operations are _exact_.

With the array Q[] computed, we apply (3) to compute the index (which is
the binary value of the radix-3 number) for the sequence of digits
D[N]={D1,D2,...,DN}. The basic step in (3) is the multiplication Di*Q[i],
i=1,2...N, and the addition of the result to the sum I. The product
X=Di*Q[i]=Di*m*2^s is a large integer with up to 64 nonzero bits,
followed by s trailing zeros from Q[i]. Hence adding X to I is done by
adding the 64 bit integer Di*m into the buffer I at the bit offset s.
The source file Swi.c has a function swuMulAdd() which combines this
multiplication Di*Q[i] and the addition into I at position s. (There are
also functions swuMulx() and swxAdd() which can perform the same in two
separate steps.)
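
To see the two phases end to end, here is a self-contained toy version
(my own sketch, not the QI source: the SW struct and sw_mul_round_up()
below are stand-ins for SWI and swuMul(); N is kept at 40 so the whole
index fits into 64 bits, and the division-based decode is a
simplification of what dec_radix() does on the sliding-window
representation):

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

typedef struct { uint32_t m; int e; } SW;    /* value = m * 2^e, m < 2^32        */

SW sw_mul_round_up(SW q, uint32_t A)         /* eq. (4): the only rounding step  */
{ uint64_t p = (uint64_t)q.m * A;
  SW r; r.e = q.e;
    while (p >> 32)                          /* renormalize mantissa to 32 bits  */
      { p = (p >> 1) + (p & 1); r.e++; }     /* round up if a dropped bit was 1  */
    r.m = (uint32_t)p;
    return r;
}

int main(void)
{ enum { A = 3, N = 40 };                    /* 3^40 < 2^64, so I fits in 64 bits */
  SW Q[N+1];
  uint32_t D[N], d;
  uint64_t I = 0, r, qi;
  int i;

    Q[0].m = 1; Q[0].e = 0;                  /* phase a): quantized power table  */
    for (i = 0; i < N; i++) Q[i+1] = sw_mul_round_up(Q[i], A);

    for (i = 0; i < N; i++) D[i] = (uint32_t)(rand() % A);   /* digits to encode */

    for (i = 0; i < N; i++)                  /* phase b): exact index, eq. (3)   */
      I += ((uint64_t)D[i] * Q[i].m) << Q[i].e;

    r = I;                                   /* decode: the round-up guarantees  */
    for (i = N-1; i >= 0; i--)               /* the lower terms stay below Q[i], */
      { qi = (uint64_t)Q[i].m << Q[i].e;     /* so plain division recovers D[i]  */
        d = (uint32_t)(r / qi);
        if (d != D[i]) { printf("mismatch at %d\n", i); return 1; }
        r -= (uint64_t)d * qi; }

    printf("all %d digits decoded; Q[N] = 0x%X * 2^%d\n", N, (unsigned)Q[N].m, Q[N].e);
    return 0;
}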

The result of all this is the index I. The number of bits in I is
determined using the fact that I is always smaller than Q[N] (I can
take values from 0 to Q[N]-1). Hence the number of bits in I is
L(I)=log(Q[N]). Since Q[N]=(m,s)=m*2^s, L(I)=s+log(m). Function
radix_szf() in Radix.c computes this fractional size from given N
digits and power array Q[],  while function radix_sz() returns the
rounded up integer number of bits.

Note that, because of the rounding up in (4), the value Q[N] is a little
larger than the exact integer A^N, therefore log(Q[N]) will be larger
than log(A^N)=N*log(A) (that's the number shown as the 'exact entropy').
The difference between the two is 1.62e-4 bits for A=3 and N=10^6. In
the earlier post I used the value of L(I) rounded up to the next integer.
That packaging wastes about 1/2 bit. Since the decoder will have the Q[N]
table as well, we can use the mantissa m of Q[N] to package the upper 32
bits of I (which are extracted at position s in the buffer I) via tapered
Huffman codes. In the test we had Q[N]=(0xB521509A,1584931), and the
average tapered Huffman length for x < 0xB521509A is 31.5867
bits, hence that would get us on average to within 0.087 bits of the
exact fractional bit value for the whole index I.
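
Those two size figures can be checked from the quoted Q[N] alone (a
couple of lines of my own, not from the QI kit): the fractional size
s + log(m) comes out about 1.6e-4 bits above N*log(3).

#include <math.h>
#include <stdio.h>

int main(void)
{ double m = 3038859418.0;                /* mantissa of Q[N], i.e. 0xB521509A  */
  double s = 1584931.0;                   /* exponent of Q[N]                   */
  double quantized = s + log2(m);         /* L(I) = s + log(m)                  */
  double exact     = 1e6 * log2(3.0);     /* N*log(3), the 'exact entropy'      */
    printf("quantized %.4f  exact %.4f  excess %.2e bits\n",
           quantized, exact, quantized - exact);
    return 0;
}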

> You mention in paper there are 3 parts. One part is the
> lenght of thing compressed and the other is the number
> of ones and the third which I assume the above is the index.

That's correct. The index alone is generally smaller than the entropy
(for the binary coder: by 1/2 log(n) bits), but the combined count of
1's, which needs log(n+1) bits, plus the index length, is longer than
the entropy by approximately E=1/2 log(n) bits. The entropy formula
(for the binary case, p=p(1), q=p(0), p+q=1):

       H(n)=n* [p*log(1/p)+q*log(1/q)]   ...  (5)

does not include the cost of sending n and p, but does include the cost
of sending the count of 1's, the integer k. If the coder knows p, then
it can encode k in about 1/2 log(n) bits, hence you code without the
earlier excess of E=1/2 log(n).
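
As a quick numeric illustration of those two approximate 1/2 log(n)
statements (my own check on a small n=1000, k=300 example, not from the
QI kit):

#include <math.h>
#include <stdio.h>

double lg(double x) { return log(x)/log(2.0); }
double lgbinom(double n, double k)            /* log2 C(n,k) via lgamma          */
{ return (lgamma(n+1) - lgamma(k+1) - lgamma(n-k+1)) / log(2.0); }

int main(void)
{ double n = 1000, k = 300, p = k/n, q = 1 - p;
  double H   = n*(p*lg(1/p) + q*lg(1/q));     /* entropy, eq. (5)                */
  double idx = lgbinom(n, k);                 /* index size, log C(n,k)          */
    printf("entropy %.2f, index %.2f, index+log(n+1) %.2f, 1/2 log(n) = %.2f\n",
           H, idx, idx + lg(n+1), 0.5*lg(n));
    return 0;
}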

Note that in the high entropy limit, such as our test case, the exact
index (3) is the same as the entropy, while the quantized index is
slightly larger. But in the high entropy case the coder doesn't need to
send any frequency table, since all possible frequencies are enumerated
into the same index, hence the index produced by QI already has that
built in. You do need to send A and N, which will cost you about 30 bits
(the entropy formula n*log(3) didn't count the cost of sending these two
either, hence adding these two items to our size doesn't change our
distance from the entropy).


> So lets not be generous and round down. from the
> combination part you get 1584962 for the rest of
> needed overhead you need say 69 bits.
> Would it be far to say that you compressed the
> 1000000 sybmols to 1585031 bits,

Roughly, within your rounding, yes. That is the figure to
match the conditions of an adaptive AC (which is what I ran in the test).

That is not what we need to send to decode it, though. We don't need to
send the frequency table in high entropy limit coding. That frequency
table size was 39 bits. It was added only to match the cost the adaptive
AC had to pay to adapt (which is approximately the same as the cost of
sending the frequency table explicitly).

Hence, rounding up sizes for A to 3 bits and for N to 27 bits, you need
30 bits. Rounding up the index, you need 1584963 bits.
Hence the total decodable output for N=10^6 symbols, A=3 is:

     DECODABLE OUTPUT = 1584993 bits = 198124.125 bytes

That size is fixed, the same for all inputs. You can check the whole
code-decode-compare cycle in the function radix_iter() in the Tests.c
file. To verify the fixed size claim, you can corrupt the bits beyond
the declared size of the compressed index and verify that it still
decodes. Or you can check the dec_radix() function in Radix.c and verify
that the first thing it does is obtain this same index size in bits, and
then it extracts the leading 32 bits of the index using the calculated
end (it decodes from the end; the last bit is the most significant bit
of the index).

> Let's be clear about this the list is for 3 symbols the
> total bits only refers to the compressed combinations
> and file lenght it does not carry in those bits just
> exactly what those symbols are.

The coder assumes symbols 0,1,2,...,A-1. If they are anything else, you
need to send the A values separately, as a simple array matching the
order of the assumed values 0,1,2,... in the input. The array items can
be whatever size each may be (they could be 64-bit integers, or each
item can be a different size from the others).

> I am just curious you say you need 2.49 bits to code
> the number 3 where did you get the formula for L(X)
> is it based on some universal number scheme where they
> give a neat way of caulating bits needed for univerasal
> coding of large numbers.

The self-delimiting length is a number log(n)+log(log(n))+...
(as calculated by the function sdlen() in Qiutl.c). That's how many bits
one needs to code integers of arbitrary size, about which no limit is
known to the coder. Explicit codes exist (such as Elias omega) which
approximate these fractional sizes with integers, averaging about the
same for large enough samples. You can check [1] for a survey and a list
of such codes.
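
A naive loop already reproduces the L(N)=27.543 figure quoted earlier
(my own approximation; the actual sdlen() in Qiutl.c may treat the
termination and small arguments differently):

#include <math.h>
#include <stdio.h>

double sdlen_approx(double n)       /* log(n) + log(log(n)) + ..., while positive */
{ double sum = 0.0, t = log2(n);
    while (t > 0.0) { sum += t; t = log2(t); }
    return sum;
}

int main(void)
{ printf("L(10^6) ~= %.3f bits\n", sdlen_approx(1e6));    /* ~27.543 */
  return 0;
}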

> Again these are bits you have
> to code them with whole number of bits don't you?

As with other fractional counts, no, you don't need to pay the full
rounding up cost. Since you have A and N (both are self-delimiting in
the general case, with no upfront limits on A & N), you can enumerate
the two together and send just one self-delimiting number, so you round
just one fraction for the 2 numbers. If you're really after every last
fraction, you can also enumerate the two together with the top 32 bits
of the index, so you only pay a single fractional-bit rounding on the
total output.


-- References ( http://www.1stworks.com/ref/RefLib.htm )

T1-T3 are on http://www.1stworks.com/ref/qi.htm

1. Peter Fenwick "Punctured Elias Codes for variable-length coding of
the integers (1996)"
http://citeseer.ist.psu.edu/fenwick96punctured.html

0
nightlight
1/12/2006 8:52:06 AM
Matt Mahoney wrote:
> David A. Scott wrote:
> >    Matt if I can get nightlight to commit to coding his example of
> > the 3 symols types. I would like to play again with fpaq0. To see
> > how much better it can be made with as little change as possible.
> > I like your style but I don't think I will go to the wall and make
> > it bijective. But the nine times for each eight can be changed to
> > eight for eight with a ninth only needed for the last byte.
>
> There is some room for improvement.  I tried compressing 10,000,000
> bytes of random characters A, B, C.  fpaq0 compresses it to 1,982,988
> bytes.  The theoretical limit is 1/8 lg 3 = 1,981,203, a difference of
> 1785 bytes.  For 1,000,000 bytes it compresses to 198,322 bytes, a
> difference of 201.7 bytes.
>

   Actually the ideal length is 10,000,000 * lg(3)/8, which is
1981203.12590144522681717367993477 bytes.
1981203 bytes is actually not enough; you need to round up
to get 1981204 bytes.   That is what I get using the
two files of 10,000,000 bytes: the first is 333,334 bytes of A
followed by 333,333 bytes of B followed by 333,333 bytes
of C; the same with a file of 10,000,000 bytes that is
ABCABC.... ABCA.  Both compress to the same size; check
it out. It's a plain simple bijective coder using old methods,
or at least that's what some think. The code is straightforward
and easy to use.

see

http://bijective.dogma.net/nitelite.zip


David Scott

PS  I may be drunk. Take it for what it's worth; trust no one
who can't test with real code. Don't even trust me;
test it. Remember I am an idiot who now believes
someone drugged me.  Take it with a grain
of salt.

0
davvid_a_scott
1/13/2006 5:39:07 AM
Since the debate has wound down, here is a condensed summary of the
differences between QI and arithmetic coding (AC) all in one place.


QI is an advance of enumerative coding (EC) which optimally solves
the fundamental problem of unlimited precision EC.
The solution to this problem has been sought throughout the
four decades since the Lynch-Davisson 1966 coder, with various
attempts extending to at least the year 2003. All such attempts
resulted in partial solutions only, involving significant
tradeoffs in each case. Arithmetic coding (AC) itself
arose as one such attempt, by Rissanen in the 1970s, and it could
be viewed as the most successful among such partial and
sub-optimal solutions of the EC precision problem (that is
how Rissanen viewed it in his early AC papers as well).

For the rest of this post I will address QI's solutions
for the four principal remaining weaknesses and tradeoffs
introduced by the AC algorithm.


A1) -- SPEED & POWER CONSUMPTION --

AC's use of complex, power-hungry instructions (mul/div when
coding at its maximum precision) and its requirement for coding
operations on the more probable symbol (MPS) result in speed &
power consumption penalties (the latter becoming increasingly
important). In contrast, QI performs no coding operations on the
MPS and it uses fewer and simpler instructions on the less
probable symbol (LPS). These two performance distinctions extend
to coding of a general alphabet of size A, through all A-1 internal
nodes of the binary decomposition of the alphabet (cf. [T1], pp.
32-38).

The variety of details giving rise to the significant QI speed
advantage in different coding settings fits together in a general
pattern of QI's much better division of labor at all levels --
within the functions of the coder itself, and extending to a
similarly improved division of labor between the coder and the
modeling engine.

Within the coder proper, QI separates cleanly the general
combinatorial properties of all symbol sequences satisfying some
types of conditions (the "enumerative classes") from the
incidental properties distinguishing individual sequences of
that type. The general part (the quantized enumerative addends
via eq. (21) p. 8, [T3]) is computed up front, once and for all
and outside of the coding loop, with the results saved into
universal tables (which are independent of source
probabilities). The coding loop for a particular instance does
only the absolute minimum work that deals exclusively with the
individual properties of that instance (the index computation,
eqs. (21),(22) p. 8 [T3]).

Similarly, the QI/EC modeling engine (cf. p. 27, [T2]) processes the
entire finite sequence being encoded, decides on its
decomposition into enumerative classes (ranging from simple
segmentation of the input into fixed or variable contiguous
blocks, through the BW transform and selection of the optimal BW
output column segments), then hands its complete output to the
suitable enumerators within the encoder for index computation
within the selected enumerative classes. Both components, the
modeler and the coder (enumerator), perform their specialized
tasks on the entire input, without interacting symbol by symbol
as is done with AC.

Therefore, the speed & power consumption edge of QI over AC is
not a result of a coding trick or a lucky pick of parameters
which happens to work well in some cases or any such accidental
circumstance. The QI speed gains are large for all inputs -- for
all source parameters and for all input sizes. It is a
fundamentally much more efficient way to do the coding, in the
same way that Henry Ford's production line was a much more
efficient way to build cars than, say, organizing the same
number of workers & machines so that they all work together on
the same, single car, from the raw materials until it is out the
door, and only then start on the single next car. The latter
organization corresponds closely to the division of labor used
by AC, within the coder itself and between the coder and its
modeling engine (car <=> codeword, materials for one car <=> one
input symbol, workers/machines <=> coder & modeler functions).


A2) -- PROBABILISTIC PARAMETRIZATION --

The AC reparametrization of EC enumeration of finite sequences
into the probabilistic framework (where the exact combinatorial
parameters of finite sequences are replaced with normalized
limit values of infinite sequences), can generally provide only
a lower resolution, approximate models for a much richer space
of finite sequences coding problems (which includes all
practical coding tasks). In contrast, QI modeling interface uses
precise finite sequence parameters, which are richer, sharper
and more flexible language for modeling finite sequences.

As a result, in the field of constrained coding (used e.g. in
recording media and general constrained channel coding), where
such finer controls over the precise finite sequence parameters
are vital, EC remains the method of choice, despite the
intervening advent of AC and the performance drawbacks of the
unlimited precision EC.

In the wider realm of practical coding, AC's loss of resolution
in the space of possible parametrizations has generally narrowed
down the spectrum of modeling algorithms useful with AC, to
essentially the PPM & CTW type of algorithms as the apex of
modeling. It has also constrained the language and the type of
parameters that a modeling engine can use to transmit all it
knows about the finite sequence being encoded to the coders,
reducing it in practice to 'probabilities of the next single
symbol'. Yet, presently the most widespread and the most
practical general compression algorithms, such as LZ & BWT
families, perform "surprisingly" effective modeling of finite
sequences in what are intrinsically the finite sequence
parametrizations (dynamic dictionary entries for LZ, or context
sorting via BW block transform for BWT) without ever computing
'probabilities of the next single symbol' (or any probabilities
at all).

Another side-effect of the ill-fitting parametrization for
enumeration of finite sequences is a performance penalty.
Specifically, if one were to try emulating with AC the
streamlined table-based coding of QI (Schalkwijk has shown how
this can be done for the Elias algorithm, cf. p. 19-20 [T2]), so
that AC would need encoding operations only for the LPS, while
skipping the MPS, and would have the LPS addends precomputed and
ready in a table, one would need a separate table of the same
size as QI's table for each source probability distribution. In
the binary case this would increase the table size by a factor O(n)
over QI's universal table size, which is the factor QI
gained over the exact EC (i.e. AC did not solve at all the EC
table size problem arising in the high speed mode). In short,
the probabilistic parametrization lacks the resolution to draw
the precise line separating cleanly the universal properties
(which can be precomputed into a table) from the instance
enumerative properties of the finite symbol sequences.


A3) -- CODING PRECISION (REDUNDANCY) --

The AC choice (A2) is accomplished through Stirling and several
further approximations of the enumeration itself (which are in
addition to the finite precision approximation, cf. [T3] p. 2,
[T2] pp. 22-25), which, taken together with the suboptimal AC
quantization, result in an AC output excess of the general
form:

    D = O(log(N)) + O(N) + O(1)     ...  (1)

bits over the optimum finite precision solutions computed by QI.

The O(log(N))-type terms in D are due to excess in the cost of
transmitting probabilities via 'learning' for an adaptive AC, or
to failing to account for the shortening of the rest of a
finite sequence during coding for a static AC (a side-effect of
(A2)). These costs can in some situations be largely avoided by
AC, e.g. by using the KT estimator for Bernoulli sources or the
'faithful probabilities' (the decrementing AC in [34]). But even
when applicable these methods involve tradeoffs (e.g. a speed
penalty and a complete disconnect from the conventional AC
modeling engine for a decrementing AC; while for KT, an
increased pointwise excess in all but the low entropy density
range and an increased average excess on composite sources).

The O(N)-type term in D is the result of QI's optimal
bottom-up vs AC's suboptimal top-down index quantization (and of
the resulting AC truncation of generally infinite fractions, one
of the consequences of the ill-fitting finite sequence
parametrization (A2), e.g. when dividing its current
interval among the alphabet symbols -- a step absent in QI).
Despite appearing significant in D, it dominates only for a large
number of symbols N, for large alphabets A, or for
intermediate sizes of the two when present together. That term
is approximately:

     O(N) = 2*N*(2A-1)*log(e)/2^g     ... (2)

bits. { g is the coder arithmetic precision, A the alphabet size;
cf. eq. (20) p. 14 [41a] for AC, and d(g) p. 8 in [T3] + Radix.c
in [QIC] for QI -- see function enc_radix() and note that the
function it calls, swuMulAdd(), uses exact mul & add without
rounding, thus it does no quantization of the interval partition
between the A alphabet symbols, while AC quantizes these A
partitions; you can also verify that QI's redundancy does not
grow with the alphabet by running QI.exe with option "cr<radix>"
and looking at the excess term shown as Q = ... = E+<absolute
excess>.}

Note that in our earlier 'contest' we had used A=3, which is via
(2) the most favorable A>2 test for AC since using the smallest
non-binary alphabet size A minimizes the O(N) term. Had we used
a 32 bit value A32 for the alphabet size, QI's redundancy would
remain unchanged, while AC's O(N) redundancy would increase by
a large factor A32/3 (note though that O(N) is not the total AC
excess over QI in (1), hence the total excess would not grow by
the same factor A32/3).

For the binary alphabet, A=2, eq. (2) yields for the O(N) term
of the full difference D:

     O(N) = 6*N*log(e)/2^g     ... (3)

Or, expressed as a ratio of AC's over QI's maximum quantization
redundancies:

     AC(N) / QI(N) = 4        ... (4)

i.e. even for binary alphabets, which is the best case for AC,
the AC's sub-optimal quantization "leaks" four times more
excess bits per symbol than QI's optimal quantization.

{ Note: although the expressions (2),(3) refer to the maximum
redundancies of the coders, the average and pointwise
redundancies are 2-3 times smaller for both coders. Since there are
no such simple closed-form expressions for these, one can only
measure them. The executable QI.exe included in [QIC] has a
command line option "ct" which measures QI's quantization
redundancies, maximum and average (and several others), on _all_
quantized binomials for N up to 2^20 (this max N can be changed
by editing the constant "#define MAXTN 1*MEG" in Intro.h).}
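For readers who want to see roughly what the bounds in (2)-(4) amount
to, here is a quick numeric evaluation in C. It simply plugs
illustrative values of N, A and g into the formulas as stated above
(the chosen N=10^6 and g=32 are assumptions for illustration); it
does not measure any actual coder.

#include <stdio.h>
#include <math.h>

// Evaluate the maximum-redundancy bounds of eqs. (2)-(4) for sample
// N, A, g.  Purely a plug-in of the formulas quoted above.
int main(void)
{ double N = 1e6;                    // number of symbols (illustrative)
  double g = 32.0;                   // coder arithmetic precision, bits
  double log2e = 1.0/log(2.0);       // log(e) in bits
  double A;

  for(A = 2; A <= 256; A *= 2){                     // eq. (2), AC bound
    double ac = 2.0*N*(2.0*A - 1.0)*log2e/pow(2.0,g);
    printf("A=%4.0f  AC max quantization excess ~ %.3e bits\n", A, ac);
  }
  // binary case: eq. (3) for AC, and AC/QI = 4 per eq. (4)
  double ac2 = 6.0*N*log2e/pow(2.0,g);
  printf("A=2: AC ~ %.3e bits, QI ~ %.3e bits (ratio 4)\n", ac2, ac2/4.0);
  return 0;
}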

Finally, the terms of O(1) type in D, which are 2-4 bits
(depending on inputs & specific AC & QI implementations),
despite small absolute size, may dominate the difference D when
the outputs themselves are very small (e.g. in low entropy limit
or generally for short outputs at any entropy rate).

The empirically observed compression efficiency differences
shown in the QI vs AC performance table (p. 10, [T3]) are
dominated (in all but the last row) by the contributions of the
O(log(N)) and O(1) terms in D.

Although these differences are clearly quite small, such
observation is somewhat tautological, since the types of inputs
considered (probabilistic sources with sufficiently large N) are
those on which AC or Huffman coders perform reasonably well.
These inputs are merely a subset of all practically important
finite sequences for which O(log(N)) and O(1) terms don't amount
to very much relative to the output size. A high precision
coding tool such as QI, or even the unlimited precision EC, will
by the very definition of such subset, improve very little over
AC or Huffman on that subset. To put it another way, if we were
to take modern high precision surgical instruments to ancient
Roman physicians to test and evaluate, they would find very
little if any gain for "surgery" with the new instruments. Of
course, what they considered "surgery" consisted only of
procedures which could be done well and safely with the more
blunt instruments they had. Much of the vast realm of present
day surgical procedures, which are accessible only to the high
precision modern instruments, would have been invisible or well
outside of what they understood as a conceivable "surgery".

Similarly, there is a large unexplored realm of practically
important finite sequences and associated modeling and
compression algorithms for such sequences, all virtually
invisible from the probabilistic parametrization &
predictive/adaptive modeling viewpoint. From that vantage point,
such sequences typically appear as highly fragmented and very
unpredictable from one fragment to the next, with "low" intra-
fragment entropy, but "high" inter-fragment (combined) entropy,
thus they are incompressible by virtue of noise from the O(1)
and O(log(N)) excess terms in D becoming comparable in size to
the encoded fragment sizes. QI, being a high precision ultra low
noise coding instrument, optimal at any given arithmetic
precision, yet _simultaneously_ extremely fast across the
spectrum of inputs and universally applicable, opens the gates
into this untapped realm of finite sequences, modeling and
compression algorithms, in which the relative gains vs AC or
Huffman are not limited to mere few percent or even to 100%.

To illustrate this point, consider the BW transform output
column R (cf. Fig. 5, pp 34-35, [T2]). For increasingly long
right contexts of the symbols in R, R becomes increasingly
fragmented, with low intra-fragment and high inter-fragment
entropy (when partitioned at MDL optimum), which are well beyond
the useful resolution of AC or Huffman to encode separately and
thus optimally (both perform very poorly when applied directly
to R, be it in one piece or at any level of fragmentation). The
optimum segmentation of R (in the MDL sense, which is
computable via Huffman-like bottom-up "MDL greedy" context
merging) will generally vary in context depth across R, and it
is a function of coder precision & noise, with longer contexts,
thus finer partitions, accessible only to the high enough
precision coders. In the absence of high precision low noise
coder, R is presently coded with blunt ad hoc tools, such as
MTF, which obliterates all the context boundaries, so finely
drawn just moment before by the BW transform. The MTF is then
usually followed by one or more of runlength, Elias, Huffman
codes, sometimes by AC (with little practical benefit to
compensate for the speed penalty). As a simple indicator of the
degree of sub-optimality introduced by the existing second
phase BWT methods, the output size is measurably sensitive even
to the mere 1-1 remapping of the alphabet (sensitive enough,
that specialized algorithms have been developed to search, a la
TSP, for the optimum BWT alphabet mappings).

In addition to the general compression scenario where BWT is
used, many specialized problems of great practical importance
offer a glimpse at this same unexplored realm waiting for the
high precision, low noise coder. Most notable among these are
incremental frame updates in video coding, be it for live video
or for recording & playback media. The residue after inter-
frame redundancy removal, including motion compensation, is a
collection of small, uncorrelated fragments with relatively
low intra-fragment and high inter-fragment entropies, but where
the O(1) and O(log(N)) terms of AC or Huffman, accumulated over
multiple fragments, would wipe out any gains if one were to try
coding such fragments separately. Even the conventional EC in a
suboptimal hierarchical approximation has been found overall
advantageous vs AC in this domain. A similar conflict between
the optimum MDL segmentation vs. the coder noise is encountered
in various forms across the image coding domains, especially for
methods based on hierarchical set partitioning (cf. citations on
p. 2 [T3]).

As the last example of practically important coding tasks in the
high precision coder realm, we note the coding of complex data
packages, which often arise as a result of serialization (e.g. for
network transmission, messaging, storage, output of
interpreters, etc), and which contain many small elements,
mutually unrelated at any level where a coder or its modeler
would have access to, and which have low intra-fragment and high
inter-fragment entropies. In most cases, trying to code such
data via AC or Huffman would likely increase the size, thus one
often simply stores such data as given, or at best, if the
programmer time is inexpensive enough, uses some minor
specialized ad hoc tidying up. For these types of 'complex data
package' coding tasks QI provides not only the needed high
precision coding; its advantage (A4), described below, also makes
possible the traversal of and random access to the individual
package components without having to decompress any components,
store separate compressed lengths, or pad the compressed sizes
to some fixed maximum lengths.


A4) -- STABLE & AVAILABLE COMPRESSED SIZE --

The output size produced by AC (and even more so for Huffman)
fluctuates unpredictably from instance to instance even for
perfectly equiprobable messages. This is a manifestation of
their coding sub-optimality (A3) and their lower resolution
parametrization (A2). In contrast, the output of QI is perfectly
stable for equiprobable messages (an enumerative class), not just
down to 1 bit, but down to the exact bit fraction. Namely, in a
suitable coding setup, QI's precisely known upper bound on
the value of the index allows encoding of the leading g bits
of such an index via mixed radix codes (e.g. bundled with other
related or unrelated items into a common mixed radix number, cf.
N3 & N4, p. 9, [T3]), which means that QI can code such
equiprobable messages to sizes identical to the exact bit
fraction. AC normalizes all index bounds to 1.00.. (following
its (A2) parametrization prescription), which obliterates the
precise upper bounds on the index which exist for finite
sequence enumeration.

Consider, for example, a permutation of N elements (cf. pp.
49-50 in [T1], also permutation coder in Radix.c [QIC]). QI will
encode every instance of such permutation into the log(N!) bits
to the exact bit fraction (which is within 1/2^(g-1) from
log(N!)). AC, even coding in its most exact mode (enumerative AC
mode of [34]) will produce at least O(1) variation in output
size from instance to instance, and for the conventional
adaptive or static AC implementations, also the O(log(N))
variation, while Huffman will produce a huge O(N) variation.
Therefore, if one has to store such output in a fixed size field
(as often required for compressed database/search engine fields
or when bundled into fixed structure packets), with AC & Huffman
one has to reserve the space for the worst case input instance
(the optimum estimate of which, especially if 100% guarantee is
required, may be non-trivial for AC; while Huffman becomes
largely unsuitable due to its huge O(N) variation), adding to
redundancy, lowering performance and significantly complicating
the development of such compressors.
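To make the log(N!) figure above concrete, the exact field size a
permutation index needs can be computed in a couple of lines of C
(using lgamma for the log of the factorial); this is only an
arithmetic illustration of the fixed, instance-independent size, not
a reproduction of QI's quantized factorial tables.

#include <stdio.h>
#include <math.h>

// Bits needed for a permutation index: log2(N!) = lgamma(N+1)/ln(2).
// A coder reaching the enumerative bound needs the same field size
// for every instance of the permutation.
int main(void)
{ int N;
  for(N = 10; N <= 1000000; N *= 10){
    double bits = lgamma((double)N + 1.0) / log(2.0);
    printf("N=%8d  log2(N!) = %16.3f bits  (whole-bit field: %.0f)\n",
           N, bits, ceil(bits));
  }
  return 0;
}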

In addition to a stable output size for stable input entropy, QI
has the actual precise compressed size (precise down to a 1/2^g
bit fraction) readily available from the "enumerative class tag",
such as N for permutations (cf. p. 3 [T3], more detail in [T2]
p. 27), which is information it needs for decoding
anyway. But QI can have this precise compressed size _without
decoding_ the index and without storing the compressed size
separately -- it is available, precomputed, from its quantization
tables (e.g. quantized N! or quantized binomial C(n,k)). In
contrast, to find out the compressed size without storing it
separately, AC would need to decode the entire output, with
possible overruns beyond the end of the compressed buffer (this
occurs when AC codes most tightly, in "frugal bit" mode, where
it reads up to g bits beyond the end of the compressed data, then
after decoding "pushes" the unused bits back, which means that
outside information needs to be kept to prevent it from
accessing unmapped pages, or a g-bit padding needs to be added).
Similarly, a Huffman coder would need to either decode the entire
permutation or store the compressed size separately.

The combination of availability and stability of output size,
makes QI ideal for data base & search engine coding tasks, where
one needs not just the stable and predictable compressed size,
but also the ability to quickly traverse and randomly access
compressed components of the complex packages (db records,
serialized packages, compressed structured documents, such as
html & xml pages, spreadsheets, etc), without decompression.

As noted at the end of (A3), this QI property, even though it is
merely another facet of its high precision, low noise coding, is
an entirely separate lever from the redundancy considerations of
(A3), contributing independently to the 'opening the gate' into
the presently untapped realm of high precision coding.


=======  ONE COMMON QUESTION =======

> Willem: ... assuming I have a stream of symbols, where
> at each position in the stream, the probability distribution
> of the symbols is different, then how does QI coder adapt
> itself to all those different distributions ?

The answer was given as methods (a) and (b) described in the post:

http://groups.google.com/group/comp.compression/msg/1314ff87da597fad



--- References ( http://www.1stworks.com/ref/RefLib.htm )

QIC. QI C source code research kit, publicly available at:
     http://www.1stworks.com/ref/qi.htm

T1.  R.V. Tomic "Fast, optimal entropy coder" 1stWorks
     TR04-0815, 52p, Aug 2004
     http://www.1stworks.com/ref/TR/tr04-0815b.pdf

T2.  R.V. Tomic "Quantized indexing: Background information"
     1stWorks TR05-0625, 39p, Jun 2005
     http://www.1stworks.com/ref/TR/tr05-0625a.pdf

T3.  R.V. Tomic "Quantized Indexing: Beyond Arithmetic Coding"
     arXiv  cs.IT/0511057, 10p, Nov 2005
     http://arxiv.org/abs/cs.IT/0511057

34.  J.G. Cleary, I.H. Witten "A Comparison of Enumerative and
     Adaptive Codes" IEEE Trans. Inform. Theory  IT-30 (2),
     306-315, 1984
     http://www.1stworks.com/ref/Cleary84Enum.pdf

41b.  B. Ryabko, A. Fionov "Fast and Space-Efficient Adaptive
      Arithmetic Coding" Proc. 7th IMA Intern. Conf. on
      Cryptography and Coding, 1999
      http://www.1stworks.com/ref/RyabkoAri99.pdf

0
nightlight
1/22/2006 8:41:59 AM
"nightlight" <nightlight@omegapoint.com> wrote in 
news:1137919319.704116.113760@g44g2000cwa.googlegroups.com:

> Since the debate has wound down, here is a condensed summary of the
> differences between QI and arithmetic coding (AC) all in one place.
> 

   It only appeared to wind down when you seemed to give up on the
contest. I assumed you realized you were wrong.

....


> In the wider realm of practical coding, AC's loss of resolution
> in the space of possible parametrizations has generally narrowed
> down the spectrum of modeling algorithms useful with AC, 

  If this were true then, from your early statements, it
should be child's play to put your coder inside Matt's PAQ,
which has won several times as the best compressor. It
would make QI shine if this were possible. All you have
shown so far is how easy and flexible arithmetic coding
is when you successfully modified Moffat's code to fit your
needs. Apparently so far it was too hard to get QI to fit
in the world of Moffat and use real files so anyone could
check with the files of their choice.

.......


> 
> Note that in our earlier 'contest' we had used A=3, which is via
> (2) the most favorable A>2 test for AC since using the smallest
> non-binary alphabet size A minimizes the O(N) term. Had we used
> a 32 bit value A32 for the alphabet size, QI's redundancy would


   Look, you picked the 3-symbol alphabet size so that AC would look bad.
I guess that here you recognized that QI can't start to compare
to AC for the "contest". It is nice to have something that works
on files, isn't it? Not sure if I trust you on the A32 contest, or
whatever you mean by that; you have yet to show code for the other
contest that can use files. First of all, I'm not sure you have a grasp
of what you are proposing. If one had such a large number of symbols,
2**32, it's not likely that for any reasonable length file any symbol
would appear more than once. Have you actually thought this through?
Just wondering (I should know better than to ask), but have you based this
so-called error on the number of symbols? Or have you used what I
think most here would like to see, based on FILE LENGTH? You can't get
many symbols in a test file if they are 32 bits in length. If it's
based only on the number of symbols, the test files are going to
be very long before you get to the same number of symbols as in a short
file of only three symbol types. In fact, if they are completely random
and independent of each other, it's tough to beat the copy function.



.....


> i.e. even for binary alphabets, which is the best case for AC,
> the AC's sub-optimal quantization "leaks" four times more
> excess bits per symbol than QI's optimal quantization.
> 

   Strange, this best case which I assume is better than the
3-symbol case "leaks four times more excess bits per symbol
than QI's optimal quantization". If this is so, why could you
not complete the contest? Why does it appear that the poor simple
arithmetic coder, from one so lowly he cannot write fancy
articles like you, seems to win the contest? Oh I see, the key
words "QI's optimal quantization": is that a fancy trick to
mean that if you ever understand the problems involved, then in
theory, someday in the distant future, someone could actually
optimally finish the code, so in theory you think it would be
better? That's all nice and interesting, but I would prefer not
to use AC's best case and to go with what was originally your
idea; my only contribution to this was to suggest making it
really a practical contest and using files. After all, you still
must believe that QI could somehow be magically transformed
to do the simple contest. You have already proved how flexible AC
code is by your own work in modifying Moffat, so surely, since QI is so
much better and so much more flexible and so on, it would be
child's play to make a version that works on files as in the
"contest" you are referring to.

.....

> compensate for the speed penalty). As a simple indicator of the
> degree of sub- optimality introduced by the existent second
> phase BWT methods, the output size is measurably sensitive even
> to the mere 1-1 remapping of the alphabet (sensitive enough,
> that specialized algorithms have been developed to search, a la
> TSP, for the optimum BWT alphabet mappings).

  I actually look forward to your attempt to improve BWT. I think
it will give me the motivation I need to improve my own code in
that area. I hope that your method works better here than what so
far seems like a shortcoming in the "contest". In BWT I don't
have the best or close to it. But it might be fun to work
on it if QI and your programming team ever make a file version
so others can test it. GOOD LUCK

.....


> 
> A4) -- STABLE & AVAILABLE COMPRESSED SIZE --
> 
> The output size produced by AC (and even more so for Huffman)
> fluctuates unpredictably from instance to instance even for
> perfectly equiprobable messages. This is a manifestation of
> their coding sub-optimality (A3) and their lower resolution
> parametrization (A2). In contrast, the output of QI is perfectly
> stable for equiprobable messages (enumerative class), not just
> down to 1 bit, but down to the exact bit fraction.

  It's interesting to note that the file size I compressed to for
the contest didn't seem to fluctuate at all for a wide range of
input messages which were all the same length and perfectly
equiprobable. However, as noted in the counting theorem, there
cannot be this perfect match-up. There will have to be some message
length for which some perfectly equiprobable messages map to
some N and some to N+1. This again is the result of the counting
theorem: either you have found a clever way to avoid this problem,
or you have gaps in your final compression, thus wasting space,
which does not seem to happen in simple bijective arithmetic
coders.
 
  I love this, it's really funny. QI "down to the
exact bit fraction". It's really funny when you think
about it. I have bad news: this knowing it down to
the exact bit fraction did not seem to help in the
contest. Why is that? Is it because you still don't
understand the full scope of the problem when it
comes to compressing? If you had an understanding of
the problem you would not have put AC down with what
you seemed to first think would cause a problem, the
compression of 3 symbols, "the contest". Now it seems
that it's not so bad as you first thought, is it?
  Just what is it about no gaps and bijective to the
set of files we are trying to compress that you don't
yet seem able to grasp?
  It reminds me of when in high school I won the school
championship. The guy I was playing at the end (I was
not a member of the chess club at the time) stated
he was hot and had read all the books. He asked why
a freshman would even enter the contest. I had not read books.
He could think ahead several moves; I only one. Well, I
beat the crybaby; he had all kinds of excuses and was
really read up on chess. I played a few games for fun and
waxed him over and over. Yes, you are better read than
me, but can you write a real compression program, even
a simple one, for the contest that QI should have easily
won? Well, I don't know; so far I see a lot of quoting of
text, which would make you a good tech writer, but when
the shoe leather has to hit the ground I don't see anything
yet. But then again that's only what it looks like to me;
surely it's different for other people.





David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/22/2006 4:07:10 PM
> Appearently so far it was to hard to get QI to
> fit in the world of Moffat and use real files
> so anyone could check with the files of there
> choice.

QI source code & executable (which are available for anyone to
verify as they wish), gives you OutSizeQI in bits to use for
the compression ratio vs any AC you may have:

     RC = OutSizeAC [bits] / OutSizeQI [bits]  ... (1)

using memory buffers that you can fill with any data you wish.

When you can make a rational explanation as to how writing the
buffers to a file and reading them from it, can change RC in
(1), then I will consider your suggestion for "upgrading" the
QIC kit with file I/O. So far, you haven't given a _single
rational reason_ as to how RC in (1) can change by mere writing
& reading the memory buffers to/from the disk. We all know why
you don't want to say it, see the method (b) described here:

http://groups.google.com/group/comp.compression/msg/8d62e372056d9d53

All you have been offering is a repetitive lame whine like the
one above: "so anyone could check with the files of their
choice..." which is complete BS, given that with the QI source
_publicly available_ anyone can already check on any input
pattern given to the coder via a memory buffer (which includes,
obviously, any pattern you may read & write from/to a file).

When you give a rational reason that is consistent with all the
facts, such as the availability of QI source at:

  http://www.1stworks.com/ref/qi.htm

and the intended audience & purpose for the source release:

http://groups.google.com/group/comp.compression/msg/6d8dcafd8b947ea1

then I will listen. Otherwise you might as well insist that the
file used in the QI "upgrade" you are demanding for "real" test
must also be named as SCOTT1234.txt to count as a "real" test in
your book (which will affect RC in (1) by about as much as
any other file name, or any file at all).

A difference that adding file I/O would make, though, regarding
the QI source kit, would be to lower the signal to noise ratio
on: (A1) the speed advantage over AC, as well as on (A3) (the
optimality, e.g. by insisting on counting output sizes in bytes
only) and (A4) (stability & availability of the precise output
size). Why would I care, given the absence of any rational basis
to believe that RC in (1) will change from writing memory
buffers to a file, to waste time and to expand the QI source
size just to add gratuitous noise on top of the (A1), (A3) &
(A4) signals?


> Look you picked the 3 alphabet size so that AC would
> look bad...

It was you who proposed A=3 test (your ABC), not me. I only
responded when Matt Mahoney reported his first results on your
ABC test "challenge":

http://groups.google.com/group/comp.compression/msg/eb1fed7f8181bd31

that the QI.exe included in the source kit already has a command
line which can give you the answer for symbols 0,1,2 instead of
ABC for N=10^6 (or with recompiling for N=10^7). My response on
Matt's post was here:

http://groups.google.com/group/comp.compression/msg/ff1ee67d18b63f5a

which you questioned (without apparently reading it at all since
all you asked was already answered in the post). Then all of it
re-re-re-explained to you again here:

http://groups.google.com/group/comp.compression/msg/508e96ebcb4577f1

and more here (on fractional bit sizes):

http://groups.google.com/group/comp.compression/msg/1e015f38d228969e

and here few more times:

http://groups.google.com/group/comp.compression/msg/6d77316462d9ee42
http://groups.google.com/group/comp.compression/msg/1a6773533448f7de
http://groups.google.com/group/comp.compression/msg/69ff2ec175f1a5ed
http://groups.google.com/group/comp.compression/msg/a7be8670d64e9f25


.... etc, round and around. You ought to go read all that before
asking the same things for the 30-th time.

As to the test results on A=3, Matt showed his results, I got
Moffat98 results, all consistently showing a significant AC output
excess over QI. Since you declined to test (or at least didn't
report your results) on a proper random sample, as explained in
the posts above, you have no claim to be compared to anything.

The point made in (A2) with A32, which for whatever reason you
keep misstating exactly upside down in your post above, is that
some larger alphabet A32 >> 3 will only increase the O(N)
component of the AC excess vs QI, given as D in (A2). This
component will grow as A32/3. QI.exe will show you the excess
for any A up to A=2^32-1 (as already explained, use QI.exe cmd
line "cr<alphabet_size_A>" to see it).

> If this is so why could you not complete the contest?

There is no contest left regarding the ratio of output sizes RC.
As to your loop:

  while(1) printf("Yeah, but what about file i/o?");

you first need to explain how the file I/O will change the ratio
RC in (1), the answer to which we already know from the memory buffer
tests, with Mahoney & Moffat98 providing the AC sizes in (1) and QI.exe
providing the QI size in (1) (which anyone can verify with the QI source
kit, as explained 30 times before).

> First of all not sure you have a grasp of what you are proposing.
> If one had such a large number of symbol 2**32 its not likely
> that for any reasonable length file any symobol would appear
> more than once. Have you actually thought this through?

It appears someone else here might need a bit of "grasping".
Take some alphabet size A such that 2^31 < A < 2^32, for our
N=10^6, and run your AC on it. For example, I just ran QI.exe using
the command line (you are welcome to verify this):

     QI  cr3133061822  n1e6  i10

which is a test for A=3,133,061,822 N=10^6 and i=10 iterations
with randomly generated inputs (the iter doesn't matter for QI
since, as pointed out in (A4), QI produces always exactly the
same size, including the max index value which in this case will
always have the top 32 bits smaller than 0x8865BAF8) and the QI
output size for the index is:

  Q = 31544926.09167... bits

which is about 2.422e-4 bits above the lower bound N*log(A) for
this index. That amounts to a QI excess of 2.422e-10 bits per
symbol. You can run your AC on this A & N and tell me what you
get. Measure the total size and the maximum variation in size
(with AC you get only whole bits, since its exact upper bound on
the index is obliterated by the normalization of the index to 1.0,
which in turn is done to comply with AC's coarse-grained
probabilistic parametrization for finite sequences, enumerating
them on the cumulative probability scale, as explained in (A2)).

Recall also that just storing 32 bits/symbol for N=10^6 would use
455073 bits _more_ than QI's output of 31544927 whole bits.
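Anyone who wants to sanity-check the lower bound and the
32-bits-per-symbol comparison quoted above can do so with a few lines
of C; this only evaluates N*log2(A) for the stated N and A, and says
nothing about what a particular coder actually outputs (that has to
come from running QI.exe or your AC).

#include <stdio.h>
#include <math.h>

int main(void)
{ double N = 1e6;
  double A = 3133061822.0;                 // the alphabet size used above
  double lower = N * log(A)/log(2.0);      // N*log2(A) lower bound, bits
  printf("N*log2(A)      = %.3f bits\n", lower);
  printf("32 bits/symbol = %.0f bits (excess over the bound: %.0f bits)\n",
         32.0*N, 32.0*N - ceil(lower));
  return 0;
}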

> It's interesting to note that the file size I compressed
> to for the contest didn't seem to fuctuate at all for a
> wide range of input messages which were all the came
> length and perfectly equiprobale. ...
> I love this its really funny. QI "down to the exact bit
> fraction" Its really funny when you think about it.

The question is not "file size" (which is rounded to the next byte
at best, or even to a sector or cluster on the disk) but the size in bits
(or bit fractions). As explained in (A4) on several examples,
there are many practically important cases where the size in bits,
and even the size in bit fractions, matters (whenever you have many
such little leftovers). That you can imagine or cite cases where
such precision doesn't matter, such as when storing output into
a free-size file on the disk, is a non sequitur for the point
made in (A4). The point in (A4) is that there are practically
important cases where such precision does matter, and there QI is
distinctly ahead of AC (let alone Huffman).

That the bit fractions do matter (and what they mean) should
already be obvious even from our tests on A=3, where we were
trying to pack a symbol x<3 as closely to the exact bit fraction
log(3)=1.584962... as possible, instead of just storing each
symbol in 2 whole bits, or coding it with a Huffman code such as 0:0
1:10 2:11 (which gets you 1.66.. bits/sym on average).
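The per-symbol figures above follow from a one-line calculation each;
for completeness, here is the arithmetic spelled out in C
(equiprobable symbols assumed, as in the test):

#include <stdio.h>
#include <math.h>

int main(void)
{ double ent  = log(3.0)/log(2.0);       // log2(3), bits per symbol
  double huff = (1.0 + 2.0 + 2.0)/3.0;   // Huffman 0:0 1:10 2:11, average
  printf("log2(3)     = %.6f bits/symbol\n", ent);
  printf("Huffman avg = %.6f bits/symbol\n", huff);
  printf("N=10^6 totals: %.1f vs %.1f bits\n", 1e6*ent, 1e6*huff);
  return 0;
}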

The point of (A4) in this case (A=3) is that if you need to store
compressed inputs with N=10^6 symbols into fixed size fields (or
skip quickly over compressed data in a larger package containing
them), with the flat 2-bit code per symbol you will need to
reserve space for 2*10^6 bits. Huffman, which on average reaches
1.66... bits/symbol, will also need to reserve 2*10^6 bits to
guarantee that _all_ sequences from the set of possible inputs
(I called it Set_QI) will fit into the reserved space. With AC,
depending on the implementation, you may need about 20-60 bits, as in
Mahoney's reported tests, to guarantee a fit for _all_ possible
A=3, N=10^6 sequences { much more for Moffat98, which has a
sub-optimally skewed quantization, as explained here:

http://groups.google.com/group/comp.compression/msg/efa6336f483bbb89

in a post to Thomas Richter).

The alternative with AC and Huffman to reserving the extra space,
in cases where you only need to traverse the compressed data
_quickly_ within a larger package of items (which
means without decompressing) but not store it in fixed
size record/packet fields, is to store the compressed length
separately, in addition to storing N and A (for a self-contained
compressed package).

In contrast, with QI coding these A=3, N=10^6 inputs, you not
only know that the index will certainly fit in precisely
1584963 whole bits, but you _also_ know from the quantized
power q[A^N] (which is the QI table entry) that the leading 32
bits of the index will always be smaller than the 32 bit value
0xB521509A (which means you can package the index via mixed
radix codes with other items to reclaim the N*log(3) bit
fraction to within 1/2^31 bits of the exact N*log(3) lower
bound). QI also does not need to store the compressed
length separately -- all it needs are the A and N values, which it
already needs anyway (just as the other coders need A and N in some
form) to know the exact index size and its exact upper bound.
This precision is a consequence of QI's output optimality (it
produces the index closest to N*log(3) for any g-bit addends which
also satisfy the Kraft inequality, eq. (20) in [T3]) and of its
finer-granularity parametrization (A2). That was all explained at
length in the earlier posts cited above; (A4) only
summarizes the main points.
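As a toy picture of what "packaging the index via mixed radix codes
with other items" means, here is a minimal C sketch that bundles two
items with exactly known upper bounds into one integer and unbundles
them again. The bounds and values are made up for illustration; the
actual mixed radix coder in Radix.c of [QIC] is far more general than
this.

#include <stdio.h>

// Bundle two items a < A and b < B into one mixed radix value and back.
// Because the exact upper bounds A and B are known, the pair costs
// log2(A)+log2(B) bits rather than two whole-bit fields.  Toy bounds.
int main(void)
{ unsigned long long A = 1000000007ULL, B = 3ULL;  // illustrative bounds
  unsigned long long a = 123456789ULL,  b = 2ULL;  // items to bundle

  unsigned long long packed = a*B + b;             // bundle
  unsigned long long b2 = packed % B;              // unbundle
  unsigned long long a2 = packed / B;

  printf("packed=%llu  ->  a=%llu  b=%llu\n", packed, a2, b2);
  return 0;
}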

0
nightlight
1/22/2006 8:24:00 PM
"nightlight" <nightlight@omegapoint.com> wrote in 
news:1137961440.803193.5700@g44g2000cwa.googlegroups.com:

> Subject: Re: Quantized Indexing Source Code (update & alg. history)
> From: "nightlight" <nightlight@omegapoint.com>
> Newsgroups: comp.compression,sci.math
> 
>> Appearently so far it was to hard to get QI to
>> fit in the world of Moffat and use real files
>> so anyone could check with the files of there
>> choice.
> 
> QI source code & executable (which are available for anyone to
> verify as they wish), gives you OutSizeQI in bits to use for
> the compression ratio vs any AC you may have:
> 
> 

   Apparently I misunderstood your comments below in message
number 94 

http://groups.google.com/group/comp.compression/browse_frm/thread/7053c23c0d01c81c/1e015f38d228969e?lnk=st&q=1584993&rnum=1&hl=en#1e015f38d228969e

QUOTE ON

 Hence the total decodable output for N=10^6 symbols, A=3 is:

     DECODABLE OUTPUT = 1584993 bits = 198124.125 bytes

That size is fixed, the same for all inputs. You can check the whole
code-decode-compare in the function radix_iter() in Tests.c file.

QUOTE OFF

  Maybe my code that is in

http://bijective.dogma.net/nitelite.zip

is way off base and doesn't seem to compress to
1584992 bits or 198124 bytes. That is, if I put it
to a buffer and included, in a poor way, the length
field. Actually the compressed file length was
198121 bytes. Maybe this is all a dream of mine.
If so, I GIVE UP, YOU WIN. I HAVE WASTED ENOUGH
TIME TRYING TO ARGUE WITH YOU. DO YOU UNDERSTAND?
I GIVE UP, YOU WIN. I GOT TIRED BEFORE YOU DID, THEREFORE
YOU WIN. I WILL NOT POST IN THIS THREAD ANYMORE,
SO GO AHEAD, BE THE LAST ONE TO POST.



David A. Scott
-- 
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
 made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0
David
1/22/2006 10:07:29 PM
I guess your AC must have choked on the 32 bit alphabet
size A32=3,133,061,822.

Otherwise, why would you start recycling "results" from your "random
unbiased sample" of five cherry-picked files, where even ZIP beats the
N*log(3) "lower bound" on index size by about 200 times? To say nothing
of comparing apples & oranges in other ways as well (such as adding
different costs for N etc., or using a 64 bit AC,... see old posts). We
don't need to waste further time flogging that particular dead horse.
We also have Matt's and Moffat98's results (you have the Moffat98 source code
link to verify my statement on that) on the proper unbiased random
sample already (you're using Matt's AC anyway). And the QI source is there
to test using the command line as described. That's all you or anyone
else needs to find out for themselves.

> I GIVE UP YOU WIN.

This has as much to do with "win" as does winning on: "which is bigger
2+2 or 3+3?" It is an elementary mathematical fact (see [T3]  pp. 2, 8
and [T2] pp. 22-25, and section A3 in the "summary" post). There is as
much of a "contest" and "win" about it as in the "2+2 vs 3+3" contest.
The room QI leaves for improvement in these N,A tests is very tiny: a
gap from QI to the entropy N*log(A) of about 2e-10 bits per
symbol for _any_ alphabet size, or 2e-4 bits for the total N=10^6
symbols (well within 1 whole bit of the entropy). Not even an _infinite
precision_ AC will beat that little gap (it will be 1-2 bits above the
N*log(A) lower bound).

The only coding algorithm using limited precision arithmetic
which could beat the tiny bit fraction left is another version of QI
(or an equivalent optimal, hence bottom-up, quantizer) using higher
precision than the g=32 bit arithmetic included in the source (at any given
addend precision, the QI addends are mathematically the smallest ones
which also satisfy the Kraft inequality, eq. (20) in [T3], and the
pigeonhole principle, eq. (8) in [T3]).

0
nightlight
1/22/2006 11:27:40 PM
Hi,

In comp.compression nightlight <nightlight@omegapoint.com> wrote:
> Since the debate has wound down, here is a condensed summary of the
> differences between QI and arithmetic coding (AC) all in one place.

Question: What is the purpose of this post?

a) Keep QI in discussion? (for example for b)
b) Promotion of QI?
c) a scientific discussion.

In case of a) or b), you are wrong here. This group is not your
favourite advertisement place. We all work for companies that pay our
bills and are all aware of that, but you should try to loosen this
relation a bit when posting here.

In case of c), you lack counterarguments to those that have been presented in
the discussion, and your posting requires shortening.

/* historical side remarks snipped */

> For the rest of this post I will address the QI's solutions
> for the four principal remaining weaknesses and tradeoffs
> introduced by the AC algorithm.


> A1) -- SPEED & POWER CONSUMPTION --

> The AC use of complex, power hungry instructions (mul/div when
> coding at its maximum precision) and the requirement for coding
> operations on the more probable symbol (MPS) results in speed &
> power consumption penalties (the latter becoming increasingly
> important). 

Objection, your honor. Whether mul/div is "slow" compared to
the QI table lookup (that is not mentioned here) is a question
of the table size and the framework the algorithm is implemented
for. You don't discuss the table sizes. Do they grow with the
data set? Side remark: State of the art desktop processors are
*much* faster on mul/div than on table lookup due to lack of
caching.

> In contrast, QI performs no coding operations on the
> MPS and it uses fewer and simpler instructions on the less
> probable symbol (LPS). 

Objection, your honor. If you mention that QI performs no coding
operation on MPS, you should mention that there are variants of AC
for which this holds. MQ is one of them. So what are you targeting
at? Speed, or optimality? I don't think you can have both.

> These two performance distinctions extend
> to a general alphabet of size A coding, through all A-1 internal
> nodes of the binary decomposition of the alphabet (cf. [T1], pp.
> 32-38).

> Variety of details giving rise to the significant QI speed
> advantage in different coding settings fit together in a general
> pattern of QI's much better division of labor at all levels --
> within the functions of the coder itself and extending to a
> similarly improved division of labor between the coder and the
> modeling engine.

In which sense "much better"?

> Within the coder proper, QI separates cleanly the general
> combinatorial properties of all symbol sequences satisfying some
> types of conditions (the "enumerative classes") from the
> incidental properties distinguishing individual sequences of
> that type. The general part (the quantized enumerative addends
> via eq. (21) p. 8, [T3]) is computed up front, once and for all
> and outside of the coding loop, with the results saved into
> universal tables (which are independent of source
> probabilities). 

Here we have a problem. How large are these tables? If I want
an improvement over AC on messages of length N -> infinity,
I would suppose the size of the tables has to grow in some power
of N to have all the binomial factors covered. Comments?

> The coding loop for a particular instance does
> only the absolute minimum work that deals exclusively with the
> individual properties of that instance (the index computation,
> eqs. (21),(22) p. 8 [T3]).

> Similarly, QI/EC modeling engine (cf. p. 27, [T2]) processes the
> entire finite sequence being encoded, decides on its
> decomposition into enumerative classes (ranging from simple
> segmentation of the input into fixed or variable contiguous
> blocks, through BW transform and selection of the optimal BW
> output column segments), then hands its complete output to the
> suitable enumerators within the encoder for index computation
> within the selected enumerative classes. Both components, the
> modeler and the coder (enumerator), perform their specialized
> tasks on the entire input, without interacting symbol by symbol
> as done with AC.

Objection: If we deal with non-all-purpose compressors, you often
have to deal with pre-defined alphabets. AC can handle this fine because
the modeller is a separate stage. How would QI handle this?

> Therefore, the speed & power consumption edge of QI over AC is
> not a result of a coding trick or a lucky pick of parameters
> which happens to work well in some cases or any such accidental
> circumstance. The QI speed gains are large for all inputs -- for
> all source parameters and for all input sizes. 

See above. No serious speed analysis has happened yet. I personally
wouldn't care, but since you want to discuss it, you should use
a scientific method to do it. 

/* snip */

Paragraph about Ford snipped. What's the value of this paragraph?
It's not part of a summary.

> A2) -- PROBABILISTIC PARAMETRIZATION --

> The AC reparametrization of EC enumeration of finite sequences
> into the probabilistic framework (where the exact combinatorial
> parameters of finite sequences are replaced with normalized
> limit values of infinite sequences), can generally provide only
> a lower resolution, approximate models for a much richer space
> of finite sequences coding problems (which includes all
> practical coding tasks). In contrast, QI modeling interface uses
> precise finite sequence parameters, which are richer, sharper
> and more flexible language for modeling finite sequences.

Might very well be.

> As result, in the field of constrained coding (used e.g. in
> recording media and general constrained channel coding), where
> such finer controls over the precise finite sequence parameters
> are vital, EC remains the method of choice, despite the
> intervening advent of AC and the performance drawbacks of the
> unlimited precision EC.

Why the method of choice? Which things become simpler, which
things become harder? Where to put pre-knowledge on the data
if there is no model, for example? (I'm not saying it's good or
bad, I just lack a clear methology here.)

> In the wider realm of practical coding, AC's loss of resolution
> in the space of possible parametrizations has generally narrowed
> down the spectrum of modeling algorithms useful with AC, to
> essentially the PPM & CTW type of algorithms as the apex of
> modeling. It has also constrained the language and the type of
> parameters that a modeling engine can use to transmit all it
> knows about the finite sequence being encoded to the coders,
> reducing it in practice to 'probabilities of the next single
> symbol'. Yet, presently the most widespread and the most
> practical general compression algorithms, such as LZ & BWT
> families, perform "surprisingly" effective modeling of finite
> sequences in what are intrinsically the finite sequence
> parametrizations (dynamic dictionary entries for LZ, or context
> sorting via BW block transform for BWT) without ever computing
> 'probabilities of the next single symbol' (or any probabilities
> at all).

That's because you're mixing two things here. If you consider
BWT + AC as one code, you would see that the purpose of BWT
is just to provide an alphabet and symbol definition to the AC
to make it efficently compressible. That is, BWT defines a model
that is suitable for most text/program compression. It is a lot
less usable model for, e.g., image compression. BWT without
entropy coding does not work. Entropy coding without BWT does
work, but not very well on, e.g. text files. Thus, BWT+AC does
a lot more than "predict the next symbol". It also defines
what "the symbol" should be, in a clever way. That is, the
"probabilities" within AC are not "probabilities of words".


> Another side-effect of the ill-fitting parametrization for
> enumeration of finite sequences, is the performance penalty.
> Specifically, if one were to try emulating with AC the
> streamlined table based coding of QI (Schalkwijk has shown how
> this can be done for Elias algorithm, cf. p. 19-20 [T2]), so
> that AC would need encoding operations only for LPS, while
> skipping the MPS, and for LPS to have addends precomputed and
> ready in the table, one would need separate table of the same
> size as QI's table, for each source probability distribution. 

That is likely to be correct (I'll buy this).

> In
> binary case this would increase the table size by factor O(n)
> over the QI's universal table size, which is the factor QI
> gained over the exact EC (i.e. AC did not solve at all the EC
> table size problem arising in the high speed mode). In short,
> the probabilistic parametrization lacks the resolution to draw
> the precise line separating cleanly the universal (which could
> be precomputed into a table) from the instance enumerative
> properties of the finite symbol sequences.

This might be, but how does the QI table size grow in n?

/* snip */

> Or, expressed as a ratio of AC's over QI's maximum quantization
> redundancies:

>      AC(N) / QI(N) = 4        ... (4)

What is the definition of AC(N), resp. QI(N) here?

> i.e. even for binary alphabets, which is the best case for AC,
> the AC's sub-optimal quantization "leaks" four times more
> excess bits per symbol than QI's optimal quantization.

> { Note: although the expressions (2),(3) refer to the maximum
> redundancies of the coders, the average and pointwise
> redundancies are 2-3 smaller for both coders. 

Upper bounds on redundancies, then? Redundancies measured 
in what?

> The empirically observed compression efficiency differences
> shown in the QI vs AC performance table (p. 10, [T3]) are
> dominated (in all but the last row) by the contributions of the
> O(log(N)) and O(1) terms in D.

Objection: There has not yet been an independent empirical 
observation. For my part, mostly because I can't compile your
source. (-;

/* snip */ Unnecessary small talk snipped.

> In addition to the general compression scenario where BWT is
> used, many specialized problems of great practical importance
> offer a glimpse at this same unexplored realm waiting for the
> high precision, low noise coder. Most notable among these are
> incremental frame updates in video coding, be it for live video
> or for recording & playback media. The residue after the inter-
> frame redundancies removal, including the motion compensation,
> are collection of small, uncorrelated fragments 

I object to the "uncorrelated". This is one model one typically
chooses here.

> with relatively
> low intra- fragment and high inter-fragment entropies, but where
> the O(1) and O(log(N)) terms of AC or Huffman, accumulated over
> multiple fragments, would wipe out any gains if one were to try
> coding such fragments separately. Even the conventional EC in a
> suboptimal hierarchical approximation has been found overall
> advantageous vs AC in this domain. 

More data please. By whom? 

> A similar conflict between
> the optimum MDL segementation vs. the coder noise is encountered
> in various forms across the image coding domains, especially for
> methods based on hierarchical set partitioning (cf. citations on
> p. 2 [T3]).

> A4) -- STABLE & AVAILABLE COMPRESSED SIZE --

> The output size produced by AC (and even more so for Huffman)
> fluctuates unpredictably from instance to instance even for
> perfectly equiprobable messages. This is a manifestation of
> their coding sub-optimality (A3) and their lower resolution
> parametrization (A2). In contrast, the output of QI is perfectly
> stable for equiprobable messages (enumerative class), not just
> down to 1 bit, but down to the exact bit fraction. 

Seems to be about correct.

So long,
	Thomas
0
Thomas
1/23/2006 12:04:36 PM
Just a quick follow-up to my follow-up:

Another item that lacks discussion/summarization if you want to say so:

QI is a LIFO coder. Whether this is good or bad depends on the
application, but it ought to be said.

So long,
	Thomas
0
Thomas
1/23/2006 2:47:16 PM
Thomas Richter <thor@mersenne.math.TU-Berlin.DE> writes:
> > A1) -- SPEED & POWER CONSUMPTION --
> 
> > The AC use of complex, power hungry instructions (mul/div when
> > coding at its maximum precision) and the requirement for coding
> > operations on the more probable symbol (MPS) results in speed &
> > power consumption penalties (the latter becoming increasingly
> > important). 
> 
> Objection, your honor. Whether mul/div is "slow" compared to
> the QI table lookup (that is not mentioned here) is a question
> of the table size and the framework the algorithm is implemented
> for. You don't discuss the table sizes. Do they grow with the
> data set? Side remark: State of the art desktop processors are
> *much* faster on mul/div than on table lookup due to lack of
> caching.

I believe I already mentioned that the not-yet-hit-the-streets kit 
on my desk will do 200 multiplies in the same time as one main-memory
access. This particular core is one where power consumption has been
a priority consideration, being not a desktop processor.

There was no response.

Have you just been trolled too?

Phil
-- 
What is it: is man only a blunder of God, or God only a blunder of man?
-- Friedrich Nietzsche (1844-1900), The Twilight of the Gods
0
Phil
1/23/2006 11:16:25 PM
>> Since the debate has wound down, here is a condensed summary
>> of the differences between QI and arithmetic coding (AC)
>> all in one place.
>
> Question: What is the purpose of this post?
>
> a) Keep QI in discussion? (for example for b)
> b) Promotion of QI?
> c) a scientific discussion.

The various aspects of the differences between QI and AC collected and
integrated here were scattered over many posts. As stated at the
top, I wanted them all in one place, where one can see the
relations among them more clearly (as the summary itself shows).
As for the larger purpose and intended audience, see the post:

M1.
http://groups.google.com/group/comp.compression/msg/6d8dcafd8b947ea1

> case of c), you lack conterarguments that have
> been presented in the discussion, and your
> posting requires shortening.

The two requirements above are mutually exclusive. As to
counterarguments, it is an open group. There were some
groundless objections already that were dealt with in the two
followups from me.

> Whether mul/div is "slow" compared to the QI table
> lookup (that is not mentioned here) is a question
> of the table size and the framework the algorithm
> is implemented for.

First, mul/div will consume more power than a simple table
lookup, especially if the tables are in ROM on a portable
device (where power consumption is a critical parameter).
The speed ratio is a more complex matter due to cache
considerations, and that's why the tests were done and
the [QIC] source released.

> You don't discuss the table sizes.

The table sizes depend on the type of enumerative classes (cf.
N2, p. 9 [T3]). For permutation codes they grow linearly with N
(number of items permuted) and similarly with general mixed
radix codes (N=number of digits). For general entropy coder, the
binomial tables have N^2/4 entries, where N is the max block
size.

The [QIC] source uses 32 bit precision and it keeps only
mantissa in the tables. It computes exponents via an array
containing N+1 entries for log(x!) in fixed point. Hence for
general entropy coding the tables are N^2 bytes. The block size
N in the source can be set from 32 to 16384. The "typical" block
size (e.g. as used in the tests) for binary coder is N=1K which
makes the table size 1MB.
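
To put those figures in one place, here is a small stand-alone C
snippet (my own illustration, not part of the [QIC] kit; it assumes
4 bytes per stored mantissa and per log(x!) entry, which matches the
1MB figure for N=1K):

  #include <stdio.h>

  // Rough footprint of the generic binary coder tables described
  // above: N*N/4 - N + 1 quantized binomial mantissas (32 bits each)
  // plus an N+1 entry fixed point log(x!) array for the exponents.
  // See new_qbct() in Qi.c for the real allocation.
  static unsigned long table_bytes(unsigned long N)
  {
      unsigned long mantissas = N*N/4 - N + 1;
      unsigned long exps      = N + 1;
      return 4*mantissas + 4*exps;
  }

  int main(void)
  {
      printf("N=1024: %lu bytes\n", table_bytes(1024));  // ~1 MB
      printf("N=2048: %lu bytes\n", table_bytes(2048));  // ~4 MB
      return 0;
  }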

This table can be cut in half by storing only every 2nd addend
front (the row of Pascal triangle) and interpolating in between
via C(n,k)=C(n-1,k)+C(n-1,k-1) (this also reduces quantization
errors since the interpolated SWI addends are not quantized but
the two terms are added directly into the index). The mixed
radix codes given in [QIC] use this "skip" method with factor A
(alphabet size) instead of 2, which makes their redundancy
independent of the alphabet size (unlike AC where the
quantization redundancy for N symbols is proportional to A*N).

Regarding cache:

C1) Not all of N^2/4 entries need to fit in the cache. Namely,
in multiblock coding of a stationary source the table entries
used are along the line y=p*x on the (x,y) lattice, where p=
probability of 1 (LPS), within a band of width sqrt(N) (for not
overly low entropy), which means you need approx. N^3/2 entries
to fit in the cache. For N=1K, that is about 3% of the 1MB, or
32KB, well within the desktop processors cache sizes.

C2) The cache miss cost can be greatly reduced to practically
negligible in the QI case due to the orderly nature of table
accesses. Namely, QI coder can simply read but not use the table
addend right away. Several statements later (or even on the next
or later loop pass) it can use the retrieved addend for
addition, which by now was long preloaded into the cache from
the main RAM. A very simple variant of this method (which is all
I tried), in which the table mantissa gets loaded one loop cycle
ahead in the very basic simpleminded implementation, improves QI
speed by about 30% on the machines with the largest cache penalty.
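
A minimal sketch of that read-early/use-later arrangement (my own
illustration of the access pattern, with made-up names; it is not
the [QIC] coding loop):

  typedef unsigned int dword;

  // Sum selected table entries, issuing the load for step i+1 one
  // iteration before its value is needed, so a possible cache miss
  // overlaps the work of the current step.
  dword sum_pipelined(const dword *tbl, const int *sel, int steps)
  {
      dword sum = 0, next;
      int i;
      if (steps <= 0) return 0;
      next = tbl[sel[0]];                  // prefetch addend for step 0
      for (i = 0; i < steps; i++) {
          dword cur = next;                // addend fetched last pass
          if (i + 1 < steps)
              next = tbl[sel[i + 1]];      // start the next load early
          sum += cur;                      // use it; the load overlaps
      }
      return sum;
  }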

C3) The CPU/CACHE vs RAM speed ratio C/M has been evolving.
The machines (all win/intel) we have around, span 1996-2005
(the oldest ones are still kept for testing). Looking at
the worst case QI vs AC speed ratio (the high entropy limit),
on the oldest machines the ratio was 6-7, then by 2000 it
dropped to 4-5, then it rose again back to 6-7 on the
latest machines. (This is all without any cache specific
code optimizations, which will work better on newer CPUs due to
greater parallelism.) The trend seems to be that the memory bus
speed has been catching up with the CPU speeds.

C4) For very low entropy sources, the [QIC] source includes a
specialized "sparse coder" with tables which cover only a narrow
strip of width MK (Max K, this can be set via QI.exe option
"mk<maxk>") near the axis y=0. Hence its entire table size is
(MK-2)*N. For very sparse arrays, due to larger block size N
available (for any given cache size), in multi-block coding
this version runs typically 2-3 times faster than the generic QI
(only generic QI was used for the comparison table in [T3]).


>> In contrast, QI performs no coding operations on the
>> MPS and it uses fewer and simpler instructions on the less
>> probable symbol (LPS).
>
> .. If you mention that QI performs no coding operation on MPS,
> you should mention that there are variants of AC for which
> this holds. MQ is one of them. So what are you targetting
> at? Speed, or optimality? I don't think you can have both.

Well, I did say upfront that I was talking about AC "coding at
its maximum precision" (such as Moffat98, [4]). Since the 3
other points A2-A4 dealt with various aspects of coding
precision advantages of QI (and the unexplored potentials
opened), mixing in some specialized solutions (be it MQ, QM, Q,
ELS, Z, runlength etc) with much lower coding accuracy in the
full spectrum of input parameters would have only added the
noise to the A2-A4 signal.

Doing a study of AC performance/accuracy tradeoffs is
a mechanical, non-creative work that you can give to
any student as a term project to play with and learn.
With so many more interesting, unexplored areas opened
by QI, fiddling with various AC approximations and
tradeoffs is like leaving a luxurious dinner prepared
by the best chefs to eat some week old fast food
leftovers from a fridge.

> Speed, or optimality? I don't think you can have both.

Well, QI does have both. It codes more accurately than the most
accurate AC, it runs much faster than the fastest full precision
AC such as [4], and it even runs faster than any less accurate
specialized quasi-ACs on their own narrow turfs (cf. recent
survey by [Said04] which includes MQ coder, showing speed
factors of only 2-3 for the quasi-ACs vs the full AC; these are
not even close to typical QI factors 10-20, let alone the
extreme points of 200+).


>> pattern of QI's much better division of labor at all levels --
>> within the functions of the coder itself and extending to a
>> similarly improved division of labor between the coder and the
>> modeling engine.
>
> In which sense "much better"?

While generally, "better" does require specification of the
value function being optimized, in the case here this value
function is made explicit (the speed & power consumption), from
the title of that section down to the descriptions & car factory
analogy that followed the sentence you quoted. Also, generally,
a better division of labor means that the interaction between
the functional blocks combined to perform the full task is
minimized. AC does it exactly backwards - it maximizes this
interaction by conditioning coding steps on modeling steps,
symbol by symbol. The Henry Ford analogy is very close to the
differences here.


> Here we have a problem. How large are these tables? If
> I want an improvement over AC on messages of length
> N -> infinity, I would suppose the size of the tables
> have to grow in some power of N to have all the binomial
> factors covered. Comments?

The table sizes were already answered above. These sizes are a
result of memory & speed tradeoffs. The size of coding blocks
(for index computation) depends on the source. The actual
selection of the block size or input segmentation policy is a
job of the modeling engine within EC/QI scheme. The basic
objective is to minimize total output (MDL), the index + the
model parameters. Hence the transformation of the input sequence
into the enumerative classes varies greatly, depending on input
sequence ([T2] pp. 26-35). A simple coder for stationary sources
would use maximum block size N that the tables allow. Generally,
though, the "messages" given to the coder for indexing need not
be contiguous segments of the input sequence at all (e.g. when
using BWT as the QI modeler, see also [M2] for two ways of
QI use with AC modelers).

> If we deal with non-all-purpose compressors, you often
> have to deal pre-defined alphabets. AC can handle this
> fine because the modeller is a separate stage. How would
> QI handle this?

The QI multi-alphabet coding is described in [T1] (pp. 31-38). In
general case it is a binary decomposition into (A-1) streams,
(where A is alphabet size). That decomposition can be seen as a
special case of the m-th order Markov source described in [M2]
method (b). For high entropy sources QI uses mixed radix codes
([T1] pp. 46-51, also in Radix.c in the [QIC] source).
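
To illustrate the general idea of such a binary decomposition (this
is my own toy example of one common layout, not necessarily the exact
one from [T1]): stream j receives a bit for every symbol that is >= j,
set to 1 iff the symbol equals j, so symbols equal to A-1 need no
explicit bit and each of the A-1 streams can be handed to the binary
coder on its own:

  #include <stdio.h>

  #define A      4                          // alphabet size (example)
  #define MAXLEN 64

  static unsigned char strm[A-1][MAXLEN];   // bits of each binary stream
  static int           slen[A-1];           // bits written per stream

  static void decompose(const int *sym, int n)
  {
      int i, j;
      for (i = 0; i < n; i++)
          for (j = 0; j < A-1 && j <= sym[i]; j++)
              strm[j][slen[j]++] = (sym[i] == j);
  }

  int main(void)
  {
      int msg[] = {2, 0, 3, 1, 1, 0}, i, j;
      decompose(msg, 6);
      for (j = 0; j < A-1; j++) {
          printf("stream %d: ", j);
          for (i = 0; i < slen[j]; i++) printf("%d", strm[j][i]);
          printf("\n");
      }
      return 0;
  }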

Note also that QI modeling is "descriptive" and not
"predictive", hence decoder is not trying to predict anything
based on previous symbols. It simply reconstructs the sequence
from the specified enumerative classes and their index values
given (cf. [T2] pp. 26-35). Hence QI's division of labor between
the coder & modeler is much cleaner than for AC, since the
interaction between the two doesn't occur symbol by symbol as it
does with AC. The often touted "separate" modeler & coder of AC
looks in comparison like a car factory building one car at a time,
with all machines & workers just working on that one car, from
start to end, and only then starting the next car.

Obviously the trick is how to get good enumerative classes for a
given sequence (or set of sequences/source). The BWT (just the
bare transform) is the general purpose QI modeler. The selection
of the optimum partition of the BWT output column is an
interesting problem with solutions still evolving (the simple
bottom up method mentioned in (A3), although accurate, is still
too slow and clumsy for my taste). For idealized sources, such
as the m-th order Markov source, one can use simpler methods (see
[M2] method b).

> No serious speed analysis has happened yet. I personally
> wouldn't care, but since you want to discuss it, you
> should use a scientific method to do it.

There is obviously much more one can do on theoretical and
experimental sides. But, with many more interesting problems in
front of me, I will leave that particular exploration to others.
The tests & comparisons with Moffat98, which is overall the best
(on combined speed & accuracy) full precision general AC, are,
given the magnitude of the speed differences found, more than
indicative of what is ahead on that topic. Note also that Moffat98
crowns the two decades of AC optimizations (and it has also a
highly optimized C code), while QI tested (p. 10 [T3]) was a
relatively crude prototype code of an algorithm 6-8 months old
at the time. Hence, it is reasonable to expect only greater
speed ratios in the future.

> Paragraph about Ford snipped. What's the value of this
> paragraph?

It is a close analogy to the difference in the division of labor
at all levels between QI and AC. If the speed & power
consumption is the value function being optimized (instead of
AC's "online" capability, which is the Morse code style analog
line "on-line", with a fictitious constraint of always having
to process one symbol at a time from top to bottom and from
start to end), then the QI's streamlined division of labor is
the way to do it.

> Why the method of choice? Which things become simpler, which
> things become harder? Where to put pre-knowledge on the data
> if there is no model, for example?

In constrained coding you are coding a maximum entropy input
(all symbols equiprobable, being an output of some unconstrained
entropy coder) and you wish to produce output which satisfies
precise conditions on output symbol counts, such as max and min
numbers of successive 0's and 1's, with often different min &
max values for the start and end of the sequence. The way EC
does this is to look at the constrained sequence (which in CC
they consider as "encoded output") as the EC encoder's input,
where the complex constraints are simply an enumerative
regularity, which is known upfront, hence it needs no
transmission, that EC enumerates (calculates index). The EC
output is the index plus any extra info to specify enumerative
class (with much of the 'regularity' info being hardwired
constant, thus not packaged). Hence the "CC decoding" is the "EC
encoding" (and vice versa). While AC can compute the index for
these complex constraints, what you end up with is an AC of
enumerative style (like decrementing AC of [34], but more
complex) with complex and expensive probability updates done in
the coding loop and lower accuracy than EC. Since in CC they use
relatively smaller block sizes (dictated by hardware & recording
media formats), the unlimited precision EC on small blocks with
addend tables in ROM is often faster, simpler and more accurate
than AC.

You can check Immink's work [26] for more details and on his
arguments for use of EC within CC. He developed a floating point
approximation for EC which, except for violating the EC form of
Kraft inequality (eq. (20), p. 8 [T3]) and pigeonhole principle
(eq. (8), p. 5 [T3]), almost worked (see [T3] p. 2). It is
basically similar kind of idea as QI, except that he didn't have
a clear concept for SWI ([T3] p. 7, which unlike FP, have
arithmetic operations formally decoupled from rounding, which
is with SWI considered an independent operator that application
invokes or doesn't invoke, like any other operator, as
appropriate for the algorithm) or a clean formalism with
factored constraints (eq. (5) & (15) for EC, (21) & (23) for
QI). With everything mangled together it was difficult in that
approach to see what exactly was missing (eqs. (20) & (8), (5)
in [T3]) and how to fix the decodability problem (eqs. (21),
(23) in [T3]).

> That's because you're mixing two things here. If you consider
> BWT + AC as one code, you would see that the purpose of BWT
> is just to provide an alphabet and symbol definition to the
> AC to make it efficently compressible. ...Thus, BWT+AC does
> a lot more than "predict the next symbol". It also defines
> what "the symbol" should be, in a clever way. That is,
> the "probabilities" within AC are not "probabilities of
> words".

I am not missing anything. You are merely hardwired to translate
everything when the term "model" comes up into the AC modeling
paradigm, that particular division of labor with its
probabilistic language and the "probabilities of the next
symbol" (see distinction (A2)). BWT or LZ don't do the modeling
of finite sequences in that language. BWT output can be coded
enumeratively without ever performing any translation of BWT
model (the BW bare transform output column R, with its full
context tree) into "probabilities of the next symbol" or any
probability at all. The enumerative coder only needs to know the
boundaries of the fragments (which represent enumerative
classes) to produce the index. The whole AC translation layer is
never computed or considered. All the communication between the
modeling engine (BW transform generator + segment constructor)
and the coder is in terms of the finite sequence parameters
(fragment positions & lengths). The probabilistic language is not
needed by either and that layer is simply never constructed.

With existent BWT implementations, you have variants where
probabilistic layer gets constructed, and others where it
doesn't (e.g. MTF & various forms of runlength & other universal
and synthetic Huffman codes). Depending on which language you
interpret those methods which never compute any probabilities,
you can at best insist that they use implicit probabilities.

From the perspective of "descriptive" modeling ([T2] pp. 26-35)
though, that kind of non-functional conceptual scaffolding is a
superfluous, low resolution post hoc "explanation" of the much
richer and finer-grained finite sequence transformations and
properties. That was not the "mixing" of the "two things"
but the main point of the distinction (A2).

> This might be, but how does the QI table size grow in n?

The QI binomial table has n^2/4 integer entries for a block of
size n (the QI.exe gives you more accurate figures, see
new_qbct() in Qi.c which for binomials has exact value n^2/4-
n+1). The corresponding AC table would have n^3/4 entries if its
p can distinguish n values (see [T2] pp. 19-20). Obviously, if
you quantize p into some other number of levels L instead of n,
then you need L*n^2/4 entries.

>> Or, expressed as a ratio of AC's over QI's maximum quantization
>> redundancies:
>>      AC(N) / QI(N) = 4        ... (4)
>
> What is the definition of AC(N), resp. QI(N) here?

That was from the cited refs right above that place (cf.
eq. (20) p. 14 [41a] for AC and d(g) p. 8 in [T3]...).
They are upper bounds on absolute redundancies, in bits.
From AC(N)=4*N*A*log(e)/2^g for A=2 => AC(N)=8*N*log(e)/2^g
while QI(N)=2*N*log(e)/2^g, hence the ratio (4).
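
(Just plugging numbers into those two formulas, with log = log2: for
g=32 and N=2^20 symbols, QI(N) = 2*2^20*1.44/2^32 ~ 0.0007 bits total
and AC(N) ~ 0.0028 bits, i.e. both quantization losses are tiny in
absolute terms, but AC's bound is the 4 times larger one.)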

Note that AC quantization losses are due to truncation of the
infinite fractions (compared to unlimited precision AC) and to
its variable size mantissa for Pc (I assume you clarified by now
the conventions of the Volf-Tjalkens AC formalism), which is
allowed to drop to 2^g/4, thus to g-2 mantissa bits, before they
renormalize it. That leads to lower precision intervals (the
mantissa of Pc) and a reduction of Pc for the next stage. QI,
which does that quantization outside of its coding loop (eq.
(21) p. 8 [T3]) can afford to be more careful and keep its g bit
mantissa always normalized to exactly g bits. Also, working in
integers QI doesn't have the infinite fractions that must be
forgotten (one way or the other) on every step. E.g. near the
axis y=x, where binomials are nearly equal (the same exponents),
adding two binomials C1+C2 leads to rounding up only 1/2 of the
time (only if the lowest bits of C1 and C2 are different).
Finally, for the initial g+O(1) steps QI makes no approximation
at all (and for even more steps farther away from the axis y=x).

{ The QI.exe from the [QIC] source shows you (via option "cbr"),
row by row for all binomials, all quantizations in the last step
and cumulatively, plus the resulting cumulative bit excess (for n up
to 2^20, which can be extended in the Intro.h). The command "ct"
gives the full stats for the entire table for n up to 2^20, e.g.
for n=1K, it shows 72% of binomial computations (eq. (21)
p. 8 [T3]) produced rounding up, while for n=16K, 73.96% were
rounded up. }
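
To make that near-diagonal rounding concrete, here is a toy version of
the equal-exponent SWI addition (my own illustration with a 32-bit
mantissa; the real SWI routines are in the [QIC] source). The 33-bit
sum drops its lowest bit, and that bit is nonzero -- forcing a round
up -- exactly when the low bits of the two mantissas differ:

  #include <stdint.h>

  typedef struct { uint32_t m; int e; } swi;   // value = m * 2^e

  // Add two SWIs with equal exponents and normalized (top bit set)
  // g=32 bit mantissas: the exact sum has 33 bits, so the low bit is
  // dropped and, if it was 1, the result is rounded up.
  static swi swi_add_eq(swi a, swi b)          // assumes a.e == b.e
  {
      uint64_t s = (uint64_t)a.m + b.m;        // exact 33-bit sum
      swi r;
      r.m = (uint32_t)((s + 1) >> 1);          // drop low bit, round up if 1
      r.e = a.e + 1;
      return r;
  }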

> There has not yet been an independent empirical
> observation.
> For my place mostly because I can't compile your
> source. (-;

The current [QIC] 1.03 has an _ASM option in Intro.h to turn off
a few MSVC _asm macros (for bit position & mul/div). Also, Jasen
Betts who posted in this thread has created a unix version (with
different, perhaps a bit lower res timers). I don't have a set-
up to test it at the moment so it is not released (it will have
another directory port\* with his win32 substitutes), but he
could probably email you his copy that does compile and run.

>> coding such fragments separately. Even the conventional EC in a
>> suboptimal hierarchical approximation has been found overall
>> advantageous vs AC in this domain.
>
> More data please. By whom?

Mostly from my own ongoing work on the video codec (which
motivated last summer the search for a more practical high
precision enumerative coder and which ended up as a long detour,
when the QI idea came up). See also similar observations from
Oktem et al. papers [23],[46] which deal with more general image
coding, on advantages of their hierarchical EC (the interframe
differences coded by a video codec are subsets of their inputs
with additional fragmentation, hence they are even more suitable
for the high precision coding).

> QI is a LIFO coder. Whether this is good or bad depends on the
> application, but it ought to be said.

QI can code as LIFO or FIFO (cf. N1, p.9 [T3]). Note also
that the encoder can send data out with a delay of g bits, in LIFO or
FIFO mode. In FIFO mode decoder can also decode it as soon as
it arrives, while in LIFO mode it has to wait until the whole
block arrives. The sample source code uses only LIFO mode.

Also, within descriptive modeling there is no FIFO/LIFO coding
concept. The sequence handed to the coder+modeler is coded as a
whole. The coding task is considered as a request to produce the
minimum length description of that sequence, understandable to
the decoder, taking into account any shared knowledge about it
that encoder & decoder may have. The coder & modeler do not
gratuitously impose upon themselves some "ideological"
constraints, or drag in by inertia various conceptual leftovers
from the Morse telegraph era, of having to model+encode one
symbol at a time, from start to end, before they can "touch" the
next symbol, which also must have index i+1 (or that
symbols have to be read in any particular order or just once or
just one symbol in one coding loop step,... etc).

Of course, all of the above doesn't imply that there is a self-
imposed "ideological" constraint prohibiting coding/modeling the
AC way, even when that is the most suitable way (see [M2]).



--- References ( http://www.1stworks.com/ref/RefLib.htm )

QIC. QI C source code research kit, publicly available at:
     http://www.1stworks.com/ref/qi.htm

T1.  R.V. Tomic "Fast, optimal entropy coder" 1stWorks
     TR04-0815, 52p, Aug 2004
     http://www.1stworks.com/ref/TR/tr04-0815b.pdf

T2.  R.V. Tomic "Quantized indexing: Background information"
     1stWorks TR05-0625, 39p, Jun 2005
     http://www.1stworks.com/ref/TR/tr05-0625a.pdf

T3.  R.V. Tomic "Quantized Indexing: Beyond Arithmetic Coding"
     arXiv  cs.IT/0511057, 10p, Nov 2005
     http://arxiv.org/abs/cs.IT/0511057

34.  J.G. Cleary, I.H. Witten "A Comparison of Enumerative and
     Adaptive Codes" IEEE Trans. Inform. Theory  IT-30 (2),
     306-315, 1984
     http://www.1stworks.com/ref/Cleary84Enum.pdf

4.   A. Moffat, R.M. Neal, I.H. Witten "Arithmetic coding revisited"
     ACM Trans. on Inf. Sys. Vol 16, No 3, 256-294, July 1998
     http://www.stanford.edu/class/ee398/handouts/papers/Moffat98ArithmCoding.pdf

41b. B. Ryabko, A. Fionov "Fast and Space-Efficient Adaptive
     Arithmetic Coding" Proc. 7th IMA Intern. Conf. on
     Cryptography and Coding, 1999
     http://www.1stworks.com/ref/RyabkoAri99.pdf

Said04. A. Said "Comparative Analysis of Arithmetic Coding
     Computational Complexity" HPL-2004-75
     http://citeseer.ist.psu.edu/said04comparative.html

M2.  Coding methods (a) & (b) for AC modeling engine:
     http://groups.google.com/group/comp.compression/msg/1314ff87da597fad

26.  K.A.S. Immink "A Practical Method for Approaching the Channel
     Capacity of Constrained Channels" IEEE Trans. Inform. Theory
     IT-43 (5), 1389-1399, 1997
     http://www.exp-math.uni-essen.de/%7Eimmink/pdf/ultra.pdf
     http://www.exp-math.uni-essen.de/%7Eimmink/refs.html

23.  L. Oktem "Hierarchical Enumerative Coding and Its
     Applications in Image Compressing"
     Ph.D. thesis, TUT Finland, 1999
     http://www.1stworks.com/ref/oktemThesis.pdf

46.  L. Oktem, R. Oktem, K. Egiazarian, J. Astola "Efficient Encoding
     Of The Significance Maps In Wavelet Based Image Compression"
     Proc. ISCAS 2000, 28-31 May, Geneva, Switzerland
     http://www.cs.tut.fi/%7Ekaren/project_site/publications/sigmap.pdf

0
nightlight
1/24/2006 12:27:13 AM
(a)
> I believe I already mentioned that the not-yet-hit-the-streets kit
> on my desk will do 200 multiplies in the same time as one main-memory
> access.
(b)
> This particular core is one where power consumption has been
> a priority consideration, being not a desktop processor.

1. Statement (b) has no relation to the power consumption of mul/div vs
power consumption of memory read (e.g. fixed table in ROM on a portable
device). The mul/div will use more power than such memory read. And the
faster that mul/div circuit, the more power it uses.

2. As to cache miss penalty, see C1-C4 from the previous reply.

3. Note that you can compute quantized binomials one step at a time
using 1 mul and 1 div (in machine precision), without any tables at all
(e.g. C(n+1,k+1) = C(n,k)*(n+1)/(k+1) using QI's sliding window
integers for C's). For QI that is simply an extreme point in the
spectrum of possible table size reductions (see note N2, p. 9 in [T3]
for other points in that spectrum).  The tradeoff is that in that case
QI would need to do one such mul/div pair on each symbol, LPS and MPS.
That just happens to be exactly the amount and kind of work AC
does for its encoding steps (which is not a coincidence, see [T2]). The
speed difference would be gone at this extreme point of the spectrum.
The only difference left would be the QI's more accurate quantization
(with about 1/4 of the quantization-produced excess of bits/sym compared to
AC).  For the rest of the spectrum, though, the AC can't select some
other tradeoff, which would make it code as fast as QI without dropping
even further (and very drastically so) its already inferior coding
precision. Even the low accuracy quasi-ACs (see the Said04 paper from the
previous post) trading accuracy for speed, run only 2-3 times faster
than the full AC, such as Moffat98. That's not anywhere near the QI vs
Moffat98 speed ratios across the spectrum of input entropies.
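
For readers who want to see the no-table extreme spelled out, here is
the diagonal step from that recurrence in exact 64-bit integers (my
illustration; the variant described above would keep C in a g-bit
sliding window integer so the mul/div stays in machine precision):

  #include <stdint.h>

  // One diagonal step C(n,k) -> C(n+1,k+1) with one mul and one div;
  // the product C(n,k)*(n+1) is always exactly divisible by (k+1).
  static uint64_t binom_step(uint64_t c_nk, unsigned n, unsigned k)
  {
      return c_nk * (n + 1) / (k + 1);
  }

  // e.g. starting from C(2,1)=2, one step gives 2*3/2 = 3 = C(3,2).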

0
nightlight
1/24/2006 1:06:35 AM
Holy shit.
This has been the longest, most drawn out thread.

So how many BINARY BITS caused this whole dispute?
NOT BYTES, BITS.

0
Goldy
1/24/2006 1:12:49 AM
> Holy shit.
> This has been the longest, most drawn out thread.

You're not exactly helping keep it shorter.

> So how many BINARY BITS caused this whole dispute?
> NOT BYTES, BITS.

See table on page 10 in the arXiv preprint:

   "Quantized Indexing: Beyond Arithmetic Coding"
   http://arxiv.org/abs/cs.IT/0511057

0
nightlight
1/24/2006 1:29:32 AM
Hi,

> Various aspects of differences between QI and AC collected and
> integrated here were scattered over many posts. As stated at the
> top, I wanted them all in one place, where one can see the
> relations among them more clearly (as the summary itself shows).

If so, then your collection at least had a very strong tendency.

> > Whether mul/div is "slow" compared to the QI table
> > lookup (that is not mentioned here) is a question
> > of the table size and the framework the algorithm
> > is implemented for.

> First, mul/div will consume more power than a simple table
> lookup, especially if tables are in ROM on a portable
> device (where power consumption is a critical parameter).

This depends. If the CPU has to idle for many cycles, it
will draw more power during the idling than during the data
fetch.

> > You don't discuss the table sizes.

> The table sizes depend on the type of enumerative classes (cf.
> N2, p. 9 [T3]). For permutation codes they grow linearly with N
> (number of items permuted) and similarly with general mixed
> radix codes (N=number of digits). For general entropy coder, the
> binomial tables have N^2/4 entries, where N is the max block
> size.

Do you mean (N^2)/4 or N^(2/4) = sqrt(N)?

Thus, table size grows O(N^0.5) where N is the size of the message.
That is, in realistic setups, the question remains whether you reach
at all a point where the advantage pays off. Or, mathematically, there
is a lower bound M on the memory requirements and a random source such
that for all table sizes smaller than M, QI is outperformed by AC. It
remains to check what M is and whether M is small enough to make the
above a realistic argument for the proposed application, namely
embedded devices.

> The block size N in the source can be set from 32 to 16384. 
> The "typical" block
> size (e.g. as used in the tests) for binary coder is N=1K which
> makes the table size 1MB.

Pretty small block sizes, actually. There's no blocking requirement
for AC, so it may happen that for a given random source AC outperforms
QI in the long run. Now since it has been understood that there are
random sources where AC might perform better, what are the random
sources where QI might outperform AC? And if so, are they relevant to the
target application? As I read you, QI shows an advantage in the
high entropy domain. What is your target application? (Video/audio
data that come out of a predictor are IMHO not 'high entropy' because
there's even for lossless a compression gain of 1:2 to 1:4. But then,
what's high entropy? How high?)

> Regarding cache:

> C1) Not all of N^2/4 entries need to fit in the cache. Namely,
> in multiblock coding of a stationary source the table entries
> used are along the line y=p*x on the (x,y) lattice, where p=
> probability of 1 (LPS), within a band of width sqrt(N) (for not
> overly low entropy), which means you need approx. N^3/2 entries
> to fit in the cache. For N=1K, that is about 3% of the 1MB, or
> 32KB, well within the desktop processors cache sizes.

Huh? N^3/2 grows faster than N^2/4. 3% for a stationary 0 order
Markov source, then?

> C4) For very low entropy sources, the [QIC] source includes a
> specialized "sparse coder" with tables which cover only a narrow
> strip of width MK (Max K, this can be set via QI.exe option
> "mk<maxk>") near the axis y=0. Hence its entire table size is
> (MK-2)*N. For very sparse arrays, due to larger block size N
> available (for any given cache size), in multi-block coding
> this version runs typically 2-3 times faster than the generic QI
> (only generic QI was used for the comparison table in [T3]).

How does this compare to AC? How to specific high-speed 
implementations such as MQ?

> Doing a study of AC performance/accuracy tradeoffs is
> a mechanical, non-creative work that you can give to
> any student as a term project to play with and learn.
> With so many more interesting, unexplored areas opened
> by QI, fiddling with various AC approximations and
> tradeoffs is like leaving a luxurious dinner prepared
> by the best chefs to eat some week old fast food
> leftovers from a fridge.

I don't deny that QI might be interesting. However, I don't
like unproved claims...

> > Speed, or optimality? I don't think you can have both.

> Well, QI does have both. 

.... as this one. 

> It codes more accurately than the most accurate AC, 

Does it? With blocking? It means that the "memory" of the
coder and thus probabilities are only based on a relatively short
sample set, where "short" = block size.

> it runs much faster than the fastest full precision
> AC such as [4], and it even runs faster than any less accurate
> specialized quasi-ACs on their own narrow turfs (cf. recent
> survey by [Said04] which includes MQ coder, showing speed
> factors of only 2-3 for the quasi-ACs vs the full AC; these are
> not even close to typical QI factors 10-20, let alone the
> extreme points of 200+).

I don't know how they implemented MQ, but you can get it to a
speed where it is close to writing out bits uncompressed. At
least, I would want to do an independent measurement.

> > In which sense "much better"?

> While generally, "better" does require specification of the
> value function being optimized, in the case here this value
> function is made explicit (the speed & power consumption), from
> the title of that section down to the descriptions & car factory
> analogy that followed the sentence you quoted. 

Not really. If you make this claim, you possibly should back it 
up because I do not yet find the arguments too convincing. Analogies
are fine, but don't prove anything. Thus, is there a provable
theorem, or is there an independent study of whether the mentioned
claims are correct?

> Also, generally,
> a better division of labor means that the interaction between
> the functional blocks combined to perform the full task is
> minimized. AC does it exactly backwards - it maximizes this
> interaction by conditioning coding steps on modeling steps,
> symbol by symbol.

The question is: Is this a useful thing to do? I do have
situations where I know in advance that the bitstream I'm pushing
into the coder consists actually of several interleaved random
sources which behave quite differently. AC allows me to define
contexts. Can QI do that? With its tight modelling, is it able to
exploit these statistics?

Example: Consider a source consisting of two first-order Markov
sources A and B. At an even timestep, I draw from A, at odd timesteps,
I draw from B. With modelling, the implementation of an AC coder that
is optimized to this situation is easy. What would I need to do with
QI? Not exploiting the special nature of this source might be very
wasteful.


> > Here we have a problem. How large are these tables? If
> > I want an improvement over AC on messages of length
> > N -> infinity, I would suppose the size of the tables
> > have to grow in some power of N to have all the binomial
> > factors covered. Comments?

> The tables sizes were already answered above. These sizes are
> result of memory & speed tradeoffs. The size of coding blocks
> (for index computation) depends on the source. 

If you care about compression performance, the only thing that
is meaningful is the limit N->infinity. Is there a finite memory
implementation of QI that runs optimal in this limit?

> > If we deal with non-all-purpose compressors, you often
> > have to deal pre-defined alphabets. AC can handle this
> > fine because the modeller is a separate stage. How would
> > QI handle this?

> The QI multi-alphabet coding is described in [T1] (pp. 31-38). In
> general case it is a binary decomposition into (A-1) streams,
> (where A is alphabet size). That decomposition can be seen as a
> special case of the m-th order Markov source described in [M2]
> method (b). For high entropy sources QI uses mixed radix codes
> ([T1] pp. 46-51, also in Radix.c in the [QIC] source).

This wouldn't cover the random source example above. Of course,
I can model this as some 2nd order markov chain, but I then
pay the penalty of making it harder to adapt to it. In other words,
I do not make optimal usage of my apriori knowledge on the source.

> Note also that QI modeling is "descriptive" and not
> "predictive", hence decoder is not trying to predict anything
> based on previous symbols. 

What means "descriptive"? As I see it, you keep relative frequencies
or symbol counts here. Whether you call this "descriptive" (as in,
I have counted so and so much sequences of this kind) or "predictive"
(as in, I predict the probability of a symbol due to the counts) is
just a matter of language. Thus, what *is* the difference, leaving
language issues alone?

> > No serious speed analysis has happened yet. I personally
> > wouldn't care, but since you want to discuss it, you
> > should use a scientific method to do it.

> There is obviusly much more one can do on theoretical and
> experimental sides. But, with many more interesting problems in
> front of me, I will leave that particular exploration to others.

All fine with me. For that, I would prefer:

a) not to state results for tests that haven't been verified yet.

b) to have a source code here I can actually make use of. 

First things first.

> > Why the method of choice? Which things become simpler, which
> > things become harder? Where to put pre-knowledge on the data
> > if there is no model, for example?

> In constrained coding you are coding a maximum entropy input
> (all symbols equiprobable, being an output of some unconstrained
> entropy coder) and you wish to produce output which satisfies
> precise conditions on output symbol counts, such as max and min
> numbers of successive 0's and 1's, with often different min &
> max values for the start and end of the sequence. The way EC
> does this is to look at the constrained sequence (which in CC
> they consider as "encoded output") as the EC encoder's input,
> where the complex constriants are simply an enumerative
> regularity, which is known upfront, hence it needs no
> transmission, that EC enumerates (calculates index). The EC
> output is the index plus any extra info to specify enumerative
> class (with much of the 'regularity' info being hardwired
> constant, thus not packaged). Hence the "CC decoding" is the "EC
> encoding" (and vice versa). While AC can compute the index for
> these complex constraints, what you end up with is an AC of
> enumerative style (like decrementing AC of [34], but more
> complex) with complex and expensive probability updates done in
> the coding loop and lower accuracy than EC. Since in CC they use
> relatively smaller block sizes (dictated by hardware & recording
> media formats), the unlimited precision EC on small blocks with
> addend tables in ROM is often faster, simpler and more accurate
> than AC.

> You can check Immink's work [26] for more details and on his
> arguments for use of EC within CC. He develped a floating point
> approximation for EC which, except for violating the EC form of
> Kraft inequality (eq. (20), p. 8 [T3]) and pigeonhole principle
> (eq. (8), p. 5 [T3]), almost worked (see [T3] p. 2). It is
> basically similar kind of idea as QI, except that he didn't have
> a clear concept for SWI ([T3] p. 7, which unlike FP, have
> arithmetic operations formally decoupled from rounding, which
> is with SWI considered an independent operator that application
> invokes or doesn't invoke, like any other operator, as
> appropriate for the algorithm) or a clean formalism with
> factored constraints (eq. (5) & (15) for EC, (21) & (23) for
> QI). With everything mangled together it was difficult in that
> approach to see what exactly was missing (eqs. (20) & (8), (5)
> in [T3]) and how to fix the decodability problem (eqs. (21),
> (23) in [T3]).

Ok, thanks.

> > That's because you're mixing two things here. If you consider
> > BWT + AC as one code, you would see that the purpose of BWT
> > is just to provide an alphabet and symbol definition to the
> > AC to make it efficently compressible. ...Thus, BWT+AC does
> > a lot more than "predict the next symbol". It also defines
> > what "the symbol" should be, in a clever way. That is,
> > the "probabilities" within AC are not "probabilities of
> > words".

> I am not missing anything. You are merely hardwired to translate
> everything when the term "model" comes up into the AC modeling
> paradigm, that particular division of labor with its
> probabilistic language and the "probabilities of the next
> symbol" (see distinction (A2)). BWT or LZ don't do the modeling
> of finite sequences in that language. BWT output can be coded
> enumeratively without ever performing any translation of BWT
> model (the BW bare transform output column R, with its full
> context tree) into "probabilities of the next symbol" or any
> probability at all. The enumerative coder only needs to know the
> boundaries of the fragments (which represent enumerative
> classes) to produce the index. The whole AC translation layer is
> never computed or considered. All the communication between the
> modeling engine (BW transform generator + segment constructor)
> and the coder is in terms of the finite sequence parameters
> (fragment positions & lengths). The probabilistic language is not
> needed by either and that layer is simple never constructed.

That's about what I meant, though. If you think that I'm thinking
in "symbols or probablities" then this is just not so, which is
why I wrote that AC isn't used to "predict the next symbol". If so,
then the word "symbol" is not used in the right sense.

> With existent BWT implementations, you have variants where
> probabilistic layer gets constructed, and others where it
> doesn't (e.g. MTF & various forms of runlength & other universal
> and synthetic Huffman codes). Depending in which language you
> interpret those methods which never compute any probabilities,
> you can at best insist that they use implicit probabilities.

Ok, agreed.

> From the perspective of "descriptive" modeling ([T2] pp. 26-35)
> though, that kind of non-functional conceptual scaffolding is a
> superfluous, low resolution post hoc "explanation" of the much
> richer and finer-grained finite sequence transformations and
> properties. That was not the "mixing" of the "two things"
> but the main point of the distinction (A2).

Ok, so a misunderstanding on my side. Sorry.

> >> Or, expressed as a ratio of AC's over QI's maximum quantization
> >> redundancies:
> >>      AC(N) / QI(N) = 4        ... (4)
> >
> > What is the definition of AC(N), resp. QI(N) here?

> That was from the cited refs right above that place (cf.
> eq. (20) p. 14 [41a] for AC and d(g) p. 8 in [T3]...).
> They are upper bounds on absolute redundancies, in bits.
> From AC(N)=4*N*A*log(e)/2^g for A=2 => AC(N)=8*N*log(e)/2^g
> while QI(N)=2*N*log(e)/2^g, hence the ratio (4).

> Note that AC quantization losses are due to truncation of the
> infinite fractions (compared to unlimited precision Ac) and to
> its variable size mantissa for Pc (I assume you clarified by now
> the conventions of the Volf-Tjalkens AC formalism), which is
> allowed to drop to 2^g/4, thus to g-2 mantissa bits, before they
> renormalize it. That leads to a lower precision intervals (the
> mantissa of Pc) and a reduction of Pc for the next stage. QI,
> which does that quantization outside of its coding loop (eq.
> (21) p. 8 [T3]) can afford to be more careful and keep its g bit
> mantissa always normalized to exactly g bits. Also, working in
> integers QI doesn't have the infinite fractions that must be
> forgotten (one way or the other) on every step. E.g. near the
> axis y=x, where binomials are nearly equal (the same exponents),
> adding two binomials C1+C2 leads to rounding up only 1/2 of the
> time (only if the lowest bits of C1 and C2 are different).

Ok. Thanks.

> > There has not yet been an independent empirical
> > observation.
> > For my place mostly because I can't compile your
> > source. (-;

> The current [QIC] 1.03 has an _ASM option in Intro.h to turn off
> a few MSVC _asm macros (for bit position & mul/div). Also, Jasen
> Betts who posted in this thread has created a unix version (with
> different, perhaps a bit lower res timers). I don't have a set-
> up to test it at the moment so it is not released (it will have
> another directory port\* with his win32 substitutes), but he
> could probably email you his copy that does compile and run.

I would need to get rid of Windows.h/conio.h. A makefile is put
together quickly, and I don't mind about the speed. At least *not now*,
because too many things are compiler dependent in this area, unless
the speed differs by orders of magnitude. Isn't there a switch in VS
to force it to ANSI C?

> > QI is a LIFO coder. Whether this is good or bad depends on the
> > application, but it ought to be said.

> QI can code as LIFO or FIFO (cf. N1, p.9 [T3]). Note also
> that encoder can send data out with delay g bits, in LIFO or
> FIFO mode. In FIFO mode decoder can also decode it as soon as
> it arrives, while in LIFO mode it has to wait until the whole
> block arrives. The sample source code uses only LIFO mode.

> Also, within descriptive modeling there is no FIFO/LIFO coding
> concept. 

Maybe not, but in practical applications there is. (-;

> The sequence handed to the coder+modeler is coded as a
> whole. The coding task is considered as a request to produce the
> minimum length description of that sequence, understandible to
> decoder and taking into account any shared knowledge about it
> that encoder & decoder may have. The coder & modeler do not
> gratuitously impose upon themselves some "ideological"
> constraints, or drag in by inertia various conceptual leftovers
> from the Morse telegraph era, of having to model+encode one
> symbol at a time, from start to end before they can "touch" the
> next one symbol, which also must have index i+1 (or that
> symbols have to be read in any particular order or just once or
> just one symbol in one coding loop step,... etc).

That's quite not the point. The entropy coder is often only a minor
part of an overall design, and you often do not have the choice of
defining the coding order at your will. Besides, hardware applications
cannot buffer large amounts of data.

So long,
	Thomas
0
Thomas
1/24/2006 11:30:01 AM
> If so, then your collection at least had a
> very strong tendency.

What kind of "tendency"? To show what AC shortcomings were
solved by QI? At the top it says what the 4 elements that
follow are, and that is exactly what was done:

"For the rest of this post I will address the QI's
solutions for the four principal remaining weaknesses
and tradeoffs introduced by the AC algorithm."

>> First, mul/div will consume more power than a simple table
>> lookup, especially if tables are in ROM on a portable
>> device ...
>
> This depends. If the CPU has to idle for many cycles,
> it will draw more power during the idling that during
> the data fetch.

The pertinent comparison is between mul/div vs memory read
(RAM or ROM). The latter takes less power.

>> For general entropy coder, the binomial tables have
>> N^2/4 entries, where N is the max block size.
>
> Do you mean (N^2)/4 or N^(2/4) = sqrt(N)?

The binomial table is quadratic (see N2 p. 9 [T3] where table
sizes are discussed). Hence the size is N*N/4. The C source
shows a more accurate value: N*N/4 - N +1 (see Qi.c function
new_qbct(), the rows K=0,1 are not stored).


> Thus, table size grows O(N^0.5) where N is the size
> of the message.

Double _no_: It is O(N^2), not O(N^0.5).

The N is not the size of the message, it is the table size. The
table size determines the max block which can be indexed as a single
index. The input string length may be longer or shorter. The QI
modeler performs decomposition of the input string S(L) of
length L into enumerative classes (see [T2] pp. 27-35). The
output of modeler is a set of strings X1(n1), X2(n2),... The
table size N only limits n1<=N, n2<=N,... The strings X1(n1),...
_may_ be segments of S(L), such as for the simple quasi-
stationary memoryless source. For stationary source one would
also have n1=n2... Generally, neither is true i.e. the strings
X1(n1),... have no simple relation to S(L) or to each other e.g.
they could be fragments of the BWT output column R, each of its
own length (determined by the optimal, in MDL sense, partition
of R, see (A3) & my previous reply).


> That is, in realistic setups, the question remains whether you
> reach at all a point where the advantage pays off. Or,
> mathematically, there is a lower bound M on the memory
> requirements and a random source such that for all table sizes
> smaller than M, QI is outperformed by AC.

Note that even for table limits N=32 or 64, where exact EC has
been used, the hierarchical approximation (see Oktem) is fairly
competitive with AC. With QI, when you change the max block size
N, what changes is the balance between how much is being output
as index (within enumerative class) and how much as "class tags"
(the model info, such as count of 1's). Since QI packages "class
tags" if they are compressible (see "entropy pump" in [T2] p.
27), the only difference is how much work it does. It will work
faster if it uses maximum blocks provided by the table limit N.
But the upper limit N on table size depends on CPU cache and if
you make it too large it can slow the coder down. Value N=1K
happens to be the right balance for most machines tested,
although on newer machines the table limit N=2K or even 4K works
faster than N=1K. Now, that doesn't mean that X1(n1) has to have
n1=1K or 2K or some such. N is only the upper bound on lengths
of enumerative classes strings modeler can hand to coder.

The effect of the input S(L) length L is that the speed ratio
generally increases for longer L. The high level reason for this
is that the QI's better division of labor pays off better at
larger scales. The lower level mechanisms by which this general
pattern is realized may vary depending on coding task e.g. on
stationary source with p=probability of 1, after multiple blocks
of length N, the table elements around the line y=p*x will load
into cache, so that later passes will work faster.

Note also that timers & timing methods included in [QIC] QI.exe
may not be very accurate on short blocks (for N<128) due to
timer granularity (~280 ns per tick). The EC.exe program (and
source) includes the type of loop needed to test the speed of exact EC
on short blocks (N=32).

> It remains to check what M is and whether M is small enough
> to make the above a realistic argument for the proposed
> application, namely embedded devices.

If the total input length L is very small, e.g. 32 or 64 bits,
the AC and QI times are so short and comparable to timer
granularity and also subject to general OS time-slicing effects,
that the variation from test to test will be of similar
magnitude as the times themselves. One would need a batch method
used in EC.exe loops to refine these timings (QI will still come
out faster since it does much less work and the tiny table
easily fits in cache).

> Pretty small block sizes, actually. There's no blocking
> requirement for AC, so it may happen that for a given
> random source AC outperforms QI in the long run. Now
> since it has been understood that there are random sources
> where AC might perform better,

How does that follow? You need a finer resolution of
the concepts 'input length L', 'table limit N' and coder's
message lengths n1, n2,... There is no basis in theory
or in experiment for any such 'conjecture', be it for
speed or for compression effectiveness.

> what are the random sources where QI might outperform AC?

Any. As explained earlier, QI can code exactly as AC, in stream
mode, if there is anything for which that is the best method,
with the only difference that QI quantization is more accurate
(due to QI's better utilization of g bit mantissa, which AC
allows to vary to well below g bits; plus due to QI's integer
arithmetic without needless infinite fractions which have to
be dropped by a finite precision AC, one way or the other).


> And if so, are they relevant to the target application? As
> I read you, QI shows an advantage in the high entropy domain.
> What is your target application? (Video/audio data that come
> out of a predictor are IMHO not 'high entropy' because there's
> even for lossless a compression gain of 1:2 to 1:4. But then,
> what's high entropy?. How high?)

I think you need to clarify first what "perform" means here. You seem to
have switched somewhere along the way from speed (table sizes,
cache) to compression ratios. For any set of messages M=
{M1,M2,...}, provided all coders have a "perfect" model of M, QI
will _always_ compress M to a smaller or equal size than AC
(trivially, AC, or any compressor, can compress an individual
message from M, such as M1, shorter by simply assigning it a
codeword of length 1).

Of course, with "imperfect model" of M, a random pick of
codewords (satisfying Kraft inequality) may do better on M than
any other coder. Regarding the "imperfect" models of M, though,
as illustrated by array "Vary" (p.10, [T3]), the QI's
"descriptive" modeling is much more resilient to "surprise" than
the usual "predictive" AC modeling. The "descriptive" method is
always "surprised" less than the "predictive" method.

The AC codewords are an approximate form of QI codewords, which
use needless scaling of addends (which results in infinite
fractions for exact AC which are "forgotten" by a finite
precision AC), randomly fluctuating mantissa length (a very bad
idea for precision), sub-optimal top down quantization (the
largest addends quantized first, in contrast to optimal bottom
up QI quantization of (21) in [T3]).

The difference is essentially as if you went out and randomly
changed all codeword lengths for set M and then checked if the
total violates Kraft inequality and retried the whole thing if
the violation was found. If the original set was optimal (the
shortest total output size on M for given precision g), as it is
with QI (assuming M is a valid enumerative class or a set of
such classes, i.e. that we have a "perfect" model of M used by
all coders), anything you get from your random variation will
produce always a larger (or at best remain equal if you are
very lucky) output for M.

Also, I have no idea what this "high entropy" point is about. I did
mention radix codes, and we used those for the little tests here
in this thread. But that is not the only advantage. You can
check the table in [T3] or do tests with the source [QIC].
The compression advantage is across the spectrum of inputs.

The point of (A3) in the summary is that those small advantages
are merely a tip of the iceberg, and tip on which AC is at its
best. Compared to AC, QI has intrinsically higher precision,
lower noise coding and finer-grained more flexible modeling
interface and language more suitable for finite sequences. The
test results on "conventional" compression tasks show only a
relatively small QI compression advantage, since "conventional"
is by definition the subset of finite sequences where AC
performs well. (A3) points out that there is much more beyond
the "conventional" subset and the QI compression gains are not
limited to the few percent shown in the [T3] table.

>> N^3/2 entries  to fit in the cache. For N=1K, that is about
>> 3% of the 1MB, or 32KB, well within the desktop processors
>> cache sizes.
>
> Huh? N^3/2 grows faster than N^2/4.

With full parentheses that should be N^(3/2). The other one
is (N^2)/4. It should have been clear from the immediate
context that N^3/2, which came from N*sqrt(N), is N^(3/2).

>> C4) ...
>> For very sparse arrays, due to larger block size N
>> available (for any given cache size), in multi-block
>> coding this version runs typically 2-3 times faster
>> than the generic QI (only generic QI was used for
>> the comparison table in [T3]).
>
> How does this compare to AC?

The generic QI results vs Moffat98 AC are in the table in [T3].
You can extrapolate the top rows speed ratios (which is where
sparse coder applies) by such factor.

> How to specific high-speed implementations as MQ?

I have given you [Said04] reference which has results for MQ
(and other quasi-ACs), along with full precision AC. The MQ is
only 2-3 times faster than full ACs, which doesn't even come
close to QI vs full AC ratios, even for generic QI, much less
for specialized QI version optimized for some range of inputs
(which would be a more fair comparison against specialized ACs).

You could, of course, use runlength coders, which will run as
fast as QI on very sparse inputs, some coarse grained ones even
faster than QI. They will also have larger redundancy than even
AC and will perform very badly (in speed and compression) if the
input isn't exactly the "ideal" low entropy source for which
their codewords were precomputed.

> I don't deny that QI might be interesting. However,
> I don't like unproved claims...

Placing the source code for public access is equivalent to
publishing a mathematical proof (in a non-peer-reviewed preprint).
Now, if you can't compile source on your system, that is
equivalent of saying you can't read the published proof because
you don't read English. That is not the same thing as "unproven
claims." There are hundreds of millions win32 PCs which can
compile or run the [QIC] code.

>>> Speed, or optimality? I don't think you can have both.
>> Well, QI does have both.
>
> ... as this one.

The QI source code has been out there. With the level of
hostility from some corners, anyone who wanted to empirically
falsify that claim has had plenty of chances. Or, with the preprints
publicly available, which show why it is so, one could have
shown that some key mathematical step is fatally wrong, making
the whole scheme flawed. Aside from minor typos and a minor
complaint of one of the authors whose EC contribution was cited
in [T3] (he thought I should have put emphasis on a different
aspect of his contribution, which I will in the next revision),
no falsification of even a minor point has turned up.

>> It codes more accurately than the most accurate AC,
>
> Does it? With blocking? It means that the "memory"
> of the coder and thus probabilities are only based
> on a relatively short sample set, where "short" =
> block size.

The blocking doesn't limit the modeler state variables or how
the modeling is done. The modeler considers entire input
sequence as a request to provide minimum decodable description
of that sequence, taking into account any shared knowledge coder
& decoder may have about it. Adaptive/predictive modeling is a
proper subset of descriptive modeling. Descriptive modeling
merely removes needless self-imposed constraints of the adaptive
modeler.

The blocking only limits how big is the maximum single index
that coder can compute. If the enumerative class provided by
modeler has sequences longer than coder's table limit N, coder
breaks them into smaller blocks. The block boundaries don't
cause any bit fraction loss since QI codes these fractions via
mixed radix codes (cf. N4 p. 9 [T3]).

Your conclusion is no different than concluding that one can't
compute Pi to more than 15-16 digits on a PC, since that is how
many digits the floating point unit handles in a 64-bit 'double'.

> I don't know how they implemented MQ, but you can get
> it to a speed where it is close to writing out bits
> uncompressed. At least, I would want to do an
> independent measurement.

Well, you can email him and ask. From the paper, he did seem to
go out of his way to optimize the coders tested. Moffat98 also
has a test of their coder against Q coder, which they claim is
the coder and implementation to beat (on speed), again showing
only 2-3 ratio, which is not even close to QI's ratios. Based on
those two, I decided not to waste any time 'researching' the
quasi-AC flora and fauna.

Note also that even merely examining and copying
uncompressed bits, bit by bit, isn't the fastest way to do it.
Look for example at the QI's block coding loop (EncDec.c) which
at the top has:

    do n+=32;                   // Encode next 32 MPS
      while((w=*s++)==0);       // find next nonzero dword

Anything that examines and merely copies bits, bit by bit,
on a sparse array will run slower than doing it 32 or 64 bits
per instruction (the output index was padded with 0 bits via
memset). And the above isn't even optimized QI code (e.g. it
doesn't need to do n+=32 inside the loop, since the pointer
increment already keeps track of the count, hence we can
calculate n outside of the loop as n=((int)s-s0)<<3; further,
one could unroll the loops as done in Moffat98, etc.).
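
To make that concrete, here is a small self-contained sketch (my
own illustration, not the EncDec.c code) of scanning a 0-padded
bit array a whole dword at a time and deriving the run length n
from the pointer difference instead of counting inside the loop:

#include <stdio.h>

typedef unsigned int dword;            // assumes 32-bit int, as in [QIC]

// length of the run of 0 bits at the start of a 0-padded array
int lead_zero_run(const dword *s, const dword *end)
{ const dword *s0=s; dword w; int n;
    while(s<end && *s==0) ++s;         // skip whole zero dwords
    n=(int)((const char*)s-(const char*)s0)<<3;  // bytes skipped * 8
    if(s<end)                          // finish inside the nonzero dword
      for(w=*s; (w&1)==0; w>>=1) ++n;
    return n;
}

int main(void)
{ dword a[4]={0,0,0x100,0};            // first 1 bit at position 72
    printf("%d\n", lead_zero_run(a,a+4));
    return 0;
}

A bit-by-bit copier has to touch all 72 positions in that example;
the dword scan reads three words.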

>> function is made explicit (the speed & power consumption),
>> from the title of that section down to the descriptions
>> & car factory analogy that followed the sentence you quoted.
>
> Not really. If you make this claim, you possibly should
> back it up because I do not yet find the arguments too
> convincing.

Regarding speed & power consumption, as stated above, QI's
division of labor is clearly better, as explained there at
two levels: within the coder, and between the coder and the
modeler. Taking the calculations for values which are universal
constants for a given class of sequences out of the coding loop
is better for speed and power consumption. Reorganizing the
modeler's calls to the coder, so that they don't go symbol by
symbol but cover the entire sequence of symbols, is also better
for speed and power consumption.

What you're mixing in, as your next comment shows, is the
separate question, whether such organization will affect
compression negatively. A fair question, but whatever its
answer, the above "better for speed & power..." is still
perfectly valid, as stated.

> The question is: Is this a useful thing to do? I do
> have situations where I know in advance that the bitstream
> I'm pushing into the coder consists actually of several
> interleaved random sources which behaive quite differently.
> AC allows me to define contexts. Can QI do that? With its
> tight modelling, is it able to exploit this statistics?

Of course it can, in more ways than one. That was already
answered at the end of the original post:

M1.
http://groups.google.com/group/comp.compression/msg/27c6f329038a1bdc

which points to methods (a) and (b) in an earlier post:

M2.
http://groups.google.com/group/comp.compression/msg/1314ff87da597fad

The more relevant method for your setup is (b). That is all
assuming use of only the existent AC modeling engine, i.e. one
which parametrizes the properties of the finite sequences in
the probabilistic language and which converts all that it can
extract from the sequence into "probability of the next symbol".
Generally, that is only a coarse-grained, impoverished way of
modeling finite sequences. But if, as you say, that is all one
is allowed to have for the sake of argument, then QI can use
methods (a) and (b) above. The price paid is the compression
quality of adaptive AC and, in the case of method (a), the
coding speed of AC.

Of course, there is neither law nor theorem, be it in practice
or in theory, requiring that one use a coarse-grained
probabilistic parametrization of finite sequences, or the
gratuitous constraint and communication bottleneck of
"probability of the single next symbol", or model+code "online"
(meaning analog line on-line, a la Morse telegraph). Needlessly
imposing such gratuitous constraints upon oneself is simply a
result of conceptual inertia and a herd mentality.

> Example: Consider a source consisting of two first-order
> Markov sources A and B. At an even timestep, I draw from
> A, at odd timesteps, I draw from B. With modelling, the
> implementation of an AC coder that is optimized to this
> situation is easy. What would I need to do with QI? Not
> exploiting the special nature of this source might be
> very wasteful.

QI would simply classify the odd and even elements into two
separate enumerative classes, each having its own index, so the
QI output will consist of 2 indices, I1 and I2. Since QI also
knows the exact upper bounds M1 and M2 on I1 and I2 (these are
all integers with QI; M1 & M2 are the SWI mantissas from the
quantized addend tables), it would encode the top g=32 bits of
I1 and I2, call them D1 and D2, as digits in mixed radix M1, M2,
i.e. as I = D1 + M1*D2. This "I" has the upper bound M = M1*M2,
in case you need to package that index similarly via mixed radix
with some other components of your output. The combined index Iq
consists of the concatenated leftover sections of I1 and I2,
with the mixed radix number I as the high 64 bits of Iq. (The
leftover sections fill their bits fully, since the SWI addends,
which are the quantized full-length upper bounds on I1 & I2 and
whose mantissas are M1 & M2, have all bits beyond the top g bits
equal to 0; i.e. no additional redundancy beyond that already
contained in the QI quantized addends is produced by these
truncated I1 & I2, while I introduces no additional error since
it is computed as a 64-bit product without any approximation.)
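
As a concrete (and much simplified) sketch of that packaging:
the indices are assumed to fit in 64 bits, the actual bit output
is left out, and the names and example bounds are mine, not from
[QIC]:

#include <stdio.h>

typedef unsigned int dword;                  // 32-bit
typedef unsigned long long qword;            // 64-bit

typedef struct { dword M; int E; } swi;      // SWI bound = M * 2^E, E < 64

// split I1, I2 (each below its SWI bound) into leftover low bits, sent
// verbatim, and top digits D1 < M1, D2 < M2, combined into the single
// mixed radix number I = D1 + M1*D2 < M1*M2
qword combine(qword I1, swi B1, qword I2, swi B2, qword *low1, qword *low2)
{ qword D1=I1>>B1.E, D2=I2>>B2.E;
    *low1 = I1 & ((1ull<<B1.E)-1);
    *low2 = I2 & ((1ull<<B2.E)-1);
    return D1 + (qword)B1.M * D2;
}

// decoder side: recover the two digits from I
void split(qword I, dword M1, qword *D1, qword *D2)
{ *D1 = I % M1;  *D2 = I / M1; }

int main(void)
{ swi B1={0x9D000000u,10}, B2={0xC8000000u,7};   // made-up example bounds
  qword I1=0x123456789ull, I2=0xFEDCBA98ull;     // indices below the bounds
  qword l1,l2,D1,D2, I=combine(I1,B1,I2,B2,&l1,&l2);
    split(I,B1.M,&D1,&D2);
    printf("round trip ok: %d\n",
           ((D1<<B1.E)|l1)==I1 && ((D2<<B2.E)|l2)==I2);
    return 0;
}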

Note that AC will produce a single index Ia (longer than QI's
Iq, due to the suboptimal AC quantization plus the unavoidable
O(1) term, at least). But unlike QI's index Iq, whose exact
upper bound M is known to QI, AC's coarse-grained probabilistic
parametrization requires it to normalize all its indices to the
same upper bound 1.00 (since they are 'supposed' to be
'cumulative probabilities'), obliterating the bit fractions in
the process. Hence AC's complete index always takes an integer
number of bits, while QI's is in fractional bits (which can be
packaged exactly via mixed radix codes if there is more than one
item to send, or in all cases via a tapered/flat Huffman code,
where one wastes on average less than 1-log(e)+log(log(e)) =
0.086... bits per index relative to the exact bit fraction).
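
For a single leftover item, the "tapered/flat Huffman" packaging
I have in mind is just the usual phase-in (truncated binary) code
for a value with a known exact bound M; a minimal sketch, again
an illustration and not the [QIC] routine:

#include <stdio.h>

typedef unsigned int dword;

// phase-in code for D with known exact bound M (0 <= D < M, 2 <= M < 2^31):
// the first 2^(k+1)-M values get k bits, the rest get k+1 bits
void phase_in(dword D, dword M, dword *code, int *nbits)
{ int k=0; dword u;
    while((1u<<(k+1)) <= M) ++k;       // k = floor(log2(M))
    u = (1u<<(k+1)) - M;               // number of short (k-bit) codewords
    if(D < u) { *code = D;   *nbits = k;   }
    else      { *code = D+u; *nbits = k+1; }
}

int main(void)
{ dword c; int nb;
    phase_in(3, 10, &c, &nb);          // M=10: values 0..5 take 3 bits
    printf("codeword %u in %d bits\n", c, nb);
    return 0;
}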

> If you care about compression performance, the only
> thing that is meaingful is the limit N->infinity.

Why? Have you ever coded an infinite sequence? Do you expect to
ever code one? It is meaningful only in the sense that such a
coarse-grained parametrization (in terms of infinite-limit
values, after all the 'vanishing' terms are dropped) has the
advantage of often yielding simple closed-form expressions which
are easy for human consumption.

> Is there a finite memory implementation of QI that
> runs optimal in this limit?

The AC and QI implementations tested were limited to max inputs
of 2^30 bits. Any finite-precision coder, AC or QI or any other,
has redundancy terms O(N) due to quantization, hence none of
them will approach the entropy in the limit N->inf. For the
binary alphabet, QI's O(N) term is 4 times smaller than AC's
O(N) term (see (A3) and the refs there, already discussed).

The QI table size limit, call it TN, has nothing to do with
these O(N) terms. (They would be related if one were outputting
blocks in a whole number of bits, which QI doesn't do, cf. N4,
p. 9 [T3].) Of course, the addends at N=TN have a particular
quantization error, which is a special case of these O(N) terms,
as do the addends at N>TN and the addends at N<TN.

{ The program QI.exe included with [QIC] has a command "ct"
which displays table stats, showing such redundancies for
the whole table, max, avg, last row, etc. Command "cbr"
shows the same for any row, plus the individual quantizations,
on the last step and cumulative, for each binomial. These
commands, unlike the full coding tables, keep only one
row at a time, so they go up to much higher values of N,
e.g. N=1M, while the coding tables' max is N=16K. }

> This wouldn't cover the random source example above.
> Of course, I can model this as some 2nd order markov
> chain, but I then pay the penalty of making it harder
> to adapt to it. In other words, I do not make optimal
> usage of my apriori knowledge on the source.

The method (b) in (M2), with the extra step for your even/odd
example, would have no problem with a non-stationary even/odd
source. A quasi-stationary source means for the QI modeler that
it has to select the proper block boundaries (so that within a
block it has approximately stationary data, i.e. a proper
enumerative class). The additional even/odd condition you
specified only means that this segmentation is done separately
for the even and for the odd input array indices.

> What means "descriptive"? As I see it, you keep relative
> frequencies or symbol counts here.

Minor point: relative frequencies (which an enumerative mode AC
would use) are a less accurate parametrization of finite
sequences for limited precision coders than the exact
integer symbol counts (which QI uses).

> Whether you call this "descriptive" (as in, I have counted
> so and so much sequences of this kind) or "predictive" (as
> in, I predict the probability of a symbol due to the counts)
> is just a matter of language. Thus, what *is* the difference,
> leaving language issues alone?

No, that is not the difference. In the "predictive" modeling
scheme the modeler is trying to predict the next symbol at
position i+1 (in the sense of calculating probabilities for the
values of the next symbol) based _only_ on the symbols at
positions 1,..,i. In "descriptive" modeling there is no such
restriction on what kind of correlations the symbols are allowed
to have. Whenever the symbols 1..i are not correlated well enough
with symbol i+1, be it because i is too small, or because 1..i is
low entropy data with too little info altogether, or because of
the nature of the data, the predictor pays a price for the wrong
presumption.

Of course, any practical predictive modeler has to cheat the
"ideal" predictor scheme and use various ad hoc Escape schemes
(where the encoder is allowed to look ahead of the decoder and
clue it in about a sudden change) to get around the worst-case
penalties for entirely wrong predictions that an "ideal"
predictor would pay.

Consider the array "Vary" = int32 { ..-2,-1,0,+1,+2,.. } which was
shown in [T3] p.10. { Any array with sudden changes will do
here, but this one had the advantage (in view of the 10 page
limit) that I could specify it unambiguously in less than half a
line. }

Here you have a big "surprise" in the middle of the array, where
mostly 1's change to mostly 0's. Also, within each 32-bit block
there are little "surprises" of a similar kind. Both QI and AC
were order-0 coders (i.e. no correlation assumed from a[i] to
a[i+1]), and both are allowed to assume a quasi-stationary
source, i.e. that the densities of 1's & 0's may vary along the
array. The predictive Moffat98 AC did very poorly here,
especially for shorter arrays, where at N=4K it produced an
output twice as long as QI's. The QI quasi-stationary order-0
modeler used was very simple: it looks at the whole array as a
single block and splits the block in exact half if the two halves
would encode shorter (including the cost to indicate the split,
which uses longer Huffman codes for the counts), and it is
allowed to keep splitting down to 1K and then stop. { NOTE: There
are much better ways to do this kind of modeling via sliding
window splits, which are nearly as fast as the simple method used
but much more accurate. There are also better ways to code the
counts instead of precomputed Huffman tables for the binomial
distribution, but none of this was used in the tests in [T3].}
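
For anyone who wants to play with that splitting idea, here is a
crude self-contained approximation of it. It only estimates the
output sizes and replaces the Huffman-coded counts used in [T3]
with a flat log2(n+1) bit estimate, so it is an illustration of
the logic, not the tested modeler:

#include <math.h>
#include <stdio.h>

#define MIN_BLOCK 1024                  // stop splitting below 1K symbols

// log2 of the binomial coefficient C(n,k)
double lg2binom(int n, int k)
{ return (lgamma(n+1.0)-lgamma(k+1.0)-lgamma(n-k+1.0))/log(2.0); }

// estimated cost of coding [lo,hi) as one block: index + count + split flag
double block_cost(const unsigned char *b, int lo, int hi)
{ int k=0, i;
    for(i=lo;i<hi;++i) k += b[i];
    return lg2binom(hi-lo,k) + log2(hi-lo+1.0) + 1.0;
}

// estimated cost with the greedy halving described above
double model_cost(const unsigned char *b, int lo, int hi)
{ double whole=block_cost(b,lo,hi), split; int mid;
    if(hi-lo <= MIN_BLOCK) return whole;
    mid = lo + (hi-lo)/2;
    split = model_cost(b,lo,mid) + model_cost(b,mid,hi);
    return split < whole ? split : whole;
}

int main(void)
{ static unsigned char a[8192]; int i;
    for(i=0;i<4096;++i) a[i]=1;         // crude stand-in for "Vary":
    for(;i<8192;++i)    a[i]=0;         // mostly 1's flipping to mostly 0's
    printf("single block: %.0f bits\n", block_cost(a,0,8192));
    printf("with splits:  %.0f bits\n", model_cost(a,0,8192));
    return 0;
}

On an input with one big flip like the one in main(), the split
estimate lands far below the single-block estimate, which is the
effect described above.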

The negative effects of the "surprise" for the descriptive
modeler are limited to the second-order items (small compared to
the enumerative index), such as the counts in this case, which
cost slightly more to encode when a non-default block size is
used for an enumerative class. With the "predictive" AC modeler,
the "surprise" affected every "next symbol" (especially from the
middle of the array on), with the codeword lengths for MPS and
LPS selected exactly backwards, until eventually the frequencies
evened out (so it finally realized what the actual MPS was from
then on), but that happened at the very end of the array, well,
too late. The frequency scaling didn't help much for the shorter
arrays. Of course, Moffat98 had an additional disadvantage with
this kind of input, since it is tuned to give extra advantage to
the MPS, giving it shorter codes than what even the past
frequencies suggest. So the wrong MPS vs LPS for much of the
second half penalized it more than necessary. This quirk of the
Moffat98 AC was explained in the earlier post:

M3.
http://groups.google.com/group/comp.compression/msg/efa6336f483bbb89

There isn't really a good way for a _pure_ predictive modeler
to fix this problem. Namely, if it changes the adaptation rate
to adapt faster, and thus scales the accumulated frequencies
down more often, that will make the coding of genuinely
stationary sequences less efficient due to the greater error in
the probabilities (fewer bits used for p, hence a greater effect
of single count fluctuations). The only way around it for
practical "predictive" coders is to cheat from the "pure
predictive" ideal and clue in the decoder to switch gears via
some ad hoc escape mechanism. Since the "descriptive" modeler is
not bound by some imagined "predictive ideal" (but only by MDL
considerations), it is much more consistent and resilient in
real-life situations, where everything is non-ideal. Of course,
that doesn't mean that a "descriptive" coder has sworn never to
select a "predictor" as the method to describe some sequence if
that provides the minimum length for the total output. It only
means that "prediction" isn't set on a pedestal as the "only
true way".

Naturally, as with any specialized compressor, good or bad,
there is trivially a subset of finite sequences on which a "pure
predictive" modeler will do slightly better than the
"descriptive" modeler. You're welcome to offer an example of an
array on which a predictive modeler will do a lot better than a
descriptive one (with ratios similar to the "Vary" results), if
you believe there is one. I don't think there is. (While it is
quite easy to trick a pure predictor into performing very
badly.)

> All fine with me. For that, I would prefer:
>
> a) not to state results for tests that haven't
> been verified yet.

I consider providing public source code in the most commonly
available standard language, C, for the most common platform,
win32/MSVC, equivalent to providing a mathematical proof in a
conventional formalism and in the most commonly read human
language.

> b) to have a source code here I can actually make
> use of. ...
>
> I would need to get rid of Windows.h/conio.h.
> A makefile is put together quickly, and I don't
> mind about the speed.

Sorry again about win32 & MSVC. Jasen Betts has made
a unix-compilable version (which includes his windows.h
substitute and a few changes to the timer functions) which
he could email you. His email is in his posts in this
thread.

> The entropy coder is often only a minor part of
> an overall design, and you often do not have
> the choice of defining the coding order at
> your will.

I was talking about a more powerful and much faster
modeler+coder scheme which is opened up by the existence of a
universally applicable coder that is very fast across the
spectrum of inputs, high precision and low noise (the absolute
optimum at any available arithmetic precision).

To take the full advantage of this largely unexplored
algorithmic territory (A3), being opened (the wonder
algorithm of the 90s, the Burrows-Wheeler transform
with its mesmerizing beauty, was a mere early precursor
of the powerful 'magical' algorithms from this territory),
there is lots of work to be done for anyone who decides
to explore these virgin lands ahead of the crowd.

For the existent AC modelers which can't be upgraded, QI can
still produce compression equal to AC's, while coding much
faster, via method (b) described earlier. { This is an area I
am less interested in, although I may do some QI adaptations of
this kind for our company clients if that is requested. }

> Besides, hardware applications cannot buffer
> large amounts of data.

Well, even just two symbols ahead is more than "single next
symbol" ahead. Even that little flexibility gives you a slight
edge, when you don't needlessly constrain coder to "one next
symbol". I have yet to run into a coding task where coder is
required (by the nature of the task, not by some arbitrary
dogma) to encode and output 1 symbol at a time. Everything that
is compressed needs packaging and framing of some sort, in
hardware and in software, and doing such framing symbol by
symbol makes no sense.

0
nightlight
1/24/2006 9:16:18 PM
Hi,

> > If so, then your collection at least had a
> > very strong tendency.

> What kind of "tendency"? 

The tendency of advocating one algorithm without presenting
the shortcomings.

> > This depends. If the CPU has to idle for many cycles,
> > it will draw more power during the idling that during
> > the data fetch.

> The pertinent comparison is between mul/div vs memory read
> (RAM or ROM). The latter takes less power.

This is an unfair comparison because the full operation of
performing the read a) stalls the program and by that b)
draws more power.

> >> For general entropy coder, the binomial tables have
> >> N^2/4 entries, where N is the max block size.
> >
> > Do you mean (N^2)/4 or N^(2/4) = sqrt(N)?

> The binomial table is quadratic (see N2 p. 9 [T3] where table
> sizes are discussed). Hence the size is N*N/4. The C source
> shows a more accurate value: N*N/4 - N +1 (see Qi.c function
> new_qbct(), the rows K=0,1 are not stored).

So it's O(N^2). I don't care about the factors (and it is
unusual to care about them).

> > Thus, table size grows O(N^0.5) where N is the size
> > of the message.

> Double _no_: It is O(N^2), not O(N^0,5).

> The N is not the size of the message, it is table size. The
> table size determines max block which can be indexed as single
> index. 

In which relation is N to the size of a block? Linear? I want
a dependency on the block size. (Every codec can be blocked, but
that's a sub-optimal solution because you lose statistics.
Or more specifically, you cannot be asymptotically optimal if you
cannot push N towards infinity.)
Specifically, I'm interested in the behaviour for N -> infinity.

/* snip */

> > That is, in realistic setups, the question remains whether you
> > receive at all a point where the advantage pays off. Or,
> > mathematically, there is a lower bound M on the memory
> > requirements and a random source such that for all table sizes
> > smaller than M, QI is outperformed by AC.

> Note that even for table limits N=32 or 64, where exact EC has
> been used, in hierarchical approximation (see Oktem) is fairly
> competitive with AC. With QI, when you change the max block size
> N, what changes is the balance between how much is being output
> as index (within enumerative class) and how much as "class tags"
> (the model info, such as count of 1's). Since QI packages "class
> tags" if they are compressible (see "entropy pump" in [T2] p.
> 27), the only difference is how much work it does. It will work
> faster if it uses maximum blocks provided by the table limit N.
> But the upper limit N on table size depends on CPU cache and if
> you make it too large it can slow the coder down. Value N=1K
> happens to be the right balance for most machines tested,
> although on newer machines, the table limit N=2K even 4K works
> faster than N=1K. Now, that doesn't mean that X1(n1) has to have
> n1=1K or 2K or some such. N is only the upper bound on lengths
> of enumerative classes strings modeler can hand to coder.

You're sticking too close to the machine. If I'm discussing optimality, I
don't care about cache sizes. This is a second argument that then
has to follow as soon as the code should be made fast. *NOT NOW*

By limiting the table size, you limit the quality by limiting the
block size. I want to understand how the block size relates to the
table size, and then I want to understand what happens in infinitely
large blocks. *Then* one can enter the discussion whether it pays off
speedwise to use blocks.

> Note also that timers & timing methods included in [QIC] QI.exe
> may not be very accurate on short blocks (for N<128) due to
> timer granularity (~280 ns per tick). The EC.exe program (and
> source) includes type of loop needed to tests speed of exact EC
> on short blocks (N=32).

I *don't care* about timing. Not now. And if I had to, I would use
my own method of measuring it.

> > It remains to check what M is and whether M is small enough
> > to make the above a realistic argument for the proposed
> > application, namely embedded devices.

> If the total input length L is very small, e.g. 32 or 64 bits,
> the AC and QI times are so short and comparable to timer
granularity and also subject to general OS time-slicing effects,
> that the variation from test to test will be of similar
> magnitude as the times themselves. 

You're shifting off. I will find methods to measure - you shouldn't
care.

> > Pretty small block sizes, actually. There's no blocking
> > requirement for AC, so it may happen that for a given
> > random source AC outperforms QI in the long run. Now
> > since it has been understood that there are random sources
> > where AC might perform better,

> How does that follow? You need a finer resolution of
> the concepts 'input length L', 'table limit N' and coder's
> message lengths n1, n2,... There is no basis in theory
> or in experiment for any such 'conjecture', be it for
> speed or for compression effectiveness.

It is pretty simple: If you encode messages in "blocks", then
the coder apparently cannot use knowledge from the previous
block to encode the next one. Thus, you lose information, thus
you are suboptimal. Whether this information is statistical,
consists of "probabilities" or whatever does not matter. You
cannot get better by blocking, you can only get worse.

> > what are the random sources QI might outperform AC.

> Any. As explained earlier, QI can code exactly as AC, in stream
> mode, if there is anything for which that is the best method,

No, it can't, obviously. If I have to block, I can build an
AC coder - in principle - that carries the statistics over a size
that is larger than a QI block size. See above.

> with the only difference that QI quantization is more accurate
> (due to QI's better utilization of g bit mantissa, which AC
> allows to vary to well below g bits; plus due to QI's integer
> arithmetic without needless infinite fractions which have to
> be dropped by a finite precision AC, one way or the other).


> > And if so, are they relevant to the target application? As
> > I read you, QI shows an advantage in the high entropy domain.
> > What is your target application? (Video/audio data that come
> > out of a predictor are IMHO not 'high entropy' because there's
> > even for lossless a compression gain of 1:2 to 1:4. But then,
> > what's high entropy?. How high?)

> I think you need to clarify first what is "perform"? You seem to
> have switched somewhere along the way from speed (table sizes,
> cache) to compression ratios. 

There are different arguments here. Argument 1) is that I don't
trust your argument about asymptotic optimality. Mainly because
you tell me that you cannot perform the limit N->infinity.

Argument 2) is that you claim that you're faster than AC. I do
not buy this because you're using tables and you're likely running
into cache stalls.

One *can* trade speed for optimality, but I can also do that with
AC. There is nothing wrong with doing so, but then you should state
this.

> For any set of messages M=
> {M1,M2,...}, provided all coders have a "perfect" model of M, QI
> will _always_ compress M to a smaller or equal size as AC
> (trivially, AC or any compressor, can compress individual
> message from M such as M1 shorter by simply assigning it
> codeword of length 1).

Obviously not, namely as soon as M becomes larger than the block size.
But then, optimality is defined for the limit message size -> infinity.

> Of course, with "imperfect model" of M, a random pick of
> codewords (satisfying Kraft inequality) may do better on M than
> any other coder. Regarding the "imperfect" models of M, though,
> as illustrated by array "Vary" (p.10, [T3]), the QI's
> "descriptive" modeling is much more resilient to "surprise" than
> the usual "predictive" AC modeling. The "descriptive" method is
> always "surprised" less than the "predictive" method.

I do not make any claims about any *specific* message. I do not
care about them - not now.

> Also, I have no idea what is this about "high entropy"? 

That was your claim, namely that QI outperforms AC in the
"high entropy regime". So what is "high entropy"? 

> > How to specific high-speed implementations as MQ?

> I have given you [Said04] reference which has results for MQ
> (and other quasi-ACs), along with full precision AC. The MQ is
> only 2-3 times faster than full ACs, which doesn't even come
> close to QI vs full AC ratios, even for generic QI, much less
> for specialized QI version optimized for some range of inputs
> (which would be a more fair comparison against specialized ACs).

Have you or haven't you made direct comparisons? I don't trust
going into tables and comparing implementations cross-wise. The
problem is that the data is possibly based on different 
implementations.

> > I don't deny that QI might be interesting. However,
> > I don't like unproved claims...

> Placing the source code for public access is equivalent to
> publishing a mathematical proof (in a non-peer-reviewed preprint).
> Now, if you can't compile the source on your system, that is
> equivalent to saying you can't read the published proof because
> you don't read English.

No, that's because you don't write English. There's a well-established
standard for C sources, namely C89 or C99. Pick it, work in it, and
I can compile it. <windows.h> is not in C89, nor in POSIX or whatever.

> That is not the same thing as "unproven
> claims." There are hundreds of millions of win32 PCs which can
> compile or run the [QIC] code.

That still doesn't make win32 available on my desktop. And just
because it is wide-spread it still doesn't make it ANSI-C.

> The QI source code has been out there. With the level of
> hostility from some corners, anyone who wanted to empirically
> falsify that claim has had plenty of chances. Or, with preprints
> publicly available, which show why it is so, one could have
> shown that some key mathematical step is fatally wrong, making
> the whole scheme flawed. Aside from minor typos and a minor
> complaint of one of the authors whose EC contribution was cited
> in [T3] (he thought I should have put the emphasis on a different
> aspect of his contribution, which I will in the next revision),
> no falsification of even a minor point has turned up.

Just by claiming that there is no error, or because no-one has
found one, you cannot prove that there is none. I'm on the
road to understanding and measuring, though the mentioned article
is, sorry to say, not very readable, and the code is not very
readable, and not very compilable either. I'm willing to invest
*some* work, but you also have to do your homework.

> > Does it? With blocking? It means that the "memory"
> > of the coder and thus probabilities are only based
> > on a relatively short sample set, where "short" =
> > block size.

> The blocking doesn't limit the modeler state variables or how
> the modeling is done. 

So what's blocking then? Is it ever possible to get a *clear* and
*short* answer? Do you or don't you restart the encoder on a block
end?

> The blocking only limits how big is the maximum single index
> that coder can compute. 

Thus, it *does* limit the asymptotic optimality.

> If the enumerative class provided by
> modeler has sequences longer than coder's table limit N, coder
> breaks them into smaller blocks. The block boundaries don't
> cause any bit fraction loss since QI codes these fractions via
> mixed radix codes (cf. N4 p. 9 [T3]).

> Your conclusion is no different than concluding that one can't
> compute Pi to more than 15-16 digits on a PC, since that is how
> many digits floating point unit handles in a 64 bit 'double'.

> > Example: Consider a source consisting of two first-order
> > Markov sources A and B. At an even timestep, I draw from
> > A, at odd timesteps, I draw from B. With modelling, the
> > implementation of an AC coder that is optimized to this
> > situation is easy. What would I need to do with QI? Not
> > exploiting the special nature of this source might be
> > very wasteful.

/* snip */

> QI would simply classify odd and even elements into two separate
> enumerative classes, each having its own index. 

Fine. How do I tell QI that this is a useful thing to do? If I
have pre-knowledge on the sequence as in the above example, how
do I tell the back-end?

/* snip */

You're again posting things I haven't asked for. I asked for an
algorithm that allows me to drive QI optimally for this random
source. Do I need to do anything special?


> > If you care about compression performance, the only
> > thing that is meaingful is the limit N->infinity.

> Why? 

Because things like entropy and optimality only make sense in
this limit. There is no mathematically meaningful definition 
otherwise.

> > Is there a finite memory implementation of QI that
> > runs optimal in this limit?

> The AC and QI implementations tested were limited to max inputs
> of 2^30 bits. Any finite-precision coder, AC or QI or any other,
> has redundancy terms O(N) due to quantization, hence none of
> them will approach the entropy in the limit N->inf. For the
> binary alphabet, QI's O(N) term is 4 times smaller than AC's
> O(N) term (see (A3) and the refs there, already discussed).

I haven't asked this. Can I, or can't I, modify QI such that
it doesn't do blocking? Can I, or can't I, run QI such that it
runs optimally for infinitely long messages?

Please keep yourself *short*.

> > Whether you call this "descriptive" (as in, I have counted
> > so and so much sequences of this kind) or "predictive" (as
> > in, I predict the probability of a symbol due to the counts)
> > is just a matter of language. Thus, what *is* the difference,
> > leaving language issues alone?

> No, that is not the difference. In the "predictive" modeling
> scheme the modeler is trying to predict the next symbol at
> position i+1 (in the sense of calculating probabilities for the
> values of the next symbol) based _only_ on the symbols at
> positions 1,..,i. In "descriptive" modeling there is no such
> restriction on what kind of correlations the symbols are allowed
> to have. Whenever the symbols 1..i are not correlated well enough
> with symbol i+1, be it because i is too small, or because 1..i is
> low entropy data with too little info altogether, or because of
> the nature of the data, the predictor pays a price for the wrong
> presumption.

> Of course, any practical predictive modeler has to cheat the
> "ideal" predictor scheme and use various ad hoc Escape schemes
> (where the encoder is allowed to look ahead of the decoder and
> clue it in about a sudden change) to get around the worst-case
> penalties for entirely wrong predictions that an "ideal"
> predictor would pay.

> Consider the array "Vary" = int32 { ..-2,-1,0,+1,+2,.. } which was
> shown in [T3] p.10. { Any array with sudden changes will do
> here, but this one had the advantage (in view of the 10 page
> limit) that I could specify it unambiguously in less than half a
> line. }

> Here you have a big "surprise" in the middle of the array, where
> mostly 1's change to mostly 0's. Also, within each 32-bit block
> there are little "surprises" of a similar kind. Both QI and AC
> were order-0 coders (i.e. no correlation assumed from a[i] to
> a[i+1]), and both are allowed to assume a quasi-stationary
> source, i.e. that the densities of 1's & 0's may vary along the
> array. The predictive Moffat98 AC did very poorly here,
> especially for shorter arrays, where at N=4K it produced an
> output twice as long as QI's. The QI quasi-stationary order-0
> modeler used was very simple: it looks at the whole array as a
> single block and splits the block in exact half if the two halves
> would encode shorter (including the cost to indicate the split,
> which uses longer Huffman codes for the counts), and it is
> allowed to keep splitting down to 1K and then stop. { NOTE: There
> are much better ways to do this kind of modeling via sliding
> window splits, which are nearly as fast as the simple method used
> but much more accurate. There are also better ways to code the
> counts instead of precomputed Huffman tables for the binomial
> distribution, but none of this was used in the tests in [T3].}

But that is only a different look at the scheme. You also pay a
price for the surprise, if you want to say so. You need to pay
the price for encoding where to split the block. Whether you
call this the price for misprediction, or the price for
splitting the data, does not matter.

> > a) not to state results for tests that haven't
> > been verified yet.

> I consider providing public source code in the most commonly
> available standard language, C, for the most common platform,
> win32/MSVC, equivalent to providing a mathematical proof in a
> conventional formalism and in the most commonly read human
> language.

The most commonly available standard language for C is ANSI-C. It runs
on all platforms. I do care about the rest.

> To take the full advantage of this largely unexplored
> algorithmic territory (A3), being opened (the wonder
> algorithm of the 90s, the Burrows-Wheeler transform
> with its mesmerizing beauty, was a mere early precursor
> of the powerful 'magical' algorithms from this territory),
> there is lots of work to be done for anyone who decides
> to explore these virgin lands ahead of the crowd.

> For the existent AC modelers which can't be upgraded, QI can
> still produce compression equal to AC's, while coding much
> faster, via method (b) described earlier. { This is an area I
> am less interested in, although I may do some QI adaptations of
> this kind for our company clients if that is requested. }

Look, I do not, and never did, deny that you might have a point; I
need to understand this point, and I will understand it even better
(and I do understand it more now than before), but for that, it
takes time. It takes more time if the arguments you give are less
scientific and more advertising, because I react allergically to
the "hype language" you tend to use. Thus, please have some
patience with me. Be *shorter*, and more to the point. Quote less,
write less. This post is much longer than it should be.

So long,
	Thomas


0
Thomas
1/25/2006 10:07:09 AM
>> The tendency of advocating one algorithm
>> without presenting the shortcomings.

Well, QI is not a one-trick pony, i.e. one algorithm. It is an
optimal solution for the EC precision problem, as shown in eqs.
(21)-(23) in [T3]. Hence one can view it as a class or family
of algorithms. This is basically what the summary [M1] states
right at the top:

M1. QI Summary (solutions for AC shortcomings, potentials)
http://groups.google.com/group/comp.compression/msg/27c6f329038a1bdc

A shortcoming of QI would be something that is intrinsic to
QI, not to some particular algorithm from the family, much less
to a particular implementation of any such algorithm. The only
shortcoming of QI I am aware of is its 'youth', especially in
view of elements (A2) and (A3) in [M1]: to realize its
maximum potential, a new kind of modeling (cf. [T2] pp. 27-35)
needs to be developed and implemented. Without that, QI "only"
offers a large speed & power consumption advantage over AC and a
slight compression edge (if one discounts the row "Vary" on p.
10 [T3], since an AC modeler can be implemented to model such
cases the 'descriptive' way).

Hence, I don't consider objections to a particular algorithm
from the family (or to its particular implementation) to be a QI
shortcoming, provided there are well known methods for resolving
them, or there are already variants that resolve them, or there
are mathematical solutions which can be implemented using well
known methods.

Recalling much of the argument in this and the previous
thread, it seems much of the misunderstanding came from
considering QI as a single, cast-in-stone algorithm.

>> The pertinent comparison is between mul/div vs memory read
>> (RAM or ROM). The latter takes less power.
>
> This is an unfair comparison because the full
> operation of performing the read a) stalls
> the program and by that b) draws more power.
>

It doesn't have to cause a stall if you organize the computations
to be cache-aware, as pointed out in (C2) in post [M2].

M2. Cache, Speed, Clarification for [M1]
http://groups.google.com/group/comp.compression/msg/ad14695608d27f6f

The stall will occur only if the immediate next instructions are
conditioned on that memory value to proceed. (The processors
which have large ratios of CPU/RAM speeds are normally pipelined
and will execute such independent reads in parallel.) Since
QI table accesses have a very orderly nature, it is easy to
organize such memory reads to avoid stalls or any cache
penalty for all practical purposes.

> So it's O(N^2). I don't care about the factors (and
> it is unusual to care about them).

If you're considering cache limitation, as we were doing there,
the factor 4 in table size makes a difference.

>> The N is not the size of the message, it is table size.
>> The table size determines max block which can be indexed
>> as single index.
>
> In which relation is N to the size of a block? Linear?
> I want a dependency on the block size. (Every codec can
> be blocked, but that's a sub-optimal solution because
> you lose statistics.

There is no reason to lose any statistics or any information at
all. The QI table limit NT (let's use a less confusing symbol
for it) is equivalent to the CPU register limit of 32 bits. That
does not limit your additions to 32-bit integers, provided you
propagate the carry into the next 32 bits on longer integers.
Your C code and algorithms may be entirely oblivious to any such
low-level details.
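
The analogy in code is just multi-word addition with a carry; the
32-bit limit of the add never limits what can be added:

#include <stdio.h>

typedef unsigned int dword;
typedef unsigned long long qword;

// add two 64-bit values held as pairs of 32-bit words
void add64(dword a[2], const dword b[2])
{ qword lo = (qword)a[0] + b[0];
    a[0] = (dword)lo;
    a[1] = a[1] + b[1] + (dword)(lo>>32);   // carry into the next word
}

int main(void)
{ dword a[2]={0xFFFFFFFFu,1}, b[2]={2,0};   // 0x1FFFFFFFF + 2
    add64(a,b);
    printf("0x%08X%08X\n", a[1], a[0]);     // prints 0x0000000200000001
    return 0;
}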

For the QI modeler, the coder's table limit is exactly that
kind of limitation. The modeler could in principle ignore it
completely, provide enumerative classes of any lengths, and let
the coder break them down, index them in NT-sized chunks, and
use the mixed radix code to remove the interblock bit fraction
gaps (as explained with your even/odd coder example and the
I1, I2 coding in mixed radix M1, M2; also in N4 p. 10 in [T3]).

In practice, if one is looking for fast and overall competitive
implementations, the practical modeler should account for NT
value and the coder should account for cache limitations in
selecting its NT. That is not something peculiar to QI, since in
any domain, each layer of a competitive practical implementation
of an algorithm needs to take into account the basic parameters
and the limits of the adjacent layers it deals with.

> Or more specifically, you cannot be asymptotically optimal
> if you cannot push N towards infinity.) Specifically,
> I'm interested in the behavior for N -> infinity.

The input size N (symbols) can be pushed to infinity. For QI
with a given NT limit, that only means that the number of blocks
NB=N/NT needs to go to infinity. With QI, due to its precise
integer arithmetic with quantized addends, the blocks are
exactly like digits of a mixed radix number, where the radix
values Rj for the j-th digit (e.g. for a binary coder of a
stationary source) are the quantized binomials C(NT,Kj), where
Kj is the count of 1's in the j-th block.

The special convenience of the radices Rj is that these are
sliding window integers (SWI, p. 7 [T3]), thus they have the
form: Rj=(Mj,Ej)=Mj*2^Ej. Hence the computation of the binary
form of a mixed radix number with given digits Dj (0 <= Dj < Rj,
where Dj is the index for the j-th block) factors into simple
computations in the radices Mj, which are g-bit integers (as
explained in the previous post with I1 & I2, M1 & M2; see also
N4 p. 9 in [T3]; [QIC] includes a very practical, ready-to-use
mixed radix coder, including a special permutation coder, all of
which should run circles, in compression and speed, around
anything you can do with AC on that kind of input; plus the
properties (A4) described in [M1]).
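
In code, packing the block digits Dj in their g-bit radices Mj is
just a Horner evaluation with a multi-word carry. A bare-bones
sketch (an illustration of the arithmetic only; the [QIC] radix
coder itself is organized differently):

#include <stdio.h>

typedef unsigned int dword;
typedef unsigned long long qword;

// acc (little-endian dwords, *len words used) = acc*m + d, carry propagated
void mul_add(dword *acc, int *len, dword m, dword d)
{ qword carry=d, t; int i;
    for(i=0;i<*len;++i)
    { t = (qword)acc[i]*m + carry;
      acc[i] = (dword)t;
      carry  = t>>32;
    }
    if(carry) acc[(*len)++] = (dword)carry;
}

// pack digits D[0..nb) with radices M[0..nb), 0 <= D[j] < M[j], via Horner:
// value = D[0] + M[0]*(D[1] + M[1]*(D[2] + ...));  M[nb-1] itself never
// enters the product, as in any positional system
void radix_pack(const dword *D, const dword *M, int nb, dword *acc, int *len)
{ int j;
    *len = 0;
    for(j=nb-1; j>=0; --j) mul_add(acc,len,M[j],D[j]);
}

int main(void)
{ dword D[3]={7,2,5}, M[3]={10,3,9};     // 7 + 10*(2 + 3*5) = 177
  dword acc[4]; int len;
    radix_pack(D,M,3,acc,&len);
    printf("%u in %d dword(s)\n", acc[0], len);
    return 0;
}

Decoding is the same loop run backwards, with division and
remainder by the same radices.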

The upper bound on QI redundancy due to g-bit quantization is
DQ(g) = 2*log(e)/2^g bits per (input) symbol. That is 4 times
smaller than the corresponding upper bound for AC, DA(g) =
8*log(e)/2^g, using the same g-bit precision (see [M1] & [M2]
for refs & explanation). Thus neither coder, when limited to g
bits of arithmetic precision, will produce output which
converges to the entropy for N->inf. To obtain the latter, both
coders need a precision g which grows as log(N). Of course, at
any g(N), the QI max excess per symbol DQ(g(N)) remains 4 times
smaller than the AC max excess per symbol DA(g(N)).

As a finer detail for the N->inf analysis, the block limit NT
needs to grow as O(log(N)), since the QI mixed radix coding of
block indices introduces (for NB>2) redundancy of 1/2^NT bits
per block. Since NT>g (in practice NT>>g), this higher-order
correction to DQ(g), although in practice entirely negligible
for any commonly used NT, g and N (e.g. g=32, NT=1K, N=2^32),
becomes important for N->inf and g(N)->inf. An implication of
NT(N)=O(log(N)) is that the number of table entries O(NT^2)
grows as O(log(N)^2) as N->inf, and since the entry size is
g(N)=O(log(N)), the total QI table size grows as O(log(N)^3) in
the N->inf limit, if asymptotic optimality is required (hence
the limited-precision requirement is dropped).
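
For a feel of the numbers, here are those two bounds evaluated
directly (nothing beyond the formulas quoted above):

#include <math.h>
#include <stdio.h>

int main(void)
{ int g;
    for(g=16; g<=32; g+=8)
    { double log2e = 1.0/log(2.0);            // log2(e)
      double dq = 2.0*log2e/pow(2.0,g);       // QI bound, bits per symbol
      double da = 8.0*log2e/pow(2.0,g);       // AC bound, bits per symbol
        printf("g=%2d  DQ=%.3e  DA=%.3e  DA/DQ=%.1f\n", g, dq, da, da/dq);
    }
    return 0;
}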

The actual precisions g selected in practical implementations
are chiefly determined by the available register sizes, which
are normally such that the machine MemorySize <= 2^RegisterSize
(in order to be able to address the entire memory), which then
leads to automatic fulfillment of the g(N)=O(log(N)) condition
if you use g=RegisterSize. The table size limits NT are, with
QI, "much" larger than g (otherwise you would be using exact EC,
since the binomials grow more slowly than O(2^N); e.g. NT=32 and
g=32 requires no table quantization), hence NT will
automatically satisfy the asymptotic requirement NT(N)=O(log(N))
if g(N) does and if any quantization of the binomials is done at
all.

As noted in [M1] & [M2], for any finite N, AC has an additional
absolute excess (on the total output) of O(1), which is 2-4
bits, plus O(log(N)) bits in the case of adaptive and
semi-static AC. The point of section (A2) in [M1] was that even
though these terms are ignored in the conventional
coarse-grained N->inf analysis, that approach overlooks certain
types of practically important finite sequences, rendering them
"incompressible" in practice simply because AC's O(1) &
O(log(N)) "noise" precludes effective modeling and packaging
algorithms.

> You're sticking too close to the machine. If I'm discussing
> optimality, I don't care about cache sizes. This
> is a second argument that then has to follow as soon
> as the code should be made fast. *NOT NOW*

These are parallel tracks of the discussion. You and
others have brought up cache issues, hence they had
to be answered. For N->inf asymptotics, one can
ignore such "details". For N->inf, the QI is
asymptotically as optimal as AC, but with
uniformly 4 times smaller (in binary case) distance
from the entropy than AC, at any precision value
g(N)=O(log(N)).

> By limiting the table size, you limit the quality
> by limiting the block size.

Hopefully, the above has convinced you that this is
not the case. The table limit NT is exactly as problematic
as the precision limit g, i.e. not a problem at all if
you allow them to grow as O(log(N)) in the N->inf limit.


> It is pretty simple: If you encode messages in "blocks",
> then the coder apparently cannot use knowledge from
> the previous block to encode the next one.

That is a completely gratuitous requirement. There is
no reason for QI modeler to forget any information from
block to block. Or to even care, in principle, about
the blocks any more than it has to care about the
arithmetic precision g or the machine register size.

>> As explained earlier, QI can code exactly
>> as AC, in stream  mode, if there is anything
>> for which that is the best method, ...

M3. Methods (a) and (b) of QI coding for AC modeler:
http://groups.google.com/group/comp.compression/msg/1314ff87da597fad

> No, it can't, obviously. If I have to block, I
> can build an AC coder - in principle - that
> carries the statistics over a size that is
> larger than a QI block size. See above.

Again that gratuitous assumption of QI modeler
memory being erased on block boundaries. As
explained above, that is not necessary at all
and QI doesn't do that.

As a minor detail, the method (a) described there is a pure
streaming coder; no blocks are assumed there (since it never
reaches the end of a block). It is basically equivalent to
computing the binomials via multiplicative recurrences of the
type C(n+1,k+1)=C(n,k)*(n+1)/(k+1), with only the two rows n and
n+1 of the table kept (for speed). This is exactly the
calculation that AC does, only in the opposite direction, from
n+1 -> n, i.e. the AC calculation is C(n,k) =
(k+1)/(n+1)*C(n+1,k+1), where it interprets the factor
(k+1)/(n+1) as the probability of 1's (k is the count of 1's).
In other words, AC is simply the extreme point in the spectrum
of QI's possible table size vs speed tradeoffs (see N2 p. 9
[T3] for the rest of that spectrum), in which the tables are
minimized to size 0, while the coding time is maximized due to
having to compute all the binomials (the addends of eq. (21)
[T3], in general), on LPS and MPS symbols, inside the coding
loop (see pp. 19-25 in [T2] for more details on the relation
between AC & QI coding).
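
For illustration, here is that "zero table" extreme in a few
lines, with a double standing in for the g-bit sliding window
value (this is just the recurrence itself, not an encoder):

#include <stdio.h>

int main(void)
{ const char *bits="0110100011";     // sample input
  double c=1.0;                      // C(0,0)
  int n=0, k=0, i;
    for(i=0; bits[i]; ++i)
    { if(bits[i]=='1') { c = c*(n+1)/(k+1); ++k; }  // C(n+1,k+1) from C(n,k)
      else             { c = c*(n+1)/(n+1-k);     } // C(n+1,k)   from C(n,k)
      ++n;
      printf("n=%2d k=%2d  C(n,k)=%.0f\n", n, k, c);
    }
    return 0;
}

One multiplication and one division per symbol, on every symbol;
that is the work the precomputed quantized tables remove from
the coding loop.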

> There are different arguments here. Argument 1)
> is that I don't trust your argument about asymptotic
> optimality. Mainly because you tell me that you
> cannot perform the limit N->infinity.

Hopefully the N->inf analysis above has cleared this
objection.

> Argument 2) is that you claim that you're faster
> than AC. I do not buy this because you're using
> tables and you're likely running into cache stalls.

a) At the theoretical level, QI does less work per coding step
because its better division of labor at all levels takes the
universal properties of the symbol sequences out of the coding
loop (see (A1) in (M1)). Hence, among other things, it uses no
coding operation on MPS, and it does much less work on LPS.
Further, the interaction between the modeler and the coder is
minimized, so that the modeler passes its coding tasks in the
maximum size batches of symbols available, instead of the
symbol-by-symbol interaction done within the AC modeling
interface.

b) At the empirical level, where cache considerations enter the
picture, in addition to the points (C1)-(C4) in (M2), with the
source made available, I consider that matter closed (you can
ask Jasen Betts for his unix version, which has windows.h &
conio.h substitutes, plus the needed changes in Qiutl.c).

I gave figures I obtained without even using the main cache
related optimizations (C2) and (C4), and anyone can challenge
them. With (C2) alone, one can for all practical purposes remove
any cache miss penalty, while (C4) (the sparse coder, which is
included in the [QIC] source) only further improves the largest
of the speed ratios.

> One *can* trade speed for optimality, but I can also
> do that with AC. There is nothing wrong with doing
> so, but then you should state this.

QI can trade it, too. For example, if it were competing against
a runlength coder on sparse data, it would not need to use high
precision tables (g=32), or large tables (the sparse coder needs
much smaller tables), or very tight coding for the counts, or
mixed radix coding of the block boundaries (the tapered Huffman
code of D1, D2,... will give it the bit fraction to within .086
bits on average). These and similar tradeoffs would improve the
speed significantly at the expense of precision (which can still
be kept better than that of the runlength coder).

For the competition against the AC best known for its
combination of speed and accuracy (Moffat98), different
tradeoffs were used (mixed radix for the blocks, but still only
Huffman for the symbol counts and data sizes), and the results
are shown in [T3]. That particular balance of speed & accuracy
is still plenty fast enough to run circles around any quasi-AC
reported in the Said04 and Moffat98 papers.

With the [QIC] source public, that's all I can do from my end
with the time I had available. Even though QI was conceived
in July 2004:

M4. QI history
http://groups.google.com/group/comp.compression/msg/f9dc68cc361bb740

hence about a year and a half ago, the work on it was and is
heavily multiplexed with the regular work requirements,
throughout. With so many interesting developments with QI going
on (especially with the new modeling along the lines of (A2) &
(A3)), fiddling with varieties of ACs and public source code
variants, or even writing the reports & preprints to everyone's
satisfaction, is like chewing a week-old McDonald's burger while
a platter of premium filet mignon broiled to perfection sits in
front of me.

> Obviously not, namely as soon as M becomes larger than
> the block size. But then, optimality is defined for
> the limit message size -> infinity.

That should be clear by now.

>> Also, I have no idea what is this about "high entropy"?
>
> That was your claim, namely that QI outperforms AC in the
> "high entropy regime". So what is "high entropy"?

I don't know which particular "high entropy" statement this
refers to. QI does outperform AC at any entropy. The relative
compression gains, as is evident from [T3] (or from the analysis
of the redundancies), increase for smaller outputs, since the
O(1) and O(log(n)) terms become significant. As explained in
(A3) [M1], the observation that these gains are small is a
tautology. The potential gains are much larger than just the few
percent shown in [T4]. That, of course, has to be shown in the
future. (A3) merely gives heuristic reasons (based on my own
experiments & math over the past year and a half, all still
unpublished) why one should expect so.

>> for specialized QI version optimized for some range of inputs
>> (which would be a more fair comparison against specialized ACs).
>
> Have you or haven't you made direct comparisons? I don't trust
> going into tables and comparing implementations cross-wise.

Not against quasi-ACs or any AC other than Moffat98. Here in
this group we saw Matt Mahoney show his results for the high
entropy limit on an A=3, N=10^6 sequence. I tried the same with
QI (the QI.exe in the released [QIC] kit already does that, so
anyone can verify), and also against Moffat98 (which did worse
than Matt's AC on compression), and there is no contest, not
even close, in speed or compression on that type of sequence
(Matt didn't give his timings, but his coder doesn't unroll
loops as Moffat98 does and it uses stream i/o, so it will be
slower than straight array coders). The QI radix codes use the
A-row skipping method (cf. N2 p. 9 [T3]), which makes the O(N)
redundancy independent of the radix, while with AC the O(N) term
grows as N*A. Hence the case A=3 was the best non-binary case
for AC, and still there was no contest.

The cross-table comparison was only good enough to give me a
basic idea of where the quasi-ACs roughly stand vs Moffat98 and
other full-precision coders, and whether it was worth spending
time doing my own tests on them (I concluded that it was not).

> There's a well-established standard for C sources, namely
> C89 or C99. Pick it, work in it, and I can compile it.
> <windows.h> is not in C89, nor in POSIX or whatever.

The coding functionality, with _ASM in Intro.h set to 0, is
very generic C (provided you set the 8, 16, 32, 64 bit integer
typedefs in Qitypes.h if they differ from those set there). Only
the timer and some keyboard checks in the upper-level tests are
windows/MSVC specific. Removing those can answer all but the
speed questions.

Since speed is an important element of the difference, ANSI C,
without a standard microsecond-or-better timer, would require
more elaborate test functions which loop many times over each
single aspect (generate, code, decode, compare). The problem
with that approach is that it would introduce distortions in the
cache effects, which would go in favor of QI (allowing it better
use of the cache by running a single specialized component with
fixed input at a time). For fairness I chose the less
objectionable path of using the win32 high-resolution timers.

>> The blocking only limits how big is the maximum single index
>> that coder can compute.
>
> Thus, it *does* limit the asymptotic optimality.

As explained above, not at all.

>> QI would simply classify odd and even elements into
>> two separate enumerative classes, each having its
>> own index.
>
> Fine. How do I tell QI that this is a useful thing to do?
> If I have pre-knowledge on the sequence as in the above
> example, how do I tell the back-end?

For AC, with its one-shoe-fits-all approach, all of the
modeler's info about the sequence gets funneled through the
'probabilities of the next symbol'. With QI, which uses a much
richer and more precise language of finite sequence parameters,
there are many ways to tell the coder what to code here. The
general pattern for doing this is described in [T2] p. 27, and
elaborated in the rest of that chapter (on QI/EC modeling). As
with any streamlined division of labor, more thought is needed
upfront, so that less work, and a better job, is done later, in
the coding loop (that's a special case of the old 'ounce of
prevention' vs 'pound of cure' difference).

In your example, one could use the general AC modeler interface
described as method (b) in [M3], which understands multiple
probability classes and the splitting of the input for different
enumerators. We wouldn't have the modeler call the coder on each
symbol; instead it would call the enumerator to enumerate one
array (which the modeler may extract separately), then call the
enumerator for the other array. One can streamline the work in a
variety of ways (which could handle more general interleaved
type inputs) so that the modeler doesn't need to actually copy
the sub-arrays, just as one doesn't create N literal BWT rows
and sort them.

None of this should be any more complicated than AC modeler
setting up two contexts and alternating them from call to call.
The main difference here is that we wouldn't traverse the whole
coder+modeler hierarchy with calls and function entries, symbol
by symbol, but would make the modeler do the separation into
enumerative classes first (be it as explicit copy of sub-arrays
or as init() of some more general interleaved inputs enumerator,
which are needed in (b) when dealing with Markov sources,
anyway), then the enumerator does the requested enumerations of
the given complete sequences. If you want to do less work, do it
faster, more accurately and with lower power consumption, that's
the way to do it, here or in any program or generally in any
production (or even in mental work).

The packaging of the components would use QI's generic mixed
radix packager (used in the multi-block and multi-alphabet
coders). The package normally includes the number of components
and their sizes as parameters (which are packaged as well).
Since this is not simple order-0 coding, one would also have to
include a description specifying the number of AC probability
classes that method (b) has to include (unless this is assumed
to be known to the decoder). Describing this info, in case it is
not known upfront, will cost you the same as with AC given the
same upfront information. If you use adaptive AC, the 'learning'
will generally cost it O(log(n)) bits more than optimally
encoding the exact same information separately.


> Because things like entropy and optimality only make
> sense in this limit. There is no mathematically
> meaningful definition otherwise.


That is only asymptotic optimality, which is just one
low-resolution fact in the realm of finite sequences.

In any case, QI is asymptotically optimal, too, provided you
don't set any limit on the precision g or the table size NT,
just as is the case for AC or any other coding algorithm. At a
higher-resolution view, QI's quantization excess is 4 times
smaller (for A=2) than that of AC at any given arithmetic
precision g. For multi-alphabet coding the AC quantization
excess per symbol grows as O(A), while QI's remains O(1) in its
most accurate mode (as shown in the [QIC] source, which includes
a high entropy limit coder usable for any A < 2^32).

> Can I, or can't I, modify QI such that it doesn't do
> blocking? Can I, or can't I, run QI such that it
> runs optimally for infinitely long messages?

The two questions are unrelated. As indicated above, the
answer is yes to both.

> But that is only a different look at the scheme.

It is not just a different perspective on the same modeling.
There is a tangible mathematical constraint on the predictor as
to what and how it can correlate, and no such restriction on the
descriptor. While asymptotically, for n->inf, their outputs do
converge to the same limits, for n<inf (which is all that anyone
will ever code) the descriptive method has a measurable
advantage, such as the more than two times smaller output on
"Vary" (or any similar sequence) at n=4K.

> You also pay a price for the surprise, if you
> want to say so. You need to pay the price for
> encoding where to split the block. Whether
> you call this the price for misprediction, or
> the price for splitting the data, does not matter.

Yes, but you pay much less in the descriptive mode. You don't
even need QI to test the array "Vary" (or any similar input with
large surprises): just use two AC coders, one in predictive and
one in descriptive mode. (You could even give it to a student as
a little term project.) The difference in cost is due to the
predictor being very wrong about LPS vs MPS on every input
symbol, a full N/2 times in "Vary", while the descriptor is wrong
only in coding the model information, which is of size O(log(N)).
Hence the descriptor, thanks to its clean separation of the
O(log(N)) model info from the primary O(N) output, can localize
the damage of the "surprise" to the O(log(N)) component. This is
a quite easily verifiable empirical fact; a back-of-the-envelope
version is sketched below.
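
Since the exact "Vary" sequence is defined earlier in the thread,
the sketch below uses only a stand-in assumption: an n-bit input
whose bit bias flips at the midpoint (first half P(1)=p, second
half P(1)=1-p). It compares the ideal cost of a single order-0
model over the whole input with per-half coding plus an O(log(N))
model description; no actual coder is run and all parameters are
illustrative.

//-- Ideal code lengths: one global model vs per-segment description
#include <stdio.h>
#include <math.h>

static double h2(double p)            // binary entropy in bits
{ if (p <= 0.0 || p >= 1.0) return 0.0;
  return -p*log2(p) - (1.0-p)*log2(1.0-p);
}

int main(void)
{ double n = 4096.0, p = 0.05;
  double whole, halves, model;
    /* one order-0 model: overall fraction of 1s is 1/2, so ~n bits */
    whole  = n * h2(0.5*p + 0.5*(1.0-p));
    /* descriptive: each half coded with its own counts, plus
       O(log n) bits for the split point and the two counts */
    model  = 3.0 * log2(n);
    halves = 0.5*n*h2(p) + 0.5*n*h2(1.0-p) + model;
    printf("single model : %8.1f bits\n", whole);
    printf("two segments : %8.1f bits (incl. %.1f model bits)\n",
           halves, model);
    return 0;
}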

0
nightlight
1/25/2006 9:20:21 PM
--- Errata:

> As a finer detail for the N->inf analysis, the block limit NT
> needs to grow as O(log(N)), since the QI mixed radix coding of
> block indices introduces (for NB>2) redundancy of 1/2^NT bits

The "1/2^NT" above should say "2*log(e)/2^g" (since for the described
method of block boundary coding in multiblock QI via mixed radix codes,
the quantization  is done on the g-bit radices Mj not on the NT-bit
radices Rj; the latter would be quite impractical for NT larger than 64
bits on 32 bit processors). This doesn't affect any conclusions, but
one statement needs a slight refinement.

A more precise expression for QI's quantization redundancy DQ(g), for
the multiblock coding method described and with the block boundary
effects included, is DQ(g) = 2*(1+1/NT)*log(e)/2^g. That slightly
refines the statement that QI has "4 times" smaller quantization
redundancy than AC (which is exact in the single-block case) to the
more precise multiblock-case statement "4/(1+1/NT) times". E.g. for
the [QIC] default value NT=1K, the "4 times" would read "3.996 times",
if one is speaking at that level of precision.
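
A quick numeric check of the corrected expression and of the
quoted ratio (NT=1K is the default mentioned above; g=32 is used
here only as an example precision):

//-- Check DQ(g) = 2*(1+1/NT)*log(e)/2^g and the 4/(1+1/NT) ratio
#include <stdio.h>
#include <math.h>

int main(void)
{ double g = 32.0, NT = 1024.0;
  double log2e = 1.0 / log(2.0);              // log(e) in base 2
  double DQ    = 2.0 * (1.0 + 1.0/NT) * log2e / pow(2.0, g);
  double ratio = 4.0 / (1.0 + 1.0/NT);
    printf("DQ(g=32, NT=1K)          = %.3e\n", DQ);
    printf("quantization ratio AC/QI = %.3f  (4 exactly for one block)\n",
           ratio);
    return 0;
}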

0
nightlight
1/26/2006 4:47:50 PM
=== Update to Quantized Indexing RefLib page ===
    http://www.1stworks.com/ref/RefLib.htm

    Quantized Indexing Home:
    http://www.1stworks.com/ref/qi.htm

* Local copies of several external papers (which had
  broken links) were added

* Added Richard Brak's AMSI lectures on lattice path
  combinatorics

=== Updates ===

8a. R. Brak
    Enumerative Combinatorics (lattice paths) Lectures,
    AMSI Summer School,  2003
    http://www.1stworks.com/ref/combin_P1.PDF

28. L.D. Davisson
    Universal noiseless coding
    IEEE Trans. Inform. Theory  IT-19 (6), 783-795, 1973
    http://www.1stworks.com/ref/Davisson1973Universal.pdf

29. J. Rissanen, G.G. Langdon
    Universal Modeling and Coding
    IEEE Trans. Inform. Theory  IT-27 (1), 12-23, 1981
    http://www.1stworks.com/ref/Rissanen1981Universal.pdf

30. A. Barron, J. Rissanen, Bin Yu
    The minimum description length principle in coding
    and modeling
    IEEE Trans. Inform. Theory  IT-44 (6), 2743-2760, 1998
    http://www.1stworks.com/ref/Barron1998minimum.pdf

33. R. Krichevsky, V. Trofimov
    The performance of universal encoding
    IEEE Trans. Inform. Theory  IT-27 (2), 199-207, 1981
    http://www.1stworks.com/ref/Krichevsky1981performance.pdf

53. M. Feder, N. Merhav, M. Gutman
    Universal prediction of individual sequences
    IEEE Trans. Inform. Theory  IT-38 (4), 1258-1270, 1992
    http://www.1stworks.com/ref/Feder1992Universal.pdf

54. J. Rissanen
    Universal coding, information, prediction, and estimation
    IEEE Trans. Inform. Theory  IT-30 (4), 629-636, 1984
    http://www.1stworks.com/ref/Rissanen1984Universal.pdf

55. M.J. Weinberger, J. Rissanen, M. Feder
    A universal finite memory source
    IEEE Trans. Inform. Theory  IT-41 (3), 643-652, 1995
    http://www.1stworks.com/ref/Weinberger1995universal.pdf

0
nightlight
2/8/2006 5:49:04 AM