As an illustration of how simple enumerative coding is, below is the actual source code for the exact EC (for 32 bit inputs only, although one could extend it to 64 bit fairly easily). To display properly use fixed font for this post.

//-- Encode bit-string x to enum index I

dword enc32(dword x)
{ int k,n;
  dword I;
    k=1; I=0;
    while(x)               // assumes that unused/higher bits in x are 0
      {
      n=loBit(x);          // get bit offset of the lowest bit set to 1
      x&=x-1;              // clear the lowest bit=1 in "buffer" x
      I+=bc32(n,k++);      // add binomial C(n,k) to the index, k=# of ones
      }                    // increment count of ones, k
    return I;              // return enumerative index I
}

//-- Decode enum index I to bit-string

dword dec32(dword I,int n,int k)
{ dword x,b;
    x=0;
    do{
      x<<=1;               // fill in decoded bit as 0 (at position 0)
      b=bc32(n,k);         // find the largest binomial coefficient C(n,k)<=I
      if (I>=b)            // check if we can subtract b from I
        {                  // ==> yes, decoded bit is 1
        I-=b; ++x;         // reduce index I and set decoded bit=1
        if (!--k)          // decrement count of 1's and stop if no more 1's left
          {
          x<<=n;           // pad the rest of output with 0 bits (in the low bits)
          break;           // leave the decoding loop
          }
        }                  // ==> no, decoded bit is 0; try next smaller b(n,k)
      }
    while(--n>=0);         // this loop can be made to go faster using
    return x;              // the binary (instead of sequential) search for n
}
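The listing above relies on two helpers, loBit() and bc32(), that the post does not show. A minimal self-contained sketch follows, assuming that bc32(n,k) returns the exact binomial C(n,k) (and 0 when k>n) and that loBit(x) returns the index of the lowest set bit. The helper bodies, ones32() and roundtrip_ok() are stand-ins written for this illustration, not the QI source; enc32() and dec32() follow the post.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t dword;

/* Stand-in for loBit(): index of the lowest set bit (assumed meaning). */
static int loBit(dword x)
{
    int n = 0;
    assert(x != 0);
    while (!(x & 1u)) { x >>= 1; ++n; }
    return n;
}

/* Stand-in for bc32(): exact C(n,k) for 0 <= n <= 32, and 0 when k > n. */
static dword bc32(int n, int k)
{
    uint64_t c = 1;
    int i;
    assert(n >= 0 && n <= 32);
    if (k < 0 || k > n) return 0;
    if (k > n - k) k = n - k;
    for (i = 1; i <= k; ++i)                 /* c == C(n-k+i, i) after each step */
        c = c * (uint64_t)(n - k + i) / (uint64_t)i;
    return (dword)c;
}

/* Encode bit-string x to enumerative index I (as in the post). */
static dword enc32(dword x)
{
    int k = 1, n;
    dword I = 0;
    while (x) {
        n = loBit(x);          /* position of the lowest remaining 1 */
        x &= x - 1;            /* clear that bit */
        I += bc32(n, k++);     /* add C(n,k), k = running count of ones */
    }
    return I;
}

/* Decode index I; n = highest bit position (31), k = number of ones. */
static dword dec32(dword I, int n, int k)
{
    dword x = 0, b;
    do {
        x <<= 1;
        b = bc32(n, k);
        if (I >= b) {          /* bit n is 1 */
            I -= b; ++x;
            if (!--k) { x <<= n; break; }
        }
    } while (--n >= 0);
    return x;
}

/* Helpers for checking: count of ones, and an encode/decode round trip. */
static int ones32(dword x) { int k = 0; while (x) { x &= x - 1; ++k; } return k; }
static int roundtrip_ok(dword x) { return dec32(enc32(x), 31, ones32(x)) == x; }
```

Note that the decoder needs the count of ones as side information: a value x is recovered with dec32(enc32(x), 31, k), where k is the number of 1 bits in x.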


12/24/2005 1:08:35 PM

I received some questions about the source and the algorithm via email and since the matter may be of interest to others who downloaded it, I will reply here as well:

> Concerning the source, I am afraid it is of little use in this form.
> I don't run windows here, and I do not feel like I would really need it.
> Thus, if you'd like to present your work, it would be most useful to
> depend on ANSI-C only.

The main nonstandard VC6/win32 C use in the source has to do with the nanosecond timer & thread control (in Qiutl.c). That was added in case users wish to check the precise performance against other coders. There are a few additional macros in Qitypes.h such as hiBit(), loBit(), shifts & mul/div for various 32/64 bit mixes (which VC6 has implemented very poorly and which are used in mixed radix code). For the next release I'll try to add some control macros in the top level application header "Intro.h" which control how these currently nonstandard C elements are implemented, with an ANSI C variant supported.

The matter of endianness and word size is a bit more involved to extend, and any future generalization there depends on the requests from people experimenting with the code. Right now the coders assume a 32 bit little endian architecture. Extension to 64 bit should be fairly simple (and may be included within the next few revs), while big endian, if it comes up, will require more line by line work through the routines.

> Concerning the algorithm itself: This pretty much reminds me
> on the ELS coder, and I think it might be only beneficial to check
> out some papers on this matter.

The ELS coder is a member of the AC family of algorithms. It is essentially a rediscovered variant of Rissanen's original 1976 AC coder, AC-76 (which presents that kind of additive integer AC in a mathematically much cleaner and more rigorous way than the ELS paper from Pegasus does for their version):

 5. J. Rissanen "Generalised Kraft inequality and arithmetic coding"
    IBM J. Res. Dev. 20, 198-203, 1976
    http://www.research.ibm.com/journal/rd/203/ibmrd2003B.pdf

The key differences between EC/QI and the AC-76 (or ELS) type of coding are discussed in [T2] pp. 19-25. The obvious similarity to EC/QI is not accidental -- Rissanen was trying to solve the very same EC precision problem that QI solves. This story was discussed in more detail in Rissanen's 2nd AC paper:

27. J. Rissanen "Arithmetic codings as number representations"
    Acta Polyt. Scand., Math. & Comp. Sc. Vol. 31 pp 44-51, 1979
    http://www.1stworks.com/ref/ariNumRepr.pdf

and is partially recounted in the intro section of [T3] and a chapter in [T2]. As explained in [T2] pp. 22-23, he could not find the right quantization for the exact enumeration. So he approximated the unlimited precision enumeration first (Stirling, etc), then found how to quantize the latter. QI performs quantization of exact EC directly, without approximating the exact enumeration. The benefits of the new procedure relative to AC-76 (or its rediscovery, ELS) are described in detail in [T2] pp. 19-25. Both the coding accuracy and the speed are improved by the new procedure (the speed particularly dramatically).

Regarding the speed difference, you should note that QI performs no coding operations on the most probable symbol, other than skipping it (at the memory bus speed), while AC-76 & ELS do have to update a much larger coder state (beyond just the bare count of symbols processed). As explained in [T2], AC-76 (or ELS) could construct tables which would allow them to skip as well, but they would need such tables for each source probability.

> It seems to me that your algorithm is a special form of ELS tuned
> to multinomials.

QI is not a special case of ELS or AC-76 -- it is a more accurate and a much faster solution of the same problem. AC-76 and the later ACs can be obtained via approximation of QI (cf. [T2] pp. 22-23).

Note also that multinomials are only a special case of EC/QI addends -- they occur for an order-0 Markov source. E.g. Cover's paper [1] shows an exact EC order-1 binary Markov enumerator. Oktem's thesis [23] has several other types of addends, some of which don't even have closed forms (or at least they didn't check the combinatorial literature well enough). Constrained coding (cf. [26] and other work by Immink) also has numerous other types of addends. In short, the multinomials are just one type of EC/QI addends. Even the QI sample source includes several non-multinomial types of addends, such as powers and factorials for radix & permutation coding. Similarly, coding of trees would use Ballot numbers (which reduce to Catalan numbers on the main lattice diagonal x=y).

> Whether this can be exploited for general-purpose
> compression is not yet clear to me.

I think the difficulty that this and several other of your questions show is the result of using the AC modeling paradigm to try to figure out how one would model with EC/QI. EC/QI doesn't model that way at all. This was discussed at some length in [T2] pp. 26-35 (see especially pp. 30-31 for a direct comparison with the AC modeling paradigm/pattern).

Briefly, the division of labor between the coder and modeler in EC/QI is different than for AC & its modeler. EC/QI expects the modeler to perform a particular kind of partition of the input, which hands to the coder instances of equiprobable "messages" for enumeration (computation of index). The "partition" or "segmentation" is a literal segmentation of the input sequence only in simple cases (such as a quasi-stationary order-0 source). Generally, this is not the case. E.g. for a known order-M quasi-stationary Markov source, one would partition EC output into M separate streams -- each symbol is coded via EC/QI into a stream selected based on the last M symbols. For an unknown source, the BWT algorithm performs such a partition in its output column (cf. [T2] pp. 34-35; note that I use only the transform of BWT, not the MTF or other entropy coding usually done in the 2nd phase of BWT).
The BW transform can be used as a general purpose segmentation module of the EC modeling engine. The second half of an earlier reply to Matt Mahoney gives some more detail on this:

http://groups.google.com/group/comp.compression/msg/2c769e3a278a62f4?hl=en&

The AC and EC modeling schemes belong to different schools of thought in information theory, AC to the Shannon school, EC to the Kolmogorov school. The latter, due to the absence of a practical coding algorithm, has been living in an undeserved shadow. I think that its potential has yet to be tapped and that it will prove itself a much more powerful modeling scheme than what is done today using AC (which functions through a modeling bottleneck, as argued in [T2]). The future will tell.

References
--------------------------------------------------------------------------
T1. R. V. Tomic "Fast, optimal entropy coder"
    1stWorks TR04-0815, 52p, Aug 2004
    http://www.1stworks.com/ref/TR/tr04-0815b.pdf

T2. R. V. Tomic "Quantized indexing: Background information"
    1stWorks TR05-0625, 39p, Jun 2005
    http://www.1stworks.com/ref/TR/tr05-0625a.pdf

T3. R. V. Tomic "Quantized Indexing: Beyond Arithmetic Coding"
    arXiv cs.IT/0511057, 10p, Nov 2005 (also: 1stWorks TR05-1115)
    http://arxiv.org/abs/cs.IT/0511057

Additional relevant papers are at:

Quantized Indexing RefLib page:
http://www.1stworks.com/ref/RefLib.htm

Quantized Indexing Home Page:
http://www.1stworks.com/ref/qi.htm


12/29/2005 11:05:18 AM

> for a known order-M quasi-stationary Markov source,
> one would partition EC output into M separate streams
> -- each symbol is coded via EC/QI into a stream selected
> based on the last M symbols.

Ooops, there is a bad typo up there. The fragment:

  "into M separate streams"

should say:

  "into up to A^M separate streams (where A is alphabet size)"
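The corrected statement can be sketched as follows: with alphabet size A and order M, the last M symbols form a base-A number in the range [0, A^M), and that number selects the output stream. The function and parameter names below are hypothetical, and the sketch shows only the stream selection, not the EC/QI coding of each stream.

```c
#include <assert.h>
#include <stddef.h>

/* Number of distinct order-M contexts (streams) for alphabet size A: A^M. */
static size_t num_streams(unsigned A, unsigned M)
{
    size_t n = 1;
    while (M--) n *= A;
    return n;
}

/* Slide the context window: drop the oldest symbol, append the newest.
   ctx is a base-A number built from the last M symbols, 0 <= ctx < A^M. */
static size_t next_context(size_t ctx, unsigned sym, unsigned A, size_t nstreams)
{
    assert(sym < A);
    return (ctx * A + sym) % nstreams;
}

/* Count how many symbols each stream would receive. A real coder would hand
   each symbol to the enumerator attached to stream `ctx` instead of counting.
   counts must have room for num_streams(A, M) entries. */
static void route_symbols(const unsigned char *s, size_t len,
                          unsigned A, unsigned M, size_t *counts)
{
    size_t ns = num_streams(A, M), ctx = 0, i;
    for (i = 0; i < ns; ++i) counts[i] = 0;
    for (i = 0; i < len; ++i) {
        if (i >= M) counts[ctx]++;   /* context is complete only after M symbols */
        ctx = next_context(ctx, s[i], A, ns);
    }
}
```

Contexts that never occur in the input leave their stream empty, which is why the count is "up to" A^M.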


12/29/2005 11:20:01 AM

> Quantized Indexing RefLib page:
> http://www.1stworks.com/ref/RefLib.htm

There were two bad links in the RefLib.htm file. The missing files were Ruskey's Combinatorial Generation [38] and Potapov's Theory of Information [37] textbooks. Strangely, while there were many failed attempts to download these files (404 errors), no one left any feedback to fix the links. Anyway, the files should be OK now.


12/31/2005 2:52:42 AM

There was a small update to the source. The main addition (requested in several emails) was an option:

  #define _ASM 1 // Set to 0 to disable inline asm

in Qitypes.h, which allows disabling of the VC6 inline asm (used in some macros). The speed of operations (mostly in the radix code decoder) drops by a few percent without the inline asm. There were also some minor compiler warnings that some people emailed about, which were cleaned up in the latest code (I left the file name unchanged for the old links to work):

http://www.1stworks.com/ref/C/QIC100.zip

There is also a thread about the algorithm discovery in the Computer Chess Club ( http://www.talkchess.com/ <-- signup screen), where the seed for it was planted way back in the last century. Here is the excerpt for those interested in that aspect:

------ Computer Chess Club thread excerpt (few typos fixed) ----------

"Number of positions in chess -- The rest of the story"
http://www.talkchess.com/forums/1/message.html?476509

Posted by Ratko V. Tomic on January 03, 2006 at 08:12:12:

> Uri Blass: There is a better upper bound see:
>
> http://chessprogramming.org/cccsearch/ccc.php?art_id=77068
>
> Uri

Hi Uri, that little exercise in enumeration you brought up (and mentioned in the other thread) set me off back then to try to make it work as a general purpose compression algorithm. While a neat idea on paper, the problem was that the arithmetic precision had to be of the size of the output.

After struggling for a while, I searched the literature and it turned out that such a compression algorithm already existed, called "Enumerative Coding", since the 1960s (first in the Russian literature, from Kolmogorov and his disciples, then shortly thereafter here, in the USA, from Lynch and Davisson). And, as in my version, the precision problem was still unsolved after over four decades of various attempts to make the algorithm practical.
Since I arrived at it on my own, my conventions for enumeration happened to be backwards from those that existed in the literature (mine sorted right to left, the so-called colex sorting of combinations, and built up the enumerative addends bottom up, while the standard scheme sorted lexicographically & worked recursively top down, plus all my pictures were rotated 45 degrees from theirs). Further, due to my playing with lattice methods in QCD (in my physics graduate school days), I also had my own visual representation of combinatorics as lattice walks, which is a very intuitive, heuristically rewarding way of looking at it, allowing one to see all of the combinatorial identities at a glance (especially useful for tossing and checking out algorithms in the head when going to sleep or waking up, without a pencil and paper). The lattice formulation turns out to have existed in the EC literature as well (as the Schalkwijk's Pascal triangle walks), although not in as general or elegant formalism as mine, lacking even a notation for the lattice walks, key sets of paths, enumerative classes, constraints... (stuff I worked out while doing physics). Since that thread, I kept returning to the problem, on and off, trying various ideas. Nothing worked. Then, in summer 2004, when my wife and kids went to a summer camp for a week, I stayed home to work on a programming project (a video codec). The first night home alone, at about 2AM, while debugging the latest batch of code, out of nowhere an idea popped into my head on that pesky enumeration problem, something I didn't yet try. I quickly coded just a toy version, allowing input buffers of 32 bits only, and by dawn it worked -- a version using arithmetic precision of only 8 bits encoded & decoded correctly all 2^32 possible inputs. 
That same early Sunday morning, it must have been around 6AM, I called and woke up the company owner (I am a CTO & a chief scientist), and he, still half awake, yielded to my enthusiasm and agreed to suspend the original project, so I could check whether the idea works on data of any size. At the time I didn't have a proof, and wasn't sure even at the heuristic level, that it can always be decoded. I also didn't know what maximum or average redundancy would result from the reduced precision.

Within a week I had a simple version of the code working on buffers up to 4 kilobits, using only 16 bit arithmetic precision (instead of 4 kilobit precision). It worked again, and it was very fast, even in that crude version. The redundancy due to the limited precision arithmetic was measured and it was on average about 0.05 bits (and always below 0.07 bits) for the entire 4k block.

Over the next couple of months I extended the algorithm to any input size and to a general alphabet (from the original binary alphabet). I also found a proof of general decodability and an expression for the max redundancy due to finite precision. The max redundancy is always below log(e)/2^(g-1) for g bit arithmetic precision (I now use g=32 bit precision). The forty-year-old puzzle was finally cracked. The accidental backwards conventions of my initial approach turned out to be the critical element exposing the key to the solution, which is virtually impossible to spot from within the conventional enumerative coding scheme.

I also developed a new modeling scheme for combinatorial methods of coding (such as the new algorithm) which is quite promising on its own. It is basically a scheme along the lines of Kolmogorov's algorithmic approach to information theory (in contrast to Shannon's probabilistic approach, which is dominant at present, and where modeling for arithmetic coding consists in calculating the probabilities of the next single symbol).
The algorithm, which I named "Quantized Indexing", turned out pretty amazing. It always codes tighter than the present best entropy coding algorithm, Arithmetic Coding (which is only a particular approximation of QI), yet it codes much faster than AC due to using only a simple table add (of a machine size word) for the less frequent symbol and no coding operations for the most frequent symbol (AC needs coding operations for both types of symbols, and more expensive operations at that, such as mul, div).

As a result, QI typically runs 10-20 times faster than the fastest full Arithmetic Coder implementations (and always at least 6 times faster, which occurs in the high entropy limit, while for very sparse data, the low entropy limit, QI runs well over 200 times faster).

Recently, I posted a preprint about the algorithm to arXiv:

http://arxiv.org/abs/cs.IT/0511057

and also created a web page with additional, more detailed technical reports and the C source code:

http://www.1stworks.com/ref/qi.htm

A nice little surprise came when I emailed Jorma Rissanen, the inventor of arithmetic coding himself, about the preprint. He had struggled for several years with the very same enumerative coding precision problem, inventing in the process arithmetic coding as a compromise solution (in 1976, while working for IBM). Although busy with a lecture tour, he read the paper right away and was quite pleased to see his old problem solved at last, and a bit surprised at how "very clever" the solution turned out to be.

So, that's the chain of unlikely events triggered by your original code snippet for enumerating the chess positions.


1/4/2006 8:40:05 AM

"nightlight" <nightlight@omegapoint.com> wrote in
news:1136364005.126391.156010@g43g2000cwa.googlegroups.com:

> The algorithm, which I named "Quantized Indexing", turned out
> pretty amazing. It codes always tighter than the present best
> entropy coding algorithm, the Arithmetic Coding (which is only
> a particular approximation of QI), yet it codes much faster than
> AC due to using only a simple table add (of a machine size word)
> for the less frequent symbol and no coding operations for the
> most frequent symbol (AC needs coding operations for both types
> of symbols, and more expensive operations at that, such as mul, div).

I know you claim to have this great code. But others, such as Matt, the inventor of the PAQ coders, at least wrote a simple arithmetic coder, FPAQ0, to show how his methods would compete on files where zero order arithmetic coding would come into play. Since your model is always "tighter than the present best entropy coding algorithm", do you actually have test code to compare with real files and test sets? Or is the method not yet advanced enough to do real compression on files?

Don't get me wrong, Matt is not saying FPAQ0 is the best entropy coder; he knows it isn't that good. He is just showing how it works for a certain model. The same sort of thing could be done with your method. At least, that is, if your method is comparable at all with real world entropy coders. And since it would be using the same model as most, one could calculate the true entropy versus standard test models. If one does this, one can see that Matt's code does not produce on average the shortest file. Maybe yours could do better if you ever actually get real working code. But I find it hard to believe it could compress as well as some current entropy encoders.

Where Matt's code shines is his use of various models and a slick way to combine them, which makes his family of PAQ coders among the best on the net. The point is, if your code is half as good as you claim, then his simple 2 state entropy coder could be replaced by your faster and tighter 2 state coder, which would bring your name fame. But I won't hold my breath.

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged. As a famous person once said "any cryptographic system is only as strong as its weakest link"


1/4/2006 11:17:17 AM

> "tighter than the present best entropy coding algorithm" do you
> have actually test code to compare with real files and test sets.
> Or is the method not yet advanced enough to do real compression
> on files yet.

You can download the source code, which shows the coding aspects where QI is essentially different from the existing entropy coding algorithms. All that the source will show you is that QI is a more accurate and much faster entropy coder than any AC you may have, tested under the same conditions (e.g. same model, same coding of model information, of data lengths, counts, output serialization or packaging... etc). There are some quasi-arithmetic coders, which run faster than the full AC (paying for speed in extra output redundancy), but even these won't run 20 or 30 times faster, let alone 200 times faster than the full AC (they're usually 2-3 times faster), as QI does. But if you have one such, you're welcome to test it. I would be curious to know.

The source code itself is not a file archiver or video codec or any such higher level application. Any differences for these higher level applications, however interesting they may be otherwise, are a simple consequence of the fundamental differences:

a) AC always codes with greater redundancy than QI (under the same coding conditions, obviously; this is the result of AC being a direct approximation of QI, see [T2] pp. 19-25, the chapter on exactly that difference: how large it is and for which data, with additional details on AC in [40],[41],[41a],[41b]) and

b) AC codes much more slowly than QI due to:

... b1) AC has to perform coding operations for all input symbols, while QI can just skip over the most probable symbols at memory speed (you can see the top of the coding loop in EncDec.c, where it merely scans the memory for the less probable symbol, 32 symbols for each loop step at basically memory scan speed), and

... b2) AC performs more complex operations for the least probable symbols (which QI also needs to encode explicitly), i.e. mul/div vs simple array lookup and add. This difference, which remains even for incompressible data (where the least & most probable symbols are approximately equally likely), allows QI to still code at least 6 times faster than the full AC even in the high entropy limit.

All that is, of course, measurable using the source code provided (which also includes very accurate timing functions). The above are not religious claims or an invitation to believe or debate belief systems, but a simple statement of easily verifiable empirical facts. If you also need to know how it would do on a "real" file, and can't extrapolate from how it does on memory buffers filled with arbitrary content, well, you are welcome to add such file related code and try it. Now, if you do add file i/o which takes hundreds of times longer than the coding, I can predict you will see almost no speed difference.

-- References ( http://www.1stworks.com/ref/RefLib.htm )

T2. R. V. Tomic "Quantized indexing: Background information"
    1stWorks TR05-0625, 39p, Jun 2005
    http://www.1stworks.com/ref/TR/tr05-0625a.pdf

40. J.C. Kieffer "Second Order Analysis of Data Compression Algorithms"
    (Preprint from J.C.K. lectures)
    http://citeseer.ist.psu.edu/370131.html

41. M. Drmota, H-K. Hwang, W. Szpankowski
    "Precise Average Redundancy of an Idealized Arithmetic Coding"
    DCC 2002, 222-231.
    http://citeseer.ist.psu.edu/drmota02precise.html

41a. P.A.J. Volf "Weighting Techniques In Data Compression: Theory and Algorithms"
    Ph.D. thesis, Eindhoven University of Technology, Dec 2002
    http://alexandria.tue.nl/extra2/200213835.pdf

41b. B. Ryabko, A. Fionov "Fast and Space-Efficient Adaptive Arithmetic Coding"
    Proc. 7th IMA Intern. Conf. on Cryptography and Coding, 1999
    http://www.1stworks.com/ref/RyabkoAri99.pdf
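Point b1) above can be illustrated with a small sketch, assuming the input is an array of 32-bit words in which 1 is the less probable symbol. Zero words cost one comparison each, and per-symbol work is done only for the set bits. The function name and the position-list output are hypothetical; the actual loop in EncDec.c is not reproduced here, and a real coder would add a table entry for each position instead of storing it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Collect the bit positions of all 1s in buf[0..nwords).
   Returns the number of 1s found; at most maxpos positions are stored. */
static size_t scan_ones(const uint32_t *buf, size_t nwords,
                        size_t *pos, size_t maxpos)
{
    size_t found = 0, w;
    assert(buf != NULL || nwords == 0);
    for (w = 0; w < nwords; ++w) {
        uint32_t x = buf[w];
        if (!x) continue;                  /* 32 most probable symbols skipped at once */
        while (x) {
            uint32_t low = x & (0u - x);   /* isolate the lowest set bit */
            size_t bit = 0;
            while (!(low & 1u)) { low >>= 1; ++bit; }
            if (found < maxpos) pos[found] = w * 32 + bit;
            ++found;
            x &= x - 1;                    /* clear the lowest set bit */
        }
    }
    return found;
}
```

For sparse input most iterations take the `continue` branch, so the run time is dominated by the memory scan rather than by coding operations.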


1/4/2006 1:00:59 PM

"nightlight" <nightlight@omegapoint.com> wrote in
news:1136379659.121605.50640@g44g2000cwa.googlegroups.com:

>> "tighter than the present best entropy coding algorithm" do you
>> have actually test code to compare with real files and test sets.
>> Or is the method not yet advanced enough to do real compression
>> on files yet.
>
> You can download the source code which shows the coding aspects where
> QI is essentially different from the existent entropy coding
> algorithms. All that the source will show you is that QI is more...

I guess that means you don't yet have code where you can compare it to even simple arithmetic file coders. I don't have faith in your work from your earlier posts. A simple "No, you can't actually test it in any real applications yet" would have been enough. Again, from the earlier thread it's not obvious to me that you have a full understanding of arithmetic coding methods.

> All that is, of course, measurable using the source code provided
> (which also includes very accurate timing functions). The above are not
> religious claims or invitation to believe or debate belief systems, but
> a simple statement of easily verifiable empirical facts. If you need to
> know also how it would do on "real" file, and can't extrapolate from
> how it does on memory buffers filled with arbitrary content, well, you
> are welcome to add such file related code and try it. Now, if you do
> add a file i/o which takes hundreds times longer than the coding, I can
> predict you won't see almost any speed difference.

Very funny. It's very strange that you make a big deal of claiming you compare it against what you claim is a good entropy coder, Moffat. Yet you don't even test against it on a level playing ground. You think that by modifying Moffat you are giving an honest test. However, if you really wanted an honest test, since you are the new kid on the block, you would think you could easily convert your code to work on files like Moffat's. Or are you afraid to test it on the same playing field Moffat and others have picked, so various methods can be tested against yours?

I for one believe you shifted ground because you fear real arithmetic coders, and people could take existing software without modification and show you directly that your coder does not lead to shorter compressed output than already existing coders. I suspect this since most would have tested the Moffat code on the same playground instead of modding it to one of your choice where it's not easy to compare against any other standard codings.

Look, maybe you have something. If your method is any good at all, surely you could easily add the stuff you stripped out of Moffat's to your code so that you can compare the compression results, or is there some reason you can't? If you can't, then it would seem to be of little use.

See:
http://groups.google.com/group/comp.compression/browse_frm/thread/ffc7f7ca84c76378/792884848866dc4c?q=moffat&rnum=4#792884848866dc4c

From the quote below, if it's true at all and if your code works at all, you should have the courage of your convictions to test it against Moffat and others where they were designed to run. Doesn't yours work there? What's the problem?

YOUR QUOTE
"Note also that for the test we had stripped the Moffat et al. V3 binary coder to its bare engine, replaced stream i/o with memory to memory coding, no allocations were done within the timed loop, model decisions were taken out (since only order 0 was tested) and all dynamical coding options (alternatives that didn't matter) were hardwired as static to avoid time waste in the test. So the arithmetic coder tested was already quite a bit faster than the out-of-a-box Moffat et al code. We wanted it to take its best shot, the best it possibly could (since QI didn't need any such implementation/code level edge, being so much more efficient at the fundamental, algorithmic level)."

If you wanted to take the best shot. Again, let me state that for the dense: if you wanted to take the best shot, then test it like others do, on files, where Moffat's was made to work. If you don't, one can only wonder what you are hiding. Especially since you claim this is so much better than arithmetic.

I thought about downloading as you suggested and converting it to files so it really can honestly be compared to Moffat. But I realize from your post that you would claim I did a poor job of converting, so I will not do it. After all, you're the one making the claim that it's better than Moffat. You're the one saying you compared it to Moffat. Yet you really only compared it to a modified version of Moffat that you yourself modified.

David A. Scott


1/4/2006 6:12:40 PM

NIGHTLIGHT, what is wrong with you? You claim a coder better than the best arithmetic, yet your only proof is your word, using Moffat code you modified yourself.

Look, people here are willing to help those trying to learn. Yet you never seem to answer real questions, such as why you went to the trouble to modify Moffat's code yourself and then proclaim your method is better than the best present day arithmetic encoders.

Please stop with the ridiculous posts of references that have nothing to do with this, unless you're trying to pull the wool over people's eyes. We don't work for you and don't have to kiss your ass for a job. If you really want to show it's better and can compress to smaller sizes than Moffat's, then do an honest test on unmodified code.

Why is it that you can change his code and put him down with code that is not even completely his, yet you seem unable to change your code to work with the data his is designed for? Surely you have the ability, with your team of people, to do that. Is it that you're afraid other current arithmetic coders can already compress most files better?


1/4/2006 11:07:16 PM

My tests on my time, your tests on your time.


1/4/2006 11:08:59 PM

You must be damn rich to afford time ownership. Can I share your secret to owning time? No entropy ..please.


1/6/2006 11:53:03 AM

> You claim a coder better than the best arithmetic,
> Yet your only proof is your word by using Moffat
> code you modified yourself.

That's not the "only" proof. Analysis of the algorithms, as sketched below (and covered in more detail in the preprint & the tech reports), will give you the same answer, showing it even more clearly. The source code which is available allows anyone to test as they wish. The source also allows examination, so that one can explain the results from the code itself, rather than from the mathematical description of the algorithms.

> Such as why you went to the trouble to modify Moffats
> code yourself and then proclaim your method is better
> than the best present day arithmetic encoders.

The main mod on their code was to make it code from memory to memory instead of using the much slower C streams. The alternative of adding stream in/out to the QI code would simply add a large and variable term and uncertainty to both coding times, making the tests measure mostly the irrelevant aspects (such as how much time the C stream library takes under all OS & buffering fluctuations). Hence, regarding the speed tests, the removal of their stream i/o has allowed much more accurate and reliable measurements of the coding speed differences. If you wish to measure the speed of your C stream i/o functions, with some coding time added on top, go ahead and test that. That doesn't interest me.

The other mods were selection of the coding mode and precision via their own options (in the headers). Binary coder, order 0 mode was used. Their code for the higher order contexts was commented out, so the coder can run faster (instead of checking higher order model variables inside their coding loop). Also, all their memory allocs were moved outside of their coding loop (just their coding loop was timed using the hi-res win32 timer).
As with the speed tests, the restriction to the order 0 model was made because that is where the coders differ (since one can always use the same model and the same encoding of the model parameters with both coders on higher order models). Hence, the optimum way to measure the coding efficiency _difference_ was to remove any extra variables in which they don't differ. Again, if you are interested in the modeling quality of the Moffat98 AC implementation beyond order 0, go test that.

Of course, both aspects, the speed and the coding accuracy advantage of QI, are self-evident to anyone who understands the mathematical description of the two coding algorithms. QI is always more accurate since AC is its direct approximation (assuming you use both under 'everything else set the same way', such as model, model parameter encoding, etc). AC and QI use the same type of addends, except that AC has a different normalization, in which it rescales what is in QI an exact integer "path count" at some point (x,y), by dividing it by QI's total "path count" (e.g. the path count at the path endpoint B in Fig 1, p. 4). This AC scaling turns QI's integers into unlimited binary fractions, which the AC then truncates to the given number of bits.

This truncation of the infinite fractions, which occurs even for a small number of symbols (and which is absent in QI's integer format of addends), is a loss of precision which leads to AC losing parts of its coding interval in each step. If one were to use fractions of infinite precision, all intervals would fit exactly next to each other, without gaps. Since allowing intervals to overlap would result in non-decodable output (by the Kraft inequality), any loss in precision for specifying interval boundaries must leave unused gaps in the output code space.

The basic arithmetic difference in coding is the extra loss of precision for AC.
A rough analogy would be as if two of us are balancing some expenses and I use exact integer number of cents from the receipts, while you take integers from the receipts and divide them by some large total, then, since you will generally get an infinite decimal fraction, you terminate it to some number of places. Hence you're making an error even before the first add, while my integer scheme won't have any error at all (until the sum reaches certain magnitude).

The QI.exe included with the source has a command "cbr" which lists all such code space gaps for any n-symbol input, as well as the cumulative redundancy in bits resulting from the gaps. Another command, "ct", lists various types of redundancies for the entire tables, examining every quantized binomial coefficient, for blocks up to 2^20 bits.

In QI the Kraft inequality is eq. (20) on page 8, and QI's loss of accuracy is due to rounding up the integer addends once their magnitude exceeds the number of bits QI uses. As explained on p. 8, the error of QI's addends is the smallest one satisfying both the given finite precision of g bits and the Kraft inequality (eq. (20)).

Hence, I don't even need a test to know that, all else (model etc) set the same, QI will code at least as tightly as anything else you may have now or in the future. The test is useful to find out exactly how much the difference in coding accuracy amounts to against some specific coder.

Note also that AC has an additional coding redundancy of about 1-2 bits (max is 2, avg 1.5) on the total output even for the _infinite precision_ coders (see [41] and [41a] I mentioned before). For finite precision coders, if AC is set to code in "decrementing mode" (described in [34]), which is its most accurate coding mode, the additional redundancy vs QI will be about 2-3 bits on the total output (that's the best case when using AC's "frugal bit" mode, which is how I have tested it).
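The "rounding up of the integer addends" mentioned above can be pictured with a small sketch. It assumes an addend is kept as a g-bit mantissa followed by zero bits, which is only my reading of the description; the name round_up_to_g_bits is hypothetical and nothing here is taken from the QI source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round v up to the nearest value that has at most
   g significant bits (a g-bit mantissa followed by zero bits). */
uint64_t round_up_to_g_bits(uint64_t v, int g)
{
    int shift = 0;
    uint64_t mantissa;

    assert(g >= 1 && g <= 62);
    assert((v >> 63) == 0);            /* headroom for the round-up carry */

    while (((v >> shift) >> g) != 0)   /* smallest shift leaving g bits */
        ++shift;
    if (shift == 0)
        return v;                      /* already fits: no error at all */

    mantissa = v >> shift;
    if (v & ((UINT64_C(1) << shift) - 1))
        ++mantissa;                    /* a dropped bit was 1: round up */
    return mantissa << shift;          /* low `shift` bits are zero */
}
```

With g = 4, the value 35 (binary 100011) becomes 36 (binary 100100), while anything below 16 is returned unchanged, which matches the remark that the integer scheme has no error until the numbers reach a certain magnitude.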
For non-decrementing finite precision AC, such as the usual 'adaptive AC', e.g. Moffat98, or for Rissanen's AC-76 (which is a static AC), as shown in [T2] p. 22 for an order-0 source, there is an additional AC redundancy which is approximately:

    1/2 log(2*Pi*p*q*n) bits,

where p and q are the probabilities of 1's and 0's and n is the input size in bits. This redundancy is due to AC approximating the enumeration itself (Stirling approx. plus dropping of the square root factor), before any finite precision is imposed. This is normally the dominant difference and it is what all but the last row in the table on p. 10, in [T3], show.

For stationary order-0 binary inputs, adaptive AC can code using the Krichevsky-Trofimov (KT) estimator (see [41a], [40]), which removes this worst case O(1/2*log(n)) redundancy, by largely accounting for the square root factor. That results in lowering the adaptive AC redundancy for the low entropy inputs (where the relative error was the greatest) and increasing it for the higher entropy inputs (where the relative error was the smallest), hence the KT estimator is a tradeoff.

Note that the last row on p. 10 compares entirely different aspects of redundancy, the distinction between predictive vs descriptive methods of modeling (see [T2] pp. 30-31). Although one might call it comparing apples and oranges, since it didn't really compare the coders proper, the point was that each coder was modeling the input as an order-0 source (while the input was more complex), but each was modeling in its "native" mode -- QI in descriptive and AC in predictive mode. The lesson of that example, or any other with large and unpredictable changes in the symbol frequencies, is that the predictive coders pay a much greater price than the descriptive coders when the input is unpredictable (relative to whatever model order they use, assuming they both use the same order models).
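For a feel of the size of the 1/2 log(2*Pi*p*q*n) term, here is a short worked example. It assumes the log is base 2 and takes p = k/n; the Stirling expansion is the standard one, and the numeric case is only illustrative, not a figure from [T2] or [T3].

```latex
% Stirling expansion of the exact enumerative code length:
\log_2\binom{n}{k} \;=\; n\,H(p)\;-\;\tfrac12\log_2\!\bigl(2\pi\,p\,q\,n\bigr)\;+\;O(1/n),
\qquad p=\tfrac{k}{n},\quad q=1-p,\quad H(p)=-p\log_2 p-q\log_2 q .

% Example: n = 2^{20} bits, p = q = 1/2
\tfrac12\log_2\!\bigl(2\pi\cdot\tfrac14\cdot 2^{20}\bigr)
 \;=\;\tfrac12\bigl(\log_2(2\pi)+18\bigr)
 \;\approx\;\tfrac12\,(2.65+18)\;\approx\;10.3\ \text{bits}.
```

So a coder that spends n*H(p) bits on such a block uses roughly 10 bits more than the exact count log C(n,k), before the cost of sending k itself is considered.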
We can similarly deduce the nature of the speed difference from the mathematical descriptions of the algorithms. The QI index recurrence is eq. (22), which for binary order-0 coding simplifies to eq. (17), with quantized binomials C(n_j,j). Since QI keeps these in a table, its coding via (17) consists in adding a table value (a machine size word, 32 bits in the C code provided) to the output buffer for every "least probable symbol" (1 by convention), and no coding operation for the most probable symbol (0 by convention).

In this same coding setup, the AC's calculation does exactly the same additions of (17), except that all its terms are rescaled (normalized to total=1, as explained above, see [T2] pp. 22-23), and that AC doesn't use a table to get its rescaled C(n,k) addends, but computes them on the fly using multiplicative recurrences, which in binary order-0 coding are of the form (these are regular binomial identities):

    C(n,k) = C(n+1,k+1) * (k+1)/(n+1) ...... (1)

when symbol 1 is encoded and:

    C(n,k+1) = C(n+1,k+1) * (n-k)/(n+1) ...... (2)

when symbol 0 is encoded. The factor p=(k+1)/(n+1) in (1), which is the ratio of the remaining count of ones, (k+1), and the total remaining symbols, (n+1), is interpreted within AC as the probability of ones, and the factor q=(n-k)/(n+1) in (2) as the probability of zeros at that same place.

As explained in [T2] pp. 19-25, AC has no choice here since its addends are dependent on probabilities, hence it can't have a single table (as QI does) which applies to all probabilities. If you wanted to make it code faster using tables to skip the most probable symbol, you would need a separate table for each source probability. That makes such a scheme quite impractical for AC (see [T2] pp. 19-21).

Consequently, if both coders have processed n symbols, containing k ones, and have 20 zeros followed by 1 ahead, QI simply picks the next binomial C(n+20,k+1) from the table and adds it to the index.
The AC has to compute all 20 intervening binomials (for the 0's), using the multiplicative recurrence of type (2), to calculate the final rescaled addend for the last bit=1. That's the basic difference in work between the coders which follows from the math of the algorithms. You don't need any test to see that if there were 500 zeros followed by 1, AC would have done 500 multiplications via eq (2), while QI has only added 500 to n and picked the addend out of the table.

The tests only tell you how these obvious differences in the amounts of work translate into differences in coding speeds. The table on page 10 shows that e.g. for 128 Kbit input, with 8 ones, where you have on average 16384 zeros between the ones, QI will execute 247 times faster (the result in the table is averaged over 500 inputs, each with random placements of 8 ones). As mentioned in the readme.txt that comes with the source code, in this test the generic QI was used, not the QI tuned for sparse coding mode (which is provided in the C source), which would have given it another factor 2-5 on such very sparse and long inputs.

In short, my tests were focused on the aspects in which the coders differ _inherently_, as described above (since everything else can be set the same if one wishes so). That's what I measured and what I reported. If you're interested in something else, such as the quality of the Moffat98 AC modeler of order 1,2..., or in the speed of your C stream i/o library, you can write the code and test that. Neither question interests me, though.

Regarding the QI higher order modeling, while one can use the AC modeling engine, that is a sub-optimal modeling scheme for QI, as explained in [T2] pp. 26-35. The optimum division of labor between the modeler and the coder is quite different in QI from the one in AC (see [T2] pp. 30-31). QI's general purpose native modeling engine is BWT (the bare BW transform, before MTF or run-lengths or other entropy coding).
That is all still research in progress, so I'll leave it at that. No sense arguing heuristics.
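The difference in work described in this post (one table lookup per 1 versus one multiplicative step per symbol) can be sketched with exact integers. This is only an illustration under stated assumptions: build_table, recurrences_hold and walk_zeros are hypothetical names, NMAX = 32 is an arbitrary small size, and a real AC works on rescaled, truncated fractions rather than on exact binomials. Recurrence (2) is checked in the form C(n,k+1) = C(n+1,k+1)*(n-k)/(n+1), the form whose factor equals q = 1 - p.

```c
#include <assert.h>
#include <stdint.h>

enum { NMAX = 32 };                    /* illustrative table size */
uint64_t binom[NMAX + 1][NMAX + 2];    /* binom[n][k] = C(n,k), 0 if k > n */

/* Fill the table with Pascal's rule C(n,k) = C(n-1,k-1) + C(n-1,k). */
void build_table(void)
{
    int n, k;
    for (n = 0; n <= NMAX; ++n) {
        binom[n][0] = 1;
        for (k = 1; k <= NMAX + 1; ++k)
            binom[n][k] = (n == 0) ? 0 : binom[n - 1][k - 1] + binom[n - 1][k];
    }
}

/* Check recurrences (1) and (2) in cross-multiplied integer form. */
int recurrences_hold(void)
{
    int n, k;
    for (n = 0; n < NMAX; ++n)
        for (k = 0; k <= n; ++k) {
            /* (1): C(n,k) * (n+1) == C(n+1,k+1) * (k+1) */
            if (binom[n][k] * (uint64_t)(n + 1) !=
                binom[n + 1][k + 1] * (uint64_t)(k + 1))
                return 0;
            /* (2): C(n,k+1) * (n+1) == C(n+1,k+1) * (n-k) */
            if (binom[n][k + 1] * (uint64_t)(n + 1) !=
                binom[n + 1][k + 1] * (uint64_t)(n - k))
                return 0;
        }
    return 1;
}

/* Per-symbol route: step C(m,j) -> C(m+1,j) once per zero, with j = k+1.
   Returns C(n+z,k+1) and reports how many multiplicative steps it took. */
uint64_t walk_zeros(int n, int k, int z, int *steps)
{
    int j = k + 1, m;
    uint64_t c;
    assert(k >= 0 && n >= j && z >= 0 && n + z <= NMAX);
    c = binom[n][j];
    *steps = 0;
    for (m = n; m < n + z; ++m) {
        c = c * (uint64_t)(m + 1) / (uint64_t)(m + 1 - j);  /* exact */
        ++*steps;
    }
    return c;
}
```

After build_table(), the table route for n = 10, k = 2 and a run of 20 zeros is the single read binom[30][3], while walk_zeros(10, 2, 20, &steps) reaches the same value only after 20 multiply/divide steps.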


1/6/2006 6:34:26 PM


1/6/2006 6:43:08 PM

-- Errata:

> 128 Kbit input, with 8 ones, where you have on
> average 16384 zeros between the ones

The figure 16384 should be replaced by 128*1024/9 = 14563.5... since the 8 ones produce 9 sections of zeros.


1/6/2006 7:51:42 PM

nightlight wrote:
) That's not the "only" proof. Analysis of the algorithms,
) as sketched below (and covered in more detail in the
) preprint & the tech reports) will give you the same answer,
) showing it even more clearly.

This analysis is no different from any other analysis in that you have to make lots of assumptions. This means that if you use such an analysis to make real-world predictions, then that depends on how well your assumptions match the real world.

Because of the massive nature of your posts, I haven't been able to find an answer to a few questions I have about the QI coder:

- if I read you correctly, it is not an adaptive coder, so how do you transmit the model information for the QI coder ?
- how would you use a QI coder with an adaptive model ?
- assuming I have a stream of symbols, where at each position in the stream, the probability distribution of the symbols is different, then how does QI coder adapt itself to all those different distributions ?

(I have a few more questions, but stick to these for now.)

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT


1/6/2006 8:01:39 PM

-- Errata:

>> 128 Kbit input, with 8 ones, where you have on
>> average 16384 zeros between the ones

The figure 16384 should be replaced by 128*1024/9 = 14563.5... since the 8 ones produce 9 sections of zeros.


1/6/2006 8:05:15 PM

"nightlight" <nightlight@omegapoint.com> wrote in news:1136572988.625145.191290@z14g2000cwz.googlegroups.com: > > Of course, both aspects, the speed and the coding accuracy > advantage of QI, are self-evident to anyone who understands the > mathematical description of the two coding algorithms. QI is > always more accurate since AC is its direct approximation > I wish we had someone here that was an expert on both algorithms since from your previous posts it sure indicates to me that you are no expert on arithmetic coding. I admit I know very little about your QI however I now enough about arithmetic coding to realize that proper optimal bijective file coding is one of the best methods and it is an optimal method something you don't seem to understand from your various posts. Even if QI could be used to make some sort of optimal file compressor it could never compress all files as well as an optimal arithmetic. Since you can't grasp that simple fact you can't be an expert in arithmetic so I doubt what you think is self evident has any relationship to reality. You seem to think your stuff is better than arithmetic when used as an entropy encoder. Yet it appears this so called neat method of yours has yet to be even tested in any simple code where one is using an entropy coder. Why is that? Even Matt did FPAQ0 can't you or your team do something similar with QI or is it to complex of a task? I know you could care less what I think. But many people here would like to see real results. We get plenty of people that can quote and talk about how good there stuff is and people here realize talk is cheap. They want to see REAL RESULTS can you do that or do you care what the average person here thinks of your method. Like mentioned above it appears you can modify Moffat which is not the best but you picked what you wanted to pick and then called it the best. I would continue asking new questions but for some reason you seem not to anwser even the most simple of questions. 
David A. Scott -- My Crypto code http://bijective.dogma.net/crypto/scott19u.zip http://www.jim.com/jamesd/Kong/scott19u.zip old version My Compression code http://bijective.dogma.net/ **TO EMAIL ME drop the roman "five" ** Disclaimer:I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged. As a famous person once said "any cryptograhic system is only as strong as its weakest link"


1/6/2006 10:39:48 PM

Willem wrote:
> nightlight wrote:
> ) That's not the "only" proof. Analysis of the algorithms,
> ) as sketched below (and covered in more detail in the
> ) preprint & the tech reports) will give you the same answer,
> ) showing it even more clearly.
>
> This analysis is no different from any other analysis in that
> you have to make lots of assumptions. This means that if you use
> such an analysis to make real-world predictions, then that depends
> on how well your assumptions match the real world.
>
> Because of the massive nature of your posts, I haven't been able to
> find an answer to a few questions I have about the QI coder:

I agree that someone having enough time for reading his massive posts could rather read the paper and find answers himself.

I will try to answer some questions on the basis of my understanding of what he has posted here (I have not read his papers yet).

> - if I read you correctly, it is not an adaptive coder,
> so how do you transmit the model information for the QI coder ?

He says that QI is based on enumerative coding, where models are conceptually different from the PPM models which we are more familiar with.

So it will probably mean that we will need to study EC from the ground up and then see how QI fits in (at least I will have to, as I am not familiar with EC) and how it relates to PPM models.

Or if someone can summarize the difference here for people like me, please do so.

> - how would you use a QI coder with an adaptive model ?

He said that QI is the "natural" choice for BWT post-processing. This probably means that QI itself can't be used for higher order adaptive coding, but by using BWT, the higher-order-adaptive-modelling problem can be reduced into something which QI can handle.

> - assuming I have a stream of symbols, where at each position in
> the stream, the probability distribution of the symbols is different,
> then how does QI coder adapt itself to all those different distributions ?

I don't know the answer to this one.

I hope this helps (my apologies to nightlight if I am mistaken somewhere, feel free to correct me).

As for nightlight's comments on QI's speed, I am afraid that as the modelling scheme for QI is different from the modelling scheme for ArithCoding, we will need to compare the speed of "QI+its modelling code" with "AC+its modelling code", where both models should be of the same order, or chosen to give the same compression ratio. (My doubt here is "what if QI just *shifts* computation burden to modelling code instead of reducing it".)

Sachin Garg [India]
http://www.sachingarg.com


1/6/2006 10:54:36 PM

"Sachin Garg" <schngrg@gmail.com> wrote in news:1136588076.767621.250110@g14g2000cwa.googlegroups.com: > > Willem wrote: >> nightlight wrote: >> ) That's not the "only" proof. Analysis of the algorithms, >> ) as sketched below (and covered in more detail in the the >> ) preprint & the tech reports) will give you the same answer, >> ) showing it even more clearly. >> >> This analysis is no different from any other analysis in that >> you have to make lots of assumptions. This means that if you use >> such an analysis to make real-world predictions, then that depends >> on how well your assumptions match the real world. >> >> >> >> Because of the massive nature of your posts, I haven't been able to >> find an answer to a few questions I have about the QI coder: > > I agree that someone having enough time for reading his massive posts > could rather read the paper and find answers himself. > If the paper is that small pdf file at his site it does not anwser the simple questions being asked. So I don't see this as an anwser. And if its some stuff where one would have to sign some nondiscloser agreement I think that to would be a waste to time. Since the questions asked of him are not that hard. > I will try to answer some questions on the basis of my understanding > of what he has posted here (I have not read his papers yet). > >> - if I read you correctly, it is not an adaptive coder, >> so how do you transmit the model information for the QI coder ? > > He says that QI is based on enumerative coding, where models are > conceptually different from PPM models which we are more familiar > with. > > > So it will probably mean that we will need to study EC from ground up > and then see how QI fits in (atleast I will have to as I am not > familiar with EC) and how it relates to PPM models. > > Or if someone can summarize the difference here for the people like > me, please do so. > He seems to belive arithemtic compression is a poor approximation of EC compression. 
Here is summary take two symbol one and zero suppose you have 2 ones and 2 zeros then there are 4!/(2!*2!) or 6 combinations. his method would assign the 6 possible numbers to this problem so you could say its exact. At least if numbers small. But the paper really doesn't say how to do compression. You read various other papers that show how to assign a number to a combination. All he seem to do is get a number for a combination. He has in his paper the example of 00101001 in which he calculates the value as 2+6+35 = 43 this was done in long post message 59 at http://groups.google.com/group/comp.compression/browse_frm/thread/ffc7f7ca8 4c76378/30566aec6aa7d363?q=nightlight&rnum=1#30566aec6aa7d363 Big deal thats not compression. He state there "The answer is on page 6, where it shows the string index I(S8)=2+6+35=43 and how to calculate it (eq. 17). The actual size in bits of the index is L=log(56)=5.807... bits since the valid values of the index are 0..55 (there are 56 paths from A to B). The fractional bits of L don't normally go to waste since they are coded in the mixed radix with other items sent, usually with the fractional bits of indices for other blocks (cf. p.9 [N4]). The encoder also sends the count of 1's per block, which is 3 here and the length of the entire string which is 8. The latter two items get coded in variety of ways in different variants and different parts of the coder/s and experimental prototypes (cf. p.9 [N6])." This means he never compares it to an arithmetic compressor. He only makes claims thats it better. I think if he could actually compress the one example he gives in paper then he might be more beliveable. But he wants to say its exact. But he can't be pinned down on any application that compares it to an arithmetic. So not having any simple examples done in a complete way says him being shown arithmetic ways that beat his method. He leaves it to you to do the coding. At which point he could claim you didn't do it the right way. 
Actually its a clever way to prevent it from being compared to any real world entropy compressor. So the only thing I got out of the paper besides the fact he never completes anything is to say for a string made of only ones and zeros the compressed encoded result is 3 items that are easy to combine just don't ask him how to combine them since he seems not likely to do so. 1) the length of entire string 2) the number of ones 3) the index value. He doesn't risk actually combining these in an output string its possible he does not want to risk being laughed at or he fears one could show how a simple bijective string compressor gets better results. If you can think of any other reason no examples like this are done to the end please Schin tell us what you think it is. >> - how would you use a QI coder with an adaptive model ? > > He said that QI is the "natural" choice for BWT post-processing. This > probably means that QI itself cant be used for higher order adaptive > coding but by using BWT, the higher-order-adaptive-modelling problem > can be reduced into something which QI can handle. > >> - assuming I have a stream of symbols, where at each position in >> the stream, the probability distribution of the symbols is different, >> then how does QI coder adapt itself to all those different >> distributions ? > > I dont know the answer to this one. > Maybe it depends on the defination of "natural" > I hope this helps (my apologies to nightlight if I am mistaken > somewhere, feel free to correct me). > I suspect he will paste in lots of links to various papers that seems to be his style. So don't hold your breath for useful anwsers. They may ot may not be related to your questions. > As for nightlights's comments on QI's speed, I am afraid that as the > modelling scheme for QI is different from modelling scheme for > ArithCoding, we will need to compare speed of "QI+its modelling code" > with "AC+its modelling code". 
Where both models should be of same > order, or chosen to give same compression ratio. (My doubt here is > "what if QI just *shifts* computation burden to modelling code instead > of reducing it".) > > Sachin Garg [India] > http://www.sachingarg.com > David A. Scott -- My Crypto code http://bijective.dogma.net/crypto/scott19u.zip http://www.jim.com/jamesd/Kong/scott19u.zip old version My Compression code http://bijective.dogma.net/ **TO EMAIL ME drop the roman "five" ** Disclaimer:I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged. As a famous person once said "any cryptograhic system is only as strong as its weakest link"
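For readers who want to reproduce the 00101001 example discussed in this post, here is a minimal sketch of the index calculation and its inverse. It assumes bit offsets are counted from the left starting at 0, which is what makes 2+6+35 come out; the names binom_exact, enum_index and enum_decode are hypothetical. It deliberately stops at the three items (length, count of ones, index) and does not guess how they would be packed into an output string.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Exact C(n,k) for small n (n <= 32 keeps every product in 64 bits);
   returns 0 when k > n. */
uint64_t binom_exact(unsigned n, unsigned k)
{
    uint64_t c = 1;
    unsigned i;
    if (k > n)
        return 0;
    for (i = 1; i <= k; ++i)
        c = c * (n - k + i) / i;       /* integral after every step */
    return c;
}

/* Index of a '0'/'1' string: sum of C(offset, j) over the j-th one.
   Also reports the length n and the count of ones k. */
uint64_t enum_index(const char *bits, unsigned *n_out, unsigned *k_out)
{
    uint64_t index = 0;
    unsigned n, k = 0;
    assert(strlen(bits) <= 32);
    for (n = 0; bits[n] != '\0'; ++n)
        if (bits[n] == '1')
            index += binom_exact(n, ++k);
    *n_out = n;
    *k_out = k;
    return index;
}

/* Inverse: rebuild the string from (index, n, k), highest offset first.
   `out` must have room for n + 1 characters. */
void enum_decode(uint64_t index, unsigned n, unsigned k, char *out)
{
    unsigned pos = n;
    out[n] = '\0';
    while (pos-- > 0) {
        uint64_t b = binom_exact(pos, k);
        if (k > 0 && index >= b) {     /* a one sits at this offset */
            out[pos] = '1';
            index -= b;
            --k;
        } else {
            out[pos] = '0';
        }
    }
}
```

For "00101001" the ones sit at offsets 2, 4 and 7, so the index is C(2,1)+C(4,2)+C(7,3) = 2+6+35 = 43 out of C(8,3) = 56 possible values, and enum_decode(43, 8, 3, buf) gives the string back.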


1/7/2006 2:15:24 AM

nightlight wrote:
> This truncation of the infinite fractions even for a small number
> of symbols (which is absent in QI's integer format of addends),
> is a loss of precision which leads to AC losing parts of its
> coding interval in each step. If one were to use fractions of
> infinite precision, all intervals would fit exactly next to each
> other, without gaps. Since allowing intervals to overlap would
> result in non-decodable output (by Kraft inequality), any loss in
> precision for specifying interval boundaries must leave unused
> gaps in the output code space.

Discarding part of the range is one way to deal with finite precision, for example the carryless rangecoder in ppmd. However the various coders in paq6, paq7, fpaq0, etc. do not have any gaps in the code space. These are carryless binary arithmetic coders with 32 bits precision and 12 bit representation of probabilities. They output a byte at a time so most of the time the range is represented with 24 bits precision. Coding loss occurs due to rounding of the probability. In the worst case the range is 2 and the probability is forced to 1/2, but this is rare. The practical effect is to increase the compressed size by 0.0001 bpc on typical inputs.

The coding error can be made arbitrarily small with little effort. In paqar and pasqda the coder includes a carry counter and outputs a bit at a time, so the range always has at least 30 bits of precision. Redundancy due to rounding error e in the probability is O(e^2), or about 2^-60. If this is still too big, you could go to 64 bit arithmetic and reduce the redundancy to about 2^-124.

I am sure there are applications for QI, but in the PAQ series even a perfect coder would have negligible effect on both compression ratio and speed, since CPU time is dominated by modeling. Being restricted to an order 0 model seems like a severe disadvantage. How would you transform a context mixing model to order 0?

-- Matt Mahoney
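The "no gaps" point can be seen from how such a coder splits its range. The following is a minimal sketch, assuming the split convention I recall from fpaq0-style coders (inclusive 32-bit bounds x1..x2, a 12-bit probability of a 1 bit, the 1 taking the lower part); treat the convention and the name split_range as assumptions rather than as a quote from any of those programs.

```c
#include <assert.h>
#include <stdint.h>

/* Split the inclusive range [x1, x2] using a 12-bit probability p12
   (0..4095) that the next bit is 1. Returns xmid such that
     bit 1 -> [x1, xmid]      bit 0 -> [xmid + 1, x2]
   The two parts share no value and leave no value of [x1, x2] unused. */
uint32_t split_range(uint32_t x1, uint32_t x2, uint32_t p12)
{
    assert(x1 < x2 && p12 < 4096);
    return x1 + ((x2 - x1) >> 12) * p12;   /* always x1 <= xmid < x2 */
}

/* Number of code values in an inclusive range, as a 64-bit count. */
uint64_t range_size(uint32_t lo, uint32_t hi)
{
    return (uint64_t)hi - lo + 1;
}
```

Whatever p12 is, range_size(x1, xmid) + range_size(xmid + 1, x2) equals range_size(x1, x2). The finite precision shows up only as a slightly off effective probability, for instance exactly 1/2 for each bit when x2 = x1 + 1, which is the worst case mentioned above.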


1/7/2006 3:42:31 AM

Hi David,

How is your health these days? I hope you are doing better now.

> >> Because of the massive nature of your posts, I haven't been able to
> >> find an answer to a few questions I have about the QI coder:
> >
> > I agree that someone having enough time for reading his massive posts
> > could rather read the paper and find answers himself.
>
> If the paper is that small pdf file at his site it does not answer
> the simple questions being asked. So I don't see this as an answer.
> And if its some stuff where one would have to sign some nondisclosure
> agreement I think that too would be a waste of time. Since the questions
> asked of him are not that hard.

Oh, I had presumed that hidden in all the papers he links to there would be answers to the questions here. I didn't realize that they have only incomplete examples (maybe complete from a QI perspective, but incomplete from a compression perspective, which makes them useless, at least for us).

> >> - how would you use a QI coder with an adaptive model ?
> >
> > He said that QI is the "natural" choice for BWT post-processing. This
> > probably means that QI itself can't be used for higher order adaptive
> > coding but by using BWT, the higher-order-adaptive-modelling problem
> > can be reduced into something which QI can handle.
>
> Maybe it depends on the definition of "natural"

I guess what he meant was, more efficient than MTF etc... Anyway, we can leave discussions on this; hopefully he will come up with a BWT based compressor implementation to prove his point.

Sachin Garg [India]
http://www.sachingarg.com


1/7/2006 8:04:48 AM

["Followup-To:" header set to comp.compression.]

On 2005-12-29, nightlight <nightlight@omegapoint.com> wrote:
> I received some questions about the source and the algorithm via email
> and since the matter may be of interest to others who downloaded it, I
> will reply here as well:
>
>> Concerning the source, I am afraid it is of little use in this form.
>> I don't run windows here, and I do not feel like I would really need it.
>> Thus, if you'd like to present your work, it would be most useful to
>> depend on ANSI-C only.
>
> The main nonstandard VC6/win32 C use in the source has to do with the
> nanosecond timer & thread control (in Qiutl.c).

The thread control functions are never called.

You have arrays with no length declared in the headers (I'm guessing they should be declared extern).

The timer stuff I replaced with calls to the POSIX gettimeofday() (I figure microsecond precision is close enough).

I turned off the ASM, renamed the files to match the names given in the source, produced stubs for the conio calls you make, wrote a makefile, and replaced the integer types with names from <stdint.h>.

It compiles, and does something.

What is all that output supposed to mean, or more to the point, what do I do to get efficiency statistics: elapsed time, compressed size, that sort of stuff?

Also I found some printf()s (for error conditions) with the wrong number of arguments, and a few other weirdnesses.

Another problem with your code is that as it stands it seems that it only tests your algorithm's ability to compress pseudo-random data... pseudo-random data is theoretically extremely compressible.

Bye.
Jasen
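For anyone attempting the same port, a timer along the lines described above might look like the sketch below. It is not Jasen's actual patch and not part of the QI source; now_usec and elapsed_usec are hypothetical names, and it assumes a POSIX system that provides gettimeofday() in <sys/time.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in microseconds since the Unix epoch. */
double now_usec(void)
{
    struct timeval tv;
    int rc = gettimeofday(&tv, NULL);
    assert(rc == 0);
    (void)rc;                          /* silence warnings under NDEBUG */
    return (double)tv.tv_sec * 1e6 + (double)tv.tv_usec;
}

/* Microseconds elapsed since a value previously returned by now_usec(). */
double elapsed_usec(double start)
{
    return now_usec() - start;
}
```

Typical use is to take start = now_usec() just before the coding loop and call elapsed_usec(start) right after it, keeping allocation and i/o outside the timed region as in the original tests. Since this is wall-clock time, a clock adjustment during the run would distort the reading.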


1/7/2006 9:53:19 AM

"nightlight" <nightlight@omegapoint.com> wrote in news:1136572988.625145.191290@z14g2000cwz.googlegroups.com:

.....
> infinite precision, all intervals would fit exactly next to each other, without gaps. Since allowing intervals to overlap would result in non-decodable output (by Kraft inequality), any loss in precision for specifying interval boundaries must leave unused gaps in the output code space.
>
> The basic arithmetic difference in coding is the extra loss of precision for AC. A rough analogy would be as if two of us are balancing some expenses and I use exact integer number of cents from the receipts, while you take integers from the receipts and divide them by some large total, then, since you will generally get an infinite decimal fraction, you terminate it to some number of places. Hence you're making an error even before the first add, while my integer scheme won't have any error at all (until the sum reaches certain magnitude).
>
> The QI.exe included with the source has a command "cbr" which lists all such code space gaps for any n-symbol input, as well as the cumulative redundancy in bits resulting from the gaps.
.....

You don't seem to grasp the obvious. There are arithmetic coders that have zero gaps. You don't know what you are talking about. One such coder is arb255.exe. You seem to repeat useless stuff over and over without actually thinking. You can't answer simple questions or provide simple answers. Who are you kidding?

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged. As a famous person once said "any cryptographic system is only as strong as its weakest link"


1/7/2006 2:19:31 PM

nightlight wrote: ) This truncation of the infinite fractions even for a small number ) of symbols (which is absent in QI's integer format of addends), ) is a loss of precision which leads to AC losing parts of its ) coding interval in each step. If one were to use fractions of ) infinite precision, all intervals would fit exactly next to each ) other, without gaps. Since allowing intervals to overlap would ) result in non-decodable output (by Kraft inequality), any loss in ) precision for specifying interval boundaries must leave unused ) gaps in the output code space. This paragraph clearly demonstrates that you do not understand well enough how Arith Encoding works. Any decent AC does *not* lose parts of its coding interval each step. Try to get that through your head. It *is* possible (hell, it's quite easy) to get the intervals to line up exactly without infinite precision. SaSW, Willem -- Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged or something.. No I'm not paranoid. You all think I'm paranoid, don't you ! #EOT


1/7/2006 2:39:04 PM

"Matt Mahoney" <matmahoney@yahoo.com> wrote in news:1136605351.691283.112110@g47g2000cwa.googlegroups.com:

> From: "Matt Mahoney" <matmahoney@yahoo.com>
> Newsgroups: comp.compression,sci.math
>
> nightlight wrote:
>> This truncation of the infinite fractions even for a small number of symbols (which is absent in QI's integer format of addends), is a loss of precision which leads to AC losing parts of its coding interval in each step. If one were to use fractions of infinite precision, all intervals would fit exactly next to each other, without gaps. Since allowing intervals to overlap would result in non-decodable output (by Kraft inequality), any loss in precision for specifying interval boundaries must leave unused gaps in the output code space.
>
> Discarding part of the range is one way to deal with finite precision, for example the carryless rangecoder in ppmd. However the various coders in paq6, paq7, fpaq0, etc. do not have any gaps in the code space. These are carryless binary arithmetic coders with 32 bits precision and 12 bit representation of probabilities.

Matt, I think you are correct in saying the coder used in fpaq0 has no gaps. But the overall code it produces does have gaps due to the modeling. It has nothing to do with the argument with nightlight, since I am not sure he has a grasp of arithmetic at all. I am not just talking about the file endings; I am talking about gaps that exist throughout the whole output file because of the model.

In your model you use 9 bits for every 8 bits of data, where the starting bit is a ZERO for each byte, and then for EOF you allow the starting bit to be ONE and stop the compressor. This does not allow all the code space to be used. As a result of this modeling, every file compressed with FPAQ0 has the first bit of the first byte set on output, so technically the first bit out is a total waste of space. However, it's not that bad as far as the total number of extra bits goes. Most coders add the waste purely at the end, or with a count field at the beginning.

Without all the complications of a true bijective compressor, you could drop back to 8 bits per symbol, and then when marking the compression done you just start with a pretend bit for the EOF such that X2 changes, then flush as before. On decompression you check for EOF at the start of each new byte if all the bytes of the archive have been read. If not, just continue in the loop. When checking, you can calculate exactly when to stop; in fact you can do a free check to see if the file is actually the result of the compression. You don't have a bijective file compressor at this point, but it's within a byte or two even for very long files.

To go the extra step for a full bijective file compressor, you would have to do what arb255.exe did, which is basically to use the last bit that is a one in the file as the EOF indicator, or, if the last byte is all zeros or a zero followed by a tail of 100...s, to bijectively add in a last bit that is a one. That takes a lot of overhead for just 2 bytes or so of saving, at a cost of time.

Let me state that your model does not cost more than what other people do; in fact it is slick. However, the cost for an N byte file being compressed is -lg(1/2) - lg(2/3) - ... - lg(N/(N+1)) for the zero codings, plus the length for the 1 coding at the end, which is -lg(1/(N+2)) bits. The zeros add up to -lg(1/(N+1)) bits. For a file 100,000 bytes long this is 16.6096549013 bits due to the zeros and 16.6096693280 bits due to the one, for a total of 4 extra bytes that are not needed.

The cost of the zeros can be totally eliminated with very little change to FPAQ0; the cost of the one would be the current cost of a one in the table, and could be no greater than what you are currently using. This method would allow dense packing and fully use the compressed space without gaps, except for the very ending bytes.

David A. Scott
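The overhead figures in the post above follow from a telescoping sum. A short derivation, assuming (as the post describes) that the end-of-data flag before the k-th byte is coded as a zero with probability k/(k+1) and that the final flag is coded as a one with probability 1/(N+2); the symbols C_0 and C_1 are introduced here only for illustration:

```latex
\begin{align*}
C_0 &= \sum_{k=1}^{N} -\lg\frac{k}{k+1}
     = \lg\prod_{k=1}^{N}\frac{k+1}{k}
     = \lg(N+1), \\
C_1 &= -\lg\frac{1}{N+2} = \lg(N+2), \\
C_0 + C_1 &= \lg\bigl((N+1)(N+2)\bigr).
\end{align*}
```

For N = 100,000 this gives about 16.6097 bits for each term, roughly 33.2 bits in total, which is the "4 extra bytes" quoted in the post.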


1/7/2006 3:17:10 PM

Willem <willem@stack.nl> wrote in news:slrndrvkk7.188o.willem@toad.stack.nl:

> nightlight wrote:
> ) This truncation of the infinite fractions even for a small number of symbols (which is absent in QI's integer format of addends), is a loss of precision which leads to AC losing parts of its coding interval in each step. If one were to use fractions of infinite precision, all intervals would fit exactly next to each other, without gaps. Since allowing intervals to overlap would result in non-decodable output (by Kraft inequality), any loss in precision for specifying interval boundaries must leave unused gaps in the output code space.
>
> This paragraph clearly demonstrates that you do not understand well enough how Arith Encoding works. Any decent AC does *not* lose parts of its coding interval each step. Try to get that through your head. It *is* possible (hell, it's quite easy) to get the intervals to line up exactly without infinite precision.
>
> SaSW, Willem

It is his inability to grasp this simple, well-understood fact that makes one wonder if he understands the subject field at all. Of course you can expect his usual reply, with lines and lines of quoted text having nothing to do with this fact, which he either will not or cannot seem to grasp. He reminds me of the current crop of students who are never corrected when they make common errors, so they just go on making ever larger errors while never having learned how to learn from mistakes.

David A. Scott


1/7/2006 3:25:13 PM

> This analysis is no different from any other analysis > in that you have to make lots of assumptions. This > means that if you use such an analysis to make > real-world predictions, then that depends > on how well your assumptions match the real world. I see now where you're misreading my redundancy statements in this and in the earlier thread. The redundancy I was deriving in [T3] at end of page 8, and talking about here, is simply the number of excess bits that the finite precision arithmetic used by QI will add to the unlimited precision enumerative code. The paragraph on page 8 after all, starts with "To obtain QI redundancy d(g) _due to SW rounding_ in (21)..." The upper bound for _this_ redundancy d(g) obtained there is: d(g) < log(e)/2^(g-1) .... (1) No model or source assumptions are needed for (1), and none is reflected in (1). The only parameter in (1) is the assumed arithmetic precision of g bits (the SW mantissa length). The eq. (1) is simply an upper bound on any additional bits relative to the exact EC, produced by QI due to its use of SW rounding up in eq. (21). It has nothing to do with how well EC or QI will code in any given setup (model, source, parameter coding method, etc). If the exact EC model is perfect in a given setup, than (1) shows what is the maximum that QI can fall short of that "perfect" output. If EC is coding using an imperfect model, resulting in some redundancy of R bits per symbol relative to the best model, (1) shows the maximum that QI can add to R. But (1) doesn't care or predict what R is. The two types of redundancy are completely unrelated. The question of how well the exact EC and AC can code in different coding setups, is an entirely different and a much larger topic, well covered in the literature, starting with Davisson's [28] and many other papers since. Krichevsky-Trofimov's paper [33] provides great many bounds for variety of coding setups. Some related later results are in [40],[41],[41a]-[41d]. 
The basic result is that even for the unlimited precision coders (and coding under 'everything else set the same') the exact EC has a slightly lower redundancy than the exact AC (by approximately 1 bit for the entire input, for max & avg). This is the same difference as between the Huffman and the Shannon-Fano prefix codes. Even the origin of the difference is the same: the bottom-up addend construction, as done by QI and Huffman, is tighter than the top down addend construction, as done by AC & Shannon-Fano. Now to your main questions, starting with the last one, which is the most specific: > assuming I have a stream of symbols, where at each > position in the stream, the probability distribution > of the symbols is different, then how does QI coder > adapt itself to all those different distributions ? This is the scenario of AC modeling engine feeding QI, which was sketched in note N7 on p. 9, [T3]. Two ways are described there: a) --- QI coding AC style QI can code the AC style by performing "Lattice Jumps". It is simplest to see how this is done by looking at the decoder, arriving at some point B=(x,y) (see Fig. 1, p. 4, [T3]). The path count at B is N(x,y)=56. The index at point B can have values 0..N(x,y)-1, hence the length of the index at B is log(N(x,y))=log(56) bits. If bit=1 gets decoded (as shown by path on Fig 1), the decoder moves up, to point BA=(x,y-1), which has the path count 21, hence the index has length log(21) bits. Hence, upon decoding bit=1 at B, the index length has dropped by log(56)-log(21)=log(8/3) bits, which is precisely the ideal code length log(1/p) for bit=1 at B, where p=3/8=probability of 1 at B. If bit=0 gets decoded at B, decoder moves to point BL=(x-1,y) where path count is N(x-1,y)=35, hence the index length is log(35). In this case the index length drops by log(56)-log(35)=log(8/5) which is exactly same as the ideal code length log(1/q), where q=5/8 is probability of 0 at B. 
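The numbers in this example can be checked with exact integer arithmetic. The sketch below assumes that B is the point with x=5 zeros and y=3 ones (the only assignment consistent with the counts 56, 21, 35 and p=3/8; Fig. 1 itself is not reproduced here), and the function names are made up for illustration. It verifies that each step scales the path count by exactly p=y/(x+y) or q=x/(x+y), which is what makes the index length drop by log(1/p) or log(1/q).

```c
#include <assert.h>
#include <stdint.h>

/* Path count N(x,y) = C(x+y, y): the number of lattice paths from the
   origin to (x,y), i.e. bit strings with x zeros and y ones.
   Sketch only: exact for x+y <= 60, no big-integer support. */
static uint64_t path_count(unsigned x, unsigned y)
{
    uint64_t c = 1;
    unsigned i;
    assert(x + y <= 60);          /* keeps every intermediate product in 64 bits */
    for (i = 1; i <= y; ++i)
        c = c * (x + i) / i;      /* c == C(x+i, i) after each step; division is exact */
    return c;
}

/* Returns 1 if the step from (x,y) shrinks the path count by exactly
   p = y/(x+y) when a 1 is decoded and by q = x/(x+y) when a 0 is decoded.
   The ratios are compared by cross-multiplication, so no rounding occurs. */
static int step_ratios_exact(unsigned x, unsigned y)
{
    uint64_t n    = path_count(x, y);
    uint64_t up   = y ? path_count(x, y - 1) : 0;   /* after decoding bit=1 */
    uint64_t left = x ? path_count(x - 1, y) : 0;   /* after decoding bit=0 */
    return up * (x + y) == n * y && left * (x + y) == n * x;
}
```

With these definitions, path_count(5, 3), path_count(5, 2) and path_count(4, 3) give the 56, 21 and 35 used in the text, and step_ratios_exact(5, 3) holds.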
It is easy to see from multiplicative recurrences for binomial coefficients (eq's (1) & (2) from the previous post here) that this pattern always holds - after every decode step, the index length drops by exactly log(1/P), where P is the probability of the decoded symbol. Analogous relation holds for each encode step, where the index length increases by the ideal code length of the encoded symbol at that point. Note also that due to integer arithmetic, this is not an approximate optimality (such as one would get using truncated infinite fractions, as AC does). With QI/EC, this coding optimality at every point is built into the table entries. { You can check the quantization errors using e.g. QI.exe cbr n36, which shows no quantization errors for n=36 (or below), and with n=37, the 1st error for k=16, of just +1 in the SW mantissa which adds 4e-10 bits to the index.} With QI, for a general point B=(x,y), the quantized path count L(x,y) (computed via (21)) is an SW integer with a g-bit mantissa w(x,y) and exponent e(x,y). The ideal code lengths and ratios for the steps from B described above still hold, but only within the limits d(g). In particular, L(x,y-1)/L(x,y) is approx. =p=y/(x+y) and L(x-1,y)/L(x,y)=q=x/(x+y). The index at B will have for the leading g bits at the bit offset e(x,y) some g-bit integer Iw(x,y) which is in the interval [0,w(x,y)-1] (this is a simple consequence of the index at any point ranging from 0 to path count-1 and the fact that quantized path count L(x,y) has trailing zeros after the leading g bits given by w(x,y), hence L(x,y)-1 will decrement w(x,y)). We can thus view for any point B(x,y) and index I(x,y), the Iw(x,y) as a digit in the radix w(x,y). Suppose now, decoder at B=(x,y) gets from the modeler some probabilities p' and q' different from p,q. To continue decoding, decoder makes a jump to another lattice point B'=(x',y') where x'/(x'+y')=p' and y'/(x'+y')=q'. 
One can use Farey fractions (see [F]) to obtain the optimum such point for any given binomial table size. Alternatively, one can simply jump to another point on the same front i.e. one would keep n fixed, x+y=n=x'+y' and select point B' using x'=n*p'. The path count at B' is L(x',y') with mantissa w(x',y') and exponent e(x',y'), which are different from w(x,y) and e(x,y). The exponent is easy to adjust: you simply change the presumed position of the least significant bit of the index I(x',y') (this is the origin, A on Fig 1., toward which decoder is heading, but hasn't reached yet since there are more symbols to decode; in the QI source code in file EncDec.c this presumed origin of the index is given as argument "sb" to function qiDec()). The main work is with the difference in the path count mantissas w(x,y) and w(x',y') at B and B'. Namely at B' the leading g bits of index Iw(x',y') have to be a digit in the radix w'=w(x',y'). But we only have a g-bit digit left over from B which is in the radix w=w(x,y). So, the problem here is that of radix conversion -- we have a digit Iw in radix w and we need a digit Iw' in radix w'. There are several ways to do this. A conceptually simple one is as follows: decoder extracts the digit Iw and encodes it as digit of some mixed radix output integer M, which serves as an accumulator or recycler for all such 'orphaned' Iw digits. The bits of M (which are an arbitrary bit pattern, being a binary form of a mixed radix integer) can simply be reused, e.g. by growing M at the unprocessed end of the compressed input (or just having M as separate component). At this stage the encoder would have done the opposite - it would have "decoded" (see file Radix.c, function dec_radix()) the far end of the compressed data (which was an arbitrary binary pattern) into a digit Iw in radix w and concatenated it to the leading end of the index. 
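The recycling of 'orphaned' digits can be pictured with a plain mixed radix accumulator. This is a minimal sketch of the general technique, not the code from Radix.c; the names mr_push and mr_pop are hypothetical, and a 64-bit accumulator stands in for the arbitrary-length integer M.

```c
#include <assert.h>
#include <stdint.h>

/* Append digit d, with 0 <= d < w, to the mixed radix accumulator *m. */
static void mr_push(uint64_t *m, uint32_t d, uint32_t w)
{
    assert(w > 0 && d < w);
    assert(*m <= (UINT64_MAX - d) / w);   /* sketch only: refuse to overflow */
    *m = *m * w + d;
}

/* Remove and return the most recently pushed digit.
   The radix w must be the same one that was used for the matching push. */
static uint32_t mr_pop(uint64_t *m, uint32_t w)
{
    uint32_t d;
    assert(w > 0);
    d = (uint32_t)(*m % w);
    *m /= w;
    return d;
}
```

Digits come back in the reverse of the order in which they were pushed, which fits a decoder and an encoder that walk the data in opposite directions: one side pushes a digit Iw in radix w, the other pops it with the same w. For example, pushing 3 (radix 5), 10 (radix 21) and 7 (radix 8) leaves M = 591, and popping with radices 8, 21, 5 returns 7, 10, 3.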
There are other similar ways to perform this radix conversion, all of them using an amount of processing per symbol very similar to the conventional AC algorithm. They all also have to perform explicit coding/decoding operations (which include mul/div) for both the most and the least probable symbols, just as AC does.

The summary of this is that if you want the AC modeling plus AC coding style, you get the AC speed and the AC accuracy. The AC scheme, with its 'single next symbol probability' bottleneck interface between the modeler & the coder (where the modeler micro-manages the coder, symbol by symbol, and where the whole coder+modeler processing and interaction is traversed from top to bottom on every symbol) is simply intrinsically a poor division of labor to allow for any high performance coding.

It is analogous to organizing car manufacturing, and requiring that the next car can be started only after the current car is complete and out the door. That's a kind of conceptual constraint imposed by the AC modeling "paradigm" as its so-called "online" coding requirement. This "online" is taken to mean some kind of analog, memoryless, CPU-less, Morse telegraph device.

That has nothing to do with the actual online as it is done, or any actual requirements or inherent design constraints. One normally has a fairly large buffer space and a processor which can access any of it, running programs of high complexity. The Internet would grind to a halt if its protocols interpreted "online" as a constraint to have to send or receive a single bit (or a single symbol) at a time. Even the old style point-to-point modems had several KB buffers to accumulate, batch and compress the data. And similarly for disk sectors & clusters.

The point of the above critique of the present AC "paradigm" is that it is simply a gratuitous, historically accidental conceptual bottleneck and an intrinsic performance drain. Any algorithm that follows its prescribed division of labor will bog down.
Once you snap out of its "online" spell, many better possibilities open up. For example, even retaining the AC modeling engine, with its "probability of the next single symbol" bottleneck parametrization of all the information about the sequence being encoded, but just allowing the coder to ignore the imagined "online" constraint, one can get much better performance with QI as follows:

b) --- QI Modeling AC style

QI breaks the probabilities into classes, so that each class includes an interval of probabilities of size 1/sqrt(n), where n is the size of data to be encoded. Since the coder doesn't assume any more the Morse telegraph kind of "online", it doesn't assume n is 1 but some much larger number. The modeler is still left to work under the old "online" spell and imagine that it has to convert, by hook or crook, all it knows or that it could know about the input sequence into the probabilities for the next single symbol p(c).

Consider now a binary sequence of n symbols, for which the modeler produces, symbol by symbol, probabilities p of bit=1, with p in some interval D=[a,b), of size d=b-a. We divide D into s=sqrt(n) equal subintervals of lengths d/s. Each input symbol is assigned to one of s enumerative classes (thus enumerated by a separate index) based on the subinterval in which the modeler's p at that point falls. Hence, we're using quantized probabilities to classify the symbols as "equiprobable". The excess output E in bits per symbol due to this 'p quantization' is about (cf. [41c], p. 8):

E = [dp^2/p + dq^2/q]*log(e)/2 .... (2)

where dp & dq are quantization errors of p and q. Since dp & dq are =< 1/s, then E =< log(e)/(2npq) = O(1/n). Note that since adaptive AC probability estimates also have sampling errors dp, dq of order 1/sqrt(n), this redundancy is of similar size to that of the adaptive AC.
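The step from (2) to the O(1/n) bound can be written out explicitly. A short derivation, assuming only what the text states: |dp| and |dq| are at most 1/s with s = sqrt(n), and q = 1 - p:

```latex
\begin{align*}
E &= \left(\frac{(dp)^2}{p} + \frac{(dq)^2}{q}\right)\frac{\log e}{2}
   \;\le\; \frac{1}{s^2}\left(\frac{1}{p} + \frac{1}{q}\right)\frac{\log e}{2} \\
  &= \frac{1}{n}\cdot\frac{p+q}{pq}\cdot\frac{\log e}{2}
   \;=\; \frac{\log e}{2\,n\,p\,q}.
\end{align*}
```

For fixed p the right-hand side is O(1/n) bits per symbol; the bound grows as p or q approaches 0, which is why a non-uniform partition with shorter subintervals near small probabilities can reduce the worst case.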
One can further optimize this method (to reduce its worst case) by selecting non-uniform partition of interval D, so that the subintervals around smaller probabilities are shorter. In practical situations, the AC modeler would be producing its predictions p(c) based on statistics from the processed part of the input, hence its probabilities would already have a built in sampling error interval (which decreases as 1/sqrt(n)), which can be used by the QI coder as the partition criteria for the enumerative classes (instead of an ad hoc partition described above). Various existent methods for growing and splitting the contexts based on such past statistics, such as CTW or Rissanen's Context, would transfer here as methods for generating enumerative classes adaptively. For the multi-alphabet case one would perform the decomposition described [T1] pp. 31-38 with the only difference that instead of combining the symbol counts k(c) based on symbol's binary code c, one would combine the modeler's probabilities p(c). A special case of interest for this method are the finite order Markov sources. Here, for order m, the probabilities of the next symbol are defined by the m previous symbols. For smaller m, one could simply bypass the computation of probabilities (since QI doesn't need them) and simply assign the enumerative class for the next input symbol directly: using m previous symbols as the class tag (hence there would be 2^m classes in binary case). In this case we can notice another advantage of QI/EC coder for modeling over the AC: to encode a symbol QI needs to know only whether the symbol has the same or different probabilities as some other symbols, but unlike AC, QI doesn't also need to know what values these probabilities have. Hence, QI places much lower demand on the modeling engine, since the modeler here can simply pass on to QI the context ID (the last m symbols) and QI will code the symbol into the index for that ID, whatever its probability may be. 
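The order-m case can be sketched in a few lines. The code below only performs the classification step: it tags each input bit with the previous m bits and collects, per class, the symbol count and the count of ones, which are the quantities an enumerative index for that class would be built from. The function name, the array layout and the choice to treat the m bits before the start of the input as zeros are assumptions made for this illustration; no probabilities are computed anywhere.

```c
#include <assert.h>
#include <stddef.h>

/* For each of the 2^m contexts (the previous m bits), count how many
   symbols fell into that class (n_cls) and how many of them were 1 (k_cls).
   Both arrays must have 2^m entries.
   Assumption: the m bits preceding the input are taken to be 0. */
static void classify_order_m(const unsigned char *bits, size_t n, unsigned m,
                             unsigned *n_cls, unsigned *k_cls)
{
    unsigned mask, ctx = 0;
    size_t i;
    assert(m < 16);
    mask = (1u << m) - 1u;
    for (i = 0; i <= mask; ++i)
        n_cls[i] = k_cls[i] = 0;
    for (i = 0; i < n; ++i) {
        unsigned b = bits[i] & 1u;
        n_cls[ctx] += 1;                /* symbol i belongs to class ctx   */
        k_cls[ctx] += b;                /* count the ones seen in the class */
        ctx = ((ctx << 1) | b) & mask;  /* slide the m-bit context window  */
    }
}
```

For the input 1,0,1,1,0,1 with m=1, class 0 (previous bit 0) receives three symbols, all ones, and class 1 receives three symbols with a single one. Each class would then be coded by its own index, whatever its underlying probability may be.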
In conclusion for this method (b), QI can use AC modeling engine, with its full speed advantage over AC, and with the redundancy being same as that of an adaptive AC in the same setup. > how would you use a QI coder with an adaptive model ? Sections (a) and (b) above have couple answers. > if I read you correctly, it is not an adaptive coder, > so how do you transmit the model information for the > QI coder ? It is not "adaptive AC", but as illustrated in (a),(b) above, it can function that way. The native QI modeling is descriptive, in the sense of Rissanen's MDL. So the QI model information is a much wider type of information than just a list of probabilities (although it can be that, too). Consider an order-0 QI coding. The modeling analogous to the order-0 adaptive AC, except more resilient against the "surprises", becomes here the selection of the segmentation of the input into contiguous sections, based on measured symbol frequencies. Its resilience to surprises is illustrated in the row "Vary" (cf. table on page 10 in [T3]). The adaptive order-0 modeler for QI has entire input sequence available and it does not have to rely on a possibly false assumption that the symbol frequencies in the initial parts of the sequence are predictive of the frequencies in the later parts, or gamble on which way might they be predictive. While AC can code this way, too, all it would achieve with that would be to advance into a literal role of a less accurate and a lot slower imitation of QI. QI order-0 adaptive modeler identifies contiguous quasi-stationary sections of the input sequence and uses them as enumerative classes. There are many ways to do such segmentation and even more ways to encode it, along with the corresponding section counts, into the output. Some of these methods, especially the encoding aspect, were developed already for conventional EC, such as described in [11]-[15], [23]. 
I have also developed several for QI (some of which were touched upon in [T2], where the general QI/EC modeling pattern is presented, pp. 26-35). Due to the lack of a practical EC coder and the exaggerated dominance of the AC modeling paradigm (hypertrophied to the point of pathology by the absence of practical competition), this entire field of EC modeling is highly under-explored. With the precision & performance problems finally solved by QI, an algorithmic gold mine has opened, where just about anything you do, and there is more to do than an eye can see, is a new algorithm, maybe a great new discovery to be taught to kids ever after.

-- References ( http://www.1stworks.com/ref/RefLib.htm )

T1-T3 are on http://www.1stworks.com/ref/qi.htm

28. L.D. Davisson "Universal noiseless coding" IEEE Trans. Inform. Theory IT-19 (6), 783-795, 1973
http://cg.ensmp.fr/~vert/proj/bibli/local/Davisson1973Universal.pdf

33. R. Krichevsky, V. Trofimov "The performance of universal encoding" IEEE Trans. Inform. Theory IT-27 (2), 199-207, 1981
http://cg.ensmp.fr/~vert/proj/bibli/local/Krichevsky1981performance.pdf

41. M. Drmota, H-K. Hwang, W. Szpankowski "Precise Average Redundancy of an Idealized Arithmetic Coding" DCC 2002, 222-231.
http://citeseer.ist.psu.edu/drmota02precise.html

34. J.G. Cleary, I.H. Witten "A Comparison of Enumerative and Adaptive Codes" IEEE Trans. Inform. Theory IT-30 (2), 306-315, 1984
http://www.1stworks.com/ref/Cleary84Enum.pdf

F. Farey Fractions:
http://www.cut-the-knot.org/ctk/PickToFarey.shtml

41c. P.G. Howard, J.S. Vitter "Practical Implementations of Arithmetic Coding" Tech. Rep. No. 92-18, CS, Brown University, 1992
http://www.1stworks.com/ref/Howard92PractAri.pdf


1/7/2006 5:25:12 PM

nightlight wrote: ) > if I read you correctly, it is not an adaptive coder, ) > so how do you transmit the model information for the ) > QI coder ? ) ) It is not "adaptive AC", but as illustrated in (a),(b) ) above, it can function that way. The native QI modeling ) is descriptive, in the sense of Rissanen's MDL. So the ) QI model information is a much wider type of information ) than just a list of probabilities (although it can be ) that, too). ) ) <snip theoretical discussion> I have read through your entire description, and I haven't found even a single hint to the practical question of what exactly the QI coder needs as modeling information to do its job. Can I assume that 'enumerative coder' means that you need the exact symbol counts for each of the subclasses ? And if not, then what do you need ? Please try to explain this as simply as possible, within a single paragraph. SaSW, Willem -- Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged or something.. No I'm not paranoid. You all think I'm paranoid, don't you ! #EOT


1/7/2006 5:55:21 PM

nightlight <nightlight.skip-this@and-this.omegapoint.com> wrote in news:SY6dnUrMLIWIZyLenZ2dnUVZ_s6dnZ2d@rcn.net: > > This analysis is no different from any other analysis > > in that you have to make lots of assumptions. This > > means that if you use such an analysis to make > > real-world predictions, then that depends > > on how well your assumptions match the real world. > > I see now where you're misreading my redundancy > statements in this and in the earlier thread. Actually I think he is reading them correctly he has an understanding of arithmetic coding its not clear you do. .... > Now to your main questions, starting with the last one, > which is the most specific: > > > assuming I have a stream of symbols, where at each > > position in the stream, the probability distribution > > of the symbols is different, then how does QI coder > > adapt itself to all those different distributions ? > > This is the scenario of AC modeling engine feeding QI, > which was sketched in note N7 on p. 9, [T3]. Two ways > are described there: > > a) --- QI coding AC style ..... > > The summary of this is that if you want the AC > modeling plus AC coding style, you get the AC > speed and the AC accuracy. The AC scheme, with its > 'single next symbol probability' bottleneck interface > between the modeler & the coder (where the modeler > micro-manages the coder, symbol by symbol, and where > the whole coder+ modeler processing and interaction > is traversed from top to bottom on every symbol) > is simply intrinsically a poor division of labor > to allow for any high performance coding. > I gues that means it doesn't work so hot, Took you a long time to state it. So you are admiting the arithmetic may beat the QI when you attempt to shoe horn in your QI method where a arithmetic might be a natural fit. No wonder you changed Moffat to what ever porblem your doing instead of the other way around. .... > > b) --- QI Modeling AC style > .... 
> QI breaks the probabilities into classes, so that > each class includes an interval of probabilities of > size 1/sqrt(n), where n is the size of data to be > encoded. Since coder doesn't assume any more the Morse > telegraph kind of "online", it doesn't assume n is 1 > but some much larger number. The modeler is still left > to work under the old "online" spell and imagine that > it has to convert, by hook or crook, all it knows > or that it could know about the input sequence into > the probabilities for the next single symbol p(c). > > Consider now a binary sequence of n symbols, for which > the modeler produces, symbol by symbol probabilities > p of bit=1, with p in some interval D=[a,b), of size > d=b-a. We divide D into s=sqrt(n) equal subintervals > of lengths d/s. Each input symbol is assigned to > one of s enumerative classes (thus enumerated by a > separate index) based on the subinterval in which > the modeler's p at that point falls in. Hence, we're > using quantized probabilities to classify the symbols > as "equiprobable". The excess output E in bits per > symbol due to this 'p quantization' is about > (cf. [41c], p. 8): Well it likes like your having trouble since you need to break it up into smaller chunks. Thats to bad since a real entropy compressor would take a file of millions of symbols and still compress to roughly the same length no matter how the symbols arranged. Form what you wrote it seems like your only doing local considerations. .... .... > > > how would you use a QI coder with an adaptive model ? > > Sections (a) and (b) above have couple answers. > > > if I read you correctly, it is not an adaptive coder, > > so how do you transmit the model information for the > > QI coder ? > > It is not "adaptive AC", but as illustrated in (a),(b) > above, it can function that way. The native QI modeling > is descriptive, in the sense of Rissanen's MDL. 
So the > QI model information is a much wider type of information > than just a list of probabilities (although it can be > that, too). > > Consider an order-0 QI coding. The modeling analogous to > the order-0 adaptive AC, except more resilient against > the "surprises", becomes here the selection of the > segmentation of the input into contiguous sections, > based on measured symbol frequencies. Its resilience > to surprises is illustrated in the row "Vary" (cf. table > on page 10 in [T3]). The adaptive order-0 modeler > for QI has entire input sequence available and it > does not have to rely on a possibly false assumption > that the symbol frequencies in the initial parts of > the sequence are predictive of the frequencies in the > later parts, or gamble on which way might they > be predictive. While AC can code this way, too, > all it would achieve with that would be to advance > into a literal role of a less accurate and a lot > slower imitation of QI. > Again this show your lack of understanding just what an order-0 adaptive AC coder does. If a file is made up of 2 symbol types. And one wanted a to compress it by the above method it would get the same length no matter how things ordered. You could have all the zeros then the ones or any combination you would get roughly the same length file. In fact even in you own exaples of small strings you use the fact to get a combination number for each arrangemetn based entirely on the length of string and number of ones. That is what the adaptive order-o arithemtic coder would do. If you have to segment the file then local effects take over and you would not get the same compression for different combinations of the ones and zeros. The order if the symbols makes no difference. .... Here is a question I doubt you will anwser since you really don't seem to answer anything. But I leave so others can see just how your code can't solve simple problems. 
I go back to your own example with a string where you use QI lattice jumps to get an index. This truly is about as easy as it gets. Something even you could put your fingers around. You claim you need only three things:

a) this combination index
b) the number of ones in the string
c) the length of the string

How would you combine this information into a string so decompression could be done? This is something even you should be able to do. Yet you will not. You may give several pages of references but you will not give one complete example.

David A. Scott
--
My Crypto code http://bijective.dogma.net/crypto/scott19u.zip http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged. As a famous person once said "any cryptographic system is only as strong as its weakest link"

1/7/2006 6:43:18 PM

> Any decent AC does *not* lose parts of its coding > interval each step. Try to get that through your > head. It *is* possible (hell, it's quite easy) to > get the intervals to line up exactly without > infinite precision. It seems you and few others here are either restricting the noun "gap" to a subset of cases where gaps do occur, or you are making certain implicit assumptions about the source and the sequences it produces, without being aware of making them. In order to clarify which is the case here, why don't you demonstrate your easy and gapless finite precision binary AC coding for the finite input sequences produced by a binary source S3, which satisfy the following conditions: a) Each sequence from S3 is exactly 3 bits long, b) each sequence has exactly two bits=0 and one bit=1 and c) each sequence is equally likely as any other. How does your easy gapless AC coding for all possible input sequences from S3 look like? Since the inputs are pretty short, you should be able to show it all using an AC working in 4 bit precision. But if you need more, well use more (as long as the bit strings fit on a 72 char line). NOTE 1: The S3 output needs to be coded using a binary order-0 AC in incrementing or decrementing or static mode, which codes bit by bit as if the sequences have arbitrary lengths. Namely, the short sequences are given above only to allow you to show explicitly the coding, not to allow switching over to let you construct the small Huffman tree for the entire input (or some equivalent ad hoc codes). For example, the exactly same coding algorithm should be demonstrable (e.g. as a little sample exe coder that will work on a regular PC with, say, 1G of RAM) with inputs of, say, exactly 30 million bits long with exactly 10 million bits=1 set. In this case you couldn't code it via the multi-alphabet AC which treats the entire input as a single character of a gigantic alphabet. Hence, you can't use any such methods for S3 either. 
So, just show it for a plain bitwise binary AC. NOTE 2: You can't feed your AC input sequences with 0, 2 or 3 ones in order to fill the gaps and declare them as "used". The S3 specified does not produce any such sequences.
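For comparison, the enumerative index that the enc32 routine posted earlier in this thread would assign to the S3 sequences can be computed with a portable variant. This is only a minimal sketch: `binom` and `enum_index` are illustrative names, a plain bit-scan loop stands in for the loBit() macro, and bit 0 is taken to be the lowest-order bit, as in that routine.

```c
#include <assert.h>
#include <stdint.h>

/* Exact binomial C(n,k) for n <= 32; returns 0 when k > n. */
static uint32_t binom(unsigned n, unsigned k)
{
    assert(n <= 32);
    if (k > n) return 0;
    uint64_t c = 1;
    for (unsigned i = 1; i <= k; i++)
        c = c * (n - k + i) / i;      /* stays an exact integer at each step */
    return (uint32_t)c;
}

/* Enumerative index of bit-string x: sum of C(n_j, j) over the set bits,
   where n_j is the offset of the j-th lowest set bit (j = 1, 2, ...). */
uint32_t enum_index(uint32_t x)
{
    uint32_t index = 0;
    unsigned k = 1;
    for (unsigned n = 0; x != 0; n++, x >>= 1)
        if (x & 1)
            index += binom(n, k++);
    return index;
}
```

Under these assumptions the three S3 strings 001, 010 and 100 get the indices 0, 1 and 2, so the index alone takes one of C(3,1) = 3 values.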

1/7/2006 7:32:11 PM

Can you email me a more detailed list of changes (or headers etc) and fixes so I can fix the copy on the web site? Microsecond timer ought to be Ok (the windows hi res is about 1/4 microsecond). Those test loops need to be decoupled to measure separately encode and decode times, and do the correctness check outside of that, reseting the generator to run 3 times with the same seed (it already has a spare copy of original table & function to do it). That way, even the time() would allow accurate speed with enough iterations. > what is all that output supposed to mean, > or more to the point, what do I do to get > efficiency statistics, elapsed time, > compressed size, that sort of stuff. The readme.txt explains the main information commands, such as ct (for tables statistics), cbr (binomial rows full detail, code space gaps, redundancies, roundoffs, mantissa excess, etc). The binary coding commands ci and cc have very similar output, so only cc is described in detail in the readme. The radix codes (fixed, mixed and factorial) commands cr, cf, cm, show the same info as cc, except that counts of 1's is replaced by radix. All times are given at the end of each line as ns/sym, separately for encoder and decoder. > another probem with your code is that as it stands it > seems that it only tests your algorithm's ability > to compress pseudo-random data... > pseudo-random data is theorietically extremely > compressible. The coders don't assume or look for a way to model the random generator. If this were just executable demo, than one could think of some cheating like that. But with source, it should be clear that the coders are just plain 0-order coders and nothing more. Adding a modeling engine to do the general file compression or such is one of the items that will get in eventually. The present source is only a research code for the few folks interested in exploring the algorithm and EC coding/modeling potential. 
Having played quite a bit with the algorithm and seen the advantages over the alternatives, and the possibilities it opens in coding and modeling, I think that in a few years this algorithm and its offshoots will be running in every portable device and serve as the bitmap index and keyword incidence map coder in all the search engines and data-warehouses. At the moment, though, the number of people who share this view fits in about 2.3219... bits.

1/7/2006 8:57:34 PM

nightlight wrote: ) It seems you and few others here are either restricting ) the noun "gap" to a subset of cases where gaps do occur, ) or you are making certain implicit assumptions about ) the source and the sequences it produces, without being ) aware of making them. Neither. See below. ) In order to clarify which is the case here, why don't ) you demonstrate your easy and gapless finite precision ) binary AC coding for the finite input sequences produced ) by a binary source S3, which satisfy the following ) conditions: ) ) a) Each sequence from S3 is exactly 3 bits long, ) ) b) each sequence has exactly two bits=0 and one bit=1 and ) ) c) each sequence is equally likely as any other. ) ) How does your easy gapless AC coding for all possible ) input sequences from S3 look like? Since the inputs ) are pretty short, you should be able to show it all ) using an AC working in 4 bit precision. But if you ) need more, well use more (as long as the bit strings ) fit on a 72 char line). ) ) NOTE 1: The S3 output needs to be coded using a binary ) order-0 AC in incrementing or decrementing or static mode, ) which codes bit by bit as if the sequences have arbitrary ) lengths. Namely, the short sequences are given above ) only to allow you to show explicitly the coding, not ) to allow switching over to let you construct the small ) Huffman tree for the entire input (or some equivalent ) ad hoc codes). ) ) For example, the exactly same coding algorithm should be ) demonstrable (e.g. as a little sample exe coder that ) will work on a regular PC with, say, 1G of RAM) with ) inputs of, say, exactly 30 million bits long with exactly ) 10 million bits=1 set. In this case you couldn't code it ) via the multi-alphabet AC which treats the entire input ) as a single character of a gigantic alphabet. Hence, ) you can't use any such methods for S3 either. So, ) just show it for a plain bitwise binary AC. 
) NOTE 2: You can't feed your AC input sequences with 0, 2 or 3 ones in order to fill the gaps and declare them as "used". The S3 specified does not produce any such sequences.

From this note, I assume the model is allowed to 'know' that the sequence will have a single '1' and two '0' bits, and update its probabilities accordingly. I'll sketch out an AC encoding, that should be enough:

The starting range is [0..256)

The steps for encoding the sequence '100' are:

Step 1: Encode the symbol '1'. The probabilities are 1/3 for a '1' and 2/3 for a '0'. Therefore, the range is subdivided as follows: [0..170) for a '0', and [170..255) for a '1'. Thus, the range is reduced to [170..255)

Step 2: Encode the symbol '0'. The probabilities are 0 for a '1' and 1 for a '0'. Therefore, the range is subdivided as follows: [170..255) for a '0', and [255..255) for a '1'. Thus, the range is reduced to [170..255)

Step 3: Encode the symbol '0'. The probabilities are 0 for a '1' and 1 for a '0'. Therefore, the range is subdivided as follows: [170..255) for a '0', and [255..255) for a '1'. Thus, the range is reduced to [170..255)

The steps for encoding the sequence '010' are:

Step 1: Encode the symbol '0'. The probabilities are 1/3 for a '1' and 2/3 for a '0'. Therefore, the range is subdivided as follows: [0..170) for a '0', and [170..255) for a '1'. Thus, the range is reduced to [0..170)

Step 2: Encode the symbol '1'. The probabilities are 1/2 for a '1' and 1/2 for a '0'. Therefore, the range is subdivided as follows: [0..85) for a '0', and [85..170) for a '1'. Thus, the range is reduced to [85..170)

Step 3: Encode the symbol '0'. The probabilities are 0 for a '1' and 1 for a '0'. Therefore, the range is subdivided as follows: [85..170) for a '0', and [170..170) for a '1'. Thus, the range is reduced to [85..170)

The steps for encoding the sequence '001' are:

Step 1: Encode the symbol '0'. The probabilities are 1/3 for a '1' and 2/3 for a '0'.
Therefore, the range is subdivided as follows: [0..170) for a '0', and [170..255) for a '1'. Thus, the range is reduced to [0..170)

Step 2: Encode the symbol '0'. The probabilities are 1/2 for a '1' and 1/2 for a '0'. Therefore, the range is subdivided as follows: [0..85) for a '0', and [85..170) for a '1'. Thus, the range is reduced to [0..85)

Step 3: Encode the symbol '1'. The probabilities are 1 for a '1' and 0 for a '0'. Therefore, the range is subdivided as follows: [0..0) for a '0', and [0..85) for a '1'. Thus, the range is reduced to [0..85)

As you can see, the three possible sequences each lead to a range of approximately the same size, and the ranges of all possible sequences, when put together, form the starting range without any gaps.

This scheme easily scales up to sequences with millions of bits.

There is a coding gap caused by having to terminate the encoder, but that is a fixed cost of two or three bits, not a loss at each step as you have claimed. Furthermore, this termination cost can be avoided as well, with careful handling.

I hope to have cleared up your misgivings about Arith Coding with this.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged or something.. No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
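The subdivision sketched in this post can be checked mechanically. What follows is a minimal C sketch, not code from any coder discussed in the thread. It assumes a decrementing model that knows the remaining counts of zeros and ones, uses the half-open range [0,256) consistently, and gives the '0' branch the lower part of the range with the split point rounded down. The names `Range`, `narrow` and `code_sequence` are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t lo, hi; } Range;     /* half-open [lo, hi) */

/* Narrow r for one bit, given the counts of zeros and ones still to come
   (including the current bit). The '0' branch gets the lower part. */
static Range narrow(Range r, int bit, uint32_t zeros, uint32_t ones)
{
    uint32_t width = r.hi - r.lo;
    uint32_t split = r.lo + (uint32_t)((uint64_t)width * zeros / (zeros + ones));
    if (bit) r.lo = split; else r.hi = split;
    return r;
}

/* Code an n-bit sequence whose number of ones is known in advance. */
Range code_sequence(const int *bits, int n, int ones)
{
    Range r = { 0, 256 };
    uint32_t z = (uint32_t)(n - ones), o = (uint32_t)ones;
    for (int i = 0; i < n; i++) {
        assert(bits[i] ? o > 0 : z > 0);       /* sequence must match the counts */
        r = narrow(r, bits[i], z, o);
        if (bits[i]) o--; else z--;
    }
    return r;
}
```

With these assumptions the sequences 100, 010 and 001 end at [170,256), [85,170) and [0,85), which together cover [0,256) exactly. The sketch only tracks the coder's internal range; it does not produce or terminate an output codeword.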

1/7/2006 9:24:11 PM

Willem <willem@stack.nl> writes: > ) a) Each sequence from S3 is exactly 3 bits long, > There is a coding gap caused by having to terminate the encoder, Not in this particular case. Phil -- What is it: is man only a blunder of God, or God only a blunder of man? -- Friedrich Nietzsche (1844-1900), The Twilight of the Gods

1/7/2006 10:18:07 PM

nightlight <nightlight.skip-this@and-this.omegapoint.com> wrote in news:HaudnSFNItBFil3eRVn-rg@rcn.net: > > Any decent AC does *not* lose parts of its coding > > interval each step. Try to get that through your > > head. It *is* possible (hell, it's quite easy) to > > get the intervals to line up exactly without > > infinite precision. > > It seems you and few others here are either restricting > the noun "gap" to a subset of cases where gaps do occur, > or you are making certain implicit assumptions about > the source and the sequences it produces, without being > aware of making them. > > In order to clarify which is the case here, why don't > you demonstrate your easy and gapless finite precision > binary AC coding for the finite input sequences produced > by a binary source S3, which satisfy the following > conditions: > > a) Each sequence from S3 is exactly 3 bits long, > > b) each sequence has exactly two bits=0 and one bit=1 and > > c) each sequence is equally likely as any other. > > How does your easy gapless AC coding for all possible > input sequences from S3 look like? Since the inputs > are pretty short, you should be able to show it all > using an AC working in 4 bit precision. But if you > need more, well use more (as long as the bit strings > fit on a 72 char line). > > NOTE 1: The S3 output needs to be coded using a binary > order-0 AC in incrementing or decrementing or static mode, > which codes bit by bit as if the sequences have arbitrary > lengths. Namely, the short sequences are given above > only to allow you to show explicitly the coding, not > to allow switching over to let you construct the small > Huffman tree for the entire input (or some equivalent > ad hoc codes). > > For example, the exactly same coding algorithm should be > demonstrable (e.g. as a little sample exe coder that > will work on a regular PC with, say, 1G of RAM) with > inputs of, say, exactly 30 million bits long with exactly > 10 million bits=1 set. 
In this case you couldn't code it > via the multi-alphabet AC which treats the entire input > as a single character of a gigantic alphabet. Hence, > you can't use any such methods for S3 either. So, > just show it for a plain bitwise binary AC. > > NOTE 2: You can't feed your AC input sequences with > 0, 2 or 3 ones in order to fill the gaps and declare > them as "used". The S3 specified does not produce > any such sequences. > > If you have a sequence that is defined by 3 bit strings you in effect have 3 input symbols. With an adaptive coder you don't need to know in advance that each of the 3 symbols is equal likely but what the hell lets play the game and compress with a coder the likes of arb2x or arb255 the only difference being you need two cells one cell is split equal in half the other is split 1 to 2. Note this split is not perfect but when one one is useing 64 bit registers the split is dam close and no gaps. To explain what is happening lets use 1 bit registers the first splits so that a one is output when you have the 100 token the next is split 50 50 so the mappings would be 100 goes to 1 010 goes to 01 001 goes to 00 in fact this is what you would get in huffman case. And you can see that since its a complete tree any sequence of 1 and 0's on decompression gives you any of the 3 symbol sequences no gaps. You might complain in that if you had 30 million bits from the source your compressing that the compressed length would vary from 10 million bits in the unlikely event you have 10 million 100's being output even though it should have been only occuring roughly 1/3 of the time. The compressed length could also be as long as 20 million bits if the 100 never occured at all highly unlikely. However using the huffman apprpximation the average compressed length would be 16.666... million bits. 
When you go to more bits it compresses better, again with no gaps. The ideal length for the 30 million bit sequence is 10 million times -lg(1/3), which is roughly 15849625.0072115618145373894394782 bits for the 10 million symbols, while with a 64 bit register it is 10 million times -lg(6148914691236517204/18446744073709551616) for 100, which is roughly 15849625.0072115618176657356346099, and slightly less for the other 2 symbols. In short, there would be no way to tell you're not actually using 1/3, and there would be no gaps, since you would need 15849625 bits, give or take a bit, to express the compression.

If you have high enough integer precision math you could calculate the index. However, sooner or later when you do compression you have to write that number out as ones and zeros. You gain nothing in space savings by having the exact number, since the arithmetic compression has no gaps. Sure, you might be one bit shorter for some combinations, but then you will be a bit longer for others. It's just a plain fact that the bijective arithmetic coder can do this kind of compression in an optimal way. It's not clear you understand how to do this with your method in general.

David A. Scott
--
My Crypto code http://bijective.dogma.net/crypto/scott19u.zip http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged. As a famous person once said "any cryptographic system is only as strong as its weakest link"
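The 1-bit-register mapping given in this post (100 goes to 1, 010 goes to 01, 001 goes to 00) can be written out as a tiny prefix coder. This is a rough sketch for illustration only, not the arb2x or arb255 code: input and output are ASCII strings of '0' and '1' characters, and `encode_triplets` and `decode_triplets` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode ASCII triplets "100", "010", "001" as 1, 01, 00.
   Writes '0'/'1' characters to out and returns the number of code bits. */
size_t encode_triplets(const char *in, char *out)
{
    size_t n = strlen(in), bits = 0;
    assert(n % 3 == 0);
    for (size_t i = 0; i < n; i += 3) {
        if (memcmp(in + i, "100", 3) == 0) {
            out[bits++] = '1';
        } else if (memcmp(in + i, "010", 3) == 0) {
            out[bits++] = '0'; out[bits++] = '1';
        } else {
            assert(memcmp(in + i, "001", 3) == 0);
            out[bits++] = '0'; out[bits++] = '0';
        }
    }
    out[bits] = '\0';
    return bits;
}

/* Decode a code string produced by encode_triplets.
   Writes the triplets to out and returns how many were decoded. */
size_t decode_triplets(const char *code, char *out)
{
    size_t n = strlen(code), count = 0;
    for (size_t i = 0; i < n; count++) {
        const char *t;
        if (code[i] == '1') {
            t = "100"; i += 1;
        } else {
            assert(i + 1 < n);                 /* a lone trailing 0 is incomplete */
            t = (code[i + 1] == '1') ? "010" : "001";
            i += 2;
        }
        memcpy(out + 3 * count, t, 3);
    }
    out[3 * count] = '\0';
    return count;
}
```

Each triplet costs 1, 2 or 2 bits, so 10 million triplets cost between 10 and 20 million bits, and 5/3 bits per triplet on average (about 16.67 million bits) when the three triplets are equally likely, as stated above.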

1/7/2006 11:19:59 PM

["Followup-To:" header set to comp.compression.] On 2006-01-07, nightlight <nightlight.skip-this@and-this.omegapoint.com> wrote: > Can you email me a more detailed list of > changes (or headers etc) and fixes so I > can fix the copy on the web site? I'm not sure that it's actually working, but I'll email you a diff in two (formats machine readable and human readable in case the tools you have can't handle the former). I haven't implemented kbhit(), so that bit isn't working yet. is it used inside any timing loops or can I do an inefficient implementation. > Microsecond timer ought to be Ok (the windows > hi res is about 1/4 microsecond). Those test > loops need to be decoupled to measure separately > encode and decode times, and do the correctness > check outside of that, reseting the generator > to run 3 times with the same seed (it already > has a spare copy of original table & function > to do it). That way, even the time() would allow > accurate speed with enough iterations. You didn't consider wasting a little memory by decoding into a third buffer? > > what is all that output supposed to mean, > > or more to the point, what do I do to get > > efficiency statistics, elapsed time, > > compressed size, that sort of stuff. > > The readme.txt explains the main information commands, not in terms that I understand. > Adding a modeling engine to do the general file > compression or such is one of the items that will > get in eventually. I feel that it would make it easier to understand, but I haven't tried to uderstand the code much more than to get it to compile. > I think that in few years this > algorithm and its ofshoots will be running in > every portable device and serve as the bitmap > index and keyword incidence map coder in all the > search engines and data-warehouses. At the moment, > though, the number people who share this view > fits in about 2.3219... bits. :) Bye. Jasen

1/8/2006 4:23:27 AM

David A. Scott wrote:
> "Matt Mahoney" <matmahoney@yahoo.com> wrote in news:1136605351.691283.112110@g47g2000cwa.googlegroups.com:
>
> In your model you use 9 bits for every 8 bits of data where starting bit is a ZERO for each byte and then for EOF you allow the starting bit to be ONE and stop the compressor. This does not allow all the code space to be used.
> As a result of this modeling every compressed file with FPAQ0 has the first bit of the first byte set on output so technically the first bit out is a total waste of space.

Actually all the code space is used because the first bit is 0 for an empty file. Of course it could be improved by making it bijective instead of using 2*log2(n) bits to encode the length in bytes.

-- Matt Mahoney

1/8/2006 5:57:59 PM

nightlight <nightlight.skip-this@and-this.omegapoint.com> wrote in news:HaudnSFNItBFil3eRVn-rg@rcn.net: > In order to clarify which is the case here, why don't > you demonstrate your easy and gapless finite precision > binary AC coding for the finite input sequences produced > by a binary source S3, which satisfy the following > conditions: > > a) Each sequence from S3 is exactly 3 bits long, > > b) each sequence has exactly two bits=0 and one bit=1 and > > c) each sequence is equally likely as any other. > > How does your easy gapless AC coding for all possible > input sequences from S3 look like? Since the inputs > are pretty short, you should be able to show it all > using an AC working in 4 bit precision. But if you > need more, well use more (as long as the bit strings > fit on a 72 char line). > > NOTE 1: The S3 output needs to be coded using a binary > order-0 AC in incrementing or decrementing or static mode, > which codes bit by bit as if the sequences have arbitrary > lengths. Namely, the short sequences are given above > only to allow you to show explicitly the coding, not > to allow switching over to let you construct the small > Huffman tree for the entire input (or some equivalent > ad hoc codes). > > For example, the exactly same coding algorithm should be > demonstrable (e.g. as a little sample exe coder that > will work on a regular PC with, say, 1G of RAM) with > inputs of, say, exactly 30 million bits long with exactly > 10 million bits=1 set. In this case you couldn't code it > via the multi-alphabet AC which treats the entire input > as a single character of a gigantic alphabet. Hence, > you can't use any such methods for S3 either. So, > just show it for a plain bitwise binary AC. > > NOTE 2: You can't feed your AC input sequences with > 0, 2 or 3 ones in order to fill the gaps and declare > them as "used". The S3 specified does not produce > any such sequences. 
Nightlight, here is what I am willing to do so others, and maybe even you, can see the difference. This is basically your example, but as usual you are not clear enough. I would like anybody to be able to test the code.

1) So first of all, the input for the compressor has to be files of ASCII characters in multiples of 3, namely 100 or 010 or 001. That way the file will be exactly a multiple of 3 bytes where only those 3 combinations are allowed. So the file can be as short as 3 bytes or millions of bytes long, but always a multiple of 3.

2) I will map that bijectively to a packed binary file. You really don't need the format here, but it's for the arithmetic coder. I don't wish to mod arb255 very much, so that it is easy for you and others to follow the changes I put in.

3) The bijective arithmetic coder will compress with no gaps using fixed and unchanging weights as close to 1/3 as it can get. (One could design a custom way, but hey, powers of 2 are good enough.) The output will be a binary file where each of the 3 sequences maps to roughly 1.5849625 bits per use.

4) The reverse is also true. Take any file and do the reverse of 3, 2 and 1 above and you get a unique sequence of type 1, such that if X is an ASCII file as in the beginning, then steps 2 and 3 would be compression and the reverse of 3 and the reverse of 2 would be uncompression. If Y is any file, period, then:

compress( uncompress (Y)) = Y
and
uncompress( compress(X)) = X

Even though this is your example, I'm not sure you can do this with your method. It's not even ideal for the arithmetic coder, but even so it would be trivial to mod arb255 to do this.

David A. Scott
--
My Crypto code http://bijective.dogma.net/crypto/scott19u.zip http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged. As a famous person once said "any cryptographic system is only as strong as its weakest link"

1/8/2006 9:20:37 PM

First, few minor problems with your answer: > The starting range is [0..256) ... > Therefore, the range is subdivided as follows: > [0..170) for a '0', and [170..255) for a '1'. ... You start with full "range" [0..256), but then you switch the endpoints from there on to [*..255). You also meld together two mutually exclusive notations, the Pascal range A..B with the semi-open interval math notation [A,B) for a "range". The Pascal "range" A..B includes into the interval both points A and B, while the semi-open interval [A,B) doesn't include B. That notational meld of two mutually exclusive prescriptions was probably what led to your endpoint mixup 256 vs. 255. The fact that 255 is divisible by 3, while 256 is not gave it an extra wishful nudge to go with the 255. I will switch to a single coherent notation, [A,B), hence use e.g. [*,256). > There is a coding gap caused by having to terminate > the encoder, but that is a fixed cost of two or three > bits, not a loss at each step as you have claimed. The AC doesn't normally build the complete, ready to go output code on each step. Part of the output code is implicit in the coder's state variables and the implementation conventions. If at any point the coder is told to wrap it up and provide output code, the excess of 1-2 bits on that total output code will be in that output in the case of an infinite precision AC coder (ACX). This excess will be 2-3 bits in the case of a finite precision AC coder (ACF). Hence, these 1-2 or 2-3 excess bits on the total output are part of the AC output throughout -- they are only represented differently at different stages (implicit in the algorithm conventions and the internal state variables until the last step, at which point the output code creation is finalized and they become explicitly part of that output). Hence, at best your argument above is vacuous (arguing about ambiguities in verbal conventions), and at worst it is incorrect. I will grant you the benefit of the best case. 
Well, on the positive side, at least you recognize that these excess bits result in a "coding gap", which they do, of course.

There is an additional and unrelated misconception revealed in the last part of that sentence:

> ... but that is a fixed cost of two or three bits, not a loss at each step as you have claimed.

An infinite number of additive contributions can easily produce a sum with fixed upper bound, or within some fixed range, such as 1-2 excess for ACX or 2-3 for ACF. For example 1 + 1/3 + 1/3^2 +... = 1.5 and the partial sums after 2 terms are between 1.33 and 1.5. Hence, that the cost is "fixed" (or rather the cost interval is fixed) gives you no information on how many steps contributed to the cost.

> Furthermore, this termination cost can be avoided as well, with careful handling.

Not in the general AC coding setup. In special cases, e.g. when the number of symbols is prescribed and known upfront to encoder and decoder, you can avoid at least part of it. In a general AC setup, without any such "side information" provided, the 1-2 bit excess for ACX (or 2-3 for ACF) is a part of the standard ACX code length (needed to select unambiguously the final codeword interval):

    L = ceiling(log(1/Pc)) + 1 .... (1)

whole bits (see [41a], pp. 13-14), where Pc is the 'coding probability' (the probabilities you compute with in AC) of the complete message. For a stationary order-0 AC model, Pc is simply p^k * (1-p)^(n-k), where p=AC's probability of 1's, k=count of 1's, n=count of all symbols. Hence, the upper bound for ACX redundancy is 2 bits (unlike the Huffman or exact EC code, where the +1 is absent and where the upper bound is 1, resulting solely from rounding up of the log(1/Pc) to the next whole bit).

Now to the fundamental problem of your answer -- the complete and hopeless conceptual meld of the three distinct concepts of "interval" relevant in arithmetic coding.
Hence, before we can head toward any approximation of a coherent discussion, we need a finer resolution view of these three kinds of "intervals":

1. IVF ==> the coder's internal variables in finite precision specifying finite precision intervals (these are your [0,85), [85,170),... etc. variables).

2. IVX ==> the coder's internal variable specifying an infinite precision interval (this type is implicit in your description, when you are referring to your intervals as approximate). While one can't store an infinite precision binary fraction in an internal variable, one can easily store an equivalent rational fraction with unlimited precision integers for numerator and denominator (e.g. as done in [41a] pp. 13-14, 46).

3. CWI ==> Codeword interval - this is the interval defined by the AC codeword (the explicit codeword ready for output or the implicit codeword which could be produced at each step).

The AC codeword, implicit or explicit, is a binary string C of finite length L (for ACX L is given via (1)). The usual AC convention is to interpret C as representing fractional binary digits of some rational number Z, defined as: Z = 0.C. For example, if the AC codeword is C=1010 then L=4, Z=0.1010 = 1/2+0/4+1/8+0/16 = 5/8 = 0.625.

The mapping of codewords C to intervals CWI is used more generally than just for AC analysis, e.g. in the Kraft inequality proofs (cf. Cover & Thomas IT textbook [24]). The common convention for this mapping is as follows ([24] eq. (24) which I use below as (2); also [41a] pp. 13-14 & eq. (1.7)):

    CWI = [Z, Z+G) .... (2)

where the "granularity" G (=the interval length) is defined as:

    G = 1/2^L .... (3)

Some examples of codewords C and their corresponding intervals CWI are shown below:

    C   ... CWI
    -------------------------
    0   ... [0, 0.5)
    1   ... [0.5, 1)
    00  ... [0, 0.25)
    01  ... [0.25, 0.5)
    10  ... [0.5, 0.75)
    11  ... [0.75, 1)
    ... etc.
    -------------------------

The soft & fuzzy pedagogical descriptions of the AC encoding algorithm usually describe construction of the nested IVX or IVF intervals from the input sequence, then state that the coder transmits binary digits of some point Z from within the final IVX or IVF interval as a specification for that interval and they'll prescribe that the number of digits should be given via (1) (sometimes omitting the +1 term). But, the point Z given with L fractional bits C, can't all by itself specify even the IVF, let alone IVX. The only interval that Z can specify unambiguously, and only if further supplemented with a precise convention, such as eqs. (2) and (3), is the interval CWI.

The higher resolution descriptions (such as [41a] pp. 13-14, 41-48) explain and derive code length (1) -- its origin is in the requirement (in CWI terms used above) that the interval CWI has to fit within the coder's final "internal variable" interval IVX or IVF (cf. eq. (1.7) in [41a]).

We can look now at a set {C} = {C1,C2,...} containing all possible codewords that an AC can produce from a given source (this may be an infinite set). The corresponding intervals CWI(C1), CWI(C2),... form a set {CWI(C)} of non-overlapping intervals (consequence of the AC output decodability). Since all CWI intervals fit within [0, 1), the non-overlapping property implies that their lengths G(C1), G(C2),... can add up to at most 1, i.e.

    G(C1)+G(C2)+G(C3)+... =< 1 ... (4)

The eq. (4), recalling the definition (3) of G's, is the Kraft inequality which must hold for any uniquely decodable codes (cf. [24] p. 90, McMillan theorem), including our set {C}.

The concept of "coding gaps" (in code space or in code intervals), which is our topic here, is normally used in the context of Kraft inequality (4) as follows: The codes for which (4) holds as the exact equality are called "compact" codes and such codes don't have "coding gaps".
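Eqs. (2)-(4) are easy to evaluate exactly for short codewords. The sketch below is only an illustration under stated assumptions: codewords are ASCII strings of at most 32 binary digits, Z and G are stored as integers scaled by 2^32 so no floating point is needed, and `cwi` and `kraft_sum` are hypothetical helper names, not part of any coder discussed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCALE_BITS 32   /* Z and G are stored as multiples of 2^-32 */

/* Eqs. (2)-(3): map codeword C to CWI = [Z, Z+G), Z = 0.C, G = 1/2^L.
   *lo receives Z * 2^32 and *len receives G * 2^32. */
void cwi(const char *c, uint64_t *lo, uint64_t *len)
{
    size_t L = strlen(c);
    assert(L >= 1 && L <= SCALE_BITS);
    uint64_t z = 0;
    for (size_t i = 0; i < L; i++) {
        assert(c[i] == '0' || c[i] == '1');
        z = (z << 1) | (uint64_t)(c[i] - '0');
    }
    *len = UINT64_C(1) << (SCALE_BITS - L);
    *lo  = z * *len;
}

/* Left side of eq. (4), scaled by 2^32: equals 2^32 exactly for a
   compact code and is smaller when the code leaves gaps. */
uint64_t kraft_sum(const char *const *codes, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t lo, len;
        cwi(codes[i], &lo, &len);
        sum += len;
    }
    return sum;
}
```

For C=1010 this gives Z = 0.625 and G = 1/16, matching the worked example above. The code {1, 01, 00} sums to exactly 1 and is compact, while {1, 01} sums to 0.75, leaving a quarter of [0,1) unused.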
The codes for which (4) holds only as the "<" part of the "=<" inequality have "coding space gaps" or "coding gaps" (unused codes). The term "interval gap" simply refers to these same gaps in terms of the CWI intervals whose lengths G(C1), G(C2)... are used in (4).

Even in the low res analysis you apparently did manage somehow to catch a vague glimpse, however briefly, of what these gaps refer to, since you said:

> There is a coding gap caused by having to terminate the encoder, but that is a fixed cost of two or three bits, ...

Clearly, you are realizing above that adding 2 or 3 bits as a termination cost to the codewords does create gaps somehow and somewhere. But since that didn't seem to fit into your conceptual meld in which all three interval types are just the "interval" and where you can make your internal variables interval IVF fit together as tightly as you wish, you couldn't quite see how could the "interval^3" have this gap.

Now, with the three higher res interval concepts in front of you, you can look at (3) and see that adding 2 or 3 to L will make the corresponding CWI interval length G smaller by 4 to 8 times, hence a pretty big gap will be opened in the CWI coverage (4) of the interval [0,1) (since 75-87 percent of that CWI will turn into the unused code space, the interval gap).

But seeing it in low res back then, you concluded:

>... but that is a fixed cost of two or three bits, not a loss at each step as you have claimed.

Now look at (1) which shows how the L is computed -- the smaller the probability Pc of the whole symbol sequence, the larger the L in (1) will be. But Pc in (1) is the coding probability for the particular sequence e.g. for order 0 model (as we used), Pc is a series of truncated products of probabilities p & q of individual symbols encountered. Now what happens if you have the actual p=1/3, which is an infinite binary fraction 1/3=0.01010101... and use instead p'=0.010 for AC coding?
As you probably know from basic coding theory, the greater the deviation of p' from the actual probability p, the longer the average codeword length L. In turn, the increase in the avg. L shortens the average CWI interval length G via (3), contributing to the increase in the CWI interval gap (which is the shortfall from 1.0 in the Kraft inequality (4)). Note also that the average increase in L is the result of the accumulation of these contributions along the way, from the deviations of the coding probability from the actual probability at every single coding step.

Exactly the same coding gaps happen with the QI coder due to its finite precision. Since QI's construction of interval lengths (eq. (21) p. 8 in [T3]) is bottom up, the accumulation is much more transparent here. You can even see it as it accumulates step-by-step using the QI.exe program, option "cbr" (to show the complete binomial row; set also option n50, since the default is n=1024, which would scroll off). The QI errors are due to rounding up to the g (g=32) bit SWI mantissa in (21). The "cbr" command shows all quantized SWI binomials in row n, with an asterisk next to those which were rounded up when computed via recurrence (21), which here, for the binary coder, is simply the Pascal triangle recurrence: C(n,k) = C(n-1,k) + C(n-1,k-1). The columns labeled dm16 and dm10 show how much these increments of the mantissa have accumulated over all such adds (for all prior n). The "Extra Bits" column shows how much these dm contribute to excess bits (usually 1/10^9). If the mantissa were shorter, or if n were very large, these accumulated increments would eventually cause the mantissa to overflow, which would then increase the SWI exponent by 1, hence lengthening the output by 1 bit.

Unlike QI, AC uses multiplicative binomial recurrences, described in the earlier post [P1], eqs. (1), (2).
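The bottom-up accumulation of rounding-up errors in the Pascal triangle recurrence can be reproduced in a few lines. This is only an illustrative sketch, not the QI source: it keeps the quantized binomials in plain 64-bit integers instead of an SWI mantissa/exponent pair, and NMAX, bit_length, round_up_g and binomial_rows are hypothetical names chosen for the example. It assumes 0 <= n <= NMAX and g >= 4 so that nothing overflows 64 bits.

```c
#include <stdint.h>

#define NMAX 50   /* exact C(50,25) is about 1.26e14, well inside 64 bits */

/* Number of significant bits in v (0 for v == 0). */
static int bit_length(uint64_t v)
{
    int n = 0;
    while (v) { n++; v >>= 1; }
    return n;
}

/* Round v up to the nearest value that has at most g significant bits,
 * i.e. a g-bit mantissa times a power of two. */
uint64_t round_up_g(uint64_t v, int g)
{
    int shift = bit_length(v) - g;
    if (shift <= 0) return v;               /* already fits in g bits */
    uint64_t mask = (1ULL << shift) - 1;
    return (v + mask) & ~mask;              /* a carry here bumps the exponent */
}

/* Build row n of Pascal's triangle twice: exact[] holds the true binomials,
 * quant[] holds binomials rounded up to g bits after every addition. */
void binomial_rows(int n, int g, uint64_t exact[NMAX + 1], uint64_t quant[NMAX + 1])
{
    for (int k = 0; k <= NMAX; k++) exact[k] = quant[k] = 0;
    exact[0] = quant[0] = 1;                /* row 0: C(0,0) = 1 */
    for (int row = 1; row <= n; row++)
        for (int k = row; k >= 1; k--) {    /* in place, right to left */
            exact[k] += exact[k - 1];
            quant[k] = round_up_g(quant[k] + quant[k - 1], g);
        }
}
```

With a deliberately short mantissa (say g = 8) the quantized entries of row 50 come out visibly larger than the exact ones, and log2(quant[k] / exact[k]) is the excess in bits for that entry. With g = 32 the rows stay identical until the binomials themselves exceed 32 bits.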
Even more unlike QI, which does this only during the table construction (the table is universal, the same for all source probabilities, hence it can be premade & loaded from a file when the program starts), AC computes them from scratch on every coding task and for every symbol along the way. The final difference from QI is that AC works top down, from the largest binomial or addend (point B on Fig 1 in [T3], see also [T2] p. 19), which in AC is scaled to 1, and all other addends are smaller than this since they're all divided by this largest binomial (which is at the endpoint of the path, cf. [T2] pp. 19-25 for details on this very close correspondence between the two coders).

Hence, the rounding down arithmetic of AC works by fixing the starting binomial (or an addend generally) to 1 and reducing the size of the lower order ones, resulting at the end in a smaller Pc in (1), thus a longer output L. With QI, which computes from lower to higher binomials, the higher binomials are incremented when rounding (since the SWI rounding in (21) is up). So, while QI's rounding up errors accumulate toward the larger addends, making them larger, the AC's rounding down errors accumulate toward the smaller addends, making them smaller, which it eventually pays for via (1) in 1-2 excess bits for ACX or 2-3 excess bits for ACF (note that for ACF, the excess can go beyond 2-3 if the input gets too long for the frequency counters).

As to your IVF intervals fitting perfectly together, that is irrelevant for the gaps in the CWI intervals (which are the intervals defined by the codeword C, the quantity that is transmitted and to which (4), the gaps, the unused codes & the redundancy apply). IVFs are your internal variables and you're managing them as you wish (e.g. what to do with their fit when you need to drop their digits in finite precision).
If you want to fit the IVFs together, all tight and snug, that's fine and you are free to do so, but that won't make the gaps in the CWI go away or the average L become shorter. For example, you could reduce your AC precision to 4 bits and your avg. excess bits, the unused code space and the CWI interval gaps will balloon (see table 3.4 p. 52 in [41a] for the effects of the 4 bit AC precision), yet all your internal variables IVFs are still happily covering the entire interval [0,1), as snugly as ever, with no gaps in sight.

You're welcome, of course, to write out all the codes AC will produce on that example (under the conditions described, i.e. no special ad hoc code tweaks for the 3 short strings; or just run some AC, in decrementing mode as you did in your description) and show how the CWI intervals for them fit gaplessly or, equivalently, show how these actual AC codes use the entire output code space without "coding gaps" (and, obviously, without contradicting your earlier statement on "coding gaps" due to the 2-3 bit termination overhead).

> I hope to have cleared up your misgivings about Arith
> Coding with this.

Well, I hope yours will clear up, too :)

-- References ( http://www.1stworks.com/ref/RefLib.htm )

41a. P.A.J. Volf "Weighting Techniques In Data Compression: Theory and Algorithms" Ph.D. thesis, Eindhoven University of Technology, Dec 2002
     http://alexandria.tue.nl/extra2/200213835.pdf

24. T.M. Cover, J.A. Thomas "Elements of Information Theory" Wiley 1991

P1. Earlier post on AC multiplicative recurrences:
    http://groups.google.com/group/comp.compression/msg/71847a32daa9a571

1/8/2006 9:27:25 PM

1/8/2006 9:29:19 PM

nightlight <nightlight.skip-this@and-this.omegapoint.com> wrote in news:yIGdndvhfe_GGVzenZ2dnUVZ_tqdnZ2d@rcn.net:

First, some minor problems with your whole line of thought. We can easily code this, so most of your answer has nothing to do with reality. Second, just what you want coded is unclear. I have tried to ask for enough details so others can look at real test results instead of the long rants. Third, are you going to write code that would work on real files for this example, or is that too hard, even though you designed this yourself to make AC look bad and your stuff look good?

> > There is a coding gap caused by having to terminate
> > the encoder, but that is a fixed cost of two or three
> > bits, not a loss at each step as you have claimed.
>
> The AC doesn't normally build the complete, ready to go output code on
> each step. Part of the output code is implicit in the coder's state
> variables and the implementation conventions. If at any point the
> coder is told to wrap it up and provide output code, the excess of 1-2
> bits on that total output code will be in that output in the case of
> an infinite precision AC coder (ACX). This excess will be 2-3 bits in
> the case of a finite precision AC coder (ACF). Hence, these 1-2 or 2-3
> excess bits on the total output are part of the AC output throughout
> -- they are only represented differently at different stages (implicit
> in the algorithm conventions and the internal state variables until
> the last step, at which point the output code creation is finalized
> and they become explicitly part of that output). Hence, at best your
> argument above is vacuous (arguing about ambiguities in verbal
> conventions), and at worst it is incorrect. I will grant you the
> benefit of the best case.
>
> Well, on the positive side, at least you recognize that these excess
> bits result in a "coding gap", which they do, of course.
>

Actually that's not entirely true.
If one writes a full bijective arithmetic file encoder for this and writes out only what is done when compressing, one realizes that each of the 3 possible outputs needs roughly 1.58 bits. So naturally the coder is carrying the excess around till the input terminates. On termination, sometimes you round up and sometimes you round down.

The interesting thing is this. Take the output file from a long set of inputs, but change the last byte of the compressed file. In fact, change it so that you have 256 files, each with a different last value. In turn, each of these files will decompress back to a valid input that, if recompressed, goes back to the same unique compressed file. THERE ARE NO GAPS. Can you do this with your method? I don't think so.

REST OF USELESS RANT CUT

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptographic system is only as strong as its weakest link"

1/8/2006 9:55:55 PM

--- Errata:

The citation:

> (cf. Cover & Thomas IT textbook [24]). The
> common convention for this mapping is as follows
> ([24] eq. (24)

should have in the last line:

([24] eq. 5.11 p. 84

1/8/2006 9:56:53 PM

nightlight wrote:
) First, few minor problems with your answer:
)
) > The starting range is [0..256) ...
) > Therefore, the range is subdivided as follows:
) > [0..170) for a '0', and [170..255) for a '1'. ...
)
) You start with full "range" [0..256), but then you switch
) the endpoints from there on to [*..255). You also meld together
) two mutually exclusive notations, the Pascal range A..B
) with the semi-open interval math notation [A,B) for a "range".

Semantics, and a typo. I trust it was obvious what I meant.

) > There is a coding gap caused by having to terminate
) > the encoder, but that is a fixed cost of two or three
) > bits, not a loss at each step as you have claimed.
)
) The AC doesn't normally build the complete, ready to go output code on
) each step. Part of the output code is implicit in the coder's state
) variables and the implementation conventions. If at any point the coder
) is told to wrap it up and provide output code, the excess of 1-2 bits on
) that total output code will be in that output in the case of an infinite
) precision AC coder (ACX). This excess will be 2-3 bits in the case of a
) finite precision AC coder (ACF). Hence, these 1-2 or 2-3 excess bits on
) the total output are part of the AC output throughout

This does not follow. As I see it, there are no excess bits in an Arith Coder until the coder is told to wrap it up. Can you give some clear arguments why you claim that the bits added by termination are present before termination? To any sane person, the phrase 'a loss at each step' implies that this loss will grow as the number of steps increases.

) Well, on the positive side, at least you recognize that these excess
) bits result in a "coding gap", which they do, of course.

A coding gap of a whopping two bits. If your claim is that AC is worse than QI because of those two bits, I laugh in your face.
I have snipped your whole discussion on those two or three bits because they are irrelevant, and your claim that a cost that does not grow as the number of steps increases is nevertheless incurred at each step is ridiculous and arbitrary.

) Now to the fundamental problem of your answer -- the complete and
) hopeless conceptual meld of the three distinct concepts of "interval"
) relevant in arithmetic coding. Hence, before we can head toward any
) approximation of a coherent discussion, we need a finer resolution view
) of these three kinds of "intervals":

I have snipped your entire discussion below, because in the end it still boils down to the fixed cost of two or three bits of excess.

) You're welcome, of course, to write out all the codes AC will produce on
) that example (under conditions described i.e. no special ad hoc code
) tweaks for the 3 short strings; or just run some AC, in decrementing
) mode as you did in your description) and show how the CWI intervals for
) them fit gaplessly or equivalently, show how these actual AC codes use
) entire output code space without "coding gaps" (and, obviously, without
) contradicting your earlier statement on "coding gaps" due to 2-3 bit
) termination overhead).

As evident from your conclusion. As I said before, I will now laugh in your face. You have written page upon page of explanation, all of which in the end revolved around a fixed coding excess of a few bits.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

1/8/2006 11:18:30 PM

1/9/2006 5:56:27 AM

> You have written page upon page of explanation,
> all of which in the end revolved around a fixed
> coding excess of a few bits.

The topic being discussed in this sub-thread was the difference in the quantization errors and the resulting gaps. For ACX (infinite precision AC), this excess is 1-2 bits in total. For ACF (finite precision AC), the error is 2-3 bits, provided the AC is within its initial bounds for counts (usually 2^24-2^30 or similar). Once it gets beyond them, unlike ACX, the error does grow beyond 2-3 bits, and becomes dependent on how this case is handled.

In any case, this is only one component of the ACF excess, which was the particular subtopic we were discussing. You can make ACF perform at about that level of excess by using decrementing AC (see [34]). In that role, though, while ACF will code nearly as optimally as QI (within these few bits, which grow very slowly with N), all it achieves is to become a slightly less accurate and much slower exact work-alike of QI.

For the regular adaptive or static ACs that one normally finds in practical implementations, there will be an additional redundancy relative to QI which, in the case of order-0 stationary sources, can be derived exactly (as shown in [T2] pp. 20-24). That redundancy is in the leading order 1/2 log(2*pi*npq) for binary (for a general alphabet of size A, there are A-1 such terms which are summed, resulting in approx. 1/2 changing to (A-1)/2). As shown in [T2], this error is due to the approximate enumeration of AC, where the Stirling approximation and the subsequent dropping of the sqrt() & other factors (each being <1) cause the AC addends to increase relative to QI, leading to an excess of O(log(n)) bits on the total output.

This O(log(n)) order redundancy was the dominant effect shown in the test table (p.10 in [T3]) for all but the last row (which illustrates a different effect).
Depending, of course, on what the total output size is (which depends on the input size and the density of 1's), this may amount to a tiny fraction of 1% or to 5-6% for the parameter ranges shown in the table.

Some superficial, low res, short fuse mind will stop right here, concluding: 'a few percent' at best, it seems. What's the big deal? I can do 35% better than jpeg with my new xyz coder for images,... or some such.

To realize what the big deal is, you need to follow the 'white rabbit' a bit further. Since the enumeration excess of O(log(n)), or even the 2-3 bit quantization excess, can be as large as or even larger than the output size (e.g. in the low entropy limit or for short enough inputs of any entropy per symbol), this can be 50% or 100% or _any_ ratio you want, provided you work with exact fractional bit sizes. You can encode to such fractional sizes if you have several items of any kind (which may belong to entirely different sources, different parts of the program or different programs) to encode and all have some fractional bit sizes -- the mixed radix codes allow combining these fractions so that only one whole-bit rounding error is paid for the total. Hence, the potential compression ratio in favor of QI is in principle unlimited.

The two kinds of "small" differences in redundancy are indeed small for the usual things one codes with AC (or Huffman). The reason for that is not that the differences are universally small, but that one doesn't code with AC anything where it would do very poorly. So, the apparent "smallness" is a mere reflection of the narrowing of the domain surveyed to what the AC or Huffman algorithms code reasonably well. But there is a whole realm of unexplored algorithms which would require coding of many separate items, each of which amounts to a tiny bit fraction, and which are ignored because AC won't do well here, due to those "tiny" overheads of O(log(n)) or of fixed 2-3 bits per coding task.
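The mixed radix idea mentioned above (combining several fractional-bit items so that the rounding to whole bits is paid only once) can be sketched as follows. This is a generic textbook construction under the assumption that the product of all radices fits in 64 bits. It is not the mixed radix code from the QI source, and the function names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Pack digits d[i] (0 <= d[i] < radix[i]) into a single integer.
 * Assumption: the product of all radices fits in 64 bits. */
uint64_t mixed_radix_pack(const unsigned *d, const unsigned *radix, size_t count)
{
    uint64_t value = 0;
    for (size_t i = 0; i < count; i++)
        value = value * radix[i] + d[i];
    return value;
}

/* Inverse of mixed_radix_pack: recover the digits, last one first. */
void mixed_radix_unpack(uint64_t value, const unsigned *radix, size_t count, unsigned *d)
{
    for (size_t i = count; i-- > 0; ) {
        d[i] = (unsigned)(value % radix[i]);
        value /= radix[i];
    }
}

/* Whole bits needed to store any value in [0, states). */
unsigned bits_for(uint64_t states)
{
    unsigned bits = 0;
    while (bits < 64 && (1ULL << bits) < states) bits++;
    return bits;
}
```

Five ternary items cost 2 whole bits each when stored separately (10 bits in total), but 3^5 = 243 joint states fit in bits_for(243) = 8 bits, against an ideal of about 7.92 bits.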
Consider, for example, the BW-Transform output column R (Fig 5, p. 34, [T2]). If you look at long enough contexts, the column R will fragment into many small pieces. To account for longer contexts, one would want to select for coding as fine a fragmentation of R as practical. The adaptive AC was found to perform very poorly here, even worse than the various ad hoc schemes that are in use now. { The row "Vary" in the table on p.10 in [T3] illustrates this effect, where on a 4K input, adaptive order-0 AC outputs more than twice as much as the descriptive order-0 QI.}

The general coding for long contexts is not the only area with lots of highly fragmented small inputs. Differential frame updates for video & audio codecs generate similar data as well.

In addition to the domain of highly fragmented data, there is an area where the ultra-low entropy limit effects become significant. These are the database or data-warehouse bitmap indexes and keyword incidence bitmaps for search engines. The big search engines will have such bitmaps of 8+ billion bits (or whatever the latest Google vs Yahoo vs MSN figures are these days). The compression methods in these fields are either ad hoc methods or runlengths, both of which break down on highly clustered and very low average entropy data.

In relation to the search engine & database domains (and a few others), another unique aspect of QI becomes highly relevant:

a) The compressed output size is available without decompressing the data -- for example, you know exactly the size of the binomial coefficient C(n,k) from the count of 1's.

b) For a fixed input entropy rate, QI codes at _precisely_ the fixed number of bits (which is also within log(e)/2^(g-1) bits from the entropy, at precision g and the model used), e.g. all n! permutations of n items will always be encoded into _precisely_ the same size (you can test that with QI.exe or see it in the source).

Huffman and AC don't have these properties.
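Property (a) can be illustrated for the exact (unquantized) enumerative code, where the index of an n-bit string with k ones lies in [0, C(n,k)): the index size follows from n and k alone. The sketch below is limited to n <= 60 so that all intermediate products fit in 64 bits. It does not model QI's quantized binomials, and binomial() and index_bits() are names made up for the example.

```c
#include <stdint.h>

/* Exact binomial C(n,k) for 0 <= k <= n <= 60 (a larger n could overflow
 * the 64-bit intermediate product). Returns 0 when k > n. */
uint64_t binomial(unsigned n, unsigned k)
{
    if (k > n) return 0;
    if (k > n - k) k = n - k;          /* use the symmetry C(n,k) = C(n,n-k) */
    uint64_t c = 1;
    for (unsigned i = 1; i <= k; i++)
        c = c * (n - k + i) / i;       /* c == C(n-k+i, i) after each step */
    return c;
}

/* Whole bits needed for an enumerative index in [0, C(n,k)),
 * known from n and the count of ones k without looking at the data. */
unsigned index_bits(unsigned n, unsigned k)
{
    uint64_t states = binomial(n, k);
    unsigned bits = 0;
    while ((1ULL << bits) < states) bits++;
    return bits;
}
```

For example, any 32-bit word with 16 ones gets an index of index_bits(32, 16) = 30 bits, whichever of the C(32,16) = 601080390 arrangements it happens to be.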
Huffman will have fairly large variations (easily as much as 10-15%) in the output size, AC less than Huffman, but still of the order of O(log(n)) bits even at a fixed input entropy rate. Hence, if one wishes to store the compressed items into a designated space, known in advance, with AC and Huffman one would have to reserve space for the worst case of variation, increasing their redundancy.

Similarly, if one is coding complex data structures, with lots of compressed pieces that need later to be traversed quickly and accessed randomly, it is essential that the sizes of the compressed pieces can be known exactly, without expanding the data, which QI property (a) assures. With Huffman & AC, you either need to pad the space to cover the worst case compressed size or include the size as a separate item. Either way wastes additional space.

Although the above types of setups occur in many fields, the complex data structure aspects are especially relevant when coding and traversing large networks or graphs, the most popular example of these being the web links. But the network modeling paradigm has been gaining ground in recent years in many other fields, such as computational biology, where ever more complex webs of bio-chemical reactions are being constructed on top of the growing DNA maps & genetic databases. Another field with mathematically similar problems is ad hoc routing for wireless networks.

Finally, there is the domain of constrained coding (used for the low level recording media codes and channel coding), where the exact EC, however cumbersome and inefficient in the conventional form, is still sufficiently advantageous over Huffman & AC that they have been using it here since the 1950s (see Immink's work). QI improves speed and memory requirements vs exact EC by a huge factor O(n), so these areas can benefit directly and right away.
In conclusion, the availability of an extremely fast and highly accurate (optimal for any given arithmetic precision) coding algorithm to perform such coding tasks opens the entire realm of unexplored compression algorithms. The compression gains potentially available to such algorithms are not limited at all to the few percent one can see on the subset of coding tasks where Huffman or AC code well (which, to those over-conditioned to seeing only that domain, appears to be all there is to compress).

-- References ( http://www.1stworks.com/ref/RefLib.htm )

T1-T3 are on http://www.1stworks.com/ref/qi.htm

34. J.G. Cleary, I.H. Witten "A Comparison of Enumerative and Adaptive Codes" IEEE Trans. Inform. Theory IT-30 (2), 306-315, 1984
    http://www.1stworks.com/ref/Cleary84Enum.pdf

1/9/2006 1:38:45 PM

nightlight wrote:
) In any case, this is only one component of the ACF excess, which was
) the particular subtopic we were discussing. You can make ACF perform at
) about that level of excess by using decrementing AC (see [34]). In that
) role, though, while ACF will code nearly as optimally as QI (within
) these few bits, which grow very slowly with N), all it achieves is to
) become a slightly less accurate and much slower exact work-alike of QI.
)
) The regular adaptive or static AC's that one normally finds in
) practical implementations, there will be an additional redundancy
) relative to QI which, in case of order-0 stationary sources can be
) derived exactly, as shown in [T2] pp. 20-24). That redundancy is in the
) leading order 1/2 log(2*pi*npq) for binary (for general alphabet of size
) A, there are A-1 such terms which are summed, resulting in approx 1/2
) changing to (A-1)/2). As shown in [T2], this error is due to the
) approximate enumeration of AC, where the Stirling approximation and the
) subsequent dropping of the sqrt() & other factors (each being <1),
) causes AC addends to increase relative to QI, leading to an excess of
) O(log(n)) bits on the total output.

This redundancy is offset by the need to transmit extra bits for the modeling information, which is something you have neatly excluded in your calculations.

) This O(log(n)) order redundancy was the dominant effect shown in the
) test table (p.10 in [T3]) for all but the last row (which illustrates
) different effect). Depending of course on what the total output size is
) (which depends on input size and density of 1's), this may amount to a
) tiny fraction of 1% or to 5-6% for the parameter ranges shown in the
) table.

I assume your tests only counted the actual bits output by the encoder, and not the bits needed for transmitting the model? That makes it an unfair comparison.

SaSW, Willem

1/9/2006 1:57:04 PM

"Matt Mahoney" <matmahoney@yahoo.com> wrote in news:1136681911.708860.197090@g47g2000cwa.googlegroups.com:

> David A. Scott wrote:
>> "Matt Mahoney" <matmahoney@yahoo.com> wrote in
>> news:1136605351.691283.112110@g47g2000cwa.googlegroups.com:
>>
>> In your model you use 9 bits for every 8 bits of data where starting
>> bit is a ZERO for each byte and then for EOF you allow the starting
>> bit to be ONE and stop the compressor. This does not allow all the
>> code space to be used.
>> As a result of this modeling every compressed file with FPAQ0 has
>> the first bit of the first byte set on output so technically the first
>> bit out is a total waste of space.
>
> Actually all the code space is used because the first bit is 0 for an
> empty file.
>

You could leave the empty file empty. And yes, one could think of the code space there as being used when one looks at compression. I guess I was also looking at decompression of a long file and wondering what happens to the rest of the file to be decompressed if what is returned is marked as an EOF: the file being decompressed then has much wasted space, since the following bytes will not be looked at.

> Of course it could be improved by making it bijective instead of using
> 2*log2(n) bits to encode the length in bytes.
>

You don't have to go to that extreme and make it hard for most to follow. You could use just log2(n) bits to encode the length. It still would not be bijective, but it would not complicate the code that much more, and a rough file integrity check could be done during decompression without any additional output to the compressed file.

Matt, if I can get nightlight to commit to coding his example of the 3 symbol types, I would like to play again with fpaq0, to see how much better it can be made with as little change as possible. I like your style, but I don't think I will go to the wall and make it bijective. But the nine bits for each eight can be changed to eight for eight, with a ninth only needed for the last byte.

But it looks like nightlight is all talk and will not even attempt to code his simple example. Maybe he has looked at the example he himself made up and realizes that he can't beat a real arithmetic coder that can be written to do his own example, where QI was supposed to shine and arithmetic to fail.

> -- Matt Mahoney
>

Again, your code really is neat for the other methods. Where this model code is different is that in your other coding you have the length field in front of the file, so you never add this extra cost throughout the file. I think you can add the cost at the back end and save a little extra space with very little change in source code.

David A. Scott

1/9/2006 2:42:10 PM

> This redundancy is offset by the need to transmit extra bits for
> the modeling information, which is something you have neatly
> excluded in your calculations.

Not quite. The Stirling approx. & dropping of the sqrt(n) factor is purely an arithmetic error (which increases the AC addends) that you can avoid in AC only by coding in decrementing mode, in which case both coders have exactly the same parameter encoding. As explained a few messages ago, in some special cases (such as a stationary Bernoulli source) you can avoid the leading order of the sqrt() error via the KT estimator (which is not available generally and, even when available, includes tradeoffs, as explained earlier in this thread). A simple intro to the 2nd order AC redundancy is in ref. [40].

> I assume your tests only counted the actual bits output by the encoder,
> and not the bits needed for transmitting the model ? That makes
> it an unfair comparison.

Of course not. Each coder had to produce an entirely self-contained output, with data sizes and all counts packaged in. AC needed only to encode its input size (which Moffat98 does using AC itself, as an outer loop, called on every 8 bits of payload), while QI had several more items. I already explained that in [T3] p. 9, other posts & the source code package. I coded the counts using a small Huffman code table (prebuilt for the binomial distribution, just a few KB in size for n=1024; I also have hypergeometric distrib. tables, but these were not used for the tests in [T3]), with the total k sent separately (in log n bits, since n is known at that point; the n itself was coded using a two part self-delimiting code: a Huffman code for the prefix specifying the length of the rest & binary for the rest). The total k was used to calculate the running block to block average (which selects the Huffman subtable for that average) and also to extract the exact last block count as the leftover k at that point.
I also coded the leading 16 bits of the mantissa for each output block using the mixed radix codes (these can also use tapered Huffman codes if one doesn't care about a 2-3 percent loss on these 16 bits or about the artificial output size variability for fixed entropy inputs, see items (a) & (b) in the previous post).

The source is out there, along with tips in the readme.txt on how to compare compression effectiveness & speed vs AC or other coders. This is not some secret compression algorithm with magic powers to which no one is allowed access. The math alone should be enough for those familiar enough with the relevant topics to know how the compression will come out, even without any source or tests. You're welcome to experiment with the source and report any aberrations.

1/9/2006 2:53:18 PM

Willem <willem@stack.nl> wrote in news:slrnds4qtg.frs.willem@toad.stack.nl:

> nightlight wrote:
> ) In any case, this is only one component of the ACF excess, which was
> ) the particular subtopic we were discussing. You can make ACF perform at
> ) about that level of excess by using decrementing AC (see [34]). In that
> ) role, though, while ACF will code nearly as optimally as QI (within
> ) these few bits, which grow very slowly with N), all it achieves is to
> ) become a slightly less accurate and much slower exact work-alike of QI.
> )
> ) The regular adaptive or static AC's that one normally finds in
> ) practical implementations, there will be an additional redundancy
> ) relative to QI which, in case of order-0 stationary sources can be
> ) derived exactly, as shown in [T2] pp. 20-24). That redundancy is in the
> ) leading order 1/2 log(2*pi*npq) for binary (for general alphabet of size
> ) A, there are A-1 such terms which are summed, resulting in approx 1/2
> ) changing to (A-1)/2). As shown in [T2], this error is due to the
> ) approximate enumeration of AC, where the Stirling approximation and the
> ) subsequent dropping of the sqrt() & other factors (each being <1),
> ) causes AC addends to increase relative to QI, leading to an excess of
> ) O(log(n)) bits on the total output.
>
> This redundancy is offset by the need to transmit extra bits for the
> modeling information, which is something you have neatly excluded in
> your calculations.
>

Would you really expect him to actually do the example he proposed? This guy seems all talk and no action, even on the very example he proposed to code.

> ) This O(log(n)) order redundancy was the dominant effect shown in the
> ) test table (p.10 in [T3]) for all but the last row (which illustrates
> ) different effect). Depending of course on what the total output size is
> ) (which depends on input size and density of 1's), this may amount to a
> ) tiny fraction of 1% or to 5-6% for the parameter ranges shown in the
> ) table.
>
> I assume your tests only counted the actual bits output by the encoder,
> and not the bits needed for transmitting the model ? That makes it
> an unfair comparison.
>

Of course he tried to make an unfair comparison. He wants his method to look best. However, I am sure I can change my code to do the example he proposed in a bijective way. So go ahead, press him to write code that anyone can check with files of their own. He will not do it, since I don't think he has the ability to follow through even on his own example. He can quote text that he seems not to understand when it comes to arithmetic, but can he write real code for even the simple example he proposed? Don't hold your breath. I think we see his kind here every year. So let's cut to the chase and see if he will actually do anything.

NIGHTLIGHT, where is your code? Let's nail the example down to files, where one has to compress and then has to decompress back, since it's not a real test unless one can go both ways and users of the group can actually test it.

WE ARE WAITING!!!!

>
> SaSW, Willem

David A. Scott

1/9/2006 2:58:29 PM

"nightlight" <nightlight@omegapoint.com> wrote in news:1136818397.999924.283680@z14g2000cwz.googlegroups.com:

> Of course not. Each coder had to produce entirely self-contained
> output, with data sizes and all counts packaged in. AC needed only to
> encode its input size (which Moffat98 does using AC itself, as an outer
> loop, called on every 8 bits of payload), while QI had several more
> items. I already explained that in [T3] p. 9, other posts & the source
> code package. I coded the counts using a small Huffman code table
> (prebuilt for the binomial distribution, just a few KB in size for
> n=1024; I also have hypergeometric distribution tables, but these were
> not used for the tests in [T3]), with the total k sent separately (in
> log n bits, since n is known at that point; n itself was coded using a
> two-part self-delimiting code: a Huffman code for the prefix specifying
> the length of the rest & binary for the rest). The total k was used to
> calculate the running block-to-block average (which selects the Huffman
> subtable for that average) and also to extract the exact last block
> count as the leftover k at that point. I also coded the leading 16 bits
> of mantissa for each output block using the mixed radix codes (these
> can also use tapered Huffman codes if one doesn't care about a 2-3
> percent loss on these 16 bits or about the artificial output size
> variability for fixed entropy inputs, see items (a) & (b) in the
> previous post).

I guess this is your way of saying you can't even code your own example. What are you afraid of? I think you fear losing your own contest.

David A. Scott

1/9/2006 3:02:23 PM

> Discarding part of the range is one way to deal with finite
> precision, for example the carryless rangecoder in ppmd.
> However the various coders in paq6, paq7, fpaq0, etc. do
> not have any gaps in the code space. These are carryless
> binary arithmetic coders with 32 bits precision and 12
> bit representation of probabilities.

I responded to a similar objection from Willem in another post [M1] in more detail, so I will just supplement the differential aspect here, referring to the formulas & refs there.

The AC carry problem, which is due to the usual FIFO coding mode, can be avoided either by using LIFO mode (as in Rissanen's AC-76) or by buffering the whole data and propagating the carry. But the carry redundancy or any explicit gaps in the IVF (see [M1]) intervals are only a part of the quantization errors. The unavoidable fundamental quantization bit excess comes from the reduction in size of Pc (see eq. (1) in [M1]), which is the coding probability of the complete message. This reduction in size of Pc as it gets computed along the string (e.g. by multiplying with the single symbol probabilities along the way) is intrinsic to the truncating arithmetic of AC -- there is no way around it, since one can't just round the product down along one branch and round it up along the alternative branch, due to the Kraft inequality constraint in the current step and all the steps upstream (the parts already encoded).

This exact same effect, which is a purely arithmetic rounding phenomenon, is much more obvious with QI, where the Kraft inequality constraint is given in eq. (20) ([T3] p. 8). The basic QI quantization recurrence is given in eq. (21). Any time the rounding up occurs in (21), the next higher order addends increase slightly (the upper bound on that excess is log(e)/2^(g-1) bits per symbol).

The AC addends are exactly the same addends, except that they are rescaled for each coding task separately, so that the largest addend is always exactly 1 (interpreted in AC as total probability=1, cf. [T2] pp. 19-20, Figs. 3 & 4). All other AC addends are simply the QI addends divided by this largest addend (which is selected individually for each input). Similarly, the AC equivalent of eq. (20), the Kraft inequality constraint, is divided by the largest addend, so it appears as a constraint on the probabilities, but it is still the same constraint. Hence, exactly the same relation between the higher and lower order addends is maintained by AC and QI, where in both cases the quantization enlarges the ratios between the higher order and lower order addends. The difference is that QI quantization, which is bottom-up, is optimal for any given precision, while AC's, which is top-down, is not (this is the same difference as between the bottom-up Huffman codes, which are optimal, and the top-down Shannon-Fano codes, which are not).

As explained in another message I posted today [M2], this particular difference (which is unavoidable for the AC) is small for the tasks for which AC & Huffman are typically used. As explained in [M2] in more detail, that is almost a tautological observation, since AC & Huffman are used only where they do a reasonably good job. But there are coding domains (along with the corresponding largely unexplored higher level algorithms) beyond this one, in which the compression ratios could be arbitrarily large, solely based on these "small" differences (the quantization & enumeration errors; the latter, which are of O(log(n)), are avoidable by AC, see [M2]).

> Being restricted to an order 0 model seems like a
> severe disadvantage. How would you transform a
> context mixing model to order 0?

This was explained in detail in another message, [M3]. The appearance of this limitation (which many people have expressed) is due to viewing it through the AC modeling paradigm (ACMP), especially through the so-called "online" constraint (which has not existed in real life since the Morse telegraph was the main communication technology), where one has to process the entire sequence symbol by symbol and output with only log(n) latency. Also, modeling is a much larger field than calculating probabilities of the next single symbol (note, though, that method (b) in [M3] can handle arbitrary AC models, while still retaining the full QI speed advantage).

-- References ( http://www.1stworks.com/ref/RefLib.htm )

T1-T3 are on http://www.1stworks.com/ref/qi.htm

M1. Post on quantization errors in AC & QI:
    http://groups.google.com/group/comp.compression/msg/b157e1aa25e598d8

M2. Post on the apparent "smallness" of compression differences:
    http://groups.google.com/group/comp.compression/msg/6ebbc078012c215c

M3. Post on QI modeling:
    http://groups.google.com/group/comp.compression/msg/1314ff87da597fad
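The rounding-up effect described in this post can be explored with a small sketch. This is not the code from the QI package and eq. (21) is not reproduced in this thread, so the sketch rests on an assumption: that the quantized addends follow the Pascal-triangle recurrence with each sum rounded up to g significant bits, Q(n,k) = roundup_g(Q(n-1,k) + Q(n-1,k-1)). The names `round_up`, `quantized_binomials` and `worst_excess_per_symbol` are hypothetical.

```python
from math import comb, e, log2

def round_up(x, g):
    """Round integer x up to the nearest integer with at most g significant bits."""
    shift = x.bit_length() - g
    if shift <= 0:
        return x                                   # already fits, nothing to round
    return ((x + (1 << shift) - 1) >> shift) << shift

def quantized_binomials(nmax, g):
    """Assumed recurrence: Pascal's rule with every sum rounded up to g bits."""
    q = [[0] * (nmax + 1) for _ in range(nmax + 1)]
    for n in range(nmax + 1):
        q[n][0] = 1
        for k in range(1, n + 1):
            q[n][k] = round_up(q[n - 1][k] + q[n - 1][k - 1], g)
    return q

def worst_excess_per_symbol(nmax, g):
    """Largest value of (log2 Q(n,k) - log2 C(n,k)) / n over the whole table."""
    q = quantized_binomials(nmax, g)
    worst = 0.0
    for n in range(1, nmax + 1):
        for k in range(n + 1):
            worst = max(worst, (log2(q[n][k]) - log2(comb(n, k))) / n)
    return worst

g, nmax = 8, 64
print("worst excess per symbol:", worst_excess_per_symbol(nmax, g))
print("bound log2(e)/2^(g-1)  :", log2(e) / 2 ** (g - 1))
```

Under this assumed recurrence each rounding step inflates an addend by a relative factor below 2^-(g-1), which is where a per-symbol bound of the form log(e)/2^(g-1) comes from. Rounding only upward also keeps every entry at least as large as the sum of its two predecessors, so the table never undercounts.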

1/9/2006 3:35:29 PM

Hi,

> The AC carry problem which is due to the usual FIFO coding mode can be
> avoided by either using LIFO mode (as Rissanen's AC-76) or buffering
> the whole data and propagating the carry. But the carry redundancy or
> any explicit gaps in the IVF (see [M1]) intervals are only a part of
> the quantization errors.

Ahem, a small correction: you don't have to buffer the whole data; see for example Nelson's or Moffat's implementation. All you need to do is count how many times you have "forgotten" to carry over, and resolve this as soon as the carry is resolved by propagating it through all the "counted" (but never buffered) data. Other methods use "bitstuffing", i.e. they insert a redundant bit to allow the carry to propagate in, but this is completely off-topic here since it pessimises the code, and we're currently in "bit juggling" mode here. (-;

> This was explained in detail in another message, [M3]. The appearance
> of this limitation (which many people have expressed), is due to
> viewing it through the AC modeling paradigm (ACMP), especially through
> the so-called "online" constraint (which doesn't exist in real life,
> since Morse telegraph were the main communication technology), where
> one has to process entire sequence symbol by symbol and output with
> only log(n) latency.

I'm afraid this is pretty much an existing problem. This "online" constraint is often a limitation imposed by the limited memory of telecommunications hardware, and in this context it is called the "latency" of the coding model. Consider for example audio compression over an IP network, using a hardware-based coder. A long memory in the device would mean that the receiver has to wait long (in principle infinitely long) before it might be able to play the sound. That is unacceptable for real-life applications, and this is why folks play the "bitstuffing" game I mentioned above, even though it reduces the coder performance.

Thus, there *are* definitely areas where one wants to have an online algorithm. However, I would not consider this problem on-topic here right now. I don't think that this latency question is relevant for your work right now; getting past AC is interesting enough by itself, be it on-line or off-line.

So long,
Thomas

1/9/2006 5:27:27 PM

> Consider for example an audio compression over an IP network, using
> a hardware based coder. A long memory in the device would mean that
> the receiver has to wait long (in principle infinitely long)
> before it might be able to play the sound. Unacceptable for real-life
> applications, and this is why folks play the "bitstuffing" game I mentioned
> above, even though it reduces the coder performance.

I did a few of these, all mods of existing audio codecs, for our current communication product ( http://www.hotComm.com ). Even here, though, take the most basic low quality voice with 8000 samples/sec, and take low precision samples of just 1 byte per sample (this is not what we normally use). The segments we typically get are 20 ms of audio data. (Note that one could easily go to 60-80 ms blocks without a human listener noticing any difference.) With 20 ms, 1 byte/sample and 8 samples/ms, you get a block of data to encode that is 20 x 8 x 8 = 1280 bits long. That is three orders of magnitude larger than the "online" constraint in the AC modeling/coding paradigm lingo. Any higher quality, and you are several times larger. Another real-time app which doesn't tolerate latency would be a video codec, and even the differential frame data is almost two orders of magnitude larger than audio.

Although the entropy coder won't get the raw samples but some outputs from the transforms & filtering, the above was reduced in all parameters to the bare minimum, so in a more realistic case one does get at least as much data even for the entropy coder. Even assuming you can have a few hundred symbols at a time vs. just a single symbol (or just a few, for AC latency) adds a great deal of flexibility and opens space for new algorithms, for the modeling engine and the coder, as BWT (or the so-called "offline" dictionary methods) illustrates for the modeler, or QI for the coder.

> Ehem, small correction: You don't have to buffer the whole data, see
> for example Nelson's or Moffat's implementation. All you need to do is
> to count how many times you have "forgotten" to carry over, and
> resolve this as soon as the carry is resolved by propagating it
> thru all the "counted" (but never buffered) data.

I think that depends on the constraints of the task. The AC is adding to a common sum numbers which decrease in size. Hence, there is no way, even in principle, to send data out from the higher digits if they are large enough for the carry to propagate, let the decoder decode it incorrectly (and possibly take actions based on it), and then issue a carry signal to undo that decode. That is what I meant above -- the encoder either has to keep all the data that can propagate a carry in the worst case, or use blocking mechanisms (which add redundancy), or use LIFO mode (where everything is held until the whole buffer is encoded).

Note also that one pays the cost of FIFO coding not just in cases when one has to propagate a carry, or in small extra redundancy: the coder is also burdened with checking & branching for such cases inside its coding loop, so there is a coding speed penalty from the AC "online" constraint, with no practical benefit at all.

1/9/2006 6:10:43 PM

>> the probability distribution of the symbols is different,
>> then how does QI coder adapt itself to all those different distributions ?
>
> I dont know the answer to this one.

There is a more detailed answer to this question in another post [M1], see section (b), which describes QI-style coding using an AC modeling engine; it retains the QI speed advantage while coding at the compression efficiency of the AC. That section fleshes out in more detail a shorter comment about this method from note N7 on p. 9 in [T3]. A much more detailed description of EC/QI modeling is given in [T2], pp. 26-35.

> As for nightlights's comments on QI's speed, I am afraid
> that as the modelling scheme for QI is different from
> modelling scheme for ArithCoding, we will need to compare
> speed of "QI+its modelling code" with "AC+its modelling code".
> Where both models should be of same order, or chosen to give
> same compression ratio. (My doubt here is "what if QI just
> *shifts* computation burden to modelling code instead
> of reducing it".)

Well, you already have results for QI+Model and AC+Model for the simplest case of modeling, order-0 adaptive AC vs. order-0 "adaptive" QI (where QI simply counts the bits and selects whether to use 0-based or 1-based addends, as shown in the source). The AC uses its probability adaptation algorithm, which is quite a bit more work (since it counts bits as well).

You can also look at [M1], method (b), and see that any model made for AC, which feeds probabilities to the coder, can be used with QI and coded with the full coding speed advantage. In that case the modeling engine is left as is, hence there is _no cost shifting_. The QI simply codes faster because it has a much better division of labor within the coder itself than AC (see [T2] pp. 19-21).

You can see that by comparing the AC coding loop with the QI coding loop. For example, the computation of all enumerative addends is done separately, by specialized code which does only that and saves the results in a universal table (e.g. quantized binomials for a Bernoulli source, factorials for permutations, powers & products for mixed radix codes). The tables are universal in the sense that they encapsulate the generic combinatorial properties of symbol sequences; hence, no matter what the source probabilities are, a single table is needed. The AC algorithm, due to an intricate entanglement of source probabilities with its coding arithmetic, cannot have such universal tables (see [T2] pp. 19-21) -- it would need a separate table for each source probability. Instead, AC's coding loop computes essentially the same addends (rescaled so that the max addend is always 1, see [T2] p. 19) using multiplicative recurrences for the addends. In the binary order-0 case, these are simply the binomial coefficient C(n,k) recurrences:

   C(n,k) = C(n+1,k+1) * (k+1)/(n+1) ...... (1)

when symbol 1 is encoded and:

   C(n,k+1) = C(n+1,k+1) * (n-k)/(n+1) ...... (2)

when symbol 0 is encoded. The factor p=(k+1)/(n+1) in (1), which is the ratio of the remaining count of ones, (k+1), and the total remaining symbols, (n+1), is interpreted within AC as the probability of ones, and the factor q=(n-k)/(n+1) in (2) as the probability of zeros at that same place.

The AC also uses a common scaling factor, where all C(n,k) in (1) & (2) are divided by C(N,K), where N=total bits and K=total 1's. This division by C(N,K) is implicit in AC, i.e. AC starts coding with what in QI space is the path endpoint (N,K), and sets its addend value to 1 (interpreted in AC as the starting probability of an empty message). Then, using (1) & (2), it computes the next lower coefficient, C(N-1,K-1) if 1 is encountered first or C(N-1,K) if 0 is encountered first; thus it doesn't need to know explicitly the absolute value of C(N,K).
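The two recurrences can be checked with exact rational arithmetic. The sketch below is an illustration only: it walks from the endpoint (N,K) down the path of a given bit string, multiplying by p when a 1 is coded and by q when a 0 is coded, and confirms that the running value always equals C(n,k)/C(N,K). It indexes the state by the counts remaining before each step, so p = k/n and q = (n-k)/n here correspond to (k+1)/(n+1) and (n-k)/(n+1) in the post. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def rescaled_addends(bits):
    """Return [(n, k, a)] after each coded bit, where a is the rescaled addend
    obtained purely by multiplying with p or q at every step."""
    n, k = len(bits), sum(bits)        # start at the path endpoint (N, K)
    a = Fraction(1)                    # the addend at (N, K) is set to 1
    path = []
    for bit in bits:
        if bit:                        # symbol 1: multiply by p = k/n
            a *= Fraction(k, n)
            n, k = n - 1, k - 1
        else:                          # symbol 0: multiply by q = (n-k)/n
            a *= Fraction(n - k, n)
            n -= 1
        path.append((n, k, a))
    return path

bits = [1, 0, 0, 1, 0, 1, 1, 0]
N, K = len(bits), sum(bits)
for n, k, a in rescaled_addends(bits):
    # every intermediate value is the binomial C(n,k) divided by C(N,K)
    assert a == Fraction(comb(n, k), comb(N, K))
    print(f"(n={n}, k={k})  addend = {a}")
```

The final value is 1/C(N,K), the probability this model assigns to the whole string, and the walk never needed C(N,K) itself.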
Whenever it encounters 1 (the less frequent symbol by convention), it adds the current addend (the interval belonging to 0) to the index (which is the same as what QI does with its integer addends). The main difference is that AC needs to do the multiplications in (1) or (2), and do them on every symbol. QI already has all the addends in its table and it doesn't have to do anything for the most frequent symbol.

As you can see (or also check in the source), at the coder level itself, QI has the labor divided much more cleanly -- all calculations of the universal properties of the symbol sequences are outside of its coding loop. They're done once and for all. The only thing done in the QI coding loop is specific to the given sequence: the particular placement of 1's for that sequence. The AC carries all the calculations, for the universal and for the particular sequence properties, in its innermost coding loop, since the two are irreversibly entangled in the AC scheme.

The same distortions in the division of labor propagate outward as you wrap more layers around the coder, forcing the modeling engine to deform its division of labor around the inefficient logistics inside (see more discussion on this point in [T2] pp. 30-31).

The QI modeling is much simpler. The coder does not force the modeler to convert all it knows, or what it can find out about the sequence, into the probabilities of the next symbol, as AC does with its modeling engine. Since QI doesn't need probabilities to perform coding (but, as shown in [M1], part (b), it can use them just fine if that is what the modeler outputs), the QI modeling engine is less constrained and freer to choose from a much larger space of possible algorithms. In essence, what QI's modeler has to do for QI is to tell it which messages are equiprobable with which other messages (so they can be enumerated in the same enumerative class). The QI coder doesn't care what the actual probabilities of the messages are, but only whether P(M1)==P(M2), which is a much lighter load on the modeler than asking it for the value of P(M1) and the value of P(M2), as AC does.

Another essential difference between the two modeling schemes is that QI's modeler is trying to describe the sequence, while AC's modeler is trying to predict the next symbol in the sequence (i.e. calculate all the possible odds for the next symbol). Paraphrasing Yogi Berra, it is much easier to predict the past than to predict the future. Hence, QI's modeler does have a much easier job at the fundamental level.

The general QI/EC modeling pattern is described in [T2] p. 26, with details fleshed out on pp. 27-35.

-- References ( http://www.1stworks.com/ref/RefLib.htm )

QI source & tech. reports: http://www.1stworks.com/ref/qi.htm

M1. Post on QI modeling:
    http://groups.google.com/group/comp.compression/msg/1314ff87da597fad

T3. R.V. Tomic "Quantized Indexing: Beyond Arithmetic Coding"
    arXiv cs.IT/0511057, 10p, Nov 2005, 1stWorks TR05-1115
    http://arxiv.org/abs/cs.IT/0511057

T2. R.V. Tomic "Quantized indexing: Background information"
    1stWorks TR05-0625, 39p, Jun 2005
    http://www.1stworks.com/ref/TR/tr05-0625a.pdf
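The division of labor described in this post (a table built once, and a coding loop that only looks up one addend per 1 bit) can be sketched as follows. This is a loose Python port of the exact `enc32`/`dec32` routines posted earlier in the thread, not the QI coder itself: it uses exact, unquantized binomials, and the names `build_table`, `enc` and `dec` are made up for this example.

```python
from math import comb

def build_table(nmax):
    """Universal table of binomials C(n,k), built once by Pascal's rule."""
    t = [[0] * (nmax + 1) for _ in range(nmax + 1)]
    for n in range(nmax + 1):
        t[n][0] = 1
        for k in range(1, n + 1):
            t[n][k] = t[n - 1][k - 1] + t[n - 1][k]
    return t

def enc(x, table):
    """Index of bit string x among all strings with the same number of ones."""
    index, k, n = 0, 1, 0
    while x:
        if x & 1:                      # work is done only for the 1 bits
            index += table[n][k]       # add C(position, ordinal of this one)
            k += 1
        x >>= 1
        n += 1
    return index

def dec(index, n, k, table):
    """Rebuild the n-bit string with k ones from its index."""
    x = 0
    for pos in range(n - 1, -1, -1):   # scan positions from the top down
        if k and index >= table[pos][k]:
            index -= table[pos][k]
            x |= 1 << pos
            k -= 1
    return x

table = build_table(8)
x = 0b10010110
i = enc(x, table)
print(f"index of {x:08b} is {i} out of {comb(8, 4)}")
print(f"decoded back: {dec(i, 8, 4, table):08b}")
```

As in the original C routines, the decoder has to be told n and k separately; the index alone only identifies a string within its class of equal counts.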

1/9/2006 9:37:26 PM

David A. Scott wrote:
> Matt if I can get nightlight to commit to coding his example of
> the 3 symbol types. I would like to play again with fpaq0. To see
> how much better it can be made with as little change as possible.
> I like your style but I don't think I will go to the wall and make
> it bijective. But the nine times for each eight can be changed to
> eight for eight with a ninth only needed for the last byte.

There is some room for improvement. I tried compressing 10,000,000 bytes of random characters A, B, C. fpaq0 compresses it to 1,982,988 bytes. The theoretical limit is 10,000,000 x (lg 3)/8 = 1,981,203 bytes, a difference of 1785 bytes. For 1,000,000 bytes it compresses to 198,322 bytes, a difference of 201.7 bytes.

-- Matt Mahoney
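The limit quoted here is just lg 3 bits per symbol converted to bytes, and the arithmetic can be reproduced in a few lines. The helper name below is made up for this example; the compressed sizes are the ones reported in the post.

```python
from math import log2

def limit_bytes(n_symbols, alphabet_size=3):
    """Entropy bound in bytes for n equiprobable symbols from the given alphabet."""
    return n_symbols * log2(alphabet_size) / 8

for n, compressed in ((10_000_000, 1_982_988), (1_000_000, 198_322)):
    bound = limit_bytes(n)
    print(f"n={n:>10,}  bound={bound:,.1f} bytes  fpaq0 excess={compressed - bound:,.1f} bytes")
```

This bound assumes the coder already knows that only three symbols occur and that they are equally likely.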

1/10/2006 1:17:52 AM

Matt Mahoney wrote:
> There is some room for improvement. I tried compressing 10,000,000
> bytes of random characters A, B, C. fpaq0 compresses it to 1,982,988
> bytes. The theoretical limit is 10,000,000 x (lg 3)/8 = 1,981,203
> bytes, a difference of 1785 bytes. For 1,000,000 bytes it compresses
> to 198,322 bytes, a difference of 201.7 bytes.

Funny test. I repeated it quickly with 10 compressors already installed on my PC:

Input (random A,B,C):
0.97 MB (1,024,000 bytes)

Output (compressed):
198 KB (203,401 bytes) PAQ 7
199 KB (204,731 bytes) WinRK beta 3.0 build 2
200 KB (205,119 bytes) WinUHA 2.0
202 KB (207,127 bytes) SBC 0.970
202 KB (207,355 bytes) Slim 0.021
206 KB (211,485 bytes) WinRAR 3.51
206 KB (211,632 bytes) Stuffit 9.0.0.21
216 KB (222,042 bytes) 7-ZIP 4.32
229 KB (234,886 bytes) WinZip 9.0
231 KB (237,390 bytes) BZIP2 1.0.2

1/10/2006 2:46:34 AM

"Matt Mahoney" <matmahoney@yahoo.com> wrote in news:1136855872.776041.80650@g44g2000cwa.googlegroups.com:

> David A. Scott wrote:
>> Matt if I can get nightlight to commit to coding his example of
>> the 3 symbol types. I would like to play again with fpaq0. To see
>> how much better it can be made with as little change as possible.
>> I like your style but I don't think I will go to the wall and make
>> it bijective. But the nine times for each eight can be changed to
>> eight for eight with a ninth only needed for the last byte.
>
> There is some room for improvement. I tried compressing 10,000,000
> bytes of random characters A, B, C. fpaq0 compresses it to 1,982,988
> bytes. The theoretical limit is 10,000,000 x (lg 3)/8 = 1,981,203
> bytes, a difference of 1785 bytes. For 1,000,000 bytes it compresses
> to 198,322 bytes, a difference of 201.7 bytes.
>
> -- Matt Mahoney

That's interesting, but I suspect that even though arb255.exe would compress to a much smaller size, it would not hit the limit yet, since raw arb255 is for all 256 possible symbols instead of 3. Yet I suspect it will be much closer to the limit. Do you have a place where I can get some of the zipped test files, so I can test arb255.exe? I think you would be surprised at the difference.

Your code does not use the full count; you have a limit at 65534, so if the data is mixed well you start to approach what I am assuming are the numbers you got. However, with your code, if there are roughly equal numbers of A, B and C, the output will be smaller than what you are calculating as the theoretical limit if all the A's are followed by all the B's followed by all the C's.

My code, since it's a more true arithmetic coder, does not have these reset points. So you should get roughly the same length, within 2 bytes, no matter how the bytes occur. My code does sort of what QI promises as far as compression amount goes, but with real files.

Both your code and mine are basically using the same tree. Yours is laid out as 512 cells and mine is laid out as 255, but you are really only using half the cells, since the first bit is a constant except at the EOF.

David A. Scott

1/10/2006 2:46:42 AM

"Sportman" <sportman@gmail.com> wrote in news:1136861194.808260.184880@f14g2000cwb.googlegroups.com:

> Matt Mahoney wrote:
>> There is some room for improvement. I tried compressing 10,000,000
>> bytes of random characters A, B, C. fpaq0 compresses it to 1,982,988
>> bytes. The theoretical limit is 10,000,000 x (lg 3)/8 = 1,981,203
>> bytes, a difference of 1785 bytes. For 1,000,000 bytes it compresses
>> to 198,322 bytes, a difference of 201.7 bytes.
>
> Funny test I repeated it quickly with 10 compressors already located at
> my PC:
>
> Input (random A,B,C):
> 0.97 MB (1,024,000 bytes)
>
> Output (compressed):
> 198 KB (203,401 bytes) PAQ 7
> 199 KB (204,731 bytes) WinRK beta 3.0 build 2
> 200 KB (205,119 bytes) WinUHA 2.0
> 202 KB (207,127 bytes) SBC 0.970
> 202 KB (207,355 bytes) Slim 0.021
> 206 KB (211,485 bytes) WinRAR 3.51
> 206 KB (211,632 bytes) Stuffit 9.0.0.21
> 216 KB (222,042 bytes) 7-ZIP 4.32
> 229 KB (234,886 bytes) WinZip 9.0
> 231 KB (237,390 bytes) BZIP2 1.0.2

Could you test with arb255.exe?
http://bijective.dogma.net/arb255.zip

Or do you have a test file? Again, arb255 is not tuned for just 3 characters, but it would work well if the A, B and C are truly in a random order.

David A. Scott

1/10/2006 2:55:41 AM

David A. Scott wrote:
> Could you test with arb255.exe
> http://bijective.dogma.net/arb255.zip

Do you have a readme file with command line instructions?

> Or do you have a test file. Again arb255 is not tuned for
> just 3 characters but would work well if the A B C are
> truly in a random order.

The test file was sent by email; did you receive it?

I tested the same 10 compressors with a more real-life file to compare the rankings. Note that WinRK took around 1.5 hours and PAQ around 30 minutes to compress the test file on a single-core Pentium M.

Input (DB dump structure with mix between HTML tags, text and data):
32.4 MB (34,017,118 bytes)

Output (compressed):
2.54 MB (2,674,010 bytes) WinRK beta 3.0 build 2
2.56 MB (2,686,923 bytes) PAQ 7
2.81 MB (2,948,566 bytes) Slim 0.021
3.22 MB (3,379,942 bytes) WinUHA 2.0
3.52 MB (3,699,846 bytes) SBC 0.970
3.54 MB (3,723,094 bytes) 7-ZIP 4.32
3.62 MB (3,806,862 bytes) Stuffit 9.0.0.21
3.72 MB (3,910,233 bytes) WinRAR 3.51
4.03 MB (4,231,646 bytes) BZIP2 1.0.2
4.84 MB (5,082,559 bytes) WinZip 9.0

1/10/2006 4:43:39 AM

"Matt Mahoney" <matmahoney@yahoo.com> wrote in news:1136855872.776041.80650@g44g2000cwa.googlegroups.com:

> David A. Scott wrote:
>> Matt if I can get nightlight to commit to coding his example of
>> the 3 symbol types. I would like to play again with fpaq0. To see
>> how much better it can be made with as little change as possible.
>> I like your style but I don't think I will go to the wall and make
>> it bijective. But the nine times for each eight can be changed to
>> eight for eight with a ninth only needed for the last byte.
>
> There is some room for improvement. I tried compressing 10,000,000
> bytes of random characters A, B, C. fpaq0 compresses it to 1,982,988
> bytes. The theoretical limit is 10,000,000 x (lg 3)/8 = 1,981,203
> bytes, a difference of 1785 bytes. For 1,000,000 bytes it compresses
> to 198,322 bytes, a difference of 201.7 bytes.
>
> -- Matt Mahoney

Matt, AEB255 got 1,981,227 bytes both for a random file of 10,000,000 bytes and for a file where the first 3,333,334 symbols are A, followed by 3,333,333 of B, followed by 3,333,333 of C. That is only 24 bytes different from what you call optimal. But the optimal value you quoted assumes the coder knows that the only 3 symbols used are A, B and C, and that each is equally likely. That is not what our models assume, so the correct value is larger. See the work of Paul Howard and Jeffrey Scott Vitter.

Since we are using general arithmetic compressors that do not operate with fixed probabilities, there is a cost associated with that in the data. With a slight change to either FPAQ0 or ARB255 you will see the excess drop a lot. So let's see if NIGHTLIGHT ever gets working code for what was really the contest he wanted. The value you quote is what a modified arb should get, or Nightlight's code, where the code knows that there are only 3 symbols, each occurring at an equal rate. But don't hold your breath.

David A. Scott
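The cost of not knowing the probabilities in advance, and the claim that the output length does not depend on the order of the symbols, can both be illustrated with an idealized adaptive model. The sketch below is an assumption, not arb255's or fpaq0's actual model: it uses a plain add-one (Laplace) frequency estimator over an alphabet of a given size and sums the ideal code lengths. For that estimator the total has a closed form: the log of the multinomial coefficient (the enumerative cost) plus log2 C(n+m-1, m-1), which is the price of learning the counts. All names are hypothetical.

```python
from collections import Counter
from math import comb, factorial, log2

def adaptive_bits(seq, alphabet_size):
    """Ideal code length in bits for an add-one adaptive order-0 model."""
    counts = Counter()
    total = 0.0
    for i, s in enumerate(seq):
        total -= log2((counts[s] + 1) / (i + alphabet_size))
        counts[s] += 1
    return total

def closed_form_bits(seq, alphabet_size):
    """log2(multinomial) + log2 C(n+m-1, m-1) for the same model."""
    n = len(seq)
    multinomial = factorial(n)
    for c in Counter(seq).values():
        multinomial //= factorial(c)
    return log2(multinomial) + log2(comb(n + alphabet_size - 1, alphabet_size - 1))

mixed = "ABCCABACBBCA" * 10
blocks = "".join(sorted(mixed))            # all A's, then all B's, then all C's
for m in (3, 256):
    print(f"alphabet {m:3d}: mixed={adaptive_bits(mixed, m):.2f} bits, "
          f"blocks={adaptive_bits(blocks, m):.2f} bits, "
          f"closed form={closed_form_bits(mixed, m):.2f} bits")
```

In this idealized model the mixed and the sorted inputs cost the same number of bits, and a model that allows for all 256 byte values pays more than one that knows only three symbols can occur.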

1/10/2006 4:53:20 AM

"Sportman" <sportman@gmail.com> wrote in news:1136868219.095473.58400@z14g2000cwz.googlegroups.com:

> David A. Scott wrote:
>> Could you test with arb255.exe
>> http://bijective.dogma.net/arb255.zip
>
> Do you have a readme file with command line instructions?

I thought there was one in it, but to compress:
arb255.exe file.in file.out

to decompress:
unarb255.exe file.in file.out

Note this is very slow code; it writes stuff to the screen. It's made to do pure bijective arithmetic file compression.

>> Or do you have a test file. Again arb255 is not tuned for
>> just 3 characters but would work well if the A B C are
>> truly in a random order.
>
> The test file is send by email did you received it?

Yes, I got it, but I don't have code on this machine to uncompress rar files. Yes, I know it's common. Could you send a zip file?

> I tested the same 10 compressors with a more real life file to compare
> the rankings, note WinRK used round 1,5 hour and PAQ round 30 minutes
> to compress the test file at a Pentium M single core.
>
> Input (DB dump structure with mix between HTML tags, text and data):
> 32.4 MB (34,017,118 bytes)
>
> Output (compressed):
> 2.54 MB (2,674,010 bytes) WinRK beta 3.0 build 2
> 2.56 MB (2,686,923 bytes) PAQ 7
> 2.81 MB (2,948,566 bytes) Slim 0.021
> 3.22 MB (3,379,942 bytes) WinUHA 2.0
> 3.52 MB (3,699,846 bytes) SBC 0.970
> 3.54 MB (3,723,094 bytes) 7-ZIP 4.32
> 3.62 MB (3,806,862 bytes) Stuffit 9.0.0.21
> 3.72 MB (3,910,233 bytes) WinRAR 3.51
> 4.03 MB (4,231,646 bytes) BZIP2 1.0.2
> 4.84 MB (5,082,559 bytes) WinZip 9.0

David A. Scott

1/10/2006 5:06:49 AM

"David A. Scott" <daVvid_a_scott@email.com> wrote in news:Xns9746DEE5BBC64H110W296LC45WIN3030R@213.155.197.138:

> "Matt Mahoney" <matmahoney@yahoo.com> wrote in
> news:1136855872.776041.80650@g44g2000cwa.googlegroups.com:
>
>> David A. Scott wrote:
>>> Matt if I can get nightlight to commit to coding his example of
>>> the 3 symbol types. I would like to play again with fpaq0. To see
>>> how much better it can be made with as little change as possible.
>>> I like your style but I don't think I will go to the wall and make
>>> it bijective. But the nine times for each eight can be changed to
>>> eight for eight with a ninth only needed for the last byte.
>>
>> There is some room for improvement. I tried compressing 10,000,000
>> bytes of random characters A, B, C. fpaq0 compresses it to 1,982,988
>> bytes. The theoretical limit is 10,000,000 x (lg 3)/8 = 1,981,203
>> bytes, a difference of 1785 bytes. For 1,000,000 bytes it compresses
>> to 198,322 bytes, a difference of 201.7 bytes.
>>
>> -- Matt Mahoney
>
> Matt AEB255 got 1,981,227 both for a random file of 10,000,000

I meant ARB255; the other was a typo.

> and a file where first 3,333,334 symbols of A followed by 3,333,333 of
> B followed by 3,333,333 of C. Which is only 24 bytes different than
> what you call optimal. But that optimal value you quoted was assuming
> the coder knows that only the 3 symbols used are A B C and that each
> occur equally likely. Which is not what our models assume so the
> correct value is larger. See the work of Paul Howard and Jeffrey
> Scott Vitter.
>
> Since we are using general arithmetic compressors not operating with
> fixed probabilities there is a cost associated with in the data. With a
> slight change to either FPAQ0 or ARB255 you will see the excess drop a
> lot. So lets see if NIGHTLIGHT ever gets a working code for what was
> really the contest he wanted. The value you quote is what a modified
> arb should get or Nightlight code where the code knows that its only 3
> symbols each occurring at equal rate but don't hold your breath.
>
> David A. Scott

David A. Scott


1/10/2006 5:10:19 AM

David A. Scott wrote:
> Yes I got it but I don't have code on this machine to
> uncompress rar files. Yes I know its common could you send
> a zip file?

Done


1/10/2006 5:16:56 AM

"Sportman" <sportman@gmail.com> wrote in news:1136870215.955060.184180@o13g2000cwo.googlegroups.com: > David A. Scott wrote: > >> Yes I got it but I don't have code on this machinge to >> uncompress rar files. Yes I know its common could you send >> a zip file? > Done > > Your file 1,024,000 bytes compressed to 202,894 bytes David A. Scott -- My Crypto code http://bijective.dogma.net/crypto/scott19u.zip http://www.jim.com/jamesd/Kong/scott19u.zip old version My Compression code http://bijective.dogma.net/ **TO EMAIL ME drop the roman "five" ** Disclaimer:I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged. As a famous person once said "any cryptograhic system is only as strong as its weakest link"


1/10/2006 5:23:10 AM

David A. Scott wrote:
> > Could you test with arb255.exe
> > http://bijective.dogma.net/arb255.zip
> >
> > Do you have a readme file with command line instructions?
>
> I thought there was one in it but to compress
> arb255.exe file.in file.out
>
> to decompress
> unarb255.exe file.in file.out

Thanks, this helped:

Result test file 1:
198 KB (202,894 bytes)

Result test file 2:
22.4 MB (23,500,615 bytes)

Did I do something wrong?


1/10/2006 5:27:42 AM

"Sportman" <sportman@gmail.com> wrote in news:1136870862.023091.297960@g44g2000cwa.googlegroups.com: > David A. Scott wrote: >> > Could you test with arb255.exe >> > http://bijective.dogma.net/arb255.zip >> > >> > Do you have a readme file with command line instructions? >> > >> >> I thought there was one in it but to compress >> arb255.exe file.in file.out >> >> to decompress >> unarb255.exe file.in file.out > > Thanks this helped: > > Result test file 1: > 198 KB (202,894 bytes) > > Result test file 2: > 22.4 MB (23,500,615 bytes) Did I something wrong? > > maybe don't try unix style with the > or < you have to type in the whole line. Here is what I do. to compres a file A.txt arb255 a.txt a.255 unarb255 a.256 a.25 fc /b a.txt a.a25 the a,255 is compressed of a.txt a.25 is uncompressed and fc checks to see if they are the same. Since it bijective if you run unarb255 it will uncompress to another file that is usually longer but when you compress it comes back. The code it completely bijectibe but don't till nightlight not sure he could take it from a simple arithmetic coder. David A. Scott -- My Crypto code http://bijective.dogma.net/crypto/scott19u.zip http://www.jim.com/jamesd/Kong/scott19u.zip old version My Compression code http://bijective.dogma.net/ **TO EMAIL ME drop the roman "five" ** Disclaimer:I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged. As a famous person once said "any cryptograhic system is only as strong as its weakest link"


1/10/2006 5:34:57 AM

nightlight wrote:
) The regular adaptive or static AC's that one normally finds in
) practical implementations, there will be an additional redundancy
) relative to QI which, in case of order-0 stationary sources can be
) derived exactly, as shown in [T2] pp. 20-24). That redundancy is in the
) leading order 1/2 log(2 pi npq) for binary (for general alphabet of size
) A, there are A-1 such terms which are summed, resulting in approx 1/2
) changing to (A-1)/2). As shown in [T2], this error is due to the
) approximate enumeration of AC, where the Stirling approximation and the
) subsequent dropping of the sqrt() & other factors (each being <1),
) causes AC addends to increase relative to QI, leading to an excess of
) O(log(n)) bits on the total output.

By the way, this excess you are talking about is *not* present as coding gaps. It is present as an imperfect mapping to the output domain, resulting in some files compressing larger, but others compressing *smaller* than they should, according to their probability distribution.

) a) the compressed output size is available without decompressing the
) data -- for example you know exactly the size of binomial coefficient
) C(n,k) from count of 1's.
)
) b) For fixed input entropy rate QI codes at _precisely_ the fixed
) number of bits (which is also within the log(e)/2^(g-1) bits from the
) entropy, at precision g and model used) e.g. all n! permutations of n
) items will always be encoded into the _precisely_ the same size (you
) can test that with QI.exe or see it in the source).

If the data fits these restrictions, then that is additional modeling information that should be present in the model, and if so, AC will take advantage of it as well as QI.

) In conclusion, the availability of an extremely fast and a highly
) accurate (optimal for any given arithmetic precision) coding algorithm to
) perform such coding tasks, opens the entire realm of unexplored
) compression algorithms. The compression gains potentially available to
) such algorithms are not limited at all to the few percent one can see
) on the subset of coding tasks where Huffman or AC code well (which to
) those over-conditioned to seeing only that domain, appears to be all
) there is to compress).

I wasn't debating that. I was debating your absolute 'always better' claim, and your 'holy grail' attitude.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT


1/10/2006 8:23:35 AM

Hi,

> I did few of these, all mods from existent audio codecs, for our
> current communication product ( http://www.hotComm.com ). Even here,
> though, take the most basic low quality voice with 8000 samples/sec,
> and take low precision samples of just 1 byte per sample (this is not
> what we normally use). The segments we typically get are 20ms of audio
> data. (Note that one could easily go to 60-80 ms, blocks without human
> listener noticing any difference.) With 20ms, 1 byte/sample, 8
> samples/ms you get a block of data to encode that is 1280 bits long.
> That is three orders of magnitude larger than the "online" constraint
> in the AC modeling/coding paradigm lingo. Any higher quality, and you
> are several times larger. Another real time app which doesn't tolerate
> latency would be video codec, and even the differential frame data is
> almost two orders of magnitude larger than audio. Although the entropy
> coder won't get the raw samples but some outputs from the transforms &
> filtering, the above was reduced in all parameters to bare minimum, so
> in more realistic case that one does get at least as much data even for
> the entropy coder.

> Even assuming you can have few hundreds symbols at a time vs just a
> single symbol (or just few, for AC latency) adds a great deal of
> flexibility and opens space for new algorithms, for the modeling engine
> and the coder, as BWT illustrates (or the so-called "offline"
> dictionary methods) for the modeler or QI for the coder.

A war-time story from my side: JPEG-1 has an arithmetic coder option, the QM-coder, which does have a latency problem because the coder can, in principle, delay the output arbitrarily long, depending on the carry-over resolution. There was a long discussion over this issue in the group because this actually disallows efficient hardware designs. Thus, it *is* a problem, at least for some people.

> > Ehem, small correction: You don't have to buffer the whole data, see
> > for example Nelson's or Moffat's implementation. All you need to do is
> > to count how many times you have "forgotten" to carry over, and
> > resolve this as soon as the carry is resolved by propagating it
> > thru all the "counted" (but never buffered) data.

> I think that depends on the constraints the task. The AC is adding to a
> common sum the numbers which decrease in size. Hence, there is no way,
> even in principle to send data out from the higher digits if they are
> large enough for carry to propagate, and let decoder decode it
> incorrectly (and possibly take actions based on), then issue a carry
> signal to undo that decode. That is what I meant above -- encoder
> either has to keep all the data that can propagate carry in the worst
> case or use blocking mechanisms (which add redundancy) or
> use LIFO mode (where everything is held until the whole buffer is
> encoded).

But that's not what you've written. Clearly, you cannot send the data until the carry propagation has been resolved (unless you sacrifice some coder efficiency, that is), but that's not "keeping" the data, as I would call it. It is represented in the encoder, sure, not "as is" but as a "carry-over" count. Thus, one doesn't require an arbitrarily long buffer for that. Just an arbitrarily large counter. (-;

> Note also that one pays the cost of FIFO coding not just in cases when
> one has to propagate carry, or in small extra redundancy, but the
> coder is burdened with checking & branching for such cases inside its
> coding loop, so there is a coding speed penalty of the AC "online"
> constraint, with no practical benefit at all.

I don't understand what you mean by "no practical benefit". AC online coders without the carry-over problem (i.e. with bitstuffing) are *very* practical, and they are used in real applications, e.g. JBIG, JPEG2000, etc., precisely *because* the hardware folks cannot get away with arbitrarily long delays. The intermediate communication protocols don't allow it.

However, note that this drifts away from the initial discussion, thus we stop here.

So long,
Thomas


1/10/2006 9:13:05 AM

> As I see it, there are no excess bits in an Arith
> Coder, until the coder is told to wrap it up. Can
> you give some clear arguments why you claim that
> the bits added by termination are present before
> termination ?

I thought the explanation was pretty clear. Well, perhaps a numeric example showing the accumulation of errors may help.

I will use below results from my previous message [M1]. The ACX (infinite precision AC) redundancy formula (from [M1]) is:

L = ceiling(log(1/Pc)) + 1 .... (1)

which gives the length L of the ACX output in terms of Pc, which is the "coding probability" for the entire message. AC computes Pc by multiplying coding probabilities of all symbols encountered along the way. With ACX, the products are exact, while with ACF (finite precision AC), the products are approximate, which will add 2-3 bits to L in (1) (before ACF counts overflow, i.e. generally, ACF adds excess bits at the approximate rate of log(e)/2^(g-2) bits per symbol). Note also that Pc is always a product of probabilities generated by the AC model along the way. For order-0 stationary models the factors used are constants, but in a general model the factors used may vary from step to step.

Let's now take a finite precision AC and watch its computation of Pc used in (1). For simplicity we'll calculate in decimal notation (we'll still use base 2 log in (1) to count bits). We'll use a static binary source with probabilities p(1)=0.3 and p(0)=0.7 and we will set ACF precision to 1 decimal digit. Let's look at what happens to Pc when coding string 00100:

0. Initial Pc=1.0 (prob=1 for empty message)         ------- GAPS
1. In C=0 => Pc = 1.0 * 0.7 = 0.7   => Pc = 0.7 e+0   G=0.000
2. In C=0 => Pc = 0.7 * 0.7 = 0.49  => Pc = 0.4 e+0   G=0.090
3. In C=1 => Pc = 0.4 * 0.3 = 0.12  => Pc = 0.1 e+0   G=0.020
4. In C=0 => Pc = 0.1 * 0.7 = 0.07  => Pc = 0.7 e-1   G=0.000
5. In C=0 => Pc = 0.07* 0.7 = 0.049 => Pc = 0.4 e-1   G=0.009

The ACX's exact Pc is Pcx = 0.7^4 * 0.3 = 0.07203, while the ACF's approximate Pc is 0.04, which is smaller than Pcx. The output sizes (rounded to whole bits) are: L(Pcx)=[3.795]+1=5 bits, L(Pc)=[4.644]+1=6 bits. The gaps created in the Pc (or in the "range" in ACF's terminology) are shown next to each truncation.

Note, for example, that in step 2 the product 0.49 was truncated to 0.4 and a gap of 0.09 was created in Pc. Via (1), but without rounding to the next whole bit since we're not outputting anything yet, this results in an extra log(0.49/0.40) = 0.29 bits accumulated in the ACF's L compared to the ACX's L at that point.

An important element to observe in step 2 is that the gap of 0.09 in the Pc (or equivalently in ACF's range), arising when encoding symbol '0' in step 2, did not get reassigned to symbol '1', since if '1' were the second symbol, its product would have been truncated as well, as shown below in alternate step 2a:

2a. In C=1 => Pc = 0.7 * 0.3 = 0.21 => Pc = 0.2 e+0   G=0.01

Hence, either symbol in step 2 would have created a gap in the Pc (range), i.e. part of the Pc coverage of the full interval [0,1) for all messages is simply wasted. For example, all 2 symbol messages are 00, 01, 10, 11. Their Pc values, as computed by ACF, add up to: 0.4 + 0.2 + 0.2 + 0.9e-1 = 0.89, leaving a total gap of 0.11 in the coverage of the interval [0,1) (or shortfall from 1 in the Kraft inequality (4) in [M1]). The exact Pcx for these 4 messages always add up to exactly 1 and thus have no such gap.

Note that ACX does introduce a gap in [0,1) when truncating the final Pcx to a finite fraction in order to obtain a finite length codeword, which results in the baseline 1-2 bit redundancy it creates (see the CWI discussion in the earlier post [M1]). But the extra gaps & the corresponding extra redundancy added by ACF accumulate from symbol by symbol truncation errors, step by step, as shown above.

If you have time and want to play with this more, you should be able to replicate these same conclusions from this little '1 decimal digit ACF' in an actual ACF by making it output Pc for all input messages of a certain length (note that the usual ACF's "range" is an integer, a mantissa of Pc, and ACF doesn't keep track of its exponent explicitly, so you will need to keep track of that yourself to display Pc's as decimal numbers; the QI's SW formalism is a much cleaner and tidier way to do these kinds of computations; the ACF in SW formulation flows much more coherently than the usual handwaving at its little pictures).

(Note: before you lose the context of the post again and bolt off onto that tangent 'who cares, it's so tiny', as you did in the previous post, recall what this was replying to: you were insisting on teaching me about the AC gaps and asserting that there are none.)

-- References ( http://www.1stworks.com/ref/RefLib.htm )

T1-T3 are on http://www.1stworks.com/ref/qi.htm

M1. Post on quantization errors in AC & QI:
http://groups.google.com/group/comp.compression/msg/b157e1aa25e598d8

41a. P.A.J. Volf "Weighting Techniques In Data Compression: Theory and Algorithms" Ph.D. thesis, Eindhoven University of Technology, Dec 2002
http://alexandria.tue.nl/extra2/200213835.pdf
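For readers who want to replay the arithmetic of this toy model, here is a minimal Python sketch. It only reproduces the "1 decimal digit, always truncate" rule exactly as the post states it; the helper names (`truncated_pc`, `exact_pc`, `code_length`, `kraft_sum`) are made up for illustration, and whether production arithmetic coders behave this way is precisely what the rest of the thread disputes.

```python
from decimal import Decimal, Context, ROUND_DOWN
from itertools import product
from math import ceil, log2

# One significant decimal digit, always truncated, as in the post's toy model.
ONE_DIGIT = Context(prec=1, rounding=ROUND_DOWN)
P = {"0": Decimal("0.7"), "1": Decimal("0.3")}  # p(0)=0.7, p(1)=0.3


def truncated_pc(message):
    """Pc with the product truncated to one digit after every symbol."""
    pc = Decimal(1)
    for symbol in message:
        pc = ONE_DIGIT.multiply(pc, P[symbol])
    return pc


def exact_pc(message):
    """Pcx: the same product with no truncation."""
    pc = Decimal(1)
    for symbol in message:
        pc *= P[symbol]
    return pc


def code_length(pc):
    """Equation (1) of the post: L = ceiling(log2(1/Pc)) + 1."""
    return ceil(log2(1 / float(pc))) + 1


def kraft_sum(length):
    """Sum of truncated Pc over all binary messages of the given length."""
    return sum(truncated_pc("".join(m)) for m in product("01", repeat=length))


if __name__ == "__main__":
    print(exact_pc("00100"), truncated_pc("00100"))
    print(code_length(exact_pc("00100")), code_length(truncated_pc("00100")))
    print(kraft_sum(2))
```

Under this rule the two-symbol messages sum to less than 1, which is the shortfall the post calls a gap.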


1/10/2006 12:31:29 PM

Hi Again,

> Let's now take a finite precision AC and watch its computation of Pc
> used in (1). For simplicity we'll calculate in decimal notation (we'll
> still use base 2 log in (1) to count bits). We'll use static binary
> source with probabilities p(1)=0.3 and p(0)=0.7 and we will set ACF
> precision to 1 decimal digit. Let's look at what happens to Pc when
> coding string 00100:

> 0. Initial Pc=1.0 (prob=1 for empty message)         ------- GAPS
> 1. In C=0 => Pc = 1.0 * 0.7 = 0.7   => Pc = 0.7 e+0   G=0.000
> 2. In C=0 => Pc = 0.7 * 0.7 = 0.49  => Pc = 0.4 e+0   G=0.090
> 3. In C=1 => Pc = 0.4 * 0.3 = 0.12  => Pc = 0.1 e+0   G=0.020
> 4. In C=0 => Pc = 0.1 * 0.7 = 0.07  => Pc = 0.7 e-1   G=0.000
> 5. In C=0 => Pc = 0.07* 0.7 = 0.049 => Pc = 0.4 e-1   G=0.009

> The ACX's exact Pc is Pcx = 0.7^4 * 0.3 = 0.07203, while the ACF's
> approximate Pc is 0.04 which is smaller than Pcx. The output sizes
> (rounded to whole bits) are: L(Pcx)=[3.795]+1=5 bits, L(Pc)=[4.644]+1=6
> bits. The gaps created in the Pc (or in the "range" in ACF's
> terminology) are shown next to each truncation.

> Note, for example in step 2, the product 0.49 was truncated to 0.4 and
> 0.09 gap was created in Pc, which via (1), but without rounding to the
> next whole bit since we're not outputing yet anything, results in the
> extra log(0.49/0.40) = 0.29 bits accumulated in the ACF's L compared to
> the ACX's L at that point.

And that's where the computation starts getting incorrect. You do not

It might be that Pc gets rounded to 0.4, but that also means that the interval for the other symbol gets larger, thus there is no gap. This means, then, that the probabilities that are imposed by the AC coder no longer fit the source (true), but it does not mean that there are gaps. The 0.090 here is the deviation between the "idealized" probability and a quantized one, Qc, that differs from Pc due to the finite precision.

Note that this is different from ELS, where we do have gaps, indeed.

> Important element to observe in step 2 is that the gap of 0.09 in the
> Pc (or equiv. in ACF's range), arising when encoding symbol '0' in step
> 2 did not get reassigned to symbol '1', since if '1' were the second
> symbol, its product would have been truncated as well, as shown below
> in alternate step 2a:

> 2a. In C=1 => Pc = 0.7 * 0.3 = 0.21 => Pc = 0.2e+0 G=0.01

No, Pc = 0.3, G = -0.090, resulting in a total "gap" of zero. The confusion arises because you use only the code-word where AC in fact uses an interval I to represent the values.

0. Initial I = [0,1) = [0,0.7) U [0.7,1)
1. In C=0 => [0,0.7). At this time, the interval cannot yet be rescaled since the first digit (7) is not yet fixed.
2. In C=0 => [0,0.4). No digit written either.

Instead, with 2a:

2a. In C=1 => [0.4,0.7). No digit written either, Pc = 0.3, Gap = -0.090

Rescaling doesn't happen in your example, because the end-points of the interval are not yet "fixed enough" to write out data.

> If you have time and want to play with this more, you should be able to
> replicate these same conclusions from this little '1 decimal digit ACF'.

Yes, please play with the above a bit more. (-;

So long,
Thomas
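The interval bookkeeping in this reply can be checked with a short Python sketch. It assumes the scheme described here, in which only the boundaries between symbols are rounded down and the sub-intervals tile the parent interval; `split_interval` and the choice of tenths as the unit are illustrative, not taken from any particular coder.

```python
def split_interval(low, high, cum_freq):
    """Split [low, high) among symbols by rounding each boundary down.

    cum_freq holds cumulative frequencies starting at 0 and ending at the
    total, e.g. [0, 7, 10] for p(0)=0.7 and p(1)=0.3.
    Returns one (low, high) pair per symbol.
    """
    total = cum_freq[-1]
    width = high - low
    # Integer division rounds every boundary down; the first boundary is
    # always low and the last is always high, so nothing is left uncovered.
    bounds = [low + (width * c) // total for c in cum_freq]
    return list(zip(bounds[:-1], bounds[1:]))


if __name__ == "__main__":
    CUM = [0, 7, 10]  # one decimal digit of precision, in units of 0.1
    first = split_interval(0, 10, CUM)
    print(first)
    print(split_interval(*first[0], CUM))
```

With the interval [0, 0.7) expressed as 0..7 tenths, symbol 0 gets [0, 0.4) and symbol 1 gets [0.4, 0.7), so the two sizes add up to the full 0.7, matching steps 2 and 2a of the reply.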


1/10/2006 1:25:21 PM

nightlight wrote:
)> As I see it, there are no excess bits in an Arith
)> Coder, until the coder is told to wrap it up. Can
)> you give some clear arguments why you claim that
)> the bits added by termination are present before
)> termination ?
)
) I thought the explanation was pretty clear. Well, perhaps a numeric
) example showing the accumulation of errors may help.

Read my sentence again. 'the bits added at termination'. I was talking about the fixed cost.

) Let's now take a finite precision AC and watch its computation of Pc
) used in (1). For simplicity we'll calculate in decimal notation (we'll
) still use base 2 log in (1) to count bits). We'll use static binary
) source with probabilities p(1)=0.3 and p(0)=0.7 and we will set ACF
) precision to 1 decimal digit. Let's look at what happens to Pc when
) coding string 00100:
)
) 0. Initial Pc=1.0 (prob=1 for empty message)         ------- GAPS
) 1. In C=0 => Pc = 1.0 * 0.7 = 0.7   => Pc = 0.7 e+0   G=0.000
) 2. In C=0 => Pc = 0.7 * 0.7 = 0.49  => Pc = 0.4 e+0   G=0.090
) 3. In C=1 => Pc = 0.4 * 0.3 = 0.12  => Pc = 0.1 e+0   G=0.020
) 4. In C=0 => Pc = 0.1 * 0.7 = 0.07  => Pc = 0.7 e-1   G=0.000
) 5. In C=0 => Pc = 0.07* 0.7 = 0.049 => Pc = 0.4 e-1   G=0.009

Why are you rounding down all the time ?

In AC, some of the symbols' probabilities are rounded up, in such a way that the total always adds up to the total range. In other words, the 'gap' for one of the symbols gets added to the code space for the other symbol.

) (Note: before you lose the context of the post again and bolt off onto
) that tangent 'who cares, it's so tiny', as you did in the previous
) post, recall what this was replying to: you were insisting to teach me
) on the AC gaps and asserting that there are none.)

You were asserting that there are gaps _at each step_, which is false.

It then looked as if you were suddenly talking about only the gaps that occur at termination, which I discarded as irrelevant, but which you jumped on with glee to point out I was wrong.

SaSW, Willem


1/10/2006 1:32:01 PM

The QI.exe file which you may already have (from the source; current source version is 1.03) has a command line option to test it on that same input (which is a high entropy limit for multi-alphabet coding, and which I call radix codes):

QI cr3 n1000000 i100

which tells it to code inputs in radix 3 (this can be any 32 bit value above 2), to use an input of 1 million symbols (there is a constant MAXRDIG in Intro.h which limits the input size to max 2^20 or 1M digits; you can change that to allow larger sizes, e.g. to 16 MEG) and to run the test 100 times on 100 random inputs (i100 for 100 iterations). The size it produces is 1584962.50... bits, which compared to the exact N*log(3) entropy has an excess of 1.62 e-04 bits on the total of 10^6 symbols (i.e. the excess per symbol is 1.6e-10 bits).

To compare that with the AC output size, one option is to make AC work in static mode without adapting to probabilities and make it not count the transmission of the frequency table or the number of symbols n (which is the same condition that the QI.exe figure applies to).

Alternatively, you can add to QI's output the size to transmit N, A and the frequency table. QI.exe has a command QI cl<int> which computes the self-delimiting size for <int>, or just "QI cl" to list a table for common values. There you get for N=10^6 its self-delimiting length L(N)=27.543 bits and for A=3, L(A)=2.49 bits. The cost for the frequency table with QI/EC is the log of the binomial C(N+A-1,A-1), for N=10^6 and A=3, which is log(C(1000002,2))=38.863 bits. This totals (each rounded separately, which they don't need to be) 28+3+39=70 bits to be added to QI's output to match the adaptive AC's coding conditions. Since QI's output was essentially the entropy, QI's total is at most 70 whole bits above the "entropy" (note the "entropy" N*log(3) didn't include N; also in the high entropy limit QI doesn't need to transmit the frequency table, but one would need to modify AC to work in the high entropy limit, so I added the table to QI, which distorts the comparison to entropy H a bit).

Now, running the Moffat98 coder (in 8 bit max symbol coding mode & frugal bits enabled), it outputs 1588435.52 bits (this is the average over 100 iterations), which is 3473 bits above the entropy, or 3403 bits above the comparable QI output size. (Note that the Moffat98 coder generally has a slight bias to code worse for the max entropy inputs, but it gains in return on very low entropy inputs.)

To compare speeds properly, one would need to modify QI's radix coder to use an 8 bit alphabet size limit, instead of the 32 bit limit, otherwise QI pays the cost of accessing 4 times larger memory (and a few other related costs in the code, such as 32 bit multiplies or padding 4 times larger output buffers, etc). Without adjusting the max alphabet size, QI (on a 2G P4 laptop) codes at 22 ns/sym while Moffat98 codes at 96 ns/sym, which is a ratio of 4.36. That is a smaller ratio vs Moffat98 than for the binary coder in the high entropy limit, which is about 6. I think that when both coders are normalized to the same max alphabet (thus using the same buffer sizes for input and the same width of multiplies), it would probably come out at the same high entropy ratio of 6 as in the binary case.
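The bit counts quoted in this post can be rechecked with a few lines of Python. This is only a sketch of the arithmetic: the self-delimiting lengths 27.543 and 2.49 and the Moffat98 figure of 1588435.52 bits are taken from the post as given, and the function names are made up for illustration.

```python
from math import ceil, comb, log2


def entropy_bits(n, alphabet):
    """N*log2(A): size of n equiprobable symbols from an alphabet of size A."""
    return n * log2(alphabet)


def freq_table_bits(n, alphabet):
    """log2 of C(N+A-1, A-1), the number of possible frequency tables."""
    return log2(comb(n + alphabet - 1, alphabet - 1))


N, A = 10**6, 3
L_N, L_A = 27.543, 2.49      # self-delimiting lengths, as quoted in the post
MOFFAT98_BITS = 1588435.52   # measured output, as quoted in the post

# Each overhead term rounded up separately, as in the post.
overhead = ceil(L_N) + ceil(L_A) + ceil(freq_table_bits(N, A))
excess_over_entropy = MOFFAT98_BITS - entropy_bits(N, A)

if __name__ == "__main__":
    print(entropy_bits(N, A), freq_table_bits(N, A))
    print(overhead, excess_over_entropy, excess_over_entropy - overhead)
```

The same approach covers the speed figures: 96 ns/sym divided by 22 ns/sym gives the quoted ratio of about 4.36.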


1/10/2006 1:51:49 PM

> Why are you rounding down all the time ?

> In AC, some of the symbols probabilities are rounded up,
> in such a way that the total always adds up to the total range.

Those are not probabilities being rounded down but the update to the range size (which is the mantissa of the Pc, the coding probability of the entire message up to that point). Check any AC source and watch the div in the coder that calculates the new range size (which discards the remainder), or take a look in (41a) p. 48, formulas for the WNC coder (see the 1st line with asterisk in pseudo code). That is always rounded down.

> It then looked as if you were suddenly talking about only the gaps
> that occur at termination, which I discarded as irrelevant, but
> which you jumped on with glee to point out I was wrong.

The +1 in eq. (1) is the ACX (exact AC) truncation error cost for the last interval, when it does need to produce a finite fraction. That last interval does have an interval gap as well; that is what those CWI intervals were about. The excess above that +1 bit is the accumulated error due to rounding along the way. The ACF error produces a total of 2-3 bits on top of the +1. You can measure it accumulating if you force output at any point and subtract the last +1 (which needs to go, by definition, on the last CWI interval specification). Anything above +1 in any intermediate stage is the accumulated excess. Note that the total ACF output has a latency of approx. g bits, so until you make it clear up its internal state (and also adjust for the +1, since that one is by definition on the last interval only) you can't see what the excess is.


1/10/2006 2:11:03 PM

> look in (41a) p. 48, formulas for WNC coder (see the > 1st line with asterisk in pseudo code). That should say: > 3rd line with asterisk..


1/10/2006 2:52:32 PM

nightlight wrote:
)> Why are you rounding down all the time ?
)
)> In AC, some of the symbols probabilities are rounded up,
)> in such a way that the total always adds up to the total range.
)
) Those are not probabilities being rounded down but the update to the
) range size (which is the mantissa of the Pc, the coding probability of
) the entire message up to that point). Check any AC source and watch div
) in the coder to calc new range size (which discards the remainder), or
) take a look in (41a) p. 48, formulas for WNC coder (see the 1st line
) with asterisk in pseudo code). That is always rounded down.

If the AC source you checked rounds down to calc the new range size, then you have a very poorly written AC source. No wonder you are misguided about gaps.

SaSW, Willem


1/10/2006 2:59:23 PM

Matt Mahoney wrote:
> David A. Scott wrote:
> > Matt if I can get nightlight to commit to coding his example of
> > the 3 symbol types. I would like to play again with fpaq0. To see
> > how much better it can be made with as little change as possible.
> > I like your style but I don't think I will go to the wall and make
> > it bijective. But the nine times for each eight can be changed to
> > eight for eight with a ninth only needed for the last byte.
>
> There is some room for improvement. I tried compressing 10,000,000
> bytes of random characters A, B, C. fpaq0 compresses it to 1,982,988
> bytes. The theoretical limit is 1/8 lg 3 = 1,981,203, a difference of
> 1785 bytes. For 1,000,000 bytes it compresses to 198,322 bytes, a
> difference of 201.7 bytes.
>
> -- Matt Mahoney

I posted fpaq1.cpp to
http://www2.cs.fit.edu/~mmahoney/compression/#fpaq0

It is an improved order 0 arithmetic coder using 64 bit arithmetic. On a 10MB file which repeats "ABC" it is 25 bytes over the theoretical limit, and I believe most of this is due to approximations made by the model early in compression.

-- Matt Mahoney
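The "theoretical limit" figures quoted in this post follow from n * lg(3) / 8 bytes for n equiprobable symbols drawn from a 3-letter alphabet. A minimal Python sketch of that calculation is below; the function name is illustrative, and the compressed sizes are the fpaq0 results quoted above.

```python
from math import log2


def order0_limit_bytes(n_symbols, alphabet_size):
    """Entropy of n equiprobable symbols from the alphabet, in bytes."""
    return n_symbols * log2(alphabet_size) / 8


if __name__ == "__main__":
    # (input size in symbols, fpaq0 output in bytes) as quoted in the post
    for n, fpaq0_bytes in ((10_000_000, 1_982_988), (1_000_000, 198_322)):
        limit = order0_limit_bytes(n, 3)
        print(n, round(limit), round(fpaq0_bytes - limit, 1))
```

As David A. Scott notes elsewhere in the thread, this limit assumes the coder already knows that only three symbols occur and that they are equally likely.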


1/10/2006 3:14:31 PM

> And that's where the computation starts getting incorrect. You do not
> It might be that Pc gets rounded to 0.4, but that also means that the
> interval for the other symbol gets larger, thus there is no gap.

The Pc calculation, which is the total message "coding probability" (meaning it uses AC's model probabilities for its computation as a product of all p's of the symbols encountered along the way), rounds down on every symbol. Check any AC source code and look at the integer division when they update the total range size. The formulas in [41a] p. 48 show how this is done in the WNC87 coder (look at the 3rd line with the asterisk in the pseudo code). The same goes for Moffat98, except that there they divide first, then multiply. In all cases, though, the result is rounded down (the integer "div" discards the remainder). There is no check for the symbol value and no branch for one or another symbol value. It is an unconditional loss of the remainder -- the remainder gets discarded on every symbol, every time the new range is calculated. These are pure gaps in the coding interval (or shortfalls from 1 in the Kraft inequality). You can easily see them if you make AC output codewords for all possible inputs M of a given length and add up Pc(M), the total message probabilities, over all of those inputs. The sum will come out smaller than 1, just as the numeric examples show.

What you and Willem are talking about is the division of the range among different symbols, in which case your compensation does apply. But the total interval has shrunk during the total interval update (when the integer div rounds down on every symbol).

Note also that the index computed by ACF is not rounded itself. This works exactly the same way as QI's arithmetic, where eq. (21) computes path counts for a given point using rounding up arithmetic, while the index formulas (22)-(23) use exact arithmetic, with no rounding. With QI these two calculations are separate; the rounding is done only for the tables, to obtain quantized binomials.


1/10/2006 3:15:28 PM

Jo.

> The Pc calculation, which is the total message "coding probability"

Yes.

> (meaning it uses AC's model probabilities for its computation as a
> product for all p's of the symbols encountered along the way), rounds
> down on every symbol.

No. Definitely. Not.

> Check any AC source code and look the integer division when they update
> the total range size.

You typically do not update the total range size, but rather the coding interval low and high, and you have to do this consistently. If you always round here in the same direction, some intervals will get larger, and some smaller. I posted an example above which is pretty realistic, except that real applications round to powers of two instead of powers of ten. Otherwise, that's the code, sorry.

> The formulas in [41a]
> p. 48, show how this is done in the WNC87 coder (look at the 3rd line
> with the asterisk in the pseudo code). The same goes for Moffat98,

No. You are confused. If you always round down *interval boundaries*, you do not round down *interval sizes*, because the topmost interval then gets larger. Pick for example Nelson's coder from the net (just google for Nelson, Arithmetic Coding), then see for yourself.

> What you Willem are talking about is the division of the range among
> different symbols, in which case your compensation does apply. But the
> total interval has shrunk during the total interval update (when the
> integer div rounds down on every symbol).

No, see the example, compute it yourself. The *top* interval gets larger than it should be if you round down its lower boundary. It is really that simple.

AC coding does not have coding gaps. ELS does. AC coding has a quantization of probabilities, though.

So long,
Thomas


1/10/2006 4:14:08 PM

>> in the coder to calc new range size (which discards
>> the remainder), or take a look in (41a) p. 48, formulas
>> for WNC coder (see the 3rd line with asterisk in pseudo
>> code). That is always rounded down.
>
> If the AC source you checked rounds down to calc the
> new range size, then you have a very poorly written
> AC source. No wonder you are misguided about gaps.

I gave you above the places to check, which are not some "poorly written AC source" but the well-known reference implementations of AC coders. So look at the Moffat98 or WNC87 source, or at reference [41a] above, which shows quite clearly how Pc is updated. You are welcome to show a coder which doesn't round down the size of the updated total range on every symbol. (It still has to be able to decode, though.)

---
References:

41a. P.A.J. Volf, "Weighting Techniques in Data Compression: Theory and Algorithms", Ph.D. thesis, Eindhoven University of Technology, Dec 2002
http://alexandria.tue.nl/extra2/200213835.pdf

1/10/2006 4:17:12 PM

"Matt Mahoney" <matmahoney@yahoo.com> wrote in
news:1136906071.456046.325470@g49g2000cwa.googlegroups.com:

> Matt Mahoney wrote:
>> David A. Scott wrote:
>> > Matt if I can get nightlight to commit to coding his example of
>> > the 3 symbol types. I would like to play again with fpaq0. To see
>> > how much better it can be made with as little change as possible.
>> > I like your style but I don't think I will go to the wall and make
>> > it bijective. But the nine times for each eight can be changed to
>> > eight for eight with a ninth only needed for the last byte.
>>
>> There is some room for improvement. I tried compressing 10,000,000
>> bytes of random characters A, B, C. fpaq0 compresses it to 1,982,988
>> bytes. The theoretical limit is 1/8 lg 3 = 1,981,203, a difference of
>> 1785 bytes. For 1,000,000 bytes it compresses to 198,322 bytes, a
>> difference of 201.7 bytes.
>>
>> -- Matt Mahoney
>
> I posted fpaq1.cpp to
> http://www2.cs.fit.edu/~mmahoney/compression/#fpaq0
> It is an improved order 0 arithmetic coder using 64 bit arithmetic. On
> a 10MB file which repeats "ABC" it is 25 bytes over the theoretical
> limit, and I believe most of this is due to approximations made by the
> model early in compression.
>
> -- Matt Mahoney

I find this very interesting. The fact that you're coding 10,000,000 zeros should add at least roughly 5.8 bytes to my answer. Instead they differ by 1 byte, since I got 24 bytes over (1,981,227) against what you're claiming as optimal at 1,981,203.

I just compiled and ran your code on a 10,000,000 byte file that was ABCABC...ABCA (note one extra A).

My code compresses it to 1,981,227, against what you showed was optimal at 1,981,203.

Your code compressed it to 1,981,232, which is only 5 bytes longer than mine. So both, I hope, are doing what they should. I don't see how you can say this new one is 25 bytes over. Isn't it 29 bytes over?

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptographic
system is only as strong as its weakest link"

1/10/2006 4:53:33 PM

>> The formulas in [41a]
>> p. 48, show how this is done in the WNC87 coder
>> (look at the 3rd line with the asterisk in the
>> pseudo code). The same goes for Moffat98,
>
> No. You are confused. If you always round down *interval
> boundaries*, you do not round down *interval sizes*
> because the topmost interval then gets larger.

Sorry, I gave you the wrong page and the wrong line number (that was for similar code in the decoder). Take a look at [41a], p. 47, the formula in the 1st line with an asterisk, which shows the interval size update. There he updates the integer quantity P(n-1), which is his integer range after n-1 symbols, using (in abbreviated notation, see the paper):

    P(n) = floor[p * P(n-1)] .... (1)

where p is the coding probability of the n-th symbol xn, the symbol now being encoded (for p he uses the conditional probability that the new symbol xn, just found, occurs after the given n-1 previous symbols). There is no other statement which assigns a value to (updates) the quantity P(n).

Note that at the top he initializes P(0) to 2^F (where F is the mantissa precision in bits), i.e. the quantity P is the mantissa of the actual Pc, meaning the initial width is 1.00....

The fraction dropped in (1) is gone from P(n), which is what gets used in the next step. There are no other updates of P(n). It simply lost the fraction at its edges irreversibly, with no condition and no compensation for it anywhere else.

The compensation you are referring to occurs in the next line, where he calculates the new lower boundary point of the interval, Q(n), as (again streamlined, see the paper):

    Q(n) = Q(n-1) + floor[p(x<xn) * P(n-1)] .... (2)

where p(x<xn) is the conditional cumulative probability of the symbols ahead of xn. For example (this is a binary coder), if xn=0, then (2) doesn't add anything, since no other symbol x is smaller than xn. If xn=1, then (2) adds to the lower boundary Q(n-1) the product p(0)*P(n-1), where p(0) is the probability of 0.

The same conclusion follows from basic accounting of the precisions and buffers used by AC. There are two real numbers < 1.0 in the coding loop whose precision would grow indefinitely in an unlimited precision AC:

a) Pc, the product of the probabilities of all symbols encountered in the input, and

b) the cumulative probability Qc (the exact, full precision rational number < 1.0; note that the Q(n) in (2) is just an integer representing the trailing bits of Qc as seen from the current AC window).

Qc is the content of the AC's encoded output. The precision of that number does grow indefinitely, since it is the output itself. The other number, the message probability Pc, which is also large in an unlimited precision AC, does not grow indefinitely. Its precision is reduced at each step in (1). Its fraction beyond F bits of precision is discarded unconditionally and irreversibly after each new symbol: in (1) we multiply the previous Pc by the probability of the new symbol and truncate the result to F significant bits.

If you are saying that at the end the AC has somehow also computed the product Pc in full precision, then where is it stored? That number is _independent_ of the value Qc, hence you would need a second number of indefinite length, besides Qc, which would be some kind of second work buffer of the size of the output. There is no such buffer in AC.
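
A literal reading of update (1) can be tried out on a toy binary source. The sketch below is an illustration under stated assumptions, not Volf's pseudocode: it applies P(n) = floor[p * P(n-1)] to both symbols at every step, ignores renormalization, and uses made-up names (leaf_sum, c0, total). It adds up the final P(n) over all 2^n messages, which is the sum of Pc(M) over all messages M, scaled by P(0).

```c
#include <stdint.h>

/* Hypothetical illustration of update (1) applied to every symbol.
 * P      current integer range, P(0) = 2^F at the top-level call
 * c0     frequency count of symbol 0, so p(0) = c0 / total
 * total  total frequency count (> 0, c0 <= total), p(1) = (total - c0) / total
 * n      number of symbols still to be coded
 * Returns the sum of the final ranges over all 2^n binary continuations.
 * Assumes P * total fits in 64 bits; no renormalization is modeled. */
uint64_t leaf_sum(uint64_t P, uint32_t c0, uint32_t total, int n)
{
    if (n == 0)
        return P;

    uint64_t P0 = P * c0 / total;            /* floor[p(0) * P] */
    uint64_t P1 = P * (total - c0) / total;  /* floor[p(1) * P] */

    return leaf_sum(P0, c0, total, n - 1) + leaf_sum(P1, c0, total, n - 1);
}
```

With P(0) = 2^16 and p(0) = 1/3, one step already gives 21845 + 43690 = 65535, one unit short of 65536; with p(0) = 1/2 nothing is lost. A coder that instead gives the last symbol the width P(n-1) minus the other symbols' widths would lose nothing per step, so the sketch shows what (1) implies if it is applied to every symbol, not which convention a particular coder follows.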

1/10/2006 5:20:59 PM

I gave you the incorrect page and line number to look at (that was for similar code in the decoder, which still does the same thing, but it may confuse the discussion).

The proper page for the encoder is 47 in [41a], the 1st line with an asterisk. See the details on that code in the post below to Thomas Richter (you're both wrong in the same way):

http://groups.google.com/group/comp.compression/msg/39c25c38b882532e

1/10/2006 5:28:33 PM

"nightlight" <nightlight@omegapoint.com> wrote in
news:1136914113.774637.18840@f14g2000cwb.googlegroups.com:

> I gave you the incorrect page & line number to look at (that was for a
> similar code in the decoder, which still does the same, but it may
> confuse the discussion).
>
> The proper page for the encoder is 47 in [41a], the 1st line with
> asterisk. See the details on that code in the post below to Thomas
> Richter (you're both wrong the same way):

Actually, you have yet to give a correct page and line number that shows what you claim. All you do is further show people how well read you are, yet how little you seem to comprehend. Why is that?

You have proposed a test for an arithmetic coder to put it in a bad light, yet even this simple test you can't seem to do with a coder of your own design. Why is that?

The fact that there are simple arithmetic coders that contain no gaps, at the end of the file or during the compression, shows that you know nothing about arithmetic compression. Quoting or misquoting a paper has little to do with reality when working code exists that shows you're wrong. What is it about simple, real world working code that you can't comprehend?

David A. Scott

1/10/2006 5:47:39 PM

> Even this simple test you can't seem to do with a coder of
> your own design. Why is that?

The QI source code (which has been public for a while now) and the compiled executable QI.exe which comes with it compress the 1,000,000 symbol input, alphabet size A=3, to a smaller size than any AC you can ever have (even in principle). I already posted on that a few messages earlier. You can run the program with the command line given there and see for yourself (and then look in the source and see how it did it). If you want a 10,000,000 symbol input, you need to change the upper limit in the header to 10,000,000, as described in that post (the default limit for the maximum number of symbols in the source was arbitrarily set to 1M symbols). If you want it to read from and write to a file, you can play with that, too.

That's why the code was released in the first place: so that anyone who is interested in specific questions which would require some work on my part that isn't already on my list of things to do can do it himself and find out the answer. There is also the 1stWorks Corp. contact email on the QI web page, where you can send any other requests for consideration.

--------
Earlier post on QI.exe coding performance on the 1,000,000 symbol (A=3) input:

http://groups.google.com/group/comp.compression/msg/ff1ee67d18b63f5a

1/10/2006 6:19:50 PM

"nightlight" <nightlight@omegapoint.com> wrote in
news:1136917190.294822.41540@g47g2000cwa.googlegroups.com:

> The QI source code (which was public for a while now) and the compiled
> executable QI.exe which comes with it, compress the 1,000,000 symbol
> input, alphabet size A=3, to smaller size than any AC you can ever have
> (even in principle). I already posted on that few messages earlier. You
> can run the program with the command line given there and see for
> yourself (and then look in the source and see how it did it). If you
> want a 10,000,000 symbol input, you need to change the upper limit in
> the header to 10,000,000, as described in that post (the default limit
> for max number of symbols in the source was arbitrarily set to 1M
> symbols). If you want it to read/write to a file, you can play with
> that, too.

Again you show your incredible lack of intelligence. I write bijective arithmetic coders. Even if your QI were any good (and the more useless rants you give, the more likely it seems that it's not very good), you don't have a basic understanding of arithmetic compression. This has been pointed out to you several times.

Let me try to explain it to you one more time. BIJECTIVE ARITHMETIC FILE CODERS EXIST. Therefore the set of compressed files is nothing but a reordering of every possible input file. There are no gaps, period. Again, since you seem not to comprehend the obvious: THERE ARE NO GAPS.

Suppose the problem is such that you make the BIJECTIVE ARITHMETIC CODER work only on files of type X, where in this case to be in X the file has to contain only the letters A, B, C. Let the output set be any possible file, and call that set Y. Then if x is any element of X and y is any element of Y, and the coding is bijective, that is

compress( uncompress (y) ) = y
and
uncompress ( compress (x) ) = x

for every possible x and y, then you have an optimal compressor by definition. Can you do this in a QI file compressor? The more you stall, the less likely it seems that you can.

The point is, even if you could write an optimal compressor for this class of files, at best it's a slightly different reordering, and that's only if you could write such a compressor. In terms of compression it will never always beat the equivalent bijective file compressor. It's not rocket science. I am sure that if you have normal people working for you, even they could tell you where you have gone wrong. Unless you would fire them for daring to question your great knowledge.

Again, this is not a real world test and you know it. But just for laughs, how small does it compress a 1,000,000 symbol input where A = 3? Is this a zero order static compression or not? You do know the difference, don't you?

And if you are only calculating the size of the index, it's not a fair comparison, since you yourself state that you needed two other numbers to go along with it: ONE, a count field for the number of inputs; TWO, the number of inputs where a one is used. So that's 3 fields. Do you skip this important info in your compression comparison with the highly modified arithmetic coder you chose to use?

Look, you modified a general arithmetic coder to color your comparisons. You stated a test for a real arithmetic file coder. You have been wrong about what arithmetic compression does, so are you now saying you don't have the time to do the test you yourself proposed? Do you really think most people here believe anything you're saying when you will not do a real test?

You must really think people are stupid if you claim you have test code, yet it somehow seems incapable of using real world files. How would you feel if we wrote the equivalent and changed your code to fit our needs, and then people bad-mouthed your code? I can tell you now, you would not like it. I don't like it when you bad-mouth arithmetic coding and then refuse to do a real file test. What are you hiding? Are you admitting in your own way that it does not compress very well? I would mod your code to work with real files, but then when it sucks you would scream, as we are screaming, that it was not done correctly.

Do you comprehend anything being said here, or is your mind so closed that you will not take the time to back up your own claims?

You can be glad I am not reviewing your stuff, since the way you present it, without a real test, I would reject it. You might have something, you might not, but it's clear you have not really tested it in a real world way.

David A. Scott

0 |

1/10/2006 7:31:52 PM

David A. Scott wrote:
> "Matt Mahoney" <matmahoney@yahoo.com> wrote in
> news:1136906071.456046.325470@g49g2000cwa.googlegroups.com:
>
> > Matt Mahoney wrote:
> >> David A. Scott wrote:
> >> > Matt if I can get nightlight to commit to coding his example of
> >> > the 3 symbol types. I would like to play again with fpaq0. To see
> >> > how much better it can be made with as little change as possible.
> >> > I like your style but I don't think I will go to the wall and make
> >> > it bijective. But the nine times for each eight can be changed to
> >> > eight for eight with a ninth only needed for the last byte.
> >>
> >> There is some room for improvement. I tried compressing 10,000,000
> >> bytes of random characters A, B, C. fpaq0 compresses it to 1,982,988
> >> bytes. The theoretical limit is 1/8 lg 3 = 1,981,203, a difference of
> >> 1785 bytes. For 1,000,000 bytes it compresses to 198,322 bytes, a
> >> difference of 201.7 bytes.
> >>
> >> -- Matt Mahoney
> >
> > I posted fpaq1.cpp to
> > http://www2.cs.fit.edu/~mmahoney/compression/#fpaq0
> > It is an improved order 0 arithmetic coder using 64 bit arithmetic. On
> > a 10MB file which repeats "ABC" it is 25 bytes over the theoretical
> > limit, and I believe most of this is due to approximations made by the
> > model early in compression.
> >
> > -- Matt Mahoney
>
> I find this very interesting. The fact your coding 10,000,000 zeros
> should at least add roughly 5.8 bytes to my answer. Instead they
> differ by 1 byte since I got 24 bytes 1,981,227 for what your claiming
> optimal at 1,981,203
>
> Just compiled and ran your code a 10,000,000 file that was
> ABCABC...ABCA note one extra A
>
> My code compress to 1,981,227 from what you showed was optimal
> of 1,981,203
>
> You code compressed to 1,981,232 which is only 5 bytes longer
> mine. So both I hope are doing what they should. I don't see
> how you say this new one is 25 bytes over isn't it 29 bytes over?

Oops, you're right, it's 29 bytes over. I also get 1981232 bytes.

For 1,000,000 bytes I get 198,145 bytes, which is 25 bytes over.

fpaq1 compresses files of all zero bytes as follows (input size -> compressed size, in bytes):
0 -> 1
1 -> 2
10 -> 5
100 -> 9
1000 -> 13
10^4 -> 17
10^5 -> 21
10^6 -> 25
10^7 -> 29
10^8 -> 34

Here is a hex dump of the 34 byte compressed file. I'm not sure where the small inefficiency is.

FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
FF FF FF FF FF FF FF FF FF FF FF FF FF FE 09 DE
BD F7

fpaq1 is actually worse than fpaq0 on 10^6 zero bytes:
fpaq0: 10^6 -> 17
fpaq1: 10^6 -> 25

but better on all 1 bits (10^6 FF bytes):
fpaq0: 10^6 -> 446
fpaq1: 10^6 -> 25

Also, Fabio Buffoni posted a version of fpaq0b that uses the 30 bit precision coder from paqar/paq6fb (carry counter and 1 bit at a time I/O). It also improves on fpaq0 using only 32 bit arithmetic.

-- Matt Mahoney
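
The "theoretical limit" figures quoted in this exchange follow from n * lg 3 / 8 bytes for n symbols drawn uniformly from a three-letter alphabet. A minimal sketch of that arithmetic, with log2(3) hard-coded and a hypothetical function name (limit_bytes_3sym) that does not come from fpaq0 or fpaq1:

```c
#include <stdint.h>

/* log2(3) to double precision, hard-coded so the sketch needs no math library. */
#define LOG2_3 1.584962500721156

/* Order-0 entropy bound, in whole bytes (rounded down), for n symbols drawn
 * uniformly from a 3-letter alphabet: n * log2(3) / 8.
 * Hypothetical helper for checking the figures quoted in the thread. */
uint64_t limit_bytes_3sym(uint64_t n)
{
    return (uint64_t)((double)n * LOG2_3 / 8.0);
}
```

For n = 10,000,000 this gives 1,981,203 bytes, so 1,982,988 is 1785 over and 1,981,232 is 29 over; for n = 1,000,000 it gives 198,120, so 198,145 is 25 over.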

1/10/2006 7:36:32 PM

Hi nightlight;

> ...
>
> P(n) = floor[p * P(n-1)] .... (1)
>
> ...
>
> Q(n) = Q(n-1) + floor [p(x<xn) * P(n-1)] .... (2)

My, is this a nitpicking discussion. I think after this post I got your idea, so I made this picture with a (im)possible development of the probabilities. I think you mean that the limited precision arithmetic coder does not code exactly on the probability of the given source symbol. You also say that, because there is no feedback correction of the modified probability, this quantization noise adds up as a result. I agree with that, and I hope I got it right in the picture. The blue one is the feedback quantizer, the green the unlimited precision quantizer, the red the AC and the yellow the QI. I didn't read your paper, so I don't know whether yours is maybe the blue one.

I agree with that (completely) only in the case of static modeling! In the case of adaptive modeling it is in fact not true that the exact probability is the best predictor. The difference between adaptive and static modeling is that static knows it all and adaptive modeling is guessing. The fixed precision quantized _guess_ in the adaptive case may be(come) _better_ than the infinite precision _guess_. So there are sources for an adaptively modeled AC where rounding down produces _smaller_ output. For example, when you always underestimate the wrongly predicted MPS.

Ciao
Niels

P.S.: the pic is at http://www.paradice-insight.us/pics/AC.png

1/10/2006 7:41:19 PM

"Matt Mahoney" <matmahoney@yahoo.com> wrote in
news:1136921792.785730.261850@g43g2000cwa.googlegroups.com:

> David A. Scott wrote:
[...]
>> You code compressed to 1,981,232 which is only 5 bytes longer
>> mine. So both I hope are doing what they should. I don't see
>> how you say this new one is 25 bytes over isn't it 29 bytes over?
>
> Oops, you're right, it's 29 bytes over. I also get 1981232 bytes.
>
> For 1,000,000 bytes I get 198,145 bytes which is 25 bytes over.
>
> fpaq1 compresses files of all zero bytes as follows:
> 0 -> 1
> 1 -> 2
> 10 -> 5
> 100 -> 9
> 1000 -> 13
> 10^4 -> 17
> 10^5 -> 21
> 10^6 -> 25
> 10^7 -> 29
> 10^8 -> 34
>
> Here is a hex dump of the 34 byte compressed file. I'm not sure where
> the small inefficiency is.
>
> FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
> FF FF FF FF FF FF FF FF FF FF FF FF FF FE 09 DE
> BD F7
>
> fpaq1 is actually worse than fpaq0 on 10^6 zero bytes.
> fpaq0: 10^6 -> 17
> fpaq1: 10^6 -> 25
>
> but better on all 1 bits (10^6 FF bytes):
> fpaq0: 10^6 -> 446
> fpaq1: 10^6 -> 25
>
> Also, Fabio Buffoni posted a version of fpaq0b that uses the 30 bit
> precision coder from paqar/paq6fb (carry counter and 1 bit at a time
> I/O). It also improves on fpaq0 using only 32 bit arithmetic.
>
> -- Matt Mahoney

First of all, this may be totally wrong, but it's my gut feeling. In your coder, X1 and X2 have to have the same leading bit patterns before you dump them out. Look at these two sets of data using 32 bit registers:

X1 0x7FFFFFFF
X2 0x80000000
difference 1, and nothing written out,

while

X1 0x12345678
X2 0x12345679
difference 1, and 3 bytes written out.

I am sure that in practice it's not usually that bad, and for the case where you did the three symbols nothing like the above popped up. But when you try all zeros or all ones it may pop up, and then all of a sudden you're actually using a lot less than 32 bits for the state registers.

David A. Scott
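
The X1/X2 observation above can be made concrete with a small sketch. It assumes, as the post describes, a carry-less coder that can only output the leading bytes on which its two 32 bit bounds already agree; the function name shared_leading_bytes is hypothetical, and the code is not taken from fpaq0 or fpaq1.

```c
#include <stdint.h>

/* Hypothetical helper: number of leading bytes (0..4) on which the two
 * 32 bit bounds x1 and x2 agree. Under the assumption stated above, these
 * are the only bytes such a coder could shift out at this point. */
int shared_leading_bytes(uint32_t x1, uint32_t x2)
{
    int n = 0;

    /* Compare the top byte; if equal, shift it out of both and repeat. */
    while (n < 4 && ((x1 ^ x2) & 0xFF000000u) == 0) {
        x1 <<= 8;
        x2 <<= 8;
        n++;
    }
    return n;
}
```

shared_leading_bytes(0x7FFFFFFF, 0x80000000) is 0 and shared_leading_bytes(0x12345678, 0x12345679) is 3, although both pairs differ by exactly 1.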

1/10/2006 8:18:01 PM

"Matt Mahoney" <matmahoney@yahoo.com> wrote in
news:1136921792.785730.261850@g43g2000cwa.googlegroups.com:

> David A. Scott wrote:
[...]
> fpaq1 compresses files of all zero bytes as follows:
> 0 -> 1
> 1 -> 2
> 10 -> 5
> 100 -> 9
> 1000 -> 13
> 10^4 -> 17
> 10^5 -> 21
> 10^6 -> 25
> 10^7 -> 29
> 10^8 -> 34
>
> Here is a hex dump of the 34 byte compressed file. I'm not sure where
> the small inefficiency is.
>
> FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
> FF FF FF FF FF FF FF FF FF FF FF FF FF FE 09 DE
> BD F7
>
> fpaq1 is actually worse than fpaq0 on 10^6 zero bytes.
> fpaq0: 10^6 -> 17
> fpaq1: 10^6 -> 25
>
> but better on all 1 bits (10^6 FF bytes):
> fpaq0: 10^6 -> 446
> fpaq1: 10^6 -> 25
>
> Also, Fabio Buffoni posted a version of fpaq0b that uses the 30 bit
> precision coder from paqar/paq6fb (carry counter and 1 bit at a time
> I/O). It also improves on fpaq0 using only 32 bit arithmetic.
>
> -- Matt Mahoney

Here is the result of arb255, first on a file of all zeros, then on a file of all 0xFF, each of which is 1,000,000 bytes long:

0000  E0 2C 30 99 A0 52 8F ED 1A 14 41 67 B1 4C 1B B5  *.,0..R....Ag.L..*
0010  EC 4A E7 25 C2 D8 60  .  .  .  .  .  .  .  .  .  *.J.%..`*
number of bytes is 23

0000  1F 2C 30 99 A0 52 8F ED 1A 14 41 67 B1 4C 1B B5  *.,0..R....Ag.L..*
0010  EC 4A E7 25 C2 C7 A0  .  .  .  .  .  .  .  .  .  *.J.%...*
number of bytes is 23

Note that only the first byte and the last few are different. The lack of a string of ones or zeros in the output is due to how I do the I/O: there is a mapping, with the hope of making it more stable, so you will not see a string of FFF or 000 for long repeats. Also, as an unexpected side effect, it would be better for a last-pass compression that is getting ready for an encryption pass.

In this case only your leading 0 for the 9 bits for each 8 caused the expansion, since in this case the free end was so large that it is as if we both wrote out your last 1 for the count. At least in the large register cases.

Note that even if alternate I/O is used, most people either break the interval so that a "one" is high or low, or they break the interval so that the most probable symbol is either high or low. For some reason I chose to do it totally differently, again so that it would compress better and, as a side effect, be better for the file if it is a last pass in compression before the encryption pass.

I feel strongly that 23 is most likely optimal, and that you should get, within a byte or so, the same length for fpaq0 whether it's all zeros or all ones. In your case it's as if it used too low a probability when doing the all ones and too high a probability when doing the all zeros. It must have something to do with the fact that you don't carry, and that for this case 32 bits without carry is not enough.

David A. Scott

1/10/2006 9:23:01 PM

>-- Willem wrote: > > Well, yeah. It is the 'always compresses better' that > he keeps harping on about that just simply isn't true. > And in his discussions, he keeps on demonstrating > this misunderstanding of his. > http://groups.google.com/group/comp.compression/msg/0f830d20dcd0ee50 There is no misunderstanding here. There are few frivolous ways in which "always compresses better" cannot obviously be true and which are not worth cluttering a discussion with by guarding against when talking with presumably informed participants. Hence, a statement "always compresses better" made in this newsgroup, where one can assume informed readers, should naturally be understood to include implicitly allowances such as "excluding these frivolus cases (a),(b),(c)... to which the statement does not apply". It appears that, having no argument against the substance of the claim, you have fallen to clinging onto some of these frivolus ways of circumventing "always". Let me review few of these. Say, we have agreed to a data set of 256 input samples that we wish to use to test the QI claims against. You can always "prove" the QI claims false using any the following "compression techniques" or "enhancements" of AC: a) You can write a QI work-alike (or if need be, just a clone, since the QI source is publicly available), which will create exactly the same output size as QI, at least on the given test. Hence it trivially cannot be true that QI "always compresses better" than any other entropy coder. b) You may insist that the sample inputs must be fed to the coders as "files", so that the OS will keep track of their lengths. Then you can "enhance" any existent AC (or generally any other compressor which assumes that its output must be a self-terminating code) by providing the AC decoder with the compressed length you obtained from the OS, which allows you to save approximately log(H) bits on the AC decode termination cost. 
If per chance the competitor doesn't follow the suit and "enhance" similarly his coder, or if he is entirely unaware that you are using this age old kind of "compressor helper", you've just pocketed a log(H) bits edge. c) You can compile the test set into your coder and decoder executables and then "compress" any of the 256 test inputs by simply transmitting to the decoder the 8 bit number telling it which of the samples to retrieve from the built in list. d) You can refine the technique (c), so that it is not as blatantly obvious and which doesn't take as much room. Instead of including all of the 256 test files into your coder, you just select one and make an "enhancement" on top of regular AC, so that for that selected input the "enhancement" outputs a single bit 0, which decoder uses to retrieve the built in pre-selected sample, and for the remaining 255 samples, it outputs 1 followed by the original output of the AC. You can further refine this, so it is harder to detect, by not storing the entire input sample, but just some section of it, and also by not going all the way to a single bit, but maybe few or some such. The AC with this "enhancement" will come out worse off by 1 bit on average, but at least it will "prove" the QI claim wrong. What's 1 little bit, compared to the great "victory". e) But even the "technology" (d), however subtle with its refinements compared to (c), still leaves some vulnerabilities. It still kind of looks like cheating, and one can get caught since there as an 'intenet' hiding in there. Plus it breaks down if the data set changes and you don't get chance to recompile for the new set. The breaktrough of the present method is to do a randomized version of (d), where you "enhance" the AC by letting a random number generator help select which of the input message patterns, which may be just small bit-string sections bound to occur by chance in any nontrivial test set, will get shorter codewords and by how much. 
As in (d), at some cost to the average performance, this kind of "compression technology" can "prove" the QI claim wrong, although it may take a much longer time than (d) to find an actual example "disproving" the claim. But, in return for the longer wait (possibly astronomically long) compared to (d), this "compression technology" doesn't require any test set to be given upfront, and it is "good" not just against QI but against any other genuine advance in the future, since it can always "beat them" at least on some inputs. And you virtually cannot get caught, since the deterministic intent of (d) has become a random fluctuation beyond human will or control.

From your latest line of arguments on 'random quantization' it would seem you have taken your final fallback position -- the "compression technology" of class (e), which is the hardened version of technique (d), which in turn was a subtler variant of method (c). The random generator is simply the set of discarded fractional parts of the AC 'range', which in turn can always be used to select a small, random fluctuation in the codeword lengths (at some cost to the average output size), and hence to implement the "compression technology" (e). Well, you're welcome to continue clinging onto that.
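The bookkeeping behind technique (d) can be sketched in a few lines of C. This is only an illustration under stated assumptions: the names `wrapped_len` and `wrapped_total` are hypothetical (they appear in neither coder's source), code lengths are counted in whole bits, and passing `favored == n` is used here to mean "the built-in sample is not in this data set".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: output length after the technique-(d) wrapper.
   The favored input is sent as the single flag bit 0; every other
   input is sent as flag bit 1 followed by its original AC output. */
static unsigned wrapped_len(unsigned base_len, int is_favored)
{
    return is_favored ? 1u : base_len + 1u;
}

/* Total wrapped output size over a data set of n inputs.
   favored is the index of the built-in sample, or n if the
   built-in sample does not occur in this data set. */
static unsigned long wrapped_total(const unsigned *base_len, size_t n,
                                   size_t favored)
{
    unsigned long total = 0;
    assert(base_len != NULL);
    for (size_t i = 0; i < n; i++)
        total += wrapped_len(base_len[i], i == favored);
    return total;
}
```

With base lengths of 100, 90, 120 and 80 bits, the wrapper wins on the favored input (1 bit instead of 100) and loses exactly one bit on each of the others; on a data set that does not contain the favored input it loses one bit per input, which is the "worse off by 1 bit" cost mentioned above.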

0 |

1/11/2006 8:12:00 AM

nightlight wrote:
)>-- Willem wrote:
)>
)> Well, yeah. It is the 'always compresses better' that
)> he keeps harping on about that just simply isn't true.
)> And in his discussions, he keeps on demonstrating
)> this misunderstanding of his.
)> http://groups.google.com/group/comp.compression/msg/0f830d20dcd0ee50
)
) There is no misunderstanding here. There are few frivolous ways in
) which "always compresses better" cannot obviously be true and which are
) not worth cluttering a discussion with by guarding against when talking
) with presumably informed participants. Hence, a statement "always
) compresses better" made in this newsgroup, where one can assume
) informed readers, should naturally be understood to include implicitly
) allowances such as "excluding these frivolous cases (a),(b),(c)... to
) which the statement does not apply".

The statement 'always compresses better' is simply false as such, and should be replaced by 'compresses better on average'. I have made clear from the start that this was my argument.

Your statement was and is based on a misunderstanding of how AC works. It's as simple as that.

If you had read this newsgroup for longer instead of immediately starting to post, as is described in the netiquette guidelines, you would have realised that 'no compressor always outperforms another' is one of the basics here, and as this is what your statement blatantly goes against, this caused everyone to fall over you.

The wise move would be to learn from this, instead of clinging to your claim and going up against everyone.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

0 |

1/11/2006 8:31:09 AM

> The statement 'always compresses better' is simply false as such,
> and should be replaced by 'compresses better on average'.

The statement "better on average" does not capture the essential relation between the two coders. They are almost twins regarding the coding efficiency and the method of enumerating messages, at least when everything else is set the same (such as coding in decrementing AC mode), except that QI quantization is optimal, while AC quantization is sub-optimal. Hence, aside from the few frivolous loopholes, the factual implication is that QI will always compress better than AC, simply because its addends, which are the same between the two except for a scaling factor, expand more slowly.

The AC's truncated infinite fractions (along with the top-down enforcement of the Kraft inequality constraints), which don't exist in the integer EC/QI formulation { whether you leave them as code space gaps arising from the excess contraction of Pc or shift the burden randomly to the cost of skewed coding probabilities, using the discarded fractions as the random number generator to fluctuate probabilities, losing on average but getting around "always" }, make the relation in compression efficiencies entirely one-sided (not to mention speed, where the main practical difference is). When AC is set to code optimally on average, QI will always produce smaller output. That's why "always" [except the trivial] is a proper characterization. That you can trade AC optimality to get around "always" is a frivolous observation, as illustrated in the listing of a few such "techniques".

0 |

1/11/2006 9:13:26 AM

Hi Again,

> > No. You are confused. If you always round down *interval
> > boundaries*, you do not round down *interval sizes*
> > because the topmost interval then gets larger.

> Sorry, I gave you the wrong page & the wrong line number (that was for
> similar code in the decoder).

> Take a look in (41a) p. 47, the formula in the 1st line with asterisk.

Look, I prefer to look at working code, and I know what it does.

> Note that at the top he initializes value P(0) to 2^F (where F is the
> mantissa precision in bits), i.e. the quantity P is the mantissa of the
> actual Pc, meaning the initial width is 1.00.... The fraction dropped
> in (1) is gone from P(n) which is to be used on next step. There are no
> other updates of P(n). It simply lost the fraction at its edges
> irreversibly, no condition, no compensation for it anywhere else.

You are confused. The value P(n) is *nowhere* stored in a sane AC encoder; it is implicit in the size of the interval used to encode the symbol n.

> The compensation occurs in the next line, where he calculates the new
> lower boundary point of the interval, Q(n) as (again streamlined, see
> paper):

> Q(n) = Q(n-1) + floor [p(x<xn) * P(n-1)] .... (2)

> where now p(x<xn) is conditional cumulative probability for symbols
> ahead of xn. For example (this is binary coder) if xn=0, then (2)
> doesn't add anything since no other symbol x is smaller than xn. If
> xn=1 then (2) adds to the lower boundary Q(n-1), the product:
> p(0)*P(n-1), where p(0) is probability of 0.

> The same conclusion follows from the basic accounting of the precisions
> and buffers used by AC: there are two real numbers < 1.0 in the coding
> loop whose precision grows indefinitely for the unlimited precision AC:

> a) the Pc, which is the product of all probabilities of the symbols
> encountered in the input,

No. Pc is the result of applying some kind of modelling of the input. This *need not* relate to the relative frequencies of the symbols found in an actual sequence.
A static model would have Pc *fixed* once and for all.

> b) the cumulative probability Qc (which is the exact full precision
> rational number < 1.0; note that the Q(n) in (2) is just an integer
> representing the trailing bits of Qc as seen from the current AC
> window).

What is typically kept as a model is Qc, and not Pc. If you cannot keep Qc to full precision, you clearly lose coding efficiency because the "model" implied by the Qc no longer fits the model you intended (i.e. the coder no longer encodes the proposed model optimally, but rather another model). But nevertheless, there are no gaps. If you follow the formula (2) closely, you'd see that for the "topmost" symbol the update rule says that the upper boundary of the coding interval stays constant, whereas the lower boundary is updated and "rounded down", making the interval larger than it should be, and thus making Pc larger than it should be.

This means a discrepancy between model and implementation, but no gaps.

> The Qc is the content of the AC's encoded output. The precision of that
> number does grow indefinitely, since it is the output itself. The other
> number, also large with unlimited precision AC, the message probability
> Pc does not grow indefinitely.

Neither Qc nor Pc have infinite precision in a realistic implementation.

> Its precision is reduced in each step in
> (1). Its fraction beyond the F bits precision is discarded
> unconditionally and irreversibly after each new symbol -- in (1) we
> multiply previous Pc with the probability of the new symbol,

No, *that* is a model update. You *can* do that, but there's no need to drive the model like this.

> and
> truncate the result to F significant bits. If you are saying that at
> the end the AC has somehow also computed the product Pc in the full
> precision, then where is it stored?

I never stated that Pc is kept in infinite precision. I stated that there are no gaps. In fact, Pc is *nowhere* stored.
Instead, high and low interval counts are stored.

> That number is _independent_ of the
> value Qc, hence you would need a second number of indefinite length,
> besides Qc, which would be some kind of second work buffer of the size
> of output. There is no such buffer in AC.

So then, this doesn't prove that there are gaps. It only proves that AC cannot implement all possible models. That is true in the first place, since the Qcs (and thus the Pcs) are quantized by the precision limitation.

So long,
Thomas

0 |

1/11/2006 9:16:09 AM

Hi again,

> I gave you above the places to check, which are not some "poorly
> written AC source" but the well known reference implementations of the
> AC coders. So, look Moffat98 or WNC87 source or the reference [41a]
> above which shows it quite clearly how the Pc is updated.

Apparently, you don't read the sources correctly.

> You are welcome to show a coder which doesn't round down the size of
> the updated total range on every symbol. (It still has to be able to
> decode, though.)

Oh my, that should be your job. Ok, so here we go. The following is code from a real, existing arithmetic coder that students of mine wrote. It works, is decodable and has no gaps, and it "doesn't round Pc down". I added comments:

void ArithCod::Encode(UWORD low_count,
                      UWORD high_count,
                      UWORD total)
//
// This encodes a symbol i where high_count is the (scaled) probability
// of finding a symbol with index smaller or equal than the current one,
// and low_count is the (scaled) probability of finding a symbol whose index
// is strictly smaller than the current. total is the scaling factor.
// Specifically, low_count/total = Q[n-1], high_count/total = Q[n] in
// your notation. low_count for the lowest symbol is therefore zero,
// high_count for the topmost (last) symbol in the alphabet equals total.
// m_Low and m_High are the (scaled) borders of the coding interval.
{
  // compute scaling factor
  ULONG step = (m_High-m_Low+1)/total;

  // scale upper and lower interval borders
  m_High = m_Low + step*high_count - 1;
  m_Low  = m_Low + step*low_count;

  // This is the update step. Now what happens for the first symbol
  // of the alphabet: low remains constant, high is scaled and
  // due to the finite precision of "step" rounded down.
  // For the last symbol, step * high_count = m_High - m_Low + 1
  // by a simple identity, thus m_High stays constant and m_Low
  // is rounded down. -> The implied probability grows.
  // For all other symbols between, m_Low of the symbol n scales
  // as m_High of the symbol n-1. (Compute!)
  // Thus, no coding gaps and the claim that Pc is always rounded
  // down is refuted.
  //
  // ensure that m_High and m_Low are not in the same half
  // nb: here we generate the output bits!
  while ((m_High & m_Half) == (m_Low & m_Half)) {
    m_Stream.Put(m_High & m_Half); // argument cast to bool
    while (m_UnderflowBits > 0) {
      // output saved underflow bits
      m_UnderflowBits--;
      m_Stream.Put(~m_High & m_Half);
    }

    // before scaling  | after scaling | output bit
    // ================+===============+============
    // m_Low:  00xy... | 0xy...        | 0
    // m_High: 00ab... | 0ab..1        |
    // or
    // m_Low:  01xy... | 0xy...        | 1
    // m_High: 01ab... | 0ab..1        |

    // m_Half is the representation of 0.5 in the precision
    // of the coder, namely 0x80000000 in the current implementation
    m_Low &= ~m_Half;  // strip off 2nd MSB (we use only 31 bits!)
    m_Low <<= 1;

    m_High &= ~m_Half; // strip off 2nd MSB (we use only 31 bits!)
    m_High <<= 1;
    m_High |= 1;
    // Here low and high are updated and scaled.
  }

  // prevent underflow if m_Low and m_High are near to m_Half
  // This is the resolution of the carry-over problem.
  while ((m_Low & m_Quarter) && !(m_High & m_Quarter)) {
    m_UnderflowBits++;

    // before scaling   | after scaling
    // =================+==============
    // m_Low:  001xy... | 00xy...
    // m_High: 010ab... | 01ab..1

    m_Low &= ~m_Quarter;
    m_Low <<= 1;

    m_High &= ~m_Half; // strip off 2nd MSB (we use only 31 bits!)
    m_High <<= 1;
    m_High |= 1|m_Half;
  }
}

So, that's it. Now, where is the "always rounds down" part you claim to have, and where are the coding gaps?

So long,
Thomas
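The "(Compute!)" in the comments of the update step can be taken literally. The following is a minimal sketch, not part of the posted coder: it isolates only the two lines that scale m_High and m_Low, and the names `Interval`, `subdivide` and `uncovered` are hypothetical. It assumes C integer division, which truncates. It lets one check that neighbouring sub-intervals abut with no holes between them, and that the sub-interval of the topmost symbol ends at the old upper border exactly when total divides the interval width; otherwise width % total code points at the top are not assigned to any symbol by this formula.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the (m_Low, m_High) pair of the coder. */
typedef struct { uint32_t low, high; } Interval;

/* The update step in isolation:
   step = (high - low + 1) / total;
   high = low + step*high_count - 1;  low = low + step*low_count. */
static Interval subdivide(Interval cur, uint32_t low_count,
                          uint32_t high_count, uint32_t total)
{
    assert(total > 0 && cur.high >= cur.low && high_count <= total);
    uint64_t width = (uint64_t)cur.high - cur.low + 1;
    uint64_t step = width / total;          /* integer division truncates */
    Interval next;
    next.high = (uint32_t)(cur.low + step * high_count - 1);
    next.low  = (uint32_t)(cur.low + step * low_count);
    return next;
}

/* Code points at the top of cur that no symbol's sub-interval covers:
   width - step*total, which equals width % total. */
static uint32_t uncovered(Interval cur, uint32_t total)
{
    assert(total > 0 && cur.high >= cur.low);
    uint64_t width = (uint64_t)cur.high - cur.low + 1;
    return (uint32_t)(width % total);
}
```

For an interval of width 100 and total = 8, step is 12, the three symbols with cumulative counts 0, 3, 5, 8 receive [0,35], [36,59] and [60,95], and the points 96..99 are left over; for width 96 the top symbol ends exactly at the old upper border. How much such a remainder matters relative to a 31-bit interval is part of what this thread is arguing about; the sketch only makes the arithmetic checkable.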

0 |

1/11/2006 9:26:34 AM

>> Take a look in (41a) p. 47, the formula in the
>> 1st line with asterisk.
>
> Look, I prefer to look at working code, and I know
> what it does.

I agree that code is the ultimate judge, but only of its own implementation. The mathematical formulation given in [41a] captures more general properties, in particular the excess in the contraction of Pc due to the reduced precision used for Pc computations by any finite precision AC (which was the point of contention). The reduction in Pc in turn leads to the excess in output bits.

>> a) the Pc, which is the product of all probabilities
>> of the symbols encountered in the input,
>
> No. Pc is the result of applying some kind of
> modelling of the input.

You are mistaking Pc for the "probabilities for the next symbol x", p(x). The latter is the result of the model computations and need not be constant or related to the frequencies. The p(x) is generally a conditional probability of the form p(xn|<x1..xn-1>), i.e. p(x) depends in an unrestricted way on the entire preceding sequence. The p(x) is also a "coding probability", but only for symbol x with a given preceding string.

But the AC's total coding probability of the entire message, labeled here as Pc, and which enters the AC codeword length formula:

L = ceiling(log(1/Pc)) + 1 .... (1)

is a plain arithmetic product of all these p(x|<>) values for the symbols x encountered along the way. That includes the most general case for any probabilistic model. Check, for example, [41a] p. 42, where he describes the Elias coder and uses Pc (the expression in the 2nd line of the 2nd bullet; my Pc is denoted there as P(x1^n)).

>> There are no other updates of P(n). It simply
>> lost the fraction at its edges irreversibly,
>> no condition, no compensation for it anywhere else.
>
> You are confused. The value P(n) is *nowhere*
> stored in a sane AC encoder, it is implicit by size
> of the interval used to encode the symbol n.

No confusion above.
I was talking there about the mathematical formulation of the algorithm from the cited thesis. Hence, except for your 1st sentence, the rest is consistent with what I said.

>> ...on Qc, eq (2)...
>
> But nevertheless, there are no gaps.
> If you follow the formula (2) closely, you'd see that for
> the "topmost" symbol the update rule says that the upper
> boundary of the coding interval stays constant, whereas the
> lower boundary is updated and "rounded down", making the
> interval larger than it should, and thus making Pc larger
> than it should.

You can't make the entire interval larger (unless decodability is not required). You can only partition the entire interval non-proportionately to the probabilities of the individual symbols. But the size of the entire interval at that point is Pc(n-1), when the n-th symbol xn is being encoded, and that can only be reduced or left the same if you are to comply with decodability. And as the 1st asterisk line shows, that value is reduced on each step by truncation. Left at that, this would result in coding interval gaps (or code space gaps) and the excess in output.

These same types of code space gaps occur in QI, whenever the rounding up increments the mantissa of the next higher order binomial.

It is a trivial observation that one can take any code with "code space gaps", meaning its Kraft inequality sum adds up to less than 1, and remove the gaps by shortening some codewords, to increase the sum until it reaches exactly 1. In that sense, debating whether there are "gaps" or not is vacuous, since any code with gaps can be transformed into a gapless code. An AC coder may choose to remove gaps, e.g. by using the discarded fractional bits of the Pc as an equivalent of a random number generator, to randomly pick which codeword lengths to shorten in order to fill the gaps.
The Moffat98 coder is an example of such a procedure, which in each step biases up (relative to exact AC) the coding probability p(MPS) of the most probable symbol (MPS) and down the p(LPS) of the less probable symbol, while keeping their sum fixed. That does avoid the rounding down of the Pc shown in [41a]. But that doesn't preclude excess contraction of Pc. It merely pushes it to the renormalization procedure, where they feed 0s into the lower bits of their full range R, the mantissa of Pc (until it reaches at least a "Quarter" of the full AC window, 32 bits). When coding the MPS, whose sub-range R(MPS) was expanded by rounding, that feeding of 0s doesn't help or hurt as far as R(MPS)'s closeness to the exact arithmetic value xR(MPS) is concerned. But for the subrange R(LPS), which is already smaller than its exact counterpart xR(LPS), the feeding of 0s during normalization is precisely the rounding down of the Pc shown in [41a]. Since R(LPS)<R(MPS) (by definition), the requests for the normalization of R, which get triggered when R drops below Quarter, will be triggered more often by the LPS encoding than by the MPS encoding, the net result of which is the excess contraction of Pc, only done at a different point in the coding loop than that shown in [41a].

A side-effect of the Moffat98 choice is that their coder introduces random fluctuations in codeword lengths, which are biased to kick up the codeword lengths of the LPS and down those of the MPS. That type of random small deviation in probabilities, dp and dq, from the "optimum" p and q (for the given model) will generally increase the output size as dp^2/p + dq^2/q, which can be large when p or q -> 0. Hence, while one can trivially get rid of the gaps (which the explicit form of contraction of Pc shown in [41a] introduces), doing it via random fluctuations driven by the discarded bits of the Pc as a pseudorandom number generator will normally cost in increased average code length.
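The dp^2/p + dq^2/q scaling is the standard second-order expansion of the per-symbol redundancy of coding a binary source (p, q) with the perturbed probabilities (p+dp, q+dq). A short derivation, assuming dq = -dp (the sum of the two probabilities is kept fixed) and |dp| small compared with both p and q:

```latex
\begin{align*}
\Delta L &= p\log_2\frac{p}{p+dp} + q\log_2\frac{q}{q+dq}, \qquad dq = -dp \\
         &= -\frac{1}{\ln 2}\left[\,p\ln\!\left(1+\frac{dp}{p}\right)
            + q\ln\!\left(1+\frac{dq}{q}\right)\right] \\
         &\approx -\frac{1}{\ln 2}\left[\,dp - \frac{dp^2}{2p}
            + dq - \frac{dq^2}{2q}\right] \\
         &= \frac{1}{2\ln 2}\left(\frac{dp^2}{p} + \frac{dq^2}{q}\right)
            \quad\text{bits per symbol.}
\end{align*}
```

The first-order terms cancel because dp + dq = 0, so the excess is always non-negative to this order, and the 1/p and 1/q factors are what make it large when either probability approaches 0.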
>> The precision of that number does grow indefinitely, since
>> it is the output itself. The other number, also large
>> with unlimited precision AC, the message probability Pc
>> does not grow indefinitely.
>
> Neither Qc nor Pc have infinite precision in a
> realistic implementation.

The Qc, which is the full value (as a rational number < 1) of the interval base, grows as large as the output. That is the output (with the redundant "0." omitted). The Pc only determines the length L of the output via eq. (1). Since the computed value Pc is constrained not to increase above its exact value, any reduction in its precision requires Pc to be kept on the safe side of the decodability constraint, which means farther away from the exact value (which is optimal relative to the model being used).

>> Its precision is reduced in each step in
>> (1). Its fraction beyond the F bits precision is
>> discarded unconditionally and irreversibly after
>> each new symbol -- in (1) we multiply previous Pc
>> with the probability of the new symbol,
>
> No, *that* is a model update.

Adding one more label to a specific arithmetic procedure doesn't change anything. I was describing what happens in terms of the arithmetic operations and bits discarded in Pc, which is correct: Pc is kept within F significant bits of precision, and the rest of its bits (it gains approx. F bits on each multiply with p(xn)) simply get discarded on each coding step. Call it "model update" or anything else you wish; the arithmetic effect is a loss of precision of Pc. Since Pc is not allowed to grow (lest it violate decodability), the only place the loss of precision can go is down, which is the excess contraction of Pc, shown in [41a] as an explicit truncation, or done in a roundabout and spread-around way, during normalization, in Moffat98, with its biased codeword length fluctuations.
>> If you are saying that at
>> the end the AC has somehow also computed the
>> product Pc in the full precision, then where
>> is it stored?
>
> I never stated that Pc is kept in infinite precision.
> I stated that there are no gaps. In fact, Pc is *nowhere*
> stored. Instead, high and low interval counts are stored.

Pc is being computed, in effect, as a floating point number with a max F bit mantissa. The AC just doesn't keep track of its exponent, using instead kludgy ad hoc rules for scaling (which amounts to a roundabout way of keeping a 2 bit exponent modulo 4, stored as the Quarters position of R in the AC window), and it allows the mantissa width to fluctuate a few bits below F. The SWI formulation of AC coding, as in EC/QI, with an explicit exponent, makes the procedure much cleaner and more logical (and probably a bit faster than the branching around kludgy Quarters and zooming; I didn't implement it, though, so maybe it won't turn out faster).

> So then, this doesn't prove that there are gaps.
> It only proves that AC cannot implement all possible
> models.

It proves that there is a mathematical formulation of the AC coder algorithm which shows explicit creation of the gaps arising from the excess in contraction of Pc (resulting from the explicit truncation of Pc after the multiplication with p(xn)). One can do it differently and explain the net contraction of Pc some other way, e.g. by adjusting coder variables so that it looks as if it wasn't a lost precision but a "random model fluctuation" which accounts for the deviation of Pc from its exact counterpart Pcx. It's a kind of "explanation" that reminds me of a kid who trips and falls while a parent is watching, and then starts pretending to be looking for something on the floor, saying: I didn't really trip and fall, I actually meant to come down here to look for something.
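The "floating point number with a max F bit mantissa" picture, together with the length formula (1), can be sketched in integer arithmetic. This is a toy model of the truncation as described above for the formulation in [41a], not of Moffat98 or of any other actual coder; the type `Prob` and the functions `mul_trunc`, `message_prob`, `prob_le` and `code_len` are hypothetical names, symbol probabilities are assumed to be given as num/2^bits, and the example is kept small enough that nothing overflows 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy representation of Pc: value = mant * 2^(-exp2). */
typedef struct { uint64_t mant; int exp2; } Prob;

/* Multiply pc by num/2^bits, then keep at most F significant bits of
   the mantissa; the dropped low bits are discarded (rounding down). */
static Prob mul_trunc(Prob pc, uint32_t num, int bits, int F)
{
    assert(F >= 1 && F <= 62);
    Prob r = { pc.mant * num, pc.exp2 + bits };
    while (r.mant >> F) {       /* more than F significant bits left */
        r.mant >>= 1;           /* drop the lowest bit               */
        r.exp2 -= 1;
    }
    return r;
}

/* Coding probability of a whole message of n symbols. */
static Prob message_prob(const uint32_t *num, size_t n, int bits, int F)
{
    Prob pc = { 1, 0 };         /* initial width 1.0 */
    for (size_t i = 0; i < n; i++)
        pc = mul_trunc(pc, num[i], bits, F);
    return pc;
}

/* a <= b, assuming b.exp2 >= a.exp2 and no overflow in the shift. */
static int prob_le(Prob a, Prob b)
{
    return (a.mant << (b.exp2 - a.exp2)) <= b.mant;
}

/* Eq. (1): L = ceiling(log2(1/Pc)) + 1, for small toy values only. */
static int code_len(Prob pc)
{
    int n = 0;
    while ((pc.mant << n) < ((uint64_t)1 << pc.exp2))
        n++;
    return n + 1;
}
```

For the four-symbol message with probabilities 3/16, 13/16, 13/16, 3/16, a 62-bit mantissa gives the exact product 1521/2^16, while F = 4 gives 10/2^9, which is smaller. In this tiny example both values still round to the same L = 7 in (1); the point of the sketch is only that the truncated Pc can never exceed the exact one.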

0 |

1/11/2006 12:02:40 PM

> So, that's it. Now, where is the "always rounds down" part you claim
> to have, and where are the coding gaps?

As explained with the Moffat98 coder in the previous post, you can easily avoid the "always" part (but not the net reduction in Pc; just follow it through the normalization and check in which case the shifting in of 0s occurs more often), but only at the expense of paying an extra cost on the expected codeword lengths. That is a special case of a general tradeoff in coding, and if you proceed just a few more steps in that direction, you will be able to "beat" any coder on any message that it encodes in more than one bit. Of course, you will be coding with an average excess of 1 bit.

0 |

1/11/2006 12:27:55 PM

"nightlight" <nightlight@omegapoint.com> wrote in news:1136967120.140527.123410@g49g2000cwa.googlegroups.com:

>
> b) You may insist that the sample inputs must be fed to the coders as
> "files", so that the OS will keep track of their lengths. Then you can
> "enhance" any existent AC (or generally any other compressor which
> assumes that its output must be a self-terminating code) by providing
> the AC decoder with the compressed length you obtained from the OS,
> which allows you to save approximately log(H) bits on the AC decode
> termination cost. If per chance the competitor doesn't follow the suit
> and "enhance" similarly his coder, or if he is entirely unaware that
> you are using this age old kind of "compressor helper", you've just
> pocketed a log(H) bits edge.
>
>

Yes, we here like to use files so the con man can't play games. Especially those who claim they have some method that always compresses better.

I guess this is as close as you will come to admitting not only that your statement about always compressing better was flat wrong, but also that you can't at this point in time get QI to even code better than an arithmetic coder on your own suggested test, which you wrongly assumed arithmetic would fail. You failed because even though you can quote texts you fail to understand them, and you failed to realize one can code arithmetic without gaps. I suspect at this point in time you realize people here are not as dumb and as easily bullied as your workmates, who don't want to waste the time to try to educate you.

Yes, it is slick how an AC coder suits itself to what you are calling "" this age old kind of "compressor helper" "". What I see is someone crying that his code is not that good, and so attempting to put real code down by calling it "age old". That's a real laugh. Do you have any references to this so-called age-old method, or are you just pissed that you don't know how to use what you so nicely call an "age old" method?
First of all, I doubt it's "age old", but I see why you claimed it was.

If you pay me enough cash I could fix your coder to work properly without gaps, at least for small files. Can you do that, since it's an "age old" method? It shouldn't be hard if you have any real programming experience. But in the end, even if you do get it working, it can't be better than an optimal bijective arithmetic coder. It might be faster, but it will not be a better entropy compressor; to think so is rather foolish.

My guess is that you actually tried to beat the test you want people to write an arithmetic coder for, and realized that you can't beat it. Why should you write code where the common man can actually test your code? You seem to be some sort of control freak and want people to honor you with actually running real world tests. Maybe you can do that at work, but here many will just laugh at you. Or laugh along with you.

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"

0 |

1/11/2006 3:19:42 PM

> Mei is this a nitpicking discussion. I think after this post
> I got your idea, so I did this picture with a (im)possible
> development of the probabilities.

It is a bit nitpicky, although that makes everyone go over the things they imagined they knew. It is a good refresher for the old thoughts, and it brings in new thoughts which none of us would have thought of on our own. These kinds of "nitpicky" arguments take on a life of their own.

> I agree with that, in the picture I hope to got it right. The
> blue one is the backfeed-quantizer, the green the unlimited
> precision-quantizer, the red the AC and the yellow the QI.
> I didn't read the paper of you so I don't know if yours is
> not maybe the blue one.

How was that graph made? Did you measure some errors? Interesting observation, in any case. Indeed, even though QI provides the optimum quantization of enumeration at any given _fixed_ coder precision, that is not the best one can do in principle, since the fixed precision is a mere programming convenience.

In the QI source kit, the radix and the binary entropy coders use different kinds of quantization, with the radix & permutation coders being more optimal since they perform "delayed rounding" within any one quantized power of radix (see [T3] p. 7). That would be roughly equivalent to a binary coder only quantizing every m-th row, with m up to 32 (see [T3] p. 9, N2) and reconstructing the m-1 rows in between on the fly, without additional quantization. As a result, the redundancy of the radix & permutation coders is even lower than that of coding the same input using a regular multi-alphabet coder (which uses the regular binary coder as its sub-task coder). When D. Scott posted his "nightlight challenge" I did run his alpha=3 array setup on a million digit array, using the sample program QI.exe from the QI source kit, and the redundancy was almost undetectable, about 1.6e-10 bits/digit.
The binary coder would have had about 5e-8 redundancy on a similar size input (both still much smaller than AC's redundancy). That test was described in the post:

http://groups.google.com/group/comp.compression/msg/ff1ee67d18b63f5a

> So there are sources for an adaptive modeled AC where rounding
> down produces _smaller_ output. For example when you always
> underestimate the wrongly predicted MPS.

The Moffat98 AC uses quantization which as a side-effect systematically skews the prediction in favor of overpredicting the MPS and underpredicting the LPS. That is a good feature if you're tuning the coder for benchmarks on stationary Bernoulli sources, since the coder locks into the stationary distribution quicker. It does cost it a bit on the average redundancy, which is above what it could be. It is also a hard-wired "feature" in the very center of the arithmetic of its coding, not something that a modeler can control, in case that is not how it wishes to skew the probabilities. It is a bad idea, in my view, for a coder to take it upon itself to make such a decision on how to bias the codeword lengths so systematically.

QI Source & preprints are at:
http://www.1stworks.com/ref/qi.htm

0 |

1/11/2006 3:55:26 PM

> I guess this is as close as you will come to not only admiting
> that your statement about always compressing better was flat wrong.

As explained in that post, I assumed the participants were informed enough that I need not bother spelling out what I thought was well understood by everyone. That post was about the wrong assumption, not about the wrong statement. The statement was and is fine, under the conditions explained in the post. Although, as Willem helpfully noted, comp.compression isn't a place where one ought to make it flat out in such a form, due to particular sensitivities in here. That he may be right about. I do have a bit of a wooden ear, or so I am told every now and then, for these kinds of vibes and social cues.

0 |

1/11/2006 4:19:07 PM

Hi,

> >> a) the Pc, which is the product of all probabilities
> >> of the symbols encountered in the input,
> >
> > No. Pc is the result of applying some kind of
> > modelling of the input.

> You are mistaking Pc with the "probabilities for the
> next symbol x" p(x). The latter is the result of the
> model computations and need not be constant or related
> to the frequencies.

There's no "mistaking" here; modelling the input means that you imply some model that generated the sequence, and thus it is a probability.

> The p(x) is generally a conditional
> probability of the form p(xn|<x1..xn-1>) i.e. p(x)
> depends in unrestricted way on the entire preceding
> sequence.

We are in agreement here, no need to discuss.

> But the AC's total coding probability of the entire
> message, labeled here as Pc, and which enters
> the AC codeword length formula:

> L = ceiling(log(1/Pc)) + 1 .... (1)

> is a plain arithmetic product of all these p(x|<>)
> values for the symbols x encountered along the way.

Properly rounded/approximated, though.

> That includes the most general case for any probabilistic
> model. Check, for example [41a] p. 42, where he
> describes Elias coder and uses Pc (the expression
> in the 2nd line of the 2nd bullet, my Pc is denoted
> there as P(x1^n)).

I'm still not arguing here. If Pc is the probability of the output message, then fine (I had believed it was something different in your example), but Pc itself is nowhere apparent in the implementation, and thus need not be kept. What *is* quantized are the terms that contribute to the Pc --- no arguing about that --- but still in a way such that there are no coding gaps. That is, the individual estimations an AC model can make are *different* from what the original model intended due to quantization, and by that, you get an inefficiency I do not deny. What I *do* deny is that there is unused code space due to "round down".

> > But nevertheless, there are no gaps.
> > If you follow the formula (2) closely, you'd see that for > > the "topmost" symbol the update rule says that the upper > > boundary of the coding interval stays constant, whereas the > > lower boundary is updated and "rounded down", making the > > interval larger than it should, and thus making Pc larger > > than it should. > You can't make the entire interval larger, unless decodability > is not required). You can only partition the entire > interval non-proportionately to the probabilities of the > individual symbols. Yes. > But the size of the entire interval > at that point is Pc(n-1), when n-th symbol xn is being > encoded, and that can only be reduced of left same if > you are to comply with decodability. No, the size of the interval you divide and you quantize into is Pc(n-1) * 2^m where m is the number of scaling steps performed so far (if you scale by doubling, that is): This is the interval which defines the coarseness of the "quantization" of the model probabilities. > And as the 1st > asterisk line shows, that value is reduced on each step > by truncation. Left at that, this would result in coding > interval gaps (or code space gaps) and the excess in output. No, and no. You make the coding interval smaller, but by making it smaller, the scaling of the sub-intervals into the full interval space gets coarser, and thus the quantization of probabilities coarser, and this until the code space gets re-scaled. However, the *subdivision* of the full interval into the sub-intervals for each symbol is always such that the entire coding interval gets used. Thus, no gaps here. > These same types of code space gaps occur in QI, whenever > the rounding up increments the mantissa of the next higher > order binomial. > It is a trivial observation that one can take any code > with "code space gaps", meaning its Kraft inequality sum > adds to less than 1, and remove the gaps by shortening > some codewords, to increase the sum until it reaches > exactly 1. 
In that sense, debating whether there are > "gaps" or not is vacuous, since any code with gaps > can be transformed into gapless code. Statement: The presented AC has no gaps. This holds for quite a lot of AC implementations, including Moffat's and Nelson's. It does not hold for ELS. > AC coder may > choose to remove gaps, e.g. by using the discarded > fractional bits of the Pc as an equivalent of a random > number generator, to randomly pick which codeword lengths > to shorten in order to fill the gaps. There are no bits of Pc discarded, simply because there's nothing to discard. Pc is never kept/represented in an AC coder. It gets represented/computed step by step. > Moffat98 coder is example of such procedure, which in each > step biases up (relative to exact AC) the coding probability > p(MPS) of the most probable symbol (MPS) and down p(LPS), > of the less probable symbol, while keeping their > sum fixed. You are talking about a *different* thing here. The AC coder does not care about p(LPS) and p(MPS) and there is no interpretation of probabilities here. You can keep these numbers fixed if you know the source you compress. It keeps working in any case, just not as a good compressor. By biassing p(LPS) / p(MPS), Moffat builts up a Markov model that might or might not appropriate to the source. > That does avoid rounding down of the Pc shown > in [41a]. But that doesn't preclude excess contraction of Pc. I don't see what you mean by "contraction" of Pc. There are two very different things happening there: 1) The coding of symbols, for given interval division. This is the AC coder. 2) Choosing the interval division such that it fits to a given model of the source. For 1), there are absolutely no coding gaps, regardless of limited precision of the interval, the interval subdivision and so on. For 2), the model that is implicitly used in 1) differs from the model that is implied by keeping the symbol counts. 
*That* is a problem, but it does not cause any unused bits in Pc. It is rather a "modelling error" due to not fitting the desired model due to quantization. > It merely pushes it to the renormalization procedure, > where they feed in 0s into the lower bits of their full > range R, mantissa of Pc (until it reaches at least a > "Quarter" of full AC window, 32 bits). It doesn't push "unused bits into Pc". The interval there is *NOT* Pc, it is rather "the 32 low-order bits" of Pc, where the high-order bits are already in a file, and the number of bits a carry can propagade into are kept in a counter. Thus, the implied Pc at the start of the coding procedure is 0, or to be more clear, an infinite string of zeros representing this zero as infinite binary fraction. By upscaling, all you do is to move - as a sliding window algoritm - the *still* zero bits into the internal accumulator of the algorithm, ready to modify it, and move the already computed bits out of Pc, writing it into the file (or counting them to resolve carry-over propagation). > When coding MPS, whose sub-range R(MPS) was expanded by rounding, that > feeding of 0s doesn't help or hurt as far as R(MPS)'s > closeness to the exact arithmetic value xR(MPS). But > for the subrange R(LPS), which is already smaller than > its exact counterpart xR(LPS), feeding of 0s during > normalization is precisely the rounding down of the > Pc shown in [41a]. But no way! All this implies is that the current subdivision of the full coding interval is only precise to M bits, where M is the number of bits written so far, plus the bits represented in the carry over- counter, plus the bits kept in the accumulator representing the coding interval. By multiplying this accumulator with a power of two, you just alter the window "view" onto Pc, i.e. change the coordinate system, but you do not at all alter any interval. 
Example: Binary arithmetic encoder, Moffat/Nelson Coding interval [0,874] Coding interval for the MPS: [0,513) Coding interval for the LPS: [513,874] After upscaling (note that the implicit m_High is scaled by two, then 1 added!) Coding interval [0,1749] MPS subinterval [0,1026) LPS subinterval [1026,1749] *NOTE* the exlusive upper boundary between the intervals, the inclusive boundary for the total coding interval. That is, any number of the accumulator value that is smaller than 513 before the scaling, or smaller than 1026 after the scaling is an MPS, otherwise it is an LPS. By upscaling, you *neither* alter the ratio of MPS to LPS, nor the placement of the accumulator value, nor any gap is opening here. Note that the size of the interval is upper bound (exlusive!) - lower bound. The interval [0,11] has 12 points, [0,11) just 11. > Since the R(LPS)<R(MPS) (by > definition), the requests for the normalization of R, > which get triggered when R drops below Quarter, will > be triggered more often by the LPS encoding than by > the MPS encoding, the net result of which is the excess > contraction of Pc, only done at different point in the > coding loop than that shown in [41a]. A side-effect > of Moffat98 choice is that their coder introduces > a random fluctuations in codeword lengths, which are > biased to kick up the codeword lengths of LPS and > down those of MPS. That type of random small deviations > in probabilities, dp and dq, from the "optimum" p > and q (for the given model) will generally increase > the output size as dp^2/p + dq^2/q, which can be > large when p or q -> 0. *That* effect is caused by the unability to represent the model parameters (in order not to call them probabilities) to the model, that is, you cannot subdivide any given interval exactly as you would need to. I don't say that this cannot be large. What I say is that this is not due to gaps in the subdivision of the interval. 
> Hence, while one can trivially get rid of the gaps > (which the explicit form of contraction of Pc shown > [41a] introduces), doing it via random fluctuations > driven by the discarded bits of the Pc as a pseudorandom > number generator, will normally cost in the increased > average code length. *Sigh* No. There are no unused bits. Every number in the interval subdivision belongs to either LPS or MPS, and if scaled consistently (namely, multiply number by two) the same number belongs to the same subinterval afterwards, and every number after upscaling belongs to either the LPS or the MPS, and there is no single "unused" number not assigned to any of the two. > > So then, this doesn't prove that there are gaps. > > It only proves that AC cannot implement all possible > > models. > It proves that there is a mathematical formulation > of the AC coder algorithm which shows explicit creation > of the gaps arising from the excess in contraction of Pc. > (resulting from the explicit truncation of Pc after > the multiplication with p(xn)). In that case, if the implementation shows no gaps, but the mathematical formulation does, this means that a) the mathematical description is invalid, or b) that you failed to read it. (-: > One can do it differently and explain the net contraction > of Pc some other way, e.g. by adjusting coder variables > so that it looks as if it wasn't a lost precision but > that there was "random model fluctuation" which > accounts for deviation of Pc from its exact counterpart > Pcx. It's a kind a of "explanation" that reminds me of > a kid who trips and falls, while a parent is watching, > and kid starts pretending to be looking for something > on the floor, saying, I didn't really trip and fall, > I actually meant to come down here to look for something. So what *is* a coding gap for you, then? For me it means unused code space. And there simply is none. So long, Thomas
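
Editorial note: the upscaling example in the post above (coding interval [0,874], MPS/LPS boundary at 513) can be checked mechanically. The following is only a sketch under stated assumptions, not code from any of the coders discussed: the function names are hypothetical, and "upscaling" is modelled as shifting one more bit into the accumulator, so a value v becomes 2v or 2v+1 while the boundary doubles and the upper bound becomes 2*high+1.

```c
#include <stdbool.h>

/* A value v is an MPS if it lies below the (exclusive) split, else an LPS. */
static bool is_mps(unsigned v, unsigned split)
{
    return v < split;
}

/*
 * Hypothetical check: returns 1 if doubling the interval [0, high] with
 * MPS/LPS boundary `split` keeps every accumulator value in the same
 * subinterval and leaves no value of the scaled interval unassigned.
 */
int upscaling_preserves_partition(unsigned high, unsigned split)
{
    unsigned new_high  = 2 * high + 1;  /* m_High scaled by two, then 1 added */
    unsigned new_split = 2 * split;     /* boundary is simply doubled */
    unsigned covered   = 0;

    for (unsigned v = 0; v <= high; ++v) {
        for (unsigned bit = 0; bit <= 1; ++bit) {
            unsigned w = 2 * v + bit;   /* value after shifting in one bit */
            if (w > new_high)
                return 0;               /* fell outside the scaled interval */
            if (is_mps(v, split) != is_mps(w, new_split))
                return 0;               /* changed subinterval */
            ++covered;
        }
    }
    /* (v, bit) -> 2v + bit is one-to-one, so full coverage means no gap. */
    return covered == new_high + 1;
}
```

Calling `upscaling_preserves_partition(874, 513)` exercises exactly the numbers of the example. The check says nothing about how well the subdivision matches the model probabilities, which is the separate issue both posters acknowledge.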


1/11/2006 4:52:57 PM

"nightlight" <nightlight@omegapoint.com> wrote in news:1136994926.763677.126950@g44g2000cwa.googlegroups.com:

.....
> coder). When D. Scott posted his "nightlight challenge" I did run his
> alpha=3 array setup on a million digit array, using the sample program
> QI.exe from the QI source kit, and the redundancy was almost
> undetectable, about 1.6e-10 bits/digit. The binary coder would have had
> about 5e-8 redundancy on the similar size input (both still much
> smaller than AC's redundancy). That test was described in the post:
> ....

Actually you never really described it fully in your post. How many bits did this so-called test of yours compress to? And did you include the lengths of the extra fields for your so-called length and your so-called number of ones? And was it really a static zero-entropy type of compression where each of the 3 symbols is assumed equally likely? That is what you wanted others to do with the arithmetic coder. And last but not least, what are the results with different random million-digit arrays?

I know you suffer from a wooden ear, as you state, but are you capable of actually answering these simple questions? Or is it like arithmetic coding, in the sense that you're not really sure what your own code does? You obviously don't seem to understand arithmetic coding, based on your previous posts.

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer: I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptographic system is only as strong as its weakest link"


1/11/2006 6:06:56 PM

>> Moffat98 coder is example of such procedure, which in each
>> step biases up (relative to exact AC) the coding probability
>> p(MPS) of the most probable symbol (MPS) and down p(LPS),
>> of the less probable symbol, while keeping their
>> sum fixed.
>
> The AC coder does not care about p(LPS) and p(MPS)
> and there is no interpretation of probabilities here.
>

I am calling them (empirical) "probabilities" here. In the Moffat98 code these are simply ratios of counts to the total count. What I am saying above is that M98 systematically overvalues p(MPS) and always gives it a relatively larger interval than p(LPS), which is undervalued.

> There are no bits of Pc discarded, simply because
> there's nothing to discard.
....
> I don't see what you mean by "contraction" of Pc.
....
> It doesn't push "unused bits into Pc". The interval
> there is *NOT* Pc, it is rather "the 32 low-order
> bits" of Pc, where the high-order bits are already
> in a file, and the number of bits a carry can
> propagate into are kept in a counter. Thus, the
> implied Pc at the start of the coding procedure
> is 0, or to be more clear, an infinite string of
> zeros representing this zero as infinite binary fraction.
.....

The above shows a major confusion somewhere about what Pc is. You started at the beginning of the post, on the issue of Pc being the (coding) message probability, apparently fine. But somewhere before the Moffat98 section, you're suddenly talking of Pc as if it were Qc, the cumulative probability of the message, which is the AC output. In other places you're interpreting Pc as the mantissa of Pc, the F-bit integer Pc(n) (from page 47, the WNC encoder pseudo code). Yet you also have Qc involved all along as well, so there is some major crossing of wires on these two (or three) over there.

Consequently, anything I said after the 'Pc is a product' section (which we now agree on, or so I thought) was interpreted and responded to without any contact at all with what I was saying. So I'll leave those replies alone for the moment, until the basic definitions of the two key symbols are in sync.

Since this may be only a minor falling out of sync regarding Volf's AC formulation (which is the same one as in his advisor Tjalkens' thesis, [36]; both have many details and highlight AC aspects and perspectives you won't find anywhere else, especially [36]), I will just point you to p. 43 in [41a], where the Qc and Pc arithmetic is shown, with their bits aligned as they get added. It may be that my use of Pc(n), which is the mantissa of Pc after n symbols have been encoded (Pc = Pc(n) * 2^k, where k is the exponent of Pc), has led to confusion of Pc and Pc(n). How Pc got crossed with Qc, as shown in the quote above, I have no idea.

On p. 43, you see the bits of Pc being added to the cumulative probability Qc (which is the AC output). The only bits of Pc whose values are nonzero are the most significant F bits of Pc, which are held in the integer Pc(n). The exponent of Pc there is the integer k. Pc is simply a floating point number, with mantissa Pc(n) and exponent k. Qc is not a floating point number but an unlimited precision binary fraction (its bits are shown as q1,q2... on p. 43). Throughout, I also label the exact values of Pc and Qc as Pcx and Qcx.

Another relevant number is the probability p(xn) of the n-th symbol found, which is xn. The p(xn) is another floating point number with max F bits of precision. It has a mantissa and an exponent, e.g. p(xn) = p.M * 2^p.E. We could similarly denote Pc = Pc.M * 2^Pc.E, where our integer Pc(n-1) is the mantissa Pc.M, and k from their figure on p. 43 is the exponent Pc.E.

On page 47 (with the WNC coder), the first line with the asterisk shows the multiplication of the floating point number Pc with another floating point number p(xn), in the form of a truncated multiplication of p(xn), which is left as an F-bit floating point number, with the integer mantissa of Pc after n-1 symbols have been encoded, the integer Pc(n-1). Note that p(xn) is written as a conditional probability on p. 47, as we discussed and agreed on.

The first critical point of mixed-up symbolism was what happens in the product. That product represents the multiplication Pc * p(xn). Obviously, the actual multiplication done there is done on the two mantissas, while their exponents get added, i.e. the low-level form (as if implementing floating point mul yourself) of the product in the first asterisk line on p. 47 is:

   Pc * p(xn) = (Pc.M * p.M) * 2^(Pc.E+p.E) ... (1)

Since the product of mantissas (Pc.M * p.M) generates 2*F bits, the lowest F bits of the product are discarded. The usual AC scaling and normalizations and windows... are simply a roundabout way of doing the plain floating point manipulation of mantissas and exponents in disguise.

You can now go back to the addition figure on page 43, and see that it is simply a floating point number Pc being added, now _without any approximations_ (ignoring for a moment carry blocking), to the unlimited precision binary fraction Qc. As with QI's SWI arithmetic, even though Qc is unlimited, the sum on p. 43 is just a regular integer addition of Pc.M at the right place (given by Pc.E, which is k) into Qc.

This is all exactly the same as done with QI's SWI arithmetic, where eq. (21) in [T3] is the approximate arithmetic, with rounding up to compute the addends (to be stored into the table), while the index computation via adding of table addends into the accumulated index I, eq. (22), is exact. QI's index "I" is AC's index Qc (a cumulative prob.). QI's quantized addends C(n,k) are AC's truncated addends computed in eq. (1) (for a binary coder there is just one addend, being added when LPS is found, the same as in QI, except that AC must calculate via (1) all the intermediate addends which don't get added on MPS, while QI has them precomputed in the universal table, which is independent of source probabilities).

I will pause here and leave you to synchronize your definitions and notation, after which you should realize that all that I said, to which you seemed to object above, was perfectly correct, when the correct definitions and semantics for the symbols are applied.

--- References

36. T.J. Tjalkens, "Efficient and fast data compression codes for discrete sources with memory", Ph.D. thesis, Eindhoven University of Technology, Sep 1987
    http://alexandria.tue.nl/extra3/proefschrift/PRF5B/8709277.pdf

41a. P.A.J. Volf, "Weighting Techniques In Data Compression: Theory and Algorithms", Ph.D. thesis, Eindhoven University of Technology, Dec 2002
    http://alexandria.tue.nl/extra2/200213835.pdf
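
Editorial note: the truncated mantissa product of eq. (1) in the post above can be illustrated with a small sketch. This is not the WNC coder and not QI's SWI arithmetic; it is a hypothetical toy with an assumed precision F = 16 and an assumed normalization (top mantissa bit always set), so it drops F or F-1 low bits depending on where the leading bit of the product lands. The type and function names are invented for illustration.

```c
#include <stdint.h>

enum { F = 16 };  /* assumed mantissa precision in bits */

/* Toy floating point value: m * 2^e, with 2^(F-1) <= m < 2^F. */
typedef struct {
    uint32_t m;   /* F-bit mantissa */
    int      e;   /* binary exponent */
} fp_t;

/*
 * Truncated product a*b: mantissas are multiplied, exponents are added,
 * and the low bits of the 2F-bit mantissa product are dropped.
 * *lost receives the discarded low bits, so the result is exact
 * only when *lost == 0 and is otherwise smaller than the exact product.
 */
fp_t fp_mul_trunc(fp_t a, fp_t b, uint32_t *lost)
{
    uint64_t prod  = (uint64_t)a.m * (uint64_t)b.m;       /* up to 2F bits */
    int      shift = (prod >> (2 * F - 1)) ? F : F - 1;   /* renormalize */
    fp_t     r;

    *lost = (uint32_t)(prod & (((uint64_t)1 << shift) - 1));
    r.m   = (uint32_t)(prod >> shift);
    r.e   = a.e + b.e + shift;
    return r;
}
```

For instance, 0.5 (mantissa 0x8000, exponent -16) times 0.75 (mantissa 0xC000, exponent -16) comes out exactly as 0xC000 * 2^-17 = 0.375, while 0xFFFF * 2^-16 squared loses its lowest product bit.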


1/11/2006 7:46:32 PM

This is the post:

http://groups.google.com/group/comp.compression/msg/ff1ee67d18b63f5a

Everything you are asking is answered right there, plus many things you didn't ask. For example, you ask here:

> How many bits did this so called test of yours compress to?

There it answers:

----
The size it produces is 1584962.50... bits, which compared to the exact N*log(3) entropy has an excess of 1.62e-04 bits on the total of 10^6 symbols (i.e. the excess per symbol is 1.6e-10 bits).
----

> And did you include the lengths of the extra fields for your
> so called length and your so called number of ones?

There it says how it added these numbers to the figures QI.exe gave (QI.exe wasn't running a test against AC but a stand-alone function) in order to match the AC coding conditions. It says right below that it added 70 bits (which was very generously rounded up, all pieces separately):

----
To compare that with AC output size, one option is to make AC work in static mode without adapting to probabilities and make it not count the transmission of frequency table or number of symbols n (which is the same condition that the QI.exe figure applies to).

Alternatively, you can add to QI's output the size to transmit N, A and the frequency table. QI.exe has a command QI cl<int> which computes self-delimiting size for <int>, or just "QI cl" to list a table for common values. There you get for N=10^6 its self-delimiting length L(N)=27.543 and for L(A)=2.49 bits. The cost for frequency table with QI/EC is the log of the binomial C(N+A-1,A-1), for N=10^6 and A=3, which is log(C(1000002,2))=38.863 bits, which totals (each rounded separately, which they don't need to) 28+3+39=70 bits to be added to QI's output to match the adaptive AC's coding conditions. Since the QI's output was essentially the entropy, the QI's total is at most 70 whole bits above the "entropy" (note the "entropy" N*log(3) didn't include N; also in high entropy limit QI doesn't need to transmit freq. table, but one would need to modify AC to work in high entropy limit, so I added table to QI, which distorts a bit comparison to entropy H).
----

> And was it really a static zero entropy type of compression where
> each of the 3 symbols assumed equally likely?

That was answered right there on the top line: the equiprobable symbols are the "high entropy limit" (as opposed to the "low entropy limit", which is for highly sparse arrays, with one symbol vastly dominating the others).

----
The QI.exe file which you may already have (from the source; current source version is 1.03) has a command line option to test it on that same input (which is a high entropy limit for multi-alphabet coding, and which I call radix codes):

   QI cr3 n1000000 i100

which tells it to code inputs in radix 3 (this can be any 32 bit value above 2), to use input of 1 million symbols (there is a constant MAXRDIG in Intro.h which limits the input size to max 2^20 or 1M digits, you can change that to allow larger sizes e.g. to 16 MEG) and to run the test 100 times on 100 random inputs (i100 for 100 iterations).
----

The comparison to Moffat98 answers the rest (for a 10^6 symbol file): the QI output (adjusted generously to favor AC) came out 3403 bits shorter than AC's:

----
Now, running the Moffat98 coder (in 8 bit max symbol coding mode & frugal bits enabled), it outputs: 1588435.52 bits (this is avg. over 100 iterations), which is 3473 bits above the entropy, or 3403 bits above the comparable QI output size. (Note that Moffat98 coder has generally a slight bias to code worse for the max entropy inputs, but it gains in return on very low entropy inputs.)
----

All of that you can run right there, and look at the source to see that there was no cheating. The test was run on 100 random inputs, and each encode/decode cycle checked that the decoded data matches the input. For QI, the outputs all come out to exactly the same size, i.e. in the high entropy limit QI takes for the size the maximum one can get when converting a number of 10^6 digits given in radix 3 into a binary number (which is QI's output for high entropy limit coding, essentially radix coding/decoding).
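
Editorial note: the frequency-table part of the 70-bit overhead quoted above, log(C(N+A-1,A-1)) rounded up to whole bits, can be re-derived with integer arithmetic only. The sketch below is not taken from the QI source kit; the function names are hypothetical, and the self-delimiting lengths L(N)=27.543 and L(A)=2.49 are taken from the post as given, since their formula is not stated in this thread.

```c
#include <stdint.h>

/* Exact C(n, k) for small k, valid while all intermediates fit in 64 bits. */
uint64_t binomial_u64(uint64_t n, unsigned k)
{
    uint64_t c = 1;
    for (unsigned i = 1; i <= k; ++i)
        c = c * (n - k + i) / i;   /* the division is exact at every step */
    return c;
}

/* Smallest b such that 2^b >= x, i.e. log2(x) rounded up to whole bits. */
unsigned ceil_log2_u64(uint64_t x)
{
    unsigned b = 0;
    while (b < 64 && ((uint64_t)1 << b) < x)
        ++b;
    return b;
}

/*
 * Whole bits needed to index one of the C(N+A-1, A-1) possible frequency
 * tables for N symbols over an alphabet of size A.
 */
unsigned freq_table_bits(uint64_t n_symbols, unsigned alphabet)
{
    return ceil_log2_u64(binomial_u64(n_symbols + alphabet - 1, alphabet - 1));
}
```

With N=10^6 and A=3 the binomial is C(1000002,2) = 500001500001, which needs 39 whole bits, matching the rounded 38.863 in the post; adding the post's 28 and 3 bits for N and A gives the quoted 70.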


1/11/2006 8:06:24 PM

> It is a bit nitpicky, although that makes everyone go over the things
> they imagined they knew. A good refresher for the old thoughts and it
> brings in new thoughts which none of us would have thought on their
> own. These kinds of "nitpicky" arguments take a life on their own.

I think this is not the group to discuss this in that form. Bloom, Ross and the other 'giants' who were able to span the bridge between pure information theory and implementation have left or are not willing to participate any more. Being patient, unprejudiced and curious is also a rare quality on Usenet. With all due respect. :)

>>I agree with that, in the picture I hope to got it right. The
>>blue one is the backfeed-quantizer, the green the unlimited
>>precision-quantizer, the red the AC and the yellow the QI.
>> I didn't read the paper of you so I don't know if yours is
>>not maybe the blue one.
>
> How was that graph made? Did you measure some errors?

   x-axis: symbol_at_pos(x)
   f(x) = -p(symbol_at_pos(x)) after/with quantization

The infinite precision arithmetic coder doesn't have quantization noise, in the context of static modeling, so it's the 'optimum' to measure the other coders against. The bigger the difference between green and any one of the others, the bigger the inefficiency introduced by quantization. The y-axis has no legend, because it's only there to show the difference between the real probability and the quantized probability and should be considered as extremely zoomed.

The blue backfeed somehow-coder is a quick idea; I guess modifying an AC with self-correction in that way is horribly complex (compared to one without).

> Interesting observation, in any case. Indeed, even though QI provides
> the optimum quantization of enumeration at any given _fixed_ coder
> precision, that is not the best one can do in principle, since the
> fixed precision is a mere programming convenience. In the QI source
> kit, the radix and the binary entropy coders use different kinds of
> quantization, with the radix & permutation coders being more optimal
> since they perform "delayed rounding" within any one quantized power of
> radix (see [T3] p. 7). That would be roughly equivalent of binary
> coder only quantizing every m-th row, with m up to 32 (see [T3] p.9,
> N2) and reconstructing the m-1 rows in between on the fly, without
> additional quantization.

So in principle you're providing a "virtually" bigger and nonlinear coding register? Even an adaptive nonlinear one?

> As result the radix & permutation coders
> redundancy is even lower than coding the same input using regular
> multi-alphabet coder (which uses regular binary coder as its sub-task
> coder).

Hmm, technically/mathematically the decomposition of a multi-alphabet coder into a multi-step binary coder is identical. But in practice it never comes out that way, because every step within the coding of a single symbol then raises quantization noise. By decomposition I mean the binary choices MPS yes/no, SecPS yes/no, ThrPS yes/no, ..., LPS. It's easier to tune nevertheless.

> When D. Scott posted his "nightlight challenge" I did run his
> alpha=3 array setup on a million digit array, using the sample program
> QI.exe from the QI source kit, and the redundancy was almost
> undetectable, about 1.6e-10 bits/digit. The binary coder would have had
> about 5e-8 redundancy on the similar size input (both still much
> smaller than AC's redundancy). That test was described in the post:

Hehe, _much_ smaller here means whole percentages, if not tenths of one. :)

> ... It is a bad idea, in my view,
> for a coder to take it upon itself to make such decision on how to bias
> the codeword lengths so systematically.

Yes, but it's an understandable approach in the context of testing against small (and hand-selected) corpora. For my image-compression project I also tried to compress a lot of very nice looking fractals; it's my over-adaption testbed.

Ciao
Niels


1/11/2006 9:27:00 PM

"nightlight" <nightlight@omegapoint.com> wrote in news:1137009984.700743.75690@g44g2000cwa.googlegroups.com:

> This is the post:
> http://groups.google.com/group/comp.compression/msg/ff1ee67d18b63f5a
>
> All you are asking is right there, answered, plus many you didn't ask.
> For example, you ask here:
>
>> How many bits did this so called test of yours compress to?
>
> There it answers:
> ----
> The size it produces is 1584962.50... bits, which compared
> to the exact N*log(3) entropy has an excess of 1.62 e-04 bits on the
> total of 10^6 symbols (i.e. the excess per symbol is 1.6e-10 bits).
> ---
>

I assume, if your coder is honest, that the number you got represents the average of 100 runs. It's very close to the real number N*log(3):

   1584962.50072115618145373894394782

As you can tell, this number makes your code shine, and it's better than the real entropy. You must have got 50 cases where it took 1584962 bits and 50 where it took 1584963. A change of just one bit in 1 out of the 100 cases and you would get a different answer. I wonder what the odds of that are. Very interesting that your coder puts out fractional bits. Why do I doubt that?

And again you don't answer all of the questions. You mention in the paper that there are 3 parts. One part is the length of the thing compressed, another is the number of ones, and the third, which I assume is the figure above, is the index.

>> And did you include the lengths of the extra fields for your
>> so called length and your so called number of ones?
>
> There it says how it added these numbers to match the AC coding
> conditions from the numbers QI.exe gave it (which wasn't running
> a test for AC but stand alone function) -- It says right below it added
> 70 bits (which was very generously rounded up, all pieces separately):
>

So let's not be generous and round down. From the combination part you get 1584962; for the rest of the needed overhead you need, say, 69 bits. Would it be fair to say that you compressed the 1000000 symbols to 1585031 bits? Let's be clear about this: the list is for 3 symbols, and the total bits only refer to the compressed combinations and the file length; they do not carry just exactly what those symbols are. If that's not correct, could you give a straight answer? If it is correct, could you confirm it with a yes?

> --------
> To compare that with AC output size, one option is to make AC work in
> static mode without adapting to probabilities and make it not count the
> transmission of frequency table or number of symbols n (which is the
> same condition that the QI.exe figure applies to).
>
> Alternatively, you can add to QI's output the size to transmit N, A and
> the frequency table. QI.exe has a command QI cl<int> which computes
> self-delimiting size for <int>, or just "QI cl" to list a table for
> common values. There you get for N=10^6 its self-delimiting length
> L(N)=27.543 and for L(A)=2.49 bits. The cost for frequency table with
> QI/EC is the log of the binomial C(N+A-1,A-1), for N=10^6 and A=3,
> which is log(C(1000002,2))=38.863 bits, which totals (each rounded
> separately, which they don't need to) 28+3+39=70 bits to be added to
> QI's output to match the adaptive AC's coding conditions. Since the
> QI's output was essentially the entropy, the QI's total is 70 at most
> whole bits above the "entropy" (note the "entropy" N*log(3) didn't
> include N; also in high entropy limit QI doesn't need to transmit freq.
> table, but one would need to modify AC to work in high entropy limit,
> so I added table to QI, which distorts a bit comparison to entropy H).
> -----------
>
>

I am just curious: you say you need 2.49 bits to code the number 3. Where did you get the formula for L(X)? Is it based on some universal number scheme that gives a neat way of calculating the bits needed for universal coding of large numbers? Again, these are bits; you have to code them with a whole number of bits, don't you?

>> And was it really a static zero entropy type of compression where
>> each of the 3 symbols assumed equally likely?
>
> That was answered right there on the top line: the equiprobable symbols
> are "high entropy limit" (as opposed to "low entropy limit" which is
> for highly sparse arrays, which one symbol vastly dominating others).
>

Look, either it is or it is not?

David A. Scott


1/11/2006 11:22:28 PM

> It is a bit nitpicky, although that makes everyone go over the things
> they imagined they knew. A good refresher for the old thoughts and it
> brings in new thoughts which none of us would have thought on their
> own. These kinds of "nitpicky" arguments take a life on their own.

After some thought about this (on my side) I suggest you carefully reformulate your 'always' message. I think something like: you've "invented a more exact (in the sense of error) or precise algorithm for performing (arithmetic) coding of unknown-length real numbers", that "(always) produces less quantization noise in the coding operation than (for example) the X-bit arithmetic coder"; as a result "the (attempt) to code all possible real numbers would result (always) in a shorter code than with the AC".

<context type="adaptive model">
What's actually misleading is that you (maybe try to) say that you always code shorter for _any_ _single_ message. To say that about the _sum_ over _any_ of them is true, as far as I can work it out. To say that it is true for nearly all of them is also true, because nearly all of them don't fit well into the "correct over/underestimation" paradigm (because of the holy /pig/eon).
</context>

I'm trying to filter out true and false in all of these postings, which don't stay in one context and aren't nearly atomic. There is so much mixed together.

Ciao
Niels


1/12/2006 2:18:11 AM

David A. Scott wrote:
> "Matt Mahoney" <matmahoney@yahoo.com> wrote in
> > fpaq1 compresses files of all zero bytes as follows:
> > 0 -> 1
> > 1 -> 2
> > 10 -> 5
> > 100 -> 9
> > 1000 -> 13
> > 10^4 -> 17
> > 10^5 -> 21
> > 10^6 -> 25
> > 10^7 -> 29
> > 10^8 -> 34
snip... for 10^6 zero bytes...
> I feel strongly 23 is most likely optimal and that you should
> with in a byte or so get the same length for fpaq0 if its all zeros
> or all ones. IN your case its like it used to low a probability
> when doing the all ones and used to high when doing all zeros.
> It most have something the fact you don't carry and that for this
> case 32 bits without carry not enough.

I think fpaq1 is behaving correctly for the model it uses. The fpaq1 model is for 9 bit symbols so there are 9 contexts. In 8 of these there are 10^6 zero bits. In the other (the EOF bit) there are 10^6 zero bits and a 1. The model adds 1 to all counts. The probability for n zero bits is modeled as p = (1/2)(2/3)(3/4)...(n/(n+1)) = 1/(n+1), which codes in log(n+1) = ~20 bits for n = 10^6. The extra 1 bit in the EOF stream has probability 1/(n+2) requiring log(n+2) = ~20 bits. So the total is ~200 bits = 25 bytes.

Using a bijective coder, the best you can do is get log(25) = ~5 bits of information from the length of the compressed data. (I suppose it depends on how you model the compressed length).

You could improve compression by using a different data model, such as adding a constant less than 1 to the counts, or modeling a bit stream instead of a byte stream.

-- Matt Mahoney


1/12/2006 5:29:36 AM

> I assume if your coder is honest that the number you
> got represents the average of 100 runs its very close
> Very interesting your coder puts out fractional bits.

The fractional bit sizes used are not averages but actual fractional bit sizes. Their meaning is that if you tell the decoder that an index will take a value from the range 0..M-1, which is M values, then the index itself has size log(M) bits. For example, if I tell the decoder that the next index has M=3, then the size of the index is log(3)=1.58496... bits. If you have to ship just that single index, the best you can do with it is a tapered Huffman code, which is 0,10,11 for the 3 values of the index. That way you will be sending either 1 or 2 bits of output, depending on the value of the index itself. The average cost, over multiple and _separate_ transmissions of such an index, is 5/3=1.667 bits per shipment, which is 0.0817 bits above the fractional bit index size log(3), thus about 5% larger.

If you are shipping several items I1, I2,... (which need not be related to each other) that have such fractional sizes (meaning there is some max for each, M1, M2,..., known to the decoder), you can combine the indexes into a single mixed radix value V, which in turn is also a fractional bit item, since its range size is the product of the range sizes: M = M1*M2*... In this case you would pay a little (such as the 5% on avg. above) _only on the total_ bit fraction for V, while all the individual item fractions have now been added exactly to obtain the fractional size of V. To compute the combined package index V, you interpret I1,I2,... as digits in mixed radix M1,M2,... and then you convert that number into binary format using:

   V = I1 + I2*M1 + I3*M1*M2 + ...   (1)

You also calculate M=M(V), which is the total range size of V:

   M = M1 * M2 * M3 * ...   (2)

Obviously, (1) requires arithmetic precision which grows as much as the size of index V.
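The packing in (1) and (2) can be sketched as follows. This is only a minimal illustration, assuming the product of all range sizes fits in 64 bits (exactly the regime where plain integer arithmetic still suffices); the function names mixed_radix_pack() and mixed_radix_unpack() are made up for this example and are not part of the QI source.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack digits I[0..n-1], with 0 <= I[j] < M[j], into a single value
   V = I[0] + I[1]*M[0] + I[2]*M[0]*M[1] + ...        (eq. (1))
   and report the total range size M[0]*M[1]*...      (eq. (2))
   through *range. Assumption: the product of all M[j] fits in 64 bits. */
uint64_t mixed_radix_pack(const uint32_t *I, const uint32_t *M, size_t n,
                          uint64_t *range)
{
    uint64_t v = 0, weight = 1;
    for (size_t j = 0; j < n; j++) {
        v += (uint64_t)I[j] * weight;   /* digit times product of earlier radices */
        weight *= M[j];
    }
    *range = weight;
    return v;
}

/* Inverse of mixed_radix_pack(): recover the digits from V. */
void mixed_radix_unpack(uint64_t v, const uint32_t *M, size_t n, uint32_t *I)
{
    for (size_t j = 0; j < n; j++) {
        I[j] = (uint32_t)(v % M[j]);
        v /= M[j];
    }
}
```

For example, digits {2, 0, 4} in radices {3, 2, 5} pack to V = 2 + 0*3 + 4*6 = 26 with range M = 30, so the three items share log(30) = 4.9 bits instead of paying a rounding cost on each item separately.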
I have run into this kind of little coding task many times, and the best one could do with it were Huffman codes, which would not only fall short (on avg.) of the ideal size of log(M) bits by a few percent, but the output size would also fluctuate from sequence to sequence, so it didn't package well into pre-assigned fixed size space. The ideal way, using (1), becomes impractical as soon as V grows beyond 64 bits.

That is one type of problem on which this new quantization scheme, QI, does its trick: with QI you can compute the combined index (1) and its size (2), for a tiny cost in precision (which is a very small fraction of a bit), using only N-1 regular precision integer multiplies for N items. Note that the whole array of 10^6 items in alphabet A=3 was computed by the QI.exe test program as this packaging problem, with M1=M2=...=3.

Let me explain how QI does it for this case, with A=3 and N items being combined. In that case (1) and (2) become:

   I = D1 + D2*3 + D3*3^2 + D4*3^3 + ... + DN*3^(N-1)   ... (3)

   M = 3^N   ... (4)

Above, D1,D2,D3,...,DN is the sequence of N digits in base 3, which is our interpretation of the input sequence. The old style radix conversion using (3) would need arithmetic precision of log(M) = N log(3) bits. With QI you do the same number of multiplies as in (3), except in regular integer precision (the QI source is set to 32 bit precision).

The basic tool QI uses for this is the "Sliding Window Integer" (SWI), which is a numeric type like a floating point (FP) number, except that it is an integer (see [T3] p. 7). You can look at it as a hybrid of regular FP numbers and unlimited precision integers. Or like an FP with more flexibility in rounding and operations. Or like an unlimited precision integer with some constraints on its bit pattern. In any case, an SWI is specified via 3 items: g=precision in bits, m=mantissa (sliding window, an integer of max width g bits) and s=exponent (shift, regular integer).
Since we'll use fixed g=32, we don't need to drag g around explicitly any more. With that, some SWI variable Q is given via a pair of integers Q=(m,s), meaning Q=m*2^s, i.e. m shifted left s times. The value Q is used in arithmetic as if it were some long integer m*2^s, but about which we do know that only the 32 bits given in its component m are nonzero and that there are s zeros following it. In the source code, the header Qi.h has a structure SWI, which is:

    typedef union _swi {    // SW Integer type
        struct {
            dword m;        // mantissa: 32 bit unsigned
            int e;          // exponent: signed 32 bit int
        };
        qword q;            // 64-bit alias e.g. for 64-bit compares
    } SWI;

Otherwise the arithmetic with Q is exactly the same as with a large integer of that size, except that computing with Q is faster and Q takes much less memory than a large integer of similar magnitude (which would be g+s bits wide). The result of such operations is generally not an SWI, since extra significant bits may be produced.

Note that for integers X < 2^32, the SWI mantissa is simply that number X and the exponent s is 0. For larger numbers, s is nonzero and we keep the 32 bit mantissa m normalized, i.e. 2^31 <= m < 2^32.

There is only one extra operator used for SW numbers which is lacking in regular large integer arithmetic, and that is rounding, which is rounding up to g bits of precision: any large integer X is converted into an SWI format number Q by copying the leading g bits of X into the integer m and placing into s the count of bits remaining in the tail of X (tail: the bits we didn't copy into m); then, if the tail had any nonzero bit, we increment m (on overflow we renormalize it and increment the exponent s). Since this is an important operation below, I will denote SW rounding of X as {X}sw. The QI source has a file Swi.c which implements the SWI arithmetic (as needed by the rest).

The QI method applied to our problem in eq. (3) breaks into 2 phases:

a) Compute quantized power array Q[i]=A^i for i=0,1,...N (its elements are of type SWI): SWI Q[N+1]={1,3, 3^2, 3^3,..., 3^N};

b) Use array Q[] to compute I via radix expansion, eq. (3).

To compute power Q[i+1] for radix A we use Q[i+1]=Q[i]*A, and we initialize Q[0]=(1,0), Q[1]=(A,0) (see also function new_radix() in Radix.c for an actual implementation). So far this is the same as if one were building a regular power table for radix conversions. The key new element QI brings in here is the handling of the multiplication Q[i]*A. If Q[i] were just a 32 bit integer, the product would be a 64 bit integer. Denote Q[i]=(m,s), where m=mantissa of Q[i] and s=exponent of Q[i]. The SWI arithmetic works like large integer arithmetic, hence Q[i]*A = (m*A)*2^s. The result is not an SWI any more. We convert it to SWI using the rounding up operator {}sw (which yields an SWI variable):

   Q[i+1] = { m*A*2^s }sw   ... (4)

The function implementing multiplication with rounding up in (4) is given as: SWI swuMul(SWI x,dword y) in the file Swi.c.

** An important note: (4) is the only place we will use rounding up. From here on, _all_ operations are _exact_.

With the array Q[] computed, we apply (3) to compute the index (which is the binary value of a radix 3 number) for the sequence of digits D[N]={D1,D2,...,DN}. The basic step in (3) is the multiplication Di*Q[i-1], i=1,2...N, and addition of the result to the sum I. The product X=Di*Q[i-1]=Di*m*2^s is a large integer with up to 64 nonzero bits, followed by s trailing zeros from Q[i-1]. Hence adding X to I is done by adding the 64 bit integer Di*m into the buffer I at the bit offset s. The source file Swi.c has a function swuMulAdd() which combines this multiplication Di*Q[i-1] and addition into I at position s. (There are also functions swuMulx() and swxAdd() which can perform the same in two separate steps.) The result of all this is the index I.
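The rounding-up multiply of (4) and phase (a) can be sketched like this. It is a minimal illustration written from the description above, not the swuMul() or new_radix() code from the QI source; the type swi_t and the functions swi_mul_round_up() and swi_power_table() are hypothetical names, and the window is fixed at g=32.

```c
#include <stdint.h>

/* Illustrative sliding window integer: value = m * 2^s.
   (Hypothetical type for this sketch, not the SWI union from Qi.h.) */
typedef struct {
    uint32_t m;   /* mantissa, at most 32 significant bits */
    uint32_t s;   /* exponent: number of trailing zero bits */
} swi_t;

/* Multiply x by y and round the exact product UP to a 32 bit mantissa,
   so the result is never smaller than the exact value x*y. */
swi_t swi_mul_round_up(swi_t x, uint32_t y)
{
    uint64_t p = (uint64_t)x.m * y;       /* exact product, up to 64 bits */
    uint32_t shift = 0;
    while ((p >> shift) >> 32)            /* bits that do not fit the window */
        shift++;
    uint64_t m = p >> shift;              /* leading 32 bits */
    if (shift && (p & ((1ULL << shift) - 1)))
        m++;                              /* nonzero tail: round up */
    if (m >> 32) {                        /* mantissa overflow: renormalize */
        m >>= 1;
        shift++;
    }
    swi_t r = { (uint32_t)m, x.s + shift };
    return r;
}

/* Phase (a): fill Q[0..n] with quantized powers of radix a, Q[i] >= a^i. */
void swi_power_table(swi_t *Q, uint32_t n, uint32_t a)
{
    Q[0].m = 1;
    Q[0].s = 0;
    for (uint32_t i = 0; i < n; i++)
        Q[i + 1] = swi_mul_round_up(Q[i], a);
}
```

With a=3 the table is exact up to Q[20] = 3^20 = 3486784401, which still fits in 32 bits. The first rounding happens at Q[21]: the exact 3^21 = 10460353203 needs 34 bits, so the sketch stores (2615088301, 2), i.e. 10460353204, one above the exact power.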
The number of bits in I is determined using the fact that I is always smaller than Q[N] (I can take values from 0 to Q[N]-1). Hence the number of bits in I is L(I)=log(Q[N]). Since Q[N]=(m,s)=m*2^s, L(I)=s+log(m). Function radix_szf() in Radix.c computes this fractional size from given N digits and power array Q[], while function radix_sz() returns the rounded up integer number of bits. Note that because of rounding up in (4), the value Q[N] is a little bit larger than the exact integer A^N, therefore log(Q[N]) will be larger than log(A^N)=N*log(A) (that's the number shown as 'exact entropy'). The difference between the two is 1.62e-4 bits for A=3 and N=10^6.

In the earlier post I used the value of L(I) rounded up to the next integer. That packaging wastes about 1/2 bit. Since the decoder will have the Q[N] table as well, we can use the mantissa m of Q[N] to package the upper 32 bits of I (which are extracted at the position s in buffer I) via tapered Huffman codes. In the test we had: Q[N]=(0xB521509A,1584931) and the average tapered Huffman length for x < 0xB521509A is 31.5867 bits, hence that would get us on average to within 0.087 bits from the exact fractional bit value for the whole index I.

> You mention in paper there are 3 parts. One part is the
> lenght of thing compressed and the other is the number
> of ones and the third which I assume the above is the index.

That's correct. The index alone is generally smaller than the entropy (for a binary coder: by 1/2 log(n) bits), but the combined count of 1's, which needs log(n+1) bits, plus the index length are longer than the entropy by approx E=1/2 log(n) bits. The entropy formula (for the binary case, p=p(1), q=p(0), p+q=1):

   H(n) = n*[p*log(1/p) + q*log(1/q)]   ... (5)

does not include the cost of sending n and p, but does include the cost of sending the count of 1's, integer k. If the coder knows p, then it can encode k in about 1/2 log(n) bits, hence you code without the earlier excess of E=1/2 log(n).
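The tapered Huffman averages quoted above (5/3 bits for M=3, 31.5867 bits for m=0xB521509A) can be checked with a short calculation. The sketch below assumes the usual tapered (truncated binary) construction, where with k=floor(log(M)) the first 2^(k+1)-M values get k bits and the rest get k+1 bits, and assumes all M values are equally likely; tapered_avg_bits() is a name made up for this example.

```c
#include <stdint.h>

/* Average codeword length, in bits, of a tapered (truncated binary) code
   over M equally likely values. Assumption: 1 <= M < 2^63.
   With k = floor(log2 M), u = 2^(k+1) - M values get k bits and the
   remaining M - u values get k+1 bits, so the average is (k+1) - u/M. */
double tapered_avg_bits(uint64_t M)
{
    int k = 0;
    while ((M >> (k + 1)) != 0)
        k++;                                   /* k = floor(log2 M) */
    uint64_t u = (1ULL << (k + 1)) - M;        /* number of short codewords */
    return (double)(k + 1) - (double)u / (double)M;
}
```

For M=3 this gives 2 - 1/3 = 1.667 bits, and for M=0xB521509A it gives about 31.5867 bits, to be compared with log(m) of about 31.5009 bits for the leading 32 bits of the index.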
Note that in the high entropy limit, such as our test case, the exact index (3) is the same as the entropy, while the quantized index is slightly larger. But in the high entropy case, the coder doesn't need to send any frequency table since all possible frequencies are enumerated into the same index, hence the index produced by QI already has that built in. You do need to send A and N, which will cost you about 30 bits (the entropy formula n*log(3) didn't count the cost of sending these two either, hence adding these two items to our size doesn't change our distance from entropy).

> So lets not be generous and round down. from the
> combination part you get 1584962 for the rest of
> needed overhead you need say 69 bits.
> Would it be far to say that you compressed the
> 1000000 sybmols to 1585031 bits,

Roughly, within your rounding, yes. That is the figure to match the conditions of an adaptive AC (which I ran in the test). That is not what we need to send to decode it, though. We don't need to send the frequency table in high entropy limit coding. That frequency table size was 39 bits. It was added only to match the cost the adaptive AC had to pay to adapt (which is approximately the same as the cost of sending the frequency table explicitly). Hence, rounding up sizes for A to 3 bits and for N to 27 bits, you need 30 bits. Rounding up the index, you need 1584963 bits. Hence the total decodable output for N=10^6 symbols, A=3 is:

   DECODABLE OUTPUT = 1584993 bits = 198124.125 bytes

That size is fixed, the same for all inputs. You can check the whole code-decode-compare cycle in the function radix_iter() in the Tests.c file. To verify the fixed size claim, you can corrupt the bits beyond the declared size of the compressed index and verify that it decodes.
Or you can check dec_radix() function in Radix.c and verify that the first thing it does is to obtain this same index size in bits and then it extracts the leading 32 bits of the index using the calculated end (it decodes it from the end, the last bit is the most significant bit of the index). > Let's be clear about this the list is for 3 symbols the > total bits only refers to the compressed combinations > and file lenght it does not carry in those bits just > exactly what those symbols are. The coder assumes symbols 0,1,2,...A-1. If they are anything else you need to send separately the A values as a simple array matching the order of assumed values 0,1,2..., in the input. The array of items is in whatever size each may be (they could be 64 bit integers, or each item can be different size from others). > I am just curious you say you need 2.49 bits to code > the number 3 where did you get the formula for L(X) > is it based on some universal number scheme where they > give a neat way of caulating bits needed for univerasal > coding of large numbers. The self-delimiting length is a number log(n)+log(log(n))+... (as calculated by function sdlen() in Qiutl.c). That's how many bits one needs to code integers of arbitrary size about which no limit is known to the coder. Explicit codes exist (such as Elias omega) which approximate these fractional size with integers, averaging about same for large enough samples. You can check [1] for a survey and list of codes. > Again these are bits you have > to code them with whole number of bits don't you? As with other fractional counts, no, you don't need to pay the full rounding up cost. Since you have A and N (both are self-delimiting in general case, with no upfront limits on A & N), you can enumerate the two together and send just one self-delimiting number, so you round just one fraction for 2 numbers. 
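For reference, here is a small sketch of the Elias omega code mentioned above as one explicit self-delimiting code. It is the standard textbook construction, not the sdlen() function from Qiutl.c, and elias_omega() is a name chosen for this example. Bits are written as '0'/'1' characters to keep the sketch easy to inspect.

```c
#include <stdint.h>
#include <string.h>

/* Write the Elias omega code of n (n >= 1) into out as a string of '0'/'1'
   characters and return its length in bits. out must hold at least 128 bytes.
   Construction: start with a terminating "0"; while n > 1, prepend the
   binary form of n and continue with n = (number of bits just written) - 1. */
size_t elias_omega(uint64_t n, char *out)
{
    char buf[128];
    size_t pos = sizeof buf;          /* the code is built from its end backwards */
    buf[--pos] = '0';                 /* terminating zero bit */
    while (n > 1) {
        uint64_t v = n;
        size_t bits = 0;
        while (v) {                   /* emit n in binary, lowest bit last */
            buf[--pos] = (char)('0' + (int)(v & 1));
            v >>= 1;
            bits++;
        }
        n = bits - 1;                 /* next group encodes the length minus one */
    }
    size_t len = sizeof buf - pos;
    memcpy(out, buf + pos, len);
    out[len] = '\0';
    return len;
}
```

For example, 3 codes as "110" (3 bits) and 16 codes as "10100100000" (11 bits). These integer lengths only approximate the fractional self-delimiting sizes discussed above.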
If you're really after every last fraction, you can also enumerate the two together with the top 32 bits of index, so you only pay a single fractional bit rounding on the total output. -- References ( http://www.1stworks.com/ref/RefLib.htm ) T1-T3 are on http://www.1stworks.com/ref/qi.htm 1. Peter Fenwick "Punctured Elias Codes for variable-length coding of the integers (1996)" http://citeseer.ist.psu.edu/fenwick96punctured.html


1/12/2006 8:52:06 AM

Matt Mahoney wrote: > David A. Scott wrote: > > Matt if I can get nightlight to commit to coding his example of > > the 3 symols types. I would like to play again with fpaq0. To see > > how much better it can be made with as little change as possible. > > I like your style but I don't think I will go to the wall and make > > it bijective. But the nine times for each eight can be changed to > > eight for eight with a ninth only needed for the last byte. > > There is some room for improvement. I tried compressing 10,000,000 > bytes of random charaters A, B, C. fpaq0 compresses it to 1,982,988 > bytes. The theoretical limit is 1/8 lg 3 = 1,981,203, a difference of > 1785 bytes. For 1,000,000 bytes it compresses to 198,322 bytes, a > difference of 201.7 bytes. > actually the ideal length is 10,000,000 * lg(3)/8 which is 1981203.12590144522681717367993477 bytes. 1981203 is actually not enough you need to round up to get 1981204 bytes. That what I get using the two files of 10,000,000 bytes first is 333,334 bytes of A followed by 333,333 bytes of B followed by 333,333 bytes of C The same with a file of 10,000,000 bytes that are ABCABC.... ABCA Both compress to same size check it out. Its a plain simple bijective coders useing old methods or at least thats what some think. The code is straight forward and easy to use. see http://bijective.dogma.net/nitelite.zip David Scott PS I may be drunk Take it for what its worht trust no one who can't test with real code. Don't even trust me test it. Remember I am an idoit who know belives I think some one drugged me. Take it with a grain of salt.


1/13/2006 5:39:07 AM

Since the debate has wound down, here is a condensed summary of the differences between QI and arithmetic coding (AC) all in one place. QI is an advance of enumerative coding (EC) which solves optimally the fundamental problem of unlimited precision EC. The solution of this problem has been sought throughout the four decades since Lynch-Davisson 1966 coder, with various attempts extending to at least year 2003. All such attempts resulted in partial solutions only, involving significant tradeoffs in each case. The arithmetic coding (AC) itself, arose as one such attempt by Rissanen in 1970s, and it could be viewed as the most successful among such partial and sub-optimal solutions of the EC precision problem (that is how Rissanen viewed it in his early AC papers as well). For the rest of this post I will address the QI's solutions for the four principal remaining weaknesses and tradeoffs introduced by the AC algorithm. A1) -- SPEED & POWER CONSUMPTION -- The AC use of complex, power hungry instructions (mul/div when coding at its maximum precision) and the requirement for coding operations on the more probable symbol (MPS) results in speed & power consumption penalties (the latter becoming increasingly important). In contrast, QI performs no coding operations on the MPS and it uses fewer and simpler instructions on the less probable symbol (LPS). These two performance distinctions extend to a general alphabet of size A coding, through all A-1 internal nodes of the binary decomposition of the alphabet (cf. [T1], pp. 32-38). Variety of details giving rise to the significant QI speed advantage in different coding settings fit together in a general pattern of QI's much better division of labor at all levels -- within the functions of the coder itself and extending to a similarly improved division of labor between the coder and the modeling engine. 
Within the coder proper, QI separates cleanly the general combinatorial properties of all symbol sequences satisfying some types of conditions (the "enumerative classes") from the incidental properties distinguishing individual sequences of that type. The general part (the quantized enumerative addends via eq. (21) p. 8, [T3]) is computed up front, once and for all and outside of the coding loop, with the results saved into universal tables (which are independent of source probabilities). The coding loop for a particular instance does only the absolute minimum work that deals exclusively with the individual properties of that instance (the index computation, eqs. (21),(22) p. 8 [T3]). Similarly, QI/EC modeling engine (cf. p. 27, [T2]) processes the entire finite sequence being encoded, decides on its decomposition into enumerative classes (ranging from simple segmentation of the input into fixed or variable contiguous blocks, through BW transform and selection of the optimal BW output column segments), then hands its complete output to the suitable enumerators within the encoder for index computation within the selected enumerative classes. Both components, the modeler and the coder (enumerator), perform their specialized tasks on the entire input, without interacting symbol by symbol as done with AC. Therefore, the speed & power consumption edge of QI over AC is not a result of a coding trick or a lucky pick of parameters which happens to work well in some cases or any such accidental circumstance. The QI speed gains are large for all inputs -- for all source parameters and for all input sizes. It is a fundamentally much more efficient way to do the coding, in the same way that Henry Ford's production line was a much more efficient way to build cars than, say, organizing the same number of workers & machines so that they all work together on the same, single car, from the raw materials until it is out the door, and only then start on the single next car. 
The latter organization corresponds closely to the division of labor used by AC, within the coder itself and between the coder and its modeling engine (car <=> codeword, materials for one car <=> one input symbol, workers/machines <=> coder & modeler functions). A2) -- PROBABILISTIC PARAMETRIZATION -- The AC reparametrization of EC enumeration of finite sequences into the probabilistic framework (where the exact combinatorial parameters of finite sequences are replaced with normalized limit values of infinite sequences), can generally provide only a lower resolution, approximate models for a much richer space of finite sequences coding problems (which includes all practical coding tasks). In contrast, QI modeling interface uses precise finite sequence parameters, which are richer, sharper and more flexible language for modeling finite sequences. As result, in the field of constrained coding (used e.g. in recording media and general constrained channel coding), where such finer controls over the precise finite sequence parameters are vital, EC remains the method of choice, despite the intervening advent of AC and the performance drawbacks of the unlimited precision EC. In the wider realm of practical coding, AC's loss of resolution in the space of possible parametrizations has generally narrowed down the spectrum of modeling algorithms useful with AC, to essentially the PPM & CTW type of algorithms as the apex of modeling. It has also constrained the language and the type of parameters that a modeling engine can use to transmit all it knows about the finite sequence being encoded to the coders, reducing it in practice to 'probabilities of the next single symbol'. 
Yet, presently the most widespread and the most practical general compression algorithms, such as LZ & BWT families, perform "surprisingly" effective modeling of finite sequences in what are intrinsically the finite sequence parametrizations (dynamic dictionary entries for LZ, or context sorting via BW block transform for BWT) without ever computing 'probabilities of the next single symbol' (or any probabilities at all). Another side-effect of the ill-fitting parametrization for enumeration of finite sequences, is the performance penalty. Specifically, if one were to try emulating with AC the streamlined table based coding of QI (Schalkwijk has shown how this can be done for Elias algorithm, cf. p. 19-20 [T2]), so that AC would need encoding operations only for LPS, while skipping the MPS, and for LPS to have addends precomputed and ready in the table, one would need separate table of the same size as QI's table, for each source probability distribution. In binary case this would increase the table size by factor O(n) over the QI's universal table size, which is the factor QI gained over the exact EC (i.e. AC did not solve at all the EC table size problem arising in the high speed mode). In short, the probabilistic parametrization lacks the resolution to draw the precise line separating cleanly the universal (which could be precomputed into a table) from the instance enumerative properties of the finite symbol sequences. A3) -- CODING PRECISION (REDUNDANCY) -- The AC choice (A2) is accomplished through Stirling and several further approximations of the enumeration itself (which are in addition to the finite precision approximation, cf. [T3] p. 2, [T2] pp. 22-25), which taken together with suboptimal AC quantization results in the AC output excess of the general form: D = O(log(N)) + O(N) + O(1) ... (1) bits over the optimum finite precision solutions computed by QI. 
The O(log(N)) type terms in D are due to excess in the cost of transmitting probabilities via 'learning' for an adaptive AC or failing to account for effects of shortening of the rest of a finite sequence during coding for a static AC (side-effect of (A2)). These costs can be in some situations largely avoided by AC e.g. by using KT estimator for Bernoulli sources or the 'faithful probabilities' (the decrementing AC in [34]). But even when applicable these methods involve tradeoffs (e.g. speed penalty and a complete disconnect from the conventional AC modeling engine for a decrementing AC, while for KT, the increased pointwise excess in all but the low entropy density range and the increased average excess on composite sources). The term of O(N) type in D, which is the result of QI's optimal bottom up vs AC's suboptimal top down index quantization (and the resulting AC truncation of generally infinite fractions, which are one of the consequences of the ill-fitting finite sequence parametrization (A2) e.g. when dividing its current interval among the alphabet symbols, which is absent in QI), despite appearing significant in D, dominates only for large number of symbols N or for large alphabets A, or for the intermediate sizes of the two when present together. That term is approximately: O(N) = 2*N*(2A-1)*log(e)/2^g ... (2) bits. { g is the coder arithmetic precision, A alphabet size, cf. eq. (20) p. 14 [41a] for AC and d(g) p. 8 in [T3] + Radix.c in [QIC] for QI, see function enc_radix() and note that function it calls swuMulAdd() uses exact mul & add without rounding, thus it does no quantization on the interval partition between the A alphabet symbols, while AC quantizes these A partitions; you can also verify that QI's redundancy does not grow with alphabet by running QI.exe with option "cr<radix>" and look at the excess term shown as Q = .,. 
= E+<absolute excess>.}

Note that in our earlier 'contest' we had used A=3, which is via (2) the most favorable A>2 test for AC, since using the smallest non-binary alphabet size A minimizes the O(N) term. Had we used a 32 bit value A32 for the alphabet size, QI's redundancy would remain unchanged, while AC's O(N) redundancy would increase by a large factor A32/3 (note though that O(N) is not the total AC excess over QI in (1), hence the total excess would not grow by the same factor A32/3).

For the binary alphabet, A=2, eq. (2) yields for the O(N) term of the full difference D:

   O(N) = 6*N*log(e)/2^g   ... (3)

Or, expressed as a ratio of AC's over QI's maximum quantization redundancies:

   AC(N) / QI(N) = 4   ... (4)

i.e. even for binary alphabets, which is the best case for AC, the AC's sub-optimal quantization "leaks" four times more excess bits per symbol than QI's optimal quantization.

{ Note: although the expressions (2),(3) refer to the maximum redundancies of the coders, the average and pointwise redundancies are 2-3 times smaller for both coders. Since there are no such simple closed form expressions for these, one can only measure them. The executable QI.exe included in [QIC] has a command line option "ct" which measures QI's quantization redundancies, maximum and average (and several others), on _all_ quantized binomials for N up to 2^20 (this max N can be changed by editing the constant "#define MAXTN 1*MEG" in Intro.h).}

Finally, the terms of O(1) type in D, which are 2-4 bits (depending on inputs & specific AC & QI implementations), despite their small absolute size, may dominate the difference D when the outputs themselves are very small (e.g. in the low entropy limit or generally for short outputs at any entropy rate).

The empirically observed compression efficiency differences shown in the QI vs AC performance table (p. 10, [T3]) are dominated (in all but the last row) by the contributions of the O(log(N)) and O(1) terms in D.
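To get a feel for the magnitudes, the O(N) term of eq. (2) can simply be evaluated. This is a tiny sketch that only plugs numbers into the formula as stated above, taking log(e) as the base 2 logarithm of e (as with the other logs in this thread); ac_quant_excess_bits() is a hypothetical helper name.

```c
#include <stdint.h>

/* Evaluate the O(N) term of eq. (2): 2*N*(2A-1)*log2(e)/2^g bits.
   N = number of symbols, A = alphabet size, g = arithmetic precision
   in bits. Assumption for this sketch: 0 < g < 64. */
double ac_quant_excess_bits(double N, double A, unsigned g)
{
    const double LOG2_E = 1.4426950408889634;   /* log2(e) */
    return 2.0 * N * (2.0 * A - 1.0) * LOG2_E / (double)(1ULL << g);
}
```

For N=10^6 and g=32 this gives roughly 0.0020 bits for A=2 (the same value as eq. (3)) and roughly 0.0034 bits for A=3, which illustrates why this term only dominates for large N or large alphabets.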
Although these differences are clearly quite small, such an observation is somewhat tautological, since the types of inputs considered (probabilistic sources with sufficiently large N) are those on which AC or Huffman coders perform reasonably well. These inputs are merely a subset of all practically important finite sequences, namely those for which the O(log(N)) and O(1) terms don't amount to very much relative to the output size. A high precision coding tool such as QI, or even the unlimited precision EC, will by the very definition of such a subset improve very little over AC or Huffman on that subset.

To put it another way, if we were to take modern high precision surgical instruments to ancient Roman physicians to test and evaluate, they would find very little if any gain for "surgery" with the new instruments. Of course, what they considered "surgery" consisted only of procedures which could be done well and safely with the more blunt instruments they had. Much of the vast realm of present day surgical procedures, which are accessible only to the high precision modern instruments, would have been invisible or well outside of what they understood as a conceivable "surgery".

Similarly, there is a large unexplored realm of practically important finite sequences and associated modeling and compression algorithms for such sequences, all virtually invisible from the probabilistic parametrization & predictive/adaptive modeling viewpoint. From that vantage point, such sequences typically appear as highly fragmented and very unpredictable from one fragment to the next, with "low" intra-fragment entropy but "high" inter-fragment (combined) entropy, thus they are incompressible by virtue of noise from the O(1) and O(log(N)) excess terms in D becoming comparable in size to the encoded fragment sizes.
QI, being a high precision ultra low noise coding instrument, optimal at any given arithmetic precision, yet _simultaneously_ extremely fast across the spectrum of inputs and universally applicable, opens the gates into this untapped realm of finite sequences, modeling and compression algorithms, in which the relative gains vs AC or Huffman are not limited to a mere few percent or even to 100%.

To illustrate this point, consider the BW transform output column R (cf. Fig. 5, pp 34-35, [T2]). For increasingly long right contexts of the symbols in R, R becomes increasingly fragmented, with low intra-fragment and high inter-fragment entropy (when partitioned at the MDL optimum), which are well beyond the useful resolution of AC or Huffman to encode separately and thus optimally (both perform very poorly when applied directly to R, be it in one piece or at any level of fragmentation). The optimum segmentation of R (in the MDL sense, which is computable via Huffman-like bottom up "MDL greedy" context merging) will generally vary in context depth across R, and it is a function of coder precision & noise, with longer contexts, thus finer partitions, accessible only to high enough precision coders.

In the absence of a high precision low noise coder, R is presently coded with blunt ad hoc tools, such as MTF, which obliterates all the context boundaries, so finely drawn just a moment before by the BW transform. The MTF is then usually followed by one or more of runlength, Elias, Huffman codes, sometimes by AC (with little practical benefit to compensate for the speed penalty). As a simple indicator of the degree of sub-optimality introduced by the existing second phase BWT methods, the output size is measurably sensitive even to a mere 1-1 remapping of the alphabet (sensitive enough that specialized algorithms have been developed to search, a la TSP, for the optimum BWT alphabet mappings).
In addition to the general compression scenario where BWT is used, many specialized problems of great practical importance offer a glimpse at this same unexplored realm waiting for the high precision, low noise coder. Most notable among these are incremental frame updates in video coding, be it for live video or for recording & playback media. The residue after the removal of inter-frame redundancies, including the motion compensation, is a collection of small, uncorrelated fragments with relatively low intra-fragment and high inter-fragment entropies, but where the O(1) and O(log(N)) terms of AC or Huffman, accumulated over multiple fragments, would wipe out any gains if one were to try coding such fragments separately. Even the conventional EC in a suboptimal hierarchical approximation has been found overall advantageous vs AC in this domain. A similar conflict between the optimum MDL segmentation and the coder noise is encountered in various forms across the image coding domains, especially for methods based on hierarchical set partitioning (cf. citations on p. 2 [T3]).

As the last example of practically important coding tasks in the high precision coder realm, we note the coding of complex data packages, which often arise as a result of serialization (e.g. for network transmission, messaging, storage, output of interpreters, etc), and which contain many small elements, mutually unrelated at any level that a coder or its modeler would have access to, and which have low intra-fragment and high inter-fragment entropies. In most cases, trying to code such data via AC or Huffman would likely increase the size, thus one often simply stores such data as given, or at best, if the programmer time is inexpensive enough, uses some minor specialized ad hoc tidying up.
For these types of 'complex data package' coding tasks QI provides not only the needed high precision coding, but its advantage (A4), described below, makes possible the traversal of and random access to the individual package components without having to decompress any components, store separate compressed lengths, or pad the compressed sizes to some fixed maximum lengths.

A4) -- STABLE & AVAILABLE COMPRESSED SIZE --

The output size produced by AC (and even more so for Huffman) fluctuates unpredictably from instance to instance even for perfectly equiprobable messages. This is a manifestation of their coding sub-optimality (A3) and their lower resolution parametrization (A2). In contrast, the output of QI is perfectly stable for equiprobable messages (enumerative class), not just down to 1 bit, but down to the exact bit fraction. Namely, since in a suitable coding setup QI's precisely known upper bound on the value of the index allows encoding of the leading g bits of such an index via mixed radix codes (e.g. bundled with other related or unrelated items into a common mixed radix number, cf. N3 & N4, p. 9, [T3]), QI can code such equiprobable messages to sizes identical to the exact bit fraction. AC normalizes all index bounds to 1.00.. (following its (A2) parametrization prescription), which obliterates the precise upper bounds for the index which exist for finite sequence enumeration.

Consider, for example, a permutation of N elements (cf. pp. 49-50 in [T1], also permutation coder in Radix.c [QIC]). QI will encode every instance of such a permutation into log(N!) bits to the exact bit fraction (which is within 1/2^(g-1) from log(N!)). AC, even coding in its most exact mode (enumerative AC mode of [34]), will produce at least O(1) variation in output size from instance to instance, and for the conventional adaptive or static AC implementations, also the O(log(N)) variation, while Huffman will produce a huge O(N) variation.
Therefore, if one has to store such output in a fixed size field (as often required for compressed database/search engine fields or when bundled into fixed structure packets), with AC & Huffman one has to reserve the space for the worst case input instance (the optimum estimate of which, especially if a 100% guarantee is required, may be non-trivial for AC, while Huffman becomes largely unsuitable due to its huge O(N) variation), adding to redundancy, lowering performance and significantly complicating the development of such compressors.

In addition to stable output size for stable input entropy, QI has the actual precise compressed size (precise down to 1/2^g bit fraction) readily available from the "enumerative class tag" such as N for permutations (cf. p. 3 [T3], more detail in [T2] p. 27), which is the info it needs to know for decoding anyway. But QI can have this precise compressed size _without decoding_ the index and without storing the compressed size separately -- it is available precomputed, from its quantization tables (e.g. quantized N! or quantized binomial C(n,k)).

In contrast, to find out the compressed size without storing it separately, AC would need to decode the entire output, with possible overruns beyond the end of the compressed buffer (that occurs when AC codes most tightly, in "frugal bit" mode, where it reads up to g bits beyond the end of compressed data, then after decoding "pushes" the unused bits back, which means that outside information needs to be kept to prevent it from accessing unmapped pages, or a g bit padding needs to be added). Similarly, a Huffman coder would need either to decode the entire permutation or to store the compressed size separately.
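The permutation example can be made concrete with a small sketch of exact (unquantized) enumeration, assuming N <= 20 so that the index fits in 64 bits (20! < 2^64). It is not the quantized permutation coder from Radix.c; perm_rank and perm_unrank are hypothetical names for the textbook Lehmer code ranking, written as a mixed radix number with radices N, N-1, ..., 1. The point it illustrates is that the index is always below N!, so every permutation of N elements occupies the same number of bits and that size is known from N alone, before any decoding.

```c
#include <assert.h>
#include <stdint.h>

/* Rank a permutation p of 0..n-1 into an index in [0, n!).
   Illustrative sketch only (hypothetical names, exact arithmetic, n <= 20). */
uint64_t perm_rank(const int *p, int n)
{
    uint64_t index = 0;
    assert(n >= 1 && n <= 20);             /* 20! still fits in 64 bits */
    for (int i = 0; i < n; i++) {
        int smaller = 0;                   /* later elements smaller than p[i] */
        for (int j = i + 1; j < n; j++)
            if (p[j] < p[i])
                smaller++;
        /* digit 'smaller' has radix (n - i): append it to the mixed radix number */
        index = index * (uint64_t)(n - i) + (uint64_t)smaller;
    }
    return index;
}

/* Inverse of perm_rank: rebuild the permutation from its index. */
void perm_unrank(uint64_t index, int n, int *p)
{
    int digits[20];
    int used[20] = {0};
    assert(n >= 1 && n <= 20);
    for (int i = n - 1; i >= 0; i--) {     /* peel digits off, lowest radix first */
        digits[i] = (int)(index % (uint64_t)(n - i));
        index /= (uint64_t)(n - i);
    }
    for (int i = 0; i < n; i++) {
        int k = digits[i];                 /* pick the k-th still unused value */
        for (int v = 0; v < n; v++) {
            if (used[v])
                continue;
            if (k == 0) { p[i] = v; used[v] = 1; break; }
            k--;
        }
    }
}
```

For N=3 the permutation {2,0,1} gets index 4 and {2,1,0} gets index 5 = 3!-1, the largest possible value.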
The combination of availability and stability of output size makes QI ideal for database & search engine coding tasks, where one needs not just the stable and predictable compressed size, but also the ability to quickly traverse and randomly access compressed components of the complex packages (db records, serialized packages, compressed structured documents, such as html & xml pages, spreadsheets, etc), without decompression. As noted at the end of (A3), this QI property, even though it is merely another facet of its high precision, low noise coding, is an entirely separate lever from the redundancy considerations of (A3), contributing independently to the 'opening the gate' into the presently untapped realm of high precision coding.

======= ONE COMMON QUESTION =======

> Willem: ... assuming I have a stream of symbols, where
> at each position in the stream, the probability distribution
> of the symbols is different, then how does QI coder adapt
> itself to all those different distributions ?

The answer was given as methods (a) and (b) described in the post:

http://groups.google.com/group/comp.compression/msg/1314ff87da597fad

--- References ( http://www.1stworks.com/ref/RefLib.htm )

QIC. QI C source code research kit, publicly available at:
     http://www.1stworks.com/ref/qi.htm

T1.  R.V. Tomic "Fast, optimal entropy coder"
     1stWorks TR04-0815, 52p, Aug 2004
     http://www.1stworks.com/ref/TR/tr04-0815b.pdf

T2.  R.V. Tomic "Quantized indexing: Background information"
     1stWorks TR05-0625, 39p, Jun 2005
     http://www.1stworks.com/ref/TR/tr05-0625a.pdf

T3.  R.V. Tomic "Quantized Indexing: Beyond Arithmetic Coding"
     arXiv cs.IT/0511057, 10p, Nov 2005
     http://arxiv.org/abs/cs.IT/0511057

34.  J.G. Cleary, I.H. Witten "A Comparison of Enumerative and Adaptive Codes"
     IEEE Trans. Inform. Theory IT-30 (2), 306-315, 1984
     http://www.1stworks.com/ref/Cleary84Enum.pdf

41b. B. Ryabko, A. Fionov "Fast and Space-Efficient Adaptive Arithmetic Coding"
     Proc. 7th IMA Intern. Conf. on Cryptography and Coding, 1999
     http://www.1stworks.com/ref/RyabkoAri99.pdf

1/22/2006 8:41:59 AM

"nightlight" <nightlight@omegapoint.com> wrote in news:1137919319.704116.113760@g44g2000cwa.googlegroups.com:

> Since the debate has wound down, here is a condensed summary of the
> differences between QI and arithmetic coding (AC) all in one place.
>

It only appeared to wind down when you seemed to give up on the contest. I assumed you realized you were wrong.

....

> In the wider realm of practical coding, AC's loss of resolution
> in the space of possible parametrizations has generally narrowed
> down the spectrum of modeling algorithms useful with AC,

If this was true then from your early statements it should be child's play to put your coder inside Matt's PAQ, which has won several times for the best compressor. It would make QI shine if this was possible. All you have shown so far is how easy and flexible arithmetic coding is when you successfully modified Moffat's code to fit your needs. Apparently so far it was too hard to get QI to fit in the world of Moffat and use real files so anyone could check with the files of their choice.

.......

>
> Note that in our earlier 'contest' we had used A=3, which is via
> (2) the most favorable A>2 test for AC since using the smallest
> non-binary alphabet size A minimizes the O(N) term. Had we used
> a 32 bit value A32 for the alphabet size, QI's redundancy would

Look, you picked the 3 alphabet size so that AC would look bad. I guess that here you recognized that QI can't start to compare to AC for the "contest". It is nice to have something that works on files, isn't it? Not sure if I trust you on the A32 contest, or whatever you mean by that; you have yet to show code for the other contest that can use files.

First of all, not sure you have a grasp of what you are proposing. If one had such a large number of symbols, 2**32, it's not likely that for any reasonable length file any symbol would appear more than once. Have you actually thought this through?

Just wondering, and I should know better than to ask, but have you based this so-called error on the number of symbols? Or have you used what I think most here would like to see, based on FILE LENGTH? You can't get many symbols in a test file if they are 32 bits in length. If it's based only on the number of symbols, the test files are going to be very long before you get to the same number of symbols as in a short file of only three symbol types. In fact, if they are completely random and independent of each other it's tough to beat the copy function.

.....

> i.e. even for binary alphabets, which is the best case for AC,
> the AC's sub-optimal quantization "leaks" four times more
> excess bits per symbol than QI's optimal quantization.
>

Strange, this best case, which I assume is better than the 3 symbol case, "leaks four times more excess bits per symbol than QI's optimal quantization". If this is so why could you not complete the contest? Why does it appear that the poor simple arithmetic coder, from one so low that he can not write fancy articles like you, seems to win the contest? Oh I see, the key words "QI's optimal quantization": is that a fancy trick to mean that if you ever understand the problems involved then in theory, someday in the distant future, someone could actually optimally finish the code, so in theory you think it would be better?

That's all nice and interesting, but I would like not to use AC's best case and go with what was originally your idea. My only contribution to this was to suggest making it really a practical contest and use files. After all, you still must believe that QI could somehow be magically transformed to do the simple contest. You have already proved how flexible AC code is by your own work in modifying Moffat. Surely, since QI is so much better and so much more flexible etc., it would be child's play to make a version that works on files as in the "contest" you are referring to.

.....

> compensate for the speed penalty). As a simple indicator of the
> degree of sub-optimality introduced by the existent second
> phase BWT methods, the output size is measurably sensitive even
> to the mere 1-1 remapping of the alphabet (sensitive enough,
> that specialized algorithms have been developed to search, a la
> TSP, for the optimum BWT alphabet mappings).

I actually look forward to your attempt to improve BWT. I think it will give me the motivation I need to improve my own code in that area. I hope that your method works better here than what so far seems like a shortcoming from the "contest". In BWT I don't have the best or close to it. But it might be fun to work on it if QI and your programming team ever make a file version so others can test it. GOOD LUCK

.....

>
> A4) -- STABLE & AVAILABLE COMPRESSED SIZE --
>
> The output size produced by AC (and even more so for Huffman)
> fluctuates unpredictably from instance to instance even for
> perfectly equiprobable messages. This is a manifestation of
> their coding sub-optimality (A3) and their lower resolution
> parametrization (A2). In contrast, the output of QI is perfectly
> stable for equiprobable messages (enumerative class), not just
> down to 1 bit, but down to the exact bit fraction.

It's interesting to note that the file size I compressed to for the contest didn't seem to fluctuate at all for a wide range of input messages which were all the same length and perfectly equiprobable. However, as noted in the counting theorem, there can not be this perfect match up. There will have to be some message length for which some perfectly equiprobable messages map to some N and some to N+1. This again is the result of the counting theorem: either you have found a clever way to avoid this problem, or you have gaps in your final compression, thus wasting space, which does not seem to happen in simple bijective arithmetic coders.

I love this, it's really funny. QI "down to the exact bit fraction". It's really funny when you think about it. I have bad news: this knowing it down to the exact bit fraction did not seem to help in the contest. Why is that? Is it because you still don't understand the full scope of the problem when it comes to compressing? If you had an understanding of the problem you would not have put AC down with what you seemed to first think would cause a problem, the compression of 3 symbols, "the contest". Now it seems that it's not so bad as you first thought, is it? Just what is it about no gaps and bijective to the set of files we are trying to compress that you don't yet seem able to grasp?

It reminds me of when in high school I won the school chess championship. I was not a member of the chess club at the time. The guy I was playing at the end stated he was hot and had read all the books. He asked why a freshman would even enter the contest. I had not read the books. He could think ahead several moves, I only one. Well, I beat the cry baby; he had all kinds of excuses and was real read up on chess. I played him a few games for fun and waxed him over and over.

Yes, you are better read than me, but can you write a real compression program, even a simple one, for the contest that QI should have easily won? Well, I don't know; so far I see a lot of quoting of text, which would make you a good tech writer, but when the shoe leather has to hit the ground I don't see anything yet. But then again that's only what it looks like to me; surely it's different for other people.

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptographic system is only as strong as its weakest link"

1/22/2006 4:07:10 PM

> Apparently so far it was too hard to get QI to
> fit in the world of Moffat and use real files
> so anyone could check with the files of their
> choice.

QI source code & executable (which are available for anyone to verify as they wish) give you OutSizeQI in bits to use for the compression ratio vs any AC you may have:

    RC = OutSizeAC [bits] / OutSizeQI [bits] ... (1)

using memory buffers that you can fill with any data you wish. When you can give a rational explanation as to how writing the buffers to a file and reading them from it can change RC in (1), then I will consider your suggestion for "upgrading" the QIC kit with file I/O. So far, you haven't given a _single rational reason_ as to how RC in (1) can change by merely writing & reading the memory buffers to/from the disk. We all know why you don't want to say it, see the method (b) described here:

http://groups.google.com/group/comp.compression/msg/8d62e372056d9d53

All you have been offering is a repetitive lame whine like the one above: "so anyone could check with the files of their choice..." which is complete BS, given that with the QI source _publicly available_ anyone can already check on any input pattern given to the coder via a memory buffer (which includes, obviously, any pattern you may read & write from/to a file). When you give a rational reason that is consistent with all the facts, such as the availability of QI source at:

http://www.1stworks.com/ref/qi.htm

and the intended audience & purpose for the source release:

http://groups.google.com/group/comp.compression/msg/6d8dcafd8b947ea1

then I will listen. Otherwise you might as well insist that the file used in the QI "upgrade" you are demanding for a "real" test must also be named SCOTT1234.txt to count as a "real" test in your book (which will affect RC in (1) by about as much as any other file name, or any file at all).
One difference that adding file i/o to the QI source kit would make, though, is to lower the signal to noise ratio on (A1), the speed advantage over AC, as well as on (A3) (the optimality, e.g. by insisting on counting output sizes in bytes only) and (A4) (stability & availability of the precise output size). Why would I care, given the absence of any rational basis to believe that RC in (1) will change from writing memory buffers to a file, to waste time and to expand the QI source size just to add gratuitous noise on top of the (A1), (A3) & (A4) signals?

> Look, you picked the 3 alphabet size so that AC would
> look bad...

It was you who proposed the A=3 test (your ABC), not me. I only responded when Matt Mahoney reported his first results on your ABC test "challenge":

http://groups.google.com/group/comp.compression/msg/eb1fed7f8181bd31

that the QI.exe included in the source kit already has a command line which can give you the answer for symbols 0,1,2 instead of ABC for N=10^6 (or with recompiling for N=10^7). My response to Matt's post was here:

http://groups.google.com/group/comp.compression/msg/ff1ee67d18b63f5a

which you questioned (apparently without reading it at all, since all you asked was already answered in the post). Then all of it was re-re-re-explained to you again here:

http://groups.google.com/group/comp.compression/msg/508e96ebcb4577f1

and more here (on fractional bit sizes):

http://groups.google.com/group/comp.compression/msg/1e015f38d228969e

and here a few more times:

http://groups.google.com/group/comp.compression/msg/6d77316462d9ee42
http://groups.google.com/group/comp.compression/msg/1a6773533448f7de
http://groups.google.com/group/comp.compression/msg/69ff2ec175f1a5ed
http://groups.google.com/group/comp.compression/msg/a7be8670d64e9f25

.... etc, round and around. You ought to go read all that before asking the same things for the 30th time.
As to the test results on A3, Matt showed his results, I got Moffat98 results, all consistently showing significant AC output excess over QI. Since you declined to test (or at least didn't report your results) on a proper random sample, as explained in the posts above, you have no claim to be compared to anything.

The point made in (A2) with A32, which for whatever reason you keep misstating exactly upside down in your post above, is that some larger alphabet A32 >> 3 will only increase the O(N) component of the AC excess vs QI, given as D in (A2). This component will grow as A32/3. QI.exe will show you the excess for any A up to A=2^32-1 (as already explained, use QI.exe cmd line "cr<alphabet_size_A>" to see it).

> If this is so why could you not complete the contest?

There is no contest left regarding the ratio of output sizes RC. As to your loop:

    while(1) printf("Yeah, but what about file i/o?");

you first need to explain how the file i/o will change the ratio RC in (1), the answer for which we already know for memory buffer tests, Mahoney & Moffat98 providing AC sizes in (1) and QI.exe providing QI size in (1) (which anyone can verify with the QI source kit as explained 30 times before).

> First of all, not sure you have a grasp of what you are proposing.
> If one had such a large number of symbols, 2**32, it's not likely
> that for any reasonable length file any symbol would appear
> more than once. Have you actually thought this through?

It appears someone else here might need a bit of "grasping". Take some alphabet size A such that 2^31 < A < 2^32, for our N=10^6, and run your AC on it.
For example, I just ran QI.exe using the command line (you are welcome to verify this):

    QI cr3133061822 n1e6 i10

which is a test for A=3,133,061,822, N=10^6 and i=10 iterations with randomly generated inputs (the iteration count doesn't matter for QI since, as pointed out in (A4), QI always produces exactly the same size, including the max index value, which in this case will always have the top 32 bits smaller than 0x8865BAF8), and the QI output size for the index is:

    Q = 31544926.09167... bits

which is about 2.422e-4 bits above the lower bound N*log(A) for this index. That amounts to a QI excess of 2.422e-10 bits per symbol. You can run your AC on this A & N and tell me what you get. Measure the total size and the maximum variation in size (with AC you get only whole bits since its exact upper bound on the index is obliterated due to normalization of the index to 1.0, which in turn is done to comply with AC's coarse-grained probabilistic parametrization for finite sequences, enumerating them on the cumulative probability scale, as explained in (A2)). Recall also that just storing 32 bits/symbol for N=10^6 will use 455073 bits _more_ than QI's output of 31544927 whole bits.

> It's interesting to note that the file size I compressed
> to for the contest didn't seem to fluctuate at all for a
> wide range of input messages which were all the same
> length and perfectly equiprobable.
...
> I love this, it's really funny. QI "down to the exact bit
> fraction". It's really funny when you think about it.

The question is not "file size" (which is rounded to the next byte at best, or even sector or cluster on the disk) but size in bits (or bit fractions). As explained in (A4) with several examples, there are many practically important cases when size in bits, and even size in bit fractions, matters (whenever you have many such little leftovers).
That you can imagine or cite cases where such precision doesn't matter, such as when storing output into a free size file on the disk, is a non sequitur for the point made in (A4). The point in (A4) is that there are practically important cases where such precision does matter, and there QI is distinctly ahead of AC (let alone Huffman).

That the bit fractions do matter (and what they mean) should already be obvious even from our tests on A=3, where we were trying to pack symbol x<3 as closely to the exact bit fraction log(3)=1.584962... as possible, instead of just storing each symbol in 2 whole bits, or coding it in a Huffman code such as 0:0 1:10 2:11 (which gets you 1.66.. bits/sym on average). The point of (A4) in this case (A=3) is that if you need to store compressed inputs with N=10^6 symbols into fixed size fields (or skip quickly over compressed data in a larger package containing them), with the flat 2-bit code per symbol you will need to reserve space for 2*10^6 bits. Huffman, which on average reaches 1.66.. bits/symbol, will also need to reserve 2*10^6 bits to guarantee that _all_ sequences from the set of possible inputs (I called it Set_QI) will fit into the reserved space. With AC, depending on implementation, you may need about 20-60 extra bits, as in Mahoney's reported tests, to guarantee fit for _all_ possible A=3, N=10^6 sequences (much more for Moffat98, which has a sub-optimally skewed quantization, as explained here:

http://groups.google.com/group/comp.compression/msg/efa6336f483bbb89

in a post to Thomas Richter).

The alternative to reserving the extra space with AC and Huffman, in cases where you only need to traverse the compressed data _quickly_ (which means without decompressing) within a larger package of items, but not store it into fixed size record/packet fields, is to store the compressed length separately, in addition to storing N and A (for a self-contained compressed package).
In contrast, with QI coding these A=3, N=10^6 inputs, you not only know that the index will certainly fit in precisely 1584963 whole bits, but you _also_ know from the quantized power q[A^N] (which is the QI table entry) that the leading 32 bits of the index will always be smaller than the 32 bit value 0xB521509A (which means you can package the index via mixed radix codes with other items to reclaim the N*log(3) bit fraction to within 1/2^31 bits from the exact N*log(3) lower bound). QI also does not need to store the compressed length separately -- all it needs are the A and N values, which it already needs anyway (just like the other coders need A and N in some form) to know the exact index size and its exact upper bound.

This precision is a consequence of QI output optimality (it produces the index closest to N*log(3) for any g-bit addends which also satisfy the Kraft inequality, eq. (20) in [T3]) and its finer-granularity parametrization (A2). That was all explained at length in the earlier posts cited above. The (A4) only summarizes the main points.
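The space reservation arithmetic for the A=3 case above can be checked with a few lines of C. This is only a sketch of the size bookkeeping, not of any coder: the function names are hypothetical, log2(3) is hard-coded as a constant so that no math library is needed, and the Huffman code is the 0:0 1:10 2:11 code quoted above. It shows the gap between what has to be reserved for the prefix code (2 bits for every symbol in the worst case) and the whole-bit size of an index over all 3^N sequences.

```c
#include <assert.h>
#include <stdint.h>

#define LOG2_3 1.5849625007211562   /* log2(3), hard-coded (assumption: double precision suffices) */

/* Whole bits needed for an index over all 3^n ternary sequences,
   i.e. the ceiling of n*log2(3). Hypothetical helper, not QI code. */
uint64_t ternary_index_bits(uint64_t n)
{
    double bits = (double)n * LOG2_3;
    uint64_t whole = (uint64_t)bits;
    return (bits > (double)whole) ? whole + 1 : whole;
}

/* Space to reserve so that every length-n input fits when coded
   with the prefix code 0:0 1:10 2:11 (worst case: no symbol is 0). */
uint64_t huffman3_worst_bits(uint64_t n)
{
    return 2 * n;
}

/* Bits the same prefix code spends on one concrete sequence. */
uint64_t huffman3_bits(const unsigned char *sym, uint64_t n)
{
    uint64_t bits = 0;
    for (uint64_t i = 0; i < n; i++) {
        assert(sym[i] < 3);              /* symbols are 0, 1, 2 */
        bits += (sym[i] == 0) ? 1 : 2;
    }
    return bits;
}
```

For N=10^6 the first function gives 1584963 bits, against 2000000 bits that must be reserved for the prefix code.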

1/22/2006 8:24:00 PM

"nightlight" <nightlight@omegapoint.com> wrote in news:1137961440.803193.5700@g44g2000cwa.googlegroups.com:

> Subject: Re: Quantized Indexing Source Code (update & alg. history)
> From: "nightlight" <nightlight@omegapoint.com>
> Newsgroups: comp.compression,sci.math
>
>> Apparently so far it was too hard to get QI to
>> fit in the world of Moffat and use real files
>> so anyone could check with the files of their
>> choice.
>
> QI source code & executable (which are available for anyone to
> verify as they wish), gives you OutSizeQI in bits to use for
> the compression ratio vs any AC you may have:
>
>

Apparently I misunderstood your comments below in message number 94:

http://groups.google.com/group/comp.compression/browse_frm/thread/7053c23c0d01c81c/1e015f38d228969e?lnk=st&q=1584993&rnum=1&hl=en#1e015f38d228969e

QUOTE ON

Hence the total decodable output for N=10^6 symbols, A=3 is:

    DECODABLE OUTPUT = 1584993 bits = 198124.125 bytes

That size is fixed, the same for all inputs. You can check the whole code-decode-compare in the function radix_iter() in Tests.c file.

QUOTE OFF

Maybe my code that is in http://bijective.dogma.net/nitelite.zip is way off base and doesn't seem to compress to 1584992 bits or 198124 bytes. That is if I put it to a buffer and included in a poor way the length field. Actually the compressed file length was 198121 bytes.

Maybe this is all a dream of mine. If so I GIVE UP YOU WIN. I HAVE WASTED ENOUGH TIME TRYING TO ARGUE WITH YOU. DO YOU UNDERSTAND? I GIVE UP, YOU WIN. I GOT TIRED BEFORE YOU DID, THEREFORE YOU WIN. I WILL NOT POST IN THIS THREAD ANYMORE SO GO AHEAD, BE THE LAST ONE TO POST.

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptographic system is only as strong as its weakest link"

1/22/2006 10:07:29 PM

I guess your AC must have choked on the 32 bit alphabet size A32=3,133,061,822. Otherwise, why would you start recycling "results" from your "random unbiased sample" of five cherry picked files where even ZIP beats the N*log(3) "lower bound" on index size by about 200 times? To say nothing of comparing apples & oranges in other ways as well (such as adding different costs for N etc, or using 64 bit AC,... see old posts). We don't need to waste further time flogging that particular dead horse.

We also have Matt's and Moffat98 results (you have the Moffat98 source code link to verify my statement on that) on the proper unbiased random sample already (you're using Matt's AC anyway). And the QI source is there to test using the command line as described. That's all you or anyone else needs to find out for themselves.

> I GIVE UP YOU WIN.

This has as much to do with a "win" as does winning on: "which is bigger, 2+2 or 3+3?" It is an elementary mathematical fact (see [T3] pp. 2, 8 and [T2] pp. 22-25, and section A3 in the "summary" post). There is as much of a "contest" and "win" about it as in the "2+2 vs 3+3" contest.

The room QI leaves for improvement in these N,A tests is very tiny, a gap from QI to the entropy N*log(A) of about 2e-10 bits per symbol for _any_ alphabet size, or 2e-4 bits for the total N=10^6 symbols (well within 1 whole bit of entropy). Not even the _infinite precision_ AC will beat that little gap (it will be 1-2 bits above the N*log(A) lower bound). The only coding algorithm using limited precision arithmetic which could beat that tiny bit fraction left is another version of QI (or an equivalent optimal, hence bottom-up quantizer) using higher precision than the g=32 bit arithmetic included in the source (at any given addend precision, the QI addends are mathematically the smallest ones which also satisfy the Kraft inequality, eq. (20) in [T3], and the pigeonhole principle, eq. (8) in [T3]).

1/22/2006 11:27:40 PM

Hi,

In comp.compression nightlight <nightlight@omegapoint.com> wrote:
> Since the debate has wound down, here is a condensed summary of the
> differences between QI and arithmetic coding (AC) all in one place.

Question: What is the purpose of this post?

a) Keep QI in discussion? (for example for b)
b) Promotion of QI?
c) a scientific discussion.

In case of a) or b), you are wrong here. This group is not your favourite advertisement place. We all work for companies that pay our bills and are all aware of that, but you should try to loosen this relation a bit when posting here.

In case of c), you lack counterarguments that have been presented in the discussion, and your posting requires shortening.

/* historical side remarks snipped */

> For the rest of this post I will address the QI's solutions
> for the four principal remaining weaknesses and tradeoffs
> introduced by the AC algorithm.

> A1) -- SPEED & POWER CONSUMPTION --

> The AC use of complex, power hungry instructions (mul/div when
> coding at its maximum precision) and the requirement for coding
> operations on the more probable symbol (MPS) results in speed &
> power consumption penalties (the latter becoming increasingly
> important).

Objection, your honor. Whether mul/div is "slow" compared to the QI table lookup (that is not mentioned here) is a question of the table size and the framework the algorithm is implemented for. You don't discuss the table sizes. Do they grow with the data set?

Side remark: State of the art desktop processors are *much* faster on mul/div than on table lookup due to lack of caching.

> In contrast, QI performs no coding operations on the
> MPS and it uses fewer and simpler instructions on the less
> probable symbol (LPS).

Objection, your honor. If you mention that QI performs no coding operation on MPS, you should mention that there are variants of AC for which this holds. MQ is one of them. So what are you targeting? Speed, or optimality? I don't think you can have both.

> These two performance distinctions extend
> to a general alphabet of size A coding, through all A-1 internal
> nodes of the binary decomposition of the alphabet (cf. [T1], pp.
> 32-38).

> Variety of details giving rise to the significant QI speed
> advantage in different coding settings fit together in a general
> pattern of QI's much better division of labor at all levels --
> within the functions of the coder itself and extending to a
> similarly improved division of labor between the coder and the
> modeling engine.

In which sense "much better"?

> Within the coder proper, QI separates cleanly the general
> combinatorial properties of all symbol sequences satisfying some
> types of conditions (the "enumerative classes") from the
> incidental properties distinguishing individual sequences of
> that type. The general part (the quantized enumerative addends
> via eq. (21) p. 8, [T3]) is computed up front, once and for all
> and outside of the coding loop, with the results saved into
> universal tables (which are independent of source
> probabilities).

Here we have a problem. How large are these tables? If I want an improvement over AC on messages of length N -> infinity, I would suppose the size of the tables has to grow as some power of N to have all the binomial factors covered. Comments?

> The coding loop for a particular instance does
> only the absolute minimum work that deals exclusively with the
> individual properties of that instance (the index computation,
> eqs. (21),(22) p. 8 [T3]).

> Similarly, QI/EC modeling engine (cf. p. 27, [T2]) processes the
> entire finite sequence being encoded, decides on its
> decomposition into enumerative classes (ranging from simple
> segmentation of the input into fixed or variable contiguous
> blocks, through BW transform and selection of the optimal BW
> output column segments), then hands its complete output to the
> suitable enumerators within the encoder for index computation
> within the selected enumerative classes. Both components, the
> modeler and the coder (enumerator), perform their specialized
> tasks on the entire input, without interacting symbol by symbol
> as done with AC.

Objection: If we deal with non-all-purpose compressors, you often have to deal with pre-defined alphabets. AC can handle this fine because the modeller is a separate stage. How would QI handle this?

> Therefore, the speed & power consumption edge of QI over AC is
> not a result of a coding trick or a lucky pick of parameters
> which happens to work well in some cases or any such accidental
> circumstance. The QI speed gains are large for all inputs -- for
> all source parameters and for all input sizes.

See above. No serious speed analysis has happened yet. I personally wouldn't care, but since you want to discuss it, you should use a scientific method to do it.

/* snip */

Paragraph about Ford snipped. What's the value of this paragraph? It's not part of a summary.

> A2) -- PROBABILISTIC PARAMETRIZATION --

> The AC reparametrization of EC enumeration of finite sequences
> into the probabilistic framework (where the exact combinatorial
> parameters of finite sequences are replaced with normalized
> limit values of infinite sequences), can generally provide only
> a lower resolution, approximate models for a much richer space
> of finite sequences coding problems (which includes all
> practical coding tasks). In contrast, QI modeling interface uses
> precise finite sequence parameters, which are richer, sharper
> and more flexible language for modeling finite sequences.

Might very well be.

> As result, in the field of constrained coding (used e.g. in
> recording media and general constrained channel coding), where
> such finer controls over the precise finite sequence parameters
> are vital, EC remains the method of choice, despite the
> intervening advent of AC and the performance drawbacks of the
> unlimited precision EC.

Why the method of choice? Which things become simpler, which things become harder? Where to put pre-knowledge on the data if there is no model, for example? (I'm not saying it's good or bad, I just lack a clear methodology here.)

> In the wider realm of practical coding, AC's loss of resolution
> in the space of possible parametrizations has generally narrowed
> down the spectrum of modeling algorithms useful with AC, to
> essentially the PPM & CTW type of algorithms as the apex of
> modeling. It has also constrained the language and the type of
> parameters that a modeling engine can use to transmit all it
> knows about the finite sequence being encoded to the coders,
> reducing it in practice to 'probabilities of the next single
> symbol'. Yet, presently the most widespread and the most
> practical general compression algorithms, such as LZ & BWT
> families, perform "surprisingly" effective modeling of finite
> sequences in what are intrinsically the finite sequence
> parametrizations (dynamic dictionary entries for LZ, or context
> sorting via BW block transform for BWT) without ever computing
> 'probabilities of the next single symbol' (or any probabilities
> at all).

That's because you're mixing two things here. If you consider BWT + AC as one code, you would see that the purpose of BWT is just to provide an alphabet and symbol definition to the AC to make it efficiently compressible. That is, BWT defines a model that is suitable for most text/program compression. It is a much less usable model for, e.g., image compression.

BWT without entropy coding does not work. Entropy coding without BWT does work, but not very well on, e.g., text files.

Thus, BWT+AC does a lot more than "predict the next symbol". It also defines what "the symbol" should be, in a clever way. That is, the "probabilities" within AC are not "probabilities of words".

> Another side-effect of the ill-fitting parametrization for
> enumeration of finite sequences, is the performance penalty.
> Specifically, if one were to try emulating with AC the
> streamlined table based coding of QI (Schalkwijk has shown how
> this can be done for Elias algorithm, cf. p. 19-20 [T2]), so
> that AC would need encoding operations only for LPS, while
> skipping the MPS, and for LPS to have addends precomputed and
> ready in the table, one would need separate table of the same
> size as QI's table, for each source probability distribution.

That is likely to be correct (I'll buy this).

> In
> binary case this would increase the table size by factor O(n)
> over the QI's universal table size, which is the factor QI
> gained over the exact EC (i.e. AC did not solve at all the EC
> table size problem arising in the high speed mode). In short,
> the probabilistic parametrization lacks the resolution to draw
> the precise line separating cleanly the universal (which could
> be precomputed into a table) from the instance enumerative
> properties of the finite symbol sequences.

This might be, but how does the QI table size grow in n?

/* snip */

> Or, expressed as a ratio of AC's over QI's maximum quantization
> redundancies:

> AC(N) / QI(N) = 4 ... (4)

What is the definition of AC(N), resp. QI(N) here?

> i.e. even for binary alphabets, which is the best case for AC,
> the AC's sub-optimal quantization "leaks" four times more
> excess bits per symbol than QI's optimal quantization.

> { Note: although the expressions (2),(3) refer to the maximum
> redundancies of the coders, the average and pointwise
> redundancies are 2-3 smaller for both coders.

Upper bounds on redundancies, then?
Entropy coding without BWT does work, but not very well on, e.g. text files. Thus, BWT+AC does a lot more than "predict the next symbol". It also defines what "the symbol" should be, in a clever way. That is, the "probabilities" within AC are not "probabilities of words". > Another side-effect of the ill-fitting parametrization for > enumeration of finite sequences, is the performance penalty. > Specifically, if one were to try emulating with AC the > streamlined table based coding of QI (Schalkwijk has shown how > this can be done for Elias algorithm, cf. p. 19-20 [T2]), so > that AC would need encoding operations only for LPS, while > skipping the MPS, and for LPS to have addends precomputed and > ready in the table, one would need separate table of the same > size as QI's table, for each source probability distribution. That is likely to be correct (I'll buy this). > In > binary case this would increase the table size by factor O(n) > over the QI's universal table size, which is the factor QI > gained over the exact EC (i.e. AC did not solve at all the EC > table size problem arising in the high speed mode). In short, > the probabilistic parametrization lacks the resolution to draw > the precise line separating cleanly the universal (which could > be precomputed into a table) from the instance enumerative > properties of the finite symbol sequences. This might be, but how does the QI table size grow in n? /* snip */ > Or, expressed as a ratio of AC's over QI's maximum quantization > redundancies: > AC(N) / QI(N) = 4 ... (4) What is the definition of AC(N), resp. QI(N) here? > i.e. even for binary alphabets, which is the best case for AC, > the AC's sub-optimal quantization "leaks" four times more > excess bits per symbol than QI's optimal quantization. > { Note: although the expressions (2),(3) refer to the maximum > redundancies of the coders, the average and pointwise > redundancies are 2-3 smaller for both coders. Upper bounds on redundancies, then? 
Redundancies measured in what? > The empirically observed compression efficiency differences > shown in the QI vs AC performance table (p. 10, [T3]) are > dominated (in all but the last row) by the contributions of the > O(log(N)) and O(1) terms in D. Objection: There has not yet been an independent empirical observation. For my place mostly because I can't compile your source. (-; /* snip */ Unnecesary small talk snipped. > In addition to the general compression scenario where BWT is > used, many specialized problems of great practical importance > offer a glimpse at this same unexplored realm waiting for the > high precision, low noise coder. Most notable among these are > incremental frame updates in video coding, be it for live video > or for recording & playback media. The residue after the inter- > frame redundancies removal, including the motion compensation, > are collection of small, uncorrelated fragments I object with the "uncorrelated". This one model one typically chooses here. > with relatively > low intra- fragment and high inter-fragment entropies, but where > the O(1) and O(log(N)) terms of AC or Huffman, accumulated over > multiple fragments, would wipe out any gains if one were to try > coding such fragments separately. Even the conventional EC in a > suboptimal hierarchical approximation has been found overall > advantageous vs AC in this domain. More data please. By whom? > A similar conflict between > the optimum MDL segementation vs. the coder noise is encountered > in various forms across the image coding domains, especially for > methods based on hierarchical set partitioning (cf. citations on > p. 2 [T3]). > A4) -- STABLE & AVAILABLE COMPRESSED SIZE -- > The output size produced by AC (and even more so for Huffman) > fluctuates unpredictably from instance to instance even for > perfectly equiprobable messages. This is a manifestation of > their coding sub-optimality (A3) and their lower resolution > parametrization (A2). 
In contrast, the output of QI is perfectly > stable for equiprobable messages (enumerative class), not just > down to 1 bit, but down to the exact bit fraction. Seems to be about correct. So long, Thomas


1/23/2006 12:04:36 PM

Just a quick follow-up to my follow-up: Another item that lacks discussion/summarization if you want to say so: QI is a LIFO coder. Whether this is good or bad depends on the application, but it ought to be said. So long, Thomas


1/23/2006 2:47:16 PM

Thomas Richter <thor@mersenne.math.TU-Berlin.DE> writes:

> > A1) -- SPEED & POWER CONSUMPTION --
> >
> > The AC use of complex, power hungry instructions (mul/div when coding at its maximum precision) and the requirement for coding operations on the more probable symbol (MPS) results in speed & power consumption penalties (the latter becoming increasingly important).
>
> Objection, your honor. Whether mul/div is "slow" compared to the QI table lookup (that is not mentioned here) is a question of the table size and the framework the algorithm is implemented for. You don't discuss the table sizes. Do they grow with the data set? Side remark: State of the art desktop processors are *much* faster on mul/div than on table lookup due to lack of caching.

I believe I already mentioned that the not-yet-hit-the-streets kit on my desk will do 200 multiplies in the same time as one main-memory access. This particular core is one where power consumption has been a priority consideration, being not a desktop processor.

There was no response. Have you just been trolled too?

Phil
--
What is it: is man only a blunder of God, or God only a blunder of man? -- Friedrich Nietzsche (1844-1900), The Twilight of the Gods


1/23/2006 11:16:25 PM

>> Since the debate has wound down, here is a condensed summary of the differences between QI and arithmetic coding (AC) all in one place.
>
> Question: What is the purpose of this post?
>
> a) Keep QI in discussion? (for example for b)
> b) Promotion of QI?
> c) a scientific discussion.

Various aspects of the differences between QI and AC collected and integrated here were scattered over many posts. As stated at the top, I wanted them all in one place, where one can see the relations among them more clearly (as the summary itself shows). As for the larger purpose and intended audience, see the post:

M1. http://groups.google.com/group/comp.compression/msg/6d8dcafd8b947ea1

> case of c), you lack counterarguments that have been presented in the discussion, and your posting requires shortening.

The two requirements above are mutually exclusive. As to counterarguments, it is an open group. There were some groundless objections already that were dealt with in the two followups from me.

> Whether mul/div is "slow" compared to the QI table lookup (that is not mentioned here) is a question of the table size and the framework the algorithm is implemented for.

First, mul/div will consume more power than a simple table lookup, especially if tables are in ROM on a portable device (where power consumption is a critical parameter). The speed ratio is a more complex matter due to cache considerations and that's why the tests were done and the [QIC] source released.

> You don't discuss the table sizes.

The table sizes depend on the type of enumerative classes (cf. N2, p. 9 [T3]). For permutation codes they grow linearly with N (number of items permuted) and similarly with general mixed radix codes (N=number of digits). For general entropy coder, the binomial tables have N^2/4 entries, where N is the max block size. The [QIC] source uses 32 bit precision and it keeps only the mantissa in the tables. It computes exponents via an array containing N+1 entries for log(x!) in fixed point. Hence for general entropy coding the tables are N^2 bytes. The block size N in the source can be set from 32 to 16384. The "typical" block size (e.g. as used in the tests) for binary coder is N=1K which makes the table size 1MB. This table can be cut in half by storing only every 2nd addend front (the row of Pascal triangle) and interpolating in between via C(n,k)=C(n-1,k)+C(n-1,k-1) (this also reduces quantization errors since the interpolated SWI addends are not quantized but the two terms are added directly into the index). The mixed radix codes given in [QIC] use this "skip" method with factor A (alphabet size) instead of 2, which makes their redundancy independent of the alphabet size (unlike AC where the quantization redundancy for N symbols is proportional to A*N).

Regarding cache:

C1) Not all of N^2/4 entries need to fit in the cache. Namely, in multiblock coding of a stationary source the table entries used are along the line y=p*x on the (x,y) lattice, where p= probability of 1 (LPS), within a band of width sqrt(N) (for not overly low entropy), which means you need approx. N^(3/2) entries to fit in the cache. For N=1K, that is about 3% of the 1MB, or 32KB, well within the desktop processors cache sizes.

C2) The cache miss cost can be greatly reduced to practically negligible in the QI case due to the orderly nature of table accesses. Namely, QI coder can simply read but not use the table addend right away. Several statements later (or even on the next or later loop pass) it can use the retrieved addend for addition, which by now was long preloaded into the cache from the main RAM. A very simple variant of this method (which is all I tried), in which the table mantissa gets loaded one loop cycle ahead in the very basic simpleminded implementation, improves QI speed by about 30% on the machines with the largest cache penalty.

C3) The CPU/CACHE vs RAM speed ratio C/M has been evolving. The machines (all win/intel) we have around span 1996-2005 (the oldest ones are still kept for testing). Looking at the worst case QI vs AC speed ratio (the high entropy limit), on the oldest machines the ratio was 6-7, then by 2000 it dropped to 4-5, then it rose again back to 6-7 on the latest machines. (This is all without any cache specific code optimizations, which will work better on newer CPUs due to greater parallelism.) The trend seems to be that the memory bus speed has been catching up with the CPU speeds.

C4) For very low entropy sources, the [QIC] source includes a specialized "sparse coder" with tables which cover only a narrow strip of width MK (Max K, this can be set via QI.exe option "mk<maxk>") near the axis y=0. Hence its entire table size is (MK-2)*N. For very sparse arrays, due to larger block size N available (for any given cache size), in multi-block coding this version runs typically 2-3 times faster than the generic QI (only generic QI was used for the comparison table in [T3]).

>> In contrast, QI performs no coding operations on the MPS and it uses fewer and simpler instructions on the less probable symbol (LPS).
>
> .. If you mention that QI performs no coding operation on MPS, you should mention that there are variants of AC for which this holds. MQ is one of them. So what are you targeting at? Speed, or optimality? I don't think you can have both.

Well, I did say upfront that I was talking about AC "coding at its maximum precision" (such as Moffat98, [4]). Since the 3 other points A2-A4 dealt with various aspects of coding precision advantages of QI (and the unexplored potentials opened), mixing in some specialized solutions (be it MQ, QM, Q, ELS, Z, runlength etc) with much lower coding accuracy in the full spectrum of input parameters would have only added the noise to the A2-A4 signal.
Doing a study of AC performance/accuracy tradeoffs is a mechanical, non-creative work that you can give to any student as a term project to play with and learn. With so many more interesting, unexplored areas opened by QI, fiddling with various AC approximations and tradeoffs is like leaving a luxurious dinner prepared by the best chefs to eat some week old fast food leftovers from a fridge.

> Speed, or optimality? I don't think you can have both.

Well, QI does have both. It codes more accurately than the most accurate AC, it runs much faster than the fastest full precision AC such as [4], and it even runs faster than any less accurate specialized quasi-ACs on their own narrow turfs (cf. recent survey by [Said04] which includes MQ coder, showing speed factors of only 2-3 for the quasi-ACs vs the full AC; these are not even close to typical QI factors 10-20, let alone the extreme points of 200+).

>> pattern of QI's much better division of labor at all levels -- within the functions of the coder itself and extending to a similarly improved division of labor between the coder and the modeling engine.
>
> In which sense "much better"?

While generally, "better" does require specification of the value function being optimized, in the case here this value function is made explicit (the speed & power consumption), from the title of that section down to the descriptions & car factory analogy that followed the sentence you quoted. Also, generally, a better division of labor means that the interaction between the functional blocks combined to perform the full task is minimized. AC does it exactly backwards - it maximizes this interaction by conditioning coding steps on modeling steps, symbol by symbol. The Henry Ford analogy is very close to the differences here.

> Here we have a problem. How large are these tables? If I want an improvement over AC on messages of length N -> infinity, I would suppose the size of the tables have to grow in some power of N to have all the binomial factors covered. Comments?

The table sizes were already addressed above. These sizes are the result of memory & speed tradeoffs. The size of coding blocks (for index computation) depends on the source. The actual selection of the block size or input segmentation policy is a job of the modeling engine within EC/QI scheme. The basic objective is to minimize total output (MDL), the index + the model parameters. Hence the transformation of the input sequence into the enumerative classes varies greatly, depending on input sequence ([T2] pp. 26-35). A simple coder for stationary sources would use maximum block size N that the tables allow. Generally, though, the "messages" given to the coder for indexing need not be contiguous segments of the input sequence at all (e.g. when using BWT as the QI modeler, see also [M2] for two ways of QI use with AC modelers).

> If we deal with non-all-purpose compressors, you often have to deal with pre-defined alphabets. AC can handle this fine because the modeller is a separate stage. How would QI handle this?

The QI multi-alphabet coding is described in ([T1] pp. 31-38). In general case it is a binary decomposition into (A-1) streams (where A is alphabet size). That decomposition can be seen as a special case of the m-th order Markov source described in [M2] method (b). For high entropy sources QI uses mixed radix codes ([T1] pp. 46-51, also in Radix.c in the [QIC] source).

Note also that QI modeling is "descriptive" and not "predictive", hence decoder is not trying to predict anything based on previous symbols. It simply reconstructs the sequence from the specified enumerative classes and their index values given (cf. [T2] pp. 26-35).
Hence QI's division of labor between the coder & modeler is much cleaner than for AC, since the interaction between the two doesn't occur symbol by symbol as it does with AC. The often touted "separate" modeler & coder of AC looks in comparison like a car factory building one car at a time, with all machines & workers just working on that one car, from start to end, and only then starting the next car.

Obviously the trick is how to get good enumerative classes for a given sequence (or set of sequences/source). The BWT (just the bare transform) is the general purpose QI modeler. The selection of the optimum partition of the BWT output column is an interesting problem with solutions still evolving (the simple bottom up method mentioned in (A3), although accurate, is still too slow and clumsy for my taste). For idealized sources, such as the m-th order Markov source, one can use simpler methods (see [M2] method b).

> No serious speed analysis has happened yet. I personally wouldn't care, but since you want to discuss it, you should use a scientific method to do it.

There is obviously much more one can do on theoretical and experimental sides. But, with many more interesting problems in front of me, I will leave that particular exploration to others. Tests & comparisons with Moffat98, which is overall the best (on combined speed & accuracy) full precision general AC, given the magnitude of the speed differences found, is more than indicative what is ahead on that topic. Note also that Moffat98 crowns the two decades of AC optimizations (and it has also a highly optimized C code), while QI tested (p. 10 [T3]) was a relatively crude prototype code of an algorithm 6-8 months old at the time. Hence, it is reasonable to expect only greater speed ratios in the future.

> Paragraph about Ford snipped. What's the value of this paragraph?

It is a close analogy to the difference in the division of labor at all levels between QI and AC. If the speed & power consumption is the value function being optimized (instead of AC's "online" capability, which is the Morse code style analog line "on-line", with a fictitious constraint of always having to process one symbol at a time from top to bottom and from start to end), then the QI's streamlined division of labor is the way to do it.

> Why the method of choice? Which things become simpler, which things become harder? Where to put pre-knowledge on the data if there is no model, for example?

In constrained coding you are coding a maximum entropy input (all symbols equiprobable, being an output of some unconstrained entropy coder) and you wish to produce output which satisfies precise conditions on output symbol counts, such as max and min numbers of successive 0's and 1's, with often different min & max values for the start and end of the sequence.

The way EC does this is to look at the constrained sequence (which in CC they consider as "encoded output") as the EC encoder's input, where the complex constraints are simply an enumerative regularity, which is known upfront, hence it needs no transmission, that EC enumerates (calculates index). The EC output is the index plus any extra info to specify enumerative class (with much of the 'regularity' info being hardwired constant, thus not packaged). Hence the "CC decoding" is the "EC encoding" (and vice versa).

While AC can compute the index for these complex constraints, what you end up with is an AC of enumerative style (like decrementing AC of [34], but more complex) with complex and expensive probability updates done in the coding loop and lower accuracy than EC. Since in CC they use relatively smaller block sizes (dictated by hardware & recording media formats), the unlimited precision EC on small blocks with addend tables in ROM is often faster, simpler and more accurate than AC.

You can check Immink's work [26] for more details and on his arguments for use of EC within CC.
He developed a floating point approximation for EC which, except for violating the EC form of Kraft inequality (eq. (20), p. 8 [T3]) and pigeonhole principle (eq. (8), p. 5 [T3]), almost worked (see [T3] p. 2). It is basically similar kind of idea as QI, except that he didn't have a clear concept for SWI ([T3] p. 7, which unlike FP, have arithmetic operations formally decoupled from rounding, which is with SWI considered an independent operator that application invokes or doesn't invoke, like any other operator, as appropriate for the algorithm) or a clean formalism with factored constraints (eq. (5) & (15) for EC, (21) & (23) for QI). With everything mangled together it was difficult in that approach to see what exactly was missing (eqs. (20) & (8), (5) in [T3]) and how to fix the decodability problem (eqs. (21), (23) in [T3]).

> That's because you're mixing two things here. If you consider BWT + AC as one code, you would see that the purpose of BWT is just to provide an alphabet and symbol definition to the AC to make it efficiently compressible. ...Thus, BWT+AC does a lot more than "predict the next symbol". It also defines what "the symbol" should be, in a clever way. That is, the "probabilities" within AC are not "probabilities of words".

I am not missing anything. You are merely hardwired to translate everything when the term "model" comes up into the AC modeling paradigm, that particular division of labor with its probabilistic language and the "probabilities of the next symbol" (see distinction (A2)). BWT or LZ don't do the modeling of finite sequences in that language.

BWT output can be coded enumeratively without ever performing any translation of BWT model (the BW bare transform output column R, with its full context tree) into "probabilities of the next symbol" or any probability at all. The enumerative coder only needs to know the boundaries of the fragments (which represent enumerative classes) to produce the index. The whole AC translation layer is never computed or considered. All the communication between the modeling engine (BW transform generator + segment constructor) and the coder is in terms of the finite sequence parameters (fragment positions & lengths). The probabilistic language is not needed by either and that layer is simply never constructed.

With existent BWT implementations, you have variants where the probabilistic layer gets constructed, and others where it doesn't (e.g. MTF & various forms of runlength & other universal and synthetic Huffman codes). Depending in which language you interpret those methods which never compute any probabilities, you can at best insist that they use implicit probabilities. From the perspective of "descriptive" modeling ([T2] pp. 26-35) though, that kind of non-functional conceptual scaffolding is a superfluous, low resolution post hoc "explanation" of the much richer and finer-grained finite sequence transformations and properties.

That was not the "mixing" of the "two things" but the main point of the distinction (A2).

> This might be, but how does the QI table size grow in n?

The QI binomial table has n^2/4 integer entries for block of size n (the QI.exe gives you more accurate figures, see new_qbct() in Qi.c which for binomials has exact value n^2/4-n+1). The table AC would need would have n^3/4 entries if its p can distinguish n values (see [T2] pp. 19-20). Obviously, if you quantize p into a different number of levels L than n, then you need L*n^2/4 entries.

>> Or, expressed as a ratio of AC's over QI's maximum quantization redundancies:
>>
>> AC(N) / QI(N) = 4 ... (4)
>
> What is the definition of AC(N), resp. QI(N) here?

That was from the cited refs right above that place (cf. eq. (20) p. 14 [41a] for AC and d(g) p. 8 in [T3]...). They are upper bounds on absolute redundancies, in bits. From AC(N)=4*N*A*log(e)/2^g for A=2 => AC(N)=8*N*log(e)/2^g while QI(N)=2*N*log(e)/2^g, hence the ratio (4).
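The arithmetic behind ratio (4) can be checked numerically. The following is only a sketch of the two bounds as quoted in this post, not code from the [QIC] kit; the function names are hypothetical, and log(e) is assumed to mean the base-2 logarithm of e since the bounds are stated in bits.

```c
#include <assert.h>

/* Assumption: log(e) in the quoted bounds is log2(e), because the
   redundancies are stated in bits. */
#define LOG2_E 1.4426950408889634

/* 2^g as a double, for a mantissa precision of g bits (0 <= g < 64). */
static double pow2(int g) {
    assert(g >= 0 && g < 64);
    return (double)(1ULL << g);
}

/* Quoted upper bound for AC: AC(N) = 4*N*A*log(e)/2^g,
   where A is the alphabet size. Hypothetical helper name. */
double ac_redundancy_bound(double n, int alphabet, int g) {
    assert(alphabet >= 2);
    return 4.0 * n * (double)alphabet * LOG2_E / pow2(g);
}

/* Quoted upper bound for QI: QI(N) = 2*N*log(e)/2^g.
   Hypothetical helper name. */
double qi_redundancy_bound(double n, int g) {
    return 2.0 * n * LOG2_E / pow2(g);
}
```

For a binary alphabet (A=2) the ratio ac_redundancy_bound(n, 2, g) / qi_redundancy_bound(n, g) comes out as 4 for any n and g, since both bounds share the factor N*log(e)/2^g. For larger alphabets the AC bound as quoted grows with A while the QI bound does not.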
Note that AC quantization losses are due to truncation of the infinite fractions (compared to unlimited precision AC) and to its variable size mantissa for Pc (I assume you clarified by now the conventions of the Volf-Tjalkens AC formalism), which is allowed to drop to 2^g/4, thus to g-2 mantissa bits, before they renormalize it. That leads to lower precision intervals (the mantissa of Pc) and a reduction of Pc for the next stage.

QI, which does that quantization outside of its coding loop (eq. (21) p. 8 [T3]) can afford to be more careful and keep its g bit mantissa always normalized to exactly g bits. Also, working in integers QI doesn't have the infinite fractions that must be forgotten (one way or the other) on every step. E.g. near the axis y=x, where binomials are nearly equal (the same exponents), adding two binomials C1+C2 leads to rounding up only 1/2 of the time (only if the lowest bits of C1 and C2 are different). Finally, for the initial g+O(1) steps QI makes no approximation at all (and for even more steps farther away from the axis y=x).

{ The QI.exe with the [QIC] source shows you (via option "cbr") row by row for all binomials all quantizations, in the last step and cumulative and the resulting cumulative bit excess (for n up to 2^20, which can be extended in the Intro.h). The command "ct" gives the full stats for the entire table for n up to 2^20, e.g. for n=1K, it shows 72% of binomial computations (eq. (21) p. 8 [T3]) produced rounding up, while for n=16K, 73.96% were rounded up. }

> There has not yet been an independent empirical observation. For my part, mostly because I can't compile your source. (-;

The current [QIC] 1.03 has _ASM option in Intro.h to turn off a few MSVC _asm macros (for bit position & mul/div). Also, Jasen Betts who posted in this thread has created a unix version (with different, perhaps a bit lower res timers). I don't have a setup to test it at the moment so it is not released (it will have another directory port\* with his win32 substitutes), but he could probably email you his copy that does compile and run.

>> coding such fragments separately. Even the conventional EC in a suboptimal hierarchical approximation has been found overall advantageous vs AC in this domain.
>
> More data please. By whom?

Mostly from my own ongoing work on the video codec (which motivated last summer the search for a more practical high precision enumerative coder and which ended up as a long detour, when the QI idea came up). See also similar observations from Oktem et al. papers [23],[46] which deal with more general image coding, on advantages of their hierarchical EC (the interframe differences coded by a video codec are subsets of their inputs with additional fragmentation, hence they are even more suitable for the high precision coding).

> QI is a LIFO coder. Whether this is good or bad depends on the application, but it ought to be said.

QI can code as LIFO or FIFO (cf. N1, p.9 [T3]). Note also that encoder can send data out with delay g bits, in LIFO or FIFO mode. In FIFO mode decoder can also decode it as soon as it arrives, while in LIFO mode it has to wait until the whole block arrives. The sample source code uses only LIFO mode.

Also, within descriptive modeling there is no FIFO/LIFO coding concept. The sequence handed to the coder+modeler is coded as a whole. The coding task is considered as a request to produce the minimum length description of that sequence, understandable to decoder and taking into account any shared knowledge about it that encoder & decoder may have.
The coder & modeler do not gratuitously impose upon themselves some "ideological" constraints, or drag in by inertia various conceptual leftovers from the Morse telegraph era, of having to model+encode one symbol at a time, from start to end before they can "touch" the next one symbol, which also must have index i+1 (or that symbols have to be read in any particular order or just once or just one symbol in one coding loop step,... etc).

Of course, all of the above doesn't imply that there is a self-imposed "ideological" constraint prohibiting coding/modeling the AC way, even when that is the most suitable way (see [M2]).

--- References ( http://www.1stworks.com/ref/RefLib.htm )

QIC. QI C source code research kit, publicly available at:
http://www.1stworks.com/ref/qi.htm

T1. R.V. Tomic "Fast, optimal entropy coder" 1stWorks TR04-0815, 52p, Aug 2004
http://www.1stworks.com/ref/TR/tr04-0815b.pdf

T2. R.V. Tomic "Quantized indexing: Background information" 1stWorks TR05-0625, 39p, Jun 2005
http://www.1stworks.com/ref/TR/tr05-0625a.pdf

T3. R.V. Tomic "Quantized Indexing: Beyond Arithmetic Coding" arXiv cs.IT/0511057, 10p, Nov 2005
http://arxiv.org/abs/cs.IT/0511057

34. J.G. Cleary, I.H. Witten "A Comparison of Enumerative and Adaptive Codes" IEEE Trans. Inform. Theory IT-30 (2), 306-315, 1984
http://www.1stworks.com/ref/Cleary84Enum.pdf

4. A. Moffat, R. M. Neal, I.H. Witten "Arithmetic coding revisited" ACM Trans. on Inf. Sys. Vol 16, No 3, 256-294, July 1998
http://www.stanford.edu/class/ee398/handouts/papers/Moffat98ArithmCoding.pdf

41b. B. Ryabko, A. Fionov "Fast and Space-Efficient Adaptive Arithmetic Coding" Proc. 7th IMA Intern. Conf. on Cryptography and Coding, 1999
http://www.1stworks.com/ref/RyabkoAri99.pdf

[Said04] Amir Said "Comparative Analysis of Arithmetic Coding Computational Complexity" HPL-2004-75
http://citeseer.ist.psu.edu/said04comparative.html

M2. Coding methods (a) & (b) for AC modeling engine:
http://groups.google.com/group/comp.compression/msg/1314ff87da597fad

26. K.A.S. Immink "A Practical Method for Approaching the Channel Capacity of Constrained Channels" IEEE Trans. Inform. Theory IT-43 (5), 1389-1399, 1997
http://www.exp-math.uni-essen.de/%7Eimmink/pdf/ultra.pdf
http://www.exp-math.uni-essen.de/%7Eimmink/refs.html

23. L. Oktem "Hierarchical Enumerative Coding and Its Applications in Image Compressing" Ph.D. thesis, TUT Finland, 1999
http://www.1stworks.com/ref/oktemThesis.pdf

46. L. Oktem, R. Oktem, K. Egiazarian, J. Astola "Efficient Encoding Of The Significance Maps In Wavelet Based Image Compression" Proc. ISCAS 2000, 28-31 May, Geneva, Switzerland
http://www.cs.tut.fi/%7Ekaren/project_site/publications/sigmap.pdf


1/24/2006 12:27:13 AM

(a)
> I believe I already mentioned that the not-yet-hit-the-streets kit on my desk will do 200 multiplies in the same time as one main-memory access.

(b)
> This particular core is one where power consumption has been a priority consideration, being not a desktop processor.

1. Statement (b) has no relation to the power consumption of mul/div vs power consumption of memory read (e.g. fixed table in ROM on a portable device). The mul/div will use more power than such memory read. And the faster that mul/div circuit, the more power it uses.

2. As to the cache miss penalty, see C1-C4 in the previous reply.

3. Note that you can compute quantized binomials one step at a time using 1 mul and 1 div (in machine precision), without any tables at all (e.g. C(n+1,k+1) = C(n,k)*(n+1)/(k+1) using QI's sliding window integers for C's). For QI that is simply an extreme point in the spectrum of possible table size reductions (see note N2, p. 9 in [T3] for other points in that spectrum). The tradeoff is that in that case QI would need to do one such mul/div pair on each symbol, LPS and MPS. That just happens to be exactly the amount and exact kind of work AC does for its encoding steps (which is not a coincidence, see [T2]). The speed difference would be gone at this extreme point of the spectrum. The only difference left would be QI's more accurate quantization (with about 1/4 of AC's quantization-produced excess of bits/sym).

For the rest of the spectrum, though, AC can't select some other tradeoff which would make it code as fast as QI without dropping even further (and very drastically so) its already inferior coding precision. Even the low accuracy quasi-ACs (see the Said04 paper from the previous post), trading accuracy for speed, run only 2-3 times faster than the full AC, such as Moffat98. That's not anywhere near the QI vs Moffat98 speed ratios across the spectrum of input entropies.
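The table-free recurrence from point 3 can be illustrated with exact integers. This is only a sketch of the identity C(n+1,k+1) = C(n,k)*(n+1)/(k+1), using plain 64-bit arithmetic instead of QI's sliding window integers, so it shows the one-mul-one-div step but none of the quantization; the function names are hypothetical and not taken from the [QIC] source.

```c
#include <assert.h>
#include <stdint.h>

/* One diagonal step in Pascal's triangle: given c = C(n,k), return
   C(n+1,k+1) using one multiply and one divide. The division is exact
   because c*(n+1) = C(n+1,k+1)*(k+1). The caller must make sure that
   c*(n+1) fits in 64 bits. */
uint64_t binom_diag_step(uint64_t c, unsigned n, unsigned k) {
    assert(k <= n);
    return c * (uint64_t)(n + 1) / (uint64_t)(k + 1);
}

/* Build C(n,k) with no table: start at C(n-k,0) = 1 and take k diagonal
   steps. Suitable for small n only (e.g. n <= 32), since it is exact
   and does not quantize. */
uint64_t binom_no_table(unsigned n, unsigned k) {
    assert(k <= n);
    uint64_t c = 1;                       /* C(n-k, 0) */
    for (unsigned i = 0; i < k; ++i)
        c = binom_diag_step(c, n - k + i, i);
    return c;
}
```

For example, binom_no_table(32, 16) walks 16 steps along a diagonal and yields 601080390. A coder built this way would pay the mul/div pair on every symbol, which is the tradeoff described in point 3.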


1/24/2006 1:06:35 AM

Holy shit. This has been the longest, most drawn out thread. So how many BINARY BITS caused this whole dispute? NOT BYTES, BITS.


1/24/2006 1:12:49 AM

> Holy shit.
> This has been the longest, most drawn out thread.

You're not exactly helping keep it shorter.

> So how many BINARY BITS caused this whole dispute?
> NOT BYTES, BITS.

See table on page 10 in the arXiv preprint:

"Quantized Indexing: Beyond Arithmetic Coding"
http://arxiv.org/abs/cs.IT/0511057


1/24/2006 1:29:32 AM

Hi, > Various aspects of differences between QI and AC collected and > integrated here were scattered over many posts. As stated at the > top, I wanted them all in in one place, where one can see the > relations among them more clearly (as the summary itself shows). If so, then your collection at least had a very strong tendency. > > Whether mul/div is "slow" compared to the QI table > > lookup (that is not mentioned here) is a question > > of the table size and the framework the algorithm > > is implemented for. > First, mul/div will consume more power than a simple table > lookup, especially if tables are in ROM on a portable > device (where power consumption is a critical parameter). This depends. If the CPU has to idle for many cycles, it will draw more power during the idling that during the data fetch. > > You don't discuss the table sizes. > The table sizes depend on the type of enumerative classes (cf. > N2, p. 9 [T3]). For permutation codes they grow linearly with N > (number of items permuted) and similarly with general mixed > radix codes (N=number of digits). For general entropy coder, the > binomial tables have N^2/4 entries, where N is the max block > size. Do you mean (N^2)/4 or N^(2/4) = sqrt(N)? Thus, table size grows O(N^0.5) where N is the size of the message. That is, in realistic setups, the question remains whether you receive at all a point where the advantage pays off. Or, mathematically, there is a lower bound M on the memory requirements and a random source such that for all table sizes smaller than M, QI is outperformed by AC. It remains to check what M is and whether M is small enough to make the above a realistic argument for the proposed application, namely embedded devices. > The block size N in the source can be set from 32 to 16384. > The "typical" block > size (e.g. as used in the tests) for binary coder is N=1K which > makes the table size 1MB. Pretty small block sizes, actually. 
There's no blocking requirement for AC, so it may happen that for a given random source AC outperforms QI in the long run. Now since it has been understood that there are random sources where AC might perform better, what are the random sources QI might outperform AC. And if so, are they relevant to the target application? As I read you, QI shows an advantage in the high entropy domain. What is your target application? (Video/audio data that come out of a predictor are IMHO not 'high entropy' because there's even for lossless a compression gain of 1:2 to 1:4. But then, what's high entropy?. How high?) > Regarding cache: > C1) Not all of N^2/4 entries need to fit in the cache. Namely, > in multiblock coding of a stationary source the table entries > used are along the line y=p*x on the (x,y) lattice, where p= > probability of 1 (LPS), within a band of width sqrt(N) (for not > overly low entropy), which means you need approx. N^3/2 entries > to fit in the cache. For N=1K, that is about 3% of the 1MB, or > 32KB, well within the desktop processors cache sizes. Huh? N^3/2 grows faster than N^2/4. 3% for a stationary 0 order Markov source, then? > C4) For very low entropy sources, the [QIC] source includes a > specialized "sparse coder" with tables which cover only a narrow > strip of width MK (Max K, this can be set via QI.exe option > "mk<maxk>") near the axis y=0. Hence its entire table size is > (MK-2)*N. For very sparse arrays, due to larger block size N > available (for any given cache size), in multi-block coding > this version runs typically 2-3 times faster than the generic QI > (only generic QI was used for the comparison table in [T3]). How does this compare to AC? How to specific high-speed implementations as MQ? > Doing a study of AC performance/accuracy tradeoffs is > a mechanical, non-creative work that you can give to > any student as a term project to play with and learn. 
> With so many more interesting, unexplored areas opened > by QI, fiddling with various AC approximations and > tradeoffs is like leaving a luxurious dinner prepared > by the best chefs to eat some week old fast food > leftovers from a fridge. I don't deny that QI might be interesting. However, I don't like unproved claims... > > Speed, or optimality? I don't think you can have both. > Well, QI does have both. .... as this one. > It codes more accurately than the most accurate AC, Does it? With blocking? It means that the "memory" of the coder and thus probabilities are only based on a relatively short sample set, where "short" = block size. > it runs much faster than the fastest full precision > AC such as [4], and it even runs faster than any less accurate > specialized quasi-ACs on their own narrow turfs (cf. recent > survey by [Said04] which includes MQ coder, showing speed > factors of only 2-3 for the quasi-ACs vs the full AC; these are > not even close to typical QI factors 10-20, let alone the > extreme points of 200+). I don't know how they implemented MQ, but you can get it to a speed where it is close to writing out bits uncompressed. At least, I would want to do an independent measurement. > > In which sense "much better"? > While generally, "better" does require specification of the > value function being optimized, in the case here this value > function is made explicit (the speed & power consumption), from > the title of that section down to the descriptions & car factory > analogy that followed the sentence you quoted. Not really. If you make this claim, you possibly should back it up because I do not yet find the arguments too convincing. Analogies are fine, but don't prove anything. Thus, is there a provable theorem, or is there an independent study of whether the mentioned claims are correct? > Also, generally, > a better division of labor means that the interaction between > the functional blocks combined to perform the full task is > minimized. 
AC does it exactly backwards - it maximizes this
> interaction by conditioning coding steps on modeling steps,
> symbol by symbol.

The question is: Is this a useful thing to do? I do have situations where I know in advance that the bitstream I'm pushing into the coder actually consists of several interleaved random sources which behave quite differently. AC allows me to define contexts. Can QI do that? With its tight modelling, is it able to exploit these statistics?

Example: Consider a source consisting of two first-order Markov sources A and B. At even timesteps, I draw from A; at odd timesteps, I draw from B. With modelling, the implementation of an AC coder that is optimized to this situation is easy. What would I need to do with QI? Not exploiting the special nature of this source might be very wasteful.

> > Here we have a problem. How large are these tables? If
> > I want an improvement over AC on messages of length
> > N -> infinity, I would suppose the size of the tables
> > have to grow in some power of N to have all the binomial
> > factors covered. Comments?
> The tables sizes were already answered above. These sizes are
> result of memory & speed tradeoffs. The size of coding blocks
> (for index computation) depends on the source.

If you care about compression performance, the only thing that is meaningful is the limit N->infinity. Is there a finite memory implementation of QI that runs optimally in this limit?

> > If we deal with non-all-purpose compressors, you often
> > have to deal pre-defined alphabets. AC can handle this
> > fine because the modeller is a separate stage. How would
> > QI handle this?
> The QI multi-alphabet coding is described in ([T1] pp. 31-38). In
> general case it is a binary decomposition into (A-1) streams,
> (where A is alphabet size). That decomposition can be seen as a
> special case of the m-th order Markov source described in [M2]
> method (b). For high entropy sources QI uses mixed radix codes
> ([T1] pp.
46-51, also in Radix.c in the [QIC] source). This wouldn't cover the random source example above. Of course, I can model this as some 2nd order markov chain, but I then pay the penalty of making it harder to adapt to it. In other words, I do not make optimal usage of my apriori knowledge on the source. > Note also that QI modeling is "descriptive" and not > "predictive", hence decoder is not trying to predict anything > based on previous symbols. What means "descriptive"? As I see it, you keep relative frequencies or symbol counts here. Whether you call this "descriptive" (as in, I have counted so and so much sequences of this kind) or "predictive" (as in, I predict the probability of a symbol due to the counts) is just a matter of language. Thus, what *is* the difference, leaving language issues alone? > > No serious speed analysis has happened yet. I personally > > wouldn't care, but since you want to discuss it, you > > should use a scientific method to do it. > There is obviusly much more one can do on theoretical and > experimental sides. But, with many more interesting problems in > front of me, I will leave that particular exploration to others. All fine with me. For that, I would prefer: a) not to state results for tests that haven't been verified yet. b) to have a source code here I can actually make use of. First things first. > > Why the method of choice? Which things become simpler, which > > things become harder? Where to put pre-knowledge on the data > > if there is no model, for example? > In constrained coding you are coding a maximum entropy input > (all symbols equiprobable, being an output of some unconstrained > entropy coder) and you wish to produce output which satisfies > precise conditions on output symbol counts, such as max and min > numbers of successive 0's and 1's, with often different min & > max values for the start and end of the sequence. 
The way EC > does this is to look at the constrained sequence (which in CC > they consider as "encoded output") as the EC encoder's input, > where the complex constriants are simply an enumerative > regularity, which is known upfront, hence it needs no > transmission, that EC enumerates (calculates index). The EC > output is the index plus any extra info to specify enumerative > class (with much of the 'regularity' info being hardwired > constant, thus not packaged). Hence the "CC decoding" is the "EC > encoding" (and vice versa). While AC can compute the index for > these complex constraints, what you end up with is an AC of > enumerative style (like decrementing AC of [34], but more > complex) with complex and expensive probability updates done in > the coding loop and lower accuracy than EC. Since in CC they use > relatively smaller block sizes (dictated by hardware & recording > media formats), the unlimited precision EC on small blocks with > addend tables in ROM is often faster, simpler and more accurate > than AC. > You can check Immink's work [26] for more details and on his > arguments for use of EC within CC. He develped a floating point > approximation for EC which, except for violating the EC form of > Kraft inequality (eq. (20), p. 8 [T3]) and pigeonhole principle > (eq. (8), p. 5 [T3]), almost worked (see [T3] p. 2). It is > basically similar kind of idea as QI, except that he didn't have > a clear concept for SWI ([T3] p. 7, which unlike FP, have > arithmetic operations formally decoupled from rounding, which > is with SWI considered an independent operator that application > invokes or doesn't invoke, like any other operator, as > appropriate for the algorithm) or a clean formalism with > factored constraints (eq. (5) & (15) for EC, (21) & (23) for > QI). With everything mangled together it was difficult in that > approach to see what exactly was missing (eqs. (20) & (8), (5) > in [T3]) and how to fix the decodability problem (eqs. 
(21),
> (23) in [T3]).

Ok, thanks.

> > That's because you're mixing two things here. If you consider
> > BWT + AC as one code, you would see that the purpose of BWT
> > is just to provide an alphabet and symbol definition to the
> > AC to make it efficiently compressible. ...Thus, BWT+AC does
> > a lot more than "predict the next symbol". It also defines
> > what "the symbol" should be, in a clever way. That is,
> > the "probabilities" within AC are not "probabilities of
> > words".
> I am not missing anything. You are merely hardwired to translate
> everything when the term "model" comes up into the AC modeling
> paradigm, that particular division of labor with its
> probabilistic language and the "probabilities of the next
> symbol" (see distinction (A2)). BWT or LZ don't do the modeling
> of finite sequences in that language. BWT output can be coded
> enumeratively without ever performing any translation of BWT
> model (the BW bare transform output column R, with its full
> context tree) into "probabilities of the next symbol" or any
> probability at all. The enumerative coder only needs to know the
> boundaries of the fragments (which represent enumerative
> classes) to produce the index. The whole AC translation layer is
> never computed or considered. All the communication between the
> modeling engine (BW transform generator + segment constructor)
> and the coder is in terms of the finite sequence parameters
> (fragment positions & lengths). The probabilistic language is not
> needed by either and that layer is simply never constructed.

That's about what I meant, though. If you think that I'm thinking in "symbols or probabilities" then this is just not so, which is why I wrote that AC isn't used to "predict the next symbol". If so, then the word "symbol" is not used in the right sense.

> With existent BWT implementations, you have variants where
> probabilistic layer gets constructed, and others where it
> doesn't (e.g.
MTF & various forms of runlength & other universal
> and synthetic Huffman codes). Depending on which language you
> interpret those methods which never compute any probabilities,
> you can at best insist that they use implicit probabilities.

Ok, agreed.

> From the perspective of "descriptive" modeling ([T2] pp. 26-35)
> though, that kind of non-functional conceptual scaffolding is a
> superfluous, low resolution post hoc "explanation" of the much
> richer and finer-grained finite sequence transformations and
> properties. That was not the "mixing" of the "two things"
> but the main point of the distinction (A2).

Ok, so a misunderstanding on my side. Sorry.

> >> Or, expressed as a ratio of AC's over QI's maximum quantization
> >> redundancies:
> >> AC(N) / QI(N) = 4 ... (4)
> >
> > What is the definition of AC(N), resp. QI(N) here?
> That was from the cited refs right above that place (cf.
> eq. (20) p. 14 [41a] for AC and d(g) p. 8 in [T3]...).
> They are upper bounds on absolute redundancies, in bits.
> From AC(N)=4*N*A*log(e)/2^g for A=2 => AC(N)=8*N*log(e)/2^g
> while QI(N)=2*N*log(e)/2^g, hence the ratio (4).
> Note that AC quantization losses are due to truncation of the
> infinite fractions (compared to unlimited precision AC) and to
> its variable size mantissa for Pc (I assume you clarified by now
> the conventions of the Volf-Tjalkens AC formalism), which is
> allowed to drop to 2^g/4, thus to g-2 mantissa bits, before they
> renormalize it. That leads to lower precision intervals (the
> mantissa of Pc) and a reduction of Pc for the next stage. QI,
> which does that quantization outside of its coding loop (eq.
> (21) p. 8 [T3]) can afford to be more careful and keep its g bit
> mantissa always normalized to exactly g bits. Also, working in
> integers QI doesn't have the infinite fractions that must be
> forgotten (one way or the other) on every step. E.g.
near the
> axis y=x, where binomials are nearly equal (the same exponents),
> adding two binomials C1+C2 leads to rounding up only 1/2 of the
> time (only if the lowest bits of C1 and C2 are different).

Ok. Thanks.

> > There has not yet been an independent empirical
> > observation.
> > For my place mostly because I can't compile your
> > source. (-;
> The current [QIC] 1.03 has _ASM option in Intro.h to turn off
> few MSVC _asm macros (for bit position & mul/div). Also, Jasen
> Betts who posted in this thread has created a unix version (with
> different, perhaps a bit lower res timers). I don't have a setup
> to test it at the moment so it is not released (it will have
> another directory port\* with his win32 substitutes), but he
> could probably email you his copy that does compile and run.

I would need to get rid of Windows.h/conio.h. A makefile is quickly put together, and I don't mind about the speed. At least *not now* because too many things are compiler dependent in this area, unless the speed differs by orders of magnitude. Isn't there a switch in VS to force it to ANSI C?

> > QI is a LIFO coder. Whether this is good or bad depends on the
> > application, but it ought to be said.
> QI can code as LIFO or FIFO (cf. N1, p.9 [T3]). Note also
> that encoder can send data out with delay g bits, in LIFO or
> FIFO mode. In FIFO mode decoder can also decode it as soon as
> it arrives, while in LIFO mode it has to wait until the whole
> block arrives. The sample source code uses only LIFO mode.
> Also, within descriptive modeling there is no FIFO/LIFO coding
> concept.

Maybe not, but in practical applications there is. (-;

> The sequence handed to the coder+modeler is coded as a
> whole. The coding task is considered as a request to produce the
> minimum length description of that sequence, understandable to
> decoder and taking into account any shared knowledge about it
> that encoder & decoder may have.
The coder & modeler do not > gratuitously impose upon themselves some "ideological" > constraints, or drag in by inertia various conceptual leftovers > from the Morse telegraph era, of having to model+encode one > symbol at a time, from start to end before they can "touch" the > next one symbol, which also must have index i+1 (or that > symbols have to be read in any particular order or just once or > just one symbol in one coding loop step,... etc). That's quite not the point. The entropy coder is often only a minor part of an overall design, and you often do not have the choice of defining the coding order at your will. Besides, hardware applications cannot buffer large amounts of data. So long, Thomas


1/24/2006 11:30:01 AM

> If so, then your collection at least had a
> very strong tendency.

What kind of "tendency"? To show what AC shortcomings were solved by QI? At the top it says what the 4 elements that follow are, and that is exactly what was done:

"For the rest of this post I will address the QI's solutions for the four principal remaining weaknesses and tradeoffs introduced by the AC algorithm."

>> First, mul/div will consume more power than a simple table
>> lookup, especially if tables are in ROM on a portable
>> device ...
>
> This depends. If the CPU has to idle for many cycles,
> it will draw more power during the idling than during
> the data fetch.

The pertinent comparison is between mul/div vs memory read (RAM or ROM). The latter takes less power.

>> For general entropy coder, the binomial tables have
>> N^2/4 entries, where N is the max block size.
>
> Do you mean (N^2)/4 or N^(2/4) = sqrt(N)?

The binomial table is quadratic (see N2 p. 9 [T3] where table sizes are discussed). Hence the size is N*N/4. The C source shows a more accurate value: N*N/4 - N + 1 (see Qi.c function new_qbct(); the rows K=0,1 are not stored).

> Thus, table size grows O(N^0.5) where N is the size
> of the message.

Double _no_: It is O(N^2), not O(N^0.5). And N is not the size of the message; it is the table size limit. The table size determines the max block which can be indexed as a single index. The input string length may be longer or shorter. The QI modeler performs decomposition of the input string S(L) of length L into enumerative classes (see [T2] pp. 27-35). The output of the modeler is a set of strings X1(n1), X2(n2),... The table limit N only limits n1<=N, n2<=N,... The strings X1(n1),... _may_ be segments of S(L), such as for the simple quasi-stationary memoryless source. For a stationary source one would also have n1=n2... Generally, neither is true, i.e. the strings X1(n1),... have no simple relation to S(L) or to each other e.g.
they could be fragments of the BWT output column R, each of its own length (determined by the optimal, in MDL sense, partition of R, see (A3) & my previous reply).

> That is, in realistic setups, the question remains whether you
> receive at all a point where the advantage pays off. Or,
> mathematically, there is a lower bound M on the memory
> requirements and a random source such that for all table sizes
> smaller than M, QI is outperformed by AC.

Note that even for table limits N=32 or 64, where exact EC has been used in hierarchical approximation (see Oktem), it is fairly competitive with AC. With QI, when you change the max block size N, what changes is the balance between how much is being output as index (within enumerative class) and how much as "class tags" (the model info, such as count of 1's). Since QI packages "class tags" if they are compressible (see "entropy pump" in [T2] p. 27), the only difference is how much work it does. It will work faster if it uses maximum blocks provided by the table limit N. But the upper limit N on table size depends on CPU cache, and if you make it too large it can slow the coder down. Value N=1K happens to be the right balance for most machines tested, although on newer machines the table limit N=2K or even 4K works faster than N=1K.

Now, that doesn't mean that X1(n1) has to have n1=1K or 2K or some such. N is only the upper bound on the lengths of the enumerative class strings the modeler can hand to the coder. The effect of the input S(L) length L is that the speed ratio generally increases for longer L. The high level reason for this is that QI's better division of labor pays off better at larger scales. The lower level mechanisms by which this general pattern is realized may vary depending on the coding task, e.g. on a stationary source with p=probability of 1, after multiple blocks of length N, the table elements around the line y=p*x will load into cache, so that later passes will work faster.
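As a rough illustration of the index vs "class tag" split described above, the sketch below uses exact binomials for blocks of up to 32 bits (plain EC, not QI's quantized tables). The function names are made up for this example, and the tag is costed as a plain fixed-width count of 1's, which is an assumption: it ignores the "entropy pump" packaging of tags, so this is not what [QIC] does.

```c
#include <stdint.h>

/* Exact C(n,k); assumes n <= 32 so all intermediates fit in 64 bits. */
uint64_t binom(unsigned n, unsigned k)
{
    uint64_t c = 1;
    unsigned j;

    if (k > n)
        return 0;
    if (k > n - k)
        k = n - k;                      /* use symmetry C(n,k)=C(n,n-k) */
    for (j = 1; j <= k; ++j)
        c = c * (n - k + j) / j;        /* c = C(n-k+j, j), exact */
    return c;
}

/* Number of bits needed to store any value in [0, count). */
unsigned bits_for(uint64_t count)
{
    unsigned b = 0;
    uint64_t v;

    if (count <= 1)
        return 0;
    for (v = count - 1; v; v >>= 1)
        ++b;
    return b;
}

/* Bits for the index within the class of n-bit blocks with k ones. */
unsigned index_bits(unsigned n, unsigned k)
{
    return bits_for(binom(n, k));
}

/* Bits for a naive fixed-width class tag: the count k in 0..n. */
unsigned tag_bits(unsigned n)
{
    return bits_for((uint64_t)n + 1);
}
```

For a 32 bit block with 4 ones, index_bits(32, 4) is 16 while tag_bits(32) is 6, so in this naive costing the tag is a sizable share of the output for small blocks.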
Note also that timers & timing methods included in [QIC] QI.exe may not be very accurate on short blocks (for N<128) due to timer granularity (~280 ns per tick). The EC.exe program (and source) includes the type of loop needed to test the speed of exact EC on short blocks (N=32).

> It remains to check what M is and whether M is small enough
> to make the above a realistic argument for the proposed
> application, namely embedded devices.

If the total input length L is very small, e.g. 32 or 64 bits, the AC and QI times are so short, comparable to timer granularity and also subject to general OS time-slicing effects, that the variation from test to test will be of similar magnitude as the times themselves. One would need the batch method used in the EC.exe loops to refine these timings (QI will still come out faster since it does much less work and the tiny table easily fits in cache).

> Pretty small block sizes, actually. There's no blocking
> requirement for AC, so it may happen that for a given
> random source AC outperforms QI in the long run. Now
> since it has been understood that there are random sources
> where AC might perform better,

How does that follow? You need a finer resolution of the concepts 'input length L', 'table limit N' and coder's message lengths n1, n2,... There is no basis in theory or in experiment for any such 'conjecture', be it for speed or for compression effectiveness.

> what are the random sources QI might outperform AC.

Any. As explained earlier, QI can code exactly as AC, in stream mode, if there is anything for which that is the best method, with the only difference that QI quantization is more accurate (due to QI's better utilization of the g bit mantissa, which AC allows to vary to well below g bits; plus due to QI's integer arithmetic without needless infinite fractions which have to be dropped by a finite precision AC, one way or the other).

> And if so, are they relevant to the target application?
As
> I read you, QI shows an advantage in the high entropy domain.
> What is your target application? (Video/audio data that come
> out of a predictor are IMHO not 'high entropy' because there's
> even for lossless a compression gain of 1:2 to 1:4. But then,
> what's high entropy? How high?)

I think you need to clarify first what "perform" means. You seem to have switched somewhere along the way from speed (table sizes, cache) to compression ratios. For any set of messages M={M1,M2,...}, provided all coders have a "perfect" model of M, QI will _always_ compress M to a size smaller than or equal to AC's (trivially, AC or any compressor can compress an individual message from M, such as M1, shorter by simply assigning it a codeword of length 1). Of course, with an "imperfect model" of M, a random pick of codewords (satisfying Kraft inequality) may do better on M than any other coder. Regarding the "imperfect" models of M, though, as illustrated by array "Vary" (p.10, [T3]), QI's "descriptive" modeling is much more resilient to "surprise" than the usual "predictive" AC modeling. The "descriptive" method is always "surprised" less than the "predictive" method.

The AC codewords are an approximate form of QI codewords, which use needless scaling of addends (which results in infinite fractions for exact AC which are "forgotten" by a finite precision AC), randomly fluctuating mantissa length (a very bad idea for precision), and sub-optimal top down quantization (the largest addends quantized first, in contrast to optimal bottom up QI quantization of (21) in [T3]). The difference is essentially as if you went out and randomly changed all codeword lengths for set M and then checked if the total violates Kraft inequality and retried the whole thing if the violation was found. If the original set was optimal (the shortest total output size on M for given precision g), as it is with QI (assuming M is a valid enumerative class or a set of such classes, i.e.
that we have a "perfect" model of M used by all coders), anything you get from your random variation will always produce a larger (or at best equal, if you are very lucky) output for M.

Also, I have no idea what this "high entropy" is about. I did mention radix codes, and we used that for little tests here in this thread. But that is not the only advantage. You can check the table in [T3] or do tests with the source [QIC]. The compression advantage is across the spectrum of inputs. The point of (A3) in the summary is that those small advantages are merely a tip of the iceberg, and a tip on which AC is at its best. Compared to AC, QI has intrinsically higher precision, lower noise coding and a finer-grained, more flexible modeling interface and language more suitable for finite sequences. The test results on "conventional" compression tasks show only a relatively small QI compression advantage, since "conventional" is by definition the subset of finite sequences where AC performs well. (A3) points out that there is much more beyond the "conventional" subset and the QI compression gains are not limited to the few percent shown in the [T3] table.

>> N^3/2 entries to fit in the cache. For N=1K, that is about
>> 3% of the 1MB, or 32KB, well within the desktop processors
>> cache sizes.
>
> Huh? N^3/2 grows faster than N^2/4.

With full parentheses that should be N^(3/2). The other one is (N^2)/4. It should have been clear from the immediate context that N^3/2, which came from N*sqrt(N), is N^(3/2).

>> C4) ...
>> For very sparse arrays, due to larger block size N
>> available (for any given cache size), in multi-block
>> coding this version runs typically 2-3 times faster
>> than the generic QI (only generic QI was used for
>> the comparison table in [T3]).
>
> How does this compare to AC?

The generic QI results vs Moffat98 AC are in the table in [T3]. You can extrapolate the top rows' speed ratios (which is where the sparse coder applies) by such a factor.
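To make the parenthesization of the two table-size expressions explicit, here is a small sketch that evaluates both as entry counts: N*N/4 - N + 1, the value quoted above from new_qbct(), and the N*sqrt(N) = N^(3/2) band estimate. The function names are hypothetical, and only entry counts are computed, since the number of bytes per entry is not restated here.

```c
#include <stdint.h>

/* Full binomial table: (N^2)/4 - N + 1 entries (rows K=0,1 not stored). */
uint64_t full_table_entries(uint64_t n)
{
    return n * n / 4 - n + 1;
}

/* Integer square root by simple search; fine for block sizes N <= 16384. */
uint64_t isqrt_u64(uint64_t v)
{
    uint64_t r = 0;

    while ((r + 1) * (r + 1) <= v)
        ++r;
    return r;
}

/* Band of width sqrt(N) along y=p*x: about N*sqrt(N) = N^(3/2) entries. */
uint64_t band_entries(uint64_t n)
{
    return n * isqrt_u64(n);
}
```

For N=1024 these give 261121 and 32768 entries respectively, which shows the quadratic count growing faster than the N^(3/2) band.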
> How to specific high-speed implementations as MQ?

I have given you the [Said04] reference, which has results for MQ (and other quasi-ACs), along with full precision AC. The MQ is only 2-3 times faster than full ACs, which doesn't even come close to QI vs full AC ratios, even for generic QI, much less for a specialized QI version optimized for some range of inputs (which would be a fairer comparison against specialized ACs). You could, of course, use runlength coders, which will run as fast as QI on very sparse inputs, some coarse grained ones even faster than QI. They will also have larger redundancy than even AC and will perform very badly (in speed and compression) if the input isn't exactly the "ideal" low entropy source for which their codewords were precomputed.

> I don't deny that QI might be interesting. However,
> I don't like unproved claims...

Placing the source code for public access is equivalent to publishing a mathematical proof (in a non peer reviewed preprint). Now, if you can't compile the source on your system, that is equivalent to saying you can't read the published proof because you don't read English. That is not the same thing as "unproven claims." There are hundreds of millions of win32 PCs which can compile or run the [QIC] code.

>> Speed, or optimality? I don't think you can have both.
>> Well, QI does have both.
>
> ... as this one.

The QI source code has been out there. With the level of hostility from some corners, anyone who wanted to empirically falsify that claim has had plenty of chances. Or, with preprints publicly available which show why it is so, one could have shown that some key mathematical step is fatally wrong, making the whole scheme flawed. Aside from minor typos and a minor complaint of one of the authors whose EC contribution was cited in [T3] (he thought I should have put emphasis on a different aspect of his contribution, which I will in the next revision), no falsification of even a minor point has turned up.
>> It codes more accurately than the most accurate AC,
>
> Does it? With blocking? It means that the "memory"
> of the coder and thus probabilities are only based
> on a relatively short sample set, where "short" =
> block size.

The blocking doesn't limit the modeler state variables or how the modeling is done. The modeler considers the entire input sequence as a request to provide a minimum decodable description of that sequence, taking into account any shared knowledge coder & decoder may have about it. Adaptive/predictive modeling is a proper subset of descriptive modeling. Descriptive modeling merely removes needless self-imposed constraints of the adaptive modeler.

The blocking only limits how big the maximum single index is that the coder can compute. If the enumerative class provided by the modeler has sequences longer than the coder's table limit N, the coder breaks them into smaller blocks. The block boundaries don't cause any bit fraction loss since QI codes these fractions via mixed radix codes (cf. N4 p. 9 [T3]).

Your conclusion is no different than concluding that one can't compute Pi to more than 15-16 digits on a PC, since that is how many digits the floating point unit handles in a 64 bit 'double'.

> I don't know how they implemented MQ, but you can get
> it to a speed where it is close to writing out bits
> uncompressed. At least, I would want to do an
> independent measurement.

Well, you can email him and ask. From the paper, he did seem to go out of his way to optimize the coders tested. Moffat98 also has a test of their coder against the Q coder, which they claim is the coder and implementation to beat (on speed), again showing only a 2-3 ratio, which is not even close to QI's ratios. Based on those two, I decided not to waste any time 'researching' the quasi-AC flora and fauna.

Note also that even merely examining and copying uncompressed bits, bit by bit, isn't the fastest way to do it.
Look for example at the QI's block coding loop (EncDec.c) which at the top has:

   do n+=32;              // Encode next 32 MPS
   while((w=*s++)==0);    // find next nonzero dword

Anything that examines and merely copies bits, bit by bit, on a sparse array will run slower than doing it 32 or 64 bits per instruction (the output index was padded with 0 bits via memset). And the above isn't even optimized QI code (e.g. it doesn't need to do n+=32 inside the loop, since the pointer's increment already keeps track of the count, hence we can calculate n outside of the loop as n=((int)s-s0)<<3; further, one could unroll loops as done in Moffat98, etc).

>> function is made explicit (the speed & power consumption),
>> from the title of that section down to the descriptions
>> & car factory analogy that followed the sentence you quoted.
>
> Not really. If you make this claim, you possibly should
> back it up because I do not yet find the arguments too
> convincing.

Regarding speed & power consumption, as stated above, the QI's division of labor is clearly better, as explained there at the two levels, within the coder and between the coder and modeler. Taking calculations out of the coding loop for the values which are universal constants for a given class of sequences is better for speed and power consumption. Reorganizing modeler calls to the coder, so they don't go symbol by symbol but for the entire sequence of symbols, is better for speed and power consumption.

What you're mixing in, as your next comment shows, is the separate question of whether such organization will affect compression negatively. A fair question, but whatever its answer, the above "better for speed & power..." is still perfectly valid, as stated.

> The question is: Is this a useful thing to do? I do
> have situations where I know in advance that the bitstream
> I'm pushing into the coder consists actually of several
> interleaved random sources which behave quite differently.
> AC allows me to define contexts. Can QI do that?
> With its tight modelling, is it able to exploit this statistics?

Of course it can, in more ways than one. That was already answered at the end of the original post:

M1. http://groups.google.com/group/comp.compression/msg/27c6f329038a1bdc

which points to methods (a) and (b) in an earlier post:

M2. http://groups.google.com/group/comp.compression/msg/1314ff87da597fad

The more relevant method for your setup is (b). That is all assuming only use of the existing AC modeling engine, i.e. one which parametrizes the properties of the finite sequences in the probabilistic language and which converts all that it can extract from the sequence into the "probability of the next symbol". Generally, that is only a coarse grained, impoverished way of modeling finite sequences. But if, as you say, that is all one is allowed to have for the sake of argument, then QI can use methods (a) and (b) above. The price paid is the compression quality of adaptive AC and, in the case of method (a), the coding speed of AC.

Of course, there is neither law nor theorem, be it in practice or in theory, requiring that one has to use coarse grained probabilistic parametrization on finite sequences, or gratuitous constraints and the communication bottleneck "probability of single next symbol", or model+code "online" (meaning analog line on-line, a la Morse telegraph). Needlessly imposing upon oneself such gratuitous constraints is simply a result of conceptual inertia and a herd mentality.

> Example: Consider a source consisting of two first-order
> Markov sources A and B. At an even timestep, I draw from
> A, at odd timesteps, I draw from B. With modelling, the
> implementation of an AC coder that is optimized to this
> situation is easy. What would I need to do with QI? Not
> exploiting the special nature of this source might be
> very wasteful.

QI would simply classify odd and even elements into two separate enumerative classes, each having its own index. The QI output will consist of 2 indices I1 and I2.
Since QI also knows the exact upper bounds M1 and M2 on I1 and I2 (these are all integers with QI; M1 & M2 are SWI mantissas from quantized addend tables), it would encode the top g=32 bits of I1 and I2, call them D1 and D2, as digits in mixed radix M1, M2, i.e. as I = D1 + M1*D2. This "I" will have upper bound M=M1*M2, in case you need to package that index similarly via mixed radix with some other components of your output.

The combined index will be Iq, consisting of the concatenated leftover sections of I1 and I2, with the mixed radix number I as the high 64 bits of Iq. The leftover sections now fill their full bits: the SWI addends, which are the quantized full length upper bounds on I1 & I2 and whose mantissas are M1 & M2, have all bits beyond the top g bits equal to 0. Hence these truncated I1 & I2 produce no additional redundancy beyond that contained in the QI quantized addends, while I has no additional error since it is computed as a 64 bit product without any approximation.

Note that AC will produce a single index Ia (longer than QI's Iq, due to suboptimal AC quantization plus the unavoidable O(1) term at least). But unlike QI's index Iq, where QI knows its exact upper bound M, AC's coarse grained probabilistic parametrization requires it to normalize all its indices to have the same upper bound 1.00 (since they are 'supposed' to be 'cumulative probabilities'), obliterating the bit fractions in the process. Hence AC's complete index is always an integer number of bits, while QI's is in fractional bits (which can be packaged exactly via mixed radix codes if there is more than one item to send, or in all cases via a tapered/flat Huffman code, where one wastes on average less than 1-log(e)+log(log(e)) = 0.086... bits per index, logs to base 2, relative to the exact bit fraction).

> If you care about compression performance, the only
> thing that is meaningful is the limit N->infinity.

Why? Have you ever coded an infinite sequence? Do you expect to ever code one?
It is meaningful only in the sense that such coarse grained parametrization (in terms of infinite limit values, after all the 'vanishing' terms can be dropped) has the advantage of often having simple closed form expressions which are easy for human consumption.

> Is there a finite memory implementation of QI that
> runs optimal in this limit?

AC or QI implementations tested were limited to max inputs of 2^30 bits. Any finite precision coder, AC or QI or any other, has redundancy terms O(N) due to quantization. Hence none of them will approach entropy in the limit n->inf. For a binary alphabet, QI's O(N) term is 4 times smaller than AC's O(N) term (see (A3) and refs there, already discussed).

The QI table size limit, call it TN, has nothing to do with these O(N) terms. (They would be related if one were outputting blocks in a whole number of bits, which QI doesn't do, cf. N4, p. 9 [T3].) Of course, the addends at N=TN have a particular quantization error, which is a special case of these O(N) terms, as do the addends at N>TN and the addends at N<TN.

{ The program QI.exe included with [QIC] has a command "ct" which displays table stats, showing such redundancies for the whole table, max, avg, last row, etc. Command "cbr" shows the same for any row, plus individual quantizations, on the last step and cumulative, for each binomial. These commands, unlike the full coding tables, keep only one row at a time, so they go up to much higher values of N, e.g. N=1M, while the coding tables' max is N=16K. }

> This wouldn't cover the random source example above.
> Of course, I can model this as some 2nd order markov
> chain, but I then pay the penalty of making it harder
> to adapt to it. In other words, I do not make optimal
> usage of my apriori knowledge on the source.

The method (b) in (M2), with the extra for your even/odd example, would have no problem with a non-stationary even/odd example.
The quasi-stationary source for the QI modeler means it has to select the proper block boundaries (so that within a block it has approximately stationary data, i.e. a proper enumerative class). The additional even/odd condition you specified only means that this segmentation is done separately for even and for odd input array indices.

> What means "descriptive"? As I see it, you keep relative frequencies or symbol counts here.

Minor point: relative frequencies (which an enumerative mode AC would use) are a less accurate parametrization of finite sequences for limited precision coders than the exact integer symbol counts (which QI uses).

> Whether you call this "descriptive" (as in, I have counted
> so and so much sequences of this kind) or "predictive" (as
> in, I predict the probability of a symbol due to the counts)
> is just a matter of language. Thus, what *is* the difference,
> leaving language issues alone?

No, that is not the difference. In the "predictive" modeling scheme the modeler is trying to predict the next symbol at position i+1 (in the sense of calculating probabilities for values of the next symbol) based _only_ on the symbols at positions 1,..,i. In "descriptive" modeling there is no such restriction on what kind of correlations symbols are allowed to have. Whenever the symbols 1..i are not correlated well enough with i+1, be it because i is too small, or 1..i is low entropy data with too little info altogether, or because of the nature of the data, the predictor pays a price for a wrong presumption.

Of course, any practical predictive modeler has to cheat the "ideal" predictor scheme and use various ad hoc Escape schemes (where the encoder is allowed to look ahead of the decoder and clue it in about a sudden change) to get around the worst case penalties for entirely wrong prediction that an "ideal" predictor would pay.

Consider array "Vary" = int32 { ..-2,-1,0,+1,+2,.. } which was shown in [T3] p.10.
{ Any array with sudden changes will do here, but this one had the advantage (in view of the 10 page limit) that I could specify it unambiguously in less than half a line. } Here you have a big "surprise" in the middle of the array, where mostly 1's change to mostly 0's. Also at each 32 bit block, there are little "surprises" of a similar kind. Both QI and AC were order-0 coders (i.e. no correlation assumed from a[i] to a[i+1]), and both are allowed to assume a quasi-stationary source, i.e. that the densities of 1's & 0's may vary along the array.

The predictive Moffat98 AC did very poorly here, especially for shorter arrays, where at N=4K it produced output twice as long as QI's. The QI quasi-stationary order-0 modeler used was very simple: it looks at the whole array as a single block and splits the block exactly in half if the two halves would encode shorter (including the cost to indicate the split, which uses longer Huffman codes for the counts), and it is allowed to keep splitting down to 1K and then stop. { NOTE: There are much better ways to do this kind of modeling via sliding window splits, which are nearly as fast as the simple method used but much more accurate. There are also better ways to code counts instead of precomputed Huffman tables for binomial distribution, but none of it was used in the tests in [T3].}

The negative effects of the "surprise" for the descriptive modeler are limited to the second order items (small compared to the enumerative index), such as the counts in this case, which cost slightly more to encode when a non-default block size is used for the enumerative class. With the "predictive" AC modeler, the "surprise" affected every "next symbol" (especially from the middle of the array on), with codeword lengths for MPS and LPS selected exactly backwards, until eventually the frequencies evened out (thus it finally realized what the actual MPS is from now on), but that happened at the very end of the array, well, too late. The frequency scaling didn't help much for shorter arrays.
Of course, Moffat98 had an additional disadvantage with this kind of input, since it is tuned to give extra advantage to MPS, giving it shorter codes than what even the past frequencies suggest. So the wrong MPS vs LPS for much of the second half penalized it more than necessary. This quirk of Moffat98 AC was explained in the earlier post:

M3. http://groups.google.com/group/comp.compression/msg/efa6336f483bbb89

There isn't really a good way for a _pure_ predictive modeler to fix this problem. Namely, if it changes the adaptation rate to adapt faster, thus scaling accumulated frequencies down more often, that will make the coding of genuinely stationary sequences less efficient due to greater error in the probabilities (fewer bits used for p, hence greater effect of single count fluctuations). The only way around this for practical "predictive" coders is to cheat on the "pure predictive" ideal and clue in the decoder to switch gears via some ad hoc escape mechanism.

Since the "descriptive" modeler is not bound by some imagined "predictive ideal" (but only by MDL considerations), it is much more consistent and resilient in real life situations, where everything is non-ideal. Of course, that doesn't mean that the "descriptive" coder has sworn never to select a "predictor" as a method to describe some sequence if that provides the minimum length for the total output. It only means that "prediction" isn't set on a pedestal as the "only true way".

Naturally, as with any specialized compressor, good or bad, there is trivially a subset of finite sequences on which a "pure predictive" modeler will do slightly better than the "descriptive" modeler. You're welcome to offer an example of an array on which a predictive modeler will do a lot better than a descriptive one (by ratios similar to the "Vary" results), if you believe there is any. I don't think there is. (While it is quite easy to trick a pure predictor into performing very badly.)

> All fine with me.
> For that, I would prefer:
>
> a) not to state results for tests that haven't
> been verified yet.

I consider providing source code publicly, in the most commonly available standard language C for the most common platform win32/MSVC, equivalent to providing a mathematical proof in a conventional formalism and in the most commonly read human language.

> b) to have a source code here I can actually make
> use of. ...
>
> I would need to get rid of Windows.h/conio.h.
> A makefile is put together fastly, and I don't
> mind about the speed.

Sorry again about win32 & MSVC. Jasen Betts has made a unix compilable version (which includes his windows.h substitute and a few changes for timer functions) which he could email you. His email is in his posts in this thread.

> The entropy coder is often only a minor part of
> an overall design, and you often do not have
> the choice of defining the coding order at
> your will.

I was talking about a more powerful and much faster modeler+coder scheme which is opened up by the existence of a universally applicable coder that is very fast across the spectrum of inputs, high precision and low noise (the absolute optimum at any arithmetic precision available). To take full advantage of this largely unexplored algorithmic territory (A3) being opened up (the wonder algorithm of the 90s, the Burrows-Wheeler transform with its mesmerizing beauty, was a mere early precursor of the powerful 'magical' algorithms from this territory), there is lots of work to be done for anyone who decides to explore these virgin lands ahead of the crowd.

For the existing AC modelers which can't be upgraded, QI can still produce compression equal to AC's, but code it much faster via method (b) described earlier. { This is an area I am less interested in, although I may do some QI adaptations of this kind for our company clients if that is requested. }

> Besides, hardware applications cannot buffer
> large amounts of data.
Well, even just two symbols ahead is more than "single next symbol" ahead. Even that little flexibility gives you a slight edge, when you don't needlessly constrain coder to "one next symbol". I have yet to run into a coding task where coder is required (by the nature of the task, not by some arbitrary dogma) to encode and output 1 symbol at a time. Everything that is compressed needs packaging and framing of some sort, in hardware and in software, and doing such framing symbol by symbol makes no sense.
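
As an aside on the mixed radix packaging described earlier in this post (I = D1 + M1*D2 with upper bound M = M1*M2), here is a minimal C sketch of just that arithmetic. It is not taken from the [QIC] source: the names mr_pack, mr_unpack and mixed_radix_t are hypothetical, and it assumes D1 < M1 and D2 < M2, with both radices fitting in 32 bits so that the products fit in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for a packed mixed radix number and its bound. */
typedef struct {
    uint64_t value;   /* I = D1 + M1*D2            */
    uint64_t bound;   /* M = M1*M2, so 0 <= I < M  */
} mixed_radix_t;

/* Pack two digits d1 < m1 and d2 < m2 into a single number.
   Assumption: the bounds are exclusive and nonzero. */
static mixed_radix_t mr_pack(uint32_t d1, uint32_t m1, uint32_t d2, uint32_t m2)
{
    mixed_radix_t r;
    assert(m1 > 0 && m2 > 0 && d1 < m1 && d2 < m2);
    r.value = (uint64_t)d1 + (uint64_t)m1 * d2;   /* exact 64 bit arithmetic */
    r.bound = (uint64_t)m1 * m2;
    return r;
}

/* Recover the two digits; only the first radix m1 is needed. */
static void mr_unpack(uint64_t value, uint32_t m1, uint32_t *d1, uint32_t *d2)
{
    assert(m1 > 0);
    *d1 = (uint32_t)(value % m1);
    *d2 = (uint32_t)(value / m1);
}
```

For example, D1=5, M1=7, D2=3, M2=11 packs to I=26 with bound M=77, and unpacking 26 with M1=7 returns the digits 5 and 3. Since the bound M is itself just an integer, the packed number can in turn serve as a digit in a further mixed radix step.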


1/24/2006 9:16:18 PM

Hi,

> > If so, then your collection at least had a
> > very strong tendency.
> What kind of "tendency"?

The tendency of advocating one algorithm without presenting the shortcomings.

> > This depends. If the CPU has to idle for many cycles,
> > it will draw more power during the idling than during
> > the data fetch.
> The pertinent comparison is between mul/div vs memory read
> (RAM or ROM). The latter takes less power.

This is an unfair comparison because the full operation of performing the read a) stalls the program and by that b) draws more power.

> >> For general entropy coder, the binomial tables have
> >> N^2/4 entries, where N is the max block size.
> >
> > Do you mean (N^2)/4 or N^(2/4) = sqrt(N)?
> The binomial table is quadratic (see N2 p. 9 [T3] where table
> sizes are discussed). Hence the size is N*N/4. The C source
> shows a more accurate value: N*N/4 - N +1 (see Qi.c function
> new_qbct(), the rows K=0,1 are not stored).

So it's O(N^2). I don't care about the factors (and it is unusual to care about them).

> > Thus, table size grows O(N^0.5) where N is the size
> > of the message.
> Double _no_: It is O(N^2), not O(N^0,5).
> The N is not the size of the message, it is table size. The
> table size determines max block which can be indexed as single
> index.

In which relation is N to the size of a block? Linear? I want a dependency on the block size. (Every codec can be blocked, but that's a sub-optimal solution because you lose statistics. Or more specifically, you cannot be asymptotically optimal if you cannot push N towards infinity.) Specifically, I'm interested in the behaviour for N -> infinity.

/* snip */

> > That is, in realistic setups, the question remains whether you
> > receive at all a point where the advantage pays off. Or,
> > mathematically, there is a lower bound M on the memory
> > requirements and a random source such that for all table sizes
> > smaller than M, QI is outperformed by AC.
> Note that even for table limits N=32 or 64, where exact EC has
> been used, in hierarchical approximation (see Oktem) is fairly
> competitive with AC. With QI, when you change the max block size
> N, what changes is the balance between how much is being output
> as index (within enumerative class) and how much as "class tags"
> (the model info, such as count of 1's). Since QI packages "class
> tags" if they are compressible (see "entropy pump" in [T2] p.
> 27), the only difference is how much work it does. It will work
> faster if it uses maximum blocks provided by the table limit N.
> But the upper limit N on table size depends on CPU cache and if
> you make it too large it can slow the coder down. Value N=1K
> happens to be the right balance for most machines tested,
> although on newer machines, the table limit N=2K even 4K works
> faster than N=1K. Now, that doesn't mean that X1(n1) has to have
> n1=1K or 2K or some such. N is only the upper bound on lengths
> of enumerative classes strings modeler can hand to coder.

You're too tight at the machine. If I'm discussing optimality, I don't care about cache sizes. This is a second argument that then has to follow as soon as the code should be made fast. *NOT NOW*

By limiting the table size, you limit the quality by limiting the block size. I want to understand how the block size relates to the table size, and then I want to understand what happens in infinitely large blocks. *Then* one can enter the discussion whether it pays off speedwise to use blocks.

> Note also that timers & timing methods included in [QIC] QI.exe
> may not be very accurate on short blocks (for N<128) due to
> timer granularity (~280 ns per tick). The EC.exe program (and
> source) includes type of loop needed to tests speed of exact EC
> on short blocks (N=32).

I *don't care* about timing. Not now. And if I had to, I would use my own method of measuring it.
> > It remains to check what M is and whether M is small enough
> > to make the above a realistic argument for the proposed
> > application, namely embedded devices.
> If the total input length L is very small, e.g. 32 or 64 bits,
> the AC and QI times are so short and comparable to timer
> granularity and also subject to general OS time-slicing effects,
> that the variation from test to test will be of similar
> magnitude as the times themselves.

You're drifting off the point. I will find methods to measure - you shouldn't care.

> > Pretty small block sizes, actually. There's no blocking
> > requirement for AC, so it may happen that for a given
> > random source AC outperforms QI in the long run. Now
> > since it has been understood that there are random sources
> > where AC might perform better,
> How does that follow? You need a finer resolution of
> the concepts 'input length L', 'table limit N' and coder's
> message lengths n1, n2,... There is no basis in theory
> or in experiment for any such 'conjecture', be it for
> speed or for compression effectiveness.

It is pretty simple: If you encode messages in "blocks", then the coder apparently cannot use knowledge from the previous block to encode the next one. Thus, you lose information, thus you are suboptimal. Whether this information is statistical, is "probabilities" or whatever does not matter. You cannot get better by blocking, you can only get worse.

> > what are the random sources QI might outperform AC.
> Any. As explained earlier, QI can code exactly as AC, in stream
> mode, if there is anything for which that is the best method,

No, it can't, obviously. If I have to block, I can build an AC coder - in principle - that carries the statistics over a size that is larger than a QI block size. See above.
> with the only difference that QI quantization is more accurate
> (due to QI's better utilization of g bit mantissa, which AC
> allows to vary to well below g bits; plus due to QI's integer
> arithmetic without needless infinite fractions which have to
> be dropped by a finite precision AC, one way or the other).
> > And if so, are they relevant to the target application? As
> > I read you, QI shows an advantage in the high entropy domain.
> > What is your target application? (Video/audio data that come
> > out of a predictor are IMHO not 'high entropy' because there's
> > even for lossless a compression gain of 1:2 to 1:4. But then,
> > what's high entropy?. How high?)
> I think you need to clarify first what is "perform"? You seem to
> have switched somewhere along the way from speed (table sizes,
> cache) to compression ratios.

There are different arguments here. Argument 1) is that I don't trust your argument about asymptotic optimality, mainly because you tell me that you cannot perform the limit N->infinity. Argument 2) is that you claim that you're faster than AC. I do not buy this because you're using tables and you're likely running into cache stalls. One *can* trade speed for optimality, but I can also do that with AC. There is nothing wrong with doing so, but then you should state this.

> For any set of messages M=
> {M1,M2,...}, provided all coders have a "perfect" model of M, QI
> will _always_ compress M to a smaller or equal size as AC
> (trivially, AC or any compressor, can compress individual
> message from M such as M1 shorter by simply assigning it
> codeword of length 1).

Obviously not, namely as soon as M becomes larger than the block size. But then, optimality is defined for the limit message size -> infinity.

> Of course, with "imperfect model" of M, a random pick of
> codewords (satisfying Kraft inequality) may do better on M than
> any other coder.
> Regarding the "imperfect" models of M, though,
> as illustrated by array "Vary" (p.10, [T3]), the QI's
> "descriptive" modeling is much more resilient to "surprise" than
> the usual "predictive" AC modeling. The "descriptive" method is
> always "surprised" less than the "predictive" method.

I do not make any claims about any *specific* message. I do not care about them - not now.

> Also, I have no idea what is this about "high entropy"?

That was your claim, namely that QI outperforms AC in the "high entropy regime". So what is "high entropy"?

> > How to specific high-speed implementations as MQ?
> I have given you [Said04] reference which has results for MQ
> (and other quasi-ACs), along with full precision AC. The MQ is
> only 2-3 times faster than full ACs, which doesn't even come
> close to QI vs full AC ratios, even for generic QI, much less
> for specialized QI version optimized for some range of inputs
> (which would be a more fair comparison against specialized ACs).

Have you or haven't you made direct comparisons? I don't trust going into tables and comparing implementations cross-wise. The problem is that the data is possibly based on different implementations.

> > I don't deny that QI might be interesting. However,
> > I don't like unproved claims...
> Placing the source code for public access is equivalent to
> publishing a mathematical proof (in non peer reviewed preprint).
> Now, if you can't compile source on your system, that is
> equivalent of saying you can't read the published proof because
> you don't read English.

No, that's because you don't write English. There's a well-established standard for C sources, namely C89 or C99. Pick it, work in it, and I can compile it. <windows.h> is not in C89, nor in POSIX or whatever.

> That is not the same thing as "unproven
> claims." There are hundreds of millions win32 PCs which can
> compile or run the [QIC] code.

That still doesn't make win32 available on my desktop.
And just because it is widespread, that still doesn't make it ANSI-C.

> The QI source code has been out there. With the level of
> hostility from some corners, anyone who wanted to empirically
> falsify that claim, has a plenty of chance. Or, with preprints
> publicly available, which show why it is so, one could have
> shown that some key mathematical step is fatally wrong, making
> the whole scheme flawed. Aside from minor typos and a minor
> complaint of one of the authors whose EC contribution was cited
> in [T3] (he though I should have put emphasis on a different
> aspect of his contribution, which I will in the next revision),
> no falsification of even a minor point has turned up.

Just by claiming that there is no error, or because no-one has found one, you cannot prove that there is none. I'm on the road of understanding and measuring, though the mentioned article is, sorry to say, not very readable, and the code is not very readable, and not very compilable either. I'm willing to invest *some* work, but you also have to do your homework.

> > Does it? With blocking? It means that the "memory"
> > of the coder and thus probabilities are only based
> > on a relatively short sample set, where "short" =
> > block size.
> The blocking doesn't limit the modeler state variables or how
> the modeling is done.

So what's blocking then? Is it ever possible to get a *clear* and *short* answer? Do you or don't you restart the encoder on a block end?

> The blocking only limits how big is the maximum single index
> that coder can compute.

Thus, it *does* limit the asymptotic optimality.

> If the enumerative class provided by
> modeler has sequences longer than coder's table limit N, coder
> breaks them into smaller blocks. The block boundaries don't
> cause any bit fraction loss since QI codes these fractions via
> mixed radix codes (cf. N4 p. 9 [T3]).
> Your conclusion is no different than concluding that one can't
> compute Pi to more than 15-16 digits on a PC, since that is how
> many digits floating point unit handles in a 64 bit 'double'.
> > Example: Consider a source consisting of two first-order
> > Markov sources A and B. At an even timestep, I draw from
> > A, at odd timesteps, I draw from B. With modelling, the
> > implementation of an AC coder that is optimized to this
> > situation is easy. What would I need to do with QI? Not
> > exploiting the special nature of this source might be
> > very wasteful.

/* snip */

> QI would simply classify odd and even elements into two separate
> enumerative classes, each having its own index.

Fine. How do I tell QI that this is a useful thing to do? If I have pre-knowledge on the sequence as in the above example, how do I tell the back-end?

/* snip */

You're again posting things I haven't asked for. I asked for an algorithm that allows me to drive QI optimally for this random source. Do I need to do anything special?

> > If you care about compression performance, the only
> > thing that is meaningful is the limit N->infinity.
> Why?

Because things like entropy and optimality only make sense in this limit. There is no mathematically meaningful definition otherwise.

> > Is there a finite memory implementation of QI that
> > runs optimal in this limit?
> AC or QI implementations tested were limited to max inputs
> of 2^30 bits. Any finite precision coder, AC or QI or
> any other, has redundancy terms O(N) due to quantization.
> Hence none of them will approach entropy in the limit n->inf.
> For binary alphabet, QI's O(N) term is 4 times smaller than
> AC's O(N) term (see (A3) and refs there, already discussed).

I haven't asked this. Can I, or can't I modify QI such that it doesn't do blocking? Can I, or can't I run QI such that it runs optimal for infinitely long messages? Please keep it *short*.
> > Whether you call this "descriptive" (as in, I have counted
> > so and so much sequences of this kind) or "predictive" (as
> > in, I predict the probability of a symbol due to the counts)
> > is just a matter of language. Thus, what *is* the difference,
> > leaving language issues alone?
> No, that is not the difference. In "predictive" modeling scheme
> modeler is trying to predict next symbol at position i+1 (in the
> sense of calculating probabilities for values of the next
> symbol) based _only_ on the symbols at positions 1,..,i. In
> "descriptive" modeling there is no such restriction on what kind
> of correlations symbols are allowed to have. Whenever the
> symbols 1..i are not correlated well enough with i+1, be it
> because i is too small or 1..i is low entropy data with too
> little info altogether or because of nature of data, the
> predictor pays a price for a wrong presumption.
> Of course, any practical predictive modeler has to cheat the
> "ideal" predictor scheme and use various ad hoc Escape schemes
> (where encoder is allowed to look ahead of decoder and clue it
> about sudden change) to get around the worst case penalties for
> entirely wrong prediction that a "ideal" predictor would pay.
> Consider array "Vary" = int32 { ..-2,-1,0,+1,+2,.. } which was
> shown in [T3] p.10. { Any array with sudden changes will do
> here, but this one had advantage (in view of 10 page limit) that
> I could specify it unambiguously in less than half a line. }
> Here you have a big "surprise" in the middle of the array, where
> mostly 1's change to mostly 0's. Also at each 32 bit block,
> there are little "surprises" of similar kind. Both QI and AC
> were order-0 coders (i.e. no correlation assumed from a[i] to
> a[i+1]), and both are allowed to assume quasi-stationary source,
> i.e. that densities of 1's & 0's may vary along the array.
> The predictive Moffat98 AC did very poorly here, especially for
> shorter arrays where at N=4K it produced twice as long output as
> QI. The QI quasi-stationary order-0 modeler used was very
> simple: it looks at the whole array as a single block and splits
> the block in exact half if the two halves would encode shorter
> (including the cost to indicate the split, which uses longer
> Huffman codes for the counts) and it is allowed to do the split
> down to 1K and then stop. { NOTE: There are much better ways to
> do this kind of modeling via sliding window splits, which are
> nearly as fast as the simple method used but much more accurate.
> There are also better ways to code counts instead of precomputed
> Huffman tables for binomial distribution, but none of it was
> used in the tests in [T3].}

But that is only a different view of the scheme. You also pay a price for the surprise, if you want to say so. You need to pay the price for encoding where to split the block. Whether you call this the price for misprediction, or the price for splitting the data, does not matter.

> > a) not to state results for tests that haven't
> > been verified yet.
> I consider providing publicly source code in the most commonly
> available standard language C for the most common platform
> win32/MSVC, equivalent to providing a mathematical proof in a
> conventional formalism and in the most commonly read human
> language.

The most commonly available standard language for C is ANSI-C. It runs on all platforms. I do care about the rest.

> To take the full advantage of this largely unexplored
> algorithmic territory (A3), being opened (the wonder
> algorithm of the 90s, the Burrows-Wheeler transform
> with its mesmerizing beauty, was a mere early precursor
> of the powerful 'magical' algorithms from this territory),
> there is lots of work to be done for anyone who decides
> to explore these virgin lands ahead of the crowd.
> For the existent AC modelers which can't be upgraded, QI can still produce an equal compression as AC, but still code it much faster via method (b) described earlier. { This is an area I am less interested in, although I may do some QI adaptations of this kind for our company clients if that is requested. }

Look, I do not, and never did, deny that you might have a point; I need to understand this point, and I will understand this point even better (and I do understand it now more than before), but for that, it takes time. It takes more time if the arguments you give are less scientific and more advertising, because I react allergically to the "hype language" you tend to use.

Thus, please have some patience with me. Be *shorter*, and more to the point. Quote less, write less. This post is much longer than it should be.

So long,
Thomas


1/25/2006 10:07:09 AM

>> The tendency of advocating one algorithm without presenting the shortcomings.

Well, QI is not a one trick pony i.e. one algorithm. It is an optimal solution for the EC precision problem as shown in eqs. (21)-(23) in [T3]. Hence one can view it as a class or family of algorithms. This is basically what the summary [M1] states right at the top:

M1. QI Summary (solutions for AC shortcomings, potentials)
http://groups.google.com/group/comp.compression/msg/27c6f329038a1bdc

The shortcoming of QI would be something that is intrinsic to QI, not to some particular algorithm from the family, much less to a particular implementation of any such algorithm. The only shortcoming of QI I am aware of is its 'youth', especially in view of elements (A2) and (A3) in [M1], i.e. to realize its maximum potential, a new kind of modeling (cf. [T2] pp. 27-35) needs to be developed and implemented. Without that, QI "only" offers a large speed & power consumption advantage over AC and a slight compression edge (if one discounts the row "Vary" on p. 10 [T3], since an AC modeler can be implemented to model such cases the 'descriptive' way).

Hence, I don't consider any objection to a particular algorithm from the family (or its particular implementation) to be a QI shortcoming, provided there are well known methods for resolving it, or there are already variants that resolve it, or there are mathematical solutions which can be implemented using well known methods. Recalling now many of the arguments in this and the previous thread, it seems much of the misunderstanding came from considering QI as a single, cast in stone, algorithm.

>> The pertinent comparison is between mul/div vs memory read (RAM or ROM). The latter takes less power.
>
> This is an unfair comparison because the full operation of performing the read a) stalls the program and by that b) draws more power.

It doesn't have to cause a stall if you organize computations to be cache aware as pointed out in (C2) in post [M2].

M2.
Cache, Speed, Clarification for [M1]
http://groups.google.com/group/comp.compression/msg/ad14695608d27f6f

The stall will occur only if the immediately following instructions depend on that memory value to proceed. (The processors which have large ratios of CPU/RAM speeds are normally pipelined and will execute such independent reads in parallel.) Since QI table accesses have a very orderly nature, it is easy to organize such memory reads to avoid stalls or any cache penalty for all practical purposes.

> So it's O(N^2). I don't care about the factors (and it is unusual to care about them).

If you're considering cache limitation, as we were doing there, the factor 4 in table size makes a difference.

>> The N is not the size of the message, it is table size. The table size determines max block which can be indexed as single index.
>
> In which relation is N to the size of a block? Linear? I want a dependency on the block size. (Every codec can be blocked, but that's a sub-optimal solution because you lose statistics.

There is no reason to lose any statistics or any information at all. The QI table limit NT (let's use a less confusing symbol for it) is equivalent to the CPU register limit of 32 bits. That does not limit your addition to 32 bit integers, provided you propagate carry to the next 32 bits on longer integers. Your C code and algorithms may be entirely oblivious to any such low level details. For the QI modeler the coder table limit is exactly that kind of limitation. The modeler could in principle ignore it completely, and provide enumerative classes of any lengths, and let the coder break them down and index in NT sized chunks and do the mixed radix code to remove interblock bit fraction gaps (as explained with your even/odd coder example and I1,I2 coding in mixed radix M1, M2, also in N4 p. 10 in [T3]).
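As a rough illustration of the mixed radix idea referred to above, the following C sketch packs a few block indices D_j (0 <= D_j < R_j) into a single integer, so that the combined code needs about ceil(log2(R_0*R_1*...)) bits rather than the sum of the separately rounded ceil(log2(R_j)). This is plain exact mixed radix arithmetic, not the [QIC] implementation (which works with quantized g-bit radices); the names bits_for, mr_pack and mr_unpack are made up for this example, and the product of the radices is assumed to fit in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Bits needed to store any value in 0..m-1, i.e. ceil(log2(m)), for m >= 1. */
int bits_for(uint64_t m)
{
    int b = 0;
    while (b < 64 && ((uint64_t)1 << b) < m)
        b++;
    return b;
}

/* Pack digits d[j] (0 <= d[j] < r[j]) into
   I = d[0] + r[0]*(d[1] + r[1]*(d[2] + ...)).
   Assumption: the product of all r[j] fits in 64 bits. */
uint64_t mr_pack(const uint32_t *d, const uint32_t *r, int n)
{
    uint64_t I = 0;
    for (int j = n - 1; j >= 0; j--) {
        assert(d[j] < r[j]);
        I = I * r[j] + d[j];
    }
    return I;
}

/* Inverse of mr_pack: recover the digits from I, lowest digit first. */
void mr_unpack(uint64_t I, const uint32_t *r, int n, uint32_t *d)
{
    for (int j = 0; j < n; j++) {
        d[j] = (uint32_t)(I % r[j]);
        I /= r[j];
    }
}
```

With radices 5 and 6, coding the two indices separately in whole bits takes 3+3 = 6 bits, while the packed value ranges over 30 values and fits in 5 bits; that saved fraction is the "interblock bit fraction gap" in miniature.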
In practice, if one is looking for fast and overall competitive implementations, the practical modeler should account for the NT value and the coder should account for cache limitations in selecting its NT. That is not something peculiar to QI, since in any domain, each layer of a competitive practical implementation of an algorithm needs to take into account the basic parameters and the limits of the adjacent layers it deals with.

> Or more specific, you cannot be asymptotically optimal if you cannot push N towards infinity.) Specifically, I'm interested in the behavior for N -> infinity.

Input size N (symbols) can be pushed to infinity. For QI with a given NT limit, that only means that the number of blocks NB=N/NT needs to go to infinity. With QI, due to its precise integer arithmetic with quantized addends, the blocks are exactly like digits of a mixed radix number, where the radix values Rj for the j-th digit (e.g. for a binary coder of a stationary source) are the quantized binomials C(NT,Kj), where Kj is the count of 1's in the j-th block. The special convenience of the radices Rj is that these are sliding window integers (SWI, p.7 [T3]), thus they have the form: Rj=(Mj,Ej)=Mj*2^Ej. Hence the computation of the binary form of a mixed radix number with given digits Dj (0 <= Dj < Rj, where Dj is the index for the j-th block) factors into a simple computation in radices Mj, which are g bit integers (as explained in the previous post with I1 & I2, M1 & M2, see also N4 p. 9 in [T3]; [QIC] includes a very practical, ready to use mixed radix coder, including a special permutation coder, all of which should run circles in compression and speed around anything you can do with AC on that kind of inputs; plus the properties (A4) described in [M1]).

The upper bound on QI redundancy due to g-bit quantization is DQ(g) = 2*log(e)/2^g bits per (input) symbol. That is 4 times smaller than the corresponding upper bound for AC: DA(g) = 8*log(e)/2^g using the same g bit precision (see [M1] & [M2] for refs & explanation).
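To make the (Mj,Ej) form concrete, here is a small C sketch of rounding an integer up to a value of the form M*2^E with M of at most g bits. Rounding up (rather than to nearest) is an assumption made here, and swi_t, swi_round_up and swi_value are names invented for this illustration, not taken from [QIC]. When any shifting occurs the resulting mantissa has its top bit set, so one such rounding inflates the value by a factor below 1 + 1/2^(g-1), i.e. by less than log2(e)/2^(g-1) = 2*log2(e)/2^g bits, which has the same form as the DQ(g) bound quoted above.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t m;   /* mantissa, at most g bits */
    int      e;   /* exponent; the represented value is m * 2^e */
} swi_t;

/* Smallest m*2^e >= v with m < 2^g, using the smallest possible e.
   Assumption: v < 2^63, so the rounded value still fits in 64 bits. */
swi_t swi_round_up(uint64_t v, int g)
{
    assert(g >= 1 && g <= 32);
    swi_t s;
    int e = 0;
    while (v >> g) {              /* v still needs more than g bits */
        v = (v >> 1) + (v & 1);   /* halve, rounding up */
        e++;
    }
    s.m = (uint32_t)v;
    s.e = e;
    return s;
}

/* Expand (m, e) back to an ordinary integer. */
uint64_t swi_value(swi_t s)
{
    return (uint64_t)s.m << s.e;
}
```

For example, with g=8 the value 1000 is representable exactly as 250*2^2, while 1001 is rounded up to 251*2^2 = 1004.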
Thus neither coder, when limited to g-bit arithmetic precision, will produce output which converges to entropy for N->inf. To obtain the latter, both coders need a precision g which grows as log(N). Of course, at any g(N), the QI max excess per symbol DQ(g(N)) remains 4 times smaller than the AC max excess per symbol DA(g(N)).

As a finer detail for the N->inf analysis, the block limit NT needs to grow as O(log(N)), since the QI mixed radix coding of block indices introduces (for NB>2) redundancy of 1/2^NT bits per block. Since NT>g (in practice NT>>g), this higher order correction to DQ(g), although in practice entirely negligible for any commonly used NT, g and N (e.g. g=32, NT=1K, N=2^32), becomes important for N->inf and g(N)->inf. An implication of NT(N)=O(log(N)) is that the number of table entries O(NT^2) grows as O(log(N)^2) as N->inf, and since the entry size is g(N)=O(log(N)), the total QI table size grows as O(log(N)^3) in the N->inf limit, if asymptotic optimality is required (hence the limited precision requirement is dropped).

The actual precisions g selected in practical implementations are chiefly determined by the available register sizes, which are normally such that machine MemorySize <= 2^RegisterSize (in order to be able to address the entire memory), which then leads to automatic fulfillment of the g(N)=O(log(N)) condition if you use g=RegisterSize. The table size limits NT are with QI "much" larger than g (otherwise you would be using exact EC, since binomials grow slower than O(2^N), e.g. NT=32 and g=32 requires no table quantization), hence NT will automatically satisfy the asymptotic requirement NT(N)=O(log(N)) if g(N) does and if any quantization of binomials is done.

As noted in [M1] & [M2], for any finite N, AC has an additional absolute excess (on the total output) of O(1), which is 2-4 bits, plus O(log(N)) bits in the case of adaptive and semi-static AC.
The point of the (A2) section in [M1] was that even though these terms are ignored in the conventional coarse-grained N->inf analysis, that approach overlooks certain types of practically important finite sequences, rendering them as "incompressible" in practice simply because the AC's O(1) & O(log(N)) "noise" precludes effective modeling and packaging algorithms.

> You're too tight at the machine. If I'm discussing optimality, I don't care about cache sizes. This is a second argument that then has to follow as soon as the code should be made fast. *NOT NOW*

These are parallel tracks of the discussion. You and others have brought up cache issues, hence they had to be answered. For N->inf asymptotics, one can ignore such "details". For N->inf, QI is asymptotically as optimal as AC, but with a uniformly 4 times smaller (in the binary case) distance from the entropy than AC, at any precision value g(N)=O(log(N)).

> By limiting the table size, you limit the quality by limiting the block size.

Hopefully, the above has convinced you that this is not the case. The table limit NT is exactly as problematic as the precision limit g, i.e. not a problem at all if you allow them to grow as O(log(N)) in the N->inf limit.

> It is pretty simple: If you encode messages in "blocks", then the coder apparently cannot use knowledge from the previous block to encode the next one.

That is a completely gratuitous requirement. There is no reason for the QI modeler to forget any information from block to block. Or to even care, in principle, about the blocks any more than it has to care about the arithmetic precision g or the machine register size.

>> As explained earlier, QI can code exactly as AC, in stream mode, if there is anything for which that is the best method, ...

M3. Methods (a) and (b) of QI coding for AC modeler:
http://groups.google.com/group/comp.compression/msg/1314ff87da597fad

> No, it can't, obviously.
> If I have to block, I can build an AC coder - in principle - that carries the statistics over a size that is larger than a QI block size.

See above. Again that gratuitous assumption of QI modeler memory being erased on block boundaries. As explained above, that is not necessary at all and QI doesn't do that.

As a minor detail, the method (a) described there is a pure streaming coder, no blocks are assumed there (since it never reaches the end of the block). It is basically equivalent to computing binomials using multiplicative recurrences of the type C(n+1,k+1)=C(n,k)*(n+1)/(k+1), with only two rows, n and n+1, of the table kept (for speed). This is exactly the calculation that AC does, only in the opposite direction, from n+1 -> n, i.e. the AC calculation is: C(n,k) = (k+1)/(n+1)*C(n+1,k+1), where it interprets the factor (k+1)/(n+1) as the probability of 1's (k is the count of 1's). In other words AC is simply the extreme point in the spectrum of QI's possible table size vs speed tradeoffs (see N2 p.9 [T3], for the rest of that spectrum), in which the tables are minimized to size 0, while the coding time is maximized due to having to compute all binomials (addends of eq. (21) [T3], in general), on LPS and MPS symbols and inside the coding loop (see pp. 19-25 in [T2] for more details on the relation between AC & QI coding).

> There are different arguments here. Argument 1) is that I don't trust your argument about asymptotic optimality. Mainly because you tell me that you cannot perform the limit N->infinity.

Hopefully the N->inf analysis above has cleared this objection.

> Argument 2) is that you claim that you're faster than AC. I do not buy this because you're using tables and you're likely running into cache stalls.

a) At the theoretical level QI does less work per coding step because its better division of labor at all levels takes the universal properties of the symbol sequences out of the coding loop (see (A1) in [M1]).
Hence, among others, it uses no coding operation on MPS, and it does much less work on LPS. Further, the interaction between the modeler and coder is minimized so that the modeler passes its coding tasks in the maximum size batches of symbols available, instead of the symbol by symbol interaction done within the AC modeling interface.

b) At the empirical level, where cache considerations enter the picture, in addition to points (C1)-(C4) in [M2], with the source made available, I consider that matter closed (you can ask Jason Betts for his unix version, which has windows.h & conio.h substitutes, plus the needed changes in Qiutl.c). I gave figures I obtained without even using the main cache related optimizations (C2) and (C4), and anyone can challenge them. With (C2) alone, one can for all practical purposes remove any cache miss penalty, while (C4) (the sparse coder, which is included in the [QIC] source) only further improves the largest of the speed ratios.

> One *can* trade speed for optimality, but I can also do that with AC. There is nothing wrong with doing so, but then you should state this.

QI can trade it, too. For example, if it were competing against a runlength coder on sparse data, it would not need to use high precision tables (g=32), or large tables (the sparse coder needs much smaller tables), or very tight coding for counts, or mixed radix block boundary coding (the tapered Huffman of D1, D2,.. will give it on average the bit fraction to within .086 bits). These and similar tradeoffs would improve speed significantly at the expense of precision (which can be kept better than that of the runlength coder).

For competition against the AC best known for its combination of speed and accuracy (Moffat98), different tradeoffs were used (mixed radix for blocks was used, but still only Huffman for symbol counts and data sizes) and the results are shown in [T3].
That particular balance of speed & accuracy is still plenty fast to run circles around any quasi-AC reported in the Said04 and Moffat98 papers. With the [QIC] source in public, that's all I can do from my end, with the time I had available. Even though QI was conceived in July 2004:

M4. QI history
http://groups.google.com/group/comp.compression/msg/f9dc68cc361bb740

hence about a year and a half ago, the work on it was and is heavily multiplexed with the regular work requirements, throughout. With so many interesting developments with QI (especially with new modeling along the lines of (A2) & (A3)) going on, fiddling with varieties of ACs and public source code variants, or even writing the reports & preprints, to everyone's satisfaction is like chewing a week old McDonald's burger while the platter of premium filet mignon broiled to perfection is in front of me.

> Obviously not, namely as soon as M becomes larger as the block size. But then, optimality is defined for the limit message size -> infinity.

That should be clear by now.

>> Also, I have no idea what is this about "high entropy"?
>
> That was your claim, namely that QI outperforms AC in the "high entropy regime". So what is "high entropy"?

I don't know what particular "high entropy" statement this refers to. QI does outperform AC for any entropy. The relative compression gains, which are evident from [T3] (or the analysis of redundancies), increase for the smaller outputs, since O(1) and O(log(n)) terms become significant. As explained in (A3) [M1], the observation that these gains are small is a tautology. The potential gains are much larger than just the few percent shown in [T4]. That, of course, has to be shown in the future. (A3) merely gives heuristic reasons (based on my own experiments & math over the past year and a half, all still unpublished) why one should expect so.

>> for specialized QI version optimized for some range of inputs (which would be a more fair comparison against specialized ACs).
>
> Have you or haven't you made direct comparisons? I don't trust to go into tables and compare implementations cross-wise.

Not against quasi-AC or any other AC but Moffat98. Here in this group, we saw Matt Mahoney show his results for the high entropy limit of the A=3 N=10^6 sequence. I tried the same with QI, since the QI.exe in the released [QIC] kit already does that so anyone can verify, and also against Moffat98 (which did worse than Matt's AC on compression), and there is no contest, not even close, in speed or compression on that type of sequences (Matt didn't give his timings, but his coder doesn't unroll loops as Moffat98 does and it uses stream i/o, so it will be slower than straight array coders). The QI radix codes use the A-row skipping method (cf. N2 p. 9 [T3]), which makes the O(N) redundancy independent of radix, while with AC the O(N) term grows as N*A. Hence the case A=3 was the best non-binary case for AC, and still there was no contest.

Cross-table comparison was only good enough to give me a basic idea of where quasi-ACs roughly stand vs Moffat98 and other full precision coders and whether it is worth spending time doing my own tests on these (and I concluded that it was not worth it).

> There's a well-established standard for C sources, namely C89 or C99. Pick it, work in it, and I can compile it. <windows.h> is not in C89, nor in POSIX or whatever.

The coding functionality, with _ASM in Intro.h set to 0, is very generic C (provided you set the 8, 16, 32, 64 bit integer typedefs in Qitypes.h if they differ from those set there). Only the timer and some keyboard checks in the upper level tests are windows/MSVC specific. Removing those can answer all but the speed questions. Since speed is an important element of the difference, ANSI C, without a standard timer of microsecond or better resolution, would require more elaborate test functions which loop many times on each single aspect (generate, code, decode, compare).
The problem with that approach is that it would introduce distortion relative to cache effects, which would go in favor of QI (allowing it better use of cache by running a single specialized component with fixed input at a time). For fairness I chose the less objectionable path of using win32 high res timers.

>> The blocking only limits how big is the maximum single index that coder can compute.
>
> Thus, it *does* limit the asymptotic optimality.

As explained above, not at all.

>> QI would simply classify odd and even elements into two separate enumerative classes, each having its own index.
>
> Fine. How do I tell QI that this is a useful thing to do? If I have pre-knowledge on the sequence as in the above example, how do I tell the back-end?

For AC, with one shoe fits all, all of the modeler's info about the sequence gets funneled through 'probabilities of the next symbol'. With QI, which uses a much richer and more precise language of finite sequence parameters, there are many ways to tell the coder what to code here. The general pattern of doing this is described in [T2] p. 27, and elaborated in the rest of that chapter (on QI/EC modeling). As with any streamlined division of labor, more thought is needed upfront, so less work and a better job is done later, in the coding loop (that's a special case of the old 'ounce of prevention' vs 'pound of cure' difference).

In your example, one could use the general AC modeler interface described as method (b) [M3], which understands the multiple probability classes and splitting of the input for different enumerators. We wouldn't have the modeler call the coder on each symbol, but it would call the enumerator to enumerate one array (which the modeler may extract separately), then call the enumerator for another array. One can streamline the work in a variety of ways (which could handle more general interleaved type inputs) so that the modeler doesn't need to actually copy sub-arrays, just as one doesn't create N literal BWT rows and sort them.
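A minimal C sketch of the even/odd separation described above. It de-interleaves a 32-bit word into two 16-bit strings and computes an exact enumerative index for each, using the same formula as the enc32() routine posted earlier in this thread. The helper names (binom, enum_index, split_even_odd) are hypothetical, and exact binomials are computed on the fly instead of being read from tables, so this only illustrates the division of labor between modeler and enumerator, not the QI coder itself.

```c
#include <assert.h>
#include <stdint.h>

/* Exact C(n,k) for 0 <= n <= 32; returns 0 when k > n. */
uint32_t binom(int n, int k)
{
    assert(n >= 0 && n <= 32 && k >= 0);
    if (k > n)
        return 0;
    uint64_t c = 1;
    for (int i = 0; i < k; i++)
        c = c * (uint64_t)(n - i) / (uint64_t)(i + 1);  /* stays integral */
    return (uint32_t)c;
}

/* Index of x among all strings with the same count of 1s:
   I = sum of C(n_j, j), where n_j is the bit offset of the j-th 1
   (j = 1, 2, ..), counted from the lowest bit. */
uint32_t enum_index(uint32_t x)
{
    uint32_t I = 0;
    int k = 1;
    for (int n = 0; x != 0; n++, x >>= 1)
        if (x & 1u)
            I += binom(n, k++);
    return I;
}

/* "Modeler" step: separate even- and odd-position bits into two classes. */
void split_even_odd(uint32_t x, uint16_t *even, uint16_t *odd)
{
    *even = 0;
    *odd = 0;
    for (int i = 0; i < 16; i++) {
        *even |= (uint16_t)(((x >> (2 * i)) & 1u) << i);
        *odd  |= (uint16_t)(((x >> (2 * i + 1)) & 1u) << i);
    }
}
```

For x = 0x55555555 each class contains a single possible string (C(16,16) = C(16,0) = 1), so both indices are 0 and only the two counts need to be sent, whereas treating the word as one order-0 class with 16 ones out of 32 leaves C(32,16) = 601080390 possibilities to index.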
None of this should be any more complicated than an AC modeler setting up two contexts and alternating them from call to call. The main difference here is that we wouldn't traverse the whole coder+modeler hierarchy with calls and function entries, symbol by symbol, but would make the modeler do the separation into enumerative classes first (be it as an explicit copy of sub-arrays or as init() of some more general interleaved inputs enumerator, which are needed in (b) when dealing with Markov sources, anyway), then the enumerator does the requested enumerations of the given complete sequences. If you want to do less work, do it faster, more accurately and with lower power consumption, that's the way to do it, here or in any program or generally in any production (or even in mental work).

The packaging of components would be QI's generic mixed radix packager (used in multi-block and multi-alphabet coders). The package normally includes the number of components and their sizes as parameters (which are packaged as well). Since this is not a simple order-0 coding, one would also have to include a description specifying the number of AC probability classes that method (b) has to include (unless this is assumed to be known to the decoder). To describe this info, in case it is not known upfront, it will cost you the same as with AC given the same upfront information. If you use adaptive AC, the 'learning' will generally cost it O(log(n)) bits more than encoding optimally the exact same information separately.

> Because things like entropy and optimality make only sense in this limit. There is no mathematically meaningful definition otherwise.

That is only the asymptotic optimality, which is just one low resolution fact in the realm of finite sequences. In any case, QI is asymptotically optimal, too, provided you don't set any limit on precision g or table size NT, as is the case for AC or any other coding algorithm.
At a higher resolution view, the QI quantization excess is 4 times smaller (for A=2) than that of AC at any given arithmetic precision g. For multi-alphabet coding the AC quantization excess per symbol grows as O(A), while QI's remains O(1) in its most accurate mode (as shown in the [QIC] source, which includes a high entropy limit coder usable for any A < 2^32).

> Can I, or can't I modify QI such that it doesn't do blocking. Can I, or can't I run QI such that it runs optimal for infinitely long messages?

The two questions are unrelated. As indicated above, the answer is yes to both.

> But that is only a different look on the scheme.

It is not just a different perspective on the same modeling. There is a tangible mathematical constraint on the predictor as to what and how it can correlate, and no such restriction for the descriptor. While asymptotically for n->inf, their outputs do converge to the same limits, for n<inf (which is all that anyone will ever code) the descriptive method has a measurable advantage, such as more than two times smaller output on "Vary" (or any similar sequence) for n=4K.

> You also pay a price for the surprise, if you want to say so. You need to pay the price for encoding on where to split the block. Whether you call this the price for misprediction, or the price for splitting the data does not matter.

Yes, but you pay much less in descriptive mode. You don't even need QI to test array "Vary" (or any similar input with large surprises) using two AC coders, one in predictive and one in descriptive mode. (You can even give it to a student as a little term project.) The difference in cost is due to the predictor being very wrong on LPS vs MPS on every input symbol, the whole N/2 times in "Vary", while the descriptor is wrong only on coding the model information, which is O(log(n)) size.
Hence, the descriptor, due to its clean separation of the O(log(N)) model info from the primary O(N) output, can better localize the damage of the "surprise" to the O(log(N)) component. This is a quite easily verifiable empirical fact.
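The kind of check suggested above can be sketched in C without any actual coder, by comparing idealized code lengths on the "Vary" array. The assumptions here are loud ones: the "predictive" side is a plain adaptive order-0 model with a Laplace estimator (much simpler than the modeler of any real AC such as Moffat98), and the "descriptive" side sends a count of 1's plus an enumerative index per block, with one optional split into halves signalled by a flag bit and counts costed uniformly (not the recursive splitting with Huffman-coded counts described earlier in the thread). All function names are made up for this example; lg2() is a small stand-in for log2() so that the sketch does not depend on the math library.

```c
#include <assert.h>
#include <stdint.h>

/* log2(x) for x > 0: reduce x to [1,2), then ln(x) = 2*atanh((x-1)/(x+1)). */
static double lg2(double x)
{
    int e = 0;
    while (x >= 2.0) { x *= 0.5; e++; }
    while (x < 1.0)  { x *= 2.0; e--; }
    double y = (x - 1.0) / (x + 1.0), y2 = y * y, t = y, s = 0.0;
    for (int i = 1; i < 40; i += 2) { s += t / i; t *= y2; }
    return e + 2.0 * s * 1.4426950408889634;
}

/* log2 of the binomial coefficient C(n,k). */
static double lg2_binom(int n, int k)
{
    double s = 0.0;
    if (k > n - k) k = n - k;
    for (int i = 1; i <= k; i++) s += lg2((double)(n - k + i) / i);
    return s;
}

/* Fill bits[] with the bits of int32 values ..,-2,-1,0,+1,+2,..
   (nbits is assumed to be a multiple of 64). */
void make_vary(unsigned char *bits, int nbits)
{
    int nwords = nbits / 32;
    for (int w = 0; w < nwords; w++) {
        uint32_t u = (uint32_t)(int32_t)(w - nwords / 2);
        for (int b = 0; b < 32; b++)
            bits[32 * w + b] = (unsigned char)((u >> b) & 1u);
    }
}

/* Predictive: ideal code length of an adaptive order-0 Laplace estimator. */
double predictive_bits(const unsigned char *bits, int n)
{
    int c[2] = {0, 0};
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        total -= lg2((c[bits[i]] + 1.0) / (c[0] + c[1] + 2.0));
        c[bits[i]]++;
    }
    return total;
}

/* Cost of one block sent as (count of 1s, enumerative index). */
static double block_bits(const unsigned char *bits, int n)
{
    int k = 0;
    for (int i = 0; i < n; i++) k += bits[i];
    return lg2(n + 1.0) + lg2_binom(n, k);
}

/* Descriptive: 1 flag bit, then the cheaper of whole block or two halves. */
double descriptive_bits(const unsigned char *bits, int n)
{
    double whole = block_bits(bits, n);
    double split = block_bits(bits, n / 2) + block_bits(bits + n / 2, n - n / 2);
    return 1.0 + (split < whole ? split : whole);
}
```

Under these assumptions, for the 4K-bit "Vary" array the predictive estimate comes out at more than twice the descriptive one. That is only an idealized illustration of the argument, not a reproduction of the [T3] measurements.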


1/25/2006 9:20:21 PM

--- Errata:

> As a finer detail for the N->inf analysis, the block limit NT needs to grow as O(log(N)), since the QI mixed radix coding of block indices introduces (for NB>2) redundancy of 1/2^NT bits

The "1/2^NT" above should say "2*log(e)/2^g" (since for the described method of block boundary coding in multiblock QI via mixed radix codes, the quantization is done on the g-bit radices Mj not on the NT-bit radices Rj; the latter would be quite impractical for NT larger than 64 bits on 32 bit processors).

This doesn't affect any conclusions, but one statement needs a slight refinement. A more precise QI's quantization redundancy DQ(g) for the multiblock coding method described and with the block boundary effects included, is DQ(g)=2*(1+1/NT)/2^g. That affects slightly the statement that QI has "4 times" smaller quantization redundancy than AC (which is exact in single block case), to the more precise multiblock case statement "4/(1+1/NT) times". E.g. for the [QIC] default value NT=1K, the "4 times" would say "3.996" times, if one is speaking at that level of precision.


1/26/2006 4:41:23 PM

--- Errata:

> As a finer detail for the N->inf analysis, the block limit NT needs to grow as O(log(N)), since the QI mixed radix coding of block indices introduces (for NB>2) redundancy of 1/2^NT bits

The "1/2^NT" above should say "2*log(e)/2^g" (since for the described method of block boundary coding in multiblock QI via mixed radix codes, the quantization is done on the g-bit radices Mj not on the NT-bit radices Rj; the latter would be quite impractical for NT larger than 64 bits on 32 bit processors).

This doesn't affect any conclusions, but one statement needs a slight refinement. A more precise QI's quantization redundancy DQ(g) for the multiblock coding method described and with the block boundary effects included, is DQ(g)=2*(1+1/NT)*log(e)/2^g. That affects slightly the statement that QI has "4 times" smaller quantization redundancy than AC (which is exact in single block case), to the more precise multiblock case statement "4/(1+1/NT) times". E.g. for the [QIC] default value NT=1K, the "4 times" would say "3.996 times", if one is speaking at that level of precision.
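The arithmetic in this erratum can be checked with a few lines of C. The sketch below simply evaluates the two bounds as quoted in the thread; it assumes that "log(e)" means the base-2 logarithm of e (so that the result is in bits), and the function names dq_bound and da_bound are invented for the example.

```c
#include <assert.h>

#define LOG2_E 1.4426950408889634   /* log2(e); base 2 is an assumption */

/* QI bound per symbol, multiblock case: 2*(1+1/NT)*log2(e)/2^g bits. */
double dq_bound(int g, double nt)
{
    assert(g >= 1 && g <= 62 && nt >= 1.0);
    return 2.0 * (1.0 + 1.0 / nt) * LOG2_E / (double)(1ULL << g);
}

/* AC bound per symbol as quoted in the thread: 8*log2(e)/2^g bits. */
double da_bound(int g)
{
    assert(g >= 1 && g <= 62);
    return 8.0 * LOG2_E / (double)(1ULL << g);
}
```

For g=32 and NT=1024 the ratio da_bound(32)/dq_bound(32, 1024.0) equals 4/(1+1/1024), which rounds to the "3.996" mentioned above; both bounds themselves are below 1e-8 bits per symbol at that precision.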


1/26/2006 4:47:50 PM

=== Update to Quantized Indexing RefLib page ===
http://www.1stworks.com/ref/RefLib.htm

Quantized Indexing Home:
http://www.1stworks.com/ref/qi.htm

* Local copies for several external papers (which had broken links) were added
* Added Richard Brak's AMSI lectures on lattice path combinatorics

=== Updates

8a. R. Brak
Enumerative Combinatorics (lattice paths)
Lectures, AMSI Summer School, 2003
http://www.1stworks.com/ref/combin_P1.PDF

28. L.D. Davisson
Universal noiseless coding
IEEE Trans. Inform. Theory IT-19 (6), 783-795, 1973
http://www.1stworks.com/ref/Davisson1973Universal.pdf

29. J. Rissanen, G.G. Langdon
Universal Modeling and Coding
IEEE Trans. Inform. Theory IT-27 (1), 12-23, 1981
http://www.1stworks.com/ref/Rissanen1981Universal.pdf

30. A. Barron, J. Rissanen, Bin Yu
The minimum description length principle in coding and modeling
IEEE Trans. Inform. Theory IT-44 (6), 2743-2760, 1998
http://www.1stworks.com/ref/Barron1998minimum.pdf

33. R. Krichevsky, V. Trofimov
The performance of universal encoding
IEEE Trans. Inform. Theory IT-27 (2), 199-207, 1981
http://www.1stworks.com/ref/Krichevsky1981performance.pdf

53. M. Feder, N. Merhav, M. Gutman
Universal prediction of individual sequences
IEEE Trans. Inform. Theory IT-38 (4), 1258-1270, 1992
http://www.1stworks.com/ref/Feder1992Universal.pdf

54. J. Rissanen
Universal coding, information, prediction, and estimation
IEEE Trans. Inform. Theory IT-30 (4), 629-636, 1984
http://www.1stworks.com/ref/Rissanen1984Universal.pdf

55. M.J. Weinberger, J. Rissanen, M. Feder
A universal finite memory source
IEEE Trans. Inform. Theory IT-41 (3), 643-652, 1995
http://www.1stworks.com/ref/Weinberger1995universal.pdf


2/8/2006 5:49:04 AM