
### arithmetic encoding with rationals


```
hi

I'm trying to implement arithmetic coding in a programming language which
has built-in support for arbitrary-size rational numbers, so my first
version is a straightforward implementation with rationals. The good thing
about it is that it is easy, it works with any data, and it achieves the
theoretical encoding efficiency (aside from some possible optimizations
for fixed-length data).

However there is a problem -- in the process the rational numbers get
larger and larger, eating more memory and making encoding slower.

I'm aware that arithmetic coders are usually implemented with fixed-length
integers:
http://marknelson.us/1991/02/01/arithmetic-coding-statistical-modeling-data-compression/

But there is a problem with it:

* the code is fairly complex
* it imposes limitations on the precision of probabilities and thus also
on the size of the alphabet
* it does some non-trivial rounding which makes the encoding worse than
theoretical
* it isn't easy to verify the validity of the code

E.g. here's code from Mark's article:

range = (long) ( high - low ) + 1;
high  = low + (unsigned short int) (( range * s->high_count ) / s->scale - 1 );
low   = low + (unsigned short int) (( range * s->low_count  ) / s->scale );

As I understand it, this does the rounding, but there is no discussion of
rounding in the article.

There is an article which discusses all the bloody details of the rounding,
in 60 pages:
http://www.hpl.hp.com/techreports/2004/HPL-2004-76.pdf
(E.g. figure on page 20:  Subdivision of a coding interval with approximate
multiplications. Due to the fixed-precision arithmetic, we can only
guarantee that all coding intervals are disjoint if we leave small regions
between intervals unused for coding.)

But again, it describes an implementation which uses fixed-length integers;
it is fairly complex and I'm not too excited about implementing it from
scratch.

So my question is -- is it possible to formulate this rounding in terms of
rational numbers while keeping things simple?
I think I know how to do this in the encoder -- just make the interval a
little bit smaller, e.g. for [lo, hi) find rounded loR and hiR such that
lo <= loR < hiR <= hi

But the decoder needs to take it into account as well (right?), and I'm not
sure how to do it.
The naive approach is to replicate the encoder's work in the decoder, but
maybe there is a more elegant approach?

```
Reply udodenko (1040) 2/14/2011 12:46:56 PM
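[Editor's note: a minimal sketch of the exact-rational scheme the question
describes, using Python's `fractions.Fraction`. All names and the toy model
are illustrative, not from any post in this thread; the final comment shows
the denominator growth the poster complains about.]

```python
from fractions import Fraction

def encode(symbols, model):
    """Exact arithmetic encoding: shrink [low, low + width) once per symbol.
    model maps each symbol to its cumulative interval (c_lo, c_hi)."""
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        c_lo, c_hi = model[s]
        low += width * c_lo
        width *= c_hi - c_lo
    return low, width  # any rational in [low, low + width) names the message

def decode(x, n, model):
    """Recover n symbols by locating x inside the cumulative intervals."""
    out = []
    for _ in range(n):
        for s, (c_lo, c_hi) in model.items():
            if c_lo <= x < c_hi:
                out.append(s)
                x = (x - c_lo) / (c_hi - c_lo)  # rescale back into [0, 1)
                break
    return ''.join(out)

# Toy model: p('a') = 1/3, p('b') = 2/3.
model = {'a': (Fraction(0), Fraction(1, 3)),
         'b': (Fraction(1, 3), Fraction(1))}
low, width = encode('abba', model)
print(low, width)             # 5/27 4/81
print(decode(low, 4, model))  # abba -- but low's denominator grows
                              # with every symbol encoded
```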

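[Editor's note: the "make the interval a little bit smaller" idea from the
question might look like the sketch below; `shrink` and the bit budget are
my names and assumptions, not from the thread. Because the decoder recomputes
the same intervals from the same model, applying the identical shrink after
every decoded symbol keeps both sides in lockstep -- the "replicate the
encoder's work" approach, which is essentially what fixed-width coders do
implicitly.]

```python
from fractions import Fraction
from math import ceil, floor

def shrink(lo, hi, bits=32):
    """Round [lo, hi) inward to endpoints whose denominator divides 2**bits,
    so that lo <= loR < hiR <= hi and the denominators stop growing.
    Must be called after every symbol by BOTH encoder and decoder."""
    scale = 1 << bits
    loR = Fraction(ceil(lo * scale), scale)    # round lo up
    hiR = Fraction(floor(hi * scale), scale)   # round hi down
    assert lo <= loR < hiR <= hi, "interval narrower than 1/2**bits"
    return loR, hiR

loR, hiR = shrink(Fraction(1, 3), Fraction(2, 3), bits=8)
print(loR, hiR)   # 43/128 85/128
```

The assert is the price of rounding inward: the interval must stay wider
than 1/2**bits, which is exactly the precision-versus-alphabet-size
limitation the fixed-length coders have.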
On Feb 14, 5:46 am, "Captain Obvious" <udode...@users.sourceforge.net>
wrote:
> hi
>
> I'm trying to implement arithmetic coding in a programming language which
> has built-in support for arbitrary-size rational numbers, so my first
> version is a straightforward implementation with rationals. The good thing
> about it is that it is easy, it works with any data, and it achieves the
> theoretical encoding efficiency (aside from some possible optimizations
> for fixed-length data).
>
> However there is a problem -- in the process the rational numbers get
> larger and larger, eating more memory and making encoding slower.
>
> I'm aware that arithmetic coders are usually implemented with fixed-length
> integers:
> http://marknelson.us/1991/02/01/arithmetic-coding-statistical-modeling-data-compression/
>
> But there is a problem with it:
>
> * the code is fairly complex
> * it imposes limitations on the precision of probabilities and thus also
> on the size of the alphabet
> * it does some non-trivial rounding which makes the encoding worse than
> theoretical
> * it isn't easy to verify the validity of the code
>
> E.g. here's code from Mark's article:
>
> range = (long) ( high - low ) + 1;
> high  = low + (unsigned short int) (( range * s->high_count ) / s->scale - 1 );
> low   = low + (unsigned short int) (( range * s->low_count  ) / s->scale );
>
> As I understand it, this does the rounding, but there is no discussion of
> rounding in the article.
>
> There is an article which discusses all the bloody details of the rounding,
> in 60 pages:
> http://www.hpl.hp.com/techreports/2004/HPL-2004-76.pdf
> (E.g. figure on page 20: Subdivision of a coding interval with approximate
> multiplications. Due to the fixed-precision arithmetic, we can only
> guarantee that all coding intervals are disjoint if we leave small regions
> between intervals unused for coding.)
>
> But again, it describes an implementation which uses fixed-length integers;
> it is fairly complex and I'm not too excited about implementing it from
> scratch.
>
> So my question is -- is it possible to formulate this rounding in terms of
> rational numbers while keeping things simple?
> I think I know how to do this in the encoder -- just make the interval a
> little bit smaller, e.g. for [lo, hi) find rounded loR and hiR such that
> lo <= loR < hiR <= hi
>
> But the decoder needs to take it into account as well (right?), and I'm not
> sure how to do it.
> The naive approach is to replicate the encoder's work in the decoder, but
> maybe there is a more elegant approach?

The nice thing about arithmetic coding is that, if done correctly, there is
no wasted space. You could keep track of the residue by using fractions
instead of a larger binary integer, but nothing is gained. I have
programmed many arithmetic coders with various sizes for the state
registers. As you increase the number of bits you quickly come to
realize that adding more bits would only be useful if you want to use
it as a last pass before encryption.
You will also find that for most useful files there is a point where
it will compress files worse than using fewer bits, due to the
nonstationary nature of real data.
A pure, simple arithmetic compressor will compress all permutations of
a data file to the same number of bytes. If you use one with lower
precision then the different permutations of the file will compress
to different lengths, and if done correctly you can often take
advantage of the local clustering and make the lower-precision
arithmetic compress real files better than a pure arithmetic coder.

A simple, pure bijective arithmetic coder is arb255; it's easy to change
it from the current 64-bit register size to smaller sizes if you wish
to play with it.

David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said, "any cryptographic
system is only as strong as its weakest link"

```
Reply biject.bwts (40) 2/14/2011 2:13:38 PM
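[Editor's note: the permutation claim above is easy to check for a static
model: with exact arithmetic the ideal code length is a plain sum of
per-symbol -log2 p terms, so it depends only on the symbol counts, never
on their order. A quick illustration, names mine:]

```python
from math import log2

def ideal_bits(data, p):
    """Ideal (exact-arithmetic) code length in bits under a fixed model p.
    A sum over symbols, so every permutation of data gives the same total."""
    return sum(-log2(p[s]) for s in data)

p = {'a': 0.25, 'b': 0.75}
print(ideal_bits('aabbbb', p))   # identical totals:
print(ideal_bits('bbabab', p))   # same counts, different order
```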

```On 14.02.2011 13:46, Captain Obvious wrote:

> I'm trying to implement artithmetic coding in a programming language
> which has built-in support for arbitrary size rational numbers, so my
> first version is straightforward implementation with rationals. Good
> thing about it is that it is easy and straightforward, it is able to
> work with any data and it achieves theoretical encoding efficiency
> (aside some possible optimizations for fixed-length data).
>
> However there is a problem -- in process rational numbers get larger and
> larger eating more memory and making encoding slower.

Of course. At a certain point, you either need to cancel the rationals,
or throw away precision by rounding.

> I'm aware that arithmetic coders are usually implemented via fixed
> http://marknelson.us/1991/02/01/arithmetic-coding-statistical-modeling-data-compression/
>
>
> But there is a problem with it:
>
> * code is fairly complex

Not really. It is only a couple of lines long. It gets very easy
for binary sources, which is the typical use-case for arithmetic coders.
You need to understand the reasons why the program is written the way it
is, but what is more important is that the encoder and the decoder follow
the same program flow and the same arithmetic operations.

Actually, I would believe that the logic you need to implement all the
rational classes, canceling and rounding is probably more complex than
the typical fixed-point implementation.

> * it imposes limitations on precision of probabilities and thus also
> size of alphabet

True. However, this goes for any encoder. A Huffman tree can also get
very deep for large alphabets.

> * it does some non-trivial rounding which makes encoding worse than
> theoretical

The rounding is rather trivial, but its implications on the probability
model might be hard to write down precisely. However, the typical coding
loss is due to modeling errors (i.e. not providing a very good model for
the source) rather than to not using the precise probabilities.
Consider the following example:

When encoding a binary source, you estimate the probability of the zero
symbol to be p(0) = 32767/65536. What do you think causes a higher
coding loss:

a) simplifying p(0) to 1/2,
b) using a zero-order model instead of a first-order model.

> * it isn't easy to veryify validity of code.

Why? I don't understand this argument. You debug the code, as always.

> E.g. here's code from Mark's article:
>
> range = (long) ( high-low ) + 1;
> high = low + (unsigned short int )
> (( range * s->high_count ) / s->scale - 1 );
> low = low + (unsigned short int )
> (( range * s->low_count ) / s->scale );
>
> As I understand it does the rounding, but there is no discussion of
> rounding in the article.

The trick isn't really *how* the rounding is done. The trick is to do it
*consistently* in the encoder and the decoder. Any rounding convention
would do provided you round consistently.

The problem with AC coding is probably carry-over handling, but that's
not fixed by using rationals.

Greetings,
Thomas
```
Reply thor16 (320) 2/14/2011 2:54:04 PM
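[Editor's note: Thomas's point -- that the rounding rule matters less than
applying it identically on both sides -- can be sketched as below. This is
a simplified variant in the spirit of Mark Nelson's snippet, not his exact
code; all names are mine.]

```python
def subdivide(low, range_, c_lo, c_hi, total):
    """Integer interval split. Floor division is the rounding convention;
    what matters is that encoder and decoder call this exact function,
    so both sides compute bit-identical sub-intervals."""
    new_low = low + (range_ * c_lo) // total
    new_top = low + (range_ * c_hi) // total
    return new_low, new_top - new_low

# The encoder narrows the interval to the current symbol's slot...
lo, rng = subdivide(0, 1 << 16, 32767, 65536, 65536)
print(lo, rng)   # 32767 32769
# ...and the decoder, holding the same state and counts, reproduces it
# exactly, floor-rounding and all.
assert (lo, rng) == subdivide(0, 1 << 16, 32767, 65536, 65536)
```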

I've little experience with arithmetic coding.  I did study
and implement IBM's Q-coder, coming to the conclusion that its
cleverest features were those which *violated* a rigorous
arithmetic model in the interest of speed.

"Captain Obvious" <udodenko@users.sourceforge.net> might have writ, in
news:4d592448\$0\$23756\$14726298@news.sunsite.dk:

>  [fixed-length arithmetic coders do] some non-trivial rounding which
> makes encoding worse than theoretical

Suppose a probability estimate is 0.399 but the coder rounds this
to 0.400 to keep the arithmetic simple.  That will cost you *very* little
in bitrate.  Remember that most of the tiny amount you lose encoding
a .601 event as .600 will be gained back coding a .399 event as .400.

> ... Due to the fixed-precision arithmetic, we
> can only guarantee that all coding intervals are disjoint if we leave
> small regions between intervals unused for coding.)

Do arithmetic coders have this property?  I don't think
IBM's Q[M]-coder does, and certainly not Mr. Biject's !

> So my question is -- is it possible to formulate this rounding in
> terms of rational numbers while keeping things simple?

Sure: just replace each probability a/b with some other, smaller-denominator
a'/b'.  Using a fixed power of two for b' would be a popular choice.  :-)

James Dow Allen
```
Reply gmail1217 (37) 2/14/2011 3:59:23 PM
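[Editor's note: James's 0.399-vs-0.400 point can be quantified. The
per-symbol loss from coding a binary source of true probability p with a
rounded probability q is the binary relative entropy D(p||q); for this
rounding it comes to a few millionths of a bit per symbol. A quick check,
function name mine:]

```python
from math import log2

def excess_bits(p, q):
    """Per-symbol coding loss, in bits, from using rounded probability q
    in place of the true probability p (binary relative entropy)."""
    return p * log2(p / q) + (1 - p) * log2((1 - p) / (1 - q))

print(excess_bits(0.399, 0.400))   # ~3e-06 bits per symbol
```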

3 Replies
33 Views