
### Predicting the size of a Huffman block?

That's pretty much it: Given the number of occurring symbols and the
length of the file or data to be compressed, how do I estimate the
size of the output?  I have to do this quickly on 8-bit, 16-bit and 32-
bit computers.  I can work on predicting the size of the overhead
myself.
-------------------
Joseph Rose, a.k.a. Harry Potter
Working magic in the computer community

Reply rose.joseph12 (88) 2/8/2012 2:18:13 PM

On 08.02.2012 15:18, Harry Potter wrote:
> That's pretty much it: Given the number of occurring symbols and the
> length of the file or data to be compressed, how do I estimate the
> size of the output?  I have to do this quickly on 8-bit, 16-bit and 32-
> bit computers.  I can work on predicting the size of the overhead
> myself.

What type of information do you have, and which type of answer would you
need? Do you have the Huffman code available? What do you know about the
statistics of the source? Do you need a worst-case or an average-case
answer? Is an estimate sufficient?

The general answer to your question, lacking any further information, is
just "no".

Greetings,
Thomas

Reply thor16 (320) 2/8/2012 3:55:59 PM

On Feb 8, 10:55 am, Thomas Richter <t...@math.tu-berlin.de> wrote:

> The general answer to your question, lacking any further information, is
> just "no".

Given the number of occurring symbols AND the frequency of those
symbols PLUS the length of the file.

I believe it is possible to calculate the size of a STATIC HUFFMAN
table and the expected size of the resulting file.

For non-static you probably do almost as much work estimating the file
size as just doing the compression completely.



On Feb 8, 10:55 am, Thomas Richter <t...@math.tu-berlin.de> wrote:
> What type of information do you have, and which type of answer would you
> need? Do you have the Huffman code available? What do you know about the
> statistics of the source? Do you need a worst-case or an average-case
> answer? Is an estimate sufficient?
>
*  General information--I'll get more specific later
*  I'm only starting the code--I don't have it yet
*  The compressor can scan the source file or parts of it for the
statistics of the source before compression
*  I need the best possible depending on user settings
*  An estimation will do--and is in fact what I want.

> The general answer to your question, lacking any further information, is
> just "no".
>

Reply rose.joseph12 (88) 2/9/2012 3:41:45 PM

On Feb 8, 9:18 pm, Harry Potter <rose.josep...@yahoo.com> wrote:
> Given the number of occurring symbols and the
> length of the file or data to be compressed, how do I estimate the
> size of the output?

Do you have the histogram [H1, H2, ... Hk] of tokens
to be coded?  If so, the output size in bits is
given approximately by
T (0.03 + log T) - sum Hj log Hj
where T = sum Hj, and log is base 2.
0.03 is an estimate of the inefficiency (due to
discreteness) of Huffman coding.

Hope this helps.
James

Reply jdallen2000 (489) 2/9/2012 4:54:20 PM

On 09.02.2012 16:41, Harry Potter wrote:
> On Feb 8, 10:55 am, Thomas Richter<t...@math.tu-berlin.de>  wrote:
>> What type of information do you have, and which type of answer would you
>> need? Do you have the Huffman code available? What do you know about the
>> statistics of the source? Do you need a worst-case or an average-case
>> answer? Is an estimate sufficient?
>>
> *  General information--I'll get more specific later
> *  I'm only starting the code--I don't have it yet
> *  The compressor can scan the source file or parts of it for the
> statistics of the source before compression

Is a two-pass approach acceptable, i.e. determine the relative
frequencies first, design the optimal Huffman code, then encode?

> *  I need the best possible depending on user settings
> *  An estimation will do--and is in fact what I want.

If this is a simple zero-order Huffman coder: set p_i to the relative
frequency of symbol i. ceil(-log_2(p_i)) will give you the number of
bits the Huffman code will assign to the symbol. N \sum_i p_i *
ceil(-log_2(p_i)) will give you the compressed file size in bits, where
N is the number of symbols in the stream. Round up to the next byte,
done. Remember that you somehow also need to signal the Huffman code you
used, i.e. there is an overhead of side-channels not included in the
above computation.

Greetings,
Thomas

Reply thor16 (320) 2/9/2012 4:56:01 PM

On Feb 9, 11:56 am, Thomas Richter <t...@math.tu-berlin.de> wrote:
> Is a two-pass approach acceptable, i.e. determine the relative
> frequencies first, design the optimal Huffman code, then encode?
>
A two-pass approach is my plan.

> If this is a simple zero-order Huffman coder: set p_i to the relative
> frequency of symbol i. ceil(-log_2(p_i)) will give you the number of
> bits the Huffman code will assign to the symbol. N \sum_i p_i *
> ceil(-log_2(p_i)) will give you the compressed file size in bits, where
> N is the number of symbols in the stream. Round up to the next byte,
> done. Remember that you somehow also need to signal the Huffman code you
> used, i.e. there is an overhead of side-channels not included in the
> above computation.
>
I don't understand most of the above computations as I am limited to a
high-school math education.  But logarithms are too complex to do on
an 8-bit computer with any semblance of speed.

Reply rose.joseph12 (88) 2/9/2012 5:09:36 PM

I should've mentioned: the figures I have are the size of the block to
compress, the values that occur and the number of occurrences of each
value.  However, I don't want to use the number of occurrences in the
calculation as it would take too long.  And as for the overhead, I
don't want to reveal how I'm going to do compression, so I'll have to
figure it out myself.  Sorry about the rudeness--it wasn't intended.  :(

On Feb 9, 12:09 pm, Harry Potter <rose.josep...@yahoo.com> wrote:
> On Feb 9, 11:56 am, Thomas Richter <t...@math.tu-berlin.de> wrote:
> > Is a two-pass approach acceptable, i.e. determine the relative
> > frequencies first, design the optimal Huffman code, then encode?
>
> A two-pass approach is my plan.
>
> > If this is a simple zero-order Huffman coder: set p_i to the relative
> > frequency of symbol i. ceil(-log_2(p_i)) will give you the number of
> > bits the Huffman code will assign to the symbol. N \sum_i p_i *
> > ceil(-log_2(p_i)) will give you the compressed file size in bits, where
> > N is the number of symbols in the stream. Round up to the next byte,
> > done. Remember that you somehow also need to signal the Huffman code you
> > used, i.e. there is an overhead of side-channels not included in the
> > above computation.
>
> I don't understand most of the above computations as I am limited to a
> high-school math education.  But logarithms are too complex to do on
> an 8-bit computer with any semblance of speed.


Reply rose.joseph12 (88) 2/9/2012 5:25:07 PM

On 09.02.2012 18:09, Harry Potter wrote:

> I don't understand most of the above computations as I am limited to a
> high-school math education.  But logarithms are too complex to do on
> an 8-bit computer with any semblance of speed.

Point 1) No, taking the log to the base of two is not a complex
operation. It is a matter of shifting and counting the shifts and is
trivial.

Point 2) Sorry, folks, but without mathematics no compression. The
problem can be solved by getting an education, and only by that. If
that's not your intention - deliver Pizza instead. Sorry to sound so
strict, but it is *really* that simple. As always, it is up to you to
fix it.


Reply thor16 (320) 2/9/2012 7:45:11 PM

On Feb 9, 2:45 pm, Thomas Richter <t...@math.tu-berlin.de> wrote:
> Point 1) No, taking the log to the base of two is not a complex
> operation. It is a matter of shifting and counting the shifts and is
> trivial.
>
I can do that.

> Point 2) Sorry, folks, but without mathematics no compression. The
> problem can be solved by getting an education, and only by that. If
> that's not your intention - deliver Pizza instead. Sorry to sound so
> strict, but it is *really* that simple. As always, it is up to you to
> fix it.

You're right.  I think I have some ideas, though.

Reply rose.joseph12 (88) 2/10/2012 2:03:15 PM

On Feb 9, 1:45 pm, Thomas Richter <t...@math.tu-berlin.de> wrote:
>
> Point 2) Sorry, folks, but without mathematics no compression. The
> problem can be solved by getting an education, and only by that. If
> that's not your intention - deliver Pizza instead. Sorry to sound so
> strict, but it is *really* that simple. As always, it is up to you to
> fix it.

For order-0/statistical and complex methods, absolutely.  But I wrote
my own LZ77 compressor long before I understood what those were.  My
math is also limited, but I've tried to make up for it with excessive
research.

I have entertained the idea of going back to college (i.e. night school
or community college) to complete calculus and trigonometry.  I have
been using many methods and formulas in my hobby work for over a
decade without fully understanding them.  The proverbial "light bulb"
has not lit up yet.

Reply MobyGamer301 (139) 2/10/2012 3:06:36 PM
