COMPGROUPS.NET | Search | Post Question | Groups | Stream | About | Register

### Predicting the size of a Huffman block?

• Email
• Follow

That's pretty much it: Given the number of occuring symbols and the
length of the file or data to be compressed, how do I estimate the
size of the output?  I have to do this quickly on 8-bit, 16-bit and 32-
bit computers.  I can work on predicting the size of the overhead
myself.
-------------------
Joseph Rose, a.k.a. Harry Potter
Working magic in the computer community

 0
Reply rose.joseph12 (88) 2/8/2012 2:18:13 PM

See related articles to this posting

On 08.02.2012 15:18, Harry Potter wrote:
> That's pretty much it: Given the number of occuring symbols and the
> length of the file or data to be compressed, how do I estimate the
> size of the output?  I have to do this quickly on 8-bit, 16-bit and 32-
> bit computers.  I can work on predicting the size of the overhead
> myself.

What type of information do you have, and which type of answer would you
need? Do you have the Huffman code available? What do you know on the
statistics of the source? Do you need a worst-case or an average-case

just "no".

Greetings,
Thomas

 0
Reply thor16 (336) 2/8/2012 3:55:59 PM

On Feb 8, 10:55=A0am, Thomas Richter <t...@math.tu-berlin.de> wrote:

> The general answer to your question, lacking any further information, is
> just "no".

Given the number of occuring symbols AND frequency of those symbols
PLUS length of the file.

I believe it is possible to calculate the size of a STATIC HUFFMAN
table and the expected size of the resulting file.

For non-static you probably do almost as much work estimating the file
size as just doing the compression completely.


 0

On Feb 8, 10:55=A0am, Thomas Richter <t...@math.tu-berlin.de> wrote:
> What type of information do you have, and which type of answer would you
> need? Do you have the Huffman code available? What do you know on the
> statistics of the source? Do you need a worst-case or an average-case
> answer? Is an estimate sufficient?
>
*  General information--I'll do more specific later
*  I'm only starting the code--I don't have it yet
*  The compressor can scan the source file or parts of it for the
statistics of the source befoe compression
*  I need the best possible depending on user settings
*  An estimation will do--and is in fact what I want.

> The general answer to your question, lacking any further information, is
> just "no".
>

 0
Reply rose.joseph12 (88) 2/9/2012 3:41:45 PM

On Feb 8, 9:18=A0pm, Harry Potter <rose.josep...@yahoo.com> wrote:
> Given the number of occuring symbols and the
> length of the file or data to be compressed, how do I estimate the
> size of the output?

Do you have the histogram [H1, H2, ... Hk] of tokens
to be coded?  If so, the output size in bits is
given approximately by
T (0.03 + log T) - sum Hj log Hj
where T =3D sum Hj, and log is base 2.
0.03 is an estimate of the inefficiency (due to
discreteness) of Huffman coding.

Hope this helps.
James

 0
Reply jdallen2000 (495) 2/9/2012 4:54:20 PM

On 09.02.2012 16:41, Harry Potter wrote:
> On Feb 8, 10:55 am, Thomas Richter<t...@math.tu-berlin.de>  wrote:
>> What type of information do you have, and which type of answer would you
>> need? Do you have the Huffman code available? What do you know on the
>> statistics of the source? Do you need a worst-case or an average-case
>> answer? Is an estimate sufficient?
>>
> *  General information--I'll do more specific later
> *  I'm only starting the code--I don't have it yet
> *  The compressor can scan the source file or parts of it for the
> statistics of the source befoe compression

Is a two-pass approach acceptable, i.e. determine the relative
frequencies first, design the optimal Huffman code, then encode?

> *  I need the best possible depending on user settings
> *  An estimation will do--and is in fact what I want.

If this is simple zero-order Huffman coder: Set p_i to the relative
frequency of symbol i. ceil(-log_2(p_i)) will give you the number of
bits the Huffman code will assign to the symbol. N \sum_i p_i *
ceil(-log_2(p_i)) will give you the compressed file size in bits, where
N is the number of symbols in the stream. Round up to the next byte,
done. Remember that you somehow also need to signal the Huffman code you
used, i.e. there is an overhead of side-channels not included in the
above computation.

Greetings,
Thomas

 0
Reply thor16 (336) 2/9/2012 4:56:01 PM

On Feb 9, 11:56=A0am, Thomas Richter <t...@math.tu-berlin.de> wrote:
> Is a two-pass approach acceptable, i.e. determine the relative
> frequencies first, design the optimal Huffman code, then encode?
>
A two-pass approach is my plan.

> If this is simple zero-order Huffman coder: Set p_i to the relative
> frequency of symbol i. ceil(-log_2(p_i)) will give you the number of
> bits the Huffman code will assign to the symbol. N \sum_i p_i *
> ceil(-log_2(p_i)) will give you the compressed file size in bits, where
> N is the number of symbols in the stream. Round up to the next byte,
> done. Remember that you somehow also need to signal the Huffman code you
> used, i.e. there is an overhead of side-channels not included in the
> above computation.
>
I don't understand most of the above computations as I am limited to a
high-school math education.  But logarithms are too complex to do on
an 8-bit computer with any semblance of speed.

 0
Reply rose.joseph12 (88) 2/9/2012 5:09:36 PM

I should've mentioned: the figures I have are the size of the block to
compress, the values that occur and the number of occurences for each
value.  However, I don't want to use the number of occurences in the
calculation as it would take too long.  And as for the overhead, I
don't want to reveal how I'm going to do compression, so I'll have to
figure it out myself.  Sorry about the rudeness--it wasn't intended.  :
(

On Feb 9, 12:09=A0pm, Harry Potter <rose.josep...@yahoo.com> wrote:
> On Feb 9, 11:56=A0am, Thomas Richter <t...@math.tu-berlin.de> wrote:> Is =
a two-pass approach acceptable, i.e. determine the relative
> > frequencies first, design the optimal Huffman code, then encode?
>
> A two-pass approach is my plan.
>
> > If this is simple zero-order Huffman coder: Set p_i to the relative
> > frequency of symbol i. ceil(-log_2(p_i)) will give you the number of
> > bits the Huffman code will assign to the symbol. N \sum_i p_i *
> > ceil(-log_2(p_i)) will give you the compressed file size in bits, where
> > N is the number of symbols in the stream. Round up to the next byte,
> > done. Remember that you somehow also need to signal the Huffman code yo=
u
> > used, i.e. there is an overhead of side-channels not included in the
> > above computation.
>
> I don't understand most of the above computations as I am limited to a
> high-school math education. =A0But logarithms are too complex to do on
> an 8-bit computer with any semblance of speed.


 0
Reply rose.joseph12 (88) 2/9/2012 5:25:07 PM

On 09.02.2012 18:09, Harry Potter wrote:

> I don't understand most of the above computations as I am limited to a
> high-school math education.  But logarithms are too complex to do on
> an 8-bit computer with any semblance of speed.

Point 1) No, taking the log to the base of two is not a complex
operation. It is a matter of shifting and counting the shifts and is
trivial.

Point 2) Sorry, folks, but without mathematics no compression. The
problem can be solved by getting an education, and only by that. If
that's not your intention - deliver Pizza instead. Sorry to sound so
strict, but it is *really* that simple. As always, it is up to you to
fix it.


 0
Reply thor16 (336) 2/9/2012 7:45:11 PM

On Feb 9, 2:45=A0pm, Thomas Richter <t...@math.tu-berlin.de> wrote:
> Point 1) No, taking the log to the base of two is not a complex
> operation. It is a matter of shifting and counting the shifts and is
> trivial.
>
I can do that.

> Point 2) Sorry, folks, but without mathematics no compression. The
> problem can be solved by getting an education, and only by that. If
> that's not your intention - deliver Pizza instead. Sorry to sound so
> strict, but it is *really* that simple. As always, it is up to you to
> fix it.

You're right.  I think I have some ideas, though.

 0
Reply rose.joseph12 (88) 2/10/2012 2:03:15 PM

On Feb 9, 1:45=A0pm, Thomas Richter <t...@math.tu-berlin.de> wrote:
>
> Point 2) Sorry, folks, but without mathematics no compression. The
> problem can be solved by getting an education, and only by that. If
> that's not your intention - deliver Pizza instead. Sorry to sound so
> strict, but it is *really* that simple. As always, it is up to you to
> fix it.

For order-0/statistical and complex methods, absolutely.  But I wrote
my own LZ77 compressor long before I understood what those were.  My
math is also limited, but I've tried to make up for it with excessive
research.

I have entertained the idea of going back to college (ie. night school
or community college) to complete calculus and trigonometry.  I have
been using many methods and formulas in my hobby work for over a
decade without fully understanding them.  The proverbial "light bulb"
has not yet lit up yet.

 0
Reply MobyGamer301 (143) 2/10/2012 3:06:36 PM

10 Replies
64 Views

Similar Articles

12/12/2013 5:38:00 AM
[PageSpeed]

Similar Artilces:

Db block size
Buongiorno Ho un server con 16 GB ram , 2 Opteron su cui =E8 installato un SUSE 9 ES a64 bit e Oracle 9.2.0.6 a 64 bit in fase di installazione stavo "sintonizzando " i vari parametri , tra cui il parametro db_block size, sapete indicarmi , visto l'hardaware che ho disponibile , e l'utilizzo che ne divrei fare( data warehouse) qual'=E8 la dimensione ottimale di questo parametro , dato che da come ho letto =E8 molto importante in termini di performance. Geazie mille Claudio wrote: > Buongiorno > Ho un server con 16 GB ram , 2 Opteron su cui =E8 installato un SU...

Why the difference in block size after copy?
Hi, I am curious as to why the total number of blocks differ between the source file and the copied file? I copied a file and put it on the same disk as the source file. Sizes are same in terms of bytes, but not in blocks. Is this normal? John In article <1104942512.578701.281920@z14g2000cwz.googlegroups.com>, John wrote: > I am curious as to why the total number of blocks differ between the > source file and the copied file? I copied a file and put it on the same > disk as the source file. Sizes are same in terms of bytes, but not in > blocks. Is this normal? Hole dete...

size of block device by ftell()
Hi all, I want to get the size of a block device by ftell(). I found that I can get the size of a device by seek() and tell() in Python. But not in C. What is difference between them? How can I get the size of a block device by ftell()? # whoami root # du -sh /etc/services 364K /etc/services # df -h | grep hdb1 /dev/hdb1 111G 2.0G 103G 2% /mnt/data1 --------------------------------------------------------------------------- # cat ftell_test.c #include <stdio.h> long int ftell_test(const char *d); int main(void) { printf("%ld\n", ftell_test(&q...

Change block device buffer size
Hi All, Currently the buffer size of block device in Kernel2.4 is 4096 bytes. Can I change it to 1024 bytes? If so, which header file shall I change? Thanks for your help, Mike ...

send function blocks on large buffer sizes
Hello, I am working on Solaris 8 SPARC, I am using send / receive for IPC in my machine (send data from server process to client process). The client process only starts reading after the server process is done writing (I know that this is not a good design and I intend to change it but currently I would like to understand exactly what is going on with my current code). When trying to send large block sizes (more then 160K) the send command blocks and since my client process will not start reading until the server is done writing I have a deadlock. I would like to understand what it the limi...

Hi all, I am connecting an Embedded Matlab Function block to a Display block. My embedded Matlab Function block will output a STRING word (such as Low, Moderate, High) and displayed on display block. but i failed to compile it due to below error: Size mismatch (size [1 x 3] ~= size [1 x 4]). The size to the left is the size of the left-hand side of the assignment. May i know this message is indicating which block that having mismatched size and how can i change the size? Thank you. ...

Variable block size image segmentation/compression
Hi guys, I was just wondering if some of you here are familiar with image block size partitioning? Right now I have a simple vector quantization codec that takes aligned blocks of the same size from the image. This restriction limits considerably the quality of the compressed image, so I'd like to implement an adaptative block splitting technique instead. I've started by testing out a quadtree partitioning method, based on the variance of each block, but to me the splitting is still unnatural, it doesn't fully adapt the image's features. The second test I did was to use a r...

Reed-Solomon code block size choice
Hi all, In designing a reed-solomon code, one has to choose the appropriat code size. If the constraint is computational complexity and delay, is i better to use a large block size, or use several numbers of small blocks In other words, for RS(n,k), is it computationally more efficient to hav n=255, or divide data into 17 blocks with each one has n=15? Here w assume other factors like error correcting capability doesn't matter. Thanks a lot. -Min "mguo" <mguo@omnexcontrols.com> asked in message news:CPWdnf7-Z5cY9_7ZnZ2dnUVZ_tudnZ2d@giganews.com... > Hi all, In desig...

Block size of CMS Signed Data using Bouncycastle
I need to put a large message into a CMS "Enveloped Data" structure. Bouncycastle puts the signed content into a unique big octet string. IAIK instead splits the content into many little octect stings and the method setBlockSize() allows you to set the size of these blocks. Anyone knows how to do the same thing using Bouncycastle? Thanks Regards Giovanni ...

Lotus Notes 8 Calendar -- Saturday/Sunday Block Size
Just converted to Notes 8, and am flailing about, trying to get used to everything being different from the program I used previously (at the risk of starting a holy war of some sort, I shall not mention the "O" word). I am one of those unlucky souls who, on occasion, has to schedule work for a Saturday or Sunday. In the default "One Week" and "One Month" calendar views, the blocks/columns for Saturday and Sunday are smaller and narrower than the rest of the week. I have also found that changing the view to begin with Sunday then causes the Friday and S...

GT.M, block size, global buffers & TRANS2BIG
Hi, I do not quite understand the relationship between the above. The common practice in case of TRANS2BIG is to increase number of global buffers (when the application logic cannot be changed). That's OK. But the GT.M documentation also says that greater block size could negatively influence transaction processing (restarts etc.). At this point I'm getting lost. To me, greater block size basically means that smaller number of blocks gets 'dirty' when an update is performed, so the cache utilization is lower (in percentage, not in absolute memory size). So for greater ...

Inferring block ram in Spartan II with non standard bus sizes
Hello. Im relatively new to VHDL and im doing a small project in which i've to store the co-efficients of the FIR filter into the block ram at system startup which are supposed to be used later by the filter block for implementing the MAC operation needed for the filter. I am using spartan II by xilinx. First of all, what i'd like to know is that, what does the phrase 'infer from the code' imply in the context of memory instantiation in vhdl or any HD. I used to think that there are dedicated hardware memory blocks on the FPGA and we need to address the port pins as in done in ...

Inferring block ram in Spartan II with non standard bus sizes #2
Hello. Im relatively new to VHDL and im doing a small project in which i've to store the co-efficients of the FIR filter into the block ram at system startup which are supposed to be used later by the filter block for implementing the MAC operation needed for the filter. I am using spartan II by xilinx. First of all, what i'd like to know is that, what does the phrase 'infer from the code' imply in the context of memory instantiation in vhdl or any HD. I used to think that there are dedicated hardware memory blocks on the FPGA and we need to address the port pins as in done in ...

block
Hi, Why do people use the block command in VHDL? For example: state_machine : block etc etc I've done a search in google but can find no answers. I was wondering what affect it has on the code. Thanks, "Steve" <Steve@nospam.com> wrote in message news:<c516s3$96f$1@newsg1.svr.pol.co.uk>... > Why do people use the block command in VHDL? For example: > > state_machine : block > etc > etc It would allow you to limit the scope of signals and other declarations that would otherwise span the architecture. I find it less trouble to use a variable ...

Code size as a function of source size
Hi, I would like to pose a very simple question, for which nevertheless I could not find an answer through google: Of what general type is the dependence of the size of compiled code to the size of the respective source code? Or to put it "mathematically", what function f(s) approximates the points (s, o(s)) best, where s is the size of the source code (say in bytes and normalized by removing commentaries and not using uncommonly long identifiers or amounts of white space) and o(s) is the size of the corresponding object code (without any libraries linked in). It seems natu...

sum of segment size and executable size
Hi, why I am getting different size value if i run "size" command and if i see using "ls -l" on an object file. my.c file contains only one empty function my.c f() { } The Output of size command on HPUX $size my.o 8 + 0 + 0 = 8 ( here 8 is text segment size, rest is 0 )$ ls -l my.o -rw-rw-r-- 1 odcqa1 users 696 Dec 30 11:41 my.o Can any one explain me? Why the size of my.o is larger than total of it's segment's size. -Sachin sachin_mzn@yahoo.com said the following, on 12/30/04 01:22: > Hi, > why I am getting different size value if i r...

Find corresponding From blocks in a Goto Block
Hello, is there a fast way to determine if a Goto block has corresponding From blocks? I mean a way with simulink commands. I must find all Goto blocks without an corresponding From block, so i can remove it. My current solution is a brute force method that compares the TAG infos from all Goto and From blocks. regards Frank If you have Embedded Coder then you can do this: modelObject = get_param(modelName, 'UDDObject') feature('EngineInterface', 1) modelObject.getCompiledBlockList feature('EngineInterface', 0) this will let you know which blocks are not needed in...

how does size get passed into new( size ) ?
Q how does size get passed into new( size ) ....when we are creating object ? class Base { public: static void * operator new(size_t size); ... }; Base *p = new Base( ); // calls Base::operator new! Thanks pallav singh Pallav singh wrote: > Q how does size get passed into new( size ) ....when we are creating > object ? How does any argument get passed into a function? Sometimes it's in the stack, sometimes it's in a register. The language Standard does not mandate any particular way of passing the arguments. > class Base { > public: > static void ...

panel size vs. frame size
Hi, i'm writing a gui that has a main panel and includes 3 subpanels.. and has a Menu Bar.. i'm setting size of Frame with GraphicsEnvironment.getLocalGraphicsEnvironment().getMaximumWindowBounds() function. My screen Size is 1280 * 800 and because of Start button, it takes 1280 * 770.. its ok so far. you know the JFrame has a title bar. and suppose that i add a MenuBar to that JFrame. how can i measure the best size of main Panel and its bounds. i'm using null layout in main Panel and force the subpanels to main Panel but the X and Y values are not true. i think it shifts to d...

Prediction
I need to predict the future value of a signal. I dont know anything abou the signal other than it looks like bandpass filtered white noise. I nee to predict the value say 10 samples ahead (I guess this is called ten-step predictor???). Does anyone has a good suggestion on how to d this? Linear prediction? Adaptive filtering or system identificatio techniques? sorenbirk wrote: > I need to predict the future value of a signal. I dont know anything about > the signal other than it looks like bandpass filtered white noise. If you don't know anything about the signal, then you can...