Adobe zlib compression aggressiveness



Hi,
In a PDF file, for the objects that use the FlateDecode filter, zlib
compression is used.  As far as I know, they use the actual zlib
library itself.

However, they must have some special tuning going on, since I can
never create a deflate stream that is as small or perhaps even smaller
than the Adobe generated deflate stream.

I am using zlib's maximum compression settings (well, the maximum
level, a memory level of 8, and a window size of 15 bits).

I don't see them using any dictionaries either, so I don't think that's
the issue.

My question is, what other parameters could they have tweaked of zlib
in order to get such an efficient stream?
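
For reference, the settings described above correspond to the parameters of zlib's deflateInit2(). Below is a minimal sketch in Python, whose zlib module wraps the same library, that compares output sizes across memory levels and strategies. The helper names `deflate` and `sweep` are made up for illustration, and nothing here is known to be what Adobe does. Note that zlib accepts memory levels up to 9; 8 is the default.

```python
import zlib


def deflate(data, level=9, wbits=15, mem_level=8,
            strategy=zlib.Z_DEFAULT_STRATEGY):
    """Compress data into a zlib stream with explicit deflateInit2-style settings."""
    c = zlib.compressobj(level, zlib.DEFLATED, wbits, mem_level, strategy)
    return c.compress(data) + c.flush()


def sweep(data):
    """Return {(mem_level, strategy_name): compressed_size} at level 9, 15-bit window."""
    strategies = {
        "default": zlib.Z_DEFAULT_STRATEGY,
        "filtered": zlib.Z_FILTERED,
        "huffman_only": zlib.Z_HUFFMAN_ONLY,
    }
    sizes = {}
    for mem_level in (8, 9):
        for name, strategy in strategies.items():
            stream = deflate(data, 9, 15, mem_level, strategy)
            sizes[(mem_level, name)] = len(stream)
    return sizes


if __name__ == "__main__":
    # Hypothetical sample: a repetitive PDF-like content stream.
    sample = b"BT /F1 12 Tf 72 712 Td (Hello, PDF content stream) Tj ET\n" * 200
    for key, size in sorted(sweep(sample).items(), key=lambda kv: kv[1]):
        print(key, size)
```

Which combination wins depends on the data, so it is worth running the sweep on the actual streams extracted from the PDF rather than on a synthetic sample.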

B

Reply byaarov (33) 4/5/2007 5:17:47 PM

<byaarov@yahoo.com> wrote in message 
news:1175793467.037145.322610@p77g2000hsh.googlegroups.com...
> Hi,
> In a PDF file, for the objects that use the FlateDecode filter, zlib
> compression is used.  As far as I know, they use the actual zlib
> library itself.
>
> However, they must have some special tuning going on, since I can
> never create a deflate stream that is as small or perhaps even smaller
> than the Adobe generated deflate stream.
>
> I am using zlib's maximum compression settings (well, the maximum
> level, a memory level of 8, and a window size of 15 bits).
>
> I don't see them using any dictionaries either, so I don't think that's
> the issue.
>
> My question is, what other parameters could they have tweaked of zlib
> in order to get such an efficient stream?
>

if they are in fact using zlib, they may have customized the string 
matcher to better suit the kinds of data present.


for example, take png:
IME, my deflater, when tuned for png (speed/ratio), typically performs worse 
on typical data (text and binary data);
when tuned for ordinary data, its speed and ratio on png leave a little to 
be desired.

now, it is possible that my algo just isn't tuned all that well in general, 
but at times I have considered having a customized version for png encoding 
(likely more emphasizing speed than ratio though, and probably based on a 
modified earlier version of my encoder).


dunno, this is only a simple example, who knows what adobe might have done.

> B
> 


Reply cr88192 4/5/2007 8:33:26 PM


On Apr 5, 10:17 am, byaa...@yahoo.com wrote:
> My question is, what other parameters could they have tweaked of zlib
> in order to get such an efficient stream?

They could be using the deflateTune() function, or equivalently they
may have modified the compression levels table that sets four
parameters to some value for each compression level.  Alternatively,
they could have rewritten part of deflate to do string matching
better, or pick better matches out of those found.  Or they may simply
be using a heuristic to decide when to flush deflate blocks, which can
also provide better compression.
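
To illustrate only the last idea (where deflate blocks end): as far as I know, Python's zlib module does not expose deflateTune(), but it does let the caller end the current block with a flush. The sketch below uses a hypothetical helper, `deflate_with_boundaries`, that flushes between caller-chosen chunks. A flush also emits a small empty stored block, so this can just as easily make the output larger; a real heuristic inside the compressor would not pay that cost.

```python
import zlib


def deflate_with_boundaries(chunks, flush_mode=zlib.Z_SYNC_FLUSH):
    """Compress chunks into one zlib stream, ending the deflate block after each chunk.

    Z_SYNC_FLUSH keeps the match history across the boundary;
    Z_FULL_FLUSH also resets it, which usually costs compression.
    """
    c = zlib.compressobj(9)
    out = []
    for i, chunk in enumerate(chunks):
        out.append(c.compress(chunk))
        if i < len(chunks) - 1:
            # Force a block boundary between chunks (not after the last one).
            out.append(c.flush(flush_mode))
    out.append(c.flush())  # Z_FINISH: terminate the stream
    return b"".join(out)


if __name__ == "__main__":
    # Hypothetical input: two regions with very different statistics.
    parts = [b"The quick brown fox. " * 50, bytes(range(256)) * 8]
    one_shot = zlib.compress(b"".join(parts), 9)
    split = deflate_with_boundaries(parts)
    print(len(one_shot), len(split))
```

Any standard inflater reads the result unchanged, since block boundaries are part of the ordinary deflate format.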

Mark

Reply Mark 4/5/2007 8:43:57 PM

On Apr 5, 1:43 pm, "Mark Adler" <mad...@alumni.caltech.edu> wrote:
> On Apr 5, 10:17 am, byaa...@yahoo.com wrote:
>
> > My question is, what other parameters could they have tweaked of zlib
> > in order to get such an efficient stream?
>
> They could be using the deflateTune() function, or equivalently they
> may have modified the compression levels table that sets four
> parameters to some value for each compression level.  Alternatively,
> they could have rewritten part of deflate to do string matching
> better, or pick better matches out of those found.  Or they may simply
> be using a heuristic to decide when to flush deflate blocks, which can
> also provide better compression.
>
> Mark

I read chapter 3 of the PDF specification, and it describes, to some
extent, predictors they use for images and text.  For images, they say
they use PNG predictors, and for text, they have some way of seeding
the Huffman tables.  I understand the impact of that algorithmically,
but in zlib, how does one provide predictor functions?  Is this done
via deflateTune()?

Also, any custom predictor functions are only used during compression,
right?  Any decompression routine, even without that prediction logic,
should be able to inflate the stream, I suppose?

B

Reply byaarov 4/6/2007 7:41:47 PM

<byaarov@yahoo.com> wrote in message 
news:1175888506.962950.226600@p77g2000hsh.googlegroups.com...
> On Apr 5, 1:43 pm, "Mark Adler" <mad...@alumni.caltech.edu> wrote:
>> On Apr 5, 10:17 am, byaa...@yahoo.com wrote:
>>
>> > My question is, what other parameters could they have tweaked of zlib
>> > in order to get such an efficient stream?
>>
>> They could be using the deflateTune() function, or equivalently they
>> may have modified the compression levels table that sets four
>> parameters to some value for each compression level.  Alternatively,
>> they could have rewritten part of deflate to do string matching
>> better, or pick better matches out of those found.  Or they may simply
>> be using a heuristic to decide when to flush deflate blocks, which can
>> also provide better compression.
>>
>> Mark
>
> I read chapter 3 of the PDF specification, and it describes, to some
> extent, predictors they use for images and text.  For images, they say
> they use PNG predictors, and for text, they have some way of seeding
> the Huffman tables.  I understand the impact of that algorithmically,
> but in zlib, how does one provide predictor functions?  Is this done
> via deflateTune()?
>

what makes you so certain that they used zlib to begin with?...

after all, deflate is simple enough, and common enough, that they may well 
have just implemented a custom compressor (even if they have a 'zlib 
header', that says close to nothing...).


> Also, any custom predictor functions are only used during compression,
> right?  Any decompression routine, even without that prediction logic,
> should be able to inflate the stream, I suppose?
>

potentially, but a lot depends on what kind of predictor.
in PNG, for images, the predictors are applied prior to deflating the image, 
and reversed after inflating it (and are thus not part of the mechanics of 
deflate, but a separate stage).
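
A minimal sketch of that separate stage, using the PNG "Up" filter (filter type 2), where each byte is replaced by its difference from the byte directly above it. The function names and the `row_bytes` parameter are made up for illustration; a real encoder would typically choose a filter per row, and whether filtering helps at all depends on the image.

```python
import zlib


def png_up_encode(raw, row_bytes):
    """Apply PNG filter type 2 (Up) to every row; each row gets a filter-type byte."""
    if row_bytes <= 0 or len(raw) % row_bytes:
        raise ValueError("raw length must be a multiple of row_bytes")
    prev = bytes(row_bytes)  # the row above the first row counts as all zeros
    out = bytearray()
    for i in range(0, len(raw), row_bytes):
        row = raw[i:i + row_bytes]
        out.append(2)  # filter type: Up
        out.extend((cur - up) & 0xFF for cur, up in zip(row, prev))
        prev = row
    return bytes(out)


def png_up_decode(filtered, row_bytes):
    """Undo png_up_encode; this has to run after inflate on the reading side."""
    stride = row_bytes + 1
    if row_bytes <= 0 or len(filtered) % stride:
        raise ValueError("filtered length must be a multiple of row_bytes + 1")
    prev = bytes(row_bytes)
    out = bytearray()
    for i in range(0, len(filtered), stride):
        if filtered[i] != 2:
            raise ValueError("this sketch only handles the Up filter")
        deltas = filtered[i + 1:i + stride]
        row = bytes((d + up) & 0xFF for d, up in zip(deltas, prev))
        out.extend(row)
        prev = row
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical 16-byte-wide, 10-row grayscale image.
    width = 16
    image = bytes(((x * 7 + r * 3) & 0xFF) for r in range(10) for x in range(width))
    stream = zlib.compress(png_up_encode(image, width), 9)
    restored = png_up_decode(zlib.decompress(stream), width)
    print(restored == image, len(stream), len(zlib.compress(image, 9)))
```

The deflate stream itself is ordinary, so any inflater can decode it, but the output is the filtered data until the reader also reverses the predictor.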

as for custom Huffman tables, ..., yes, these will not affect a decoder, but 
can help with encoding.


> B
> 


Reply cr88192 4/7/2007 12:13:56 AM

On Apr 6, 12:41 pm, byaa...@yahoo.com wrote:
> I understand the impact of that algorithmically,
> but in zlib, how does one provide predictor functions?

The only ways that come to mind are to a) provide a dictionary, or b)
pre-process the input, e.g. tokenizing words, to help zlib find
matches at the next level of structure in the data.

> Also, any custom predictor functions are only used during compression,
> right?  Any decompression routine, even without that prediction logic,
> should be able to inflate the stream, I suppose?

They would need to be used during decompression as well to either undo
the processing or to provide the same dictionary at the other end.
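
A small sketch of the dictionary case, using the `zdict` argument of Python's zlib module (which corresponds to deflateSetDictionary() and inflateSetDictionary() in the C library). The helper names and the sample dictionary are made up for illustration.

```python
import zlib


def deflate_with_dict(data, zdict):
    """Compress data with a preset dictionary primed into the match window."""
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(data) + c.flush()


def inflate_with_dict(stream, zdict):
    """Decompress a stream that was produced with the same preset dictionary."""
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(stream) + d.flush()


if __name__ == "__main__":
    # Hypothetical dictionary of byte strings expected to recur in the data.
    shared = b"BT /F1 12 Tf 72 712 Td () Tj ET"
    message = b"BT /F1 12 Tf 72 712 Td (Hello) Tj ET"

    stream = deflate_with_dict(message, shared)
    print(inflate_with_dict(stream, shared) == message)

    try:
        zlib.decompress(stream)  # no dictionary supplied
    except zlib.error as exc:
        print("plain inflate failed:", exc)
```

The stream header records that a dictionary is required, so a reader that does not have the same dictionary cannot inflate it.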

Mark

Reply Mark 4/7/2007 4:52:43 AM

On Apr 6, 5:13 pm, "cr88192" <cr88...@NOSPAM.hotmail.com> wrote:
> what makes you so certain that they used zlib to begin with?...

I know they were using zlib a few years ago, but I don't know for sure
whether they still are.  However the same comments apply to the
possible differences between zlib's deflate and their possibly home-
grown deflate.

Mark

Reply Mark 4/7/2007 4:54:43 AM

"Mark Adler" <madler@alumni.caltech.edu> wrote in message 
news:1175921683.270589.123260@d57g2000hsg.googlegroups.com...
> On Apr 6, 5:13 pm, "cr88192" <cr88...@NOSPAM.hotmail.com> wrote:
>> what makes you so certain that they used zlib to begin with?...
>
> I know they were using zlib a few years ago, but I don't know for sure
> whether they still are.  However the same comments apply to the
> possible differences between zlib's deflate and their possibly home-
> grown deflate.
>

yes, that is a good enough answer...

for all I knew, though, the OP didn't know and was just assuming. in my 
case, I didn't know one way or another.


> Mark
> 


Reply cr88192 4/7/2007 7:05:54 AM
