Compression should have been disabled

Regarding compression in the SAS online doc, SAS will disable
compression if "...not possible for the compressed file to be smaller
than an uncompressed one."  However, I've experienced a number of
occasions where the compressed file was about 50% larger than the
uncompressed version.  Just wondering why SAS did not disable
compression in these cases.  The log below shows such an example:

/******************************************************/
379      data lib._history;
380          set allsort;
381      run;

NOTE: There were 180248070 observations read from the data set
WORK.ALLSORT.
NOTE: The data set LIB._HISTORY has 180248070 observations and 8
variables.
NOTE: Compressing data set LIB._HISTORY increased size by 54.12
percent.
Compressed is 1097986 pages; un-compressed would require 712444
pages.
NOTE: DATA statement used (Total process time):
real time           18:43.88
user cpu time       9:29.72

/******************************************************/


Thanks,
Dave

2/2/2005 6:22:27 PM
comp.soft-sys.sas

Hi Dave -

I think the key is "not possible".  In SAS's view your file was potentially
compressible, but in actuality wasn't.  Whatever algorithm they use didn't
detect your scenario.

My experience is that wide files, especially those with large alpha
variables, compress the best.  Narrow files such as yours with 8 variables,
especially if the variables are numeric or narrow character, don't do as
well.

You would do well to test the results with a subset of your file (10,000
observations instead of 180 million) before bothering to compress the whole
shebang.  Look into "obs=", either as a system option or as a data set option.
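
A minimal sketch of that kind of trial run, assuming the WORK.ALLSORT data
set from Dave's log. The output name work.comp_test is made up for
illustration and is not from the thread.

```sas
/* Sketch only: trial compression on the first 10,000 observations. */
/* work.comp_test is a hypothetical scratch data set name.          */
data work.comp_test (compress=yes);
    set work.allsort (obs=10000);
run;
/* The log NOTE for this step should report whether compressing     */
/* increased or decreased the size, as in Dave's log above.         */
```

Keep in mind that the first 10,000 observations of a sorted file may not be
representative of the whole, so the result is a rough guide rather than a
guarantee.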

Regards

Paul Choate
DDS Data Extraction
(916) 654-2160

pchoate (2538)
2/2/2005 6:58:21 PM
Dave,

I think you misread the online docs.  They state the following:

"When a request is made to compress a data set, SAS attempts to determine if
compression will increase the size of the file. SAS reads the header portion
of the file and examines the types and lengths of the variables. If, due to
the number and type of the variables, it is not possible for the compressed
file to be at least 12 bytes per observation smaller than an uncompressed
version, compression is disabled and a message is written to the SAS log."

Notice it says "if ... it is not possible" for the resulting file to be at
least twelve bytes per observation smaller than the original.  In your case
it was more than likely possible, if one simply went off the header info.
(I am not sure what exactly SAS looks at in the header info.)



Toby Dunn




tobydunn (6018)
2/2/2005 7:10:54 PM
DB01 <dave.boylan@GMAIL.COM> wrote:
> Regarding compression in the SAS online doc, SAS will disable
> compression if "...not possible for the compressed file to be smaller
> than an uncompressed one."  However, I've experienced a number of
> occasions where the compressed file was about 50% larger than the
> uncompressed version.  Just wondering why SAS did not disable
> compression in these cases.  The log below shows such an example:
>
> /******************************************************/
> 379      data lib._history;
> 380          set allsort;
> 381      run;
>
> NOTE: There were 180248070 observations read from the data set
> WORK.ALLSORT.
> NOTE: The data set LIB._HISTORY has 180248070 observations and 8
> variables.
> NOTE: Compressing data set LIB._HISTORY increased size by 54.12
> percent.
> Compressed is 1097986 pages; un-compressed would require 712444 pages.
> NOTE: DATA statement used (Total process time):
> real time           18:43.88
> user cpu time       9:29.72
>
> /******************************************************/

[1]  That isn't *quite* what the SAS docs say.

[2]  You should be able to guess whether SAS compression is worth
your while ahead of time.  Are your character variables packed as
tightly as is reasonable to start with?  Are your numeric variables
needing all the bytes you have given them in your LENGTH statement?
If so, then SAS compression won't help you much.  Don't bother doing
it.  If your strings are likely to have lots of blanks or lots of
duplication, or your numbers are short integers saved as LENGTH 8,
then you might just get a lot of benefit out of SAS compression.

[3]  Why are you working with 180-million-record data sets and
writing such inefficient code?  The above code does nothing but
create yet another copy of your mongo data set, and you could have
avoided at least one of those copies.  Maybe both, depending on
your business process and your data needs.  Rather than worry about
the lousy job that SAS compression does on tightly-packed data
records, I would worry about all the disk space you're using.
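
As a rough illustration of points [2] and [3]: PROC CONTENTS lists each
variable's type and length, which is the information needed to make the
guess in [2]. The second step is purely an assumption about how
WORK.ALLSORT was built (the thread does not say). If it came from a sort,
the sorted output could be written straight to the permanent library
instead of being copied afterwards. The names work.rawdata and keyvar are
hypothetical.

```sas
/* [2] List each variable's type and length before deciding on compression. */
proc contents data=work.allsort;
run;

/* [3] Hypothetical: if ALLSORT is produced by a PROC SORT, write the       */
/* sorted output directly to the permanent library, uncompressed.           */
/* work.rawdata and keyvar are made-up names, not from the thread.          */
proc sort data=work.rawdata out=lib._history (compress=no);
    by keyvar;
run;
```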

David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician
cassell.david (1747)
2/2/2005 9:39:45 PM
>[1]  That isn't *quite* what the SAS docs say.

My apologies for misrepresenting the online doc.

>[2]  You should be able to guess whether SAS compression is worth
>your while ahead of time.

OK

> Are your character variables packed as
>tightly as is reasonable to start with?

Yes

>Are your numeric variables
>needing all the bytes you have given them in your LENGTH statement?

Yes

>If so, then SAS compression won't help you much.  Don't bother doing
>it.

This helps, thanks.

>If your strings are likely to have lots of blanks or lots of
>duplication, or your numbers are short integers saved as LENGTH 8,
>then you might just get a lot of benefit out of SAS compression.

OK

>[3]  Why are you working with 180-million-record data sets and
>writing such inefficient code?

Lots of transactions -and-

How are you measuring efficiency here?  If computer resources are your
_only_ consideration, then I would agree.

>The above code does nothing but
>create yet another copy of your mongo data set,

Correct

> and you could have
>avoided at least one of those copies.  Maybe both, depending on
>your business process and your data needs.

Unavoidable based on _my_ business process and data needs.

>Rather than worry about
>the lousy job that SAS compression does on tightly-packed data
>records, I would worry about all the disk space you're using.
>

Thank you for your advice.

>David

Dave

2/3/2005 3:40:49 AM