Re: problem with large sas data sets #5

If I/O as a source of corruption is the criterion, then practically all
applications are "inherently untrustworthy".  True, such risk would have
to increase as the amount of I/O increases, but there are means of
reducing that risk, e.g. contemporary analogs of the parity track in
9-track tapes.

It seems to me that the issue is one of managing risk.  For example, if
a binary compare utility, as suggested by RolandRB, declares two files
equivalent, I'd like to know how running a PROC COMPARE further reduces
risk.
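
To make that concrete: the binary check might be cmp or md5sum run on
the two files at the shell prompt, while the SAS-level check would be a
PROC COMPARE step along these lines - a minimal sketch, where the
library paths and the member name BIGDATA are invented for
illustration:

libname old '/data/original';   /* hypothetical location of the original */
libname new '/data/copy';       /* hypothetical location of the copy     */

proc compare base=old.bigdata compare=new.bigdata;
run;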

Regards,
Mark

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
RolandRB
Sent: Tuesday, April 22, 2008 3:22 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: problem with large sas data sets

On Apr 21, 4:51 pm, sas_9264 <Shiping9...@gmail.com> wrote:
> Hi, sometimes I have a problem using Unix commands to copy, mv, or
> soft-link large SAS data sets (over 4-5 GB). After I do that, I can't
> open the data anymore. SAS complains: ERROR: The open failed because
> library member TEMP.XXXXXX_XX041608.DATA is damaged. Does anyone have
> similar experience?
>
> Thanks,
>
> Shiping

There was a post on this newsgroup (list) about a year ago describing a
huge dataset that had a few corrupted records after a copy.
Unfortunately, you should expect this. After copying a huge dataset,
and somehow making sure it has been flushed from the cache, you should
use a utility to do a comparison or, better, use PROC COMPARE to make
sure you made a good copy. Do a few million shuffles of chunks of data
and one or two might well not work. It's one of the reasons I state
that a "validated" SAS reporting system can never be truly validated.
There are too many I/Os going on, and sometimes these might fail. By
its design as a procedural language that creates many datasets in the
course of running a complex job, it is inherently untrustworthy.
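
For what it's worth, a minimal sketch of that verify step, with library
and member names (OLD, NEW, BIGDATA) invented for illustration: PROC
COMPARE sets the SYSINFO automatic macro variable to 0 when it finds no
differences, so the result can be tested in code instead of being read
off the listing.

proc compare base=old.bigdata compare=new.bigdata noprint;
run;

/* test SYSINFO immediately - a later step will reset it */
%put NOTE: PROC COMPARE return code: &sysinfo (0 means no differences);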
mkeintz
4/22/2008 12:37:26 PM

On Apr 22, 2:37 pm, mkei...@WHARTON.UPENN.EDU ("Keintz, H. Mark")
wrote:
> If I/O as a source of corruption is the criterion, then practically all
> applications are "inherently untrustworthy".  True, such risk would have
> to increase as the amount of I/O increases, but there are means of
> reducing that risk, e.g. contemporary analogs of the parity track in
> 9-track tapes.
>
> It seems to me that the issue is one of managing risk.  For example, if
> a binary compare utility, as suggested by RolandRB, declares two files
> equivalent, I'd like to know how running a PROC COMPARE further reduces
> risk.
>
> Regards,
> Mark
>
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SA...@LISTSERV.UGA.EDU] On Behalf Of
> RolandRB
> Sent: Tuesday, April 22, 2008 3:22 AM
> To: SA...@LISTSERV.UGA.EDU
> Subject: Re: problem with large sas data sets
>
> On Apr 21, 4:51 pm, sas_9264 <Shiping9...@gmail.com> wrote:
> > Hi, sometimes I have a problem using Unix commands to copy, mv, or
> > soft-link large SAS data sets (over 4-5 GB). After I do that, I can't
> > open the data anymore. SAS complains: ERROR: The open failed because
> > library member TEMP.XXXXXX_XX041608.DATA is damaged. Does anyone have
> > similar experience?
> >
> > Thanks,
> >
> > Shiping
>
> There was a post on this newsgroup (list) about a year ago describing a
> huge dataset that had a few corrupted records after a copy.
> Unfortunately, you should expect this. After copying a huge dataset,
> and somehow making sure it has been flushed from the cache, you should
> use a utility to do a comparison or, better, use PROC COMPARE to make
> sure you made a good copy. Do a few million shuffles of chunks of data
> and one or two might well not work. It's one of the reasons I state
> that a "validated" SAS reporting system can never be truly validated.
> There are too many I/Os going on, and sometimes these might fail. By
> its design as a procedural language that creates many datasets in the
> course of running a complex job, it is inherently untrustworthy.

Don't forget that cosmic rays are smashing into your computer and its
disks at all times: protons with the energy of a well-hit tennis ball.
I used to be a science teacher, and we had a mini cloud chamber that we
demonstrated with a bit of dry ice. You don't have to wait long to see
a cosmic ray pass through. Leave photographic film in your freezer for
a good number of years and it will be ruined by cosmic rays.

There are ways of ensuring data integrity if you really need it:
duplication, backups, and corruption checks.
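
As a sketch of that duplicate-and-verify idea in SAS terms (again with
invented library and member names), you could copy with PROC COPY and
refuse to trust the copy unless PROC COMPARE comes back clean:

libname old '/data/original';   /* hypothetical locations */
libname new '/data/copy';

proc copy in=old out=new memtype=data;
  select bigdata;               /* hypothetical member name */
run;

proc compare base=old.bigdata compare=new.bigdata noprint;
run;

%macro verify;
  %if &sysinfo ne 0 %then
    %put ERROR: copy of BIGDATA failed verification (SYSINFO=&sysinfo);
  %else
    %put NOTE: copy of BIGDATA verified;
%mend verify;
%verify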

The more I/Os you do, the greater the risk of data corruption in the
same elapsed time.
rolandberry
4/22/2008 4:54:36 PM