f



Re: Deleting SAS Data from a SAS DATASET

One thing you might do is to add an index on the snap_dt to the dataset; =
if that's there then you should be able to delete the records in place:

proc sql;
   delete from prod.master_date;
  where snap_dt =3D "&end_dt"d;
quit;
run;

In both the ways you  are trying now you are creating new data sets =
rather than deleting records from the current data set; it would seem to =
me that a SQL delete statement would be faster than creating new =
datasets even if there isn't an index on the date.=20

-Mary
  ----- Original Message -----=20
  From: SUBSCRIBE SAS-L Chandra Gadde=20
  To: SAS-L@LISTSERV.UGA.EDU=20
  Sent: Friday, August 15, 2008 12:14 PM
  Subject: Deleting SAS Data from a SAS DATASET


  Hi All

  I have several SAS datasets that are very very big. (50GB of size). =
Every
  month, the data is being appended to these datasets. I need to deleted =
the
  data which is greater than 24 months. What is the best method to do =
this?
  Please help me.

  I tried PROC SQS and DATA STEP. But these two are taking very long =
time.

  Data prod.Master_data;
    set prod.master_date;
    if snap_dt =3D "&end_dt"d then delete;
  run;

  proc sql;
  crete table prod.master_date from prod.master_date
  where snap_dt ne "&end_dt"d;
  quit;
0
mlhoward (1803)
8/15/2008 6:17:25 PM
comp.soft-sys.sas 142827 articles. 4 followers. Post Follow

2 Replies
1022 Views

Similar Articles

[PageSpeed] 39

Hi
You've got good ideas how to delete the old observations. My modify
proposal was wrong. Sorry about that!
I still think that changing the data model would be a good idea.
Regards
Patrick
0
8/15/2008 7:18:37 PM
"Mary" <mlhoward@avalon.net> wrote in message
news:018401c8ff03$2d472080$832fa8c0@HP82083701405...
> One thing you might do is to add an index on the snap_dt to the dataset; =
> if that's there then you should be able to delete the records in place:
>
> proc sql;
>    delete from prod.master_date;
>   where snap_dt =3D "&end_dt"d;
> quit;
> run;
>

Your SQL code will work whether or not there's an index on the dataset.  The
problem however, is that the observations aren't physically deleted, they're
just marked for deletion.  The size of the dataset on disk remains the same,
and the next time you process it, it'll take just as much time (more or
less) to read.

And trying it on a very simple, one-variable dataset with 100,000
observations, where half are deleted, the SQL step used .10 CPU seconds and
a data step recreating the dataset without the deleted observations consumed
..06 CPU seconds.  Wall time was a couple seconds shorter for the data step
too.

Different datasets on different machines might come up with different
results, but it doesn't look certain to me that deleting in place is faster.

> In both the ways you  are trying now you are creating new data sets =
> rather than deleting records from the current data set; it would seem to =
> me that a SQL delete statement would be faster than creating new =
> datasets even if there isn't an index on the date.=20
>
> -Mary
>   ----- Original Message -----=20
>   From: SUBSCRIBE SAS-L Chandra Gadde=20
>   To: SAS-L@LISTSERV.UGA.EDU=20
>   Sent: Friday, August 15, 2008 12:14 PM
>   Subject: Deleting SAS Data from a SAS DATASET
>
>
>   Hi All
>
>   I have several SAS datasets that are very very big. (50GB of size). =
> Every
>   month, the data is being appended to these datasets. I need to deleted =
> the
>   data which is greater than 24 months. What is the best method to do =
> this?
>   Please help me.
>
>   I tried PROC SQS and DATA STEP. But these two are taking very long =
> time.
>
>   Data prod.Master_data;
>     set prod.master_date;
>     if snap_dt =3D "&end_dt"d then delete;
>   run;
>
>   proc sql;
>   crete table prod.master_date from prod.master_date
>   where snap_dt ne "&end_dt"d;
>   quit;


0
8/16/2008 12:30:28 AM
Reply: