f



Re: Deleting SAS Data from a SAS DATASET #13

I am not expert with indexes, but I would expect that an index on each
data set would be employed even when the data are accessed through
a view.  If that is correct, there would be clear advantage to indexing
data for each month separately.  If the data are in one large file,
then when old data are deleted and new data added, the entire index
will need to be recreated on all of the data.  That is a big operation.
If, instead, data are maintained in separate files for each month and
each month of data is indexed independently, then indexing needs to
be performed only for the new data.  The indexes for months which do
not age out would not need to be touched.

Advantage, VIEWS.

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@NO_SPAMfhcrc.org
Ph:  (206) 667-2926
Fax: (206) 667-5977
---------------------------------------


--- On Fri, 8/15/08, ./ ADD NAME=Data _null_, <iebupdte@GMAIL.COM> wrote:

> From: ./ ADD NAME=Data _null_, <iebupdte@GMAIL.COM>
> Subject: Re: Deleting SAS Data from a SAS DATASET
> To: SAS-L@LISTSERV.UGA.EDU
> Date: Friday, August 15, 2008, 2:59 PM
> On 8/15/08, Mary <mlhoward@avalon.net> wrote:
> > A view helps on deletes, but I wonder how it affects
> performance of querying the data- wouldn't storing the
> data in 24 different locations cause a significant slowdown
> in perfomance upon querying the data versus having it all in
> one table that is indexed?  If this data is queryied a lot
> but only deleted once a month, the time in querying (which
> probably is in peak time) could be much more important than
> the time in deleting (which could be run when the computer
> is not busy, such as nights or weekends).
>
> It is not the VIEW that has an influence on deleting the
> old data.
>
> I would think that having the 24 indexed data sets might be
> about as
> good as having the giant data set.  I would think the
> indexes could be
> used while accessing the data through views.  Where's
> that guy that
> says he knows everything about using indexed data sets?
>
> I would agree that much depends on how the data is used.
> And I don't
> know the answers to those questions.
>
> >
> > -Mary
> >  ----- Original Message -----
> >  From: ./ ADD NAME=Data _null_,
> >  To: SAS-L@LISTSERV.UGA.EDU
> >  Sent: Friday, August 15, 2008 3:51 PM
> >  Subject: Re: Deleting SAS Data from a SAS DATASET
> >
> >
> >  Summary: PROC DATASETS; AGE statement. + VIEWs
> >
> >  This won't help you delete data from your very
> big data set, but you
> >  may find this example interesting.
> >
> >  You say you append data monthly to a big data set
> then when big gets
> >  too big you need to clean out the old.  And that
> takes a very long
> >  time.
> >
> >  However if you don't physically append but use a
> view to
> >  append/combine you may find it easier to get rid of
> the unwanted old
> >  data.
> >
> >  Consider this code. it pushes MonthlyUpdate onto the
> stack of 24 data
> >  sets and the 24th data set is deleted.  Then all the
> data sets get
> >  renamed to produce a new group of 24.  You can see
> from the notes how
> >  the operation works.  The data sets don't have to
> use a numbered range
> >  M01-M24 I did that for my convenience.
> >
> >  proc datasets library=work;
> >     age MonthlyUpdate m01-m24;
> >     run;
> >     quit;
> >
> >  NOTE: Deleting WORK.M24 (memtype=DATA).
> >  NOTE: Aging the name WORK.M23 to WORK.M24
> (memtype=DATA).
> >  NOTE: Aging the name WORK.M22 to WORK.M23
> (memtype=DATA).
> >  NOTE: Aging the name WORK.M21 to WORK.M22
> (memtype=DATA).
> >  NOTE: Aging the name WORK.M20 to WORK.M21
> (memtype=DATA).
> >  NOTE: Aging the name WORK.M19 to WORK.M20
> (memtype=DATA).
> >  NOTE: Aging the name WORK.M18 to WORK.M19
> (memtype=DATA).
> >  NOTE: Aging the name WORK.M17 to WORK.M18
> (memtype=DATA).
> >  NOTE: Aging the name WORK.M16 to WORK.M17
> (memtype=DATA).
> >  NOTE: Aging the name WORK.M15 to WORK.M16
> (memtype=DATA).
> >  NOTE: Aging the name WORK.M14 to WORK.M15
> (memtype=DATA).
> >  NOTE: Aging the name WORK.M13 to WORK.M14
> (memtype=DATA).
> >  NOTE: Aging the name WORK.M12 to WORK.M13
> (memtype=DATA).
> >  NOTE: Aging the name WORK.M11 to WORK.M12
> (memtype=DATA).
> >  NOTE: Aging the name WORK.M10 to WORK.M11
> (memtype=DATA).
> >  NOTE: Aging the name WORK.M09 to WORK.M10
> (memtype=DATA).
> >  NOTE: Aging the name WORK.M08 to WORK.M09
> (memtype=DATA).
> >  NOTE: Aging the name WORK.M07 to WORK.M08
> (memtype=DATA).
> >  NOTE: Aging the name WORK.M06 to WORK.M07
> (memtype=DATA).
> >  NOTE: Aging the name WORK.M05 to WORK.M06
> (memtype=DATA).
> >  NOTE: Aging the name WORK.M04 to WORK.M05
> (memtype=DATA).
> >  NOTE: Aging the name WORK.M03 to WORK.M04
> (memtype=DATA).
> >  NOTE: Aging the name WORK.M02 to WORK.M03
> (memtype=DATA).
> >  NOTE: Aging the name WORK.M01 to WORK.M02
> (memtype=DATA).
> >  NOTE: Aging the name WORK.MONTHLYUPDATE to WORK.M01
> (memtype=DATA).
> >
> >
> >  Here is a complete example.  Somewhat contrived but I
> think is covers
> >  most of the relevant points.
> >
> >  /* create example data base 2 years of monthly data
> */
> >  filename makedata temp;
> >  data _null_;
> >     file makedata;
> >     do m = 24 to 1 by -1;
> >        month =
> intnx('MONTH',today(),-m,'B');
> >        format month date9.;
> >        put 'data m' m z2. ';';
> >        put +3 'month="' month
> +(-1)'"d;';
> >        put +3 'do day = month to
> intnx("month",month,0,"E");';
> >        put +6    'output;';
> >        put +6    'end;';
> >        put +3 'Format month monyy. day
> date.;';
> >        put +3 'run;';
> >        end;
> >     run;
> >  %inc makedata / source2;
> >
> >  /* macro var to make long list of member easier to
> manage */
> >  %let members = ;
> >  proc sql noprint;
> >     select memname into :members separated by '
> '
> >        from dictionary.members
> >           where libname eq 'WORK' and memname
> eqt 'M';
> >     quit;
> >     run;
> >  %put NOTE: MEMBERS=&members;
> >
> >  /* a 2 year view */
> >  data view24 / view=view24;
> >     set &members open=defer;
> >     run;
> >
> >  /* New data for montly update */
> >  data MonthlyUpdate;
> >     month =
> intnx('MONTH',today(),0,'B');
> >     do day = month to
> intnx("Month",month,0,"E");
> >        output;
> >        end;
> >     format month monyy.;
> >     run;
> >
> >  proc datasets library=work;
> >     age MonthlyUpdate m01-m24;
> >     run;
> >     quit;
> >
> >  /* use the view to access the past 24 months of data
> */
> >  proc print data=view24;
> >     run;
> >
> >
> >
> >  On 8/15/08, SUBSCRIBE SAS-L Chandra Gadde
> <ddraj2015@gmail.com> wrote:
> >  > Hi All
> >  >
> >  > I have several SAS datasets that are very very
> big. (50GB of size). Every
> >  > month, the data is being appended to these
> datasets. I need to deleted the
> >  > data which is greater than 24 months. What is
> the best method to do this?
> >  > Please help me.
> >  >
> >  > I tried PROC SQS and DATA STEP. But these two
> are taking very long time.
> >  >
> >  > Data prod.Master_data;
> >  >  set prod.master_date;
> >  >  if snap_dt = "&end_dt"d then
> delete;
> >  > run;
> >  >
> >  > proc sql;
> >  > crete table prod.master_date from
> prod.master_date
> >  > where snap_dt ne "&end_dt"d;
> >  > quit;
> >  >
> >
0
8/15/2008 10:38:46 PM
comp.soft-sys.sas 142827 articles. 4 followers. Post Follow

0 Replies
869 Views

Similar Articles

[PageSpeed] 20

Reply: