HASH Help Needed

  • Follow


I am working with two large datasets.   The first table, JAN,  is
huge, 140million records, distinct by PersonID, EventDate and Hour:

PersonID
EventDate
Hour
EventAmt


The second table, MASTER,  has 2 fields RateCode and Cycle that I need
to lookup and put onto the first file.  A person can have different
cycle and rate codes for different days or range of days.   The Second
table has 1 million records, distinct by PersonID, FromDate and
ToDate:
PersonID
FromDate
ToDate
RateCode
Cycle

I=92m sure you could do a SQL statement and get from table B values
where the  ID=92s match and the FromDate <=3D EventDate   and   ToDate >=3D
EventDate, but I think that a HASH would be much quicker.   But, I=92m
very new to HASH and am trying to figure out how to set this up.
Here=92s what I have so far:

/
***************************************************************************=
**************/
Data JanNew;
Set   Jan;

If _n_ =3D 0 then set Master;
If _n_ =3D 1 then do;
	Declare hash MAS (Dataset : =91master=92, multidata: =91Y=92, Ordered: =91=
A=92)

	Rc=3Dmas.definekey (=91PersonId=92);

	Rc=3Dmas.definedata (=91fromdt=92, =91todate=92, =91ratecode=92, =91cycle=
=92);

	RC=3Dmas.definedone();

	Do until (eof);
	    Set Jan end=3Deof;
	    RC=3Dmas.find ();
               If RC =3D 0 then do;
	        If FromDate <=3D EventDate <=3D ToDate then  =85=85=85=85?????????=
?
/
***************************************************************************=
**************/

I=92m lost as to how to check the dates to make sure I am looking at the
right lookup records, then get the variables back from the hash object
and output the Event record.



0
Reply mary_idaho (1) 4/19/2010 7:06:30 PM

Hi

The "issue" is that hash tables use exact key matches but you have to
deal with a date range. The option    multidata: 'y'   in the declare
statement allow to load multiple observations per key into the hash
table.

The approach is to first match over the ID and then loop through the
subset to find the observation with the correct date range.

See example below.

Data Main(drop=j);
  format EventDate date9.;
  do personID=1 to 3;
    do j=1 to 10;
      EventDate='01jan2010'd+floor(ranuni(1)*365);
      output;
    end;
  end;
run;

Data Lookup;
  format FromDate ToDate date9.;
  retain RateCode Cycle 'Example Value';
  do personID=1 to 10;
    do FromDate='01jan2010'd to '01Dec2010'd by 30;
      ToDate=FromDate+10;
      output;
    end;
  end;
run;

Data Main(drop=hit rc);
  /* declare hash table */
  attrib FromDate ToDate length=8 format=date9.;
  attrib RateCode Cycle length=$32;
  if _n_=1 then
  do;
    declare hash h(dataset: "Lookup", hashexp: 10,multidata: 'y');
    h.defineKey('personID');
    h.defineData('FromDate','ToDate','RateCode','Cycle');
    h.defineDone();
    call missing(FromDate,Todate,RateCode,Cycle);  end;

  set Main;

  rc=h.find();
  do while(rc = 0);
    if EventDate>=FromDate and EventDate<=ToDate then
    do;
      hit='1';
      leave;
    end;
    rc = h.find_next();
  end;
  if hit ne '1' then call missing(FromDate,Todate,RateCode,Cycle);
run;



In case your data is already sorted by ID and date then may be another
approach could perform bettter.

HTH
Patrick

0
Reply Patrick 4/20/2010 2:34:10 PM



Thanks Patrick!!!   This was very helpful.   I appreciate the
response.   Thanks again.  --Mary
0
Reply MarySAS 4/20/2010 9:06:01 PM

2 Replies
123 Views

(page loaded in 0.397 seconds)

Similiar Articles:













7/18/2012 4:23:14 AM


Reply: