Re: Help! Accessing observations from a dataset using a lookup #8
Summary: Read the problem
#iw-value=1
Michael,
You wrote in part:
> Howard, oh I gasped all right, my friend, I gasped BIG TIME! I guess
> that the noise of the traffic on the Beltway between Rockville and
> Washington, DC must have muted it out :-)
> Actually, _BECAUSE_ of the amazing size of the MASTER data set, I ruled
> out an in-memory solution such as Hash Tables or Formats. I didn't just
> choose a SAS Index solution simply because I wrote the book. If the
> prohibition against "modifying" MASTER applies even to building a SAS
> index on ID, then I believe Alon is left standing alone without a valid
> tool to use.
However, in the problem Alon <akadas@GMAIL.COM> explained that the big file
must be read sequentially, so an index solution is not available, and that
he needed only 300 records. It is easy to store 300 IDs in memory and then
read the big file (for some time), checking each ID to see whether it is one
of the wanted IDs. So a partial in-memory solution is most appropriate. Since
there are many IDs to check, a hash would be the one most likely to complete.
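
For what it is worth, the partial in-memory approach described above might look roughly like the following. This is only a sketch: MASTER, LOOKUP and ID are the names used in this thread, the output name WANTED is made up for illustration, and it assumes ID has the same type and length in both data sets.

```sas
/* Sketch only: WANTED is a hypothetical output name.                     */
data wanted;
   /* Load the small list of wanted IDs into memory once.                 */
   if _n_ = 1 then do;
      declare hash h(dataset: 'lookup');
      h.defineKey('id');
      h.defineData('id');
      h.defineDone();
   end;

   /* Read the big file sequentially; MASTER itself is never modified.    */
   set master;

   /* CHECK() returns 0 when the current ID is among the wanted IDs.      */
   if h.check() = 0;
run;
```

Only the IDs from LOOKUP are held in memory, so the size of MASTER affects run time but not memory use.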
I wonder how reliable the data on the file is and what sort of information
it contains. The answers may be worth a bigger gasp than the size.
Ian Whitlock
iw1junk (1195)
7/1/2007 8:59:33 PM

All,
First I'd like to say thanks to all of you for the input.
Just to confirm for all of you, your gasps were not in vain: I did
not mistakenly change an 'm' to a 'b' or read an additional 3 zeroes.
Yes, these are on tapes as well.
The key to this problem is that Master is the large dataset, which I
cannot modify (there goes the index idea out the window), while Lookup is
the smaller dataset. It may have 100 IDs or 100,000 IDs, but to say
the least it is much, much smaller. Copying the dataset over again and
then working with the copy does the job, but the amount of redundancy
involved is too large for me to ignore (though unfortunately that is
what I and everyone else in my company resort to).
Even though the format method seems nice, I am not sure how I
would define the format. I also thought of a different approach that
would store the IDs in a macro variable and then do a 'where in
(&id)' - but then realized that it cannot be scaled to a larger number of
IDs (or can it?).
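
On defining the format: one common way is to build it from Lookup itself through a CNTLIN data set. The sketch below is only an illustration. It assumes ID is numeric and unique in Lookup, and the names FMTIN, WANTFMT and WANTED are made up for the example.

```sas
/* Sketch only: FMTIN, WANTFMT and WANTED are hypothetical names.         */
data fmtin;
   set lookup(keep=id rename=(id=start)) end=last;
   retain fmtname 'wantfmt' type 'N';
   length label $1 hlo $1;
   label = 'Y';                 /* every wanted ID maps to 'Y'            */
   output;
   if last then do;             /* one extra row: all other IDs map to 'N' */
      hlo = 'O';
      label = 'N';
      output;
   end;
run;

/* Build the format from the control data set.                            */
proc format cntlin=fmtin;
run;

/* Read Master sequentially and keep only the wanted IDs.                 */
data wanted;
   set master;
   if put(id, wantfmt.) = 'Y';
run;
```

Like a hash, the format is held in memory, so this scales with the size of Lookup rather than Master.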
One thing that came to mind is putting IDs into several macro
variables in chunks of, say, 1000 (I am not sure if one macro variable
can fit that many, but for argument's sake let's say it can): &ID1
&ID2 ... &&ID&N (&N is the number of macro variables needed - say N=6 if
you have 5,453 IDs).
So one solution would be to just have a do loop with checks on the
IDs (now I'm just stretching out the code's capability; I've never
done this before and have never seen it done before, so I hope
something like this is possible - if any of you have suggestions then
please comment).
do i = 1 to &N;
if (id in (&&ID&I..)) then output;
end;
**I realize the above has an issue, since it is a regular data step do
loop rather than a %do loop.
The nice thing about this is that you can add flags (i.e. &&FLAG&I)
which will serve as counters, and when &&FLAG&I = 1000
you short out the statement '(id in (&&ID&I..))'.
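
For reference, the chunked macro variable idea could be written with a %do loop roughly as follows. This is a sketch under assumptions, not tested against the real data: ID is taken to be numeric with at most 12 digits, Lookup is not empty, and CHUNK, PICK and WANTED are made-up names. The flag/counter idea is left out.

```sas
/* Sketch only: CHUNK, PICK and WANTED are hypothetical names.            */
%let chunk = 1000;

/* Build &ID1, &ID2, ... as comma-separated lists and &N as their count.  */
data _null_;
   set lookup end=last;
   length list $32767;
   retain list;
   list = catx(',', list, id);
   if mod(_n_, &chunk) = 0 or last then do;
      n + 1;
      call symputx(cats('ID', n), list);
      call symputx('N', n);
      list = ' ';
   end;
run;

/* Generate one IF ... IN test per chunk, chained with ELSE.              */
%macro pick;
   data wanted;
      set master;
      %do i = 1 %to &N;
         %if &i > 1 %then else;
         if id in (&&ID&i) then output;
      %end;
   run;
%mend pick;

%pick
```

The %do loop runs while the step is being compiled, so the data step only ever sees &N plain IF statements. With 100,000 IDs that means 100 long IN lists checked for every record of Master, which is probably where a hash or format starts to look better.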
I will also test out the hash object method (it's good to learn how to
do this eventually).
Again, thanks everyone for the support.
-Alon
akadas (8)
7/5/2007 4:11:20 PM