52-week high rolling window -- HELP!!!!!!!!

  • Follow


Hi All,

I'm trying to create a SQL query, which finds the previous 52-week's
maximum price for each day. My stock prices are daily. The variables
are p (price), ticker and date. I have written the following code:

%let J=52;
proc sql noprint;
    create table _52mom.umd4
	as select a.ticker, a.date, max(p) as high
	from _52mom.intreturns_s4 (keep = ticker date) as a,
_52mom.intreturns_s4 as b
	where a.ticker = b.ticker and b.date between a.date and intnx('week',
a.date, -&J)
	group by a.ticker, a.date
	having count(a.date)>=200;
quit;

However, my dataset is quite large (the whole set is about 4.6G, only
3 variables - Price, Ticker, and Date). I have subdivided it into four
subsamples. The smallest is about 500M.

I have tested the above code on a 8M data, and it works out fine.
However, when I run it on the 500M, it becomes extremely slow (I have
been running for 6 hours now, still going, not sure when it will
finish). Also I'm not sure about the spillover table that proc SQL
creates, and how large that will become.

Could someone please kindly help me on this issue? I didn't think the
data would take this long to run, is there an easier/more efficient
way to do this? Sorry I am kind of new to SAS, don't know much about
it yet.

Thanks very much in advance!!!
0
Reply billyiqing (2) 8/5/2010 6:22:39 AM

Hi

Joining such a huge table with itself is for sure very resource
intensive.

Assuming that your source is a SAS table I'd expect that the code
below will perform better. If the max price was reached on several
days then you'll get several obs per ticker.

data intreturns_s4;
  format date date9.;
  stop=today();
  drop stop;
  do ticker=1 to 3;
    do date='01JAN2009'd to stop;
      p=ceil(ranuni(1)*100);
      output;
    end;
  end;
run;

%let J=52;
data _null_;
  start_date=intnx('week',today(), -&J,'b');
  stop_date =intnx('week',today(), -1,'e');
  put start_date= weekdate.;
  put stop_date =   weekdate.;
  call symput('start_date',cats(start_date));
  call symput('stop_date',cats(stop_date));
run;


proc sql;
  create view work.intreturns_s4_SORTED as
    select ticker, date, p
      from work.intreturns_s4
        where date between &start_date and &stop_date
      order by ticker, p DESC
  ;
quit;

data work.umd4;
  array p_retained {1} 8 _temporary_;
  set intreturns_s4_SORTED;
  by ticker;
  if first.ticker then p_retained{1}=p;
  if p=p_retained{1} then output;
run;

proc print data=work.umd4;
run;


If you need everything done in SQL then search this forum with
keywords SQL and TOP. There is more than one thread discussing how to
select the top record using SQL.

HTH
Patrick
0
Reply Patrick 8/5/2010 9:36:31 AM


On Aug 5, 2:36=A0am, Patrick <patrick.mat...@gmx.ch> wrote:
> Hi
>
> Joining such a huge table with itself is for sure very resource
> intensive.
>
> Assuming that your source is a SAS table I'd expect that the code
> below will perform better. If the max price was reached on several
> days then you'll get several obs per ticker.
>
> data intreturns_s4;
> =A0 format date date9.;
> =A0 stop=3Dtoday();
> =A0 drop stop;
> =A0 do ticker=3D1 to 3;
> =A0 =A0 do date=3D'01JAN2009'd to stop;
> =A0 =A0 =A0 p=3Dceil(ranuni(1)*100);
> =A0 =A0 =A0 output;
> =A0 =A0 end;
> =A0 end;
> run;
>
> %let J=3D52;
> data _null_;
> =A0 start_date=3Dintnx('week',today(), -&J,'b');
> =A0 stop_date =3Dintnx('week',today(), -1,'e');
> =A0 put start_date=3D weekdate.;
> =A0 put stop_date =3D =A0 weekdate.;
> =A0 call symput('start_date',cats(start_date));
> =A0 call symput('stop_date',cats(stop_date));
> run;
>
> proc sql;
> =A0 create view work.intreturns_s4_SORTED as
> =A0 =A0 select ticker, date, p
> =A0 =A0 =A0 from work.intreturns_s4
> =A0 =A0 =A0 =A0 where date between &start_date and &stop_date
> =A0 =A0 =A0 order by ticker, p DESC
> =A0 ;
> quit;
>
> data work.umd4;
> =A0 array p_retained {1} 8 _temporary_;
> =A0 set intreturns_s4_SORTED;
> =A0 by ticker;
> =A0 if first.ticker then p_retained{1}=3Dp;
> =A0 if p=3Dp_retained{1} then output;
> run;
>
> proc print data=3Dwork.umd4;
> run;
>
> If you need everything done in SQL then search this forum with
> keywords SQL and TOP. There is more than one thread discussing how to
> select the top record using SQL.
>
> HTH
> Patrick

I think this is something PROC EXPAND, in the ETS module, is uniquely
qualified to do without a lot of tedious Data Step and / or SQL
coding.  Check out my paper, "Time Series Magic: Using PROC EXPAND
with Time Series Data" which is available for free download at
http://www.sierrainformation.com .  Click on the Free Downloads link
on the left hand side of the page and take things from there.

Thanks,

Andrew Karp
Sierra Information Services
http://www.sierrainformation.com
0
Reply sfbay0001 (430) 8/5/2010 2:26:19 PM

Thank you very much for your help Patrick!

However, one thing that I didn't make clear in the original thread was
that I need a rolling window, i.e. I need to calculate this 52-week
high for EVERY observation in my dataset. Hence the big join in my
SQL.

Any thoughts on this?

Thanks
Bill

On Aug 5, 6:22=A0pm, billyiqing <billyiq...@gmail.com> wrote:
> Hi All,
>
> I'm trying to create a SQL query, which finds the previous 52-week's
> maximum price for each day. My stock prices are daily. The variables
> are p (price), ticker and date. I have written the following code:
>
> %let J=3D52;
> proc sql noprint;
> =A0 =A0 create table _52mom.umd4
> =A0 =A0 =A0 =A0 as select a.ticker, a.date, max(p) as high
> =A0 =A0 =A0 =A0 from _52mom.intreturns_s4 (keep =3D ticker date) as a,
> _52mom.intreturns_s4 as b
> =A0 =A0 =A0 =A0 where a.ticker =3D b.ticker and b.date between a.date and=
 intnx('week',
> a.date, -&J)
> =A0 =A0 =A0 =A0 group by a.ticker, a.date
> =A0 =A0 =A0 =A0 having count(a.date)>=3D200;
> quit;
>
> However, my dataset is quite large (the whole set is about 4.6G, only
> 3 variables - Price, Ticker, and Date). I have subdivided it into four
> subsamples. The smallest is about 500M.
>
> I have tested the above code on a 8M data, and it works out fine.
> However, when I run it on the 500M, it becomes extremely slow (I have
> been running for 6 hours now, still going, not sure when it will
> finish). Also I'm not sure about the spillover table that proc SQL
> creates, and how large that will become.
>
> Could someone please kindly help me on this issue? I didn't think the
> data would take this long to run, is there an easier/more efficient
> way to do this? Sorry I am kind of new to SAS, don't know much about
> it yet.
>
> Thanks very much in advance!!!

0
Reply billyiqing (2) 8/6/2010 1:49:12 AM

Hi Bill

Just to clarify before I'm coming up with something - please correct
the following statements if wrong:

- The result data set contains all the observations from the source
data set which fall into the date range of the last 52 weeks.
- The combination of ticker and date is not unique.
- The variable 'high' (max. price) contains the max value of
'p' (price) for each combination of ticker and date.

i.e. Have

Ticker Date P
1 01jan2010 5
1 01jan2010 4
1 01jan2010 1
1 02jan2010 6
1 02jan2010 9
1 02jan2010 3

Want:
Ticker Date High
1 01jan2010 5
1 01jan2010 5
1 01jan2010 5
1 02jan2010 9
1 02jan2010 9
1 02jan2010 9


Thanks
Patrick
0
Reply Patrick 8/6/2010 7:38:54 AM

Hi Patrick

The data looks something like this:

Ticker    Date            P
1           01Jan2010   5
1           02Jan2010   5
1           03Jan2010   6
1           04Jan2010   6.5
1           05Jan2010   6
2           01Jan2010   12
2           02Jan2010   12.4
2           03Jan2010   12.3
2           04Jan2010   13
2           05Jan2010   12.8
2           05Jan2010   14

And I want something like this:

Ticker    Date            P              High
1           01Jan2010   5               5
1           02Jan2010   5               5
1           03Jan2010   6               6
1           04Jan2010   6.5            6.5
1           05Jan2010   6               6.5
2           01Jan2010   12             12
2           02Jan2010   12.4          12.4
2           03Jan2010   12.3          12.4
2           04Jan2010   13             13
2           05Jan2010   12.8           13
2           05Jan2010   14             14

The result data needs to contain ALL observations in the original
dataset, with an additional column containing the past 52-week high
for each observation.
Each ticker would have the stock price for the stock's entire period,
therefore the the Ticker-Date is unique. But neither the ticker itself
nor the date itself is unique.
The date spans across several years - sometimes really long periods.

Thank you SO much for your help!!!!
On Aug 6, 7:38=A0pm, Patrick <patrick.mat...@gmx.ch> wrote:
> Hi Bill
>
> Just to clarify before I'm coming up with something - please correct
> the following statements if wrong:
>
> - The result data set contains all the observations from the source
> data set which fall into the date range of the last 52 weeks.
> - The combination of ticker and date is not unique.
> - The variable 'high' (max. price) contains the max value of
> 'p' (price) for each combination of ticker and date.
>
> i.e. Have
>
> Ticker Date P
> 1 01jan2010 5
> 1 01jan2010 4
> 1 01jan2010 1
> 1 02jan2010 6
> 1 02jan2010 9
> 1 02jan2010 3
>
> Want:
> Ticker Date High
> 1 01jan2010 5
> 1 01jan2010 5
> 1 01jan2010 5
> 1 02jan2010 9
> 1 02jan2010 9
> 1 02jan2010 9
>
> Thanks
> Patrick

0
Reply billyiqing 8/6/2010 9:47:52 AM

Hi Bill

Only now I understood what this 52 maximum for each day really means.
Nasty little bugger this one!

What SAS version are you on? I'm thinking about an approach with the
SAS iterator object which is only available since SAS 9.2.

Let me know.

Patrick
0
Reply Patrick 8/6/2010 1:22:16 PM

Hi Bill

Below code which (I think) solves the problem and should perform
reasonably well.
Everything else I can think of right now wouldn't add much performance
gain but complicate the code a lot.

If you're not familiar with the hash object:
Besides of the SASDoc there are also a few good white papers around. I
find also this quick reference very helpful:
http://support.sas.com/rnd/base/datastep/dot/hash-tip-sheet.pdf

data have;
  infile datalines delimiter=' ';
  input Ticker Date:date9. P;
  format date date9.;
datalines;
1 01May2008 50
1 01Feb2009 3
1 01Mar2009 2
1 01Apr2009 4
1 01May2009 3
1 01Jun2009 1
1 01Jul2009 3
1 01Aug2009 4
1 01Jan2010 5
1 02Jan2010 5
1 03Jan2010 6
1 04Jan2010 6.5
1 05Jan2010 6
2 01Jan2010 12
2 02Jan2010 12.4
2 03Jan2010 12.3
2 04Jan2010 13
2 05Jan2010 12.8
2 05Jan2010 14
;
run;

/* number of weeks to look at */
%let J=52;

/* select all obs within date range and order result set by ticker and
date */
proc sql;
  create view Vhave_Sorted as
    select *
      from have
     order by ticker,date
  ;
quit;

data want(keep=Ticker Date p high);
  set Vhave_Sorted;
  by ticker date;
  ThisDate=date;

  /* declare hash table for look up */
  if _n_=1 then
  do;
    declare hash h ();
    rc=h.defineKey('ticker','ThisDate');
    rc=h.defineData('p');
    rc=h.defineDone();
  end;

  /* clear all entries in hash table if a new ticker starts */
  if first.ticker then
  do;
    rc=rc = h.clear();
  end;

  /* add keys and data as in current record as defined to hash table
*/
  rc=h.add();

  /* lookup data in hash table and keep the max data value in the last
52 weeks */
  /* starting from the current date
value                                       */
  do ThisDate=(date- 7*&j) to date;
    rc=h.find();
    high=max(high,p);
  end;

run;

proc print data=want noobs;
run;

I was not 100% sure how exactly the date range has to be defined. You
might have to amend it to suit your needs.
What I did is just look back at the last 7*52 days:
-> do ThisDate=(date- 7*&j) to date;
.....or should it be 7*52+1 - or is it the last 52 weeks before the
date in current week (you would need intnx() for this).

I'm sure you'll manage to sort this out ;-)

Let me know if the code worked for you.

HTH
Patrick

0
Reply Patrick 8/6/2010 2:34:14 PM

Hi Andrew

Could you provide a code example which solves Bill's problem (he gave
us now some sample data)?

I'm really curious how this could be done with PROC EXPAND only.

Thanks
Patrick
0
Reply Patrick 8/6/2010 2:37:27 PM

Hi Bill

And last but not least (seem I got inspired...):
In case that you have a lot of RAM (>4GB addressable) and you don't
need your result data set sorted by ticker and date then loading the
full table into a hash for lookup would avoid the necessity for
sorting (which costs a lot of resources and time).

data have;
  infile datalines delimiter=' ';
  input Ticker Date:date9. P;
  format date date9.;
datalines;
1 01Feb2009 3
1 01Mar2009 2
1 01Apr2009 4
1 01May2009 3
1 01Jun2009 1
1 01Jul2009 3
1 01Aug2009 4
1 01Jan2010 5
1 01May2008 50
1 02Jan2010 5
1 03Jan2010 6
1 04Jan2010 6.5
1 05Jan2010 6
2 01Jan2010 12
2 02Jan2010 12.4
2 03Jan2010 12.3
2 04Jan2010 13
2 05Jan2010 12.8
2 05Jan2010 14
;
run;

/* number of weeks to look at */
%let J=52;

data want(keep=Ticker Date p high);
  set have;

  /* declare hash table for look up */
  if _n_=1 then
  do;
    declare hash h (dataset:'work.have');
    rc=h.defineKey('ticker','Date');
    rc=h.defineData('p');
    rc=h.defineDone();
  end;

  /* lookup data in hash table and keep the max data value in the last
52 weeks */
  /* starting from the current date
value                                       */
  ThisDate=date;
  do date=(ThisDate- 7*&j) to date;
    rc=h.find();
    high=max(high,p);
  end;
  date=ThisDate;

run;

proc print data=want noobs;
run;


If you want to use this code then you have to pre-define the size of
the hash table - which can be tricky. The default is very likely to
small and not efficient.

Just two links dealing with this hashexp question:

http://books.google.com.au/books?id=PplQOP__uhcC&pg=PA397&lpg=PA397&dq=sas+hashexp+calculate&source=bl&ots=u3MMo0v2g1&sig=igTYtfGOKMDMm7EEksxHBuu-La4&hl=en&ei=kiJcTMnIOZKecdvbjPsB&sa=X&oi=book_result&ct=result&resnum=6&ved=0CDEQ6AEwBQ#v=onepage&q&f=false

http://support.sas.com/kb/34/193.html


I'm guilty of not having defined the hash table size in the code I
provided. For the previous example the hash table stays quite small
(less than 3000 records for 7.5 years and one ticker). It might be
worth to define a hashexp of 2 only (not sure about this - that's why
I just ignored it).

HTH
Patrick
0
Reply Patrick 8/6/2010 3:17:31 PM

Billy,

If you havent done so already, I think that if you first create a
dataset with only the DAILY MAXIMUM prices and run your query on that
dataset, you will see significant performance improvements. After all
the last 52-week max price will be among these. SAS will query perhaps
2-10% fewer records to find that rolling MAX, I think.

T
0
Reply tanwan 8/21/2010 11:51:54 AM

10 Replies
302 Views

(page loaded in 0.097 seconds)

Similiar Articles:













7/20/2012 6:32:39 AM


Reply: