Converting SAS code

  We have reached a point where our SAS code is taking enormous
amounts of time to execute. Given the size of the data sets and the
nature of the algorithms, some runs take days, and in some instances
almost weeks, to complete. My boss has therefore raised the idea of
moving to high-performance computing in other languages.

  How do you convert, for example, a merge between three data sets? I
have pasted a sample from one of the programs below to give you an
idea of some of the operations. Please share your ideas on how we can
do this more efficiently.

  DATA a(sgio=yes);
         LENGTH tmin tmax aar 3; /* three bytes */
         LENGTH cram $ 1;
         RETAIN tmin 0;

         SET libt.crami;  /* pnr aar ugenr ugegrad ugears */
         WHERE MOD(pnr, 10) = &pnrcif;
         BY pnr aar ugenr;

         /* first week of the year for this person */
         IF first.aar THEN DO;
                 tmin = ugenr;
         END;
         /* last week of the year: write one row per pnr and aar */
         IF last.aar THEN DO;
                 tmax = ugenr;
                 /* cram = '1'; */
                 OUTPUT;
         END;
         KEEP pnr aar tmin tmax;
  RUN;

/* contains: pnr aar tmin tmax cram */

PROC PRINT data=a(obs=500);
run;



/* creates one record for every person for every year */

/* contains: pnr aar */

/* inserts status from ida, computes cram status for all years, and
computes tmin/tmax for years with cram=0 */

DATA b(sgio=yes);
         LENGTH aar 3;
         MERGE libt.pnraar a(IN=cram1)
               libt.statusi (KEEP = pnr aar status selvst alder);
         BY pnr aar;
         /*%include '/data2/700730/spells/program/pnrfilter.sas';*/
         WHERE MOD(pnr, 10) = &pnrcif;

         IF cram1=0 THEN DO;
                 cram = '0';
                 tmin = 53;  /* tmin is set to the last week of the year + 1 */
                 IF aar IN (1986, 1992, 1997, 2003) THEN tmin = 54;
                 tmax = 0;
         END;
         ELSE cram = '1';

         /* assumes that missing ida information on status means the
            person is not in the labour force */
         IF status = . THEN status = 0;
         IF selvst = . THEN selvst = 0;
RUN;

/* contains: pnr aar tmin tmax status cram */



DATA a; RUN;

DATA pnraar; RUN;



/* now computes employment status and cram status for the previous year */

DATA c(sgio=yes);
         LENGTH tmaxprev statprev aar 3;
         RETAIN tmaxprev 0;
         RETAIN statprev 0;
         RETAIN cramprev '0';
         RETAIN selvstprev 0;

         SET b;

         tmaxprev = lag(tmax);
         statprev = lag(status);
         cramprev = lag(cram);
         selvstprev = lag(selvst);

         /* state for 1984 is set equal to the state for 1985 */
         IF aar = 1985 THEN DO;
                 tmaxprev=tmax;
                 statprev=status;
                 cramprev=cram;
                 selvstprev=selvst;
         END;
RUN;
hlane (5)
11/28/2009 3:44:37 PM
comp.soft-sys.sas

Maybe you could improve the performance of your SAS algorithms.
Sometimes very simple changes yield a significant performance
improvement. Doing so would seem easier than rewriting in a new
language. Poorly optimized algorithms probably run at about the same
speed in any language.

If you would tell the group about the overall process, we could
probably help you improve your existing SAS programs. The code
snippet you supplied does not provide much information.


On 11/28/09, Håkan Lane <hlane@cls.dk> wrote:
>  We have reached a point where our SAS code is taking enormous
> amounts of time to execute. The size of the data sets together with
> the nature of the algorithms require days and almost weeks to
> completion in some instances. My boss has therefore opened up the idea
> of using high performance computing in other languages.
iebupdte
11/28/2009 4:29:16 PM
On Nov 28, 8:44 am, hl...@CLS.DK (Håkan Lane) wrote:
> We have reached a point where our SAS code is taking enormous
> amounts of time to execute. The size of the data sets together with
> the nature of the algorithms require days and almost weeks to
> completion in some instances. My boss has therefore opened up the idea
> of using high performance computing in other languages.

I agree with data _null_ that you need to think through your logical
processes. That said, there are also good tricks that can help. Look
at using a hash in SAS. Plenty of papers on how to do that. You can
also look at parallel processing but it depends on your task and
whether that is possible.
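
To make the hash suggestion a bit more concrete: a SAS hash object is essentially an in-memory lookup table keyed on the BY variables. The sketch below shows that idea in Python, chosen only because the thread is about other languages, using the variable names from the posted DATA b step. The function names, the rows-as-dicts layout, and the assumption that every key in a and statusi also appears in pnraar are mine, not taken from the original program.

```python
LONG_YEARS = {1986, 1992, 1997, 2003}  # years where the posted code sets tmin = 54


def _or_zero(value):
    """Replace a missing value (None) with 0, like IF x = . THEN x = 0."""
    return 0 if value is None else value


def build_b(pnraar, a_rows, status_rows):
    """Rough equivalent of DATA b, driven by the (pnr, aar) keys in pnraar.

    Assumption: pnraar holds every key that occurs in the other two inputs,
    so looking up from it gives the same rows as the three-way MERGE.
    """
    # Load the two lookup tables once; this is the "hash" part.
    a_by_key = {(r["pnr"], r["aar"]): r for r in a_rows}
    status_by_key = {(r["pnr"], r["aar"]): r for r in status_rows}

    out = []
    for pnr, aar in pnraar:
        row = {"pnr": pnr, "aar": aar}
        hit = a_by_key.get((pnr, aar))
        if hit is None:  # corresponds to cram1 = 0
            row["cram"] = "0"
            row["tmin"] = 54 if aar in LONG_YEARS else 53
            row["tmax"] = 0
        else:
            row["cram"] = "1"
            row["tmin"] = hit["tmin"]
            row["tmax"] = hit["tmax"]
        st = status_by_key.get((pnr, aar), {})
        row["status"] = _or_zero(st.get("status"))
        row["selvst"] = _or_zero(st.get("selvst"))
        row["alder"] = st.get("alder")  # left missing if there is no status row
        out.append(row)
    return out
```

Whether this helps depends on the lookup tables fitting in memory; the same caveat applies to a hash object inside SAS.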

Also, while you could convert SAS code to something like C#, you may
not see any performance gains. I would not suggest you go down that
route w/o a lot of testing.
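
For a sense of what such a conversion involves, here is a rough Python sketch of two of the posted steps: DATA a (the first./last. processing) and DATA c (the LAG calls). The function names and the rows-as-dicts layout are illustrative assumptions, it has not been run against the real data, and nothing here implies it would be faster than the SAS version.

```python
from itertools import groupby
from operator import itemgetter

# Maps each source variable to the name of its "previous year" copy.
PREV_NAMES = {"tmax": "tmaxprev", "status": "statprev",
              "cram": "cramprev", "selvst": "selvstprev"}


def weekly_to_yearly(crami, pnrcif):
    """Rough equivalent of DATA a: first and last ugenr per (pnr, aar).

    Assumes crami is already sorted by pnr, aar, ugenr, which the SAS BY
    statement requires as well.
    """
    wanted = (r for r in crami if r["pnr"] % 10 == pnrcif)  # the WHERE filter
    out = []
    for (pnr, aar), group in groupby(wanted, key=itemgetter("pnr", "aar")):
        weeks = [r["ugenr"] for r in group]
        out.append({"pnr": pnr, "aar": aar,
                    "tmin": weeks[0], "tmax": weeks[-1]})
    return out


def add_previous_year(b_rows):
    """Rough equivalent of DATA c: copy values from the previous row.

    Like LAG() in the posted code, the previous row is not reset when pnr
    changes; rows for 1985 copy their own values instead.
    """
    out = []
    prev = None
    for row in b_rows:
        if row["aar"] == 1985:
            src = row          # state for 1984 is set equal to 1985
        else:
            src = prev or {}   # very first row: no predecessor, values missing
        new = dict(row)
        for name, prev_name in PREV_NAMES.items():
            new[prev_name] = src.get(name)
        out.append(new)
        prev = row
    return out
```

Even in this small form, the sort-order requirement and the LAG behaviour across persons have to be carried over by hand, which is where conversions tend to go wrong without testing.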

If you need consulting help, send me an email. I have helped other
companies deal with a similar issue in the past.

Alan
http://www.savian.net

Savian
11/28/2009 9:32:55 PM
I agree with Alan - little tweaks are for insignificant issues. Your
boss is considering another language because he senses the real issue
is "program structure", and Base/SAS programmers very rarely use good
structured programming techniques.

You can easily "merge" three or more datasets with objects written in
SCL. I have done this with 30+ SAS and Oracle datasets.

This approach takes more code and more programming skill compared to
Base/SAS and Macro, but hey - how much time are you going to burn/
waste learning a new language? With SCL you have access to the full
set of Base/SAS and macro commands, plus a few hundred more.

I am covering this topic over on www.keystonesug.com in a series of
presentations on "object programming". Chapter #2 will be out on
December 9th and will cover Data Step replacement with object code,
although this example will NOT be optimized for performance.

:o)
montura
12/2/2009 12:25:22 PM
On Dec 2, 5:25 am, montura <montura...@gmail.com> wrote:
> I agree with Alan - little tweaks are for insignificant issues. Your
> boss is considering another language because he senses the real issue
> is "program structure", and Base/SAS programmers very rarely use good
> structured programming techniques.

SCL is a different language from Base SAS, so you will have to learn it
as well. Plus, it is not really growing. If someone were heading in that
direction from scratch, I would suggest learning C# before investing
any time in SCL. C# has a better editor, better support, 3rd-party
support, etc.

If you are going to be doing loads of matrix work, an alternate
language may be faster than SAS. The same goes for relational database
work or string handling. That said, most business problems mix and
match these issues, and SAS is a very good general-purpose business
language.

In my experience, if someone is out to displace SAS code, there are
usually deeper issues (i.e. price) than performance. They just need to
be honest and not claim that SAS is slow.

What is the real intention of this manager?

Alan
Savian
12/2/2009 4:37:29 PM