Listmembers --

I have encountered slow (hours of CPU time) performance trying to run a simple log10 transform compute statement (see below) on a moderate sized file (4788 numeric vars X 40 cases; <5Mb; stored on the local hard disk) on my desktop and laptop PCs (>1.75GHz; >425Mb RAM; Win XP Home SP1) using SPSS v.12.0.2. By contrast, I observe quite reasonable performance when I run a Student's t test (2394 condition A's vs. 2394 condition B's) on the same data.

In attempting to diagnose the problem, I've scoured Raynald Levesque's site and book, as well as SPSS's corporate site. From these sources, I've experimented with modifying (a) the workspace size -- increasing it to 400Mb, (b) the frequency of caches -- increasing it from the default n=20 to n=5000, and (c) the number of EXECUTE statements -- from 1 per 4388 COMPUTE statements to 1 per 100 COMPUTEs (see below). I have also run the syntax on a small subset of variables (1st 100) to rule out a simple syntax error; it ran.

The syntax for modifying the SPSS settings looks like:

CACHE.
SET WORKSPACE=399000.
SET CACHE 4788.
show all.

The syntax for the transformation looks like:

COMPUTE lgsf1=LG10(safe1) .
COMPUTE lgsf2=LG10(safe2) .
COMPUTE lgsf3=LG10(safe3) .
COMPUTE lgsf4=LG10(safe4) .
COMPUTE lgsf5=LG10(safe5) .
COMPUTE lgsf6=LG10(safe6) .
COMPUTE lgsf7=LG10(safe7) .
COMPUTE lgsf8=LG10(safe8) .
COMPUTE lgsafe9=LG10(safe9) .
..
..
..
COMPUTE lgth2391=LG10(thrt2391) .
COMPUTE lgth2392=LG10(thrt2392) .
COMPUTE lgth2393=LG10(thrt2393) .
COMPUTE lgth2394=LG10(thrt2394) .
EXECUTE .

As I said, I've experimented with the frequency of interspersing EXECUTE statements.

If anyone has any suggestions for either improving performance or diagnosing the problem, I would much appreciate it. Perhaps by condensing the code into a more elegant form, performance would be improved??

Thanks,
Alex Shackman

------------------------------------------------------------------
Alexander J. Shackman
Laboratory for Affective Neuroscience | W.M. Keck Laboratory for Functional Brain Imaging & Behavior
University of Wisconsin-Madison
1202 West Johnson Street
Madison, Wisconsin 53706
PH: +1 (608) 358-5025 (cell)
FAX: +1 (608) 265-2875
EMAIL: ajshackman@gmail.com
WWW: http://psyphz.psych.wisc.edu/~shackman | http://brainimaging.waisman.wisc.edu/~shackman/


1/16/2005 12:08:18 AM

"shackman@wisc.edu" <ajshackman@gmail.com> wrote in message news:1105834098.917026.66130@f14g2000cwb.googlegroups.com...

> Listmembers --
>
> I have encountered slow (hours of CPU time) performance trying to run a
> simple log10 transform compute statement (see below) on a moderate
> sized file (4788 numeric vars X 40 cases; <5Mb; stored on the local
> harddisk) on my desktop and laptop PCs (>1.75GHz; >425Mb RAM; Win XP
> Home SP1) using spss v.12.0.2. By contrast, I observe quite reasonable
> performance when I run a Student's t test (2394 condition A's vs. 2394
> condition B's) on the same data.
>
> In attempting to diagnose the problem, I've scoured Raynald Levesque's
> site and book, as well as spss's corporate site. From these sources,
> I've experimented with modifying (1) the workspace size -- increasing
> it to 400Mb, (b) frequency of caches -- increasing it from the default
> n=20 to n=5000, and number of EXECUTE statements -- from 1 per 4388
> COMPUTE statements to 1 per 100 COMPUTEs (see below). I have also
> confirmed that the syntax runs on a small subset of variables (1st 100)
> to test whether
> there was a simple syntax error.
>
> The syntax for modifying the spss settings looks like:
>
> CACHE.
> SET WORKSPACE=399000.
> SET CACHE 4788.
> show all.
>
> The syntax for the transformation looks like:
>
> COMPUTE lgsf1=LG10(safe1) .
> COMPUTE lgsf2=LG10(safe2) .
> COMPUTE lgsf3=LG10(safe3) .
> COMPUTE lgsf4=LG10(safe4) .
> COMPUTE lgsf5=LG10(safe5) .
> COMPUTE lgsf6=LG10(safe6) .
> COMPUTE lgsf7=LG10(safe7) .
> COMPUTE lgsf8=LG10(safe8) .
> COMPUTE lgsafe9=LG10(safe9) .
> .
> .
> .
> COMPUTE lgth2391=LG10(thrt2391) .
> COMPUTE lgth2392=LG10(thrt2392) .
> COMPUTE lgth2393=LG10(thrt2393) .
> COMPUTE lgth2394=LG10(thrt2394) .
> EXECUTE .
>
> As I said, I've experimented with the frequency of interspersing
> EXECUTE statements.
>
> If anyone has any suggestions for either improving performance or
> diagnosing the problem, I would much appreciate it. Perhaps by
> condensing
> the code into a more elegant form, performance would be improved??
>

WORKSPACE and CACHE have nothing to do with it. I'm virtually certain the problem is simply the number of command lines being parsed and (more importantly) echoed in the log. If I'm right, you can make it much faster just by not echoing the commands (SET PRINTBACK OFF.)

You should also be able to make it fast by replacing the long string of computes with a much shorter sequence like

vector lgsf = lgsf(2394). /* a vector of new variables
format lgsf1 to lgsf2394 (f8.5).
vector safe = safe1 to safe2394. /* vector of existing variables
loop #i = 1 to 2394.
compute lgsf(#i) = lg10(safe(#i)).
end loop.

Jonathan Fry
SPSS Inc.
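To make the size difference concrete, here is a minimal Python sketch that writes out a VECTOR/LOOP version of the transformation for both variable families in the original post. It is an illustration, not Jonathan's code: the helper name `compact_log10_syntax` is hypothetical, the generated text uses the short `VECTOR name(n)` form for new variables, and it assumes the inputs are named `safe1`..`safe2394` and `thrt1`..`thrt2394` and sit contiguously in the file so that `TO` picks up exactly those variables. The generated syntax has not been run against SPSS here.

```python
def compact_log10_syntax(pairs, n):
    """Build SPSS command lines that log10-transform src1..srcN into dst1..dstN.

    pairs: list of (source_stem, target_stem) tuples (stems are assumptions
           taken from the variable names shown in the original post).
    n:     number of variables per stem.
    """
    lines = []
    for src, dst in pairs:
        # New variables dst1..dstN, addressed through a vector named dst.
        lines.append(f"VECTOR {dst}({n}).")
        # Existing variables src1..srcN, addressed through a vector named src.
        lines.append(f"VECTOR {src} = {src}1 TO {src}{n}.")
    lines.append(f"LOOP #i = 1 TO {n}.")
    for src, dst in pairs:
        lines.append(f"COMPUTE {dst}(#i) = LG10({src}(#i)).")
    lines.append("END LOOP.")
    lines.append("EXECUTE.")
    return lines


PAIRS = [("safe", "lgsf"), ("thrt", "lgth")]
N_PER_STEM = 2394

syntax = compact_log10_syntax(PAIRS, N_PER_STEM)
# The original job: one COMPUTE per variable plus a final EXECUTE.
long_form_commands = len(PAIRS) * N_PER_STEM + 1

print("\n".join(syntax))
print(len(syntax), "commands instead of", long_form_commands)
```

Whatever the per-command overhead turns out to be, the parser then sees nine commands rather than several thousand, which is the point of the suggestion.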


1/17/2005 10:51:02 PM

On Mon, 17 Jan 2005 16:51:02 -0600, "Jonathan Fry" <jon@NOSPAM.spss.com> wrote:

>
> "shackman@wisc.edu" <ajshackman@gmail.com> wrote in message
> news:1105834098.917026.66130@f14g2000cwb.googlegroups.com...
> > Listmembers --
> >
> > I have encountered slow (hours of CPU time) performance trying to run a
> > simple log10 transform compute statement (see below) on a moderate
> > sized file (4788 numeric vars X 40 cases; <5Mb; stored on the local
> > harddisk) on my desktop and laptop PCs (>1.75GHz; >425Mb RAM; Win XP
> > Home SP1) using spss v.12.0.2. By contrast, I observe quite reasonable
> > performance when I run a Student's t test (2394 condition A's vs. 2394
> > condition B's) on the same data.

[snip, good documentation concerning attempts to fix; example of code, etc.]

>
> WORKSPACE and CACHE have nothing to do with it. I'm virtually certain the
> problem is simply the number of command lines being parsed and (more
> importantly) echoed in the log. If I'm right, you can make it much faster
> just by not echoing the commands (SET PRINTBACK OFF.)

A few thousand lines need to be echoed, once. This is trivial, isn't it?

>
> You should also be able to make it fast by replacing the long string of
> computes with a much shorter sequence like
>
> vector lgsf = lgsf(2394). /* a vector of new variables
> format lgsf1 to lgsf2394 (f8.5).
> vector safe = safe1 to safe2394. /* vector of existing variables
> loop #i = 1 to 2394.
> compute lgsf(#i) = lg10(safe(#i)).
> end loop.
>

I'm always interested in benchmarking and in what affects performance. The above looks like it could help.

But, Jon, and everyone, the original post says that the task did not finish in *hours*!

I think that I can understand how this *might* arise under the control of an interpretive parser. If this is indeed the cause, I think that SPSS may want to put serious effort into improving the handling of long syntax files, for instance, by some partial compilation of commands.

1) When SPSS reads 2394 commands with unique variable names, does it have to search the original variable list, in order, to find the variable to operate on? I remember horrible performance from BMDP owing to this cause, maybe 25 years ago, with a list of 200 variables. I cured it by using the variable 'number', which BMDP allowed, rather similar in essence to the cure described above. However, BMDP fixed that problem long before SPSS bought them out in 1990 or so.

2) The early version of Paradox's programmable interface had a problem with any long syntax, since it (seemingly) saved all commands as a block of text, and needed to rescan all previous commands in order to count lines, to find where each next command was. That seemed slow for hundreds of lines, on a 35 MHz computer.

While I've never had a few thousand commands for 40 cases, I have done a few hundred commands for hundreds of cases, with slower computers, and I expect less than a minute. I know that taking a log is slow, compared to some things, but it should not be that slow.

Another possibility? The present problem has two 'WorkSpaces', of a sort, that are unusually long - over 32 thousand bytes for each record, and perhaps twice that for the text of Computes. I guess I can imagine a subtle problem of 'thrashing' if those two spaces are not both completely in memory. It does seem to me that SPSS should reserve those spaces without difficulty, as Jon says. However, if Windows is finding some reason to mis-handle the allocations, could Windows invoke ridiculous amounts of paging? -- This should show up to the user as a disk-read light that stays on during that long execution.

Still curious.

--
Rich Ulrich, wpilib@pitt.edu
http://www.pitt.edu/~wpilib/index.html
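Two of the points above can be put into numbers with a short Python sketch. It is only a model, not a description of how SPSS actually resolves names: `linear_lookup_cost` is a hypothetical helper that counts comparisons if every command had to scan an ordered variable list from the front, and the record-size figure assumes 8 bytes per numeric value.

```python
def linear_lookup_cost(n_vars):
    """Total comparisons to resolve n_vars names, one per command, if each
    lookup walks an ordered list from the start (hypothetical model)."""
    # The variable in position p is found after p comparisons.
    return sum(position for position in range(1, n_vars + 1))


# Cost grows roughly with the square of the number of variables.
cost_1000 = linear_lookup_cost(1000)   # 500500 comparisons
cost_5000 = linear_lookup_cost(5000)
ratio = cost_5000 / cost_1000

# Record size for the file in the original post, assuming 8-byte numerics.
BYTES_PER_NUMERIC = 8                  # assumption
record_bytes = 4788 * BYTES_PER_NUMERIC  # 38304 bytes

print(f"5x the variables -> about {ratio:.1f}x the comparisons")
print(f"one case occupies {record_bytes} bytes")
```

Under this model, five times as many variables cost about twenty-five times as many comparisons, and a single case comes to 38,304 bytes, which matches the "over 32 thousand bytes for each record" estimate.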


1/18/2005 4:01:51 AM

shackman@wisc.edu <ajshackman@gmail.com> wrote:

> Listmembers --
> I have encountered slow (hours of CPU time) performance trying to run a
> simple log10 transform compute statement (see below) on a moderate
> sized file (4788 numeric vars X 40 cases; <5Mb; stored on the local
> harddisk) on my desktop and laptop PCs (>1.75GHz; >425Mb RAM; Win XP
> Home SP1) using spss v.12.0.2. By contrast, I observe quite reasonable
> performance when I run a Student's t test (2394 condition A's vs. 2394
> condition B's) on the same data.

..............snip, snip

Jonathan and Rich's comments made me curious. It clearly does seem to be a problem of SPSS's interpreter. I duplicated the OP's problem for 1000 and 5000 variables and found:

1) Running per the OP's approach, with 1000s of syntax statements, took about 10 sec. for 1000 variables, and about 270 sec for 5000.

2) Using DO REPEAT

do repeat x = x1 to x5000/y = y1 to y5000 .
compute y = lg10(x) .
etc.

took about 2.5 sec for 1000 variables, and about 100 sec for 5000 variables.

3) Loop, per Jonathan's suggestion, was essentially instantaneous for 1000 variables, and took about 1.5 sec. for 5000 variables.

Some comments: SPSS *does* appear to be seriously interpretation bound. I would never have guessed how much this was true. What is more surprising to me is the nonlinearity in degradation of performance. Why should it take 27 times as long to run 5 times as many syntax statements, per 1), or 40 times as long to run a do repeat that is 5X as long? I'm surprised, in fact, that the do repeat is any faster than just straight syntax because I had generally presumed that do repeat simply amplified the flow of syntax fed to the SPSS "engine."

--
=-=-=-=-=-=-=-=-=-==-=-=-=
Mike Lacy, Ft Collins CO 80523
Clean out the 'junk' to email me.
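One way to summarize the nonlinearity is to ask what exponent k would make time proportional to n^k pass through each pair of timings. The Python sketch below does that arithmetic; `scaling_exponent` is a hypothetical helper, and with only two rough measurements per method the result is an indication rather than a fit.

```python
import math


def scaling_exponent(n1, t1, n2, t2):
    """Exponent k such that t is proportional to n**k through both points."""
    return math.log(t2 / t1) / math.log(n2 / n1)


# Timings reported above, in seconds, for 1000 and 5000 variables.
k_compute = scaling_exponent(1000, 10.0, 5000, 270.0)    # separate COMPUTEs
k_do_repeat = scaling_exponent(1000, 2.5, 5000, 100.0)   # DO REPEAT

print(f"separate COMPUTE statements: k is about {k_compute:.2f}")
print(f"DO REPEAT:                   k is about {k_do_repeat:.2f}")
```

Both exponents come out a little above 2, so the cost per statement appears to grow with the number of statements already processed, rather than staying constant. The LOOP timings are left out because only one of the two was measurable.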


1/18/2005 7:06:04 PM

Michael.Lacy.junk@colostate.edu wrote:

> Jonathan and Rich's comments made me curious. It clearly
> does seem to be a problem of SPSS's interpreter. I duplicated
> the OP's problem for 1000 and 5000 variables and found:
> 1) Running per the OP's approach, with 1000s of syntax statements,
> took about 10 sec. for 1000 variables, and about 270 sec for 5000 .

..... snip, snip of my own material

I did the preceding with PRINTBACK ON. (I had neglected Jonathan's comment here about that, and Rich pointed this out offline.)

So, I reran the preceding with SET PRINTBACK OFF, and got a time of 6.5 sec for 1000 variables, and 160 sec with 5000 variables. (I'm running under Windows with Viewer output, which might matter here.) Note that the scaling of the problem is still nonlinear.

I can't say as I can understand why this might be. Perhaps Jonathan can enlighten us here.

--
=-=-=-=-=-=-=-=-=-==-=-=-=
Mike Lacy, Ft Collins CO 80523
Clean out the 'junk' to email me.
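For reference, the share of run time that echoing accounts for can be worked out directly from the two sets of timings. This is plain arithmetic on the figures quoted in this post, sketched in Python; the variable names are illustrative only.

```python
# Seconds for the separate-COMPUTE approach, keyed by number of variables.
printback_on = {1000: 10.0, 5000: 270.0}
printback_off = {1000: 6.5, 5000: 160.0}

# Fraction of the PRINTBACK ON time that disappears with PRINTBACK OFF.
saved = {n: 1 - printback_off[n] / printback_on[n] for n in printback_on}

# How much slower 5000 variables are than 1000 with echoing switched off.
slowdown_off = printback_off[5000] / printback_off[1000]

for n in sorted(saved):
    print(f"{n} variables: PRINTBACK OFF saves about {saved[n]:.0%}")
print(f"5x the variables still takes about {slowdown_off:.1f}x as long")
```

Echoing accounts for roughly a third to two fifths of the time, and the remaining time still grows by a factor close to 25 (that is, 5 squared) for five times the statements, so echoing alone does not explain the nonlinearity.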


1/18/2005 9:16:31 PM

On 18 Jan 2005 14:16:31 -0700, Michael.Lacy.junk@colostate.edu wrote:

> Michael.Lacy.junk@colostate.edu wrote:
> > Jonathan and Rich's comments made me curious. It clearly
> > does seem to be a problem of SPSS's interpreter. I duplicated
> > the OP's problem for 1000 and 5000 variables and found:
> >
> > 1) Running per the OP's approach, with 1000s of syntax statements,
> > took about 10 sec. for 1000 variables, and about 270 sec for 5000 .
> >
> .... snip, snip of my own material
>
> I did the preceding with PRINTBACK ON. (I had neglected Jonathan's
> comment here about that, and Rich pointed this out offline.)
>
> So, I reran the preceding with SET PRINTBACK OFF, and got
> a time of 6.5 sec for 1000 variables, and 160 sec with 5000 variables.
> (I'm running under Windows with Viewer output, which might matter
> here.) Note that the scaling of the problem is still nonlinear.
>
> I can't say as I can understand why this might be. Perhaps
> Jonathan can enlighten us here.

This is a little slow, sure, but it is not the grave PROBLEM originally cited -- far slower, on a fast PC. Just to be sure, I looked back at what I quoted from the Original Post --

Original
> > > > I have encountered slow (hours of CPU time) performance trying to run a
> > > > simple log10 transform compute statement (see below) on a moderate
> > > > sized file (4788 numeric vars X 40 cases; <5Mb; stored on the local
> > > > harddisk) on my desktop and laptop PCs (>1.75GHz; >425Mb RAM; Win XP
> > > > Home SP1) using spss v.12.0.2.

It seems to me that Mike has confirmed that a total performance time of *hours* of CPU time is outside of the scope of Jon's explanation. And, if I read it right, SPSS performs it that slowly on two computers, so the explanation is not just "one screwed up installation" of SPSS and/or Windows.

Unless the OP was combining the times of a hundred runs? However, he was generally excellently specific in his documentation.

--
Rich Ulrich, wpilib@pitt.edu
http://www.pitt.edu/~wpilib/index.html
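A rough extrapolation supports this reading. The Python sketch below fits a power law through Mike's two timings for each PRINTBACK setting and evaluates it at 4788 statements, the size of the original job. The helper names are hypothetical, and the estimate assumes the original poster's machines behave broadly like Mike's, which is not established anywhere in the thread.

```python
import math


def fit_power_law(n1, t1, n2, t2):
    """Return (a, k) such that t = a * n**k passes through both points."""
    k = math.log(t2 / t1) / math.log(n2 / n1)
    a = t1 / n1 ** k
    return a, k


def predict(a, k, n):
    """Predicted run time in seconds for n statements."""
    return a * n ** k


N_ORIGINAL = 4788  # COMPUTE statements in the original job

a_on, k_on = fit_power_law(1000, 10.0, 5000, 270.0)    # PRINTBACK ON
a_off, k_off = fit_power_law(1000, 6.5, 5000, 160.0)   # PRINTBACK OFF

t_on = predict(a_on, k_on, N_ORIGINAL)
t_off = predict(a_off, k_off, N_ORIGINAL)

print(f"PRINTBACK ON:  about {t_on / 60:.1f} minutes")
print(f"PRINTBACK OFF: about {t_off / 60:.1f} minutes")
```

Both estimates land at a few minutes, well short of an hour, so parsing and echoing as measured here would not by themselves account for hours of CPU time.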


1/18/2005 10:06:14 PM