One of my colleagues claimed that "FORTRAN would be way faster than MATLAB for doing Monte Carlo simulation".

Here we're interested in generating Compound Poisson random variables - that is, random variables of the form X1+...+XN where N is random with the Poisson distribution.  I tested various codes for generating 1 million Poisson(10)-Lognormal(0,1) samples and with careful vectorization brought the MATLAB run time down from 200+ sec to about 1.2 sec, not far off the theoretical limit of about 0.8 sec imposed by randn (see FEX #26042 <http://www.mathworks.com/matlabcentral/fileexchange/26042> for details).

By comparison, SAS and Mathematica implementations took about 3 sec and 10 sec respectively, though I'm not enough of an expert to say if that is the fastest possible time in those languages.

The question is, how much further speed gain could be realized by going to the trouble of implementing an algorithm in a "faster" language such as C or FORTRAN?

Ben Petschel <noreply@nospam.org> wrote:
> One of my colleagues claimed that "FORTRAN would be way faster than
> MATLAB for doing Monte Carlo simulation".
>
> Here we're interested in generating Compound Poisson random variables -
> that is, random variables of the form X1+...+XN where N is random with the
> Poisson distribution.  I tested various codes for generating 1 million
> Poisson(10)-Lognormal(0,1) samples and with careful vectorization brought
> the MATLAB run time down from 200+ sec to about 1.2 sec, not far off the
> theoretical limit of about 0.8 sec imposed by randn (see FEX #26042
> <http://www.mathworks.com/matlabcentral/fileexchange/26042> for details).
>

So your process here was to generate 1 million Poisson samples, with mean
of 10, and then generate around 10 million Lognormal samples, and sum them?

What are you using to generate the Poisson samples?  How much of your cpu
time is spent doing Poisson, how much doing Lognormal, how much doing the
sums?
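
One way to get that breakdown is to time each stage separately. The following is a minimal sketch of the process as described above (Poisson counts, then Lognormal terms, then per-sample sums), not the FEX #26042 code. It assumes poissrnd from the Statistics Toolbox and a MATLAB release that has repelem; the variable names are illustrative only.

```matlab
% Sketch only: time the three stages of compound Poisson-Lognormal sampling.
nSamples = 1e6;    % number of compound samples
lambda   = 10;     % Poisson mean

tic;
N = poissrnd(lambda, nSamples, 1);        % counts (Statistics Toolbox)
tPoisson = toc;

tic;
X = exp(randn(sum(N), 1));                % all Lognormal(0,1) terms at once
tLognormal = toc;

tic;
owner = repelem((1:nSamples).', N);       % which sample each term belongs to
S = accumarray(owner, X, [nSamples 1]);   % samples with N == 0 sum to 0
tSum = toc;

fprintf('Poisson %.2f s, Lognormal %.2f s, sums %.2f s\n', ...
        tPoisson, tLognormal, tSum);
```

As a sanity check, mean(S) should be close to lambda*exp(0.5), about 16.5 for these parameters.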

"Fast Generation of Discrete Random Variables", Marsaglia, Tsang and Wang,
Journal of Statistical Software, vol 11, Issue 3, July 2004.

The above paper has an excellent method which is very fast.  Generation of
Poisson variates should approach the speed with which you can generate
uniform variates.
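
To illustrate why a table-based generator can cost little more than the uniform draw itself, here is a plain inverse-CDF lookup for Poisson(10). This is not the compact-table algorithm from the paper, only a simplified stand-in. The cutoff kmax = 40 is an assumption that suits a mean of 10 (the tail beyond it is negligible), and discretize needs a reasonably recent MATLAB release.

```matlab
% Sketch only: plain inverse-CDF table lookup for Poisson(lambda).
% This is NOT the compact-table method of Marsaglia, Tsang and Wang.
lambda = 10;
kmax   = 40;                         % assumed cutoff for lambda = 10
k      = 0:kmax;
pmf    = exp(-lambda + k*log(lambda) - gammaln(k + 1));
cdf    = cumsum(pmf);
cdf(end) = 1;                        % fold the remaining tail into the last bin

u = rand(1e6, 1);                    % one uniform per Poisson variate
N = discretize(u, [0 cdf]) - 1;      % bin index minus 1 gives the count
```

The table is built once, so the per-variate work is one uniform draw plus one bin search.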

> By comparison, SAS and Mathematica implementations took about 3 sec and
> 10 sec respectively, though I'm not enough of an expert to say if that is
> the fastest possible time in those languages.
>
> The question is, how much further speed gain could be realized by going
> to the trouble of implementing an algorithm in a "faster" language such as
> C or FORTRAN?

If you want to go multi-threaded, potentially quite a bit.  Even if not, if
the answer to my question above is that Poisson takes much longer than
generating the same number of Uniform variates, then also quite a bit.

I have C implementations available, but don't do Fortran myself.  I doubt
that you would see much difference at all between C and Fortran for this,
but there is certainly scope for improvement over what is available in
MATLAB, SAS or Mathematica.

--
Dr Tristram J. Scott
Energy Consultant
```

tristram.scott@ntlworld.com (Tristram Scott) wrote in message <7fpUm.6811$6O1.2081@newsfe23.ams2>...
> What are you using to generate the Poisson samples?  How much of your cpu
> time is spent doing Poisson, how much doing Lognormal, how much doing the
> sums?
>
> "Fast Generation of Discrete Random Variables", Marsaglia, Tsang and Wang,
> Journal of Statistical Software, vol 11, Issue 3, July 2004.
>
> The above paper has an excellent method which is very fast.  Generation of
> Poisson variates should approach the speed with which you can generate
> uniform variates.

Tristram, thanks for the suggestions!

Marsaglia et al's compact table lookup algorithm is implemented in RANDRAW (FEX #7309) and using this instead of POISSRND reduced the total runtime from 3 sec to 1.2 sec.

The final split was about 0.2sec for Poisson, 0.8sec for exp(randn) and 0.2 sec for the aggregation.  Given that the built-ins randn/exp/plus are already highly optimized, could a C implementation be any faster than about 1 second without multithreading?

Sounds like multithreading is the way to go!

Regards,
Ben
```

Ben Petschel <noreply@nospam.org> wrote:
> Marsaglia et al's compact table lookup algorithm is implemented in
> RANDRAW (FEX #7309) and using this instead of POISSRND reduced the total
> runtime from 3 sec to 1.2 sec.
>

I haven't looked at the implementation within RANDRAW.  Is it using the
same algorithm for the underlying uniform generation as you are comparing
it with in POISSRND?  I use the Mersenne Twister throughout.

> The final split was about 0.2sec for Poisson, 0.8sec for exp(randn) and
> 0.2 sec for the aggregation.  Given that the built-ins randn/exp/plus are
> already highly optimized, could a C implementation be any faster than about
> 1 second without multithreading?

I'm not sure without actually trying this out, but potentially there would
be room for improvement.  How important is this to you?

The exp() is likely taking as much cpu time as the randn:
>> n = 1e6;
>> t = cputime;x = rand(10,n);t(end+1) = cputime;
>> y = exp(x);t(end+1) = cputime;z = sum(y);t(end+1) = cputime;
>> diff(t)

ans =

1.1400    1.6600    0.2500

So, 1.14 seconds for generating 10 million uniform variates (the snippet
calls rand rather than randn), 1.66 to exp them, and 0.25 to sum them.

I guess it might be possible to adapt Marsaglia's Ziggurat method to cope
with non-monotone PDFs, such as the Lognormal has, but I am not sure it
would be a cheap method in the end.   This would potentially allow you to
avoid calling exp() 10 million times.

Do you ask for exp(randn()), or do you ask for randn() and then take the
exp()?  This is quite a large array of data, so there might be efficiencies
in avoiding allocating intermediate storage.

When you take the sums, are you doing this down columns, rather than across
rows?

>> t = cputime;x = rand(n,10);t(end+1) = cputime;
>> y = exp(x);t(end+1) = cputime;z = sum(y,2);t(end+1) = cputime;
>> diff(t)

ans =

1.1400    1.6200    0.4600

Across rows is taking 0.46 seconds instead of 0.25.
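
The same comparison can be repeated with timeit, which reports wall-clock time over several runs and so avoids the ambiguity of cputime on a multicore machine. This is only a sketch of the measurement; the names yCols and yRows are illustrative, and the numbers will differ by machine and MATLAB release.

```matlab
% Sketch only: compare summing down columns with summing across rows.
n = 1e6;
yCols = exp(rand(10, n));     % one sample per column, as in the first test
yRows = yCols.';              % same values, one sample per row

tCols = timeit(@() sum(yCols, 1));   % sum down each column
tRows = timeit(@() sum(yRows, 2));   % sum across each row
fprintf('down columns %.3f s, across rows %.3f s\n', tCols, tRows);
```

MATLAB stores arrays column-major, so summing down columns reads memory contiguously, which is the likely reason for the gap.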

>
> Sounds like multithreading is the way to go!
>

It certainly can be, but there is always overhead in handling threads and
waiting for them all to finish etc.  If you are looking at 1 sec for this
part of the code, I would guess that probably the thread overhead is going
to take a big chunk out of your potential gains.  But, if you are doing
this chunk of code many times over within some other loops, all Monte Carlo
fashion, then multi-threading should be able to make a big difference.
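
Within MATLAB itself, the nearest equivalent is to split the work into independent batches with parfor. The sketch below assumes the Parallel Computing Toolbox (without it the loop simply runs serially), plus poissrnd and repelem; the batch count and batch size are hypothetical values chosen to total 1 million samples.

```matlab
% Sketch only: split the simulation into independent batches.
nBatches = 8;          % hypothetical batch count
nPer     = 125000;     % samples per batch (8 * 125000 = 1e6)
lambda   = 10;
parts    = cell(nBatches, 1);

tic;
parfor b = 1:nBatches
    N = poissrnd(lambda, nPer, 1);               % counts for this batch
    X = exp(randn(sum(N), 1));                   % Lognormal(0,1) terms
    owner = repelem((1:nPer).', N);              % owner index of each term
    parts{b} = accumarray(owner, X, [nPer 1]);   % per-sample sums
end
S = vertcat(parts{:});
tTotal = toc;          % includes pool start-up if no pool is open yet
```

For a job that takes about 1 second serially, the start-up and data-transfer overhead may well exceed the gain, which is the caveat made above.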

--
Dr Tristram J. Scott
Energy Consultant
```

```Not much to add here. But one must be careful when using cputime on a
multicore processor.
---Bob.

```

tristram.scott@ntlworld.com wrote:
> Ben Petschel wrote:
> > The final split was about 0.2sec for Poisson, 0.8sec for exp(randn) and
> > 0.2 sec for the aggregation.  Given that the built-ins randn/exp/plus are
> > already highly optimized, could a C implementation be any faster than about
> > 1 second without multithreading?
>
> I'm not sure without actually trying this out, but potentially there would
> be room for improvement.  How important is this to you?

I guess if there were only a 20% speedup, or even 50%, it wouldn't be worth the hassle of implementing in another language.