parallel processing

Hi, 

I'm not quite sure what 'feature' i'm looking for ... any input
appreciated.

I want to parallelize a particular task.

#!/usr/bin/perl -w
use strict;
my @target;
our @result;

for (my $i = 0; $i < @target; $i++) {
  $result[$i] = &do_some_work($target[$i]);
}
&report_results;
....

&do_some_work requires a minute or so to complete.  @target contains
several hundred elements.  Therefore, total execution time runs in the
hundreds of minutes.

Also, @target is not ordered ... i.e. there are no dependencies within
@target ... if &do_some_work finishes processing $target[159] before
it starts (or finishes) $target[17], no problems.

I figure that if i could find a way to spawn lots of copies of
&do_some_work ... that i could reduce total execution time.  Assuming
that my machine has sufficient resources, I might even get total
execution time down to a minute or so.  This would be a major win for
me -- I would like this app to complete within ten minutes at the
outside.

What Perl 'feature' should I explore to do this?  Am I walking into
'threads' here?

--sk

Stuart Kendrick
FHCRC
skendric
1/6/2004 9:16:18 PM

skendric@fhcrc.org (Stuart Kendrick) writes:
> I'm not quite sure what 'feature' i'm looking for ... any input
> appreciated.
>
> I want to parallelize a particular task.
>
> #!/usr/bin/perl -w
> use strict;

You should probably prefer 'use warnings;' to the -w flag these days.
I still use -w, but it's mostly finger macros I haven't retrained yet.

> my @target;
> our @result;
>
> for (my $i = 0; $i < @target; $i++) {
>   $result[$i] = &do_some_work($target[$i]);
> }
> &report_results;
> ...

Ack, don't *do* that.  Specifically, don't call subs with &.  See
perlfaq7, "What's the difference between calling a function as &foo
and foo()?"

You can probably get away with just fork()ing inside do_some_work()
(note lack of '&').  'perldoc -f fork' should give you the skinny.
See also perlipc for a slightly broader view.
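Something along these lines, perhaps (an untested sketch: the
placeholder data, the reap_one() helper, and the pipe-per-child
plumbing are my own invention, not anything standard; it assumes each
child sends one short result line back to the parent):

#!/usr/bin/perl
use strict;
use warnings;

my @target = ('a' .. 'e');   # placeholder data, stands in for the real list
my @result;

my $max_kids = 4;            # cap on concurrent children; tune to taste
my %pipe_of;                 # pid => [index, read handle]

for my $i (0 .. $#target) {
    pipe(my $reader, my $writer) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {                  # child: do the work, ship it back
        close $reader;
        print {$writer} do_some_work($target[$i]), "\n";
        close $writer;
        exit 0;
    }
    close $writer;                    # parent: remember the child's pipe
    $pipe_of{$pid} = [ $i, $reader ];
    reap_one(\%pipe_of, \@result) while keys(%pipe_of) >= $max_kids;
}
reap_one(\%pipe_of, \@result) while keys %pipe_of;

sub reap_one {
    my ($pipes, $results) = @_;
    my $pid = wait();                 # block until some child exits
    my ($i, $fh) = @{ delete $pipes->{$pid} };
    chomp(my $line = <$fh>);          # assumes a result small enough to
    close $fh;                        # fit in the pipe buffer
    $results->[$i] = $line;
}

sub do_some_work { "processed $_[0]" }   # stand-in for the real sub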

-=Eric
-- 
Come to think of it, there are already a million monkeys on a million
typewriters, and Usenet is NOTHING like Shakespeare.
		-- Blair Houghton.
Eric
1/6/2004 9:19:56 PM
skendric@fhcrc.org (Stuart Kendrick) wrote:
> I want to parallelize a particular task.
> 
> #!/usr/bin/perl -w
> use strict;
> my @target;
> our @result;

Why 'our'?

> for (my $i = 0; $i < @target; $i++) {

  for my $i (0..$#target) {

or, better,
  push @result, do_some_work($_) for @target;

>   $result[$i] = &do_some_work($target[$i]);

Don't call subs with &.

> }
> &report_results;
> ...
> 
> &do_some_work requires a minute or so to complete.  @target contains
> several hundred elements.  Therefore, total execution time runs in the
> hundreds of minutes.
> 
> Also, @target is not ordered ... i.e. there are no dependencies within
> @target ... if &do_some_work finishes processing $target[159] before
> it starts (or finishes) $target[17], no problems.
> 
> I figure that if i could find a way to spawn lots of copies of
> &do_some_work ... that i could reduce total execution time.

This will only help if either your machine has more than one processor
or do_some_work spends time doing nothing: say, waiting for results
from the network. If the task is pure computation, multi-threading on
a single-processor machine will increase the time taken to
complete, due to threading overheads.

>  Assuming that my machine has sufficient resources, I might even get
> total execution time down to a minute or so.  This would be a major
> win for me -- I would like this app to complete within ten minutes
> at the outside.  

> What Perl 'feature' should I explore to do this?
> Am I walking into 'threads' here?

Yup. Probably 'async'. Make sure you are using a post-5.8.0 perl, and
read perldoc perlthrtut. If your tasks really are independent, about
the only tricky bit should be making sure all the threads have
finished before reporting the results.
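For instance (untested, and the data is made up; with several hundred
targets you may want to spawn in batches rather than one thread per
element, but the shape is the same):

#!/usr/bin/perl
use strict;
use warnings;
use threads;

my @target = ('a' .. 'e');   # made-up data, stands in for the real list

# spawn one thread per element; each returns its value to join()
my @workers = map { my $t = $_; async { do_some_work($t) } } @target;

# make sure every thread has finished before reporting
my @result = map { $_->join() } @workers;

print "$_\n" for @result;    # stands in for report_results()

sub do_some_work { "processed $_[0]" }   # stand-in for the real sub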

Ben

-- 
Heracles: Vulture! Here's a titbit for you / A few dried molecules of the gall
   From the liver of a friend of yours. / Excuse the arrow but I have no spoon.
(Ted Hughes,        [ Heracles shoots Vulture with arrow. Vulture bursts into ]
 /Alcestis/)        [ flame, and falls out of sight. ]         ben@morrow.me.uk
Ben
1/6/2004 9:32:37 PM
Hello Stuart,

> I'm not quite sure what 'feature' i'm looking for ... any input
> appreciated.
> 
> I want to parallelize a particular task.

Have a look at the documentation for the Parallel::ForkManager
module, we've used it to great effect for certain tasks.
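The basic pattern is something like this (just a sketch with made-up
data; bear in mind the children are separate processes, so any results
you want back in the parent have to travel via a file, a pipe, or
similar):

#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;

my @target = ('a' .. 'e');                # made-up data

my $pm = Parallel::ForkManager->new(10);  # at most 10 children at once

foreach my $t (@target) {
    $pm->start and next;    # parent comes back here; child falls through
    do_some_work($t);       # runs in the child process
    $pm->finish;            # child exits here
}
$pm->wait_all_children;     # block until every child is done

sub do_some_work { print "processed $_[0]\n" }   # stand-in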

Hope this helps,

Simon Taylor

Simon
1/6/2004 11:04:15 PM
Stuart Kendrick wrote:
> I want to parallelize a particular task.

Forking multiple child processes is very easily done with the help of
the CPAN module Parallel::ForkManager.

-- 
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Gunnar
1/6/2004 11:08:47 PM
It was a dark and stormy night, and Stuart Kendrick managed to scribble:

> Hi,
> 
> I'm not quite sure what 'feature' i'm looking for ... any input
> appreciated.
> 
> I want to parallelize a particular task.
> 

Depending on the task, it may not run any faster unless you have more than 1 CPU.

gtoomey
Gregory
1/7/2004 5:27:48 AM
thanx for all the input.  turns out that the parallel processes needed
read/write access to data structures within the main process ... so i
used threads and threads::shared.  thanx also for the stylistic
pointers ... i'm pulling out & and -w from my scripts now.

i'm pleased with the result ...
http://www.skendric.com/device/Cisco/shutdown-network ... a script
which disables the access layer of our network in about a minute,
thanx to the use of threading ... one thread per ethernet switch.  i
hope we'll never use it ... but in the event of a catastrophic worm
infection, i'm going to be real grateful that i have this tool
available to me.
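fwiw, the core of it boils down to something like this (a trimmed-down
sketch, not the actual script ... shutdown_switch and the switch names
are stand-ins):

#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

my @status : shared;                 # shared array, visible to all threads
my @switches = ('sw-a1', 'sw-a2');   # stand-in switch names

my @workers = map {
    my $i = $_;
    threads->create(sub {
        my $r = shutdown_switch($switches[$i]);   # stand-in sub
        lock(@status);               # serialize writes to the shared array
        $status[$i] = $r;
    });
} 0 .. $#switches;

$_->join() for @workers;             # wait for every switch to finish
print "$_\n" for @status;

sub shutdown_switch { "disabled $_[0]" }          # stand-in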

--sk

Stuart Kendrick
FHCRC
skendric
1/21/2004 4:22:50 PM