
#### Parallel processing

```
Hello,

I'm trying to implement a parallel process, but I'm not sure how to set it up.  Could someone help me, please?

My code resembles,

for i=1:N  % N approx = 10^6
    x_new = my_func(x_old);
    A(:,i) = x_new;
    x_old = x_new;
end

where x_old and x_new are Mx1 vectors and A is an MxN matrix.  From one iteration to the next, the only dependence is x_new(i) on x_old(i).  I'd like to parallelise this by splitting the elements of x_new and x_old across more than one core.  Is there a way I can, say,

matlabpool open 5
<use core's local copy of x_new, x_old, A>
for i=1:N  % N approx = 10^6
    x_new = my_func(x_old);   % now x_new and x_old are M/5 x 1 vectors
    A(:,i) = x_new;
    x_old = x_new;
end
<retrieve local copies>

Even if someone just tells me what the commands are, I'd be grateful since that'll be enough for me to look them up.

Thanks!

PLH.
```
PLH
1/11/2011 10:23:05 AM
comp.soft-sys.matlab

```
"PLH " <paulhalkyard@googlemail.com> wrote in message
news:ighb28$hkg$1@fred.mathworks.com...
*snip*
> Even if someone just tells me what the commands are, I'd be grateful since
> that'll be enough for me to look them up.

I think the command you're looking for is PARFOR.

--
Steve Lord
slord@mathworks.com
comp.soft-sys.matlab (CSSM) FAQ: http://matlab.wikia.com/wiki/FAQ
http://www.mathworks.com

```
Steven_Lord
1/11/2011 2:46:15 PM
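For reference, `parfor` (in the Parallel Computing Toolbox) distributes loop iterations across the pool's workers, but only when the iterations are independent of one another. A minimal sketch, assuming a pool is already open and using a hypothetical per-iteration function `f`:

```matlab
% parfor runs iterations in arbitrary order on the pool's workers,
% so no iteration may depend on the result of another.
N = 100;
A = zeros(3, N);
parfor i = 1:N
    A(:,i) = f(i);   % f is a hypothetical function of the loop index only
end
```

As the rest of the thread works out, PLH's loop as posted does not satisfy this independence requirement, since each iteration reads the previous iteration's result.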
```
Thanks for your reply.

I thought parfor doesn't work if the iterations are dependent on each other, as they are here.  The i-th entry of x_old or x_new requires the (i-1)th entry.  It's the entries of the vectors x_new and x_old that are independent in my code.

The only way I can see to use parfor here is to un-vectorise my code and loop over the elements of x_new and x_old, but that doesn't feel like the right thing to do.

```
PLH
1/11/2011 3:09:05 PM
```
Sorry, just realised a typo:

The i-th ITERATION depends on the (i-1)-th ITERATION. The ENTRIES are independent.
```
PLH
1/11/2011 3:18:05 PM
```
"PLH " <paulhalkyard@googlemail.com> wrote in message
news:ighrqh$38v$1@fred.mathworks.com...
>
> I thought parfor doesn't work if the iterations are dependent on each
> other, as they are here.  The i-th entry of x_old or x_new requires the
> (i-1)th entry.  It's the entries of the vectors x_new and x_old that are
> independent in my code.

Yes, I missed the "x_old = x_new" in your code.

In this case, I think you're probably stuck, as this tight order dependence
is likely to interfere with parallelization.  You _can't_ start work on the
second element until the computations on the first element are complete, and
so on down the line.  This seems pretty serial to me.

If you're attempting to run this code in parallel to improve the execution
time, I'd say you should instead identify the bottlenecks and see if there
are other ways to improve those sections of your code.  Run a smallish
example in the Profiler (see HELP PROFILE) to identify the bottlenecks and
then focus on improving the efficiency of those sections of the code.

--
Steve Lord
slord@mathworks.com
comp.soft-sys.matlab (CSSM) FAQ: http://matlab.wikia.com/wiki/FAQ
http://www.mathworks.com

```
Steven_Lord
1/11/2011 3:39:57 PM
```
Thanks for your time with this.

I've been doing some more reading on the MathWorks site, and I came across SPMD, but I'm struggling to understand how to implement it.  I tried:

M = 500; N = 1000;

x_old = rand(1,M);
x_new = x_old;
A = zeros(M,N);

A = codistributed(A);
x_old = codistributed(x_old);
x_new = codistributed(x_new);

matlabpool open 5
spmd
    for j=1:N
        x_new = rand*x_old-rand;
        A(:,j) = x_new;
        x_old = x_new;
    end
end

I got a long error message (one from each core, I think):

??? Error using ==> spmd_feval at 8
Error detected on lab(s) 1 2 3 4 5

Error in ==> spmdtest at 17
for j=1:N

Caused by:

Attempted to access startA(2); index out of bounds because numel(startA)=1.

Error stack:
subsref.m at 99
subsasgn.m at 139

[the same error and stack repeated for each of the five labs]

Do I need to bury the for loop in a function and call that within spmd so that the counting index j is interpreted properly?

Thanks
```
PLH
1/11/2011 4:03:05 PM
```
"PLH " <paulhalkyard@googlemail.com> wrote in message
news:ighuvp$qpg$1@fred.mathworks.com...
> Thanks for your time with this.
>
> I've been doing some more reading on Mathworks, and I came across SPMD,
> but I'm struggling to understand how to implement it.  I tried:

*snip*

From your description, it sounds like what you're doing is pretty thoroughly
serial in nature; I don't think you're going to be able to parallelize it.
[It's like baking a cake -- no matter how much you may want to perform the
"mix the ingredients" and "bake the cake" steps in parallel, it's not going
to turn out well if you do.  Unless (maybe) you're working with oven-safe
mixing equipment.]

*snip*

>       x_new = rand*x_old-rand;
>       A(:,j) = x_new;
>       x_old = x_new;

This looks like a filtering operation.  If this is your actual application
or a close approximation to it, take a look at the FILTER function and see
if it's appropriate for your application.

--
Steve Lord
slord@mathworks.com
comp.soft-sys.matlab (CSSM) FAQ: http://matlab.wikia.com/wiki/FAQ
http://www.mathworks.com

```
Steven_Lord
1/11/2011 6:46:04 PM
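Steve's FILTER suggestion fits the constant-coefficient case. PLH's test draws a fresh `rand` at each step, so FILTER applies only if the coefficients are fixed; under that assumption, a sketch of the equivalence (the values of `c` and `u` here are chosen purely for illustration):

```matlab
% Constant-coefficient recurrence y(j) = c*y(j-1) + u(j).
% filter(b, a, u) with b = 1, a = [1 -c] computes exactly this.
c = 0.9;            % example coefficient
u = ones(1, 10);    % example input sequence
y = filter(1, [1 -c], u);

% The explicit loop it replaces:
y2 = zeros(1, 10);
prev = 0;
for j = 1:10
    prev = c*prev + u(j);
    y2(j) = prev;
end
% y and y2 agree to round-off.
```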
```
To my thinking, it's definitely a problem that should parallelise - although I may not have explained myself well enough to make that clear to anyone else.

I've written a classical integrator for an ensemble of particles: my code calculates the particles' trajectories using a 4th-order Runge-Kutta method.  Crucially, the particles are non-interacting.  The i-th particle (whose position is represented by the i-th entry of my x_new and x_old vectors) is completely independent of every other particle.

I believe this is a "single-program multiple data" problem, where the different initial conditions of the particles are the data and the program common to all of them is the classical integration scheme.

I've only referred to x_new and x_old but, as you've probably guessed from this, there's also a v_old and a v_new.  Those details, however, aren't important to my problem.

I suppose I could batch process my code, and mash together the results at the end.  But it seems, from what I've read at least, that spmd should be able to deal with my problem.

So, although the processing of each entry is most definitely serial, I believe I should be able to process the ensemble, i.e. the different entries of the x vectors, in a parallel fashion.

But I have no experience using spmd, and I'm struggling to get it to work.  Maybe this is because I've implemented it badly, or perhaps it's because I've misunderstood its purpose.  If you could help with this at all, I'd be very grateful.

Thanks,

PLH.
```
PLH
1/11/2011 7:10:21 PM
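One way to express the per-particle independence PLH describes is to make the *particle* index the parallel loop and keep the time integration serial inside each iteration. A sketch under assumptions: `step` is a hypothetical stand-in for one RK4 update of a single particle, and the trajectory matrix is stored time-by-particle rather than in PLH's particle-by-time layout:

```matlab
M = 500;            % number of particles
N = 1000;           % number of time steps
x0 = rand(M, 1);    % initial conditions, one per particle
A = zeros(N, M);    % column m holds the trajectory of particle m

parfor m = 1:M
    x = x0(m);
    traj = zeros(N, 1);
    for j = 1:N
        x = step(x);    % hypothetical: one serial RK4 update for particle m
        traj(j) = x;
    end
    A(:,m) = traj;      % sliced output variable, valid inside parfor
end
```

Each `parfor` iteration is self-contained (it touches only `x0(m)` and column m of `A`), which is exactly the independence `parfor` requires.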
```
"PLH " <paulhalkyard@googlemail.com> writes:

> I've been doing some more reading on Mathworks, and I came across
> SPMD, but I'm struggling to understand how to implement it.  I tried:
>
> M = 500; N = 1000;
>
> x_old = rand(1,M); x_new = x_old;
> A = zeros(M,N);
>
> A = codistributed(A);
> x_old = codistributed(x_old);
> x_new = codistributed(x_new);
>
> matlabpool open 5
> spmd
>   for j=1:N
>       x_new = rand*x_old-rand;
>       A(:,j) = x_new;
>       x_old = x_new;
>   end
> end
>
> I got a long error message (one from each core, I think):

The problem here is basically that you're creating *codistributed*
arrays outside the SPMD block and then passing them in. You should be
creating *distributed* arrays there, like so:

A = distributed(A);
x_old = distributed(x_old);
x_new = distributed(x_new);

Inside the SPMD block, these appear as codistributed arrays, i.e.:

spmd
    class(A) % returns 'codistributed' on each worker.
end

With that change, your code gives no errors. (Whether it's going to do
what you want is another matter...)

Cheers,

Edric.
```
Edric
1/12/2011 9:39:35 AM
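Applying Edric's correction to PLH's test script gives something like the following (a sketch, keeping the `matlabpool`/`spmd` interface of the MATLAB release in use at the time; the pool is opened first, since `distributed` arrays live on the pool's workers):

```matlab
M = 500; N = 1000;

x_old = rand(1, M);
x_new = x_old;
A = zeros(M, N);

matlabpool open 5

% Edric's fix: create *distributed* arrays on the client;
% inside the spmd block the workers see them as codistributed.
A = distributed(A);
x_old = distributed(x_old);
x_new = distributed(x_new);

spmd
    for j = 1:N
        x_new = rand*x_old - rand;
        A(:,j) = x_new;
        x_old = x_new;
    end
end
```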
```
Thanks for the help, Edric!  This is just a test example, so it does what I intended it to do.  But I did some other tests, and I found significant slowdown using spmd.

For those who are interested: I had implicitly assumed that 1 worker processing a 1x(pq) vector would be slower than q workers each processing a 1xp vector when p is large.  That turns out not to be true, and the scaling with respect to p is such that using one worker is faster (presumably the limit is set by some memory limitation).  Essentially, I assumed parallelising would beat vectorising, which is incorrect for the problem I'm interested in.

Anyway, thanks for the help everyone.
```
PLH
1/13/2011 10:16:05 AM