MPI beginner's question


I am looking into updating a huge existing serial code by parallelising
several of its slower subroutines (it's rather too large and complicated to
parallelise the whole code in the time I have available). Unfortunately
I've run into a problem, and any help would be very much appreciated.

I'm working on a Linux cluster with LAM/MPI, and while I have some simple
'hello world' style examples compiling and running correctly, when MPI_INIT
is called in a subroutine that runs after a section of serial code, the
serial code seems to run # times (# = number of processors used). For
example, the pseudocode below:

PROGRAM TEST
 CALL SERIAL_SUBROUTINE1
 CALL MPI_SUBROUTINE
 CALL SERIAL_SUBROUTINE2
END

SUBROUTINE SERIAL_SUBROUTINE1
   PRINT *,'serial subroutine 1'
END

SUBROUTINE MPI_SUBROUTINE
   INCLUDE 'mpif.h'
   CALL MPI_INIT(ierr)
   <MPI CODE>
   CALL MPI_FINALIZE(ierr)
END

will produce the output 'serial subroutine 1' once for each CPU used in
its execution.

Is there any way round this other than calling MPI_INIT() somewhere before
the first piece of code and carefully specifying what code should be run on
each node (e.g. specifying that all serial subroutines should be run on
node 0)?

Any help or pointers would be very much appreciated, as the thought of
going through the whole code is disturbing!



Reply Benjamin 5/19/2005 11:09:17 PM

Benjamin Mort wrote:
> [snip]

Most, if not all, implementations of MPI work by launching N copies of
your program when the MPI job begins. With LAM/MPI, for N = 4:

% lamboot hostfile
% mpirun -np 4 a.out
% lamhalt

This starts 4 separate a.out processes on the booted hosts, and they all
begin running immediately, regardless of whether or when they call
MPI_INIT.  If there's a serial section at the beginning of the program
before you call MPI_INIT, every one of the N processes will execute it.

The only thing MPI_INIT does is initialize the MPI communication
services.  If you don't use those services, your program's pre-MPI_INIT
instructions will still run.  Meanwhile, MPI's communication services
will merely be unavailable to that block of code.
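
A minimal sketch of that behaviour, assuming a compiler wrapper such as
mpif77 and a launch with "mpirun -np 4" (the program name PREINIT and the
variable names are just illustrative, not from your code). The first PRINT
should appear once per process, because each process is a complete copy of
the program starting from its first statement:

```fortran
      PROGRAM PREINIT
      INCLUDE 'mpif.h'
      INTEGER ierr, rank, nprocs
      ! Runs in every process: MPI is not involved yet
      PRINT *, 'before MPI_INIT'
      CALL MPI_INIT(ierr)
      ! Only from here on can a process find out which copy it is
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
      PRINT *, 'process', rank, 'of', nprocs
      CALL MPI_FINALIZE(ierr)
      END
```

The order in which the lines from different processes show up is not
guaranteed.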

If you want only one of your processes to do some serial work, your
program should look like this:

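      PROGRAM TEST
      INCLUDE 'mpif.h'
      INTEGER ierr, mpi_process_id
      CALL MPI_INIT(ierr)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, mpi_process_id, ierr)
      IF (mpi_process_id == 0) THEN
         CALL SERIAL_SUBROUTINE1
      END IF
      ! call MPI_BARRIER so that all processes will wait until the
      !   serial subroutine finishes before proceeding
      CALL MPI_BARRIER(MPI_COMM_WORLD, ierr)
      ! MPI_SUBROUTINE must no longer call MPI_INIT or MPI_FINALIZE
      !   itself: MPI_INIT may only be called once per process
      CALL MPI_SUBROUTINE
      ! guard the trailing serial code the same way
      IF (mpi_process_id == 0) THEN
         CALL SERIAL_SUBROUTINE2
      END IF
      CALL MPI_FINALIZE(ierr)
      END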
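
One thing to watch for with this pattern: anything the serial subroutine
computes exists only on process 0. If the parallel subroutine needs those
results, they have to be sent to the other processes first, for example
with MPI_BCAST. A minimal sketch, assuming the serial step produces a
single DOUBLE PRECISION value (the program name BCAST_SKETCH, the variable
x and the value assigned to it are hypothetical stand-ins):

```fortran
      PROGRAM BCAST_SKETCH
      INCLUDE 'mpif.h'
      INTEGER ierr, mpi_process_id
      DOUBLE PRECISION x
      CALL MPI_INIT(ierr)
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, mpi_process_id, ierr)
      x = 0.0D0
      IF (mpi_process_id == 0) THEN
         ! Stand-in for the result of the serial subroutine
         x = 42.0D0
      END IF
      ! Copy x from process 0 (the root) to every process
      CALL MPI_BCAST(x,1,MPI_DOUBLE_PRECISION,0,MPI_COMM_WORLD,ierr)
      PRINT *, 'process', mpi_process_id, 'has x =', x
      CALL MPI_FINALIZE(ierr)
      END
```

For arrays, the second argument is the number of elements rather than 1.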

    Randy

-- 
Randy Crawford   http://www.ruf.rice.edu/~rand   rand AT rice DOT edu
Reply Randy 5/20/2005 5:09:24 PM

