Child Process Exited while Making Connection to Remote Process

I'm a relative newcomer to MPI.  For my class project for my Computer
Networks class I'm running a Phylogenetic Analysis program that splits
the work out to several different processes, and I'm doing it with MPI
to get the parallelism.  After writing my code I tried it on an input
that should only require one additional processor, so I ran it with
just four processors at its disposal.

However, I got the output listed below.  Does anybody know what this
means?  I'd include my source code but it's fairly large.  I'm currently
trying to capture the problem in a smaller section of code, and will
post that to this newsgroup as soon as I get it.

---thanks,
Kevin Simonson

"You'll never get to heaven, or even to LA,
if you don't believe there's a way."
from Why Not

-------------------------------------------------------------------------------

tack:Ncl/Java_bash-2.05b$ make ParRid3
cp ParRid3.Si ParRid3.c
~clement/mpi/mpich-1.2.5.2/bin/mpicc -I~clement/mpi/mpich-1.2.5.2/mpe/include -c ParRid3.c
~clement/mpi/mpich-1.2.5.2/bin/mpicc -o ParRid3 ParRid3.o -lmpe -lm
tack:Ncl/Java_bash-2.05b$ mpirun -machinefile machines.LINUX -np 4 ParRid3
1179
1179
rm_6554:  p4_error: interrupt SIGSEGV: 11
p0_5980:  p4_error: Child process exited while making connection to remote process on apple.cs.byu.edu: 0
tack:Ncl/Java_bash-2.05b$

Reply kvnsmnsn (147) 12/14/2004 11:15:54 PM

Probably one of your child processes hit a segmentation fault (the
"SIGSEGV: 11" line reported by rm_6554) and died, and the other MPI processes
then exited when they lost contact with it.

First, try recompiling using the -mpitrace flag.  That will generate a trace of
all your MPI subroutine calls, helping you to see where your segfault lies.

Second, add lots of print statements in your code, to generate a run-time trace
of your own, especially around MPI_Init.
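
A minimal sketch of such a run-time trace in C, assuming a hypothetical
helper macro named TRACE (it is not part of MPI or MPICH).  Each message goes
to stderr and is flushed immediately, because buffered output can be lost
when a process dies from a SIGSEGV.  The MPI call sites appear only in a
comment so that the helper compiles without mpi.h.

```c
#include <stdio.h>

/* Hypothetical helper, not part of MPI: format one trace line into buf.
 * rank is the MPI rank, or -1 when it is not known yet (before MPI_Init).
 * Returns what snprintf returns (length of the formatted line). */
static int format_trace(char *buf, size_t n, int rank,
                        const char *file, int line, const char *msg)
{
    return snprintf(buf, n, "[rank %d] %s:%d: %s", rank, file, line, msg);
}

/* Print the trace line to stderr and flush at once, so the message is
 * not left sitting in a buffer if the process crashes right afterwards. */
static void trace_at(int rank, const char *file, int line, const char *msg)
{
    char buf[256];
    format_trace(buf, sizeof buf, rank, file, line, msg);
    fprintf(stderr, "%s\n", buf);
    fflush(stderr);
}

#define TRACE(rank, msg) trace_at((rank), __FILE__, __LINE__, (msg))

/* Intended call sites in an MPI program (sketch only):
 *
 *     TRACE(-1, "before MPI_Init");
 *     MPI_Init(&argc, &argv);
 *     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
 *     TRACE(rank, "after MPI_Init");
 */
```

The last trace line printed by the process that dies narrows down where the
segfault happens; adding more TRACE calls between that point and the next
expected message narrows it further.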

Also, try running a trivial MPI program that you *know* should work.  If it
doesn't, then don't debug your source code; debug your installation of MPICH
instead.  To get MPICH working, you'll need to enable rsh or ssh and, in most
cases, use some sort of shared filesystem, among several other steps.  If you
have installed MPICH incorrectly, your source code is the least of your worries.
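
One possible trivial test program is sketched below.  It uses only basic
MPI-1 calls, so it should build with the same mpicc; the file name hello.c
is an assumption made for illustration.

```c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, name_len;
    char host[MPI_MAX_PROCESSOR_NAME];

    /* Start MPI; under mpirun this is where the processes connect. */
    MPI_Init(&argc, &argv);

    /* Ask which process this is, how many there are, and where it runs. */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &name_len);

    printf("rank %d of %d running on %s\n", rank, size, host);
    fflush(stdout);

    MPI_Finalize();
    return 0;
}
```

Built with mpicc and started with the same "mpirun -machinefile
machines.LINUX -np 4" command as above, it should print one line per process.
If this program also fails, the problem is more likely in the MPICH setup
than in ParRid3.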

    Randy

-- 
Randy Crawford   http://www.ruf.rice.edu/~rand   rand AT rice DOT edu
Reply Randy 12/15/2004 12:03:34 AM

