Hello!
I'm developing a parallel version of TSP using MPI.
I started from this code:
http://www.owlnet.rice.edu/~comp422/homeworks/hw2/index.html
I modified this code to use standard c++ priority queue instead of the
priority queue implemented as liked list. I also coded a sequential
version and run some test and I was really surprised.
MPI in this implementation is really slow. For example, computing TSP
for 12 cities tooks 27.589803s for seq and 37.284650 for parallel
version.
For small TSP this difference is bigger.
If I compare data for example between 2 and 4 processor or 4 and 8 I
can see speedup.
Since it's my first MPI program I don't know exactly what to do to
improve MPI performance. I think there's a bottleneck on
communication, but how to solve?
Thanks,
Simone
|
|
0
|
|
|
|
Reply
|
bsimone
|
9/18/2004 12:09:07 AM |
|
[First of all, I have no affiliation with this source code or the Rice class.]
With 2 processes, you will see no speedup, since process 0 (Coordinator) does no
work and only coordinates the other (Worker) processes. Therefore, you should
measure overall speedup based on N-1 processes and not N processes. Ergo, you
might see some speedup starting at 3 processes, but not at 2 processes.
The simplest way to measure your communication time is to time your code using a
function like times(2) or any other user+system timer. Your MPI communication
should make up most of the 'system' time. If you add more MPI processes (and
each process does the same amount of work as before), if the system time grows,
that should be entirely due to additional time spent in MPI communication.
I've looked at this code only briefly, but in general, the only way to reduce
comm time in algorithms like this is for each process to do more work before it
sends its results back to the 'Coordinator' process. To do this, you may need
to devise a better way to partition the work so that the processes don't waste
time searching unproductively. I'd put my optimization efforts into devising a
way for each worker process to search longer (but equally as productively)
between messages to the Coordinator.
BTW, if you found that the Communicator process spent a lot of its time doing no
productive work, and you wanted to make better use of process 0, you could
change the MPI calls from blocking (MPI_Send) to nonblocking (MPI_Isend). This
would let the Communicator process work on other tasks between sending/receiving
messages. Possibly you could have it do some work on its own between messages
-- further pruning the search space or using heuristics to suggest better path
starting points for further searches by the Worker processes.
Randy
Simone wrote:
> Hello!
> I'm developing a parallel version of TSP using MPI.
> I started from this code:
> http://www.owlnet.rice.edu/~comp422/homeworks/hw2/index.html
> I modified this code to use standard c++ priority queue instead of the
> priority queue implemented as liked list. I also coded a sequential
> version and run some test and I was really surprised.
> MPI in this implementation is really slow. For example, computing TSP
> for 12 cities tooks 27.589803s for seq and 37.284650 for parallel
> version.
> For small TSP this difference is bigger.
> If I compare data for example between 2 and 4 processor or 4 and 8 I
> can see speedup.
> Since it's my first MPI program I don't know exactly what to do to
> improve MPI performance. I think there's a bottleneck on
> communication, but how to solve?
> Thanks,
> Simone
--
Randy Crawford http://www.ruf.rice.edu/~rand rand AT rice DOT edu
|
|
0
|
|
|
|
Reply
|
Randy
|
9/20/2004 4:58:19 PM
|
|
Thank you very much for your reply.
After a more accurate analysis I discovered what you said: this code
waste too much time in communications. I have also a solution I think.
In the sequential version I keep the starting city and then I search
the best path on the other n-1 cities. What I'm going to do on
parallel version is to divide problem keep the starting city, then
split the problem into n-1 subproblems and each worker process will
compute tsp on n-2 cities.
In this way communications are dramatically reduced and computation on
each node is increased.
Thank you,
Simone
Randy <joe@burgershack.com> wrote in message news:<cin27g$qs1$1@joe.rice.edu>...
> [First of all, I have no affiliation with this source code or the Rice class.]
>
> With 2 processes, you will see no speedup, since process 0 (Coordinator) does no
> work and only coordinates the other (Worker) processes. Therefore, you should
> measure overall speedup based on N-1 processes and not N processes. Ergo, you
> might see some speedup starting at 3 processes, but not at 2 processes.
>
> The simplest way to measure your communication time is to time your code using a
> function like times(2) or any other user+system timer. Your MPI communication
> should make up most of the 'system' time. If you add more MPI processes (and
> each process does the same amount of work as before), if the system time grows,
> that should be entirely due to additional time spent in MPI communication.
>
> I've looked at this code only briefly, but in general, the only way to reduce
> comm time in algorithms like this is for each process to do more work before it
> sends its results back to the 'Coordinator' process. To do this, you may need
> to devise a better way to partition the work so that the processes don't waste
> time searching unproductively. I'd put my optimization efforts into devising a
> way for each worker process to search longer (but equally as productively)
> between messages to the Coordinator.
>
> BTW, if you found that the Communicator process spent a lot of its time doing no
> productive work, and you wanted to make better use of process 0, you could
> change the MPI calls from blocking (MPI_Send) to nonblocking (MPI_Isend). This
> would let the Communicator process work on other tasks between sending/receiving
> messages. Possibly you could have it do some work on its own between messages
> -- further pruning the search space or using heuristics to suggest better path
> starting points for further searches by the Worker processes.
>
> Randy
>
>
> Simone wrote:
> > Hello!
> > I'm developing a parallel version of TSP using MPI.
> > I started from this code:
> > http://www.owlnet.rice.edu/~comp422/homeworks/hw2/index.html
> > I modified this code to use standard c++ priority queue instead of the
> > priority queue implemented as liked list. I also coded a sequential
> > version and run some test and I was really surprised.
> > MPI in this implementation is really slow. For example, computing TSP
> > for 12 cities tooks 27.589803s for seq and 37.284650 for parallel
> > version.
> > For small TSP this difference is bigger.
> > If I compare data for example between 2 and 4 processor or 4 and 8 I
> > can see speedup.
> > Since it's my first MPI program I don't know exactly what to do to
> > improve MPI performance. I think there's a bottleneck on
> > communication, but how to solve?
> > Thanks,
> > Simone
|
|
0
|
|
|
|
Reply
|
bsimone
|
9/22/2004 10:50:39 PM
|
|
|
2 Replies
519 Views
(page loaded in 0.07 seconds)
|