Hi,
I am performing a numerical simulation in Fortran with 8 solution
variables in 2D. Each processor must exchange boundary values with
neighbouring processors every time step.
Without using any special data types there are at least 8*4 = 32
MPI_SENDRECVs each time step. I have managed to construct my own
(non-contiguous) data types that encompass all the solution
variables, so that all the boundary solution variables are exchanged in
4 MPI_SENDRECVs.
These datatypes are a combination of MPI_TYPE_CONTIGUOUS,
MPI_TYPE_VECTOR and MPI_TYPE_INDEXED types, all put together in
MPI_TYPE_STRUCTs, using MPI_ADDRESS for the displacements.
Now, from what I have read/heard, this should reduce the overall
communication time, because each message sent incurs a high latency.
However, I am wondering about the efficiency of using user-defined MPI
datatypes like this. Are there some hidden costs in doing this?
Thanks,
Nym.
neverwillreply (16) | 7/23/2006 11:17:41 AM
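The latency argument in the post above can be put into numbers with a simple latency/bandwidth model. The following is only a sketch, written in Python rather than Fortran because it is plain arithmetic. The counts of 8 variables and 4 neighbours come from the post; the boundary size, word size, latency and bandwidth are assumed values chosen for illustration, and `exchange_time` is a hypothetical helper, not an MPI routine.

```python
# Simple latency/bandwidth model for one boundary exchange.
# All hardware numbers below are illustrative assumptions, not measurements.

def exchange_time(n_messages, total_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time: a fixed latency per message plus a per-byte cost."""
    return n_messages * latency_s + total_bytes / bandwidth_bytes_per_s

N_VARS = 8            # solution variables (from the post)
N_SIDES = 4           # neighbours in 2D (from the post)
CELLS_PER_SIDE = 200  # assumed boundary cells per side, per variable
BYTES_PER_VALUE = 8   # assumed double precision
LATENCY = 5e-6        # assumed per-message latency, in seconds
BANDWIDTH = 1e9       # assumed bandwidth, in bytes per second

# The same data is moved in both cases; only the message count differs.
total_bytes = N_VARS * N_SIDES * CELLS_PER_SIDE * BYTES_PER_VALUE

separate = exchange_time(N_VARS * N_SIDES, total_bytes, LATENCY, BANDWIDTH)
combined = exchange_time(N_SIDES, total_bytes, LATENCY, BANDWIDTH)

print(f"32 messages: {separate * 1e6:.1f} us")
print(f" 4 messages: {combined * 1e6:.1f} us")
print(f"saved:       {(separate - combined) * 1e6:.1f} us")
```

In this model the saving is exactly 28 message latencies, whatever the message size. The model deliberately leaves out any cost of gathering non-contiguous data inside the MPI library, which is the possible hidden cost being asked about; that would appear as an extra term that only measurement on a given platform can supply.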
As a rule of thumb, gathering all information into one single message
reduces the communication overhead. However, this might depend on the
MPI distribution, the hardware, the network, the specific kind of
problem etc., i.e. only careful testing will show whether it is worth
doing or not. In my opinion using MPI_TYPE_STRUCT and family is a very
nice feature since it often makes the program easier to digest for
other people who did not write the program.
BTW: Are you sure you cannot introduce ghost points into your problem
such that you do not have to exchange data at every time step?
Jesper
Jesper | 7/24/2006 3:25:35 AM
Hi Jesper,
Thanks for your reply,
Jesper wrote:
> As a rule of thumb, gathering all information into one single message
> reduces the communication overhead. However, this might depend on the
> MPI distribution, the hardware, the network, the specific kind of
> problem etc., i.e. only careful testing will show whether it is worth
> doing or not.
The one problem with this is that if I am writing a program that I
would like to run well on a variety of platforms (which I am), then I
obviously can't test it on all of them.
> In my opinion using MPI_TYPE_STRUCT and family is a very
> nice feature since it often makes the program easier to digest for
> other people who did not write the program.
I think I agree, although the setting up of all the different
boundaries using the different datatypes may not be that clear, but
sending them afterwards just as "TopBoundary" can be clear.
> BTW: Are you sure you cannot introduce ghost points into your problem
> such that you do not have to exchange data at every time step?
I think data exchange will have to happen every time step:
1. To set the time step (which uses a global reduction)
2. This code is in fact an Adaptive Mesh Refinement code, so in
addition to the boundary values communication, there has to be
communication of the new/removed refinement levels at the boundaries.
This has to be done every time step, and as far as I can tell it is
unavoidable.
I think it would be possible to use more ghost cells to reduce the
communication, however this would be very fiddly, as the solver has
quite a few stages in it. However, I was planning to use non-blocking
communication, so that the solution on the interior of the region is
calculated while the boundary values are communicated. I was hoping
that this would reduce the communication time.
(This plan to use non-blocking communication is also one reason for
using user-defined datatypes. When a vertical section of an array,
density(1:2, :) say (which is non-contiguous in memory), is passed to
a plain MPI_SENDRECV without custom data types, Fortran copies the
section into a contiguous block of memory, and on return it copies the
(perhaps changed) values back and deallocates the contiguous block it
used.
In addition to this perhaps being inefficient, it would not be possible
to do this for a non-blocking call: by the time the data is received,
it would be written into the memory that held the temporary contiguous
copy, and would never be copied back into the array used in the
program. The only way to avoid this, as far as I can tell, is to use a
custom data type.)
Nym.
Nym | 7/24/2006 7:31:27 AM
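The non-contiguous layout of density(1:2, :) described in the post above can be made concrete with a little index arithmetic. This is a sketch only, in Python rather than Fortran, assuming a column-major array density(nx, ny) with a small made-up size. `section_offsets` and `vector_layout` are hypothetical helpers written for this illustration; the second one mimics the pattern that MPI_TYPE_VECTOR describes with its count, blocklength and stride arguments.

```python
# Where the elements of density(1:2, :) sit in memory for a column-major
# (Fortran-order) array density(nx, ny). Offsets are in elements, 0-based.

def section_offsets(nx, ny, first_rows=2):
    """Offsets of density(1:first_rows, :) in a column-major nx-by-ny array."""
    offsets = []
    for j in range(ny):              # Fortran column j+1
        for i in range(first_rows):  # Fortran row i+1
            offsets.append(i + j * nx)
    return offsets

def vector_layout(count, blocklength, stride):
    """Offsets of `count` blocks of `blocklength` elements each,
    with consecutive block starts `stride` elements apart."""
    return [b * stride + k for b in range(count) for k in range(blocklength)]

nx, ny = 6, 4  # small hypothetical array size, for illustration only
offsets = section_offsets(nx, ny)
print(offsets)

# The same elements expressed as count=ny, blocklength=2, stride=nx.
print(offsets == vector_layout(ny, 2, nx))
```

The section is a run of 2 elements followed by a gap of nx - 2 elements, repeated ny times. A vector datatype with count = ny, blocklength = 2 and stride = nx describes that pattern in place, so no temporary contiguous copy is needed at the call site.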
Nym wrote:
> Now, from what I have read/heard, this should reduce the overall
> communication time, because each message sent incurs a high latency.
> However, I am wondering about the efficiency of using user-defined MPI
> datatypes like this. Are there some hidden costs in doing this?
Of course the handling of (non-contiguous) datatypes incurs some cost.
But if a single MPI_Sendrecv() with your non-contiguous datatype performs
worse than 4 individual MPI_Sendrecv()s, your MPI implementation is
seriously broken.
Why don't you isolate the relevant code and test it separately, or even
better, evaluate the performance of the two versions of your application?
--
Joachim Worringen, Software Architect, Dolphin Interconnect Solutions
+ reply to joachim at dolphinics dot com
+ Statements express my personal opinion, and are no official statement
of Dolphin Interconnect Solutions (www.dolphinics.com)
Joachim | 7/24/2006 8:27:30 AM