Hello everyone.
I have run into a strange issue. My OpenMP parallelized program runs on
several threads (numbers varying from 1-8 as given by OMP_NUM_THREADS),
as reported by omp_get_num_threads(). However, it appears that all
threads are running on a single CPU core, meaning that instead of a
near-perfect speedup I get some slowdown when increasing the number of
threads.
What is even stranger is that my code used to run nicely on multiple
cores, and indeed still does so with Intel Fortran on our cluster. Has anyone
experienced similar issues with OpenMP, or can you suggest somewhere to
start looking for the trouble?
(I have found that most of the program runtime is spent in BLAS
routines, so the performance difference between compilers is small. I
have been using Goto BLAS for these tests, and the Goto BLAS routines
are indeed running nicely in multicore mode.)
Cheers,
Paul.
Posted by Paul, 2/11/2011 9:33:29 AM

Paul,
What operating system and gfortran version are you using? I've seen
no such problems with gfortran 4.5.2 on Mac OSX 10.6.6 (Snow Leopard).
Al Greynolds
www.ruda-cardinal.com
Posted by Al, 2/11/2011 11:58:30 AM

On 11.02.11 12:58, Al Greynolds wrote:
> Paul,
>
> What operating system and gfortran version are you using? I've seen
> no such problems with gfortran 4.5.2 on Mac OSX 10.6.6 (Snow Leopard).
>
> Al Greynolds
> www.ruda-cardinal.com
Sorry, I forgot to mention that. Goto BLAS does not employ several cores
on my Mac OS X 10.6.6 (Snow Leopard) machine, but my own OpenMP-ified code
does parallelize nicely there. I also have gfortran 4.5.2 on the Mac, so I
agree with your conclusions for that platform.
The question was asked with respect to gfortran 4.5.1 on Fedora Linux
release 14. The processor is an Intel Core i7 2.80 GHz (we have several
similar machines with varying GHz ratings), 4 cores with hyperthreading.
Here, Goto BLAS does employ several cores, but my own OpenMP-ified code
does not parallelize. There is in fact a small performance penalty to
increasing the number of OpenMP threads, and top reports CPU usage at
100% (i.e. one full core), so I believe that all threads are running on
one core.
Paul.
Posted by Paul, 2/11/2011 12:50:02 PM

On Feb 11, 3:33 am, Paul Anton Letnes <paul.anton.let...@gmail.com>
wrote:
> However, it appears that all threads are running on a single CPU core
Threads are managed by the OS. Besides, how did you check it?
Posted by rusi_pathan, 2/11/2011 1:11:50 PM

On 11.02.11 14:11, rusi_pathan wrote:
> On Feb 11, 3:33 am, Paul Anton Letnes <paul.anton.let...@gmail.com>
> wrote:
>> However, it appears that all threads are running on a single CPU core
> Threads are managed by the OS. Besides how did you check it?
I checked it by noting that
1) 'top' in the Linux terminal reports 100% CPU usage for my program (it
reports up to 800% for other programs),
2) program execution does not speed up on the Linux server when
increasing the OMP_NUM_THREADS environment variable, while it does speed
up on my Mac laptop, and
3) omp_get_num_threads() correctly reports 1, 2, 4, 8 threads (as set by
the OMP_NUM_THREADS environment variable), but there is no speedup.
Perhaps omp_get_num_threads is reporting something other than the number
of actual threads? I do not know. I am posting here because I have no
idea where to look for fixes to this problem.
Cheers,
Paul
Posted by Paul, 2/11/2011 3:27:56 PM

> Has anyone experienced similar issues with OpenMP, or can you suggest
> somewhere to start looking for the trouble?
Try a simple OpenMP program (like a single for loop), see if you can
reproduce the issue with that, then post here your compiler version
(output of "gfortran -v") and the exact code and command line you are
using.
--
FX
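
A minimal sketch of the kind of test program suggested above, assuming gfortran with the -fopenmp flag. The program name, loop body and problem size are made up for illustration and are not taken from the original code. It prints both wall-clock time and CPU time; as far as I know, gfortran's cpu_time on Linux reports CPU time summed over all threads, so a CPU time well above the wall time hints that several cores were busy, while roughly equal values hint at serialized execution. Treat that ratio as a rough indicator only.

```fortran
program omp_smoke_test
  use omp_lib
  implicit none
  integer, parameter :: n = 50000000   ! illustrative problem size
  integer :: i
  double precision :: total, wall0, wall1
  real :: cpu0, cpu1

  total = 0.0d0
  call cpu_time(cpu0)
  wall0 = omp_get_wtime()

  ! Independent iterations combined with a reduction: this should scale
  ! with the number of threads if they really run on separate cores.
  !$omp parallel do reduction(+:total)
  do i = 1, n
     total = total + sin(dble(i))
  end do
  !$omp end parallel do

  wall1 = omp_get_wtime()
  call cpu_time(cpu1)

  print *, 'max threads  :', omp_get_max_threads()
  print *, 'sum          :', total
  print *, 'wall time (s):', wall1 - wall0
  print *, 'cpu time (s) :', cpu1 - cpu0
end program omp_smoke_test
```

Built with something like "gfortran -fopenmp omp_smoke_test.f90" and run with OMP_NUM_THREADS set to 1, 2, 4 and 8, the wall time should drop as the thread count rises if the threads are spread over several cores.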
Posted by FX, 2/11/2011 8:29:45 PM

On 11.02.11 21:29, FX wrote:
>> Has anyone experienced similar issues with OpenMP, or can you suggest
>> somewhere to start looking for the trouble?
>
> Try a simple OpenMP program (like a single for loop), see if you can
> reproduce the issue with that, then post here your compiler version
> (output of "gfortran -v") and the exact code and command line you are
> using.
This is part of the problem: the simple OpenMP loop runs on all cores.
Also, the OpenMP code I wrote works on Mac OS X with gfortran 4.5.2. The
OpenMP code also works flawlessly on Rocks Cluster Linux with Intel
Fortran v. 11.
I have now tested gfortran 4.4.5 and 4.5.1 on Fedora Linux release
14, and both of these compilers result in no speedup. I know I can't
expect anyone here to find my problem (as I have no idea where to look
myself and a simple program doesn't reproduce the error), but it would
be interesting to see if someone here has had the same experience.
Paul
Posted by Paul, 2/12/2011 9:00:40 AM

On 2/12/2011 2:00 AM, Paul Anton Letnes wrote:
> I have now tested on gfortran 4.4.5 and 4.5.1 on Fedora Linux release
> 14, and both of these compilers result in no speedup. I know I can't
> expect anyone here to find my problem (as I have no idea where to look
> myself and a simple program doesn't reproduce the error), but it would
> be interesting to see if someone here has had the same experience.
You mentioned that your code "used to run nicely on multiple cores." Is
it possible that something in the code or in your environment has
changed recently?
You might think about methodically removing pieces of your program until
it either works properly or is equivalent (for some definition thereof)
to a simple program that does work. This is guaranteed to be tedious
and time-consuming, but it has a chance of helping you isolate the problem.
Louis
Posted by Louis, 2/12/2011 12:20:57 PM

On 11 Feb, 10:33, Paul Anton Letnes <paul.anton.let...@gmail.com>
wrote:
> I have run into a strange issue. My OpenMP parallelized program runs on
> several threads (numbers varying from 1-8 as given by OMP_NUM_THREADS),
> as reported by omp_get_num_threads(). However, it appears that all
> threads are running on a single CPU core, meaning that instead of a
> near-perfect speedup I get some slowdown when increasing the number of
> threads.
OpenMP works fine on GCC, including gfortran.
Threads are managed by the operating system. If you have 8 logical
CPUs (e.g. Intel i7) and 8 threads, all CPUs could theoretically be
saturated. Thus, either the process is started with its affinity
restricted to one CPU (this could e.g. be set by the shell), or there is
something in your program that forces sequential execution.
The latter could be an OpenMP directive that specifies that only one thread
may execute a major portion of the code:
!$OMP SINGLE
!$OMP MASTER
or contention for a global mutex, e.g. an unnamed critical section:
!$OMP CRITICAL
There could also be programming mistakes such as an !$OMP PARALLEL
block without an !$OMP DO or multiple !$OMP SECTION inside. There
could also be a typo like !$OMP PARALLEL instead of !$OMP PARALLEL DO,
which will have the effect you reported.
If the program will parallelize nicely on other systems, the problem
must be related to restricted CPU affinity, which is an OS or command
shell issue. If it does not, the problem is likely related to the
OpenMP code.
If nothing else helps, I'd suggest you try to rebuild GCC and
recompile your program.
Sturla
Posted by sturlamolden, 2/12/2011 2:34:33 PM

On 12.02.11 15:34, sturlamolden wrote:
> If the program will parallelize nicely on other systems, the problem
> must be related to restricted CPU affinity, which is an OS or command
> shell issue. If it does not, the problem is likely related to the
> OpenMP code.
This is interesting reading. I am posting my one single OpenMP loop
below, so you can have a look. I do believe it is a very simple !$OMP
PARALLEL DO and, as mentioned before, it usually (not always, oddly
enough) works nicely with Intel Fortran.
It would appear that this might be an OS issue, then?
Paul
--------------------
!$omp parallel do &
!$omp private(icol, irow, til, ip1, ip2, fra, iq1, iq2, alpha_0) &
!$omp firstprivate(imin, imax) &
!$omp shared(LHS)
do icol = imin, imax
    if (icol == 1) then
        print *, 'omp num threads:', omp_get_num_threads() ! TODO Remove
    end if
    fra = reverse_lookup(icol, 1)
    iq1 = reverse_lookup(icol, 2)
    iq2 = reverse_lookup(icol, 3)
    alpha_0 = alpha_0s(iq1, iq2)
    do irow = 1, size(LHS, 1)
        til = reverse_lookup(irow, 1)
        ip1 = reverse_lookup(irow, 2)
        ip2 = reverse_lookup(irow, 3)
        LHS(irow, icol) = generate_element(pm_lhs, fra, til, iq1, &
            iq2, ip1, ip2, numerics, alphas(ip1, ip2), &
            alpha_0, fft_of_zetas, qs)
    end do
end do
!$omp end parallel do
Posted by Paul, 2/12/2011 3:06:28 PM

On 12 Feb, 16:06, Paul Anton Letnes <paul.anton.let...@gmail.com>
wrote:
> It would appear that this might be an OS issue, then?
That is hard to tell. The function "reverse_lookup" or
"generate_element" could still be synchronized with a mutex. That it
sometimes fails with Intel's compiler suggests a programming issue.
Sturla
Posted by sturlamolden, 2/12/2011 7:09:08 PM

On 12.02.11 20:09, sturlamolden wrote:
> On 12 Feb, 16:06, Paul Anton Letnes<paul.anton.let...@gmail.com>
> wrote:
>
>> It would appear that this might be an OS issue, then?
>
> That is hard to tell. The function "reverse_lookup" or
> "generate_element" could still be synchronized with a mutex. That it
> sometimes fail with Intel's compiler suggest a programming issue.
>
>
> Sturla
Just to clarify, reverse_lookup is in fact an array which is built
before entering the loop. Perhaps there is some sort of programming
issue with the generate_element() function, I will look into that
possibility. Thanks a lot for your suggestions!
Speaking of which, I guess reverse_lookup should be declared
firstprivate or shared? I didn't even think of that before.
Paul.
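
On the firstprivate-or-shared question: an array that is only read inside the parallel loop can safely be shared, whereas firstprivate is also correct but gives every thread its own copy. The sketch below is a hypothetical stand-in, not the original code: the names lookup and out and the loop body are invented for illustration. It uses default(none) so the compiler complains about any variable whose sharing was not stated explicitly.

```fortran
program shared_lookup
  implicit none
  integer, parameter :: n = 1000   ! illustrative size
  integer :: lookup(n), out(n)
  integer :: i

  ! Build the lookup table serially, before the parallel region.
  do i = 1, n
     lookup(i) = n - i + 1
  end do

  ! lookup is only read, so sharing it is safe; each thread writes
  ! distinct elements of out, so sharing out is safe as well.
  !$omp parallel do default(none) shared(lookup, out) private(i)
  do i = 1, n
     out(i) = 2 * lookup(i)
  end do
  !$omp end parallel do

  print *, 'sum of out:', sum(out)
end program shared_lookup
```

For a large table, sharing avoids the per-thread copy that firstprivate would make on entry to the parallel region.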
Posted by Paul, 2/12/2011 9:13:58 PM

On Feb 12, 6:00 am, Paul Anton Letnes <paul.anton.let...@gmail.com>
wrote:
> This is part of the problem: the simple OpenMP loop runs on all cores.
> Also, the OpenMP code I wrote works on Mac OS X gfortran 4.5.2. The
> OpenMP code also works flawlessly on Rocks cluster linux with intel
> fortran v. 11.
>
> I have now tested on gfortran 4.4.5 and 4.5.1 on Fedora Linux release
> 14, and both of these compilers result in no speedup. I know I can't
> expect anyone here to find my problem (as I have no idea where to look
> myself and a simple program doesn't reproduce the error), but it would
> be interesting to see if someone here has had the same experience.
>
> Paul
Hi,
Can you post the single loop so I can compile-link-run?
Fernando.
Posted by ftinetti, 2/14/2011 11:10:37 AM

On Feb 12, 11:34 am, sturlamolden <sturlamol...@yahoo.no> wrote:
> There could also be programming mistakes such as an !$OMP PARALLEL
> block without an !$OMP DO or multiple !$OMP SECTION inside. There
> could also be a typo like !$OMP PARALLEL instead of !$OMP PARALLEL DO,
> which will have the effect you reported.
No, in fact the effect should be something like the opposite: every
thread will do all of the work (race conditions aside).
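
A small sketch of that difference, with counters invented for illustration: under a bare !$omp parallel every thread runs the whole loop, while under !$omp parallel do the iterations are divided among the threads. The atomic updates only keep the counting itself free of races.

```fortran
program parallel_vs_parallel_do
  use omp_lib
  implicit none
  integer, parameter :: n = 8      ! illustrative iteration count
  integer :: i, nthreads, count_plain, count_do

  nthreads = 1
  count_plain = 0
  count_do = 0

  ! PARALLEL without DO: each thread executes all n iterations.
  !$omp parallel private(i) shared(count_plain, nthreads)
  !$omp single
  nthreads = omp_get_num_threads()
  !$omp end single
  do i = 1, n
     !$omp atomic
     count_plain = count_plain + 1
  end do
  !$omp end parallel

  ! PARALLEL DO: the n iterations are shared out among the threads.
  !$omp parallel do shared(count_do)
  do i = 1, n
     !$omp atomic
     count_do = count_do + 1
  end do
  !$omp end parallel do

  print *, 'threads in team         :', nthreads
  print *, 'iterations, PARALLEL    :', count_plain
  print *, 'iterations, PARALLEL DO :', count_do
end program parallel_vs_parallel_do
```

With a team of T threads, the first count should come out as n times T and the second as n, so the typo shows up as redundant work rather than as a single busy core.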
Posted by ftinetti, 2/14/2011 11:13:29 AM

On 14 Feb, 12:13, gmail-unlp <ftine...@gmail.com> wrote:
> No, in fact the effect should be something like the opposite: every
> thread will do all the job (race conditions aside).
Yes. You are right. My pun, sorry.
Sturla
Posted by sturlamolden, 2/14/2011 11:51:09 AM

On 14.02.11 12:10, gmail-unlp wrote:
> Hi,
>
> Can you post the single loop so I can compile-link-run?
>
> Fernando.
Sure! Just see below. The outer loop in a double loop is
OpenMP-parallelized. The function generate_element contains a function
generate_j which also contains a loop, so significant work is done even
in the innermost loop. Unfortunately, this loop is way too short (only
10 iterations) to usefully parallelize. I guess it would be easier to
debug that way, though.
Cheers,
Paul.
----------------------
subroutine setup_LHS(...)
! LHS is the left hand side of an equation system
(snip)
!$omp parallel do &
!$omp private(icol, irow, til, ip1, ip2, fra, iq1, iq2, alpha_0) &
!$omp firstprivate(qs, numerics, imin, imax, reverse_lookup, alpha_0s, alphas) &
!$omp shared(LHS, fft_of_zetas)
do icol = imin, imax
    if (icol == 1) then
        print *, 'omp num threads:', omp_get_num_threads() ! TODO Remove
    end if
    fra = reverse_lookup(icol, 1)
    iq1 = reverse_lookup(icol, 2)
    iq2 = reverse_lookup(icol, 3)
    alpha_0 = alpha_0s(iq1, iq2)
    do irow = 1, size(LHS, 1)
        til = reverse_lookup(irow, 1)
        ip1 = reverse_lookup(irow, 2)
        ip2 = reverse_lookup(irow, 3)
        LHS(irow, icol) = generate_element(pm_lhs, fra, til, iq1, &
            iq2, ip1, ip2, numerics, alphas(ip1, ip2), &
            alpha_0, fft_of_zetas, qs)
    end do
end do
!$omp end parallel do
Posted by paul.anton.letnes, 2/14/2011 12:18:02 PM

On Feb 14, 9:18 am, Paul Anton Letnes <paul.anton.let...@gmail.com>
wrote:
> Sure! Just see below. The outer loop in a double loop is
> OpenMP-parallelized. The function generate_element contains a function
> generate_j which also contains a loop, so significant work is done even
> in the innermost loop. Unfortunately, this loop is way too short (only
> 10 iterations) to usefully parallelize. I guess it would be easier to
> debug that way, though.
Sorry, I don't have enough time to "complete" the code with
declarations/initializations/etc. That's why I need (and asked for)
some code I can compile, link, run and play with more or less directly
from the command line. Maybe you can just assign a constant to LHS to
check for parallel behavior...
Looking at the previous posts, I was thinking along the lines of OpenMP
implementation/OS problems. Did you see anything related to taskset?
Fernando.
Posted by ftinetti, 2/14/2011 12:58:31 PM

On 14.02.11 13:58, gmail-unlp wrote:
> Sorry, I don't have enough time to "complete" the code with
> declarations/initializations/etc., that's why I need (and asked for)
> some code I can directly compile-link-run-play almost directly from
> the command line. Maybe you can just assign a constant to LHS just to
> check for parallel behavior...
>
> Looking the previous posts I was thinking in the line of OpenMP
> implementation/OS problems, did you see something related to taskset?
>
> Fernando.
I see, and I understand. However, the code is a bit complex. This is
probably part of the reason for the strange behavior! As mentioned
earlier, a simple do loop example compiles, runs and speeds up as expected.
I will consider your advice with respect to just assigning a constant to
LHS and see if that works out. I think it could be helpful!
I have never used taskset, so I have no idea how I would go about using
it in this context.
Paul.
Posted by paul.anton.letnes, 2/14/2011 1:07:15 PM

On Feb 14, 10:07 am, Paul Anton Letnes <paul.anton.let...@gmail.com>
wrote:
> I see, and I understand. However, the code is a bit complex. This is
> probably part of the reason for the strange behavior! As mentioned
> earlier, a simple do loop example compiles, runs and speeds up as expected.
>
> I will consider your advice with respect to just assigning a constant to
> LHS and see if that works out. I think it could be helpful!
>
> I have never used taskset, so I have no idea how I would go about using
> it in this context.
>
> Paul.
I see, no problem. From a previous post (I copy the text here):
> This is part of the problem: the simple OpenMP loop runs on all cores.
> Also, the OpenMP code I wrote works on Mac OS X gfortran 4.5.2. The
> OpenMP code also works flawlessly on Rocks cluster linux with intel
> fortran v. 11.
> I have now tested on gfortran 4.4.5 and 4.5.1 on Fedora Linux release
> 14, and both of these compilers result in no speedup.
I had understood that the simple loop did not work on Fedora. My
mistake, sorry.
About code complexity and parallel performance: I think code
complexity only "affects" correctness and would not by itself restrict
parallel execution to a single core, but obviously there are a lot of
details/facts I don't know.
About taskset: since I was thinking of OpenMP implementation/OS
problems, I suggested taking a look at taskset, which is used to
"retrieve or set a process's CPU affinity" (copied from
http://www.unix.com/man-page/Linux/1/taskset/).
Since threads (I think always) inherit scheduling properties from the
process in which they are created, maybe you can verify/play with the
process scheduling properties. But it's just a guess...
Fernando.
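
One way to look at the affinity from inside the Fortran program itself, rather than from the shell, is sketched below. It assumes a Linux system where /proc/self/status contains a Cpus_allowed_list entry; the program name is made up, and this is only a way to inspect the setting, not a fix. Running "taskset -p" with the process ID from the shell should report the same mask in hexadecimal form.

```fortran
program show_allowed_cpus
  implicit none
  ! Assumption: Linux procfs layout with a Cpus_allowed_list line.
  character(len=*), parameter :: status_file = '/proc/self/status'
  character(len=*), parameter :: key = 'Cpus_allowed_list:'
  character(len=512) :: line
  integer :: u, ios
  logical :: found

  found = .false.
  open(newunit=u, file=status_file, status='old', action='read', iostat=ios)
  if (ios /= 0) then
     print '(3A)', 'Could not open ', status_file, ' (not a Linux system?)'
     stop
  end if

  ! Scan the file line by line for the affinity entry.
  do
     read(u, '(A)', iostat=ios) line
     if (ios /= 0) exit
     if (index(line, key) == 1) then
        print '(A)', trim(line)
        found = .true.
        exit
     end if
  end do
  close(u)

  if (.not. found) print '(3A)', 'No ', key, ' entry found'
end program show_allowed_cpus
```

If the same lines are placed in the real program just before the parallel loop, a list naming a single CPU would point at restricted affinity, while a list covering all logical CPUs would point away from it.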
Posted by ftinetti, 2/14/2011 1:42:27 PM

In article <2c0b330a-1cfb-42d1-b597-8aacac4702a0@f36g2000pri.googlegroups.com>,
gmail-unlp <ftinetti@gmail.com> wrote:
>
>About code complexity and parallel performance, I think code
>complexity only "affects" correctness, not parallel usage to only one
>core, but obviously I don't know a looooot of details/facts.
That's more-or-less correct, in theory. Whether it is the case
in practice is less clear.
>About taskset: since I was thinking in OpenMP implementation/OS
>problems, I suggested taking a look at taskset, which is used to
>"retrieve or set a process's CPU affinity" -copied from
>http://www.unix.com/man-page/Linux/1/taskset/
>Since threads (I think always) inherit scheduling properties from the
>process in which they are created, maybe you can verify/play with the
>process scheduling properties. But it's just a guess...
This will sound a bit patronising, but you are completely lost.
Nobody except an expert (in using operating systems, primarily)
should think of using such facilities, as they don't work the way
that they appear to, and never have (on ANY operating system!)
And most experts know better than to use such facilities except
in extremis, for that reason and many others.
The most that it is worth doing is calling a little C function
that returns the TID (thread identifier), which will check that
different parts of the code are using different threads. But
DO be warned that the mapping between OpenMP threads and system
threads is not simple, not at all.
If they all give the same TID, then your problem is that it is not
starting enough threads; if they give different ones, you have a
scheduler issue, and you should stop looking at the OpenMP code
and look elsewhere. And the latter is HARD.
Regards,
Nick Maclaren.
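
A lighter-weight variant of the check described above can be sketched in Fortran alone. Note the substitution: it records OpenMP thread numbers from omp_get_thread_num() rather than operating system TIDs from a C helper, so it only shows whether the loop iterations were spread over several OpenMP threads. The program name and iteration count are invented for illustration.

```fortran
program thread_tally
  use omp_lib
  implicit none
  integer, parameter :: n = 1000   ! illustrative iteration count
  integer, allocatable :: hits(:)
  integer :: i, tid, nthreads

  nthreads = omp_get_max_threads()
  allocate(hits(0:nthreads-1))
  hits = 0

  ! Each thread increments only its own element, so no synchronization
  ! is needed for the tally.
  !$omp parallel do private(tid) shared(hits)
  do i = 1, n
     tid = omp_get_thread_num()
     hits(tid) = hits(tid) + 1
  end do
  !$omp end parallel do

  do tid = 0, nthreads - 1
     print '(A,I0,A,I0,A)', 'thread ', tid, ' ran ', hits(tid), ' iterations'
  end do
end program thread_tally
```

If only thread 0 reports any iterations, too few threads are doing work. If the iterations are spread over all threads and there is still no speedup, the threads exist but are not being scheduled on separate cores, which is the harder case described above.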
Posted by nmm12, 2/14/2011 1:43:57 PM

In article <9577de97-eabd-45d6-9af2-04a8a112e0a8@l12g2000pra.googlegroups.com>,
gmail-unlp <ftinetti@gmail.com> wrote:
>
>Yes, I think so. Btw, do you have some suggestion to the OP not so
>HARD or for experts only or for those who know better? I really don't
>have any idea, so if there is not such a suggestion I would know the
>answer is in the pile-of-things-i-don't-know. I'll keep learning,
>however, just trying to reach those places and understand a little bit
>more.
The best that I can offer is to Email the slides on a course that
I have part-written, largely on how to avoid getting into trouble
with OpenMP. But it doesn't contain ANYTHING about the actual
configuration and system usage - except general guidelines. Please
say if you would like to see it.
Regards,
Nick Maclaren.
nmm12 (898) | 2/14/2011 2:20:59 PM

On Feb 14, 10:43 am, n...@cam.ac.uk wrote:
> In article <2c0b330a-1cfb-42d1-b597-8aacac470...@f36g2000pri.googlegroups.com>,
>
> gmail-unlp <ftine...@gmail.com> wrote:
>
> >About code complexity and parallel performance, I think code
> >complexity only "affects" correctness, not parallel usage to only one
> >core, but obviously I don't know a looooot of details/facts.
>
> That's more-or-less correct, in theory.  Whether it is the case
> in practice is less clear.
>
> >About taskset: since I was thinking in OpenMP implementation/OS
> >problems, I suggested taking a look at taskset, which is used to
> >"retrieve or set a process's CPU affinity" -copied from
> >http://www.unix.com/man-page/Linux/1/taskset/
> >Since threads (I think always) inherit scheduling properties from the
> >process in which they are created, maybe you can verify/play with the
> >process scheduling properties. But it's just a guess...
>
> This will sound a bit patronising, but you are completely lost.
What do you mean by "a bit patronising"? (I understand the part where
you say I'm completely lost... ;P ...).
> Nobody except an expert (in using operating systems, primarily)
> should think of using such facilities, as they don't work the way
> that they appear to, and never have (on ANY operating system!)
I'm not exactly an expert (and I'm not able to define what an expert
is), and I've used such facilities, and they worked the way I
expected... This reminds me of someone who was told to maintain some
software with its current bugs... Maybe I'm completely lost and I'm
completely lucky! :))
> And most experts know better than to use such facilities except
> in extremis, for that reason and many others.
Again, a realm I don't know. I just suggested a very simple task: to
play a little with an OS command, just on a guess, in case nothing
else turns up.
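
The same check can be made from inside the program. A minimal sketch, assuming Linux with glibc; the name allowed_cpu_count is made up for illustration. It reads the affinity mask that `taskset -p <pid>` would report and counts the cores in it.

```c
#define _GNU_SOURCE
#include <sched.h>   /* sched_getaffinity, cpu_set_t, CPU_ZERO, CPU_COUNT */

/* Hypothetical helper: number of cores the calling thread is allowed
   to run on according to its affinity mask, or -1 on error. */
int allowed_cpu_count(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    /* A pid of 0 means "the calling thread". */
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    return CPU_COUNT(&mask);
}
```

If this returned 1 on a multicore machine, the mask would have been restricted somewhere, and all threads sharing that mask would be confined to one core. That is only one possible explanation for the symptom in this thread, not a diagnosis.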
> The most that it is worth doing is calling a little C function
> that returns the TID (thread identifier), which will check that
> different parts of the code are using different threads.  But
> DO be warned that the mapping between OpenMP threads and system
> threads is not simple, not at all.
Yes, that's why I didn't suggest that; besides, I don't know how to
call little C functions from Fortran (yes, I'm far far away from being
an expert...). About the OpenMP implementation: yes, that's a place I
don't want to be. I usually try to figure out only whether the problem
is just mine or there is some bug in the OpenMP implementation. The
answer is usually known before I start looking, but I like to find out
the reason...
> If they all give the same TID, then your problem is that it is not
> starting enough threads; if they give different ones, you have a
> scheduler issue, and you should stop looking at the OpenMP code
> and look elsewhere.  And the latter is HARD.
Yes, I think so. Btw, do you have some suggestion to the OP not so
HARD or for experts only or for those who know better? I really don't
have any idea, so if there is not such a suggestion I would know the
answer is in the pile-of-things-i-don't-know. I'll keep learning,
however, just trying to reach those places and understand a little bit
more.
Thank you very much,
Fernando.
ftinetti (148) | 2/14/2011 2:56:02 PM

On 14.02.11 15:20, nmm1@cam.ac.uk wrote:
> In article <9577de97-eabd-45d6-9af2-04a8a112e0a8@l12g2000pra.googlegroups.com>,
> gmail-unlp <ftinetti@gmail.com> wrote:
>>
>> Yes, I think so. Btw, do you have some suggestion to the OP not so
>> HARD or for experts only or for those who know better? I really don't
>> have any idea, so if there is not such a suggestion I would know the
>> answer is in the pile-of-things-i-don't-know. I'll keep learning,
>> however, just trying to reach those places and understand a little bit
>> more.
>
> The best that I can offer is to Email the slides on a course that
> I have part-written, largely on how to avoid getting into trouble
> with OpenMP. But it doesn't contain ANYTHING about the actual
> configuration and system usage - except general guidelines. Please
> say if you would like to see it.
>
>
> Regards,
> Nick Maclaren.
If you have a course on how not to get into trouble with OpenMP, I for
one am interested.
Paul.
paul.anton.letnes (55) | 2/14/2011 3:14:58 PM

On Feb 14, 12:14 pm, Paul Anton Letnes <paul.anton.let...@gmail.com>
wrote:
> On 14.02.11 15:20, n...@cam.ac.uk wrote:
>
> > In article <9577de97-eabd-45d6-9af2-04a8a112e...@l12g2000pra.googlegroups.com>,
> > gmail-unlp <ftine...@gmail.com> wrote:
>
> >> Yes, I think so. Btw, do you have some suggestion to the OP not so
> >> HARD or for experts only or for those who know better? I really don't
> >> have any idea, so if there is not such a suggestion I would know the
> >> answer is in the pile-of-things-i-don't-know. I'll keep learning,
> >> however, just trying to reach those places and understand a little bit
> >> more.
>
> > The best that I can offer is to Email the slides on a course that
> > I have part-written, largely on how to avoid getting into trouble
> > with OpenMP.  But it doesn't contain ANYTHING about the actual
> > configuration and system usage - except general guidelines.  Please
> > say if you would like to see it.
>
> > Regards,
> > Nick Maclaren.
>
> If you have a course on how not to get into trouble with OpenMP, I for
> one am interested.
>
> Paul.
Me too, and thank you,
Fernando.
ftinetti (148) | 2/14/2011 3:18:09 PM