forall Vs Do...enddo Vs Where

Hello,
I have a question about "internal optimization", or the meaning of
some constructs in Fortran 95.

I'm talking about:
- DO...ENDDO
- FORALL
- WHERE...END WHERE

I'm using a laptop with a single-core processor, Ubuntu Linux 9.10, and
gfortran 4.4.1.

Until now I was sure that FORALL and WHERE gave me a speed-up in the
computation thanks to internal optimization.

I tried running the following code; please adjust the comments depending on
the method you want to investigate:
-------------------------------
program forall
implicit none
integer :: i,j
integer :: x,y
real, allocatable :: a(:,:)
real :: ii,jj

!write(*,*) "dimensions n x m?"
!read(*,*) x,y
x=10000
y=10000
allocate(a(x,y))
call random_number(a)

do j=1,y
         do i=1,x
                 if (a(i,j)<=0.5) then
                         a(i,j) = 100
                 endif
         enddo
enddo

!forall (i=1:x,j=1:y,a(i,j)<=0.5)
!       a(i,j)=100
!end forall

!where (a<=0.5)
!	a=100
!end where
call random_number(ii)
call random_number(jj)
x=int(1+ii*x)
y=int(1+jj*y)
write(*,*) a(x,y)

end program forall
----------------------------------------

In the end I measured these times:
gauss:~/Documenti$ time -p ./for2
    100.00000
real 4.41
user 2.60
sys 0.42

gauss:~/Documenti$ time -p ./forall
    100.00000
real 11.12
user 7.12
sys 0.57

gauss:~/Documenti$ time -p ./where
    100.00000
real 4.65
user 2.90
sys 0.36

All programs were compiled with gfortran -O3.

My question is: "Are WHERE and FORALL only a convenience for the
programmer, or are there cases where they can give a performance
improvement?"
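
Editorial sketch, not part of the original post: the `time -p` figures above cover the whole run, including allocating and filling 10^8 elements with random_number. A minimal way to time only the construct under test is to bracket each form with cpu_time in a single program. The program name, the array names and the size n = 2000 are assumptions chosen for illustration, and the numbers it prints will depend on compiler, flags and machine.

```fortran
program compare_forms
  implicit none
  integer, parameter :: n = 2000      ! assumed size, smaller than the 10000 used in the post
  real, allocatable :: src(:,:), a_do(:,:), a_forall(:,:), a_where(:,:)
  integer :: i, j
  real :: t0, t1

  allocate(src(n,n), a_do(n,n), a_forall(n,n), a_where(n,n))
  call random_number(src)
  ! Every form starts from the same data
  a_do = src
  a_forall = src
  a_where = src

  ! Explicit DO loops, first index innermost
  call cpu_time(t0)
  do j = 1, n
     do i = 1, n
        if (a_do(i,j) <= 0.5) a_do(i,j) = 100.0
     end do
  end do
  call cpu_time(t1)
  print '(a,f10.4,a)', 'DO loop: ', t1 - t0, ' s'

  ! Masked FORALL construct
  call cpu_time(t0)
  forall (i = 1:n, j = 1:n, a_forall(i,j) <= 0.5)
     a_forall(i,j) = 100.0
  end forall
  call cpu_time(t1)
  print '(a,f10.4,a)', 'FORALL:  ', t1 - t0, ' s'

  ! WHERE construct
  call cpu_time(t0)
  where (a_where <= 0.5)
     a_where = 100.0
  end where
  call cpu_time(t1)
  print '(a,f10.4,a)', 'WHERE:   ', t1 - t0, ' s'

  ! The three forms must produce identical arrays
  if (all(a_do == a_forall) .and. all(a_do == a_where)) then
     print *, 'all three forms agree'
  else
     error stop 'results differ'
  end if
end program compare_forms
```

Keeping the three variants in one executable also guarantees that they see the same input, which separate runs with fresh random data do not.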

Regards

MM
1/4/2010 2:24:10 PM
comp.lang.fortran

<<--MM-->> wrote:

> !forall (i=1:x,j=1:y,a(i,j)<=0.5)
> !       a(i,j)=100
> !end forall
> 

> All programs were compiled with gfortran -O3.
> 
> My question is: "Are WHERE and FORALL only a convenience for the
> programmer, or are there cases where they can give a performance
> improvement?"
> 
This question has been debated at some length.
My personal take is that forall was adopted to stem the threat of HPF 
developing as a separate fork of a Fortran-like language, rather than 
for practical advantage.
ifort doesn't attempt to optimize a single-assignment forall unless it is
preceded by a !DIR$ IVDEP directive.  That doesn't work beyond a single
assignment, due, in part, to the peculiar meaning of forall, which
implies multiple assignments (technically not loops).
It may be difficult to optimize a rank 2 forall, particularly for
an allocatable array.
Tim
1/4/2010 3:18:39 PM
On 2010-01-04 11:18:39 -0400, Tim Prince <TimothyPrince@sbcglobal.net> said:

> <<--MM-->> wrote:
> 
>> !forall (i=1:x,j=1:y,a(i,j)<=0.5)
>> !       a(i,j)=100
>> !end forall
>> 
> 
>> All programs were compiled with gfortran -O3.
>> 
>> My question is: "Are WHERE and FORALL only a convenience for the
>> programmer, or are there cases where they can give a performance
>> improvement?"
>> 
> This question has been debated at some length.
> My personal take is that forall was adopted to stem the threat of HPF 
> developing as a separate fork of a Fortran-like language, rather than 
> for practical advantage.

I thought it was a technical fix for the limitations of array slices in array
assignment. The diagonal of a matrix is the quickest example. It came from
HPF and has other advantages, but it is basically array assignment done right,
or on steroids, particularly when combined with WHERE. As an array assignment it
can match formulae more readily, at the cost of temporary arrays that are not
overtly visible and that can be hard for compilers to optimize away.
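
Editorial sketch, not part of the original post: a plain array section such as x(1:n,1:n) selects a rectangular block, so it cannot name the diagonal on its own, whereas FORALL can express it as one indexed assignment. The example sets the diagonal both ways and checks that the results match. The program and variable names and the size n = 4 are assumptions made for illustration.

```fortran
program diagonal_demo
  implicit none
  integer, parameter :: n = 4          ! assumed small size for illustration
  real :: x_forall(n,n), x_do(n,n)
  integer :: i

  x_forall = 0.0
  x_do = 0.0

  ! FORALL form: a single assignment over the index set i = 1..n
  forall (i = 1:n) x_forall(i,i) = 1.0

  ! Equivalent explicit DO loop
  do i = 1, n
     x_do(i,i) = 1.0
  end do

  ! Both forms must give the same matrix
  if (any(x_forall /= x_do)) error stop 'forms differ'

  ! Print the matrix, four values per line
  print '(4f5.1)', x_forall
end program diagonal_demo
```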

> ifort doesn't attempt to optimize a single-assignment forall unless it is
> preceded by a !DIR$ IVDEP directive.  That doesn't work beyond a single
> assignment, due, in part, to the peculiar meaning of forall, which
> implies multiple assignments (technically not loops).
> It may be difficult to optimize a rank 2 forall, particularly for
> an allocatable array.


Gordon
1/4/2010 5:19:02 PM
Gordon Sande wrote:
> On 2010-01-04 11:18:39 -0400, Tim Prince <TimothyPrince@sbcglobal.net> 
> said:

>> This question has been debated at some length.
>> My personal take is that forall was adopted to stem the threat of HPF 
>> developing as a separate fork of a Fortran-like language, rather than 
>> for practical advantage.
> 
> I thought it was a technical fix for the limitations of array slices in
> array assignment. The diagonal of a matrix is the quickest example.
As that's your quickest example, it shows what a can of worms this is.
!$omp parallel workshare
forall(i=1:n)x(i,i)=1
!$omp end parallel workshare

is optimized by few compilers, and doesn't bring much economy of 
expression.  Equally few compilers take forall as an implicit invitation 
for threaded optimization.

The typical architectural requirement for threading to optimize this
operation, in view of the inherently high rate of DTLB misses on current
architectures, may not have been foremost among the considerations when
the syntax was originally thought up.
Tim
1/4/2010 5:59:20 PM
<<--MM-->> <no.spma@now.it> wrote:

> Hello,
> I have a question about "internal optimization", or the meaning of
> some constructs in Fortran 95.
> 
> I'm talking about:
> - DO...ENDDO
> - FORALL
> - WHERE...END WHERE
> 
> I'm using a laptop with a single-core processor, Ubuntu Linux 9.10, and
> gfortran 4.4.1.
> 
> Until now I was sure that FORALL and WHERE gave me a speed-up in the
> computation thanks to internal optimization.

Forall was not designed with optimization in mind. It was designed (in
HPF) for parallelism, and then added to the Fortran standard as part of
incorporating the syntactic parts of HPF. I don't have experience with
parallel machines to comment knowledgeably. But for serial machines, there
is little reason to expect forall to be more efficient than simple DO
loops, and there is substantial data to suggest that it is often worse,
largely because it often involves temporary arrays. I don't know why you
would think that forall was somehow inherently more optimizable than DO
loops.

Tim and Gordon discussed that a little, but there is one point which
they did not mention and which I consider fundamental. Perhaps you know
this or consider it obvious. But you did ask, and there are some people
who definitely have been confused by the point, so I feel it important
to make.

DO is a looping construct. Forall and Where are array assignments. That
is a really fundamental difference. There are cases where one can
achieve a desired result using any of the forms, but do not let that
blind you to the fundamental difference. I have seen people take
"random" DO loops and change the syntax of the DO statement to that of a
FORALL, hoping that this might improve their performance or something.
Except in special cases, this results in something that won't even
compile.

-- 
Richard Maine                    | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle           |  -- Mark Twain
nospam
1/5/2010 3:13:12 AM
Richard Maine wrote:
> <<--MM-->> <no.spma@now.it> wrote:
> 
>> Hello,
>> I have a question about "internal optimization", or the meaning of
>> some constructs in Fortran 95.
>>
>> I'm talking about:
>> - DO...ENDDO
>> - FORALL
>> - WHERE...END WHERE
>>

[CUT]

> 
> Forall was not designed with optimization in mind. It was designed (in
> HPF) for parallelism, and then added to the Fortran standard as part of
> incorporating the syntactic parts of HPF. I don't have experience with
> parallel machines to comment knowledgeably. But for serial machines, there
> is little reason to expect forall to be more efficient than simple DO
> loops, and there is substantial data to suggest that it is often worse,
> largely because it often involves temporary arrays. I don't know why you
> would think that forall was somehow inherently more optimizable than DO
> loops.
> 
> Tim and Gordon discussed that a little, but there is one point which
> they did not mention and which I consider fundamental. Perhaps you know
> this or consider it obvious. But you did ask, and there are some people
> who definitely have been confused by the point, so I feel it important
> to make.
> 
> DO is a looping construct. Forall and Where are array assignments. That
> is a really fundamental difference. There are cases where one can
> achieve a desired result using any of the forms, but do not let that
> blind you to the fundamental difference. I have seen people take
> "random" DO loops and change the syntax of the DO statement to that of a
> FORALL, hoping that this might improve their performance or something.
> Except in special cases, this results in something that won't even
> compile.
> 

I read about FORALL and WHERE in some papers/tutorials on Fortran 95.
None of them explained the real difference clearly, but the idea they
suggested was that the compiler can optimize the internal code.
I mean, in a DO loop over an array a(i,j) I normally iterate along the
fast coordinate:

do i=....
	do j=...
		a(i,j)=...
	enddo
enddo

As the software grows, I use FORALL and WHERE in order to reduce
the number of lines of code.

But now I have discovered that in this case I can lose efficiency.

Is this also true for dual-core or quad-core processors?
MM
1/5/2010 10:39:40 AM
On 2010-01-05 06:39:40 -0400, "<<--MM-->>" <no.spma@now.it> said:

> Richard Maine wrote:
>> <<--MM-->> <no.spma@now.it> wrote:
>> 
>>> Hello,
>>> I have a question about "internal optimization", or the meaning of
>>> some constructs in Fortran 95.
>>> 
>>> I'm talking about:
>>> - DO...ENDDO
>>> - FORALL
>>> - WHERE...END WHERE
>>> 
> 
> [CUT]
> 
>> 
>> Forall was not designed with optimization in mind. It was designed (in
>> HPF) for parallelism, and then added to the Fortran standard as part of
>> incorporating the syntactic parts of HPF. I don't have experience with
>> parallel machines to comment knowledgeably. But for serial machines, there
>> is little reason to expect forall to be more efficient than simple DO
>> loops, and there is substantial data to suggest that it is often worse,
>> largely because it often involves temporary arrays. I don't know why you
>> would think that forall was somehow inherently more optimizable than DO
>> loops.
>> 
>> Tim and Gordon discussed that a little, but there is one point which
>> they did not mention and which I consider fundamental. Perhaps you know
>> this or consider it obvious. But you did ask, and there are some people
>> who definitely have been confused by the point, so I feel it important
>> to make.
>> 
>> DO is a looping construct. Forall and Where are array assignments. That
>> is a really fundamental difference. There are cases where one can
>> achieve a desired result using any of the forms, but do not let that
>> blind you to the fundamental difference. I have seen people take
>> "random" DO loops and change the syntax of the DO statement to that of a
>> FORALL, hoping that this might improve their performance or something.
>> Except in special cases, this results in something that won't even
>> compile.
>> 
> 
> I read about FORALL and WHERE in some papers/tutorials on Fortran 95.
> None of them explained the real difference clearly, but the idea they
> suggested was that the compiler can optimize the internal code.
> I mean, in a DO loop over an array a(i,j) I normally iterate along the
> fast coordinate:
> 
> do i=....
> 	do j=...
> 		a(i,j)=...
> 	enddo
> enddo
> 
> As the software grows, I use FORALL and WHERE in order to reduce
> the number of lines of code.

To say it again, FORALL and WHERE are array assignments! To make it trivial,
FORALL is allowed to go from 1 to n, from n down to 1, the even indices up and
the odd indices down, or any other way it chooses. If it had n processors it
could use all n in any random order it chose.

A DO loop of

  do i = 2, n
   a(i) = a(i) + a(i-1)
  enddo

will give a progressive partial sum, but the same statement written as a FORALL
will only add adjacent values. The DO would give a different answer for
"do i = n, 2, -1", but the FORALL would not. FORALL does this by having a hidden
array temporary that might be optimized out. When the right-hand side is
complicated it can be hard for a programmer to figure out a sequential form, so
instead they just put the results into a temporary and copy the temporary at
the end. The same goes for compilers and FORALL statements.
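
Editorial sketch, not part of the original post: the difference described above can be checked on a five-element array. The forward DO loop builds a running sum, while the reversed DO loop and the FORALL both add each element to the original value of its left neighbour. The program and variable names are assumptions made for illustration.

```fortran
program order_demo
  implicit none
  integer, parameter :: n = 5
  integer :: a_fwd(n), a_rev(n), a_forall(n)
  integer :: i

  a_fwd = [(i, i = 1, n)]      ! 1 2 3 4 5
  a_rev = a_fwd
  a_forall = a_fwd

  ! Forward DO: each step sees the already updated a(i-1)
  do i = 2, n
     a_fwd(i) = a_fwd(i) + a_fwd(i-1)
  end do

  ! Reversed DO: each step still sees the original a(i-1)
  do i = n, 2, -1
     a_rev(i) = a_rev(i) + a_rev(i-1)
  end do

  ! FORALL: every right-hand side is evaluated before any store
  forall (i = 2:n) a_forall(i) = a_forall(i) + a_forall(i-1)

  print '(a,5i4)', 'forward DO: ', a_fwd      ! 1 3 6 10 15
  print '(a,5i4)', 'reverse DO: ', a_rev      ! 1 3 5 7 9
  print '(a,5i4)', 'FORALL:     ', a_forall   ! 1 3 5 7 9
end program order_demo
```

For this particular statement the reversed DO happens to match the FORALL, because no iteration reads a value that an earlier iteration has already changed.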

> But now I have discovered that in this case I can lose efficiency.
> 
> Is this also true for dual-core or quad-core processors?

Is your compiler (exactly that version from that vendor with exactly those
switches!!) going to multiprocess or not? Clearly it depends. If the compiler
comes from a vendor of parallel computers, and you have paid for the full
version and taken the vendor's courses on parallelism, then the chances go way
up. Big ifs!

Mostly, multicore allows the compiler to run at the same time as your email
program. Some I/O will be overlapped and even some music will be decoded in
parallel. But beyond that it is hard work.

There is an old saying about yachts: if you have to ask the price, you cannot
afford one! Here, if you have to ask about FORALL and WHERE, you are very likely
not to be able to use the parallel features they are intended to enable in very
special circumstances.


Gordon
1/5/2010 2:18:50 PM
<<--MM-->> wrote:

> 
> I read about FORALL and WHERE in some papers/tutorials on Fortran 95.
> None of them explained the real difference clearly, but the idea they
> suggested was that the compiler can optimize the internal code.
> I mean, in a DO loop over an array a(i,j) I normally iterate along the
> fast coordinate:
> 
> do i=....
>     do j=...
>         a(i,j)=...
>     enddo
> enddo
> 
> As the software grows, I use FORALL and WHERE in order to reduce
> the number of lines of code.
> 
> But now I have discovered that in this case I can lose efficiency.
> 
> Is this also true for dual-core or quad-core processors?
Your pseudo-code contradicts what you said.  Unless you have an 
optimizing compiler which swaps loops (and you turn on that option), 
nesting the loops backwards as you have done is likely to "lose efficiency."
Similar compiler analysis (or more) is needed to optimize a rank 2 
forall().  where() presents somewhat different challenges to optimizing 
compilers.
The point was mentioned that forall is intended to require a compiler to 
diagnose and reject some situations which might prevent parallel 
operation on multi-core.  This falls disappointingly short of actually 
facilitating parallelism.
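
Editorial sketch, not part of the original post: Fortran stores arrays in column-major order, so the first index is the "fast coordinate" and belongs in the innermost loop. The program below runs the same update with both nestings and times each with cpu_time. The names and the size n = 2000 are assumptions, and at higher optimization levels a compiler may interchange the loops itself, in which case the two timings can come out similar.

```fortran
program loop_order
  implicit none
  integer, parameter :: n = 2000      ! assumed size for illustration
  real, allocatable :: a(:,:), b(:,:)
  integer :: i, j
  real :: t0, t1

  allocate(a(n,n), b(n,n))
  call random_number(a)
  b = a

  ! First index innermost: consecutive iterations touch adjacent memory
  call cpu_time(t0)
  do j = 1, n
     do i = 1, n
        a(i,j) = 2.0*a(i,j)
     end do
  end do
  call cpu_time(t1)
  print '(a,f10.4,a)', 'i innermost: ', t1 - t0, ' s'

  ! Second index innermost: consecutive iterations are n elements apart
  call cpu_time(t0)
  do i = 1, n
     do j = 1, n
        b(i,j) = 2.0*b(i,j)
     end do
  end do
  call cpu_time(t1)
  print '(a,f10.4,a)', 'j innermost: ', t1 - t0, ' s'

  ! Both nestings compute the same values; only the access pattern differs
  if (any(a /= b)) error stop 'results differ'
end program loop_order
```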
Tim
1/5/2010 3:02:46 PM
In article <1jbt60e.k9bpesczgc7wN%nospam@see.signature>,
nospam@see.signature (Richard Maine) writes: 

> I don't know why you
> would think that forall was somehow inherently more optimizable than DO
> loops.

> DO is a looping construct. Forall and Where are array assignments. That
> is a really fundamental difference. 

Maybe that is the reason he thought it would somehow be inherently more 
optimizable.  DO implies doing things one after the other.  If the 
compiler can prove to itself that parallel execution is OK, then it can 
do that optimisation.  However, with FORALL and WHERE, there is no 
serial implication, so the compiler can perhaps optimise a bit more 
aggressively.  

helbig
1/5/2010 5:32:14 PM
In article <hhvt2u$1c0$1@online.de>,
 helbig@astro.multiCLOTHESvax.de (Phillip Helbig---undress to 
 reply) wrote:

> In article <1jbt60e.k9bpesczgc7wN%nospam@see.signature>,
> nospam@see.signature (Richard Maine) writes: 
> 
> > I don't know why you
> > would think that forall was somehow inherently more optimizable than DO
> > loops.
> 
> > DO is a looping construct. Forall and Where are array assignments. That
> > is a really fundamental difference. 
> 
> Maybe that is the reason he thought it would somehow be inherently more 
> optimizable.  DO implies doing things one after the other.  If the 
> compiler can prove to itself that parallel execution is OK, then it can 
> do that optimisation.  However, with FORALL and WHERE, there is no 
> serial implication, so the compiler can perhaps optimise a bit more 
> aggressively.  

The semantics that we all wanted back in the 80s when the next 
fortran revision (f88 :-) was being discussed was exactly what you 
say above, a looping type construct in which the order of execution 
is unspecified.  That matched the vector hardware of the time.  
Unfortunately, FORALL adds a little more, and it is that little bit 
extra that gets in the way of optimization.  In particular, the 
problem seems to be the requirement that the statement is evaluated 
"as if" everything on the right hand side is stored into a temporary 
array of the appropriate size and then assigned to the left hand 
side target array.  If the compiler can't figure out that the 
temporary array is unneeded and assigns results directly to the 
target array (which seems to be somewhere between "always" and "too 
often"), then it actually does allocate a temporary array to hold 
the intermediate results.  It is that allocation and deallocation 
that seems to be the problem with optimization of FORALL.
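
Editorial sketch, not part of the original post: the "as if" rule described above can be written out by hand. The FORALL below reads both neighbours of each element, so a naive in-place loop would pick up values that had already been overwritten; the hand-written version stores every right-hand side in an explicit temporary first and copies it back afterwards. The names and the size n = 6 are assumptions made for illustration.

```fortran
program forall_temp
  implicit none
  integer, parameter :: n = 6
  real :: a(n), b(n), tmp(n)
  integer :: i

  a = [(real(i), i = 1, n)]    ! 1 2 3 4 5 6
  b = a

  ! FORALL: all right-hand sides are evaluated before any element is stored
  forall (i = 2:n-1) a(i) = a(i-1) + a(i+1)

  ! Hand-written equivalent: evaluate into a temporary, then copy back
  do i = 2, n-1
     tmp(i) = b(i-1) + b(i+1)
  end do
  do i = 2, n-1
     b(i) = tmp(i)
  end do

  ! The two versions must agree element by element
  if (any(a /= b)) error stop 'FORALL and explicit temporary differ'
  print '(6f6.1)', a
end program forall_temp
```

When a compiler cannot prove that the temporary is unnecessary, the second form is roughly what it has to generate, including the extra storage and the extra pass over the data.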

The looping construct we wanted would have required the programmer 
to make sure that the order of execution was not important.  
Sometimes that is obvious for a statement or group of statements, 
sometimes it isn't, so this was a potential source of coding errors 
for programmers.  FORALL does the arbitrary-order part, but it 
provides the safety net of evaluation-before-assignment so that the 
programmer cannot possibly make a mistake.  It is that safety net 
that seems to be the cause of the optimization and performance 
problems.

At this point, I don't know what the best solution is.  Should a new 
DOALL construct be added that works the right way?  Should a 
compiler directive be specified somehow in the standard to tell 
FORALL to behave correctly?  There doesn't really seem to be a good 
solution to the problem.  In hindsight, the FORALL semantics was a 
bad choice, but once it was in the language it is practically 
impossible to remove it, so we are stuck with it in the language 
forever.

BTW, when FORALL was added, I thought it was what we all wanted.  I 
did not recognize that such a seemingly minor difference between 
what we really wanted and what we got would have such major 
consequences.  As a result, I tend to avoid FORALL for all but 
trivial statements.  If a loop is important to performance, I tend 
to use old fashioned DO loops, or a mixture of DO loops and simple 
array syntax.  Even if a FORALL behaves well on one compiler, you 
can't rely on it working well on the next one.

$.02 -Ron Shepard
Ron
1/6/2010 1:34:21 AM
Tim Prince wrote:
> <<--MM-->> wrote:
> 
>>
>> I read about FORALL and WHERE in some papers/tutorials on Fortran 95.
>> None of them explained the real difference clearly, but the idea they
>> suggested was that the compiler can optimize the internal code.
>> I mean, in a DO loop over an array a(i,j) I normally iterate along
>> the fast coordinate:
>>
>> do i=....
>>     do j=...
>>         a(i,j)=...
>>     enddo
>> enddo
>>
>> As the software grows, I use FORALL and WHERE in order to
>> reduce the number of lines of code.
>>
>> But now I have discovered that in this case I can lose efficiency.
>>
>> Is this also true for dual-core or quad-core processors?

> Your pseudo-code contradicts what you said. 

Yes, you are right, it was a mistake (swap i and j).
MM
1/6/2010 10:40:46 AM
On Jan 6, 2:34 am, Ron Shepard <ron-shep...@NOSPAM.comcast.net> wrote:
> In article <hhvt2u$1c...@online.de>,
>  hel...@astro.multiCLOTHESvax.de (Phillip Helbig---undress to
>  reply) wrote:
> > In article <1jbt60e.k9bpesczgc7wN%nos...@see.signature>,
> > nos...@see.signature (Richard Maine) writes:
>
> > > I don't know why you
> > > would think that forall was somehow inherently more optimizable than DO
> > > loops.
>
> > > DO is a looping construct. Forall and Where are array assignments. That
> > > is a really fundamental difference.
>
> > Maybe that is the reason he thought it would somehow be inherently more
> > optimizable.  DO implies doing things one after the other.  If the
> > compiler can prove to itself that parallel execution is OK, then it can
> > do that optimisation.  However, with FORALL and WHERE, there is no
> > serial implication, so the compiler can perhaps optimise a bit more
> > aggressively.
>
> The semantics that we all wanted back in the 80s when the next
> fortran revision (f88 :-) was being discussed was exactly what you
> say above, a looping type construct in which the order of execution
> is unspecified.  That matched the vector hardware of the time.
> Unfortunately, FORALL adds a little more, and it is that little bit
> extra that gets in the way of optimization.  In particular, the
> problem seems to be the requirement that the statement is evaluated
> "as if" everything on the right hand side is stored into a temporary
> array of the appropriate size and then assigned to the left hand
> side target array.  If the compiler can't figure out that the
> temporary array is unneeded and assigns results directly to the
> target array (which seems to be somewhere between "always" and "too
> often"), then it actually does allocate a temporary array to hold
> the intermediate results.  It is that allocation and deallocation
> that seems to be the problem with optimization of FORALL.
>
> The looping construct we wanted would have required the programmer
> to make sure that the order of execution was not important.
> Sometimes that is obvious for a statement or group of statements,
> sometimes it isn't, so this was a potential source of coding errors
> for programmers.  FORALL does the arbitrary-order part, but it
> provides the safety net of evaluation-before-assignment so that the
> programmer cannot possibly make a mistake.  It is that safety net
> that seems to be the cause of the optimization and performance
> problems.
>
> At this point, I don't know what the best solution is.  Should a new
> DOALL construct be added that works the right way?  Should a
> compiler directive be specified somehow in the standard to tell
> FORALL to behave correctly?  There doesn't really seem to be a good
> solution to the problem.  In hindsight, the FORALL semantics was a
> bad choice, but once it was in the language it is practically
> impossible to remove it, so we are stuck with it in the language
> forever.
>
> BTW, when FORALL was added, I thought it was what we all wanted.  I
> did not recognize that such a seemingly minor difference between
> what we really wanted and what we got would have such major
> consequences.  As a result, I tend to avoid FORALL for all but
> trivial statements.  If a loop is important to performance, I tend
> to use old fashioned DO loops, or a mixture of DO loops and simple
> array syntax.  Even if a FORALL behaves well on one compiler, you
> can't rely on it working well on the next one.
>
> $.02 -Ron Shepard

Is the DO CONCURRENT of Fortran 2008 what you want?

Regards,

Mike Metcalf
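
Editorial sketch, not part of the original post: for reference, this is how the masked update from the start of the thread might look with the Fortran 2008 DO CONCURRENT construct mentioned above. Unlike FORALL it is a loop with no evaluate-then-assign rule; the programmer asserts that the iterations are independent, which holds here because each iteration touches only its own element. The names and the size n = 1000 are assumptions, a compiler with Fortran 2008 support is required, and whether the loop is actually vectorized or parallelized remains up to the compiler.

```fortran
program do_concurrent_demo
  implicit none
  integer, parameter :: n = 1000      ! assumed size for illustration
  real, allocatable :: a(:,:), ref(:,:)
  integer :: i, j

  allocate(a(n,n), ref(n,n))
  call random_number(a)
  ref = a

  ! Iterations may run in any order; each one updates only a(i,j)
  do concurrent (j = 1:n, i = 1:n, a(i,j) <= 0.5)
     a(i,j) = 100.0
  end do

  ! Reference result computed with WHERE
  where (ref <= 0.5) ref = 100.0

  if (any(a /= ref)) error stop 'DO CONCURRENT and WHERE differ'
  print *, 'DO CONCURRENT matches WHERE'
end program do_concurrent_demo
```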
m_b_metcalf
1/6/2010 3:13:11 PM