#### FORALL vs DO...ENDDO vs WHERE

```Hello,
I have a question regarding the "internal optimization", or the meaning, of
some statements in Fortran 95.

I'm speaking of:
- DO...ENDDO
- FORALL
- WHERE...END WHERE

I'm using a laptop with a single-core processor, running Linux (Ubuntu 9.10)
and gfortran 4.4.1.

Up to now I was sure that FORALL and WHERE gave a speed increase in the
computation due to internal optimization.

The method I want to investigate:
-------------------------------
program forall
implicit none
integer :: i,j
integer :: x,y
real, allocatable :: a(:,:)
real :: ii,jj

!write(*,*) "dimensions n x m?"
x=10000
y=10000
allocate(a(x,y))
call random_number(a)

do j=1,y
   do i=1,x
      if (a(i,j)<=0.5) then
         a(i,j) = 100
      endif
   enddo
enddo

!forall (i=1:x,j=1:y,a(i,j)<=0.5)
!       a(i,j)=100
!end forall

!where (a<=0.5)
!	a=100
!end where
! Print one randomly chosen element of the result.
call random_number(ii)
call random_number(jj)
x=int(1+ii*x)
y=int(1+jj*y)   ! scale by y, not by the already overwritten x
write(*,*) a(x,y)

end program forall
----------------------------------------

In the end I measured these times:
gauss:~/Documenti$ time -p ./for2
100.00000
real 4.41
user 2.60
sys 0.42

gauss:~/Documenti$ time -p ./forall
100.00000
real 11.12
user 7.12
sys 0.57

gauss:~/Documenti$ time -p ./where
100.00000
real 4.65
user 2.90
sys 0.36

All programs were compiled with gfortran -O3 optimization.

My question is: "Are WHERE and FORALL only a convenience for
the programmer, or are there cases where they can give a performance
improvement?"

Regards

```
 0
MM
1/4/2010 2:24:10 PM
comp.lang.fortran 11941 articles. 2 followers.
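
The `time -p` figures above cover the whole run, including the allocation and the `random_number` fill of 10^8 elements. As a rough sketch only, the three variants could instead be timed inside one program with the standard `cpu_time` intrinsic. The program name, the array names, and the smaller size `n = 2000` are assumptions made for this illustration, not part of the original post.

```fortran
program compare_masked_assign
  implicit none
  integer, parameter :: n = 2000   ! assumed size; the post above used 10000
  real, allocatable :: a0(:,:), a_do(:,:), a_forall(:,:), a_where(:,:)
  real :: t0, t1
  integer :: i, j

  allocate(a0(n,n), a_do(n,n), a_forall(n,n), a_where(n,n))
  call random_number(a0)

  ! Variant 1: nested DO loops, first index innermost (column-major order).
  a_do = a0
  call cpu_time(t0)
  do j = 1, n
     do i = 1, n
        if (a_do(i,j) <= 0.5) a_do(i,j) = 100.0
     end do
  end do
  call cpu_time(t1)
  print '(a,f8.4)', 'DO     cpu seconds: ', t1 - t0

  ! Variant 2: masked FORALL.
  a_forall = a0
  call cpu_time(t0)
  forall (i = 1:n, j = 1:n, a_forall(i,j) <= 0.5)
     a_forall(i,j) = 100.0
  end forall
  call cpu_time(t1)
  print '(a,f8.4)', 'FORALL cpu seconds: ', t1 - t0

  ! Variant 3: WHERE on the whole array.
  a_where = a0
  call cpu_time(t0)
  where (a_where <= 0.5)
     a_where = 100.0
  end where
  call cpu_time(t1)
  print '(a,f8.4)', 'WHERE  cpu seconds: ', t1 - t0

  ! All three variants must produce the same array.
  if (any(a_do /= a_forall) .or. any(a_do /= a_where)) error stop 'variants disagree'
  print '(a)', 'all variants agree'
end program compare_masked_assign
```

The timings will differ between compilers, versions, and flags, so no expected numbers are given here. The final comparison only confirms that the three forms compute the same result.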

```<<--MM-->> wrote:

> !forall (i=1:x,j=1:y,a(i,j)<=0.5)
> !       a(i,j)=100
> !end forall
>

> All programs were compiled with gfortran -O3 optimization.
>
> My question is: "Are WHERE and FORALL only a convenience for
> the programmer, or are there cases where they can give a performance
> improvement?"
>
This question has been debated at some length.
My personal take is that forall was adopted to stem the threat of HPF
developing as a separate fork of a Fortran-like language, rather than
ifort doesn't attempt to optimize a single-assignment forall unless it is
preceded by a !$ ivdep directive.  That doesn't work beyond a single
assignment, due in part to the peculiar meaning of forall, which
implies multiple assignments (technically not loops).
It may be difficult to optimize a rank-2 forall, particularly for an
allocatable array.
```
 1
Tim
1/4/2010 3:18:39 PM
```On 2010-01-04 11:18:39 -0400, Tim Prince <TimothyPrince@sbcglobal.net> said:

> <<--MM-->> wrote:
>
>> !forall (i=1:x,j=1:y,a(i,j)<=0.5)
>> !       a(i,j)=100
>> !end forall
>>
>
>> All programs were compiled with gfortran -O3 optimization.
>>
>> My question is: "Are WHERE and FORALL only a convenience for
>> the programmer, or are there cases where they can give a
>> performance improvement?"
>>
> This question has been debated at some length.
> My personal take is that forall was adopted to stem the threat of HPF
> developing as a separate fork of a Fortran-like language, rather than

I thought it was a technical fix for the limitations of array slices in array
assignment. The diagonal of a matrix is the quickest example. It came from
HPF and has other advantages, but it is basically array assignment done right,
or on steroids, particularly when combined with WHERE. As an array assignment it
can match formulae more readily, at the cost of temporary arrays that are not
overtly visible and that can be hard for compilers to optimize away.

> ifort doesn't attempt to optimize a single-assignment forall unless it is
> preceded by a !$ ivdep directive.  That doesn't work beyond a single
> assignment, due in part to the peculiar meaning of forall, which
> implies multiple assignments (technically not loops).
> It may be difficult to optimize a rank-2 forall, particularly for an
> allocatable array.

```
 1
Gordon
1/4/2010 5:19:02 PM
```Gordon Sande wrote:
> On 2010-01-04 11:18:39 -0400, Tim Prince <TimothyPrince@sbcglobal.net>
> said:

>> This question has been debated at some length.
>> My personal take is that forall was adopted to stem the threat of HPF
>> developing as a separate fork of a Fortran-like language, rather than
>
> I thought it was a technical fix to the limitations of array slices in
> array
> assignment. The diagonal of a matrix is the quickest example.
As that's your quickest example, it shows what a can of worms this is.
!$omp parallel workshare
forall(i=1:n)x(i,i)=1
!$omp end parallel workshare

is optimized by few compilers, and doesn't bring much economy of
expression.  Equally few compilers take forall as an implicit invitation

The typical architectural requirement for threading to optimize this
operation, in view of inherent high rate of DTLB miss on current
architectures, may not have been foremost among the considerations when
the syntax was thought up originally.
```
 0
Tim
1/4/2010 5:59:20 PM
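
For readers unfamiliar with the diagonal example mentioned above, here is a minimal sketch of the same assignment written both as a FORALL and as a plain DO loop, without the OpenMP directives. The program name, the array names, and the size `n = 4` are assumptions chosen for illustration.

```fortran
program diagonal_assign
  implicit none
  integer, parameter :: n = 4      ! assumed small size for illustration
  real :: x_forall(n,n), x_do(n,n)
  integer :: i

  x_forall = 0.0
  x_do = 0.0

  ! FORALL form: the diagonal cannot be written as a plain array section.
  forall (i = 1:n) x_forall(i,i) = 1.0

  ! Equivalent DO loop.
  do i = 1, n
     x_do(i,i) = 1.0
  end do

  ! Both forms must give the identity matrix.
  if (any(x_forall /= x_do)) error stop 'diagonals differ'
  do i = 1, n
     print '(4f5.1)', x_forall(i,:)
  end do
end program diagonal_assign
```

The two forms are equally short here, which is in line with the remark that FORALL does not bring much economy of expression for this case.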
```<<--MM-->> <no.spma@now.it> wrote:

> Hello,
> I have a question regarding the "internal optimization", or the meaning, of
> some statements in Fortran 95.
>
> I'm speaking of:
> - DO...ENDDO
> - FORALL
> - WHERE...END WHERE
>
> I'm using a laptop with a single-core processor, running Linux (Ubuntu 9.10)
> and gfortran 4.4.1.
>
> Up to now I was sure that FORALL and WHERE gave a speed increase in the
> computation due to internal optimization.

Forall was not designed with optimization in mind. It was designed (in
HPF) for parallelism, and then added to the Fortran standard as part of
incorporating the syntactic parts of HPF. I don't have experience with
parallel machines to comment knowledgeably. But for serial machines, there
is little reason to expect forall to be more efficient than simple DO
loops, and there is substantial data to suggest that it is often worse,
largely because it often involves temporary arrays. I don't know why you
would think that forall was somehow inherently more optimizable than DO
loops.

Tim and Gordon discussed that a little, but there is one point which
they did not mention and which I consider fundamental. Perhaps you know
this or consider it obvious. But you did ask, and there are some people
who definitely have been confused by the point, so I feel it important
to make.

DO is a looping construct. Forall and Where are array assignments. That
is a really fundamental difference. There are cases where one can
achieve a desired result using any of the forms, but do not let that
blind you to the fundamental difference. I have seen people take
"random" DO loops and change the syntax of the DO statement to that of a
FORALL, hoping that this might improve their performance or something.
Except in special cases, this results in something that won't even
compile.

--
Richard Maine                    | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle           |  -- Mark Twain
```
 0
nospam
1/5/2010 3:13:12 AM
```Richard Maine wrote:
> <<--MM-->> <no.spma@now.it> wrote:
>
>> Hello,
>> I have a question regarding the "internal optimization", or the meaning, of
>> some statements in Fortran 95.
>>
>> I'm speaking of :
>> - DO...ENDDO
>> - FORALL
>> - WHERE...END WHERE
>>

[CUT]

>
> Forall was not designed with optimization in mind. It was designed (in
> HPF) for parallelism, and then added to the Fortran standard as part of
> incorporating the syntactic parts of HPF. I don't have experience with
> parallel machines to comment knowledgeably. But for serial machines, there
> is little reason to expect forall to be more efficient than simple DO
> loops, and there is substantial data to suggest that it is often worse,
> largely because it often involves temporary arrays. I don't know why you
> would think that forall was somehow inherently more optimizable than DO
> loops.
>
> Tim and Gordon discussed that a little, but there is one point which
> they did not mention and which I consider fundamental. Perhaps you know
> this or consider it obvious. But you did ask, and there are some people
> who definitely have been confused by the point, so I feel it important
> to make.
>
> DO is a looping construct. Forall and Where are array assignments. That
> is a really fundamental difference. There are cases where one can
> achieve a desired result using any of the forms, but do not let that
> blind you to the fundamental difference. I have seen people take
> "random" DO loops and change the syntax of the DO statement to that of a
> FORALL, hoping that this might improve their performance or something.
> Except in special cases, this results in something that won't even
> compile.
>

I read about FORALL and WHERE in some papers and tutorials on Fortran 95.
None of them made the real difference clear, but the idea they
suggested was that the compiler can optimize the internal code.
I mean, in a DO loop over an array a(i,j) I normally step through the
fast coordinate:

do i=....
    do j=...
        a(i,j)=...
    enddo
enddo

As the software grows, I use FORALL and WHERE in order to reduce
the number of lines of code.

But now I have discovered that in this case I can lose efficiency.

Is this also true for dual-core or quad-core processors?
```
 0
MM
1/5/2010 10:39:40 AM
```On 2010-01-05 06:39:40 -0400, "<<--MM-->>" <no.spma@now.it> said:

> Richard Maine wrote:
>> <<--MM-->> <no.spma@now.it> wrote:
>>
>>> Hello,
>>> I have a question regarding the "internal optimization", or the meaning, of
>>> some statements in Fortran 95.
>>>
>>> I'm speaking of :
>>> - DO...ENDDO
>>> - FORALL
>>> - WHERE...END WHERE
>>>
>
> [CUT]
>
>>
>> Forall was not designed with optimization in mind. It was designed (in
>> HPF) for parallelism, and then added to the Fortran standard as part of
>> incorporating the syntactic parts of HPF. I don't have experience with
>> parallel machines to comment knowledgeably. But for serial machines, there
>> is little reason to expect forall to be more efficient than simple DO
>> loops, and there is substantial data to suggest that it is often worse,
>> largely because it often involves temporary arrays. I don't know why you
>> would think that forall was somehow inherently more optimizable than DO
>> loops.
>>
>> Tim and Gordon discussed that a little, but there is one point which
>> they did not mention and which I consider fundamental. Perhaps you know
>> this or consider it obvious. But you did ask, and there are some people
>> who definitely have been confused by the point, so I feel it important
>> to make.
>>
>> DO is a looping construct. Forall and Where are array assignments. That
>> is a really fundamental difference. There are cases where one can
>> achieve a desired result using any of the forms, but do not let that
>> blind you to the fundamental difference. I have seen people take
>> "random" DO loops and change the syntax of the DO statement to that of a
>> FORALL, hoping that this might improve their performance or something.
>> Except in special cases, this results in something that won't even
>> compile.
>>
>
> I read about FORALL and WHERE in some papers and tutorials on Fortran 95.
> None of them made the real difference clear, but the idea they
> suggested was that the compiler can optimize the internal code.
> I mean, in a DO loop over an array a(i,j) I normally step through the
> fast coordinate:
>
> do i=....
>     do j=...
>         a(i,j)=...
>     enddo
> enddo
>
> As the software grows, I use FORALL and WHERE in order to reduce
> the number of lines of code.

To say it again, FORALL and WHERE are array assignments! To make it trivial,
FORALL is allowed to go from 1 to n, from n down to 1, the even indices up and
the odd indices down, or any other way it chooses. If it had n processors it
could use all n in any random order it chose.

A DO loop of

do i = 2, n
    a(i) = a(i) + a(i-1)
enddo

will give a progressive partial sum but the same appearance with a FORALL will
i = n, 2, -1"
but the FORALL would not. FORALL does this by having a hidden array temporary
that might be optimized out. When the right hand side is complicated it can be
hard for a programmer to figure out a sequential form, so instead they just put
the results into a temporary and copy the temporary at the end. Same for
compilers and FORALL statements.

> But now I have discovered that in this case I can lose efficiency.
>
> Is this also true for dual-core or quad-core processors?

Is your compiler (exactly that version from that vendor with exactly those
switches!!) going to multiprocess or not? Clearly it depends. If the compiler
comes from a vendor of parallel computers, and you have paid for the full
version and taken the vendor's courses on parallelism, then the chances go way
up. Big ifs!

Mostly, multicore allows the compiler to run at the same time as your email
program. Some I/O will be overlapped and even some music will be decoded in
parallel. But beyond that it is hard work.

There is an old saying about yachts: if you have to ask the price, you cannot
afford one! Here, if you have to ask about FORALL and WHERE, you are very
likely not to be able to use the parallel features they are intended to enable
in very special circumstances.

```
 0
Gordon
1/5/2010 2:18:50 PM
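
The difference between the DO loop and the FORALL discussed above can be checked with a small sketch. The array contents and all names are assumptions made for this illustration. The FORALL evaluates every right-hand side before storing anything, so its result matches the descending DO loop rather than the ascending one.

```fortran
program do_vs_forall
  implicit none
  integer, parameter :: n = 5
  integer :: a_fwd(n), a_rev(n), a_forall(n)
  integer :: i

  a_fwd    = [1, 2, 3, 4, 5]
  a_rev    = a_fwd
  a_forall = a_fwd

  ! Ascending DO: each step reads the element just updated, giving a running sum.
  do i = 2, n
     a_fwd(i) = a_fwd(i) + a_fwd(i-1)
  end do

  ! Descending DO: each step reads an element that has not been updated yet.
  do i = n, 2, -1
     a_rev(i) = a_rev(i) + a_rev(i-1)
  end do

  ! FORALL: all right-hand sides are evaluated before any assignment happens.
  forall (i = 2:n) a_forall(i) = a_forall(i) + a_forall(i-1)

  print '(a,5i4)', 'DO ascending : ', a_fwd      ! 1 3 6 10 15
  print '(a,5i4)', 'DO descending: ', a_rev      ! 1 3 5 7 9
  print '(a,5i4)', 'FORALL       : ', a_forall   ! 1 3 5 7 9
end program do_vs_forall
```

Changing the direction of the DO loop changes its result, while the FORALL result does not depend on any order of evaluation.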
```<<--MM-->> wrote:

>
> I read about FORALL and WHERE in some papers and tutorials on Fortran 95.
> None of them made the real difference clear, but the idea they
> suggested was that the compiler can optimize the internal code.
> I mean, in a DO loop over an array a(i,j) I normally step through the
> fast coordinate:
>
> do i=....
>     do j=...
>         a(i,j)=...
>     enddo
> enddo
>
> As the software grows, I use FORALL and WHERE in order to reduce
> the number of lines of code.
>
> But now I have discovered that in this case I can lose efficiency.
>
> Is this also true for dual-core or quad-core processors?
optimizing compiler which swaps loops (and you turn on that option),
nesting the loops backwards as you have done is likely to "lose efficiency."
Similar compiler analysis (or more) is needed to optimize a rank 2
forall().  where() presents somewhat different challenges to optimizing
compilers.
The point was mentioned that forall is intended to require a compiler to
diagnose and reject some situations which might prevent parallel
operation on multi-core.  This falls disappointingly short of actually
facilitating parallelism.
```
 0
Tim
1/5/2010 3:02:46 PM
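
The remark about nesting the loops backwards refers to Fortran's column-major storage, where the first index varies fastest in memory. A rough sketch that times both nestings follows; the program name, the array names, and the size `n = 2000` are assumptions for illustration, and a compiler that interchanges loops may make the two timings identical.

```fortran
program loop_order
  implicit none
  integer, parameter :: n = 2000   ! assumed size for illustration
  real, allocatable :: a(:,:), b(:,:)
  real :: t0, t1
  integer :: i, j

  allocate(a(n,n), b(n,n))

  ! First index innermost: consecutive iterations touch adjacent memory.
  call cpu_time(t0)
  do j = 1, n
     do i = 1, n
        a(i,j) = real(i) + real(j)
     end do
  end do
  call cpu_time(t1)
  print '(a,f8.4)', 'i innermost, cpu seconds: ', t1 - t0

  ! Second index innermost: consecutive iterations are n elements apart.
  call cpu_time(t0)
  do i = 1, n
     do j = 1, n
        b(i,j) = real(i) + real(j)
     end do
  end do
  call cpu_time(t1)
  print '(a,f8.4)', 'j innermost, cpu seconds: ', t1 - t0

  ! Both nestings must fill the arrays with the same values.
  if (any(a /= b)) error stop 'loop orders disagree'
end program loop_order
```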
```In article <1jbt60e.k9bpesczgc7wN%nospam@see.signature>,
nospam@see.signature (Richard Maine) writes:

> I don't know why you
> would think that forall was somehow inherently more optimizable than DO
> loops.

> DO is a looping construct. Forall and Where are array assignments. That
> is a really fundamental difference.

Maybe that is the reason he thought it would somehow be inherently more
optimizable.  DO implies doing things one after the other.  If the
compiler can prove to itself that parallel execution is OK, then it can
do that optimisation.  However, with FORALL and WHERE, there is no
serial implication, so the compiler can perhaps optimise a bit more
aggressively.

```
 0
helbig
1/5/2010 5:32:14 PM
```In article <hhvt2u$1c0$1@online.de>,
helbig@astro.multiCLOTHESvax.de (Phillip Helbig---undress to

> In article <1jbt60e.k9bpesczgc7wN%nospam@see.signature>,
> nospam@see.signature (Richard Maine) writes:
>
> > I don't know why you
> > would think that forall was somehow inherently more optimizable than DO
> > loops.
>
> > DO is a looping construct. Forall and Where are array assignments. That
> > is a really fundamental difference.
>
> Maybe that is the reason he thought it would somehow be inherently more
> optimizable.  DO implies doing things one after the other.  If the
> compiler can prove to itself that parallel execution is OK, then it can
> do that optimisation.  However, with FORALL and WHERE, there is no
> serial implication, so the compiler can perhaps optimise a bit more
> aggressively.

The semantics that we all wanted back in the 80s when the next
fortran revision (f88 :-) was being discussed was exactly what you
say above, a looping type construct in which the order of execution
is unspecified.  That matched the vector hardware of the time.
Unfortunately, FORALL adds a little more, and it is that little bit
extra that gets in the way of optimization.  In particular, the
problem seems to be the requirement that the statement is evaluated
"as if" everything on the right hand side is stored into a temporary
array of the appropriate size and then assigned to the left hand
side target array.  If the compiler can't figure out that the
temporary array is unneeded and assigns results directly to the
target array (which seems to be somewhere between "always" and "too
often"), then it actually does allocate a temporary array to hold
the intermediate results.  It is that allocation and deallocation
that seems to be the problem with optimization of FORALL.

The looping construct we wanted would have required the programmer
to make sure that the order of execution was not important.
Sometimes that is obvious for a statement or group of statements,
sometimes it isn't, so this was a potential source of coding errors
for programmers.  FORALL does the arbitrary-order part, but it
provides the safety net of evaluation-before-assignment so that the
programmer cannot possibly make a mistake.  It is that safety net
that seems to be the cause of the optimization and performance
problems.

At this point, I don't know what the best solution is.  Should a new
DOALL construct be added that works the right way?  Should a
compiler directive be specified somehow in the standard to tell
FORALL to behave correctly?  There doesn't really seem to be a good
solution to the problem.  In hindsight, the FORALL semantics was a
bad choice, but once it was in the language it is practically
impossible to remove it, so we are stuck with it in the language
forever.

BTW, when FORALL was added, I thought it was what we all wanted.  I
did not recognize that such a seemingly minor difference between
what we really wanted and what we got would have such major
consequences.  As a result, I tend to avoid FORALL for all but
trivial statements.  If a loop is important to performance, I tend
to use old fashioned DO loops, or a mixture of DO loops and simple
array syntax.  Even if a FORALL behaves well on one compiler, you
can't rely on it working well on the next one.

$.02 -Ron Shepard
```
 0
Ron
1/6/2010 1:34:21 AM
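
The "as if" temporary described above can be written out by hand to see what a compiler may have to do when it cannot prove the temporary unnecessary. This is only a sketch under assumed names and data (`a`, `b`, `tmp`, and the stencil expression are made up for the illustration); real compilers may or may not generate anything like the second form.

```fortran
program forall_temporary
  implicit none
  integer, parameter :: n = 6
  real :: a(n), b(n), tmp(n)
  integer :: i

  a = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
  b = a

  ! FORALL whose right-hand side reads elements that are also assigned.
  forall (i = 2:n-1) a(i) = a(i-1) + a(i+1)

  ! Hand-written equivalent, step 1: evaluate every right-hand side
  ! into a temporary while b still holds the original values.
  do i = 2, n-1
     tmp(i) = b(i-1) + b(i+1)
  end do

  ! Step 2: copy the temporary into the target.
  do i = 2, n-1
     b(i) = tmp(i)
  end do

  if (any(a /= b)) error stop 'forms disagree'
  print '(6f6.1)', a   ! values: 1 5 10 20 40 32
end program forall_temporary
```

A single in-place DO loop over `i = 2, n-1` would give different values here, because `a(i-1)` would already have been overwritten. That is the case in which the temporary cannot simply be dropped.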
```Tim Prince wrote:
> <<--MM-->> wrote:
>
>>
>> I read about FORALL and WHERE in some papers and tutorials on Fortran 95.
>> None of them made the real difference clear, but the idea they
>> suggested was that the compiler can optimize the internal code.
>> I mean, in a DO loop over an array a(i,j) I normally step through
>> the fast coordinate:
>>
>> do i=....
>>     do j=...
>>         a(i,j)=...
>>     enddo
>> enddo
>>
>> As the software grows, I use FORALL and WHERE in order to
>> reduce the number of lines of code.
>>
>> But now I have discovered that in this case I can lose efficiency.
>>
>> Is this also true for dual-core or quad-core processors?

Yes, you are right, it was a mistake (i and j should be swapped).
```
 0
MM
1/6/2010 10:40:46 AM
```On Jan 6, 2:34 am, Ron Shepard <ron-shep...@NOSPAM.comcast.net> wrote:
> In article <hhvt2u$1c...@online.de>,
> hel...@astro.multiCLOTHESvax.de (Phillip Helbig---undress to
>
> > In article <1jbt60e.k9bpesczgc7wN%nos...@see.signature>,
> > nos...@see.signature (Richard Maine) writes:
>
> > > I don't know why you
> > > would think that forall was somehow inherently more optimizable than DO
> > > loops.
>
> > > DO is a looping construct. Forall and Where are array assignments. That
> > > is a really fundamental difference.
>
> > Maybe that is the reason he thought it would somehow be inherently more
> > optimizable.  DO implies doing things one after the other.  If the
> > compiler can prove to itself that parallel execution is OK, then it can
> > do that optimisation.  However, with FORALL and WHERE, there is no
> > serial implication, so the compiler can perhaps optimise a bit more
> > aggressively.
>
> The semantics that we all wanted back in the 80s when the next
> fortran revision (f88 :-) was being discussed was exactly what you
> say above, a looping type construct in which the order of execution
> is unspecified.  That matched the vector hardware of the time.
> Unfortunately, FORALL adds a little more, and it is that little bit
> extra that gets in the way of optimization.  In particular, the
> problem seems to be the requirement that the statement is evaluated
> "as if" everything on the right hand side is stored into a temporary
> array of the appropriate size and then assigned to the left hand
> side target array.  If the compiler can't figure out that the
> temporary array is unneeded and assigns results directly to the
> target array (which seems to be somewhere between "always" and "too
> often"), then it actually does allocate a temporary array to hold
> the intermediate results.  It is that allocation and deallocation
> that seems to be the problem with optimization of FORALL.
>
> The looping construct we wanted would have required the programmer
> to make sure that the order of execution was not important.
> Sometimes that is obvious for a statement or group of statements,
> sometimes it isn't, so this was a potential source of coding errors
> for programmers.  FORALL does the arbitrary-order part, but it
> provides the safety net of evaluation-before-assignment so that the
> programmer cannot possibly make a mistake.  It is that safety net
> that seems to be the cause of the optimization and performance
> problems.
>
> At this point, I don't know what the best solution is.  Should a new
> DOALL construct be added that works the right way?  Should a
> compiler directive be specified somehow in the standard to tell
> FORALL to behave correctly?  There doesn't really seem to be a good
> solution to the problem.  In hindsight, the FORALL semantics was a
> bad choice, but once it was in the language it is practically
> impossible to remove it, so we are stuck with it in the language
> forever.
>
> BTW, when FORALL was added, I thought it was what we all wanted.  I
> did not recognize that such a seemingly minor difference between
> what we really wanted and what we got would have such major
> consequences.  As a result, I tend to avoid FORALL for all but
> trivial statements.  If a loop is important to performance, I tend
> to use old fashioned DO loops, or a mixture of DO loops and simple
> array syntax.  Even if a FORALL behaves well on one compiler, you
> can't rely on it working well on the next one.
>
> $.02 -Ron Shepard

Is the DO CONCURRENT of Fortran 2008 what you want?

Regards,

Mike Metcalf
```
 0
m_b_metcalf
1/6/2010 3:13:11 PM
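
As a sketch of what the DO CONCURRENT suggestion could look like for the masked assignment from the first post, assuming a compiler with Fortran 2008 support: the program name, the array names, and the size `n = 1000` are assumptions for illustration. The programmer, not the compiler, is responsible for the iterations being independent, which is the case here because each iteration touches only its own element.

```fortran
program do_concurrent_mask
  implicit none
  integer, parameter :: n = 1000   ! assumed size for illustration
  real, allocatable :: a(:,:), ref(:,:)
  integer :: i, j

  allocate(a(n,n), ref(n,n))
  call random_number(a)
  ref = a

  ! Reference result computed with WHERE.
  where (ref <= 0.5) ref = 100.0

  ! DO CONCURRENT with a mask: iterations may run in any order,
  ! and no evaluate-then-assign temporary is implied.
  do concurrent (j = 1:n, i = 1:n, a(i,j) <= 0.5)
     a(i,j) = 100.0
  end do

  if (any(a /= ref)) error stop 'DO CONCURRENT and WHERE disagree'
  print '(a)', 'DO CONCURRENT matches WHERE'
end program do_concurrent_mask
```

Whether a given compiler actually vectorizes or parallelizes the construct still depends on the compiler and its options.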