COMPGROUPS.NET | Search | Post Question | Groups | Stream | About | Register

### Most efficient way of vectorize the following code in Fortran?

• Email
• Follow

```Just wanted to see if Fortran really gives a huge speedup for my
application before I make decision to get deeply in converting my
current work into Fortran.

Could anybody show me what's the most Fortran translation of the
following Matlab code:

----------------------
for i=1:length(us);
u=us(i);
t6=u+kappa*eta;
t7=lambda*u*t./t6;
t8=lambda.*eta./t6.*log(1+u./eta/kappa*t5);
tmp(i)=exp(-u*theta*t-u*(y0-theta)/kappa*t5-rho(:) ' *t7(:)+rho(:)
' *t8(:));
end;
----------------------

Please note that here us is a vector; rho(:) is a column vector,
rho(:)' is the transpose and hence it becomes a row vector. t7(:) and
t8(:) are the column vectors of the same length as rho. rho(:) '
*t7(:) is dot product. Hence tmp(i) is a scalar. The result tmp is a
vector, having the same length as us.

Please note "us" is complex-valued vector, although all other
parameters are real-valued. The resultant tmp vector is also complex-
valued.

If Fortran can get rid of the loop, then I believe it is highly
efficient. In C/C++, I was unable to get rid of the loop(in C/C++ I
also have an inner loop of doing the dot-product.

Could anybody show me the best way of doing the above in Fortran?

I am using Intel Visual Fortran 10.1 integrated in MS VS .NET 2003, it
would be even more cool if I can couple the above code with some
compiler options to improve the speed. What are the options I can try?
I have 2 CPUs Xeon 2.4 GHz up to SSE2...

Thanks!

```
 0
Reply lunamoonmoon (258) 8/2/2007 3:48:12 AM

See related articles to this posting

```In article <1186026492.719912.129200@q3g2000prf.googlegroups.com>,
Luna Moon <lunamoonmoon@gmail.com> writes:>
> for i=1:length(us);
>     u=us(i);
>     t6=u+kappa*eta;
>     t7=lambda*u*t./t6;
>     t8=lambda.*eta./t6.*log(1+u./eta/kappa*t5);
>     tmp(i)=exp(-u*theta*t-u*(y0-theta)/kappa*t5-rho(:) ' *t7(:)+rho(:)
> ' *t8(:));
> end;
> ----------------------
>
> Please note that here us is a vector; rho(:) is a column vector,
> rho(:)' is the transpose and hence it becomes a row vector. t7(:) and
> t8(:) are the column vectors of the same length as rho. rho(:) '
> *t7(:) is dot product. Hence tmp(i) is a scalar. The result tmp is a
> vector, having the same length as us.
>
> Please note "us" is complex-valued vector, although all other
> parameters are real-valued. The resultant tmp vector is also complex-
> valued.

does this help?

t6 = us + kappa * eta
t7 = lambda * us * t / t6
t8 = lambda * eta / t6 * log(1 + us /  eta / kappa * t5)
tmp = exp(-us * theta * t - us * (y0 - theta) / kappa * t5
tmp = tmp - dot_product(transpose(rho), t7)
tmp = tmp + dot_product(transpose(rho), t8)

You, of course, forgot to tell us anything about kappa, eta, lambda,
t5, y0, and theta.  Note, t6, t7, t8, and tmp are possibly complex
valued arrays.

> If Fortran can get rid of the loop, then I believe it is highly
> efficient.

Why?

Judging from your other posts, you need to 1) get a book on
Fortran 95, 2) take a numerical analysis course, and 3) take
a course on programming.

--
Steve
```
 0
Reply kargl (773) 8/2/2007 4:16:34 AM

```Luna Moon wrote:
> Just wanted to see if Fortran really gives a huge speedup for my
> application before I make decision to get deeply in converting my
> current work into Fortran.

It should be a lot faster than Matlab...

> Could anybody show me what's the most Fortran translation of the
> following Matlab code:

> for i=1:length(us);
>     u=us(i);
>     t6=u+kappa*eta;
>     t7=lambda*u*t./t6;
>     t8=lambda.*eta./t6.*log(1+u./eta/kappa*t5);
>     tmp(i)=exp(-u*theta*t-u*(y0-theta)/kappa*t5-rho(:) ' *t7(:)+rho(:)
> ' *t8(:));
> end;

! for appropriate value of n
do i=1,n
u=us(i)
t6=u+kappa*eta
t7=lambda*u*t./t6
t8=lambda.*eta./t6.*log(1+u./eta/kappa*t5)
tmp(i)=exp(-u*theta*t-u*(y0-theta)/kappa*t5-&
dot_product(rho,t7)+dot_product(rhot8)
end do

(snip)

> If Fortran can get rid of the loop, then I believe it is highly
> efficient. In C/C++, I was unable to get rid of the loop(in C/C++ I
> also have an inner loop of doing the dot-product.

You can't get rid of the loop.  It might be that it Matlab
you can do some loops using matrix operations, but the loops
still have to be done internally.

-- glen

```
 0
Reply gah (12850) 8/2/2007 6:25:04 AM

```Steven G. Kargl wrote:
> 	Luna Moon <lunamoonmoon@gmail.com> writes:>
>> for i=1:length(us);
>>     u=us(i);
>>     t6=u+kappa*eta;
>>     t7=lambda*u*t./t6;
>>     t8=lambda.*eta./t6.*log(1+u./eta/kappa*t5);
>>     tmp(i)=exp(-u*theta*t-u*(y0-theta)/kappa*t5-rho(:) ' *t7(:)+rho(:)
>> ' *t8(:));
>> end;
>> ----------------------
>>
>> Please note that here us is a vector; rho(:) is a column vector,
>> rho(:)' is the transpose and hence it becomes a row vector. t7(:) and
>> t8(:) are the column vectors of the same length as rho. rho(:) '
>> *t7(:) is dot product. Hence tmp(i) is a scalar. The result tmp is a
>> vector, having the same length as us.
>>
>> Please note "us" is complex-valued vector, although all other
>> parameters are real-valued. The resultant tmp vector is also complex-
>> valued.
>
> does this help?
>
>     t6 = us + kappa * eta
>     t7 = lambda * us * t / t6
>     t8 = lambda * eta / t6 * log(1 + us /  eta / kappa * t5)
>     tmp = exp(-us * theta * t - us * (y0 - theta) / kappa * t5
>     tmp = tmp - dot_product(transpose(rho), t7)
>     tmp = tmp + dot_product(transpose(rho), t8)
>
> You, of course, forgot to tell us anything about kappa, eta, lambda,
> t5, y0, and theta.  Note, t6, t7, t8, and tmp are possibly complex
> valued arrays.
>
>
>> If Fortran can get rid of the loop, then I believe it is highly
>> efficient.
>
> Why?
>
> Judging from your other posts, you need to 1) get a book on
> Fortran 95, 2) take a numerical analysis course, and 3) take
> a course on programming.

4) RTFM
```
 0
Reply bogle (300) 8/2/2007 7:07:11 AM

```glen herrmannsfeldt wrote:

>> If Fortran can get rid of the loop, then I believe it is highly
>> efficient. In C/C++, I was unable to get rid of the loop(in C/C++ I
>> also have an inner loop of doing the dot-product.
>
> You can't get rid of the loop.  It might be that it Matlab
> you can do some loops using matrix operations, but the loops
> still have to be done internally.

Just to elaborate a little.... you can't get rid of the loop, in the
sense that the total number of mathematical operations to be done is the
same. Whether you write
DO i=1,n
a(i) = b(i) * c(i)
END DO
or
a = b * c
you should get the same performance on all compilers.

I can't speak for Matlab, but if the OP had been talking about IDL.....

IDL is an interpreted language, which means that writing out the loop
explicitly is very slow - for each interation of the loop, the full
interpreter must run. However, if you write out array expressions, then
the multiplication can be handed off to machine code immediately. This
means that the first form given above will run much slower than the
second (I've read reports of multi-day run times shrinking to hours when
optimisations like this were applied). If Matlab is similar, then
replacing loops with vector expressions will give dramatic speed ups -
possibly even giving performance comparable to compiled Fortran, if
enough of the work is shovelled into them.

Richard
```
 0
Reply rge219476 (61) 8/2/2007 1:10:55 PM

```On Aug 2, 12:16 am, ka...@troutmask.apl.washington.edu (Steven G.
Kargl) wrote:
>         Luna Moon <lunamoonm...@gmail.com> writes:>
>
>
>
> > for i=1:length(us);
> >     u=us(i);
> >     t6=u+kappa*eta;
> >     t7=lambda*u*t./t6;
> >     t8=lambda.*eta./t6.*log(1+u./eta/kappa*t5);
> >     tmp(i)=exp(-u*theta*t-u*(y0-theta)/kappa*t5-rho(:) ' *t7(:)+rho(:)
> > ' *t8(:));
> > end;
> > ----------------------
>
> > Please note that here us is a vector; rho(:) is a column vector,
> > rho(:)' is the transpose and hence it becomes a row vector. t7(:) and
> > t8(:) are the column vectors of the same length as rho. rho(:) '
> > *t7(:) is dot product. Hence tmp(i) is a scalar. The result tmp is a
> > vector, having the same length as us.
>
> > Please note "us" is complex-valued vector, although all other
> > parameters are real-valued. The resultant tmp vector is also complex-
> > valued.
>
> does this help?
>
>     t6 = us + kappa * eta
>     t7 = lambda * us * t / t6
>     t8 = lambda * eta / t6 * log(1 + us /  eta / kappa * t5)
>     tmp = exp(-us * theta * t - us * (y0 - theta) / kappa * t5
>     tmp = tmp - dot_product(transpose(rho), t7)
>     tmp = tmp + dot_product(transpose(rho), t8)
>
> You, of course, forgot to tell us anything about kappa, eta, lambda,
> t5, y0, and theta.  Note, t6, t7, t8, and tmp are possibly complex
> valued arrays.
>
> > If Fortran can get rid of the loop, then I believe it is highly
> > efficient.
>
> Why?
>
> Judging from your other posts, you need to 1) get a book on
> Fortran 95, 2) take a numerical analysis course, and 3) take
> a course on programming.
>
> --

kappa, eta, lambda,
t5, y0, and theta are all scalars...

You can really get rid of the loop?

```
 0
Reply lunamoonmoon (258) 8/2/2007 1:16:10 PM

```On Aug 2, 12:16 am, ka...@troutmask.apl.washington.edu (Steven G.
Kargl) wrote:
>         Luna Moon <lunamoonm...@gmail.com> writes:>
>
>
>
> > for i=1:length(us);
> >     u=us(i);
> >     t6=u+kappa*eta;
> >     t7=lambda*u*t./t6;
> >     t8=lambda.*eta./t6.*log(1+u./eta/kappa*t5);
> >     tmp(i)=exp(-u*theta*t-u*(y0-theta)/kappa*t5-rho(:) ' *t7(:)+rho(:)
> > ' *t8(:));
> > end;
> > ----------------------
>
> > Please note that here us is a vector; rho(:) is a column vector,
> > rho(:)' is the transpose and hence it becomes a row vector. t7(:) and
> > t8(:) are the column vectors of the same length as rho. rho(:) '
> > *t7(:) is dot product. Hence tmp(i) is a scalar. The result tmp is a
> > vector, having the same length as us.
>
> > Please note "us" is complex-valued vector, although all other
> > parameters are real-valued. The resultant tmp vector is also complex-
> > valued.
>
> does this help?
>
>     t6 = us + kappa * eta
>     t7 = lambda * us * t / t6
>     t8 = lambda * eta / t6 * log(1 + us /  eta / kappa * t5)
>     tmp = exp(-us * theta * t - us * (y0 - theta) / kappa * t5
>     tmp = tmp - dot_product(transpose(rho), t7)
>     tmp = tmp + dot_product(transpose(rho), t8)
>
> You, of course, forgot to tell us anything about kappa, eta, lambda,
> t5, y0, and theta.  Note, t6, t7, t8, and tmp are possibly complex
> valued arrays.
>
> > If Fortran can get rid of the loop, then I believe it is highly
> > efficient.
>
> Why?
>
> Judging from your other posts, you need to 1) get a book on
> Fortran 95, 2) take a numerical analysis course, and 3) take
> a course on programming.
>
> --

Any good reference Fortran books on vector and array operations?

```
 0
Reply lunamoonmoon (258) 8/2/2007 1:16:41 PM

```On Aug 2, 9:10 am, Richard Edgar <rg...@pas.rochester.edu> wrote:
> glen herrmannsfeldt wrote:
> >> If Fortran can get rid of the loop, then I believe it is highly
> >> efficient. In C/C++, I was unable to get rid of the loop(in C/C++ I
> >> also have an inner loop of doing the dot-product.
>
> > You can't get rid of the loop.  It might be that it Matlab
> > you can do some loops using matrix operations, but the loops
> > still have to be done internally.
>
> Just to elaborate a little.... you can't get rid of the loop, in the
> sense that the total number of mathematical operations to be done is the
> same. Whether you write
> DO i=1,n
>   a(i) = b(i) * c(i)
> END DO
> or
> a = b * c
> you should get the same performance on all compilers.

To elaborate further, one advantage of using Fortran 90 and later
versions over interpreted languages is that you can write loops OR the
equivalent array operations and get comparable speed. Fortran
compilers have been optimizing loops for literally half a century. So
write the code in the easiest and clearest way.

```
 0
Reply beliavsky (2211) 8/2/2007 1:24:10 PM

```Richard Edgar wrote:
> glen herrmannsfeldt wrote:
>
>>> If Fortran can get rid of the loop, then I believe it is highly
>>> efficient. In C/C++, I was unable to get rid of the loop(in C/C++ I
>>> also have an inner loop of doing the dot-product.
>> You can't get rid of the loop.  It might be that it Matlab
>> you can do some loops using matrix operations, but the loops
>> still have to be done internally.
>
> Just to elaborate a little.... you can't get rid of the loop, in the
> sense that the total number of mathematical operations to be done is the
> same. Whether you write
> DO i=1,n
>   a(i) = b(i) * c(i)
> END DO
> or
> a = b * c
> you should get the same performance on all compilers.
>
> I can't speak for Matlab, but if the OP had been talking about IDL.....
>
> IDL is an interpreted language, which means that writing out the loop
> explicitly is very slow - for each interation of the loop, the full
> interpreter must run. However, if you write out array expressions, then
> the multiplication can be handed off to machine code immediately. This
> means that the first form given above will run much slower than the
> second (I've read reports of multi-day run times shrinking to hours when
> optimisations like this were applied). If Matlab is similar, then
> replacing loops with vector expressions will give dramatic speed ups -
> possibly even giving performance comparable to compiled Fortran, if
> enough of the work is shovelled into them.

Matlab is, of course, similar along w/ many other "tricks" of
optimization such as pre-allocating arrays, function caching, etc.
Consequently, as you note, for problems that are deeply compute-bound,
the speedups from compiled code often aren't what might be dreamed of as
judicious choices in how the Matlab code is written can often accomplish
much of the same optimization...

--
```
 0
Reply none1568 (7455) 8/2/2007 2:23:50 PM

```Steven G. Kargl wrote:
> 	Luna Moon <lunamoonmoon@gmail.com> writes:>
>> for i=1:length(us);
>>     u=us(i);
>>     t6=u+kappa*eta;
>>     t7=lambda*u*t./t6;
>>     t8=lambda.*eta./t6.*log(1+u./eta/kappa*t5);
>>     tmp(i)=exp(-u*theta*t-u*(y0-theta)/kappa*t5-rho(:) ' *t7(:)+rho(:)
>> ' *t8(:));
>> end;
>> ----------------------
>>
>> Please note that here us is a vector; rho(:) is a column vector,
>> rho(:)' is the transpose and hence it becomes a row vector. t7(:) and
>> t8(:) are the column vectors of the same length as rho. rho(:) '
>> *t7(:) is dot product. Hence tmp(i) is a scalar. The result tmp is a
>> vector, having the same length as us.
>>
>> Please note "us" is complex-valued vector, although all other
>> parameters are real-valued. The resultant tmp vector is also complex-
>> valued.
>
> does this help?
>
>     t6 = us + kappa * eta
>     t7 = lambda * us * t / t6
>     t8 = lambda * eta / t6 * log(1 + us /  eta / kappa * t5)
>     tmp = exp(-us * theta * t - us * (y0 - theta) / kappa * t5
>     tmp = tmp - dot_product(transpose(rho), t7)
>     tmp = tmp + dot_product(transpose(rho), t8)
Except you don't want the transpose.  rho is a vector and a Fortran
one dimensional array can be either a row or column vector.  So,
I think you could do the last two lines as
tmp = tmp + dot_product(rho, t7 + t8)

Although I always have to draw a picture to make sure.

Dick Hendrickson
>
> You, of course, forgot to tell us anything about kappa, eta, lambda,
> t5, y0, and theta.  Note, t6, t7, t8, and tmp are possibly complex
> valued arrays.
>
>
>> If Fortran can get rid of the loop, then I believe it is highly
>> efficient.
>
> Why?
>
> Judging from your other posts, you need to 1) get a book on
> Fortran 95, 2) take a numerical analysis course, and 3) take
> a course on programming.
>
```
 0
Reply dick.hendrickson (1286) 8/2/2007 6:54:04 PM

```Dick Hendrickson wrote:

> Steven G. Kargl wrote:
>
>>     Luna Moon <lunamoonmoon@gmail.com> writes:>
>>
>>> for i=1:length(us);
>>>     u=us(i);
>>>     t6=u+kappa*eta;
>>>     t7=lambda*u*t./t6;
>>>     t8=lambda.*eta./t6.*log(1+u./eta/kappa*t5);
>>>     tmp(i)=exp(-u*theta*t-u*(y0-theta)/kappa*t5-rho(:) ' *t7(:)+rho(:)
>>> ' *t8(:));
>>> end;

(snip)

> Except you don't want the transpose.  rho is a vector and a Fortran
> one dimensional array can be either a row or column vector.  So,
> I think you could do the last two lines as
>       tmp = tmp + dot_product(rho, t7 + t8)

I think you mean dot_product(rho, t8-t7), which should be somewhat
vectoring not other types of optimization that might be done.

-- glen

```
 0
Reply gah (12850) 8/2/2007 8:11:52 PM

```On Aug 2, 9:24 am, Beliavsky <beliav...@aol.com> wrote:
> On Aug 2, 9:10 am, Richard Edgar <rg...@pas.rochester.edu> wrote:
>
>
>
> > glen herrmannsfeldt wrote:
> > >> If Fortran can get rid of the loop, then I believe it is highly
> > >> efficient. In C/C++, I was unable to get rid of the loop(in C/C++ I
> > >> also have an inner loop of doing the dot-product.
>
> > > You can't get rid of the loop.  It might be that it Matlab
> > > you can do some loops using matrix operations, but the loops
> > > still have to be done internally.
>
> > Just to elaborate a little.... you can't get rid of the loop, in the
> > sense that the total number of mathematical operations to be done is the
> > same. Whether you write
> > DO i=1,n
> >   a(i) = b(i) * c(i)
> > END DO
> > or
> > a = b * c
> > you should get the same performance on all compilers.
>
> To elaborate further, one advantage of using Fortran 90 and later
> versions over interpreted languages is that you can write loops OR the
> equivalent array operations and get comparable speed. Fortran
> compilers have been optimizing loops for literally half a century. So
> write the code in the easiest and clearest way.

Loop or not to loop?

Delooping means to change loop into array based operations.

At least in Matlab, delooping will give me more superior speed...

```
 0
Reply lunamoonmoon (258) 8/4/2007 5:39:39 PM

```That does not generally hold for Fortran. Matlab is interpreted,
Fortran is compiled.

```
 0
Reply highegg (245) 8/4/2007 5:51:38 PM

```Luna Moon wrote:
> On Aug 2, 9:24 am, Beliavsky <beliav...@aol.com> wrote:
>> On Aug 2, 9:10 am, Richard Edgar <rg...@pas.rochester.edu> wrote:
>>
>>
>>
>>> glen herrmannsfeldt wrote:
>>>>> If Fortran can get rid of the loop, then I believe it is highly
>>>>> efficient. In C/C++, I was unable to get rid of the loop(in C/C++ I
>>>>> also have an inner loop of doing the dot-product.
>>>> You can't get rid of the loop.  It might be that it Matlab
>>>> you can do some loops using matrix operations, but the loops
>>>> still have to be done internally.
>>> Just to elaborate a little.... you can't get rid of the loop, in the
>>> sense that the total number of mathematical operations to be done is the
>>> same. Whether you write
>>> DO i=1,n
>>>   a(i) = b(i) * c(i)
>>> END DO
>>> or
>>> a = b * c
>>> you should get the same performance on all compilers.
>> To elaborate further, one advantage of using Fortran 90 and later
>> versions over interpreted languages is that you can write loops OR the
>> equivalent array operations and get comparable speed. Fortran
>> compilers have been optimizing loops for literally half a century. So
>> write the code in the easiest and clearest way.
>
> Loop or not to loop?
>
> Delooping means to change loop into array based operations.
>
> At least in Matlab, delooping will give me more superior speed...

As you have been told repeatedly already, "vectorizing" in Matlab simply
moves explicit loops in m-files out of the interpreter into equivalent
loops in a compiled language.

Fortran array notation is, for the most part, syntactical sugar letting
the compiler do the work of writing the code for the loops for you.  It
will not, in general, be able to generate more efficient code than the
equivalent operations written explicitly in DO ... ENDDO fashion.  As
already pointed out earlier in this subthread -- there are a fixed
number of floating point operations involved in the computation and they
are going to take so many FP cycles on a given machine.  How the source
code is structured isn't going to make some of those go away.

IOW, you seem to be tilting at windmills looking for some magical
speedup that just isn't going to happen in the manner in which you're
approaching the problem.

Algorithms or multiple processors and effective parallelization to make
use of them or similar techniques may help -- a Holy Grail approach in
search of "best" solvers or language or source code format isn't going to.

--

```
 0
Reply none1568 (7455) 8/4/2007 6:11:28 PM

```On 2007-08-04 14:39:39 -0300, Luna Moon <lunamoonmoon@gmail.com> said:

> On Aug 2, 9:24 am, Beliavsky <beliav...@aol.com> wrote:
>> On Aug 2, 9:10 am, Richard Edgar <rg...@pas.rochester.edu> wrote:
>>
>>
>>
>>> glen herrmannsfeldt wrote:
>>>>> If Fortran can get rid of the loop, then I believe it is highly
>>>>> efficient. In C/C++, I was unable to get rid of the loop(in C/C++ I
>>>>> also have an inner loop of doing the dot-product.
>>
>>>> You can't get rid of the loop.  It might be that it Matlab
>>>> you can do some loops using matrix operations, but the loops
>>>> still have to be done internally.
>>
>>> Just to elaborate a little.... you can't get rid of the loop, in the
>>> sense that the total number of mathematical operations to be done is the
>>> same. Whether you write
>>> DO i=1,n
>>> a(i) = b(i) * c(i)
>>> END DO
>>> or
>>> a = b * c
>>> you should get the same performance on all compilers.
>>
>> To elaborate further, one advantage of using Fortran 90 and later
>> versions over interpreted languages is that you can write loops OR the
>> equivalent array operations and get comparable speed. Fortran
>> compilers have been optimizing loops for literally half a century. So
>> write the code in the easiest and clearest way.
>
> Loop or not to loop?
>
> Delooping means to change loop into array based operations.
>
> At least in Matlab, delooping will give me more superior speed...

In MatLab you have two choices for 1000 elements

1. 1000 steps you can see     that each cost 1 millisecond
2. 1000 steps that are hidden that each cost 1 microsecond

hiding by vectorization helps because the interpreter is slow

In Fortran you have the same two choices

1. 1000 steps you can see     that each cost 1 microsecond
2. 1000 steps that are hidden that each cost 1 microsecond

hiding by vectorization has no effect in Fortran as there is no
slow interpreter associated with each step. Rather there is a compiler
which adds its cost much in advance but allows the execution to
proceed at the full machine speed which ever way you choose to
write things.

The details of the costs may not be exactly what you are seeing but the
effect of compiling vrs interpreting will be there.

```
 0
Reply g.sande (1185) 8/4/2007 6:31:14 PM

14 Replies
68 Views

Similar Articles

12/9/2013 7:16:36 PM
[PageSpeed]