Hello,
I am working on merging two numerical codes which pretty much do the
same. The main routine operates on a set of arrays which are of type
real in one case and of type complex in another case.
This is because I work with Fourier transforms in one case. In this
case, I will have the real fields as well as the inverse FFT, which
allows many routines of the real program to be used.
So a good part of code is the same, but some routines work slightly
different.
To determine which type to use, the program parses a configuration file.
In pseudo-code, I would like something like this:
program do_stuff
logical do_complex
! parse config file sets do_complex = true or do_complex=false
call parse_config_file('input.ini', do_complex)
...
if (do_complex = true) then
complex, allocatable array1(:,:,:)
... (more arrays)
allocate arrays...
...
else
real(kind=8), allocatable array1(:,:,:)
... (more arrays)
allocate arrays...
end if
do t=0:T
if ( do_complex = true ) then
do_stuff_with_arrays_cplx(array1, array2, array3,...)
else
do_stuff_with_arrays_real(array1, array2, array3,...)
do_common_routines(array1, array2, ...)
end do
end program
What is the Fortran way of implementing this? I understand that I
cannot declare variables after my first executable statement.
I try to adhere to the Fortran 95 style since gfortran supports it. But
I mainly use the intel fortran compiler.
Any ideas or suggestions?
Cheers, Ralph
|
|
0
|
|
|
|
Reply
|
Ralph
|
9/22/2010 3:25:42 PM |
|
Ralph Kube wrote:
....
> In pseudo-code, I would like something like this:
....
> logical do_complex
> ...
> if (do_complex = true) then
> complex, allocatable array1(:,:,:)
> ...
> else
> real(kind=8), allocatable array1(:,:,:)
....
> What is the Fortran way of implementing this? I understand that I
> cannot declare variables after my first executable statement.
....
Declare the arrays ALLOCATABLE of each type without. Until the
appropriate ALLOCATE statement is executed depending on the logical
there is no memory loss.
If the routines are similar enough, one could write/use a generic
interface to handle the argument type difference as well.
--
|
|
0
|
|
|
|
Reply
|
dpb
|
9/22/2010 5:00:01 PM
|
|
dpb <none@non.net> wrote:
> Ralph Kube wrote:
> ...
>> In pseudo-code, I would like something like this:
>> logical do_complex
>> ...
>> if (do_complex = true) then
>> complex, allocatable array1(:,:,:)
>> ...
>> else
>> real(kind=8), allocatable array1(:,:,:)
(snip)
> Declare the arrays ALLOCATABLE of each type without. Until the
> appropriate ALLOCATE statement is executed depending on the logical
> there is no memory loss.
I believe that is a fine way to do it, though the OP didn't
say that memory use was the problem.
It is not unusual for FFT routines to do all the processing
in REAL arrays, even though called with COMPLEX. Even though
the standard didn't require it, in the Fortran 66 and Fortran 77
days it pretty much always worked. It should still work with
assumed size arrays.
FFT routines do a lot of multiplication by (0.,1.) which can
be done faster by just selecting the elements in the right order.
> If the routines are similar enough, one could write/use a generic
> interface to handle the argument type difference as well.
I think it would depend on the routines being used.
Well, also, it is common to need to call an FFT routine with
a real (imaginary part zero) array, and get a complex
(with the appropriate symmetry) result. There is usually
a special case for this in FFT libraries.
-- glen
|
|
0
|
|
|
|
Reply
|
glen
|
9/22/2010 6:53:41 PM
|
|
glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
> It is not unusual for FFT routines to do all the processing
> in REAL arrays, even though called with COMPLEX. Even though
> the standard didn't require it, in the Fortran 66 and Fortran 77
> days it pretty much always worked. It should still work with
> assumed size arrays.
That's for some values of "should". It isn't standard conforming and
never has been. Therefore, you might find compilers rejecting or
otherwise complaining about it. I agree that the usual implementations
ought to work as long as you can convince the compiler to "shut up and
trust you." That's actually a little harder these days because compilers
are more prone to check for such things.
It is pretty much guaranteed not to work for module procedures or
anything else with an explicit interface (unless you make the explicit
interface lie). Having assumed size arrays won't help there. That's
partly because generic procedures can disambiguate based on real versus
complex.
--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain
|
|
0
|
|
|
|
Reply
|
nospam
|
9/22/2010 7:54:51 PM
|
|
glen herrmannsfeldt wrote:
> dpb <none@non.net> wrote:
....
>> Declare the arrays ALLOCATABLE of each type without. Until the
>> appropriate ALLOCATE statement is executed depending on the logical
>> there is no memory loss.
>
> I believe that is a fine way to do it, though the OP didn't
> say that memory use was the problem.
True he didn't say it but one would presume it's in the back of the mind
in asking the question...
....
>> If the routines are similar enough, one could write/use a generic
>> interface to handle the argument type difference as well.
>
> I think it would depend on the routines being used.
....
Think???? :)
The problems of whether to make assumptions about storage order to fool
old-style F77 code I'll leave out as the OP mentioned specifically was
trying to be F95-aware. That I'd assume would mean using the features
of modules, etc., to catch coding errors as opposed to trying to hide
such from the compiler.
Hence, the suggestion couched w/ the "similar enough" w/ the express
intent of OP judging the applicability to his application space and I
leaving the crystal ball parked in the garage this time...
--
|
|
0
|
|
|
|
Reply
|
dpb
|
9/22/2010 9:40:07 PM
|
|
dpb <none@non.net> wrote:
> glen herrmannsfeldt wrote:
>> dpb <none@non.net> wrote:
> ...
>>> Declare the arrays ALLOCATABLE of each type without. Until the
>>> appropriate ALLOCATE statement is executed depending on the logical
>>> there is no memory loss.
>> I believe that is a fine way to do it, though the OP didn't
>> say that memory use was the problem.
> True he didn't say it but one would presume it's in the back of the mind
> in asking the question...
>>> If the routines are similar enough, one could write/use a generic
>>> interface to handle the argument type difference as well.
>> I think it would depend on the routines being used.
> ...
> Think???? :)
> The problems of whether to make assumptions about storage order to fool
> old-style F77 code I'll leave out as the OP mentioned specifically was
> trying to be F95-aware.
That is true, but it seems that the FFT is only in one of the cases.
It may be that it uses the FFT for the complex case, and something
completely different for the real case.
Generic is probably not the best choice for completely unrelated
algorithms that might have similar data as input.
More details are needed for a more specific reply.
-- glen
|
|
0
|
|
|
|
Reply
|
glen
|
9/22/2010 10:32:17 PM
|
|
glen herrmannsfeldt wrote:
....
> More details are needed for a more specific reply.
THINK????
--
|
|
0
|
|
|
|
Reply
|
dpb
|
9/22/2010 10:46:21 PM
|
|
On 09/23/2010 12:32 AM, glen herrmannsfeldt wrote:
> dpb<none@non.net> wrote:
>> glen herrmannsfeldt wrote:
>>> dpb<none@non.net> wrote:
>>>> If the routines are similar enough, one could write/use a generic
>>>> interface to handle the argument type difference as well.
>
>>> I think it would depend on the routines being used.
>> ...
>
>> Think???? :)
>
>> The problems of whether to make assumptions about storage order to fool
>> old-style F77 code I'll leave out as the OP mentioned specifically was
>> trying to be F95-aware.
>
> That is true, but it seems that the FFT is only in one of the cases.
>
> It may be that it uses the FFT for the complex case, and something
> completely different for the real case.
>
> Generic is probably not the best choice for completely unrelated
> algorithms that might have similar data as input.
>
> More details are needed for a more specific reply.
>
> -- glen
The code uses FFTs only in the case my arrays are of type complex. That
is because I use the fftw library and they advise against using their
routines that operate on real data over using the routines operating on
complex data.
So the way I see it, I have 2 possibilities for my problem:
1.) Allocate the array after I have decided which data type
(real/complex) is used
2.) Implement FFTs operating on real type arrays and use some flag to
decide which routines to call
Of course memory is in the back of my head, as I need to allocate many
arrays and do want to keep the memory imprint as low as possible.
I do not quiet understand what you mean when you say "make assumptions
about storage order...", could you give an example of what you mean by
this? I do have little experience with fortran.
|
|
0
|
|
|
|
Reply
|
Ralph
|
9/23/2010 6:51:22 AM
|
|
Ralph Kube <rku000@post.uit.no> wrote:
(snip, I wrote)
>> That is true, but it seems that the FFT is only in one of the cases.
>> It may be that it uses the FFT for the complex case, and something
>> completely different for the real case.
(snip)
> The code uses FFTs only in the case my arrays are of type complex. That
> is because I use the fftw library and they advise against using their
> routines that operate on real data over using the routines operating on
> complex data.
There is another case to consider, and that is the complex FFT
from real input data.
In the obvious case, you take N real data points, add a zero
imaginary part to create an N point complex array, and give
that to the FFT routine. If the input data describes an even
function in the FFT sense, such that X(i) equal X(N-i-2)
(for origin 1 arrays) the transform is real. In the case
of an odd function, such that X(i) equals -X(N-i-2), the
transform is pure imaginary. There is a process to convert
an N point real array into an N/2 point complex array,
transform that, then convert to the appropriate N point
complex result.
> I do not quiet understand what you mean when you say "make
> assumptions about storage order...", could you give an
> example of what you mean by this?
> I do have little experience with fortran.
Much of the Fortran 66 and Fortran 77 era FFT code
assumes that you can pass an N element COMPLEX array to a
subroutine, and treat it in the subroutine as a 2N element
REAL array. Among others, that depends on the real part coming
before the imaginary part in memory. This works out well
for the FFT where it is common to multiply elements by
(0.,1.), that is, i. Instead of actually doing a complex
multiply, the routine considers that
cmplx(-y,x) equal cmplx(x,y)*(0.,1.)
and selects the array elements in the appropriate order,
saving two multiplies by zero and two multiplies by one.
-- glen
|
|
0
|
|
|
|
Reply
|
glen
|
9/23/2010 11:17:52 AM
|
|
"Ralph Kube" <rku000@post.uit.no> wrote in message news:i7d760$2ote$1@inn.uit.no...
| Hello,
| I am working on merging two numerical codes which pretty much do the
| same. The main routine operates on a set of arrays which are of type
| real in one case and of type complex in another case.
| This is because I work with Fourier transforms in one case. In this
| case, I will have the real fields as well as the inverse FFT, which
| allows many routines of the real program to be used.
| So a good part of code is the same, but some routines work slightly
| different.
This kind of problem lends itself to use of generic procedures.
In your case, one procedure for real and the other for complex.
|
|
0
|
|
|
|
Reply
|
robin
|
9/23/2010 12:50:32 PM
|
|
glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
> Much of the Fortran 66 and Fortran 77 era FFT code
> assumes that you can pass an N element COMPLEX array to a
> subroutine, and treat it in the subroutine as a 2N element
> REAL array. Among others, that depends on the real part coming
> before the imaginary part in memory.
That part *IS* guaranteed by the standard. There is no guarantee that
you can make the call at all with the wrong type, but the memory order
is guaranteed. If you find an f66 or later compiler where that doesn't
work, the compiler is broken, and fairly badly so.
There are perfectly legal ways to do some variants of this and the
compiler has to get those right. In particular, you could use
equivalence. That's awkward in that it requires fixed array size, but it
is a case that was done.
--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain
|
|
0
|
|
|
|
Reply
|
nospam
|
9/23/2010 3:47:35 PM
|
|
Ralph Kube wrote:
....
> The code uses FFTs only in the case my arrays are of type complex. That
> is because I use the fftw library and they advise against using their
> routines that operate on real data over using the routines operating on
> complex data.
Well, I can't say I'm at all familiar w/ the fftw implementation, but at
www.fftw.org I find the following--
Features
FFTW 3.2.2 is the latest official version of FFTW (refer to the release
notes to find out what is new). Here is a list of some of FFTW's more
interesting features:
* Speed. (Supports SSE/SSE2/3dNow!/Altivec, since version 3.0.)
* Both one-dimensional and multi-dimensional transforms.
* Arbitrary-size transforms. (Sizes with small prime factors are
best, but FFTW uses O(N log N) algorithms even for prime sizes.)
* Fast transforms of purely real input or output data.
....
So, it doesn't seem to me that that coincides w/ the above statement.
Again, there isn't enough information about what you're really after or
how similar (or dissimilar) the two paths are to make any really
definitive recommendations, but I'd first of all wonder if the sizes are
such that it really matters much, anyway.
And, even if the size is significant, are you really going to routinely
treat a real time-series of twice the length as that of the complex
valued one so there's really any benefit in the problem size that is
analyzed one way versus the other?
IOW, is there really any benefit in even worrying over the two paths
particularly, especially since as noted the actual memory is only
allocated depending on the size and type requested (as the ALLOCATABLE
statement only creates a handle if you will, actual memory isn't
allocated until the ALLOCATE statement is executed)?
So, I'm wondering if you couldn't just have COMPLEX and forget about
REAL and just leave the imaginary part of the complex array zero for the
real inputs and be just as well off in terms of what actual problems you
solve.
That leaves the question of which algorithm to apply to be determined;
knowing nothing of the application that can't be answered or even
conjectured upon and that's not a Fortran question at all, of course.
Anyway, I'm wondering if you're making a problem that doesn't exist by
over-thinking it...
ps. In 40 years, I've never had a case where I had measured data to do
Fourier analysis on that was complex to start with -- every transducer I
ever had only had real outputs. So, I don't know the application that
would _start_ w/ complex data--I only had complex data arise _from_ the
FFT and proceeded from there.
I'm sure there must be areas that generate such--I just don't know what
they are; I always had measured stuff like vibrations, pressures, etc., ...
--
|
|
0
|
|
|
|
Reply
|
none1568 (6639)
|
9/23/2010 6:39:44 PM
|
|
"dpb" <none@non.net> wrote in message
news:i7g74g$q0j$1@news.eternal-september.org...
> So, I'm wondering if you couldn't just have COMPLEX and forget about REAL
> and just leave the imaginary part of the complex array zero for the real
> inputs and be just as well off in terms of what actual problems you solve.
> ps. In 40 years, I've never had a case where I had measured data to do
> Fourier analysis on that was complex to start with -- every transducer I
> ever had only had real outputs. So, I don't know the application that
> would _start_ w/ complex data--I only had complex data arise _from_ the
> FFT and proceeded from there.
That's why there are special algorithms that perform real-to-half-complex
FFTs in less than half the operations it takes to do the full complex FFT.
A classic paper in this regard is:
Sorensen, Jones, Heideman, and Burrus, "Real-valued fast Fourier transform
algorithms," IEEE Trans. Acoust. Speech Sig. Processing ASSP-35, 849-864
(1987).
Whenever I see someone post about wanting to compute FFTs it makes me think
of lambs being led to the slaughter. The first mistake is to do a full
complex FFT on real data and the second is to apply the convolution theorem
to calculate cyclic convolution when it is known not to be the fastest
method, at least in terms of operation counts. Since the lower operation
count alternative is also more readily adaptable to SIMD instruction sets,
it should be faster also in terms of wall-clock time although I don't know
of any fast convolution packages that implement it.
--
write(*,*) transfer(0.64682312090346863D-153,(/'X'/));end
|
|
0
|
|
|
|
Reply
|
James
|
9/23/2010 9:24:34 PM
|
|
James Van Buskirk wrote:
....
> That's why there are special algorithms that perform real-to-half-complex
> FFTs in less than half the operations it takes to do the full complex FFT.
....
But if it only takes a few 10s of msec anyway, what's the difference?
Now if OP has truly humongous datasets, then that's something else, but
there's no indication of anything at all beyond the generalities of what
the problem or problem size(s) is/are. I'm "just askin'"...
--
|
|
0
|
|
|
|
Reply
|
dpb
|
9/23/2010 9:37:58 PM
|
|
dpb <none@non.net> wrote:
> James Van Buskirk wrote:
>> That's why there are special algorithms that perform real-to-half-complex
>> FFTs in less than half the operations it takes to do the full complex FFT.
> But if it only takes a few 10s of msec anyway, what's the difference?
Just in the last few days, there is a discussion in comp.dsp about
doing 256M sample FFT.
That should tame more than 10ms on most hardware.
-- glen
|
|
0
|
|
|
|
Reply
|
glen
|
9/23/2010 9:57:12 PM
|
|
glen herrmannsfeldt wrote:
> dpb <none@non.net> wrote:
>> James Van Buskirk wrote:
>
>>> That's why there are special algorithms that perform real-to-half-complex
>>> FFTs in less than half the operations it takes to do the full complex FFT.
>
>> But if it only takes a few 10s of msec anyway, what's the difference?
>
> Just in the last few days, there is a discussion in comp.dsp about
> doing 256M sample FFT.
>
> That should tame more than 10ms on most hardware.
Same OP, I guess?
Indeed, that's large enough to pay some attention, granted.
OTOH, there are folks who post who think an array of a few K or 10s of K
is "large"...
Again, I was "just askin'"...
--
|
|
0
|
|
|
|
Reply
|
dpb
|
9/23/2010 11:09:19 PM
|
|
In article <i7ggim$6ce$1@news.eternal-september.org>,
James Van Buskirk <not_valid@comcast.net> wrote:
>
>That's why there are special algorithms that perform real-to-half-complex
>FFTs in less than half the operations it takes to do the full complex FFT.
LESS THAN half? Not really, because you could use two of them to do
a full complex FFT - it's a linear transform, after all. While you
can get extra gains by using a different result representation, the
same is true for the full FFT. All right, I'm nit-picking :-)
Regards,
Nick Maclaren.
|
|
0
|
|
|
|
Reply
|
nmm1
|
9/24/2010 8:15:02 AM
|
|
nmm1@cam.ac.uk wrote:
> In article <i7ggim$6ce$1@news.eternal-september.org>,
> James Van Buskirk <not_valid@comcast.net> wrote:
>>That's why there are special algorithms that perform real-to-half-complex
>>FFTs in less than half the operations it takes to do the full complex FFT.
> LESS THAN half? Not really, because you could use two of them to do
> a full complex FFT - it's a linear transform, after all. While you
> can get extra gains by using a different result representation, the
> same is true for the full FFT. All right, I'm nit-picking :-)
I thought about that, too, but then I figured that
(2N)log(2N)
---------- is (slightly) greater than two.
N log(N)
but then you have to include the time for the moving the
two halves around, too. It was close enough for me.
-- glen
|
|
0
|
|
|
|
Reply
|
glen
|
9/24/2010 12:30:50 PM
|
|
<nmm1@cam.ac.uk> wrote in message news:i7hmm6$g5f$1@gosset.csi.cam.ac.uk...
> In article <i7ggim$6ce$1@news.eternal-september.org>,
> James Van Buskirk <not_valid@comcast.net> wrote:
>>That's why there are special algorithms that perform real-to-half-complex
>>FFTs in less than half the operations it takes to do the full complex FFT.
> LESS THAN half? Not really, because you could use two of them to do
> a full complex FFT - it's a linear transform, after all. While you
> can get extra gains by using a different result representation, the
> same is true for the full FFT. All right, I'm nit-picking :-)
While the number of multiplications to perform a full-complex FFT is exactly
twice that required to perform a real-to-half-complex FFT, the number of
additions required for a full-complex FFT, a_k(F) = 2*2**k-4+2*a_k(RF),
where a_k(RF) is the number of additions required to perfrom a real-to-half-
complex FFT of order 2**k. The 2*a_k(RF) part is there to do the real-to-
half-complex FFT on the real and imaginary parts separately, and that last
2*2**k-4 additions are there to combine the real and imaginary parts when
most of the work is done. Look at the operation count tables in the paper
I cited, or work it out yourself for k = 2, N = 2**k = 4.
--
write(*,*) transfer(0.64682312090346863D-153,(/'X'/));end
|
|
0
|
|
|
|
Reply
|
James
|
9/24/2010 1:35:31 PM
|
|
In article <i7i9f6$m1a$1@news.eternal-september.org>,
James Van Buskirk <not_valid@comcast.net> wrote:
>
>>>That's why there are special algorithms that perform real-to-half-complex
>>>FFTs in less than half the operations it takes to do the full complex FFT.
>
>> LESS THAN half? Not really, because you could use two of them to do
>> a full complex FFT - it's a linear transform, after all. While you
>> can get extra gains by using a different result representation, the
>> same is true for the full FFT. All right, I'm nit-picking :-)
>
>While the number of multiplications to perform a full-complex FFT is exactly
>twice that required to perform a real-to-half-complex FFT, the number of
>additions required for a full-complex FFT, a_k(F) = 2*2**k-4+2*a_k(RF),
>where a_k(RF) is the number of additions required to perfrom a real-to-half-
>complex FFT of order 2**k. The 2*a_k(RF) part is there to do the real-to-
>half-complex FFT on the real and imaginary parts separately, and that last
>2*2**k-4 additions are there to combine the real and imaginary parts when
>most of the work is done. Look at the operation count tables in the paper
>I cited, or work it out yourself for k = 2, N = 2**k = 4.
I think that you have misunderstood my point. The advantage you refer
to is an artifact of the representation and definition of operation,
and is not fundamental. For example, if you use accumulation as the
basic storage and addition operation, you get a different answer.
Similarly, if you choose to deliver a pair of numbers instead of one
as an element of the result. And so on.
This is more relevant than you may realise, because the classic
'operation count' ceased to be a useful indication of performance round
about 1970.
Regards,
Nick Maclaren.
|
|
0
|
|
|
|
Reply
|
nmm1
|
9/24/2010 2:54:46 PM
|
|
nmm1@cam.ac.uk wrote:
(snip, someone wrote)
>> The 2*a_k(RF) part is there to do the real-to-
>>half-complex FFT on the real and imaginary parts separately, and that last
>>2*2**k-4 additions are there to combine the real and imaginary parts when
>>most of the work is done. Look at the operation count tables in the paper
>>I cited, or work it out yourself for k = 2, N = 2**k = 4.
> I think that you have misunderstood my point. The advantage you refer
> to is an artifact of the representation and definition of operation,
> and is not fundamental. For example, if you use accumulation as the
> basic storage and addition operation, you get a different answer.
> Similarly, if you choose to deliver a pair of numbers instead of one
> as an element of the result. And so on.
If you want to better, then you need to be more specific, and
specify the exact FFT implementation. But yes, among others,
many machines should be able to pipeline the multiply-add,
even if they don't do it in one instruction.
> This is more relevant than you may realise, because the classic
> 'operation count' ceased to be a useful indication of performance round
> about 1970.
Well, some of that went away for a while as people moved from
supercomputers to personal computers with floating point processors.
Even so, I don't believe that they have quite caught up to what
high-end processors were doing in the 1970s.
Also, I do wonder how many now optimize the case for multiply by
zero or one. That could make a big difference in some cases
of the FFT, especially the case of a complex FFT with the
imaginary part all zero.
-- glen
|
|
0
|
|
|
|
Reply
|
glen
|
9/24/2010 5:07:27 PM
|
|
In article <i7ilse$qno$1@speranza.aioe.org>,
glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
>
>> This is more relevant than you may realise, because the classic
>> 'operation count' ceased to be a useful indication of performance round
>> about 1970.
>
>Well, some of that went away for a while as people moved from
>supercomputers to personal computers with floating point processors.
>Even so, I don't believe that they have quite caught up to what
>high-end processors were doing in the 1970s.
That wasn't quite my point. It never was realistic to count floating
point operations and use them to estimate performance, as addition
was always much cheaper than multiplication which was much cheaper
than division. But, after about 1970, floating-point addition
became cheaper than memory access, so the measure became totally
unrealistic.
Yes, there was a period in the 1980s when the microprocessors had
the performance characteristics of machines 15 years before that,
but that was short-lived and is long gone, anyway.
Regards,
Nick Maclaren.
|
|
0
|
|
|
|
Reply
|
nmm1
|
9/24/2010 6:03:01 PM
|
|
nmm1@cam.ac.uk wrote:
(nick wrote)
>>> This is more relevant than you may realise, because the classic
>>> 'operation count' ceased to be a useful indication of performance round
>>> about 1970.
(and then I wrote)
>>Well, some of that went away for a while as people moved from
>>supercomputers to personal computers with floating point processors.
>>Even so, I don't believe that they have quite caught up to what
>>high-end processors were doing in the 1970s.
> That wasn't quite my point. It never was realistic to count floating
> point operations and use them to estimate performance, as addition
> was always much cheaper than multiplication which was much cheaper
> than division.
OK, but, for a given type of problem, the ratios (add to multiply,
multiply to divide) don't change so much, so it might not be
so far off. Now, when people quote megaFLOPS or gigaFLOPS, well,
yes, that isn't so useful. About as good as the Meaningless
Indicator of Processor Speed that people used to use (MIPS).
> But, after about 1970, floating-point addition
> became cheaper than memory access, so the measure became totally
> unrealistic.
So, yes, many algorithms are memory access limited.
> Yes, there was a period in the 1980s when the microprocessors had
> the performance characteristics of machines 15 years before that,
> but that was short-lived and is long gone, anyway.
As far as I know, no current processors use the fast iterative
divide algorithms that were used in high-end processors like
the Cray-1 and 360/91. But maybe most of the time divide isn't as
important as multiply.
-- glen
|
|
0
|
|
|
|
Reply
|
glen
|
9/24/2010 8:01:10 PM
|
|
<nmm1@cam.ac.uk> wrote in message news:i7ipeb$qvc$1@gosset.csi.cam.ac.uk...
> That wasn't quite my point. It never was realistic to count floating
> point operations and use them to estimate performance, as addition
> was always much cheaper than multiplication which was much cheaper
> than division. But, after about 1970, floating-point addition
> became cheaper than memory access, so the measure became totally
> unrealistic.
Actually, with a fully pipelined floating point multiplier, multiplies
are free and additions only have cost. This is true because there are
more additions than multiplications in an FFT. Memory access is indeed
more expensive than anything else per operation, but the key to
performance in FFTs is to run small computational kernels from the
register file while keeping supersets of them in cache. The only
true memory operations you need are a few transposes -- this is O(N)
not O(N*log(N)) provided your cache can hold at least sqrt(N) values.
--
write(*,*) transfer(0.64682312090346863D-153,(/'X'/));end
|
|
0
|
|
|
|
Reply
|
James
|
9/24/2010 8:46:12 PM
|
|
On 09/23/2010 08:39 PM, dpb wrote:
> Ralph Kube wrote:
> ...
>> The code uses FFTs only in the case my arrays are of type complex.
>> That is because I use the fftw library and they advise against using
>> their routines that operate on real data over using the routines
>> operating on complex data.
> * Fast transforms of purely real input or output data.
> ...
>
> So, it doesn't seem to me that that coincides w/ the above statement.
Digging deeper in the documentation, the section about 2d transformation
of real data
(http://www.fftw.org/fftw3_doc/The-Halfcomplex_002dformat-DFT.html#The-Halfcomplex_002dformat-DFT)
I find:
> Thus, if a multi-dimensional real-input/output DFT is required, we
> recommend using the ordinary r2c/c2r interface (see Multi-Dimensional
> DFTs of Real Data).
> Again, there isn't enough information about what you're really after or
> how similar (or dissimilar) the two paths are to make any really
> definitive recommendations, but I'd first of all wonder if the sizes are
> such that it really matters much, anyway.
> And, even if the size is significant, are you really going to routinely
> treat a real time-series of twice the length as that of the complex
> valued one so there's really any benefit in the problem size that is
> analyzed one way versus the other?
My program integrates a set of PDEs in time. When I use periodic
boundary conditions, I reduce the problem to time integration of ODEs
by Fourier transforms. In this case I use complex arrays for the Fourier
coefficients.
For Dirichlet or Neuman BCs I use a finite difference scheme and real
arrays. Solving this equations requires different techniques.
Nevertheless, in both cases I use many routines that are quiet similar.
So far I have two different programs, depending on which boundary
conditions I use. It would be much more convenient to have only one
program there.
Typical FFT sizes are 256x256 - 8192x8192, there are 3 fields I
transform and I store 4 timesteps of each field. This gives
2**13 * 2**13 * 16 * 3 * 4 = 12GB as an upper limit of memory usage
for the complex array case. This is quiet a lot.
Good that there are computers with more RAM.
> IOW, is there really any benefit in even worrying over the two paths
> particularly, especially since as noted the actual memory is only
> allocated depending on the size and type requested (as the ALLOCATABLE
> statement only creates a handle if you will, actual memory isn't
> allocated until the ALLOCATE statement is executed)?
I toyed around with the code now and will probably go this way:
real(kind=8), allocatable :: theta_r(:,:,:)
double complex, allocatable :: theta_c(:,:,:)
if real then
allocate theta_r
else
allocate theta_c
I think in combination with generic type bound procedures this will be a
good solution to my problem.
> So, I'm wondering if you couldn't just have COMPLEX and forget about
> REAL and just leave the imaginary part of the complex array zero for the
> real inputs and be just as well off in terms of what actual problems you
> solve.
This is a very good idea. But unfortunately there are some pieces of
code that take DFTs of the real arrays as well and they make assumption
on the ordering of the array in memory. Otherwise would this be a great
solution. But I will definitely look into it, as these parts of the code
look a bit too obscure for my taste. I might rewrite them in the future.
> That leaves the question of which algorithm to apply to be determined;
> knowing nothing of the application that can't be answered or even
> conjectured upon and that's not a Fortran question at all, of course.
No, its not a Fortran question.
> Anyway, I'm wondering if you're making a problem that doesn't exist by
> over-thinking it...
Probably :) For me Fortran is a really tough language to learn. But I
really appreciate the discussions on this list.
Cheers, Ralph
|
|
0
|
|
|
|
Reply
|
rku000 (7)
|
9/25/2010 3:36:33 PM
|
|
Ralph Kube <rku000@post.uit.no> wrote:
> real(kind=8), allocatable :: theta_r(:,:,:)
> double complex, allocatable :: theta_c(:,:,:)
I advise against using the nonstandard double complex in new code. It
won't work on some compilers. There were reasons for in in f77, which
had no portable alternatives. Some f77 compilers supported a declaration
as double complex, others accepted complex*16, some accepted both, and
some accepted neither.
But as long as you are using f90 constructs anyway, and in particular as
long as you are already using kind= specifier for real, it makes a lot
more sense to use it for complex as well. I suppose you might not be
aware that the kind= specifier is also allowed for complex and is
guaranteed to use the same kind values as real (you don't double the
value, as was typical with the old nonstandard complex*16 syntax). Thus
complex(kind=8)
instead of double complex.
I also recommend against using hard-wired kind numbers. The standard
doesn't guarantee those values and there are compilers that don't use
them. For example, for the compiler I preferentially use, real(kind=8)
is an error; double precision is real(kind=2), at least with the default
compiler settings. I might have thought you just posted that form to
make the example brief, and perhaps you did, but your use of double
complex instead of complex with a kind specifier makes me at least half
suspect that you aren't aware of the issue. I recommend instead
something like
integer, parameter :: dp_kind = kind(0.0d0)
!-- or better yet, use selected_real_kind, but I'm in a slight hurry
!-- and that takes longer to explain.
!-- The name dp_kind is just my personal style;
!-- I like to have _kind in the parameter names and then omit
!-- the optional kind= keyword in the specifier, but other people
!-- use different styles on that.
real(dp_kind), allocatable :: theta_r(:,:,:)
complex(dp_kind), allocatable :: theta_c(:,:,:)
--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain
|
|
0
|
|
|
|
Reply
|
nospam
|
9/25/2010 4:26:45 PM
|
|
On Thu, 23 Sep 2010 16:37:58 -0500, dpb <none@non.net>
wrote in <i7ghim$ai9$1@news.eternal-september.org>:
> James Van Buskirk wrote:
> ...
>> That's why there are special algorithms that perform real-to-half-complex
>> FFTs in less than half the operations it takes to do the full complex FFT.
> ...
> But if it only takes a few 10s of msec anyway, what's the difference?
> Now if OP has truly humongous datasets, then that's something else, but
> there's no indication of anything at all beyond the generalities of what
> the problem or problem size(s) is/are. I'm "just askin'"...
Well, in a case I worked on a while back, using a
real-to-half-cplx function meant that I could fit a 4Kx4K digital hologram
reconstruction entirely within the memory space of my 512 MB graphics
card. Using the original "add a zero imaginary part and do a full complex
transform" method I didn't have enough GPU memory to do the intermediate
transform once I managed to parallelise it (rather than bringing the FFT
results back out to PC memory for serial transform code). The resultant
speed-up owed a lot to avoiding the intermediate transfers, not just the
parallel processing of the transform.
--
Ivan Reid, School of Engineering & Design, _____________ CMS Collaboration,
Brunel University. Ivan.Reid@[brunel.ac.uk|cern.ch] Room 40-1-B12, CERN
KotPT -- "for stupidity above and beyond the call of duty".
|
|
0
|
|
|
|
Reply
|
Dr
|
10/7/2010 9:33:51 PM
|
|
|
26 Replies
278 Views
(page loaded in 0.161 seconds)
Similiar Articles: Problem with the variable type of function 'isfinite' - comp.soft ...how to define the input variable for different functions with same ..... to define ... function pushbutton1_Callback(hObject ... Defining variable type at runtime? - comp ... Static reflection - a base for runtime reflection? - comp.lang.c++ ...... but they are old and most is on runtime ... name of qualifiers, name of parameter variables etc e.g. type ... Define a variable at a fixed address? - comp.compilers.lcc ... class definition containing member of own type - comp.lang.c++ ...What is "incomplete type?" (compiler error) - comp.lang.c++ ... Static reflection - a base for runtime ... 0X Iterating over the member variables of any class type ... non ... Define a variable at a fixed address? - comp.compilers.lcc ...I need to define a variable at a fixed address of ... as argument or when assigning to a variable of that type. What is the ... ... Static reflection - a base for runtime ... Calculating member offset at runtime - comp.lang.c++.moderated ...But if I don't mind doing the offset calculation at runtime, would a ... define X(TYPE, FIELD) TYPE FIELD; P__FIELDLIST #undef X #define X(TYPE ... Global Variable vs Struct Variable Question - comp.lang.asm.x86 ...You could have a master function that could determine the "type" of ... How can we define global but constant variable? - comp.soft-sys ..... global but constant variable ... Get variable by name - comp.lang.asm.x86I need get a variable value passing it's name as ... single_argument(argument,name) begin puts "Type ... ... Computer Hardware Hi all, I want to get variable's name on runtime, any ... Detecting if a variable is defined - comp.soft-sys.matlab ...... and a variable to a subroutine ..... is put into READ-ONLY memory and the address of that ... be you six months from now) understands what is ... Defining variable type ... BASIC problem calling LIB$ RTL - comp.os.vms... as a numeric variable of the default type and size. The > > implicitly declared variable was initialized to zero at runtime ... symbolic editor. >>I prefer seeing what is ... converting a character string into a variable name - comp.lang ...... PRESSURE(istart:iend), I would store 'PRESSURE' in a character variable say CHAROUT of type ... the executable for the benefit of a debugger or error traceback runtime. Pointers to global and stack variables - comp.compilers... making pointers point > to a array based on runtime ... 58 -0800, satishkingdom@GMAIL.COM wrote: > >>What is ... Define a variable at a fixed address? - comp.compilers.lcc ... What is "incomplete type?" (compiler error) - comp.lang.c++ ...What is a "Type 3 Error"? - comp.sys.mac.apps... unstuffed it, etc. went to ... There are other variables > such as class_out declared as type character. All of my variables ... Explanation needed for const int "error: variably modified ... at ...You still have to look up the value at runtime (since ... Automatic const variables that are defined without an ... they do run the risk of people reading more into what is ... JScript Error 800a1391 'MM_NameOfPage_STRING' is undefined - comp ...... receive the following error: Microsoft JScript runtime ... asp" --> <% // *** Edit Operations: declare variables ... so the error message matches to the error type ... Defining function in Mupad - comp.soft-sys.matlab... piecewise functions using matlabFunction ..... set the variable ... or method 'gt' for input > arguments of type 'sym'. > > What is the reason for this? ... arguments of type ... Variables and Arguments - Microsoft Corporation: Software ...Variables take on values at runtime and these values are ... definition specifies the type of the variable ... // Define a variable named "str" of type string. Variable< string > ... c# - How to define a ref-type variable from a assembly loaded at ...In order to define a type at compile time, you must reference the assembly that contains the type at compile time. If you are binding at runtime, you cannot define an ... 7/22/2012 1:33:31 PM
|