Hi all,
I have a huge F90 code composed of several modules, with several
module procedures each, and a main program. No external procedures are
used. I've come across this situation: some modules have very large
arrays declared in the global scope (with sizes of order ~50000 each),
but some of these arrays are only used conditionally. They might be
used elsewhere, but there's also the possibility that they might not.
The situation is the following, schematically:
MODULE mod
implicit none
integer, parameter :: N = 50000
real, allocatable, save :: U(:)
real :: V(N)
CONTAINS
subroutine using_UV
if (.not. allocated(U)) allocate(U(N))   ! safe if called more than once
!exec-statements using U and V
end subroutine using_UV
subroutine using_V
!exec-statements using V
end subroutine using_V
subroutine kill_U
if (allocated(U)) deallocate(U)   ! safe if U was never allocated
end subroutine kill_U
END MODULE mod
PROGRAM foo
use mod, only: U, V, using_UV, using_V, kill_U
implicit none
logical :: L
!(...)
call using_V
!calculate L:
L = <something>
if (L) then
call using_UV
!exec-statements using U and V
call kill_U
end if
!(...)
END PROGRAM foo
The array U will only be used if L is .TRUE., but V is necessary to
exist. Naively, I would think that when U is declared explicitly as an
automatic array ( REAL :: U(N) ), it allocates a large chunk of memory
that isn't used, hence that would make the program more inefficient
than when U is given the attribute ALLOCATABLE. But I don't know if
this will make any difference in speed or memory management after the
program is compiled (especially if the arrays get even larger).
Therefore my naive question is: is the attribute ALLOCATABLE for U
wise in this case, or is it irrelevant?
Best,
--helvio
helvio.vairinhos | 6/2/2010 4:14:31 AM
helvio <helvio.vairinhos@googlemail.com> wrote:
> some modules have very large
> arrays declared in the global scope (with sizes of order ~50000 each),
> but some of these arrays are only used conditionally....
> Naively, I would think that when U is declared explicitly as an
> automatic array ( REAL :: U(N) ), it allocates a large chunk of memory
> that isn't used, hence that would make the program more inefficient
> than when U is given the attribute ALLOCATABLE. But I don't know if
> this will make any difference in speed or memory management after the
> program is compiled (especially if the arrays get even larger).
>
> Therefore my naive question is: is the attribute ALLOCATABLE for U
> wise in this case, or is it irrelevant?
I would say that your analysis seems accurate... except for one very
important detail. There was a day when things were different, but today,
unless your environment is extremely unusual, a size of 50,000 single
precision reals (so 200,000 bytes with most compilers) does not count as
"very large". Consider that your typical consumer computer today
probably has around 10,000 times that much physical memory, and a
virtual address space of also about 10,000 times that much. Even an old
machine that isn't big enough to install a current OS version probably
has physical memory of 1,000 times that.
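The 200,000-byte figure is just the element count times the storage size of one default REAL. A minimal Fortran 2008 sketch of that arithmetic follows; the module and function names are hypothetical, and the 200,000 result assumes a 32-bit default REAL, which is common but not guaranteed by the standard.

```fortran
module mem_estimate
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  ! Bytes occupied by n elements of default REAL.
  ! storage_size() (Fortran 2008) returns the size of one element in bits.
  function bytes_for(n) result(nbytes)
    integer(int64), intent(in) :: n
    integer(int64) :: nbytes
    nbytes = n * int(storage_size(1.0), int64) / 8_int64
  end function bytes_for
end module mem_estimate
```

With a 32-bit default REAL, `bytes_for(50000_int64)` gives 200000, matching the estimate above.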
If you really had arrays large enough to have significant impact on
memory availability, then I'd agree with your concern. But you are
multiple orders of magnitude away from that area. On any reasonable
current system, allocating 200kB of virtual memory that you don't
reference is not likely to make a difference that you can measure.
Neither is 10 times that much, or 100. At 1,000 times that much, you
might have grounds to worry.
So do it whichever way is most convenient for other reasons. I'd not
worry about the performance difference.
Do note that I am talking only about the difference between allocatable
versus automatic. There can be more cost to declaring static arrays,
even of relatively modest size (because they often end up bloating the
size of the executable program file and need to be loaded from it into
memory).
--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain
nospam | 6/2/2010 4:52:15 AM
Richard Maine <nospam@see.signature> wrote:
> helvio <helvio.vairinhos@googlemail.com> wrote:
>> some modules have very large
>> arrays declared in the global scope (with sizes of order ~50000 each),
>> but some of these arrays are only used conditionally....
>> Naively, I would think that when U is declared explicitly as an
>> automatic array ( REAL :: U(N) ), it allocates a large chunk of memory
>> that isn't used, hence that would make the program more inefficient
>> than when U is given the attribute ALLOCATABLE. But I don't know if
>> this will make any difference in speed or memory management after the
>> program is compiled (especially if the arrays get even larger).
If it is in a module like that, I believe it will be static if
not allocatable. On many machines, access to static arrays is
slightly more efficient than automatic, and automatic might be
a little more efficient than allocatable.
The differences are pretty small, and pipelining might even
remove them. You probably shouldn't worry about them unless
you actually have timing results that show that they are
specifically a problem.
Automatic arrays are often on the stack, which requires that
enough (virtual memory for) stack be available. Watch for
systems with stack limits.
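To make the stack concern concrete, here is a sketch (hypothetical module and function names) of the same scratch computation written once with an automatic array and once with an allocatable local. Where the automatic array actually lives depends on the compiler and its options; the allocatable version normally takes its storage from the heap, so it is not subject to a stack limit, and since Fortran 95 it is deallocated automatically when the function returns.

```fortran
module scratch_demo
  implicit none
contains
  ! Automatic array: its size is known only at run time, and depending on
  ! compiler and options it may be placed on the stack.
  real function sum_automatic(n) result(s)
    integer, intent(in) :: n
    real :: tmp(n)
    tmp = 1.0
    s = sum(tmp)
  end function sum_automatic

  ! Allocatable local: storage is normally taken from the heap and is
  ! released automatically on return (Fortran 95 and later).
  real function sum_allocatable(n) result(s)
    integer, intent(in) :: n
    real, allocatable :: tmp(:)
    allocate(tmp(n))
    tmp = 1.0
    s = sum(tmp)
  end function sum_allocatable
end module scratch_demo
```

Raising the shell's stack limit (for example with `ulimit -s` on Unix-like systems) is the other common workaround when large automatic arrays hit that limit.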
>> Therefore my naive question is: is the attribute ALLOCATABLE for U
>> wise in this case, or is it irrelevant?
> I would say that your analysis seems accurate... except for one very
> important detail. There was a day when things were different, but today,
> unless your environment is extremely unusual, a size of 50,000 single
> precision reals (so 200,000 bytes with most compilers) does not count as
> "very large". Consider that your typical consumer computer today
> probably has around 10,000 times that much physical memory, and a
> virtual address space of also about 10,000 times that much. Even an old
> machine that isn't big enough to install a current OS version probably
> has physical memory of 1,000 times that.
Well, also that with a virtual memory system it should not even
be in memory if not accessed. (With the usual 4K page size.)
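For the 200,000-byte array discussed here and a 4 KiB (4096-byte) page, the number of pages the array covers is roughly

$$\left\lceil \frac{200\,000}{4096} \right\rceil = \lceil 48.83\ldots \rceil = 49$$

(possibly one more if the array does not start on a page boundary). Pages that are never referenced need not be backed by physical memory at all.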
> If you really had arrays large enough to have significant impact on
> memory availability, then I'd agree with your concern. But you are
> multiple orders of magnitude away from that area. On any reasonable
> current system, allocating 200kB of virtual memory that you don't
> reference is not likely to make a difference that you can measure.
> Neither is 10 times that much, or 100. At 1,000 times that much, you
> might have grounds to worry.
There are still some strange systems around. I had problems
once on an Alpha system with 4GB memory trying to allocate
100K of static memory. At least with some systems and compilers
there is a small limit for static allocation.
> So do it whichever way is most convenient for other reasons. I'd not
> worry about the performance difference.
> Do note that I am talking only about the difference between allocatable
> versus automatic. There can be more cost to declaring static arrays,
> even of relatively modest size (because they often end up bloating the
> size of the executable program file and need to be loaded from it into
> memory).
That should only happen for initialized static data. C requires
that all static data be initialized (to zero if not otherwise
specified), and some Fortran compilers also do that. It seems
that many can efficiently initialize static zeros, but other
than zeros are written to the executable file. (Even very
large arrays of the same value.)
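A sketch of the two cases distinguished above, with hypothetical module and variable names. How each array is laid out is up to the compiler and linker; on many Unix-like toolchains an array without an initializer goes into a zero-filled segment that takes no space in the executable file, while a non-zero initializer may be written out in full. Comparing the output of the `size` utility for two builds, with and without the initializer, is one way to check on a particular system.

```fortran
module static_init_demo
  implicit none
  integer, parameter :: N = 50000
  ! No initializer: storage is typically reserved in a zero-filled segment.
  ! The standard leaves the initial values undefined, so do not rely on zeros.
  real, save :: uninit(N)
  ! Non-zero initializer: every element is set to 1.0, and all N values may
  ! end up stored in the executable's initialized-data section.
  real, save :: ones(N) = 1.0
end module static_init_demo
```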
-- glen
glen | 6/2/2010 5:30:10 AM
glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
> Richard Maine <nospam@see.signature> wrote:
> > helvio <helvio.vairinhos@googlemail.com> wrote:
>
> >> some modules have very large
> >> arrays declared in the global scope (with sizes of order ~50000 each),
> >> but some of these arrays are only used conditionally....
....
> If it is in a module like that, I believe it will be static if
> not allocatable.
Oops. I missed that because I didn't look at the code closely enough. I
just took the OP's word for it that he was talking about automatic
arrays, but that was inaccurate. An array dimensioned N is not automatic
when N is a parameter, as in the example code. In a non-module scope, it
might possibly be implemented in a way that looks like an automatic, but
it is not automatic as defined by the Fortran standard. In a module
scope, as you imply, you can't have an array dimensioned N if N isn't a
parameter.
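The distinction can be seen side by side in a small sketch (hypothetical names): V has constant bounds, so it is an ordinary explicit-shape array with static storage, whereas w inside the function is an automatic array because its bound is only known when the function is called.

```fortran
module array_kinds_demo
  implicit none
  integer, parameter :: N = 50000
  ! Explicit-shape array with constant bounds in a module: static storage,
  ! not an automatic array.
  real, save :: V(N)
contains
  ! w is automatic: a local (non-dummy) array whose bound depends on a
  ! run-time value, here the size of the dummy argument x.
  real function scaled_sum(x, factor) result(s)
    real, intent(in) :: x(:)
    real, intent(in) :: factor
    real :: w(size(x))
    w = factor * x
    s = sum(w)
  end function scaled_sum
end module array_kinds_demo
```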
> There are still some strange systems around. I had problems
> once on an Alpha system with 4GB memory trying to allocate
> 100K of static memory.
Yes, there are strange systems around. But any one where you have
trouble with 100k of static memory easily fits in the category I
described as "unless your environment is extremely unusual."
--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain
nospam | 6/2/2010 5:49:35 AM
On Jun 2, 6:30 am, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> Richard Maine <nos...@see.signature> wrote:
> > helvio <helvio.vairin...@googlemail.com> wrote:
>
> If it is in a module like that, I believe it will be static if not allocatable.
Yes, sorry, I meant static and not automatic. I have several instances
of automatic arrays in my code, but this is not the case.
My question comes from the fact that my code contains several
instances of such modules as in my example, with many potentially non-
usable arrays like "U", but also many "V" arrays. In fact, I have more
"U"'s than "V"'s, in terms of the amount of allocated memory they
require. I am not afraid of exceeding physical memory space, I think I
am still far from that (at least for now), but I was worried that the
efficiency in accessing the "V"'s in physical memory would be affected
by the size of the total allocated memory. Naively, I thought that the
less allocated memory I have, the more efficient it is to access it.
And since I have the option of declaring the "U"'s as non-static, I
imagined that by having *only* "V"'s allocated in the physical memory,
their access would be more efficient especially if the sizes of "U"
and "V" increase.
In sum, I think my doubts reduce to the question of whether the
efficiency of accessing the physical memory depends on the size of the
allocated memory, or if it is independent of it.
I also have the following related question: can the ALLOCATION /
DEALLOCATION statements slow down the program if they are called
multiple times, as compared with a single static declaration of "U"?
e.g. by introducing a loop in my example above:
do i=1,M
call using_UV ! U is allocated here
call kill_U ! U is deallocated here
end do
Thanks!
--helvio
helvio | 6/2/2010 12:46:36 PM
"helvio" <helvio.vairinhos@googlemail.com> wrote in message
news:0a10cc09-880c-4c42-9b9d-21a499ec18ff@y12g2000vbr.googlegroups.com...
| Hi all,
|
| I have a huge F90 code composed of several modules, with several
| module procedures each, and a main program. No external procedures are
| used. I've come across this situation: some modules have very large
| arrays declared in the global scope (with sizes of order ~50000 each),
| but some of these arrays are only used conditionally. They might be
| used elsewhere, but there's also the possibility that they might not.
| The situation is the following, schematically:
If U is used conditionally, ALLOCATABLE is fine.
That's the kind of thing for which it is intended to be used.
I notice that N is 50,000.
Is that some maximum value, or is it just a value that is larger
than anything you expect.
For instance, could N be read in?
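If N can instead come from input, ALLOCATABLE is what lets the size be chosen at run time. A minimal sketch, with hypothetical module and subroutine names:

```fortran
module runtime_size_demo
  implicit none
  real, allocatable, save :: U(:)
contains
  ! Give U exactly n elements (n known only at run time),
  ! discarding any previous allocation.
  subroutine setup_U(n)
    integer, intent(in) :: n
    if (allocated(U)) deallocate(U)
    allocate(U(n))
    U = 0.0
  end subroutine setup_U
end module runtime_size_demo
```

A caller would typically do `read(*,*) n` and then `call setup_U(n)`.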
robin | 6/2/2010 1:20:09 PM
helvio <helvio.vairinhos@googlemail.com> wrote:
> In sum, I think my doubts reduce to the question of whether the
> efficiency of accessing the physical memory depends on the size of the
> allocated memory, or if it is independent of it.
It should be independent of it, or anyway close enough that you won't be
able to measure the difference.
>
> I also have the following related question: can the ALLOCATION /
> DEALLOCATION statements slow down the program if they are called
> multiple times, as compared with a single static declaration of "U"?
Yes, definitely. Of course, as with many things, that's only going to
matter if it is in an inner enough loop to be called lots of times.
It would seem that some of the classic advice on performance
optimization is in order. I don't feel like digging up the exact quotes;
there are some pretty well known ones. But roughly...
1. Worry about making the code right before you worry about making it
fast.
2. Improvements in algorithm are worth far more than code tweaks.
3. Even major improvements in code performance aren't going to matter
unless they are in time-critical portions of the program in the first
place. That one applies to your question above. Allocation and
deallocation do take time, but if much computation happens between the
allocation and deallocation, then their time usage is not likely to
matter relative to the computation.
4. When you do get to trying to tweak code to improve performance,
*MEASURE* the effects with your own code. Even experts can and regularly
do get surprised and things do vary from code to code. That means you
should not just accept performance judgements that people might give you
here. Yes, "people" includes me.
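In the spirit of point 4, a sketch of a measurement harness for the allocate-in-the-loop question, using the intrinsic SYSTEM_CLOCK. The module name, the trivial update standing in for real work, and the printed checksum are all hypothetical; the numbers only mean something once the real computation is substituted.

```fortran
module alloc_timing
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
contains
  ! Times m passes of a trivial update on an n-element scratch array
  ! (n >= 1), once with ALLOCATE/DEALLOCATE inside the loop and once with
  ! the allocation hoisted out of it. Results are wall-clock seconds.
  subroutine time_both(n, m, t_inside, t_hoisted)
    integer, intent(in) :: n, m
    real, intent(out) :: t_inside, t_hoisted
    real, allocatable :: u(:)
    integer(int64) :: t0, t1, rate
    integer :: i
    real :: acc

    acc = 0.0
    call system_clock(t0, rate)
    do i = 1, m
      allocate(u(n))
      u = real(i)
      acc = acc + u(n)
      deallocate(u)
    end do
    call system_clock(t1)
    t_inside = real(t1 - t0) / real(rate)

    call system_clock(t0)
    allocate(u(n))
    do i = 1, m
      u = real(i)
      acc = acc + u(n)
    end do
    deallocate(u)
    call system_clock(t1)
    t_hoisted = real(t1 - t0) / real(rate)

    ! Use the accumulated value so the loops' results are observable.
    print *, 'checksum:', acc
  end subroutine time_both
end module alloc_timing
```

Comparing `t_inside` and `t_hoisted` for realistic sizes, and with the real work in place, is the kind of measurement point 4 asks for; the outcome can vary between codes, compilers and systems.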
--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain
nospam | 6/2/2010 3:35:50 PM
On Jun 2, 4:35 pm, nos...@see.signature (Richard Maine) wrote:
> helvio <helvio.vairin...@googlemail.com> wrote:
> > In sum, I think my doubts reduce to the question of whether the
> > efficiency of accessing the physical memory depends on the size of the
> > allocated memory, or if it is independent of it.
>
> It should be independent of it, or anyway close enough that you won't be
> able to measure the difference.
Thank you! :)
> > I also have the following related question: can the ALLOCATION /
> > DEALLOCATION statements slow down the program if they are called
> > multiple times, as compared with a single static declaration of "U"?
>
> Yes, definitely. Of course, as with many things, that's only going to
> matter if it is in an inner enough loop to be called lots of times.
Yup! These statements are indeed called many times, in one of the most
time consuming areas of my code. But the amount of matrix
multiplications among rank-2 subarrays of the U's and V's between
ALLOCATE and DEALLOCATE will definitely overshadow the amount of time
taken to allocate them.
My main worry was that the program could slow down if memory access
depended significantly on the size of the total allocated memory
(because I will have to access the memory locations of the V's many
times for matrix-multiplying their rank-2 subarrays). I always had
this idea in my head that physical memory access is one of the slowest
elementary processes; I just don't have a feeling for how
significantly slow it is.
But since you say that memory access is essentially independent of the
size of allocated memory, all I have to worry about is not to exceed
the available physical memory (N = 50000 was just an example). ;)
> It would seem that some of the classic advice on performance
> optimization is in order. I don't feel like digging up the exact quotes;
> there are some pretty well known ones.
Yup! This doesn't stop me at all from writing my code! I asked it
mostly as an academic question, to learn a little bit more.
> 1. Worry about making the code right before you worry about making it
> fast.
*thumbs up*
> 2. Improvements in algorithm are worth far more than code tweaks.
*thumbs up*
> 3. Even major improvements in code performance aren't going to matter
> unless they are in time-critical portions of the program in the first
> place. That one applies to your question above. Allocation and
> deallocation do take time, but if much computation happens between the
> allocation and deallocation, then their time usage is not likely to
> matter relative to the computation.
Yup! Not a problem.
> 4. When you do get to trying to tweak code to improve performance,
> *MEASURE* the effects with your own code. Even experts can and regularly
> do get surprised and things do vary from code to code. That means you
> should not just accept performance judgements that people might give you
> here. Yes, "people" includes me.
It's not a major tweak, it's just about choosing between two
straightforward ways of declaring arrays, both of which work. I'll
stick to one of them until it's time to test my code. Only then I will
measure the difference between the two options. And if I witness any
significant effects, then I might come back to this post and make a
comment about it.
Thanks a lot to all! You're always very helpful!
--helvio
helvio | 6/2/2010 4:14:03 PM
helvio <helvio.vairinhos@googlemail.com> wrote:
> On Jun 2, 4:35 pm, nos...@see.signature (Richard Maine) wrote:
>> helvio <helvio.vairin...@googlemail.com> wrote:
>> > In sum, I think my doubts reduce to the question of whether the
>> > efficiency of accessing the physical memory depends on the size of the
>> > allocated memory, or if it is independent of it.
>> It should be independent of it, or anyway close enough that
>> you won't be able to measure the difference.
In some theoretical calculations log(n) is used, and as an
approximation that probably isn't so bad.
>> > I also have the following related question: can the ALLOCATION /
>> > DEALLOCATION statements slow down the program if they are called
>> > multiple times, as compared with a single static declaration of "U"?
>> Yes, definitely. Of course, as with many things, that's only going to
>> matter if it is in an inner enough loop to be called lots of times.
> Yup! These statements are indeed called many times, in one of the most
> time consuming areas of my code. But the amount of matrix
> multiplications among rank-2 subarrays of the U's and V's between
> ALLOCATE and DEALLOCATE will definitely overshadow the amount of time
> taken to allocate them.
It is mostly a problem in object-oriented programming. Objects
have to be allocated and deallocated, often many times. A matrix
usually won't be allocated in the inner loop, but two loops out
(for the two dimensions of the matrix).
> My main worry was that the program could slow down if memory access
> depended significantly on the size of the total allocated memory
> (because I will have to access the memory locations of the V's many
> times for matrix-multiplying their rank-2 subarrays). I always had
> this idea in my head that physical memory access is one of the slowest
> elementary processes; I just don't have a feeling for how
> significantly slow it is.
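Well, it is, but the rules are more complicated. Consider these two:
DO I=1,N
DO J=1,N
A(I,J)=B(I,J)+C(I,J)   ! inner loop over J: each access jumps a whole column ahead
ENDDO
ENDDO
DO J=1,N
DO I=1,N
A(I,J)=B(I,J)+C(I,J)   ! inner loop over I: successive elements are adjacent
ENDDO
ENDDO
The number of memory accesses is the same for both,
but the times might be very different. Fortran stores arrays in
column-major order, so the second version walks through memory
contiguously and usually makes much better use of the cache (an
optimizing compiler may also interchange the loops on its own).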
> But since you say that memory access is essentially independent of the
> size of allocated memory, all I have to worry about is not to exceed
> the available physical memory (N = 50000 was just an example). ;)
Probably you should stay below about half of physical memory.
The OS may be using some, and that can make a big difference.
>> It would seem that some of the classic advice on performance
>> optimization is in order. I don't feel like digging up the
>> exact quotes; there are some pretty well known ones.
> Yup! This doesn't stop me at all from writing my code! I asked it
> mostly as an academic question, to learn a little bit more.
>> 1. Worry about making the code right before you worry
>> about making it fast.
> *thumbs up*
>> 2. Improvements in algorithm are worth far more than code tweaks.
> *thumbs up*
>> 3. Even major improvements in code performance aren't going to matter
>> unless they are in time-critical portions of the program in the first
>> place. That one applies to your question above. Allocation and
>> deallocation do take time, but if much computation happens between the
>> allocation and deallocation, then their time usage is not likely to
>> matter relative to the computation.
Well, if you add a bunch of 2x2 matrices you might notice...
> Yup! Not a problem.
>> 4. When you do get to trying to tweak code to improve performance,
>> *MEASURE* the effects with your own code. Even experts can and regularly
>> do get surprised and things do vary from code to code. That means you
>> should not just accept performance judgements that people might give you
>> here. Yes, "people" includes me.
Code to Code, compiler to compiler, system to system.
Way too many ways to keep track of.
> It's not a major tweak, it's just about choosing between two
> straightforward ways of declaring arrays, both of which work. I'll
> stick to one of them until it's time to test my code. Only then I will
> measure the difference between the two options. And if I witness any
> significant effects, then I might come back to this post and make a
> comment about it.
For smaller arrays, one guess is that it takes one more memory
access for automatic over static, and one more for allocatable
over automatic. That can be less true as they get larger, though.
-- glen
glen | 6/2/2010 9:16:53 PM
"helvio" <helvio.vairinhos@googlemail.com> wrote in message
news:5a8433aa-8b46-4485-98b0-dab0b822b1d7@f14g2000vbn.googlegroups.com...
I also have the following related question: can the ALLOCATION /
DEALLOCATION statements slow down the program if they are called
multiple times, as compared with a single static declaration of "U"?
e.g. by introducing a loop in my example above:
do i=1,M
call using_UV ! U is allocated here
call kill_U ! U is deallocated here
end do
ALLOCATE and DEALLOCATE do take extra time,
but it is unlikely you could notice the time, let alone measure it.
CALLing a subroutine will take more time than ALLOCATE does.
You would have to ALLOCATE / DEALLOCATE a million
times before the time becomes significant,
and even then, the time taken by the remainder of the loop
will be far far far greater than the time taken by ALLOCATE / DEALLOCATE.
robin | 6/3/2010 1:57:31 AM
helvio wrote:
....
> I also have the following related question: can the ALLOCATION /
> DEALLOCATION statements slow down the program if they are called
> multiple times, as compared with a single static declaration of "U"?
> e.g. by introducing a loop in my example above:
>
> do i=1,M
> call using_UV ! U is allocated here
> call kill_U ! U is deallocated here
> end do
....
The answer is, as others have said, "of course"--you can't do something
that you weren't doing otherwise and expect it to not cost at least
something. I'd reiterate Richard's comment: if you're curious enough to
care for academic reasons, test it on your code and see. If it's
actually optimizing, profile first...
My only real reason for posting, since none of that was anything
different from what has already been posted, is to ask "why are you
doing this?" in regard to the code snippet above.
Unless you have something else to do w/ the memory you're releasing and
you're immediately going to reclaim it as the above code does, there's
absolutely nothing to be gained by deallocating anything until the loop
completes. In that case, whatever time the de- and re-allocation takes,
however small, is simply wasted overhead.
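One way to restructure the loop along these lines while keeping the module interface is to let using_UV allocate U only on its first call and to call kill_U once, after the loop. In this sketch the module name, the value of N, the placeholder work, and the counter n_allocations (which exists only to make the behaviour visible) are hypothetical.

```fortran
module mod_hoisted
  implicit none
  integer, parameter :: N = 1000
  real, allocatable, save :: U(:)
  integer, save :: n_allocations = 0   ! illustration only: counts ALLOCATEs
contains
  subroutine using_UV
    ! Allocate on first use only; later calls reuse the same storage.
    if (.not. allocated(U)) then
      allocate(U(N))
      n_allocations = n_allocations + 1
    end if
    U = 1.0   ! placeholder for the real work on U and V
  end subroutine using_UV

  subroutine kill_U
    if (allocated(U)) deallocate(U)
  end subroutine kill_U
end module mod_hoisted
```

The caller then loops over `call using_UV` and issues a single `call kill_U` after the loop, so U is allocated once no matter how many iterations run.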
If there's some other memory hog of other memory besides this in the
loop not shown, then the answer might be "maybe", but as somebody else
noted (I think in this thread; I've only been browsing, not reading)
a modern OS will page out the section that's not used anyway, behind
your back if it determines it needs something else and that page gets musty.
The one thing I see that might be a real detriment in this is that it is
at least possible that you could end up fragmenting memory badly by
doing this and actually cause a previously working program that was
robust to become not so. I don't know that it is particularly likely w/
the same memory sizes being released/reclaimed w/o something else in the
process going on in between, but add in the other code previously
mentioned and the fact there may be other applications in background,
etc., etc., and odds shorten...
All in all, unless I had a very clear and specific reason I'd certainly
not code that way in anything that was remotely like the actual snippet.
Granted, that may not be what the actual code really resembles; see
above... :)
dpb | 6/3/2010 2:27:08 AM
dpb <none@non.net> wrote:
> helvio wrote:
>> I also have the following related question: can the ALLOCATION /
>> DEALLOCATION statements slow down the program if they are called
>> multiple times, as compared with a single static declaration of "U"?
>> e.g. by introducing a loop in my example above:
(snip)
> The answer is, as others have said, "of course"--you can't do something
> that you weren't doing otherwise and expect it to not cost at least
> something. I'd reiterate Richard's comment: if you're curious enough to
> care for academic reasons, test it on your code and see. If it's
> actually optimizing, profile first...
> My only real reason for posting, since none of that was anything
> different from what has already been posted, is to ask "why are you
> doing this?" in regard to the code snippet above.
> Unless you have something else to do w/ the memory you're releasing and
> you're immediately going to reclaim it as the above code does, there's
> absolutely nothing to be gained by deallocating anything until the loop
> completes. In that case, whatever time the de- and re-allocation takes,
> however small, is simply wasted overhead.
Well, sometimes you have a loop that may be processing different
sized arrays. Then do you test the previous size before
deallocating and reallocating the new size?
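One answer to that question is to reallocate only when the existing buffer is too small, and otherwise work in a leading section of it. A sketch with hypothetical names; allocatable dummy arguments need Fortran 2003 (or the TR 15581 extension to Fortran 95).

```fortran
module grow_buffer
  implicit none
contains
  ! Make sure buf can hold at least n elements. It is reallocated only
  ! when it is too small; existing contents are NOT preserved in that case.
  subroutine ensure_capacity(buf, n)
    real, allocatable, intent(inout) :: buf(:)
    integer, intent(in) :: n
    if (allocated(buf)) then
      if (size(buf) >= n) return   ! big enough: keep the allocation
      deallocate(buf)
    end if
    allocate(buf(n))
  end subroutine ensure_capacity
end module grow_buffer
```

The caller then works on `buf(1:n)`. Reallocating whenever the size differs, rather than only when it grows, trades more allocations for a tighter memory footprint.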
> If there's some other memory hog of other memory besides this in the
> loop not shown, then the answer might be "maybe", but as somebody else
> noted (I think in this thread; I've only been browsing, not reading)
> a modern OS will page out the section that's not used anyway, behind
> your back if it determines it needs something else and that page
> gets musty.
Actually, I posted that in terms of the static memory case, but
yes it applies in the allocatable case, too.
> The one thing I see that might be a real detriment in this is that it is
> at least possible that you could end up fragmenting memory badly by
> doing this and actually cause a previously working program that was
> robust to become not so. I don't know that it is particularly likely w/
> the same memory sizes being released/reclaimed w/o something else in the
> process going on in between, but add in the other code previously
> mentioned and the fact there may be other applications in background,
> etc., etc., and odds shorten...
If there was other allocation in the same loop it could easily
fragment all the rest of memory. One of the best ways to fragment
is to allocate/copy/deallocate ever increasing sizes of more
than one array inside a loop. Each one will be too big for the
hole left by the previous one.
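When an array really must grow inside a loop, a common mitigation is to grow its capacity geometrically with the Fortran 2003 intrinsic MOVE_ALLOC, so the allocate/copy/deallocate cycle described above happens about log2(n) times instead of once per step. This reduces the churn; it does not by itself prevent fragmentation, since each freed block is still smaller than the next request. A sketch with hypothetical names and an arbitrary initial capacity of 8:

```fortran
module growable_array
  implicit none
contains
  ! Append x after the first n_used elements of buf. When buf is full its
  ! capacity is doubled: allocate a larger array, copy, and let MOVE_ALLOC
  ! hand the new storage to buf (the old storage is deallocated).
  subroutine append(buf, n_used, x)
    real, allocatable, intent(inout) :: buf(:)
    integer, intent(inout) :: n_used
    real, intent(in) :: x
    real, allocatable :: tmp(:)

    if (.not. allocated(buf)) then
      allocate(buf(8))          ! arbitrary initial capacity
      n_used = 0
    end if
    if (n_used == size(buf)) then
      allocate(tmp(2 * size(buf)))
      tmp(1:n_used) = buf(1:n_used)
      call move_alloc(tmp, buf)
    end if
    n_used = n_used + 1
    buf(n_used) = x
  end subroutine append
end module growable_array
```

Appending 100 values this way reallocates only four times (capacity 8, 16, 32, 64, 128).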
> All in all, unless I had a very clear and specific reason I'd certainly
> not code that way in anything that was remotely like the actual snippet.
> Granted, that may not be what the actual code really resembles; see
> above... :)
It seems that it was supposed to be a test, not the actual code.
-- glen
glen | 6/3/2010 4:02:13 AM
On Jun 3, 5:02 am, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
>
> It seems that it was supposed to be a test, not the actual code.
>
Yes. I want to know the pros and cons of multiple dynamic allocation
when large volumes of memory are used. My code doesn't actually do
that. My code is big and uses reasonably large arrays that must be
kept during the whole run, and a number of other large arrays that are
used only temporarily and/or conditionally. And their size varies. I
am yet far from exceeding physical memory, but my code will increase
with time and there is a good possibility that one day it will require
a significant portion of the available memory, and perhaps even
exceed it (it has happened to other people in my field of
research). And if/when that happens I wouldn't like to have to recode
it.
I've been using dynamic allocation for the large temporary arrays of
my code. They are not used simultaneously, so I save a lot of memory
with respect to static allocation. What I wanted to know is the pros
and cons of *not* using static allocation when I can.
From the comments of all of you, I think I can conclude the following:
as long as I am using a small portion of the available memory, it is
wise to use static allocation, because multiple allocation/
deallocation might fragment the memory badly (unless I iteratively
allocate ever increasing arrays); and when the total required memory
by static allocation gets significant with respect to the total
available memory, then allocation/deallocation might be a good/the
only option (but then again, fragmentation might also be an issue).
I think that every time I create a module for my code, and after I
defrag it, I will create two copies of it. Something like
'mod_modname_static.f90' and 'mod_modname_alloc.f90'. I will use the
static allocation version by default, and the dynamic allocation
version only if I see that my temporary arrays are too big.
Cheers,
--helvio
helvio | 6/3/2010 12:11:50 PM
helvio <helvio.vairinhos@googlemail.com> wrote:
(snip)
> I think that every time I create a module for my code, and after I
> defrag it, I will create two copies of it. Something like
> 'mod_modname_static.f90' and 'mod_modname_alloc.f90'. I will use the
> static allocation version by default, and the dynamic allocation
> version only if I see that my temporary arrays are too big.
Well, you could use the C preprocessor, which is supported by
many Fortran compilers, to select between the appropriate statements
based on compiler command line options.
#ifdef ALLOC
real, allocatable:: x(:,:)
#else
real x(100,100)
#endif
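Then, at least for compilers based on gcc, the -DALLOC
command line option will select the allocatable version. (gfortran
preprocesses files with an upper-case extension such as .F90
automatically; for other files add the -cpp option.)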
-- glen
glen | 6/3/2010 4:55:34 PM
On 3/06/2010 7:16 AM, glen herrmannsfeldt wrote:
> helvio<helvio.vairinhos@googlemail.com> wrote:
>> On Jun 2, 4:35 pm, nos...@see.signature (Richard Maine) wrote:
>>> helvio<helvio.vairin...@googlemail.com> wrote:
>>>> In sum, I think my doubts reduce to the question of whether the
>>>> efficiency of accessing the physical memory depends on the size of the
>>>> allocated memory, or if it is independent of it.
>
>>> It should be independent of it, or anyway close enough that
>>> you won't be able to measure the difference.
>
> In some theoretical calculations log(n) is used, and as an
> approximation that probably isn't so bad.
Perhaps I misunderstand, but I don't think the time for access to
physical memory is order log(n), where n is the total allocated memory.
Perhaps it is if n is the size of the working set (so the time takes
into account things like cache and swapping?), but arrays that are
allocated and not accessed aren't what I'd consider part of the working
set. Or are you referring to the time needed to allocate the memory in
the first place?
Ian | 6/3/2010 11:14:33 PM
Ian Harvey <ian_harvey@bigpond.com> wrote:
(snip, I wrote)
>> In some theoretical calculations log(n) is used, and as an
>> approximation that probably isn't so bad.
> Perhaps I misunderstand, but I don't think the time for access to
> physical memory is order log(n), where n is the total allocated memory.
> Perhaps it is if n is the size of the working set (so the time takes
> into account things like cache and swapping?), but arrays that are
> allocated and not accessed aren't what I'd consider part of the working
> set. Or are you referring to the time needed to allocate the memory in
> the first place?
In discussion of the radix sort, the algorithm naturally seems
to have O(N) time. The time is proportional to N and to the
number of digits in the values being sorted. On a machine with
fixed sized variables, that should be constant. But as N goes
to infinity, some claim that it doesn't make sense to compare
fixed sized values, but that the number of digits should increase
as log(N).
OK, now for memory access. The address decoders on semiconductor
RAMs require log(N) level of logic to address N bits. As memory
arrays get bigger, the wires connecting them together get longer,
requiring longer delays.
As for instructions executed, on many machines it takes more
instructions to address larger arrays than smaller ones.
It won't be a continuous increase, but it often does increase.
Consider, for example near and far modes in x86 code.
Many RISC machines with 32 bit instructions can directly
address up to about 20 bits with one instruction, but it
takes two instructions as addresses get larger. VAX has
addressing modes with 8, 16, and 32 bit offsets, with longer
instructions and execution time for the longer offsets.
For the sizes of N likely to be encountered, it is a rough
approximation, but averaged over many machines and sizes
it isn't so bad.
-- glen
glen
6/4/2010 5:02:28 AM
glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
> OK, now for memory access. The address decoders on semiconductor
> RAMs require log(N) level of logic to address N bits. As memory
> arrays get bigger, the wires connecting them together get longer,
> requiring longer delays.
>
> As for instructions executed, on many machines it takes more
> instructions to address larger arrays than smaller ones...
However, this all seems irrelevant to the OP's problem.
The number of wires and their length isn't going to change with his
memory size. That sounds like a comparison of different hardware - not
different sizes of code running on the same hardware. Yes, if his code
gets big enough, it could eventually require different hardware, but
that was one of the first, most basic, and most obvious points - that he
needed to worry if his memory size neared the limits of his system.
And the possible different instructions just aren't going to come up in
his case. Maybe if you were custom coding in assembly, but that's not
the case. In all but the most localized and special cases, the
instructions in the compiled code are going to be the ones that can
handle the largest arrays. That's because most of the code won't know at
compile time what size array it is dealing with.
For the OP's problem, which involved running a compiled code on a
particular architecture, the answer to his question is that there won't
be any difference from these matters.
--
Richard Maine | Good judgment comes from experience;
email: last name at domain . net | experience comes from bad judgment.
domain: summertriangle | -- Mark Twain
nospam
6/4/2010 6:12:24 AM
Richard Maine <nospam@see.signature> wrote:
> glen herrmannsfeldt <gah@ugcs.caltech.edu> wrote:
>> OK, now for memory access. The address decoders on semiconductor
>> RAMs require log(N) level of logic to address N bits. As memory
>> arrays get bigger, the wires connecting them together get longer,
>> requiring longer delays.
>> As for instructions executed, on many machines it takes more
>> instructions to address larger arrays than smaller ones...
> However, this all seems irrelevant to the OP's problem.
> The number of wires and their length isn't going to change with his
> memory size. That sounds like a comparison of different hardware - not
> different sizes of code running on the same hardware. Yes, if his code
> gets big enough, it could eventually require different hardware, but
> that was one of the first, most basic, and most obvious points - that he
> needed to worry if his memory size neared the limits of his system.
Yes.
> And the possible different instructions just aren't going to come up in
> his case. Maybe if you were custom coding in assembly, but that's not
> the case. In all but the most localized and special cases, the
> instructions in the compiled code are going to be the ones that can
> handle the largest arrays. That's because most of the code won't know at
> compile time what size array it is dealing with.
In the static case the compiler knows.
In the allocatable case, the compiler knows the size of the
variable holding the size. It seems likely that a compiler
would generate different code for array accesses when the
subscript variables are larger than default integer, assuming
that the compiler allows for that.
(One complaint about Java is that the language definition
doesn't allow for larger than int (by definition, 32 bits)
when allocating arrays.)
> For the OP's problem, which involved running a compiled code on a
> particular architecture, the answer to his question is that there won't
> be any difference from these matters.
Most likely, yes. But then log() grows pretty slowly.
-- glen
glen
6/4/2010 11:14:35 AM
Richard Maine <nospam@see.signature> wrote:
(snip)
> For the OP's problem, which involved running a compiled code on a
> particular architecture, the answer to his question is that there won't
> be any difference from these matters.
integer*8 i,j,k
integer*1, allocatable:: x(:)
i=3000000000
j=i
allocate(x(i))
print *,size(x)
x(j)=j
print *,j,x(j)
print *,huge(i)
end
It seems that when run with gfortran on 64-bit Linux, the
ALLOCATE fails, but no error is indicated. The program then fails
with a segmentation fault on the assignment. Fortran 2003 says:
"If an error condition occurs during execution of an
ALLOCATE statement that does not contain the STAT=
specifier, execution of the program is terminated."
It seems that size(x) returns zero, but the program was
not terminated on the ALLOCATE.
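The termination rule quoted above applies only when STAT= is absent. A minimal sketch of the alternative, assuming a compiler that supports the Fortran 2008 kind constants in iso_fortran_env (the module and routine names are hypothetical): with STAT= and the Fortran 2003 ERRMSG= specifier, a failed ALLOCATE returns control to the caller, who can test a flag instead of relying on the runtime.

```fortran
module safe_alloc
  use, intrinsic :: iso_fortran_env, only: int8, int64
  implicit none
contains
  ! Hypothetical helper: allocate x(1:n) with STAT= and ERRMSG= so that a
  ! failure is reported through ok/msg instead of terminating the program.
  subroutine try_allocate(x, n, ok, msg)
    integer(int8), allocatable, intent(inout) :: x(:)
    integer(int64), intent(in) :: n      ! extent; 64-bit so it can exceed 2**31-1
    logical, intent(out) :: ok
    character(len=*), intent(out) :: msg
    integer :: istat

    if (allocated(x)) deallocate(x)
    msg = ''                             ! ERRMSG= leaves msg unchanged on success
    allocate(x(n), stat=istat, errmsg=msg)
    ok = (istat == 0)
  end subroutine try_allocate
end module safe_alloc
```

Writing the extent as 3000000000_int64 keeps the literal in a 64-bit kind. Note also that call try_allocate(x, -5_int64, ok, msg) succeeds and leaves size(x) equal to 0: an upper bound below the lower bound is not an error in Fortran, it simply gives a zero-size array. If the default-integer literal in the program above wrapped to a negative value (an assumption; the output shown does not confirm it), that would be consistent with SIZE(x) returning zero and with the later out-of-bounds x(j) reference crashing.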
-- glen
glen
6/4/2010 11:21:10 AM
On Jun 3, 5:55 pm, glen herrmannsfeldt <g...@ugcs.caltech.edu> wrote:
> helvio <helvio.vairin...@googlemail.com> wrote:
>
> (snip)
>
> > I think that every time I create a module for my code, and after I
> > defrag it, I will create two copies of it. Something like
> > 'mod_modname_static.f90' and 'mod_modname_alloc.f90'. I will use the
> > static allocation version by default, and the dynamic allocation
> > version only if I see that my temporary arrays are too big.
>
> Well, you could use the C preprocessor, which is supported by
> many Fortran compilers, to select between the appropriate statements
> based on compiler command line options.
>
> #ifdef ALLOC
>    real, allocatable:: x(:,:)
> #else
>    real x(100,100)
> #endif
>
> Then, at least for compilers based on gcc, the -DALLOC
> command line option will select the allocatable version.
>
> -- glen
Thank you! I hadn't considered this option, and it will be
very helpful in my code (and not just for the large array issue). I am
a physicist first and a programmer second, and a consequence of that is
an incomplete understanding of Fortran, which I only use as an
auxiliary tool. I was unaware of C preprocessing and of most of the
memory issues discussed here, and it's been great learning from the
masters. ;)
--helvio
helvio
6/4/2010 12:31:37 PM
With gfortran 4.6 I got warnings about integer overflow; see the changes marked !#.
On 06/04/2010 09:21 PM, glen herrmannsfeldt wrote:
> implicit none
> integer(8) i,j
> integer(1), allocatable:: x(:)
> i=3000000000_8 !#
> j=i
> allocate(x(i))
> print *,size(x,KIND=8) !#
> x(j)=13_1 !#
> print *,j,x(j)
> print *,huge(i)
> end
>
Ian
6/4/2010 1:04:35 PM
On 6/3/2010 11:02 PM, glen herrmannsfeldt wrote:
> Ian Harvey<ian_harvey@bigpond.com> wrote:
> (snip, I wrote)
>
>>> In some theoretical calculations log(n) is used, and as an
>>> approximation that probably isn't so bad.
>
>> Perhaps I misunderstand, but I don't think the time for access to
>> physical memory is order log(n), where n is the total allocated memory.
>> Perhaps it is if n is the size of the working set (so the time takes
>> into account things like cache and swapping?), but arrays that are
>> allocated and not accessed aren't what I'd consider part of the working
>> set. Or are you referring to the time needed to allocate the memory in
>> the first place?
>
<snip>
>
> OK, now for memory access. The address decoders on semiconductor
> RAMs require log(N) level of logic to address N bits. As memory
> arrays get bigger, the wires connecting them together get longer,
> requiring longer delays.
<snip>
Sounds like we're talking about log(log(n)), where n is the maximum
memory address. FWIW.
Louis
Louis
6/4/2010 8:36:52 PM
On 6/3/2010 6:11 AM, helvio wrote:
<snip>
> I've been using dynamic allocation for the large temporary arrays of
> my code. They are not used simultaneously, so I save a lot of memory
> with respect to static allocation. What I wanted to know is the pros
> and cons of *not* using static allocation when I can.
>
> From the comments of all of you, I think I can conclude the following:
> as long as I am using a small portion of the available memory, it is
> wise to use static allocation, because multiple allocation/
> deallocation might fragment the memory badly (unless I iteratively
> allocate ever increasing arrays); and when the total required memory
> by static allocation gets significant with respect to the total
> available memory, then allocation/deallocation might be a good/the
> only option (but then again, fragmentation might also be an issue).
I think you have part of that backwards. Iterative allocation and
deallocation of increasingly large arrays is *more* likely to result in
fragmentation. Allocation from fragmented memory is likely to take longer.
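One way to act on this is to allocate scratch space once, at the largest size needed, rather than repeatedly. When that size is not known in advance, a common compromise (not something proposed in this thread; the module and names below are hypothetical) is a module-level buffer that only grows, at least doubling when a larger request arrives, so a run of growing requests causes only a few reallocations. Routines whose temporaries are never live at the same time, as in helvio's case, can then share it.

```fortran
module workspace
  implicit none
  private
  public :: get_workspace, release_workspace, workspace_size, realloc_count

  ! Hypothetical shared scratch buffer; it only ever grows.
  real, allocatable, target :: buf(:)
  integer :: realloc_count = 0      ! how many times buf has been (re)allocated
contains
  ! Return a pointer to n elements of scratch space, reallocating only
  ! when the current buffer is too small. Pointers obtained earlier become
  ! undefined if a reallocation happens.
  function get_workspace(n) result(p)
    integer, intent(in) :: n
    real, pointer :: p(:)
    integer :: new_size

    if (.not. allocated(buf)) then
      allocate(buf(n))
      realloc_count = realloc_count + 1
    else if (size(buf) < n) then
      new_size = max(n, 2*size(buf))  ! grow geometrically to limit reallocations
      deallocate(buf)
      allocate(buf(new_size))
      realloc_count = realloc_count + 1
    end if
    p => buf(1:n)
  end function get_workspace

  ! Current capacity of the buffer (0 if not allocated).
  integer function workspace_size()
    if (allocated(buf)) then
      workspace_size = size(buf)
    else
      workspace_size = 0
    end if
  end function workspace_size

  subroutine release_workspace()
    if (allocated(buf)) deallocate(buf)
  end subroutine release_workspace
end module workspace
```

The doubling rule trades some unused memory for fewer allocate/deallocate cycles; callers should request the workspace again after any call that might have grown it.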
Keep in mind, also, that there are two total memory sizes in play:
physical, the RAM on your box, and virtual, which is determined by
system architecture and possibly the OS. Virtual memory is much larger,
and whatever doesn't fit in physical memory will be paged in and out as
required. My guess is that if you really start using enough virtual
memory to get close to its limit, your system is going to be spending
most of its time paging, and it doesn't matter if you do that with
dynamically or statically allocated virtual memory.
If you were using very large automatic arrays and hitting a stack size
limit, dynamic allocation would be an obvious way to go. In your case,
it's not as clear that you're solving a real problem.
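For contrast, here is a minimal sketch of the two kinds of procedure-local scratch array mentioned above (module and function names are hypothetical). Whether an automatic array actually lives on the stack is compiler-dependent; Intel Fortran places it there by default, for instance, which is where a stack size limit bites. An allocatable local is taken from the heap in typical implementations, can report failure through STAT=, and is still deallocated automatically on return in Fortran 95 and later because it lacks the SAVE attribute.

```fortran
module scratch_demo
  implicit none
contains
  ! Automatic array: work(n) is created on entry and freed on exit.
  ! Its storage location (stack or heap) depends on the compiler.
  function sum_squares_auto(n) result(s)
    integer, intent(in) :: n
    real :: s
    real :: work(n)
    integer :: i
    do i = 1, n
      work(i) = real(i)**2
    end do
    s = sum(work)
  end function sum_squares_auto

  ! Allocatable local: allocated explicitly, failure reported via STAT=,
  ! and deallocated automatically when the function returns.
  function sum_squares_alloc(n) result(s)
    integer, intent(in) :: n
    real :: s
    real, allocatable :: work(:)
    integer :: i, istat
    allocate(work(n), stat=istat)
    if (istat /= 0) error stop 'sum_squares_alloc: allocation failed'
    do i = 1, n
      work(i) = real(i)**2
    end do
    s = sum(work)
  end function sum_squares_alloc
end module scratch_demo
```

Both functions return the same value; only where the scratch storage comes from, and how an allocation failure surfaces, differs.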
Louis
Louis
6/4/2010 9:32:17 PM
On 6/4/10 3:36 PM, Louis Krupp wrote:
> On 6/3/2010 11:02 PM, glen herrmannsfeldt wrote:
>> Ian Harvey<ian_harvey@bigpond.com> wrote:
>> (snip, I wrote)
>>
>>>> In some theoretical calculations log(n) is used, and as an
>>>> approximation that probably isn't so bad.
>>
>>> Perhaps I misunderstand, but I don't think the time for access to
>>> physical memory is order log(n), where n is the total allocated memory.
>>> Perhaps it is if n is the size of the working set (so the time takes
>>> into account things like cache and swapping?), but arrays that are
>>> allocated and not accessed aren't what I'd consider part of the working
>>> set. Or are you referring to the time needed to allocate the memory in
>>> the first place?
>>
> <snip>
>>
>> OK, now for memory access. The address decoders on semiconductor
>> RAMs require log(N) level of logic to address N bits. As memory
>> arrays get bigger, the wires connecting them together get longer,
>> requiring longer delays.
> <snip>
>
> Sounds like we're talking about log(log(n)), where n is the maximum
> memory address. FWIW.
>
> Louis
We may have lost sight of the goal here. Presumably, if someone
allocates an array (or declares it automatic), he will use it later on. The
"get a chunk of memory" cost is, roughly speaking, a small constant.
The cost of using any array a few times is, roughly speaking,
few*n. For large arrays, use whatever mode seems natural for the
problem; the memory management cost will be small. For small
arrays, your program will run so fast it doesn't matter what
you do to define your arrays. (And, yes, I know people can do
1.0e137 allocate/deallocate/FFTs of an array of size 4;
but most people don't ;) )
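The argument above can be written as a rough cost model. The symbols are illustrative and not from the thread: c_alloc is the roughly constant cost of obtaining the memory, c_use the cost of one element access, and k the "few" passes over an n-element array.

```latex
T(n) \approx c_{\mathrm{alloc}} + k\,c_{\mathrm{use}}\,n,
\qquad
\frac{c_{\mathrm{alloc}}}{T(n)} \approx \frac{c_{\mathrm{alloc}}}{c_{\mathrm{alloc}} + k\,c_{\mathrm{use}}\,n}
\;\longrightarrow\; 0 \quad (n \to \infty)
```

With hypothetical values c_alloc = 1000 c_use, k = 3 and n = 50000 (the array size from the original question), the allocation overhead is 1000/(1000 + 150000), or about 0.66% of the total.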
Dick Hendrickson
Dick
6/5/2010 2:31:23 AM