Hi,
I have a general question about the assembly code generated by
the gcc compiler.
According to the ANSI C standard, global variables (objects with static
storage duration) that have no explicit initializer must be initialized to 0.
So, let's say I have the global variable 'glob':

int glob;
int main( void ) {...}

Generating the assembly code with gcc v3.3.2, I get the following
assembler directive:

.comm glob,4,4

The GNU assembler ('as') manual defines '.comm' as follows:

.comm symbol , length

.comm declares a common symbol named symbol. When linking, a common symbol in
one object file may be merged with a defined or common symbol of the same
name in another object file. If ld does not see a definition for the
symbol--just one or more common symbols--then it will allocate length
bytes of uninitialized memory. length must be an absolute expression. If
ld sees multiple common symbols with the same name, and they do not all
have the same size, it will allocate space using the largest size.
When using ELF, the .comm directive takes an optional third argument. This
is the desired alignment of the symbol, specified as a byte boundary (for
example, an alignment of 16 means that the least significant 4 bits of the
address should be zero). The alignment must be an absolute expression, and
it must be a power of two. If ld allocates uninitialized memory for the
common symbol, it will use the alignment when placing the symbol. If no
alignment is specified, as will set the alignment to the largest power of
two less than or equal to the size of the symbol, up to a maximum of 16.
----------------------------------------------
So, '.comm glob,4,4' allocates 4 bytes of uninitialized memory with
alignment 4 (the third parameter). However, since uninitialized memory is
allocated, it might contain anything. Thus, there's no guarantee that the
memory allocated for 'glob' contains the required value 0.
Why is this allocation of uninitialized memory, without an explicit
initialization to 0, valid?
Regards,
Chris

Christian, 5/11/2006 7:52:01 AM

Christian Christmann <spamtrap@crayne.org> wrote in part:
> So, '.comm glob,4,4' allocates 4 bytes of uninitialized
> memory with alignment 4 (3. parameter). However, since
> uninitialized memory is allocated, it might contain
> anything. Thus, there's no guarantee that the memory
> allocated for 'glob' contains the requested value '0'.
Actually, there [usually] is: .comm follows .data and precedes
.bss on the heap. The heap, and in general all pages allocated
by the OS, must be cleared before reuse for security reasons.
Dirty pages would make interesting reading!
Note that this does _not_ apply to malloc(), which is not a pure
syscall and may put objects on the stack (which may or may
not be fresh) or reuse free()d heap memory.
-- Robert

Robert, 5/11/2006 7:26:02 PM

Christian Christmann wrote:
> So, '.comm glob,4,4' allocates 4 bytes of uninitialized memory with
> alignment 4 (3. parameter). However, since uninitialized memory is
> allocated, it might contain anything. Thus, there's no guarantee that the
> memory allocated for 'glob' contains the requested value '0'.
> Why is this allocation of uninitialized memory without an explicit
> initialization with '0' valid?
The area that uninitialized storage is mapped into at load time is cleared
to zeros by either the loader or the language runtime. The C and C++
standards, for example, don't specify where it gets cleared, only that it
happens somewhere before the program starts to run: essentially before
the first line of main() in C, or before the first constructor for a
global-scope object in C++.
In most cases it's the loader, but there are GCC implementations where
it's done in the runtime. If you're using a "normal" x86 GCC C or C++ on
Linux or Windows, it's the loader.

robertwessel2, 5/11/2006 7:57:28 PM

Robert Redelmeier wrote:
> Christian Christmann <spamtrap@crayne.org> wrote in part:
> > So, '.comm glob,4,4' allocates 4 bytes of uninitialized
> > memory with alignment 4 (3. parameter). However, since
> > uninitialized memory is allocated, it might contain
> > anything. Thus, there's no guarantee that the memory
> > allocated for 'glob' contains the requested value '0'.
>
> Actually, there [usually] is: .comm follows .data and precedes
> .bss on the heap. The heap and in general all pages allocated
> by the OS must be cleared before re-use for security reasons.
> Dirty pages would make interesting reading!
Well, the OS *should* clear newly allocated pages, but there have been
plenty of cases where one hasn't. Also, not all programs are loaded
and run in newly allocated address spaces (again, this is system
dependent). But as I mentioned in my other post, a language like C or
C++ may require that it happens somewhere; if the OS does it
automatically, great, and if not, it will need to be done explicitly.
Also, static items (and .data and .bss areas) are not usually
considered to be on the heap at all, at least not in the traditional
sense, nor is there any universal relationship between the locations
occupied by .data, .bss, .text, the heap and the stack (assuming your
implementation actually has any/all of those).

robertwessel2, 5/11/2006 8:05:52 PM

robertwessel2@yahoo.com <spamtrap@crayne.org> wrote in part:
> Well, the OS *should* clear newly allocated pages,
> but there have been plenty of cases where one hasn't.
I consider those OSes buggy or worse, Microsoft :)
> Also, not all programs are loaded and run in newly allocated
> address spaces (again, this is system dependent).
Then those loaders had better do the clearing, or some startfile
added by the compiler. It has to happen somewhere, or _horrors_
a HLL won't be portable. Gasp! Say it ain't so! :)
> Also, static items (and .data and .bss areas) are not
> usually considered to be on the heap at all, at least not
> in the traditional sense,
Well, "heap" is a `c` term and I'm not too concerned
with respecting `c` traditions here in clax86.
> nor is there any universal relationship between the locations
> occupied by .data, .bss, .text, the heap and the stack
Perhaps true, but x86 has limited implementations, and
.text, .data, .bss ... .stack seems to be the one layout that
Linux, *BSD and even MS-Windows can agree upon.
A .bss/.stack collision seems to be the preferred failure
mode, rather than .stack/bottom-of-memory and .bss/top-of-memory
collisions: a single lower-probability collision versus two higher ones.
-- Robert

Robert, 5/11/2006 10:18:47 PM

Robert Redelmeier wrote:
> robertwessel2@yahoo.com <spamtrap@crayne.org> wrote in part:
> > Well, the OS *should* clear newly allocated pages,
> > but there have been plenty of cases where one hasn't.
>
> I consider those OSes buggy or worse, Microsoft :)
You need to be exposed to more OSs, and more application domains.
There are plenty of OSs which don't even support virtual memory, or
memory protection, or multiple address spaces... And they're certainly
not buggy (or Microsoft).
> > Also, not all programs are loaded and run in newly allocated
> > address spaces (again, this is system dependent).
>
> Then those loaders had better do the clearing, or some startfile
> added by the compiler. It has to happen somewhere, or _horrors_
> a HLL won't be portable. Gasp! Say it ain't so! :)
As I said, if the language specifies it, then someone has to do it.
If the OS does it by default, great; if not, it'll have to be the
runtime. While I hesitate to use an MS example, take a look at the
MS-DOS CRT startup code for your favorite 16-bit real mode compiler.
You'll find explicit code for clearing the .bss area.
> > Also, static items (and .data and .bss areas) are not
> > usually considered to be on the heap at all, at least not
> > in the traditional sense,
>
> Well, "heap" is a `c` term and I'm not too concerned
> with respecting `c` traditions here in clax86.
No, it's not. No heaps are required in C, but it's a common
implementation. A heap is a data structure, quite independent of any
language. In any event, common C term or not, heaps are for dynamic
allocation, so .data and .bss are almost never actually on the heap.
> > nor is there any universal relationship between the locations
> > occupied by .data, .bss, .text, the heap and the stack
>
> Perhaps true, but x86 has limited implementations, and
> .text, .data, .bss ... .stack seems to be the one thing that
> Linux, *BSD and even MS-Windows can agree upon.
>
> A .bss .stack collision seems to be the preferred failure
> mode rather than .stack bottom-of-memory and .bss top-of memory.
> A single lower-probability collision vs two higher ones.
A typical Windows C++ program ends up with the stack at the low
addresses, followed by a preallocated area for the heap (which can have
additional non-contiguous areas added to it), then .text, then .data,
then .bss, although that's certainly not fixed. You can move
everything if you rebase your program to load at 0x20000, for example.
Threads and DLLs also muck with those orders.
Try the following on your favorite system:
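#include <cstdio>

int a[10000];   // uninitialized global: typically .bss
int b = 3;      // initialized global: typically .data

int main()
{
    int c;          // automatic variable: stack
    int *d;
    d = new int;    // dynamic allocation: heap
    // %p expects void *, so cast each pointer explicitly.
    std::printf("\na@%p", (void *)a);
    std::printf("\nb@%p", (void *)&b);
    std::printf("\nc@%p", (void *)&c);
    std::printf("\nd@%p", (void *)d);
    // Function-to-void* casts are only conditionally supported in C++
    // (and strictly, main's address may not be taken), but typical
    // compilers accept this.
    std::printf("\nmain@%p", (void *)main);
    std::printf("\n\n");
    delete d;
}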

robertwessel2, 5/11/2006 11:43:01 PM

robertwessel2@yahoo.com <spamtrap@crayne.org> wrote in part:
> While I hesitate to use an MS example, take a look at the MS-DOS
> CRT startup code for your favorite 16-bit real mode compiler.
I have _no_ favorite compiler!
> You'll find explicit code for clearing the .bss area.
Natch! MS-DOS has no pretensions to being an OS,
much less a secure one. In fact, a zero-byte RERUN.COM
file can be very useful.
> No, it's not. No heaps are required in C, but it's a common
> implementation. A heap is a data structure, quite independent
> of any language. In any event, common C term or not, heaps
> are for dynamic allocation, so .data and .bss are almost never
> actually on the heap.
More correctly, the heap is usually after .bss.
> A typical Windows C++ program ends up with the stack at the
> low addresses, followed by a preallocated area for heap
> (which can have additional non-contiguous areas added to
> it), then .text, then .data, then .bss. Although that's
Surely you jest! Instead of simply worrying about the
classic heap/stack collision, the pgmr has to worry about
stack/BOM, heap/.text (perhaps handled by malloc), and .bss/TOM.
> certainly not fixed. You can move everything if you rebase
> your program to load at 0x20000, for example. Threads and
> DLLs also muck with those orders.
Ghastly. Small wonder these things crash.
> Try the following on your favorite system:
Hardly favorite: gcc under Linux is the least intolerable.
> d = new int;
errors.
$ gcc d.c
$ ./a.out
a@0x8049140
b@0x8049100
c@0xbffff784
d@(nil)
main@0x8048384
-- Robert

Robert, 5/12/2006 2:31:04 AM

Robert Redelmeier wrote:
> robertwessel2@yahoo.com <spamtrap@crayne.org> wrote in part:
> > A typical Windows C++ program ends up with the stack at the
> > low addresses, followed by a preallocated area for heap
> > (which can have additional non-contiguous areas added to
> > it), then .text, then .data, then .bss. Although that's
>
> Surely you jest! Instead of simply worrying about the
> classic heap/stack collision, the pgmr has to worry about
> stack/BOM, heap/.text (perhaps handled by malloc), and .bss/TOM.
In Windows the first 64 or 128KB of address space are always
unaddressable, so running off the beginning of memory just runs you
into a guard page. Likewise Windows allocates guard pages around
stacks so a normal over/underflow will hit the guard page (OTOH, if you
say "sub esp,1000000" all bets are off).
> > certainly not fixed. You can move everything if you rebase
> > your program to load at 0x20000, for example. Threads and
> > DLLs also muck with those orders.
>
> Ghastly. Small wonder these things crash.
I don't know, perhaps it's my background, but the notion that these
things *should* have some fixed, or even consistent, relationship to
each other seems fundamentally odd to me. Perhaps it might mean
something in an environment where a program is a single executable file
that loads, runs a single instruction stream, and then stops, where it
makes perfect sense to put .text, .data and .bss at one end or the other
of the address space, and then have the stack and heap grow at each
other in the remainder, but I certainly don't get to do much
programming in an environment like that.
> > Try the following on your favorite system:
>
> Hardly favorite: gcc under Linux is the least intolerable.
>
> > d = new int;
> errors.
>
> $ gcc d.c
> $ ./a.out
>
> a@0x8049140
> b@0x8049100
> c@0xbffff784
> d@(nil)
> main@0x8048384
So the program got loaded in the middle of the address space, and .data
and .bss just above that, and the stack is at the end growing back
down. Oddly, your (heap) allocation of "d" failed, so we don't know
where the heap is.
Oh wait, you need to compile that as a C++ program. Or replace the
"new" line with a malloc().

robertwessel2, 5/12/2006 3:08:40 AM

robertwessel2@yahoo.com <spamtrap@crayne.org> wrote in part:
>> Surely you jest! Instead of simply worrying about the
>> classic heap/stack collision, the pgmr has to worry about
>> stack/BOM, heap/.text (perhaps handled by malloc), and .bss/TOM.
> In Windows the first 64 or 128KB of address space are always
> unaddressable, so running off the beginning of memory just runs
> you into a guard page. Likewise Windows allocates guard pages
> around stacks so a normal over/underflow will hit the guard page
Yes, I understand guard pages. But with this memory layout,
they're hard-wired, and you cannot have a 3 GB stack or a 3 GB
heap decided on during execution.
Obstacles in the address space are _bad_.
> (OTOH, if you say "sub esp,1000000" all bets are off).
Why? That's a small number, less than 1 MiB. On a Linux
system, the next stack access will segfault since it is
in unallocated space until the process gets close to 3 GB!
If an MS-Windows system collides or wraps with such a small
number, then small wonder its pgms are unstable.
> I don't know, perhaps it's my background, but the notion that
> these things *should* have some fixed, or even consistent,
> relationship to each other seems fundamentally odd to me.
Why change without reason? Yes, I know about mmap(),
but it is not for trivial use by beginners.
> Perhaps it might mean something in an environment where
> a program is a single executable file that loads, runs a
> single instruction stream, and then stops, where it makes
> perfect sense to put .text, .data and .bss at one end or
> the other of the address space, and then have the stack and
> heap grow at each other in the remainder, but I certainly
> don't get to do much programming in an environment like that.
What are you doing? Embedded environments in C++? Bletcherous!
Unstable without a tight coding style. You try to run threads?
Major PITA trying to be your own scheduler/MM, and I'm dubious
about the benefits versus a tuned OS.
>> a@0x8049140
>> b@0x8049100
>> c@0xbffff784
>> d@(nil)
>> main@0x8048384
> So the program got loaded in the middle of the address space,
Look more carefully or ask for fixed-fmt output:
main is about 134.5 MB (128.3 MiB) up from BOM [08048384h], b & a just higher,
then the stack almost 3 GB away.
> Oh wait, you need to compile that as a C++ program.
I'd rather do COBOL.
-- Robert

Robert, 5/12/2006 1:15:24 PM

8 Replies, 333 Views