ElectricFence Exiting: mprotect() failed: Cannot allocate memory

I am using electric fence 2.1.13 to try to find a memory allocation
problem that occurs after my application runs for about 3 hours.  When
I link to the electric fence library, I get "ElectricFence Exiting:
mprotect() failed: Cannot allocate memory" during initialization.
Could this be the source of the error that takes 3 hours to occur?  I
wonder because all I see at this point is a 12 byte malloc.

According to a comment in efence.c, "On some systems it will be
necessary to increase the amount of swap space in order to debug large
programs that perform lots of allocation, because of the per-buffer
overhead."  How does one increase the amount of swap space?  I am
running Linux 2.6.26 on an MPC8248.
Reply jobhunts02 (106) 10/14/2008 2:32:45 AM

On Oct 13, 7:32 pm, Bill <jobhunt...@aol.com> wrote:
> I am using electric fence 2.1.13 to try to find a memory allocation
> problem that occurs after my application runs for about 3 hours.  When
> I link to the electric fence library, I get "ElectricFence Exiting:
> mprotect() failed: Cannot allocate memory" during initialization.
> Could this be the source of the error that takes 3 hours to occur?  I
> wonder because all I see at this point is a 12 byte malloc.

I doubt that's the source of the error that takes 3 hours to occur.

> According to a comment in efence.c, "On some systems it will be
> necessary to increase the amount of swap space in order to debug large
> programs that perform lots of allocation, because of the per-buffer
> overhead."  How does one increase the amount of swap space?  I am
> running Linux 2.6.26 on an MPC8248.

I would recommend doing invasive debugging on a test system with
significant additional memory. It's hard to help you without knowing
more about your hardware. Do you have a hard drive? Do you have USB
ports? How much memory do you have?

DS
Reply davids (1012) 10/14/2008 3:15:24 AM


Bill <jobhunts02@aol.com> writes:

> I am using electric fence 2.1.13 to try to find a memory allocation
> problem that occurs after my application runs for about 3 hours.

What kind of problem?
Efence is good at finding memory corruption problems, not memory
allocation problems.

> When
> I link to the electric fence library, I get "ElectricFence Exiting:
> mprotect() failed: Cannot allocate memory" during initialization.
> Could this be the source of the error that takes 3 hours to occur?

Unlikely.

> I wonder because all I see at this point is a 12 byte malloc.

A *single* 12-byte malloc was performed by that point?
If so, your copy of efence is misconfigured, miscompiled, or busted
in some other way.

> According to a comment in efence.c, "On some systems it will be
> necessary to increase the amount of swap space in order to debug large
> programs that perform lots of allocation, because of the per-buffer
> overhead."

Efence adds 1 page guard to every malloc.
It is very rarely helpful in debugging non-toy applications.
You may have better luck with Valgrind.

> How does one increase the amount of swap space?  I am
> running Linux 2.6.26 on an MPC8248.

Try "man swapon".

Cheers,
-- 
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
Reply ppluzhnikov-nsp2 (108) 10/14/2008 4:19:45 AM
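
To make the per-buffer overhead mentioned above concrete, here is a minimal sketch of the guard-page technique in C. It is not Electric Fence's actual code: `guarded_alloc` and `guarded_footprint` are hypothetical names, and the sketch assumes a POSIX system with `mmap` and `mprotect`. Each request is rounded up to whole pages and followed by one inaccessible page, so even a 12-byte request consumes two pages of address space.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Address space consumed by one guarded allocation: the payload rounded
 * up to whole pages, plus one guard page. Returns 0 if size is too large. */
size_t guarded_footprint(size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    if (size > SIZE_MAX - 2 * page)
        return 0;
    size_t usable = (size + page - 1) / page * page;
    if (usable == 0)
        usable = page;          /* treat a zero-byte request as one page */
    return usable + page;
}

/* Hypothetical guarded allocator (illustration only, no matching free). */
void *guarded_alloc(size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t total = guarded_footprint(size);
    if (total == 0)
        return NULL;

    unsigned char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Turn the last page into a guard: any access to it faults at once. */
    if (mprotect(base + total - page, page, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }
    /* Place the buffer so that its last byte abuts the guard page. */
    return base + (total - page) - size;
}
```

With 4 KiB pages, a 12-byte request costs 8 KiB here, which is why a program with many small allocations can exhaust memory or mapping limits under such a tool. Real tools also deal with alignment (Electric Fence has an EF_ALIGNMENT setting), freeing, and realloc, all of which this sketch leaves out.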

After about 3 hours, the program segfaults when trying to malloc
65K bytes.  At the time, according to top, there is plenty of memory
available.

I tried using valgrind, but it slowed down my application so much that
it was useless.



On Oct 13, 9:19 pm, Paul Pluzhnikov <ppluzhnikov-...@gmail.com> wrote:
> Bill <jobhunt...@aol.com> writes:
> > I am using electric fence 2.1.13 to try to find a memory allocation
> > problem that occurs after my application runs for about 3 hours.
>
> What kind of problem?
> Efence is good at finding memory corruption problems, not memory
> allocation problems.
>
> > When
> > I link to the electric fence library, I get "ElectricFence Exiting:
> > mprotect() failed: Cannot allocate memory" during initialization.
> > Could this be the source of the error that takes 3 hours to occur?
>
> Unlikely.
>
> > I wonder because all I see at this point is a 12 byte malloc.
>
> A *single* 12-byte malloc was performed by that point?
> If so, your copy of efence is misconfigured, miscompiled, or busted
> in some other way.
>
> > According to a comment in efence.c, "On some systems it will be
> > necessary to increase the amount of swap space in order to debug large
> > programs that perform lots of allocation, because of the per-buffer
> > overhead."
>
> Efence adds 1 page guard to every malloc.
> It is very rarely helpful in debugging non-toy applications.
> You may have better luck with Valgrind.
>
> > How does one increase the amount of swap space?  I am
> > running Linux 2.6.26 on an MPC8248.
>
> Try "man swapon".
>
> Cheers,
> --
> In order to understand recursion you must first understand recursion.
> Remove /-nsp/ for email.

Reply jobhunts02 (106) 10/14/2008 5:01:08 AM

I have a total of 128 MB of flash on my target board.  No USB ports.
Monitoring top, it does not appear that memory is being leaked, but it
is behaving as if running out of memory.  Is there a better way than
top to monitor memory?



On Oct 13, 8:15 pm, David Schwartz <dav...@webmaster.com> wrote:
> On Oct 13, 7:32 pm, Bill <jobhunt...@aol.com> wrote:
>
> > I am using electric fence 2.1.13 to try to find a memory allocation
> > problem that occurs after my application runs for about 3 hours.  When
> > I link to the electric fence library, I get "ElectricFence Exiting:
> > mprotect() failed: Cannot allocate memory" during initialization.
> > Could this be the source of the error that takes 3 hours to occur?  I
> > wonder because all I see at this point is a 12 byte malloc.
>
> I doubt that's the source of the error that takes 3 hours to occur.
>
> > According to a comment in efence.c, "On some systems it will be
> > necessary to increase the amount of swap space in order to debug large
> > programs that perform lots of allocation, because of the per-buffer
> > overhead."  How does one increase the amount of swap space?  I am
> > running Linux 2.6.26 on an MPC8248.
>
> I would recommend doing invasive debugging on a test system with
> significant additional memory. It's hard to help you without knowing
> more about your hardware. Do you have a hard drive? Do you have USB
> ports? How much memory do you have?
>
> DS

Reply jobhunts02 (106) 10/14/2008 5:05:54 AM

On Mon, 13 Oct 2008 22:01:08 -0700 (PDT), Bill <jobhunts02@aol.com>
wrote:

>After about 3 hours, the program seg faults when trying to do a malloc
>65K bytes.  At the time, according to top, there is plenty of memory
>available

Apparently you do not have even 64 KiB of _contiguous_ virtual memory
available, but only a huge number of smaller fragments all over the
memory. I guess that the system would run a few hours longer, if the
largest allocation was 8 KiB :-).

Sounds like a typical dynamic memory fragmentation problem. 

The other alternative, if the stack and dynamic memory occupy the same
memory area (one growing upwards and the other downwards), is that the
stack size is constantly increasing due to a programming error,
finally inhibiting the growth of the heap.

Paul

Reply keinanen (1068) 10/14/2008 5:37:04 AM

Bill <jobhunts02@aol.com> writes:
> I have a total of 128 MB of flash on my target board.  No USB ports.
> Monitoring top, it does not appear that memory is being leaked, but it
> is behaving as if running out of memory.

Still assuming your description is correct, it is behaving as if the
malloc code made an invalid memory access because of a corrupted
pointer inside the heap. But you can easily verify whether the allocation
should have succeeded, i.e. whether there was a contiguous area of at
least 64K of 'unused VM' available:

    1. Modify the segfault handler in the kernel to send a SIGSTOP
       instead of a SIGSEGV.

    2. Use pmap to inspect the address space layout of the affected
       process after it has been stopped by the signal.

              	       
Reply rweikusat (2716) 10/14/2008 8:43:30 AM

> ... a memory allocation
> problem that occurs after my application runs for about 3 hours.

Under glibc, setting the shell environment variable "export MALLOC_CHECK_=2"
[note the trailing underscore] performs additional internal consistency checks
that are relatively inexpensive.  Run "info libc" then search for MALLOC_CHECK_.

man swapon   # how to increase swap space.
/proc/<pid>/maps  reveals summary information for one process.
/proc/<pid>/smaps  reveals more details for one process.
/proc/meminfo  reports a system-wide summary.

Reply jreiser (114) 10/14/2008 3:25:11 PM
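
As a rough illustration of how the /proc/<pid>/maps listing mentioned above can answer the "is there a contiguous 64K hole?" question, the following C sketch scans text in that format and returns the largest gap between consecutive mappings. The function name `largest_gap` is hypothetical, and the sketch assumes the lines are sorted by address and begin with `start-end` in hexadecimal, as they do on Linux.

```c
#include <stdio.h>
#include <string.h>

/* Scan text in /proc/<pid>/maps format and return the size in bytes of
 * the largest unmapped gap between two consecutive mappings. Lines that
 * do not start with "start-end" in hex are skipped. */
unsigned long largest_gap(const char *maps_text)
{
    unsigned long prev_end = 0, best = 0;
    int have_prev = 0;
    const char *line = maps_text;

    while (line != NULL && *line != '\0') {
        unsigned long start, end;

        if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
            if (have_prev && start > prev_end && start - prev_end > best)
                best = start - prev_end;
            prev_end = end;
            have_prev = 1;
        }
        line = strchr(line, '\n');      /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return best;
}
```

For a live process the input would be the contents of /proc/<pid>/maps read into a buffer. A large result only shows that address space is available; it says nothing about whether the kernel will actually commit memory for it.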

John Reiser <jreiser@BitWagon.com> writes:
>> ... a memory allocation
>> problem that occurs after my application runs for about 3 hours.
>
> Under glibc, setting the shell environment variable "export MALLOC_CHECK_=2"
> [note the trailing underscore] performs additional internal consistency checks
> that are relatively inexpensive.  Run "info libc" then search for MALLOC_CHECK_.
>
> man swapon   # how to increase swap space.

The OP is using a PPC-based SoC. I doubt that he has any swap space on
board.
Reply rweikusat (2716) 10/14/2008 3:32:13 PM

Below is what pmap -x gives for the process (snmpd) upon failing at a
call to malloc for 65536 bytes.  Does anything here indicate a
possible problem trying to malloc 65536 bytes?  It should be noted
that a call to pmap -x before the failure, while snmpd was still
running, gave identical results.  Therefore, I wonder if the cause of
the problem can be seen here?


Address   Kbytes     RSS    Anon  Locked Mode   Mapping
0f8b8000      64       -       -       - r-x--  libresolv-2.6.so
0f8c8000     252       -       -       - -----  libresolv-2.6.so
0f907000       4       -       -       - r----  libresolv-2.6.so
0f908000       4       -       -       - rwx--  libresolv-2.6.so
0f909000       8       -       -       - rwx--    [ anon ]
0f91b000      16       -       -       - r-x--  libnss_dns-2.6.so
0f91f000     252       -       -       - -----  libnss_dns-2.6.so
0f95e000       4       -       -       - r----  libnss_dns-2.6.so
0f95f000       4       -       -       - rwx--  libnss_dns-2.6.so
0f970000      40       -       -       - r-x--  libnss_files-2.6.so
0f97a000     252       -       -       - -----  libnss_files-2.6.so
0f9b9000       4       -       -       - r----  libnss_files-2.6.so
0f9ba000       4       -       -       - rwx--  libnss_files-2.6.so
0f9cb000      28       -       -       - r-x--  librt-2.6.so
0f9d2000     252       -       -       - -----  librt-2.6.so
0fa11000       4       -       -       - r----  librt-2.6.so
0fa12000       4       -       -       - rwx--  librt-2.6.so
0fa23000    1264       -       -       - r-x--  libc-2.6.so
0fb5f000     252       -       -       - -----  libc-2.6.so
0fb9e000       8       -       -       - r----  libc-2.6.so
0fba0000      12       -       -       - rwx--  libc-2.6.so
0fba3000      12       -       -       - rwx--    [ anon ]
0fbb6000      12       -       -       - r-x--  libEclipseHms.so
0fbb9000     252       -       -       - -----  libEclipseHms.so
0fbf8000       4       -       -       - rwx--  libEclipseHms.so
0fc09000      16       -       -       - r-x--  libEclipseVer.so
0fc0d000     252       -       -       - -----  libEclipseVer.so
0fc4c000       4       -       -       - rwx--  libEclipseVer.so
0fc5d000      12       -       -       - r-x--  libEclipsePai.so
0fc60000     252       -       -       - -----  libEclipsePai.so
0fc9f000       4       -       -       - rwx--  libEclipsePai.so
0fcb0000       8       -       -       - r-x--  libEclipseCil.so
0fcb2000     256       -       -       - -----  libEclipseCil.so
0fcf2000       4       -       -       - rwx--  libEclipseCil.so
0fd03000      12       -       -       - r-x--  libEclipseConf.so
0fd06000     256       -       -       - -----  libEclipseConf.so
0fd46000       4       -       -       - rwx--  libEclipseConf.so
0fd57000      16       -       -       - r-x--  libEclipseSem.so
0fd5b000     252       -       -       - -----  libEclipseSem.so
0fd9a000       4       -       -       - rwx--  libEclipseSem.so
0fdab000      12       -       -       - r-x--  libEclipseLog.so
0fdae000     256       -       -       - -----  libEclipseLog.so
0fdee000       4       -       -       - rwx--  libEclipseLog.so
0fdff000       8       -       -       - r-x--  libEclipseLst.so
0fe01000     252       -       -       - -----  libEclipseLst.so
0fe40000       4       -       -       - rwx--  libEclipseLst.so
0fe51000      80       -       -       - r-x--  libpthread-2.6.so
0fe65000     256       -       -       - -----  libpthread-2.6.so
0fea5000       4       -       -       - r----  libpthread-2.6.so
0fea6000       4       -       -       - rwx--  libpthread-2.6.so
0fea7000       8       -       -       - rwx--    [ anon ]
0feb9000     640       -       -       - r-x--  libm-2.6.so
0ff59000     252       -       -       - -----  libm-2.6.so
0ff98000       4       -       -       - r----  libm-2.6.so
0ff99000      12       -       -       - rwx--  libm-2.6.so
0ffac000      12       -       -       - r-x--  libdl-2.6.so
0ffaf000     252       -       -       - -----  libdl-2.6.so
0ffee000       4       -       -       - r----  libdl-2.6.so
0ffef000       4       -       -       - rwx--  libdl-2.6.so
10000000    1192       -       -       - r-x--  snmpd
10169000      32       -       -       - rwx--  snmpd
10171000     552       -       -       - rwx--    [ anon ]
30000000     116       -       -       - r-x--  ld-2.6.so
3001d000      24       -       -       - rw---    [ anon ]
30023000       4       -       -       - r--s-    [ shmid=0x0 ]
30024000       4       -       -       - rw---    [ anon ]
30025000       4       -       -       - r--s-    [ shmid=0x0 ]
3005c000       4       -       -       - r----  ld-2.6.so
3005d000       4       -       -       - rwx--  ld-2.6.so
3005e000       4       -       -       - -----    [ anon ]
3005f000    8188       -       -       - rw---    [ anon ]
3085e000       4       -       -       - -----    [ anon ]
3085f000    8188       -       -       - rw---    [ anon ]
7ff61000     332       -       -       - rw---    [ stack ]
-------- ------- ------- ------- -------
total kB   25084       -       -       -




On Oct 14, 1:43 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
> Bill <jobhunt...@aol.com> writes:
> > I have a total of 128 MB of flash on my target board.  No USB ports.
> > Monitoring top, it does not appear that memory is being leaked, but it
> > is behaving as if running out of memory.
>
> Still assuming your description is correct, it is behaving as if the
> malloc-code made an invalid memory access because of a corrupted
> pointer inside the heap. But you can easily verify if the allocation
> should have succeeded, ie if there was a continuous area of at least
> 64K of 'unused VM' available:
>
>     1. Modify the segfault handler in the kernel to send a SIGSTOP
>        instead of a SIGSEGV.
>
>     2. Use pmap to inspect the address space layout of the affected
>        process after it has been stopped by the signal.

Reply jobhunts02 (106) 10/14/2008 9:07:17 PM

Bill wrote:
> 
> Below is what pmap -x gives for the process (snmpd) upon failing
> at a call to malloc for 65536 bytes.  Does anything here would
> indicate a possible problem trying to malloc 65536 bytes?  It
> should be noted that a call to pmap -x before the failure while
> snmpd was still running gave identical results.  Therefore, I
> wonder if the cause of the problem can be seen here?

Please do not top-post, but do snip properly.  Your answer belongs
after (or intermixed with) the quoted material to which you reply,
after snipping all irrelevant material.  This gives prospective
repliers a fighting chance at understanding the thread.  See the
following links:

  <http://www.catb.org/~esr/faqs/smart-questions.html>
  <http://www.caliburn.nl/topposting.html>
  <http://www.netmeister.org/news/learn2quote.html>
  <http://cfaj.freeshell.org/google/>  (taming google)
  <http://members.fortunecity.com/nnqweb/>  (newusers)

-- 
 [mail]: Chuck F (cbfalconer at maineline dot net) 
 [page]: <http://cbfalconer.home.att.net>
            Try the download section.
Reply cbfalconer (19183) 10/14/2008 9:26:47 PM

Bill <jobhunts02@aol.com> writes:

> After about 3 hours, the program seg faults when trying to do a malloc
> 65K bytes.

Are you *sure* above description is accurate?

Is it that the application gets SIGSEGV *while* trying to do a
malloc (IOW, it crashes *inside* malloc), or is it that the
application gets NULL from malloc and gets SIGSEGV when it attempts
to use returned memory?

The former implies heap corruption, the latter heap exhaustion.

MALLOC_CHECK_, Valgrind, efence all help with the former, but are
useless for the latter.

Cheers,
-- 
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
Reply ppluzhnikov-nsp2 (108) 10/15/2008 4:05:46 AM

On Oct 14, 9:05 pm, Paul Pluzhnikov <ppluzhnikov-...@gmail.com> wrote:
> Bill <jobhunt...@aol.com> writes:
> > After about 3 hours, the program seg faults when trying to do a malloc
> > 65K bytes.
>
> Are you *sure* above description is accurate?
>
> Is it that the application gets SIGSEGV *while* trying to do a
> malloc (IOW, it crashes *inside* malloc), or is it that the
> application gets NULL from malloc and gets SIGSEGV when it attempts
> to use returned memory?

A backtrace in the SIGSEGV signal handler I put into the application
points to the line where the malloc occurs.  There is an if statement
to check for a NULL pointer and print a message if malloc returned a
NULL pointer.  No message is printed.

>
> The former implies heap corruption, the latter heap exhaustion.
>
> MALLOC_CHECK_, Valgrind, efence all help with the former, but are
> useless for the latter.

1. Valgrind slows down the application too much to be effective.
2. efence exits during initialization with the "Exiting: mprotect()
failed: Cannot allocate memory" error.
3. I am running a test right now with MALLOC_CHECK_=2 and will examine
the results in the morning.

>
> Cheers,
> --
> In order to understand recursion you must first understand recursion.
> Remove /-nsp/ for email.

Reply jobhunts02 (106) 10/15/2008 5:55:50 AM

> A backtrace in the SIGSEGV signal handler I put into the application
> points to the line where the malloc occurs.  There is an if statement
> to check for a NULL pointer and print a message if malloc returned a
> NULL pointer.  No message is printed.

Beware of the possibility of buffering.  Consider the program below.
When run interactively with stdout connected to a terminal, then:
	f(10) is NULL.
	Segmentation fault
where the first  line is line-buffered stdout from the program,
and the   second line is unbuffered stderr from the shell.
When run with stdout re-directed into a regular file, then you see only:
	Segmentation fault
on stderr, and the file is *empty* ["No message is printed."]
because stdout is then fully buffered and the buffer was not flushed
before the crash.  So remember fflush().

-----
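#include <stdio.h>

/* Stand-in for an allocator that fails: always returns a null pointer. */
char *f(int a)
{
	(void)a;
	return NULL;
}

int main(void)
{
	char *p = f(10);
	if (NULL == p) {
		printf("f(10) is NULL.\n");
		/* fflush(stdout);    THE FIX: flush before the crash below */
	}
	return *p;	/* deliberate null dereference: SIGSEGV */
}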
-----
Reply jreiser (114) 10/15/2008 1:35:01 PM

Bill <jobhunts02@aol.com> writes:
> Below is what pmap -x gives for the process (snmpd) upon failing at a
> call to malloc for 65536 bytes.  Does anything here would indicate a
> possible problem trying to malloc 65536 bytes?

[...]

> Address   Kbytes     RSS    Anon  Locked Mode   Mapping
> 0f8b8000      64       -       -       - r-x--  libresolv-2.6.so

[...]

> 10000000    1192       -       -       - r-x--  snmpd
> 10169000      32       -       -       - rwx--  snmpd
> 10171000     552       -       -       - rwx--    [ anon ]

The last line should describe the 'regular heap' of the application
(the area used by brk/sbrk). Its present size is 552K and it could
grow by about another 510M until it would 'hit' ld-2.6.so (brk/sbrk
would fail with ENOMEM then).


> 30000000     116       -       -       - r-x--  ld-2.6.so
> 3001d000      24       -       -       - rw---    [ anon ]
> 30023000       4       -       -       - r--s-    [ shmid=0x0 ]
> 30024000       4       -       -       - rw---    [ anon ]
> 30025000       4       -       -       - r--s-    [ shmid=0x0 ]
> 3005c000       4       -       -       - r----  ld-2.6.so
> 3005d000       4       -       -       - rwx--  ld-2.6.so
> 3005e000       4       -       -       - -----    [ anon ]
> 3005f000    8188       -       -       - rw---    [ anon ]
> 3085e000       4       -       -       - -----    [ anon ]
> 3085f000    8188       -       -       - rw---    [ anon ]
> 7ff61000     332       -       -       - rw---    [ stack ]
> -------- ------- ------- ------- -------
> total kB   25084       -       -       -

The two 8188K segments preceded by a single page w/ 'no access' are
most likely (userspace) NPTL-stacks for two threads (default NPTL
thread stack size is 8M, the lowest 4K are used as guard page so that
an access beyond the bounds of one stack causes a [MMU] exception
instead of overwriting data on the other stack). These stacks are
allocated by calling mmap with MAP_ANON. There is still plenty of
space for other anonymous mappings between the highest used address
(0x3105e000) and the lowest presently used address of the
conventional 'stack segment'.

Unless I am very much mistaken, this process should certainly be
capable of allocating more virtual memory using either brk/sbrk or
mmap.

BTW, while getting non-spam e-mails at least occasionally is nice :-),
I usually read postings in the groups I frequent, except insofar
'certain posters', whom I deem to be more of an annoyance than an
information source, will be filtered by my newsreader.
Reply rweikusat (2716) 10/15/2008 4:05:31 PM
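
The address arithmetic behind the analysis above can be checked with a few lines of C. This is only a sketch that takes the pmap columns at face value (start address in hex, size in KiB); `mapping_end` and `gap_bytes` are hypothetical helper names.

```c
/* End address of a mapping, given its start address and its size in KiB
 * as printed in the pmap "Kbytes" column. */
unsigned long mapping_end(unsigned long start, unsigned long kbytes)
{
    return start + kbytes * 1024UL;
}

/* Unmapped bytes between the end of one mapping and the start of the next. */
unsigned long gap_bytes(unsigned long start, unsigned long kbytes,
                        unsigned long next_start)
{
    return next_start - mapping_end(start, kbytes);
}

/* Worked values from the listing in this thread:
 *   heap:   mapping_end(0x10171000, 552)  == 0x101fb000
 *   gap up to ld-2.6.so at 0x30000000     == 0x1fe05000 bytes (about 510 MiB)
 *   last thread stack:
 *           mapping_end(0x3085f000, 8188) == 0x3105e000
 *   gap up to [ stack ] at 0x7ff61000     == 0x4ef03000 bytes (about 1263 MiB)
 */
```

Both gaps are far larger than the 64 KiB being requested, which supports the conclusion that address space exhaustion is not the explanation for this process.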

On 2008-10-14, Paul Pluzhnikov <ppluzhnikov-nsp@gmail.com> wrote:
>
> Efence adds 1 page guard to every malloc.
> It is very rarely helpful in debugging non-toy applications.

That may be your experience but personally I find it incredibly
useful for certain classes of problems.  Maybe not high-level stuff
or full applications but for low level data structure test beds I
find you can literally do in a morning what may take a week otherwise.

-- 
Andrew Smallshaw
andrews@sdf.lonestar.org
Reply andrews (354) 10/15/2008 6:16:10 PM

On Oct 15, 9:05 am, Rainer Weikusat <rweiku...@mssgmbh.com> wrote:
> Bill <jobhunt...@aol.com> writes:
> > Below is what pmap -x gives for the process (snmpd) upon failing at a
> > call to malloc for 65536 bytes.  Does anything here would indicate a
> > possible problem trying to malloc 65536 bytes?
>
> [...]
>
> > Address   Kbytes     RSS    Anon  Locked Mode   Mapping
> > 0f8b8000      64       -       -       - r-x--  libresolv-2.6.so
>
> [...]
>
> > 10000000    1192       -       -       - r-x--  snmpd
> > 10169000      32       -       -       - rwx--  snmpd
> > 10171000     552       -       -       - rwx--    [ anon ]
>
> The last line should describe the 'regular heap' of the application
> (the area used by brk/sbrk). Its present size is 552K and it could
> grow by about another 510M until it would 'hit' ld-2.6.so (sbrk/brk
> would return null pointers then).
>
> > 30000000     116       -       -       - r-x--  ld-2.6.so
> > 3001d000      24       -       -       - rw---    [ anon ]
> > 30023000       4       -       -       - r--s-    [ shmid=0x0 ]
> > 30024000       4       -       -       - rw---    [ anon ]
> > 30025000       4       -       -       - r--s-    [ shmid=0x0 ]
> > 3005c000       4       -       -       - r----  ld-2.6.so
> > 3005d000       4       -       -       - rwx--  ld-2.6.so
> > 3005e000       4       -       -       - -----    [ anon ]
> > 3005f000    8188       -       -       - rw---    [ anon ]
> > 3085e000       4       -       -       - -----    [ anon ]
> > 3085f000    8188       -       -       - rw---    [ anon ]
> > 7ff61000     332       -       -       - rw---    [ stack ]
> > -------- ------- ------- ------- -------
> > total kB   25084       -       -       -
>
> The two 8188K segments preceded by a single page w/ 'no access' are
> most likely (userspace) NPTL-stacks for two threads (default NPTL
> thread stack size is 8M, the lowest 4K are used as guard page so that
> an access beyond the bounds of one stack causes a [MMU] exception
> instead of overwriting data on the other stack). These stacks are
> allocated by calling mmap with MAP_ANON. There is still plenty of
> space for other anonymous mappings between the highest used address
> (0x3105e000) and the lowest presently used address of the
> conventional 'stack segment'.
>
> Unless I am very much mistaken, this process should certainly be
> capable of allocating more virtual memory using either brk/sbrk or
> mmap.


When it crashes, I get a SIGSEGV signal with an si_code of SEGV_MAPERR
and si_addr of  0x2d.  What does address 0x2d represent?  Is the
problem that address 0x2d is not in the ranges shown in pmap?



Reply jobhunts02 (106) 10/15/2008 6:35:57 PM

Bill <jobhunts02@aol.com> writes:

> When it crashes, I get a SIGSEGV signal with an si_code of SEGV_MAPERR

Page fault when accessing an unmapped page.

> and si_addr of  0x2d.  What does address 0x2d represent?

The address that the program tried to access.

>  Is the
> problem that address 0x2d is not in the ranges shown in pmap?

Well, sort of.  0x2d isn't in that range because that page isn't mapped.
But it's not supposed to be mapped.  The first page of virtual memory is
always unmapped, so that NULL pointer dereferences generate faults.  So
it's an address that can't possibly be valid.  If the crash is inside
malloc, as you said earlier, then most likely some pointer in malloc's
data structures got overwritten with 0x0000002d.

If you have a core dump, you might be able to trace backwards a little
ways to figure out where this pointer itself is located.  If you
recognize the data around it, it might suggest to you what part of your
program could be guilty of overwriting it.  (As a start, 0x2d is ASCII
'-'.  Any part of your program use hyphens?)
Reply nate14 (514) 10/15/2008 7:40:10 PM
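
The si_code and si_addr values discussed above are delivered to a handler installed with SA_SIGINFO. The following is a minimal sketch of such a handler in C, not the OP's actual code: `format_hex`, `emit`, `on_segv` and `install_segv_reporter` are hypothetical names. It restricts itself to write(2) and _exit(2) inside the handler so that it does not depend on a possibly corrupted heap or on stdio buffers.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Format value as "0x..." into out (no terminating NUL); returns the length.
 * out must have room for 2 + 2 * sizeof(unsigned long) bytes. */
size_t format_hex(unsigned long value, char *out)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof value];
    size_t n = 0, len = 0;

    do {
        tmp[n++] = digits[value & 0xf];
        value >>= 4;
    } while (value != 0);

    out[len++] = '0';
    out[len++] = 'x';
    while (n > 0)
        out[len++] = tmp[--n];
    return len;
}

static void emit(const char *s, size_t n)
{
    if (write(STDERR_FILENO, s, n) < 0) {
        /* nothing useful can be done about a failed write here */
    }
}

static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    static const char maperr[] = "SIGSEGV (SEGV_MAPERR) at ";
    static const char other[]  = "SIGSEGV at ";
    char buf[2 + 2 * sizeof(unsigned long)];

    (void)sig;
    (void)ctx;
    if (info->si_code == SEGV_MAPERR)
        emit(maperr, sizeof maperr - 1);
    else
        emit(other, sizeof other - 1);
    emit(buf, format_hex((unsigned long)(uintptr_t)info->si_addr, buf));
    emit("\n", 1);
    _exit(1);
}

/* Returns 0 on success, -1 on failure (as sigaction does). */
int install_segv_reporter(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

The faulting program counter lives in the third handler argument (a ucontext_t), but the field that holds it is architecture-specific, so it is deliberately left out of this sketch.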

Andrew Smallshaw wrote:
> Paul Pluzhnikov <ppluzhnikov-nsp@gmail.com> wrote:
> 
>> Efence adds 1 page guard to every malloc.
>> It is very rarely helpful in debugging non-toy applications.
> 
> That may be your experience but personally I find it incredibly
> useful for certain classes of problems.  Maybe not high-level
> stuff or full applications but for low level data structure test
> beds I find you can literally do in a morning what may take a
> week otherwise.

Take a look at the description of the debug facilities in
nmalloc.txh.  That is part of nmalloc.zip, and is the source for
the info documentation of nmalloc.  nmalloc, in turn, is almost
pure standard C, but relies on the system sbrk() to get memory
space, and makes some (quite usual) assumptions about memory.  See:

  <http://cbfalconer.home.att.net/download/nmalloc.zip>

-- 
 [mail]: Chuck F (cbfalconer at maineline dot net) 
 [page]: <http://cbfalconer.home.att.net>
            Try the download section.
Reply cbfalconer (19183) 10/16/2008 12:00:17 AM

Nate Eldredge <nate@vulcan.lan> writes:

>>  Is the
>> problem that address 0x2d is not in the ranges shown in pmap?
>
> Well, sort of.  0x2d isn't in that range because that page isn't mapped.
> But it's not supposed to be mapped.  The first page of virtual memory is
> always unmapped, so that NULL pointer dereferences generate faults.  So
> it's an address that can't possibly be valid.  If the crash is inside
> malloc, as you said earlier, then most likely some pointer in malloc's
> data structures got overwritten with 0x0000002d.

The OP stated that he doesn't actually know that, only deduces this
from lack of printed message (which, as John Reiser aptly suggests, may
be due to naive use of stdout buffering; where stderr was likely
called for).

Much more likely than pointer being overwritten is that malloc()
in fact returned NULL, and OP then did (an equivalent of):

  struct Foo *p = malloc(sizeof(struct Foo));
  p->some_field_at_offset_0x2d = 1;

Cheers,
-- 
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
Reply ppluzhnikov-nsp2 (108) 10/16/2008 3:42:26 AM
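
The scenario described above, a NULL base pointer plus a member offset producing a fault address of 0x2d, can be sketched in C with offsetof. The layout of `struct Foo` below is purely hypothetical (nothing in the thread shows the OP's structures), and `flag_address` is an illustrative helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 0x2d bytes of other members, followed by the
 * member that the code writes to. */
struct Foo {
    char other_members[0x2d];
    char flag;
};

/* Address that `p->flag = 1` would touch for a given value of p. */
uintptr_t flag_address(uintptr_t base)
{
    return base + offsetof(struct Foo, flag);
}
```

With a base of 0, `flag_address` yields 0x2d, which is how a small non-zero si_addr can still be the signature of a null pointer dereference.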

On Oct 15, 6:35 am, John Reiser <jrei...@BitWagon.com> wrote:

> #include <stdio.h>
>
> char *f(a)
> {
>         return 0;
> }
>
> main()
> {
>         char *p = f(10);
>         if (NULL==p) {
>                 printf("f(10) is NULL.\n");
>                 /* fflush(stdout);    THE FIX */
>         }
>         return *p;
> }

This is still not very good. What if 'printf' needs to allocate memory
to do its job? What if 'fflush' does? In an error handler like this,
you are better off calling 'write' directly.

DS
Reply davids (1012) 10/16/2008 5:55:19 AM
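
A minimal sketch of the suggestion above, reporting an error through write(2) instead of stdio, might look as follows. The helper name `report_raw` is hypothetical; the point is only that no stdio buffer and no heap allocation sits between the message and the file descriptor.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Write msg to fd using write(2) only. Returns the number of bytes
 * written, or -1 on error. Retries when interrupted by a signal. */
ssize_t report_raw(int fd, const char *msg)
{
    size_t len = strlen(msg);
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, msg + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted: try again */
            return -1;
        }
        if (n == 0)
            break;              /* no progress: give up with a short count */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

In the OP's NULL check this would be used as `if (p == NULL) { report_raw(STDERR_FILENO, "malloc returned NULL\n"); abort(); }`, so that the message cannot be lost in an unflushed buffer.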

Paul Pluzhnikov <ppluzhnikov-nsp@gmail.com> writes:
> Nate Eldredge <nate@vulcan.lan> writes:
>>>  Is the
>>> problem that address 0x2d is not in the ranges shown in pmap?
>>
>> Well, sort of.  0x2d isn't in that range because that page isn't mapped.
>> But it's not supposed to be mapped.  The first page of virtual memory is
>> always unmapped, so that NULL pointer dereferences generate faults.  So
>> it's an address that can't possibly be valid.  If the crash is inside
>> malloc, as you said earlier, then most likely some pointer in malloc's
>> data structures got overwritten with 0x0000002d.
>
> The OP stated that he doesn't actually know that, only deduces this
> from lack of printed message (which, as John Reiser aptly suggests, may
> be due to naive use of stdout buffering; where stderr was likely
> called for).
>
> Much more likely than pointer being overwritten is that malloc()
> in fact returned NULL, and OP then did (an equivalent of):

This means roughly 'it is more likely that the system ran out of
memory than that the application contained a programming error'.
But this is again a question which can be answered very simply: Check
the PC/IP value at the time of the segfault. That's either within
malloc (as the OP has repeatedly claimed) or within application code.

So, what is it?
Reply rweikusat (2716) 10/16/2008 6:39:07 AM

David Schwartz <davids@webmaster.com> writes:
> On Oct 15, 6:35 am, John Reiser <jrei...@BitWagon.com> wrote:
>
>> #include <stdio.h>
>>
>> char *f(a)
>> {
>>         return 0;
>> }
>>
>> main()
>> {
>>         char *p = f(10);
>>         if (NULL==p) {
>>                 printf("f(10) is NULL.\n");
>>                 /* fflush(stdout);    THE FIX */
>>         }
>>         return *p;
>> }
>
> This is still not very good. What if 'printf' needs to allocate memory
> to do its job? What if 'fflush' does? In an error handler like this,
> you are better off calling 'write' directly.

Another option is to force a segfault, i.e. assign to the area when the
pointer is null. The values of the various CPU registers, especially
the program counter, can then (in combination with a disassembler) be
used to determine the location of the crash and hence the condition
at the time of the test.
Reply rweikusat (2716) 10/16/2008 6:42:35 AM

I am trying to use DUMA to find the problem and am getting the
following error when I try to link to the DUMA library:


# LD_PRELOAD=/lib/libduma.a /bin/snmpd &
# ERROR: ld.so: object '/lib/libduma.a' from LD_PRELOAD cannot be
preloaded: ignored.

What can cause this error when using LD_PRELOAD?
Reply jobhunts02 (106) 10/17/2008 8:56:13 PM

> # LD_PRELOAD=/lib/libduma.a /bin/snmpd &
> # ERROR: ld.so: object '/lib/libduma.a' from LD_PRELOAD cannot be
> preloaded: ignored.

Only an executable object (.e_type is ET_DYN or ET_EXEC) can be pre-loaded.
An archive library (*.a) that contains ET_REL files cannot be pre-loaded.

Reply jreiser (114) 10/17/2008 9:48:15 PM
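
The distinction drawn above can be checked from the first few bytes of a file. The C sketch below is an illustration under stated assumptions rather than what ld.so actually does: `looks_preloadable` is a hypothetical name, the accepted e_type values (ET_EXEC = 2, ET_DYN = 3) follow the statement in the post above, and only the ar magic and the ELF identification bytes are examined.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the leading bytes of a file look like an ELF object whose
 * e_type is ET_EXEC (2) or ET_DYN (3), and 0 otherwise, including for
 * ar archives (*.a), which only contain ET_REL members. */
int looks_preloadable(const unsigned char *hdr, size_t len)
{
    unsigned int e_type;

    if (len >= 8 && memcmp(hdr, "!<arch>\n", 8) == 0)
        return 0;                       /* static archive */
    if (len < 18 || memcmp(hdr, "\177ELF", 4) != 0)
        return 0;                       /* not an ELF file */

    /* e_type is a 16-bit field at offset 16; its byte order is given
     * by EI_DATA, the byte at offset 5 of the identification block. */
    if (hdr[5] == 2)                    /* ELFDATA2MSB, e.g. PowerPC */
        e_type = ((unsigned int)hdr[16] << 8) | hdr[17];
    else if (hdr[5] == 1)               /* ELFDATA2LSB */
        e_type = hdr[16] | ((unsigned int)hdr[17] << 8);
    else
        return 0;

    return e_type == 2 || e_type == 3;  /* ET_EXEC or ET_DYN */
}
```

Feeding it the first 18 bytes of /lib/libduma.a would return 0 because of the `!<arch>` magic, which matches the "cannot be preloaded" message in the post above.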

Bill <jobhunts02@aol.com> writes:

> I am trying to use DUMA to find the problem and am getting the
> following error when I try to link to the DUMA library:

It amazes me that in all this time you never managed to answer a
simple question: is the crash *in* malloc, or is it in your own code?

You are continuing to debug this as if there is malloc corruption,
and this will prove futile if the crash is (as I suspect) in your
own code instead.

> # LD_PRELOAD=/lib/libduma.a /bin/snmpd &
> # ERROR: ld.so: object '/lib/libduma.a' from LD_PRELOAD cannot be
> preloaded: ignored.

You can only preload shared libraries.

Besides, reading man page for libduma, I see that it uses *exact*
same strategy as efence: a guard page after every allocation.

So, once you manage to build a shared libduma.so, and preload it;
it will most likely fail just like efence did, because the overhead
of guard pages is too great for majority of real-world (non-toy)
applications.

Cheers,
-- 
In order to understand recursion you must first understand recursion.
Remove /-nsp/ for email.
Reply ppluzhnikov-nsp2 (108) 10/18/2008 3:12:55 AM
