How to find memory corruption when malloc is not the cause?

I'm trying to debug a huge application (the Sage maths software, http://www.sagemath.org/) on 64-bit SPARC.

The software works fine on 32-bit SPARC but is unstable on 64-bit
SPARC. It does function to a limited extent on 64-bit SPARC; for
example, this works:

sage: factor(111111111111111111111111111111111111111111111111111111111111111111)
3 * 7 * 11^2 * 13 * 23 * 37 * 67 * 4093 * 8779 * 21649 * 513239 *
599144041 * 183411838171 * 1344628210313298373

but simply quitting, without doing anything at all, results in a core
dump:

----------------------------------------------------------------------
| Sage Version 4.5, Release Date: 2010-07-16                         |
| Type notebook() for the GUI, and license() for information.        |
----------------------------------------------------------------------
sage: quit
Exiting Sage (CPU time 0m0.39s, Wall time 0m5.48s).
/rootpool2/local/kirkby/sage-4.5-hacked-for-64-bit-solaris/local/bin/sage-sage: line 206:  7148 Segmentation Fault      (core dumped) sage-ipython "$@" -i
kirkby@t2:[~/sage-4.5-hacked-for-64-bit-solaris] $

I've tried using libumem

http://blogs.sun.com/hema/entry/libumem_for_detecting_memory_corruption

on the resulting core file, but it finds no problems:

kirkby@t2:[~/sage-4.5-hacked-for-64-bit-solaris] $ mdb core
Loading modules: [ libumem.so.1 libc.so.1 libuutil.so.1 ld.so.1 ]
> ::umem_verify
Cache Name                      Addr             Cache Integrity
umem_magazine_1                        10010e028 clean
umem_magazine_3                        100114028 clean
umem_magazine_7                        10011a028 clean
umem_magazine_15                       100120028 clean
umem_magazine_31                       100128028 clean
umem_magazine_47                       10012e028 clean
umem_magazine_63                       100134028 clean
etc etc

Has anyone got any suggestions for how I might go about finding the
problem? I've built it on two systems, and both show the same behavior:

 * Sun T5240, 32 GB RAM
 * Blade 1000, 2 GB RAM

libumem finds a couple of memory leaks, but I don't think these would
be an issue on a machine with 32 GB of RAM. In any case, the error
message does not indicate that the program failed to allocate memory.

Dave
Dr. David Kirkby, 7/22/2010 10:05:26 AM

On 2010-07-22 11:05:26 +0100, Dr. David Kirkby said:

> I'm trying to debug a huge application (Sage maths software)
[SNIP]
> I've tried using libumem
> 
> http://blogs.sun.com/hema/entry/libumem_for_detecting_memory_corruption
> 
> on the core file created, but that finds no problems.

What's the stack trace from the core?

-- 
Chris

Chris, 7/22/2010 10:38:26 AM


Dr. David Kirkby wrote:
> I'm trying to debug a huge application (Sage maths software)
[SNIP]
What happens when you run it under dbx?  (Or is it only the optimised 
version that chunders?)

	Cheers,
		Gary	B-)
Gary, 7/22/2010 11:58:33 AM

Dr. David Kirkby wrote:
> I'm trying to debug a huge application (Sage maths software)
[SNIP]
> libumem finds a couple of memory leaks, but I don't think these would
> be an issue on a machine with 32 GB RAM. In any case, the error
> message does not indicate it can't allocate memory.
> 
> Dave

IIRC, a segmentation fault means that the program has tried to access
memory that is not mapped into your process's address space, or to
access it in a way that is not permitted (for example, writing to a
read-only page).

Find out what the program was doing immediately before the failure.  If 
necessary, step through it one instruction at a time.

Be suspicious of computed subscripts and/or pointers.  It's also 
possible that a pointer has been overwritten with garbage or that a 
pointer has been used without being initialized.

This is absolutely Computer Programming 101!  And memory leaks suggest 
sloppy design/coding.

Some programs were NEVER designed; they just grew.


Richard, 7/22/2010 12:36:43 PM
