SIGBUS: 10 error

Hi all,

I am debugging an MPI program and got this error. The weird thing is
that I've checked the code where the error came from: it just calls a
constructor to new an object, and it worked fine until the program
reached a particular point in time.

Does anyone have any idea why "p2_12374:  p4_error: interrupt SIGBUS:
10" occurred? And is there any method that might help me find out where
the real bug is located? Many thanks.

Patricia

Reply lanchenn (28) 8/10/2005 10:46:54 AM

Patricia wrote:
> Hi all,
> 
> I am debugging a mpi program and got this error. The weird thing is
> I've checked the codes where the error was from, it just called a
> constructor to new an object and it did well before the program ran to
> particular time.
> 
> Anyone has any idea about why "p2_12374:  p4_error: interrupt SIGBUS:
> 10" occured? And any method might help me find out where the real bug
> locates? Many thanks.
> 
> Patricia

SIGBUS errors usually indicate an invalid memory access.  (SIGBUS is
like SIGSEGV, but the usual causes differ: SIGSEGV is raised when the
address is not mapped into your process or the access violates the
page's protections, while SIGBUS is typically raised for a misaligned
access, which is common on SPARC, or for an access to a mapping that
has no backing storage, such as past the end of a memory-mapped file.)

    http://en.wikipedia.org/wiki/SIGBUS

Possibly your object's constructor, or code that ran before it, ended up
dereferencing an invalid pointer.  (A plain dereference of NULL would
normally raise SIGSEGV rather than SIGBUS, so a misaligned or corrupted
pointer is the more likely culprit.)

The p4 error handler obviously lies within the MPI library code, but the
error might still be occurring in your code.  It's being caught by the
only SIGBUS error handler that's available: it's within MPI and was
registered when you called MPI_INIT.

    Randy

-- 
Randy Crawford   http://www.ruf.rice.edu/~rand   rand AT rice DOT edu

"If English was good enough for Jesus Christ, it ought to be good enough
for the children of Texas."  -- Texas Governor Ma Ferguson (1924)
Reply Randy 8/10/2005 8:47:17 PM


Hi,

Thanks for your reply. I changed the code, and the SIGBUS: 10 error
changed to SIGSEGV: 11. A core file was also produced. dbx reads the
core as follows:

0x7f741f10: _malloc_unlocked+0x021c:    ld       [%o0 + 8], %o1
Current function is std::allocator<std::pair<const int,ProcessState> >::allocate
  389         void * tmp = _RWSTD_STATIC_CAST(void*,(::operator new(_RWSTD_STATIC_CAST(size_t,(n)))));

The stack trace is:
  [1] _malloc_unlocked(0x7f7c2858, 0x1bbfa8, 0x7f7bc008, 0x2008, 0x1bc040, 0x0), at 0x7f741f10
  [2] malloc(0x2008, 0x0, 0x7fbdc0c0, 0x7fbdc790, 0x7fb40b60, 0x7fbded84), at 0x7f741cd8
  [3] _filbuf(0x142238, 0x1, 0x7f7bc008, 0x0, 0x2000, 0x1), at 0x7f78edf8
  [4] _doprnt(0x0, 0xffbedf68, 0x142238, 0x7f7bc008, 0x40, 0x11cf6c), at 0x7f784afc
  [5] printf(0x11cf6c, 0x142228, 0x7f7c3a54, 0x0, 0x2, 0x0), at 0x7f788154
  [6] p4_error(0x139048, 0xb, 0xb, 0x0, 0x0, 0x0), at 0xfe7d0
  [7] sig_err_handler(0xb, 0x0, 0xffbee0e0, 0x7f7bc008, 0x0, 0x0), at 0xfea14
  [8] _setpgid(0xb, 0x0, 0xffbee0e0, 0x7f7c284c, 0x7f7c2848, 0x0), at 0x7f79ebc4
  [9] _malloc_unlocked(0x7f7c2858, 0x1bbfa8, 0x7f7bc008, 0xb8, 0x1bc040, 0x0), at 0x7f741ea0
  [10] malloc(0xb4, 0xffbee8c3, 0x0, 0xffbee8c2, 0x2234c, 0x7f741ce4), at 0x7f741cd8
  [11] operator new(0xb4, 0xffbee8c3, 0x13740, 0xffbeee38, 0x7fa4a8d4, 0xb4), at 0x7fa371b8
=>[12] std::allocator<std::pair<const int,ProcessState> >::allocate(this = 0xffbee5ff, n = 180U, _ARG3 = (nil)), line 389 in "memory"
  [13] std::allocator_interface<std::allocator<std::pair<const int,ProcessState> >,__rwstd::__rb_tree<int,std::pair<const int,ProcessState>,__rwstd::__select1st<std::pair<const int,ProcessState>,int>,std::less<int>,std::allocator<std::pair<const int,ProcessState> > >::__rb_tree_node>::allocate(this = 0xffbee5ff, n = 1U, p = (nil)), line 488 in "memory"
  [14] __rwstd::__rb_tree<int,std::pair<const int,ProcessState>,__rwstd::__select1st<std::pair<const int,ProcessState>,int>,std::less<int>,std::allocator<std::pair<const int,ProcessState> > >::__add_new_buffer(this = 0x1b9c28), line 167 in "tree"
  [15] __rwstd::__rb_tree<int,std::pair<const int,ProcessState>,__rwstd::__select1st<std::pair<const int,ProcessState>,int>,std::less<int>,std::allocator<std::pair<const int,ProcessState> > >::__get_link(this = 0x1b9c28), line 189 in "tree"
  [16] __rwstd::__rb_tree<int,std::pair<const int,ProcessState>,__rwstd::__select1st<std::pair<const int,ProcessState>,int>,std::less<int>,std::allocator<std::pair<const int,ProcessState> > >::__get_node(this = 0x1b9c28), line 223 in "tree"
  [17] __rwstd::__rb_tree<int,std::pair<const int,ProcessState>,__rwstd::__select1st<std::pair<const int,ProcessState>,int>,std::less<int>,std::allocator<std::pair<const int,ProcessState> > >::init(this = 0x1b9c28), line 483 in "tree"
  [18] __rwstd::__rb_tree<int,std::pair<const int,ProcessState>,__rwstd::__select1st<std::pair<const int,ProcessState>,int>,std::less<int>,std::allocator<std::pair<const int,ProcessState> > >::__rb_tree(this = 0x1b9c28, _RWSTD_COMP = STRUCT, always = false, alloc = CLASS), line 499 in "tree"
  [19] std::map<int,ProcessState,std::less<int>,std::allocator<std::pair<const int,ProcessState> > >::map(this = 0x1b9c28, comp = STRUCT, alloc = CLASS), line 148 in "map"
  [20] ContContrlState::ContContrlState(this = 0x1b9c00), line 18 in "ContContrlState.hh"
  [21] ContContrlObject::allocateState(this = 0x1b3c20), line 804 in "ContContrlObject.cc"
  [22] StateManager::saveState(this = 0x1b3e78), line 126 in "StateManager.cc"
  [23] TimeWarp::saveState(this = 0x1b3c20), line 429 in "TimeWarp.cc"
  [24] TimeWarp::executeSimulation(this = 0x1b3c20), line 325 in "TimeWarp.cc"
  [25] LTSFScheduler::runProcesses(this = 0xffbef270), line 50 in "LTSFScheduler.cc"
  [26] LogicalProcess::simulate(this = 0xffbeee90, _ARG2 = 2147483647), line 885 in "LogicalProcess.cc"
  [27] main(argc = 1, argv = 0x16a420), line 299 in "main.cc"

Could you give me some hints about this? What is _malloc_unlocked?

Patricia

Reply Patricia 9/2/2005 2:25:37 PM

Patricia wrote:

> Hi,
> 
> Thanks for ur reply.I changed the codes, the SIGBUS:10 error changed to
> SIGSEGV: 11. And core file was maken out. dbx reads the core like
> below:
> 
> 0x7f741f10: _malloc_unlocked+0x021c:    ld       [%o0 + 8], %o1
> Current function is std::allocator<std::pair<const int,ProcessState> >::allocate
>   389         void * tmp = _RWSTD_STATIC_CAST(void*,(::operator new(_RWSTD_STATIC_CAST(size_t,(n)))));
>
> trace stack as:

[...]

>   [5] printf(0x11cf6c, 0x142228, 0x7f7c3a54, 0x0, 0x2, 0x0), at 0x7f788154
>   [6] p4_error(0x139048, 0xb, 0xb, 0x0, 0x0, 0x0), at 0xfe7d0
>   [7] sig_err_handler(0xb, 0x0, 0xffbee0e0, 0x7f7bc008, 0x0, 0x0), at 0xfea14
>   [8] _setpgid(0xb, 0x0, 0xffbee0e0, 0x7f7c284c, 0x7f7c2848, 0x0), at 0x7f79ebc4
>   [9] _malloc_unlocked(0x7f7c2858, 0x1bbfa8, 0x7f7bc008, 0xb8, 0x1bc040, 0x0), at 0x7f741ea0
>   [10] malloc(0xb4, 0xffbee8c3, 0x0, 0xffbee8c2, 0x2234c, 0x7f741ce4), at 0x7f741cd8

[...]

This has nothing to do with MPI itself: your own code is causing a
segmentation fault, which is being trapped by a signal handler installed
by MPI.  That is rather rude of the MPI library, I think, even more so
since it _appears_ to be calling functions that are not
async-signal-safe (such as printf):
http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html#tag_02_04_03
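
To make the async-signal-safety point concrete, here is a minimal
sketch of a fatal-signal handler that avoids printf and malloc and
sticks to write() and _exit(), both of which POSIX lists as
async-signal-safe.  The names safe_write_msg, fatal_signal_handler and
install_fatal_handlers are made up for this illustration; this is not
what the p4 handler does, only one way such a handler could be written.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: emit a message using only write(2).  The length
 * is computed by hand so that nothing else from libc is called.
 * Returns 0 on success, -1 if write() fails or makes no progress. */
static int safe_write_msg(int fd, const char *msg)
{
    size_t len = 0;
    while (msg[len] != '\0')
        len++;

    while (len > 0) {
        ssize_t n = write(fd, msg, len);
        if (n <= 0)
            return -1;
        msg += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical handler: report and terminate without touching stdio
 * or the heap, since either may be in an inconsistent state. */
static void fatal_signal_handler(int signo)
{
    (void)signo;
    safe_write_msg(STDERR_FILENO, "fatal signal caught, exiting\n");
    _exit(1);
}

/* Install the handler for SIGSEGV and SIGBUS.  Returns 0 on success. */
static int install_fatal_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = fatal_signal_handler;
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    return sigaction(SIGBUS, &sa, NULL);
}
```

A handler written this way cannot re-enter malloc, so it would not
produce the second _malloc_unlocked fault seen at the top of the trace.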

But the stack trace looks pretty confusing to me - you might be better off
reverting to the standard SIGSEGV handler to get rid of the p4_error junk
in the stack trace.  To do this, add

#include <signal.h>


   struct sigaction Action;
   sigemptyset(&Action.sa_mask);
   Action.sa_handler = SIG_DFL;
   Action.sa_flags = 0;
   sigaction(SIGSEGV, &Action, NULL);

somewhere after MPI_Init().  That *might* help the debugging.  Another
thing that might help is using a debugging version of malloc, since it
looks like heap corruption is causing a fault inside malloc (although I
don't understand why setpgid() is in there too).
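
The same idea can be packaged as a small helper that also covers SIGBUS,
since the original report was a bus error.  This is only a sketch: the
function names restore_default_handler and
restore_default_fault_handlers are invented here, and the assumption is
that you call the second one once, right after MPI_Init() returns.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical helper: reset one signal to its default disposition.
 * Returns 0 on success, -1 on failure (errno is set by sigaction). */
static int restore_default_handler(int signo)
{
    struct sigaction action;
    memset(&action, 0, sizeof action);   /* clear fields we do not set */
    sigemptyset(&action.sa_mask);
    action.sa_handler = SIG_DFL;
    action.sa_flags = 0;
    return sigaction(signo, &action, NULL);
}

/* Reset both signals discussed in this thread.  Assumed to be called
 * after MPI_Init(), so that it overrides whatever MPI installed. */
static int restore_default_fault_handlers(void)
{
    if (restore_default_handler(SIGSEGV) != 0)
        return -1;
    return restore_default_handler(SIGBUS);
}
```

With the default disposition back in place, the process should dump
core at the faulting instruction itself, without the p4_error and
printf frames stacked on top of it.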

Since this is nothing to do with MPI (well, except for the b0rken SEGV
handler), it would be better to follow up in a group relating to your
compiler+platform instead.

HTH,
Ian McCulloch

Reply Ian 9/2/2005 5:40:43 PM

Patricia,

From the little bit of web searching I've done, it sounds like your
heap's integrity has been corrupted.  Probably a write to a dynamically
created object ran past the memory that was allocated for it.  A
subsequent call to "new" then encounters invalid metadata within the
heap, and malloc dereferences a bad pointer: a misaligned one typically
raises SIGBUS on SPARC, while one that points to unmapped memory (such
as address zero) raises SIGSEGV.

Be sure to check the memory footprint of your object and its constituent
objects.  I assume "rb_tree" is a red-black tree.  One possibility is
that the data held within the RB tree's nodes is actually larger than
what was requested from new (for example, if different parts of the
program were compiled against different definitions of ProcessState).
In that case not enough memory is allocated, subsequent writes into the
tree run past the space available for each datum, and the heap gets
corrupted.

Of course, the heap corruption could have arisen from any other object's
earlier misuse of the heap.  It needn't be within your rb_tree code.
That's just where the heap corruption was first encountered.

That's my best guess anyway.  Not sure if it helps.
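
To show how an overrun like the one described above can be caught close
to where it happens, here is a minimal sketch of a guarded allocator in
C.  It surrounds each block with known guard bytes and lets you check
them later.  All names (guarded_malloc, guarded_check, guarded_free)
and the layout are invented for this example; a real debugging malloc
does considerably more.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Block layout: [8-byte size field][8 guard bytes][user data][8 guard bytes]
 * The 16-byte header keeps the user pointer 16-byte aligned. */
enum { SIZE_FIELD = 8, GUARD_SIZE = 8, HEADER_SIZE = SIZE_FIELD + GUARD_SIZE };
static const unsigned char GUARD_BYTE = 0xA5;

/* Allocate size bytes with guard bytes on both sides. */
void *guarded_malloc(size_t size)
{
    if (size > SIZE_MAX - HEADER_SIZE - GUARD_SIZE)
        return NULL;
    unsigned char *raw = malloc(HEADER_SIZE + size + GUARD_SIZE);
    if (raw == NULL)
        return NULL;
    memset(raw, 0, SIZE_FIELD);
    memcpy(raw, &size, sizeof size);                  /* remember the size */
    memset(raw + SIZE_FIELD, GUARD_BYTE, GUARD_SIZE); /* front guard */
    memset(raw + HEADER_SIZE + size, GUARD_BYTE, GUARD_SIZE); /* rear guard */
    return raw + HEADER_SIZE;
}

/* Return 1 if both guards are intact, 0 if either was overwritten. */
int guarded_check(const void *ptr)
{
    const unsigned char *user = ptr;
    const unsigned char *raw = user - HEADER_SIZE;
    size_t size;
    memcpy(&size, raw, sizeof size);
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (raw[SIZE_FIELD + i] != GUARD_BYTE)
            return 0;
        if (user[size + i] != GUARD_BYTE)
            return 0;
    }
    return 1;
}

/* Release a block obtained from guarded_malloc. */
void guarded_free(void *ptr)
{
    if (ptr != NULL)
        free((unsigned char *)ptr - HEADER_SIZE);
}
```

Calling guarded_check() on suspect objects at a few points in the
program narrows down which write did the damage, instead of finding out
only when a later malloc trips over the corrupted heap.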

_malloc_unlocked() appears to be the core function within malloc() that
actually traverses the heap and returns the allocated memory:

http://cvs.opensolaris.org/source/xref/usr/src/lib/libc/port/gen/malloc.c

    void *
    malloc(size_t size)
    {
    	void *ret;

    	if (!primary_link_map) {
    		errno = ENOTSUP;
    		return (NULL);
    	}
    	assert_no_libc_locks_held();
    	lmutex_lock(&libc_malloc_lock);
    	ret = _malloc_unlocked(size);
    	lmutex_unlock(&libc_malloc_lock);
    	return (ret);
    }
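
The listing above follows a common pattern: a public function takes a
lock and then calls an "_unlocked" worker that assumes the lock is
already held.  As a rough illustration of that pattern only, here is a
toy bump allocator using a POSIX mutex.  The names pool_alloc and
pool_alloc_unlocked and the fixed 1024-byte pool are invented for this
sketch and have nothing to do with the real libc internals.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static _Alignas(16) unsigned char pool[1024];
static size_t pool_used = 0;

/* Worker: the caller must already hold pool_lock.  Hands out the next
 * free chunk of the pool, rounded up to a multiple of 16 bytes. */
static void *pool_alloc_unlocked(size_t size)
{
    size_t rounded = (size + 15u) & ~(size_t)15u;
    if (rounded < size || rounded > sizeof pool - pool_used)
        return NULL;              /* overflow, or pool exhausted */
    void *ret = &pool[pool_used];
    pool_used += rounded;
    return ret;
}

/* Public entry point: take the lock, call the worker, drop the lock. */
void *pool_alloc(size_t size)
{
    pthread_mutex_lock(&pool_lock);
    void *ret = pool_alloc_unlocked(size);
    pthread_mutex_unlock(&pool_lock);
    return ret;
}
```

Splitting the function this way lets other code that already holds the
lock call the worker directly without deadlocking on itself.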

    Randy


-- 
Randy Crawford   http://www.ruf.rice.edu/~rand   rand AT rice DOT edu

"Overstatement sucks." -- William of Ockham
Reply Randy 9/2/2005 7:27:52 PM
