Why does this core dump on Solaris and not on Linux?


I am using System V shared memory calls (shmget, shmat, etc.) to
create a shared memory segment and manipulate data. I get a core dump
(SIGSEGV) when I run the daemon on Solaris. The same program works fine
on Red Hat Enterprise Linux.

Here is the relevant section of code causing the problem.

probeSession *pSession = reinterpret_cast<probeSession *>(shmStart + nOffset);

util::utilLog(ePROBE_LOG_ERROR, "Failed to %d. \n", pSession->captureRuleIndex);

The access to the shared memory variable pSession->captureRuleIndex is
the culprit.

Even on Solaris, if I do the following, it works fine:

int nTest;

memcpy (&nTest, &(pSession->captureRuleIndex), sizeof (int));

util::utilLog(ePROBE_LOG_ERROR, "Failed to %d. \n", nTest);

I want to avoid the memcpy for performance reasons. In the real system I
need to access the entire struct (probeSession), which is about 1264
bytes, so memcpy is an expensive operation for the real-time packet
capture system that I am working on.

Ninan

Reply nin234 (39) 1/19/2005 10:28:47 PM

In article <1106173727.657401.57050@z14g2000cwz.googlegroups.com>,
 nin234@yahoo.com wrote:

> I am using system shared memory techniques, like shmget, shmat etc to
> create a shared memory segment and manipulate data. I get a core dump
> (SIGSEGV) when I run the daemon on Solaris. The same program works fine
> on Red hat enterprise linux.
> 
> Here is the relevant section of code causing the problem.
> 
> probeSession *pSession = reinterpret_cast<probeSession *>(shmStart + nOffset);

What's the type of shmStart?  Remember that when you perform arithmetic 
with pointers, it's done in units of the size of the type pointed to.  
But if nOffset is the result of offsetof(), it's in units of chars.
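To make the units point concrete, here is a minimal sketch in C++. The `Sample` struct and the `bytesAdvanced` helper are hypothetical names invented for this illustration; they are not from the original program. The helper reports how many bytes a pointer actually moves when `n` is added to it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical struct, used only to show a pointee larger than one byte.
struct Sample {
    int a;
    int b;
};

// Returns the distance in bytes between p and p + n.
// Pointer arithmetic scales n by sizeof(T), so the result is n * sizeof(T).
// p + n must stay within the same array (or one past its end).
template <typename T>
std::ptrdiff_t bytesAdvanced(T* p, std::ptrdiff_t n) {
    return reinterpret_cast<char*>(p + n) - reinterpret_cast<char*>(p);
}
```

With a `char *` base, adding a byte offset (such as one produced by `offsetof`) moves exactly that many bytes; with a `Sample *` base the same number would be multiplied by `sizeof(Sample)`.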

-- 
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
Reply Barry 1/20/2005 1:36:19 AM


shmStart is of type char *.
We calculate the offset ourselves. I also inspected the relevant shared
memory location with gdb's x command, and the shared memory looks fine.
All the relevant fields have been populated.
Even
(gdb) print *pSession
printed the contents of the structure as expected. I could see all the
relevant fields.

Moreover, we have versions of this product on Linux that work fine.
Also, the memcpy I noted in the first posting works on Solaris.
My guess is that this is some Solaris-specific issue. Here are my
observations and line of thinking. I am able to copy the memory
location (via memcpy); it is the access by value that fails.
For example, if I pass the value to a function
void f (int index);
f (pSession->captureRuleIndex);
the call causes a core dump. It doesn't even execute the first line
inside the function. That means it is core dumping while the program
is trying to create the temporary variable to hold the value.
It also core dumps for
util::utilLog(ePROBE_LOG_ERROR, "Failed to %d. \n", pSession->captureRuleIndex);
which is basically a function similar to printf.
In the advanced UNIX programming book (Richard Stevens) I read that
shared memory is not part of the stack or the heap. Is that what is
creating the issue, and does Linux behave differently?

Barry Margolin wrote:
> In article <1106173727.657401.57050@z14g2000cwz.googlegroups.com>,
>  nin234@yahoo.com wrote:
>
> > I am using system shared memory techniques, like shmget, shmat etc to
> > create a shared memory segment and manipulate data. I get a core dump
> > (SIGSEGV) when I run the daemon on Solaris. The same program works fine
> > on Red hat enterprise linux.
> >
> > Here is the relevant section of code causing the problem.
> >
> > probeSession *pSession = reinterpret_cast<probeSession *>(shmStart + nOffset);
>
> What's the type of shmStart?  Remember that when you perform arithmetic
> with pointers, it's done in units of the size of the type pointed to.
> But if nOffset is the result of offsetof(), it's in units of chars.
>
> --
> Barry Margolin, barmar@alum.mit.edu
> Arlington, MA
> *** PLEASE post questions in newsgroups, not directly to me ***

Reply nin234 1/20/2005 2:41:13 AM

nin234@yahoo.com writes:

>I am using system shared memory techniques, like shmget, shmat etc to
>create a shared memory segment and manipulate data. I get a core dump
>(SIGSEGV) when I run the daemon on Solaris. The same program works fine
>on Red hat enterprise linux.

Are you sure you get a SIGSEGV and not a SIGBUS?
(Is this on SPARC?  Might this be an alignment problem?)
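One way to test the alignment theory is to check the computed address before dereferencing it. The sketch below assumes a C++11 compiler (for `alignof`); `isAlignedFor` is a hypothetical helper name, not something from the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true when address p satisfies the alignment requirement of T.
// On strict-alignment CPUs such as SPARC, loading a T from an address
// that fails this check typically raises SIGBUS.
template <typename T>
bool isAlignedFor(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```

In the poster's code, a check along the lines of `isAlignedFor<probeSession>(shmStart + nOffset)` just before the cast would show whether the offset lands on a suitable boundary.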

Casper
Reply Casper 1/20/2005 7:04:00 AM

I am sorry. Segmentation violation was the error I was getting when I
used gdb; rather, gdb printed out that message. I installed two signal
handlers in my program for SIGBUS and SIGSEGV, and it catches SIGBUS.
So I suspect it is an alignment issue. My question: why is the
compiler not taking care of it? I am using gcc/g++. When calling shmat,
I pass 0 as the second argument.
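Since shmat with a null address returns a page-aligned attach address, a misaligned pointer would most likely come from the offset added to it. If the layout of the segment is under the program's control, one possible fix is to round each offset up to the struct's alignment when the segment is laid out. This is a sketch under that assumption; `alignUp` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>

// Rounds offset up to the next multiple of alignment.
// alignment must be a non-zero power of two (true of any alignof(T) value).
inline std::size_t alignUp(std::size_t offset, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

For example, computing `nOffset = alignUp(rawOffset, alignof(probeSession))` (where `rawOffset` is a placeholder for however the offset is currently derived) would keep direct member access legal, provided every process attaching to the segment uses the same layout rule.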

Reply nin234 1/20/2005 6:30:53 PM

nin234@yahoo.com writes:

> I am sorry . Segmentation violation was the error I was getting when I
> used gdb, . Rather gdb printed out that message. I installed two signal
> handlers in my program for SIGBUS and SIGSEGV and it  catches SIGBUS.
> So I am suspecting it is an alignment issue. My question , why is the
> compiler not taking care of it. I am using gcc /g++. when using shmat,
> I have 0 as the second argument.
>

How can the compiler take care of alignment when you are using char * type?

Dragan

-- 
Dragan Cvetkovic, 

To be or not to be is true. G. Boole      No it isn't.  L. E. J. Brouwer

!!! Sender/From address is bogus. Use reply-to one !!!
Reply Dragan 1/20/2005 6:44:59 PM

nin234@yahoo.com writes:

>I am sorry . Segmentation violation was the error I was getting when I
>used gdb, . Rather gdb printed out that message. I installed two signal
>handlers in my program for SIGBUS and SIGSEGV and it  catches SIGBUS.
>So I am suspecting it is an alignment issue. My question , why is the
>compiler not taking care of it. I am using gcc /g++. when using shmat,
>I have 0 as the second argument.


Because you're casting the result, the compiler cannot
take care of it: you're breaking the rules and therefore you need
to use memcpy.

BTW, you should not think that memcpy is slower; it is certainly required
for portability.
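The memcpy approach can be wrapped in a small helper so call sites stay readable. This is a minimal sketch assuming C++11; `loadUnaligned` is a hypothetical name. For small fixed sizes, compilers commonly expand such a memcpy inline rather than calling the library routine, though that depends on the compiler and flags.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Copies a T out of a byte buffer that may not be suitably aligned for T.
// memcpy works byte by byte as far as the language is concerned, so the
// source address needs no particular alignment.
template <typename T>
T loadUnaligned(const char* src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "loadUnaligned requires a trivially copyable type");
    T value;
    std::memcpy(&value, src, sizeof(T));
    return value;
}
```

Usage would look like `int idx = loadUnaligned<int>(shmStart + fieldOffset);`, where `fieldOffset` stands for the byte offset of the field being read. Copying a single field this way avoids copying the whole 1264-byte struct.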

Casper
Reply Casper 1/20/2005 7:07:45 PM
