Program hangs - but why?

Hi all

I am in the process of porting a program from Windows to Solaris. My
program seemingly works fine, but after a couple of hours it becomes
unresponsive, and "trussing" the program shows the following output
repeating ad infinitum (two iterations shown).
The program (which is coded in C++ and compiled with g++ 3.3.2) has
been boiled down to a loop which basically sleeps for one millisecond
at a time until ten seconds have passed and then prints the time of day
using time, localtime and asctime. (This is not entirely accurate, as it
also uses a framework utilising std::vector, std::string and other basic
C++ and C functionality.)
Even though the output suggests otherwise, there are no threads in my
program. uname gives the following information about our system (one
processor only):
SunOS u10sparc 5.8 Generic_117350-28 sun4u sparc SUNW,Ultra-5_10.

Is there anyone out there with an idea as to what might be going on?
Any help will be greatly appreciated.

Kind regards
Peter Koch Larsen


Output from truss:
lwp_cond_wait(0xFEC734F0, 0xFEC73500, 0xFEC6CD88) (sleeping...)
lwp_cond_wait(0xFEC734F0, 0xFEC73500, 0xFEC6CD88) Err#62 ETIME
sigsuspend(0xFFBEF2F8)          (sleeping...)
signotifywait()                 (sleeping...)
door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)
lwp_cond_wait(0xFEC734F0, 0xFEC73500, 0xFEC6CD88) (sleeping...)
lwp_cond_wait(0xFEC734F0, 0xFEC73500, 0xFEC6CD88) Err#62 ETIME
sigsuspend(0xFFBEF2F8)          (sleeping...)
signotifywait()                 (sleeping...)
door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)
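
For readers following along, the loop described in the post above might look roughly like the sketch below. This is only a guess at its shape: the function names are hypothetical, nanosleep is assumed as the sleep primitive (the post does not say which call is used), and the framework code mentioned in the post is left out.

```cpp
#include <ctime>
#include <string>
#include <time.h>   // nanosleep (POSIX); assumed, the post does not name the call

// Return the current local time as text, without asctime's trailing '\n'.
// localtime and asctime both return pointers to static storage.
std::string current_time_string() {
    std::time_t now = std::time(0);
    std::tm* local = std::localtime(&now);
    const char* text = local ? std::asctime(local) : 0;
    std::string result(text ? text : "");
    if (!result.empty() && result[result.size() - 1] == '\n')
        result.resize(result.size() - 1);
    return result;
}

// Sleep in one-millisecond steps until `seconds` seconds have elapsed.
// Returns the number of sleep calls made.
long sleep_in_ms_steps(int seconds) {
    const std::time_t start = std::time(0);
    long steps = 0;
    while (std::difftime(std::time(0), start) < seconds) {
        timespec one_ms = {0, 1000000};  // 1 ms expressed in nanoseconds
        nanosleep(&one_ms, 0);
        ++steps;
    }
    return steps;
}
```

Calling sleep_in_ms_steps(10) and then printing current_time_string() in an endless loop would approximate the behaviour described, minus the framework.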

Reply peter 3/7/2006 10:42:24 AM

peter.koch.larsen@gmail.com writes:
> I am in the process of porting a program from Windows to Solaris. My
> program seemingly works fine but after a couple of hours, it becomes
> irresponsive and "trussing" the program has the following output
> repeating ad infinitum (two iterations shown).

Output from pstack or a debugger (such as dbx) might show the place
where it's stuck more clearly.

> lwp_cond_wait(0xFEC734F0, 0xFEC73500, 0xFEC6CD88) (sleeping...)
> lwp_cond_wait(0xFEC734F0, 0xFEC73500, 0xFEC6CD88) Err#62 ETIME
> sigsuspend(0xFFBEF2F8)          (sleeping...)
> signotifywait()                 (sleeping...)
> door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)

Door calls are often used for name services (into nscd).  That's just
a guess, though; the stack should show more.

-- 
James Carlson, KISS Network                    <james.d.carlson@sun.com>
Sun Microsystems / 1 Network Drive         71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
Reply James 3/7/2006 12:15:17 PM


James Carlson wrote:
> peter.koch.larsen@gmail.com writes:
> 
>>I am in the process of porting a program from Windows to Solaris. My
>>program seemingly works fine but after a couple of hours, it becomes
>>irresponsive and "trussing" the program has the following output
>>repeating ad infinitum (two iterations shown).
> 
> 
> Output from pstack or a debugger (such as dbx) might show the place
> where it's stuck more clearly.
pstack is not available on Solaris 8 :-(
I think Peter should use Solaris 10 for his development, as it has far
better tools for tracking down problems than Solaris 8 (pstack, dtrace, etc.).
Once the program works on Solaris 10, it should just be a matter of
recompiling on the Solaris 8 system.
> 
> 
>>lwp_cond_wait(0xFEC734F0, 0xFEC73500, 0xFEC6CD88) (sleeping...)
>>lwp_cond_wait(0xFEC734F0, 0xFEC73500, 0xFEC6CD88) Err#62 ETIME
>>sigsuspend(0xFFBEF2F8)          (sleeping...)
>>signotifywait()                 (sleeping...)
>>door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)
> 
> 
> Door calls are often used for name services (into nscd).  That's just
> a guess, though; the stack should show more.
> 
Reply Richard 3/7/2006 4:15:34 PM

Richard Skelton wrote:
> James Carlson wrote:
> 
>> peter.koch.larsen@gmail.com writes:
>>
>>> I am in the process of porting a program from Windows to Solaris. My
>>> program seemingly works fine but after a couple of hours, it becomes
>>> irresponsive and "trussing" the program has the following output
>>> repeating ad infinitum (two iterations shown).
>>
>>
>>
>> Output from pstack or a debugger (such as dbx) might show the place
>> where it's stuck more clearly.
> 
> pstack in not available on Solaris 8 :-(
Sorry, my mistake: pstack is available on Solaris 8, but I still feel that
Peter should develop his application on Solaris 10 :-)
> I think Peter should use Solaris 10 for his development as there are far 
> better tool to track down problems than on Solaris 8 ( pstack dtrace etc)
> Once the program works on Solaris 10 it should be just a matter of a 
> re-compile on the Solaris 8 system.
> 
>>
>>
>>> lwp_cond_wait(0xFEC734F0, 0xFEC73500, 0xFEC6CD88) (sleeping...)
>>> lwp_cond_wait(0xFEC734F0, 0xFEC73500, 0xFEC6CD88) Err#62 ETIME
>>> sigsuspend(0xFFBEF2F8)          (sleeping...)
>>> signotifywait()                 (sleeping...)
>>> door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)
>>
>>
>>
>> Door calls are often used for name services (into nscd).  That's just
>> a guess, though; the stack should show more.
>>
Reply Richard 3/7/2006 4:50:24 PM

As others have noted, S10 is easier to debug on
and there really isn't enough info here to figure this
out.

I can advise you to use the alternate libthread on
Solaris 8 - simply link with -llwp and you'll get a much
better thread library.  That's largely the same code that is the
standard libthread on S9.  On S10, libthread == libc, and all
this fooling around w/ thread libraries stopped...

- Bart

Reply barts 3/8/2006 6:20:09 AM

Hi all

First, let me thank you all for your time. I appreciate your responses
even if they haven't been directly applicable to me.
I am not in a position to change operating system or anything like that,
so I just have to muddle along with what I've got, meaning
old-fashioned trial-and-error testing.

My latest attempt actually produced code that keeps my program running
indefinitely. This is the snippet of the working code:

/*
time_t ltime;
struct tm *now;

time(&ltime);
now = localtime(&ltime);
char* ctimeptr = asctime(now);
targetStr = string(ctimeptr?ctimeptr:"");

// Skip '\n':
if (!targetStr.empty() && targetStr[targetStr.size()-1] == '\n')
  targetStr.resize(targetStr.size()-1);
*/
targetStr = "PKL-time";

If I remove the last line and uncomment the rest, my program "dies" as
described in my original post. As the code is not exactly rocket
science, this leads me to believe that some corruption of memory must
have taken place elsewhere. 

Kind regards
Peter
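
One thing that could be tried with the snippet above: localtime and asctime both return pointers to static storage, and the truss output in the original post shows lwp_cond_wait, which suggests a thread library is loaded even though the program creates no threads itself. Whether that has anything to do with the hang is not established anywhere in this thread. The sketch below only shows how the same string could be built from caller-owned buffers, assuming POSIX localtime_r is available on the target; format_local_time is a hypothetical name.

```cpp
#include <ctime>
#include <string>
#include <time.h>   // localtime_r (POSIX); assumed to be available

// Format a timestamp using only caller-owned buffers,
// so no static storage inside the C library is involved.
std::string format_local_time(std::time_t when) {
    struct tm broken_down;
    if (localtime_r(&when, &broken_down) == 0)
        return std::string();  // conversion failed; return an empty string

    char buffer[64];
    // Approximates asctime's layout without the trailing '\n'.
    // %d zero-pads the day of month, whereas asctime pads it with a space.
    std::size_t written = std::strftime(buffer, sizeof buffer,
                                        "%a %b %d %H:%M:%S %Y", &broken_down);
    return std::string(buffer, written);
}
```

With this, the commented-out block would reduce to targetStr = format_local_time(time(0)); and there would be no newline to strip.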

Reply peter 3/8/2006 12:36:31 PM

Can anyone help me with this?

/# truss -faled  -p 16579
Base time stamp:  1340956354.3288  [ Fri Jun 29 13:22:34 IST 2012 ]
16579/1:        psargs: /opt/BGw/CXC1730901_R4S//lib/executable/BGwCollector Server1 EMM
16579/8:         0.6718 lwp_park(0xFD2FBC68, 0)                         Err#62 ETIME
16579/8:         0.6720 time()                                          = 1340956355
16579/16868:    lwp_park(0xFCEFBBB0, 0)         (sleeping...)
16579/1:        lwp_park(0x00000000, 0)         (sleeping...)
16579/5:        pollsys(0xFD67BB80, 1, 0xFD67BC10, 0x00000000) (sleeping...)
16579/6:        lwp_park(0x00000000, 0)         (sleeping...)
16579/2:        pollsys(0xFD97BB88, 1, 0xFD97BC10, 0x00000000) (sleeping...)
16579/3:        pollsys(0xFD87BB78, 1, 0xFD87BC10, 0x00000000) (sleeping...)
16579/7:        lwp_park(0x00000000, 0)         (sleeping...)
16579/16869:    lwp_park(0xFCDFBBB0, 0)         (sleeping...)
16579/4:        sigtimedwait(0xFD77BE78, 0xFD77BDF8, 0x00000000) (sleeping...)
16579/8:         1.6714 lwp_park(0xFD2FBC68, 0)                         Err#62 ETIME
16579/8:         1.6716 time()                                          = 1340956356
16579/8:         2.6714 lwp_park(0xFD2FBC68, 0)                         Err#62 ETIME
16579/8:         2.6715 time()                                          = 1340956357
16579/8:         3.6714 lwp_park(0xFD2FBC68, 0)                         Err#62 ETIME
16579/8:         3.6716 time()                                          = 1340956358
16579/16868:     4.6713 lwp_park(0xFCEFBBB0, 0)                         Err#62 ETIME
16579/8:         4.6714 lwp_park(0xFD2FBC68, 0)                         Err#62 ETIME
16579/16868:     4.6715 time()                                          = 1340956359
16579/8:         4.6716 time()                                          = 1340956359
16579/16868:     4.6718 openat(-3041965, "/var/opt/BGw/ServerGroup1/Server1/BhartiIN/Comverse_SMSC/CDR_in/COM_MT_toSDP/", O_RDONLY|O_NDELAY|O_LARGEFILE) = 8
16579/16868:     4.6736 fcntl(8, F_SETFD, 0x00000001)                   = 0
16579/16868:     4.6737 fstat64(8, 0xFCEFBA20)                          = 0
16579/16868:     4.6737 getdents64(8, 0xFEA30000, 8192)                 = 72
16579/16868:     4.6738 getdents64(8, 0xFEA30000, 8192)                 = 0
16579/16868:     4.6739 close(8)                                        = 0
16579/16868:     4.6740 time()                                          = 1340956359
16579/3:         5.1352 pollsys(0xFD87BB78, 1, 0xFD87BC10, 0x00000000)  = 0
16579/2:         5.5303 pollsys(0xFD97BB88, 1, 0xFD97BC10, 0x00000000)  = 0
16579/8:         5.6715 lwp_park(0xFD2FBC68, 0)                         Err#62 ETIME
16579/8:         5.6717 time()                                          = 1340956360
16579/16868:    lwp_park(0xFCEFBBB0, 0)         (sleeping...)
16579/3:        pollsys(0xFD87BB78, 1, 0xFD87BC10, 0x00000000) (sleeping...)
16579/2:        pollsys(0xFD97BB88, 1, 0xFD97BC10, 0x00000000) (sleeping...)
16579/8:         6.6715 lwp_park(0xFD2FBC68, 0)                         Err#62 ETIME
16579/8:         6.6717 time()                                          = 1340956361
16579/5:         7.0727 pollsys(0xFD67BB80, 1, 0xFD67BC10, 0x00000000)  = 0

Reply gore_suraj1234 (1) 6/29/2012 3:54:25 PM
