Mysterious CPU consumption



Dear Unix Experts,

I have a bug somewhere that causes my code to consume all available CPU
time, but it does so in a bizarre way.  When I look at the output of
"top", I see that overall CPU time is approximately 30% user and 70%
system, and the load average is just over 1.  But none of the listed
processes shows any significant %CPU: no process "owns" the excessive
processing time.

Of course I know which process is responsible for it - it's the code
I'm currently hacking, and killing it does return things to normal.  I
suspect that the problem is a loop that is calling select() over and
over again or something like that, and I'll eventually track it down.
But the fact that this CPU time is not being attributed to it is
mysterious, and I wonder if it is giving me a clue about what is going
wrong.  Has anyone else ever seen this behaviour?

This is with Linux, 2.6.3 kernel.

Cheers,  Phil.

Reply phil_gg04 (22) 5/30/2005 1:49:33 PM

phil_gg04@treefic.com writes:

> Dear Unix Experts,
>
> I have a bug somewhere that causes my code to consume all available CPU
> time, but it does it in a bizarre way.  When I look at the output of
> "top", I see that overall CPU time is approximately 30% user and 70%
> system, and the load average is just over 1.

Sounds like it's stuck in a loop doing system calls repeatedly.

> But none of the listed processes shows any significant %CPU: no
> process "owns" the excessive processing time.
>
> Of course I know which process is responsible for it - it's the code
> I'm currently hacking, and killing it does return things to normal.  I
> suspect that the problem is a loop that is calling select() over and
> over again or something like that, and I'll eventually track it down.
> But the fact that this CPU time is not being attributed to it is
> mysterious, and I wonder if it is giving me a clue about what is going
> wrong.  Has anyone else ever seen this behaviour?
>
> This is with Linux, 2.6.3 kernel.

The behavior you describe is normal with old versions of "top" and an
NPTL-enabled kernel/libc.  What happens is that a thread other than
the first one is using the CPU time, and the old tools don't know where
to look to find it.  Upgrade your procps package and/or try "ps ux -T".
That command lists all the threads of each process, and should
show the CPU usage correctly.
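
If upgrading procps is not convenient, the per-thread figures can also be
read straight from /proc.  The following is only a rough sketch, assuming a
Linux kernel that exposes /proc/<pid>/task/<tid>/stat (2.6 kernels should);
the function names are made up for illustration and are not part of any
tool mentioned above.

```python
import os


def parse_stat_times(stat_text):
    """Return (comm, utime_ticks, stime_ticks) from the text of a /proc stat file."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so split around the first '(' and the last ')'.
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    comm = stat_text[open_paren + 1:close_paren]
    rest = stat_text[close_paren + 1:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return comm, int(rest[11]), int(rest[12])


def thread_cpu_times(pid):
    """Return {tid: (comm, user_seconds, system_seconds)} for each thread of pid."""
    ticks = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    task_dir = f"/proc/{pid}/task"
    result = {}
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                comm, utime, stime = parse_stat_times(f.read())
        except OSError:
            continue  # thread exited between listdir() and open()
        result[int(tid)] = (comm, utime / ticks, stime / ticks)
    return result
```

Calling thread_cpu_times(pid) twice, a few seconds apart, and comparing the
results should show which thread is accumulating the system time, even when
the process-level line in an old "top" stays near zero.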

To find the actual bug, try using strace to pinpoint the exact system
call.  If you're lucky, it's one that's called from few places in your
code.
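
For illustration, here is one common way a select() loop ends up spinning.
This is a guess at the class of bug, not a diagnosis of the code in
question: once the peer closes a socket, select() reports it readable on
every call, and a loop that ignores the zero-length read never blocks
again.  Under "strace -f -p <pid>" such a loop would show up as an endless
stream of select() and read calls.  The helper names below are hypothetical.

```python
import select
import socket
import time


def count_select_wakeups(sock, duration=0.2, handle_eof=False):
    """Run a select() loop on sock for up to `duration` seconds; return the wakeup count."""
    wakeups = 0
    deadline = time.monotonic() + duration
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        readable, _, _ = select.select([sock], [], [], remaining)
        if not readable:
            break  # timed out: nothing to read
        wakeups += 1
        data = sock.recv(4096)
        if data == b"" and handle_eof:
            break  # peer closed: stop polling this socket
        # Without the EOF check the loop keeps selecting on a socket
        # that stays readable forever, which is the bug being shown.
    return wakeups


def demo():
    """Return (wakeups without EOF handling, wakeups with EOF handling)."""
    a, b = socket.socketpair()
    b.close()  # peer goes away; `a` now reports readable (EOF) on every select()
    try:
        spinning = count_select_wakeups(a, handle_eof=False)
        fixed = count_select_wakeups(a, handle_eof=True)
    finally:
        a.close()
    return spinning, fixed
```

The first count from demo() is typically in the thousands for a fraction of
a second of wall time, almost all of it spent in system calls, while the
second is 1.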

-- 
Måns Rullgård
mru@inprovide.com
Reply iso 5/30/2005 1:58:33 PM


>> overall CPU time is approximately 30% user and 70% system
>> [but] none of the listed processes shows any significant %CPU

> The behavior you describe is normal with old versions of "top" and an
> NPTL enabled kernel/libc.  What happens is that another thread than
> the first is using the CPU time, and the tools don't know where to
> look to find out.

Ah, an instrumentation failure!  Thanks Måns, I'll upgrade and see if
that reports it properly.  This is indeed a multi-threaded application.

Cheers,  Phil.

Reply phil_gg04 5/30/2005 2:19:41 PM

On Mon, 30 May 2005 06:49:33 -0700, phil_gg04 wrote:

> I have a bug somewhere that causes my code to consume all available CPU
> time, but it does it in a bizarre way.  When I look at the output of
> "top", I see that overall CPU time is approximately 30% user and 70%
> system, and the load average is just over 1.  But none of the listed
> processes shows any significant %CPU: no process "owns" the excessive
> processing time.

There is another possible explanation besides the instrumentation defect
suggested by Måns Rullgård: the scenario you describe may be due to
short-lived processes.  If processes begin and terminate in less than the
sample interval of the monitoring tools, the CPU time they consume can be
hard to account for.  The symptoms you report suggest that tasks (e.g.,
processes or threads) are being spawned and exiting almost immediately.
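
One way to check for this without catching the short-lived tasks themselves
is to watch the system-wide creation counter.  A small sketch, assuming a
Linux /proc/stat with a "processes" line (the count of processes and
threads created since boot); the function names are invented for this
example.

```python
import time


def parse_fork_count(proc_stat_text):
    """Return the 'processes' counter from the text of /proc/stat."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "processes":
            return int(fields[1])
    raise ValueError("no 'processes' line found")


def forks_per_second(interval=1.0, path="/proc/stat"):
    """Sample the counter twice, `interval` seconds apart, and return the rate."""
    def read_count():
        with open(path) as f:
            return parse_fork_count(f.read())

    before = read_count()
    time.sleep(interval)
    after = read_count()
    return (after - before) / interval
```

A rate of a few per second on an otherwise idle machine would argue against
this explanation; hundreds or thousands per second would support it.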
Reply Kurtis 6/1/2005 5:26:45 AM

>> the load average is just over 1.  But none of the listed
>> processes shows any significant %CPU
> may be due to short lived processes

Thanks Kurtis, that's possible.  But I don't think it's the problem in
my case, because the process IDs allocated to new processes are only
increasing at a sensible rate.  If there were short-lived processes
I'd see big gaps between the PIDs of other processes.

--Phil.

Reply phil_gg04 6/1/2005 11:36:56 AM
