futex high sys cpu utilization

  • Follow


Hi,=20

In my c++ (Linux) application I am seeing high CPU utilization (almost 90%)=
 and it that sys-cpu:user-cpu ratio is 8:2. I then tried to run =93strace=
=94 command and found that around 80% of time is being spent in =93futex=94=
 system call, please see snippet of strace here:=20

% time     seconds  usecs/call     calls    errors syscall=20
------ ----------- ----------- --------- --------- ----------------=20
 80.66  100.007743        1567     63806     22761 futex=20
  9.15   11.340274       96925       117         4 restart_syscall=20
  5.60    6.939598        4401      1577           poll=20

Initially I though this is because of pthread_lock, unlock, condition timed=
wait, condition wait, etc. system calls I am using in the application, to p=
rove that I have overridden pthread_lock, unlock, condition timedwait, cond=
ition wait, unlock, etc. using LD_PRELOAD functionality and concluded that =
not many threads are waiting here (mutex->__data.__nusers remains less than=
 5 most of the time), so it doesn=92t seem that these sys calls are part of=
 the problem.=20

Now I am running out of clue about how to find out code location/system cal=
l which is causing this futex high utilization, any help on this would be a=
ppreciated.=20

Thank you,=20
Jaydeep
0
Reply chovatia.jaydeep (17) 7/17/2012 5:20:20 AM

Jaydeep Chovatia , dans le message
<f87be638-dccd-4a9d-a092-ef9ae92e0290@googlegroups.com>, a écrit :
> In my c++ (Linux) application I am seeing high CPU utilization (almost
> 90%) and it that sys-cpu:user-cpu ratio is 8:2. I then tried to run
> “strace” command and found that around 80% of time is being spent in
> “futex” system call, please see snippet of strace here: 

If your uptime is more than 17 days, try:

sudo date -s "$(date)"

(or equivalent) and see if it fixes things. This is not a joke: there was a
leap second at the end of last month, and a strange bug in the NTP handling
causes futexes to expire prematurely. This exact command prevented Iceweasel
from eating all my CPU.
0
Reply george54 (194) 7/17/2012 8:03:29 AM


Jaydeep Chovatia <chovatia.jaydeep@gmail.com> writes:
> In my c++ (Linux) application I am seeing high CPU utilization
> (almost 90%) and it that sys-cpu:user-cpu ratio is 8:2. I then tried
> to run “strace” command and found that around 80% of time is being
> spent in “futex” system call, please see snippet of strace here:
>
> % time     seconds  usecs/call     calls    errors syscall 
> ------ ----------- ----------- --------- --------- ---------------- 
>  80.66  100.007743        1567     63806     22761 futex 
>   9.15   11.340274       96925       117         4 restart_syscall 
>   5.60    6.939598        4401      1577           poll 
>
> Initially I though this is because of pthread_lock, unlock,
> condition timedwait, condition wait, etc. system calls I am using in
> the application, to prove that I have overridden pthread_lock,
> unlock, condition timedwait, condition wait, unlock, etc. using
> LD_PRELOAD functionality and concluded that not many threads are
> waiting here (mutex->__data.__nusers remains less than 5 most of the
> time), so it doesn’t seem that these sys calls are part of the
> problem.

Wild guess: Try running date -s "`date`". If this helps, you are
probably the last person on earth to notice that the Linux leap second
handling code is/was broken ...
0
Reply rweikusat (2678) 7/17/2012 10:55:41 AM

On Tuesday, 17 July 2012 03:55:41 UTC-7, Rainer Weikusat  wrote:
> Jaydeep Chovatia &lt;chovatia.jaydeep@gmail.com&gt; writes:
> &gt; In my c++ (Linux) application I am seeing high CPU utilization
> &gt; (almost 90%) and it that sys-cpu:user-cpu ratio is 8:2. I then tried
> &gt; to run =93strace=94 command and found that around 80% of time is bei=
ng
> &gt; spent in =93futex=94 system call, please see snippet of strace here:
> &gt;
> &gt; % time     seconds  usecs/call     calls    errors syscall=20
> &gt; ------ ----------- ----------- --------- --------- ----------------=
=20
> &gt;  80.66  100.007743        1567     63806     22761 futex=20
> &gt;   9.15   11.340274       96925       117         4 restart_syscall=
=20
> &gt;   5.60    6.939598        4401      1577           poll=20
> &gt;
> &gt; Initially I though this is because of pthread_lock, unlock,
> &gt; condition timedwait, condition wait, etc. system calls I am using in
> &gt; the application, to prove that I have overridden pthread_lock,
> &gt; unlock, condition timedwait, condition wait, unlock, etc. using
> &gt; LD_PRELOAD functionality and concluded that not many threads are
> &gt; waiting here (mutex-&gt;__data.__nusers remains less than 5 most of =
the
> &gt; time), so it doesn=92t seem that these sys calls are part of the
> &gt; problem.
>=20
> Wild guess: Try running date -s &quot;`date`&quot;. If this helps, you ar=
e
> probably the last person on earth to notice that the Linux leap second
> handling code is/was broken ...

Thanks for your prompt response.
I had already tried the leap second solution but after that also i see this=
 problem.

Thank you,
Jaydeep
0
Reply chovatia.jaydeep (17) 7/17/2012 3:46:58 PM

Nicolas George <nicolas$george@salle-s.org> wrote:
> Jaydeep Chovatia , dans le message
> <f87be638-dccd-4a9d-a092-ef9ae92e0290@googlegroups.com>, a écrit :
> > In my c++ (Linux) application I am seeing high CPU utilization (almost
> > 90%) and it that sys-cpu:user-cpu ratio is 8:2. I then tried to run
> > “strace” command and found that around 80% of time is being spent in
> > “futex” system call, please see snippet of strace here: 

> If your uptime is more than 17 days, try:

> sudo date -s "$(date)"

> (or equivalent) and see if it fixes things. This is not a joke: there was a
> leap second at the end of last month, and a strange bug in the NTP handling
> causes futexes to expire prematurely. This exact command prevented Iceweasel
> from eating all my CPU.

IIRC, the bug only causes this particular behavior when using sub-second
timeouts on a futex. The OP might also want to reconsider why he's using
sub-second timeouts in a loop.
0
Reply William 7/17/2012 6:09:51 PM

William Ahern <william@wilbur.25thandClement.com> writes:
> Nicolas George <nicolas$george@salle-s.org> wrote:
>> Jaydeep Chovatia , dans le message
>> <f87be638-dccd-4a9d-a092-ef9ae92e0290@googlegroups.com>, a écrit :
>> > In my c++ (Linux) application I am seeing high CPU utilization (almost
>> > 90%) and it that sys-cpu:user-cpu ratio is 8:2. I then tried to run
>> > “strace” command and found that around 80% of time is being spent in
>> > “futex” system call, please see snippet of strace here: 
>
>> If your uptime is more than 17 days, try:
>
>> sudo date -s "$(date)"
>
>> (or equivalent) and see if it fixes things. This is not a joke: there was a
>> leap second at the end of last month, and a strange bug in the NTP handling
>> causes futexes to expire prematurely. This exact command prevented Iceweasel
>> from eating all my CPU.
>
> IIRC, the bug only causes this particular behavior when using sub-second
> timeouts on a futex. The OP might also want to reconsider why he's using
> sub-second timeouts in a loop.

This affects anything using a subsecond timeout together with
CLOCK_REALTIME hrtimer and the reason is these hrtimers maintain a
redundant copy of 'the current time' which is not updated from the
leap second insertion code because Thomas Gleixner killed the call
that was supposed to do this in 2007
(746976a301ac9c9aa10d7d42454f8d6cdad8ff2b).

In case this is of interest to someone: I could offer a working patch
for that (used in production) for anything which predates the
6b43ae8a619d17c4935c3320d2ef9e92bdeed05d kernel/time/ntp.c commit
(Thu Mar 15 13:04:03 2012 -0700, meaning, it uses the hrtimer-based
leap-second handling before John Stultz introduced a locking order
problem into that by splitting a certain lock into two locks).

The patch itself is against 2.6.36.4. It my introduce blasphemous
whitespaces. It was tested with the 'John Stultz' test case program.

Index: linux-2-6/kernel/time/ntp.c
===================================================================
RCS file: /sysdata/cvs/linux-2-6/kernel/time/ntp.c,v
retrieving revision 1.6
retrieving revision 1.7
diff -u -r1.6 -r1.7
--- linux-2-6/kernel/time/ntp.c	19 Aug 2010 17:04:51 -0000	1.6
+++ linux-2-6/kernel/time/ntp.c	4 Jul 2012 15:35:33 -0000	1.7
@@ -184,10 +184,21 @@
  * day, the system clock is set back one second; if in leap-delete
  * state, the system clock is set ahead one second.
  */
+static void call_clock_was_set(unsigned long unused)
+{
+	(void)unused;
+	clock_was_set();
+}
+
 static enum hrtimer_restart ntp_leap_second(struct hrtimer *timer)
 {
+	static struct timer_list
+		call_clock_was_set_timer = TIMER_INITIALIZER(call_clock_was_set, 0, 0);
+	
 	enum hrtimer_restart res = HRTIMER_NORESTART;
+	int leap;
 
+	leap = 0;
 	write_seqlock(&xtime_lock);
 
 	switch (time_state) {
@@ -196,17 +207,17 @@
 	case TIME_INS:
 		timekeeping_leap_insert(-1);
 		time_state = TIME_OOP;
-		printk(KERN_NOTICE
-			"Clock: inserting leap second 23:59:60 UTC\n");
 		hrtimer_add_expires_ns(&leap_timer, NSEC_PER_SEC);
 		res = HRTIMER_RESTART;
+
+		leap = -1;
 		break;
 	case TIME_DEL:
 		timekeeping_leap_insert(1);
 		time_tai--;
 		time_state = TIME_WAIT;
-		printk(KERN_NOTICE
-			"Clock: deleting leap second 23:59:59 UTC\n");
+
+		leap = 1;
 		break;
 	case TIME_OOP:
 		time_tai++;
@@ -219,6 +230,15 @@
 	}
 
 	write_sequnlock(&xtime_lock);
+	
+	if (leap) {
+	    mod_timer(&call_clock_was_set_timer, jiffies);
+	    
+	    printk(KERN_NOTICE
+		    "Clock: %s leap second 23:59:%u UTC\n",
+		    leap == -1 ? "inserting" : "deleting",
+		    leap == -1 ? 60 : 59);
+	}
 
 	return res;
 }
0
Reply rweikusat (2678) 7/17/2012 6:30:55 PM

On Mon, 16 Jul 2012, Jaydeep Chovatia wrote:

> Initially I though this is because of pthread_lock, unlock, condition 
> timedwait, condition wait, etc. system calls I am using in the 
> application, to prove that I have overridden pthread_lock, unlock, 
> condition timedwait, condition wait, unlock, etc. using LD_PRELOAD 
> functionality and concluded that not many threads are waiting here 
> (mutex->__data.__nusers remains less than 5 most of the time), so it 
> doesn?t seem that these sys calls are part of the problem.

I think what you describe might amount to high futex contention. ISTR 
NPTL/futexen are optimized for the uncontended case (= no one to wait for 
when acquiring, no one to wake when releasing), and they are purposely (or 
at least knowingly) heavy-weight in the contended case.

.... The references I can manage now are 
<http://man7.org/linux/man-pages/man7/futex.7.html> and 
<http://kerneltrap.org/node/422> (search the latter for "the main 
application cost is the lost paralellism" for the argument).

I assume __nusers is the number of blocked threads. "Less than 5 most of 
the time" accomodates "more than 3 most of the time", which could mean the 
pthread mutex operations take a trip to the kernel too very frequently.

This could be a good question for comp.programming.threads. Maybe look 
around in <http://www.1024cores.net/home/lock-free-algorithms>.

Laszlo
0
Reply lacos2 (200) 7/17/2012 9:08:55 PM

On Tue, 17 Jul 2012, Rainer Weikusat wrote:

> This affects anything using a subsecond timeout together with 
> CLOCK_REALTIME hrtimer and the reason is these hrtimers maintain a 
> redundant copy of 'the current time' which is not updated from the leap 
> second insertion code because Thomas Gleixner killed the call that was 
> supposed to do this in 2007 (746976a301ac9c9aa10d7d42454f8d6cdad8ff2b).

Thanks for the reference.

<troll>

"Quite a few people consider me to be one of the Grumpy Old Men. That's 
related to my age and the age-related unwillingness to cope with crap."

https://www.linux.com/news/special-feature/linux-developers/591039-30-linux-kernel-developers-in-30-weeks-thomas-gleixner

If he thinks he needs an excuse, he's gonna need a better one.

</troll>

Laszlo
0
Reply lacos2 (200) 7/17/2012 9:16:48 PM

"Ersek, Laszlo" <lacos@caesar.elte.hu> writes:
> On Tue, 17 Jul 2012, Rainer Weikusat wrote:
>> This affects anything using a subsecond timeout together with
>> CLOCK_REALTIME hrtimer and the reason is these hrtimers maintain a
>> redundant copy of 'the current time' which is not updated from the
>> leap second insertion code because Thomas Gleixner killed the call
>> that was supposed to do this in 2007
>> (746976a301ac9c9aa10d7d42454f8d6cdad8ff2b).

[...]

> "Quite a few people consider me to be one of the Grumpy Old
> Men. That's related to my age and the age-related unwillingness to
> cope with crap."
>
> https://www.linux.com/news/special-feature/linux-developers/591039-30-linux-kernel-developers-in-30-weeks-thomas-gleixner
>
> If he thinks he needs an excuse, he's gonna need a better one.

I don't think so. Even the smartest guy will occasionally make silly
mistakes. Because of this, there should be some kind of 'code review
process' by 'other smart guys' to catch these. And the problem is that
this doesn't exist, not the least because critcizing "I'm the core
maintainer of this, that and something else" guys is a really unwise
thing to do for anyone lower in the hierarchy with any ambitions
beyond "My problem is fixed. I'm happy".
0
Reply rweikusat (2678) 7/18/2012 8:18:45 PM

Rainer Weikusat <rweikusat@mssgmbh.com> wrote:
> "Ersek, Laszlo" <lacos@caesar.elte.hu> writes:
> > On Tue, 17 Jul 2012, Rainer Weikusat wrote:
> >> This affects anything using a subsecond timeout together with
> >> CLOCK_REALTIME hrtimer and the reason is these hrtimers maintain a
> >> redundant copy of 'the current time' which is not updated from the
> >> leap second insertion code because Thomas Gleixner killed the call
> >> that was supposed to do this in 2007
> >> (746976a301ac9c9aa10d7d42454f8d6cdad8ff2b).

> [...]

> > "Quite a few people consider me to be one of the Grumpy Old
> > Men. That's related to my age and the age-related unwillingness to
> > cope with crap."
> >
> > https://www.linux.com/news/special-feature/linux-developers/591039-30-linux-kernel-developers-in-30-weeks-thomas-gleixner
> >
> > If he thinks he needs an excuse, he's gonna need a better one.

> I don't think so. Even the smartest guy will occasionally make silly
> mistakes. Because of this, there should be some kind of 'code review
> process' by 'other smart guys' to catch these. And the problem is that
> this doesn't exist, not the least because critcizing "I'm the core
> maintainer of this, that and something else" guys is a really unwise
> thing to do for anyone lower in the hierarchy with any ambitions
> beyond "My problem is fixed. I'm happy".

Code review probably would never have caught this. Catching this bug would
have required a thorough understanding of the timer infrastructure
(especially the niche cases of tick adjustments). I mean, the reviewer would
have had to foresee this kind of error (a missing line in some far flung
module) and know to look for it. Very few other developers would have had
this expertise. It's an unintended consequence of specialization; nobody
around to fix the other guy's mistakes.

IME code review is usually more a waste of time than anything. You usually
need someone as experienced or more experienced than the author wrt to the
subject code. But that means the cost of the review is quite high; higher
than most people are willing to regularly accept. Afterall, there's a reason
the other experienced engineer is off doing something else.

I'm not saying it can't be done systematically. It just requires a
considerable investment in time. Maybe the best way to handle code reviews
is to pair programmers together. Each generally goes about his own business,
but part of his responsibility is to keep track of what the other guy is
doing (keeping his opinions to himself, lest egos get in the way). When he's
finally tasked with reviewing some particuarly hairy bit of code he can come
up to speed immediately. Sort of like a President and Vice-President
situation. The VP keeps his mouth shut, and in return the President keeps
him apprised of everything that's going on so the VP can immediately step
into his shoes when required.
0
Reply William 7/18/2012 9:41:37 PM

William Ahern <william@wilbur.25thandClement.com> writes:
> Rainer Weikusat <rweikusat@mssgmbh.com> wrote:
>> "Ersek, Laszlo" <lacos@caesar.elte.hu> writes:
>> > On Tue, 17 Jul 2012, Rainer Weikusat wrote:
>> >> This affects anything using a subsecond timeout together with
>> >> CLOCK_REALTIME hrtimer and the reason is these hrtimers maintain a
>> >> redundant copy of 'the current time' which is not updated from the
>> >> leap second insertion code because Thomas Gleixner killed the call
>> >> that was supposed to do this in 2007
>> >> (746976a301ac9c9aa10d7d42454f8d6cdad8ff2b).
>
>> [...]
>
>> > "Quite a few people consider me to be one of the Grumpy Old
>> > Men. That's related to my age and the age-related unwillingness to
>> > cope with crap."
>> >
>> > https://www.linux.com/news/special-feature/linux-developers/591039-30-linux-kernel-developers-in-30-weeks-thomas-gleixner
>> >
>> > If he thinks he needs an excuse, he's gonna need a better one.
>
>> I don't think so. Even the smartest guy will occasionally make silly
>> mistakes. Because of this, there should be some kind of 'code review
>> process' by 'other smart guys' to catch these. And the problem is that
>> this doesn't exist, not the least because critcizing "I'm the core
>> maintainer of this, that and something else" guys is a really unwise
>> thing to do for anyone lower in the hierarchy with any ambitions
>> beyond "My problem is fixed. I'm happy".
>
> Code review probably would never have caught this. Catching this bug would
> have required a thorough understanding of the timer infrastructure
> (especially the niche cases of tick adjustments).

Not at all. Employing some common sense had been entirely sufficient:
If clock_was_set is really useless when the clock is being set because
of a leap second, why isn't it also useless when the clock is being
set via settimeofday? 

And these are the kind of questions someone who is not immersed in the
details of a specific problem is likely to ask: Granted, this is a
tree. Coming to think of it, a lot of trees seem to be all around
here. Could we be in a forest?
0
Reply rweikusat (2678) 7/19/2012 4:12:24 PM

10 Replies
114 Views

(page loaded in 1.806 seconds)

Similiar Articles:













7/17/2012 6:56:58 PM


Reply: