threads and exit() woes

Hi,

   I have to deal with a multi-threaded program that has, as
one of its threads, a "watchdog thread" which, when it doesn't
notice some variable getting set within a certain time,
is supposed to stop the whole program (at any cost, no
worries about data loss). It attempts to shut down the
program by calling exit(). Now, all the references I have
consulted (TLPI, APUE 3rd ed., etc.) claim that when one
of the threads calls exit() the program will be ended. A
look at SUSv4 just mentions in addition that the end of
the program might be delayed if there are outstanding
asynchronous I/O operations that can't be cancelled
(which I don't think I have).

This did work with a 3.4 Linux kernel. But after switching
to a 4.4 kernel it suddenly doesn't work reliably anymore.
If it fails, one thread seems to run amok, using about 50%
of the CPU time, the other 50% being used by ksoftirqd. The
whole thing can't be stopped in any way (not even with 'kill
-SIGKILL'). I've also tried replacing the exit() call with
kill(getpid(), SIGKILL), but with no luck. Attaching
with gdb fails as well (it hangs indefinitely). Looks like a
real zombie: dead and very active at the same time :-(

Does that ring a bell with any of you? One of the threads
is rather likely to make a lot of epoll() calls.

Please keep in mind that I can't simply change the whole
architecture - this is an embedded system already out in
the field, and my role in this is to get a new kernel
version to work, not upset a more or less working application
(unless I can come up with very convincing arguments ;-)

                       Best regards, Jens
-- 
  \   Jens Thoms Toerring  ___      jt@toerring.de
   \__________________________      http://toerring.de
jt
12/12/2016 11:03:08 PM
comp.unix.programmer

Jens Thoms Toerring <jt@toerring.de> wrote:
> Hi,
> 
>    I've to deal with a multi-threaded program that has, as
> one of its threads a "watchdog thread" that, when it doesn't
> notice some variable getting set within a certain time,
> is supposed to stop the whole program (at any cost, no
> worries about data lost). It does attempt to shut down the
> program by calling exit().
<snip>
> This did work with a 3.4 Linux kernel. But after switching
> to a 4.4 kernel it suddenly doesn't work reliably anymore.
> If it fails one thread seems to run amok, using about 50%
> of the CPU time, the other 50% being used by ksoftirqd. The
> whole thing can't be stopped in any way (not even with 'kill
> -SIGKILL'). I've also tried to replace the exit() call with
> a kill(getpid(), SIGKILL) but also with no luck. Attaching
> with gdb fails as well (hangs indefinitely). Looks like a
> real zombie: dead and very active at the same time:-(

A shot in the dark: is the application using robust mutexes? That's the
first thing that comes to mind. Robust mutexes require the kernel, when
destroying a thread, to walk a userspace linked-list data structure.

william
12/12/2016 11:35:59 PM
william@wilbur.25thandclement.com wrote:
> Jens Thoms Toerring <jt@toerring.de> wrote:
> > Hi,
> > 
> >    I've to deal with a multi-threaded program that has, as
> > one of its threads a "watchdog thread" that, when it doesn't
> > notice some variable getting set within a certain time,
> > is supposed to stop the whole program (at any cost, no
> > worries about data lost). It does attempt to shut down the
> > program by calling exit().
> <snip>
> > This did work with a 3.4 Linux kernel. But after switching
> > to a 4.4 kernel it suddenly doesn't work reliably anymore.
> > If it fails one thread seems to run amok, using about 50%
> > of the CPU time, the other 50% being used by ksoftirqd. The
> > whole thing can't be stopped in any way (not even with 'kill
> > -SIGKILL'). I've also tried to replace the exit() call with
> > a kill(getpid(), SIGKILL) but also with no luck. Attaching
> > with gdb fails as well (hangs indefinitely). Looks like a
> > real zombie: dead and very active at the same time:-(

> A shot in the dark: is the application using robust mutexes? That's the
> first thing that comes to mind. Robust mutexes require the kernel, when
> destroying a thread, to walk a userspace linked-list data structure.

Unfortunately, I can't say (and the term "robust mutex" was new
to me, admittedly). There are several libraries involved that
create their own threads (libevent, libusb etc.) about which I
can't say much. The rest of the threads in the application itself
usually use pipes for basic communication, apart from very simple
boolean values, defined as volatile sig_atomic_t, for certain
state information. But as far as I can see (and this may change
as I get around to delving deeper into the application) there are
no mutex locks that might lead to some kind of deadlock. But
then it's 150 kloc of code I'm not too familiar with... I'll
definitely look at this aspect!

Could something like that keep a program alive that sends
itself a SIGKILL (or calls exit() or _exit())? Those are all
things I've tried. The only result was that the chance of it
getting stuck in that strange busy, non-killable state seemed
to change (and each test run until the problem appears can take
an hour or more, making things somewhat annoying ;-)

                Thank you and best regards, Jens
-- 
  \   Jens Thoms Toerring  ___      jt@toerring.de
   \__________________________      http://toerring.de
jt
12/13/2016 12:16:48 AM
On Monday December 12 2016 18:35, in
comp.unix.programmer, "william@wilbur.25thandClement.com"
<william@wilbur.25thandClement.com> wrote:

> Jens Thoms Toerring <jt@toerring.de> wrote:
>> Hi,
>> 
>>    I've to deal with a multi-threaded program that has, as
>> one of its threads a "watchdog thread" that, when it doesn't
>> notice some variable getting set within a certain time,
>> is supposed to stop the whole program (at any cost, no
>> worries about data lost). It does attempt to shut down the
>> program by calling exit().
> <snip>
>> This did work with a 3.4 Linux kernel. But after switching
>> to a 4.4 kernel it suddenly doesn't work reliably anymore.
>> If it fails one thread seems to run amok, using about 50%
>> of the CPU time, the other 50% being used by ksoftirqd. The
>> whole thing can't be stopped in any way (not even with 'kill
>> -SIGKILL'). I've also tried to replace the exit() call with
>> a kill(getpid(), SIGKILL) but also with no luck. Attaching
>> with gdb fails as well (hangs indefinitely). Looks like a
>> real zombie: dead and very active at the same time:-(
> 
> A shot in the dark: is the application using robust mutexes? That's the
> first thing that comes to mind. Robust mutexes require the kernel, when
> destroying a thread, to walk a userspace linked-list data structure.

Another shot in the dark:
Did the C runtime library (glibc or local equivalent) change? If so, was it
compiled so as to use the exit_group(2) syscall in the exit(3) function?

According to various Linux kernel docs, since the introduction of NPTL,
exit(2) only terminates the calling thread, leaving all other threads in
the "process" active. To terminate /all/ threads at once, use exit_group(2).
Since glibc v2.3, the exit(3) call has invoked exit_group(2) instead of
exit(2). Perhaps your newer version of the runtime library has reverted
to calling exit(2).


-- 
Lew Pitcher
"In Skills, We Trust"
PGP public key available upon request

Lew
12/13/2016 3:42:21 AM
Jens Thoms Toerring <jt@toerring.de> wrote:
> william@wilbur.25thandclement.com wrote:
>> Jens Thoms Toerring <jt@toerring.de> wrote:
>> > Hi,
>> > 
>> >    I've to deal with a multi-threaded program that has, as
>> > one of its threads a "watchdog thread" that, when it doesn't
>> > notice some variable getting set within a certain time,
>> > is supposed to stop the whole program (at any cost, no
>> > worries about data lost). It does attempt to shut down the
>> > program by calling exit().
>> <snip>
>> > This did work with a 3.4 Linux kernel. But after switching
>> > to a 4.4 kernel it suddenly doesn't work reliably anymore.
>> > If it fails one thread seems to run amok, using about 50%
>> > of the CPU time, the other 50% being used by ksoftirqd. The
>> > whole thing can't be stopped in any way (not even with 'kill
>> > -SIGKILL'). I've also tried to replace the exit() call with
>> > a kill(getpid(), SIGKILL) but also with no luck. Attaching
>> > with gdb fails as well (hangs indefinitely). Looks like a
>> > real zombie: dead and very active at the same time:-(
> 
>> A shot in the dark: is the application using robust mutexes? That's the
>> first thing that comes to mind. Robust mutexes require the kernel, when
>> destroying a thread, to walk a userspace linked-list data structure.
> 
> Unfortunately, I can't say (and the term "robust mutex" was new
> to me, admittedly). There are several libraries involved that
> create their own threads (libevent, libusb etc.) about which I
> can't say much.

USB, embedded... I switch my vote to a USB driver issue ;)

<snip>
> Could something like that keep a program alive that sends it-
> self a SIGKILL (or does exit() or _exit())? That are all things
> I've tried. The only result was that the chance that it got
> stuck in that strange busy, non-killable state seemed to change
> (and each test runs until the problem appears can take an hour
> and more, making things somewhat annoying;-)

Theoretically the kernel shouldn't have a problem even if the linked list
is corrupted or if any of the memory it points to has weird permissions.
However, the Linux kernel is quite complex and has more than its fair share
of bugs.

The ksoftirqd load made me think of some kind of pathological page-faulting
behavior occurring from kernel context as it tears the process down (see
exit_robust_list in kernel/futex.c). But I don't even know if ksoftirqd
handles page faults at all.

Don't put much stock in my comments. I haven't personally run into issues
with robust mutexes, beyond bugs in glibc[1]. The fact that locking doesn't
stand out to you would make me look elsewhere.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=12683

william
12/13/2016 5:31:02 AM
On Mon, 2016-12-12, Jens Thoms Toerring wrote:
> Hi,
>
>    I've to deal with a multi-threaded program that has, as
> one of its threads a "watchdog thread" that, when it doesn't
> notice some variable getting set within a certain time,
> is supposed to stop the whole program (at any cost, no
> worries about data lost). It does attempt to shut down the
> program by calling exit(). Now, all the references I have
> consulted (TLPI, APUE 3rd ed. etc.) all claim that when one
> of the threads calls exit() the program will be ended. A
> look at SUSv4 just mentions in addition that the end of
> the program might be delayed if there are outstanding
> asynchronuous I/O operations that can't be cancelled
> (nothing I guess I'm having).
>
> This did work with a 3.4 Linux kernel. But after switching
> to a 4.4 kernel it suddenly doesn't work reliably anymore.
> If it fails one thread seems to run amok, using about 50%
> of the CPU time, the other 50% being used by ksoftirqd. The
> whole thing can't be stopped in any way (not even with 'kill
> -SIGKILL'). I've also tried to replace the exit() call with
> a kill(getpid(), SIGKILL) but also with no luck. Attaching
> with gdb fails as well (hangs indefinitely). Looks like a
> real zombie: dead and very active at the same time:-(
>
> Does that ring a bell with anyone of you? One of the threads
> is rather likely to do a lot of epoll() calls.
>
> Please keep in mind that I can't simply change the whole
> architecture - this is an embedded system already out in
> the field, and my role in this is to get a new kernel ver-
> sion to work, not upset a more or less working application
> (unless I can come up with very convincing arguments;-)

Apart from what the others wrote:

- Can you use strace or pstack or something to find out what that
  remaining thread is doing?  Even looking in /proc can be useful.

- Keep in mind that exit() does things before exiting, e.g. running
  exit handlers.

Also shots in the dark ...

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .     .
\X/     snipabacken.se>   O  o   .
Jorgen
12/13/2016 6:44:58 AM
On Tue, 2016-12-13, Lew Pitcher wrote:
> On Monday December 12 2016 18:35, in
> comp.unix.programmer, "william@wilbur.25thandClement.com"
> <william@wilbur.25thandClement.com> wrote:
>
>> Jens Thoms Toerring <jt@toerring.de> wrote:
>>> Hi,
>>> 
>>>    I've to deal with a multi-threaded program that has, as
>>> one of its threads a "watchdog thread" that, when it doesn't
>>> notice some variable getting set within a certain time,
>>> is supposed to stop the whole program (at any cost, no
>>> worries about data lost). It does attempt to shut down the
>>> program by calling exit().
>> <snip>
>>> This did work with a 3.4 Linux kernel. But after switching
>>> to a 4.4 kernel it suddenly doesn't work reliably anymore.
>>> If it fails one thread seems to run amok, using about 50%
>>> of the CPU time, the other 50% being used by ksoftirqd. The
>>> whole thing can't be stopped in any way (not even with 'kill
>>> -SIGKILL'). I've also tried to replace the exit() call with
>>> a kill(getpid(), SIGKILL) but also with no luck. Attaching
>>> with gdb fails as well (hangs indefinitely). Looks like a
>>> real zombie: dead and very active at the same time:-(
>> 
>> A shot in the dark: is the application using robust mutexes? That's the
>> first thing that comes to mind. Robust mutexes require the kernel, when
>> destroying a thread, to walk a userspace linked-list data structure.
>
> Another shot in the dark:
> Did the C runtime library (glibc or local equivalent) change? If so, was it
> compiled so as to use the exit_group(2) syscall in the exit(3) function?
>
> According to various Linux kernel docs, since the introduction of NPTL,
> exit(2) only terminates the calling thread, leaving all other threads in
> the "process" active. To terminate /all/ threads at once, use exit_group(2).
> Since glibc v2.3, the exit(3) call has invoked exit_group(2) instead of
> exit(2).

This also seems to be documented in _exit(2).  (Note the underscore.)

> Perhaps your newer version of the runtime library has reverted back
> to calling exit(2).

Also, perhaps Jens' team has broken exit() while porting.  Since it's
embedded I suppose they (or a third party) provide the OS.  From your
description, this seems easy to get wrong.

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .     .
\X/     snipabacken.se>   O  o   .
Jorgen
12/13/2016 6:49:29 AM
jt@toerring.de (Jens Thoms Toerring) writes:

[terminate program via exit run by watchdog thread]

> This did work with a 3.4 Linux kernel. But after switching
> to a 4.4 kernel it suddenly doesn't work reliably anymore.
> If it fails one thread seems to run amok, using about 50%
> of the CPU time, the other 50% being used by ksoftirqd. The
> whole thing can't be stopped in any way (not even with 'kill
> -SIGKILL').

This suggests that the thread is in a D state (uninterruptible sleep)
which persists for some reason. Trying to determine what it's doing in
the kernel (e.g. strace, /proc/<pid>/wchan) might be useful.
Rainer
12/13/2016 4:06:52 PM
On 13.12.16 00.03, Jens Thoms Toerring wrote:
> This did work with a 3.4 Linux kernel. But after switching
> to a 4.4 kernel it suddenly doesn't work reliably anymore.
> If it fails one thread seems to run amok, using about 50%
> of the CPU time, the other 50% being used by ksoftirqd. The
> whole thing can't be stopped in any way (not even with 'kill
> -SIGKILL'). I've also tried to replace the exit() call with
> a kill(getpid(), SIGKILL) but also with no luck. Attaching
> with gdb fails as well (hangs indefinitely). Looks like a
> real zombie: dead and very active at the same time:-(

Probably an exit handler does unexpected things. This could be part of
the C runtime as well as part of a library you use, or even your own code.

Maybe shutting down your program this way runs into badly tested code
paths with some race conditions.

Try abort(), which does not invoke the exit handlers.

> Does that ring a bell with anyone of you? One of the threads
> is rather likely to do a lot of epoll() calls.

Definitely I/O. It should check for the exit condition before invoking
another I/O operation. The Linux kernel behaves quite badly when killing
processes with outstanding I/O. Requests like that are simply ignored.


Marcel
Marcel
12/13/2016 5:34:23 PM
Marcel Mueller <news.5.maazl@spamgourmet.org> writes:
>On 13.12.16 00.03, Jens Thoms Toerring wrote:
>> This did work with a 3.4 Linux kernel. But after switching
>> to a 4.4 kernel it suddenly doesn't work reliably anymore.
>> If it fails one thread seems to run amok, using about 50%
>> of the CPU time, the other 50% being used by ksoftirqd. The
>> whole thing can't be stopped in any way (not even with 'kill
>> -SIGKILL'). I've also tried to replace the exit() call with
>> a kill(getpid(), SIGKILL) but also with no luck. Attaching
>> with gdb fails as well (hangs indefinitely). Looks like a
>> real zombie: dead and very active at the same time:-(
>
>Probably an exit handler does unexpected things. This could be part of 
>the C runtime as well as part of a used library or even your code.
>
>Maybe shutting down your program this way runs into badly tested code 
>paths with some race conditions.
>
>Try abort() which does not invoke that much exit handlers.
>
>> Does that ring a bell with anyone of you? One of the threads
>> is rather likely to do a lot of epoll() calls.
>
>Definitely I/O. It should check for the exit condition before invoking 
>another I/O. The Linux kernel behaves quite bad when killing processes 
>with outstanding I/O. Request like that are simply ignored.
>

If SIGKILL doesn't kill the process, you have a kernel bug.
scott
12/13/2016 6:13:40 PM
On Tuesday December 13 2016 13:13, in comp.unix.programmer, "Scott Lurndal"
<scott@slp53.sl.home> wrote:

> Marcel Mueller <news.5.maazl@spamgourmet.org> writes:
>>On 13.12.16 00.03, Jens Thoms Toerring wrote:
>>> This did work with a 3.4 Linux kernel. But after switching
>>> to a 4.4 kernel it suddenly doesn't work reliably anymore.
>>> If it fails one thread seems to run amok, using about 50%
>>> of the CPU time, the other 50% being used by ksoftirqd. The
>>> whole thing can't be stopped in any way (not even with 'kill
>>> -SIGKILL'). I've also tried to replace the exit() call with
>>> a kill(getpid(), SIGKILL) but also with no luck. Attaching
>>> with gdb fails as well (hangs indefinitely). Looks like a
>>> real zombie: dead and very active at the same time:-(
>>
>>Probably an exit handler does unexpected things. This could be part of
>>the C runtime as well as part of a used library or even your code.
>>
>>Maybe shutting down your program this way runs into badly tested code
>>paths with some race conditions.
>>
>>Try abort() which does not invoke that much exit handlers.
>>
>>> Does that ring a bell with anyone of you? One of the threads
>>> is rather likely to do a lot of epoll() calls.
>>
>>Definitely I/O. It should check for the exit condition before invoking
>>another I/O. The Linux kernel behaves quite bad when killing processes
>>with outstanding I/O. Request like that are simply ignored.
>>
> 
> If SIGKILL doesn't kill the process, you've a kernel bug.

Even with a non-buggy kernel, SIGKILL won't terminate a zombie process, nor a
process stuck in the "uninterruptible sleep" state.

It would be helpful to see the state of the hung thread, as reported by ps or
some other tool.

-- 
Lew Pitcher
"In Skills, We Trust"
PGP public key available upon request

Lew
12/13/2016 6:27:48 PM
Lew Pitcher <lew.pitcher@digitalfreehold.ca> writes:
>On Tuesday December 13 2016 13:13, in comp.unix.programmer, "Scott Lurndal"
><scott@slp53.sl.home> wrote:
>
>> Marcel Mueller <news.5.maazl@spamgourmet.org> writes:
>>>On 13.12.16 00.03, Jens Thoms Toerring wrote:
>>>> This did work with a 3.4 Linux kernel. But after switching
>>>> to a 4.4 kernel it suddenly doesn't work reliably anymore.
>>>> If it fails one thread seems to run amok, using about 50%
>>>> of the CPU time, the other 50% being used by ksoftirqd. The
>>>> whole thing can't be stopped in any way (not even with 'kill
>>>> -SIGKILL'). I've also tried to replace the exit() call with
>>>> a kill(getpid(), SIGKILL) but also with no luck. Attaching
>>>> with gdb fails as well (hangs indefinitely). Looks like a
>>>> real zombie: dead and very active at the same time:-(
>>>
>>>Probably an exit handler does unexpected things. This could be part of
>>>the C runtime as well as part of a used library or even your code.
>>>
>>>Maybe shutting down your program this way runs into badly tested code
>>>paths with some race conditions.
>>>
>>>Try abort() which does not invoke that much exit handlers.
>>>
>>>> Does that ring a bell with anyone of you? One of the threads
>>>> is rather likely to do a lot of epoll() calls.
>>>
>>>Definitely I/O. It should check for the exit condition before invoking
>>>another I/O. The Linux kernel behaves quite bad when killing processes
>>>with outstanding I/O. Request like that are simply ignored.
>>>
>> 
>> If SIGKILL doesn't kill the process, you've a kernel bug.
>
>Even with a non-buggy kernel, SIGKILL won't terminate a zombie process, nor a
>process stuck in "uninterruptable sleep" state.

A zombie no longer holds resources, with the exception of the exit status
(say, 32 bits) and the pid.

It's the parent's responsibility to reap the status.

An operating system that allows an application to enter an
"uninterruptible sleep" state is broken.

It used to be, in SVR3, that one could end up in an uninterruptible
sleep state during close(2) when the file descriptor referenced a
character special device for a parallel port (e.g. a printer) and the
printer was off-line.  Bugs like that were mostly fixed a quarter
century ago.
scott
12/13/2016 6:44:59 PM
On Tuesday December 13 2016 13:44, in comp.unix.programmer, "Scott Lurndal"
<scott@slp53.sl.home> wrote:

> Lew Pitcher <lew.pitcher@digitalfreehold.ca> writes:
>>On Tuesday December 13 2016 13:13, in comp.unix.programmer, "Scott Lurndal"
>><scott@slp53.sl.home> wrote:
>>
>>> Marcel Mueller <news.5.maazl@spamgourmet.org> writes:
>>>>On 13.12.16 00.03, Jens Thoms Toerring wrote:
>>>>> This did work with a 3.4 Linux kernel. But after switching
>>>>> to a 4.4 kernel it suddenly doesn't work reliably anymore.
>>>>> If it fails one thread seems to run amok, using about 50%
>>>>> of the CPU time, the other 50% being used by ksoftirqd. The
>>>>> whole thing can't be stopped in any way (not even with 'kill
>>>>> -SIGKILL'). I've also tried to replace the exit() call with
>>>>> a kill(getpid(), SIGKILL) but also with no luck. Attaching
>>>>> with gdb fails as well (hangs indefinitely). Looks like a
>>>>> real zombie: dead and very active at the same time:-(
>>>>
>>>>Probably an exit handler does unexpected things. This could be part of
>>>>the C runtime as well as part of a used library or even your code.
>>>>
>>>>Maybe shutting down your program this way runs into badly tested code
>>>>paths with some race conditions.
>>>>
>>>>Try abort() which does not invoke that much exit handlers.
>>>>
>>>>> Does that ring a bell with anyone of you? One of the threads
>>>>> is rather likely to do a lot of epoll() calls.
>>>>
>>>>Definitely I/O. It should check for the exit condition before invoking
>>>>another I/O. The Linux kernel behaves quite bad when killing processes
>>>>with outstanding I/O. Request like that are simply ignored.
>>>>
>>> 
>>> If SIGKILL doesn't kill the process, you've a kernel bug.
>>
>>Even with a non-buggy kernel, SIGKILL won't terminate a zombie process, nor
>>a process stuck in "uninterruptable sleep" state.
> 
> A zombie no longer holds resources, with the exception of the exit status
> (say 32-bits) and the pid.
> 
> It's the parent responsibility to reap the status.

True. It remains in the process table (and visible through ps(1)) until the
parent reaps the status, or permits init(8) to reap the status. Since the
process is already dead, it CANNOT be "killed" (terminated and removed from
the process table) by SIGKILL.

> An operating system that allows an application to enter an
> "uninterruptable sleep" state is broken.

OK. Thanks for the opinion.

However, whether or not the OS is, in your opinion, "broken", "uninterruptible
sleep" is still a permitted state. And, because the process cannot be
scheduled, it cannot receive /any/ signal, let alone SIGKILL.

> It used to be in SVR3, that one could end up in an uninterruptable
> sleep state during close(2) when the file descriptor referenced a
> character special device for a parallel port (e.g. printer) and the
> printer was off-line.  Bugs like that were mainly fixed a quarter
> century ago.


-- 
Lew Pitcher
"In Skills, We Trust"
PGP public key available upon request

Lew
12/13/2016 7:18:22 PM
On 13.12.16 19.13, Scott Lurndal wrote:
>> Definitely I/O. It should check for the exit condition before invoking
>> another I/O. The Linux kernel behaves quite bad when killing processes
>> with outstanding I/O. Request like that are simply ignored.
>
> If SIGKILL doesn't kill the process, you've a kernel bug.

Well, welcome to the real world.
A process hanging in state D is one of the most frequent causes of system
reboots. This has not changed significantly over the last 15 years, from
Debian Woody to recent Raspbian with kernel 4.4. Of course, it is not
that often that I run into serious trouble - once or twice per year or
something like that.
AFAIK there is absolutely no recovery from a process blocked in state D.
This seems to be a Linux-specific "feature".


Marcel
Marcel
12/13/2016 8:01:54 PM
Hi,

   thank you all - I'm quite overwhelmed by the number and
quality of responses! So please don't be annoyed if I don't
respond to each post in detail.

   As usual, I guess I've been looking too much at red herrings.
It doesn't seem to have been anything really related to
threads. After a lot more staring at the rather longish
output of strace I started to notice a pattern: one of
the threads got interrupted in a call to close(). This
often happened a long (relatively speaking) time before
the software watchdog tried to stop the program - and
that thread never got re-scheduled.

So I switched my attention to the serial driver (that close()
call was for the device file of one of the processor's serial
ports) and found a different version of it. And, lo and
behold, with that updated driver I haven't seen any of that
strange behaviour anymore in about 400 test runs. While
that is, of course, no proof that everything is well, it is
at least encouraging ;)

Unfortunately, the somewhat restricted tools I have at my
disposal don't tell me much about what state a process is in.
'ps' is rather terse in what it reports (no D/S/R etc., i.e.
no STAT field at all) compared to what one is used to from a
PC. But the process/thread was definitely not sleeping, nor a
zombie - it was so active that it used up about 50% of the CPU
time, and obviously somehow kept [ksoftirqd] busy as well ;-)

So, from what I can say at the moment, it was a slightly buggy
driver that, in a manner I can't tell yet, didn't close the
device file as requested and thus kept the program from
exiting. At least my belief in TLPI/APUE has been restored,
in that it most likely was a situation where exit() would
have killed all threads had a buggy driver not intervened ;-)

               Thank you all and best regards, Jens
-- 
  \   Jens Thoms Toerring  ___      jt@toerring.de
   \__________________________      http://toerring.de
jt
12/13/2016 10:32:32 PM
On Tue, 2016-12-13, Jens Thoms Toerring wrote:
> Hi,
>
>    thank you all - I'm quite overwhelmed by the number and
> quality of responses! So please don't be annoyed if I don't
> respond to each post in detail.
>
>    As usual I guess I've looked too much at "red herrings".
> It doesn't seem to have been something really related to
> threads. After a lot more of looking at the rather longish
> output of strace I started to notice a pattern, i.e. that
> one of the threads got interrupted in a call of close().
> This often happend a long (relatively speaking) time be-
> fore the software watchdog tried to stop the program - and
> that thread never got re-scheduled.
>
> So I switched my attention to the serial driver (that close()
> call was for a device file for one of the serial ports of the
> processor)

Seems that was the turning point.  Nice!

> and found a different version of it. And, lo and
> behold, with that updated driver I haven't seen any of that
> strange behaviour anymore for about 400 test runs. While
> that is, of course, no proof that everything is well, it at
> least encouraging;)
>
> Unfortunately, the somewhat restricted tools I have at my
> disposal don't tell me too much what state a process is in.
> 'ps' is rather terse in what it tells you (no D/S/R etc., i.
> e. no STAT field at all) one is used from a PC.

One useful trick is to look in the Linux /proc file system.  I think
that's where ps gets its information anyway, and there's more useful
information in there too.  The proc(5) man page et cetera may be
needed to interpret it.

> But the pro-
> cess/thread was definitely not sleeping nor a zombie - it was
> so active that it used up about 50% of the CPU time, and ob-
> viously somehow kept [ksoftirqd] busy as well;-)

> So from what I can say at the moment it was a slightly buggy
> driver that, in what manner I can't tell yet, didn't close
> the device file as requested and thus kept the program from
> exiting.

A guess: the buggy serial driver sometimes couldn't deal with the
resource cleanup caused by closing the file descriptor.  close() never
returned but initiated some work: partly attributed to the process,
and partly to the kernel itself.  Maybe the work was actual I/O.

You'd probably have triggered the same thing with a 'kill -9' or an
abort() as with exit().  In all cases there's a freeing of kernel
resources associated with that file descriptor.

> At least my believe in TLPI/APUE has been restored
> in that it most likely was a situation where an exit() would
> have killed all threads if not a buggy driver had intervened;-)
>
>                Thank you all and best regards, Jens

/Jorgen

-- 
  // Jorgen Grahn <grahn@  Oo  o.   .     .
\X/     snipabacken.se>   O  o   .
Jorgen
12/14/2016 12:10:17 AM
Marcel Mueller <news.5.maazl@spamgourmet.org> wrote:
> On 13.12.16 19.13, Scott Lurndal wrote:
>>> Definitely I/O. It should check for the exit condition before invoking
>>> another I/O. The Linux kernel behaves quite bad when killing processes
>>> with outstanding I/O. Request like that are simply ignored.
>>
>> If SIGKILL doesn't kill the process, you've a kernel bug.
> 
> Well, welcome to real word.
> A process hanging in state D is one of the most often causes of system 
> reboots. This did not change significantly over the last 15 years from 
> Debian Woody to recent Raspbian with kernel 4.4. Of course, it is not 
> that often that I have serious trouble. Once or twice per year or 
> something like that.
> AFAIK there is absolutely no recovery from a process blocked in state D. 
> This seems to be a Linux specific "feature".

The classic stumbling block is that the block device subsystems in Linux as
well as the *BSDs are fundamentally synchronous. This is related historically
to why polling I/O on regular (block device) files is defined by POSIX to
always immediately return ready. Given the expectations engendered by that
history, it was apparently too convenient for implementations to bake
synchronous interfaces into their block device and driver models.

NFS implementations on Linux (and I assume other Unix systems) were
especially notorious in this regard, because the kernel implementations
adopted the same synchronous interface model, but for obvious reasons were
much more prone to putting processes into a prolonged, uninterruptible
state.

AFAIU, making block device I/O asynchronous (and thus interruptible)
requires extensive refactoring of the driver model as well as the individual
drivers for those operating systems.

POSIX AIO on those systems simply uses kernel threads to make the synchronous
calls, which only hides the issue. The kernel thread could still block,
consuming system resources indefinitely even after the requesting process
has long exited. You get a slightly cleaner user process tree, yes, but
requests still linger behind the scenes, and resource accounting can no
longer be kept deterministic without some ugly compromises.

Given the pedigree of Solaris, AIX, and HP-UX, I'm curious what those
systems did. Did they refactor their driver model? Officially commit to the
kernel thread hack? Or find some sort of compromise, e.g. a
quasi-synchronous interface where updated drivers could bubble up through
the call stack an interrupt or timeout?

There have been several attempts over the years to systematize the kernel
thread hack in Linux. See, e.g., these 2007 articles

  "Fibrils and asynchronous system calls", https://lwn.net/Articles/219954/
  "LCA: A new approach to asynchronous I/O", https://lwn.net/Articles/316806/

and most recently from 2016

  "Fixing asynchronous I/O, again" https://lwn.net/Articles/671649/

I like to think they always fail because, at the end of the day, using slave
threads can easily be done in userspace. And interfaces like splice(2),
sendfile(2), eventfd(2), etc. that allow the userspace solution to match
or even exceed the kernel-space solution are useful in their own right. That
reality makes it difficult to accept the maintenance burden of an in-kernel
overlay solution that doesn't address the underlying issues. But maybe
that's just wishful thinking.

0
william
12/14/2016 12:23:08 AM
On Tue, 13 Dec 2016 16:23:08 -0800
<william@wilbur.25thandClement.com> wrote:

> The classic stumbling block is that the block device subsystems in
> Linux as well the *BSDs are fundamentally synchronous. 

It's not clear to me why they should be anything other than
synchronous.  The devices themselves might in some cases support a
queued command interface (e.g. SCSI) but that view of the device is
very different from a linear-store-of-bytes abstraction. 

The kernel provides applications with a perfectly good asynchronous
interface: the timeslice.  If the application has something better to
do while it's blocked against I/O, it can put that processing on
another pid.  In the typical case, the application blocks against
needed input, and the kernel can schedule CPU time for something else. 

> I like to think they always fail because at the end of the day using
> slave threads can be easily done in userspace. 

Exactly. 

--jkl
0
James
12/14/2016 4:30:15 AM
<william@wilbur.25thandClement.com> writes:

>Given the pedigree of Solaris, AIX, and HP-UX, I'm curious what those
>systems did. Did they refactor their driver model? Officially commit to the
>kernel thread hack? Or find some sort of compromise, e.g. a
>quasi-synchronous interface where updated drivers could bubble up through
>the call stack an interrupt or timeout?

SVR4.2 ES/MP completely redesigned the I/O system to handle
asynchronicity natively (along with eliminating the BFKL[*]).
The POSIX asynchronous I/O apis were implemented naturally
throughout the I/O stack.

Our Chorus microkernel-based port of SVR4.2 ES/MP (called SVR4/MK,
or project Amadeus in Europe) also supported the asynchronous interfaces
internally, and they were heavily used by Oracle for performance.


[*] Big F'ing Kernel Lock
0
scott
12/14/2016 1:39:18 PM
> Well, welcome to the real world.
> A process hanging in state D is one of the most frequent causes of system 
> reboots. This did not change significantly over the last 15 years from 
> Debian Woody to recent Raspbian with kernel 4.4. Of course, it is not 
> that often that I have serious trouble. Once or twice per year or 
> something like that.
> AFAIK there is absolutely no recovery from a process blocked in state D. 
> This seems to be a Linux specific "feature".

I'm not sure I agree with that.  Hanging device drivers (in state
"D"), specifically due to USB devices being disconnected at
inconvenient times, seem to be a bigger problem than just Linux.
I've observed it occasionally on the *BSDs.  Usually it's quite
obvious that the device wasn't intentionally disconnected, but
that the cable/connector was a little loose and someone wiggled
it.
0
gordonb
12/15/2016 6:35:59 AM
On 15.12.16 07.35, Gordon Burditt wrote:
>> AFAIK there is absolutely no recovery from a process blocked in state D.
>> This seems to be a Linux specific "feature".
>
> I'm not sure I agree with that.  Hanging device drivers (in state
> "D"), specifically due to USB devices being disconnected at
> inconvenient times, seems to be a bigger problem than just Linux.
> I've observed it occasionally on the *BSDs.  Usually it's quite
> obvious that the device shouldn't have been intentionally disconnected,
> but that the cable/connector was a little loose and someone wiggled
> it.

Bugs and I/O errors can occur everywhere. Not that nice, but that's life.
The only problem is that the kernel is unable to recover from these errors 
without a reboot. This is not contemporary.


Marcel
0
Marcel
12/15/2016 6:32:10 PM
Marcel Mueller <news.5.maazl@spamgourmet.org> writes:
> On 15.12.16 07.35, Gordon Burditt wrote:
>>> AFAIK there is absolutely no recovery from a process blocked in state D.
>>> This seems to be a Linux specific "feature".
>>
>> I'm not sure I agree with that.  Hanging device drivers (in state
>> "D"), specifically due to USB devices being disconnected at
>> inconvenient times, seems to be a bigger problem than just Linux.
>> I've observed it occasionally on the *BSDs.  Usually it's quite
>> obvious that the device shouldn't have been intentionally disconnected,
>> but that the cable/connector was a little loose and someone wiggled
>> it.
>
> Bugs and I/O errors can occur everywhere. Not that nice, but that's life.
> The only problem is that the kernel is unable to recover from these errors
> without a reboot. This is not contemporary.

It is contemporary because it's happening now.

'Uninterruptible sleep' state usually means 'the operation being waited
for is always expected to complete' as it's entirely within the domain
of the local system. Insofar the state persists when talking to a
device, that's usually a hardware failure. Another possible cause would
be a kernel mutex deadlock.

Interruptible sleeping needs correct support code for every instance of
a sleep. That's a whole load of opportunities for additional bugs, as
this will usually need 'resource allocation unwinding' back up the
complete callstack. It also needs to be handled correctly in all
applications. IMHO, it is very questionable whether this is really a good
idea "just in case there's a kernel bug".

It's entirely unclear what "recovery in case of hardware errors" should
look like. If a mass storage device fails, the result is going to be
"unpleasant" regardless of requiring a reboot to paper over the issue
for some time.

The idea to use 'D' state for network filesystems is obviously moronic
and there should be some kind of 'emergency abort' for removable storage
devices, too.

0
Rainer
12/16/2016 5:38:09 PM
On Fri, 16 Dec 2016 17:38:09 +0000
Rainer Weikusat <rweikusat@talktalk.net> wrote:
>It's entirely unclear how "recovery in case of hardware errors" should
>look like. If a mass storage device fails, the result is going to be
>"unpleasant" regardless of requiring a reboot to paper over the issue
>for some time.

Unless the device is the drive the OS system files are hosted on, or some other
critical main-board component, any hardware failure should be dealt with
gracefully. Period. Hardware failures should be expected, and the OS should help
the admins diagnose the problem, not just give up and die.

>The idea to use 'D' state for network filesystems is obviously moronic
>and there should be some kind of 'emergency abort' for removable storage
>devices, too.

FreeBSD had a nice bug back in the day (maybe still does) whereby if you 
mounted a floppy disk as a filesystem then removed the disk the kernel would
crash. Despite numerous people including myself pointing this out they still
hadn't fixed it by 6.0, at which point I switched to linux for other reasons.

-- 
Spud

0
spud
12/19/2016 9:38:04 AM
On 16.12.16 18.38, Rainer Weikusat wrote:
>> Bugs and I/O errors can occur everywhere. Not that nice, but that's life.
>> The only problem is the kernel is unable to recover from this errors
>> without reboot. This is not contemporary.
>
> It is contemporary because it's happening now.
>
> 'Uninterruptible sleep' state usually means 'the operation being waited
> for is always expected to complete' as it's entirely within the domain
> of the local system. Insofar the state persists when talking to a
> device, that's usually a hardware failure. Another possible cause would
> be a kernel mutex deadlock.

Even if DMA is involved it should be possible to cancel this operation. 
And well, if a hardware DMA does not complete within a few minutes it 
will likely never complete. So unloading the driver is just fine in 
99.9% of the cases.

> Interruptible sleeping needs correct support code for every instance of
> a sleep. That's a whole load of opportunities for additional bugs as
> this will usually need 'resource allocation unwinding' back up the
> complete callstack.

Agree.
But I'm not talking about a graceful exit. Just cancel all related threads.
Of course, this might leave the driver in an inconsistent state. Not too 
surprising, since there is a bug. So the next action is to forcibly 
unload the driver. Since most drivers reset their device when loaded 
(again), it is likely that the hardware can recover from this error.

> It also needs to be handled correctly in all
> applications. IMHO, is very questionable if this is really a good idea
> "just in case there's a kernel bug".

I do not see any action other than "kill" that could be executed in this 
state. So I see no need for any action in userspace.


> It's entirely unclear how "recovery in case of hardware errors" should
> look like. If a mass storage device fails, the result is going to be
> "unpleasant" regardless of requiring a reboot to paper over the issue
> for some time.

If it is the root filesystem or swap, yes. There is no reasonable recovery.
But most of the time state D is not related to the system disk. More 
likely it is a WLAN device (amazingly unreliable, this kind of hardware) 
or a USB stick or some other less important device.

> The idea to use 'D' state for network filesystems is obviously moronic
> and there should be some kind of 'emergency abort' for removable storage
> devices, too.

Indeed. NFS is really annoying if the network is not 100% solid.


Marcel
0
Marcel
12/19/2016 7:29:59 PM
>>The idea to use 'D' state for network filesystems is obviously moronic
>>and there should be some kind of 'emergency abort' for removable storage
>>devices, too.
> 
> FreeBSD had a nice bug back in the day (maybe still does) whereby if you 
> mounted a floppy disk as a filesystem then removed the disk the kernel would
> crash. Despite numerous people including myself pointing this out they still
> hadn't fixed it by 6.0, at which point I switched to linux for other reasons.

I expect that you would have the same problem for *ANY* removable
device with a UFS filesystem with soft updates enabled (on FreeBSD
10.1, and I think on 11.0).  I've managed to trigger some kind of
panic related to soft updates by accidental removal of a mounted
filesystem (as in "accidentally yanked the cable out").  Floppies
using a FAT-16 filesystem probably won't have this issue.  Neither,
it seems, will a UFS filesystem with soft updates turned off.  The
data is inconsistent, but the system doesn't panic.  Sometimes, the
panic was triggered after the program that wrote the data had already
terminated (but not all data flushed to disk).  Soft updates does
seem to work well for actually non-removable drives.  The problem
of panics doesn't exist when non-removable drives are removed from
the system by a power failure.

I'm not sure about journaling on UFS, but journaling is usually
unsuitable for my application for removable media:  large copy to
the drive, followed by the data being read-only for a long time
(maybe months), or else read a few times (usually by different
systems) and then deleted.  Journaling increases the number of
writes (possibly wearing out flash drives earlier), and I don't
really care about the integrity of the data *between* the time the
copy starts and everything gets written.  I do care about data
integrity after it's unmounted and re-mounted.

No, this wasn't any essential filesystem like /, swap, /usr, or
/var.  Most of the time it was /mnt or /mnt2, filesystems used for
data transfer or archive using USB memory sticks, or a USB hard
drive.  I suppose it would also happen with a USB or normal floppy
drive.  Nothing is permanently mounted on /mnt.  In case of accidental
disconnection, I'd expect the data in process of being transferred
to be toast, and I really don't care much about that.  I can't trust
the copy anyway.

0
gordonb
12/20/2016 3:04:22 AM