fork() race in SIGCHLD handler

Hi,

In tracking down some unexpected behaviour in a program of mine, I have
noticed that the test code below exhibits a timing race on fork() in Linux
kernel 2.6.0.  If the test code below illustrating the problem is compiled
and run, the first output is "child_pid is less than zero", all subsequent
lines being the expected "child_pid is greater than zero".  (I get
"child_pid is greater than zero" at all times with kernel 2.4.23).

What is clearly happening is that by the time the child process has exited
and the SIGCHLD handler is executing in the parent process, the parent
process has not had its copy of the static child_pid variable modified by
having the return value of the fork() call assigned to it in its own
address space.  The parent process after fork()ing is beginning after the
child process has already terminated.

At first I thought this may be a kernel or glibc bug, but on reflection I am
not so sure - I do not think the asynchronous SIGCHLD handler in the parent
process is required to wait around for the process in which it is executing
to begin executing synchronously after fork()ing.  I can get around the
difficulty by putting a flag in the first line of the parent process after
the fork(), on which the SIGCHLD handler can block, but does POSIX say
anything about this?  (Apologies for posting this to comp.unix.programmer
as well as comp.os.linux.development.system, but the comp.unix.programmer
regulars seem to know a lot about POSIX).

It is interesting though that the assignment of -1 to child_pid in the last
line of the signal handler does not cause subsequent calls to the handler
to exhibit the "unexpected" behaviour.  Probably on all iterations after
the first the address space of the parent process has been set up so that
copies-on-write to its copy of child_pid take place quickly.

This is with glibc 2.3.2, and the compiler is gcc-3.2.3.

Chris.

//////////////////// test code ///////////////////

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdlib.h>
#include <time.h>
 
pid_t child_pid = -1;
 
void childexit_signalhandler(int sig) {
 
  char message_less[] = "child_pid is less than zero\n";
  char message_greater[] = "child_pid is greater than zero\n";
 
  waitpid(-1, 0, WNOHANG);  /* eliminate zombies */
 
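  /* write to standard output (descriptor 1), omitting the trailing NUL */
  if (child_pid < 0) write(STDOUT_FILENO, message_less,
                           sizeof(message_less) - 1);
  else if (child_pid > 0) write(STDOUT_FILENO, message_greater,
                                sizeof(message_greater) - 1);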
 
  child_pid = -1;
}
 
int main(void) {
 
  struct sigaction sig_act_chld;
  int index;
  struct timespec delay;
  struct timespec residual_delay;
 
  sig_act_chld.sa_handler = childexit_signalhandler;
  sigemptyset(&sig_act_chld.sa_mask);
  sig_act_chld.sa_flags = 0;
  sigaction(SIGCHLD, &sig_act_chld, 0);
 
  for (;;) {
    child_pid = fork();
    if (child_pid > 0) {   /* parent process */
      /* set up a 1s delay */
      delay.tv_sec = 1;
      delay.tv_nsec = 0;
      while (nanosleep(&delay, &residual_delay) == -1) 
        delay = residual_delay;
    }
    if (!child_pid) {      /* child process */
      /* do something meaningless */
      for (index = 0; index < 100; index++);
      _exit(0);
    }
  }
  return 0;
}


-- 
To reply by e-mail, remove the "--nospam--" in the address
Reply Chris 12/30/2003 2:13:15 PM

In comp.os.linux.development.system Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
> noticed that the test code below exhibits a timing race on fork() in Linux
> kernel 2.6.0.	 If the test code below illustrating the problem is compiled

I recall Linus saying that he had changed the order of the first
process to execute after fork twice in the 2.6.0 series. There was some
problem such as you detail below, but it went away ages ago with the
changes made some way back now.

> and run, the first output is "child_pid is less than zero", all subsequent
> lines being the expected "child_pid is greater than zero".  (I get
> "child_pid is greater than zero" at all times with kernel 2.4.23).

> What is clearly happening is that by the time the child process has exited
> and the SIGCHLD handler is executing in the parent process, the parent
> process has not had its copy of the static child_pid variable modified by
> having the return value of the fork() call assigned to it in its own

Well, this sounds a bit like the old issue of "who runs first".  I
presume this is UP (a uniprocessor machine)?

> address space.  The parent process after fork()ing is beginning after the
> child process has already terminated.

Uh huh. Very likely. So? Isn't that allowed? There's nothing wrong with
that.

> At first I thought this may be a kernel or glibc bug, but on reflection I am
> not so sure - I do not think the asynchronous SIGCHLD handler in the parent
> process is required to wait around for the process in which it is executing
> to begin executing synchronously after fork()ing.

Well, in fact the handler executes (because the child dies)
before the return value from the fork is known!

It sounds like you should delay setting the handler in the parent till
the fork has returned, but that leaves a window in which the child can
die unattended.


> I can get around the
> difficulty by putting a flag in the first line of the parent process after
> the fork(), on which the SIGCHLD handler can block, but does POSIX say
> anything about this?	(Apologies for posting this to comp.unix.programmer

Dunno. I imagine so.

> as well as comp.os.linux.development.system, but the comp.unix.programmer
> regulars seem to know a lot about POSIX).

> It is interesting though that the assignment of -1 to child_pid in the last
> line of the signal handler does not cause subsequent calls to the handler
> to exhibit the "unexpected" behaviour.  Probably on all iterations after
> the first the address space of the parent process has been set up so that
> copies-on-write to its copy of child_pid take place quickly.

Unknown by me. Compiler issues will intervene to make that cloudy.

> This is with glibc 2.3.2, and the compiler is gcc-3.2.3.

Well try a different compiler for a start! Though I don't know if it
will make a difference, since so far life sounds vaguely within spec.
But mapping the space would be useful. Try no optimization.

> pid_t child_pid = -1;

It makes me nervous not having this static. Shrug. Shouldn't you
declare this volatile too?


> void childexit_signalhandler(int sig) {
>
>   char message_less[]	   = "child_pid is less than zero\n";
>   char message_greater[] = "child_pid is greater than zero\n";
>
>   waitpid(-1, 0, WNOHANG);  /* eliminate zombies */
>
>   if (child_pid < 0)	    write(0, message_less, sizeof(message_less));
>   else if (child_pid > 0) write(0, message_greater, sizeof(message_greater));
>
>   child_pid = -1;
> }
>
> int main(void) {


>   sig_act_chld.sa_handler = childexit_signalhandler;
>   sigemptyset(&sig_act_chld.sa_mask);
>   sig_act_chld.sa_flags = 0;
>   sigaction(SIGCHLD, &sig_act_chld, 0);

So you set the sigchld action to be to tell us what the child_pid is
known to be in the parent process.

>   for (;;) {

>     child_pid = fork();

>     if (child_pid > 0) {   /* parent process */
>	/* set up a 1s delay */
>	delay.tv_sec = 1; delay.tv_nsec = 0;

>	while (nanosleep(&delay, &residual_delay) == -1)
>	  delay = residual_delay;

>     }

>     if (!child_pid) {	     /* child process */
>	/* do something meaningless */
>	for (index = 0; index < 100; index++);
>	_exit(0);
>     }

>   }
>   return 0;
> }


Yes, well. Your child dies and activates the handler, but child_pid in
the handler still looks the way it used to be before being assigned by
the fork.

This sounds as though it's partially a problem with gcc.  Can you
scatter a few "volatile"s around, and see if anything changes?





Peter
Reply ptb 12/30/2003 3:50:29 PM


P.T. Breuer wrote:

[snip]
 
> Well, this sounds a bit like the old issue of "who runs first".        I
> presume this is UP?

It is UP, yes, but the scheduler will provide reasonable equality of
processor time and I expect the real issue is what process has to accept
the delays on copy-on-write, and that that has changed between kernel 2.4
and kernel 2.6.

>> address space.  The parent process after fork()ing is beginning after the
>> child process has already terminated.

[snip]
 
> Well, in fact the handler executes (because the child dies)
> before the return value from the fork is known!
> 
> It sounds like you should delay setting the handler in the parent till
> the fork has returned, but that leaves a window in which the child can
> die unattended.

I cannot do that, because in the test code I know that the child will return
before the parent fork() has returned, so I will definitely miss it.

>> I can get around the
>> difficulty by putting a flag in the first line of the parent process
>> after the fork(), on which the SIGCHLD handler can block, but does POSIX
>> say
>> anything about this? (Apologies for posting this to comp.unix.programmer
> 
> Dunno. I imagine so.

On further thought, of course my idea will not work - the parent and the
signal handler are in the same thread of execution, so blocking the handler
will block the parent.

I suspect the proper approach is to mask off SIGCHLD before fork()ing, and
then unmask it again in the parent once the fork() has returned.  I have
just tried that and it appears to work as expected.  Phew!
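
For reference, a minimal sketch of that mask/unmask approach. The names
run_masked_forks, on_sigchld, stale_seen and reaped are made up for this
illustration and are not from the original test program; the counters
replace the write() calls so the result can be checked. It assumes a
single-threaded process with no other children.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile pid_t child_pid = -1;
static volatile sig_atomic_t stale_seen = 0; /* handler runs that saw child_pid < 0 */
static volatile sig_atomic_t reaped = 0;     /* children reaped so far */

static void on_sigchld(int sig)
{
    int saved_errno = errno;          /* waitpid() may clobber errno */
    (void)sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped++;
    if (child_pid < 0)
        stale_seen++;                 /* the "unexpected" case from the thread */
    child_pid = -1;
    errno = saved_errno;
}

/* Forks `iterations` short-lived children, blocking SIGCHLD around
   fork() and the assignment.  Returns how many times the handler ran
   before child_pid had been assigned, or -1 on error. */
int run_masked_forks(int iterations)
{
    struct sigaction sa;
    sigset_t chld;
    const struct timespec tick = { 0, 10 * 1000 * 1000 };  /* 10 ms */

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    sigemptyset(&chld);
    sigaddset(&chld, SIGCHLD);

    for (int i = 0; i < iterations; i++) {
        int target = reaped + 1;
        pid_t pid;

        /* critical region begins: SIGCHLD stays pending from here on */
        if (sigprocmask(SIG_BLOCK, &chld, NULL) == -1)
            return -1;
        pid = fork();
        if (pid == 0)
            _exit(0);                 /* child: exits at once */
        if (pid < 0) {
            sigprocmask(SIG_UNBLOCK, &chld, NULL);
            return -1;
        }
        child_pid = pid;
        /* critical region ends: a pending SIGCHLD is delivered now */
        if (sigprocmask(SIG_UNBLOCK, &chld, NULL) == -1)
            return -1;

        while (reaped < target)       /* wait until the handler has run */
            nanosleep(&tick, NULL);
    }
    return stale_seen;
}
```

Note that the child inherits the blocked SIGCHLD; a child that goes on
to exec() something would probably want to unblock it first.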

>> It is interesting though that the assignment of -1 to child_pid in the
>> last line of the signal handler does not cause subsequent calls to the
>> handler
>> to exhibit the "unexpected" behaviour.  Probably on all iterations after
>> the first the address space of the parent process has been set up so that
>> copies-on-write to its copy of child_pid take place quickly.
> 
> Unknown by me. Compiler issues will intervene to make that cloudy.
> 
>> This is with glibc 2.3.2, and the compiler is gcc-3.2.3.
> 
> Well try a different compiler for a start! Though I don't know if it
> will make a difference, since so far life sounds vaguely within spec.
> But mapping the space would be useful. Try no optimization.

Actually, I don't think the compiler is the issue - the explanation for this
behaviour is identified by both of us above and should be compiler
independent.  I have tried the same test code on another machine with
gcc-2.95.3, and with kernel 2.6.0 I get the same problem as with gcc-3.2. 
It also occurs with or without optimisation.  Only the mask/unmask approach
seems to guarantee correct results.

>> pid_t child_pid = -1;
> 
> It makes me nervous not having this static. Shrug. Shouldn't you
> declare this volatile too?

I probably should, but having changed it to volatile the effect is still the
same.

Chris.


-- 
To reply by e-mail, remove the "--nospam--" in the address
Reply Chris 12/30/2003 4:31:51 PM

"P.T. Breuer" wrote:
> 
> Well, this sounds a bit like the old issue of "who runs first".

No matter who executes first you should get correct behaviour.
Even if you know who executes first, you don't know how many
cycles it will get. Eventually you will have a situation where
the first executing process gets preempted before it returned
from the fork function. It is often desirable to have the child
run first for performance reasons, but since this is only about
performance, it doesn't hurt much if the parent executes first
in a few rare cases.

printf("%d",fork()==fork()==fork()); /* :-) */

> 
> > address space.  The parent process after fork()ing is beginning after the
> > child process has already terminated.
> 
> Uh huh. Very likely. So? Isn't that allowed? There's nothing wrong with
> that.

Not just likely, also desirable. I think you will have the
smallest number of context switches that way.

> 
> > At first I thought this may be a kernel or glibc bug, but on reflection I am
> > not so sure - I do not think the asynchronous SIGCHLD handler in the parent
> > process is required to wait around for the process in which it is executing
> > to begin executing synchronously after fork()ing.
> 
> Well, in fact the handler executes (because the child dies)
> before the return value from the fork is known!

Well, there is a race between the process executing and the
signal handler. You cannot use a lock to protect against a
signal handler, as that will cause a deadlock. So I guess
the only solution is to mask the handler while the process
is in the critical region. And it seems the fork() call and
the assignment to the variable need both be in the critical
region. So masking before fork and unmasking after assigning
to the variable probably is the best (only?) solution.

> 
> It sounds like you should delay setting the handler in the parent till
> the fork has returned, but that leaves a window in which the child can
> die unattended.

You need the handler to be ready before the signal can
arrive. That means the handler must be set up before the
fork call which is responsible for the signal eventually
arriving in the parent process. I don't know what will
happen if you get the signal while it is blocked and only
then actually install the handler. I would avoid that by
installing the handler before fork and just keep the signal
blocked during the critical region in the parent.

> 
> > pid_t child_pid = -1;
> 
> It makes me nervous not having this static.

Shouldn't make any difference.

> Shrug. Shouldn't you declare this volatile too?

Might be a good idea. But is that really necessary if it
is protected with a mutual exclusion implemented by
blocking the signal?

-- 
Kasper Dupont -- der bruger for meget tid paa usenet.
For sending spam use mailto:aaarep@daimi.au.dk
/* Would you like fries with that? */
Reply Kasper 12/30/2003 8:24:44 PM

In comp.os.linux.development.system Kasper Dupont <kasperd@daimi.au.dk> wrote:
> "P.T. Breuer" wrote:
> > Well, in fact the handler executes (because the child dies)
> > before the return value from the fork is known!

> Well, there is a race between the process executing and the
> signal handler. You cannot use a lock to protect against a
> signal handler, as that will cause a deadlock. So I guess
> the only solution is to mask the handler while the process
> is in the critical region. And it seems the fork() call and
> the assignment to the variable need both be in the critical
> region. So masking before fork and unmasking after assigning
> to the variable probably is the best (only?) solution.

I'm not quite sure what you mean by masking - blocking, I take it, with
signals being queued while the block is in place, and treated when the
block is removed.

I'm not sure how you achieve that! One can block signals while a handler
is running (with sigaction), but he wants to block (and queue) them
while the handler is not running.

Oh - you mean use sigprocmask, with SIG_BLOCK. Yes, that will work
nicely.  Is there some limit to the number of queued signals? Or
are the processes sending the signal blocked until it can be delivered?

That would make sense, but a synchronous signal would be dangerous!
A process could deadlock by signalling itself while the signal
was blocked. So I don't believe synchronous signalling can be in posix,
and hence I believe there must be a finite length to the queue of
pending signals. Maybe "1". And other signals are lost.

Oh well, I'm sure you'll enlighten me!

> Shouldn't make any difference.

> > Shrug. Shouldn't you declare this volatile too?

> Might be a good idea. But is that really necessary if it
> is protected with a mutual exclusion implemented by
> blocking the signal?

I would feel better if the compiler were forced to implement this
global variable the way I expect, as a memory location that one writes
to directly. That would avoid me feeling uneasy.

Peter
Reply ptb 12/30/2003 10:00:35 PM

Kasper Dupont wrote:

[snip]
 
> Well, there is a race between the process executing and the
> signal handler. You cannot use a lock to protect against a
> signal handler, as that will cause a deadlock. So I guess
> the only solution is to mask the handler while the process
> is in the critical region. And it seems the fork() call and
> the assignment to the variable need both be in the critical
> region. So masking before fork and unmasking after assigning
> to the variable probably is the best (only?) solution.

As you will see from my follow-up post of earlier today, this was the
approach which worked.  On reflection, it is probably the only approach
which will work, and it is relatively clean.

These timing/race issues with signal handlers are some of the most difficult
to resolve.  The synchronisation of multiple threads seems quite easy in
comparison, and the fork() race issue I stumbled on is a problem waiting to
happen, because it is unintuitive (until it happens to you, and then after
a little thought the problem is obvious).

Chris.

-- 
To reply by e-mail, remove the "--nospam--" in the address
Reply Chris 12/30/2003 11:41:38 PM

Chris Vine wrote:
> 
> These timing/race issues with signal handlers are some of the most difficult
> to resolve.  The synchronisation of multiple threads seems quite easy in
> comparison, and the fork() race issue I stumbled on is a problem waiting to
> happen, because it is unintuitive (until it happens to you, and then after
> a little thought the problem is obvious).

Indeed. I'm not sure I would have thought about it
either. But of course after reading your description
the problem (and solution) was obvious.

-- 
Kasper Dupont -- der bruger for meget tid paa usenet.
For sending spam use mailto:aaarep@daimi.au.dk
/* Would you like fries with that? */
Reply Kasper 12/30/2003 11:50:23 PM

"P.T. Breuer" wrote:
> 
> I'm not quite sure what you mean by masking - blocking, I take it, with
> signals being queued while the block is in place, and treated when the
> block is removed.

Yes blocking is what I mean. Isn't the term masking also
sometimes used about this? At least the term masking has
been used about interrupts. Actually interrupts and
signals are quite similar.

> 
> I'm not sure how you achieve that! One can block signals while a handler
> is running (with sigaction), but he wants to block (and queue) them
> while the handler is not running.

No problem.

> 
> Oh - you mean use sigprocmask, with SIG_BLOCK. Yes, that will work
> nicely.

Yes.

> Is there some limit to the number of queued signals?

One of each type.

> Or
> are the processes sending the signal blocked until it can be delivered?

Nope, the sending process will continue at once.
(Maybe this is different in the case of RT signals,
I know something is different, but I'm not sure
what it is.)

> 
> That would make sense, but a synchronous signal would be dangerous!
> A process could deadlock by signalling itself while the signal
> was blocked. So I don't believe synchronous signalling can be in posix,
> and hence I believe there must be a finite length to the queue of
> pending signals. Maybe "1". And other signals are lost.

I think this link should answer what happens in
the case of synchronous signals.
http://kt.zork.net/kernel-traffic/kt20031201_243.html#9

But when exactly is a signal synchronous? If it
is caused by an exception like SIGSEGV and
similar cases, it is obviously synchronous. But
how about kill(getpid(),...) or raise()?

> 
> I would feel better if the compiler were forced to implement this
> global variable the way I expect, as a memory location that one writes
> to directly. That would avoid me feeling uneasy.

I see you have a point. But is there really any
problem? Doesn't the compiler know that a system
call can cause the contents of variables to change?

Actually at compile time the system call is considered
an external function in a different object file. So
the compiler knows nothing about what it is going to
do. Since the variable is global, ie. not static,
that means any external function could potentially
change the variable. So the compiler must be prepared
to handle cases where a system call changes a variable.

If the variable was static without being volatile, it
might be interesting. But still, a pointer to the
signal handler is given to one system call. Later
another system call unblocking signals causes the
handler to get invoked. That could have happened even
without involving signals and with another object
file written in pure C code. Just assume the first
function being called saves the function pointer and
the second calls the previously saved function pointer.
(It almost happens that way, though a signal of course
is more complicated). So I cannot see how the compiler
could do anything of potential harm, as long as you
use the necessary system calls to guard against race
conditions.

-- 
Kasper Dupont -- der bruger for meget tid paa usenet.
For sending spam use mailto:aaarep@daimi.au.dk
/* Would you like fries with that? */
Reply Kasper 12/31/2003 12:08:15 AM

On Tue, 30 Dec 2003 15:50:29 +0000, P.T. Breuer wrote:

> In comp.os.linux.development.system Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
>> noticed that the test code below exhibits a timing race on fork() in Linux
>> kernel 2.6.0.	 If the test code below illustrating the problem is compiled
> 
> I recall Linus saying that he had changed the order of the first
> process to execute after fork twice in the 2.6.0 series. There was some
> problem such as you detail below, but it went away ages ago with the
> changes made some way back now.

Your recollection matches mine, except that the problem
didn't go away, but was made less likely. As far as I know,
there is nothing one can do, except make the parent and child
communicate in some way.

-- Pete

Reply Pete 12/31/2003 3:33:20 AM

On Tue, 30 Dec 2003 23:41:38 +0000 Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
> Kasper Dupont wrote:
>
> [snip]
>  
>> Well, there is a race between the process executing and the
>> signal handler. You cannot use a lock to protect against a
>> signal handler, as that will cause a deadlock. So I guess
>> the only solution is to mask the handler while the process
>> is in the critical region. And it seems the fork() call and
>> the assignment to the variable need both be in the critical
>> region. So masking before fork and unmasking after assigning
>> to the variable probably is the best (only?) solution.
>
> As you will see from my follow-up post of earlier today, this was the
> approach which worked.  On reflection, it is probably the only approach
> which will work, and it is relatively clean.

One (maybe "the") standard way to synchronize between parent and child
for who-runs-first is a pipe.  Create a pipe before forking, then
whoever must run 2nd reads from the pipe.  When the 1st process is
past the critical section, it closes the pipe.  The other process will
then unblock.
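
A rough, self-contained sketch of the pipe technique as it might apply
to this thread's case (the parent must record the pid before the child
is allowed to exit). The names run_pipe_gated_fork, gate, gated_pid and
on_gated_sigchld are hypothetical, and the sketch assumes the process
has no other children.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile pid_t gated_pid = -1;
static volatile sig_atomic_t handled = 0;        /* set once the handler has run */
static volatile sig_atomic_t saw_valid_pid = 0;  /* was gated_pid set by then? */

static void on_gated_sigchld(int sig)
{
    int saved_errno = errno;
    (void)sig;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        ;                                        /* reap every exited child */
    saw_valid_pid = (gated_pid > 0);
    handled = 1;
    errno = saved_errno;
}

/* Returns 1 if the handler saw a valid pid, 0 if not, -1 on error. */
int run_pipe_gated_fork(void)
{
    struct sigaction sa;
    int gate[2];
    pid_t pid;
    const struct timespec tick = { 0, 10 * 1000 * 1000 };  /* 10 ms */

    sa.sa_handler = on_gated_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;
    if (pipe(gate) == -1)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(gate[0]);
        close(gate[1]);
        return -1;
    }
    if (pid == 0) {
        char c;
        close(gate[1]);           /* otherwise read() would never see EOF */
        /* block until the parent closes its write end */
        while (read(gate[0], &c, 1) == -1 && errno == EINTR)
            ;
        _exit(0);
    }

    gated_pid = pid;              /* parent records the pid first ... */
    close(gate[0]);
    close(gate[1]);               /* ... then releases the child */

    while (!handled)              /* wait until SIGCHLD has been handled */
        nanosleep(&tick, NULL);
    return saw_valid_pid;
}
```

Whichever process the scheduler picks first, the child cannot get past
the read() until the parent has stored the pid and closed the pipe.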

I didn't read the thread in detail but it sounds like that's what you
should be doing here.  signal masking/unmasking seems roundabout in
comparison (to me), and it's probably more fragile.

/fc
Reply Frank 12/31/2003 3:37:50 AM

Frank Cusack wrote:

[snip]

> One (maybe "the") standard way to synchronize between parent and child
> for who-runs-first is a pipe.  Create a pipe before forking, then
> whoever must run 2nd reads from the pipe.  When the 1st process is
> past the critical section, it closes the pipe.  The other process will
> then unblock.
> 
> I didn't read the thread in detail but it sounds like that's what you
> should be doing here.  signal masking/unmasking seems roundabout in
> comparison (to me), and it's probably more fragile.

The problem was not really a "who runs first" one (although the problem can
be solved by ordering "who runs first" using pipes, as you suggest).

My problem was that the SIGCHLD signal handler was not supposed to execute
until the fork() call in the parent process had returned - that is, until
it had assigned the child process number to the global variable intended to
hold it.  To do that, blocking with sigprocmask() immediately before the
fork() call and unblocking (in the parent) immediately after achieves the
effect wanted.  This queues the signal until the SIGCHLD handler is able to
deal with it.

The issue could also be dealt with with pipes, by blocking the _exit() or
exec() call made by the child process and so delaying the event
(termination of the child process) giving rise to the SIGCHLD signal, and
thank you for the suggestion.  As a means of dealing with my particular
problem this is probably marginally less attractive as it means, where an
exec() call is to be made by the child process, the execution of the new
program in the child process can be delayed, perhaps unnecessarily, and in
the case of blocking before an _exit() call may, I suppose, give rise to an
unnecessary context switch by preventing the child process from finishing
until the parent process had started executing after the fork().  However,
this all seems pretty marginal, I agree.

Chris.

-- 
To reply by e-mail, remove the "--nospam--" in the address
Reply Chris 12/31/2003 10:02:30 PM

> ... the test code below exhibits a timing race on fork() in Linux ...

To fix the race, implement fork() using the NPTL version of clone:
      int clone(int (*fn)(void *arg), void *child_stack, int flags, void *arg,
               pid_t *ptid, struct user_desc *tls, pid_t *ctid);
With suitable flags, the kernel sets *ptid and *ctid before returning
from the syscall.

Hint: as revealed by strace, fork() in glibc-2.3.2 on RH9 (2.4.20-27.9) uses:
      clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
            <ignored>, <ignored>, 0x4002f0c8)
and the only tricky task is finding a suitable fn.  Consult the glibc
source in  sysdeps/unix/sysv/linux/i386/clone.S .

Reply John 12/31/2003 11:19:50 PM

On Wed, 31 Dec 2003, Chris Vine wrote:

> My problem was that the SIGCHLD signal handler was not supposed to execute
> until the fork() call in the parent process had returned - that is, until
> it had assigned the child process number to the global variable intended to

OK - but returning from a system call is one of the places where
the kernel usually checks if there's a signal pending.  In the
case where you're having problems, the signal has been delivered
before you're ready.  But given that fork() doesn't guarantee
which executes first (parent or child), you have to expect, and
work around, this sort of thing.
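
One possible way to work around it, sketched under the assumption that
the handler only needs to know which child exited: take the pid from
the return value of waitpid() inside the handler instead of from a
global set after fork(). The names fork_and_identify, reap_children,
last_pid and last_status are invented for this example, which also
assumes pids fit in sig_atomic_t (true where both are int, as on
Linux) and that the process has no other children.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t last_pid = 0;      /* pid of the last reaped child */
static volatile sig_atomic_t last_status = -1;  /* its exit code, or -1 */

static void reap_children(int sig)
{
    int saved_errno = errno;
    int status;
    pid_t pid;

    (void)sig;
    /* waitpid() itself reports which child exited, so the handler does
       not care whether fork() has returned in the parent yet */
    while ((pid = waitpid(-1, &status, WNOHANG)) > 0) {
        last_status = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        last_pid = (sig_atomic_t)pid;
    }
    errno = saved_errno;
}

/* Forks a child that exits with exit_code.  Returns 1 if the handler
   identified that child and its exit code correctly, 0 if not, and
   -1 on error. */
int fork_and_identify(int exit_code)
{
    struct sigaction sa;
    pid_t pid;
    const struct timespec tick = { 0, 10 * 1000 * 1000 };  /* 10 ms */

    sa.sa_handler = reap_children;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, NULL) == -1)
        return -1;

    last_pid = 0;
    last_status = -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(exit_code);

    while (last_pid == 0)         /* the handler may already have run */
        nanosleep(&tick, NULL);
    return last_pid == pid && last_status == exit_code;
}
```

This does not help a handler that must look the pid up in state the
parent only fills in after fork(); for that case the signal still has
to be held off until the parent is ready.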

> hold it.  To do that, blocking with sigprocmask() immediately before the
> fork() call and unblocking (in the parent) immediately after achieves the
> effect wanted.  This queues the signal until the SIGCHLD handler is able to
> deal with it.

Yes, that's another way of doing it.

-- 
Rich Teer, SCNA, SCSA                               .  *   * . * .* .
                                                     .   *   .   .*
President,                                          * .  . /\ ( .  . *
Rite Online Inc.                                     . .  / .\   . * .
                                                    .*.  / *  \  . .
                                                      . /*   o \     .
Voice: +1 (250) 979-1638                            *   '''||'''   .
URL: http://www.rite-online.net                     ******************
Reply Rich 1/1/2004 1:21:48 AM

On Wed, 31 Dec 2003, John Reiser wrote:

> > ... the test code below exhibits a timing race on fork() in Linux ...
>
> To fix the race, implement fork() using the NPTL version of clone:

Why?  That makes the OP's code needlessly Linux specific.

-- 
Rich Teer, SCNA, SCSA                               .  *   * . * .* .
                                                     .   *   .   .*
President,                                          * .  . /\ ( .  . *
Rite Online Inc.                                     . .  / .\   . * .
                                                    .*.  / *  \  . .
                                                      . /*   o \     .
Voice: +1 (250) 979-1638                            *   '''||'''   .
URL: http://www.rite-online.net                     ******************
Reply Rich 1/1/2004 1:22:33 AM

On Wed, 31 Dec 2003 22:02:30 +0000 Chris Vine <chris@cvine--nospam--.freeserve.co.uk> wrote:
> Frank Cusack wrote:
>
> [snip]
>
>> One (maybe "the") standard way to synchronize between parent and child
>> for who-runs-first is a pipe.  Create a pipe before forking, then
>> whoever must run 2nd reads from the pipe.  When the 1st process is
>> past the critical section, it closes the pipe.  The other process will
>> then unblock.
>> 
>> I didn't read the thread in detail but it sounds like that's what you
>> should be doing here.  signal masking/unmasking seems roundabout in
>> comparison (to me), and it's probably more fragile.
>
> The problem was not really a "who runs first" one (although the problem can
> be solved by ordering "who runs first" using pipes, as you suggest).
>
> My problem was that the SIGCHLD signal handler was not supposed to execute
> until the fork() call in the parent process had returned - that is, until
....

IOW, the classic who-runs-first problem.  fork() doesn't return until
said process (parent or child) is running.  You need for the parent to
run first, to set the global pid var.

Classic problems deserve classic solutions.

> The issue could also be dealt with with pipes, by blocking the _exit() or
> exec() call made by the child process and so delaying the event
> (termination of the child process) giving rise to the SIGCHLD signal, and
> thank you for the suggestion.  As a means of dealing with my particular
> problem this is probably marginally less attractive as it means, where an
> exec() call is to be made by the child process, the execution of the new
> program in the child process can be delayed, perhaps unnecessarily, and in
> the case of blocking before an _exit() call may, I suppose, give rise to an
> unnecessary context switch by preventing the child process from finishing
> until the parent process had started executing after the fork().  However,
> this all seems pretty marginal, I agree.

Not to be harsh, but doesn't that seem ridiculous?  And there's only
one point where you'd block, immediately after fork().

a) If latency is that important, you should pre-fork the children.
b) If you agree that it's marginal, you should stick with the standard
solution.

int p[2];
....
if (pipe(p) == -1)      /* pipe() returns 0 on success, -1 on error */
  perror("pipe") ...

gPid=fork();
if (gPid > 0) {
  /* parent */
  (void) close(p[0]); /* cleanup */
  (void) close(p[1]); /* unblock child */
  ...

} else if (gPid == 0) {
  char c;             /* scratch byte; read() returns 0 at EOF */
  (void) close(p[1]);
  /* block on parent */
  while (read(p[0], &c, 1) < 0) {
    if (errno != EINTR) {
      perror("read"); ...
      break;
    }
  }
  close(p[0]);
  ...

} else {
  perror("fork"); ...
}

For systems where the child runs first, you'll have a context switch back
to the parent.  For systems where the parent runs first, you'll incur no such
penalty.  EVEN IF you are writing ONLY FOR LINUX, you cannot count on the
child running first!  It has changed in the past and will change again.
Writing your app to depend on the current (undocumented, and for good
reason) behavior is a mistake.

/fc
Reply Frank 1/3/2004 6:52:52 AM

On Wed, 31 Dec 2003 15:19:50 -0800 John Reiser <jreiser@BitWagon.com> wrote:
>> ... the test code below exhibits a timing race on fork() in Linux ...
>
> To fix the race, implement fork() using the NPTL version of clone:
>       int clone(int (*fn)(void *arg), void *child_stack, int flags, void *arg,
>                pid_t *ptid, struct user_desc *tls, pid_t *ctid);
> With suitable flags, the kernel sets *ptid and *ctid before returning
> from the syscall.

That's sick.  I don't mean that in the "good" way.

/fc
Reply Frank 1/3/2004 6:54:05 AM

On Thu, 01 Jan 2004 01:22:33 GMT Rich Teer <rich.teer@rite-group.com> wrote:
> On Wed, 31 Dec 2003, John Reiser wrote:
>
>> > ... the test code below exhibits a timing race on fork() in Linux ...
>>
>> To fix the race, implement fork() using the NPTL version of clone:
>
> Why?  That makes the OP's code needlessly Linux specific.

And sensitive to an API which is explicitly stated to be in constant flux.

/fc
Reply Frank 1/3/2004 6:55:32 AM

Frank Cusack wrote:
> On Thu, 01 Jan 2004 01:22:33 GMT Rich Teer <rich.teer@rite-group.com> wrote:
> 
>>On Wed, 31 Dec 2003, John Reiser wrote:
>>
>>
>>>>... the test code below exhibits a timing race on fork() in Linux ...
>>>
>>>To fix the race, implement fork() using the NPTL version of clone:
>>
>>Why?  That makes the OP's code needlessly Linux specific.

On a Linux system using NPTL clone(), it is possible for the programmer to
guarantee that a SIGCHLD handler has an inexpensive way to know in advance
and check the pid of every exiting child process.  On which other *NIX-like
system(s) is this possible?

> 
> 
> And sensitive to an API which is explicitly stated to be in constant flux.

Now that Linux 2.6.0 is out, the API is no longer in constant flux.
The probability of a change that is not backwards compatible is very small
[bug fixes excepted, of course.]
The API is the default in RedHat 9, RedHat Advanced Server 3.0, Fedora Core 1,
SuSE 9.0, and others.  There are several million machines using it today.

-- 

Reply John 1/3/2004 4:54:46 PM

Frank Cusack wrote:

> On Wed, 31 Dec 2003 22:02:30 +0000 Chris Vine
> <chris@cvine--nospam--.freeserve.co.uk> wrote:

[snip]
 
>> The issue could also be dealt with with pipes, by blocking the _exit() or
>> exec() call made by the child process and so delaying the event
>> (termination of the child process) giving rise to the SIGCHLD signal, and
>> thank you for the suggestion.  As a means of dealing with my particular
>> problem this is probably marginally less attractive as it means, where an
>> exec() call is to be made by the child process, the execution of the new
>> program in the child process can be delayed, perhaps unnecessarily, and
>> in the case of blocking before an _exit() call may, I suppose, give rise
>> to an unnecessary context switch by preventing the child process from
>> finishing
>> until the parent process had started executing after the fork(). 
>> However, this all seems pretty marginal, I agree.
> 
> Not to be harsh, but doesn't that seem ridiculous?  And there's only
> one point where you'd block, immediately after fork().

[snip]

I do not know what point you are trying to make, but it simply isn't true
that you would only block after the fork() in the case to which I was
referring.  To deal with the issue which started this series of postings,
if you were to choose to use a pipe to deal with it, it wouldn't matter
where you blocked provided it was before the _exit() call in the child
process.  Nothing in the child process depended on the state of the parent
process.

> For systems where the child runs first, you'll have a context switch back
> to the parent.  For systems where the parent runs first, you'll incur no such
> penalty.  EVEN IF you are writing ONLY FOR LINUX, you cannot count on the
> child running first!  It has changed in the past and will change again.
> Writing your app to depend on the current (undocumented, and for good
> reason) behavior is a mistake.

I really don't know what you are talking about.  Using sigprocmask() is
neither undocumented nor a mistake, and doesn't rely on the order in which
the child and parent run as you suggest (it makes the issue irrelevant). 
You may need to re-read the post to which you thought you were replying.

Chris.

-- 
To reply by e-mail, remove the "--nospam--" in the address
Reply Chris 1/4/2004 5:39:32 PM
