Strange processes



On a customer's OSR 5.0.6 machine (patched with 506a),
there are some unkillable processes:

customer:# ps -u usergaw
  PID     TTY        TIME CMD
18373   ttyp4    00:00:00 scosh
18375   ttyp4 01-08:42:02 desktop
12443   ttyp7    00:00:00 scosh
12445   ttyp7 01-08:31:09 desktop

customer:# kill -9 12443 12445 18373 18375
12443: No such process
12445: No such process
18373: No such process
18375: No such process

They do seem to be consuming resources:

customer:# top | grep usergaw
18375 usergaw     20    4  1352K   256K run    32.7H  desktop
12445 usergaw     20    4  1352K   256K run    32.5H  desktop

What's happening, and what's to be done?

-- 
JP
jpr5879 1/14/2011 12:15:26 AM

On 1/13/2011 7:15 PM, Jean-Pierre Radley wrote:
> On a customer's OSR 5.0.6 machine (patched with 506a),
> there are some unkillable processes:
> (...)
> What's happening, and what's to be done?
>

Traverse PPID's up and kill (more gracefully) the parent X session?
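Walking up the PPIDs is just a chain of repeated parent lookups. Below is a minimal Python sketch for illustration, assuming you have already built a PID-to-PPID table (for example by parsing `ps -ef` output). The function name `ancestors` and the parentage in the example are hypothetical:

```python
def ancestors(pid, ppid_of):
    """Return the PIDs above `pid`, nearest parent first.

    `ppid_of` is a hypothetical dict mapping PID -> PPID, for example
    built by parsing `ps -ef` output.  The walk stops before init
    (PID 1) or the swapper (PID 0), at a PID missing from the table,
    or if the table contains a cycle.
    """
    chain = []
    seen = {pid}
    current = pid
    while current in ppid_of:
        parent = ppid_of[current]
        if parent <= 1 or parent in seen:
            break
        chain.append(parent)
        seen.add(parent)
        current = parent
    return chain


if __name__ == "__main__":
    # Hypothetical parentage: 12445 (desktop) <- 12443 (scosh) <- 12440.
    table = {12445: 12443, 12443: 12440, 12440: 1}
    print(ancestors(12445, table))  # [12443, 12440]
```

The last PID in the chain would be the candidate for a gentler signal (SIGTERM or SIGHUP) before reaching for -9.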

-- 
bkw
brian109 1/14/2011 2:23:59 AM


Jean-Pierre Radley wrote:

> On a customer's OSR 5.0.6 machine (patched with 506a),
> there are some unkillable processes:
> (...)
>
> customer:# kill -9 12443 12445 18373 18375
> 12443: No such process
> 12445: No such process
> 18373: No such process
> 18375: No such process
>
> (...)
>
> What's happening, and what's to be done?

"No such process" is peculiar.  Some shells have a built-in "kill", see
if /bin/kill gets a different result.  I expect it still won't work, but
I hope to see the real errno.
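Every `kill` command ends up in the kill(2) system call, and "No such process" is the standard message for the `ESRCH` errno. To show what "see the real errno" means, here is the same probe written in Python with the standard `os.kill` and `errno` modules. In C it would be kill(2) plus strerror(3). `kill_errno` is a hypothetical helper name, and this is meant as an illustration, not a tool for the customer's box:

```python
import errno
import os
import signal


def kill_errno(pid, sig=signal.SIGKILL):
    """Send `sig` to `pid`; return "OK" or the symbolic errno name.

    Passing sig=0 performs only the existence and permission checks
    without delivering a signal, which makes it a harmless probe.
    """
    try:
        os.kill(pid, sig)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return "OK"


if __name__ == "__main__":
    print(kill_errno(os.getpid(), 0))  # OK: a process may always signal itself
    # The text kill(1) prints for ESRCH; typically "No such process".
    print(os.strerror(errno.ESRCH))
```

An `EPERM` would point to a permissions problem instead. `ESRCH` means the kernel found no process with that PID.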

Are there any other processes on those ptys or their master sides?
`lsof /dev/?typ[47]`.  Do the processes have any interesting files open?
`lsof -p18373,etc`.  Is there a file (such as /dev/ttyp4) opened only by
processes you want to kill, that you can `fuser -k`?  That may succeed
or may at least get a good error message.
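The `fuser -k` idea comes down to a set question: which files are open *only* by processes you want dead? Below is a Python sketch, assuming `lsof` output has already been turned into a mapping from PID to open paths. That parsing step, the helper name `files_only_opened_by`, and the `/tmp/shared.log` path in the example are all hypothetical:

```python
def files_only_opened_by(open_files, targets):
    """Return, sorted, the paths whose every opener is in `targets`.

    open_files: dict mapping PID -> set of paths that PID has open
                (hypothetically built from `lsof` output).
    targets:    set of PIDs you intend to kill.
    Such a path is a safe argument to `fuser -k`, because every process
    fuser would signal is one you already want gone.
    """
    openers = {}
    for pid, paths in open_files.items():
        for path in paths:
            openers.setdefault(path, set()).add(pid)
    wanted = set(targets)
    return sorted(path for path, pids in openers.items() if pids <= wanted)


if __name__ == "__main__":
    # PIDs 12443/12445 are from this thread; PID 500 and /tmp/shared.log
    # are hypothetical, standing in for an unrelated process.
    table = {
        12443: {"/dev/ttyp7"},
        12445: {"/dev/ttyp7", "/tmp/shared.log"},
        500: {"/tmp/shared.log"},
    }
    print(files_only_opened_by(table, {12443, 12445}))  # ['/dev/ttyp7']
```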

If all else fails, it should be possible to do in a process using the
user-level kernel debugger, `/etc/scodb -w`; let's go there after other
ideas fail...

>Bela<
filbo 1/14/2011 5:36:13 AM

Brian K. White typed (on Thu, Jan 13, 2011 at 09:23:59PM -0500):
| On 1/13/2011 7:15 PM, Jean-Pierre Radley wrote:
| >On a customer's OSR 5.0.6 machine (patched with 506a),
| >(...)
| 
| Traverse PPID's up and kill (more gracefully) the parent X session?

No X Session involved here, Brian, that's the char-based desktop...

-- 
JP
jpr5879 1/14/2011 5:09:23 PM

Bela Lubkin typed (on Thu, Jan 13, 2011 at 09:36:13PM -0800):
| Jean-Pierre Radley wrote:
| 
| > On a customer's OSR 5.0.6 machine (patched with 506a),
| > (...)
| 
| "No such process" is peculiar.  Some shells have a built-in "kill", see
| if /bin/kill gets a different result.  I expect it still won't work, but
| I hope to see the real errno.

/bin/kill -9 produces the same "No such process" message for each process.
There's also a /usr/gnu/bin/kill on the system, and it emits the same squawk.

| Are there any other processes on those ptys or their master sides?
| `lsof /dev/?typ[47]`.  Do the processes have any interesting files open?
| `lsof -p18373,etc`.  Is there a file (such as /dev/ttyp4) opened only by
| processes you want to kill, that you can `fuser -k`?  That may succeed
| or may at least get a good error message.

Lsof has similar results for either tty, so here's just for p7:

customer:# lsof /dev/?typ7
scosh   12443 custgaw    0u   CHR   58,7  0t37391 35216 /dev/ttyp7
scosh   12443 custgaw    1u   CHR   58,7  0t37391 35216 /dev/ttyp7
scosh   12443 custgaw    2u   CHR   58,7  0t37391 35216 /dev/ttyp7
desktop 12445 custgaw    0u   CHR   58,7  0t37391 35216 /dev/ttyp7
desktop 12445 custgaw    1u   CHR   58,7  0t37391 35216 /dev/ttyp7
desktop 12445 custgaw    2u   CHR   58,7  0t37391 35216 /dev/ttyp7
desktop 12445 custgaw    3r   CHR   58,7      0t0 35216 /dev/ttyp7

customer:# lsof -p 12443 -p 12445
scosh   12443 custgaw  cwd    DIR   1,42     3072 117417 / (/dev/root)
scosh   12443 custgaw  txt    REG   1,42    59892    291 /opt/K/SCO/Unix/5.0.6Ga/bin/sh
scosh   12443 custgaw  ltx    REG   1,42   562076   8603 /opt/K/SCO/Unix/5.0.6Ga/usr/lib/libc.so.1
scosh   12443 custgaw    0u   CHR   58,7  0t37391  35216 /dev/ttyp7
scosh   12443 custgaw    1u   CHR   58,7  0t37391  35216 /dev/ttyp7
scosh   12443 custgaw    2u   CHR   58,7  0t37391  35216 /dev/ttyp7
scosh   12443 custgaw   59r   REG   1,42     1889   7960 / (/dev/root)
desktop 12445 custgaw  cwd    DIR   1,42     3072 117417 / (/dev/root)
desktop 12445 custgaw  txt    REG   1,42   319688   7964 / (/dev/root)
desktop 12445 custgaw  ltx    REG   1,42   562076   8603 /opt/K/SCO/Unix/5.0.6Ga/usr/lib/libc.so.1
desktop 12445 custgaw  ltx    REG   1,42   240228  49459 /opt/K/SCO/Unix/5.0.6Ga/usr/lib/libsocket.so.1
desktop 12445 custgaw    0u   CHR   58,7  0t37391  35216 /dev/ttyp7
desktop 12445 custgaw    1u   CHR   58,7  0t37391  35216 /dev/ttyp7
desktop 12445 custgaw    2u   CHR   58,7  0t37391  35216 /dev/ttyp7
desktop 12445 custgaw    3r   CHR   58,7      0t0  35216 /dev/ttyp7
desktop 12445 custgaw    4r   REG   1,42   139405  27724 / (/dev/root)
desktop 12445 custgaw    5r   REG   1,42     1553  27722 / (/dev/root)
desktop 12445 custgaw    6u  FIFO   1,42    0t297 148387 / (/dev/root)
desktop 12445 custgaw    7u   REG   1,42       81 123801 / (/dev/root)
desktop 12445 custgaw    8r   REG   1,42       30  27701 / (/dev/root)
desktop 12445 custgaw    9r   REG   1,42     8138  27723 / (/dev/root)

| If all else fails, it should be possible to do in a process using the
| user-level kernel debugger, `/etc/scodb -w`; let's go there after other
| ideas fail...
| 
| >Bela<

-- 
JP
jpr5879 1/14/2011 6:08:25 PM

On 1/14/2011 12:09 PM, Jean-Pierre Radley wrote:
> Brian K. White typed (on Thu, Jan 13, 2011 at 09:23:59PM -0500):
> | On 1/13/2011 7:15 PM, Jean-Pierre Radley wrote:
> |>On a customer's OSR 5.0.6 machine (patched with 506a),
> |>there are some unkillable processes:
> |>(...)
> |
> | Traverse PPID's up and kill (more gracefully) the parent X session?
>
> No X Session involved here, Brian, that's the char-based desktop...
>

Bah, I confused scosh with scoterm, sorry.

-- 
bkw
brian109 1/14/2011 7:01:08 PM

Jean-Pierre Radley wrote:
> On a customer's OSR 5.0.6 machine (patched with 506a),
(...)
> customer:# top | grep usergaw
> 18375 usergaw     20    4  1352K   256K run    32.7H  desktop
> 12445 usergaw     20    4  1352K   256K run    32.5H  desktop

Sorry for going off-topic, but how can I get information on the memory a
process is using in OpenServer 5.0.7 with the stock out-of-the-box
utilities (i.e., without installing "top" from Skunkware)?
pepe5 1/15/2011 12:05:09 PM

In article <4d318d94$1@news.x-privat.org>, Pepe  <pepe@naleco.com> wrote:
>Sorry for the off-topic, but how could I get info on the memory a 
>process is using in OpenServer 5.0.7, with the stock out-of-the-box 
>utilities? (i.e., without installing "top" from Skunkware.)

ps's "size" and "vsz" data selectors will give you information about process
memory use.  See the ps man page for information on using them.
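Here is a Python sketch of what consuming that output might look like. It parses a listing whose columns are PID, VSZ and command line, the kind of output a POSIX-style `ps -o pid,vsz,args` produces. Whether OpenServer's ps uses exactly those keywords, and reports VSZ in kilobytes, is an assumption; check the man page John mentions. `parse_ps_vsz` is a hypothetical helper:

```python
def parse_ps_vsz(text):
    """Parse `ps -o pid,vsz,args`-style output into {pid: (vsz_kb, args)}.

    Assumes the first line is a header and that VSZ is a plain integer
    in kilobytes, as POSIX specifies; both are assumptions about the
    OpenServer ps, not verified here.
    """
    result = {}
    for line in text.splitlines()[1:]:
        fields = line.split(None, 2)  # the args column may contain spaces
        if len(fields) < 3:
            continue
        pid, vsz, args = fields
        result[int(pid)] = (int(vsz), args)
    return result


if __name__ == "__main__":
    # Made-up sample text, only to show the expected input shape.
    sample = "  PID    VSZ COMMAND\n  101   1024 example -x\n"
    print(parse_ps_vsz(sample))  # {101: (1024, 'example -x')}
```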

	John
-- 
John DuBois  spcecdt@armory.com  KC6QKZ/AE  http://www.armory.com/~spcecdt/
spcecdt2 1/15/2011 5:09:09 PM

Jean-Pierre Radley wrote:

> | Are there any other processes on those ptys or their master sides?
> | `lsof /dev/?typ[47]`.  Do the processes have any interesting files open?
> | `lsof -p18373,etc`.  Is there a file (such as /dev/ttyp4) opened only by
> | processes you want to kill, that you can `fuser -k`?  That may succeed
> | or may at least get a good error message.
>
> Lsof has similar results for either tty, so here's just for p7:
>
> customer:# lsof /dev/?typ7
> scosh   12443 custgaw    0u   CHR   58,7  0t37391 35216 /dev/ttyp7
> scosh   12443 custgaw    1u   CHR   58,7  0t37391 35216 /dev/ttyp7
> scosh   12443 custgaw    2u   CHR   58,7  0t37391 35216 /dev/ttyp7
> desktop 12445 custgaw    0u   CHR   58,7  0t37391 35216 /dev/ttyp7
> desktop 12445 custgaw    1u   CHR   58,7  0t37391 35216 /dev/ttyp7
> desktop 12445 custgaw    2u   CHR   58,7  0t37391 35216 /dev/ttyp7
> desktop 12445 custgaw    3r   CHR   58,7      0t0 35216 /dev/ttyp7

Ok, those are 2 of the processes you're trying to kill, so:

   # fuser -k /dev/ttyp7  # and ttyp4

... any joy?  Probably not, but worth a try.

Notice there's nobody with ptyp7 open.  These are the old Berkeley style
ptys with separate master/slave, not /dev/pts/* and the single /dev/ptmx
master.  The spinning process is almost certainly spinning on some sort
of pty I/O.  It should die very neatly if killed -- the real trouble
here seems to be that the kernel is unable to identify the processes by
PID.

But even that doesn't make sense, because `ps` does operations by PID
(sysi86(RDUBLK)) which would fail...
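With these Berkeley-style ptys the naming rule is simple: slave `/dev/ttypN` pairs with master `/dev/ptypN`, and the two names differ only in their first letter. Below is a small Python sketch of that mapping. It assumes the classic five-character BSD names seen in this thread, and `master_for_slave` is a hypothetical helper:

```python
import os.path


def master_for_slave(slave):
    """Map a Berkeley pty slave name to its master, e.g. ttyp7 -> ptyp7.

    Assumes classic BSD naming: slave "/dev/ttyXN", master "/dev/ptyXN",
    where X is a letter.  Anything else raises ValueError, including
    /dev/pts/N slaves, which have no per-slave master node (they share
    the single /dev/ptmx master) and serial lines such as tty1a.
    """
    directory, name = os.path.split(slave)
    if not (name.startswith("tty") and len(name) == 5 and name[3].isalpha()):
        raise ValueError("not a Berkeley pty slave: %r" % slave)
    return os.path.join(directory, "p" + name[1:])


if __name__ == "__main__":
    print(master_for_slave("/dev/ttyp7"))  # /dev/ptyp7
```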

Anyway, after `fuser` fails, try opening the master side of those ptys.
This will also probably fail.  There's also a small chance they'll hang;
put the action in a subshell so your working shell doesn't hang:

   # (echo hey > /dev/ptyp7) &
   # (read whazzup < /dev/ptyp7) &

If they do hang they'll be non-CPU-spinning and probably killable.
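The subshell-plus-`&` trick isolates an open() that might block, so the interactive shell stays usable. Here is the same idea as a Python sketch, assuming a Python interpreter is available to run a child process. `probe_open` and its "hung" result are hypothetical conventions, and the mode letters only roughly stand in for the `echo >` and `read <` redirections above:

```python
import subprocess
import sys

# Child program: try to open argv[1] and print "ok" or the errno name.
_CHILD = r"""
import errno, os, sys
flags = os.O_WRONLY if sys.argv[2] == "w" else os.O_RDONLY
try:
    os.close(os.open(sys.argv[1], flags))
    print("ok")
except OSError as exc:
    print(errno.errorcode.get(exc.errno, str(exc.errno)))
"""


def probe_open(path, mode="r", timeout=5.0):
    """Open `path` in a child process so a hang can't block the caller.

    mode "w" roughly mimics `echo hey > path`, "r" mimics `read x < path`.
    Returns "ok", an errno name such as "EIO", or "hung" if the child
    did not finish within `timeout` seconds (subprocess.run kills it).
    """
    try:
        done = subprocess.run(
            [sys.executable, "-c", _CHILD, path, mode],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "hung"
    return done.stdout.strip()


if __name__ == "__main__":
    print(probe_open("/dev/null", "w"))  # ok
```

A separate process is used instead of a thread because Python cannot cancel a thread stuck in open(), while a child process can usually be killed once the timeout expires.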

If you get this far without any improvement, see whether `trace -p 12445`
and `truss -p 12445` do anything interesting.  These may spew a lot of
output, though the more likely outcome is that they'll just fail.
(These are to be done on the process of each pair which has tons of CPU
time.)  Post the error msgs if they fail, or a summary of output if they
don't (you'll probably have to interrupt if they don't fail.)

This is such an odd situation, it's going to be helpful if you post
exact commands and exact error msgs of everything that happens.

If none of this has any effect, we can kill with scodb...

>Bela<
filbo 1/16/2011 12:44:40 PM

Bela Lubkin typed (on Sun, Jan 16, 2011 at 04:44:40AM -0800):
| Jean-Pierre Radley wrote:
| 
| > | Are there any other processes on those ptys or their master sides?
| > | `lsof /dev/?typ[47]`.  Do the processes have any interesting files open?
| > | `lsof -p18373,etc`.  Is there a file (such as /dev/ttyp4) opened only by
| > | processes you want to kill, that you can `fuser -k`?  That may succeed
| > | or may at least get a good error message.
| >
| > Lsof has similar results for either tty, so here's just for p7:
| >
| > customer:# lsof /dev/?typ7
| > scosh   12443 custgaw    0u   CHR   58,7  0t37391 35216 /dev/ttyp7
| > scosh   12443 custgaw    1u   CHR   58,7  0t37391 35216 /dev/ttyp7
| > scosh   12443 custgaw    2u   CHR   58,7  0t37391 35216 /dev/ttyp7
| > desktop 12445 custgaw    0u   CHR   58,7  0t37391 35216 /dev/ttyp7
| > desktop 12445 custgaw    1u   CHR   58,7  0t37391 35216 /dev/ttyp7
| > desktop 12445 custgaw    2u   CHR   58,7  0t37391 35216 /dev/ttyp7
| > desktop 12445 custgaw    3r   CHR   58,7      0t0 35216 /dev/ttyp7
| 
| Ok, those are 2 of the processes you're trying to kill, so:
| 
|    # fuser -k /dev/ttyp7  # and ttyp4
| 
| ... any joy?  Probably not, but worth a try.

Changes nothing...

| Notice there's nobody with ptyp7 open.  These are the old Berkeley style
| ptys with separate master/slave, not /dev/pts/* and the single /dev/ptmx
| master.  The spinning process is almost certainly spinning on some sort
| of pty I/O.  It should die very neatly if killed -- the real trouble
| here seems to be that the kernel is unable to identify the processes by
| PID.
| 
| But even that doesn't make sense, because `ps` does operations by PID
| (sysi86(RDUBLK)) which would fail...
| 
| Anyway, after `fuser` fails, try opening the master side of those ptys.
| This will also probably fail.  There's also a small chance they'll hang;
| put the action in a subshell so your working shell doesn't hang:
| 
|    # (echo hey > /dev/ptyp7) &
|    # (read whazzup < /dev/ptyp7) &
| 
| If they do hang they'll be non-CPU-spinning and probably killable.

Both commands barf with: "/dev/ptyp7: I/O error".

| If you get this far without any improvement, see whether `trace -p 12445`
| and `truss -p 12445` do anything interesting.  These may spew a lot of
| output, though the more likely outcome is that they'll just fail.
| (These are to be done on the process of each pair which has tons of CPU
| time.)  Post the error msgs if they fail, or a summary of output if they
| don't (you'll probably have to interrupt if they don't fail.)

Neither trace nor truss is installed on that machine, and the customer
has been rather resistant to introducing anything on a production
machine.  While they're annoyed that these unkillable beasts are
consuming resources, they haven't resorted to a reboot.  Uptime is
currently 37 days.

I won't know until tomorrow whether we can install truss or trace.

| This is such an odd situation, it's going to be helpful if you post
| exact commands and exact error msgs of everything that happens.
| 
| If none of this has any effect, we can kill with scodb...


-- 
JP
jpr5879 1/16/2011 10:17:12 PM

John DuBois wrote:
> In article <4d318d94$1@news.x-privat.org>, Pepe  <pepe@naleco.com> wrote:
> 
>>Sorry for the off-topic, but how could I get info on the memory a 
>>process is using in OpenServer 5.0.7, with the stock out-of-the-box 
>>utilities? (i.e., without installing "top" from Skunkware.)
> 
> 
> ps's "size" and "vsz" data selectors will give you information about process
> memory use.  See the ps man page for information on using them.

Thanks, that's what I was looking for.
pepe5 1/17/2011 6:20:21 PM
