On a customer's OSR 5.0.6 machine (patched with 506a),
there are some unkillable processes:
customer:# ps -u usergaw
PID TTY TIME CMD
18373 ttyp4 00:00:00 scosh
18375 ttyp4 01-08:42:02 desktop
12443 ttyp7 00:00:00 scosh
12445 ttyp7 01-08:31:09 desktop
customer:# kill -9 12443 12445 18373 18375
12443: No such process
12445: No such process
18373: No such process
18375: No such process
They do seem to be consuming resources:
customer:# top | grep usergaw
18375 usergaw 20 4 1352K 256K run 32.7H desktop
12445 usergaw 20 4 1352K 256K run 32.5H desktop
What's happening, and what's to be done?
--
JP

jpr5879 (1158)
1/14/2011 12:15:26 AM

On 1/13/2011 7:15 PM, Jean-Pierre Radley wrote:
> On a customer's OSR 5.0.6 machine (patched with 506a),
> there are some unkillable processes:
> (...)
> What's happening, and what's to be done?
>
Traverse PPIDs up and kill (more gracefully) the parent X session?
--
bkw

brian109 (760)
1/14/2011 2:23:59 AM

Jean-Pierre Radley wrote:
> On a customer's OSR 5.0.6 machine (patched with 506a),
> there are some unkillable processes:
> (...)
> customer:# kill -9 12443 12445 18373 18375
> 12443: No such process
> 12445: No such process
> 18373: No such process
> 18375: No such process
> (...)
> What's happening, and what's to be done?
"No such process" is peculiar. Some shells have a built-in "kill", see
if /bin/kill gets a different result. I expect it still won't work, but
I hope to see the real errno.
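A minimal sketch of that check, assuming a POSIX-style sh. The helper name `try_kill` and the `KILL` variable are hypothetical, and /bin/kill is only the assumed location of the external binary. A shell script cannot read errno directly. What it can show is the message the external binary prints (ESRCH is conventionally reported as "No such process") and that binary's exit status.

```shell
#!/bin/sh
# Hypothetical helper: show whether "kill" is a shell builtin, then send the
# signal with an external kill binary so the error text comes from that
# program rather than from the shell.
# KILL defaults to /bin/kill (assumed path); override it if yours differs.
# Usage: try_kill SIGNAL PID...    e.g.  try_kill 9 12443 12445
KILL=${KILL:-/bin/kill}

try_kill() {
    sig=$1
    shift
    type kill                      # e.g. "kill is a shell builtin"
    for pid in "$@"; do
        if "$KILL" -"$sig" "$pid"; then
            echo "$pid: signal $sig sent"
        else
            # $? here is the failing exit status of the kill command above
            echo "$pid: $KILL failed, exit status $?"
        fi
    done
}
```

Something like `try_kill 9 12443 12445 18373 18375` would retry the four PIDs from the ps listing, with any error text coming from the external binary.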
Are there any other processes on those ptys or their master sides?
`lsof /dev/?typ[47]`. Do the processes have any interesting files open?
`lsof -p18373,etc`. Is there a file (such as /dev/ttyp4) opened only by
processes you want to kill, that you can `fuser -k`? That may succeed
or may at least get a good error message.
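The "opened only by processes you want to kill" test can be sketched as a small function to run before `fuser -k`. This assumes the installed lsof supports `-t` (terse output, PIDs only); the name `only_held_by` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every process that lsof reports as
# holding FILE is one of the PIDs given, i.e. `fuser -k FILE` should hit
# nothing else. Assumes lsof's -t option (print PIDs only).
# A file that nobody holds passes trivially.
# Usage: only_held_by FILE PID...    e.g.  only_held_by /dev/ttyp7 12443 12445
only_held_by() {
    file=$1
    shift
    for holder in $(lsof -t "$file" 2>/dev/null); do
        case " $* " in
            *" $holder "*) ;;      # holder is on the kill list
            *) echo "$file is also held by PID $holder"
               return 1 ;;
        esac
    done
    return 0
}
```

If `only_held_by /dev/ttyp7 12443 12445` succeeds, `fuser -k /dev/ttyp7` should touch nothing but those two processes.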
If all else fails, it should be possible to do in a process using the
user-level kernel debugger, `/etc/scodb -w`; let's go there after other
ideas fail...
>Bela<

filbo (325)
1/14/2011 5:36:13 AM

Brian K. White typed (on Thu, Jan 13, 2011 at 09:23:59PM -0500):
| On 1/13/2011 7:15 PM, Jean-Pierre Radley wrote:
| >On a customer's OSR 5.0.6 machine (patched with 506a),
| >there are some unkillable processes:
| >(...)
| >What's happening, and what's to be done?
| >
|
| Traverse PPIDs up and kill (more gracefully) the parent X session?
No X Session involved here, Brian, that's the char-based desktop...
--
JP

jpr5879 (1158)
1/14/2011 5:09:23 PM

Bela Lubkin typed (on Thu, Jan 13, 2011 at 09:36:13PM -0800):
| Jean-Pierre Radley wrote:
|
| > On a customer's OSR 5.0.6 machine (patched with 506a),
| > there are some unkillable processes:
| > (...)
| > customer:# kill -9 12443 12445 18373 18375
| > 12443: No such process
| > 12445: No such process
| > 18373: No such process
| > 18375: No such process
| > (...)
| > What's happening, and what's to be done?
|
| "No such process" is peculiar. Some shells have a built-in "kill", see
| if /bin/kill gets a different result. I expect it still won't work, but
| I hope to see the real errno.
/bin/kill -9 produces the same "No such process" message for each process.
There's a /usr/gnu/bin/kill on the system, and it emits the same squawk.
| Are there any other processes on those ptys or their master sides?
| `lsof /dev/?typ[47]`. Do the processes have any interesting files open?
| `lsof -p18373,etc`. Is there a file (such as /dev/ttyp4) opened only by
| processes you want to kill, that you can `fuser -k`? That may succeed
| or may at least get a good error message.
Lsof has similar results for either tty, so here's just for p7:
customer:# lsof /dev/?typ7
scosh 12443 custgaw 0u CHR 58,7 0t37391 35216 /dev/ttyp7
scosh 12443 custgaw 1u CHR 58,7 0t37391 35216 /dev/ttyp7
scosh 12443 custgaw 2u CHR 58,7 0t37391 35216 /dev/ttyp7
desktop 12445 custgaw 0u CHR 58,7 0t37391 35216 /dev/ttyp7
desktop 12445 custgaw 1u CHR 58,7 0t37391 35216 /dev/ttyp7
desktop 12445 custgaw 2u CHR 58,7 0t37391 35216 /dev/ttyp7
desktop 12445 custgaw 3r CHR 58,7 0t0 35216 /dev/ttyp7
customer:# lsof -p 12443 -p 12445
scosh 12443 custgaw cwd DIR 1,42 3072 117417 / (/dev/root)
scosh 12443 custgaw txt REG 1,42 59892 291 /opt/K/SCO/Unix/5.0.6Ga/bin/sh
scosh 12443 custgaw ltx REG 1,42 562076 8603 /opt/K/SCO/Unix/5.0.6Ga/usr/lib/libc.so.1
scosh 12443 custgaw 0u CHR 58,7 0t37391 35216 /dev/ttyp7
scosh 12443 custgaw 1u CHR 58,7 0t37391 35216 /dev/ttyp7
scosh 12443 custgaw 2u CHR 58,7 0t37391 35216 /dev/ttyp7
scosh 12443 custgaw 59r REG 1,42 1889 7960 / (/dev/root)
desktop 12445 custgaw cwd DIR 1,42 3072 117417 / (/dev/root)
desktop 12445 custgaw txt REG 1,42 319688 7964 / (/dev/root)
desktop 12445 custgaw ltx REG 1,42 562076 8603 /opt/K/SCO/Unix/5.0.6Ga/usr/lib/libc.so.1
desktop 12445 custgaw ltx REG 1,42 240228 49459 /opt/K/SCO/Unix/5.0.6Ga/usr/lib/libsocket.so.1
desktop 12445 custgaw 0u CHR 58,7 0t37391 35216 /dev/ttyp7
desktop 12445 custgaw 1u CHR 58,7 0t37391 35216 /dev/ttyp7
desktop 12445 custgaw 2u CHR 58,7 0t37391 35216 /dev/ttyp7
desktop 12445 custgaw 3r CHR 58,7 0t0 35216 /dev/ttyp7
desktop 12445 custgaw 4r REG 1,42 139405 27724 / (/dev/root)
desktop 12445 custgaw 5r REG 1,42 1553 27722 / (/dev/root)
desktop 12445 custgaw 6u FIFO 1,42 0t297 148387 / (/dev/root)
desktop 12445 custgaw 7u REG 1,42 81 123801 / (/dev/root)
desktop 12445 custgaw 8r REG 1,42 30 27701 / (/dev/root)
desktop 12445 custgaw 9r REG 1,42 8138 27723 / (/dev/root)
| If all else fails, it should be possible to do in a process using the
| user-level kernel debugger, `/etc/scodb -w`; let's go there after other
| ideas fail...
|
| >Bela<
--
JP

jpr5879 (1158)
1/14/2011 6:08:25 PM

On 1/14/2011 12:09 PM, Jean-Pierre Radley wrote:
> Brian K. White typed (on Thu, Jan 13, 2011 at 09:23:59PM -0500):
> | On 1/13/2011 7:15 PM, Jean-Pierre Radley wrote:
> |>On a customer's OSR 5.0.6 machine (patched with 506a),
> |>there are some unkillable processes:
> |>(...)
> |>What's happening, and what's to be done?
> |>
> |
> | Traverse PPIDs up and kill (more gracefully) the parent X session?
>
> No X Session involved here, Brian, that's the char-based desktop...
>
Bah, I confused scosh with scoterm, sorry.
--
bkw

brian109 (760)
1/14/2011 7:01:08 PM

Jean-Pierre Radley wrote:
> On a customer's OSR 5.0.6 machine (patched with 506a),
(...)
> customer:# top | grep usergaw
> 18375 usergaw 20 4 1352K 256K run 32.7H desktop
> 12445 usergaw 20 4 1352K 256K run 32.5H desktop
Sorry to go off-topic, but how could I get info on the memory a
process is using in OpenServer 5.0.7, with the stock out-of-the-box
utilities? (i.e., without installing "top" from Skunkware.)

pepe5 (204)
1/15/2011 12:05:09 PM

In article <4d318d94$1@news.x-privat.org>, Pepe <pepe@naleco.com> wrote:
>Sorry to go off-topic, but how could I get info on the memory a
>process is using in OpenServer 5.0.7, with the stock out-of-the-box
>utilities? (i.e., without installing "top" from Skunkware.)
ps's "size" and "vsz" data selectors will give you information about process
memory use. See the ps man page for information on using them.
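For example, something along these lines should list those figures for one user's processes. `size` and `vsz` are the selectors named above; the helper name `mem_by_user` is hypothetical, and the units and the full set of accepted `-o` fields should be checked in the ps man page on the release in question.

```shell
#!/bin/sh
# Hypothetical helper: list PID, size, virtual size and command line for
# every process owned by LOGIN, using the "size" and "vsz" ps selectors.
# Units and column widths vary by platform; check the local ps man page.
# Usage: mem_by_user LOGIN    e.g.  mem_by_user usergaw
mem_by_user() {
    ps -u "$1" -o pid,size,vsz,args
}
```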
John
--
John DuBois spcecdt@armory.com KC6QKZ/AE http://www.armory.com/~spcecdt/

spcecdt2 (184)
1/15/2011 5:09:09 PM

Jean-Pierre Radley wrote:
> | Are there any other processes on those ptys or their master sides?
> | `lsof /dev/?typ[47]`. Do the processes have any interesting files open?
> | `lsof -p18373,etc`. Is there a file (such as /dev/ttyp4) opened only by
> | processes you want to kill, that you can `fuser -k`? That may succeed
> | or may at least get a good error message.
>
> Lsof has similar results for either tty, so here's just for p7:
>
> customer:# lsof /dev/?typ7
> scosh 12443 custgaw 0u CHR 58,7 0t37391 35216 /dev/ttyp7
> scosh 12443 custgaw 1u CHR 58,7 0t37391 35216 /dev/ttyp7
> scosh 12443 custgaw 2u CHR 58,7 0t37391 35216 /dev/ttyp7
> desktop 12445 custgaw 0u CHR 58,7 0t37391 35216 /dev/ttyp7
> desktop 12445 custgaw 1u CHR 58,7 0t37391 35216 /dev/ttyp7
> desktop 12445 custgaw 2u CHR 58,7 0t37391 35216 /dev/ttyp7
> desktop 12445 custgaw 3r CHR 58,7 0t0 35216 /dev/ttyp7
Ok, those are 2 of the processes you're trying to kill, so:
# fuser -k /dev/ttyp7 # and ttyp4
... any joy? Probably not, but worth a try.
Notice there's nobody with ptyp7 open. These are the old Berkeley style
ptys with separate master/slave, not /dev/pts/* and the single /dev/ptmx
master. The spinning process is almost certainly spinning on some sort
of pty I/O. It should die very neatly if killed -- the real trouble
here seems to be that the kernel is unable to identify the processes by
PID.
But even that doesn't make sense, because `ps` does operations by PID
(sysi86(RDUBLK)) which would fail...
Anyway, after `fuser` fails, try opening the master side of those ptys.
This will also probably fail. There's also a small chance they'll hang;
put the action in a subshell so your working shell doesn't hang:
# (echo hey > /dev/ptyp7) &
# (read whazzup < /dev/ptyp7) &
If they do hang they'll be non-CPU-spinning and probably killable.
If you get this far without any improvement, see whether `trace -p 12445`
and `truss -p 12445` do anything interesting. These may spew a lot of
output, though the more likely outcome is that they'll just fail.
(These are to be done on the process of each pair which has tons of CPU
time.) Post the error msgs if they fail, or a summary of output if they
don't (you'll probably have to interrupt if they don't fail.)
This is such an odd situation, it's going to be helpful if you post
exact commands and exact error msgs of everything that happens.
If none of this has any effect, we can kill with scodb...
>Bela<

filbo (325)
1/16/2011 12:44:40 PM

Bela Lubkin typed (on Sun, Jan 16, 2011 at 04:44:40AM -0800):
| Jean-Pierre Radley wrote:
|
| > | Are there any other processes on those ptys or their master sides?
| > | `lsof /dev/?typ[47]`. Do the processes have any interesting files open?
| > | `lsof -p18373,etc`. Is there a file (such as /dev/ttyp4) opened only by
| > | processes you want to kill, that you can `fuser -k`? That may succeed
| > | or may at least get a good error message.
| >
| > Lsof has similar results for either tty, so here's just for p7:
| >
| > customer:# lsof /dev/?typ7
| > scosh 12443 custgaw 0u CHR 58,7 0t37391 35216 /dev/ttyp7
| > scosh 12443 custgaw 1u CHR 58,7 0t37391 35216 /dev/ttyp7
| > scosh 12443 custgaw 2u CHR 58,7 0t37391 35216 /dev/ttyp7
| > desktop 12445 custgaw 0u CHR 58,7 0t37391 35216 /dev/ttyp7
| > desktop 12445 custgaw 1u CHR 58,7 0t37391 35216 /dev/ttyp7
| > desktop 12445 custgaw 2u CHR 58,7 0t37391 35216 /dev/ttyp7
| > desktop 12445 custgaw 3r CHR 58,7 0t0 35216 /dev/ttyp7
|
| Ok, those are 2 of the processes you're trying to kill, so:
|
| # fuser -k /dev/ttyp7 # and ttyp4
|
| ... any joy? Probably not, but worth a try.
Changes nothing...
| Notice there's nobody with ptyp7 open. These are the old Berkeley style
| ptys with separate master/slave, not /dev/pts/* and the single /dev/ptmx
| master. The spinning process is almost certainly spinning on some sort
| of pty I/O. It should die very neatly if killed -- the real trouble
| here seems to be that the kernel is unable to identify the processes by
| PID.
|
| But even that doesn't make sense, because `ps` does operations by PID
| (sysi86(RDUBLK)) which would fail...
|
| Anyway, after `fuser` fails, try opening the master side of those ptys.
| This will also probably fail. There's also a small chance they'll hang;
| put the action in a subshell so your working shell doesn't hang:
|
| # (echo hey > /dev/ptyp7) &
| # (read whazzup < /dev/ptyp7) &
|
| If they do hang they'll be non-CPU-spinning and probably killable.
Both commands barf with: "/dev/ptyp7: I/O error".
| If you get this far without any improvement, see whether `trace -p 12445`
| and `truss -p 12445` do anything interesting. These may spew a lot of
| output, though the more likely outcome is that they'll just fail.
| (These are to be done on the process of each pair which has tons of CPU
| time.) Post the error msgs if they fail, or a summary of output if they
| don't (you'll probably have to interrupt if they don't fail.)
Neither trace nor truss is installed on that machine, and the customer
has been rather resistant to introducing anything on a production
machine. While they're annoyed that these unkillable beasts are
consuming resources, they've not resorted to a reboot. Uptime is just
now at 37 days.
I won't know until tomorrow whether truss or trace can be installed.
| This is such an odd situation, it's going to be helpful if you post
| exact commands and exact error msgs of everything that happens.
|
| If none of this has any effect, we can kill with scodb...
--
JP

jpr5879 (1158)
1/16/2011 10:17:12 PM

John DuBois wrote:
> In article <4d318d94$1@news.x-privat.org>, Pepe <pepe@naleco.com> wrote:
>
>>Sorry to go off-topic, but how could I get info on the memory a
>>process is using in OpenServer 5.0.7, with the stock out-of-the-box
>>utilities? (i.e., without installing "top" from Skunkware.)
>
>
> ps's "size" and "vsz" data selectors will give you information about process
> memory use. See the ps man page for information on using them.
Thanks, that's what I was looking for.

pepe5 (204)
1/17/2011 6:20:21 PM