Trying to grok Sys V message queue overflow behavior

RHEL4 / SuSE 9.3 / x86

/proc/sys/kernel/msgmnb	= 204800000
/proc/sys/kernel/msgmni	= 16
/proc/sys/kernel/msgmax	= 8192

Summary:

Two processes communicating via IPC (Sys V message queues).  Proc_1 
msgsnd()'s messages on two discrete message queues.  There are only 
two types of message; one is 88 bytes, the other, 16.  One type is 
exclusively written to Q1, the other is exclusively written to Q2.

Proc_2 msgrcv()'s messages from both queues and passes them to 
another process (Proc_3) via TCP. I get the same behavior regardless 
of whether Proc_3 is local or remote.
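
For concreteness, the send side looks roughly like this (a sketch
with stand-in struct layouts and key handling, not my real code):

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* Stand-in message layouts -- illustrative only.  Payloads are
 * padded so the structs come out at the 88- and 16-byte figures
 * above (long is 4 bytes on 32-bit x86). */
struct big_msg   { long mtype; char payload[84]; }; /* 88 bytes, Q1 only */
struct small_msg { long mtype; char payload[12]; }; /* 16 bytes, Q2 only */

/* Proc_1's pattern: each message type goes exclusively to its own queue. */
void send_one_of_each(int q1, int q2)
{
    struct big_msg   b = { 1, {0} };
    struct small_msg s = { 1, {0} };

    msgsnd(q1, &b, sizeof b.payload, 0);
    msgsnd(q2, &s, sizeof s.payload, 0);
}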

What I'm trying to grok:

As part of failure case testing, I simulate Proc_3 failing by hitting 
it with a SIGSTOP.  This lets the two message queues fill up.  Q1 
hits the 204MB limit, and the host continues to run.  Q2 continues to 
grow, and at a consistent point the host becomes unresponsive (ssh 
consoles stop responding, no new TCP connections are accepted, yet 
pings are responded to):

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x00001f58 0          root       600        0            0
0x00001bee 32769      jdoe       666        204799936    2327272
0x00001d7e 65538      jdoe       666        125599696    7849981

The out-of-memory manager will sometimes, but not always, start 
killing off processes.

1) I understand msgmnb to be the system-wide limit on the number of 
bytes across all message queues.  That value is being exceeded by a 
rather large amount (125MB beyond the 204MB setting).  Is this not a 
hard limit?

2) /proc/meminfo doesn't show what I recognize as a normal loss of 
memory resources*.  I don't see any indicator of impending system 
crash.  What resource, exactly, is being exhausted -- and how can I 
get a look at it from within Proc_1 to prevent creating a crash 
condition?

*
MemTotal:      3956848 kB
MemFree:       2990244 kB
Buffers:          1728 kB
Cached:          75372 kB
SwapCached:          0 kB
Active:          30124 kB
Inactive:        61524 kB
HighTotal:     3080168 kB
HighFree:      2984556 kB
LowTotal:       876680 kB
LowFree:          5688 kB
SwapTotal:     1048568 kB
SwapFree:      1048568 kB
Dirty:              44 kB
Writeback:           0 kB
Mapped:          26644 kB
Slab:           861468 kB
Committed_AS:    56992 kB
PageTables:        472 kB
VmallocTotal:   112632 kB
VmallocUsed:      4196 kB
VmallocChunk:   108212 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB

Happy to RTFM / RTFW if someone knows a good resource.
8/2/2007 8:42:01 PM

On 2 Aug, 21:42, Richard Eich <richard.e...@domain.invalid> wrote:
> RHEL4 / SuSE 9.3 / x86
>
> /proc/sys/kernel/msgmnb = 204800000
> /proc/sys/kernel/msgmni = 16
> /proc/sys/kernel/msgmax = 8192
>
> Summary:
>
> Two processes communicating via IPC (Sys V message queues).  Proc_1
> msgsnd()'s messages on two discrete message queues.  There are only
> two types of message; one is 88 bytes, the other, 16.  One type is
> exclusively written to Q1, the other is exclusively written to Q2.
>
> Proc_2 msgrcv()'s messages from both queues and passes them to
> another process (Proc_3) via TCP. I get the same behavior regardless
> of whether Proc_3 is local or remote.
>
> What I'm trying to grok:
>
> As part of failure case testing, I simulate Proc_3 failing by hitting
> it with a SIGSTOP.  This lets the two message queues fill up.

Did you mean Proc_2 here?

> Q1
> hits the 204MB limit, and the host continues to run.  Q2 continues to
> grow, and at a consistent point the host becomes unresponsive (ssh
> consoles stop responding, no new TCP connections are accepted, yet
> pings are responded to):

[...]

> 2) /proc/meminfo doesn't show what I recognize as a normal loss of
> memory resources*.  I don't see any indicator of impending system
> crash.  What resource, exactly, is being exhausted -- and how can I
> get a look at it from within Proc_1 to prevent creating a crash
> condition?

Did you try using msgctl() to get the number of messages currently on
the queue, or to set the queue's memory usage limit?
http://www.opengroup.org/onlinepubs/009695399/basedefs/sys/msg.h.html
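
Something along these lines (an untested sketch; qid is your queue id):

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <stdio.h>

/* Untested sketch: report a queue's depth and byte counts, then
   (optionally) lower its per-queue byte limit.  msg_cbytes is a
   Linux extension to struct msqid_ds. */
int inspect_queue(int qid)
{
    struct msqid_ds ds;

    if (msgctl(qid, IPC_STAT, &ds) == -1)
        return -1;

    printf("msg_qnum   = %lu\n", (unsigned long) ds.msg_qnum);   /* messages on queue */
    printf("msg_cbytes = %lu\n", (unsigned long) ds.msg_cbytes); /* bytes currently on queue */
    printf("msg_qbytes = %lu\n", (unsigned long) ds.msg_qbytes); /* per-queue byte limit */

    /* Lowering msg_qbytes needs no privilege; raising it past the
       system default requires CAP_SYS_RESOURCE on Linux. */
    ds.msg_qbytes = 1024 * 1024;
    return msgctl(qid, IPC_SET, &ds);
}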


8/2/2007 9:30:09 PM
maxim.yegorushkin@gmail.com wrote...
> On 2 Aug, 21:42, Richard Eich <richard.e...@domain.invalid> wrote:
> > RHEL4 / SuSE 9.3 / x86
> >
> > /proc/sys/kernel/msgmnb = 204800000
> > /proc/sys/kernel/msgmni = 16
> > /proc/sys/kernel/msgmax = 8192
> >
> > Summary:
> >
> > Two processes communicating via IPC (Sys V message queues).  Proc_1
> > msgsnd()'s messages on two discrete message queues.  There are only
> > two types of message; one is 88 bytes, the other, 16.  One type is
> > exclusively written to Q1, the other is exclusively written to Q2.
> >
> > Proc_2 msgrcv()'s messages from both queues and passes them to
> > another process (Proc_3) via TCP. I get the same behavior regardless
> > of whether Proc_3 is local or remote.
> >
> > What I'm trying to grok:
> >
> > As part of failure case testing, I simulate Proc_3 failing by hitting
> > it with a SIGSTOP.  This lets the two message queues fill up.
> 
> Did you mean Proc_2 here?

No, although actually stopping Proc_2 or Proc_3 would have the same 
result with respect to the message queues:  Proc_2 currently blocks 
on write() to the SIGSTOP'ed Proc_3, so Proc_2 is acting as if 
stopped anyway.

> > Q1
> > hits the 204MB limit, and the host continues to run.  Q2 continues to
> > grow, and at a consistent point the host becomes unresponsive (ssh
> > consoles stop responding, no new TCP connections are accepted, yet
> > pings are responded to):
> 
> [...]
> 
> > 2) /proc/meminfo doesn't show what I recognize as a normal loss of
> > memory resources*.  I don't see any indicator of impending system
> > crash.  What resource, exactly, is being exhausted -- and how can I
> > get a look at it from within Proc_1 to prevent creating a crash
> > condition?
> 
> Did you try using msgctl() to get the number of messages currently on
> the queue, or to set the queue's memory usage limit?

I certainly could use msgctl() for that.

I thought, say, 'echo 204800000 > /proc/sys/kernel/msgmnb' sets the 
queue memory usage limit?  If so, I'm curious as to why that limit is 
so freely violated to the point of crashing.

> http://www.opengroup.org/onlinepubs/009695399/basedefs/sys/msg.h.html

Thanks.

8/2/2007 9:50:47 PM
Richard Eich <richard.eich@domain.invalid> writes:

>
>I thought, say, 'echo 204800000 > /proc/sys/kernel/msgmnb' sets the 
>queue memory usage limit?  If so, I'm curious as to why that limit is 
>so freely violated to the point of crashing.

From the code, it appears Linux (2.6.9-22.EL) applies the
msgmnb value per queue, not system-wide.  This matches the behavior
you observed, with the first queue throttled at 204MB.

include/linux/msg.h:
#define MSGMNB 16384   /* <= INT_MAX */   /* default max size of a message queue */

See also the function 'newque' in ipc/msg.c.
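
You can confirm it from userspace, too.  A minimal sketch -- each
freshly created queue should report msg_qbytes equal to your msgmnb
setting if the limit really is applied per queue:

#include <sys/ipc.h>
#include <sys/msg.h>
#include <stdio.h>

int main(void)
{
    /* Create two throwaway private queues and compare their limits. */
    int q1 = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    int q2 = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    struct msqid_ds ds;

    msgctl(q1, IPC_STAT, &ds);
    printf("q1 msg_qbytes = %lu\n", (unsigned long) ds.msg_qbytes);
    msgctl(q2, IPC_STAT, &ds);
    printf("q2 msg_qbytes = %lu\n", (unsigned long) ds.msg_qbytes);

    msgctl(q1, IPC_RMID, NULL);
    msgctl(q2, IPC_RMID, NULL);
    return 0;
}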

scott
8/2/2007 10:38:18 PM
scott@slp53.sl.home wrote...
> Richard Eich <richard.eich@domain.invalid> writes:
> 
> >I thought, say, 'echo 204800000 > /proc/sys/kernel/msgmnb' sets the 
> >queue memory usage limit?  If so, I'm curious as to why that limit is 
> >so freely violated to the point of crashing.
> 
> From the code, it appears Linux (2.6.9-22.EL) applies the
> msgmnb value per queue, not system-wide.  This matches the behavior
> you observed, with the first queue throttled at 204MB.
> 
> include/linux/msg.h:
> #define MSGMNB 16384   /* <= INT_MAX */   /* default max size of a message queue */
> 
> See also the function 'newque' in ipc/msg.c.

Yep, it looks like you're correct.  Thank you.

I'd love to know what system resource is being exhausted, so that I 
could measure _that_ as the determinant of whether or not to let 
Proc_1 msgsnd() to its queues.  Otherwise, I think I'm stuck with 
setting a message-number ceiling in a configuration file, which works 
but seems a little clumsy, and is not quite a guarantee that Proc_1 
won't exhaust that system resource anyway.
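
For now I'll probably guard the sends with a soft cap checked via
msgctl() before each msgsnd().  A sketch, where SOFT_CAP_BYTES
stands in for a ceiling from my configuration file, well below
msgmnb:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <errno.h>

#define SOFT_CAP_BYTES (64 * 1024 * 1024) /* made-up configured ceiling */

/* Sketch: refuse to enqueue once the queue's current byte count
   crosses the soft cap, rather than letting it grow toward msgmnb.
   msg_cbytes is a Linux extension to struct msqid_ds. */
int guarded_msgsnd(int qid, const void *msgp, size_t msgsz)
{
    struct msqid_ds ds;

    if (msgctl(qid, IPC_STAT, &ds) == -1)
        return -1;

    if (ds.msg_cbytes + msgsz > SOFT_CAP_BYTES) {
        errno = EAGAIN; /* caller decides: drop, retry, or spill to disk */
        return -1;
    }
    return msgsnd(qid, msgp, msgsz, IPC_NOWAIT);
}

Polling IPC_STAT costs a syscall per message, but it keeps the
decision inside Proc_1, which is where I want it.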

I would have thought the system resource would have been free memory, 
but (from a /proc/meminfo taken within 3 seconds of a crash) memory 
seems to be OK:

MemTotal:      3956848 kB
MemFree:       2990244 kB
Buffers:          1728 kB
Cached:          75372 kB
SwapCached:          0 kB
Active:          30124 kB
Inactive:        61524 kB
HighTotal:     3080168 kB
HighFree:      2984556 kB
LowTotal:       876680 kB
LowFree:          5688 kB
SwapTotal:     1048568 kB
SwapFree:      1048568 kB
Dirty:              44 kB
Writeback:           0 kB
Mapped:          26644 kB
Slab:           861468 kB
Committed_AS:    56992 kB
PageTables:        472 kB
VmallocTotal:   112632 kB
VmallocUsed:      4196 kB
VmallocChunk:   108212 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB




8/3/2007 12:31:15 PM