[9fans] Lock loop in malloc()


While attempting to compile Bison (yeah, still gnawing at that
bone!) I have managed to jam cpp more or less solid.  That's compiling
scan-code-c.c which reduces to compiling scan-code.c.

However, it does not seem to be Bison that's at fault: it seems that an
invocation of alloc() tries to set a lock and never succeeds or gives up.

This is a summary, with some help from acid, subject to some very limited
knowledge on my part:

	term% acid 3208
	/proc/3208/text:386 plan 9 executable
	/sys/lib/acid/port
	/sys/lib/acid/386
	acid: stk()
	sleep()+0x7 /sys/src/libc/9syscall/sleep.s:5
	lock(lk=0x1f2f8)+0xb7 /sys/src/libc/port/lock.c:25
	plock(p=0x18310)+0x16 /sys/src/libc/port/malloc.c:81
	poolalloc(p=0x18310,n=0x20)+0xf /sys/src/libc/port/pool.c:1223
	malloc(size=0x18)+0x1c /sys/src/libc/port/malloc.c:207
	domalloc(size=0x18)+0xf /sys/src/cmd/cpp/cpp.c:271
	lookup(tp=0xab4a8,install=0x1)+0x74 /sys/src/cmd/cpp/nlist.c:213
	dodefine(trp=0xdfffeeac)+0x40 /sys/src/cmd/cpp/macro.c:23
	control(trp=0xdfffeeac)+0x4b2 /sys/src/cmd/cpp/cpp.c:133
	process(trp=0xdfffeeac)+0xec /sys/src/cmd/cpp/cpp.c:70
	main(argc=0xb,argv=0xdfffef0c)+0x8a /sys/src/cmd/cpp/cpp.c:35
	_main+0x31 /sys/src/libc/386/main9.s:16
	acid: lstk()
	sleep()+0x7 /sys/src/libc/9syscall/sleep.s:5
	lock(lk=0x1f2f8)+0xb7 /sys/src/libc/port/lock.c:25
		i=0x3e8
	plock(p=0x18310)+0x16 /sys/src/libc/port/malloc.c:81
		pv=0x1f2f8
	poolalloc(p=0x18310,n=0x20)+0xf /sys/src/libc/port/pool.c:1223
		v=0x1928
	malloc(size=0x18)+0x1c /sys/src/libc/port/malloc.c:207
		v=0x18310
	domalloc(size=0x18)+0xf /sys/src/cmd/cpp/cpp.c:271
		p=0x0
	lookup(tp=0xab4a8,install=0x1)+0x74 /sys/src/cmd/cpp/nlist.c:213
		h=0x6f
		np=0x1
	dodefine(trp=0xdfffeeac)+0x40 /sys/src/cmd/cpp/macro.c:23
		dots=0x0
		tp=0xab4a8
		np=0x27100
		args=0x3e
		narg=0x204b6
		err=0x6
		atp=0x27140
		def=0x3876
		tap=0x204b6
	control(trp=0xdfffeeac)+0x4b2 /sys/src/cmd/cpp/cpp.c:133
		tp=0xab498
		np=0x27100
	process(trp=0xdfffeeac)+0xec /sys/src/cmd/cpp/cpp.c:70
		anymacros=0x80000020
	main(argc=0xb,argv=0xdfffef0c)+0x8a /sys/src/cmd/cpp/cpp.c:35
		ebuf=0x3a707063
		tr=0xab498
	_main+0x31 /sys/src/libc/386/main9.s:16
	acid: 

I've no idea how to track this problem down, let alone fix it.  But this
problem is reproducible, albeit not using a small code base.  It is
mildly possible that my Plan 9 installation is not altogether pristine
and is causing this situation, but I can't think how.

++L

Reply lucio (1062) 7/25/2011 2:01:34 PM

could you snap(4) this process and mail me/put on sources the
compressed snap?  it's not really possible for this lock to be
held unless cpp has stepped on its lock and the resulting garbage
makes it look like the lock is set.

if you want to try some things yourself, i'm going to run
	; 8c -a /sys/src/cmd/cpp/macro.c > cpp.acid
	; acid -lcpp.acid $pid
	; (Lock)0x1f2f8
	; dump(0x1f2f8, 16, "\X")

to start off with and consider what to do next based on
the results.

- erik

Reply quanstro3716 (244) 7/25/2011 2:42:29 PM


> However, it does not seem to be Bison that's at fault: it seems that an
> invocation of alloc() tries to set a lock and never succeeds or gives up.

It's possible that you've found a latent bug in malloc.
However, that malloc has been running along pretty
steadily for a decade at this point, so it wouldn't be
my first guess.  My first guess would be that something
in Bison or in the code you added has corrupted memory,
so that the lock has been overwritten with garbage and
therefore cannot be acquired.

The address passed to lock - 0x1f2f8 in the trace -
should be the address of the symbol sbrkmempriv.
I assume it will be, but check (if not, there's other
memory corruption).  Assuming it is, that's in the bss
so the most likely culprits for corruption are the
symbols near it: run nm | sort and look around.
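To make the failure mode concrete, here is a minimal single-threaded sketch in portable C of how an unchecked write into a neighbouring table can leave a lock looking permanently held. Everything in it is hypothetical (the Image layout, tryacquire, fill, and the garbage value); it is not the Plan 9 lock or pool code, only a model of "0 means free, anything else means held".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a spin lock: 0 means free, nonzero means held. */
typedef struct Lock Lock;
struct Lock {
	int	val;
};

/* Hypothetical layout: a fixed-size table sitting just before a lock. */
typedef struct Image Image;
struct Image {
	int	table[4];
	Lock	lk;
};

/*
 * Single-threaded model of a bounded acquire: returns 1 if the lock
 * was free and is now held, 0 if it still looked held after tries polls.
 */
int
tryacquire(Lock *l, int tries)
{
	int i;

	for(i = 0; i < tries; i++)
		if(l->val == 0){
			l->val = 1;
			return 1;
		}
	return 0;
}

/*
 * Store n ints of value v starting at table[0], with no check against
 * the size of the table.  Writing through a byte pointer to the whole
 * Image keeps the overrun inside one object, so the sketch itself
 * stays well defined.
 */
void
fill(Image *im, int n, int v)
{
	unsigned char *base;
	int i;

	assert(n >= 0 && (size_t)n * sizeof(int) <= sizeof(Image));
	base = (unsigned char*)im + offsetof(Image, table);
	for(i = 0; i < n; i++)
		memcpy(base + i*sizeof(int), &v, sizeof v);
}
```

With a zeroed Image, fill(&im, 4, v) stays inside the table and tryacquire succeeds. fill(&im, 5, v) with a nonzero v spills one int into lk, and tryacquire then fails no matter how many polls it is given, which is the shape of the hang in the trace.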

Another thing to do would be to take the bison code
you are compiling to a Linux box and run it under
valgrind.

Russ

Reply rsc (1807) 7/25/2011 2:43:48 PM

On Mon, Jul 25, 2011 at 10:42:11AM -0400, Russ Cox wrote:
> 
> > However, it does not seem to be Bison that's at fault: it seems that an
> > invocation of alloc() tries to set a lock and never succeeds or gives up.
> 
> It's possible that you've found a latent bug in malloc.
> However, that malloc has been running along pretty
> steadily for a decade at this point, so it wouldn't be
> my first guess.  My first guess would be that something
> in Bison or in the code you added has corrupted memory,
> so that the lock has been overwritten with garbage and
> therefore cannot be acquired.
> 
Well, there has to be a problem somewhere.  I agree that malloc() is used
too extensively in Plan 9 to reveal a fault only now.  The same may
be said of cpp, but it's more likely that something evil has been lurking
in there.  I really hope that it is not something I have done that causes
the problem, but I really can't see how that would be possible without
cpp's cooperation.

> The address passed to lock - 0x1f2f8 in the trace -
> should be the address of the symbol sbrkmempriv.
> I assume it will be, but check (if not, there's other
> memory corruption).  Assuming it is, that's in the bss
> so the most likely culprits for corruption are the
> symbols near it: run nm | sort and look around.
> 
Following Erik's direction, it seems that the lock value is 0x0deadead,
so I will start with the premise that a problem has been detected, but
not fatally.  I'll need to dig into cpp, then.  Are there known limits
in cpp's input sizes?

> Another thing to do would be to take the bison code
> you are compiling to a Linux box and run it under
> valgrind.
> 
I have heard good reports regarding valgrind, but it is totally foreign
to me; I'll resort to that when I have no alternative left.  Thanks for
the advice, and please forgive me for not following it immediately.

++L

Reply lucio (1062) 7/25/2011 3:19:02 PM

well, this was a fun little bug.  i downloaded bison and within a few
minutes i'd narrowed the problem down to lib/c-ctype.h.  and
it only took another minute to isolate this as the problem statement.

#if (' ' == 32) && ('!' == 33) && ('"' == 34) && ('#' == 35) \
    && ('%' == 37) && ('&' == 38) && ('\'' == 39) && ('(' == 40) \
    && (')' == 41) && ('*' == 42) && ('+' == 43) && (',' == 44) \
    && ('-' == 45) && ('.' == 46) && ('/' == 47) && ('0' == 48) \
    && ('1' == 49) && ('2' == 50) && ('3' == 51) && ('4' == 52) \
    && ('5' == 53) && ('6' == 54) && ('7' == 55) && ('8' == 56) \
    && ('9' == 57) && (':' == 58) && (';' == 59) && ('<' == 60) \
    && ('=' == 61) && ('>' == 62) && ('?' == 63) && ('A' == 65) \
    && ('B' == 66) && ('C' == 67) && ('D' == 68) && ('E' == 69) \
    && ('F' == 70) && ('G' == 71) && ('H' == 72) && ('I' == 73) \
    && ('J' == 74) && ('K' == 75) && ('L' == 76) && ('M' == 77) \
    && ('N' == 78) && ('O' == 79) && ('P' == 80) && ('Q' == 81) \
    && ('R' == 82) && ('S' == 83) && ('T' == 84) && ('U' == 85) \
    && ('V' == 86) && ('W' == 87) && ('X' == 88) && ('Y' == 89) \
    && ('Z' == 90) && ('[' == 91) && ('\\' == 92) && (']' == 93) \
    && ('^' == 94) && ('_' == 95) && ('a' == 97) && ('b' == 98) \
    && ('c' == 99) && ('d' == 100) && ('e' == 101) && ('f' == 102) \
    && ('g' == 103) && ('h' == 104) && ('i' == 105) && ('j' == 106) \
    && ('k' == 107) && ('l' == 108) && ('m' == 109) && ('n' == 110) \
    && ('o' == 111) && ('p' == 112) && ('q' == 113) && ('r' == 114) \
    && ('s' == 115) && ('t' == 116) && ('u' == 117) && ('v' == 118) \
    && ('w' == 119) && ('x' == 120) && ('y' == 121) && ('z' == 122) \
    && ('{' == 123) && ('|' == 124) && ('}' == 125) && ('~' == 126)
/* The character set is ASCII or one of its variants or extensions, not EBCDIC.
   Testing the value of '\n' and '\r' is not relevant.  */
#define C_CTYPE_ASCII 1
#endif

from there, the problem was pretty easy to spot: NSTAK was too small,
and unguarded.  the funny "+ 1" is to allow for a few operators that
can add 2 to the stack in one trip through the loop.
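The guard-plus-slack idea can be sketched on its own in portable C. This is an illustration only, not the cpp source: Stack, stackinit, step and depth are hypothetical names, and NSTAK is made tiny so the boundary is easy to reach. The bound is checked once at the top of each trip, and the array carries one extra slot so that a trip which pushes twice still stays in bounds.

```c
#include <assert.h>

enum { NSTAK = 4 };	/* deliberately tiny, for illustration only */

typedef struct Stack Stack;
struct Stack {
	int	ops[NSTAK + 1];	/* one slot of slack: a trip may push twice */
	int	*op;		/* next free slot */
};

void
stackinit(Stack *s)
{
	s->op = s->ops;
}

/*
 * One trip through the loop: test the guard once, then push tok
 * either once or twice.  Returns 0 on success, -1 if the guard trips.
 * The worst case is op == ops + NSTAK - 1 with twice set, which
 * writes ops[NSTAK - 1] and ops[NSTAK]; both exist because of the slack.
 */
int
step(Stack *s, int tok, int twice)
{
	if(s->op >= s->ops + NSTAK)
		return -1;
	*s->op++ = tok;
	if(twice)
		*s->op++ = tok;
	return 0;
}

/* Number of entries currently on the stack. */
int
depth(Stack *s)
{
	return (int)(s->op - s->ops);
}
```

Without the extra slot, the guard would have to be repeated before every push, or tightened to NSTAK - 1, to be safe against the two-push case. The sketch reports the overflow to the caller instead of exiting, which is a choice made here for testability.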

; diffy -c eval.c
/n/dump/2011/0725/sys/src/cmd/cpp/eval.c:2,8 - eval.c:2,8
  #include <libc.h>
  #include "cpp.h"
  
- #define	NSTAK	32
+ #define	NSTAK	1024
  #define	SGN	0
  #define	UNS	1
  #define	UND	2
/n/dump/2011/0725/sys/src/cmd/cpp/eval.c:92,99 - eval.c:92,99
  
  int	evalop(struct pri);
  struct	value tokval(Token *);
- struct value vals[NSTAK], *vp;
- enum toktype ops[NSTAK], *op;
+ struct value vals[NSTAK + 1], *vp;
+ enum toktype ops[NSTAK + 1], *op;
  
  /*
   * Evaluate an #if #elif #ifdef #ifndef line.  trp->tp points to the keyword.
/n/dump/2011/0725/sys/src/cmd/cpp/eval.c:122,127 - eval.c:122,129
  	op = ops;
  	*op++ = END;
  	for (rand=0, tp = trp->bp+ntok; tp < trp->lp; tp++) {
+ 		if(op >= ops + NSTAK)
+ 			sysfatal("cpp: can't evaluate #if: increase NSTAK");
  		switch(tp->type) {
  		case WS:
  		case NL:

- erik

Reply quanstro (3877) 7/26/2011 1:41:30 AM
