A remark about stupidity

  • Follow


Below is some historic source code of a function written by me which is part of
some internal library 'we' are using:

---------------
static int grow_evr_msg_to(struct evr_msg *evr_msg, unsigned need)
{
    uint8_t *p;
    unsigned have, want, ofs;

    have = evr_msg->e - evr_msg->p;
    want = evr_msg->e - evr_msg->s;
    do {
        have += want;
        want *= 2;
    } while (have < need);

    ofs = evr_msg->p - evr_msg->s;
    p = realloc(evr_msg, sizeof(*evr_msg) + want);
    if (!p) return -1;

    evr_msg = (void *)p;
    evr_msg->s = (void *)(evr_msg + 1);
    evr_msg->p = evr_msg->s + ofs;
    evr_msg->e = evr_msg->s + want;
    return 0;
}
--------------

This has been used 'in production' since some time in August, until it
finally blew up on a customer appliance this Wednesday
(2011/11/16). The error is especially ghastly because the code will
work 'most of the time' and when it doesn't, the invoker will happily
write over whatever memory management data structures malloc uses
internally with aribtrarily damageing consequences. This is the source
code equivalent of something known in German as 'Augenblicksversagen'
(blink of an eye failure) when it leads to (usually disastrous) car
crashes on motorways: Despite knowing better and having applied this
better knowledge countless times, someone suddenly does something
completely stupid (in a hurry) because his brain just chose to have a
hiccup. When writing code, it is therefore prudent to assume that
everyone will make stupid mistakes, at least once in a while, and -
despite this isn't exactly popular in "many circles", that nothing
should ever be moved from development to production without a prior
code review conducted by the person who wrote it after some time has
passed and ideally, additionally by second person who hasn't been
involved with it so far.

Humans are going to make stupid mistakes and some strategy to detect
and mitigate these early should be employed (and the guy who is
convinced that HE would never do something likes shouldn't[*]).

[*] I've surely written my share of code with deadlocks. But this is
just an ordinary programming error, not some kind of the
horrid-monster-which-lurks-in-the-deeps, and the remedy is simple (fix
the code to use the proper locking order).
0
Reply rweikusat (2678) 11/25/2011 6:24:36 PM

Rainer Weikusat <rweikusat@mssgmbh.com> wrote:
> despite this isn't exactly popular in "many circles", that nothing
> should ever be moved from development to production without a prior
> code review conducted by the person who wrote it after some time has
> passed and ideally, additionally by second person who hasn't been
> involved with it so far.
>=20

Amen.

Unfortunately, in the real world, most of the source code is never
reviewed.

--=20
Andr=C3=A9 Gillibert
0
Reply UTF 11/25/2011 6:46:23 PM


On 11/25/2011 1:24 PM, Rainer Weikusat wrote:
>[...]
> static int grow_evr_msg_to(struct evr_msg *evr_msg, unsigned need)
> {
>      uint8_t *p;
>      unsigned have, want, ofs;
>
>      have = evr_msg->e - evr_msg->p;
>      want = evr_msg->e - evr_msg->s;
>      do {
>          have += want;
>          want *= 2;
>      } while (have<  need);
>
>      ofs = evr_msg->p - evr_msg->s;
>      p = realloc(evr_msg, sizeof(*evr_msg) + want);
>      if (!p) return -1;
>
>      evr_msg = (void *)p;
>      evr_msg->s = (void *)(evr_msg + 1);
>      evr_msg->p = evr_msg->s + ofs;
>      evr_msg->e = evr_msg->s + want;
>      return 0;
> }
>[...]
> [*] I've surely written my share of code with deadlocks. But this is
> just an ordinary programming error, not some kind of the
> horrid-monster-which-lurks-in-the-deeps, and the remedy is simple (fix
> the code to use the proper locking order).

     Perhaps I'm missing something, but this doesn't look like a
locking problem.  Rather, it looks like a design problem: If the
function returns 0, the caller knows that all its old pointers to
the struct are untrustworthy, but can't discover what new pointer
value it should replace them with.

     Maybe a little more context would be helpful.

-- 
Eric Sosman
esosman@ieee-dot-org.invalid
0
Reply esosman2 (2945) 11/25/2011 7:24:48 PM

Eric Sosman <esosman@ieee-dot-org.invalid> writes:
> On 11/25/2011 1:24 PM, Rainer Weikusat wrote:
>>[...]
>> static int grow_evr_msg_to(struct evr_msg *evr_msg, unsigned need)
>> {
>>      uint8_t *p;
>>      unsigned have, want, ofs;
>>
>>      have = evr_msg->e - evr_msg->p;
>>      want = evr_msg->e - evr_msg->s;
>>      do {
>>          have += want;
>>          want *= 2;
>>      } while (have<  need);
>>
>>      ofs = evr_msg->p - evr_msg->s;
>>      p = realloc(evr_msg, sizeof(*evr_msg) + want);
>>      if (!p) return -1;
>>
>>      evr_msg = (void *)p;
>>      evr_msg->s = (void *)(evr_msg + 1);
>>      evr_msg->p = evr_msg->s + ofs;
>>      evr_msg->e = evr_msg->s + want;
>>      return 0;
>> }
>>[...]
>> [*] I've surely written my share of code with deadlocks. But this is
>> just an ordinary programming error, not some kind of the
>> horrid-monster-which-lurks-in-the-deeps, and the remedy is simple (fix
>> the code to use the proper locking order).
>
>     Perhaps I'm missing something,

Presumably, the text you deleted.

0
Reply rweikusat (2678) 11/25/2011 7:36:02 PM

Rainer Weikusat <rweikusat@mssgmbh.com> writes:
> Eric Sosman <esosman@ieee-dot-org.invalid> writes:
>> On 11/25/2011 1:24 PM, Rainer Weikusat wrote:
>>>[...]
>>> static int grow_evr_msg_to(struct evr_msg *evr_msg, unsigned need)
>>> {
>>>      uint8_t *p;
>>>      unsigned have, want, ofs;
>>>
>>>      have = evr_msg->e - evr_msg->p;
>>>      want = evr_msg->e - evr_msg->s;
>>>      do {
>>>          have += want;
>>>          want *= 2;
>>>      } while (have<  need);
>>>
>>>      ofs = evr_msg->p - evr_msg->s;
>>>      p = realloc(evr_msg, sizeof(*evr_msg) + want);
>>>      if (!p) return -1;
>>>
>>>      evr_msg = (void *)p;
>>>      evr_msg->s = (void *)(evr_msg + 1);
>>>      evr_msg->p = evr_msg->s + ofs;
>>>      evr_msg->e = evr_msg->s + want;
>>>      return 0;
>>> }
>>>[...]
>>> [*] I've surely written my share of code with deadlocks. But this is
>>> just an ordinary programming error, not some kind of the
>>> horrid-monster-which-lurks-in-the-deeps, and the remedy is simple (fix
>>> the code to use the proper locking order).
>>
>>     Perhaps I'm missing something,
>
> Presumably, the text you deleted.

.... which was about 'humans making stupid mistakes', such as
reallocating some memory area inside a subroutine and not returning
the realloc return value to the caller ...
0
Reply rweikusat (2678) 11/25/2011 7:39:49 PM

On 11/26/11 07:24 AM, Rainer Weikusat wrote:
> Below is some historic source code of a function written by me which is part of
> some internal library 'we' are using:
>
> ---------------
> static int grow_evr_msg_to(struct evr_msg *evr_msg, unsigned need)
> {
>      uint8_t *p;
>      unsigned have, want, ofs;
>
>      have = evr_msg->e - evr_msg->p;
>      want = evr_msg->e - evr_msg->s;
>      do {
>          have += want;
>          want *= 2;
>      } while (have<  need);
>
>      ofs = evr_msg->p - evr_msg->s;
>      p = realloc(evr_msg, sizeof(*evr_msg) + want);
>      if (!p) return -1;
>
>      evr_msg = (void *)p;
>      evr_msg->s = (void *)(evr_msg + 1);
>      evr_msg->p = evr_msg->s + ofs;
>      evr_msg->e = evr_msg->s + want;
>      return 0;
> }
> --------------
>
> This has been used 'in production' since some time in August, until it
> finally blew up on a customer appliance this Wednesday
> (2011/11/16). The error is especially ghastly because the code will
> work 'most of the time' and when it doesn't, the invoker will happily
> write over whatever memory management data structures malloc uses
> internally with aribtrarily damageing consequences. This is the source
> code equivalent of something known in German as 'Augenblicksversagen'
> (blink of an eye failure) when it leads to (usually disastrous) car
> crashes on motorways: Despite knowing better and having applied this
> better knowledge countless times, someone suddenly does something
> completely stupid (in a hurry) because his brain just chose to have a
> hiccup. When writing code, it is therefore prudent to assume that
> everyone will make stupid mistakes, at least once in a while, and -
> despite this isn't exactly popular in "many circles", that nothing
> should ever be moved from development to production without a prior
> code review conducted by the person who wrote it after some time has
> passed and ideally, additionally by second person who hasn't been
> involved with it so far.

This looks like a case where unit tests (written before the code) would 
have saved your bacon.  I'm sure only writing production code in order 
to pass tests has saved me form many similar embarrassing moments!

-- 
Ian Collins
0
Reply ian-news (9873) 11/25/2011 7:47:24 PM

On 11/25/2011 2:39 PM, Rainer Weikusat wrote:
> Rainer Weikusat<rweikusat@mssgmbh.com>  writes:
>> Eric Sosman<esosman@ieee-dot-org.invalid>  writes:
>>> On 11/25/2011 1:24 PM, Rainer Weikusat wrote:
>>>> [...]
>>>> static int grow_evr_msg_to(struct evr_msg *evr_msg, unsigned need)
>>>> {
>>>>       uint8_t *p;
>>>>       unsigned have, want, ofs;
>>>>
>>>>       have = evr_msg->e - evr_msg->p;
>>>>       want = evr_msg->e - evr_msg->s;
>>>>       do {
>>>>           have += want;
>>>>           want *= 2;
>>>>       } while (have<   need);
>>>>
>>>>       ofs = evr_msg->p - evr_msg->s;
>>>>       p = realloc(evr_msg, sizeof(*evr_msg) + want);
>>>>       if (!p) return -1;
>>>>
>>>>       evr_msg = (void *)p;
>>>>       evr_msg->s = (void *)(evr_msg + 1);
>>>>       evr_msg->p = evr_msg->s + ofs;
>>>>       evr_msg->e = evr_msg->s + want;
>>>>       return 0;
>>>> }
>>>> [...]
>>>> [*] I've surely written my share of code with deadlocks. But this is
>>>> just an ordinary programming error, not some kind of the
>>>> horrid-monster-which-lurks-in-the-deeps, and the remedy is simple (fix
>>>> the code to use the proper locking order).
>>>
>>>      Perhaps I'm missing something,
>>
>> Presumably, the text you deleted.
>
> ... which was about 'humans making stupid mistakes', such as
> reallocating some memory area inside a subroutine and not returning
> the realloc return value to the caller ...

     Yes, I saw that part -- which said only that there had been a
mistake, and said nothing about its nature.  What I did not see (and
did not snip) was anything that might have explained why "the remedy
is simple (fix the code to use the proper locking order)."  Two
questions remain: What is the locking issue, and how does using "the
proper locking order" get the realloc() result back to the caller?

-- 
Eric Sosman
esosman@ieee-dot-org.invalid
0
Reply esosman2 (2945) 11/25/2011 10:02:04 PM

Eric Sosman <esosman@ieee-dot-org.invalid> writes:
> On 11/25/2011 2:39 PM, Rainer Weikusat wrote:
>> Rainer Weikusat<rweikusat@mssgmbh.com>  writes:
>>> Eric Sosman<esosman@ieee-dot-org.invalid>  writes:
>>>> On 11/25/2011 1:24 PM, Rainer Weikusat wrote:
>>>>> [...]
>>>>> static int grow_evr_msg_to(struct evr_msg *evr_msg, unsigned need)
>>>>> {
>>>>>       uint8_t *p;
>>>>>       unsigned have, want, ofs;
>>>>>
>>>>>       have = evr_msg->e - evr_msg->p;
>>>>>       want = evr_msg->e - evr_msg->s;
>>>>>       do {
>>>>>           have += want;
>>>>>           want *= 2;
>>>>>       } while (have<   need);
>>>>>
>>>>>       ofs = evr_msg->p - evr_msg->s;
>>>>>       p = realloc(evr_msg, sizeof(*evr_msg) + want);
>>>>>       if (!p) return -1;
>>>>>
>>>>>       evr_msg = (void *)p;
>>>>>       evr_msg->s = (void *)(evr_msg + 1);
>>>>>       evr_msg->p = evr_msg->s + ofs;
>>>>>       evr_msg->e = evr_msg->s + want;
>>>>>       return 0;
>>>>> }
>>>>> [...]
>>>>> [*] I've surely written my share of code with deadlocks. But this is
>>>>> just an ordinary programming error, not some kind of the
>>>>> horrid-monster-which-lurks-in-the-deeps, and the remedy is simple (fix
>>>>> the code to use the proper locking order).
>>>>
>>>>      Perhaps I'm missing something,
>>>
>>> Presumably, the text you deleted.
>>
>> ... which was about 'humans making stupid mistakes', such as
>> reallocating some memory area inside a subroutine and not returning
>> the realloc return value to the caller ...
>
>     Yes, I saw that part -- which said only that there had been a
> mistake, and said nothing about its nature.  What I did not see (and
> did not snip) was anything that might have explained why "the remedy
> is simple (fix the code to use the proper locking order)."  Two
> questions remain: What is the locking issue, and how does using "the
> proper locking order" get the realloc() result back to the caller?

This was supposed to be a reference to Boltar's conjecture the me
referring to 'deadlocks' as 'simple problems' must mean that I'd
ludicrously believe to never write code which deadlocks at run time.
But I actually referred to them as simple because fixing them is
simple (provided that it is known what the locking order should have
been and determining this locking order is a required design step for
'proper' multithreaded applications complicated enough to need one)
and 'program stops as soon as the problem occurs', kindly preserving
all information required for fixing it until someone notices that, is
a lot easier to deal with than errors where the program continues to
run, but either works with corrupted data or even runs along codepaths
which manifested themselves out of thin air because some halfway
overwritten function pointer caused a control transfer somewhere into
the middle of an instruction (for CPUs where such things are
possible) or so.

And there are far worse things than that. Something I had to do in the
past was to modify the Linux 2.4 ARM head.s code to work with an Intel
IXP425 processor in little-endian mode. Apparently because "big-endian
makes so much more sense" someone at Intel(!) originally decided that
nobody could possibly want to run such a CPU in a different
configuration and consequently, the memory controller was designed to
be big-endian only. In order to allow a CPU running in little-endian
mode to work with that, there's a  hardware byte-swapper sitting
between the processor and the bus and this hardware byte-swapper swaps
bytes provided the MMU is enabled and the page table entry for the
page in question has a certain flag set. By the time this code runs,
the processor is in big-endian mode and a kernel has been loaded into
RAM by the bootloader. It then needs to setup enough of a page table
to provide access to the kernel after the MMU has been enabled,
byteswap the entire contents of the RAM except the the code performing
the procedure itself, enable the MMU and reconfigure the CPU into
little-endian mode and hope for the best because every error in the
(machine) code which does all this causes the whole bastard thing to
go into a state where 'code execution' is not only not anyhow
observable except with a 'hardware debugger' I didn't have but where
it is also impossible to generate any kind of diagnostic output, be it
as primitive as lighting an LED via GPIO pin.
0
Reply rweikusat (2678) 11/25/2011 10:51:48 PM

7 Replies
41 Views

(page loaded in 0.156 seconds)

Similiar Articles:













7/20/2012 7:47:27 AM


Reply: