Robust error handling: an error while handling another error

I've been having this discussion a lot with my colleagues at work.

A buffered file writer is the classic example.

#include <fstream>
int main()
{   std::ofstream fout("foo.txt");
    if ( ! fout)
        return 1;
    //write important data to file
    //let the destructor implicitly close the file handle
}


The ofstream destructor will call flush and then free the file handle,
returning it to the operating system. However, flush can fail; the disk
can be full, for example. When flush fails, it sets a state flag, but
there is no opportunity for the user to examine this state flag, since
flush is being called from the destructor. It fails silently. I believe
that it is a mistake, a poor choice in the interface, for the
destructor of a buffered file stream to call flush. It adds convenience
at the cost of correctness in the corner case where it is used lazily.

Using the std::ofstream interface above, it's still possible to write
correct code by calling flush explicitly as the last part of the
normal success path, and then returning an error (either by return code
or by exception) if the flush fails. (I still think the design is bad,
because it promotes the bad style of relying on the destructor to flush
the data.)
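
For concreteness, here is a sketch of what I mean by checking the
flush explicitly (how you report the error is up to you; this sketch
just returns a nonzero exit code and prints to std::cerr):

#include <fstream>
#include <iostream>

int main()
{
    std::ofstream fout("foo.txt");
    if ( ! fout)
        return 1;

    fout << "important data\n";  // write important data to the file

    fout.flush();                // push the buffered data out now
    if ( ! fout)                 // the flush (or an earlier write) failed
    {
        std::cerr << "error: could not write foo.txt\n";
        return 1;
    }
    // The destructor still closes the handle, but by this point we
    // already know whether our buffered data made it out.
}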




My real question is, what should you do when you encounter an error
when unwinding the stack from another error?

For example, imagine some large portion of code split across several
translation units. The main entry point allocates a database
connection and does an operation on the database. Somewhere along in
this complex piece of code, we discover that the caller of this
function provided a bad table name. We report this error, either as an
exception or by return code. When we get back to the main entry
function, we attempt to free the database connection, except, uh oh, we
are unable to free the database connection. Perhaps the interface is
documented as being unable to free the connection in all states, so
that you have to move it into a proper state before you can destroy it.
Perhaps it's the result of a programming error; maybe you already
disconnected from the database. Perhaps your process is corrupted.
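
The shape of the problem, in code, is roughly this (the db_* API here
is entirely made up and stubbed out; the point is only that the release
call itself can fail):

#include <cstring>

// Hypothetical C-style database API, stubbed out for illustration.
struct db_connection { bool busy; };

db_connection* db_open(const char*) { return new db_connection(); }

int db_run_query(db_connection* c, const char* table)
{
    // Pretend a bad table name is an error that leaves the
    // connection in a state it cannot be freed from.
    if (std::strcmp(table, "good_table") != 0) { c->busy = true; return 1; }
    return 0;
}

int db_close(db_connection* c)
{
    if (c->busy) return 2;  // pretend a "busy" connection cannot be freed
    delete c;
    return 0;
}

int do_work(const char* table)
{
    db_connection* conn = db_open("db://example");
    if ( ! conn)
        return 1;

    int rc = db_run_query(conn, table);  // fails: bad table name
    int rc2 = db_close(conn);            // uh oh: this fails too
    // We now hold two errors, and possibly a leaked connection.
    return rc != 0 ? rc : rc2;
}

int main() { return do_work("bad_table"); }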


I see these as the possible scenarios which can lead to this:

1- You allocated some resource which is not freeable in an arbitrary
state (assuming LIFO resource acquisition and releasing). Your process
now has a resource leak.

2- The function which frees the resource attempts to do some
additional work on the resource before freeing it, like the buffered
stream writer. If that additional work fails, it will return an error.
Hopefully it will free the resource even if that additional work (aka
flush) fails, otherwise your process now has a resource leak.

3- Either in your code or in the library code, a sanity check was
triggered, revealing a programmer error, either on your part or the
library implementer's part.


As I see it, your only options are:
1- Kill the process (after attempting to log).
2- Return them both. Create some new error return value that wraps the
other two errors.
3- Return one and drop the other (and possibly attempt to log the
dropped exception). A sketch of this option follows below.
3b- What do you do if the logging fails? Silently ignore it? Kill the
process?
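
For concreteness, option 3 with the 3b fallback might look like the
following cleanup helper (log_error is a placeholder; the real logger
is whatever your project uses):

#include <cstdlib>
#include <exception>
#include <iostream>

// Placeholder logger; assume the real one can itself fail.
bool log_error(const char* msg)
{
    std::cerr << "error: " << msg << '\n';
    return ! std::cerr.fail();
}

// Release a resource while another error may already be in flight.
// Option 3: keep the original error, log and drop the new one.
// Option 3b: if even logging is broken, kill the process.
template <class Release>
void release_or_log(Release release)
{
    try {
        release();
    } catch (const std::exception& e) {
        if ( ! log_error(e.what()))
            std::abort();
    } catch (...) {
        if ( ! log_error("unknown error while releasing a resource"))
            std::abort();
    }
}

In the database example above, release would be whatever closes the
connection.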


Luckily, it is my humble observation that you can free (nearly) all
resources (in a LIFO-like, stack-unwinding-like order) in an arbitrary
state, be they file handles, memory, database connections, etc.
Without such a guarantee, it would be impossible to write robust code.
When you encounter an error, you need to be able to unwind the stack
to reach a point where you can handle the error, thus all resources
must be in a freeable state (again assuming a LIFO-like, stack-
unwinding-like ordering), otherwise you will have resource leaks.


Unfortunately, such errors can still happen. I'm looking for any
general rules or guidelines as to how to handle these things, or at
least good papers on the subject exploring other aspects which I do
not currently see. I think my main criteria are sensible enough:
1- Your library should never drop an error silently. Logging it and
then dropping it is acceptable, though not preferred. If logging is
broken, then the process should be killed to signal an error. (Killing
the process is not dropping an error, though debugging-friendly
information is preferable.)
2- It is greatly preferred that your library not leak resources, and
if it must leak a resource, that is an error which cannot be dropped.


For programming errors, I'm generally OK with killing the process right
then and there, aka some form of assert.

However, suppose my library uses some other third party library, and
the interface they expose has an error return for freeing a resource.
It seems as though there isn't a "one size fits all" answer. Some do
free the resource but report that additional work (like flush) failed.
Some do not free the resource, perhaps requiring you to call it again.
Some report a sanity check failure, suggesting program corruption or a
programming error on the part of the third party library implementer.


Some colleagues at my company want to throw an exception on a sanity
check failure. A lot of them are very averse to killing the process
under any circumstances, even when a sanity check is tripped, aka a
programmer error. Their argument is that we would prefer having the
process stay up if it's just your component which is broken. For
example, a user clicks something in the GUI, a sanity check is hit,
and instead of the process dying, it's reported to the user that the
functionality is broken and should not be used, but they may continue
to use other aspects of the program. I question the merits of this. I
question whether it's a good idea to have the process continue in the
face of a confirmed programmer error, and I question whether the
benefit of this rare case, keeping a possibly corrupt process alive, is
worth the large amount of extra hassle a programmer has to go through
returning an error instead of using a release-mode assert equivalent.
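
By a release-mode assert equivalent, I mean something along these
lines (the macro name is made up; the point is that, unlike assert(),
it stays active with NDEBUG defined, and it tries to log before dying):

#include <cstdio>
#include <cstdlib>

// Stays active in release builds; logs the failed check and aborts,
// so the process leaves a core dump / crash report behind.
#define ALWAYS_ASSERT(cond)                                   \
    do {                                                      \
        if ( ! (cond)) {                                      \
            std::fprintf(stderr,                              \
                "sanity check failed: %s (%s:%d)\n",          \
                #cond, __FILE__, __LINE__);                   \
            std::abort();                                     \
        }                                                     \
    } while (0)

// usage:  ALWAYS_ASSERT(index < size);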

To be clear, I agree that, for most executables, a user should not be
able to crash the executable with bad input. However, should we be so
extreme as to include programmer errors in that category?

JoshuaMaurice
10/25/2008 2:15:31 PM

JoshuaMaurice@gmail.com wrote:
> I've been having this discussion a lot with my colleagues at work.
>
> A buffered file writer is the classic example.
>
> #include <fstream>
> int main()
> {   std::ofstream fout("foo.txt");
>     if ( ! fout)
>         return 1;
>     //write important data to file
>     //let the destructor implicitly close the file handle
> }
>
>
> <snip>
>
> Using the std::ofstream interface above, it's still possible to write
> correct code by calling flush explicitly as the last part of the
> normal success path, and then returning an error (either by return code
> or by exception) if the flush fails. (I still think the design is bad,
> because it promotes the bad style of relying on the destructor to flush
> the data.)

Whether having flush() called by the destructor is good or bad design is
very debatable. Frankly, I like it the way it is, and your arguments do
not convince me otherwise.

> My real question is, what should you do when you encounter an error
> when unwinding the stack from another error?

The fact is that in most cases throwing the second exception is
pointless, because the state of the system is so messed up that you
couldn't recover it anyway. In such a scenario, calling std::terminate
is the most sensible thing. So the question can't be answered in the
general case.

> As I see it, your only options are:
> 1- Kill the process (after attempting to log).
> 2- Return them both. Create some new error return value that wraps the
> other two errors.

That can be achieved with the new C++0x facility std::nested_exception.
Notice that without std::nested_exception or std::exception_ptr (another
feature introduced by C++0x) it's quite hard to achieve point 2 in the
general case.
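
For instance, something along these lines (the database bits are made
up; std::throw_with_nested and std::rethrow_if_nested are the actual
C++0x facilities built on std::nested_exception):

#include <exception>
#include <iostream>
#include <stdexcept>
#include <string>

// Made-up operations: the query fails, and so does the cleanup.
void run_query()        { throw std::runtime_error("bad table name"); }
void close_connection() { throw std::runtime_error("cannot close connection"); }

void do_work()
{
    try {
        run_query();
    } catch (const std::exception& original) {
        try {
            close_connection();
        } catch (...) {
            // The "current" exception here is the cleanup failure;
            // throw_with_nested attaches it to a new exception whose
            // message also records the original error.
            std::throw_with_nested(std::runtime_error(
                std::string("cleanup failed while handling: ") + original.what()));
        }
        throw;  // cleanup succeeded; re-throw the original error
    }
}

// Walk the chain of nested exceptions at the top level.
void print_chain(const std::exception& e)
{
    std::cerr << e.what() << '\n';
    try {
        std::rethrow_if_nested(e);
    } catch (const std::exception& nested) {
        print_chain(nested);
    }
}

int main()
{
    try { do_work(); }
    catch (const std::exception& e) { print_chain(e); }
}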


> Some colleagues at my company want to throw an exception on a sanity
> check failure. A lot of them are very averse to killing the process
> under any circumstances, even when a sanity check is tripped, aka a
> programmer error. Their argument is that we would prefer having the
> process stay up if it's just your component which is broken. For
> example, a user clicks something in the GUI, a sanity check is hit,
> and instead of the process dying, it's reported to the user that the
> functionality is broken and should not be used, but they may continue
> to use other aspects of the program. I question the merits of this. I
> question whether it's a good idea to have the process continue in the
> face of a confirmed programmer error, and I question whether the
> benefit of this rare case, keeping a possibly corrupt process alive, is
> worth the large amount of extra hassle a programmer has to go through
> returning an error instead of using a release-mode assert equivalent.
>
> To be clear, I agree that, for most executables, a user should not be
> able to crash the executable with bad input. However, should we be so
> extreme as to include programmer errors in that category?
>

The right attitude, here, is not asking what is "correct", but what is,
in the end, "most useful" to you and/or the user. Ok, you could inform
the user that a certain feature is broken and not to use it, instead of
having the application crash. So what? The user will probably yell at
you and stop using the feature.

Wouldn't it be more useful to make the application crash and let the
user send you a crash dump so that you can debug and possibly fix the
problem? The user will still yell at you, but at least he will know that
you care about fixing his problem.

A different case is, for example, if your application supports
third-party plugins and one of them appears to be broken. In that case
you might decide to handle the situation more gracefully, inform the
user that the plugin is broken and disable it. The user will then yell
at the plugin developer, but that's not your problem.

There is no one-size-fits-all approach. It's your job as a developer to
be sensible and work out what the Right Thing to do is in each case.

Just my opinion,

Ganesh

Alberto
10/27/2008 1:18:15 AM

On Oct 25, 3:15 pm, JoshuaMaur...@gmail.com wrote:
> Some colleagues at my company want to throw an exception on a sanity
> check failure. A lot of them are very averse to killing the process
> under any circumstances, even when a sanity check is tripped, aka a
> programmer error. Their argument is that we would prefer having the

We maintain a main memory database (DataBlitz)[1], and we provide
shared libraries for other applications to connect to this database.
Depending on the seriousness of the detected error, we either return an
error code or we dump core to stop further damage.  In my opinion,
this decision needs to be taken on a case-by-case basis.  Only in
situations where continuing might cause further damage should one
decide to dump core.

This is the reason why assert() calls (from <cassert>) are generally
compiled out of production builds!
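
(For reference, assuming the release build defines NDEBUG, the standard
assert() macro expands to a no-op, so the check simply disappears:)

#include <cassert>  // with NDEBUG defined, assert(x) expands to ((void)0)

int main()
{
    assert(2 + 2 == 5);  // aborts in a debug build, does nothing in release
}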

Rgds,
anna

[1] DataBlitz (http://www.mgl.com/mgct/databases.htm)

--
Flow: For Love of Water
http://missingrainbow.blogspot.com/2008/09/flow-for-love-of-water.html



annamalai
10/28/2008 3:48:09 PM

> The right attitude, here, is not asking what is "correct", but what is,
> in the end, "most useful" to you and/or the user. Ok, you could inform
> the user that a certain feature is broken and not to use it, instead of
> having the application crash. So what? The user will probably yell at
> you and stop using the feature.
> 
> Wouldn't it be more useful to make the application crash and let the
> user send you a crash dump so that you can debug and possibly fix the
> problem? The user will still yell at you, but at least he will know that
> you care about fixing his problem.


That's not a very good argument. Why not fork a new process and generate
a core dump of the currently running one? Also, when the program
crashes, data might be lost, and a user will probably yell a lot more
about lost data than about a broken feature.
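
(For what it's worth, on POSIX systems the fork-and-dump idea looks
roughly like this; the child is a copy of the process and dies in our
place:)

#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Sketch: produce a core dump without killing the running process.
void dump_core_but_keep_running()
{
    pid_t pid = fork();
    if (pid == 0)           // child: an identical copy of our memory
        std::abort();       // dies and, resource limits permitting, dumps core
    if (pid > 0)            // parent: reap the child and carry on
        waitpid(pid, 0, 0);
}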

Matthias
10/28/2008 11:17:15 PM

Matthias Berndt wrote:
>> The right attitude, here, is not asking what is "correct", but what is,
>> in the end, "most useful" to you and/or the user. Ok, you could inform
>> the user that a certain feature is broken and not to use it, instead of
>> having the application crash. So what? The user will probably yell at
>> you and stop using the feature.
>>
>> Wouldn't it be more useful to make the application crash and let the
>> user send you a crash dump so that you can debug and possibly fix the
>> problem? The user will still yell at you, but at least he will know that
>> you care about fixing his problem.
> 
> 
> That's not a very good argument. Why not fork a new process and generate
> a core dump of the currently running one? Also, when the program
> crashes, data might be lost, and a user will probably yell a lot more
> about lost data than about a broken feature.
> 

That wasn't an argument. It was an option.

Ganesh

Alberto
10/29/2008 3:18:15 PM

On Oct 29, 6:17 am, Matthias Berndt <matthias_ber...@gmx.de> wrote:
> > The right attitude, here, is not asking what is "correct", but what is,
> > in the end, "most useful" to you and/or the user. Ok, you could inform
> > the user that a certain feature is broken and not to use it, instead of
> > having the application crash. So what? The user will probably yell at
> > you and stop using the feature.
>
> > Wouldn't it be more useful to make the application crash and let the
> > user send you a crash dump so that you can debug and possibly fix the
> > problem? The user will still yell at you, but at least he will know that
> > you care about fixing his problem.
>
> That's not a very good argument. Why not fork a new process and generate
> a core dump of the currently running one? Also, when the program
> crashes, data might be lost, and a user will probably yell a lot more
> about lost data than about a broken feature.
>

A good program should be designed not to corrupt or lose data even in
the event of a crash (replay logs, checkpointing, and using an external
database are all good solutions). In fact, *not* crashing (i.e. not
bailing out as fast as possible) after an internal error can greatly
increase the possibility of data corruption.

-- 
gpd

gpderetta
10/30/2008 1:54:33 PM

On Oct 25, 9:15 pm, JoshuaMaur...@gmail.com wrote:
> I've been having this discussion a lot with my colleagues at work.
>
> A buffered file writer is the classic example.
>
> #include <fstream>
> int main()
> {   std::ofstream fout("foo.txt");
>     if ( ! fout)
>         return 1;
>     //write important data to file
>     //let the destructor implicitly close the file handle
>
> }
>
> The ofstream destructor will call flush and then free the file handle,
> returning it to the operating system. However, flush can fail; the disk
> can be full, for example. When flush fails, it sets a state flag, but
> there is no opportunity for the user to examine this state flag, since
> flush is being called from the destructor. It fails silently. I believe
> that it is a mistake, a poor choice in the interface, for the
> destructor of a buffered file stream to call flush. It adds convenience
> at the cost of correctness in the corner case where it is used lazily.

I encountered this problem when I wrote my own I/O library. This
library uses exceptions for all I/O errors, so I had to choose what to
do on errors during destruction.

My choice was to let the user decide. The user can supply a callback
function that is called in case of an error. The user can then choose
to throw it, log it, ignore it, or abort the application.
The default behavior is to throw, unless there's already an unwind in
progress, in which case it is ignored, to prevent termination.
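
Schematically, the destructor does something like this (a simplified
sketch of the idea, not the actual library code):

#include <exception>

class buffered_writer
{
public:
    typedef void (*error_handler)();  // user-supplied policy

    explicit buffered_writer(error_handler on_error = 0)
        : on_error_(on_error) {}

    ~buffered_writer()
    {
        try {
            flush();                          // may throw on I/O error
        } catch (...) {
            if (on_error_)
                on_error_();                  // log, abort, swallow, ...
            else if ( ! std::uncaught_exception())
                throw;                        // default: report the error
            // else: another exception is already unwinding the stack;
            // swallow this one so we don't end up in std::terminate().
        }
    }

    void flush() { /* write out buffered data; may throw */ }

private:
    error_handler on_error_;
};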


wasti
11/1/2008 9:29:04 AM