[nanomsg] Re: Abort on recoverable error?

  • From: "Garrett D'Amore" <garrett@xxxxxxxxxx>
  • To: "nanomsg@xxxxxxxxxxxxx" <nanomsg@xxxxxxxxxxxxx>
  • Date: Mon, 1 Sep 2014 23:44:53 -0700

There are layers and levels, of course: a developer passing a NULL pointer
where that is explicitly disallowed is one example.  (libc doesn't check the
arguments to strcpy, for example.)

But a programmer following the documented API should never be exposed to an
abort().
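
To make the double-bind case from the original report concrete, here is a
rough sketch of the behaviour I'd expect, written against plain POSIX
sockets rather than nanomsg.  try_bind_loopback() and bound_port() are
made-up helpers for illustration only; the point is just that the second
bind comes back as -1 with errno set to EADDRINUSE, and the caller decides
what to do next.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not a nanomsg call): bind a new TCP socket to
 * 127.0.0.1:port.  Returns the socket fd on success, or -1 with errno
 * set by the OS.  It never aborts on a failed bind. */
int try_bind_loopback(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        int saved = errno;      /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Hypothetical helper: report the port a bound socket actually got
 * (useful after binding to port 0).  Returns 0 on failure. */
unsigned short bound_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0)
        return 0;
    return ntohs(addr.sin_port);
}
```

A caller that gets -1 here can log strerror(errno), pick another port, or
retry later; none of those options exist once the process has aborted.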

I don't think it is reasonable to ask programmers to check state machines,
etc. in their client code.  With some programming structures, it might not
be very easy for a programmer to cope with all these checks ahead of time
(imagine multithreaded applications), so violations of state machines,
protocol violations, etc. should be reported back as an error IMO.  Ideally
one that can be more readily understood/debugged than a core dump.
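
As a minimal sketch of what I mean (a toy request socket, not nanomsg's
actual code; every toy_* name is invented, and EINVAL is only a stand-in
for whatever error code the library would document), a state machine
violation by the caller becomes a return value, while assert() is kept
for invariants that only the library itself could break:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical toy request socket; none of this is nanomsg API. */
enum toy_state { TOY_IDLE, TOY_AWAITING_REPLY };

struct toy_req {
    enum toy_state state;
};

void toy_req_init(struct toy_req *s)
{
    s->state = TOY_IDLE;
}

/* Send a request.  Returns 0 on success.  If a request is already
 * outstanding, returns -1 and sets errno instead of aborting. */
int toy_req_send(struct toy_req *s)
{
    /* Internal invariant: only a library bug could violate this. */
    assert(s->state == TOY_IDLE || s->state == TOY_AWAITING_REPLY);

    if (s->state != TOY_IDLE) {
        errno = EINVAL;         /* caller broke the send/recv ordering */
        return -1;
    }
    s->state = TOY_AWAITING_REPLY;
    return 0;
}

/* Receive the reply.  Returns 0 on success.  If no request is
 * outstanding, returns -1 and sets errno instead of aborting. */
int toy_req_recv(struct toy_req *s)
{
    assert(s->state == TOY_IDLE || s->state == TOY_AWAITING_REPLY);

    if (s->state != TOY_AWAITING_REPLY) {
        errno = EINVAL;         /* nothing to receive yet */
        return -1;
    }
    s->state = TOY_IDLE;
    return 0;
}
```

Two sends in a row then cost the caller one failed call and an errno to
inspect, which is also something a language binding can turn into its own
error value.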

But gross programmer error, like mismanaging pointers, heap, etc., is not
something I'd ever expect any library to check.  Indeed, in those cases,
I wouldn't normally expect an assertion in the library itself.  (E.g.
strcpy() doesn't bother to assert() that its arguments are non-NULL.)

  - Garrett

On Mon, Sep 1, 2014 at 11:31 PM, Drew Crawford <drew@xxxxxxxxxxxxxxxxxx>
wrote:

> I think the better criterion is, “can a reasonable developer prevent this?”
>  In the case of “Address already in use”, the answer is no.  Even if the
> developer checked if the address is in use before opening the connection, a
> race condition may cause it to open after the check.
>
> However there are plausibly errors that a developer can always prevent,
> and in those cases I’d argue for abort over a return code.  This is because a
> developer may not write error-checking code, but an abort forces the issue.
>
>
>
> On Sep 2, 2014, at 1:26 AM, Bruce Mitchener <bruce.mitchener@xxxxxxxxx>
> wrote:
>
> Same.
>
>  - Bruce
>
> On Sep 2, 2014, at 1:06 PM, Matt Howlett <matt.howlett@xxxxxxxxx> wrote:
>
>
> I agree completely... the way abort is used in nanomsg prevents me from
> using it in scenarios I otherwise would like to.
>
>
> On Tue, Sep 2, 2014 at 3:56 PM, Garrett D'Amore <garrett@xxxxxxxxxx>
> wrote:
>
>> I'd argue that's a bug.  And yet, I see lots of these things in the
>> underlying code now that I look. :-(
>>
>> Libraries should only ever abort() in the event that the library
>> developer created a bug -- and *every* case where the library calls abort()
>> or fails an assertion should be treated as a high priority bug.
>>
>>
>> On Mon, Sep 1, 2014 at 10:49 PM, Jihyun Yu <yjh0502@xxxxxxxxx> wrote:
>>
>>> Hi,
>>>
>>> It seems that nanomsg aborts on recoverable errors. For example, the
>>> following code fails with abort():
>>>
>>> ==
>>> #include <nanomsg/nn.h>
>>> #include <nanomsg/pubsub.h>
>>>
>>> int main(void) {
>>>     int socket = nn_socket(AF_SP, NN_PUB);
>>>     if(socket < 0) return socket;
>>>     int tp1 = nn_bind(socket, "tcp://*:11234");
>>>     if(tp1 < 0) return tp1;
>>>     int tp2 = nn_bind(socket, "tcp://*:11234");
>>>     if(tp2 < 0) return tp2;
>>>
>>>     return 0;
>>> }
>>> ==
>>>
>>> Here's the output:
>>>
>>> ==
>>> Address already in use [98] (src/transports/tcp/btcp.c:378)
>>> /bin/bash: line 1: 22624 Aborted                 ./a.out
>>> ==
>>>
>>> I'm writing an Erlang binding for nanomsg[1], and calling abort() inside
>>> library functions is not acceptable in Erlang. Is this design
>>> intentional?
>>>
>>>
>>> [1] https://github.com/yjh0502/nanomsg
>>>
>>>
>>
>
>
