[nanomsg] Re: Abort on recoverable error?

  • From: "Roman Puls" <puls@xxxxxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Tue, 02 Sep 2014 07:04:09 +0000


Same here. I guess it would be clever to use the same error reporting mechanisms as with the posix sockets, where a bind also returns a -1 and errno set to EADDRINUSE (or EADDRNOTAVAIL, ...).

The same applies to many places, and the same 'asserting' misbehavior prevented me to use zeromq.


Generally regarding asserts/aborts:
- fine in debug builds, not in release builds
- library shall report error conditions when -DNDEBUG defined instead of asserting - assertions without proper error handling inside the library indicates insufficient error recovery - also: always give the developer (API user) the chance to handle errors!

So my suggestion would be

#define ASSERT_RETURN(cond, _errno) \
do { \
  if (!(expr)) { assert(expr); errno = _errno; return -1; } \
} while (0)

and then in library code:

int dosomething(void *ctx, int family) {
  ASSERT_RETURN(ctx, EINVAL);
  ASSERT_RETURN(family == AF_INET | family == AF_INET6, EAFNOSUPPORT)

  // or taken from nn_btcp_create
 self = nn_alloc (sizeof (struct nn_btcp), "btcp");
  // NO:  alloc_assert (self);
ASSERT_RETURN(self, ENOMEM); // new, however previously allocated resources need to be released
  ...
}

This is a safe(r) pattern. With -DNDEBUG defined during the build, it will safely return a -1 plus errno is set. With NDBUG undefined, it may abort(). However, special care needs to be taken that no resources are leaking when returning instead of aborting.

Cheers,
  Roman

PS: One of my favorite tests in terms of protocol correctness and ability to handle errors was:

cat /dev/urandom | nc localhost 5555

If the program aborts (which was the case at least in zeromq 2.x), then better don't use it for productive code.


Same.

 - Bruce

Sent from my iPhone

On Sep 2, 2014, at 1:06 PM, Matt Howlett <matt.howlett@xxxxxxxxx> wrote:


I agree completely... the way abort is used in nanomsg prevents me from using it in scenarios I otherwise would like to.


On Tue, Sep 2, 2014 at 3:56 PM, Garrett D'Amore <garrett@xxxxxxxxxx> wrote:
I'd argue that's a bug. And yet, I see lots of these things in the underlying code now that I look. :-(

Libraries should only ever abort() in the event that the library developer created a bug -- and *every* case where the library abort()'s or fails an assertion should be treated as a high priority bug.


On Mon, Sep 1, 2014 at 10:49 PM, Jihyun Yu <yjh0502@xxxxxxxxx> wrote:
Hi,

It seems that nanomsg aborts on recoverable error. For example
following code fails with abort();

==
#include <nanomsg/nn.h>
#include <nanomsg/pubsub.h>

int main(void) {
    int socket = nn_socket(AF_SP, NN_PUB);
    if(socket < 0) return socket;
    int tp1 = nn_bind(socket, "tcp://*:11234");
    if(tp1 < 0) return tp1;
    int tp2 = nn_bind(socket, "tcp://*:11234");
    if(tp2 < 0) return tp2;

    return 0;
}
==

Here's an output

==
Address already in use [98] (src/transports/tcp/btcp.c:378)
/bin/bash: line 1: 22624 Aborted                 ./a.out
==

I'm writing Erlang binding of nanomsg[1], and calling abort() inside
library functions is not acceptable in Erlang. Is the design
intentional?


[1] https://github.com/yjh0502/nanomsg






Other related posts: