[nanomsg] Re: [PATCH] Terminate astream if initializing usock failed

  • From: Martin Sustrik <sustrik@xxxxxxxxxx>
  • To: <nanomsg@xxxxxxxxxxxxx>
  • Date: Tue, 12 Mar 2013 12:09:53 +0100

On 2013-03-12 11:49, Nir Soffer wrote:
On Tue, Mar 12, 2013 at 12:09 PM, Martin Sustrik <sustrik@xxxxxxxxxx> wrote:

On 2013-03-12 10:59, Nir Soffer wrote:

 On OS X the scenario is this:

1. client connects and close the socket
2. server accept a new socket
3. server tune the socket
4. setsockopt fail with EINVAL (looks like a bug in OS X, as the value is
valid (e.g. 128K receive buffer)


Yuck. That's an ugly behaviour. Anyway, what if we simply ignored the
EINVAL error from all those setsockopts, when on OSX?


What will happen if we fail to make the socket non-blocking, and then we
try to initialize the stream with a blocking socket?

I think that failing fast is better in most cases. What I don't like about this patch is that we cannot detect a bug in the library. For example, if we change the code to set a (real) invalid value, we will simply fail to
handle all connections, instead of asserting.

Yeah. I hate that we can't distinguish real "errorneous value" problem (nanomsg should assert) from "peer disconnected" problem (nanomsg should close the socket in decent manner).

Anyway, given that we are not able to distinguish between the two, I think that only option is to assume that there's no bug in the related library code and consider the error to mean "peer disconnected".

What if we check EINVAL, then fail with -EINVAL on OS X (and other platform
that show this error), and assert on Linux?

AFAICS it's the same as ignoring the error on OSX. The assumption is that the peer have disconnected. If we ignore the error next send/recv call on socket fails and socket will be shut down. If we return -EINVAL, the socket will be shut down. Same thing.

I've investigated a little bit further and the problem is in stream.c, lines 127-130. There's a send operation launched. If sending fails, the callback deallocates the stream object. The execution then continues to line 130 where recv() is done on invalid (recently deallocated) socket.


This explain the impossible crash :-)

Synchronous failures through callbacks are really horrible to debug - I was bitten by such issues many times. It would be much easier if all errors
that invoke a callback are asynchrounous. For example, if the send
operation fails, we set a flag and wake up the event loop, so the callback
on the next iteration of the event loop.

Yes. Same experience with callbacks here. (ZeroMQ is a callback hell internally.) I would like to do it in a different way for nanomsg, however, it's not a trivial patch, so I've been delaying dealing with this issue till I have some uninterrupted time to hack on it.

Few thoughts on the mechanism: When AIO subsystem wants to send a notification it should check whether it is currently in a worker thread or in user's thread (there's a function for determining that already). In the former case it should just invoke the callback. In the latter case it should send an nn_event (already implemented) to the worker thread. The worker thread, on reception of the event should call the callback.

How does that sound?

Martin



Other related posts: