[nanomsg] Windows IPC pub/sub worker routine never exits after the pub side is closed

  • From: Timothee Besset <ttimo@xxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Tue, 22 Jul 2014 12:20:28 -0500

Hello,

We are trying to track down and fix several critical bugs in nanomsg on
Windows. Unfortunately we are not making much progress, so any help would
be greatly appreciated.

The main issues we are concerned with right now are:

   - https://github.com/nanomsg/nanomsg/issues/182
   - https://github.com/nanomsg/nanomsg/issues/283

Issue 182 is the more critical one for us right now, so that is what we
have been focusing on.

https://github.com/TTimo/nanomsg/blob/0.4-beta/tests/ipc_pub_disconnect.c

This test code is a good summary of the various problems.

Most importantly, we would like to make sure that we can detect when the
remote pub socket has closed and cut the connection. Without this, we are
stuck in blocking nn_recv calls that never return, or in nn_recv calls
with NN_DONTWAIT that report 'no data' forever.
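
To illustrate the symptom, here is a minimal sketch (not taken from our
test code) of how a caller would interpret the result of a non-blocking
receive. The names classify_recv and poll_result are hypothetical, made up
for this example. It assumes the documented nn_recv behavior: the return
value is the message size on success, or -1 with nn_errno() equal to EAGAIN
when NN_DONTWAIT is set and no message is available.

```c
#include <errno.h>

/* Hypothetical classification of one non-blocking receive attempt. */
enum poll_result { POLL_GOT_DATA, POLL_NO_DATA, POLL_ERROR };

/*
 * rc  : return value of the receive call (bytes received, or -1)
 * err : error code captured right after the call (e.g. from nn_errno())
 *
 * Intended call site (assumption, requires nanomsg):
 *     rc = nn_recv(sub, &buf, NN_MSG, NN_DONTWAIT);
 *     r  = classify_recv(rc, nn_errno());
 */
enum poll_result classify_recv(int rc, int err)
{
    if (rc >= 0)
        return POLL_GOT_DATA;   /* a message arrived (may be zero-length) */
    if (err == EAGAIN)
        return POLL_NO_DATA;    /* nothing queued right now */
    return POLL_ERROR;          /* a real failure the caller can act on */
}
```

The point is that an idle publisher and a closed publisher both end up in
the POLL_NO_DATA branch, so the caller has nothing to act on. What we are
after is for the closed case to reach POLL_ERROR instead.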

In our investigation so far, we have found that there is a single worker
thread (nn_worker_routine in worker_win.inc) that loops forever on
GetQueuedCompletionStatusEx: each call times out, and the loop starts over.

We have added a bit of experimental code:
https://github.com/TTimo/nanomsg/commit/771f3c33082b70c4f9846420e984212bb896ac69

With it, we are able to tell that the completion port is now waiting on a
dead handle (since the pub was closed).

At that point I don't know what the next step should be. It seems that when
we detect the problem in nn_worker_routine, the source socket is already
in NN_USOCK_STATE_DONE anyway. We tried to tick the FSM so that recv
operations would either return or start returning an error, but because the
socket is in NN_USOCK_STATE_DONE, that doesn't work.

Please let us know if you have any suggestions.

Best,
TTimo
