[nanomsg] Re: nn_close() hangs

  • From: "Jason E. Aten" <j.e.aten@xxxxxxxxx>
  • To: "nanomsg@xxxxxxxxxxxxx" <nanomsg@xxxxxxxxxxxxx>
  • Date: Wed, 17 Dec 2014 08:58:44 -0800

Hi André,

Please post a gist that lets us reproduce the hang.

Best,
Jason

> On Dec 16, 2014, at 8:51 AM, André Jonsson <andre.jonsson@xxxxxxxxxxxxx> 
> wrote:
> 
> Hi all,
> 
> I'm trying to replace my home-grown message bus with nanomsg in an existing 
> application.
> Everything was surprisingly pain-free, until I noticed that nn_close() hangs.
> 
> For most sockets it works fine, but in a specific scenario - during shutdown 
> of a subsystem - it hangs (it's a SUB socket).
> 
> I broke into the process with the debugger and checked the stacks of all 
> threads; two of them were inside nn_glock_lock(). One is a worker thread:
> 
> #0  0x9fffffffbcc46b90:0 in __ksleep+0x30 () from /usr/lib/hpux64/libc.so.1
> #1  0x9fffffffbd181920:0 in __mxn_sleep+0x1190 ()
>   from /usr/lib/hpux64/libpthread.so.1
> #2  0x9fffffffbd0fbce0:0 in __pthread_mutex_lock_wait_ng+0x260 ()
>   from /usr/lib/hpux64/libpthread.so.1
> #3  0x9fffffffbd0f9c90:0 in __pthread_mutex_lock_ng+0x250 ()
>   from /usr/lib/hpux64/libpthread.so.1
> #4  0x9fffffffbd0f9a20:0 in pthread_mutex_lock+0x20 ()
>   from /usr/lib/hpux64/libpthread.so.1
> #5  0x4000000000381590:0 in nn_glock_lock () at src/utils/glock.c:63
> #6  0x40000000003736a0:0 in nn_global_submit_statistics ()
>    at src/core/global.c:1125
> #7  0x400000000036e2a0:0 in nn_global_handler () at src/core/global.c:1286
> #8  0x400000000037b250:0 in nn_fsm_feed () at src/aio/fsm.c:72
> #9  0x400000000037b1a0:0 in nn_fsm_event_process () at src/aio/fsm.c:66
> #10 0x400000000037ac80:0 in nn_ctx_leave () at src/aio/ctx.c:63
> #11 0x400000000037dc40:0 in nn_worker_routine ()
>    at src/aio/worker_posix.inc:189
> #12 0x4000000000384260:0 in nn_thread_main_routine ()
>    at src/utils/thread_posix.inc:35
> 
> ... and my thread, trying to close its SUB socket:
> 
> #0  0x9fffffffbcc46b90:0 in __ksleep+0x30 () from /usr/lib/hpux64/libc.so.1
> #1  0x9fffffffbd181920:0 in __mxn_sleep+0x1190 ()
>   from /usr/lib/hpux64/libpthread.so.1
> #2  0x9fffffffbd0fbce0:0 in __pthread_mutex_lock_wait_ng+0x260 ()
>   from /usr/lib/hpux64/libpthread.so.1
> #3  0x9fffffffbd0f9c90:0 in __pthread_mutex_lock_ng+0x250 ()
>   from /usr/lib/hpux64/libpthread.so.1
> #4  0x9fffffffbd0f9a20:0 in pthread_mutex_lock+0x20 ()
>   from /usr/lib/hpux64/libpthread.so.1
> #5  0x4000000000381590:0 in nn_glock_lock () at src/utils/glock.c:63
> #6  0x400000000036ff30:0 in nn_close () at src/core/global.c:571
> #7  0x40000000001eeae0:0 in msg::Bus::Sink::~Sink (this=0x60000000005770f0, 
>    No.Identifier_87=2) at message_bus.cxx:194
> (+ more of my stuff)
> 
> 
> As these two threads are both waiting to lock the same mutex, some other 
> thread must already hold it.
> 
> Is that assumption correct? And how do I find which thread holds the lock?
> 
> 
> /André
> 
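For reference, one way to find which thread actually holds a contended mutex is to dump every thread's stack and look for the one that is inside nanomsg code but is NOT blocked in pthread_mutex_lock(). A sketch of such a gdb session follows; note that the symbol name of the global mutex below is an assumption (it lives in src/utils/glock.c, check the source for the actual name), and the __owner trick only works on Linux/glibc, not on HP-UX:

```gdb
(gdb) info threads              # list every thread in the process
(gdb) thread apply all bt       # backtrace all of them; the lock holder is
                                # the thread running nanomsg code that is NOT
                                # waiting inside pthread_mutex_lock()
# On Linux/glibc only, the owner's kernel TID is stored in the mutex itself:
(gdb) print ((pthread_mutex_t *) &nn_glock)->__data.__owner
```

On HP-UX the pthread internals are opaque, so comparing the thread stacks is usually the practical route.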
