[nanomsg] Re: nn_close() hangs

  • From: André Jonsson <andre.jonsson@xxxxxxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Wed, 17 Dec 2014 20:58:20 +0100 (CET)

Thanks for replying.

I'll see if I can put together a smaller program that still hangs.

/André

----- Original Message -----
> From: "Jason E. Aten" <j.e.aten@xxxxxxxxx>
> To: nanomsg@xxxxxxxxxxxxx
> Sent: Wednesday, 17 December, 2014 17:58:44
> Subject: [nanomsg] Re: nn_close() hangs

> Hi André,
> 
> please post a gist that lets us reproduce the hang.
> 
> Best,
> Jason
> 
>> On Dec 16, 2014, at 8:51 AM, André Jonsson <andre.jonsson@xxxxxxxxxxxxx> wrote:
>> 
>> Hi all,
>> 
>> I'm trying to replace my home-grown message bus with nanomsg in an existing
>> application. Everything was surprisingly pain-free until I noticed that
>> nn_close() hangs.
>> 
>> For most sockets it works fine, but in one specific scenario (during shutdown
>> of a subsystem) it hangs. The socket in question is a SUB socket.
>> 
>> I broke into the process and checked the stacks of all threads. Two of them
>> were blocked inside nn_glock_lock(). First, a worker thread:
>> 
>> #0  0x9fffffffbcc46b90:0 in __ksleep+0x30 () from /usr/lib/hpux64/libc.so.1
>> #1  0x9fffffffbd181920:0 in __mxn_sleep+0x1190 ()
>>   from /usr/lib/hpux64/libpthread.so.1
>> #2  0x9fffffffbd0fbce0:0 in __pthread_mutex_lock_wait_ng+0x260 ()
>>   from /usr/lib/hpux64/libpthread.so.1
>> #3  0x9fffffffbd0f9c90:0 in __pthread_mutex_lock_ng+0x250 ()
>>   from /usr/lib/hpux64/libpthread.so.1
>> #4  0x9fffffffbd0f9a20:0 in pthread_mutex_lock+0x20 ()
>>   from /usr/lib/hpux64/libpthread.so.1
>> #5  0x4000000000381590:0 in nn_glock_lock () at src/utils/glock.c:63
>> #6  0x40000000003736a0:0 in nn_global_submit_statistics ()
>>    at src/core/global.c:1125
>> #7  0x400000000036e2a0:0 in nn_global_handler () at src/core/global.c:1286
>> #8  0x400000000037b250:0 in nn_fsm_feed () at src/aio/fsm.c:72
>> #9  0x400000000037b1a0:0 in nn_fsm_event_process () at src/aio/fsm.c:66
>> #10 0x400000000037ac80:0 in nn_ctx_leave () at src/aio/ctx.c:63
>> #11 0x400000000037dc40:0 in nn_worker_routine ()
>>    at src/aio/worker_posix.inc:189
>> #12 0x4000000000384260:0 in nn_thread_main_routine ()
>>    at src/utils/thread_posix.inc:35
>> 
>> ... and my thread, trying to close its SUB socket:
>> 
>> #0  0x9fffffffbcc46b90:0 in __ksleep+0x30 () from /usr/lib/hpux64/libc.so.1
>> #1  0x9fffffffbd181920:0 in __mxn_sleep+0x1190 ()
>>   from /usr/lib/hpux64/libpthread.so.1
>> #2  0x9fffffffbd0fbce0:0 in __pthread_mutex_lock_wait_ng+0x260 ()
>>   from /usr/lib/hpux64/libpthread.so.1
>> #3  0x9fffffffbd0f9c90:0 in __pthread_mutex_lock_ng+0x250 ()
>>   from /usr/lib/hpux64/libpthread.so.1
>> #4  0x9fffffffbd0f9a20:0 in pthread_mutex_lock+0x20 ()
>>   from /usr/lib/hpux64/libpthread.so.1
>> #5  0x4000000000381590:0 in nn_glock_lock () at src/utils/glock.c:63
>> #6  0x400000000036ff30:0 in nn_close () at src/core/global.c:571
>> #7  0x40000000001eeae0:0 in msg::Bus::Sink::~Sink (this=0x60000000005770f0,
>>    No.Identifier_87=2) at message_bus.cxx:194
>> (+ more of my stuff)
>> 
>> 
>> As these two threads are seemingly waiting to lock the same mutex, there must
>> be another thread that already has the lock.
>> 
>> Is this assumption correct? And how do I find out which thread it is?
>> 
>> 
>> /André
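One possible way to answer the second question, sketched here under stated assumptions, is to instrument the lock so that it records its current owner. The `owned_mutex` type, the `OWNED_LOCK` macro and the functions below are hypothetical and are not part of nanomsg; using something like this against nanomsg's global lock (`nn_glock_lock()` in src/utils/glock.c) would mean patching a local build. Once the hang reproduces, the recorded owner and call site can be printed from gdb and matched against the frames shown by `thread apply all bt`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical debugging wrapper (not a nanomsg API): a mutex that records
   which thread holds it and where that thread acquired it. */
struct owned_mutex {
    pthread_mutex_t mtx;
    pthread_t owner;   /* meaningful only while held != 0 */
    int held;          /* 1 while some thread holds mtx */
    const char *file;  /* call site of the current holder's lock call */
    int line;
};

static void owned_mutex_init(struct owned_mutex *m)
{
    pthread_mutex_init(&m->mtx, NULL);
    m->held = 0;
    m->file = NULL;
    m->line = 0;
}

/* Blocks like pthread_mutex_lock(), then records the new owner. The
   bookkeeping fields are written only by the thread holding mtx. */
static void owned_mutex_lock(struct owned_mutex *m, const char *file, int line)
{
    int rc = pthread_mutex_lock(&m->mtx);
    assert(rc == 0);
    (void)rc;
    m->owner = pthread_self();
    m->held = 1;
    m->file = file;
    m->line = line;
}

/* Clears the owner record before releasing, so an unheld lock never
   shows a stale owner. */
static void owned_mutex_unlock(struct owned_mutex *m)
{
    assert(m->held && pthread_equal(m->owner, pthread_self()));
    m->held = 0;
    m->file = NULL;
    m->line = 0;
    int rc = pthread_mutex_unlock(&m->mtx);
    assert(rc == 0);
    (void)rc;
}

/* Prints the holder's call site. Reading these fields without holding mtx
   is a data race, so treat the output as a debugging hint only; in a hung
   process it is safer to inspect the fields from the debugger. */
static void owned_mutex_report(const struct owned_mutex *m)
{
    if (m->held)
        fprintf(stderr, "lock held, acquired at %s:%d\n", m->file, m->line);
    else
        fprintf(stderr, "lock not held\n");
}

/* Expands at the caller, so __FILE__/__LINE__ name the real call site. */
#define OWNED_LOCK(m) owned_mutex_lock((m), __FILE__, __LINE__)
```

The recorded call site is only informative if `OWNED_LOCK` expands at each caller. If it is placed inside a single wrapper function, every acquisition reports that function's own line, and you would have to rely on the `owner` field instead.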
