[nanomsg] nn_close() hangs

  • From: André Jonsson <andre.jonsson@xxxxxxxxxxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Tue, 16 Dec 2014 17:51:08 +0100 (CET)

Hi all,

I'm trying to replace my home-grown message bus with nanomsg in an existing 
application.
Everything was surprisingly pain-free, until I noticed that nn_close() hangs.

For most sockets it works fine, but in one specific scenario, during shutdown of 
a subsystem, it hangs. The socket in question is a SUB socket.

I broke into the process with the debugger and checked the stacks of all 
threads. Two of them were inside nn_glock_lock(). One is a worker thread:

#0  0x9fffffffbcc46b90:0 in __ksleep+0x30 () from /usr/lib/hpux64/libc.so.1
#1  0x9fffffffbd181920:0 in __mxn_sleep+0x1190 ()
   from /usr/lib/hpux64/libpthread.so.1
#2  0x9fffffffbd0fbce0:0 in __pthread_mutex_lock_wait_ng+0x260 ()
   from /usr/lib/hpux64/libpthread.so.1
#3  0x9fffffffbd0f9c90:0 in __pthread_mutex_lock_ng+0x250 ()
   from /usr/lib/hpux64/libpthread.so.1
#4  0x9fffffffbd0f9a20:0 in pthread_mutex_lock+0x20 ()
   from /usr/lib/hpux64/libpthread.so.1
#5  0x4000000000381590:0 in nn_glock_lock () at src/utils/glock.c:63
#6  0x40000000003736a0:0 in nn_global_submit_statistics ()
    at src/core/global.c:1125
#7  0x400000000036e2a0:0 in nn_global_handler () at src/core/global.c:1286
#8  0x400000000037b250:0 in nn_fsm_feed () at src/aio/fsm.c:72
#9  0x400000000037b1a0:0 in nn_fsm_event_process () at src/aio/fsm.c:66
#10 0x400000000037ac80:0 in nn_ctx_leave () at src/aio/ctx.c:63
#11 0x400000000037dc40:0 in nn_worker_routine ()
    at src/aio/worker_posix.inc:189
#12 0x4000000000384260:0 in nn_thread_main_routine ()
    at src/utils/thread_posix.inc:35

... and my thread, trying to close its SUB socket:

#0  0x9fffffffbcc46b90:0 in __ksleep+0x30 () from /usr/lib/hpux64/libc.so.1
#1  0x9fffffffbd181920:0 in __mxn_sleep+0x1190 ()
   from /usr/lib/hpux64/libpthread.so.1
#2  0x9fffffffbd0fbce0:0 in __pthread_mutex_lock_wait_ng+0x260 ()
   from /usr/lib/hpux64/libpthread.so.1
#3  0x9fffffffbd0f9c90:0 in __pthread_mutex_lock_ng+0x250 ()
   from /usr/lib/hpux64/libpthread.so.1
#4  0x9fffffffbd0f9a20:0 in pthread_mutex_lock+0x20 ()
   from /usr/lib/hpux64/libpthread.so.1
#5  0x4000000000381590:0 in nn_glock_lock () at src/utils/glock.c:63
#6  0x400000000036ff30:0 in nn_close () at src/core/global.c:571
#7  0x40000000001eeae0:0 in msg::Bus::Sink::~Sink (this=0x60000000005770f0, 
    No.Identifier_87=2) at message_bus.cxx:194
(+ more frames from my own code)


Since both threads appear to be waiting to lock the same mutex, some other 
thread must already hold it.

Is this assumption correct? And how do I find out which thread holds the lock?
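A note on the assumption: if nanomsg's global lock is an ordinary 
non-recursive pthread mutex, another thread holding it is the most likely 
explanation, but not the only one. A thread that already holds a default 
mutex and locks it again may block forever (POSIX leaves this undefined for 
the default mutex type), and a thread that exits while holding the lock leaves 
it locked with no live owner. One way to find the holder is to temporarily 
patch nn_glock_lock() and its unlock counterpart in src/utils/glock.c so they 
record who took the lock. The sketch below is a hypothetical debug wrapper, 
not part of nanomsg: dbg_mutex and its functions are made-up names, and the 
unlocked reads in dbg_mutex_held_by_me() are racy, so treat it purely as a 
diagnostic aid.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical debug wrapper (not part of nanomsg): a mutex that remembers
   which thread holds it and where it was locked, so the owner can be read
   from a debugger when another lock() call hangs. */
struct dbg_mutex {
    pthread_mutex_t mutex;
    pthread_t owner;      /* meaningful only while 'held' is nonzero */
    volatile int held;    /* 1 while some thread holds the mutex */
    const char *file;     /* source location of the current owner's lock() */
    int line;
};

static void dbg_mutex_init(struct dbg_mutex *m)
{
    int rc = pthread_mutex_init(&m->mutex, NULL);
    assert(rc == 0);
    (void) rc;
    m->held = 0;
    m->file = NULL;
    m->line = 0;
}

static void dbg_mutex_lock_at(struct dbg_mutex *m, const char *file, int line)
{
    int rc = pthread_mutex_lock(&m->mutex);
    assert(rc == 0);
    (void) rc;
    /* Record ownership only after the lock has actually been acquired. */
    m->owner = pthread_self();
    m->file = file;
    m->line = line;
    m->held = 1;
}

/* Capture the caller's file and line automatically. */
#define dbg_mutex_lock(m) dbg_mutex_lock_at((m), __FILE__, __LINE__)

static void dbg_mutex_unlock(struct dbg_mutex *m)
{
    /* Catch unlocks performed by a thread that does not own the mutex. */
    assert(m->held && pthread_equal(m->owner, pthread_self()));
    m->held = 0;
    m->file = NULL;
    m->line = 0;
    pthread_mutex_unlock(&m->mutex);
}

/* Nonzero if the calling thread currently holds m. Calling this before
   dbg_mutex_lock() detects a thread about to re-lock a mutex it already
   holds. Reads without the lock, so it is a debugging aid only. */
static int dbg_mutex_held_by_me(const struct dbg_mutex *m)
{
    return m->held && pthread_equal(m->owner, pthread_self());
}
```

With something like this in place, printing the wrapper's owner, file and 
line fields in gdb after the next hang shows which thread took the lock and 
from where. Matching owner against the threads listed by gdb's info threads 
should then identify the holder, although how pthread_t values map to gdb's 
thread numbering is platform-dependent.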


/André
