Jason,

I have also experienced this hang in `nn_close()`, and as far as I can
diagnose, it is a race condition in both `nn_sock_send()` and
`nn_sock_recv()` that manifests only when `nn_close()` is called
concurrently from another thread.

There is some work in progress toward a fix that you can casually peruse
(don't take it too seriously, because it doesn't work yet!) here:

https://github.com/wirebirdlabs/featherweight-nanomsg/commit/d0ebdeaf7d92e4c070aba599d6af871fe9808a5d

I believe the race condition is this: both the sock_recv and sock_send
functions might loop indefinitely within the `while (1)` loop, yet within
this loop, the context is released and then captured again.

I have also tried zombifying the socket as part of `nn_close()` in an
attempt to cleanly exit the blocking I/O function, but that did not seem
to help. If anything, it allowed `nn_close()` to continue past the
semaphore capture and free the socket, causing the blocking I/O to fault
while trying to access freed memory. Yikes!

Hope this helps -- and let's keep bouncing around ideas,

Jack R. Dunaway | Wirebird Labs LLC

On Sat, Jan 31, 2015 at 9:19 PM, Jason E. Aten <j.e.aten@xxxxxxxxx> wrote:

> Update: I got a stack trace from gdb. It appears to be hung in
> nn_sem_wait(), at src/utils/sem.c:159, which is a call:
>
> rc = sem_wait (&self->sem); // src/utils/sem.c:159 hangs here.
>
> So my earlier diagnosis was likely incorrect. It seems we have a logic bug
> instead.
>
> (gdb) bt
> #0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:86
> #1  0x00007ffff7dd0eeb in nn_sem_wait (self=self@entry=0x7fffb4017a88) at src/utils/sem.c:159
> #2  0x00007ffff7dca6c2 in nn_sock_term (self=0x7fffb40179b0) at src/core/sock.c:202
> #3  0x00007ffff7dc7837 in nn_close (s=31) at src/core/global.c:574
> #4  0x0000000000401d7b in _cgo_14c45440a8bc_C2func_nn_close (v=0xc2094182a0)
>     at /home/jaten/go/src/github.com/glycerine/go-nanomsg/nanomsg.go:61
> #5  0x0000000000489ca5 in asmcgocall () at /home/jaten/pkg/go1.4.1/go/src/runtime/asm_amd64.s:665
> #6  0x0000000000000008 in ?? ()
> #7  0x000000c20913e000 in ?? ()
> #8  0x000000000044e749 in runtime.cgocall_errno (fn=0x0, arg=0x0, ~r2=4204019)
>     at /home/jaten/pkg/go1.4.1/go/src/runtime/cgocall.go:117
> #9  0x000000000047e804 in runtime.mstart () at /home/jaten/pkg/go1.4.1/go/src/runtime/proc.c:836
> #10 0x00000000004025f3 in crosscall_amd64 () at /home/jaten/pkg/go1.4.1/go/src/runtime/cgo/gcc_amd64.S:35
> #11 0x0000000000000003 in ?? ()
> #12 0x0000000000000000 in ?? ()
> (gdb)
>
> On Sat, Jan 31, 2015 at 6:38 PM, Jason E. Aten <j.e.aten@xxxxxxxxx> wrote:
>
>> In my application, this doesn't happen for a while, but then, after a
>> while, the server doing an nn_close() on a nanomsg socket hangs forever.
>>
>> I read in the close(2) man page:
>>
>>     When dealing with sockets, you have to be sure that there is no
>>     recv(2) still blocking on it on another thread, otherwise it might
>>     block forever, since no more messages will be sent via the socket.
>>     Be sure to use shutdown(2) to shut down all parts the connection
>>     before closing the socket.
>>
>> Moreover, I see this example discussion [the answer by Joseph Quinsey
>> <http://stackoverflow.com/users/318716/joseph-quinsey>] of how to
>> properly close a socket:
>>
>> http://stackoverflow.com/questions/12730477/close-is-not-closing-socket-properly
>>
>> Mr. Quinsey suggests that there are three (3) steps needed to
>> successfully close without hanging:
>>
>> a) getsockopt(fd, SOL_SOCKET, SO_ERROR, (char *)&err, &len); // to clear
>>    any error on the socket
>>
>> b) shutdown(fd, SHUT_RDWR); // to terminate reliable delivery
>>
>> c) close(fd); // finally
>>
>> I don't see nanomsg doing a) or b), so I tend to think this is a bug in
>> the nn_close() implementation, and these two steps should be added.
>>
>> Thoughts?
>>
>> Thanks!
>>
>> - Jason
>>
>
> --
> Best regards,
> Jason
>
> --
> Jason E. Aten, Ph.D.
> j.e.aten@xxxxxxxxx
> 650-429-8602
> linkedin: https://www.linkedin.com/pub/jason-e-aten-ph-d/18/313/45a
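
For reference, the three steps a) through c) quoted from the Stack Overflow answer might look roughly like this on a plain POSIX socket descriptor. This is only a sketch: `graceful_close()` and `close_socketpair_demo()` are hypothetical names made up for illustration, not nanomsg functions, and the sequence applies to an ordinary file descriptor rather than to a nanomsg socket handle. As the later stack trace in this thread shows, the hang is in `nn_sem_wait()`, so this sketch should not be read as a fix for `nn_close()`.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of nanomsg): apply steps a), b), c) to a
   connected POSIX socket. Returns 0 on success, -1 with errno set. */
int graceful_close(int fd)
{
    int err = 0;
    socklen_t len = sizeof err;

    /* a) Read SO_ERROR; reading it also clears any pending socket error. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;

    /* b) Shut down both directions. ENOTCONN is tolerated here because it
          only means the connection is already gone. */
    if (shutdown(fd, SHUT_RDWR) < 0 && errno != ENOTCONN)
        return -1;

    /* c) Finally release the descriptor. */
    return close(fd);
}

/* Usage: run the helper on both ends of a local socket pair. */
int close_socketpair_demo(void)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (graceful_close(sv[0]) < 0) {
        close(sv[1]);
        return -1;
    }
    return graceful_close(sv[1]);
}
```

Note that `shutdown()` is also what wakes a `recv()` blocked on the same descriptor in another thread, which is the situation the close(2) man page excerpt warns about.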