Update: I got a stack trace from gdb. It appears to be hung in nn_sem_wait(), at src/utils/sem.c:159, which is this call:

    rc = sem_wait (&self->sem);   // src/utils/sem.c:159 hangs here

So my earlier diagnosis was likely incorrect. It seems we have a logic bug instead.

(gdb) bt
#0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:86
#1  0x00007ffff7dd0eeb in nn_sem_wait (self=self@entry=0x7fffb4017a88) at src/utils/sem.c:159
#2  0x00007ffff7dca6c2 in nn_sock_term (self=0x7fffb40179b0) at src/core/sock.c:202
#3  0x00007ffff7dc7837 in nn_close (s=31) at src/core/global.c:574
#4  0x0000000000401d7b in _cgo_14c45440a8bc_C2func_nn_close (v=0xc2094182a0) at /home/jaten/go/src/github.com/glycerine/go-nanomsg/nanomsg.go:61
#5  0x0000000000489ca5 in asmcgocall () at /home/jaten/pkg/go1.4.1/go/src/runtime/asm_amd64.s:665
#6  0x0000000000000008 in ?? ()
#7  0x000000c20913e000 in ?? ()
#8  0x000000000044e749 in runtime.cgocall_errno (fn=0x0, arg=0x0, ~r2=4204019) at /home/jaten/pkg/go1.4.1/go/src/runtime/cgocall.go:117
#9  0x000000000047e804 in runtime.mstart () at /home/jaten/pkg/go1.4.1/go/src/runtime/proc.c:836
#10 0x00000000004025f3 in crosscall_amd64 () at /home/jaten/pkg/go1.4.1/go/src/runtime/cgo/gcc_amd64.S:35
#11 0x0000000000000003 in ?? ()
#12 0x0000000000000000 in ?? ()
(gdb)

On Sat, Jan 31, 2015 at 6:38 PM, Jason E. Aten <j.e.aten@xxxxxxxxx> wrote:

> In my application, this doesn't happen for a while, but then after a
> while, the server doing an nn_close() on a nanomsg socket hangs forever.
>
> I read in the close(2) man page:
>
>     When dealing with sockets, you have to be sure that there is no
>     recv(2) still blocking on it on another thread, otherwise it might
>     block forever, since no more messages will be sent via the socket.
>     Be sure to use shutdown(2) to shut down all parts the connection
>     before closing the socket.
>
> Moreover, I see this example discussion [the answer by Joseph Quinsey
> <http://stackoverflow.com/users/318716/joseph-quinsey>] of how to
> properly close a socket:
>
> http://stackoverflow.com/questions/12730477/close-is-not-closing-socket-properly
>
> Mr. Quinsey suggests that three (3) steps are needed to close
> successfully without hanging:
>
> a) getsockopt(fd, SOL_SOCKET, SO_ERROR, (char *)&err, &len); // to clear
>    any error on the socket
>
> b) shutdown(fd, SHUT_RDWR); // to terminate reliable delivery
>
> c) close(fd); // finally
>
> I don't see nanomsg doing a) or b), so I tend to think this is a bug in
> the nn_close() implementation, and that these two steps should be added.
>
> Thoughts?
>
> Thanks!
>
> - Jason

--
Best regards,
Jason

--
Jason E. Aten, Ph.D.
j.e.aten@xxxxxxxxx
650-429-8602
linkedin: https://www.linkedin.com/pub/jason-e-aten-ph-d/18/313/45a
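
For reference, steps a) through c) quoted above can be sketched in C roughly as follows. This is only a minimal sketch for a plain POSIX socket file descriptor. The helper name close_socket_cleanly is hypothetical and is not part of nanomsg. Note that nn_close() takes a nanomsg socket handle rather than an OS file descriptor, and the trace above shows the hang inside nn_sem_wait(), so this illustrates the quoted steps rather than a fix for the hang.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not a nanomsg function): applies the three quoted
 * steps to a plain POSIX socket descriptor. Returns the result of close(),
 * i.e. 0 on success or -1 with errno set. */
int close_socket_cleanly(int fd)
{
    int err = 0;
    socklen_t len = sizeof err;

    /* a) Read SO_ERROR; reading it clears any pending error on the socket.
     *    The result is ignored here because we close the socket regardless. */
    (void) getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);

    /* b) Shut down both directions so a peer (or another thread blocked on
     *    the socket) sees end-of-stream. A failure such as ENOTCONN on an
     *    unconnected socket is ignored for the same reason as above. */
    (void) shutdown(fd, SHUT_RDWR);

    /* c) Finally release the descriptor. */
    return close(fd);
}
```

A caller would use close_socket_cleanly(fd) wherever it would otherwise call close(fd) directly on a socket. Ignoring the results of the first two steps is a design choice made for brevity in this sketch; real code may want to log them.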