[nanomsg] Re: nn_close() of nanomsg socket hangs forever

  • From: "Jason E. Aten" <j.e.aten@xxxxxxxxx>
  • To: nanomsg <nanomsg@xxxxxxxxxxxxx>
  • Date: Sat, 31 Jan 2015 19:19:28 -0800

Update: I got a stack trace from gdb. It appears to be hung in
nn_sem_wait(), at src/utils/sem.c:159, which is a call:

rc = sem_wait (&self->sem); // src/utils/sem.c:159 hangs here.

So my earlier diagnosis was likely incorrect. It seems we have a logic bug
instead.

(gdb) bt
#0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:86
#1  0x00007ffff7dd0eeb in nn_sem_wait (self=self@entry=0x7fffb4017a88) at src/utils/sem.c:159
#2  0x00007ffff7dca6c2 in nn_sock_term (self=0x7fffb40179b0) at src/core/sock.c:202
#3  0x00007ffff7dc7837 in nn_close (s=31) at src/core/global.c:574
#4  0x0000000000401d7b in _cgo_14c45440a8bc_C2func_nn_close (v=0xc2094182a0)
    at /home/jaten/go/src/github.com/glycerine/go-nanomsg/nanomsg.go:61
#5  0x0000000000489ca5 in asmcgocall () at /home/jaten/pkg/go1.4.1/go/src/runtime/asm_amd64.s:665
#6  0x0000000000000008 in ?? ()
#7  0x000000c20913e000 in ?? ()
#8  0x000000000044e749 in runtime.cgocall_errno (fn=0x0, arg=0x0, ~r2=4204019)
    at /home/jaten/pkg/go1.4.1/go/src/runtime/cgocall.go:117
#9  0x000000000047e804 in runtime.mstart () at /home/jaten/pkg/go1.4.1/go/src/runtime/proc.c:836
#10 0x00000000004025f3 in crosscall_amd64 () at /home/jaten/pkg/go1.4.1/go/src/runtime/cgo/gcc_amd64.S:35
#11 0x0000000000000003 in ?? ()
#12 0x0000000000000000 in ?? ()
(gdb)

On Sat, Jan 31, 2015 at 6:38 PM, Jason E. Aten <j.e.aten@xxxxxxxxx> wrote:

> In my application, everything works for a while, but eventually the
> server hangs forever when it calls nn_close() on a nanomsg socket.
>
> I read in the close(2) man page:
>
>        When dealing with sockets, you have to be sure that there is no
>        recv(2) still blocking on it on another thread, otherwise it might
>        block forever, since no more messages will be sent via the socket.
>        Be sure to use shutdown(2) to shut down all parts the connection
>        before closing the socket.
>
>
> Moreover I see this example discussion [the answer by Joseph Quinsey
> <http://stackoverflow.com/users/318716/joseph-quinsey>] of how to
> properly close a socket:
>
>
> http://stackoverflow.com/questions/12730477/close-is-not-closing-socket-properly
>
> Mr. Quinsey suggests that there are three (3) steps needed to successfully
> close without hanging:
>
> a) getsockopt(fd, SOL_SOCKET, SO_ERROR, (char *)&err, &len); // to clear
> any error on the socket
>
> b) shutdown(fd, SHUT_RDWR); // to terminate reliable delivery
>
> c) close(fd); // finally
>
>
> I don't see nanomsg doing a) or b), so I tend to think this is a bug in
> the nn_close() implementation, and these two steps should be added.
>
> Thoughts?
>
>
> Thanks!
>
> - Jason
>
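
For reference, steps a) to c) quoted above can be sketched for a plain
POSIX socket descriptor as follows. This is only an illustration of the
quoted sequence, not what nn_close() does, and the helper name
close_gracefully is hypothetical. Errors from the first two steps are
ignored on the assumption that the descriptor should be released
regardless.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not part of nanomsg: applies steps a), b) and c)
 * to a plain POSIX socket. Returns 0 on success, or -1 with errno set
 * if the final close() fails. */
int close_gracefully(int fd)
{
    int err = 0;
    socklen_t len = sizeof err;

    /* a) Read SO_ERROR, which also clears any pending error on the socket. */
    (void) getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len);

    /* b) Shut down both directions of the connection. A failure such as
     *    ENOTCONN (socket not connected) is ignored here. */
    (void) shutdown(fd, SHUT_RDWR);

    /* c) Finally release the descriptor. */
    return close(fd);
}
```

A caller would replace a bare close(fd) with close_gracefully(fd) on a
connected stream socket.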



-- 

Best regards,
Jason

--
Jason E. Aten, Ph.D.
j.e.aten@xxxxxxxxx
650-429-8602
linkedin: https://www.linkedin.com/pub/jason-e-aten-ph-d/18/313/45a
