[nanomsg] Issues with nn_usock_bind, with NN_IPV4ONLY disabled

From: Matthew Hall <mhall@xxxxxxxxxxxxxxx>
To: nanomsg@xxxxxxxxxxxxx
Date: Sat, 13 Sep 2014 19:20:18 -0700

Hello,

I'm getting some really weird behavior when using the NN_IPV4ONLY socket option.

I used a URL like "tcp://[192.168.1.6]:10001", and code like this:

    nn_queue->conn = nn_socket(AF_SP, nn_queue->type);
    if (nn_queue->conn < 0) {
        fprintf(stderr, "could not allocate nm queue socket: %s\n",
            nn_strerror(nn_errno()));
        goto error_out;
    }
    so_value = 0;
    rv = nn_setsockopt(nn_queue->conn, NN_SOL_SOCKET, NN_IPV4ONLY, &so_value,
        sizeof(so_value));
    if (rv != 0) {
        fprintf(stderr, "could not enable nm ipv6 support: %s\n",
            nn_strerror(nn_errno()));
        goto error_out;
    }
    nn_queue->remote_id = nn_connect(nn_queue->conn, nn_queue->url);
    if (nn_queue->remote_id < 0) {
        fprintf(stderr, "could not connect nm queue socket: %s\n",
            nn_strerror(nn_errno()));
        goto error_out;
    }

Everything succeeds until I hit nn_connect(). Then the code prints "Address 
family not supported by protocol [97] (src/transports/tcp/ctcp.c:619)" and 
aborts my client application.

If I comment out the nn_setsockopt() call for NN_IPV4ONLY, it works fine. The
system has correct, working global IPv6 addresses available.

The abort happens due to an errno from bind() in nn_usock_bind().

When it aborts with NN_IPV4ONLY set to 0 this is what I see:

Breakpoint 5, nn_usock_bind (self=0x871140, addr=0x7fffffffdf20, addrlen=28) at src/aio/usock_posix.inc:276
(gdb) p addr
$11 = (const struct sockaddr *) 0x7fffffffdf20
(gdb) p *addr
$12 = {sa_family = 10, sa_data = '\000' <repeats 13 times>}
(gdb) n
281         errno_assert (rc == 0);
(gdb) 
283         rc = bind (self->s, addr, (socklen_t) addrlen);
(gdb) 
284         if (nn_slow (rc != 0))
(gdb) 
285             return -errno;
(gdb) p rc
$13 = -1
(gdb) p errno
$14 = 97
(gdb) p *addr
$15 = {sa_family = 10, sa_data = '\000' <repeats 13 times>}

When it succeeds with NN_IPV4ONLY set to default this is what I see:

Breakpoint 5, nn_usock_bind (self=0x871140, addr=0x7fffffffdf30, addrlen=16) at src/aio/usock_posix.inc:276
276         nn_assert_state (self, NN_USOCK_STATE_STARTING);
(gdb) n
279         opt = 1;
(gdb) 
280         rc = setsockopt (self->s, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof (opt));
(gdb) 
281         errno_assert (rc == 0);
(gdb) 
283         rc = bind (self->s, addr, (socklen_t) addrlen);
(gdb) 
284         if (nn_slow (rc != 0))
(gdb) p rc
$16 = 0
(gdb) p addr
$17 = (const struct sockaddr *) 0x7fffffffdf30
(gdb) p *addr
$18 = {sa_family = 2, sa_data = '\000' <repeats 13 times>}
(gdb) p rc
$19 = 0
(gdb) p errno
$20 = 0
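
To make the raw numbers in the two gdb sessions easier to read: on Linux,
sa_family = 2 is AF_INET, sa_family = 10 is AF_INET6, and errno 97 is
EAFNOSUPPORT. A minimal sketch of helpers that translate such values into names
is shown here. The function names family_name and errno_name are hypothetical
and are not part of nanomsg; the numeric values of these constants differ
between platforms, so the helpers compare against the symbolic constants.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: map a sockaddr family value to its symbolic name. */
static const char *family_name(int family)
{
    switch (family) {
    case AF_UNSPEC: return "AF_UNSPEC";
    case AF_INET:   return "AF_INET";
    case AF_INET6:  return "AF_INET6";
    default:        return "other";
    }
}

/* Hypothetical helper: name the errno values relevant to a failed bind(),
   falling back to the system's description for anything else. */
static const char *errno_name(int err)
{
    switch (err) {
    case 0:            return "success";
    case EAFNOSUPPORT: return "EAFNOSUPPORT";
    case EINVAL:       return "EINVAL";
    case EADDRINUSE:   return "EADDRINUSE";
    default:           return strerror(err);
    }
}
```

For example, printing family_name(addr->sa_family) just before the bind() call
and errno_name(errno) just after it would show the same information as the gdb
sessions without needing a debugger.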

It would be a big help if you could help me track this down and patch it. I
don't know this code well enough yet to figure it all out myself.

The main thing I see is that bind() is called with an AF_INET address
(sa_family = 2) in the default case, and with an AF_INET6 address
(sa_family = 10) when NN_IPV4ONLY is disabled.

In both cases the address is zeroed out. Given that IPv6 is enabled, I'd expect
this to work, so why does bind() fail with EAFNOSUPPORT?
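
One way to get EAFNOSUPPORT from bind() on a system with working IPv6 is a
mismatch between the family the socket was created with and the family of the
address passed to bind(). This is not a confirmed diagnosis of the nanomsg code
path (the gdb output shows the family of addr but not the family of self->s);
it is only a standalone sketch of how Linux behaves in that situation. The
helper name bind_in6_wildcard is hypothetical.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Bind a zeroed AF_INET6 address (the IPv6 wildcard, port 0) on a socket
 * created with the given domain. Returns 0 on success, the errno from
 * bind() on failure, or -1 if the socket could not be created at all.
 */
static int bind_in6_wildcard(int domain)
{
    struct sockaddr_in6 addr;
    int fd, rc, err;

    fd = socket(domain, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Same shape as the failing call: family set, everything else zero. */
    memset(&addr, 0, sizeof(addr));
    addr.sin6_family = AF_INET6;

    rc = bind(fd, (struct sockaddr *) &addr, (socklen_t) sizeof(addr));
    err = (rc == 0) ? 0 : errno;

    close(fd);
    return err;
}
```

On Linux, bind_in6_wildcard(AF_INET) is expected to return EAFNOSUPPORT, while
bind_in6_wildcard(AF_INET6) should return 0 on a host with IPv6 available. If
the socket in nn_usock_bind() was created as AF_INET (for example because the
URL contains an IPv4 address), that would be consistent with the failure above,
but I have not verified this against the nanomsg source.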

$ zgrep V6 /boot/config-$(uname -r)
CONFIG_SYSV68_PARTITION=y
CONFIG_IPV6=y

sysctl reports I'm not in bindv6only mode... not sure if it could be related.

net.ipv6.bindv6only = 0

Thanks,
Matthew.