[nanomsg] Re: What has changed since 0.2 in socket handling?

  • From: Boszormenyi Zoltan <zboszor@xxxxx>
  • To: nanomsg@xxxxxxxxxxxxx
  • Date: Sat, 20 Dec 2014 20:04:12 +0100

Hi again,

On 2014-11-29 08:27, Boszormenyi Zoltan wrote:
> Hi,
>
> sorry for not replying to your answer earlier, but I only re-subscribed
> recently and did not receive your answer from the mailing list.
>
> I sent the test program in private; it integrates the networking
> into a GLib mainloop. The real code we use can switch between
> ZeroMQ 3 (3.2.4, to be exact) and nanomsg at configure time, using
> static inline wrappers and #define's for this purpose. We only use
> the REQ/REP pattern at the moment.
>
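Roughly, the wrapper layer I am talking about looks like this (a simplified
sketch only; the msg_* names and the USE_NANOMSG macro are made up here, not
our actual code):

#ifdef USE_NANOMSG
#include <nanomsg/nn.h>
#include <nanomsg/reqrep.h>
typedef int msg_sock_t;                   /* nanomsg sockets are plain ints */
#define MSG_REP NN_REP
static inline msg_sock_t msg_socket(int type) { return nn_socket(AF_SP, type); }
static inline int msg_close(msg_sock_t s)     { return nn_close(s); }
#else  /* ZeroMQ 3.x */
#include <zmq.h>
typedef void *msg_sock_t;                 /* ZeroMQ sockets are opaque pointers */
#define MSG_REP ZMQ_REP
extern void *msg_ctx;                     /* created once with zmq_ctx_new() */
static inline msg_sock_t msg_socket(int type) { return zmq_socket(msg_ctx, type); }
static inline int msg_close(msg_sock_t s)     { return zmq_close(s); }
#endif

The rest of the code only ever sees the msg_* names, so the backend is chosen
purely at configure time.
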
> The currently attached test programs (obvious ones, really) exhibit the
> same problem I described in the first mail on Fedora 20 and 21:
> messaging stops after a few (2 to 8) thousand messages.

The last commit, "Fix locking bug in nn_global_submit_statistics()",
has fixed the lockup problem for REQ/REP.

Thanks!

>
> Similar code (i.e. the same wrapper API with GLib mainloop integration)
> built against ZeroMQ did not stop: I ran one test overnight, and after
> about 72 million packets the program was still running, stable and
> without any leaks. Again, this was on ZeroMQ 3.2.4.
>
> Regarding the closed sockets in TIME_WAIT state, I noticed that they
> slow down ZeroMQ too, but they don't make it lock up. Setting these
> sysctl variables helps eliminate the slowdown by instructing the kernel
> to reuse those sockets more aggressively:
>
> net.ipv4.tcp_tw_recycle = 1
> net.ipv4.tcp_tw_reuse = 1
>
> Unfortunately, this didn't help nanomsg.
>
> Best regards,
> Zoltán Böszörményi
>
> On 2014-11-21 21:46, Boszormenyi Zoltan wrote:
>> Hi,
>>
>> I use nanomsg with a wrapper library that integrates the networking
>> request-response pattern into the GLib mainloop via
>> nn_getsockopt(NN_SOL_SOCKET, NN_RCVFD).
>>
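
For reference, the GLib integration in that wrapper boils down to roughly
this (a simplified sketch, not the actual wrapper code; on_readable and
serve_socket are made-up names):

#include <glib.h>
#include <nanomsg/nn.h>
#include <nanomsg/reqrep.h>

static gboolean on_readable(GIOChannel *source, GIOCondition cond, gpointer data)
{
    int s = GPOINTER_TO_INT(data);
    void *buf = NULL;
    (void)source;
    (void)cond;

    /* The descriptor from NN_RCVFD only signals readability; the message
     * itself is still read with nn_recv. */
    int n = nn_recv(s, &buf, NN_MSG, NN_DONTWAIT);
    if (n >= 0) {
        /* ... handle the request, then send the reply ... */
        nn_send(s, "OK", 2, 0);
        nn_freemsg(buf);
    }
    return TRUE;            /* keep the watch installed */
}

static void serve_socket(int s)
{
    int fd;
    size_t sz = sizeof(fd);

    /* NN_RCVFD yields a file descriptor that becomes readable whenever a
     * message can be received on the nanomsg socket. */
    nn_getsockopt(s, NN_SOL_SOCKET, NN_RCVFD, &fd, &sz);

    GIOChannel *ch = g_io_channel_unix_new(fd);
    g_io_add_watch(ch, G_IO_IN, on_readable, GINT_TO_POINTER(s));
}
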
>> IIRC, it worked well and without any leaks back then, with nanomsg 0.2-ish.
>>
>> Now I have upgraded to 0.5 and, e.g. on Fedora 20 and 21, my example
>> programs lock up after some time. netstat shows many sockets in TIME_WAIT
>> state even after both the client and server programs have quit.
>>
>> Also, this memory leak was observed on both Fedora 20 and 21:
>>
>> ==18504== 43,776 (21,888 direct, 21,888 indirect) bytes in 342 blocks are definitely lost in loss record 3,232 of 3,232
>> ==18504==    at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>> ==18504==    by 0x3E902DA99C: gaih_inet (in /usr/lib64/libc-2.18.so)
>> ==18504==    by 0x3E902DE38C: getaddrinfo (in /usr/lib64/libc-2.18.so)
>> ==18504==    by 0x5085FEF: handle_requests (in /usr/lib64/libanl-2.18.so)
>> ==18504==    by 0x3E90E07EE4: start_thread (in /usr/lib64/libpthread-2.18.so)
>> ==18504==    by 0x3E902F4B8C: clone (in /usr/lib64/libc-2.18.so)
>>
>> My understanding with nanomsg 0.2 was that I need these with REQ/REP:
>>
>> server:
>> initialization: nn_socket, nn_bind
>> in the handler loop: nn_recv[msg] + nn_freemsg on the incoming message,
>> then nn_send[msg] to the client
>> when quitting: nn_close
>>
>> client (per REQ/REP message exchange):
>> nn_socket, nn_connect, nn_send[msg], nn_recv[msg], nn_close
>>
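
The client side of one such exchange then comes down to something like this
(sketch only; do_one_request and the url argument are illustrative names):

#include <nanomsg/nn.h>
#include <nanomsg/reqrep.h>

static int do_one_request(const char *url, const void *req, size_t len)
{
    void *reply = NULL;
    int n;

    int s = nn_socket(AF_SP, NN_REQ);
    if (s < 0)
        return -1;

    if (nn_connect(s, url) < 0) {
        nn_close(s);
        return -1;
    }

    nn_send(s, req, len, 0);

    /* With NN_MSG, nanomsg allocates the reply buffer, which must be
     * released with nn_freemsg(). */
    n = nn_recv(s, &reply, NN_MSG, 0);
    if (n >= 0)
        nn_freemsg(reply);

    nn_close(s);
    return n;
}
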
>> Do I need to nn_close() the socket on the server side, or do anything
>> else, after the reply has been sent?
>>
>> Thanks in advance,
>> Zoltán Böszörményi
>>

