That looks to me like a very good catch.

George Lambert

On Wed, Jan 21, 2015 at 9:20 AM, Boszormenyi Zoltan <zboszor@xxxxx> wrote:

> Hi,
>
> when can I get a review on https://github.com/nanomsg/nanomsg/pull/356 ?
>
> I can't believe this leak only happens on Fedora 20 and 21.
> At least some other Linux distributions should show the same problem.
>
> Thanks in advance,
> Zoltán Böszörményi
>
> On 2014-12-21 07:20, Boszormenyi Zoltan wrote:
> > Hi,
> >
> > if you read the starting mail of this thread, you can see
> > a memory leak reported by Valgrind. Your reply was at
> >
> > //www.freelists.org/post/nanomsg/What-has-changed-since-02-in-socket-handling,1
> >
> > and wondered about the nature of the leak, i.e. whether
> > it's in GLIBC or nanomsg.
> >
> > The number of memory blocks leaked equals the number of
> > getaddrinfo_a() calls and the leak can simply be plugged by calling
> > freeaddrinfo() as in the attached patch.
> >
> > It became somewhat obvious after reading the example
> > in the getaddrinfo() man page that you need to call freeaddrinfo()
> > on the result. But it's not done in
> > src/transports/utils/dns_getaddrinfo_a.inc
> > at the moment and the getaddrinfo_a() man page doesn't
> > explicitly say you need to freeaddrinfo(->ar_result), it only says
> > "The elements of this structure correspond to the arguments of
> > getaddrinfo(3).
> > ...
> > Finally, ar_result corresponds to the res argument; you do not need to
> > initialize this element, it will be automatically set when the
> > request is resolved.
> > ...
> > "
> >
> > Yesterday, I tried disabling getaddrinfo_a() detection in configure.ac
> > to see whether it leaks the same way. To my surprise, I got an
> >
> > Assertion failed: reply && !reply->ai_next (src/transports/utils/dns_getaddrinfo.inc:112)
> >
> > when trying to nn_connect() to localhost. It turned out that GLIBC
> > returns the resolved 127.0.0.1 twice, both for getaddrinfo and getaddrinfo_a.
> > I haven't looked at the differences between the two returned structures
> > but there are indeed valid cases when more than one address
> > is returned, e.g.:
> >
> > $ host www.kernel.org
> > www.kernel.org is an alias for pub.all.kernel.org.
> > pub.all.kernel.org has address 149.20.4.69
> > pub.all.kernel.org has address 198.145.20.140
> > pub.all.kernel.org has address 199.204.44.194
> > pub.all.kernel.org has IPv6 address 2001:4f8:1:10:0:1991:8:25
> >
> > Considering this, the nn_assert() on line 112 in
> > src/transports/utils/dns_getaddrinfo.inc is misguided.
> >
> > Best regards,
> > Zoltán Böszörményi
> >
> > On 2014-12-20 20:04, Boszormenyi Zoltan wrote:
> >> Hi again,
> >>
> >> On 2014-11-29 08:27, Boszormenyi Zoltan wrote:
> >>> Hi,
> >>>
> >>> sorry for not replying to your answer but I just re-subscribed recently
> >>> and I didn't receive the answer from the mailing list.
> >>>
> >>> I sent the test program in private that integrated networking
> >>> into a GLIB mainloop. The real code we use allows switching
> >>> between ZeroMQ 3 (3.2.4, to be exact) and nanomsg at
> >>> configure time and uses static inline wrappers and #define's
> >>> for this reason. We only use the REQ/REP pattern at the moment.
> >>>
> >>> The currently attached test programs (obvious ones, really)
> >>> do exhibit the same problem I described in the first mail on
> >>> Fedora 20 and 21. Messaging stops after a few (2 to 8) thousand
> >>> messages.
> >> the last commit "Fix locking bug in nn_global_submit_statistics()"
> >> has fixed the lockup problem for REQ/REP.
> >>
> >> Thanks!
> >>
> >>> Similar code (or the wrapper API with GLIB mainloop integration)
> >>> that uses ZeroMQ didn't stop. I ran one test overnight
> >>> and after about 72 million packets, the program still runs stable
> >>> and without any leaks. Again, on ZeroMQ 3.2.4.
> >>>
> >>> Regarding the closed sockets in TIME_WAIT state, I noticed that
> >>> they slow down ZeroMQ, too, but don't make it lock up. Setting
> >>> these sysctl variables helps eliminate the slowdown by instructing
> >>> the kernel to reuse those sockets more aggressively:
> >>>
> >>> net.ipv4.tcp_tw_recycle = 1
> >>> net.ipv4.tcp_tw_reuse = 1
> >>>
> >>> Unfortunately, this didn't help nanomsg.
> >>>
> >>> Best regards,
> >>> Zoltán Böszörményi
> >>>
> >>> On 2014-11-21 21:46, Boszormenyi Zoltan wrote:
> >>>> Hi,
> >>>>
> >>>> I use nanomsg with a wrapper library that integrates the networking
> >>>> request-response pattern into the GLIB mainloop via
> >>>> nn_getsockopt(NN_SOL_SOCKET, NN_RCVFD).
> >>>>
> >>>> IIRC, it worked well and without any leaks back then with nanomsg 0.2-ish.
> >>>>
> >>>> Now, I have upgraded to 0.5 and e.g. on Fedora 20 and 21, my example
> >>>> programs lock up after some time. netstat shows there are many sockets
> >>>> in TIME_WAIT state even after both the client and server programs have quit.
> >>>>
> >>>> Also, this memory leak was observed on both Fedora 20 and 21:
> >>>>
> >>>> ==18504== 43,776 (21,888 direct, 21,888 indirect) bytes in 342 blocks are definitely lost in loss record 3,232 of 3,232
> >>>> ==18504==    at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
> >>>> ==18504==    by 0x3E902DA99C: gaih_inet (in /usr/lib64/libc-2.18.so)
> >>>> ==18504==    by 0x3E902DE38C: getaddrinfo (in /usr/lib64/libc-2.18.so)
> >>>> ==18504==    by 0x5085FEF: handle_requests (in /usr/lib64/libanl-2.18.so)
> >>>> ==18504==    by 0x3E90E07EE4: start_thread (in /usr/lib64/libpthread-2.18.so)
> >>>> ==18504==    by 0x3E902F4B8C: clone (in /usr/lib64/libc-2.18.so)
> >>>>
> >>>> My understanding with nanomsg 0.2 was that I need these with REQ/REP:
> >>>>
> >>>> server:
> >>>>   initialization: nn_socket, nn_bind
> >>>>   in the handler loop: nn_recv[msg] + nn_freemsg on the incoming
> >>>>     message, then nn_send[msg] to the client
> >>>>   when quitting: nn_close
> >>>>
> >>>> client (per REQ/REP message exchange):
> >>>>   nn_socket, nn_connect, nn_send[msg], nn_recv[msg], nn_close
> >>>>
> >>>> Do I need to nn_close() the socket on the server side or anything else
> >>>> after the reply was sent?
> >>>>
> >>>> Thanks in advance,
> >>>> Zoltán Böszörményi
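
For reference, the two points raised in this thread (the list returned by getaddrinfo() has to be released with freeaddrinfo(), and a lookup may legitimately return more than one address) can be sketched in a few lines of C. This is only a minimal illustration under stated assumptions, not the patch from the pull request: count_addresses() is a hypothetical helper that does not exist in nanomsg, and the AF_UNSPEC / SOCK_STREAM hints are my own choice.

```c
#define _POSIX_C_SOURCE 200112L

#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not part of nanomsg): resolve host/port, walk
   every entry of the returned list, then release the list.
   Returns the number of addresses found, or -1 if resolution failed. */
int count_addresses(const char *host, const char *port)
{
    struct addrinfo hints;
    struct addrinfo *result = NULL;
    struct addrinfo *it;
    int count = 0;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     /* assumption: accept IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* assumption: TCP-style sockets only */

    rc = getaddrinfo(host, port, &hints, &result);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
        return -1;
    }

    /* Do not assume a single entry: follow ai_next until the end. */
    for (it = result; it != NULL; it = it->ai_next)
        ++count;

    /* Without this call the whole list is leaked on every lookup. */
    freeaddrinfo(result);
    return count;
}
```

Calling count_addresses("127.0.0.1", "5555") should report at least one entry, and a host name with several address records would report more, which is why asserting !reply->ai_next looks too strict. According to the discussion above, the same freeaddrinfo() call is needed for the ar_result member that getaddrinfo_a() fills in.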