Hi again,

On 2014-11-29 08:27, Boszormenyi Zoltan wrote:
> Hi,
>
> sorry for not replying to your answer, but I re-subscribed only recently
> and I didn't receive the answer from the mailing list.
>
> In private, I sent the test program that integrates networking
> into a GLIB mainloop. The real code we use allows switching
> between ZeroMQ 3 (3.2.4, to be exact) and nanomsg at
> configure time and uses static inline wrappers and #define's
> for this reason. We only use the REQ/REP pattern at the moment.
>
> The currently attached test programs (obvious ones, really)
> do exhibit the same problem I described in the first mail on
> Fedora 20 and 21. Messaging stops after a few (2 to 8) thousand
> messages.

The last commit, "Fix locking bug in nn_global_submit_statistics()",
has fixed the lockup problem for REQ/REP. Thanks!

> Similar code (or the wrapper API with GLIB mainloop integration)
> that uses ZeroMQ didn't stop. I ran one test overnight, and
> after about 72 million packets the program was still running stably
> and without any leaks. Again, on ZeroMQ 3.2.4.
>
> Regarding the closed sockets in TIME_WAIT state, I noticed that
> they slow down ZeroMQ, too, but don't make it lock up. Setting
> these sysctl variables helps eliminate the slowdown by instructing
> the kernel to reuse those sockets more aggressively:
>
> net.ipv4.tcp_tw_recycle = 1
> net.ipv4.tcp_tw_reuse = 1
>
> Unfortunately, this didn't help nanomsg.
>
> Best regards,
> Zoltán Böszörményi
>
> On 2014-11-21 21:46, Boszormenyi Zoltan wrote:
>> Hi,
>>
>> I use nanomsg with a wrapper library that integrates the networking
>> request-response pattern into the GLIB mainloop via
>> nn_getsockopt(NN_SOL_SOCKET, NN_RCVFD).
>>
>> IIRC, it worked well and without any leaks back then with nanomsg 0.2-ish.
>>
>> Now I have upgraded to 0.5 and, e.g. on Fedora 20 and 21, my example
>> programs lock up after some time.
>> netstat shows there are many sockets
>> in TIME_WAIT state even after both the client and server programs have quit.
>>
>> Also, this memory leak was observed on both Fedora 20 and 21:
>>
>> ==18504== 43,776 (21,888 direct, 21,888 indirect) bytes in 342 blocks are definitely lost in loss record 3,232 of 3,232
>> ==18504==    at 0x4A0645D: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>> ==18504==    by 0x3E902DA99C: gaih_inet (in /usr/lib64/libc-2.18.so)
>> ==18504==    by 0x3E902DE38C: getaddrinfo (in /usr/lib64/libc-2.18.so)
>> ==18504==    by 0x5085FEF: handle_requests (in /usr/lib64/libanl-2.18.so)
>> ==18504==    by 0x3E90E07EE4: start_thread (in /usr/lib64/libpthread-2.18.so)
>> ==18504==    by 0x3E902F4B8C: clone (in /usr/lib64/libc-2.18.so)
>>
>> My understanding with nanomsg 0.2 was that I need these with REQ/REP:
>>
>> server:
>>   initialization: nn_socket, nn_bind
>>   in the handler loop: nn_recv[msg] + nn_freemsg on the incoming message,
>>     then nn_send[msg] to the client
>>   when quitting: nn_close
>>
>> client (per REQ/REP message exchange):
>>   nn_socket, nn_connect, nn_send[msg], nn_recv[msg], nn_close
>>
>> Do I need to nn_close() the socket on the server side or anything else
>> after the reply was sent?
>>
>> Thanks in advance,
>> Zoltán Böszörményi