-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I've fixed it in the mainline. However, I have to test program to check it. Can you test it yourself and report back? Martin On 15/11/14 06:32, Martin Sustrik wrote: > The analysis sounds correct. > > If the socket is closed between global.c:1121 and global.c:1126 > there's an race condition... > > Martin > > On 07/11/14 16:23, Shon Love wrote: >> Hey nanomsg list! > > > >> I have stack traces for a race condition that crashes my test >> app in nanomsg. I am working on windows 7 x64. I was able to >> get a minidump of the app at this point, too. The dump >> (compressed) is 20MB, though, so I won’t attach it to this email. >> J > > > >> I’m not familiar enough with nanomsg internals to create a patch >> to fix this (yet). So, I thought I’d share this with the list >> and maybe someone who knows more can provide the needed patch. > > > >> In the test app, I create and destroy PAIR sockets fairly >> quickly. It’s a Qt5 application, and I’ve created a >> NanomsgReactor class that manages polling the nanomsg sockets in >> a separate thread. The reactor also is (usually) given the >> responsibility of closing the sockets it is polling if the >> application has asked that they be removed from the reactor – >> this helps ensure the sockets aren’t closed while they’re being >> polled. > > > >> It appears that nn_close will free the socket, then NULL It’s >> slot. Meanwhile, the nanomsg thread will call >> nn_global_submit_statistics, which rolls through the socket list >> looking for NON-NULL sockets, copy the pointer, check for NULL, >> then lock the socket’s context. This operation, however doesn’t >> protect from the socket being freed from the time the NULL is >> checked and the socket’s context being locked. It would seem >> there needs to be a global lock around the code that gets the >> socket, checks for NULL, then locks the socket’s context. > > > >> Thanks, > >> Shon > > > > > > > >> race condition from main branch code base (10/7): > > > > > > > >> on thread1 (application socket polling thread): > >> nn_close freed the socket, then set the slot to NULL > > > >> ntdll.dll!_NtWaitForSingleObject@12() + 0x15 bytes > >> ntdll.dll!_NtWaitForSingleObject@12() + 0x15 bytes > >> kernel32.dll!_HeapFree@12() + 0x14 bytes > >> ntdll.dll!_RtlEnterCriticalSection@4() + 0x16a68 bytes > >> nanomsg.dll!nn_mutex_lock(nn_mutex * self) Line 40 + 0xc bytes >> C > >> nanomsg.dll!nn_ctx_enter(nn_ctx * self) Line 48 + 0x9 bytes >> C > >> nanomsg.dll!nn_global_term() Line 349 + 0xa bytes C > >>> nanomsg.dll!nn_close(int s) Line 555 C > >> testapp.exe!Peering::NanomsgReactor::uninitialization() Line 87 >> + 0xa bytes C++ > >> testapp.exe!Initializable::uninitialize() Line 36 C++ > >> testapp.exe!Peering::NanomsgReactor::~NanomsgReactor() Line 27 >> C++ > >> testapp.exe!Peering::NanomsgReactor::`scalar deleting >> destructor'() + 0xf bytes C++ > >> Qt5Cored.dll!qDeleteInEventHandler(QObject * o) Line 4346 + >> 0x21 bytes C++ > >> Qt5Cored.dll!QObject::event(QEvent * e) Line 1232 + 0xc bytes >> C++ > > >> Qt5Cored.dll!QCoreApplicationPrivate::notify_helper(QObject * >> receiver, QEvent * event) Line 1053 C++ > >> Qt5Cored.dll!QCoreApplication::notify(QObject * receiver, QEvent >> * event) Line 997 + 0x25 bytes C++ > >> Qt5Cored.dll!QCoreApplication::notifyInternal(QObject * >> receiver, QEvent * event) Line 935 + 0x15 bytes >> C++ > >> Qt5Cored.dll!QCoreApplication::sendEvent(QObject * receiver, >> QEvent * event) Line 237 + 0x39 bytes C++ > > >> Qt5Cored.dll!QCoreApplicationPrivate::sendPostedEvents(QObject * >> receiver, int event_type, QThreadData * data) Line 1539 + 0xd >> bytes C++ > >> Qt5Cored.dll!QCoreApplication::sendPostedEvents(QObject * >> receiver, int event_type) Line 1397 + 0x11 bytes C++ > >> Qt5Cored.dll!QThreadPrivate::finish(void * arg, bool lockAnyway) >> Line 424 + 0x9 bytes C++ > >> Qt5Cored.dll!QThreadPrivate::start(void * arg) Line 409 + 0xb >> bytes C++ > >> msvcr100d.dll!__beginthreadex() + 0x243 bytes > >> msvcr100d.dll!__beginthreadex() + 0x1d4 bytes > >> kernel32.dll!@BaseThreadInitThunk@12() + 0x12 bytes > >> ntdll.dll!___RtlUserThreadStart@8() + 0x27 bytes > >> ntdll.dll!__RtlUserThreadStart@8() + 0x1b bytes > > > > > > > >> on thread2 (nanomsg context thread?): > >> nn_global_submit_statistics gets the socket, checks for NULL, >> then proceeds to lock the socket context - on a socket that has >> been free'd by thread1, presumably in the time it took to look it >> up and check for NULL. > > > >> ntdll.dll!_RtlpWaitOnCriticalSection@8() + 0x99 bytes > >> ntdll.dll!_RtlEnterCriticalSection@4() + 0x16a68 bytes > >> nanomsg.dll!nn_mutex_lock(nn_mutex * self) Line 40 + 0xc bytes >> C > >> nanomsg.dll!nn_ctx_enter(nn_ctx * self) Line 48 + 0x9 bytes >> C > >>> nanomsg.dll!nn_global_submit_statistics() Line 1034 + 0xc >> bytes C > >> nanomsg.dll!nn_global_handler(nn_fsm * self, int src, int type, >> void * srcptr) Line 1189 C > >> nanomsg.dll!nn_fsm_feed(nn_fsm * self, int src, int type, void * >> srcptr) Line 72 + 0x19 bytes C > >> nanomsg.dll!nn_fsm_event_process(nn_fsm_event * self) Line 66 + >> 0x17 bytes C > >> nanomsg.dll!nn_ctx_leave(nn_ctx * self) Line 63 + 0x9 bytes >> C > >> nanomsg.dll!nn_worker_routine(void * arg) Line 159 + 0xe bytes >> C > >> nanomsg.dll!nn_thread_main_routine(void * arg) Line 30 + 0x10 >> bytes C > >> msvcr100d.dll!__beginthreadex() + 0x243 bytes > >> msvcr100d.dll!__beginthreadex() + 0x1d4 bytes > >> kernel32.dll!@BaseThreadInitThunk@12() + 0x12 bytes > >> ntdll.dll!___RtlUserThreadStart@8() + 0x27 bytes > >> ntdll.dll!__RtlUserThreadStart@8() + 0x1b bytes > > > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQEcBAEBAgAGBQJUZuZlAAoJENTpVjxCNN9YIn4H/1AlSrgavyeDjaJ1tAWjRgjN LJYE+uHrN4kU8ozvtKyXa+3R2YSbl071n4uHUPbCO0B7Hn3swA/b4YeKp0OciZ0k iMjizPxchheWlAUGZJVl1yo+oga9+3dbKrXMEv21SKyfx1MBQCB61Ja9Jt4yN6k+ iohqZlbDRxX1CEJSZcNl6ab+L4+K58vGOr/IwvjRs8ANgu0dk39VeiyQ4lfLTgTM So6dqEJqRSTM1GQMKDBd1i0bavU/SvtUELVQ3Y5VbZ6hizwgYfLwixGm53/0y6hv 5r9qY7o0U17WybwMKQDRLmjK/eAoIdWetKzzDkZObpiGgbXhDy5H33feButu5uE= =R/kc -----END PGP SIGNATURE-----