[nanomsg] Re: Race condition stack trace

  • From: Martin Sustrik <sustrik@xxxxxxxxxx>
  • To: nanomsg <nanomsg@xxxxxxxxxxxxx>
  • Date: Sat, 15 Nov 2014 06:36:37 +0100

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I've fixed it in the mainline. However, I have to test program to
check it. Can you test it yourself and report back?

Martin

On 15/11/14 06:32, Martin Sustrik wrote:
> The analysis sounds correct.
> 
> If the socket is closed between global.c:1121 and global.c:1126 
> there's an race condition...
> 
> Martin
> 
> On 07/11/14 16:23, Shon Love wrote:
>> Hey nanomsg list!
> 
> 
> 
>> I have stack traces for a race condition that crashes my test
>> app in nanomsg.  I am working on windows 7 x64.  I was able to
>> get a minidump of the app at this point, too.  The dump
>> (compressed) is 20MB, though, so I won’t attach it to this email.
>> J
> 
> 
> 
>> I’m not familiar enough with nanomsg internals to create a patch 
>> to fix this (yet).  So, I thought I’d share this with the list
>> and maybe someone who knows more can provide the needed patch.
> 
> 
> 
>> In the test app, I create and destroy PAIR sockets fairly
>> quickly. It’s a Qt5 application, and I’ve created a
>> NanomsgReactor class that manages polling the nanomsg sockets in
>> a separate thread.  The reactor also is (usually) given the
>> responsibility of closing the sockets it is polling if the
>> application has asked that they be removed from the reactor –
>> this helps ensure the sockets aren’t closed while they’re being
>> polled.
> 
> 
> 
>> It appears that nn_close will free the socket, then NULL It’s
>> slot. Meanwhile, the nanomsg thread will call 
>> nn_global_submit_statistics, which rolls through the socket list 
>> looking for NON-NULL sockets, copy the pointer, check for NULL, 
>> then lock the socket’s context.  This operation, however doesn’t 
>> protect from the socket being freed from the time the NULL is 
>> checked and the socket’s context being locked.  It would seem
>> there needs to be a global lock around the code that gets the
>> socket, checks for NULL, then locks the socket’s context.
> 
> 
> 
>> Thanks,
> 
>> Shon
> 
> 
> 
> 
> 
> 
> 
>> race condition from main branch code base (10/7):
> 
> 
> 
> 
> 
> 
> 
>> on thread1 (application socket polling thread):
> 
>> nn_close freed the socket, then set the slot to NULL
> 
> 
> 
>> ntdll.dll!_NtWaitForSingleObject@12()  + 0x15 bytes
> 
>> ntdll.dll!_NtWaitForSingleObject@12()  + 0x15 bytes
> 
>> kernel32.dll!_HeapFree@12()  + 0x14 bytes
> 
>> ntdll.dll!_RtlEnterCriticalSection@4()  + 0x16a68 bytes
> 
>> nanomsg.dll!nn_mutex_lock(nn_mutex * self)  Line 40 + 0xc bytes 
>> C
> 
>> nanomsg.dll!nn_ctx_enter(nn_ctx * self)  Line 48 + 0x9 bytes
>> C
> 
>> nanomsg.dll!nn_global_term()  Line 349 + 0xa bytes       C
> 
>>> nanomsg.dll!nn_close(int s)  Line 555     C
> 
>> testapp.exe!Peering::NanomsgReactor::uninitialization() Line 87
>> + 0xa bytes     C++
> 
>> testapp.exe!Initializable::uninitialize()  Line 36   C++
> 
>> testapp.exe!Peering::NanomsgReactor::~NanomsgReactor() Line 27 
>> C++
> 
>> testapp.exe!Peering::NanomsgReactor::`scalar deleting
>> destructor'() + 0xf bytes            C++
> 
>> Qt5Cored.dll!qDeleteInEventHandler(QObject * o)  Line 4346 +
>> 0x21 bytes          C++
> 
>> Qt5Cored.dll!QObject::event(QEvent * e)  Line 1232 + 0xc bytes 
>> C++
> 
> 
>> Qt5Cored.dll!QCoreApplicationPrivate::notify_helper(QObject * 
>> receiver, QEvent * event)  Line 1053    C++
> 
>> Qt5Cored.dll!QCoreApplication::notify(QObject * receiver, QEvent
>> * event)  Line 997 + 0x25 bytes          C++
> 
>> Qt5Cored.dll!QCoreApplication::notifyInternal(QObject *
>> receiver, QEvent * event)  Line 935 + 0x15 bytes
>> C++
> 
>> Qt5Cored.dll!QCoreApplication::sendEvent(QObject * receiver,
>> QEvent * event)  Line 237 + 0x39 bytes C++
> 
> 
>> Qt5Cored.dll!QCoreApplicationPrivate::sendPostedEvents(QObject *
>>  receiver, int event_type, QThreadData * data)  Line 1539 + 0xd 
>> bytes     C++
> 
>> Qt5Cored.dll!QCoreApplication::sendPostedEvents(QObject *
>> receiver, int event_type)  Line 1397 + 0x11 bytes     C++
> 
>> Qt5Cored.dll!QThreadPrivate::finish(void * arg, bool lockAnyway) 
>> Line 424 + 0x9 bytes C++
> 
>> Qt5Cored.dll!QThreadPrivate::start(void * arg)  Line 409 + 0xb 
>> bytes       C++
> 
>> msvcr100d.dll!__beginthreadex()  + 0x243 bytes
> 
>> msvcr100d.dll!__beginthreadex()  + 0x1d4 bytes
> 
>> kernel32.dll!@BaseThreadInitThunk@12()  + 0x12 bytes
> 
>> ntdll.dll!___RtlUserThreadStart@8()  + 0x27 bytes
> 
>> ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes
> 
> 
> 
> 
> 
> 
> 
>> on thread2 (nanomsg context thread?):
> 
>> nn_global_submit_statistics gets the socket, checks for NULL,
>> then proceeds to lock the socket context - on a socket that has
>> been free'd by thread1, presumably in the time it took to look it
>> up and check for NULL.
> 
> 
> 
>> ntdll.dll!_RtlpWaitOnCriticalSection@8()  + 0x99 bytes
> 
>> ntdll.dll!_RtlEnterCriticalSection@4()  + 0x16a68 bytes
> 
>> nanomsg.dll!nn_mutex_lock(nn_mutex * self)  Line 40 + 0xc bytes 
>> C
> 
>> nanomsg.dll!nn_ctx_enter(nn_ctx * self)  Line 48 + 0x9 bytes
>> C
> 
>>> nanomsg.dll!nn_global_submit_statistics()  Line 1034 + 0xc
>> bytes             C
> 
>> nanomsg.dll!nn_global_handler(nn_fsm * self, int src, int type, 
>> void * srcptr)  Line 1189                C
> 
>> nanomsg.dll!nn_fsm_feed(nn_fsm * self, int src, int type, void * 
>> srcptr)  Line 72 + 0x19 bytes     C
> 
>> nanomsg.dll!nn_fsm_event_process(nn_fsm_event * self) Line 66 + 
>> 0x17 bytes             C
> 
>> nanomsg.dll!nn_ctx_leave(nn_ctx * self)  Line 63 + 0x9 bytes
>> C
> 
>> nanomsg.dll!nn_worker_routine(void * arg)  Line 159 + 0xe bytes 
>> C
> 
>> nanomsg.dll!nn_thread_main_routine(void * arg)  Line 30 + 0x10 
>> bytes C
> 
>> msvcr100d.dll!__beginthreadex()  + 0x243 bytes
> 
>> msvcr100d.dll!__beginthreadex()  + 0x1d4 bytes
> 
>> kernel32.dll!@BaseThreadInitThunk@12()  + 0x12 bytes
> 
>> ntdll.dll!___RtlUserThreadStart@8()  + 0x27 bytes
> 
>> ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes
> 
> 
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQEcBAEBAgAGBQJUZuZlAAoJENTpVjxCNN9YIn4H/1AlSrgavyeDjaJ1tAWjRgjN
LJYE+uHrN4kU8ozvtKyXa+3R2YSbl071n4uHUPbCO0B7Hn3swA/b4YeKp0OciZ0k
iMjizPxchheWlAUGZJVl1yo+oga9+3dbKrXMEv21SKyfx1MBQCB61Ja9Jt4yN6k+
iohqZlbDRxX1CEJSZcNl6ab+L4+K58vGOr/IwvjRs8ANgu0dk39VeiyQ4lfLTgTM
So6dqEJqRSTM1GQMKDBd1i0bavU/SvtUELVQ3Y5VbZ6hizwgYfLwixGm53/0y6hv
5r9qY7o0U17WybwMKQDRLmjK/eAoIdWetKzzDkZObpiGgbXhDy5H33feButu5uE=
=R/kc
-----END PGP SIGNATURE-----

Other related posts: