[nanomsg] Race condition stack trace

  • From: Shon Love <slove@xxxxxxxxxx>
  • To: "nanomsg@xxxxxxxxxxxxx" <nanomsg@xxxxxxxxxxxxx>
  • Date: Fri, 7 Nov 2014 15:23:22 +0000

Hey nanomsg list!

  I have stack traces for a race condition that crashes my test app in nanomsg. 
 I am working on windows 7 x64.  I was able to get a minidump of the app at 
this point, too.  The dump (compressed) is 20MB, though, so I won't attach it 
to this email. :)

  I'm not familiar enough with nanomsg internals to create a patch to fix this 
(yet).  So, I thought I'd share this with the list and maybe someone who knows 
more can provide the needed patch.

  In the test app, I create and destroy PAIR sockets fairly quickly.  It's a 
Qt5 application, and I've created a NanomsgReactor class that manages polling 
the nanomsg sockets in a separate thread.  The reactor also is (usually) given 
the responsibility of closing the sockets it is polling if the application has 
asked that they be removed from the reactor - this helps ensure the sockets 
aren't closed while they're being polled.

  It appears that nn_close will free the socket, then NULL It's slot.  
Meanwhile, the nanomsg thread will call nn_global_submit_statistics, which 
rolls through the socket list looking for NON-NULL sockets, copy the pointer, 
check for NULL, then lock the socket's context.  This operation, however 
doesn't protect from the socket being freed from the time the NULL is checked 
and the socket's context being locked.  It would seem there needs to be a 
global lock around the code that gets the socket, checks for NULL, then locks 
the socket's context.

Thanks,
Shon



race condition from main branch code base (10/7):



on thread1 (application socket polling thread):
                nn_close freed the socket, then set the slot to NULL

               ntdll.dll!_NtWaitForSingleObject@12()  + 0x15 bytes
               ntdll.dll!_NtWaitForSingleObject@12()  + 0x15 bytes
               kernel32.dll!_HeapFree@12()  + 0x14 bytes
               ntdll.dll!_RtlEnterCriticalSection@4()  + 0x16a68 bytes
               nanomsg.dll!nn_mutex_lock(nn_mutex * self)  Line 40 + 0xc bytes  
       C
               nanomsg.dll!nn_ctx_enter(nn_ctx * self)  Line 48 + 0x9 bytes    C
               nanomsg.dll!nn_global_term()  Line 349 + 0xa bytes       C
>             nanomsg.dll!nn_close(int s)  Line 555     C
               testapp.exe!Peering::NanomsgReactor::uninitialization()  Line 87 
+ 0xa bytes     C++
               testapp.exe!Initializable::uninitialize()  Line 36   C++
               testapp.exe!Peering::NanomsgReactor::~NanomsgReactor()  Line 27  
   C++
               testapp.exe!Peering::NanomsgReactor::`scalar deleting 
destructor'()  + 0xf bytes            C++
               Qt5Cored.dll!qDeleteInEventHandler(QObject * o)  Line 4346 + 
0x21 bytes          C++
               Qt5Cored.dll!QObject::event(QEvent * e)  Line 1232 + 0xc bytes   
            C++
               Qt5Cored.dll!QCoreApplicationPrivate::notify_helper(QObject * 
receiver, QEvent * event)  Line 1053    C++
               Qt5Cored.dll!QCoreApplication::notify(QObject * receiver, QEvent 
* event)  Line 997 + 0x25 bytes          C++
               Qt5Cored.dll!QCoreApplication::notifyInternal(QObject * 
receiver, QEvent * event)  Line 935 + 0x15 bytes                C++
               Qt5Cored.dll!QCoreApplication::sendEvent(QObject * receiver, 
QEvent * event)  Line 237 + 0x39 bytes C++
               Qt5Cored.dll!QCoreApplicationPrivate::sendPostedEvents(QObject * 
receiver, int event_type, QThreadData * data)  Line 1539 + 0xd bytes     C++
               Qt5Cored.dll!QCoreApplication::sendPostedEvents(QObject * 
receiver, int event_type)  Line 1397 + 0x11 bytes     C++
               Qt5Cored.dll!QThreadPrivate::finish(void * arg, bool lockAnyway) 
 Line 424 + 0x9 bytes C++
               Qt5Cored.dll!QThreadPrivate::start(void * arg)  Line 409 + 0xb 
bytes       C++
               msvcr100d.dll!__beginthreadex()  + 0x243 bytes
               msvcr100d.dll!__beginthreadex()  + 0x1d4 bytes
               kernel32.dll!@BaseThreadInitThunk@12()  + 0x12 bytes
               ntdll.dll!___RtlUserThreadStart@8()  + 0x27 bytes
               ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes



on thread2 (nanomsg context thread?):
                nn_global_submit_statistics gets the socket, checks for NULL, 
then proceeds to lock the socket context - on a socket that has been free'd by 
thread1, presumably in the time it took to look it up and check for NULL.

               ntdll.dll!_RtlpWaitOnCriticalSection@8()  + 0x99 bytes
               ntdll.dll!_RtlEnterCriticalSection@4()  + 0x16a68 bytes
               nanomsg.dll!nn_mutex_lock(nn_mutex * self)  Line 40 + 0xc bytes  
       C
               nanomsg.dll!nn_ctx_enter(nn_ctx * self)  Line 48 + 0x9 bytes    C
>             nanomsg.dll!nn_global_submit_statistics()  Line 1034 + 0xc bytes  
>            C
               nanomsg.dll!nn_global_handler(nn_fsm * self, int src, int type, 
void * srcptr)  Line 1189                C
               nanomsg.dll!nn_fsm_feed(nn_fsm * self, int src, int type, void * 
srcptr)  Line 72 + 0x19 bytes     C
               nanomsg.dll!nn_fsm_event_process(nn_fsm_event * self)  Line 66 + 
0x17 bytes             C
               nanomsg.dll!nn_ctx_leave(nn_ctx * self)  Line 63 + 0x9 bytes     
C
               nanomsg.dll!nn_worker_routine(void * arg)  Line 159 + 0xe bytes  
          C
               nanomsg.dll!nn_thread_main_routine(void * arg)  Line 30 + 0x10 
bytes C
               msvcr100d.dll!__beginthreadex()  + 0x243 bytes
               msvcr100d.dll!__beginthreadex()  + 0x1d4 bytes
               kernel32.dll!@BaseThreadInitThunk@12()  + 0x12 bytes
               ntdll.dll!___RtlUserThreadStart@8()  + 0x27 bytes
               ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes

Other related posts: