[haiku-commits] haiku: hrev53345 - src/system/kernel src/system/kernel/scheduler headers/private/system src/bin/debug/time_stats src/apps/debuganalyzer/model

  • From: waddlesplash <waddlesplash@xxxxxxxxx>
  • To: haiku-commits@xxxxxxxxxxxxx
  • Date: Mon, 5 Aug 2019 23:10:33 -0400 (EDT)

hrev53345 adds 2 changesets to branch 'master'
old head: 64a4b621b81e85632a4fcd6eb59f9e6b6e1e3f0b
new head: af0be8dbc5df319bb65ae62cb9924d7344847a73
overview: 
https://git.haiku-os.org/haiku/log/?qt=range&q=af0be8dbc5df+%5E64a4b621b81e

----------------------------------------------------------------------------

2c588b031fd6: kernel: Properly separate and handle THREAD_BLOCK_TYPE_USER.
  
  Consider this scenario:
   * A userland thread puts its ID into some structure so that it
     can be woken up later, sets its wait_status to initiate the
     begin of the wait, and then calls _user_block_thread.
   * A second thread finishes whatever task the first thread
     intended to wait for, reads the ID almost immediately
     after it was written, and calls _user_unblock_thread.
   * _user_unblock_thread was called so soon that the first
     thread is not yet blocked on the _user_block_thread block,
     but is instead blocked on e.g. the thread's main mutex.
   * The first thread's thread_block() call returns B_OK.
     As in this example it was inside mutex_lock, it thinks
     that it now owns the mutex.
   * But it doesn't own the mutex, and so (until yesterday)
     all sorts of mayhem and then a random crash occurs, or
     (after yesterday) an assert-failure is tripped that
     the thread does not own the mutex it expected to.
  
  The above scenario is not a hypothetical, but is in fact the
  exact scenario behind the strange panics in #15211.
  
  The solution is to only have _user_unblock_thread actually
  unblock threads that were blocked by _user_block_thread,
  so I've introduced a new BLOCK_TYPE to differentiate these.
  While I'm at it, remove the BLOCK_TYPE_USER_BASE, which was
  never used (and now never will be.) If we want to differentiate
  different consumers of _user_block_thread for debugging
  purposes, we should use the currently-unused "object"
  argument to thread_block, instead of cluttering the
  relatively-clean block type debugging code with special
  types.
  
  One final note: The race condition which was the case of
  this bug does not, in fact, imply a deadlock on the part
  of the rw_lock here. The wait_status is protected by the
  thread's mutex, which is acquired by both _user_block_thread
  and _user_unblock_thread, and so if _user_unblock_thread
  succeeds faster than _user_block_thread can initiate
  the block, it will just see that wait_status is already
  <= 0 and return immediately.
  
  Fixes #15211.

af0be8dbc5df: kernel/condition_variable: Clean up comments and reduce needless 
unlocks.
  
  This "race prevention" code does not seem to be really hit at all
  in practice, at least from testing, so no need to do a full
  unlock/lock universally for it.
  
  I'm still not sure why the previous fixes here removed 80% of
  the performance benefits of the original change; I need to
  investigate that more.

                              [ Augustin Cavalier <waddlesplash@xxxxxxxxx> ]

----------------------------------------------------------------------------

6 files changed, 28 insertions(+), 11 deletions(-)
headers/private/system/thread_defs.h              |  2 +-
src/apps/debuganalyzer/model/Model.cpp            |  2 ++
src/bin/debug/time_stats/scheduling_analysis.cpp  |  3 +++
src/system/kernel/condition_variable.cpp          | 11 +++--------
src/system/kernel/scheduler/scheduler_tracing.cpp |  3 +++
src/system/kernel/thread.cpp                      | 18 ++++++++++++++++--

############################################################################

Commit:      2c588b031fd618f8ec37376906fbfc68f4eeea0b
URL:         https://git.haiku-os.org/haiku/commit/?id=2c588b031fd6
Author:      Augustin Cavalier <waddlesplash@xxxxxxxxx>
Date:        Tue Aug  6 02:31:02 2019 UTC

Ticket:      https://dev.haiku-os.org/ticket/15211

kernel: Properly separate and handle THREAD_BLOCK_TYPE_USER.

Consider this scenario:
 * A userland thread puts its ID into some structure so that it
   can be woken up later, sets its wait_status to initiate the
   begin of the wait, and then calls _user_block_thread.
 * A second thread finishes whatever task the first thread
   intended to wait for, reads the ID almost immediately
   after it was written, and calls _user_unblock_thread.
 * _user_unblock_thread was called so soon that the first
   thread is not yet blocked on the _user_block_thread block,
   but is instead blocked on e.g. the thread's main mutex.
 * The first thread's thread_block() call returns B_OK.
   As in this example it was inside mutex_lock, it thinks
   that it now owns the mutex.
 * But it doesn't own the mutex, and so (until yesterday)
   all sorts of mayhem and then a random crash occurs, or
   (after yesterday) an assert-failure is tripped that
   the thread does not own the mutex it expected to.

The above scenario is not a hypothetical, but is in fact the
exact scenario behind the strange panics in #15211.

The solution is to only have _user_unblock_thread actually
unblock threads that were blocked by _user_block_thread,
so I've introduced a new BLOCK_TYPE to differentiate these.
While I'm at it, remove the BLOCK_TYPE_USER_BASE, which was
never used (and now never will be.) If we want to differentiate
different consumers of _user_block_thread for debugging
purposes, we should use the currently-unused "object"
argument to thread_block, instead of cluttering the
relatively-clean block type debugging code with special
types.

One final note: The race condition which was the case of
this bug does not, in fact, imply a deadlock on the part
of the rw_lock here. The wait_status is protected by the
thread's mutex, which is acquired by both _user_block_thread
and _user_unblock_thread, and so if _user_unblock_thread
succeeds faster than _user_block_thread can initiate
the block, it will just see that wait_status is already
<= 0 and return immediately.

Fixes #15211.

----------------------------------------------------------------------------

diff --git a/headers/private/system/thread_defs.h 
b/headers/private/system/thread_defs.h
index c2cc881206..e2ba2ee9fb 100644
--- a/headers/private/system/thread_defs.h
+++ b/headers/private/system/thread_defs.h
@@ -28,9 +28,9 @@ enum {
        THREAD_BLOCK_TYPE_SIGNAL                                = 3,
        THREAD_BLOCK_TYPE_MUTEX                                 = 4,
        THREAD_BLOCK_TYPE_RW_LOCK                               = 5,
+       THREAD_BLOCK_TYPE_USER                                  = 6,
 
        THREAD_BLOCK_TYPE_OTHER                                 = 9999,
-       THREAD_BLOCK_TYPE_USER_BASE                             = 10000
 };
 
 
diff --git a/src/apps/debuganalyzer/model/Model.cpp 
b/src/apps/debuganalyzer/model/Model.cpp
index fe1119ab7d..0151d92055 100644
--- a/src/apps/debuganalyzer/model/Model.cpp
+++ b/src/apps/debuganalyzer/model/Model.cpp
@@ -46,6 +46,8 @@ wait_object_type_name(uint32 type)
                        return "mutex";
                case THREAD_BLOCK_TYPE_RW_LOCK:
                        return "rw lock";
+               case THREAD_BLOCK_TYPE_USER:
+                       return "user";
                case THREAD_BLOCK_TYPE_OTHER:
                        return "other";
                case THREAD_BLOCK_TYPE_SNOOZE:
diff --git a/src/bin/debug/time_stats/scheduling_analysis.cpp 
b/src/bin/debug/time_stats/scheduling_analysis.cpp
index 6069f96d4e..1b66ed0a90 100644
--- a/src/bin/debug/time_stats/scheduling_analysis.cpp
+++ b/src/bin/debug/time_stats/scheduling_analysis.cpp
@@ -109,6 +109,9 @@ wait_object_to_string(scheduling_analysis_wait_object* 
waitObject, char* buffer,
                        else
                                sprintf(buffer, "rwlock %p (%s)", object, 
waitObject->name);
                        break;
+               case THREAD_BLOCK_TYPE_USER:
+                       strcpy(buffer, "user");
+                       break;
                case THREAD_BLOCK_TYPE_OTHER:
                        sprintf(buffer, "other %p (%s)", object, 
waitObject->name);
                        break;
diff --git a/src/system/kernel/scheduler/scheduler_tracing.cpp 
b/src/system/kernel/scheduler/scheduler_tracing.cpp
index f944c17f68..d7313ee4be 100644
--- a/src/system/kernel/scheduler/scheduler_tracing.cpp
+++ b/src/system/kernel/scheduler/scheduler_tracing.cpp
@@ -79,6 +79,9 @@ ScheduleThread::AddDump(TraceOutput& out)
                        case THREAD_BLOCK_TYPE_RW_LOCK:
                                out.Print("rwlock %p", fPreviousWaitObject);
                                break;
+                       case THREAD_BLOCK_TYPE_USER:
+                               out.Print("_user_block_thread()");
+                               break;
                        case THREAD_BLOCK_TYPE_OTHER:
                                out.Print("other (%p)", fPreviousWaitObject);
                                        // We could print the string, but it 
might come from a
diff --git a/src/system/kernel/thread.cpp b/src/system/kernel/thread.cpp
index 77820c8961..bcbde99b96 100644
--- a/src/system/kernel/thread.cpp
+++ b/src/system/kernel/thread.cpp
@@ -1699,6 +1699,10 @@ _dump_thread_info(Thread *thread, bool shortInfo)
                                        kprintf("rwlock    %p   ", 
thread->wait.object);
                                        break;
 
+                               case THREAD_BLOCK_TYPE_USER:
+                                       kprintf("user%*s", 
B_PRINTF_POINTER_WIDTH + 11, "");
+                                       break;
+
                                case THREAD_BLOCK_TYPE_OTHER:
                                        kprintf("other%*s", 
B_PRINTF_POINTER_WIDTH + 10, "");
                                        break;
@@ -1782,6 +1786,10 @@ _dump_thread_info(Thread *thread, bool shortInfo)
                                kprintf("rwlock %p\n", thread->wait.object);
                                break;
 
+                       case THREAD_BLOCK_TYPE_USER:
+                               kprintf("user\n");
+                               break;
+
                        case THREAD_BLOCK_TYPE_OTHER:
                                kprintf("other (%s)\n", 
(char*)thread->wait.object);
                                break;
@@ -2976,7 +2984,13 @@ user_unblock_thread(thread_id threadID, status_t status)
        if (thread->user_thread->wait_status > 0) {
                thread->user_thread->wait_status = status;
                clear_ac();
-               thread_unblock_locked(thread, status);
+
+               // Even if the user_thread->wait_status was > 0, it may be the
+               // case that this thread is actually blocked on something else.
+               if (thread->state == B_THREAD_WAITING
+                               && thread->wait.type == THREAD_BLOCK_TYPE_USER) 
{
+                       thread_unblock_locked(thread, status);
+               }
        } else
                clear_ac();
 
@@ -3715,7 +3729,7 @@ _user_block_thread(uint32 flags, bigtime_t timeout)
        clear_ac();
 
        // nope, so wait
-       thread_prepare_to_block(thread, flags, THREAD_BLOCK_TYPE_OTHER, "user");
+       thread_prepare_to_block(thread, flags, THREAD_BLOCK_TYPE_USER, NULL);
 
        threadLocker.Unlock();
 

############################################################################

Revision:    hrev53345
Commit:      af0be8dbc5df319bb65ae62cb9924d7344847a73
URL:         https://git.haiku-os.org/haiku/commit/?id=af0be8dbc5df
Author:      Augustin Cavalier <waddlesplash@xxxxxxxxx>
Date:        Tue Aug  6 02:58:28 2019 UTC

kernel/condition_variable: Clean up comments and reduce needless unlocks.

This "race prevention" code does not seem to be really hit at all
in practice, at least from testing, so no need to do a full
unlock/lock universally for it.

I'm still not sure why the previous fixes here removed 80% of
the performance benefits of the original change; I need to
investigate that more.

----------------------------------------------------------------------------

diff --git a/src/system/kernel/condition_variable.cpp 
b/src/system/kernel/condition_variable.cpp
index db8bf76f63..1f0c19a3be 100644
--- a/src/system/kernel/condition_variable.cpp
+++ b/src/system/kernel/condition_variable.cpp
@@ -159,18 +159,13 @@ ConditionVariableEntry::Wait(uint32 flags, bigtime_t 
timeout)
 
        entryLocker.Lock();
 
-       // Remove entry from variable, if not done yet. Since we are going to
-       // lock ourselves and recheck fVariable if it is not NULL, we don't need
-       // to acquire our own lock before doing this first NULL check.
+       // Remove entry from variable, if not done yet.
        if (fVariable != NULL) {
-               if (fVariable == NULL)
-                       return error;
                SpinLocker conditionLocker(fVariable->fLock);
-               entryLocker.Unlock();
-
                if (fVariable->fEntries.Contains(this)) {
                        fVariable->fEntries.Remove(this);
                } else {
+                       entryLocker.Unlock();
                        // The variable's fEntries did not contain us, but we 
currently
                        // have the variable's lock acquired. This must mean we 
are in
                        // a race with the variable's Notify. It is possible we 
will be
@@ -178,8 +173,8 @@ ConditionVariableEntry::Wait(uint32 flags, bigtime_t 
timeout)
                        // spin until our fVariable member is unset by the 
Notify thread
                        // and then re-acquire our own lock to avoid a 
use-after-free.
                        while (atomic_pointer_get(&fVariable) != NULL) {}
+                       entryLocker.Lock();
                }
-               entryLocker.Lock();
                fVariable = NULL;
        }
 


Other related posts:

  • » [haiku-commits] haiku: hrev53345 - src/system/kernel src/system/kernel/scheduler headers/private/system src/bin/debug/time_stats src/apps/debuganalyzer/model - waddlesplash