[haiku-development] Re: address space rwlock issue?

On 2009-03-16 at 00:32:50 [+0100], David McPaul <dlmcpaul@xxxxxxxxx> wrote:
> 
> Is there a document I can read about debugging locks?

None that I know of, at least.

> Because I can lock up MediaPlayer quite easily to the point where only
> a reboot fixes it.

That does sound like a kernel lock is involved. The basic strategy for 
tracking down locking problems is fairly simple, though. You'll probably use 
the following kernel debugger commands:

* teams -- List all teams and find the one you're interested in.

* threads [ <teamID> ] -- List all threads [of a given team]. It also shows 
the state of each thread, most interestingly which locking primitive it is 
waiting on, if any.

* thread [ -s ] <threadIDs> -- List info for the threads specified by ID. I 
prefer the "-s" option, which uses the same compact format as "threads".

* sem/mutex/rwlock/cvar <ID or address> -- List info for the respective 
locking primitive.

The first step in tracking down a locking problem is to find a thread that is 
blocked and look up the locking primitive it is waiting on. What you learn 
from there depends on the kind of primitive:

* mutex -- You'll get the thread currently holding it. Just check what the 
holding thread is doing (waiting for).

* semaphore -- You'll see which thread acquired and released it last (if the 
same thread released it after acquiring it, its thread ID is listed prefixed 
by a "-"). So, if the semaphore is used as a lock, you can follow the lock 
holder in just the same way.

* rwlock -- Only a writer holding the lock is listed; for readers only their 
total number is known.

* condition variable -- These are a bit more complicated, since they are 
usually used asymmetrically, i.e. one thread waits, another signals the 
condition. So for a specific condition variable you pretty much have to know 
the source code using it to understand what's happening -- e.g. an "I/O 
request finished" condition variable is signaled by the I/O scheduler, "pipe" 
by FIFO readers, etc.
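
To make those rules a bit more concrete, here is a rough Python sketch of the 
"who is responsible for this primitive?" decision. It is only a model for 
illustration: the Primitive class, its field names and responsible_thread() 
are made up here and correspond neither to kernel structures nor to the 
debugger's actual output format. A negative last_acquirer stands in for the 
"-" prefix mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Primitive:
    """Hypothetical summary of what the debugger shows for one primitive."""
    kind: str                            # "mutex", "sem", "rwlock" or "cvar"
    holder: Optional[int] = None         # mutex holder or rwlock writer
    last_acquirer: Optional[int] = None  # sem only; negative = released again
    reader_count: int = 0                # rwlock only; readers are anonymous


def responsible_thread(p: Primitive) -> Optional[int]:
    """Return the thread ID to look at next, or None if it cannot be told."""
    if p.kind == "mutex":
        return p.holder
    if p.kind == "sem":
        # Only useful if the semaphore is used as a lock and the last
        # acquirer has not released it again.
        if p.last_acquirer is not None and p.last_acquirer > 0:
            return p.last_acquirer
        return None
    if p.kind == "rwlock":
        # Only a writer is known by ID; with readers only, this is None.
        return p.holder
    # Condition variables: the signaling thread has to be found by reading
    # the source code that uses the variable.
    return None
```

A None result means the debugger alone won't tell you the next thread to 
inspect, which is exactly the read-lock and condition variable situation.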

To sum it up, most locking primitives a thread is waiting on allow you to 
find the thread that is responsible, i.e. the current lock holder, or the 
one supposed to signal the condition. In a deadlock situation this thread 
will likely be waiting on a locking primitive itself. Iteratively following 
the responsible threads and locking primitives might turn up a cycle. Stack 
traces of the involved threads will then help you understand the problem 
(this requires some knowledge of the respective kernel sources).
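
The iterative search can be sketched in a few lines of Python. This is just a 
model of the manual procedure, not something the kernel debugger provides: 
the two dictionaries (waits_on, held_by) and the function name are 
assumptions for illustration, and you would fill them in by hand from what 
"threads" and the sem/mutex/rwlock commands show.

```python
def find_wait_cycle(start, waits_on, held_by):
    """Follow thread -> primitive -> responsible thread, starting at 'start'.

    waits_on maps a thread ID to the primitive it is blocked on.
    held_by maps a primitive to the thread responsible for it.
    Returns the list of threads forming a cycle, or None if the chain ends.
    """
    path = []   # threads visited, in order
    seen = {}   # thread ID -> index in path
    thread = start
    while thread is not None:
        if thread in seen:
            # We came back to a thread we already visited: deadlock cycle.
            return path[seen[thread]:]
        seen[thread] = len(path)
        path.append(thread)
        primitive = waits_on.get(thread)
        # The chain ends if the thread is not blocked or the holder is unknown.
        thread = held_by.get(primitive) if primitive is not None else None
    return None


# Thread 1 waits on A (held by 2), 2 waits on B (held by 3),
# 3 waits on C (held by 2): threads 2 and 3 deadlock each other.
waits_on = {1: "A", 2: "B", 3: "C"}
held_by = {"A": 2, "B": 3, "C": 2}
cycle = find_wait_cycle(1, waits_on, held_by)
```

Note that the thread you started from need not be part of the cycle itself; 
it may just be stuck behind it.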

In some situations things are a bit more complicated, like when one has to 
find a read lock holder. In my experience most cases can be analyzed, 
though.

CU, Ingo
