On 2009-03-16 at 00:32:50 [+0100], David McPaul <dlmcpaul@xxxxxxxxx> wrote:
>
> Is there a document I can read about debugging locks?

None that I know of, at least.

> Because I can lock up MediaPlayer quite easily to the point where only
> a reboot fixes it.

That does indeed sound like a kernel lock is involved. The basic strategy to track down locking problems is relatively easy, though. You'll probably use the following kernel debugger commands:

* teams -- List all teams and find the one you're interested in.

* threads [ <teamID> ] -- List all threads [of a given team]. It also shows the state of each thread, most interestingly which locking primitive it is waiting on, if any.

* thread [ -s ] <threadIDs> -- List info for the threads specified by ID. I prefer the "-s" option, which uses the same compact format as "threads".

* sem/mutex/rwlock/cvar <ID or address> -- List info for the respective locking primitive.

The first step in tracking a locking problem is to find a thread that is blocked and look up the locking primitive it is waiting on. In the case of mutexes you'll get the thread currently holding it. Just check what the holding thread is doing (waiting for). For semaphores you'll see which thread acquired and which released them last (if the same thread released the semaphore after acquiring it, its thread ID is listed prefixed by a "-"). So, if the semaphore is used as a lock, you can follow the lock holder in just the same way. For rwlocks only a writer holding the lock is listed; for readers only their total number is known.

Condition variables are a bit more complicated, since they are usually used asymmetrically, i.e. one thread waits, another signals the condition. So for a specific condition variable, knowing the source code that uses it is pretty much unavoidable if you want to understand what's happening -- e.g. an "I/O request finished" condition variable is signaled by the I/O scheduler, "pipe" by FIFO readers, etc.
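The procedure of following a blocked thread to the holder of its lock, and then to whatever that holder is waiting on, can be sketched as a small walk over two tables. This is only an illustration, not an existing debugger feature: the function name, the table layout, and the thread IDs and lock names are all made up, and the tables would have to be transcribed by hand from the "threads" and "mutex"/"sem"/"rwlock" output.

```python
from typing import Dict, List, Tuple


def follow_wait_chain(
    start: int,
    waits_on: Dict[int, str],
    holder_of: Dict[str, int],
) -> Tuple[List[int], bool]:
    """Follow thread -> locking primitive -> holder, starting at `start`.

    waits_on:  thread ID -> primitive the thread is blocked on (hypothetical names)
    holder_of: primitive -> thread ID currently holding it, if known

    Returns the chain of thread IDs visited and whether the chain
    came back to a thread that was already visited.
    """
    chain = [start]
    seen = {start}
    current = start
    while True:
        primitive = waits_on.get(current)
        if primitive is None:
            # The thread is not blocked on anything: the chain ends here.
            return chain, False
        holder = holder_of.get(primitive)
        if holder is None:
            # Holder unknown (e.g. a read lock, where only a count is shown).
            return chain, False
        chain.append(holder)
        if holder in seen:
            # We reached a thread we already visited.
            return chain, True
        seen.add(holder)
        current = holder


if __name__ == "__main__":
    # Made-up example: 101 waits on "mutex A" held by 102,
    # and 102 waits on "mutex B" held by 101.
    waits_on = {101: "mutex A", 102: "mutex B"}
    holder_of = {"mutex A": 102, "mutex B": 101}
    print(follow_wait_chain(101, waits_on, holder_of))
```

A chain that ends with an unknown holder is not proof that everything is fine; it only means the tables did not contain enough information to continue, as with read-locked rwlocks or condition variables.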
To sum it up, most locking primitives a thread is waiting on allow you to find the thread that is responsible, i.e. the current lock holder or, for condition variables, the thread supposed to signal the condition. In a deadlock situation this thread will likely be waiting on a locking primitive itself. Iteratively following the responsible threads and locking primitives might turn up a cycle. Stack traces of the involved threads will then help to understand the problem (requiring some knowledge of the respective kernel sources).

In some situations things are a bit more complicated, like when one has to find a read lock holder. In my experience most cases can be analyzed, though.

CU, Ingo