Re: deadlock trace file in weblogic environment

  • From: Riyaj Shamsudeen <riyaj.shamsudeen@xxxxxxxxx>
  • To: yong321@xxxxxxxxx
  • Date: Sat, 14 Feb 2009 14:07:43 -0600

Hi Huang
  That was a complex problem (with a simple solution). In fact, the original
issue involved three instances in a deadlock scenario (happening randomly).
Developers swore that these threads in different instances would not lock the
same rows, and since all three instances were involved in the deadlock, the LMD
trace files were quite useless. We ended up pinning the threads to one node and
debugging this further. The deadlock graph I posted was from that scenario.
  When we resolved it, the problem was very simple: an INITRANS of 2 on one
index (which had been reorged!) caused the issue, and rebuilding that index
with a higher INITRANS fixed it.
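
For reference, the rebuild was along these lines (the index name and INITRANS
value below are just placeholders, not the actual ones from that system):

  -- Rebuild the index with more pre-allocated ITL slots per block
  ALTER INDEX app_owner.hot_index_ix REBUILD INITRANS 16;

  -- Confirm the new setting
  SELECT index_name, ini_trans
    FROM dba_indexes
   WHERE owner = 'APP_OWNER' AND index_name = 'HOT_INDEX_IX';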

Cheers
Riyaj

On Sat, Feb 14, 2009 at 12:02 AM, Yong Huang <yong321@xxxxxxxxx> wrote:

> You're right, Riyaj. I googled and found a few wait-for-graphs that have
> mode 3 or 4, although most are 5.
>
> I think your graph can be interpreted sequentially along these lines: the
> process with transaction ID (whatever it is) [65662,2751] is blocked by
> process [65657,2197], which is blocked by the next one, which is blocked by
> the next, ... and it wraps back around to [65662,2751]. I don't know what
> they're doing, and if all of them are on the same instance as in your case,
> you probably don't even see the SQL involved. Oracle definitely needs to
> improve the trace file to at least the level of a non-RAC deadlock trace.
>
> Thanks for the info about _lm_dd_interval. I noticed 11g shortens its
> default value from 60 to 10 and adds a few new _lm_dd% params.
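>
> In case it's useful, hidden parameters like these can be checked from the x$
> tables as SYS; this is just the usual sketch, nothing version-specific:
>
>   -- List _lm_dd% hidden parameters with current values (run as SYS)
>   SELECT i.ksppinm  AS parameter,
>          v.ksppstvl AS current_value,
>          i.ksppdesc AS description
>     FROM x$ksppi i, x$ksppcv v
>    WHERE i.indx = v.indx
>      AND i.ksppinm LIKE '\_lm\_dd%' ESCAPE '\';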
>
> Yong Huang
>
> --- On Fri, 2/13/09, Riyaj Shamsudeen <riyaj.shamsudeen@xxxxxxxxx> wrote:
>
> > From: Riyaj Shamsudeen <riyaj.shamsudeen@xxxxxxxxx>
> > Subject: Re: deadlock trace file in weblogic environment
> > To: yong321@xxxxxxxxx
> > Cc: oracle-l@xxxxxxxxxxxxx
> > Date: Friday, February 13, 2009, 3:35 PM
> >
> > Hi Huang
> >    I have seen different modes in RAC instances. Here is an
> > example from a 9.2.0.8 instance.
> >
> > Global Wait-For-Graph(WFG) at ddTS[0.44] :
> >
> > BLOCKED 8e919f168 3 [0x178001f][0x7789a],[TX] [65662,2751] 0
> > BLOCKER 8e2aa44c8 3 [0x178001f][0x7789a],[TX] [65657,2197] 0
> > BLOCKED 8e2aa4bd0 3 [0x70023][0x2f68bc],[TX] [65657,2197] 0
> > BLOCKER 8e558a3c8 3 [0x70023][0x2f68bc],[TX] [65666,213] 0
> > BLOCKED 8e558aad0 3 [0x179000f][0x89000],[TX] [65666,213] 0
> > BLOCKER 8e819e070 3 [0x179000f][0x89000],[TX] [65707,152] 0
> > BLOCKED 8e819e1d8 3 [0x20001][0x4955c6],[TX] [65707,152] 0
> > BLOCKER 8e919f2d0 3 [0x20001][0x4955c6],[TX] [65662,2751] 0
> >
> > Capturing deadlock information is not so easy, especially if the problem
> > involves multiple instances. Not much information is printed in the
> > deadlock trace files (I think 11g improves this, but I haven't tested it
> > thoroughly yet). Of course, finding why there is a deadlock needs more
> > information about the blockers and the blocked, etc. The view gv$lock is
> > useless here.
> >
> >  gv$ges_enqueue can provide more information, but it is slow since it also
> > encompasses BL locks (buffer cache locks), which tend to be numerous on
> > huge SGAs. Not to mention that the deadlock detection interval (controlled
> > by _lm_dd_interval) is shorter than the gv$ges_enqueue query response time
> > (at least in our case). So I ended up writing a small script to debug a
> > deadlock we recently encountered.
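> >
> > For example, something along these lines cuts it down to the TX enqueues
> > that are blocked or blocking (column names are from the 9i/10g
> > gv$ges_enqueue, so verify them against your version):
> >
> >   -- TX enqueues currently blocked or blocking, skipping the numerous BL locks
> >   SELECT inst_id, resource_name1, grant_level, request_level,
> >          transaction_id0, transaction_id1, pid, blocked, blocker
> >     FROM gv$ges_enqueue
> >    WHERE resource_name1 LIKE '%[TX]%'
> >      AND (blocked = 1 OR blocker = 1);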
> >
> >  Hrishy, you can find the scripts rac_check_lock.ksh and rac_check_lock.sql
> > here. Read and execute the shell script (which will call the SQL script) to
> > capture the state of blockers and waiters on each node. This should give
> > you more information to debug deadlocks. Please send the output if I can be
> > of any help.
> >
> >     orainternals_scripts_rac: http://www.orainternals.com/scripts_rac1.php
> >
> > Thanks!
> >
> > --
> > Cheers
> >
> > Riyaj Shamsudeen
> > Principal DBA,
> > Ora!nternals -  http://www.orainternals.com
> > Specialists in Performance, Recovery and EBS11i
> > Blog: http://orainternals.wordpress.com
> >
> >
> > On Fri, Feb 13, 2009 at 12:34 PM, Yong Huang <yong321@xxxxxxxxx> wrote:
> >
> > > I find that at least the cvt | held mode part is not right; it's always 5.
> > > For instance,
> > >
> > > Global Wait-For-Graph(WFG) at ddTS[0.5] :
> > > BLOCKED 0xd933e9a0 5 wq 2 cvtops x1 [0x70011][0x5c8f3],[TX] [34000-0001-0000522D] 0
> > > BLOCKER 0xd933e850 5 wq 1 cvtops x8 [0x70011][0x5c8f3],[TX] [5C000-0001-00000575] 0
> > > BLOCKED 0xd933ec58 5 wq 2 cvtops x1 [0x180017][0x3935f],[TX] [5C000-0001-00000575] 0
> > > BLOCKER 0xd933eaf0 5 wq 1 cvtops x8 [0x180017][0x3935f],[TX] [34000-0001-0000522D] 0
> > >
> > > It was the simplest ORA-60 deadlock created on RAC (Oracle 10.2.0.4).
> > > Both sessions were in the first instance. Of all the RAC deadlock graphs
> > > I've seen, the mode is always 5. But other pieces of info may be correct.
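> > >
> > > (The usual way to get that is a two-session crossed update, roughly like
> > > this; the table name and values are just placeholders:)
> > >
> > >   -- Session 1:
> > >   UPDATE t SET val = val + 1 WHERE id = 1;
> > >   -- Session 2:
> > >   UPDATE t SET val = val + 1 WHERE id = 2;
> > >   -- Session 1 again (blocks, waiting on session 2's row):
> > >   UPDATE t SET val = val + 1 WHERE id = 2;
> > >   -- Session 2 again (completes the cycle; one session gets ORA-00060):
> > >   UPDATE t SET val = val + 1 WHERE id = 1;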
>
>
>
>
