RE: kkslce [KKSCHLPIN2] mutex contention on hard parse of PQ with binds

  • From: Jonathan Lewis <jonathan@xxxxxxxxxxxxxxxxxx>
  • To: oracle-l-freelists <oracle-l@xxxxxxxxxxxxx>
  • Date: Tue, 15 Sep 2015 08:37:13 +0000



Are the 4 out of 8 isolated to one of the two slave sets, or are they scattered
randomly across the two slave sets. (I would suspect the former).

It's probably expected behaviour (at least in some circumstances) but an
extreme example of slow parse times. I have seen a couple of cases in 11g where
PX slaves actually managed to get different execution plans for the same query
at the same time because the slaves re-optimized the query - if this is
possible, then the slaves could be in a position where they are attempting to
re-optimize the query, but are still waiting for the QC to finish its parse
call.

Gathering related information:

From v$sql - with one call to execute a new version of the statement, how many
parse_calls does the parallel statement report.
If you enable 10053 for a serial and a parallel version, does the parallel
trace show much more work than the serial ? Does the 12c serial show much more
work than the 11g serial?
If you enable 10046 trace can you see the 4 PX slaves doing work that looks
like re-optimization after they get the child mutex

Speculative question
Do the queries involve partitioned tables ?





Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
@jloracle
________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx [oracle-l-bounce@xxxxxxxxxxxxx] on behalf
of Stefan Knecht [knecht.stefan@xxxxxxxxx]
Sent: 15 September 2015 09:12
To: oracle-l-freelists
Subject: kkslce [KKSCHLPIN2] mutex contention on hard parse of PQ with binds

Hi all

Fairly odd event. After an upgrade to 12.1.0.2 we've been seeing very high
"cursor: pin S wait on X" waits. But here's the catch - they only happen under
very special (and seemingly odd) circumstances:

- Query runs in parallel
- Query uses bind variables
- Query is hard parsed

If we run the same query in serial, or with literal values, or we re-use a
cursor, no waits are seen.

The waits are huge (up to 0.3 seconds) and are seen on the PX slaves only. Say
we're using 8 slaves, 4 of them will show these waits. 4 won't. And only if
we're using binds. No binds, no waits.

Does anyone have an explanation for what could be happening?

I've already tried disabling some of the new fancy adaptive features, and most
bind related features (peeking, px bind peeking, bind value captures, etc).
Heck, it even happens with opt features set to 11.2.

On a very similar database, that's still on 11.2.0.3, we're also seeing the
same behavior. The key difference being that the waits are very short - less
than 0.01 seconds. But they're still there, and I don't see why they would be
in the first place. What causes Oracle to pin that mutex, if a bind variable is
used in the query?

A bug has been filed. So we'll see what comes out of that. It would seem odd
that we'd be the first ones to hit this, it's not exactly exotic to run a
parallel query with binds.

Stefan


Other related posts: