RE: Alternative to V$PQ_TQSTAT for troubleshooting unbalanced HJs?

  • From: "Tornblad, John" <JTornblad@xxxxxxxxxx>
  • To: <oracle-l@xxxxxxxxxxxxx>
  • Date: Sun, 12 Aug 2012 12:15:09 -0500

Much thanks to Greg for pointing out _sqlmon_max_planlines and its
default cutoff of not showing any query with more than 300 lines in
their plan.  Our plan was about 1500 lines... bumping up the limit did
do the trick... on the next execution of our query it immediately show
up and I began with delight seeing a nice correlation with
V$SESSION_LONGOPS and the plan statistics in SQL Monitor.
 

HOWEVER... it was a swift transition from the thrill of victory to the
agony of defeat...

 

After 90 minutes of watching our query in SQL Monitor... it DISAPPEARED
utterly and suddenly, and of course right before the query reached one
of the imbalances I'm trying to get some insight on.

 

Is there *another* hidden parameter that controls this (how long a query
is allowed to be monitored)?

 

-john

 

 

From: Greg Rahn [mailto:greg@xxxxxxxxxxxxxxxxxx] 
Sent: Saturday, August 11, 2012 9:11 AM
To: Tornblad, John
Cc: oracle-l@xxxxxxxxxxxxx
Subject: Re: Alternative to V$PQ_TQSTAT for troubleshooting unbalanced
HJs?

 

I think your best bet is SQL Monitor.  If it does not show up in OEM,
can you find it in v$sql_monitor?  If not, is the plan longer than 300
lines?  If so, you would need to set _sqlmon_max_planlines to a number
greater than the number of plan lines for that query.  Once you get the
sql_id and it shows up in v$sql monitor, you could just run a loop and
save some sql monitor reports to a directory and look at them later
using the SQL on this page:

http://structureddata.org/2008/01/06/oracle-11g-real-time-sql-monitoring
-using-dbms_sqltunereport_sql_monitor/

 

Your second best bet is to experiment with parts of the plan that you
think may be problematic and do a "select key, count group by key order
by count desc limit 10" query to see if the data has a very popular key.
Things to be aware of: window functions whose output produces just a few
number of values or joins (with a hash hash distribution) whose input
has a key that is significantly more popular than the rest of the data
or data with a Zipf distribution.

 

With parallel skew there are two types to be aware of:  

1) due to data skew (a very popular key), a worker does significantly
more work than the other workers

2) due to the way the data arrives to the operator, only one worker (or
a small number) is busy at any one point in time, but at the end, they
all process nearly the same amount of data (temporal skew)

 

 

On Fri, Aug 10, 2012 at 11:12 AM, Tornblad, John <JTornblad@xxxxxxxxxx>
wrote:

I have a large INSERT ... SELECT parallel query involving dozens of hash
joins, hundreds of millions of rows and strongly suspect we are getting
bitten by an imbalance of workload distributed among the parallel
servers for certain HJs, but not all.  The suspicion has been sparked by
watching our query in OEM's Activity window on the query... there are
"good" periods of time where clearly the parallel slaves are working
jointly and equally on a join... and other "bad" periods where a single
parallel slave is chewing away by itself for an hour.

To top it off there appears to be a bug in 11.2.0.3 / OEM 12c SQL
Monitoring such that something about our query makes it not show up at
all in OEM SQL Monitor (or in V$SQL_MONITOR) so I can't easily look when
the slow operation is running and just "see" what operation it's on and
try to work that back to what specific join might be causing an
imbalance of workload.

 

-- 

Regards,

Greg Rahn  |  blog <http://bit.ly/u9N0i8>   |  twitter
<http://bit.ly/v733dJ>   |  linkedin <http://linkd.in/gregrahn> 



</pre>This message is confidential, intended only for the named 
recipient(s) and may contain information that is privileged or 
exempt from disclosure under applicable law.  If you are not 
the intended recipient(s), you are notified that the 
dissemination, distribution, or copying of this message is 
strictly prohibited.  If you receive this message in error or 
are not the named recipient(s), please notify the sender by 
return email and delete this message. Thank you.
--
//www.freelists.org/webpage/oracle-l


Other related posts: