Thanks Mark for the feedback.
I wasn't lucky finding the doc you were talking about, and the upgrade to
12.2 is a long stretch considering the application is barely certified for
12.1.0.1.
I did read a few posts here on Oracle-L (Jonathan Lewis's comments) about PX
slaves actually getting different execution plans for the same query at the
same time because the slaves re-optimized the query, but nothing that really
speaks to my current situation.
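(For anyone following along, here is the sort of check that should show whether
the statement is collecting child cursors with different plans; just a minimal
sketch, with the SQL_ID taken from the trace in my original post below:)

  select sql_id, child_number, plan_hash_value, executions,
         is_shareable, last_active_time
  from   v$sql
  where  sql_id = '11wythn1bf8c5'
  order  by child_number;

  -- and the reason(s) each child could not be shared
  select *
  from   v$sql_shared_cursor
  where  sql_id = '11wythn1bf8c5';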
Here are some actions that have been taken in the meantime (a quick
verification sketch follows the list):
   1. Deactivated parallelism: alter system set
   parallel_max_servers=1; ---- new execution plans tend to be serialized
   2. Disabled the optimizer adaptive features: alter system set
   optimizer_adaptive_features=false; --- no effect
   3. Downgraded the optimizer features version to 11.2.0.1: alter
   system set optimizer_features_enable='11.2.0.1'; --- might go lower
   4. The client DB has been moved to a new VM (part of planned
   maintenance: the existing OS had a few minor issues)
   - And as if it wasn't enough: a BI report is now hanging for 45
   minutes before crashing (although a few occurrences still make it in 13
   seconds to 9 minutes, almost the same performance as before the
   parallelism deactivation).
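(For reference, roughly what was run for items 1-3, plus a quick check that
the values actually took effect; a minimal sketch assuming an spfile, nothing
site-specific:)

  alter system set parallel_max_servers = 1 scope=both;
  alter system set optimizer_adaptive_features = false scope=both;
  alter system set optimizer_features_enable = '11.2.0.1' scope=both;

  -- confirm what the instance is really running with
  select name, value, isdefault
  from   v$parameter
  where  name in ('parallel_max_servers',
                  'optimizer_adaptive_features',
                  'optimizer_features_enable');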
I first thought the degradation was due to the fact that every row
source operation (index/full scan, hash joins, etc.) was now executed
serially for all queries in the database.
But to my big surprise, while live-monitoring its session before the crash,
I discovered that this very report was running half of its operations in
parallel.
I mean, what the heck! parallel_max_servers was explicitly set to 1. Is this
optimizer really that nasty?
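(As far as I understand, and I'm happy to be corrected, parallel_max_servers=1
only caps the PX server pool at one process; it doesn't stop the optimizer
from compiling a parallel plan, so the statement can still show PX operations
and then get downgraded at run time. This is the kind of query I watched the
slaves with, a minimal sketch against v$px_session:)

  select s.username,
         decode(px.qcserial#, null, 'QC', 'Slave') px_role,
         px.server_set, px.sid, px.qcsid,
         px.req_degree, px.degree,
         s.event
  from   v$px_session px
         join v$session s on s.sid = px.sid and s.serial# = px.serial#
  order  by px.qcsid, px_role desc, px.server_set, px.sid;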
Has anybody seen such a phenomenon on 12.1?
Thanks for the help.
On Fri, Jun 7, 2019 at 9:41 PM Mark W. Farnham <mwf@xxxxxxxx> wrote:
In 12.1 there is (was) a bug with a patch available where child parses can
get a different plan (multiple times), it doesn't match, repeat, and
hilarity ensues.
Sorry I don’t have any physical notes on that due to site security AND I
just don’t remember long strings like bug numbers and MOS notes any more.
I think that matches your symptoms, so worth a look. If memory serves the
bug look-up has Concurrency cursor: pin S wait on X in it.
Good luck!
mwf
PS: 12.2 seems a lot more reliable if you can get there (at least from my
experience, but I don’t have a large sample size of different sites.)
*From:* oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
*On Behalf Of* k.hadd
*Sent:* Friday, June 07, 2019 7:39 PM
*To:* Oracle-L@xxxxxxxxxxxxx
*Subject:* Fwd: 12c Parallelism in windows potentially behind library
cache lock/ PX Deq: Parse Reply/ PX Deq: Execution Msg
Hi Oracle mates,
I have been tracing sessions this week where a *query with multiple
unions caused user sessions to freeze* one after another. There seemed to be
contention in the library cache (locks) but, oddly, when I checked the
trace file report using sqlt, a huge share of the time for these queries was
unaccounted for or idle time (*PX Deq: Parse Reply*).
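(For completeness, this is roughly how such an extended trace can be switched
on with DBMS_MONITOR; only a sketch, the exact SID and options used here may
have differed:)

  -- 596 is just the QC SID seen further down; substitute the real session
  exec dbms_monitor.session_trace_enable(session_id => 596, serial_num => null, waits => true, binds => false);

  -- ... reproduce the freeze ...

  exec dbms_monitor.session_trace_disable(session_id => 596);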
Environment: *OS:* (VM) Windows Server 2012, 16 vCPU [CPU and RAM aren't reserved]
                  *DB:* 12c Enterprise Edition Release 12.1.0.1.0 - 64bit
Also worth mentioning is that the optimizer sometimes has a knack for
wrongly choosing parallel over serial execution when the latter is 50%
faster.
Any clue or insight on best practices to tackle parallelism issues on a VM
server (other than completely disabling parallelism)?
Note: setting OPTIMIZER_ADAPTIVE_FEATURES to FALSE didn't help, but disabling
parallelism did keep the execution serial. I don't know why the optimizer
keeps choosing a PX plan.
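(Narrower options I am considering instead of switching parallelism off
instance-wide; a minimal sketch, not yet tested against this application:)

  -- per session, e.g. from a logon trigger for the reporting user
  alter session disable parallel query;

  -- or per statement, with the statement-level form of the hint (11.2 onward);
  -- the table name below is only a placeholder
  select /*+ NO_PARALLEL */ count(*)
  from   some_report_table;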
Back when the application ran on 10g we used to see a lot of PX Deq Credit
wait events, but it has been other parallel events since the upgrade to 12c.
Really appreciate any input .
Thank you
*• Context and findings:*
Below is the information collected so far from the S-ASH views
and the sqlt (*TRCANLZR*) report on the trace file (30 minutes).
*1- S-ASH top events report for the related period (times are in seconds)*
BEGIN_TIME            END_TIME              WAIT_CLASS   EVENT                    SQL_ID         COUNT   TIME_WAITED  AVG_TIME_WAITED
--------------------  --------------------  -----------  -----------------------  -------------  ------  -----------  ---------------
31-may-2019 10:00:00  31-may-2019 10:14:59  Concurrency  library cache lock       11wythn1bf8c5  190968       556700             2,91
31-may-2019 10:00:03  31-may-2019 10:14:29  Concurrency  library cache: mutex X   11wythn1bf8c5    1690           15                0
31-may-2019 10:01:04  31-may-2019 10:01:07  Concurrency  cursor: pin S wait on X  11wythn1bf8c5       4           10              2,5
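(A rough equivalent of the query behind this report, written against
v$active_session_history instead of the S-ASH repository tables, so take it
only as a sketch; v$active_session_history also needs the Diagnostics Pack,
which is precisely what S-ASH avoids:)

  select min(sample_time) begin_time,
         max(sample_time) end_time,
         wait_class, event, sql_id,
         count(*)         samples   -- roughly 1 second of wait per sample
  from   v$active_session_history
  where  sql_id = '11wythn1bf8c5'
  and    session_state = 'WAITING'
  and    sample_time between timestamp '2019-05-31 10:00:00'
                         and timestamp '2019-05-31 10:15:00'
  group  by wait_class, event, sql_id
  order  by samples desc;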
*2- The trace file for one of the users*
Relevant executions
Performance stats for one of the executions. Both idle time (*PX Deq: Parse
Reply*, which isn't really idle time) and *unaccounted-for time* make up
almost 100% of the response time.
I suspect the freeze comes from a parallel CPU enqueue during library
cache access, but I don't know how I could avoid this.
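(One thing worth doing while the sessions are frozen is to walk the blocking
chain straight from v$session, to see who holds the library cache lock the PX
slaves are queued behind; a minimal sketch:)

  select sid, serial#, program, event,
         blocking_session, blocking_session_status,
         seconds_in_wait
  from   v$session
  where  event in ('library cache lock',
                   'cursor: pin S wait on X',
                   'PX Deq: Parse Reply')
  order  by blocking_session nulls first;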
Rank 1 (SQL ID 11wythn1bf8c5, hash value 45556101, plan hash value 3867173512)
  Trace ET Pct:           79.1%
  Self Elapsed Time:      841.060
  CPU Time:               22.094
  Non-Idle Wait Time:     0.814
  Recursive Elapsed Time: 0.850
  Exec Count:             3
  User:                   115
  Depth:                  0
  SQL Text:               SELECT 1*1 TypeEnregistrement ,1* EMPLH.ID_R ID_
Call      Response Time   Elapsed   CPU     Non-Idle   Elapsed Time      Idle
          Accounted-for   Time      Time    Wait Time  Unaccounted-for   Wait Time
Parse:    0.042           0.032     0.016   0.001      0.016             0.010
Execute:  7.153           7.153     6.906   0.123      0.124             0.000
Fetch:    826.337         413.562   0.703   0.308      412.551           412.775
Total:    833.532         420.747   7.625   0.432      412.690           412.785
Fetch     Response Time   Elapsed   CPU     Non-Idle   Elapsed Time      Idle       Call
Call      Accounted-for   Time      Time    Wait Time  Unaccounted-for   Wait Time  Timestamp
First:    826.080         413.810   0.984   0.306      412.529           412.541    MAY-31 10:44:33.191
Rank 1: 11wythn1bf8c5 3867173512
(Wait Time / Times Waited below are idle time for events in the Idle class, non-idle time otherwise.)

Event Name                     Wait Class   Wait Time  Times Waited  Avg Wait  Max Wait   Blocks  Avg Blocks
PX Deq: Parse Reply            Idle         412.533    64            6.445834  85.928264
enq: PS - contention           Other        0.301      64            0.004699  0.008888
SQL*Net message from client    Idle         0.246      56            0.004394  0.011639
db file sequential read        User I/O     0.128      127           0.001011  0.014514   127     1
PX Deq: Join ACK               Idle         0.004      127           0.000029  0.000645
PX Deq Credit: send blkd       Idle         0.002      6             0.000392  0.000937
SQL*Net more data to client    Network      0.002      65            0.000027  0.000272
log file sync                  Commit       0.001      1             0.000595  0.000595
SQL*Net message to client      Network      0.000      57            0.000001  0.000003
latch: row cache objects       Concurrency  0.000      4             0.000010  0.000020
SQL*Net more data from client  Network      0.000      1             0.000013  0.000013
latch: shared pool             Concurrency  0.000      1             0.000005  0.000005
Total:                                      non-idle 0.432 over 320 waits; idle 412.785 over 253 waits
Fetch    Recursive   OS Buffer    BG Consistent      BG Current      Rows Processed  Library Cache  Times Waited  Times Waited  Call
Call     Call Count  Gets (disk)  Read Mode (query)  Mode (current)  or Returned     Misses         Non-Idle      Idle          Timestamp
First:   3000        10           198855             0               100             0              78            198           MAY-31 10:44:33.191
Second:  0           0            0                  0               100             0              2             1             MAY-31 10:44:33.197
Third:   0           0            0                  0               100             0              2             1             MAY-31 10:44:33.201
Last:    0           0            0                  0               66              0              2             1             MAY-31 10:44:33.457
Overview of the data exchange between the producers/consumers during the execution:

Username  QC/Slave      SlaveSet  SID  Slave INS  STATE  WAIT_EVENT               QC SID  QC INS  Req. DOP  Actual DOP
--------  ------------  --------  ---  ---------  -----  -----------------------  ------  ------  --------  ----------
APP1      QC                      596  1          WAIT   PX Deq: Parse Reply      596
 -        p00k (Slave)  1          16  1          WAIT   PX Deq: Execution Msg    596     1       32        32
 -        p006 (Slave)  1          62  1          WAIT   PX Deq: Execution Msg    596     1       32        32
 -        p00l (Slave)  1         211  1          WAIT   PX Deq: Execution Msg    596     1       32        32
 -        p00a (Slave)  1         224  1          WAIT   PX Deq: Execution Msg    596     1       32        32
 -        p00m (Slave)  1         384  1          WAIT   PX Deq: Execution Msg    596     1       32        32
 -        p00p (Slave)  1         405  1          WAIT   library cache lock       596     1       32        32
 -        p008 (Slave)  1         421  1          WAIT   cursor: pin S wait on X  596     1       32        32
Explain plan: it did not match the plan used for execution, which had
parallel processes (adaptive plan). Again, I have always wanted to ask
the client to reserve ESX CPU and RAM for the VM but lacked concrete
evidence of its impact on Oracle parallelism to back it up.
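(To compare against the plan that actually ran, including the adaptive parts,
it can be pulled from the cursor cache rather than from EXPLAIN PLAN; a
minimal sketch, assuming the cursor is still cached:)

  select *
  from   table(dbms_xplan.display_cursor(sql_id          => '11wythn1bf8c5',
                                         cursor_child_no => null,
                                         format          => 'TYPICAL +ADAPTIVE'));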
Thank you very much for reading
Kouss