Re: Reading resource busy error system trace

From: Pap <oracle.developer35@xxxxxxxxx>
To: Jonathan Lewis <jlewisoracle@xxxxxxxxx>
Date: Mon, 4 Jul 2022 21:36:39 +0530

I searched with string 'Call Stack Trace' but didnt find any such in all
the captured trace files generated with that ospid.

I double checked with the team and found the level 266 trace was not
getting generated somehow, so they did set it as level 258 as below.
However wondering if this detail expanded 'call stack trace' is not
captured just because it's been generated with level 258 as opposed to
level 266.

ALTER SYSTEM SET EVENTS '54 trace name systemstate level 258';

On Mon, 4 Jul 2022, 7:52 pm Jonathan Lewis, <jlewisoracle@xxxxxxxxx> wrote:

You didn't miss anything.

I read your comments about *systemstate level 266*, but my fingers typed
*errorstack
level 1* when I wasn't watching because that's a good first choice to try
when you see a session raising an error.

This is an extract from my test dump (with a lot of lines near the start
of the call stack deleted):

----- Error Stack Dump -----
<error barrier> at 0x7ffc8604b220 placed dbkda.c@296
ORA-00054: resource busy and acquire with NOWAIT specified or timeout
expired
----- Current SQL Statement for this session (sql_id=dutzuyds7kb59) -----
alter table child enable novalidate constraint c_fk_p

----- Call Stack Trace -----
calling              call     entry                argument values in hex

location             type     point                (? means dubious value)

-------------------- -------- --------------------
----------------------------
ksedst1()+95         call     kgdsdst()            7FFC8604A680 000000002
                                                   7FFC860449B0 ?
7FFC86044AC8 ?
...
                                                   000000000 000000082 ?
ktaiam()+1110        call     ksesecl0()           7FC6171F49C0 ?
7FC6170D0050 ?
                                                   000000036 ? 000000036
                                                   09BB833E8 000000000
ktagetg0()+929       call     ktaiam()             00001E6ED 000000004
000000000
                                                   7FFC8604DAB8 ?
000000000 ?
                                                   000000000 ?
ktagetp_internal()+  call     ktagetg0()           00001E6ED ? 000000004 ?
141                                                000000004 ?
7FFC8604DAB8 ?
                                                   000000000 ? 000000000 ?
ktagetg_ddlX()+323   call     ktagetp_internal()   00001E6ED 000000000
000000004
                                                   000000000 ? 000000000
                                                   746DC3D600000000
ktagetg_ddl()+30     call     ktagetg_ddlX()       00001E6ED ? 000000000 ?
                                                   000000004 ? 000000000 ?
                                                   000000000 ? 000000000
kkdllk0()+1551       call     ktagetg_ddl()        00001E6ED ? 000000000 ?
                                                   000000004 ? 000000000 ?
                                                   000000000 ? 000000000 ?

In your systemstate dump you should find a shortstack that includes a
section similar to:
...
ktaiam()+1110<-ktagetg0()+929<-ktagetp_internal()+141<-ktagetg_ddlX()+323<-ktagetg_ddl()+30
...
which gets expanded to the detailed call stack above.

Regards
Jonathan Lewis

On Mon, 4 Jul 2022 at 13:26, Pap <oracle.developer35@xxxxxxxxx> wrote:

Thank You Jonathan.

In the event trace i don't see any 'call stack trace' but seeing 'Short
stack dump' something as below. But again in that i don't see such
object_id in hexa format. Am i missing anything?

Also i didnt see any DML on the parent table same time in ASH but even
dev team saying they dont perform any DML on the parent objects during this
heavy child direct path insert runs.

https://gist.github.com/oracle9999/95307babd0366cc0c25afd447fa0264d

On Mon, 4 Jul 2022, 3:44 pm Jonathan Lewis, <jlewisoracle@xxxxxxxxx>
wrote:

Your dump file seems to be saying that the session is holding a mode 4
on the "child" table, and since the insert /*+ append */ would have been
holding a mode 6 the commit must have cleared the lock properly before the
following code acquired the mode 4.

Even though you're enabling with "novalidate" the enable would try to
lock the child and then the parent in mode 4, so the first thing you need
to check is that the problem isn't with the parent table being locked in an
incompatible fashion.

I'm not 100% certain, but if you check the call stack trace (headed as
follows)

----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious
value)
-------------------- -------- --------------------
----------------------------

I think you'll find several lines similar to:
ktagetg0()+929       call     ktaiam()             00001E6ED 000000004
000000000
                                                   7FFFC6AA87B8 ?
000000000 ?
                                                   000000000 ?
ktagetp_internal()+  call     ktagetg0()           00001E6ED ? 000000004
?
141                                                000000004 ?
7FFFC6AA87B8 ?
                                                   000000000 ? 000000000
?
ktagetg_ddlX()+323   call     ktagetp_internal()   00001E6ED 000000000
000000004
                                                   000000000 ? 000000000
                                                   4DE9A6BF00000000
ktagetg_ddl()+30     call     ktagetg_ddlX()       00001E6ED ? 000000000
?
                                                   000000004 ? 000000000
?
                                                   000000000 ? 000000000
kkdllk0()+1551       call     ktagetg_ddl()        00001E6ED ? 000000000
?
                                                   000000004 ? 000000000
?
                                                   000000000 ? 000000000
?

The first parameter will be the object_id of the parent table where the
attempt to lock failed.  (In my sample 0001E6ED).

This may take you one step forward in your search for the source of the
problem.  Is it possible that you've got informatica doing "insert lots of
parent rows", "insert lots of child rows" nearly concurrently with the
expectation that the parent insert will complete and commit before the
child insert starts (or completes).  I have seen a couple of cases where
such things happen, and if the parent insert gets delayed the child insert
crashes.

Regards
Jonathan Lewis

On Sat, 2 Jul 2022 at 18:57, Pap <oracle.developer35@xxxxxxxxx> wrote:

Hello All,  I followed the blog below and tried debugging the system
trace to find the blocking session/sql which caused this resource busy
error, Ora-0054. But I am encountering strange behaviour. It appears to be
the same session blocking itself.The session which did the direct path load
and commit seems not releasing exclusive lock for some duration post
commit. Below are the details of the behaviour. Has anybody
encountered such a scenario?

https://nenadnoveljic.com/blog/troubleshooting-ora-00054-resource-busy-and-acquire-with-nowait-specified-or-timeout-expired/

So the code which is encountering the error is below. Its disabling all
constraints + Making index unusable and then does direct path load to that
table and then after while trying to enable the constraint of the same
table using 'ALTER' and its failing with Ora-0054. The sample code for data
load and ALTER table is as below.

****Sample code *****
INSERT /*+Append*/ INTO TAB1 (c1,c2.....)
        SELECT (ROWNUM + sqq_key) ,col1, col2, col3..... from....tab2,
tab3, tab4.....;
COMMIT;

FOR I IN (SELECT TABLE_NAME, CONSTRAINT_NAME FROM ALL_CONSTRAINTS WHERE
TABLE_NAME = v_table_nm AND CONSTRAINT_TYPE = 'R' AND STATUS = 'DISABLED')
LOOP
   EXECUTE IMMEDIATE ('ALTER TABLE ' || v_table_nm || ' ENABLE
NOVALIDATE CONSTRAINT '|| I.CONSTRAINT_NAME);
END LOOP I;

********

When i scan through trace file as per the blog step by step, i see below

*Step- 1 -* I searched through the object_Id(101739) for TAB1 and got
the section below, i.e pointing towards a DML lock on the table. And the
process state object is proc: 0x243efffc0.

   ----------------------------------------
   SO: 0x233cd8bc8, type: DML lock (83), map: 0x237d28a10
      state: LIVE (0x4532), flags: 0x0
      owner: 0x233b9af60, proc: 0x243efffc0
      link: 0x233cd8be8[0x28acf1e58, 0x1d793c780]
      pg: 0
   SOC: 0x237d28a10, type: *DML lock* (83), map: 0x233cd8bc8
      state: LIVE (0x99fc), flags: INIT (0x1)
   DML LOCK: tab=101739 flg=11 chi=0
          his[0]: mod=4 spn=69302
2022-07-01 05:44:10.662*:ksq.c@10787:ksqdmc(): Enqueue Dump
(enqueue) TM-00018D6B-00000000-00000000-00000000 DID: ksqlkdid:
0001-00CF-000ACD5C
   lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x7
   mode: S, lock_flag: 0x20, lock: 0x237d28a38, res: 0x2499f3798
   own: 0x2488605f8, sess: 0x2488605f8, *proc: 0x2403a4510*, prv:
0x2499f37a8
   slk: 0x2530967c8
   ----------------------------------------

*STEP - 2:-* Then i searched through the trace for the state object
address i.e. "*SO: 0x243efffc0*" and got below. its pointing to
pid:207 as below.

*** 2022-07-01T05:44:10.538156-04:00
PROCESS 207:
----------------------------------------
*SO: 0x243efffc0*, type: process (2), map: 0x2403a4510
     state: LIVE (0x4532), flags: 0x1
     owner: (nil), proc: 0x243efffc0
     link: 0x243efffe0[0x243efffe0, 0x243efffe0]
     child list count: 15, link: 0x243f00030[0x243b7f190, 0x1d7903b10]
     pg: 0
SOC: 0x2403a4510, type: process (2), map: 0x243efffc0
      state: LIVE (0x99fc), flags: INIT (0x1)
(process) *Oracle pid:207,* ser:3445, calls cur/top:
0x1a0372f08/0x1d2ee6c70
           flags : (0x0) - icon_uid:0 logon_pdbid=0
           flags2: (0x1000), flags3: (0x510)
           call error: 0, sess error: 0, txn error 0
           intr queue: empty
   (post info) last post received: 0 0 277
               last post received-location: kcl2.h LINE:5703 ID:kclcget
               last process to post me: 0x24c3b2180 1 6
               last post sent: 0 0 33
               last post sent-location: ksa2.h LINE:298 ID:ksasnd
               last process posted by me: 0x24837eaa0 1 6
               waiter on post event: 0
   (latch info) hold_bits=0x0 ud_influx=0x4101b
   (osp latch info) hold_bits=0x0 ud_influx=0x0
   Process Group: DEFAULT, pseudo proc: 0x2484c5d70
   O/S info: user: grid, term: UNKNOWN, ospid: 383076
   OSD pid info:
   Short stack dump:

ksedsts()+426<-ksdxfstk()+58<-ksdxdocmdmultex()+5961<-ksdxdocmdmult()+55<-ksudmp_proc()+1626<-ksudss_main()+1976<-ksudss_opt()+1544<-dbkdaKsdActDriver()+2484<-dbgdaExecuteAction()+354<-dbgdaRunAction()+770<-dbgdRunActions()+83<-dbgdProcessEventActions()+525<-dbgdChkEventKgErr()+394<-dbkdChkEventRdbmsErr()+65<-dbgePostErrorKGE()+1066<-dbkePostKGE_kgsf()+71<-kgeade()+338<-kgeselv()+89<-ksesecl0()+189<-ktaiam()+1110<-ktagetg0()+929<-ktagetp_internal()+141<-ktagetg_ddlX()+323<-ktagetg_ddl()+30<-kkdllk0()+1551<-kkdcprc()+711<-kkdcmco()+3682<-atbadd()+8770<-atbdrv()+14663<-opiexe()+30061<-opiosq0()+4575<-opipls()+14619<-opiodr()+1202<-rpidrus()+198<-skgmstack()+65<-rpidru()+132<-rpiswu2()+614<-rpidrv()+1494<-psddr0()+467<-psdnal()+646<-pevm_EXIM()+276<-pfrinstr_EXIM()+43<-pfrrun_no_tool()+52<-pfrrun()+902<-plsql_run()+755<-peicnt()+279<-kkxexe()+720<-opiexe()+24762<-kpoal8()+2226<-opiodr()+1202<-ttcpip()+1239<-opitsk()+1897<-opiino()+936<-opiodr()+1202<-opidrv()+1094<-sou2o()+165<-opimai_real()+422<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245

   KGL-UOL (Process state object)
   KGX Atomic Operation Log 0x2403a5690
    Mutex (nil)(0, 0) idn 0 oper NONE(0)
    FSO mutex uid 262142 efd 0 whr 0 slp 0
   KGX Atomic Operation Log 0x2403a56f8
    Mutex (nil)(0, 0) idn 0 oper NONE(0)
    FSO mutex uid 262142 efd 0 whr 0 slp 0
   KGX Atomic Operation Log 0x2403a5760
    Mutex (nil)(0, 0) idn 0 oper NONE(0)
    FSO mutex uid 262142 efd 0 whr 0 slp 0
   KGX Atomic Operation Log 0x2403a57c8
    Mutex (nil)(0, 0) idn 0 oper NONE(0)
    FSO mutex uid 262142 efd 0 whr 0 slp 0
   KGX Atomic Operation Log 0x2403a5830
    Mutex (nil)(0, 0) idn 0 oper NONE(0)
    FSO mutex uid 262142 efd 0 whr 0 slp 0
   KGX Atomic Operation Log 0x2403a5898
    Mutex (nil)(0, 0) idn 0 oper NONE(0)
    FSO mutex uid 262142 efd 0 whr 0 slp 0
   KGX Atomic Operation Log 0x2403a5900
    Mutex (nil)(0, 0) idn 0 oper NONE(0)
    FSO mutex uid 262142 efd 0 whr 0 slp 0
   KGX Atomic Operation Log 0x2403a5968
    Mutex (nil)(0, 0) idn 0 oper NONE(0)
    FSO mutex uid 262142 efd 0 whr 0 slp 0
   KGX Atomic Operation Log 0x2403a59d0
    Mutex (nil)(0, 0) idn 0 oper NONE(0)
    FSO mutex uid 262142 efd 0 whr 0 slp 0
  ----------------------------------------

*Step-3 :-  * Then searching for the *pid-207* in the start of the
trace , i found *sid - 1072 ser 23 *and its pointing to the same ALTER
TABLE statement. The OSPID is the same one for which the trace gets
generated. And if I try to see the DML in this sesion before this ALTER
it's the same one, which has just finished up doing the COMMIT for the
direct path load.

So I am wondering , is it possible that we are hitting any bug due to
which Oracle is actually not releasing the Exclusive table lock immediately
post Commit of the direct path load ? And that is why , even if we are
trying to do the ALTER post COMMIT in the same session we are hitting
Ora-0054 as it blocks itself? And then should we put some wait/sleep
statement post commit i.e. before the ALTER to avoid this bug?

207: USER ospid 383076 sid 1072 ser 23,
         Cmd: ALTER TABLE

On Mon, Jun 13, 2022 at 4:28 PM Pap <oracle.developer35@xxxxxxxxx>
wrote:

Thank You So much Nenad.

Unfortunately while checking with the team got to know that the other
trace files has been removed from the location. So i will wait for the
next
occurrence of the error to capture and analyze all the traces wrt that one
ospid.

On Sun, 12 Jun 2022, 9:24 pm Noveljic Nenad, <
nenad.noveljic@xxxxxxxxxxxx> wrote:

Your system dump apparently consists of several files.

At the very beginning of the system dump there must be a section that
identifies the blockers for each process and looks something like this:

157: USER ospid 1385877 sid 786 ser 49681, waiting for 'log file
switch (checkpoint incomplete)'

  Cmd: INSERT

  Blocked by inst: 1, sid: 266, ser: 13238

  Final Blocker inst: 1, sid: 266, ser: 13238

159: USER ospid 1295896 sid 796 ser 48963, waiting for 'log file
switch (checkpoint incomplete)'

  Cmd: INSERT

Blocked by inst: 1, sid: 266, ser: 13238

  Final Blocker inst: 1, sid: 266, ser: 13238

First, check who was blocking the Powercenter process with ospid 89092
(Blocked by…). Then search **all** system dump trace files
(**89092**.trc)
for the blocker to look at its state objects. Once you now the sid and
serial of the blocker, you can position yourself to its state objects
with
the following search expression: sid: xxx ser: xxx . One of the state
objects will be for the cursor that blocked your DDL on MAIN_TAB.

Best regards,

Nenad

*From:* Pap <oracle.developer35@xxxxxxxxx>
*Sent:* Sonntag, 12. Juni 2022 14:06
*To:* Noveljic Nenad <nenad.noveljic@xxxxxxxxxxxx>
*Cc:* Oracle L <oracle-l@xxxxxxxxxxxxx>
*Subject:* Re: Reading resource busy error system trace

Thank You Nenad.

So actually here the session/Alter statement, which failed, is from
the informatica power center session only. So do you mean we should
search
the body of the original trace file from which this trace continued from
i.e  "u01/app/oracle/diag/rdbms/DB1p/DB11/trace/DB11_ora_89092_8.trc"
rather this one. And in that trace the sql statements near the "final
blocker" string will be the cause of this error?

____________________________________________________

Please consider the environment before printing this e-mail.

Bitte denken Sie an die Umwelt, bevor Sie dieses E-Mail drucken.

Important Notice

This message is intended only for the individual named. It may
contain confidential or privileged information. If you are not the named
addressee you should in particular not disseminate, distribute, modify or
copy this e-mail. Please notify the sender immediately by e-mail, if you
have received this message by mistake and delete it from your system.
Without prejudice to any contractual agreements between you and us
which shall prevail in any case, we take it as your authorization to
correspond with you by e-mail if you send us messages by e-mail. However,
we reserve the right not to execute orders and instructions transmitted
by
e-mail at any time and without further explanation.
E-mail transmission may not be secure or error-free as information
could be intercepted, corrupted, lost, destroyed, arrive late or
incomplete. Also processing of incoming e-mails cannot be guaranteed. All
liability of Vontobel Holding Ltd. and any of its affiliates (hereinafter
collectively referred to as "Vontobel Group") for any damages resulting
from e-mail use is excluded. You are advised that urgent and time
sensitive
messages should not be sent by e-mail and if verification is required
please request a printed version.
Please note that all e-mail communications to and from the Vontobel
Group are subject to electronic storage and review by Vontobel Group.
Unless stated to the contrary and without prejudice to any contractual
agreements between you and Vontobel Group which shall prevail in any
case,
e-mail-communication is for informational purposes only and is not
intended
as an offer or solicitation for the purchase or sale of any financial
instrument or as an official confirmation of any transaction.
The legal basis for the processing of your personal data is the
legitimate interest to develop a commercial relationship with you, as
well
as your consent to forward you commercial communications. You can
exercise,
at any time and under the terms established under current regulation,
your
rights. If you prefer not to receive any further communications, please
contact your client relationship manager if you are a client of Vontobel
Group or notify the sender. Please note for an exact reference to the
affected group entity the corporate e-mail signature. For further
information about data privacy at Vontobel Group please consult
www.vontobel.com.

Follow-Ups:
- Re: Reading resource busy error system trace
  - From: Jonathan Lewis

References:
- Re: Reading resource busy error system trace
  - From: Pap
- Re: Reading resource busy error system trace
  - From: Jonathan Lewis
- Re: Reading resource busy error system trace
  - From: Pap
- Re: Reading resource busy error system trace
  - From: Jonathan Lewis

Re: Reading resource busy error system trace

Other related posts: