Jonathan
Small correction: I was working on another subpartition issue as well, but in
this current issue the table only has partitions (list partitioning) and no
subpartitions or hash partitions - that was my mistake.
The average row length is 77 as per dba_tables, and num_rows/blocks comes out to 403.
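For reference, the figures come from dba_tables along these lines (owner and
table names here are placeholders):

select avg_row_len, num_rows, blocks,
       round(num_rows / blocks) as rows_per_block   -- the 403 quoted above
  from dba_tables
 where owner = 'APP_OWNER'
   and table_name = 'T1';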
I created some event traces and will check them and update the details. The
tables have a high volume of DML: roughly 80% inserts and the rest deletes.
Agreed - in one non-prod setup I created the table uncompressed and so far it
has not shown the issue, while another non-prod environment, where the table is
still compressed, is showing the slowness. I had opened an SR with Oracle but
without much success, as a lot more has been pointed out here on the list. The
reason for not moving all environments to uncompressed tables is the high
number of tables and the size of the data in them. Testing in UAT earlier this
year, for more than two months under a good load, did not reproduce the issue.
The company then decided to make 12.2 the standard, and some environments moved
to 12.2 started experiencing the problem within a few weeks.
Tx
Sanjay
On Thursday, May 23, 2019, 1:51:18 PM EDT, Jonathan Lewis
<jonathan@xxxxxxxxxxxxxxxxxx> wrote:
The really messy thing about a case like this is that it might depend very
heavily on the history. You won't KNOW that the problem is fixed until you find
that it simply re-appears one day.
For example - if you inserted 500,000 rows across multiple subpartitions in one
transaction, but failed and rolled back, would all the ASSM bitmap blocks have
been brought up to date on the rollback ? Would there be some feature of the
rolled-back blocks that means they still appear to have space while
simultaneously being unused ? (A detail that was missing from the snapper
report was "ASSM cbk:blocks marked full", which we might expect to see if Oracle
has visited a block then rejected it because it was full when the bitmap said
it was empty.)
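As a sketch, the ASSM counters (including "ASSM cbk:blocks marked full") can be
pulled for a session from the standard v$ views - the SID here is the one from
the snapper output:

select sn.name, ss.value
  from v$sesstat ss, v$statname sn
 where ss.statistic# = sn.statistic#
   and ss.sid = 1070          -- session being checked
   and sn.name like 'ASSM%'   -- all the ASSM gsp/cbk counters
 order by sn.name;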
When you copied the table did you copy the exact subpartitioning layout ? Was
it a CTAS with or without nologging, or a create / insert append, or a
create / insert ? The problem may never appear on the copy - until you've
inserted the next 100M rows - because you created the copy using a method that
didn't match the actual data acquisition. If it's a full copy of the data, how
well do the numbers of blocks used match up - are they nearly identical ?
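For clarity, the three copy methods in question look like this (table names are
hypothetical, and the three are alternatives, not steps):

-- 1) CTAS, optionally nologging:
create table t1_copy1 nologging as select * from t1;

-- 2) create empty, then direct-path (append) insert:
create table t1_copy2 as select * from t1 where 1 = 0;
insert /*+ append */ into t1_copy2 select * from t1;
commit;

-- 3) create empty, then conventional insert:
create table t1_copy3 as select * from t1 where 1 = 0;
insert into t1_copy3 select * from t1;
commit;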
Can you tell us the average row length and the average number of rows per block
in the original, so we can get an idea of how well compression worked ?
Regards
Jonathan Lewis
________________________________________
From: Sanjay Mishra <smishra_97@xxxxxxxxx>
Sent: 23 May 2019 18:20:08
To: mwf@xxxxxxxx; Jonathan Lewis; 'Oracle-L Freelists'
Subject: Re: Simple Insert - very slow
Mark
This setup also has hash partitions and subpartitions. I will also be checking
with a non-ASSM tablespace. Initial tests confirm that the behaviour is not
repeated with a non-compressed ASSM table, but I might have to do some bigger
deletes and inserts to validate that.
Tx
Sanjay
On Thursday, May 23, 2019, 11:31:24 AM EDT, Mark W. Farnham <mwf@xxxxxxxx>
wrote:
specifically, this condition has been seen with subpartitioning hash on
unique_id (16) used to create multiple concurrent insert points, followed by
massive deletes on monotonically increasing unique_id so that the deletes
hashed widely to all 16 storage threads.
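A hypothetical shape of that layout, just to make the description concrete
(names and datatypes invented):

create table t_hist (
  unique_id   number        not null,   -- monotonically increasing
  created_dt  date          not null,
  payload     varchar2(100)
)
partition by range (created_dt)
subpartition by hash (unique_id) subpartitions 16   -- 16 insert points
(
  partition p2019 values less than (to_date('2020-01-01','YYYY-MM-DD'))
);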
mwf
-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Mark W. Farnham
Sent: Thursday, May 23, 2019 11:12 AM
To: 'Jonathan Lewis'; 'Oracle-L Freelists'
Subject: RE: Simple Insert - very slow
nods.
ergo the words massive and uncommitted.
definitely an edge case.
-----Original Message-----
From: Jonathan Lewis [mailto:jonathan@xxxxxxxxxxxxxxxxxx]
Sent: Thursday, May 23, 2019 10:46 AM
To: Mark W. Farnham; 'Oracle-L Freelists'
Subject: Re: Simple Insert - very slow
Mark,
Since it's 12.2 I think we'd see a relatively large value for "ASSM
gsp:Optimized reject DB" unless a theoretical very large uncommitted delete
made space empty in a very large number of blocks that were extremely widely
scattered, so that no two data blocks with deleted rows appeared in the same
L1 bitmap block (i.e. something like 128 blocks between every pair of
blocks with not-yet-committed free space).
Unless, possibly, the reject cache has been disabled...
Regards
Jonathan Lewis
________________________________________
From: Mark W. Farnham <mwf@xxxxxxxx>
Sent: 23 May 2019 13:27:51
To: Jonathan Lewis; 'Oracle-L Freelists'
Subject: RE: Simple Insert - very slow
Perchance do you have any massive uncommitted large deletes in the same
tablespace?
If so, to rule out a particular ASSM bug (and nearly everyone is using ASSM
to manage free space now, right?) you could try creating a freelist-managed
tablespace, slapping the table over there (or a clone), and seeing how
it goes there.
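A sketch of that test (datafile path, sizes and names are placeholders, and a
partitioned table would need its partitions moved individually rather than a
single MOVE):

create tablespace ts_mssm
  datafile '/u01/oradata/ts_mssm01.dbf' size 10g
  segment space management manual;   -- freelist-managed, not ASSM

-- clone the table into it:
create table t1_mssm tablespace ts_mssm as select * from t1;
-- or move the real thing:
-- alter table t1 move tablespace ts_mssm;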
That is a "shot in the dark." Don't let that disrupt following JL's advice
on a stepwise solution. But if you have resources, you can clone the table
and mirror the processes operating on the copy without much human work and
without interrupting your search for a root cause.
mwf
-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Jonathan Lewis
Sent: Thursday, May 23, 2019 5:55 AM
To: 'Oracle-L Freelists'
Subject: Re: Simple Insert - very slow
Sanjay,
I keep forgetting that the "cell physical block read" (etc.) events don't
tell you about database block addresses.
It would be sensible to find out something about which blocks were being
read - one option would be to use event 10200 for a brief period for that
SQL_ID, but since you've got lost write protection enabled we could take
advantage of that and dump some redo.
The command is of the form:
alter system dump logfile '/u01/app/oracle/oradata/orcl12c/redo03.log'
layer 23 opcode 2
rba min 2781 1
rba max 2781 1000
;
Layer 23 opcode 2 is the block read record, so that part is fixed. The file
name is any online or archived redo log name you fancy. The rba min/max has to
specify the correct sequence number for the file name (v$log.sequence# /
v$log_history.sequence#). The 1, 1000 that I've shown is to limit the dump to
the first 1,000 blocks of the log file - since you have 6,000 BRRs per
second you might want to limit the dump to a similar number of redo blocks.
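To pick the right sequence number and file name, something like this against
the standard views will do:

select l.sequence#, l.status, f.member
  from v$log l, v$logfile f
 where l.group# = f.group#
 order by l.sequence#;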
A BRR (block read record) looks like:
REDO RECORD - Thread:1 RBA: 0x000add.00000019.01b0 LEN: 0x004c VLD: 0x10 CON_UID: 2846920952
SCN: 0x00000000025a7c13 SUBSCN: 1 05/23/2019 10:42:51
CHANGE #1 CON_ID:3 TYP:2 CLS:6 AFN:9 DBA:0x00407930 OBJ:40 SCN:0x00000000001a1e2a SEQ:2 OP:23.2 ENC:0 RBL:0 FLG:0x0000
Block Read - afn: 9 rdba: 0x00407930 BFT:(1024,4225328) non-BFT:(1,31024)
scn: 0x00000000001a1e2a seq: 0x02
flags: 0x00000006 ( dlog ckval )
where: qeilwh05: qeilbk
The interesting information is likely to be
a) the TYP and CLS
b) the rdba - translated to the file/block (BFT is "big file tablespace")
c) where
What we're interested in is whether Oracle keeps re-reading the same blocks
time after time, and if so what type they are and from which function the read
is requested. (And Stefan might be able to tell you the significance of the
function if any one stands out.)
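As a sketch, the rdba from a BRR translates to file and block numbers with the
standard dbms_utility functions - using the 0x00407930 from the example above:

select dbms_utility.data_block_address_file(
         to_number('00407930','xxxxxxxx')) as file#,    -- 1
       dbms_utility.data_block_address_block(
         to_number('00407930','xxxxxxxx')) as block#    -- 31024 (non-BFT)
  from dual;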
Regards
Jonathan Lewis
________________________________________
From: Sanjay Mishra <smishra_97@xxxxxxxxx>
Sent: 22 May 2019 23:22
To: Jonathan Lewis; 'Oracle-L Freelists'; Mark W. Farnham
Subject: Re: Simple Insert - very slow
Mark/Jonathan
When I created a new table with the same compression the issue did not appear,
and I am not sure whether it will come back again after heavy DML operations -
the original table was fine for several days after it was imported and then
suddenly started behaving badly. I will continue testing with the original
table as suggested above by Stefan/Jonathan, to make sure the RCA is done and
the same problem is not going to pop up again.
Tx
Sanjay
On Monday, May 20, 2019, 9:29:15 AM EDT, Mark W. Farnham
<mwf@xxxxxxxx>
wrote:
Taken all together, I would advocate copying this single table to a brand-new
tablespace all its own and re-running the test.
That will quickly factor out accumulated problems from one-time run
problems. Whether you do this as a clone (not the real table) or for real is
an interesting question. A test clone would avoid other activities updating
in competition. A move "for real" might make your problem go away without
your fully learning the root cause.
This would help focus on the nature of your actual problem, together with
making it easier to see if the MOS note referenced in the thread is
relevant.
The difference between a Millsap trace on the current object versus the
trace of the same updates to a clone of the table should be very
illuminating.
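For reference, the "Millsap trace" is the extended SQL trace, e.g. (level 12 =
binds plus waits):

alter session set events '10046 trace name context forever, level 12';
-- ... run the insert against the original and the clone ...
alter session set events '10046 trace name context off';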
-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Jonathan Lewis
Sent: Monday, May 20, 2019 7:00 AM
To: Oracle-L Freelists
Subject: Re: Simple Insert - very slow
Sanjay,
I don't use snapper often enough to be sure of exactly how it works, but
your figures for a "1 second single snapshot" suggest that it took nearly 2
seconds to do a one-second snapshot - which could indicate that the machine
is overloaded and that work outside the work you are doing is impacting your
attempt to load data.
The things I note about the figures are:
a) you read 6,111 blocks - and decrypt 6,110 blocks (so presumably the table
is either encrypted, or in an encrypted tablespace)
b) you generate 6,120 redo entries of which 6,110 are for lost write
detection (so I think you've got "db_lost_write_protect = full" - or your
table is in the SYSTEM tablespace !)
c) you report only 11 db block changes and 750 bytes of "other" redo - so
you haven't managed to insert many rows, and
d) you've got only 3 "gc current block received" but waited 629 times on "gc
current block busy" - which is odd, and also suggests that something else is
hammering away at the table you're trying to insert to.
e) you've examined 2,478 level 1 bitmap blocks leading to 12,388 data block
rejections - so there's a massive conflict between the space the bitmap says
is available and the space that seems to be available when the session
checks the targeted blocks. This is probably why Stefan is suggesting
stack tracing - to find out why there's such a discrepancy between the
bitmaps and the actual blocks. I note from an earlier post, though, that the
table is compressed and has a column txn_upd_dt. If anything is updating the
rows in this table you're likely to end up with lots of migrated rows (a
quick check is sketched after this list), and there used to be a complete
foul-up with migrated rows, ITLs and ASSM that resulted in a major collision
and waste of resources.
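The quick check for migrated/chained rows mentioned in (e) - a sketch,
assuming the chained_rows table has been created with ?/rdbms/admin/utlchain.sql
and that the table is called T1:

analyze table t1 list chained rows into chained_rows;

select count(*)
  from chained_rows
 where table_name = 'T1';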
You've got some trace data for the insert - and identified the blocks being
read as from the table. Do you get file and block ids from the trace ? If
so then (a) do you see the same blocks being re-read many times over the
course of a couple of minutes, and (b) can you dump a few of the blocks that
are read (especially if they are in the middle of a stream of consecutive
block ids) so that we can see if they are the target of a lot of migration
and have large numbers of ITL entries (check for "itc" and the number
following it) ?
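The block dump itself is just (file and block numbers are placeholders - use
ones that show up repeatedly in the trace):

alter system dump datafile 9 block 31024;
-- then check the resulting trace file for "itc:" - the ITL count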
Regards
Jonathan Lewis
________________________________________
From: oracle-l-bounce@xxxxxxxxxxxxx on behalf of Stefan Koehler <contact@xxxxxxxx>
Sent: 20 May 2019 10:34
To: Sanjay Mishra
Cc: Oracle-L Freelists
Subject: Re: Simple Insert - very slow
Hello Sanjay,
it seems like this is really an ASSM issue based on the stats of "ASSM gsp*"
and "ASSM cbk*".
Now it would be time to gather stack traces and check in more detail with
the already mentioned trace events 10320 and/or 10612.
You can also try disabling the ASSM rejection cache
(_enable_rejection_cache) or setting event 10019 to level 1, but these would
just be shots in the dark.
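A sketch of those settings - the event levels and the hidden parameter are
things to confirm with Oracle Support before touching production:

alter session set events '10320 trace name context forever, level 10';
alter session set events '10612 trace name context forever, level 1';

-- disable the ASSM rejection cache:
alter system set "_enable_rejection_cache" = false;

-- or the 10019 event at level 1:
alter session set events '10019 trace name context forever, level 1';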
Best Regards
Stefan Koehler
Independent Oracle performance consultant and researcher
Website: http://www.soocs.de
Twitter: @OracleSK
Sanjay Mishra <smishra_97@xxxxxxxxx> wrote on 18 May 2019 at 19:08:
Stefan
Thanks for the good document created by you to troubleshoot. Need to check
what is the best way to interpret snapper data, but pasting the details, and if ...
----------------------------------------------------------------------------
Sampling SID 1070 with interval 1 seconds, taking 1 snapshots...
SID @INST, USERNAME , TYPE, STATISTIC, DELTA, HDELTA/SEC, %TIME, GRAPH , NUM_WAITS, WAITS/SEC,
----------------------------------------------------------------------------
1070 @2, SYSADMIN, STAT, session logical reads, 13865, 7.73k, , , , ,
1070 @2, SYSADMIN, STAT, user I/O wait time, 141, 78.65, , , , ,
1070 @2, SYSADMIN, STAT, non-idle wait time, 141, 78.65, , , , ,
1070 @2, SYSADMIN, STAT, non-idle wait count, 12230, 6.82k, , , , ,
1070 @2, SYSADMIN, STAT, physical read total IO requests, 6112, 3.41k, , , , ,
1070 @2, SYSADMIN, STAT, physical read requests optimized, 6111, 3.41k, , , , ,
1070 @2, SYSADMIN, STAT, physical read total bytes optimized, 50069504, 27.93M, , , , ,
1070 @2, SYSADMIN, STAT, physical read total bytes, 50069504, 27.93M, , , , ,
1070 @2, SYSADMIN, STAT, cell physical IO interconnect bytes, 50069504, 27.93M, , , , ,
1070 @2, SYSADMIN, STAT, gcs messages sent, 3, 1.67, , , , ,
1070 @2, SYSADMIN, STAT, db block gets, 13860, 7.73k, , , , ,
1070 @2, SYSADMIN, STAT, db block gets from cache, 13860, 7.73k, , , , ,
1070 @2, SYSADMIN, STAT, db block gets from cache (fastpath), 7737, 4.32k, , , , ,
1070 @2, SYSADMIN, STAT, consistent gets, 1, .56, , , , ,
1070 @2, SYSADMIN, STAT, consistent gets from cache, 1, .56, , , , ,
1070 @2, SYSADMIN, STAT, consistent gets pin, 1, .56, , , , ,
1070 @2, SYSADMIN, STAT, consistent gets pin (fastpath), 1, .56, , , , ,
1070 @2, SYSADMIN, STAT, logical read bytes from cache, 113541120, 63.34M, , , , ,
1070 @2, SYSADMIN, STAT, physical reads, 6111, 3.41k, , , , ,
1070 @2, SYSADMIN, STAT, physical reads cache, 6111, 3.41k, , , , ,
1070 @2, SYSADMIN, STAT, physical read IO requests, 6112, 3.41k, , , , ,
1070 @2, SYSADMIN, STAT, physical read bytes, 50069504, 27.93M, , , , ,
1070 @2, SYSADMIN, STAT, db block changes, 11, 6.14, , , , ,
1070 @2, SYSADMIN, STAT, free buffer requested, 6112, 3.41k, , , , ,
1070 @2, SYSADMIN, STAT, hot buffers moved to head of LRU, 958, 534.39, , , , ,
1070 @2, SYSADMIN, STAT, free buffer inspected, 6144, 3.43k, , , , ,
1070 @2, SYSADMIN, STAT, shared hash latch upgrades - no wait, 7, 3.9, , , , ,
1070 @2, SYSADMIN, STAT, blocks decrypted, 6110, 3.41k, , , , ,
1070 @2, SYSADMIN, STAT, redo entries, 6120, 3.41k, , , , ,
1070 @2, SYSADMIN, STAT, redo size, 465504, 259.67k, , , , ,
1070 @2, SYSADMIN, STAT, redo entries for lost write detection, 6110, 3.41k, , , , ,
1070 @2, SYSADMIN, STAT, redo size for lost write detection, 464756, 259.25k, , , , ,
1070 @2, SYSADMIN, STAT, redo subscn max counts, 7, 3.9, , , , ,
1070 @2, SYSADMIN, STAT, file io wait time, 1408659, 785.78k, , , , ,
1070 @2, SYSADMIN, STAT, gc current blocks received, 3, 1.67, , , , ,
1070 @2, SYSADMIN, STAT, gc local grants, 6116, 3.41k, , , , ,
1070 @2, SYSADMIN, STAT, ASSM cbk:blocks examined, 12366, 6.9k, , , , ,
1070 @2, SYSADMIN, STAT, ASSM gsp:L1 bitmaps examined, 2478, 1.38k, , , , ,
1070 @2, SYSADMIN, STAT, ASSM gsp:L2 bitmaps examined, 1, .56, , , , ,
1070 @2, SYSADMIN, STAT, ASSM gsp:reject db, 12388, 6.91k, , , , ,
1070 @2, SYSADMIN, STAT, buffer is pinned count, 230, 128.3, , , , ,
1070 @2, SYSADMIN, STAT, cell flash cache read hits, 6723, 3.75k, , , , ,
1070 @2, SYSADMIN, TIME, background cpu time, 365192, 203.71ms, 20.4%, [## ], , ,
1070 @2, SYSADMIN, TIME, background elapsed time, 1273623, 710.45ms, 71.0%, [######## ], , ,
1070 @2, SYSADMIN, WAIT, gc current block busy, 629, 350.87us, .0%, [ ], 3, 1.67,
1070 @2, SYSADMIN, WAIT, cell single block physical read, 1557638, 868.88ms, 86.9%, [WWWWWWWWW ], 6746, 3.76k,
--
-- End of Stats snap 1, end=2019-05-18 12:58:58, seconds=1.8