RE: High consistent Gets and response time in prod

From: "Mark W. Farnham" <mwf@xxxxxxxx>
To: <jlewisoracle@xxxxxxxxx>, "'ORACLE-L'" <oracle-l@xxxxxxxxxxxxx>
Date: Sat, 1 May 2021 09:16:54 -0400

Improper cleanout is not a required contribution, but can make “empty front”
more expensive. In your case it seems you read enough blocks to explain the
elapsed time.

Oracle does not order either freelists or free bitmaps so that lowest blocks
are used first (even on conventional inserts), and append inserts go high
regardless of available free blocks. Filtering on rownum for 1 row and getting
that huge number of reads is sufficient evidence for me that this is “empty
front.” (running snapstat or Tanel’s much more complete snapper  is not a bad
idea, ever, if you want to prove what’s going on in detail. I just don’t see a
need for it this time.)

Since at least Oracle 4 this has been a problem, but until the advent of
“append” it was not very commonly a noticeable problem. (When combined with
pending cleanout and only reads into the PGA it can be hilariously expensive).
Jerry told me they never saw it in 2 or 3, but that may have been because it
was standard operating procedure to full export, full reload in 2 and 3. Every
4 hours for 2, daily in 3.

Combined with unindexed checks for row existence to gate whether or not to run
some transactional process empty front can be a real drag.

Since Oracle does not have a way to set the low water mark to the lowest block
having any contents, fixing this probably means reloading. (Adding that to the
code without creating a bunch of bugs is more probably more difficult than it
sounds, and the existence of the “move” operation mitigates against
implementing this probably much cheaper operation.)

Avoiding this problem (if you have partitioning) in the case of rows with born
on dates that correlate with the discard date is simply to partition by date,
throwing away old partitions as they become empty.

IF this was the result of a long overdue cleanup (Oracle E-biz Workflow tables
come to mind), and you do have non-direct inserts routinely, a onetime move or
reload is probably all you need. IF you routinely delete obsolete rows and only
append new rows in, periodically (once daily, weekly, monthly?) tracking the
number of consistent gets to find the first row can give you a diagnostic of
when it is worthwhile to re-org.

The thinking on that is roughly: If the sum of time spent reading past empty
front blocks in your application exceeds the time to re-build plus the hassle
of operationally scheduling the rebuild, then do it. The hassle of
operationally scheduling a rebuild varies wildly with the polar easy for shops
that are strictly only online from 9 AM to 5 PM in a single time zone and the
biggest table you have can be rebuilt overnight to extremely hard for globally
available 24x7x365 operations with giant monoliths.

The special case of empty front is one of the culprits in propagation of the
freespace fragmentation myth and old school “rebuild everything” notions.
Unless an unmonitored rebuild everything routine operation is definitively the
cheapest thing for you to do, don’t go that way.

Very most often highly selective rebuilds for a reason are useful. This sure
looks like one.

I presume you loaded a fresh empty table from prod for test, right?

mwf

From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On ;
Behalf Of Jonathan Lewis
Sent: Saturday, May 01, 2021 4:05 AM
To: ORACLE-L
Subject: Re: High consistent Gets and response time in prod

Various scenarios could produce this.

As others have said you need more information - Sayan suggested Tanel's
"snapper" which is a tool that everyone should know about, but if you haven't
used it before

Start a new session

run the query

select
        name, value
from
        V$mystat
join    v$statname
using
        (statistic#)
where
        value != 0
/

That will tell us what sort of work the session has done.

There are two obvious possibilities:

*  it's doing a tablescan of a very large table which has had almost all the
data deleted butbut the blocks haven't been cleaned out properly.

*  it's doing a huge amount of work to get read-consistent versions of blocks.

In the first case we will see lots of multiblock reads, in the second we'll see
lots of "xxxx - undo records applied".

You mentioned the 30 hour parallel query - are these actually related to DML,
have they got uncommitted transactions sitting behind them

Check v$lock for locks TM locks on the table

Check v$transaction for transactions that have done a lot of work (used_urec)
and started a long time in the past.

You could also query v$session_event

select  event, total_waits, time_waited
from    v$session_event
where   sid = sys_context('userenv','sid')
order by
        time_waited
/

And, of course, you could enable extend tracing (10046 level 8) before running
the query to get a detailed sequence of activity for the session

Regards

Jonathan Lewis

P.S.  The fact that the table is 11.2M blocks (when it looks like it should be
less) is interesting - and the 11.5M physical reads suggests a very long
(empty) tablescan to find the first row with a few hundred thousand reads of
undo to check commit times ("transaction table consistent reads - undo records
applied"). There's a hint of big processes doing "delete everything, insert it
all again" and not cleaning up the space management bitmaps properly. (Oracle
has a history of this, but I don't know if it can still happen in 19c.)

On Fri, 30 Apr 2021 at 23:39, Ram Raman <veeeraman@xxxxxxxxx> wrote:

Hi,

Fri evening and I have a problem. grrr..

We have a simple SELECT statement like this:

select * from c.pd where rownum <=1

---------------------------------------------------------------------------------

| Id  | Operation          | Name       | Rows  | Bytes | Cost (%CPU)| Time
|

---------------------------------------------------------------------------------

|   0 | SELECT STATEMENT   |            |     1 |   993 |     4   (0)| 00:00:01
|

|*  1 |  COUNT STOPKEY     |            |       |       |            |
|

|   2 |   TABLE ACCESS FULL| PD |     1 |   993 |     4   (0)| 00:00:01 |

---------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

   1 - filter(ROWNUM<=1)

Statistics

----------------------------------------------------------

          0  recursive calls

          0  db block gets

   11290619  consistent gets

   11289575  physical reads

       8024  redo size

       8510  bytes sent via SQL*Net to client

        408  bytes received via SQL*Net from client

          2  SQL*Net roundtrips to/from client

          0  sorts (memory)

          0  sorts (disk)

          1  rows processed

First thing that jumps out is the number of consistent gets, which is in the
same range in prod again and again. In test also, the consistent gets are in
the same range for repeated executions - just couple of dozens.

TABLE_NAME                               NUM_ROWS LAST_ANAL       BLOCKS
AVG_ROW_LEN   PCT_FREE   PCT_USED EMPTY_BLOCKS  AVG_SPACE

------------------------ ------------------------ --------- ------------
----------- ---------- ---------- ------------ ----------

CHAIN_CNT PAR MON COMPRESS

---------- --- --- --------

PD                                 1,420,080 29-APR-21   11,537,288         993
        10                       0        0

         0 NO  YES DISABLED

What is confusing is that the number of blocks the table takes in the prod. I
did some calc and I feel that it should take around 172,000+ blocks, given our
block size is 8K, but I am unable to explain 11.5M blocks.

References:
- Re: High consistent Gets and response time in prod
  - From: Jonathan Lewis

RE: High consistent Gets and response time in prod

Other related posts: