RE: Archive Log Size

From: "Mark W. Farnham" <mwf@xxxxxxxx>
To: <henry.poras@xxxxxxxxx>
Date: Tue, 16 Mar 2021 14:43:12 -0400

I may have been shipped beta versions with some tracing turned on that showed
the switch reason. I can’t find any notes on how to turn it on via trace from
the time frame when I was checking into that, which was shorting after I
convinced them that it would be best to NOT copy the entire file for the short
manual switches used to do clearly document and minimize the set of redo logs
required as part of an old fashioned “hot” backup. That would have been
sometime between 1988 and 1994. (Early V6 copied the entire file regardless of
the end of active redo marker, thinking that matching the exact size [and
binary compare if you really wanted to be sure] would be the simplest way to
know the file had been copied correctly to the backup. That was true, but not
convenient for very large online logs with operational manual switches.)

I didn’t hunt for the trace flags that may or may not turn on that information
in current releases. I would think that would be in the most basic trace with
nothing turned on, but it could have been chucked away as something customers
would ever need to know.

Sorry I don’t have any better recollection, and my archives from COAT went into
the dustbin soon after it was sold to Bain (2006).

I’d check with support to see if there is a simple trace setting that would
have that information. I would be shocked if they don’t know why they are
switching when they switch.

mwf

From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On ;
Behalf Of Henry Poras
Sent: Tuesday, March 16, 2021 9:29 AM
To: Mark W. Farnham
Cc: Jonathan Lewis; ORACLE-L
Subject: Re: Archive Log Size

Thanks for the ideas. I hadn't checked all of those trace files. Unfortunately,
I didn't see anything useful. Maybe I'll cycle back and see if I missed
something. I'm also still trying to think of a way to distinguish between local
cause and standby cause.

Henry

On Fri, Mar 12, 2021 at 1:38 PM Mark W. Farnham <mwf@xxxxxxxx> wrote:

I am rusty digging in on these details and they have almost certainly changed a
few things, but if I recall correctly then once upon a time the reason for a
switch appeared in one or more of the alert logs from the set alert.log, the
lgwr log, the ckpt log, and the arch log.

Now I always tried to keep quite a few log groups in the rotation (on the order
of ten) because there are several things that need to be complete for a log
that has been switched out of before it can be used again.

Finally mean time to recover of course cares whether the redo data has made it
to the standby, but it is the accumulated unapplied redo maximum that seems
likely to be applicable within that time that still might drive a checkpoint
and switch. I am also rusty on those mechanics and they have probably also
changed since last I dug into the precise blow by blow order of mandatory
events to ensure reliability in all edge cases.

So, are there any hints for the switch reasons in the logs? IF not, can ARCH,
CKPT, and LGWR be traced at a higher level to cause them to puke switch reasons?

mwf

From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On ;
Behalf Of Henry Poras
Sent: Friday, March 12, 2021 1:04 PM
To: Jonathan Lewis
Cc: ORACLE-L
Subject: Re: Archive Log Size

Wow. I just reread my last response and it reads just like what it was, free
form thoughts. Let me try and clarify what I meant.

Initially, Jonathan's supposition that somehow the MTTR setting gets the
standby to force these short log switches seems reasonable. However, upon
further thought, I have some questions. First, I typically associate
fast_start_mttr_target with checkpointing, and that isn't a standby thing.
Second, the Standby gets its data from LGWR (the creator for dest_id=2), so the
data should be in the SRL without a log switch

This also got me wondering if LNS (LGWR) falling behind in writing from the log
buffer to the standby SRL might play a role. So I looked at x$logbuf_readhist,
but the HITRATE on all 3 nodes was 98%, so it looks like no backlog there.

  1  select indx, inst_id, bufsize, rdmemblks, rddiskblks, hitrate, bufinfo
  2* from x$logbuf_readhist
primary:sys@ohcops1> /

      INDX    INST_ID BUFSIZE              RDMEMBLKS                RDDISKBLKS
                HITRATE BUFINFO
---------- ---------- -------------------- ------------------------
------------------------ ---------- --------------------
         0          1 107812K              53747612                 1869757119
                      2
         1          1 129374K              361691660                1561813071
                     18
         2          1 150937K              727333789                1196170942
                     37
         3          1 172499K              1303643599               619861132
                     67
         4          1 194062K              1834188083               89316648
                     95 TARGET-90
         5          1 215624K              1896707348               26797383
                     98 CURRENT
         6          1 237186K              1897284975               26219756
                     98
         7          1 258749K              1897284975               26219756
                     98
         8          1 280311K              1897284975               26219756
                     98
         9          1 301874K              1897284975               26219756
                     98
        10          1 323436K              1897284975               26219756
                     98
        11          1 344998K              1897284975               26219756
                     98
        12          1 366561K              1897284975               26219756
                     98
        13          1 388123K              1897284975               26219756
                     98
        14          1 409686K              1897284975               26219756
                     98
        15          1 431248K              1897284975               26219756
                     98

16 rows selected.

I'm back to wondering if there is a way to distinguish between a local vs.
standby effect here.

Henry

On Thu, Mar 11, 2021 at 10:43 AM Henry Poras <henry.poras@xxxxxxxxx> wrote:

Your supposition makes sense.I associate fast_start_mttr with checkpointing,
which isn't happening on the standby. Just reading directly from the SRL.
However, on the other hand, it would still be strange for this to be the cause
of the small archive. We also know from earlier that the Standby
(log_archive_dest_2) is getting its data from LGWR so the log switches
shouldn't be necessary to obtain the same mttr on the standby. I also tend to
get confused by some of the interrelations between fsat_start_mttr_target,
archive_lag_target, ...

If it is mttr related, I would guess that doubling my fast_start_mttr_target
should double the size of my archive logs (seeing that the current archive
sizing is pretty consistent). Not sure if I'll be able to do that, but it's
worth a try. I think it would be pretty harmless.

Henry

On Thu, Mar 11, 2021 at 10:07 AM Jonathan Lewis <jlewisoracle@xxxxxxxxx> wrote:

I wonder if the MTTR gets reflected by something in the Standby.

If you're mean time to recover has to be 15 seconds then maybe Oracle does
something to ensure that the standby could recover inside the MTTR as well -
and archiving log files prematurely might be a mechanism to allow that to
happen.

Regards

Jonathan Lewis

On Wed, 10 Mar 2021 at 15:29, Henry Poras <henry.poras@xxxxxxxxx> wrote:

fast_start_mttr_target               integer     15

gv$instance_recovery

INST_ID                        : 1
RECOVERY_ESTIMATED_IOS         : 18594
ACTUAL_REDO_BLKS               : 369677
TARGET_REDO_BLKS               : 1002837
LOG_FILE_SIZE_REDO_BLKS        : 3821985
LOG_CHKPT_TIMEOUT_REDO_BLKS    : 1002837
LOG_CHKPT_INTERVAL_REDO_BLKS   :
FAST_START_IO_TARGET_REDO_BLKS :
TARGET_MTTR                    : 15
ESTIMATED_MTTR                 : 12
CKPT_BLOCK_WRITES              : 91402225
OPTIMAL_LOGFILE_SIZE           : 457
ESTD_CLUSTER_AVAILABLE_TIME    : 3
WRITES_MTTR                    : 211343278
WRITES_LOGFILE_SIZE            : 0
WRITES_LOG_CHECKPOINT_SETTINGS : 0
WRITES_OTHER_SETTINGS          : 0
WRITES_AUTOTUNE                : 24832295
WRITES_FULL_THREAD_CKPT        : 97904

-----------------

INST_ID                        : 2
RECOVERY_ESTIMATED_IOS         : 14247
ACTUAL_REDO_BLKS               : 268501
TARGET_REDO_BLKS               : 1047425
LOG_FILE_SIZE_REDO_BLKS        : 3821985
LOG_CHKPT_TIMEOUT_REDO_BLKS    : 1047425
LOG_CHKPT_INTERVAL_REDO_BLKS   :
FAST_START_IO_TARGET_REDO_BLKS :
TARGET_MTTR                    : 15
ESTIMATED_MTTR                 : 7
CKPT_BLOCK_WRITES              : 8211142
OPTIMAL_LOGFILE_SIZE           : 596
ESTD_CLUSTER_AVAILABLE_TIME    : 3
WRITES_MTTR                    : 6818539
WRITES_LOGFILE_SIZE            : 0
WRITES_LOG_CHECKPOINT_SETTINGS : 0
WRITES_OTHER_SETTINGS          : 0
WRITES_AUTOTUNE                : 37153280
WRITES_FULL_THREAD_CKPT        : 134238

-----------------

INST_ID                        : 3
RECOVERY_ESTIMATED_IOS         : 7800
ACTUAL_REDO_BLKS               : 262743
TARGET_REDO_BLKS               : 1035890
LOG_FILE_SIZE_REDO_BLKS        : 3821985
LOG_CHKPT_TIMEOUT_REDO_BLKS    : 1035890
LOG_CHKPT_INTERVAL_REDO_BLKS   :
FAST_START_IO_TARGET_REDO_BLKS :
TARGET_MTTR                    : 15
ESTIMATED_MTTR                 : 14
CKPT_BLOCK_WRITES              : 6467159
OPTIMAL_LOGFILE_SIZE           : 285
ESTD_CLUSTER_AVAILABLE_TIME    : 8
WRITES_MTTR                    : 18266031
WRITES_LOGFILE_SIZE            : 0
WRITES_LOG_CHECKPOINT_SETTINGS : 0
WRITES_OTHER_SETTINGS          : 0
WRITES_AUTOTUNE                : 16983159
WRITES_FULL_THREAD_CKPT        : 81827

-----------------

That's a table I almost never look at. I need to go back and review it a bit.

Henry

On Wed, Mar 10, 2021 at 4:59 AM Jonathan Lewis <jlewisoracle@xxxxxxxxx> wrote:

I don't think fast_start_mttr_target has been mentioned explicitly yet, nor
(g)v$instance_recovery.

Any information there.

Regards

Jonathan Lewis

On Tue, 9 Mar 2021 at 22:34, Henry Poras <henry.poras@xxxxxxxxx> wrote:

Jonathan,

According to the docs, "A 0 value disables the time-based thread advance
feature" so I am going under the assumption that 900 means 900. Of course,
sometimes docs lie.

Tim, what you suggest makes sense, but it doesn't match up with what I am
seeing in this environment. There is some data in the first few entries in this
thread. I can summarize over a larger time frame if you like. Basically, each
thread is switching every couple of minutes or less, and the archive logs are
consistently 40-43M (redo logs are 256M).

Henry

Follow-Ups:
- Re: Archive Log Size
  - From: Jonathan Lewis

References:
- Re: Archive Log Size
  - From: Henry Poras
- Re: Archive Log Size
  - From: Jonathan Lewis
- Re: Archive Log Size
  - From: Henry Poras
- Re: Archive Log Size
  - From: Jonathan Lewis
- Re: Archive Log Size
  - From: Henry Poras
- Re: Archive Log Size
  - From: Henry Poras
- Re: Archive Log Size
  - From: Jonathan Lewis
- Re: Archive Log Size
  - From: Henry Poras
- Re: Archive Log Size
  - From: Henry Poras
- Re: Archive Log Size
  - From: Chris Taylor
- Re: Archive Log Size
  - From: Henry Poras
- Re: Archive Log Size
  - From: Tim Gorman
- Re: Archive Log Size
  - From: Henry Poras
- Re: Archive Log Size
  - From: Tim Gorman
- Re: Archive Log Size
  - From: Henry Poras
- Re: Archive Log Size
  - From: Henry Poras
- RE: Archive Log Size
  - From: Mark W. Farnham
- Re: Archive Log Size
  - From: Henry Poras

RE: Archive Log Size

Other related posts: