RE: log buffer size and log file syncs

  • From: "CRISLER, JON A" <JC1706@xxxxxxx>
  • To: "Mark W. Farnham" <mwf@xxxxxxxx>, "tanel@xxxxxxxxxxxxxx" <tanel@xxxxxxxxxxxxxx>
  • Date: Tue, 29 May 2012 16:14:41 +0000

Mark, you raise some good points.  REDO is mirrored at the logical level, to 
address possible logical corruption (which we have run into before on other 
systems).  The disk devices are on very large SAN storage with 1 TB of cache 
memory at the SAN level.  So we are doing 2 x the normal redo activity plus 
Data Guard.  I had an Oracle RAC performance specialist look at our Service
Request and he was of the opinion that setting LGWR to a real-time priority
would be a good thing to do, and would not do any harm in our environment,
which agrees with your assessment.  99% of our redo disk service time is <=
2ms, but we have some outliers that spike up to > 8ms, which seem to have a
cascading effect on performance.  These outliers seem to be caused by anomalies
in the multipathing software we are using, so some changes are pending to
remove these outliers.
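
[Editor's sketch] The service-time figures above (99% at or below 2ms, outliers above 8ms) could be checked from a list of per-write latencies with something like the following. This is a minimal illustration, not the method used in the thread: the function name, the thresholds as parameters, and the sample data are all assumptions made up for the example, and how the samples are collected (iostat, SAN tooling, etc.) is left open.

```python
def summarize_service_times(samples_ms, target_ms=2.0, outlier_ms=8.0):
    """Summarize redo write service times given in milliseconds.

    target_ms and outlier_ms default to the 2ms / 8ms figures quoted
    in the message; both are plain parameters, not Oracle settings.
    """
    if not samples_ms:
        raise ValueError("no samples supplied")
    # Count the writes that met the target service time.
    within = sum(1 for s in samples_ms if s <= target_ms)
    # Keep the individual outliers so they can be correlated with
    # multipathing events or other logs later.
    outliers = [s for s in samples_ms if s > outlier_ms]
    return {
        "count": len(samples_ms),
        "pct_within_target": 100.0 * within / len(samples_ms),
        "outliers": outliers,
        "worst_ms": max(samples_ms),
    }


if __name__ == "__main__":
    # Hypothetical data for illustration only: 99 fast writes, one spike.
    samples = [1.0] * 99 + [9.5]
    print(summarize_service_times(samples))
```

Keeping the raw outlier values, rather than only a percentile, makes it easier to line the spikes up against path failover events.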

-----Original Message-----
From: Mark W. Farnham [mailto:mwf@xxxxxxxx] 
Sent: Tuesday, May 29, 2012 11:51 AM
To: tanel@xxxxxxxxxxxxxx; CRISLER, JON A
Cc: 'oracle-l'
Subject: RE: log buffer size and log file syncs

" However, I don't like to fix a problem first and then see whether the
problem existed in first place (trial and error), that's why I asked for
extra information / hard evidence in form of LGWR's snapper output ..."

In most cases I tend to agree with Tanel's call to only take specific
actions in reaction to specific known problems. (Doubly so, since by calling
for running Oracle "memory rich" in 1990 I may have contributed to the
helter skelter bchr landrush. In my defense, I called for running "memory
rich" on a system where at the time my total SGA was 10 megabytes and I was
not even calculating bchr, but rather, I had so little memory that lookup
tables that were nearly never updated were being chronically re-read from
disk [OK, Unix file buffer probably at least some of the time].)

In the case of setting LGWR to a "stays scheduled more often" priority,
where it is convenient to set it (which depends on what release you're
running and the OS), I'm unaware of ways this can cause harm. That being the
case, setting it *may* not solve your current problem, but it is unlikely to
*cause* a problem and may be a prophylaxis against future transient problems.
So I consider it a useful standard configuration unless it is contraindicated.

By the way, how many different ways do you have the online logs mirrored,
and is the mirroring configuration forcing multiple writes to the same
devices?  If you're overrunning write caches (or don't have write caches on
those devices) writing multiple times to the same device can force otherwise
unneeded seeks and queueing for no extra physical error prevention. (And
while I'm always up for selecting one of physical multiplexing and having
multiple members of a given group, I know that many folks disagree and have
been saved by the extra member in the case of operator error.) I'd suggest
that if there is indeed a multi-write problem with the log file syncs and
you *cannot* do less work, then removing some copies or at least getting
them to different devices in a non-conflicting pattern with ARCH is called
for, whether or not it actually cures your current problem.

mwf

-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx]
On Behalf Of Tanel Poder
Sent: Tuesday, May 01, 2012 11:54 AM
To: CRISLER, JON A
Cc: oracle-l
Subject: Re: log buffer size and log file syncs

Hi Jon,
Increasing LGWR priority would only help if it was currently starving for
CPU / or waiting too long in the CPU runqueue... Unfortunately on Linux
there's no easy way to measure this directly. If your load is low (let's say
only 10 on a 32 CPU machine) then I'd expect that LGWR priority change isn't
going to help much.

However, I don't like to fix a problem first and then see whether the
problem existed in first place (trial and error), that's why I asked for
extra information / hard evidence in form of LGWR's snapper output ...

Tanel.
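
[Editor's sketch] The "load of 10 on a 32 CPU machine" reasoning above amounts to comparing the load average with the CPU count. A rough way to compute that ratio is sketched below. The function names are made up for the example. Note that this is only a coarse proxy: it does not measure LGWR's run-queue wait, and on Linux the load average also counts tasks in uninterruptible sleep, so it can overstate CPU demand.

```python
import os


def load_per_cpu(load_average, cpu_count):
    """Return load average divided by CPU count.

    Values well below 1.0 suggest CPUs are mostly idle, so a priority
    change for one process is unlikely to matter much.
    """
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return load_average / cpu_count


def current_load_per_cpu():
    """Compute the ratio for the local host (Unix-like systems only)."""
    one_minute, _five, _fifteen = os.getloadavg()
    # os.cpu_count() can return None, so fall back to 1 in that case.
    return load_per_cpu(one_minute, os.cpu_count() or 1)


if __name__ == "__main__":
    # The example from the message: load 10 on a 32 CPU machine.
    print(load_per_cpu(10, 32))
    print(current_load_per_cpu())
```

A low ratio does not rule out short scheduling delays for a single process, which is why hard evidence from LGWR itself is still the better guide.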

On Tue, May 1, 2012 at 6:19 PM, CRISLER, JON A <JC1706@xxxxxxx> wrote:

>  Red Hat Linux 5.  We have async DG running but Real Time apply is
> also configured, and redo logs are mirrored.  I believe LGWR is not
> starved for CPU given the overall conditions for the system, but I am
> finding some info that putting lgwr in a real-time OS priority would
> be a good thing.
>
> The default for _high_priority_processes is  LMS*|VKTM  but I have
> seen some Metalink notes about adding LGWR.  I also saw a blog post
> that mentioned you discussed setting this parameter at a HOTSOS
> seminar, and this is something we are considering.  Given all the CPU
> power in this server, and all the LMS processes, I don't think this would
> pose a problem.
>
> alter system set "_high_priority_processes"='LMS*|VKTM|LGWR'
> scope=spfile sid='*';
>
> From: tanel@xxxxxxxxxx [mailto:tanel@xxxxxxxxxx] On Behalf Of Tanel Poder
> Sent: Monday, April 30, 2012 6:21 PM
> To: CRISLER, JON A
> Cc: oracle-l
> Subject: Re: log buffer size and log file syncs
>
> Which OS are you on? If it happens to be Solaris, then prstat -mLp
> PID would show the scheduling latency for LGWR. This would help to
> find out whether LGWR is CPU starved or not... what load averages do
> you have?
>
> Also, what does snapper say when run on LGWR? If you have synchronous
> DG for example, then LGWR would wait for the LNS ack too in addition
> to the log file parallel write wait, before returning OK back to the
> committing session ...
>
> Tanel.
>
> On Mon, Apr 30, 2012 at 5:56 PM, CRISLER, JON A <JC1706@xxxxxxx> wrote:
>
> Interesting thoughts Tanel: in the case of this specific app, the
> majority of the work is made up of small commits to a handful of
> tables on a 6 node RAC cluster.  I/O times are generally quite good,
> and with 32 cores per node the CPU and load average is very low.  It's
> 11gR1 - I was wondering if some of the tweaks to put LGWR at "real
> time" priority that are mentioned for 10g also apply to 11g.

--
//www.freelists.org/webpage/oracle-l

