Can you recommend any sort of monitoring to identify when a SAN is getting
overloaded? In our case it only became apparent when an app started
experiencing latency at the same time for 5-10 minutes every day and we tracked
it down to a batch job which was running on an entirely different cluster but
which shared the same storage unit. Storage denied it was their problem right
up until the point we proved it was.
It would have been nice to have known that before the problems started showing
up. Getting a new storage unit is a slow process.
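
One database-side way to spot this earlier is to trend average I/O wait latency between snapshots and flag intervals that run slow. The sketch below is only an illustration under assumptions. It supposes you already collect cumulative counters such as TOTAL_WAITS and TIME_WAITED_MICRO from V$SYSTEM_EVENT for an event like 'db file sequential read'. The Sample class, the slow_intervals function and the 20 ms threshold are hypothetical names and values, not anything from this thread.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    label: str           # snapshot label, e.g. the sample time
    total_waits: int     # cumulative wait count at this snapshot
    time_waited_us: int  # cumulative wait time in microseconds


def slow_intervals(samples, threshold_ms=20.0):
    """Return (label, avg_ms) for each interval whose average wait exceeds threshold_ms.

    The 20 ms default is an arbitrary illustration, not a recommended limit.
    """
    flagged = []
    for prev, cur in zip(samples, samples[1:]):
        waits = cur.total_waits - prev.total_waits
        if waits <= 0:
            continue  # no waits in the interval, or the counters were reset
        avg_ms = (cur.time_waited_us - prev.time_waited_us) / waits / 1000.0
        if avg_ms > threshold_ms:
            flagged.append((cur.label, round(avg_ms, 2)))
    return flagged
```

Working from deltas rather than the raw cumulative values matters here: a 5-10 minute spike disappears in an average taken since instance startup.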
Jay Miller
Sr. Oracle DBA
201.369.8355
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On
Behalf Of Neil Chandler
Sent: Tuesday, March 26, 2019 10:35 AM
To: Chris Taylor
Cc: gogala.mladen@xxxxxxxxx; ORACLE-L
Subject: Re: System stats
In the majority of places I have worked - 5 clients last year - the SANs were
overloaded in 4 of them. They are too frequently sized for capacity and not
throughput/response time. The response time was inevitably variable and System
Stats would not have been helpful on the systems they have. In one of the
clients, some of the critical DBs have dedicated storage, but changing the
system stats would have had little to no effect on those systems due to other
measures having been put in place (including a low optimizer_index_cost_adj
on one system, meaning lots of index use, just not necessarily the right
indexes).
The optimizer tries to be all things to all people, and there are lots of
parameters to try to twist it into the shape that you want. The problem is
frequently the abuse of those parameters - especially the global ones - via
googling a problem, believing a silver bullet blog, and lacking the time to
prove the solution, so we just throw the fix into the system. It can be
enlightening to strip the more extreme parameters back to their defaults and
see how the system copes.
As an aside, did you run your systems with the default parameters, discover
notable problems and then use the 2 sets of system stats to correct those
problems, or did you put them in from the start and everything was good?
There's a case to be made for using system stats, but I just don't think that
is something that should be used frequently.
Neil.
________________________________
From: Chris Taylor <christopherdtaylor1994@xxxxxxxxx>
Sent: 26 March 2019 12:59
To: Neil Chandler
Cc: gogala.mladen@xxxxxxxxx; ORACLE-L
Subject: Re: System stats
As far as the workload, I used 2 sets of workload stats and swapped between
them - one for business hours and one for off-business hours, since each had
its own personality (for lack of a better word).
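
The message does not say how the swap was done, so the following is only a sketch of one possible approach. It assumes both sets were saved to a stat table with DBMS_STATS.EXPORT_SYSTEM_STATS under separate statids, and that a scheduled job runs DBMS_STATS.IMPORT_SYSTEM_STATS at each changeover. The 08:00-18:00 window and the names DAY_STATS, NIGHT_STATS, SYSTEM_STATS_TAB and STATS_OWNER are hypothetical.

```python
from datetime import datetime


def stats_id_for(now, day_start=8, day_end=18):
    """Pick the saved stats set for the given time (the hour window is an assumption)."""
    return "DAY_STATS" if day_start <= now.hour < day_end else "NIGHT_STATS"


def import_call(now, stattab="SYSTEM_STATS_TAB", statown="STATS_OWNER"):
    """Build the PL/SQL block that would load the matching set.

    The table and owner names are placeholders, not taken from the thread.
    """
    statid = stats_id_for(now)
    return (
        "BEGIN DBMS_STATS.IMPORT_SYSTEM_STATS("
        f"stattab => '{stattab}', statid => '{statid}', statown => '{statown}'); END;"
    )
```

This only looks at the hour of day. A real schedule would probably also need to treat weekends and batch windows separately.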
As far as the SAN goes, if enough systems are hitting the SAN enough to cause
the IO rate/throughput to become affected, then it's *probably* time for a new
SAN.
Chris