Re: System stats

  • From: Mladen Gogala <gogala.mladen@xxxxxxxxx>
  • To: oracle-l@xxxxxxxxxxxxx
  • Date: Fri, 12 Apr 2019 19:18:10 -0400

You can monitor your SAN with the Nagios agent for NetApp. Not knowing what SAN equipment you have, the best general advice I can offer is this: if you are not using a NetApp SAN, throw it away and buy NetApp. NetApp has your data on tap, like beer.



I don't work for NetApp, but I really, really like their equipment.

On 4/12/19 10:23 AM, (Redacted sender Jay.Miller for DMARC) wrote:

Can you recommend any sort of monitoring to identify when a SAN is getting overloaded? In our case it only became apparent when an app started experiencing latency at the same time every day for 5-10 minutes. We tracked it down to a batch job that was running on an entirely different cluster but shared the same storage unit. Storage denied it was their problem right up until the point we proved it was.
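[Editor's note: from the database side, one way to spot this kind of shared-storage contention before users complain is to trend single-block read latency across AWR snapshots; a spike at the same time each day while your own database load stays flat points at a neighbor on the same array. A hedged sketch (requires a Diagnostics Pack license to query the DBA_HIST views):

```sql
-- Average 'db file sequential read' latency per AWR snapshot,
-- computed as the delta between consecutive snapshots.
SELECT s.snap_id,
       s.begin_interval_time,
       ROUND((e.time_waited_micro - LAG(e.time_waited_micro)
                 OVER (ORDER BY s.snap_id)) /
             NULLIF(e.total_waits - LAG(e.total_waits)
                 OVER (ORDER BY s.snap_id), 0) / 1000, 2) AS avg_wait_ms
FROM   dba_hist_system_event e
       JOIN dba_hist_snapshot s
         ON  s.snap_id         = e.snap_id
         AND s.dbid            = e.dbid
         AND s.instance_number = e.instance_number
WHERE  e.event_name = 'db file sequential read'
ORDER  BY s.snap_id;
```

A sustained jump in avg_wait_ms with no matching rise in your own I/O volume is the classic signature of another workload saturating shared storage.]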

It would have been nice to have known that before the problems started showing up. Getting a new storage unit is a slow process.

Jay Miller

Sr. Oracle DBA


*From:*oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] *On Behalf Of *Neil Chandler
*Sent:* Tuesday, March 26, 2019 10:35 AM
*To:* Chris Taylor
*Cc:* gogala.mladen@xxxxxxxxx; ORACLE-L
*Subject:* Re: System stats

In the majority of places I have worked - 5 clients last year - the SAN was overloaded in 4 of them. SANs are too frequently sized for capacity rather than for throughput and response time. The response time was inevitably variable, and system stats would not have been helpful on the systems they have. At one of the clients, some of the critical DBs have dedicated storage, but changing the system stats would have had little to no effect on those systems because of other measures already in place (including a low optimizer_index_cost_adj on one system, meaning lots of index use - just not necessarily the right indexes).

The optimizer tries to be all things to all people, and there are lots of parameters to twist it into the shape you want. The problem is frequently the abuse of those parameters - especially the global ones - via googling a problem, believing a silver-bullet blog, and lacking the time to prove the solution, so the "fix" just gets thrown into the system. It can be enlightening to strip the more extreme parameters back to their defaults and see how the system copes.
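[Editor's note: the mechanics of stripping a parameter back to its default are simple; a hedged sketch, using optimizer_index_cost_adj as the example since it is mentioned above:

```sql
-- Check whether the parameter is currently non-default.
SELECT name, value, isdefault
FROM   v$parameter
WHERE  name = 'optimizer_index_cost_adj';

-- Remove the setting from the spfile so the default applies after
-- the next restart (on RAC you may need SID='*'; verify the exact
-- behavior on your version before relying on it).
ALTER SYSTEM RESET optimizer_index_cost_adj SCOPE=SPFILE;
```

Resetting one extreme global parameter at a time, with a test cycle between changes, makes it much easier to attribute any plan regressions.]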

As an aside, did you run your systems with the default parameters, discover notable problems, and then use the two sets of system stats to correct them, or did you put them in from the start and everything was good?

There's a case to be made for using system stats, but I just don't think they are something that should be used frequently.



*From:*Chris Taylor <christopherdtaylor1994@xxxxxxxxx>
*Sent:* 26 March 2019 12:59
*To:* Neil Chandler
*Cc:* gogala.mladen@xxxxxxxxx; ORACLE-L
*Subject:* Re: System stats

As far as the workload goes, I used two sets of workload stats and swapped between them - one for business hours and one for off-hours, since each period had its own personality (for lack of a better word).
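[Editor's note: the swapping described above can be sketched with DBMS_STATS; this is a hedged outline, and the stat-table name, statids, owner, and capture windows are illustrative assumptions, not details from the original post:

```sql
BEGIN
  -- One-time setup: a statistics table to hold the saved sets.
  DBMS_STATS.CREATE_STAT_TABLE(ownname => 'DBA_USER', stattab => 'SYSSTATS');

  -- Capture a representative business-hours workload (interval in
  -- minutes), saved under statid 'DAY'; run again during the off-hours
  -- window with statid 'NIGHT'.
  DBMS_STATS.GATHER_SYSTEM_STATS(gathering_mode => 'INTERVAL',
                                 interval       => 120,
                                 stattab        => 'SYSSTATS',
                                 statid         => 'DAY',
                                 statown        => 'DBA_USER');
END;
/

BEGIN
  -- Swap the active set, e.g. from a scheduler job at the shift boundary.
  DBMS_STATS.IMPORT_SYSTEM_STATS(stattab => 'SYSSTATS',
                                 statid  => 'NIGHT',
                                 statown => 'DBA_USER');
END;
/
```

Importing system stats invalidates dependent cursors, so a hard-parse spike at each swap is the price of the two-personality approach.]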

As far as the SAN goes, if enough systems are hitting the SAN that the IO rate/throughput becomes affected, then it's *probably* time for a new SAN.


Mladen Gogala
Database Consultant
Tel: (347) 321-1217
