Re: Database upgrade/move I/O waits performance

  • From: Tim Gorman <tim.evdbt@xxxxxxxxx>
  • To: lyallbarbour@xxxxxxxxxxxxxxx
  • Date: Wed, 24 Jul 2019 08:09:37 -0700

Lyall,

What type of vSCSI I/O controller(s) were configured for the VM? Choices typically include (in ascending order of preference): BusLogic, LSI Logic Parallel, LSI Logic SAS, Paravirtual.  Hopefully it is using paravirtual and hopefully more than one vSCSI controller (i.e. four) has been created within the VM?

Allocating one VMDK for each VMFS datastore, and only one VMFS datastore for each storage LUN, presents the least queuing for high I/O workloads at that level.  Distributing VMDKs as evenly as possible across multiple vSCSI controllers is also a good practice to minimize queuing for high I/O workloads.

Hope this helps?

Thanks!

-Tim


On 7/24/19 07:36, Lyall Barbour wrote:

Thanks everyone.
Couple of pieces of new information, if its useful to people.
1) FILESYSTEMIO_OPTIONS is still at default "none". That's being changed tonight during some downtime to SETALL.  I was a bit blown away that that wasn't set by the consulting company doing this upgrade, it was asked for and was set in at least one test system 12.1 databases.
2) on a call with the Unix, Storage and VM teams at my company, the amount of Data Stores (??) was brought up compared to best practises.  this system was only set up with one and graphs show a bottleneck for I/O at that point in the process of getting data from Pure Storage to the Oracle database.
Lyall
*Sent:* Wednesday, July 24, 2019 at 7:47 AM
*From:* "Stefan Koehler" <contact@xxxxxxxx>
*To:* lyallbarbour@xxxxxxxxxxxxxxx, oracle-l <oracle-l@xxxxxxxxxxxxx>
*Subject:* Re: Database upgrade/move I/O waits performance
Hello Lyall,

> Explain Plan, Cost, disk reads, buffer gets, all line up pretty well in 11g and 12c.  The 2 things that are different is the actual time, slowest in 11g is 2.5 seconds, 12c is 14 seconds.  and percentage of work, 11g disk work is 85% and cpu work is 15%. 12c disk work is 98%, cpu 2%. Do i talk to the SAN folks and see what their graphs are like? VM team?  keep digging into AWR stats and comparisons on the database?

OK, based on on this background information you are doing the same amount of I/O work (disk reads & buffer gets) but runtime is slower with 12c and the driver of this slow-down is the I/O part of the queries (2.125 seconds on I/O with 11g on hardware but 13.72 seconds on I/O with 12c and on VMware).

How to proceed? At first have a look what kind of I/O (single block vs. multi-block) you are doing and if the latter check the I/O request size and then compare the I/O latency with help of latency histogram in AWR report.

If you find the issue with latency go to OS level and check the disk/device queues, etc.. In my experience you need solid proof analysis data before going to storage admins as there are so many layers in-between the database and the storage and there is a pretty good chance that you both see different latency values.

P.S.: Shameless ad - you might want to consider my consulting services to help you with this issue ... or as a teaser for my services - send me a SQLd360 report of the exemplary query and an eDB360 report for the database (both tools can be found here: https://github.com/sqldb360/sqldb360/archive/v19.2.zip) and I gonna have quick look for free :)

Best Regards
Stefan Koehler

Independent Oracle performance consultant and researcher
Website: http://www.soocs.de
Twitter: @OracleSK

> Lyall Barbour <lyallbarbour@xxxxxxxxxxxxxxx> hat am 23. Juli 2019 um 18:39 geschrieben:
>
> Hello gurus,
>   We've upgraded and moved the company's CRM OLTP Oracle database.
>
> Old: physical hardware, 512g RAM, 32 CPU -- Oracle 11.2.0.4 Enterprise, 16g MEMORY_TARGET, SPM baselines used, one or two Profiles.
> New: VMWare VM, 224g RAM, 24 vCPU -- Oracle 12.1.0.2 Enterprise, 72g SGA_TARGET, HugePages, 12g PGA_TARGET, no SPM baselines, multiple Profiles.
>
> Having said all that, there's a consistency to the slowness that's very interesting.  We are constantly having I/O waits on all (?) queries, if not all, the main top sqls that are running in v$session_longops.
>
> Comparing one query.  Explain Plan, Cost, disk reads, buffer gets, all line up pretty well in 11g and 12c.  The 2 things that are different is the actual time, slowest in 11g is 2.5 seconds, 12c is 14 seconds. and percentage of work, 11g disk work is 85% and cpu work is 15%.  12c disk work is 98%, cpu 2%.
>
> I'm not trying to get any solutions from everyone here, i'm trying to get direction on where to look next so i know who to talk to at my company with this issue. Do i talk to the SAN folks and see what their graphs are like?  VM team?  keep digging into AWR stats and comparisons on the database?
>
> Comparing AWR hour periods from today (oracle 12.1) to last week, tuesday, (oracle 11.2) the amount of buffer gets and disk reads all line up with the issue i described above.  420T of buffer cache reads last week, 10T today, in that hour period.  ALL work is waiting on disk.
>
> Any help... helps.
>
> Thanks,
> Lyall Barbour
-- //www.freelists.org/webpage/oracle-l

Other related posts: