When consolidating systems, it is obvious you have to look at Oracle’s usage of
the available resources, and that has been covered extensively here. The one
piece that many forget to consider is the impact consolidation has on the OS,
and the utilization that cannot be seen from an AWR (or similar) report.
When you consolidate, the combined physical resource utilization will likely
remain fairly constant - except for memory. This means you will be doing the
same amount of network IO (or very close to it) and likely see a small increase
in physical disk IO if you cannot increase the memory footprint accordingly.
Each of these requires HW interrupts to work. HW interrupts - whether traditional,
MSI-X or SR-IOV - will be directed to a subset of the CPUs. The interrupts for
device X must all go to a fixed set of 1 or more CPUs. For instance, a network
card with 4 MSI-X interrupts will always require the same 4 CPUs. They can be
moved, but all interrupts that were directed to CPU N will move to CPU M if a
move occurs. From what I can tell - not being able to see the hypervisor in
Azure - it looks like they use SR-IOV with Mellanox/Nvidia ConnectX cards, which
will behave essentially the same way as MSI-X. On-prem deployments that I have
seen rarely leverage SR-IOV (sad).
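On Linux, the per-CPU interrupt counts described above are visible in /proc/interrupts. Here is a minimal Python sketch for summing them per CPU for one device. The sample text, the device names (eth0, nvme0q0) and the function name are made up for illustration; on a real host you would read the file itself.

```python
def interrupts_by_cpu(text, device_substring):
    """Sum interrupt counts per CPU for IRQ lines mentioning the device."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    totals = {cpu: 0 for cpu in cpus}
    for line in lines[1:]:
        if device_substring not in line:
            continue
        fields = line.split()
        counts = fields[1:1 + len(cpus)]
        # Skip short rows (for example ERR/MIS) that lack per-CPU columns.
        if len(counts) < len(cpus) or not all(c.isdigit() for c in counts):
            continue
        for cpu, count in zip(cpus, counts):
            totals[cpu] += int(count)
    return totals


# Hypothetical sample in the /proc/interrupts layout (not real measurements).
sample = """
           CPU0       CPU1       CPU2       CPU3
 24:       9000          0          0          0   PCI-MSI 524288-edge      eth0-rx-0
 25:          0       8500          0          0   PCI-MSI 524289-edge      eth0-rx-1
 26:         10         20         30         40   PCI-MSI 1048576-edge     nvme0q0
"""

nic_totals = interrupts_by_cpu(sample, "eth0")
for cpu, count in nic_totals.items():
    print(cpu, count)
```

In this made-up sample, all of the network card's interrupts land on CPU0 and CPU1, while CPU2 and CPU3 see none.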
That being said, if you consolidate three 8 CPU systems onto a 24 (v)CPU
system, you will still have the same or more interrupts directed at the same
number of CPUs as the 8 CPU system - unless someone takes this into account and
alters the number of interrupts for MSI-X or VFPs. So if each of the 8 CPU
systems was driving even one of the interrupt vectored CPUs to 33% utilization,
the consolidated system very well may have a single CPU that is at 99%
utilization and becomes the choke point for the associated IO. A system-wide
look at utilization, like from an AWR, might show the system 25% utilized
overall when in fact you have a CPU resource problem.
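To put rough numbers on that, here is a small Python sketch of the arithmetic. It assumes all interrupt load from the three source systems lands on one CPU of the 24 CPU target and the remaining work spreads evenly over the other 23. The 24% figure for the non-interrupt CPUs is an assumption picked purely for illustration, and the function name is hypothetical.

```python
def consolidate(sources, target_cpus):
    """Model per-CPU utilization (%) on the consolidated host.

    sources: one (irq_cpu_pct, other_cpu_pcts) tuple per source system.
    Assumption: all interrupt load lands on a single target CPU and the
    remaining work is spread evenly across the other target CPUs.
    """
    irq_load = sum(irq for irq, _ in sources)
    other_load = sum(sum(others) for _, others in sources)
    spread = other_load / (target_cpus - 1)
    return [min(irq_load, 100.0)] + [spread] * (target_cpus - 1)


# Three 8 CPU systems: one interrupt CPU at 33%, the other seven at an
# assumed 24% each (illustrative value only).
sources = [(33.0, [24.0] * 7)] * 3
per_cpu = consolidate(sources, 24)

hottest = max(per_cpu)
average = sum(per_cpu) / len(per_cpu)
print(f"hottest CPU: {hottest:.1f}%  system-wide average: {average:.1f}%")
```

Under these assumptions the system-wide average stays near 25% while one CPU sits at 99%.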
You have to look at per-CPU utilization and understand what is causing it if
you don’t see an even distribution across all CPUs.
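One way to do that check on Linux is to diff two snapshots of /proc/stat. The Python sketch below works on the text of two snapshots; the sample snapshots are invented, and the "twice the mean" threshold for flagging a hot CPU is an arbitrary assumption, not a rule.

```python
def parse_proc_stat(text):
    """Map each cpuN line to (busy_jiffies, total_jiffies)."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep per-CPU lines (cpu0, cpu1, ...); skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # Columns: user nice system idle iowait irq softirq steal
        values = [int(v) for v in fields[1:9]]
        total = sum(values)
        idle = values[3] + values[4]  # idle + iowait
        stats[fields[0]] = (total - idle, total)
    return stats


def per_cpu_busy_pct(before_text, after_text):
    """Busy percentage per CPU between two /proc/stat snapshots."""
    before = parse_proc_stat(before_text)
    after = parse_proc_stat(after_text)
    pct = {}
    for cpu, (busy1, total1) in after.items():
        busy0, total0 = before.get(cpu, (0, 0))
        elapsed = total1 - total0
        pct[cpu] = 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0
    return pct


def hot_cpus(pct, factor=2.0):
    """CPUs at or above `factor` times the mean busy % (assumed threshold)."""
    mean = sum(pct.values()) / len(pct)
    return sorted(c for c, p in pct.items() if mean > 0 and p >= factor * mean)


# Invented snapshots in /proc/stat layout (not real measurements).
before = """
cpu  400 0 400 3200 0 0 0 0
cpu0 100 0 100 800 0 0 0 0
cpu1 100 0 100 800 0 0 0 0
cpu2 100 0 100 800 0 0 0 0
cpu3 100 0 100 800 0 0 0 0
"""
after = """
cpu  480 0 470 3450 0 0 0 0
cpu0 150 0 140 810 0 0 0 0
cpu1 110 0 110 880 0 0 0 0
cpu2 110 0 110 880 0 0 0 0
cpu3 110 0 110 880 0 0 0 0
"""

busy = per_cpu_busy_pct(before, after)
print(busy, hot_cpus(busy))
```

On a real host you would read /proc/stat twice a few seconds apart and pass both reads in. A flagged CPU is only a starting point; you still have to work out what is driving it.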
Just something to consider.
Thanks,
Jarod
On Feb 20, 2021, at 8:39 AM, Mark Powell <markp28665@xxxxxxxxx> wrote:
Kellyn, nice of you to post.
On Wed, Feb 17, 2021 at 5:16 PM Kellyn Pot'Vin-Gorman <dbakevlar@xxxxxxxxx> wrote:
If you'd like to steal the worksheet, it would be easy to remove the Azure
calculations and just stop at the AWR workload sizing... :)
Estimate Tool for Sizing Oracle Workloads to Azure IaaS VMs - Microsoft Tech
Community
<https://techcommunity.microsoft.com/t5/data-architecture-blog/estimate-tool-for-sizing-oracle-workloads-to-azure-iaas-vms/ba-p/1427183>
Kellyn Pot'Vin-Gorman
DBAKevlar Blog <http://dbakevlar.com/>
about.me/dbakevlar <http://about.me/dbakevlar>
On Wed, Feb 17, 2021 at 1:31 PM Cary Millsap <cary.millsap@xxxxxxxxxxxx> wrote:
Thank you, Kellyn!
Cary
On Wed, Feb 17, 2021 at 3:20 PM Kellyn Pot'Vin-Gorman <dbakevlar@xxxxxxxxx> wrote:
Hey Cary,
I know I do this for Azure and we're about to embark on writing this up as
part of a book for Apress, but, at a high level, there's a process and here's
the overall reasoning behind why we do what we do:
1. You realize that few are as good at sizing out machines for database
workloads as they think they are.
2. Database workload resource needs change over time.
3. Existing hardware is purchased to serve the datacenter for an extended
period of time.
Due to this, the process for a datacenter move and license review (we suffer
from Oracle's policy of a 2:1 penalty on CPU:vCPU with hyperthreading) is:
For each database environment, we collect a single AWR report for a one week
window.
1. The data points for CPU, memory, IOPS, %busy CPU, CPU/Core, SGA/PGA are
all documented for each production database. Don't do this for non-prod.
We take the averages and any missing data for Exadata into consideration
using a "fudge factor" table and translate it to what the workload would
require to run NOW.
2. This is listed out in a spreadsheet, which can easily compare what is
required vs. what is allocated in the hardware.
3. Values are assigned for a percentage of resources from the production
workload that they wish to grant to stage, test, dev, etc.
4. Consolidate workloads for databases that make sense.
5. Consider the amount of resource usage that can be saved if a more modern
backup utility than RMAN can be used.
Most customers discover they aren't using the cores that they thought they
were and can save money on Oracle licensing.
Hope this helps and I do have a worksheet for Azure that could be easily
updated to just do straight CPU instead of Azure calculations if interested.
It's what we use here to do the work.
Kellyn Pot'Vin-Gorman
DBAKevlar Blog <http://dbakevlar.com/>
about.me/dbakevlar <http://about.me/dbakevlar>
On Wed, Feb 17, 2021 at 12:45 PM Cary Millsap <cary.millsap@xxxxxxxxxxxx> wrote:
Hi everybody, from freezing cold Texas.
I have a friend who's embarking on a big project to reduce the number of
servers and licenses his company has to pay for and maintain (presumably
using VMs and PDBs and all that). Do you know of any good sources for
studying up on how to do a good job on a project like this?
Thank you,
Cary Millsap
Method R Corporation
Author of Optimizing Oracle Performance <http://amzn.to/OM0q75>
and The Method R Guide to Mastering Oracle Trace Data, 3rd edition
<https://amzn.to/2IhhCG6+-+Millsap+2019.+Mastering+Oracle+Trace+Data+3ed>