RE: Oracle Event Monitoring and VMWare

  • From: Michael Schmitt <mschmitt@xxxxxxxxxxxx>
  • To: ORACLE-L <oracle-l@xxxxxxxxxxxxx>
  • Date: Wed, 18 Sep 2013 14:56:28 +0000

We see something similar so I will be interested in what comes of this thread.

I am not sure if I am completely correct on the below explanation, but this is 
my understanding from our ESX admin.

The problem comes from allocating multiple virtual CPUs to each VM and from 
allocating more virtual CPUs in total than there are physical cores.  The way 
VMware works is that if you allocate 4 virtual CPUs to a VM, that VM has to 
grab all 4 cores every time it wants to do work.  If it cannot find all 4 
cores available on the physical machine (it has to get access to all 4, not a 
subset), it will run into some form of wait.  The chances of getting multiple 
cores at the same time are reduced when you have over-allocated virtual CPUs 
relative to physical CPUs.
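
To put rough numbers on that explanation, here is a toy model in Python. It 
is only a sketch of the "must get all cores at once" behaviour as I described 
it above, not the actual ESX scheduler. It assumes each physical core is busy 
independently with the same probability, and the function names are made up 
for illustration.

```python
# Toy model (assumption): each physical core is busy independently with
# probability p_busy. This is NOT how the real hypervisor scheduler works;
# it only illustrates why needing every core at once is harder to satisfy.

def p_all_free(p_busy: float, needed: int) -> float:
    """Chance that `needed` specific cores are all idle at the same instant."""
    return (1.0 - p_busy) ** needed


def p_any_free(p_busy: float, cores: int) -> float:
    """Chance that at least one of `cores` cores is idle."""
    return 1.0 - p_busy ** cores


CORES = 4  # physical cores on the host

for p_busy in (0.25, 0.50, 0.75):
    wide = p_all_free(p_busy, CORES)    # 4-vCPU VM needs all 4 cores
    narrow = p_any_free(p_busy, CORES)  # 1-vCPU VM needs any 1 core
    print(f"busy={p_busy:.2f}  4-vCPU VM can run: {wide:.4f}  "
          f"1-vCPU VM can run: {narrow:.4f}")
```

Under these assumptions, the wider the VM, the faster its chance of being 
scheduled drops as the host gets busier, while a 1-vCPU VM can almost always 
find a free core.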

It seems the majority of your problem is a result of not having the support you 
need.  A good ESX admin should be able to work with you.  When I saw the Blue 
Medora Plugin that Kyle mentioned, I thought it would be great for us.  
Unfortunately, our ESX team just got a different monitor and does not want to 
test it out at the moment.

 



-----Original Message-----
From: oracle-l-bounce@xxxxxxxxxxxxx [mailto:oracle-l-bounce@xxxxxxxxxxxxx] On 
Behalf Of Dba DBA
Sent: Tuesday, September 17, 2013 2:57 PM
To: ORACLE-L
Subject: Oracle Event Monitoring and VMWare

Some of the DBAs on here crossover between being DBAs and SAs. This is a 
question more for these guys. I am rather disturbed by what I wrote below.
Oracle 11.2.0.3
VMware ESXi

4 CPU Server
7 VMs on the server
3 DBs (use internal disks for storage)
4 Application Server VMs use SAN for storage (low-end SAN with a low-end pipe, 
plus we ran out of space on it)

Issue:
1. DB performance issue
2. Checked the events in all 3 DBs. A variety of events, all related to disk, 
on all DBs (mainly used ASH, but ran some 10046 traces). It was pretty clear.
3. No excessive work is going on.
4. Figured this is some kind of disk issue since the DBs run separately on 
internal disks and the other VMs would not be impacting this.


Answer: Not even close. CPU issue on the Host (no CPU issue on the boxes)
13 Virtual Cores assigned to 7 VMs
4 Cores on the server
Race condition (SA doesn't know this term, but it's my interpretation of his 
explanation) and a number of context switches. Cores just have to grab 
different threads coming from the VMs. My guess is that this reduces the 
benefit of CPU caching, since the cache for a specific VM keeps getting pushed 
out. (Best guess. Not a hardware guy.)

CPU Issue: I may see disk at the VM level, but this is completely irrelevant 
to what is going on at the host.

Solution: Each VM gets allocated 1 virtual core
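
The arithmetic behind the numbers above, as a small Python sketch. The 
function name is made up for illustration, and the "after" figure assumes all 
7 VMs really do end up with exactly 1 virtual core each.

```python
def oversubscription_ratio(total_vcpus: int, physical_cores: int) -> float:
    """Virtual cores assigned per physical core on the host."""
    return total_vcpus / physical_cores


PHYSICAL_CORES = 4
VMS = 7

before = oversubscription_ratio(13, PHYSICAL_CORES)      # 13 virtual cores today
after = oversubscription_ratio(VMS * 1, PHYSICAL_CORES)  # assumed: 1 per VM

print(f"before: {before:.2f} virtual cores per physical core")
print(f"after:  {after:.2f} virtual cores per physical core")
```

Even with 1 virtual core per VM the host is still oversubscribed (7 on 4), 
but no single VM is asking for several cores at once any more.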

Per my SA: Giving extra Virtual Cores to a VM does not impact performance.
It does not matter if the application can handle multiple cores better, those 
cores are just 'threads' at the OS level. So it doesn't matter.

This raises a few disturbing questions...

1. Does the number of virtual cores a VM is allocated have ANY meaning 
whatsoever? I had assumed that this represented a % of the CPU allocated to 
each VM.

2. In a virtual environment how do we interpret Oracle events? They appear to 
be meaningless at the host tier. I know that excessive LIOs or locking issues 
and the like can be meaningful, but in general I don't really have a lot of 
data to provide an SA to help diagnose the problem.

3. How do I work with an SA if I only have access to the database inside the 
VM? No unix access in operations. The operations team gives very minimal 
support for any performance issues. I do not even have direct contact with 
them. If I see events pointing to 'serial disk reads', that information 
appears to be meaningless.

4. Per my SA, VMware does not automatically store historical performance data 
and he needs to look at it in real time. I have to open tickets to reach 
operations and days go by, so the issues can clear up before anyone looks. 
Operations will use Hyper-V. Are there things that Hyper-V will automatically 
store that I can ask them to look at to compare? I want to increase my 
knowledge about this so that, at a minimum, I can communicate better with the 
operations SAs and we don't talk past each other.


--
//www.freelists.org/webpage/oracle-l


--
//www.freelists.org/webpage/oracle-l


Other related posts: