Hi Kyle,
If those two numbers differ, one of them must be wrong. In other
words, either the hypervisor or the guest OS has a bug. That much
we know. Now, the rest is guessing. The guest machine, assuming it's a
Linux box, uses tools like top, sar or nmon to report CPU usage. All
of those tools are rather old and well tested, so the probability that
the bug is in the guest OS is rather small. That leaves the possibility
of the hypervisor incorrectly reporting the load. AWS uses a home-brewed
hypervisor called Nitro, introduced with the C5 instances and based on
KVM. There were some issues with Intel Skylake CPUs. If I am allowed to
guess, the hypervisor is at fault here. Of course, to reiterate, this is
only my assumption.
There is also a well-known quirk of Linux performance reporting tools:
they count CPU time spent stalled on memory access as "busy". There are
two things I would suggest:
* Try all possible Linux tools: sar, top, nmon and dstat and see
whether they all report the same thing. If they do not report the
same thing, see if any of them is in agreement with the hypervisor
report.
* If none of them does, debug the hypervisor.
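For the first suggestion, it may help to look at the raw counters that
top, sar and the others read on Linux, namely the aggregate "cpu" line in
/proc/stat. Below is a minimal Python sketch of that comparison, not a
finished tool. The function names are my own, and it assumes a kernel
recent enough to expose the "steal" column. Steal is the time the guest
wanted to run but the hypervisor was running something else, so it is
the guest-side number most worth lining up against the hypervisor report.

```python
import time

# Column order of the aggregate "cpu" line in /proc/stat (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line into a dict of jiffy counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Older kernels expose fewer columns; zip() simply stops early.
    return dict(zip(FIELDS, (int(v) for v in parts[1:])))


def cpu_percentages(before, after):
    """Percentage of elapsed CPU time per state between two samples."""
    # guest and guest_nice are already counted inside user and nice,
    # so only the first eight columns are summed.
    keys = FIELDS[:8]
    deltas = {k: after.get(k, 0) - before.get(k, 0) for k in keys}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    return {k: 100.0 * d / total for k, d in deltas.items()}


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate line."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    first = parse_cpu_line(read_cpu_line())
    time.sleep(5)  # sampling interval in seconds, chosen arbitrarily
    second = parse_cpu_line(read_cpu_line())
    for state, pct in cpu_percentages(first, second).items():
        print(f"{state:8s} {pct:6.2f}%")
```

Run it inside the guest while the workload is active. If the sum of
everything except idle and iowait is still well below what the hypervisor
reports, the tools are at least faithful to the kernel counters, and the
discrepancy lies elsewhere.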
Regards
On 7/1/19 12:14 PM, kyle Hailey wrote:
Anyone know what it means when the hypervisor is reporting significantly more CPU for a virtual machine than the actual virtual machine thinks it's consuming?
For the other case, where the virtual machine OS reports higher CPU than the hypervisor, I always figured that it was because the virtual machine wasn't actually getting the CPU it thought it was, and this could be seen with % CPU ready.
For the other way around, I'm wondering what is going on.
Thanks
Kyle