Re: [PATCH] Implement timekeeping for rumprun/hw (x86)

  • From: Martin Lucina <martin@xxxxxxxxxx>
  • To: rumpkernel-users@xxxxxxxxxxxxx
  • Date: Fri, 3 Jul 2015 17:57:44 +0200

On Wednesday, 01.07.2015 at 14:24, Antti Kantee wrote:

I don't know what precisely to do, but it would be good to make sure
the user sees the error, yet does not have to go edit code if they
want to run regardless. Maybe we need some sort of "make error a
warning" flag to rumprun?

Nothing to do with rumprun, I think. The platform bootstrap can just print a
suitable warning if the hardware does not have an invariant TSC.
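
As a rough illustration of such a bootstrap check: invariant TSC is reported in CPUID leaf 80000007H, EDX bit 8. The sketch below takes the raw register values as arguments so the bit test can be exercised off-target. The function names are hypothetical and not taken from the patch, and the policy (warn, then continue booting) is only the one suggested above.

```c
#include <stdint.h>
#include <stdio.h>

#define CPUID_LEAF_APM      0x80000007u  /* Advanced Power Management leaf */
#define CPUID_APM_EDX_ITSC  (1u << 8)    /* EDX bit 8: invariant TSC */

/*
 * max_ext_leaf: EAX returned by CPUID with EAX=0x80000000.
 * apm_edx:      EDX returned by CPUID with EAX=0x80000007.
 * Returns 1 if the CPU advertises an invariant TSC, 0 otherwise.
 */
static int has_invariant_tsc(uint32_t max_ext_leaf, uint32_t apm_edx)
{
    if (max_ext_leaf < CPUID_LEAF_APM)
        return 0;   /* leaf not implemented, so no guarantee */
    return (apm_edx & CPUID_APM_EDX_ITSC) != 0;
}

/*
 * Hypothetical bootstrap hook: warn loudly, but do not refuse to boot.
 * Returns 1 if a warning was printed, 0 otherwise.
 */
static int warn_if_no_invariant_tsc(uint32_t max_ext_leaf, uint32_t apm_edx)
{
    if (has_invariant_tsc(max_ext_leaf, apm_edx))
        return 0;
    printf("WARNING: TSC is not invariant, timekeeping may be unreliable\n");
    return 1;
}
```

On real hardware the two arguments would come from executing CPUID during early boot. The user sees the message but does not have to edit any code to run regardless.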

However, I am concerned about the case where the host has SMP. Is
tsc always sufficiently virtualized?

To be honest, I don't know. See the references at the end of this email.
The answer is "probably, yes" but we'll have to wait and see what happens
in reality when people start using it.

In the interest of getting this tested by users ASAP, and given that I've
addressed all the major points, I'm going to merge this now and continue
working on it in-tree. I'm doing that with the full history of changes
since I started working on it, since I'd like to keep that as a record of
the thought process that went into it.

Well it's definitely not easier for *me* to reason about it if *you*
write it in assembly ;)

Perhaps offer a C fallback there? It also serves to document what
is actually going on.

-> TODO list :-)

I don't know what exactly since I didn't think about it carefully.
Just randomly sprinkling cli/sti is usually the wrong thing, but
cli/sti would be the mechanism of critical sectionizing, yes.

Done in what I think are the right places.

Why do you need the /100? Can't you just run the clock at TIMER_HZ
for calibration? Logically thinking, you'd get an almost HZ times
more accurate result that way too and could easily drop the delay
(but FIIK what really happens).

FIIKT :-) I've removed the references to HZ and replaced with / 100 for
now. Incidentally, I also tried builds with the rump kernel HZ set to 50
and nothing bad happened. I'll do some more testing and we can lower it
if all is well.
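
For anyone following along, the arithmetic behind a "/ 100" calibration is roughly as below. This is a minimal sketch, not the code from the patch. It assumes TIMER_HZ is the usual i8254 input clock of 1193182 Hz and that the TSC is read immediately before and after the PIT counts down one 1/100 s interval. The helper names are made up for illustration.

```c
#include <stdint.h>

#define TIMER_HZ         1193182u  /* i8254 input clock in Hz (assumed value) */
#define CALIBRATION_DIV  100u      /* calibrate over 1/100 of a second */

/* PIT reload value that makes the counter expire after about 1/100 s. */
static uint16_t pit_calibration_count(void)
{
    return (uint16_t)(TIMER_HZ / CALIBRATION_DIV);
}

/*
 * Estimate the TSC frequency in Hz from two TSC readings taken one
 * calibration interval apart: ticks per 1/100 s, scaled up to one second.
 */
static uint64_t tsc_hz_from_delta(uint64_t tsc_start, uint64_t tsc_end)
{
    return (tsc_end - tsc_start) * CALIBRATION_DIV;
}
```

Note that the integer division truncates 11931.82 to 11931, so the measured interval is very slightly shorter than 10 ms. Any error in the interval is multiplied by 100 in the result.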

bmk_cpu_block() is wrong. Just because a timer interrupt fired
doesn't mean another interrupt didn't. Seems rather painful doing
tickless with i8254...

A correct but wasteful solution would be to just always return back into
schedule() after the hlt(). It'll be inefficient for long sleeps, but will
work fine. Any better ideas much appreciated!

Do we need a really good solution there? I assume that KVM-clock
will solve also this for the virtualization case where it matters
most. I can't imagine that going to the scheduler is *that* many
more cycles since you wake up already anyway.

Unfortunately KVM clock only provides a virtualized TSC for timekeeping. It
does *not* provide (or I couldn't find it) an equivalent of the Xen "block
domain" hypercall. Anyhow, I've taken this approach for now and we'll see
what can be improved. I can ask on the KVM list for recommendations.
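
To make the "always go back to the scheduler after hlt" approach concrete, here is a small simulation of the control flow. It is only a sketch under stated assumptions: the clock and the halt are faked so it runs anywhere, and every name in it is a hypothetical stand-in rather than the real bmk_cpu_block() or schedule().

```c
#include <stdint.h>

/* Simulated platform state; all names here are hypothetical stand-ins. */
static uint64_t fake_now_ns;        /* stands in for the monotonic clock */
static uint64_t fake_wake_step_ns;  /* how long each simulated hlt sleeps */

static uint64_t clock_monotonic(void)
{
    return fake_now_ns;
}

/* Stand-in for "sti; hlt": returns when *any* interrupt fires. */
static void cpu_halt_once(void)
{
    fake_now_ns += fake_wake_step_ns;
}

/*
 * Block at most once, then hand control back to the caller. No attempt
 * is made to decide here whether the wakeup was "our" timer interrupt.
 */
static void cpu_block(uint64_t until_ns)
{
    if (clock_monotonic() >= until_ns)
        return;
    /* Real code would program the i8254 for the remaining time here. */
    cpu_halt_once();
}

/*
 * Scheduler-side loop: re-check the deadline after every wakeup.
 * Returns the number of wakeups it took to reach the deadline.
 */
static unsigned sleep_until(uint64_t until_ns)
{
    unsigned wakeups = 0;

    while (clock_monotonic() < until_ns) {
        cpu_block(until_ns);
        wakeups++;
    }
    return wakeups;
}
```

The cost is one trip through the scheduler per wakeup, which is what makes it wasteful for long sleeps. In exchange, an unrelated interrupt can never be mistaken for the timer.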

p.s. thanks for the braindump and urls, they're a useful resource

Here are some more useful resources that I found along the way:

Intel Manuals:

APIC TSC-Deadline Mode: Intel SDM vol. 3A page 10-17 section

Invariant TSC: Intel SDM vol. 3B section 17.14, page 17-38, also
specifically 17.14.4 page 17-40 "Invariant Time-Keeping" which talks about
getting the TSC frequency from CPUID.

Intel SDM Vol. 2A page 3-185: CPUID leaf 15H


TSC and VMX: Intel SDM Vol. 3C page 25-8: VMX non-root operation

Intel paper on "Processor Identification and the CPUID instruction" which contains code examples of
measuring CPU frequency (also using a fixed, long, delay).
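
For comparison with the fixed-delay measurement, the CPUID leaf 15H route boils down to one multiplication and one division. A minimal sketch, assuming the register layout described in current revisions of the SDM (EAX = denominator, EBX = numerator of the TSC/crystal ratio, ECX = crystal frequency in Hz, with zero meaning "not enumerated"). The function name is made up for illustration.

```c
#include <stdint.h>

/*
 * Compute the TSC frequency in Hz from the CPUID.15H register values.
 * Returns 0 if the CPU does not enumerate enough information, in which
 * case the caller has to fall back to measuring against another timer.
 */
static uint64_t tsc_hz_from_leaf15(uint32_t eax, uint32_t ebx, uint32_t ecx)
{
    if (eax == 0 || ebx == 0 || ecx == 0)
        return 0;
    /* TSC Hz = crystal Hz * numerator / denominator */
    return (uint64_t)ecx * ebx / eax;
}
```

For example, a 24 MHz crystal with a ratio of 216/2 gives 2592000000 Hz. Many CPUs leave one or more of these fields as zero, so a measured fallback is still needed.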

This VMware paper on timekeeping in virtual machines is also a good read
and talks about timekeeping implementations in several OSes:

