Fixing timekeeping for rumprun/hw

  • From: Martin Lucina <martin@xxxxxxxxxx>
  • To: rumpkernel-users@xxxxxxxxxxxxx
  • Date: Fri, 19 Jun 2015 18:42:01 +0200

Hi,

I'm trying to fix the timekeeping for rumprun/hw (issue #30). I've looked
at the various options and have a plausible approach with minimal changes
to the current code, except that I can't quite get to a workable solution
using scaled integer arithmetic.

My changes are simple, so I'm not including the full patch; most of it is
just reorganisation, moving the timer and clock related code into its own
source file.

The interesting parts of the code are:

In bmk_x86_initclocks(), I compute the TSC frequency by calibrating it
against a fixed 100000 us delay using the PIT:

initial_tsc = rdtsc();
i8254_delay(100000);                            /* busy-wait 100 ms using the PIT */
tsc_freq = (rdtsc() - initial_tsc) * 10;        /* TSC ticks per 100 ms -> ticks per second */

(i8254_delay is lifted from NetBSD, but simplified)
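
Roughly, the simplified version looks something like this (just a sketch,
not the actual patch; it assumes channel 0 is programmed for lobyte/hibyte
access and wraps over the full 16-bit range, and uses inline asm for the
port I/O):

#include <stdint.h>

#define TIMER_FREQ   1193182ULL /* PIT input clock, in Hz */
#define TIMER_CNTR0  0x40       /* channel 0 data port */
#define TIMER_MODE   0x43       /* mode/command port */

static inline void outb(uint16_t port, uint8_t val)
{
	__asm__ __volatile__("outb %0, %1" : : "a"(val), "Nd"(port));
}

static inline uint8_t inb(uint16_t port)
{
	uint8_t val;

	__asm__ __volatile__("inb %1, %0" : "=a"(val) : "Nd"(port));
	return val;
}

/* Latch and read the current channel 0 count (it counts down). */
static unsigned int i8254_read(void)
{
	unsigned int lo, hi;

	outb(TIMER_MODE, 0x00);		/* latch channel 0 count */
	lo = inb(TIMER_CNTR0);
	hi = inb(TIMER_CNTR0);
	return (hi << 8) | lo;
}

void i8254_delay(unsigned int us)
{
	/* microseconds -> PIT ticks (1193182 ticks per second) */
	uint64_t ticks = (uint64_t)us * TIMER_FREQ / 1000000;
	unsigned int last = i8254_read(), cur, delta;

	while (ticks > 0) {
		cur = i8254_read();
		/* the counter counts down; handle wrap at 0x10000 */
		delta = (last >= cur) ? last - cur : last + 0x10000 - cur;
		ticks = (delta >= ticks) ? 0 : ticks - delta;
		last = cur;
	}
}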

So far so good. Then, I use scaled integer arithmetic with a scale factor
of 32 bits to compute a multiplier that can be used to convert TSC counts
to nanoseconds:

#define NSEC_PER_SEC 1000000000ULL
tsc_mult = (NSEC_PER_SEC << 32) / tsc_freq;

bmk_cpu_clock_now() thus becomes:

uint64_t nsec = (rdtsc() * tsc_mult) >> 32;
return nsec;

The problem is that the (rdtsc() * tsc_mult) calculation easily overflows a
uint64_t. For example, on my laptop tsc_freq is 2594848270 and tsc_mult is
1655190149, so the calculation would overflow within about 4.29 seconds of
bootup (the product exceeds 2^64 once rdtsc() passes 2^64 / tsc_mult ticks,
which works out to roughly 2^32 / 10^9 seconds regardless of the TSC
frequency).

So, the question is, what to do about it? I have thought of the following
options so far:

1) Use 128-bit multiplication for (rdtsc() * tsc_mult). This is trivial on
x86_64 (just use mulq), and a bit more involved but still "easy" on 32-bit x86.
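
For example, a minimal sketch of 1), assuming GCC or clang so that the
unsigned __int128 type can hold the full product before the shift:

static inline uint64_t tsc_to_nsec(uint64_t tsc)
{
	/* 64x64 -> 128 bit multiply, then drop the 32-bit scale factor */
	return (uint64_t)(((unsigned __int128)tsc * tsc_mult) >> 32);
}

On x86_64 the compiler reduces this to a single mulq plus a shift; on
32-bit x86 __int128 is not available, so the partial products would have to
be open-coded by hand.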

2) Always operate on TSC deltas since the last call to bmk_cpu_clock_now()
rather than on absolute TSC values. Assuming we arrange for a call to
bmk_cpu_clock_now() at least once a second, we should be fine. This could be
done via e.g. the RTC interrupt, which, unlike the PIT, can be programmed to
fire once a second.

From my limited understanding of the NetBSD timetc and Linux clocksource
code, I believe this is how they operate, i.e. they also use scaled integer
math for device-tick-to-nanosecond conversions, but ensure that they operate
on deltas small enough not to cause overflow.
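
Something along these lines is what I have in mind for 2) (only a sketch;
the variable names and the once-a-second caller are hypothetical, not the
current rumprun code):

static uint64_t clock_base;	/* nanoseconds accumulated so far */
static uint64_t tsc_base;	/* TSC value at the last update */

uint64_t bmk_cpu_clock_now(void)
{
	uint64_t tsc_now = rdtsc();
	uint64_t tsc_delta = tsc_now - tsc_base;

	/*
	 * As long as something (e.g. a 1 Hz RTC interrupt) calls this at
	 * least once every ~4 seconds, tsc_delta stays small enough that
	 * the multiplication cannot overflow a uint64_t.
	 */
	clock_base += (tsc_delta * tsc_mult) >> 32;
	tsc_base = tsc_now;

	return clock_base;
}

(A real version would also carry the fractional low 32 bits of the product
between calls so the truncation does not accumulate into drift.)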

3) Give up on the TSC and just count PIT ticks at 100 Hz. This results in
less precision (but we don't care too much, as the rump kernel HZ runs at
100 anyway); however, it relies on the PIT interrupt always being enabled,
which is not terribly efficient (especially in a virtualized environment).
It does, however, have the advantage of not depending on the TSC, which
means it will run on a 486-class processor. This may be useful for some
embedded use cases.
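
A sketch of 3), assuming the PIT is already programmed to interrupt at
100 Hz and with an illustrative handler name:

#define HZ 100

static volatile uint64_t pit_ticks;	/* bumped from the PIT interrupt */

void pit_interrupt(void)
{
	pit_ticks++;
}

uint64_t bmk_cpu_clock_now(void)
{
	/* 10 ms resolution: one tick is NSEC_PER_SEC / HZ nanoseconds */
	return pit_ticks * (NSEC_PER_SEC / HZ);
}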

Also note that I have no idea how stable the TSC is under KVM, and the
current code makes no attempt to figure out whether the TSC is invariant,
constant, or anything else. So option 3) would also be a safe(r) bet, in
that the PIT should "just work".
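
For what it's worth, checking the "invariant TSC" bit would look something
like this (a sketch using GCC/clang's <cpuid.h>; the bit is CPUID leaf
0x80000007, EDX bit 8, and I haven't checked how KVM reports it):

#include <cpuid.h>

static int tsc_is_invariant(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx))
		return 0;
	return (edx & (1U << 8)) != 0;
}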

Thoughts? Is there a trick in the arithmetic I've missed here? Entirely
possible since I've never used scaled integer math before.

Martin
