Re: [PATCH] Implement timekeeping for rumprun/hw (x86)

  • From: Martin Lucina <martin@xxxxxxxxxx>
  • To: rumpkernel-users@xxxxxxxxxxxxx
  • Date: Sat, 4 Jul 2015 17:47:58 +0200

On Saturday, 04.07.2015 at 11:30, Antti Kantee wrote:

You missed the "make *sure* the user sees the error" part. How else
are you going to do that except by printing the warning and then
refusing to boot by default? If you refuse to boot by default, how
else are you going to let the user decide to force boot anyway except
with an override switch in rumprun? (There are probably other ways, but
that was the most obvious one to my imagination.)

I don't want to ship a system with weird bugs where the blame is on
the user for not studying the full boot output on every system they
deploy on.

Good point. In that case, yeah, refusing to boot by default is probably the
best option.
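
A minimal sketch of that policy, assuming a hypothetical "timekeeping looks
sane" flag and a hypothetical override switch. Neither name exists in rumprun
today; the real check would live in the platform bootstrap code:

```c
#include <stdio.h>

/* Hypothetical outcome of the boot-time decision. */
enum boot_action { BOOT_OK, BOOT_FORCED, BOOT_REFUSED };

/*
 * clock_ok: nonzero if the timekeeping sanity checks passed (assumed input).
 * force:    nonzero if the user passed an explicit override (assumed switch).
 */
enum boot_action boot_decide(int clock_ok, int force)
{
    if (clock_ok)
        return BOOT_OK;

    /* Always print the warning so the user cannot miss it. */
    fprintf(stderr, "WARNING: timekeeping is unreliable on this platform\n");
    if (force)
        return BOOT_FORCED;

    fprintf(stderr, "refusing to boot; use the override to boot anyway\n");
    return BOOT_REFUSED;
}
```

The point of the three-way result is that a forced boot stays distinguishable
from a clean one, so the warning can be repeated later if needed.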

Bug shakedowns are one thing, but it would be nice to see the author
exude more confidence than "probably" that the introduced
technique is not fundamentally broken... At least add a
check-and-panic for tsc not going backwards if you're not sure it
never will. Otherwise people may run into utterly weird failure
modes in the order of "some funny stuff happens sometimes in some

I'll do that (add checks for TSC going backwards). That will definitely
happen in the case of, e.g., migration, but we're not claiming to support
that right now anyway.
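
Roughly what I have in mind for the check, as a sketch only. The struct and
function names are made up for illustration and are not part of the patch;
the caller would feed in raw TSC samples and panic on a failure:

```c
#include <stdint.h>

/* Hypothetical state for a monotonicity check; not a rumprun structure. */
struct tsc_guard {
    uint64_t last;   /* highest counter value seen so far */
    int valid;       /* zero until the first sample has been recorded */
};

/*
 * Record a new counter sample. Returns 0 if the sample is not lower than
 * the previous one, -1 if the counter went backwards. A kernel caller
 * would panic on -1 rather than carry on with a bogus clock.
 */
int tsc_guard_check(struct tsc_guard *g, uint64_t now)
{
    if (g->valid && now < g->last)
        return -1;
    g->last = now;
    g->valid = 1;
    return 0;
}
```

Equal samples are accepted on purpose: two back-to-back reads may legitimately
return the same value, and only a strictly lower value indicates trouble.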

The clock seems to run at about 2/5 speed under no-kvm QEMU (i.e.
sleep 2 takes ~5 seconds). Is that expected?

Interesting. Perhaps your QEMU is showing even more overhead for PIT
interrupt delivery than mine? There's also still some (bad) interaction
between rumpclk time and bmk time. I'll do a general out-of-band
implementation of rumpxentc tonight which should hopefully help a bit.
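
For comparing numbers, the "2/5 speed" figure is just the requested sleep time
divided by the wall-clock time that actually elapsed. A small sketch of that
arithmetic, with a hypothetical helper name:

```c
#include <stdint.h>

/*
 * Apparent guest clock rate relative to real time: 1.0 means the clock
 * runs at the right speed, 0.4 means a requested 2 s sleep took 5 s.
 * Returns 0.0 if no time was observed, to avoid dividing by zero.
 */
double clock_rate(uint64_t requested_ns, uint64_t observed_ns)
{
    if (observed_ns == 0)
        return 0.0;
    return (double)requested_ns / (double)observed_ns;
}
```

With the figures reported above, clock_rate(2000000000ULL, 5000000000ULL)
comes out at 0.4.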

Can you run the test program below with, say, '10 500', '10 1000' and '10
5000' as parameters and report the output? Also, if possible, run it in
both POSIX mode and BMK mode (built with -DBMK, you'll need to comment out
the "die" if the stub link fails in -gcc).

#include <err.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define NSEC_PER_SEC 1000000000ULL

#ifdef BMK
/* BMK mode: talk to the bare-metal kernel clock and scheduler directly. */
extern uint64_t bmk_platform_clock_monotonic(void);
extern void bmk_sched_blockprepare_timeout(uint64_t);
extern void bmk_sched_block();

#define clock_monotonic_ns bmk_platform_clock_monotonic

void sleep_monotonic_ns(const uint64_t n)
{
    uint64_t bmk_now = bmk_platform_clock_monotonic();

    /* Arm an absolute wakeup time, then block until it expires. */
    bmk_sched_blockprepare_timeout(bmk_now + n);
    bmk_sched_block();
}
#else
/* POSIX mode: go through clock_gettime() and clock_nanosleep(). */
uint64_t clock_monotonic_ns(void)
{
    int rc;
    struct timespec ts;

    rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    if (rc == -1)
        err(EXIT_FAILURE, "clock_gettime(CLOCK_MONOTONIC)");
    return (ts.tv_sec * NSEC_PER_SEC) + ts.tv_nsec;
}

void sleep_monotonic_ns(const uint64_t n)
{
    struct timespec ts = { 0 };

    if (n >= NSEC_PER_SEC) {
        ts.tv_sec = n / NSEC_PER_SEC;
        ts.tv_nsec = n % NSEC_PER_SEC;
    }
    else {
        ts.tv_nsec = n;
    }
    /* Relative sleep (flags == 0) against the monotonic clock. */
    clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);
}
#endif

int main(int argc, char *argv[])
{
    int loops = 600,
        ms = 1000,
        i, rc;
    struct timespec ts_wall;
    struct tm tm_wall;
    uint64_t start_ns,
        end_ns,
        sleep_ns,
        overhead_ns;
    char s[40];

    /* Usage: prog <loops> <sleep in ms>; defaults are 600 and 1000. */
    if (argc == 3) {
        loops = atoi(argv[1]);
        ms = atoi(argv[2]);
    }
    sleep_ns = 1000000ULL * (uint64_t)ms;
    printf("%d loops of %llu ns\n", loops, (unsigned long long)sleep_ns);
    overhead_ns = 0;

    for (i = 0; i < loops; i++) {
        /* Time one sleep using the monotonic clock. */
        start_ns = clock_monotonic_ns();
        sleep_monotonic_ns(sleep_ns);
        end_ns = clock_monotonic_ns();

        /* Print wall time alongside, for comparison with the host. */
        rc = clock_gettime(CLOCK_REALTIME, &ts_wall);
        if (rc == -1)
            err(EXIT_FAILURE, "clock_gettime(CLOCK_REALTIME)");
        gmtime_r(&ts_wall.tv_sec, &tm_wall);
        strftime(s, sizeof s, "%Y-%m-%d %H:%M:%S", &tm_wall);

        printf("%s %llu.%09llu\n", s,
            (unsigned long long)(end_ns / NSEC_PER_SEC),
            (unsigned long long)(end_ns % NSEC_PER_SEC));
        printf(">> delta %llu overhead %llu\n",
            (unsigned long long)(end_ns - start_ns),
            (unsigned long long)((end_ns - start_ns) - sleep_ns));
        overhead_ns += (end_ns - start_ns) - sleep_ns;
    }

    printf("total overhead %llu\n", (unsigned long long)overhead_ns);

    return 0;
}
