On Wed, Jun 20, 2018 at 9:35 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Wed, Jun 20, 2018 at 6:19 PM, Andi Kleen <ak@linux.intel.com> wrote:
>> Arnd Bergmann <arnd@arndb.de> writes:
>>> To clarify: current_kernel_time() uses at most millisecond resolution rather than microsecond, as tkr_mono.xtime_nsec only gets updated during the timer tick.
>>
>> Ah you're right. I remember now: the motivation was to make sure there is basically no overhead. In some setups the full gtod can be rather slow, particularly if it falls back to some crappy timer.
This means we're probably fine with a compile-time option that distros can choose to enable depending on what classes of hardware they are targeting, like:

	struct timespec64 current_time(struct inode *inode)
	{
		struct timespec64 now;
		u64 gran = inode->i_sb->s_time_gran;

		if (IS_ENABLED(CONFIG_HIRES_INODE_TIMES) &&
		    gran <= NSEC_PER_JIFFY)
			ktime_get_real_ts64(&now);
		else
			ktime_get_coarse_real_ts64(&now);

		return timespec64_trunc(now, gran);
	}

With that implementation, we could still let file systems choose to get coarse timestamps by tuning the granularity in the superblock s_time_gran, which would result in nice round tv_nsec values that represent the actual accuracy.
I've done some simple tests and found that on a variety of x86, arm32 and arm64 CPUs, it takes between 70 and 100 CPU cycles to read the TSC and add it to the coarse clock, e.g. on a 3.1GHz Ryzen, using the little test program below:

	vdso hires:   37.18ns
	vdso coarse:    6.44ns
	sysc hires:  161.62ns
	sysc coarse:  133.87ns

On the same machine, it takes around 400ns (1240 cycles) to write one byte into a tmpfs file with pwrite(). Adding 5% to 10% overhead for accurate timestamps would definitely be noticed, so I guess we wouldn't enable that unconditionally, but could do it as an opt-in mount option if someone had a use case.
Arnd
---
/* measure times for high-resolution clocksource access from userspace */
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <stdbool.h>
#include <sys/syscall.h>

static int do_clock_gettime(clockid_t clkid, struct timespec *tp, bool vdso)
{
	if (vdso)
		return clock_gettime(clkid, tp);

	return syscall(__NR_clock_gettime, clkid, tp);
}

static int loop1sec(int clkid, bool vdso)
{
	int i;
	struct timespec t, start;

	do_clock_gettime(clkid, &start, vdso);
	i = 0;
	do {
		do_clock_gettime(clkid, &t, vdso);
		i++;
	} while (t.tv_sec == start.tv_sec || t.tv_nsec < start.tv_nsec);

	return i;
}

int main(void)
{
	printf("vdso hires: %7.2fns\n",
	       1000000000.0 / loop1sec(CLOCK_REALTIME, true));
	printf("vdso coarse: %7.2fns\n",
	       1000000000.0 / loop1sec(CLOCK_REALTIME_COARSE, true));
	printf("sysc hires: %7.2fns\n",
	       1000000000.0 / loop1sec(CLOCK_REALTIME, false));
	printf("sysc coarse: %7.2fns\n",
	       1000000000.0 / loop1sec(CLOCK_REALTIME_COARSE, false));

	return 0;
}