On Tue, Mar 25 2025 at 12:32, Miroslav Lichvar wrote:
On Thu, Mar 20, 2025 at 01:03:00PM -0700, John Stultz wrote:
+static u64 timekeeping_accumulate(struct timekeeper *tk, u64 offset,
enum timekeeping_adv_mode mode,
unsigned int *clock_set)
* Also reset tk::ntp_error as it does not make sense to keep the
* old accumulated error around in this case.
*/
I'm not sure if I still understand the timekeeping code correctly, but that doesn't seem right to me. At least the comment should explain why it does not make sense to keep the NTP error.
Resetting the NTP error causes a small time step. An NTP/PTP client can be setting the frequency very frequently, e.g. up to 128 times per second and the interval between updates can be random. If the timing
I never observed that behaviour, but I'm not a NTP/PTP wizard/power user.
was right, I suspect it could cause a measurable drift. The client should be able to compensate for it, but why make its job harder by making the clock less predictable. My expectation for the clock is that its frequency will not change if the same (or only slightly different) frequency is set repeatedly by adjtimex().
The point is that timekeeper::ntp_error accumulates the error between NTP and the clock interval. With John's change to forward the clock in case of adjtimex() setting the tick length or frequency, the previously accumulated information is out of sync because the forwarding resets the period asynchronously.
The fundamental property of the timekeeper adjustment is that it advances everything in multiples of the clock interval. The clock interval is the number of hardware clock increments per tick, which has been determined from the initial multiplier/shift pair of the clock source at the point where the clock source is installed as the timekeeper source. It never changes throughout the life time of the clocksource.
The original implementation respected this base period, but John's approach of forwarding, which cures the coarse time getter issue, violates it. As a consequence the previous error accumulation is not longer based on the base period because the period has been reset to the random point in time when adjtimex() was invoked, which makes the error accumulation a random number.
There are two ways to deal with that. Both require to revert this change completely.
1) Handle the coarse time getter problem seperately and leave the existing adjtimex logic alone. That was my initial suggestion:
https://lore.kernel.org/all/87cyej5rid.ffs@tglx
2) Handle adjtimex(ADJ_TICK/ADJ_FREQUENCY) at the next tick boundary instead of doing it immediately at the random point in time when adjtimex() is invoked.
That cures the coarse time getter problem as well, but obviously delays the multiplier update to the next tick, which means that only the last adjtimex(ADJ_TICK/ADJ_FREQUENCY) invocation between two ticks becomes effective.
From a pure mathematical point of view, this is keeping everything consistent. A quick test shows that it works. Though again, I'm not the NTP wizard here and don't know which dragons are lurking in the NTP/PTP clients.
Patch on top of the revert below. That requires some thought vs. NOHZ delaying the next tick, but that's a solvable problem.
Thanks,
tglx --- --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -34,14 +34,6 @@
#define TK_UPDATE_ALL (TK_CLEAR_NTP | TK_CLOCK_WAS_SET)
-enum timekeeping_adv_mode { - /* Update timekeeper when a tick has passed */ - TK_ADV_TICK, - - /* Update timekeeper on a direct frequency change */ - TK_ADV_FREQ -}; - /* * The most important data for readout fits into a single 64 byte * cache line. @@ -2155,7 +2147,7 @@ static u64 logarithmic_accumulation(stru * timekeeping_advance - Updates the timekeeper to the current time and * current NTP tick length */ -static bool timekeeping_advance(enum timekeeping_adv_mode mode) +static bool timekeeping_advance(void) { struct timekeeper *tk = &tk_core.shadow_timekeeper; struct timekeeper *real_tk = &tk_core.timekeeper; @@ -2173,8 +2165,8 @@ static bool timekeeping_advance(enum tim tk->tkr_mono.cycle_last, tk->tkr_mono.mask, tk->tkr_mono.clock->max_raw_delta);
- /* Check if there's really nothing to do */ - if (offset < real_tk->cycle_interval && mode == TK_ADV_TICK) + /* Check if there's really something to do */ + if (offset < real_tk->cycle_interval) return false;
/* @@ -2216,7 +2208,7 @@ static bool timekeeping_advance(enum tim */ void update_wall_time(void) { - if (timekeeping_advance(TK_ADV_TICK)) + if (timekeeping_advance()) clock_was_set_delayed(); }
@@ -2548,10 +2540,6 @@ int do_adjtimex(struct __kernel_timex *t
audit_ntp_log(&ad);
- /* Update the multiplier immediately if frequency was set directly */ - if (txc->modes & (ADJ_FREQUENCY | ADJ_TICK)) - clock_set |= timekeeping_advance(TK_ADV_FREQ); - if (clock_set) clock_was_set(CLOCK_SET_WALL);