Re: [RFC PATCH] sched_clock: Avoid tearing during read from NMI

21 Jan 2015

On 21/01/15 17:29, John Stultz wrote:
...
On Wed, Jan 21, 2015 at 8:53 AM, Daniel Thompson
<daniel.thompson@linaro.org> wrote:
...
Currently it is possible for an NMI (or FIQ on ARM) to come in and
read sched_clock() whilst update_sched_clock() has half updated the
state. This results in a bad time value being observed.
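To make the "half updated" case concrete, here is a minimal userspace sketch, not kernel code. The struct and the helpers clock_ns() and torn_state() are hypothetical names for illustration, and one cycle is assumed to equal one nanosecond so the arithmetic stays readable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the epoch fields of the clock state. */
struct epoch {
	uint64_t epoch_cyc;	/* counter value at the last update */
	uint64_t epoch_ns;	/* nanoseconds at the last update */
};

/* Time for a raw counter value, assuming 1 cycle == 1 ns in this sketch. */
static uint64_t clock_ns(const struct epoch *e, uint64_t cyc)
{
	assert(cyc >= e->epoch_cyc);	/* the counter only moves forward here */
	return e->epoch_ns + (cyc - e->epoch_cyc);
}

/*
 * The state a reader sees if the writer is interrupted after storing
 * the new epoch_cyc but before storing the new epoch_ns.
 */
static struct epoch torn_state(const struct epoch *old_state,
			       const struct epoch *new_state)
{
	struct epoch torn = {
		.epoch_cyc = new_state->epoch_cyc,
		.epoch_ns = old_state->epoch_ns,
	};
	return torn;
}
```

With an old epoch of (cyc 1000, ns 5000) and a new epoch of (cyc 1800, ns 5800), a counter value of 2000 maps to 6000 ns under either consistent state, but to 5200 ns under the torn state, so the clock appears to jump backwards.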
This patch fixes that problem in a similar manner to Thomas Gleixner's
4396e058c52e ("timekeeping: Provide fast and NMI safe access to
CLOCK_MONOTONIC").
Note that ripping out the seqcount lock from sched_clock_register() and
replacing it with a large comment is not nearly as bad as it looks! The
locking here is actually pretty useless since most of the variables
modified within the write lock are not covered by the read lock. As a
result a big comment and the sequence bump implicit in the call
to update_epoch() should work pretty much the same.
It still looks pretty bad, even with the current explanation.
I'm inclined to agree. Although, to be clear, the code I proposed should
be no more broken than the code we have today (and is arguably more honest).
...
...

  raw_write_seqcount_begin(&cd.seq);

  /*
   * sched_clock will report a bad value if it executes
   * concurrently with the following code. No locking exists to
   * prevent this; we rely mostly on this function being called
   * early during kernel boot up before we have lots of other
   * stuff going on.
   */
  read_sched_clock = read;
  sched_clock_mask = new_mask;
  cd.rate = rate;
  cd.wrap_kt = new_wrap_kt;
  cd.mult = new_mult;
  cd.shift = new_shift;
  cd.epoch_cyc = new_epoch;
  cd.epoch_ns = ns;
  raw_write_seqcount_end(&cd.seq);

  update_epoch(new_epoch, ns);

So looking at this, the sched_clock_register() function may not be
called super early, so I was looking to see what prevented bad reads
prior to registration.
Certainly not super early, but, from the WARN_ON() at the top of the
function I thought it was intended to be called before start_kernel()
unmasks interrupts...
...
And from quick inspection, it's nothing. I
suspect the undocumented trick that makes this work is that the mult
value is initialized to zero, so sched_clock returns 0 until things
have been registered.
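The arithmetic behind that suspicion can be sketched as follows. This is a userspace illustration only: struct clock_state and sketch_sched_clock() are hypothetical names, the conversion is assumed to be the usual multiply-then-shift form, and whether the kernel really starts with a zero mult is the suspicion stated above, not something this sketch confirms.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply-then-shift conversion from cycles to nanoseconds. */
static inline uint64_t cyc_to_ns(uint64_t cyc, uint32_t mult, uint32_t shift)
{
	assert(shift < 64);	/* a larger shift would be undefined in C */
	return (cyc * mult) >> shift;
}

/* Hypothetical, simplified stand-in for the state sched_clock() reads. */
struct clock_state {
	uint64_t epoch_ns;
	uint64_t epoch_cyc;
	uint64_t mask;
	uint32_t mult;
	uint32_t shift;
};

/* epoch_ns plus the scaled, masked cycle delta since the epoch. */
static uint64_t sketch_sched_clock(const struct clock_state *cs,
				   uint64_t raw_cyc)
{
	uint64_t delta = (raw_cyc - cs->epoch_cyc) & cs->mask;

	return cs->epoch_ns + cyc_to_ns(delta, cs->mult, cs->shift);
}
```

If epoch_ns and mult are both zero, every counter value maps to 0 regardless of the mask, which is the behaviour described above. Once a non-zero mult is stored, the same function starts returning scaled values.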
So it does seem like it would be worthwhile to do the initialization
under the lock, or possibly use the suspend flag to make the first
initialization safe.
As mentioned, the existing write lock doesn't really do very much at the
moment.
The simple and (I think) strictly correct approach is to duplicate the
whole of the clock_data (minus the seqcount) and make the read lock in
sched_clock cover all accesses to the structure.
This would substantially enlarge the critical section in sched_clock()
meaning we might loop round the seqcount fractionally more often.
However if that causes any real problems it would be a sign the epoch
was being updated too frequently.
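One way the duplicated-structure idea could look is sketched below in userspace C11. It is an illustration under assumptions, not the patch itself: the names clock_read_data, clock_latch, latch_init(), latch_flip(), latch_update() and latch_read() are hypothetical, C11 fences stand in for the kernel's barriers, and the plain struct copies are only safe here because the sketch is exercised single-threaded with the interruption simulated by hand.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Everything the reader needs, duplicated as a unit (no seqcount inside). */
struct clock_read_data {
	uint64_t epoch_ns;
	uint64_t epoch_cyc;
	uint64_t mask;
	uint32_t mult;
	uint32_t shift;
};

struct clock_latch {
	atomic_uint seq;			/* low bit selects the copy to read */
	struct clock_read_data copy[2];
};

static void latch_init(struct clock_latch *l, const struct clock_read_data *rd)
{
	assert(l && rd);
	atomic_init(&l->seq, 0);
	l->copy[0] = *rd;
	l->copy[1] = *rd;
}

/* Bump the sequence so that readers switch to the other copy. */
static void latch_flip(struct clock_latch *l)
{
	atomic_thread_fence(memory_order_seq_cst);
	atomic_fetch_add_explicit(&l->seq, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
}

/* Writer: only ever modifies the copy that readers are not using. */
static void latch_update(struct clock_latch *l, const struct clock_read_data *rd)
{
	latch_flip(l);		/* seq odd: readers use copy[1] */
	l->copy[0] = *rd;
	latch_flip(l);		/* seq even: readers use copy[0] */
	l->copy[1] = *rd;
}

/* Reader: the retry loop covers every field, and never waits on the writer. */
static struct clock_read_data latch_read(struct clock_latch *l)
{
	struct clock_read_data snap;
	unsigned int seq;

	do {
		seq = atomic_load_explicit(&l->seq, memory_order_acquire);
		snap = l->copy[seq & 1];
		atomic_thread_fence(memory_order_acquire);
	} while (atomic_load_explicit(&l->seq, memory_order_relaxed) != seq);

	return snap;
}
```

A reader that interrupts the writer partway through its store to copy[0] sees an odd sequence, reads the untouched copy[1], and returns the old but consistent state. The cost is the larger read section mentioned above, since the whole structure is copied inside the retry loop.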
Unless I get any objections (or you really want me to look closely at
using suspend) then I'll try this approach in the next day or two.
Daniel.
