Re: [PATCH RFC bpf-next v2 02/10] bpf/helpers: introduce sleepable timers

16 Feb 2024

      On Feb 16 2024, Toke Høiland-Jørgensen wrote:
...
Benjamin Tissoires bentiss@kernel.org writes:
...
On Feb 15 2024, Martin KaFai Lau wrote:
...
On 2/14/24 9:18 AM, Benjamin Tissoires wrote:
...
+static void bpf_timer_work_cb(struct work_struct *work)
+{

struct bpf_hrtimer *t = container_of(work, struct bpf_hrtimer, work);
struct bpf_map *map = t->map;
void *value = t->value;
bpf_callback_t callback_fn;
void *key;
u32 idx;

BTF_TYPE_EMIT(struct bpf_timer);

rcu_read_lock();
callback_fn = rcu_dereference(t->sleepable_cb_fn);
rcu_read_unlock();

I took a very brief look at patch 2. One thing that may worth to ask here,
the rcu_read_unlock() seems to be done too early. It is protecting the
t->sleepable_cb_fn (?), so should it be done after finished using the
callback_fn?
Probably :)
TBH, everytime I work with RCUs I spent countless hours trying to
re-understand everything, and in this case I'm currently in the "let's
make it work" process than fixing concurrency issues.
I still gave it a shot in case it solves my issue, but no, I still have
the crash.
But given that callback_fn might sleep, isn't it an issue to keep the
RCU_reader lock so long? (we don't seem to call synchronize_rcu() so it
might be fine, but I'd like the confirmation from someone else).
You're right, it isn't. From the RCU/checklist.rst doc:

Unlike most flavors of RCU, it *is* permissible to block in an

SRCU read-side critical section (demarked by srcu_read_lock()
   and srcu_read_unlock()), hence the "SRCU": "sleepable RCU".
   Please note that if you don't need to sleep in read-side critical
   sections, you should be using RCU rather than SRCU, because RCU
   is almost always faster and easier to use than is SRCU.
So we can't use the regular RCU protection for the callback in this
usage. We'll need to either convert it to SRCU, or add another
protection mechanism to make sure the callback function is not freed
from under us (like a refcnt). I suspect the latter may be simpler (from
reading the rest of that documentation around SRCU.
Currently I'm thinking at also incrementing the ->prog held in the
bpf_hrtimer which should prevent the callback to be freed, if I'm not wrong.
Then I should be able to just release the rcu_read_unlock before calling
the actual callback. And then put the ref on ->prog once done.
But to be able to do that I might need to protect ->prog with an RCU
too.
...
...
...
A high level design question. The intention of the new
bpf_timer_set_sleepable_cb() kfunc is actually to delay work to a workqueue.
It is useful to delay work from the bpf_timer_cb and it may also useful to
delay work from other bpf running context (e.g. the networking hooks like
"tc"). The bpf_timer_set_sleepable_cb() seems to be unnecessary forcing
delay-work must be done in a bpf_timer_cb.
Basically I'm just a monkey here. I've been told that I should use
bpf_timer[0]. But my implementation is not finished, as Alexei mentioned
that we should bypass hrtimer if I'm not wrong [1].
I don't think getting rid of the hrtimer in favour of
schedule_delayed_work() makes any sense. schedule_delayed_work() does
exactly the same as you're doing in this version of the patch: it
schedules a timer callback, and calls queue_work() from inside that
timer callback. It just uses "regular" timers instead of hrtimers. So I
don't think there's any performance benefit from using that facility; on
the contrary, it would require extra logic to handle cancellation etc;
might as well just re-use the existing hrtimer-based callback logic we
already have, and do a schedule_work() from the hrtimer callback like
you're doing now.
I agree that we can nicely emulate delayed_timer with the current patch
series. However, if I understand Alexei's idea (and Martin's) there are
cases where we just want schedule_work(), without any timer involved.
That makes a weird timer (with a delay always equal to 0), but it would
allow to satisfy those latency issues.
So (and this also answers your second email today) I'm thinking at:
- have multiple flags to control the timer (with dedicated timer_cb
  kernel functions):
  - BPF_F_TIMER_HRTIMER (default)
  - BPF_F_TIMER_WORKER (no timer, just workqueue)
  - BPF_F_TIMER_DELAYED_WORKER (hrtimer + workqueue, or actual
    delayed_work, but that's re-implementing stuffs)
- have bpf_timer_set_callback() and bpf_timer_set_sleepable_cb() strictly
  equivalent in terms of processing, but the latter just instructs the
  verifier that the callback can be sleepable (so calling
  bpf_timer_set_callback() on a BPF_F_TIMER_DELAYED_WORKER is fine as
  long as the target callback is using non sleepable kfuncs).
- ensure that bpf_timer_set_sleepable_cb() is invalid when called with
  a non sleepable timer flag.
- ensure that when the bpf_timer has no hrtimer we also return -EINVAL
  when a delay is given in bpf_timer_start().
Actually, BPF_F_TIMER_DELAYED_WORKER could be just a combination of
(BPF_F_TIMER_HRTIMER | BPF_F_TIMER_WORKER) which would reflect the
reality of how things are implemented.
Thinking through this a little bit more, maybe we should have
BPF_F_TIMER_SLEEPABLE instead of BPF_F_TIMER_WORKER. We can still have
BPF_F_TIMER_SLEEPABLE_BH when WQ_BH gets merged. _SLEEPABLE allows to
not bind the flag to the implementation...
Cheers,
Benjamin

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH RFC bpf-next v2 02/10] bpf/helpers: introduce sleepable timers