On Thu, Apr 4, 2024 at 8:27 AM Benjamin Tissoires benjamin.tissoires@redhat.com wrote:
So we need something like:
struct bpf_hrtimer { union { struct hrtimer timer;
- struct work_struct work; }; struct bpf_map *map; struct bpf_prog *prog; void __rcu *callback_fn; void *value; union {
Are you sure we need an union here? If we get to call kfree_rcu() we need to have both struct rcu_head and sync_work, not one or the other.
why? with an extra flag it's one or the other. In bpf_timer_cancel_and_free() if (flag & SLEEPABLE) { schedule_work() to cancel_work_sync + kfree_rcu } else { hrtimer_cancel kfree_rcu }
struct rcu_head rcu;
- struct work_struct sync_work; };
- u64 flags; // bpf_timer_init() will require BPF_F_TIMER_SLEEPABLE
If I understand, you want BPF_F_TIMER_SLEEPABLE in bpf_timer_init() (like in my v2 or v3 IIRC). But that means that once a timer is initialized it needs to be of one or the other type (especially true with the first union in this struct).
yes. That's an idea. The code to support wq vs timer seems to be diverging more than what we expected initially. It seems cleaner to set it as init time and enforce in other helpers.
Also with two work_struct-s we're pushing the sizeof(bpf_hrtimer) too far. It's already at 112 bytes and some people use bpf_timer per flow. So potentially millions of such timers. Adding extra sizeof(struct work_struct)=32 * 2 that won't be used is too much. Note that sizeof(struct hrtimer)=64, so unions make everything fit nicely.
So should we reject during run time bpf_timer_set_callback() for sleepable timers and only allow bpf_timer_set_sleepable_cb() for those? (and the invert in the other case).
yes.
This version of the patch allows for one timer to be used as softIRQ or WQ, depending on the timer_set_callback that is used. But it might be simpler for the kfree_rcu race to define the bpf_timer to be of one kind, so we are sure to call the correct kfree method.
I think one or another simplifies the code and makes it easier to think through combinations.
I'm still contemplating adding new "struct bpf_wq" and new kfuncs to completely separate wq vs timer. The code reuse seems to be relatively small. We can potentially factor out internals of bpf_timer_* into smaller helpers and use them from bpf_timer_* and from new bpf_wq_* kfuncs.
One more thing. bpf_timer_cancel() api turned out to be troublesome. Not only it cancels the timer, but drops callback too. It was a surprising behavior for people familiar with kernel timer api-s. We should not repeat this mistake with wq.
We can try to fix bpf_timer_cancel() too. If we drop drop_prog_refcnt() from it it shouldn't affect existing bpf_timer users who are forced to do: bpf_timer_cancel() bpf_timer_set_callback() bpf_timer_start() all the time. If/when bpf_timer_cancel() stops dropping the callback such bpf prog won't be affected. So low chance of breaking any prog. wdyt?