On Mon, Feb 24, 2025 at 01:00:01PM +0100, Ricardo Cañuelo Navarro wrote:
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
[ Upstream commit 401cb7dae8130fd34eb84648e02ab4c506df7d5e ]
The XDP redirect process is two staged:
bpf_prog_run_xdp() is invoked to run a eBPF program which inspects the packet and makes decisions. While doing that, the per-CPU variable bpf_redirect_info is used.
Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info and it may also access other per-CPU variables like xskmap_flush_list.
At the very end of the NAPI callback, xdp_do_flush() is invoked which does not access bpf_redirect_info but will touch the individual per-CPU lists.
The per-CPU variables are only used in the NAPI callback hence disabling bottom halves is the only protection mechanism. Users from preemptible context (like cpu_map_kthread_run()) explicitly disable bottom halves for protections reasons. Without locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking.
PREEMPT_RT has forced-threaded interrupts enabled and every NAPI-callback runs in a thread. If each thread has its own data structure then locking can be avoided.
Create a struct bpf_net_context which contains struct bpf_redirect_info. Define the variable on stack, use bpf_net_ctx_set() to save a pointer to it, bpf_net_ctx_clear() removes it again. The bpf_net_ctx_set() may nest. For instance a function can be used from within NET_RX_SOFTIRQ/ net_rx_action which uses bpf_net_ctx_set() and NET_TX_SOFTIRQ which does not. Therefore only the first invocations updates the pointer. Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct bpf_redirect_info. The returned data structure is zero initialized to ensure nothing is leaked from stack. This is done on first usage of the struct. bpf_net_ctx_set() sets bpf_redirect_info::kern_flags to 0 to note that initialisation is required. First invocation of bpf_net_ctx_get_ri() will memset() the data structure and update bpf_redirect_info::kern_flags. bpf_redirect_info::nh is excluded from memset because it is only used once BPF_F_NEIGH is set which also sets the nh member. The kern_flags is moved past nh to exclude it from memset.
The pointer to bpf_net_context is saved task's task_struct. Using always the bpf_net_context approach has the advantage that there is almost zero differences between PREEMPT_RT and non-PREEMPT_RT builds.
Cc: Andrii Nakryiko andrii@kernel.org Cc: Eduard Zingerman eddyz87@gmail.com Cc: Hao Luo haoluo@google.com Cc: Jiri Olsa jolsa@kernel.org Cc: John Fastabend john.fastabend@gmail.com Cc: KP Singh kpsingh@kernel.org Cc: Martin KaFai Lau martin.lau@linux.dev Cc: Song Liu song@kernel.org Cc: Stanislav Fomichev sdf@google.com Cc: Yonghong Song yonghong.song@linux.dev Acked-by: Alexei Starovoitov ast@kernel.org Acked-by: Jesper Dangaard Brouer hawk@kernel.org Reviewed-by: Toke Høiland-Jørgensen toke@redhat.com Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Link: https://patch.msgid.link/20240620132727.660738-15-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski kuba@kernel.org
include/linux/filter.h | 56 +++++++++++++++++++++++++++++++++++++++++--------- include/linux/sched.h | 3 +++ kernel/bpf/cpumap.c | 3 +++ kernel/bpf/devmap.c | 9 +++++++- kernel/fork.c | 1 + net/bpf/test_run.c | 11 +++++++++- net/core/dev.c | 28 ++++++++++++++++++++++++- net/core/filter.c | 41 +++++++++++------------------------- net/core/lwt_bpf.c | 3 +++ 9 files changed, 113 insertions(+), 42 deletions(-)
You did major changes to this and you didn't sign off and explain what you did differently from the original commit? You know we can't take that...
thanks,
greg k-h