Hi Jordan,
On 9/14/24 8:46 PM, Jordan Rife wrote:
When operating Cilium in netkit mode with BPF-based host routing, calls to bpf_redirect() cause a kernel panic.
[ 52.247646] BUG: kernel NULL pointer dereference, address: 0000000000000038
...
[ 52.247727] RIP: 0010:bpf_redirect+0x18/0x80
...
[...]
Setting a breakpoint inside bpf_net_ctx_get_ri() confirms that current->bpf_net_context is NULL right before the panic.
(gdb) p $lx_current().bpf_net_context
$4 = (struct bpf_net_context *) 0x0 <fixed_percpu_data>
(gdb) disassemble bpf_redirect
Dump of assembler code for function bpf_redirect:
   0xffffffff81f085e0 <+0>:     nopl   0x0(%rax,%rax,1)
   0xffffffff81f085e5 <+5>:     mov    %gs:0x7e12d593(%rip),%rax
   0xffffffff81f085ed <+13>:    push   %rbp
   0xffffffff81f085ee <+14>:    mov    0x23d0(%rax),%rax
=> 0xffffffff81f085f5 <+21>:    mov    %rsp,%rbp
   0xffffffff81f085f8 <+24>:    mov    0x38(%rax),%edx
...
(gdb) continue
Continuing.

Thread 1 hit Breakpoint 1, panic ...
288     {
(gdb)
Commit 401cb7dae813 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.") recently moved bpf_redirect_info into bpf_net_context, which is referenced through a new member of task_struct.

Currently, current->bpf_net_context is set and then cleared inside sch_handle_egress(), where tcx_run() and tc_run() execute. However, netkit_xmit() appears to have been missed, leaving current->bpf_net_context NULL while it runs. This patch ensures that current->bpf_net_context is initialized while running netkit_xmit().
Signed-off-by: Jordan Rife <jrife@google.com>
Fixes: 401cb7dae813 ("net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.")
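The set/clear pattern described in the commit message can be modeled in userspace to show why the missing setup crashes. This is a sketch under stated assumptions, not kernel code. A _Thread_local pointer stands in for current->bpf_net_context. The names net_ctx_set(), net_ctx_clear(), model_redirect() and model_xmit() are hypothetical, loosely mirroring the kernel's bpf_net_ctx_set()/bpf_net_ctx_clear() helpers, and the structure fields are simplified. The setter returns NULL when an outer caller has already installed a context, so only the outermost caller clears it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (fields simplified). */
struct bpf_redirect_info {
	unsigned int flags;
	int tgt_index;
};

struct bpf_net_context {
	struct bpf_redirect_info ri;
};

/* Models current->bpf_net_context: one pointer per thread ("task"). */
static _Thread_local struct bpf_net_context *cur_net_ctx;

/* Install ctx unless an outer caller already did; returns NULL when nested. */
static struct bpf_net_context *net_ctx_set(struct bpf_net_context *ctx)
{
	if (cur_net_ctx != NULL)
		return NULL;
	ctx->ri.flags = 0;
	cur_net_ctx = ctx;
	return ctx;
}

/* Only the caller that actually installed a context clears it. */
static void net_ctx_clear(struct bpf_net_context *ctx)
{
	assert(ctx == NULL || cur_net_ctx == ctx);
	if (ctx != NULL)
		cur_net_ctx = NULL;
}

/* Models bpf_redirect(): the real helper dereferences the context without
 * a check; this model returns -1 where the kernel would oops instead. */
static int model_redirect(int ifindex)
{
	if (cur_net_ctx == NULL)
		return -1;
	cur_net_ctx->ri.tgt_index = ifindex;
	return 0;
}

/* Models the fixed xmit path: install the context around the program run. */
static int model_xmit(int ifindex)
{
	struct bpf_net_context on_stack, *ctx;
	int ret;

	ctx = net_ctx_set(&on_stack);
	ret = model_redirect(ifindex);
	net_ctx_clear(ctx);
	return ret;
}
```

Calling model_redirect() directly corresponds to the unpatched netkit path (no context installed), while model_xmit() corresponds to the pattern sch_handle_egress() already uses.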
Thanks for the fix! However, a similar patch is already in the net tree:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=15...
Best,
Daniel