________________________________________
From: Daniel Borkmann daniel@iogearbox.net
Sent: Tuesday, July 9, 2024 10:44 AM
To: KP Singh kpsingh@kernel.org
Cc: Puranjay Mohan puranjay@kernel.org; Andrii Nakryiko andrii@kernel.org; Eduard Zingerman eddyz87@gmail.com; Mykola Lysenko mykolal@meta.com; Alexei Starovoitov ast@kernel.org; Martin KaFai Lau martin.lau@linux.dev; Song Liu song@kernel.org; Yonghong Song yonghong.song@linux.dev; John Fastabend john.fastabend@gmail.com; Stanislav Fomichev sdf@google.com; Hao Luo haoluo@google.com; Jiri Olsa jolsa@kernel.org; Shuah Khan shuah@kernel.org; bpf@vger.kernel.org; linux-kselftest@vger.kernel.org; linux-kernel@vger.kernel.org; Manu Bretelle chantra@meta.com; Florent Revest revest@google.com
Subject: Re: [PATCH bpf] selftests/bpf: DENYLIST.aarch64: Remove fexit_sleep

On 7/8/24 6:42 PM, KP Singh wrote:
On Mon, Jul 8, 2024 at 6:09 PM Daniel Borkmann daniel@iogearbox.net wrote:
On 7/8/24 5:35 PM, Puranjay Mohan wrote:
Daniel Borkmann daniel@iogearbox.net writes:
On 7/8/24 5:26 PM, KP Singh wrote:
On Mon, Jul 8, 2024 at 5:00 PM Puranjay Mohan puranjay@kernel.org wrote:
Daniel Borkmann daniel@iogearbox.net writes:
> On 7/5/24 4:50 PM, Puranjay Mohan wrote:
>> fexit_sleep test runs successfully now on the CI so remove it from the
>> deny list.
>
> Do you happen to know which commit fixed it? If yes, might be nice to have it
> documented in the commit message.
Actually, I never saw this test failing on my local setup and yesterday I tried running it on the CI where it passed as well. So, I assumed that this would be fixed by some commit. I am not sure which exact commit might have fixed this.
Manu, Martin
When this was added to the deny list, was it failing every time, and did you have a reproducer for it? If there is a reproducer, I can try fixing it, but when run normally this test never fails for me.
I think this never worked until https://lore.kernel.org/lkml/20230405180250.2046566-1-revest@chromium.org/ was merged. The lack of ftrace direct calls was blocking tracing programs on ARM; since then it has always worked.
Awesome, thanks! I'll add this to the commit desc then when applying.
The commit that added this to the deny list said: 31f4f810d533 ("selftests/bpf: Add fexit_sleep to DENYLIST.aarch64")
It is reported that the fexit_sleep never returns in aarch64. The remaining tests cannot start.
It may also have something to do with sleepable programs. But I think it's generally in the category of "BPF tracing was catching up with ARM", it has now.
Hm, the latest run actually hangs in fexit_sleep (which is the test right after fexit_bpf2bpf). So looks like this was too early. It seems some CI runs pass on arm64 but others fail:
https://github.com/kernel-patches/bpf/actions/runs/9859826851/job/2722486839...
https://github.com/kernel-patches/bpf/actions/runs/9859837213/job/2722495504...
Puranjay, do you have a chance to look into this again?
Probably unrelated... but when I tried to reproduce this using qemu in full emulation mode [0], I got a kernel crash for fexit_sleep, but also for fexit_bpf2bpf and fentry_fexit.

The stack traces look like this (for fentry_fexit):
root@(none):/mnt/vmtest/selftests/bpf# ./test_progs -v -t fentry_fexit
bpf_testmod.ko is already unloaded.
Loading bpf_testmod.ko...
Successfully loaded bpf_testmod.ko.
test_fentry_fexit:PASS:fentry_skel_load 0 nsec
test_fentry_fexit:PASS:fexit_skel_load 0 nsec
test_fentry_fexit:PASS:fentry_attach 0 nsec
test_fentry_fexit:PASS:fexit_attach 0 nsec
Unable to handle kernel paging request at virtual address ffff0000c2a80e68
Mem abort info:
  ESR = 0x0000000096000004
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x04: level 0 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
swapper pgtable: 4k pages, 52-bit VAs, pgdp=0000000041b4a000
[ffff0000c2a80e68] pgd=1000000042f28003, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1] SMP
Modules linked in: bpf_testmod(OE) [last unloaded: bpf_testmod(OE)]
CPU: 0 PID: 97 Comm: test_progs Tainted: G OE 6.10.0-rc6-gb0eedd920017-dirty #67
Hardware name: linux,dummy-virt (DT)
pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : __bpf_tramp_enter+0x58/0x190
lr : __bpf_tramp_enter+0xd8/0x190
sp : ffff800084afbc10
x29: ffff800084afbc10 x28: fff00000c28c2e80 x27: 0000000000000000
x26: 0000000000000000 x25: 0000000000000050 x24: 0000000000000000
x23: 000000000000000a x22: fff00000c28c2e80 x21: 0000ffffed100070
x20: ffff800082032938 x19: ffff0000c2a80c00 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffed100070
x14: 0000000000000000 x13: ffff800082032938 x12: 0000000000000000
x11: 0000000000020007 x10: 0000000000000007 x9 : 00000000ffffffff
x8 : 0000000000004008 x7 : ffff80008218fa78 x6 : 0000000000000000
x5 : 0000000000000001 x4 : 0000000086db7919 x3 : 0000000095481a34
x2 : 0000000000000001 x1 : fff00000c28c2e80 x0 : 0000000000000001
Call trace:
 __bpf_tramp_enter+0x58/0x190
 bpf_trampoline_6442499844+0x44/0x158
 bpf_fentry_test1+0x8/0x10
 bpf_prog_test_run_tracing+0x190/0x328
 __sys_bpf+0x844/0x2148
 __arm64_sys_bpf+0x2c/0x48
 invoke_syscall+0x4c/0x118
 el0_svc_common.constprop.0+0x48/0xf0
 do_el0_svc+0x24/0x38
 el0_svc+0x4c/0x120
 el0t_64_sync_handler+0xc0/0xc8
 el0t_64_sync+0x190/0x198
Code: 52800001 97f9f3df 942a3be8 35000400 (f9413660)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x00,00000006,8c13bd78,576676af
Memory Limit: none
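As a reading aid for the oops above: the "Mem abort info" and "Data abort info" lines are a field-by-field decode of the ESR value. The sketch below redoes that decode by hand, assuming the architectural ESR_ELx layout for data aborts (EC in bits [31:26], IL in bit 25, ISS in bits [24:0], fault status code in ISS[5:0]). The helper name decode_esr is hypothetical and is not a kernel function.

```python
def decode_esr(esr: int) -> dict:
    """Split an arm64 ESR_ELx value into the data-abort fields the oops prints.

    Hypothetical helper for illustration; bit positions are assumed to follow
    the architectural data-abort ISS encoding.
    """
    iss = esr & 0x1FFFFFF              # ISS: bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,      # exception class, bits [31:26]
        "IL": (esr >> 25) & 0x1,       # 1 means a 32-bit instruction
        "ISS": iss,
        "ISV": (iss >> 24) & 0x1,      # instruction syndrome valid
        "SET": (iss >> 11) & 0x3,
        "FnV": (iss >> 10) & 0x1,      # 0 means the fault address is valid
        "EA": (iss >> 9) & 0x1,
        "CM": (iss >> 8) & 0x1,
        "S1PTW": (iss >> 7) & 0x1,
        "WnR": (iss >> 6) & 0x1,       # 0 means the access was a read
        "DFSC": iss & 0x3F,            # data fault status code
    }


# ESR value copied from the oops above.
fields = decode_esr(0x96000004)
for name, value in fields.items():
    print(f"{name:6s} = {value:#x}")
```

With the ESR from the log this gives EC = 0x25 (data abort taken at the current EL), WnR = 0 (a read) and a fault status code of 0x04 (level 0 translation fault), which matches what the kernel printed.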
For "fexit_sleep" and "fexit_bpf2bpf" respectively:
$ ( cd 9859826851 && vmtest -k kbuild-output/arch/arm64/boot/Image.gz -r ../aarch64-rootfs -a aarch64 '/bin/mount bpffs /sys/fs/bpf -t bpf && ip link set lo up && cd /mnt/vmtest/selftests/bpf/ && ./test_progs -v -t fexit_sleep' )
=> Image.gz
===> Booting
===> Setting up VM
===> Running command
root@(none):/# bpf_testmod: loading out-of-tree module taints kernel.
bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
Unable to handle kernel paging request at virtual address ffff0000c19c2668
Mem abort info:
  ESR = 0x0000000096000004
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x04: level 0 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
swapper pgtable: 4k pages, 52-bit VAs, pgdp=0000000041b4a000
[ffff0000c19c2668] pgd=1000000042f28003, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1] SMP
Modules linked in: bpf_testmod(OE)
CPU: 1 PID: 91 Comm: test_progs Tainted: G OE 6.10.0-rc6-gb0eedd920017-dirty #67
Hardware name: linux,dummy-virt (DT)
pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : __bpf_tramp_enter+0x58/0x190
lr : __bpf_tramp_enter+0xd8/0x190
sp : ffff800084c4bda0
x29: ffff800084c4bda0 x28: fff00000c274ae80 x27: 0000000000000000
x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
x23: 0000000060001000 x22: 0000ffffa36b7a54 x21: 00000000ffffffff
x20: ffff800082032938 x19: ffff0000c19c2400 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
x14: 0000000000000000 x13: ffff800082032938 x12: 0000000000000000
x11: 0000000000020007 x10: 0000000000000007 x9 : 00000000ffffffff
x8 : 0000000000004008 x7 : ffff80008218fa78 x6 : 0000000000000000
x5 : 0000000000000001 x4 : 0000000086db7919 x3 : 0000000095481a34
x2 : 0000000000000001 x1 : fff00000c274ae80 x0 : 0000000000000001
Call trace:
 __bpf_tramp_enter+0x58/0x190
 bpf_trampoline_6442487232+0x44/0x158
 __arm64_sys_nanosleep+0x8/0xf0
 invoke_syscall+0x4c/0x118
 el0_svc_common.constprop.0+0x48/0xf0
 do_el0_svc+0x24/0x38
 el0_svc+0x4c/0x120
 el0t_64_sync_handler+0xc0/0xc8
 el0t_64_sync+0x190/0x198
Code: 52800001 97f9f3df 942a3be8 35000400 (f9413660)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x00,00000006,8c13bd78,576676af
Memory Limit: none
Failed to run command
Caused by:
    0: Failed to QGA guest-exec-status
    1: error running guest_exec_status
    2: Broken pipe (os error 32)
    3: Broken pipe (os error 32)
[11:46:14] chantra@devvm17937:scratchpad $
[11:47:56] chantra@devvm17937:scratchpad $
[11:47:57] chantra@devvm17937:scratchpad $ ( cd 9859826851 && vmtest -k kbuild-output/arch/arm64/boot/Image.gz -r ../aarch64-rootfs -a aarch64 '/bin/mount bpffs /sys/fs/bpf -t bpf && ip link set lo up && cd /mnt/vmtest/selftests/bpf/ && ./test_progs -v -t fexit_bpf2bpf' )
=> Image.gz
===> Booting
===> Setting up VM
===> Running command
root@(none):/# bpf_testmod: loading out-of-tree module taints kernel.
bpf_testmod: module verification failed: signature and/or required key missing - tainting kernel
Unable to handle kernel paging request at virtual address ffff0000c278de68
Mem abort info:
  ESR = 0x0000000096000004
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x04: level 0 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
swapper pgtable: 4k pages, 52-bit VAs, pgdp=0000000041b4a000
[ffff0000c278de68] pgd=1000000042f28003, p4d=0000000000000000
Internal error: Oops: 0000000096000004 [#1] SMP
Modules linked in: bpf_testmod(OE)
CPU: 1 PID: 87 Comm: test_progs Tainted: G OE 6.10.0-rc6-gb0eedd920017-dirty #67
Hardware name: linux,dummy-virt (DT)
pstate: 01400005 (nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : __bpf_tramp_enter+0x58/0x190
lr : __bpf_tramp_enter+0xd8/0x190
sp : ffff800084c4ba90
x29: ffff800084c4ba90 x28: ffff800080a32d10 x27: ffff800080a32d80
x26: ffff8000813e0ad8 x25: ffff800084c4bce4 x24: ffff800082fbd048
x23: 0000000000000001 x22: fff00000c2732e80 x21: fff00000c18a3200
x20: ffff800082032938 x19: ffff0000c278dc00 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000 x15: 0000aaaabcc22aa0
x14: 0000000000000000 x13: ffff800082032938 x12: 0000000000000000
x11: 0000000000000000 x10: 000000000ac0d5af x9 : 000000000ac0d5af
x8 : 00000000a4d7a457 x7 : ffff80008218fa78 x6 : 0000000000000000
x5 : 0000000000000002 x4 : 0000000006fa0785 x3 : 0000000081d7cd4c
x2 : 0000000000000202 x1 : fff00000c2732e80 x0 : 0000000000000001
Call trace:
 __bpf_tramp_enter+0x58/0x190
 bpf_trampoline_34359738386+0x44/0xf8
 bpf_prog_3b052b77318ab7c4_test_pkt_md_access+0x8/0x118
 bpf_test_run+0x200/0x3a0
 bpf_prog_test_run_skb+0x328/0x6d8
 __sys_bpf+0x844/0x2148
 __arm64_sys_bpf+0x2c/0x48
 invoke_syscall+0x4c/0x118
 el0_svc_common.constprop.0+0x48/0xf0
 do_el0_svc+0x24/0x38
 el0_svc+0x4c/0x120
 el0t_64_sync_handler+0xc0/0xc8
 el0t_64_sync+0x190/0x198
Code: 52800001 97f9f3df 942a3be8 35000400 (f9413660)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception in interrupt
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x00,00000006,8c13bd78,576676af
Memory Limit: none
Failed to run command

Caused by:
    0: Failed to QGA guest-exec-status
    1: error running guest_exec_status
    2: Broken pipe (os error 32)
    3: Broken pipe (os error 32)
[0] https://chantra.github.io/bpfcitools/bpfci-troubleshooting.html
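All three oopses stop at the same pc with the same bracketed word in the "Code:" line, which by kernel convention is the instruction at the faulting pc. The sketch below decodes that word by hand and checks it against the register dumps. It assumes the A64 encoding for a 64-bit LDR with unsigned immediate offset. The names decode_ldr_x_uimm and OOPSES are hypothetical, and the values are copied from the logs in this mail. It only shows which register held the bad base address, not why that address is unmapped.

```python
def decode_ldr_x_uimm(insn: int) -> tuple:
    """Decode an A64 'LDR Xt, [Xn, #imm]' (64-bit, unsigned offset) word.

    Hypothetical helper for illustration. Returns (rt, rn, byte_offset).
    """
    if (insn >> 22) != 0b1111100101:
        raise ValueError("not a 64-bit LDR (immediate, unsigned offset)")
    imm12 = (insn >> 10) & 0xFFF
    rn = (insn >> 5) & 0x1F
    rt = insn & 0x1F
    return rt, rn, imm12 * 8           # imm12 is scaled by the 8-byte access


FAULTING_INSN = 0xF9413660             # bracketed word in each "Code:" line

# test name -> (x19 from the register dump, reported faulting address)
OOPSES = {
    "fentry_fexit":  (0xFFFF0000C2A80C00, 0xFFFF0000C2A80E68),
    "fexit_sleep":   (0xFFFF0000C19C2400, 0xFFFF0000C19C2668),
    "fexit_bpf2bpf": (0xFFFF0000C278DC00, 0xFFFF0000C278DE68),
}

rt, rn, offset = decode_ldr_x_uimm(FAULTING_INSN)
print(f"ldr x{rt}, [x{rn}, #{offset:#x}]")

for test, (x19, fault_addr) in OOPSES.items():
    # True when base register + offset reproduces the faulting address
    print(test, hex(x19 + offset), x19 + offset == fault_addr)
```

The word decodes to ldr x0, [x19, #0x268], and in each of the three dumps x19 + 0x268 equals the reported faulting address, so the three crashes are the same read through the pointer held in x19 inside __bpf_tramp_enter.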
So, if the lack of ftrace direct calls were the reason, then the failure would be due to fexit programs not being supported on arm64.
But this says that the selftest never returns, so it is not related to ftrace direct call support but to another bug?
Fwiw, at least it is passing in the BPF CI now.
https://github.com/kernel-patches/bpf/actions/runs/9841781347/job/2716961000...