Hello,
Wei reported that loading his BPF program on a 5.10.200 kernel panics the host; this did not happen on 5.10.135. A test on the latest v5.10.238 still panics.
[ 26.531718] BUG: kernel NULL pointer dereference, address: 0000000000000168
[ 26.538093] #PF: supervisor read access in kernel mode
[ 26.542727] #PF: error_code(0x0000) - not-present page
[ 26.548093] PGD 10f3e9067 P4D 10f332067 PUD 10f0c5067 PMD 0
[ 26.553211] Oops: 0000 [#1] SMP NOPTI
[ 26.556531] CPU: 2 PID: 541 Comm: main Not tainted 5.10.238-00267-g01e7e36b8606 #63
[ 26.563816] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 26.572357] RIP: 0010:__mark_chain_precision+0x24b/0x4d0
[ 26.576572] Code: 51 01 be 20 00 00 00 4c 89 ef 48 63 d2 e8 bd df 31 00 89 c1 83 f8 1f 7f 29 48 63 d1 48 89 d0 48 c1 e0 04 48 29 d0 48 8d 04 c3 <83> 38 01 75 c3 0f b6 74 24 06 80 78 74 00 c6 40 74 01 44 0f 44 f6
[ 26.589100] RSP: 0018:ffa0000000ff7b60 EFLAGS: 00010216
[ 26.592612] RAX: 0000000000000168 RBX: 0000000000000000 RCX: 0000000000000003
[ 26.597416] RDX: 0000000000000003 RSI: 0000000000000020 RDI: ffa0000000ff7b78
[ 26.601362] RBP: 0000000000000003 R08: ffa0000000ff7b70 R09: 0000000000000004
[ 26.604261] R10: 0000000000000007 R11: ffa0000000425000 R12: ff11000102ee2000
[ 26.607202] R13: ffa0000000ff7b78 R14: 0000000000000000 R15: ff1100010ee37140
[ 26.610327] FS: 00000000007a0630(0000) GS:ff1100081c400000(0000) knlGS:0000000000000000
[ 26.613678] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 26.616105] CR2: 0000000000000168 CR3: 0000000115e72002 CR4: 0000000000371ee0
[ 26.619059] Call Trace:
[ 26.620118] adjust_reg_min_max_vals+0x133/0x340
[ 26.622048] ? krealloc+0x63/0xe0
[ 26.623435] do_check+0x38c/0xa80
[ 26.624859] do_check_common+0x15b/0x280
[ 26.626496] bpf_check+0xbe1/0xd30
[ 26.627939] ? srso_alias_return_thunk+0x5/0x7f
[ 26.629796] ? trace_hardirqs_on+0x1a/0xd0
[ 26.631503] ? srso_alias_return_thunk+0x5/0x7f
[ 26.633402] bpf_prog_load+0x422/0x8a0
[ 26.634987] ? srso_alias_return_thunk+0x5/0x7f
[ 26.636864] ? __handle_mm_fault+0x3cb/0x6d0
[ 26.638658] ? srso_alias_return_thunk+0x5/0x7f
[ 26.640543] ? lock_release+0xe3/0x110
[ 26.642114] __do_sys_bpf+0x485/0xdf0
[ 26.643624] do_syscall_64+0x33/0x40
[ 26.645110] entry_SYSCALL_64_after_hwframe+0x67/0xd1
[ 26.647190] RIP: 0033:0x409a6e
[ 26.648470] Code: 24 28 44 8b 44 24 2c e9 70 ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
[ 26.656154] RSP: 002b:000000c00199edc0 EFLAGS: 00000212 ORIG_RAX: 0000000000000141
[ 26.659451] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000409a6e
[ 26.662375] RDX: 0000000000000098 RSI: 000000c00199f290 RDI: 0000000000000005
[ 26.665267] RBP: 000000c00199ee00 R08: 0000000000000000 R09: 0000000000000000
[ 26.668204] R10: 0000000000000000 R11: 0000000000000212 R12: 0000000000000000
[ 26.671125] R13: 0000000000000080 R14: 000000c000002380 R15: 8080808080808080
[ 26.674085] Modules linked in:
[ 26.675363] CR2: 0000000000000168
[ 26.676772] ---[ end trace 3fc192ee4dabbf12 ]---
[ 26.678667] RIP: 0010:__mark_chain_precision+0x24b/0x4d0
[ 26.680926] Code: 51 01 be 20 00 00 00 4c 89 ef 48 63 d2 e8 bd df 31 00 89 c1 83 f8 1f 7f 29 48 63 d1 48 89 d0 48 c1 e0 04 48 29 d0 48 8d 04 c3 <83> 38 01 75 c3 0f b6 74 24 06 80 78 74 00 c6 40 74 01 44 0f 44 f6
[ 26.688665] RSP: 0018:ffa0000000ff7b60 EFLAGS: 00010216
[ 26.690828] RAX: 0000000000000168 RBX: 0000000000000000 RCX: 0000000000000003
[ 26.693777] RDX: 0000000000000003 RSI: 0000000000000020 RDI: ffa0000000ff7b78
[ 26.696680] RBP: 0000000000000003 R08: ffa0000000ff7b70 R09: 0000000000000004
[ 26.699651] R10: 0000000000000007 R11: ffa0000000425000 R12: ff11000102ee2000
[ 26.702561] R13: ffa0000000ff7b78 R14: 0000000000000000 R15: ff1100010ee37140
[ 26.705522] FS: 00000000007a0630(0000) GS:ff1100081c400000(0000) knlGS:0000000000000000
[ 26.708806] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 26.711179] CR2: 0000000000000168 CR3: 0000000115e72002 CR4: 0000000000371ee0
[ 26.714143] Kernel panic - not syncing: Fatal exception
[ 26.716893] Kernel Offset: disabled
[ 26.718911] Rebooting in 5 seconds..
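As a reading aid, the faulting address can be reproduced from the register dump. Decoding the bytes just before the marked <83> in the Code: line gives, as far as I can tell, movslq %ecx,%rdx; mov %rdx,%rax; shl $4,%rax; sub %rdx,%rax; lea (%rbx,%rax,8),%rax; cmpl $1,(%rax), that is, a 4-byte read at RBX + RCX * 120. With RBX = 0 and RCX = 3 this yields 0x168, which matches CR2 and suggests a NULL base pointer indexed at element 3. The helper name below is hypothetical, and the 120-byte stride comes from the decoded instructions, not from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Recompute the effective address formed by the decoded sequence:
 *   rax = idx; rax <<= 4; rax -= idx;   -> idx * 15
 *   rax = rbx + rax * 8                 -> base + idx * 120
 * "base" stands for RBX and "idx" for RCX in the oops above.
 */
static uint64_t oops_fault_addr(uint64_t base, uint64_t idx)
{
	uint64_t scaled = (idx << 4) - idx;	/* idx * 15 */

	return base + scaled * 8;		/* base + idx * 120 */
}
```

Under these assumptions, oops_fault_addr(0, 3) evaluates to 0x168, the address reported in the BUG line.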
I bisected the linux-5.10.y branch and found that the first bad commit is 2474ec58b96d ("bpf: allow precision tracking for programs with subprogs").
I noticed a commit in Linus' master branch that carries a Fixes tag for this bisected commit: 81335f90e8a8 ("bpf: unconditionally reset backtrack_state masks on global func exit"). I tried to apply it to the 5.10.y branch, but since the bases are quite different it does not apply cleanly. I ended up with the following diff:
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 40ac67a04ab75..71da33fb96552 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2118,11 +2118,9 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int frame, int r
 			bitmap_from_u64(mask, reg_mask);
 			for_each_set_bit(i, mask, 32) {
 				reg = &st->frame[0]->regs[i];
-				if (reg->type != SCALAR_VALUE) {
-					reg_mask &= ~(1u << i);
-					continue;
-				}
-				reg->precise = true;
+				reg_mask &= ~(1u << i);
+				if (reg->type == SCALAR_VALUE)
+					reg->precise = true;
 			}
 			return 0;
 		}
But it didn't make any difference.
Here are the steps to reproduce:
1. Clone https://github.com/bytedance/vArmor-ebpf and switch to the panic-analysis branch.
2. Run make build. A binary named main should be built. I used the Go compiler downloaded from https://go.dev/dl/go1.24.3.linux-amd64.tar.gz, but other Go compilers may also work.
3. Run main as root and it will panic the host (the kernel needs CONFIG_BPF_LSM).
The full dmesg and config are attached. Feel free to let me know if you need any additional info. Thanks.
P.S. linux-5.15.y has the same situation.
Ping?
On Thu, Jun 05, 2025 at 03:09:21PM +0800, Aaron Lu wrote:
> Hello,
>
> Wei reported that loading his BPF program on a 5.10.200 kernel panics the host; this did not happen on 5.10.135. A test on the latest v5.10.238 still panics.
If a fix is not easy for these stable kernels, should we revert this commit? Whatever the BPF program, the verifier should not panic the kernel.
Regarding the revert: per my test, the following four commits in the linux-5.10.y branch have to be reverted, and after that the kernel no longer panics:
commit 2474ec58b96d ("bpf: allow precision tracking for programs with subprogs")
commit 7ca3e7459f4a ("bpf: stop setting precise in current state")
commit 1952a4d5e4cf ("bpf: aggressively forget precise markings during state checkpointing")
commit 4af2d9ddb7e7 ("selftests/bpf: make test_align selftest more robust")
On Mon, Jun 16, 2025 at 03:06:17PM +0800, Aaron Lu wrote:
> If a fix is not easy for these stable kernels, should we revert this commit? Whatever the BPF program, the verifier should not panic the kernel.
>
> Regarding the revert: per my test, the following four commits in the linux-5.10.y branch have to be reverted, and after that the kernel no longer panics:
> commit 2474ec58b96d ("bpf: allow precision tracking for programs with subprogs")
> commit 7ca3e7459f4a ("bpf: stop setting precise in current state")
> commit 1952a4d5e4cf ("bpf: aggressively forget precise markings during state checkpointing")
> commit 4af2d9ddb7e7 ("selftests/bpf: make test_align selftest more robust")
Can you send the reverts for this, so that you get credit for finding and fixing this issue, and you can put the correct wording in the commit messages for why they need to be reverted?
thanks,
greg k-h
On Mon, Jun 23, 2025 at 10:17:15AM +0200, Greg Kroah-Hartman wrote:
> Can you send the reverts for this, so that you get credit for finding and fixing this issue, and you can put the correct wording in the commit messages for why they need to be reverted?
No problem, thanks for the info.
I have sent them: https://lore.kernel.org/stable/20250623115403.299-1-ziqianlu@bytedance.com/
On Mon, Jun 23, 2025 at 07:55:52PM +0800, Aaron Lu wrote:
> No problem, thanks for the info.
>
> I have sent them: https://lore.kernel.org/stable/20250623115403.299-1-ziqianlu@bytedance.com/
All now queued up, thanks!
greg k-h
On 2025/6/23 20:03, Greg Kroah-Hartman wrote:
>>>> Regarding the revert: per my test, the following four commits in the linux-5.10.y branch have to be reverted, and after that the kernel no longer panics:
>>>> commit 2474ec58b96d ("bpf: allow precision tracking for programs with subprogs")
>>>> commit 7ca3e7459f4a ("bpf: stop setting precise in current state")
>>>> commit 1952a4d5e4cf ("bpf: aggressively forget precise markings during state checkpointing")
>>>> commit 4af2d9ddb7e7 ("selftests/bpf: make test_align selftest more robust")
Hi Aaron, Greg,
Sorry for the late reply. I just found a fix [0] for this issue, so we don't need to revert this bugfix series. Hope that helps!
Link: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=4bb7e... [0]
> All now queued up, thanks!
>
> greg k-h
On Tue, Jun 24, 2025 at 09:32:54AM +0800, Pu Lehui wrote:
> Hi Aaron, Greg,
>
> Sorry for the late reply. I just found a fix [0] for this issue, so we don't need to revert this bugfix series. Hope that helps!
>
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=4bb7e... [0]
I can confirm this also fixed the panic issue on top of 5.10.238.
Hi Greg,
The cherry-pick is not clean, but the conflict is trivial to fix. I've appended the patch I used for testing below for your reference, in case you want to take it and drop the revert series. Thanks.
From f0e1047ee11e4ab902a413736e4fd4fb32b278c8 Mon Sep 17 00:00:00 2001
From: Andrii Nakryiko <andrii@kernel.org>
Date: Thu, 9 Nov 2023 16:26:37 -0800
Subject: [PATCH] bpf: fix precision backtracking instruction iteration
commit 4bb7ea946a370707315ab774432963ce47291946 upstream.
Fix an edge case in __mark_chain_precision() which prematurely stops backtracking instructions in a state if it happens that state's first and last instruction indexes are the same. This situations doesn't necessarily mean that there were no instructions simulated in a state, but rather that we starting from the instruction, jumped around a bit, and then ended up at the same instruction before checkpointing or marking precision.
To distinguish between these two possible situations, we need to consult jump history. If it's empty or contain a single record "bridging" parent state and first instruction of processed state, then we indeed backtracked all instructions in this state. But if history is not empty, we are definitely not done yet.
Move this logic inside get_prev_insn_idx() to contain it more nicely. Use -ENOENT return code to denote "we are out of instructions" situation.
This bug was exposed by verifier_loop1.c's bounded_recursion subtest, once the next fix in this patch set is applied.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Fixes: b5dc0163d8fd ("bpf: precise scalar_value tracking")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231110002638.4168352-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Aaron Lu <ziqianlu@bytedance.com>
---
 kernel/bpf/verifier.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index e6d50e371a2b8..75251870430e4 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1796,12 +1796,29 @@ static int push_jmp_history(struct bpf_verifier_env *env,
 
 /* Backtrack one insn at a time. If idx is not at the top of recorded
  * history then previous instruction came from straight line execution.
+ * Return -ENOENT if we exhausted all instructions within given state.
+ *
+ * It's legal to have a bit of a looping with the same starting and ending
+ * insn index within the same state, e.g.: 3->4->5->3, so just because current
+ * instruction index is the same as state's first_idx doesn't mean we are
+ * done. If there is still some jump history left, we should keep going. We
+ * need to take into account that we might have a jump history between given
+ * state's parent and itself, due to checkpointing. In this case, we'll have
+ * history entry recording a jump from last instruction of parent state and
+ * first instruction of given state.
  */
 static int get_prev_insn_idx(struct bpf_verifier_state *st, int i,
 			     u32 *history)
 {
 	u32 cnt = *history;
 
+	if (i == st->first_insn_idx) {
+		if (cnt == 0)
+			return -ENOENT;
+		if (cnt == 1 && st->jmp_history[0].idx == i)
+			return -ENOENT;
+	}
+
 	if (cnt && st->jmp_history[cnt - 1].idx == i) {
 		i = st->jmp_history[cnt - 1].prev_idx;
 		(*history)--;
@@ -2269,9 +2286,9 @@ static int __mark_chain_precision(struct bpf_verifier_env *env, int frame, int r
 			 * Nothing to be tracked further in the parent state.
 			 */
 			return 0;
-		if (i == first_idx)
-			break;
 		i = get_prev_insn_idx(st, i, &history);
+		if (i == -ENOENT)
+			break;
 		if (i >= env->prog->len) {
 			/* This can happen if backtracking reached insn 0
 			 * and there are still reg_mask or stack_mask
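To make the termination rule from the commit message concrete, here is a small user-space sketch; it is not kernel code. It mirrors the patched get_prev_insn_idx() logic from the diff and compares it with the old "if (i == first_idx) break;" check on the 3->4->5->3 example from the comment. The struct and function names (jmp_entry, sim_state, sim_*) are hypothetical. The history this note assumes has two entries: a bridging entry from an assumed parent instruction 1, plus the back-edge 5->3.

```c
#include <assert.h>
#include <errno.h>

/* Simplified stand-ins for the verifier's types; names are illustrative. */
struct jmp_entry {
	int prev_idx;	/* instruction the jump came from */
	int idx;	/* instruction the jump landed on */
};

struct sim_state {
	int first_insn_idx;
	const struct jmp_entry *jmp_history;
};

/* Common step: follow a recorded jump if one lands on i, else go to i - 1. */
static int sim_step_back(const struct sim_state *st, int i, unsigned int *history)
{
	unsigned int cnt = *history;

	if (cnt && st->jmp_history[cnt - 1].idx == i) {
		i = st->jmp_history[cnt - 1].prev_idx;
		(*history)--;
	} else {
		i--;
	}
	return i;
}

/* Patched rule: only stop at first_insn_idx once the history is used up. */
static int sim_prev_insn_idx(const struct sim_state *st, int i, unsigned int *history)
{
	unsigned int cnt = *history;

	if (i == st->first_insn_idx) {
		if (cnt == 0)
			return -ENOENT;
		if (cnt == 1 && st->jmp_history[0].idx == i)
			return -ENOENT;
	}
	return sim_step_back(st, i, history);
}

/* Count instructions visited when the loop ends on -ENOENT (patched). */
static int sim_walk_patched(const struct sim_state *st, int last_idx, unsigned int cnt)
{
	int visited = 0;
	int i = last_idx;

	for (;;) {
		visited++;			/* "backtrack" instruction i */
		i = sim_prev_insn_idx(st, i, &cnt);
		if (i < 0)			/* -ENOENT, or ran past insn 0 */
			break;
	}
	return visited;
}

/* Count instructions visited with the old "i == first_idx" check. */
static int sim_walk_unpatched(const struct sim_state *st, int last_idx, unsigned int cnt)
{
	int visited = 0;
	int i = last_idx;

	for (;;) {
		visited++;
		if (i == st->first_insn_idx)
			break;
		i = sim_step_back(st, i, &cnt);
		if (i < 0)			/* defensive: ran past insn 0 */
			break;
	}
	return visited;
}
```

With a state whose first_insn_idx is 3 and whose history is { {1, 3}, {5, 3} }, starting from instruction 3 the patched walk visits 3, 5, 4, 3 before returning -ENOENT, while the old check stops after the first instruction. Note that a history holding only the back-edge {5, 3} would be treated as a bridging entry by the cnt == 1 check, which is why the sketch assumes two entries.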
On 2025/6/24 11:52, Aaron Lu wrote:
> Acked-by: Eduard Zingerman <eddyz87@gmail.com>
> Fixes: b5dc0163d8fd ("bpf: precise scalar_value tracking")
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> Link: https://lore.kernel.org/r/20231110002638.4168352-3-andrii@kernel.org
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Alright, this patch should target linux-5.10.y and linux-5.15.y.

And it would be better to add the following tags here:

Reported-by: Wei Wei <weiwei.danny@bytedance.com>
Closes: https://lore.kernel.org/all/20250605070921.GA3795@bytedance/
> Signed-off-by: Aaron Lu <ziqianlu@bytedance.com>
On Tue, Jun 24, 2025 at 02:41:56PM +0800, Pu Lehui wrote:
> Alright, this patch should target linux-5.10.y and linux-5.15.y.
>
> And it would be better to add the following tags here:
>
> Reported-by: Wei Wei <weiwei.danny@bytedance.com>
> Closes: https://lore.kernel.org/all/20250605070921.GA3795@bytedance/
Thanks, I've dropped the reverts and now queued this up. Let's push out a -rc2 and see how that goes through testing...
greg k-h
On Tue, Jun 24, 2025 at 11:33:20AM +0100, Greg Kroah-Hartman wrote:
> Thanks, I've dropped the reverts and now queued this up. Let's push out a -rc2 and see how that goes through testing...
Thanks Greg.
The 5.15 stable tree also has this problem, and applying the above patch to 5.15.185 fixes it there too. I'd appreciate it if you could also queue it for the 5.15 stable branch. Thanks.