Hi,
Wei reported that loading his BPF program on a 5.10.200 kernel panics the host; this did not happen on 5.10.135. Testing on the latest v5.10.238 still reproduces the panic:
[ 26.531718] BUG: kernel NULL pointer dereference, address: 0000000000000168
[ 26.538093] #PF: supervisor read access in kernel mode
[ 26.542727] #PF: error_code(0x0000) - not-present page
[ 26.548093] PGD 10f3e9067 P4D 10f332067 PUD 10f0c5067 PMD 0
[ 26.553211] Oops: 0000 [#1] SMP NOPTI
[ 26.556531] CPU: 2 PID: 541 Comm: main Not tainted 5.10.238-00267-g01e7e36b8606 #63
[ 26.563816] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 26.572357] RIP: 0010:__mark_chain_precision+0x24b/0x4d0
[ 26.576572] Code: 51 01 be 20 00 00 00 4c 89 ef 48 63 d2 e8 bd df 31 00 89 c1 83 f8 1f 7f 29 48 63 d1 48 89 d0 48 c1 e0 04 48 29 d0 48 8d 04 c3 <83> 38 01 75 c3 0f b6 74 24 06 80 78 74 00 c6 40 74 01 44 0f 44 f6
[ 26.589100] RSP: 0018:ffa0000000ff7b60 EFLAGS: 00010216
[ 26.592612] RAX: 0000000000000168 RBX: 0000000000000000 RCX: 0000000000000003
[ 26.597416] RDX: 0000000000000003 RSI: 0000000000000020 RDI: ffa0000000ff7b78
[ 26.601362] RBP: 0000000000000003 R08: ffa0000000ff7b70 R09: 0000000000000004
[ 26.604261] R10: 0000000000000007 R11: ffa0000000425000 R12: ff11000102ee2000
[ 26.607202] R13: ffa0000000ff7b78 R14: 0000000000000000 R15: ff1100010ee37140
[ 26.610327] FS: 00000000007a0630(0000) GS:ff1100081c400000(0000) knlGS:0000000000000000
[ 26.613678] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 26.616105] CR2: 0000000000000168 CR3: 0000000115e72002 CR4: 0000000000371ee0
[ 26.619059] Call Trace:
[ 26.620118] adjust_reg_min_max_vals+0x133/0x340
[ 26.622048] ? krealloc+0x63/0xe0
[ 26.623435] do_check+0x38c/0xa80
[ 26.624859] do_check_common+0x15b/0x280
[ 26.626496] bpf_check+0xbe1/0xd30
[ 26.627939] ? srso_alias_return_thunk+0x5/0x7f
[ 26.629796] ? trace_hardirqs_on+0x1a/0xd0
[ 26.631503] ? srso_alias_return_thunk+0x5/0x7f
[ 26.633402] bpf_prog_load+0x422/0x8a0
[ 26.634987] ? srso_alias_return_thunk+0x5/0x7f
[ 26.636864] ? __handle_mm_fault+0x3cb/0x6d0
[ 26.638658] ? srso_alias_return_thunk+0x5/0x7f
[ 26.640543] ? lock_release+0xe3/0x110
[ 26.642114] __do_sys_bpf+0x485/0xdf0
[ 26.643624] do_syscall_64+0x33/0x40
[ 26.645110] entry_SYSCALL_64_after_hwframe+0x67/0xd1
[ 26.647190] RIP: 0033:0x409a6e
[ 26.648470] Code: 24 28 44 8b 44 24 2c e9 70 ff ff ff cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
[ 26.656154] RSP: 002b:000000c00199edc0 EFLAGS: 00000212 ORIG_RAX: 0000000000000141
[ 26.659451] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 0000000000409a6e
[ 26.662375] RDX: 0000000000000098 RSI: 000000c00199f290 RDI: 0000000000000005
[ 26.665267] RBP: 000000c00199ee00 R08: 0000000000000000 R09: 0000000000000000
[ 26.668204] R10: 0000000000000000 R11: 0000000000000212 R12: 0000000000000000
[ 26.671125] R13: 0000000000000080 R14: 000000c000002380 R15: 8080808080808080
[ 26.674085] Modules linked in:
[ 26.675363] CR2: 0000000000000168
[ 26.676772] ---[ end trace 3fc192ee4dabbf12 ]---
[ 26.678667] RIP: 0010:__mark_chain_precision+0x24b/0x4d0
[ 26.680926] Code: 51 01 be 20 00 00 00 4c 89 ef 48 63 d2 e8 bd df 31 00 89 c1 83 f8 1f 7f 29 48 63 d1 48 89 d0 48 c1 e0 04 48 29 d0 48 8d 04 c3 <83> 38 01 75 c3 0f b6 74 24 06 80 78 74 00 c6 40 74 01 44 0f 44 f6
[ 26.688665] RSP: 0018:ffa0000000ff7b60 EFLAGS: 00010216
[ 26.690828] RAX: 0000000000000168 RBX: 0000000000000000 RCX: 0000000000000003
[ 26.693777] RDX: 0000000000000003 RSI: 0000000000000020 RDI: ffa0000000ff7b78
[ 26.696680] RBP: 0000000000000003 R08: ffa0000000ff7b70 R09: 0000000000000004
[ 26.699651] R10: 0000000000000007 R11: ffa0000000425000 R12: ff11000102ee2000
[ 26.702561] R13: ffa0000000ff7b78 R14: 0000000000000000 R15: ff1100010ee37140
[ 26.705522] FS: 00000000007a0630(0000) GS:ff1100081c400000(0000) knlGS:0000000000000000
[ 26.708806] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 26.711179] CR2: 0000000000000168 CR3: 0000000115e72002 CR4: 0000000000371ee0
[ 26.714143] Kernel panic - not syncing: Fatal exception
[ 26.716893] Kernel Offset: disabled
[ 26.718911] Rebooting in 5 seconds..
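The faulting address can be cross-checked against the register dump in the oops. What follows is a minimal sketch, assuming the tail of the Code: bytes before the marked instruction decodes as shl $4 / sub / lea (%rbx,%rax,8),%rax, followed by the faulting cmpl $1,(%rax). fault_address() is a hypothetical helper written for illustration, not kernel code, and reading the resulting 120-byte stride as the size of one verifier register-state entry is an assumption.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): recompute the address that the
 * faulting instruction dereferenced, following the decoded instruction
 * sequence step by step.
 *
 *   base - value of RBX in the oops (array base pointer)
 *   idx  - value of RDX in the oops (element index)
 */
static uint64_t fault_address(uint64_t base, uint64_t idx)
{
	uint64_t scaled = (idx << 4) - idx;	/* shl $4 ; sub  => 15 * idx */

	return base + scaled * 8;		/* lea (%rbx,%rax,8) => base + 120 * idx */
}
```

With RBX = 0 and RDX = 3 taken from the dump, fault_address(0, 3) gives 0x168, which matches CR2. That is consistent with a NULL base pointer being indexed by register number 3.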
I bisected the linux-5.10.y branch and found that the first bad commit is 2474ec58b96d ("bpf: allow precision tracking for programs with subprogs").
This series reverts the above commit and the related commits. After the reverts, the kernel no longer panics.
For the detailed log and a reproducer, please see: https://lore.kernel.org/stable/20250605070921.GA3795@bytedance
Aaron Lu (4):
  Revert "selftests/bpf: make test_align selftest more robust"
  Revert "bpf: aggressively forget precise markings during state checkpointing"
  Revert "bpf: stop setting precise in current state"
  Revert "bpf: allow precision tracking for programs with subprogs"

 kernel/bpf/verifier.c                         | 175 ++----------------
 .../testing/selftests/bpf/prog_tests/align.c  |  36 ++--
 2 files changed, 26 insertions(+), 185 deletions(-)