This patch allows progs to elide a null check on statically known map lookup keys. In other words, if the verifier can statically prove that the lookup will be in-bounds, allow the prog to drop the null check.
This is useful for two reasons:
1. Large numbers of nullness checks (especially when they cannot fail) unnecessarily push progs towards BPF_COMPLEXITY_LIMIT_JMP_SEQ.
2. It forms a tighter contract between programmer and verifier.
For (1), bpftrace is starting to make heavier use of percpu scratch maps. As a result, for user scripts with a large number of unrolled loops, we are starting to hit jump complexity verification errors. These percpu lookups cannot fail anyway, as we only use static key values. Eliding nullness probably results in less work for the verifier as well.
For (2), percpu scratch maps are often used as a larger stack, as the current stack is limited to 512 bytes. In these situations, it is desirable for the programmer to express: "this lookup should never fail, and if it does, it means I messed up the code". By omitting the null check, the programmer can "ask" the verifier to double-check the logic.
Changes in v5:
* Dropped all acks
* Use s64 instead of long for const_map_key
* Ensure stack slot contains spilled reg before accessing spilled_ptr
* Ensure spilled reg is a scalar before accessing tnum const value
* Fix verifier selftest for 32-bit write to write at 8-byte alignment to ensure spill is tracked
* Introduce more precise tracking of helper stack accesses
* Do constant map key extraction as part of helper argument processing and then remove duplicated stack checks
* Use ret_flag instead of regs[BPF_REG_0].type
* Handle STACK_ZERO
* Fix bug in bpf_load_hdr_opt() arg annotation

Changes in v4:
* Only allow for CAP_BPF
* Add test for stack growing upwards
* Improve comment about stack growing upwards

Changes in v3:
* Check if stack is (erroneously) growing upwards
* Mention in commit message why existing tests needed change

Changes in v2:
* Added a check for when R2 is not a ptr to stack
* Added a check for when stack is uninitialized (no stack slot yet)
* Updated existing tests to account for null elision
* Added test case for when R2 can be both const and non-const
Daniel Xu (5):
  bpf: verifier: Add missing newline on verbose() call
  bpf: tcp: Mark bpf_load_hdr_opt() arg2 as read-write
  bpf: verifier: Refactor helper access type tracking
  bpf: verifier: Support eliding map lookup nullness
  bpf: selftests: verifier: Add nullness elision tests

 kernel/bpf/verifier.c | 127 ++++++++---
 net/core/filter.c | 2 +-
 .../testing/selftests/bpf/progs/dynptr_fail.c | 6 +-
 tools/testing/selftests/bpf/progs/iters.c | 14 +-
 .../selftests/bpf/progs/map_kptr_fail.c | 2 +-
 .../selftests/bpf/progs/test_global_func10.c | 2 +-
 .../selftests/bpf/progs/uninit_stack.c | 29 ---
 .../bpf/progs/verifier_array_access.c | 214 ++++++++++++++++++
 .../bpf/progs/verifier_basic_stack.c | 2 +-
 .../selftests/bpf/progs/verifier_const_or.c | 4 +-
 .../progs/verifier_helper_access_var_len.c | 12 +-
 .../selftests/bpf/progs/verifier_int_ptr.c | 2 +-
 .../selftests/bpf/progs/verifier_map_in_map.c | 2 +-
 .../selftests/bpf/progs/verifier_mtu.c | 2 +-
 .../selftests/bpf/progs/verifier_raw_stack.c | 4 +-
 .../selftests/bpf/progs/verifier_unpriv.c | 2 +-
 .../selftests/bpf/progs/verifier_var_off.c | 8 +-
 tools/testing/selftests/bpf/verifier/calls.c | 2 +-
 .../testing/selftests/bpf/verifier/map_kptr.c | 2 +-
 19 files changed, 342 insertions(+), 96 deletions(-)
Previously, the verifier was treating all PTR_TO_STACK registers passed to a helper call as potentially written to by the helper. However, all calls to check_stack_range_initialized() already have precise access type information available.
Rather than treat ACCESS_HELPER as a proxy for BPF_WRITE, pass enum bpf_access_type to check_stack_range_initialized() to more precisely track helper arguments.
One benefit from this precision is that registers tracked as valid spills and passed as a read-only helper argument remain tracked after the call, rather than being marked STACK_MISC.
An additional benefit is that the verifier logs are more precise: stack access errors for helper arguments now say "read from" or "write to" instead of "indirect access to" or "indirect read from". See the included selftest updates for examples.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 kernel/bpf/verifier.c | 45 +++++++------------
 .../testing/selftests/bpf/progs/dynptr_fail.c | 6 +--
 .../selftests/bpf/progs/test_global_func10.c | 2 +-
 .../selftests/bpf/progs/uninit_stack.c | 29 ------------
 .../bpf/progs/verifier_basic_stack.c | 2 +-
 .../selftests/bpf/progs/verifier_const_or.c | 4 +-
 .../progs/verifier_helper_access_var_len.c | 12 ++---
 .../selftests/bpf/progs/verifier_int_ptr.c | 2 +-
 .../selftests/bpf/progs/verifier_mtu.c | 2 +-
 .../selftests/bpf/progs/verifier_raw_stack.c | 4 +-
 .../selftests/bpf/progs/verifier_unpriv.c | 2 +-
 .../selftests/bpf/progs/verifier_var_off.c | 8 ++--
 tools/testing/selftests/bpf/verifier/calls.c | 2 +-
 13 files changed, 39 insertions(+), 81 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 630150013479..58b36cc96bd5 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5302,7 +5302,7 @@ enum bpf_access_src {
 static int check_stack_range_initialized(struct bpf_verifier_env *env,
 					 int regno, int off, int access_size,
 					 bool zero_size_allowed,
-					 enum bpf_access_src type,
+					 enum bpf_access_type type,
 					 struct bpf_call_arg_meta *meta);
 
 static struct bpf_reg_state *reg_state(struct bpf_verifier_env *env, int regno)
@@ -5335,7 +5335,7 @@ static int check_stack_read_var_off(struct bpf_verifier_env *env,
 	/* Note that we pass a NULL meta, so raw access will not be permitted.
 	 */
 	err = check_stack_range_initialized(env, ptr_regno, off, size,
-					    false, ACCESS_DIRECT, NULL);
+					    false, BPF_READ, NULL);
 	if (err)
 		return err;
 
@@ -7205,7 +7205,7 @@ static int check_stack_slot_within_bounds(struct bpf_verifier_env *env,
 static int check_stack_access_within_bounds(
 		struct bpf_verifier_env *env,
 		int regno, int off, int access_size,
-		enum bpf_access_src src, enum bpf_access_type type)
+		enum bpf_access_type type)
 {
 	struct bpf_reg_state *regs = cur_regs(env);
 	struct bpf_reg_state *reg = regs + regno;
@@ -7214,10 +7214,7 @@ static int check_stack_access_within_bounds(
 	int err;
 	char *err_extra;
 
-	if (src == ACCESS_HELPER)
-		/* We don't know if helpers are reading or writing (or both). */
-		err_extra = " indirect access to";
-	else if (type == BPF_READ)
+	if (type == BPF_READ)
 		err_extra = " read from";
 	else
 		err_extra = " write to";
@@ -7435,7 +7432,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 
 	} else if (reg->type == PTR_TO_STACK) {
 		/* Basic bounds checks. */
-		err = check_stack_access_within_bounds(env, regno, off, size, ACCESS_DIRECT, t);
+		err = check_stack_access_within_bounds(env, regno, off, size, t);
 		if (err)
 			return err;
 
@@ -7655,13 +7652,11 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
 static int check_stack_range_initialized(
 		struct bpf_verifier_env *env, int regno, int off,
 		int access_size, bool zero_size_allowed,
-		enum bpf_access_src type, struct bpf_call_arg_meta *meta)
+		enum bpf_access_type type, struct bpf_call_arg_meta *meta)
 {
 	struct bpf_reg_state *reg = reg_state(env, regno);
 	struct bpf_func_state *state = func(env, reg);
 	int err, min_off, max_off, i, j, slot, spi;
-	char *err_extra = type == ACCESS_HELPER ? " indirect" : "";
-	enum bpf_access_type bounds_check_type;
 	/* Some accesses can write anything into the stack, others are
 	 * read-only.
 	 */
@@ -7672,18 +7667,10 @@ static int check_stack_range_initialized(
 		return -EACCES;
 	}
 
-	if (type == ACCESS_HELPER) {
-		/* The bounds checks for writes are more permissive than for
-		 * reads. However, if raw_mode is not set, we'll do extra
-		 * checks below.
-		 */
-		bounds_check_type = BPF_WRITE;
+	if (type == BPF_WRITE)
 		clobber = true;
-	} else {
-		bounds_check_type = BPF_READ;
-	}
-	err = check_stack_access_within_bounds(env, regno, off, access_size,
-					       type, bounds_check_type);
+
+	err = check_stack_access_within_bounds(env, regno, off, access_size, type);
 	if (err)
 		return err;
 
@@ -7700,8 +7687,8 @@ static int check_stack_range_initialized(
 		char tn_buf[48];
 
 		tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
-		verbose(env, "R%d%s variable offset stack access prohibited for !root, var_off=%s\n",
-			regno, err_extra, tn_buf);
+		verbose(env, "R%d variable offset stack access prohibited for !root, var_off=%s\n",
+			regno, tn_buf);
 		return -EACCES;
 	}
 	/* Only initialized buffer on stack is allowed to be accessed
@@ -7782,14 +7769,14 @@ static int check_stack_range_initialized(
 		}
 
 		if (tnum_is_const(reg->var_off)) {
-			verbose(env, "invalid%s read from stack R%d off %d+%d size %d\n",
-				err_extra, regno, min_off, i - min_off, access_size);
+			verbose(env, "invalid read from stack R%d off %d+%d size %d\n",
+				regno, min_off, i - min_off, access_size);
 		} else {
 			char tn_buf[48];
 
 			tnum_strn(tn_buf, sizeof(tn_buf), reg->var_off);
-			verbose(env, "invalid%s read from stack R%d var_off %s+%d size %d\n",
-				err_extra, regno, tn_buf, i - min_off, access_size);
+			verbose(env, "invalid read from stack R%d var_off %s+%d size %d\n",
+				regno, tn_buf, i - min_off, access_size);
 		}
 		return -EACCES;
 mark:
@@ -7864,7 +7851,7 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
 		return check_stack_range_initialized(
 				env,
 				regno, reg->off, access_size,
-				zero_size_allowed, ACCESS_HELPER, meta);
+				zero_size_allowed, access_type, meta);
 	case PTR_TO_BTF_ID:
 		return check_ptr_to_btf_access(env, regs, regno, reg->off,
 					       access_size, BPF_READ, -1);
diff --git a/tools/testing/selftests/bpf/progs/dynptr_fail.c b/tools/testing/selftests/bpf/progs/dynptr_fail.c
index dfd817d0348c..bd8f15229f5c 100644
--- a/tools/testing/selftests/bpf/progs/dynptr_fail.c
+++ b/tools/testing/selftests/bpf/progs/dynptr_fail.c
@@ -192,7 +192,7 @@ int ringbuf_invalid_api(void *ctx)
 
 /* Can't add a dynptr to a map */
 SEC("?raw_tp")
-__failure __msg("invalid indirect read from stack")
+__failure __msg("invalid read from stack")
 int add_dynptr_to_map1(void *ctx)
 {
 	struct bpf_dynptr ptr;
@@ -210,7 +210,7 @@ int add_dynptr_to_map1(void *ctx)
 
 /* Can't add a struct with an embedded dynptr to a map */
 SEC("?raw_tp")
-__failure __msg("invalid indirect read from stack")
+__failure __msg("invalid read from stack")
 int add_dynptr_to_map2(void *ctx)
 {
 	struct test_info x;
@@ -398,7 +398,7 @@ int data_slice_missing_null_check2(void *ctx)
  * dynptr argument
  */
 SEC("?raw_tp")
-__failure __msg("invalid indirect read from stack")
+__failure __msg("invalid read from stack")
 int invalid_helper1(void *ctx)
 {
 	struct bpf_dynptr ptr;
diff --git a/tools/testing/selftests/bpf/progs/test_global_func10.c b/tools/testing/selftests/bpf/progs/test_global_func10.c
index 5da001ca57a5..09d027bd3ea8 100644
--- a/tools/testing/selftests/bpf/progs/test_global_func10.c
+++ b/tools/testing/selftests/bpf/progs/test_global_func10.c
@@ -26,7 +26,7 @@ __noinline int foo(const struct Big *big)
 }
 
 SEC("cgroup_skb/ingress")
-__failure __msg("invalid indirect access to stack")
+__failure __msg("invalid read from stack")
 int global_func10(struct __sk_buff *skb)
 {
 	const struct Small small = {.x = skb->len };
diff --git a/tools/testing/selftests/bpf/progs/uninit_stack.c b/tools/testing/selftests/bpf/progs/uninit_stack.c
index 8a403470e557..87a2f8f7e92a 100644
--- a/tools/testing/selftests/bpf/progs/uninit_stack.c
+++ b/tools/testing/selftests/bpf/progs/uninit_stack.c
@@ -55,33 +55,4 @@ exit_%=: r0 = 0; \
 		: __clobber_all);
 }
 
-static __noinline void dummy(void) {}
-
-/* Pass a pointer to uninitialized stack memory to a helper.
- * Passed memory block should be marked as STACK_MISC after helper call.
- */
-SEC("socket")
-__log_level(7) __msg("fp-104=mmmmmmmm")
-__naked int helper_uninit_to_misc(void *ctx)
-{
-	asm volatile ("				\
-		/* force stack depth to be 128 */	\
-		*(u64*)(r10 - 128) = r1;		\
-		r1 = r10;				\
-		r1 += -128;				\
-		r2 = 32;				\
-		call %[bpf_trace_printk];		\
-		/* Call to dummy() forces print_verifier_state(..., true),	\
-		 * thus showing the stack state, matched by __msg().	\
-		 */					\
-		call %[dummy];				\
-		r0 = 0;					\
-		exit;					\
-"
-		:
-		: __imm(bpf_trace_printk),
-		  __imm(dummy)
-		: __clobber_all);
-}
-
 char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/verifier_basic_stack.c b/tools/testing/selftests/bpf/progs/verifier_basic_stack.c
index 8d77cc5323d3..fb62e09f2114 100644
--- a/tools/testing/selftests/bpf/progs/verifier_basic_stack.c
+++ b/tools/testing/selftests/bpf/progs/verifier_basic_stack.c
@@ -28,7 +28,7 @@ __naked void stack_out_of_bounds(void)
 SEC("socket")
 __description("uninitialized stack1")
 __success __log_level(4) __msg("stack depth 8")
-__failure_unpriv __msg_unpriv("invalid indirect read from stack")
+__failure_unpriv __msg_unpriv("invalid read from stack")
 __naked void uninitialized_stack1(void)
 {
 	asm volatile ("					\
diff --git a/tools/testing/selftests/bpf/progs/verifier_const_or.c b/tools/testing/selftests/bpf/progs/verifier_const_or.c
index ba8922b2eebd..68c568c3c3a0 100644
--- a/tools/testing/selftests/bpf/progs/verifier_const_or.c
+++ b/tools/testing/selftests/bpf/progs/verifier_const_or.c
@@ -25,7 +25,7 @@ __naked void constant_should_keep_constant_type(void)
 
 SEC("tracepoint")
 __description("constant register |= constant should not bypass stack boundary checks")
-__failure __msg("invalid indirect access to stack R1 off=-48 size=58")
+__failure __msg("invalid write to stack R1 off=-48 size=58")
 __naked void not_bypass_stack_boundary_checks_1(void)
 {
 	asm volatile ("					\
@@ -62,7 +62,7 @@ __naked void register_should_keep_constant_type(void)
 
 SEC("tracepoint")
 __description("constant register |= constant register should not bypass stack boundary checks")
-__failure __msg("invalid indirect access to stack R1 off=-48 size=58")
+__failure __msg("invalid write to stack R1 off=-48 size=58")
 __naked void not_bypass_stack_boundary_checks_2(void)
 {
 	asm volatile ("					\
diff --git a/tools/testing/selftests/bpf/progs/verifier_helper_access_var_len.c b/tools/testing/selftests/bpf/progs/verifier_helper_access_var_len.c
index 50c6b22606f6..f2c54e4d89eb 100644
--- a/tools/testing/selftests/bpf/progs/verifier_helper_access_var_len.c
+++ b/tools/testing/selftests/bpf/progs/verifier_helper_access_var_len.c
@@ -67,7 +67,7 @@ SEC("socket")
 __description("helper access to variable memory: stack, bitwise AND, zero included")
 /* in privileged mode reads from uninitialized stack locations are permitted */
 __success __failure_unpriv
-__msg_unpriv("invalid indirect read from stack R2 off -64+0 size 64")
+__msg_unpriv("invalid read from stack R2 off -64+0 size 64")
 __retval(0)
 __naked void stack_bitwise_and_zero_included(void)
 {
@@ -100,7 +100,7 @@ __naked void stack_bitwise_and_zero_included(void)
 
 SEC("tracepoint")
 __description("helper access to variable memory: stack, bitwise AND + JMP, wrong max")
-__failure __msg("invalid indirect access to stack R1 off=-64 size=65")
+__failure __msg("invalid write to stack R1 off=-64 size=65")
 __naked void bitwise_and_jmp_wrong_max(void)
 {
 	asm volatile ("					\
@@ -187,7 +187,7 @@ l0_%=: r0 = 0; \
 
 SEC("tracepoint")
 __description("helper access to variable memory: stack, JMP, bounds + offset")
-__failure __msg("invalid indirect access to stack R1 off=-64 size=65")
+__failure __msg("invalid write to stack R1 off=-64 size=65")
 __naked void memory_stack_jmp_bounds_offset(void)
 {
 	asm volatile ("					\
@@ -211,7 +211,7 @@ l0_%=: r0 = 0; \
 
 SEC("tracepoint")
 __description("helper access to variable memory: stack, JMP, wrong max")
-__failure __msg("invalid indirect access to stack R1 off=-64 size=65")
+__failure __msg("invalid write to stack R1 off=-64 size=65")
 __naked void memory_stack_jmp_wrong_max(void)
 {
 	asm volatile ("					\
@@ -260,7 +260,7 @@ SEC("socket")
 __description("helper access to variable memory: stack, JMP, no min check")
 /* in privileged mode reads from uninitialized stack locations are permitted */
 __success __failure_unpriv
-__msg_unpriv("invalid indirect read from stack R2 off -64+0 size 64")
+__msg_unpriv("invalid read from stack R2 off -64+0 size 64")
 __retval(0)
 __naked void stack_jmp_no_min_check(void)
 {
@@ -750,7 +750,7 @@ SEC("socket")
 __description("helper access to variable memory: 8 bytes leak")
 /* in privileged mode reads from uninitialized stack locations are permitted */
 __success __failure_unpriv
-__msg_unpriv("invalid indirect read from stack R2 off -64+32 size 64")
+__msg_unpriv("invalid read from stack R2 off -64+32 size 64")
 __retval(0)
 __naked void variable_memory_8_bytes_leak(void)
 {
diff --git a/tools/testing/selftests/bpf/progs/verifier_int_ptr.c b/tools/testing/selftests/bpf/progs/verifier_int_ptr.c
index 5f2efb895edb..59e34d558654 100644
--- a/tools/testing/selftests/bpf/progs/verifier_int_ptr.c
+++ b/tools/testing/selftests/bpf/progs/verifier_int_ptr.c
@@ -96,7 +96,7 @@ __naked void arg_ptr_to_long_misaligned(void)
 
 SEC("cgroup/sysctl")
 __description("arg pointer to long size < sizeof(long)")
-__failure __msg("invalid indirect access to stack R4 off=-4 size=8")
+__failure __msg("invalid write to stack R4 off=-4 size=8")
 __naked void to_long_size_sizeof_long(void)
 {
 	asm volatile ("					\
diff --git a/tools/testing/selftests/bpf/progs/verifier_mtu.c b/tools/testing/selftests/bpf/progs/verifier_mtu.c
index 4ccf1ebc42d1..256956ea1ac5 100644
--- a/tools/testing/selftests/bpf/progs/verifier_mtu.c
+++ b/tools/testing/selftests/bpf/progs/verifier_mtu.c
@@ -8,7 +8,7 @@
 SEC("tc/ingress")
 __description("uninit/mtu: write rejected")
 __success __caps_unpriv(CAP_BPF|CAP_NET_ADMIN)
-__failure_unpriv __msg_unpriv("invalid indirect read from stack")
+__failure_unpriv __msg_unpriv("invalid read from stack")
 int tc_uninit_mtu(struct __sk_buff *ctx)
 {
 	__u32 mtu;
diff --git a/tools/testing/selftests/bpf/progs/verifier_raw_stack.c b/tools/testing/selftests/bpf/progs/verifier_raw_stack.c
index 7cc83acac727..c689665e07b9 100644
--- a/tools/testing/selftests/bpf/progs/verifier_raw_stack.c
+++ b/tools/testing/selftests/bpf/progs/verifier_raw_stack.c
@@ -236,7 +236,7 @@ __naked void load_bytes_spilled_regs_data(void)
 
 SEC("tc")
 __description("raw_stack: skb_load_bytes, invalid access 1")
-__failure __msg("invalid indirect access to stack R3 off=-513 size=8")
+__failure __msg("invalid write to stack R3 off=-513 size=8")
 __naked void load_bytes_invalid_access_1(void)
 {
 	asm volatile ("					\
@@ -255,7 +255,7 @@ __naked void load_bytes_invalid_access_1(void)
 
 SEC("tc")
 __description("raw_stack: skb_load_bytes, invalid access 2")
-__failure __msg("invalid indirect access to stack R3 off=-1 size=8")
+__failure __msg("invalid write to stack R3 off=-1 size=8")
 __naked void load_bytes_invalid_access_2(void)
 {
 	asm volatile ("					\
diff --git a/tools/testing/selftests/bpf/progs/verifier_unpriv.c b/tools/testing/selftests/bpf/progs/verifier_unpriv.c
index 7ea535bfbacd..a4a5e2071604 100644
--- a/tools/testing/selftests/bpf/progs/verifier_unpriv.c
+++ b/tools/testing/selftests/bpf/progs/verifier_unpriv.c
@@ -199,7 +199,7 @@ __naked void pass_pointer_to_helper_function(void)
 SEC("socket")
 __description("unpriv: indirectly pass pointer on stack to helper function")
 __success __failure_unpriv
-__msg_unpriv("invalid indirect read from stack R2 off -8+0 size 8")
+__msg_unpriv("invalid read from stack R2 off -8+0 size 8")
 __retval(0)
 __naked void on_stack_to_helper_function(void)
 {
diff --git a/tools/testing/selftests/bpf/progs/verifier_var_off.c b/tools/testing/selftests/bpf/progs/verifier_var_off.c
index c810f4f6f479..1d36d01b746e 100644
--- a/tools/testing/selftests/bpf/progs/verifier_var_off.c
+++ b/tools/testing/selftests/bpf/progs/verifier_var_off.c
@@ -203,7 +203,7 @@ __naked void stack_write_clobbers_spilled_regs(void)
 
 SEC("sockops")
 __description("indirect variable-offset stack access, unbounded")
-__failure __msg("invalid unbounded variable-offset indirect access to stack R4")
+__failure __msg("invalid unbounded variable-offset write to stack R4")
 __naked void variable_offset_stack_access_unbounded(void)
 {
 	asm volatile ("					\
@@ -236,7 +236,7 @@ l0_%=: r0 = 0; \
 
 SEC("lwt_in")
 __description("indirect variable-offset stack access, max out of bound")
-__failure __msg("invalid variable-offset indirect access to stack R2")
+__failure __msg("invalid variable-offset read from stack R2")
 __naked void access_max_out_of_bound(void)
 {
 	asm volatile ("					\
@@ -269,7 +269,7 @@ __naked void access_max_out_of_bound(void)
  */
 SEC("socket")
 __description("indirect variable-offset stack access, zero-sized, max out of bound")
-__failure __msg("invalid variable-offset indirect access to stack R1")
+__failure __msg("invalid variable-offset write to stack R1")
 __naked void zero_sized_access_max_out_of_bound(void)
 {
 	asm volatile ("					\
@@ -294,7 +294,7 @@ __naked void zero_sized_access_max_out_of_bound(void)
 
 SEC("lwt_in")
 __description("indirect variable-offset stack access, min out of bound")
-__failure __msg("invalid variable-offset indirect access to stack R2")
+__failure __msg("invalid variable-offset read from stack R2")
 __naked void access_min_out_of_bound(void)
 {
 	asm volatile ("					\
diff --git a/tools/testing/selftests/bpf/verifier/calls.c b/tools/testing/selftests/bpf/verifier/calls.c
index 7afc2619ab14..18596ae0b0c1 100644
--- a/tools/testing/selftests/bpf/verifier/calls.c
+++ b/tools/testing/selftests/bpf/verifier/calls.c
@@ -2252,7 +2252,7 @@
 	BPF_EXIT_INSN(),
 	},
 	.fixup_map_hash_48b = { 7 },
-	.errstr_unpriv = "invalid indirect read from stack R2 off -8+0 size 8",
+	.errstr_unpriv = "invalid read from stack R2 off -8+0 size 8",
 	.result_unpriv = REJECT,
 	/* in privileged mode reads from uninitialized stack locations are permitted */
 	.result = ACCEPT,
On Thu, 2024-12-12 at 16:22 -0700, Daniel Xu wrote:
> [...]
I think this change is ok. With it, only one use of 'enum bpf_access_src' remains, but it doesn't look like it could be removed.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
[...]
> --- a/tools/testing/selftests/bpf/progs/uninit_stack.c
> +++ b/tools/testing/selftests/bpf/progs/uninit_stack.c
> @@ -55,33 +55,4 @@ exit_%=: r0 = 0; \
>  		: __clobber_all);
>  }
>  
> -static __noinline void dummy(void) {}
> -
> -/* Pass a pointer to uninitialized stack memory to a helper.
> - * Passed memory block should be marked as STACK_MISC after helper call.
> - */
> -SEC("socket")
> -__log_level(7) __msg("fp-104=mmmmmmmm")
> -__naked int helper_uninit_to_misc(void *ctx)

Is it possible to pick a helper that writes into memory and not delete
this test?

> -{
> -	asm volatile ("				\
> -		/* force stack depth to be 128 */	\
> -		*(u64*)(r10 - 128) = r1;		\
> -		r1 = r10;				\
> -		r1 += -128;				\
> -		r2 = 32;				\
> -		call %[bpf_trace_printk];		\
> -		/* Call to dummy() forces print_verifier_state(..., true),	\
> -		 * thus showing the stack state, matched by __msg().	\
> -		 */					\
> -		call %[dummy];				\
> -		r0 = 0;					\
> -		exit;					\
> -"
> -		:
> -		: __imm(bpf_trace_printk),
> -		  __imm(dummy)
> -		: __clobber_all);
> -}
[...]
On Thu, Dec 12, 2024 at 08:04:28PM GMT, Eduard Zingerman wrote:
> On Thu, 2024-12-12 at 16:22 -0700, Daniel Xu wrote:
> > [...]
> > -__naked int helper_uninit_to_misc(void *ctx)
>
> Is it possible to pick a helper that writes into memory and not delete
> this test?

Yeah, good idea. Will do.
This commit allows progs to elide a null check on statically known map lookup keys. In other words, if the verifier can statically prove that the lookup will be in-bounds, allow the prog to drop the null check.
This is useful for two reasons:
1. Large numbers of nullness checks (especially when they cannot fail) unnecessarily push progs towards BPF_COMPLEXITY_LIMIT_JMP_SEQ.
2. It forms a tighter contract between programmer and verifier.
For (1), bpftrace is starting to make heavier use of percpu scratch maps. As a result, for user scripts with a large number of unrolled loops, we are starting to hit jump complexity verification errors. These percpu lookups cannot fail anyway, as we only use static key values. Eliding nullness probably results in less work for the verifier as well.
For (2), percpu scratch maps are often used as a larger stack, as the current stack is limited to 512 bytes. In these situations, it is desirable for the programmer to express: "this lookup should never fail, and if it does, it means I messed up the code". By omitting the null check, the programmer can "ask" the verifier to double-check the logic.
Tests also have to be updated in sync with these changes, as the verifier is more efficient with this change. Notably, the iters.c tests had to be changed to use a map type that still requires null checks, as they exercise verifier tracking logic w.r.t. iterators.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 kernel/bpf/verifier.c | 80 ++++++++++++++++++-
 tools/testing/selftests/bpf/progs/iters.c | 14 ++--
 .../selftests/bpf/progs/map_kptr_fail.c | 2 +-
 .../selftests/bpf/progs/verifier_map_in_map.c | 2 +-
 .../testing/selftests/bpf/verifier/map_kptr.c | 2 +-
 5 files changed, 87 insertions(+), 13 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 58b36cc96bd5..4947ef884a18 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -287,6 +287,7 @@ struct bpf_call_arg_meta { u32 ret_btf_id; u32 subprogno; struct btf_field *kptr_field; + s64 const_map_key; };
struct bpf_kfunc_call_arg_meta { @@ -9163,6 +9164,53 @@ static int check_reg_const_str(struct bpf_verifier_env *env, return 0; }
+/* Returns constant key value if possible, else -1 */ +static s64 get_constant_map_key(struct bpf_verifier_env *env, + struct bpf_reg_state *key, + u32 key_size) +{ + struct bpf_func_state *state = func(env, key); + struct bpf_reg_state *reg; + int zero_size = 0; + int stack_off; + u8 *stype; + int slot; + int spi; + int i; + + if (!env->bpf_capable) + return -1; + if (key->type != PTR_TO_STACK) + return -1; + if (!tnum_is_const(key->var_off)) + return -1; + + stack_off = key->off + key->var_off.value; + slot = -stack_off - 1; + spi = slot / BPF_REG_SIZE; + + /* First handle precisely tracked STACK_ZERO, up to BPF_REG_SIZE */ + stype = state->stack[spi].slot_type; + for (i = 0; i < BPF_REG_SIZE && stype[i] == STACK_ZERO; i++) + zero_size++; + if (zero_size == key_size) + return 0; + + if (!is_spilled_reg(&state->stack[spi])) + /* Not pointer to stack */ + return -1; + + reg = &state->stack[spi].spilled_ptr; + if (reg->type != SCALAR_VALUE) + /* Only scalars are valid array map keys */ + return -1; + else if (!tnum_is_const(reg->var_off)) + /* Stack value not statically known */ + return -1; + + return reg->var_off.value; +} + static int check_func_arg(struct bpf_verifier_env *env, u32 arg, struct bpf_call_arg_meta *meta, const struct bpf_func_proto *fn, @@ -9173,6 +9221,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, enum bpf_arg_type arg_type = fn->arg_type[arg]; enum bpf_reg_type type = reg->type; u32 *arg_btf_id = NULL; + u32 key_size; int err = 0; bool mask;
@@ -9307,8 +9356,11 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg, verbose(env, "invalid map_ptr to access map->key\n"); return -EACCES; } - err = check_helper_mem_access(env, regno, meta->map_ptr->key_size, - BPF_READ, false, NULL); + key_size = meta->map_ptr->key_size; + err = check_helper_mem_access(env, regno, key_size, BPF_READ, false, NULL); + if (err) + return err; + meta->const_map_key = get_constant_map_key(env, reg, key_size); break; case ARG_PTR_TO_MAP_VALUE: if (type_may_be_null(arg_type) && register_is_null(reg)) @@ -10833,6 +10885,21 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno state->callback_subprogno == subprogno); }
+/* Returns whether or not the given map type can potentially elide
+ * lookup return value nullness check. This is possible if the key
+ * is statically known.
+ */
+static bool can_elide_value_nullness(enum bpf_map_type type)
+{
+	switch (type) {
+	case BPF_MAP_TYPE_ARRAY:
+	case BPF_MAP_TYPE_PERCPU_ARRAY:
+		return true;
+	default:
+		return false;
+	}
+}
+
 static int get_helper_proto(struct bpf_verifier_env *env, int func_id,
 			    const struct bpf_func_proto **ptr)
 {
@@ -11199,10 +11266,17 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 				"kernel subsystem misconfigured verifier\n");
 			return -EINVAL;
 		}
+
+		if (func_id == BPF_FUNC_map_lookup_elem &&
+		    can_elide_value_nullness(meta.map_ptr->map_type) &&
+		    meta.const_map_key >= 0 &&
+		    meta.const_map_key < meta.map_ptr->max_entries)
+			ret_flag &= ~PTR_MAYBE_NULL;
+
 		regs[BPF_REG_0].map_ptr = meta.map_ptr;
 		regs[BPF_REG_0].map_uid = meta.map_uid;
 		regs[BPF_REG_0].type = PTR_TO_MAP_VALUE | ret_flag;
-		if (!type_may_be_null(ret_type) &&
+		if (!type_may_be_null(ret_flag) &&
 		    btf_record_has_field(meta.map_ptr->record, BPF_SPIN_LOCK)) {
 			regs[BPF_REG_0].id = ++env->id_gen;
 		}
diff --git a/tools/testing/selftests/bpf/progs/iters.c b/tools/testing/selftests/bpf/progs/iters.c
index 7c969c127573..190822b2f08b 100644
--- a/tools/testing/selftests/bpf/progs/iters.c
+++ b/tools/testing/selftests/bpf/progs/iters.c
@@ -524,11 +524,11 @@ int iter_subprog_iters(const void *ctx)
 }
 
 struct {
-	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(type, BPF_MAP_TYPE_HASH);
 	__type(key, int);
 	__type(value, int);
 	__uint(max_entries, 1000);
-} arr_map SEC(".maps");
+} hash_map SEC(".maps");
 
 SEC("?raw_tp")
 __failure __msg("invalid mem access 'scalar'")
@@ -539,7 +539,7 @@ int iter_err_too_permissive1(const void *ctx)
 
 	MY_PID_GUARD();
 
-	map_val = bpf_map_lookup_elem(&arr_map, &key);
+	map_val = bpf_map_lookup_elem(&hash_map, &key);
 	if (!map_val)
 		return 0;
 
@@ -561,12 +561,12 @@ int iter_err_too_permissive2(const void *ctx)
 
 	MY_PID_GUARD();
 
-	map_val = bpf_map_lookup_elem(&arr_map, &key);
+	map_val = bpf_map_lookup_elem(&hash_map, &key);
 	if (!map_val)
 		return 0;
 
 	bpf_repeat(1000000) {
-		map_val = bpf_map_lookup_elem(&arr_map, &key);
+		map_val = bpf_map_lookup_elem(&hash_map, &key);
 	}
 
 	*map_val = 123;
@@ -585,7 +585,7 @@ int iter_err_too_permissive3(const void *ctx)
 	MY_PID_GUARD();
 
 	bpf_repeat(1000000) {
-		map_val = bpf_map_lookup_elem(&arr_map, &key);
+		map_val = bpf_map_lookup_elem(&hash_map, &key);
 		found = true;
 	}
 
@@ -606,7 +606,7 @@ int iter_tricky_but_fine(const void *ctx)
 	MY_PID_GUARD();
 
 	bpf_repeat(1000000) {
-		map_val = bpf_map_lookup_elem(&arr_map, &key);
+		map_val = bpf_map_lookup_elem(&hash_map, &key);
 		if (map_val) {
 			found = true;
 			break;
diff --git a/tools/testing/selftests/bpf/progs/map_kptr_fail.c b/tools/testing/selftests/bpf/progs/map_kptr_fail.c
index c2a6bd392e48..4c0ff01f1a96 100644
--- a/tools/testing/selftests/bpf/progs/map_kptr_fail.c
+++ b/tools/testing/selftests/bpf/progs/map_kptr_fail.c
@@ -345,7 +345,7 @@ int reject_indirect_global_func_access(struct __sk_buff *ctx)
 }
 
 SEC("?tc")
-__failure __msg("Unreleased reference id=5 alloc_insn=")
+__failure __msg("Unreleased reference id=4 alloc_insn=")
 int kptr_xchg_ref_state(struct __sk_buff *ctx)
 {
 	struct prog_test_ref_kfunc *p;
diff --git a/tools/testing/selftests/bpf/progs/verifier_map_in_map.c b/tools/testing/selftests/bpf/progs/verifier_map_in_map.c
index 4eaab1468eb7..7d088ba99ea5 100644
--- a/tools/testing/selftests/bpf/progs/verifier_map_in_map.c
+++ b/tools/testing/selftests/bpf/progs/verifier_map_in_map.c
@@ -47,7 +47,7 @@ l0_%=: r0 = 0; \
 
 SEC("xdp")
 __description("map in map state pruning")
-__success __msg("processed 26 insns")
+__success __msg("processed 15 insns")
 __log_level(2) __retval(0) __flag(BPF_F_TEST_STATE_FREQ)
 __naked void map_in_map_state_pruning(void)
 {
diff --git a/tools/testing/selftests/bpf/verifier/map_kptr.c b/tools/testing/selftests/bpf/verifier/map_kptr.c
index f420c0312aa0..4b39f8472f9b 100644
--- a/tools/testing/selftests/bpf/verifier/map_kptr.c
+++ b/tools/testing/selftests/bpf/verifier/map_kptr.c
@@ -373,7 +373,7 @@
 	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	.fixup_map_kptr = { 1 },
 	.result = REJECT,
-	.errstr = "Unreleased reference id=5 alloc_insn=20",
+	.errstr = "Unreleased reference id=4 alloc_insn=20",
 	.fixup_kfunc_btf_id = {
 		{ "bpf_kfunc_call_test_acquire", 15 },
 	}
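The check added to check_helper_call() reduces to a small predicate. The following userspace sketch models it so it can be exercised outside the kernel. The `model_*` names and the trimmed-down map type enum are hypothetical stand-ins, not kernel APIs, and -1 plays the role of the "not a known constant" sentinel returned by get_constant_map_key().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for a few enum bpf_map_type values. */
enum model_map_type {
	MODEL_MAP_TYPE_HASH,
	MODEL_MAP_TYPE_ARRAY,
	MODEL_MAP_TYPE_PERCPU_ARRAY,
};

/* Mirrors can_elide_value_nullness(): only array-like maps, whose
 * lookups cannot fail for an in-range index, qualify. */
static bool model_can_elide(enum model_map_type type)
{
	switch (type) {
	case MODEL_MAP_TYPE_ARRAY:
	case MODEL_MAP_TYPE_PERCPU_ARRAY:
		return true;
	default:
		return false;
	}
}

/* Returns true if PTR_MAYBE_NULL could be cleared on the lookup result.
 * const_key is -1 when the key is not statically known. */
static bool model_elide_null_check(enum model_map_type type,
				   int64_t const_key, uint32_t max_entries)
{
	return model_can_elide(type) &&
	       const_key >= 0 &&
	       const_key < (int64_t)max_entries;
}
```

Keeping -1 as the sentinel means the `const_key >= 0` test does double duty: it rejects both unknown keys and negative constants.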
On Thu, 2024-12-12 at 16:22 -0700, Daniel Xu wrote:
I think these changes are fine in general, but see below.
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 58b36cc96bd5..4947ef884a18 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -287,6 +287,7 @@ struct bpf_call_arg_meta {
 	u32 ret_btf_id;
 	u32 subprogno;
 	struct btf_field *kptr_field;
+	s64 const_map_key;
 };
 
 struct bpf_kfunc_call_arg_meta {
@@ -9163,6 +9164,53 @@ static int check_reg_const_str(struct bpf_verifier_env *env,
 	return 0;
 }
 
+/* Returns constant key value if possible, else -1 */
+static s64 get_constant_map_key(struct bpf_verifier_env *env,
+				struct bpf_reg_state *key,
+				u32 key_size)
I understand that this is not your use case, but maybe generalize this a bit by checking maximal register value instead of a constant?
+{
+	struct bpf_func_state *state = func(env, key);
+	struct bpf_reg_state *reg;
+	int zero_size = 0;
+	int stack_off;
+	u8 *stype;
+	int slot;
+	int spi;
+	int i;
+	if (!env->bpf_capable)
+		return -1;
+	if (key->type != PTR_TO_STACK)
+		return -1;
+	if (!tnum_is_const(key->var_off))
+		return -1;
+	stack_off = key->off + key->var_off.value;
+	slot = -stack_off - 1;
+	spi = slot / BPF_REG_SIZE;
+	/* First handle precisely tracked STACK_ZERO, up to BPF_REG_SIZE */
+	stype = state->stack[spi].slot_type;
+	for (i = 0; i < BPF_REG_SIZE && stype[i] == STACK_ZERO; i++)
+		zero_size++;
+	if (zero_size == key_size)
+		return 0;
+	if (!is_spilled_reg(&state->stack[spi]))
+		/* Not pointer to stack */
+		return -1;
Nit: there is a 'is_spilled_scalar_reg' utility function.
+	reg = &state->stack[spi].spilled_ptr;
+	if (reg->type != SCALAR_VALUE)
+		/* Only scalars are valid array map keys */
+		return -1;
+	else if (!tnum_is_const(reg->var_off))
+		/* Stack value not statically known */
+		return -1;
I think you need to check whether the size of the spill matches the size of the key. The mismatch would be unsafe when the spill size is smaller than the key size. E.g. consider a 1-byte spill with mask 'mmmmmmrr' and a 4-byte key: at runtime the 'mmmmmm' part might be non-zero, rendering the key out of range.
+	return reg->var_off.value;
+}
static int check_func_arg(struct bpf_verifier_env *env, u32 arg, struct bpf_call_arg_meta *meta, const struct bpf_func_proto *fn,
[...]
On Thu, Dec 12, 2024 at 08:04:45PM GMT, Eduard Zingerman wrote:
On Thu, 2024-12-12 at 16:22 -0700, Daniel Xu wrote:
I think these changes are fine in general, but see below.
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 58b36cc96bd5..4947ef884a18 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -287,6 +287,7 @@ struct bpf_call_arg_meta {
 	u32 ret_btf_id;
 	u32 subprogno;
 	struct btf_field *kptr_field;
+	s64 const_map_key;
 };
 
 struct bpf_kfunc_call_arg_meta {
@@ -9163,6 +9164,53 @@ static int check_reg_const_str(struct bpf_verifier_env *env,
 	return 0;
 }
 
+/* Returns constant key value if possible, else -1 */
+static s64 get_constant_map_key(struct bpf_verifier_env *env,
+				struct bpf_reg_state *key,
+				u32 key_size)
I understand that this is not your use case, but maybe generalize this a bit by checking maximal register value instead of a constant?
I'll check on this. If it works, I think you're right: it allows more flexibility while retaining safety. The user could define max_entries to be a power of two and then mask the key with 0xFFFF.. to guarantee null-free codepaths.
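A rough sketch of the generalization Eduard suggests, combined with the masking idea above: instead of requiring an exact constant, the null check could be dropped whenever the largest value the key can take is below max_entries. The sketch models only a tnum (known bits in `value`, unknown bits in `mask`); the real verifier also tracks min/max bounds, and which of these a follow-up patch would use is not settled in this thread. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal tracked-number model: bits set in mask are unknown, the other
 * bits are given by value (invariant: value & mask == 0). */
struct model_tnum {
	uint64_t value;
	uint64_t mask;
};

/* Largest value the tnum can take: every unknown bit set to 1. */
static uint64_t model_tnum_umax(struct model_tnum t)
{
	return t.value | t.mask;
}

/* AND with a known constant: bits cleared in c become known zeros. */
static struct model_tnum model_tnum_and_const(struct model_tnum t, uint64_t c)
{
	struct model_tnum r = { t.value & c, t.mask & c };

	return r;
}

/* The null check could be dropped if even the largest key is in range. */
static bool model_key_always_in_range(struct model_tnum key, uint32_t max_entries)
{
	return model_tnum_umax(key) < max_entries;
}
```

With max_entries = 1024 (a power of two), masking an entirely unknown key with 1023 leaves only the low 10 bits unknown, so the largest possible key is 1023 and every lookup is in range.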
+{
+	struct bpf_func_state *state = func(env, key);
+	struct bpf_reg_state *reg;
+	int zero_size = 0;
+	int stack_off;
+	u8 *stype;
+	int slot;
+	int spi;
+	int i;
+	if (!env->bpf_capable)
+		return -1;
+	if (key->type != PTR_TO_STACK)
+		return -1;
+	if (!tnum_is_const(key->var_off))
+		return -1;
+	stack_off = key->off + key->var_off.value;
+	slot = -stack_off - 1;
+	spi = slot / BPF_REG_SIZE;
+	/* First handle precisely tracked STACK_ZERO, up to BPF_REG_SIZE */
+	stype = state->stack[spi].slot_type;
+	for (i = 0; i < BPF_REG_SIZE && stype[i] == STACK_ZERO; i++)
+		zero_size++;
+	if (zero_size == key_size)
+		return 0;
+	if (!is_spilled_reg(&state->stack[spi]))
+		/* Not pointer to stack */
+		return -1;
Nit: there is a 'is_spilled_scalar_reg' utility function.
Ack.
+	reg = &state->stack[spi].spilled_ptr;
+	if (reg->type != SCALAR_VALUE)
+		/* Only scalars are valid array map keys */
+		return -1;
+	else if (!tnum_is_const(reg->var_off))
+		/* Stack value not statically known */
+		return -1;
I think you need to check whether the size of the spill matches the size of the key. The mismatch would be unsafe when the spill size is smaller than the key size. E.g. consider a 1-byte spill with mask 'mmmmmmrr' and a 4-byte key: at runtime the 'mmmmmm' part might be non-zero, rendering the key out of range.
Ah great catch. I think you're right.
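To illustrate the point above, here is a simplified userspace model of get_constant_map_key() as quoted, with the spill-size check added. The slot layout is reduced to what the function inspects, and the spill size is stored as an explicit field for simplicity; how a real patch would derive it from the verifier's stack state is not shown here. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_REG_SIZE 8

/* Hypothetical per-byte slot types, named after the verifier's. */
enum model_stype { MODEL_STACK_INVALID, MODEL_STACK_MISC, MODEL_STACK_ZERO };

struct model_slot {
	enum model_stype stype[MODEL_REG_SIZE];
	bool spilled_scalar;	/* slot holds a spilled scalar register */
	uint32_t spill_size;	/* width in bytes of the spilling store */
	bool value_known;	/* spilled value is a known constant */
	uint64_t value;
};

/* Returns the constant key, or -1 if it cannot be proven. */
static int64_t model_const_key(const struct model_slot *s, uint32_t key_size)
{
	uint32_t zero_size = 0;

	/* Same shape as the quoted STACK_ZERO loop. */
	for (int i = 0; i < MODEL_REG_SIZE && s->stype[i] == MODEL_STACK_ZERO; i++)
		zero_size++;
	if (zero_size == key_size)
		return 0;

	if (!s->spilled_scalar || !s->value_known)
		return -1;

	/* A narrower spill leaves the remaining key bytes unknown at runtime. */
	if (s->spill_size != key_size)
		return -1;

	return (int64_t)s->value;
}
```

Without the `spill_size != key_size` test, a 1-byte spill of 7 would yield key 7 for a 4-byte lookup, even though the other three key bytes are not known at verification time.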
On Thu, Dec 12, 2024 at 3:23 PM Daniel Xu dxu@dxuuu.xyz wrote:
This commit allows progs to elide a null check on statically known map lookup keys. In other words, if the verifier can statically prove that the lookup will be in-bounds, allow the prog to drop the null check.
This is useful for two reasons:
1. Large numbers of nullness checks (especially when they cannot fail) unnecessarily push the prog towards BPF_COMPLEXITY_LIMIT_JMP_SEQ.
2. It forms a tighter contract between programmer and verifier.
For (1), bpftrace is starting to make heavier use of percpu scratch maps. As a result, for user scripts with large numbers of unrolled loops, we are starting to hit jump complexity verification errors. These percpu lookups cannot fail anyway, as we only use static key values. Eliding nullness probably results in less work for the verifier as well.
For (2), percpu scratch maps are often used as a larger stack, as the current stack is limited to 512 bytes. In these situations, it is desirable for the programmer to express: "this lookup should never fail, and if it does, it means I messed up the code". By omitting the null check, the programmer can "ask" the verifier to double check the logic.
Tests also have to be updated in sync with these changes, as the verifier is more efficient with this change. Notably, iters.c tests had to be changed to use a map type that still requires null checks, as they exercise verifier tracking logic w.r.t. iterators.
Signed-off-by: Daniel Xu dxu@dxuuu.xyz
 kernel/bpf/verifier.c                         | 80 ++++++++++++++++++-
 tools/testing/selftests/bpf/progs/iters.c     | 14 ++--
 .../selftests/bpf/progs/map_kptr_fail.c       |  2 +-
 .../selftests/bpf/progs/verifier_map_in_map.c |  2 +-
 .../testing/selftests/bpf/verifier/map_kptr.c |  2 +-
 5 files changed, 87 insertions(+), 13 deletions(-)
Eduard has great points. I've added a few more comments below.
pw-bot: cr
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 58b36cc96bd5..4947ef884a18 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -287,6 +287,7 @@ struct bpf_call_arg_meta {
 	u32 ret_btf_id;
 	u32 subprogno;
 	struct btf_field *kptr_field;
+	s64 const_map_key;
 };
 
 struct bpf_kfunc_call_arg_meta {
@@ -9163,6 +9164,53 @@ static int check_reg_const_str(struct bpf_verifier_env *env,
 	return 0;
 }
 
+/* Returns constant key value if possible, else -1 */
+static s64 get_constant_map_key(struct bpf_verifier_env *env,
+				struct bpf_reg_state *key,
+				u32 key_size)
+{
+	struct bpf_func_state *state = func(env, key);
+	struct bpf_reg_state *reg;
+	int zero_size = 0;
+	int stack_off;
+	u8 *stype;
+	int slot;
+	int spi;
+	int i;
+	if (!env->bpf_capable)
+		return -1;
+	if (key->type != PTR_TO_STACK)
+		return -1;
+	if (!tnum_is_const(key->var_off))
+		return -1;
+	stack_off = key->off + key->var_off.value;
+	slot = -stack_off - 1;
+	spi = slot / BPF_REG_SIZE;
+	/* First handle precisely tracked STACK_ZERO, up to BPF_REG_SIZE */
+	stype = state->stack[spi].slot_type;
+	for (i = 0; i < BPF_REG_SIZE && stype[i] == STACK_ZERO; i++)
it's Friday and I'm lazy, but please double-check that this works for both big-endian and little-endian :)
with Eduard's suggestion this also becomes interesting when you have 000mmm mix (as one example), because that gives you a small range, and all values might be valid keys for arrays
+		zero_size++;
+	if (zero_size == key_size)
+		return 0;
+	if (!is_spilled_reg(&state->stack[spi]))
+		/* Not pointer to stack */
!is_spilled_reg and "Not pointer to stack" seem to be not exactly the same things?
btw, we also have is_spilled_scalar_reg() which you can use here instead of two separate checks?
+		return -1;
+	reg = &state->stack[spi].spilled_ptr;
+	if (reg->type != SCALAR_VALUE)
+		/* Only scalars are valid array map keys */
+		return -1;
+	else if (!tnum_is_const(reg->var_off))
+		/* Stack value not statically known */
+		return -1;
+	return reg->var_off.value;
+}
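The offset arithmetic in get_constant_map_key() maps a (negative) frame-pointer offset to a byte slot and an 8-byte stack slot index. A tiny userspace sketch of the same arithmetic, with hypothetical names, makes the mapping easy to check:

```c
#include <assert.h>

#define MODEL_BPF_REG_SIZE 8

/* slot = -stack_off - 1: fp-1 is byte slot 0, fp-8 is byte slot 7. */
static int model_slot(int stack_off)
{
	return -stack_off - 1;
}

/* spi = slot / BPF_REG_SIZE: bytes fp-1..fp-8 are stack slot 0,
 * fp-9..fp-16 are stack slot 1, and so on. */
static int model_spi(int stack_off)
{
	return model_slot(stack_off) / MODEL_BPF_REG_SIZE;
}
```

For example, a key stored at fp-8 and one stored at fp-4 both land in stack slot 0, so the checks that follow inspect the same `state->stack[spi]` entry for either key.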
[...]
On Fri, Dec 13, 2024 at 03:02:11PM GMT, Andrii Nakryiko wrote:
On Thu, Dec 12, 2024 at 3:23 PM Daniel Xu dxu@dxuuu.xyz wrote:
This commit allows progs to elide a null check on statically known map lookup keys. In other words, if the verifier can statically prove that the lookup will be in-bounds, allow the prog to drop the null check.
This is useful for two reasons:
1. Large numbers of nullness checks (especially when they cannot fail) unnecessarily push the prog towards BPF_COMPLEXITY_LIMIT_JMP_SEQ.
2. It forms a tighter contract between programmer and verifier.
For (1), bpftrace is starting to make heavier use of percpu scratch maps. As a result, for user scripts with large numbers of unrolled loops, we are starting to hit jump complexity verification errors. These percpu lookups cannot fail anyway, as we only use static key values. Eliding nullness probably results in less work for the verifier as well.
For (2), percpu scratch maps are often used as a larger stack, as the current stack is limited to 512 bytes. In these situations, it is desirable for the programmer to express: "this lookup should never fail, and if it does, it means I messed up the code". By omitting the null check, the programmer can "ask" the verifier to double check the logic.
Tests also have to be updated in sync with these changes, as the verifier is more efficient with this change. Notably, iters.c tests had to be changed to use a map type that still requires null checks, as they exercise verifier tracking logic w.r.t. iterators.
Signed-off-by: Daniel Xu dxu@dxuuu.xyz
 kernel/bpf/verifier.c                         | 80 ++++++++++++++++++-
 tools/testing/selftests/bpf/progs/iters.c     | 14 ++--
 .../selftests/bpf/progs/map_kptr_fail.c       |  2 +-
 .../selftests/bpf/progs/verifier_map_in_map.c |  2 +-
 .../testing/selftests/bpf/verifier/map_kptr.c |  2 +-
 5 files changed, 87 insertions(+), 13 deletions(-)
Eduard has great points. I've added a few more comments below.
pw-bot: cr
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 58b36cc96bd5..4947ef884a18 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -287,6 +287,7 @@ struct bpf_call_arg_meta {
 	u32 ret_btf_id;
 	u32 subprogno;
 	struct btf_field *kptr_field;
+	s64 const_map_key;
 };
 
 struct bpf_kfunc_call_arg_meta {
@@ -9163,6 +9164,53 @@ static int check_reg_const_str(struct bpf_verifier_env *env,
 	return 0;
 }
 
+/* Returns constant key value if possible, else -1 */
+static s64 get_constant_map_key(struct bpf_verifier_env *env,
+				struct bpf_reg_state *key,
+				u32 key_size)
+{
+	struct bpf_func_state *state = func(env, key);
+	struct bpf_reg_state *reg;
+	int zero_size = 0;
+	int stack_off;
+	u8 *stype;
+	int slot;
+	int spi;
+	int i;
+	if (!env->bpf_capable)
+		return -1;
+	if (key->type != PTR_TO_STACK)
+		return -1;
+	if (!tnum_is_const(key->var_off))
+		return -1;
+	stack_off = key->off + key->var_off.value;
+	slot = -stack_off - 1;
+	spi = slot / BPF_REG_SIZE;
+	/* First handle precisely tracked STACK_ZERO, up to BPF_REG_SIZE */
+	stype = state->stack[spi].slot_type;
+	for (i = 0; i < BPF_REG_SIZE && stype[i] == STACK_ZERO; i++)
it's Friday and I'm lazy, but please double-check that this works for both big-endian and little-endian :)
Any tips? Are the existing tests running thru s390x hosts in CI sufficient, or should I add some tests written in C (and not BPF assembler)? I can never think about endianness correctly...
with Eduard's suggestion this also becomes interesting when you have 000mmm mix (as one example), because that gives you a small range, and all values might be valid keys for arrays
Can you define what "small range" means? What range is there with 0's? Any pointers would be helpful.
+		zero_size++;
+	if (zero_size == key_size)
+		return 0;
+	if (!is_spilled_reg(&state->stack[spi]))
+		/* Not pointer to stack */
!is_spilled_reg and "Not pointer to stack" seem to be not exactly the same things?
You're right - comment is not helpful. I'll make the change to use is_spilled_scalar_reg() which is probably as clear as it gets.
[..]
On Fri, 2024-12-13 at 19:44 -0700, Daniel Xu wrote:
[...]
/* First handle precisely tracked STACK_ZERO, up to BPF_REG_SIZE */
stype = state->stack[spi].slot_type;
for (i = 0; i < BPF_REG_SIZE && stype[i] == STACK_ZERO; i++)
it's Friday and I'm lazy, but please double-check that this works for both big-endian and little-endian :)
Any tips? Are the existing tests running thru s390x hosts in CI sufficient, or should I add some tests written in C (and not BPF assembler)? I can never think about endianness correctly...
I think that if test operates on a key like:
  valid key 15
         v
  0000000f   <-- written to stack as a single u64 value
  ^^^^^^^
  stack zero marks
and is executed (e.g. using __retval annotation), then CI passing for s390 should be enough.
There is a guide on how to set up an s390 environment locally: https://docs.kernel.org/bpf/s390.html I used it recently to build a vmlinux for s390 with no or minimal issues. I used it to boot a long time ago, but don't remember if there were any surprises.
with Eduard's suggestion this also becomes interesting when you have 000mmm mix (as one example), because that gives you a small range, and all values might be valid keys for arrays
Can you define what "small range" means? What range is there with 0's? Any pointers would be helpful.
I think Andrii means that each 'm' adds 8 bits of range. E.g. range for 0000_000m is 0-255, range for 0000_00mm is 0-65535, etc.
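The arithmetic above can be sketched as follows, assuming the unknown ('m') bytes are the least significant ones, as in the numeric notation used here. Which of those bytes come first in memory depends on the host's byte order, which is where the endianness question comes in. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value a byte pattern such as "0000000m" can hold, where each
 * character is one byte: '0' is a known-zero byte and 'm' an unknown one.
 * Assumes the unknown bytes are the least significant ones. */
static uint64_t model_pattern_max(const char *pattern)
{
	unsigned int unknown = 0;

	for (const char *p = pattern; *p != '\0'; p++)
		if (*p == 'm')
			unknown++;

	/* Each unknown byte contributes 8 bits of range. */
	if (unknown >= 8)
		return UINT64_MAX;
	return (UINT64_C(1) << (8 * unknown)) - 1;
}
```

So "0000000m" gives 0-255 and "000000mm" gives 0-65535; an array whose max_entries exceeds that maximum would accept every possible key.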
[...]
On Fri, Dec 13, 2024 at 7:13 PM Eduard Zingerman eddyz87@gmail.com wrote:
On Fri, 2024-12-13 at 19:44 -0700, Daniel Xu wrote:
[...]
/* First handle precisely tracked STACK_ZERO, up to BPF_REG_SIZE */
stype = state->stack[spi].slot_type;
for (i = 0; i < BPF_REG_SIZE && stype[i] == STACK_ZERO; i++)
it's Friday and I'm lazy, but please double-check that this works for both big-endian and little-endian :)
Any tips? Are the existing tests running thru s390x hosts in CI sufficient, or should I add some tests written in C (and not BPF assembler)? I can never think about endianness correctly...
I think that if test operates on a key like:
  valid key 15
         v
  0000000f   <-- written to stack as a single u64 value
  ^^^^^^^
  stack zero marks
and is executed (e.g. using __retval annotation), then CI passing for s390 should be enough.
+1, something like that where for big-endian it will be all zero while for little endian it would be 0xf (and then make sure that the test should *fail* by making sure that 0xf is not a valid index, so NULL check is necessary)
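The effect described above can be reproduced in plain userspace C: store 15 as a native u64 (as a 64-bit register spill does), then read the first 4 bytes as a u32 key, the way the lookup reads key_size bytes from the key pointer. On a little-endian host the key reads as 15; on a big-endian host such as s390x it reads as 0. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Spill v to an 8-byte buffer in native byte order, then read back the
 * first 4 bytes as a u32 key, mimicking a 4-byte key pointer aimed at the
 * start of an 8-byte stack slot. */
static uint32_t model_key_after_spill(uint64_t v)
{
	uint8_t slot[8];
	uint32_t key;

	memcpy(slot, &v, sizeof(v));
	memcpy(&key, slot, sizeof(key));
	return key;
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int model_host_is_little_endian(void)
{
	uint16_t probe = 1;
	uint8_t first;

	memcpy(&first, &probe, 1);
	return first == 1;
}
```

Reading back at the same width (8 bytes) returns 15 on both kinds of host; the discrepancy only appears when the read is narrower than the write.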
There is a guide on how to set up an s390 environment locally: https://docs.kernel.org/bpf/s390.html I used it recently to build a vmlinux for s390 with no or minimal issues. I used it to boot a long time ago, but don't remember if there were any surprises.
with Eduard's suggestion this also becomes interesting when you have 000mmm mix (as one example), because that gives you a small range, and all values might be valid keys for arrays
Can you define what "small range" means? What range is there with 0's? Any pointers would be helpful.
I think Andrii means that each 'm' adds 8 bits of range. E.g. range for 0000_000m is 0-255, range for 0000_00mm is 0-65535, etc.
yes, exactly, thank you, Eduard!
[...]
On Mon, Dec 16, 2024 at 03:24:01PM -0800, Andrii Nakryiko wrote:
On Fri, Dec 13, 2024 at 7:13 PM Eduard Zingerman eddyz87@gmail.com wrote:
On Fri, 2024-12-13 at 19:44 -0700, Daniel Xu wrote:
[...]
with Eduard's suggestion this also becomes interesting when you have 000mmm mix (as one example), because that gives you a small range, and all values might be valid keys for arrays
Can you define what "small range" means? What range is there with 0's? Any pointers would be helpful.
I think Andrii means that each 'm' adds 8 bits of range. E.g. range for 0000_000m is 0-255, range for 0000_00mm is 0-65535, etc.
yes, exactly, thank you, Eduard!
Gave it some thought. Still seems like a good idea, but I'd prefer to leave this extension for a separate patchset. Mostly b/c I'm running out of space in my head to grok everything :P. Probably higher likelihood of me getting the existing stuff correct if I don't add more scope.
Thanks, Daniel
On Mon, Dec 16, 2024 at 03:24:01PM -0800, Andrii Nakryiko wrote:
On Fri, Dec 13, 2024 at 7:13 PM Eduard Zingerman eddyz87@gmail.com wrote:
On Fri, 2024-12-13 at 19:44 -0700, Daniel Xu wrote:
[...]
/* First handle precisely tracked STACK_ZERO, up to BPF_REG_SIZE */
stype = state->stack[spi].slot_type;
for (i = 0; i < BPF_REG_SIZE && stype[i] == STACK_ZERO; i++)
it's Friday and I'm lazy, but please double-check that this works for both big-endian and little-endian :)
Any tips? Are the existing tests running thru s390x hosts in CI sufficient, or should I add some tests written in C (and not BPF assembler)? I can never think about endianness correctly...
I think that if test operates on a key like:
  valid key 15
         v
  0000000f   <-- written to stack as a single u64 value
  ^^^^^^^
  stack zero marks
and is executed (e.g. using __retval annotation), then CI passing for s390 should be enough.
+1, something like that where for big-endian it will be all zero while for little endian it would be 0xf (and then make sure that the test should *fail* by making sure that 0xf is not a valid index, so NULL check is necessary)
How would it work for LE to be 0xF but BE to be 0x0?
The prog passes a pointer to the beginning of the u32 to bpf_map_lookup_elem(). The kernel does a 4 byte read starting from that address. On both BE and LE all 4 bytes will be interpreted. So set bits cannot just go away.
Am I missing something?
Thanks, Daniel
On Thu, 2024-12-19 at 14:41 -0700, Daniel Xu wrote:
[...]
I think that if test operates on a key like:
  valid key 15
         v
  0000000f   <-- written to stack as a single u64 value
  ^^^^^^^
  stack zero marks
and is executed (e.g. using __retval annotation), then CI passing for s390 should be enough.
+1, something like that where for big-endian it will be all zero while for little endian it would be 0xf (and then make sure that the test should *fail* by making sure that 0xf is not a valid index, so NULL check is necessary)
How would it work for LE to be 0xF but BE to be 0x0?
The prog passes a pointer to the beginning of the u32 to bpf_map_lookup_elem(). The kernel does a 4 byte read starting from that address. On both BE and LE all 4 bytes will be interpreted. So set bits cannot just go away.
Am I missing something?
Ok, thinking a bit more, the best test I can come up with is:
  u8 vals[8];
  vals[0] = 0;
  ...
  vals[6] = 0;
  vals[7] = 0xf;
  p = bpf_map_lookup_elem(... vals ...);
  *p = 42;
For LE, vals as u32 should be 0x0f; for BE, vals as u32 should be 0xf000_0000. Hence, it is not safe to remove the null check for this program. What would the verifier think about the value of such a key? As far as I understand, there would be stack zero for vals[0-6] and a u8 stack spill for vals[7]. You were going to add a check for the spill size, which should help here. So, a negative test like the above that checks that the verifier complains that 'p' should be checked for nullness first?
If anyone has better test in mind, please speak-up.
[...]
On Thu, Dec 19, 2024 at 04:04:43PM -0800, Eduard Zingerman wrote:
On Thu, 2024-12-19 at 14:41 -0700, Daniel Xu wrote:
[...]
I think that if test operates on a key like:
  valid key 15
         v
  0000000f   <-- written to stack as a single u64 value
  ^^^^^^^
  stack zero marks
and is executed (e.g. using __retval annotation), then CI passing for s390 should be enough.
+1, something like that where for big-endian it will be all zero while for little endian it would be 0xf (and then make sure that the test should *fail* by making sure that 0xf is not a valid index, so NULL check is necessary)
How would it work for LE to be 0xF but BE to be 0x0?
The prog passes a pointer to the beginning of the u32 to bpf_map_lookup_elem(). The kernel does a 4 byte read starting from that address. On both BE and LE all 4 bytes will be interpreted. So set bits cannot just go away.
Am I missing something?
Ok, thinking a bit more, the best test I can come up with is:
  u8 vals[8];
  vals[0] = 0;
  ...
  vals[6] = 0;
  vals[7] = 0xf;
  p = bpf_map_lookup_elem(... vals ...);
  *p = 42;
For LE, vals as u32 should be 0x0f; for BE, vals as u32 should be 0xf000_0000. Hence, it is not safe to remove the null check for this program. What would the verifier think about the value of such a key? As far as I understand, there would be stack zero for vals[0-6] and a u8 stack spill for vals[7].
Right. By checking that spill size is same as key size, we stay endian neutral, as constant values are tracked in native endianness.
However, if we were to start interpreting combinations of STACK_ZERO, STACK_MISC, and STACK_SPILL, the verifier would have to be endian aware (IIUC). Which makes it a somewhat interesting problem but also requires some thought to correctly handle the state space.
You were going to add a check for the spill size, which should help here. So, a negative test like above that checks that verifier complains that 'p' should be checked for nullness first?
If anyone has better test in mind, please speak-up.
I think this case reduces down to a spill_size != key_size test. As long as the sizes match, we don't have to worry about endianness.
Thanks, Daniel
On Thu, 2024-12-19 at 17:40 -0700, Daniel Xu wrote:
[...]
Ok, thinking a bit more, the best test I can come up with is:
  u8 vals[8];
  vals[0] = 0;
  ...
  vals[6] = 0;
  vals[7] = 0xf;
  p = bpf_map_lookup_elem(... vals ...);
  *p = 42;
For LE, vals as u32 should be 0x0f; for BE, vals as u32 should be 0xf000_0000. Hence, it is not safe to remove the null check for this program. What would the verifier think about the value of such a key? As far as I understand, there would be stack zero for vals[0-6] and a u8 stack spill for vals[7].
Right. By checking that spill size is same as key size, we stay endian neutral, as constant values are tracked in native endianness.
However, if we were to start interpreting combinations of STACK_ZERO, STACK_MISC, and STACK_SPILL, the verifier would have to be endian aware (IIUC). Which makes it a somewhat interesting problem but also requires some thought to correctly handle the state space.
Right.
You were going to add a check for the spill size, which should help here. So, a negative test like above that checks that verifier complains that 'p' should be checked for nullness first?
If anyone has better test in mind, please speak-up.
I think this case reduces down to a spill_size != key_size test. As long as the sizes match, we don't have to worry about endianness.
Agree.
On Thu, Dec 19, 2024 at 4:43 PM Eduard Zingerman eddyz87@gmail.com wrote:
On Thu, 2024-12-19 at 17:40 -0700, Daniel Xu wrote:
[...]
Ok, thinking a bit more, the best test I can come up with is:
  u8 vals[8];
  vals[0] = 0;
  ...
  vals[6] = 0;
  vals[7] = 0xf;
  p = bpf_map_lookup_elem(... vals ...);
  *p = 42;
For LE, vals as u32 should be 0x0f; for BE, vals as u32 should be 0xf000_0000. Hence, it is not safe to remove the null check for this program. What would the verifier think about the value of such a key? As far as I understand, there would be stack zero for vals[0-6] and a u8 stack spill for vals[7].
Right. By checking that spill size is same as key size, we stay endian neutral, as constant values are tracked in native endianness.
However, if we were to start interpreting combinations of STACK_ZERO, STACK_MISC, and STACK_SPILL, the verifier would have to be endian aware (IIUC). Which makes it a somewhat interesting problem but also requires some thought to correctly handle the state space.
Right.
You were going to add a check for the spill size, which should help here. So, a negative test like above that checks that verifier complains that 'p' should be checked for nullness first?
If anyone has better test in mind, please speak-up.
I think this case reduces down to a spill_size != key_size test. As long as the sizes match, we don't have to worry about endianness.
Agree.
Earlier I suggested generalizing this zero/misc/spill counting into a helper and reusing it here and in check_stack_read_fixed_off().
We do very similar checks there with a similar purpose.
It sounds there are ideas to make this particular feature smarter than what we have in check_stack_read_fixed_off(). Let's not overdo it. Even if a common helper is not possible, keep things consistent. The simpler the better.
On Thu, Dec 19, 2024 at 04:49:13PM -0800, Alexei Starovoitov wrote:
On Thu, Dec 19, 2024 at 4:43 PM Eduard Zingerman eddyz87@gmail.com wrote:
On Thu, 2024-12-19 at 17:40 -0700, Daniel Xu wrote:
[...]
Ok, thinking a bit more, the best test I can come up with is:
  u8 vals[8];
  vals[0] = 0;
  ...
  vals[6] = 0;
  vals[7] = 0xf;
  p = bpf_map_lookup_elem(... vals ...);
  *p = 42;
For LE, vals as u32 should be 0x0f; for BE, vals as u32 should be 0xf000_0000. Hence, it is not safe to remove the null check for this program. What would the verifier think about the value of such a key? As far as I understand, there would be stack zero for vals[0-6] and a u8 stack spill for vals[7].
Right. By checking that spill size is same as key size, we stay endian neutral, as constant values are tracked in native endianness.
However, if we were to start interpreting combinations of STACK_ZERO, STACK_MISC, and STACK_SPILL, the verifier would have to be endian aware (IIUC). Which makes it a somewhat interesting problem but also requires some thought to correctly handle the state space.
Right.
You were going to add a check for the spill size, which should help here. So, a negative test like above that checks that verifier complains that 'p' should be checked for nullness first?
If anyone has better test in mind, please speak-up.
I think this case reduces down to a spill_size != key_size test. As long as the sizes match, we don't have to worry about endianness.
Agree.
Earlier I suggested generalizing this zero/misc/spill counting into a helper and reusing it here and in check_stack_read_fixed_off().
We do very similar checks there with a similar purpose.
Looked again, didn't see any obvious way to share code that doesn't make it more confusing. Let me post v6 without this particular refactor. If I missed something I'll fix it up in v7.
It sounds there are ideas to make this particular feature smarter than what we have in check_stack_read_fixed_off(). Let's not overdo it. Even if a common helper is not possible, keep things consistent. The simpler the better.
Fair enough. We can keep it simple.
Thanks, Daniel
On Fri, 13 Dec 2024 at 00:24, Daniel Xu dxu@dxuuu.xyz wrote:
This commit allows progs to elide a null check on statically known map lookup keys. In other words, if the verifier can statically prove that the lookup will be in-bounds, allow the prog to drop the null check.
This is useful for two reasons:
1. Large numbers of nullness checks (especially when they cannot fail) unnecessarily push the prog towards BPF_COMPLEXITY_LIMIT_JMP_SEQ.
2. It forms a tighter contract between programmer and verifier.
For (1), bpftrace is starting to make heavier use of percpu scratch maps. As a result, for user scripts with large numbers of unrolled loops, we are starting to hit jump complexity verification errors. These percpu lookups cannot fail anyway, as we only use static key values. Eliding nullness probably results in less work for the verifier as well.
For (2), percpu scratch maps are often used as a larger stack, as the current stack is limited to 512 bytes. In these situations, it is desirable for the programmer to express: "this lookup should never fail, and if it does, it means I messed up the code". By omitting the null check, the programmer can "ask" the verifier to double check the logic.
Tests also have to be updated in sync with these changes, as the verifier is more efficient with this change. Notably, iters.c tests had to be changed to use a map type that still requires null checks, as they exercise verifier tracking logic w.r.t. iterators.
Signed-off-by: Daniel Xu dxu@dxuuu.xyz
 kernel/bpf/verifier.c                         | 80 ++++++++++++++++++-
 tools/testing/selftests/bpf/progs/iters.c     | 14 ++--
 .../selftests/bpf/progs/map_kptr_fail.c       |  2 +-
 .../selftests/bpf/progs/verifier_map_in_map.c |  2 +-
 .../testing/selftests/bpf/verifier/map_kptr.c |  2 +-
 5 files changed, 87 insertions(+), 13 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 58b36cc96bd5..4947ef884a18 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -287,6 +287,7 @@ struct bpf_call_arg_meta {
 	u32 ret_btf_id;
 	u32 subprogno;
 	struct btf_field *kptr_field;
+	s64 const_map_key;
 };
 
 struct bpf_kfunc_call_arg_meta {
@@ -9163,6 +9164,53 @@ static int check_reg_const_str(struct bpf_verifier_env *env,
 	return 0;
 }
 
+/* Returns constant key value if possible, else -1 */
+static s64 get_constant_map_key(struct bpf_verifier_env *env,
+				struct bpf_reg_state *key,
+				u32 key_size)
+{
+	struct bpf_func_state *state = func(env, key);
+	struct bpf_reg_state *reg;
+	int zero_size = 0;
+	int stack_off;
+	u8 *stype;
+	int slot;
+	int spi;
+	int i;
+	if (!env->bpf_capable)
+		return -1;
+	if (key->type != PTR_TO_STACK)
+		return -1;
+	if (!tnum_is_const(key->var_off))
+		return -1;
+	stack_off = key->off + key->var_off.value;
+	slot = -stack_off - 1;
+	spi = slot / BPF_REG_SIZE;
+	/* First handle precisely tracked STACK_ZERO, up to BPF_REG_SIZE */
+	stype = state->stack[spi].slot_type;
+	for (i = 0; i < BPF_REG_SIZE && stype[i] == STACK_ZERO; i++)
+		zero_size++;
+	if (zero_size == key_size)
+		return 0;
+	if (!is_spilled_reg(&state->stack[spi]))
+		/* Not pointer to stack */
+		return -1;
+	reg = &state->stack[spi].spilled_ptr;
+	if (reg->type != SCALAR_VALUE)
+		/* Only scalars are valid array map keys */
+		return -1;
+	else if (!tnum_is_const(reg->var_off))
+		/* Stack value not statically known */
+		return -1;
+	return reg->var_off.value;
+}
 static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			  struct bpf_call_arg_meta *meta,
 			  const struct bpf_func_proto *fn,
@@ -9173,6 +9221,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 	enum bpf_arg_type arg_type = fn->arg_type[arg];
 	enum bpf_reg_type type = reg->type;
 	u32 *arg_btf_id = NULL;
+	u32 key_size;
 	int err = 0;
 	bool mask;

@@ -9307,8 +9356,11 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			verbose(env, "invalid map_ptr to access map->key\n");
 			return -EACCES;
 		}
-		err = check_helper_mem_access(env, regno, meta->map_ptr->key_size,
-					      BPF_READ, false, NULL);
+		key_size = meta->map_ptr->key_size;
+		err = check_helper_mem_access(env, regno, key_size, BPF_READ, false, NULL);
+		if (err)
+			return err;
+		meta->const_map_key = get_constant_map_key(env, reg, key_size);
 		break;
 	case ARG_PTR_TO_MAP_VALUE:
 		if (type_may_be_null(arg_type) && register_is_null(reg))
@@ -10833,6 +10885,21 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
 				 state->callback_subprogno == subprogno);
 }

+/* Returns whether or not the given map type can potentially elide
+ * lookup return value nullness check. This is possible if the key
+ * is statically known.
+ */
+static bool can_elide_value_nullness(enum bpf_map_type type)
+{
+	switch (type) {
+	case BPF_MAP_TYPE_ARRAY:
+	case BPF_MAP_TYPE_PERCPU_ARRAY:
+		return true;
+	default:
+		return false;
+	}
+}
+
 static int get_helper_proto(struct bpf_verifier_env *env, int func_id,
 			    const struct bpf_func_proto **ptr)
 {
@@ -11199,10 +11266,17 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 				"kernel subsystem misconfigured verifier\n");
 			return -EINVAL;
 		}
+
+		if (func_id == BPF_FUNC_map_lookup_elem &&
+		    can_elide_value_nullness(meta.map_ptr->map_type) &&
+		    meta.const_map_key >= 0 &&
+		    meta.const_map_key < meta.map_ptr->max_entries)
+			ret_flag &= ~PTR_MAYBE_NULL;
+
 		regs[BPF_REG_0].map_ptr = meta.map_ptr;
 		regs[BPF_REG_0].map_uid = meta.map_uid;
 		regs[BPF_REG_0].type = PTR_TO_MAP_VALUE | ret_flag;
-		if (!type_may_be_null(ret_type) &&
+		if (!type_may_be_null(ret_flag) &&
 		    btf_record_has_field(meta.map_ptr->record, BPF_SPIN_LOCK)) {
 			regs[BPF_REG_0].id = ++env->id_gen;
 		}
diff --git a/tools/testing/selftests/bpf/progs/iters.c b/tools/testing/selftests/bpf/progs/iters.c
index 7c969c127573..190822b2f08b 100644
--- a/tools/testing/selftests/bpf/progs/iters.c
+++ b/tools/testing/selftests/bpf/progs/iters.c
@@ -524,11 +524,11 @@ int iter_subprog_iters(const void *ctx)
 }

 struct {
-	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(type, BPF_MAP_TYPE_HASH);
 	__type(key, int);
 	__type(value, int);
 	__uint(max_entries, 1000);
-} arr_map SEC(".maps");
+} hash_map SEC(".maps");

 SEC("?raw_tp")
 __failure __msg("invalid mem access 'scalar'")
@@ -539,7 +539,7 @@ int iter_err_too_permissive1(const void *ctx)

 	MY_PID_GUARD();

-	map_val = bpf_map_lookup_elem(&arr_map, &key);
+	map_val = bpf_map_lookup_elem(&hash_map, &key);
 	if (!map_val)
 		return 0;

@@ -561,12 +561,12 @@ int iter_err_too_permissive2(const void *ctx)

 	MY_PID_GUARD();

-	map_val = bpf_map_lookup_elem(&arr_map, &key);
+	map_val = bpf_map_lookup_elem(&hash_map, &key);
 	if (!map_val)
 		return 0;

 	bpf_repeat(1000000) {
-		map_val = bpf_map_lookup_elem(&arr_map, &key);
+		map_val = bpf_map_lookup_elem(&hash_map, &key);
 	}

 	*map_val = 123;
@@ -585,7 +585,7 @@ int iter_err_too_permissive3(const void *ctx)
 	MY_PID_GUARD();

 	bpf_repeat(1000000) {
-		map_val = bpf_map_lookup_elem(&arr_map, &key);
+		map_val = bpf_map_lookup_elem(&hash_map, &key);
 		found = true;
 	}

@@ -606,7 +606,7 @@ int iter_tricky_but_fine(const void *ctx)
 	MY_PID_GUARD();

 	bpf_repeat(1000000) {
-		map_val = bpf_map_lookup_elem(&arr_map, &key);
+		map_val = bpf_map_lookup_elem(&hash_map, &key);
 		if (map_val) {
 			found = true;
 			break;
diff --git a/tools/testing/selftests/bpf/progs/map_kptr_fail.c b/tools/testing/selftests/bpf/progs/map_kptr_fail.c
index c2a6bd392e48..4c0ff01f1a96 100644
--- a/tools/testing/selftests/bpf/progs/map_kptr_fail.c
+++ b/tools/testing/selftests/bpf/progs/map_kptr_fail.c
@@ -345,7 +345,7 @@ int reject_indirect_global_func_access(struct __sk_buff *ctx)
 }

 SEC("?tc")
-__failure __msg("Unreleased reference id=5 alloc_insn=")
+__failure __msg("Unreleased reference id=4 alloc_insn=")
 int kptr_xchg_ref_state(struct __sk_buff *ctx)
 {
 	struct prog_test_ref_kfunc *p;
diff --git a/tools/testing/selftests/bpf/progs/verifier_map_in_map.c b/tools/testing/selftests/bpf/progs/verifier_map_in_map.c
index 4eaab1468eb7..7d088ba99ea5 100644
--- a/tools/testing/selftests/bpf/progs/verifier_map_in_map.c
+++ b/tools/testing/selftests/bpf/progs/verifier_map_in_map.c
@@ -47,7 +47,7 @@ l0_%=:	r0 = 0;						\

 SEC("xdp")
 __description("map in map state pruning")
-__success __msg("processed 26 insns")
+__success __msg("processed 15 insns")
 __log_level(2) __retval(0) __flag(BPF_F_TEST_STATE_FREQ)
 __naked void map_in_map_state_pruning(void)
 {
diff --git a/tools/testing/selftests/bpf/verifier/map_kptr.c b/tools/testing/selftests/bpf/verifier/map_kptr.c
index f420c0312aa0..4b39f8472f9b 100644
--- a/tools/testing/selftests/bpf/verifier/map_kptr.c
+++ b/tools/testing/selftests/bpf/verifier/map_kptr.c
@@ -373,7 +373,7 @@
 	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	.fixup_map_kptr = { 1 },
 	.result = REJECT,
-	.errstr = "Unreleased reference id=5 alloc_insn=20",
+	.errstr = "Unreleased reference id=4 alloc_insn=20",
 	.fixup_kfunc_btf_id = {
 		{ "bpf_kfunc_call_test_acquire", 15 },
 	}
-- 
2.46.0
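The stack-slot arithmetic at the start of `get_constant_map_key()` is easy to check by hand. The sketch below is a hypothetical userspace helper (`toy_key_spi()` is not a kernel function) that reproduces only the `slot`/`spi` computation, assuming `BPF_REG_SIZE` is 8 as in the kernel. It shows that keys at `r10 - 8` and `r10 - 4` both land in stack slot index 0, while a key at `r10 - 16` lands in index 1. A positive offset such as the `r10 + 4096` used by the `key_lookup_at_invalid_fp` selftest yields a negative index, which is presumably why the patch runs `check_helper_mem_access()` on the key before calling `get_constant_map_key()`.

```c
#include <assert.h>

#define BPF_REG_SIZE 8 /* bytes per verifier stack slot */

/* Hypothetical helper: maps a frame-pointer-relative key offset (e.g. -8
 * for r10 - 8) to a stack slot index the way get_constant_map_key() does. */
static int toy_key_spi(int stack_off)
{
	int slot = -stack_off - 1;  /* byte index counted down from r10 */

	return slot / BPF_REG_SIZE; /* C division truncates toward zero */
}
```

For example, `toy_key_spi(-9)` is 1 because byte `r10 - 9` is the first byte of the second 8-byte slot, and `toy_key_spi(4096)` is -512, an index that would be out of range for `state->stack[]`.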
On Sat, 2024-12-14 at 00:10 +0100, Kumar Kartikeya Dwivedi wrote:
[...]
> > @@ -11199,10 +11266,17 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> >  				"kernel subsystem misconfigured verifier\n");
> >  			return -EINVAL;
> >  		}
> > +		if (func_id == BPF_FUNC_map_lookup_elem &&
> > +		    can_elide_value_nullness(meta.map_ptr->map_type) &&
> > +		    meta.const_map_key >= 0 &&
> > +		    meta.const_map_key < meta.map_ptr->max_entries)
> > +			ret_flag &= ~PTR_MAYBE_NULL;
>
> I think we probably need mark_chain_precision applied on the constant key since its concrete value is made use of here to prevent pruning on it. If it's already happening and I missed it, I think we should at least add a comment.
> For context of a similar case with tail calls, see commit cc52d9140aa9 ("bpf: Fix record_func_key to perform backtracking on r3") for what happens when it is missed.

Great point, I'm sure this does not happen.
[...]
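To make the pruning concern above concrete, here is a deliberately tiny model. It is not the verifier's actual `regsafe()`/`stacksafe()` logic: `struct toy_state` and `toy_can_prune()` are hypothetical. It assumes, as a simplification of the real behavior, that a scalar never marked precise in an already-verified state matches any value in a new state, while a precise one must match (the real verifier compares ranges; here that is reduced to equality). Under that assumption, a state whose constant key was 1 (lookup proven non-NULL) could be used to prune a new state whose key is 3 (lookup may fail), unless the key was marked precise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced verifier state: only the constant map key
 * that decided whether the lookup result was nullable. */
struct toy_state {
	int64_t key;
	bool key_precise; /* was the key marked precise via backtracking? */
};

/* Returns true if cur would be considered equivalent to the already
 * verified state old, so exploring cur could be skipped. */
static bool toy_can_prune(const struct toy_state *old, const struct toy_state *cur)
{
	if (!old->key_precise)
		return true; /* imprecise scalar: any value is accepted */
	return old->key == cur->key; /* simplified: exact match only */
}
```

In this model, pruning the key-3 state against an imprecise key-1 state is the unsafe case: the path whose lookup can return NULL would never be checked. Marking the key precise (for instance via `mark_chain_precision()`, as suggested above) corresponds to the case where the two states no longer match.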
Test that nullness elision works for common use cases. For example, we want to check that both full and subreg stack slots are recognized, as well as the case where there are both const and non-const values of R2 leading up to a lookup, and of course some bounds checks.
Signed-off-by: Daniel Xu dxu@dxuuu.xyz
---
 .../bpf/progs/verifier_array_access.c         | 214 ++++++++++++++++++
 1 file changed, 214 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/verifier_array_access.c b/tools/testing/selftests/bpf/progs/verifier_array_access.c
index 4195aa824ba5..8ed8865fc6f6 100644
--- a/tools/testing/selftests/bpf/progs/verifier_array_access.c
+++ b/tools/testing/selftests/bpf/progs/verifier_array_access.c
@@ -28,6 +28,20 @@ struct {
 	__uint(map_flags, BPF_F_WRONLY_PROG);
 } map_array_wo SEC(".maps");

+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+	__uint(max_entries, 2);
+	__type(key, int);
+	__type(value, struct test_val);
+} map_array_pcpu SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 2);
+	__type(key, int);
+	__type(value, struct test_val);
+} map_array SEC(".maps");
+
 struct {
 	__uint(type, BPF_MAP_TYPE_HASH);
 	__uint(max_entries, 1);
@@ -525,4 +539,204 @@ l0_%=:	exit;						\
 	: __clobber_all);
 }

+SEC("socket")
+__description("valid map access into an array using constant without nullness")
+__success __retval(4)
+__naked void an_array_with_a_constant_no_nullness(void)
+{
+	asm volatile ("					\
+	r1 = 1;						\
+	*(u64*)(r10 - 8) = r1;				\
+	r2 = r10;					\
+	r2 += -8;					\
+	r1 = %[map_array] ll;				\
+	call %[bpf_map_lookup_elem];			\
+	r1 = %[test_val_foo];				\
+	*(u64*)(r0 + 0) = r1;				\
+	r0 = *(u64*)(r0 + 0);				\
+	exit;						\
+"	:
+	: __imm(bpf_map_lookup_elem),
+	  __imm_addr(map_array),
+	  __imm_const(test_val_foo, offsetof(struct test_val, foo))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("valid multiple map access into an array using constant without nullness")
+__success __retval(8)
+__naked void multiple_array_with_a_constant_no_nullness(void)
+{
+	asm volatile ("					\
+	r1 = 1;						\
+	*(u64*)(r10 - 8) = r1;				\
+	r2 = r10;					\
+	r2 += -8;					\
+	r1 = %[map_array] ll;				\
+	call %[bpf_map_lookup_elem];			\
+	r6 = %[test_val_foo];				\
+	*(u64*)(r0 + 0) = r6;				\
+	r7 = *(u64*)(r0 + 0);				\
+	r1 = 0;						\
+	*(u64*)(r10 - 16) = r1;				\
+	r2 = r10;					\
+	r2 += -16;					\
+	r1 = %[map_array] ll;				\
+	call %[bpf_map_lookup_elem];			\
+	*(u64*)(r0 + 0) = r6;				\
+	r1 = *(u64*)(r0 + 0);				\
+	r7 += r1;					\
+	r0 = r7;					\
+	exit;						\
+"	:
+	: __imm(bpf_map_lookup_elem),
+	  __imm_addr(map_array),
+	  __imm_const(test_val_foo, offsetof(struct test_val, foo))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("valid map access into an array using 32-bit constant without nullness")
+__success __retval(4)
+__naked void an_array_with_a_32bit_constant_no_nullness(void)
+{
+	/* 32-bit write must be to stack address aligned to BPF_REG_SIZE
+	 * so that the spill is tracked. Unaligned subreg writes are less
+	 * precisely tracked.
+	 */
+	asm volatile ("					\
+	r1 = 1;						\
+	*(u32*)(r10 - 8) = r1;				\
+	r2 = r10;					\
+	r2 += -8;					\
+	r1 = %[map_array] ll;				\
+	call %[bpf_map_lookup_elem];			\
+	r1 = %[test_val_foo];				\
+	*(u64*)(r0 + 0) = r1;				\
+	r0 = *(u64*)(r0 + 0);				\
+	exit;						\
+"	:
+	: __imm(bpf_map_lookup_elem),
+	  __imm_addr(map_array),
+	  __imm_const(test_val_foo, offsetof(struct test_val, foo))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("valid map access into an array using 32-bit constant 0 without nullness")
+__success __retval(4)
+__naked void an_array_with_a_32bit_constant_0_no_nullness(void)
+{
+	/* Unlike the above test, 32-bit zeroing is precisely tracked even
+	 * if writes are not aligned to BPF_REG_SIZE. This tests that our
+	 * STACK_ZERO handling functions.
+	 */
+	asm volatile ("					\
+	r1 = 0;						\
+	*(u32*)(r10 - 4) = r1;				\
+	r2 = r10;					\
+	r2 += -4;					\
+	r1 = %[map_array] ll;				\
+	call %[bpf_map_lookup_elem];			\
+	r1 = %[test_val_foo];				\
+	*(u64*)(r0 + 0) = r1;				\
+	r0 = *(u64*)(r0 + 0);				\
+	exit;						\
+"	:
+	: __imm(bpf_map_lookup_elem),
+	  __imm_addr(map_array),
+	  __imm_const(test_val_foo, offsetof(struct test_val, foo))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("valid map access into a pcpu array using constant without nullness")
+__success __retval(4)
+__naked void a_pcpu_array_with_a_constant_no_nullness(void)
+{
+	asm volatile ("					\
+	r1 = 1;						\
+	*(u64*)(r10 - 8) = r1;				\
+	r2 = r10;					\
+	r2 += -8;					\
+	r1 = %[map_array_pcpu] ll;			\
+	call %[bpf_map_lookup_elem];			\
+	r1 = %[test_val_foo];				\
+	*(u64*)(r0 + 0) = r1;				\
+	r0 = *(u64*)(r0 + 0);				\
+	exit;						\
+"	:
+	: __imm(bpf_map_lookup_elem),
+	  __imm_addr(map_array_pcpu),
+	  __imm_const(test_val_foo, offsetof(struct test_val, foo))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("invalid map access into an array using constant without nullness")
+__failure __msg("R0 invalid mem access 'map_value_or_null'")
+__naked void an_array_with_a_constant_no_nullness_out_of_bounds(void)
+{
+	asm volatile ("					\
+	r1 = 3;						\
+	*(u64*)(r10 - 8) = r1;				\
+	r2 = r10;					\
+	r2 += -8;					\
+	r1 = %[map_array] ll;				\
+	call %[bpf_map_lookup_elem];			\
+	r1 = %[test_val_foo];				\
+	*(u64*)(r0 + 0) = r1;				\
+	r0 = *(u64*)(r0 + 0);				\
+	exit;						\
+"	:
+	: __imm(bpf_map_lookup_elem),
+	  __imm_addr(map_array),
+	  __imm_const(test_val_foo, offsetof(struct test_val, foo))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("invalid elided lookup using const and non-const key")
+__failure __msg("R0 invalid mem access 'map_value_or_null'")
+__naked void mixed_const_and_non_const_key_lookup(void)
+{
+	asm volatile ("					\
+	call %[bpf_get_prandom_u32];			\
+	if r0 > 42 goto l1_%=;				\
+	*(u64*)(r10 - 8) = r0;				\
+	r2 = r10;					\
+	r2 += -8;					\
+	goto l0_%=;					\
+l1_%=:	r1 = 1;						\
+	*(u64*)(r10 - 8) = r1;				\
+	r2 = r10;					\
+	r2 += -8;					\
+l0_%=:	r1 = %[map_array] ll;				\
+	call %[bpf_map_lookup_elem];			\
+	r0 = *(u64*)(r0 + 0);				\
+	exit;						\
+"	:
+	: __imm(bpf_get_prandom_u32),
+	  __imm(bpf_map_lookup_elem),
+	  __imm_addr(map_array)
+	: __clobber_all);
+}
+
+SEC("socket")
+__failure __msg("invalid read from stack R2 off=4096 size=4")
+__naked void key_lookup_at_invalid_fp(void)
+{
+	asm volatile ("					\
+	r1 = %[map_array] ll;				\
+	r2 = r10;					\
+	r2 += 4096;					\
+	call %[bpf_map_lookup_elem];			\
+	r0 = *(u64*)(r0 + 0);				\
+	exit;						\
+"	:
+	: __imm(bpf_map_lookup_elem),
+	  __imm_addr(map_array)
+	: __clobber_all);
+}
+
 char _license[] SEC("license") = "GPL";
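The `mixed_const_and_non_const_key_lookup` selftest relies on the verifier checking each path to the lookup separately rather than merging them. The toy model below (hypothetical `struct toy_path` and `toy_unchecked_deref_ok()`, not verifier code) sketches the expected outcome under that assumption: an unchecked dereference of the lookup result is accepted only if every path supplies a constant, in-bounds key. On the fall-through path the key comes from `bpf_get_prandom_u32()` and is only bounded, not constant, so `R0` stays `map_value_or_null` there and the program is rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the map key on one explored path. */
struct toy_path {
	bool key_is_const; /* key was spilled as a known constant scalar */
	int64_t key;       /* meaningful only when key_is_const is true */
};

/* Returns true if dereferencing the lookup result without a NULL check
 * is safe on every path that reaches the lookup. */
static bool toy_unchecked_deref_ok(const struct toy_path *paths, size_t n,
				   uint32_t max_entries)
{
	for (size_t i = 0; i < n; i++) {
		if (!paths[i].key_is_const)
			return false; /* R0 stays map_value_or_null */
		if (paths[i].key < 0 || paths[i].key >= (int64_t)max_entries)
			return false; /* out of bounds: lookup may fail */
	}
	return true;
}
```

With `max_entries = 2`, paths `{ non-const, const 1 }` are rejected in this model, `{ const 0, const 1 }` are accepted, and a single path with key 3 is rejected, mirroring the out-of-bounds selftest.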
On Thu, 2024-12-12 at 16:22 -0700, Daniel Xu wrote:
> Test that nullness elision works for common use cases. For example, we want to check that both full and subreg stack slots are recognized. As well as when there's both const and non-const values of R2 leading up to a lookup. And obviously some bound checks.
> Signed-off-by: Daniel Xu dxu@dxuuu.xyz

Daniel,

since there would be a respin of this patch-set, maybe consider using plain C for some of the tests?

[...]
On Fri, Dec 13, 2024, at 10:17 PM, Eduard Zingerman wrote:
> On Thu, 2024-12-12 at 16:22 -0700, Daniel Xu wrote:
> > Test that nullness elision works for common use cases. For example, we want to check that both full and subreg stack slots are recognized. As well as when there's both const and non-const values of R2 leading up to a lookup. And obviously some bound checks.
> > Signed-off-by: Daniel Xu dxu@dxuuu.xyz
>
> Daniel,
>
> since there would be a respin of this patch-set, maybe consider using plain C for some of the tests?

Yeah, makes sense. Will do for v6.

Thanks,
Daniel
linux-kselftest-mirror@lists.linaro.org