This patch allows progs to elide a null check on statically known map lookup keys. In other words, if the verifier can statically prove that the lookup will be in-bounds, allow the prog to drop the null check.
This is useful for two reasons:
1. Large numbers of nullness checks (especially ones that cannot fail) unnecessarily push the prog towards BPF_COMPLEXITY_LIMIT_JMP_SEQ.
2. It forms a tighter contract between programmer and verifier.
For (1), bpftrace is starting to make heavier use of percpu scratch maps. As a result, user scripts with a large number of unrolled loops are starting to hit jump complexity verification errors. These percpu lookups cannot fail anyway, as we only use static key values. Eliding nullness probably results in less work for the verifier as well.
For (2), percpu scratch maps are often used as a larger stack, as the current stack is limited to 512 bytes. In these situations, it is desirable for the programmer to express: "this lookup should never fail, and if it does, it means I messed up the code". By omitting the null check, the programmer can "ask" the verifier to double check the logic.
Changes in v3:
* Check if stack is (erroneously) growing upwards
* Mention in commit message why existing tests needed change

Changes in v2:
* Added a check for when R2 is not a ptr to stack
* Added a check for when stack is uninitialized (no stack slot yet)
* Updated existing tests to account for null elision
* Added test case for when R2 can be both const and non-const

Daniel Xu (2):
  bpf: verifier: Support eliding map lookup nullness
  bpf: selftests: verifier: Add nullness elision tests

 kernel/bpf/verifier.c                         |  67 ++++++-
 tools/testing/selftests/bpf/progs/iters.c     |  14 +-
 .../selftests/bpf/progs/map_kptr_fail.c       |   2 +-
 .../bpf/progs/verifier_array_access.c         | 166 ++++++++++++++++++
 .../selftests/bpf/progs/verifier_map_in_map.c |   2 +-
 .../testing/selftests/bpf/verifier/map_kptr.c |   2 +-
 6 files changed, 242 insertions(+), 11 deletions(-)
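
The rule described in this cover letter can be modelled in plain userspace C. The following is only a sketch: the enum and the function `lookup_may_return_null()` are hypothetical stand-ins, not kernel code, and the constant key is assumed to be passed as -1 when it is not statically known.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's enum bpf_map_type values. */
enum map_type { MAP_HASH, MAP_ARRAY, MAP_PERCPU_ARRAY };

/* Array-like maps have an element for every index below max_entries. */
bool can_elide_value_nullness(enum map_type type)
{
	return type == MAP_ARRAY || type == MAP_PERCPU_ARRAY;
}

/*
 * Models whether the lookup result still has to be treated as
 * possibly NULL. const_key is -1 when the key is not statically known.
 */
bool lookup_may_return_null(enum map_type type, long const_key,
			    unsigned int max_entries)
{
	if (can_elide_value_nullness(type) &&
	    const_key >= 0 && const_key < (long)max_entries)
		return false;
	return true;
}
```

Under this model, a constant key of 1 into a two-entry array map needs no null check, while a key of 3 into the same map, an unknown key, or any lookup into a hash map still does.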
This commit allows progs to elide a null check on statically known map lookup keys. In other words, if the verifier can statically prove that the lookup will be in-bounds, allow the prog to drop the null check.
This is useful for two reasons:
1. Large numbers of nullness checks (especially ones that cannot fail) unnecessarily push the prog towards BPF_COMPLEXITY_LIMIT_JMP_SEQ.
2. It forms a tighter contract between programmer and verifier.
For (1), bpftrace is starting to make heavier use of percpu scratch maps. As a result, user scripts with a large number of unrolled loops are starting to hit jump complexity verification errors. These percpu lookups cannot fail anyway, as we only use static key values. Eliding nullness probably results in less work for the verifier as well.
For (2), percpu scratch maps are often used as a larger stack, as the current stack is limited to 512 bytes. In these situations, it is desirable for the programmer to express: "this lookup should never fail, and if it does, it means I messed up the code". By omitting the null check, the programmer can "ask" the verifier to double check the logic.
Tests also have to be updated in sync with these changes, as the verifier is more efficient with this change. Notably, the iters.c tests had to be changed to use a map type that still requires null checks, as they exercise verifier tracking logic w.r.t. iterators.
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 kernel/bpf/verifier.c                         | 67 ++++++++++++++++++-
 tools/testing/selftests/bpf/progs/iters.c     | 14 ++--
 .../selftests/bpf/progs/map_kptr_fail.c       |  2 +-
 .../selftests/bpf/progs/verifier_map_in_map.c |  2 +-
 .../testing/selftests/bpf/verifier/map_kptr.c |  2 +-
 5 files changed, 76 insertions(+), 11 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index dd86282ccaa4..cff745d484df 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -284,6 +284,7 @@ struct bpf_call_arg_meta {
 	u32 ret_btf_id;
 	u32 subprogno;
 	struct btf_field *kptr_field;
+	long const_map_key;
 };
 
 struct bpf_kfunc_call_arg_meta {
@@ -10416,6 +10417,54 @@ static void update_loop_inline_state(struct bpf_verifier_env *env, u32 subprogno
 				 state->callback_subprogno == subprogno);
 }
 
+/* Returns whether or not the given map type can potentially elide
+ * lookup return value nullness check. This is possible if the key
+ * is statically known.
+ */
+static bool can_elide_value_nullness(enum bpf_map_type type)
+{
+	switch (type) {
+	case BPF_MAP_TYPE_ARRAY:
+	case BPF_MAP_TYPE_PERCPU_ARRAY:
+		return true;
+	default:
+		return false;
+	}
+}
+
+/* Returns constant key value if possible, else -1 */
+static long get_constant_map_key(struct bpf_verifier_env *env,
+				 struct bpf_reg_state *key)
+{
+	struct bpf_func_state *state = func(env, key);
+	struct bpf_reg_state *reg;
+	int stack_off;
+	int slot;
+	int spi;
+
+	if (key->type != PTR_TO_STACK)
+		return -1;
+	if (!tnum_is_const(key->var_off))
+		return -1;
+
+	stack_off = key->off + key->var_off.value;
+	slot = -stack_off - 1;
+	if (slot < 0)
+		/* Stack grew upwards */
+		return -1;
+	else if (slot >= state->allocated_stack)
+		/* Stack uninitialized */
+		return -1;
+
+	spi = slot / BPF_REG_SIZE;
+	reg = &state->stack[spi].spilled_ptr;
+	if (!tnum_is_const(reg->var_off))
+		/* Stack value not statically known */
+		return -1;
+
+	return reg->var_off.value;
+}
+
 static int get_helper_proto(struct bpf_verifier_env *env, int func_id,
 			    const struct bpf_func_proto **ptr)
 {
@@ -10513,6 +10562,15 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 		env->insn_aux_data[insn_idx].storage_get_func_atomic = true;
 	}
 
+	/* Logically we are trying to check on key register state before
+	 * the helper is called, so process here. Otherwise argument processing
+	 * may clobber the spilled key values.
+	 */
+	regs = cur_regs(env);
+	if (func_id == BPF_FUNC_map_lookup_elem)
+		meta.const_map_key = get_constant_map_key(env, &regs[BPF_REG_2]);
+
+
 	meta.func_id = func_id;
 	/* check args */
 	for (i = 0; i < MAX_BPF_FUNC_REG_ARGS; i++) {
@@ -10773,10 +10831,17 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
 				"kernel subsystem misconfigured verifier\n");
 			return -EINVAL;
 		}
+
+		if (func_id == BPF_FUNC_map_lookup_elem &&
+		    can_elide_value_nullness(meta.map_ptr->map_type) &&
+		    meta.const_map_key >= 0 &&
+		    meta.const_map_key < meta.map_ptr->max_entries)
+			ret_flag &= ~PTR_MAYBE_NULL;
+
 		regs[BPF_REG_0].map_ptr = meta.map_ptr;
 		regs[BPF_REG_0].map_uid = meta.map_uid;
 		regs[BPF_REG_0].type = PTR_TO_MAP_VALUE | ret_flag;
-		if (!type_may_be_null(ret_type) &&
+		if (!type_may_be_null(regs[BPF_REG_0].type) &&
 		    btf_record_has_field(meta.map_ptr->record, BPF_SPIN_LOCK)) {
 			regs[BPF_REG_0].id = ++env->id_gen;
 		}
diff --git a/tools/testing/selftests/bpf/progs/iters.c b/tools/testing/selftests/bpf/progs/iters.c
index ef70b88bccb2..24e6cd946396 100644
--- a/tools/testing/selftests/bpf/progs/iters.c
+++ b/tools/testing/selftests/bpf/progs/iters.c
@@ -524,11 +524,11 @@ int iter_subprog_iters(const void *ctx)
 }
 
 struct {
-	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(type, BPF_MAP_TYPE_HASH);
 	__type(key, int);
 	__type(value, int);
 	__uint(max_entries, 1000);
-} arr_map SEC(".maps");
+} hash_map SEC(".maps");
 
 SEC("?raw_tp")
 __failure __msg("invalid mem access 'scalar'")
@@ -539,7 +539,7 @@ int iter_err_too_permissive1(const void *ctx)
 
 	MY_PID_GUARD();
 
-	map_val = bpf_map_lookup_elem(&arr_map, &key);
+	map_val = bpf_map_lookup_elem(&hash_map, &key);
 	if (!map_val)
 		return 0;
 
@@ -561,12 +561,12 @@ int iter_err_too_permissive2(const void *ctx)
 
 	MY_PID_GUARD();
 
-	map_val = bpf_map_lookup_elem(&arr_map, &key);
+	map_val = bpf_map_lookup_elem(&hash_map, &key);
 	if (!map_val)
 		return 0;
 
 	bpf_repeat(1000000) {
-		map_val = bpf_map_lookup_elem(&arr_map, &key);
+		map_val = bpf_map_lookup_elem(&hash_map, &key);
 	}
 
 	*map_val = 123;
@@ -585,7 +585,7 @@ int iter_err_too_permissive3(const void *ctx)
 	MY_PID_GUARD();
 
 	bpf_repeat(1000000) {
-		map_val = bpf_map_lookup_elem(&arr_map, &key);
+		map_val = bpf_map_lookup_elem(&hash_map, &key);
 		found = true;
 	}
 
@@ -606,7 +606,7 @@ int iter_tricky_but_fine(const void *ctx)
 	MY_PID_GUARD();
 
 	bpf_repeat(1000000) {
-		map_val = bpf_map_lookup_elem(&arr_map, &key);
+		map_val = bpf_map_lookup_elem(&hash_map, &key);
 		if (map_val) {
 			found = true;
 			break;
diff --git a/tools/testing/selftests/bpf/progs/map_kptr_fail.c b/tools/testing/selftests/bpf/progs/map_kptr_fail.c
index 450bb373b179..c4a81d1c1354 100644
--- a/tools/testing/selftests/bpf/progs/map_kptr_fail.c
+++ b/tools/testing/selftests/bpf/progs/map_kptr_fail.c
@@ -345,7 +345,7 @@ int reject_indirect_global_func_access(struct __sk_buff *ctx)
 }
 
 SEC("?tc")
-__failure __msg("Unreleased reference id=5 alloc_insn=")
+__failure __msg("Unreleased reference id=4 alloc_insn=")
 int kptr_xchg_ref_state(struct __sk_buff *ctx)
 {
 	struct prog_test_ref_kfunc *p;
diff --git a/tools/testing/selftests/bpf/progs/verifier_map_in_map.c b/tools/testing/selftests/bpf/progs/verifier_map_in_map.c
index 4eaab1468eb7..7d088ba99ea5 100644
--- a/tools/testing/selftests/bpf/progs/verifier_map_in_map.c
+++ b/tools/testing/selftests/bpf/progs/verifier_map_in_map.c
@@ -47,7 +47,7 @@ l0_%=:	r0 = 0;						\
 
 SEC("xdp")
 __description("map in map state pruning")
-__success __msg("processed 26 insns")
+__success __msg("processed 15 insns")
 __log_level(2) __retval(0) __flag(BPF_F_TEST_STATE_FREQ)
 __naked void map_in_map_state_pruning(void)
 {
diff --git a/tools/testing/selftests/bpf/verifier/map_kptr.c b/tools/testing/selftests/bpf/verifier/map_kptr.c
index f420c0312aa0..4b39f8472f9b 100644
--- a/tools/testing/selftests/bpf/verifier/map_kptr.c
+++ b/tools/testing/selftests/bpf/verifier/map_kptr.c
@@ -373,7 +373,7 @@
 	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	.fixup_map_kptr = { 1 },
 	.result = REJECT,
-	.errstr = "Unreleased reference id=5 alloc_insn=20",
+	.errstr = "Unreleased reference id=4 alloc_insn=20",
 	.fixup_kfunc_btf_id = {
 		{ "bpf_kfunc_call_test_acquire", 15 },
 	}
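
The offset-to-slot arithmetic in get_constant_map_key() is easy to misread, so here is a minimal userspace model of it. This is a sketch under stated assumptions: `struct slot_state` and `model_constant_map_key()` are hypothetical simplifications of the verifier's stack state, and `REG_SIZE` simply mirrors BPF_REG_SIZE (8 bytes).

```c
#include <stdbool.h>

#define REG_SIZE 8	/* mirrors BPF_REG_SIZE: one stack slot is 8 bytes */

/* Hypothetical, simplified stand-in for a spilled register's state. */
struct slot_state {
	bool is_const;	/* models tnum_is_const(reg->var_off) */
	long value;	/* models reg->var_off.value */
};

/*
 * stack_off is the key pointer's offset from the frame pointer
 * (negative for a valid stack address). Returns the constant key
 * stored there, or -1 when it cannot be determined.
 */
long model_constant_map_key(const struct slot_state *stack,
			    int allocated_stack, int stack_off)
{
	int slot = -stack_off - 1;
	int spi;

	if (slot < 0)			/* at or above the frame pointer */
		return -1;
	if (slot >= allocated_stack)	/* no stack slot allocated yet */
		return -1;

	spi = slot / REG_SIZE;		/* bytes fp-1..fp-8 map to index 0 */
	if (!stack[spi].is_const)
		return -1;
	return stack[spi].value;
}
```

With 16 bytes of allocated stack, offsets -8 and -4 both resolve to index 0, offset -16 resolves to index 1, and an offset such as 4096 is rejected before any array access happens.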
On Tue, 2024-09-24 at 04:40 -0600, Daniel Xu wrote:
> This commit allows progs to elide a null check on statically known map lookup keys. In other words, if the verifier can statically prove that the lookup will be in-bounds, allow the prog to drop the null check.
>
> This is useful for two reasons:
>
> 1. Large numbers of nullness checks (especially ones that cannot fail) unnecessarily push the prog towards BPF_COMPLEXITY_LIMIT_JMP_SEQ.
> 2. It forms a tighter contract between programmer and verifier.
>
> For (1), bpftrace is starting to make heavier use of percpu scratch maps. As a result, user scripts with a large number of unrolled loops are starting to hit jump complexity verification errors. These percpu lookups cannot fail anyway, as we only use static key values. Eliding nullness probably results in less work for the verifier as well.
>
> For (2), percpu scratch maps are often used as a larger stack, as the current stack is limited to 512 bytes. In these situations, it is desirable for the programmer to express: "this lookup should never fail, and if it does, it means I messed up the code". By omitting the null check, the programmer can "ask" the verifier to double check the logic.
>
> Tests also have to be updated in sync with these changes, as the verifier is more efficient with this change. Notably, the iters.c tests had to be changed to use a map type that still requires null checks, as they exercise verifier tracking logic w.r.t. iterators.
>
> Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
[...]
> +/* Returns constant key value if possible, else -1 */
> +static long get_constant_map_key(struct bpf_verifier_env *env,
> +				 struct bpf_reg_state *key)
> +{
> +	struct bpf_func_state *state = func(env, key);
> +	struct bpf_reg_state *reg;
> +	int stack_off;
> +	int slot;
> +	int spi;
> +
> +	if (key->type != PTR_TO_STACK)
> +		return -1;
> +	if (!tnum_is_const(key->var_off))
> +		return -1;
> +
> +	stack_off = key->off + key->var_off.value;
> +	slot = -stack_off - 1;
> +	if (slot < 0)
> +		/* Stack grew upwards */
> +		return -1;
Nitpick: I'd also add a test like below:
SEC("socket")
__failure __msg("invalid indirect access to stack R2 off=4096 size=4")
__naked void key_lookup_at_invalid_fp(void)
{
	asm volatile ("					\
	r1 = %[map_array] ll;				\
	r2 = r10;					\
	r2 += 4096;					\
	call %[bpf_map_lookup_elem];			\
	r0 = *(u64*)(r0 + 0);				\
	exit;						\
"	:
	: __imm(bpf_map_lookup_elem),
	  __imm_addr(map_array)
	: __clobber_all);
}
(double checked with v2 and this test does cause page fault)
[...]
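
One plausible reading of why this test faulted on v2 is that, without the `slot < 0` guard, the stack array index itself goes negative. The sketch below only reproduces that index arithmetic in userspace; `unguarded_spi()` is a hypothetical name and `REG_SIZE` mirrors BPF_REG_SIZE.

```c
#define REG_SIZE 8	/* mirrors BPF_REG_SIZE */

/*
 * Index into the stack state array computed from the key pointer's
 * frame-pointer offset, deliberately without any bounds check.
 */
int unguarded_spi(int stack_off)
{
	int slot = -stack_off - 1;

	return slot / REG_SIZE;	/* C integer division truncates toward zero */
}
```

For `r2 = r10 + 4096` the offset is 4096, so the slot is -4097 and the resulting index is far below zero, which would land outside the array if it were ever dereferenced.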
On Tue, Sep 24, 2024 at 12:40 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
> +/* Returns constant key value if possible, else -1 */
> +static long get_constant_map_key(struct bpf_verifier_env *env,
> +				 struct bpf_reg_state *key)
> +{
> +	struct bpf_func_state *state = func(env, key);
> +	struct bpf_reg_state *reg;
> +	int stack_off;
> +	int slot;
> +	int spi;
> +
> +	if (key->type != PTR_TO_STACK)
> +		return -1;
> +	if (!tnum_is_const(key->var_off))
> +		return -1;
> +
> +	stack_off = key->off + key->var_off.value;
> +	slot = -stack_off - 1;
> +	if (slot < 0)
> +		/* Stack grew upwards */

The comment is misleading. The verifier is supposed to catch this. It's just this helper was called before the stack bounds were checked? Maybe the call can be done later?

> +		return -1;
> +	else if (slot >= state->allocated_stack)
> +		/* Stack uninitialized */
> +		return -1;
> +
> +	spi = slot / BPF_REG_SIZE;
> +	reg = &state->stack[spi].spilled_ptr;
> +	if (!tnum_is_const(reg->var_off))
> +		/* Stack value not statically known */
> +		return -1;
> +
> +	return reg->var_off.value;
> +}

Looks like the code is more subtle than it looks.

I think it's better to guard it all with CAP_BPF.
pw-bot: cr
On Wed, Sep 25, 2024 at 10:24:01AM GMT, Alexei Starovoitov wrote:
> On Tue, Sep 24, 2024 at 12:40 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
> > +/* Returns constant key value if possible, else -1 */
> > +static long get_constant_map_key(struct bpf_verifier_env *env,
> > +				 struct bpf_reg_state *key)
> > +{
> > +	struct bpf_func_state *state = func(env, key);
> > +	struct bpf_reg_state *reg;
> > +	int stack_off;
> > +	int slot;
> > +	int spi;
> > +
> > +	if (key->type != PTR_TO_STACK)
> > +		return -1;
> > +	if (!tnum_is_const(key->var_off))
> > +		return -1;
> > +
> > +	stack_off = key->off + key->var_off.value;
> > +	slot = -stack_off - 1;
> > +	if (slot < 0)
> > +		/* Stack grew upwards */
>
> The comment is misleading. The verifier is supposed to catch this. It's just this helper was called before the stack bounds were checked?

Yeah. Stack bounds are checked in check_stack_access_within_bounds() as part of the helper call argument checks.

> Maybe the call can be done later?

Maybe? The argument checking starts clobbering state, so it probably won't be simple to pull the information out after the args are checked.

I think the logic will probably be much easier to follow with the current approach. But maybe I'm missing a simpler idea.

> > +		return -1;
> > +	else if (slot >= state->allocated_stack)
> > +		/* Stack uninitialized */
> > +		return -1;
> > +
> > +	spi = slot / BPF_REG_SIZE;
> > +	reg = &state->stack[spi].spilled_ptr;
> > +	if (!tnum_is_const(reg->var_off))
> > +		/* Stack value not statically known */
> > +		return -1;
> > +
> > +	return reg->var_off.value;
> > +}
>
> Looks like the code is more subtle than it looks.
>
> I think it's better to guard it all with CAP_BPF.
Ack.
Hit send too early.
On Tue, Oct 1, 2024, at 5:07 PM, Daniel Xu wrote:
> On Wed, Sep 25, 2024 at 10:24:01AM GMT, Alexei Starovoitov wrote:
> > On Tue, Sep 24, 2024 at 12:40 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
> > > +/* Returns constant key value if possible, else -1 */
> > > +static long get_constant_map_key(struct bpf_verifier_env *env,
> > > +				 struct bpf_reg_state *key)
> > > +{
> > > +	struct bpf_func_state *state = func(env, key);
> > > +	struct bpf_reg_state *reg;
> > > +	int stack_off;
> > > +	int slot;
> > > +	int spi;
> > > +
> > > +	if (key->type != PTR_TO_STACK)
> > > +		return -1;
> > > +	if (!tnum_is_const(key->var_off))
> > > +		return -1;
> > > +
> > > +	stack_off = key->off + key->var_off.value;
> > > +	slot = -stack_off - 1;
> > > +	if (slot < 0)
> > > +		/* Stack grew upwards */
> >
> > The comment is misleading. The verifier is supposed to catch this. It's just this helper was called before the stack bounds were checked?
>
> Yeah. Stack bounds are checked in check_stack_access_within_bounds() as part of the helper call argument checks.
>
> > Maybe the call can be done later?
>
> Maybe? The argument checking starts clobbering state, so it probably won't be simple to pull the information out after the args are checked.
>
> I think the logic will probably be much easier to follow with the current approach. But maybe I'm missing a simpler idea.
I can make the comment a bit more verbose. Maybe that's better than trying to wire a bunch of logic through memory access checks.
Test that nullness elision works for common use cases. For example, we want to check that both full and subreg stack slots are recognized. We also cover the case where both const and non-const values of R2 lead up to a lookup, plus some bounds checks.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
---
 .../bpf/progs/verifier_array_access.c         | 166 ++++++++++++++++++
 1 file changed, 166 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/verifier_array_access.c b/tools/testing/selftests/bpf/progs/verifier_array_access.c
index 95d7ecc12963..2e74504ddbb5 100644
--- a/tools/testing/selftests/bpf/progs/verifier_array_access.c
+++ b/tools/testing/selftests/bpf/progs/verifier_array_access.c
@@ -28,6 +28,20 @@ struct {
 	__uint(map_flags, BPF_F_WRONLY_PROG);
 } map_array_wo SEC(".maps");
 
+struct {
+	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
+	__uint(max_entries, 2);
+	__type(key, int);
+	__type(value, struct test_val);
+} map_array_pcpu SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 2);
+	__type(key, int);
+	__type(value, struct test_val);
+} map_array SEC(".maps");
+
 struct {
 	__uint(type, BPF_MAP_TYPE_HASH);
 	__uint(max_entries, 1);
@@ -526,4 +540,156 @@ l0_%=:	exit;						\
 	: __clobber_all);
 }
 
+SEC("socket")
+__description("valid map access into an array using constant without nullness")
+__success __retval(4)
+__naked void an_array_with_a_constant_no_nullness(void)
+{
+	asm volatile ("					\
+	r1 = 1;						\
+	*(u64*)(r10 - 8) = r1;				\
+	r2 = r10;					\
+	r2 += -8;					\
+	r1 = %[map_array] ll;				\
+	call %[bpf_map_lookup_elem];			\
+	r1 = %[test_val_foo];				\
+	*(u64*)(r0 + 0) = r1;				\
+	r0 = *(u64*)(r0 + 0);				\
+	exit;						\
+"	:
+	: __imm(bpf_map_lookup_elem),
+	  __imm_addr(map_array),
+	  __imm_const(test_val_foo, offsetof(struct test_val, foo))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("valid multiple map access into an array using constant without nullness")
+__success __retval(8)
+__naked void multiple_array_with_a_constant_no_nullness(void)
+{
+	asm volatile ("					\
+	r1 = 1;						\
+	*(u64*)(r10 - 8) = r1;				\
+	r2 = r10;					\
+	r2 += -8;					\
+	r1 = %[map_array] ll;				\
+	call %[bpf_map_lookup_elem];			\
+	r6 = %[test_val_foo];				\
+	*(u64*)(r0 + 0) = r6;				\
+	r7 = *(u64*)(r0 + 0);				\
+	r1 = 0;						\
+	*(u64*)(r10 - 16) = r1;				\
+	r2 = r10;					\
+	r2 += -16;					\
+	r1 = %[map_array] ll;				\
+	call %[bpf_map_lookup_elem];			\
+	*(u64*)(r0 + 0) = r6;				\
+	r1 = *(u64*)(r0 + 0);				\
+	r7 += r1;					\
+	r0 = r7;					\
+	exit;						\
+"	:
+	: __imm(bpf_map_lookup_elem),
+	  __imm_addr(map_array),
+	  __imm_const(test_val_foo, offsetof(struct test_val, foo))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("valid map access into an array using 32-bit constant without nullness")
+__success __retval(4)
+__naked void an_array_with_a_32bit_constant_no_nullness(void)
+{
+	asm volatile ("					\
+	r1 = 1;						\
+	*(u32*)(r10 - 4) = r1;				\
+	r2 = r10;					\
+	r2 += -4;					\
+	r1 = %[map_array] ll;				\
+	call %[bpf_map_lookup_elem];			\
+	r1 = %[test_val_foo];				\
+	*(u64*)(r0 + 0) = r1;				\
+	r0 = *(u64*)(r0 + 0);				\
+	exit;						\
+"	:
+	: __imm(bpf_map_lookup_elem),
+	  __imm_addr(map_array),
+	  __imm_const(test_val_foo, offsetof(struct test_val, foo))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("valid map access into a pcpu array using constant without nullness")
+__success __retval(4)
+__naked void a_pcpu_array_with_a_constant_no_nullness(void)
+{
+	asm volatile ("					\
+	r1 = 1;						\
+	*(u64*)(r10 - 8) = r1;				\
+	r2 = r10;					\
+	r2 += -8;					\
+	r1 = %[map_array_pcpu] ll;			\
+	call %[bpf_map_lookup_elem];			\
+	r1 = %[test_val_foo];				\
+	*(u64*)(r0 + 0) = r1;				\
+	r0 = *(u64*)(r0 + 0);				\
+	exit;						\
+"	:
+	: __imm(bpf_map_lookup_elem),
+	  __imm_addr(map_array_pcpu),
+	  __imm_const(test_val_foo, offsetof(struct test_val, foo))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("invalid map access into an array using constant without nullness")
+__failure __msg("R0 invalid mem access 'map_value_or_null'")
+__naked void an_array_with_a_constant_no_nullness_out_of_bounds(void)
+{
+	asm volatile ("					\
+	r1 = 3;						\
+	*(u64*)(r10 - 8) = r1;				\
+	r2 = r10;					\
+	r2 += -8;					\
+	r1 = %[map_array] ll;				\
+	call %[bpf_map_lookup_elem];			\
+	r1 = %[test_val_foo];				\
+	*(u64*)(r0 + 0) = r1;				\
+	r0 = *(u64*)(r0 + 0);				\
+	exit;						\
+"	:
+	: __imm(bpf_map_lookup_elem),
+	  __imm_addr(map_array),
+	  __imm_const(test_val_foo, offsetof(struct test_val, foo))
+	: __clobber_all);
+}
+
+SEC("socket")
+__description("invalid elided lookup using const and non-const key")
+__failure __msg("R0 invalid mem access 'map_value_or_null'")
+__naked void mixed_const_and_non_const_key_lookup(void)
+{
+	asm volatile ("					\
+	call %[bpf_get_prandom_u32];			\
+	if r0 > 42 goto l1_%=;				\
+	*(u64*)(r10 - 8) = r0;				\
+	r2 = r10;					\
+	r2 += -8;					\
+	goto l0_%=;					\
+l1_%=:	r1 = 1;						\
+	*(u64*)(r10 - 8) = r1;				\
+	r2 = r10;					\
+	r2 += -8;					\
+l0_%=:	r1 = %[map_array] ll;				\
+	call %[bpf_map_lookup_elem];			\
+	r0 = *(u64*)(r0 + 0);				\
+	exit;						\
+"	:
+	: __imm(bpf_get_prandom_u32),
+	  __imm(bpf_map_lookup_elem),
+	  __imm_addr(map_array)
+	: __clobber_all);
+}
+
 char _license[] SEC("license") = "GPL";
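
The mixed const and non-const test hinges on what "statically known" means for a spilled key. As a rough illustration, here is a simplified userspace copy of the verifier's tristate number and its constness check. The struct layout and `tnum_is_const()` follow the kernel's definitions as I understand them; the example value with mask 0x3f is only an assumed stand-in for a scalar bounded to [0, 42], not something taken from a verifier log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified tristate number: bits set in mask are unknown. */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

/* Fully unknown 64-bit value. */
const struct tnum tnum_unknown = { .value = 0, .mask = UINT64_MAX };

/* Builds a tnum whose every bit is known. */
struct tnum tnum_const(uint64_t value)
{
	return (struct tnum){ .value = value, .mask = 0 };
}

/* A tnum is a known constant only when no bit is unknown. */
bool tnum_is_const(struct tnum a)
{
	return a.mask == 0;
}
```

In this model the branch that stores `r1 = 1` yields a constant key, whereas the branch that stores the bounded random value still has unknown bits, so on that path the lookup result would keep its may-be-null marking.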