From: Xu Kuohai xukuohai@huawei.com
With lsm return value check, the no-alu32 version test_libbpf_get_fd_by_id_opts is rejected by the verifier, and the log says:
0: R1=ctx() R10=fp0 ; int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) @ test_libbpf_get_fd_by_id_opts.c:27 0: (b7) r0 = 0 ; R0_w=0 1: (79) r2 = *(u64 *)(r1 +0) func 'bpf_lsm_bpf_map' arg0 has btf_id 916 type STRUCT 'bpf_map' 2: R1=ctx() R2_w=trusted_ptr_bpf_map() ; if (map != (struct bpf_map *)&data_input) @ test_libbpf_get_fd_by_id_opts.c:29 2: (18) r3 = 0xffff9742c0951a00 ; R3_w=map_ptr(map=data_input,ks=4,vs=4) 4: (5d) if r2 != r3 goto pc+4 ; R2_w=trusted_ptr_bpf_map() R3_w=map_ptr(map=data_input,ks=4,vs=4) ; int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) @ test_libbpf_get_fd_by_id_opts.c:27 5: (79) r0 = *(u64 *)(r1 +8) ; R0_w=scalar() R1=ctx() ; if (fmode & FMODE_WRITE) @ test_libbpf_get_fd_by_id_opts.c:32 6: (67) r0 <<= 62 ; R0_w=scalar(smax=0x4000000000000000,umax=0xc000000000000000,smin32=0,smax32=umax32=0,var_off=(0x0; 0xc000000000000000)) 7: (c7) r0 s>>= 63 ; R0_w=scalar(smin=smin32=-1,smax=smax32=0) ; @ test_libbpf_get_fd_by_id_opts.c:0 8: (57) r0 &= -13 ; R0_w=scalar(smax=0x7ffffffffffffff3,umax=0xfffffffffffffff3,smax32=0x7ffffff3,umax32=0xfffffff3,var_off=(0x0; 0xfffffffffffffff3)) ; int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) @ test_libbpf_get_fd_by_id_opts.c:27 9: (95) exit
And here is the C code of the prog.
SEC("lsm/bpf_map") int BPF_PROG(check_access, struct bpf_map *map, fmode_t fmode) { if (map != (struct bpf_map *)&data_input) return 0;
if (fmode & FMODE_WRITE) return -EACCES;
return 0; }
It is clear that the prog can only return either 0 or -EACCESS, and both values are legal.
So why is it rejected by the verifier?
The verifier log shows that the second if and return value setting statements in the prog is optimized to bitwise operations "r0 s>>= 63" and "r0 &= -13". The verifier correctly deduces that the value of r0 is in the range [-1, 0] after verifing instruction "r0 s>>= 63". But when the verifier proceeds to verify instruction "r0 &= -13", it fails to deduce the correct value range of r0.
7: (c7) r0 s>>= 63 ; R0_w=scalar(smin=smin32=-1,smax=smax32=0) 8: (57) r0 &= -13 ; R0_w=scalar(smax=0x7ffffffffffffff3,umax=0xfffffffffffffff3,smax32=0x7ffffff3,umax32=0xfffffff3,var_off=(0x0; 0xfffffffffffffff3))
So why the verifier fails to deduce the result of 'r0 &= -13'?
The verifier uses tnum to track values, and the two ranges "[-1, 0]" and "[0, -1ULL]" are encoded to the same tnum. When verifing instruction "r0 &= -13", the verifier erroneously deduces the result from "[0, -1ULL] AND -13", which is out of the expected return range [-4095, 0].
As explained by Eduard in [0], the clang transformation that generates this pattern is located in DAGCombiner::SimplifySelectCC() method (see [1]).
The transformation happens as a part of DAG to DAG rewrites (LLVM uses several internal representations: - generic optimizer uses LLVM IR, most of the work is done using this representation; - before instruction selection IR is converted to Selection DAG, some optimizations are applied at this stage, all such optimizations are a set of pattern replacements; - Selection DAG is converted to machine code, some optimizations are applied at the machine code level).
Full pattern is described as follows:
// fold (select_cc seteq (and x, y), 0, 0, A) -> (and (sra (shl x)) A) // where y is has a single bit set. // A plaintext description would be, we can turn the SELECT_CC into an AND // when the condition can be materialized as an all-ones register. Any // single bit-test can be materialized as an all-ones register with // shift-left and shift-right-arith.
For this particular test case the DAG is converted as follows:
.---------------- lhs The meaning of this select_cc is: | .------- rhs `lhs == rhs ? true value : false value` | | .----- true value | | | .-- false value v v v v (select_cc seteq (and X 2) 0 0 -13) ^ -> '---------------. (and (sra (sll X 62) 63) | -13) | | Before pattern is applied, it checks that second 'and' operand has only one bit set, (which is true for '2').
The pattern itself generates logical shift left / arithmetic shift right pair, that ensures that result is either all ones (-1) or all zeros (0). Hence, applying 'and' to shifts result and false value generates a correct result.
As suggested by Eduard and Andrii, this patch makes a special case for source or destination register of '&=' operation being in range [-1, 0].
Meaning that one of the '&=' operands is either: - all ones, in which case the counterpart is the result of the operation; - all zeros, in which case zero is the result of the operation.
That is, the result is equivalent to adding 0 to the counterpart. And MIN and MAX values could be deduced based on these observations.
[0] https://lore.kernel.org/bpf/e62e2971301ca7f2e9eb74fc500c520285cad8f5.camel@g... [1] https://github.com/llvm/llvm-project/blob/4523a267829c807f3fc8fab8e5e9613985...
Suggested-by: Eduard Zingerman eddyz87@gmail.com Suggested-by: Andrii Nakryiko andrii@kernel.org Signed-off-by: Xu Kuohai xukuohai@huawei.com --- include/linux/tnum.h | 3 ++ kernel/bpf/tnum.c | 25 +++++++++++++++++ kernel/bpf/verifier.c | 64 +++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 92 insertions(+)
diff --git a/include/linux/tnum.h b/include/linux/tnum.h index 3c13240077b8..5e795d728b9f 100644 --- a/include/linux/tnum.h +++ b/include/linux/tnum.h @@ -52,6 +52,9 @@ struct tnum tnum_mul(struct tnum a, struct tnum b); /* Return a tnum representing numbers satisfying both @a and @b */ struct tnum tnum_intersect(struct tnum a, struct tnum b);
+/* Return a tnum representing numbers satisfying either @a or @b */ +struct tnum tnum_union(struct tnum a, struct tnum b); + /* Return @a with all but the lowest @size bytes cleared */ struct tnum tnum_cast(struct tnum a, u8 size);
diff --git a/kernel/bpf/tnum.c b/kernel/bpf/tnum.c index 9dbc31b25e3d..8028ce06fc1e 100644 --- a/kernel/bpf/tnum.c +++ b/kernel/bpf/tnum.c @@ -150,6 +150,31 @@ struct tnum tnum_intersect(struct tnum a, struct tnum b) return TNUM(v & ~mu, mu); }
+/* Each bit has 3 states: unknown, known 0, known 1. Using x to represent + * unknown state, the result of the union of two bits is as follows: + * + * | x 0 1 + * -----+------------ + * x | x x x + * 0 | x 0 x + * 1 | x x 1 + * + * For tnum a and b, only the bits that are both known 0 or known 1 in a + * and b are known in the result of union a and b. + */ +struct tnum tnum_union(struct tnum a, struct tnum b) +{ + u64 v0, v1, mu; + + /* unknown bits either in a or b */ + mu = a.mask | b.mask; + /* "known 1" bits in both a and b */ + v1 = (a.value & b.value) & ~mu; + /* "known 0" bits in both a and b */ + v0 = (~a.value & ~b.value) & ~mu; + return TNUM(v1, ~(v0 | v1)); +} + struct tnum tnum_cast(struct tnum a, u8 size) { a.value &= (1ULL << (size * 8)) - 1; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 19ef3d27dbb7..7f4ee3b95f4e 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -13632,6 +13632,39 @@ static void scalar32_min_max_and(struct bpf_reg_state *dst_reg, return; }
+ /* special case: dst_reg is in range [-1, 0] */ + if (dst_reg->s32_min_value == -1 && dst_reg->s32_max_value == 0) { + /* the result is equivalent to adding 0 to src_reg */ + var32_off = tnum_union(src_reg->var_off, tnum_const(0)); + dst_reg->var_off = tnum_with_subreg(dst_reg->var_off, var32_off); + /* update signed min/max to include 0 */ + dst_reg->s32_min_value = min_t(s32, src_reg->s32_min_value, 0); + dst_reg->s32_max_value = max_t(s32, src_reg->s32_max_value, 0); + /* since we're adding 0 to src_reg and 0 is the smallest + * unsigned integer, dst_reg->u32_min_value should be 0, + * and dst->u32_max_value should be src_reg->u32_max_value. + */ + dst_reg->u32_min_value = 0; + dst_reg->u32_max_value = src_reg->u32_max_value; + return; + } + + /* special case: src_reg is in range [-1, 0] */ + if (src_reg->s32_min_value == -1 && src_reg->s32_max_value == 0) { + /* the result is equivalent to adding 0 to dst_reg */ + var32_off = tnum_union(dst_reg->var_off, tnum_const(0)); + dst_reg->var_off = tnum_with_subreg(dst_reg->var_off, var32_off); + /* update signed min/max to include 0 */ + dst_reg->s32_min_value = min_t(s32, dst_reg->s32_min_value, 0); + dst_reg->s32_max_value = max_t(s32, dst_reg->s32_max_value, 0); + /* since we're adding 0 to dst_reg and 0 is the smallest + * unsigned integer, dst_reg->u32_min_value should be 0, + * and dst->u32_max_value should remain unchanged. + */ + dst_reg->u32_min_value = 0; + return; + } + /* We get our minimum from the var_off, since that's inherently * bitwise. Our maximum is the minimum of the operands' maxima. */ @@ -13662,6 +13695,37 @@ static void scalar_min_max_and(struct bpf_reg_state *dst_reg, return; }
+ /* special case: dst_reg is in range [-1, 0] */ + if (dst_reg->smin_value == -1 && dst_reg->smax_value == 0) { + /* the result is equivalent to adding 0 to src_reg */ + dst_reg->var_off = tnum_union(src_reg->var_off, tnum_const(0)); + /* update signed min/max to include 0 */ + dst_reg->smin_value = min_t(s64, src_reg->smin_value, 0); + dst_reg->smax_value = max_t(s64, src_reg->smax_value, 0); + /* since we're adding 0 to src_reg and 0 is the smallest + * unsigned integer, dst_reg->umin_value should be 0, + * and dst->umax_value should be src_reg->umax_value. + */ + dst_reg->umin_value = 0; + dst_reg->umax_value = src_reg->umax_value; + return; + } + + /* special case: src_reg is in range [-1, 0] */ + if (src_reg->smin_value == -1 && src_reg->smax_value == 0) { + /* the result is equivalent to adding 0 to dst_reg */ + dst_reg->var_off = tnum_union(dst_reg->var_off, tnum_const(0)); + /* update signed min/max to include 0 */ + dst_reg->smin_value = min_t(s64, dst_reg->smin_value, 0); + dst_reg->smax_value = max_t(s64, dst_reg->smax_value, 0); + /* since we're adding 0 to dst_reg and 0 is the smallest + * unsigned integer, dst_reg->min_value should be 0, + * and dst->umax_value should remain unchanged. + */ + dst_reg->umin_value = 0; + return; + } + /* We get our minimum from the var_off, since that's inherently * bitwise. Our maximum is the minimum of the operands' maxima. */