bpf_dynptr_slice(_rw) uses a user provided buffer if it can not provide a pointer to a block of contiguous memory. This buffer is unused in the case of local dynptrs, and may be unused in other cases as well. There is no need to require the buffer, as the kfunc can just return NULL if it was needed and not provided.
This adds another kfunc annotation, __opt, which combines with __sz and __szk to allow the buffer associated with the size to be NULL. If the buffer is NULL, the verifier does not check that the buffer is of sufficient size.
Signed-off-by: Daniel Rosenberg drosen@google.com --- Documentation/bpf/kfuncs.rst | 23 ++++++++++++++++++++++- include/linux/skbuff.h | 2 +- kernel/bpf/helpers.c | 30 ++++++++++++++++++------------ kernel/bpf/verifier.c | 17 +++++++++++++---- 4 files changed, 54 insertions(+), 18 deletions(-)
diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst index ea2516374d92..7a3d9de5f315 100644 --- a/Documentation/bpf/kfuncs.rst +++ b/Documentation/bpf/kfuncs.rst @@ -100,7 +100,7 @@ Hence, whenever a constant scalar argument is accepted by a kfunc which is not a size parameter, and the value of the constant matters for program safety, __k suffix should be used.
-2.2.2 __uninit Annotation +2.2.3 __uninit Annotation -------------------------
This annotation is used to indicate that the argument will be treated as @@ -117,6 +117,27 @@ Here, the dynptr will be treated as an uninitialized dynptr. Without this annotation, the verifier will reject the program if the dynptr passed in is not initialized.
+2.2.4 __opt Annotation +------------------------- + +This annotation is used to indicate that the buffer associated with an __sz or __szk +argument may be null. If the function is passed a nullptr in place of the buffer, +the verifier will not check that length is appropriate for the buffer. The kfunc is +responsible for checking if this buffer is null before using it. + +An example is given below:: + + __bpf_kfunc void *bpf_dynptr_slice(..., void *buffer__opt, u32 buffer__szk) + { + ... + } + +Here, the buffer may be null. If buffer is not null, it at least of size buffer_szk. +Either way, the returned buffer is either NULL, or of size buffer_szk. Without this +annotation, the verifier will reject the program if a null pointer is passed in with +a nonzero size. + + .. _BPF_kfunc_nodef:
2.3 Using an existing kernel function diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 738776ab8838..8ddb4af1a501 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -4033,7 +4033,7 @@ __skb_header_pointer(const struct sk_buff *skb, int offset, int len, if (likely(hlen - offset >= len)) return (void *)data + offset;
- if (!skb || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0)) + if (!skb || !buffer || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0)) return NULL;
return buffer; diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 8d368fa353f9..26efb6fbeab2 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2167,13 +2167,15 @@ __bpf_kfunc struct task_struct *bpf_task_from_pid(s32 pid) * bpf_dynptr_slice() - Obtain a read-only pointer to the dynptr data. * @ptr: The dynptr whose data slice to retrieve * @offset: Offset into the dynptr - * @buffer: User-provided buffer to copy contents into - * @buffer__szk: Size (in bytes) of the buffer. This is the length of the - * requested slice. This must be a constant. + * @buffer__opt: User-provided buffer to copy contents into. May be NULL + * @buffer__szk: Size (in bytes) of the buffer if present. This is the + * length of the requested slice. This must be a constant. * * For non-skb and non-xdp type dynptrs, there is no difference between * bpf_dynptr_slice and bpf_dynptr_data. * + * If buffer__opt is NULL, the call will fail if buffer_opt was needed. + * * If the intention is to write to the data slice, please use * bpf_dynptr_slice_rdwr. * @@ -2190,7 +2192,7 @@ __bpf_kfunc struct task_struct *bpf_task_from_pid(s32 pid) * direct pointer) */ __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset, - void *buffer, u32 buffer__szk) + void *buffer__opt, u32 buffer__szk) { enum bpf_dynptr_type type; u32 len = buffer__szk; @@ -2210,15 +2212,17 @@ __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset case BPF_DYNPTR_TYPE_RINGBUF: return ptr->data + ptr->offset + offset; case BPF_DYNPTR_TYPE_SKB: - return skb_header_pointer(ptr->data, ptr->offset + offset, len, buffer); + return skb_header_pointer(ptr->data, ptr->offset + offset, len, buffer__opt); case BPF_DYNPTR_TYPE_XDP: { void *xdp_ptr = bpf_xdp_pointer(ptr->data, ptr->offset + offset, len); if (xdp_ptr) return xdp_ptr;
- bpf_xdp_copy_buf(ptr->data, ptr->offset + offset, buffer, len, false); - return buffer; + if (!buffer__opt) + return NULL; + bpf_xdp_copy_buf(ptr->data, ptr->offset + offset, buffer__opt, len, false); + return buffer__opt; } default: WARN_ONCE(true, "unknown dynptr type %d\n", type); @@ -2230,13 +2234,15 @@ __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset * bpf_dynptr_slice_rdwr() - Obtain a writable pointer to the dynptr data. * @ptr: The dynptr whose data slice to retrieve * @offset: Offset into the dynptr - * @buffer: User-provided buffer to copy contents into - * @buffer__szk: Size (in bytes) of the buffer. This is the length of the - * requested slice. This must be a constant. + * @buffer__opt: User-provided buffer to copy contents into. May be NULL + * @buffer__szk: Size (in bytes) of the buffer if present. This is the + * length of the requested slice. This must be a constant. * * For non-skb and non-xdp type dynptrs, there is no difference between * bpf_dynptr_slice and bpf_dynptr_data. * + * If buffer__opt is NULL, the call will fail if buffer_opt was needed. + * * The returned pointer is writable and may point to either directly the dynptr * data at the requested offset or to the buffer if unable to obtain a direct * data pointer to (example: the requested slice is to the paged area of an skb @@ -2267,7 +2273,7 @@ __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset * direct pointer) */ __bpf_kfunc void *bpf_dynptr_slice_rdwr(const struct bpf_dynptr_kern *ptr, u32 offset, - void *buffer, u32 buffer__szk) + void *buffer__opt, u32 buffer__szk) { if (!ptr->data || bpf_dynptr_is_rdonly(ptr)) return NULL; @@ -2294,7 +2300,7 @@ __bpf_kfunc void *bpf_dynptr_slice_rdwr(const struct bpf_dynptr_kern *ptr, u32 o * will be copied out into the buffer and the user will need to call * bpf_dynptr_write() to commit changes. */ - return bpf_dynptr_slice(ptr, offset, buffer, buffer__szk); + return bpf_dynptr_slice(ptr, offset, buffer__opt, buffer__szk); }
__bpf_kfunc void *bpf_cast_to_kern_ctx(void *obj) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index fbcf5a4e2fcd..708ae7bca1fe 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -9398,6 +9398,11 @@ static bool is_kfunc_arg_const_mem_size(const struct btf *btf, return __kfunc_param_match_suffix(btf, arg, "__szk"); }
+static bool is_kfunc_arg_optional(const struct btf *btf, const struct btf_param *arg) +{ + return __kfunc_param_match_suffix(btf, arg, "__opt"); +} + static bool is_kfunc_arg_constant(const struct btf *btf, const struct btf_param *arg) { return __kfunc_param_match_suffix(btf, arg, "__k"); @@ -10464,13 +10469,17 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ break; case KF_ARG_PTR_TO_MEM_SIZE: { + struct bpf_reg_state *buff_reg = ®s[regno]; + const struct btf_param *buff_arg = &args[i]; struct bpf_reg_state *size_reg = ®s[regno + 1]; const struct btf_param *size_arg = &args[i + 1];
- ret = check_kfunc_mem_size_reg(env, size_reg, regno + 1); - if (ret < 0) { - verbose(env, "arg#%d arg#%d memory, len pair leads to invalid memory access\n", i, i + 1); - return ret; + if (!register_is_null(buff_reg) || !is_kfunc_arg_optional(meta->btf, buff_arg)) { + ret = check_kfunc_mem_size_reg(env, size_reg, regno + 1); + if (ret < 0) { + verbose(env, "arg#%d arg#%d memory, len pair leads to invalid memory access\n", i, i + 1); + return ret; + } }
if (is_kfunc_arg_const_mem_size(meta->btf, size_arg, size_reg)) {
base-commit: 6e98b09da931a00bf4e0477d0fa52748bf28fcce
bpf_dynptr_slice(_rw) no longer requires a buffer for verification. If the buffer is needed, but not present, the function will return NULL.
Signed-off-by: Daniel Rosenberg drosen@google.com --- tools/testing/selftests/bpf/prog_tests/dynptr.c | 1 + .../selftests/bpf/progs/dynptr_success.c | 17 +++++++++++++++++ 2 files changed, 18 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/dynptr.c b/tools/testing/selftests/bpf/prog_tests/dynptr.c index d176c34a7d2e..ac1fcaddcddf 100644 --- a/tools/testing/selftests/bpf/prog_tests/dynptr.c +++ b/tools/testing/selftests/bpf/prog_tests/dynptr.c @@ -20,6 +20,7 @@ static struct { {"test_ringbuf", SETUP_SYSCALL_SLEEP}, {"test_skb_readonly", SETUP_SKB_PROG}, {"test_dynptr_skb_data", SETUP_SKB_PROG}, + {"test_dynptr_skb_no_buff", SETUP_SKB_PROG}, };
static void verify_success(const char *prog_name, enum test_setup_type setup_type) diff --git a/tools/testing/selftests/bpf/progs/dynptr_success.c b/tools/testing/selftests/bpf/progs/dynptr_success.c index b2fa6c47ecc0..16636a29242a 100644 --- a/tools/testing/selftests/bpf/progs/dynptr_success.c +++ b/tools/testing/selftests/bpf/progs/dynptr_success.c @@ -207,3 +207,20 @@ int test_dynptr_skb_data(struct __sk_buff *skb)
return 1; } + +SEC("?cgroup_skb/egress") +int test_dynptr_skb_no_buff(struct __sk_buff *skb) +{ + struct bpf_dynptr ptr; + __u64 *data; + + if (bpf_dynptr_from_skb(skb, 0, &ptr)) { + err = 1; + return 1; + } + + /* This may return NULL. SKB may require a buffer */ + data = bpf_dynptr_slice(&ptr, 0, NULL, 1); + + return !!data; +}
On Mon, May 1, 2023 at 5:52 PM Daniel Rosenberg drosen@google.com wrote:
bpf_dynptr_slice(_rw) no longer requires a buffer for verification. If the buffer is needed, but not present, the function will return NULL.
Signed-off-by: Daniel Rosenberg drosen@google.com
tools/testing/selftests/bpf/prog_tests/dynptr.c | 1 + .../selftests/bpf/progs/dynptr_success.c | 17 +++++++++++++++++ 2 files changed, 18 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/dynptr.c b/tools/testing/selftests/bpf/prog_tests/dynptr.c index d176c34a7d2e..ac1fcaddcddf 100644 --- a/tools/testing/selftests/bpf/prog_tests/dynptr.c +++ b/tools/testing/selftests/bpf/prog_tests/dynptr.c @@ -20,6 +20,7 @@ static struct { {"test_ringbuf", SETUP_SYSCALL_SLEEP}, {"test_skb_readonly", SETUP_SKB_PROG}, {"test_dynptr_skb_data", SETUP_SKB_PROG},
{"test_dynptr_skb_no_buff", SETUP_SKB_PROG},
};
Please rebase and resubmit targeting bpf-next and with [PATCH bpf-next] subject.
It doesn't apply: Using index info to reconstruct a base tree... M tools/testing/selftests/bpf/prog_tests/dynptr.c M tools/testing/selftests/bpf/progs/dynptr_success.c Falling back to patching base and 3-way merge... Auto-merging tools/testing/selftests/bpf/progs/dynptr_success.c CONFLICT (content): Merge conflict in tools/testing/selftests/bpf/progs/dynptr_success.c Auto-merging tools/testing/selftests/bpf/prog_tests/dynptr.c CONFLICT (content): Merge conflict in tools/testing/selftests/bpf/prog_tests/dynptr.c error: Failed to merge in the changes. Patch failed at 0002 selftests/bpf: Test allowing NULL buffer in dynptr slice
This ensures we still reject invalid memory accesses in buffers that are marked optional.
Signed-off-by: Daniel Rosenberg drosen@google.com --- .../testing/selftests/bpf/progs/dynptr_fail.c | 20 +++++++++++++++++++ 1 file changed, 20 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/dynptr_fail.c b/tools/testing/selftests/bpf/progs/dynptr_fail.c index 759eb5c245cd..ee98f7ce0b0d 100644 --- a/tools/testing/selftests/bpf/progs/dynptr_fail.c +++ b/tools/testing/selftests/bpf/progs/dynptr_fail.c @@ -1378,3 +1378,23 @@ int invalid_slice_rdwr_rdonly(struct __sk_buff *skb)
return 0; } + +/* Buffers that are provided must be sufficiently long */ +SEC("?cgroup_skb/egress") +__failure __msg("memory, len pair leads to invalid memory access") +int test_dynptr_skb_small_buff(struct __sk_buff *skb) +{ + struct bpf_dynptr ptr; + char buffer[8] = {}; + __u64 *data; + + if (bpf_dynptr_from_skb(skb, 0, &ptr)) { + err = 1; + return 1; + } + + /* This may return NULL. SKB may require a buffer */ + data = bpf_dynptr_slice(&ptr, 0, buffer, 9); + + return !!data; +}
On Mon, May 1, 2023 at 5:52 PM Daniel Rosenberg drosen@google.com wrote:
bpf_dynptr_slice(_rw) uses a user provided buffer if it can not provide a pointer to a block of contiguous memory. This buffer is unused in the case of local dynptrs, and may be unused in other cases as well. There is no need to require the buffer, as the kfunc can just return NULL if it was needed and not provided.
This adds another kfunc annotation, __opt, which combines with __sz and __szk to allow the buffer associated with the size to be NULL. If the buffer is NULL, the verifier does not check that the buffer is of sufficient size.
Signed-off-by: Daniel Rosenberg drosen@google.com
LGTM
Acked-by: Andrii Nakryiko andrii@kernel.org
Documentation/bpf/kfuncs.rst | 23 ++++++++++++++++++++++- include/linux/skbuff.h | 2 +- kernel/bpf/helpers.c | 30 ++++++++++++++++++------------ kernel/bpf/verifier.c | 17 +++++++++++++---- 4 files changed, 54 insertions(+), 18 deletions(-)
[...]
On Mon, 1 May 2023 17:52:16 -0700 Daniel Rosenberg wrote:
--- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -4033,7 +4033,7 @@ __skb_header_pointer(const struct sk_buff *skb, int offset, int len, if (likely(hlen - offset >= len)) return (void *)data + offset;
- if (!skb || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
- if (!skb || !buffer || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0)) return NULL;
First off - please make sure you CC netdev on changes to networking!
Please do not add stupid error checks to core code for BPF safety. Wrap the call if you can't guarantee that value is sane, this is a very bad precedent.
On Tue, Jul 18, 2023 at 8:26 AM Jakub Kicinski kuba@kernel.org wrote:
On Mon, 1 May 2023 17:52:16 -0700 Daniel Rosenberg wrote:
--- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -4033,7 +4033,7 @@ __skb_header_pointer(const struct sk_buff *skb, int offset, int len, if (likely(hlen - offset >= len)) return (void *)data + offset;
if (!skb || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
if (!skb || !buffer || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0)) return NULL;
First off - please make sure you CC netdev on changes to networking!
Please do not add stupid error checks to core code for BPF safety. Wrap the call if you can't guarantee that value is sane, this is a very bad precedent.
This is NOT for safety. You misread the code.
On Tue, 18 Jul 2023 08:52:55 -0700 Alexei Starovoitov wrote:
On Tue, Jul 18, 2023 at 8:26 AM Jakub Kicinski kuba@kernel.org wrote:
On Mon, 1 May 2023 17:52:16 -0700 Daniel Rosenberg wrote:
--- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -4033,7 +4033,7 @@ __skb_header_pointer(const struct sk_buff *skb, int offset, int len, if (likely(hlen - offset >= len)) return (void *)data + offset;
if (!skb || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
if (!skb || !buffer || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0)) return NULL;
First off - please make sure you CC netdev on changes to networking!
Please do not add stupid error checks to core code for BPF safety. Wrap the call if you can't guarantee that value is sane, this is a very bad precedent.
This is NOT for safety. You misread the code.
Doesn't matter, safety or optionality. skb_header_pointer() is used on the fast paths of the networking stack, adding heavy handed input validation to it is not okay. No sane code should be passing NULL buffer to skb_header_pointer(). Please move the NULL check to the BPF code so the rest of the networking stack does not have to pay the cost.
This should be common sense. If one caller is doing something.. "special" the extra code should live in the caller, not the callee. That's basic code hygiene.
On Tue, Jul 18, 2023 at 9:06 AM Jakub Kicinski kuba@kernel.org wrote:
On Tue, 18 Jul 2023 08:52:55 -0700 Alexei Starovoitov wrote:
On Tue, Jul 18, 2023 at 8:26 AM Jakub Kicinski kuba@kernel.org wrote:
On Mon, 1 May 2023 17:52:16 -0700 Daniel Rosenberg wrote:
--- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -4033,7 +4033,7 @@ __skb_header_pointer(const struct sk_buff *skb, int offset, int len, if (likely(hlen - offset >= len)) return (void *)data + offset;
if (!skb || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
if (!skb || !buffer || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0)) return NULL;
First off - please make sure you CC netdev on changes to networking!
Please do not add stupid error checks to core code for BPF safety. Wrap the call if you can't guarantee that value is sane, this is a very bad precedent.
This is NOT for safety. You misread the code.
Doesn't matter, safety or optionality. skb_header_pointer() is used on the fast paths of the networking stack, adding heavy handed input validation to it is not okay. No sane code should be passing NULL buffer to skb_header_pointer(). Please move the NULL check to the BPF code so the rest of the networking stack does not have to pay the cost.
This should be common sense. If one caller is doing something.. "special" the extra code should live in the caller, not the callee. That's basic code hygiene.
you're still missing the point. Pls read the whole patch series. It is _not_ input validation. skb_copy_bits is a slow path. One extra check doesn't affect performance at all. So 'fast paths' isn't a valid argument here. The code is reusing if (likely(hlen - offset >= len)) return (void *)data + offset; which _is_ the fast path.
What you're requesting is to copy paste the whole __skb_header_pointer into __skb_header_pointer2. Makes no sense.
On Tue, 18 Jul 2023 09:52:24 -0700 Alexei Starovoitov wrote:
On Tue, Jul 18, 2023 at 9:06 AM Jakub Kicinski kuba@kernel.org wrote:
This is NOT for safety. You misread the code.
Doesn't matter, safety or optionality. skb_header_pointer() is used on the fast paths of the networking stack, adding heavy handed input validation to it is not okay. No sane code should be passing NULL buffer to skb_header_pointer(). Please move the NULL check to the BPF code so the rest of the networking stack does not have to pay the cost.
This should be common sense. If one caller is doing something.. "special" the extra code should live in the caller, not the callee. That's basic code hygiene.
you're still missing the point. Pls read the whole patch series.
Could you just tell me what the point is then? The "series" is one patch plus some tiny selftests. I don't see any documentation for how dynptrs are supposed to work either.
As far as I can grasp this makes the "copy buffer" optional from the kfunc-API perspective (of bpf_dynptr_slice()).
It is _not_ input validation. skb_copy_bits is a slow path. One extra check doesn't affect performance at all. So 'fast paths' isn't a valid argument here. The code is reusing if (likely(hlen - offset >= len)) return (void *)data + offset; which _is_ the fast path.
What you're requesting is to copy paste the whole __skb_header_pointer into __skb_header_pointer2. Makes no sense.
No, Alexei, the whole point of skb_header_pointer() is to pass the secondary buffer, to make header parsing dependable.
Passing NULL buffer to skb_header_pointer() is absolutely nonsensical. It should *not* be supported. We had enough prod problems with people thinking that the entire header will be in the linear portion. Then either the NIC can't parse the header, someone enables jumbo, disables GRO, adds new HW, adds encap, etc etc and things implode.
If you want to support it in BPF that's up to you, but I think it's entirely reasonable for me to request that you don't do such things in general networking code. The function is 5 LoC, so a local BPF copy seems fine. Although I'd suggest skb_header_pointer_misguided() rather than __skb_header_pointer2() as the name :)
On Tue, Jul 18, 2023 at 10:18 AM Jakub Kicinski kuba@kernel.org wrote:
On Tue, 18 Jul 2023 09:52:24 -0700 Alexei Starovoitov wrote:
On Tue, Jul 18, 2023 at 9:06 AM Jakub Kicinski kuba@kernel.org wrote:
This is NOT for safety. You misread the code.
Doesn't matter, safety or optionality. skb_header_pointer() is used on the fast paths of the networking stack, adding heavy handed input validation to it is not okay. No sane code should be passing NULL buffer to skb_header_pointer(). Please move the NULL check to the BPF code so the rest of the networking stack does not have to pay the cost.
This should be common sense. If one caller is doing something.. "special" the extra code should live in the caller, not the callee. That's basic code hygiene.
you're still missing the point. Pls read the whole patch series.
Could you just tell me what the point is then? The "series" is one patch plus some tiny selftests. I don't see any documentation for how dynptrs are supposed to work either.
As far as I can grasp this makes the "copy buffer" optional from the kfunc-API perspective (of bpf_dynptr_slice()).
It is _not_ input validation. skb_copy_bits is a slow path. One extra check doesn't affect performance at all. So 'fast paths' isn't a valid argument here. The code is reusing if (likely(hlen - offset >= len)) return (void *)data + offset; which _is_ the fast path.
What you're requesting is to copy paste the whole __skb_header_pointer into __skb_header_pointer2. Makes no sense.
No, Alexei, the whole point of skb_header_pointer() is to pass the secondary buffer, to make header parsing dependable.
of course. No one argues about that.
Passing NULL buffer to skb_header_pointer() is absolutely nonsensical.
Quick grep through the code proves you wrong: drivers/net/ethernet/broadcom/bnxt/bnxt.c __skb_header_pointer(NULL, start, sizeof(*hp), skb->data, skb_headlen(skb), NULL);
was done before this patch. It's using __ variant on purpose and explicitly passing skb==NULL to exactly trigger that line to deliberately avoid the slow path.
Another example: drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c skb_header_pointer(skb, 0, 0, NULL);
This one I'm not sure about. Looks buggy.
It should *not* be supported. We had enough prod problems with people thinking that the entire header will be in the linear portion. Then either the NIC can't parse the header, someone enables jumbo, disables GRO, adds new HW, adds encap, etc etc and things implode.
I don't see how this is related. NULL buffer allows to get a linear pointer and explicitly avoids slow path when it's not linear.
If you want to support it in BPF that's up to you, but I think it's entirely reasonable for me to request that you don't do such things in general networking code. The function is 5 LoC, so a local BPF copy seems fine. Although I'd suggest skb_header_pointer_misguided() rather than __skb_header_pointer2() as the name :)
If you insist we can, but bnxt is an example that buffer==NULL is a useful concept for networking and not bpf specific. It also doesn't make "people think the header is linear" any worse.
On Tue, 18 Jul 2023 10:50:14 -0700 Alexei Starovoitov wrote:
On Tue, Jul 18, 2023 at 10:18 AM Jakub Kicinski kuba@kernel.org wrote:
you're still missing the point. Pls read the whole patch series.
Could you just tell me what the point is then? The "series" is one patch plus some tiny selftests. I don't see any documentation for how dynptrs are supposed to work either.
As far as I can grasp this makes the "copy buffer" optional from the kfunc-API perspective (of bpf_dynptr_slice()).
It is _not_ input validation. skb_copy_bits is a slow path. One extra check doesn't affect performance at all. So 'fast paths' isn't a valid argument here. The code is reusing if (likely(hlen - offset >= len)) return (void *)data + offset; which _is_ the fast path.
What you're requesting is to copy paste the whole __skb_header_pointer into __skb_header_pointer2. Makes no sense.
No, Alexei, the whole point of skb_header_pointer() is to pass the secondary buffer, to make header parsing dependable.
of course. No one argues about that.
Passing NULL buffer to skb_header_pointer() is absolutely nonsensical.
Quick grep through the code proves you wrong: drivers/net/ethernet/broadcom/bnxt/bnxt.c __skb_header_pointer(NULL, start, sizeof(*hp), skb->data, skb_headlen(skb), NULL);
was done before this patch. It's using __ variant on purpose and explicitly passing skb==NULL to exactly trigger that line to deliberately avoid the slow path.
Another example: drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c skb_header_pointer(skb, 0, 0, NULL);
This one I'm not sure about. Looks buggy.
These are both Tx path for setting up offloads, Linux doesn't request offloads for headers outside of the linear part. The ixgbevf code is completely pointless, as you say.
In general drivers are rarely a source of high quality code examples. Having been directly involved in the bugs that lead to the bnxt code being written - I was so happy that the driver started parsing Tx packets *at all*, so I wasn't too fussed by the minor problems :(
It should *not* be supported. We had enough prod problems with people thinking that the entire header will be in the linear portion. Then either the NIC can't parse the header, someone enables jumbo, disables GRO, adds new HW, adds encap, etc etc and things implode.
I don't see how this is related. NULL buffer allows to get a linear pointer and explicitly avoids slow path when it's not linear.
Direct packet access via skb->data is there for those who want high speed 🤷️
If you want to support it in BPF that's up to you, but I think it's entirely reasonable for me to request that you don't do such things in general networking code. The function is 5 LoC, so a local BPF copy seems fine. Although I'd suggest skb_header_pointer_misguided() rather than __skb_header_pointer2() as the name :)
If you insist we can, but bnxt is an example that buffer==NULL is a useful concept for networking and not bpf specific. It also doesn't make "people think the header is linear" any worse.
My worry is that people will think that whether the buffer is needed or not depends on _their program_, rather than on the underlying platform. So if it works in testing without the buffer - the buffer must not be required for their use case.
On Tue, Jul 18, 2023 at 11:11 AM Jakub Kicinski kuba@kernel.org wrote:
On Tue, 18 Jul 2023 10:50:14 -0700 Alexei Starovoitov wrote:
On Tue, Jul 18, 2023 at 10:18 AM Jakub Kicinski kuba@kernel.org wrote:
you're still missing the point. Pls read the whole patch series.
Could you just tell me what the point is then? The "series" is one patch plus some tiny selftests. I don't see any documentation for how dynptrs are supposed to work either.
As far as I can grasp this makes the "copy buffer" optional from the kfunc-API perspective (of bpf_dynptr_slice()).
It is _not_ input validation. skb_copy_bits is a slow path. One extra check doesn't affect performance at all. So 'fast paths' isn't a valid argument here. The code is reusing if (likely(hlen - offset >= len)) return (void *)data + offset; which _is_ the fast path.
What you're requesting is to copy paste the whole __skb_header_pointer into __skb_header_pointer2. Makes no sense.
No, Alexei, the whole point of skb_header_pointer() is to pass the secondary buffer, to make header parsing dependable.
of course. No one argues about that.
Passing NULL buffer to skb_header_pointer() is absolutely nonsensical.
Quick grep through the code proves you wrong: drivers/net/ethernet/broadcom/bnxt/bnxt.c __skb_header_pointer(NULL, start, sizeof(*hp), skb->data, skb_headlen(skb), NULL);
was done before this patch. It's using __ variant on purpose and explicitly passing skb==NULL to exactly trigger that line to deliberately avoid the slow path.
Another example: drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c skb_header_pointer(skb, 0, 0, NULL);
This one I'm not sure about. Looks buggy.
These are both Tx path for setting up offloads, Linux doesn't request offloads for headers outside of the linear part. The ixgbevf code is completely pointless, as you say.
In general drivers are rarely a source of high quality code examples. Having been directly involved in the bugs that lead to the bnxt code being written - I was so happy that the driver started parsing Tx packets *at all*, so I wasn't too fussed by the minor problems :(
It should *not* be supported. We had enough prod problems with people thinking that the entire header will be in the linear portion. Then either the NIC can't parse the header, someone enables jumbo, disables GRO, adds new HW, adds encap, etc etc and things implode.
I don't see how this is related. NULL buffer allows to get a linear pointer and explicitly avoids slow path when it's not linear.
Direct packet access via skb->data is there for those who want high speed 🤷️
skb->data/data_end approach unfortunately doesn't work that well. Too much verifier fighting. That's why dynptr was introduced.
If you want to support it in BPF that's up to you, but I think it's entirely reasonable for me to request that you don't do such things in general networking code. The function is 5 LoC, so a local BPF copy seems fine. Although I'd suggest skb_header_pointer_misguided() rather than __skb_header_pointer2() as the name :)
If you insist we can, but bnxt is an example that buffer==NULL is a useful concept for networking and not bpf specific. It also doesn't make "people think the header is linear" any worse.
My worry is that people will think that whether the buffer is needed or not depends on _their program_, rather than on the underlying platform. So if it works in testing without the buffer - the buffer must not be required for their use case.
Are you concerned about bpf progs breaking this way? I thought you're worried about the driver misusing skb_header_pointer() with buffer==NULL.
We can remove !buffer check as in the attached patch, but I don't quite see how it would improve driver quality.
On Tue, 18 Jul 2023 13:34:06 -0700 Alexei Starovoitov wrote:
Direct packet access via skb->data is there for those who want high speed 🤷️
skb->data/data_end approach unfortunately doesn't work that well. Too much verifier fighting. That's why dynptr was introduced.
I wish Daniel told us more about the use case.
My worry is that people will think that whether the buffer is needed or not depends on _their program_, rather than on the underlying platform. So if it works in testing without the buffer - the buffer must not be required for their use case.
Are you concerned about bpf progs breaking this way?
Both, BPF progs breaking and netdev code doing things which don't make sense. But I won't argue too hard about the former, i.e. the BPF API.
I thought you're worried about the driver misusing skb_header_pointer() with buffer==NULL.
We can remove !buffer check as in the attached patch, but I don't quite see how it would improve driver quality.
The drivers may not be pretty but they aren't buggy AFAICT.
[0001-bpf-net-Introduce-skb_pointer_if_linear.patch application/octet-stream (2873 bytes)]
Or we can simply pretend we don't have the skb:
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 91ed66952580..217447f01d56 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -4023,7 +4023,7 @@ __skb_header_pointer(const struct sk_buff *skb, int offset, int len, if (likely(hlen - offset >= len)) return (void *)data + offset;
- if (!skb || !buffer || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0)) + if (!skb || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0)) return NULL;
return buffer; diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 9e80efa59a5d..8bc4622cc1df 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2239,7 +2239,13 @@ __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset case BPF_DYNPTR_TYPE_RINGBUF: return ptr->data + ptr->offset + offset; case BPF_DYNPTR_TYPE_SKB: - return skb_header_pointer(ptr->data, ptr->offset + offset, len, buffer__opt); + { + const struct sk_buff *skb = ptr->data; + + return __skb_header_pointer(NULL, ptr->offset + offset, len, + skb->data, skb_headlen(skb), + buffer__opt); + } case BPF_DYNPTR_TYPE_XDP: { void *xdp_ptr = bpf_xdp_pointer(ptr->data, ptr->offset + offset, len);
On Tue, Jul 18, 2023 at 4:06 PM Jakub Kicinski kuba@kernel.org wrote:
On Tue, 18 Jul 2023 13:34:06 -0700 Alexei Starovoitov wrote:
Direct packet access via skb->data is there for those who want high speed 🤷️
skb->data/data_end approach unfortunately doesn't work that well. Too much verifier fighting. That's why dynptr was introduced.
I wish Daniel told us more about the use case.
My worry is that people will think that whether the buffer is needed or not depends on _their program_, rather than on the underlying platform. So if it works in testing without the buffer - the buffer must not be required for their use case.
Are you concerned about bpf progs breaking this way?
Both, BPF progs breaking and netdev code doing things which don't make sense. But I won't argue too hard about the former, i.e. the BPF API.
I thought you're worried about the driver misusing skb_header_pointer() with buffer==NULL.
We can remove !buffer check as in the attached patch, but I don't quite see how it would improve driver quality.
The drivers may not be pretty but they aren't buggy AFAICT.
[0001-bpf-net-Introduce-skb_pointer_if_linear.patch application/octet-stream (2873 bytes)]
Or we can simply pretend we don't have the skb:
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 91ed66952580..217447f01d56 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -4023,7 +4023,7 @@ __skb_header_pointer(const struct sk_buff *skb, int offset, int len, if (likely(hlen - offset >= len)) return (void *)data + offset;
if (!skb || !buffer || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
if (!skb || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0)) return NULL; return buffer;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 9e80efa59a5d..8bc4622cc1df 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -2239,7 +2239,13 @@ __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset case BPF_DYNPTR_TYPE_RINGBUF: return ptr->data + ptr->offset + offset; case BPF_DYNPTR_TYPE_SKB:
return skb_header_pointer(ptr->data, ptr->offset + offset, len, buffer__opt);
{
const struct sk_buff *skb = ptr->data;
return __skb_header_pointer(NULL, ptr->offset + offset, len,
skb->data, skb_headlen(skb),
buffer__opt);
}
Which would encourage bnxt-like hacks. I don't like it tbh. At least skb_pointer_if_linear() has a clear meaning. It's more run-time overhead, since buffer__opt is checked early, but that's ok.
On Tue, 18 Jul 2023 16:17:24 -0700 Alexei Starovoitov wrote:
Which would encourage bnxt-like hacks. I don't like it tbh. At least skb_pointer_if_linear() has a clear meaning. It's more run-time overhead, since buffer__opt is checked early, but that's ok.
Alright, your version fine by me, too. Thanks!
On Tue, Jul 18, 2023 at 4:21 PM Jakub Kicinski kuba@kernel.org wrote:
On Tue, 18 Jul 2023 16:17:24 -0700 Alexei Starovoitov wrote:
Which would encourage bnxt-like hacks. I don't like it tbh. At least skb_pointer_if_linear() has a clear meaning. It's more run-time overhead, since buffer__opt is checked early, but that's ok.
Alright, your version fine by me, too. Thanks!
ok. will send it officially soon.
On 7/19/23 1:21 AM, Jakub Kicinski wrote:
On Tue, 18 Jul 2023 16:17:24 -0700 Alexei Starovoitov wrote:
Which would encourage bnxt-like hacks. I don't like it tbh. At least skb_pointer_if_linear() has a clear meaning. It's more run-time overhead, since buffer__opt is checked early, but that's ok.
Alright, your version fine by me, too. Thanks!
Looks good to me too. Agree that the !buffer check should not live in __skb_header_pointer() and is better handled in the bpf_dynptr_slice() internals.
Thanks, Daniel
linux-kselftest-mirror@lists.linaro.org