On Thu, Jul 20, 2023 at 09:28:52PM +0200, Michał Mirosław wrote:
> This is a massaged version of patch by Muhammad Usama Anjum [1]
> to illustrate my review comments and hopefully push the implementation
> efforts closer to conclusion. The changes are:
[...]
> +static void pagemap_scan_backout_range(struct pagemap_scan_private *p,
> + unsigned long addr, unsigned long end)
> +{
> + struct page_region *cur_buf = &p->cur_buf;
> +
> + if (cur_buf->start != addr) {
> + cur_buf->end = addr;
> + } else {
> + cur_buf->start = cur_buf->end = 0;
> + }
> +
> + p->end_addr = 0;
Just noticed that this is missing:
p->found_pages -= (end - addr) / PAGE_SIZE;
> +}
[...]
Best Regards
Michał Mirosław
v1: https://lore.kernel.org/rust-for-linux/20230614180837.630180-1-ojeda@kernel…
v2:
- Rebased on top of v6.5-rc1, which requires a change from
`kunit_do_failed_assertion` to `__kunit_do_failed_assertion` (since
the former became a macro) and the addition of a call to
`__kunit_abort` (since previously the call was done by the old
function which we cannot use anymore since it is a macro). (David)
- Added prerequisite patch to KUnit header to include `stddef.h` to
support the `KUNIT=y` case. (Reported by Boqun)
- Added comment on the purpose of `trait FromErrno`. (Martin asked
about it)
- Simplify code to use `std::fs::write` instead of `write!`, which
improves code size too. (Suggested by Alice)
- Fix copy-paste type in docs from "error" to "info" and, to make it
proper English, copy the `printk` docs style, i.e. from "info"
to "info-level message" -- and same for the "error" one. (Miguel)
- Swap `FILE` and `LINE` `static`s to keep the same order as done
elsewhere. (Miguel)
- Rename config option from `RUST_KERNEL_KUNIT_TEST` to
`RUST_KERNEL_DOCTESTS` (and update its title), so that we can use
the former for the "normal" (i.e. non-doctests, e.g. `#[test]` ones)
tests in the future. (David)
- Follow the syntax proposed for declaring test metadata in the KTAP
v2 spec, which may also get used for the KUnit test attributes API.
Thus, instead of "# Doctest from line {line}", use
"# {test_name}.location: {file}.{line}", which ideally will allow to
migrate to a KUnit attribute later.
This is done in all cases, i.e. when the tests succeeds, because
it may be useful for users running KUnit manually, or when passing
`--raw_output` to `kunit.py`. (David)
David: I used "location" instead of your suggested "line" alone, in
order to have both in a single line, which looked nice and closer to
the "ASSERTION FAILURE" case/line, since now we do have the original
file (please see below).
- Figure out the original line. This is done by deploying an anchor
so that the difference in lines between the beginning of the test
and the assert, in the generated file, can be computed. Then, we
offset the line number of the original test, which is given by
`rustdoc`. (developed by Boqun)
- Figure out the original file. This is done by walking the
filesystem, checking directory prefixes to reduce the amount of
combinations to check, and it is only done once per file (rather
than per test).
Ambiguities are detected and reported. It does limit the filenames
(module names) we can use, but it is unlikely we will hit it soon
and this should be temporary anyway until `rustdoc` provides us
with the real path. (Miguel)
Tested with both in-tree and `O=` builds, but I would appreciate
extra testing on this one, including via the `kunit.py` script.
- The three last items combined means that we now see this output even
for successful cases:
# rust_doctest_kernel_sync_locked_by_rs_0.location: rust/kernel/sync/locked_by.rs:28
ok 53 rust_doctest_kernel_sync_locked_by_rs_0
Which basically gives the user all the information they need to go
back to the source code of the doctest, while keeping them fairly
stable for bisection, and while avoiding to require users to write
test names manually. (David + Boqun + Miguel)
David: from what I saw in v2 of the RFC for the test attributes API,
the syntax still contains the test name when it is not a suite, so
I followed that, but if you prefer to omit it, please feel free to
do so (for me either way it is fine, and if this is the expected
attribute syntax, I guess it is worth to follow it to make migration
easier later on):
# location: rust/kernel/sync/locked_by.rs:28
ok 53 rust_doctest_kernel_sync_locked_by_rs_0
- Collected `Reviewed-by`s and `Tested-by`s, except for the main
commit due to the substantial changes.
Miguel Ojeda (7):
kunit: test-bug.h: include `stddef.h` for `NULL`
rust: init: make doctests compilable/testable
rust: str: make doctests compilable/testable
rust: sync: make doctests compilable/testable
rust: types: make doctests compilable/testable
rust: support running Rust documentation tests as KUnit ones
MAINTAINERS: add Rust KUnit files to the KUnit entry
MAINTAINERS | 2 +
include/kunit/test-bug.h | 2 +
lib/Kconfig.debug | 13 ++
rust/.gitignore | 2 +
rust/Makefile | 29 ++++
rust/bindings/bindings_helper.h | 1 +
rust/helpers.c | 7 +
rust/kernel/init.rs | 26 +--
rust/kernel/kunit.rs | 163 +++++++++++++++++++
rust/kernel/lib.rs | 2 +
rust/kernel/str.rs | 4 +-
rust/kernel/sync/arc.rs | 9 +-
rust/kernel/sync/lock/mutex.rs | 1 +
rust/kernel/sync/lock/spinlock.rs | 1 +
rust/kernel/types.rs | 6 +-
scripts/.gitignore | 2 +
scripts/Makefile | 4 +
scripts/rustdoc_test_builder.rs | 72 +++++++++
scripts/rustdoc_test_gen.rs | 260 ++++++++++++++++++++++++++++++
19 files changed, 591 insertions(+), 15 deletions(-)
create mode 100644 rust/kernel/kunit.rs
create mode 100644 scripts/rustdoc_test_builder.rs
create mode 100644 scripts/rustdoc_test_gen.rs
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
--
2.41.0
This series fixes an issue which David Spickett found where if we change
the SVE VL while SME is in use we can end up attempting to save state to
an unallocated buffer and adds testing coverage for that plus a bit more
coverage of VL changes, just for paranioa.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (3):
arm64/fpsimd: Ensure SME storage is allocated after SVE VL changes
kselftest/arm64: Add a test case for SVE VL changes with SME active
kselftest/arm64: Validate that changing one VL type does not affect another
arch/arm64/kernel/fpsimd.c | 32 +++++--
tools/testing/selftests/arm64/fp/vec-syscfg.c | 127 +++++++++++++++++++++++++-
2 files changed, 148 insertions(+), 11 deletions(-)
---
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
change-id: 20230713-arm64-fix-sve-sme-vl-change-60eb1fa6a707
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Hi,
This follows the discussion here:
https://lore.kernel.org/linux-kselftest/20230324123157.bbwvfq4gsxnlnfwb@hou…
This shows a couple of inconsistencies with regard to how device-managed
resources are cleaned up. Basically, devm resources will only be cleaned up
if the device is attached to a bus and bound to a driver. Failing any of
these cases, a call to device_unregister will not end up in the devm
resources being released.
We had to work around it in DRM to provide helpers to create a device for
kunit tests, but the current discussion around creating similar, generic,
helpers for kunit resumed interest in fixing this.
This can be tested using the command:
./tools/testing/kunit/kunit.py run --kunitconfig=drivers/base/test/
I added the fix David suggested back in that discussion which does fix
the tests. The SoB is missing, since David didn't provide it back then.
Let me know what you think,
Maxime
Signed-off-by: Maxime Ripard <maxime(a)cerno.tech>
---
Changes in v2:
- Use an init function
- Document the tests
- Add a fix for the bugs
- Link to v1: https://lore.kernel.org/r/20230329-kunit-devm-inconsistencies-test-v1-0-c33…
---
David Gow (1):
drivers: base: Free devm resources when unregistering a device
Maxime Ripard (2):
drivers: base: Add basic devm tests for root devices
drivers: base: Add basic devm tests for platform devices
drivers/base/core.c | 11 ++
drivers/base/test/.kunitconfig | 2 +
drivers/base/test/Kconfig | 4 +
drivers/base/test/Makefile | 3 +
drivers/base/test/platform-device-test.c | 220 +++++++++++++++++++++++++++++++
drivers/base/test/root-device-test.c | 108 +++++++++++++++
6 files changed, 348 insertions(+)
---
base-commit: 53cdf865f90ba922a854c65ed05b519f9d728424
change-id: 20230329-kunit-devm-inconsistencies-test-5e5a7d01e60d
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>
Hi,
This patch series aims to improve the PMU event filter settings with a cleaner
and more organized structure and adds several test cases related to PMU event
filters.
These changes help to ensure that KVM's PMU event filter functions as expected
in all supported use cases.
Any feedback or suggestions are greatly appreciated.
Sincerely,
Jinrong Liang
Changes log:
v5:
- Add more x86 properties for Intel PMU;
- Designated initializer instead of overwrite all members; (Isaku Yamahata)
- PMU event filter invalid flag modified to "KVM_PMU_EVENT_FLAGS_VALID_MASK << 1"; (Isaku Yamahata)
- sizeof(bitmap) is modified to "sizeof(bitmap) * 8" to represent the number of
bits that can be represented by the bitmap variable. (Isaku Yamahata)
Previous:
https://lore.kernel.org/kvm/20230717062343.3743-1-cloudliang@tencent.com/T/
Jinrong Liang (6):
KVM: selftests: Add x86 properties for Intel PMU in processor.h
KVM: selftests: Drop the return of remove_event()
KVM: selftests: Introduce __kvm_pmu_event_filter to improved event
filter settings
KVM: selftests: Add test cases for unsupported PMU event filter input
values
KVM: selftests: Test if event filter meets expectations on fixed
counters
KVM: selftests: Test gp event filters don't affect fixed event filters
.../selftests/kvm/include/x86_64/processor.h | 5 +
.../kvm/x86_64/pmu_event_filter_test.c | 317 ++++++++++++------
2 files changed, 228 insertions(+), 94 deletions(-)
base-commit: 88bb466c9dec4f70d682cf38c685324e7b1b3d60
--
2.39.3
When we collect a signal context with one of the SME modes enabled we will
have enabled that mode behind the compiler and libc's back so they may
issue some instructions not valid in streaming mode, causing spurious
failures.
For the code prior to issuing the BRK to trigger signal handling we need to
stay in streaming mode if we were already there since that's a part of the
signal context the caller is trying to collect. Unfortunately this code
includes a memset() which is likely to be heavily optimised and is likely
to use FP instructions incompatible with streaming mode. We can avoid this
happening by open coding the memset(), inserting a volatile assembly
statement to avoid the compiler recognising what's being done and doing
something in optimisation. This code is not performance critical so the
inefficiency should not be an issue.
After collecting the context we can simply exit streaming mode, avoiding
these issues. Use a full SMSTOP for safety to prevent any issues appearing
with ZA.
Reported-by: Will Deacon <will(a)kernel.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Rebase onto v6.5-rc1.
- Link to v1: https://lore.kernel.org/r/20230628-arm64-signal-memcpy-fix-v1-1-db3e0300829…
---
.../selftests/arm64/signal/test_signals_utils.h | 28 +++++++++++++++++++++-
1 file changed, 27 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.h b/tools/testing/selftests/arm64/signal/test_signals_utils.h
index 222093f51b67..db28409fd44b 100644
--- a/tools/testing/selftests/arm64/signal/test_signals_utils.h
+++ b/tools/testing/selftests/arm64/signal/test_signals_utils.h
@@ -60,13 +60,28 @@ static __always_inline bool get_current_context(struct tdescr *td,
size_t dest_sz)
{
static volatile bool seen_already;
+ int i;
+ char *uc = (char *)dest_uc;
assert(td && dest_uc);
/* it's a genuine invocation..reinit */
seen_already = 0;
td->live_uc_valid = 0;
td->live_sz = dest_sz;
- memset(dest_uc, 0x00, td->live_sz);
+
+ /*
+ * This is a memset() but we don't want the compiler to
+ * optimise it into either instructions or a library call
+ * which might be incompatible with streaming mode.
+ */
+ for (i = 0; i < td->live_sz; i++) {
+ asm volatile("nop"
+ : "+m" (*dest_uc)
+ :
+ : "memory");
+ uc[i] = 0;
+ }
+
td->live_uc = dest_uc;
/*
* Grab ucontext_t triggering a SIGTRAP.
@@ -103,6 +118,17 @@ static __always_inline bool get_current_context(struct tdescr *td,
:
: "memory");
+ /*
+ * If we were grabbing a streaming mode context then we may
+ * have entered streaming mode behind the system's back and
+ * libc or compiler generated code might decide to do
+ * something invalid in streaming mode, or potentially even
+ * the state of ZA. Issue a SMSTOP to exit both now we have
+ * grabbed the state.
+ */
+ if (td->feats_supported & FEAT_SME)
+ asm volatile("msr S0_3_C4_C6_3, xzr");
+
/*
* If we get here with seen_already==1 it implies the td->live_uc
* context has been used to get back here....this probably means
---
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
change-id: 20230628-arm64-signal-memcpy-fix-7de3b3c8fa10
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Hi All,
This is v2 of my series to clean up mm selftests so that they run correctly on
arm64. See [1] for full explanation.
Changes Since v1 [1]
--------------------
- Patch 1: Explicitly set line buffer mode in ksft_print_header()
- Dropped v1 patch 2 (set execute permissions): Andrew has taken this into his
branch separately.
- Patch 2: Don't compile `soft-dirty` suite for arm64 instead of skipping it
at runtime.
- Patch 2: Declare fewer tests and skip all of test_softdirty() if soft-dirty
is not supported, rather than conditionally marking each check as skipped.
- Added Reviewed-by tags: thanks DavidH!
- Patch 8: Clarified commit message.
[1] https://lore.kernel.org/linux-mm/20230713135440.3651409-1-ryan.roberts@arm.…
Thanks,
Ryan
Ryan Roberts (8):
selftests: Line buffer test program's stdout
selftests/mm: Skip soft-dirty tests on arm64
selftests/mm: Enable mrelease_test for arm64
selftests/mm: Fix thuge-gen test bugs
selftests/mm: va_high_addr_switch should skip unsupported arm64
configs
selftests/mm: Make migration test robust to failure
selftests/mm: Optionally pass duration to transhuge-stress
selftests/mm: Run all tests from run_vmtests.sh
tools/testing/selftests/kselftest.h | 9 ++
tools/testing/selftests/kselftest/runner.sh | 7 +-
tools/testing/selftests/mm/Makefile | 82 ++++++++++---------
tools/testing/selftests/mm/madv_populate.c | 26 +++++-
tools/testing/selftests/mm/migration.c | 14 +++-
tools/testing/selftests/mm/mrelease_test.c | 1 +
tools/testing/selftests/mm/run_vmtests.sh | 28 ++++++-
tools/testing/selftests/mm/settings | 2 +-
tools/testing/selftests/mm/thuge-gen.c | 4 +-
tools/testing/selftests/mm/transhuge-stress.c | 12 ++-
.../selftests/mm/va_high_addr_switch.c | 2 +-
11 files changed, 133 insertions(+), 54 deletions(-)
--
2.25.1
Hi,
Consequential to the previous problem report, this one addresses almost the very
next test script.
The testing environment is the same: 6.5-rc2 vanilla Torvalds tree on Ubuntu 22.04 LTS.
The used config is the same, please find it with the bridge_mdb.sh normal and "set -x"
output on this link (too large to attach):
https://domac.alu.unizg.hr/~mtodorov/linux/selftests/net-forwarding/bridge_…
root@defiant:# ./bridge_mdb.sh
INFO: # Host entries configuration tests
TEST: Common host entries configuration tests (IPv4) [FAIL]
Managed to add IPv4 host entry with a filter mode
TEST: Common host entries configuration tests (IPv6) [FAIL]
Managed to add IPv6 host entry with a filter mode
TEST: Common host entries configuration tests (L2) [FAIL]
Managed to add L2 host entry with a filter mode
INFO: # Port group entries configuration tests - (*, G)
Command "replace" is unknown, try "bridge mdb help".
TEST: Common port group entries configuration tests (IPv4 (*, G)) [FAIL]
Failed to replace IPv4 (*, G) entry
Command "replace" is unknown, try "bridge mdb help".
TEST: Common port group entries configuration tests (IPv6 (*, G)) [FAIL]
Failed to replace IPv6 (*, G) entry
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
RTNETLINK answers: Invalid argument
Error: bridge: (*, G) group is already joined by port.
Error: bridge: (*, G) group is already joined by port.
TEST: IPv4 (*, G) port group entries configuration tests [FAIL]
(S, G) entry not created
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
RTNETLINK answers: Invalid argument
Error: bridge: (*, G) group is already joined by port.
Error: bridge: (*, G) group is already joined by port.
TEST: IPv6 (*, G) port group entries configuration tests [FAIL]
(S, G) entry not created
INFO: # Port group entries configuration tests - (S, G)
Command "replace" is unknown, try "bridge mdb help".
TEST: Common port group entries configuration tests (IPv4 (S, G)) [FAIL]
Failed to replace IPv4 (S, G) entry
Command "replace" is unknown, try "bridge mdb help".
TEST: Common port group entries configuration tests (IPv6 (S, G)) [FAIL]
Failed to replace IPv6 (S, G) entry
Error: bridge: (S, G) group is already joined by port.
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
TEST: IPv4 (S, G) port group entries configuration tests [FAIL]
Managed to add an entry with a filter mode
Error: bridge: (S, G) group is already joined by port.
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
TEST: IPv6 (S, G) port group entries configuration tests [FAIL]
"temp" entry has an unpending group timer
INFO: # Port group entries configuration tests - L2
Command "replace" is unknown, try "bridge mdb help".
TEST: Common port group entries configuration tests (L2 (*, G)) [FAIL]
Failed to replace L2 (*, G) entry
TEST: L2 (*, G) port group entries configuration tests [FAIL]
Managed to add an entry with a filter mode
INFO: # Large scale dump tests
TEST: IPv4 large scale dump tests [ OK ]
TEST: IPv6 large scale dump tests [ OK ]
TEST: L2 large scale dump tests [ OK ]
INFO: # Forwarding tests
Error: bridge: Group is already joined by host.
TEST: IPv4 host entries forwarding tests [FAIL]
Packet not locally received after adding a host entry
Error: bridge: Group is already joined by host.
TEST: IPv6 host entries forwarding tests [FAIL]
Packet locally received after flood
TEST: L2 host entries forwarding tests [FAIL]
Packet not locally received after flood
Command "replace" is unknown, try "bridge mdb help".
TEST: IPv4 port group "exclude" entries forwarding tests [FAIL]
Packet from valid source not received on H2 after adding entry
Command "replace" is unknown, try "bridge mdb help".
TEST: IPv6 port group "exclude" entries forwarding tests [FAIL]
Packet from invalid source received on H2 after adding entry
Command "replace" is unknown, try "bridge mdb help".
TEST: IPv4 port group "include" entries forwarding tests [FAIL]
Packet from valid source not received on H2 after adding entry
Command "replace" is unknown, try "bridge mdb help".
TEST: IPv6 port group "include" entries forwarding tests [FAIL]
Packet from invalid source received on H2 after adding entry
TEST: L2 port entries forwarding tests [ OK ]
INFO: # Control packets tests
Command "replace" is unknown, try "bridge mdb help".
TEST: IGMPv3 MODE_IS_INCLUDE tests [FAIL]
Source not add to source list
Command "replace" is unknown, try "bridge mdb help".
TEST: MLDv2 MODE_IS_INCLUDE tests [FAIL]
Source not add to source list
root@defiant:# bridge mdb show
root@defiant:#
NOTE that several "sleep 10" command looped in the script can easily exceed
the default timeout of 45 seconds, and SIGTERM to the script isn't processed,
so it leaves the system in an unpredictable state from which even
"systemctl restart networking" didn't bail out.
Setting tools/testing/selftests/net/forwarding/settings:timeout=150 seemed enough.
Best regards,
Mirsad Todorovac
This series is a follow up to the recent change [1] which added
per-cpu insert/delete statistics for maps. The bpf_map_sum_elem_count
kfunc presented in the original series was only available to tracing
programs, so let's make it available to all.
The first patch makes types listed in the reg2btf_ids[] array to be
considered trusted by kfuncs.
The second patch allows to treat CONST_PTR_TO_MAP as trusted pointers from
kfunc's point of view by adding it to the reg2btf_ids[] array.
The third patch adds missing const to the map argument of the
bpf_map_sum_elem_count kfunc.
The fourth patch registers the bpf_map_sum_elem_count for all programs,
and patches selftests correspondingly.
[1] https://lore.kernel.org/bpf/20230705160139.19967-1-aspsk@isovalent.com/
v1 -> v2:
* treat the whole reg2btf_ids array as trusted (Alexei)
Anton Protopopov (4):
bpf: consider types listed in reg2btf_ids as trusted
bpf: consider CONST_PTR_TO_MAP as trusted pointer to struct bpf_map
bpf: make an argument const in the bpf_map_sum_elem_count kfunc
bpf: allow any program to use the bpf_map_sum_elem_count kfunc
include/linux/btf_ids.h | 1 +
kernel/bpf/map_iter.c | 7 +++---
kernel/bpf/verifier.c | 22 +++++++++++--------
.../selftests/bpf/progs/map_ptr_kern.c | 5 +++++
4 files changed, 22 insertions(+), 13 deletions(-)
--
2.34.1
bpf_dynptr_slice(_rw) uses a user provided buffer if it can not provide
a pointer to a block of contiguous memory. This buffer is unused in the
case of local dynptrs, and may be unused in other cases as well. There
is no need to require the buffer, as the kfunc can just return NULL if
it was needed and not provided.
This adds another kfunc annotation, __opt, which combines with __sz and
__szk to allow the buffer associated with the size to be NULL. If the
buffer is NULL, the verifier does not check that the buffer is of
sufficient size.
Signed-off-by: Daniel Rosenberg <drosen(a)google.com>
---
Documentation/bpf/kfuncs.rst | 23 ++++++++++++++++++++++-
include/linux/skbuff.h | 2 +-
kernel/bpf/helpers.c | 30 ++++++++++++++++++------------
kernel/bpf/verifier.c | 17 +++++++++++++----
4 files changed, 54 insertions(+), 18 deletions(-)
diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index ea2516374d92..7a3d9de5f315 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -100,7 +100,7 @@ Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
size parameter, and the value of the constant matters for program safety, __k
suffix should be used.
-2.2.2 __uninit Annotation
+2.2.3 __uninit Annotation
-------------------------
This annotation is used to indicate that the argument will be treated as
@@ -117,6 +117,27 @@ Here, the dynptr will be treated as an uninitialized dynptr. Without this
annotation, the verifier will reject the program if the dynptr passed in is
not initialized.
+2.2.4 __opt Annotation
+-------------------------
+
+This annotation is used to indicate that the buffer associated with an __sz or __szk
+argument may be null. If the function is passed a nullptr in place of the buffer,
+the verifier will not check that length is appropriate for the buffer. The kfunc is
+responsible for checking if this buffer is null before using it.
+
+An example is given below::
+
+ __bpf_kfunc void *bpf_dynptr_slice(..., void *buffer__opt, u32 buffer__szk)
+ {
+ ...
+ }
+
+Here, the buffer may be null. If buffer is not null, it at least of size buffer_szk.
+Either way, the returned buffer is either NULL, or of size buffer_szk. Without this
+annotation, the verifier will reject the program if a null pointer is passed in with
+a nonzero size.
+
+
.. _BPF_kfunc_nodef:
2.3 Using an existing kernel function
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 738776ab8838..8ddb4af1a501 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4033,7 +4033,7 @@ __skb_header_pointer(const struct sk_buff *skb, int offset, int len,
if (likely(hlen - offset >= len))
return (void *)data + offset;
- if (!skb || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
+ if (!skb || !buffer || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
return NULL;
return buffer;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 8d368fa353f9..26efb6fbeab2 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2167,13 +2167,15 @@ __bpf_kfunc struct task_struct *bpf_task_from_pid(s32 pid)
* bpf_dynptr_slice() - Obtain a read-only pointer to the dynptr data.
* @ptr: The dynptr whose data slice to retrieve
* @offset: Offset into the dynptr
- * @buffer: User-provided buffer to copy contents into
- * @buffer__szk: Size (in bytes) of the buffer. This is the length of the
- * requested slice. This must be a constant.
+ * @buffer__opt: User-provided buffer to copy contents into. May be NULL
+ * @buffer__szk: Size (in bytes) of the buffer if present. This is the
+ * length of the requested slice. This must be a constant.
*
* For non-skb and non-xdp type dynptrs, there is no difference between
* bpf_dynptr_slice and bpf_dynptr_data.
*
+ * If buffer__opt is NULL, the call will fail if buffer_opt was needed.
+ *
* If the intention is to write to the data slice, please use
* bpf_dynptr_slice_rdwr.
*
@@ -2190,7 +2192,7 @@ __bpf_kfunc struct task_struct *bpf_task_from_pid(s32 pid)
* direct pointer)
*/
__bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset,
- void *buffer, u32 buffer__szk)
+ void *buffer__opt, u32 buffer__szk)
{
enum bpf_dynptr_type type;
u32 len = buffer__szk;
@@ -2210,15 +2212,17 @@ __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset
case BPF_DYNPTR_TYPE_RINGBUF:
return ptr->data + ptr->offset + offset;
case BPF_DYNPTR_TYPE_SKB:
- return skb_header_pointer(ptr->data, ptr->offset + offset, len, buffer);
+ return skb_header_pointer(ptr->data, ptr->offset + offset, len, buffer__opt);
case BPF_DYNPTR_TYPE_XDP:
{
void *xdp_ptr = bpf_xdp_pointer(ptr->data, ptr->offset + offset, len);
if (xdp_ptr)
return xdp_ptr;
- bpf_xdp_copy_buf(ptr->data, ptr->offset + offset, buffer, len, false);
- return buffer;
+ if (!buffer__opt)
+ return NULL;
+ bpf_xdp_copy_buf(ptr->data, ptr->offset + offset, buffer__opt, len, false);
+ return buffer__opt;
}
default:
WARN_ONCE(true, "unknown dynptr type %d\n", type);
@@ -2230,13 +2234,15 @@ __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset
* bpf_dynptr_slice_rdwr() - Obtain a writable pointer to the dynptr data.
* @ptr: The dynptr whose data slice to retrieve
* @offset: Offset into the dynptr
- * @buffer: User-provided buffer to copy contents into
- * @buffer__szk: Size (in bytes) of the buffer. This is the length of the
- * requested slice. This must be a constant.
+ * @buffer__opt: User-provided buffer to copy contents into. May be NULL
+ * @buffer__szk: Size (in bytes) of the buffer if present. This is the
+ * length of the requested slice. This must be a constant.
*
* For non-skb and non-xdp type dynptrs, there is no difference between
* bpf_dynptr_slice and bpf_dynptr_data.
*
+ * If buffer__opt is NULL, the call will fail if buffer_opt was needed.
+ *
* The returned pointer is writable and may point to either directly the dynptr
* data at the requested offset or to the buffer if unable to obtain a direct
* data pointer to (example: the requested slice is to the paged area of an skb
@@ -2267,7 +2273,7 @@ __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset
* direct pointer)
*/
__bpf_kfunc void *bpf_dynptr_slice_rdwr(const struct bpf_dynptr_kern *ptr, u32 offset,
- void *buffer, u32 buffer__szk)
+ void *buffer__opt, u32 buffer__szk)
{
if (!ptr->data || bpf_dynptr_is_rdonly(ptr))
return NULL;
@@ -2294,7 +2300,7 @@ __bpf_kfunc void *bpf_dynptr_slice_rdwr(const struct bpf_dynptr_kern *ptr, u32 o
* will be copied out into the buffer and the user will need to call
* bpf_dynptr_write() to commit changes.
*/
- return bpf_dynptr_slice(ptr, offset, buffer, buffer__szk);
+ return bpf_dynptr_slice(ptr, offset, buffer__opt, buffer__szk);
}
__bpf_kfunc void *bpf_cast_to_kern_ctx(void *obj)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index fbcf5a4e2fcd..708ae7bca1fe 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9398,6 +9398,11 @@ static bool is_kfunc_arg_const_mem_size(const struct btf *btf,
return __kfunc_param_match_suffix(btf, arg, "__szk");
}
+static bool is_kfunc_arg_optional(const struct btf *btf, const struct btf_param *arg)
+{
+ return __kfunc_param_match_suffix(btf, arg, "__opt");
+}
+
static bool is_kfunc_arg_constant(const struct btf *btf, const struct btf_param *arg)
{
return __kfunc_param_match_suffix(btf, arg, "__k");
@@ -10464,13 +10469,17 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
break;
case KF_ARG_PTR_TO_MEM_SIZE:
{
+ struct bpf_reg_state *buff_reg = ®s[regno];
+ const struct btf_param *buff_arg = &args[i];
struct bpf_reg_state *size_reg = ®s[regno + 1];
const struct btf_param *size_arg = &args[i + 1];
- ret = check_kfunc_mem_size_reg(env, size_reg, regno + 1);
- if (ret < 0) {
- verbose(env, "arg#%d arg#%d memory, len pair leads to invalid memory access\n", i, i + 1);
- return ret;
+ if (!register_is_null(buff_reg) || !is_kfunc_arg_optional(meta->btf, buff_arg)) {
+ ret = check_kfunc_mem_size_reg(env, size_reg, regno + 1);
+ if (ret < 0) {
+ verbose(env, "arg#%d arg#%d memory, len pair leads to invalid memory access\n", i, i + 1);
+ return ret;
+ }
}
if (is_kfunc_arg_const_mem_size(meta->btf, size_arg, size_reg)) {
base-commit: 6e98b09da931a00bf4e0477d0fa52748bf28fcce
--
2.40.1.495.gc816e09b53d-goog