This series expands the XDP TX metadata framework to allow user
applications to pass per packet 64-bit launch time directly to the kernel
driver, requesting launch time hardware offload support. The XDP TX
metadata framework will not perform any clock conversion or packet
reordering.
Please note that the role of Tx metadata is just to pass the launch time,
not to enable the offload feature. Users will need to enable the launch
time hardware offload feature of the device by using the respective
command, such as the tc-etf command.
Although some devices use the tc-etf command to enable their launch time
hardware offload feature, xsk packets will not go through the etf qdisc.
Therefore, in my opinion, the launch time should always be based on the PTP
Hardware Clock (PHC). Thus, i did not include a clock ID to indicate the
clock source.
To simplify the test steps, I modified the xdp_hw_metadata bpf self-test
tool in such a way that it will set the launch time based on the offset
provided by the user and the value of the Receive Hardware Timestamp, which
is against the PHC. This will eliminate the need to discipline System Clock
with the PHC and then use clock_gettime() to get the time.
Please note that AF_XDP lacks a feedback mechanism to inform the
application if the requested launch time is invalid. So, users are expected
to familiar with the horizon of the launch time of the device they use and
not request a launch time that is beyond the horizon. Otherwise, the driver
might interpret the launch time incorrectly and react wrongly. For stmmac
and igc, where modulo computation is used, a launch time larger than the
horizon will cause the device to transmit the packet earlier that the
requested launch time.
Although there is no feedback mechanism for the launch time request
for now, user still can check whether the requested launch time is
working or not, by requesting the Transmit Completion Hardware Timestamp.
v10:
- use net_err_ratelimited(), instead of net_ratelimit() (Maciej)
- accumulate the amount of used descs in local variable and update the
igc_metadata_request::used_desc once (Maciej)
- Ensure reverse christmas tree rule (Maciej)
V9: https://lore.kernel.org/netdev/20250206060408.808325-1-yoong.siang.song@int…
- Remove the igc_desc_unused() checking (Maciej)
- Ensure that skb allocation and DMA mapping work before proceeding to
fill in igc_tx_buffer info, context desc, and data desc (Maciej)
- Rate limit the error messages (Maciej)
- Update the comment to indicate that the 2 descriptors needed by the
empty frame are already taken into consideration (Maciej)
- Handle the case where the insertion of an empty frame fails and
explain the reason behind (Maciej)
- put self SOB tag as last tag (Maciej)
V8: https://lore.kernel.org/netdev/20250205024116.798862-1-yoong.siang.song@int…
- check the number of used descriptor in xsk_tx_metadata_request()
by using used_desc of struct igc_metadata_request, and then decreases
the budget with it (Maciej)
- submit another bug fix patch to set the buffer type for empty frame (Maciej):
https://lore.kernel.org/netdev/20250205023603.798819-1-yoong.siang.song@int…
V7: https://lore.kernel.org/netdev/20250204004907.789330-1-yoong.siang.song@int…
- split the refactoring code of igc empty packet insertion into a separate
commit (Faizal)
- add explanation on why the value "4" is used as igc transmit budget
(Faizal)
- perform a stress test by sending 1000 packets with 10ms interval and
launch time set to 500us in the future (Faizal & Yong Liang)
V6: https://lore.kernel.org/netdev/20250116155350.555374-1-yoong.siang.song@int…
- fix selftest build errors by using asprintf() and realloc(), instead of
managing the buffer sizes manually (Daniel, Stanislav)
V5: https://lore.kernel.org/netdev/20250114152718.120588-1-yoong.siang.song@int…
- change netdev feature name from tx-launch-time to tx-launch-time-fifo
to explicitly state the FIFO behaviour (Stanislav)
- improve the looping of xdp_hw_metadata app to wait for packet tx
completion to be more readable by using clock_gettime() (Stanislav)
- add launch time setup steps into xdp_hw_metadata app (Stanislav)
V4: https://lore.kernel.org/netdev/20250106135506.9687-1-yoong.siang.song@intel…
- added XDP launch time support to the igc driver (Jesper & Florian)
- added per-driver launch time limitation on xsk-tx-metadata.rst (Jesper)
- added explanation on FIFO behavior on xsk-tx-metadata.rst (Jakub)
- added step to enable launch time in the commit message (Jesper & Willem)
- explicitly documented the type of launch_time and which clock source
it is against (Willem)
V3: https://lore.kernel.org/netdev/20231203165129.1740512-1-yoong.siang.song@in…
- renamed to use launch time (Jesper & Willem)
- changed the default launch time in xdp_hw_metadata apps from 1s to 0.1s
because some NICs do not support such a large future time.
V2: https://lore.kernel.org/netdev/20231201062421.1074768-1-yoong.siang.song@in…
- renamed to use Earliest TxTime First (Willem)
- renamed to use txtime (Willem)
V1: https://lore.kernel.org/netdev/20231130162028.852006-1-yoong.siang.song@int…
Song Yoong Siang (5):
xsk: Add launch time hardware offload support to XDP Tx metadata
selftests/bpf: Add launch time request to xdp_hw_metadata
net: stmmac: Add launch time support to XDP ZC
igc: Refactor empty frame insertion for launch time support
igc: Add launch time support to XDP ZC
Documentation/netlink/specs/netdev.yaml | 4 +
Documentation/networking/xsk-tx-metadata.rst | 62 +++++++
drivers/net/ethernet/intel/igc/igc.h | 1 +
drivers/net/ethernet/intel/igc/igc_main.c | 143 +++++++++++----
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 +
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 13 ++
include/net/xdp_sock.h | 10 ++
include/net/xdp_sock_drv.h | 1 +
include/uapi/linux/if_xdp.h | 10 ++
include/uapi/linux/netdev.h | 3 +
net/core/netdev-genl.c | 2 +
net/xdp/xsk.c | 3 +
tools/include/uapi/linux/if_xdp.h | 10 ++
tools/include/uapi/linux/netdev.h | 3 +
tools/testing/selftests/bpf/xdp_hw_metadata.c | 168 +++++++++++++++++-
15 files changed, 396 insertions(+), 39 deletions(-)
--
2.34.1
Correct the typo "adress" to "address" in a comment in conntrack_icmp_related.sh
to improve clarity.
Signed-off-by: Marcelo Moreira <marcelomoreira1905(a)gmail.com>
---
tools/testing/selftests/net/netfilter/conntrack_icmp_related.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/netfilter/conntrack_icmp_related.sh b/tools/testing/selftests/net/netfilter/conntrack_icmp_related.sh
index c63d840ead61..f63b7f12b36a 100755
--- a/tools/testing/selftests/net/netfilter/conntrack_icmp_related.sh
+++ b/tools/testing/selftests/net/netfilter/conntrack_icmp_related.sh
@@ -171,7 +171,7 @@ table inet filter {
}
EOF
-# make sure NAT core rewrites adress of icmp error if nat is used according to
+# make sure NAT core rewrites address of icmp error if nat is used according to
# conntrack nat information (icmp error will be directed at nsrouter1 address,
# but it needs to be routed to nsclient1 address).
ip netns exec "$nsrouter1" nft -f - <<EOF
--
2.48.1
Hi all,
This is v7 of the Rust/KUnit integration patch. I think all of the
suggestions have at least been responded to (even if there are a few I'm
leaving as either future projects or matters of taste). Hopefully this
is good-to-go for 6.15, so we can start using it concurrently with
making any additional improvements we may wish.
This series was originally written by José Expósito, and has been
modified and updated by Matt Gilbride, Miguel Ojeda, and myself. The
original version can be found here:
https://github.com/Rust-for-Linux/linux/pull/950
Add support for writing KUnit tests in Rust. While Rust doctests are
already converted to KUnit tests and run, they're really better suited
for examples, rather than as first-class unit tests.
This series implements a series of direct Rust bindings for KUnit tests,
as well as a new macro which allows KUnit tests to be written using a
close variant of normal Rust unit test syntax. The only change required
is replacing '#[cfg(test)]' with '#[kunit_tests(kunit_test_suite_name)]'
An example test would look like:
#[kunit_tests(rust_kernel_hid_driver)]
mod tests {
use super::*;
use crate::{c_str, driver, hid, prelude::*};
use core::ptr;
struct SimpleTestDriver;
impl Driver for SimpleTestDriver {
type Data = ();
}
#[test]
fn rust_test_hid_driver_adapter() {
let mut hid = bindings::hid_driver::default();
let name = c_str!("SimpleTestDriver");
static MODULE: ThisModule = unsafe { ThisModule::from_ptr(ptr::null_mut()) };
let res = unsafe {
<hid::Adapter<SimpleTestDriver> as driver::DriverOps>::register(&mut hid, name, &MODULE)
};
assert_eq!(res, Err(ENODEV)); // The mock returns -19
}
}
Please give this a go, and make sure I haven't broken it! There's almost
certainly a lot of improvements which can be made -- and there's a fair
case to be made for replacing some of this with generated C code which
can use the C macros -- but this is hopefully an adequate implementation
for now, and the interface can (with luck) remain the same even if the
implementation changes.
A few small notable missing features:
- Attributes (like the speed of a test) are hardcoded to the default
value.
- Similarly, the module name attribute is hardcoded to NULL. In C, we
use the KBUILD_MODNAME macro, but I couldn't find a way to use this
from Rust which wasn't more ugly than just disabling it.
- Assertions are not automatically rewritten to use KUnit assertions.
---
Changes since v6:
https://lore.kernel.org/rust-for-linux/20250214074051.1619256-1-davidgow@go…
- Fixed an [allow(unused_unsafe)] which ended up in patch 2 instead of
patch 1. (Thanks, Tamir!)
- Doc comments now have several useful links. (Thanks, Tamir!)
- Fix a potential compile error under macos. (Thanks, Tamir!)
- Several small tidy-ups to limit unsafe usage. (Thanks, Tamir!)
Changes since v5:
https://lore.kernel.org/all/20241213081035.2069066-1-davidgow@google.com/
- Rebased against 6.14-rc1
- Fixed a bunch of warnings / clippy lints introduced in Rust 1.83 and
1.84.
- No longer needs static_mut_refs / const_mut_refs, and is much cleaned
up as a result. (Thanks, Miguel)
- Major documentation and example fixes. (Thanks, Miguel)
Changes since v4:
https://lore.kernel.org/linux-kselftest/20241101064505.3820737-1-davidgow@g…
- Rebased against 6.13-rc1
- Allowed an unused_unsafe warning after the behaviour of addr_of_mut!()
changed in Rust 1.82. (Thanks Boqun, Miguel)
- "Expect" that the sample assert_eq!(1+1, 2) produces a clippy warning
due to a redundant assertion. (Thanks Boqun, Miguel)
- Fix some missing safety comments, and remove some unneeded 'unsafe'
blocks. (Thanks Boqun)
- Fix a couple of minor rustfmt issues which were triggering checkpatch
warnings.
Changes since v3:
https://lore.kernel.org/linux-kselftest/20241030045719.3085147-2-davidgow@g…
- The kunit_unsafe_test_suite!() macro now panic!s if the suite name is
too long, triggering a compile error. (Thanks, Alice!)
- The #[kunit_tests()] macro now preserves span information, so
errors can be better reported. (Thanks, Boqun!)
- The example tests have been updated to no longer use assert_eq!() with
a constant bool argument (which triggered a clippy warning now we
have the span info).
Changes since v2:
https://lore.kernel.org/linux-kselftest/20241029092422.2884505-1-davidgow@g…
- Include missing rust/macros/kunit.rs file from v2. (Thanks Boqun!)
- The kunit_unsafe_test_suite!() macro will truncate the name of the
suite if it is too long. (Thanks Alice!)
- The proc macro now emits an error if the suite name is too long.
- We no longer needlessly use UnsafeCell<> in
kunit_unsafe_test_suite!(). (Thanks Alice!)
Changes since v1:
https://lore.kernel.org/lkml/20230720-rustbind-v1-0-c80db349e3b5@google.com…
- Rebase on top of the latest rust-next (commit 718c4069896c)
- Make kunit_case a const fn, rather than a macro (Thanks Boqun)
- As a result, the null terminator is now created with
kernel::kunit::kunit_case_null()
- Use the C kunit_get_current_test() function to implement
in_kunit_test(), rather than re-implementing it (less efficiently)
ourselves.
Changes since the GitHub PR:
- Rebased on top of kselftest/kunit
- Add const_mut_refs feature
This may conflict with https://lore.kernel.org/lkml/20230503090708.2524310-6-nmi@metaspace.dk/
- Add rust/macros/kunit.rs to the KUnit MAINTAINERS entry
---
José Expósito (3):
rust: kunit: add KUnit case and suite macros
rust: macros: add macro to easily run KUnit tests
rust: kunit: allow to know if we are in a test
MAINTAINERS | 1 +
rust/kernel/kunit.rs | 199 +++++++++++++++++++++++++++++++++++++++++++
rust/macros/kunit.rs | 161 ++++++++++++++++++++++++++++++++++
rust/macros/lib.rs | 29 +++++++
4 files changed, 390 insertions(+)
create mode 100644 rust/macros/kunit.rs
--
2.48.1.601.g30ceb7b040-goog
Fix misspellings flagged by codespell.
Signed-off-by: Ujwal Kundur <ujwal.kundur(a)gmail.com>
---
tools/testing/selftests/mm/uffd-common.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/uffd-common.c b/tools/testing/selftests/mm/uffd-common.c
index 717539eddf98..254409972ddb 100644
--- a/tools/testing/selftests/mm/uffd-common.c
+++ b/tools/testing/selftests/mm/uffd-common.c
@@ -323,7 +323,7 @@ int uffd_test_ctx_init(uint64_t features, const char **errmsg)
ret = userfaultfd_open(&features);
if (ret) {
if (errmsg)
- *errmsg = "possible lack of priviledge";
+ *errmsg = "possible lack of privilege";
return ret;
}
@@ -348,7 +348,7 @@ int uffd_test_ctx_init(uint64_t features, const char **errmsg)
/*
* After initialization of area_src, we must explicitly release pages
* for area_dst to make sure it's fully empty. Otherwise we could have
- * some area_dst pages be errornously initialized with zero pages,
+ * some area_dst pages be erroneously initialized with zero pages,
* hence we could hit memory corruption later in the test.
*
* One example is when THP is globally enabled, above allocate_area()
--
2.20.1
The driver for the 8250 console is not used, as no port is found.
Instead the prom0 bootconsole is used the whole time.
The prom driver translates '\n' to '\r\n' before handing of the message
off to the firmware. The firmware performs the same translation again.
In the final output produced by QEMU each line ends with '\r\r\n'.
This breaks the kunit parser, which can only handle '\r\n' and '\n'.
Use the Zilog console instead. It works correctly, is the one documented
by the QEMU manual and also saves a bit of codesize:
Before=4051011, After=4023326, chg -0.68%
Observed on QEMU 9.2.0.
Fixes: 87c9c1631788 ("kunit: tool: add support for QEMU")
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
tools/testing/kunit/qemu_configs/sparc.py | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/testing/kunit/qemu_configs/sparc.py b/tools/testing/kunit/qemu_configs/sparc.py
index e975c4331a7c2a74f8ade61c3f31ff0d37314545..256d9573b44646533d1a6f768976628adc87921e 100644
--- a/tools/testing/kunit/qemu_configs/sparc.py
+++ b/tools/testing/kunit/qemu_configs/sparc.py
@@ -2,8 +2,9 @@ from ..qemu_config import QemuArchParams
QEMU_ARCH = QemuArchParams(linux_arch='sparc',
kconfig='''
-CONFIG_SERIAL_8250=y
-CONFIG_SERIAL_8250_CONSOLE=y''',
+CONFIG_SERIAL_SUNZILOG=y
+CONFIG_SERIAL_SUNZILOG_CONSOLE=y
+''',
qemu_arch='sparc',
kernel_path='arch/sparc/boot/zImage',
kernel_command_line='console=ttyS0 mem=256M',
---
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
change-id: 20250214-kunit-qemu-sparc-console-73ece282d867
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Syzbot caught an array out-of-bounds bug [1]. It turns out that when the
BPF program runs through do_misc_fixups(), it allocates an extra 8 bytes
on the call stack, which eventually causes stack_depth to exceed 512.
I was able to reproduce this issue probabilistically by enabling
CONFIG_UBSAN=y and disabling CONFIG_BPF_JIT_ALWAYS_ON with the selfttest
I provide in second patch(although it doesn't happen every time - I didn't
dig deeper into why UBSAN behaves this way).
Furthermore, if I set /proc/sys/net/core/bpf_jit_enable to 0 to disable
the jit, a panic occurs, and the reason is the same, that bpf_func is
assigned an incorrect address.
[---[ end trace ]---
[Oops: general protection fault, probably for non-canonical address
0x100f0e0e0d090808: 0000 [#1] PREEMPT SMP NOPTI
[Tainted: [W]=WARN, [O]=OOT_MODULE
[RIP: 0010:bpf_test_run+0x1d2/0x360
[RSP: 0018:ffffafc7955178a0 EFLAGS: 00010246
[RAX: 100f0e0e0d090808 RBX: ffff8e9fdb2c4100 RCX: 0000000000000018
[RDX: 00000000002b5b18 RSI: ffffafc780497048 RDI: ffff8ea04d601700
[RBP: ffffafc780497000 R08: ffffafc795517a0c R09: 0000000000000000
[R10: 0000000000000000 R11: fefefefefefefeff R12: ffff8ea04d601700
[R13: ffffafc795517928 R14: ffffafc795517928 R15: 0000000000000000
[CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[CR2: 00007f181c064648 CR3: 00000001aa2be003 CR4: 0000000000770ef0
[DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[PKRU: 55555554
[Call Trace:
[ <TASK>
[ ? die_addr+0x36/0x90
[ ? exc_general_protection+0x237/0x430
[ ? asm_exc_general_protection+0x26/0x30
[ ? bpf_test_run+0x1d2/0x360
[ ? bpf_test_run+0x10d/0x360
[ ? __link_object+0x12a/0x1e0
[ ? slab_build_skb+0x23/0x130
[ ? kmem_cache_alloc_noprof+0x2ea/0x3f0
[ ? sk_prot_alloc+0xc2/0x120
[ bpf_prog_test_run_skb+0x21b/0x590
[ __sys_bpf+0x340/0xa80
[ __x64_sys_bpf+0x1e/0x30
---
v2 -> v3:
Optimized some code naming and conditional judgment logic.
https://lore.kernel.org/bpf/20250213131214.164982-1-mrpre@163.com/T/
v1 -> v2:
Directly reject loading programs with a stack size greater than 512 when
jit disabled.(Suggested by Alexei Starovoitov)
https://lore.kernel.org/bpf/20250212135251.85487-1-mrpre@163.com/T/
Jiayuan Chen (3):
bpf: Fix array bounds error with may_goto
selftests/bpf: Introduce __load_if_JITed annotation for tests
selftests/bpf: Add selftest for may_goto
kernel/bpf/core.c | 19 +++++--
kernel/bpf/verifier.c | 7 +++
tools/testing/selftests/bpf/progs/bpf_misc.h | 2 +
.../selftests/bpf/progs/verifier_stack_ptr.c | 52 +++++++++++++++++++
tools/testing/selftests/bpf/test_loader.c | 26 ++++++++++
5 files changed, 102 insertions(+), 4 deletions(-)
--
2.47.1
Update error message in mptcp_connect.sh to use a more formal and
consistent wording. The phrase "not able to" was replaced with "Unable to"
for better clarity.
Signed-off-by: Marcelo Moreira <marcelomoreira1905(a)gmail.com>
---
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index 5e3c56253274..bdc9c22af6be 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -261,8 +261,8 @@ check_mptcp_disabled()
# mainly to cover more code
if ! ip netns exec ${disabled_ns} sysctl net.mptcp >/dev/null; then
- mptcp_lib_pr_fail "not able to list net.mptcp sysctl knobs"
- mptcp_lib_result_fail "not able to list net.mptcp sysctl knobs"
+ mptcp_lib_pr_fail "Unable to list net.mptcp sysctl knobs"
+ mptcp_lib_result_fail "Unable to list net.mptcp sysctl knobs"
ret=${KSFT_FAIL}
return 1
fi
--
2.48.1
This is one of just 3 remaining "Test Module" kselftests (the others
being bitmap and scanf), the rest having been converted to KUnit.
I tested this using:
$ tools/testing/kunit/kunit.py run --arch arm64 --make_options LLVM=1 printf
I have also sent out a series converting scanf[0].
Link: https://lore.kernel.org/all/20250204-scanf-kunit-convert-v3-0-386d7c3ee714@… [0]
Signed-off-by: Tamir Duberstein <tamird(a)gmail.com>
---
Changes in v3:
- Remove extraneous trailing newlines from failure messages.
- Replace `pr_warn` with `kunit_warn`.
- Drop arch changes.
- Remove KUnit boilerplate from CONFIG_PRINTF_KUNIT_TEST help text.
- Restore `total_tests` counting.
- Remove tc_fail macro in last patch.
- Link to v2: https://lore.kernel.org/r/20250207-printf-kunit-convert-v2-0-057b23860823@g…
Changes in v2:
- Incorporate code review from prior work[0] by Arpitha Raghunandan.
- Link to v1: https://lore.kernel.org/r/20250204-printf-kunit-convert-v1-0-ecf1b846a4de@g…
Link: https://lore.kernel.org/lkml/20200817043028.76502-1-98.arpi@gmail.com/t/#u [0]
---
Tamir Duberstein (2):
printf: convert self-test to KUnit
printf: break kunit into test cases
Documentation/core-api/printk-formats.rst | 2 +-
MAINTAINERS | 2 +-
lib/Kconfig.debug | 12 +-
lib/Makefile | 2 +-
lib/{test_printf.c => printf_kunit.c} | 429 +++++++++++++-----------------
tools/testing/selftests/lib/config | 1 -
tools/testing/selftests/lib/printf.sh | 4 -
7 files changed, 192 insertions(+), 260 deletions(-)
---
base-commit: a64dcfb451e254085a7daee5fe51bf22959d52d3
change-id: 20250131-printf-kunit-convert-fd4012aa2ec6
Best regards,
--
Tamir Duberstein <tamird(a)gmail.com>
The current code returns -ENOMEM if test->bar[barno] is NULL.
There can be two reasons why test->bar[barno] is NULL:
1) The pci_ioremap_bar() call in pci_endpoint_test_probe() failed.
2) The BAR was skipped, because it is disabled by the endpoint.
Many PCI endpoint controller drivers will disable all BARs in their
init function. A disabled BAR will have a size of 0.
A PCI endpoint function driver will be able to enable any BAR that
is not marked as BAR_RESERVED (which means that the BAR should not
be touched by the EPF driver).
Thus, perform check if the size is 0, before checking if
test->bar[barno] is NULL, such that we can return different errors.
This will allow the selftests to return SKIP instead of FAIL for
disabled BARs.
Signed-off-by: Niklas Cassel <cassel(a)kernel.org>
---
Hello PCI maintainers.
This patch might give a trivial conflict with:
https://lore.kernel.org/linux-pci/20250123095906.3578241-2-cassel@kernel.or…
because the context lines (lines that haven't been changed)
might be different. If there is a conflict, simply look at
this patch by itself, and resolution should be trivial.
drivers/misc/pci_endpoint_test.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
index d5ac71a49386..b95980b29eb9 100644
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -292,11 +292,13 @@ static int pci_endpoint_test_bar(struct pci_endpoint_test *test,
void *read_buf __free(kfree) = NULL;
struct pci_dev *pdev = test->pdev;
+ bar_size = pci_resource_len(pdev, barno);
+ if (!bar_size)
+ return -ENODATA;
+
if (!test->bar[barno])
return -ENOMEM;
- bar_size = pci_resource_len(pdev, barno);
-
if (barno == test->test_reg_bar)
bar_size = 0x4;
--
2.48.1
Instead of inlining equivalents, use lib.sh-provided primitives.
Use defer to manage vx lifetime.
This will make it easier to extend the test in the next patch.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Ido Schimmel <idosch(a)nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor(a)blackwall.org>
---
Notes:
CC: Simon Horman <horms(a)kernel.org>
CC: Shuah Khan <shuah(a)kernel.org>
CC: linux-kselftest(a)vger.kernel.org
.../net/test_vxlan_fdb_changelink.sh | 39 ++++++++++++-------
1 file changed, 24 insertions(+), 15 deletions(-)
diff --git a/tools/testing/selftests/net/test_vxlan_fdb_changelink.sh b/tools/testing/selftests/net/test_vxlan_fdb_changelink.sh
index 2d442cdab11e..6f2bca4b346c 100755
--- a/tools/testing/selftests/net/test_vxlan_fdb_changelink.sh
+++ b/tools/testing/selftests/net/test_vxlan_fdb_changelink.sh
@@ -1,29 +1,38 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
-# Check FDB default-remote handling across "ip link set".
+ALL_TESTS="
+ test_set_remote
+"
+source lib.sh
check_remotes()
{
local what=$1; shift
local N=$(bridge fdb sh dev vx | grep 00:00:00:00:00:00 | wc -l)
- echo -ne "expected two remotes after $what\t"
- if [[ $N != 2 ]]; then
- echo "[FAIL]"
- EXIT_STATUS=1
- else
- echo "[ OK ]"
- fi
+ ((N == 2))
+ check_err $? "expected 2 remotes after $what, got $N"
}
-ip link add name vx up type vxlan id 2000 dstport 4789
-bridge fdb ap dev vx 00:00:00:00:00:00 dst 192.0.2.20 self permanent
-bridge fdb ap dev vx 00:00:00:00:00:00 dst 192.0.2.30 self permanent
-check_remotes "fdb append"
+# Check FDB default-remote handling across "ip link set".
+test_set_remote()
+{
+ RET=0
-ip link set dev vx type vxlan remote 192.0.2.30
-check_remotes "link set"
+ ip_link_add vx up type vxlan id 2000 dstport 4789
+ bridge fdb ap dev vx 00:00:00:00:00:00 dst 192.0.2.20 self permanent
+ bridge fdb ap dev vx 00:00:00:00:00:00 dst 192.0.2.30 self permanent
+ check_remotes "fdb append"
+
+ ip link set dev vx type vxlan remote 192.0.2.30
+ check_remotes "link set"
+
+ log_test 'FDB default-remote handling across "ip link set"'
+}
+
+trap defer_scopes_cleanup EXIT
+
+tests_run
-ip link del dev vx
exit $EXIT_STATUS
--
2.47.0
This helper could be useful to more than just forwarding tests.
Move it upstairs and port over to log_test_skip().
Split the function into two parts: the bit that actually checks and
reports skip, which is in a new function check_command(). And a bit
that exits the test script if the check fails. This allows users
consistent checking behavior while giving an option to bail out from
a single test without bailing out of the whole script.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Ido Schimmel <idosch(a)nvidia.com>
Reviewed-by: Nikolay Aleksandrov <razor(a)blackwall.org>
---
Notes:
CC: Simon Horman <horms(a)kernel.org>
CC: Shuah Khan <shuah(a)kernel.org>
CC: linux-kselftest(a)vger.kernel.org
tools/testing/selftests/net/forwarding/lib.sh | 10 ----------
tools/testing/selftests/net/lib.sh | 19 +++++++++++++++++++
2 files changed, 19 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 8de80acf249e..508f3c700d71 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -291,16 +291,6 @@ if [[ "$CHECK_TC" = "yes" ]]; then
check_tc_version
fi
-require_command()
-{
- local cmd=$1; shift
-
- if [[ ! -x "$(command -v "$cmd")" ]]; then
- echo "SKIP: $cmd not installed"
- exit $ksft_skip
- fi
-}
-
# IPv6 support was added in v3.0
check_mtools_version()
{
diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index 0bd9a038a1f0..975be4fdbcdb 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -450,6 +450,25 @@ kill_process()
{ kill $pid && wait $pid; } 2>/dev/null
}
+check_command()
+{
+ local cmd=$1; shift
+
+ if [[ ! -x "$(command -v "$cmd")" ]]; then
+ log_test_skip "$cmd not installed"
+ return $EXIT_STATUS
+ fi
+}
+
+require_command()
+{
+ local cmd=$1; shift
+
+ if ! check_command "$cmd"; then
+ exit $EXIT_STATUS
+ fi
+}
+
ip_link_add()
{
local name=$1; shift
--
2.47.0
This patch series includes some netns-related improvements and fixes for
rtnetlink, to make link creation more intuitive:
1) Creating link in another net namespace doesn't conflict with link
names in current one.
2) Refector rtnetlink link creation. Create link in target namespace
directly.
So that
# ip link add netns ns1 link-netns ns2 tun0 type gre ...
will create tun0 in ns1, rather than create it in ns2 and move to ns1.
And don't conflict with another interface named "tun0" in current netns.
Patch 01 avoids link name conflict in different netns.
To achieve 2), there're mainly 3 steps:
- Patch 02 packs newlink() parameters into a struct, including
the original "src_net" along with more netns context. No semantic
changes are introduced.
- Patch 03 ~ 07 converts device drivers to use the explicit netns
extracted from params.
- Patch 08 ~ 09 removes the old netns parameter, and converts
rtnetlink to create device in target netns directly.
Patch 10 ~ 11 adds some tests for link name and link netns.
BTW please note there're some issues found in current code:
- In amt_newlink() drivers/net/amt.c:
amt->net = net;
...
amt->stream_dev = dev_get_by_index(net, ...
Uses net, but amt_lookup_upper_dev() only searches in dev_net.
So the AMT device may not be properly deleted if it's in a different
netns from lower dev.
- In lowpan_newlink() in net/ieee802154/6lowpan/core.c:
wdev = dev_get_by_index(dev_net(ldev), nla_get_u32(tb[IFLA_LINK]));
Looks for IFLA_LINK in dev_net, but in theory the ifindex is defined
in link netns.
And thanks to Kuniyuki for fixing related issues in gtp and pfcp:
https://lore.kernel.org/netdev/20250110014754.33847-1-kuniyu@amazon.com/
---
v9:
- Change the prototype of macvlan_common_newlink().
- Minor fixes of coding style and local variables.
v8:
link: https://lore.kernel.org/all/20250113143719.7948-1-shaw.leon@gmail.com/
- Move dev and ext_ack out from param struct.
- Validate link_net and dev_net are identical for 6lowpan.
v7:
link: https://lore.kernel.org/all/20250104125732.17335-1-shaw.leon@gmail.com/
- Add selftest kconfig.
- Remove a duplicated test of ip6gre.
v6:
link: https://lore.kernel.org/all/20241218130909.2173-1-shaw.leon@gmail.com/
- Split prototype, driver and rtnetlink changes.
- Add more tests for link netns.
- Fix IPv6 tunnel net overwriten in ndo_init().
- Reorder variable declarations.
- Exclude a ip_tunnel-specific patch.
v5:
link: https://lore.kernel.org/all/20241209140151.231257-1-shaw.leon@gmail.com/
- Fix function doc in batman-adv.
- Include peer_net in rtnl newlink parameters.
v4:
link: https://lore.kernel.org/all/20241118143244.1773-1-shaw.leon@gmail.com/
- Pack newlink() parameters to a single struct.
- Use ynl async_msg_queue.empty() in selftest.
v3:
link: https://lore.kernel.org/all/20241113125715.150201-1-shaw.leon@gmail.com/
- Drop "netns_atomic" flag and module parameter. Add netns parameter to
newlink() instead, and convert drivers accordingly.
- Move python NetNSEnter helper to net selftest lib.
v2:
link: https://lore.kernel.org/all/20241107133004.7469-1-shaw.leon@gmail.com/
- Check NLM_F_EXCL to ensure only link creation is affected.
- Add self tests for link name/ifindex conflict and notifications
in different netns.
- Changes in dummy driver and ynl in order to add the test case.
v1:
link: https://lore.kernel.org/all/20241023023146.372653-1-shaw.leon@gmail.com/
Xiao Liang (11):
rtnetlink: Lookup device in target netns when creating link
rtnetlink: Pack newlink() params into struct
net: Use link netns in newlink() of rtnl_link_ops
ieee802154: 6lowpan: Validate link netns in newlink() of rtnl_link_ops
net: ip_tunnel: Use link netns in newlink() of rtnl_link_ops
net: ipv6: Use link netns in newlink() of rtnl_link_ops
net: xfrm: Use link netns in newlink() of rtnl_link_ops
rtnetlink: Remove "net" from newlink params
rtnetlink: Create link directly in target net namespace
selftests: net: Add python context manager for netns entering
selftests: net: Add test cases for link and peer netns
drivers/infiniband/ulp/ipoib/ipoib_netlink.c | 9 +-
drivers/net/amt.c | 11 +-
drivers/net/bareudp.c | 9 +-
drivers/net/bonding/bond_netlink.c | 6 +-
drivers/net/can/dev/netlink.c | 4 +-
drivers/net/can/vxcan.c | 7 +-
.../ethernet/qualcomm/rmnet/rmnet_config.c | 9 +-
drivers/net/geneve.c | 9 +-
drivers/net/gtp.c | 10 +-
drivers/net/ipvlan/ipvlan.h | 3 +-
drivers/net/ipvlan/ipvlan_main.c | 8 +-
drivers/net/ipvlan/ipvtap.c | 6 +-
drivers/net/macsec.c | 9 +-
drivers/net/macvlan.c | 21 +--
drivers/net/macvtap.c | 6 +-
drivers/net/netkit.c | 14 +-
drivers/net/pfcp.c | 9 +-
drivers/net/ppp/ppp_generic.c | 9 +-
drivers/net/team/team_core.c | 6 +-
drivers/net/veth.c | 7 +-
drivers/net/vrf.c | 5 +-
drivers/net/vxlan/vxlan_core.c | 9 +-
drivers/net/wireguard/device.c | 7 +-
drivers/net/wireless/virtual/virt_wifi.c | 8 +-
drivers/net/wwan/wwan_core.c | 16 +-
include/linux/if_macvlan.h | 6 +-
include/net/ip_tunnels.h | 5 +-
include/net/rtnetlink.h | 40 ++++-
net/8021q/vlan_netlink.c | 9 +-
net/batman-adv/soft-interface.c | 9 +-
net/bridge/br_netlink.c | 6 +-
net/caif/chnl_net.c | 5 +-
net/core/rtnetlink.c | 34 +++--
net/hsr/hsr_netlink.c | 12 +-
net/ieee802154/6lowpan/core.c | 7 +-
net/ipv4/ip_gre.c | 22 ++-
net/ipv4/ip_tunnel.c | 10 +-
net/ipv4/ip_vti.c | 9 +-
net/ipv4/ipip.c | 9 +-
net/ipv6/ip6_gre.c | 30 ++--
net/ipv6/ip6_tunnel.c | 19 ++-
net/ipv6/ip6_vti.c | 15 +-
net/ipv6/sit.c | 17 ++-
net/xfrm/xfrm_interface_core.c | 15 +-
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/config | 5 +
.../testing/selftests/net/lib/py/__init__.py | 2 +-
tools/testing/selftests/net/lib/py/netns.py | 18 +++
tools/testing/selftests/net/link_netns.py | 141 ++++++++++++++++++
tools/testing/selftests/net/netns-name.sh | 10 ++
50 files changed, 494 insertions(+), 179 deletions(-)
create mode 100755 tools/testing/selftests/net/link_netns.py
--
2.48.1
Greetings:
[Note: RESEND because apparently my gmail oauth2 client is breaking
patches I send]
Welcome to v7. Added CONFIG_XDP_SOCKETS to the selftests/driver/net
kernel config in patch 3. Updated the xdp_helper.c to return -1 on
AF_XDP non-existence, and updated queues.py to skip the test if AF_XDP
does not exist (but fail in other failure cases). Tested on kernels with
and without CONFIG_XDP_SOCKETS enabled.
This is an attempt to followup on something Jakub asked me about [1],
adding an xsk attribute to queues and more clearly documenting which
queues are linked to NAPIs...
After the RFC [2], Jakub suggested creating an empty nest for queues
which have a pool, so I've adjusted this version to work that way.
The nest can be extended in the future to express attributes about XSK
as needed. Queues which are not used for AF_XDP do not have the xsk
attribute present.
I've run the included test on:
- my mlx5 machine (via NETIF=)
- without setting NETIF
And the test seems to pass in both cases.
Thanks,
Joe
[1]: https://lore.kernel.org/netdev/20250113143109.60afa59a@kernel.org/
[2]: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/
v7:
- Added CONFIG_XDP_SOCKETS to selftests/driver/net/config as suggested
by Stanislav.
- Updated xdp_helper.c to return -1 for AF_XDP non-existence, but 1
for other failures.
- Updated queues.py to mark test as skipped if AF_XDP does not exist.
v6: https://lore.kernel.org/bpf/20250210193903.16235-1-jdamato@fastly.com/
- Added ifdefs for CONFIG_XDP_SOCKETS in patch 2 as Stanislav
suggested.
v5: https://lore.kernel.org/bpf/20250208041248.111118-1-jdamato@fastly.com/
- Removed unused ret variable from patch 2 as Simon suggested.
v4: https://lore.kernel.org/lkml/20250207030916.32751-1-jdamato@fastly.com/
- Add patch 1, as suggested by Jakub, which adds an empty nest helper.
- Use the helper in patch 2, which makes the code cleaner and prevents
a possible bug.
v3: https://lore.kernel.org/netdev/20250204191108.161046-1-jdamato@fastly.com/
- Change comment format in patch 2 to avoid kdoc warnings. No other
changes.
v2: https://lore.kernel.org/all/20250203185828.19334-1-jdamato@fastly.com/
- Switched from RFC to actual submission now that net-next is open
- Adjusted patch 1 to include an empty nest as suggested by Jakub
- Adjusted patch 2 to update the test based on changes to patch 1, and
to incorporate some Python feedback from Jakub :)
rfc: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/
Joe Damato (3):
netlink: Add nla_put_empty_nest helper
netdev-genl: Add an XSK attribute to queues
selftests: drv-net: Test queue xsk attribute
Documentation/netlink/specs/netdev.yaml | 13 ++-
include/net/netlink.h | 15 +++
include/uapi/linux/netdev.h | 6 ++
net/core/netdev-genl.c | 12 +++
tools/include/uapi/linux/netdev.h | 6 ++
.../testing/selftests/drivers/net/.gitignore | 2 +
tools/testing/selftests/drivers/net/Makefile | 3 +
tools/testing/selftests/drivers/net/config | 1 +
tools/testing/selftests/drivers/net/queues.py | 42 +++++++-
.../selftests/drivers/net/xdp_helper.c | 98 +++++++++++++++++++
10 files changed, 194 insertions(+), 4 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/.gitignore
create mode 100644 tools/testing/selftests/drivers/net/xdp_helper.c
base-commit: 4079918ae720e842ed7dff65fedeb9980b374995
--
2.43.0
Syzbot caught an array out-of-bounds bug [1]. It turns out that when the
BPF program runs through do_misc_fixups(), it allocates an extra 8 bytes
on the call stack, which eventually causes stack_depth to exceed 512.
I was able to reproduce this issue probabilistically by enabling
CONFIG_UBSAN=y and disabling CONFIG_BPF_JIT_ALWAYS_ON with the selfttest
I provide in second patch(although it doesn't happen every time - I didn't
dig deeper into why UBSAN behaves this way).
Furthermore, if I set /proc/sys/net/core/bpf_jit_enable to 0 to disable
the jit, a panic occurs, and the reason is the same, that bpf_func is
assigned an incorrect address.
[---[ end trace ]---
[Oops: general protection fault, probably for non-canonical address
0x100f0e0e0d090808: 0000 [#1] PREEMPT SMP NOPTI
[Tainted: [W]=WARN, [O]=OOT_MODULE
[RIP: 0010:bpf_test_run+0x1d2/0x360
[RSP: 0018:ffffafc7955178a0 EFLAGS: 00010246
[RAX: 100f0e0e0d090808 RBX: ffff8e9fdb2c4100 RCX: 0000000000000018
[RDX: 00000000002b5b18 RSI: ffffafc780497048 RDI: ffff8ea04d601700
[RBP: ffffafc780497000 R08: ffffafc795517a0c R09: 0000000000000000
[R10: 0000000000000000 R11: fefefefefefefeff R12: ffff8ea04d601700
[R13: ffffafc795517928 R14: ffffafc795517928 R15: 0000000000000000
[CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[CR2: 00007f181c064648 CR3: 00000001aa2be003 CR4: 0000000000770ef0
[DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[PKRU: 55555554
[Call Trace:
[ <TASK>
[ ? die_addr+0x36/0x90
[ ? exc_general_protection+0x237/0x430
[ ? asm_exc_general_protection+0x26/0x30
[ ? bpf_test_run+0x1d2/0x360
[ ? bpf_test_run+0x10d/0x360
[ ? __link_object+0x12a/0x1e0
[ ? slab_build_skb+0x23/0x130
[ ? kmem_cache_alloc_noprof+0x2ea/0x3f0
[ ? sk_prot_alloc+0xc2/0x120
[ bpf_prog_test_run_skb+0x21b/0x590
[ __sys_bpf+0x340/0xa80
[ __x64_sys_bpf+0x1e/0x30
---
v1 -> v2:
Directly reject loading programs with a stack size greater than 512 when
jit disabled.(Suggested by Alexei Starovoitov)
https://lore.kernel.org/bpf/20250212135251.85487-1-mrpre@163.com/T/#u
---
Jiayuan Chen (3):
bpf: Fix array bounds error with may_goto
selftests/bpf: Allow the program to select specific modes for testing
selftests/bpf: Add selftest for may_goto
kernel/bpf/core.c | 18 +++++--
kernel/bpf/verifier.c | 7 +++
tools/testing/selftests/bpf/progs/bpf_misc.h | 2 +
.../selftests/bpf/progs/verifier_stack_ptr.c | 50 +++++++++++++++++++
tools/testing/selftests/bpf/test_loader.c | 27 ++++++++++
5 files changed, 100 insertions(+), 4 deletions(-)
--
2.47.1
On 2/13/25 3:32 AM, Li Wang wrote:
> Hi John,
>
> On Thu, Feb 13, 2025 at 6:31 AM John Hubbard <jhubbard(a)nvidia.com <mailto:jhubbard@nvidia.com>> wrote:
>
> On 2/12/25 12:34 PM, Dave Hansen wrote:
> > Hi John,
> >
> > On 6/13/24 19:30, John Hubbard wrote:
> >> --- a/tools/testing/selftests/mm/protection_keys.c
> >> +++ b/tools/testing/selftests/mm/protection_keys.c
> >> @@ -42,7 +42,7 @@
> >> #include <sys/wait.h>
> >> #include <sys/stat.h>
> >> #include <fcntl.h>
> >> -#include <unistd.h>
> >> +#include <linux/unistd.h>
> >> #include <sys/ptrace.h>
> >> #include <setjmp.h>
> >
> > I'm not quite sure how but this broke the protection_keys.c selftest for
> > me. Before this commit (a5c6bc590094a1a73cf6fa3f505e1945d2bf2461) things
> > are fine. But after, I get:
> >
> > running PKEY tests for unsupported CPU/OS
> >
> > The "unsupported" test just makes a pkey_alloc() syscall. It's probably
> > calling the wrong syscall number or something.
> >
> > I think it's still broken in mainline. What's the right fix?
>
> omg I think this is an asm-generic include mistake, I'll check
> on it in an hour or so, in more depth.
>
>
> I just found that mlock2_() return a wrong valuein mlock2-test,
> I guess that was caused by including the wrong header file
> <asm-generic/unistd.h>,which might define a different syscall
> number than what the kernel uses on the test system.
Agreed.
>
> Shouldn't we make use of <unistd.h> directly?
Well, yes and no. For now, there appear to be two commits involved
in causing these problems, and the __NR_* parts need to be reverted.
I'll explain more when I post later today, but for the moment, the
first, mseal- related commit below has some hints about how we got
here:
504d8a5e0fd4 selftests/mm: mseal, self_elf: fix missing __NR_mseal
a5c6bc590094 selftests/mm: remove local __NR_* definitions
thanks,
--
John Hubbard
The quiet infrastructure was moved out of Makefile.build to accomidate
the new syscall table generation scripts in perf. Syscall table
generation wanted to also be able to be quiet, so instead of again
copying the code to set the quiet variables, the code was moved into
Makefile.perf to be used globally. This was not the right solution. It
should have been moved even further upwards in the call chain.
Makefile.include is imported in many files so this seems like a proper
place to put it.
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
---
Changes in v2:
- Fix spacing around Q= (Andrii)
- Link to v1: https://lore.kernel.org/r/20250203-quiet_tools-v1-0-d25c8956e59a@rivosinc.c…
---
Charlie Jenkins (2):
tools: Unify top-level quiet infrastructure
tools: Remove redundant quiet setup
tools/arch/arm64/tools/Makefile | 6 -----
tools/bpf/Makefile | 6 -----
tools/bpf/bpftool/Documentation/Makefile | 6 -----
tools/bpf/bpftool/Makefile | 6 -----
tools/bpf/resolve_btfids/Makefile | 2 --
tools/bpf/runqslower/Makefile | 5 +---
tools/build/Makefile | 8 +-----
tools/lib/bpf/Makefile | 13 ----------
tools/lib/perf/Makefile | 13 ----------
tools/lib/thermal/Makefile | 13 ----------
tools/objtool/Makefile | 6 -----
tools/perf/Makefile.perf | 41 -------------------------------
tools/scripts/Makefile.include | 31 ++++++++++++++++++++++-
tools/testing/selftests/bpf/Makefile.docs | 6 -----
tools/testing/selftests/hid/Makefile | 2 --
tools/thermal/lib/Makefile | 13 ----------
tools/tracing/latency/Makefile | 6 -----
tools/tracing/rtla/Makefile | 6 -----
tools/verification/rv/Makefile | 6 -----
19 files changed, 32 insertions(+), 163 deletions(-)
---
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
change-id: 20250203-quiet_tools-9a6ea9d65a19
--
- Charlie
When working on OpenRISC support for restartable sequences I noticed
and fixed these two issues with the riscv support bits.
1 The 'inc' argument to RSEQ_ASM_OP_R_DEREF_ADDV was being implicitly
passed to the macro. Fix this by adding 'inc' to the list of macro
arguments.
2 The inline asm input constraints for 'inc' and 'off' use "er", The
riscv gcc port does not have an "e" constraint, this looks to be
copied from the x86 port. Fix this by just using an "r" constraint.
I have compile tested this only for riscv. However, the same fixes I
use in the OpenRISC rseq selftests and everything passes with no issues.
Fixes: 171586a6ab66 ("selftests/rseq: riscv: Template memory ordering and percpu access mode")
Signed-off-by: Stafford Horne <shorne(a)gmail.com>
Tested-by: Charlie Jenkins <charlie(a)rivosinc.com>
Reviewed-by: Charlie Jenkins <charlie(a)rivosinc.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Acked-by: Shuah Khan <skhan(a)linuxfoundation.org>
---
Since v1:
- Added Fixes, Tested-by, Reviewed-by etc.
tools/testing/selftests/rseq/rseq-riscv-bits.h | 6 +++---
tools/testing/selftests/rseq/rseq-riscv.h | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/rseq/rseq-riscv-bits.h b/tools/testing/selftests/rseq/rseq-riscv-bits.h
index de31a0143139..f02f411d550d 100644
--- a/tools/testing/selftests/rseq/rseq-riscv-bits.h
+++ b/tools/testing/selftests/rseq/rseq-riscv-bits.h
@@ -243,7 +243,7 @@ int RSEQ_TEMPLATE_IDENTIFIER(rseq_offset_deref_addv)(intptr_t *ptr, off_t off, i
#ifdef RSEQ_COMPARE_TWICE
RSEQ_ASM_CMP_CPU_ID(cpu_id, current_cpu_id, "%l[error1]")
#endif
- RSEQ_ASM_OP_R_DEREF_ADDV(ptr, off, 3)
+ RSEQ_ASM_OP_R_DEREF_ADDV(ptr, off, inc, 3)
RSEQ_INJECT_ASM(4)
RSEQ_ASM_DEFINE_ABORT(4, abort)
: /* gcc asm goto does not allow outputs */
@@ -251,8 +251,8 @@ int RSEQ_TEMPLATE_IDENTIFIER(rseq_offset_deref_addv)(intptr_t *ptr, off_t off, i
[current_cpu_id] "m" (rseq_get_abi()->RSEQ_TEMPLATE_CPU_ID_FIELD),
[rseq_cs] "m" (rseq_get_abi()->rseq_cs.arch.ptr),
[ptr] "r" (ptr),
- [off] "er" (off),
- [inc] "er" (inc)
+ [off] "r" (off),
+ [inc] "r" (inc)
RSEQ_INJECT_INPUT
: "memory", RSEQ_ASM_TMP_REG_1
RSEQ_INJECT_CLOBBER
diff --git a/tools/testing/selftests/rseq/rseq-riscv.h b/tools/testing/selftests/rseq/rseq-riscv.h
index 37e598d0a365..67d544aaa9a3 100644
--- a/tools/testing/selftests/rseq/rseq-riscv.h
+++ b/tools/testing/selftests/rseq/rseq-riscv.h
@@ -158,7 +158,7 @@ do { \
"bnez " RSEQ_ASM_TMP_REG_1 ", 222b\n" \
"333:\n"
-#define RSEQ_ASM_OP_R_DEREF_ADDV(ptr, off, post_commit_label) \
+#define RSEQ_ASM_OP_R_DEREF_ADDV(ptr, off, inc, post_commit_label) \
"mv " RSEQ_ASM_TMP_REG_1 ", %[" __rseq_str(ptr) "]\n" \
RSEQ_ASM_OP_R_ADD(off) \
REG_L RSEQ_ASM_TMP_REG_1 ", 0(" RSEQ_ASM_TMP_REG_1 ")\n" \
--
2.47.0
Jeff Xu, I apologize for this churn: I was forced to drop your
Reviewed-by and Tested-by tags from 2 of the 3 mseal patches, because
the __NR_mseal fix is completely different now.
Changes since v1:
a) Reworked the mseal fix to use the kernel's in-tree unistd*.h files,
instead of hacking in a __NR_mseal definition directly. (Thanks to David
Hildenbrand for pointing out that this needed to be done.)
b) Fixed the subject line of the kvm and mdwe patch.
c) Reordered the patches so as to group the mseal changes together.
d) ADDED an additional patch, 6/6, to remove various __NR_xx items and
checks from the mm selftests.
Cover letter, updated for v2:
Eventually, once the build succeeds on a sufficiently old distro, the
idea is to delete $(KHDR_INCLUDES) from the selftests/mm build, and then
after that, from selftests/lib.mk and all of the other selftest builds.
For now, this series merely achieves a clean build of selftests/mm on a
not-so-old distro: Ubuntu 23.04. In other words, after this series is
applied, it is possible to delete $(KHDR_INCLUDES) from
selftests/mm/Makefile and the build will still succeed.
1. Add tools/uapi/asm/unistd_[32|x32|64].h files, which include
definitions of __NR_mseal, and include them (indirectly) from the files
that use __NR_mseal. The new files are copied from ./usr/include/asm,
which is how we have agreed to do this sort of thing, see [1].
2. Add fs.h, similarly created: it was copied directly from a snapshot
of ./usr/include/linux/fs.h after running "make headers".
3. Add a few selected prctl.h values that the ksm and mdwe tests require.
4. Factor out some common code from mseal_test.c and seal_elf.c, into a
new mseal_helpers.h file.
5. Remove local __NR_* definitions and checks.
[1] commit e076eaca5906 ("selftests: break the dependency upon local
header files")
John Hubbard (6):
selftests/mm: mseal, self_elf: fix missing __NR_mseal
selftests/mm: mseal, self_elf: factor out test macros and other
duplicated items
selftests/mm: mseal, self_elf: rename TEST_END_CHECK to
REPORT_TEST_PASS
selftests/mm: fix vm_util.c build failures: add snapshot of fs.h
selftests/mm: kvm, mdwe fixes to avoid requiring "make headers"
selftests/mm: remove local __NR_* definitions
tools/include/uapi/asm/unistd_32.h | 458 ++++++++++++++++++
tools/include/uapi/asm/unistd_64.h | 380 +++++++++++++++
tools/include/uapi/asm/unistd_x32.h | 369 ++++++++++++++
tools/include/uapi/linux/fs.h | 392 +++++++++++++++
tools/testing/selftests/mm/hugepage-mremap.c | 2 +-
.../selftests/mm/ksm_functional_tests.c | 8 +-
tools/testing/selftests/mm/mdwe_test.c | 1 +
tools/testing/selftests/mm/memfd_secret.c | 14 +-
tools/testing/selftests/mm/mkdirty.c | 8 +-
tools/testing/selftests/mm/mlock2.h | 1 +
tools/testing/selftests/mm/mrelease_test.c | 2 +-
tools/testing/selftests/mm/mseal_helpers.h | 41 ++
tools/testing/selftests/mm/mseal_test.c | 143 ++----
tools/testing/selftests/mm/pagemap_ioctl.c | 2 +-
tools/testing/selftests/mm/protection_keys.c | 2 +-
tools/testing/selftests/mm/seal_elf.c | 37 +-
tools/testing/selftests/mm/uffd-common.c | 4 -
tools/testing/selftests/mm/uffd-stress.c | 16 +-
tools/testing/selftests/mm/uffd-unit-tests.c | 14 +-
tools/testing/selftests/mm/vm_util.h | 15 +
20 files changed, 1717 insertions(+), 192 deletions(-)
create mode 100644 tools/include/uapi/asm/unistd_32.h
create mode 100644 tools/include/uapi/asm/unistd_64.h
create mode 100644 tools/include/uapi/asm/unistd_x32.h
create mode 100644 tools/include/uapi/linux/fs.h
create mode 100644 tools/testing/selftests/mm/mseal_helpers.h
base-commit: 2ccbdf43d5e758f8493a95252073cf9078a5fea5
--
2.45.2
Hey all,
We are making these changes as part of a KUnit Hackathon at LKCamp [1].
This patch sets out to refactor fs/unicode/utf8-selftest.c to KUnit tests.
The main benefit of this change is that we can leverage KUnit's
test suite for quickly compiling and testing the functions in utf8,
instead of compiling the kernel and loading the previous utf8-selftest
module, as well as adopting a pattern across all kernel tests.
The first commit is the refactoring itself from self test into KUnit,
which kept the original test logic intact -- maintaining the purpose
of the original tests -- with the added benefit of including these
tests into the KUnit test suite.
The second commit applies the naming style and file path conventions
defined on Documentation/dev-tools/kunit/style.rst
We appreciate any feedback and suggestions. :)
[1] https://lkcamp.dev/about/
Co-developed-by: Pedro Orlando <porlando(a)lkcamp.dev>
Signed-off-by: Pedro Orlando <porlando(a)lkcamp.dev>
Co-developed-by: Danilo Pereira <dpereira(a)lkcamp.dev>
Signed-off-by: Danilo Pereira <dpereira(a)lkcamp.dev>
Signed-off-by: Gabriela Bittencourt <gbittencourt(a)lkcamp.dev>
Gabriela Bittencourt (2):
unicode: kunit: refactor selftest to kunit tests
unicode: kunit: change tests filename and path
fs/unicode/Kconfig | 5 +-
fs/unicode/Makefile | 2 +-
fs/unicode/tests/.kunitconfig | 3 +
.../{utf8-selftest.c => tests/utf8_kunit.c} | 149 ++++++++----------
4 files changed, 76 insertions(+), 83 deletions(-)
create mode 100644 fs/unicode/tests/.kunitconfig
rename fs/unicode/{utf8-selftest.c => tests/utf8_kunit.c} (64%)
--
2.34.1
The kselftest-vDSO/vdso_test_abi test encounters failures when built
and executed using GCC-13 on armv7 architecture.
The issue has been observed on both TI X15 devices and QEMU-armv7.
Interestingly, the same test passes without any issues when built using
Clang-19.This failure is specific to GCC-13. but test case should have
been failed on clang toolchain but it did not.
The Clang toolchain likely defaults to the traditional 32-bit time_t on
32-bit Debian Trixie (except x86), leading to an ABI issue similar to
older compilers. This is affecting compatibility.
This is not a new regression,
this report generated on Linux next but also seen on Linux stable tree.
Test name: kselftest-vDSO/vdso_test_abi
Fails: gcc-13
pass: clang-19
Arnd Bergmann investigated and proposed a patch which fixes the problem.
Test regression: selftest-vDSO/vdso_test_abi:
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
# selftests: vDSO: vdso_test_abi
# TAP version 13
# 1..16
# # [vDSO kselftest] VDSO_VERSION: LINUX_2.6
# # The time is 184285885525887.-5268386858514840748
# ok 1 __vdso_gettimeofday
# # clock_id: CLOCK_REALTIME
# # The time is 184287230099877759.-1225334784
# ok 2 __vdso_clock_gettime CLOCK_REALTIME
# # The vdso resolution is 4294967296 -1225335584
# # The syscall resolution is 4294967296 -1225334784
# not ok 3 __vdso_clock_getres CLOCK_REALTIME
# # clock_id: CLOCK_BOOTTIME
# # The time is 3253049749439251215.-1225334784
# ok 4 __vdso_clock_gettime CLOCK_BOOTTIME
# # The vdso resolution is 4294967296 -1225335584
# # The syscall resolution is 4294967296 -1225334784
# not ok 5 __vdso_clock_getres CLOCK_BOOTTIME
# # clock_id: CLOCK_TAI
# # The time is 184549377723756415.-1225334784
# ok 6 __vdso_clock_gettime CLOCK_TAI
# # The vdso resolution is 4294967296 -1225335584
# # The syscall resolution is 4294967296 -1225334784
# not ok 7 __vdso_clock_getres CLOCK_TAI
# # clock_id: CLOCK_REALTIME_COARSE
# # The time is 128974845965660031.-1225334784
# ok 8 __vdso_clock_gettime CLOCK_REALTIME_COARSE
# # The vdso resolution is 42949672960000000 -1225335584
# # The syscall resolution is 42949672960000000 -1225334784
# not ok 9 __vdso_clock_getres CLOCK_REALTIME_COARSE
# # clock_id: CLOCK_MONOTONIC
# # The time is 3253311892768162575.-1225334784
# ok 10 __vdso_clock_gettime CLOCK_MONOTONIC
# # The vdso resolution is 4294967296 -1225335584
# # The syscall resolution is 4294967296 -1225334784
# not ok 11 __vdso_clock_getres CLOCK_MONOTONIC
# # clock_id: CLOCK_MONOTONIC_RAW
# # The time is 3253442966580101903.-1225334784
# ok 12 __vdso_clock_gettime CLOCK_MONOTONIC_RAW
# # The vdso resolution is 4294967296 -1225335584
# # The syscall resolution is 4294967296 -1225334784
# not ok 13 __vdso_clock_getres CLOCK_MONOTONIC_RAW
# # clock_id: CLOCK_MONOTONIC_COARSE
# # The time is 3197606291493094159.-1225334784
# ok 14 __vdso_clock_gettime CLOCK_MONOTONIC_COARSE
# # The vdso resolution is 42949672960000000 -1225335584
# # The syscall resolution is 42949672960000000 -1225334784
# not ok 15 __vdso_clock_getres CLOCK_MONOTONIC_COARSE
# # Couldn't find __vdso_time
# ok 16 # SKIP __vdso_time
# # 1 skipped test(s) detected. Consider enabling relevant config
options to improve coverage.
# # Totals: pass:8 fail:7 xfail:0 xpass:0 skip:1 error:0
not ok 3 selftests: vDSO: vdso_test_abi # exit=1
## Build
* test log: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20241126/te…
* TI x15 test log:
https://lkft.validation.linaro.org/scheduler/job/8117196#L7091
* build link: https://storage.tuxsuite.com/public/linaro/lkft/builds/2spp0E2mMNQq16RHsbzL…
## Source
* git tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
* git sha: df5d6180169ae06a2eac57e33b077ad6f6252440 (this not new issue)
* architectures: armv7
* toolchain: gcc-13 (13.3.0) - failed
* toolchain: clang ( version: 19.1.7) - pass
* devices: Armv7 TI beaglebone, qemu-armv7
--
Linaro LKFT
https://lkft.linaro.org
Greetings:
Welcome to v7. Added CONFIG_XDP_SOCKETS to the selftests/driver/net
kernel config in patch 3. Updated the xdp_helper.c to return -1 on
AF_XDP non-existence, and updated queues.py to skip the test if AF_XDP
does not exist (but fail in other failure cases). Tested on kernels with
and without CONFIG_XDP_SOCKETS enabled.
This is an attempt to followup on something Jakub asked me about [1],
adding an xsk attribute to queues and more clearly documenting which
queues are linked to NAPIs...
After the RFC [2], Jakub suggested creating an empty nest for queues
which have a pool, so I've adjusted this version to work that way.
The nest can be extended in the future to express attributes about XSK
as needed. Queues which are not used for AF_XDP do not have the xsk
attribute present.
I've run the included test on:
- my mlx5 machine (via NETIF=)
- without setting NETIF
And the test seems to pass in both cases.
Thanks,
Joe
[1]: https://lore.kernel.org/netdev/20250113143109.60afa59a@kernel.org/
[2]: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/
v7:
- Added CONFIG_XDP_SOCKETS to selftests/driver/net/config as suggested
by Stanislav.
- Updated xdp_helper.c to return -1 for AF_XDP non-existence, but 1
for other failures.
- Updated queues.py to mark test as skipped if AF_XDP does not exist.
v6: https://lore.kernel.org/bpf/20250210193903.16235-1-jdamato@fastly.com/
- Added ifdefs for CONFIG_XDP_SOCKETS in patch 2 as Stanislav
suggested.
v5: https://lore.kernel.org/bpf/20250208041248.111118-1-jdamato@fastly.com/
- Removed unused ret variable from patch 2 as Simon suggested.
v4: https://lore.kernel.org/lkml/20250207030916.32751-1-jdamato@fastly.com/
- Add patch 1, as suggested by Jakub, which adds an empty nest helper.
- Use the helper in patch 2, which makes the code cleaner and prevents
a possible bug.
v3: https://lore.kernel.org/netdev/20250204191108.161046-1-jdamato@fastly.com/
- Change comment format in patch 2 to avoid kdoc warnings. No other
changes.
v2: https://lore.kernel.org/all/20250203185828.19334-1-jdamato@fastly.com/
- Switched from RFC to actual submission now that net-next is open
- Adjusted patch 1 to include an empty nest as suggested by Jakub
- Adjusted patch 2 to update the test based on changes to patch 1, and
to incorporate some Python feedback from Jakub :)
rfc: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/
Joe Damato (3):
netlink: Add nla_put_empty_nest helper
netdev-genl: Add an XSK attribute to queues
selftests: drv-net: Test queue xsk attribute
Documentation/netlink/specs/netdev.yaml | 13 ++-
include/net/netlink.h | 15 +++
include/uapi/linux/netdev.h | 6 ++
net/core/netdev-genl.c | 12 +++
tools/include/uapi/linux/netdev.h | 6 ++
.../testing/selftests/drivers/net/.gitignore | 2 +
tools/testing/selftests/drivers/net/Makefile | 3 +
tools/testing/selftests/drivers/net/config | 1 +
tools/testing/selftests/drivers/net/queues.py | 42 +++++++-
.../selftests/drivers/net/xdp_helper.c | 98 +++++++++++++++++++
10 files changed, 194 insertions(+), 4 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/.gitignore
create mode 100644 tools/testing/selftests/drivers/net/xdp_helper.c
base-commit: 4e41231249f4083a095085ff86e317e29313c2c3
--
2.43.0
Greetings
Welcome to v6. Added ifdefs to patch 2 and built with CONFIG_XDP_SOCKETS
both enabled and disabled to confirm it builds cleanly. No other
changes.
This is an attempt to followup on something Jakub asked me about [1],
adding an xsk attribute to queues and more clearly documenting which
queues are linked to NAPIs...
After the RFC [2], Jakub suggested creating an empty nest for queues
which have a pool, so I've adjusted this version to work that way.
The nest can be extended in the future to express attributes about XSK
as needed. Queues which are not used for AF_XDP do not have the xsk
attribute present.
I've run the included test on:
- my mlx5 machine (via NETIF=)
- without setting NETIF
And the test seems to pass in both cases.
Thanks,
Joe
[1]: https://lore.kernel.org/netdev/20250113143109.60afa59a@kernel.org/
[2]: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/
v6:
- Added ifdefs for CONFIG_XDP_SOCKETS in patch 2 as Stanislav
suggested.
v5: https://lore.kernel.org/bpf/20250208041248.111118-1-jdamato@fastly.com/
- Removed unused ret variable from patch 2 as Simon suggested.
v4: https://lore.kernel.org/lkml/20250207030916.32751-1-jdamato@fastly.com/
- Add patch 1, as suggested by Jakub, which adds an empty nest helper.
- Use the helper in patch 2, which makes the code cleaner and prevents
a possible bug.
v3: https://lore.kernel.org/netdev/20250204191108.161046-1-jdamato@fastly.com/
- Change comment format in patch 2 to avoid kdoc warnings. No other
changes.
v2: https://lore.kernel.org/all/20250203185828.19334-1-jdamato@fastly.com/
- Switched from RFC to actual submission now that net-next is open
- Adjusted patch 1 to include an empty nest as suggested by Jakub
- Adjusted patch 2 to update the test based on changes to patch 1, and
to incorporate some Python feedback from Jakub :)
rfc: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/
Joe Damato (3):
netlink: Add nla_put_empty_nest helper
netdev-genl: Add an XSK attribute to queues
selftests: drv-net: Test queue xsk attribute
Documentation/netlink/specs/netdev.yaml | 13 ++-
include/net/netlink.h | 15 ++++
include/uapi/linux/netdev.h | 6 ++
net/core/netdev-genl.c | 12 +++
tools/include/uapi/linux/netdev.h | 6 ++
.../testing/selftests/drivers/net/.gitignore | 2 +
tools/testing/selftests/drivers/net/Makefile | 3 +
tools/testing/selftests/drivers/net/queues.py | 35 +++++++-
.../selftests/drivers/net/xdp_helper.c | 89 +++++++++++++++++++
9 files changed, 178 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/.gitignore
create mode 100644 tools/testing/selftests/drivers/net/xdp_helper.c
base-commit: 39f54262ba499d862420a97719d2f0eea0cbd394
--
2.43.0
Add basic support to run various MIPS variants via kunit_tool using the
virtualized malta platform.
Various kunit tests from drivers/firmware/cirrus/ are failing on MIPS.
They are fixed in [0].
[0] https://lore.kernel.org/lkml/20250211-cs_dsp-kunit-strings-v1-1-d9bc2035d15…
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Thomas Weißschuh (2):
MIPS: mm: Avoid blocking DMA zone with memory map memblock allocation
kunit: qemu_configs: Add MIPS configurations
arch/mips/mm/init.c | 2 ++
tools/testing/kunit/qemu_configs/mips.py | 18 ++++++++++++++++++
tools/testing/kunit/qemu_configs/mips64.py | 19 +++++++++++++++++++
tools/testing/kunit/qemu_configs/mips64el.py | 19 +++++++++++++++++++
tools/testing/kunit/qemu_configs/mipsel.py | 18 ++++++++++++++++++
5 files changed, 76 insertions(+)
---
base-commit: 6e24361511062dba8c5f7e59d51b29cdfa859523
change-id: 20241014-kunit-mips-e4fe1c265ed7
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Use POSIX-conformant expression operator symbol '='.
The use of the non POSIX-conformant symbol '==' would work
in bash, but not in sh where the unexpected operator error
would result in test_smoke.sh being skipped.
Instead of changing the shebang to use bash, which may not be
available on all systems, use the POSIX-conformant expression
symbol '=' to test for equality.
Without this patch:
===================
# make -j8 TARGETS=tpm2 kselftest
# selftests: tpm2: test_smoke.sh
# ./test_smoke.sh: 9: [: 2: unexpected operator
ok 1 selftests: tpm2: test_smoke.sh # SKIP
With this patch:
================
# make -j8 TARGETS=tpm2 kselftest
# selftests: tpm2: test_smoke.sh
# Ran 9 tests in 9.236s
ok 1 selftests: tpm2: test_smoke.sh
Signed-off-by: Ahmed Salem <x0rw3ll(a)gmail.com>
---
Changes in v4:
- Include relevant parts of the script showing
the issue before and after the patch
Changes in v3:
- Reword mistaken commit message
Changes in v2:
- Remove snippets pinpointing the issue
from commit message
tools/testing/selftests/tpm2/test_smoke.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/tpm2/test_smoke.sh b/tools/testing/selftests/tpm2/test_smoke.sh
index 168f4b166234..3a60e6c6f5c9 100755
--- a/tools/testing/selftests/tpm2/test_smoke.sh
+++ b/tools/testing/selftests/tpm2/test_smoke.sh
@@ -6,6 +6,6 @@ ksft_skip=4
[ -e /dev/tpm0 ] || exit $ksft_skip
read tpm_version < /sys/class/tpm/tpm0/tpm_version_major
-[ "$tpm_version" == 2 ] || exit $ksft_skip
+[ "$tpm_version" = 2 ] || exit $ksft_skip
python3 -m unittest -v tpm2_tests.SmokeTest 2>&1
base-commit: 34c26c5e1cb0cc7f54c1b89c6e9168cad8ffb580
--
2.47.2
This patch fixes a grammatical error in a test log message in
reuseaddr_ports_exhausted.c for better clarity as a part of lfx
application tasks
Signed-off-by: Pranav Tyagi <pranav.tyagi03(a)gmail.com>
---
tools/testing/selftests/net/reuseaddr_ports_exhausted.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/reuseaddr_ports_exhausted.c b/tools/testing/selftests/net/reuseaddr_ports_exhausted.c
index 066efd30e294..7b9bf8a7bbe1 100644
--- a/tools/testing/selftests/net/reuseaddr_ports_exhausted.c
+++ b/tools/testing/selftests/net/reuseaddr_ports_exhausted.c
@@ -112,7 +112,7 @@ TEST(reuseaddr_ports_exhausted_reusable_same_euid)
ASSERT_NE(-1, fd[0]) TH_LOG("failed to bind.");
if (opts->reuseport[0] && opts->reuseport[1]) {
- EXPECT_EQ(-1, fd[1]) TH_LOG("should fail to bind because both sockets succeed to be listened.");
+ EXPECT_EQ(-1, fd[1]) TH_LOG("should fail to bind because both sockets successfully listened.");
} else {
EXPECT_NE(-1, fd[1]) TH_LOG("should succeed to bind to connect to different destinations.");
}
--
2.47.1
From: Jeff Xu <jeffxu(a)chromium.org>
This change creates the initial version of memorysealing.c.
The introduction of memorysealing.c, which replaces mseal_test.c and
uses the kselftest_harness, aims to initiate a discussion on using the
selftest harness for memory sealing tests. Upon approval of this
approach, the migration of tests from mseal_test.c to memorysealing.c
can be implemented in a step-by-step manner.
This tests addresses following feedbacks from previous reviews:
1> Use kselftest_harness instead of custom macro, such as EXPECT_XX,
ASSERT_XX, etc. (Lorenzo Stoakes, Mark Brown, etc) [1]
2> Use MAP_FAILED to check the return of mmap (Lorenzo Stoakes).
3> Adding a check for vma size and prot bits. The discussion for
this can be found in [2] [3], here is a brief summary:
This is to follow up on Pedro’s in-loop change (from
can_modify_mm to can_modify_vma). When mseal_test is initially
created, they have a common pattern: setup memory layout,
seal the memory, perform a few mm-api steps, verify return code
(not zero). Because of the nature of out-of-loop, it is sufficient
to just verify the error code in a few cases.
With Pedro's in-loop change, the sealing check happens later in the
stack, thus there are more things and scenarios to verify. And there
were feedbacks to me that mseal_test should be extensive enough to
discover all regressions. Hence I'm adding check for vma size and prot
bits.
In this change: we created two fixtures:
Fixture basic: This creates a single VMA, the VMA has a
PROT_NONE page at each end to prevent auto-merging.
Fixture wo_vma: Two VMAs back to end, a PROT_NONE page at each
end to prevent auto-merging.
In addition, I add one test (mprotec) in each fixture for discussion.
[1] https://lore.kernel.org/all/20240830180237.1220027-5-jeffxu@chromium.org/
[2] https://lore.kernel.org/all/CABi2SkUgDZtJtRJe+J9UNdtZn=EQzZcbMB685P=1rR7DUh…
[3] https://lore.kernel.org/all/2qywbjb5ebtgwkh354w3lj3vhaothvubjokxq5fhyri5jee…
Jeff Xu (1):
selftest/mm: refactor mseal_test
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/memorysealing.c | 182 +++++++++++++++++++++
tools/testing/selftests/mm/memorysealing.h | 116 +++++++++++++
tools/testing/selftests/mm/mseal_test.c | 67 +-------
5 files changed, 301 insertions(+), 66 deletions(-)
create mode 100644 tools/testing/selftests/mm/memorysealing.c
create mode 100644 tools/testing/selftests/mm/memorysealing.h
--
2.47.1.613.gc27f4b7a9f-goog
It's useful to build samples/* with UML and the only blocker is the
artificial incompatibility with CONFIG_HEADERS_INSTALL.
Allow the headers_install target with ARCH=um, which then allow building
samples (and tests using them) with UML too:
printf 'CONFIG_SAMPLES=y\nCONFIG_HEADERS_INSTALL=y\nCONFIG_SAMPLE_LANDLOCK=y\n' >.config
make ARCH=um olddefconfig headers_install
make ARCH=um samples/landlock/
Cc: Anton Ivanov <anton.ivanov(a)cambridgegreys.com>
Cc: Johannes Berg <johannes(a)sipsolutions.net>
Cc: Masahiro Yamada <masahiroy(a)kernel.org>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Nicolas Schier <nicolas(a)fjasle.eu>
Cc: Richard Weinberger <richard(a)nod.at>
Fixes: 1b620d539ccc ("kbuild: disable header exports for UML in a straightforward way")
Signed-off-by: Mickaël Salaün <mic(a)digikod.net>
---
Makefile | 1 -
lib/Kconfig.debug | 1 -
2 files changed, 2 deletions(-)
diff --git a/Makefile b/Makefile
index e5b8a8832c0c..6e2cce16a2a3 100644
--- a/Makefile
+++ b/Makefile
@@ -1355,7 +1355,6 @@ hdr-inst := -f $(srctree)/scripts/Makefile.headersinst obj
PHONY += headers
headers: $(version_h) scripts_unifdef uapi-asm-generic archheaders archscripts
- $(if $(filter um, $(SRCARCH)), $(error Headers not exportable for UML))
$(Q)$(MAKE) $(hdr-inst)=include/uapi
$(Q)$(MAKE) $(hdr-inst)=arch/$(SRCARCH)/include/uapi
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index f3d723705879..fac1208f48e4 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -473,7 +473,6 @@ config READABLE_ASM
config HEADERS_INSTALL
bool "Install uapi headers to usr/include"
- depends on !UML
help
This option will install uapi headers (headers exported to user-space)
into the usr/include directory for use during the kernel build.
--
2.47.1
hile taking a look at '[PATCH net] pktgen: Avoid out-of-range in
get_imix_entries' ([1]) and '[PATCH net v2] pktgen: Avoid out-of-bounds access
in get_imix_entries' ([2], [3]) and doing some tests and code review I
detected that the /proc/net/pktgen/... parsing logic does not honour the
user given buffer bounds (resulting in out-of-bounds access).
This can be observed e.g. by the following simple test (sometimes the
old/'longer' previous value is re-read from the buffer):
$ echo add_device lo@0 > /proc/net/pktgen/kpktgend_0
$ echo "min_pkt_size 12345" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 12345 max_pkt_size: 0
Result: OK: min_pkt_size=12345
$ echo -n "min_pkt_size 123" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 12345 max_pkt_size: 0
Result: OK: min_pkt_size=12345
$ echo "min_pkt_size 123" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 123 max_pkt_size: 0
Result: OK: min_pkt_size=123
So fix the out-of-bounds access (and some minor findings) and add a simple
proc_net_pktgen selftest...
Regards,
Peter
Changes v3 -> v4:
- add rev-by Simon Horman
- new patch 'net: pktgen: use defines for the various dec/hex number parsing
digits lengths' (suggested by Simon Horman)
- replace C99 comment (suggested by Paolo Abeni)
- drop available characters check in strn_len() (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: align some variable declarations to the
most common pattern' (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: remove extra tmp variable (re-use len
instead)' (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: remove some superfluous variable
initializing' (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: fix mpls maximum labels list parsing'
(suggested by Paolo Abeni)
- factored out 'net: pktgen: hex32_arg/num_arg error out in case no
characters are available' (suggested by Paolo Abeni)
- factored out 'net: pktgen: num_arg error out in case no valid character
is parsed' (suggested by Paolo Abeni)
Changes v2 -> v3:
- new patch: 'net: pktgen: fix ctrl interface command parsing'
- new patch: 'net: pktgen: fix mpls reset parsing'
- tools/testing/selftests/net/proc_net_pktgen.c:
- fix typo in change description ('v1 -> v1' and tyop)
- rename some vars to better match usage
add_loopback_0 -> thr_cmd_add_loopback_0
rm_loopback_0 -> thr_cmd_rm_loopback_0
wrong_ctrl_cmd -> wrong_thr_cmd
legacy_ctrl_cmd -> legacy_thr_cmd
ctrl_fd -> thr_fd
- add ctrl interface tests
Changes v1 -> v2:
- new patch: 'net: pktgen: fix hex32_arg parsing for short reads'
- new patch: 'net: pktgen: fix 'rate 0' error handling (return -EINVAL)'
- new patch: 'net: pktgen: fix 'ratep 0' error handling (return -EINVAL)'
- net/core/pktgen.c: additional fix get_imix_entries() and get_labels()
- tools/testing/selftests/net/proc_net_pktgen.c:
- fix tyop not vs. nod (suggested by Jakub Kicinski)
- fix misaligned line (suggested by Jakub Kicinski)
- enable fomerly commented out CONFIG_XFRM dependent test (command spi),
as CONFIG_XFRM is enabled via tools/testing/selftests/net/config
CONFIG_XFRM_INTERFACE/CONFIG_XFRM_USER (suggestex by Jakub Kicinski)
- add CONFIG_NET_PKTGEN=m to tools/testing/selftests/net/config
(suggested by Jakub Kicinski)
- add modprobe pktgen to FIXTURE_SETUP() (suggested by Jakub Kicinski)
- fix some checkpatch warnings (Missing a blank line after declarations)
- shrink line length by re-naming some variables (command -> cmd,
device -> dev)
- add 'rate 0' testcase
- add 'ratep 0' testcase
[1] https://lore.kernel.org/netdev/20241006221221.3744995-1-artem.chernyshev@re…
[2] https://lore.kernel.org/netdev/20250109083039.14004-1-pchelkin@ispras.ru/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
Peter Seiderer (17):
net: pktgen: replace ENOTSUPP with EOPNOTSUPP
net: pktgen: enable 'param=value' parsing
net: pktgen: fix hex32_arg parsing for short reads
net: pktgen: fix 'rate 0' error handling (return -EINVAL)
net: pktgen: fix 'ratep 0' error handling (return -EINVAL)
net: pktgen: fix ctrl interface command parsing
net: pktgen: fix access outside of user given buffer in
pktgen_thread_write()
net: pktgen: use defines for the various dec/hex number parsing digits
lengths
net: pktgen: align some variable declarations to the most common
pattern
net: pktgen: remove extra tmp variable (re-use len instead)
net: pktgen: remove some superfluous variable initializing
net: pktgen: fix mpls maximum labels list parsing
net: pktgen: fix access outside of user given buffer in
pktgen_if_write()
net: pktgen: hex32_arg/num_arg error out in case no characters are
available
net: pktgen: num_arg error out in case no valid character is parsed
net: pktgen: fix mpls reset parsing
selftest: net: add proc_net_pktgen
net/core/pktgen.c | 268 +++++---
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/config | 1 +
tools/testing/selftests/net/proc_net_pktgen.c | 650 ++++++++++++++++++
4 files changed, 828 insertions(+), 92 deletions(-)
create mode 100644 tools/testing/selftests/net/proc_net_pktgen.c
--
2.48.1
We have separate Env classes for local tests and tests with a remote
endpoint. Make it easier to share the code by creating a base class.
Make env loading a method of this class.
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
.../selftests/drivers/net/lib/py/env.py | 59 +++++++++++--------
1 file changed, 33 insertions(+), 26 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/lib/py/env.py b/tools/testing/selftests/drivers/net/lib/py/env.py
index 987e452d3a45..2f17880e411d 100644
--- a/tools/testing/selftests/drivers/net/lib/py/env.py
+++ b/tools/testing/selftests/drivers/net/lib/py/env.py
@@ -10,38 +10,46 @@ from lib.py import NetNS, NetdevSimDev
from .remote import Remote
-def _load_env_file(src_path):
- env = os.environ.copy()
+class NetDrvEnvBase:
+ """
+ Base class for a NIC / host envirnoments
+ """
+ def __init__(self, src_path):
+ self.src_path = src_path
+ self.env = self._load_env_file()
- src_dir = Path(src_path).parent.resolve()
- if not (src_dir / "net.config").exists():
+ def _load_env_file(self):
+ env = os.environ.copy()
+
+ src_dir = Path(self.src_path).parent.resolve()
+ if not (src_dir / "net.config").exists():
+ return ksft_setup(env)
+
+ with open((src_dir / "net.config").as_posix(), 'r') as fp:
+ for line in fp.readlines():
+ full_file = line
+ # Strip comments
+ pos = line.find("#")
+ if pos >= 0:
+ line = line[:pos]
+ line = line.strip()
+ if not line:
+ continue
+ pair = line.split('=', maxsplit=1)
+ if len(pair) != 2:
+ raise Exception("Can't parse configuration line:", full_file)
+ env[pair[0]] = pair[1]
return ksft_setup(env)
- with open((src_dir / "net.config").as_posix(), 'r') as fp:
- for line in fp.readlines():
- full_file = line
- # Strip comments
- pos = line.find("#")
- if pos >= 0:
- line = line[:pos]
- line = line.strip()
- if not line:
- continue
- pair = line.split('=', maxsplit=1)
- if len(pair) != 2:
- raise Exception("Can't parse configuration line:", full_file)
- env[pair[0]] = pair[1]
- return ksft_setup(env)
-
-class NetDrvEnv:
+class NetDrvEnv(NetDrvEnvBase):
"""
Class for a single NIC / host env, with no remote end
"""
def __init__(self, src_path, **kwargs):
- self._ns = None
+ super().__init__(src_path)
- self.env = _load_env_file(src_path)
+ self._ns = None
if 'NETIF' in self.env:
self.dev = ip("link show dev " + self.env['NETIF'], json=True)[0]
@@ -68,7 +76,7 @@ from .remote import Remote
self._ns = None
-class NetDrvEpEnv:
+class NetDrvEpEnv(NetDrvEnvBase):
"""
Class for an environment with a local device and "remote endpoint"
which can be used to send traffic in.
@@ -82,8 +90,7 @@ from .remote import Remote
nsim_v6_pfx = "2001:db8::"
def __init__(self, src_path, nsim_test=None):
-
- self.env = _load_env_file(src_path)
+ super().__init__(src_path)
self._stats_settle_time = None
--
2.48.1
When I implemented virtio's hash-related features to tun/tap [1],
I found tun/tap does not fill the entire region reserved for the virtio
header, leaving some uninitialized hole in the middle of the buffer
after read()/recvmesg().
This series fills the uninitialized hole. More concretely, the
num_buffers field will be initialized with 1, and the other fields will
be inialized with 0. Setting the num_buffers field to 1 is mandated by
virtio 1.0 [2].
The change to virtio header is preceded by another change that refactors
tun and tap to unify their virtio-related code.
[1]: https://lore.kernel.org/r/20241008-rss-v5-0-f3cf68df005d@daynix.com
[2]: https://lore.kernel.org/r/20241227084256-mutt-send-email-mst@kernel.org/
Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com>
---
Changes in v6:
- Added an intermediate variable in tun_vnet_legacy_is_little_endian()
to reduce a complexity of an expression.
- Noted that functions are renamed in the message of patch
"tun: Extract the vnet handling code".
- Used clamp() in patch "tap: Keep hdr_len in tap_get_user()".
- Link to v5: https://lore.kernel.org/r/20250205-tun-v5-0-15d0b32e87fa@daynix.com
Changes in v5:
- s/vnet_hdr_len_sz/vnet_hdr_sz/ for patch "tun: Decouple vnet handling"
(Willem de Bruijn)
- Changed to inline vnet implementations to TUN and TAP.
- Dropped patch "tun: Avoid double-tracking iov_iter length changes" and
"tap: Avoid double-tracking iov_iter length changes".
- Link to v4: https://lore.kernel.org/r/20250120-tun-v4-0-ee81dda03d7f@daynix.com
Changes in v4:
- s/sz/vnet_hdr_len_sz/ for patch "tun: Decouple vnet handling"
(Willem de Bruijn)
- Reverted to add CONFIG_TUN_VNET.
- Link to v3: https://lore.kernel.org/r/20250116-tun-v3-0-c6b2871e97f7@daynix.com
Changes in v3:
- Dropped changes to fill the vnet header.
- Splitted patch "tun: Unify vnet implementation".
- Reverted spurious changes in patch "tun: Unify vnet implementation".
- Merged tun_vnet.c into TAP.
- Link to v2: https://lore.kernel.org/r/20250109-tun-v2-0-388d7d5a287a@daynix.com
Changes in v2:
- Fixed num_buffers endian.
- Link to v1: https://lore.kernel.org/r/20250108-tun-v1-0-67d784b34374@daynix.com
---
Akihiko Odaki (7):
tun: Refactor CONFIG_TUN_VNET_CROSS_LE
tun: Keep hdr_len in tun_get_user()
tun: Decouple vnet from tun_struct
tun: Decouple vnet handling
tun: Extract the vnet handling code
tap: Keep hdr_len in tap_get_user()
tap: Use tun's vnet-related code
MAINTAINERS | 2 +-
drivers/net/tap.c | 166 +++++-------------------------------------
drivers/net/tun.c | 193 ++++++-------------------------------------------
drivers/net/tun_vnet.h | 185 +++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 229 insertions(+), 317 deletions(-)
---
base-commit: a32e14f8aef69b42826cf0998b068a43d486a9e9
change-id: 20241230-tun-66e10a49b0c7
Best regards,
--
Akihiko Odaki <akihiko.odaki(a)daynix.com>
test_smoke.sh:9: expression uses non POSIX-conformant operator
symbol '==', resulting in skipping the remaining tests.
This patch ensures the use of POSIX-conformant operator symbol '='.
w/o this patch for test_smoke.sh
================================
$ sudo make -j8 TARGETS=tpm2 kselftest
make[3]: Nothing to be done for 'all'.
TAP version 13
1..3
# timeout set to 600
# selftests: tpm2: test_smoke.sh
# ./test_smoke.sh: 9: [: 2: unexpected operator
ok 1 selftests: tpm2: test_smoke.sh # SKIP
# timeout set to 600
# selftests: tpm2: test_space.sh
# test_flush_context (tpm2_tests.SpaceTest.test_flush_context) ... ok
# test_get_handles (tpm2_tests.SpaceTest.test_get_handles) ... ok
# test_invalid_cc (tpm2_tests.SpaceTest.test_invalid_cc) ... ok
# test_make_two_spaces (tpm2_tests.SpaceTest.test_make_two_spaces) ... ok
#
# ----------------------------------------------------------------------
# Ran 4 tests in 8.647s
#
# OK
ok 2 selftests: tpm2: test_space.sh
# timeout set to 600
# selftests: tpm2: test_async.sh
# test_async (tpm2_tests.AsyncTest.test_async) ... ok
# test_flush_invalid_context (tpm2_tests.AsyncTest.test_flush_invalid_context) ... ok
#
# ----------------------------------------------------------------------
# Ran 2 tests in 0.005s
#
# OK
ok 3 selftests: tpm2: test_async.sh
with this patch for test_smoke.sh
=================================
$ sudo make -j8 TARGETS=tpm2 kselftest
make[3]: Nothing to be done for 'all'.
TAP version 13
1..3
# timeout set to 600
# selftests: tpm2: test_smoke.sh
# test_read_partial_overwrite (tpm2_tests.SmokeTest.test_read_partial_overwrite) ... ok
# test_read_partial_resp (tpm2_tests.SmokeTest.test_read_partial_resp) ... ok
# test_seal_with_auth (tpm2_tests.SmokeTest.test_seal_with_auth) ... ok
# test_seal_with_policy (tpm2_tests.SmokeTest.test_seal_with_policy) ... ok
# test_seal_with_too_long_auth (tpm2_tests.SmokeTest.test_seal_with_too_long_auth) ... ok
# test_send_two_cmds (tpm2_tests.SmokeTest.test_send_two_cmds) ... ok
# test_too_short_cmd (tpm2_tests.SmokeTest.test_too_short_cmd) ... ok
# test_unseal_with_wrong_auth (tpm2_tests.SmokeTest.test_unseal_with_wrong_auth) ... ok
# test_unseal_with_wrong_policy (tpm2_tests.SmokeTest.test_unseal_with_wrong_policy) ... ok
#
# ----------------------------------------------------------------------
# Ran 9 tests in 9.278s
#
# OK
ok 1 selftests: tpm2: test_smoke.sh
# timeout set to 600
# selftests: tpm2: test_space.sh
# test_flush_context (tpm2_tests.SpaceTest.test_flush_context) ... ok
# test_get_handles (tpm2_tests.SpaceTest.test_get_handles) ... ok
# test_invalid_cc (tpm2_tests.SpaceTest.test_invalid_cc) ... ok
# test_make_two_spaces (tpm2_tests.SpaceTest.test_make_two_spaces) ... ok
#
# ----------------------------------------------------------------------
# Ran 4 tests in 8.701s
#
# OK
ok 2 selftests: tpm2: test_space.sh
# timeout set to 600
# selftests: tpm2: test_async.sh
# test_async (tpm2_tests.AsyncTest.test_async) ... ok
# test_flush_invalid_context (tpm2_tests.AsyncTest.test_flush_invalid_context) ... ok
#
# ----------------------------------------------------------------------
# Ran 2 tests in 0.004s
#
# OK
ok 3 selftests: tpm2: test_async.sh
Signed-off-by: Ahmed Salem <x0rw3ll(a)gmail.com>
---
tools/testing/selftests/tpm2/test_smoke.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/tpm2/test_smoke.sh b/tools/testing/selftests/tpm2/test_smoke.sh
index 168f4b166234..3a60e6c6f5c9 100755
--- a/tools/testing/selftests/tpm2/test_smoke.sh
+++ b/tools/testing/selftests/tpm2/test_smoke.sh
@@ -6,6 +6,6 @@ ksft_skip=4
[ -e /dev/tpm0 ] || exit $ksft_skip
read tpm_version < /sys/class/tpm/tpm0/tpm_version_major
-[ "$tpm_version" == 2 ] || exit $ksft_skip
+[ "$tpm_version" = 2 ] || exit $ksft_skip
python3 -m unittest -v tpm2_tests.SmokeTest 2>&1
--
2.47.2
The current implementation of netconsole sends all log messages in
parallel, which can lead to an intermixed and interleaved output on the
receiving side. This makes it challenging to demultiplex the messages
and attribute them to their originating CPUs.
As a result, users and developers often struggle to effectively analyze
and debug the parallel log output received through netconsole.
Example of a message got from produciton hosts:
------------[ cut here ]------------
------------[ cut here ]------------
refcount_t: saturated; leaking memory.
WARNING: CPU: 2 PID: 1613668 at lib/refcount.c:22 refcount_warn_saturate+0x5e/0xe0
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 26 PID: 4139916 at lib/refcount.c:25 refcount_warn_saturate+0x7d/0xe0
Modules linked in: bpf_preload(E) vhost_net(E) tun(E) vhost(E)
This series of patches introduces a new feature to the netconsole
subsystem that allows the automatic population of the CPU number in the
userdata field for each log message. This enhancement provides several
benefits:
* Improved demultiplexing of parallel log output: When multiple CPUs are
sending messages concurrently, the added CPU number in the userdata
makes it easier to differentiate and attribute the messages to their
originating CPUs.
* Better visibility into message sources: The CPU number information
gives users and developers more insight into which specific CPU a
particular log message came from, which can be valuable for debugging
and analysis.
The changes in this series are as follows Patches::
Patch "consolidate send buffers into netconsole_target struct"
=================================================
Move the static buffers to netconsole target, from static declaration
in send_msg_no_fragmentation() and send_msg_fragmented().
Patch "netconsole: Rename userdata to extradata"
=================================================
Create the a concept of extradata, which encompasses the concept of
userdata and the upcoming sysdatao
Sysdata is a new concept being added, which is basically fields that are
populated by the kernel. At this time only the CPU#, but, there is a
desire to add current task name, kernel release version, etc.
Patch "netconsole: Helper to count number of used entries"
===========================================================
Create a simple helper to count number of entries in extradata. I am
separating this in a function since it will need to count userdata and
sysdata. For instance, when the user adds an extra userdata, we need to
check if there is space, counting the previous data entries (from
userdata and cpu data)
Patch "Introduce configfs helpers for sysdata features"
======================================================
Create the concept of sysdata feature in the netconsole target, and
create the configfs helpers to enable the bit in nt->sysdata
Patch "Include sysdata in extradata entry count"
================================================
Add the concept of sysdata when counting for available space in the
buffer. This will protect users from creating new userdata/sysdata if
there is no more space
Patch "netconsole: add support for sysdata and CPU population"
===============================================================
This is the core patch. Basically add a new option to enable automatic
CPU number population in the netconsole userdata Provides a new "cpu_nr"
sysfs attribute to control this feature
Patch "netconsole: selftest: test CPU number auto-population"
=============================================================
Expands the existing netconsole selftest to verify the CPU number
auto-population functionality Ensures the received netconsole messages
contain the expected "cpu=<CPU>" entry in the message. Test different
permutation with userdata
Patch "netconsole: docs: Add documentation for CPU number auto-population"
=============================================================================
Updates the netconsole documentation to explain the new CPU number
auto-population feature Provides instructions on how to enable and use
the feature
I believe these changes will be a valuable addition to the netconsole
subsystem, enhancing its usefulness for kernel developers and users.
PS: This patchset is on top of the patch that created
netcons_fragmented_msg selftest:
https://lore.kernel.org/all/20250203-netcons_frag_msgs-v1-1-5bc6bedf2ac0@de…
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes in v5:
- Fixed a kernel doc syntax syntax (Simon)
- Link to v4: https://lore.kernel.org/r/20250204-netcon_cpu-v4-0-9480266ef556@debian.org
Changes in v4:
- Fixed Kernel doc for netconsole_target (Simon)
- Fixed a typo in disable_sysdata_feature (Simon)
- Improved sysdata_cpu_nr_show() to return !! in a bit-wise operation
- Link to v3: https://lore.kernel.org/r/20250124-netcon_cpu-v3-0-12a0d286ba1d@debian.org
Changes in v3:
- Moved the buffer into netconsole_target, avoiding static functions in
the send path (Jakub).
- Fix a documentation error (Randy Dunlap)
- Created a function that handle all the extradata, consolidating it in
a single place (Jakub)
- Split the patch even more, trying to simplify the review.
- Link to v2: https://lore.kernel.org/r/20250115-netcon_cpu-v2-0-95971b44dc56@debian.org
Changes in v2:
- Create the concept of extradata and sysdata. This will make the design
easier to understand, and the code easier to read.
* Basically extradata encompasses userdata and the new sysdata.
Userdata originates from user, and sysdata originates in kernel.
- Improved the test to send from a very specific CPU, which can be
checked to be correct on the other side, as suggested by Jakub.
- Fixed a bug where CPU # was populated at the wrong place
- Link to v1: https://lore.kernel.org/r/20241113-netcon_cpu-v1-0-d187bf7c0321@debian.org
---
Breno Leitao (8):
netconsole: consolidate send buffers into netconsole_target struct
netconsole: Rename userdata to extradata
netconsole: Helper to count number of used entries
netconsole: Introduce configfs helpers for sysdata features
netconsole: Include sysdata in extradata entry count
netconsole: add support for sysdata and CPU population
netconsole: selftest: test for sysdata CPU
netconsole: docs: Add documentation for CPU number auto-population
Documentation/networking/netconsole.rst | 45 ++++
drivers/net/netconsole.c | 261 ++++++++++++++++-----
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 17 ++
.../selftests/drivers/net/netcons_sysdata.sh | 167 +++++++++++++
5 files changed, 427 insertions(+), 64 deletions(-)
---
base-commit: 09717c28b76c30b1dc8c261c855ffb2406abab2e
change-id: 20241108-netcon_cpu-ce3917e88f4b
Best regards,
--
Breno Leitao <leitao(a)debian.org>
This series introduces support in the ARM PMUv3 driver for
partitioning PMU counters into two separate ranges by taking advantage
of the MDCR_EL2.HPMN register field.
The advantage of a partitioned PMU would be to allow KVM guests direct
access to a subset of PMU functionality, greatly reducing the overhead
of performance monitoring in guests.
While this feature could be accepted on its own merits, practically
there is a lot more to be done before it will be fully useful, so I'm
sending as an RFC for now.
v2:
* Rebase to v6.14-rc1, and take into account Oliver's debug cleanup
changing some KVM handling of MDCR_EL2.
* Reorder patches to put related things closer together for easier
reading.
* Ensure driver compiles for 32-bit ARM by defining 32-bit access to
MDCR_EL2 correctly, which is called HDCR there. These will not be
called due to the below requirement for VHE mode but a functionally
correct definition seemed preferable to making it a no op.
* Rename parameter to reserved_host_counters to better reflect
underlying hardware semantics and ensure host counters are easier to
preserve.
* Document parameter with MODULE_PARM_DESC
* Restrict partitioning to VHE mode only. Using HPMN restricts the
host counters to EL2 access only. Trying to make this work for the
driver running at EL1 would require hypercalls for every access of
those registers and a lot of additional handling at the KVM level.
This would defeat the whole purpose of having the feature at
all. Define has_vhe() on 32-bit arm to always return false.
* While Rob asked about making the parameter configurable at run time
rather than boot time through sysfs, that is potentially very tricky
because it is only safe to reprogram HPMN when we are certain no
one, host or guest, is using any counters that might change
ownership. If this is determined to be worthwhile, it should come
after getting the core infrastructure right.
* While Marc asked about reusing the sysreg masks instead of
redefining them for MDCR_EL2, that doesn't work because sysreg masks
are only defined for arm64. I'm not sure how to avoid the
duplication.
* Wrap code for determining if a counter index belongs to the host or
guest partition in functions
* Typo fixes and additional testing
v1:
https://lore.kernel.org/kvm/20250127222031.3078945-1-coltonlewis@google.com/
Colton Lewis (4):
perf: arm_pmuv3: Generalize counter bitmasks
perf: arm_pmuv3: Introduce module param to partition the PMU
perf: arm_pmuv3: Keep out of guest counter partition
KVM: arm64: Make guests see only counters they can access
arch/arm/include/asm/arm_pmuv3.h | 13 ++
arch/arm64/include/asm/arm_pmuv3.h | 10 ++
arch/arm64/kvm/debug.c | 3 +-
arch/arm64/kvm/pmu-emul.c | 8 +-
drivers/perf/arm_pmuv3.c | 117 ++++++++++++++++--
include/linux/perf/arm_pmu.h | 2 +
include/linux/perf/arm_pmuv3.h | 34 ++++-
.../selftests/kvm/arm64/vpmu_counter_access.c | 2 +-
8 files changed, 168 insertions(+), 21 deletions(-)
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
--
2.48.1.502.g6dc24dfdaf-goog
The TX path had been dropped from the Device Memory TCP patch series
post RFCv1 [1], to make that series slightly easier to review. This
series rebases the implementation of the TX path on top of the
net_iov/netmem framework agreed upon and merged. The motivation for
the feature is thoroughly described in the docs & cover letter of the
original proposal, so I don't repeat the lengthy descriptions here, but
they are available in [1].
Sending this series as RFC as the winder closure is immenient. I plan on
reposting as non-RFC once the tree re-opens, addressing any feedback
I receive in the meantime.
Full outline on usage of the TX path is detailed in the documentation
added in the first patch.
Test example is available via the kselftest included in the series as well.
The series is relatively small, as the TX path for this feature largely
piggybacks on the existing MSG_ZEROCOPY implementation.
Patch Overview:
---------------
1. Documentation & tests to give high level overview of the feature
being added.
2. Add netmem refcounting needed for the TX path.
3. Devmem TX netlink API.
4. Devmem TX net stack implementation.
Testing:
--------
Testing is very similar to devmem TCP RX path. The ncdevmem test used
for the RX path is now augemented with client functionality to test TX
path.
* Test Setup:
Kernel: net-next with this RFC and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Performance results are not included with this version, unfortunately.
I'm having issues running the dma-buf exporter driver against the
upstream kernel on my test setup. The issues are specific to that
dma-buf exporter and do not affect this patch series. I plan to follow
up this series with perf fixes if the tests point to issues once they're
up and running.
Special thanks to Stan who took a stab at rebasing the TX implementation
on top of the netmem/net_iov framework merged. Parts of his proposal [2]
that are reused as-is are forked off into their own patches to give full
credit.
[1] https://lore.kernel.org/netdev/20240909054318.1809580-1-almasrymina@google.…
[2] https://lore.kernel.org/netdev/20240913150913.1280238-2-sdf@fomichev.me/T/#…
Cc: sdf(a)fomichev.me
Cc: asml.silence(a)gmail.com
Cc: dw(a)davidwei.uk
Mina Almasry (4):
net: add devmem TCP TX documentation
selftests: ncdevmem: Implement devmem TCP TX
net: add get_netmem/put_netmem support
net: devmem: Implement TX path
Stanislav Fomichev (1):
net: devmem TCP tx netlink api
Documentation/netlink/specs/netdev.yaml | 12 +
Documentation/networking/devmem.rst | 140 +++++++++-
include/linux/skbuff.h | 13 +-
include/linux/skbuff_ref.h | 4 +-
include/net/netmem.h | 3 +
include/net/sock.h | 2 +
include/uapi/linux/netdev.h | 1 +
include/uapi/linux/uio.h | 5 +
net/core/datagram.c | 40 ++-
net/core/devmem.c | 101 ++++++-
net/core/devmem.h | 51 +++-
net/core/netdev-genl-gen.c | 13 +
net/core/netdev-genl-gen.h | 1 +
net/core/netdev-genl.c | 67 ++++-
net/core/skbuff.c | 38 ++-
net/core/sock.c | 9 +
net/ipv4/tcp.c | 36 ++-
net/vmw_vsock/virtio_transport_common.c | 4 +-
tools/include/uapi/linux/netdev.h | 1 +
.../selftests/drivers/net/hw/ncdevmem.c | 261 +++++++++++++++++-
20 files changed, 764 insertions(+), 38 deletions(-)
--
2.47.1.613.gc27f4b7a9f-goog
Basics and overview
===================
Software with larger attack surfaces (e.g. network facing apps like databases,
browsers or apps relying on browser runtimes) suffer from memory corruption
issues which can be utilized by attackers to bend control flow of the program
to eventually gain control (by making their payload executable). Attackers are
able to perform such attacks by leveraging call-sites which rely on indirect
calls or return sites which rely on obtaining return address from stack memory.
To mitigate such attacks, risc-v extension zicfilp enforces that all indirect
calls must land on a landing pad instruction `lpad` else cpu will raise software
check exception (a new cpu exception cause code on riscv).
Similarly for return flow, risc-v extension zicfiss extends architecture with
- `sspush` instruction to push return address on a shadow stack
- `sspopchk` instruction to pop return address from shadow stack
and compare with input operand (i.e. return address on stack)
- `sspopchk` to raise software check exception if comparision above
was a mismatch
- Protection mechanism using which shadow stack is not writeable via
regular store instructions
More information an details can be found at extensions github repo [1].
Equivalent to landing pad (zicfilp) on x86 is `ENDBRANCH` instruction in Intel
CET [3] and branch target identification (BTI) [4] on arm.
Similarly x86's Intel CET has shadow stack [5] and arm64 has guarded control
stack (GCS) [6] which are very similar to risc-v's zicfiss shadow stack.
x86 and arm64 support for user mode shadow stack is already in mainline.
Kernel awareness for user control flow integrity
================================================
This series picks up Samuel Holland's envcfg changes [2] as well. So if those are
being applied independently, they should be removed from this series.
Enabling:
In order to maintain compatibility and not break anything in user mode, kernel
doesn't enable control flow integrity cpu extensions on binary by default.
Instead exposes a prctl interface to enable, disable and lock the shadow stack
or landing pad feature for a task. This allows userspace (loader) to enumerate
if all objects in its address space are compiled with shadow stack and landing
pad support and accordingly enable the feature. Additionally if a subsequent
`dlopen` happens on a library, user mode can take a decision again to disable
the feature (if incoming library is not compiled with support) OR terminate the
task (if user mode policy is strict to have all objects in address space to be
compiled with control flow integirty cpu feature). prctl to enable shadow stack
results in allocating shadow stack from virtual memory and activating for user
address space. x86 and arm64 are also following same direction due to similar
reason(s).
clone/fork:
On clone and fork, cfi state for task is inherited by child. Shadow stack is
part of virtual memory and is a writeable memory from kernel perspective
(writeable via a restricted set of instructions aka shadow stack instructions)
Thus kernel changes ensure that this memory is converted into read-only when
fork/clone happens and COWed when fault is taken due to sspush, sspopchk or
ssamoswap. In case `CLONE_VM` is specified and shadow stack is to be enabled,
kernel will automatically allocate a shadow stack for that clone call.
map_shadow_stack:
x86 introduced `map_shadow_stack` system call to allow user space to explicitly
map shadow stack memory in its address space. It is useful to allocate shadow
for different contexts managed by a single thread (green threads or contexts)
risc-v implements this system call as well.
signal management:
If shadow stack is enabled for a task, kernel performs an asynchronous control
flow diversion to deliver the signal and eventually expects userspace to issue
sigreturn so that original execution can be resumed. Even though resume context
is prepared by kernel, it is in user space memory and is subject to memory
corruption and corruption bugs can be utilized by attacker in this race window
to perform arbitrary sigreturn and eventually bypass cfi mechanism.
Another issue is how to ensure that cfi related state on sigcontext area is not
trampled by legacy apps or apps compiled with old kernel headers.
In order to mitigate control-flow hijacting, kernel prepares a token and place
it on shadow stack before signal delivery and places address of token in
sigcontext structure. During sigreturn, kernel obtains address of token from
sigcontext struture, reads token from shadow stack and validates it and only
then allow sigreturn to succeed. Compatiblity issue is solved by adopting
dynamic sigcontext management introduced for vector extension. This series
re-factor the code little bit to allow future sigcontext management easy (as
proposed by Andy Chiu from SiFive)
config and compilation:
Introduce a new risc-v config option `CONFIG_RISCV_USER_CFI`. Selecting this
config option picks the kernel support for user control flow integrity. This
optin is presented only if toolchain has shadow stack and landing pad support.
And is on purpose guarded by toolchain support. Reason being that eventually
vDSO also needs to be compiled in with shadow stack and landing pad support.
vDSO compile patches are not included as of now because landing pad labeling
scheme is yet to settle for usermode runtime.
To get more information on kernel interactions with respect to
zicfilp and zicfiss, patch series adds documentation for
`zicfilp` and `zicfiss` in following:
Documentation/arch/riscv/zicfiss.rst
Documentation/arch/riscv/zicfilp.rst
How to test this series
=======================
Toolchain
---------
$ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev
$ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static"
$ make -j$(nproc)
Qemu
----
Get the lastest qemu
$ cd qemu
$ mkdir build
$ cd build
$ ../configure --target-list=riscv64-softmmu
$ make -j$(nproc)
Opensbi
-------
$ git clone git@github.com:deepak0414/opensbi.git -b v6_cfi_spec_split_opensbi
$ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic
Linux
-----
Running defconfig is fine. CFI is enabled by default if the toolchain
supports it.
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc)
In case you're building your own rootfs using toolchain, please make sure you
pick following patch to ensure that vDSO compiled with lpad and shadow stack.
"arch/riscv: compile vdso with landing pad"
Branch where above patch can be picked
https://github.com/deepak0414/linux-riscv-cfi/tree/vdso_user_cfi_v6.12-rc1
Running
-------
Modify your qemu command to have:
-bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin
-cpu rv64,zicfilp=true,zicfiss=true,zimop=true,zcmop=true
vDSO related Opens (in the flux)
=================================
I am listing these opens for laying out plan and what to expect in future
patch sets. And of course for the sake of discussion.
Shadow stack and landing pad enabling in vDSO
----------------------------------------------
vDSO must have shadow stack and landing pad support compiled in for task
to have shadow stack and landing pad support. This patch series doesn't
enable that (yet). Enabling shadow stack support in vDSO should be
straight forward (intend to do that in next versions of patch set). Enabling
landing pad support in vDSO requires some collaboration with toolchain folks
to follow a single label scheme for all object binaries. This is necessary to
ensure that all indirect call-sites are setting correct label and target landing
pads are decorated with same label scheme.
How many vDSOs
---------------
Shadow stack instructions are carved out of zimop (may be operations) and if CPU
doesn't implement zimop, they're illegal instructions. Kernel could be running on
a CPU which may or may not implement zimop. And thus kernel will have to carry 2
different vDSOs and expose the appropriate one depending on whether CPU implements
zimop or not.
References
==========
[1] - https://github.com/riscv/riscv-cfi
[2] - https://lore.kernel.org/all/20240814081126.956287-1-samuel.holland@sifive.c…
[3] - https://lwn.net/Articles/889475/
[4] - https://developer.arm.com/documentation/109576/0100/Branch-Target-Identific…
[5] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-i…
[6] - https://lwn.net/Articles/940403/
---
changelog
---------
v10:
- dropped "mm: helper `is_shadow_stack_vma` to check shadow stack vma". This patch
is not that interesting to this patch series for risc-v. There are instances in
arch directories where VM_SHADOW_STACK flag is anyways used. Dropping this patch
to expedite merging in riscv tree.
- Took suggestions from `Clement` on "riscv: zicfiss / zicfilp enumeration" to
validate presence of cfi based on config.
- Added a patch for vDSO to have `lpad 0`. I had omitted this earlier to make sure
we add single vdso object with cfi enabled. But a vdso object with scheme of
zero labeled landing pad is least common denominator and should work with all
objects of zero labeled as well as function-signature labeled objects.
v9:
- rebased on master (39a803b754d5 fix braino in "9p: fix ->rename_sem exclusion")
- dropped "mm: Introduce ARCH_HAS_USER_SHADOW_STACK" (master has it from arm64/gcs)
- dropped "prctl: arch-agnostic prctl for shadow stack" (master has it from arm64/gcs)
v8:
- rebased on palmer/for-next
- dropped samuel holland's `envcfg` context switch patches.
they are in parlmer/for-next
v7:
- Removed "riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv"
Instead using `deactivate_mm` flow to clean up.
see here for more context
https://lore.kernel.org/all/20230908203655.543765-1-rick.p.edgecombe@intel.…
- Changed the header include in `kselftest`. Hopefully this fixes compile
issue faced by Zong Li at SiFive.
- Cleaned up an orphaned change to `mm/mmap.c` in below patch
"riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE"
- Lock interfaces for shadow stack and indirect branch tracking expect arg == 0
Any future evolution of this interface should accordingly define how arg should
be setup.
- `mm/map.c` has an instance of using `VM_SHADOW_STACK`. Fixed it to use helper
`is_shadow_stack_vma`.
- Link to v6: https://lore.kernel.org/r/20241008-v5_user_cfi_series-v6-0-60d9fe073f37@riv…
v6:
- Picked up Samuel Holland's changes as is with `envcfg` placed in
`thread` instead of `thread_info`
- fixed unaligned newline escapes in kselftest
- cleaned up messages in kselftest and included test output in commit message
- fixed a bug in clone path reported by Zong Li
- fixed a build issue if CONFIG_RISCV_ISA_V is not selected
(this was introduced due to re-factoring signal context
management code)
v5:
- rebased on v6.12-rc1
- Fixed schema related issues in device tree file
- Fixed some of the documentation related issues in zicfilp/ss.rst
(style issues and added index)
- added `SHADOW_STACK_SET_MARKER` so that implementation can define base
of shadow stack.
- Fixed warnings on definitions added in usercfi.h when
CONFIG_RISCV_USER_CFI is not selected.
- Adopted context header based signal handling as proposed by Andy Chiu
- Added support for enabling kernel mode access to shadow stack using
FWFT
(https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-firmware…)
- Link to v5: https://lore.kernel.org/r/20241001-v5_user_cfi_series-v1-0-3ba65b6e550f@riv…
(Note: I had an issue in my workflow due to which version number wasn't
picked up correctly while sending out patches)
v4:
- rebased on 6.11-rc6
- envcfg: Converged with Samuel Holland's patches for envcfg management on per-
thread basis.
- vma_is_shadow_stack is renamed to is_vma_shadow_stack
- picked up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` patch
- signal context: using extended context management to maintain compatibility.
- fixed `-Wmissing-prototypes` compiler warnings for prctl functions
- Documentation fixes and amending typos.
- Link to v4: https://lore.kernel.org/all/20240912231650.3740732-1-debug@rivosinc.com/
v3:
- envcfg
logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been
picked on per task basis, even though CPU didn't implement it. Fixed in
this series.
- dt-bindings
As suggested, split into separate commit. fixed the messaging that spec is
in public review
- arch_is_shadow_stack change
arch_is_shadow_stack changed to vma_is_shadow_stack
- hwprobe
zicfiss / zicfilp if present will get enumerated in hwprobe
- selftests
As suggested, added object and binary filenames to .gitignore
Selftest binary anyways need to be compiled with cfi enabled compiler which
will make sure that landing pad and shadow stack are enabled. Thus removed
separate enable/disable tests. Cleaned up tests a bit.
- Link to v3: https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/
v2:
- Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow
integrity for user mode programs can be compiled in the kernel.
- Enabling of control flow integrity for user programs is left to user runtime
- This patch series introduces arch agnostic `prctls` to enable shadow stack
and indirect branch tracking. And implements them on riscv.
---
---
Andy Chiu (1):
riscv: signal: abstract header saving for setup_sigcontext
Clément Léger (1):
riscv: Add Firmware Feature SBI extensions definitions
Deepak Gupta (24):
mm: VM_SHADOW_STACK definition for riscv
dt-bindings: riscv: zicfilp and zicfiss in dt-bindings (extensions.yaml)
riscv: zicfiss / zicfilp enumeration
riscv: zicfiss / zicfilp extension csr and bit definitions
riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit
riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE
riscv mm: manufacture shadow stack pte
riscv mmu: teach pte_mkwrite to manufacture shadow stack PTEs
riscv mmu: write protect and shadow stack
riscv/mm: Implement map_shadow_stack() syscall
riscv/shstk: If needed allocate a new shadow stack on clone
riscv: Implements arch agnostic shadow stack prctls
prctl: arch-agnostic prctl for indirect branch tracking
riscv/traps: Introduce software check exception
riscv/signal: save and restore of shadow stack for signal
riscv/kernel: update __show_regs to print shadow stack register
riscv/ptrace: riscv cfi status and state via ptrace and in core files
riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe
riscv: enable kernel access to shadow stack memory via FWFT sbi call
riscv: kernel command line option to opt out of user cfi
riscv: create a config for shadow stack and landing pad instr support
riscv: Documentation for landing pad / indirect branch tracking
riscv: Documentation for shadow stack on riscv
kselftest/riscv: kselftest for user mode cfi
Jim Shu (1):
arch/riscv: compile vdso with landing pad
Documentation/arch/riscv/index.rst | 2 +
Documentation/arch/riscv/zicfilp.rst | 115 +++++
Documentation/arch/riscv/zicfiss.rst | 176 +++++++
.../devicetree/bindings/riscv/extensions.yaml | 14 +
arch/riscv/Kconfig | 20 +
arch/riscv/Makefile | 7 +-
arch/riscv/include/asm/asm-prototypes.h | 1 +
arch/riscv/include/asm/assembler.h | 44 ++
arch/riscv/include/asm/cpufeature.h | 13 +
arch/riscv/include/asm/csr.h | 16 +
arch/riscv/include/asm/entry-common.h | 2 +
arch/riscv/include/asm/hwcap.h | 2 +
arch/riscv/include/asm/mman.h | 25 +
arch/riscv/include/asm/mmu_context.h | 7 +
arch/riscv/include/asm/pgtable.h | 30 +-
arch/riscv/include/asm/processor.h | 2 +
arch/riscv/include/asm/sbi.h | 26 +
arch/riscv/include/asm/thread_info.h | 3 +
arch/riscv/include/asm/usercfi.h | 89 ++++
arch/riscv/include/asm/vector.h | 3 +
arch/riscv/include/uapi/asm/hwprobe.h | 2 +
arch/riscv/include/uapi/asm/ptrace.h | 22 +
arch/riscv/include/uapi/asm/sigcontext.h | 1 +
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/asm-offsets.c | 8 +
arch/riscv/kernel/cpufeature.c | 13 +
arch/riscv/kernel/entry.S | 31 +-
arch/riscv/kernel/head.S | 12 +
arch/riscv/kernel/process.c | 26 +-
arch/riscv/kernel/ptrace.c | 83 ++++
arch/riscv/kernel/signal.c | 142 +++++-
arch/riscv/kernel/sys_hwprobe.c | 2 +
arch/riscv/kernel/sys_riscv.c | 10 +
arch/riscv/kernel/traps.c | 43 ++
arch/riscv/kernel/usercfi.c | 524 +++++++++++++++++++++
arch/riscv/kernel/vdso/Makefile | 8 +
arch/riscv/kernel/vdso/flush_icache.S | 4 +
arch/riscv/kernel/vdso/getcpu.S | 4 +
arch/riscv/kernel/vdso/rt_sigreturn.S | 4 +
arch/riscv/kernel/vdso/sys_hwprobe.S | 4 +
arch/riscv/mm/init.c | 2 +-
arch/riscv/mm/pgtable.c | 17 +
include/linux/cpu.h | 4 +
include/linux/mm.h | 7 +
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 27 ++
kernel/sys.c | 30 ++
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/cfi/.gitignore | 3 +
tools/testing/selftests/riscv/cfi/Makefile | 10 +
tools/testing/selftests/riscv/cfi/cfi_rv_test.h | 84 ++++
tools/testing/selftests/riscv/cfi/riscv_cfi_test.c | 78 +++
tools/testing/selftests/riscv/cfi/shadowstack.c | 375 +++++++++++++++
tools/testing/selftests/riscv/cfi/shadowstack.h | 37 ++
54 files changed, 2189 insertions(+), 29 deletions(-)
---
base-commit: 39a803b754d5224a3522016b564113ee1e4091b2
change-id: 20240930-v5_user_cfi_series-3dc332f8f5b2
--
- debug
A task in the kernel (task_mm_cid_work) runs somewhat periodically to
compact the mm_cid for each process. Add a test to validate that it runs
correctly and timely.
The test spawns 1 thread pinned to each CPU, then each thread, including
the main one, runs in short bursts for some time. During this period, the
mm_cids should be spanning all numbers between 0 and nproc.
At the end of this phase, a thread with high enough mm_cid (>= nproc/2)
is selected to be the new leader, all other threads terminate.
After some time, the only running thread should see 0 as mm_cid, if that
doesn't happen, the compaction mechanism didn't work and the test fails.
The test never fails if only 1 core is available, in which case, we
cannot test anything as the only available mm_cid is 0.
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Signed-off-by: Gabriele Monaco <gmonaco(a)redhat.com>
---
tools/testing/selftests/rseq/.gitignore | 1 +
tools/testing/selftests/rseq/Makefile | 2 +-
.../selftests/rseq/mm_cid_compaction_test.c | 200 ++++++++++++++++++
3 files changed, 202 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c
diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
index 16496de5f6ce4..2c89f97e4f737 100644
--- a/tools/testing/selftests/rseq/.gitignore
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -3,6 +3,7 @@ basic_percpu_ops_test
basic_percpu_ops_mm_cid_test
basic_test
basic_rseq_op_test
+mm_cid_compaction_test
param_test
param_test_benchmark
param_test_compare_twice
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 5a3432fceb586..ce1b38f46a355 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
param_test_benchmark param_test_compare_twice param_test_mm_cid \
- param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
+ param_test_mm_cid_benchmark param_test_mm_cid_compare_twice mm_cid_compaction_test
TEST_GEN_PROGS_EXTENDED = librseq.so
diff --git a/tools/testing/selftests/rseq/mm_cid_compaction_test.c b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
new file mode 100644
index 0000000000000..7ddde3b657dd6
--- /dev/null
+++ b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
@@ -0,0 +1,200 @@
+// SPDX-License-Identifier: LGPL-2.1
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stddef.h>
+
+#include "../kselftest.h"
+#include "rseq.h"
+
+#define VERBOSE 0
+#define printf_verbose(fmt, ...) \
+ do { \
+ if (VERBOSE) \
+ printf(fmt, ##__VA_ARGS__); \
+ } while (0)
+
+/* 0.5 s */
+#define RUNNER_PERIOD 500000
+/* Number of runs before we terminate or get the token */
+#define THREAD_RUNS 5
+
+/*
+ * Number of times we check that the mm_cid were compacted.
+ * Checks are repeated every RUNNER_PERIOD.
+ */
+#define MM_CID_COMPACT_TIMEOUT 10
+
+struct thread_args {
+ int cpu;
+ int num_cpus;
+ pthread_mutex_t *token;
+ pthread_barrier_t *barrier;
+ pthread_t *tinfo;
+ struct thread_args *args_head;
+};
+
+static void __noreturn *thread_runner(void *arg)
+{
+ struct thread_args *args = arg;
+ int i, ret, curr_mm_cid;
+ cpu_set_t cpumask;
+
+ CPU_ZERO(&cpumask);
+ CPU_SET(args->cpu, &cpumask);
+ ret = pthread_setaffinity_np(pthread_self(), sizeof(cpumask), &cpumask);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to set affinity");
+ abort();
+ }
+ pthread_barrier_wait(args->barrier);
+
+ for (i = 0; i < THREAD_RUNS; i++)
+ usleep(RUNNER_PERIOD);
+ curr_mm_cid = rseq_current_mm_cid();
+ /*
+ * We select one thread with high enough mm_cid to be the new leader.
+ * All other threads (including the main thread) will terminate.
+ * After some time, the mm_cid of the only remaining thread should
+ * converge to 0, if not, the test fails.
+ */
+ if (curr_mm_cid >= args->num_cpus / 2 &&
+ !pthread_mutex_trylock(args->token)) {
+ printf_verbose(
+ "cpu%d has mm_cid=%d and will be the new leader.\n",
+ sched_getcpu(), curr_mm_cid);
+ for (i = 0; i < args->num_cpus; i++) {
+ if (args->tinfo[i] == pthread_self())
+ continue;
+ ret = pthread_join(args->tinfo[i], NULL);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to join thread");
+ abort();
+ }
+ }
+ pthread_barrier_destroy(args->barrier);
+ free(args->tinfo);
+ free(args->token);
+ free(args->barrier);
+ free(args->args_head);
+
+ for (i = 0; i < MM_CID_COMPACT_TIMEOUT; i++) {
+ curr_mm_cid = rseq_current_mm_cid();
+ printf_verbose("run %d: mm_cid=%d on cpu%d.\n", i,
+ curr_mm_cid, sched_getcpu());
+ if (curr_mm_cid == 0)
+ exit(EXIT_SUCCESS);
+ usleep(RUNNER_PERIOD);
+ }
+ exit(EXIT_FAILURE);
+ }
+ printf_verbose("cpu%d has mm_cid=%d and is going to terminate.\n",
+ sched_getcpu(), curr_mm_cid);
+ pthread_exit(NULL);
+}
+
+int test_mm_cid_compaction(void)
+{
+ cpu_set_t affinity;
+ int i, j, ret = 0, num_threads;
+ pthread_t *tinfo;
+ pthread_mutex_t *token;
+ pthread_barrier_t *barrier;
+ struct thread_args *args;
+
+ sched_getaffinity(0, sizeof(affinity), &affinity);
+ num_threads = CPU_COUNT(&affinity);
+ tinfo = calloc(num_threads, sizeof(*tinfo));
+ if (!tinfo) {
+ perror("Error: failed to allocate tinfo");
+ return -1;
+ }
+ args = calloc(num_threads, sizeof(*args));
+ if (!args) {
+ perror("Error: failed to allocate args");
+ ret = -1;
+ goto out_free_tinfo;
+ }
+ token = malloc(sizeof(*token));
+ if (!token) {
+ perror("Error: failed to allocate token");
+ ret = -1;
+ goto out_free_args;
+ }
+ barrier = malloc(sizeof(*barrier));
+ if (!barrier) {
+ perror("Error: failed to allocate barrier");
+ ret = -1;
+ goto out_free_token;
+ }
+ if (num_threads == 1) {
+ fprintf(stderr, "Cannot test on a single cpu. "
+ "Skipping mm_cid_compaction test.\n");
+ /* only skipping the test, this is not a failure */
+ goto out_free_barrier;
+ }
+ pthread_mutex_init(token, NULL);
+ ret = pthread_barrier_init(barrier, NULL, num_threads);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to initialise barrier");
+ goto out_free_barrier;
+ }
+ for (i = 0, j = 0; i < CPU_SETSIZE && j < num_threads; i++) {
+ if (!CPU_ISSET(i, &affinity))
+ continue;
+ args[j].num_cpus = num_threads;
+ args[j].tinfo = tinfo;
+ args[j].token = token;
+ args[j].barrier = barrier;
+ args[j].cpu = i;
+ args[j].args_head = args;
+ if (!j) {
+ /* The first thread is the main one */
+ tinfo[0] = pthread_self();
+ ++j;
+ continue;
+ }
+ ret = pthread_create(&tinfo[j], NULL, thread_runner, &args[j]);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to create thread");
+ abort();
+ }
+ ++j;
+ }
+ printf_verbose("Started %d threads.\n", num_threads);
+
+ /* Also main thread will terminate if it is not selected as leader */
+ thread_runner(&args[0]);
+
+ /* only reached in case of errors */
+out_free_barrier:
+ free(barrier);
+out_free_token:
+ free(token);
+out_free_args:
+ free(args);
+out_free_tinfo:
+ free(tinfo);
+
+ return ret;
+}
+
+int main(int argc, char **argv)
+{
+ if (!rseq_mm_cid_available()) {
+ fprintf(stderr, "Error: rseq_mm_cid unavailable\n");
+ return -1;
+ }
+ if (test_mm_cid_compaction())
+ return -1;
+ return 0;
+}
--
2.48.1
PTRACE_SET_SYSCALL_INFO is a generic ptrace API that complements
PTRACE_GET_SYSCALL_INFO by letting the ptracer modify details of
system calls the tracee is blocked in.
This API allows ptracers to obtain and modify system call details in a
straightforward and architecture-agnostic way, providing a consistent way
of manipulating the system call number and arguments across architectures.
As in case of PTRACE_GET_SYSCALL_INFO, PTRACE_SET_SYSCALL_INFO also
does not aim to address numerous architecture-specific system call ABI
peculiarities, like differences in the number of system call arguments
for such system calls as pread64 and preadv.
The current implementation supports changing only those bits of system call
information that are used by strace system call tampering, namely, syscall
number, syscall arguments, and syscall return value.
Support of changing additional details returned by PTRACE_GET_SYSCALL_INFO,
such as instruction pointer and stack pointer, could be added later if
needed, by using struct ptrace_syscall_info.flags to specify the additional
details that should be set. Currently, "flags" and "reserved" fields of
struct ptrace_syscall_info must be initialized with zeroes; "arch",
"instruction_pointer", and "stack_pointer" fields are currently ignored.
PTRACE_SET_SYSCALL_INFO currently supports only PTRACE_SYSCALL_INFO_ENTRY,
PTRACE_SYSCALL_INFO_EXIT, and PTRACE_SYSCALL_INFO_SECCOMP operations.
Other operations could be added later if needed.
Ideally, PTRACE_SET_SYSCALL_INFO should have been introduced along with
PTRACE_GET_SYSCALL_INFO, but it didn't happen. The last straw that
convinced me to implement PTRACE_SET_SYSCALL_INFO was apparent failure
to provide an API of changing the first system call argument on riscv
architecture [1].
ptrace(2) man page:
long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);
...
PTRACE_SET_SYSCALL_INFO
Modify information about the system call that caused the stop.
The "data" argument is a pointer to struct ptrace_syscall_info
that specifies the system call information to be set.
The "addr" argument should be set to sizeof(struct ptrace_syscall_info)).
[1] https://lore.kernel.org/all/59505464-c84a-403d-972f-d4b2055eeaac@gmail.com/
Notes:
v5:
* ptrace: Extend the commit message to say that the new API does not aim
to address numerous architecture-specific syscall ABI peculiarities
* selftests: Add a workaround for s390 16-bit syscall numbers
* Add more Acked-by
* v4: https://lore.kernel.org/all/20250203065849.GA14120@strace.io/
v4:
* Split out syscall_set_return_value() for hexagon into a separate patch
* s390: Change the style of syscall_set_arguments() implementation as
requested
* Add more Reviewed-by
* v3: https://lore.kernel.org/all/20250128091445.GA8257@strace.io/
v3:
* powerpc: Submit syscall_set_return_value() fix for "sc" case separately
* mips: Do not introduce erroneous argument truncation on mips n32,
add a detailed description to the commit message of the
mips_get_syscall_arg() change
* ptrace: Add explicit padding to the end of struct ptrace_syscall_info,
simplify obtaining of user ptrace_syscall_info,
do not introduce PTRACE_SYSCALL_INFO_SIZE_VER0
* ptrace: Change the return type of ptrace_set_syscall_info_* functions
from "unsigned long" to "int"
* ptrace: Add -ERANGE check to ptrace_set_syscall_info_exit(),
add comments to -ERANGE checks
* ptrace: Update comments about supported syscall stops
* selftests: Extend set_syscall_info test, fix for mips n32
* Add Tested-by and Reviewed-by
v2:
* Add patch to fix syscall_set_return_value() on powerpc
* Add patch to fix mips_get_syscall_arg() on mips
* Add syscall_set_return_value() implementation on hexagon
* Add syscall_set_return_value() invocation to syscall_set_nr()
on arm and arm64.
* Fix syscall_set_nr() and mips_set_syscall_arg() on mips
* Add a comment to syscall_set_nr() on arc, powerpc, s390, sh,
and sparc
* Remove redundant ptrace_syscall_info.op assignments in
ptrace_get_syscall_info_*
* Minor style tweaks in ptrace_get_syscall_info_op()
* Remove syscall_set_return_value() invocation from
ptrace_set_syscall_info_entry()
* Skip syscall_set_arguments() invocation in case of syscall number -1
in ptrace_set_syscall_info_entry()
* Split ptrace_syscall_info.reserved into ptrace_syscall_info.reserved
and ptrace_syscall_info.flags
* Use __kernel_ulong_t instead of unsigned long in set_syscall_info test
Dmitry V. Levin (7):
mips: fix mips_get_syscall_arg() for o32
hexagon: add syscall_set_return_value()
syscall.h: add syscall_set_arguments()
syscall.h: introduce syscall_set_nr()
ptrace_get_syscall_info: factor out ptrace_get_syscall_info_op
ptrace: introduce PTRACE_SET_SYSCALL_INFO request
selftests/ptrace: add a test case for PTRACE_SET_SYSCALL_INFO
arch/arc/include/asm/syscall.h | 25 +
arch/arm/include/asm/syscall.h | 37 ++
arch/arm64/include/asm/syscall.h | 29 +
arch/csky/include/asm/syscall.h | 13 +
arch/hexagon/include/asm/syscall.h | 21 +
arch/loongarch/include/asm/syscall.h | 15 +
arch/m68k/include/asm/syscall.h | 7 +
arch/microblaze/include/asm/syscall.h | 7 +
arch/mips/include/asm/syscall.h | 70 ++-
arch/nios2/include/asm/syscall.h | 16 +
arch/openrisc/include/asm/syscall.h | 13 +
arch/parisc/include/asm/syscall.h | 19 +
arch/powerpc/include/asm/syscall.h | 20 +
arch/riscv/include/asm/syscall.h | 16 +
arch/s390/include/asm/syscall.h | 21 +
arch/sh/include/asm/syscall_32.h | 24 +
arch/sparc/include/asm/syscall.h | 22 +
arch/um/include/asm/syscall-generic.h | 19 +
arch/x86/include/asm/syscall.h | 43 ++
arch/xtensa/include/asm/syscall.h | 18 +
include/asm-generic/syscall.h | 30 +
include/uapi/linux/ptrace.h | 7 +-
kernel/ptrace.c | 179 +++++-
tools/testing/selftests/ptrace/Makefile | 2 +-
.../selftests/ptrace/set_syscall_info.c | 519 ++++++++++++++++++
25 files changed, 1145 insertions(+), 47 deletions(-)
create mode 100644 tools/testing/selftests/ptrace/set_syscall_info.c
--
ldv
Basics and overview
===================
Software with larger attack surfaces (e.g. network facing apps like databases,
browsers or apps relying on browser runtimes) suffer from memory corruption
issues which can be utilized by attackers to bend control flow of the program
to eventually gain control (by making their payload executable). Attackers are
able to perform such attacks by leveraging call-sites which rely on indirect
calls or return sites which rely on obtaining return address from stack memory.
To mitigate such attacks, risc-v extension zicfilp enforces that all indirect
calls must land on a landing pad instruction `lpad` else cpu will raise software
check exception (a new cpu exception cause code on riscv).
Similarly for return flow, risc-v extension zicfiss extends architecture with
- `sspush` instruction to push return address on a shadow stack
- `sspopchk` instruction to pop return address from shadow stack
and compare with input operand (i.e. return address on stack)
- `sspopchk` to raise software check exception if comparision above
was a mismatch
- Protection mechanism using which shadow stack is not writeable via
regular store instructions
More information an details can be found at extensions github repo [1].
Equivalent to landing pad (zicfilp) on x86 is `ENDBRANCH` instruction in Intel
CET [3] and branch target identification (BTI) [4] on arm.
Similarly x86's Intel CET has shadow stack [5] and arm64 has guarded control
stack (GCS) [6] which are very similar to risc-v's zicfiss shadow stack.
x86 and arm64 support for user mode shadow stack is already in mainline.
Kernel awareness for user control flow integrity
================================================
This series picks up Samuel Holland's envcfg changes [2] as well. So if those are
being applied independently, they should be removed from this series.
Enabling:
In order to maintain compatibility and not break anything in user mode, kernel
doesn't enable control flow integrity cpu extensions on binary by default.
Instead exposes a prctl interface to enable, disable and lock the shadow stack
or landing pad feature for a task. This allows userspace (loader) to enumerate
if all objects in its address space are compiled with shadow stack and landing
pad support and accordingly enable the feature. Additionally if a subsequent
`dlopen` happens on a library, user mode can take a decision again to disable
the feature (if incoming library is not compiled with support) OR terminate the
task (if user mode policy is strict to have all objects in address space to be
compiled with control flow integirty cpu feature). prctl to enable shadow stack
results in allocating shadow stack from virtual memory and activating for user
address space. x86 and arm64 are also following same direction due to similar
reason(s).
clone/fork:
On clone and fork, cfi state for task is inherited by child. Shadow stack is
part of virtual memory and is a writeable memory from kernel perspective
(writeable via a restricted set of instructions aka shadow stack instructions)
Thus kernel changes ensure that this memory is converted into read-only when
fork/clone happens and COWed when fault is taken due to sspush, sspopchk or
ssamoswap. In case `CLONE_VM` is specified and shadow stack is to be enabled,
kernel will automatically allocate a shadow stack for that clone call.
map_shadow_stack:
x86 introduced `map_shadow_stack` system call to allow user space to explicitly
map shadow stack memory in its address space. It is useful to allocate shadow
for different contexts managed by a single thread (green threads or contexts)
risc-v implements this system call as well.
signal management:
If shadow stack is enabled for a task, kernel performs an asynchronous control
flow diversion to deliver the signal and eventually expects userspace to issue
sigreturn so that original execution can be resumed. Even though resume context
is prepared by kernel, it is in user space memory and is subject to memory
corruption and corruption bugs can be utilized by attacker in this race window
to perform arbitrary sigreturn and eventually bypass cfi mechanism.
Another issue is how to ensure that cfi related state on sigcontext area is not
trampled by legacy apps or apps compiled with old kernel headers.
In order to mitigate control-flow hijacting, kernel prepares a token and place
it on shadow stack before signal delivery and places address of token in
sigcontext structure. During sigreturn, kernel obtains address of token from
sigcontext struture, reads token from shadow stack and validates it and only
then allow sigreturn to succeed. Compatiblity issue is solved by adopting
dynamic sigcontext management introduced for vector extension. This series
re-factor the code little bit to allow future sigcontext management easy (as
proposed by Andy Chiu from SiFive)
config and compilation:
Introduce a new risc-v config option `CONFIG_RISCV_USER_CFI`. Selecting this
config option picks the kernel support for user control flow integrity. This
optin is presented only if toolchain has shadow stack and landing pad support.
And is on purpose guarded by toolchain support. Reason being that eventually
vDSO also needs to be compiled in with shadow stack and landing pad support.
vDSO compile patches are not included as of now because landing pad labeling
scheme is yet to settle for usermode runtime.
To get more information on kernel interactions with respect to
zicfilp and zicfiss, patch series adds documentation for
`zicfilp` and `zicfiss` in following:
Documentation/arch/riscv/zicfiss.rst
Documentation/arch/riscv/zicfilp.rst
How to test this series
=======================
Toolchain
---------
$ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev
$ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static"
$ make -j$(nproc)
Qemu
----
Get the lastest qemu
$ cd qemu
$ mkdir build
$ cd build
$ ../configure --target-list=riscv64-softmmu
$ make -j$(nproc)
Opensbi
-------
$ git clone git@github.com:deepak0414/opensbi.git -b v6_cfi_spec_split_opensbi
$ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic
Linux
-----
Running defconfig is fine. CFI is enabled by default if the toolchain
supports it.
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc)
In case you're building your own rootfs using toolchain, please make sure you
pick following patch to ensure that vDSO compiled with lpad and shadow stack.
"arch/riscv: compile vdso with landing pad"
Branch where above patch can be picked
https://github.com/deepak0414/linux-riscv-cfi/tree/vdso_user_cfi_v6.12-rc1
Running
-------
Modify your qemu command to have:
-bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin
-cpu rv64,zicfilp=true,zicfiss=true,zimop=true,zcmop=true
vDSO related Opens (in the flux)
=================================
I am listing these opens for laying out plan and what to expect in future
patch sets. And of course for the sake of discussion.
Shadow stack and landing pad enabling in vDSO
----------------------------------------------
vDSO must have shadow stack and landing pad support compiled in for task
to have shadow stack and landing pad support. This patch series doesn't
enable that (yet). Enabling shadow stack support in vDSO should be
straight forward (intend to do that in next versions of patch set). Enabling
landing pad support in vDSO requires some collaboration with toolchain folks
to follow a single label scheme for all object binaries. This is necessary to
ensure that all indirect call-sites are setting correct label and target landing
pads are decorated with same label scheme.
How many vDSOs
---------------
Shadow stack instructions are carved out of zimop (may be operations) and if CPU
doesn't implement zimop, they're illegal instructions. Kernel could be running on
a CPU which may or may not implement zimop. And thus kernel will have to carry 2
different vDSOs and expose the appropriate one depending on whether CPU implements
zimop or not.
References
==========
[1] - https://github.com/riscv/riscv-cfi
[2] - https://lore.kernel.org/all/20240814081126.956287-1-samuel.holland@sifive.c…
[3] - https://lwn.net/Articles/889475/
[4] - https://developer.arm.com/documentation/109576/0100/Branch-Target-Identific…
[5] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-i…
[6] - https://lwn.net/Articles/940403/
---
changelog
---------
v9:
- rebased on master (39a803b754d5 fix braino in "9p: fix ->rename_sem exclusion")
- dropped "mm: Introduce ARCH_HAS_USER_SHADOW_STACK" (master has it from arm64/gcs)
- dropped "prctl: arch-agnostic prctl for shadow stack" (master has it from arm64/gcs)
v8:
- rebased on palmer/for-next
- dropped samuel holland's `envcfg` context switch patches.
they are in parlmer/for-next
v7:
- Removed "riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv"
Instead using `deactivate_mm` flow to clean up.
see here for more context
https://lore.kernel.org/all/20230908203655.543765-1-rick.p.edgecombe@intel.…
- Changed the header include in `kselftest`. Hopefully this fixes compile
issue faced by Zong Li at SiFive.
- Cleaned up an orphaned change to `mm/mmap.c` in below patch
"riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE"
- Lock interfaces for shadow stack and indirect branch tracking expect arg == 0
Any future evolution of this interface should accordingly define how arg should
be setup.
- `mm/map.c` has an instance of using `VM_SHADOW_STACK`. Fixed it to use helper
`is_shadow_stack_vma`.
- Link to v6: https://lore.kernel.org/r/20241008-v5_user_cfi_series-v6-0-60d9fe073f37@riv…
v6:
- Picked up Samuel Holland's changes as is with `envcfg` placed in
`thread` instead of `thread_info`
- fixed unaligned newline escapes in kselftest
- cleaned up messages in kselftest and included test output in commit message
- fixed a bug in clone path reported by Zong Li
- fixed a build issue if CONFIG_RISCV_ISA_V is not selected
(this was introduced due to re-factoring signal context
management code)
v5:
- rebased on v6.12-rc1
- Fixed schema related issues in device tree file
- Fixed some of the documentation related issues in zicfilp/ss.rst
(style issues and added index)
- added `SHADOW_STACK_SET_MARKER` so that implementation can define base
of shadow stack.
- Fixed warnings on definitions added in usercfi.h when
CONFIG_RISCV_USER_CFI is not selected.
- Adopted context header based signal handling as proposed by Andy Chiu
- Added support for enabling kernel mode access to shadow stack using
FWFT
(https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-firmware…)
- Link to v5: https://lore.kernel.org/r/20241001-v5_user_cfi_series-v1-0-3ba65b6e550f@riv…
(Note: I had an issue in my workflow due to which version number wasn't
picked up correctly while sending out patches)
v4:
- rebased on 6.11-rc6
- envcfg: Converged with Samuel Holland's patches for envcfg management on per-
thread basis.
- vma_is_shadow_stack is renamed to is_vma_shadow_stack
- picked up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` patch
- signal context: using extended context management to maintain compatibility.
- fixed `-Wmissing-prototypes` compiler warnings for prctl functions
- Documentation fixes and amending typos.
- Link to v4: https://lore.kernel.org/all/20240912231650.3740732-1-debug@rivosinc.com/
v3:
- envcfg
logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been
picked on per task basis, even though CPU didn't implement it. Fixed in
this series.
- dt-bindings
As suggested, split into separate commit. fixed the messaging that spec is
in public review
- arch_is_shadow_stack change
arch_is_shadow_stack changed to vma_is_shadow_stack
- hwprobe
zicfiss / zicfilp if present will get enumerated in hwprobe
- selftests
As suggested, added object and binary filenames to .gitignore
Selftest binary anyways need to be compiled with cfi enabled compiler which
will make sure that landing pad and shadow stack are enabled. Thus removed
separate enable/disable tests. Cleaned up tests a bit.
- Link to v3: https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/
v2:
- Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow
integrity for user mode programs can be compiled in the kernel.
- Enabling of control flow integrity for user programs is left to user runtime
- This patch series introduces arch agnostic `prctls` to enable shadow stack
and indirect branch tracking. And implements them on riscv.
---
Changes in v9:
- EDITME: describe what is new in this series revision.
- EDITME: use bulletpoints and terse descriptions.
- Link to v8: https://lore.kernel.org/r/20241111-v5_user_cfi_series-v8-0-dce14aa30207@riv…
Changes in v8:
- EDITME: describe what is new in this series revision.
- EDITME: use bulletpoints and terse descriptions.
- Link to v7: https://lore.kernel.org/r/20241029-v5_user_cfi_series-v7-0-2727ce9936cb@riv…
---
Andy Chiu (1):
riscv: signal: abstract header saving for setup_sigcontext
Clément Léger (1):
riscv: Add Firmware Feature SBI extensions definitions
Deepak Gupta (24):
mm: helper `is_shadow_stack_vma` to check shadow stack vma
dt-bindings: riscv: zicfilp and zicfiss in dt-bindings (extensions.yaml)
riscv: zicfiss / zicfilp enumeration
riscv: zicfiss / zicfilp extension csr and bit definitions
riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit
riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE
riscv mm: manufacture shadow stack pte
riscv mmu: teach pte_mkwrite to manufacture shadow stack PTEs
riscv mmu: write protect and shadow stack
riscv/mm: Implement map_shadow_stack() syscall
riscv/shstk: If needed allocate a new shadow stack on clone
riscv: Implements arch agnostic shadow stack prctls
prctl: arch-agnostic prctl for indirect branch tracking
riscv/traps: Introduce software check exception
riscv/signal: save and restore of shadow stack for signal
riscv/kernel: update __show_regs to print shadow stack register
riscv/ptrace: riscv cfi status and state via ptrace and in core files
riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe
riscv: enable kernel access to shadow stack memory via FWFT sbi call
riscv: kernel command line option to opt out of user cfi
riscv: create a config for shadow stack and landing pad instr support
riscv: Documentation for landing pad / indirect branch tracking
riscv: Documentation for shadow stack on riscv
kselftest/riscv: kselftest for user mode cfi
Documentation/arch/riscv/index.rst | 2 +
Documentation/arch/riscv/zicfilp.rst | 115 +++++
Documentation/arch/riscv/zicfiss.rst | 176 +++++++
.../devicetree/bindings/riscv/extensions.yaml | 14 +
arch/riscv/Kconfig | 20 +
arch/riscv/include/asm/asm-prototypes.h | 1 +
arch/riscv/include/asm/cpufeature.h | 13 +
arch/riscv/include/asm/csr.h | 16 +
arch/riscv/include/asm/entry-common.h | 2 +
arch/riscv/include/asm/hwcap.h | 2 +
arch/riscv/include/asm/mman.h | 25 +
arch/riscv/include/asm/mmu_context.h | 7 +
arch/riscv/include/asm/pgtable.h | 30 +-
arch/riscv/include/asm/processor.h | 2 +
arch/riscv/include/asm/sbi.h | 26 +
arch/riscv/include/asm/thread_info.h | 3 +
arch/riscv/include/asm/usercfi.h | 89 ++++
arch/riscv/include/asm/vector.h | 3 +
arch/riscv/include/uapi/asm/hwprobe.h | 2 +
arch/riscv/include/uapi/asm/ptrace.h | 22 +
arch/riscv/include/uapi/asm/sigcontext.h | 1 +
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/asm-offsets.c | 8 +
arch/riscv/kernel/cpufeature.c | 2 +
arch/riscv/kernel/entry.S | 31 +-
arch/riscv/kernel/head.S | 12 +
arch/riscv/kernel/process.c | 26 +-
arch/riscv/kernel/ptrace.c | 83 ++++
arch/riscv/kernel/signal.c | 142 +++++-
arch/riscv/kernel/sys_hwprobe.c | 2 +
arch/riscv/kernel/sys_riscv.c | 10 +
arch/riscv/kernel/traps.c | 43 ++
arch/riscv/kernel/usercfi.c | 524 +++++++++++++++++++++
arch/riscv/mm/init.c | 2 +-
arch/riscv/mm/pgtable.c | 17 +
include/linux/cpu.h | 4 +
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 27 ++
kernel/sys.c | 30 ++
mm/gup.c | 2 +-
mm/mmap.c | 2 +-
mm/vma.h | 10 +-
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/cfi/.gitignore | 3 +
tools/testing/selftests/riscv/cfi/Makefile | 10 +
tools/testing/selftests/riscv/cfi/cfi_rv_test.h | 84 ++++
tools/testing/selftests/riscv/cfi/riscv_cfi_test.c | 78 +++
tools/testing/selftests/riscv/cfi/shadowstack.c | 375 +++++++++++++++
tools/testing/selftests/riscv/cfi/shadowstack.h | 37 ++
49 files changed, 2106 insertions(+), 33 deletions(-)
---
base-commit: 39a803b754d5224a3522016b564113ee1e4091b2
change-id: 20240930-v5_user_cfi_series-3dc332f8f5b2
--
- debug
Hi all,
On Mon, 10 Feb 2025 04:43:28 +0530 Shuah Khan <skhan(a)linuxfoundation.org> wrote:
>
> On Sun, Feb 9, 2025, 2:44 AM Kees Cook <kees(a)kernel.org> wrote:
>
> > On Fri, Oct 11, 2024 at 07:53:43AM -0600, Shuah Khan wrote:
> > > On 10/11/24 01:25, David Gow wrote:
> > > > As discussed in [1], the KUnit test naming scheme has changed to avoid
> > > > name conflicts (and tab-completion woes) with the files being tested.
> > > > These renames and moves have caused a nasty set of merge conflicts, so
> > > > this series collates and rebases them all to be applied via
> > > > mm-nonmm-unstable alongside any lib/ changes[2].
> >
> > Shall I carry this in the hardening tree? I didn't see it land in the
> > merge window, and I still don't see it in -next?
> >
>
> My thinking was that this series would go through Andrew's tree to avoid
> conflicts. Please take it through yours.
If they have been rebased onto mm-nonmm-unstable, then they really need
to go through Andrew' tree (since mm-nonmm-unstable gets rebased often).
--
Cheers,
Stephen Rothwell
[ Background ]
On ARM GIC systems and others, the target address of the MSI is translated
by the IOMMU. For GIC, the MSI address page is called "ITS" page. When the
IOMMU is disabled, the MSI address is programmed to the physical location
of the GIC ITS page (e.g. 0x20200000). When the IOMMU is enabled, the ITS
page is behind the IOMMU, so the MSI address is programmed to an allocated
IO virtual address (a.k.a IOVA), e.g. 0xFFFF0000, which must be mapped to
the physical ITS page: IOVA (0xFFFF0000) ===> PA (0x20200000).
When a 2-stage translation is enabled, IOVA will be still used to program
the MSI address, though the mappings will be in two stages:
IOVA (0xFFFF0000) ===> IPA (e.g. 0x80900000) ===> PA (0x20200000)
(IPA stands for Intermediate Physical Address).
If the device that generates MSI is attached to an IOMMU_DOMAIN_DMA, the
IOVA is dynamically allocated from the top of the IOVA space. If attached
to an IOMMU_DOMAIN_UNMANAGED (e.g. a VFIO passthrough device), the IOVA is
fixed to an MSI window reported by the IOMMU driver via IOMMU_RESV_SW_MSI,
which is hardwired to MSI_IOVA_BASE (IOVA==0x8000000) for ARM IOMMUs.
So far, this IOMMU_RESV_SW_MSI works well as kernel is entirely in charge
of the IOMMU translation (1-stage translation), since the IOVA for the ITS
page is fixed and known by kernel. However, with virtual machine enabling
a nested IOMMU translation (2-stage), a guest kernel directly controls the
stage-1 translation with an IOMMU_DOMAIN_DMA, mapping a vITS page (at an
IPA 0x80900000) onto its own IOVA space (e.g. 0xEEEE0000). Then, the host
kernel can't know that guest-level IOVA to program the MSI address.
There have been two approaches to solve this problem:
1. Create an identity mapping in the stage-1. VMM could insert a few RMRs
(Reserved Memory Regions) in guest's IORT. Then the guest kernel would
fetch these RMR entries from the IORT and create an IOMMU_RESV_DIRECT
region per iommu group for a direct mapping. Eventually, the mappings
would look like: IOVA (0x8000000) === IPA (0x8000000) ===> 0x20200000
This requires an IOMMUFD ioctl for kernel and VMM to agree on the IPA.
2. Forward the guest-level MSI IOVA captured by VMM to the host-level GIC
driver, to program the correct MSI IOVA. Forward the VMM-defined vITS
page location (IPA) to the kernel for the stage-2 mapping. Eventually:
IOVA (0xFFFF0000) ===> IPA (0x80900000) ===> PA (0x20200000)
This requires a VFIO ioctl (for IOVA) and an IOMMUFD ioctl (for IPA).
Worth mentioning that when Eric Auger was working on the same topic with
the VFIO iommu uAPI, he had the approach (2) first, and then switched to
the approach (1), suggested by Jean-Philippe for reduction of complexity.
The approach (1) basically feels like the existing VFIO passthrough that
has a 1-stage mapping for the unmanaged domain, yet only by shifting the
MSI mapping from stage 1 (guest-has-no-iommu case) to stage 2 (guest-has-
iommu case). So, it could reuse the existing IOMMU_RESV_SW_MSI piece, by
sharing the same idea of "VMM leaving everything to the kernel".
The approach (2) is an ideal solution, yet it requires additional effort
for kernel to be aware of the 1-stage gIOVA(s) and 2-stage IPAs for vITS
page(s), which demands VMM to closely cooperate.
* It also brings some complicated use cases to the table where the host
or/and guest system(s) has/have multiple ITS pages.
[ Execution ]
Though these two approaches feel very different on the surface, they can
share some underlying common infrastructure. Currently, only one pair of
sw_msi functions (prepare/compose) are provided by dma-iommu for irqchip
drivers to directly use. There could be different versions of functions
from different domain owners: for existing VFIO passthrough cases and in-
kernel DMA domain cases, reuse the existing dma-iommu's version of sw_msi
functions; for nested translation use cases, there can be another version
of sw_msi functions to handle mapping and msi_msg(s) differently.
To support both approaches, in this series
- Get rid of the duplication in the "compose" function
- Introduce a function pointer for the previously "prepare" function
- Allow different domain owners to set their own "sw_msi" implementations
- Implement an iommufd_sw_msi function to additionally support a nested
translation use case using the approach (2), i.e. the RMR solution
- Add a pair of IOMMUFD options for a SW_MSI window for kernel and VMM to
agree on (for approach 1)
- Add a new VFIO ioctl to set the MSI(x) vector(s) for iommufd_sw_msi()
to update the msi_desc structure accordingly (for approach 2)
A missing piece
- Potentially another IOMMUFD_CMD_IOAS_MAP_MSI ioctl for VMM to map the
IPAs of the vITS page(s) in the stage-2 io page table. (for approach 2)
(in this RFC, conveniently reuse the new IOMMUFD SW_MSI options to set
the vITS page's IPA, which works finely in a single-vITS-page case.)
This is a joint effort that includes Jason's rework in irq/iommu/iommufd
base level and my additional patches on top of that for new uAPIs.
This series is on github:
https://github.com/nicolinc/iommufd/commits/iommufd_msi-rfcv2
Pairing QEMU branch for testing (approach 1):
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_msi-rfcv2-rmr
Pairing QEMU branch for testing (approach 2):
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_msi-rfcv2-vits
Changelog
v2
* Rebase on v6.13-rc6
* Drop all the irq/pci patches and rework the compose function instead
* Add a new sw_msi op to iommu_domain for a per type implementation and
let iommufd core has its own implementation to support both approaches
* Add RMR-solution (approach 1) support since it is straightforward and
have been used in some out-of-tree projects widely
v1
https://lore.kernel.org/kvm/cover.1731130093.git.nicolinc@nvidia.com/
Thanks!
Nicolin
Jason Gunthorpe (5):
genirq/msi: Store the IOMMU IOVA directly in msi_desc instead of
iommu_cookie
genirq/msi: Rename iommu_dma_compose_msi_msg() to
msi_msg_set_msi_addr()
iommu: Make iommu_dma_prepare_msi() into a generic operation
irqchip: Have CONFIG_IRQ_MSI_IOMMU be selected by the irqchips that
need it
iommufd: Implement sw_msi support natively
Nicolin Chen (8):
iommu: Turn fault_data to iommufd private pointer
iommufd: Make attach_handle generic
iommu: Turn iova_cookie to dma-iommu private pointer
iommufd: Add IOMMU_OPTION_SW_MSI_START/SIZE ioctls
iommufd/selftes: Add coverage for IOMMU_OPTION_SW_MSI_START/SIZE
iommufd/device: Allow setting IOVAs for MSI(x) vectors
vfio-iommufd: Provide another layer of msi_iova helpers
vfio/pci: Allow preset MSI IOVAs via VFIO_IRQ_SET_ACTION_PREPARE
drivers/iommu/Kconfig | 1 -
drivers/irqchip/Kconfig | 4 +
kernel/irq/Kconfig | 1 +
drivers/iommu/iommufd/iommufd_private.h | 69 ++--
include/linux/iommu.h | 58 ++--
include/linux/iommufd.h | 6 +
include/linux/msi.h | 43 ++-
include/linux/vfio.h | 25 ++
include/uapi/linux/iommufd.h | 18 +-
include/uapi/linux/vfio.h | 8 +-
drivers/iommu/dma-iommu.c | 63 ++--
drivers/iommu/iommu.c | 29 ++
drivers/iommu/iommufd/device.c | 312 ++++++++++++++++--
drivers/iommu/iommufd/fault.c | 122 +------
drivers/iommu/iommufd/hw_pagetable.c | 5 +-
drivers/iommu/iommufd/io_pagetable.c | 4 +-
drivers/iommu/iommufd/ioas.c | 34 ++
drivers/iommu/iommufd/main.c | 15 +
drivers/irqchip/irq-gic-v2m.c | 5 +-
drivers/irqchip/irq-gic-v3-its.c | 13 +-
drivers/irqchip/irq-gic-v3-mbi.c | 12 +-
drivers/irqchip/irq-ls-scfg-msi.c | 5 +-
drivers/vfio/iommufd.c | 27 ++
drivers/vfio/pci/vfio_pci_intrs.c | 46 +++
drivers/vfio/vfio_main.c | 3 +
tools/testing/selftests/iommu/iommufd.c | 53 +++
.../selftests/iommu/iommufd_fail_nth.c | 14 +
27 files changed, 712 insertions(+), 283 deletions(-)
--
2.43.0
Hello,
My name is David Song, at AA4 FS, we are a consultancy and
brokerage Firm specializing in Growth Financial Loan and joint
partnership venture. We specialize in investments in all Private
and public sectors in a broad range of areas within our Financial
Investment Services.
We are experts in financial and operational management, due
diligence and capital planning in all markets and industries. Our
Investors wish to invest in any viable Project presented by your
Management after reviews on your Business Project Presentation
Plan.
We look forward to your Swift response. We also offer commission
to consultants and brokers for any partnership referrals.
Regards,
David Song
Senior Broker
AA4 Financial Services
13 Wonersh Way, Cheam,
Sutton, Surrey, SM2 7LX
Email: dsong(a)aa4financialservice.com
┌────────────┐ ┌───────────────────────────────────┐ ┌────────────────┐
│ │ │ │ │ │
│ │ │ PCI Endpoint │ │ PCI Host │
│ │ │ │ │ │
│ │◄──┤ 1.platform_msi_domain_alloc_irqs()│ │ │
│ │ │ │ │ │
│ MSI ├──►│ 2.write_msi_msg() ├──►├─BAR<n> │
│ Controller │ │ update doorbell register address│ │ │
│ │ │ for BAR │ │ │
│ │ │ │ │ 3. Write BAR<n>│
│ │◄──┼───────────────────────────────────┼───┤ │
│ │ │ │ │ │
│ ├──►│ 4.Irq Handle │ │ │
│ │ │ │ │ │
│ │ │ │ │ │
└────────────┘ └───────────────────────────────────┘ └────────────────┘
This patches based on old https://lore.kernel.org/imx/20221124055036.1630573-1-Frank.Li@nxp.com/
Original patch only target to vntb driver. But actually it is common
method.
This patches add new API to pci-epf-core, so any EP driver can use it.
Previous v2 discussion here.
https://lore.kernel.org/imx/20230911220920.1817033-1-Frank.Li@nxp.com/
Changes in v14:
Marc Zyngier raised concerns about adding DOMAIN_BUS_DEVICE_PCI_EP_MSI. As
a result, the approach has been reverted to the v9 method. However, there
are several improvements:
MSI now supports msi-map in addition to msi-parent.
- The struct device: id is used as the endpoint function (EPF) device
identity to map to the stream ID (sideband information).
- The EPC device tree source (DTS) utilizes msi-map to provide such
information.
- The EPF device's of_node is set to the EPC controller’s node. This
approach is commonly used for multi-function device (MFD) platform child
devices, allowing them to inherit properties from the MFD device’s DTS,
such as reset-cells and gpio-cells. This method is well-suited for the
current case, as the EPF is inherently created/binded to the EPC and
should inherit the EPC’s DTS node properties.
Additionally:
Since the basic IMX95 LUT support has already been merged into the
mainline, a DTS and driver increment patch is added to complete the
solution. The patch is rebased onto the latest linux-next tree and
aligned with the new pcitest framework.
- Link to v13: https://lore.kernel.org/r/20241218-ep-msi-v13-0-646e2192dc24@nxp.com
Changes in v13:
- Change to use DOMAIN_BUS_PCI_DEVICE_EP_MSI
- Change request id as func | vfunc << 3
- Remove IRQ_DOMAIN_MSI_IMMUTABLE
Thomas Gleixner:
I hope capture all your points in review comments. If missed, let me know.
- Link to v12: https://lore.kernel.org/r/20241211-ep-msi-v12-0-33d4532fa520@nxp.com
Changes in v12:
- Change to use IRQ_DOMAIN_MSI_IMMUTABLE and add help function
irq_domain_msi_is_immuatble().
- split PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check to 3 patches
- Link to v11: https://lore.kernel.org/r/20241209-ep-msi-v11-0-7434fa8397bd@nxp.com
Changes in v11:
- Change to use MSI_FLAG_MSG_IMMUTABLE
- Link to v10: https://lore.kernel.org/r/20241204-ep-msi-v10-0-87c378dbcd6d@nxp.com
Changes in v10:
Thomas Gleixner:
There are big change in pci-ep-msi.c. I am sure if go on the
corrent path. The key improvement is remove only 1 function devices's
limitation.
I use new patch for imutable check, which relative additional
feature compared to base enablement patch.
- Remove patch Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
- Add new patch irqchip/gic-v3-its: Avoid overwriting msi_prepare callback if provided by msi_domain_info
- Remove only support 1 endpoint function limiation.
- Create one MSI domain for each endpoint function devices.
- Use "msi-map" in pci ep controler node, instead of of msi-parent. first
argument is
(func_no << 8 | vfunc_no)
- Link to v9: https://lore.kernel.org/r/20241203-ep-msi-v9-0-a60dbc3f15dd@nxp.com
Changes in v9
- Add patch platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
- Remove patch PCI: endpoint: Add pci_epc_get_fn() API for customizable filtering
- Remove API pci_epf_align_inbound_addr_lo_hi
- Move doorbell_alloc in to doorbell_enable function.
- Link to v8: https://lore.kernel.org/r/20241116-ep-msi-v8-0-6f1f68ffd1bb@nxp.com
Changes in v8:
- update helper function name to pci_epf_align_inbound_addr()
- Link to v7: https://lore.kernel.org/r/20241114-ep-msi-v7-0-d4ac7aafbd2c@nxp.com
Changes in v7:
- Add helper function pci_epf_align_addr();
- Link to v6: https://lore.kernel.org/r/20241112-ep-msi-v6-0-45f9722e3c2a@nxp.com
Changes in v6:
- change doorbell_addr to doorbell_offset
- use round_down()
- add Niklas's test by tag
- rebase to pci/endpoint
- Link to v5: https://lore.kernel.org/r/20241108-ep-msi-v5-0-a14951c0d007@nxp.com
Changes in v5:
- Move request_irq to epf test function driver for more flexiable user case
- Add fixed size bar handler
- Some minor improvememtn to see each patches's changelog.
- Link to v4: https://lore.kernel.org/r/20241031-ep-msi-v4-0-717da2d99b28@nxp.com
Changes in v4:
- Remove patch genirq/msi: Add cleanup guard define for msi_lock_descs()/msi_unlock_descs()
- Use new method to avoid compatible problem.
Add new command DOORBELL_ENABLE and DOORBELL_DISABLE.
pcitest -B send DOORBELL_ENABLE first, EP test function driver try to
remap one of BAR_N (except test register bar) to ITS MSI MMIO space. Old
driver don't support new command, so failure return, not side effect.
After test, DOORBELL_DISABLE command send out to recover original map, so
pcitest bar test can pass as normal.
- Other detail change see each patches's change log
- Link to v3: https://lore.kernel.org/r/20241015-ep-msi-v3-0-cedc89a16c1a@nxp.com
Change from v2 to v3
- Fixed manivannan's comments
- Move common part to pci-ep-msi.c and pci-ep-msi.h
- rebase to 6.12-rc1
- use RevID to distingiush old version
mkdir /sys/kernel/config/pci_ep/functions/pci_epf_test/func1
echo 16 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/msi_interrupts
echo 0x080c > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/deviceid
echo 0x1957 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/vendorid
echo 1 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/revid
^^^^^^ to enable platform msi support.
ln -s /sys/kernel/config/pci_ep/functions/pci_epf_test/func1 /sys/kernel/config/pci_ep/controllers/4c380000.pcie-ep
- use new device ID, which identify support doorbell to avoid broken
compatility.
Enable doorbell support only for PCI_DEVICE_ID_IMX8_DB, while other devices
keep the same behavior as before.
EP side RC with old driver RC with new driver
PCI_DEVICE_ID_IMX8_DB no probe doorbell enabled
Other device ID doorbell disabled* doorbell disabled*
* Behavior remains unchanged.
Change from v1 to v2
- Add missed patch for endpont/pci-epf-test.c
- Move alloc and free to epc driver from epf.
- Provide general help function for EPC driver to alloc platform msi irq.
- Fixed manivannan's comments.
Signed-off-by: Frank Li <Frank.Li(a)nxp.com>
---
Frank Li (15):
platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
irqdomain: Add IRQ_DOMAIN_FLAG_MSI_IMMUTABLE and irq_domain_is_msi_immutable()
irqchip/gic-v3-its: Set IRQ_DOMAIN_FLAG_MSI_IMMUTABLE for ITS
irqchip/gic-v3-its: Add support for device tree msi-map and msi-mask
PCI: endpoint: Set ID and of_node for function driver
PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller
PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check
PCI: endpoint: Add pci_epf_align_inbound_addr() helper for address alignment
PCI: endpoint: pci-epf-test: Add doorbell test support
misc: pci_endpoint_test: Add doorbell test case
selftests: pci_endpoint: Add doorbell test case
pci: imx6: Add helper function imx_pcie_add_lut_by_rid()
pci: imx6: Add LUT setting for MSI/IOMMU in Endpoint mode
arm64: dts: imx95: Add msi-map for pci-ep device
arm64: dts: imx95-19x19-evk: Add PCIe1 endpoint function overlay file
arch/arm64/boot/dts/freescale/Makefile | 3 +
.../dts/freescale/imx95-19x19-evk-pcie1-ep.dtso | 21 ++++
arch/arm64/boot/dts/freescale/imx95.dtsi | 1 +
drivers/base/platform-msi.c | 1 +
drivers/irqchip/irq-gic-v3-its-msi-parent.c | 8 ++
drivers/irqchip/irq-gic-v3-its.c | 2 +-
drivers/misc/pci_endpoint_test.c | 81 +++++++++++++
drivers/pci/controller/dwc/pci-imx6.c | 25 ++--
drivers/pci/endpoint/Makefile | 2 +-
drivers/pci/endpoint/functions/pci-epf-test.c | 132 +++++++++++++++++++++
drivers/pci/endpoint/pci-ep-msi.c | 89 ++++++++++++++
drivers/pci/endpoint/pci-epf-core.c | 48 ++++++++
include/linux/irqdomain.h | 7 ++
include/linux/pci-ep-msi.h | 26 ++++
include/linux/pci-epf.h | 21 ++++
include/uapi/linux/pcitest.h | 1 +
.../selftests/pci_endpoint/pci_endpoint_test.c | 25 ++++
17 files changed, 483 insertions(+), 10 deletions(-)
---
base-commit: 00f3246adeeacbda0bd0b303604e46eb59c32e6e
change-id: 20241010-ep-msi-8b4cab33b1be
Best regards,
---
Frank Li <Frank.Li(a)nxp.com>
Greetings
Welcome to v5. No functional changes; removed an unused variable from
patch 2.
This is an attempt to followup on something Jakub asked me about [1],
adding an xsk attribute to queues and more clearly documenting which
queues are linked to NAPIs...
After the RFC [2], Jakub suggested creating an empty nest for queues
which have a pool, so I've adjusted this version to work that way.
The nest can be extended in the future to express attributes about XSK
as needed. Queues which are not used for AF_XDP do not have the xsk
attribute present.
I've run the included test on:
- my mlx5 machine (via NETIF=)
- without setting NETIF
And the test seems to pass in both cases.
Thanks,
Joe
[1]: https://lore.kernel.org/netdev/20250113143109.60afa59a@kernel.org/
[2]: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/
v5:
- Removed unused ret variable from patch 2 as Simon suggested.
v4: https://lore.kernel.org/lkml/20250207030916.32751-1-jdamato@fastly.com/
- Add patch 1, as suggested by Jakub, which adds an empty nest helper.
- Use the helper in patch 2, which makes the code cleaner and prevents
a possible bug.
v3: https://lore.kernel.org/netdev/20250204191108.161046-1-jdamato@fastly.com/
- Change comment format in patch 2 to avoid kdoc warnings. No other
changes.
v2: https://lore.kernel.org/all/20250203185828.19334-1-jdamato@fastly.com/
- Switched from RFC to actual submission now that net-next is open
- Adjusted patch 1 to include an empty nest as suggested by Jakub
- Adjusted patch 2 to update the test based on changes to patch 1, and
to incorporate some Python feedback from Jakub :)
rfc: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/
Joe Damato (3):
netlink: Add nla_put_empty_nest helper
netdev-genl: Add an XSK attribute to queues
selftests: drv-net: Test queue xsk attribute
Documentation/netlink/specs/netdev.yaml | 13 ++-
include/net/netlink.h | 15 ++++
include/uapi/linux/netdev.h | 6 ++
net/core/netdev-genl.c | 11 +++
tools/include/uapi/linux/netdev.h | 6 ++
.../testing/selftests/drivers/net/.gitignore | 2 +
tools/testing/selftests/drivers/net/Makefile | 3 +
tools/testing/selftests/drivers/net/queues.py | 35 +++++++-
.../selftests/drivers/net/xdp_helper.c | 89 +++++++++++++++++++
9 files changed, 177 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/.gitignore
create mode 100644 tools/testing/selftests/drivers/net/xdp_helper.c
base-commit: 233a2b1480a0bdf6b40d4debf58a07084e9921ff
--
2.43.0
Those two scripts were used by test_flow_dissector.sh to setup/cleanup
the network topology before/after the tests. test_flow_dissector.sh
have been deleted by commit 63b37657c5fd ("selftests/bpf: remove
test_flow_dissector.sh") so they aren't used anywhere now.
Remove the two unused scripts and their Makefile entries.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
---
tools/testing/selftests/bpf/Makefile | 4 +--
tools/testing/selftests/bpf/with_addr.sh | 54 -----------------------------
tools/testing/selftests/bpf/with_tunnels.sh | 36 -------------------
3 files changed, 2 insertions(+), 92 deletions(-)
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 6722080b2107a56c1500bab4d7b638f5bcde2a82..e9dfd96fa48c98a736c7b39356757b1b3680220a 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -117,8 +117,8 @@ TEST_PROGS := test_kmod.sh \
test_xsk.sh \
test_xdp_features.sh
-TEST_PROGS_EXTENDED := with_addr.sh \
- with_tunnels.sh ima_setup.sh verify_sig_setup.sh \
+TEST_PROGS_EXTENDED := \
+ ima_setup.sh verify_sig_setup.sh \
test_xdp_vlan.sh test_bpftool.py
TEST_KMODS := bpf_testmod.ko bpf_test_no_cfi.ko bpf_test_modorder_x.ko \
diff --git a/tools/testing/selftests/bpf/with_addr.sh b/tools/testing/selftests/bpf/with_addr.sh
deleted file mode 100755
index ffcd3953f94cf2301f0e5158d141ccb48b2fb1c0..0000000000000000000000000000000000000000
--- a/tools/testing/selftests/bpf/with_addr.sh
+++ /dev/null
@@ -1,54 +0,0 @@
-#!/bin/bash
-# SPDX-License-Identifier: GPL-2.0
-#
-# add private ipv4 and ipv6 addresses to loopback
-
-readonly V6_INNER='100::a/128'
-readonly V4_INNER='192.168.0.1/32'
-
-if getopts ":s" opt; then
- readonly SIT_DEV_NAME='sixtofourtest0'
- readonly V6_SIT='2::/64'
- readonly V4_SIT='172.17.0.1/32'
- shift
-fi
-
-fail() {
- echo "error: $*" 1>&2
- exit 1
-}
-
-setup() {
- ip -6 addr add "${V6_INNER}" dev lo || fail 'failed to setup v6 address'
- ip -4 addr add "${V4_INNER}" dev lo || fail 'failed to setup v4 address'
-
- if [[ -n "${V6_SIT}" ]]; then
- ip link add "${SIT_DEV_NAME}" type sit remote any local any \
- || fail 'failed to add sit'
- ip link set dev "${SIT_DEV_NAME}" up \
- || fail 'failed to bring sit device up'
- ip -6 addr add "${V6_SIT}" dev "${SIT_DEV_NAME}" \
- || fail 'failed to setup v6 SIT address'
- ip -4 addr add "${V4_SIT}" dev "${SIT_DEV_NAME}" \
- || fail 'failed to setup v4 SIT address'
- fi
-
- sleep 2 # avoid race causing bind to fail
-}
-
-cleanup() {
- if [[ -n "${V6_SIT}" ]]; then
- ip -4 addr del "${V4_SIT}" dev "${SIT_DEV_NAME}"
- ip -6 addr del "${V6_SIT}" dev "${SIT_DEV_NAME}"
- ip link del "${SIT_DEV_NAME}"
- fi
-
- ip -4 addr del "${V4_INNER}" dev lo
- ip -6 addr del "${V6_INNER}" dev lo
-}
-
-trap cleanup EXIT
-
-setup
-"$@"
-exit "$?"
diff --git a/tools/testing/selftests/bpf/with_tunnels.sh b/tools/testing/selftests/bpf/with_tunnels.sh
deleted file mode 100755
index e24949ed3a20813e34bd69bdd634b0f32b7a26e9..0000000000000000000000000000000000000000
--- a/tools/testing/selftests/bpf/with_tunnels.sh
+++ /dev/null
@@ -1,36 +0,0 @@
-#!/bin/bash
-# SPDX-License-Identifier: GPL-2.0
-#
-# setup tunnels for flow dissection test
-
-readonly SUFFIX="test_$(mktemp -u XXXX)"
-CONFIG="remote 127.0.0.2 local 127.0.0.1 dev lo"
-
-setup() {
- ip link add "ipip_${SUFFIX}" type ipip ${CONFIG}
- ip link add "gre_${SUFFIX}" type gre ${CONFIG}
- ip link add "sit_${SUFFIX}" type sit ${CONFIG}
-
- echo "tunnels before test:"
- ip tunnel show
-
- ip link set "ipip_${SUFFIX}" up
- ip link set "gre_${SUFFIX}" up
- ip link set "sit_${SUFFIX}" up
-}
-
-
-cleanup() {
- ip tunnel del "ipip_${SUFFIX}"
- ip tunnel del "gre_${SUFFIX}"
- ip tunnel del "sit_${SUFFIX}"
-
- echo "tunnels after test:"
- ip tunnel show
-}
-
-trap cleanup EXIT
-
-setup
-"$@"
-exit "$?"
---
base-commit: ab4839cd3b4d65bffd574035a02568809c54a5b6
change-id: 20250130-with-388b72afcbc0
Best regards,
--
Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
generic_map_lookup_batch
The generic_map_lookup_batch currently returns EINTR if it fails with
ENOENT and retries several times on bpf_map_copy_value. The next batch
would start from the same location, presuming it's a transient issue.
This is incorrect if a map can actually have "holes", i.e.
"get_next_key" can return a key that does not point to a valid value. At
least the array of maps type may contain such holes legitly. Right now
these holes show up, generic batch lookup cannot proceed any more. It
will always fail with EINTR errors.
This patch fixes this behavior by skipping the non-existing key, and
does not return EINTR any more.
V1->V2: split the fix and selftests; fixed a few selftests issues.
V1: https://lore.kernel.org/bpf/Z6OYbS4WqQnmzi2z@debian.debian/
Yan Zhai (2):
bpf: skip non exist keys in generic_map_lookup_batch
selftests: bpf: test batch lookup on array of maps with holes
kernel/bpf/syscall.c | 16 ++---
.../bpf/map_tests/map_in_map_batch_ops.c | 62 +++++++++++++------
2 files changed, 49 insertions(+), 29 deletions(-)
--
2.39.5
Kent, to continue our discussion from last November, I've gone through
more parts of the eytzinger code and as a result, here are some patches
for you to consider.
What I've not looked at are the eytzinger_to_inorder and
inorder_to_eytzinger functions, as well as the implementation of sort.
Those functions could use a bit more documentation, but the code iself
looks reasonable.
Shuah, I've also had a quick look at converting the tests into kernel
selftests, but that hasn't gone very far because of the lack of support
for basic functions like __fls(), __ffs(), ffz(), and
rounddown_pow_of_two() in selftests. Are there any plans for making
those kinds of primitives generally available to selftests?
Thanks,
Andreas
Andreas Gruenbacher (21):
bcachefs: remove dead code in is_aligned
bcachefs: bch2_blacklist_entries_gc cleanup
bcachefs: Run the eytzinger tests on modprobe
bcachefs: EYTZINGER_DEBUG fix
bcachefs: eytzinger self tests: eytzinger0_for_each loop cleanups
bcachefs: eytzinger self tests: missing newline termination
bcachefs: eytzinger self tests: fix cmp_u16 typo
bcachefs: eytzinger[01]_test improvement
bcachefs: eytzinger0_find_test improvement
bcachefs: add eytzinger0_for_each_prev
bcachefs: improve the eytzinger0_find_le tests
bcachefs: convert eytzinger0_find to be 1-based
bcachefs: convert eytzinger0_find_le to be 1-based
bcachefs: simplify eytzinger0_find_le
bcachefs: add eytzinger0_find_gt tests
bcachefs: implement eytzinger0_find_gt directly
bcachefs: add eytzinger0_find_ge tests
bcachefs: implement eytzinger0_find_ge directly
bcachefs: convert eytzinger sort to be 1-based (1)
bcachefs: convert eytzinger sort to be 1-based (2)
bcachefs: eytzinger1_{next,prev} cleanup
fs/bcachefs/eytzinger.c | 89 +++++++-------
fs/bcachefs/eytzinger.h | 99 +++++++--------
fs/bcachefs/journal_seq_blacklist.c | 7 +-
fs/bcachefs/super.c | 5 +
fs/bcachefs/util.c | 183 +++++++++++++++++++++-------
fs/bcachefs/util.h | 4 +
6 files changed, 240 insertions(+), 147 deletions(-)
--
2.48.1
The 2024 architecture release includes a number of data processing
extensions, mostly SVE and SME additions with a few others. These are
all very straightforward extensions which add instructions but no
architectural state so only need hwcaps and exposing of the ID registers
to KVM guests and userspace.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v5:
- Rebase onto arm64/for-next/cpufeature, which incorporates most of the
sysreg updates from earlier versions.
- Remove SF8MM8 and SF8MM4 register defintitions which were removed from
the ISA in the 2024-12 XML release, along with their associated hwcaps.
- Incorporate Marc's fix for SVE hwcaps on SME only systems and update
the hwcaps for the newly added features to follow the same pattern.
- Link to v4: https://lore.kernel.org/r/20241211-arm64-2024-dpisa-v4-0-0fd403876df2@kerne…
Changes in v4:
- Fix encodings for ID_AA64ISAR3_EL1.
- Link to v3: https://lore.kernel.org/r/20241203-arm64-2024-dpisa-v3-0-a6c78b1aa297@kerne…
Changes in v3:
- Commit log update for the hwcap test.
- Link to v2: https://lore.kernel.org/r/20241030-arm64-2024-dpisa-v2-0-b6601a15d2a5@kerne…
Changes in v2:
- Filter KVM guest visible bitfields in ID_AA64ISAR3_EL1 to only those
we make writeable.
- Link to v1: https://lore.kernel.org/r/20241028-arm64-2024-dpisa-v1-0-a38d08b008a8@kerne…
---
Marc Zyngier (1):
arm64: Filter out SVE hwcaps when FEAT_SVE isn't implemented
Mark Brown (4):
arm64/sysreg: Update ID_AA64SMFR0_EL1 to DDI0601 2024-12
arm64/hwcap: Describe 2024 dpISA extensions to userspace
KVM: arm64: Allow control of dpISA extensions in ID_AA64ISAR3_EL1
kselftest/arm64: Add 2024 dpISA extensions to hwcap test
Documentation/arch/arm64/elf_hwcaps.rst | 89 +++++++++--
arch/arm64/include/asm/hwcap.h | 15 ++
arch/arm64/include/uapi/asm/hwcap.h | 15 ++
arch/arm64/kernel/cpufeature.c | 71 +++++++--
arch/arm64/kernel/cpuinfo.c | 15 ++
arch/arm64/kvm/sys_regs.c | 6 +-
arch/arm64/tools/sysreg | 26 +++-
tools/testing/selftests/arm64/abi/hwcap.c | 235 +++++++++++++++++++++++++++++-
8 files changed, 441 insertions(+), 31 deletions(-)
---
base-commit: d66e21d59ed0e043e68ef8c6541c1e9f1a962614
change-id: 20241008-arm64-2024-dpisa-8091074a7f48
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Instead of inlining equivalents, use lib.sh-provided primitives.
Use defer to manage vx lifetime.
This will make it easier to extend the test in the next patch.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Ido Schimmel <idosch(a)nvidia.com>
---
Notes:
CC: Simon Horman <horms(a)kernel.org>
CC: Shuah Khan <shuah(a)kernel.org>
CC: linux-kselftest(a)vger.kernel.org
.../net/test_vxlan_fdb_changelink.sh | 39 ++++++++++++-------
1 file changed, 24 insertions(+), 15 deletions(-)
diff --git a/tools/testing/selftests/net/test_vxlan_fdb_changelink.sh b/tools/testing/selftests/net/test_vxlan_fdb_changelink.sh
index 2d442cdab11e..6f2bca4b346c 100755
--- a/tools/testing/selftests/net/test_vxlan_fdb_changelink.sh
+++ b/tools/testing/selftests/net/test_vxlan_fdb_changelink.sh
@@ -1,29 +1,38 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
-# Check FDB default-remote handling across "ip link set".
+ALL_TESTS="
+ test_set_remote
+"
+source lib.sh
check_remotes()
{
local what=$1; shift
local N=$(bridge fdb sh dev vx | grep 00:00:00:00:00:00 | wc -l)
- echo -ne "expected two remotes after $what\t"
- if [[ $N != 2 ]]; then
- echo "[FAIL]"
- EXIT_STATUS=1
- else
- echo "[ OK ]"
- fi
+ ((N == 2))
+ check_err $? "expected 2 remotes after $what, got $N"
}
-ip link add name vx up type vxlan id 2000 dstport 4789
-bridge fdb ap dev vx 00:00:00:00:00:00 dst 192.0.2.20 self permanent
-bridge fdb ap dev vx 00:00:00:00:00:00 dst 192.0.2.30 self permanent
-check_remotes "fdb append"
+# Check FDB default-remote handling across "ip link set".
+test_set_remote()
+{
+ RET=0
-ip link set dev vx type vxlan remote 192.0.2.30
-check_remotes "link set"
+ ip_link_add vx up type vxlan id 2000 dstport 4789
+ bridge fdb ap dev vx 00:00:00:00:00:00 dst 192.0.2.20 self permanent
+ bridge fdb ap dev vx 00:00:00:00:00:00 dst 192.0.2.30 self permanent
+ check_remotes "fdb append"
+
+ ip link set dev vx type vxlan remote 192.0.2.30
+ check_remotes "link set"
+
+ log_test 'FDB default-remote handling across "ip link set"'
+}
+
+trap defer_scopes_cleanup EXIT
+
+tests_run
-ip link del dev vx
exit $EXIT_STATUS
--
2.47.0
This helper could be useful to more than just forwarding tests.
Move it upstairs and port over to log_test_skip().
Split the function into two parts: the bit that actually checks and
reports skip, which is in a new function check_command(). And a bit
that exits the test script if the check fails. This allows users
consistent checking behavior while giving an option to bail out from
a single test without bailing out of the whole script.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Ido Schimmel <idosch(a)nvidia.com>
---
Notes:
CC: Simon Horman <horms(a)kernel.org>
CC: Shuah Khan <shuah(a)kernel.org>
CC: linux-kselftest(a)vger.kernel.org
tools/testing/selftests/net/forwarding/lib.sh | 10 ----------
tools/testing/selftests/net/lib.sh | 19 +++++++++++++++++++
2 files changed, 19 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 8de80acf249e..508f3c700d71 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -291,16 +291,6 @@ if [[ "$CHECK_TC" = "yes" ]]; then
check_tc_version
fi
-require_command()
-{
- local cmd=$1; shift
-
- if [[ ! -x "$(command -v "$cmd")" ]]; then
- echo "SKIP: $cmd not installed"
- exit $ksft_skip
- fi
-}
-
# IPv6 support was added in v3.0
check_mtools_version()
{
diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index 0bd9a038a1f0..975be4fdbcdb 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -450,6 +450,25 @@ kill_process()
{ kill $pid && wait $pid; } 2>/dev/null
}
+check_command()
+{
+ local cmd=$1; shift
+
+ if [[ ! -x "$(command -v "$cmd")" ]]; then
+ log_test_skip "$cmd not installed"
+ return $EXIT_STATUS
+ fi
+}
+
+require_command()
+{
+ local cmd=$1; shift
+
+ if ! check_command "$cmd"; then
+ exit $EXIT_STATUS
+ fi
+}
+
ip_link_add()
{
local name=$1; shift
--
2.47.0
This patch fixes a grammatical error in a test log message in
reuseaddr_ports_exhausted.c for better clarity as a part of lfx
application tasks
Signed-off-by: Pranav Tyagi <pranav.tyagi03(a)gmail.com>
---
tools/testing/selftests/net/reuseaddr_ports_exhausted.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/reuseaddr_ports_exhausted.c b/tools/testing/selftests/net/reuseaddr_ports_exhausted.c
index 066efd30e294..7b9bf8a7bbe1 100644
--- a/tools/testing/selftests/net/reuseaddr_ports_exhausted.c
+++ b/tools/testing/selftests/net/reuseaddr_ports_exhausted.c
@@ -112,7 +112,7 @@ TEST(reuseaddr_ports_exhausted_reusable_same_euid)
ASSERT_NE(-1, fd[0]) TH_LOG("failed to bind.");
if (opts->reuseport[0] && opts->reuseport[1]) {
- EXPECT_EQ(-1, fd[1]) TH_LOG("should fail to bind because both sockets succeed to be listened.");
+ EXPECT_EQ(-1, fd[1]) TH_LOG("should fail to bind because both sockets successfully listened.");
} else {
EXPECT_NE(-1, fd[1]) TH_LOG("should succeed to bind to connect to different destinations.");
}
--
2.47.1
Hi all,
This patchset adds a new buddy allocator like (or non-uniform) large folio
split to reduce the total number of after-split folios, the amount of memory
needed for multi-index xarray split, and keep more large folios after a split.
It is on top of mm-everything-2025-02-01-05-58. It is ready to be merged.
Instead of duplicating existing split_huge_page*() code, __folio_split()
is introduced as the shared backend code for both
split_huge_page_to_list_to_order() and folio_split(). __folio_split()
can support both uniform split and buddy allocator like (or non-uniform) split.
All existing split_huge_page*() users can be gradually converted to use
folio_split() if possible. In this patchset, I converted
truncate_inode_partial_folio() to use folio_split().
xfstests quick group passed for both tmpfs and xfs.
Changelog
===
From V5[7]:
1. Split shmem to any lower order patches are in mm tree, so dropped
from this series.
2. Rename split_folio_at() to try_folio_split() to clarify that
non-uniform split will not be used if it is not supported.
From V4[6]:
1. Enabled shmem support in both uniform and buddy allocator like split
and added selftests for it.
2. Added functions to check if uniform split and buddy allocator like
split are supported for the given folio and order.
3. Made truncate fall back to uniform split if buddy allocator split is
not supported (CONFIG_READ_ONLY_THP_FOR_FS and FS without large folio).
4. Added the missing folio_clear_has_hwpoisoned() to
__split_unmapped_folio().
From V3[5]:
1. Used xas_split_alloc(GFP_NOWAIT) instead of xas_nomem(), since extra
operations inside xas_split_alloc() are needed for correctness.
2. Enabled folio_split() for shmem and no issue was found with xfstests
quick test group.
3. Split both ends of a truncate range in truncate_inode_partial_folio()
to avoid wasting memory in shmem truncate (per David Hildenbrand).
4. Removed page_in_folio_offset() since page_folio() does the same
thing.
5. Finished truncate related tests from xfstests quick test group on XFS and
tmpfs without issues.
6. Disabled buddy allocator like split on CONFIG_READ_ONLY_THP_FOR_FS
and FS without large folio. This check was missed in the prior
versions.
From V2[3]:
1. Incorporated all the feedback from Kirill[4].
2. Used GFP_NOWAIT for xas_nomem().
3. Tested the code path when xas_nomem() fails.
4. Added selftests for folio_split().
5. Fixed no THP config build error.
From V1[2]:
1. Split the original patch 1 into multiple ones for easy review (per
Kirill).
2. Added xas_destroy() to avoid memory leak.
3. Fixed nr_dropped not used error (per kernel test robot).
4. Added proper error handling when xas_nomem() fails to allocate memory
for xas_split() during buddy allocator like split.
From RFC[1]:
1. Merged backend code of split_huge_page_to_list_to_order() and
folio_split(). The same code is used for both uniform split and buddy
allocator like split.
2. Use xas_nomem() instead of xas_split_alloc() for folio_split().
3. folio_split() now leaves the first after-split folio unlocked,
instead of the one containing the given page, since
the caller of truncate_inode_partial_folio() locks and unlocks the
first folio.
4. Extended split_huge_page debugfs to use folio_split().
5. Added truncate_inode_partial_folio() as first user of folio_split().
Design
===
folio_split() splits a large folio in the same way as buddy allocator
splits a large free page for allocation. The purpose is to minimize the
number of folios after the split. For example, if user wants to free the
3rd subpage in a order-9 folio, folio_split() will split the order-9 folio
as:
O-0, O-0, O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-8 if it is anon
O-1, O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-9 if it is pagecache
Since anon folio does not support order-1 yet.
The split process is similar to existing approach:
1. Unmap all page mappings (split PMD mappings if exist);
2. Split meta data like memcg, page owner, page alloc tag;
3. Copy meta data in struct folio to sub pages, but instead of spliting
the whole folio into multiple smaller ones with the same order in a
shot, this approach splits the folio iteratively. Taking the example
above, this approach first splits the original order-9 into two order-8,
then splits left part of order-8 to two order-7 and so on;
4. Post-process split folios, like write mapping->i_pages for pagecache,
adjust folio refcounts, add split folios to corresponding list;
5. Remap split folios
6. Unlock split folios.
__split_unmapped_folio() and __split_folio_to_order() replace
__split_huge_page() and __split_huge_page_tail() respectively.
__split_unmapped_folio() uses different approaches to perform
uniform split and buddy allocator like split:
1. uniform split: one single call to __split_folio_to_order() is used to
uniformly split the given folio. All resulting folios are put back to
the list after split. The folio containing the given page is left to
caller to unlock and others are unlocked.
2. buddy allocator like (or non-uniform) split: (old_order - new_order) calls
to __split_folio_to_order() are used to split the given folio at order N to
order N-1. After each call, the target folio is changed to the one
containing the page, which is given as a folio_split() parameter.
After each call, folios not containing the page are put back to the list.
The folio containing the page is put back to the list when its order
is new_order. All folios are unlocked except the first folio, which
is left to caller to unlock.
Patch Overview
===
1. Patch 1 added __split_unmapped_folio() and __split_folio_to_order() to
prepare for moving to new backend split code.
2. Patch 2 moved common code in split_huge_page_to_list_to_order() to
__folio_split().
3. Patch 3 added new folio_split() and made
split_huge_page_to_list_to_order() share the new
__split_unmapped_folio() with folio_split().
4. Patch 4 removed no longer used __split_huge_page() and
__split_huge_page_tail().
5. Patch 5 added a new in_folio_offset to split_huge_page debugfs for
folio_split() test.
6. Patch 6 used try_folio_split() for truncate operation.
7. Patch 7 added folio_split() tests.
Any comments and/or suggestions are welcome. Thanks.
[1] https://lore.kernel.org/linux-mm/20241008223748.555845-1-ziy@nvidia.com/
[2] https://lore.kernel.org/linux-mm/20241028180932.1319265-1-ziy@nvidia.com/
[3] https://lore.kernel.org/linux-mm/20241101150357.1752726-1-ziy@nvidia.com/
[4] https://lore.kernel.org/linux-mm/e6ppwz5t4p4kvir6eqzoto4y5fmdjdxdyvxvtw43nc…
[5] https://lore.kernel.org/linux-mm/20241205001839.2582020-1-ziy@nvidia.com/
[6] https://lore.kernel.org/linux-mm/20250106165513.104899-1-ziy@nvidia.com/
[7] https://lore.kernel.org/linux-mm/20250116211042.741543-1-ziy@nvidia.com/
Zi Yan (7):
mm/huge_memory: add two new (not yet used) functions for folio_split()
mm/huge_memory: move folio split common code to __folio_split()
mm/huge_memory: add buddy allocator like folio_split()
mm/huge_memory: remove the old, unused __split_huge_page()
mm/huge_memory: add folio_split() to debugfs testing interface.
mm/truncate: use buddy allocator like folio split for truncate
operation.
selftests/mm: add tests for folio_split(), buddy allocator like split.
include/linux/huge_mm.h | 36 +
mm/huge_memory.c | 749 ++++++++++++------
mm/truncate.c | 31 +-
.../selftests/mm/split_huge_page_test.c | 34 +-
4 files changed, 582 insertions(+), 268 deletions(-)
--
2.47.2
Notable changes since v17:
* fixed netdevice_tracker pointer assignment in netlink post_doit
(triggered by kernel test robot on m86k)
* renamed nla_get_uint() to ovpn_nla_get_uint() in ovpn-cli.c to avoid
clashes with libnl-3.11.0
FTR, here are the notable changes since v16:
* fixed usage of netdev tracker by removing dev_tracker member from
ovpn_priv and adding it to ovpn_peer and ovpn_socket as those are the
objects really holding a ref to the netdev
* switched ovpn_get_dev_from_attrs() to GFP_ATOMIC to prevent sleep under
rcu_read_lock
* allocated netdevice_tracker in ovpn_nl_pre_doit() [stored in
user_ptr[1]] to keep track of the netdev reference held during netlink
handler calls
* moved whole socket detaching routine to worker. This way the code is
allowed to sleep and in turn it can be executed under lock_sock. This
lock allows us to happily coordinate concurrent attach/detach calls.
(note: lock is acquired everytime the refcnt for the socket is
decremented, because this guarantees us that setting the refcnt to 0
and detaching the socket will happen atomically)
* dropped kref_put_sock()/refcount handler as it's not required anymore,
thanks to the point above
* re-arranged ovpn_socket_new() in order to simplify error path by first
allocating the new ovpn_sock and then attaching
Please note that some patches were already reviewed/tested by a few
people. iThese patches have retained the tags as they have hardly been
touched.
The latest code can also be found at:
https://github.com/OpenVPN/linux-kernel-ovpn
Thanks a lot!
Best Regards,
Antonio Quartulli
OpenVPN Inc.
---
Antonio Quartulli (25):
net: introduce OpenVPN Data Channel Offload (ovpn)
ovpn: add basic netlink support
ovpn: add basic interface creation/destruction/management routines
ovpn: keep carrier always on for MP interfaces
ovpn: introduce the ovpn_peer object
ovpn: introduce the ovpn_socket object
ovpn: implement basic TX path (UDP)
ovpn: implement basic RX path (UDP)
ovpn: implement packet processing
ovpn: store tunnel and transport statistics
ipv6: export inet6_stream_ops via EXPORT_SYMBOL_GPL
ovpn: implement TCP transport
skb: implement skb_send_sock_locked_with_flags()
ovpn: add support for MSG_NOSIGNAL in tcp_sendmsg
ovpn: implement multi-peer support
ovpn: implement peer lookup logic
ovpn: implement keepalive mechanism
ovpn: add support for updating local UDP endpoint
ovpn: add support for peer floating
ovpn: implement peer add/get/dump/delete via netlink
ovpn: implement key add/get/del/swap via netlink
ovpn: kill key and notify userspace in case of IV exhaustion
ovpn: notify userspace when a peer is deleted
ovpn: add basic ethtool support
testing/selftests: add test tool and scripts for ovpn module
Documentation/netlink/specs/ovpn.yaml | 372 +++
Documentation/netlink/specs/rt_link.yaml | 16 +
MAINTAINERS | 11 +
drivers/net/Kconfig | 15 +
drivers/net/Makefile | 1 +
drivers/net/ovpn/Makefile | 22 +
drivers/net/ovpn/bind.c | 55 +
drivers/net/ovpn/bind.h | 101 +
drivers/net/ovpn/crypto.c | 211 ++
drivers/net/ovpn/crypto.h | 145 ++
drivers/net/ovpn/crypto_aead.c | 382 ++++
drivers/net/ovpn/crypto_aead.h | 33 +
drivers/net/ovpn/io.c | 446 ++++
drivers/net/ovpn/io.h | 34 +
drivers/net/ovpn/main.c | 350 +++
drivers/net/ovpn/main.h | 14 +
drivers/net/ovpn/netlink-gen.c | 213 ++
drivers/net/ovpn/netlink-gen.h | 41 +
drivers/net/ovpn/netlink.c | 1183 ++++++++++
drivers/net/ovpn/netlink.h | 18 +
drivers/net/ovpn/ovpnstruct.h | 54 +
drivers/net/ovpn/peer.c | 1269 +++++++++++
drivers/net/ovpn/peer.h | 164 ++
drivers/net/ovpn/pktid.c | 129 ++
drivers/net/ovpn/pktid.h | 87 +
drivers/net/ovpn/proto.h | 118 +
drivers/net/ovpn/skb.h | 60 +
drivers/net/ovpn/socket.c | 204 ++
drivers/net/ovpn/socket.h | 49 +
drivers/net/ovpn/stats.c | 21 +
drivers/net/ovpn/stats.h | 47 +
drivers/net/ovpn/tcp.c | 565 +++++
drivers/net/ovpn/tcp.h | 33 +
drivers/net/ovpn/udp.c | 421 ++++
drivers/net/ovpn/udp.h | 22 +
include/linux/skbuff.h | 2 +
include/uapi/linux/if_link.h | 15 +
include/uapi/linux/ovpn.h | 111 +
include/uapi/linux/udp.h | 1 +
net/core/skbuff.c | 18 +-
net/ipv6/af_inet6.c | 1 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/net/ovpn/.gitignore | 2 +
tools/testing/selftests/net/ovpn/Makefile | 17 +
tools/testing/selftests/net/ovpn/config | 10 +
tools/testing/selftests/net/ovpn/data64.key | 5 +
tools/testing/selftests/net/ovpn/ovpn-cli.c | 2367 ++++++++++++++++++++
tools/testing/selftests/net/ovpn/tcp_peers.txt | 5 +
.../testing/selftests/net/ovpn/test-chachapoly.sh | 9 +
tools/testing/selftests/net/ovpn/test-float.sh | 9 +
tools/testing/selftests/net/ovpn/test-tcp.sh | 9 +
tools/testing/selftests/net/ovpn/test.sh | 185 ++
tools/testing/selftests/net/ovpn/udp_peers.txt | 5 +
53 files changed, 9673 insertions(+), 5 deletions(-)
---
base-commit: 7d0da8f862340c5f42f0062b8560b8d0971a6ac4
change-id: 20241002-b4-ovpn-eeee35c694a2
Best regards,
--
Antonio Quartulli <antonio(a)openvpn.net>
The mac address on backup slave should be convert from Solicited-Node
Multicast address, not from bonding unicast target address.
v2: fix patch 01's subject
Hangbin Liu (2):
bonding: fix incorrect MAC address setting to receive NS messages
selftests: bonding: fix incorrect mac address
drivers/net/bonding/bond_options.c | 4 +++-
tools/testing/selftests/drivers/net/bonding/bond_options.sh | 4 ++--
2 files changed, 5 insertions(+), 3 deletions(-)
--
2.46.0
Use ASSERT_OK_FD to check the return value of join cgroup,
or else this test will pass even if the fd < 0. ASSERT_OK_FD
can print the error message to the console.
Link: https://lore.kernel.org/all/6d62bd77-6733-40c7-b240-a1aeff55566c@linux.dev/
Suggested-by: Martin KaFai Lau <martin.lau(a)kernel.org>
Signed-off-by: Jason Xing <kerneljasonxing(a)gmail.com>
---
tools/testing/selftests/bpf/prog_tests/setget_sockopt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/setget_sockopt.c b/tools/testing/selftests/bpf/prog_tests/setget_sockopt.c
index e12255121c15..e4dac529d424 100644
--- a/tools/testing/selftests/bpf/prog_tests/setget_sockopt.c
+++ b/tools/testing/selftests/bpf/prog_tests/setget_sockopt.c
@@ -202,7 +202,7 @@ static void test_nonstandard_opt(int family)
void test_setget_sockopt(void)
{
cg_fd = test__join_cgroup(CG_NAME);
- if (cg_fd < 0)
+ if (!ASSERT_OK_FD(cg_fd, "join cgroup"))
return;
if (create_netns())
--
2.43.5
Greetings:
Welcome to v4. Small functional change, which makes the code cleaner
(see changelog) and tests pass on my machine with mlx5 and netdevsim.
This is an attempt to followup on something Jakub asked me about [1],
adding an xsk attribute to queues and more clearly documenting which
queues are linked to NAPIs...
After the RFC [2], Jakub suggested creating an empty nest for queues
which have a pool, so I've adjusted this version to work that way.
The nest can be extended in the future to express attributes about XSK
as needed. Queues which are not used for AF_XDP do not have the xsk
attribute present.
I've run the included test on:
- my mlx5 machine (via NETIF=)
- without setting NETIF
And the test seems to pass in both cases.
Thanks,
Joe
[1]: https://lore.kernel.org/netdev/20250113143109.60afa59a@kernel.org/
[2]: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/
v4:
- Add patch 1, as suggested by Jakub, which adds an empty nest helper.
- Use the helper in patch 2, which makes the code cleaner and prevents
a possible bug.
v3: https://lore.kernel.org/netdev/20250204191108.161046-1-jdamato@fastly.com/
- Change comment format in patch 2 to avoid kdoc warnings. No other
changes.
v2: https://lore.kernel.org/all/20250203185828.19334-1-jdamato@fastly.com/
- Switched from RFC to actual submission now that net-next is open
- Adjusted patch 1 to include an empty nest as suggested by Jakub
- Adjusted patch 2 to update the test based on changes to patch 1, and
to incorporate some Python feedback from Jakub :)
rfc: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/
Joe Damato (3):
netlink: Add nla_put_empty_nest helper
netdev-genl: Add an XSK attribute to queues
selftests: drv-net: Test queue xsk attribute
Documentation/netlink/specs/netdev.yaml | 13 ++-
include/net/netlink.h | 15 ++++
include/uapi/linux/netdev.h | 6 ++
net/core/netdev-genl.c | 12 +++
tools/include/uapi/linux/netdev.h | 6 ++
.../testing/selftests/drivers/net/.gitignore | 2 +
tools/testing/selftests/drivers/net/Makefile | 3 +
tools/testing/selftests/drivers/net/queues.py | 35 +++++++-
.../selftests/drivers/net/xdp_helper.c | 89 +++++++++++++++++++
9 files changed, 178 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/.gitignore
create mode 100644 tools/testing/selftests/drivers/net/xdp_helper.c
base-commit: f3eba8edd885db439f4bfaa2cf9d766bad1ae6c5
--
2.43.0
The kernel has recently added support for shadow stacks, currently
x86 only using their CET feature but both arm64 and RISC-V have
equivalent features (GCS and Zicfiss respectively), I am actively
working on GCS[1]. With shadow stacks the hardware maintains an
additional stack containing only the return addresses for branch
instructions which is not generally writeable by userspace and ensures
that any returns are to the recorded addresses. This provides some
protection against ROP attacks and making it easier to collect call
stacks. These shadow stacks are allocated in the address space of the
userspace process.
Our API for shadow stacks does not currently offer userspace any
flexiblity for managing the allocation of shadow stacks for newly
created threads, instead the kernel allocates a new shadow stack with
the same size as the normal stack whenever a thread is created with the
feature enabled. The stacks allocated in this way are freed by the
kernel when the thread exits or shadow stacks are disabled for the
thread. This lack of flexibility and control isn't ideal, in the vast
majority of cases the shadow stack will be over allocated and the
implicit allocation and deallocation is not consistent with other
interfaces. As far as I can tell the interface is done in this manner
mainly because the shadow stack patches were in development since before
clone3() was implemented.
Since clone3() is readily extensible let's add support for specifying a
shadow stack when creating a new thread or process, keeping the current
implicit allocation behaviour if one is not specified either with
clone3() or through the use of clone(). The user must provide a shadow
stack pointer, this must point to memory mapped for use as a shadow
stackby map_shadow_stack() with an architecture specified shadow stack
token at the top of the stack.
Please note that the x86 portions of this code are build tested only, I
don't appear to have a system that can run CET available to me.
[1] https://lore.kernel.org/linux-arm-kernel/20241001-arm64-gcs-v13-0-222b78d87…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v14:
- Rebase onto v6.14-rc1.
- Link to v13: https://lore.kernel.org/r/20241203-clone3-shadow-stack-v13-0-93b89a81a5ed@k…
Changes in v13:
- Rebase onto v6.13-rc1.
- Link to v12: https://lore.kernel.org/r/20241031-clone3-shadow-stack-v12-0-7183eb8bee17@k…
Changes in v12:
- Add the regular prctl() to the userspace API document since arm64
support is queued in -next.
- Link to v11: https://lore.kernel.org/r/20241005-clone3-shadow-stack-v11-0-2a6a2bd6d651@k…
Changes in v11:
- Rebase onto arm64 for-next/gcs, which is based on v6.12-rc1, and
integrate arm64 support.
- Rework the interface to specify a shadow stack pointer rather than a
base and size like we do for the regular stack.
- Link to v10: https://lore.kernel.org/r/20240821-clone3-shadow-stack-v10-0-06e8797b9445@k…
Changes in v10:
- Integrate fixes & improvements for the x86 implementation from Rick
Edgecombe.
- Require that the shadow stack be VM_WRITE.
- Require that the shadow stack base and size be sizeof(void *) aligned.
- Clean up trailing newline.
- Link to v9: https://lore.kernel.org/r/20240819-clone3-shadow-stack-v9-0-962d74f99464@ke…
Changes in v9:
- Pull token validation earlier and report problems with an error return
to parent rather than signal delivery to the child.
- Verify that the top of the supplied shadow stack is VM_SHADOW_STACK.
- Rework token validation to only do the page mapping once.
- Drop no longer needed support for testing for signals in selftest.
- Fix typo in comments.
- Link to v8: https://lore.kernel.org/r/20240808-clone3-shadow-stack-v8-0-0acf37caf14c@ke…
Changes in v8:
- Fix token verification with user specified shadow stack.
- Don't track user managed shadow stacks for child processes.
- Link to v7: https://lore.kernel.org/r/20240731-clone3-shadow-stack-v7-0-a9532eebfb1d@ke…
Changes in v7:
- Rebase onto v6.11-rc1.
- Typo fixes.
- Link to v6: https://lore.kernel.org/r/20240623-clone3-shadow-stack-v6-0-9ee7783b1fb9@ke…
Changes in v6:
- Rebase onto v6.10-rc3.
- Ensure we don't try to free the parent shadow stack in error paths of
x86 arch code.
- Spelling fixes in userspace API document.
- Additional cleanups and improvements to the clone3() tests to support
the shadow stack tests.
- Link to v5: https://lore.kernel.org/r/20240203-clone3-shadow-stack-v5-0-322c69598e4b@ke…
Changes in v5:
- Rebase onto v6.8-rc2.
- Rework ABI to have the user allocate the shadow stack memory with
map_shadow_stack() and a token.
- Force inlining of the x86 shadow stack enablement.
- Move shadow stack enablement out into a shared header for reuse by
other tests.
- Link to v4: https://lore.kernel.org/r/20231128-clone3-shadow-stack-v4-0-8b28ffe4f676@ke…
Changes in v4:
- Formatting changes.
- Use a define for minimum shadow stack size and move some basic
validation to fork.c.
- Link to v3: https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2acc@ke…
Changes in v3:
- Rebase onto v6.7-rc2.
- Remove stale shadow_stack in internal kargs.
- If a shadow stack is specified unconditionally use it regardless of
CLONE_ parameters.
- Force enable shadow stacks in the selftest.
- Update changelogs for RISC-V feature rename.
- Link to v2: https://lore.kernel.org/r/20231114-clone3-shadow-stack-v2-0-b613f8681155@ke…
Changes in v2:
- Rebase onto v6.7-rc1.
- Remove ability to provide preallocated shadow stack, just specify the
desired size.
- Link to v1: https://lore.kernel.org/r/20231023-clone3-shadow-stack-v1-0-d867d0b5d4d0@ke…
---
Mark Brown (8):
arm64/gcs: Return a success value from gcs_alloc_thread_stack()
Documentation: userspace-api: Add shadow stack API documentation
selftests: Provide helper header for shadow stack testing
fork: Add shadow stack support to clone3()
selftests/clone3: Remove redundant flushes of output streams
selftests/clone3: Factor more of main loop into test_clone3()
selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
selftests/clone3: Test shadow stack support
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/shadow_stack.rst | 44 +++++
arch/arm64/include/asm/gcs.h | 8 +-
arch/arm64/kernel/process.c | 8 +-
arch/arm64/mm/gcs.c | 62 +++++-
arch/x86/include/asm/shstk.h | 11 +-
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/shstk.c | 57 +++++-
include/asm-generic/cacheflush.h | 11 ++
include/linux/sched/task.h | 17 ++
include/uapi/linux/sched.h | 10 +-
kernel/fork.c | 96 +++++++--
tools/testing/selftests/clone3/clone3.c | 226 ++++++++++++++++++----
tools/testing/selftests/clone3/clone3_selftests.h | 65 ++++++-
tools/testing/selftests/ksft_shstk.h | 98 ++++++++++
15 files changed, 635 insertions(+), 81 deletions(-)
---
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
change-id: 20231019-clone3-shadow-stack-15d40d2bf536
Best regards,
--
Mark Brown <broonie(a)kernel.org>
This series expands the XDP TX metadata framework to allow user
applications to pass per packet 64-bit launch time directly to the kernel
driver, requesting launch time hardware offload support. The XDP TX
metadata framework will not perform any clock conversion or packet
reordering.
Please note that the role of Tx metadata is just to pass the launch time,
not to enable the offload feature. Users will need to enable the launch
time hardware offload feature of the device by using the respective
command, such as the tc-etf command.
Although some devices use the tc-etf command to enable their launch time
hardware offload feature, xsk packets will not go through the etf qdisc.
Therefore, in my opinion, the launch time should always be based on the PTP
Hardware Clock (PHC). Thus, i did not include a clock ID to indicate the
clock source.
To simplify the test steps, I modified the xdp_hw_metadata bpf self-test
tool in such a way that it will set the launch time based on the offset
provided by the user and the value of the Receive Hardware Timestamp, which
is against the PHC. This will eliminate the need to discipline System Clock
with the PHC and then use clock_gettime() to get the time.
Please note that AF_XDP lacks a feedback mechanism to inform the
application if the requested launch time is invalid. So, users are expected
to familiar with the horizon of the launch time of the device they use and
not request a launch time that is beyond the horizon. Otherwise, the driver
might interpret the launch time incorrectly and react wrongly. For stmmac
and igc, where modulo computation is used, a launch time larger than the
horizon will cause the device to transmit the packet earlier that the
requested launch time.
Although there is no feedback mechanism for the launch time request
for now, user still can check whether the requested launch time is
working or not, by requesting the Transmit Completion Hardware Timestamp.
V9:
- Remove the igc_desc_unused() checking (Maciej)
- Ensure that skb allocation and DMA mapping work before proceeding to
fill in igc_tx_buffer info, context desc, and data desc (Maciej)
- Rate limit the error messages (Maciej)
- Update the comment to indicate that the 2 descriptors needed by the
empty frame are already taken into consideration (Maciej)
- Handle the case where the insertion of an empty frame fails and
explain the reason behind (Maciej)
- put self SOB tag as last tag (Maciej)
V8: https://lore.kernel.org/netdev/20250205024116.798862-1-yoong.siang.song@int…
- check the number of used descriptor in xsk_tx_metadata_request()
by using used_desc of struct igc_metadata_request, and then decreases
the budget with it (Maciej)
- submit another bug fix patch to set the buffer type for empty frame (Maciej):
https://lore.kernel.org/netdev/20250205023603.798819-1-yoong.siang.song@int…
V7: https://lore.kernel.org/netdev/20250204004907.789330-1-yoong.siang.song@int…
- split the refactoring code of igc empty packet insertion into a separate
commit (Faizal)
- add explanation on why the value "4" is used as igc transmit budget
(Faizal)
- perform a stress test by sending 1000 packets with 10ms interval and
launch time set to 500us in the future (Faizal & Yong Liang)
V6: https://lore.kernel.org/netdev/20250116155350.555374-1-yoong.siang.song@int…
- fix selftest build errors by using asprintf() and realloc(), instead of
managing the buffer sizes manually (Daniel, Stanislav)
V5: https://lore.kernel.org/netdev/20250114152718.120588-1-yoong.siang.song@int…
- change netdev feature name from tx-launch-time to tx-launch-time-fifo
to explicitly state the FIFO behaviour (Stanislav)
- improve the looping of xdp_hw_metadata app to wait for packet tx
completion to be more readable by using clock_gettime() (Stanislav)
- add launch time setup steps into xdp_hw_metadata app (Stanislav)
V4: https://lore.kernel.org/netdev/20250106135506.9687-1-yoong.siang.song@intel…
- added XDP launch time support to the igc driver (Jesper & Florian)
- added per-driver launch time limitation on xsk-tx-metadata.rst (Jesper)
- added explanation on FIFO behavior on xsk-tx-metadata.rst (Jakub)
- added step to enable launch time in the commit message (Jesper & Willem)
- explicitly documented the type of launch_time and which clock source
it is against (Willem)
V3: https://lore.kernel.org/netdev/20231203165129.1740512-1-yoong.siang.song@in…
- renamed to use launch time (Jesper & Willem)
- changed the default launch time in xdp_hw_metadata apps from 1s to 0.1s
because some NICs do not support such a large future time.
V2: https://lore.kernel.org/netdev/20231201062421.1074768-1-yoong.siang.song@in…
- renamed to use Earliest TxTime First (Willem)
- renamed to use txtime (Willem)
V1: https://lore.kernel.org/netdev/20231130162028.852006-1-yoong.siang.song@int…
Song Yoong Siang (5):
xsk: Add launch time hardware offload support to XDP Tx metadata
selftests/bpf: Add launch time request to xdp_hw_metadata
net: stmmac: Add launch time support to XDP ZC
igc: Refactor empty frame insertion for launch time support
igc: Add launch time support to XDP ZC
Documentation/netlink/specs/netdev.yaml | 4 +
Documentation/networking/xsk-tx-metadata.rst | 62 +++++++
drivers/net/ethernet/intel/igc/igc.h | 1 +
drivers/net/ethernet/intel/igc/igc_main.c | 141 +++++++++++----
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 +
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 13 ++
include/net/xdp_sock.h | 10 ++
include/net/xdp_sock_drv.h | 1 +
include/uapi/linux/if_xdp.h | 10 ++
include/uapi/linux/netdev.h | 3 +
net/core/netdev-genl.c | 2 +
net/xdp/xsk.c | 3 +
tools/include/uapi/linux/if_xdp.h | 10 ++
tools/include/uapi/linux/netdev.h | 3 +
tools/testing/selftests/bpf/xdp_hw_metadata.c | 168 +++++++++++++++++-
15 files changed, 394 insertions(+), 39 deletions(-)
--
2.34.1
When I implemented virtio's hash-related features to tun/tap [1],
I found tun/tap does not fill the entire region reserved for the virtio
header, leaving some uninitialized hole in the middle of the buffer
after read()/recvmesg().
This series fills the uninitialized hole. More concretely, the
num_buffers field will be initialized with 1, and the other fields will
be inialized with 0. Setting the num_buffers field to 1 is mandated by
virtio 1.0 [2].
The change to virtio header is preceded by another change that refactors
tun and tap to unify their virtio-related code.
[1]: https://lore.kernel.org/r/20241008-rss-v5-0-f3cf68df005d@daynix.com
[2]: https://lore.kernel.org/r/20241227084256-mutt-send-email-mst@kernel.org/
Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com>
---
Changes in v5:
- s/vnet_hdr_len_sz/vnet_hdr_sz/ for patch "tun: Decouple vnet handling"
(Willem de Bruijn)
- Changed to inline vnet implementations to TUN and TAP.
- Dropped patch "tun: Avoid double-tracking iov_iter length changes" and
"tap: Avoid double-tracking iov_iter length changes".
- Link to v4: https://lore.kernel.org/r/20250120-tun-v4-0-ee81dda03d7f@daynix.com
Changes in v4:
- s/sz/vnet_hdr_len_sz/ for patch "tun: Decouple vnet handling"
(Willem de Bruijn)
- Reverted to add CONFIG_TUN_VNET.
- Link to v3: https://lore.kernel.org/r/20250116-tun-v3-0-c6b2871e97f7@daynix.com
Changes in v3:
- Dropped changes to fill the vnet header.
- Splitted patch "tun: Unify vnet implementation".
- Reverted spurious changes in patch "tun: Unify vnet implementation".
- Merged tun_vnet.c into TAP.
- Link to v2: https://lore.kernel.org/r/20250109-tun-v2-0-388d7d5a287a@daynix.com
Changes in v2:
- Fixed num_buffers endian.
- Link to v1: https://lore.kernel.org/r/20250108-tun-v1-0-67d784b34374@daynix.com
---
Akihiko Odaki (7):
tun: Refactor CONFIG_TUN_VNET_CROSS_LE
tun: Keep hdr_len in tun_get_user()
tun: Decouple vnet from tun_struct
tun: Decouple vnet handling
tun: Extract the vnet handling code
tap: Keep hdr_len in tap_get_user()
tap: Use tun's vnet-related code
MAINTAINERS | 2 +-
drivers/net/tap.c | 168 ++++++------------------------------------
drivers/net/tun.c | 193 ++++++-------------------------------------------
drivers/net/tun_vnet.h | 184 ++++++++++++++++++++++++++++++++++++++++++++++
4 files changed, 231 insertions(+), 316 deletions(-)
---
base-commit: a32e14f8aef69b42826cf0998b068a43d486a9e9
change-id: 20241230-tun-66e10a49b0c7
Best regards,
--
Akihiko Odaki <akihiko.odaki(a)daynix.com>
The current implementation of netconsole sends all log messages in
parallel, which can lead to an intermixed and interleaved output on the
receiving side. This makes it challenging to demultiplex the messages
and attribute them to their originating CPUs.
As a result, users and developers often struggle to effectively analyze
and debug the parallel log output received through netconsole.
Example of a message got from produciton hosts:
------------[ cut here ]------------
------------[ cut here ]------------
refcount_t: saturated; leaking memory.
WARNING: CPU: 2 PID: 1613668 at lib/refcount.c:22 refcount_warn_saturate+0x5e/0xe0
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 26 PID: 4139916 at lib/refcount.c:25 refcount_warn_saturate+0x7d/0xe0
Modules linked in: bpf_preload(E) vhost_net(E) tun(E) vhost(E)
This series of patches introduces a new feature to the netconsole
subsystem that allows the automatic population of the CPU number in the
userdata field for each log message. This enhancement provides several
benefits:
* Improved demultiplexing of parallel log output: When multiple CPUs are
sending messages concurrently, the added CPU number in the userdata
makes it easier to differentiate and attribute the messages to their
originating CPUs.
* Better visibility into message sources: The CPU number information
gives users and developers more insight into which specific CPU a
particular log message came from, which can be valuable for debugging
and analysis.
The changes in this series are as follows Patches::
Patch "consolidate send buffers into netconsole_target struct"
=================================================
Move the static buffers to netconsole target, from static declaration
in send_msg_no_fragmentation() and send_msg_fragmented().
Patch "netconsole: Rename userdata to extradata"
=================================================
Create the a concept of extradata, which encompasses the concept of
userdata and the upcoming sysdatao
Sysdata is a new concept being added, which is basically fields that are
populated by the kernel. At this time only the CPU#, but, there is a
desire to add current task name, kernel release version, etc.
Patch "netconsole: Helper to count number of used entries"
===========================================================
Create a simple helper to count number of entries in extradata. I am
separating this in a function since it will need to count userdata and
sysdata. For instance, when the user adds an extra userdata, we need to
check if there is space, counting the previous data entries (from
userdata and cpu data)
Patch "Introduce configfs helpers for sysdata features"
======================================================
Create the concept of sysdata feature in the netconsole target, and
create the configfs helpers to enable the bit in nt->sysdata
Patch "Include sysdata in extradata entry count"
================================================
Add the concept of sysdata when counting for available space in the
buffer. This will protect users from creating new userdata/sysdata if
there is no more space
Patch "netconsole: add support for sysdata and CPU population"
===============================================================
This is the core patch. Basically add a new option to enable automatic
CPU number population in the netconsole userdata Provides a new "cpu_nr"
sysfs attribute to control this feature
Patch "netconsole: selftest: test CPU number auto-population"
=============================================================
Expands the existing netconsole selftest to verify the CPU number
auto-population functionality Ensures the received netconsole messages
contain the expected "cpu=<CPU>" entry in the message. Test different
permutation with userdata
Patch "netconsole: docs: Add documentation for CPU number auto-population"
=============================================================================
Updates the netconsole documentation to explain the new CPU number
auto-population feature Provides instructions on how to enable and use
the feature
I believe these changes will be a valuable addition to the netconsole
subsystem, enhancing its usefulness for kernel developers and users.
PS: This patchset is on top of the patch that created
netcons_fragmented_msg selftest:
https://lore.kernel.org/all/20250203-netcons_frag_msgs-v1-1-5bc6bedf2ac0@de…
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes in v4:
- Fixed Kernel doc for netconsole_target (Simon)
- Fixed a typo in disable_sysdata_feature (Simon)
- Improved sysdata_cpu_nr_show() to return !! in a bit-wise operation
- Link to v3: https://lore.kernel.org/r/20250124-netcon_cpu-v3-0-12a0d286ba1d@debian.org
Changes in v3:
- Moved the buffer into netconsole_target, avoiding static functions in
the send path (Jakub).
- Fix a documentation error (Randy Dunlap)
- Created a function that handle all the extradata, consolidating it in
a single place (Jakub)
- Split the patch even more, trying to simplify the review.
- Link to v2: https://lore.kernel.org/r/20250115-netcon_cpu-v2-0-95971b44dc56@debian.org
Changes in v2:
- Create the concept of extradata and sysdata. This will make the design
easier to understand, and the code easier to read.
* Basically extradata encompasses userdata and the new sysdata.
Userdata originates from user, and sysdata originates in kernel.
- Improved the test to send from a very specific CPU, which can be
checked to be correct on the other side, as suggested by Jakub.
- Fixed a bug where CPU # was populated at the wrong place
- Link to v1: https://lore.kernel.org/r/20241113-netcon_cpu-v1-0-d187bf7c0321@debian.org
---
Breno Leitao (8):
netconsole: consolidate send buffers into netconsole_target struct
netconsole: Rename userdata to extradata
netconsole: Helper to count number of used entries
netconsole: Introduce configfs helpers for sysdata features
netconsole: Include sysdata in extradata entry count
netconsole: add support for sysdata and CPU population
netconsole: selftest: test for sysdata CPU
netconsole: docs: Add documentation for CPU number auto-population
Documentation/networking/netconsole.rst | 45 ++++
drivers/net/netconsole.c | 261 ++++++++++++++++-----
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 17 ++
.../selftests/drivers/net/netcons_sysdata.sh | 167 +++++++++++++
5 files changed, 427 insertions(+), 64 deletions(-)
---
base-commit: 145702ca2544bf82c6b244a49163f603fbe746f4
change-id: 20241108-netcon_cpu-ce3917e88f4b
Best regards,
--
Breno Leitao <leitao(a)debian.org>
Support for 32-bit s390 is very easy to implement and useful for
testing. For example I used to test some generic compat_ptr() logic,
which is only testable on 32-bit s390.
The series depends on my other series
"selftests/nolibc: test kernel configuration cleanups".
(It's not a hard dependency, only a minor diff conflict)
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Changes in v2:
- Rebase unto nolibc-next
- Use 96 bytes of stack frame size
- Pick up Ack from Willy
- Link to v1: https://lore.kernel.org/r/20250122-nolibc-s390-v1-0-8c765f00e871@weissschuh…
---
Thomas Weißschuh (2):
selftests/nolibc: rename s390 to s390x
tools/nolibc: add support for 32-bit s390
tools/include/nolibc/arch-s390.h | 5 +++++
tools/include/nolibc/arch.h | 2 +-
tools/testing/selftests/nolibc/Makefile | 10 ++++++++--
tools/testing/selftests/nolibc/run-tests.sh | 7 ++++++-
4 files changed, 20 insertions(+), 4 deletions(-)
---
base-commit: c1f4a7a84037249d086a4114c0c4332a260e9091
change-id: 20250122-nolibc-s390-e57141682c88
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
This series expands the XDP TX metadata framework to allow user
applications to pass per packet 64-bit launch time directly to the kernel
driver, requesting launch time hardware offload support. The XDP TX
metadata framework will not perform any clock conversion or packet
reordering.
Please note that the role of Tx metadata is just to pass the launch time,
not to enable the offload feature. Users will need to enable the launch
time hardware offload feature of the device by using the respective
command, such as the tc-etf command.
Although some devices use the tc-etf command to enable their launch time
hardware offload feature, xsk packets will not go through the etf qdisc.
Therefore, in my opinion, the launch time should always be based on the PTP
Hardware Clock (PHC). Thus, i did not include a clock ID to indicate the
clock source.
To simplify the test steps, I modified the xdp_hw_metadata bpf self-test
tool in such a way that it will set the launch time based on the offset
provided by the user and the value of the Receive Hardware Timestamp, which
is against the PHC. This will eliminate the need to discipline System Clock
with the PHC and then use clock_gettime() to get the time.
Please note that AF_XDP lacks a feedback mechanism to inform the
application if the requested launch time is invalid. So, users are expected
to familiar with the horizon of the launch time of the device they use and
not request a launch time that is beyond the horizon. Otherwise, the driver
might interpret the launch time incorrectly and react wrongly. For stmmac
and igc, where modulo computation is used, a launch time larger than the
horizon will cause the device to transmit the packet earlier that the
requested launch time.
Although there is no feedback mechanism for the launch time request
for now, user still can check whether the requested launch time is
working or not, by requesting the Transmit Completion Hardware Timestamp.
V8:
- check the number of used descriptor in xsk_tx_metadata_request()
by using used_desc of struct igc_metadata_request, and then decreases
the budget with it (Maciej)
V7: https://lore.kernel.org/netdev/20250204004907.789330-1-yoong.siang.song@int…
- split the refactoring code of igc empty packet insertion into a separate
commit (Faizal)
- add explanation on why the value "4" is used as igc transmit budget
(Faizal)
- perform a stress test by sending 1000 packets with 10ms interval and
launch time set to 500us in the future (Faizal & Yong Liang)
V6: https://lore.kernel.org/netdev/20250116155350.555374-1-yoong.siang.song@int…
- fix selftest build errors by using asprintf() and realloc(), instead of
managing the buffer sizes manually (Daniel, Stanislav)
V5: https://lore.kernel.org/netdev/20250114152718.120588-1-yoong.siang.song@int…
- change netdev feature name from tx-launch-time to tx-launch-time-fifo
to explicitly state the FIFO behaviour (Stanislav)
- improve the looping of xdp_hw_metadata app to wait for packet tx
completion to be more readable by using clock_gettime() (Stanislav)
- add launch time setup steps into xdp_hw_metadata app (Stanislav)
V4: https://lore.kernel.org/netdev/20250106135506.9687-1-yoong.siang.song@intel…
- added XDP launch time support to the igc driver (Jesper & Florian)
- added per-driver launch time limitation on xsk-tx-metadata.rst (Jesper)
- added explanation on FIFO behavior on xsk-tx-metadata.rst (Jakub)
- added step to enable launch time in the commit message (Jesper & Willem)
- explicitly documented the type of launch_time and which clock source
it is against (Willem)
V3: https://lore.kernel.org/netdev/20231203165129.1740512-1-yoong.siang.song@in…
- renamed to use launch time (Jesper & Willem)
- changed the default launch time in xdp_hw_metadata apps from 1s to 0.1s
because some NICs do not support such a large future time.
V2: https://lore.kernel.org/netdev/20231201062421.1074768-1-yoong.siang.song@in…
- renamed to use Earliest TxTime First (Willem)
- renamed to use txtime (Willem)
V1: https://lore.kernel.org/netdev/20231130162028.852006-1-yoong.siang.song@int…
Song Yoong Siang (5):
xsk: Add launch time hardware offload support to XDP Tx metadata
selftests/bpf: Add launch time request to xdp_hw_metadata
net: stmmac: Add launch time support to XDP ZC
igc: Refactor empty packet insertion into a reusable function
igc: Add launch time support to XDP ZC
Documentation/netlink/specs/netdev.yaml | 4 +
Documentation/networking/xsk-tx-metadata.rst | 62 +++++++
drivers/net/ethernet/intel/igc/igc.h | 1 +
drivers/net/ethernet/intel/igc/igc_main.c | 94 +++++++---
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 +
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 13 ++
include/net/xdp_sock.h | 10 ++
include/net/xdp_sock_drv.h | 1 +
include/uapi/linux/if_xdp.h | 10 ++
include/uapi/linux/netdev.h | 3 +
net/core/netdev-genl.c | 2 +
net/xdp/xsk.c | 3 +
tools/include/uapi/linux/if_xdp.h | 10 ++
tools/include/uapi/linux/netdev.h | 3 +
tools/testing/selftests/bpf/xdp_hw_metadata.c | 168 +++++++++++++++++-
15 files changed, 359 insertions(+), 27 deletions(-)
--
2.34.1
The '-f' parameter is there to force the kernel to emit MPTCP FASTCLOSE
by closing the connection with unread bytes in the receive queue.
The xdisconnect() helper was used to stop the connection, but it does
more than that: it will shut it down, then wait before reconnecting to
the same address. This causes the mptcp_join's "fastclose test" to fail
all the time.
This failure is due to a recent change, with commit 218cc166321f
("selftests: mptcp: avoid spurious errors on disconnect"), but that went
unnoticed because the test is currently ignored. The recent modification
only shown an existing issue: xdisconnect() doesn't need to be used
here, only the shutdown() part is needed.
Fixes: 6bf41020b72b ("selftests: mptcp: update and extend fastclose test-cases")
Cc: stable(a)vger.kernel.org
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Notes:
- The failure was not clearly visible on NIPA, because the results for
the two impacted sub-tests are currently ignored (unstable). Still,
it looks important to fix that, as this will help when the tests will
be improved not to be unstable any more.
---
tools/testing/selftests/net/mptcp/mptcp_connect.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
index 414addef9a4514c489ecd09249143fe0ce2af649..d240d02fa443a1cd802f0e705ab36db5c22063a8 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
@@ -1302,7 +1302,7 @@ int main_loop(void)
return ret;
if (cfg_truncate > 0) {
- xdisconnect(fd);
+ shutdown(fd, SHUT_WR);
} else if (--cfg_repeat > 0) {
xdisconnect(fd);
---
base-commit: 4241a702e0d0c2ca9364cfac08dbf134264962de
change-id: 20250204-net-mptcp-sft-conn-f-d1c14274ba66
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
When compiling selftests files under tools/testing/selftests/pidfd/ ,
some compiling errors and warnings will pop out as the following.
CC pidfd_fdinfo_test
pidfd_fdinfo_test.c: In function ‘child_fdinfo_nspid_test’:
pidfd_fdinfo_test.c:231:13: warning: implicit declaration of function \
‘mount’ [-Wimplicit-function-declaration]
231 | r = mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, 0);
| ^~~~~
pidfd_fdinfo_test.c:231:36: error: ‘MS_REC’ undeclared \
(first use in this function)
231 | r = mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, 0);
| ^~~~~~
pidfd_fdinfo_test.c:231:36: note: each undeclared identifier is \
reported only once for each function it appears in
pidfd_fdinfo_test.c:231:45: error: ‘MS_PRIVATE’ undeclared \
(first use in this function); did you mean ‘MAP_PRIVATE’?
231 | r = mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, 0);
| ^~~~~~~~~~
| MAP_PRIVATE
pidfd_fdinfo_test.c:237:15: warning: implicit declaration of function \
‘umount2’; did you mean ‘SYS_umount2’? \
[-Wimplicit-function-declaration]
237 | (void)umount2("/proc", MNT_DETACH);
| ^~~~~~~
| SYS_umount2
pidfd_fdinfo_test.c:237:32: error: ‘MNT_DETACH’ undeclared \
(first use in this function)
237 | (void)umount2("/proc", MNT_DETACH);
| ^~~~~~~~~~
make: *** [../lib.mk:222: \
~/linux/tools/testing/selftests/pidfd/pidfd_fdinfo_test] Error 1
Solve these errors and warnings by including header file <sys/mount.h>.
Signed-off-by: I Hsin Cheng <richard120310(a)gmail.com>
---
tools/testing/selftests/pidfd/pidfd_fdinfo_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c b/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c
index f062a986e..f718aac75 100644
--- a/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c
+++ b/tools/testing/selftests/pidfd/pidfd_fdinfo_test.c
@@ -13,6 +13,7 @@
#include <syscall.h>
#include <sys/wait.h>
#include <sys/mman.h>
+#include <sys/mount.h>
#include "pidfd.h"
#include "../kselftest.h"
--
2.43.0
6.13-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
commit 336d02bc4c6bec5c3d933e5d470a94970f830957 upstream.
When porting librseq commit:
commit c7b45750fa85 ("Adapt to glibc __rseq_size feature detection")
from librseq to the kernel selftests, the following line was missed
at the end of rseq_init():
rseq_size = get_rseq_kernel_feature_size();
which effectively leaves rseq_size initialized to -1U when glibc does not
have rseq support. glibc supports rseq from version 2.35 onwards.
In a following librseq commit
commit c67d198627c2 ("Only set 'rseq_size' on first thread registration")
to mimic the libc behavior, a new approach is taken: don't set the
feature size in 'rseq_size' until at least one thread has successfully
registered. This allows using 'rseq_size' in fast-paths to test for both
registration status and available features. The caveat is that on libc
either all threads are registered or none are, while with bare librseq
it is the responsability of the user to register all threads using rseq.
This combines the changes from the following librseq git commits:
commit c7b45750fa85 ("Adapt to glibc __rseq_size feature detection")
commit c67d198627c2 ("Only set 'rseq_size' on first thread registration")
Fixes: a0cc649353bb ("selftests/rseq: Fix mm_cid test failure")
Reported-by: Raghavendra Rao Ananta <rananta(a)google.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Raghavendra Rao Ananta <rananta(a)google.com>
Cc: Shuah Khan <skhan(a)linuxfoundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: Carlos O'Donell <carlos(a)redhat.com>
Cc: Florian Weimer <fweimer(a)redhat.com>
Cc: Michael Jeanson <mjeanson(a)efficios.com>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
tools/testing/selftests/rseq/rseq.c | 32 +++++++++++++++++++++++++-------
tools/testing/selftests/rseq/rseq.h | 9 ++++++++-
2 files changed, 33 insertions(+), 8 deletions(-)
--- a/tools/testing/selftests/rseq/rseq.c
+++ b/tools/testing/selftests/rseq/rseq.c
@@ -61,7 +61,6 @@ unsigned int rseq_size = -1U;
unsigned int rseq_flags;
static int rseq_ownership;
-static int rseq_reg_success; /* At least one rseq registration has succeded. */
/* Allocate a large area for the TLS. */
#define RSEQ_THREAD_AREA_ALLOC_SIZE 1024
@@ -152,14 +151,27 @@ int rseq_register_current_thread(void)
}
rc = sys_rseq(&__rseq_abi, get_rseq_min_alloc_size(), 0, RSEQ_SIG);
if (rc) {
- if (RSEQ_READ_ONCE(rseq_reg_success)) {
+ /*
+ * After at least one thread has registered successfully
+ * (rseq_size > 0), the registration of other threads should
+ * never fail.
+ */
+ if (RSEQ_READ_ONCE(rseq_size) > 0) {
/* Incoherent success/failure within process. */
abort();
}
return -1;
}
assert(rseq_current_cpu_raw() >= 0);
- RSEQ_WRITE_ONCE(rseq_reg_success, 1);
+
+ /*
+ * The first thread to register sets the rseq_size to mimic the libc
+ * behavior.
+ */
+ if (RSEQ_READ_ONCE(rseq_size) == 0) {
+ RSEQ_WRITE_ONCE(rseq_size, get_rseq_kernel_feature_size());
+ }
+
return 0;
}
@@ -235,12 +247,18 @@ void rseq_init(void)
return;
}
rseq_ownership = 1;
- if (!rseq_available()) {
- rseq_size = 0;
- return;
- }
+
+ /* Calculate the offset of the rseq area from the thread pointer. */
rseq_offset = (void *)&__rseq_abi - rseq_thread_pointer();
+
+ /* rseq flags are deprecated, always set to 0. */
rseq_flags = 0;
+
+ /*
+ * Set the size to 0 until at least one thread registers to mimic the
+ * libc behavior.
+ */
+ rseq_size = 0;
}
static __attribute__((destructor))
--- a/tools/testing/selftests/rseq/rseq.h
+++ b/tools/testing/selftests/rseq/rseq.h
@@ -60,7 +60,14 @@
extern ptrdiff_t rseq_offset;
/*
- * Size of the registered rseq area. 0 if the registration was
+ * The rseq ABI is composed of extensible feature fields. The extensions
+ * are done by appending additional fields at the end of the structure.
+ * The rseq_size defines the size of the active feature set which can be
+ * used by the application for the current rseq registration. Features
+ * starting at offset >= rseq_size are inactive and should not be used.
+ *
+ * The rseq_size is the intersection between the available allocation
+ * size for the rseq area and the feature size supported by the kernel.
* unsuccessful.
*/
extern unsigned int rseq_size;
6.12-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
commit 336d02bc4c6bec5c3d933e5d470a94970f830957 upstream.
When porting librseq commit:
commit c7b45750fa85 ("Adapt to glibc __rseq_size feature detection")
from librseq to the kernel selftests, the following line was missed
at the end of rseq_init():
rseq_size = get_rseq_kernel_feature_size();
which effectively leaves rseq_size initialized to -1U when glibc does not
have rseq support. glibc supports rseq from version 2.35 onwards.
In a following librseq commit
commit c67d198627c2 ("Only set 'rseq_size' on first thread registration")
to mimic the libc behavior, a new approach is taken: don't set the
feature size in 'rseq_size' until at least one thread has successfully
registered. This allows using 'rseq_size' in fast-paths to test for both
registration status and available features. The caveat is that on libc
either all threads are registered or none are, while with bare librseq
it is the responsability of the user to register all threads using rseq.
This combines the changes from the following librseq git commits:
commit c7b45750fa85 ("Adapt to glibc __rseq_size feature detection")
commit c67d198627c2 ("Only set 'rseq_size' on first thread registration")
Fixes: a0cc649353bb ("selftests/rseq: Fix mm_cid test failure")
Reported-by: Raghavendra Rao Ananta <rananta(a)google.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Raghavendra Rao Ananta <rananta(a)google.com>
Cc: Shuah Khan <skhan(a)linuxfoundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: Carlos O'Donell <carlos(a)redhat.com>
Cc: Florian Weimer <fweimer(a)redhat.com>
Cc: Michael Jeanson <mjeanson(a)efficios.com>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
tools/testing/selftests/rseq/rseq.c | 32 +++++++++++++++++++++++++-------
tools/testing/selftests/rseq/rseq.h | 9 ++++++++-
2 files changed, 33 insertions(+), 8 deletions(-)
--- a/tools/testing/selftests/rseq/rseq.c
+++ b/tools/testing/selftests/rseq/rseq.c
@@ -61,7 +61,6 @@ unsigned int rseq_size = -1U;
unsigned int rseq_flags;
static int rseq_ownership;
-static int rseq_reg_success; /* At least one rseq registration has succeded. */
/* Allocate a large area for the TLS. */
#define RSEQ_THREAD_AREA_ALLOC_SIZE 1024
@@ -152,14 +151,27 @@ int rseq_register_current_thread(void)
}
rc = sys_rseq(&__rseq_abi, get_rseq_min_alloc_size(), 0, RSEQ_SIG);
if (rc) {
- if (RSEQ_READ_ONCE(rseq_reg_success)) {
+ /*
+ * After at least one thread has registered successfully
+ * (rseq_size > 0), the registration of other threads should
+ * never fail.
+ */
+ if (RSEQ_READ_ONCE(rseq_size) > 0) {
/* Incoherent success/failure within process. */
abort();
}
return -1;
}
assert(rseq_current_cpu_raw() >= 0);
- RSEQ_WRITE_ONCE(rseq_reg_success, 1);
+
+ /*
+ * The first thread to register sets the rseq_size to mimic the libc
+ * behavior.
+ */
+ if (RSEQ_READ_ONCE(rseq_size) == 0) {
+ RSEQ_WRITE_ONCE(rseq_size, get_rseq_kernel_feature_size());
+ }
+
return 0;
}
@@ -235,12 +247,18 @@ void rseq_init(void)
return;
}
rseq_ownership = 1;
- if (!rseq_available()) {
- rseq_size = 0;
- return;
- }
+
+ /* Calculate the offset of the rseq area from the thread pointer. */
rseq_offset = (void *)&__rseq_abi - rseq_thread_pointer();
+
+ /* rseq flags are deprecated, always set to 0. */
rseq_flags = 0;
+
+ /*
+ * Set the size to 0 until at least one thread registers to mimic the
+ * libc behavior.
+ */
+ rseq_size = 0;
}
static __attribute__((destructor))
--- a/tools/testing/selftests/rseq/rseq.h
+++ b/tools/testing/selftests/rseq/rseq.h
@@ -60,7 +60,14 @@
extern ptrdiff_t rseq_offset;
/*
- * Size of the registered rseq area. 0 if the registration was
+ * The rseq ABI is composed of extensible feature fields. The extensions
+ * are done by appending additional fields at the end of the structure.
+ * The rseq_size defines the size of the active feature set which can be
+ * used by the application for the current rseq registration. Features
+ * starting at offset >= rseq_size are inactive and should not be used.
+ *
+ * The rseq_size is the intersection between the available allocation
+ * size for the rseq area and the feature size supported by the kernel.
* unsuccessful.
*/
extern unsigned int rseq_size;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
commit 336d02bc4c6bec5c3d933e5d470a94970f830957 upstream.
When porting librseq commit:
commit c7b45750fa85 ("Adapt to glibc __rseq_size feature detection")
from librseq to the kernel selftests, the following line was missed
at the end of rseq_init():
rseq_size = get_rseq_kernel_feature_size();
which effectively leaves rseq_size initialized to -1U when glibc does not
have rseq support. glibc supports rseq from version 2.35 onwards.
In a following librseq commit
commit c67d198627c2 ("Only set 'rseq_size' on first thread registration")
to mimic the libc behavior, a new approach is taken: don't set the
feature size in 'rseq_size' until at least one thread has successfully
registered. This allows using 'rseq_size' in fast-paths to test for both
registration status and available features. The caveat is that on libc
either all threads are registered or none are, while with bare librseq
it is the responsability of the user to register all threads using rseq.
This combines the changes from the following librseq git commits:
commit c7b45750fa85 ("Adapt to glibc __rseq_size feature detection")
commit c67d198627c2 ("Only set 'rseq_size' on first thread registration")
Fixes: a0cc649353bb ("selftests/rseq: Fix mm_cid test failure")
Reported-by: Raghavendra Rao Ananta <rananta(a)google.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Raghavendra Rao Ananta <rananta(a)google.com>
Cc: Shuah Khan <skhan(a)linuxfoundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: Carlos O'Donell <carlos(a)redhat.com>
Cc: Florian Weimer <fweimer(a)redhat.com>
Cc: Michael Jeanson <mjeanson(a)efficios.com>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
tools/testing/selftests/rseq/rseq.c | 32 ++++++++++++++++++++++-------
tools/testing/selftests/rseq/rseq.h | 9 +++++++-
2 files changed, 33 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c
index 5b9772cdf265..f6156790c3b4 100644
--- a/tools/testing/selftests/rseq/rseq.c
+++ b/tools/testing/selftests/rseq/rseq.c
@@ -61,7 +61,6 @@ unsigned int rseq_size = -1U;
unsigned int rseq_flags;
static int rseq_ownership;
-static int rseq_reg_success; /* At least one rseq registration has succeded. */
/* Allocate a large area for the TLS. */
#define RSEQ_THREAD_AREA_ALLOC_SIZE 1024
@@ -152,14 +151,27 @@ int rseq_register_current_thread(void)
}
rc = sys_rseq(&__rseq_abi, get_rseq_min_alloc_size(), 0, RSEQ_SIG);
if (rc) {
- if (RSEQ_READ_ONCE(rseq_reg_success)) {
+ /*
+ * After at least one thread has registered successfully
+ * (rseq_size > 0), the registration of other threads should
+ * never fail.
+ */
+ if (RSEQ_READ_ONCE(rseq_size) > 0) {
/* Incoherent success/failure within process. */
abort();
}
return -1;
}
assert(rseq_current_cpu_raw() >= 0);
- RSEQ_WRITE_ONCE(rseq_reg_success, 1);
+
+ /*
+ * The first thread to register sets the rseq_size to mimic the libc
+ * behavior.
+ */
+ if (RSEQ_READ_ONCE(rseq_size) == 0) {
+ RSEQ_WRITE_ONCE(rseq_size, get_rseq_kernel_feature_size());
+ }
+
return 0;
}
@@ -235,12 +247,18 @@ void rseq_init(void)
return;
}
rseq_ownership = 1;
- if (!rseq_available()) {
- rseq_size = 0;
- return;
- }
+
+ /* Calculate the offset of the rseq area from the thread pointer. */
rseq_offset = (void *)&__rseq_abi - rseq_thread_pointer();
+
+ /* rseq flags are deprecated, always set to 0. */
rseq_flags = 0;
+
+ /*
+ * Set the size to 0 until at least one thread registers to mimic the
+ * libc behavior.
+ */
+ rseq_size = 0;
}
static __attribute__((destructor))
diff --git a/tools/testing/selftests/rseq/rseq.h b/tools/testing/selftests/rseq/rseq.h
index 4e217b620e0c..062d10925a10 100644
--- a/tools/testing/selftests/rseq/rseq.h
+++ b/tools/testing/selftests/rseq/rseq.h
@@ -60,7 +60,14 @@
extern ptrdiff_t rseq_offset;
/*
- * Size of the registered rseq area. 0 if the registration was
+ * The rseq ABI is composed of extensible feature fields. The extensions
+ * are done by appending additional fields at the end of the structure.
+ * The rseq_size defines the size of the active feature set which can be
+ * used by the application for the current rseq registration. Features
+ * starting at offset >= rseq_size are inactive and should not be used.
+ *
+ * The rseq_size is the intersection between the available allocation
+ * size for the rseq area and the feature size supported by the kernel.
* unsuccessful.
*/
extern unsigned int rseq_size;
--
2.48.1
The thuge-gen test_mmap() and test_shmget() tests are repeatedly run for a
variety of sizes but always report the result of their test with the same
name, meaning that automated sysetms running the tests are unable to
distinguish between the various tests. Add the supplied sizes to the logged
test names to distinguish between runs.
Fixes: b38bd9b2c448 ("selftests/mm: thuge-gen: conform to TAP format output")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/mm/thuge-gen.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/thuge-gen.c b/tools/testing/selftests/mm/thuge-gen.c
index e4370b79b62ffb133056eb843cdd1eaeba6503df..cd5174d735be405220d99ae796a3768f53df6ea4 100644
--- a/tools/testing/selftests/mm/thuge-gen.c
+++ b/tools/testing/selftests/mm/thuge-gen.c
@@ -127,7 +127,7 @@ void test_mmap(unsigned long size, unsigned flags)
show(size);
ksft_test_result(size == getpagesize() || (before - after) == NUM_PAGES,
- "%s mmap\n", __func__);
+ "%s mmap %lu\n", __func__, size);
if (munmap(map, size * NUM_PAGES))
ksft_exit_fail_msg("%s: unmap %s\n", __func__, strerror(errno));
@@ -165,7 +165,7 @@ void test_shmget(unsigned long size, unsigned flags)
show(size);
ksft_test_result(size == getpagesize() || (before - after) == NUM_PAGES,
- "%s: mmap\n", __func__);
+ "%s: mmap %lu\n", __func__, size);
if (shmdt(map))
ksft_exit_fail_msg("%s: shmdt: %s\n", __func__, strerror(errno));
}
---
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
change-id: 20250204-kselftest-mm-fix-dups-076a48577184
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The mac address on backup slave should be convert from Solicited-Node
Multicast address, not from bonding unicast target address.
Hangbin Liu (2):
bonding: fix incorrect MAC address setting to receive NA messages
selftests: bonding: fix incorrect mac address
drivers/net/bonding/bond_options.c | 4 +++-
tools/testing/selftests/drivers/net/bonding/bond_options.sh | 4 ++--
2 files changed, 5 insertions(+), 3 deletions(-)
--
2.39.5 (Apple Git-154)
Support for 32-bit s390 is very easy to implement and useful for
testing. For example I used to test some generic compat_ptr() logic,
which is only testable on 32-bit s390.
The series depends on my other series
"selftests/nolibc: test kernel configuration cleanups".
(It's not a hard dependency, only a minor diff conflict)
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (2):
selftests/nolibc: rename s390 to s390x
tools/nolibc: add support for 32-bit s390
tools/include/nolibc/arch-s390.h | 5 +++++
tools/include/nolibc/arch.h | 2 +-
tools/testing/selftests/nolibc/Makefile | 10 ++++++++--
tools/testing/selftests/nolibc/run-tests.sh | 8 +++++++-
4 files changed, 21 insertions(+), 4 deletions(-)
---
base-commit: 0597614d84c8593ba906418bf3c0c0de1e02e82a
change-id: 20250122-nolibc-s390-e57141682c88
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Add a new selftest to verify netconsole's handling of messages that
exceed the packet size limit and require fragmentation. The test sends
messages with varying sizes and userdata, validating that:
1. Large messages are correctly fragmented and reassembled
2. Userdata fields are properly preserved across fragments
3. Messages work correctly with and without kernel release version
appending
The test creates a networking environment using netdevsim, sends
messages through /dev/kmsg, and verifies the received fragments maintain
message integrity.
Signed-off-by: Breno Leitao <leitao(a)debian.org>
Reviewed-by: Simon Horman <horms(a)kernel.org>
---
Changes compared to RFC
- Removed the long-line comment (Simon)
- Added Simon's reviwed-by tag.
- Link to v1: https://lore.kernel.org/r/20250131-netcons_frag_msgs-v1-1-0de83bf2a7e6@debi…
---
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 7 ++
.../drivers/net/netcons_fragmented_msg.sh | 122 +++++++++++++++++++++
3 files changed, 130 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/Makefile b/tools/testing/selftests/drivers/net/Makefile
index 137470bdee0c7fd2517bd1baafc12d575de4b4ac..c7f1c443f2af091aa13f67dd1df9ae05d7a43f40 100644
--- a/tools/testing/selftests/drivers/net/Makefile
+++ b/tools/testing/selftests/drivers/net/Makefile
@@ -7,6 +7,7 @@ TEST_INCLUDES := $(wildcard lib/py/*.py) \
TEST_PROGS := \
netcons_basic.sh \
+ netcons_fragmented_msg.sh \
netcons_overflow.sh \
ping.py \
queues.py \
diff --git a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh
index 3acaba41ac7b21aa2fd8457ed640a5ac8a41bc12..0c262b123fdd3082c40b2bd899ec626d223226ed 100644
--- a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh
+++ b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh
@@ -110,6 +110,13 @@ function create_dynamic_target() {
echo 1 > "${NETCONS_PATH}"/enabled
}
+# Do not append the release to the header of the message
+function disable_release_append() {
+ echo 0 > "${NETCONS_PATH}"/enabled
+ echo 0 > "${NETCONS_PATH}"/release
+ echo 1 > "${NETCONS_PATH}"/enabled
+}
+
function cleanup() {
local NSIM_DEV_SYS_DEL="/sys/bus/netdevsim/del_device"
diff --git a/tools/testing/selftests/drivers/net/netcons_fragmented_msg.sh b/tools/testing/selftests/drivers/net/netcons_fragmented_msg.sh
new file mode 100755
index 0000000000000000000000000000000000000000..4a71e01a230c75c14147a9937920fe0b68a0926a
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/netcons_fragmented_msg.sh
@@ -0,0 +1,122 @@
+#!/usr/bin/env bash
+# SPDX-License-Identifier: GPL-2.0
+
+# Test netconsole's message fragmentation functionality.
+#
+# When a message exceeds the maximum packet size, netconsole splits it into
+# multiple fragments for transmission. This test verifies:
+# - Correct fragmentation of large messages
+# - Proper reassembly of fragments at the receiver
+# - Preservation of userdata across fragments
+# - Behavior with and without kernel release version appending
+#
+# Author: Breno Leitao <leitao(a)debian.org>
+
+set -euo pipefail
+
+SCRIPTDIR=$(dirname "$(readlink -e "${BASH_SOURCE[0]}")")
+
+source "${SCRIPTDIR}"/lib/sh/lib_netcons.sh
+
+modprobe netdevsim 2> /dev/null || true
+modprobe netconsole 2> /dev/null || true
+
+# The content of kmsg will be save to the following file
+OUTPUT_FILE="/tmp/${TARGET}"
+
+# set userdata to a long value. In this case, it is "1-2-3-4...50-"
+USERDATA_VALUE=$(printf -- '%.2s-' {1..60})
+
+# Convert the header string in a regexp, so, we can remove
+# the second header as well.
+# A header looks like "13,468,514729715,-,ncfrag=0/1135;". If
+# release is appended, you might find something like:L
+# "6.13.0-04048-g4f561a87745a,13,468,514729715,-,ncfrag=0/1135;"
+function header_to_regex() {
+ # header is everything before ;
+ local HEADER="${1}"
+ REGEX=$(echo "${HEADER}" | cut -d'=' -f1)
+ echo "${REGEX}=[0-9]*\/[0-9]*;"
+}
+
+# We have two headers in the message. Remove both to get the full message,
+# and extract the full message.
+function extract_msg() {
+ local MSGFILE="${1}"
+ # Extract the header, which is the very first thing that arrives in the
+ # first list.
+ HEADER=$(sed -n '1p' "${MSGFILE}" | cut -d';' -f1)
+ HEADER_REGEX=$(header_to_regex "${HEADER}")
+
+ # Remove the two headers from the received message
+ # This will return the message without any header, similarly to what
+ # was sent.
+ sed "s/""${HEADER_REGEX}""//g" "${MSGFILE}"
+}
+
+# Validate the message, which has two messages glued together.
+# unwrap them to make sure all the characters were transmitted.
+# File will look like the following:
+# 13,468,514729715,-,ncfrag=0/1135;<message>
+# key=<part of key>-13,468,514729715,-,ncfrag=967/1135;<rest of the key>
+function validate_fragmented_result() {
+ # Discard the netconsole headers, and assemble the full message
+ RCVMSG=$(extract_msg "${1}")
+
+ # check for the main message
+ if ! echo "${RCVMSG}" | grep -q "${MSG}"; then
+ echo "Message body doesn't match." >&2
+ echo "msg received=" "${RCVMSG}" >&2
+ exit "${ksft_fail}"
+ fi
+
+ # check userdata
+ if ! echo "${RCVMSG}" | grep -q "${USERDATA_VALUE}"; then
+ echo "message userdata doesn't match" >&2
+ echo "msg received=" "${RCVMSG}" >&2
+ exit "${ksft_fail}"
+ fi
+ # test passed. hooray
+}
+
+# Check for basic system dependency and exit if not found
+check_for_dependencies
+# Set current loglevel to KERN_INFO(6), and default to KERN_NOTICE(5)
+echo "6 5" > /proc/sys/kernel/printk
+# Remove the namespace, interfaces and netconsole target on exit
+trap cleanup EXIT
+# Create one namespace and two interfaces
+set_network
+# Create a dynamic target for netconsole
+create_dynamic_target
+# Set userdata "key" with the "value" value
+set_user_data
+
+
+# TEST 1: Send message and userdata. They will fragment
+# =======
+MSG=$(printf -- 'MSG%.3s=' {1..150})
+
+# Listen for netconsole port inside the namespace and destination interface
+listen_port_and_save_to "${OUTPUT_FILE}" &
+# Wait for socat to start and listen to the port.
+wait_local_port_listen "${NAMESPACE}" "${PORT}" udp
+# Send the message
+echo "${MSG}: ${TARGET}" > /dev/kmsg
+# Wait until socat saves the file to disk
+busywait "${BUSYWAIT_TIMEOUT}" test -s "${OUTPUT_FILE}"
+# Check if the message was not corrupted
+validate_fragmented_result "${OUTPUT_FILE}"
+
+# TEST 2: Test with smaller message, and without release appended
+# =======
+MSG=$(printf -- 'FOOBAR%.3s=' {1..100})
+# Let's disable release and test again.
+disable_release_append
+
+listen_port_and_save_to "${OUTPUT_FILE}" &
+wait_local_port_listen "${NAMESPACE}" "${PORT}" udp
+echo "${MSG}: ${TARGET}" > /dev/kmsg
+busywait "${BUSYWAIT_TIMEOUT}" test -s "${OUTPUT_FILE}"
+validate_fragmented_result "${OUTPUT_FILE}"
+exit "${ksft_pass}"
---
base-commit: 0ad9617c78acbc71373fb341a6f75d4012b01d69
change-id: 20250129-netcons_frag_msgs-91506d136f50
Best regards,
--
Breno Leitao <leitao(a)debian.org>
The script uses non-POSIX features like `[[` for conditionals and hence
does not work when run with a POSIX /bin/sh.
Change the shebang to /bin/bash instead, like the other tests in cgroup.
Signed-off-by: Bharadwaj Raju <bharadwaj.raju777(a)gmail.com>
---
tools/testing/selftests/cgroup/test_cpuset_v1_hp.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/cgroup/test_cpuset_v1_hp.sh b/tools/testing/selftests/cgroup/test_cpuset_v1_hp.sh
index 3f45512fb512..7406c24be1ac 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_v1_hp.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_v1_hp.sh
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
#
# Test the special cpuset v1 hotplug case where a cpuset become empty of
--
2.43.0
Greetings:
Welcome to v3. No functional changes, see changelog below.
This is an attempt to followup on something Jakub asked me about [1],
adding an xsk attribute to queues and more clearly documenting which
queues are linked to NAPIs...
After the RFC [2], Jakub suggested creating an empty nest for queues
which have a pool, so I've adjusted this version to work that way.
The nest can be extended in the future to express attributes about XSK
as needed. Queues which are not used for AF_XDP do not have the xsk
attribute present.
I've run the included test on:
- my mlx5 machine (via NETIF=)
- without setting NETIF
And the test seems to pass in both cases.
Thanks,
Joe
[1]: https://lore.kernel.org/netdev/20250113143109.60afa59a@kernel.org/
[2]: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/
v3:
- Change comment format in patch 2 to avoid kdoc warnings. No other
changes.
v2: https://lore.kernel.org/all/20250203185828.19334-1-jdamato@fastly.com/
- Switched from RFC to actual submission now that net-next is open
- Adjusted patch 1 to include an empty nest as suggested by Jakub
- Adjusted patch 2 to update the test based on changes to patch 1, and
to incorporate some Python feedback from Jakub :)
rfc: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/
Joe Damato (2):
netdev-genl: Add an XSK attribute to queues
selftests: drv-net: Test queue xsk attribute
Documentation/netlink/specs/netdev.yaml | 13 ++-
include/uapi/linux/netdev.h | 6 ++
net/core/netdev-genl.c | 11 +++
tools/include/uapi/linux/netdev.h | 6 ++
.../testing/selftests/drivers/net/.gitignore | 2 +
tools/testing/selftests/drivers/net/Makefile | 3 +
tools/testing/selftests/drivers/net/queues.py | 35 +++++++-
.../selftests/drivers/net/xdp_helper.c | 89 +++++++++++++++++++
8 files changed, 162 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/.gitignore
create mode 100644 tools/testing/selftests/drivers/net/xdp_helper.c
base-commit: c2933b2befe25309f4c5cfbea0ca80909735fd76
--
2.43.0
Good day,
I sent you a message a few hours ago but no reply yet, or you didn't receive it? Kindly read my letter and reply back. I want to make an inquiry
Thanks.
Dr.Allen Cheng
Human Resource Manager | Product Research Assistant
FC Industrial Laboratories Ltd
This patch fixes a grammatical error in a test log message in
reuseaddr_ports_exhausted.c for better clarity as a part of lfx
application tasks
Signed-off-by: Pranav Tyagi <pranav.tyagi03(a)gmail.com>
---
tools/testing/selftests/net/reuseaddr_ports_exhausted.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/reuseaddr_ports_exhausted.c b/tools/testing/selftests/net/reuseaddr_ports_exhausted.c
index 066efd30e294..7b9bf8a7bbe1 100644
--- a/tools/testing/selftests/net/reuseaddr_ports_exhausted.c
+++ b/tools/testing/selftests/net/reuseaddr_ports_exhausted.c
@@ -112,7 +112,7 @@ TEST(reuseaddr_ports_exhausted_reusable_same_euid)
ASSERT_NE(-1, fd[0]) TH_LOG("failed to bind.");
if (opts->reuseport[0] && opts->reuseport[1]) {
- EXPECT_EQ(-1, fd[1]) TH_LOG("should fail to bind because both sockets succeed to be listened.");
+ EXPECT_EQ(-1, fd[1]) TH_LOG("should fail to bind because both sockets successfully listened.");
} else {
EXPECT_NE(-1, fd[1]) TH_LOG("should succeed to bind to connect to different destinations.");
}
--
2.47.1
This series expands the XDP TX metadata framework to allow user
applications to pass per packet 64-bit launch time directly to the kernel
driver, requesting launch time hardware offload support. The XDP TX
metadata framework will not perform any clock conversion or packet
reordering.
Please note that the role of Tx metadata is just to pass the launch time,
not to enable the offload feature. Users will need to enable the launch
time hardware offload feature of the device by using the respective
command, such as the tc-etf command.
Although some devices use the tc-etf command to enable their launch time
hardware offload feature, xsk packets will not go through the etf qdisc.
Therefore, in my opinion, the launch time should always be based on the PTP
Hardware Clock (PHC). Thus, i did not include a clock ID to indicate the
clock source.
To simplify the test steps, I modified the xdp_hw_metadata bpf self-test
tool in such a way that it will set the launch time based on the offset
provided by the user and the value of the Receive Hardware Timestamp, which
is against the PHC. This will eliminate the need to discipline System Clock
with the PHC and then use clock_gettime() to get the time.
Please note that AF_XDP lacks a feedback mechanism to inform the
application if the requested launch time is invalid. So, users are expected
to familiar with the horizon of the launch time of the device they use and
not request a launch time that is beyond the horizon. Otherwise, the driver
might interpret the launch time incorrectly and react wrongly. For stmmac
and igc, where modulo computation is used, a launch time larger than the
horizon will cause the device to transmit the packet earlier that the
requested launch time.
Although there is no feedback mechanism for the launch time request
for now, user still can check whether the requested launch time is
working or not, by requesting the Transmit Completion Hardware Timestamp.
V7:
- split the refactoring code of igc empty packet insertion into a separate
commit (Faizal)
- add explanation on why the value "4" is used as igc transmit budget
(Faizal)
- perform a stress test by sending 1000 packets with 10ms interval and
launch time set to 500us in the future (Faizal & Yong Liang)
V6: https://lore.kernel.org/netdev/20250116155350.555374-1-yoong.siang.song@int…
- fix selftest build errors by using asprintf() and realloc(), instead of
managing the buffer sizes manually (Daniel, Stanislav)
V5: https://lore.kernel.org/netdev/20250114152718.120588-1-yoong.siang.song@int…
- change netdev feature name from tx-launch-time to tx-launch-time-fifo
to explicitly state the FIFO behaviour (Stanislav)
- improve the looping of xdp_hw_metadata app to wait for packet tx
completion to be more readable by using clock_gettime() (Stanislav)
- add launch time setup steps into xdp_hw_metadata app (Stanislav)
V4: https://lore.kernel.org/netdev/20250106135506.9687-1-yoong.siang.song@intel…
- added XDP launch time support to the igc driver (Jesper & Florian)
- added per-driver launch time limitation on xsk-tx-metadata.rst (Jesper)
- added explanation on FIFO behavior on xsk-tx-metadata.rst (Jakub)
- added step to enable launch time in the commit message (Jesper & Willem)
- explicitly documented the type of launch_time and which clock source
it is against (Willem)
V3: https://lore.kernel.org/netdev/20231203165129.1740512-1-yoong.siang.song@in…
- renamed to use launch time (Jesper & Willem)
- changed the default launch time in xdp_hw_metadata apps from 1s to 0.1s
because some NICs do not support such a large future time.
V2: https://lore.kernel.org/netdev/20231201062421.1074768-1-yoong.siang.song@in…
- renamed to use Earliest TxTime First (Willem)
- renamed to use txtime (Willem)
V1: https://lore.kernel.org/netdev/20231130162028.852006-1-yoong.siang.song@int…
Song Yoong Siang (5):
xsk: Add launch time hardware offload support to XDP Tx metadata
selftests/bpf: Add launch time request to xdp_hw_metadata
net: stmmac: Add launch time support to XDP ZC
igc: Refactor empty packet insertion into a reusable function
igc: Add launch time support to XDP ZC
Documentation/netlink/specs/netdev.yaml | 4 +
Documentation/networking/xsk-tx-metadata.rst | 62 +++++++
drivers/net/ethernet/intel/igc/igc_main.c | 84 ++++++---
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 +
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 13 ++
include/net/xdp_sock.h | 10 ++
include/net/xdp_sock_drv.h | 1 +
include/uapi/linux/if_xdp.h | 10 ++
include/uapi/linux/netdev.h | 3 +
net/core/netdev-genl.c | 2 +
net/xdp/xsk.c | 3 +
tools/include/uapi/linux/if_xdp.h | 10 ++
tools/include/uapi/linux/netdev.h | 3 +
tools/testing/selftests/bpf/xdp_hw_metadata.c | 168 +++++++++++++++++-
14 files changed, 348 insertions(+), 27 deletions(-)
--
2.34.1
From: Kevin Brodsky <kevin.brodsky(a)arm.com>
[ Upstream commit 46036188ea1f5266df23a6149dea0df1c77cd1c7 ]
The mm kselftests are currently built with no optimisation (-O0). It's
unclear why, and besides being obviously suboptimal, this also prevents
the pkeys tests from working as intended. Let's build all the tests with
-O2.
[kevin.brodsky(a)arm.com: silence unused-result warnings]
Link: https://lkml.kernel.org/r/20250107170110.2819685-1-kevin.brodsky@arm.com
Link: https://lkml.kernel.org/r/20241209095019.1732120-6-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky(a)arm.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna(a)oracle.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Joey Gouly <joey.gouly(a)arm.com>
Cc: Keith Lucas <keith.lucas(a)oracle.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
(cherry picked from commit 46036188ea1f5266df23a6149dea0df1c77cd1c7)
[Yifei: This commit also fix the failure of pkey_sighandler_tests_64,
which is also in linux-6.12.y, thus backport this commit]
Signed-off-by: Yifei Liu <yifei.l.liu(a)oracle.com>
---
tools/testing/selftests/mm/Makefile | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 02e1204971b0..c0138cb19705 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -33,9 +33,16 @@ endif
# LDLIBS.
MAKEFLAGS += --no-builtin-rules
-CFLAGS = -Wall -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES) $(TOOLS_INCLUDES)
+CFLAGS = -Wall -O2 -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES) $(TOOLS_INCLUDES)
LDLIBS = -lrt -lpthread -lm
+# Some distributions (such as Ubuntu) configure GCC so that _FORTIFY_SOURCE is
+# automatically enabled at -O1 or above. This triggers various unused-result
+# warnings where functions such as read() or write() are called and their
+# return value is not checked. Disable _FORTIFY_SOURCE to silence those
+# warnings.
+CFLAGS += -U_FORTIFY_SOURCE
+
TEST_GEN_FILES = cow
TEST_GEN_FILES += compaction_test
TEST_GEN_FILES += gup_longterm
--
2.46.0
For platforms on powerpc architecture with a default page size greater
than 4096, there was an inconsistency in fragment size calculation.
This caused the BPF selftest xdp_adjust_tail/xdp_adjust_frags_tail_grow
to fail on powerpc.
The issue occurred because the fragment buffer size in
bpf_prog_test_run_xdp() was set to 4096, while the actual data size in
the fragment within the shared skb was checked against PAGE_SIZE
(65536 on powerpc) in min_t, causing it to exceed 4096 and be set
accordingly. This discrepancy led to an overflow when
bpf_xdp_frags_increase_tail() checked for tailroom, as skb_frag_size(frag)
could be greater than rxq->frag_size (when PAGE_SIZE > 4096).
This commit updates the page size references to 4096 to ensure consistency
and prevent overflow issues in fragment size calculations.
Fixes: 1c1949982524 ("bpf: introduce frags support to bpf_prog_test_run_xdp()")
Signed-off-by: Saket Kumar Bhaskar <skb99(a)linux.ibm.com>
---
net/bpf/test_run.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 501ec4249..eb5476184 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -124,7 +124,7 @@ struct xdp_test_data {
* must be updated accordingly this gets changed, otherwise BPF selftests
* will fail.
*/
-#define TEST_XDP_FRAME_SIZE (PAGE_SIZE - sizeof(struct xdp_page_head))
+#define TEST_XDP_FRAME_SIZE (4096 - sizeof(struct xdp_page_head))
#define TEST_XDP_MAX_BATCH 256
static void xdp_test_run_init_page(netmem_ref netmem, void *arg)
@@ -660,7 +660,7 @@ static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size,
void __user *data_in = u64_to_user_ptr(kattr->test.data_in);
void *data;
- if (size < ETH_HLEN || size > PAGE_SIZE - headroom - tailroom)
+ if (size < ETH_HLEN || size > 4096 - headroom - tailroom)
return ERR_PTR(-EINVAL);
if (user_size > size)
@@ -1297,7 +1297,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
frag = &sinfo->frags[sinfo->nr_frags++];
data_len = min_t(u32, kattr->test.data_size_in - size,
- PAGE_SIZE);
+ 4096);
skb_frag_fill_page_desc(frag, page, 0, data_len);
if (copy_from_user(page_address(page), data_in + size,
--
2.43.5
Greetings:
This is an attempt to followup on something Jakub asked me about [1],
adding an xsk attribute to queues and more clearly documenting which
queues are linked to NAPIs...
After the RFC [2], Jakub suggested creating an empty nest for queues
which have a pool, so I've adjusted this version to work that way.
The nest can be extended in the future to express attributes about XSK
as needed. Queues which are not used for AF_XDP do not have the xsk
attribute present.
I've run the included test on:
- my mlx5 machine (via NETIF=)
- without setting NETIF
And the test seems to pass in both cases.
Thanks,
Joe
[1]: https://lore.kernel.org/netdev/20250113143109.60afa59a@kernel.org/
[2]: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/
v2:
- Switched from RFC to actual submission now that net-next is open
- Adjusted patch 1 to include an empty nest as suggested by Jakub
- Adjusted patch 2 to update the test based on changes to patch 1, and
to incorporate some Python feedback from Jakub :)
rfc: https://lore.kernel.org/netdev/20250129172431.65773-1-jdamato@fastly.com/
Joe Damato (2):
netdev-genl: Add an XSK attribute to queues
selftests: drv-net: Test queue xsk attribute
Documentation/netlink/specs/netdev.yaml | 13 ++-
include/uapi/linux/netdev.h | 6 ++
net/core/netdev-genl.c | 11 +++
tools/include/uapi/linux/netdev.h | 6 ++
.../testing/selftests/drivers/net/.gitignore | 2 +
tools/testing/selftests/drivers/net/Makefile | 3 +
tools/testing/selftests/drivers/net/queues.py | 35 +++++++-
.../selftests/drivers/net/xdp_helper.c | 90 +++++++++++++++++++
8 files changed, 163 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/.gitignore
create mode 100644 tools/testing/selftests/drivers/net/xdp_helper.c
base-commit: c2933b2befe25309f4c5cfbea0ca80909735fd76
--
2.25.1
This series introduces support in the ARM PMUv3 driver for
partitioning PMU counters into two separate ranges by taking advantage
of the MDCR_EL2.HPMN register field.
The advantage of a partitioned PMU would be to allow KVM guests direct
access to a subset of PMU functionality, greatly reducing the overhead
of performance monitoring in guests.
While this series could be accepted on its own merits, practically
there is a lot more to be done before it will be fully useful, so I'm
sending as an RFC for now.
This patch is based on v6.13-rc7. It needs a small additional change
after Oliver's Debug cleanups series going into 6.14, specifically
this patch [1], because it changes kvm_arm_setup_mdcr_el2() to
initialize HPMN from a cached value read early in the boot process
instead of reading from the register. The only sensible way I can see
to deal with this is returning to reading the register.
[1] https://lore.kernel.org/kvmarm/20241219224116.3941496-3-oliver.upton@linux.…
Colton Lewis (4):
perf: arm_pmuv3: Introduce module param to partition the PMU
KVM: arm64: Make guests see only counters they can access
perf: arm_pmuv3: Generalize counter bitmasks
perf: arm_pmuv3: Keep out of guest counter partition
arch/arm/include/asm/arm_pmuv3.h | 10 ++
arch/arm64/include/asm/arm_pmuv3.h | 10 ++
arch/arm64/kvm/pmu-emul.c | 8 +-
drivers/perf/arm_pmuv3.c | 113 ++++++++++++++++--
include/linux/perf/arm_pmu.h | 2 +
include/linux/perf/arm_pmuv3.h | 34 +++++-
.../kvm/aarch64/vpmu_counter_access.c | 2 +-
7 files changed, 160 insertions(+), 19 deletions(-)
base-commit: 5bc55a333a2f7316b58edc7573e8e893f7acb532
--
2.48.1.262.g85cc9f2d1e-goog
On Wed, Nov 13, 2024 at 2:31 AM Paolo Bonzini <pbonzini(a)redhat.com> wrote:
>
>
>
> Il mar 12 nov 2024, 21:44 Doug Covelli <doug.covelli(a)broadcom.com> ha scritto:
>>
>> > Split irqchip should be the best tradeoff. Without it, moves from cr8
>> > stay in the kernel, but moves to cr8 always go to userspace with a
>> > KVM_EXIT_SET_TPR exit. You also won't be able to use Intel
>> > flexpriority (in-processor accelerated TPR) because KVM does not know
>> > which bits are set in IRR. So it will be *really* every move to cr8
>> > that goes to userspace.
>>
>> Sorry to hijack this thread but is there a technical reason not to allow CR8
>> based accesses to the TPR (not MMIO accesses) when the in-kernel local APIC is
>> not in use?
>
>
> No worries, you're not hijacking :) The only reason is that it would be more code for a seldom used feature and anyway with worse performance. (To be clear, CR8 based accesses are allowed, but stores cause an exit in order to check the new TPR against IRR. That's because KVM's API does not have an equivalent of the TPR threshold as you point out below).
I have not really looked at the code but it seems like it could also
simplify things as CR8 would be handled more uniformly regardless of
who is virtualizing the local APIC.
>> Also I could not find these documented anywhere but with MSFT's APIC our monitor
>> relies on extensions for trapping certain events such as INIT/SIPI plus LINT0
>> and SVR writes:
>>
>> UINT64 X64ApicInitSipiExitTrap : 1; // WHvRunVpExitReasonX64ApicInitSipiTrap
>> UINT64 X64ApicWriteLint0ExitTrap : 1; // WHvRunVpExitReasonX64ApicWriteTrap
>> UINT64 X64ApicWriteLint1ExitTrap : 1; // WHvRunVpExitReasonX64ApicWriteTrap
>> UINT64 X64ApicWriteSvrExitTrap : 1; // WHvRunVpExitReasonX64ApicWriteTrap
>
>
> There's no need for this in KVM's in-kernel APIC model. INIT and SIPI are handled in the hypervisor and you can get the current state of APs via KVM_GET_MPSTATE. LINT0 and LINT1 are injected with KVM_INTERRUPT and KVM_NMI respectively, and they obey IF/PPR and NMI blocking respectively, plus the interrupt shadow; so there's no need for userspace to know when LINT0/LINT1 themselves change. The spurious interrupt vector register is also handled completely in kernel.
I realize that KVM can handle LINT0/SVR updates themselves but our
interrupt subsystem relies on knowing the current values of these
registers even when not virtualizing the local APIC. I suppose we
could use KVM_GET_LAPIC to sync things up on demand but that seems
like it might nor be great from a performance point of view.
>> I did not see any similar functionality for KVM. Does anything like that exist?
>> In any case we would be happy to add support for handling CR8 accesses w/o
>> exiting w/o the in-kernel APIC along with some sort of a way to configure the
>> TPR threshold if folks are not opposed to that.
>
>
> As far I know everybody who's using KVM (whether proprietary or open source) has had no need for that, so I don't think it's a good idea to make the API more complex. Performance of Windows guests is going to be bad anyway with userspace APIC.
From what I have seen the exit cost with KVM is significantly lower
than with WHP/Hyper-V. I don't think performance of Windows guests
with userspace APIC emulation would be bad if CR8 exits could be
avoided (Linux guests perf isn't bad from what I have observed and the
main difference is the astronomical number of CR8 exits). It seems
like it would be pretty decent although I agree if you want the
absolute best performance then you would want to use the in kernel
APIC to speed up handling of ICR/EOI writes but those are relatively
infrequent compared to CR8 accesses .
Anyway I just saw Sean's response while writing this and it seems he
is not in favor of avoiding CR8 exits w/o the in kernel APIC either so
I suppose we will have to look into making use of the in kernel APIC.
Doug
> Paolo
>
>> Doug
>>
>> > > For now I think it makes sense to handle BDOOR_CMD_GET_VCPU_INFO at userlevel
>> > > like we do on Windows and macOS.
>> > >
>> > > BDOOR_CMD_GETTIME/BDOOR_CMD_GETTIMEFULL are similar with the former being
>> > > deprecated in favor of the latter. Both do essentially the same thing which is
>> > > to return the host OS's time - on Linux this is obtained via gettimeofday. I
>> > > believe this is mainly used by tools to fix up the VM's time when resuming from
>> > > suspend. I think it is fine to continue handling these at userlevel.
>> >
>> > As long as the TSC is not involved it should be okay.
>> >
>> > Paolo
>> >
>> > > > >> Anyway, one question apart from this: is the API the same for the I/O
>> > > > >> port and hypercall backdoors?
>> > > > >
>> > > > > Yeah the calls and arguments are the same. The hypercall based
>> > > > > interface is an attempt to modernize the backdoor since as you pointed
>> > > > > out the I/O based interface is kind of hacky as it bypasses the normal
>> > > > > checks for an I/O port access at CPL3. It would be nice to get rid of
>> > > > > it but unfortunately I don't think that will happen in the foreseeable
>> > > > > future as there are a lot of existing VMs out there with older SW that
>> > > > > still uses this interface.
>> > > >
>> > > > Yeah, but I think it still justifies that the KVM_ENABLE_CAP API can
>> > > > enable the hypercall but not the I/O port.
>> > > >
>> > > > Paolo
>> >
>>
>> --
>> This electronic communication and the information and any files transmitted
>> with it, or attached to it, are confidential and are intended solely for
>> the use of the individual or entity to whom it is addressed and may contain
>> information that is confidential, legally privileged, protected by privacy
>> laws, or otherwise restricted from disclosure to anyone else. If you are
>> not the intended recipient or the person responsible for delivering the
>> e-mail to the intended recipient, you are hereby notified that any use,
>> copying, distributing, dissemination, forwarding, printing, or copying of
>> this e-mail is strictly prohibited. If you received this e-mail in error,
>> please return the e-mail to the sender, delete it from your computer, and
>> destroy any printed copy of it.
>>
--
This electronic communication and the information and any files transmitted
with it, or attached to it, are confidential and are intended solely for
the use of the individual or entity to whom it is addressed and may contain
information that is confidential, legally privileged, protected by privacy
laws, or otherwise restricted from disclosure to anyone else. If you are
not the intended recipient or the person responsible for delivering the
e-mail to the intended recipient, you are hereby notified that any use,
copying, distributing, dissemination, forwarding, printing, or copying of
this e-mail is strictly prohibited. If you received this e-mail in error,
please return the e-mail to the sender, delete it from your computer, and
destroy any printed copy of it.
Nolibc has support for riscv32. But the testsuite did not allow to test
it so far. Add a test configuration.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (6):
tools/nolibc: add support for waitid()
selftests/nolibc: use waitid() over waitpid()
selftests/nolibc: use a pipe to in vfprintf tests
selftests/nolibc: skip tests for unimplemented syscalls
selftests/nolibc: rename riscv to riscv64
selftests/nolibc: add configurations for riscv32
tools/include/nolibc/sys.h | 18 ++++++++++++
tools/testing/selftests/nolibc/Makefile | 11 +++++++
tools/testing/selftests/nolibc/nolibc-test.c | 44 ++++++++++++++++------------
tools/testing/selftests/nolibc/run-tests.sh | 2 +-
4 files changed, 56 insertions(+), 19 deletions(-)
---
base-commit: 499551201b5f4fd3c0618a3e95e3d0d15ea18f31
change-id: 20241219-nolibc-rv32-cff8a3e22394
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Two fixes for nullness elision. See commits for more details.
Daniel Xu (3):
bpf: verifier: Do not extract constant map keys for irrelevant maps
bpf: selftests: Test constant key extraction on irrelevant maps
bpf: verifier: Disambiguate get_constant_map_key() errors
kernel/bpf/verifier.c | 29 ++++++++++++++-----
.../bpf/progs/verifier_array_access.c | 15 ++++++++++
2 files changed, 36 insertions(+), 8 deletions(-)
--
2.47.1
Add a new selftest to verify netconsole's handling of messages that
exceed the packet size limit and require fragmentation. The test sends
messages with varying sizes and userdata, validating that:
1. Large messages are correctly fragmented and reassembled
2. Userdata fields are properly preserved across fragments
3. Messages work correctly with and without kernel release version
appending
The test creates a networking environment using netdevsim, sends
messages through /dev/kmsg, and verifies the received fragments maintain
message integrity.
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 7 ++
.../drivers/net/netcons_fragmented_msg.sh | 122 +++++++++++++++++++++
3 files changed, 130 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/Makefile b/tools/testing/selftests/drivers/net/Makefile
index 137470bdee0c7fd2517bd1baafc12d575de4b4ac..c7f1c443f2af091aa13f67dd1df9ae05d7a43f40 100644
--- a/tools/testing/selftests/drivers/net/Makefile
+++ b/tools/testing/selftests/drivers/net/Makefile
@@ -7,6 +7,7 @@ TEST_INCLUDES := $(wildcard lib/py/*.py) \
TEST_PROGS := \
netcons_basic.sh \
+ netcons_fragmented_msg.sh \
netcons_overflow.sh \
ping.py \
queues.py \
diff --git a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh
index 3acaba41ac7b21aa2fd8457ed640a5ac8a41bc12..0c262b123fdd3082c40b2bd899ec626d223226ed 100644
--- a/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh
+++ b/tools/testing/selftests/drivers/net/lib/sh/lib_netcons.sh
@@ -110,6 +110,13 @@ function create_dynamic_target() {
echo 1 > "${NETCONS_PATH}"/enabled
}
+# Do not append the release to the header of the message
+function disable_release_append() {
+ echo 0 > "${NETCONS_PATH}"/enabled
+ echo 0 > "${NETCONS_PATH}"/release
+ echo 1 > "${NETCONS_PATH}"/enabled
+}
+
function cleanup() {
local NSIM_DEV_SYS_DEL="/sys/bus/netdevsim/del_device"
diff --git a/tools/testing/selftests/drivers/net/netcons_fragmented_msg.sh b/tools/testing/selftests/drivers/net/netcons_fragmented_msg.sh
new file mode 100755
index 0000000000000000000000000000000000000000..d175d5b9db662ab9a6ee203794569cc620801a4f
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/netcons_fragmented_msg.sh
@@ -0,0 +1,122 @@
+#!/usr/bin/env bash
+# SPDX-License-Identifier: GPL-2.0
+
+# Test netconsole's message fragmentation functionality.
+#
+# When a message exceeds the maximum packet size, netconsole splits it into
+# multiple fragments for transmission. This test verifies:
+# - Correct fragmentation of large messages
+# - Proper reassembly of fragments at the receiver
+# - Preservation of userdata across fragments
+# - Behavior with and without kernel release version appending
+#
+# Author: Breno Leitao <leitao(a)debian.org>
+
+set -euo pipefail
+
+SCRIPTDIR=$(dirname "$(readlink -e "${BASH_SOURCE[0]}")")
+
+source "${SCRIPTDIR}"/lib/sh/lib_netcons.sh
+
+modprobe netdevsim 2> /dev/null || true
+modprobe netconsole 2> /dev/null || true
+
+# The content of kmsg will be save to the following file
+OUTPUT_FILE="/tmp/${TARGET}"
+
+# set userdata to a long value. In this case, it is "1-2-3-4...50-"
+USERDATA_VALUE=$(printf -- '%.2s-' {1..60})
+
+# Convert the header string in a regexp, so, we can remove
+# the second header as well.
+# A header looks like "13,468,514729715,-,ncfrag=0/1135;". If
+# release is appended, you might find something like:L
+# "6.13.0-04048-g4f561a87745a,13,468,514729715,-,ncfrag=0/1135;"
+function header_to_regex() {
+ # header is everything before ;
+ local HEADER="${1}"
+ REGEX=$(echo "${HEADER}" | cut -d'=' -f1)
+ echo "${REGEX}=[0-9]*\/[0-9]*;"
+}
+
+# We have two headers in the message. Remove both to get the full message,
+# and extract the full message.
+function extract_msg() {
+ local MSGFILE="${1}"
+ # Extract the header, which is the very first thing that arrives in the
+ # first list.
+ HEADER=$(sed -n '1p' "${MSGFILE}" | cut -d';' -f1)
+ HEADER_REGEX=$(header_to_regex "${HEADER}")
+
+ # Remove the two headers from the received message
+ # This will return the message without any header, similarly to what
+ # was sent.
+ sed "s/""${HEADER_REGEX}""//g" "${MSGFILE}"
+}
+
+# Validate the message, which has two messages glued together.
+# unwrap them to make sure all the characters were transmitted.
+# File will look like the following:
+# 13,468,514729715,-,ncfrag=0/1135;MSG1=MSG2=MSG3=MSG4=MSG5=MSG6=MSG7=MSG8=MSG9=MSG10=MSG11=MSG12=MSG13=MSG14=MSG15=MSG16=MSG17=MSG18=MSG19=MSG20=MSG21=MSG22=MSG23=MSG24=MSG25=MSG26=MSG27=MSG28=MSG29=MSG30=MSG31=MSG32=MSG33=MSG34=MSG35=MSG36=MSG37=MSG38=MSG39=MSG40=MSG41=MSG42=MSG43=MSG44=MSG45=MSG46=MSG47=MSG48=MSG49=MSG50=MSG51=MSG52=MSG53=MSG54=MSG55=MSG56=MSG57=MSG58=MSG59=MSG60=MSG61=MSG62=MSG63=MSG64=MSG65=MSG66=MSG67=MSG68=MSG69=MSG70=MSG71=MSG72=MSG73=MSG74=MSG75=MSG76=MSG77=MSG78=MSG79=MSG80=MSG81=MSG82=MSG83=MSG84=MSG85=MSG86=MSG87=MSG88=MSG89=MSG90=MSG91=MSG92=MSG93=MSG94=MSG95=MSG96=MSG97=MSG98=MSG99=MSG100=MSG101=MSG102=MSG103=MSG104=MSG105=MSG106=MSG107=MSG108=MSG109=MSG110=MSG111=MSG112=MSG113=MSG114=MSG115=MSG116=MSG117=MSG118=MSG119=MSG120=MSG121=MSG122=MSG123=MSG124=MSG125=MSG126=MSG127=MSG128=MSG129=MSG130=MSG131=MSG132=MSG133=MSG134=MSG135=MSG136=MSG137=MSG138=MSG139=MSG140=MSG141=MSG142=MSG143=MSG144=MSG145=MSG146=MSG147=MSG148=MSG149=MSG150=: netcons_nzmJQ
+# key=1-2-13,468,514729715,-,ncfrag=967/1135;3-4-5-6-7-8-9-10-11-12-13-14-15-16-17-18-19-20-21-22-23-24-25-26-27-28-29-30-31-32-33-34-35-36-37-38-39-40-41-42-43-44-45-46-47-48-49-50-51-52-53-54-55-56-57-58-59-60-
+function validate_fragmented_result() {
+ # Discard the netconsole headers, and assemble the full message
+ RCVMSG=$(extract_msg "${1}")
+
+ # check for the main message
+ if ! echo "${RCVMSG}" | grep -q "${MSG}"; then
+ echo "Message body doesn't match." >&2
+ echo "msg received=" "${RCVMSG}" >&2
+ exit "${ksft_fail}"
+ fi
+
+ # check userdata
+ if ! echo "${RCVMSG}" | grep -q "${USERDATA_VALUE}"; then
+ echo "message userdata doesn't match" >&2
+ echo "msg received=" "${RCVMSG}" >&2
+ exit "${ksft_fail}"
+ fi
+ # test passed. hooray
+}
+
+# Check for basic system dependency and exit if not found
+check_for_dependencies
+# Set current loglevel to KERN_INFO(6), and default to KERN_NOTICE(5)
+echo "6 5" > /proc/sys/kernel/printk
+# Remove the namespace, interfaces and netconsole target on exit
+trap cleanup EXIT
+# Create one namespace and two interfaces
+set_network
+# Create a dynamic target for netconsole
+create_dynamic_target
+# Set userdata "key" with the "value" value
+set_user_data
+
+
+# TEST 1: Send message and userdata. They will fragment
+# =======
+MSG=$(printf -- 'MSG%.3s=' {1..150})
+
+# Listen for netconsole port inside the namespace and destination interface
+listen_port_and_save_to "${OUTPUT_FILE}" &
+# Wait for socat to start and listen to the port.
+wait_local_port_listen "${NAMESPACE}" "${PORT}" udp
+# Send the message
+echo "${MSG}: ${TARGET}" > /dev/kmsg
+# Wait until socat saves the file to disk
+busywait "${BUSYWAIT_TIMEOUT}" test -s "${OUTPUT_FILE}"
+# Check if the message was not corrupted
+validate_fragmented_result "${OUTPUT_FILE}"
+
+# TEST 2: Test with smaller message, and without release appended
+# =======
+MSG=$(printf -- 'FOOBAR%.3s=' {1..100})
+# Let's disable release and test again.
+disable_release_append
+
+listen_port_and_save_to "${OUTPUT_FILE}" &
+wait_local_port_listen "${NAMESPACE}" "${PORT}" udp
+echo "${MSG}: ${TARGET}" > /dev/kmsg
+busywait "${BUSYWAIT_TIMEOUT}" test -s "${OUTPUT_FILE}"
+validate_fragmented_result "${OUTPUT_FILE}"
+exit "${ksft_pass}"
---
base-commit: 0ad9617c78acbc71373fb341a6f75d4012b01d69
change-id: 20250129-netcons_frag_msgs-91506d136f50
Best regards,
--
Breno Leitao <leitao(a)debian.org>
Hi all,
This patch series continues the work to migrate the *.sh tests into
prog_tests framework.
test_xdp_redirect_multi.sh tests the XDP redirections done through
bpf_redirect_map().
This is already partly covered by test_xdp_veth.c that already tests
map redirections at XDP level. What isn't covered yet by test_xdp_veth is
the use of the broadcast flags (BPF_F_BROADCAST or BPF_F_EXCLUDE_INGRESS)
and XDP egress programs.
Hence, this patch series add test cases to test_xdp_veth.c to get rid of
the test_xdp_redirect_multi.sh:
- PATCH 1 Add an helper to generate unique names
- PATCH 2 to 9 rework test_xdp_veth to make it more generic and allow to
configure different test cases
- PATCH 10 adds test cases for 'classic' bpf_redirect_map()
- PATCH 11 and 12 cover the broadcast flags
- PATCH 13 covers the XDP egress programs
- PATCH 14 removes test_xdp_redirect_multi.sh
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
---
Changes in v4:
- Remove the NO_IP #define
- append_tid() takes string's size as input to ensure there is enough
space to fit the thread ID at the end
- Fix PATCH 12's commit log
- Link to v3: https://lore.kernel.org/r/20250128-redirect-multi-v3-0-c1ce69997c01@bootlin…
Changes in v3:
- Add append_tid() helper and use unique names to allow parallel testing
- Check create_network()'s return value through ASSERT_OK()
- Remove check_ping() and unused defines
- Change next_veth type (from string to int)
- Link to v2: https://lore.kernel.org/r/20250121-redirect-multi-v2-0-fc9cacabc6b2@bootlin…
Changes in v2:
- Use serial_test_* to avoid conflict between tests
- Link to v1: https://lore.kernel.org/r/20250121-redirect-multi-v1-0-b215e35ff505@bootlin…
---
Bastien Curutchet (eBPF Foundation) (14):
selftests/bpf: helpers: Add append_tid()
selftests/bpf: test_xdp_veth: Remove unused defines
selftests/bpf: test_xdp_veth: Remove unecessarry check_ping()
selftests/bpf: test_xdp_veth: Use int to describe next veth
selftests/bpf: test_xdp_veth: Split network configuration
selftests/bpf: test_xdp_veth: Rename config[]
selftests/bpf: test_xdp_veth: Add prog_config[] table
selftests/bpf: test_xdp_veth: Add XDP flags to prog_configuration
selftests/bpf: test_xdp_veth: Use unique names
selftests/bpf: test_xdp_veth: Add new test cases for XDP flags
selftests/bpf: Optionally select broadcasting flags
selftests/bpf: test_xdp_veth: Add XDP broadcast redirection tests
selftests/bpf: test_xdp_veth: Add XDP program on egress test
selftests/bpf: Remove test_xdp_redirect_multi.sh
tools/testing/selftests/bpf/Makefile | 2 -
tools/testing/selftests/bpf/network_helpers.c | 17 +
tools/testing/selftests/bpf/network_helpers.h | 12 +
.../selftests/bpf/prog_tests/test_xdp_veth.c | 588 ++++++++++++++++-----
.../testing/selftests/bpf/progs/xdp_redirect_map.c | 89 ++++
.../selftests/bpf/progs/xdp_redirect_multi_kern.c | 41 +-
.../selftests/bpf/test_xdp_redirect_multi.sh | 214 --------
tools/testing/selftests/bpf/xdp_redirect_multi.c | 226 --------
8 files changed, 615 insertions(+), 574 deletions(-)
---
base-commit: 421ec9c8f46a25743870a8cbaff76de293752e00
change-id: 20250103-redirect-multi-245d6eafb5d1
Best regards,
--
Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
PTRACE_SET_SYSCALL_INFO is a generic ptrace API that complements
PTRACE_GET_SYSCALL_INFO by letting the ptracer modify details of
system calls the tracee is blocked in.
This API allows ptracers to obtain and modify system call details
in a straightforward and architecture-agnostic way.
Current implementation supports changing only those bits of system call
information that are used by strace, namely, syscall number, syscall
arguments, and syscall return value.
Support of changing additional details returned by PTRACE_GET_SYSCALL_INFO,
such as instruction pointer and stack pointer, could be added later if
needed, by using struct ptrace_syscall_info.flags to specify the additional
details that should be set. Currently, "flags" and "reserved" fields of
struct ptrace_syscall_info must be initialized with zeroes; "arch",
"instruction_pointer", and "stack_pointer" fields are ignored.
PTRACE_SET_SYSCALL_INFO currently supports only PTRACE_SYSCALL_INFO_ENTRY,
PTRACE_SYSCALL_INFO_EXIT, and PTRACE_SYSCALL_INFO_SECCOMP operations.
Other operations could be added later if needed.
Ideally, PTRACE_SET_SYSCALL_INFO should have been introduced along with
PTRACE_GET_SYSCALL_INFO, but it didn't happen. The last straw that
convinced me to implement PTRACE_SET_SYSCALL_INFO was apparent failure
to provide an API of changing the first system call argument on riscv
architecture [1].
ptrace(2) man page:
long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);
...
PTRACE_SET_SYSCALL_INFO
Modify information about the system call that caused the stop.
The "data" argument is a pointer to struct ptrace_syscall_info
that specifies the system call information to be set.
The "addr" argument should be set to sizeof(struct ptrace_syscall_info)).
[1] https://lore.kernel.org/all/59505464-c84a-403d-972f-d4b2055eeaac@gmail.com/
Notes:
v4:
* Split out syscall_set_return_value() for hexagon into a separate patch
* s390: Change the style of syscall_set_arguments() implementation as
requested
* Add more Reviewed-by
* v3: https://lore.kernel.org/all/20250128091445.GA8257@strace.io/
v3:
* powerpc: Submit syscall_set_return_value() fix for "sc" case separately
* mips: Do not introduce erroneous argument truncation on mips n32,
add a detailed description to the commit message of the
mips_get_syscall_arg() change
* ptrace: Add explicit padding to the end of struct ptrace_syscall_info,
simplify obtaining of user ptrace_syscall_info,
do not introduce PTRACE_SYSCALL_INFO_SIZE_VER0
* ptrace: Change the return type of ptrace_set_syscall_info_* functions
from "unsigned long" to "int"
* ptrace: Add -ERANGE check to ptrace_set_syscall_info_exit(),
add comments to -ERANGE checks
* ptrace: Update comments about supported syscall stops
* selftests: Extend set_syscall_info test, fix for mips n32
* Add Tested-by and Reviewed-by
v2:
* Add patch to fix syscall_set_return_value() on powerpc
* Add patch to fix mips_get_syscall_arg() on mips
* Add syscall_set_return_value() implementation on hexagon
* Add syscall_set_return_value() invocation to syscall_set_nr()
on arm and arm64.
* Fix syscall_set_nr() and mips_set_syscall_arg() on mips
* Add a comment to syscall_set_nr() on arc, powerpc, s390, sh,
and sparc
* Remove redundant ptrace_syscall_info.op assignments in
ptrace_get_syscall_info_*
* Minor style tweaks in ptrace_get_syscall_info_op()
* Remove syscall_set_return_value() invocation from
ptrace_set_syscall_info_entry()
* Skip syscall_set_arguments() invocation in case of syscall number -1
in ptrace_set_syscall_info_entry()
* Split ptrace_syscall_info.reserved into ptrace_syscall_info.reserved
and ptrace_syscall_info.flags
* Use __kernel_ulong_t instead of unsigned long in set_syscall_info test
v1:
Dmitry V. Levin (7):
mips: fix mips_get_syscall_arg() for o32
hexagon: add syscall_set_return_value()
syscall.h: add syscall_set_arguments()
syscall.h: introduce syscall_set_nr()
ptrace_get_syscall_info: factor out ptrace_get_syscall_info_op
ptrace: introduce PTRACE_SET_SYSCALL_INFO request
selftests/ptrace: add a test case for PTRACE_SET_SYSCALL_INFO
arch/arc/include/asm/syscall.h | 25 +
arch/arm/include/asm/syscall.h | 37 ++
arch/arm64/include/asm/syscall.h | 29 +
arch/csky/include/asm/syscall.h | 13 +
arch/hexagon/include/asm/syscall.h | 21 +
arch/loongarch/include/asm/syscall.h | 15 +
arch/m68k/include/asm/syscall.h | 7 +
arch/microblaze/include/asm/syscall.h | 7 +
arch/mips/include/asm/syscall.h | 70 ++-
arch/nios2/include/asm/syscall.h | 16 +
arch/openrisc/include/asm/syscall.h | 13 +
arch/parisc/include/asm/syscall.h | 19 +
arch/powerpc/include/asm/syscall.h | 20 +
arch/riscv/include/asm/syscall.h | 16 +
arch/s390/include/asm/syscall.h | 21 +
arch/sh/include/asm/syscall_32.h | 24 +
arch/sparc/include/asm/syscall.h | 22 +
arch/um/include/asm/syscall-generic.h | 19 +
arch/x86/include/asm/syscall.h | 43 ++
arch/xtensa/include/asm/syscall.h | 18 +
include/asm-generic/syscall.h | 30 +
include/uapi/linux/ptrace.h | 7 +-
kernel/ptrace.c | 179 +++++-
tools/testing/selftests/ptrace/Makefile | 2 +-
.../selftests/ptrace/set_syscall_info.c | 514 ++++++++++++++++++
25 files changed, 1140 insertions(+), 47 deletions(-)
create mode 100644 tools/testing/selftests/ptrace/set_syscall_info.c
--
ldv
Commit 4094871db1d6 ("udp: only do GSO if # of segs > 1") avoided GSO
for small packets. But the kernel currently dismisses GSO requests only
after checking MTU/PMTU on gso_size. This means any packets, regardless
of their payload sizes, could be dropped when PMTU becomes smaller than
requested gso_size. We encountered this issue in production and it
caused a reliability problem that new QUIC connection cannot be
established before PMTU cache expired, while non GSO sockets still
worked fine at the same time.
Ideally, do not check any GSO related constraints when payload size is
smaller than requested gso_size, and return EMSGSIZE instead of EINVAL
on MTU/PMTU check failure to be more specific on the error cause.
Fixes: 4094871db1d6 ("udp: only do GSO if # of segs > 1")
Signed-off-by: Yan Zhai <yan(a)cloudflare.com>
Suggested-by: Willem de Bruijn <willemdebruijn.kernel(a)gmail.com>
---
v2->v3: simplify the code; adding two test cases
v1->v2: add a missing MTU check when fall back to no GSO mode; Fixed up
commit message to be more precise.
v2: https://lore.kernel.org/netdev/Z5swit7ykNRbJFMS@debian.debian/T/#u
v1: https://lore.kernel.org/all/Z5cgWh%2F6bRQm9vVU@debian.debian/
---
net/ipv4/udp.c | 4 ++--
net/ipv6/udp.c | 4 ++--
tools/testing/selftests/net/udpgso.c | 26 ++++++++++++++++++++++++++
3 files changed, 30 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c472c9a57cf6..a9bb9ce5438e 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1141,9 +1141,9 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
const int hlen = skb_network_header_len(skb) +
sizeof(struct udphdr);
- if (hlen + cork->gso_size > cork->fragsize) {
+ if (hlen + min(datalen, cork->gso_size) > cork->fragsize) {
kfree_skb(skb);
- return -EINVAL;
+ return -EMSGSIZE;
}
if (datalen > cork->gso_size * UDP_MAX_SEGMENTS) {
kfree_skb(skb);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 6671daa67f4f..c6ea438b5c75 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1389,9 +1389,9 @@ static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6,
const int hlen = skb_network_header_len(skb) +
sizeof(struct udphdr);
- if (hlen + cork->gso_size > cork->fragsize) {
+ if (hlen + min(datalen, cork->gso_size) > cork->fragsize) {
kfree_skb(skb);
- return -EINVAL;
+ return -EMSGSIZE;
}
if (datalen > cork->gso_size * UDP_MAX_SEGMENTS) {
kfree_skb(skb);
diff --git a/tools/testing/selftests/net/udpgso.c b/tools/testing/selftests/net/udpgso.c
index 3f2fca02fec5..36ff28af4b19 100644
--- a/tools/testing/selftests/net/udpgso.c
+++ b/tools/testing/selftests/net/udpgso.c
@@ -102,6 +102,19 @@ struct testcase testcases_v4[] = {
.gso_len = CONST_MSS_V4,
.r_num_mss = 1,
},
+ {
+ /* datalen <= MSS < gso_len: will fall back to no GSO */
+ .tlen = CONST_MSS_V4,
+ .gso_len = CONST_MSS_V4 + 1,
+ .r_num_mss = 0,
+ .r_len_last = CONST_MSS_V4,
+ },
+ {
+ /* MSS < datalen < gso_len: fail */
+ .tlen = CONST_MSS_V4 + 1,
+ .gso_len = CONST_MSS_V4 + 2,
+ .tfail = true,
+ },
{
/* send a single MSS + 1B */
.tlen = CONST_MSS_V4 + 1,
@@ -205,6 +218,19 @@ struct testcase testcases_v6[] = {
.gso_len = CONST_MSS_V6,
.r_num_mss = 1,
},
+ {
+ /* datalen <= MSS < gso_len: will fall back to no GSO */
+ .tlen = CONST_MSS_V6,
+ .gso_len = CONST_MSS_V6 + 1,
+ .r_num_mss = 0,
+ .r_len_last = CONST_MSS_V6,
+ },
+ {
+ /* MSS < datalen < gso_len: fail */
+ .tlen = CONST_MSS_V6 + 1,
+ .gso_len = CONST_MSS_V6 + 2,
+ .tfail = true
+ },
{
/* send a single MSS + 1B */
.tlen = CONST_MSS_V6 + 1,
--
2.30.2
After submitting the patch implementing the SO_RCVPRIORITY socket option
(https://lore.kernel.org/netdev/20241213084457.45120-5-annaemesenyiri@gmail.…),
it was requested to include a test for the functionality. As a first step, write
a test that also validates the SO_RCVMARK value, since no existing test covers
it. If this combined test is not suitable, I will provide a standalone
test specifically for SO_RCVPRIORITY and submit it separately.
Anna Emese Nyiri (1):
add support for testing SO_RCVMARK and SO_RCVPRIORITY
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/so_rcv_listener.c | 147 ++++++++++++++++++
tools/testing/selftests/net/test_so_rcv.sh | 56 +++++++
3 files changed, 204 insertions(+)
create mode 100644 tools/testing/selftests/net/so_rcv_listener.c
create mode 100755 tools/testing/selftests/net/test_so_rcv.sh
--
2.43.0
This patch allows progs to elide a null check on statically known map
lookup keys. In other words, if the verifier can statically prove that
the lookup will be in-bounds, allow the prog to drop the null check.
This is useful for two reasons:
1. Large numbers of nullness checks (especially when they cannot fail)
unnecessarily pushes prog towards BPF_COMPLEXITY_LIMIT_JMP_SEQ.
2. It forms a tighter contract between programmer and verifier.
For (1), bpftrace is starting to make heavier use of percpu scratch
maps. As a result, for user scripts with large number of unrolled loops,
we are starting to hit jump complexity verification errors. These
percpu lookups cannot fail anyways, as we only use static key values.
Eliding nullness probably results in less work for verifier as well.
For (2), percpu scratch maps are often used as a larger stack, as the
currrent stack is limited to 512 bytes. In these situations, it is
desirable for the programmer to express: "this lookup should never fail,
and if it does, it means I messed up the code". By omitting the null
check, the programmer can "ask" the verifier to double check the logic.
=== Changelog ===
Changes in v7:
* Use more accurate frame number when marking precise
* Add test for non-stack key
* Test for marking stack slot precise
Changes in v6:
* Use is_spilled_scalar_reg() helper and remove unnecessary comment
* Add back deleted selftest with different helper to dirty dst buffer
* Check size of spill is exactly key_size and update selftests
* Read slot_type from correct offset into the spi
* Rewrite selftests in C where possible
* Mark constant map keys as precise
Changes in v5:
* Dropped all acks
* Use s64 instead of long for const_map_key
* Ensure stack slot contains spilled reg before accessing spilled_ptr
* Ensure spilled reg is a scalar before accessing tnum const value
* Fix verifier selftest for 32-bit write to write at 8 byte alignment
to ensure spill is tracked
* Introduce more precise tracking of helper stack accesses
* Do constant map key extraction as part of helper argument processing
and then remove duplicated stack checks
* Use ret_flag instead of regs[BPF_REG_0].type
* Handle STACK_ZERO
* Fix bug in bpf_load_hdr_opt() arg annotation
Changes in v4:
* Only allow for CAP_BPF
* Add test for stack growing upwards
* Improve comment about stack growing upwards
Changes in v3:
* Check if stack is (erroneously) growing upwards
* Mention in commit message why existing tests needed change
Changes in v2:
* Added a check for when R2 is not a ptr to stack
* Added a check for when stack is uninitialized (no stack slot yet)
* Updated existing tests to account for null elision
* Added test case for when R2 can be both const and non-const
Daniel Xu (5):
bpf: verifier: Add missing newline on verbose() call
bpf: tcp: Mark bpf_load_hdr_opt() arg2 as read-write
bpf: verifier: Refactor helper access type tracking
bpf: verifier: Support eliding map lookup nullness
bpf: selftests: verifier: Add nullness elision tests
kernel/bpf/verifier.c | 139 ++++++++++---
net/core/filter.c | 2 +-
.../testing/selftests/bpf/progs/dynptr_fail.c | 6 +-
tools/testing/selftests/bpf/progs/iters.c | 14 +-
.../selftests/bpf/progs/map_kptr_fail.c | 2 +-
.../selftests/bpf/progs/test_global_func10.c | 2 +-
.../selftests/bpf/progs/uninit_stack.c | 5 +-
.../bpf/progs/verifier_array_access.c | 188 ++++++++++++++++++
.../bpf/progs/verifier_basic_stack.c | 2 +-
.../selftests/bpf/progs/verifier_const_or.c | 4 +-
.../progs/verifier_helper_access_var_len.c | 12 +-
.../selftests/bpf/progs/verifier_int_ptr.c | 2 +-
.../selftests/bpf/progs/verifier_map_in_map.c | 2 +-
.../selftests/bpf/progs/verifier_mtu.c | 2 +-
.../selftests/bpf/progs/verifier_raw_stack.c | 4 +-
.../selftests/bpf/progs/verifier_unpriv.c | 2 +-
.../selftests/bpf/progs/verifier_var_off.c | 8 +-
tools/testing/selftests/bpf/verifier/calls.c | 2 +-
.../testing/selftests/bpf/verifier/map_kptr.c | 2 +-
19 files changed, 331 insertions(+), 69 deletions(-)
--
2.47.1
Fix the build failure caused by the undefined `CLONE_NEWTIME`.
Include the `linux/sched.h` header file where the function is defined to
ensure successful compilation of the selftests.
Signed-off-by: Purva Yeshi <purvayeshi550(a)gmail.com>
---
tools/testing/selftests/vDSO/vdso_test_getrandom.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/vDSO/vdso_test_getrandom.c b/tools/testing/selftests/vDSO/vdso_test_getrandom.c
index 95057f7567db..b2c9cf15878b 100644
--- a/tools/testing/selftests/vDSO/vdso_test_getrandom.c
+++ b/tools/testing/selftests/vDSO/vdso_test_getrandom.c
@@ -29,6 +29,8 @@
#include "vdso_config.h"
#include "vdso_call.h"
+#include <linux/sched.h>
+
#ifndef timespecsub
#define timespecsub(tsp, usp, vsp) \
do { \
--
2.34.1
A few cleanups and optimizations for the management of the kernel
configuration.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Changes in v2:
- Rebase on current torvalds/master
- Call "make defconfig" separately
- Link to v1: https://lore.kernel.org/r/20250122-nolibc-config-v1-0-a697db968b49@weisssch…
---
Thomas Weißschuh (5):
selftests/nolibc: drop custom EXTRACONFIG functionality
selftests/nolibc: drop call to prepare target
selftests/nolibc: drop call to mrproper target
selftests/nolibc: execute defconfig before other targets
selftests/nolibc: always keep test kernel configuration up to date
tools/testing/selftests/nolibc/Makefile | 17 +++++------------
tools/testing/selftests/nolibc/run-tests.sh | 7 +++----
2 files changed, 8 insertions(+), 16 deletions(-)
---
base-commit: 21266b8df5224c4f677acf9f353eecc9094731f0
change-id: 20250122-nolibc-config-d639e1612c93
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
RFC v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=920056&state=*
=======
RFC v2 addresses much of the feedback from RFC v1. I plan on sending
something close to this as net-next reopens, sending it slightly early
to get feedback if any.
Major changes:
--------------
- much improved UAPI as suggested by Stan. We now interpret the iov_base
of the passed in iov from userspace as the offset into the dmabuf to
send from. This removes the need to set iov.iov_base = NULL which may
be confusing to users, and enables us to send multiple iovs in the
same sendmsg() call. ncdevmem and the docs show a sample use of that.
- Removed the duplicate dmabuf iov_iter in binding->iov_iter. I think
this is good improvment as it was confusing to keep track of
2 iterators for the same sendmsg, and mistracking both iterators
caused a couple of bugs reported in the last iteration that are now
resolved with this streamlining.
- Improved test coverage in ncdevmem. Now muliple sendmsg() are tested,
and sending multiple iovs in the same sendmsg() is tested.
- Fixed issue where dmabuf unmapping was happening in invalid context
(Stan).
====================================================================
The TX path had been dropped from the Device Memory TCP patch series
post RFCv1 [1], to make that series slightly easier to review. This
series rebases the implementation of the TX path on top of the
net_iov/netmem framework agreed upon and merged. The motivation for
the feature is thoroughly described in the docs & cover letter of the
original proposal, so I don't repeat the lengthy descriptions here, but
they are available in [1].
Sending this series as RFC as the winder closure is immenient. I plan on
reposting as non-RFC once the tree re-opens, addressing any feedback
I receive in the meantime.
Full outline on usage of the TX path is detailed in the documentation
added in the first patch.
Test example is available via the kselftest included in the series as well.
The series is relatively small, as the TX path for this feature largely
piggybacks on the existing MSG_ZEROCOPY implementation.
Patch Overview:
---------------
1. Documentation & tests to give high level overview of the feature
being added.
2. Add netmem refcounting needed for the TX path.
3. Devmem TX netlink API.
4. Devmem TX net stack implementation.
Testing:
--------
Testing is very similar to devmem TCP RX path. The ncdevmem test used
for the RX path is now augemented with client functionality to test TX
path.
* Test Setup:
Kernel: net-next with this RFC and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Performance results are not included with this version, unfortunately.
I'm having issues running the dma-buf exporter driver against the
upstream kernel on my test setup. The issues are specific to that
dma-buf exporter and do not affect this patch series. I plan to follow
up this series with perf fixes if the tests point to issues once they're
up and running.
Special thanks to Stan who took a stab at rebasing the TX implementation
on top of the netmem/net_iov framework merged. Parts of his proposal [2]
that are reused as-is are forked off into their own patches to give full
credit.
[1] https://lore.kernel.org/netdev/20240909054318.1809580-1-almasrymina@google.…
[2] https://lore.kernel.org/netdev/20240913150913.1280238-2-sdf@fomichev.me/T/#…
Cc: sdf(a)fomichev.me
Cc: asml.silence(a)gmail.com
Cc: dw(a)davidwei.uk
Cc: Jamal Hadi Salim <jhs(a)mojatatu.com>
Cc: Victor Nogueira <victor(a)mojatatu.com>
Cc: Pedro Tammela <pctammela(a)mojatatu.com>
Mina Almasry (5):
net: add devmem TCP TX documentation
selftests: ncdevmem: Implement devmem TCP TX
net: add get_netmem/put_netmem support
net: devmem: Implement TX path
net: devmem: make dmabuf unbinding scheduled work
Stanislav Fomichev (1):
net: devmem: TCP tx netlink api
Documentation/netlink/specs/netdev.yaml | 12 +
Documentation/networking/devmem.rst | 144 ++++++++-
include/linux/skbuff.h | 15 +-
include/linux/skbuff_ref.h | 4 +-
include/net/netmem.h | 3 +
include/net/sock.h | 1 +
include/uapi/linux/netdev.h | 1 +
include/uapi/linux/uio.h | 6 +-
net/core/datagram.c | 41 ++-
net/core/devmem.c | 110 ++++++-
net/core/devmem.h | 70 ++++-
net/core/netdev-genl-gen.c | 13 +
net/core/netdev-genl-gen.h | 1 +
net/core/netdev-genl.c | 67 ++++-
net/core/skbuff.c | 36 ++-
net/core/sock.c | 8 +
net/ipv4/tcp.c | 36 ++-
net/vmw_vsock/virtio_transport_common.c | 3 +-
tools/include/uapi/linux/netdev.h | 1 +
.../selftests/drivers/net/hw/ncdevmem.c | 276 +++++++++++++++++-
20 files changed, 802 insertions(+), 46 deletions(-)
--
2.48.1.362.g079036d154-goog
The current implementation of netconsole sends all log messages in
parallel, which can lead to an intermixed and interleaved output on the
receiving side. This makes it challenging to demultiplex the messages
and attribute them to their originating CPUs.
As a result, users and developers often struggle to effectively analyze
and debug the parallel log output received through netconsole.
Example of a message got from produciton hosts:
------------[ cut here ]------------
------------[ cut here ]------------
refcount_t: saturated; leaking memory.
WARNING: CPU: 2 PID: 1613668 at lib/refcount.c:22 refcount_warn_saturate+0x5e/0xe0
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 26 PID: 4139916 at lib/refcount.c:25 refcount_warn_saturate+0x7d/0xe0
Modules linked in: bpf_preload(E) vhost_net(E) tun(E) vhost(E)
This series of patches introduces a new feature to the netconsole
subsystem that allows the automatic population of the CPU number in the
userdata field for each log message. This enhancement provides several
benefits:
* Improved demultiplexing of parallel log output: When multiple CPUs are
sending messages concurrently, the added CPU number in the userdata
makes it easier to differentiate and attribute the messages to their
originating CPUs.
* Better visibility into message sources: The CPU number information
gives users and developers more insight into which specific CPU a
particular log message came from, which can be valuable for debugging
and analysis.
The changes in this series are as follows Patches::
Patch "consolidate send buffers into netconsole_target struct"
=================================================
Move the static buffers to netconsole target, from static declaration
in send_msg_no_fragmentation() and send_msg_fragmented().
Patch "netconsole: Rename userdata to extradata"
=================================================
Create the a concept of extradata, which encompasses the concept of
userdata and the upcoming sysdatao
Sysdata is a new concept being added, which is basically fields that are
populated by the kernel. At this time only the CPU#, but, there is a
desire to add current task name, kernel release version, etc.
Patch "netconsole: Helper to count number of used entries"
===========================================================
Create a simple helper to count number of entries in extradata. I am
separating this in a function since it will need to count userdata and
sysdata. For instance, when the user adds an extra userdata, we need to
check if there is space, counting the previous data entries (from
userdata and cpu data)
Patch "Introduce configfs helpers for sysdata features"
======================================================
Create the concept of sysdata feature in the netconsole target, and
create the configfs helpers to enable the bit in nt->sysdata
Patch "Include sysdata in extradata entry count"
================================================
Add the concept of sysdata when counting for available space in the
buffer. This will protect users from creating new userdata/sysdata if
there is no more space
Patch "netconsole: add support for sysdata and CPU population"
===============================================================
This is the core patch. Basically add a new option to enable automatic
CPU number population in the netconsole userdata Provides a new "cpu_nr"
sysfs attribute to control this feature
Patch "netconsole: selftest: test CPU number auto-population"
=============================================================
Expands the existing netconsole selftest to verify the CPU number
auto-population functionality Ensures the received netconsole messages
contain the expected "cpu=<CPU>" entry in the message. Test different
permutation with userdata
Patch "netconsole: docs: Add documentation for CPU number auto-population"
=============================================================================
Updates the netconsole documentation to explain the new CPU number
auto-population feature Provides instructions on how to enable and use
the feature
I believe these changes will be a valuable addition to the netconsole
subsystem, enhancing its usefulness for kernel developers and users.
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes in v3:
- Moved the buffer into netconsole_target, avoiding static functions in
the send path (Jakub).
- Fix a documentation error (Randy Dunlap)
- Created a function that handle all the extradata, consolidating it in
a single place (Jakub)
- Split the patch even more, trying to simplify the review.
- Link to v2: https://lore.kernel.org/r/20250115-netcon_cpu-v2-0-95971b44dc56@debian.org
Changes in v2:
- Create the concept of extradata and sysdata. This will make the design
easier to understand, and the code easier to read.
* Basically extradata encompasses userdata and the new sysdata.
Userdata originates from user, and sysdata originates in kernel.
- Improved the test to send from a very specific CPU, which can be
checked to be correct on the other side, as suggested by Jakub.
- Fixed a bug where CPU # was populated at the wrong place
- Link to v1: https://lore.kernel.org/r/20241113-netcon_cpu-v1-0-d187bf7c0321@debian.org
---
Breno Leitao (8):
netconsole: consolidate send buffers into netconsole_target struct
netconsole: Rename userdata to extradata
netconsole: Helper to count number of used entries
netconsole: Introduce configfs helpers for sysdata features
netconsole: Include sysdata in extradata entry count
netconsole: add support for sysdata and CPU population
netconsole: selftest: test for sysdata CPU
netconsole: docs: Add documentation for CPU number auto-population
Documentation/networking/netconsole.rst | 45 ++++
drivers/net/netconsole.c | 260 ++++++++++++++++-----
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 17 ++
.../selftests/drivers/net/netcons_sysdata.sh | 167 +++++++++++++
5 files changed, 426 insertions(+), 64 deletions(-)
---
base-commit: fa10c9f5aa705deb43fd65623508074bee942764
change-id: 20241108-netcon_cpu-ce3917e88f4b
Best regards,
--
Breno Leitao <leitao(a)debian.org>
As noted in [0], SeaBIOS (QEMU default) makes a mess of the terminal,
qboot does not.
It turns out this is actually useful with kunit.py, since the user is
exposed to this issue if they set --raw_output=all.
qboot is also faster than SeaBIOS, but it's is marginal for this
usecase.
[0] https://lore.kernel.org/all/CA+i-1C0wYb-gZ8Mwh3WSVpbk-LF-Uo+njVbASJPe1WXDUR…
Both SeaBIOS and qboot are x86-specific.
Signed-off-by: Brendan Jackman <jackmanb(a)google.com>
---
tools/testing/kunit/qemu_configs/x86_64.py | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/kunit/qemu_configs/x86_64.py b/tools/testing/kunit/qemu_configs/x86_64.py
index dc794907686304b325dbe180149169dd79bcd44f..4a6bf4e048f5b05c889e3b9b03046f14cc9b0bcc 100644
--- a/tools/testing/kunit/qemu_configs/x86_64.py
+++ b/tools/testing/kunit/qemu_configs/x86_64.py
@@ -7,4 +7,6 @@ CONFIG_SERIAL_8250_CONSOLE=y''',
qemu_arch='x86_64',
kernel_path='arch/x86/boot/bzImage',
kernel_command_line='console=ttyS0',
- extra_qemu_params=[])
+ # qboot is faster than SeaBIOS and doesn't mess up
+ # the terminal.
+ extra_qemu_params=['-bios', 'qboot.rom'])
---
base-commit: 8ea24baaaa869adeb39c6b9ce7542657a7251b56
change-id: 20250124-kunit-qboot-5201945f7e86
Best regards,
--
Brendan Jackman <jackmanb(a)google.com>
This series introduce the dynptr counterpart of the
bpf_probe_read_{kernel,user} helpers and bpf_copy_from_user helper.
These helpers are helpful for reading variable-length data from kernel
memory into dynptr without going through an intermediate buffer.
Link: https://lore.kernel.org/bpf/MEYP282MB2312CFCE5F7712FDE313215AC64D2@MEYP282M…
Suggested-by: Andrii Nakryiko <andrii.nakryiko(a)gmail.com>
Signed-off-by: Levi Zim <rsworktech(a)outlook.com>
---
Changes in v2:
- Add missing bpf-next prefix. I forgot it in the initial series. Sorry
about that.
- Link to v1: https://lore.kernel.org/r/20250125-bpf_dynptr_probe-v1-0-c3cb121f6951@outlo…
---
Levi Zim (7):
bpf: Implement bpf_probe_read_kernel_dynptr helper
bpf: Implement bpf_probe_read_user_dynptr helper
bpf: Implement bpf_copy_from_user_dynptr helper
tools headers UAPI: Update tools's copy of bpf.h header
selftests/bpf: probe_read_kernel_dynptr test
selftests/bpf: probe_read_user_dynptr test
selftests/bpf: copy_from_user_dynptr test
include/linux/bpf.h | 3 +
include/uapi/linux/bpf.h | 49 ++++++++++
kernel/bpf/helpers.c | 53 ++++++++++-
kernel/trace/bpf_trace.c | 72 ++++++++++++++
tools/include/uapi/linux/bpf.h | 49 ++++++++++
tools/testing/selftests/bpf/prog_tests/dynptr.c | 45 ++++++++-
tools/testing/selftests/bpf/progs/dynptr_success.c | 106 +++++++++++++++++++++
7 files changed, 374 insertions(+), 3 deletions(-)
---
base-commit: d0d106a2bd21499901299160744e5fe9f4c83ddb
change-id: 20250124-bpf_dynptr_probe-ab483c554f1a
Best regards,
--
Levi Zim <rsworktech(a)outlook.com>
Greetings:
This is an attempt to followup on something Jakub asked me about [1],
adding an xsk attribute to queues and more clearly documenting which
queues are linked to NAPIs...
But:
1. I couldn't pick a good "thing" to expose as "xsk", so I chose 0 or 1.
Happy to take suggestions on what might be better to expose for the
xsk queue attribute.
2. I create a silly C helper program to create an XDP socket in order to
add a new test to queues.py. I'm not particularly good at python
programming, so there's probably a better way to do this. Notably,
python does not seem to have a socket.AF_XDP, so I needed the C
helper to make a socket and bind it to a queue to perform the test.
Tested this on my mlx5 machine and the test seems to pass.
Happy to take any suggestions / feedback on this one; sorry in advance
if I missed many obvious better ways to do things.
Thanks,
Joe
[1]: https://lore.kernel.org/netdev/20250113143109.60afa59a@kernel.org/
Joe Damato (2):
netdev-genl: Add an XSK attribute to queues
selftests: drv-net: Test queue xsk attribute
Documentation/netlink/specs/netdev.yaml | 10 ++-
include/uapi/linux/netdev.h | 1 +
net/core/netdev-genl.c | 6 ++
tools/include/uapi/linux/netdev.h | 1 +
tools/testing/selftests/drivers/.gitignore | 1 +
tools/testing/selftests/drivers/net/Makefile | 3 +
tools/testing/selftests/drivers/net/queues.py | 32 ++++++-
.../selftests/drivers/net/xdp_helper.c | 90 +++++++++++++++++++
8 files changed, 141 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/xdp_helper.c
base-commit: 0ad9617c78acbc71373fb341a6f75d4012b01d69
--
2.25.1
xtheadvector is a custom extension that is based upon riscv vector
version 0.7.1 [1]. All of the vector routines have been modified to
support this alternative vector version based upon whether xtheadvector
was determined to be supported at boot.
vlenb is not supported on the existing xtheadvector hardware, so a
devicetree property thead,vlenb is added to provide the vlenb to Linux.
There is a new hwprobe key RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 that is
used to request which thead vendor extensions are supported on the
current platform. This allows future vendors to allocate hwprobe keys
for their vendor.
Support for xtheadvector is also added to the vector kselftests.
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
[1] https://github.com/T-head-Semi/thead-extension-spec/blob/95358cb2cca9489361…
---
This series is a continuation of a different series that was fragmented
into two other series in an attempt to get part of it merged in the 6.10
merge window. The split-off series did not get merged due to a NAK on
the series that added the generic riscv,vlenb devicetree entry. This
series has converted riscv,vlenb to thead,vlenb to remedy this issue.
The original series is titled "riscv: Support vendor extensions and
xtheadvector" [3].
I have tested this with an Allwinner Nezha board. I used SkiffOS [1] to
manage building the image, but upgraded the U-Boot version to Samuel
Holland's more up-to-date version [2] and changed out the device tree
used by U-Boot with the device trees that are present in upstream linux
and this series. Thank you Samuel for all of the work you did to make
this task possible.
[1] https://github.com/skiffos/SkiffOS/tree/master/configs/allwinner/nezha
[2] https://github.com/smaeul/u-boot/commit/2e89b706f5c956a70c989cd31665f1429e9…
[3] https://lore.kernel.org/all/20240503-dev-charlie-support_thead_vector_6_9-v…
[4] https://lore.kernel.org/lkml/20240719-support_vendor_extensions-v3-4-0af758…
---
Changes in v11:
- Fix an issue where the mitigation was not being properly skipped when
requested
- Fix vstate_discard issue
- Fix issue when -1 was passed into
__riscv_isa_vendor_extension_available()
- Remove some artifacts from being placed in the test directory
- Link to v10: https://lore.kernel.org/r/20240911-xtheadvector-v10-0-8d3930091246@rivosinc…
Changes in v10:
- In DT probing disable vector with new function to clear vendor
extension bits for xtheadvector
- Add ghostwrite mitigations for c9xx CPUs. This disables xtheadvector
unless mitigations=off is set as a kernel boot arg
- Link to v9: https://lore.kernel.org/r/20240806-xtheadvector-v9-0-62a56d2da5d0@rivosinc.…
Changes in v9:
- Rebase onto palmer's for-next
- Fix sparse error in arch/riscv/kernel/vendor_extensions/thead.c
- Fix maybe-uninitialized warning in arch/riscv/include/asm/vendor_extensions/vendor_hwprobe.h
- Wrap some long lines
- Link to v8: https://lore.kernel.org/r/20240724-xtheadvector-v8-0-cf043168e137@rivosinc.…
Changes in v8:
- Rebase onto palmer's for-next
- Link to v7: https://lore.kernel.org/r/20240724-xtheadvector-v7-0-b741910ada3e@rivosinc.…
Changes in v7:
- Add defs for has_xtheadvector_no_alternatives() and has_xtheadvector()
when vector disabled. (Palmer)
- Link to v6: https://lore.kernel.org/r/20240722-xtheadvector-v6-0-c9af0130fa00@rivosinc.…
Changes in v6:
- Fix return type of is_vector_supported()/is_xthead_supported() to be bool
- Link to v5: https://lore.kernel.org/r/20240719-xtheadvector-v5-0-4b485fc7d55f@rivosinc.…
Changes in v5:
- Rebase on for-next
- Link to v4: https://lore.kernel.org/r/20240702-xtheadvector-v4-0-2bad6820db11@rivosinc.…
Changes in v4:
- Replace inline asm with C (Samuel)
- Rename VCSRs to CSRs (Samuel)
- Replace .insn directives with .4byte directives
- Link to v3: https://lore.kernel.org/r/20240619-xtheadvector-v3-0-bff39eb9668e@rivosinc.…
Changes in v3:
- Add back Heiko's signed-off-by (Conor)
- Mark RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 as a bitmask
- Link to v2: https://lore.kernel.org/r/20240610-xtheadvector-v2-0-97a48613ad64@rivosinc.…
Changes in v2:
- Removed extraneous references to "riscv,vlenb" (Jess)
- Moved declaration of "thead,vlenb" into cpus.yaml and added
restriction that it's only applicable to thead cores (Conor)
- Check CONFIG_RISCV_ISA_XTHEADVECTOR instead of CONFIG_RISCV_ISA_V for
thead,vlenb (Jess)
- Fix naming of hwprobe variables (Evan)
- Link to v1: https://lore.kernel.org/r/20240609-xtheadvector-v1-0-3fe591d7f109@rivosinc.…
---
Charlie Jenkins (13):
dt-bindings: riscv: Add xtheadvector ISA extension description
dt-bindings: cpus: add a thead vlen register length property
riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree
riscv: Add thead and xtheadvector as a vendor extension
riscv: vector: Use vlenb from DT for thead
riscv: csr: Add CSR encodings for CSR_VXRM/CSR_VXSAT
riscv: Add xtheadvector instruction definitions
riscv: vector: Support xtheadvector save/restore
riscv: hwprobe: Add thead vendor extension probing
riscv: hwprobe: Document thead vendor extensions and xtheadvector extension
selftests: riscv: Fix vector tests
selftests: riscv: Support xtheadvector in vector tests
riscv: Add ghostwrite vulnerability
Heiko Stuebner (1):
RISC-V: define the elements of the VCSR vector CSR
Documentation/arch/riscv/hwprobe.rst | 10 +
Documentation/devicetree/bindings/riscv/cpus.yaml | 19 ++
.../devicetree/bindings/riscv/extensions.yaml | 10 +
arch/riscv/Kconfig.errata | 11 +
arch/riscv/Kconfig.vendor | 26 ++
arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi | 3 +-
arch/riscv/errata/thead/errata.c | 28 ++
arch/riscv/include/asm/bugs.h | 22 ++
arch/riscv/include/asm/cpufeature.h | 2 +
arch/riscv/include/asm/csr.h | 15 +
arch/riscv/include/asm/errata_list.h | 3 +-
arch/riscv/include/asm/hwprobe.h | 3 +-
arch/riscv/include/asm/switch_to.h | 2 +-
arch/riscv/include/asm/vector.h | 222 +++++++++++----
arch/riscv/include/asm/vendor_extensions/thead.h | 47 ++++
.../include/asm/vendor_extensions/thead_hwprobe.h | 19 ++
.../include/asm/vendor_extensions/vendor_hwprobe.h | 37 +++
arch/riscv/include/uapi/asm/hwprobe.h | 3 +-
arch/riscv/include/uapi/asm/vendor/thead.h | 3 +
arch/riscv/kernel/Makefile | 2 +
arch/riscv/kernel/bugs.c | 60 ++++
arch/riscv/kernel/cpufeature.c | 59 +++-
arch/riscv/kernel/kernel_mode_vector.c | 8 +-
arch/riscv/kernel/process.c | 4 +-
arch/riscv/kernel/signal.c | 6 +-
arch/riscv/kernel/sys_hwprobe.c | 5 +
arch/riscv/kernel/vector.c | 24 +-
arch/riscv/kernel/vendor_extensions.c | 10 +
arch/riscv/kernel/vendor_extensions/Makefile | 2 +
arch/riscv/kernel/vendor_extensions/thead.c | 29 ++
.../riscv/kernel/vendor_extensions/thead_hwprobe.c | 19 ++
drivers/base/cpu.c | 3 +
include/linux/cpu.h | 1 +
tools/testing/selftests/riscv/vector/.gitignore | 3 +-
tools/testing/selftests/riscv/vector/Makefile | 17 +-
.../selftests/riscv/vector/v_exec_initval_nolibc.c | 94 +++++++
tools/testing/selftests/riscv/vector/v_helpers.c | 68 +++++
tools/testing/selftests/riscv/vector/v_helpers.h | 8 +
tools/testing/selftests/riscv/vector/v_initval.c | 22 ++
.../selftests/riscv/vector/v_initval_nolibc.c | 68 -----
.../selftests/riscv/vector/vstate_exec_nolibc.c | 20 +-
.../testing/selftests/riscv/vector/vstate_prctl.c | 305 +++++++++++++--------
42 files changed, 1051 insertions(+), 271 deletions(-)
---
base-commit: 0eb512779d642b21ced83778287a0f7a3ca8f2a1
change-id: 20240530-xtheadvector-833d3d17b423
--
- Charlie
Here is a series from Geliang, adding mptcp_subflow bpf_iter support.
We are working on extending MPTCP with BPF, e.g. to control the path
manager -- in charge of the creation, deletion, and announcements of
subflows (paths) -- and the packet scheduler -- in charge of selecting
which available path the next data will be sent to. These extensions
need to iterate over the list of subflows attached to an MPTCP
connection, and do some specific actions via some new kfunc that will be
added later on.
This preparation work is split in different patches:
- Patch 1: extend bpf_skc_to_mptcp_sock() to be called with msk.
- Patch 2: allow using skc_to_mptcp_sock() in CGroup sockopt hooks.
- Patch 3: register some "basic" MPTCP kfunc.
- Patch 4: add mptcp_subflow bpf_iter support. Note that previous
versions of this single patch have already been shared to the
BPF mailing list. The changelog has been kept with a comment,
but the version number has been reset to avoid confusions.
- Patch 5: add kfunc to make sure the msk is valid
- Patch 6: add more MPTCP endpoints in the selftests, in order to create
more than 2 subflows.
- Patch 7: add a very simple test validating mptcp_subflow bpf_iter
support. This test could be written without the new bpf_iter,
but it is there only to make sure this specific feature works
as expected.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Changes in v2:
- Patches 1-2: new ones.
- Patch 3: remove two kfunc, more restrictions. (Martin)
- Patch 4: add BUILD_BUG_ON(), more restrictions. (Martin)
- Patch 7: adaptations due to modifications in patches 1-4.
- Link to v1: https://lore.kernel.org/r/20241108-bpf-next-net-mptcp-bpf_iter-subflows-v1-…
---
Geliang Tang (7):
bpf: Extend bpf_skc_to_mptcp_sock to MPTCP sock
bpf: Allow use of skc_to_mptcp_sock in cg_sockopt
bpf: Register mptcp common kfunc set
bpf: Add mptcp_subflow bpf_iter
bpf: Acquire and release mptcp socket
selftests/bpf: More endpoints for endpoint_init
selftests/bpf: Add mptcp_subflow bpf_iter subtest
include/net/mptcp.h | 4 +-
kernel/bpf/cgroup.c | 2 +
net/core/filter.c | 2 +-
net/mptcp/bpf.c | 113 +++++++++++++++++-
tools/testing/selftests/bpf/bpf_experimental.h | 8 ++
tools/testing/selftests/bpf/prog_tests/mptcp.c | 129 ++++++++++++++++++++-
tools/testing/selftests/bpf/progs/mptcp_bpf.h | 9 ++
.../testing/selftests/bpf/progs/mptcp_bpf_iters.c | 63 ++++++++++
8 files changed, 318 insertions(+), 12 deletions(-)
---
base-commit: dad704ebe38642cd405e15b9c51263356391355c
change-id: 20241108-bpf-next-net-mptcp-bpf_iter-subflows-027f6d87770e
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
This patch series provides workingset reporting of user pages in
lruvecs, of which coldness can be tracked by accessed bits and fd
references. However, the concept of workingset applies generically to
all types of memory, which could be kernel slab caches, discardable
userspace caches (databases), or CXL.mem. Therefore, data sources might
come from slab shrinkers, device drivers, or the userspace.
Another interesting idea might be hugepage workingset, so that we can
measure the proportion of hugepages backing cold memory. However, with
architectures like arm, there may be too many hugepage sizes leading to
a combinatorial explosion when exporting stats to the userspace.
Nonetheless, the kernel should provide a set of workingset interfaces
that is generic enough to accommodate the various use cases, and extensible
to potential future use cases.
Use cases
==========
Job scheduling
On overcommitted hosts, workingset information improves efficiency and
reliability by allowing the job scheduler to have better stats on the
exact memory requirements of each job. This can manifest in efficiency by
landing more jobs on the same host or NUMA node. On the other hand, the
job scheduler can also ensure each node has a sufficient amount of memory
and does not enter direct reclaim or the kernel OOM path. With workingset
information and job priority, the userspace OOM killing or proactive
reclaim policy can kick in before the system is under memory pressure.
If the job shape is very different from the machine shape, knowing the
workingset per-node can also help inform page allocation policies.
Proactive reclaim
Workingset information allows the a container manager to proactively
reclaim memory while not impacting a job's performance. While PSI may
provide a reactive measure of when a proactive reclaim has reclaimed too
much, workingset reporting allows the policy to be more accurate and
flexible.
Ballooning (similar to proactive reclaim)
The last patch of the series extends the virtio-balloon device to report
the guest workingset.
Balloon policies benefit from workingset to more precisely determine the
size of the memory balloon. On end-user devices where memory is scarce and
overcommitted, the balloon sizing in multiple VMs running on the same
device can be orchestrated with workingset reports from each one.
On the server side, workingset reporting allows the balloon controller to
inflate the balloon without causing too much file cache to be reclaimed in
the guest.
Promotion/Demotion
If different mechanisms are used for promition and demotion, workingset
information can help connect the two and avoid pages being migrated back
and forth.
For example, given a promotion hot page threshold defined in reaccess
distance of N seconds (promote pages accessed more often than every N
seconds). The threshold N should be set so that ~80% (e.g.) of pages on
the fast memory node passes the threshold. This calculation can be done
with workingset reports.
To be directly useful for promotion policies, the workingset report
interfaces need to be extended to report hotness and gather hotness
information from the devices[1].
[1]
https://www.opencompute.org/documents/ocp-cms-hotness-tracking-requirements…
Sysfs and Cgroup Interfaces
==========
The interfaces are detailed in the patches that introduce them. The main
idea here is we break down the workingset per-node per-memcg into time
intervals (ms), e.g.
1000 anon=137368 file=24530
20000 anon=34342 file=0
30000 anon=353232 file=333608
40000 anon=407198 file=206052
9223372036854775807 anon=4925624 file=892892
Implementation
==========
The reporting of user pages is based off of MGLRU, and therefore requires
CONFIG_LRU_GEN=y. We would benefit from more MGLRU generations for a more
fine-grained workingset report, but we can already gather a lot of data
with just four generations. The workingset reporting mechanism is gated
behind CONFIG_WORKINGSET_REPORT, and the aging thread is behind
CONFIG_WORKINGSET_REPORT_AGING.
Benchmarks
==========
Ghait Ouled Amar Ben Cheikh has implemented a simple policy and ran Linux
compile and redis benchmarks from openbenchmarking.org. The policy and
runner is referred to as WMO (Workload Memory Optimization).
The results were based on v3 of the series, but v4 doesn't change the core
of the working set reporting and just adds the ballooning counterpart.
The timed Linux kernel compilation benchmark shows improvements in peak
memory usage with a policy of "swap out all bytes colder than 10 seconds
every 40 seconds". A swapfile is configured on SSD.
--------------------------------------------
peak memory usage (with WMO): 4982.61328 MiB
peak memory usage (control): 9569.1367 MiB
peak memory reduction: 47.9%
--------------------------------------------
Benchmark | Experimental |Control | Experimental_Std_Dev | Control_Std_Dev
Timed Linux Kernel Compilation - allmodconfig (sec) | 708.486 (95.91%) | 679.499 (100%) | 0.6% | 0.1%
--------------------------------------------
Seconds, fewer is better
The redis benchmark shows employs the same policy:
--------------------------------------------
peak memory usage (with WMO): 375.9023 MiB
peak memory usage (control): 509.765 MiB
peak memory reduction: 26%
--------------------------------------------
Benchmark | Experimental | Control | Experimental_Std_Dev | Control_Std_Dev
Redis - LPOP (Reqs/sec) | 2023130 (98.22%) | 2059849 (100%) | 1.2% | 2%
Redis - SADD (Reqs/sec) | 2539662 (98.63%) | 2574811 (100%) | 2.3% | 1.4%
Redis - LPUSH (Reqs/sec)| 2024880 (100%) | 2000884 (98.81%) | 1.1% | 0.8%
Redis - GET (Reqs/sec) | 2835764 (100%) | 2763722 (97.46%) | 2.7% | 1.6%
Redis - SET (Reqs/sec) | 2340723 (100%) | 2327372 (99.43%) | 2.4% | 1.8%
--------------------------------------------
Reqs/sec, more is better
The detailed report and benchmarking results are in Ghait's repo:
https://github.com/miloudi98/WMO
Changelog
==========
Changes from PATCH v3 -> v4:
- Added documentation for cgroup-v2
(Waiman Long)
- Fixed types in documentation
(Randy Dunlap)
- Added implementation for the ballooning use case
- Added detailed description of benchmark results
(Andrew Morton)
Changes from PATCH v2 -> v3:
- Fixed typos in commit messages and documentation
(Lance Yang, Randy Dunlap)
- Split out the force_scan patch to be reviewed separately
- Added benchmarks from Ghait Ouled Amar Ben Cheikh
- Fixed reported compile error without CONFIG_MEMCG
Changes from PATCH v1 -> v2:
- Updated selftest to use ksft_test_result_code instead of switch-case
(Muhammad Usama Anjum)
- Included more use cases in the cover letter
(Huang, Ying)
- Added documentation for sysfs and memcg interfaces
- Added an aging-specific struct lru_gen_mm_walk in struct pglist_data
to avoid allocating for each lruvec.
[v1] https://lore.kernel.org/linux-mm/20240504073011.4000534-1-yuanchu@google.co…
[v2] https://lore.kernel.org/linux-mm/20240604020549.1017540-1-yuanchu@google.co…
[v3] https://lore.kernel.org/linux-mm/20240813165619.748102-1-yuanchu@google.com/
Yuanchu Xie (9):
mm: aggregate workingset information into histograms
mm: use refresh interval to rate-limit workingset report aggregation
mm: report workingset during memory pressure driven scanning
mm: extend workingset reporting to memcgs
mm: add kernel aging thread for workingset reporting
selftest: test system-wide workingset reporting
Docs/admin-guide/mm/workingset_report: document sysfs and memcg
interfaces
Docs/admin-guide/cgroup-v2: document workingset reporting
virtio-balloon: add workingset reporting
Documentation/admin-guide/cgroup-v2.rst | 35 +
Documentation/admin-guide/mm/index.rst | 1 +
.../admin-guide/mm/workingset_report.rst | 105 +++
drivers/base/node.c | 6 +
drivers/virtio/virtio_balloon.c | 390 ++++++++++-
include/linux/balloon_compaction.h | 1 +
include/linux/memcontrol.h | 21 +
include/linux/mmzone.h | 13 +
include/linux/workingset_report.h | 167 +++++
include/uapi/linux/virtio_balloon.h | 30 +
mm/Kconfig | 15 +
mm/Makefile | 2 +
mm/internal.h | 19 +
mm/memcontrol.c | 162 ++++-
mm/mm_init.c | 2 +
mm/mmzone.c | 2 +
mm/vmscan.c | 56 +-
mm/workingset_report.c | 653 ++++++++++++++++++
mm/workingset_report_aging.c | 127 ++++
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 3 +
tools/testing/selftests/mm/run_vmtests.sh | 5 +
.../testing/selftests/mm/workingset_report.c | 306 ++++++++
.../testing/selftests/mm/workingset_report.h | 39 ++
.../selftests/mm/workingset_report_test.c | 330 +++++++++
25 files changed, 2482 insertions(+), 9 deletions(-)
create mode 100644 Documentation/admin-guide/mm/workingset_report.rst
create mode 100644 include/linux/workingset_report.h
create mode 100644 mm/workingset_report.c
create mode 100644 mm/workingset_report_aging.c
create mode 100644 tools/testing/selftests/mm/workingset_report.c
create mode 100644 tools/testing/selftests/mm/workingset_report.h
create mode 100644 tools/testing/selftests/mm/workingset_report_test.c
--
2.47.0.338.g60cca15819-goog
A previous commit described in this topic
http://lore.kernel.org/bpf/20230523025618.113937-9-john.fastabend@gmail.com
directly updated 'sk->copied_seq' in the tcp_eat_skb() function when the
action of a BPF program was SK_REDIRECT. For other actions, like SK_PASS,
the update logic for 'sk->copied_seq' was moved to
tcp_bpf_recvmsg_parser() to ensure the accuracy of the 'fionread' feature.
That commit works for a single stream_verdict scenario, as it also
modified 'sk_data_ready->sk_psock_verdict_data_ready->tcp_read_skb'
to remove updating 'sk->copied_seq'.
However, for programs where both stream_parser and stream_verdict are
active (strparser purpose), tcp_read_sock() was used instead of
tcp_read_skb() (sk_data_ready->strp_data_ready->tcp_read_sock).
tcp_read_sock() now still updates 'sk->copied_seq', leading to duplicated
updates.
In summary, for strparser + SK_PASS, copied_seq is redundantly calculated
in both tcp_read_sock() and tcp_bpf_recvmsg_parser().
The issue causes incorrect copied_seq calculations, which prevent
correct data reads from the recv() interface in user-land.
Also we added test cases for bpf + strparser and separated them from
sockmap_basic, as strparser has more encapsulation and parsing
capabilities compared to sockmap.
---
V8 -> v9
https://lore.kernel.org/bpf/20250121050707.55523-1-mrpre@163.com/
Fixed some issues suggested by Jakub Sitnicki.
V7 -> V8
https://lore.kernel.org/bpf/20250116140531.108636-1-mrpre@163.com/
Avoid using add read_sock to psock. (Jakub Sitnicki)
Avoid using warpper function to check whether strparser is supported.
V3 -> V7:
https://lore.kernel.org/bpf/20250109094402.50838-1-mrpre@163.com/https://lore.kernel.org/bpf/20241218053408.437295-1-mrpre@163.com/
Avoid introducing new proto_ops. (Jakub Sitnicki).
Add more edge test cases for strparser + bpf.
Fix patchwork fail of test cases code.
Fix psock fetch without rcu lock.
Move code of modifying to tcp_bpf.c.
V1 -> V3:
https://lore.kernel.org/bpf/20241209152740.281125-1-mrpre@163.com/
Fix patchwork fail by adding Fixes tag.
Save skb data offset for ENOMEM. (John Fastabend)
---
Jiayuan Chen (5):
strparser: add read_sock callback
bpf: fix wrong copied_seq calculation
bpf: disable non stream socket for strparser
selftests/bpf: fix invalid flag of recv()
selftests/bpf: add strparser test for bpf
Documentation/networking/strparser.rst | 9 +-
include/linux/skmsg.h | 2 +
include/net/strparser.h | 2 +
include/net/tcp.h | 8 +
net/core/skmsg.c | 7 +
net/core/sock_map.c | 5 +-
net/ipv4/tcp.c | 29 +-
net/ipv4/tcp_bpf.c | 36 ++
net/strparser/strparser.c | 11 +-
.../selftests/bpf/prog_tests/sockmap_basic.c | 59 +--
.../selftests/bpf/prog_tests/sockmap_strp.c | 454 ++++++++++++++++++
.../selftests/bpf/progs/test_sockmap_strp.c | 53 ++
12 files changed, 610 insertions(+), 65 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/sockmap_strp.c
create mode 100644 tools/testing/selftests/bpf/progs/test_sockmap_strp.c
--
2.43.5
This series remove compatibility with Python 2.x from scripts that have some
backward compatibility logic on it. The rationale is that, since
commit 627395716cc3 ("docs: document python version used for compilation"),
the minimal Python version was set to 3.x. Also, Python 2.x is EOL since Jan, 2020.
Patch 1: fix a script that was compatible only with Python 2.x;
Patches 2-4: remove backward-compat code;
Patches 5-6 solves forward-compat with modern Python which warns about using
raw strings without using "r" format.
Mauro Carvalho Chehab (6):
docs: trace: decode_msr.py: make it compatible with python 3
tools: perf: exported-sql-viewer: drop support for Python 2
tools: perf: tools: perf: exported-sql-viewer: drop support for Python
2
tools: perf: task-analyzer: drop support for Python 2
tools: selftests/bpf: test_bpftool_synctypes: escape raw symbols
comedi: convert_csv_to_c.py: use r-string for a regex expression
Documentation/trace/postprocess/decode_msr.py | 2 +-
.../ni_routing/tools/convert_csv_to_c.py | 2 +-
.../scripts/python/exported-sql-viewer.py | 5 ++--
tools/perf/scripts/python/task-analyzer.py | 23 ++++----------
tools/perf/tests/shell/lib/attr.py | 6 +---
.../selftests/bpf/test_bpftool_synctypes.py | 30 +++++++++----------
6 files changed, 25 insertions(+), 43 deletions(-)
--
2.48.1
PTRACE_SET_SYSCALL_INFO is a generic ptrace API that complements
PTRACE_GET_SYSCALL_INFO by letting the ptracer modify details of
system calls the tracee is blocked in.
This API allows ptracers to obtain and modify system call details
in a straightforward and architecture-agnostic way.
Current implementation supports changing only those bits of system call
information that are used by strace, namely, syscall number, syscall
arguments, and syscall return value.
Support of changing additional details returned by PTRACE_GET_SYSCALL_INFO,
such as instruction pointer and stack pointer, could be added later if
needed, by using struct ptrace_syscall_info.flags to specify the additional
details that should be set. Currently, "flags", "reserved", and
"seccomp.reserved2" fields of struct ptrace_syscall_info must be
initialized with zeroes; "arch", "instruction_pointer", and "stack_pointer"
fields are ignored.
PTRACE_SET_SYSCALL_INFO currently supports only PTRACE_SYSCALL_INFO_ENTRY,
PTRACE_SYSCALL_INFO_EXIT, and PTRACE_SYSCALL_INFO_SECCOMP operations.
Other operations could be added later if needed.
Ideally, PTRACE_SET_SYSCALL_INFO should have been introduced along with
PTRACE_GET_SYSCALL_INFO, but it didn't happen. The last straw that
convinced me to implement PTRACE_SET_SYSCALL_INFO was apparent failure
to provide an API of changing the first system call argument on riscv
architecture [1].
ptrace(2) man page:
long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);
...
PTRACE_SET_SYSCALL_INFO
Modify information about the system call that caused the stop.
The "data" argument is a pointer to struct ptrace_syscall_info
that specifies the system call information to be set.
The "addr" argument should be set to sizeof(struct ptrace_syscall_info)).
[1] https://lore.kernel.org/all/59505464-c84a-403d-972f-d4b2055eeaac@gmail.com/
Notes:
v3:
* powerpc: Submit syscall_set_return_value fix for "sc" case separately
* mips: Do not introduce erroneous argument truncation on mips n32,
add a detailed description to the commit message of the
mips_get_syscall_arg change
* ptrace: Add explicit padding to the end of struct ptrace_syscall_info,
simplify obtaining of user ptrace_syscall_info,
do not introduce PTRACE_SYSCALL_INFO_SIZE_VER0
* ptrace: Change the return type of ptrace_set_syscall_info_* functions
from "unsigned long" to "int"
* ptrace: Add -ERANGE check to ptrace_set_syscall_info_exit,
add comments to -ERANGE checks
* ptrace: Update comments about supported syscall stops
* selftests: Extend set_syscall_info test, fix for mips n32
* Add Tested-by and Reviewed-by
v2:
* Add patch to fix syscall_set_return_value() on powerpc
* Add patch to fix mips_get_syscall_arg() on mips
* Add syscall_set_return_value() implementation on hexagon
* Add syscall_set_return_value() invocation to syscall_set_nr()
on arm and arm64.
* Fix syscall_set_nr() and mips_set_syscall_arg() on mips
* Add a comment to syscall_set_nr() on arc, powerpc, s390, sh,
and sparc
* Remove redundant ptrace_syscall_info.op assignments in
ptrace_get_syscall_info_*
* Minor style tweaks in ptrace_get_syscall_info_op()
* Remove syscall_set_return_value() invocation from
ptrace_set_syscall_info_entry()
* Skip syscall_set_arguments() invocation in case of syscall number -1
in ptrace_set_syscall_info_entry()
* Split ptrace_syscall_info.reserved into ptrace_syscall_info.reserved
and ptrace_syscall_info.flags
* Use __kernel_ulong_t instead of unsigned long in set_syscall_info test
Dmitry V. Levin (6):
mips: fix mips_get_syscall_arg() for o32
syscall.h: add syscall_set_arguments() and syscall_set_return_value()
syscall.h: introduce syscall_set_nr()
ptrace_get_syscall_info: factor out ptrace_get_syscall_info_op
ptrace: introduce PTRACE_SET_SYSCALL_INFO request
selftests/ptrace: add a test case for PTRACE_SET_SYSCALL_INFO
arch/arc/include/asm/syscall.h | 25 +
arch/arm/include/asm/syscall.h | 37 ++
arch/arm64/include/asm/syscall.h | 29 +
arch/csky/include/asm/syscall.h | 13 +
arch/hexagon/include/asm/syscall.h | 21 +
arch/loongarch/include/asm/syscall.h | 15 +
arch/m68k/include/asm/syscall.h | 7 +
arch/microblaze/include/asm/syscall.h | 7 +
arch/mips/include/asm/syscall.h | 70 ++-
arch/nios2/include/asm/syscall.h | 16 +
arch/openrisc/include/asm/syscall.h | 13 +
arch/parisc/include/asm/syscall.h | 19 +
arch/powerpc/include/asm/syscall.h | 20 +
arch/riscv/include/asm/syscall.h | 16 +
arch/s390/include/asm/syscall.h | 24 +
arch/sh/include/asm/syscall_32.h | 24 +
arch/sparc/include/asm/syscall.h | 22 +
arch/um/include/asm/syscall-generic.h | 19 +
arch/x86/include/asm/syscall.h | 43 ++
arch/xtensa/include/asm/syscall.h | 18 +
include/asm-generic/syscall.h | 30 +
include/uapi/linux/ptrace.h | 7 +-
kernel/ptrace.c | 179 +++++-
tools/testing/selftests/ptrace/Makefile | 2 +-
.../selftests/ptrace/set_syscall_info.c | 514 ++++++++++++++++++
25 files changed, 1143 insertions(+), 47 deletions(-)
create mode 100644 tools/testing/selftests/ptrace/set_syscall_info.c
--
ldv
From: "Mike Rapoport (Microsoft)" <rppt(a)kernel.org>
Hi,
Following Peter's comments [1] these patches rework handling of ROX caches
for module text allocations.
Instead of using a writable copy that really complicates alternatives
patching, temporarily remap parts of a large ROX page as RW for the time of
module formation and then restore it's ROX protections when the module is
ready.
To keep the ROX memory mapped with large pages, make set_memory_rox()
capable of restoring large pages (more details are in patch 3).
Since this is really about x86, I believe this should go in via tip tree.
The patches also available in git
https://git.kernel.org/rppt/h/execmem/x86-rox/v10
v3 changes:
* instead of adding a new module state handle ROX restoration locally in
load_module() as Petr suggested
v2: https://lore.kernel.org/all/20250121095739.986006-1-rppt@kernel.org
* only collapse large mappings in set_memory_rox()
* simplify RW <-> ROX remapping
* don't remove ROX cache pages from the direct map (patch 4)
v1: https://lore.kernel.org/all/20241227072825.1288491-1-rppt@kernel.org
[1] https://lore.kernel.org/all/20241209083818.GK8562@noisy.programming.kicks-a…
Kirill A. Shutemov (1):
x86/mm/pat: restore large ROX pages after fragmentation
Mike Rapoport (Microsoft) (8):
x86/mm/pat: cpa-test: fix length for CPA_ARRAY test
x86/mm/pat: drop duplicate variable in cpa_flush()
execmem: don't remove ROX cache from the direct map
execmem: add API for temporal remapping as RW and restoring ROX afterwards
module: switch to execmem API for remapping as RW and restoring ROX
Revert "x86/module: prepare module loading for ROX allocations of text"
module: drop unused module_writable_address()
x86: re-enable EXECMEM_ROX support
arch/um/kernel/um_arch.c | 11 +-
arch/x86/Kconfig | 1 +
arch/x86/entry/vdso/vma.c | 3 +-
arch/x86/include/asm/alternative.h | 14 +-
arch/x86/include/asm/pgtable_types.h | 2 +
arch/x86/kernel/alternative.c | 181 +++++++++-------------
arch/x86/kernel/ftrace.c | 30 ++--
arch/x86/kernel/module.c | 45 ++----
arch/x86/mm/pat/cpa-test.c | 2 +-
arch/x86/mm/pat/set_memory.c | 220 ++++++++++++++++++++++++++-
include/linux/execmem.h | 31 ++++
include/linux/module.h | 16 --
include/linux/moduleloader.h | 4 -
include/linux/vm_event_item.h | 2 +
kernel/module/main.c | 78 +++-------
kernel/module/strict_rwx.c | 9 +-
mm/execmem.c | 39 +++--
mm/vmstat.c | 2 +
18 files changed, 422 insertions(+), 268 deletions(-)
base-commit: ffd294d346d185b70e28b1a28abe367bbfe53c04
--
2.45.2
This patch series extends the sev_init2 and the sev_smoke test to
exercise the SEV-SNP VM launch workflow.
Primarily, it introduces the architectural defines, its support in the SEV
library and extends the tests to interact with the SEV-SNP ioctl()
wrappers.
Patch 1 - Do not advertize SNP on incompatible firmware
Patch 2 - Remove SEV support on platform init failure
Patch 3 - SNP test for KVM_SEV_INIT2
Patch 4 - Add VMGEXIT helper
Patch 5 - Introduce SEV+ VM type check
Patch 6 - SNP iotcl() plumbing for the SEV library
Patch 7 - Force set GUEST_MEMFD for SNP
Patch 8 - Cleanups of smoke test - Decouple policy from type
Patch 9 - SNP smoke test
The series is based on
git.kernel.org/pub/scm/virt/kvm/kvm.git next
v4..v5:
* Introduced a check to disable advertising support for SEV, SEV-ES
and SNP when platform initialization fails (Nikunj)
* Remove the redundant SNP check within is_sev_vm() (Nikunj)
* Cleanup of the encrypt_region flow for better readability (Nikunj)
* Refactor paths to use the canonical $(ARCH) to rebase for kvm/next
v3..v4:
https://lore.kernel.org/kvm/20241114234104.128532-1-pratikrajesh.sampat@amd…
* Remove SNP FW API version check in the test and ensure the KVM
capability advertizes the presence of the feature. Retain the minimum
version definitions to exercise these API versions in the smoke test
* Retained only the SNP smoke test and SNP_INIT2 test
* The SNP architectural defined merged with SNP_INIT2 test patch
* SNP shutdown merged with SNP smoke test patch
* Add SEV VM type check to abstract comparisons and reduce clutter
* Define a SNP default policy which sets bits based on the presence of
SMT
* Decouple privatization and encryption for it to be SNP agnostic
* Assert for only positive tests using vm_ioctl()
* Dropped tested-by tags
In summary - based on comments from Sean, I have primarily reduced the
scope of this patch series to focus on breaking down the SNP smoke test
patch (v3 - patch2) to first introduce SEV-SNP support and use this
interface to extend the sev_init2 and the sev_smoke test.
The rest of the v3 patchset that introduces ioctl, pre fault, fallocate
and negative tests, will be re-worked and re-introduced subsequently in
future patch series post addressing the issues discussed.
v2..v3:
https://lore.kernel.org/kvm/20240905124107.6954-1-pratikrajesh.sampat@amd.c…
* Remove the assignments for the prefault and fallocate test type
enums.
* Fix error message for sev launch measure and finish.
* Collect tested-by tags [Peter, Srikanth]
Pratik R. Sampat (9):
KVM: SEV: Disable SEV-SNP on FW validation failure
KVM: SEV: Disable SEV on platform init failure
KVM: selftests: SEV-SNP test for KVM_SEV_INIT2
KVM: selftests: Add VMGEXIT helper
KVM: selftests: Introduce SEV VM type check
KVM: selftests: Add library support for interacting with SNP
KVM: selftests: Force GUEST_MEMFD flag for SNP VM type
KVM: selftests: Abstractions for SEV to decouple policy from type
KVM: selftests: Add a basic SEV-SNP smoke test
arch/x86/kvm/svm/sev.c | 6 +-
drivers/crypto/ccp/sev-dev.c | 16 +++
include/linux/psp-sev.h | 6 ++
.../selftests/kvm/include/x86/processor.h | 1 +
tools/testing/selftests/kvm/include/x86/sev.h | 55 ++++++++++-
tools/testing/selftests/kvm/lib/kvm_util.c | 7 +-
.../testing/selftests/kvm/lib/x86/processor.c | 4 +-
tools/testing/selftests/kvm/lib/x86/sev.c | 99 ++++++++++++++++++-
.../selftests/kvm/x86/sev_init2_tests.c | 13 +++
.../selftests/kvm/x86/sev_smoke_test.c | 96 ++++++++++++++----
10 files changed, 272 insertions(+), 31 deletions(-)
--
2.43.0
Signed-off-by: Miguel García <miguelgarciaroman8(a)gmail.com>
---
tools/testing/selftests/alsa/mixer-test.c | 2 +-
tools/testing/selftests/arm64/gcs/libc-gcs.c | 2 +-
tools/testing/selftests/cgroup/test_cpuset.c | 2 +-
tools/testing/selftests/mm/gup_longterm.c | 2 +-
tools/testing/selftests/mm/mseal_test.c | 2 +-
tools/testing/selftests/mm/protection_keys.c | 4 ++--
tools/testing/selftests/mm/test_vmalloc.sh | 2 +-
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 2 +-
.../pmu/event_code_tests/group_constraint_thresh_sel_test.c | 4 ++--
tools/testing/selftests/safesetid/safesetid-test.c | 2 +-
10 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/tools/testing/selftests/alsa/mixer-test.c b/tools/testing/selftests/alsa/mixer-test.c
index 2a4b2662035e..dc7b290fc4ad 100644
--- a/tools/testing/selftests/alsa/mixer-test.c
+++ b/tools/testing/selftests/alsa/mixer-test.c
@@ -685,7 +685,7 @@ static int write_and_verify(struct ctl_data *ctl,
}
/*
- * Use the libray to compare values, if there's a mismatch
+ * Use the library to compare values, if there's a mismatch
* carry on and try to provide a more useful diagnostic than
* just "mismatch".
*/
diff --git a/tools/testing/selftests/arm64/gcs/libc-gcs.c b/tools/testing/selftests/arm64/gcs/libc-gcs.c
index 17b2fabfec38..482e1c634a65 100644
--- a/tools/testing/selftests/arm64/gcs/libc-gcs.c
+++ b/tools/testing/selftests/arm64/gcs/libc-gcs.c
@@ -129,7 +129,7 @@ TEST(gcs_find_terminator)
* We can access a GCS via ptrace
*
* This could usefully have a fixture but note that each test is
- * fork()ed into a new child whcih causes issues. Might be better to
+ * fork()ed into a new child which causes issues. Might be better to
* lift at least some of this out into a separate, non-harness, test
* program.
*/
diff --git a/tools/testing/selftests/cgroup/test_cpuset.c b/tools/testing/selftests/cgroup/test_cpuset.c
index 4034d14ba69a..3f802e3e8480 100644
--- a/tools/testing/selftests/cgroup/test_cpuset.c
+++ b/tools/testing/selftests/cgroup/test_cpuset.c
@@ -160,7 +160,7 @@ static int test_cpuset_perms_object_deny(const char *root)
}
/*
- * Migrate a process between parent and child implicitely
+ * Migrate a process between parent and child implicitly
* Implicit migration happens when a controller is enabled/disabled.
*
*/
diff --git a/tools/testing/selftests/mm/gup_longterm.c b/tools/testing/selftests/mm/gup_longterm.c
index 9423ad439a61..d2dc3b59a084 100644
--- a/tools/testing/selftests/mm/gup_longterm.c
+++ b/tools/testing/selftests/mm/gup_longterm.c
@@ -154,7 +154,7 @@ static void do_test(int fd, size_t size, enum test_type type, bool shared)
/*
* R/O pinning or pinning in a private mapping is always
* expected to work. Otherwise, we expect long-term R/W pinning
- * to only succeed for special fielesystems.
+ * to only succeed for special filesystems.
*/
should_work = !shared || !rw ||
fs_supports_writable_longterm_pinning(fs_type);
diff --git a/tools/testing/selftests/mm/mseal_test.c b/tools/testing/selftests/mm/mseal_test.c
index 01675c412b2a..2ec2c5aea44a 100644
--- a/tools/testing/selftests/mm/mseal_test.c
+++ b/tools/testing/selftests/mm/mseal_test.c
@@ -732,7 +732,7 @@ static void test_seal_mprotect_two_vma_with_split(bool seal)
else
FAIL_TEST_IF_FALSE(!ret);
- /* the fouth page is not sealed. */
+ /* the fourth page is not sealed. */
ret = sys_mprotect(ptr + 3 * page_size, page_size,
PROT_READ | PROT_WRITE);
FAIL_TEST_IF_FALSE(!ret);
diff --git a/tools/testing/selftests/mm/protection_keys.c b/tools/testing/selftests/mm/protection_keys.c
index 4990f7ab4cb7..fcac7bb26b7a 100644
--- a/tools/testing/selftests/mm/protection_keys.c
+++ b/tools/testing/selftests/mm/protection_keys.c
@@ -900,7 +900,7 @@ void expected_pkey_fault(int pkey)
#if defined(__i386__) || defined(__x86_64__) /* arch */
/*
- * The signal handler shold have cleared out PKEY register to let the
+ * The signal handler should have cleared out PKEY register to let the
* test program continue. We now have to restore it.
*/
if (__read_pkey_reg() != 0)
@@ -1372,7 +1372,7 @@ void test_ptrace_of_child(int *ptr, u16 pkey)
long ret;
int status;
/*
- * This is the "control" for our little expermient. Make sure
+ * This is the "control" for our little experiment. Make sure
* we can always access it when ptracing.
*/
int *plain_ptr_unaligned = malloc(HPAGE_SIZE);
diff --git a/tools/testing/selftests/mm/test_vmalloc.sh b/tools/testing/selftests/mm/test_vmalloc.sh
index d73b846736f1..2d4b3e0a6a17 100755
--- a/tools/testing/selftests/mm/test_vmalloc.sh
+++ b/tools/testing/selftests/mm/test_vmalloc.sh
@@ -21,7 +21,7 @@ ksft_skip=4
#
# Static templates for performance, stressing and smoke tests.
-# Also it is possible to pass any supported parameters manualy.
+# Also it is possible to pass any supported parameters manually.
#
PERF_PARAM="sequential_test_order=1 test_repeat_count=3"
SMOKE_PARAM="test_loop_count=10000 test_repeat_count=10"
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index 70f65eb320a7..a6d9f7bd1443 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -682,7 +682,7 @@ TEST_F(mount_setattr, mount_has_writers)
ASSERT_GE(fd, 0);
/*
- * We're holding a fd open to a mount somwhere in the middle so this
+ * We're holding a fd open to a mount somewhere in the middle so this
* needs to fail somewhere in the middle. After this the mount options
* need to be unchanged.
*/
diff --git a/tools/testing/selftests/powerpc/pmu/event_code_tests/group_constraint_thresh_sel_test.c b/tools/testing/selftests/powerpc/pmu/event_code_tests/group_constraint_thresh_sel_test.c
index 50a8cd843ce7..1b1336c1ddb1 100644
--- a/tools/testing/selftests/powerpc/pmu/event_code_tests/group_constraint_thresh_sel_test.c
+++ b/tools/testing/selftests/powerpc/pmu/event_code_tests/group_constraint_thresh_sel_test.c
@@ -34,7 +34,7 @@ static int group_constraint_thresh_sel(void)
/* Check for platform support for the test */
SKIP_IF(platform_check_for_tests());
- /* Init the events for the group contraint thresh select test */
+ /* Init the events for the group constraint thresh select test */
event_init(&leader, EventCode_1);
FAIL_IF(event_open(&leader));
@@ -45,7 +45,7 @@ static int group_constraint_thresh_sel(void)
event_close(&event);
- /* Init the event for the group contraint thresh select test */
+ /* Init the event for the group constraint thresh select test */
event_init(&event, EventCode_3);
/* Expected to succeed as sibling and leader event request same thresh_sel bits */
diff --git a/tools/testing/selftests/safesetid/safesetid-test.c b/tools/testing/selftests/safesetid/safesetid-test.c
index eb9bf0aee951..80f736d545a9 100644
--- a/tools/testing/selftests/safesetid/safesetid-test.c
+++ b/tools/testing/selftests/safesetid/safesetid-test.c
@@ -19,7 +19,7 @@
/*
* NOTES about this test:
- * - requries libcap-dev to be installed on test system
+ * - requires libcap-dev to be installed on test system
* - requires securityfs to me mounted at /sys/kernel/security, e.g.:
* mount -n -t securityfs -o nodev,noexec,nosuid securityfs /sys/kernel/security
* - needs CONFIG_SECURITYFS and CONFIG_SAFESETID to be enabled
--
2.34.1
From: Brian Norris <briannorris(a)chromium.org>
[ Upstream commit 7687c66c18c66d4ccd9949c6f641c0e7b5773483 ]
If the <kunit/platform_device.h> header is included in a test without
certain other headers, it produces compiler warnings like:
In file included from [...]
../include/kunit/platform_device.h:15:57: warning: ‘struct completion’
declared inside parameter list will not be visible outside of this
definition or declaration
15 | struct completion *x);
| ^~~~~~~~~~
Add a 'struct completion' forward declaration to resolve this.
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202412241958.dbAImJsA-lkp@intel.com/
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
Reviewed-by: David Gow <davidgow(a)google.com>
Link: https://lore.kernel.org/r/20241213180841.3023843-1-briannorris@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
include/kunit/platform_device.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/kunit/platform_device.h b/include/kunit/platform_device.h
index 0fc0999d2420a..f8236a8536f7e 100644
--- a/include/kunit/platform_device.h
+++ b/include/kunit/platform_device.h
@@ -2,6 +2,7 @@
#ifndef _KUNIT_PLATFORM_DRIVER_H
#define _KUNIT_PLATFORM_DRIVER_H
+struct completion;
struct kunit;
struct platform_device;
struct platform_driver;
--
2.39.5
From: Brian Norris <briannorris(a)chromium.org>
[ Upstream commit 7687c66c18c66d4ccd9949c6f641c0e7b5773483 ]
If the <kunit/platform_device.h> header is included in a test without
certain other headers, it produces compiler warnings like:
In file included from [...]
../include/kunit/platform_device.h:15:57: warning: ‘struct completion’
declared inside parameter list will not be visible outside of this
definition or declaration
15 | struct completion *x);
| ^~~~~~~~~~
Add a 'struct completion' forward declaration to resolve this.
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202412241958.dbAImJsA-lkp@intel.com/
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
Reviewed-by: David Gow <davidgow(a)google.com>
Link: https://lore.kernel.org/r/20241213180841.3023843-1-briannorris@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
include/kunit/platform_device.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/kunit/platform_device.h b/include/kunit/platform_device.h
index 0fc0999d2420a..f8236a8536f7e 100644
--- a/include/kunit/platform_device.h
+++ b/include/kunit/platform_device.h
@@ -2,6 +2,7 @@
#ifndef _KUNIT_PLATFORM_DRIVER_H
#define _KUNIT_PLATFORM_DRIVER_H
+struct completion;
struct kunit;
struct platform_device;
struct platform_driver;
--
2.39.5
This patch series introduces "kci-gitlab," a GitLab CI pipeline
specifically designed for kernel testing. It provides kernel
developers with an integrated, efficient, and flexible testing
framework using GitLab's CI/CD capabilities. This patch includes
a .gitlab-ci file in the tools/ci/gitlab-ci/ folder, along with
additional YAML and script files, to define a basic test pipeline
triggered by code pushes to a GitLab-CI instance.
The initial version implements:
- Static checks: Includes checkpatch and smatch for code validation.
- Build tests: Covers various architectures and configurations.
- Boot tests: Utilizes virtme for basic boot testing.
Additionally, it introduces a flexible "scenarios" mechanism to
support subsystem-specific extensions.
This series also introduces a drm scenario that adds a job to run IGT
tests for vkms. This scenario includes helper scripts to build deqp-runner
and IGT, leveraging approaches from the drm-ci/mesa-ci project.
We are working towards creating a generic, upstream GitLab-CI pipeline
(kci-gitlab) that will replace DRM-CI [1]. The proposed GitLab-CI pipeline
is designed with a distributed infrastructure model, making it possible
to run on any gitLab instance. We plan to leverage KernelCI [2] as the
backend, utilizing its hardware, rootfs, test plans, and KCIDB [3]
integration.
For an example of a fully executed pipeline with drm scenario set,
including documentation generation,
see: https://gitlab.freedesktop.org/vigneshraman/kernel/-/pipelines/1350262
Please refer to the documentation included in the patch, or check the
rendered version, here:
https://vigneshraman.pages.freedesktop.org/-/kernel/-/jobs/69787927/artifac…
Differences from v1 to v2:
- moved to tools/ci as suggested by Linus on the previous version
- add arm64 containers for native compilation
- added boot tests using virtme: this is the base structure for boot tests,
next steps would be adding other tests such as kselftests/kunit tests
- added DRM scenario testing on vkms: this should replace current vkms test
in drm-ci. This work shows how a test scenario can be used by different
subsystems to add their tests.
- update documentation
For more details on the motivation behind this work, please refer to the
cover letter of v1: https://patchwork.kernel.org/project/linux-kselftest/cover/20240228225527.1…
[1] https://www.collabora.com/news-and-blog/blog/2024/02/08/drm-ci-a-gitlab-ci-…
[2] https://kernelci.org/
[3] https://docs.kernelci.org/kcidb/
Helen Koike (3):
kci-gitlab: Introducing GitLab-CI Pipeline for Kernel Testing
kci-gitlab: Add documentation
kci-gitlab: docs: Add images
Vignesh Raman (2):
MAINTAINERS: Add an entry for ci automated testing
kci-gitlab: Add drm scenario
Documentation/ci/gitlab-ci/gitlab-ci.rst | 471 ++++++++++
.../ci/gitlab-ci/images/drm-vkms.png | Bin 0 -> 73810 bytes
.../ci/gitlab-ci/images/job-matrix.png | Bin 0 -> 20000 bytes
.../gitlab-ci/images/new-project-runner.png | Bin 0 -> 607737 bytes
.../ci/gitlab-ci/images/pipelines-on-push.png | Bin 0 -> 532143 bytes
.../ci/gitlab-ci/images/the-pipeline.png | Bin 0 -> 62464 bytes
.../ci/gitlab-ci/images/variables.png | Bin 0 -> 277518 bytes
Documentation/index.rst | 7 +
MAINTAINERS | 10 +
tools/ci/gitlab-ci/arm_cross_compile.yml | 9 +
tools/ci/gitlab-ci/arm_native_compile.yml | 20 +
tools/ci/gitlab-ci/bootstrap-gitlab-runner.sh | 55 ++
tools/ci/gitlab-ci/build.yml | 43 +
tools/ci/gitlab-ci/cache.yml | 24 +
tools/ci/gitlab-ci/ci-scripts/build-docs.sh | 35 +
tools/ci/gitlab-ci/ci-scripts/build-kernel.sh | 43 +
.../ci/gitlab-ci/ci-scripts/ici-functions.sh | 106 +++
.../ci/gitlab-ci/ci-scripts/install-smatch.sh | 13 +
.../ci-scripts/parse_commit_message.sh | 27 +
.../ci/gitlab-ci/ci-scripts/run-checkpatch.sh | 20 +
tools/ci/gitlab-ci/ci-scripts/run-smatch.sh | 45 +
tools/ci/gitlab-ci/ci-scripts/run-virtme.sh | 52 ++
tools/ci/gitlab-ci/ci-scripts/test-boot.sh | 14 +
tools/ci/gitlab-ci/container.yml | 114 +++
tools/ci/gitlab-ci/docker-compose.yaml | 18 +
tools/ci/gitlab-ci/gitlab-ci.yml | 72 ++
tools/ci/gitlab-ci/scenarios.yml | 15 +
.../scenarios/drm/build-deqp-runner.sh | 42 +
tools/ci/gitlab-ci/scenarios/drm/build-igt.sh | 80 ++
.../ci/gitlab-ci/scenarios/drm/build-rust.sh | 42 +
.../scenarios/drm/create-cross-file.sh | 65 ++
tools/ci/gitlab-ci/scenarios/drm/drm.yml | 44 +
.../scenarios/drm/prepare-container.sh | 18 +
tools/ci/gitlab-ci/scenarios/drm/run-igt.sh | 83 ++
tools/ci/gitlab-ci/scenarios/drm/test.yml | 32 +
.../scenarios/drm/xfails/vkms-none-fails.txt | 22 +
.../scenarios/drm/xfails/vkms-none-flakes.txt | 90 ++
.../scenarios/drm/xfails/vkms-none-skips.txt | 812 ++++++++++++++++++
.../scenarios/file-systems/file-systems.yml | 11 +
tools/ci/gitlab-ci/scenarios/media/media.yml | 11 +
.../gitlab-ci/scenarios/network/network.yml | 11 +
tools/ci/gitlab-ci/static-checks.yml | 21 +
tools/ci/gitlab-ci/test.yml | 16 +
43 files changed, 2613 insertions(+)
create mode 100644 Documentation/ci/gitlab-ci/gitlab-ci.rst
create mode 100644 Documentation/ci/gitlab-ci/images/drm-vkms.png
create mode 100644 Documentation/ci/gitlab-ci/images/job-matrix.png
create mode 100644 Documentation/ci/gitlab-ci/images/new-project-runner.png
create mode 100644 Documentation/ci/gitlab-ci/images/pipelines-on-push.png
create mode 100644 Documentation/ci/gitlab-ci/images/the-pipeline.png
create mode 100644 Documentation/ci/gitlab-ci/images/variables.png
create mode 100644 tools/ci/gitlab-ci/arm_cross_compile.yml
create mode 100644 tools/ci/gitlab-ci/arm_native_compile.yml
create mode 100755 tools/ci/gitlab-ci/bootstrap-gitlab-runner.sh
create mode 100644 tools/ci/gitlab-ci/build.yml
create mode 100644 tools/ci/gitlab-ci/cache.yml
create mode 100755 tools/ci/gitlab-ci/ci-scripts/build-docs.sh
create mode 100755 tools/ci/gitlab-ci/ci-scripts/build-kernel.sh
create mode 100644 tools/ci/gitlab-ci/ci-scripts/ici-functions.sh
create mode 100755 tools/ci/gitlab-ci/ci-scripts/install-smatch.sh
create mode 100755 tools/ci/gitlab-ci/ci-scripts/parse_commit_message.sh
create mode 100755 tools/ci/gitlab-ci/ci-scripts/run-checkpatch.sh
create mode 100755 tools/ci/gitlab-ci/ci-scripts/run-smatch.sh
create mode 100755 tools/ci/gitlab-ci/ci-scripts/run-virtme.sh
create mode 100755 tools/ci/gitlab-ci/ci-scripts/test-boot.sh
create mode 100644 tools/ci/gitlab-ci/container.yml
create mode 100644 tools/ci/gitlab-ci/docker-compose.yaml
create mode 100644 tools/ci/gitlab-ci/gitlab-ci.yml
create mode 100644 tools/ci/gitlab-ci/scenarios.yml
create mode 100755 tools/ci/gitlab-ci/scenarios/drm/build-deqp-runner.sh
create mode 100755 tools/ci/gitlab-ci/scenarios/drm/build-igt.sh
create mode 100755 tools/ci/gitlab-ci/scenarios/drm/build-rust.sh
create mode 100755 tools/ci/gitlab-ci/scenarios/drm/create-cross-file.sh
create mode 100644 tools/ci/gitlab-ci/scenarios/drm/drm.yml
create mode 100755 tools/ci/gitlab-ci/scenarios/drm/prepare-container.sh
create mode 100755 tools/ci/gitlab-ci/scenarios/drm/run-igt.sh
create mode 100644 tools/ci/gitlab-ci/scenarios/drm/test.yml
create mode 100644 tools/ci/gitlab-ci/scenarios/drm/xfails/vkms-none-fails.txt
create mode 100644 tools/ci/gitlab-ci/scenarios/drm/xfails/vkms-none-flakes.txt
create mode 100644 tools/ci/gitlab-ci/scenarios/drm/xfails/vkms-none-skips.txt
create mode 100644 tools/ci/gitlab-ci/scenarios/file-systems/file-systems.yml
create mode 100644 tools/ci/gitlab-ci/scenarios/media/media.yml
create mode 100644 tools/ci/gitlab-ci/scenarios/network/network.yml
create mode 100644 tools/ci/gitlab-ci/static-checks.yml
create mode 100644 tools/ci/gitlab-ci/test.yml
--
2.43.0
Hi all,
This patch series continues the work to migrate the *.sh tests into
prog_tests framework.
test_xdp_redirect_multi.sh tests the XDP redirections done through
bpf_redirect_map().
This is already partly covered by test_xdp_veth.c that already tests
map redirections at XDP level. What isn't covered yet by test_xdp_veth is
the use of the broadcast flags (BPF_F_BROADCAST or BPF_F_EXCLUDE_INGRESS)
and XDP egress programs.
Hence, this patch series add test cases to test_xdp_veth.c to get rid of
the test_xdp_redirect_multi.sh:
- PATCH 1 Add an helper to generate unique names
- PATCH 2 to 9 rework test_xdp_veth to make it more generic and allow to
configure different test cases
- PATCH 10 adds test cases for 'classic' bpf_redirect_map()
- PATCH 11 and 12 cover the broadcast flags
- PATCH 13 covers the XDP egress programs
- PATCH 14 removes test_xdp_redirect_multi.sh
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
---
Changes in v3:
- Add append_tid() helper and use unique names to allow parallel testing
- Check create_network()'s return value through ASSERT_OK()
- Remove check_ping() and unused defines
- Change next_veth type (from string to int)
- Link to v2: https://lore.kernel.org/r/20250121-redirect-multi-v2-0-fc9cacabc6b2@bootlin…
Changes in v2:
- Use serial_test_* to avoid conflict between tests
- Link to v1: https://lore.kernel.org/r/20250121-redirect-multi-v1-0-b215e35ff505@bootlin…
---
Bastien Curutchet (eBPF Foundation) (14):
selftests/bpf: helpers: Add append_tid()
selftests/bpf: test_xdp_veth: Remove unused defines
selftests/bpf: test_xdp_veth: Remove unecessarry check_ping()
selftests/bpf: test_xdp_veth: Use int to describe next veth
selftests/bpf: test_xdp_veth: Split network configuration
selftests/bpf: test_xdp_veth: Rename config[]
selftests/bpf: test_xdp_veth: Add prog_config[] table
selftests/bpf: test_xdp_veth: Add XDP flags to prog_configuration
selftests/bpf: test_xdp_veth: Use unique names
selftests/bpf: test_xdp_veth: Add new test cases for XDP flags
selftests/bpf: Optionally select broadcasting flags
selftests/bpf: test_xdp_veth: Add XDP broadcast redirection tests
selftests/bpf: test_xdp_veth: Add XDP program on egress test
selftests/bpf: Remove test_xdp_redirect_multi.sh
tools/testing/selftests/bpf/Makefile | 2 -
tools/testing/selftests/bpf/network_helpers.c | 11 +
tools/testing/selftests/bpf/network_helpers.h | 10 +
.../selftests/bpf/prog_tests/test_xdp_veth.c | 589 ++++++++++++++++-----
.../testing/selftests/bpf/progs/xdp_redirect_map.c | 89 ++++
.../selftests/bpf/progs/xdp_redirect_multi_kern.c | 41 +-
.../selftests/bpf/test_xdp_redirect_multi.sh | 214 --------
tools/testing/selftests/bpf/xdp_redirect_multi.c | 226 --------
8 files changed, 608 insertions(+), 574 deletions(-)
---
base-commit: e65d1a6e096af0648ffdbc9d7e8bd7a44dcc700b
change-id: 20250103-redirect-multi-245d6eafb5d1
Best regards,
--
Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
The upcoming new Idle HLT Intercept feature allows for the HLT
instruction execution by a vCPU to be intercepted by the hypervisor
only if there are no pending V_INTR and V_NMI events for the vCPU.
When the vCPU is expected to service the pending V_INTR and V_NMI
events, the Idle HLT intercept won’t trigger. The feature allows the
hypervisor to determine if the vCPU is actually idle and reduces
wasteful VMEXITs.
The Idle HLT intercept feature is used for enlightened guests who wish
to securely handle the events. When an enlightened guest does a HLT
while an interrupt is pending, hypervisor will not have a way to
figure out whether the guest needs to be re-entered or not. The Idle
HLT intercept feature allows the HLT execution only if there are no
pending V_INTR and V_NMI events.
Presence of the Idle HLT Intercept feature is indicated via CPUID
function Fn8000_000A_EDX[30].
Document for the Idle HLT intercept feature is available at [1].
This series is based on kvm-x86/next (13e98294d7ce) + [2] + [3].
Testing Done:
- Tested the functionality for the Idle HLT intercept feature
using selftest ipi_hlt_test.
- Tested on normal, SEV, SEV-ES, SEV-SNP guest for the Idle HLT intercept
functionality.
- Tested the Idle HLT intercept functionality on nested guest.
v4 -> v5
- Incorporated Sean's review comments on nested Idle HLT intercept support.
- Make svm_idle_hlt_test independent of the Idle HLT to run on all hardware.
v3 -> v4
- Drop the patches to add vcpu_get_stat() into a new series [2].
- Added nested Idle HLT intercept support.
v2 -> v3
- Incorporated Andrew's suggestion to structure vcpu_stat_types in
a way that each architecture can share the generic types and also
provide its own.
v1 -> v2
- Done changes in svm_idle_hlt_test based on the review comments from Sean.
- Added an enum based approach to get binary stats in vcpu_get_stat() which
doesn't use string to get stat data based on the comments from Sean.
- Added safe_halt() and cli() helpers based on the comments from Sean.
[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024,
Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT).
https://bugzilla.kernel.org/attachment.cgi?id=306250
[2]: https://lore.kernel.org/kvm/20241220013906.3518334-1-seanjc@google.com/T/#u
[3]: https://lore.kernel.org/kvm/20241220012617.3513898-1-seanjc@google.com/T/#u
---
V4: https://lore.kernel.org/kvm/20241022054810.23369-1-manali.shukla@amd.com/
V3: https://lore.kernel.org/kvm/20240528041926.3989-4-manali.shukla@amd.com/T/
V2: https://lore.kernel.org/kvm/20240501145433.4070-1-manali.shukla@amd.com/
V1: https://lore.kernel.org/kvm/20240307054623.13632-1-manali.shukla@amd.com/
Manali Shukla (3):
x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept
KVM: SVM: Add Idle HLT intercept support
KVM: selftests: Add self IPI HLT test
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/svm.h | 1 +
arch/x86/include/uapi/asm/svm.h | 2 +
arch/x86/kvm/svm/svm.c | 13 ++-
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../selftests/kvm/include/x86/processor.h | 1 +
tools/testing/selftests/kvm/ipi_hlt_test.c | 85 +++++++++++++++++++
7 files changed, 101 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/kvm/ipi_hlt_test.c
base-commit: 13e98294d7cec978e31138d16824f50556a62d17
prerequisite-patch-id: cb345fc0d814a351df2b5788b76eee0eef9de549
prerequisite-patch-id: 71806f400cffe09f47d6231cb072cbdbd540de1b
prerequisite-patch-id: 9ea0412aab7ecd8555fcee3e9609dbfe8456d47b
prerequisite-patch-id: 3504df50cdd33958456f2e56139d76867273525c
prerequisite-patch-id: 674e56729a56cc487cb85be1a64ef561eb7bac8a
prerequisite-patch-id: 48e87354f9d6e6bd121ca32ab73cd0d7f1dce74f
prerequisite-patch-id: 74daffd7677992995f37e5a5cb784b8d4357e342
prerequisite-patch-id: 509018dc2fc1657debc641544e86f5a92d04bc1a
prerequisite-patch-id: 4a50c6a4dc3b3c8c8c640a86072faafb7bae4384
--
2.34.1
From: Kevin Brodsky <kevin.brodsky(a)arm.com>
[ Upstream commit 46036188ea1f5266df23a6149dea0df1c77cd1c7 ]
The mm kselftests are currently built with no optimisation (-O0). It's
unclear why, and besides being obviously suboptimal, this also prevents
the pkeys tests from working as intended. Let's build all the tests with
-O2.
[kevin.brodsky(a)arm.com: silence unused-result warnings]
Link: https://lkml.kernel.org/r/20250107170110.2819685-1-kevin.brodsky@arm.com
Link: https://lkml.kernel.org/r/20241209095019.1732120-6-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky(a)arm.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna(a)oracle.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Joey Gouly <joey.gouly(a)arm.com>
Cc: Keith Lucas <keith.lucas(a)oracle.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
(cherry picked from commit 46036188ea1f5266df23a6149dea0df1c77cd1c7)
[Yifei: This commit also fix the failure of pkey_sighandler_tests_64,
which is also in linux-6.12.y, thus backport this commit]
Signed-off-by: Yifei Liu <yifei.l.liu(a)oracle.com>
---
tools/testing/selftests/mm/Makefile | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 02e1204971b0..c0138cb19705 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -33,9 +33,16 @@ endif
# LDLIBS.
MAKEFLAGS += --no-builtin-rules
-CFLAGS = -Wall -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES) $(TOOLS_INCLUDES)
+CFLAGS = -Wall -O2 -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES) $(TOOLS_INCLUDES)
LDLIBS = -lrt -lpthread -lm
+# Some distributions (such as Ubuntu) configure GCC so that _FORTIFY_SOURCE is
+# automatically enabled at -O1 or above. This triggers various unused-result
+# warnings where functions such as read() or write() are called and their
+# return value is not checked. Disable _FORTIFY_SOURCE to silence those
+# warnings.
+CFLAGS += -U_FORTIFY_SOURCE
+
TEST_GEN_FILES = cow
TEST_GEN_FILES += compaction_test
TEST_GEN_FILES += gup_longterm
--
2.46.0
Package build environments like Fedora rpmbuild introduced hardening
options (e.g. -pie -Wl,-z,now) by passing a -spec option to CFLAGS
and LDFLAGS.
Some Makefiles currently override CFLAGS but not LDFLAGS, which leads
to a mismatch and build failure, for example:
/usr/bin/ld: /tmp/ccd2apay.o: relocation R_X86_64_32 against
`.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: failed to set dynamic section sizes: bad value
collect2: error: ld returned 1 exit status
make[1]: *** [../../lib.mk:222: tools/testing/selftests/net/lib/csum] Error 1
openvswitch/Makefile CFLAGS currently do not appear to be used, but
fix it anyway for the case when new tests are introduced in future.
Fixes: 1d0dc857b5d8 ("selftests: drv-net: add checksum tests")
Signed-off-by: Jan Stancek <jstancek(a)redhat.com>
---
tools/testing/selftests/net/lib/Makefile | 2 +-
tools/testing/selftests/net/openvswitch/Makefile | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/lib/Makefile b/tools/testing/selftests/net/lib/Makefile
index 18b9443454a9..578de40cc5e3 100644
--- a/tools/testing/selftests/net/lib/Makefile
+++ b/tools/testing/selftests/net/lib/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
-CFLAGS = -Wall -Wl,--no-as-needed -O2 -g
+CFLAGS += -Wall -Wl,--no-as-needed -O2 -g
CFLAGS += -I../../../../../usr/include/ $(KHDR_INCLUDES)
# Additional include paths needed by kselftest.h
CFLAGS += -I../../
diff --git a/tools/testing/selftests/net/openvswitch/Makefile b/tools/testing/selftests/net/openvswitch/Makefile
index 2f1508abc826..1567a549ba34 100644
--- a/tools/testing/selftests/net/openvswitch/Makefile
+++ b/tools/testing/selftests/net/openvswitch/Makefile
@@ -2,7 +2,7 @@
top_srcdir = ../../../../..
-CFLAGS = -Wall -Wl,--no-as-needed -O2 -g -I$(top_srcdir)/usr/include $(KHDR_INCLUDES)
+CFLAGS += -Wall -Wl,--no-as-needed -O2 -g -I$(top_srcdir)/usr/include $(KHDR_INCLUDES)
TEST_PROGS := openvswitch.sh
--
2.43.0
This series expands the XDP TX metadata framework to allow user
applications to pass per packet 64-bit launch time directly to the kernel
driver, requesting launch time hardware offload support. The XDP TX
metadata framework will not perform any clock conversion or packet
reordering.
Please note that the role of Tx metadata is just to pass the launch time,
not to enable the offload feature. Users will need to enable the launch
time hardware offload feature of the device by using the respective
command, such as the tc-etf command.
Although some devices use the tc-etf command to enable their launch time
hardware offload feature, xsk packets will not go through the etf qdisc.
Therefore, in my opinion, the launch time should always be based on the PTP
Hardware Clock (PHC). Thus, i did not include a clock ID to indicate the
clock source.
To simplify the test steps, I modified the xdp_hw_metadata bpf self-test
tool in such a way that it will set the launch time based on the offset
provided by the user and the value of the Receive Hardware Timestamp, which
is against the PHC. This will eliminate the need to discipline System Clock
with the PHC and then use clock_gettime() to get the time.
Please note that AF_XDP lacks a feedback mechanism to inform the
application if the requested launch time is invalid. So, users are expected
to familiar with the horizon of the launch time of the device they use and
not request a launch time that is beyond the horizon. Otherwise, the driver
might interpret the launch time incorrectly and react wrongly. For stmmac
and igc, where modulo computation is used, a launch time larger than the
horizon will cause the device to transmit the packet earlier that the
requested launch time.
Although there is no feedback mechanism for the launch time request
for now, user still can check whether the requested launch time is
working or not, by requesting the Transmit Completion Hardware Timestamp.
Changes since v1:
- renamed to use Earliest TxTime First (Willem)
- renamed to use txtime (Willem)
Changes since v2:
- renamed to use launch time (Jesper & Willem)
- changed the default launch time in xdp_hw_metadata apps from 1s to 0.1s
because some NICs do not support such a large future time.
Changes since v3:
- added XDP launch time support to the igc driver (Jesper & Florian)
- added per-driver launch time limitation on xsk-tx-metadata.rst (Jesper)
- added explanation on FIFO behavior on xsk-tx-metadata.rst (Jakub)
- added step to enable launch time in the commit message (Jesper & Willem)
- explicitly documented the type of launch_time and which clock source
it is against (Willem)
Changes since v4:
- change netdev feature name from tx-launch-time to tx-launch-time-fifo
to explicitly state the FIFO behaviour (Stanislav)
- improve the looping of xdp_hw_metadata app to wait for packet tx
completion to be more readable by using clock_gettime() (Stanislav)
- add launch time setup steps into xdp_hw_metadata app (Stanislav)
Changes since v5:
- fix selftest build errors by using asprintf() and realloc() instead of
managing the buffer sizes manually (Daniel, Stanislav)
v1: https://patchwork.kernel.org/project/netdevbpf/cover/20231130162028.852006-…
v2: https://patchwork.kernel.org/project/netdevbpf/cover/20231201062421.1074768…
v3: https://patchwork.kernel.org/project/netdevbpf/cover/20231203165129.1740512…
v4: https://patchwork.kernel.org/project/netdevbpf/cover/20250106135506.9687-1-…
v5: https://patchwork.kernel.org/project/netdevbpf/cover/20250114152718.120588-…
Song Yoong Siang (4):
xsk: Add launch time hardware offload support to XDP Tx metadata
selftests/bpf: Add launch time request to xdp_hw_metadata
net: stmmac: Add launch time support to XDP ZC
igc: Add launch time support to XDP ZC
Documentation/netlink/specs/netdev.yaml | 4 +
Documentation/networking/xsk-tx-metadata.rst | 62 +++++++
drivers/net/ethernet/intel/igc/igc_main.c | 78 +++++---
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 +
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 13 ++
include/net/xdp_sock.h | 10 ++
include/net/xdp_sock_drv.h | 1 +
include/uapi/linux/if_xdp.h | 10 ++
include/uapi/linux/netdev.h | 3 +
net/core/netdev-genl.c | 2 +
net/xdp/xsk.c | 3 +
tools/include/uapi/linux/if_xdp.h | 10 ++
tools/include/uapi/linux/netdev.h | 3 +
tools/testing/selftests/bpf/xdp_hw_metadata.c | 168 +++++++++++++++++-
14 files changed, 342 insertions(+), 27 deletions(-)
--
2.34.1
Enabling a (modular) test should not silently enable additional kernel
functionality, as that may increase the attack vector of a product.
Fix this by making FW_CS_DSP_KUNIT_TEST (and FW_CS_DSP_KUNIT_TEST_UTILS)
depend on REGMAP instead of selecting it.
After this, one can safely enable CONFIG_KUNIT_ALL_TESTS=m to build
modules for all appropriate tests for ones system, without pulling in
extra unwanted functionality, while still allowing a tester to manually
enable REGMAP_BUILD and this test suite on a system where REGMAP is not
enabled by default.
Fixes: dd0b6b1f29b92202 ("firmware: cs_dsp: Add KUnit testing of bin file download")
Signed-off-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
---
See also commits 70a640c0efa76674 ("regmap: REGMAP_KUNIT should not
select REGMAP") and 47ee108a113c72e ("regmap: Provide user selectable
option to enable regmap").
BTW, what's the point in having separate FW_CS_DSP_KUNIT_TEST and
FW_CS_DSP_KUNIT_TEST_UTILS symbols? They are always enabled or disabled
together.
---
drivers/firmware/cirrus/Kconfig | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/firmware/cirrus/Kconfig b/drivers/firmware/cirrus/Kconfig
index ee09269c63b51173..0a883091259a2c11 100644
--- a/drivers/firmware/cirrus/Kconfig
+++ b/drivers/firmware/cirrus/Kconfig
@@ -6,15 +6,13 @@ config FW_CS_DSP
config FW_CS_DSP_KUNIT_TEST_UTILS
tristate
- depends on KUNIT
- select REGMAP
+ depends on KUNIT && REGMAP
select FW_CS_DSP
config FW_CS_DSP_KUNIT_TEST
tristate "KUnit tests for Cirrus Logic cs_dsp" if !KUNIT_ALL_TESTS
- depends on KUNIT
+ depends on KUNIT && REGMAP
default KUNIT_ALL_TESTS
- select REGMAP
select FW_CS_DSP
select FW_CS_DSP_KUNIT_TEST_UTILS
help
--
2.43.0
Create a dedicated .gitignore for the tpm2 tests.
Move tpm2 related entries from parent directory's .gitignore.
Signed-off-by: Khaled Elnaggar <khaledelnaggarlinux(a)gmail.com>
---
Hello, as per Shuah's review, instead of adding another entry at
selftests/.gitignore, I created the dedicated .gitignore for
tpm2 tests.
Aside: CCing linux-kernel-mentees as I am working on the mentorship
application tasks.
Thanks
Changes in v2:
- Created a dedicated .gitignore
---
tools/testing/selftests/.gitignore | 1 -
tools/testing/selftests/tpm2/.gitignore | 4 ++++
2 files changed, 4 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/tpm2/.gitignore
diff --git a/tools/testing/selftests/.gitignore b/tools/testing/selftests/.gitignore
index cb24124ac5b9..674aaa02e396 100644
--- a/tools/testing/selftests/.gitignore
+++ b/tools/testing/selftests/.gitignore
@@ -4,7 +4,6 @@ gpiogpio-hammer
gpioinclude/
gpiolsgpio
kselftest_install/
-tpm2/SpaceTest.log
# Python bytecode and cache
__pycache__/
diff --git a/tools/testing/selftests/tpm2/.gitignore b/tools/testing/selftests/tpm2/.gitignore
new file mode 100644
index 000000000000..910bbdbb336a
--- /dev/null
+++ b/tools/testing/selftests/tpm2/.gitignore
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0-only
+AsyncTest.log
+SpaceTest.log
+
--
2.45.2
Changes in v6:
- Move la57 check to using mmap().
- Merge kernel_has_lam() and cpu_has_lam() into lam_is_available() since
the syscall (if CONFIG_ADDRESS_MASKING is set) and cpuid check
provides the same information.
Recent change in how get_user() handles pointers [1] has a specific case
for LAM. It assigns a different bitmask that's later used to check
whether a pointer comes from userland in get_user().
While currently commented out (until LASS [2] is merged into the kernel)
it's worth making changes to the LAM selftest ahead of time.
Modify cpu_has_la57() so it provides current paging level information
instead of the cpuid one.
Add test case to LAM that utilizes a ioctl (FIOASYNC) syscall which uses
get_user() in its implementation. Execute the syscall with differently
tagged pointers to verify that valid user pointers are passing through
and invalid kernel/non-canonical pointers are not.
Also to avoid unhelpful test failures add a check in main() to skip
running tests if LAM was not compiled into the kernel.
Code was tested on a Sierra Forest Xeon machine that's LAM capable. The
test was ran without issues with both the LAM lines from [1] untouched
and commented out. The test was also ran without issues with LAM_SUP
both enabled and disabled.
4/5 level pagetables code paths were also successfully tested in Simics
on a 5-level capable machine.
[1] https://lore.kernel.org/all/20241024013214.129639-1-torvalds@linux-foundati…
[2] https://lore.kernel.org/all/20241028160917.1380714-1-alexander.shishkin@lin…
Previous series entries:
[v1] https://lore.kernel.org/all/20241029141421.715686-1-maciej.wieczor-retman@i…
[v2] https://lore.kernel.org/all/20241121090651.254054-1-maciej.wieczor-retman@i…
[v3] https://lore.kernel.org/all/20241122085521.270802-1-maciej.wieczor-retman@i…
[v4] https://lore.kernel.org/all/cover.1732627541.git.maciej.wieczor-retman@inte…
[v5] https://lore.kernel.org/all/cover.1732728879.git.maciej.wieczor-retman@inte…
Maciej Wieczor-Retman (3):
selftests/lam: Move cpu_has_la57() to use cpuinfo flag
selftests/lam: Skip test if LAM is disabled
selftests/lam: Test get_user() LAM pointer handling
tools/testing/selftests/x86/lam.c | 147 +++++++++++++++++++++++++++---
1 file changed, 136 insertions(+), 11 deletions(-)
--
2.47.1
From: Kumar Kartikeya Dwivedi <memxor(a)gmail.com>
[ Upstream commit cbd8730aea8d79cda6b0f3c18b406dfdef0c1b80 ]
The verifier log when leaking resources on BPF_EXIT may be a bit
confusing, as it's a problem only when finally existing from the main
prog, not from any of the subprogs. Hence, update the verifier error
string and the corresponding selftests matching on it.
Acked-by: Eduard Zingerman <eddyz87(a)gmail.com>
Suggested-by: Eduard Zingerman <eddyz87(a)gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor(a)gmail.com>
Link: https://lore.kernel.org/r/20241204030400.208005-6-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/bpf/verifier.c | 2 +-
.../testing/selftests/bpf/progs/exceptions_fail.c | 4 ++--
tools/testing/selftests/bpf/progs/preempt_lock.c | 14 +++++++-------
.../selftests/bpf/progs/verifier_spin_lock.c | 2 +-
4 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 77f56674aaa99..4f02345b764fd 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18803,7 +18803,7 @@ static int do_check(struct bpf_verifier_env *env)
* match caller reference state when it exits.
*/
err = check_resource_leak(env, exception_exit, !env->cur_state->curframe,
- "BPF_EXIT instruction");
+ "BPF_EXIT instruction in main prog");
if (err)
return err;
diff --git a/tools/testing/selftests/bpf/progs/exceptions_fail.c b/tools/testing/selftests/bpf/progs/exceptions_fail.c
index fe0f3fa5aab68..8a0fdff899271 100644
--- a/tools/testing/selftests/bpf/progs/exceptions_fail.c
+++ b/tools/testing/selftests/bpf/progs/exceptions_fail.c
@@ -131,7 +131,7 @@ int reject_subprog_with_lock(void *ctx)
}
SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_rcu_read_lock-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_rcu_read_lock-ed region")
int reject_with_rcu_read_lock(void *ctx)
{
bpf_rcu_read_lock();
@@ -147,7 +147,7 @@ __noinline static int throwing_subprog(struct __sk_buff *ctx)
}
SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_rcu_read_lock-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_rcu_read_lock-ed region")
int reject_subprog_with_rcu_read_lock(void *ctx)
{
bpf_rcu_read_lock();
diff --git a/tools/testing/selftests/bpf/progs/preempt_lock.c b/tools/testing/selftests/bpf/progs/preempt_lock.c
index 885377e836077..5269571cf7b57 100644
--- a/tools/testing/selftests/bpf/progs/preempt_lock.c
+++ b/tools/testing/selftests/bpf/progs/preempt_lock.c
@@ -6,7 +6,7 @@
#include "bpf_experimental.h"
SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_preempt_disable-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_preempt_disable-ed region")
int preempt_lock_missing_1(struct __sk_buff *ctx)
{
bpf_preempt_disable();
@@ -14,7 +14,7 @@ int preempt_lock_missing_1(struct __sk_buff *ctx)
}
SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_preempt_disable-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_preempt_disable-ed region")
int preempt_lock_missing_2(struct __sk_buff *ctx)
{
bpf_preempt_disable();
@@ -23,7 +23,7 @@ int preempt_lock_missing_2(struct __sk_buff *ctx)
}
SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_preempt_disable-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_preempt_disable-ed region")
int preempt_lock_missing_3(struct __sk_buff *ctx)
{
bpf_preempt_disable();
@@ -33,7 +33,7 @@ int preempt_lock_missing_3(struct __sk_buff *ctx)
}
SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_preempt_disable-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_preempt_disable-ed region")
int preempt_lock_missing_3_minus_2(struct __sk_buff *ctx)
{
bpf_preempt_disable();
@@ -55,7 +55,7 @@ static __noinline void preempt_enable(void)
}
SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_preempt_disable-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_preempt_disable-ed region")
int preempt_lock_missing_1_subprog(struct __sk_buff *ctx)
{
preempt_disable();
@@ -63,7 +63,7 @@ int preempt_lock_missing_1_subprog(struct __sk_buff *ctx)
}
SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_preempt_disable-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_preempt_disable-ed region")
int preempt_lock_missing_2_subprog(struct __sk_buff *ctx)
{
preempt_disable();
@@ -72,7 +72,7 @@ int preempt_lock_missing_2_subprog(struct __sk_buff *ctx)
}
SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_preempt_disable-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_preempt_disable-ed region")
int preempt_lock_missing_2_minus_1_subprog(struct __sk_buff *ctx)
{
preempt_disable();
diff --git a/tools/testing/selftests/bpf/progs/verifier_spin_lock.c b/tools/testing/selftests/bpf/progs/verifier_spin_lock.c
index 3f679de73229f..25599eac9a702 100644
--- a/tools/testing/selftests/bpf/progs/verifier_spin_lock.c
+++ b/tools/testing/selftests/bpf/progs/verifier_spin_lock.c
@@ -187,7 +187,7 @@ l0_%=: r6 = r0; \
SEC("cgroup/skb")
__description("spin_lock: test6 missing unlock")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_spin_lock-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_spin_lock-ed region")
__failure_unpriv __msg_unpriv("")
__naked void spin_lock_test6_missing_unlock(void)
{
--
2.39.5
The fixed commit adds NETIF_F_GSO_ESP bit for bonding gso_partial_features.
However, if we don't set the dev NETIF_F_GSO_PARTIAL bit, the later
netdev_change_features() -> netdev_fix_features() will remove the
NETIF_F_GSO_ESP bit from the dev features. This causes ethtool to show
that the bond does not support tx-esp-segmentation. For example
# ethtool -k bond0 | grep esp
tx-esp-segmentation: off [requested on]
esp-hw-offload: on
esp-tx-csum-hw-offload: on
Add the NETIF_F_GSO_PARTIAL bit to bond dev features when set
gso_partial_features to fix this issue.
Fixes: 4861333b4217 ("bonding: add ESP offload features when slaves support")
Reported-by: Liang Li <liali(a)redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
v2: remove NETIF_F_GSO_PARTIAL bit if not set gso_partial_features.
The issue is reported internally, so there is no Closes tag.
BTW, I saw some drivers set NETIF_F_GSO_PARTIAL on dev->features. Some
other drivers set NETIF_F_GSO_PARTIAL on dev->hw_enc_features. I haven't
see a doc about where we should set. So I just set it on dev->features.
---
drivers/net/bonding/bond_main.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 7b78c2bada81..09d5a8433d86 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1598,10 +1598,13 @@ static void bond_compute_features(struct bonding *bond)
}
bond_dev->hard_header_len = max_hard_header_len;
- if (gso_partial_features & NETIF_F_GSO_ESP)
+ if (gso_partial_features & NETIF_F_GSO_ESP) {
bond_dev->gso_partial_features |= NETIF_F_GSO_ESP;
- else
+ bond_dev->features |= NETIF_F_GSO_PARTIAL;
+ } else {
bond_dev->gso_partial_features &= ~NETIF_F_GSO_ESP;
+ bond_dev->features &= ~NETIF_F_GSO_PARTIAL;
+ }
done:
bond_dev->vlan_features = vlan_features;
--
2.39.5 (Apple Git-154)
Recent change in how get_user() handles pointers [1] has a specific case
for LAM. It assigns a different bitmask that's later used to check
whether a pointer comes from userland in get_user().
While currently commented out (until LASS [2] is merged into the kernel)
it's worth making changes to the LAM selftest ahead of time.
Modify cpu_has_la57() so it provides current paging level information
instead of the cpuid one.
Add test case to LAM that utilizes a ioctl (FIOASYNC) syscall which uses
get_user() in its implementation. Execute the syscall with differently
tagged pointers to verify that valid user pointers are passing through
and invalid kernel/non-canonical pointers are not.
Also to avoid unhelpful test failures add a check in main() to skip
running tests if LAM was not compiled into the kernel.
Code was tested on a Sierra Forest Xeon machine that's LAM capable. The
test was ran without issues with both the LAM lines from [1] untouched
and commented out. The test was also ran without issues with LAM_SUP
both enabled and disabled.
4/5 level pagetables code paths were also successfully tested in Simics
on a 5-level capable machine.
[1] https://lore.kernel.org/all/20241024013214.129639-1-torvalds@linux-foundati…
[2] https://lore.kernel.org/all/20241028160917.1380714-1-alexander.shishkin@lin…
Maciej Wieczor-Retman (3):
selftests/lam: Move cpu_has_la57() to use cpuinfo flag
selftests/lam: Skip test if LAM is disabled
selftests/lam: Test get_user() LAM pointer handling
tools/testing/selftests/x86/lam.c | 120 ++++++++++++++++++++++++++++--
1 file changed, 115 insertions(+), 5 deletions(-)
--
2.47.1
The current implementation of netconsole sends all log messages in
parallel, which can lead to an intermixed and interleaved output on the
receiving side. This makes it challenging to demultiplex the messages
and attribute them to their originating CPUs.
As a result, users and developers often struggle to effectively analyze
and debug the parallel log output received through netconsole.
Example of a message got from produciton hosts:
------------[ cut here ]------------
------------[ cut here ]------------
refcount_t: saturated; leaking memory.
WARNING: CPU: 2 PID: 1613668 at lib/refcount.c:22 refcount_warn_saturate+0x5e/0xe0
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 26 PID: 4139916 at lib/refcount.c:25 refcount_warn_saturate+0x7d/0xe0
Modules linked in: bpf_preload(E) vhost_net(E) tun(E) vhost(E)
This series of patches introduces a new feature to the netconsole
subsystem that allows the automatic population of the CPU number in the
userdata field for each log message. This enhancement provides several
benefits:
* Improved demultiplexing of parallel log output: When multiple CPUs are
sending messages concurrently, the added CPU number in the userdata
makes it easier to differentiate and attribute the messages to their
originating CPUs.
* Better visibility into message sources: The CPU number information
gives users and developers more insight into which specific CPU a
particular log message came from, which can be valuable for debugging
and analysis.
The changes in this series are as follows:
Patch 1: netconsole: Rename userdata to extradata
=================================================
Create the a concept of extradata, which encompasses the concept of
userdata and the upcoming sysdatao
Sysdata is a new concept being added, which is basically fields that are
populated by the kernel. At this time only the CPU#, but, there is a
desire to add current task name, kernel release version, etc.
Patch 2: netconsole: Helper to count number of used entries
===========================================================
Create a simple helper to count number of entries in extradata. I am
separating this in a function since it will need to count userdata and
sysdata. For instance, when the user adds an extra userdata, we need to
check if there is space, counting the previous data entries (from
userdata and cpu data)
Patch 3: netconsole: add support for sysdata and CPU population
===============================================================
This is the core patch. Basically add a new option to enable automatic
CPU number population in the netconsole userdata Provides a new "cpu_nr"
sysfs attribute to control this feature
Patch 4: "netconsole: selftest: test CPU number auto-population"
=============================================================
Expands the existing netconsole selftest to verify the CPU number
auto-population functionality Ensures the received netconsole messages
contain the expected "cpu=<CPU>" entry in the message. Test different
permutation with userdata
Patch 5: "netconsole: docs: Add documentation for CPU number auto-population"
=============================================================================
Updates the netconsole documentation to explain the new CPU number
auto-population feature Provides instructions on how to enable and use
the feature
I believe these changes will be a valuable addition to the netconsole
subsystem, enhancing its usefulness for kernel developers and users.
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes in v2:
- Create the concept of extradata and sysdata. This will make the design
easier to understand, and the code easier to read.
* Basically extradata encompasses userdata and the new sysdata.
Userdata originates from user, and sysdata originates in kernel.
- Improved the test to send from a very specific CPU, which can be
checked to be correct on the other side, as suggested by Jakub.
- Fixed a bug where CPU # was populated at the wrong place
- Link to v1: https://lore.kernel.org/r/20241113-netcon_cpu-v1-0-d187bf7c0321@debian.org
---
Breno Leitao (5):
netconsole: Rename userdata to extradata
netconsole: Helper to count number of used entries
netconsole: add support for sysdata and CPU population
netconsole: selftest: test for sysdata CPU
netconsole: docs: Add documentation for CPU number auto-population
Documentation/networking/netconsole.rst | 45 +++++
drivers/net/netconsole.c | 223 ++++++++++++++++-----
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 17 ++
.../selftests/drivers/net/netcons_sysdata.sh | 166 +++++++++++++++
5 files changed, 407 insertions(+), 45 deletions(-)
---
base-commit: 7b24f164cf005b9649138ef6de94aaac49c9f3d1
change-id: 20241108-netcon_cpu-ce3917e88f4b
Best regards,
--
Breno Leitao <leitao(a)debian.org>
From: "Mike Rapoport (Microsoft)" <rppt(a)kernel.org>
Hi,
Following Peter's comments [1] these patches rework handling of ROX caches
for module text allocations.
Instead of using a writable copy that really complicates alternatives
patching, temporarily remap parts of a large ROX page as RW for the time of
module formation and then restore it's ROX protections when the module is
ready.
To keep the ROX memory mapped with large pages, make set_memory_rox()
capable of restoring large pages (more details are in patch 3).
Since this is really about x86, I believe this should go in via tip tree.
The patches also available in git
https://git.kernel.org/rppt/h/execmem/x86-rox/v9
v2 changes:
* only collapse large mappings in set_memory_rox()
* simplify RW <-> ROX remapping
* don't remove ROX cache pages from the direct map (patch 4)
v1: https://lore.kernel.org/all/20241227072825.1288491-1-rppt@kernel.org
[1] https://lore.kernel.org/all/20241209083818.GK8562@noisy.programming.kicks-a…
Kirill A. Shutemov (1):
x86/mm/pat: restore large ROX pages after fragmentation
Mike Rapoport (Microsoft) (9):
x86/mm/pat: cpa-test: fix length for CPA_ARRAY test
x86/mm/pat: drop duplicate variable in cpa_flush()
execmem: don't remove ROX cache from the direct map
execmem: add API for temporal remapping as RW and restoring ROX afterwards
module: introduce MODULE_STATE_GONE
module: switch to execmem API for remapping as RW and restoring ROX
Revert "x86/module: prepare module loading for ROX allocations of text"
module: drop unused module_writable_address()
x86: re-enable EXECMEM_ROX support
arch/um/kernel/um_arch.c | 11 +-
arch/x86/Kconfig | 1 +
arch/x86/entry/vdso/vma.c | 3 +-
arch/x86/include/asm/alternative.h | 14 +-
arch/x86/include/asm/pgtable_types.h | 2 +
arch/x86/kernel/alternative.c | 181 ++++++--------
arch/x86/kernel/ftrace.c | 30 ++-
arch/x86/kernel/module.c | 45 ++--
arch/x86/mm/pat/cpa-test.c | 2 +-
arch/x86/mm/pat/set_memory.c | 220 +++++++++++++++++-
include/linux/execmem.h | 31 +++
include/linux/module.h | 22 +-
include/linux/moduleloader.h | 4 -
include/linux/vm_event_item.h | 2 +
kernel/module/kallsyms.c | 8 +-
kernel/module/kdb.c | 2 +-
kernel/module/main.c | 86 ++-----
kernel/module/procfs.c | 2 +-
kernel/module/strict_rwx.c | 9 +-
kernel/tracepoint.c | 2 +
lib/kunit/test.c | 2 +
mm/execmem.c | 39 ++--
mm/vmstat.c | 2 +
samples/livepatch/livepatch-callbacks-demo.c | 1 +
.../test_modules/test_klp_callbacks_demo.c | 1 +
.../test_modules/test_klp_callbacks_demo2.c | 1 +
.../livepatch/test_modules/test_klp_state.c | 1 +
.../livepatch/test_modules/test_klp_state2.c | 1 +
28 files changed, 442 insertions(+), 283 deletions(-)
base-commit: ffd294d346d185b70e28b1a28abe367bbfe53c04
--
2.45.2
Hi all,
This patch series continues the work to migrate the *.sh tests into
prog_tests framework.
test_xdp_redirect_multi.sh tests the XDP redirections done through
bpf_redirect_map().
This is already partly covered by test_xdp_veth.c that already tests
map redirections at XDP level. What isn't covered yet by test_xdp_veth is
the use of the broadcast flags (BPF_F_BROADCAST or BPF_F_EXCLUDE_INGRESS)
and XDP egress programs.
Hence, this patch series add test cases to test_xdp_veth.c to get rid of
the test_xdp_redirect_multi.sh:
- PATCH 1 to 5 rework test_xdp_veth to make it more generic and allow to
configure different test cases
- PATCH 6 adds test cases for 'classic' bpf_redirect_map()
- PATCH 7 & 8 covers the broadcast flags
- PATCH 9 covers the XDP egress programs
- PATCH 10 removes test_xdp_redirect_multi.sh
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
---
Changes in v2:
- Use serial_test_* to avoid conflict between tests
- Link to v1: https://lore.kernel.org/r/20250121-redirect-multi-v1-0-b215e35ff505@bootlin…
---
Bastien Curutchet (eBPF Foundation) (10):
selftests/bpf: test_xdp_veth: Split network configuration
selftests/bpf: Remove unused argument
selftests/bpf: test_xdp_veth: Rename config[]
selftests/bpf: test_xdp_veth: Add prog_config[] table
selftests/bpf: test_xdp_veth: Add XDP flags to prog_configuration
selftests/bpf: test_xdp_veth: Add new test cases for XDP flags
selftests/bpf: Optionally select broadcasting flags
selftests/bpf: test_xdp_veth: Add XDP broadcast redirection tests
selftests/bpf: test_xdp_veth: Add XDP program on egress test
selftests/bpf: Remove test_xdp_redirect_multi.sh
tools/testing/selftests/bpf/Makefile | 2 -
.../selftests/bpf/prog_tests/test_xdp_veth.c | 534 +++++++++++++++++----
.../testing/selftests/bpf/progs/xdp_redirect_map.c | 89 ++++
.../selftests/bpf/progs/xdp_redirect_multi_kern.c | 41 +-
.../selftests/bpf/test_xdp_redirect_multi.sh | 214 ---------
tools/testing/selftests/bpf/xdp_redirect_multi.c | 226 ---------
6 files changed, 553 insertions(+), 553 deletions(-)
---
base-commit: 349e0551b929b4712b4d6127f67dfa41ed48d5a2
change-id: 20250103-redirect-multi-245d6eafb5d1
Best regards,
--
Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>