test_memcg_sock() currently requires that memory.stat's "sock " counter
is exactly zero immediately after the TCP server exits. On a busy system
this assumption is too strict:
- Socket memory may be freed with a small delay (e.g. RCU callbacks).
- memcg statistics are updated asynchronously via the rstat flushing
worker, so the "sock " value in memory.stat can stay non-zero for a
short period of time even after all socket memory has been uncharged.
As a result, test_memcg_sock() can intermittently fail even though socket
memory accounting is working correctly.
Make the test more robust by polling memory.stat for the "sock " counter
and allowing it some time to drop to zero instead of checking it only
once. If the counter does not become zero within the timeout, the test
still fails as before.
On my test system, running test_memcontrol 50 times produced:
- Before this patch: 6/50 runs passed.
- After this patch: 50/50 runs passed.
Signed-off-by: Guopeng Zhang <zhangguopeng(a)kylinos.cn>
---
.../selftests/cgroup/test_memcontrol.c | 24 ++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 4e1647568c5b..86d9981cddd8 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -1384,6 +1384,8 @@ static int test_memcg_sock(const char *root)
int bind_retries = 5, ret = KSFT_FAIL, pid, err;
unsigned short port;
char *memcg;
+ long sock_post = -1;
+ int i, retries = 30;
memcg = cg_name(root, "memcg_test");
if (!memcg)
@@ -1432,7 +1434,27 @@ static int test_memcg_sock(const char *root)
if (cg_read_long(memcg, "memory.current") < 0)
goto cleanup;
- if (cg_read_key_long(memcg, "memory.stat", "sock "))
+ /*
+ * memory.stat is updated asynchronously via the memcg rstat
+ * flushing worker, so the "sock " counter may stay non-zero
+ * for a short period of time after the TCP connection is
+ * closed and all socket memory has been uncharged.
+ *
+ * Poll memory.stat for up to 3 seconds and require that the
+ * "sock " counter eventually drops to zero.
+ */
+ for (i = 0; i < retries; i++) {
+ sock_post = cg_read_key_long(memcg, "memory.stat", "sock ");
+ if (sock_post < 0)
+ goto cleanup;
+
+ if (!sock_post)
+ break;
+
+ usleep(100 * 1000); /* 100ms */
+ }
+
+ if (sock_post)
goto cleanup;
ret = KSFT_PASS;
--
2.25.1
Here are various unrelated fixes:
- Patch 1: Fix window space computation for fallback connections which
can affect ACK generation. A fix for v5.11.
- Patch 2: Avoid unneeded subflow-level drops due to unsynced received
window. A fix for v5.11.
- Patch 3: Avoid premature close for fallback connections with PREEMPT
kernels. A fix for v5.12.
- Patch 4: Reset instead of fallback in case of data in the MPTCP
out-of-order queue. A fix for v5.7.
- Patches 5-7: Avoid also sending "plain" TCP reset when closing with an
MP_FASTCLOSE. A fix for v6.1.
- Patches 8-9: Longer timeout for background connections in MPTCP Join
selftests. An additional fix for recent patches for v5.13/v6.1.
- Patches 10-11: Fix typo in a check introduce in a recent refactoring.
A fix for v6.15.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Gang Yan (2):
mptcp: fix address removal logic in mptcp_pm_nl_rm_addr
selftests: mptcp: add a check for 'add_addr_accepted'
Matthieu Baerts (NGI0) (3):
selftests: mptcp: join: fastclose: remove flaky marks
selftests: mptcp: join: endpoints: longer timeout
selftests: mptcp: join: userspace: longer timeout
Paolo Abeni (6):
mptcp: fix ack generation for fallback msk
mptcp: avoid unneeded subflow-level drops
mptcp: fix premature close in case of fallback
mptcp: do not fallback when OoO is present
mptcp: decouple mptcp fastclose from tcp close
mptcp: fix duplicate reset on fastclose
net/mptcp/options.c | 54 +++++++++++++++++++++-
net/mptcp/pm_kernel.c | 2 +-
net/mptcp/protocol.c | 59 +++++++++++++++++--------
net/mptcp/protocol.h | 3 +-
tools/testing/selftests/net/mptcp/mptcp_join.sh | 27 ++++++-----
5 files changed, 113 insertions(+), 32 deletions(-)
---
base-commit: 8e0a754b0836d996802713bbebc87bc1cc17925c
change-id: 20251117-net-mptcp-misc-fixes-6-18-rc6-835d94cdc095
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
The current netconsole implementation allocates a static buffer for
extradata (userdata + sysdata) with a fixed size of
MAX_EXTRADATA_ENTRY_LEN * MAX_EXTRADATA_ITEMS bytes for every target,
regardless of whether userspace actually uses this feature. This forces
us to keep MAX_EXTRADATA_ITEMS small (16), which is restrictive for
users who need to attach more metadata to their log messages.
This patch series enables dynamic allocation of the userdata buffer,
allowing it to grow on-demand based on actual usage. The series:
1. Refactors send_fragmented_body() to simplify handling of separated
userdata and sysdata (patch 1/4)
2. Splits userdata and sysdata into separate buffers (patch 2/4)
3. Implements dynamic allocation for the userdata buffer (patch 3/4)
4. Increases MAX_USERDATA_ITEMS from 16 to 256 now that we can do so
without memory waste (patch 4/4)
Benefits:
- No memory waste when userdata is not used
- Targets that use userdata only consume what they need
- Users can attach significantly more metadata without impacting systems
that don't use this feature
Signed-off-by: Gustavo Luiz Duarte <gustavold(a)gmail.com>
---
Changes in v2:
- Added null pointer checks for userdata and sysdata buffers
- Added MAX_SYSDATA_ITEMS to enum sysdata_feature
- Moved code out of ifdef in send_msg_no_fragmentation()
- Renamed variables in send_fragmented_body() to make it easier to
reason about the code
- Link to v1: https://lore.kernel.org/r/20251105-netconsole_dynamic_extradata-v1-0-142890…
---
Gustavo Luiz Duarte (4):
netconsole: Simplify send_fragmented_body()
netconsole: Split userdata and sysdata
netconsole: Dynamic allocation of userdata buffer
netconsole: Increase MAX_USERDATA_ITEMS
drivers/net/netconsole.c | 370 ++++++++++-----------
.../selftests/drivers/net/netcons_overflow.sh | 2 +-
2 files changed, 179 insertions(+), 193 deletions(-)
---
base-commit: 68fa5b092efab37a4f08a47b22bb8ca98f7f6223
change-id: 20251007-netconsole_dynamic_extradata-21bd9d726568
Best regards,
--
Gustavo Duarte <gustavold(a)meta.com>
From: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Since the ftrace fprobe is both fgraph and ftrace based implemented,
the selftest needs to be updated. This does not count the actual
number of lines, but just check the differences.
Signed-off-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
---
.../ftrace/test.d/dynevent/add_remove_fprobe.tc | 18 ++++--------------
1 file changed, 4 insertions(+), 14 deletions(-)
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc
index 2506f464811b..47067a5e3cb0 100644
--- a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_fprobe.tc
@@ -28,25 +28,21 @@ test -d events/fprobes/myevent1
test -d events/fprobes/myevent2
echo 1 > events/fprobes/myevent1/enable
-# Make sure the event is attached and is the only one
+# Make sure the event is attached.
grep -q $PLACE enabled_functions
cnt=`cat enabled_functions | wc -l`
-if [ $cnt -ne $((ocnt + 1)) ]; then
+if [ $cnt -eq $ocnt ]; then
exit_fail
fi
echo 1 > events/fprobes/myevent2/enable
-# It should till be the only attached function
-cnt=`cat enabled_functions | wc -l`
-if [ $cnt -ne $((ocnt + 1)) ]; then
- exit_fail
-fi
+cnt2=`cat enabled_functions | wc -l`
echo 1 > events/fprobes/myevent3/enable
# If the function is different, the attached function should be increased
grep -q $PLACE2 enabled_functions
cnt=`cat enabled_functions | wc -l`
-if [ $cnt -ne $((ocnt + 2)) ]; then
+if [ $cnt -eq $cnt2 ]; then
exit_fail
fi
@@ -56,12 +52,6 @@ echo "-:myevent2" >> dynamic_events
grep -q myevent1 dynamic_events
! grep -q myevent2 dynamic_events
-# should still have 2 left
-cnt=`cat enabled_functions | wc -l`
-if [ $cnt -ne $((ocnt + 2)) ]; then
- exit_fail
-fi
-
echo 0 > events/fprobes/enable
echo > dynamic_events
vgic_lpi_stress sends MAPTI and MAPC commands during guest GIC
setup to map interrupt events to ITT entries and collection IDs
to redistributors, respectively.
Theoretically, we have no guarantee that the ITS will
finish handling these mapping commands before the selftest
calls KVM_SIGNAL_MSI to inject LPIs to the guest. If LPIs
are injected before ITS mapping completes, the ITS cannot
properly pass the interrupt on to the redistributor.
In practice, KVM processes ITS commands synchronously, so
SYNC calls are functionally unnecessary and ignored in
vgic_its_handle_command().
However, selftests should test based on ARM specification and
be blind to KVM-specific implementation optimizations. Thus,
we must update the test to be architecturally compliant and
logically correct.
Fix by adding a SYNC command to the selftests ITS library,
then calling SYNC after ITS mapping to ensure mapping
completes before signal_lpi() writes to GITS_TRANSLATER.
This patch depends on commit a24f7afce048 ("KVM: selftests:
fix MAPC RDbase target formatting in vgic_lpi_stress"), which
is queued in kvmarm/fixes.
Signed-off-by: Maximilian Dittgen <mdittgen(a)amazon.de>
---
Validated by the following debug logging to the GITS_CMD_SYNC handler
in vgic_its_handle_command():
kvm_info("ITS SYNC command: %016llx %016llx %016llx %016llx\n",
its_cmd[0], its_cmd[1], its_cmd[2], its_cmd[3]);
Initialized a selftest guest with 4 vCPUs by:
./vgic_lpi_stress -v 4
Confirmed that an ITS SYNC was successfully called for all 4 vCPUs:
kvm [5094]: ITS SYNC command: 0000000000000005 0000000000000000 0000000000000000 0000000000000000
kvm [5094]: ITS SYNC command: 0000000000000005 0000000000000000 0000000000010000 0000000000000000
kvm [5094]: ITS SYNC command: 0000000000000005 0000000000000000 0000000000020000 0000000000000000
kvm [5094]: ITS SYNC command: 0000000000000005 0000000000000000 0000000000030000 0000000000000000
---
tools/testing/selftests/kvm/arm64/vgic_lpi_stress.c | 4 ++++
.../testing/selftests/kvm/include/arm64/gic_v3_its.h | 1 +
tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c | 11 +++++++++++
3 files changed, 16 insertions(+)
diff --git a/tools/testing/selftests/kvm/arm64/vgic_lpi_stress.c b/tools/testing/selftests/kvm/arm64/vgic_lpi_stress.c
index 687d04463983..e857a605f577 100644
--- a/tools/testing/selftests/kvm/arm64/vgic_lpi_stress.c
+++ b/tools/testing/selftests/kvm/arm64/vgic_lpi_stress.c
@@ -118,6 +118,10 @@ static void guest_setup_gic(void)
guest_setup_its_mappings();
guest_invalidate_all_rdists();
+
+ /* SYNC to ensure ITS setup is complete */
+ for (cpuid = 0; cpuid < test_data.nr_cpus; cpuid++)
+ its_send_sync_cmd(test_data.cmdq_base_va, cpuid);
}
static void guest_code(size_t nr_lpis)
diff --git a/tools/testing/selftests/kvm/include/arm64/gic_v3_its.h b/tools/testing/selftests/kvm/include/arm64/gic_v3_its.h
index 3722ed9c8f96..58feef3eb386 100644
--- a/tools/testing/selftests/kvm/include/arm64/gic_v3_its.h
+++ b/tools/testing/selftests/kvm/include/arm64/gic_v3_its.h
@@ -15,5 +15,6 @@ void its_send_mapc_cmd(void *cmdq_base, u32 vcpu_id, u32 collection_id, bool val
void its_send_mapti_cmd(void *cmdq_base, u32 device_id, u32 event_id,
u32 collection_id, u32 intid);
void its_send_invall_cmd(void *cmdq_base, u32 collection_id);
+void its_send_sync_cmd(void *cmdq_base, u32 vcpu_id);
#endif // __SELFTESTS_GIC_V3_ITS_H__
diff --git a/tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c b/tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c
index 0e2f8ed90f30..d9ee331074ea 100644
--- a/tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c
+++ b/tools/testing/selftests/kvm/lib/arm64/gic_v3_its.c
@@ -253,3 +253,14 @@ void its_send_invall_cmd(void *cmdq_base, u32 collection_id)
its_send_cmd(cmdq_base, &cmd);
}
+
+void its_send_sync_cmd(void *cmdq_base, u32 vcpu_id)
+{
+ struct its_cmd_block cmd = {};
+
+ its_encode_cmd(&cmd, GITS_CMD_SYNC);
+ its_encode_target(&cmd, procnum_to_rdbase(vcpu_id));
+
+ its_send_cmd(cmdq_base, &cmd);
+}
+
--
2.50.1 (Apple Git-155)
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Christof Hellmis
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
The printf statement attempts to print the DMA direction string using
the syntax 'dir[directions]', which is an invalid array access. The
variable 'dir' is an integer, and 'directions' is a char pointer array.
This incorrect syntax should be 'directions[dir]', using 'dir' as the
index into the 'directions' array. Fix this by correcting the array
access from 'dir[directions]' to 'directions[dir]'.
Signed-off-by: Zhang Chujun <zhangchujun(a)cmss.chinamobile.com>
diff --git a/tools/testing/selftests/dma/dma_map_benchmark.c b/tools/testing/selftests/dma/dma_map_benchmark.c
index b12f1f9babf8..b925756373ce 100644
--- a/tools/testing/selftests/dma/dma_map_benchmark.c
+++ b/tools/testing/selftests/dma/dma_map_benchmark.c
@@ -118,7 +118,7 @@ int main(int argc, char **argv)
}
printf("dma mapping benchmark: threads:%d seconds:%d node:%d dir:%s granule: %d\n",
- threads, seconds, node, dir[directions], granule);
+ threads, seconds, node, directions[dir], granule);
printf("average map latency(us):%.1f standard deviation:%.1f\n",
map.avg_map_100ns/10.0, map.map_stddev/10.0);
printf("average unmap latency(us):%.1f standard deviation:%.1f\n",
--
2.50.1.windows.1
Parsing KTAP is quite an inconvenience, but most of the time the thing
you really want to know is "did anything fail"?
Let's give the user the his information without them needing
to parse anything.
Because of the use of subshells and namespaces, this needs to be
communicated via a file. Just write arbitrary data into the file and
treat non-empty content as a signal that something failed.
In case any user depends on the current behaviour, such as running this
from a script with `set -e` and parsing the result for failures
afterwards, add a flag they can set to get the old behaviour, namely
--no-error-on-fail.
Signed-off-by: Brendan Jackman <jackmanb(a)google.com>
---
Changes in v3:
- Fixed quoting
- Link to v2: https://lore.kernel.org/r/20251014-b4-ksft-error-on-fail-v2-1-b3e2657237b8@…
Changes in v2:
- Fixed bug in report_failure()
- Made error-on-fail the default
- Link to v1: https://lore.kernel.org/r/20251007-b4-ksft-error-on-fail-v1-1-71bf058f5662@…
---
tools/testing/selftests/kselftest/runner.sh | 14 ++++++++++----
tools/testing/selftests/run_kselftest.sh | 14 ++++++++++++++
2 files changed, 24 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh
index 2c3c58e65a419f5ee8d7dc51a37671237a07fa0b..3a62039fa6217f3453423ff011575d0a1eb8c275 100644
--- a/tools/testing/selftests/kselftest/runner.sh
+++ b/tools/testing/selftests/kselftest/runner.sh
@@ -44,6 +44,12 @@ tap_timeout()
fi
}
+report_failure()
+{
+ echo "not ok $*"
+ echo "$*" >> "$kselftest_failures_file"
+}
+
run_one()
{
DIR="$1"
@@ -105,7 +111,7 @@ run_one()
echo "# $TEST_HDR_MSG"
if [ ! -e "$TEST" ]; then
echo "# Warning: file $TEST is missing!"
- echo "not ok $test_num $TEST_HDR_MSG"
+ report_failure "$test_num $TEST_HDR_MSG"
else
if [ -x /usr/bin/stdbuf ]; then
stdbuf="/usr/bin/stdbuf --output=L "
@@ -123,7 +129,7 @@ run_one()
interpreter=$(head -n 1 "$TEST" | cut -c 3-)
cmd="$stdbuf $interpreter ./$BASENAME_TEST"
else
- echo "not ok $test_num $TEST_HDR_MSG"
+ report_failure "$test_num $TEST_HDR_MSG"
return
fi
fi
@@ -137,9 +143,9 @@ run_one()
echo "ok $test_num $TEST_HDR_MSG # SKIP"
elif [ $rc -eq $timeout_rc ]; then \
echo "#"
- echo "not ok $test_num $TEST_HDR_MSG # TIMEOUT $kselftest_timeout seconds"
+ report_failure "$test_num $TEST_HDR_MSG # TIMEOUT $kselftest_timeout seconds"
else
- echo "not ok $test_num $TEST_HDR_MSG # exit=$rc"
+ report_failure "$test_num $TEST_HDR_MSG # exit=$rc"
fi)
cd - >/dev/null
fi
diff --git a/tools/testing/selftests/run_kselftest.sh b/tools/testing/selftests/run_kselftest.sh
index 0443beacf3621ae36cb12ffd57f696ddef3526b5..d4be97498b32e975c63a1167d3060bdeba674c8c 100755
--- a/tools/testing/selftests/run_kselftest.sh
+++ b/tools/testing/selftests/run_kselftest.sh
@@ -33,6 +33,7 @@ Usage: $0 [OPTIONS]
-c | --collection COLLECTION Run all tests from COLLECTION
-l | --list List the available collection:test entries
-d | --dry-run Don't actually run any tests
+ -f | --no-error-on-fail Don't exit with an error just because tests failed
-n | --netns Run each test in namespace
-h | --help Show this usage info
-o | --override-timeout Number of seconds after which we timeout
@@ -44,6 +45,7 @@ COLLECTIONS=""
TESTS=""
dryrun=""
kselftest_override_timeout=""
+ERROR_ON_FAIL=true
while true; do
case "$1" in
-s | --summary)
@@ -65,6 +67,9 @@ while true; do
-d | --dry-run)
dryrun="echo"
shift ;;
+ -f | --no-error-on-fail)
+ ERROR_ON_FAIL=false
+ shift ;;
-n | --netns)
RUN_IN_NETNS=1
shift ;;
@@ -105,9 +110,18 @@ if [ -n "$TESTS" ]; then
available="$(echo "$valid" | sed -e 's/ /\n/g')"
fi
+kselftest_failures_file="$(mktemp --tmpdir kselftest-failures-XXXXXX)"
+export kselftest_failures_file
+
collections=$(echo "$available" | cut -d: -f1 | sort | uniq)
for collection in $collections ; do
[ -w /dev/kmsg ] && echo "kselftest: Running tests in $collection" >> /dev/kmsg
tests=$(echo "$available" | grep "^$collection:" | cut -d: -f2)
($dryrun cd "$collection" && $dryrun run_many $tests)
done
+
+failures="$(cat "$kselftest_failures_file")"
+rm "$kselftest_failures_file"
+if "$ERROR_ON_FAIL" && [ "$failures" ]; then
+ exit 1
+fi
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20251007-b4-ksft-error-on-fail-0c2cb3246041
Best regards,
--
Brendan Jackman <jackmanb(a)google.com>
This series adds support for tests that use multiple devices, and adds
one new test, vfio_pci_device_init_perf_test, which measures parallel
device initialization time to demonstrate the improvement from commit
e908f58b6beb ("vfio/pci: Separate SR-IOV VF dev_set").
This series also breaks apart the monolithic vfio_util.h and
vfio_pci_device.c into separate files, to account for all the new code.
This required quite a bit of code motion so the diffstat looks large.
The final layout is more granular and provides a better separation of
the IOMMU code from the device code.
Final layout:
C files:
- tools/testing/selftests/vfio/lib/iommu.c
- tools/testing/selftests/vfio/lib/iova_allocator.c
- tools/testing/selftests/vfio/lib/libvfio.c
- tools/testing/selftests/vfio/lib/vfio_pci_device.c
- tools/testing/selftests/vfio/lib/vfio_pci_driver.c
H files:
- tools/testing/selftests/vfio/lib/include/libvfio.h
- tools/testing/selftests/vfio/lib/include/libvfio/assert.h
- tools/testing/selftests/vfio/lib/include/libvfio/iommu.h
- tools/testing/selftests/vfio/lib/include/libvfio/iova_allocator.h
- tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
- tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_driver.h
Notably, vfio_util.h is now gone and replaced with libvfio.h.
This series is based on vfio/next plus Alex Mastro's series to add the
IOVA allocator [1]. It should apply cleanly to vfio/next once Alex's
series is merged into 6.18 and then into vfio/next.
This series can be found on GitHub:
https://github.com/dmatlack/linux/tree/vfio/selftests/init_perf_test/v2
[1] https://lore.kernel.org/kvm/20251111-iova-ranges-v3-0-7960244642c5@fb.com/
Cc: Alex Mastro <amastro(a)fb.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Josh Hilke <jrhilke(a)google.com>
Cc: Raghavendra Rao Ananta <rananta(a)google.com>
Cc: Vipin Sharma <vipinsh(a)google.com>
v2:
- Require tests to call iommu_init() and manage struct iommu objects
rather than implicitly doing it in vfio_pci_device_init().
- Drop all the device wrappers for IOMMU methods and require tests to
interact with the iommu_*() helper functions directly.
- Add a commit to eliminate INVALID_IOVA. This is a simple cleanup I've
been meaning to make.
- Upgrade some driver logging to error (Raghavendra)
- Remove plurality from helper function that fetches BDF from
environment variable (Raghavendra)
- Fix cleanup.sh to only delete the device directory when cleaning up
all devices (Raghavendra)
v1: https://lore.kernel.org/kvm/20251008232531.1152035-1-dmatlack@google.com/
David Matlack (18):
vfio: selftests: Move run.sh into scripts directory
vfio: selftests: Split run.sh into separate scripts
vfio: selftests: Allow passing multiple BDFs on the command line
vfio: selftests: Rename struct vfio_iommu_mode to iommu_mode
vfio: selftests: Introduce struct iommu
vfio: selftests: Support multiple devices in the same
container/iommufd
vfio: selftests: Eliminate overly chatty logging
vfio: selftests: Prefix logs with device BDF where relevant
vfio: selftests: Upgrade driver logging to dev_err()
vfio: selftests: Rename struct vfio_dma_region to dma_region
vfio: selftests: Move IOMMU library code into iommu.c
vfio: selftests: Move IOVA allocator into iova_allocator.c
vfio: selftests: Stop passing device for IOMMU operations
vfio: selftests: Rename vfio_util.h to libvfio.h
vfio: selftests: Move vfio_selftests_*() helpers into libvfio.c
vfio: selftests: Split libvfio.h into separate header files
vfio: selftests: Eliminate INVALID_IOVA
vfio: selftests: Add vfio_pci_device_init_perf_test
tools/testing/selftests/vfio/Makefile | 9 +-
.../selftests/vfio/lib/drivers/dsa/dsa.c | 36 +-
.../selftests/vfio/lib/drivers/ioat/ioat.c | 18 +-
.../selftests/vfio/lib/include/libvfio.h | 26 +
.../vfio/lib/include/libvfio/assert.h | 54 ++
.../vfio/lib/include/libvfio/iommu.h | 76 +++
.../vfio/lib/include/libvfio/iova_allocator.h | 23 +
.../lib/include/libvfio/vfio_pci_device.h | 125 ++++
.../lib/include/libvfio/vfio_pci_driver.h | 97 +++
.../selftests/vfio/lib/include/vfio_util.h | 331 -----------
tools/testing/selftests/vfio/lib/iommu.c | 465 +++++++++++++++
.../selftests/vfio/lib/iova_allocator.c | 94 +++
tools/testing/selftests/vfio/lib/libvfio.c | 78 +++
tools/testing/selftests/vfio/lib/libvfio.mk | 5 +-
.../selftests/vfio/lib/vfio_pci_device.c | 555 +-----------------
.../selftests/vfio/lib/vfio_pci_driver.c | 16 +-
tools/testing/selftests/vfio/run.sh | 109 ----
.../testing/selftests/vfio/scripts/cleanup.sh | 41 ++
tools/testing/selftests/vfio/scripts/lib.sh | 42 ++
tools/testing/selftests/vfio/scripts/run.sh | 16 +
tools/testing/selftests/vfio/scripts/setup.sh | 48 ++
.../selftests/vfio/vfio_dma_mapping_test.c | 46 +-
.../selftests/vfio/vfio_iommufd_setup_test.c | 2 +-
.../vfio/vfio_pci_device_init_perf_test.c | 167 ++++++
.../selftests/vfio/vfio_pci_device_test.c | 12 +-
.../selftests/vfio/vfio_pci_driver_test.c | 51 +-
26 files changed, 1479 insertions(+), 1063 deletions(-)
create mode 100644 tools/testing/selftests/vfio/lib/include/libvfio.h
create mode 100644 tools/testing/selftests/vfio/lib/include/libvfio/assert.h
create mode 100644 tools/testing/selftests/vfio/lib/include/libvfio/iommu.h
create mode 100644 tools/testing/selftests/vfio/lib/include/libvfio/iova_allocator.h
create mode 100644 tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
create mode 100644 tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_driver.h
delete mode 100644 tools/testing/selftests/vfio/lib/include/vfio_util.h
create mode 100644 tools/testing/selftests/vfio/lib/iommu.c
create mode 100644 tools/testing/selftests/vfio/lib/iova_allocator.c
create mode 100644 tools/testing/selftests/vfio/lib/libvfio.c
delete mode 100755 tools/testing/selftests/vfio/run.sh
create mode 100755 tools/testing/selftests/vfio/scripts/cleanup.sh
create mode 100755 tools/testing/selftests/vfio/scripts/lib.sh
create mode 100755 tools/testing/selftests/vfio/scripts/run.sh
create mode 100755 tools/testing/selftests/vfio/scripts/setup.sh
create mode 100644 tools/testing/selftests/vfio/vfio_pci_device_init_perf_test.c
base-commit: 0ed3a30fd996cb0cac872432cf25185fda7e5316
prerequisite-patch-id: dcf23dcc1198960bda3102eefaa21df60b2e4c54
prerequisite-patch-id: e32e56d5bf7b6c7dd40d737aa3521560407e00f5
prerequisite-patch-id: 4f79a41bf10a4c025ba5f433551b46035aa15878
prerequisite-patch-id: f903a45f0c32319138cd93a007646ab89132b18c
--
2.52.0.rc1.455.g30608eb744-goog
Main objective of this series is to convert the gro.sh and toeplitz.sh
tests to be "NIPA-compatible" - meaning make use of the Python env,
which lets us run the tests against either netdevsim or a real device.
The tests seem to have been written with a different flow in mind.
Namely they source different bash "setup" scripts depending on arguments
passed to the test. While I have nothing against the use of bash and
the overall architecture - the existing code needs quite a bit of work
(don't assume MAC/IP addresses, support remote endpoint over SSH).
If I'm the one fixing it, I'd rather convert them to our "simplistic"
Python.
This series rewrites the tests in Python while addressing their
shortcomings. The functionality of running the test over loopback
on a real device is retained but with a different method of invocation
(see the last patch).
Once again we are dealing with a script which run over a variety of
protocols (combination of [ipv4, ipv6, ipip] x [tcp, udp]). The first
4 patches add support for test variants to our scripts. We use the
term "variant" in the same sense as the C kselftest_harness.h -
variant is just a set of static input arguments.
Note that neither GRO nor the Toeplitz test fully passes for me on
any HW I have access to. But this is unrelated to the conversion.
This series is not making any real functional changes to the tests,
it is limited to improving the "test harness" scripts.
v2:
[patch 5] fix accidental modification of gitignore
[patch 8] fix typo in "compared"
[patch 9] fix typo I -> It
[patch 10] fix typoe configure -> configured
v1: https://lore.kernel.org/20251117205810.1617533-1-kuba@kernel.org
Jakub Kicinski (12):
selftests: net: py: coding style improvements
selftests: net: py: extract the case generation logic
selftests: net: py: add test variants
selftests: drv-net: xdp: use variants for qstat tests
selftests: net: relocate gro and toeplitz tests to drivers/net
selftests: net: py: support ksft ready without wait
selftests: net: py: read ip link info about remote dev
netdevsim: pass packets thru GRO on Rx
selftests: drv-net: add a Python version of the GRO test
selftests: drv-net: hw: convert the Toeplitz test to Python
netdevsim: add loopback support
selftests: net: remove old setup_* scripts
tools/testing/selftests/drivers/net/Makefile | 2 +
.../testing/selftests/drivers/net/hw/Makefile | 6 +-
tools/testing/selftests/net/Makefile | 7 -
tools/testing/selftests/net/lib/Makefile | 1 +
drivers/net/netdevsim/netdev.c | 26 ++-
.../testing/selftests/{ => drivers}/net/gro.c | 5 +-
.../{net => drivers/net/hw}/toeplitz.c | 7 +-
.../testing/selftests/drivers/net/.gitignore | 1 +
tools/testing/selftests/drivers/net/gro.py | 161 ++++++++++++++
.../selftests/drivers/net/hw/.gitignore | 1 +
.../drivers/net/hw/lib/py/__init__.py | 4 +-
.../selftests/drivers/net/hw/toeplitz.py | 208 ++++++++++++++++++
.../selftests/drivers/net/lib/py/__init__.py | 4 +-
.../selftests/drivers/net/lib/py/env.py | 2 +
tools/testing/selftests/drivers/net/xdp.py | 42 ++--
tools/testing/selftests/net/.gitignore | 2 -
tools/testing/selftests/net/gro.sh | 105 ---------
.../selftests/net/lib/ksft_setup_loopback.sh | 111 ++++++++++
.../testing/selftests/net/lib/py/__init__.py | 5 +-
tools/testing/selftests/net/lib/py/ksft.py | 93 ++++++--
tools/testing/selftests/net/lib/py/nsim.py | 2 +-
tools/testing/selftests/net/lib/py/utils.py | 20 +-
tools/testing/selftests/net/setup_loopback.sh | 120 ----------
tools/testing/selftests/net/setup_veth.sh | 45 ----
tools/testing/selftests/net/toeplitz.sh | 199 -----------------
.../testing/selftests/net/toeplitz_client.sh | 28 ---
26 files changed, 630 insertions(+), 577 deletions(-)
rename tools/testing/selftests/{ => drivers}/net/gro.c (99%)
rename tools/testing/selftests/{net => drivers/net/hw}/toeplitz.c (99%)
create mode 100755 tools/testing/selftests/drivers/net/gro.py
create mode 100755 tools/testing/selftests/drivers/net/hw/toeplitz.py
delete mode 100755 tools/testing/selftests/net/gro.sh
create mode 100755 tools/testing/selftests/net/lib/ksft_setup_loopback.sh
delete mode 100644 tools/testing/selftests/net/setup_loopback.sh
delete mode 100644 tools/testing/selftests/net/setup_veth.sh
delete mode 100755 tools/testing/selftests/net/toeplitz.sh
delete mode 100755 tools/testing/selftests/net/toeplitz_client.sh
--
2.51.1
v23:
fixed some of the "CHECK:" reported on checkpatch --strict.
Accepted Joel's suggestion for kselftest's Makefile.
CONFIG_RISCV_USER_CFI is enabled when zicfiss, zicfilp and fcf-protection
are all present in toolchain
v22: fixing build error due to -march=zicfiss being picked in gcc-13 and above
but not actually doing any codegen or recognizing instruction for zicfiss.
Change in v22 makes dependence on `-fcf-protection=full` compiler flag to
ensure that toolchain has support and then only CONFIG_RISCV_USER_CFI will be
visible in menuconfig.
v21: fixed build errors.
Basics and overview
===================
Software with larger attack surfaces (e.g. network facing apps like databases,
browsers or apps relying on browser runtimes) suffer from memory corruption
issues which can be utilized by attackers to bend control flow of the program
to eventually gain control (by making their payload executable). Attackers are
able to perform such attacks by leveraging call-sites which rely on indirect
calls or return sites which rely on obtaining return address from stack memory.
To mitigate such attacks, risc-v extension zicfilp enforces that all indirect
calls must land on a landing pad instruction `lpad` else cpu will raise software
check exception (a new cpu exception cause code on riscv).
Similarly for return flow, risc-v extension zicfiss extends architecture with
- `sspush` instruction to push return address on a shadow stack
- `sspopchk` instruction to pop return address from shadow stack
and compare with input operand (i.e. return address on stack)
- `sspopchk` to raise software check exception if comparision above
was a mismatch
- Protection mechanism using which shadow stack is not writeable via
regular store instructions
More information an details can be found at extensions github repo [1].
Equivalent to landing pad (zicfilp) on x86 is `ENDBRANCH` instruction in Intel
CET [3] and branch target identification (BTI) [4] on arm.
Similarly x86's Intel CET has shadow stack [5] and arm64 has guarded control
stack (GCS) [6] which are very similar to risc-v's zicfiss shadow stack.
x86 and arm64 support for user mode shadow stack is already in mainline.
Kernel awareness for user control flow integrity
================================================
This series picks up Samuel Holland's envcfg changes [2] as well. So if those are
being applied independently, they should be removed from this series.
Enabling:
In order to maintain compatibility and not break anything in user mode, kernel
doesn't enable control flow integrity cpu extensions on binary by default.
Instead exposes a prctl interface to enable, disable and lock the shadow stack
or landing pad feature for a task. This allows userspace (loader) to enumerate
if all objects in its address space are compiled with shadow stack and landing
pad support and accordingly enable the feature. Additionally if a subsequent
`dlopen` happens on a library, user mode can take a decision again to disable
the feature (if incoming library is not compiled with support) OR terminate the
task (if user mode policy is strict to have all objects in address space to be
compiled with control flow integirty cpu feature). prctl to enable shadow stack
results in allocating shadow stack from virtual memory and activating for user
address space. x86 and arm64 are also following same direction due to similar
reason(s).
clone/fork:
On clone and fork, cfi state for task is inherited by child. Shadow stack is
part of virtual memory and is a writeable memory from kernel perspective
(writeable via a restricted set of instructions aka shadow stack instructions)
Thus kernel changes ensure that this memory is converted into read-only when
fork/clone happens and COWed when fault is taken due to sspush, sspopchk or
ssamoswap. In case `CLONE_VM` is specified and shadow stack is to be enabled,
kernel will automatically allocate a shadow stack for that clone call.
map_shadow_stack:
x86 introduced `map_shadow_stack` system call to allow user space to explicitly
map shadow stack memory in its address space. It is useful to allocate shadow
for different contexts managed by a single thread (green threads or contexts)
risc-v implements this system call as well.
signal management:
If shadow stack is enabled for a task, kernel performs an asynchronous control
flow diversion to deliver the signal and eventually expects userspace to issue
sigreturn so that original execution can be resumed. Even though resume context
is prepared by kernel, it is in user space memory and is subject to memory
corruption and corruption bugs can be utilized by attacker in this race window
to perform arbitrary sigreturn and eventually bypass cfi mechanism.
Another issue is how to ensure that cfi related state on sigcontext area is not
trampled by legacy apps or apps compiled with old kernel headers.
In order to mitigate control-flow hijacting, kernel prepares a token and place
it on shadow stack before signal delivery and places address of token in
sigcontext structure. During sigreturn, kernel obtains address of token from
sigcontext struture, reads token from shadow stack and validates it and only
then allow sigreturn to succeed. Compatiblity issue is solved by adopting
dynamic sigcontext management introduced for vector extension. This series
re-factor the code little bit to allow future sigcontext management easy (as
proposed by Andy Chiu from SiFive)
config and compilation:
Introduce a new risc-v config option `CONFIG_RISCV_USER_CFI`. Selecting this
config option picks the kernel support for user control flow integrity. This
optin is presented only if toolchain has shadow stack and landing pad support.
And is on purpose guarded by toolchain support. Reason being that eventually
vDSO also needs to be compiled in with shadow stack and landing pad support.
vDSO compile patches are not included as of now because landing pad labeling
scheme is yet to settle for usermode runtime.
To get more information on kernel interactions with respect to
zicfilp and zicfiss, patch series adds documentation for
`zicfilp` and `zicfiss` in following:
Documentation/arch/riscv/zicfiss.rst
Documentation/arch/riscv/zicfilp.rst
How to test this series
=======================
Toolchain
---------
$ git clone git@github.com:sifive/riscv-gnu-toolchain.git -b cfi-dev
$ riscv-gnu-toolchain/configure --prefix=<path-to-where-to-build> --with-arch=rv64gc_zicfilp_zicfiss --enable-linux --disable-gdb --with-extra-multilib-test="rv64gc_zicfilp_zicfiss-lp64d:-static"
$ make -j$(nproc)
Qemu
----
Get the lastest qemu
$ cd qemu
$ mkdir build
$ cd build
$ ../configure --target-list=riscv64-softmmu
$ make -j$(nproc)
Opensbi
-------
$ git clone git@github.com:deepak0414/opensbi.git -b v6_cfi_spec_split_opensbi
$ make CROSS_COMPILE=<your riscv toolchain> -j$(nproc) PLATFORM=generic
Linux
-----
Running defconfig is fine. CFI is enabled by default if the toolchain
supports it.
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc) defconfig
$ make ARCH=riscv CROSS_COMPILE=<path-to-cfi-riscv-gnu-toolchain>/build/bin/riscv64-unknown-linux-gnu- -j$(nproc)
Running
-------
Modify your qemu command to have:
-bios <path-to-cfi-opensbi>/build/platform/generic/firmware/fw_dynamic.bin
-cpu rv64,zicfilp=true,zicfiss=true,zimop=true,zcmop=true
References
==========
[1] - https://github.com/riscv/riscv-cfi
[2] - https://lore.kernel.org/all/20240814081126.956287-1-samuel.holland@sifive.c…
[3] - https://lwn.net/Articles/889475/
[4] - https://developer.arm.com/documentation/109576/0100/Branch-Target-Identific…
[5] - https://www.intel.com/content/dam/develop/external/us/en/documents/catc17-i…
[6] - https://lwn.net/Articles/940403/
To: Thomas Gleixner <tglx(a)linutronix.de>
To: Ingo Molnar <mingo(a)redhat.com>
To: Borislav Petkov <bp(a)alien8.de>
To: Dave Hansen <dave.hansen(a)linux.intel.com>
To: x86(a)kernel.org
To: H. Peter Anvin <hpa(a)zytor.com>
To: Andrew Morton <akpm(a)linux-foundation.org>
To: Liam R. Howlett <Liam.Howlett(a)oracle.com>
To: Vlastimil Babka <vbabka(a)suse.cz>
To: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
To: Paul Walmsley <paul.walmsley(a)sifive.com>
To: Palmer Dabbelt <palmer(a)dabbelt.com>
To: Albert Ou <aou(a)eecs.berkeley.edu>
To: Conor Dooley <conor(a)kernel.org>
To: Rob Herring <robh(a)kernel.org>
To: Krzysztof Kozlowski <krzk+dt(a)kernel.org>
To: Arnd Bergmann <arnd(a)arndb.de>
To: Christian Brauner <brauner(a)kernel.org>
To: Peter Zijlstra <peterz(a)infradead.org>
To: Oleg Nesterov <oleg(a)redhat.com>
To: Eric Biederman <ebiederm(a)xmission.com>
To: Kees Cook <kees(a)kernel.org>
To: Jonathan Corbet <corbet(a)lwn.net>
To: Shuah Khan <shuah(a)kernel.org>
To: Jann Horn <jannh(a)google.com>
To: Conor Dooley <conor+dt(a)kernel.org>
To: Miguel Ojeda <ojeda(a)kernel.org>
To: Alex Gaynor <alex.gaynor(a)gmail.com>
To: Boqun Feng <boqun.feng(a)gmail.com>
To: Gary Guo <gary(a)garyguo.net>
To: Björn Roy Baron <bjorn3_gh(a)protonmail.com>
To: Benno Lossin <benno.lossin(a)proton.me>
To: Andreas Hindborg <a.hindborg(a)kernel.org>
To: Alice Ryhl <aliceryhl(a)google.com>
To: Trevor Gross <tmgross(a)umich.edu>
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-fsdevel(a)vger.kernel.org
Cc: linux-mm(a)kvack.org
Cc: linux-riscv(a)lists.infradead.org
Cc: devicetree(a)vger.kernel.org
Cc: linux-arch(a)vger.kernel.org
Cc: linux-doc(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: alistair.francis(a)wdc.com
Cc: richard.henderson(a)linaro.org
Cc: jim.shu(a)sifive.com
Cc: andybnac(a)gmail.com
Cc: kito.cheng(a)sifive.com
Cc: charlie(a)rivosinc.com
Cc: atishp(a)rivosinc.com
Cc: evan(a)rivosinc.com
Cc: cleger(a)rivosinc.com
Cc: alexghiti(a)rivosinc.com
Cc: samitolvanen(a)google.com
Cc: broonie(a)kernel.org
Cc: rick.p.edgecombe(a)intel.com
Cc: rust-for-linux(a)vger.kernel.org
changelog
---------
v23:
- fixed some of the "CHECK:" reported on checkpatch --strict.
- Accepted Joel's suggestion for kselftest's Makefile.
- CONFIG_RISCV_USER_CFI is enabled when zicfiss, zicfilp and fcf-protection
are all present in toolchain
v22:
- CONFIG_RISCV_USER_CFI was by default "n". With dual vdso support it is
default "y" (if toolchain supports it). Fixing build error due to
"-march=zicfiss" being picked in gcc-13 partially. gcc-13 only recognizes the
flag but not actually doing any codegen or recognizing instruction for zicfiss.
Change in v22 makes dependence on `-fcf-protection=full` compiler flag to
ensure that toolchain has support and then only CONFIG_RISCV_USER_CFI will be
visible in menuconfig.
- picked up tags and some cosmetic changes in commit message for dual vdso
patch.
v21:
- Fixing build errors due to changes in arch/riscv/include/asm/vdso.h
Using #ifdef instead of IS_ENABLED in arch/riscv/include/asm/vdso.h
vdso-cfi-offsets.h should be included only when CONFIG_RISCV_USER_CFI
is selected.
v20:
- rebased on v6.18-rc1.
- Added two vDSO support. If `CONFIG_RISCV_USER_CFI` is selected
two vDSOs are compiled (one for hardware prior to RVA23 and one
for RVA23 onwards). Kernel exposes RVA23 vDSO if hardware/cpu
implements zimop else exposes existing vDSO to userspace.
- default selection for `CONFIG_RISCV_USER_CFI` is "Yes".
- replaced "__ASSEMBLY__" with "__ASSEMBLER__"
v19:
- riscv_nousercfi was `int`. changed it to unsigned long.
Thanks to Alex Ghiti for reporting it. It was a bug.
- ELP is cleared on trap entry only when CONFIG_64BIT.
- restore ssp back on return to usermode was being done
before `riscv_v_context_nesting_end` on trap exit path.
If kernel shadow stack were enabled this would result in
kernel operating on user shadow stack and panic (as I found
in my testing of kcfi patch series). So fixed that.
v18:
- rebased on 6.16-rc1
- uprobe handling clears ELP in sstatus image in pt_regs
- vdso was missing shadow stack elf note for object files.
added that. Additional asm file for vdso needed the elf marker
flag. toolchain should complain if `-fcf-protection=full` and
marker is missing for object generated from asm file. Asked
toolchain folks to fix this. Although no reason to gate the merge
on that.
- Split up compile options for march and fcf-protection in vdso
Makefile
- CONFIG_RISCV_USER_CFI option is moved under "Kernel features" menu
Added `arch/riscv/configs/hardening.config` fragment which selects
CONFIG_RISCV_USER_CFI
v17:
- fixed warnings due to empty macros in usercfi.h (reported by alexg)
- fixed prefixes in commit titles reported by alexg
- took below uprobe with fcfi v2 patch from Zong Li and squashed it with
"riscv/traps: Introduce software check exception and uprobe handling"
https://lore.kernel.org/all/20250604093403.10916-1-zong.li@sifive.com/
v16:
- If FWFT is not implemented or returns error for shadow stack activation, then
no_usercfi is set to disable shadow stack. Although this should be picked up
by extension validation and activation. Fixed this bug for zicfilp and zicfiss
both. Thanks to Charlie Jenkins for reporting this.
- If toolchain doesn't support cfi, cfi kselftest shouldn't build. Suggested by
Charlie Jenkins.
- Default for CONFIG_RISCV_USER_CFI is set to no. Charlie/Atish suggested to
keep it off till we have more hardware availibility with RVA23 profile and
zimop/zcmop implemented. Else this will start breaking people's workflow
- Includes the fix if "!RV64 and !SBI" then definitions for FWFT in
asm-offsets.c error.
v15:
- Toolchain has been updated to include `-fcf-protection` flag. This
exists for x86 as well. Updated kernel patches to compile vDSO and
selftest to compile with `fcf-protection=full` flag.
- selecting CONFIG_RISCV_USERCFI selects CONFIG_RISCV_SBI.
- Patch to enable shadow stack for kernel wasn't hidden behind
CONFIG_RISCV_USERCFI and CONFIG_RISCV_SBI. fixed that.
v14:
- rebased on top of palmer/sbi-v3. Thus dropped clement's FWFT patches
Updated RISCV_ISA_EXT_XXXX in hwcap and hwprobe constants.
- Took Radim's suggestions on bitfields.
- Placed cfi_state at the end of thread_info block so that current situation
is not disturbed with respect to member fields of thread_info in single
cacheline.
v13:
- cpu_supports_shadow_stack/cpu_supports_indirect_br_lp_instr uses
riscv_has_extension_unlikely()
- uses nops(count) to create nop slide
- RISCV_ACQUIRE_BARRIER is not needed in `amo_user_shstk`. Removed it
- changed ternaries to simply use implicit casting to convert to bool.
- kernel command line allows to disable zicfilp and zicfiss independently.
updated kernel-parameters.txt.
- ptrace user abi for cfi uses bitmasks instead of bitfields. Added ptrace
kselftest.
- cosmetic and grammatical changes to documentation.
v12:
- It seems like I had accidently squashed arch agnostic indirect branch
tracking prctl and riscv implementation of those prctls. Split them again.
- set_shstk_status/set_indir_lp_status perform CSR writes only when CPU
support is available. As suggested by Zong Li.
- Some minor clean up in kselftests as suggested by Zong Li.
v11:
- patch "arch/riscv: compile vdso with landing pad" was unconditionally
selecting `_zicfilp` for vDSO compile. fixed that. Changed `lpad 1` to
to `lpad 0`.
v10:
- dropped "mm: helper `is_shadow_stack_vma` to check shadow stack vma". This patch
is not that interesting to this patch series for risc-v. There are instances in
arch directories where VM_SHADOW_STACK flag is anyways used. Dropping this patch
to expedite merging in riscv tree.
- Took suggestions from `Clement` on "riscv: zicfiss / zicfilp enumeration" to
validate presence of cfi based on config.
- Added a patch for vDSO to have `lpad 0`. I had omitted this earlier to make sure
we add single vdso object with cfi enabled. But a vdso object with scheme of
zero labeled landing pad is least common denominator and should work with all
objects of zero labeled as well as function-signature labeled objects.
v9:
- rebased on master (39a803b754d5 fix braino in "9p: fix ->rename_sem exclusion")
- dropped "mm: Introduce ARCH_HAS_USER_SHADOW_STACK" (master has it from arm64/gcs)
- dropped "prctl: arch-agnostic prctl for shadow stack" (master has it from arm64/gcs)
v8:
- rebased on palmer/for-next
- dropped samuel holland's `envcfg` context switch patches.
they are in parlmer/for-next
v7:
- Removed "riscv/Kconfig: enable HAVE_EXIT_THREAD for riscv"
Instead using `deactivate_mm` flow to clean up.
see here for more context
https://lore.kernel.org/all/20230908203655.543765-1-rick.p.edgecombe@intel.…
- Changed the header include in `kselftest`. Hopefully this fixes compile
issue faced by Zong Li at SiFive.
- Cleaned up an orphaned change to `mm/mmap.c` in below patch
"riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE"
- Lock interfaces for shadow stack and indirect branch tracking expect arg == 0
Any future evolution of this interface should accordingly define how arg should
be setup.
- `mm/map.c` has an instance of using `VM_SHADOW_STACK`. Fixed it to use helper
`is_shadow_stack_vma`.
- Link to v6: https://lore.kernel.org/r/20241008-v5_user_cfi_series-v6-0-60d9fe073f37@riv…
v6:
- Picked up Samuel Holland's changes as is with `envcfg` placed in
`thread` instead of `thread_info`
- fixed unaligned newline escapes in kselftest
- cleaned up messages in kselftest and included test output in commit message
- fixed a bug in clone path reported by Zong Li
- fixed a build issue if CONFIG_RISCV_ISA_V is not selected
(this was introduced due to re-factoring signal context
management code)
v5:
- rebased on v6.12-rc1
- Fixed schema related issues in device tree file
- Fixed some of the documentation related issues in zicfilp/ss.rst
(style issues and added index)
- added `SHADOW_STACK_SET_MARKER` so that implementation can define base
of shadow stack.
- Fixed warnings on definitions added in usercfi.h when
CONFIG_RISCV_USER_CFI is not selected.
- Adopted context header based signal handling as proposed by Andy Chiu
- Added support for enabling kernel mode access to shadow stack using
FWFT
(https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-firmware…)
- Link to v5: https://lore.kernel.org/r/20241001-v5_user_cfi_series-v1-0-3ba65b6e550f@riv…
(Note: I had an issue in my workflow due to which version number wasn't
picked up correctly while sending out patches)
v4:
- rebased on 6.11-rc6
- envcfg: Converged with Samuel Holland's patches for envcfg management on per-
thread basis.
- vma_is_shadow_stack is renamed to is_vma_shadow_stack
- picked up Mark Brown's `ARCH_HAS_USER_SHADOW_STACK` patch
- signal context: using extended context management to maintain compatibility.
- fixed `-Wmissing-prototypes` compiler warnings for prctl functions
- Documentation fixes and amending typos.
- Link to v4: https://lore.kernel.org/all/20240912231650.3740732-1-debug@rivosinc.com/
v3:
- envcfg
logic to pick up base envcfg had a bug where `ENVCFG_CBZE` could have been
picked on per task basis, even though CPU didn't implement it. Fixed in
this series.
- dt-bindings
As suggested, split into separate commit. fixed the messaging that spec is
in public review
- arch_is_shadow_stack change
arch_is_shadow_stack changed to vma_is_shadow_stack
- hwprobe
zicfiss / zicfilp if present will get enumerated in hwprobe
- selftests
As suggested, added object and binary filenames to .gitignore
Selftest binary anyways need to be compiled with cfi enabled compiler which
will make sure that landing pad and shadow stack are enabled. Thus removed
separate enable/disable tests. Cleaned up tests a bit.
- Link to v3: https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/
v2:
- Using config `CONFIG_RISCV_USER_CFI`, kernel support for riscv control flow
integrity for user mode programs can be compiled in the kernel.
- Enabling of control flow integrity for user programs is left to user runtime
- This patch series introduces arch agnostic `prctls` to enable shadow stack
and indirect branch tracking. And implements them on riscv.
---
Changes in v23:
- Link to v22: https://lore.kernel.org/r/20251023-v5_user_cfi_series-v22-0-1935270f7636@ri…
Changes in v22:
- Link to v21: https://lore.kernel.org/r/20251015-v5_user_cfi_series-v21-0-6a07856e90e7@ri…
Changes in v21:
- Link to v20: https://lore.kernel.org/r/20251013-v5_user_cfi_series-v20-0-b9de4be9912e@ri…
Changes in v20:
- Link to v19: https://lore.kernel.org/r/20250731-v5_user_cfi_series-v19-0-09b468d7beab@ri…
Changes in v19:
- Link to v18: https://lore.kernel.org/r/20250711-v5_user_cfi_series-v18-0-a8ee62f9f38e@ri…
Changes in v18:
- Link to v17: https://lore.kernel.org/r/20250604-v5_user_cfi_series-v17-0-4565c2cf869f@ri…
Changes in v17:
- Link to v16: https://lore.kernel.org/r/20250522-v5_user_cfi_series-v16-0-64f61a35eee7@ri…
Changes in v16:
- Link to v15: https://lore.kernel.org/r/20250502-v5_user_cfi_series-v15-0-914966471885@ri…
Changes in v15:
- changelog posted just below cover letter
- Link to v14: https://lore.kernel.org/r/20250429-v5_user_cfi_series-v14-0-5239410d012a@ri…
Changes in v14:
- changelog posted just below cover letter
- Link to v13: https://lore.kernel.org/r/20250424-v5_user_cfi_series-v13-0-971437de586a@ri…
Changes in v13:
- changelog posted just below cover letter
- Link to v12: https://lore.kernel.org/r/20250314-v5_user_cfi_series-v12-0-e51202b53138@ri…
Changes in v12:
- changelog posted just below cover letter
- Link to v11: https://lore.kernel.org/r/20250310-v5_user_cfi_series-v11-0-86b36cbfb910@ri…
Changes in v11:
- changelog posted just below cover letter
- Link to v10: https://lore.kernel.org/r/20250210-v5_user_cfi_series-v10-0-163dcfa31c60@ri…
---
Andy Chiu (1):
riscv: signal: abstract header saving for setup_sigcontext
Deepak Gupta (26):
mm: VM_SHADOW_STACK definition for riscv
dt-bindings: riscv: zicfilp and zicfiss in dt-bindings (extensions.yaml)
riscv: zicfiss / zicfilp enumeration
riscv: zicfiss / zicfilp extension csr and bit definitions
riscv: usercfi state for task and save/restore of CSR_SSP on trap entry/exit
riscv/mm : ensure PROT_WRITE leads to VM_READ | VM_WRITE
riscv/mm: manufacture shadow stack pte
riscv/mm: teach pte_mkwrite to manufacture shadow stack PTEs
riscv/mm: write protect and shadow stack
riscv/mm: Implement map_shadow_stack() syscall
riscv/shstk: If needed allocate a new shadow stack on clone
riscv: Implements arch agnostic shadow stack prctls
prctl: arch-agnostic prctl for indirect branch tracking
riscv: Implements arch agnostic indirect branch tracking prctls
riscv/traps: Introduce software check exception and uprobe handling
riscv/signal: save and restore of shadow stack for signal
riscv/kernel: update __show_regs to print shadow stack register
riscv/ptrace: riscv cfi status and state via ptrace and in core files
riscv/hwprobe: zicfilp / zicfiss enumeration in hwprobe
riscv: kernel command line option to opt out of user cfi
riscv: enable kernel access to shadow stack memory via FWFT sbi call
arch/riscv: dual vdso creation logic and select vdso based on hw
riscv: create a config for shadow stack and landing pad instr support
riscv: Documentation for landing pad / indirect branch tracking
riscv: Documentation for shadow stack on riscv
kselftest/riscv: kselftest for user mode cfi
Jim Shu (1):
arch/riscv: compile vdso with landing pad and shadow stack note
Documentation/admin-guide/kernel-parameters.txt | 8 +
Documentation/arch/riscv/index.rst | 2 +
Documentation/arch/riscv/zicfilp.rst | 115 +++++
Documentation/arch/riscv/zicfiss.rst | 179 +++++++
.../devicetree/bindings/riscv/extensions.yaml | 14 +
arch/riscv/Kconfig | 22 +
arch/riscv/Makefile | 8 +-
arch/riscv/configs/hardening.config | 4 +
arch/riscv/include/asm/asm-prototypes.h | 1 +
arch/riscv/include/asm/assembler.h | 44 ++
arch/riscv/include/asm/cpufeature.h | 12 +
arch/riscv/include/asm/csr.h | 16 +
arch/riscv/include/asm/entry-common.h | 2 +
arch/riscv/include/asm/hwcap.h | 2 +
arch/riscv/include/asm/mman.h | 26 +
arch/riscv/include/asm/mmu_context.h | 7 +
arch/riscv/include/asm/pgtable.h | 30 +-
arch/riscv/include/asm/processor.h | 1 +
arch/riscv/include/asm/thread_info.h | 3 +
arch/riscv/include/asm/usercfi.h | 95 ++++
arch/riscv/include/asm/vdso.h | 13 +-
arch/riscv/include/asm/vector.h | 3 +
arch/riscv/include/uapi/asm/hwprobe.h | 2 +
arch/riscv/include/uapi/asm/ptrace.h | 34 ++
arch/riscv/include/uapi/asm/sigcontext.h | 1 +
arch/riscv/kernel/Makefile | 2 +
arch/riscv/kernel/asm-offsets.c | 10 +
arch/riscv/kernel/cpufeature.c | 27 +
arch/riscv/kernel/entry.S | 38 ++
arch/riscv/kernel/head.S | 27 +
arch/riscv/kernel/process.c | 27 +-
arch/riscv/kernel/ptrace.c | 95 ++++
arch/riscv/kernel/signal.c | 148 +++++-
arch/riscv/kernel/sys_hwprobe.c | 2 +
arch/riscv/kernel/sys_riscv.c | 10 +
arch/riscv/kernel/traps.c | 54 ++
arch/riscv/kernel/usercfi.c | 545 +++++++++++++++++++++
arch/riscv/kernel/vdso.c | 7 +
arch/riscv/kernel/vdso/Makefile | 40 +-
arch/riscv/kernel/vdso/flush_icache.S | 4 +
arch/riscv/kernel/vdso/gen_vdso_offsets.sh | 4 +-
arch/riscv/kernel/vdso/getcpu.S | 4 +
arch/riscv/kernel/vdso/note.S | 3 +
arch/riscv/kernel/vdso/rt_sigreturn.S | 4 +
arch/riscv/kernel/vdso/sys_hwprobe.S | 4 +
arch/riscv/kernel/vdso/vgetrandom-chacha.S | 5 +-
arch/riscv/kernel/vdso_cfi/Makefile | 25 +
arch/riscv/kernel/vdso_cfi/vdso-cfi.S | 11 +
arch/riscv/mm/init.c | 2 +-
arch/riscv/mm/pgtable.c | 16 +
include/linux/cpu.h | 4 +
include/linux/mm.h | 7 +
include/uapi/linux/elf.h | 2 +
include/uapi/linux/prctl.h | 27 +
kernel/sys.c | 30 ++
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/cfi/.gitignore | 2 +
tools/testing/selftests/riscv/cfi/Makefile | 23 +
tools/testing/selftests/riscv/cfi/cfi_rv_test.h | 82 ++++
tools/testing/selftests/riscv/cfi/cfitests.c | 173 +++++++
tools/testing/selftests/riscv/cfi/shadowstack.c | 385 +++++++++++++++
tools/testing/selftests/riscv/cfi/shadowstack.h | 27 +
62 files changed, 2481 insertions(+), 41 deletions(-)
---
base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787
change-id: 20240930-v5_user_cfi_series-3dc332f8f5b2
--
- debug
This patch adds support for the Zalasr ISA extension, which supplies the
real load acquire/store release instructions.
The specification can be found here:
https://github.com/riscv/riscv-zalasr/blob/main/chapter2.adoc
This patch seires has been tested with ltp on Qemu with Brensan's zalasr
support patch[1].
Some false positive spacing error happens during patch checking. Thus I
CCed maintainers of checkpatch.pl as well.
[1] https://lore.kernel.org/all/CAGPSXwJEdtqW=nx71oufZp64nK6tK=0rytVEcz4F-gfvCO…
v4:
- Apply acquire/release semantics to arch_atomic operations. Thanks
to Andrea.
v3:
- Apply acquire/release semantics to arch_xchg/arch_cmpxchg operations
so as to ensure FENCE.TSO ordering between operations which precede the
UNLOCK+LOCK sequence and operations which follow the sequence. Thanks
to Andrea.
- Support hwprobe of Zalasr.
- Allow Zalasr extensions for Guest/VM.
v2:
- Adjust the order of Zalasr and Zalrsc in dt-bindings. Thanks to
Conor.
Xu Lu (10):
riscv: Add ISA extension parsing for Zalasr
dt-bindings: riscv: Add Zalasr ISA extension description
riscv: hwprobe: Export Zalasr extension
riscv: Introduce Zalasr instructions
riscv: Apply Zalasr to smp_load_acquire/smp_store_release
riscv: Apply acquire/release semantics to arch_xchg/arch_cmpxchg
operations
riscv: Apply acquire/release semantics to arch_atomic operations
riscv: Remove arch specific __atomic_acquire/release_fence
RISC-V: KVM: Allow Zalasr extensions for Guest/VM
RISC-V: KVM: selftests: Add Zalasr extensions to get-reg-list test
Documentation/arch/riscv/hwprobe.rst | 5 +-
.../devicetree/bindings/riscv/extensions.yaml | 5 +
arch/riscv/include/asm/atomic.h | 70 ++++++++-
arch/riscv/include/asm/barrier.h | 91 +++++++++--
arch/riscv/include/asm/cmpxchg.h | 144 +++++++++---------
arch/riscv/include/asm/fence.h | 4 -
arch/riscv/include/asm/hwcap.h | 1 +
arch/riscv/include/asm/insn-def.h | 79 ++++++++++
arch/riscv/include/uapi/asm/hwprobe.h | 1 +
arch/riscv/include/uapi/asm/kvm.h | 1 +
arch/riscv/kernel/cpufeature.c | 1 +
arch/riscv/kernel/sys_hwprobe.c | 1 +
arch/riscv/kvm/vcpu_onereg.c | 2 +
.../selftests/kvm/riscv/get-reg-list.c | 4 +
14 files changed, 314 insertions(+), 95 deletions(-)
--
2.20.1
Currently, guard regions are not visible to users except through
/proc/$pid/pagemap, with no explicit visibility at the VMA level.
This makes the feature less useful, as it isn't entirely apparent which
VMAs may have these entries present, especially when performing actions
which walk through memory regions such as those performed by CRIU.
This series addresses this issue by introducing the VM_MAYBE_GUARD flag
which fulfils this role, updating the smaps logic to display an entry for
these.
The semantics of this flag are that a guard region MAY be present if set
(we cannot be sure, as we can't efficiently track whether an
MADV_GUARD_REMOVE finally removes all the guard regions in a VMA) - but if
not set the VMA definitely does NOT have any guard regions present.
It's problematic to establish this flag without further action, because
that means that VMAs with guard regions in them become non-mergeable with
adjacent VMAs for no especially good reason.
To work around this, this series also introduces the concept of 'sticky'
VMA flags - that is flags which:
a. if set in one VMA and not in another still permit those VMAs to be
merged (if otherwise compatible).
b. When they are merged, the resultant VMA must have the flag set.
The VMA logic is updated to propagate these flags correctly.
Additionally, VM_MAYBE_GUARD being an explicit VMA flag allows us to solve
an issue with file-backed guard regions - previously these established an
anon_vma object for file-backed mappings solely to have vma_needs_copy()
correctly propagate guard region mappings to child processes.
We introduce a new flag alias VM_COPY_ON_FORK (which currently only
specifies VM_MAYBE_GUARD) and update vma_needs_copy() to check explicitly
for this flag and to copy page tables if it is present, which resolves this
issue.
Additionally, we add the ability for allow-listed VMA flags to be
atomically writable with only mmap/VMA read locks held.
The only flag we allow so far is VM_MAYBE_GUARD, which we carefully ensure
does not cause any races by being allowed to do so.
This allows us to maintain guard region installation as a read-locked
operation and not endure the overhead of obtaining a write lock here.
Finally we introduce extensive VMA userland tests to assert that the sticky
VMA logic behaves correctly as well as guard region self tests to assert
that smaps visibility is correctly implemented.
v3:
* Propagated tags thanks Vlastimil & Pedro! :)
* Fixed doc nit as per Pedro.
* Added vma_flag_test_atomic() in preparation for fixing
retract_page_tables() (see below). We make this not require any locks, as
we serialise on the page table lock in retract_page_tables().
* Split the atomic flag enablement and actually setting the flag for guard
install into two separate commits so we clearly separate the various VMA
flag implementation details and us enabling this feature.
* Mentioned setting anon_vma for anonymous mappings in commit message as
per Vlastimil.
* Fixed an issue with retract_page_tables() whereby madvise(...,
MADV_COLLAPSE) relies upon file-backed VMAs not being collapsed due to
the UFFD WP VMA flag being set or the VMA having vma->anon_vma set
(i.e. being a MAP_PRIVATE file-backed VMA). This was updated to also
check for VM_MAYBE_GUARD.
* Introduced MADV_COLLAPSE self test to assert that the behaviour is
correct. I first reproduced the issue locally and then adapted the test
to assert that this no longer occurs.
* Mentioned KCSAN permissiveness in commit message as per Pedro.
* Mentioned mmap/VMA read lock excluding mmap/VMA write lock and thus
avoiding meaningful RMW races in commit message as per Vlastimil.
* Mentioned previous unconditional vma->anon_vma installation on guard
region installation as per Vlastimil.
* Avoided having merging compromised by reordering patches such that the
sticky VMA functionality is implemented prior to VM_MAYBE_GUARD being
utilised upon guard region installation, rendering Vlastimil's request to
mention this in a commit message unnecessary.
* Separated out sticky and copy on fork patches as per Pedro.
* Added VM_PFNMAP, VM_MIXEDMAP, VM_UFFD_WP to VM_COPY_ON_FORK to make
things more consistent and clean.
* Added mention of why generally VM_STICKY should be VM_COPY_ON_FORK in
copy on fork patch.
v2:
* Separated out userland VMA tests for sticky behaviour as per Suren.
* Added the concept of atomic writable VMA flags as per Pedro and Vlastimil.
* Made VM_MAYBE_GUARD an atomic writable flag so we don't have to take a VMA
write lock in madvise() as per Pedro and Vlastimil.
https://lore.kernel.org/all/cover.1762422915.git.lorenzo.stoakes@oracle.com/
v1:
https://lore.kernel.org/all/cover.1761756437.git.lorenzo.stoakes@oracle.com/
Lorenzo Stoakes (8):
mm: introduce VM_MAYBE_GUARD and make visible in /proc/$pid/smaps
mm: add atomic VMA flags and set VM_MAYBE_GUARD as such
mm: implement sticky VMA flags
mm: introduce copy-on-fork VMAs and make VM_MAYBE_GUARD one
mm: set the VM_MAYBE_GUARD flag on guard region install
tools/testing/vma: add VMA sticky userland tests
tools/testing/selftests/mm: add MADV_COLLAPSE test case
tools/testing/selftests/mm: add smaps visibility guard region test
Documentation/filesystems/proc.rst | 5 +-
fs/proc/task_mmu.c | 1 +
include/linux/mm.h | 102 ++++++++++++
include/trace/events/mmflags.h | 1 +
mm/khugepaged.c | 72 +++++---
mm/madvise.c | 22 ++-
mm/memory.c | 14 +-
mm/vma.c | 22 +--
tools/testing/selftests/mm/guard-regions.c | 185 +++++++++++++++++++++
tools/testing/selftests/mm/vm_util.c | 5 +
tools/testing/selftests/mm/vm_util.h | 1 +
tools/testing/vma/vma.c | 89 ++++++++--
tools/testing/vma/vma_internal.h | 56 +++++++
13 files changed, 511 insertions(+), 64 deletions(-)
--
2.51.0
The vector regset uses the maximum possible vlenb 8192 to allocate a
2^18 bytes buffer to copy the vector register. But most platforms
don’t support the largest vlenb.
The regset has 2 users, ptrace syscall and coredump. When handling the
PTRACE_GETREGSET requests from ptrace syscall, Linux will prepare a
kernel buffer which size is min(user buffer size, limit). A malicious
user process might overwhelm a memory-constrainted system when the
buffer limit is very large. The coredump uses regset_get_alloc() to
get the context of vector register. But this API allocates buffer
before checking whether the target process uses vector extension, this
wastes time to prepare a large memory buffer.
The buffer limit can be determined after getting platform vlenb in the
early boot stage, this can let the regset buffer match real hardware
limits. Also add .active callbacks to let the coredump skip vector part
when target process doesn't use it.
After this patchset, userspace process needs 2 ptrace syscalls to
retrieve the vector regset with PTRACE_GETREGSET. The first ptrace call
only reads the header to get the vlenb information. Then prepare a
suitable buffer to get the register context. The new vector ptrace
kselftest demonstrates it.
---
v2:
- fix issues in vector ptrace kselftest (Andy)
Yong-Xuan Wang (2):
riscv: ptrace: Optimize the allocation of vector regset
selftests: riscv: Add test for the Vector ptrace interface
arch/riscv/include/asm/vector.h | 1 +
arch/riscv/kernel/ptrace.c | 24 +++-
arch/riscv/kernel/vector.c | 2 +
tools/testing/selftests/riscv/vector/Makefile | 5 +-
.../selftests/riscv/vector/vstate_ptrace.c | 134 ++++++++++++++++++
5 files changed, 162 insertions(+), 4 deletions(-)
create mode 100644 tools/testing/selftests/riscv/vector/vstate_ptrace.c
--
2.43.0
The user_notification_wait_killable_after_reply test fails due to an
unhandled error when a traced syscall is interrupted by a signal.
When a signal arrives after the tracer has received a seccomp
notification but before it has replied, the notification can become
stale. Any subsequent reply (like with SECCOMP_IOCTL_NOTIF_ADDFD)
will fail with -ENOENT.
This patch fixes the test by handling the -ENOENT return value from
SECCOMP_IOCTL_NOTIF_ADDFD, preventing the test from failing
incorrectly. The loop counter is decremented to re-run the iteration
for the restarted syscall.
Signed-off-by: Wake Liu <wakel(a)google.com>
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 574fdd102eb5..c3e598c9c4ee 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -5048,8 +5048,12 @@ TEST(user_notification_wait_killable_after_reply)
addfd.id = req.id;
addfd.flags = SECCOMP_ADDFD_FLAG_SEND;
addfd.srcfd = 0;
- ASSERT_GE(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), 0)
- kill(pid, SIGKILL);
+ ret = ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd);
+ if (ret < 0 && errno == ENOENT) {
+ i--;
+ continue;
+ }
+ ASSERT_GE(ret, 0);
}
/*
--
2.52.0.rc1.455.g30608eb744-goog
Dzień dobry,
pomagamy przedsiębiorcom wprowadzić model wymiany walut, który minimalizuje wahania kosztów przy rozliczeniach międzynarodowych.
Kiedyv możemy umówić się na 15-minutową rozmowę, aby zaprezentować, jak taki model mógłby działać w Państwa firmie - z gwarancją indywidualnych kursów i pełnym uproszczeniem płatności? Proszę o propozycję dogodnego terminu.
Pozdrawiam
Marek Poradecki
This commit introduces checks for kernel version and seccomp filter flag
support to the seccomp selftests. It also includes conditional header
inclusions using __GLIBC_PREREQ.
Some tests were gated by kernel version, and adjustments were made for
flags introduced after kernel 5.4. This ensures the selftests can run
and pass correctly on kernel versions 5.4 and later, preventing failures
due to features not present in older kernels.
The use of __GLIBC_PREREQ ensures proper compilation and functionality
across different glibc versions in a mainline Linux kernel context.
While it might appear redundant in specific build environments due to
global overrides, it is crucial for upstream correctness and portability.
Signed-off-by: Wake Liu <wakel(a)google.com>
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 108 ++++++++++++++++--
1 file changed, 99 insertions(+), 9 deletions(-)
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 61acbd45ffaa..9b660cff5a4a 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -13,12 +13,14 @@
* we need to use the kernel's siginfo.h file and trick glibc
* into accepting it.
*/
+#if defined(__GLIBC__) && defined(__GLIBC_PREREQ)
#if !__GLIBC_PREREQ(2, 26)
# include <asm/siginfo.h>
# define __have_siginfo_t 1
# define __have_sigval_t 1
# define __have_sigevent_t 1
#endif
+#endif
#include <errno.h>
#include <linux/filter.h>
@@ -300,6 +302,26 @@ int seccomp(unsigned int op, unsigned int flags, void *args)
}
#endif
+int seccomp_flag_supported(int flag)
+{
+ /*
+ * Probes if a seccomp filter flag is supported by the kernel.
+ *
+ * When an unsupported flag is passed to seccomp(SECCOMP_SET_MODE_FILTER, ...),
+ * the kernel returns EINVAL.
+ *
+ * When a supported flag is passed, the kernel proceeds to validate the
+ * filter program pointer. By passing NULL for the filter program,
+ * the kernel attempts to dereference a bad address, resulting in EFAULT.
+ *
+ * Therefore, checking for EFAULT indicates that the flag itself was
+ * recognized and supported by the kernel.
+ */
+ if (seccomp(SECCOMP_SET_MODE_FILTER, flag, NULL) == -1 && errno == EFAULT)
+ return 1;
+ return 0;
+}
+
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#define syscall_arg(_n) (offsetof(struct seccomp_data, args[_n]))
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
@@ -2436,13 +2458,12 @@ TEST(detect_seccomp_filter_flags)
ASSERT_NE(ENOSYS, errno) {
TH_LOG("Kernel does not support seccomp syscall!");
}
- EXPECT_EQ(-1, ret);
- EXPECT_EQ(EFAULT, errno) {
- TH_LOG("Failed to detect that a known-good filter flag (0x%X) is supported!",
- flag);
- }
- all_flags |= flag;
+ if (seccomp_flag_supported(flag))
+ all_flags |= flag;
+ else
+ TH_LOG("Filter flag (0x%X) is not found to be supported!",
+ flag);
}
/*
@@ -2870,6 +2891,12 @@ TEST_F(TSYNC, two_siblings_with_one_divergence)
TEST_F(TSYNC, two_siblings_with_one_divergence_no_tid_in_err)
{
+ /* Depends on 5189149 (seccomp: allow TSYNC and USER_NOTIF together) */
+ if (!seccomp_flag_supported(SECCOMP_FILTER_FLAG_TSYNC_ESRCH)) {
+ SKIP(return, "Kernel does not support SECCOMP_FILTER_FLAG_TSYNC_ESRCH");
+ return;
+ }
+
long ret, flags;
void *status;
@@ -3475,6 +3502,11 @@ TEST(user_notification_basic)
TEST(user_notification_with_tsync)
{
+ /* Depends on 5189149 (seccomp: allow TSYNC and USER_NOTIF together) */
+ if (!seccomp_flag_supported(SECCOMP_FILTER_FLAG_TSYNC_ESRCH)) {
+ SKIP(return, "Kernel does not support SECCOMP_FILTER_FLAG_TSYNC_ESRCH");
+ return;
+ }
int ret;
unsigned int flags;
@@ -3966,6 +3998,13 @@ TEST(user_notification_filter_empty)
TEST(user_ioctl_notification_filter_empty)
{
+ /* Depends on 95036a7 (seccomp: interrupt SECCOMP_IOCTL_NOTIF_RECV
+ * when all users have exited) */
+ if (!ksft_min_kernel_version(6, 11)) {
+ SKIP(return, "Kernel version < 6.11");
+ return;
+ }
+
pid_t pid;
long ret;
int status, p[2];
@@ -4119,6 +4158,12 @@ int get_next_fd(int prev_fd)
TEST(user_notification_addfd)
{
+ /* Depends on 0ae71c7 (seccomp: Support atomic "addfd + send reply") */
+ if (!ksft_min_kernel_version(5, 14)) {
+ SKIP(return, "Kernel version < 5.14");
+ return;
+ }
+
pid_t pid;
long ret;
int status, listener, memfd, fd, nextfd;
@@ -4281,6 +4326,12 @@ TEST(user_notification_addfd)
TEST(user_notification_addfd_rlimit)
{
+ /* Depends on 7cf97b1 (seccomp: Introduce addfd ioctl to seccomp user notifier) */
+ if (!ksft_min_kernel_version(5, 9)) {
+ SKIP(return, "Kernel version < 5.9");
+ return;
+ }
+
pid_t pid;
long ret;
int status, listener, memfd;
@@ -4326,9 +4377,12 @@ TEST(user_notification_addfd_rlimit)
EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), -1);
EXPECT_EQ(errno, EMFILE);
- addfd.flags = SECCOMP_ADDFD_FLAG_SEND;
- EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), -1);
- EXPECT_EQ(errno, EMFILE);
+ /* Depends on 0ae71c7 (seccomp: Support atomic "addfd + send reply") */
+ if (ksft_min_kernel_version(5, 14)) {
+ addfd.flags = SECCOMP_ADDFD_FLAG_SEND;
+ EXPECT_EQ(ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd), -1);
+ EXPECT_EQ(errno, EMFILE);
+ }
addfd.newfd = 100;
addfd.flags = SECCOMP_ADDFD_FLAG_SETFD;
@@ -4356,6 +4410,12 @@ TEST(user_notification_addfd_rlimit)
TEST(user_notification_sync)
{
+ /* Depends on 48a1084 (seccomp: add the synchronous mode for seccomp_unotify) */
+ if (!ksft_min_kernel_version(6, 6)) {
+ SKIP(return, "Kernel version < 6.6");
+ return;
+ }
+
struct seccomp_notif req = {};
struct seccomp_notif_resp resp = {};
int status, listener;
@@ -4520,6 +4580,12 @@ static char get_proc_stat(struct __test_metadata *_metadata, pid_t pid)
TEST(user_notification_fifo)
{
+ /* Depends on 4cbf6f6 (seccomp: Use FIFO semantics to order notifications) */
+ if (!ksft_min_kernel_version(5, 19)) {
+ SKIP(return, "Kernel version < 5.19");
+ return;
+ }
+
struct seccomp_notif_resp resp = {};
struct seccomp_notif req = {};
int i, status, listener;
@@ -4623,6 +4689,12 @@ static long get_proc_syscall(struct __test_metadata *_metadata, int pid)
/* Ensure non-fatal signals prior to receive are unmodified */
TEST(user_notification_wait_killable_pre_notification)
{
+ /* Depends on c2aa2df (seccomp: Add wait_killable semantic to seccomp user notifier) */
+ if (!ksft_min_kernel_version(5, 19)) {
+ SKIP(return, "Kernel version < 5.19");
+ return;
+ }
+
struct sigaction new_action = {
.sa_handler = signal_handler,
};
@@ -4693,6 +4765,12 @@ TEST(user_notification_wait_killable_pre_notification)
/* Ensure non-fatal signals after receive are blocked */
TEST(user_notification_wait_killable)
{
+ /* Depends on c2aa2df (seccomp: Add wait_killable semantic to seccomp user notifier) */
+ if (!ksft_min_kernel_version(5, 19)) {
+ SKIP(return, "Kernel version < 5.19");
+ return;
+ }
+
struct sigaction new_action = {
.sa_handler = signal_handler,
};
@@ -4772,6 +4850,12 @@ TEST(user_notification_wait_killable)
/* Ensure fatal signals after receive are not blocked */
TEST(user_notification_wait_killable_fatal)
{
+ /* Depends on c2aa2df (seccomp: Add wait_killable semantic to seccomp user notifier) */
+ if (!ksft_min_kernel_version(5, 19)) {
+ SKIP(return, "Kernel version < 5.19");
+ return;
+ }
+
struct seccomp_notif req = {};
int listener, status;
pid_t pid;
@@ -4854,6 +4938,12 @@ static void *tsync_vs_dead_thread_leader_sibling(void *_args)
*/
TEST(tsync_vs_dead_thread_leader)
{
+ /* Depends on bfafe5e (seccomp: release task filters when the task exits) */
+ if (!ksft_min_kernel_version(6, 11)) {
+ SKIP(return, "Kernel version < 6.11");
+ return;
+ }
+
int status;
pid_t pid;
long ret;
--
2.50.1.703.g449372360f-goog
syzbot ci has tested the following series
[v4] ipvlan: support mac-nat mode
https://lore.kernel.org/all/20251118100046.2944392-1-skorodumov.dmitry@huaw…
* [PATCH net-next 01/13] ipvlan: Support MACNAT mode
* [PATCH net-next 02/13] ipvlan: macnat: Handle rx mcast-ip and unicast eth
* [PATCH net-next 03/13] ipvlan: Forget all IP when device goes down
* [PATCH net-next 04/13] ipvlan: Support IPv6 in macnat mode.
* [PATCH net-next 05/13] ipvlan: Fix compilation warning about __be32 -> u32
* [PATCH net-next 06/13] ipvlan: Make the addrs_lock be per port
* [PATCH net-next 07/13] ipvlan: Take addr_lock in ipvlan_open()
* [PATCH net-next 08/13] ipvlan: Don't allow children to use IPs of main
* [PATCH net-next 09/13] ipvlan: const-specifier for functions that use iaddr
* [PATCH net-next 10/13] ipvlan: Common code from v6/v4 validator_event
* [PATCH net-next 11/13] ipvlan: common code to handle ipv6/ipv4 address events
* [PATCH net-next 12/13] ipvlan: Ignore PACKET_LOOPBACK in handle_mode_l2()
* [PATCH net-next 13/13] selftests: drv-net: selftest for ipvlan-macnat mode
and found the following issue:
WARNING: suspicious RCU usage in ipvlan_addr_event
Full report is available here:
https://ci.syzbot.org/series/e483b93a-1063-4c8a-b0e2-89530e79768b
***
WARNING: suspicious RCU usage in ipvlan_addr_event
tree: net-next
URL: https://kernel.googlesource.com/pub/scm/linux/kernel/git/netdev/net-next.git
base: c99ebb6132595b4b288a413981197eb076547c5a
arch: amd64
compiler: Debian clang version 20.1.8 (++20250708063551+0c9f909b7976-1~exp1~20250708183702.136), Debian LLD 20.1.8
config: https://ci.syzbot.org/builds/ac5af6f3-6b14-4e35-9d81-ee1522de3952/config
8021q: adding VLAN 0 to HW filter on device batadv0
=============================
WARNING: suspicious RCU usage
syzkaller #0 Not tainted
-----------------------------
drivers/net/ipvlan/ipvlan.h:128 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
2 locks held by syz-executor/5984:
#0: ffffffff8f2cc248 (rtnl_mutex){+.+.}-{4:4}, at: inet_rtm_newaddr+0x3b0/0x18b0
#1: ffffffff8f39d9b0 ((inetaddr_chain).rwsem){++++}-{4:4}, at: blocking_notifier_call_chain+0x54/0x90
stack backtrace:
CPU: 1 UID: 0 PID: 5984 Comm: syz-executor Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250
lockdep_rcu_suspicious+0x140/0x1d0
ipvlan_addr_event+0x60b/0x950
notifier_call_chain+0x1b6/0x3e0
blocking_notifier_call_chain+0x6a/0x90
__inet_insert_ifa+0xa13/0xbf0
inet_rtm_newaddr+0xf3a/0x18b0
rtnetlink_rcv_msg+0x7cf/0xb70
netlink_rcv_skb+0x208/0x470
netlink_unicast+0x82f/0x9e0
netlink_sendmsg+0x805/0xb30
__sock_sendmsg+0x21c/0x270
__sys_sendto+0x3bd/0x520
__x64_sys_sendto+0xde/0x100
do_syscall_64+0xfa/0xfa0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f711f191503
Code: 64 89 02 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 80 3d 61 70 22 00 00 41 89 ca 74 14 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 55 48 83 ec 30 44 89 4c 24
RSP: 002b:00007ffc44b05f28 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f711ff14620 RCX: 00007f711f191503
RDX: 0000000000000028 RSI: 00007f711ff14670 RDI: 0000000000000003
RBP: 0000000000000001 R08: 00007ffc44b05f44 R09: 000000000000000c
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000003
R13: 0000000000000000 R14: 00007f711ff14670 R15: 0000000000000000
</TASK>
syz-executor (5984) used greatest stack depth: 19864 bytes left
***
If these findings have caused you to resend the series or submit a
separate fix, please add the following tag to your commit message:
Tested-by: syzbot(a)syzkaller.appspotmail.com
---
This report is generated by a bot. It may contain errors.
syzbot ci engineers can be reached at syzkaller(a)googlegroups.com.
Main objective of this series is to convert the gro.sh and toeplitz.sh
tests to be "NIPA-compatible" - meaning make use of the Python env,
which lets us run the tests against either netdevsim or a real device.
The tests seem to have been written with a different flow in mind.
Namely they source different bash "setup" scripts depending on arguments
passed to the test. While I have nothing against the use of bash and
the overall architecture - the existing code needs quite a bit of work
(don't assume MAC/IP addresses, support remote endpoint over SSH).
If I'm the one fixing it, I'd rather convert them to our "simplistic"
Python.
This series rewrites the tests in Python while addressing their
shortcomings. The functionality of running the test over loopback
on a real device is retained but with a different method of invocation
(see the last patch).
Once again we are dealing with a script which run over a variety of
protocols (combination of [ipv4, ipv6, ipip] x [tcp, udp]). The first
4 patches add support for test variants to our scripts. We use the
term "variant" in the same sense as the C kselftest_harness.h -
variant is just a set of static input arguments.
Note that neither GRO nor the Toeplitz test fully passes for me on
any HW I have access to. But this is unrelated to the conversion.
This series is not making any real functional changes to the tests,
it is limited to improving the "test harness" scripts.
Jakub Kicinski (12):
selftests: net: py: coding style improvements
selftests: net: py: extract the case generation logic
selftests: net: py: add test variants
selftests: drv-net: xdp: use variants for qstat tests
selftests: net: relocate gro and toeplitz tests to drivers/net
selftests: net: py: support ksft ready without wait
selftests: net: py: read ip link info about remote dev
netdevsim: pass packets thru GRO on Rx
selftests: drv-net: add a Python version of the GRO test
selftests: drv-net: hw: convert the Toeplitz test to Python
netdevsim: add loopback support
selftests: net: remove old setup_* scripts
tools/testing/selftests/drivers/net/Makefile | 2 +
.../testing/selftests/drivers/net/hw/Makefile | 6 +-
tools/testing/selftests/net/Makefile | 7 -
tools/testing/selftests/net/lib/Makefile | 1 +
drivers/net/netdevsim/netdev.c | 26 ++-
.../testing/selftests/{ => drivers}/net/gro.c | 5 +-
.../{net => drivers/net/hw}/toeplitz.c | 7 +-
.../testing/selftests/drivers/net/.gitignore | 1 +
tools/testing/selftests/drivers/net/gro.py | 161 ++++++++++++++
.../selftests/drivers/net/hw/.gitignore | 3 +-
.../drivers/net/hw/lib/py/__init__.py | 4 +-
.../selftests/drivers/net/hw/toeplitz.py | 208 ++++++++++++++++++
.../selftests/drivers/net/lib/py/__init__.py | 4 +-
.../selftests/drivers/net/lib/py/env.py | 2 +
tools/testing/selftests/drivers/net/xdp.py | 42 ++--
tools/testing/selftests/net/.gitignore | 2 -
tools/testing/selftests/net/gro.sh | 105 ---------
.../selftests/net/lib/ksft_setup_loopback.sh | 111 ++++++++++
.../testing/selftests/net/lib/py/__init__.py | 5 +-
tools/testing/selftests/net/lib/py/ksft.py | 93 ++++++--
tools/testing/selftests/net/lib/py/nsim.py | 2 +-
tools/testing/selftests/net/lib/py/utils.py | 20 +-
tools/testing/selftests/net/setup_loopback.sh | 120 ----------
tools/testing/selftests/net/setup_veth.sh | 45 ----
tools/testing/selftests/net/toeplitz.sh | 199 -----------------
.../testing/selftests/net/toeplitz_client.sh | 28 ---
26 files changed, 631 insertions(+), 578 deletions(-)
rename tools/testing/selftests/{ => drivers}/net/gro.c (99%)
rename tools/testing/selftests/{net => drivers/net/hw}/toeplitz.c (99%)
create mode 100755 tools/testing/selftests/drivers/net/gro.py
create mode 100755 tools/testing/selftests/drivers/net/hw/toeplitz.py
delete mode 100755 tools/testing/selftests/net/gro.sh
create mode 100755 tools/testing/selftests/net/lib/ksft_setup_loopback.sh
delete mode 100644 tools/testing/selftests/net/setup_loopback.sh
delete mode 100644 tools/testing/selftests/net/setup_veth.sh
delete mode 100755 tools/testing/selftests/net/toeplitz.sh
delete mode 100755 tools/testing/selftests/net/toeplitz_client.sh
--
2.51.1
Here are a bunch of small improvements to the MPTCP selftests:
- Patch 1: move code to mptcp_lib.sh to prepare the new features.
- Patch 2: simplify mptcp_lib_pr_err_stats helper use.
- Patch 3: remove unused last column from nstat output.
- Patch 4: improve stats dump in mptcp_join.sh.
- Patch 5: get counters from nstat history and simplify mptcp_connect.sh.
- Patch 6: avoid taking the same packet trace twice.
- Patch 7: wait for an event instead of a fix time.
- Patch 8: instead of using 'timeout' and print the stats after, another
internal timeout is used: if it fires, it will print stats, then stop
everything. This avoids confusions around stats in case of timeout.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (8):
selftests: mptcp: lib: introduce 'nstat_{init,get}'
selftests: mptcp: lib: remove stats files args
selftests: mptcp: lib: stats: remove nstat rate columns
selftests: mptcp: join: dump stats from history
selftests: mptcp: lib: get counters from nstat history
selftests: mptcp: connect: avoid double packet traces
selftests: mptcp: wait for port instead of sleep
selftests: mptcp: get stats just before timing out
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 140 ++++++++++-----------
tools/testing/selftests/net/mptcp/mptcp_join.sh | 65 +++++-----
tools/testing/selftests/net/mptcp/mptcp_lib.sh | 58 +++++++--
tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 43 ++++---
tools/testing/selftests/net/mptcp/simult_flows.sh | 44 ++++---
tools/testing/selftests/net/mptcp/userspace_pm.sh | 3 +-
6 files changed, 203 insertions(+), 150 deletions(-)
---
base-commit: df58ee7d8faf353ebf5d4703c35fcf3e578e9b1b
change-id: 20251114-net-next-mptcp-sft-count-cache-stats-timeout-faa64482db92
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Examples (i.e. doctests) may want to use names such as `foo`, thus the
`clippy::disallowed_names` lint gets in the way.
Thus allow it for all doctests.
In addition, remove it from the existing `expect`s we have in a few
doctests.
This does not mean that we should stop trying to find good names for
our examples, though.
Suggested-by: Alice Ryhl <aliceryhl(a)google.com>
Link: https://lore.kernel.org/rust-for-linux/aRHSLChi5HYXW4-9@google.com/
Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
---
rust/kernel/init.rs | 3 +--
rust/kernel/types.rs | 1 -
scripts/rustdoc_test_gen.rs | 2 +-
3 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/rust/kernel/init.rs b/rust/kernel/init.rs
index e476d81c1a27..899b9a962762 100644
--- a/rust/kernel/init.rs
+++ b/rust/kernel/init.rs
@@ -30,7 +30,7 @@
//! ## General Examples
//!
//! ```rust
-//! # #![expect(clippy::disallowed_names, clippy::undocumented_unsafe_blocks)]
+//! # #![expect(clippy::undocumented_unsafe_blocks)]
//! use kernel::types::Opaque;
//! use pin_init::pin_init_from_closure;
//!
@@ -67,7 +67,6 @@
//! ```
//!
//! ```rust
-//! # #![expect(clippy::disallowed_names)]
//! use kernel::{prelude::*, types::Opaque};
//! use core::{ptr::addr_of_mut, marker::PhantomPinned, pin::Pin};
//! # mod bindings {
diff --git a/rust/kernel/types.rs b/rust/kernel/types.rs
index 835824788506..9c5e7dbf1632 100644
--- a/rust/kernel/types.rs
+++ b/rust/kernel/types.rs
@@ -289,7 +289,6 @@ fn drop(&mut self) {
/// # Examples
///
/// ```
-/// # #![expect(clippy::disallowed_names)]
/// use kernel::types::Opaque;
/// # // Emulate a C struct binding which is from C, maybe uninitialized or not, only the C side
/// # // knows.
diff --git a/scripts/rustdoc_test_gen.rs b/scripts/rustdoc_test_gen.rs
index 0e6a0542d1bd..be0561049660 100644
--- a/scripts/rustdoc_test_gen.rs
+++ b/scripts/rustdoc_test_gen.rs
@@ -208,7 +208,7 @@ macro_rules! assert_eq {{
#[allow(unused)]
static __DOCTEST_ANCHOR: i32 = ::core::line!() as i32 + {body_offset} + 1;
{{
- #![allow(unreachable_pub)]
+ #![allow(unreachable_pub, clippy::disallowed_names)]
{body}
main();
}}
--
2.51.2
Commit 4dfd4bba8578 ("selftests/mm/uffd: refactor non-composite global
vars into struct") moved some of the operations previously implemented
in uffd_setup_environment() earlier in the main test loop.
The calculation of nr_pages, which involves a division by page_size, now
occurs before checking that default_huge_page_size() returns a non-zero
This leads to a division-by-zero error on systems with !CONFIG_HUGETLB.
Fix this by relocating the non-zero page_size check before the nr_pages
calculation, as it was originally implemented.
Cc: stable(a)vger.kernel.org
Fixes: 4dfd4bba8578 ("selftests/mm/uffd: refactor non-composite global vars into struct")
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
---
tools/testing/selftests/mm/uffd-unit-tests.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/mm/uffd-unit-tests.c b/tools/testing/selftests/mm/uffd-unit-tests.c
index 9e3be2ee7f1b..f917b4c4c943 100644
--- a/tools/testing/selftests/mm/uffd-unit-tests.c
+++ b/tools/testing/selftests/mm/uffd-unit-tests.c
@@ -1758,10 +1758,15 @@ int main(int argc, char *argv[])
uffd_test_ops = mem_type->mem_ops;
uffd_test_case_ops = test->test_case_ops;
- if (mem_type->mem_flag & (MEM_HUGETLB_PRIVATE | MEM_HUGETLB))
+ if (mem_type->mem_flag & (MEM_HUGETLB_PRIVATE | MEM_HUGETLB)) {
gopts.page_size = default_huge_page_size();
- else
+ if (gopts.page_size == 0) {
+ uffd_test_skip("huge page size is 0, feature missing?");
+ continue;
+ }
+ } else {
gopts.page_size = psize();
+ }
/* Ensure we have at least 2 pages */
gopts.nr_pages = MAX(UFFD_TEST_MEM_SIZE, gopts.page_size * 2)
@@ -1776,12 +1781,6 @@ int main(int argc, char *argv[])
continue;
uffd_test_start("%s on %s", test->name, mem_type->name);
- if ((mem_type->mem_flag == MEM_HUGETLB ||
- mem_type->mem_flag == MEM_HUGETLB_PRIVATE) &&
- (default_huge_page_size() == 0)) {
- uffd_test_skip("huge page size is 0, feature missing?");
- continue;
- }
if (!uffd_feature_supported(test)) {
uffd_test_skip("feature missing");
continue;
--
2.51.2.1041.gc1ab5b90ca-goog
The series is separated from [1] to show the independency and compare
potential use cases easier. This use case replaces filp->f_op to
revocable-aware warppers. It relies on the revocable core part [2].
It tries to fix an UAF in the fops of cros_ec_chardev after the
underlying protocol device has gone by using revocable.
The warppers make sure file operations in drivers won't be called if the
resource has been revoked.
The 1st patch introduces revocable fops replacement.
The 2nd patch supports the fops replacement in miscdevice.
The 3rd patch uses the support from miscdevice to fix the UAF.
[1] https://lore.kernel.org/chrome-platform/20251016054204.1523139-1-tzungbi@ke…
[2] https://lore.kernel.org/chrome-platform/20251106152330.11733-1-tzungbi@kern…
v6:
- New, separated from an existing series.
Tzung-Bi Shih (3):
revocable: Add fops replacement
char: misc: Leverage revocable fops replacement
platform/chrome: cros_ec_chardev: Secure cros_ec_device via revocable
drivers/char/misc.c | 18 ++-
drivers/platform/chrome/cros_ec_chardev.c | 1 +
fs/Makefile | 2 +-
fs/fs_revocable.c | 156 ++++++++++++++++++++++
include/linux/fs_revocable.h | 14 ++
include/linux/miscdevice.h | 2 +
6 files changed, 190 insertions(+), 3 deletions(-)
create mode 100644 fs/fs_revocable.c
create mode 100644 include/linux/fs_revocable.h
--
2.48.1
The exception vector constants CP_VECTOR, HV_VECTOR, VC_VECTOR, and
SX_VECTOR are used in ex_str(), but the header that defines
them is not included. Other exception vectors are picked up through
indirect includes, but these four are not, which leads to unresolved
identifiers during selftest builds.
lib/x86/processor.c: In function ‘ex_str’:
lib/x86/processor.c:52:17: error: ‘CP_VECTOR’ undeclared
lib/x86/processor.c:53:17: error: ‘HV_VECTOR’ undeclared
lib/x86/processor.c:54:17: error: ‘VC_VECTOR’ undeclared
lib/x86/processor.c:55:17: error: ‘SX_VECTOR’ undeclared
These vector definitions live in:
tools/arch/x86/include/uapi/asm/kvm.h
Add the missing include the userspace API exception vector constants.
Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux(a)gmail.com>
---
tools/testing/selftests/kvm/lib/x86/processor.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/kvm/lib/x86/processor.c b/tools/testing/selftests/kvm/lib/x86/processor.c
index b418502c5ecc..fb589f07f2a4 100644
--- a/tools/testing/selftests/kvm/lib/x86/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86/processor.c
@@ -4,6 +4,7 @@
*/
#include "linux/bitmap.h"
+#include "uapi/asm/kvm.h"
#include "test_util.h"
#include "kvm_util.h"
#include "pmu.h"
--
2.51.1
This patchset introduces target resume capability to netconsole allowing
it to recover targets when underlying low-level interface comes back
online.
The patchset starts by refactoring netconsole state representation in
order to allow representing deactivated targets (targets that are
disabled due to interfaces going down).
It then modifies netconsole to handle NETDEV_UP events for such targets
and setups netpoll. Targets are matched with incoming interfaces
depending on how they were initially bound in netconsole (by mac or
interface name).
The patchset includes a selftest that validates netconsole target state
transitions and that target is functional after resumed.
Signed-off-by: Andre Carvalho <asantostc(a)gmail.com>
---
Changes in v4:
- Simplify selftest cleanup, removing trap setup in loop.
- Drop netpoll helper (__setup_netpoll_hold) and manage reference inside
netconsole.
- Move resume_list processing logic to separate function.
- Link to v3: https://lore.kernel.org/r/20251109-netcons-retrigger-v3-0-1654c280bbe6@gmai…
Changes in v3:
- Resume by mac or interface name depending on how target was created.
- Attempt to resume target without holding target list lock, by moving
the target to a temporary list. This is required as netpoll may
attempt to allocate memory.
- Link to v2: https://lore.kernel.org/r/20250921-netcons-retrigger-v2-0-a0e84006237f@gmai…
Changes in v2:
- Attempt to resume target in the same thread, instead of using
workqueue .
- Add wrapper around __netpoll_setup (patch 4).
- Renamed resume_target to maybe_resume_target and moved conditionals to
inside its implementation, keeping code more clear.
- Verify that device addr matches target mac address when target was
setup using mac.
- Update selftest to cover targets bound by mac and interface name.
- Fix typo in selftest comment and sort tests alphabetically in
Makefile.
- Link to v1:
https://lore.kernel.org/r/20250909-netcons-retrigger-v1-0-3aea904926cf@gmai…
---
Andre Carvalho (3):
netconsole: convert 'enabled' flag to enum for clearer state management
netconsole: resume previously deactivated target
selftests: netconsole: validate target resume
Breno Leitao (2):
netconsole: add target_state enum
netconsole: add STATE_DEACTIVATED to track targets disabled by low level
drivers/net/netconsole.c | 145 ++++++++++++++++-----
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 35 ++++-
.../selftests/drivers/net/netcons_resume.sh | 97 ++++++++++++++
4 files changed, 244 insertions(+), 34 deletions(-)
---
base-commit: c9dfb92de0738eb7fe6a591ad1642333793e8b6e
change-id: 20250816-netcons-retrigger-a4f547bfc867
Best regards,
--
Andre Carvalho <asantostc(a)gmail.com>
This series adds namespace support to vhost-vsock and loopback. It does
not add namespaces to any of the other guest transports (virtio-vsock,
hyperv, or vmci).
The current revision supports two modes: local and global. Local
mode is complete isolation of namespaces, while global mode is complete
sharing between namespaces of CIDs (the original behavior).
The mode is set using /proc/sys/net/vsock/ns_mode.
Modes are per-netns and write-once. This allows a system to configure
namespaces independently (some may share CIDs, others are completely
isolated). This also supports future possible mixed use cases, where
there may be namespaces in global mode spinning up VMs while there are
mixed mode namespaces that provide services to the VMs, but are not
allowed to allocate from the global CID pool (this mode is not
implemented in this series).
If a socket or VM is created when a namespace is global but the
namespace changes to local, the socket or VM will continue working
normally. That is, the socket or VM assumes the mode behavior of the
namespace at the time the socket/VM was created. The original mode is
captured in vsock_create() and so occurs at the time of socket(2) and
accept(2) for sockets and open(2) on /dev/vhost-vsock for VMs. This
prevents a socket/VM connection from suddenly breaking due to a
namespace mode change. Any new sockets/VMs created after the mode change
will adopt the new mode's behavior.
Additionally, added tests for the new namespace features:
tools/testing/selftests/vsock/vmtest.sh
1..29
ok 1 vm_server_host_client
ok 2 vm_client_host_server
ok 3 vm_loopback
ok 4 ns_guest_local_mode_rejected
ok 5 ns_host_vsock_ns_mode_ok
ok 6 ns_host_vsock_ns_mode_write_once_ok
ok 7 ns_global_same_cid_fails
ok 8 ns_local_same_cid_ok
ok 9 ns_global_local_same_cid_ok
ok 10 ns_local_global_same_cid_ok
ok 11 ns_diff_global_host_connect_to_global_vm_ok
ok 12 ns_diff_global_host_connect_to_local_vm_fails
ok 13 ns_diff_global_vm_connect_to_global_host_ok
ok 14 ns_diff_global_vm_connect_to_local_host_fails
ok 15 ns_diff_local_host_connect_to_local_vm_fails
ok 16 ns_diff_local_vm_connect_to_local_host_fails
ok 17 ns_diff_global_to_local_loopback_local_fails
ok 18 ns_diff_local_to_global_loopback_fails
ok 19 ns_diff_local_to_local_loopback_fails
ok 20 ns_diff_global_to_global_loopback_ok
ok 21 ns_same_local_loopback_ok
ok 22 ns_same_local_host_connect_to_local_vm_ok
ok 23 ns_same_local_vm_connect_to_local_host_ok
ok 24 ns_mode_change_connection_continue_vm_ok
ok 25 ns_mode_change_connection_continue_host_ok
ok 26 ns_mode_change_connection_continue_both_ok
ok 27 ns_delete_vm_ok
ok 28 ns_delete_host_ok
ok 29 ns_delete_both_ok
SUMMARY: PASS=29 SKIP=0 FAIL=0
Dependent on series:
https://lore.kernel.org/all/20251108-vsock-selftests-fixes-and-improvements…
Thanks again for everyone's help and reviews!
Suggested-by: Sargun Dhillon <sargun(a)sargun.me>
Signed-off-by: Bobby Eshleman <bobbyeshleman(a)gmail.com>
To: Stefano Garzarella <sgarzare(a)redhat.com>
To: Shuah Khan <shuah(a)kernel.org>
To: David S. Miller <davem(a)davemloft.net>
To: Eric Dumazet <edumazet(a)google.com>
To: Jakub Kicinski <kuba(a)kernel.org>
To: Paolo Abeni <pabeni(a)redhat.com>
To: Simon Horman <horms(a)kernel.org>
To: Stefan Hajnoczi <stefanha(a)redhat.com>
To: Michael S. Tsirkin <mst(a)redhat.com>
To: Jason Wang <jasowang(a)redhat.com>
To: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
To: Eugenio Pérez <eperezma(a)redhat.com>
To: K. Y. Srinivasan <kys(a)microsoft.com>
To: Haiyang Zhang <haiyangz(a)microsoft.com>
To: Wei Liu <wei.liu(a)kernel.org>
To: Dexuan Cui <decui(a)microsoft.com>
To: Bryan Tan <bryan-bt.tan(a)broadcom.com>
To: Vishnu Dasa <vishnu.dasa(a)broadcom.com>
To: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: virtualization(a)lists.linux.dev
Cc: netdev(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: kvm(a)vger.kernel.org
Cc: linux-hyperv(a)vger.kernel.org
Cc: berrange(a)redhat.com
Cc: Sargun Dhillon <sargun(a)sargun.me>
Changes in v9:
- reorder loopback patch after patch for virtio transport common code
- remove module ordering tests patch because loopback no longer depends
on pernet ops
- major simplifications in vsock_loopback
- added a new patch for blocking local mode for guests, added test case
to check
- add net ref tracking to vsock_loopback patch
- Link to v8: https://lore.kernel.org/r/20251023-vsock-vmtest-v8-0-dea984d02bb0@meta.com
Changes in v8:
- Break generic cleanup/refactoring patches into standalone series,
remove those from this series
- Link to dependency: https://lore.kernel.org/all/20251022-vsock-selftests-fixes-and-improvements…
- Link to v7: https://lore.kernel.org/r/20251021-vsock-vmtest-v7-0-0661b7b6f081@meta.com
Changes in v7:
- fix hv_sock build
- break out vmtest patches into distinct, more well-scoped patches
- change `orig_net_mode` to `net_mode`
- many fixes and style changes in per-patch change sets (see individual
patches for specific changes)
- optimize `virtio_vsock_skb_cb` layout
- update commit messages with more useful descriptions
- vsock_loopback: use orig_net_mode instead of current net mode
- add tests for edge cases (ns deletion, mode changing, loopback module
load ordering)
- Link to v6: https://lore.kernel.org/r/20250916-vsock-vmtest-v6-0-064d2eb0c89d@meta.com
Changes in v6:
- define behavior when mode changes to local while socket/VM is alive
- af_vsock: clarify description of CID behavior
- af_vsock: use stronger langauge around CID rules (dont use "may")
- af_vsock: improve naming of buf/buffer
- af_vsock: improve string length checking on proc writes
- vsock_loopback: add space in struct to clarify lock protection
- vsock_loopback: do proper cleanup/unregister on vsock_loopback_exit()
- vsock_loopback: use virtio_vsock_skb_net() instead of sock_net()
- vsock_loopback: set loopback to NULL after kfree()
- vsock_loopback: use pernet_operations and remove callback mechanism
- vsock_loopback: add macros for "global" and "local"
- vsock_loopback: fix length checking
- vmtest.sh: check for namespace support in vmtest.sh
- Link to v5: https://lore.kernel.org/r/20250827-vsock-vmtest-v5-0-0ba580bede5b@meta.com
Changes in v5:
- /proc/net/vsock_ns_mode -> /proc/sys/net/vsock/ns_mode
- vsock_global_net -> vsock_global_dummy_net
- fix netns lookup in vhost_vsock to respect pid namespaces
- add callbacks for vsock_loopback to avoid circular dependency
- vmtest.sh loads vsock_loopback module
- remove vsock_net_mode_can_set()
- change vsock_net_write_mode() to return true/false based on success
- make vsock_net_mode enum instead of u8
- Link to v4: https://lore.kernel.org/r/20250805-vsock-vmtest-v4-0-059ec51ab111@meta.com
Changes in v4:
- removed RFC tag
- implemented loopback support
- renamed new tests to better reflect behavior
- completed suite of tests with permutations of ns modes and vsock_test
as guest/host
- simplified socat bridging with unix socket instead of tcp + veth
- only use vsock_test for success case, socat for failure case (context
in commit message)
- lots of cleanup
Changes in v3:
- add notion of "modes"
- add procfs /proc/net/vsock_ns_mode
- local and global modes only
- no /dev/vhost-vsock-netns
- vmtest.sh already merged, so new patch just adds new tests for NS
- Link to v2:
https://lore.kernel.org/kvm/20250312-vsock-netns-v2-0-84bffa1aa97a@gmail.com
Changes in v2:
- only support vhost-vsock namespaces
- all g2h namespaces retain old behavior, only common API changes
impacted by vhost-vsock changes
- add /dev/vhost-vsock-netns for "opt-in"
- leave /dev/vhost-vsock to old behavior
- removed netns module param
- Link to v1:
https://lore.kernel.org/r/20200116172428.311437-1-sgarzare@redhat.com
Changes in v1:
- added 'netns' module param to vsock.ko to enable the
network namespace support (disabled by default)
- added 'vsock_net_eq()' to check the "net" assigned to a socket
only when 'netns' support is enabled
- Link to RFC: https://patchwork.ozlabs.org/cover/1202235/
---
Bobby Eshleman (14):
vsock: a per-net vsock NS mode state
vsock: add netns to vsock core
vsock/virtio: add netns support to virtio transport and virtio common
vsock/virtio: pack struct virtio_vsock_skb_cb
vsock: add netns and netns_tracker to vsock skb cb
vsock/loopback: add netns support
vhost/vsock: add netns support
vsock: reject bad VSOCK_NET_MODE_LOCAL configuration for G2H
selftests/vsock: add namespace helpers to vmtest.sh
selftests/vsock: prepare vm management helpers for namespaces
selftests/vsock: add tests for proc sys vsock ns_mode
selftests/vsock: add namespace tests for CID collisions
selftests/vsock: add tests for host <-> vm connectivity with namespaces
selftests/vsock: add tests for namespace deletion and mode changes
MAINTAINERS | 1 +
drivers/vhost/vsock.c | 48 +-
include/linux/virtio_vsock.h | 43 +-
include/net/af_vsock.h | 57 +-
include/net/net_namespace.h | 4 +
include/net/netns/vsock.h | 17 +
net/vmw_vsock/af_vsock.c | 290 +++++++++-
net/vmw_vsock/hyperv_transport.c | 1 +
net/vmw_vsock/virtio_transport.c | 14 +-
net/vmw_vsock/virtio_transport_common.c | 57 +-
net/vmw_vsock/vsock_loopback.c | 48 +-
tools/testing/selftests/vsock/vmtest.sh | 931 ++++++++++++++++++++++++++++++--
12 files changed, 1418 insertions(+), 93 deletions(-)
---
base-commit: 962ac5ca99a5c3e7469215bf47572440402dfd59
change-id: 20250325-vsock-vmtest-b3a21d2102c2
prerequisite-message-id: <20251022-vsock-selftests-fixes-and-improvements-v1-0-edeb179d6463(a)meta.com>
prerequisite-patch-id: a2eecc3851f2509ed40009a7cab6990c6d7cfff5
prerequisite-patch-id: 501db2100636b9c8fcb3b64b8b1df797ccbede85
prerequisite-patch-id: ba1a2f07398a035bc48ef72edda41888614be449
prerequisite-patch-id: fd5cc5445aca9355ce678e6d2bfa89fab8a57e61
prerequisite-patch-id: 795ab4432ffb0843e22b580374782e7e0d99b909
prerequisite-patch-id: 1499d263dc933e75366c09e045d2125ca39f7ddd
prerequisite-patch-id: f92d99bb1d35d99b063f818a19dcda999152d74c
prerequisite-patch-id: e3296f38cdba6d903e061cff2bbb3e7615e8e671
prerequisite-patch-id: bc4662b4710d302d4893f58708820fc2a0624325
prerequisite-patch-id: f8991f2e98c2661a706183fde6b35e2b8d9aedcf
prerequisite-patch-id: 44bf9ed69353586d284e5ee63d6fffa30439a698
prerequisite-patch-id: d50621bc630eeaf608bbaf260370c8dabf6326df
Best regards,
--
Bobby Eshleman <bobbyeshleman(a)meta.com>
In cgroup v2, a mutual overlap check is required when at least one of
two
cpusets is exclusive. However, this check should be relaxed and limited
to
cases where both cpusets are exclusive.
This patch ensures that for sibling cpusets A1 (exclusive) and B1
(non-exclusive), change B1 cannot affect A1's exclusivity.
for example. Assume a machine has 4 CPUs (0-3).
root cgroup
/ \
A1 B1
Case 1:
Table 1.1: Before applying the patch
Step | A1's prstate | B1'sprstate |
#1> echo "0-1" > A1/cpuset.cpus | member | member |
#2> echo "root" > A1/cpuset.cpus.partition | root | member |
#3> echo "0" > B1/cpuset.cpus | root invalid | member |
After step #3, A1 changes from "root" to "root invalid" because its CPUs
(0-1) overlap with those requested by B1 (0-3). However, B1 can actually
use CPUs 2-3(from B1's parent), so it would be more reasonable for A1 to
remain as "root."
Table 1.2: After applying the patch
Step | A1's prstate | B1'sprstate |
#1> echo "0-1" > A1/cpuset.cpus | member | member |
#2> echo "root" > A1/cpuset.cpus.partition | root | member |
#3> echo "0" > B1/cpuset.cpus | root | member |
Case 2: (This situation remains unchanged from before)
Table 2.1: Before applying the patch
Step | A1's prstate | B1'sprstate |
#1> echo "0-1" > A1/cpuset.cpus | member | member |
#3> echo "1-2" > B1/cpuset.cpus | member | member |
#2> echo "root" > A1/cpuset.cpus.partition | root invalid | member |
Table 2.2: After applying the patch
Step | A1's prstate | B1'sprstate |
#1> echo "0-1" > A1/cpuset.cpus | member | member |
#3> echo "1-2" > B1/cpuset.cpus | member | member |
#2> echo "root" > A1/cpuset.cpus.partition | root invalid | member |
All other cases remain unaffected. For example, cgroup-v1, both A1 and
B1 are exclusive or non-exlusive.
---
v2 -> v3:
- Ensure compliance with constraints such as cpuset.cpus.exclusive.
- Link:
https://lore.kernel.org/cgroups/20251113131434.606961-1-sunshaojie@kylinos.…
v1 -> v2:
- Keeps the current cgroup v1 behavior unchanged
- Link:
https://lore.kernel.org/cgroups/c8e234f4-2c27-4753-8f39-8ae83197efd3@redhat…
kernel/cgroup/cpuset-internal.h | 3 ++
kernel/cgroup/cpuset-v1.c | 20 +++++++++
kernel/cgroup/cpuset.c | 44 ++++++++++++++-----
.../selftests/cgroup/test_cpuset_prs.sh | 10 ++---
4 files changed, 60 insertions(+), 17 deletions(-)
--
2.25.1
On Fri, Nov 14, 2025 at 11:55:48AM +0800, Guopeng Zhang <zhangguopeng(a)kylinos.cn> wrote:
> Actually, selftests are no longer just something for developers to view locally; they are now extensively
> run in CI and stable branch regression testing. Using a standardized layout means that general test runners
> and CI systems can parse the cgroup test results without any special handling.
Nice. I appreciate you took this up.
> This patch is not part of a formal, tree-wide conversion series I am running; it is an incremental step to align the
> cgroup C tests with the existing TAP usage. I started here because these tests already use ksft_test_result_*() and
> only require minor changes to generate proper TAP output.
The tests are in various state of usage, correctness and usefulness,
hence...
>
> > I'm asking to better asses whether also the scripts listed in
> > Makefile:TEST_PROGS should be converted too.
>
> I agree that having them produce TAP output would benefit tooling and CI. I did not want to mix
> that into this change, but if you and other maintainers think this direction is reasonable,
> I would be happy to follow up and convert the cgroup shell tests to TAP as well.
...I'd suggest next focus on test_cpuset_prs.sh (as discussed, it may
need more changes to adapt its output too).
Michal
Remove the "trigger_count" in trigger_bench.c and reuse trigger_driver()
instead for trigger_kernel_count_setup().
With the calling to bpf_get_numa_node_id(), the result for "kernel_count"
will become a little more accurate.
It will also easier if we want to test the performance of livepatch, just
hook the bpf_get_numa_node_id() and run the "kernel_count" bench trigger.
Signed-off-by: Menglong Dong <dongml2(a)chinatelecom.cn>
---
.../selftests/bpf/benchs/bench_trigger.c | 5 +----
.../testing/selftests/bpf/progs/trigger_bench.c | 17 +++++------------
2 files changed, 6 insertions(+), 16 deletions(-)
diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c
index 1e2aff007c2a..34fd8fa3b803 100644
--- a/tools/testing/selftests/bpf/benchs/bench_trigger.c
+++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c
@@ -179,11 +179,8 @@ static void trigger_syscall_count_setup(void)
static void trigger_kernel_count_setup(void)
{
setup_ctx();
- bpf_program__set_autoload(ctx.skel->progs.trigger_driver, false);
- bpf_program__set_autoload(ctx.skel->progs.trigger_count, true);
+ ctx.skel->rodata->kernel_count = 1;
load_ctx();
- /* override driver program */
- ctx.driver_prog_fd = bpf_program__fd(ctx.skel->progs.trigger_count);
}
static void trigger_kprobe_setup(void)
diff --git a/tools/testing/selftests/bpf/progs/trigger_bench.c b/tools/testing/selftests/bpf/progs/trigger_bench.c
index 3d5f30c29ae3..6564d1909c7b 100644
--- a/tools/testing/selftests/bpf/progs/trigger_bench.c
+++ b/tools/testing/selftests/bpf/progs/trigger_bench.c
@@ -39,26 +39,19 @@ int bench_trigger_uprobe_multi(void *ctx)
return 0;
}
+const volatile int kernel_count = 0;
const volatile int batch_iters = 0;
-SEC("?raw_tp")
-int trigger_count(void *ctx)
-{
- int i;
-
- for (i = 0; i < batch_iters; i++)
- inc_counter();
-
- return 0;
-}
-
SEC("?raw_tp")
int trigger_driver(void *ctx)
{
int i;
- for (i = 0; i < batch_iters; i++)
+ for (i = 0; i < batch_iters; i++) {
(void)bpf_get_numa_node_id(); /* attach point for benchmarking */
+ if (kernel_count)
+ inc_counter();
+ }
return 0;
}
--
2.51.2
The XDP qstats tests send 2k packets over a single socket.
Looks like when netdev CI is busy running those tests in QEMU
occasionally flakes. The target doesn't get to run at all
before all 2000 packets are sent.
Lower the number of packets to 1000 and reopen the socket
every 50 packets, to give RSS a chance to spread the packets
to multiple queues.
For the netdev CI testing either lowering the count or using
multiple sockets is enough, but let's do both for extra resiliency.
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
CC: shuah(a)kernel.org
CC: ast(a)kernel.org
CC: hawk(a)kernel.org
CC: john.fastabend(a)gmail.com
CC: sdf(a)fomichev.me
CC: linux-kselftest(a)vger.kernel.org
---
tools/testing/selftests/drivers/net/xdp.py | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/xdp.py b/tools/testing/selftests/drivers/net/xdp.py
index a148004e1c36..834a37ae7d0d 100755
--- a/tools/testing/selftests/drivers/net/xdp.py
+++ b/tools/testing/selftests/drivers/net/xdp.py
@@ -687,9 +687,12 @@ from lib.py import ip, bpftool, defer
"/dev/null"
# Listener runs on "remote" in case of XDP_TX
rx_host = cfg.remote if act == XDPAction.TX else None
- # We want to spew 2000 packets quickly, bash seems to do a good enough job
- tx_udp = f"exec 5<>/dev/udp/{cfg.addr}/{port}; " \
- "for i in `seq 2000`; do echo a >&5; done; exec 5>&-"
+ # We want to spew 1000 packets quickly, bash seems to do a good enough job
+ # Each reopening of the socket gives us a differenot local port (for RSS)
+ tx_udp = "for _ in `seq 20`; do " \
+ f"exec 5<>/dev/udp/{cfg.addr}/{port}; " \
+ "for i in `seq 50`; do echo a >&5; done; " \
+ "exec 5>&-; done"
cfg.wait_hw_stats_settle()
# Qstats have more clearly defined semantics than rtnetlink.
@@ -704,11 +707,11 @@ from lib.py import ip, bpftool, defer
cfg.wait_hw_stats_settle()
after = cfg.netnl.qstats_get({"ifindex": cfg.ifindex}, dump=True)[0]
- ksft_ge(after['rx-packets'] - before['rx-packets'], 2000)
+ expected_pkts = 1000
+ ksft_ge(after['rx-packets'] - before['rx-packets'], expected_pkts)
if act == XDPAction.TX:
- ksft_ge(after['tx-packets'] - before['tx-packets'], 2000)
+ ksft_ge(after['tx-packets'] - before['tx-packets'], expected_pkts)
- expected_pkts = 2000
stats = _get_stats(prog_info["maps"]["map_xdp_stats"])
ksft_eq(stats[XDPStats.RX.value], expected_pkts, "XDP RX stats mismatch")
if act == XDPAction.TX:
--
2.51.1
Pahole fails to encode BTF for some Go projects (e.g. Kubernetes and
Podman) due to recursive type definitions that create reference loops
not representable in C. These recursive typedefs trigger a failure in
the BTF deduplication algorithm.
This patch extends btf_dedup_struct_types() to properly handle potential
recursion for BTF_KIND_TYPEDEF, similar to how recursion is already
handled for BTF_KIND_STRUCT. This allows pahole to successfully
generate BTF for Go binaries using recursive types without impacting
existing C-based workflows.
Changes in v4: fix typo found by Claude-based CI
Changes in v3:
1. Patch 1: Adjusted the comment of btf_dedup_ref_type() to refer to
typedef as well.
2. Patch 2: Update of the "dedup: recursive typedef" test to include a
duplicated version of the types to make sure deduplication still happens
in this case.
Changes in v2:
1. Patch 1: Refactored code to prevent copying existing logic. Instead of
adding a new function we modify the existing btf_dedup_struct_type()
function to handle the BTF_KIND_TYPEDEF case. Calls to btf_hash_struct()
and btf_shallow_equal_struct() are replaced with calls to functions that
select btf_hash_struct() / btf_hash_typedef() based on the type.
2. Patch 2: Added tests
v3: https://lore.kernel.org/lkml/cover.1763024337.git.paul.houssel@orange.com/
v2: https://lore.kernel.org/lkml/cover.1762956564.git.paul.houssel@orange.com/
v1: https://lore.kernel.org/lkml/20251107153408.159342-1-paulhoussel2@gmail.com/
Paul Houssel (2):
libbpf: fix BTF dedup to support recursive typedef definitions
selftests/bpf: add BTF dedup tests for recursive typedef definitions
tools/lib/bpf/btf.c | 71 +++++++++++++++-----
tools/testing/selftests/bpf/prog_tests/btf.c | 65 ++++++++++++++++++
2 files changed, 120 insertions(+), 16 deletions(-)
--
2.51.0
Hello,
IIUC this is the first independent patch series for guest_memfd's in-place
conversion series! Happy to finally bring this out on its own.
Previous versions of this feature, part of other series, are available at
[1][2][3].
Many prior discussions have led up to these main features of this series, and
these are the main points I'd like feedback on.
1. Having private/shared status stored in a maple tree (Thanks Michael for your
support of using maple trees over xarrays for performance! [4]).
2. Having a new guest_memfd ioctl (not a vm ioctl) that performs conversions.
3. Using ioctls/structs/input attribute similar to the existing vm ioctl
KVM_SET_MEMORY_ATTRIBUTES to perform conversions.
4. Storing requested attributes directly in the maple tree.
5. Using a KVM module-wide param to toggle between setting memory attributes via
vm and guest_memfd ioctls (making them mututally exclusive - a single loaded
KVM module can only do one of the two.)
6. Skipping LRU in guest_memfd folios - make guest_memfd folios not participate
in LRU to avoid LRU refcounts from interfering with conversions.
This series is based on kvm/next, followed by
+ v12 of NUMA mempolicy support patches [5]
+ 3 cleanup patches from Sean [6][7][8]
Everything is stitched together here for your convenience
https://github.com/googleprodkernel/linux-cc/commits/guest_memfd-inplace-co…
Thank you all for helping with this series!
If I missed out your comment from a previous series, it's not intentional!
Please do raise it again.
TODOs:
+ There might be an issue with memory failure handling because when guest_memfd
folios stop participating in LRU. From a preliminary analysis,
HWPoisonHandlable() is only true if PageLRU() is true. This needs further
investigation.
[1] https://lore.kernel.org/all/bd163de3118b626d1005aa88e71ef2fb72f0be0f.172600…
[2] https://lore.kernel.org/all/20250117163001.2326672-6-tabba@google.com/
[3] https://lore.kernel.org/all/b784326e9ccae6a08388f1bf39db70a2204bdc51.174726…
[4] https://lore.kernel.org/all/20250529054227.hh2f4jmyqf6igd3i@amd.com/
[5] https://lore.kernel.org/all/20251007221420.344669-1-seanjc@google.com/T/
[6] https://lore.kernel.org/all/20250924174255.2141847-1-seanjc@google.com/
[7] https://lore.kernel.org/all/20251007224515.374516-1-seanjc@google.com/
[8] https://lore.kernel.org/all/20251007223625.369939-1-seanjc@google.com/
Ackerley Tng (19):
KVM: guest_memfd: Update kvm_gmem_populate() to use gmem attributes
KVM: Introduce KVM_SET_MEMORY_ATTRIBUTES2
KVM: guest_memfd: Don't set FGP_ACCESSED when getting folios
KVM: guest_memfd: Skip LRU for guest_memfd folios
KVM: guest_memfd: Add support for KVM_SET_MEMORY_ATTRIBUTES
KVM: selftests: Update framework to use KVM_SET_MEMORY_ATTRIBUTES2
KVM: selftests: guest_memfd: Test basic single-page conversion flow
KVM: selftests: guest_memfd: Test conversion flow when INIT_SHARED
KVM: selftests: guest_memfd: Test indexing in guest_memfd
KVM: selftests: guest_memfd: Test conversion before allocation
KVM: selftests: guest_memfd: Convert with allocated folios in
different layouts
KVM: selftests: guest_memfd: Test precision of conversion
KVM: selftests: guest_memfd: Test that truncation does not change
shared/private status
KVM: selftests: guest_memfd: Test conversion with elevated page
refcount
KVM: selftests: Reset shared memory after hole-punching
KVM: selftests: Provide function to look up guest_memfd details from
gpa
KVM: selftests: Make TEST_EXPECT_SIGBUS thread-safe
KVM: selftests: Update private_mem_conversions_test to mmap()
guest_memfd
KVM: selftests: Add script to exercise private_mem_conversions_test
Sean Christopherson (18):
KVM: guest_memfd: Introduce per-gmem attributes, use to guard user
mappings
KVM: Rename KVM_GENERIC_MEMORY_ATTRIBUTES to KVM_VM_MEMORY_ATTRIBUTES
KVM: Enumerate support for PRIVATE memory iff kvm_arch_has_private_mem
is defined
KVM: Stub in ability to disable per-VM memory attribute tracking
KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem
attributes
KVM: guest_memfd: Enable INIT_SHARED on guest_memfd for x86 Coco VMs
KVM: Move KVM_VM_MEMORY_ATTRIBUTES config definition to x86
KVM: Let userspace disable per-VM mem attributes, enable per-gmem
attributes
KVM: selftests: Create gmem fd before "regular" fd when adding memslot
KVM: selftests: Rename guest_memfd{,_offset} to gmem_{fd,offset}
KVM: selftests: Add support for mmap() on guest_memfd in core library
KVM: selftests: Add helpers for calling ioctls on guest_memfd
KVM: selftests: guest_memfd: Test that shared/private status is
consistent across processes
KVM: selftests: Add selftests global for guest memory attributes
capability
KVM: selftests: Provide common function to set memory attributes
KVM: selftests: Check fd/flags provided to mmap() when setting up
memslot
KVM: selftests: Update pre-fault test to work with per-guest_memfd
attributes
KVM: selftests: Update private memory exits test work with per-gmem
attributes
Documentation/virt/kvm/api.rst | 72 ++-
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/Kconfig | 15 +-
arch/x86/kvm/mmu/mmu.c | 4 +-
arch/x86/kvm/x86.c | 13 +-
include/linux/kvm_host.h | 44 +-
include/trace/events/kvm.h | 4 +-
include/uapi/linux/kvm.h | 17 +
mm/filemap.c | 1 +
mm/memcontrol.c | 2 +
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../kvm/guest_memfd_conversions_test.c | 498 ++++++++++++++++++
.../testing/selftests/kvm/include/kvm_util.h | 127 ++++-
.../testing/selftests/kvm/include/test_util.h | 29 +-
tools/testing/selftests/kvm/lib/kvm_util.c | 128 +++--
tools/testing/selftests/kvm/lib/test_util.c | 7 -
.../selftests/kvm/pre_fault_memory_test.c | 2 +-
.../kvm/x86/private_mem_conversions_test.c | 55 +-
.../kvm/x86/private_mem_conversions_test.py | 159 ++++++
.../kvm/x86/private_mem_kvm_exits_test.c | 36 +-
virt/kvm/Kconfig | 4 +-
virt/kvm/guest_memfd.c | 414 +++++++++++++--
virt/kvm/kvm_main.c | 104 +++-
24 files changed, 1554 insertions(+), 185 deletions(-)
create mode 100644 tools/testing/selftests/kvm/guest_memfd_conversions_test.c
create mode 100755 tools/testing/selftests/kvm/x86/private_mem_conversions_test.py
--
2.51.0.858.gf9c4a03a3a-goog