iommufd gives userspace the capability to manipulate the iommu subsystem,
e.g. DMA map/unmap. In the near future, it will support iommu nested
translation. Different platform vendors have different implementations of
nested translation. For example, Intel VT-d supports using the guest I/O
page table as the stage-1 translation table. This requires the guest I/O
page table to be compatible with the hardware IOMMU. So before setting up
nested translation, userspace needs to know the hardware iommu information
to understand the nested translation requirements.
This series reports the iommu hardware information for a given device
which has been bound to iommufd. It is preparation work for userspace to
allocate a hwpt for a given device, as in the nested translation support[1].
This series introduces an iommu op to report the iommu hardware info,
and adds an ioctl, IOMMU_GET_HW_INFO, to report such hardware info to the
user. enum iommu_hw_info_type is defined to differentiate the iommu hardware
info reported to the user so that the user can decode it. This series only
adds the framework for iommu hw info reporting; the complete reporting path
needs vendor-specific definitions and driver support. The full code is
available in [1] as well.
[1] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
Change log:
v4:
- Rename ioctl to IOMMU_GET_HW_INFO and structure to iommu_hw_info
- Move the iommufd_get_hw_info handler to main.c
- Place iommu_hw_info prior to iommu_hwpt_alloc
- Update the function namings accordingly
- Update uapi kdocs
v3: https://lore.kernel.org/linux-iommu/20230511143024.19542-1-yi.l.liu@intel.c…
- Add r-b from Baolu
- Rename IOMMU_HW_INFO_TYPE_DEFAULT to be IOMMU_HW_INFO_TYPE_NONE to
better suit what it means
- Let IOMMU_DEVICE_GET_HW_INFO succeed even if the underlying iommu driver
does not have driver-specific data to report, per the remark below.
https://lore.kernel.org/kvm/ZAcwJSK%2F9UVI9LXu@nvidia.com/
v2: https://lore.kernel.org/linux-iommu/20230309075358.571567-1-yi.l.liu@intel.…
- Drop patch 05 of v1 as it is already covered by other series
- Rename the capability info to be iommu hardware info
v1: https://lore.kernel.org/linux-iommu/20230209041642.9346-1-yi.l.liu@intel.co…
Regards,
Yi Liu
Lu Baolu (1):
iommu: Add new iommu op to get iommu hardware information
Nicolin Chen (1):
iommufd/selftest: Add coverage for IOMMU_GET_HW_INFO ioctl
Yi Liu (2):
iommu: Move dev_iommu_ops() to private header
iommufd: Add IOMMU_GET_HW_INFO
drivers/iommu/iommu-priv.h | 11 +++
drivers/iommu/iommufd/device.c | 1 +
drivers/iommu/iommufd/iommufd_test.h | 9 +++
drivers/iommu/iommufd/main.c | 76 +++++++++++++++++++
drivers/iommu/iommufd/selftest.c | 16 ++++
include/linux/iommu.h | 27 ++++---
include/uapi/linux/iommufd.h | 44 +++++++++++
tools/testing/selftests/iommu/iommufd.c | 17 ++++-
tools/testing/selftests/iommu/iommufd_utils.h | 26 +++++++
9 files changed, 215 insertions(+), 12 deletions(-)
--
2.34.1
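The role of the type tag described in the iommufd cover letter above can be sketched in plain C. This is only an illustration of the idea that a type value tells the user how to decode an opaque buffer: every name below (the `demo_` enum, struct and function) is hypothetical and is not the uAPI added by the series, whose real definitions live in include/uapi/linux/iommufd.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type tags, for illustration only (not the real uAPI enum). */
enum demo_hw_info_type {
	DEMO_HW_INFO_TYPE_NONE = 0,	/* driver reports no vendor data */
	DEMO_HW_INFO_TYPE_VENDOR_A = 1,	/* made-up vendor layout below */
};

/* Made-up vendor-specific payload; the field names are assumptions. */
struct demo_hw_info_vendor_a {
	uint64_t cap_reg;
	uint64_t ecap_reg;
};

/*
 * Decode a raw info buffer according to its type tag.
 * Returns 0 on success (including the "no data" case), -1 if the type is
 * unknown or the buffer is too short for the tagged layout.
 */
static int demo_decode_hw_info(uint32_t type, const void *data, size_t len,
			       struct demo_hw_info_vendor_a *out)
{
	switch (type) {
	case DEMO_HW_INFO_TYPE_NONE:
		memset(out, 0, sizeof(*out));
		return 0;
	case DEMO_HW_INFO_TYPE_VENDOR_A:
		if (len < sizeof(*out))
			return -1;
		memcpy(out, data, sizeof(*out));
		return 0;
	default:
		return -1;
	}
}
```

A caller would pass the type value it got back together with the buffer, and treat an unknown type as "cannot decode" rather than guessing at a layout.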
This small series of 4 patches adds some improvements in MPTCP
selftests:
- Patch 1 reworks the detailed report of mptcp_join.sh selftest to
better display what went well or wrong per test.
- Patch 2 adds colours (if supported, forced and/or not disabled) in
mptcp_join.sh selftest output to help spotting issues.
- Patch 3 modifies an MPTCP selftest tool to interact with the
path-manager via Netlink to always look for errors if any. This makes
sure odd behaviours can be seen in the logs and errors can be caught
later if needed.
- Patch 4 removes stdout and stderr redirections to /dev/null when using
pm_nl_ctl if no errors are expected in order to log odd behaviours.
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Matthieu Baerts (4):
selftests: mptcp: join: rework detailed report
selftests: mptcp: join: colored results
selftests: mptcp: pm_nl_ctl: always look for errors
selftests: mptcp: userspace_pm: unmute unexpected errors
tools/testing/selftests/net/mptcp/mptcp_join.sh | 452 ++++++++++------------
tools/testing/selftests/net/mptcp/mptcp_lib.sh | 39 ++
tools/testing/selftests/net/mptcp/pm_netlink.sh | 6 +-
tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 33 +-
tools/testing/selftests/net/mptcp/userspace_pm.sh | 100 ++---
5 files changed, 329 insertions(+), 301 deletions(-)
---
base-commit: 64a37272fa5fb2d951ebd1a96fd42b045d64924c
change-id: 20230728-upstream-net-next-20230728-mptcp-selftests-misc-0190cfd69ef9
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
There is a warning reported by coccinelle:
./tools/testing/selftests/cgroup/test_zswap.c:211:6-18: WARNING:
Unsigned expression compared with zero: stored_pages < 0
The type of "stored_pages" is size_t, which is always an unsigned type,
so it can never be less than zero. Drop the if statement to silence
the warning.
Signed-off-by: Li Zetao <lizetao1(a)huawei.com>
---
tools/testing/selftests/cgroup/test_zswap.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c
index 49def87a909b..dbad8d0cd090 100644
--- a/tools/testing/selftests/cgroup/test_zswap.c
+++ b/tools/testing/selftests/cgroup/test_zswap.c
@@ -208,8 +208,6 @@ static int test_no_kmem_bypass(const char *root)
free(trigger_allocation);
if (get_zswap_stored_pages(&stored_pages))
break;
- if (stored_pages < 0)
- break;
/* If memory was pushed to zswap, verify it belongs to memcg */
if (stored_pages > stored_pages_threshold) {
int zswapped = cg_read_key_long(test_group, "memory.stat", "zswapped ");
--
2.34.1
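As a small illustration of why the check removed by the test_zswap.c patch above was dead code, the following sketch shows that a size_t cannot hold a negative value and that failure is better signalled through the return value. The helper names are hypothetical and are not taken from test_zswap.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a helper like get_zswap_stored_pages():
 * failure is signalled by a nonzero return value, and the count itself is
 * delivered through an unsigned out-parameter.
 */
static int demo_get_stored_pages(int simulate_failure, size_t *value)
{
	if (simulate_failure)
		return -1;
	*value = 7;
	return 0;
}

/* Converting -1 to size_t wraps to SIZE_MAX; it does not stay negative. */
static size_t demo_minus_one_as_size_t(void)
{
	return (size_t)-1;
}
```

With this shape, the caller only needs to test the return value; a separate "value < 0" test on the size_t can never be true.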
This 3-patch series consists of fixes to the proc_filter test
found during linux-next testing.
The first patch fixes the LKFT-reported compile error, the second
one adds a .gitignore, and the third fixes error paths to skip
instead of fail (root check and argument checks).
Shuah Khan (3):
selftests:connector: Fix Makefile to include KHDR_INCLUDES
selftests:connector: Add .gitignore and poupulate it with test
selftests:connector: Add root check and fix arg error paths to skip
tools/testing/selftests/connector/.gitignore | 1 +
tools/testing/selftests/connector/Makefile | 2 +-
tools/testing/selftests/connector/proc_filter.c | 9 +++++++--
3 files changed, 9 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/connector/.gitignore
--
2.39.2
The openvswitch selftests currently contain a few cases for managing the
datapath, which includes creating datapath instances, adding interfaces,
and doing some basic feature / upcall tests. This is useful to validate
the control path.
Add the ability to program some of the more common flows with actions. This
can be improved over time to include regression testing, etc.
Changes from original:
1. Fix issue when parsing ipv6 in the NAT action
2. Fix issue calculating length during ctact parsing
3. Fix error message when invalid bridge is passed
4. Fold in Adrian's patch to support key masks
Aaron Conole (4):
selftests: openvswitch: add an initial flow programming case
selftests: openvswitch: add a test for ipv4 forwarding
selftests: openvswitch: add basic ct test case parsing
selftests: openvswitch: add ct-nat test case with ipv4
Adrian Moreno (1):
selftests: openvswitch: support key masks
.../selftests/net/openvswitch/openvswitch.sh | 223 +++++++
.../selftests/net/openvswitch/ovs-dpctl.py | 601 +++++++++++++++++-
2 files changed, 800 insertions(+), 24 deletions(-)
--
2.40.1
I checked and the Landlock ptrace test failed because Yama is enabled,
which is expected. You can check that with
/proc/sys/kernel/yama/ptrace_scope
Jeff Xu sent a patch to fix this case but it is not ready yet:
https://lore.kernel.org/r/20220628222941.2642917-1-jeffxu@google.com
Could you please send a new patch Jeff, and add Limin in Cc?
On 29/11/2022 12:26, limin wrote:
> cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-6.1.0-next-20221116
> root=UUID=a65b3a79-dc02-4728-8a0c-5cf24f4ae08b ro
> systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all
>
>
> config
>
> #
> # Automatically generated file; DO NOT EDIT.
> # Linux/x86 6.1.0-rc6 Kernel Configuration
> #
[...]
> CONFIG_SECURITY_YAMA=y
[...]
> CONFIG_LSM="landlock,lockdown,yama,integrity,apparmor"
[...]
>
> On 2022/11/29 19:03, Mickaël Salaün wrote:
>> I tested with next-20221116 and all tests are OK. Could you share your
>> kernel configuration with a link? What is the content of /proc/cmdline?
>>
>> On 29/11/2022 02:42, limin wrote:
>>> I run test on Linux ubuntu2204 6.1.0-next-20221116
>>>
>>> I did't use yama.
>>>
>>> you can reproduce by this step:
>>>
>>> cd kernel_src
>>>
>>> cd tools/testing/selftests/landlock/
>>> make
>>> ./ptrace_test
>>>
>>>
>>>
>>>
>>> On 2022/11/29 3:44, Mickaël Salaün wrote:
>>>> This patch changes the test semantic and then cannot work on my test
>>>> environment. On which kernel did you run test? Do you use Yama or
>>>> something similar?
>>>>
>>>> On 28/11/2022 03:04, limin wrote:
>>>>> Tests PTRACE_ATTACH and PTRACE_MODE_READ on the parent,
>>>>> trace parent return -1 when child== 0
>>>>> How to reproduce warning:
>>>>> $ make -C tools/testing/selftests TARGETS=landlock run_tests
>>>>>
>>>>> Signed-off-by: limin <limin100(a)huawei.com>
>>>>> ---
>>>>> tools/testing/selftests/landlock/ptrace_test.c | 5 ++---
>>>>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/tools/testing/selftests/landlock/ptrace_test.c
>>>>> b/tools/testing/selftests/landlock/ptrace_test.c
>>>>> index c28ef98ff3ac..88c4dc63eea0 100644
>>>>> --- a/tools/testing/selftests/landlock/ptrace_test.c
>>>>> +++ b/tools/testing/selftests/landlock/ptrace_test.c
>>>>> @@ -267,12 +267,11 @@ TEST_F(hierarchy, trace)
>>>>> /* Tests PTRACE_ATTACH and PTRACE_MODE_READ on the
>>>>> parent. */
>>>>> err_proc_read = test_ptrace_read(parent);
>>>>> ret = ptrace(PTRACE_ATTACH, parent, NULL, 0);
>>>>> + EXPECT_EQ(-1, ret);
>>>>> + EXPECT_EQ(EPERM, errno);
>>>>> if (variant->domain_child) {
>>>>> - EXPECT_EQ(-1, ret);
>>>>> - EXPECT_EQ(EPERM, errno);
>>>>> EXPECT_EQ(EACCES, err_proc_read);
>>>>> } else {
>>>>> - EXPECT_EQ(0, ret);
>>>>> EXPECT_EQ(0, err_proc_read);
>>>>> }
>>>>> if (ret == 0) {
Add the definition of the '__weak' attribute since it is not available in
GCC 11.3.0 and newer:
rseq.c:41:1: error: unknown type name ‘__weak’
41 | __weak ptrdiff_t __rseq_offset;
Fixes: 3bcbc20942db ("selftests/rseq: Play nice with binaries statically linked against glibc 2.35+")
Signed-off-by: Anh Tuan Phan <tuananhlfc(a)gmail.com>
---
tools/testing/selftests/rseq/rseq.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c
index a723da253244..584eec9b0930 100644
--- a/tools/testing/selftests/rseq/rseq.c
+++ b/tools/testing/selftests/rseq/rseq.c
@@ -34,6 +34,10 @@
#include "../kselftest.h"
#include "rseq.h"
+#ifndef __weak
+#define __weak __attribute__((weak))
+#endif
+
/*
* Define weak versions to play nice with binaries that are statically linked
* against a libc that doesn't support registering its own rseq.
--
2.34.1
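The fallback in the rseq.c patch above follows a common pattern: define the attribute macro only if nothing else has, then use it to mark a symbol as weak. A minimal userspace sketch, in which the symbol name `demo_rseq_offset` and the helper are hypothetical and only the #ifndef block mirrors the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Same fallback as the patch: only define __weak if nothing else did. */
#ifndef __weak
#define __weak __attribute__((weak))
#endif

/*
 * Hypothetical symbol for illustration. Because it is weak, a strong
 * definition in another object would override it at link time; with no
 * such definition, this zero-initialized one is used.
 */
__weak ptrdiff_t demo_rseq_offset;

/* Returns nonzero when a nonzero offset has been provided. */
static int demo_rseq_offset_is_set(void)
{
	return demo_rseq_offset != 0;
}
```

When no other object provides `demo_rseq_offset`, the weak definition stays zero and the helper returns 0.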
Hi,
Since v6.5-rc1, kunit gained a devm/drmm-like mechanism that makes test
resources much easier to clean up.
This series converts the existing tests to use those new actions where
relevant.
Let me know what you think,
Maxime
Signed-off-by: Maxime Ripard <mripard(a)kernel.org>
---
Changes in v3:
- Fixed the build cast warnings by switching to wrapper functions
- Link to v2: https://lore.kernel.org/r/20230720-kms-kunit-actions-rework-v2-0-175017bd56…
Changes in v2:
- Fix some typos
- Use platform_device_del instead of removing the call to
platform_device_put after calling platform_device_add
- Link to v1: https://lore.kernel.org/r/20230710-kms-kunit-actions-rework-v1-0-722c58d72c…
---
Maxime Ripard (11):
drm/tests: helpers: Switch to kunit actions
drm/tests: client-modeset: Remove call to drm_kunit_helper_free_device()
drm/tests: modes: Remove call to drm_kunit_helper_free_device()
drm/tests: probe-helper: Remove call to drm_kunit_helper_free_device()
drm/tests: helpers: Create a helper to allocate a locking ctx
drm/tests: helpers: Create a helper to allocate an atomic state
drm/vc4: tests: pv-muxing: Remove call to drm_kunit_helper_free_device()
drm/vc4: tests: mock: Use a kunit action to unregister DRM device
drm/vc4: tests: pv-muxing: Switch to managed locking init
drm/vc4: tests: Switch to atomic state allocation helper
drm/vc4: tests: pv-muxing: Document test scenario
drivers/gpu/drm/tests/drm_client_modeset_test.c | 8 --
drivers/gpu/drm/tests/drm_kunit_helpers.c | 141 +++++++++++++++++++++++-
drivers/gpu/drm/tests/drm_modes_test.c | 8 --
drivers/gpu/drm/tests/drm_probe_helper_test.c | 8 --
drivers/gpu/drm/vc4/tests/vc4_mock.c | 12 ++
drivers/gpu/drm/vc4/tests/vc4_test_pv_muxing.c | 115 +++++++------------
include/drm/drm_kunit_helpers.h | 7 ++
7 files changed, 198 insertions(+), 101 deletions(-)
---
base-commit: d7b3af5a77e8d8da28f435f313e069aea5bcf172
change-id: 20230710-kms-kunit-actions-rework-5d163762c93b
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>
Hi,
Since v6.5-rc1, kunit gained a devm/drmm-like mechanism that makes test
resources much easier to clean up.
This series converts the existing tests to use those new actions where
relevant.
Let me know what you think,
Maxime
Signed-off-by: Maxime Ripard <mripard(a)kernel.org>
---
Changes in v2:
- Fix some typos
- Use platform_device_del instead of removing the call to
platform_device_put after calling platform_device_add
- Link to v1: https://lore.kernel.org/r/20230710-kms-kunit-actions-rework-v1-0-722c58d72c…
---
Maxime Ripard (11):
drm/tests: helpers: Switch to kunit actions
drm/tests: client-modeset: Remove call to drm_kunit_helper_free_device()
drm/tests: modes: Remove call to drm_kunit_helper_free_device()
drm/tests: probe-helper: Remove call to drm_kunit_helper_free_device()
drm/tests: helpers: Create a helper to allocate a locking ctx
drm/tests: helpers: Create a helper to allocate an atomic state
drm/vc4: tests: pv-muxing: Remove call to drm_kunit_helper_free_device()
drm/vc4: tests: mock: Use a kunit action to unregister DRM device
drm/vc4: tests: pv-muxing: Switch to managed locking init
drm/vc4: tests: Switch to atomic state allocation helper
drm/vc4: tests: pv-muxing: Document test scenario
drivers/gpu/drm/tests/drm_client_modeset_test.c | 8 --
drivers/gpu/drm/tests/drm_kunit_helpers.c | 108 +++++++++++++++++++++-
drivers/gpu/drm/tests/drm_modes_test.c | 8 --
drivers/gpu/drm/tests/drm_probe_helper_test.c | 8 --
drivers/gpu/drm/vc4/tests/vc4_mock.c | 5 ++
drivers/gpu/drm/vc4/tests/vc4_test_pv_muxing.c | 115 +++++++++---------------
include/drm/drm_kunit_helpers.h | 7 ++
7 files changed, 158 insertions(+), 101 deletions(-)
---
base-commit: c58c49dd89324b18a812762a2bfa5a0458e4f252
change-id: 20230710-kms-kunit-actions-rework-5d163762c93b
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>
Submit the top-level headers also from the kunit test module notifier
initialization callback, so external tools that are parsing dmesg for
kunit test output are able to tell how many test suites should be expected
and whether to continue parsing after complete output from the first test
suite is collected.
Extend kunit module notifier initialization callback with a processing
path for only listing the tests provided by a module if the kunit action
parameter is set to "list", so external tools can obtain a list of test
cases to be executed in advance and can do a better job of assigning
kernel messages interleaved with kunit output to specific tests.
Use test filtering functions in kunit module notifier callback functions,
so external tools are able to execute individual test cases from kunit
test modules in order to better isolate their potential impact on
kernel messages that appear interleaved with output from other tests.
v2: Fix new name of a structure moved to kunit namespace not updated
across all uses.
Janusz Krzysztofik (3):
kunit: Report the count of test suites in a module
kunit: Make 'list' action available to kunit test modules
kunit: Allow kunit test modules to use test filtering
include/kunit/test.h | 14 +++++++++++
lib/kunit/executor.c | 57 +++++++++++++++++++++++++-------------------
lib/kunit/test.c | 57 +++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 103 insertions(+), 25 deletions(-)
--
2.41.0
Hi, Willy
Most of the suggestions of v2 [1] have been applied in this v3 revision,
except the local menuconfig and mrproper targets, as explained in [2].
A fresh run with tinyconfig for ppc, ppc64 and ppc64le:
$ for arch in ppc ppc64 ppc64le; do \
mkdir -p $PWD/kernel-$arch;
time make defconfig run DEFCONFIG=tinyconfig ARCH=$arch O=$PWD/kernel-$arch RUN_OUT=$PWD/run.$arch.out;
done
rerun for ppc, ppc64 and ppc64le:
$ for arch in ppc ppc64 ppc64le; do \
make rerun ARCH=$arch O=$PWD/kernel-$arch RUN_OUT=$PWD/run.$arch.out;
done
Running /labs/linux-lab/src/linux-stable/tools/testing/selftests/nolibc/kernel-ppc/vmlinux on qemu-system-ppc
>> [ppc] Kernel command line: console=ttyS0 panic=-1
printk: console [ttyS0] enabled
Run /init as init process
Running test 'startup'
Running test 'syscall'
Running test 'stdlib'
Running test 'vfprintf'
Running test 'protection'
Leaving init with final status: 0
reboot: Power down
powered off, test finish
qemu-system-ppc: terminating on signal 15 from pid 190248 ()
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
See all results in /labs/linux-lab/src/linux-stable/tools/testing/selftests/nolibc/run.ppc.out
Running /labs/linux-lab/src/linux-stable/tools/testing/selftests/nolibc/kernel-ppc64/vmlinux on qemu-system-ppc64
Linux version 6.4.0+ (ubuntu@linux-lab) (powerpc64le-linux-gnu-gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #2 SMP Fri Jul 28 01:40:55 CST 2023
Kernel command line: console=hvc0 panic=-1
printk: console [hvc0] enabled
printk: console [hvc0] enabled
Run /init as init process
Running test 'startup'
Running test 'syscall'
Running test 'stdlib'
Running test 'vfprintf'
Running test 'protection'
Leaving init with final status: 0
reboot: Power down
powered off, test finish
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
See all results in /labs/linux-lab/src/linux-stable/tools/testing/selftests/nolibc/run.ppc64.out
Running /labs/linux-lab/src/linux-stable/tools/testing/selftests/nolibc/kernel-ppc64le/arch/powerpc/boot/zImage on qemu-system-ppc64le
Linux version 6.4.0+ (ubuntu@linux-lab) (powerpc64le-linux-gnu-gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #2 SMP Fri Jul 28 01:41:12 CST 2023
Kernel command line: console=hvc0 panic=-1
Run /init as init process
Running test 'startup'
Running test 'syscall'
Running test 'stdlib'
Running test 'vfprintf'
Running test 'protection'
Leaving init with final status: 0
reboot: Power down
powered off, test finish
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
See all results in /labs/linux-lab/src/linux-stable/tools/testing/selftests/nolibc/run.ppc64le.out
A fast report on existing test logs:
$ for arch in ppc ppc64 ppc64le; do \
make report ARCH=$arch RUN_OUT=$PWD/run.$arch.out | grep status; \
done
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
Changes from v2 --> v3:
* selftests/nolibc: allow report with existing test log
selftests/nolibc: fix up O= option support
selftests/nolibc: allow customize CROSS_COMPILE by architecture
selftests/nolibc: customize CROSS_COMPILE for 32/64-bit powerpc
selftests/nolibc: tinyconfig: add extra common options
No Change.
* selftests/nolibc: add macros to reduce duplicated changes
Remove REPORT_RUN_OUT and LOG_OUT.
* selftests/nolibc: string the core targets
Removed the extconfig target from our v3 powerpc patchset [3]; its
operations have been merged into the defconfig target.
Let kernel depend on $(KERNEL_CONFIG) instead of the removed
extconfig.
* selftests/nolibc: add menuconfig and mrproper for development
Like the other local nolibc targets, local menuconfig and mrproper
targets are still required for consistent usage with the same ARCH and
without -C /path/to/srctree.
Merge them together to reduce duplicated entries.
* selftests/nolibc: allow quit qemu-system when poweroff fails
Enhance the timeout logic to print and detect more of the expected strings
for the booting of bios, kernel, init and test.
Add a default QEMU_TIMEOUT of 10 seconds for every architecture to
detect all potential boot hangs or failed poweroffs.
* selftests/nolibc: customize QEMU_TIMEOUT for ppc64/ppc64le
Reduce QEMU_TIMEOUT from 60 seconds to a more normal 15 and 20
seconds for ppc64 and ppc64le respectively. The main time cost is
the slow bios used.
* selftests/nolibc: tinyconfig: add support for 32/64-bit powerpc
Rename the files to shorter names, as suggested in the powerpc
patchset.
* selftests/nolibc: speed up some targets with multiple jobs
New to speed up with -j<N> by default.
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/lkml/cover.1689759351.git.falcon@tinylab.org/
[2]: https://lore.kernel.org/lkml/20230727132418.117924-1-falcon@tinylab.org/
[3]: https://lore.kernel.org/lkml/8e9e5ac6283c6ec2ecf10a70ce55b219028497c1.16904…
Zhangjin Wu (12):
selftests/nolibc: allow report with existing test log
selftests/nolibc: add macros to reduce duplicated changes
selftests/nolibc: fix up O= option support
selftests/nolibc: string the core targets
selftests/nolibc: allow customize CROSS_COMPILE by architecture
selftests/nolibc: customize CROSS_COMPILE for 32/64-bit powerpc
selftests/nolibc: add menuconfig and mrproper for development
selftests/nolibc: allow quit qemu-system when poweroff fails
selftests/nolibc: customize QEMU_TIMEOUT for ppc64/ppc64le
selftests/nolibc: tinyconfig: add extra common options
selftests/nolibc: tinyconfig: add support for 32/64-bit powerpc
selftests/nolibc: speed up some targets with multiple jobs
tools/testing/selftests/nolibc/Makefile | 102 ++++++++++++++----
.../selftests/nolibc/configs/common.config | 4 +
.../selftests/nolibc/configs/ppc.config | 3 +
.../selftests/nolibc/configs/ppc64.config | 3 +
.../selftests/nolibc/configs/ppc64le.config | 4 +
5 files changed, 98 insertions(+), 18 deletions(-)
create mode 100644 tools/testing/selftests/nolibc/configs/common.config
create mode 100644 tools/testing/selftests/nolibc/configs/ppc64.config
create mode 100644 tools/testing/selftests/nolibc/configs/ppc64le.config
--
2.25.1
Hi, Willy, Thomas
v3 here is to fix up two issues introduced in the v2 powerpc patchset [1].
- One is to restore the wrongly removed '\' for a '\$$ARCH'.
- The other is to add the missing $(ARCH).config for ppc. The default variant
for powerpc was renamed to ppc in v2 (as discussed with Willy in [2]), but
ppc.config is missing from the v2 patchset. I am not sure why this happened;
perhaps a 'git clean -fdx .' is required before a new test, so I just did it.
Btw, the v3 tinyconfig-part1 for powerpc is ready, I will send it out
soon.
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/lkml/cover.1690373704.git.falcon@tinylab.org/
[2]: https://lore.kernel.org/lkml/ZL9leVOI25S2+0+g@1wt.eu/
Zhangjin Wu (7):
tools/nolibc: add support for powerpc
tools/nolibc: add support for powerpc64
selftests/nolibc: add extra configs customize support
selftests/nolibc: add XARCH and ARCH mapping support
selftests/nolibc: add test support for ppc
selftests/nolibc: add test support for ppc64le
selftests/nolibc: add test support for ppc64
tools/include/nolibc/arch-powerpc.h | 202 ++++++++++++++++++
tools/include/nolibc/arch.h | 2 +
tools/testing/selftests/nolibc/Makefile | 46 +++-
.../selftests/nolibc/configs/ppc.config | 3 +
4 files changed, 246 insertions(+), 7 deletions(-)
create mode 100644 tools/include/nolibc/arch-powerpc.h
create mode 100644 tools/testing/selftests/nolibc/configs/ppc.config
--
2.25.1
Hi,
Zhangjin and I are working on a tiny shell with nolibc. This patch
enables the missing pipe() with its testcase.
Thanks.
Yuan Tan (2):
tools/nolibc: add pipe() support
selftests/nolibc: add testcase for pipe.
tools/include/nolibc/sys.h | 17 ++++++++++
tools/testing/selftests/nolibc/nolibc-test.c | 34 ++++++++++++++++++++
2 files changed, 51 insertions(+)
--
2.39.2
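For readers unfamiliar with the call enabled by the nolibc pipe() series above, here is a minimal round trip through a pipe. It is written against a regular libc rather than nolibc, and the helper name `demo_pipe_roundtrip` is hypothetical; it is not the testcase added by the series.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/*
 * Write msg into a fresh pipe and read it back into buf.
 * Returns the number of bytes read, or -1 on any failure.
 * Intended for short messages that fit in the pipe buffer.
 */
static ssize_t demo_pipe_roundtrip(const char *msg, char *buf, size_t buflen)
{
	int fds[2];
	size_t len = strlen(msg);
	ssize_t n;

	if (len > buflen || pipe(fds) < 0)
		return -1;

	/* fds[1] is the write end, fds[0] the read end. */
	if (write(fds[1], msg, len) != (ssize_t)len) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	close(fds[1]);

	n = read(fds[0], buf, buflen);
	close(fds[0]);
	return n;
}
```

Closing the write end before reading means the reader would see end-of-file after the data instead of blocking, which keeps the example single-process.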
Hi,
This is the semi-friendly patch-bot of Greg Kroah-Hartman.
Markus, you seem to have sent a nonsensical or otherwise pointless
review comment to a patch submission on a Linux kernel developer mailing
list. I strongly suggest that you not do this anymore. Please do not
bother developers who are actively working to produce patches and
features with comments that, in the end, are a waste of time.
Patch submitter, please ignore Markus's suggestion; you do not need to
follow it at all. The person/bot/AI that sent it is being ignored by
almost all Linux kernel maintainers for having a persistent pattern of
behavior of producing distracting and pointless commentary, and
inability to adapt to feedback. Please feel free to also ignore emails
from them.
thanks,
greg k-h's patch email bot
damos_new_filter() returns a damos_filter struct without
initializing its ->list field, and the users of the function use the
struct without initializing the field. As a result, an uninitialized
memory access error is possible. Indeed, a kernel NULL pointer
dereference BUG can be triggered using the DAMON user-space tool, as shown
below.
# damo start --damos_action stat --damos_filter anon matching
# damo tune --damos_action stat --damos_filter anon matching --damos_filter anon nomatching
# dmesg
[...]
[ 36.908136] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 36.910483] #PF: supervisor write access in kernel mode
[ 36.912238] #PF: error_code(0x0002) - not-present page
[ 36.913415] PGD 0 P4D 0
[ 36.913978] Oops: 0002 [#1] PREEMPT SMP PTI
[ 36.914878] CPU: 32 PID: 1335 Comm: kdamond.0 Not tainted 6.5.0-rc3-mm-unstable-damon+ #1
[ 36.916621] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 36.919051] RIP: 0010:damos_destroy_filter (include/linux/list.h:114 include/linux/list.h:137 include/linux/list.h:148 mm/damon/core.c:345 mm/damon/core.c:355)
[...]
[ 36.938247] Call Trace:
[ 36.938721] <TASK>
[...]
[ 36.950064] ? damos_destroy_filter (include/linux/list.h:114 include/linux/list.h:137 include/linux/list.h:148 mm/damon/core.c:345 mm/damon/core.c:355)
[ 36.950883] ? damon_sysfs_set_scheme_filters.isra.0 (mm/damon/sysfs-schemes.c:1573)
[ 36.952019] damon_sysfs_set_schemes (mm/damon/sysfs-schemes.c:1674 mm/damon/sysfs-schemes.c:1686)
[ 36.952875] damon_sysfs_apply_inputs (mm/damon/sysfs.c:1312 mm/damon/sysfs.c:1298)
[ 36.953757] ? damon_pa_check_accesses (mm/damon/paddr.c:168 mm/damon/paddr.c:179)
[ 36.954648] damon_sysfs_cmd_request_callback (mm/damon/sysfs.c:1329 mm/damon/sysfs.c:1359)
[...]
The first patch of this patchset fixes the bug by initializing the field in
damos_new_filter(). The second patch adds a unit test for the problem.
Note that the second patch Cc's stable@ without a Fixes: tag, since it would
be better for the two to be ingested together to avoid any future regression.
SeongJae Park (2):
mm/damon/core: initialize damo_filter->list from damos_new_filter()
mm/damon/core-test: add a test for damos_new_filter()
mm/damon/core-test.h | 13 +++++++++++++
mm/damon/core.c | 1 +
2 files changed, 14 insertions(+)
--
2.25.1
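The class of bug fixed by the DAMON series above can be reproduced in miniature in userspace. The sketch below re-implements a tiny list_head purely for illustration; it is not the kernel's list.h and not the DAMON code, and all `demo_` names are hypothetical. It assumes zeroed allocation (calloc) so that a forgotten initialization would deterministically leave NULL pointers in the list member.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct demo_list_head {
	struct demo_list_head *next, *prev;
};

static void demo_init_list_head(struct demo_list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Unlink an entry; this writes through both neighbour pointers. */
static void demo_list_del(struct demo_list_head *e)
{
	e->next->prev = e->prev;
	e->prev->next = e->next;
}

/* Hypothetical filter object; only the list member matters here. */
struct demo_filter {
	int matching;
	struct demo_list_head list;
};

static struct demo_filter *demo_new_filter(int matching)
{
	struct demo_filter *f = calloc(1, sizeof(*f));

	if (!f)
		return NULL;
	f->matching = matching;
	/*
	 * Without this call, list.next and list.prev would be NULL here and
	 * demo_list_del() would write through a NULL pointer.
	 */
	demo_init_list_head(&f->list);
	return f;
}

static void demo_destroy_filter(struct demo_filter *f)
{
	demo_list_del(&f->list);	/* safe even if never added to a list */
	free(f);
}
```

Initializing the list member in the constructor means the destructor can unlink unconditionally, whether or not the object was ever added to a list.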
Hi, Willy, Thomas
Here is the first part of v2 of our tinyconfig support for nolibc-test
[1], the patchset subject is reserved as before.
As discussed in the v1 thread [1], to ease the review process, the whole
tinyconfig support is divided into several parts, mainly by
architecture. Here is the first part, which includes basic preparation and
a powerpc example.
This patchset should be applied after the 32/64-bit powerpc support [2];
specifically, these two patches are required:
* selftests/nolibc: add extra config file customize support
* selftests/nolibc: add XARCH and ARCH mapping support
In this patchset, we first add some misc preparations and finally add
the tinyconfig target, using powerpc as the first example.
Tests:
// powerpc run-user
$ for arch in powerpc powerpc64 powerpc64le; do \
rm -rf $PWD/kernel-$arch; \
mkdir -p $PWD/kernel-$arch; \
make run-user XARCH=$arch O=$PWD/kernel-$arch RUN_OUT=$PWD/run.$arch.out | grep "status: "; \
done
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
// powerpc run
$ for arch in powerpc powerpc64 powerpc64le; do \
rm -rf $PWD/kernel-$arch; \
mkdir -p $PWD/kernel-$arch; \
make tinyconfig run XARCH=$arch O=$PWD/kernel-$arch RUN_OUT=$PWD/run.$arch.out; \
done
$ for arch in powerpc powerpc64 powerpc64le; do \
make report XARCH=$arch O=$PWD/kernel-$arch RUN_OUT=$PWD/run.$arch.out | grep "status: "; \
done
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
// the others, randomly choose some
$ make run-user XARCH=arm O=$PWD/kernel-arm RUN_OUT=$PWD/run.arm.out CROSS_COMPILE=arm-linux-gnueabi- | grep status:
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
$ make run-user XARCH=x86_64 O=$PWD/kernel-arm RUN_OUT=$PWD/run.x86_64.out CROSS_COMPILE=x86_64-linux-gnu- | grep status:
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
$ make run-libc-test | grep status:
165 test(s): 153 passed, 12 skipped, 0 failed => status: warning
// x86_64, require noapic kernel command line option for old qemu-system-x86_64 (v4.2.1)
$ make run XARCH=x86_64 O=$PWD/kernel-x86_64 RUN_OUT=$PWD/run.x86_64.out CROSS_COMPILE=x86_64-linux-gnu- | grep status
$ make rerun XARCH=x86_64 O=$PWD/kernel-x86_64 RUN_OUT=$PWD/run.x86_64.out CROSS_COMPILE=x86_64-linux-gnu- | grep status
165 test(s): 159 passed, 6 skipped, 0 failed => status: warning
tinyconfig is mainly intended as a time-saver, and the misc preparations
serve the same goal. Let's take a look:
* selftests/nolibc: allow report with existing test log
Like rerun without rebuild, add report (without rerun) to summarize
the existing test log; this works well with 'grep status'.
* selftests/nolibc: add macros to enhance maintainability
Several macros are added to dedup the shared code, which shrinks the line
count and eases maintenance.
The macros are added just before the area that uses them to avoid code
change conflicts in the future.
* selftests/nolibc: print running log to screen
Enable logging to let developers see what is happening at
first glance, without the need to edit the Makefile and rerun it.
This helps a lot when there is a long-running test, a failed
poweroff or even a permanent hang.
For the test summary, 'grep status' can be used together with the
standalone report target.
* selftests/nolibc: fix up O= option support
With objtree instead of srctree for .config and IMAGE, O= now works.
Using O=$PWD/kernel-$arch avoids running mrproper for every build.
* selftests/nolibc: add menuconfig for development
Allow manually tuning some options, mainly for a new architecture
porting.
* selftests/nolibc: add mrproper for development
selftests/nolibc: defconfig: remove mrproper target
Split the mrproper target out of defconfig. With O=, mrproper is not
required by defconfig, but keep it for other use cases.
* selftests/nolibc: string the core targets
Allow simply 'make run' instead of 'make defconfig; make extconfig;
make kernel; make run'.
* selftests/nolibc: allow quit qemu-system when poweroff fails
When poweroff fails, allow exiting when the power-off string is detected
in the output or the wait time is too long (specified by QEMU_TIMEOUT).
This helps boards that have no poweroff support or kernels that do not
enable the poweroff options (mainly for tinyconfig).
* selftests/nolibc: allow customize CROSS_COMPILE by architecture
* selftests/nolibc: customize CROSS_COMPILE for 32/64-bit powerpc
This further saves a CROSS_COMPILE option for 'make run'. It is very
important when iterating over all of the supported architectures where the
compilers are not simply prefixed with the XARCH variable.
For example, big endian powerpc64 binaries can be compiled with
powerpc64le-linux-gnu-, but the prefix is powerpc64le.
Even if the pre-customized compiler does not exist, we can configure
CROSS_COMPILE_<ARCH> before the test loop to use the code.
* selftests/nolibc: add tinyconfig target
selftests/nolibc: tinyconfig: add extra common options
selftests/nolibc: tinyconfig: add support for 32/64-bit powerpc
This is the first architecture (and its variants) to support tinyconfig.
powerpc is actually a very good architecture for this, as it has various
variants to test.
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/lkml/cover.1687706332.git.falcon@tinylab.org/
[2]: https://lore.kernel.org/lkml/cover.1689713175.git.falcon@tinylab.org/
Zhangjin Wu (14):
selftests/nolibc: allow report with existing test log
selftests/nolibc: add macros to enhance maintainability
selftests/nolibc: print running log to screen
selftests/nolibc: fix up O= option support
selftests/nolibc: add menuconfig for development
selftests/nolibc: add mrproper for development
selftests/nolibc: defconfig: remove mrproper target
selftests/nolibc: string the core targets
selftests/nolibc: allow quit qemu-system when poweroff fails
selftests/nolibc: allow customize CROSS_COMPILE by architecture
selftests/nolibc: customize CROSS_COMPILE for 32/64-bit powerpc
selftests/nolibc: add tinyconfig target
selftests/nolibc: tinyconfig: add extra common options
selftests/nolibc: tinyconfig: add support for 32/64-bit powerpc
tools/testing/selftests/nolibc/Makefile | 102 ++++++++++++++----
.../selftests/nolibc/configs/common.config | 4 +
.../selftests/nolibc/configs/powerpc.config | 3 +
.../selftests/nolibc/configs/powerpc64.config | 3 +
.../nolibc/configs/powerpc64le.config | 4 +
5 files changed, 98 insertions(+), 18 deletions(-)
create mode 100644 tools/testing/selftests/nolibc/configs/common.config
create mode 100644 tools/testing/selftests/nolibc/configs/powerpc64.config
create mode 100644 tools/testing/selftests/nolibc/configs/powerpc64le.config
--
2.25.1
Submit the top-level headers also from the kunit test module notifier
initialization callback, so external tools that are parsing dmesg for
kunit test output are able to tell how many test suites should be expected
and whether to continue parsing after complete output from the first test
suite is collected.
Extend kunit module notifier initialization callback with a processing
path for only listing the tests provided by a module if the kunit action
parameter is set to "list", so external tools can obtain a list of test
cases to be executed in advance and can do a better job of assigning
kernel messages interleaved with kunit output to specific tests.
Use test filtering functions in kunit module notifier callback functions,
so external tools are able to execute individual test cases from kunit
test modules in order to still better isolate their potential impact on
kernel messages that appear interleaved with output from other tests.
Janusz Krzysztofik (3):
kunit: Report the count of test suites in a module
kunit: Make 'list' action available to kunit test modules
kunit: Allow kunit test modules to use test filtering
include/kunit/test.h | 14 +++++++++++
lib/kunit/executor.c | 51 ++++++++++++++++++++++-----------------
lib/kunit/test.c | 57 +++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 99 insertions(+), 23 deletions(-)
--
2.41.0
Hi, Willy
Here is the powerpc support, which includes 32-bit big endian powerpc and
64-bit little endian and big endian powerpc.
All of them pass run-user with the default powerpc toolchain from
Ubuntu 20.04:
$ make run-user DEFCONFIG=tinyconfig XARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- | grep status
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
$ make run-user DEFCONFIG=tinyconfig XARCH=powerpc64 CROSS_COMPILE=powerpc64le-linux-gnu- | grep status
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
$ make run-user DEFCONFIG=tinyconfig XARCH=powerpc64le CROSS_COMPILE=powerpc64le-linux-gnu- | grep status
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
For the slow 'run' target, I ran it with defconfig before, and have just
verified them via the fast tinyconfig + run with a new patch from the next
patchset. All of them pass:
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
Note, the big endian crosstool powerpc64-linux-gcc from
https://mirrors.edge.kernel.org/pub/tools/crosstool/ has been used to
test both little endian and big endian powerpc64 too, both passed.
Here is a brief explanation of the patches:
* tools/nolibc: add support for powerpc
tools/nolibc: add support for powerpc64
32-bit & 64-bit powerpc support of nolibc.
* selftests/nolibc: select_null: fix up for big endian powerpc64
Fix up a test case for big endian powerpc64.
* selftests/nolibc: add extra config file customize support
Add an extconfig target to allow enabling extra config options via
configs/<ARCH>.config.
Applied the suggestion from Thomas to use config files instead of config
lines.
* selftests/nolibc: add XARCH and ARCH mapping support
Applied suggestions from Willy: use XARCH as the input of our nolibc
test, use ARCH as the pure kernel input, and finally build the mapping
between XARCH and ARCH.
Customize the variables via the input XARCH.
* selftests/nolibc: add test support for powerpc
Requires extconfig to enable the console options specified in
configs/powerpc.config.
Currently, extconfig must be run manually after defconfig; the next
patchset will do this automatically.
* selftests/nolibc: add test support for powerpc64le
selftests/nolibc: add test support for powerpc64
Very simple, but CFLAGS are customized carefully so that they work with
powerpc64le-linux-gnu-gcc (from Linux distributions) and
powerpc64-linux-gcc (from mirrors.edge.kernel.org).
The next patchset will not be tinyconfig, but some preparatory patches,
which will be sent out soon.
Best regards,
Zhangjin
---
Zhangjin Wu (8):
tools/nolibc: add support for powerpc
tools/nolibc: add support for powerpc64
selftests/nolibc: select_null: fix up for big endian powerpc64
selftests/nolibc: add extra config file customize support
selftests/nolibc: add XARCH and ARCH mapping support
selftests/nolibc: add test support for powerpc
selftests/nolibc: add test support for powerpc64le
selftests/nolibc: add test support for powerpc64
tools/include/nolibc/arch-powerpc.h | 170 ++++++++++++++++++
tools/testing/selftests/nolibc/Makefile | 55 ++++--
.../selftests/nolibc/configs/powerpc.config | 3 +
tools/testing/selftests/nolibc/nolibc-test.c | 2 +-
4 files changed, 217 insertions(+), 13 deletions(-)
create mode 100644 tools/include/nolibc/arch-powerpc.h
create mode 100644 tools/testing/selftests/nolibc/configs/powerpc.config
--
2.25.1
If the test description is longer than the status alignment the
parameter 'n' to putcharn() would lead to a signed underflow that then
gets converted to a very large unsigned value.
This in turn leads to out-of-bounds writes in memset(), crashing the
application.
The failure case of EXPECT_PTRER() used in "mmap_bad" exhibits this
exact behavior.
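
The conversion described above can be illustrated with a minimal sketch. The
helper names pad_unchecked() and pad_checked() are hypothetical and are not
part of nolibc-test.c; the width 64 is the one used by this patch, and the
sketch assumes the count ends up in an unsigned size_t parameter.

```c
#include <stddef.h>
#include <stdint.h>

/* Width of the status column, as used by the patch. */
#define STATUS_COLUMN 64

/*
 * Hypothetical helper: the count that reaches an unsigned size_t
 * parameter when no guard is applied. The subtraction is done in int,
 * so a negative result wraps to a huge value on conversion.
 */
size_t pad_unchecked(int llen)
{
	return (size_t)(STATUS_COLUMN - llen);
}

/*
 * Hypothetical helper: the guarded variant, mirroring the
 * `if (llen < 64)` check. No padding is requested for long lines.
 */
size_t pad_checked(int llen)
{
	return llen < STATUS_COLUMN ? (size_t)(STATUS_COLUMN - llen) : 0;
}
```

With llen = 70, pad_unchecked() yields SIZE_MAX - 5, while pad_checked()
yields 0.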
Fixes: 8a27526f49f9 ("selftests/nolibc: add EXPECT_PTREQ, EXPECT_PTRNE and EXPECT_PTRER")
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
tools/testing/selftests/nolibc/nolibc-test.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/nolibc/nolibc-test.c b/tools/testing/selftests/nolibc/nolibc-test.c
index 03b1d30f5507..9b76603e4ce3 100644
--- a/tools/testing/selftests/nolibc/nolibc-test.c
+++ b/tools/testing/selftests/nolibc/nolibc-test.c
@@ -151,7 +151,8 @@ static void result(int llen, enum RESULT r)
else
msg = "[FAIL]";
- putcharn(' ', 64 - llen);
+ if (llen < 64)
+ putcharn(' ', 64 - llen);
puts(msg);
}
---
base-commit: dfef4fc45d5713eb23d87f0863aff9c33bd4bfaf
change-id: 20230726-nolibc-result-width-1f4b0b4f3ca0
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
=== Context ===
In the context of a middlebox, fragmented packets are tricky to handle.
The full 5-tuple of a packet is often only available in the first
fragment which makes enforcing consistent policy difficult. There are
really only two stateless options, neither of which are very nice:
1. Enforce policy on first fragment and accept all subsequent fragments.
This works but may let in certain attacks or allow data exfiltration.
2. Enforce policy on first fragment and drop all subsequent fragments.
This does not really work b/c some protocols may rely on
fragmentation. For example, DNS may rely on oversized UDP packets for
large responses.
So stateful tracking is the only sane option. RFC 8900 [0] calls this
out as well in section 6.3:
Middleboxes [...] should process IP fragments in a manner that is
consistent with [RFC0791] and [RFC8200]. In many cases, middleboxes
must maintain state in order to achieve this goal.
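
As a rough illustration of why only the first fragment can be matched against
a full 5-tuple, the sketch below inspects the IPv4 flags/fragment-offset field
as laid out in RFC 791. The function names are hypothetical, and the value is
assumed to be already converted to host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* IPv4 flags/fragment-offset field (RFC 791), host byte order. */
#define IPV4_MORE_FRAGMENTS 0x2000u
#define IPV4_OFFSET_MASK    0x1FFFu

/* True for any fragment: MF is set or the offset is non-zero. */
bool ipv4_is_fragment(uint16_t frag_off)
{
	return (frag_off & (IPV4_MORE_FRAGMENTS | IPV4_OFFSET_MASK)) != 0;
}

/*
 * True when the packet starts at offset 0, i.e. it is unfragmented or
 * is the first fragment. Only then is the L4 header (and therefore the
 * ports of the 5-tuple) expected to be present.
 */
bool ipv4_has_l4_header(uint16_t frag_off)
{
	return (frag_off & IPV4_OFFSET_MASK) == 0;
}
```

A stateless policy can only evaluate ports when ipv4_has_l4_header() is true,
which is the limitation behind both options listed above.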
=== BPF related bits ===
Policy has traditionally been enforced from XDP/TC hooks. Both hooks
run before kernel reassembly facilities. However, with the new
BPF_PROG_TYPE_NETFILTER, we can rather easily hook into existing
netfilter reassembly infra.
The basic idea is we bump a refcnt on the netfilter defrag module and
then run the bpf prog after the defrag module runs. This allows bpf
progs to transparently see full, reassembled packets. The nice thing
about this is that progs don't have to carry around logic to detect
fragments.
=== Changelog ===
Changes from v5:
* Fix defrag disable codepaths
Changes from v4:
* Refactor module handling code to not sleep in rcu_read_lock()
* Also unify the v4 and v6 hook structs so they can share codepaths
* Fixed some checkpatch.pl formatting warnings
Changes from v3:
* Correctly initialize `addrlen` stack var for recvmsg()
Changes from v2:
* module_put() if ->enable() fails
* Fix CI build errors
Changes from v1:
* Drop bpf_program__attach_netfilter() patches
* static -> static const where appropriate
* Fix callback assignment order during registration
* Only request_module() if callbacks are missing
* Fix retval when modprobe fails in userspace
* Fix v6 defrag module name (nf_defrag_ipv6_hooks -> nf_defrag_ipv6)
* Simplify priority checking code
* Add warning if module doesn't assign callbacks in the future
* Take refcnt on module while defrag link is active
[0]: https://datatracker.ietf.org/doc/html/rfc8900
Daniel Xu (5):
netfilter: defrag: Add glue hooks for enabling/disabling defrag
netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
bpf: selftests: Support not connecting client socket
bpf: selftests: Support custom type and proto for client sockets
bpf: selftests: Add defrag selftests
include/linux/netfilter.h | 10 +
include/uapi/linux/bpf.h | 5 +
net/ipv4/netfilter/nf_defrag_ipv4.c | 17 +-
net/ipv6/netfilter/nf_defrag_ipv6_hooks.c | 11 +
net/netfilter/core.c | 6 +
net/netfilter/nf_bpf_link.c | 123 +++++++-
tools/include/uapi/linux/bpf.h | 5 +
tools/testing/selftests/bpf/Makefile | 4 +-
.../selftests/bpf/generate_udp_fragments.py | 90 ++++++
.../selftests/bpf/ip_check_defrag_frags.h | 57 ++++
tools/testing/selftests/bpf/network_helpers.c | 26 +-
tools/testing/selftests/bpf/network_helpers.h | 3 +
.../bpf/prog_tests/ip_check_defrag.c | 283 ++++++++++++++++++
.../selftests/bpf/progs/ip_check_defrag.c | 104 +++++++
14 files changed, 718 insertions(+), 26 deletions(-)
create mode 100755 tools/testing/selftests/bpf/generate_udp_fragments.py
create mode 100644 tools/testing/selftests/bpf/ip_check_defrag_frags.h
create mode 100644 tools/testing/selftests/bpf/prog_tests/ip_check_defrag.c
create mode 100644 tools/testing/selftests/bpf/progs/ip_check_defrag.c
--
2.41.0
There are use cases that need to apply DAMOS schemes to specific address
ranges or DAMON monitoring targets. NUMA nodes in the physical address
space, special memory objects in the virtual address space, and
monitoring target specific efficient monitoring results snapshot
retrieval could be examples of such use cases. This patchset extends the
DAMOS filters feature for such cases, by implementing two more filter
types, namely address ranges and DAMON monitoring targets.
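
A simplified sketch of what an address range filter decision could look like
follows. This is not the code from the patches: the types and function names
are hypothetical, the `matching` flag is assumed to behave like the one in
existing DAMOS filters, and partially overlapping regions are simply treated
as not matched here, whereas a real implementation has to decide how to handle
them.

```c
#include <stdbool.h>

/* Half-open address range [start, end). Hypothetical type. */
struct addr_range {
	unsigned long start;
	unsigned long end;
};

enum overlap { RANGE_OUTSIDE, RANGE_INSIDE, RANGE_PARTIAL };

/* Classify how a monitoring region relates to the filter's range. */
enum overlap classify(struct addr_range region, struct addr_range filter)
{
	if (filter.start <= region.start && region.end <= filter.end)
		return RANGE_INSIDE;
	if (region.end <= filter.start || filter.end <= region.start)
		return RANGE_OUTSIDE;
	return RANGE_PARTIAL;
}

/*
 * Return true if the scheme should skip this region.
 * Assumption: with matching == true the filter excludes regions inside
 * the range; with matching == false it excludes everything else.
 */
bool filter_out(struct addr_range region, struct addr_range filter,
		bool matching)
{
	bool matched = classify(region, filter) == RANGE_INSIDE;

	return matched == matching;
}
```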
Patches sequence
----------------
The first seven patches are for the address ranges based DAMOS filter.
The first patch implements the filter feature and exposes it via the DAMON
kernel API. The second patch further exposes the feature to users via the
DAMON sysfs interface. The third and fourth patches implement unit
tests and selftests for the feature. Three patches (fifth to seventh)
updating the documents follow.
The following six patches are for the DAMON monitoring target based
DAMOS filter. The eighth patch implements the feature in the core layer
and exposes it via DAMON's kernel API. The ninth patch further exposes it
to users via the DAMON sysfs interface. The tenth patch adds a selftest, and
three patches (eleventh to thirteenth) update documents.
SeongJae Park (13):
mm/damon/core: introduce address range type damos filter
mm/damon/sysfs-schemes: support address range type DAMOS filter
mm/damon/core-test: add a unit test for __damos_filter_out()
selftests/damon/sysfs: test address range damos filter
Docs/mm/damon/design: update for address range filters
Docs/ABI/damon: update for address range DAMOS filter
Docs/admin-guide/mm/damon/usage: update for address range type DAMOS
filter
mm/damon/core: implement target type damos filter
mm/damon/sysfs-schemes: support target damos filter
selftests/damon/sysfs: test damon_target filter
Docs/mm/damon/design: update for DAMON monitoring target type DAMOS
filter
Docs/ABI/damon: update for DAMON monitoring target type DAMOS filter
Docs/admin-guide/mm/damon/usage: update for DAMON monitoring target
type DAMOS filter
.../ABI/testing/sysfs-kernel-mm-damon | 27 +++++-
Documentation/admin-guide/mm/damon/usage.rst | 34 +++++---
Documentation/mm/damon/design.rst | 24 ++++--
include/linux/damon.h | 28 +++++--
mm/damon/core-test.h | 61 ++++++++++++++
mm/damon/core.c | 62 ++++++++++++++
mm/damon/sysfs-schemes.c | 83 +++++++++++++++++++
tools/testing/selftests/damon/sysfs.sh | 5 ++
8 files changed, 299 insertions(+), 25 deletions(-)
--
2.25.1
The tried_regions directory of the DAMON sysfs interface is useful for
retrieving monitoring results snapshots or DAMOS debugging. However, for
the common use case that needs to monitor only the total size of the scheme
tried regions (e.g., monitoring working set size), the kernel overhead
for directory construction and user overhead for reading the content
could be high if the number of monitoring regions is not small. This
patchset implements DAMON sysfs files for efficient support of the use
case.
The first patch implements the sysfs file to reduce the user space
overhead, and the second patch implements a command for reducing the
kernel space overhead.
The third patch adds a selftest for the new file, and following two
patches update documents.
SeongJae Park (5):
mm/damon/sysfs-schemes: implement DAMOS tried total bytes file
mm/damon/sysfs: implement a command for updating only schemes tried
total bytes
selftests/damon/sysfs: test tried_regions/total_bytes file
Docs/ABI/damon: update for tried_regions/total_bytes
Docs/admin-guide/mm/damon/usage: update for tried_regions/total_bytes
.../ABI/testing/sysfs-kernel-mm-damon | 13 +++++-
Documentation/admin-guide/mm/damon/usage.rst | 42 ++++++++++++-------
mm/damon/sysfs-common.h | 2 +-
mm/damon/sysfs-schemes.c | 24 ++++++++++-
mm/damon/sysfs.c | 26 +++++++++---
tools/testing/selftests/damon/sysfs.sh | 1 +
6 files changed, 83 insertions(+), 25 deletions(-)
--
2.25.1
The events tracing infrastructure contains a lot of files and directories
(internally, inodes and dentries), which end up consuming megabytes of
memory. There can be multiple instances of events tracing, which
require even more memory.
Instead of creating inodes/dentries up front, eventfs could keep only
meta-data and skip the creation of inodes/dentries. When required, eventfs
will create the inodes/dentries only for the files/directories that are needed.
eventfs would also delete the inodes/dentries once they are no longer
required, but preserve the meta-data.
Tracing events took ~9MB; with this approach it took ~4.5MB
for ~10K files/dirs.
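
The general idea of keeping cheap meta-data and instantiating the expensive
object only on demand can be sketched in plain C as below. This is a
userspace illustration with hypothetical names (lazy_entry, lazy_lookup,
lazy_release); it does not use or reflect the actual eventfs data structures
or the VFS API.

```c
#include <stdlib.h>

/* Hypothetical entry: small meta-data plus an on-demand heavy object. */
struct lazy_entry {
	const char *name;	/* meta-data, always kept */
	void *obj;		/* heavy object, NULL until first lookup */
	unsigned int refs;	/* number of active users of obj */
};

/* Create the heavy object on first use and take a reference. */
void *lazy_lookup(struct lazy_entry *e, size_t obj_size)
{
	if (!e->obj) {
		e->obj = calloc(1, obj_size);
		if (!e->obj)
			return NULL;
	}
	e->refs++;
	return e->obj;
}

/* Drop a reference; free the heavy object when nobody uses it. */
void lazy_release(struct lazy_entry *e)
{
	if (e->refs && --e->refs == 0) {
		free(e->obj);
		e->obj = NULL;	/* meta-data (name) is preserved */
	}
}
```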
Diff from v5:
Patch 02: removed TRACEFS_EVENT_INODE enum.
Patch 04: added TRACEFS_EVENT_INODE enum.
Patch 06: removed WARN_ON_ONCE in eventfs_set_ef_status_free()
Patch 07: added WARN_ON_ONCE in create_dentry()
moved declaration of following to internal.h:
eventfs_start_creating()
eventfs_failed_creating()
eventfs_end_creating()
Patch 08: added WARN_ON_ONCE in eventfs_set_ef_status_free()
Diff from v4:
Patch 02: moved from v4 08/10
added fs/tracefs/internal.h
Patch 03: moved from v4 02/10
removed fs/tracefs/internal.h
Patch 04: moved from v4 03/10
moved out changes of fs/tracefs/internal.h
Patch 05: moved from v4 04/10
renamed eventfs_add_top_file() -> eventfs_add_events_file()
Patch 06: moved from v4 07/10
implemented create_dentry() helper function
added create_file(), create_dir() stub function
Patch 07: moved from v4 06/10
Patch 08: moved from v4 05/10
improved eventfs remove functionality
Patch 09: removed unwanted if conditions
Patch 10: added available_filter_functions check
Diff from v3:
Patch 3,4,5,7,9:
removed all the eventfs_rwsem code and replaced it with an srcu
lock for the readers, and a mutex to synchronize the writers of
the list.
Patch 2: moved 'tracefs_inode' and 'get_tracefs()' to v4 03/10
Patch 3: moved the struct eventfs_file and eventfs_inode into event_inode.c
as it really should not be exposed to all users.
Patch 5: added a recursion check to eventfs_remove_rec() as it is really
dangerous to have unchecked recursion in the kernel (we do have
a fixed size stack).
have the free use srcu callbacks. After the srcu grace periods
are done, it adds the eventfs_file onto a llist (lockless link
list) and wakes up a work queue. Then the work queue does the
freeing (this needs to be done in task/workqueue context, as
srcu callbacks are done in softirq context).
Patch 6: renamed:
eventfs_create_file() -> create_file()
eventfs_create_dir() -> create_dir()
Diff from v2:
Patch 01: new patch:'Require all trace events to have a TRACE_SYSTEM'
Patch 02: moved from v1 1/9
Patch 03: moved from v1 2/9
As suggested by Zheng Yejian, introduced eventfs_prepare_ef()
helper function to add files or directories to eventfs
fix WARNING reported by kernel test robot in v1 8/9
Patch 04: moved from v1 3/9
used eventfs_prepare_ef() to add files
fix WARNING reported by kernel test robot in v1 8/9
Patch 05: moved from v1 4/9
fix compiling warning reported by kernel test robot in v1 4/9
Patch 06: moved from v1 5/9
Patch 07: moved from v1 6/9
Patch 08: moved from v1 7/9
Patch 09: moved from v1 8/9
rebased because of v3 01/10
Patch 10: moved from v1 9/9
Diff from v1:
Patch 1: add header file
Patch 2: resolved kernel test robot issues
protecting eventfs lists using nested eventfs_rwsem
Patch 3: protecting eventfs lists using nested eventfs_rwsem
Patch 4: improve events cleanup code to fix crashes
Patch 5: resolved kernel test robot issues
removed d_instantiate_anon() calls
Patch 6: resolved kernel test robot issues
fix kprobe test in eventfs_root_lookup()
protecting eventfs lists using nested eventfs_rwsem
Patch 7: remove header file
Patch 8: pass eventfs_rwsem as argument to eventfs functions
called eventfs_remove_events_dir() instead of tracefs_remove()
from event_trace_del_tracer()
Patch 9: new patch to fix kprobe test case
fs/tracefs/Makefile | 1 +
fs/tracefs/event_inode.c | 801 ++++++++++++++++++
fs/tracefs/inode.c | 151 +++-
fs/tracefs/internal.h | 29 +
include/linux/trace_events.h | 1 +
include/linux/tracefs.h | 23 +
kernel/trace/trace.h | 2 +-
kernel/trace/trace_events.c | 76 +-
.../ftrace/test.d/kprobe/kprobe_args_char.tc | 9 +-
.../test.d/kprobe/kprobe_args_string.tc | 9 +-
10 files changed, 1050 insertions(+), 52 deletions(-)
create mode 100644 fs/tracefs/event_inode.c
create mode 100644 fs/tracefs/internal.h
--
2.39.0
[ This series depends on the VFIO device cdev series ]
Changelog
v11:
* Added Reviewed-by from Kevin
* Dropped 'rc' in iommufd_access_detach()
* Dropped a duplicated IS_ERR check at new_ioas
* Separate into a new patch the change in iommufd_access_destroy_object
v10:
https://lore.kernel.org/all/cover.1690488745.git.nicolinc@nvidia.com/
* Added Reviewed-by from Jason
* Replaced the iommufd_ref_to_user call with refcount_inc
* Added a wrapper iommufd_access_change_ioas_id and used it in the
iommufd_access_attach() and iommufd_access_replace() APIs
v9:
https://lore.kernel.org/all/cover.1690440730.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd for-next tree
* Added Reviewed-by from Jason and Alex
* Reworked the replace API patches
* Added a new patch allowing passing in to iopt_remove_access
* Added a new patch of a helper function following Jason's design,
mainly by blocking any concurrent detach/replace and keeping the
refcount_dec at the end of the function
* Added a call of the new helper in iommufd_access_destroy_object()
to reduce race condition
* Simplified the replace API patch
v8:
https://lore.kernel.org/all/cover.1690226015.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt series and then cdev v15 series:
https://lore.kernel.org/all/0-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.…
https://lore.kernel.org/kvm/20230718135551.6592-1-yi.l.liu@intel.com/
* Changed the order of detach() and attach() in replace(), to fix a bug
v7:
https://lore.kernel.org/all/cover.1683593831.git.nicolinc@nvidia.com/
* Rebased on top of v6.4-rc1 and cdev v11 candidate
* Fixed a wrong file in replace() API patch
* Added Kevin's "Reviewed-by" to replace() API patch
v6:
https://lore.kernel.org/all/cover.1679939952.git.nicolinc@nvidia.com/
* Rebased on top of cdev v8 series
https://lore.kernel.org/kvm/20230327094047.47215-1-yi.l.liu@intel.com/
* Added "Reviewed-by" from Kevin to PATCH-4
* Squashed access->ioas updating lines into iommufd_access_change_pt(),
and changed function return type accordingly for simplification.
v5:
https://lore.kernel.org/all/cover.1679559476.git.nicolinc@nvidia.com/
* Kept the cmd->id in the iommufd_test_create_access() so the access can
be created with an ioas by default. Then, renamed the previous ioctl
IOMMU_TEST_OP_ACCESS_SET_IOAS to IOMMU_TEST_OP_ACCESS_REPLACE_IOAS, so
it would be used to replace an access->ioas pointer.
* Added iommufd_access_replace() API after the introductions of the other
two APIs iommufd_access_attach() and iommufd_access_detach().
* Since vdev->iommufd_attached is also set in emulated pathway too, call
iommufd_access_update(), similar to the physical pathway.
v4:
https://lore.kernel.org/all/cover.1678284812.git.nicolinc@nvidia.com/
* Rebased on top of Jason's series adding replace() and hwpt_alloc()
https://lore.kernel.org/all/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia…
* Rebased on top of cdev series v6
https://lore.kernel.org/kvm/20230308132903.465159-1-yi.l.liu@intel.com/
* Dropped the patch that's moved to cdev series.
* Added unmap function pointer sanity before calling it.
* Added "Reviewed-by" from Kevin and Yi.
* Added back the VFIO change updating the ATTACH uAPI.
v3:
https://lore.kernel.org/all/cover.1677288789.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt branch:
https://lore.kernel.org/all/0-v2-406f7ac07936+6a-iommufd_hwpt_jgg@nvidia.co…
* Dropped patches from this series accordingly. There were a couple of
VFIO patches that will be submitted after the VFIO cdev series. Also,
renamed the series to be "emulated".
* Moved dma_unmap sanity patch to the first in the series.
* Moved dma_unmap sanity to cover both VFIO and IOMMUFD pathways.
* Added Kevin's "Reviewed-by" to two of the patches.
* Fixed a NULL pointer bug in vfio_iommufd_emulated_bind().
* Moved unmap() call to the common place in iommufd_access_set_ioas().
v2:
https://lore.kernel.org/all/cover.1675802050.git.nicolinc@nvidia.com/
* Rebased on top of vfio_device cdev v2 series.
* Update the kdoc and commit message of iommu_group_replace_domain().
* Dropped revert-to-core-domain part in iommu_group_replace_domain().
* Dropped !ops->dma_unmap check in vfio_iommufd_emulated_attach_ioas().
* Added missing rc value in vfio_iommufd_emulated_attach_ioas() from the
iommufd_access_set_ioas() call.
* Added a new patch in vfio_main to deny vfio_pin/unpin_pages() calls if
vdev->ops->dma_unmap is not implemented.
* Added a __iommmufd_device_detach helper and let the replace routine do
a partial detach().
* Added restriction on auto_domains to use the replace feature.
* Added the patch "iommufd/device: Make hwpt_list list_add/del symmetric"
from the has_group removal series.
v1:
https://lore.kernel.org/all/cover.1675320212.git.nicolinc@nvidia.com/
Hi all,
The existing IOMMU APIs provide a pair of functions: iommu_attach_group()
for callers to attach a device from the default_domain (NULL if not being
supported) to a given iommu domain, and iommu_detach_group() for callers
to detach a device from a given domain to the default_domain. Internally,
the detach_dev op is deprecated for the newer drivers with default_domain.
This means that those drivers likely can switch an attaching domain to
another one, without staging the device at a blocking or default domain,
for use cases such as:
1) vPASID mode, when a guest wants to replace a single pasid (PASID=0)
table with a larger table (PASID=N)
2) Nesting mode, when switching the attaching device from an S2 domain
to an S1 domain, or when switching between relevant S1 domains.
This series is rebased on top of Jason Gunthorpe's series that introduces
iommu_group_replace_domain API and IOMMUFD infrastructure for the IOMMUFD
"physical" devices. The IOMMUFD "emulated" devices will need some extra
steps to replace the access->ioas object and its iopt pointer.
You can also find this series on Github:
https://github.com/nicolinc/iommufd/commits/iommu_group_replace_domain-v11
Thank you
Nicolin Chen
Nicolin Chen (7):
vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages()
iommufd: Allow passing in iopt_access_list_id to iopt_remove_access()
iommufd: Add iommufd_access_change_ioas(_id) helpers
iommufd: Use iommufd_access_change_ioas in
iommufd_access_destroy_object
iommufd: Add iommufd_access_replace() API
iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage
vfio: Support IO page table replacement
drivers/iommu/iommufd/device.c | 128 ++++++++++++------
drivers/iommu/iommufd/io_pagetable.c | 6 +-
drivers/iommu/iommufd/iommufd_private.h | 3 +-
drivers/iommu/iommufd/iommufd_test.h | 4 +
drivers/iommu/iommufd/selftest.c | 19 +++
drivers/vfio/iommufd.c | 11 +-
drivers/vfio/vfio_main.c | 4 +
include/linux/iommufd.h | 1 +
include/uapi/linux/vfio.h | 6 +
tools/testing/selftests/iommu/iommufd.c | 29 +++-
tools/testing/selftests/iommu/iommufd_utils.h | 19 +++
11 files changed, 179 insertions(+), 51 deletions(-)
--
2.41.0
A missing break in ksm_tests leads to a kselftest hang when the
parameter -s is used.
In the current code flow, because of the missing break in -s, the -t case
parses the argument spilled over from -s. As -t accepts only 0 or 1 as
valid values, any -s argument greater than 1 or less than 0 results in a
ksm_tests failure.
This went undetected because, before the addition of option -t,
the next case -M would immediately break out of the switch
statement, but that is no longer the case.
Add the missing break statement.
----Before----
./ksm_tests -H -s 100
Invalid merge type
----After----
./ksm_tests -H -s 100
Number of normal pages: 0
Number of huge pages: 50
Total size: 100 MiB
Total time: 0.401732682 s
Average speed: 248.922 MiB/s
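
The fall-through can be reproduced outside of ksm_tests with a small sketch.
The function parse_opt() and its result codes are hypothetical and heavily
simplified; the has_break flag only exists to show both behaviours side by
side.

```c
#include <stdbool.h>
#include <stdlib.h>

enum { PARSE_OK = 0, PARSE_BAD_SIZE = -1, PARSE_BAD_MERGE_TYPE = -2 };

/*
 * Hypothetical, simplified option handler. With has_break == false the
 * 's' case runs on into the 't' case, as in the code before the fix.
 */
int parse_opt(int opt, const char *arg, bool has_break,
	      long *size_mb, int *merge_type)
{
	switch (opt) {
	case 's':
		*size_mb = atol(arg);
		if (*size_mb <= 0)
			return PARSE_BAD_SIZE;
		if (has_break)
			break;
		/* falls through - this models the missing break */
	case 't': {
		int tmp = atoi(arg);

		/* Only 0 and 1 are accepted as merge types. */
		if (tmp < 0 || tmp > 1)
			return PARSE_BAD_MERGE_TYPE;
		*merge_type = tmp;
		break;
	}
	default:
		break;
	}
	return PARSE_OK;
}
```

Without the break, passing "100" for 's' is rejected as an invalid merge type,
and passing "1" silently overwrites the merge type.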
Fixes: 9e7cb94ca218 ("selftests: vm: add KSM merging time test")
Signed-off-by: Ayush Jain <ayush.jain3(a)amd.com>
---
tools/testing/selftests/mm/ksm_tests.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mm/ksm_tests.c b/tools/testing/selftests/mm/ksm_tests.c
index 435acebdc325..380b691d3eb9 100644
--- a/tools/testing/selftests/mm/ksm_tests.c
+++ b/tools/testing/selftests/mm/ksm_tests.c
@@ -831,6 +831,7 @@ int main(int argc, char *argv[])
printf("Size must be greater than 0\n");
return KSFT_FAIL;
}
+ break;
case 't':
{
int tmp = atoi(optarg);
--
2.34.1
[ This series depends on the VFIO device cdev series ]
Changelog
v8:
* Rebased on top of Jason's iommufd_hwpt series and then cdev v15 series:
https://lore.kernel.org/all/0-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.…
https://lore.kernel.org/kvm/20230718135551.6592-1-yi.l.liu@intel.com/
* Changed the order of detach() and attach() in replace(), to fix a bug
v7:
https://lore.kernel.org/all/cover.1683593831.git.nicolinc@nvidia.com/
* Rebased on top of v6.4-rc1 and cdev v11 candidate
* Fixed a wrong file in replace() API patch
* Added Kevin's "Reviewed-by" to replace() API patch
v6:
https://lore.kernel.org/all/cover.1679939952.git.nicolinc@nvidia.com/
* Rebased on top of cdev v8 series
https://lore.kernel.org/kvm/20230327094047.47215-1-yi.l.liu@intel.com/
* Added "Reviewed-by" from Kevin to PATCH-4
* Squashed access->ioas updating lines into iommufd_access_change_pt(),
and changed function return type accordingly for simplification.
v5:
https://lore.kernel.org/all/cover.1679559476.git.nicolinc@nvidia.com/
* Kept the cmd->id in the iommufd_test_create_access() so the access can
be created with an ioas by default. Then, renamed the previous ioctl
IOMMU_TEST_OP_ACCESS_SET_IOAS to IOMMU_TEST_OP_ACCESS_REPLACE_IOAS, so
it would be used to replace an access->ioas pointer.
* Added iommufd_access_replace() API after the introductions of the other
two APIs iommufd_access_attach() and iommufd_access_detach().
* Since vdev->iommufd_attached is also set in emulated pathway too, call
iommufd_access_update(), similar to the physical pathway.
v4:
https://lore.kernel.org/all/cover.1678284812.git.nicolinc@nvidia.com/
* Rebased on top of Jason's series adding replace() and hwpt_alloc()
https://lore.kernel.org/all/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia…
* Rebased on top of cdev series v6
https://lore.kernel.org/kvm/20230308132903.465159-1-yi.l.liu@intel.com/
* Dropped the patch that's moved to cdev series.
* Added unmap function pointer sanity before calling it.
* Added "Reviewed-by" from Kevin and Yi.
* Added back the VFIO change updating the ATTACH uAPI.
v3:
https://lore.kernel.org/all/cover.1677288789.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt branch:
https://lore.kernel.org/all/0-v2-406f7ac07936+6a-iommufd_hwpt_jgg@nvidia.co…
* Dropped patches from this series accordingly. There were a couple of
VFIO patches that will be submitted after the VFIO cdev series. Also,
renamed the series to be "emulated".
* Moved dma_unmap sanity patch to the first in the series.
* Moved dma_unmap sanity to cover both VFIO and IOMMUFD pathways.
* Added Kevin's "Reviewed-by" to two of the patches.
* Fixed a NULL pointer bug in vfio_iommufd_emulated_bind().
* Moved unmap() call to the common place in iommufd_access_set_ioas().
v2:
https://lore.kernel.org/all/cover.1675802050.git.nicolinc@nvidia.com/
* Rebased on top of vfio_device cdev v2 series.
* Update the kdoc and commit message of iommu_group_replace_domain().
* Dropped revert-to-core-domain part in iommu_group_replace_domain().
* Dropped !ops->dma_unmap check in vfio_iommufd_emulated_attach_ioas().
* Added missing rc value in vfio_iommufd_emulated_attach_ioas() from the
iommufd_access_set_ioas() call.
* Added a new patch in vfio_main to deny vfio_pin/unpin_pages() calls if
vdev->ops->dma_unmap is not implemented.
* Added a __iommmufd_device_detach helper and let the replace routine do
a partial detach().
* Added restriction on auto_domains to use the replace feature.
* Added the patch "iommufd/device: Make hwpt_list list_add/del symmetric"
from the has_group removal series.
v1:
https://lore.kernel.org/all/cover.1675320212.git.nicolinc@nvidia.com/
Hi all,
The existing IOMMU APIs provide a pair of functions: iommu_attach_group()
for callers to attach a device from the default_domain (NULL if not being
supported) to a given iommu domain, and iommu_detach_group() for callers
to detach a device from a given domain to the default_domain. Internally,
the detach_dev op is deprecated for the newer drivers with default_domain.
This means that those drivers likely can switch an attaching domain to
another one, without staging the device at a blocking or default domain,
for use cases such as:
1) vPASID mode, when a guest wants to replace a single pasid (PASID=0)
table with a larger table (PASID=N)
2) Nesting mode, when switching the attaching device from an S2 domain
to an S1 domain, or when switching between relevant S1 domains.
This series is rebased on top of Jason Gunthorpe's series that introduces
iommu_group_replace_domain API and IOMMUFD infrastructure for the IOMMUFD
"physical" devices. The IOMMUFD "emulated" devices will need some extra
steps to replace the access->ioas object and its iopt pointer.
You can also find this series on Github:
https://github.com/nicolinc/iommufd/commits/iommu_group_replace_domain-v8
Thank you
Nicolin Chen
Nicolin Chen (4):
vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages()
iommufd: Add iommufd_access_replace() API
iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage
vfio: Support IO page table replacement
drivers/iommu/iommufd/device.c | 72 ++++++++++++++-----
drivers/iommu/iommufd/iommufd_test.h | 4 ++
drivers/iommu/iommufd/selftest.c | 19 +++++
drivers/vfio/iommufd.c | 11 +--
drivers/vfio/vfio_main.c | 4 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/vfio.h | 6 ++
tools/testing/selftests/iommu/iommufd.c | 29 +++++++-
tools/testing/selftests/iommu/iommufd_utils.h | 19 +++++
9 files changed, 142 insertions(+), 23 deletions(-)
--
2.41.0
PTP_SYS_OFFSET_EXTENDED was added in November 2018 in
361800876f80 ("ptp: add PTP_SYS_OFFSET_EXTENDED ioctl")
and PTP_SYS_OFFSET_PRECISE was added in February 2016 in
719f1aa4a671 ("ptp: Add PTP_SYS_OFFSET_PRECISE for driver crosstimestamping")
The PTP selftest code lacks support for these two ioctls.
This short series of patches adds support for them.
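For readers unfamiliar with the extended ioctl, here is a minimal userspace sketch of how it can be called. It is not taken from testptp.c. The helper names (pct_to_ns, sample_offset_ns, read_extended_offsets) are hypothetical, and the midpoint-based offset estimate is only one possible way to interpret a sample, assumed here for illustration.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/ptp_clock.h>

/* Convert a struct ptp_clock_time to nanoseconds. */
static int64_t pct_to_ns(const struct ptp_clock_time *t)
{
	return (int64_t)t->sec * 1000000000LL + t->nsec;
}

/*
 * Hypothetical helper: estimate (PHC - system) from one extended sample.
 * ts[0] is the system time before the PHC read, ts[1] the PHC time and
 * ts[2] the system time after. The midpoint of ts[0] and ts[2] is used
 * as the system time of the PHC read (an assumption of this sketch).
 */
static int64_t sample_offset_ns(const struct ptp_clock_time ts[3])
{
	int64_t before = pct_to_ns(&ts[0]);
	int64_t after = pct_to_ns(&ts[2]);

	return pct_to_ns(&ts[1]) - (before + (after - before) / 2);
}

/*
 * Request n samples from the PTP character device at path.
 * Returns 0 on success, -1 on any failure (bad n, missing device,
 * ioctl not supported by the driver).
 */
static int read_extended_offsets(const char *path, unsigned int n,
				 struct ptp_sys_offset_extended *soe)
{
	int fd, rc;

	if (n == 0 || n > PTP_MAX_SAMPLES)
		return -1;

	fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;

	memset(soe, 0, sizeof(*soe));
	soe->n_samples = n;
	rc = ioctl(fd, PTP_SYS_OFFSET_EXTENDED, soe);
	close(fd);

	return rc < 0 ? -1 : 0;
}
```

A caller would typically pass a device node such as /dev/ptp0 (an assumed path) and then walk soe.ts[i] for i below n_samples. Drivers that do not implement the extended read are expected to make the ioctl fail, which this sketch reports as -1.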
Changes in v2:
- Fixed rebase issues (v1 somehow ended up with patch 1 being from the
first manual split of my changes and patch 2 being from rebase 2 out
of 3)
- Rebased on top of net-next
Alex Maftei (2):
selftests/ptp: Add -x option for testing PTP_SYS_OFFSET_EXTENDED
selftests/ptp: Add -X option for testing PTP_SYS_OFFSET_PRECISE
tools/testing/selftests/ptp/testptp.c | 73 ++++++++++++++++++++++++++-
1 file changed, 71 insertions(+), 2 deletions(-)
--
2.25.1
[ This series depends on the VFIO device cdev series ]
Changelog
v10:
* Added Reviewed-by from Jason
* Replaced the iommufd_ref_to_user call with refcount_inc
* Added a wrapper iommufd_access_change_ioas_id and used it in the
iommufd_access_attach() and iommufd_access_replace() APIs
v9:
https://lore.kernel.org/linux-iommu/cover.1690440730.git.nicolinc@nvidia.co…
* Rebased on top of Jason's iommufd for-next tree
* Added Reviewed-by from Jason and Alex
* Reworked the replace API patches
* Added a new patch allowing passing in iopt_access_list_id to
iopt_remove_access()
* Added a new patch of a helper function following Jason's design,
mainly by blocking any concurrent detach/replace and keeping the
refcount_dec at the end of the function
* Added a call of the new helper in iommufd_access_destroy_object()
to reduce race condition
* Simplified the replace API patch
v8:
https://lore.kernel.org/all/cover.1690226015.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt series and then cdev v15 series:
https://lore.kernel.org/all/0-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.…
https://lore.kernel.org/kvm/20230718135551.6592-1-yi.l.liu@intel.com/
* Changed the order of detach() and attach() in replace(), to fix a bug
v7:
https://lore.kernel.org/all/cover.1683593831.git.nicolinc@nvidia.com/
* Rebased on top of v6.4-rc1 and cdev v11 candidate
* Fixed a wrong file in replace() API patch
* Added Kevin's "Reviewed-by" to replace() API patch
v6:
https://lore.kernel.org/all/cover.1679939952.git.nicolinc@nvidia.com/
* Rebased on top of cdev v8 series
https://lore.kernel.org/kvm/20230327094047.47215-1-yi.l.liu@intel.com/
* Added "Reviewed-by" from Kevin to PATCH-4
* Squashed access->ioas updating lines into iommufd_access_change_pt(),
and changed function return type accordingly for simplification.
v5:
https://lore.kernel.org/all/cover.1679559476.git.nicolinc@nvidia.com/
* Kept the cmd->id in the iommufd_test_create_access() so the access can
be created with an ioas by default. Then, renamed the previous ioctl
IOMMU_TEST_OP_ACCESS_SET_IOAS to IOMMU_TEST_OP_ACCESS_REPLACE_IOAS, so
it would be used to replace an access->ioas pointer.
* Added iommufd_access_replace() API after the introductions of the other
two APIs iommufd_access_attach() and iommufd_access_detach().
* Since vdev->iommufd_attached is also set in the emulated pathway, call
iommufd_access_update(), similar to the physical pathway.
v4:
https://lore.kernel.org/all/cover.1678284812.git.nicolinc@nvidia.com/
* Rebased on top of Jason's series adding replace() and hwpt_alloc()
https://lore.kernel.org/all/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia…
* Rebased on top of cdev series v6
https://lore.kernel.org/kvm/20230308132903.465159-1-yi.l.liu@intel.com/
* Dropped the patch that's moved to cdev series.
* Added unmap function pointer sanity before calling it.
* Added "Reviewed-by" from Kevin and Yi.
* Added back the VFIO change updating the ATTACH uAPI.
v3:
https://lore.kernel.org/all/cover.1677288789.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt branch:
https://lore.kernel.org/all/0-v2-406f7ac07936+6a-iommufd_hwpt_jgg@nvidia.co…
* Dropped patches from this series accordingly. There were a couple of
VFIO patches that will be submitted after the VFIO cdev series. Also,
renamed the series to be "emulated".
* Moved dma_unmap sanity patch to the first in the series.
* Moved dma_unmap sanity to cover both VFIO and IOMMUFD pathways.
* Added Kevin's "Reviewed-by" to two of the patches.
* Fixed a NULL pointer bug in vfio_iommufd_emulated_bind().
* Moved unmap() call to the common place in iommufd_access_set_ioas().
v2:
https://lore.kernel.org/all/cover.1675802050.git.nicolinc@nvidia.com/
* Rebased on top of vfio_device cdev v2 series.
* Update the kdoc and commit message of iommu_group_replace_domain().
* Dropped revert-to-core-domain part in iommu_group_replace_domain().
* Dropped !ops->dma_unmap check in vfio_iommufd_emulated_attach_ioas().
* Added missing rc value in vfio_iommufd_emulated_attach_ioas() from the
iommufd_access_set_ioas() call.
* Added a new patch in vfio_main to deny vfio_pin/unpin_pages() calls if
vdev->ops->dma_unmap is not implemented.
* Added a __iommmufd_device_detach helper and let the replace routine do
a partial detach().
* Added restriction on auto_domains to use the replace feature.
* Added the patch "iommufd/device: Make hwpt_list list_add/del symmetric"
from the has_group removal series.
v1:
https://lore.kernel.org/all/cover.1675320212.git.nicolinc@nvidia.com/
Hi all,
The existing IOMMU APIs provide a pair of functions: iommu_attach_group()
for callers to attach a device from the default_domain (NULL if not being
supported) to a given iommu domain, and iommu_detach_group() for callers
to detach a device from a given domain to the default_domain. Internally,
the detach_dev op is deprecated for the newer drivers with default_domain.
This means that those drivers likely can switch an attaching domain to
another one, without staging the device at a blocking or default domain,
for use cases such as:
1) vPASID mode, when a guest wants to replace a single pasid (PASID=0)
table with a larger table (PASID=N)
2) Nesting mode, when switching the attaching device from an S2 domain
to an S1 domain, or when switching between relevant S1 domains.
This series is rebased on top of Jason Gunthorpe's series that introduces
iommu_group_replace_domain API and IOMMUFD infrastructure for the IOMMUFD
"physical" devices. The IOMMUFD "emulated" devices will need some extra
steps to replace the access->ioas object and its iopt pointer.
You can also find this series on Github:
https://github.com/nicolinc/iommufd/commits/iommu_group_replace_domain-v10
Thank you
Nicolin Chen
Nicolin Chen (6):
vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages()
iommufd: Allow passing in iopt_access_list_id to iopt_remove_access()
iommufd: Add iommufd_access_change_ioas(_id) helpers
iommufd: Add iommufd_access_replace() API
iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage
vfio: Support IO page table replacement
drivers/iommu/iommufd/device.c | 132 ++++++++++++------
drivers/iommu/iommufd/io_pagetable.c | 6 +-
drivers/iommu/iommufd/iommufd_private.h | 3 +-
drivers/iommu/iommufd/iommufd_test.h | 4 +
drivers/iommu/iommufd/selftest.c | 19 +++
drivers/vfio/iommufd.c | 11 +-
drivers/vfio/vfio_main.c | 4 +
include/linux/iommufd.h | 1 +
include/uapi/linux/vfio.h | 6 +
tools/testing/selftests/iommu/iommufd.c | 29 +++-
tools/testing/selftests/iommu/iommufd_utils.h | 19 +++
11 files changed, 184 insertions(+), 50 deletions(-)
--
2.41.0
Make sv48 the default address space for mmap as some applications
currently depend on this assumption. Users can now select a
desired address space using a non-zero hint address to mmap. Previously,
requesting the default address space from mmap by passing zero as the hint
address would result in using the largest address space possible. Some
applications depend on empty bits in the virtual address space, like Go and
Java, so this patch provides more flexibility for application developers.
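As a rough illustration of how an application might probe this behaviour from userspace, the sketch below maps an anonymous page with a caller-chosen hint and reports how many address bits the returned pointer needs. The helper names (addr_bits, map_with_hint) are hypothetical. The exact rule mapping a hint to an address space is defined by the series' documentation patch and is not reproduced here.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Number of significant bits in an address (0 for a null address). */
static unsigned int addr_bits(uintptr_t addr)
{
	unsigned int bits = 0;

	while (addr) {
		bits++;
		addr >>= 1;
	}
	return bits;
}

/*
 * Map len bytes of anonymous memory, passing hint as the mmap address
 * hint. MAP_FIXED is deliberately not used, so the kernel is free to
 * place the mapping elsewhere. Returns MAP_FAILED on error.
 */
static void *map_with_hint(uintptr_t hint, size_t len)
{
	return mmap((void *)hint, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```

A program that stores tags in the upper pointer bits could call map_with_hint(0, len) and check addr_bits() on the result against the number of bits it assumes are free.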
-Charlie
---
v7:
- Changing RLIMIT_STACK inside of an executing program does not trigger
arch_pick_mmap_layout(), so rewrite tests to change RLIMIT_STACK from a
script before executing tests. RLIMIT_STACK of infinity forces bottomup
mmap allocation.
- Make arch_get_mmap_base macro more readable by extracting out the rnd
calculation.
- Use MMAP_MIN_VA_BITS in TASK_UNMAPPED_BASE to support case when mmap
attempts to allocate address smaller than DEFAULT_MAP_WINDOW.
- Fix incorrect wording in documentation.
v6:
- Rebase onto the correct base
v5:
- Minor wording change in documentation
- Change some parenthesis in arch_get_mmap_ macros
- Added case for addr==0 in arch_get_mmap_ because without this, programs would
crash if RLIMIT_STACK was modified before executing the program. This was
tested using the libhugetlbfs tests.
v4:
- Split testcases/document patch into test cases, in-code documentation, and
formal documentation patches
- Modified the mmap_base macro to be more legible and better represent memory
layout
- Fixed documentation to better reflect the implementation
- Renamed DEFAULT_VA_BITS to MMAP_VA_BITS
- Added additional test case for rlimit changes
---
Charlie Jenkins (4):
RISC-V: mm: Restrict address space for sv39,sv48,sv57
RISC-V: mm: Add tests for RISC-V mm
RISC-V: mm: Update pgtable comment documentation
RISC-V: mm: Document mmap changes
Documentation/riscv/vm-layout.rst | 22 +++++++
arch/riscv/include/asm/elf.h | 2 +-
arch/riscv/include/asm/pgtable.h | 21 ++++--
arch/riscv/include/asm/processor.h | 47 ++++++++++++--
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/mm/.gitignore | 2 +
tools/testing/selftests/riscv/mm/Makefile | 15 +++++
.../riscv/mm/testcases/mmap_bottomup.c | 35 ++++++++++
.../riscv/mm/testcases/mmap_default.c | 35 ++++++++++
.../selftests/riscv/mm/testcases/mmap_test.h | 64 +++++++++++++++++++
.../selftests/riscv/mm/testcases/run_mmap.sh | 12 ++++
11 files changed, 244 insertions(+), 13 deletions(-)
create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
create mode 100644 tools/testing/selftests/riscv/mm/Makefile
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_default.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_test.h
create mode 100755 tools/testing/selftests/riscv/mm/testcases/run_mmap.sh
--
2.41.0
[ This series depends on the VFIO device cdev series ]
Changelog
v9:
* Rebased on top of Jason's iommufd for-next tree
* Added Reviewed-by from Jason and Alex
* Reworked the replace API patches
* Added a new patch allowing passing in iopt_access_list_id to
iopt_remove_access()
* Added a new patch of a helper function following Jason's design,
mainly by blocking any concurrent detach/replace and keeping the
refcount_dec at the end of the function
* Added a call of the new helper in iommufd_access_destroy_object()
to reduce race condition
* Simplified the replace API patch
v8:
https://lore.kernel.org/all/cover.1690226015.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt series and then cdev v15 series:
https://lore.kernel.org/all/0-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.…
https://lore.kernel.org/kvm/20230718135551.6592-1-yi.l.liu@intel.com/
* Changed the order of detach() and attach() in replace(), to fix a bug
v7:
https://lore.kernel.org/all/cover.1683593831.git.nicolinc@nvidia.com/
* Rebased on top of v6.4-rc1 and cdev v11 candidate
* Fixed a wrong file in replace() API patch
* Added Kevin's "Reviewed-by" to replace() API patch
v6:
https://lore.kernel.org/all/cover.1679939952.git.nicolinc@nvidia.com/
* Rebased on top of cdev v8 series
https://lore.kernel.org/kvm/20230327094047.47215-1-yi.l.liu@intel.com/
* Added "Reviewed-by" from Kevin to PATCH-4
* Squashed access->ioas updating lines into iommufd_access_change_pt(),
and changed function return type accordingly for simplification.
v5:
https://lore.kernel.org/all/cover.1679559476.git.nicolinc@nvidia.com/
* Kept the cmd->id in the iommufd_test_create_access() so the access can
be created with an ioas by default. Then, renamed the previous ioctl
IOMMU_TEST_OP_ACCESS_SET_IOAS to IOMMU_TEST_OP_ACCESS_REPLACE_IOAS, so
it would be used to replace an access->ioas pointer.
* Added iommufd_access_replace() API after the introductions of the other
two APIs iommufd_access_attach() and iommufd_access_detach().
* Since vdev->iommufd_attached is also set in the emulated pathway, call
iommufd_access_update(), similar to the physical pathway.
v4:
https://lore.kernel.org/all/cover.1678284812.git.nicolinc@nvidia.com/
* Rebased on top of Jason's series adding replace() and hwpt_alloc()
https://lore.kernel.org/all/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia…
* Rebased on top of cdev series v6
https://lore.kernel.org/kvm/20230308132903.465159-1-yi.l.liu@intel.com/
* Dropped the patch that's moved to cdev series.
* Added unmap function pointer sanity before calling it.
* Added "Reviewed-by" from Kevin and Yi.
* Added back the VFIO change updating the ATTACH uAPI.
v3:
https://lore.kernel.org/all/cover.1677288789.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt branch:
https://lore.kernel.org/all/0-v2-406f7ac07936+6a-iommufd_hwpt_jgg@nvidia.co…
* Dropped patches from this series accordingly. There were a couple of
VFIO patches that will be submitted after the VFIO cdev series. Also,
renamed the series to be "emulated".
* Moved dma_unmap sanity patch to the first in the series.
* Moved dma_unmap sanity to cover both VFIO and IOMMUFD pathways.
* Added Kevin's "Reviewed-by" to two of the patches.
* Fixed a NULL pointer bug in vfio_iommufd_emulated_bind().
* Moved unmap() call to the common place in iommufd_access_set_ioas().
v2:
https://lore.kernel.org/all/cover.1675802050.git.nicolinc@nvidia.com/
* Rebased on top of vfio_device cdev v2 series.
* Update the kdoc and commit message of iommu_group_replace_domain().
* Dropped revert-to-core-domain part in iommu_group_replace_domain().
* Dropped !ops->dma_unmap check in vfio_iommufd_emulated_attach_ioas().
* Added missing rc value in vfio_iommufd_emulated_attach_ioas() from the
iommufd_access_set_ioas() call.
* Added a new patch in vfio_main to deny vfio_pin/unpin_pages() calls if
vdev->ops->dma_unmap is not implemented.
* Added a __iommmufd_device_detach helper and let the replace routine do
a partial detach().
* Added restriction on auto_domains to use the replace feature.
* Added the patch "iommufd/device: Make hwpt_list list_add/del symmetric"
from the has_group removal series.
v1:
https://lore.kernel.org/all/cover.1675320212.git.nicolinc@nvidia.com/
Hi all,
The existing IOMMU APIs provide a pair of functions: iommu_attach_group()
for callers to attach a device from the default_domain (NULL if not being
supported) to a given iommu domain, and iommu_detach_group() for callers
to detach a device from a given domain to the default_domain. Internally,
the detach_dev op is deprecated for the newer drivers with default_domain.
This means that those drivers likely can switch an attaching domain to
another one, without staging the device at a blocking or default domain,
for use cases such as:
1) vPASID mode, when a guest wants to replace a single pasid (PASID=0)
table with a larger table (PASID=N)
2) Nesting mode, when switching the attaching device from an S2 domain
to an S1 domain, or when switching between relevant S1 domains.
This series is rebased on top of Jason Gunthorpe's series that introduces
iommu_group_replace_domain API and IOMMUFD infrastructure for the IOMMUFD
"physical" devices. The IOMMUFD "emulated" devices will need some extra
steps to replace the access->ioas object and its iopt pointer.
You can also find this series on Github:
https://github.com/nicolinc/iommufd/commits/iommu_group_replace_domain-v9
Thank you
Nicolin Chen
Nicolin Chen (6):
vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages()
iommufd: Allow passing in iopt_access_list_id to iopt_remove_access()
iommufd: Add iommufd_access_change_ioas helper
iommufd: Add iommufd_access_replace() API
iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage
vfio: Support IO page table replacement
drivers/iommu/iommufd/device.c | 123 ++++++++++++------
drivers/iommu/iommufd/io_pagetable.c | 6 +-
drivers/iommu/iommufd/iommufd_private.h | 3 +-
drivers/iommu/iommufd/iommufd_test.h | 4 +
drivers/iommu/iommufd/selftest.c | 19 +++
drivers/vfio/iommufd.c | 11 +-
drivers/vfio/vfio_main.c | 4 +
include/linux/iommufd.h | 1 +
include/uapi/linux/vfio.h | 6 +
tools/testing/selftests/iommu/iommufd.c | 29 ++++-
tools/testing/selftests/iommu/iommufd_utils.h | 19 +++
11 files changed, 175 insertions(+), 50 deletions(-)
--
2.41.0
When we collect a signal context with one of the SME modes enabled we will
have enabled that mode behind the compiler and libc's back so they may
issue some instructions not valid in streaming mode, causing spurious
failures.
For the code prior to issuing the BRK to trigger signal handling we need to
stay in streaming mode if we were already there since that's a part of the
signal context the caller is trying to collect. Unfortunately this code
includes a memset() which is likely to be heavily optimised and is likely
to use FP instructions incompatible with streaming mode. We can avoid this
happening by open coding the memset(), inserting a volatile assembly
statement to avoid the compiler recognising what's being done and doing
something in optimisation. This code is not performance critical so the
inefficiency should not be an issue.
After collecting the context we can simply exit streaming mode, avoiding
these issues. Use a full SMSTOP for safety to prevent any issues appearing
with ZA.
Reported-by: Will Deacon <will(a)kernel.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v3:
- Open code OPTIMISER_HIDE_VAR() instead of the memory clobber.
- Link to v2: https://lore.kernel.org/r/20230712-arm64-signal-memcpy-fix-v2-1-494f7025caf…
Changes in v2:
- Rebase onto v6.5-rc1.
- Link to v1: https://lore.kernel.org/r/20230628-arm64-signal-memcpy-fix-v1-1-db3e0300829…
---
.../selftests/arm64/signal/test_signals_utils.h | 25 +++++++++++++++++++++-
1 file changed, 24 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.h b/tools/testing/selftests/arm64/signal/test_signals_utils.h
index 222093f51b67..c7f5627171dd 100644
--- a/tools/testing/selftests/arm64/signal/test_signals_utils.h
+++ b/tools/testing/selftests/arm64/signal/test_signals_utils.h
@@ -60,13 +60,25 @@ static __always_inline bool get_current_context(struct tdescr *td,
size_t dest_sz)
{
static volatile bool seen_already;
+ int i;
+ char *uc = (char *)dest_uc;
assert(td && dest_uc);
/* it's a genuine invocation..reinit */
seen_already = 0;
td->live_uc_valid = 0;
td->live_sz = dest_sz;
- memset(dest_uc, 0x00, td->live_sz);
+
+ /*
+ * This is a memset() but we don't want the compiler to
+ * optimise it into either instructions or a library call
+ * which might be incompatible with streaming mode.
+ */
+ for (i = 0; i < td->live_sz; i++) {
+ uc[i] = 0;
+ __asm__ ("" : "=r" (uc[i]) : "0" (uc[i]));
+ }
+
td->live_uc = dest_uc;
/*
* Grab ucontext_t triggering a SIGTRAP.
@@ -103,6 +115,17 @@ static __always_inline bool get_current_context(struct tdescr *td,
:
: "memory");
+ /*
+ * If we were grabbing a streaming mode context then we may
+ * have entered streaming mode behind the system's back and
+ * libc or compiler generated code might decide to do
+ * something invalid in streaming mode, or potentially even
+ * the state of ZA. Issue a SMSTOP to exit both now we have
+ * grabbed the state.
+ */
+ if (td->feats_supported & FEAT_SME)
+ asm volatile("msr S0_3_C4_C6_3, xzr");
+
/*
* If we get here with seen_already==1 it implies the td->live_uc
* context has been used to get back here....this probably means
---
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
change-id: 20230628-arm64-signal-memcpy-fix-7de3b3c8fa10
Best regards,
--
Mark Brown <broonie(a)kernel.org>
This series fixes an issue which David Spickett found where if we change
the SVE VL while SME is in use we can end up attempting to save state to
an unallocated buffer and adds testing coverage for that plus a bit more
coverage of VL changes, just for paranoia.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Always reallocate the SVE state.
- Rebase onto v6.5-rc2.
- Link to v1: https://lore.kernel.org/r/20230713-arm64-fix-sve-sme-vl-change-v1-0-129dd86…
---
Mark Brown (3):
arm64/fpsimd: Ensure SME storage is allocated after SVE VL changes
kselftest/arm64: Add a test case for SVE VL changes with SME active
kselftest/arm64: Validate that changing one VL type does not affect another
arch/arm64/kernel/fpsimd.c | 33 +++++--
tools/testing/selftests/arm64/fp/vec-syscfg.c | 127 +++++++++++++++++++++++++-
2 files changed, 148 insertions(+), 12 deletions(-)
---
base-commit: 06785562d1b99ff6dc1cd0af54be5e3ff999dc02
change-id: 20230713-arm64-fix-sve-sme-vl-change-60eb1fa6a707
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The openvswitch selftests currently contain a few cases for managing the
datapath, which includes creating datapath instances, adding interfaces,
and doing some basic feature / upcall tests. This is useful to validate
the control path.
Add the ability to program some of the more common flows with actions. This
can be improved over time to include regression testing, etc.
Aaron Conole (4):
selftests: openvswitch: add an initial flow programming case
selftests: openvswitch: add a test for ipv4 forwarding
selftests: openvswitch: add basic ct test case parsing
selftests: openvswitch: add ct-nat test case with ipv4
.../selftests/net/openvswitch/openvswitch.sh | 223 ++++++++
.../selftests/net/openvswitch/ovs-dpctl.py | 507 ++++++++++++++++++
2 files changed, 730 insertions(+)
--
2.40.1
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large number of data transfers have device memory as the source and/or
destination. Accelerators drastically increased the volume of such transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time; improving data transfer throughput & efficiency can help
improve GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs; much of this data does not require host processing.
Today, the majority of the Device-to-Device data transfers over the network are
implemented as the following low level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strains on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
With this proposal we're able to reach ~96.6% line rate speeds with data sent
and received directly from/to device memory.
* Patch overview:
** Part 1: struct paged device memory
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, the networking stack (skbs, drivers,
and page pool) operates on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to understand a new memory type.
This proposal implements option #1. We implement a small framework to generate
struct pages for an sg_table returned from dma_buf_map_attachment(). The support
added here should be generic and easily extended to other use cases interested
in struct paged device memory. We use this framework to generate pages that can
be used in the networking stack.
** Part 2: recvmsg() & sendmsg() APIs
We define user APIs for the user to send and receive these dmabuf pages.
** part 3: support for unreadable skb frags
Dmabuf pages are not accessible by the host; we implement changes throughout the
networking stack to correctly handle skbs with unreadable frags.
** part 4: page pool support
We piggy back on Jakub's page pool memory providers idea:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory TCP
changes from the driver.
This is not strictly necessary; the driver can choose to allocate dmabuf pages
and use them directly without going through the page pool (if acceptable to
their maintainers).
Not included with this RFC is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This RFC is built on top of v6.4-rc7 with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP requires the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using dmabuf pages
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that shows a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
Not included in this series is our devmem TCP benchmark, which
transfers data to/from GPU dmabufs directly.
With this implementation & benchmark we're able to reach ~96.6% line rate
speeds with 4 GPU/NIC pairs running bi-direction traffic, with all the
packet payloads going straight to the GPU memory (no host buffer bounce).
** Test Setup
Kernel: v6.4-rc7, with this RFC and Jakub's memory provider API
cherry-picked locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Benchmark: custom devmem TCP benchmark not yet open sourced.
Mina Almasry (10):
dma-buf: add support for paged attachment mappings
dma-buf: add support for NET_RX pages
dma-buf: add support for NET_TX pages
net: add support for skbs with unreadable frags
tcp: implement recvmsg() RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX pages
tcp: implement sendmsg() TX path for for devmem tcp
selftests: add ncdevmem, netcat for devmem TCP
memory-provider: updates core provider API for devmem TCP
memory-provider: add dmabuf devmem provider
drivers/dma-buf/dma-buf.c | 444 ++++++++++++++++
include/linux/dma-buf.h | 142 +++++
include/linux/netdevice.h | 1 +
include/linux/skbuff.h | 34 +-
include/linux/socket.h | 1 +
include/net/page_pool.h | 21 +
include/net/sock.h | 4 +
include/net/tcp.h | 6 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/dma-buf.h | 12 +
include/uapi/linux/uio.h | 10 +
net/core/datagram.c | 3 +
net/core/page_pool.c | 111 +++-
net/core/skbuff.c | 81 ++-
net/core/sock.c | 47 ++
net/ipv4/tcp.c | 262 +++++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 8 +
net/ipv4/tcp_output.c | 5 +-
net/packet/af_packet.c | 4 +-
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/ncdevmem.c | 693 +++++++++++++++++++++++++
23 files changed, 1868 insertions(+), 42 deletions(-)
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.41.0.390.g38632f3daf-goog
The Events Tracing infrastructure contains a lot of files and directories
(internally, inodes and dentries), and ends up consuming memory in the
MBs. We can have multiple events of Events Tracing, which requires even
more memory.
Instead of creating inodes/dentries up front, eventfs could keep only the
meta-data and skip their creation. As and when required, eventfs will
create the inodes/dentries only for the required files/directories.
eventfs would also delete the inodes/dentries once they are no longer
required, but preserve the meta-data.
Tracing events took ~9MB; with this approach they took ~4.5MB
for ~10K files/dirs.
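The idea of keeping only meta-data resident and creating the heavyweight objects on demand can be sketched in plain userspace C as below. This is only an analogy under stated assumptions, not the eventfs implementation: struct heavy stands in for an inode/dentry pair, and lazy_entry, lazy_get and lazy_put are hypothetical names with none of the locking the kernel code needs.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the heavyweight object (inode/dentry). */
struct heavy {
	char payload[256];
};

/* Lightweight meta-data that always stays resident. */
struct lazy_entry {
	const char *name;
	struct heavy *obj;	/* NULL until the entry is first looked up */
	unsigned int users;	/* number of outstanding lazy_get() calls */
};

/*
 * Look up an entry by name, creating the heavyweight object only when
 * it is actually needed. Returns NULL if the name is unknown or the
 * allocation fails.
 */
static struct heavy *lazy_get(struct lazy_entry *tbl, size_t n,
			      const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(tbl[i].name, name) != 0)
			continue;
		if (!tbl[i].obj) {
			tbl[i].obj = calloc(1, sizeof(*tbl[i].obj));
			if (!tbl[i].obj)
				return NULL;
		}
		tbl[i].users++;
		return tbl[i].obj;
	}
	return NULL;
}

/*
 * Drop one reference. When the last user goes away the heavyweight
 * object is freed, but the meta-data (name) is preserved so the entry
 * can be re-created later.
 */
static void lazy_put(struct lazy_entry *e)
{
	if (e->users && --e->users == 0) {
		free(e->obj);
		e->obj = NULL;
	}
}
```

With a table of ~10K entries, only the entries that are actually looked up pay for a struct heavy. That is the kind of saving the numbers above describe, although the real figures depend on the kernel's inode and dentry sizes.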
Diff from v4:
Patch 02: moved from v4 08/10
added fs/tracefs/internal.h
Patch 03: moved from v4 02/10
removed fs/tracefs/internal.h
Patch 04: moved from v4 03/10
moved out changes of fs/tracefs/internal.h
Patch 05: moved from v4 04/10
renamed eventfs_add_top_file() -> eventfs_add_events_file()
Patch 06: moved from v4 07/10
implemented create_dentry() helper function
added create_file(), create_dir() stub function
Patch 07: moved from v4 06/10
Patch 08: moved from v4 05/10
improved eventfs remove functionality
Patch 09: removed unwanted if conditions
Patch 10: added available_filter_functions check
Diff from v3:
Patch 3,4,5,7,9:
removed all the eventfs_rwsem code and replaced it with an srcu
lock for the readers, and a mutex to synchronize the writers of
the list.
Patch 2: moved 'tracefs_inode' and 'get_tracefs()' to v4 03/10
Patch 3: moved the struct eventfs_file and eventfs_inode into event_inode.c
as it really should not be exposed to all users.
Patch 5: added a recursion check to eventfs_remove_rec() as it is really
dangerous to have unchecked recursion in the kernel (we do have
a fixed size stack).
have the free use srcu callbacks. After the srcu grace periods
are done, it adds the eventfs_file onto a llist (lockless link
list) and wakes up a work queue. Then the work queue does the
freeing (this needs to be done in task/workqueue context, as
srcu callbacks are done in softirq context).
Patch 6: renamed:
eventfs_create_file() -> create_file()
eventfs_create_dir() -> create_dir()
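The recursion check mentioned for Patch 5 above can be sketched in plain C.
This is an illustration under assumptions, not the kernel code: struct node,
remove_rec() and the MAX_DEPTH value of 3 are hypothetical, and the limit
used by eventfs_remove_rec() is not stated in this change log.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH	3	/* hypothetical limit, chosen for the example */
#define MAX_CHILDREN	4

struct node {
	struct node *child[MAX_CHILDREN];
	int removed;
};

/*
 * Mark @n and everything below it as removed.
 * Returns the number of nodes marked, or -1 as soon as a node deeper
 * than MAX_DEPTH is reached, so that an unexpectedly deep tree cannot
 * exhaust a fixed-size stack. Nodes visited before the error stay marked.
 */
static int remove_rec(struct node *n, int level)
{
	int count = 1;

	if (!n)
		return 0;
	if (level > MAX_DEPTH)
		return -1;

	for (size_t i = 0; i < MAX_CHILDREN; i++) {
		int ret = remove_rec(n->child[i], level + 1);

		if (ret < 0)
			return ret;
		count += ret;
	}
	n->removed = 1;
	return count;
}
```

The caller starts with level 0, in the same way eventfs_remove() calls
eventfs_remove_rec(ef, 0) in the patches.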
Diff from v2:
Patch 01: new patch:'Require all trace events to have a TRACE_SYSTEM'
Patch 02: moved from v1 1/9
Patch 03: moved from v1 2/9
As suggested by Zheng Yejian, introduced eventfs_prepare_ef()
helper function to add files or directories to eventfs
fix WARNING reported by kernel test robot in v1 8/9
Patch 04: moved from v1 3/9
used eventfs_prepare_ef() to add files
fix WARNING reported by kernel test robot in v1 8/9
Patch 05: moved from v1 4/9
fix compiling warning reported by kernel test robot in v1 4/9
Patch 06: moved from v1 5/9
Patch 07: moved from v1 6/9
Patch 08: moved from v1 7/9
Patch 09: moved from v1 8/9
rebased because of v3 01/10
Patch 10: moved from v1 9/9
Diff from v1:
Patch 1: add header file
Patch 2: resolved kernel test robot issues
protecting eventfs lists using nested eventfs_rwsem
Patch 3: protecting eventfs lists using nested eventfs_rwsem
Patch 4: improve events cleanup code to fix crashes
Patch 5: resolved kernel test robot issues
removed d_instantiate_anon() calls
Patch 6: resolved kernel test robot issues
fix kprobe test in eventfs_root_lookup()
protecting eventfs lists using nested eventfs_rwsem
Patch 7: remove header file
Patch 8: pass eventfs_rwsem as argument to eventfs functions
called eventfs_remove_events_dir() instead of tracefs_remove()
from event_trace_del_tracer()
Patch 9: new patch to fix kprobe test case
fs/tracefs/Makefile | 1 +
fs/tracefs/event_inode.c | 795 ++++++++++++++++++
fs/tracefs/inode.c | 151 +++-
fs/tracefs/internal.h | 26 +
include/linux/trace_events.h | 1 +
include/linux/tracefs.h | 30 +
kernel/trace/trace.h | 2 +-
kernel/trace/trace_events.c | 76 +-
.../ftrace/test.d/kprobe/kprobe_args_char.tc | 9 +-
.../test.d/kprobe/kprobe_args_string.tc | 9 +-
10 files changed, 1048 insertions(+), 52 deletions(-)
create mode 100644 fs/tracefs/event_inode.c
create mode 100644 fs/tracefs/internal.h
--
2.39.0
Hello,
This is v4 of the patch series for TDX selftests.
It has been updated for Intel’s v14 of the TDX host patches which was
proposed here:
https://lore.kernel.org/lkml/cover.1685333727.git.isaku.yamahata@intel.com/
The tree can be found at:
https://github.com/googleprodkernel/linux-cc/tree/tdx-selftests-rfc-v4
Changes from RFC v3:
In v14, TDX can only run with UPM enabled so the necessary changes were
made to handle that.
td_vcpu_run() was added to handle TdVmCalls that are now handled in
userspace.
The comments under the patch "KVM: selftests: Require GCC to realign
stacks on function entry" were addressed with the following patch:
https://lore.kernel.org/lkml/Y%2FfHLdvKHlK6D%2F1v@google.com/T/
And other minor tweaks were made to integrate the selftest
infrastructure onto v14.
In RFCv4, TDX selftest code is organized into:
+ headers in tools/testing/selftests/kvm/include/x86_64/tdx/
+ common code in tools/testing/selftests/kvm/lib/x86_64/tdx/
+ selftests in tools/testing/selftests/kvm/x86_64/tdx_*
Dependencies
+ Peter’s patches, which provide functions for the host to allocate
and track protected memory in the
guest. https://lore.kernel.org/lkml/20221018205845.770121-1-pgonda@google.com/T/
Further work for this patch series/TODOs
+ Sean’s comments for the non-confidential UPM selftests patch series
at https://lore.kernel.org/lkml/Y8dC8WDwEmYixJqt@google.com/T/#u apply
here as well
+ Add ucall support for TDX selftests
I would also like to acknowledge the following people, who helped
review or test patches in RFCv1, RFCv2, and RFCv3:
+ Sean Christopherson <seanjc(a)google.com>
+ Zhenzhong Duan <zhenzhong.duan(a)intel.com>
+ Peter Gonda <pgonda(a)google.com>
+ Andrew Jones <drjones(a)redhat.com>
+ Maxim Levitsky <mlevitsk(a)redhat.com>
+ Xiaoyao Li <xiaoyao.li(a)intel.com>
+ David Matlack <dmatlack(a)google.com>
+ Marc Orr <marcorr(a)google.com>
+ Isaku Yamahata <isaku.yamahata(a)gmail.com>
+ Maciej S. Szmigiero <maciej.szmigiero(a)oracle.com>
Links to earlier patch series
+ RFC v1: https://lore.kernel.org/lkml/20210726183816.1343022-1-erdemaktas@google.com…
+ RFC v2: https://lore.kernel.org/lkml/20220830222000.709028-1-sagis@google.com/T/#u
+ RFC v3: https://lore.kernel.org/lkml/20230121001542.2472357-1-ackerleytng@google.co…
Ackerley Tng (12):
KVM: selftests: Add function to allow one-to-one GVA to GPA mappings
KVM: selftests: Expose function that sets up sregs based on VM's mode
KVM: selftests: Store initial stack address in struct kvm_vcpu
KVM: selftests: Refactor steps in vCPU descriptor table initialization
KVM: selftests: TDX: Use KVM_TDX_CAPABILITIES to validate TDs'
attribute configuration
KVM: selftests: TDX: Update load_td_memory_region for VM memory backed
by guest memfd
KVM: selftests: Add functions to allow mapping as shared
KVM: selftests: Expose _vm_vaddr_alloc
KVM: selftests: TDX: Add support for TDG.MEM.PAGE.ACCEPT
KVM: selftests: TDX: Add support for TDG.VP.VEINFO.GET
KVM: selftests: TDX: Add TDX UPM selftest
KVM: selftests: TDX: Add TDX UPM selftests for implicit conversion
Erdem Aktas (3):
KVM: selftests: Add helper functions to create TDX VMs
KVM: selftests: TDX: Add TDX lifecycle test
KVM: selftests: TDX: Adding test case for TDX port IO
Roger Wang (1):
KVM: selftests: TDX: Add TDG.VP.INFO test
Ryan Afranji (2):
KVM: selftests: TDX: Verify the behavior when host consumes a TD
private memory
KVM: selftests: TDX: Add shared memory test
Sagi Shahar (10):
KVM: selftests: TDX: Add report_fatal_error test
KVM: selftests: TDX: Add basic TDX CPUID test
KVM: selftests: TDX: Add basic get_td_vmcall_info test
KVM: selftests: TDX: Add TDX IO writes test
KVM: selftests: TDX: Add TDX IO reads test
KVM: selftests: TDX: Add TDX MSR read/write tests
KVM: selftests: TDX: Add TDX HLT exit test
KVM: selftests: TDX: Add TDX MMIO reads test
KVM: selftests: TDX: Add TDX MMIO writes test
KVM: selftests: TDX: Add TDX CPUID TDVMCALL test
tools/testing/selftests/kvm/Makefile | 8 +
.../selftests/kvm/include/kvm_util_base.h | 35 +
.../selftests/kvm/include/x86_64/processor.h | 4 +
.../kvm/include/x86_64/tdx/td_boot.h | 82 +
.../kvm/include/x86_64/tdx/td_boot_asm.h | 16 +
.../selftests/kvm/include/x86_64/tdx/tdcall.h | 59 +
.../selftests/kvm/include/x86_64/tdx/tdx.h | 65 +
.../kvm/include/x86_64/tdx/tdx_util.h | 19 +
.../kvm/include/x86_64/tdx/test_util.h | 164 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 115 +-
.../selftests/kvm/lib/x86_64/processor.c | 77 +-
.../selftests/kvm/lib/x86_64/tdx/td_boot.S | 101 ++
.../selftests/kvm/lib/x86_64/tdx/tdcall.S | 158 ++
.../selftests/kvm/lib/x86_64/tdx/tdx.c | 262 ++++
.../selftests/kvm/lib/x86_64/tdx/tdx_util.c | 565 +++++++
.../selftests/kvm/lib/x86_64/tdx/test_util.c | 101 ++
.../kvm/x86_64/tdx_shared_mem_test.c | 134 ++
.../selftests/kvm/x86_64/tdx_upm_test.c | 469 ++++++
.../selftests/kvm/x86_64/tdx_vm_tests.c | 1322 +++++++++++++++++
19 files changed, 3730 insertions(+), 26 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/td_boot.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/td_boot_asm.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/tdcall.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/tdx.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/tdx_util.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/test_util.h
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/td_boot.S
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/tdcall.S
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/tdx.c
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/tdx_util.c
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/test_util.c
create mode 100644 tools/testing/selftests/kvm/x86_64/tdx_shared_mem_test.c
create mode 100644 tools/testing/selftests/kvm/x86_64/tdx_upm_test.c
create mode 100644 tools/testing/selftests/kvm/x86_64/tdx_vm_tests.c
--
2.41.0.487.g6d72f3e995-goog
[ Resending because claws-mail is messing with the Cc again. It doesn't like quotes :-p ]
On Fri, 21 Jul 2023 08:48:39 -0400
Steven Rostedt <rostedt(a)goodmis.org> wrote:
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 4db048250cdb..2718de1533e6 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -36,16 +36,36 @@ struct eventfs_file {
> const struct file_operations *fop;
> const struct inode_operations *iop;
> union {
> + struct list_head del_list;
> struct rcu_head rcu;
> - struct llist_node llist; /* For freeing after RCU */
> + unsigned long is_freed; /* Freed if one of the above is set */
I changed the freeing around. The dentries are freed before returning from
eventfs_remove_dir().
I also added a "is_freed" field that is part of the union and is set if
list elements have content. Note, since the union was criticized before, I
will state the entire purpose of doing this patch set is to save memory.
This structure will be used for every event file. What's the point of
getting rid of dentries if we are replacing it with something just as big?
Anyway, struct dentry does the exact same thing!
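As a stand-alone illustration of the union trick (a sketch only, with
hypothetical names struct list_node, struct entry and queue_for_free(), and
assuming unsigned long is as wide as a pointer, as on Linux): the flag costs
no extra memory because it aliases the first list pointer, which is non-NULL
once the entry has been queued.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption of this sketch: a pointer fits exactly in an unsigned long. */
_Static_assert(sizeof(unsigned long) == sizeof(void *),
	       "sketch assumes pointer-sized unsigned long");

/* Minimal doubly linked list node, modelled on the kernel's list_head. */
struct list_node {
	struct list_node *next, *prev;
};

struct entry {
	const char *name;
	union {
		struct list_node del_list;	/* linked while waiting to be freed */
		unsigned long is_freed;		/* aliases del_list.next */
	};
};

/* Append @e to the circular list headed by @head. */
static void queue_for_free(struct entry *e, struct list_node *head)
{
	e->del_list.next = head;
	e->del_list.prev = head->prev;
	head->prev->next = &e->del_list;
	head->prev = &e->del_list;
}
```

An entry must start with is_freed == 0; after queue_for_free() the same
word reads as non-zero without any separate flag being written.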
> };
> void *data;
> umode_t mode;
> - bool created;
> + unsigned int flags;
Bah, I forgot to remove flags (one iteration replaced the created with
flags to set both created and freed). I removed the freed with the above
"is_freed" and noticed that created is set if and only if ef->dentry is
set. So instead of using the created boolean, just test ef->dentry.
The flags isn't used and can be removed. I just forgot to do so.
> };
>
> static DEFINE_MUTEX(eventfs_mutex);
> DEFINE_STATIC_SRCU(eventfs_srcu);
> +
> +static struct dentry *eventfs_root_lookup(struct inode *dir,
> + struct dentry *dentry,
> + unsigned int flags);
> +static int dcache_dir_open_wrapper(struct inode *inode, struct file *file);
> +static int eventfs_release(struct inode *inode, struct file *file);
> +
> +static const struct inode_operations eventfs_root_dir_inode_operations = {
> + .lookup = eventfs_root_lookup,
> +};
> +
> +static const struct file_operations eventfs_file_operations = {
> + .open = dcache_dir_open_wrapper,
> + .read = generic_read_dir,
> + .iterate_shared = dcache_readdir,
> + .llseek = generic_file_llseek,
> + .release = eventfs_release,
> +};
> +
In preparing for getting rid of eventfs_file, I noticed that all
directories are set to the above ops. In create_dir() instead of passing in
ef->*ops, just use these directly. This does help with future work.
> /**
> * create_file - create a file in the tracefs filesystem
> * @name: the name of the file to create.
> @@ -123,17 +143,12 @@ static struct dentry *create_file(const char *name, umode_t mode,
> * If tracefs is not enabled in the kernel, the value -%ENODEV will be
> * returned.
> */
> -static struct dentry *create_dir(const char *name, umode_t mode,
> - struct dentry *parent, void *data,
> - const struct file_operations *fop,
> - const struct inode_operations *iop)
> +static struct dentry *create_dir(const char *name, struct dentry *parent, void *data)
> {
As stated, the directories always used the same *op values, so I just hard
coded it.
> struct tracefs_inode *ti;
> struct dentry *dentry;
> struct inode *inode;
>
> - WARN_ON(!S_ISDIR(mode));
> -
> dentry = eventfs_start_creating(name, parent);
> if (IS_ERR(dentry))
> return dentry;
> @@ -142,9 +157,9 @@ static struct dentry *create_dir(const char *name, umode_t mode,
> if (unlikely(!inode))
> return eventfs_failed_creating(dentry);
>
> - inode->i_mode = mode;
> - inode->i_op = iop;
> - inode->i_fop = fop;
> + inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
> + inode->i_op = &eventfs_root_dir_inode_operations;
> + inode->i_fop = &eventfs_file_operations;
> inode->i_private = data;
>
> ti = get_tracefs(inode);
> @@ -169,15 +184,27 @@ void eventfs_set_ef_status_free(struct dentry *dentry)
> struct tracefs_inode *ti_parent;
> struct eventfs_file *ef;
>
> + mutex_lock(&eventfs_mutex);
To synchronize with the removals, I needed to add locking here.
> ti_parent = get_tracefs(dentry->d_parent->d_inode);
> if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE))
> - return;
> + goto out;
>
> ef = dentry->d_fsdata;
> if (!ef)
> - return;
> - ef->created = false;
> + goto out;
> + /*
> + * If ef was freed, then the LSB bit is set for d_fsdata.
> + * But this should not happen, as it should still have a
> + * ref count that prevents it. Warn in case it does.
> + */
> + if (WARN_ON_ONCE((unsigned long)ef & 1))
> + goto out;
During the remove, a dget() is done to keep the dentry from freeing. To
make sure that it doesn't get freed, I added this test.
> +
> + dentry->d_fsdata = NULL;
> +
> ef->dentry = NULL;
> + out:
> + mutex_unlock(&eventfs_mutex);
> }
>
> /**
> @@ -202,6 +229,79 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
> ti->private = ef->ei;
> }
>
> +static struct dentry *
> +create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup)
> +{
Because both the lookup and the dir_open_wrapper did basically the same
thing, I created a helper function so that I didn't have to update both
locations.
> + bool invalidate = false;
> + struct dentry *dentry;
> +
> + mutex_lock(&eventfs_mutex);
> + if (ef->is_freed) {
> + mutex_unlock(&eventfs_mutex);
> + return NULL;
> + }
Ignore if the ef is on its way to be freed.
> + if (ef->dentry) {
> + dentry = ef->dentry;
If the ef already has a dentry (created) then use it.
> + /* On dir open, up the ref count */
> + if (!lookup)
> + dget(dentry);
> + mutex_unlock(&eventfs_mutex);
> + return dentry;
> + }
> + mutex_unlock(&eventfs_mutex);
> +
> + if (!lookup)
> + inode_lock(parent->d_inode);
> +
> + if (ef->ei)
> + dentry = create_dir(ef->name, parent, ef->data);
> + else
> + dentry = create_file(ef->name, ef->mode, parent,
> + ef->data, ef->fop);
> +
> + if (!lookup)
> + inode_unlock(parent->d_inode);
> +
> + mutex_lock(&eventfs_mutex);
> + if (IS_ERR_OR_NULL(dentry)) {
With the lock dropped, the dentry could have been created causing it to
fail. Check if the ef->dentry exists, and if so, use it instead.
Note, if the ef is freed, it should not have a dentry.
> + /* If the ef was already updated get it */
> + dentry = ef->dentry;
> + if (dentry && !lookup)
> + dget(dentry);
> + mutex_unlock(&eventfs_mutex);
> + return dentry;
> + }
> +
> + if (!ef->dentry && !ef->is_freed) {
With the lock dropped, the dentry could have been filled too. If so, drop
the created dentry and use the one owned by the ef->dentry.
> + ef->dentry = dentry;
> + if (ef->ei)
> + eventfs_post_create_dir(ef);
> + dentry->d_fsdata = ef;
> + } else {
> + /* A race here, should try again (unless freed) */
> + invalidate = true;
I had a WARN_ON() once here. Probably could add a:
WARN_ON_ONCE(!ef->is_freed);
> + }
> + mutex_unlock(&eventfs_mutex);
> + if (invalidate)
> + d_invalidate(dentry);
> +
> + if (lookup || invalidate)
> + dput(dentry);
> +
> + return invalidate ? NULL : dentry;
> +}
> +
> +static bool match_event_file(struct eventfs_file *ef, const char *name)
> +{
A bit of a paranoid helper function. I wanted to make sure to synchronize
with the removals.
> + bool ret;
> +
> + mutex_lock(&eventfs_mutex);
> + ret = !ef->is_freed && strcmp(ef->name, name) == 0;
> + mutex_unlock(&eventfs_mutex);
> +
> + return ret;
> +}
> +
> /**
> * eventfs_root_lookup - lookup routine to create file/dir
> * @dir: directory in which lookup to be done
> @@ -211,7 +311,6 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
> * Used to create dynamic file/dir with-in @dir, search with-in ei
> * list, if @dentry found go ahead and create the file/dir
> */
> -
> static struct dentry *eventfs_root_lookup(struct inode *dir,
> struct dentry *dentry,
> unsigned int flags)
> @@ -230,30 +329,10 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
> idx = srcu_read_lock(&eventfs_srcu);
> list_for_each_entry_srcu(ef, &ei->e_top_files, list,
> srcu_read_lock_held(&eventfs_srcu)) {
> - if (strcmp(ef->name, dentry->d_name.name))
> + if (!match_event_file(ef, dentry->d_name.name))
> continue;
> ret = simple_lookup(dir, dentry, flags);
> - if (ef->created)
> - continue;
> - mutex_lock(&eventfs_mutex);
> - ef->created = true;
> - if (ef->ei)
> - ef->dentry = create_dir(ef->name, ef->mode, ef->d_parent,
> - ef->data, ef->fop, ef->iop);
> - else
> - ef->dentry = create_file(ef->name, ef->mode, ef->d_parent,
> - ef->data, ef->fop);
> -
> - if (IS_ERR_OR_NULL(ef->dentry)) {
> - ef->created = false;
> - mutex_unlock(&eventfs_mutex);
> - } else {
> - if (ef->ei)
> - eventfs_post_create_dir(ef);
> - ef->dentry->d_fsdata = ef;
> - mutex_unlock(&eventfs_mutex);
> - dput(ef->dentry);
> - }
> + create_dentry(ef, ef->d_parent, true);
> break;
> }
> srcu_read_unlock(&eventfs_srcu, idx);
> @@ -270,6 +349,7 @@ static int eventfs_release(struct inode *inode, struct file *file)
> struct tracefs_inode *ti;
> struct eventfs_inode *ei;
> struct eventfs_file *ef;
> + struct dentry *dentry;
> int idx;
>
> ti = get_tracefs(inode);
> @@ -280,8 +360,11 @@ static int eventfs_release(struct inode *inode, struct file *file)
> idx = srcu_read_lock(&eventfs_srcu);
> list_for_each_entry_srcu(ef, &ei->e_top_files, list,
> srcu_read_lock_held(&eventfs_srcu)) {
> - if (ef->created)
> - dput(ef->dentry);
> + mutex_lock(&eventfs_mutex);
> + dentry = ef->dentry;
> + mutex_unlock(&eventfs_mutex);
> + if (dentry)
> + dput(dentry);
> }
> srcu_read_unlock(&eventfs_srcu, idx);
> return dcache_dir_close(inode, file);
> @@ -312,47 +395,12 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file)
> ei = ti->private;
> idx = srcu_read_lock(&eventfs_srcu);
> list_for_each_entry_rcu(ef, &ei->e_top_files, list) {
> - if (ef->created) {
> - dget(ef->dentry);
> - continue;
> - }
> - mutex_lock(&eventfs_mutex);
> - ef->created = true;
> -
> - inode_lock(dentry->d_inode);
> - if (ef->ei)
> - ef->dentry = create_dir(ef->name, ef->mode, dentry,
> - ef->data, ef->fop, ef->iop);
> - else
> - ef->dentry = create_file(ef->name, ef->mode, dentry,
> - ef->data, ef->fop);
> - inode_unlock(dentry->d_inode);
> -
> - if (IS_ERR_OR_NULL(ef->dentry)) {
> - ef->created = false;
> - } else {
> - if (ef->ei)
> - eventfs_post_create_dir(ef);
> - ef->dentry->d_fsdata = ef;
> - }
> - mutex_unlock(&eventfs_mutex);
> + create_dentry(ef, dentry, false);
> }
> srcu_read_unlock(&eventfs_srcu, idx);
> return dcache_dir_open(inode, file);
> }
>
> -static const struct file_operations eventfs_file_operations = {
> - .open = dcache_dir_open_wrapper,
> - .read = generic_read_dir,
> - .iterate_shared = dcache_readdir,
> - .llseek = generic_file_llseek,
> - .release = eventfs_release,
> -};
> -
> -static const struct inode_operations eventfs_root_dir_inode_operations = {
> - .lookup = eventfs_root_lookup,
> -};
> -
> /**
> * eventfs_prepare_ef - helper function to prepare eventfs_file
> * @name: the name of the file/directory to create.
> @@ -470,11 +518,7 @@ struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
> ti_parent = get_tracefs(parent->d_inode);
> ei_parent = ti_parent->private;
>
> - ef = eventfs_prepare_ef(name,
> - S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
> - &eventfs_file_operations,
> - &eventfs_root_dir_inode_operations, NULL);
> -
> + ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);
For directories, just use the hard coded values.
> if (IS_ERR(ef))
> return ef;
>
> @@ -502,11 +546,7 @@ struct eventfs_file *eventfs_add_dir(const char *name,
> if (!ef_parent)
> return ERR_PTR(-EINVAL);
>
> - ef = eventfs_prepare_ef(name,
> - S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
> - &eventfs_file_operations,
> - &eventfs_root_dir_inode_operations, NULL);
> -
> + ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);
ditto.
> if (IS_ERR(ef))
> return ef;
>
> @@ -601,37 +641,15 @@ int eventfs_add_file(const char *name, umode_t mode,
> return 0;
> }
>
> -static LLIST_HEAD(free_list);
> -
> -static void eventfs_workfn(struct work_struct *work)
> -{
> - struct eventfs_file *ef, *tmp;
> - struct llist_node *llnode;
> -
> - llnode = llist_del_all(&free_list);
> - llist_for_each_entry_safe(ef, tmp, llnode, llist) {
> - if (ef->created && ef->dentry)
> - dput(ef->dentry);
> - kfree(ef->name);
> - kfree(ef->ei);
> - kfree(ef);
> - }
> -}
> -
> -DECLARE_WORK(eventfs_work, eventfs_workfn);
> -
> static void free_ef(struct rcu_head *head)
> {
> struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu);
>
> - if (!llist_add(&ef->llist, &free_list))
> - return;
> -
> - queue_work(system_unbound_wq, &eventfs_work);
> + kfree(ef->name);
> + kfree(ef->ei);
> + kfree(ef);
Since I did not do the dput() or d_invalidate() here I don't need to call
this from task context. This simplifies the process.
> }
>
> -
> -
> /**
> * eventfs_remove_rec - remove eventfs dir or file from list
> * @ef: eventfs_file to be removed.
> @@ -639,7 +657,7 @@ static void free_ef(struct rcu_head *head)
> * This function recursively remove eventfs_file which
> * contains info of file or dir.
> */
> -static void eventfs_remove_rec(struct eventfs_file *ef, int level)
> +static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head, int level)
> {
> struct eventfs_file *ef_child;
>
> @@ -659,15 +677,12 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level)
> /* search for nested folders or files */
> list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
> lockdep_is_held(&eventfs_mutex)) {
> - eventfs_remove_rec(ef_child, level + 1);
> + eventfs_remove_rec(ef_child, head, level + 1);
> }
> }
>
> - if (ef->created && ef->dentry)
> - d_invalidate(ef->dentry);
> -
> list_del_rcu(&ef->list);
> - call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
> + list_add_tail(&ef->del_list, head);
Hold off on freeing the ef. Add it to a link list to do so later.
> }
>
> /**
> @@ -678,12 +693,62 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level)
> */
> void eventfs_remove(struct eventfs_file *ef)
> {
> + struct eventfs_file *tmp;
> + LIST_HEAD(ef_del_list);
> + struct dentry *dentry_list = NULL;
> + struct dentry *dentry;
> +
> if (!ef)
> return;
>
> mutex_lock(&eventfs_mutex);
> - eventfs_remove_rec(ef, 0);
> + eventfs_remove_rec(ef, &ef_del_list, 0);
The above returns back with ef_del_list holding all the ef's to be freed.
I probably could have just passed the dentry_list down instead, but I
wanted the below complexity done in a non recursive function.
> +
> + list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) {
> + if (ef->dentry) {
> + unsigned long ptr = (unsigned long)dentry_list;
> +
> + /* Keep the dentry from being freed yet */
> + dget(ef->dentry);
> +
> + /*
> + * Paranoid: The dget() above should prevent the dentry
> + * from being freed and calling eventfs_set_ef_status_free().
> + * But just in case, set the link list LSB pointer to 1
> + * and have eventfs_set_ef_status_free() check that to
> + * make sure that if it does happen, it will not think
> + * the d_fsdata is an event_file.
> + *
> + * For this to work, no event_file should be allocated
> + * on a odd space, as the ef should always be allocated
> + * to be at least word aligned. Check for that too.
> + */
> + WARN_ON_ONCE(ptr & 1);
> +
> + ef->dentry->d_fsdata = (void *)(ptr | 1);
Set the d_fsdata to be a link list. The comment above needs to say that
struct eventfs_file and struct dentry should be word aligned. Anyway, while
the eventfs_mutex is held, set all the dentries belonging to eventfs_files
to the dentry_list and clear the ef->dentry.
> + dentry_list = ef->dentry;
> + ef->dentry = NULL;
> + }
> + call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
> + }
> mutex_unlock(&eventfs_mutex);
> +
> + while (dentry_list) {
> + unsigned long ptr;
> +
> + dentry = dentry_list;
> + ptr = (unsigned long)dentry->d_fsdata & ~1UL;
> + dentry_list = (struct dentry *)ptr;
> + dentry->d_fsdata = NULL;
With the mutex released, it is safe to free the dentries here. This also
must be done before returning from this function, as when I had it done in
the workqueue, it was failing some tests that would remove a dynamic event
and still see that the directory was still around!
> + d_invalidate(dentry);
> + mutex_lock(&eventfs_mutex);
> + /* dentry should now have at least a single reference */
> + WARN_ONCE((int)d_count(dentry) < 1,
> + "dentry %px less than one reference (%d) after invalidate\n",
I did update the above to:
WARN_ONCE((int)d_count(dentry) < 1,
"dentry %px (%s) less than one reference (%d) after invalidate\n",
dentry, dentry->d_name.name, d_count(dentry));
To include the name of the dentry (my current work is triggering this still).
> + dentry, d_count(dentry));
> + mutex_unlock(&eventfs_mutex);
> + dput(dentry);
> + }
> }
>
> /**
> diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
> index c443a0c32a8c..1b880b5cd29d 100644
> --- a/fs/tracefs/internal.h
> +++ b/fs/tracefs/internal.h
> @@ -22,4 +22,6 @@ struct dentry *tracefs_end_creating(struct dentry *dentry);
> struct dentry *tracefs_failed_creating(struct dentry *dentry);
> struct inode *tracefs_get_inode(struct super_block *sb);
>
> +void eventfs_set_ef_status_free(struct dentry *dentry);
> +
> #endif /* _TRACEFS_INTERNAL_H */
> diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
> index 4d30b0cafc5f..47c1b4d21735 100644
> --- a/include/linux/tracefs.h
> +++ b/include/linux/tracefs.h
> @@ -51,8 +51,6 @@ void eventfs_remove(struct eventfs_file *ef);
>
> void eventfs_remove_events_dir(struct dentry *dentry);
>
> -void eventfs_set_ef_status_free(struct dentry *dentry);
> -
Oh, and eventfs_set_ef_status_free() should not be exported to outside the
tracefs system.
-- Steve
> struct dentry *tracefs_create_file(const char *name, umode_t mode,
> struct dentry *parent, void *data,
> const struct file_operations *fops);
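The LSB tagging of d_fsdata used in eventfs_remove() in the patch quoted
above can be looked at in isolation. What follows is a user-space sketch
under assumptions, not the kernel code: struct obj stands in for struct
dentry, and chain_push(), is_chain_link() and chain_drain() are hypothetical
helpers. It relies on objects being at least 2-byte aligned so that bit 0 of
a valid pointer is always clear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct dentry; only the private pointer matters here. */
struct obj {
	void *fsdata;
	int invalidated;
};

/* Push @o onto the chain at *@list, marking the link with bit 0. */
static void chain_push(struct obj **list, struct obj *o)
{
	uintptr_t ptr = (uintptr_t)*list;

	assert((ptr & 1) == 0);	/* objects must be at least 2-byte aligned */
	o->fsdata = (void *)(ptr | 1);
	*list = o;
}

/* A set low bit means fsdata is a chain link, not owner data. */
static int is_chain_link(const struct obj *o)
{
	return ((uintptr_t)o->fsdata & 1) != 0;
}

/* Pop every object, strip the tag and mark it. Returns the number popped. */
static int chain_drain(struct obj *list)
{
	int n = 0;

	while (list) {
		struct obj *o = list;

		list = (struct obj *)((uintptr_t)o->fsdata & ~(uintptr_t)1);
		o->fsdata = NULL;
		o->invalidated = 1;
		n++;
	}
	return n;
}
```

The tag is what lets a callback such as eventfs_set_ef_status_free() tell a
chain link apart from a real eventfs_file pointer, even for the last element
whose link is otherwise NULL.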
Hi, Willy, Thomas
Your suggestions on the v1 nolibc powerpc patchset [1] have been applied;
here is v2.
Testing results:
- run with tinyconfig
arch/board      | result
----------------|------------
ppc/g3beige     | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
ppc/ppce500     | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
ppc64le/pseries | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
ppc64le/powernv | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
ppc64/pseries   | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
ppc64/powernv   | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
- run-user
(Tested with -Os, -O0 and -O2)
// for 32-bit PowerPC
$ for arch in powerpc ppc; do make run-user ARCH=$arch CROSS_COMPILE=powerpc-linux-gnu- ; done | grep status
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
// for 64-bit big-endian PowerPC and 64-bit little-endian PowerPC
$ for arch in ppc64 ppc64le; do make run-user ARCH=$arch CROSS_COMPILE=powerpc64le-linux-gnu- ; done | grep status
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
Changes from v1 --> v2:
* tools/nolibc: add support for powerpc
Add missing arch-powerpc.h lines to arch.h
Align with the other arch-<ARCH>.h, naming the variables
with more meaningful words, such as _ret, _num, _arg1 ...
Clean up the syscall instructions
No line from musl now.
Suggestions from Thomas
* tools/nolibc: add support for powerpc64
No change
* selftests/nolibc: add extra configs customize support
To reduce complexity, merge the commands from the new extraconfig
target into the defconfig target and drop the extraconfig target completely.
Derived from Willy's suggestion of the tinyconfig patchset
* selftests/nolibc: add XARCH and ARCH mapping support
To reduce complexity, let's use XARCH internally and only reserve
ARCH as the input variable.
Derived from Willy's suggestion
* selftests/nolibc: add test support for powerpc
Add ppc as the default 32-bit variant for the powerpc target; allow passing
ARCH=ppc or ARCH=powerpc to test 32-bit powerpc
Derived from Willy's suggestion
* selftests/nolibc: add test support for ppc64le
Rename powerpc64le to ppc64le
Suggestion from Willy
* selftests/nolibc: add test support for ppc64
Rename powerpc64 to ppc64
Suggestion from Willy
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/lkml/cover.1689713175.git.falcon@tinylab.org/
Zhangjin Wu (7):
tools/nolibc: add support for powerpc
tools/nolibc: add support for powerpc64
selftests/nolibc: add extra configs customize support
selftests/nolibc: add XARCH and ARCH mapping support
selftests/nolibc: add test support for ppc
selftests/nolibc: add test support for ppc64le
selftests/nolibc: add test support for ppc64
tools/include/nolibc/arch-powerpc.h | 202 ++++++++++++++++++++++++
tools/include/nolibc/arch.h | 2 +
tools/testing/selftests/nolibc/Makefile | 48 +++++-
3 files changed, 244 insertions(+), 8 deletions(-)
create mode 100644 tools/include/nolibc/arch-powerpc.h
--
2.25.1
Make sv48 the default address space for mmap as some applications
currently depend on this assumption. Users can now select a
desired address space using a non-zero hint address to mmap. Previously,
requesting the default address space from mmap by passing zero as the hint
address would result in using the largest address space possible. Some
applications depend on empty bits in the virtual address space, like Go and
Java, so this patch provides more flexibility for application developers.
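From user space, the only knob involved is the first argument of mmap(). The
sketch below is an assumption-laden illustration, not part of the series:
map_with_hint() and addr_bits() are hypothetical helpers, and how a given
hint maps to sv39/sv48/sv57 is defined by the patches and their
documentation, not by this example. It only shows how a program passes a
non-zero hint and inspects how many address bits the result uses.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Map @len bytes of anonymous memory, passing @hint as the address hint.
 * Without MAP_FIXED the kernel is free to place the mapping elsewhere.
 * Returns MAP_FAILED on error.
 */
static void *map_with_hint(uintptr_t hint, size_t len)
{
	return mmap((void *)hint, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* Number of significant bits in the address @p. */
static unsigned int addr_bits(const void *p)
{
	uintptr_t v = (uintptr_t)p;
	unsigned int bits = 0;

	while (v) {
		bits++;
		v >>= 1;
	}
	return bits;
}
```

An application that stores tags in the upper pointer bits could call
map_with_hint() with a hint inside the range it can tolerate and then check
addr_bits() on the result before relying on the spare bits.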
-Charlie
---
v6:
- Rebase onto the correct base
v5:
- Minor wording change in documentation
- Change some parentheses in arch_get_mmap_ macros
- Added case for addr==0 in arch_get_mmap_ because without this, programs would
crash if RLIMIT_STACK was modified before executing the program. This was
tested using the libhugetlbfs tests.
v4:
- Split testcases/document patch into test cases, in-code documentation, and
formal documentation patches
- Modified the mmap_base macro to be more legible and better represent memory
layout
- Fixed documentation to better reflect the implementation
- Renamed DEFAULT_VA_BITS to MMAP_VA_BITS
- Added additional test case for rlimit changes
---
Charlie Jenkins (4):
RISC-V: mm: Restrict address space for sv39,sv48,sv57
RISC-V: mm: Add tests for RISC-V mm
RISC-V: mm: Update pgtable comment documentation
RISC-V: mm: Document mmap changes
Documentation/riscv/vm-layout.rst | 22 +++
arch/riscv/include/asm/elf.h | 2 +-
arch/riscv/include/asm/pgtable.h | 20 ++-
arch/riscv/include/asm/processor.h | 46 +++++-
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/mm/.gitignore | 1 +
tools/testing/selftests/riscv/mm/Makefile | 21 +++
.../selftests/riscv/mm/testcases/mmap.c | 133 ++++++++++++++++++
8 files changed, 234 insertions(+), 13 deletions(-)
create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
create mode 100644 tools/testing/selftests/riscv/mm/Makefile
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap.c
--
2.41.0
This is the basic functionality for iommufd to support
iommufd_device_replace() and IOMMU_HWPT_ALLOC for physical devices.
iommufd_device_replace() allows changing the HWPT associated with the
device to a new IOAS or HWPT. Replace does this in a way that failure leaves
things unchanged, and utilizes the iommu iommu_group_replace_domain() API
to allow the iommu driver to perform an optional non-disruptive change.
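The "failure leaves things unchanged" property can be shown with a small
user-space model. This is only a sketch of the ordering, with hypothetical
types and functions (struct domain, struct device, domain_attach(),
device_replace()); it does not use or describe the iommufd or iommu core
APIs.

```c
#include <assert.h>
#include <stddef.h>

struct domain {
	int id;
	int accepts_attach;	/* 0 makes attach fail, for demonstration */
	int attached;		/* number of devices using this domain */
};

struct device {
	struct domain *cur;
};

/* Hypothetical attach step; fails without touching @dom if refused. */
static int domain_attach(struct domain *dom)
{
	if (!dom->accepts_attach)
		return -1;
	dom->attached++;
	return 0;
}

/*
 * Switch @dev to @new_dom. The new domain is attached first and the old
 * one is released only after that succeeded, so an error returns with
 * @dev still using its previous domain.
 */
static int device_replace(struct device *dev, struct domain *new_dom)
{
	struct domain *old = dev->cur;
	int ret;

	if (new_dom == old)
		return 0;

	ret = domain_attach(new_dom);
	if (ret)
		return ret;

	dev->cur = new_dom;
	if (old)
		old->attached--;
	return 0;
}
```

The design point is the ordering: nothing belonging to the old state is
torn down until the new state is known to be usable.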
IOMMU_HWPT_ALLOC allows HWPTs to be explicitly allocated by the user and
used by attach or replace. At this point it isn't very useful since the
HWPT is the same as the automatically managed HWPT from the IOAS. However
a following series will allow userspace to customize the created HWPT.
The implementation is complicated because we have to introduce some
per-iommu_group memory in iommufd and redo how we think about multi-device
groups to be more explicit. This solves all the locking problems in the
prior attempts.
This series is infrastructure work for the following series which:
- Add replace for attach
- Expose replace through VFIO APIs
- Implement driver parameters for HWPT creation (nesting)
Once review of this is complete I will keep it on a side branch and
accumulate the following series when they are ready so we can have a
stable base and make more incremental progress. When we have all the parts
together to get a full implementation it can go to Linus.
This is on github: https://github.com/jgunthorpe/linux/commits/iommufd_hwpt
v8:
- Rebase to v6.5-rc2, update to new behavior of __iommu_group_set_domain()
v7: https://lore.kernel.org/r/0-v7-6c0fd698eda2+5e3-iommufd_alloc_jgg@nvidia.com
- Rebase to v6.4-rc2, update to new signature of iommufd_get_ioas()
v6: https://lore.kernel.org/r/0-v6-fdb604df649a+369-iommufd_alloc_jgg@nvidia.com
- Go back to the v4 locking arrangement with now both the attach/detach
igroup->locks inside the functions, Kevin says he needs this for a
followup series. This still fixes the syzkaller bug
- Fix two more error unwind locking bugs where
iommufd_object_abort_and_destroy(hwpt) would deadlock or be mislocked.
Make sure fail_nth will catch these mistakes
- Add a patch allowing objects to have different abort than destroy
function, it allows hwpt abort to require the caller to continue
to hold the lock and enforces this with lockdep.
v5: https://lore.kernel.org/r/0-v5-6716da355392+c5-iommufd_alloc_jgg@nvidia.com
- Go back to the v3 version of the code, keep the comment changes from
v4. Syzkaller says the group lock change in v4 didn't work.
- Adjust the fail_nth test to cover the path syzkaller found. We need to
have an ioas with a mapped page installed to inject a failure during
domain attachment.
v4: https://lore.kernel.org/r/0-v4-9cd79ad52ee8+13f5-iommufd_alloc_jgg@nvidia.c…
- Refine comments and commit messages
- Move the group lock into iommufd_hw_pagetable_attach()
- Fix error unwind in iommufd_device_do_replace()
v3: https://lore.kernel.org/r/0-v3-61d41fd9e13e+1f5-iommufd_alloc_jgg@nvidia.com
- Refine comments and commit messages
- Adjust the flow in iommufd_device_auto_get_domain() so pt_id is only
set on success
- Reject replace on non-attached devices
- Add missing __reserved check for IOMMU_HWPT_ALLOC
v2: https://lore.kernel.org/r/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia.c…
- Use WARN_ON for the igroup->group test and move that logic to a
function iommufd_group_try_get()
- Change igroup->devices to igroup->device list
Replace will need to iterate over all attached idevs
- Rename to iommufd_group_setup_msi()
- New patch to export iommu_get_resv_regions()
- New patch to use per-device reserved regions instead of per-group
regions
- Split out the reorganizing of iommufd_device_change_pt() from the
replace patch
- Replace uses the per-dev reserved regions
- Use stdev_id in a few more places in the selftest
- Fix error handling in IOMMU_HWPT_ALLOC
- Clarify comments
- Rebase on v6.3-rc1
v1: https://lore.kernel.org/all/0-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia…
Jason Gunthorpe (17):
iommufd: Move isolated msi enforcement to iommufd_device_bind()
iommufd: Add iommufd_group
iommufd: Replace the hwpt->devices list with iommufd_group
iommu: Export iommu_get_resv_regions()
iommufd: Keep track of each device's reserved regions instead of
groups
iommufd: Use the iommufd_group to avoid duplicate MSI setup
iommufd: Make sw_msi_start a group global
iommufd: Move putting a hwpt to a helper function
iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
iommufd: Allow a hwpt to be aborted after allocation
iommufd: Fix locking around hwpt allocation
iommufd: Reorganize iommufd_device_attach into
iommufd_device_change_pt
iommufd: Add iommufd_device_replace()
iommufd: Make destroy_rwsem use a lock class per object type
iommufd: Add IOMMU_HWPT_ALLOC
iommufd/selftest: Return the real idev id from selftest mock_domain
iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC
Nicolin Chen (2):
iommu: Introduce a new iommu_group_replace_domain() API
iommufd/selftest: Test iommufd_device_replace()
drivers/iommu/iommu-priv.h | 10 +
drivers/iommu/iommu.c | 38 +-
drivers/iommu/iommufd/device.c | 555 +++++++++++++-----
drivers/iommu/iommufd/hw_pagetable.c | 112 +++-
drivers/iommu/iommufd/io_pagetable.c | 32 +-
drivers/iommu/iommufd/iommufd_private.h | 52 +-
drivers/iommu/iommufd/iommufd_test.h | 6 +
drivers/iommu/iommufd/main.c | 24 +-
drivers/iommu/iommufd/selftest.c | 40 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/iommufd.h | 26 +
tools/testing/selftests/iommu/iommufd.c | 67 ++-
.../selftests/iommu/iommufd_fail_nth.c | 67 ++-
tools/testing/selftests/iommu/iommufd_utils.h | 63 +-
14 files changed, 867 insertions(+), 226 deletions(-)
create mode 100644 drivers/iommu/iommu-priv.h
base-commit: fdf0eaf11452d72945af31804e2a1048ee1b574c
--
2.41.0
Apologies for sending the previous mail from the wrong app (not text mode).
Resending to keep the mailing list thread consistent.
On Wed, Jul 26, 2023 at 3:10 AM Markus Elfring <Markus.Elfring(a)web.de>
wrote:
>
> > Tests BPF redirect at the lwt xmit hook to ensure error handling are
> > safe, i.e. won't panic the kernel.
>
> Are imperative change descriptions still preferred?
Hi Markus,
I think you linked this to me yesterday that it should be described
imperatively:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Doc…
>
> See also:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Doc…
>
I don’t follow the purpose of this reference. This points to user impact
but this is a selftest, so I don’t see any user impact here. Or is there
anything I missed?
>
> Can remaining wording weaknesses be adjusted accordingly?
I am not following this question. Can you be more specific or provide an
example?
Yan
>
> Regards,
> Markus
>
Good morning,
I have reviewed your offer and am pleased to say that it catches the eye and invites further discussion.
I thought I might be able to contribute to your growth and help this offer reach a wider audience. I do search engine positioning for websites, which helps them generate great web traffic.
Could we talk sometime soon?
Regards
Adam Charachuta
Add specification for declaring test metadata to the KTAP v2 spec.
The purpose of test metadata is to allow for the declaration of essential
testing information in KTAP output. This information includes test
names, test configuration info, test attributes, and test files.
There have been similar proposals around test metadata, such as test
prefixes and test name lines. However, I propose this specification as an
overall solution that covers them.
These test metadata lines are a form of diagnostic lines with the
format: "# <metadata_type>: <data>". As a type of diagnostic line, test
metadata lines are compliant with KTAP v1, which will help to not
interfere too much with current parsers.
Specifically the "# Subtest:" line is derived from the TAP 14 spec:
https://testanything.org/tap-version-14-specification.html.
The proposed location for test metadata is in the test header, between the
version line and the test plan line. Note that including diagnostic lines in
the test header is a departure from KTAP v1.
This location provides two main benefits:
First, metadata will be printed prior to when subtests are run. Then if a
test fails, test metadata can help discern which test is causing the issue
and potentially why.
Second, this location ensures that the lines will not be accidentally
parsed as a subtest's diagnostic lines because the lines are bordered by
the version line and plan line.
Here is an example of test metadata:
KTAP version 2
# Config: CONFIG_TEST=y
1..1
KTAP version 2
# Subtest: test_suite
# File: /sys/kernel/...
# Attributes: slow
# Other: example_test
1..2
ok 1 test_1
ok 2 test_2
ok 1 test_suite
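To make the placement rule concrete, here is a minimal Python sketch of how a parser might collect these lines. It is not the KUnit parser linked in this mail; the function name and the choice to return one dictionary per test header are assumptions made for illustration. It relies only on the format stated above ("# <metadata_type>: <data>") and on the lines sitting between a version line and a plan line.

```python
import re

# Line shapes taken from the proposal text; everything else is illustrative.
VERSION_RE = re.compile(r"^KTAP version \d+$")
PLAN_RE = re.compile(r"^\d+\.\.\d+$")
METADATA_RE = re.compile(r"^#\s*(?P<type>[^:]+):\s*(?P<data>.*)$")


def parse_header_metadata(lines):
    """Collect metadata from each test header (version line up to plan line).

    Returns one dict per header, in order of appearance. Hypothetical helper,
    not part of any existing tool.
    """
    headers = []
    current = None  # dict for the header being read, None outside a header
    for raw in lines:
        line = raw.strip()
        if VERSION_RE.match(line):
            current = {}
            headers.append(current)
        elif PLAN_RE.match(line):
            current = None  # the plan line closes the header
        elif current is not None:
            match = METADATA_RE.match(line)
            if match:
                current[match.group("type").strip()] = match.group("data").strip()
    return headers
```

Because collection stops at the plan line, "#" lines printed later by subtests are never mistaken for metadata, which is the second benefit described above.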
Here is a link to a version of the KUnit parser that is able to parse test
metadata lines for KTAP version 2. Note this includes test metadata
lines for the main level of KTAP.
Link: https://kunit-review.googlesource.com/c/linux/+/5809
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
Hi everyone,
I would like to use this proposal similar to an RFC to gather ideas on the
topic of test metadata. Let me know what you think.
I am also interested in brainstorming a list of recognized metadata types.
Providing recognized metadata types would be helpful in parsing and
displaying test metadata in a useful way.
Current ideas:
- "# Subtest: <test_name>" to indicate test name (name must match
corresponding result line)
- "# Attributes: <attributes list>" to indicate test attributes (list
separated by commas)
- "# File: <file_path>" to indicate file used in testing
Any other ideas?
Note this proposal replaces two of my previous proposals: "ktap_v2: add
recognized test name line" and "ktap_v2: allow prefix to KTAP lines."
Thanks!
-Rae
Note: this patch is based on Frank's ktap_spec_version_2 branch.
Documentation/dev-tools/ktap.rst | 51 ++++++++++++++++++++++++++++++--
1 file changed, 48 insertions(+), 3 deletions(-)
diff --git a/Documentation/dev-tools/ktap.rst b/Documentation/dev-tools/ktap.rst
index ff77f4aaa6ef..a2d0a196c115 100644
--- a/Documentation/dev-tools/ktap.rst
+++ b/Documentation/dev-tools/ktap.rst
@@ -17,7 +17,9 @@ KTAP test results describe a series of tests (which may be nested: i.e., test
can have subtests), each of which can contain both diagnostic data -- e.g., log
lines -- and a final result. The test structure and results are
machine-readable, whereas the diagnostic data is unstructured and is there to
-aid human debugging.
+aid human debugging. One exception to this is test metadata lines - a type
+of diagnostic lines. Test metadata is located between the version line and
+plan line of a test and can be machine-readable.
KTAP output is built from four different types of lines:
- Version lines
@@ -28,8 +30,7 @@ KTAP output is built from four different types of lines:
In general, valid KTAP output should also form valid TAP output, but some
information, in particular nested test results, may be lost. Also note that
there is a stagnant draft specification for TAP14, KTAP diverges from this in
-a couple of places (notably the "Subtest" header), which are described where
-relevant later in this document.
+a couple of places, which are described where relevant later in this document.
Version lines
-------------
@@ -166,6 +167,45 @@ even if they do not start with a "#": this is to capture any other useful
kernel output which may help debug the test. It is nevertheless recommended
that tests always prefix any diagnostic output they have with a "#" character.
+Test metadata lines
+-------------------
+
+Test metadata lines are a type of diagnostic line used to declare the
+name of a test and other helpful testing information in the test header.
+These lines are often helpful for parsing and for providing context during
+crashes.
+
+Test metadata lines must follow the format: "# <metadata_type>: <data>".
+These lines must be located between the version line and the plan line
+within a test header.
+
+There are a few currently recognized metadata types:
+- "# Subtest: <test_name>" to indicate test name (name must match
+ corresponding result line)
+- "# Attributes: <attributes list>" to indicate test attributes (list
+ separated by commas)
+- "# File: <file_path>" to indicate file used in testing
+
+As a rule, the "# Subtest:" line is generally first to declare the test
+name. Note that metadata lines do not necessarily need to use a
+recognized metadata type.
+
+An example of using metadata lines:
+
+::
+
+ KTAP version 2
+ 1..1
+ # File: /sys/kernel/...
+ KTAP version 2
+ # Subtest: example
+ # Attributes: slow, example_test
+ 1..1
+ ok 1 test_1
+ # example passed
+ ok 1 example
+
+
Unknown lines
-------------
@@ -206,6 +246,7 @@ An example of a test with two nested subtests:
KTAP version 2
1..1
KTAP version 2
+ # Subtest: example
1..2
ok 1 test_1
not ok 2 test_2
@@ -219,6 +260,7 @@ An example format with multiple levels of nested testing:
KTAP version 2
1..2
KTAP version 2
+ # Subtest: example_test_1
1..2
KTAP version 2
1..2
@@ -254,6 +296,7 @@ Example KTAP output
KTAP version 2
1..1
KTAP version 2
+ # Subtest: main_test
1..3
KTAP version 2
1..1
@@ -261,11 +304,13 @@ Example KTAP output
ok 1 test_1
ok 1 example_test_1
KTAP version 2
+ # Attributes: slow
1..2
ok 1 test_1 # SKIP test_1 skipped
ok 2 test_2
ok 2 example_test_2
KTAP version 2
+ # Subtest: example_test_3
1..3
ok 1 test_1
# test_2: FAIL
base-commit: 906f02e42adfbd5ae70d328ee71656ecb602aaf5
--
2.40.0.396.gfff15efe05-goog
The lwt xmit hook does not expect positive return values in the functions
ip_finish_output2 and ip6_finish_output2. However, BPF redirect programs
can return positive values such as NET_XMIT_DROP, NET_RX_DROP, etc.
as errors. Such return values can panic the kernel unexpectedly:
https://gist.github.com/zhaiyan920/8fbac245b261fe316a7ef04c9b1eba48
This patch fixes the return values from BPF redirect, so the error
handling would be consistent at xmit hook. It also adds a few test cases
to prevent future regressions.
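For readers unfamiliar with the status codes involved, the following Python sketch models the general idea of turning a positive transmit status into a conventional negative error. It is an illustration under assumptions, not the change made in net/core/filter.c: normalize_xmit_status() is a hypothetical helper, while net_xmit_errno() mirrors the kernel macro of the same name as I understand it from include/linux/netdevice.h.

```python
import errno

# Transmit status codes as defined in include/linux/netdevice.h.
NET_XMIT_SUCCESS = 0x00
NET_XMIT_DROP = 0x01
NET_XMIT_CN = 0x02


def net_xmit_errno(status):
    """Model of the kernel macro: congestion counts as success, the rest as -ENOBUFS."""
    return 0 if status == NET_XMIT_CN else -errno.ENOBUFS


def normalize_xmit_status(ret):
    """Hypothetical helper: never hand a positive status back to the xmit hook."""
    if ret > 0:
        return net_xmit_errno(ret)
    return ret  # zero and negative errno values pass through unchanged
```

With this shape a caller only ever sees zero or a negative errno, which is the consistency the cover letter asks for at the xmit hook.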
v2: https://lore.kernel.org/netdev/ZLdY6JkWRccunvu0@debian.debian/
v1: https://lore.kernel.org/bpf/ZLbYdpWC8zt9EJtq@debian.debian/
changes since v2:
* subject name changed
* also covered redirect to ingress case
* added selftests
changes since v1:
* minor code style changes
Yan Zhai (2):
bpf: fix skb_do_redirect return values
bpf: selftests: add lwt redirect regression test cases
net/core/filter.c | 12 +-
tools/testing/selftests/bpf/Makefile | 1 +
.../selftests/bpf/progs/test_lwt_redirect.c | 67 +++++++
.../selftests/bpf/test_lwt_redirect.sh | 165 ++++++++++++++++++
4 files changed, 244 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/bpf/progs/test_lwt_redirect.c
create mode 100755 tools/testing/selftests/bpf/test_lwt_redirect.sh
--
2.30.2
Remove clean target in Makefile to fix the following warning
and use the one in common lib.mk
Makefile:14: warning: overriding recipe for target 'clean'
../lib.mk:160: warning: ignoring old recipe for target 'clean'
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
---
tools/testing/selftests/prctl/Makefile | 2 --
1 file changed, 2 deletions(-)
diff --git a/tools/testing/selftests/prctl/Makefile b/tools/testing/selftests/prctl/Makefile
index cfc35d29fc2e..01dc90fbb509 100644
--- a/tools/testing/selftests/prctl/Makefile
+++ b/tools/testing/selftests/prctl/Makefile
@@ -10,7 +10,5 @@ all: $(TEST_PROGS)
include ../lib.mk
-clean:
- rm -fr $(TEST_PROGS)
endif
endif
--
2.39.2
On Tue, Jul 25, 2023 at 12:11 AM Markus Elfring <Markus.Elfring(a)web.de> wrote:
>
> > … unexpected problems. This change
> > converts the positive status code to proper error code.
>
> Please choose a corresponding imperative change suggestion.
>
> See also:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Doc…
>
>
> Did you provide sufficient justification for a possible addition of the tag “Fixes”?
>
>
> …
> > v2: code style change suggested by Stanislav Fomichev
> > ---
> > net/core/filter.c | 12 +++++++++++-
> …
>
> How do you think about to replace this marker by a line break?
>
> See also:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Doc…
>
> Regards,
> Markus
Hi Markus,
Thanks for the suggestions, those are what I could use more help with.
Will address these in the next version.
Yan
Hello everyone,
This patch series adds a test attributes framework to KUnit.
There has been interest in filtering out "slow" KUnit tests. Most notably,
a new config, CONFIG_MEMCPY_SLOW_KUNIT_TEST, has been added to exclude a
particularly slow memcpy test
(https://lore.kernel.org/all/20230118200653.give.574-kees@kernel.org/).
This attributes framework can be used to save and access test associated
data, including whether a test is slow. These attributes are reportable
(via KTAP and command line output) and are also filterable.
This framework is designed to allow for the addition of other attributes in
the future. These attributes could include whether the test can be run
concurrently, test file path, etc.
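As a rough illustration of what attribute filtering means, here is a small Python sketch that applies a filter expression such as speed!=slow to a list of tests. It is not the code in lib/kunit or kunit.py: the data layout, the function names, the test names, and the restriction to the "=" and "!=" operators are all assumptions made for the example.

```python
def parse_filter(expr):
    """Split 'name!=value' or 'name=value' into (name, op, value)."""
    for op in ("!=", "="):  # check "!=" first so it is not read as "="
        name, sep, value = expr.partition(op)
        if sep:
            return name.strip(), op, value.strip()
    raise ValueError(f"unsupported filter: {expr!r}")


def filter_tests(tests, expr):
    """Return the tests whose attribute satisfies the filter expression.

    Assumption for this sketch: a test without the attribute is treated as
    not equal to any value.
    """
    name, op, value = parse_filter(expr)
    kept = []
    for test in tests:
        actual = test.get("attributes", {}).get(name)
        matches = actual == value
        if matches == (op == "="):
            kept.append(test)
    return kept


# Hypothetical test list, for illustration only.
tests = [
    {"name": "slow_copy_test", "attributes": {"speed": "slow"}},
    {"name": "quick_example_test", "attributes": {}},
]
fast_only = filter_tests(tests, "speed!=slow")
```

Here fast_only keeps only the test that is not marked slow.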
To try out the framework I suggest running:
"./tools/testing/kunit/kunit.py run --filter speed!=slow"
This patch series was originally sent out as an RFC. Here is a link to the
RFC v2:
https://lore.kernel.org/all/20230707210947.1208717-1-rmoar@google.com/
Thanks!
Rae
Rae Moar (9):
kunit: Add test attributes API structure
kunit: Add speed attribute
kunit: Add module attribute
kunit: Add ability to filter attributes
kunit: tool: Add command line interface to filter and report
attributes
kunit: memcpy: Mark tests as slow using test attributes
kunit: time: Mark test as slow using test attributes
kunit: add tests for filtering attributes
kunit: Add documentation of KUnit test attributes
.../dev-tools/kunit/running_tips.rst | 166 +++++++
include/kunit/attributes.h | 50 +++
include/kunit/test.h | 70 ++-
kernel/time/time_test.c | 2 +-
lib/Kconfig.debug | 3 +
lib/kunit/Makefile | 3 +-
lib/kunit/attributes.c | 418 ++++++++++++++++++
lib/kunit/executor.c | 115 ++++-
lib/kunit/executor_test.c | 128 +++++-
lib/kunit/kunit-example-test.c | 9 +
lib/kunit/test.c | 27 +-
lib/memcpy_kunit.c | 8 +-
tools/testing/kunit/kunit.py | 70 ++-
tools/testing/kunit/kunit_kernel.py | 8 +-
tools/testing/kunit/kunit_parser.py | 11 +-
tools/testing/kunit/kunit_tool_test.py | 39 +-
16 files changed, 1051 insertions(+), 76 deletions(-)
create mode 100644 include/kunit/attributes.h
create mode 100644 lib/kunit/attributes.c
base-commit: 64bd4641310c41a1ecf07c13c67bc0ed61045dfd
--
2.41.0.487.g6d72f3e995-goog
Hi All,
This is v3 of my series to clean up mm selftests so that they run correctly on
arm64. See [1] for full explanation.
Only patch 6 has changed vs v2. The rest are the same and already carry
reviewed/acked-bys. So I'm hoping I can get the final patch reviewed and this
series is hopefully then good enough to merge?
Changes Since v2 [2]
--------------------
- Patch 6: Change approach to cleaning up child processes; Use "parent death
signal", as suggested by David.
- Added Reviewed-by/Acked-by tags: thanks to David, Mark and Peter!
Changes Since v1 [1]
--------------------
- Patch 1: Explicitly set line buffer mode in ksft_print_header()
- Dropped v1 patch 2 (set execute permissions): Andrew has taken this into his
branch separately.
- Patch 2: Don't compile `soft-dirty` suite for arm64 instead of skipping it
at runtime.
- Patch 2: Declare fewer tests and skip all of test_softdirty() if soft-dirty
is not supported, rather than conditionally marking each check as skipped.
- Added Reviewed-by tags: thanks DavidH!
- Patch 8: Clarified commit message.
[1] https://lore.kernel.org/linux-mm/20230713135440.3651409-1-ryan.roberts@arm.…
[2] https://lore.kernel.org/linux-mm/20230717103152.202078-1-ryan.roberts@arm.c…
Thanks,
Ryan
Ryan Roberts (8):
selftests: Line buffer test program's stdout
selftests/mm: Skip soft-dirty tests on arm64
selftests/mm: Enable mrelease_test for arm64
selftests/mm: Fix thuge-gen test bugs
selftests/mm: va_high_addr_switch should skip unsupported arm64
configs
selftests/mm: Make migration test robust to failure
selftests/mm: Optionally pass duration to transhuge-stress
selftests/mm: Run all tests from run_vmtests.sh
tools/testing/selftests/kselftest.h | 9 ++
tools/testing/selftests/kselftest/runner.sh | 7 +-
tools/testing/selftests/mm/Makefile | 82 ++++++++++---------
tools/testing/selftests/mm/madv_populate.c | 26 +++++-
tools/testing/selftests/mm/migration.c | 12 ++-
tools/testing/selftests/mm/mrelease_test.c | 1 +
tools/testing/selftests/mm/run_vmtests.sh | 28 ++++++-
tools/testing/selftests/mm/settings | 2 +-
tools/testing/selftests/mm/thuge-gen.c | 4 +-
tools/testing/selftests/mm/transhuge-stress.c | 12 ++-
.../selftests/mm/va_high_addr_switch.c | 2 +-
11 files changed, 132 insertions(+), 53 deletions(-)
--
2.25.1
The arm64 Guarded Control Stack (GCS) feature provides support for
hardware protected stacks of return addresses, intended to provide
hardening against return oriented programming (ROP) attacks and to make
it easier to gather call stacks for applications such as profiling.
When GCS is active a secondary stack called the Guarded Control Stack is
maintained, protected with a memory attribute which means that it can
only be written with specific GCS operations. When a BL is executed the
value stored in LR is also pushed onto the GCS, and when a RET is
executed the top of the GCS is popped and compared to LR with a fault
being raised if the values do not match. GCS operations may only be
performed on GCS pages, a data abort is generated if they are not.
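The BL/RET behaviour described above can be modelled in a few lines of Python. This is a toy model only, not the architecture's actual semantics: the class and method names are invented here, memory attributes and the GCS-specific instructions are ignored, and a plain list stands in for the GCS.

```python
class GcsFault(Exception):
    """Raised when the model detects a return-address mismatch."""


class GuardedControlStackModel:
    """Toy model: LR plus a second stack that only bl()/ret() may touch."""

    def __init__(self):
        self.lr = None   # link register
        self._gcs = []   # stand-in for the guarded control stack

    def bl(self, return_address):
        # A BL stores the return address in LR and also pushes it onto the GCS.
        self.lr = return_address
        self._gcs.append(return_address)

    def ret(self):
        # A RET pops the GCS and compares the popped value to LR.
        if not self._gcs:
            raise GcsFault("GCS is empty")  # modelling choice, not from the spec
        expected = self._gcs.pop()
        if expected != self.lr:
            raise GcsFault(f"LR {self.lr:#x} does not match GCS entry {expected:#x}")
        return self.lr


cpu = GuardedControlStackModel()
cpu.bl(0x1000)
cpu.lr = 0xDEAD          # simulate a corrupted return address (e.g. via ROP)
try:
    cpu.ret()
    tampering_detected = False
except GcsFault:
    tampering_detected = True
```

In the model, overwriting LR after the call makes the return fault instead of branching to the attacker-chosen address, which is the hardening property the cover letter describes.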
This series implements support for use of GCS by userspace, along with
support for use of GCS within KVM guests. It does not enable use of GCS
by either EL1 or EL2. Executables are started without GCS and must use
a prctl() to enable it, it is expected that this will be done very early
in application execution by the dynamic linker or other startup code.
x86 has an equivalent feature called shadow stacks, this series depends
on the x86 patches for generic memory management support for the new
guarded/shadow stack page type and shares APIs as much as possible. As
there has been extensive discussion with the wider community around the
ABI for shadow stacks I have as far as practical kept implementation
decisions close to those for x86, anticipating that review would lead to
similar conclusions in the absence of strong reasoning for divergence.
The main divergence I am conscious of is that x86 allows shadow stack to
be enabled and disabled repeatedly, freeing the shadow stack for the
thread whenever disabled, while this implementation keeps the GCS
allocated after disable but refuses to reenable it. This is to avoid
races with things actively walking the GCS during a disable. We do
anticipate that some systems will wish to disable GCS at runtime but are
not aware of any demand for subsequently reenabling it.
x86 uses an arch_prctl() to manage enable and disable. Since only x86
and S/390 use arch_prctl(), a generic prctl() was proposed[1] as part of a
patch set for the equivalent RISC-V zisslpcfi feature, which I initially
adopted fairly directly but which following review feedback has been revised
quite a bit.
There is an open issue with support for CRIU, on x86 this required the
ability to set the GCS mode via ptrace. This series supports
configuring mode bits other than enable/disable via ptrace but it needs
to be confirmed if this is sufficient.
There's a few bits where I'm not convinced with where I've placed
things, in particular the GCS write operation is in the GCS header not
in uaccess.h, I wasn't sure what was clearest there and am probably too
close to the code to have a clear opinion. The reporting of GCS in
/proc/PID/smaps is also a bit awkward.
The series depends on the x86 shadow stack support:
https://lore.kernel.org/lkml/20230227222957.24501-1-rick.p.edgecombe@intel.…
I've rebased this onto v6.5-rc3 but not included it in the series in
order to avoid confusion with Rick's work and cut down the size of the
series, you can see the branch at:
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc.git arm64-gcs
[1] https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Rebase onto v6.5-rc3.
- Rework prctl() interface to allow each bit to be locked independently.
- map_shadow_stack() now places the cap token based on the size
requested by the caller not the actual space allocated.
- Mode changes other than enable via ptrace are now supported.
- Expand test coverage.
- Various smaller fixes and adjustments.
- Link to v1: https://lore.kernel.org/r/20230716-arm64-gcs-v1-0-bf567f93bba6@kernel.org
---
Mark Brown (35):
prctl: arch-agnostic prctl for shadow stack
arm64: Document boot requirements for Guarded Control Stacks
arm64/gcs: Document the ABI for Guarded Control Stacks
arm64/sysreg: Add new system registers for GCS
arm64/sysreg: Add definitions for architected GCS caps
arm64/gcs: Add manual encodings of GCS instructions
arm64/gcs: Provide copy_to_user_gcs()
arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS)
arm64/mm: Allocate PIE slots for EL0 guarded control stack
mm: Define VM_SHADOW_STACK for arm64 when we support GCS
arm64/mm: Map pages for guarded control stack
KVM: arm64: Manage GCS registers for guests
arm64/el2_setup: Allow GCS usage at EL0 and EL1
arm64/idreg: Add overrride for GCS
arm64/hwcap: Add hwcap for GCS
arm64/traps: Handle GCS exceptions
arm64/mm: Handle GCS data aborts
arm64/gcs: Context switch GCS registers for EL0
arm64/gcs: Allocate a new GCS for threads with GCS enabled
arm64/gcs: Implement shadow stack prctl() interface
arm64/mm: Implement map_shadow_stack()
arm64/signal: Set up and restore the GCS context for signal handlers
arm64/signal: Expose GCS state in signal frames
arm64/ptrace: Expose GCS via ptrace and core files
arm64: Add Kconfig for Guarded Control Stack (GCS)
kselftest/arm64: Verify the GCS hwcap
kselftest/arm64: Add GCS as a detected feature in the signal tests
kselftest/arm64: Add framework support for GCS to signal handling tests
kselftest/arm64: Allow signals tests to specify an expected si_code
kselftest/arm64: Always run signals tests with GCS enabled
kselftest/arm64: Add very basic GCS test program
kselftest/arm64: Add a GCS test program built with the system libc
kselftest/arm64: Add test coverage for GCS mode locking
selftests/arm64: Add GCS signal tests
kselftest/arm64: Enable GCS for the FP stress tests
Documentation/admin-guide/kernel-parameters.txt | 3 +
Documentation/arch/arm64/booting.rst | 22 ++
Documentation/arch/arm64/elf_hwcaps.rst | 3 +
Documentation/arch/arm64/gcs.rst | 225 +++++++++++++
Documentation/arch/arm64/index.rst | 1 +
Documentation/filesystems/proc.rst | 2 +-
arch/arm64/Kconfig | 19 ++
arch/arm64/include/asm/cpufeature.h | 6 +
arch/arm64/include/asm/el2_setup.h | 17 +
arch/arm64/include/asm/esr.h | 28 +-
arch/arm64/include/asm/exception.h | 2 +
arch/arm64/include/asm/gcs.h | 106 ++++++
arch/arm64/include/asm/hwcap.h | 1 +
arch/arm64/include/asm/kvm_host.h | 12 +
arch/arm64/include/asm/pgtable-prot.h | 14 +-
arch/arm64/include/asm/processor.h | 7 +
arch/arm64/include/asm/sysreg.h | 20 ++
arch/arm64/include/asm/uaccess.h | 42 +++
arch/arm64/include/uapi/asm/hwcap.h | 1 +
arch/arm64/include/uapi/asm/ptrace.h | 8 +
arch/arm64/include/uapi/asm/sigcontext.h | 9 +
arch/arm64/kernel/cpufeature.c | 19 ++
arch/arm64/kernel/cpuinfo.c | 1 +
arch/arm64/kernel/entry-common.c | 23 ++
arch/arm64/kernel/idreg-override.c | 2 +
arch/arm64/kernel/process.c | 78 +++++
arch/arm64/kernel/ptrace.c | 59 ++++
arch/arm64/kernel/signal.c | 237 ++++++++++++-
arch/arm64/kernel/traps.c | 11 +
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 17 +
arch/arm64/kvm/sys_regs.c | 22 ++
arch/arm64/mm/Makefile | 1 +
arch/arm64/mm/fault.c | 78 ++++-
arch/arm64/mm/gcs.c | 226 +++++++++++++
arch/arm64/mm/mmap.c | 17 +-
arch/arm64/tools/cpucaps | 1 +
arch/arm64/tools/sysreg | 55 +++
fs/proc/task_mmu.c | 3 +
include/linux/mm.h | 16 +-
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 5 +-
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 22 ++
kernel/sys.c | 30 ++
kernel/sys_ni.c | 1 +
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/abi/hwcap.c | 19 ++
tools/testing/selftests/arm64/fp/assembler.h | 15 +
tools/testing/selftests/arm64/fp/fpsimd-test.S | 2 +
tools/testing/selftests/arm64/fp/sve-test.S | 2 +
tools/testing/selftests/arm64/fp/za-test.S | 2 +
tools/testing/selftests/arm64/fp/zt-test.S | 2 +
tools/testing/selftests/arm64/gcs/.gitignore | 3 +
tools/testing/selftests/arm64/gcs/Makefile | 19 ++
tools/testing/selftests/arm64/gcs/basic-gcs.c | 351 +++++++++++++++++++
tools/testing/selftests/arm64/gcs/gcs-locking.c | 200 +++++++++++
tools/testing/selftests/arm64/gcs/gcs-util.h | 87 +++++
tools/testing/selftests/arm64/gcs/libc-gcs.c | 372 +++++++++++++++++++++
tools/testing/selftests/arm64/signal/.gitignore | 1 +
.../testing/selftests/arm64/signal/test_signals.c | 17 +-
.../testing/selftests/arm64/signal/test_signals.h | 6 +
.../selftests/arm64/signal/test_signals_utils.c | 32 +-
.../selftests/arm64/signal/test_signals_utils.h | 39 +++
.../arm64/signal/testcases/gcs_exception_fault.c | 59 ++++
.../selftests/arm64/signal/testcases/gcs_frame.c | 78 +++++
.../arm64/signal/testcases/gcs_write_fault.c | 67 ++++
.../selftests/arm64/signal/testcases/testcases.c | 7 +
.../selftests/arm64/signal/testcases/testcases.h | 1 +
68 files changed, 2825 insertions(+), 32 deletions(-)
---
base-commit: b8f2cc1100d85456f9a48243328b33ab0ce5caff
change-id: 20230303-arm64-gcs-e311ab0d8729
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Hi,
Using the same config for 6.5-rc2 on Ubuntu 22.04 LTS and 22.10, the execution
stops at the exact same line on both boxes (so I reckon it is more than an
accident):
# selftests: net/forwarding: sch_ets.sh
# TEST: ping vlan 10 [ OK ]
# TEST: ping vlan 11 [ OK ]
# TEST: ping vlan 12 [ OK ]
# Running in priomap mode
# Testing ets bands 3 strict 3, streams 0 1
# TEST: band 0 [ OK ]
# INFO: Expected ratio >95% Measured ratio 100.00
# TEST: band 1 [ OK ]
# INFO: Expected ratio <5% Measured ratio 0
# Testing ets bands 3 strict 3, streams 1 2
# TEST: band 1 [ OK ]
# INFO: Expected ratio >95% Measured ratio 100.00
# TEST: band 2 [ OK ]
# INFO: Expected ratio <5% Measured ratio 0
# Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
# TEST: band 0 [ OK ]
# INFO: Expected ratio >95% Measured ratio 100.00
# TEST: band 1 [ OK ]
# INFO: Expected ratio <5% Measured ratio 0
# Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
# TEST: bands 1:2 [ OK ]
# INFO: Expected ratio 2.00 Measured ratio 1.99
# Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
# TEST: bands 0:1 [ OK ]
# INFO: Expected ratio 1.00 Measured ratio .99
# TEST: bands 0:2 [ OK ]
# INFO: Expected ratio 1.00 Measured ratio 1.00
# Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
# TEST: bands 0:1 [ OK ]
# INFO: Expected ratio 1.42 Measured ratio 1.42
# TEST: bands 0:2 [ OK ]
# INFO: Expected ratio 3.33 Measured ratio 3.33
# Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
# TEST: bands 0:1 [ OK ]
# INFO: Expected ratio 1.60 Measured ratio 1.59
# TEST: bands 0:2 [ OK ]
# INFO: Expected ratio 3.33 Measured ratio 3.33
# Testing ets bands 2 quanta 5000 2500, streams 0 1
# TEST: bands 0:1 [ OK ]
# INFO: Expected ratio 2.00 Measured ratio 1.99
# Running in classifier mode
# Testing ets bands 3 strict 3, streams 0 1
# TEST: band 0 [ OK ]
# INFO: Expected ratio >95% Measured ratio 100.00
# TEST: band 1 [ OK ]
# INFO: Expected ratio <5% Measured ratio 0
# Testing ets bands 3 strict 3, streams 1 2
# TEST: band 1 [ OK ]
# INFO: Expected ratio >95% Measured ratio 100.00
# TEST: band 2 [ OK ]
# INFO: Expected ratio <5% Measured ratio 0
# Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
I tried to run a 'set -x' enabled version standalone, but that one finished
correctly (?).
It could be something previous scripts left, but right now I don't have a clue.
I can attempt to rerun all tests with sch_ets.sh bash 'set -x' enabled later today.
Best regards,
Mirsad Todorovac
Adds a check to verify if the rtc device file is valid or not
and prints a useful error message if the file is not accessible.
Signed-off-by: Atul Kumar Pant <atulpant.linux(a)gmail.com>
Acked-by: Alexandre Belloni <alexandre.belloni(a)bootlin.com>
---
changes since v4:
Updated the commit message.
changes since v3:
Added Linux-kselftest and Linux-kernel mailing lists.
changes since v2:
Changed error message when rtc file does not exist.
changes since v1:
Removed check for uid=0
If rtc file is invalid, then exit the test.
tools/testing/selftests/rtc/rtctest.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/rtc/rtctest.c b/tools/testing/selftests/rtc/rtctest.c
index 63ce02d1d5cc..630fef735c7e 100644
--- a/tools/testing/selftests/rtc/rtctest.c
+++ b/tools/testing/selftests/rtc/rtctest.c
@@ -17,6 +17,7 @@
#include <unistd.h>
#include "../kselftest_harness.h"
+#include "../kselftest.h"
#define NUM_UIE 3
#define ALARM_DELTA 3
@@ -419,6 +420,8 @@ __constructor_order_last(void)
int main(int argc, char **argv)
{
+ int ret = -1;
+
switch (argc) {
case 2:
rtc_file = argv[1];
@@ -430,5 +433,11 @@ int main(int argc, char **argv)
return 1;
}
- return test_harness_run(argc, argv);
+ // Run the test if rtc_file is valid
+ if (access(rtc_file, F_OK) == 0)
+ ret = test_harness_run(argc, argv);
+ else
+ ksft_exit_fail_msg("[ERROR]: Cannot access rtc file %s - Exiting\n", rtc_file);
+
+ return ret;
}
--
2.25.1
Hello everyone,
This patch series adds a test attributes framework to KUnit.
There has been interest in filtering out "slow" KUnit tests. Most notably,
a new config, CONFIG_MEMCPY_SLOW_KUNIT_TEST, has been added to exclude a
particularly slow memcpy test
(https://lore.kernel.org/all/20230118200653.give.574-kees@kernel.org/).
This attributes framework can be used to save and access test associated
data, including whether a test is slow. These attributes are reportable
(via KTAP and command line output) and are also filterable.
This framework is designed to allow for the addition of other attributes in
the future. These attributes could include whether the test can be run
concurrently, test file path, etc.
To try out the framework I suggest running:
"./tools/testing/kunit/kunit.py run --filter speed!=slow"
This patch series was originally sent out as an RFC. Here is a link to the
RFC v2:
https://lore.kernel.org/all/20230707210947.1208717-1-rmoar@google.com/
Thanks!
Rae
Rae Moar (9):
kunit: Add test attributes API structure
kunit: Add speed attribute
kunit: Add module attribute
kunit: Add ability to filter attributes
kunit: tool: Add command line interface to filter and report
attributes
kunit: memcpy: Mark tests as slow using test attributes
kunit: time: Mark test as slow using test attributes
kunit: add tests for filtering attributes
kunit: Add documentation of KUnit test attributes
.../dev-tools/kunit/running_tips.rst | 166 +++++++
include/kunit/attributes.h | 50 +++
include/kunit/test.h | 70 ++-
kernel/time/time_test.c | 2 +-
lib/Kconfig.debug | 3 +
lib/kunit/Makefile | 3 +-
lib/kunit/attributes.c | 421 ++++++++++++++++++
lib/kunit/executor.c | 115 ++++-
lib/kunit/executor_test.c | 128 +++++-
lib/kunit/kunit-example-test.c | 9 +
lib/kunit/test.c | 27 +-
lib/memcpy_kunit.c | 8 +-
tools/testing/kunit/kunit.py | 70 ++-
tools/testing/kunit/kunit_kernel.py | 8 +-
tools/testing/kunit/kunit_parser.py | 11 +-
tools/testing/kunit/kunit_tool_test.py | 39 +-
16 files changed, 1054 insertions(+), 76 deletions(-)
create mode 100644 include/kunit/attributes.h
create mode 100644 lib/kunit/attributes.c
base-commit: 64bd4641310c41a1ecf07c13c67bc0ed61045dfd
--
2.41.0.255.g8b1d071c50-goog
Hi,
There seems to be a problem with net/forwarding line of 6.5-rc2 kselftests,
vanilla Torvalds tree, commit fdf0eaf11452, on Ubuntu 22.04 LTS Jammy Jellyfish.
(Confirmed on Ubuntu 22.10 Kinetic Kudu.)
Tests fail with error message:
Command line is not complete. Try option "help"
Failed to create netif
The script
# tools/testing/selftests/net/forwarding/bridge_igmp.sh
run under bash `set -x` ends with an error:
++ create_netif_veth
++ local i
++ (( i = 1 ))
++ (( i <= NUM_NETIFS ))
++ local j=2
++ ip link show dev
++ [[ 255 -ne 0 ]]
++ ip link add type veth peer name
Command line is not complete. Try option "help"
++ [[ 255 -ne 0 ]]
++ echo 'Failed to create netif'
Failed to create netif
++ exit 1
The problem seems to be linked with this piece of code of "lib.sh":
create_netif_veth()
{
local i
for ((i = 1; i <= NUM_NETIFS; ++i)); do
local j=$((i+1))
ip link show dev ${NETIFS[p$i]} &> /dev/null
if [[ $? -ne 0 ]]; then
ip link add ${NETIFS[p$i]} type veth \
peer name ${NETIFS[p$j]}
if [[ $? -ne 0 ]]; then
echo "Failed to create netif"
exit 1
fi
fi
i=$j
done
}
Somehow, ${NETIFS[p$i]} is evaluated to an empty string?
However, I can't seem to see what the expected result is.
The problem was confirmed in the backlogs of 6.5-rc1 and 6.4 kselftests.
It is possible that I'm doing something terribly wrong, but it is basically
the default kselftest suite on a rather minimal Ubuntu.
Please find attached the bash output from `set -x`.
Version of iproute2 is:
ii iproute2 5.15.0-1ubuntu2 amd64 networking and traffic control tools
Hope this helps.
Best regards,
Mirsad Todorovac
Thank you Michał.
On 7/21/23 12:28 AM, Michał Mirosław wrote:
> b. rename match "flags" to 'page categories' everywhere - this makes
> it easier to differentiate the ioctl()s categorisation of pages
> from struct page flags;
> c. change {required + excluded} to {inverted + required}. This was
> rejected before, but I'd like to illustrate the difference.
> Old interface can be translated to the new by:
> categories_inverted = excluded_mask
> categories_mask = required_mask | excluded_mask
> categories_anyof_mask = anyof_mask
> The new way allows filtering by: A & (B | !C)
> categories_inverted = C
> categories_mask = A
> categories_anyof_mask = B | C
Andrei and Danylo,
Are you okay with these masks? It was the two of you who proposed these.
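A minimal sketch of how the quoted translation can be read, assuming the inverted bits are XORed into the page's categories before the two masks are applied. The CAT_* bits and the function names are made up for illustration and are not taken from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical category bits, for illustration only. */
#define CAT_A (1ULL << 0)
#define CAT_B (1ULL << 1)
#define CAT_C (1ULL << 2)

/*
 * Assumed matching rule for the {inverted + required} scheme:
 * flip the inverted bits first, then require every bit in `mask`
 * and, if `anyof_mask` is non-empty, at least one bit of it.
 */
static bool page_matches(uint64_t categories, uint64_t inverted,
                         uint64_t mask, uint64_t anyof_mask)
{
    uint64_t c = categories ^ inverted;

    if ((c & mask) != mask)
        return false;
    if (anyof_mask && !(c & anyof_mask))
        return false;
    return true;
}

/* Old {required + excluded} interface expressed through the new one. */
static bool page_matches_old(uint64_t categories, uint64_t required,
                             uint64_t excluded, uint64_t anyof_mask)
{
    return page_matches(categories, excluded, required | excluded,
                        anyof_mask);
}
```

With inverted = CAT_C, mask = CAT_A and anyof_mask = CAT_B | CAT_C, page_matches() accepts exactly the pages satisfying A & (B | !C) under this reading.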
--
BR,
Muhammad Usama Anjum
This series demonstrates how KTAP output can be used by nolibc-test to
make the test results easier to read for people and machines.
Especially when running multiple invocations for different architectures
or build configurations, we can make use of the kernel's TAP parser to
automatically provide aggregated test reports.
The code is very hacky and incomplete and mostly meant to validate if
the output format is useful.
Start with the last patch of the series to actually see the generated
format, or run it for yourself.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (7):
selftests/nolibc: statically calculate number of testsuites
selftests/nolibc: use unsigned indices for testcases
selftests/nolibc: replace repetitive test structure with macro
selftests/nolibc: count subtests
kselftest: support KTAP format
kselftest: support skipping tests with testname
selftests/nolibc: proof of concept for TAP output
tools/testing/selftests/kselftest.h | 20 +++
tools/testing/selftests/nolibc/nolibc-test.c | 197 ++++++++++--------------
tools/testing/selftests/nolibc/run-all-tests.sh | 22 +++
3 files changed, 127 insertions(+), 112 deletions(-)
---
base-commit: dfef4fc45d5713eb23d87f0863aff9c33bd4bfaf
change-id: 20230718-nolibc-ktap-tmp-4408f505408d
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Hi Michał,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.5-rc2 next-20230721]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Micha-Miros-aw/Re-fs-proc-ta…
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/a0b5c6776b2ed91f78a7575649f8b100e58bd3a9.16898810…
patch subject: Re: fs/proc/task_mmu: Implement IOCTL for efficient page table scanning
config: powerpc-randconfig-r015-20230720 (https://download.01.org/0day-ci/archive/20230721/202307211507.xOl45LiR-lkp@…)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce: (https://download.01.org/0day-ci/archive/20230721/202307211507.xOl45LiR-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202307211507.xOl45LiR-lkp@intel.com/
All errors (new ones prefixed by >>):
>> fs/proc/task_mmu.c:1921:6: error: call to undeclared function 'userfaultfd_wp_async'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
1921 | if (userfaultfd_wp_async(vma) && userfaultfd_wp_use_markers(vma))
| ^
>> fs/proc/task_mmu.c:2200:12: error: call to undeclared function 'uffd_wp_range'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
2200 | int err = uffd_wp_range(vma, addr, end - addr, true);
| ^
2 errors generated.
vim +/userfaultfd_wp_async +1921 fs/proc/task_mmu.c
1913
1914 static int pagemap_scan_test_walk(unsigned long start, unsigned long end,
1915 struct mm_walk *walk)
1916 {
1917 struct pagemap_scan_private *p = walk->private;
1918 struct vm_area_struct *vma = walk->vma;
1919 unsigned long vma_category = 0;
1920
> 1921 if (userfaultfd_wp_async(vma) && userfaultfd_wp_use_markers(vma))
1922 vma_category |= PAGE_IS_WPASYNC;
1923 else if (p->arg.flags & PM_SCAN_CHECK_WPASYNC)
1924 return -EPERM;
1925
1926 if (vma->vm_flags & VM_PFNMAP)
1927 return 1;
1928
1929 if (!pagemap_scan_is_interesting_vma(vma_category, p))
1930 return 1;
1931
1932 p->cur_vma_category = vma_category;
1933 return 0;
1934 }
1935
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Hi Michał,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.5-rc2 next-20230720]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Micha-Miros-aw/Re-fs-proc-ta…
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/a0b5c6776b2ed91f78a7575649f8b100e58bd3a9.16898810…
patch subject: Re: fs/proc/task_mmu: Implement IOCTL for efficient page table scanning
config: i386-randconfig-r022-20230720 (https://download.01.org/0day-ci/archive/20230721/202307211337.5dwCMeHb-lkp@…)
compiler: clang version 15.0.7 (https://github.com/llvm/llvm-project.git 8dfdcc7b7bf66834a761bd8de445840ef68e4d1a)
reproduce: (https://download.01.org/0day-ci/archive/20230721/202307211337.5dwCMeHb-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202307211337.5dwCMeHb-lkp@intel.com/
All errors (new ones prefixed by >>):
>> fs/proc/task_mmu.c:1921:6: error: call to undeclared function 'userfaultfd_wp_async'; ISO C99 and later do not support implicit function declarations [-Werror,-Wimplicit-function-declaration]
if (userfaultfd_wp_async(vma) && userfaultfd_wp_use_markers(vma))
^
>> fs/proc/task_mmu.c:2200:12: error: call to undeclared function 'uffd_wp_range'; ISO C99 and later do not support implicit function declarations [-Werror,-Wimplicit-function-declaration]
int err = uffd_wp_range(vma, addr, end - addr, true);
^
2 errors generated.
vim +/userfaultfd_wp_async +1921 fs/proc/task_mmu.c
1913
1914 static int pagemap_scan_test_walk(unsigned long start, unsigned long end,
1915 struct mm_walk *walk)
1916 {
1917 struct pagemap_scan_private *p = walk->private;
1918 struct vm_area_struct *vma = walk->vma;
1919 unsigned long vma_category = 0;
1920
> 1921 if (userfaultfd_wp_async(vma) && userfaultfd_wp_use_markers(vma))
1922 vma_category |= PAGE_IS_WPASYNC;
1923 else if (p->arg.flags & PM_SCAN_CHECK_WPASYNC)
1924 return -EPERM;
1925
1926 if (vma->vm_flags & VM_PFNMAP)
1927 return 1;
1928
1929 if (!pagemap_scan_is_interesting_vma(vma_category, p))
1930 return 1;
1931
1932 p->cur_vma_category = vma_category;
1933 return 0;
1934 }
1935
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
In order to select a nexthop for multipath routes, fib_select_multipath()
is used with legacy nexthops and nexthop_select_path_hthr() is used with
nexthop objects. Those two functions perform a validity test on the
neighbor related to each nexthop but their logic is structured differently.
This causes a divergence in behavior and nexthop_select_path_hthr() may
return a nexthop that failed the neighbor validity test even if there was
one that passed.
Refactor nexthop_select_path_hthr() to make it more similar to
fib_select_multipath() and fix the problem mentioned above.
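The intended behavior can be sketched in userspace as below. This is a simplified model, not the kernel code: struct nh_entry, select_path() and the fallback to entry 0 when no neighbor is valid are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical model of one nexthop in a multipath group. */
struct nh_entry {
    int upper_bound;   /* hash threshold upper bound for this entry */
    bool neigh_valid;  /* result of the neighbor validity test */
};

/*
 * Return the index of the selected entry, or -1 if the group is empty.
 * Entries with an invalid neighbor are skipped; the first valid entry is
 * remembered as a fallback for when no valid entry covers the hash.
 */
static int select_path(const struct nh_entry *nh, size_t n, int hash)
{
    int first_valid = -1;

    for (size_t i = 0; i < n; i++) {
        if (!nh[i].neigh_valid)
            continue;
        if (first_valid < 0)
            first_valid = (int)i;
        if (hash > nh[i].upper_bound)
            continue;
        return (int)i;
    }

    if (first_valid >= 0)
        return first_valid;
    /* No valid neighbor at all: fall back to the first entry, if any. */
    return n ? 0 : -1;
}
```

The point of the model is that an entry failing the validity test is never returned while some other entry passes it.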
v2:
Removed unnecessary "first" variable in "nexthop: Do not return invalid
nexthop object during multipath selection".
v1:
https://lore.kernel.org/netdev/20230529201914.69828-1-bpoirier@nvidia.com/
---
Benjamin Poirier (4):
nexthop: Factor out hash threshold fdb nexthop selection
nexthop: Factor out neighbor validity check
nexthop: Do not return invalid nexthop object during multipath selection
selftests: net: Add test cases for nexthop groups with invalid neighbors
net/ipv4/nexthop.c | 61 +++++++++----
tools/testing/selftests/net/fib_nexthops.sh | 129 ++++++++++++++++++++++++++++
2 files changed, 171 insertions(+), 19 deletions(-)
---
base-commit: 36395b2efe905650cd179d67411ffee3b770268b
change-id: 20230719-nh_select-0303d55a1fb0
Best regards,
--
Benjamin Poirier <bpoirier(a)nvidia.com>
Hi Michał,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.5-rc2 next-20230720]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Micha-Miros-aw/Re-fs-proc-ta…
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/a0b5c6776b2ed91f78a7575649f8b100e58bd3a9.16898810…
patch subject: Re: fs/proc/task_mmu: Implement IOCTL for efficient page table scanning
config: arc-randconfig-r035-20230720 (https://download.01.org/0day-ci/archive/20230721/202307211030.2CJH6TkM-lkp@…)
compiler: arceb-elf-gcc (GCC) 12.3.0
reproduce: (https://download.01.org/0day-ci/archive/20230721/202307211030.2CJH6TkM-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202307211030.2CJH6TkM-lkp@intel.com/
All errors (new ones prefixed by >>):
fs/proc/task_mmu.c: In function 'pagemap_scan_test_walk':
fs/proc/task_mmu.c:1921:13: error: implicit declaration of function 'userfaultfd_wp_async'; did you mean 'userfaultfd_wp'? [-Werror=implicit-function-declaration]
1921 | if (userfaultfd_wp_async(vma) && userfaultfd_wp_use_markers(vma))
| ^~~~~~~~~~~~~~~~~~~~
| userfaultfd_wp
fs/proc/task_mmu.c: In function 'pagemap_scan_pte_hole':
>> fs/proc/task_mmu.c:2200:19: error: implicit declaration of function 'uffd_wp_range' [-Werror=implicit-function-declaration]
2200 | int err = uffd_wp_range(vma, addr, end - addr, true);
| ^~~~~~~~~~~~~
fs/proc/task_mmu.c: In function 'pagemap_scan_init_bounce_buffer':
fs/proc/task_mmu.c:2290:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
2290 | p->vec_out = (void __user *)p->arg.vec;
| ^
fs/proc/task_mmu.c: At top level:
fs/proc/task_mmu.c:1967:13: warning: 'pagemap_scan_backout_range' defined but not used [-Wunused-function]
1967 | static void pagemap_scan_backout_range(struct pagemap_scan_private *p,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
vim +/uffd_wp_range +2200 fs/proc/task_mmu.c
2182
2183 static int pagemap_scan_pte_hole(unsigned long addr, unsigned long end,
2184 int depth, struct mm_walk *walk)
2185 {
2186 struct pagemap_scan_private *p = walk->private;
2187 struct vm_area_struct *vma = walk->vma;
2188 int ret;
2189
2190 if (!vma || !pagemap_scan_is_interesting_page(p->cur_vma_category, p))
2191 return 0;
2192
2193 ret = pagemap_scan_output(p->cur_vma_category, p, addr, &end);
2194 if (addr == end)
2195 return ret;
2196
2197 if (~p->arg.flags & PM_SCAN_WP_MATCHING)
2198 return ret;
2199
> 2200 int err = uffd_wp_range(vma, addr, end - addr, true);
2201 if (err < 0)
2202 ret = err;
2203
2204 return ret;
2205 }
2206
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Hi Michał,
kernel test robot noticed the following build warnings:
[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on linus/master v6.5-rc2 next-20230720]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Micha-Miros-aw/Re-fs-proc-ta…
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/a0b5c6776b2ed91f78a7575649f8b100e58bd3a9.16898810…
patch subject: Re: fs/proc/task_mmu: Implement IOCTL for efficient page table scanning
config: mips-allyesconfig (https://download.01.org/0day-ci/archive/20230721/202307210528.2qgK1vwi-lkp@…)
compiler: mips-linux-gcc (GCC) 12.3.0
reproduce: (https://download.01.org/0day-ci/archive/20230721/202307210528.2qgK1vwi-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202307210528.2qgK1vwi-lkp@intel.com/
All warnings (new ones prefixed by >>):
fs/proc/task_mmu.c: In function 'pagemap_scan_test_walk':
fs/proc/task_mmu.c:1921:13: error: implicit declaration of function 'userfaultfd_wp_async'; did you mean 'userfaultfd_wp'? [-Werror=implicit-function-declaration]
1921 | if (userfaultfd_wp_async(vma) && userfaultfd_wp_use_markers(vma))
| ^~~~~~~~~~~~~~~~~~~~
| userfaultfd_wp
fs/proc/task_mmu.c: In function 'pagemap_scan_init_bounce_buffer':
>> fs/proc/task_mmu.c:2290:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
2290 | p->vec_out = (void __user *)p->arg.vec;
| ^
fs/proc/task_mmu.c: At top level:
fs/proc/task_mmu.c:1967:13: warning: 'pagemap_scan_backout_range' defined but not used [-Wunused-function]
1967 | static void pagemap_scan_backout_range(struct pagemap_scan_private *p,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
vim +2290 fs/proc/task_mmu.c
2264
2265 static int pagemap_scan_init_bounce_buffer(struct pagemap_scan_private *p)
2266 {
2267 if (!p->arg.vec_len) {
2268 /*
2269 * An arbitrary non-page-aligned sentinel value for
2270 * pagemap_scan_push_range().
2271 */
2272 p->cur_buf.start = p->cur_buf.end = ULLONG_MAX;
2273 return 0;
2274 }
2275
2276 /*
2277 * Allocate a smaller buffer to get output from inside the page
2278 * walk functions and walk the range in PAGEMAP_WALK_SIZE chunks.
2279 * The last range is always stored in p.cur_buf to allow coalescing
2280 * consecutive ranges that have the same categories returned across
2281 * walk_page_range() calls.
2282 */
2283 p->vec_buf_len = min_t(size_t, PAGEMAP_WALK_SIZE >> PAGE_SHIFT,
2284 p->arg.vec_len - 1);
2285 p->vec_buf = kmalloc_array(p->vec_buf_len, sizeof(*p->vec_buf),
2286 GFP_KERNEL);
2287 if (!p->vec_buf)
2288 return -ENOMEM;
2289
> 2290 p->vec_out = (void __user *)p->arg.vec;
2291
2292 return 0;
2293 }
2294
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
=== Context ===
In the context of a middlebox, fragmented packets are tricky to handle.
The full 5-tuple of a packet is often only available in the first
fragment which makes enforcing consistent policy difficult. There are
really only two stateless options, neither of which are very nice:
1. Enforce policy on first fragment and accept all subsequent fragments.
This works but may let in certain attacks or allow data exfiltration.
2. Enforce policy on first fragment and drop all subsequent fragments.
This does not really work b/c some protocols may rely on
fragmentation. For example, DNS may rely on oversized UDP packets for
large responses.
So stateful tracking is the only sane option. RFC 8900 [0] calls this
out as well in section 6.3:
Middleboxes [...] should process IP fragments in a manner that is
consistent with [RFC0791] and [RFC8200]. In many cases, middleboxes
must maintain state in order to achieve this goal.
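To make the problem concrete, here is a small sketch of how a stateless hook can tell IPv4 fragments apart. It assumes the flags/offset field has already been converted to host byte order; the helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define IP_MF     0x2000u /* "more fragments" flag */
#define IP_OFFSET 0x1FFFu /* fragment offset mask, in 8-byte units */

/* frag_off is the IPv4 flags/offset field in host byte order. */
static bool ipv4_is_fragment(uint16_t frag_off)
{
    return (frag_off & (IP_MF | IP_OFFSET)) != 0;
}

/* Only a packet with offset 0 carries the L4 header (and thus the ports). */
static bool ipv4_has_l4_header(uint16_t frag_off)
{
    return (frag_off & IP_OFFSET) == 0;
}
```

A later fragment makes ipv4_is_fragment() return true and ipv4_has_l4_header() return false, which is exactly the case where the 5-tuple is unavailable without keeping state.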
=== BPF related bits ===
Policy has traditionally been enforced from XDP/TC hooks. Both hooks
run before kernel reassembly facilities. However, with the new
BPF_PROG_TYPE_NETFILTER, we can rather easily hook into existing
netfilter reassembly infra.
The basic idea is we bump a refcnt on the netfilter defrag module and
then run the bpf prog after the defrag module runs. This allows bpf
progs to transparently see full, reassembled packets. The nice thing
about this is that progs don't have to carry around logic to detect
fragments.
=== Changelog ===
Changes from v4:
* Refactor module handling code to not sleep in rcu_read_lock()
* Also unify the v4 and v6 hook structs so they can share codepaths
* Fixed some checkpatch.pl formatting warnings
Changes from v3:
* Correctly initialize `addrlen` stack var for recvmsg()
Changes from v2:
* module_put() if ->enable() fails
* Fix CI build errors
Changes from v1:
* Drop bpf_program__attach_netfilter() patches
* static -> static const where appropriate
* Fix callback assignment order during registration
* Only request_module() if callbacks are missing
* Fix retval when modprobe fails in userspace
* Fix v6 defrag module name (nf_defrag_ipv6_hooks -> nf_defrag_ipv6)
* Simplify priority checking code
* Add warning if module doesn't assign callbacks in the future
* Take refcnt on module while defrag link is active
[0]: https://datatracker.ietf.org/doc/html/rfc8900
Daniel Xu (5):
netfilter: defrag: Add glue hooks for enabling/disabling defrag
netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
bpf: selftests: Support not connecting client socket
bpf: selftests: Support custom type and proto for client sockets
bpf: selftests: Add defrag selftests
include/linux/netfilter.h | 10 +
include/uapi/linux/bpf.h | 5 +
net/ipv4/netfilter/nf_defrag_ipv4.c | 17 +-
net/ipv6/netfilter/nf_defrag_ipv6_hooks.c | 11 +
net/netfilter/core.c | 6 +
net/netfilter/nf_bpf_link.c | 116 ++++++-
tools/include/uapi/linux/bpf.h | 5 +
tools/testing/selftests/bpf/Makefile | 4 +-
.../selftests/bpf/generate_udp_fragments.py | 90 ++++++
.../selftests/bpf/ip_check_defrag_frags.h | 57 ++++
tools/testing/selftests/bpf/network_helpers.c | 26 +-
tools/testing/selftests/bpf/network_helpers.h | 3 +
.../bpf/prog_tests/ip_check_defrag.c | 283 ++++++++++++++++++
.../selftests/bpf/progs/ip_check_defrag.c | 104 +++++++
14 files changed, 715 insertions(+), 22 deletions(-)
create mode 100755 tools/testing/selftests/bpf/generate_udp_fragments.py
create mode 100644 tools/testing/selftests/bpf/ip_check_defrag_frags.h
create mode 100644 tools/testing/selftests/bpf/prog_tests/ip_check_defrag.c
create mode 100644 tools/testing/selftests/bpf/progs/ip_check_defrag.c
--
2.41.0
On Thu, Jul 20, 2023 at 09:28:52PM +0200, Michał Mirosław wrote:
> This is a massaged version of patch by Muhammad Usama Anjum [1]
> to illustrate my review comments and hopefully push the implementation
> efforts closer to conclusion. The changes are:
[...]
> +static void pagemap_scan_backout_range(struct pagemap_scan_private *p,
> + unsigned long addr, unsigned long end)
> +{
> + struct page_region *cur_buf = &p->cur_buf;
> +
> + if (cur_buf->start != addr) {
> + cur_buf->end = addr;
> + } else {
> + cur_buf->start = cur_buf->end = 0;
> + }
> +
> + p->end_addr = 0;
Just noticed that this is missing:
p->found_pages -= (end - addr) / PAGE_SIZE;
> +}
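A standalone model of the function with the missing line added, as a sketch only. The struct layouts and EXAMPLE_PAGE_SIZE are simplified stand-ins, not the kernel definitions:

```c
#define EXAMPLE_PAGE_SIZE 4096UL /* assumed page size for this model */

struct region {
    unsigned long start, end;
};

struct scan_state {
    struct region cur_buf;
    unsigned long end_addr;
    unsigned long found_pages;
};

static void backout_range(struct scan_state *p,
                          unsigned long addr, unsigned long end)
{
    struct region *cur_buf = &p->cur_buf;

    if (cur_buf->start != addr)
        cur_buf->end = addr;
    else
        cur_buf->start = cur_buf->end = 0;

    p->end_addr = 0;
    /* Undo the accounting for the pages that were backed out. */
    p->found_pages -= (end - addr) / EXAMPLE_PAGE_SIZE;
}
```

Without the last statement, found_pages would keep counting pages that are no longer part of the output.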
[...]
Best Regards
Michał Mirosław
v1: https://lore.kernel.org/rust-for-linux/20230614180837.630180-1-ojeda@kernel…
v2:
- Rebased on top of v6.5-rc1, which requires a change from
`kunit_do_failed_assertion` to `__kunit_do_failed_assertion` (since
the former became a macro) and the addition of a call to
`__kunit_abort` (since previously the call was done by the old
function which we cannot use anymore since it is a macro). (David)
- Added prerequisite patch to KUnit header to include `stddef.h` to
support the `KUNIT=y` case. (Reported by Boqun)
- Added comment on the purpose of `trait FromErrno`. (Martin asked
about it)
- Simplify code to use `std::fs::write` instead of `write!`, which
improves code size too. (Suggested by Alice)
- Fix copy-paste typo in docs from "error" to "info" and, to make it
proper English, copy the `printk` docs style, i.e. from "info"
to "info-level message" -- and same for the "error" one. (Miguel)
- Swap `FILE` and `LINE` `static`s to keep the same order as done
elsewhere. (Miguel)
- Rename config option from `RUST_KERNEL_KUNIT_TEST` to
`RUST_KERNEL_DOCTESTS` (and update its title), so that we can use
the former for the "normal" (i.e. non-doctests, e.g. `#[test]` ones)
tests in the future. (David)
- Follow the syntax proposed for declaring test metadata in the KTAP
v2 spec, which may also get used for the KUnit test attributes API.
Thus, instead of "# Doctest from line {line}", use
"# {test_name}.location: {file}.{line}", which ideally will allow to
migrate to a KUnit attribute later.
This is done in all cases, i.e. even when the test succeeds, because
it may be useful for users running KUnit manually, or when passing
`--raw_output` to `kunit.py`. (David)
David: I used "location" instead of your suggested "line" alone, in
order to have both in a single line, which looked nice and closer to
the "ASSERTION FAILURE" case/line, since now we do have the original
file (please see below).
- Figure out the original line. This is done by deploying an anchor
so that the difference in lines between the beginning of the test
and the assert, in the generated file, can be computed. Then, we
offset the line number of the original test, which is given by
`rustdoc`. (developed by Boqun)
- Figure out the original file. This is done by walking the
filesystem, checking directory prefixes to reduce the amount of
combinations to check, and it is only done once per file (rather
than per test).
Ambiguities are detected and reported. It does limit the filenames
(module names) we can use, but it is unlikely we will hit it soon
and this should be temporary anyway until `rustdoc` provides us
with the real path. (Miguel)
Tested with both in-tree and `O=` builds, but I would appreciate
extra testing on this one, including via the `kunit.py` script.
- The last three items combined mean that we now see this output even
for successful cases:
# rust_doctest_kernel_sync_locked_by_rs_0.location: rust/kernel/sync/locked_by.rs:28
ok 53 rust_doctest_kernel_sync_locked_by_rs_0
This basically gives the user all the information they need to go
back to the source code of the doctest, while keeping the test names
fairly stable for bisection and without requiring users to write
test names manually. (David + Boqun + Miguel)
David: from what I saw in v2 of the RFC for the test attributes API,
the syntax still contains the test name when it is not a suite, so
I followed that, but if you prefer to omit it, please feel free to
do so (either way is fine for me, and if this is the expected
attribute syntax, I guess it is worth following it to make migration
easier later on):
# location: rust/kernel/sync/locked_by.rs:28
ok 53 rust_doctest_kernel_sync_locked_by_rs_0
- Collected `Reviewed-by`s and `Tested-by`s, except for the main
commit due to the substantial changes.
Miguel Ojeda (7):
kunit: test-bug.h: include `stddef.h` for `NULL`
rust: init: make doctests compilable/testable
rust: str: make doctests compilable/testable
rust: sync: make doctests compilable/testable
rust: types: make doctests compilable/testable
rust: support running Rust documentation tests as KUnit ones
MAINTAINERS: add Rust KUnit files to the KUnit entry
MAINTAINERS | 2 +
include/kunit/test-bug.h | 2 +
lib/Kconfig.debug | 13 ++
rust/.gitignore | 2 +
rust/Makefile | 29 ++++
rust/bindings/bindings_helper.h | 1 +
rust/helpers.c | 7 +
rust/kernel/init.rs | 26 +--
rust/kernel/kunit.rs | 163 +++++++++++++++++++
rust/kernel/lib.rs | 2 +
rust/kernel/str.rs | 4 +-
rust/kernel/sync/arc.rs | 9 +-
rust/kernel/sync/lock/mutex.rs | 1 +
rust/kernel/sync/lock/spinlock.rs | 1 +
rust/kernel/types.rs | 6 +-
scripts/.gitignore | 2 +
scripts/Makefile | 4 +
scripts/rustdoc_test_builder.rs | 72 +++++++++
scripts/rustdoc_test_gen.rs | 260 ++++++++++++++++++++++++++++++
19 files changed, 591 insertions(+), 15 deletions(-)
create mode 100644 rust/kernel/kunit.rs
create mode 100644 scripts/rustdoc_test_builder.rs
create mode 100644 scripts/rustdoc_test_gen.rs
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
--
2.41.0
This series fixes an issue which David Spickett found: if we change
the SVE VL while SME is in use, we can end up attempting to save state to
an unallocated buffer. It also adds testing coverage for that, plus a bit more
coverage of VL changes, just for paranoia.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (3):
arm64/fpsimd: Ensure SME storage is allocated after SVE VL changes
kselftest/arm64: Add a test case for SVE VL changes with SME active
kselftest/arm64: Validate that changing one VL type does not affect another
arch/arm64/kernel/fpsimd.c | 32 +++++--
tools/testing/selftests/arm64/fp/vec-syscfg.c | 127 +++++++++++++++++++++++++-
2 files changed, 148 insertions(+), 11 deletions(-)
---
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
change-id: 20230713-arm64-fix-sve-sme-vl-change-60eb1fa6a707
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Hi,
This follows the discussion here:
https://lore.kernel.org/linux-kselftest/20230324123157.bbwvfq4gsxnlnfwb@hou…
This shows a couple of inconsistencies with regard to how device-managed
resources are cleaned up. Basically, devm resources will only be cleaned up
if the device is attached to a bus and bound to a driver. Failing any of
these cases, a call to device_unregister will not end up in the devm
resources being released.
We had to work around it in DRM to provide helpers to create a device for
kunit tests, but the current discussion around creating similar, generic,
helpers for kunit resumed interest in fixing this.
This can be tested using the command:
./tools/testing/kunit/kunit.py run --kunitconfig=drivers/base/test/
I added the fix David suggested back in that discussion which does fix
the tests. The SoB is missing, since David didn't provide it back then.
Let me know what you think,
Maxime
Signed-off-by: Maxime Ripard <maxime(a)cerno.tech>
---
Changes in v2:
- Use an init function
- Document the tests
- Add a fix for the bugs
- Link to v1: https://lore.kernel.org/r/20230329-kunit-devm-inconsistencies-test-v1-0-c33…
---
David Gow (1):
drivers: base: Free devm resources when unregistering a device
Maxime Ripard (2):
drivers: base: Add basic devm tests for root devices
drivers: base: Add basic devm tests for platform devices
drivers/base/core.c | 11 ++
drivers/base/test/.kunitconfig | 2 +
drivers/base/test/Kconfig | 4 +
drivers/base/test/Makefile | 3 +
drivers/base/test/platform-device-test.c | 220 +++++++++++++++++++++++++++++++
drivers/base/test/root-device-test.c | 108 +++++++++++++++
6 files changed, 348 insertions(+)
---
base-commit: 53cdf865f90ba922a854c65ed05b519f9d728424
change-id: 20230329-kunit-devm-inconsistencies-test-5e5a7d01e60d
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>
Hi,
This patch series aims to improve the PMU event filter settings with a cleaner
and more organized structure and adds several test cases related to PMU event
filters.
These changes help to ensure that KVM's PMU event filter functions as expected
in all supported use cases.
Any feedback or suggestions are greatly appreciated.
Sincerely,
Jinrong Liang
Change log:
v5:
- Add more x86 properties for Intel PMU;
- Designated initializer instead of overwrite all members; (Isaku Yamahata)
- PMU event filter invalid flag modified to "KVM_PMU_EVENT_FLAGS_VALID_MASK << 1"; (Isaku Yamahata)
- sizeof(bitmap) is modified to "sizeof(bitmap) * 8" to represent the number of
bits that can be represented by the bitmap variable. (Isaku Yamahata)
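The last item rests on the difference between a bitmap's size in bytes and its capacity in bits. A minimal sketch of that distinction in plain C follows; the names BITMAP_NBITS and count_set_bits are illustrative and do not come from the selftest, and CHAR_BIT is used where the change log writes a literal 8:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bits the object can hold: its size in bytes times bits per byte. */
#define BITMAP_NBITS(bitmap) (sizeof(bitmap) * CHAR_BIT)

/* Hypothetical helper: count the set bits among the low nbits of a bitmap. */
unsigned int count_set_bits(uint64_t bitmap, size_t nbits)
{
	unsigned int count = 0;
	size_t i;

	assert(nbits <= BITMAP_NBITS(bitmap));
	for (i = 0; i < nbits; i++)
		if (bitmap & (UINT64_C(1) << i))
			count++;
	return count;
}
```

Passing sizeof(bitmap) as nbits would scan only the low 8 bits of a 64-bit bitmap, while BITMAP_NBITS(bitmap) covers all 64.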
Previous:
https://lore.kernel.org/kvm/20230717062343.3743-1-cloudliang@tencent.com/T/
Jinrong Liang (6):
KVM: selftests: Add x86 properties for Intel PMU in processor.h
KVM: selftests: Drop the return of remove_event()
KVM: selftests: Introduce __kvm_pmu_event_filter to improved event
filter settings
KVM: selftests: Add test cases for unsupported PMU event filter input
values
KVM: selftests: Test if event filter meets expectations on fixed
counters
KVM: selftests: Test gp event filters don't affect fixed event filters
.../selftests/kvm/include/x86_64/processor.h | 5 +
.../kvm/x86_64/pmu_event_filter_test.c | 317 ++++++++++++------
2 files changed, 228 insertions(+), 94 deletions(-)
base-commit: 88bb466c9dec4f70d682cf38c685324e7b1b3d60
--
2.39.3
When we collect a signal context with one of the SME modes enabled we will
have enabled that mode behind the compiler and libc's back so they may
issue some instructions not valid in streaming mode, causing spurious
failures.
For the code prior to issuing the BRK to trigger signal handling we need to
stay in streaming mode if we were already there since that's a part of the
signal context the caller is trying to collect. Unfortunately this code
includes a memset() which is likely to be heavily optimised and is likely
to use FP instructions incompatible with streaming mode. We can avoid this
happening by open coding the memset(), inserting a volatile assembly
statement to avoid the compiler recognising what's being done and doing
something in optimisation. This code is not performance critical so the
inefficiency should not be an issue.
After collecting the context we can simply exit streaming mode, avoiding
these issues. Use a full SMSTOP for safety to prevent any issues appearing
with ZA.
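As a rough illustration of the open-coded memset() described above, here is a standalone sketch. It assumes GCC or Clang inline assembly syntax and uses an empty asm statement with a "memory" clobber as the compiler barrier, which is a simplification of the "nop" with a "+m" operand used in the patch below. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Zero a buffer one byte at a time. The asm statement in the loop body
 * is volatile and clobbers memory, which is intended to stop the
 * compiler from recognising the loop as a memset() and replacing it
 * with a library call or vector instructions.
 */
void zero_bytes_open_coded(void *dest, size_t len)
{
	unsigned char *p = dest;
	size_t i;

	assert(dest != NULL);
	for (i = 0; i < len; i++) {
		__asm__ __volatile__("" : : : "memory");
		p[i] = 0;
	}
}
```

The loop is deliberately inefficient. As noted above, this path is not performance critical.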
Reported-by: Will Deacon <will(a)kernel.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Rebase onto v6.5-rc1.
- Link to v1: https://lore.kernel.org/r/20230628-arm64-signal-memcpy-fix-v1-1-db3e0300829…
---
.../selftests/arm64/signal/test_signals_utils.h | 28 +++++++++++++++++++++-
1 file changed, 27 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.h b/tools/testing/selftests/arm64/signal/test_signals_utils.h
index 222093f51b67..db28409fd44b 100644
--- a/tools/testing/selftests/arm64/signal/test_signals_utils.h
+++ b/tools/testing/selftests/arm64/signal/test_signals_utils.h
@@ -60,13 +60,28 @@ static __always_inline bool get_current_context(struct tdescr *td,
size_t dest_sz)
{
static volatile bool seen_already;
+ int i;
+ char *uc = (char *)dest_uc;
assert(td && dest_uc);
/* it's a genuine invocation..reinit */
seen_already = 0;
td->live_uc_valid = 0;
td->live_sz = dest_sz;
- memset(dest_uc, 0x00, td->live_sz);
+
+ /*
+ * This is a memset() but we don't want the compiler to
+ * optimise it into either instructions or a library call
+ * which might be incompatible with streaming mode.
+ */
+ for (i = 0; i < td->live_sz; i++) {
+ asm volatile("nop"
+ : "+m" (*dest_uc)
+ :
+ : "memory");
+ uc[i] = 0;
+ }
+
td->live_uc = dest_uc;
/*
* Grab ucontext_t triggering a SIGTRAP.
@@ -103,6 +118,17 @@ static __always_inline bool get_current_context(struct tdescr *td,
:
: "memory");
+ /*
+ * If we were grabbing a streaming mode context then we may
+ * have entered streaming mode behind the system's back and
+ * libc or compiler generated code might decide to do
+ * something invalid in streaming mode, or potentially even
+ * the state of ZA. Issue a SMSTOP to exit both now we have
+ * grabbed the state.
+ */
+ if (td->feats_supported & FEAT_SME)
+ asm volatile("msr S0_3_C4_C6_3, xzr");
+
/*
* If we get here with seen_already==1 it implies the td->live_uc
* context has been used to get back here....this probably means
---
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
change-id: 20230628-arm64-signal-memcpy-fix-7de3b3c8fa10
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Hi All,
This is v2 of my series to clean up mm selftests so that they run correctly on
arm64. See [1] for full explanation.
Changes Since v1 [1]
--------------------
- Patch 1: Explicitly set line buffer mode in ksft_print_header()
- Dropped v1 patch 2 (set execute permissions): Andrew has taken this into his
branch separately.
- Patch 2: Don't compile `soft-dirty` suite for arm64 instead of skipping it
at runtime.
- Patch 2: Declare fewer tests and skip all of test_softdirty() if soft-dirty
is not supported, rather than conditionally marking each check as skipped.
- Added Reviewed-by tags: thanks DavidH!
- Patch 8: Clarified commit message.
[1] https://lore.kernel.org/linux-mm/20230713135440.3651409-1-ryan.roberts@arm.…
Thanks,
Ryan
Ryan Roberts (8):
selftests: Line buffer test program's stdout
selftests/mm: Skip soft-dirty tests on arm64
selftests/mm: Enable mrelease_test for arm64
selftests/mm: Fix thuge-gen test bugs
selftests/mm: va_high_addr_switch should skip unsupported arm64
configs
selftests/mm: Make migration test robust to failure
selftests/mm: Optionally pass duration to transhuge-stress
selftests/mm: Run all tests from run_vmtests.sh
tools/testing/selftests/kselftest.h | 9 ++
tools/testing/selftests/kselftest/runner.sh | 7 +-
tools/testing/selftests/mm/Makefile | 82 ++++++++++---------
tools/testing/selftests/mm/madv_populate.c | 26 +++++-
tools/testing/selftests/mm/migration.c | 14 +++-
tools/testing/selftests/mm/mrelease_test.c | 1 +
tools/testing/selftests/mm/run_vmtests.sh | 28 ++++++-
tools/testing/selftests/mm/settings | 2 +-
tools/testing/selftests/mm/thuge-gen.c | 4 +-
tools/testing/selftests/mm/transhuge-stress.c | 12 ++-
.../selftests/mm/va_high_addr_switch.c | 2 +-
11 files changed, 133 insertions(+), 54 deletions(-)
--
2.25.1
Hi,
Following on from the previous problem report, this one concerns almost the very
next test script.
The testing environment is the same: 6.5-rc2 vanilla Torvalds tree on Ubuntu 22.04 LTS.
The config used is the same; it is available, together with the normal and "set -x"
output of bridge_mdb.sh, at this link (too large to attach):
https://domac.alu.unizg.hr/~mtodorov/linux/selftests/net-forwarding/bridge_…
root@defiant:# ./bridge_mdb.sh
INFO: # Host entries configuration tests
TEST: Common host entries configuration tests (IPv4) [FAIL]
Managed to add IPv4 host entry with a filter mode
TEST: Common host entries configuration tests (IPv6) [FAIL]
Managed to add IPv6 host entry with a filter mode
TEST: Common host entries configuration tests (L2) [FAIL]
Managed to add L2 host entry with a filter mode
INFO: # Port group entries configuration tests - (*, G)
Command "replace" is unknown, try "bridge mdb help".
TEST: Common port group entries configuration tests (IPv4 (*, G)) [FAIL]
Failed to replace IPv4 (*, G) entry
Command "replace" is unknown, try "bridge mdb help".
TEST: Common port group entries configuration tests (IPv6 (*, G)) [FAIL]
Failed to replace IPv6 (*, G) entry
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
RTNETLINK answers: Invalid argument
Error: bridge: (*, G) group is already joined by port.
Error: bridge: (*, G) group is already joined by port.
TEST: IPv4 (*, G) port group entries configuration tests [FAIL]
(S, G) entry not created
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
RTNETLINK answers: Invalid argument
Error: bridge: (*, G) group is already joined by port.
Error: bridge: (*, G) group is already joined by port.
TEST: IPv6 (*, G) port group entries configuration tests [FAIL]
(S, G) entry not created
INFO: # Port group entries configuration tests - (S, G)
Command "replace" is unknown, try "bridge mdb help".
TEST: Common port group entries configuration tests (IPv4 (S, G)) [FAIL]
Failed to replace IPv4 (S, G) entry
Command "replace" is unknown, try "bridge mdb help".
TEST: Common port group entries configuration tests (IPv6 (S, G)) [FAIL]
Failed to replace IPv6 (S, G) entry
Error: bridge: (S, G) group is already joined by port.
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
TEST: IPv4 (S, G) port group entries configuration tests [FAIL]
Managed to add an entry with a filter mode
Error: bridge: (S, G) group is already joined by port.
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
TEST: IPv6 (S, G) port group entries configuration tests [FAIL]
"temp" entry has an unpending group timer
INFO: # Port group entries configuration tests - L2
Command "replace" is unknown, try "bridge mdb help".
TEST: Common port group entries configuration tests (L2 (*, G)) [FAIL]
Failed to replace L2 (*, G) entry
TEST: L2 (*, G) port group entries configuration tests [FAIL]
Managed to add an entry with a filter mode
INFO: # Large scale dump tests
TEST: IPv4 large scale dump tests [ OK ]
TEST: IPv6 large scale dump tests [ OK ]
TEST: L2 large scale dump tests [ OK ]
INFO: # Forwarding tests
Error: bridge: Group is already joined by host.
TEST: IPv4 host entries forwarding tests [FAIL]
Packet not locally received after adding a host entry
Error: bridge: Group is already joined by host.
TEST: IPv6 host entries forwarding tests [FAIL]
Packet locally received after flood
TEST: L2 host entries forwarding tests [FAIL]
Packet not locally received after flood
Command "replace" is unknown, try "bridge mdb help".
TEST: IPv4 port group "exclude" entries forwarding tests [FAIL]
Packet from valid source not received on H2 after adding entry
Command "replace" is unknown, try "bridge mdb help".
TEST: IPv6 port group "exclude" entries forwarding tests [FAIL]
Packet from invalid source received on H2 after adding entry
Command "replace" is unknown, try "bridge mdb help".
TEST: IPv4 port group "include" entries forwarding tests [FAIL]
Packet from valid source not received on H2 after adding entry
Command "replace" is unknown, try "bridge mdb help".
TEST: IPv6 port group "include" entries forwarding tests [FAIL]
Packet from invalid source received on H2 after adding entry
TEST: L2 port entries forwarding tests [ OK ]
INFO: # Control packets tests
Command "replace" is unknown, try "bridge mdb help".
TEST: IGMPv3 MODE_IS_INCLUDE tests [FAIL]
Source not add to source list
Command "replace" is unknown, try "bridge mdb help".
TEST: MLDv2 MODE_IS_INCLUDE tests [FAIL]
Source not add to source list
root@defiant:# bridge mdb show
root@defiant:#
NOTE that the several "sleep 10" commands looped in the script can easily exceed
the default timeout of 45 seconds, and the script does not handle SIGTERM,
so it leaves the system in an unpredictable state from which even
"systemctl restart networking" did not recover.
Setting tools/testing/selftests/net/forwarding/settings:timeout=150 seemed to be enough.
Best regards,
Mirsad Todorovac
This series is a follow up to the recent change [1] which added
per-cpu insert/delete statistics for maps. The bpf_map_sum_elem_count
kfunc presented in the original series was only available to tracing
programs, so let's make it available to all.
The first patch makes the types listed in the reg2btf_ids[] array be
considered trusted by kfuncs.
The second patch allows CONST_PTR_TO_MAP to be treated as a trusted pointer
from a kfunc's point of view by adding it to the reg2btf_ids[] array.
The third patch adds the missing const to the map argument of the
bpf_map_sum_elem_count kfunc.
The fourth patch registers bpf_map_sum_elem_count for all programs
and updates the selftests accordingly.
[1] https://lore.kernel.org/bpf/20230705160139.19967-1-aspsk@isovalent.com/
v1 -> v2:
* treat the whole reg2btf_ids array as trusted (Alexei)
Anton Protopopov (4):
bpf: consider types listed in reg2btf_ids as trusted
bpf: consider CONST_PTR_TO_MAP as trusted pointer to struct bpf_map
bpf: make an argument const in the bpf_map_sum_elem_count kfunc
bpf: allow any program to use the bpf_map_sum_elem_count kfunc
include/linux/btf_ids.h | 1 +
kernel/bpf/map_iter.c | 7 +++---
kernel/bpf/verifier.c | 22 +++++++++++--------
.../selftests/bpf/progs/map_ptr_kern.c | 5 +++++
4 files changed, 22 insertions(+), 13 deletions(-)
--
2.34.1
bpf_dynptr_slice(_rw) uses a user provided buffer if it can not provide
a pointer to a block of contiguous memory. This buffer is unused in the
case of local dynptrs, and may be unused in other cases as well. There
is no need to require the buffer, as the kfunc can just return NULL if
it was needed and not provided.
This adds another kfunc annotation, __opt, which combines with __sz and
__szk to allow the buffer associated with the size to be NULL. If the
buffer is NULL, the verifier does not check that the buffer is of
sufficient size.
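To make the intended calling convention concrete, here is a userspace toy model of a slice function with an optional buffer. It is not the kernel implementation: struct two_part_src and slice() are hypothetical names, and the two-segment source merely stands in for data that may or may not be contiguous.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy data source whose bytes live in two non-adjacent segments. */
struct two_part_src {
	const unsigned char *head;
	size_t head_len;
	const unsigned char *tail;
	size_t tail_len;
};

/*
 * Return a pointer to len bytes starting at offset.
 * If the range is contiguous, return a direct pointer and ignore the buffer.
 * If a copy is needed, use buffer_opt, or return NULL when it is absent.
 */
const void *slice(const struct two_part_src *src, size_t offset,
		  void *buffer_opt, size_t len)
{
	size_t total, from_head;

	assert(src != NULL);
	total = src->head_len + src->tail_len;
	if (offset > total || len > total - offset)
		return NULL;	/* out of range */
	if (offset + len <= src->head_len)
		return src->head + offset;	/* contiguous in head */
	if (offset >= src->head_len)
		return src->tail + (offset - src->head_len);	/* contiguous in tail */
	if (!buffer_opt)
		return NULL;	/* copy needed but no buffer given */

	/* The range straddles both segments: assemble it in the buffer. */
	from_head = src->head_len - offset;
	memcpy(buffer_opt, src->head + offset, from_head);
	memcpy((unsigned char *)buffer_opt + from_head, src->tail, len - from_head);
	return buffer_opt;
}
```

A caller that only ever expects contiguous data can pass NULL and treat a NULL return as "not available without a copy".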
Signed-off-by: Daniel Rosenberg <drosen(a)google.com>
---
Documentation/bpf/kfuncs.rst | 23 ++++++++++++++++++++++-
include/linux/skbuff.h | 2 +-
kernel/bpf/helpers.c | 30 ++++++++++++++++++------------
kernel/bpf/verifier.c | 17 +++++++++++++----
4 files changed, 54 insertions(+), 18 deletions(-)
diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index ea2516374d92..7a3d9de5f315 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -100,7 +100,7 @@ Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
size parameter, and the value of the constant matters for program safety, __k
suffix should be used.
-2.2.2 __uninit Annotation
+2.2.3 __uninit Annotation
-------------------------
This annotation is used to indicate that the argument will be treated as
@@ -117,6 +117,27 @@ Here, the dynptr will be treated as an uninitialized dynptr. Without this
annotation, the verifier will reject the program if the dynptr passed in is
not initialized.
+2.2.4 __opt Annotation
+-------------------------
+
+This annotation is used to indicate that the buffer associated with an __sz or __szk
+argument may be null. If the function is passed a nullptr in place of the buffer,
+the verifier will not check that length is appropriate for the buffer. The kfunc is
+responsible for checking if this buffer is null before using it.
+
+An example is given below::
+
+ __bpf_kfunc void *bpf_dynptr_slice(..., void *buffer__opt, u32 buffer__szk)
+ {
+ ...
+ }
+
+Here, the buffer may be null. If buffer is not null, it at least of size buffer_szk.
+Either way, the returned buffer is either NULL, or of size buffer_szk. Without this
+annotation, the verifier will reject the program if a null pointer is passed in with
+a nonzero size.
+
+
.. _BPF_kfunc_nodef:
2.3 Using an existing kernel function
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 738776ab8838..8ddb4af1a501 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4033,7 +4033,7 @@ __skb_header_pointer(const struct sk_buff *skb, int offset, int len,
if (likely(hlen - offset >= len))
return (void *)data + offset;
- if (!skb || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
+ if (!skb || !buffer || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
return NULL;
return buffer;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 8d368fa353f9..26efb6fbeab2 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2167,13 +2167,15 @@ __bpf_kfunc struct task_struct *bpf_task_from_pid(s32 pid)
* bpf_dynptr_slice() - Obtain a read-only pointer to the dynptr data.
* @ptr: The dynptr whose data slice to retrieve
* @offset: Offset into the dynptr
- * @buffer: User-provided buffer to copy contents into
- * @buffer__szk: Size (in bytes) of the buffer. This is the length of the
- * requested slice. This must be a constant.
+ * @buffer__opt: User-provided buffer to copy contents into. May be NULL
+ * @buffer__szk: Size (in bytes) of the buffer if present. This is the
+ * length of the requested slice. This must be a constant.
*
* For non-skb and non-xdp type dynptrs, there is no difference between
* bpf_dynptr_slice and bpf_dynptr_data.
*
+ * If buffer__opt is NULL, the call will fail if buffer_opt was needed.
+ *
* If the intention is to write to the data slice, please use
* bpf_dynptr_slice_rdwr.
*
@@ -2190,7 +2192,7 @@ __bpf_kfunc struct task_struct *bpf_task_from_pid(s32 pid)
* direct pointer)
*/
__bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset,
- void *buffer, u32 buffer__szk)
+ void *buffer__opt, u32 buffer__szk)
{
enum bpf_dynptr_type type;
u32 len = buffer__szk;
@@ -2210,15 +2212,17 @@ __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset
case BPF_DYNPTR_TYPE_RINGBUF:
return ptr->data + ptr->offset + offset;
case BPF_DYNPTR_TYPE_SKB:
- return skb_header_pointer(ptr->data, ptr->offset + offset, len, buffer);
+ return skb_header_pointer(ptr->data, ptr->offset + offset, len, buffer__opt);
case BPF_DYNPTR_TYPE_XDP:
{
void *xdp_ptr = bpf_xdp_pointer(ptr->data, ptr->offset + offset, len);
if (xdp_ptr)
return xdp_ptr;
- bpf_xdp_copy_buf(ptr->data, ptr->offset + offset, buffer, len, false);
- return buffer;
+ if (!buffer__opt)
+ return NULL;
+ bpf_xdp_copy_buf(ptr->data, ptr->offset + offset, buffer__opt, len, false);
+ return buffer__opt;
}
default:
WARN_ONCE(true, "unknown dynptr type %d\n", type);
@@ -2230,13 +2234,15 @@ __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset
* bpf_dynptr_slice_rdwr() - Obtain a writable pointer to the dynptr data.
* @ptr: The dynptr whose data slice to retrieve
* @offset: Offset into the dynptr
- * @buffer: User-provided buffer to copy contents into
- * @buffer__szk: Size (in bytes) of the buffer. This is the length of the
- * requested slice. This must be a constant.
+ * @buffer__opt: User-provided buffer to copy contents into. May be NULL
+ * @buffer__szk: Size (in bytes) of the buffer if present. This is the
+ * length of the requested slice. This must be a constant.
*
* For non-skb and non-xdp type dynptrs, there is no difference between
* bpf_dynptr_slice and bpf_dynptr_data.
*
+ * If buffer__opt is NULL, the call will fail if buffer_opt was needed.
+ *
* The returned pointer is writable and may point to either directly the dynptr
* data at the requested offset or to the buffer if unable to obtain a direct
* data pointer to (example: the requested slice is to the paged area of an skb
@@ -2267,7 +2273,7 @@ __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset
* direct pointer)
*/
__bpf_kfunc void *bpf_dynptr_slice_rdwr(const struct bpf_dynptr_kern *ptr, u32 offset,
- void *buffer, u32 buffer__szk)
+ void *buffer__opt, u32 buffer__szk)
{
if (!ptr->data || bpf_dynptr_is_rdonly(ptr))
return NULL;
@@ -2294,7 +2300,7 @@ __bpf_kfunc void *bpf_dynptr_slice_rdwr(const struct bpf_dynptr_kern *ptr, u32 o
* will be copied out into the buffer and the user will need to call
* bpf_dynptr_write() to commit changes.
*/
- return bpf_dynptr_slice(ptr, offset, buffer, buffer__szk);
+ return bpf_dynptr_slice(ptr, offset, buffer__opt, buffer__szk);
}
__bpf_kfunc void *bpf_cast_to_kern_ctx(void *obj)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index fbcf5a4e2fcd..708ae7bca1fe 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9398,6 +9398,11 @@ static bool is_kfunc_arg_const_mem_size(const struct btf *btf,
return __kfunc_param_match_suffix(btf, arg, "__szk");
}
+static bool is_kfunc_arg_optional(const struct btf *btf, const struct btf_param *arg)
+{
+ return __kfunc_param_match_suffix(btf, arg, "__opt");
+}
+
static bool is_kfunc_arg_constant(const struct btf *btf, const struct btf_param *arg)
{
return __kfunc_param_match_suffix(btf, arg, "__k");
@@ -10464,13 +10469,17 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
break;
case KF_ARG_PTR_TO_MEM_SIZE:
{
+ struct bpf_reg_state *buff_reg = &regs[regno];
+ const struct btf_param *buff_arg = &args[i];
struct bpf_reg_state *size_reg = &regs[regno + 1];
const struct btf_param *size_arg = &args[i + 1];
- ret = check_kfunc_mem_size_reg(env, size_reg, regno + 1);
- if (ret < 0) {
- verbose(env, "arg#%d arg#%d memory, len pair leads to invalid memory access\n", i, i + 1);
- return ret;
+ if (!register_is_null(buff_reg) || !is_kfunc_arg_optional(meta->btf, buff_arg)) {
+ ret = check_kfunc_mem_size_reg(env, size_reg, regno + 1);
+ if (ret < 0) {
+ verbose(env, "arg#%d arg#%d memory, len pair leads to invalid memory access\n", i, i + 1);
+ return ret;
+ }
}
if (is_kfunc_arg_const_mem_size(meta->btf, size_arg, size_reg)) {
base-commit: 6e98b09da931a00bf4e0477d0fa52748bf28fcce
--
2.40.1.495.gc816e09b53d-goog
The arm64 Guarded Control Stack (GCS) feature provides support for
hardware protected stacks of return addresses, intended to provide
hardening against return oriented programming (ROP) attacks and to make
it easier to gather call stacks for applications such as profiling.
When GCS is active a secondary stack called the Guarded Control Stack is
maintained, protected with a memory attribute which means that it can
only be written with specific GCS operations. When a BL is executed the
value stored in LR is also pushed onto the GCS, and when a RET is
executed the top of the GCS is popped and compared to LR with a fault
being raised if the values do not match. GCS operations may only be
performed on GCS pages, a data abort is generated if they are not.
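The call and return behaviour described above can be modelled in a few lines of ordinary C. This is only a toy simulation of the push on BL and the pop-and-compare on RET. The names (struct toy_gcs, toy_call, toy_ret) and the fixed depth are assumptions made for illustration, and no memory protection is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_GCS_DEPTH 64

struct toy_gcs {
	uint64_t entries[TOY_GCS_DEPTH];
	size_t top;	/* number of entries currently in use */
};

/* Model of BL: the return address in LR is also pushed onto the GCS. */
bool toy_call(struct toy_gcs *gcs, uint64_t lr)
{
	assert(gcs != NULL);
	if (gcs->top == TOY_GCS_DEPTH)
		return false;	/* toy stack overflow */
	gcs->entries[gcs->top++] = lr;
	return true;
}

/*
 * Model of RET: pop the top entry and compare it with LR.
 * A false return stands in for the fault raised on mismatch.
 */
bool toy_ret(struct toy_gcs *gcs, uint64_t lr)
{
	assert(gcs != NULL);
	if (gcs->top == 0)
		return false;	/* nothing to pop */
	return gcs->entries[--gcs->top] == lr;
}
```

A return address overwritten on the normal stack no longer matches the copy held on the guarded stack, so toy_ret() reports the mismatch.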
This series implements support for use of GCS by EL0, along with support
for use of GCS within KVM guests. It does not enable use of GCS by
either EL1 or EL2. Executables are started without GCS and must use a
prctl() to enable it, it is expected that this will be done very early
in application execution by the dynamic linker or other startup code.
x86 has an equivalent feature called shadow stacks, this series depends
on the x86 patches for generic memory management support for the new
guarded/shadow stack page type and shares APIs as much as possible. As
there has been extensive discussion with the wider community around the
ABI for shadow stacks I have as far as practical kept implementation
decisions close to those for x86, anticipating that review would lead to
similar conclusions in the absence of strong reasoning for divergence.
The main divergence I am conscious of is that x86 allows shadow stack to
be enabled and disabled repeatedly, freeing the shadow stack for the
thread whenever disabled, while this implementation keeps the GCS
allocated after disable but refuses to reenable it. This is to avoid
races with things actively walking the GCS during a disable, we do
anticipate that some systems will wish to disable GCS at runtime but are
not aware of any demand for subsequently reenabling it.
x86 uses an arch_prctl() to manage enable and disable, since only x86
and S/390 use arch_prctl() a generic prctl() was proposed[1] as part of a
patch set for the equivalent RISC-V zisslpcfi feature which is adopted
with some enhancements here.
There are a few bits where I'm not convinced about where I've placed
things; in particular, the GCS write operation is in the GCS header
rather than in uaccess.h. I wasn't sure what was clearest there and am
probably too close to the code to have a clear opinion.
The series depends on the x86 shadow stack support:
https://lore.kernel.org/lkml/20230227222957.24501-1-rick.p.edgecombe@intel.…
I've rebased this onto v6.5-rc1 but not included it in the series in
order to avoid confusion with Rick's work and cut down the size of the
series, you can see the branch at:
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc.git arm64-gcs
[1] https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Deepak Gupta (1):
prctl: arch-agnostic prctl for shadow stack
Mark Brown (34):
prctl: Add flag for shadow stack writeability and push/pop
arm64: Document boot requirements for Guarded Control Stacks
arm64/gcs: Document the ABI for Guarded Control Stacks
arm64/sysreg: Add new system registers for GCS
arm64/sysreg: Add definitions for architected GCS caps
arm64/gcs: Add manual encodings of GCS instructions
arm64/gcs: Provide copy_to_user_gcs()
arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS)
arm64/mm: Allocate PIE slots for EL0 guarded control stack
mm: Define VM_SHADOW_STACK for arm64 when we support GCS
arm64/mm: Map pages for guarded control stack
KVM: arm64: Manage GCS registers for guests
arm64: Disable traps for GCS usage at EL0 and EL1
arm64/idreg: Add overrride for GCS
arm64/hwcap: Add hwcap for GCS
arm64/traps: Handle GCS exceptions
arm64/mm: Handle GCS data aborts
arm64/gcs: Context switch GCS registers for EL0
arm64/gcs: Allocate a new GCS for threads with GCS enabled
arm64/gcs: Implement shadow stack prctl() interface
arm64/mm: Implement map_shadow_stack()
arm64/signal: Set up and restore the GCS context for signal handlers
arm64/signal: Expose GCS state in signal frames
arm64/ptrace: Expose GCS via ptrace and core files
arm64: Add Kconfig for Guarded Control Stack (GCS)
kselftest/arm64: Verify the GCS hwcap
kselftest/arm64: Add GCS as a detected feature in the signal tests
kselftest/arm64: Add framework support for GCS to signal handling tests
kselftest/arm64: Allow signals tests to specify an expected si_code
kselftest/arm64: Always run signals tests with GCS enabled
kselftest/arm64: Add very basic GCS test program
kselftest/arm64: Add a GCS test program built with the system libc
selftests/arm64: Add GCS signal tests
kselftest/arm64: Enable GCS for the FP stress tests
Documentation/admin-guide/kernel-parameters.txt | 3 +
Documentation/arch/arm64/booting.rst | 22 ++
Documentation/arch/arm64/elf_hwcaps.rst | 3 +
Documentation/arch/arm64/gcs.rst | 216 +++++++++++++
Documentation/arch/arm64/index.rst | 1 +
Documentation/filesystems/proc.rst | 2 +-
arch/arm64/Kconfig | 19 ++
arch/arm64/include/asm/cpufeature.h | 6 +
arch/arm64/include/asm/el2_setup.h | 9 +
arch/arm64/include/asm/esr.h | 26 +-
arch/arm64/include/asm/exception.h | 2 +
arch/arm64/include/asm/gcs.h | 88 ++++++
arch/arm64/include/asm/hwcap.h | 1 +
arch/arm64/include/asm/kvm_host.h | 12 +
arch/arm64/include/asm/pgtable-prot.h | 14 +-
arch/arm64/include/asm/processor.h | 6 +
arch/arm64/include/asm/sysreg.h | 20 ++
arch/arm64/include/asm/uaccess.h | 42 +++
arch/arm64/include/uapi/asm/hwcap.h | 1 +
arch/arm64/include/uapi/asm/ptrace.h | 7 +
arch/arm64/include/uapi/asm/sigcontext.h | 9 +
arch/arm64/kernel/cpufeature.c | 23 ++
arch/arm64/kernel/cpuinfo.c | 1 +
arch/arm64/kernel/entry-common.c | 23 ++
arch/arm64/kernel/idreg-override.c | 2 +
arch/arm64/kernel/process.c | 77 +++++
arch/arm64/kernel/ptrace.c | 50 +++
arch/arm64/kernel/signal.c | 240 +++++++++++++-
arch/arm64/kernel/traps.c | 11 +
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 17 +
arch/arm64/kvm/sys_regs.c | 22 ++
arch/arm64/mm/Makefile | 1 +
arch/arm64/mm/fault.c | 75 ++++-
arch/arm64/mm/gcs.c | 202 ++++++++++++
arch/arm64/mm/mmap.c | 17 +-
arch/arm64/tools/cpucaps | 1 +
arch/arm64/tools/sysreg | 55 ++++
fs/proc/task_mmu.c | 3 +
include/linux/mm.h | 15 +-
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 5 +-
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 19 ++
kernel/sys.c | 20 ++
kernel/sys_ni.c | 1 +
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/abi/hwcap.c | 19 ++
tools/testing/selftests/arm64/fp/assembler.h | 15 +
tools/testing/selftests/arm64/fp/fpsimd-test.S | 2 +
tools/testing/selftests/arm64/fp/sve-test.S | 2 +
tools/testing/selftests/arm64/fp/za-test.S | 2 +
tools/testing/selftests/arm64/fp/zt-test.S | 2 +
tools/testing/selftests/arm64/gcs/.gitignore | 2 +
tools/testing/selftests/arm64/gcs/Makefile | 19 ++
tools/testing/selftests/arm64/gcs/basic-gcs.c | 350 +++++++++++++++++++++
tools/testing/selftests/arm64/gcs/gcs-util.h | 65 ++++
tools/testing/selftests/arm64/gcs/libc-gcs.c | 217 +++++++++++++
tools/testing/selftests/arm64/signal/.gitignore | 1 +
.../testing/selftests/arm64/signal/test_signals.c | 17 +-
.../testing/selftests/arm64/signal/test_signals.h | 6 +
.../selftests/arm64/signal/test_signals_utils.c | 32 +-
.../selftests/arm64/signal/test_signals_utils.h | 39 +++
.../arm64/signal/testcases/gcs_exception_fault.c | 59 ++++
.../selftests/arm64/signal/testcases/gcs_frame.c | 78 +++++
.../arm64/signal/testcases/gcs_write_fault.c | 67 ++++
.../selftests/arm64/signal/testcases/testcases.c | 7 +
.../selftests/arm64/signal/testcases/testcases.h | 1 +
67 files changed, 2363 insertions(+), 32 deletions(-)
---
base-commit: 023ee2d672f3d7c2d15acf62bcfc4bc49c3677e5
change-id: 20230303-arm64-gcs-e311ab0d8729
Best regards,
--
Mark Brown <broonie(a)kernel.org>
We have some KUnit tests for ASoC but they're not being run as much as
they should be since ASoC isn't enabled in the configs used by default
with KUnit and in the case of the topology tests there is no way to
enable them without enabling drivers that use them. This series
provides a Kconfig option which KUnit can use directly rather than worry
about drivers.
Further, since KUnit is typically run in UML but ALSA prevents build
with UML we need to remove that Kconfig conflict. As far as I can tell
the motivation for this is that many ALSA drivers use iomem APIs which
are not available under UML and it's more trouble than it's worth to go
through and add per driver dependencies. In order to avoid these issues
we also provide stubs for these APIs so there are no build time issues
if a driver relies on iomem but does not depend on it. With these stubs
I am able to build all the sound drivers available in a UML defconfig
(UML allmodconfig appears to have substantial other issues in a quick
test).
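The stub approach mentioned above follows a common pattern: when a feature is configured out, provide an inline replacement that always fails, so callers still build. A generic sketch in standalone C is given below. TOY_HAS_IOMEM, toy_ioremap and toy_probe are made-up names, and the NULL return of the stub is an assumption rather than what the kernel stubs return.

```c
#include <assert.h>
#include <stddef.h>

#ifdef TOY_HAS_IOMEM
/* "Real" implementation: hand out addresses inside a fake register block. */
static unsigned char toy_regs[256];

void *toy_ioremap(size_t offset, size_t size)
{
	if (offset > sizeof(toy_regs) || size > sizeof(toy_regs) - offset)
		return NULL;
	return &toy_regs[offset];
}
#else
/* Stub: the API exists so callers compile, but mapping always fails. */
static inline void *toy_ioremap(size_t offset, size_t size)
{
	(void)offset;
	(void)size;
	return NULL;
}
#endif

/* A "driver" probe that builds in both configurations. */
int toy_probe(void)
{
	void *base = toy_ioremap(0, 16);

	return base ? 0 : -1;
}
```

Built without -DTOY_HAS_IOMEM, toy_probe() compiles and fails cleanly at run time rather than breaking the build.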
With this series I am able to run the topology KUnit tests as part of a
kunit --alltests run.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Add support for building ALSA with UML.
- Link to v1: https://lore.kernel.org/r/20230712-asoc-topology-kunit-enable-v1-0-b9f2da9d…
---
Mark Brown (5):
driver core: Provide stubs for !IOMEM builds
platform: Provide stubs for !HAS_IOMEM builds
ALSA: Enable build with UML
kunit: Enable ASoC in all_tests.config
ASoC: topology: Add explicit build option
include/linux/device.h | 26 ++++++++++++++++++++++++++
include/linux/platform_device.h | 28 ++++++++++++++++++++++++++++
sound/Kconfig | 4 ----
sound/soc/Kconfig | 11 +++++++++++
tools/testing/kunit/configs/all_tests.config | 5 +++++
5 files changed, 70 insertions(+), 4 deletions(-)
---
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
change-id: 20230701-asoc-topology-kunit-enable-5e8dd50d0ed7
Best regards,
--
Mark Brown <broonie(a)kernel.org>
We have some KUnit tests for ASoC but they're not being run as much as
they should be since ASoC isn't enabled in the configs used by default
with KUnit and in the case of the topology tests there is no way to
enable them without enabling drivers that use them. Let's improve that.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (2):
kunit: Enable ASoC in all_tests.config
ASoC: topology: Add explicit build option
sound/soc/Kconfig | 11 +++++++++++
tools/testing/kunit/configs/all_tests.config | 5 +++++
2 files changed, 16 insertions(+)
---
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
change-id: 20230701-asoc-topology-kunit-enable-5e8dd50d0ed7
Best regards,
--
Mark Brown <broonie(a)kernel.org>
In order to select a nexthop for multipath routes, fib_select_multipath()
is used with legacy nexthops and nexthop_select_path_hthr() is used with
nexthop objects. Those two functions perform a validity test on the
neighbor related to each nexthop but their logic is structured differently.
This causes a divergence in behavior and nexthop_select_path_hthr() may
return a nexthop that failed the neighbor validity test even if there was
one that passed.
Refactor nexthop_select_path_hthr() to make it more similar to
fib_select_multipath() and fix the problem mentioned above.
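The divergence described above can be illustrated with a small model. This is only a sketch in Python, not the kernel code: the Nexthop class, the select_old()/select_new() helpers and the sample values are hypothetical, and the case where no nexthop has a valid neighbor is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Nexthop:
    # Hypothetical, simplified stand-in for a nexthop group entry.
    name: str
    upper_bound: int   # hash threshold upper bound of this entry
    neigh_valid: bool  # result of the neighbor validity test


def select_old(nhs: List[Nexthop], h: int) -> Optional[Nexthop]:
    """Model of the old logic: check the hash threshold first, then validity."""
    fallback = None
    for nh in nhs:
        if h > nh.upper_bound:
            continue
        if nh.neigh_valid:
            return nh
        if fallback is None:
            fallback = nh  # may be an entry that failed the validity test
    return fallback


def select_new(nhs: List[Nexthop], h: int) -> Optional[Nexthop]:
    """Model of the fib_select_multipath()-like logic: check validity first."""
    first_valid = None
    for nh in nhs:
        if not nh.neigh_valid:
            continue
        if first_valid is None:
            first_valid = nh  # remembered in case no threshold matches
        if h > nh.upper_bound:
            continue
        return nh
    return first_valid


nhs = [Nexthop("A", 10, True), Nexthop("B", 20, False)]
old_pick = select_old(nhs, 15)
new_pick = select_new(nhs, 15)
```

With hash 15, the old model returns B, which failed the validity test, although A passed it; the new model falls back to A.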
Benjamin Poirier (4):
nexthop: Factor out hash threshold fdb nexthop selection
nexthop: Factor out neighbor validity check
nexthop: Do not return invalid nexthop object during multipath
selection
selftests: net: Add test cases for nexthop groups with invalid
neighbors
net/ipv4/nexthop.c | 64 +++++++---
tools/testing/selftests/net/fib_nexthops.sh | 129 ++++++++++++++++++++
2 files changed, 174 insertions(+), 19 deletions(-)
--
2.40.1
The current selftests infrastructure formats the results in TAP 13. This
version doesn't support subtests and only the end result of each
selftest is taken into account. It means that a single issue in a
subtest of a selftest containing multiple subtests forces the whole
selftest to be marked as failed. It also means that subtests results are
not tracked by CI executing selftests.
MPTCP selftests run hundreds of various subtests. It is then important
to track each of them and not one result per selftest.
It is particularly interesting to do that when validating stable kernels
with the latest version of the test suite: tests might fail because a
feature is not supported but the test didn't skip that part. In this
case, if subtests are not tracked, the whole selftest will be marked as
failed making the other subtests useless because their results are
ignored.
Regarding this patch set:
- The first two patches modify connect and userspace_pm selftests to
continue executing other tests if there is an error before the end.
This is what is done in the other MPTCP selftests.
- Patches 3-5 are refactoring the code in userspace_pm selftest to
reduce duplicated code, suppress some shellcheck warnings and prepare
subtests' support by using new helpers.
- Patch 6 adds new helpers in mptcp_lib.sh to easily support printing
the subtests results in the different MPTCP selftests.
- Patches 7-13 format subtests results in TAP 13 in the different MPTCP
selftests.
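As a rough illustration of the idea (not the mptcp_lib.sh helpers themselves, which are shell), the sketch below formats a list of subtest results as TAP 13 lines. It assumes the lines are prefixed with "# " so that a TAP 13 parser without subtest support sees them as diagnostics; the function name, the prefix and the sample subtest names are all hypothetical.

```python
def format_subtests_tap(results, prefix="# "):
    """Return TAP 13 lines for a list of (name, status) subtest results.

    status is one of "pass", "skip" or "fail". The "# " prefix is an
    assumption made for this sketch.
    """
    lines = [f"{prefix}TAP version 13", f"{prefix}1..{len(results)}"]
    for num, (name, status) in enumerate(results, start=1):
        if status == "pass":
            line = f"ok {num} - {name}"
        elif status == "skip":
            line = f"ok {num} - {name} # SKIP"
        else:
            line = f"not ok {num} - {name}"
        lines.append(prefix + line)
    return lines


tap = format_subtests_tap([
    ("connect: ping tests", "pass"),
    ("connect: feature not supported", "skip"),
    ("connect: loopback v4", "fail"),
])
print("\n".join(tap))
```

Each subtest gets its own result line, so a single failure no longer hides the outcome of the other subtests.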
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Matthieu Baerts (13):
selftests: mptcp: connect: don't stop if error
selftests: mptcp: userspace pm: don't stop if error
selftests: mptcp: userspace_pm: fix shellcheck warnings
selftests: mptcp: userspace_pm: uniform results printing
selftests: mptcp: userspace_pm: reduce dup code around printf
selftests: mptcp: lib: format subtests results in TAP
selftests: mptcp: connect: format subtests results in TAP
selftests: mptcp: pm_netlink: format subtests results in TAP
selftests: mptcp: join: format subtests results in TAP
selftests: mptcp: diag: format subtests results in TAP
selftests: mptcp: simult flows: format subtests results in TAP
selftests: mptcp: sockopt: format subtests results in TAP
selftests: mptcp: userspace_pm: format subtests results in TAP
tools/testing/selftests/net/mptcp/diag.sh | 7 +
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 66 ++++++--
tools/testing/selftests/net/mptcp/mptcp_join.sh | 37 ++++-
tools/testing/selftests/net/mptcp/mptcp_lib.sh | 66 ++++++++
tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 20 ++-
tools/testing/selftests/net/mptcp/pm_netlink.sh | 6 +-
tools/testing/selftests/net/mptcp/simult_flows.sh | 4 +
tools/testing/selftests/net/mptcp/userspace_pm.sh | 181 +++++++++++++--------
8 files changed, 298 insertions(+), 89 deletions(-)
---
base-commit: 60cc1f7d0605598b47ee3c0c2b4b6fbd4da50a06
change-id: 20230712-upstream-net-next-20230712-selftests-mptcp-subtests-25d250d77886
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
Hello everyone,
This is an RFC patch series to propose the addition of a test attributes
framework to KUnit.
There has been interest in filtering out "slow" KUnit tests. Most notably,
a new config, CONFIG_MEMCPY_SLOW_KUNIT_TEST, has been added to exclude a
particularly slow memcpy test
(https://lore.kernel.org/all/20230118200653.give.574-kees@kernel.org/).
This proposed attributes framework would be used to save and access test
associated data, including whether a test is slow. These attributes would
be reportable (via KTAP and command line output) and some will be
filterable.
This framework is designed to allow for the addition of other attributes in
the future. These attributes could include whether the test is flaky,
associated test files, etc.
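To make the idea concrete, here is a minimal sketch in Python of attribute-based filtering. It is not the KUnit implementation: the speed ordering, the default value, the helper names and the test names are assumptions made for the example. The skip_filtered argument models the difference between dropping filtered tests and reporting them as skipped.

```python
# Assumed ordering from slowest to fastest; a test without a speed
# attribute is treated as "normal" in this sketch.
SPEED_ORDER = ["very_slow", "slow", "normal"]


def speed_at_least(minimum):
    """Build a predicate keeping tests at least as fast as `minimum`."""
    def predicate(attrs):
        speed = attrs.get("speed", "normal")
        return SPEED_ORDER.index(speed) >= SPEED_ORDER.index(minimum)
    return predicate


def apply_filter(tests, predicate, skip_filtered=False):
    """Return (name, action) pairs; action is "run" or "skip".

    Filtered-out tests are omitted entirely unless skip_filtered is set,
    in which case they are kept and marked as skipped.
    """
    plan = []
    for name, attrs in tests:
        if predicate(attrs):
            plan.append((name, "run"))
        elif skip_filtered:
            plan.append((name, "skip"))
    return plan


tests = [
    ("memcpy_large_test", {"speed": "slow"}),  # hypothetical test names
    ("example_simple_test", {}),
]
plan_drop = apply_filter(tests, speed_at_least("normal"))
plan_skip = apply_filter(tests, speed_at_least("normal"), skip_filtered=True)
```

Other attributes (module name, flakiness, etc.) would fit the same pattern with a different predicate.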
This is the second version of the RFC. I have added a few big changes:
- Change method for inputting filters to allow for spaces in filtering
values
- Add option to skip filtered tests, instead of not running or showing
them, with the --filter_skip flag
- Separate the new feature to list tests and their attributes into both
--list_tests (lists just tests) and --list_tests_attr (lists all)
- Add new attribute to store module name associated with test
- Add Tests to executor_test.c
- Add Documentation
- A few small changes to code commented on previously
I would love to hear about the new features. If the series seems overall
good I will send out the next version as an official patch series.
Thanks!
Rae
Rae Moar (9):
kunit: Add test attributes API structure
kunit: Add speed attribute
kunit: Add module attribute
kunit: Add ability to filter attributes
kunit: tool: Add command line interface to filter and report
attributes
kunit: memcpy: Mark tests as slow using test attributes
kunit: time: Mark test as slow using test attributes
kunit: add tests for filtering attributes
kunit: Add documentation of KUnit test attributes
.../dev-tools/kunit/running_tips.rst | 163 +++++++
include/kunit/attributes.h | 50 +++
include/kunit/test.h | 68 ++-
kernel/time/time_test.c | 2 +-
lib/Kconfig.debug | 3 +
lib/kunit/Makefile | 3 +-
lib/kunit/attributes.c | 406 ++++++++++++++++++
lib/kunit/executor.c | 115 ++++-
lib/kunit/executor_test.c | 119 ++++-
lib/kunit/kunit-example-test.c | 9 +
lib/kunit/test.c | 27 +-
lib/memcpy_kunit.c | 8 +-
tools/testing/kunit/kunit.py | 80 +++-
tools/testing/kunit/kunit_kernel.py | 6 +-
tools/testing/kunit/kunit_tool_test.py | 39 +-
15 files changed, 1022 insertions(+), 76 deletions(-)
create mode 100644 include/kunit/attributes.h
create mode 100644 lib/kunit/attributes.c
base-commit: 2e66833579ed759d7b7da1a8f07eb727ec6e80db
--
2.41.0.255.g8b1d071c50-goog
This short series is a follow up to the recent series [1] which added
per-cpu insert/delete statistics for maps. The bpf_map_sum_elem_count
kfunc presented in the original series was only available to tracing
programs, so let's make it available to all.
The first patch allows CONST_PTR_TO_MAP to be treated as a trusted pointer
from the kfunc's point of view.
The second patch just adds const to the map argument of the
bpf_map_sum_elem_count kfunc.
The third patch registers the bpf_map_sum_elem_count for all programs,
and patches selftests correspondingly.
Anton Protopopov (3):
bpf: consider CONST_PTR_TO_MAP as trusted pointer to struct bpf_map
bpf: make an argument const in the bpf_map_sum_elem_count kfunc
bpf: allow any program to use the bpf_map_sum_elem_count kfunc
include/linux/btf_ids.h | 1 +
kernel/bpf/map_iter.c | 7 +++----
kernel/bpf/verifier.c | 5 ++++-
tools/testing/selftests/bpf/progs/map_ptr_kern.c | 5 +++++
4 files changed, 13 insertions(+), 5 deletions(-)
--
2.34.1
Hi,
This patch series aims to improve the PMU event filter settings with a cleaner
and more organized structure and adds several test cases related to PMU event
filters.
These changes help to ensure that KVM's PMU event filter functions as expected
in all supported use cases.
Any feedback or suggestions are greatly appreciated.
Sincerely,
Jinrong Liang
Change log:
v4:
- Rebased to 88bb466c9dec(tag: kvm-x86-next-2023.06.22);
- Add a patch to add macros for fixed counters in processor.h;
- Add a patch to drop the return of remove_event(); (Sean)
- Reverse xmas tree; (Sean)
- Optimize code style and comments; (Sean)
Previous:
https://lore.kernel.org/kvm/20230607123700.40229-1-cloudliang@tencent.com/T
Jinrong Liang (6):
KVM: selftests: Add macros for fixed counters in processor.h
KVM: selftests: Drop the return of remove_event()
KVM: selftests: Introduce __kvm_pmu_event_filter to improved event
filter settings
KVM: selftests: Add test cases for unsupported PMU event filter input
values
KVM: selftests: Test if event filter meets expectations on fixed
counters
KVM: selftests: Test gp event filters don't affect fixed event filters
.../selftests/kvm/include/x86_64/processor.h | 2 +
.../kvm/x86_64/pmu_event_filter_test.c | 314 ++++++++++++------
2 files changed, 222 insertions(+), 94 deletions(-)
base-commit: 88bb466c9dec4f70d682cf38c685324e7b1b3d60
--
2.39.3
When looking for something else in LKFT reports [1], I noticed that the
TC selftest ended with a timeout error:
not ok 1 selftests: tc-testing: tdc.sh # TIMEOUT 45 seconds
I also noticed most of the tests were skipped because the "teardown
stage" did not complete successfully. It was due to missing kconfig.
These patches fix these two errors plus an extra one because this
selftest reads info from "/proc/net/nf_conntrack". Thank you Pedro for
helping me fix these issues [2].
Link: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230711/te… [1]
Link: https://lore.kernel.org/netdev/0e061d4a-9a23-9f58-3b35-d8919de332d7@tessare… [2]
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Matthieu Baerts (3):
selftests: tc: set timeout to 15 minutes
selftests: tc: add 'ct' action kconfig dep
selftests: tc: add ConnTrack procfs kconfig
tools/testing/selftests/tc-testing/config | 2 ++
tools/testing/selftests/tc-testing/settings | 1 +
2 files changed, 3 insertions(+)
---
base-commit: 9d23aac8a85f69239e585c8656c6fdb21be65695
change-id: 20230713-tc-selftests-lkft-363e4590f105
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
Hi Willy, Zhangjin,
after your recent discussions about the test output and report I
wondered if it would make sense to switch nolibc-test to KTAP output
format [0].
With this it would be possible to have a wrapper script run each
architecture test as its own test subcomponent.
A (K)TAP parser/runner could then directly recognize and report failing
testcases, making it easier to validate.
Also maybe we can hook it up into the regular kselftests setup and have
the bots run it as part of that.
The kernel even includes a header-only library to implement the format [1].
It also should be fairly easy to emit the format without a library.
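As a sketch of how little is needed, the Python snippet below emits KTAP-style output with one nested subtest block per architecture. It is an illustration only: the function name, the architecture and test case names are made up, and the two-space indentation of nested results is an assumption to check against ktap.rst.

```python
def emit_ktap(suites, indent="  "):
    """Return KTAP lines for a list of (arch, [(case, passed), ...]).

    Each architecture is reported as one top-level test whose cases are
    nested below it. The indentation width is an assumption.
    """
    lines = ["KTAP version 1", f"1..{len(suites)}"]
    for num, (arch, cases) in enumerate(suites, start=1):
        lines.append(f"{indent}KTAP version 1")
        lines.append(f"{indent}1..{len(cases)}")
        all_ok = True
        for idx, (case, passed) in enumerate(cases, start=1):
            status = "ok" if passed else "not ok"
            lines.append(f"{indent}{status} {idx} {case}")
            all_ok = all_ok and passed
        # The architecture passes only if all of its cases passed.
        lines.append(f"{'ok' if all_ok else 'not ok'} {num} {arch}")
    return lines


ktap = emit_ktap([
    ("x86_64", [("getpid", True), ("chmod_net", False)]),
    ("arm64", [("getpid", True)]),
])
print("\n".join(ktap))
```

A wrapper script could build the suites list by running nolibc-test once per architecture and collecting the per-test results.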
Thomas
[0] Documentation/dev-tools/ktap.rst
[1] Documentation/dev-tools/kselftest.rst (Test harness)
Willy, Thomas
We just sent the 'selftests/nolibc: allow run with minimal kernel
config' series [1]. Here is the 'minimal' kernel config support. With
both of them, it is possible to run nolibc-test for all architectures
with a one-line command in less than ~30 minutes - 1 hour (not fully
measured yet):
// run with tiny config + qemu-system
// Note: rv32 and loongarch require downloading the bios first
$ time make run-tiny-all QUIET_RUN=1
// run with default config + qemu-system
$ time make run-default-all QUIET_RUN=1
// run with qemu-user
$ time make run-user-all QUIET_RUN=1
Besides the 'tinyconfig' suggestion from Thomas, this patchset also merges
the generic part of my local powerpc porting (the extconfig to add
additional console support).
This is applied after the test report patchset [2] and the rv32 compile
patchset [3], because all of them touched the same Makefile.
Even without the 'selftests/nolibc: allow run with minimal kernel
config' series [1], all of the tests pass except the /proc/self/net
related ones (we haven't enabled CONFIG_NET in this patchset). The
chmod_net one will be removed by Thomas in patchset [4] because of the
wrong chmodable attribute of /proc/self/net. The link_cross one can be
fixed simply by using another /proc/self interface (like
/proc/self/cmdline), which will be covered in our revision of the [1]
series.
Besides the core 'minimal' config support, some generic patches are added
as well to avoid patch conflicts.
* selftests/nolibc: add test for -include /path/to/nolibc.h
Add a test switch to allow running nolibc-test with nolibc.h
* selftests/nolibc: print result to the screen too
Let the run targets print results by default, allow disabling with
QUIET_RUN=1
* selftests/nolibc: allow use x86_64 toolchain for i386
Allow using x86_64 toolchains for i386
* selftests/nolibc: add menuconfig target for manual config
a new 'menuconfig' target is added for development and debugging
* selftests/nolibc: add tinyconfig target
a new 'tinyconfig' target; compared to 'defconfig' it is smaller and
faster, but not enough to boot and print, so it requires the following
'extconfig' target
* selftests/nolibc: allow customize extra kernel config options
a new 'extconfig' target allows adding extra config options to 'defconfig'
and 'tinyconfig'
* selftests/nolibc: add common extra config options
selftests/nolibc: add power reset control support
selftests/nolibc: add procfs, shmem and tmpfs
Add common extra configs. The 3rd one (procfs, shmem and tmpfs) can be
completely reverted after the [1] series, but as discussed with Thomas,
procfs may still be a hard requirement.
* selftests/nolibc: add extra configs for i386
selftests/nolibc: add extra configs for x86_64
selftests/nolibc: add extra configs for arm64
selftests/nolibc: add extra configs for arm
selftests/nolibc: add extra configs for mips
selftests/nolibc: add extra configs for riscv32
selftests/nolibc: add extra configs for riscv64
selftests/nolibc: add extra configs for s390x
selftests/nolibc: add extra configs for loongarch
Add architecture specific extra configs to let the kernel boot and
nolibc-test print. The rv32 one added here is only for testing; it should
not be merged before the missing 64bit syscalls are added (still waiting
for the merging of the __sysret and -ENOSYS patches).
* selftests/nolibc: config default CROSS_COMPILE
selftests/nolibc: add run-tiny and run-default
both run-tiny and run-default are added to do config and run together,
which eases testing a lot.
* selftests/nolibc: allow run tests on all targets
selftests/nolibc: detect bios existing to avoid hang
Further allow running run-user, run-tiny and run-default for all
architectures at once; the -all suffix is added to do so.
Since some generic patches are still in review, before sending the
remaining rv32 patches, I will send more generic patches later. The coming
one is the arch-xxx.h cleanup, and then the 32bit powerpc porting support.
For the compile speedup, the next step may be to add architecture specific
'O' support, which may allow us to rerun across architectures without
mrproper. For single architecture development, this 'minimal' config
should be enough ;-)
Thanks.
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/lkml/cover.1687344643.git.falcon@tinylab.org/
[2]: https://lore.kernel.org/lkml/cover.1687156559.git.falcon@tinylab.org/
[3]: https://lore.kernel.org/linux-riscv/cover.1687176996.git.falcon@tinylab.org/
[4]: https://lore.kernel.org/lkml/20230624-proc-net-setattr-v1-0-73176812adee@we…
Zhangjin Wu (22):
selftests/nolibc: add test for -include /path/to/nolibc.h
selftests/nolibc: print result to the screen too
selftests/nolibc: allow use x86_64 toolchain for i386
selftests/nolibc: add menuconfig target for manual config
selftests/nolibc: add tinyconfig target
selftests/nolibc: allow customize extra kernel config options
selftests/nolibc: add common extra config options
selftests/nolibc: add power reset control support
selftests/nolibc: add procfs, shmem and tmpfs
selftests/nolibc: add extra configs for i386
selftests/nolibc: add extra configs for x86_64
selftests/nolibc: add extra configs for arm64
selftests/nolibc: add extra configs for arm
selftests/nolibc: add extra configs for mips
selftests/nolibc: add extra configs for riscv32
selftests/nolibc: add extra configs for riscv64
selftests/nolibc: add extra configs for s390x
selftests/nolibc: add extra configs for loongarch
selftests/nolibc: config default CROSS_COMPILE
selftests/nolibc: add run-tiny and run-default
selftests/nolibc: allow run tests on all targets
selftests/nolibc: detect bios existing to avoid hang
tools/testing/selftests/nolibc/Makefile | 125 ++++++++++++++++++++++--
1 file changed, 119 insertions(+), 6 deletions(-)
--
2.25.1
Hi Linus,
Please pull the following Kselftest fixes update for Linux 6.5-rc3.
This Kselftest fixes update for Linux 6.5-rc3 consists of fixes to
bugs that are interfering with arm64 and riscv workflows. This update
also includes two fixes to timer and mincore tests that are causing
test failures.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5:
Linux 6.5-rc1 (2023-07-09 13:53:13 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux-kselftest-fixes-6.5-rc3
for you to fetch changes up to 569f8b501b177f21121d483a96491716ab8905f4:
selftests/arm64: fix build failure during the "emit_tests" step (2023-07-14 12:33:35 -0600)
----------------------------------------------------------------
linux-kselftest-fixes-6.5-rc3
This Kselftest fixes update for Linux 6.5-rc3 consists of fixes to
bugs that are interfering with arm64 and riscv workflows. This update
also includes two fixes to timer and mincore tests that are causing
test failures.
----------------------------------------------------------------
John Hubbard (2):
selftests/riscv: fix potential build failure during the "emit_tests" step
selftests/arm64: fix build failure during the "emit_tests" step
Minjie Du (1):
tools: timers: fix freq average calculation
Ricardo Cañuelo (1):
selftests/mincore: fix skip condition for check_huge_pages test
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/mincore/mincore_selftest.c | 4 ++--
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/timers/raw_skew.c | 3 +--
4 files changed, 5 insertions(+), 6 deletions(-)
----------------------------------------------------------------
*Changes in v25*:
- Do proper filtering on hole as well (hole got missed earlier)
*Changes in v24*:
- Rebase on top of next-20230710
- Place WP markers in case of hole as well
*Changes in v23*:
- Set vec_buf_index in loop only when vec_buf_index is set
- Return -EFAULT instead of -EINVAL if vec is NULL
- Correctly return the walk ending address to the page granularity
*Changes in v22*:
- Interface change:
- Replace [start, start + len) with [start, end)
- Return the ending address of the address walk in start
*Changes in v21*:
- Abort walk instead of returning error if WP is to be performed on
partial hugetlb
*Changes in v20*
- Correct PAGE_IS_FILE and add PAGE_IS_PFNZERO
*Changes in v19*
- Minor changes and interface updates
*Changes in v18*
- Rebase on top of next-20230613
- Minor updates
*Changes in v17*
- Rebase on top of next-20230606
- Minor improvements in PAGEMAP_SCAN IOCTL patch
*Changes in v16*
- Fix a corner case
- Add exclusive PM_SCAN_OP_WP back
*Changes in v15*
- Build fix (Add missed build fix in RESEND)
*Changes in v14*
- Fix build error caused by #ifdef added at last minute in some configs
*Changes in v13*
- Rebase on top of next-20230414
- Give-up on using uffd_wp_range() and write new helpers, flush tlb only
once
*Changes in v12*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebase on top of next-20230406
- Review updates
*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
UFFDIO_WRITEPROTECT
*Changes in v10*
- Add specific condition to return error if hugetlb is used with wp
async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
*Motivation*
The real motivation for adding PAGEMAP_SCAN IOCTL is to emulate Windows
GetWriteWatch() syscall [1]. GetWriteWatch() retrieves the addresses of
the pages that are written to in a region of virtual memory.
This syscall is used in Windows applications and games etc. It is
currently emulated in userspace in a pretty slow manner. Our purpose is to
enhance the kernel so that it can be emulated efficiently.
Currently some out of tree hack patches are being used to efficiently
emulate it in some kernels. We intend to replace those with these patches
so that gaming on Linux as a whole can benefit from this. It
means there would be tons of users of this code.
CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.
Andrei defines the following uses of this code:
* it is more granular and allows us to track changed pages more
effectively. The current interface can clear dirty bits for the entire
process only. In addition, reading info about pages is a separate
operation. It means we must freeze the process to read information
about all its pages, reset dirty bits, only then we can start dumping
pages. The information about pages becomes more and more outdated,
while we are processing pages. The new interface solves both these
downsides. First, it allows us to read pte bits and clear the
soft-dirty bit atomically. It means that CRIU will not need to freeze
processes to pre-dump their memory. Second, it clears soft-dirty bits
for a specified region of memory. It means CRIU will have actual info
about pages to the moment of dumping them.
* The new interface has to be much faster because basic page filtering
is happening in the kernel. With the old interface, we have to read
pagemap for each page.
*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we felt that the kernel's soft-dirty
feature could be used under the hood with some additions like:
* reset soft-dirty flag for only a specific region of memory instead of
clearing the flag for the entire process
* get and clear soft-dirty flag for a specific region atomically
So we decided to use an ioctl on the pagemap file to read and/or reset the
soft-dirty flag. But using the soft-dirty flag, sometimes we get extra pages
which weren't even written. They had become soft-dirty because of VMA
merging and the VM_SOFTDIRTY flag. This breaks the definition of
GetWriteWatch(). We were able to bypass this shortcoming by ignoring
VM_SOFTDIRTY until David reported that mprotect etc. messes up the
soft-dirty flag while ignoring VM_SOFTDIRTY [5]. This wasn't happening until
[6] got introduced. We discussed whether we could revert these patches, but
we could not reach any conclusion. So at this point, I made a couple of
attempts to solve this whole VM_SOFTDIRTY issue by correcting the
soft-dirty implementation:
* [7] Correct the bug fixed wrongly back in 2014. It had the potential to
cause regressions. We left it behind.
* [8] Keep a list of the soft-dirty parts of a VMA across splits and merges.
The reply I got was not to increase the size of the VMA by 8 bytes.
At this point, we left soft-dirty behind, considering it too delicate, and
userfaultfd [9] seemed like the only way forward. From there onward, we
have been basing soft-dirty emulation on the userfaultfd wp feature where
the kernel resolves the faults itself when the WP_ASYNC feature is used. It
was straightforward to add the WP_ASYNC feature to userfaultfd. Now only
those pages which have really been written are reported as dirty or
written-to. (PS: another userfaultfd feature, WP_UNPOPULATED, is required
to avoid pre-faulting memory before write-protecting [9].)
All the different masks were added at the request of the CRIU devs to make
the interface more generic and better.
[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com
*Original Cover letter from v8*
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As
kernel already has soft-dirty feature inside which we have given up to
use, we are using written-to terminology while using UFFD async WP under
the hood.
This IOCTL, PAGEMAP_SCAN on pagemap file can be used to get and/or clear
the info about page table entries. The following operations are
supported in this ioctl:
- Get the information if the pages have been written-to (PAGE_IS_WRITTEN),
file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped
(PAGE_IS_SWAPPED).
- Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which
pages have been written-to.
- Find pages which have been written-to and write protect the pages
(atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE)
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
Some benchmarks can be seen here[1]. This series adds features that weren't
present earlier:
- There is no atomic get soft-dirty/Written-to status and clear present in
the kernel.
- The pages which have been written-to cannot be found in an accurate way.
(Kernel's soft-dirty PTE bit + soft-dirty VMA bit show more soft-dirty
pages than there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have the use case where we need to track the soft-dirty PTE bit for
only specific pages on-demand. We need this tracking and clear mechanism
of a region of memory while the process is running to emulate the
GetWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of soft-dirty feature to find pages which
have been written-to from v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. It is too delicate and wrong as it shows more soft-dirty
pages than the actual soft-dirty pages. There is no interest in
correcting it [2][3] as this is how the feature was written years ago.
Its behaviour shouldn't be changed now. Peter Xu has suggested
using the async version of the UFFD WP [4] as it is based inherently
on the PTEs.
So in this patch series, I've added a new mode to the UFFD which is
asynchronous version of the write protect. When this variant of the
UFFD WP is used, the page faults are resolved automatically by the
kernel. The pages which have been written-to can be found by reading
pagemap file (!PM_UFFD_WP). This feature can be used successfully to
find which pages have been written to from the time the pages were
write protected. This works just like the soft-dirty flag without
showing any extra pages which aren't soft-dirty in reality.
The information related to pages if the page is file mapped, present and
swapped is required for the CRIU project [5][6]. The addition of the
required mask, any mask, excluded mask and return masks are also required
for the CRIU project [5].
The IOCTL returns the addresses of the pages which match the specific
masks. The page addresses are returned in struct page_region in a compact
form. The max_pages is needed to support a use case where user only wants
to get a specific number of pages. So there is no need to find all the
pages of interest in the range when max_pages is specified. The IOCTL
returns when the maximum number of the pages are found. The max_pages is
optional. If max_pages is specified, it must be equal to or greater than the
vec_size. This restriction is needed to handle the worst case when one
page_region only contains info of one page and it cannot be compacted.
This is needed to emulate the Windows GetWriteWatch() syscall.
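The compaction described above can be sketched as follows. This is a simplified userspace model in Python, not the kernel implementation: the [start, npages, flags] layout, the compact_regions() helper, the PAGE_SIZE value and the treatment of max_pages == 0 as "not specified" are assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed page size for the example


def compact_regions(pages, vec_size, max_pages=0):
    """Compact matching pages into at most vec_size regions.

    pages: (address, flags) tuples sorted by address, one per matching
    page. Adjacent pages with identical flags are merged into a single
    [start, npages, flags] region. The walk stops once max_pages pages
    were reported (0 means no limit here) or when a new region would
    not fit into the output vector.
    """
    regions = []
    found = 0
    for addr, flags in pages:
        if max_pages and found == max_pages:
            break
        last = regions[-1] if regions else None
        if last and last[2] == flags and last[0] + last[1] * PAGE_SIZE == addr:
            last[1] += 1  # extends the previous region
        else:
            if len(regions) == vec_size:
                break     # output vector is full
            regions.append([addr, 1, flags])
        found += 1
    return regions


WRITTEN = 1  # stand-in for a PAGE_IS_WRITTEN-like flag
pages = [(0x1000, WRITTEN), (0x2000, WRITTEN),
         (0x4000, WRITTEN), (0x5000, WRITTEN)]
all_regions = compact_regions(pages, vec_size=2)
limited = compact_regions(pages, vec_size=2, max_pages=3)
```

In the worst case no two pages are adjacent, so every reported page consumes one region of the output vector.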
The patch series includes a detailed selftest which can be used as an
example for the uffd async wp test and PAGEMAP_IOCTL. It shows the
interface usage as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (4):
fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
about PTEs
tools headers UAPI: Update linux/fs.h with the kernel sources
mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
selftests: mm: add pagemap ioctl tests
Peter Xu (1):
userfaultfd: UFFD_FEATURE_WP_ASYNC
Documentation/admin-guide/mm/pagemap.rst | 58 +
Documentation/admin-guide/mm/userfaultfd.rst | 35 +
fs/proc/task_mmu.c | 591 +++++++
fs/userfaultfd.c | 26 +-
include/linux/hugetlb.h | 1 +
include/linux/userfaultfd_k.h | 21 +-
include/uapi/linux/fs.h | 55 +
include/uapi/linux/userfaultfd.h | 9 +-
mm/hugetlb.c | 34 +-
mm/memory.c | 27 +-
tools/include/uapi/linux/fs.h | 55 +
tools/testing/selftests/mm/.gitignore | 2 +
tools/testing/selftests/mm/Makefile | 3 +-
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/pagemap_ioctl.c | 1464 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
16 files changed, 2362 insertions(+), 24 deletions(-)
create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh
--
2.39.2
kselftest.rst states that flags must be specified before including lib.mk,
but the vDSO selftest Makefile does not follow this order. As a result,
changes made by lib.mk to flags and other variables are overwritten by the
Makefile. For example, it is impossible to pass CFLAGS to the compiler via
make.
Rectify this by including lib.mk after assigning flag values.
Also change the paths of the generated programs from absolute to relative
paths as lib.mk will now correctly prepend the output directory path to
the program name as intended.
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Signed-off-by: Aditya Deshpande <aditya.deshpande(a)arm.com>
---
tools/testing/selftests/vDSO/Makefile | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/vDSO/Makefile b/tools/testing/selftests/vDSO/Makefile
index d53a4d8008f9..19145210d044 100644
--- a/tools/testing/selftests/vDSO/Makefile
+++ b/tools/testing/selftests/vDSO/Makefile
@@ -1,16 +1,15 @@
# SPDX-License-Identifier: GPL-2.0
-include ../lib.mk
-
uname_M := $(shell uname -m 2>/dev/null || echo not)
ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
-TEST_GEN_PROGS := $(OUTPUT)/vdso_test_gettimeofday $(OUTPUT)/vdso_test_getcpu
-TEST_GEN_PROGS += $(OUTPUT)/vdso_test_abi
-TEST_GEN_PROGS += $(OUTPUT)/vdso_test_clock_getres
+TEST_GEN_PROGS := vdso_test_gettimeofday
+TEST_GEN_PROGS += vdso_test_getcpu
+TEST_GEN_PROGS += vdso_test_abi
+TEST_GEN_PROGS += vdso_test_clock_getres
ifeq ($(ARCH),$(filter $(ARCH),x86 x86_64))
-TEST_GEN_PROGS += $(OUTPUT)/vdso_standalone_test_x86
+TEST_GEN_PROGS += vdso_standalone_test_x86
endif
-TEST_GEN_PROGS += $(OUTPUT)/vdso_test_correctness
+TEST_GEN_PROGS += vdso_test_correctness
CFLAGS := -std=gnu99
CFLAGS_vdso_standalone_test_x86 := -nostdlib -fno-asynchronous-unwind-tables -fno-stack-protector
@@ -19,7 +18,8 @@ ifeq ($(CONFIG_X86_32),y)
LDLIBS += -lgcc_s
endif
-all: $(TEST_GEN_PROGS)
+include ../lib.mk
+
$(OUTPUT)/vdso_test_gettimeofday: parse_vdso.c vdso_test_gettimeofday.c
$(OUTPUT)/vdso_test_getcpu: parse_vdso.c vdso_test_getcpu.c
$(OUTPUT)/vdso_test_abi: parse_vdso.c vdso_test_abi.c
--
2.25.1
When running rtctest, if we pass a wrong rtc device file as an argument,
the test fails as expected, but the logs it prints do not help
to point out the issue.
To handle this, the patch adds a check to verify that the rtc_file is valid.
Signed-off-by: Atul Kumar Pant <atulpant.linux(a)gmail.com>
---
changes since v3:
Added Linux-kselftest and Linux-kernel mailing lists.
changes since v2:
Changed error message when rtc file does not exist.
changes since v1:
Removed check for uid=0
If rtc file is invalid, then exit the test.
tools/testing/selftests/rtc/rtctest.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/rtc/rtctest.c b/tools/testing/selftests/rtc/rtctest.c
index 63ce02d1d5cc..630fef735c7e 100644
--- a/tools/testing/selftests/rtc/rtctest.c
+++ b/tools/testing/selftests/rtc/rtctest.c
@@ -17,6 +17,7 @@
#include <unistd.h>
#include "../kselftest_harness.h"
+#include "../kselftest.h"
#define NUM_UIE 3
#define ALARM_DELTA 3
@@ -419,6 +420,8 @@ __constructor_order_last(void)
int main(int argc, char **argv)
{
+ int ret = -1;
+
switch (argc) {
case 2:
rtc_file = argv[1];
@@ -430,5 +433,11 @@ int main(int argc, char **argv)
return 1;
}
- return test_harness_run(argc, argv);
+ // Run the test if rtc_file is valid
+ if (access(rtc_file, F_OK) == 0)
+ ret = test_harness_run(argc, argv);
+ else
+ ksft_exit_fail_msg("[ERROR]: Cannot access rtc file %s - Exiting\n", rtc_file);
+
+ return ret;
}
--
2.25.1
There are macros in kernel.h that can be used outside of that header.
Split them to args.h and replace open coded variants.
Test compiled with `make allmodconfig` for x86_64.
(Note that the positive diff statistics are due to documentation being
updated.)
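For readers unfamiliar with this kind of macro, here is a minimal sketch of
how argument counting and token concatenation can be combined to dispatch on
the number of arguments. It is not the code from args.h: the DEMO_ names and
the sum_N helpers are hypothetical, and this simplified counter only handles
one to five arguments.

```c
#include <assert.h>

/*
 * Hypothetical, simplified stand-ins for COUNT_ARGS() and CONCATENATE().
 * The argument list pushes the descending numbers to the right, so the
 * value that lands in the "n" slot equals the number of arguments.
 * Supports 1 to 5 arguments only.
 */
#define DEMO_PICK_NTH(_1, _2, _3, _4, _5, n, ...) n
#define DEMO_COUNT_ARGS(...) DEMO_PICK_NTH(__VA_ARGS__, 5, 4, 3, 2, 1, 0)

/* Two levels so that the arguments are macro-expanded before pasting. */
#define DEMO_CONCAT_(a, b) a##b
#define DEMO_CONCAT(a, b) DEMO_CONCAT_(a, b)

/* Illustrative helpers, one per supported argument count. */
static inline int sum_1(int a) { return a; }
static inline int sum_2(int a, int b) { return a + b; }
static inline int sum_3(int a, int b, int c) { return a + b + c; }

/* Expands to sum_1(...), sum_2(...) or sum_3(...) depending on arity. */
#define demo_sum(...) \
	DEMO_CONCAT(sum_, DEMO_COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
```

With these definitions, demo_sum(1, 2, 3) expands to sum_3(1, 2, 3). The
open coded variants replaced by the series follow the same general pattern.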
In v3:
- split to a series of patches
- fixed build issue on `make allmodconfig` for x86_64 (Andrew)
In v2:
- converted existing users at the same time (Andrew, Rasmus)
- documented how it does work (Andrew, Rasmus)
Andy Shevchenko (4):
kernel.h: Split out COUNT_ARGS() and CONCATENATE() to args.h
x86/asm: Replace custom COUNT_ARGS() & CONCATENATE() implementations
arm64: smccc: Replace custom COUNT_ARGS() & CONCATENATE()
implementations
genetlink: Replace custom CONCATENATE() implementation
arch/x86/include/asm/rmwcc.h | 11 +++--------
include/kunit/test.h | 1 +
include/linux/args.h | 28 ++++++++++++++++++++++++++++
include/linux/arm-smccc.h | 27 ++++++++++-----------------
include/linux/genl_magic_func.h | 27 ++++++++++++++-------------
include/linux/genl_magic_struct.h | 8 +++-----
include/linux/kernel.h | 7 -------
include/linux/pci.h | 2 +-
include/trace/bpf_probe.h | 2 ++
9 files changed, 62 insertions(+), 51 deletions(-)
create mode 100644 include/linux/args.h
--
2.40.0.1.gaa8946217a0b
Hi All,
Given my on-going work on large anon folios and contpte mappings, I decided it
would be a good idea to start running mm selftests to help guard against
regressions. However, it soon became clear that I couldn't get the suite to run
cleanly on arm64 with a vanilla v6.5-rc1 kernel (perhaps I'm just doing it
wrong??), so got stuck in a rabbit hole trying to debug and fix all the issues.
Some were down to misconfigurations, but I also found a number of issues with
the tests and even a couple of issues with the kernel.
This series aims to fix (most of) the test issues. It applies on top of
v6.5-rc1.
Reproducing
-----------
What follows is a write up of how I'm running the tests and the results I see
with this series applied. I don't yet have a concrete understanding of all of
the remaining failures. So if anyone has any comments on my setup or reasons for
the test failures it would be great to hear.
Source: v6.5-rc1 + this series + [1] + [2]. [1] is a patch from Florent Revest to
fix mdwe mmap_FIXED tests. [2] is a fix for a regression in the kernel that I
found by running `mlock-random-test` and `mlock2-tests`.
Compile the kernel (on arm64 system):
$ make defconfig
$ ./scripts/config --enable CONFIG_SQUASHFS_LZ4
$ ./scripts/config --enable CONFIG_SQUASHFS_LZO
$ ./scripts/config --enable CONFIG_SQUASHFS_XZ
$ ./scripts/config --enable CONFIG_SQUASHFS_ZSTD
$ ./scripts/config --enable CONFIG_XFS_FS
$ ./scripts/config --enable CONFIG_SYSVIPC
$ ./scripts/config --enable CONFIG_USERFAULTFD
$ ./scripts/config --enable CONFIG_TEST_VMALLOC
$ ./scripts/config --enable CONFIG_GUP_TEST
$ ./scripts/config --enable CONFIG_TRANSPARENT_HUGEPAGE
$ ./scripts/config --enable CONFIG_MEM_SOFT_DIRTY
$ make olddefconfig
$ make -s -j`nproc` Image
(In the above case, I'm building/testing a 4K kernel).
Note that it turns out that arm64 doesn't really support ZONE_DEVICE; although
it defines ARCH_HAS_PTE_DEVMAP, it can't allocate `struct page`s for arbitrary
physical addresses. This means that the TEST_HMM module causes warnings to be
emitted when initializing because it tries to reserve an arbitrary PA range and
then requests `struct page`s for it. I haven't fully investigated this yet, but
for now, I'm just deliberately excluding ZONE_DEVICE (which TEST_HMM depends upon).
This means that the `hmm-tests` selftest gets skipped at runtime.
Compile the tests:
$ make -j`nproc` headers_install
$ make -C tools/testing/selftests TARGETS=mm install INSTALL_PATH=<path/to/install>
Start a VM running the kernel we just compiled:
$ taskset -c 8-15 qemu-system-aarch64 \
-object memory-backend-file,id=mem0,size=6G,mem-path=/hugetlbfs,merge=off,prealloc=on,host-nodes=0,policy=bind,align=1G \
-object memory-backend-file,id=mem1,size=6G,mem-path=/hugetlbfs,merge=off,prealloc=on,host-nodes=0,policy=bind,align=1G \
-nographic -enable-kvm -machine virt,gic-version=3 -cpu max \
-smp 8 -m 12G \
-numa node,memdev=mem0,cpus=0-3,nodeid=0 \
-numa node,memdev=mem1,cpus=4-7,nodeid=1 \
-drive if=virtio,format=raw,file=ubuntu-22.04.xfs \
-object rng-random,filename=/dev/urandom,id=rng0 \
-device virtio-scsi-pci,id=scsi0 \
-netdev user,id=net0,hostfwd=tcp::8022-:22 \
-device virtio-rng-pci,rng=rng0 \
-device virtio-net-pci,netdev=net0 \
-kernel arch/arm64/boot/Image \
-append "earlycon root=/dev/vda2 secretmem.enable hugepagesz=1G hugepages=0:2,1:2 hugepagesz=32M hugepages=0:2,1:2 default_hugepagesz=2M hugepages=0:64,1:64 hugepagesz=64K hugepages=0:2,1:2"
This starts a VM with 2 numa nodes (needed by ksm and migration tests), with 6G
of memory and 4 CPUs on each node. The kernel command line enables secretmem
(needed for `memfd_secret` test), and preallocates a bunch of huge pages
(divined by reading the comments and source for a bunch of tests that require
huge pages). 128M of the default huge page size, and 4 pages of each of the
other sizes appear to be sufficient. I'm allocating half on each numa node.
Once booted, copy the selftests we just compiled onto it.
On the VM, run the tests:
$ cd path/to/selftests
$ sudo ./run_kselftest.sh
or alternatively:
$ cd path/to/selftests/mm
$ sudo ./run_vmtests.sh
Test Results
------------
TOP-LEVEL SUMMARY: PASS=42 SKIP=4 FAIL=2
Only showing nested tests if they are skipped or failed.
[PASS] hugepage-mmap
[PASS] hugepage-shm
[PASS] map_hugetlb
[PASS] hugepage-mremap
[PASS] hugepage-vmemmap
[PASS] hugetlb-madvise
[PASS] map_fixed_noreplace
[PASS] gup_test -u
[PASS] gup_test -a
[PASS] gup_test -ct -F 0x1 0 19 0x1000
[PASS] gup_longterm
[PASS] uffd-unit-tests
[PASS] uffd-stress anon 20 16
[PASS] uffd-stress hugetlb 128 32
[PASS] uffd-stress hugetlb-private 128 32
[PASS] uffd-stress shmem 20 16
[PASS] uffd-stress shmem-private 20 16
[PASS] compaction_test
[PASS] on-fault-limit
[PASS] map_populate
[PASS] mlock-random-test
[PASS] mlock2-tests
[PASS] mrelease_test
[PASS] mremap_test
[PASS] thuge-gen
[PASS] virtual_address_range
[SKIP] va_high_addr_switch.sh
# 4K kernel does not support big enough VA space for test
[SKIP] test_vmalloc.sh smoke
# Test requires test_vmalloc kernel module which isn't present
[PASS] mremap_dontunmap
[SKIP] test_hmm.sh smoke
# Test requires test_hmm kernel module - see ZONE_DEVICE issue above
[PASS] madv_populate
[PASS] test_softdirty
[SKIP] range is not softdirty
[SKIP] MADV_POPULATE_READ
[SKIP] range is not softdirty
[SKIP] MADV_POPULATE_WRITE
[SKIP] range is softdirty
# All skipped because arm64 does not support soft-dirty
[PASS] memfd_secret
[PASS] ksm_tests -M -p 10
[PASS] ksm_tests -U
[PASS] ksm_tests -Z -p 10 -z 0
[PASS] ksm_tests -Z -p 10 -z 1
[PASS] ksm_tests -N -m 1
[PASS] ksm_tests -N -m 0
[PASS] ksm_functional_tests
[SKIP] test_unmerge_uffd_wp
# UFFD_FEATURE_PAGEFAULT_FLAG_WP not available on arm64
[PASS] ksm_functional_tests
[SKIP] test_unmerge_uffd_wp
# UFFD_FEATURE_PAGEFAULT_FLAG_WP not available on arm64
[SKIP] soft-dirty
# Skipped because arm64 does not support soft-dirty
[FAIL] cow
[FAIL] vmsplice() + unmap in child ... with hugetlb
[FAIL] vmsplice() + unmap in child with mprotect() optimization ... with hugetlb
[FAIL] vmsplice() before fork(), unmap in parent after fork() ... with hugetlb
[FAIL] vmsplice() + unmap in parent after fork() ... with hugetlb
# Above are known issues for vmsplice + hugetlb
# Reproduces on x86
[SKIP] Basic COW after fork() ... with swapped-out, PTE-mapped THP
[SKIP] Basic COW after fork() with mprotect() optimization ... with swapped-out, PTE-mapped THP
[SKIP] vmsplice() + unmap in child ... with swapped-out, PTE-mapped THP
[SKIP] vmsplice() + unmap in child with mprotect() optimization ... with swapped-out, PTE-mapped THP
[SKIP] vmsplice() before fork(), unmap in parent after fork() ... with swapped-out, PTE-mapped THP
[SKIP] vmsplice() + unmap in parent after fork() ... with swapped-out, PTE-mapped THP
[SKIP] R/O-mapping a page registered as iouring fixed buffer ... with swapped-out, PTE-mapped THP
[SKIP] fork() with an iouring fixed buffer ... with swapped-out, PTE-mapped THP
[SKIP] R/O GUP pin on R/O-mapped shared page ... with swapped-out, PTE-mapped THP
[SKIP] R/O GUP-fast pin on R/O-mapped shared page ... with swapped-out, PTE-mapped THP
[SKIP] R/O GUP pin on R/O-mapped previously-shared page ... with swapped-out, PTE-mapped THP
[SKIP] R/O GUP-fast pin on R/O-mapped previously-shared page ... with swapped-out, PTE-mapped THP
[SKIP] R/O GUP pin on R/O-mapped exclusive page ... with swapped-out, PTE-mapped THP
[SKIP] R/O GUP-fast pin on R/O-mapped exclusive page ... with swapped-out, PTE-mapped THP
# Above all skipped due to "MADV_PAGEOUT did not work, is swap enabled?"
# swap is enabled though
# Reproduces on x86
[SKIP] Basic COW after fork() when collapsing after fork() (fully shared)
# MADV_COLLAPSE failed: Invalid argument
[PASS] khugepaged
[PASS] transhuge-stress -d 20
[PASS] split_huge_page_test
[FAIL] migration
[FAIL] migration.shared_anon
# move_pages() reports that the requested page was not migrated
# after a few iterations.
[PASS] mkdirty
[PASS] mdwe_test
[1] https://lore.kernel.org/lkml/20230704153630.1591122-3-revest@chromium.org/
[2] https://lore.kernel.org/linux-mm/20230711175020.4091336-1-Liam.Howlett@orac…
Thanks,
Ryan
Ryan Roberts (9):
selftests: Line buffer test program's stdout
selftests/mm: Give scripts execute permission
selftests/mm: Skip soft-dirty tests on arm64
selftests/mm: Enable mrelease_test for arm64
selftests/mm: Fix thuge-gen test bugs
selftests/mm: va_high_addr_switch should skip unsupported arm64
configs
selftests/mm: Make migration test robust to failure
selftests/mm: Optionally pass duration to transhuge-stress
selftests/mm: Run all tests from run_vmtests.sh
tools/testing/selftests/kselftest/runner.sh | 5 +-
tools/testing/selftests/mm/Makefile | 79 ++++++++++---------
.../selftests/mm/charge_reserved_hugetlb.sh | 0
tools/testing/selftests/mm/check_config.sh | 0
.../selftests/mm/hugetlb_reparenting_test.sh | 0
tools/testing/selftests/mm/madv_populate.c | 18 +++--
tools/testing/selftests/mm/migration.c | 14 +++-
tools/testing/selftests/mm/mrelease_test.c | 1 +
tools/testing/selftests/mm/run_vmtests.sh | 23 ++++++
tools/testing/selftests/mm/settings | 2 +-
tools/testing/selftests/mm/soft-dirty.c | 3 +
tools/testing/selftests/mm/test_hmm.sh | 0
tools/testing/selftests/mm/test_vmalloc.sh | 0
tools/testing/selftests/mm/thuge-gen.c | 4 +-
tools/testing/selftests/mm/transhuge-stress.c | 12 ++-
.../selftests/mm/va_high_addr_switch.c | 3 +-
.../selftests/mm/va_high_addr_switch.sh | 0
tools/testing/selftests/mm/vm_util.c | 17 ++++
tools/testing/selftests/mm/vm_util.h | 1 +
.../selftests/mm/write_hugetlb_memory.sh | 0
20 files changed, 127 insertions(+), 55 deletions(-)
mode change 100644 => 100755 tools/testing/selftests/mm/charge_reserved_hugetlb.sh
mode change 100644 => 100755 tools/testing/selftests/mm/check_config.sh
mode change 100644 => 100755 tools/testing/selftests/mm/hugetlb_reparenting_test.sh
mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh
mode change 100644 => 100755 tools/testing/selftests/mm/test_hmm.sh
mode change 100644 => 100755 tools/testing/selftests/mm/test_vmalloc.sh
mode change 100644 => 100755 tools/testing/selftests/mm/va_high_addr_switch.sh
mode change 100644 => 100755 tools/testing/selftests/mm/write_hugetlb_memory.sh
--
2.25.1
Hi, Willy, Thomas
Thanks very much for your careful review and great suggestions. Here is
the v4 revision of the arch shrink series [1]. It mainly includes a new
fixup for -O0 under gcc < 11.1.0, the stackprotector support for
_start_c(), new testcases for startup code and two new test targets.
All of the tests passed or skipped (tinyconfig + few options +
qemu-system) for both -Os and -O0:
arch/board | result
------------|------------
arm/versatilepb | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
arm/vexpress-a9 | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
arm/virt | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
aarch64/virt | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
i386/pc | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
x86_64/pc | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
mipsel/malta | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
loongarch64/virt | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
riscv64/virt | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
s390x/s390-ccw-virtio | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
And more, for both -Os and -O0:
$ for r in run-user run-nolibc-test run-libc-test; do make clean > /dev/null; make $r | grep status; done
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
165 test(s): 153 passed, 12 skipped, 0 failed => status: warning
// for make run-user, the euid0 and 32bit limit related tests are
// skipped
$ make clean && make run-user
$ grep -i skip run.out
17 chroot_root [SKIPPED]
39 link_dir [SKIPPED]
62 limit_intptr_min_32 [SKIPPED]
63 limit_intptr_max_32 [SKIPPED]
64 limit_uintptr_max_32 [SKIPPED]
65 limit_ptrdiff_min_32 [SKIPPED]
66 limit_ptrdiff_max_32 [SKIPPED]
67 limit_size_max_32 [SKIPPED]
// for run-libc-test, the _auxv variables, euid0, 32bits limit and
// stackprotector related tests are skipped
$ make clean && make run-libc-test
$ grep -i skip run.out
9 environ_auxv [SKIPPED]
10 environ_total [SKIPPED]
12 auxv_addr [SKIPPED]
17 chroot_root [SKIPPED]
39 link_dir [SKIPPED]
62 limit_intptr_min_32 [SKIPPED]
63 limit_intptr_max_32 [SKIPPED]
64 limit_uintptr_max_32 [SKIPPED]
65 limit_ptrdiff_min_32 [SKIPPED]
66 limit_ptrdiff_max_32 [SKIPPED]
67 limit_size_max_32 [SKIPPED]
0 -fstackprotector not supported [SKIPPED]
$ make clean >/dev/null; make run-libc-test CC=/labs/linux-lab/src/examples/musl-install/bin/musl-gcc | grep status
165 test(s): 151 passed, 12 skipped, 2 failed => status: failure
// The failures are expected because musl has disabled both sbrk and brk
// (but not sbrk(0)); the _auxv variables, euid0, 32bits limit and
// stackprotector related tests are skipped for musl too
$ grep FAIL -ur run.out
9 sbrk = 1 ENOMEM [FAIL]
10 brk = -1 ENOMEM [FAIL]
$ grep "SKIP" -ur run.out
9 environ_auxv [SKIPPED]
10 environ_total [SKIPPED]
12 auxv_addr [SKIPPED]
17 chroot_root [SKIPPED]
39 link_dir [SKIPPED]
62 limit_intptr_min_32 [SKIPPED]
63 limit_intptr_max_32 [SKIPPED]
64 limit_uintptr_max_32 [SKIPPED]
65 limit_ptrdiff_min_32 [SKIPPED]
66 limit_ptrdiff_max_32 [SKIPPED]
67 limit_size_max_32 [SKIPPED]
0 -fstackprotector not supported [SKIPPED]
For stackprotector, gcc 13.1.0 is used to test on x86_64 separately:
$ make run-user CROSS_COMPILE=x86_64-linux- | grep status
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
$ grep stack -ur run.out
0 -fstackprotector [OK]
$ make run-nolibc-test CROSS_COMPILE=x86_64-linux- | grep status
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
$ grep stack -ur run.out
0 -fstackprotector [OK]
Changes from v3 --> v4:
* tools/nolibc: arch-*.h: add missing space after ','
tools/nolibc: fix up startup failures for -O0 under gcc < 11.1.0
Both of the above changes are for _start; they can be merged
if necessary.
The first one fixes old format errors reported by
scripts/checkpatch.pl
The second one is for -O0 failure under gcc < 11.1.0, applied the
optimize("-Os", "omit-frame-pointer") suggestion from Thomas.
* tools/nolibc: remove the old sys_stat support
As suggested by Willy, carefully document the Linux version info
for statx support.
* tools/nolibc: add new crt.h with _start_c
The code is polished carefully for smaller size and better
readability.
* tools/nolibc: stackprotector.h: add empty __stack_chk_init for !_NOLIBC_STACKPROTECTOR
tools/nolibc: crt.h: initialize stack protector
As suggested by Thomas, init stackprotector in _start_c() too.
* tools/nolibc: arm: shrink _start with _start_c
tools/nolibc: aarch64: shrink _start with _start_c
tools/nolibc: i386: shrink _start with _start_c
tools/nolibc: x86_64: shrink _start with _start_c
tools/nolibc: mips: shrink _start with _start_c
tools/nolibc: loongarch: shrink _start with _start_c
tools/nolibc: riscv: shrink _start with _start_c
tools/nolibc: s390: shrink _start with _start_c
Removed the stackprotector initialization from _start too, we
already have it in _start_c().
* selftests/nolibc: add EXPECT_PTRGE, EXPECT_PTRGT, EXPECT_PTRLE, EXPECT_PTRLT
selftests/nolibc: add testcases for startup code
Add a new startup test group to cover the testing of argc,
argv/argv0, envp/environ and _auxv.
Some testcases are enhanced, some are newly added after the
discussion during v3 review.
* selftests/nolibc: allow run nolibc-test locally
selftests/nolibc: allow test -include /path/to/nolibc.h
Two new test targets are added to cover more scenarios.
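To illustrate what a C-level startup helper has to recover from the initial
stack (argc, argv, envp and the auxiliary vector, i.e. the values the new
startup test group checks), here is a minimal sketch. It assumes the usual
Linux process stack layout and is not the code from crt.h; the names
demo_startup and demo_parse_stack are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the values found on the initial stack. */
struct demo_startup {
	long argc;
	char **argv;
	char **envp;
	const unsigned long *auxv;
};

/*
 * Assumed layout at sp (one machine word per slot):
 *   argc, argv[0] .. argv[argc-1], NULL, envp[0] .. envp[n-1], NULL,
 *   then the auxv (type, value) pairs.
 */
static inline struct demo_startup demo_parse_stack(long *sp)
{
	struct demo_startup s;
	long *env;

	s.argc = sp[0];
	s.argv = (char **)(sp + 1);

	/* envp starts right after argv's NULL terminator */
	env = sp + 1 + s.argc + 1;
	s.envp = (char **)env;

	/* walk to envp's NULL terminator; auxv follows it */
	while (*env)
		env++;
	s.auxv = (const unsigned long *)(env + 1);

	return s;
}
```

An arch-specific _start then only needs to pass the initial stack pointer to
such a function, which is the idea behind shrinking _start.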
Hope you like this revision ;-)
The next patchset is powerpc & powerpc64 support; after that we will send
v2 of the tinyconfig support, and finally the remaining rv32 patches (mainly
64bit time).
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/lkml/20230715100134.GD24086@1wt.eu/
Zhangjin Wu (18):
tools/nolibc: arch-*.h: add missing space after ','
tools/nolibc: fix up startup failures for -O0 under gcc < 11.1.0
tools/nolibc: remove the old sys_stat support
tools/nolibc: add new crt.h with _start_c
tools/nolibc: stackprotector.h: add empty __stack_chk_init for
!_NOLIBC_STACKPROTECTOR
tools/nolibc: crt.h: initialize stack protector
tools/nolibc: arm: shrink _start with _start_c
tools/nolibc: aarch64: shrink _start with _start_c
tools/nolibc: i386: shrink _start with _start_c
tools/nolibc: x86_64: shrink _start with _start_c
tools/nolibc: mips: shrink _start with _start_c
tools/nolibc: loongarch: shrink _start with _start_c
tools/nolibc: riscv: shrink _start with _start_c
tools/nolibc: s390: shrink _start with _start_c
selftests/nolibc: add EXPECT_PTRGE, EXPECT_PTRGT, EXPECT_PTRLE,
EXPECT_PTRLT
selftests/nolibc: add testcases for startup code
selftests/nolibc: allow run nolibc-test locally
selftests/nolibc: allow test -include /path/to/nolibc.h
tools/include/nolibc/Makefile | 1 +
tools/include/nolibc/arch-aarch64.h | 57 +---------
tools/include/nolibc/arch-arm.h | 83 ++-------------
tools/include/nolibc/arch-i386.h | 62 ++---------
tools/include/nolibc/arch-loongarch.h | 46 +-------
tools/include/nolibc/arch-mips.h | 76 ++-----------
tools/include/nolibc/arch-riscv.h | 69 ++----------
tools/include/nolibc/arch-s390.h | 63 ++---------
tools/include/nolibc/arch-x86_64.h | 58 ++--------
tools/include/nolibc/crt.h | 61 +++++++++++
tools/include/nolibc/stackprotector.h | 2 +
tools/include/nolibc/sys.h | 63 ++---------
tools/include/nolibc/types.h | 4 +-
tools/testing/selftests/nolibc/Makefile | 12 +++
tools/testing/selftests/nolibc/nolibc-test.c | 106 ++++++++++++++++++-
15 files changed, 246 insertions(+), 517 deletions(-)
create mode 100644 tools/include/nolibc/crt.h
--
2.25.1
Make sv48 the default address space for mmap as some applications
currently depend on this assumption. Users can now select a
desired address space using a non-zero hint address to mmap. Previously,
requesting the default address space from mmap by passing zero as the hint
address would result in using the largest address space possible. Some
applications depend on empty bits in the virtual address space, like Go and
Java, so this patch provides more flexibility for application developers.
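As a rough userspace illustration of the interface described above, the
sketch below passes a hint address to mmap() and reports how many address
bits the returned mapping uses. The helper names are hypothetical, and the
hint value that selects a larger address space is defined by the series'
documentation, not by this sketch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Map len bytes of anonymous memory, passing hint as the address hint.
 * A NULL hint asks for the default placement. Returns MAP_FAILED on error.
 */
static inline void *map_with_hint(void *hint, size_t len)
{
	return mmap(hint, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* Number of significant bits in an address (index of highest set bit + 1). */
static inline int addr_bits(const void *p)
{
	uintptr_t v = (uintptr_t)p;
	int bits = 0;

	while (v) {
		bits++;
		v >>= 1;
	}
	return bits;
}
```

An application that stores tags in the upper pointer bits could call
addr_bits() on the result of map_with_hint(NULL, len) to check that the
default mapping stays within the range it expects.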
-Charlie
---
v5:
- Minor wording change in documentation
- Change some parenthesis in arch_get_mmap_ macros
- Added case for addr==0 in arch_get_mmap_ because without this, programs would
crash if RLIMIT_STACK was modified before executing the program. This was
tested using the libhugetlbfs tests.
v4:
- Split testcases/document patch into test cases, in-code documentation, and
formal documentation patches
- Modified the mmap_base macro to be more legible and better represent memory
layout
- Fixed documentation to better reflect the implementation
- Renamed DEFAULT_VA_BITS to MMAP_VA_BITS
- Added additional test case for rlimit changes
---
Charlie Jenkins (4):
RISC-V: mm: Restrict address space for sv39,sv48,sv57
RISC-V: mm: Add tests for RISC-V mm
RISC-V: mm: Update pgtable comment documentation
RISC-V: mm: Document mmap changes
Documentation/riscv/vm-layout.rst | 22 +++
arch/riscv/include/asm/elf.h | 2 +-
arch/riscv/include/asm/pgtable.h | 20 ++-
arch/riscv/include/asm/processor.h | 46 +++++-
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/mm/.gitignore | 1 +
tools/testing/selftests/riscv/mm/Makefile | 21 +++
.../selftests/riscv/mm/testcases/mmap.c | 133 ++++++++++++++++++
8 files changed, 234 insertions(+), 13 deletions(-)
create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
create mode 100644 tools/testing/selftests/riscv/mm/Makefile
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap.c
--
2.41.0
In this series, Geliang did some refactoring in the mptcp_join.sh file.
Patch 1 reduces the scope of some global env vars, only used by some
tests: easier to deal with.
Patch 2 uses a dedicated env var for fastclose case instead of re-using
addr_nr_ns2 with embedded info, clearer.
Patch 3 is similar but for the fullmesh case.
Patch 4 moves a positional but optional argument of run_tests() to an
env var like it has already been done with the other args, cleaner.
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Geliang Tang (4):
selftests: mptcp: set all env vars as local ones
selftests: mptcp: add fastclose env var
selftests: mptcp: add fullmesh env var
selftests: mptcp: add speed env var
tools/testing/selftests/net/mptcp/mptcp_join.sh | 271 +++++++++++++-----------
1 file changed, 151 insertions(+), 120 deletions(-)
---
base-commit: e0f0a5db5f8c413cbbf48607f711c2a21023ee66
change-id: 20230712-upstream-net-next-20230712-selftests-mptcp-use-local-env-ad964224bc2a
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
This RFC is a follow-up of the discussions taken here:
https://lore.kernel.org/linux-doc/20230704132812.02ba97ba@maurocar-mobl2/T/…
It adds a new extension that allows documenting tests using the same tool we're
using for DRM unit tests at IGT GPU tools: https://gitlab.freedesktop.org/drm/igt-gpu-tools.
While kernel-doc has provided documentation for in-lined functions/struct comments,
it was not meant to document tests.
Tests need to be grouped by the test functions. It should also be possible to produce
other outputs from the documentation, to integrate it with test suites. For instance,
internally at Intel, we use the comments to generate DOT files hierarchically grouped
by feature category.
This is just an initial RFC to start discussions around the solution. Before being merged
upstream, we need to define what tags will be used to identify test markups and add
a simple change at kernel-doc to let it ignore such markups.
On this series, we have:
- patch 1:
adding test_list.py as present at the IGT tree, after a patch series to make it
more generic: https://patchwork.freedesktop.org/series/120622/
- patch 2:
adds an example of how tests could be documented. This is a really simple
example, just to test the feature, specially designed to make it easy to build just
the test documentation from a single DRM kunit file.
After discussions, my plan is to send a new version addressing the issues, and add
some documentation for DRM and/or i915 kunit tests.
Mauro Carvalho Chehab (2):
docs: add support for documenting kUnit and kSelftests
drm: add documentation for drm_buddy_test kUnit test
Documentation/conf.py | 2 +-
Documentation/index.rst | 2 +-
Documentation/sphinx/test_kdoc.py | 108 ++
Documentation/sphinx/test_list.py | 1288 ++++++++++++++++++++++++
Documentation/tests/index.rst | 6 +
Documentation/tests/kunit.rst | 5 +
drivers/gpu/drm/tests/drm_buddy_test.c | 12 +
7 files changed, 1421 insertions(+), 2 deletions(-)
create mode 100644 Documentation/sphinx/test_kdoc.py
create mode 100644 Documentation/sphinx/test_list.py
create mode 100644 Documentation/tests/index.rst
create mode 100644 Documentation/tests/kunit.rst
--
2.40.1
This series decreases the pcm-test duration in order to avoid timeouts
by first moving the audio stream duration to a variable and subsequently
decreasing it.
Nícolas F. R. A. Prado (2):
kselftest/alsa: pcm-test: Move stream duration and margin to variables
kselftest/alsa: pcm-test: Decrease stream duration from 4 to 2 seconds
tools/testing/selftests/alsa/pcm-test.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
--
2.41.0
Since apparently enabling all the KUnit tests shouldn't enable any new
subsystems it is hard to enable the regmap KUnit tests in normal KUnit
testing scenarios that don't enable any drivers. Add a Kconfig option
to help with this and include it in the KUnit all tests config.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
drivers/base/regmap/Kconfig | 12 +++++++++++-
tools/testing/kunit/configs/all_tests.config | 2 ++
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/base/regmap/Kconfig b/drivers/base/regmap/Kconfig
index 0db2021f7477..b1affac70d5d 100644
--- a/drivers/base/regmap/Kconfig
+++ b/drivers/base/regmap/Kconfig
@@ -4,7 +4,7 @@
# subsystems should select the appropriate symbols.
config REGMAP
- bool "Register Map support" if KUNIT_ALL_TESTS
+ bool
default y if (REGMAP_I2C || REGMAP_SPI || REGMAP_SPMI || REGMAP_W1 || REGMAP_AC97 || REGMAP_MMIO || REGMAP_IRQ || REGMAP_SOUNDWIRE || REGMAP_SOUNDWIRE_MBQ || REGMAP_SCCB || REGMAP_I3C || REGMAP_SPI_AVMM || REGMAP_MDIO || REGMAP_FSI)
select IRQ_DOMAIN if REGMAP_IRQ
select MDIO_BUS if REGMAP_MDIO
@@ -23,6 +23,16 @@ config REGMAP_KUNIT
default KUNIT_ALL_TESTS
select REGMAP_RAM
+config REGMAP_BUILD
+ bool "Enable regmap build"
+ depends on KUNIT
+ select REGMAP
+ help
+ This option exists purely to allow the regmap KUnit tests to
+ be enabled without having to enable some driver that uses
+ regmap due to unfortunate issues with how KUnit tests are
+ normally enabled.
+
config REGMAP_AC97
tristate
diff --git a/tools/testing/kunit/configs/all_tests.config b/tools/testing/kunit/configs/all_tests.config
index 0393940c706a..873f3e06ccad 100644
--- a/tools/testing/kunit/configs/all_tests.config
+++ b/tools/testing/kunit/configs/all_tests.config
@@ -33,5 +33,7 @@ CONFIG_DAMON_PADDR=y
CONFIG_DEBUG_FS=y
CONFIG_DAMON_DBGFS=y
+CONFIG_REGMAP_BUILD=y
+
CONFIG_SECURITY=y
CONFIG_SECURITY_APPARMOR=y
---
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
change-id: 20230701-regmap-kunit-enable-a08718e77dd4
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Delete a duplicate assignment from this function implementation.
The comment says ppm is the average of the two actual freq samples,
but ppm is assigned twice and the second assignment overwrites the average.
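A minimal sketch of the intended computation, assuming the adjtimex freq
values carry 16 fractional bits: widen to long long before adding and
scaling, average the two samples, then drop the fractional bits. The
function name is hypothetical, and plain division by 65536 stands in for the
selftest's shift_right(ppm, 16) helper.

```c
#include <assert.h>

/*
 * Average two adjtimex-style freq samples (ppm with 16 fractional bits)
 * and return the result in units of ppm * 1000.
 * The operands are widened first so that the sum and the multiplication
 * cannot overflow on targets where long is 32 bits.
 */
static inline long long demo_avg_ppm_x1000(long freq1, long freq2)
{
	long long scaled = ((long long)freq1 + (long long)freq2) * 1000 / 2;

	/* remove the 16 fractional bits, rounding toward zero */
	return scaled / 65536;
}
```

For example, samples of 1 ppm and 3 ppm (65536 and 196608) average to 2 ppm,
so the function returns 2000.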
Signed-off-by: Minjie Du <duminjie(a)vivo.com>
---
tools/testing/selftests/timers/raw_skew.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/tools/testing/selftests/timers/raw_skew.c b/tools/testing/selftests/timers/raw_skew.c
index 5beceeed0..6eba203f9 100644
--- a/tools/testing/selftests/timers/raw_skew.c
+++ b/tools/testing/selftests/timers/raw_skew.c
@@ -129,8 +129,7 @@ int main(int argc, char **argv)
printf("%lld.%i(est)", eppm/1000, abs((int)(eppm%1000)));
/* Avg the two actual freq samples adjtimex gave us */
- ppm = (tx1.freq + tx2.freq) * 1000 / 2;
- ppm = (long long)tx1.freq * 1000;
+ ppm = (long long)(tx1.freq + tx2.freq) * 1000 / 2;
ppm = shift_right(ppm, 16);
printf(" %lld.%i(act)", ppm/1000, abs((int)(ppm%1000)));
--
2.39.0
From: Jeff Xu <jeffxu(a)google.com>
Add documentation for sysctl vm.memfd_noexec
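The three sysctl values documented in the patch below can be summarized by a
small model. This is only a sketch of the documented semantics, not the
kernel implementation: the function name is hypothetical, the DEMO_ flag
values are assumed to match the uapi header, the errno returned on rejection
is not modeled, and neither is the case where both flags are passed.

```c
#include <assert.h>

/* Assumed to mirror MFD_NOEXEC_SEAL and MFD_EXEC from the uapi header. */
#define DEMO_MFD_NOEXEC_SEAL 0x0008U
#define DEMO_MFD_EXEC        0x0010U

/*
 * Model of vm.memfd_noexec as documented:
 *   0: neither flag given -> behaves as if MFD_EXEC was set
 *   1: neither flag given -> behaves as if MFD_NOEXEC_SEAL was set
 *   2: any call without MFD_NOEXEC_SEAL is rejected
 * Returns the effective flags, or -1 when the call would be rejected.
 */
static inline long demo_memfd_effective_flags(int noexec_level,
					      unsigned int flags)
{
	unsigned int chosen = flags & (DEMO_MFD_EXEC | DEMO_MFD_NOEXEC_SEAL);

	if (noexec_level == 2 && !(flags & DEMO_MFD_NOEXEC_SEAL))
		return -1;

	if (chosen == 0)
		flags |= (noexec_level == 0) ? DEMO_MFD_EXEC
					     : DEMO_MFD_NOEXEC_SEAL;

	return (long)flags;
}
```

In this model an application that always passes one of the two flags
explicitly is unaffected by levels 0 and 1, and only level 2 changes the
outcome for callers that pass MFD_EXEC.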
Link:https://lore.kernel.org/linux-mm/CABi2SkXUX_QqTQ10Yx9bBUGpN1wByOi_=gZU…
Reported-by: Dominique Martinet <asmadeus(a)codewreck.org>
Signed-off-by: Jeff Xu <jeffxu(a)google.com>
---
Documentation/admin-guide/sysctl/vm.rst | 29 +++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 45ba1f4dc004..71923c3d7044 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -424,6 +424,35 @@ e.g., up to one or two maps per allocation.
The default value is 65530.
+memfd_noexec:
+=============
+This pid namespaced sysctl controls memfd_create().
+
+The new MFD_NOEXEC_SEAL and MFD_EXEC flags of memfd_create() allow an
+application to set the executable bit at creation time.
+
+When MFD_NOEXEC_SEAL is set, memfd is created without executable bit
+(mode:0666), and sealed with F_SEAL_EXEC, so it can't be chmod to
+be executable (mode: 0777) after creation.
+
+When the MFD_EXEC flag is set, the memfd is created with the executable bit
+(mode: 0777); this is the same as the old behavior of memfd_create().
+
+The new pid namespaced sysctl vm.memfd_noexec has 3 values:
+0: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
+ MFD_EXEC was set.
+1: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL acts like
+ MFD_NOEXEC_SEAL was set.
+2: memfd_create() without MFD_NOEXEC_SEAL will be rejected.
+
+The default value is 0.
+
+Once set, it can't be downgraded at runtime, i.e. 2=>1, 1=>0
+are denied.
+
+This is a pid namespaced sysctl: child processes inherit the parent
+process's pid namespace at the time of fork. Changes to the parent process
+after fork are not automatically propagated to the child process.
memory_failure_early_kill:
==========================
--
2.41.0.255.g8b1d071c50-goog
/proc/$PID/net currently allows the setting of file attributes,
in contrast to other /proc/$PID/ files and directories.
This would break the nolibc testsuite so the first patch in the series
removes the offending testcase.
The "fix" for nolibc-test is intentionally kept trivial, as the series
will most likely go through the filesystem tree and, if conflicts arise,
it is obvious how to resolve them.
Technically this can lead to breakage of nolibc-test if an old
nolibc-test is used with a newer kernel containing the fix.
Note:
Except for /proc itself this is the only "struct inode_operations" in
fs/proc/ that is missing an implementation of setattr().
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (2):
selftests/nolibc: drop test chmod_net
proc: use generic setattr() for /proc/$PID/net
fs/proc/proc_net.c | 1 +
tools/testing/selftests/nolibc/nolibc-test.c | 1 -
2 files changed, 1 insertion(+), 1 deletion(-)
---
base-commit: a92b7d26c743b9dc06d520f863d624e94978a1d9
change-id: 20230624-proc-net-setattr-8f0a6b8eb2f5
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
In the case where a sysfs file cannot be opened the error return path
fcloses file pointer fpl, however, fpl has already been closed in the
previous stanza. Fix the double fclose by removing it.
Fixes: 10b98a4db11a ("selftests: ALSA: Add test for the 'pcmtest' driver")
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/alsa/test-pcmtest-driver.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/tools/testing/selftests/alsa/test-pcmtest-driver.c b/tools/testing/selftests/alsa/test-pcmtest-driver.c
index 71931b240a83..357adc722cba 100644
--- a/tools/testing/selftests/alsa/test-pcmtest-driver.c
+++ b/tools/testing/selftests/alsa/test-pcmtest-driver.c
@@ -47,10 +47,8 @@ static int read_patterns(void)
sprintf(pf, "/sys/kernel/debug/pcmtest/fill_pattern%d", i);
fp = fopen(pf, "r");
- if (!fp) {
- fclose(fpl);
+ if (!fp)
return -1;
- }
fread(patterns[i].buf, 1, patterns[i].len, fp);
fclose(fp);
}
--
2.39.2
The checksum_32 code was originally written to only handle 2-byte
aligned buffers, but was later extended to support arbitrary alignment.
However, the non-PPro variant doesn't apply the carry before jumping to
the 2- or 4-byte aligned versions, which clear CF.
This causes the new checksum_kunit test to fail, as it runs with a large
number of different possible alignments and both with and without
carries.
For example:
./tools/testing/kunit/kunit.py run --arch i386 --kconfig_add CONFIG_M486=y checksum
Gives:
KTAP version 1
# Subtest: checksum
1..3
ok 1 test_csum_fixed_random_inputs
# test_csum_all_carry_inputs: ASSERTION FAILED at lib/checksum_kunit.c:267
Expected result == expec, but
result == 65281 (0xff01)
expec == 65280 (0xff00)
not ok 2 test_csum_all_carry_inputs
# test_csum_no_carry_inputs: ASSERTION FAILED at lib/checksum_kunit.c:314
Expected result == expec, but
result == 65535 (0xffff)
expec == 65534 (0xfffe)
not ok 3 test_csum_no_carry_inputs
With this patch, it passes.
KTAP version 1
# Subtest: checksum
1..3
ok 1 test_csum_fixed_random_inputs
ok 2 test_csum_all_carry_inputs
ok 3 test_csum_no_carry_inputs
I also tested it on a real 486DX2, with the same results.
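The effect of the missing carry can be illustrated with a small model of 32-bit add-with-carry. This is a Python sketch with made-up helper names, not a translation of the assembly; it only shows that dropping the carry out of the byte addition changes the ones' complement sum by one.

```python
MASK32 = 0xFFFFFFFF


def add_with_carry(acc: int, value: int, carry_in: int = 0):
    """Model of a 32-bit adcl: returns (low 32 bits, carry out)."""
    total = acc + value + carry_in
    return total & MASK32, total >> 32


def add_byte_folded(acc: int, byte: int) -> int:
    """Add a byte and fold the carry back in, like adcl followed by adcl $0."""
    acc, carry = add_with_carry(acc, byte)
    acc, _ = add_with_carry(acc, 0, carry)
    return acc


def add_byte_dropped(acc: int, byte: int) -> int:
    """Add a byte but lose the carry, as happens when CF is cleared later."""
    acc, _ = add_with_carry(acc, byte)
    return acc


def fold16(acc: int) -> int:
    """Fold a 32-bit ones' complement sum down to 16 bits."""
    acc = (acc & 0xFFFF) + (acc >> 16)
    return (acc & 0xFFFF) + (acc >> 16)
```

For an accumulator of 0xFFFFFFFF and a byte of 0x01, the folded variant yields 1 and the dropped variant yields 0, which is the kind of off-by-one the failing tests report.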
Signed-off-by: David Gow <davidgow(a)google.com>
---
This is a follow-up to the UML patch to use the common 32-bit x86
checksum implementations:
https://lore.kernel.org/linux-um/20230704083022.692368-2-davidgow@google.co…
---
arch/x86/lib/checksum_32.S | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/lib/checksum_32.S b/arch/x86/lib/checksum_32.S
index 23318c338db0..128287cea42d 100644
--- a/arch/x86/lib/checksum_32.S
+++ b/arch/x86/lib/checksum_32.S
@@ -62,6 +62,7 @@ SYM_FUNC_START(csum_partial)
jl 8f
movzbl (%esi), %ebx
adcl %ebx, %eax
+ adcl $0, %eax
roll $8, %eax
inc %esi
testl $2, %esi
--
2.41.0.255.g8b1d071c50-goog
While it probably doesn't make a huge difference given the current KUnit
coverage, we will get the best coverage of arm64 architecture features if
we specify -cpu=max rather than picking a specific CPU. This will include
all architecture features that qemu supports, including many which have not
yet made it into physical implementations.
Due to performance issues emulating the architected pointer authentication
algorithm, it is recommended to use the implementation defined algorithm
that qemu has instead. This should make no meaningful difference to the
coverage and will run the tests faster.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/kunit/qemu_configs/arm64.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/kunit/qemu_configs/arm64.py b/tools/testing/kunit/qemu_configs/arm64.py
index 67d04064f785..d3ff27024755 100644
--- a/tools/testing/kunit/qemu_configs/arm64.py
+++ b/tools/testing/kunit/qemu_configs/arm64.py
@@ -9,4 +9,4 @@ CONFIG_SERIAL_AMBA_PL011_CONSOLE=y''',
qemu_arch='aarch64',
kernel_path='arch/arm64/boot/Image.gz',
kernel_command_line='console=ttyAMA0',
- extra_qemu_params=['-machine', 'virt', '-cpu', 'cortex-a57'])
+ extra_qemu_params=['-machine', 'virt', '-cpu', 'max,pauth-impdef=on'])
---
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
change-id: 20230702-kunit-arm64-cpu-max-7e3aa5f02fb2
Best regards,
--
Mark Brown <broonie(a)kernel.org>
=== Context ===
In the context of a middlebox, fragmented packets are tricky to handle.
The full 5-tuple of a packet is often only available in the first
fragment which makes enforcing consistent policy difficult. There are
really only two stateless options, neither of which is very nice:
1. Enforce policy on first fragment and accept all subsequent fragments.
This works but may let in certain attacks or allow data exfiltration.
2. Enforce policy on first fragment and drop all subsequent fragments.
This does not really work because some protocols may rely on
fragmentation. For example, DNS may rely on oversized UDP packets for
large responses.
So stateful tracking is the only sane option. RFC 8900 [0] calls this
out as well in section 6.3:
Middleboxes [...] should process IP fragments in a manner that is
consistent with [RFC0791] and [RFC8200]. In many cases, middleboxes
must maintain state in order to achieve this goal.
=== BPF related bits ===
Policy has traditionally been enforced from XDP/TC hooks. Both hooks
run before kernel reassembly facilities. However, with the new
BPF_PROG_TYPE_NETFILTER, we can rather easily hook into existing
netfilter reassembly infra.
The basic idea is we bump a refcnt on the netfilter defrag module and
then run the bpf prog after the defrag module runs. This allows bpf
progs to transparently see full, reassembled packets. The nice thing
about this is that progs don't have to carry around logic to detect
fragments.
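To make the idea concrete, here is a toy model of stateful reassembly followed by a policy check on the full datagram. It is a Python sketch under simplifying assumptions, not the netfilter defrag code: the class and function names are hypothetical, offsets are in bytes, and overlapping fragments, timeouts and memory limits are ignored.

```python
import struct
from collections import defaultdict


class Reassembler:
    """Collect fragments and release a datagram once it is complete."""

    def __init__(self):
        # key: (src, dst, proto, ip_id) -> {offset: (payload, more_fragments)}
        self._pending = defaultdict(dict)

    def add(self, key, offset, payload, more_fragments):
        """Store one fragment; return the full payload when it is complete."""
        frags = self._pending[key]
        frags[offset] = (payload, more_fragments)
        data = b""
        expected = 0
        last_more = True
        for off in sorted(frags):
            if off != expected:
                return None  # hole: still waiting for a fragment
            chunk, last_more = frags[off]
            data += chunk
            expected += len(chunk)
        if last_more:
            return None  # final fragment not seen yet
        del self._pending[key]
        return data


def udp_ports(datagram: bytes):
    """Read (source port, destination port) from a reassembled UDP datagram."""
    return struct.unpack("!HH", datagram[:4])
```

Only the first fragment carries the ports, so a policy function can call udp_ports() once add() returns the complete datagram, regardless of the order in which the fragments arrived.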
=== Changelog ===
Changes from v3:
* Correctly initialize `addrlen` stack var for recvmsg()
Changes from v2:
* module_put() if ->enable() fails
* Fix CI build errors
Changes from v1:
* Drop bpf_program__attach_netfilter() patches
* static -> static const where appropriate
* Fix callback assignment order during registration
* Only request_module() if callbacks are missing
* Fix retval when modprobe fails in userspace
* Fix v6 defrag module name (nf_defrag_ipv6_hooks -> nf_defrag_ipv6)
* Simplify priority checking code
* Add warning if module doesn't assign callbacks in the future
* Take refcnt on module while defrag link is active
[0]: https://datatracker.ietf.org/doc/html/rfc8900
Daniel Xu (6):
netfilter: defrag: Add glue hooks for enabling/disabling defrag
netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
netfilter: bpf: Prevent defrag module unload while link active
bpf: selftests: Support not connecting client socket
bpf: selftests: Support custom type and proto for client sockets
bpf: selftests: Add defrag selftests
include/linux/netfilter.h | 15 +
include/uapi/linux/bpf.h | 5 +
net/ipv4/netfilter/nf_defrag_ipv4.c | 17 +-
net/ipv6/netfilter/nf_defrag_ipv6_hooks.c | 11 +
net/netfilter/core.c | 6 +
net/netfilter/nf_bpf_link.c | 150 +++++++++-
tools/include/uapi/linux/bpf.h | 5 +
tools/testing/selftests/bpf/Makefile | 4 +-
.../selftests/bpf/generate_udp_fragments.py | 90 ++++++
.../selftests/bpf/ip_check_defrag_frags.h | 57 ++++
tools/testing/selftests/bpf/network_helpers.c | 26 +-
tools/testing/selftests/bpf/network_helpers.h | 3 +
.../bpf/prog_tests/ip_check_defrag.c | 283 ++++++++++++++++++
.../selftests/bpf/progs/ip_check_defrag.c | 104 +++++++
14 files changed, 754 insertions(+), 22 deletions(-)
create mode 100755 tools/testing/selftests/bpf/generate_udp_fragments.py
create mode 100644 tools/testing/selftests/bpf/ip_check_defrag_frags.h
create mode 100644 tools/testing/selftests/bpf/prog_tests/ip_check_defrag.c
create mode 100644 tools/testing/selftests/bpf/progs/ip_check_defrag.c
--
2.41.0
The #endif is on the wrong side of a }, causing a build failure when
__NR_userfaultfd is not defined. Fix this by moving the #endif to
enclose the }.
Fixes: 9eac40fc0cc7 ("selftests/mm: mkdirty: test behavior of (pte|pmd)_mkdirty on VMAs without write permissions")
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/mm/mkdirty.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/mkdirty.c b/tools/testing/selftests/mm/mkdirty.c
index 6d71d972997b..301abb99e027 100644
--- a/tools/testing/selftests/mm/mkdirty.c
+++ b/tools/testing/selftests/mm/mkdirty.c
@@ -321,8 +321,8 @@ static void test_uffdio_copy(void)
munmap:
munmap(dst, pagesize);
free(src);
-#endif /* __NR_userfaultfd */
}
+#endif /* __NR_userfaultfd */
int main(void)
{
--
2.39.2
Awk is already called for /sys/block/zram#/mm_stat parsing, so use it
to also perform the floating point capacity vs consumption ratio
calculations. The test output is unchanged.
This allows bc to be dropped as a dependency for the zram selftests.
The documented free dependency can also be removed following
d18da7ec37195 ("selftests/zram01.sh: Fix compression ratio calculation")
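The ratio calculation that the patch moves into awk can be sketched as follows. This is a Python model for illustration, with a made-up function name; it assumes the fill size is in KB and that the third mm_stat field is in bytes, and it does not handle a zero mem_used_total.

```python
def compression_report(filled_kb: int, mem_used_total: int):
    """Return (ok, message) for one zram device.

    filled_kb is the amount of data written, in KB; mem_used_total is
    assumed to be the third field of /sys/block/zram<N>/mm_stat, in bytes.
    """
    v = 100 * 1024 * filled_kb / mem_used_total  # ratio scaled by 100
    if v < 100:
        return False, f"FAIL compression ratio: 0.{int(v)}:1"
    return True, f"zram compression ratio: {v / 100:.2f}:1: OK"
```

Doing the whole calculation in floating point is what removes the need for the separate bc invocation.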
Signed-off-by: David Disseldorp <ddiss(a)suse.de>
---
tools/testing/selftests/zram/README | 2 --
tools/testing/selftests/zram/zram01.sh | 18 ++++++++----------
2 files changed, 8 insertions(+), 12 deletions(-)
v2: drop unused dependencies from selftests/zram/README
diff --git a/tools/testing/selftests/zram/README b/tools/testing/selftests/zram/README
index 110b34834a6fa..510ca5a1087f5 100644
--- a/tools/testing/selftests/zram/README
+++ b/tools/testing/selftests/zram/README
@@ -27,9 +27,7 @@ zram01.sh: creates general purpose ram disks with ext4 filesystems
zram02.sh: creates block device for swap
Commands required for testing:
- - bc
- dd
- - free
- awk
- mkswap
- swapon
diff --git a/tools/testing/selftests/zram/zram01.sh b/tools/testing/selftests/zram/zram01.sh
index 8f4affe34f3e4..df1b1d4158989 100755
--- a/tools/testing/selftests/zram/zram01.sh
+++ b/tools/testing/selftests/zram/zram01.sh
@@ -33,7 +33,7 @@ zram_algs="lzo"
zram_fill_fs()
{
- for i in $(seq $dev_start $dev_end); do
+ for ((i = $dev_start; i <= $dev_end && !ERR_CODE; i++)); do
echo "fill zram$i..."
local b=0
while [ true ]; do
@@ -44,15 +44,13 @@ zram_fill_fs()
done
echo "zram$i can be filled with '$b' KB"
- local mem_used_total=`awk '{print $3}' "/sys/block/zram$i/mm_stat"`
- local v=$((100 * 1024 * $b / $mem_used_total))
- if [ "$v" -lt 100 ]; then
- echo "FAIL compression ratio: 0.$v:1"
- ERR_CODE=-1
- return
- fi
-
- echo "zram compression ratio: $(echo "scale=2; $v / 100 " | bc):1: OK"
+ awk -v b="$b" '{ v = (100 * 1024 * b / $3) } END {
+ if (v < 100) {
+ printf "FAIL compression ratio: 0.%u:1\n", v
+ exit 1
+ }
+ printf "zram compression ratio: %.2f:1: OK\n", v / 100
+ }' "/sys/block/zram$i/mm_stat" || ERR_CODE=-1
done
}
--
2.35.3
*Changes in v24*:
- Rebase on top of next-20230710
- Place WP markers in case of hole as well
*Changes in v23*:
- Set vec_buf_index in loop only when vec_buf_index is set
- Return -EFAULT instead of -EINVAL if vec is NULL
- Correctly return the walk ending address to the page granularity
*Changes in v22*:
- Interface change:
- Replace [start start + len) with [start, end)
- Return the ending address of the address walk in start
*Changes in v21*:
- Abort walk instead of returning error if WP is to be performed on
partial hugetlb
*Changes in v20*
- Correct PAGE_IS_FILE and add PAGE_IS_PFNZERO
*Changes in v19*
- Minor changes and interface updates
*Changes in v18*
- Rebase on top of next-20230613
- Minor updates
*Changes in v17*
- Rebase on top of next-20230606
- Minor improvements in PAGEMAP_SCAN IOCTL patch
*Changes in v16*
- Fix a corner case
- Add exclusive PM_SCAN_OP_WP back
*Changes in v15*
- Build fix (Add missed build fix in RESEND)
*Changes in v14*
- Fix build error caused by #ifdef added at last minute in some configs
*Changes in v13*
- Rebase on top of next-20230414
- Give-up on using uffd_wp_range() and write new helpers, flush tlb only
once
*Changes in v12*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebase on top of next-20230406
- Review updates
*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
UFFDIO_WRITEPROTECT
*Changes in v10*
- Add specific condition to return error if hugetlb is used with wp
async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
*Motivation*
The real motivation for adding the PAGEMAP_SCAN IOCTL is to emulate the
Windows GetWriteWatch() syscall [1]. GetWriteWatch() retrieves the addresses
of the pages that are written to in a region of virtual memory.
This syscall is used in Windows applications and games etc. It is currently
emulated in a pretty slow manner in userspace. Our purpose is to enhance the
kernel so that it can be emulated efficiently. Currently some out-of-tree
hack patches are being used to efficiently emulate it in some kernels. We
intend to replace those with these patches, so that gaming on Linux as a
whole can benefit from this. It means there would be tons of users of this
code.
CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.
Andrei defines the following uses of this code:
* it is more granular and allows us to track changed pages more
effectively. The current interface can clear dirty bits for the entire
process only. In addition, reading info about pages is a separate
operation. It means we must freeze the process to read information
about all its pages, reset dirty bits, only then we can start dumping
pages. The information about pages becomes more and more outdated,
while we are processing pages. The new interface solves both these
downsides. First, it allows us to read pte bits and clear the
soft-dirty bit atomically. It means that CRIU will not need to freeze
processes to pre-dump their memory. Second, it clears soft-dirty bits
for a specified region of memory. It means CRIU will have actual info
about pages to the moment of dumping them.
* The new interface has to be much faster because basic page filtering
is happening in the kernel. With the old interface, we have to read
pagemap for each page.
*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we feel like kernel's soft-dirty
feature can be used under the hood with some additions like:
* reset soft-dirty flag for only a specific region of memory instead of
clearing the flag for the entire process
* get and clear soft-dirty flag for a specific region atomically
So we decided to use an ioctl on the pagemap file to read and/or reset the
soft-dirty flag. But when using the soft-dirty flag, we sometimes get extra
pages which weren't even written. They had become soft-dirty because of VMA
merging and the VM_SOFTDIRTY flag. This breaks the definition of
GetWriteWatch(). We were able to bypass this shortcoming by ignoring
VM_SOFTDIRTY until David reported that mprotect etc. messes up the soft-dirty
flag while ignoring VM_SOFTDIRTY [5]. This wasn't happening until [6] got
introduced. We discussed whether we could revert these patches, but we could
not reach any conclusion. So at this point, I made a couple of attempts to
solve this whole VM_SOFTDIRTY issue by correcting the soft-dirty
implementation:
* [7] Correct the bug fixed wrongly back in 2014. It had the potential to
cause regressions. We left it behind.
* [8] Keep a list of the soft-dirty parts of a VMA across splits and merges.
The reply I got was not to increase the size of the VMA by 8 bytes.
At this point, we abandoned soft-dirty, considering it too delicate, and
userfaultfd [9] seemed like the only way forward. From there onward, we
have been basing soft-dirty emulation on the userfaultfd wp feature, where
the kernel resolves the faults itself when the WP_ASYNC feature is used. It
was straightforward to add the WP_ASYNC feature to userfaultfd. Now only
those pages which have really been written are reported as dirty or
written-to. (PS: another userfaultfd feature, WP_UNPOPULATED, is required to
avoid pre-faulting memory before write-protecting [9].)
All the different masks were added at the request of the CRIU devs to make
the interface more generic and better.
[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com
* Original Cover letter from v8*
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As the
kernel already has a soft-dirty feature, which we have given up on using, we
use the written-to terminology when using UFFD async WP under the hood.
This IOCTL, PAGEMAP_SCAN on pagemap file can be used to get and/or clear
the info about page table entries. The following operations are
supported in this ioctl:
- Get the information if the pages have been written-to (PAGE_IS_WRITTEN),
file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped
(PAGE_IS_SWAPPED).
- Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which
pages have been written-to.
- Find pages which have been written-to and write protect the pages
(atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE)
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
Some benchmarks can be seen here[1]. This series adds features that weren't
present earlier:
- There is no atomic get soft-dirty/written-to status and clear operation
in the kernel.
- The pages which have been written-to cannot be found accurately.
(The kernel's soft-dirty PTE bit + soft-dirty VMA bit shows more soft-dirty
pages than there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have the use case where we need to track the soft-dirty PTE bit for
only specific pages on-demand. We need this tracking and clear mechanism
of a region of memory while the process is running to emulate the
GetWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of the soft-dirty feature to find pages which
have been written-to from v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. It is too delicate and wrong as it shows more soft-dirty
pages than the actual soft-dirty pages. There is no interest in
correcting it [2][3] as this is how the feature was written years ago.
Its behaviour shouldn't be changed now. Peter Xu has suggested
using the async version of the UFFD WP [4] as it is based inherently
on the PTEs.
So in this patch series, I've added a new mode to the UFFD which is
asynchronous version of the write protect. When this variant of the
UFFD WP is used, the page faults are resolved automatically by the
kernel. The pages which have been written-to can be found by reading
pagemap file (!PM_UFFD_WP). This feature can be used successfully to
find which pages have been written to from the time the pages were
write protected. This works just like the soft-dirty flag without
showing any extra pages which aren't soft-dirty in reality.
The information about whether a page is file mapped, present or swapped is
required for the CRIU project [5][6]. The required mask, any mask, excluded
mask and return mask are also required for the CRIU project [5].
The IOCTL returns the addresses of the pages which match the specific
masks. The page addresses are returned in struct page_region in a compact
form. The max_pages is needed to support a use case where the user only wants
to get a specific number of pages, so there is no need to find all the
pages of interest in the range when max_pages is specified. The IOCTL
returns when the maximum number of pages has been found. The max_pages is
optional. If max_pages is specified, it must be equal to or greater than the
vec_size. This restriction is needed to handle the worst case, when each
page_region only contains info about one page and cannot be compacted.
This is needed to emulate the Windows GetWriteWatch() syscall.
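The compact form and the two limits can be pictured with a small model. This is a Python sketch for illustration, not the kernel implementation: the function name, the (start, end, flags) tuple layout and the 4096-byte page size are assumptions made for this example.

```python
PAGE_SIZE = 4096  # assumed page size for the example


def compact_regions(pages, vec_len, max_pages=None):
    """Merge matching pages into (start, end, flags) regions.

    pages is an iterable of (page_address, flags). Adjacent pages with the
    same flags share one region. Scanning stops when vec_len regions are
    full or, if max_pages is given, when that many pages have been reported.
    """
    regions = []
    found = 0
    for addr, flags in sorted(pages):
        if max_pages is not None and found >= max_pages:
            break
        if regions and regions[-1][1] == addr and regions[-1][2] == flags:
            start = regions[-1][0]
            regions[-1] = (start, addr + PAGE_SIZE, flags)  # extend region
        elif len(regions) < vec_len:
            regions.append((addr, addr + PAGE_SIZE, flags))
        else:
            break  # output vector is full
        found += 1
    return regions
```

In the worst case no two pages can be merged, so every reported page consumes one region of the output vector.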
The patch series includes a detailed selftest which can be used as an
example for the uffd async wp test and PAGEMAP_IOCTL. It shows the
interface usage as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (4):
fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
about PTEs
tools headers UAPI: Update linux/fs.h with the kernel sources
mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
selftests: mm: add pagemap ioctl tests
Peter Xu (1):
userfaultfd: UFFD_FEATURE_WP_ASYNC
Documentation/admin-guide/mm/pagemap.rst | 58 +
Documentation/admin-guide/mm/userfaultfd.rst | 35 +
fs/proc/task_mmu.c | 583 +++++++
fs/userfaultfd.c | 26 +-
include/linux/hugetlb.h | 1 +
include/linux/userfaultfd_k.h | 21 +-
include/uapi/linux/fs.h | 55 +
include/uapi/linux/userfaultfd.h | 9 +-
mm/hugetlb.c | 34 +-
mm/memory.c | 27 +-
tools/include/uapi/linux/fs.h | 55 +
tools/testing/selftests/mm/.gitignore | 2 +
tools/testing/selftests/mm/Makefile | 3 +-
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/pagemap_ioctl.c | 1464 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
16 files changed, 2354 insertions(+), 24 deletions(-)
create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh
--
2.39.2
A few cleanups to the existing test logic.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (4):
selftests/nolibc: make evaluation of test conditions
selftests/nolibc: simplify status printing
selftests/nolibc: simplify status argument
selftests/nolibc: avoid gaps in test numbers
tools/testing/selftests/nolibc/nolibc-test.c | 201 +++++++++++----------------
1 file changed, 85 insertions(+), 116 deletions(-)
---
base-commit: 078cda365b3f47f61047a08230925a1478e9a1c8
change-id: 20230711-nolibc-sizeof-long-gaps-0f28cba7ee4d
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
We want to replace iptables TPROXY with a BPF program at TC ingress.
To make this work in all cases we need to assign a SO_REUSEPORT socket
to an skb, which is currently prohibited. This series adds support for
such sockets to bpf_sk_assign.
I did some refactoring to cut down on the amount of duplicate code. The
key to this is to use INDIRECT_CALL in the reuseport helpers. To show
that this approach is not just beneficial to TC sk_assign I removed
duplicate code for bpf_sk_lookup as well.
Joint work with Daniel Borkmann.
Signed-off-by: Lorenz Bauer <lmb(a)isovalent.com>
---
Changes in v5:
- Drop reuse_sk == sk check in inet[6]_steal_sock (Kuniyuki)
- Link to v4: https://lore.kernel.org/r/20230613-so-reuseport-v4-0-4ece76708bba@isovalent…
Changes in v4:
- WARN_ON_ONCE if reuseport socket is refcounted (Kuniyuki)
- Use inet[6]_ehashfn_t to shorten function declarations (Kuniyuki)
- Shuffle documentation patch around (Kuniyuki)
- Update commit message to explain why IPv6 needs EXPORT_SYMBOL
- Link to v3: https://lore.kernel.org/r/20230613-so-reuseport-v3-0-907b4cbb7b99@isovalent…
Changes in v3:
- Fix warning re udp_ehashfn and udp6_ehashfn (Simon)
- Return higher scoring connected UDP reuseport sockets (Kuniyuki)
- Fix ipv6 module builds
- Link to v2: https://lore.kernel.org/r/20230613-so-reuseport-v2-0-b7c69a342613@isovalent…
Changes in v2:
- Correct commit abbrev length (Kuniyuki)
- Reduce duplication (Kuniyuki)
- Add checks on sk_state (Martin)
- Split exporting inet[6]_lookup_reuseport into separate patch (Eric)
---
Daniel Borkmann (1):
selftests/bpf: Test that SO_REUSEPORT can be used with sk_assign helper
Lorenz Bauer (6):
udp: re-score reuseport groups when connected sockets are present
net: export inet_lookup_reuseport and inet6_lookup_reuseport
net: remove duplicate reuseport_lookup functions
net: document inet[6]_lookup_reuseport sk_state requirements
net: remove duplicate sk_lookup helpers
bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign
include/net/inet6_hashtables.h | 81 ++++++++-
include/net/inet_hashtables.h | 74 +++++++-
include/net/sock.h | 7 +-
include/uapi/linux/bpf.h | 3 -
net/core/filter.c | 2 -
net/ipv4/inet_hashtables.c | 68 ++++---
net/ipv4/udp.c | 88 ++++-----
net/ipv6/inet6_hashtables.c | 71 +++++---
net/ipv6/udp.c | 98 ++++------
tools/include/uapi/linux/bpf.h | 3 -
tools/testing/selftests/bpf/network_helpers.c | 3 +
.../selftests/bpf/prog_tests/assign_reuse.c | 197 +++++++++++++++++++++
.../selftests/bpf/progs/test_assign_reuse.c | 142 +++++++++++++++
13 files changed, 658 insertions(+), 179 deletions(-)
---
base-commit: c20f9cef725bc6b19efe372696e8000fb5af0d46
change-id: 20230613-so-reuseport-e92c526173ee
Best regards,
--
Lorenz Bauer <lmb(a)isovalent.com>
The build failure reported in [1] occurred because commit 9fc96c7c19df
("selftests: error out if kernel header files are not yet built") added
a new "kernel_header_files" dependency to "all", and that triggered
another, pre-existing problem. Specifically, the arm64 selftests
override the emit_tests target, and that override improperly declares
itself to depend upon the "all" target.
This is a problem because the "emit_tests" target in lib.mk was not
intended to be overridden. emit_tests is a very simple, sequential build
target that was originally invoked from the "install" target, which in
turn, depends upon "all".
That approach worked for years. But with 9fc96c7c19df in place,
emit_tests failed, because it does not set up all of the elaborate
things that "install" does. And that caused the new
"kernel_header_files" target (which depends upon $(KBUILD_OUTPUT) being
correct) to fail.
Some detail: The "all" target is .PHONY. Therefore, each target that
depends on "all" will cause it to be invoked again, and because
dependencies are managed quite loosely in the selftests Makefiles, many
things will run, even when "all" is invoked several times in immediate
succession. So this is not a "real" failure, as far as build steps go:
everything gets built, but "all" reports a problem when invoked a second
time from a bad environment.
To fix this, simply remove the unnecessary "all" dependency from the
overridden emit_tests target. The dependency is still effectively
honored, because again, invocation is via "install", which also depends
upon "all".
An alternative approach would be to harden the emit_tests target so that
it can depend upon "all", but that's a lot more complicated and hard to
get right, and doesn't seem worth it, especially given that emit_tests
should probably not be overridden at all.
[1] https://lore.kernel.org/20230710-kselftest-fix-arm64-v1-1-48e872844f25@kern…
Fixes: 9fc96c7c19df ("selftests: error out if kernel header files are not yet built")
Reported-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
---
tools/testing/selftests/arm64/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/Makefile b/tools/testing/selftests/arm64/Makefile
index 9460cbe81bcc..ace8b67fb22d 100644
--- a/tools/testing/selftests/arm64/Makefile
+++ b/tools/testing/selftests/arm64/Makefile
@@ -42,7 +42,7 @@ run_tests: all
done
# Avoid any output on non arm64 on emit_tests
-emit_tests: all
+emit_tests:
@for DIR in $(ARM64_SUBTARGETS); do \
BUILD_TARGET=$(OUTPUT)/$$DIR; \
make OUTPUT=$$BUILD_TARGET -C $$DIR $@; \
base-commit: d5fe758c21f4770763ae4c05580be239be18947d
--
2.41.0
v4:
- [v3] https://lore.kernel.org/lkml/20230627005529.1564984-1-longman@redhat.com/
- Fix compilation problem reported by kernel test robot.
v3:
- [v2] https://lore.kernel.org/lkml/20230531163405.2200292-1-longman@redhat.com/
- Change the new control file from root-only "cpuset.cpus.reserve" to
non-root "cpuset.cpus.exclusive" which lists the set of exclusive
CPUs distributed down the hierarchy.
- Add a patch to restrict boot-time isolated CPUs to isolated
partitions only.
- Update the test_cpuset_prs.sh test script and documentation
accordingly.
This patch series introduces a new cpuset control file
"cpuset.cpus.exclusive" which must be a subset of "cpuset.cpus"
and the parent's "cpuset.cpus.exclusive". This control file lists
the exclusive CPUs to be distributed down the hierarchy. Any one
of the exclusive CPUs can only be distributed to at most one child
cpuset. Unlike "cpuset.cpus", invalid input to "cpuset.cpus.exclusive"
will be rejected with an error. This new control file has no effect on
the behavior of the cpuset until it turns into a partition root. At that
point, its effective CPUs will be set to its exclusive CPUs unless some
of them are offline.
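
The subset rules above can be sketched in plain C with bitmasks. This is only an illustration under stated assumptions, not the kernel implementation: struct cpuset_sketch, mask_subset() and exclusive_update() are hypothetical names, each bit stands for one CPU, and a "false" return models a write being rejected with an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpuset: each bit represents one CPU. */
struct cpuset_sketch {
	uint64_t cpus;           /* models "cpuset.cpus" */
	uint64_t cpus_exclusive; /* models "cpuset.cpus.exclusive" */
};

/* True if every CPU in 'sub' is also present in 'super'. */
static bool mask_subset(uint64_t sub, uint64_t super)
{
	return (sub & ~super) == 0;
}

/*
 * Accept a new exclusive mask only if it is a subset of the cpuset's own
 * "cpus" and of the parent's exclusive mask, and if it does not overlap the
 * exclusive CPUs already handed to sibling cpusets. On rejection the old
 * value is left untouched.
 */
static bool exclusive_update(struct cpuset_sketch *cs,
			     const struct cpuset_sketch *parent,
			     uint64_t sibling_exclusive,
			     uint64_t new_mask)
{
	assert(cs && parent);

	if (!mask_subset(new_mask, cs->cpus))
		return false;
	if (!mask_subset(new_mask, parent->cpus_exclusive))
		return false;
	if (new_mask & sibling_exclusive)
		return false;

	cs->cpus_exclusive = new_mask;
	return true;
}
```

The sibling check models the rule that an exclusive CPU can go to at most one child; how the real code tracks siblings is not shown here.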
This patch series also introduces a new category of cpuset partition
called remote partitions. The existing partition category where the
partition roots have to be clustered around the root cgroup in a
hierarchical way is now referred to as local partitions.
A remote partition can be formed far from the root cgroup
with no partition root parent. Local partitions can be
created without touching "cpuset.cpus.exclusive", as it is set
automatically when a cpuset becomes a local partition root. Properly set
"cpuset.cpus.exclusive" values down the hierarchy are, however, required
to create a remote partition.
Both scheduling and isolated partitions can be formed in a remote
partition. A local partition can be created under a remote partition.
A remote partition, however, cannot be formed under a local partition
for now.
Modern container orchestration tools like Kubernetes use the cgroup
hierarchy to manage different containers, and they rely on other
middleware like systemd to help manage it. If a container needs to
use isolated CPUs, it is hard to get those with local partitions,
as that requires the administrative parent cgroup to be a partition
root too, which tools like systemd may not be ready to manage.
With this patch series, we allow the creation of remote partitions
far from the root. The container management tool can manage the
"cpuset.cpus.exclusive" file without impacting the other cpuset
files that are managed by other middleware. Of course, invalid
"cpuset.cpus.exclusive" values will be rejected and changes to
"cpuset.cpus" can affect the value of "cpuset.cpus.exclusive" due to
the requirement that it has to be a subset of the former control file.
Waiman Long (9):
cgroup/cpuset: Inherit parent's load balance state in v2
cgroup/cpuset: Extract out CS_CPU_EXCLUSIVE & CS_SCHED_LOAD_BALANCE
handling
cgroup/cpuset: Improve temporary cpumasks handling
cgroup/cpuset: Allow suppression of sched domain rebuild in
update_cpumasks_hier()
cgroup/cpuset: Add cpuset.cpus.exclusive for v2
cgroup/cpuset: Introduce remote partition
cgroup/cpuset: Check partition conflict with housekeeping setup
cgroup/cpuset: Documentation update for partition
cgroup/cpuset: Extend test_cpuset_prs.sh to test remote partition
Documentation/admin-guide/cgroup-v2.rst | 100 +-
kernel/cgroup/cpuset.c | 1347 ++++++++++++-----
.../selftests/cgroup/test_cpuset_prs.sh | 398 +++--
3 files changed, 1291 insertions(+), 554 deletions(-)
--
2.31.1
We want to replace iptables TPROXY with a BPF program at TC ingress.
To make this work in all cases we need to assign a SO_REUSEPORT socket
to an skb, which is currently prohibited. This series adds support for
such sockets to bpf_sk_assign.
I did some refactoring to cut down on the amount of duplicate code. The
key to this is to use INDIRECT_CALL in the reuseport helpers. To show
that this approach is not just beneficial to TC sk_assign I removed
duplicate code for bpf_sk_lookup as well.
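
As a rough userspace illustration of the refactoring idea: one shared helper takes the hash function as a parameter, and a wrapper in the spirit of the kernel's INDIRECT_CALL_*() macros turns the common case into a direct call. Everything below (the *_sketch names, the toy hash functions, reuseport_pick()) is hypothetical and is not the code from this series.

```c
#include <assert.h>
#include <stdint.h>

/* Hash callback type, loosely modelled on the inet[6]_ehashfn_t idea. */
typedef uint32_t (*ehashfn_sketch_t)(uint32_t addr, uint16_t port);

/* Toy hash functions standing in for the per-protocol ones. */
static uint32_t tcp_hash_sketch(uint32_t addr, uint16_t port)
{
	return addr ^ port;
}

static uint32_t udp_hash_sketch(uint32_t addr, uint16_t port)
{
	return addr * 31u + port;
}

/*
 * If the pointer matches the expected callee, call that function directly
 * so the compiler can avoid an indirect branch; otherwise fall back to the
 * ordinary call through the pointer.
 */
#define INDIRECT_CALL_SKETCH(f, expected, ...) \
	((f) == (expected) ? expected(__VA_ARGS__) : (f)(__VA_ARGS__))

/* One shared helper instead of a copy of this logic per protocol. */
static unsigned int reuseport_pick(ehashfn_sketch_t hashfn, uint32_t addr,
				   uint16_t port, unsigned int nr_socks)
{
	assert(hashfn && nr_socks > 0);
	return INDIRECT_CALL_SKETCH(hashfn, udp_hash_sketch, addr, port) % nr_socks;
}
```

Both callers get the same result as calling their hash function directly; only the call path differs.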
Joint work with Daniel Borkmann.
Signed-off-by: Lorenz Bauer <lmb(a)isovalent.com>
---
Changes in v4:
- WARN_ON_ONCE if reuseport socket is refcounted (Kuniyuki)
- Use inet[6]_ehashfn_t to shorten function declarations (Kuniyuki)
- Shuffle documentation patch around (Kuniyuki)
- Update commit message to explain why IPv6 needs EXPORT_SYMBOL
- Link to v3: https://lore.kernel.org/r/20230613-so-reuseport-v3-0-907b4cbb7b99@isovalent…
Changes in v3:
- Fix warning re udp_ehashfn and udp6_ehashfn (Simon)
- Return higher scoring connected UDP reuseport sockets (Kuniyuki)
- Fix ipv6 module builds
- Link to v2: https://lore.kernel.org/r/20230613-so-reuseport-v2-0-b7c69a342613@isovalent…
Changes in v2:
- Correct commit abbrev length (Kuniyuki)
- Reduce duplication (Kuniyuki)
- Add checks on sk_state (Martin)
- Split exporting inet[6]_lookup_reuseport into separate patch (Eric)
---
Daniel Borkmann (1):
selftests/bpf: Test that SO_REUSEPORT can be used with sk_assign helper
Lorenz Bauer (6):
udp: re-score reuseport groups when connected sockets are present
net: export inet_lookup_reuseport and inet6_lookup_reuseport
net: remove duplicate reuseport_lookup functions
net: document inet[6]_lookup_reuseport sk_state requirements
net: remove duplicate sk_lookup helpers
bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign
include/net/inet6_hashtables.h | 81 ++++++++-
include/net/inet_hashtables.h | 74 +++++++-
include/net/sock.h | 7 +-
include/uapi/linux/bpf.h | 3 -
net/core/filter.c | 2 -
net/ipv4/inet_hashtables.c | 67 ++++---
net/ipv4/udp.c | 88 ++++-----
net/ipv6/inet6_hashtables.c | 70 +++++---
net/ipv6/udp.c | 98 ++++------
tools/include/uapi/linux/bpf.h | 3 -
tools/testing/selftests/bpf/network_helpers.c | 3 +
.../selftests/bpf/prog_tests/assign_reuse.c | 197 +++++++++++++++++++++
.../selftests/bpf/progs/test_assign_reuse.c | 142 +++++++++++++++
13 files changed, 656 insertions(+), 179 deletions(-)
---
base-commit: 970308a7b544fa1c7ee98a2721faba3765be8dd8
change-id: 20230613-so-reuseport-e92c526173ee
Best regards,
--
Lorenz Bauer <lmb(a)isovalent.com>
=== Context ===
In the context of a middlebox, fragmented packets are tricky to handle.
The full 5-tuple of a packet is often only available in the first
fragment which makes enforcing consistent policy difficult. There are
really only two stateless options, neither of which is very nice:
1. Enforce policy on first fragment and accept all subsequent fragments.
This works but may let in certain attacks or allow data exfiltration.
2. Enforce policy on first fragment and drop all subsequent fragments.
This does not really work because some protocols may rely on
fragmentation. For example, DNS may rely on oversized UDP packets for
large responses.
So stateful tracking is the only sane option. RFC 8900 [0] calls this
out as well in section 6.3:
Middleboxes [...] should process IP fragments in a manner that is
consistent with [RFC0791] and [RFC8200]. In many cases, middleboxes
must maintain state in order to achieve this goal.
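
To make the problem concrete, here is a small sketch of why a stateless filter cannot see ports on later fragments. It assumes an IPv4 flags/fragment-offset field already converted to host byte order; the *_SKETCH constants and helper names are made up for illustration, although the bit layout follows the IPv4 header definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IP_MF_SKETCH      0x2000u /* "more fragments" flag */
#define IP_OFFMASK_SKETCH 0x1fffu /* fragment offset, in 8-byte units */

/* A packet is a fragment if MF is set or the offset is non-zero. */
static bool ipv4_is_fragment(uint16_t frag_off)
{
	return (frag_off & (IP_MF_SKETCH | IP_OFFMASK_SKETCH)) != 0;
}

/* Only the piece at offset 0 carries the L4 header, and thus the ports. */
static bool ipv4_has_l4_header(uint16_t frag_off)
{
	return (frag_off & IP_OFFMASK_SKETCH) == 0;
}

/*
 * Read the UDP destination port (bytes 2-3 of the UDP header, network byte
 * order). Calling this on a later fragment would read payload bytes instead,
 * which is exactly why policy cannot be enforced per fragment.
 */
static uint16_t udp_dest_port(const uint8_t *l4, uint16_t frag_off)
{
	assert(l4 && ipv4_has_l4_header(frag_off));
	return (uint16_t)((l4[2] << 8) | l4[3]);
}
```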
=== BPF related bits ===
Policy has traditionally been enforced from XDP/TC hooks. Both hooks
run before kernel reassembly facilities. However, with the new
BPF_PROG_TYPE_NETFILTER, we can rather easily hook into existing
netfilter reassembly infra.
The basic idea is we bump a refcnt on the netfilter defrag module and
then run the bpf prog after the defrag module runs. This allows bpf
progs to transparently see full, reassembled packets. The nice thing
about this is that progs don't have to carry around logic to detect
fragments.
=== Changelog ===
Changes from v2:
* module_put() if ->enable() fails
* Fix CI build errors
Changes from v1:
* Drop bpf_program__attach_netfilter() patches
* static -> static const where appropriate
* Fix callback assignment order during registration
* Only request_module() if callbacks are missing
* Fix retval when modprobe fails in userspace
* Fix v6 defrag module name (nf_defrag_ipv6_hooks -> nf_defrag_ipv6)
* Simplify priority checking code
* Add warning if module doesn't assign callbacks in the future
* Take refcnt on module while defrag link is active
[0]: https://datatracker.ietf.org/doc/html/rfc8900
Daniel Xu (6):
netfilter: defrag: Add glue hooks for enabling/disabling defrag
netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
netfilter: bpf: Prevent defrag module unload while link active
bpf: selftests: Support not connecting client socket
bpf: selftests: Support custom type and proto for client sockets
bpf: selftests: Add defrag selftests
include/linux/netfilter.h | 15 +
include/uapi/linux/bpf.h | 5 +
net/ipv4/netfilter/nf_defrag_ipv4.c | 17 +-
net/ipv6/netfilter/nf_defrag_ipv6_hooks.c | 11 +
net/netfilter/core.c | 6 +
net/netfilter/nf_bpf_link.c | 150 +++++++++-
tools/include/uapi/linux/bpf.h | 5 +
tools/testing/selftests/bpf/Makefile | 4 +-
.../selftests/bpf/generate_udp_fragments.py | 90 ++++++
.../selftests/bpf/ip_check_defrag_frags.h | 57 ++++
tools/testing/selftests/bpf/network_helpers.c | 26 +-
tools/testing/selftests/bpf/network_helpers.h | 3 +
.../bpf/prog_tests/ip_check_defrag.c | 282 ++++++++++++++++++
.../selftests/bpf/progs/ip_check_defrag.c | 104 +++++++
14 files changed, 753 insertions(+), 22 deletions(-)
create mode 100755 tools/testing/selftests/bpf/generate_udp_fragments.py
create mode 100644 tools/testing/selftests/bpf/ip_check_defrag_frags.h
create mode 100644 tools/testing/selftests/bpf/prog_tests/ip_check_defrag.c
create mode 100644 tools/testing/selftests/bpf/progs/ip_check_defrag.c
--
2.41.0
On Mon, 10 Jul 2023 15:07:30 -0400
Steven Rostedt <rostedt(a)goodmis.org> wrote:
> On Mon, 10 Jul 2023 15:06:06 -0400
> Steven Rostedt <rostedt(a)goodmis.org> wrote:
>
> > > Something was broken in your mail (I guess cc list) and couldn’t reach to lkml or
> > > ignored by lkml. I just wanted to track the auto test results from linux-kselftest.
> >
> > Yeah, claws-mail has an issue with some emails with quotes in it (sometimes
> > drops the second quote). Sad part is, it happens after I hit send, and it
> > is not part of the email. I'll send this reply now, but I bet it's going to happen again.
> >
> > Let's see :-/ I checked the To and Cc's and they all have the proper
> > quotes. Let's see what ends up in my "Sent" folder.
>
> This time it worked!
>
But this reply did not :-p
It was fine before I sent, but the email in my Sent folder shows:
Cc: "mhiramat(a)kernel.org" <mhiramat(a)kernel.org>, "shuah(a)kernel.org" <shuah(a)kernel.org>, "linux-kernel(a)vger.kernel.org" <linux-kernel(a)vger.kernel.org>, "linux-trace-kernel(a)vger.kernel.org\" <linux-trace-kernel(a)vger.kernel.org>, "linux-kselftest(a)vger.kernel.org" <linux-kselftest(a)vger.kernel.org>, Ching-lin Yu <chinglinyu(a)google.com>, Nadav Amit <namit(a)vmware.com>, "srivatsa(a)csail.mit.edu" <srivatsa(a)csail.mit.edu>, Alexey Makhalov <amakhalov(a)vmware.com>, Vasavi Sirnapalli <vsirnapalli(a)vmware.com>, Tapas Kundu <tkundu(a)vmware.com>, "er.ajay.kaher(a)gmail.com" <er.ajay.kaher(a)gmail.com>
Claws injected a backslash into: "linux-trace-kernel(a)vger.kernel.org\" <linux-trace-kernel(a)vger.kernel.org>
I have my own build of claws-mail, let me update it and perhaps this will
go away.
-- Steve
This is the basic functionality for iommufd to support
iommufd_device_replace() and IOMMU_HWPT_ALLOC for physical devices.
iommufd_device_replace() allows changing the HWPT associated with the
device to a new IOAS or HWPT. Replace does this in a way that failure leaves
things unchanged, and utilizes the iommu_group_replace_domain() API
to allow the iommu driver to perform an optional non-disruptive change.
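
The "failure leaves things unchanged" property can be sketched as follows. This is a toy model, not the iommufd code: struct device_sketch, device_replace_sketch() and the two attach callbacks are hypothetical, and the callback merely stands in for whatever the driver does when switching domains.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device state: which HWPT the device is attached to. */
struct device_sketch {
	int hwpt_id;
};

/* Stand-ins for a driver attach that succeeds or fails. */
static int attach_always_ok(int hwpt_id)
{
	(void)hwpt_id;
	return 0;
}

static int attach_always_fails(int hwpt_id)
{
	(void)hwpt_id;
	return -ENOMEM;
}

/*
 * Try to switch to the new HWPT first; the recorded association is only
 * updated once the switch has succeeded, so an error returns with the
 * old association still in place.
 */
static int device_replace_sketch(struct device_sketch *dev, int new_hwpt_id,
				 int (*try_attach)(int hwpt_id))
{
	int rc;

	assert(dev && try_attach);

	rc = try_attach(new_hwpt_id);
	if (rc)
		return rc;

	dev->hwpt_id = new_hwpt_id;
	return 0;
}
```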
IOMMU_HWPT_ALLOC allows HWPTs to be explicitly allocated by the user and
used by attach or replace. At this point it isn't very useful since the
HWPT is the same as the automatically managed HWPT from the IOAS. However
a following series will allow userspace to customize the created HWPT.
The implementation is complicated because we have to introduce some
per-iommu_group memory in iommufd and redo how we think about multi-device
groups to be more explicit. This solves all the locking problems in the
prior attempts.
This series is infrastructure work for the following series which:
- Add replace for attach
- Expose replace through VFIO APIs
- Implement driver parameters for HWPT creation (nesting)
Once review of this is complete I will keep it on a side branch and
accumulate the following series when they are ready so we can have a
stable base and make more incremental progress. When we have all the parts
together to get a full implementation it can go to Linus.
This is on github: https://github.com/jgunthorpe/linux/commits/iommufd_hwpt
v7:
- Rebase to v6.4-rc2, update to new signature of iommufd_get_ioas()
v6: https://lore.kernel.org/r/0-v6-fdb604df649a+369-iommufd_alloc_jgg@nvidia.com
- Go back to the v4 locking arrangement, now with both the attach/detach
igroup->locks inside the functions; Kevin says he needs this for a
followup series. This still fixes the syzkaller bug
- Fix two more error unwind locking bugs where
iommufd_object_abort_and_destroy(hwpt) would deadlock or be mislocked.
Make sure fail_nth will catch these mistakes
- Add a patch allowing objects to have a different abort function than
their destroy function; it allows hwpt abort to require the caller to
continue to hold the lock and enforces this with lockdep.
v5: https://lore.kernel.org/r/0-v5-6716da355392+c5-iommufd_alloc_jgg@nvidia.com
- Go back to the v3 version of the code, keep the comment changes from
v4. Syzkaller says the group lock change in v4 didn't work.
- Adjust the fail_nth test to cover the path syzkaller found. We need to
have an ioas with a mapped page installed to inject a failure during
domain attachment.
v4: https://lore.kernel.org/r/0-v4-9cd79ad52ee8+13f5-iommufd_alloc_jgg@nvidia.c…
- Refine comments and commit messages
- Move the group lock into iommufd_hw_pagetable_attach()
- Fix error unwind in iommufd_device_do_replace()
v3: https://lore.kernel.org/r/0-v3-61d41fd9e13e+1f5-iommufd_alloc_jgg@nvidia.com
- Refine comments and commit messages
- Adjust the flow in iommufd_device_auto_get_domain() so pt_id is only
set on success
- Reject replace on non-attached devices
- Add missing __reserved check for IOMMU_HWPT_ALLOC
v2: https://lore.kernel.org/r/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia.c…
- Use WARN_ON for the igroup->group test and move that logic to a
function iommufd_group_try_get()
- Change igroup->devices to igroup->device list
Replace will need to iterate over all attached idevs
- Rename to iommufd_group_setup_msi()
- New patch to export iommu_get_resv_regions()
- New patch to use per-device reserved regions instead of per-group
regions
- Split out the reorganizing of iommufd_device_change_pt() from the
replace patch
- Replace uses the per-dev reserved regions
- Use stdev_id in a few more places in the selftest
- Fix error handling in IOMMU_HWPT_ALLOC
- Clarify comments
- Rebase on v6.3-rc1
v1: https://lore.kernel.org/all/0-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia…
Jason Gunthorpe (17):
iommufd: Move isolated msi enforcement to iommufd_device_bind()
iommufd: Add iommufd_group
iommufd: Replace the hwpt->devices list with iommufd_group
iommu: Export iommu_get_resv_regions()
iommufd: Keep track of each device's reserved regions instead of
groups
iommufd: Use the iommufd_group to avoid duplicate MSI setup
iommufd: Make sw_msi_start a group global
iommufd: Move putting a hwpt to a helper function
iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
iommufd: Allow a hwpt to be aborted after allocation
iommufd: Fix locking around hwpt allocation
iommufd: Reorganize iommufd_device_attach into
iommufd_device_change_pt
iommufd: Add iommufd_device_replace()
iommufd: Make destroy_rwsem use a lock class per object type
iommufd: Add IOMMU_HWPT_ALLOC
iommufd/selftest: Return the real idev id from selftest mock_domain
iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC
Nicolin Chen (2):
iommu: Introduce a new iommu_group_replace_domain() API
iommufd/selftest: Test iommufd_device_replace()
drivers/iommu/iommu-priv.h | 10 +
drivers/iommu/iommu.c | 41 +-
drivers/iommu/iommufd/device.c | 553 +++++++++++++-----
drivers/iommu/iommufd/hw_pagetable.c | 112 +++-
drivers/iommu/iommufd/io_pagetable.c | 32 +-
drivers/iommu/iommufd/iommufd_private.h | 52 +-
drivers/iommu/iommufd/iommufd_test.h | 6 +
drivers/iommu/iommufd/main.c | 24 +-
drivers/iommu/iommufd/selftest.c | 40 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/iommufd.h | 26 +
tools/testing/selftests/iommu/iommufd.c | 67 ++-
.../selftests/iommu/iommufd_fail_nth.c | 67 ++-
tools/testing/selftests/iommu/iommufd_utils.h | 63 +-
14 files changed, 868 insertions(+), 226 deletions(-)
create mode 100644 drivers/iommu/iommu-priv.h
base-commit: f1fcbaa18b28dec10281551dfe6ed3a3ed80e3d6
--
2.40.1
Hi Liam,
On Thu, May 18, 2023 at 9:37 PM Liam R. Howlett <Liam.Howlett(a)oracle.com> wrote:
> Now that the functions have changed the limits, update the testing of
> the maple tree to test these new settings.
>
> Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Thanks for your patch, which is now commit eb2e817f38cafbf7
("maple_tree: update testing code for mas_{next,prev,walk}") in
> --- a/lib/test_maple_tree.c
> +++ b/lib/test_maple_tree.c
> @@ -2011,7 +2011,7 @@ static noinline void __init next_prev_test(struct maple_tree *mt)
>
> val = mas_next(&mas, ULONG_MAX);
> MT_BUG_ON(mt, val != NULL);
> - MT_BUG_ON(mt, mas.index != ULONG_MAX);
> + MT_BUG_ON(mt, mas.index != 0x7d6);
On m68k (ARAnyM):
TEST STARTING
BUG at next_prev_test:2014 (1)
Pass: 3749128 Run:3749129
And after that it seems to hang[*].
After adding a debug print (thus shifting all line numbers by +1):
next_prev_test:mas.index = 0x138e
BUG at next_prev_test:2015 (1)
0x138e = 5006, while the expected value is 0x7d6 = 2006.
I guess converting this test to the KUnit framework would make it a
bit easier to investigate failures...
[*] Left the debug one running, and I got a few more:
BUG at check_empty_area_window:2656 (1)
Pass: 3754275 Run:3754277
BUG at check_empty_area_window:2657 (1)
Pass: 3754275 Run:3754278
BUG at check_empty_area_window:2658 (1)
Pass: 3754275 Run:3754279
BUG at check_empty_area_window:2662 (1)
Pass: 3754275 Run:3754280
BUG at check_empty_area_window:2663 (1)
Pass: 3754275 Run:3754281
maple_tree: 3804518 of 3804524 tests passed
So the full test took more than 20 minutes...
> MT_BUG_ON(mt, mas.last != ULONG_MAX);
>
> val = mas_prev(&mas, 0);
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert(a)linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
Hi, Willy
This v4 mainly uses argv0 as you suggested. In addition, a new
run-libc-test target is added for glibc and musl, and the RB_ flags are
added to nolibc so that nolibc-test.c compiles without <linux/reboot.h>
for glibc, musl and nolibc (mainly for musl-gcc, without -I
/path/to/sysroot).
This patchset is based on the 20230705-nolibc-series2 branch of the nolibc
repo [2]. It must be applied after our v6 __sysret series [3] (argv0 is
exported there) and Thomas' chmod_net removal patchset [4] (the new
chmod_argv0 is added on the same line as chmod_net, so it would conflict).
This patchset assumes the chmod_net removal patchset is applied
first; if not, the alphabetically added chmod_argv0 will not apply.
Since our new chmod_argv0 is added exactly to replace chmod_net,
Willy, is it OK for you to at least apply the chmod_net removal patch
[5] before this patchset?
selftests/nolibc: drop test chmod_net
This patchset is tested together with the v6 __sysret series [3]:
arch/board | result
------------|------------
arm/vexpress-a9 | 142 test(s) passed, 1 skipped, 0 failed.
arm/virt | 142 test(s) passed, 1 skipped, 0 failed.
aarch64/virt | 142 test(s) passed, 1 skipped, 0 failed.
ppc/g3beige | not supported
ppc/ppce500 | not supported
i386/pc | 142 test(s) passed, 1 skipped, 0 failed.
x86_64/pc | 142 test(s) passed, 1 skipped, 0 failed.
mipsel/malta | 142 test(s) passed, 1 skipped, 0 failed.
loongarch64/virt | 142 test(s) passed, 1 skipped, 0 failed.
riscv64/virt | 142 test(s) passed, 1 skipped, 0 failed.
riscv32/virt | 0 test(s) passed, 0 skipped, 0 failed.
s390x/s390-ccw-virtio | 142 test(s) passed, 1 skipped, 0 failed.
With tinyconfig + basic console options (that is, with all other
options disabled, including procfs, shmem, tmpfs, net and memfd_create;
to save test time, only 4 randomly chosen archs were tested):
...
LOG: testing report for loongarch64/virt:
15 chmod_self [SKIPPED]
16 chown_self [SKIPPED]
40 link_cross [SKIPPED]
0 -fstackprotector not supported [SKIPPED]
139 test(s) passed, 4 skipped, 0 failed.
See all results in /labs/linux-lab/logging/nolibc/loongarch64-virt-nolibc-test.log
LOG: testing summary:
arch/board | result
------------|------------
arm/vexpress-a9 | 139 test(s) passed, 4 skipped, 0 failed.
x86_64/pc | 139 test(s) passed, 4 skipped, 0 failed.
mipsel/malta | 139 test(s) passed, 4 skipped, 0 failed.
loongarch64/virt | 139 test(s) passed, 4 skipped, 0 failed.
Changes from v3 --> v4:
* selftests/nolibc: stat_fault: silence NULL argument warning with glibc
selftests/nolibc: gettid: restore for glibc and musl
selftests/nolibc: add _LARGEFILE64_SOURCE for musl
selftests/nolibc: fix up int_fast16/32_t test cases for musl
selftests/nolibc: fix up kernel parameters support
selftests/nolibc: link_cross: use /proc/self/cmdline
tools/nolibc: add rmdir() support
selftests/nolibc: add a new rmdir() test case
selftests/nolibc: fix up failures when CONFIG_PROC_FS=n
selftests/nolibc: prepare /tmp for tmpfs or ramfs
selftests/nolibc: vfprintf: remove MEMFD_CREATE dependency
No change.
* selftests/nolibc: add run-libc-test target
New run and report for glibc or musl. For musl, we can simply issue:
$ make run-libc-test CC=/path/to/musl-install/bin/musl-gcc
* tools/nolibc: types.h: add RB_ flags for reboot()
selftests/nolibc: prefer <sys/reboot.h> to <linux/reboot.h>
Required by musl to compile nolibc-test.c without -I/path/to/sysroot
* selftests/nolibc: chdir_root: restore current path after test
restore current path to prevent breakage of using relative path
* selftests/nolibc: stat_timestamps: remove procfs dependency
selftests/nolibc: chroot_exe: remove procfs dependency
selftests/nolibc: add chmod_argv0 test
use argv0 instead of '/init' as before.
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/lkml/cover.1688134399.git.falcon@tinylab.org/
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/wtarreau/nolibc.git
[3]: https://lore.kernel.org/lkml/cover.1688739492.git.falcon@tinylab.org/
[4]: https://lore.kernel.org/lkml/20230624-proc-net-setattr-v1-0-73176812adee@we…
[5]: https://lore.kernel.org/lkml/20230624-proc-net-setattr-v1-1-73176812adee@we…
Zhangjin Wu (18):
selftests/nolibc: add run-libc-test target
selftests/nolibc: stat_fault: silence NULL argument warning with glibc
selftests/nolibc: gettid: restore for glibc and musl
selftests/nolibc: add _LARGEFILE64_SOURCE for musl
selftests/nolibc: fix up int_fast16/32_t test cases for musl
tools/nolibc: types.h: add RB_ flags for reboot()
selftests/nolibc: prefer <sys/reboot.h> to <linux/reboot.h>
selftests/nolibc: fix up kernel parameters support
selftests/nolibc: link_cross: use /proc/self/cmdline
tools/nolibc: add rmdir() support
selftests/nolibc: add a new rmdir() test case
selftests/nolibc: fix up failures when CONFIG_PROC_FS=n
selftests/nolibc: prepare /tmp for tmpfs or ramfs
selftests/nolibc: vfprintf: remove MEMFD_CREATE dependency
selftests/nolibc: chdir_root: restore current path after test
selftests/nolibc: stat_timestamps: remove procfs dependency
selftests/nolibc: chroot_exe: remove procfs dependency
selftests/nolibc: add chmod_argv0 test
tools/include/nolibc/sys.h | 23 ++++-
tools/include/nolibc/types.h | 12 ++-
tools/testing/selftests/nolibc/Makefile | 4 +
tools/testing/selftests/nolibc/nolibc-test.c | 88 +++++++++++++++-----
4 files changed, 104 insertions(+), 23 deletions(-)
--
2.25.1
According to commit 01d6c48a828b ("Documentation: kselftest:
"make headers" is a prerequisite"), running the kselftests requires
running "make headers" first.
Do that in "vmtest.sh" as well to fix the HID CI.
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
Looks like the new master branch (v6.5-rc1) broke my CI.
And given that `make headers` is now a prerequisite for running the
kselftests, also include that command in vmtest.sh.
Broken CI job: https://gitlab.freedesktop.org/bentiss/hid/-/jobs/44704436
Fixed CI job: https://gitlab.freedesktop.org/bentiss/hid/-/jobs/45151040
---
tools/testing/selftests/hid/vmtest.sh | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/hid/vmtest.sh b/tools/testing/selftests/hid/vmtest.sh
index 681b906b4853..4da48bf6b328 100755
--- a/tools/testing/selftests/hid/vmtest.sh
+++ b/tools/testing/selftests/hid/vmtest.sh
@@ -79,6 +79,7 @@ recompile_kernel()
cd "${kernel_checkout}"
${make_command} olddefconfig
+ ${make_command} headers
${make_command}
}
---
base-commit: 0e382fa72bbf0610be40af9af9b03b0cd149df82
change-id: 20230709-fix-selftests-c8b0bdff1d20
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
Hi, Willy
As you suggested, the 'status: [success|warning|failure]' info is added
to the summary line, with additional newlines around this line to
make the status info stand out. In addition, the total number of tests is
printed, and the passed, skipped and failed values are aligned with '%03d'.
This patchset is based on 20230705-nolibc-series2 of nolibc repo[1].
The test result looks like:
...
138 test(s): 135 passed, 002 skipped, 001 failed => status: failure
See all results in /labs/linux-lab/src/linux-stable/tools/testing/selftests/nolibc/run.out
Or:
...
137 test(s): 134 passed, 003 skipped, 000 failed => status: warning
See all results in /labs/linux-lab/src/linux-stable/tools/testing/selftests/nolibc/run.out
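
For illustration, the summary line format can be reproduced in C. The real report is generated from the Makefile; the rule used here for picking the status (any failure gives "failure", otherwise any skip gives "warning", otherwise "success") is inferred from the two sample outputs above and is an assumption, as are the function names.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed rule: failures dominate, then skips, else success. */
static const char *report_status(int skipped, int failed)
{
	if (failed > 0)
		return "failure";
	if (skipped > 0)
		return "warning";
	return "success";
}

/*
 * Format the summary line: total first, then the three counters padded
 * with '%03d' so that they line up across runs.
 */
static int format_report(char *buf, size_t len, int passed, int skipped,
			 int failed)
{
	assert(buf && len > 0);
	return snprintf(buf, len,
			"%d test(s): %03d passed, %03d skipped, %03d failed => status: %s",
			passed + skipped + failed, passed, skipped, failed,
			report_status(skipped, failed));
}
```

Called with 135 passed, 2 skipped and 1 failed, this yields the first sample line shown above.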
Best regards,
Zhangjin
---
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/wtarreau/nolibc.git
Zhangjin Wu (5):
selftests/nolibc: report: print a summarized test status
selftests/nolibc: report: print total tests
selftests/nolibc: report: align passed, skipped and failed
selftests/nolibc: report: extrude the test status line
selftests/nolibc: report: add newline before test failures
tools/testing/selftests/nolibc/Makefile | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--
2.25.1
On Mon, 10 Jul 2023 02:17:01 +0000
Nadav Amit <namit(a)vmware.com> wrote:
> > On Jul 9, 2023, at 6:54 PM, Steven Rostedt <rostedt(a)goodmis.org> wrote:
> >
> > + union {
> > + struct rcu_head rcu;
> > + struct llist_node llist; /* For freeing after RCU */
> > + };
>
> The memory savings from using a union might not be worth the potential impact
> of type confusion and bugs.
It's also documentation. The two are related, as one is the hand off to
the other. It's not a random union, and I'd like to leave it that way.
-- Steve
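
A userspace sketch of the hand-off described in this reply, assuming simplified stand-ins for the kernel types (struct rcu_head_sketch, struct llist_node_sketch and the helper below are hypothetical): the two union members are never live at the same time, because the entry is queued on the free list only after its RCU callback has run.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for struct rcu_head and struct llist_node. */
struct rcu_head_sketch {
	struct rcu_head_sketch *next;
	void (*func)(struct rcu_head_sketch *head);
};

struct llist_node_sketch {
	struct llist_node_sketch *next;
};

struct entry_sketch {
	int payload;
	union {
		struct rcu_head_sketch rcu;     /* used while waiting for RCU */
		struct llist_node_sketch llist; /* for freeing after RCU */
	};
};

/*
 * Models what the RCU callback would do: the rcu member is no longer
 * needed, so its storage is reused as the link on the to-be-freed list.
 */
static void hand_off_to_free_list(struct entry_sketch *e,
				  struct llist_node_sketch **head)
{
	assert(e && head);
	e->llist.next = *head;
	*head = &e->llist;
}
```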
Since commit 53fcfafa8c5c ("tools/nolibc/unistd: add syscall()") nolibc
has support for syscall(2).
Use it to get rid of some ifdef-ery.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
tools/testing/selftests/nolibc/nolibc-test.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/tools/testing/selftests/nolibc/nolibc-test.c b/tools/testing/selftests/nolibc/nolibc-test.c
index 486334981e60..c02d89953679 100644
--- a/tools/testing/selftests/nolibc/nolibc-test.c
+++ b/tools/testing/selftests/nolibc/nolibc-test.c
@@ -1051,11 +1051,7 @@ int main(int argc, char **argv, char **envp)
* exit with status code 2N+1 when N is written to 0x501. We
* hard-code the syscall here as it's arch-dependent.
*/
-#if defined(_NOLIBC_SYS_H)
- else if (my_syscall3(__NR_ioperm, 0x501, 1, 1) == 0)
-#else
- else if (ioperm(0x501, 1, 1) == 0)
-#endif
+ else if (syscall(__NR_ioperm, 0x501, 1, 1) == 0)
__asm__ volatile ("outb %%al, %%dx" :: "d"(0x501), "a"(0));
/* if it does nothing, fall back to the regular panic */
#endif
---
base-commit: a901a3568fd26ca9c4a82d8bc5ed5b3ed844d451
change-id: 20230703-nolibc-ioperm-88d87ae6d5e9
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
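
As a small illustration of why a common syscall() spelling removes the need for such ifdefs, the sketch below issues a raw system call through syscall(2). It uses SYS_getpid rather than ioperm, since ioperm is arch-dependent and privileged; raw_getpid() is a made-up helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * The same source line works with any libc that provides syscall(),
 * so no per-libc #if branch is needed around the call.
 */
static long raw_getpid(void)
{
	long pid = syscall(SYS_getpid);

	assert(pid > 0);
	return pid;
}
```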
Make sv48 the default address space for mmap as some applications
currently depend on this assumption. Also enable users to select
desired address space using a non-zero hint address to mmap. Previous
kernel changes broke Java and other applications on sv57;
this series fixes that.
Documentation is also added to the RISC-V virtual memory section to explain
these changes.
-Charlie
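
From userspace, passing a hint looks like the sketch below. It only shows the generic mmap(2) call with a non-zero first argument and without MAP_FIXED; how a given hint value maps to sv39, sv48 or sv57 is defined by the series and its documentation and is deliberately not encoded here. map_with_hint() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Request an anonymous private mapping near 'hint'. Without MAP_FIXED the
 * kernel treats the address purely as a hint and may place the mapping
 * elsewhere; a NULL hint leaves the choice entirely to the kernel.
 */
static void *map_with_hint(void *hint, size_t len)
{
	assert(len > 0);
	return mmap(hint, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```

The caller should compare the result against MAP_FAILED and inspect the returned address to see which range the kernel actually chose.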
---
v4:
- Split testcases/document patch into test cases, in-code documentation, and
formal documentation patches
- Modified the mmap_base macro to be more legible and better represent memory
layout
- Fixed documentation to better reflect the implementation
- Renamed DEFAULT_VA_BITS to MMAP_VA_BITS
- Added additional test case for rlimit changes
---
Charlie Jenkins (4):
RISC-V: mm: Restrict address space for sv39,sv48,sv57
RISC-V: mm: Add tests for RISC-V mm
RISC-V: mm: Update pgtable comment documentation
RISC-V: mm: Document mmap changes
Documentation/riscv/vm-layout.rst | 22 +++
arch/riscv/include/asm/elf.h | 2 +-
arch/riscv/include/asm/pgtable.h | 21 ++-
arch/riscv/include/asm/processor.h | 43 +++++-
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/mm/.gitignore | 1 +
tools/testing/selftests/riscv/mm/Makefile | 21 +++
.../selftests/riscv/mm/testcases/mmap.c | 133 ++++++++++++++++++
8 files changed, 232 insertions(+), 13 deletions(-)
create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
create mode 100644 tools/testing/selftests/riscv/mm/Makefile
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap.c
--
2.41.0
This series adds a new userfaultfd feature, UFFDIO_POISON. See commit 4
for a detailed description of the feature.
The series is based on Linus master (partial 6.5 merge window), and
structured like this:
- Patches 1-3 are preparation / refactoring
- Patches 4-6 implement and advertise the new feature
- Patches 7-8 implement a unit test for the new feature
Changelog:
v2 -> v3:
- Rebase onto current Linus master.
- Don't overwrite existing PTE markers for non-hugetlb UFFDIO_POISON.
Before, non-hugetlb would override them, but hugetlb would not. I don't
think there's a use case where we *want* to override a UFFD_WP marker
for example, so take the more conservative behavior for all kinds of
memory.
- [Peter] Drop hugetlb mfill atomic refactoring, since it isn't needed
for this series (we don't touch that code directly anyway).
- [Peter] Switch to re-using PTE_MARKER_SWAPIN_ERROR instead of defining
new PTE_MARKER_UFFD_POISON.
- [Peter] Extract start / len range overflow check into existing
validate_range helper; this fixes the style issue of unnecessary braces
in the UFFDIO_POISON implementation, because this code is just deleted.
- [Peter] Extract file size check out into a new helper.
- [Peter] Defer actually "enabling" the new feature until the last commit
in the series; combine this with adding the documentation. As a
consequence, move the selftest commits after this one.
- [Randy] Fix typo in documentation.
v1 -> v2:
- [Peter] Return VM_FAULT_HWPOISON not VM_FAULT_SIGBUS, to yield the
correct behavior for KVM (guest MCE).
- [Peter] Rename UFFDIO_SIGBUS to UFFDIO_POISON.
- [Peter] Implement hugetlbfs support for UFFDIO_POISON.
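
The start / len overflow check mentioned in the v2 -> v3 changelog can be sketched as follows. This is a hypothetical helper, not the kernel's validate_range(): the name range_is_valid(), its parameters and the exact set of checks are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Reject empty, misaligned or wrapping ranges. 'page_size' must be a
 * power of two. The last test catches start + len overflowing the
 * 64-bit address space, in which case the sum wraps around and ends up
 * at or below 'start'.
 */
static bool range_is_valid(uint64_t start, uint64_t len, uint64_t page_size)
{
	assert(page_size && (page_size & (page_size - 1)) == 0);

	if (len == 0)
		return false;
	if ((start | len) & (page_size - 1))
		return false;
	if (start + len <= start)
		return false;
	return true;
}
```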
Axel Rasmussen (8):
mm: make PTE_MARKER_SWAPIN_ERROR more general
mm: userfaultfd: check for start + len overflow in validate_range
mm: userfaultfd: extract file size check out into a helper
mm: userfaultfd: add new UFFDIO_POISON ioctl
mm: userfaultfd: support UFFDIO_POISON for hugetlbfs
mm: userfaultfd: document and enable new UFFDIO_POISON feature
selftests/mm: refactor uffd_poll_thread to allow custom fault handlers
selftests/mm: add uffd unit test for UFFDIO_POISON
Documentation/admin-guide/mm/userfaultfd.rst | 15 +++
fs/userfaultfd.c | 73 ++++++++++--
include/linux/mm_inline.h | 19 +++
include/linux/swapops.h | 10 +-
include/linux/userfaultfd_k.h | 4 +
include/uapi/linux/userfaultfd.h | 25 +++-
mm/hugetlb.c | 51 ++++++--
mm/madvise.c | 2 +-
mm/memory.c | 15 ++-
mm/mprotect.c | 4 +-
mm/shmem.c | 4 +-
mm/swapfile.c | 2 +-
mm/userfaultfd.c | 83 ++++++++++---
tools/testing/selftests/mm/uffd-common.c | 5 +-
tools/testing/selftests/mm/uffd-common.h | 3 +
tools/testing/selftests/mm/uffd-stress.c | 12 +-
tools/testing/selftests/mm/uffd-unit-tests.c | 117 +++++++++++++++++++
17 files changed, 377 insertions(+), 67 deletions(-)
--
2.41.0.255.g8b1d071c50-goog
When a statement ends at a line break, terminate it with ';' rather than
','; this is more in line with the coding habits of most engineers.
Signed-off-by: Lu Hongfei <luhongfei(a)vivo.com>
---
Compared to the previous version, the modifications made are:
1. Modified the subject to make it clearer and more accurate
2. Also fixed the same typo in tcp_hdr_options.c
tools/testing/selftests/bpf/benchs/bench_ringbufs.c | 2 +-
tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/benchs/bench_ringbufs.c b/tools/testing/selftests/bpf/benchs/bench_ringbufs.c
index 3ca14ad36607..e1ee979e6acc 100644
--- a/tools/testing/selftests/bpf/benchs/bench_ringbufs.c
+++ b/tools/testing/selftests/bpf/benchs/bench_ringbufs.c
@@ -399,7 +399,7 @@ static void perfbuf_libbpf_setup(void)
ctx->skel = perfbuf_setup_skeleton();
memset(&attr, 0, sizeof(attr));
- attr.config = PERF_COUNT_SW_BPF_OUTPUT,
+ attr.config = PERF_COUNT_SW_BPF_OUTPUT;
attr.type = PERF_TYPE_SOFTWARE;
attr.sample_type = PERF_SAMPLE_RAW;
/* notify only every Nth sample */
diff --git a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
index 13bcaeb028b8..56685fc03c7e 100644
--- a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
+++ b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
@@ -347,7 +347,7 @@ static void syncookie_estab(void)
exp_active_estab_in.max_delack_ms = 22;
exp_passive_hdr_stg.syncookie = true;
- exp_active_hdr_stg.resend_syn = true,
+ exp_active_hdr_stg.resend_syn = true;
prepare_out();
--
2.39.0
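One practical reason to prefer ';' over ',' at the end of an assignment can be shown with a small example. This is an illustration only, not taken from the patch: the struct and function names are hypothetical. With a comma, two assignments form a single expression statement, so a condition placed in front of the first one silently guards the second one as well.

```c
/* Hypothetical stand-in for a structure such as perf_event_attr. */
struct attr_sketch {
	int config;
	int type;
};

/* Comma: both assignments are ONE expression statement, so the if guards both. */
static void set_with_comma(struct attr_sketch *attr, int enabled)
{
	if (enabled)
		attr->config = 1,
		attr->type = 2;
}

/* Semicolon: only the first assignment is guarded; the second always runs. */
static void set_with_semicolon(struct attr_sketch *attr, int enabled)
{
	if (enabled)
		attr->config = 1;
	attr->type = 2;
}
```

Calling both helpers with enabled == 0 on zeroed structures leaves type at 0 in the comma variant and sets it to 2 in the semicolon variant.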
When a statement ends at a line break, terminate it with ';' rather than
','; this is more in line with the coding habits of most engineers.
Signed-off-by: Lu Hongfei <luhongfei(a)vivo.com>
---
tools/testing/selftests/bpf/benchs/bench_ringbufs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/benchs/bench_ringbufs.c b/tools/testing/selftests/bpf/benchs/bench_ringbufs.c
index 3ca14ad36607..e1ee979e6acc 100644
--- a/tools/testing/selftests/bpf/benchs/bench_ringbufs.c
+++ b/tools/testing/selftests/bpf/benchs/bench_ringbufs.c
@@ -399,7 +399,7 @@ static void perfbuf_libbpf_setup(void)
ctx->skel = perfbuf_setup_skeleton();
memset(&attr, 0, sizeof(attr));
- attr.config = PERF_COUNT_SW_BPF_OUTPUT,
+ attr.config = PERF_COUNT_SW_BPF_OUTPUT;
attr.type = PERF_TYPE_SOFTWARE;
attr.sample_type = PERF_SAMPLE_RAW;
/* notify only every Nth sample */
--
2.39.0
From: Roberto Sassu <roberto.sassu(a)huawei.com>
Define a new TLV-based format for keys and signatures, aiming to store and
use in the kernel the crypto material from other unsupported formats
(e.g. PGP).
TLV fields have been defined to fill the corresponding kernel structures
public_key, public_key_signature and key_preparsed_payload.
Keys:
                 struct public_key {           struct key_preparsed_payload {
KEY_PUB   -->      void *key;
                   u32 keylen;           -->   prep->payload.data[asym_crypto]
KEY_ALGO  -->      const char *pkey_algo;
KEY_KID0
KEY_KID1  -->                                  prep->payload.data[asym_key_ids]
KEY_KID2
KEY_DESC  -->                                  prep->description
Signatures:
                     struct public_key_signature {
SIG_S         -->      u8 *s;
                       u32 s_size;
SIG_KEY_ALGO  -->      const char *pkey_algo;
SIG_HASH_ALGO -->      const char *hash_algo;
                       u32 digest_size;
SIG_ENC       -->      const char *encoding;
SIG_KID0
SIG_KID1      -->      struct asymmetric_key_id *auth_ids[3];
SIG_KID2
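To make the idea of a TLV (type-length-value) container concrete, here is a minimal sketch of a lookup over a TLV blob. The wire layout used here (1-byte type, 4-byte big-endian length, then the value) and the name tlv_find() are assumptions for illustration; the actual header and field encoding are defined by the patch set's uapi header and may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed entry header: 1-byte type followed by a 4-byte big-endian length. */
enum { TLV_HDR_LEN = 5 };

/*
 * Find the first entry of type `want` in buf.
 * Returns 0 and fills val/val_len if found, 1 if not found,
 * and -1 if the blob is truncated or a length runs past the end.
 */
static int tlv_find(const uint8_t *buf, size_t buf_len, uint8_t want,
		    const uint8_t **val, uint32_t *val_len)
{
	size_t off = 0;

	while (off < buf_len) {
		uint8_t type;
		uint32_t len;

		if (buf_len - off < TLV_HDR_LEN)
			return -1;
		type = buf[off];
		len = ((uint32_t)buf[off + 1] << 24) |
		      ((uint32_t)buf[off + 2] << 16) |
		      ((uint32_t)buf[off + 3] << 8) |
		       (uint32_t)buf[off + 4];
		off += TLV_HDR_LEN;
		/* Validate the length before touching the value. */
		if (len > buf_len - off)
			return -1;
		if (type == want) {
			*val = buf + off;
			*val_len = len;
			return 0;
		}
		off += len;
	}
	return 1;
}
```

A parser built this way can skip entry types it does not know, which is what makes a TLV layout easier to extend than a fixed structure.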
For keys, since the format conversion has to be done in user space, user
space is assumed to be trusted in this proposal. Without this assumption,
a malicious conversion tool could make a user load into the kernel a
different key from the one expected.
That should not be a particular problem for keys that are embedded in the
kernel image and loaded at boot, since the conversion happens in a trusted
environment such as the building infrastructure of the Linux distribution
vendor.
In the other cases, such as enrolling a key through the Machine Owner Key
(MOK) mechanism, the user is responsible for ensuring that the crypto material
carried in the original format remains the same after the conversion.
For signatures, assuming the strength of the crypto algorithms, altering
the crypto material amounts only to a Denial of Service (DoS), as data can be
validated only with the right signature.
This patch set also offers the following contributions:
- An API similar to the PKCS#7 one, to verify the authenticity of system
data through user asymmetric keys and signatures
- A mechanism to store a keyring blob in the kernel image and to extract
and load the keys at system boot
- eBPF binding, so that data authenticity verification with user asymmetric
keys and signatures can be carried out also with eBPF programs
- A new command for gnupg (in user space), to convert keys and signatures
from PGP to the new kernel format
The primary use case for this patch set is to verify the authenticity of
RPM package headers with the PGP keys of the Linux distribution. Once their
authenticity is verified, file digests can be extracted from those RPM
headers and used as reference values for IMA Appraisal.
Compared to the previous patch set, the main difference is not relying on
User Mode Drivers (UMDs) for the conversion from the original format to the
kernel format, due to the concern that full isolation of the UMD process
cannot be achieved against a fully privileged system user (root).
The discussion is still ongoing here:
https://lore.kernel.org/linux-integrity/eb31920bd00e2c921b0aa6ebed8745cb013…
This, however, does not prevent the goal mentioned above, verifying the
authenticity of RPM headers, from being achieved. The fact that Linux
distribution vendors do the conversion in their infrastructure is a good
enough guarantee.
A very quick way to test the patch set is to execute:
# gpg --conv-kernel /etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-rawhide-primary | keyctl padd asymmetric "" @u
# keyctl show @u
Keyring
762357580 --alswrv 0 65534 keyring: _uid.0
567216072 --als--v 0 0 \_ asymmetric: PGP: 18b8e74c
Patches 1-2 preliminarily export some definitions to user space so that
conversion tools can specify the right public key algorithms and signature
encodings (digest algorithms are already exported).
Patches 3-5 introduce the user asymmetric keys and signatures.
Patch 6 introduces a system API for verifying the authenticity of system
data through user asymmetric keys and signatures.
Patches 7-8 introduce a mechanism to store a keyring blob with user
asymmetric keys in the kernel image, and load them at system boot.
Patches 9-10 introduce the eBPF binding and corresponding test (which can
be enabled only after the gnupg patches are upstreamed).
Patches 1-2 [GNUPG] introduce the new gpg command --conv-kernel to convert
PGP keys and signatures to the new kernel format.
Changelog
v1:
- Remove useless check in validate_key() (suggested by Yonghong)
- Don't rely on User Mode Drivers for the conversion from the original
format to the kernel format
- Use the more extensible TLV format, instead of a fixed structure
Roberto Sassu (10):
crypto: Export public key algorithm information
crypto: Export signature encoding information
KEYS: asymmetric: Introduce a parser for user asymmetric keys and sigs
KEYS: asymmetric: Introduce the user asymmetric key parser
KEYS: asymmetric: Introduce the user asymmetric key signature parser
verification: Add verify_uasym_signature() and
verify_uasym_sig_message()
KEYS: asymmetric: Preload user asymmetric keys from a keyring blob
KEYS: Introduce load_uasym_keyring()
bpf: Introduce bpf_verify_uasym_signature() kfunc
selftests/bpf: Prepare a test for user asymmetric key signatures
MAINTAINERS | 1 +
certs/Kconfig | 11 +
certs/Makefile | 7 +
certs/system_certificates.S | 18 +
certs/system_keyring.c | 166 +++++-
crypto/Kconfig | 6 +
crypto/Makefile | 2 +
crypto/asymmetric_keys/Kconfig | 14 +
crypto/asymmetric_keys/Makefile | 10 +
crypto/asymmetric_keys/asymmetric_type.c | 3 +-
crypto/asymmetric_keys/uasym_key_parser.c | 229 ++++++++
crypto/asymmetric_keys/uasym_key_preload.c | 99 ++++
crypto/asymmetric_keys/uasym_parser.c | 201 +++++++
crypto/asymmetric_keys/uasym_parser.h | 43 ++
crypto/asymmetric_keys/uasym_sig_parser.c | 491 ++++++++++++++++++
crypto/pub_key_info.c | 20 +
crypto/sig_enc_info.c | 16 +
include/crypto/pub_key_info.h | 15 +
include/crypto/sig_enc_info.h | 15 +
include/crypto/uasym_keys_sigs.h | 82 +++
include/keys/asymmetric-type.h | 1 +
include/linux/verification.h | 50 ++
include/uapi/linux/pub_key_info.h | 22 +
include/uapi/linux/sig_enc_info.h | 18 +
include/uapi/linux/uasym_parser.h | 107 ++++
kernel/trace/bpf_trace.c | 68 ++-
...y_pkcs7_sig.c => verify_pkcs7_uasym_sig.c} | 159 +++++-
...s7_sig.c => test_verify_pkcs7_uasym_sig.c} | 18 +-
.../testing/selftests/bpf/verify_sig_setup.sh | 82 ++-
29 files changed, 1924 insertions(+), 50 deletions(-)
create mode 100644 crypto/asymmetric_keys/uasym_key_parser.c
create mode 100644 crypto/asymmetric_keys/uasym_key_preload.c
create mode 100644 crypto/asymmetric_keys/uasym_parser.c
create mode 100644 crypto/asymmetric_keys/uasym_parser.h
create mode 100644 crypto/asymmetric_keys/uasym_sig_parser.c
create mode 100644 crypto/pub_key_info.c
create mode 100644 crypto/sig_enc_info.c
create mode 100644 include/crypto/pub_key_info.h
create mode 100644 include/crypto/sig_enc_info.h
create mode 100644 include/crypto/uasym_keys_sigs.h
create mode 100644 include/uapi/linux/pub_key_info.h
create mode 100644 include/uapi/linux/sig_enc_info.h
create mode 100644 include/uapi/linux/uasym_parser.h
rename tools/testing/selftests/bpf/prog_tests/{verify_pkcs7_sig.c => verify_pkcs7_uasym_sig.c} (69%)
rename tools/testing/selftests/bpf/progs/{test_verify_pkcs7_sig.c => test_verify_pkcs7_uasym_sig.c} (82%)
--
2.34.1
Make sv48 the default address space for mmap as some applications
currently depend on this assumption. Also enable users to select
desired address space using a non-zero hint address to mmap. Previous
kernel changes caused Java and other applications to be broken on sv57
which this patch fixes.
Documentation is also added to the RISC-V virtual memory section to explain
these changes.
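From user space, passing a hint address to mmap() looks like the sketch below. The helper name map_with_hint() is hypothetical, and a 64-bit process is assumed. How a given hint value maps onto sv39, sv48 or sv57 is defined by the series and its documentation and is not reproduced here; the sketch only shows the generic call.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Request an anonymous private mapping with a non-zero hint address.
 * Without MAP_FIXED the hint is advisory: the kernel may place the
 * mapping elsewhere, so callers must use the returned pointer.
 */
static void *map_with_hint(uintptr_t hint, size_t len)
{
	return mmap((void *)hint, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```

A caller checks the result against MAP_FAILED and releases the mapping with munmap() as usual.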
Charlie Jenkins (2):
RISC-V: mm: Restrict address space for sv39,sv48,sv57
RISC-V: mm: Update documentation and include test
Documentation/riscv/vm-layout.rst | 22 +++++++++
arch/riscv/include/asm/elf.h | 2 +-
arch/riscv/include/asm/pgtable.h | 21 ++++++--
arch/riscv/include/asm/processor.h | 34 ++++++++++---
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/mm/.gitignore | 1 +
tools/testing/selftests/riscv/mm/Makefile | 21 ++++++++
.../selftests/riscv/mm/testcases/mmap.c | 49 +++++++++++++++++++
8 files changed, 139 insertions(+), 13 deletions(-)
create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
create mode 100644 tools/testing/selftests/riscv/mm/Makefile
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap.c
--
2.41.0
=== Context ===
In the context of a middlebox, fragmented packets are tricky to handle.
The full 5-tuple of a packet is often only available in the first
fragment which makes enforcing consistent policy difficult. There are
really only two stateless options, neither of which are very nice:
1. Enforce policy on first fragment and accept all subsequent fragments.
This works but may let in certain attacks or allow data exfiltration.
2. Enforce policy on first fragment and drop all subsequent fragments.
This does not really work b/c some protocols may rely on
fragmentation. For example, DNS may rely on oversized UDP packets for
large responses.
So stateful tracking is the only sane option. RFC 8900 [0] calls this
out as well in section 6.3:
Middleboxes [...] should process IP fragments in a manner that is
consistent with [RFC0791] and [RFC8200]. In many cases, middleboxes
must maintain state in order to achieve this goal.
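The reason the 5-tuple is only available in the first fragment follows from the IPv4 header layout: only the fragment at offset 0 starts with the transport header. A minimal sketch of the two checks a stateless filter can make is shown below; the macro and function names are hypothetical, while the bit layout is that of RFC 791.

```c
#include <stdbool.h>
#include <stdint.h>

/* Layout of the IPv4 "flags + fragment offset" field (RFC 791). */
#define IPV4_FLAG_MF	0x2000u	/* more fragments follow */
#define IPV4_OFF_MASK	0x1fffu	/* fragment offset, in 8-byte units */

/* frag_off must already be in host byte order (e.g. after ntohs()). */
static inline bool ipv4_is_fragment(uint16_t frag_off)
{
	return (frag_off & (IPV4_FLAG_MF | IPV4_OFF_MASK)) != 0;
}

/* Only the fragment at offset 0 begins with the transport header. */
static inline bool ipv4_has_l4_header(uint16_t frag_off)
{
	return (frag_off & IPV4_OFF_MASK) == 0;
}
```

For any packet where ipv4_is_fragment() is true and ipv4_has_l4_header() is false, a stateless filter has no ports to match on, which is exactly the situation the two options above have to guess about.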
=== BPF related bits ===
Policy has traditionally been enforced from XDP/TC hooks. Both hooks
run before kernel reassembly facilities. However, with the new
BPF_PROG_TYPE_NETFILTER, we can rather easily hook into existing
netfilter reassembly infra.
The basic idea is we bump a refcnt on the netfilter defrag module and
then run the bpf prog after the defrag module runs. This allows bpf
progs to transparently see full, reassembled packets. The nice thing
about this is that progs don't have to carry around logic to detect
fragments.
=== Changelog ===
Changes from v1:
* Drop bpf_program__attach_netfilter() patches
* static -> static const where appropriate
* Fix callback assignment order during registration
* Only request_module() if callbacks are missing
* Fix retval when modprobe fails in userspace
* Fix v6 defrag module name (nf_defrag_ipv6_hooks -> nf_defrag_ipv6)
* Simplify priority checking code
* Add warning if module doesn't assign callbacks in the future
* Take refcnt on module while defrag link is active
[0]: https://datatracker.ietf.org/doc/html/rfc8900
Daniel Xu (6):
netfilter: defrag: Add glue hooks for enabling/disabling defrag
netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
netfilter: bpf: Prevent defrag module unload while link active
bpf: selftests: Support not connecting client socket
bpf: selftests: Support custom type and proto for client sockets
bpf: selftests: Add defrag selftests
include/linux/netfilter.h | 15 +
include/uapi/linux/bpf.h | 5 +
net/ipv4/netfilter/nf_defrag_ipv4.c | 17 +-
net/ipv6/netfilter/nf_defrag_ipv6_hooks.c | 11 +
net/netfilter/core.c | 6 +
net/netfilter/nf_bpf_link.c | 149 ++++++++-
tools/include/uapi/linux/bpf.h | 5 +
tools/testing/selftests/bpf/Makefile | 4 +-
.../selftests/bpf/generate_udp_fragments.py | 90 ++++++
.../selftests/bpf/ip_check_defrag_frags.h | 57 ++++
tools/testing/selftests/bpf/network_helpers.c | 26 +-
tools/testing/selftests/bpf/network_helpers.h | 3 +
.../bpf/prog_tests/ip_check_defrag.c | 282 ++++++++++++++++++
.../selftests/bpf/progs/ip_check_defrag.c | 104 +++++++
14 files changed, 752 insertions(+), 22 deletions(-)
create mode 100755 tools/testing/selftests/bpf/generate_udp_fragments.py
create mode 100644 tools/testing/selftests/bpf/ip_check_defrag_frags.h
create mode 100644 tools/testing/selftests/bpf/prog_tests/ip_check_defrag.c
create mode 100644 tools/testing/selftests/bpf/progs/ip_check_defrag.c
--
2.41.0
From: Björn Töpel <bjorn(a)rivosinc.com>
BPF tests that load /proc/kallsyms, e.g. bpf_cookie, will perform a
buffer overrun if the number of syms on the system is larger than
MAX_SYMS.
Bump the MAX_SYMS to 400000, and add a runtime check that bails out if
the maximum is reached.
Signed-off-by: Björn Töpel <bjorn(a)rivosinc.com>
---
tools/testing/selftests/bpf/trace_helpers.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/trace_helpers.c b/tools/testing/selftests/bpf/trace_helpers.c
index 9b070cdf44ac..f83d9f65c65b 100644
--- a/tools/testing/selftests/bpf/trace_helpers.c
+++ b/tools/testing/selftests/bpf/trace_helpers.c
@@ -18,7 +18,7 @@
#define TRACEFS_PIPE "/sys/kernel/tracing/trace_pipe"
#define DEBUGFS_PIPE "/sys/kernel/debug/tracing/trace_pipe"
-#define MAX_SYMS 300000
+#define MAX_SYMS 400000
static struct ksym syms[MAX_SYMS];
static int sym_cnt;
@@ -46,6 +46,9 @@ int load_kallsyms_refresh(void)
break;
if (!addr)
continue;
+ if (i >= MAX_SYMS)
+ return -EFBIG;
+
syms[i].addr = (long) addr;
syms[i].name = strdup(func);
i++;
base-commit: fd283ab196a867f8f65f36913e0fadd031fcb823
--
2.39.2
*Changes in v23*:
- Set vec_buf_index in loop only when vec_buf_index is set
- Return -EFAULT instead of -EINVAL if vec is NULL
- Correctly return the walk ending address to the page granularity
*Changes in v22*:
- Interface change:
- Replace [start start + len) with [start, end)
- Return the ending address of the address walk in start
*Changes in v21*:
- Abort walk instead of returning error if WP is to be performed on
partial hugetlb
*Changes in v20*
- Correct PAGE_IS_FILE and add PAGE_IS_PFNZERO
*Changes in v19*
- Minor changes and interface updates
*Changes in v18*
- Rebase on top of next-20230613
- Minor updates
*Changes in v17*
- Rebase on top of next-20230606
- Minor improvements in PAGEMAP_SCAN IOCTL patch
*Changes in v16*
- Fix a corner case
- Add exclusive PM_SCAN_OP_WP back
*Changes in v15*
- Build fix (Add missed build fix in RESEND)
*Changes in v14*
- Fix build error caused by #ifdef added at last minute in some configs
*Changes in v13*
- Rebase on top of next-20230414
- Give-up on using uffd_wp_range() and write new helpers, flush tlb only
once
*Changes in v12*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebase on top of next-20230406
- Review updates
*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
UFFDIO_WRITEPROTECT
*Changes in v10*
- Add specific condition to return error if hugetlb is used with wp
async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
*Motivation*
The real motivation for adding the PAGEMAP_SCAN IOCTL is to emulate the Windows
GetWriteWatch() syscall [1]. GetWriteWatch() retrieves the addresses of
the pages that have been written to in a region of virtual memory.
This syscall is used by Windows applications, games etc. It is currently
emulated in userspace in a rather slow manner. Our purpose is to
enhance the kernel so that it can be emulated efficiently.
Currently, some out-of-tree hack patches are used to emulate it
efficiently on some kernels. We intend to replace those with these patches,
so that gaming on Linux as a whole can benefit from this. It
means there would be tons of users of this code.
CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.
Andrei defines the following uses of this code:
* it is more granular and allows us to track changed pages more
effectively. The current interface can clear dirty bits for the entire
process only. In addition, reading info about pages is a separate
operation. It means we must freeze the process to read information
about all its pages, reset dirty bits, only then we can start dumping
pages. The information about pages becomes more and more outdated,
while we are processing pages. The new interface solves both these
downsides. First, it allows us to read pte bits and clear the
soft-dirty bit atomically. It means that CRIU will not need to freeze
processes to pre-dump their memory. Second, it clears soft-dirty bits
for a specified region of memory. It means CRIU will have actual info
about pages to the moment of dumping them.
* The new interface has to be much faster because basic page filtering
is happening in the kernel. With the old interface, we have to read
pagemap for each page.
*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we feel like kernel's soft-dirty
feature can be used under the hood with some additions like:
* reset soft-dirty flag for only a specific region of memory instead of
clearing the flag for the entire process
* get and clear soft-dirty flag for a specific region atomically
So we decided to use an ioctl on the pagemap file to read and/or reset the
soft-dirty flag. But using the soft-dirty flag, we sometimes get extra pages
which weren't even written. They had become soft-dirty because of VMA merging
and the VM_SOFTDIRTY flag. This breaks the definition of GetWriteWatch(). We
were able to bypass this shortcoming by ignoring VM_SOFTDIRTY until David
reported that mprotect etc. messes up the soft-dirty flag while ignoring
VM_SOFTDIRTY [5]. This wasn't happening until [6] got introduced. We
discussed whether we could revert these patches, but we could not reach any
conclusion. So at this point, I made a couple of attempts to solve this whole
VM_SOFTDIRTY issue by correcting the soft-dirty implementation:
* [7] Correct the bug fixed wrongly back in 2014. It had the potential to cause
regressions. We left it behind.
* [8] Keep a list of the soft-dirty parts of a VMA across splits and merges.
The reply I got was not to increase the size of the VMA by 8 bytes.
At this point, we abandoned soft-dirty, considering it too delicate, and
userfaultfd [9] seemed like the only way forward. From there onward, we
have been basing soft-dirty emulation on the userfaultfd wp feature, where the
kernel resolves the faults itself when the WP_ASYNC feature is used. It was
straightforward to add the WP_ASYNC feature to userfaultfd. Now only those
pages which have really been written are reported as dirty or written-to. (PS:
another userfaultfd feature, WP_UNPOPULATED, is required to avoid
pre-faulting memory before write-protecting [9].)
All the different masks were added at the request of the CRIU devs to make
the interface more generic and better.
[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com
*Original Cover letter from v8*
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As the
kernel already has a soft-dirty feature, which we have given up on
using, we use the written-to terminology while using UFFD async WP under
the hood.
This IOCTL, PAGEMAP_SCAN on pagemap file can be used to get and/or clear
the info about page table entries. The following operations are
supported in this ioctl:
- Get the information if the pages have been written-to (PAGE_IS_WRITTEN),
file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped
(PAGE_IS_SWAPPED).
- Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which
pages have been written-to.
- Find pages which have been written-to and write protect the pages
(atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE)
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
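For comparison, the first userspace-only approach listed above (mprotect plus a SIGSEGV handler for bookkeeping) can be sketched as follows. This is an illustration under stated assumptions, not code from the series: all ww_* names are hypothetical, the watched region is capped at 64 pages, and the handler calls mprotect(), which works on Linux in practice but is not on the POSIX list of async-signal-safe functions.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define WW_MAX_PAGES 64	/* arbitrary cap for this sketch */

static char *ww_base;
static size_t ww_npages, ww_page_size;
static volatile sig_atomic_t ww_dirty[WW_MAX_PAGES];

/* SIGSEGV handler: record the faulting page and make it writable again. */
static void ww_on_fault(int sig, siginfo_t *info, void *uctx)
{
	uintptr_t addr = (uintptr_t)info->si_addr;
	uintptr_t base = (uintptr_t)ww_base;
	size_t idx;

	(void)sig;
	(void)uctx;
	if (addr < base || addr >= base + ww_npages * ww_page_size)
		_exit(1);	/* not one of our pages: a genuine crash */
	idx = (addr - base) / ww_page_size;
	ww_dirty[idx] = 1;
	mprotect(ww_base + idx * ww_page_size, ww_page_size,
		 PROT_READ | PROT_WRITE);
}

/* Map npages of zeroed, read-only memory and start watching it for writes. */
static char *ww_start(size_t npages)
{
	struct sigaction sa;

	if (npages == 0 || npages > WW_MAX_PAGES)
		return NULL;
	ww_page_size = (size_t)sysconf(_SC_PAGESIZE);
	ww_base = mmap(NULL, npages * ww_page_size, PROT_READ,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (ww_base == MAP_FAILED)
		return NULL;
	ww_npages = npages;
	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = ww_on_fault;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, NULL) != 0)
		return NULL;
	return ww_base;
}

/* Report whether page idx was written, then re-arm the write protection. */
static int ww_test_and_reset(size_t idx)
{
	int was_dirty = ww_dirty[idx];

	if (was_dirty) {
		ww_dirty[idx] = 0;
		mprotect(ww_base + idx * ww_page_size, ww_page_size, PROT_READ);
	}
	return was_dirty;
}
```

Every first write to a page costs a signal delivery plus an mprotect() call, and ww_test_and_reset() is not atomic with respect to other threads writing to the page, which illustrates why this approach is not efficient.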
Some benchmarks can be seen here [1]. This series adds features that weren't
present earlier:
- The kernel has no atomic operation to get the soft-dirty/written-to
status and clear it.
- The pages which have been written-to cannot be found accurately.
(The kernel's soft-dirty PTE bit + soft-dirty VMA bit show more soft-dirty
pages than there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have a use case where we need to track the soft-dirty PTE bit for
only specific pages on demand. We need this track-and-clear mechanism
for a region of memory while the process is running to emulate the
GetWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of the soft-dirty feature to find pages which
have been written-to from v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. It is too delicate and wrong as it shows more soft-dirty
pages than the actual soft-dirty pages. There is no interest in
correcting it [2][3] as this is how the feature was written years ago.
Its behaviour shouldn't be changed now. Peter Xu has suggested
using the async version of the UFFD WP [4] as it is based inherently
on the PTEs.
So in this patch series, I've added a new mode to the UFFD which is
asynchronous version of the write protect. When this variant of the
UFFD WP is used, the page faults are resolved automatically by the
kernel. The pages which have been written-to can be found by reading
pagemap file (!PM_UFFD_WP). This feature can be used successfully to
find which pages have been written to from the time the pages were
write protected. This works just like the soft-dirty flag without
showing any extra pages which aren't soft-dirty in reality.
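Reading the per-page state from the pagemap file can be sketched as below. The helper names are hypothetical. The bit positions are those documented for /proc/pid/pagemap (bit 63: page present, bit 57: pte is uffd-wp write-protected); the interpretation of a cleared bit 57 as "written-to" only holds for a range that was previously write-protected with async uffd-wp, as described above.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PM_PRESENT	(1ull << 63)	/* page present in RAM */
#define PM_UFFD_WP	(1ull << 57)	/* pte is uffd-wp write-protected */

/* Read the 64-bit pagemap entry covering addr in the calling process. */
static int pagemap_read_entry(const void *addr, uint64_t *entry)
{
	uintptr_t page_size = (uintptr_t)sysconf(_SC_PAGESIZE);
	off_t off = (off_t)((uintptr_t)addr / page_size) * (off_t)sizeof(*entry);
	int fd = open("/proc/self/pagemap", O_RDONLY);
	ssize_t n;

	if (fd < 0)
		return -1;
	n = pread(fd, entry, sizeof(*entry), off);
	close(fd);
	return n == (ssize_t)sizeof(*entry) ? 0 : -1;
}

/* For a range protected with async uffd-wp: WP bit cleared means written-to. */
static int entry_is_written(uint64_t entry)
{
	return !(entry & PM_UFFD_WP);
}
```

This costs one pread() per page (or per batch of consecutive entries), with all filtering done in userspace.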
Information on whether a page is file mapped, present or swapped is
required for the CRIU project [5][6]. The required mask, any mask,
excluded mask and return mask are also required
for the CRIU project [5].
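One way to read the four masks is as a per-page filter followed by a projection. The sketch below encodes that reading; it is an assumption about the semantics rather than the kernel implementation, and the SK_* bit values, struct scan_masks and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative category bits; the real values live in the uapi header. */
#define SK_PAGE_IS_WRITTEN	(1ull << 0)
#define SK_PAGE_IS_FILE		(1ull << 1)
#define SK_PAGE_IS_PRESENT	(1ull << 2)
#define SK_PAGE_IS_SWAPPED	(1ull << 3)

struct scan_masks {
	uint64_t required;	/* all of these bits must be set */
	uint64_t anyof;		/* if non-zero, at least one must be set */
	uint64_t excluded;	/* none of these bits may be set */
	uint64_t ret;		/* bits reported back for selected pages */
};

/* Decide whether a page with the given category bits is selected. */
static bool page_selected(uint64_t cats, const struct scan_masks *m)
{
	if ((cats & m->required) != m->required)
		return false;
	if (m->anyof && !(cats & m->anyof))
		return false;
	if (cats & m->excluded)
		return false;
	return true;
}

/* Project the category bits of a selected page onto the return mask. */
static uint64_t page_reported(uint64_t cats, const struct scan_masks *m)
{
	return cats & m->ret;
}
```

For example, required = written, excluded = file, ret = written | present selects written anonymous pages and reports only their written and present bits.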
The IOCTL returns the addresses of the pages which match the specified
masks. The page addresses are returned in struct page_region in a compact
form. The max_pages is needed to support a use case where the user only wants
to get a specific number of pages, so there is no need to find all the
pages of interest in the range when max_pages is specified. The IOCTL
returns when the maximum number of pages has been found. The max_pages is
optional. If max_pages is specified, it must be equal to or greater than the
vec_size. This restriction is needed to handle the worst case, when each
page_region contains info on only one page and cannot be compacted.
This is needed to emulate the Windows GetWriteWatch() syscall.
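The compact form can be pictured as run-length encoding over pages: consecutive pages with identical category bits collapse into one region. The sketch below shows that idea only; struct region_sketch and compact_pages() are hypothetical and are not the series' struct page_region or its kernel code. Input addresses are assumed to be page aligned and sorted in ascending order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region record: first page address, page count, category bits. */
struct region_sketch {
	uint64_t start;
	uint64_t npages;
	uint64_t cats;
};

/*
 * Merge n pages into at most vec_len regions. A page extends the previous
 * region only if it is adjacent to it and has the same category bits.
 * Returns the number of regions written; stops early when vec is full.
 */
static size_t compact_pages(const uint64_t *addrs, const uint64_t *cats,
			    size_t n, uint64_t page_size,
			    struct region_sketch *vec, size_t vec_len)
{
	size_t used = 0, i;

	for (i = 0; i < n; i++) {
		if (used > 0) {
			struct region_sketch *last = &vec[used - 1];

			if (last->cats == cats[i] &&
			    last->start + last->npages * page_size == addrs[i]) {
				last->npages++;
				continue;
			}
		}
		if (used == vec_len)
			break;	/* output vector is full */
		vec[used].start = addrs[i];
		vec[used].npages = 1;
		vec[used].cats = cats[i];
		used++;
	}
	return used;
}
```

In the worst case no two pages can be merged, so every region holds a single page and the number of regions needed equals the number of pages found.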
The patch series includes a detailed selftest which can be used as an
example for the uffd async wp test and PAGEMAP_IOCTL. It shows the
interface usage as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (4):
fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
about PTEs
tools headers UAPI: Update linux/fs.h with the kernel sources
mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
selftests: mm: add pagemap ioctl tests
Peter Xu (1):
userfaultfd: UFFD_FEATURE_WP_ASYNC
Documentation/admin-guide/mm/pagemap.rst | 58 +
Documentation/admin-guide/mm/userfaultfd.rst | 35 +
fs/proc/task_mmu.c | 577 +++++++
fs/userfaultfd.c | 26 +-
include/linux/hugetlb.h | 1 +
include/linux/userfaultfd_k.h | 21 +-
include/uapi/linux/fs.h | 55 +
include/uapi/linux/userfaultfd.h | 9 +-
mm/hugetlb.c | 34 +-
mm/memory.c | 27 +-
tools/include/uapi/linux/fs.h | 55 +
tools/testing/selftests/mm/.gitignore | 2 +
tools/testing/selftests/mm/Makefile | 3 +-
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/pagemap_ioctl.c | 1464 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
16 files changed, 2348 insertions(+), 24 deletions(-)
create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh
--
2.39.2
Changes in v22:
- Interface change:
- Replace [start start + len) with [start, end)
- Return the ending address of the address walk in start
Changes in v21:
- Abort walk instead of returning error if WP is to be performed on
partial hugetlb
*Changes in v20*
- Correct PAGE_IS_FILE and add PAGE_IS_PFNZERO
*Changes in v19*
- Minor changes and interface updates
*Changes in v18*
- Rebase on top of next-20230613
- Minor updates
*Changes in v17*
- Rebase on top of next-20230606
- Minor improvements in PAGEMAP_SCAN IOCTL patch
*Changes in v16*
- Fix a corner case
- Add exclusive PM_SCAN_OP_WP back
*Changes in v15*
- Build fix (Add missed build fix in RESEND)
*Changes in v14*
- Fix build error caused by #ifdef added at last minute in some configs
*Changes in v13*
- Rebase on top of next-20230414
- Give-up on using uffd_wp_range() and write new helpers, flush tlb only
once
*Changes in v12*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebase on top of next-20230406
- Review updates
*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
UFFDIO_WRITEPROTECT
*Changes in v10*
- Add specific condition to return error if hugetlb is used with wp
async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
*Motivation*
The real motivation for adding the PAGEMAP_SCAN IOCTL is to emulate the Windows
GetWriteWatch() syscall [1]. GetWriteWatch() retrieves the addresses of
the pages that have been written to in a region of virtual memory.
This syscall is used by Windows applications, games etc. It is currently
emulated in userspace in a rather slow manner. Our purpose is to
enhance the kernel so that it can be emulated efficiently.
Currently, some out-of-tree hack patches are used to emulate it
efficiently on some kernels. We intend to replace those with these patches,
so that gaming on Linux as a whole can benefit from this. It
means there would be tons of users of this code.
CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.
Andrei defines the following uses of this code:
* it is more granular and allows us to track changed pages more
effectively. The current interface can clear dirty bits for the entire
process only. In addition, reading info about pages is a separate
operation. It means we must freeze the process to read information
about all its pages, reset dirty bits, only then we can start dumping
pages. The information about pages becomes more and more outdated,
while we are processing pages. The new interface solves both these
downsides. First, it allows us to read pte bits and clear the
soft-dirty bit atomically. It means that CRIU will not need to freeze
processes to pre-dump their memory. Second, it clears soft-dirty bits
for a specified region of memory. It means CRIU will have actual info
about pages to the moment of dumping them.
* The new interface has to be much faster because basic page filtering
is happening in the kernel. With the old interface, we have to read
pagemap for each page.
*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we feel like kernel's soft-dirty
feature can be used under the hood with some additions like:
* reset soft-dirty flag for only a specific region of memory instead of
clearing the flag for the entire process
* get and clear soft-dirty flag for a specific region atomically
So we decided to use an ioctl on the pagemap file to read and/or reset the
soft-dirty flag. But when using the soft-dirty flag, we sometimes get extra
pages which weren't even written. They had become soft-dirty because of VMA
merging and the VM_SOFTDIRTY flag. This breaks the definition of
GetWriteWatch(). We were able to bypass this shortcoming by ignoring
VM_SOFTDIRTY until David reported that mprotect etc. messes up the
soft-dirty flag while ignoring VM_SOFTDIRTY [5]. This wasn't happening
until [6] got introduced. We discussed whether we could revert these
patches, but we could not reach any conclusion. So at this point, I made a
couple of attempts to solve this whole VM_SOFTDIRTY issue by correcting the
soft-dirty implementation:
* [7] Correct the bug fixed wrongly back in 2014. It had the potential to
cause regressions. We left it behind.
* [8] Keep a list of the soft-dirty parts of a VMA across splits and merges.
The reply I got was not to increase the size of the VMA by 8 bytes.
At this point, we left soft-dirty, considering it too delicate, and
userfaultfd [9] seemed like the only way forward. From there onward, we
have been basing soft-dirty emulation on the userfaultfd wp feature, where
the kernel resolves the faults itself when the WP_ASYNC feature is used. It
was straightforward to add the WP_ASYNC feature to userfaultfd. Now only
those pages which have really been written to are reported as dirty or
written-to. (PS: another userfaultfd feature, WP_UNPOPULATED, is required
to avoid pre-faulting memory before write-protecting [9].)
All the different masks were added at the request of the CRIU devs to make
the interface more generic and better.
[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com
*Original Cover letter from v8*
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As the
kernel already has a soft-dirty feature, which we have given up on using,
we use the written-to terminology while using UFFD async WP under the
hood.
The PAGEMAP_SCAN IOCTL on the pagemap file can be used to get and/or clear
the info about page table entries. The following operations are
supported by this ioctl:
- Get the information if the pages have been written-to (PAGE_IS_WRITTEN),
file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped
(PAGE_IS_SWAPPED).
- Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which
pages have been written-to.
- Find pages which have been written-to and write protect the pages
(atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE)
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
Some benchmarks can be seen here[1]. This series adds features that weren't
present earlier:
- The kernel has no operation that atomically gets the soft-dirty/written-to
status and clears it.
- The pages which have been written-to cannot be found accurately.
(The kernel's soft-dirty PTE bit + soft-dirty VMA bit show more soft-dirty
pages than there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have a use case where we need to track the soft-dirty PTE bit for
only specific pages on demand. We need this track-and-clear mechanism
for a region of memory while the process is running to emulate the
GetWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of the soft-dirty feature to find pages which
have been written-to from the v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. They are too delicate and wrong, as they show more soft-dirty
pages than there actually are. There is no interest in correcting this
[2][3], as this is how the feature was written years ago, and its
behaviour shouldn't be changed now. Peter Xu has suggested
using the async version of the UFFD WP [4] as it is based inherently
on the PTEs.
So in this patch series, I've added a new mode to UFFD, which is an
asynchronous version of write protect. When this variant of the
UFFD WP is used, the page faults are resolved automatically by the
kernel. The pages which have been written-to can be found by reading
pagemap file (!PM_UFFD_WP). This feature can be used successfully to
find which pages have been written to from the time the pages were
write protected. This works just like the soft-dirty flag without
showing any extra pages which aren't soft-dirty in reality.
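As a rough illustration of the "!PM_UFFD_WP" check mentioned above, the
sketch below decodes a single 64-bit pagemap entry. The bit positions are
the ones listed in Documentation/admin-guide/mm/pagemap.rst; the macro and
helper names are hypothetical and not part of this series, and the check
only makes sense for ranges where UFFD async WP has been engaged.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as listed in Documentation/admin-guide/mm/pagemap.rst. */
#define PM_UFFD_WP  (1ULL << 57)  /* pte is uffd-wp write-protected */
#define PM_FILE     (1ULL << 61)  /* file-mapped or shared-anon page */
#define PM_SWAP     (1ULL << 62)  /* page is swapped out */
#define PM_PRESENT  (1ULL << 63)  /* page is present in RAM */

/*
 * Hypothetical helper: once a range has been write-protected through
 * UFFD async WP, an entry that no longer carries PM_UFFD_WP belongs to a
 * page that has been written to since the protection was applied.
 */
static bool pm_entry_is_written(uint64_t entry)
{
	return !(entry & PM_UFFD_WP);
}

/* Hypothetical helper: true if the page is resident in RAM. */
static bool pm_entry_is_present(uint64_t entry)
{
	return (entry & PM_PRESENT) != 0;
}
```

Each entry would be obtained by reading 8 bytes from /proc/pid/pagemap at
offset (vaddr / page_size) * 8.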
The information about whether a page is file mapped, present or swapped
is required for the CRIU project [5][6]. The required mask, any mask,
excluded mask and return mask are also needed for the CRIU project [5].
The IOCTL returns the addresses of the pages which match the specified
masks. The page addresses are returned in struct page_region in a compact
form. The max_pages is needed to support a use case where the user only
wants to get a specific number of pages. So there is no need to find all
the pages of interest in the range when max_pages is specified. The IOCTL
returns when the maximum number of pages has been found. The max_pages is
optional. If max_pages is specified, it must be equal to or greater than
the vec_size. This restriction is needed to handle the worst case, when
each page_region only contains info of one page and cannot be compacted.
This is needed to emulate the Windows GetWriteWatch() syscall.
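The mask matching and region compaction described above can be pictured
with a small userspace simulation. This is only a sketch under stated
assumptions, not the kernel implementation: struct scan_masks, struct
region, page_matches() and scan_pages() are hypothetical names, the flag
values are illustrative, pages are identified by index rather than by
address, and arguments are not validated.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values; the real ones live in the UAPI header. */
#define PAGE_IS_WRITTEN  (1u << 0)
#define PAGE_IS_FILE     (1u << 1)
#define PAGE_IS_PRESENT  (1u << 2)
#define PAGE_IS_SWAPPED  (1u << 3)

/* Hypothetical stand-ins for the ioctl argument and struct page_region. */
struct scan_masks { uint64_t required, anyof, excluded, ret; };
struct region { uint64_t start, len, flags; };

/* A page matches if it has all required bits, at least one anyof bit
 * (when anyof is non-zero) and none of the excluded bits. */
static int page_matches(uint64_t flags, const struct scan_masks *m)
{
	if ((flags & m->required) != m->required)
		return 0;
	if (m->anyof && !(flags & m->anyof))
		return 0;
	return !(flags & m->excluded);
}

/* Walk npages per-page flag words and fill vec with compacted regions.
 * Stops once max_pages pages were reported (0 means no limit) or when
 * vec is full. Returns the number of regions written. */
static size_t scan_pages(const uint64_t *page_flags, size_t npages,
			 const struct scan_masks *m, struct region *vec,
			 size_t vec_len, size_t max_pages)
{
	size_t nregions = 0, found = 0;

	for (size_t i = 0; i < npages; i++) {
		uint64_t reported;

		if (max_pages && found == max_pages)
			break;
		if (!page_matches(page_flags[i], m))
			continue;
		reported = page_flags[i] & m->ret;

		/* Merge with the previous region if adjacent and identical. */
		if (nregions &&
		    vec[nregions - 1].start + vec[nregions - 1].len == i &&
		    vec[nregions - 1].flags == reported) {
			vec[nregions - 1].len++;
		} else {
			if (nregions == vec_len)
				break;
			vec[nregions].start = i;
			vec[nregions].len = 1;
			vec[nregions].flags = reported;
			nregions++;
		}
		found++;
	}
	return nregions;
}
```

In this model, the worst case mentioned above is the one where no two
matching pages are adjacent or share the same reported flags, so every
region holds exactly one page.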
The patch series includes a detailed selftest which can be used as an
example for the uffd async wp test and PAGEMAP_IOCTL. It shows the
interface usage as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (4):
fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
about PTEs
tools headers UAPI: Update linux/fs.h with the kernel sources
mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
selftests: mm: add pagemap ioctl tests
Peter Xu (1):
userfaultfd: UFFD_FEATURE_WP_ASYNC
Documentation/admin-guide/mm/pagemap.rst | 58 +
Documentation/admin-guide/mm/userfaultfd.rst | 35 +
fs/proc/task_mmu.c | 565 +++++++
fs/userfaultfd.c | 26 +-
include/linux/hugetlb.h | 1 +
include/linux/userfaultfd_k.h | 21 +-
include/uapi/linux/fs.h | 55 +
include/uapi/linux/userfaultfd.h | 9 +-
mm/hugetlb.c | 34 +-
mm/memory.c | 27 +-
tools/include/uapi/linux/fs.h | 55 +
tools/testing/selftests/mm/.gitignore | 2 +
tools/testing/selftests/mm/Makefile | 3 +-
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/pagemap_ioctl.c | 1464 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
16 files changed, 2336 insertions(+), 24 deletions(-)
create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh
--
2.39.2
The basic idea here is to "simulate" memory poisoning for VMs. A VM
running on some host might encounter a memory error, after which some
page(s) are poisoned (i.e., future accesses SIGBUS). They expect that
once poisoned, pages can never become "un-poisoned". So, when we live
migrate the VM, we need to preserve the poisoned status of these pages.
When live migrating, we try to get the guest running on its new host as
quickly as possible. So, we start it running before all memory has been
copied, and before we're certain which pages should be poisoned or not.
So the basic way to use this new feature is:
- On the new host, the guest's memory is registered with userfaultfd, in
either MISSING or MINOR mode (doesn't really matter for this purpose).
- On any first access, we get a userfaultfd event. At this point we can
communicate with the old host to find out if the page was poisoned.
- If so, we can respond with a UFFDIO_POISON - this places a swap marker
so any future accesses will SIGBUS. Because the pte is now "present",
future accesses won't generate more userfaultfd events, they'll just
SIGBUS directly.
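A minimal userspace sketch of the last step might look as follows. The
fallback definitions mirror the struct and ioctl number added by this
patch, for UAPI headers that predate it; uffd_poison_range() is a
hypothetical helper name, and the code assumes uffd is a userfaultfd file
descriptor on which the target range has already been registered.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Fallback for older UAPI headers; layout copied from this patch. */
#ifndef UFFDIO_POISON
struct uffdio_poison {
	struct uffdio_range range;
#define UFFDIO_POISON_MODE_DONTWAKE ((__u64)1 << 0)
	__u64 mode;
	__s64 updated;
};
#define _UFFDIO_POISON (0x08)
#define UFFDIO_POISON _IOWR(UFFDIO, _UFFDIO_POISON, struct uffdio_poison)
#endif

/*
 * Hypothetical helper: mark [start, start + len) as poisoned on the
 * userfaultfd uffd. Returns 0 on success or a negative errno value.
 */
static int uffd_poison_range(int uffd, uint64_t start, uint64_t len)
{
	struct uffdio_poison req;

	memset(&req, 0, sizeof(req));
	req.range.start = start;
	req.range.len = len;
	/* mode 0: wake any thread waiting on a fault in this range. */
	req.mode = 0;

	if (ioctl(uffd, UFFDIO_POISON, &req) < 0)
		return -errno;
	return 0;
}
```

A fault handler would call this instead of UFFDIO_COPY or UFFDIO_CONTINUE
once the old host has reported the page as poisoned.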
UFFDIO_POISON does not handle unmapping previously-present PTEs. This
isn't needed, because during live migration we want to intercept
all accesses with userfaultfd (not just writes, so WP mode isn't useful
for this). So whether minor or missing mode is being used (or both), the
PTE won't be present in any case, so handling that case isn't needed.
Why return VM_FAULT_HWPOISON instead of VM_FAULT_SIGBUS when one of
these markers is encountered? For "normal" userspace programs there
isn't a big difference, both yield a SIGBUS. The difference for KVM is
key though: VM_FAULT_HWPOISON will result in an MCE being injected into
the guest (which is the behavior we want). With VM_FAULT_SIGBUS, the
hypervisor would need to catch the SIGBUS and deal with the MCE
injection itself.
Signed-off-by: Axel Rasmussen <axelrasmussen(a)google.com>
---
fs/userfaultfd.c | 63 ++++++++++++++++++++++++++++++++
include/linux/swapops.h | 3 +-
include/linux/userfaultfd_k.h | 4 ++
include/uapi/linux/userfaultfd.h | 25 +++++++++++--
mm/memory.c | 4 ++
mm/userfaultfd.c | 62 ++++++++++++++++++++++++++++++-
6 files changed, 156 insertions(+), 5 deletions(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 7cecd49e078b..c26a883399c9 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1965,6 +1965,66 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
return ret;
}
+static inline int userfaultfd_poison(struct userfaultfd_ctx *ctx, unsigned long arg)
+{
+ __s64 ret;
+ struct uffdio_poison uffdio_poison;
+ struct uffdio_poison __user *user_uffdio_poison;
+ struct userfaultfd_wake_range range;
+
+ user_uffdio_poison = (struct uffdio_poison __user *)arg;
+
+ ret = -EAGAIN;
+ if (atomic_read(&ctx->mmap_changing))
+ goto out;
+
+ ret = -EFAULT;
+ if (copy_from_user(&uffdio_poison, user_uffdio_poison,
+ /* don't copy the output fields */
+ sizeof(uffdio_poison) - (sizeof(__s64))))
+ goto out;
+
+ ret = validate_range(ctx->mm, uffdio_poison.range.start,
+ uffdio_poison.range.len);
+ if (ret)
+ goto out;
+
+ ret = -EINVAL;
+ /* double check for wraparound just in case. */
+ if (uffdio_poison.range.start + uffdio_poison.range.len <=
+ uffdio_poison.range.start) {
+ goto out;
+ }
+ if (uffdio_poison.mode & ~UFFDIO_POISON_MODE_DONTWAKE)
+ goto out;
+
+ if (mmget_not_zero(ctx->mm)) {
+ ret = mfill_atomic_poison(ctx->mm, uffdio_poison.range.start,
+ uffdio_poison.range.len,
+ &ctx->mmap_changing, 0);
+ mmput(ctx->mm);
+ } else {
+ return -ESRCH;
+ }
+
+ if (unlikely(put_user(ret, &user_uffdio_poison->updated)))
+ return -EFAULT;
+ if (ret < 0)
+ goto out;
+
+ /* len == 0 would wake all */
+ BUG_ON(!ret);
+ range.len = ret;
+ if (!(uffdio_poison.mode & UFFDIO_POISON_MODE_DONTWAKE)) {
+ range.start = uffdio_poison.range.start;
+ wake_userfault(ctx, &range);
+ }
+ ret = range.len == uffdio_poison.range.len ? 0 : -EAGAIN;
+
+out:
+ return ret;
+}
+
static inline unsigned int uffd_ctx_features(__u64 user_features)
{
/*
@@ -2066,6 +2126,9 @@ static long userfaultfd_ioctl(struct file *file, unsigned cmd,
case UFFDIO_CONTINUE:
ret = userfaultfd_continue(ctx, arg);
break;
+ case UFFDIO_POISON:
+ ret = userfaultfd_poison(ctx, arg);
+ break;
}
return ret;
}
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 4c932cb45e0b..8259fee32421 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -394,7 +394,8 @@ typedef unsigned long pte_marker;
#define PTE_MARKER_UFFD_WP BIT(0)
#define PTE_MARKER_SWAPIN_ERROR BIT(1)
-#define PTE_MARKER_MASK (BIT(2) - 1)
+#define PTE_MARKER_UFFD_POISON BIT(2)
+#define PTE_MARKER_MASK (BIT(3) - 1)
static inline swp_entry_t make_pte_marker_entry(pte_marker marker)
{
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index ac7b0c96d351..ac8c6854097c 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -46,6 +46,7 @@ enum mfill_atomic_mode {
MFILL_ATOMIC_COPY,
MFILL_ATOMIC_ZEROPAGE,
MFILL_ATOMIC_CONTINUE,
+ MFILL_ATOMIC_POISON,
NR_MFILL_ATOMIC_MODES,
};
@@ -83,6 +84,9 @@ extern ssize_t mfill_atomic_zeropage(struct mm_struct *dst_mm,
extern ssize_t mfill_atomic_continue(struct mm_struct *dst_mm, unsigned long dst_start,
unsigned long len, atomic_t *mmap_changing,
uffd_flags_t flags);
+extern ssize_t mfill_atomic_poison(struct mm_struct *dst_mm, unsigned long start,
+ unsigned long len, atomic_t *mmap_changing,
+ uffd_flags_t flags);
extern int mwriteprotect_range(struct mm_struct *dst_mm,
unsigned long start, unsigned long len,
bool enable_wp, atomic_t *mmap_changing);
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 66dd4cd277bd..62151706c5a3 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -39,7 +39,8 @@
UFFD_FEATURE_MINOR_SHMEM | \
UFFD_FEATURE_EXACT_ADDRESS | \
UFFD_FEATURE_WP_HUGETLBFS_SHMEM | \
- UFFD_FEATURE_WP_UNPOPULATED)
+ UFFD_FEATURE_WP_UNPOPULATED | \
+ UFFD_FEATURE_POISON)
#define UFFD_API_IOCTLS \
((__u64)1 << _UFFDIO_REGISTER | \
(__u64)1 << _UFFDIO_UNREGISTER | \
@@ -49,12 +50,14 @@
(__u64)1 << _UFFDIO_COPY | \
(__u64)1 << _UFFDIO_ZEROPAGE | \
(__u64)1 << _UFFDIO_WRITEPROTECT | \
- (__u64)1 << _UFFDIO_CONTINUE)
+ (__u64)1 << _UFFDIO_CONTINUE | \
+ (__u64)1 << _UFFDIO_POISON)
#define UFFD_API_RANGE_IOCTLS_BASIC \
((__u64)1 << _UFFDIO_WAKE | \
(__u64)1 << _UFFDIO_COPY | \
+ (__u64)1 << _UFFDIO_WRITEPROTECT | \
(__u64)1 << _UFFDIO_CONTINUE | \
- (__u64)1 << _UFFDIO_WRITEPROTECT)
+ (__u64)1 << _UFFDIO_POISON)
/*
* Valid ioctl command number range with this API is from 0x00 to
@@ -71,6 +74,7 @@
#define _UFFDIO_ZEROPAGE (0x04)
#define _UFFDIO_WRITEPROTECT (0x06)
#define _UFFDIO_CONTINUE (0x07)
+#define _UFFDIO_POISON (0x08)
#define _UFFDIO_API (0x3F)
/* userfaultfd ioctl ids */
@@ -91,6 +95,8 @@
struct uffdio_writeprotect)
#define UFFDIO_CONTINUE _IOWR(UFFDIO, _UFFDIO_CONTINUE, \
struct uffdio_continue)
+#define UFFDIO_POISON _IOWR(UFFDIO, _UFFDIO_POISON, \
+ struct uffdio_poison)
/* read() structure */
struct uffd_msg {
@@ -225,6 +231,7 @@ struct uffdio_api {
#define UFFD_FEATURE_EXACT_ADDRESS (1<<11)
#define UFFD_FEATURE_WP_HUGETLBFS_SHMEM (1<<12)
#define UFFD_FEATURE_WP_UNPOPULATED (1<<13)
+#define UFFD_FEATURE_POISON (1<<14)
__u64 features;
__u64 ioctls;
@@ -321,6 +328,18 @@ struct uffdio_continue {
__s64 mapped;
};
+struct uffdio_poison {
+ struct uffdio_range range;
+#define UFFDIO_POISON_MODE_DONTWAKE ((__u64)1<<0)
+ __u64 mode;
+
+ /*
+ * Fields below here are written by the ioctl and must be at the end:
+ * the copy_from_user will not read past here.
+ */
+ __s64 updated;
+};
+
/*
* Flags for the userfaultfd(2) system call itself.
*/
diff --git a/mm/memory.c b/mm/memory.c
index d8a9a770b1f1..7fbda39e060d 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3692,6 +3692,10 @@ static vm_fault_t handle_pte_marker(struct vm_fault *vmf)
if (WARN_ON_ONCE(!marker))
return VM_FAULT_SIGBUS;
+ /* Poison emulation explicitly requested for this PTE. */
+ if (marker & PTE_MARKER_UFFD_POISON)
+ return VM_FAULT_HWPOISON;
+
/* Higher priority than uffd-wp when data corrupted */
if (marker & PTE_MARKER_SWAPIN_ERROR)
return VM_FAULT_SIGBUS;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index a2bf37ee276d..87b62ca1e09e 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -286,6 +286,51 @@ static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
goto out;
}
+/* Handles UFFDIO_POISON for all non-hugetlb VMAs. */
+static int mfill_atomic_pte_poison(pmd_t *dst_pmd,
+ struct vm_area_struct *dst_vma,
+ unsigned long dst_addr,
+ uffd_flags_t flags)
+{
+ int ret;
+ struct mm_struct *dst_mm = dst_vma->vm_mm;
+ pte_t _dst_pte, *dst_pte;
+ spinlock_t *ptl;
+
+ _dst_pte = make_pte_marker(PTE_MARKER_UFFD_POISON);
+ dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
+
+ if (vma_is_shmem(dst_vma)) {
+ struct inode *inode;
+ pgoff_t offset, max_off;
+
+ /* serialize against truncate with the page table lock */
+ inode = dst_vma->vm_file->f_inode;
+ offset = linear_page_index(dst_vma, dst_addr);
+ max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
+ ret = -EFAULT;
+ if (unlikely(offset >= max_off))
+ goto out_unlock;
+ }
+
+ ret = -EEXIST;
+ /*
+ * For now, we don't handle unmapping pages, so only support filling in
+ * none PTEs, or replacing PTE markers.
+ */
+ if (!pte_none_mostly(*dst_pte))
+ goto out_unlock;
+
+ set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
+
+ /* No need to invalidate - it was non-present before */
+ update_mmu_cache(dst_vma, dst_addr, dst_pte);
+ ret = 0;
+out_unlock:
+ pte_unmap_unlock(dst_pte, ptl);
+ return ret;
+}
+
static pmd_t *mm_alloc_pmd(struct mm_struct *mm, unsigned long address)
{
pgd_t *pgd;
@@ -336,8 +381,12 @@ static __always_inline ssize_t mfill_atomic_hugetlb(
* supported by hugetlb. A PMD_SIZE huge pages may exist as used
* by THP. Since we can not reliably insert a zero page, this
* feature is not supported.
+ *
+ * PTE marker handling for hugetlb is a bit special, so for now
+ * UFFDIO_POISON is not supported.
*/
- if (uffd_flags_mode_is(flags, MFILL_ATOMIC_ZEROPAGE)) {
+ if (uffd_flags_mode_is(flags, MFILL_ATOMIC_ZEROPAGE) ||
+ uffd_flags_mode_is(flags, MFILL_ATOMIC_POISON)) {
mmap_read_unlock(dst_mm);
return -EINVAL;
}
@@ -481,6 +530,9 @@ static __always_inline ssize_t mfill_atomic_pte(pmd_t *dst_pmd,
if (uffd_flags_mode_is(flags, MFILL_ATOMIC_CONTINUE)) {
return mfill_atomic_pte_continue(dst_pmd, dst_vma,
dst_addr, flags);
+ } else if (uffd_flags_mode_is(flags, MFILL_ATOMIC_POISON)) {
+ return mfill_atomic_pte_poison(dst_pmd, dst_vma,
+ dst_addr, flags);
}
/*
@@ -702,6 +754,14 @@ ssize_t mfill_atomic_continue(struct mm_struct *dst_mm, unsigned long start,
uffd_flags_set_mode(flags, MFILL_ATOMIC_CONTINUE));
}
+ssize_t mfill_atomic_poison(struct mm_struct *dst_mm, unsigned long start,
+ unsigned long len, atomic_t *mmap_changing,
+ uffd_flags_t flags)
+{
+ return mfill_atomic(dst_mm, start, 0, len, mmap_changing,
+ uffd_flags_set_mode(flags, MFILL_ATOMIC_POISON));
+}
+
long uffd_wp_range(struct vm_area_struct *dst_vma,
unsigned long start, unsigned long len, bool enable_wp)
{
--
2.41.0.255.g8b1d071c50-goog
From: Björn Töpel <bjorn(a)rivosinc.com>
This series has two minor fixes, found when cross-compiling for the
RISC-V architecture.
Some RISC-V systems do not define HAVE_EFFICIENT_UNALIGNED_ACCESS,
which made some of tests bail out. Fix the failing tests by adding
F_NEEDS_EFFICIENT_UNALIGNED_ACCESS.
...and some RISC-V systems *do* define
HAVE_EFFICIENT_UNALIGNED_ACCESS. In this case the autoconf.h was not
correctly picked up by the build system.
Cheers,
Björn
Björn Töpel (2):
selftests/bpf: Add F_NEEDS_EFFICIENT_UNALIGNED_ACCESS to some tests
selftests/bpf: Honor $(O) when figuring out paths
tools/testing/selftests/bpf/Makefile | 4 ++++
tools/testing/selftests/bpf/verifier/atomic_cmpxchg.c | 1 +
tools/testing/selftests/bpf/verifier/ctx_skb.c | 2 ++
tools/testing/selftests/bpf/verifier/jmp32.c | 8 ++++++++
tools/testing/selftests/bpf/verifier/map_kptr.c | 2 ++
tools/testing/selftests/bpf/verifier/precise.c | 2 +-
6 files changed, 18 insertions(+), 1 deletion(-)
base-commit: a94098d490e17d652770f2309fcb9b46bc4cf864
--
2.39.2
In the use_missing_map function, value is
assigned twice. There is no
connection between the two assignments.
This patch removes the redundant one.
Signed-off-by: Wang Ming <machel(a)vivo.com>
---
tools/testing/selftests/bpf/progs/test_log_fixup.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/bpf/progs/test_log_fixup.c b/tools/testing/selftests/bpf/progs/test_log_fixup.c
index 1bd48feaaa42..1c49b2f9be6c 100644
--- a/tools/testing/selftests/bpf/progs/test_log_fixup.c
+++ b/tools/testing/selftests/bpf/progs/test_log_fixup.c
@@ -52,13 +52,9 @@ struct {
SEC("?raw_tp/sys_enter")
int use_missing_map(const void *ctx)
{
- int zero = 0, *value;
+ int zero = 0;
- value = bpf_map_lookup_elem(&existing_map, &zero);
-
- value = bpf_map_lookup_elem(&missing_map, &zero);
-
- return value != NULL;
+ return bpf_map_lookup_elem(&missing_map, &zero) != NULL;
}
extern int bpf_nonexistent_kfunc(void) __ksym __weak;
--
2.25.1
From: Björn Töpel <bjorn(a)rivosinc.com>
Timeouts in kselftest are done using the "timeout" command with the
"--foreground" option. Without the "--foreground" option, it is not
possible for a user to cancel the runner using SIGINT, because the
signal is not propagated to timeout, which is running in a different
process group. The "--foreground" option places the timeout in the same
process group as its parent, but only sends the SIGTERM (on timeout)
signal to the forked process. Unfortunately, this does not play nicely
with all kselftests, e.g. "net:fcnal-test.sh", where the child
processes will linger because timeout does not send SIGTERM to the
group.
Some users have noted these hangs [1].
Fix this by nesting the timeout with an additional timeout without the
foreground option.
Link: https://lore.kernel.org/all/7650b2eb-0aee-a2b0-2e64-c9bc63210f67@alu.unizg.… # [1]
Fixes: 651e0d881461 ("kselftest/runner: allow to properly deliver signals to tests")
Signed-off-by: Björn Töpel <bjorn(a)rivosinc.com>
---
tools/testing/selftests/kselftest/runner.sh | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh
index 1c952d1401d4..70e0a465e30d 100644
--- a/tools/testing/selftests/kselftest/runner.sh
+++ b/tools/testing/selftests/kselftest/runner.sh
@@ -36,7 +36,8 @@ tap_timeout()
{
# Make sure tests will time out if utility is available.
if [ -x /usr/bin/timeout ] ; then
- /usr/bin/timeout --foreground "$kselftest_timeout" $1
+ /usr/bin/timeout --foreground "$kselftest_timeout" \
+ /usr/bin/timeout "$kselftest_timeout" $1
else
$1
fi
base-commit: d528014517f2b0531862c02865b9d4c908019dc4
--
2.39.2
Here is a first batch of fixes for v6.5 and older.
The fixes are not linked to each other.
Patch 1 ensures subflows are unhashed before cleaning the backlog to
avoid races. This fixes another recent fix from v6.4.
Patch 2 does not rely on implicit state check in mptcp_listen() to avoid
races when receiving an MP_FASTCLOSE. A regression from v5.17.
The rest fixes issues in the selftests.
Patch 3 makes sure errors when setting up the environment are no longer
ignored. For v5.17+.
Patch 4 uses 'iptables-legacy' if available to be able to run on older
kernels. A fix for v5.13 and newer.
Patch 5 catches errors when issues are detected with packet marks. Also
for v5.13+.
Patch 6 uses the correct variable instead of an undefined one. Even if
there was no visible impact, it can help to find regressions later. An
issue visible in v5.19+.
Patch 7 makes sure errors with some sub-tests are reported to have the
selftest marked as failed as expected. Also for v5.19+.
Patch 8 adds a kernel config that is required to execute MPTCP
selftests. It is valid for v5.9+.
Patch 9 fixes issues when validating the userspace path-manager with
32-bit arch, an issue affecting v5.19+.
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Matthieu Baerts (7):
selftests: mptcp: connect: fail if nft supposed to work
selftests: mptcp: sockopt: use 'iptables-legacy' if available
selftests: mptcp: sockopt: return error if wrong mark
selftests: mptcp: userspace_pm: use correct server port
selftests: mptcp: userspace_pm: report errors with 'remove' tests
selftests: mptcp: depend on SYN_COOKIES
selftests: mptcp: pm_nl_ctl: fix 32-bit support
Paolo Abeni (2):
mptcp: ensure subflow is unhashed before cleaning the backlog
mptcp: do not rely on implicit state check in mptcp_listen()
net/mptcp/protocol.c | 7 +++++-
tools/testing/selftests/net/mptcp/config | 1 +
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 3 +++
tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 29 ++++++++++++----------
tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 10 ++++----
tools/testing/selftests/net/mptcp/userspace_pm.sh | 4 ++-
6 files changed, 34 insertions(+), 20 deletions(-)
---
base-commit: 14bb236b29922c4f57d8c05bfdbcb82677f917c9
change-id: 20230704-upstream-net-20230704-misc-fixes-6-5-rc1-c52608649559
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>