May 2023 - Linux-kselftest-mirror

[PATCH net-next 0/4] nexthop: Refactor and fix nexthop selection for multipath routes

by Benjamin Poirier

In order to select a nexthop for multipath routes, fib_select_multipath() is used with legacy nexthops and nexthop_select_path_hthr() is used with nexthop objects. Those two functions perform a validity test on the neighbor related to each nexthop but their logic is structured differently. This causes a divergence in behavior and nexthop_select_path_hthr() may return a nexthop that failed the neighbor validity test even if there was one that passed.

Refactor nexthop_select_path_hthr() to make it more similar to fib_select_multipath() and fix the problem mentioned above.

Benjamin Poirier (4):
  nexthop: Factor out hash threshold fdb nexthop selection
  nexthop: Factor out neighbor validity check
  nexthop: Do not return invalid nexthop object during multipath selection
  selftests: net: Add test cases for nexthop groups with invalid neighbors

 net/ipv4/nexthop.c                          |  64 +++++++---
 tools/testing/selftests/net/fib_nexthops.sh | 129 ++++++++++++++++++++
 2 files changed, 174 insertions(+), 19 deletions(-)

--
2.40.1
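Editor's illustration: the selection behavior described in this cover letter can be modeled in a few lines of user-space Python. This is only a sketch under stated assumptions, not the kernel code. The Nexthop class, its field names and select_path() are hypothetical, and falling back to the first entry when every neighbor is invalid is an assumption. It shows a hash-threshold walk that remembers the first nexthop passing the neighbor validity test, so an invalid nexthop is not returned while a valid one exists.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Nexthop:
    """Hypothetical stand-in for one entry of a multipath nexthop group."""
    name: str
    upper_bound: int   # hash threshold: entry covers hashes <= upper_bound
    neigh_valid: bool  # outcome of the neighbor validity test


def select_path(nexthops: Sequence[Nexthop], flow_hash: int) -> Optional[Nexthop]:
    """Pick a nexthop by hash threshold, preferring entries with valid neighbors."""
    first_valid: Optional[Nexthop] = None
    for nh in nexthops:
        if not nh.neigh_valid:
            continue              # never pick this entry while a valid one may exist
        if first_valid is None:
            first_valid = nh      # remember a fallback that passed the test
        if flow_hash > nh.upper_bound:
            continue              # hash falls outside this entry's range
        return nh
    if first_valid is not None:
        return first_valid
    # Assumption: with no valid neighbor at all, fall back to the first entry.
    return nexthops[0] if nexthops else None
```

With a group whose first entry has an invalid neighbor, a hash inside that entry's range is steered to the next valid entry instead of being returned as-is.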

1 year, 11 months

[PATCH v7 00/19] Add iommufd physical device operations for replace and alloc hwpt

by Jason Gunthorpe

This is the basic functionality for iommufd to support iommufd_device_replace() and IOMMU_HWPT_ALLOC for physical devices.

iommufd_device_replace() allows changing the HWPT associated with the device to a new IOAS or HWPT. Replace does this in a way that failure leaves things unchanged, and utilizes the iommu iommu_group_replace_domain() API to allow the iommu driver to perform an optional non-disruptive change.

IOMMU_HWPT_ALLOC allows HWPTs to be explicitly allocated by the user and used by attach or replace. At this point it isn't very useful since the HWPT is the same as the automatically managed HWPT from the IOAS. However a following series will allow userspace to customize the created HWPT.

The implementation is complicated because we have to introduce some per-iommu_group memory in iommufd and redo how we think about multi-device groups to be more explicit. This solves all the locking problems in the prior attempts.

This series is infrastructure work for the following series which:
 - Add replace for attach
 - Expose replace through VFIO APIs
 - Implement driver parameters for HWPT creation (nesting)

Once review of this is complete I will keep it on a side branch and accumulate the following series when they are ready so we can have a stable base and make more incremental progress. When we have all the parts together to get a full implementation it can go to Linus.

This is on github: https://github.com/jgunthorpe/linux/commits/iommufd_hwpt

v7:
 - Rebase to v6.4-rc2, update to new signature of iommufd_get_ioas()
v6: https://lore.kernel.org/r/0-v6-fdb604df649a+369-iommufd_alloc_jgg@nvidia.com
 - Go back to the v4 locking arrangement with now both the attach/detach igroup->locks inside the functions, Kevin says he needs this for a followup series. This still fixes the syzkaller bug
 - Fix two more error unwind locking bugs where iommufd_object_abort_and_destroy(hwpt) would deadlock or be mislocked. Make sure fail_nth will catch these mistakes
 - Add a patch allowing objects to have different abort than destroy function, it allows hwpt abort to require the caller to continue to hold the lock and enforces this with lockdep.
v5: https://lore.kernel.org/r/0-v5-6716da355392+c5-iommufd_alloc_jgg@nvidia.com
 - Go back to the v3 version of the code, keep the comment changes from v4. Syzkaller says the group lock change in v4 didn't work.
 - Adjust the fail_nth test to cover the path syzkaller found. We need to have an ioas with a mapped page installed to inject a failure during domain attachment.
v4: https://lore.kernel.org/r/0-v4-9cd79ad52ee8+13f5-iommufd_alloc_jgg@nvidia.c…
 - Refine comments and commit messages
 - Move the group lock into iommufd_hw_pagetable_attach()
 - Fix error unwind in iommufd_device_do_replace()
v3: https://lore.kernel.org/r/0-v3-61d41fd9e13e+1f5-iommufd_alloc_jgg@nvidia.com
 - Refine comments and commit messages
 - Adjust the flow in iommufd_device_auto_get_domain() so pt_id is only set on success
 - Reject replace on non-attached devices
 - Add missing __reserved check for IOMMU_HWPT_ALLOC
v2: https://lore.kernel.org/r/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia.c…
 - Use WARN_ON for the igroup->group test and move that logic to a function iommufd_group_try_get()
 - Change igroup->devices to igroup->device list
   Replace will need to iterate over all attached idevs
 - Rename to iommufd_group_setup_msi()
 - New patch to export iommu_get_resv_regions()
 - New patch to use per-device reserved regions instead of per-group regions
 - Split out the reorganizing of iommufd_device_change_pt() from the replace patch
 - Replace uses the per-dev reserved regions
 - Use stdev_id in a few more places in the selftest
 - Fix error handling in IOMMU_HWPT_ALLOC
 - Clarify comments
 - Rebase on v6.3-rc1
v1: https://lore.kernel.org/all/0-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia…

Jason Gunthorpe (17):
  iommufd: Move isolated msi enforcement to iommufd_device_bind()
  iommufd: Add iommufd_group
  iommufd: Replace the hwpt->devices list with iommufd_group
  iommu: Export iommu_get_resv_regions()
  iommufd: Keep track of each device's reserved regions instead of groups
  iommufd: Use the iommufd_group to avoid duplicate MSI setup
  iommufd: Make sw_msi_start a group global
  iommufd: Move putting a hwpt to a helper function
  iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
  iommufd: Allow a hwpt to be aborted after allocation
  iommufd: Fix locking around hwpt allocation
  iommufd: Reorganize iommufd_device_attach into iommufd_device_change_pt
  iommufd: Add iommufd_device_replace()
  iommufd: Make destroy_rwsem use a lock class per object type
  iommufd: Add IOMMU_HWPT_ALLOC
  iommufd/selftest: Return the real idev id from selftest mock_domain
  iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC

Nicolin Chen (2):
  iommu: Introduce a new iommu_group_replace_domain() API
  iommufd/selftest: Test iommufd_device_replace()

 drivers/iommu/iommu-priv.h                    |  10 +
 drivers/iommu/iommu.c                         |  41 +-
 drivers/iommu/iommufd/device.c                | 553 +++++++++++++-----
 drivers/iommu/iommufd/hw_pagetable.c          | 112 +++-
 drivers/iommu/iommufd/io_pagetable.c          |  32 +-
 drivers/iommu/iommufd/iommufd_private.h       |  52 +-
 drivers/iommu/iommufd/iommufd_test.h          |   6 +
 drivers/iommu/iommufd/main.c                  |  24 +-
 drivers/iommu/iommufd/selftest.c              |  40 ++
 include/linux/iommufd.h                       |   1 +
 include/uapi/linux/iommufd.h                  |  26 +
 tools/testing/selftests/iommu/iommufd.c       |  67 ++-
 .../selftests/iommu/iommufd_fail_nth.c        |  67 ++-
 tools/testing/selftests/iommu/iommufd_utils.h |  63 +-
 14 files changed, 868 insertions(+), 226 deletions(-)
 create mode 100644 drivers/iommu/iommu-priv.h

base-commit: f1fcbaa18b28dec10281551dfe6ed3a3ed80e3d6
--
2.40.1

1 year, 11 months

ww_mutex.sh hangs since v5.16-rc1

by Li Zhijian

Hi Folks,

LKP/0Day found that ww_mutex.sh cannot complete since v5.16-rc1. Unfortunately we failed to bisect the FBC (first bad commit); instead, the bisection finally pointed to a merge commit (91e1c99e17).

Due to this hang, other tests in the same group are also blocked in 0Day, so we hope we can fix this hang ASAP.

If you have any idea about this, or need more debug information, feel free to let me know :)

BTW, ww_mutex.sh failed in v5.15 without hanging, and it looks like the hang cannot be reproduced on a VM.

Our box:
root@lkp-knm01 ~# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       46 bits physical, 48 bits virtual
CPU(s):              288
On-line CPU(s) list: 0-287
Thread(s) per core:  4
Core(s) per socket:  72
Socket(s):           1
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               133
Model name:          Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz
Stepping:            0
CPU MHz:             1385.255
CPU max MHz:         1600.0000
CPU min MHz:         1000.0000
BogoMIPS:            2992.76
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            1024K
NUMA node0 CPU(s):   0-287
NUMA node1 CPU(s):
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ring3mwait cpuid_fault epb pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms avx512f rdseed adx avx512pf avx512er avx512cd xsaveopt dtherm ida arat pln pts avx512_vpopcntdq avx512_4vnniw avx512_4fmaps

Below is the call stack in v5.16-rc2:

[ 1000.374954][ T2713] make: Leaving directory '/usr/src/perf_selftests-x86_64-rhel-8.3-kselftests-136057256686de39cc3a07c2e39ef6bc43003ff6/tools/testing/selftests/locking'
[ 1000.375030][ T2713]
[ 1000.428791][ T2713] 2021-11-22 22:21:27 make run_tests -C locking
[ 1000.428864][ T2713]
[ 1000.491043][ T2713] make: Entering directory '/usr/src/perf_selftests-x86_64-rhel-8.3-kselftests-136057256686de39cc3a07c2e39ef6bc43003ff6/tools/testing/selftests/locking'
[ 1000.491121][ T2713]
[ 1000.540807][ T2713] TAP version 13
[ 1000.540882][ T2713]
[ 1000.576050][ T2713] 1..1
[ 1000.576282][ T2713]
[ 1000.612980][ T2713] # selftests: locking: ww_mutex.sh
[ 1000.613288][ T2713]
[ 1495.201324][ T1577] INFO: task kworker/u576:16:1470 blocked for more than 491 seconds.
[ 1495.220059][ T1577] Tainted: G B 5.16.0-rc2 #1
[ 1495.240902][ T1577] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1495.265617][ T1577] task:kworker/u576:16 state:D stack: 0 pid: 1470 ppid: 2 flags:0x00004000
[ 1495.289054][ T1577] Workqueue: test-ww_mutex test_cycle_work [test_ww_mutex]
[ 1495.310936][ T1577] Call Trace:
[ 1495.327809][ T1577] <TASK>
[ 1495.344735][ T1577] __schedule+0xdb0/0x25c0
[ 1495.362764][ T1577] ? io_schedule_timeout+0x180/0x180
[ 1495.382013][ T1577] ? lock_downgrade+0x680/0x680
[ 1495.400894][ T1577] ? do_raw_spin_lock+0x125/0x2c0
[ 1495.418866][ T1577] schedule+0xe4/0x280
[ 1495.435597][ T1577] schedule_preempt_disabled+0x18/0x40
[ 1495.454588][ T1577] __ww_mutex_lock+0x1248/0x34c0
[ 1495.476189][ T1577] ? test_cycle_work+0x1bb/0x500 [test_ww_mutex]
[ 1495.497763][ T1577] ? mutex_lock_interruptible_nested+0x40/0x40
[ 1495.518959][ T1577] ? lock_downgrade+0x680/0x680
[ 1495.536861][ T1577] ? wait_for_completion_interruptible+0x340/0x340
[ 1495.556253][ T1577] ? ww_mutex_lock+0x3e/0x380
[ 1495.574003][ T1577] ww_mutex_lock+0x3e/0x380
[ 1495.591958][ T1577] test_cycle_work+0x1bb/0x500 [test_ww_mutex]
[ 1495.612260][ T1577] ? stress_reorder_work+0xa00/0xa00 [test_ww_mutex]
[ 1495.632857][ T1577] ? 0xffffffff81000000
[ 1495.649027][ T1577] ? rcu_read_lock_sched_held+0x5f/0x100
[ 1495.668211][ T1577] ? rcu_read_lock_bh_held+0xc0/0xc0
[ 1495.687010][ T1577] process_one_work+0x817/0x13c0
[ 1495.704991][ T1577] ? rcu_read_unlock+0x40/0x40
[ 1495.723024][ T1577] ? pwq_dec_nr_in_flight+0x280/0x280
[ 1495.740211][ T1577] ? rwlock_bug+0xc0/0xc0
[ 1495.758038][ T1577] worker_thread+0x8b/0xd80
[ 1495.775008][ T1577] ? process_one_work+0x13c0/0x13c0
[ 1495.793017][ T1577] kthread+0x3b9/0x4c0
[ 1495.810782][ T1577] ? set_kthread_struct+0x100/0x100
[ 1495.829988][ T1577] ret_from_fork+0x22/0x30
[ 1495.845811][ T1577] </TASK>
[ 1495.859087][ T1577] INFO: task kworker/u576:36:1490 blocked for more than 492 seconds.
[ 1495.879048][ T1577] Tainted: G B 5.16.0-rc2 #1
[ 1495.897879][ T1577] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1495.919582][ T1577] task:kworker/u576:36 state:D stack: 0 pid: 1490 ppid: 2 flags:0x00004000
[ 1495.941865][ T1577] Workqueue: test-ww_mutex test_cycle_work [test_ww_mutex]
[ 1495.959889][ T1577] Call Trace:
[ 1495.974816][ T1577] <TASK>
[ 1495.988759][ T1577] __schedule+0xdb0/0x25c0
[ 1496.003849][ T1577] ? io_schedule_timeout+0x180/0x180
[ 1496.020839][ T1577] ? lock_downgrade+0x680/0x680
[ 1496.036854][ T1577] ? do_raw_spin_lock+0x125/0x2c0
[ 1496.051976][ T1577] schedule+0xe4/0x280
[ 1496.067780][ T1577] schedule_preempt_disabled+0x18/0x40
[ 1496.085004][ T1577] __ww_mutex_lock+0x1248/0x34c0
[ 1496.101895][ T1577] ? test_cycle_work+0x1bb/0x500 [test_ww_mutex]
[ 1496.119889][ T1577] ? mutex_lock_interruptible_nested+0x40/0x40
[ 1496.137873][ T1577] ? lock_downgrade+0x680/0x680
[ 1496.152657][ T1577] ? wait_for_completion_interruptible+0x340/0x340
[ 1496.168773][ T1577] ? ww_mutex_lock+0x3e/0x380
[ 1496.184862][ T1577] ww_mutex_lock+0x3e/0x380
[ 1496.199979][ T1577] test_cycle_work+0x1bb/0x500 [test_ww_mutex]
[ 1496.216277][ T1577] ? stress_reorder_work+0xa00/0xa00 [test_ww_mutex]
[ 1496.234904][ T1577] ? 0xffffffff81000000
[ 1496.249856][ T1577] ? rcu_read_lock_sched_held+0x5f/0x100
[ 1496.265951][ T1577] ? rcu_read_lock_bh_held+0xc0/0xc0
[ 1496.282815][ T1577] process_one_work+0x817/0x13c0
[ 1496.299791][ T1577] ? rcu_read_unlock+0x40/0x40
[ 1496.314754][ T1577] ? pwq_dec_nr_in_flight+0x280/0x280
[ 1496.331779][ T1577] ? rwlock_bug+0xc0/0xc0
[ 1496.348007][ T1577] worker_thread+0x8b/0xd80
[ 1496.362905][ T1577] ? process_one_work+0x13c0/0x13c0
[ 1496.378975][ T1577] kthread+0x3b9/0x4c0
[ 1496.393866][ T1577] ? set_kthread_struct+0x100/0x100
[ 1496.408827][ T1577] ret_from_fork+0x22/0x30
[ 1496.423901][ T1577] </TASK>
[ 1496.437994][ T1577] INFO: task kworker/u576:0:15113 blocked for more than 492 seconds.
[ 1496.455862][ T1577] Tainted: G B 5.16.0-rc2 #1
[ 1496.473759][ T1577] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1496.494808][ T1577] task:kworker/u576:0 state:D stack: 0 pid:15113 ppid: 2 flags:0x00004000
[ 1496.517000][ T1577] Workqueue: test-ww_mutex test_cycle_work [test_ww_mutex]
[ 1496.537035][ T1577] Call Trace:
[ 1496.551187][ T1577] <TASK>
[ 1496.566405][ T1577] __schedule+0xdb0/0x25c0
[ 1496.582012][ T1577] ? io_schedule_timeout+0x180/0x180
[ 1496.598049][ T1577] ? lock_downgrade+0x680/0x680
[ 1496.615360][ T1577] ? do_raw_spin_lock+0x125/0x2c0
[ 1496.631835][ T1577] schedule+0xe4/0x280
[ 1496.645972][ T1577] schedule_preempt_disabled+0x18/0x40
[ 1496.663774][ T1577] __ww_mutex_lock+0x1248/0x34c0
[ 1496.681795][ T1577] ? test_cycle_work+0x1bb/0x500 [test_ww_mutex]
[ 1496.698731][ T1577] ? mutex_lock_interruptible_nested+0x40/0x40
[ 1496.714996][ T1577] ? lock_downgrade+0x680/0x680
[ 1496.730888][ T1577] ? wait_for_completion_interruptible+0x340/0x340
[ 1496.747926][ T1577] ? ww_mutex_lock+0x3e/0x380
[ 1496.762482][ T1577] ww_mutex_lock+0x3e/0x380
[ 1496.778844][ T1577] test_cycle_work+0x1bb/0x500 [test_ww_mutex]

We also found that it only occasionally hangs on v5.16-rc3 (1 of 3 runs). Below is a good dmesg:

[ 962.136756][ T2950] make: Entering directory '/usr/src/perf_selftests-x86_64-rhel-8.3-kselftests-d58071a8a76d779eedab38033ae4c821c30295a5/tools/testing/selftests/locking'
[ 962.136831][ T2950]-
[ 962.205036][ T2950] TAP version 13
[ 962.206003][ T2950]-
[ 962.298458][ T2950] 1..1
[ 962.299657][ T2950]-
[ 962.345588][ T2950] # selftests: locking: ww_mutex.sh
[ 962.345657][ T2950]-
[ 973.641869][T25509] All ww mutex selftests passed
[ 973.773996][ T2950] # locking/ww_mutex: ok
[ 973.774068][ T2950]-
[ 973.774236][ T2960] # locking/ww_mutex: ok
[ 973.802355][ T2960]-
[ 973.829966][ T2950] ok 1 selftests: locking: ww_mutex.sh
[ 973.834748][ T2950]-
[ 973.838302][ T2960] ok 1 selftests: locking: ww_mutex.sh
[ 973.899815][ T2960]-
[ 973.921431][ T2950] make: Leaving directory '/usr/src/perf_selftests-x86_64-rhel-8.3-kselftests-d58071a8a76d779eedab38033ae4c821c30295a5/tools/testing/selftests/locking'
[ 973.932312][ T2950]-
[ 973.957345][ T2960] make: Leaving directory '/usr/src/perf_selftests-x86_64-rhel-8.3-kselftests-d58071a8a76d779eedab38033ae4c821c30295a5/tools/testing/selftests/locking'

Thanks
Zhijian@0Day

1 year, 12 months

[PATCH] kunit: tool: undo type subscripts for subprocess.Popen

by Daniel Latypov

Writing `subprocess.Popen[str]` requires python 3.9+.
kunit.py has an assertion that the python version is 3.7+, so we should try to stay backwards compatible.

This conflicts a bit with commit 1da2e6220e11 ("kunit: tool: fix pre-existing `mypy --strict` errors and update run_checks.py"), since mypy complains like so
> kunit_kernel.py:95: error: Missing type parameters for generic type "Popen" [type-arg]

Note: `mypy --strict --python-version 3.7` does not work.

We could annotate each file with comments like
  `# mypy: disable-error-code="type-arg"`
but then we might still get nudged to break back-compat in other files.

This patch adds a `mypy.ini` file since it seems like the only way to disable specific error codes for all our files.

Note: run_checks.py doesn't need to specify `--config-file mypy.ini`, but I think being explicit is better, particularly since most kernel devs won't be familiar with how mypy works.

Fixes: 695e26030858 ("kunit: tool: add subscripts for type annotations where appropriate")
Reported-by: SeongJae Park <sj(a)kernel.org>
Link: https://lore.kernel.org/linux-kselftest/20230501171520.138753-1-sj@kernel.o…
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
---
 tools/testing/kunit/kunit_kernel.py | 6 +++---
 tools/testing/kunit/mypy.ini        | 6 ++++++
 tools/testing/kunit/run_checks.py   | 2 +-
 3 files changed, 10 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/kunit/mypy.ini

diff --git a/tools/testing/kunit/kunit_kernel.py b/tools/testing/kunit/kunit_kernel.py
index f01f94106129..7f648802caf6 100644
--- a/tools/testing/kunit/kunit_kernel.py
+++ b/tools/testing/kunit/kunit_kernel.py
@@ -92,7 +92,7 @@ class LinuxSourceTreeOperations:
 		if stderr: # likely only due to build warnings
 			print(stderr.decode())
 
-	def start(self, params: List[str], build_dir: str) -> subprocess.Popen[str]:
+	def start(self, params: List[str], build_dir: str) -> subprocess.Popen:
 		raise RuntimeError('not implemented!')
 
 
@@ -113,7 +113,7 @@ class LinuxSourceTreeOperationsQemu(LinuxSourceTreeOperations):
 		kconfig.merge_in_entries(base_kunitconfig)
 		return kconfig
 
-	def start(self, params: List[str], build_dir: str) -> subprocess.Popen[str]:
+	def start(self, params: List[str], build_dir: str) -> subprocess.Popen:
 		kernel_path = os.path.join(build_dir, self._kernel_path)
 		qemu_command = ['qemu-system-' + self._qemu_arch,
 				'-nodefaults',
@@ -142,7 +142,7 @@ class LinuxSourceTreeOperationsUml(LinuxSourceTreeOperations):
 		kconfig.merge_in_entries(base_kunitconfig)
 		return kconfig
 
-	def start(self, params: List[str], build_dir: str) -> subprocess.Popen[str]:
+	def start(self, params: List[str], build_dir: str) -> subprocess.Popen:
 		"""Runs the Linux UML binary. Must be named 'linux'."""
 		linux_bin = os.path.join(build_dir, 'linux')
 		params.extend(['mem=1G', 'console=tty', 'kunit_shutdown=halt'])
diff --git a/tools/testing/kunit/mypy.ini b/tools/testing/kunit/mypy.ini
new file mode 100644
index 000000000000..ddd288309efa
--- /dev/null
+++ b/tools/testing/kunit/mypy.ini
@@ -0,0 +1,6 @@
+[mypy]
+strict = True
+
+# E.g. we can't write subprocess.Popen[str] until Python 3.9+.
+# But kunit.py tries to support Python 3.7+, so let's disable it.
+disable_error_code = type-arg
diff --git a/tools/testing/kunit/run_checks.py b/tools/testing/kunit/run_checks.py
index 8208c3b3135e..c6d494ea3373 100755
--- a/tools/testing/kunit/run_checks.py
+++ b/tools/testing/kunit/run_checks.py
@@ -23,7 +23,7 @@ commands: Dict[str, Sequence[str]] = {
 	'kunit_tool_test.py': ['./kunit_tool_test.py'],
 	'kunit smoke test': ['./kunit.py', 'run', '--kunitconfig=lib/kunit', '--build_dir=kunit_run_checks'],
 	'pytype': ['/bin/sh', '-c', 'pytype *.py'],
-	'mypy': ['mypy', '--strict', '--exclude', '_test.py$', '--exclude', 'qemu_configs/', '.'],
+	'mypy': ['mypy', '--config-file', 'mypy.ini', '--exclude', '_test.py$', '--exclude', 'qemu_configs/', '.'],
 }
 
 # The user might not have mypy or pytype installed, skip them if so.

base-commit: a42077b787680cbc365a96446b30f32399fa3f6f
--
2.40.1.495.gc816e09b53d-goog
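Editor's illustration: the compatibility issue in this patch comes from Python evaluating return annotations when a function is defined, so `subprocess.Popen[str]` needs the class itself to be subscriptable at runtime. The sketch below is not taken from kunit. The start() helper and popen_subscript_supported() are hypothetical names. It shows the unsubscripted annotation the patch switches to, plus a small probe for whether the running interpreter would accept the subscripted form.

```python
import subprocess
import sys
from typing import List


def popen_subscript_supported() -> bool:
    """True if `subprocess.Popen[str]` can be evaluated on this interpreter.

    Subscripting a class at runtime relies on __class_getitem__, which
    subprocess.Popen only gained in Python 3.9.
    """
    return hasattr(subprocess.Popen, "__class_getitem__")


# Unsubscripted return annotation: valid on Python 3.7+, at the price of
# mypy's [type-arg] complaint under --strict (hence the mypy.ini in the patch).
def start(params: List[str]) -> subprocess.Popen:
    """Hypothetical helper: spawn a child and capture its output as text."""
    return subprocess.Popen(params,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT,
                            text=True)


if __name__ == "__main__":
    proc = start([sys.executable, "-c", "print('hello from child')"])
    out, _ = proc.communicate()
    print(out.strip(), "| Popen[str] usable:", popen_subscript_supported())
```

Other ways to keep the subscript for the type checker exist (for example quoting the annotation), but the patch opts for dropping it and silencing the one error code globally.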

1 year, 12 months

[PATCH v2 00/13] nolibc: add part2 of support for rv32

by Zhangjin Wu

Hi, all

Thanks very much for your review suggestions of the v1 series [1], we just sent out the generic part1 [2], and here is the part2 of the whole v2 revision.

Changes from v1 -> v2:

* Don't emulate the return values in the new syscalls path, fix up or support the new syscalls in the side of the related test cases (1-3)

    selftests/nolibc: remove gettimeofday_bad1/2 completely
    selftests/nolibc: support two errnos with EXPECT_SYSER2()
    selftests/nolibc: waitpid_min: add waitid syscall support

  (Review suggestions from Willy and Thomas)

* Fix up new failure of the state_timestamps test case (4, new)

    tools/nolibc: add missing nanoseconds support for __NR_statx

  (Fixes for the commit a89c937d781a ("tools/nolibc: support nanoseconds in stat()"))

* Add new waitstatus macros as a standalone patch for the waitid support (5)

    tools/nolibc: add more wait status related types

  (Split and Cleanup for the waitid syscall based sys_wait4)

* Pure 64bit lseek and time64 select/poll/gettimeofday support (6-11)

    tools/nolibc: add pure 64bit off_t, time_t and blkcnt_t
    tools/nolibc: sys_lseek: add pure 64bit lseek
    tools/nolibc: add pure 64bit time structs
    tools/nolibc: sys_select: add pure 64bit select
    tools/nolibc: sys_poll: add pure 64bit poll
    tools/nolibc: sys_gettimeofday: add pure 64bit gettimeofday

  (Review suggestions from Arnd, Thomas and Willy, time32 variants have been removed completely and some fixups)

* waitid syscall support cleanup (12)

    tools/nolibc: sys_wait4: add waitid syscall support

  (Sync with the waitstatus macros update and Removal of emulated code)

* rv32 nolibc-test support, commit message update (13)

    selftests/nolibc: riscv: customize makefile for rv32

  (Review suggestions from Thomas, explain more about the change logic in commit message)

Best regards,
Zhangjin

---
[1]: https://lore.kernel.org/linux-riscv/20230529113143.GB2762@1wt.eu/T/#t
[2]: https://lore.kernel.org/linux-riscv/cover.1685362482.git.falcon@tinylab.org/

Zhangjin Wu (13):
  selftests/nolibc: remove gettimeofday_bad1/2 completely
  selftests/nolibc: support two errnos with EXPECT_SYSER2()
  selftests/nolibc: waitpid_min: add waitid syscall support
  tools/nolibc: add missing nanoseconds support for __NR_statx
  tools/nolibc: add more wait status related types
  tools/nolibc: add pure 64bit off_t, time_t and blkcnt_t
  tools/nolibc: sys_lseek: add pure 64bit lseek
  tools/nolibc: add pure 64bit time structs
  tools/nolibc: sys_select: add pure 64bit select
  tools/nolibc: sys_poll: add pure 64bit poll
  tools/nolibc: sys_gettimeofday: add pure 64bit gettimeofday
  tools/nolibc: sys_wait4: add waitid syscall support
  selftests/nolibc: riscv: customize makefile for rv32

 tools/include/nolibc/arch-aarch64.h          |   3 -
 tools/include/nolibc/arch-loongarch.h        |   3 -
 tools/include/nolibc/arch-riscv.h            |   3 -
 tools/include/nolibc/std.h                   |  28 ++--
 tools/include/nolibc/sys.h                   | 134 +++++++++++++++----
 tools/include/nolibc/types.h                 |  58 +++++++-
 tools/testing/selftests/nolibc/Makefile      |  11 +-
 tools/testing/selftests/nolibc/nolibc-test.c |  20 +--
 8 files changed, 202 insertions(+), 58 deletions(-)

--
2.25.1

2 years

[PATCH -next] selftests/ptrace: Fix Test terminated by timeout in ptrace_attach

by limin

This is an open issue: Bernd Edlinger wrote the test case in anticipation that the whole patch series would be accepted, but the last patch was not picked up for inclusion in the Linux kernel.

How to reproduce the warning:
$ make -C tools/testing/selftests TARGETS=ptrace run_tests

For example, vmaccess from the 6.1.0-next source tree fails when run on bare metal:
  RUN      global.attach ...
  attach: Test terminated by timeout
  FAIL     global.attach

Link: https://lore.kernel.org/all/AM8PR10MB4708E6FF0E155261455064C2E4209@AM8…
Fixes: 2de4e82318c7 ("selftests/ptrace: add test cases for dead-locks")
Signed-off-by: limin <limin100(a)huawei.com>
---
 tools/testing/selftests/ptrace/vmaccess.c | 37 ++++++++---------------
 1 file changed, 13 insertions(+), 24 deletions(-)

diff --git a/tools/testing/selftests/ptrace/vmaccess.c b/tools/testing/selftests/ptrace/vmaccess.c
index 4db327b44586..751a41f1163c 100644
--- a/tools/testing/selftests/ptrace/vmaccess.c
+++ b/tools/testing/selftests/ptrace/vmaccess.c
@@ -45,42 +45,31 @@ TEST(vmaccess)
 
 TEST(attach)
 {
-	int s, k, pid = fork();
+	int k;
+	int s;
+	pid_t pid = fork();
 
 	if (!pid) {
-		pthread_t pt;
-
-		pthread_create(&pt, NULL, thread, NULL);
-		pthread_join(pt, NULL);
+		ptrace(PTRACE_TRACEME, 0, NULL, NULL);
 		execlp("sleep", "sleep", "2", NULL);
 	}
 
 	sleep(1);
 	k = ptrace(PTRACE_ATTACH, pid, 0L, 0L);
-	ASSERT_EQ(errno, EAGAIN);
+	printf("k1:%d\n", k);
+	ASSERT_EQ(k, -1);
+	waitpid(pid, &s, WNOHANG);
 	ASSERT_EQ(k, -1);
-	k = waitpid(-1, &s, WNOHANG);
-	ASSERT_NE(k, -1);
 	ASSERT_NE(k, 0);
 	ASSERT_NE(k, pid);
-	ASSERT_EQ(WIFEXITED(s), 1);
-	ASSERT_EQ(WEXITSTATUS(s), 0);
-	sleep(1);
-	k = ptrace(PTRACE_ATTACH, pid, 0L, 0L);
-	ASSERT_EQ(k, 0);
-	k = waitpid(-1, &s, 0);
-	ASSERT_EQ(k, pid);
+	if (WIFEXITED(s))
+		ASSERT_EQ(WEXITSTATUS(s), 0);
+	if (WIFSTOPPED(s))
+		ASSERT_EQ(WSTOPSIG(s), SIGTRAP);
 	ASSERT_EQ(WIFSTOPPED(s), 1);
-	ASSERT_EQ(WSTOPSIG(s), SIGSTOP);
-	k = ptrace(PTRACE_DETACH, pid, 0L, 0L);
-	ASSERT_EQ(k, 0);
-	k = waitpid(-1, &s, 0);
-	ASSERT_EQ(k, pid);
-	ASSERT_EQ(WIFEXITED(s), 1);
-	ASSERT_EQ(WEXITSTATUS(s), 0);
-	k = waitpid(-1, NULL, 0);
+	sleep(1);
+	ptrace(PTRACE_CONT, pid, NULL, NULL);
 	ASSERT_EQ(k, -1);
-	ASSERT_EQ(errno, ECHILD);
 }
 
 TEST_HARNESS_MAIN
--
2.33.0

2 years

[PATCH v8 0/5] mm/memfd: introduce MFD_NOEXEC_SEAL and MFD_EXEC

by jeffxu＠chromium.org

From: Jeff Xu <jeffxu(a)google.com>

Since Linux introduced the memfd feature, memfds have always had their execute bit set, and the memfd_create() syscall doesn't allow setting it differently.

However, in a secure by default system, such as ChromeOS (where all executables should come from the rootfs, which is protected by Verified boot), this executable nature of memfd opens a door for NoExec bypass and enables a “confused deputy attack”. E.g., in VRP bug [1]: the cros_vm process created a memfd to share the content with an external process, however the memfd is overwritten and used for executing arbitrary code and root escalation. [2] lists more VRP bugs of this kind.

On the other hand, executable memfd has its legit use: runc uses memfd's seal and executable feature to copy the contents of the binary then execute them. For such a system, we need a solution to differentiate runc's use of executable memfds and an attacker's [3].

To address those above, this set of patches adds the following:
1> Let memfd_create() set X bit at creation time.
2> Let memfd be sealed for modifying X bit.
3> A new pid namespace sysctl: vm.memfd_noexec to control the behavior of X bit. For example, if a container has vm.memfd_noexec=2, then memfd_create() without MFD_NOEXEC_SEAL will be rejected.
4> A new security hook in memfd_create(). This makes it possible for a new LSM to reject or allow executable memfd based on its security policy.

Change history:
v8:
- Update ref bug in cover letter.
- Add Reviewed-by field.
- Remove security hook (security_memfd_create) patch, which will have its own patch set in future.

v7:
- patch 2/6: remove #ifdef and MAX_PATH (memfd_test.c).
- patch 3/6: check capability (CAP_SYS_ADMIN) from userns instead of global ns (pid_sysctl.h). Add a tab (pid_namespace.h).
- patch 5/6: remove #ifdef (memfd_test.c)
- patch 6/6: remove unneeded security_move_mount(security.c).

v6: https://lore.kernel.org/lkml/20221206150233.1963717-1-jeffxu@google.com/
- Address comment and move "#ifdef CONFIG_" from .c file to pid_sysctl.h

v5: https://lore.kernel.org/lkml/20221206152358.1966099-1-jeffxu@google.com/
- Pass vm.memfd_noexec from current ns to child ns.
- Fix build issue detected by kernel test robot.
- Add missing security.c

v3: https://lore.kernel.org/lkml/20221202013404.163143-1-jeffxu@google.com/
- Address API design comments in v2.
- Let memfd_create() to set X bit at creation time.
- A new pid namespace sysctl: vm.memfd_noexec to control behavior of X bit.
- A new security hook in memfd_create().

v2: https://lore.kernel.org/lkml/20220805222126.142525-1-jeffxu@google.com/
- address comments in V1.
- add sysctl (vm.mfd_noexec) to set the default file permissions of memfd_create to be non-executable.

v1: https://lwn.net/Articles/890096/

[1] https://crbug.com/1305267
[2] https://bugs.chromium.org/p/chromium/issues/list?q=type%3Dbug-security%20me…
[3] https://lwn.net/Articles/781013/

Daniel Verkamp (2):
  mm/memfd: add F_SEAL_EXEC
  selftests/memfd: add tests for F_SEAL_EXEC

Jeff Xu (3):
  mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC
  mm/memfd: Add write seals when apply SEAL_EXEC to executable memfd
  selftests/memfd: add tests for MFD_NOEXEC_SEAL MFD_EXEC

 include/linux/pid_namespace.h              |  19 ++
 include/uapi/linux/fcntl.h                 |   1 +
 include/uapi/linux/memfd.h                 |   4 +
 kernel/pid_namespace.c                     |   5 +
 kernel/pid_sysctl.h                        |  59 ++++
 mm/memfd.c                                 |  56 +++-
 mm/shmem.c                                 |   6 +
 tools/testing/selftests/memfd/fuse_test.c  |   1 +
 tools/testing/selftests/memfd/memfd_test.c | 341 ++++++++++++++++++++-
 9 files changed, 489 insertions(+), 3 deletions(-)
 create mode 100644 kernel/pid_sysctl.h

base-commit: eb7081409f94a9a8608593d0fb63a1aa3d6f95d8
--
2.39.0.rc1.256.g54fd8350bd-goog
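Editor's illustration: the interaction between the two new flags and the vm.memfd_noexec sysctl can be summarized as a small decision function. The Python model below is a sketch, not the kernel implementation. Only the vm.memfd_noexec=2 behavior is stated in the cover letter; the behavior assumed for levels 0 and 1 (default to MFD_EXEC or MFD_NOEXEC_SEAL when neither flag is passed), the numeric flag values, and the MemfdRejected exception are assumptions made for illustration, and no particular errno is modeled.

```python
# Flag values as assumed from include/uapi/linux/memfd.h; treat as illustrative.
MFD_NOEXEC_SEAL = 0x0008
MFD_EXEC = 0x0010


class MemfdRejected(Exception):
    """Placeholder for the kernel refusing the call (errno not modeled)."""


def effective_exec_flags(flags: int, memfd_noexec: int) -> int:
    """Model how memfd_create() flags combine with vm.memfd_noexec.

    Returns the flags with the executable policy made explicit, or raises
    MemfdRejected when the request would be refused.
    """
    wants_exec = bool(flags & MFD_EXEC)
    wants_noexec = bool(flags & MFD_NOEXEC_SEAL)
    if wants_exec and wants_noexec:
        raise MemfdRejected("MFD_EXEC and MFD_NOEXEC_SEAL are mutually exclusive")

    if not wants_exec and not wants_noexec:
        # Assumption: legacy callers get a default chosen by the sysctl.
        if memfd_noexec == 0:
            flags |= MFD_EXEC
        elif memfd_noexec == 1:
            flags |= MFD_NOEXEC_SEAL

    # From the cover letter: at level 2, anything without MFD_NOEXEC_SEAL
    # is rejected.
    if memfd_noexec == 2 and not flags & MFD_NOEXEC_SEAL:
        raise MemfdRejected("vm.memfd_noexec=2 requires MFD_NOEXEC_SEAL")
    return flags
```

In this model an unmodified legacy application keeps working at levels 0 and 1, while level 2 forces every caller in the pid namespace to opt in to non-executable, sealed memfds.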

2 years

[PATCH v2 00/11] iommufd: Add nesting infrastructure

by Yi Liu

Nested translation is a hardware feature that is supported by many modern IOMMU hardwares. It has two stages (stage-1, stage-2) address translation to get access to the physical address. stage-1 translation table is owned by userspace (e.g. by a guest OS), while stage-2 is owned by kernel. Changes to stage-1 translation table should be followed by an IOTLB invalidation. Take Intel VT-d as an example, the stage-1 translation table is I/O page table. As the below diagram shows, guest I/O page table pointer in GPA (guest physical address) is passed to host and be used to perform the stage-1 address translation. Along with it, modifications to present mappings in the guest I/O page table should be followed with an IOTLB invalidation. .-------------. .---------------------------. | vIOMMU | | Guest I/O page table | | | '---------------------------' .----------------/ | PASID Entry |--- PASID cache flush --+ '-------------' | | | V | | I/O page table pointer in GPA '-------------' Guest ------| Shadow |--------------------------|-------- v v v Host .-------------. .------------------------. | pIOMMU | | FS for GIOVA->GPA | | | '------------------------' .----------------/ | | PASID Entry | V (Nested xlate) '----------------\.----------------------------------. | | | SS for GPA->HPA, unmanaged domain| | | '----------------------------------' '-------------' Where: - FS = First stage page tables - SS = Second stage page tables <Intel VT-d Nested translation> In IOMMUFD, all the translation tables are tracked by hw_pagetable (hwpt) and each has an iommu_domain allocated from iommu driver. So in this series hw_pagetable and iommu_domain means the same thing if no special note. IOMMUFD has already supported allocating hw_pagetable that is linked with an IOAS. However, nesting requires IOMMUFD to allow allocating hw_pagetable with driver specific parameters and interface to sync stage-1 IOTLB as user owns the stage-1 translation table. 
This series is based on the iommu hw info reporting series [1]. It first introduces a new iommu op for allocating domains with user data and an op for syncing the stage-1 IOTLB. It then extends the IOMMUFD internal infrastructure to accept user_data and a parent hwpt, and relays the data to the iommu core to allocate an iommu_domain. After that, it extends the IOMMU_HWPT_ALLOC ioctl to accept user data and a stage-2 hwpt ID when allocating a hwpt. Along with it, the IOMMU_HWPT_INVALIDATE ioctl is added to invalidate the stage-1 IOTLB, which is needed for user-managed hwpts. Selftests are added as well to cover the new ioctls.

The complete code can be found in [2], and the QEMU code can be found in [3].

Finally, this is joint work with Nicolin Chen and Lu Baolu. Thanks to them for the help. ^_^ Looking forward to your feedback.

base-commit: cf905391237ded2331388e75adb5afbabeddc852

[1] https://lore.kernel.org/linux-iommu/20230511143024.19542-1-yi.l.liu@intel.c…
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/wip/iommufd_rfcv4.mig.reset.v4_var3%…

Change log:

v2:
 - Add union iommu_domain_user_data to include all user data structures to avoid passing void * in kernel APIs.
 - Add iommu op to return user data length for user domain allocation
 - Rename struct iommu_hwpt_alloc::data_type to be hwpt_type
 - Store the invalidation data length in iommu_domain_ops::cache_invalidate_user_data_len
 - Convert cache_invalidate_user op to be int instead of void
 - Remove @data_type in struct iommu_hwpt_invalidate
 - Remove out_hwpt_type_bitmap in struct iommu_hw_info hence drop patch 08 of v1

v1: https://lore.kernel.org/linux-iommu/20230309080910.607396-1-yi.l.liu@intel.…

Thanks,
Yi Liu

Lu Baolu (2):
  iommu: Add new iommu op to create domains owned by userspace
  iommu: Add nested domain support

Nicolin Chen (5):
  iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
  iommufd/selftest: Add domain_alloc_user() support in iommu mock
  iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user data
  iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
  iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl

Yi Liu (4):
  iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
  iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc()
  iommufd: IOMMU_HWPT_ALLOC allocation with user data
  iommufd: Add IOMMU_HWPT_INVALIDATE

 drivers/iommu/iommufd/device.c                |   2 +-
 drivers/iommu/iommufd/hw_pagetable.c          | 191 +++++++++++++++++-
 drivers/iommu/iommufd/iommufd_private.h       |  16 +-
 drivers/iommu/iommufd/iommufd_test.h          |  30 +++
 drivers/iommu/iommufd/main.c                  |   5 +-
 drivers/iommu/iommufd/selftest.c              | 119 ++++++++++-
 include/linux/iommu.h                         |  36 ++++
 include/uapi/linux/iommufd.h                  |  58 +++++-
 tools/testing/selftests/iommu/iommufd.c       | 126 +++++++++++-
 tools/testing/selftests/iommu/iommufd_utils.h |  70 +++++++
 10 files changed, 629 insertions(+), 24 deletions(-)

-- 
2.34.1
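
For readers new to nesting, the two-stage lookup described in the cover letter above can be modeled roughly as follows. This is only a toy sketch under stated assumptions: struct toy_table, struct map_entry and both function names are hypothetical and do not appear in the series, a flat array stands in for real multi-level page tables, and the IOTLB, PASID entries and the translation of the stage-1 table pointer itself are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK  ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1)

/* One page-granular mapping: input page frame -> output page frame. */
struct map_entry {
	uint64_t in_pfn;
	uint64_t out_pfn;
};

/* Hypothetical stand-in for a page table: a flat array of mappings. */
struct toy_table {
	const struct map_entry *entries;
	size_t count;
};

/* Translate an address through one stage; returns false if unmapped. */
static bool toy_translate(const struct toy_table *t, uint64_t in, uint64_t *out)
{
	for (size_t i = 0; i < t->count; i++) {
		if (t->entries[i].in_pfn == (in >> TOY_PAGE_SHIFT)) {
			/* Keep the page offset, swap the page frame. */
			*out = (t->entries[i].out_pfn << TOY_PAGE_SHIFT) |
			       (in & TOY_PAGE_MASK);
			return true;
		}
	}
	return false;
}

/*
 * Nested translation: stage-1 (owned by the guest) maps GIOVA -> GPA,
 * then stage-2 (owned by the host kernel) maps GPA -> HPA.
 */
static bool toy_nested_translate(const struct toy_table *stage1,
				 const struct toy_table *stage2,
				 uint64_t giova, uint64_t *hpa)
{
	uint64_t gpa;

	if (!toy_translate(stage1, giova, &gpa))
		return false;
	return toy_translate(stage2, gpa, hpa);
}
```

In this model, a guest that edits its stage-1 entries changes the GIOVA -> GPA step without the host seeing the change. That is why the series pairs user-managed hwpts with an explicit invalidation ioctl: any cached translation would otherwise keep returning the old result.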


[RFC PATCHES 00/17] IOMMUFD: Deliver IO page faults to user space

by Lu Baolu

Hi folks,

This series implements the functionality of delivering IO page faults to user space through the IOMMUFD framework. The use case is nested translation, where modern IOMMU hardware supports two-stage translation tables. The second-stage translation table is managed by the host VMM, while the first-stage translation table is owned by user space. Hence, any IO page fault that occurs on the first-stage page table should be delivered to user space and handled there. User space should then send the page fault handling result back down to the device through the IOMMUFD response uAPI.

User space indicates its capability of handling IO page faults by setting a user HWPT allocation flag, IOMMU_HWPT_ALLOC_FLAGS_IOPF_CAPABLE. IOMMUFD will then set up its infrastructure for page fault delivery. Together with the iopf-capable flag, user space should also provide an eventfd on which it will listen for page fault messages delivered up from the kernel.

On a successful return from the allocation of an iopf-capable HWPT, a fault fd will be returned. User space can open and read fault messages from it once the eventfd is signaled.

Besides the overall design, I'd like to hear comments about the following designs:

- The IOMMUFD fault message format. It is very similar to that in uapi/linux/iommu which has been discussed before and partially used by the IOMMU SVA implementation. I'd like to get more comments on the format when it comes to IOMMUFD.

- The timeout value for the pending page fault messages. Ideally we should determine the timeout value from the device configuration, but I failed to find any statement in the PCI specification (version 6.x). A default of 100 milliseconds is selected in the implementation, but it leaves room to extend the code with a per-device setting.

This series is for review comments only. I used the IOMMUFD selftest to verify the hwpt allocation, attach/detach and replace, but I haven't had a chance to run it on real hardware yet. I will do more testing in subsequent versions once I am confident that I am heading in the right direction.

This series is based on the latest implementation of the nested translation under discussion. The whole series and related patches are available on GitHub:

https://github.com/LuBaolu/intel-iommu/commits/iommufd-io-pgfault-delivery-…

Best regards,
baolu

Lu Baolu (17):
  iommu: Move iommu fault data to linux/iommu.h
  iommu: Support asynchronous I/O page fault response
  iommu: Add helper to set iopf handler for domain
  iommu: Pass device parameter to iopf handler
  iommu: Split IO page fault handling from SVA
  iommu: Add iommu page fault cookie helpers
  iommufd: Add iommu page fault data
  iommufd: IO page fault delivery initialization and release
  iommufd: Add iommufd hwpt iopf handler
  iommufd: Add IOMMU_HWPT_ALLOC_FLAGS_USER_PASID_TABLE for hwpt_alloc
  iommufd: Deliver fault messages to user space
  iommufd: Add io page fault response support
  iommufd: Add a timer for each iommufd fault data
  iommufd: Drain all pending faults when destroying hwpt
  iommufd: Allow new hwpt_alloc flags
  iommufd/selftest: Add IOPF feature for mock devices
  iommufd/selftest: Cover iopf-capable nested hwpt

 include/linux/iommu.h                         | 175 +++++++++-
 drivers/iommu/{iommu-sva.h => io-pgfault.h}   |  25 +-
 drivers/iommu/iommu-priv.h                    |   3 +
 drivers/iommu/iommufd/iommufd_private.h       |  32 ++
 include/uapi/linux/iommu.h                    | 161 ---------
 include/uapi/linux/iommufd.h                  |  73 +++-
 tools/testing/selftests/iommu/iommufd_utils.h |  20 +-
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |   2 +-
 drivers/iommu/intel/iommu.c                   |   2 +-
 drivers/iommu/intel/svm.c                     |   2 +-
 drivers/iommu/io-pgfault.c                    |   7 +-
 drivers/iommu/iommu-sva.c                     |   4 +-
 drivers/iommu/iommu.c                         |  50 ++-
 drivers/iommu/iommufd/device.c                |  64 +++-
 drivers/iommu/iommufd/hw_pagetable.c          | 318 +++++++++++++++++-
 drivers/iommu/iommufd/main.c                  |   3 +
 drivers/iommu/iommufd/selftest.c              |  71 ++++
 tools/testing/selftests/iommu/iommufd.c       |  17 +-
 MAINTAINERS                                   |   1 -
 drivers/iommu/Kconfig                         |   4 +
 drivers/iommu/Makefile                        |   3 +-
 drivers/iommu/intel/Kconfig                   |   1 +
 23 files changed, 837 insertions(+), 203 deletions(-)
 rename drivers/iommu/{iommu-sva.h => io-pgfault.h} (71%)
 delete mode 100644 include/uapi/linux/iommu.h

-- 
2.34.1
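
The notification half of the flow described in the cover letter above (user space waits on an eventfd, then goes to read fault messages) might look roughly like this on the user side. This is a sketch under assumptions: wait_for_event() is a hypothetical helper name, only the generic eventfd(2)/poll(2)/read(2) pattern is shown, and reading from the fault fd is omitted because the fault message format is still under discussion in this RFC.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait until the eventfd is signaled, then consume
 * its counter. Returns the accumulated signal count, 0 on timeout, or
 * -1 on error.
 */
static int64_t wait_for_event(int efd, int timeout_ms)
{
	struct pollfd pfd = { .fd = efd, .events = POLLIN };
	uint64_t count;
	int ret;

	ret = poll(&pfd, 1, timeout_ms);
	if (ret < 0)
		return -1;
	if (ret == 0)
		return 0;	/* nothing signaled within the timeout */

	/* An eventfd read always transfers exactly 8 bytes. */
	if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
		return -1;

	return (int64_t)count;
}
```

A consumer would create the descriptor with eventfd(0, EFD_NONBLOCK), hand it to the kernel at HWPT allocation time, and call the helper in its event loop. A positive return value means at least one notification arrived, at which point the fault fd returned by the allocation would be read.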


[PATCH v2 0/4] KVM: selftests: add powerpc support

by Nicholas Piggin

This series adds initial KVM selftests support for powerpc (64-bit, BookS, radix MMU).

Since v1:
- Update MAINTAINERS KVM PPC entry to include kvm selftests.
- Fixes and cleanups from Sean's review including new patch 1.
- Add 4K guest page support requiring new patch 2.

Thanks,
Nick

Nicholas Piggin (4):
  KVM: selftests: Move pgd_created check into virt_pgd_alloc
  KVM: selftests: Add aligned guest physical page allocator
  KVM: PPC: selftests: add support for powerpc
  KVM: PPC: selftests: add selftests sanity tests

 MAINTAINERS                                   |   2 +
 tools/testing/selftests/kvm/Makefile          |  15 +
 .../selftests/kvm/include/kvm_util_base.h     |  27 ++
 .../selftests/kvm/include/powerpc/hcall.h     |  21 +
 .../selftests/kvm/include/powerpc/ppc_asm.h   |  32 ++
 .../selftests/kvm/include/powerpc/processor.h |  33 ++
 .../selftests/kvm/lib/aarch64/processor.c     |   4 -
 tools/testing/selftests/kvm/lib/guest_modes.c |   3 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |  56 ++-
 .../selftests/kvm/lib/powerpc/handlers.S      |  93 ++++
 .../testing/selftests/kvm/lib/powerpc/hcall.c |  45 ++
 .../selftests/kvm/lib/powerpc/processor.c     | 429 ++++++++++++++++++
 .../testing/selftests/kvm/lib/powerpc/ucall.c |  30 ++
 .../selftests/kvm/lib/riscv/processor.c       |   4 -
 .../selftests/kvm/lib/s390x/processor.c       |   4 -
 .../selftests/kvm/lib/x86_64/processor.c      |   7 +-
 tools/testing/selftests/kvm/powerpc/helpers.h |  46 ++
 .../testing/selftests/kvm/powerpc/null_test.c | 166 +++++++
 .../selftests/kvm/powerpc/rtas_hcall.c        | 146 ++++++
 19 files changed, 1129 insertions(+), 34 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/include/powerpc/hcall.h
 create mode 100644 tools/testing/selftests/kvm/include/powerpc/ppc_asm.h
 create mode 100644 tools/testing/selftests/kvm/include/powerpc/processor.h
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/handlers.S
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/hcall.c
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/processor.c
 create mode 100644 tools/testing/selftests/kvm/lib/powerpc/ucall.c
 create mode 100644 tools/testing/selftests/kvm/powerpc/helpers.h
 create mode 100644 tools/testing/selftests/kvm/powerpc/null_test.c
 create mode 100644 tools/testing/selftests/kvm/powerpc/rtas_hcall.c

-- 
2.40.0

