September 2023 - Linux-kselftest-mirror

by Tiezhu Yang

v2: Rebase on 6.5-rc1 and update the commit message Tiezhu Yang (2): selftests/vDSO: Add support for LoongArch selftests/vDSO: Get version and name for all archs tools/testing/selftests/vDSO/vdso_config.h | 6 ++++- tools/testing/selftests/vDSO/vdso_test_getcpu.c | 16 +++++-------- .../selftests/vDSO/vdso_test_gettimeofday.c | 26 ++++++---------------- 3 files changed, 18 insertions(+), 30 deletions(-) -- 2.1.0

7 months, 3 weeks

1
4
0 0

[PATCH v1 00/20] Permission Overlay Extension

by Joey Gouly

Hi all, This series implements the Permission Overlay Extension introduced in 2022 VMSA enhancements [1]. It is based on v6.6-rc3. The Permission Overlay Extension allows to constrain permissions on memory regions. This can be used from userspace (EL0) without a system call or TLB invalidation. POE is used to implement the Memory Protection Keys [2] Linux syscall. The first few patches add the basic framework, then the PKEYS interface is implemented, and then the selftests are made to work on arm64. There was discussion about what the 'default' protection key value should be, I used disallow-all (apart from pkey 0), which matches what x86 does. Patch 15 contains a call to cpus_have_const_cap(), which I couldn't avoid until Mark's patch to re-order when the alternatives were applied [3] is committed. The KVM part isn't tested yet. I have tested the modified protection_keys test on x86_64 [4], but not PPC. Hopefully I have CC'd everyone correctly. Thanks, Joey Joey Gouly (20): arm64/sysreg: add system register POR_EL{0,1} arm64/sysreg: update CPACR_EL1 register arm64: cpufeature: add Permission Overlay Extension cpucap arm64: disable trapping of POR_EL0 to EL2 arm64: context switch POR_EL0 register KVM: arm64: Save/restore POE registers arm64: enable the Permission Overlay Extension for EL0 arm64: add POIndex defines arm64: define VM_PKEY_BIT* for arm64 arm64: mask out POIndex when modifying a PTE arm64: enable ARCH_HAS_PKEYS on arm64 arm64: handle PKEY/POE faults arm64: stop using generic mm_hooks.h arm64: implement PKEYS support arm64: add POE signal support arm64: enable PKEY support for CPUs with S1POE arm64: enable POE and PIE to coexist kselftest/arm64: move get_header() selftests: mm: move fpregs printing selftests: mm: make protection_keys test work on arm64 Documentation/arch/arm64/elf_hwcaps.rst | 3 + arch/arm64/Kconfig | 1 + arch/arm64/include/asm/el2_setup.h | 10 +- arch/arm64/include/asm/hwcap.h | 1 + arch/arm64/include/asm/kvm_host.h | 4 + arch/arm64/include/asm/mman.h | 6 +- arch/arm64/include/asm/mmu.h | 2 + arch/arm64/include/asm/mmu_context.h | 51 ++++++- arch/arm64/include/asm/pgtable-hwdef.h | 10 ++ arch/arm64/include/asm/pgtable-prot.h | 8 +- arch/arm64/include/asm/pgtable.h | 28 +++- arch/arm64/include/asm/pkeys.h | 110 ++++++++++++++ arch/arm64/include/asm/por.h | 33 +++++ arch/arm64/include/asm/processor.h | 1 + arch/arm64/include/asm/sysreg.h | 16 ++ arch/arm64/include/asm/traps.h | 1 + arch/arm64/include/uapi/asm/hwcap.h | 1 + arch/arm64/include/uapi/asm/sigcontext.h | 7 + arch/arm64/kernel/cpufeature.c | 15 ++ arch/arm64/kernel/cpuinfo.c | 1 + arch/arm64/kernel/process.c | 16 ++ arch/arm64/kernel/signal.c | 51 +++++++ arch/arm64/kernel/traps.c | 12 +- arch/arm64/kvm/sys_regs.c | 2 + arch/arm64/mm/fault.c | 44 +++++- arch/arm64/mm/mmap.c | 7 + arch/arm64/mm/mmu.c | 38 +++++ arch/arm64/tools/cpucaps | 1 + arch/arm64/tools/sysreg | 11 +- fs/proc/task_mmu.c | 2 + include/linux/mm.h | 11 +- .../arm64/signal/testcases/testcases.c | 23 --- .../arm64/signal/testcases/testcases.h | 26 +++- tools/testing/selftests/mm/Makefile | 2 +- tools/testing/selftests/mm/pkey-arm64.h | 138 ++++++++++++++++++ tools/testing/selftests/mm/pkey-helpers.h | 8 + tools/testing/selftests/mm/pkey-powerpc.h | 3 + tools/testing/selftests/mm/pkey-x86.h | 4 + tools/testing/selftests/mm/protection_keys.c | 29 ++-- 39 files changed, 685 insertions(+), 52 deletions(-) create mode 100644 arch/arm64/include/asm/pkeys.h create mode 100644 arch/arm64/include/asm/por.h create mode 100644 tools/testing/selftests/mm/pkey-arm64.h -- 2.25.1

7 months, 3 weeks

6
40
0 0

[PATCH] selftests/x86/lam: Zero out buffer for readlink()

by Binbin Wu

Zero out the buffer for readlink() since readlink() does not append a terminating null byte to the buffer. Fixes: 833c12ce0f430 ("selftests/x86/lam: Add inherit test cases for linear-address masking") Signed-off-by: Binbin Wu <binbin.wu(a)linux.intel.com> --- tools/testing/selftests/x86/lam.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c index eb0e46905bf9..9f06942a8e25 100644 --- a/tools/testing/selftests/x86/lam.c +++ b/tools/testing/selftests/x86/lam.c @@ -680,7 +680,7 @@ static int handle_execve(struct testcases *test) perror("Fork failed."); ret = 1; } else if (pid == 0) { - char path[PATH_MAX]; + char path[PATH_MAX] = {0}; /* Set LAM mode in parent process */ if (set_lam(lam) != 0) base-commit: ce9ecca0238b140b88f43859b211c9fdfd8e5b70 -- 2.25.1

7 months, 3 weeks

2
4
0 0

[RFC 0/8] iommufd support pasid attach/replace

by Yi Liu

PASID (Process Address Space ID) is a PCIe extension to tag the DMA transactions out of a physical device, and most modern IOMMU hardware have supported PASID granular address translation. So a PASID-capable devices can be attached to multiple hwpts (a.k.a. domains), each attachment is tagged with a PASID. This series first adds a missing iommu API to replace domain for a pasid, then adds iommufd APIs for device drivers to attach/replace/detach pasid to/from hwpt per userspace's request, and adds selftest to validate the iommufd APIs. pasid attach/replace is mandatory on Intel VT-d given the PASID table locates in the physical address space hence must be managed by the kernel, both for supporting vSVA and coming SIOV. But it's optional on ARM/AMD which allow configuring the PASID/CD table either in host physical address space or nested on top of an GPA address space. This series only add VT-d support as the minimal requirement. Complete code can be found in below link: https://github.com/yiliu1765/iommufd/tree/iommufd_pasid Regards, Yi Liu Kevin Tian (1): iommufd: Support attach/replace hwpt per pasid Lu Baolu (2): iommu: Introduce a replace API for device pasid iommu/vt-d: Add set_dev_pasid callback for nested domain Yi Liu (5): iommufd: replace attach_fn with a structure iommufd/selftest: Add set_dev_pasid and remove_dev_pasid in mock iommu iommufd/selftest: Add a helper to get test device iommufd/selftest: Add test ops to test pasid attach/detach iommufd/selftest: Add coverage for iommufd pasid attach/detach drivers/iommu/intel/nested.c | 47 +++++ drivers/iommu/iommu-priv.h | 2 + drivers/iommu/iommu.c | 73 ++++++-- drivers/iommu/iommufd/Makefile | 1 + drivers/iommu/iommufd/device.c | 42 +++-- drivers/iommu/iommufd/iommufd_private.h | 16 ++ drivers/iommu/iommufd/iommufd_test.h | 24 +++ drivers/iommu/iommufd/pasid.c | 152 ++++++++++++++++ drivers/iommu/iommufd/selftest.c | 158 ++++++++++++++-- include/linux/iommufd.h | 6 + tools/testing/selftests/iommu/iommufd.c | 172 ++++++++++++++++++ .../selftests/iommu/iommufd_fail_nth.c | 28 ++- tools/testing/selftests/iommu/iommufd_utils.h | 78 ++++++++ 13 files changed, 756 insertions(+), 43 deletions(-) create mode 100644 drivers/iommu/iommufd/pasid.c -- 2.34.1

7 months, 3 weeks

4
20
0 0

[RFC PATCH v2 00/20] context_tracking,x86: Defer some IPIs until a user->kernel transition

by Valentin Schneider

Context ======= We've observed within Red Hat that isolated, NOHZ_FULL CPUs running a pure-userspace application get regularly interrupted by IPIs sent from housekeeping CPUs. Those IPIs are caused by activity on the housekeeping CPUs leading to various on_each_cpu() calls, e.g.: 64359.052209596 NetworkManager 0 1405 smp_call_function_many_cond (cpu=0, func=do_kernel_range_flush) smp_call_function_many_cond+0x1 smp_call_function+0x39 on_each_cpu+0x2a flush_tlb_kernel_range+0x7b __purge_vmap_area_lazy+0x70 _vm_unmap_aliases.part.42+0xdf change_page_attr_set_clr+0x16a set_memory_ro+0x26 bpf_int_jit_compile+0x2f9 bpf_prog_select_runtime+0xc6 bpf_prepare_filter+0x523 sk_attach_filter+0x13 sock_setsockopt+0x92c __sys_setsockopt+0x16a __x64_sys_setsockopt+0x20 do_syscall_64+0x87 entry_SYSCALL_64_after_hwframe+0x65 The heart of this series is the thought that while we cannot remove NOHZ_FULL CPUs from the list of CPUs targeted by these IPIs, they may not have to execute the callbacks immediately. Anything that only affects kernelspace can wait until the next user->kernel transition, providing it can be executed "early enough" in the entry code. The original implementation is from Peter [1]. Nicolas then added kernel TLB invalidation deferral to that [2], and I picked it up from there. Deferral approach ================= Storing each and every callback, like a secondary call_single_queue turned out to be a no-go: the whole point of deferral is to keep NOHZ_FULL CPUs in userspace for as long as possible - no signal of any form would be sent when deferring an IPI. This means that any form of queuing for deferred callbacks would end up as a convoluted memory leak. Deferred IPIs must thus be coalesced, which this series achieves by assigning IPIs a "type" and having a mapping of IPI type to callback, leveraged upon kernel entry. What about IPIs whose callback take a parameter, you may ask? Peter suggested during OSPM23 [3] that since on_each_cpu() targets housekeeping CPUs *and* isolated CPUs, isolated CPUs can access either global or housekeeping-CPU-local state to "reconstruct" the data that would have been sent via the IPI. This series does not affect any IPI callback that requires an argument, but the approach would remain the same (one coalescable callback executed on kernel entry). Kernel entry vs execution of the deferred operation =================================================== There is a non-zero length of code that is executed upon kernel entry before the deferred operation can be itself executed (i.e. before we start getting into context_tracking.c proper). This means one must take extra care to what can happen in the early entry code, and that <bad things> cannot happen. For instance, we really don't want to hit instructions that have been modified by a remote text_poke() while we're on our way to execute a deferred sync_core(). Patches ======= o Patches 1-9 have been submitted separately and are included for the sake of testing o Patches 10-14 focus on having objtool detect problematic static key usage in early entry o Patch 15 adds the infrastructure for IPI deferral. o Patches 16-17 add some RCU testing infrastructure o Patch 18 adds text_poke() IPI deferral. o Patches 19-20 add vunmap() flush_tlb_kernel_range() IPI deferral These ones I'm a lot less confident about, mostly due to lacking instrumentation/verification. The actual deferred callback is also incomplete as it's not properly noinstr: vmlinux.o: warning: objtool: __flush_tlb_all_noinstr+0x19: call to native_write_cr4() leaves .noinstr.text section and it doesn't support PARAVIRT - it's going to need a pv_ops.mmu entry, but I have *no idea* what a sane implementation would be for Xen so I haven't touched that yet. Patches are also available at: https://gitlab.com/vschneid/linux.git -b redhat/isolirq/defer/v2 Testing ======= Note: this is a different machine than used for v1, because that machine decided to act difficult. Xeon E5-2699 system with SMToff, NOHZ_FULL, isolated CPUs. RHEL9 userspace. Workload is using rteval (kernel compilation + hackbench) on housekeeping CPUs and a dummy stay-in-userspace loop on the isolated CPUs. The main invocation is: $ trace-cmd record -e "csd_queue_cpu" -f "cpu & CPUS{$ISOL_CPUS}" \ -e "ipi_send_cpumask" -f "cpumask & CPUS{$ISOL_CPUS}" \ -e "ipi_send_cpu" -f "cpu & CPUS{$ISOL_CPUS}" \ rteval --onlyload --loads-cpulist=$HK_CPUS \ --hackbench-runlowmem=True --duration=$DURATION This only records IPIs sent to isolated CPUs, so any event there is interference (with a bit of fuzz at the start/end of the workload when spawning the processes). All tests were done with a duration of 30 minutes. v6.5-rc1 (+ cpumask filtering patches): # This is the actual IPI count $ trace-cmd report | grep callback | awk '{ print $(NF) }' | sort | uniq -c | sort -nr 338 callback=generic_smp_call_function_single_interrupt+0x0 # These are the different CSD's that caused IPIs $ trace-cmd report | grep csd_queue | awk '{ print $(NF-1) }' | sort | uniq -c | sort -nr 9207 func=do_flush_tlb_all 1116 func=do_sync_core 62 func=do_kernel_range_flush 3 func=nohz_full_kick_func v6.5-rc1 + patches: # This is the actual IPI count $ trace-cmd report | grep callback | awk '{ print $(NF) }' | sort | uniq -c | sort -nr 2 callback=generic_smp_call_function_single_interrupt+0x0 # These are the different CSD's that caused IPIs $ trace-cmd report | grep csd_queue | awk '{ print $(NF-1) }' | sort | uniq -c | sort -nr 2 func=nohz_full_kick_func The incriminating IPIs are all gone, but note that on the machine I used to test v1 there were still some do_flush_tlb_all() IPIs caused by pcpu_balance_workfn(), since only vmalloc is affected by the deferral mechanism. Acknowledgements ================ Special thanks to: o Clark Williams for listening to my ramblings about this and throwing ideas my way o Josh Poimboeuf for his guidance regarding objtool and hinting at the .data..ro_after_init section. Links ===== [1]: https://lore.kernel.org/all/20210929151723.162004989@infradead.org/ [2]: https://github.com/vianpl/linux.git -b ct-work-defer-wip [3]: https://youtu.be/0vjE6fjoVVE Revisions ========= RFCv1 -> RFCv2 ++++++++++++++ o Rebased onto v6.5-rc1 o Updated the trace filter patches (Steven) o Fixed __ro_after_init keys used in modules (Peter) o Dropped the extra context_tracking atomic, squashed the new bits in the existing .state field (Peter, Frederic) o Added an RCU_EXPERT config for the RCU dynticks counter size, and added an rcutorture case for a low-size counter (Paul) The new TREE11 case with a 2-bit dynticks counter seems to pass when ran against this series. o Fixed flush_tlb_kernel_range_deferrable() definition Peter Zijlstra (1): jump_label,module: Don't alloc static_key_mod for __ro_after_init keys Valentin Schneider (19): tracing/filters: Dynamically allocate filter_pred.regex tracing/filters: Enable filtering a cpumask field by another cpumask tracing/filters: Enable filtering a scalar field by a cpumask tracing/filters: Enable filtering the CPU common field by a cpumask tracing/filters: Optimise cpumask vs cpumask filtering when user mask is a single CPU tracing/filters: Optimise scalar vs cpumask filtering when the user mask is a single CPU tracing/filters: Optimise CPU vs cpumask filtering when the user mask is a single CPU tracing/filters: Further optimise scalar vs cpumask comparison tracing/filters: Document cpumask filtering objtool: Flesh out warning related to pv_ops[] calls objtool: Warn about non __ro_after_init static key usage in .noinstr context_tracking: Make context_tracking_key __ro_after_init x86/kvm: Make kvm_async_pf_enabled __ro_after_init context-tracking: Introduce work deferral infrastructure rcu: Make RCU dynticks counter size configurable rcutorture: Add a test config to torture test low RCU_DYNTICKS width context_tracking,x86: Defer kernel text patching IPIs context_tracking,x86: Add infrastructure to defer kernel TLBI x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs Documentation/trace/events.rst | 14 + arch/Kconfig | 9 + arch/x86/Kconfig | 1 + arch/x86/include/asm/context_tracking_work.h | 20 ++ arch/x86/include/asm/text-patching.h | 1 + arch/x86/include/asm/tlbflush.h | 2 + arch/x86/kernel/alternative.c | 24 +- arch/x86/kernel/kprobes/core.c | 4 +- arch/x86/kernel/kprobes/opt.c | 4 +- arch/x86/kernel/kvm.c | 2 +- arch/x86/kernel/module.c | 2 +- arch/x86/mm/tlb.c | 40 ++- include/asm-generic/sections.h | 5 + include/linux/context_tracking.h | 26 ++ include/linux/context_tracking_state.h | 65 +++- include/linux/context_tracking_work.h | 28 ++ include/linux/jump_label.h | 1 + include/linux/trace_events.h | 1 + init/main.c | 1 + kernel/context_tracking.c | 53 ++- kernel/jump_label.c | 49 +++ kernel/rcu/Kconfig | 33 ++ kernel/time/Kconfig | 5 + kernel/trace/trace_events_filter.c | 302 ++++++++++++++++-- mm/vmalloc.c | 19 +- tools/objtool/check.c | 22 +- tools/objtool/include/objtool/check.h | 1 + tools/objtool/include/objtool/special.h | 2 + tools/objtool/special.c | 3 + .../selftests/rcutorture/configs/rcu/TREE11 | 19 ++ .../rcutorture/configs/rcu/TREE11.boot | 1 + 31 files changed, 695 insertions(+), 64 deletions(-) create mode 100644 arch/x86/include/asm/context_tracking_work.h create mode 100644 include/linux/context_tracking_work.h create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE11 create mode 100644 tools/testing/selftests/rcutorture/configs/rcu/TREE11.boot -- 2.31.1

7 months, 3 weeks

12
75
0 0

[PATCH 0/4] kunit: Fix indentation of parameterized tests messages

by Michal Wajdeczko

The indentation of parameterized tests messages is currently broken in kunit. Try to fix that by introducing a test level attribute, that will be increased during nested parameterized tests execution, and use it to generate correct indent at the runtime when printing message or writing them to the log. Also improve kunit by providing test plan for the parameterized tests. Cc: David Gow <davidgow(a)google.com> Cc: Rae Moar <rmoar(a)google.com> Michal Wajdeczko (4): kunit: Drop redundant text from suite init failure message kunit: Fix indentation level of suite messages kunit: Fix indentation of parameterized tests messages kunit: Prepare test plan for parameterized subtests include/kunit/test.h | 25 ++++++++++++-- lib/kunit/test.c | 79 +++++++++++++++++++++++--------------------- 2 files changed, 65 insertions(+), 39 deletions(-) -- 2.25.1

7 months, 4 weeks

3
16
0 0

[PATCH RESEND 0/3] kunit: Init and run test suites in the right state

by Jinjie Ruan

The original order of cases in kunit_module_notify() is confusing and misleading. And the best practice is return the err code from MODULE_STATE_COMING func. And the test suits should be executed when notify MODULE_STATE_LIVE. Jinjie Ruan (3): kunit: Make the cases sequence more reasonable for kunit_module_notify() kunit: Return error from kunit_module_init() kunit: Init and run test suites in the right state lib/kunit/test.c | 37 +++++++++++++++++++++++-------------- 1 file changed, 23 insertions(+), 14 deletions(-) -- 2.34.1

7 months, 4 weeks

3
5
0 0

[PATCH 1/6] selftests: capabilities: remove duplicate unneeded defines

by Muhammad Usama Anjum

These duplicate defines should automatically be picked up from kernel headers. Use KHDR_INCLUDES to add kernel header files. Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com> --- tools/testing/selftests/capabilities/Makefile | 2 +- tools/testing/selftests/capabilities/test_execve.c | 8 -------- tools/testing/selftests/capabilities/validate_cap.c | 8 -------- 3 files changed, 1 insertion(+), 17 deletions(-) diff --git a/tools/testing/selftests/capabilities/Makefile b/tools/testing/selftests/capabilities/Makefile index 6e9d98d457d5b..411ac098308f1 100644 --- a/tools/testing/selftests/capabilities/Makefile +++ b/tools/testing/selftests/capabilities/Makefile @@ -2,7 +2,7 @@ TEST_GEN_FILES := validate_cap TEST_GEN_PROGS := test_execve -CFLAGS += -O2 -g -std=gnu99 -Wall +CFLAGS += -O2 -g -std=gnu99 -Wall $(KHDR_INCLUDES) LDLIBS += -lcap-ng -lrt -ldl include ../lib.mk diff --git a/tools/testing/selftests/capabilities/test_execve.c b/tools/testing/selftests/capabilities/test_execve.c index df0ef02b40367..e3a352b020a79 100644 --- a/tools/testing/selftests/capabilities/test_execve.c +++ b/tools/testing/selftests/capabilities/test_execve.c @@ -20,14 +20,6 @@ #include "../kselftest.h" -#ifndef PR_CAP_AMBIENT -#define PR_CAP_AMBIENT 47 -# define PR_CAP_AMBIENT_IS_SET 1 -# define PR_CAP_AMBIENT_RAISE 2 -# define PR_CAP_AMBIENT_LOWER 3 -# define PR_CAP_AMBIENT_CLEAR_ALL 4 -#endif - static int nerrs; static pid_t mpid; /* main() pid is used to avoid duplicate test counts */ diff --git a/tools/testing/selftests/capabilities/validate_cap.c b/tools/testing/selftests/capabilities/validate_cap.c index cdfc94268fe6e..60b4e7b716a75 100644 --- a/tools/testing/selftests/capabilities/validate_cap.c +++ b/tools/testing/selftests/capabilities/validate_cap.c @@ -9,14 +9,6 @@ #include "../kselftest.h" -#ifndef PR_CAP_AMBIENT -#define PR_CAP_AMBIENT 47 -# define PR_CAP_AMBIENT_IS_SET 1 -# define PR_CAP_AMBIENT_RAISE 2 -# define PR_CAP_AMBIENT_LOWER 3 -# define PR_CAP_AMBIENT_CLEAR_ALL 4 -#endif - #if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 19) # define HAVE_GETAUXVAL #endif -- 2.39.2

7 months, 4 weeks

4
16
0 0

[PATCH v2 0/3] userfaultfd remap option

by Suren Baghdasaryan

This patch series introduces UFFDIO_REMAP feature to userfaultfd, which has long been implemented and maintained by Andrea in his local tree [1], but was not upstreamed due to lack of use cases where this approach would be better than allocating a new page and copying the contents. UFFDIO_COPY performs ~20% better than UFFDIO_REMAP when the application needs pages to be allocated [2]. However, with UFFDIO_REMAP, if pages are available (in userspace) for recycling, as is usually the case in heap compaction algorithms, then we can avoid the page allocation and memcpy (done by UFFDIO_COPY). Also, since the pages are recycled in the userspace, we avoid the need to release (via madvise) the pages back to the kernel [3]. We see over 40% reduction (on a Google pixel 6 device) in the compacting thread’s completion time by using UFFDIO_REMAP vs. UFFDIO_COPY. This was measured using a benchmark that emulates a heap compaction implementation using userfaultfd (to allow concurrent accesses by application threads). More details of the usecase are explained in [3]. Furthermore, UFFDIO_REMAP enables remapping swapped-out pages without touching them within the same vma. Today, it can only be done by mremap, however it forces splitting the vma. Main changes since Andrea's last version [1]: - Trivial translations from page to folio, mmap_sem to mmap_lock - Replace pmd_trans_unstable() with pte_offset_map_nolock() and handle its possible failure - Move pte mapping into remap_pages_pte to allow for retries when source page or anon_vma is contended. Since pte_offset_map_nolock() start RCU read section, we can't block anymore after mapping a pte, so have to unmap the ptesm do the locking and retry. - Add and use anon_vma_trylock_write() to avoid blocking while in RCU read section. - Accommodate changes in mmu_notifier_range_init() API, switch to mmu_notifier_invalidate_range_start_nonblock() to avoid blocking while in RCU read section. - Open-code now removed __swp_swapcount() - Replace pmd_read_atomic() with pmdp_get_lockless() - Add new selftest for UFFDIO_REMAP Changes since v1 [4]: - add mmget_not_zero in userfaultfd_remap, per Jann Horn - removed extern from function definitions, per Matthew Wilcox - converted to folios in remap_pages_huge_pmd, per Matthew Wilcox - use PageAnonExclusive in remap_pages_huge_pmd, per David Hildenbrand - handle pgtable transfers between MMs, per Jann Horn - ignore concurrent A/D pte bit changes, per Jann Horn - split functions into smaller units, per David Hildenbrand - test for folio_test_large in remap_anon_pte, per Matthew Wilcox - use pte_swp_exclusive for swapcount check, per David Hildenbrand - eliminated use of mmu_notifier_invalidate_range_start_nonblock, per Jann Horn - simplified THP alignment checks, per Jann Horn - refactored the loop inside remap_pages, per Jann Horn - additional clarifying comments, per Jann Horn [1] https://gitlab.com/aarcange/aa/-/commit/2aec7aea56b10438a3881a20a411aa4b1fc… [2] https://lore.kernel.org/all/1425575884-2574-1-git-send-email-aarcange@redha… [3] https://lore.kernel.org/linux-mm/CA+EESO4uO84SSnBhArH4HvLNhaUQ5nZKNKXqxRCyj… [4] https://lore.kernel.org/all/20230914152620.2743033-1-surenb@google.com/ Andrea Arcangeli (2): userfaultfd: UFFDIO_REMAP: rmap preparation userfaultfd: UFFDIO_REMAP uABI Suren Baghdasaryan (1): selftests/mm: add UFFDIO_REMAP ioctl test fs/userfaultfd.c | 63 ++ include/linux/rmap.h | 5 + include/linux/userfaultfd_k.h | 12 + include/uapi/linux/userfaultfd.h | 22 + mm/huge_memory.c | 130 ++++ mm/khugepaged.c | 3 + mm/rmap.c | 13 + mm/userfaultfd.c | 590 +++++++++++++++++++ tools/testing/selftests/mm/uffd-common.c | 41 +- tools/testing/selftests/mm/uffd-common.h | 1 + tools/testing/selftests/mm/uffd-unit-tests.c | 62 ++ 11 files changed, 940 insertions(+), 2 deletions(-) -- 2.42.0.515.g380fc7ccd1-goog

8 months

5
54
0 0

[PATCH 0/2] tracing/user_events: Fix alignment issues for 32 on 64-bit and BE

by Beau Belgrave

All architectures should use a long aligned address passed to set_bit(). User processes can pass either a 32-bit or 64-bit sized value to be updated when tracing is enabled when on a 64-bit kernel. Both cases are ensured to be naturally aligned, however, that is not enough. The address must be long aligned without affecting checks on the value within the user process which require different adjustments for the bit for little and big endian CPUs. 32 bit on 64 bit, even when properly long aligned, still require a 32 bit offset to be done for BE. Due to this, it cannot be easily put into a generic method. The abi_test also used a long, which broke the test on 64-bit BE machines. The change simply uses an int for 32-bit value checks and a long when on 64-bit kernels for 64-bit specific checks. I've run these changes and self tests for user_events on ppc64 BE, x86_64 LE, and aarch64 LE. It'd be great to test this also on RISC-V, but I do not have one. Clément Léger originally put a patch together for the alignment issue, but we uncovered more issues as we went further into the problem. Clément felt my version was better [1] so I am sending this series out that addresses the selftest, BE bit offset, and the alignment issue. 1. https://lore.kernel.org/linux-trace-kernel/713f4916-00ff-4a24-82d1-72884500… Beau Belgrave (2): tracing/user_events: Align set_bit() address for all archs selftests/user_events: Fix abi_test for BE archs kernel/trace/trace_events_user.c | 58 ++++++++++++++++--- .../testing/selftests/user_events/abi_test.c | 16 ++--- 2 files changed, 60 insertions(+), 14 deletions(-) base-commit: fc1653abba0d554aad80224e51bcad42b09895ed -- 2.34.1

8 months

3
10
0 0

2024

2023

2022

2021

2020

2019

2018

2017

Linux-kselftest-mirror September 2023