August 2023 - Linux-kselftest-mirror

[PATCH v3 0/7] Split a folio to any lower order folios

by Zi Yan

From: Zi Yan <ziy(a)nvidia.com>

Hi all,

File folios support any order and people would like to support flexible orders for anonymous folios[1] too. Currently, split_huge_page() only splits a huge page to order-0 pages, but splitting to orders higher than 0 is also useful. This patchset adds support for splitting a huge page to any lower order pages and uses it during file folio truncate operations.

The patchset is on top of mm-everything-2023-03-27-21-20.

Changelog
===

Since v2
---
1. Fixed an issue in __split_page_owner() introduced during my rebase

Since v1
---
1. Changed split_page_memcg() and split_page_owner() parameter to use order
2. Used folio_test_pmd_mappable() in place of the equivalent code

Details
===

* Patch 1 changes split_page_memcg() to use order instead of nr_pages
* Patch 2 changes split_page_owner() to use order instead of nr_pages
* Patch 3 and 4 add a new_order parameter to split_page_memcg() and split_page_owner() and prepare for upcoming changes.
* Patch 5 adds split_huge_page_to_list_to_order() to split a huge page to any lower order. The original split_huge_page_to_list() calls split_huge_page_to_list_to_order() with new_order = 0.
* Patch 6 uses split_huge_page_to_list_to_order() in large pagecache folio truncation instead of splitting the large folio all the way down to order-0.
* Patch 7 adds a test API to debugfs and test cases in split_huge_page_test selftests.

Comments and/or suggestions are welcome.

[1] https://lore.kernel.org/linux-mm/Y%2FblF0GIunm+pRIC@casper.infradead.org/

Zi Yan (7):
  mm/memcg: use order instead of nr in split_page_memcg()
  mm/page_owner: use order instead of nr in split_page_owner()
  mm: memcg: make memcg huge page split support any order split.
  mm: page_owner: add support for splitting to any order in split page_owner.
  mm: thp: split huge page to any lower order pages.
  mm: truncate: split huge page cache page to a non-zero order if possible.
  mm: huge_memory: enable debugfs to split huge pages to any order.

 include/linux/huge_mm.h | 10 +-
 include/linux/memcontrol.h | 4 +-
 include/linux/page_owner.h | 10 +-
 mm/huge_memory.c | 137 ++++++++---
 mm/memcontrol.c | 10 +-
 mm/page_alloc.c | 8 +-
 mm/page_owner.c | 8 +-
 mm/truncate.c | 21 +-
 .../selftests/mm/split_huge_page_test.c | 225 +++++++++++++++++-
 9 files changed, 365 insertions(+), 68 deletions(-)

--
2.39.2
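
The order arithmetic behind the cover letter above can be sketched in userspace as follows. This is only an illustration: pages_in_order() and folios_after_split() are hypothetical helper names, not functions from the patchset, and nothing here touches kernel data structures.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; not kernel APIs. */

/* Number of base (order-0) pages covered by a folio of the given order. */
unsigned long pages_in_order(unsigned int order)
{
	return 1UL << order;
}

/*
 * Number of folios produced when one folio of old_order is split into
 * folios of new_order. The target order must not exceed the source order.
 */
unsigned long folios_after_split(unsigned int old_order,
				 unsigned int new_order)
{
	assert(new_order <= old_order);
	return 1UL << (old_order - new_order);
}
```

For example, folios_after_split(9, 2) is 128 and pages_in_order(2) is 4, so one order-9 folio splits into 128 order-2 folios of 4 base pages each, while a split to order 0 gives 512 single pages.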


[PATCH v10 0/4] RISC-V: mm: Make SV48 the default address space

by Charlie Jenkins

Make sv48 the default address space for mmap as some applications currently depend on this assumption. Users can now select a desired address space using a non-zero hint address to mmap. Previously, requesting the default address space from mmap by passing zero as the hint address would result in using the largest address space possible. Some applications depend on empty bits in the virtual address space, like Go and Java, so this patch provides more flexibility for application developers.

-Charlie

---
v10:
- Move pgtable.h definitions into a no __ASSEMBLY__ region to resolve compilation conflicts (pointed out by Conor)
- Will now compile with allmodconfig

v9:
- Raise the mmap_end default to STACK_TOP_MAX to allow the address space to grow beyond the default of sv48 on sv57 machines as suggested by Alexandre
- Some of the mmap macros had unnecessary conditionals that I have removed

v8:
- Fix RV32 and the RV32 compat mode of RV64 (suggested by Conor)
- Extract out addr and base from the mmap macros (suggested by Alexandre)

v7:
- Changing RLIMIT_STACK inside of an executing program does not trigger arch_pick_mmap_layout(), so rewrite tests to change RLIMIT_STACK from a script before executing tests. RLIMIT_STACK of infinity forces bottomup mmap allocation.
- Make arch_get_mmap_base macro more readable by extracting out the rnd calculation.
- Use MMAP_MIN_VA_BITS in TASK_UNMAPPED_BASE to support case when mmap attempts to allocate address smaller than DEFAULT_MAP_WINDOW.
- Fix incorrect wording in documentation.

v6:
- Rebase onto the correct base

v5:
- Minor wording change in documentation
- Change some parentheses in arch_get_mmap_ macros
- Added case for addr==0 in arch_get_mmap_ because without this, programs would crash if RLIMIT_STACK was modified before executing the program. This was tested using the libhugetlbfs tests.

v4:
- Split testcases/document patch into test cases, in-code documentation, and formal documentation patches
- Modified the mmap_base macro to be more legible and better represent memory layout
- Fixed documentation to better reflect the implementation
- Renamed DEFAULT_VA_BITS to MMAP_VA_BITS
- Added additional test case for rlimit changes

---

Charlie Jenkins (4):
  RISC-V: mm: Restrict address space for sv39,sv48,sv57
  RISC-V: mm: Add tests for RISC-V mm
  RISC-V: mm: Update pgtable comment documentation
  RISC-V: mm: Document mmap changes

 Documentation/riscv/vm-layout.rst | 22 +++++++
 arch/riscv/include/asm/elf.h | 2 +-
 arch/riscv/include/asm/pgtable.h | 33 ++++++++--
 arch/riscv/include/asm/processor.h | 52 +++++++++++++--
 tools/testing/selftests/riscv/Makefile | 2 +-
 tools/testing/selftests/riscv/mm/.gitignore | 2 +
 tools/testing/selftests/riscv/mm/Makefile | 15 +++++
 .../riscv/mm/testcases/mmap_bottomup.c | 35 ++++++++++
 .../riscv/mm/testcases/mmap_default.c | 35 ++++++++++
 .../selftests/riscv/mm/testcases/mmap_test.h | 64 +++++++++++++++++++
 .../selftests/riscv/mm/testcases/run_mmap.sh | 12 ++++
 11 files changed, 261 insertions(+), 13 deletions(-)
 create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
 create mode 100644 tools/testing/selftests/riscv/mm/Makefile
 create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c
 create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_default.c
 create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_test.h
 create mode 100755 tools/testing/selftests/riscv/mm/testcases/run_mmap.sh

--
2.34.1
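
A minimal userspace sketch of the hint-address mechanism described in the cover letter above. map_with_hint() and va_bits_used() are hypothetical helper names, not taken from the series' selftests, and the exact mapping from a hint value to sv39, sv48 or sv57 is defined by the series' documentation patch, not by this example.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: number of significant bits in an address, i.e.
 * the position of its highest set bit plus one (0 for a NULL pointer).
 */
unsigned int va_bits_used(const void *addr)
{
	uintptr_t v = (uintptr_t)addr;
	unsigned int bits = 0;

	while (v) {
		bits++;
		v >>= 1;
	}
	return bits;
}

/*
 * Hypothetical helper: map one anonymous private region and pass hint
 * as the address hint. Without MAP_FIXED the hint is advisory, so the
 * kernel may place the mapping elsewhere. Returns MAP_FAILED on error.
 */
void *map_with_hint(void *hint, size_t len)
{
	return mmap(hint, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```

A check in the spirit of the series would call map_with_hint() once with a zero hint and once with a non-zero hint, then compare va_bits_used() of each returned address against the address-space width it expects.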


[PATCH v3 00/11] A minor flurry of selftest/mm fixes

by John Hubbard

Hi,

Changes since v2 [1]:

* Added a new patch (sent separately earlier) at the end, to error out if "make headers" has not yet been run.
* Reworked and simplified the uffd movement patch. Now it only moves some uffd*() routines, not all, and doesn't have to touch the Makefile at all. This lighter touch also allowed me to drop the "move psize(), pshift() into vm_utils.c" entirely. I expect Peter Xu will be a little happier with this new approach.
* Fixed the commit description for the MADV_COLLAPSE patch.
* Added more Reviewed-by tags from David Hildenbrand and Peter Xu.

[1] https://lore.kernel.org/all/20230603021558.95299-1-jhubbard@nvidia.com/

John Hubbard (11):
  selftests/mm: fix uffd-stress unused function warning
  selftests/mm: fix unused variable warnings in hugetlb-madvise.c, migration.c
  selftests/mm: fix "warning: expression which evaluates to zero..." in mlock2-tests.c
  selftests/mm: fix invocation of tests that are run via shell scripts
  selftests/mm: .gitignore: add mkdirty, va_high_addr_switch
  selftests/mm: fix two -Wformat-security warnings in uffd builds
  selftests/mm: fix a "possibly uninitialized" warning in pkey-x86.h
  selftests/mm: fix build failures due to missing MADV_COLLAPSE
  selftests/mm: move certain uffd*() routines from vm_util.c to uffd-common.c
  Documentation: kselftest: "make headers" is a prerequisite
  selftests: error out if kernel header files are not yet built

 Documentation/dev-tools/kselftest.rst | 1 +
 tools/testing/selftests/lib.mk | 36 +++++++++++-
 tools/testing/selftests/mm/.gitignore | 2 +
 tools/testing/selftests/mm/cow.c | 7 ---
 tools/testing/selftests/mm/hugetlb-madvise.c | 8 ++-
 tools/testing/selftests/mm/khugepaged.c | 10 ----
 tools/testing/selftests/mm/migration.c | 5 +-
 tools/testing/selftests/mm/mlock2-tests.c | 1 -
 tools/testing/selftests/mm/pkey-x86.h | 2 +-
 tools/testing/selftests/mm/run_vmtests.sh | 6 +-
 tools/testing/selftests/mm/uffd-common.c | 59 ++++++++++++++++++++
 tools/testing/selftests/mm/uffd-common.h | 5 ++
 tools/testing/selftests/mm/uffd-stress.c | 10 ----
 tools/testing/selftests/mm/uffd-unit-tests.c | 16 ++----
 tools/testing/selftests/mm/vm_util.c | 59 --------------------
 tools/testing/selftests/mm/vm_util.h | 14 +++--
 16 files changed, 130 insertions(+), 111 deletions(-)

base-commit: f8dba31b0a826e691949cd4fdfa5c30defaac8c5
--
2.40.1


[PATCH v3 0/3] Add a test to catch unprobed Devicetree devices

by Nícolas F. R. A. Prado

Regressions that cause a device to no longer be probed by a driver can have a big impact on the platform's functionality, and despite being relatively common there isn't currently any generic test to detect them. As an example, bootrr [1] does test for device probe, but it requires defining the expected probed devices for each platform.

Given that the Devicetree already provides a static description of devices on the system, it is a good basis for building such a test on top.

This series introduces a test to catch regressions that prevent devices from probing.

Patches 1 and 2 extend the existing dt-extract-compatibles to be able to output only the compatibles that can be expected to match a Devicetree node to a driver. Patch 3 adds a kselftest that walks over the Devicetree nodes on the current platform and compares the compatibles to the ones on the list, and on an ignore list, to point out devices that failed to be probed.

A compatible list is needed because not all compatibles that can show up in a Devicetree node can be used to match to a driver, for example the code for that compatible might use "OF_DECLARE" type macros and avoid the driver framework, or the node might be controlled by a driver that was bound to a different node.

An ignore list is needed for the few cases where it's common for a driver to match a device but not probe, like for the "simple-mfd" compatible, where the driver only probes if that compatible is the node's first compatible.

The reason for parsing the kernel source instead of relying on information exposed by the kernel at runtime (say, looking at modaliases or introducing some other mechanism), is to be able to catch issues where a config was renamed or a driver moved across configs, and the .config used by the kernel not updated accordingly. We need to parse the source to find all compatibles present in the kernel independent of the current config being run.

[1] https://github.com/kernelci/bootrr

Changes in v3:
- Added DT selftest path to MAINTAINERS
- Enabled device probe test for nodes with 'status = "ok"'
- Added pass/fail/skip totals to end of test output

Changes in v2:
- Extended dt-extract-compatibles script to be able to extract driver matching compatibles, instead of adding a new one in Coccinelle
- Made kselftest output in the KTAP format

Nícolas F. R. A. Prado (3):
  dt: dt-extract-compatibles: Handle cfile arguments in generator function
  dt: dt-extract-compatibles: Add flag for driver matching compatibles
  kselftest: Add new test for detecting unprobed Devicetree devices

 MAINTAINERS | 1 +
 scripts/dtc/dt-extract-compatibles | 74 +++++++++++++----
 tools/testing/selftests/Makefile | 1 +
 tools/testing/selftests/dt/.gitignore | 1 +
 tools/testing/selftests/dt/Makefile | 21 +++++
 .../selftests/dt/compatible_ignore_list | 1 +
 tools/testing/selftests/dt/ktap_helpers.sh | 70 ++++++++++++++++
 .../selftests/dt/test_unprobed_devices.sh | 83 +++++++++++++++++++
 8 files changed, 236 insertions(+), 16 deletions(-)
 create mode 100644 tools/testing/selftests/dt/.gitignore
 create mode 100644 tools/testing/selftests/dt/Makefile
 create mode 100644 tools/testing/selftests/dt/compatible_ignore_list
 create mode 100644 tools/testing/selftests/dt/ktap_helpers.sh
 create mode 100755 tools/testing/selftests/dt/test_unprobed_devices.sh

--
2.42.0
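
The decision the cover letter above describes (a list of matchable compatibles, an ignore list, and whether a driver is bound) can be sketched as below. The real test in the series is a shell script; this C version is only an illustration, all names in it are hypothetical, and it simplifies each node to a single compatible string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Outcome for one Devicetree node; names are hypothetical. */
enum probe_result { PROBE_SKIP, PROBE_PASS, PROBE_FAIL };

/* Return true if name appears in the list of n strings. */
bool in_list(const char *name, const char *const *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(name, list[i]) == 0)
			return true;
	return false;
}

/*
 * Classify a node from its compatible string and whether a driver is
 * bound to it:
 *  - a compatible that no driver can match, or one on the ignore
 *    list, is skipped;
 *  - otherwise the node passes if a driver is bound and fails if not.
 */
enum probe_result classify_node(const char *compatible, bool driver_bound,
				const char *const *matchable, size_t n_match,
				const char *const *ignored, size_t n_ignore)
{
	assert(compatible != NULL);

	if (!in_list(compatible, matchable, n_match))
		return PROBE_SKIP;
	if (in_list(compatible, ignored, n_ignore))
		return PROBE_SKIP;
	return driver_bound ? PROBE_PASS : PROBE_FAIL;
}
```

With "simple-mfd" on the ignore list, an unbound "simple-mfd" node is skipped and not reported as a failure, which is the case the ignore list exists for.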


[PATCH v1 0/9] x86/resctrl: Use soft RMIDs for reliable MBM on AMD

by Peter Newman

Hi Reinette, Fenghua,

This series introduces a new mount option enabling an alternate mode for MBM to work around an issue on present AMD implementations and any other resctrl implementation where there are more RMIDs (or equivalent) than hardware counters.

The L3 External Bandwidth Monitoring feature of the AMD PQoS extension[1] only guarantees that RMIDs currently assigned to a processor will be tracked by hardware. The counters of any other RMIDs which are no longer being tracked will be reset to zero. The MBM event counters return "Unavailable" to indicate when this has happened.

An interval for effectively measuring memory bandwidth typically needs to be multiple seconds long. In Google's workloads, it is not feasible to bound the number of jobs with different RMIDs which will run in a cache domain over any period of time. Consequently, on a fully-committed system where all RMIDs are allocated, few groups' counters return non-zero values.

To demonstrate the underlying issue, the first patch provides a test case in tools/testing/selftests/resctrl/test_rmids.sh.

On an AMD EPYC 7B12 64-Core Processor with the default behavior:

# ./test_rmids.sh
Created 255 monitoring groups.
g1: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
g2: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
g3: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
[..]
g238: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
g239: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
g240: mbm_total_bytes: Unavailable -> Unavailable (FAIL)
g241: mbm_total_bytes: Unavailable -> 660497472
g242: mbm_total_bytes: Unavailable -> 660793344
g243: mbm_total_bytes: Unavailable -> 660477312
g244: mbm_total_bytes: Unavailable -> 660495360
g245: mbm_total_bytes: Unavailable -> 660775360
g246: mbm_total_bytes: Unavailable -> 660645504
g247: mbm_total_bytes: Unavailable -> 660696128
g248: mbm_total_bytes: Unavailable -> 660605248
g249: mbm_total_bytes: Unavailable -> 660681280
g250: mbm_total_bytes: Unavailable -> 660834240
g251: mbm_total_bytes: Unavailable -> 660440064
g252: mbm_total_bytes: Unavailable -> 660501504
g253: mbm_total_bytes: Unavailable -> 660590720
g254: mbm_total_bytes: Unavailable -> 660548352
g255: mbm_total_bytes: Unavailable -> 660607296
255 groups, 0 returned counts in first pass, 15 in second
successfully measured bandwidth from 15/255 groups

To compare, here is the output from an Intel(R) Xeon(R) Platinum 8173M CPU:

# ./test_rmids.sh
Created 223 monitoring groups.
g1: mbm_total_bytes: 0 -> 606126080
g2: mbm_total_bytes: 0 -> 613236736
g3: mbm_total_bytes: 0 -> 610254848
[..]
g221: mbm_total_bytes: 0 -> 584679424
g222: mbm_total_bytes: 0 -> 588808192
g223: mbm_total_bytes: 0 -> 587317248
223 groups, 223 returned counts in first pass, 223 in second
successfully measured bandwidth from 223/223 groups

To make better use of the hardware in such a use case, this patchset introduces a "soft" RMID implementation, where each CPU is permanently assigned a "hard" RMID. On context switches which change the current soft RMID, the difference between each CPU's current event counts and most recent counts is added to the totals for the current or outgoing soft RMID.

This technique does not work for cache occupancy counters, so this patch series disables cache occupancy events when soft RMIDs are enabled.

This series adds the "mbm_soft_rmid" mount option to allow users to opt-in to the functionality when they deem it helpful.

When the same system from the earlier AMD example enables the mbm_soft_rmid mount option:

# ./test_rmids.sh
Created 255 monitoring groups.
g1: mbm_total_bytes: 0 -> 686560576
g2: mbm_total_bytes: 0 -> 668204416
[..]
g252: mbm_total_bytes: 0 -> 672651200
g253: mbm_total_bytes: 0 -> 666956800
g254: mbm_total_bytes: 0 -> 665917056
g255: mbm_total_bytes: 0 -> 671049600
255 groups, 255 returned counts in first pass, 255 in second
successfully measured bandwidth from 255/255 groups

(patches are based on tip/master)

[1] https://www.amd.com/system/files/TechDocs/56375_1.03_PUB.pdf

Peter Newman (8):
  selftests/resctrl: Verify all RMIDs count together
  x86/resctrl: Add resctrl_mbm_flush_cpu() to collect CPUs' MBM events
  x86/resctrl: Flush MBM event counts on soft RMID change
  x86/resctrl: Call mon_event_count() directly for soft RMIDs
  x86/resctrl: Create soft RMID version of __mon_event_count()
  x86/resctrl: Assign HW RMIDs to CPUs for soft RMID
  x86/resctrl: Use mbm_update() to push soft RMID counts
  x86/resctrl: Add mount option to enable soft RMID

Stephane Eranian (1):
  x86/resctrl: Hold a spinlock in __rmid_read() on AMD

 arch/x86/include/asm/resctrl.h | 29 +++-
 arch/x86/kernel/cpu/resctrl/core.c | 80 ++++++++-
 arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 9 +-
 arch/x86/kernel/cpu/resctrl/internal.h | 19 ++-
 arch/x86/kernel/cpu/resctrl/monitor.c | 158 +++++++++++++++++-
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 52 ++++++
 tools/testing/selftests/resctrl/test_rmids.sh | 93 +++++++++++
 7 files changed, 425 insertions(+), 15 deletions(-)
 create mode 100755 tools/testing/selftests/resctrl/test_rmids.sh

base-commit: dd806e2f030e57dd5bac973372aa252b6c175b73
--
2.40.0.634.g4ca3ef3211-goog
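
The soft RMID accounting described in the cover letter above (a fixed hardware counter per CPU, with the delta since the last read credited to the outgoing soft RMID on a context switch) can be sketched as follows. This is an illustration under stated assumptions, not the series' code: every name and the table size are hypothetical, and counter width and overflow handling are ignored.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SOFT_RMIDS 256	/* illustrative size, not from the series */

/* Per-CPU state: the CPU permanently owns one hardware counter. */
struct cpu_mbm_state {
	uint64_t prev_count;	/* hardware count at the last flush */
	unsigned int soft_rmid;	/* soft RMID currently running here */
};

/* Software-maintained byte totals, one per soft RMID. */
uint64_t soft_rmid_total[NR_SOFT_RMIDS];

/*
 * Credit the bytes counted since the last flush to the soft RMID that
 * was running on this CPU, then record the new baseline.
 */
void mbm_flush(struct cpu_mbm_state *cpu, uint64_t hw_count)
{
	assert(cpu->soft_rmid < NR_SOFT_RMIDS);
	soft_rmid_total[cpu->soft_rmid] += hw_count - cpu->prev_count;
	cpu->prev_count = hw_count;
}

/*
 * Context switch: if the soft RMID changes, flush on behalf of the
 * outgoing soft RMID and then switch the CPU to the incoming one.
 */
void mbm_switch_to(struct cpu_mbm_state *cpu, uint64_t hw_count,
		   unsigned int next_soft_rmid)
{
	if (cpu->soft_rmid == next_soft_rmid)
		return;
	mbm_flush(cpu, hw_count);
	cpu->soft_rmid = next_soft_rmid;
}
```

Because the hardware counter is never reassigned, it is never reset by the hardware, and a group's total is the sum of the deltas flushed for its soft RMID across all CPUs.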


[PATCH v2] selftests: prctl: Add prctl test for PR_GET_NAME

by Osama Muhammad

This patch covers the testing of PR_GET_NAME by reading its value from /proc/self/task/<pid>/comm and matching it with the value returned by PR_GET_NAME. If the values match, the test succeeds; otherwise it fails.

Changes since v1:
- Handled fscanf, fopen error checking.
- Defined MAX_PATH_LEN.

Signed-off-by: Osama Muhammad <osmtendev(a)gmail.com>
---
 .../selftests/prctl/set-process-name.c | 32 +++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/tools/testing/selftests/prctl/set-process-name.c b/tools/testing/selftests/prctl/set-process-name.c
index 3bc5e0e09..562f707ba 100644
--- a/tools/testing/selftests/prctl/set-process-name.c
+++ b/tools/testing/selftests/prctl/set-process-name.c
@@ -12,6 +12,7 @@
 #define CHANGE_NAME "changename"
 #define EMPTY_NAME ""
 #define TASK_COMM_LEN 16
+#define MAX_PATH_LEN 50
 
 int set_name(char *name)
 {
@@ -47,6 +48,35 @@ int check_null_pointer(char *check_name)
 	return res;
 }
 
+int check_name(void)
+{
+
+	int pid;
+
+	pid = getpid();
+	FILE *fptr = NULL;
+	char path[MAX_PATH_LEN] = {};
+	char name[TASK_COMM_LEN] = {};
+	char output[TASK_COMM_LEN] = {};
+	int j;
+
+	j = snprintf(path, MAX_PATH_LEN, "/proc/self/task/%d/comm", pid);
+	fptr = fopen(path, "r");
+	if (!fptr)
+		return -EIO;
+
+	fscanf(fptr, "%s", output);
+	if (ferror(fptr))
+		return -EIO;
+
+	int res = prctl(PR_GET_NAME, name, NULL, NULL, NULL);
+
+	if (res < 0)
+		return -errno;
+
+	return !strcmp(output, name);
+}
+
 TEST(rename_process) {
 
 	EXPECT_GE(set_name(CHANGE_NAME), 0);
@@ -57,6 +87,8 @@ TEST(rename_process) {
 
 	EXPECT_GE(set_name(CHANGE_NAME), 0);
 	EXPECT_LT(check_null_pointer(CHANGE_NAME), 0);
+
+	EXPECT_TRUE(check_name());
 }
 
 TEST_HARNESS_MAIN
--
2.34.1
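
A standalone sketch of the same check, written outside the kselftest harness. It is not a proposed replacement for the posted hunk: comm_matches_prctl() is a hypothetical name, the path buffer size is an arbitrary choice, and the function assumes a single-threaded caller so that the thread ID equals getpid(). It differs from the hunk in using a bounded fgets() read and closing the stream on every path.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>
#include <unistd.h>

#define TASK_COMM_LEN 16
#define MAX_PATH_LEN 64		/* arbitrary size for this sketch */

/*
 * Compare the name reported by PR_GET_NAME with the contents of
 * /proc/self/task/<tid>/comm for the calling thread.
 * Returns 1 if they match, 0 if they differ, negative errno on error.
 */
int comm_matches_prctl(void)
{
	char path[MAX_PATH_LEN];
	char from_proc[TASK_COMM_LEN] = "";
	char from_prctl[TASK_COMM_LEN] = "";
	FILE *f;
	int n;

	/* Assumes a single-threaded process, where tid == getpid(). */
	n = snprintf(path, sizeof(path), "/proc/self/task/%d/comm",
		     (int)getpid());
	if (n < 0 || (size_t)n >= sizeof(path))
		return -ENAMETOOLONG;

	f = fopen(path, "r");
	if (!f)
		return -errno;

	/* fgets() reads at most TASK_COMM_LEN - 1 bytes and terminates. */
	if (!fgets(from_proc, sizeof(from_proc), f)) {
		fclose(f);
		return -EIO;
	}
	fclose(f);

	/* comm ends with a newline; strip it before comparing. */
	from_proc[strcspn(from_proc, "\n")] = '\0';

	/* PR_GET_NAME needs a buffer of at least 16 bytes. */
	if (prctl(PR_GET_NAME, from_prctl, 0, 0, 0) < 0)
		return -errno;

	return strcmp(from_proc, from_prctl) == 0;
}
```

The function returns 1 on a match, so a caller can assert on it directly after setting a name with PR_SET_NAME.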


selftests: gpio: crash on arm64

by Naresh Kamboju

The following kernel warnings and crash were noticed on an arm64 Rpi4 device while running selftests: gpio on the Linux mainline 6.3.0-rc1 kernel and Linux next.

Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>

Please refer to the test log links for the detailed test plan and kernel crash logs. It is reproducible on arm64 juno-r2, Rpi4 and Qualcomm dragonboard 410c and qemu-arm64.

Test log:
-----------
kselftest: Running tests in gpio
TAP version 13
1..2
# selftests: gpio: gpio-mockup.sh
# 1. Module load tests
[ 61.176149] =============================================================================
[ 61.176802]
[ 61.176807] ======================================================
[ 61.176809] WARNING: possible circular locking dependency detected
[ 61.176811] 6.3.0-rc1-next-20230307 #1 Not tainted
[ 61.176814] ------------------------------------------------------
[ 61.176816] modprobe/510 is trying to acquire lock:
[ 61.176818] ffff80000b2284e8 (console_owner){..-.}-{0:0}, at: console_flush_all (kernel/printk/printk.c:2879 kernel/printk/printk.c:2942)
[ 61.176846]
[ 61.176846] but task is already holding lock:
[ 61.176848] ffff000040000698 (&n->list_lock){-.-.}-{2:2}, at: get_partial_node.part.0 (mm/slub.c:2271)
[ 61.176861]
[ 61.176861] which lock already depends on the new lock.
[ 61.176861]
[ 61.176863]
[ 61.176863] the existing dependency chain (in reverse order) is:
[ 61.176864]
[ 61.176864] -> #2 (&n->list_lock){-.-.}-{2:2}:
[ 61.176871] lock_acquire (kernel/locking/lockdep.c:5673)
[ 61.176879] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162)
[ 61.176885] get_partial_node.part.0 (mm/slub.c:2271)
[ 61.176890] ___slab_alloc (mm/slub.c:2268 mm/slub.c:2386 mm/slub.c:3188)
[ 61.176894] __slab_alloc.constprop.0 (mm/slub.c:3292)
[ 61.176899] __kmem_cache_alloc_node (mm/slub.c:3345 mm/slub.c:3442 mm/slub.c:3491)
[ 61.176903] __kmalloc (mm/slab_common.c:968 mm/slab_common.c:980)
[ 61.176908] tty_buffer_alloc (drivers/tty/tty_buffer.c:182)
[ 61.176914] __tty_buffer_request_room (drivers/tty/tty_buffer.c:279)
[ 61.176919] __tty_insert_flip_char (drivers/tty/tty_buffer.c:398)
[ 61.176924] uart_insert_char (drivers/tty/serial/serial_core.c:3341)
[ 61.176929] pl011_fifo_to_tty.isra.0 (drivers/tty/serial/amba-pl011.c:314)
[ 61.176934] pl011_int (include/linux/spinlock.h:390 drivers/tty/serial/amba-pl011.c:1396 drivers/tty/serial/amba-pl011.c:1571)
[ 61.176937] __handle_irq_event_percpu (kernel/irq/handle.c:158)
[ 61.176941] handle_irq_event (kernel/irq/handle.c:193 kernel/irq/handle.c:210)
[ 61.176944] handle_fasteoi_irq (kernel/irq/chip.c:716)
[ 61.176950] generic_handle_domain_irq (kernel/irq/irqdesc.c:652 kernel/irq/irqdesc.c:707)
[ 61.176953] gic_handle_irq (arch/arm64/include/asm/io.h:75 include/asm-generic/io.h:335 drivers/irqchip/irq-gic.c:344)
[ 61.176958] call_on_irq_stack (arch/arm64/kernel/entry.S:905)
[ 61.176962] do_interrupt_handler (arch/arm64/kernel/entry-common.c:274)
[ 61.176968] el1_interrupt (arch/arm64/kernel/entry-common.c:472 arch/arm64/kernel/entry-common.c:486)
[ 61.176971] el1h_64_irq_handler (arch/arm64/kernel/entry-common.c:492)
[ 61.176975] el1h_64_irq (arch/arm64/kernel/entry.S:587)
[ 61.176978] __kmem_cache_alloc_node (mm/slub.c:3490)
[ 61.176983] kmalloc_trace (mm/slab_common.c:1064 (discriminator 4))
[ 61.176986] inet6_dump_fib (net/ipv6/ip6_fib.c:657)
[ 61.176991] rtnl_dump_all (net/core/rtnetlink.c:3964)
[ 61.176997] netlink_dump (net/netlink/af_netlink.c:2296)
[ 61.177004] netlink_recvmsg (net/netlink/af_netlink.c:2024)
[ 61.177009] ____sys_recvmsg (net/socket.c:1015 net/socket.c:1036 net/socket.c:2723)
[ 61.177014] ___sys_recvmsg (net/socket.c:2765)
[ 61.177019] __sys_recvmsg (include/linux/file.h:31 net/socket.c:2797)
[ 61.177025] __arm64_sys_recvmsg (net/socket.c:2802)
[ 61.177030] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:57)
[ 61.177037] el0_svc_common.constprop.0 (arch/arm64/kernel/syscall.c:149)
[ 61.177043] do_el0_svc (arch/arm64/kernel/syscall.c:194)
[ 61.177049] el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:638)
[ 61.177052] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:656)
[ 61.177055] el0t_64_sync (arch/arm64/kernel/entry.S:591)
[ 61.177058]
[ 61.177058] -> #1 (&port_lock_key){-.-.}-{2:2}:
[ 61.177065] lock_acquire (kernel/locking/lockdep.c:5673)
[ 61.177071] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162)
[ 61.177074] serial8250_console_write (drivers/tty/serial/8250/8250_port.c:3394)
[ 61.177082] univ8250_console_write (drivers/tty/serial/8250/8250_core.c:585)
[ 61.177087] console_flush_all (kernel/printk/printk.c:2888 kernel/printk/printk.c:2942)
[ 61.177093] console_unlock.part.0 (kernel/printk/printk.c:3017)
[ 61.177098] vprintk_emit (kernel/printk/printk.c:2317)
[ 61.177104] vprintk_default (kernel/printk/printk.c:2328)
[ 61.177110] vprintk (kernel/printk/printk_safe.c:50)
[ 61.177116] _printk (kernel/printk/printk.c:2341)
[ 61.177121] register_console (kernel/printk/printk.c:3468)
[ 61.177126] uart_add_one_port (drivers/tty/serial/serial_core.c:2579 drivers/tty/serial/serial_core.c:3100)
[ 61.177130] serial8250_register_8250_port (drivers/tty/serial/8250/8250_core.c:1093)
[ 61.177135] bcm2835aux_serial_probe (drivers/tty/serial/8250/8250_bcm2835aux.c:184)
[ 61.177141] platform_probe (drivers/base/platform.c:1405)
[ 61.177148] really_probe (drivers/base/dd.c:552 drivers/base/dd.c:631)
[ 61.177152] __driver_probe_device (drivers/base/dd.c:768)
[ 61.177157] driver_probe_device (drivers/base/dd.c:798)
[ 61.177161] __driver_attach (drivers/base/dd.c:1185)
[ 61.177166] bus_for_each_dev (drivers/base/bus.c:368)
[ 61.177170] driver_attach (drivers/base/dd.c:1202)
[ 61.177173] bus_add_driver (drivers/base/bus.c:673)
[ 61.177177] driver_register (drivers/base/driver.c:246)
[ 61.177182] __platform_driver_register (drivers/base/platform.c:868)
[ 61.177188] bcm2835aux_serial_driver_init (drivers/tty/serial/8250/8250_bcm2835aux.c:233)
[ 61.177195] do_one_initcall (init/main.c:1306)
[ 61.177199] kernel_init_freeable (init/main.c:1378 init/main.c:1395 init/main.c:1414 init/main.c:1634)
[ 61.177207] kernel_init (init/main.c:1524)
[ 61.177212] ret_from_fork (arch/arm64/kernel/entry.S:871)
[ 61.177216]
[ 61.177216] -> #0 (console_owner){..-.}-{0:0}:
[ 61.177222] __lock_acquire (kernel/locking/lockdep.c:3099 kernel/locking/lockdep.c:3217 kernel/locking/lockdep.c:3832 kernel/locking/lockdep.c:5056)
[ 61.177228] lock_acquire.part.0 (arch/arm64/include/asm/percpu.h:40 kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5671)
[ 61.177233] lock_acquire (kernel/locking/lockdep.c:5673)
[ 61.177238] console_flush_all (kernel/printk/printk.c:2883 kernel/printk/printk.c:2942)
[ 61.177244] console_unlock.part.0 (kernel/printk/printk.c:3017)
[ 61.177250] vprintk_emit (kernel/printk/printk.c:2317)
[ 61.177255] vprintk_default (kernel/printk/printk.c:2328)
[ 61.177261] vprintk (kernel/printk/printk_safe.c:50)
[ 61.177267] _printk (kernel/printk/printk.c:2341)
[ 61.177271] slab_bug (mm/slub.c:892)
[ 61.177274] check_bytes_and_report (mm/slub.c:1054)
[ 61.177279] check_object (mm/slub.c:1196 (discriminator 2))
[ 61.177283] alloc_debug_processing (mm/slub.c:1415 mm/slub.c:1425)
[ 61.177287] get_partial_node.part.0 (mm/slub.c:2146 mm/slub.c:2279)
[ 61.177291] ___slab_alloc (mm/slub.c:2268 mm/slub.c:2386 mm/slub.c:3188)
[ 61.177295] __slab_alloc.constprop.0 (mm/slub.c:3292)
[ 61.177300] __kmem_cache_alloc_node (mm/slub.c:3345 mm/slub.c:3442 mm/slub.c:3491)
[ 61.177304] kmalloc_trace (mm/slab_common.c:1064 (discriminator 4))
[ 61.177308] device_add (drivers/base/core.c:3436 drivers/base/core.c:3486)
[ 61.177311] platform_device_add (drivers/base/platform.c:717)
[ 61.177317] platform_device_register_full (drivers/base/platform.c:844)
[ 61.177323] gpio_mockup_register_chip+0x1ec/0x2b8 gpio_mockup
[ 61.177342] gpio_mockup_init+0xf0/0xd40 gpio_mockup
[ 61.177352] do_one_initcall (init/main.c:1306)
[ 61.177356] do_init_module (kernel/module/main.c:2457)
[ 61.177363] load_module (kernel/module/main.c:2859)
[ 61.177369] __do_sys_finit_module (kernel/module/main.c:2961)
[ 61.177375] __arm64_sys_finit_module (kernel/module/main.c:2928)
[ 61.177381] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:57)
[ 61.177387] el0_svc_common.constprop.0 (arch/arm64/kernel/syscall.c:149)
[ 61.177393] do_el0_svc (arch/arm64/kernel/syscall.c:194)
[ 61.177398] el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:638)
[ 61.177402] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:656)
[ 61.177405] el0t_64_sync (arch/arm64/kernel/entry.S:591)
[ 61.177408]
[ 61.177408] other info that might help us debug this:
[ 61.177408]
[ 61.177410] Chain exists of:
[ 61.177410] console_owner --> &port_lock_key --> &n->list_lock
[ 61.177410]
[ 61.177417] Possible unsafe locking scenario:
[ 61.177417]
[ 61.177418]        CPU0                    CPU1
[ 61.177419]        ----                    ----
[ 61.177420]   lock(&n->list_lock);
[ 61.177423]                                lock(&port_lock_key);
[ 61.177426]                                lock(&n->list_lock);
[ 61.177429]   lock(console_owner);
[ 61.177432]
[ 61.177432] *** DEADLOCK ***
[ 61.177432]
[ 61.177434] 3 locks held by modprobe/510:
[ 61.177436] #0: ffff000040000698 (&n->list_lock){-.-.}-{2:2}, at: get_partial_node.part.0 (mm/slub.c:2271)
[ 61.177448] #1: ffff80000b227f18 (console_lock){+.+.}-{0:0}, at: vprintk_emit (kernel/printk/printk.c:1936 kernel/printk/printk.c:2315)
[ 61.177460] #2: ffff80000b228388 (console_srcu){....}-{0:0}, at: console_flush_all (include/linux/srcu.h:200 kernel/printk/printk.c:290 kernel/printk/printk.c:2934)
[ 61.177471]
[ 61.177471] stack backtrace:
[ 61.177474] CPU: 3 PID: 510 Comm: modprobe Not tainted 6.3.0-rc1-next-20230307 #1
[ 61.177479] Hardware name: Raspberry Pi 4 Model B (DT)
[ 61.177482] Call trace:
[ 61.177483] dump_backtrace (arch/arm64/kernel/stacktrace.c:160)
[ 61.177487] show_stack (arch/arm64/kernel/stacktrace.c:167)
[ 61.177490] dump_stack_lvl (lib/dump_stack.c:107)
[ 61.177498] dump_stack (lib/dump_stack.c:114)
[ 61.177504] print_circular_bug (kernel/locking/lockdep.c:2057)
[ 61.177509] check_noncircular (kernel/locking/lockdep.c:2181)
[ 61.177514] __lock_acquire (kernel/locking/lockdep.c:3099 kernel/locking/lockdep.c:3217 kernel/locking/lockdep.c:3832 kernel/locking/lockdep.c:5056)
[ 61.177520] lock_acquire.part.0 (arch/arm64/include/asm/percpu.h:40 kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5671)
[ 61.177525] lock_acquire (kernel/locking/lockdep.c:5673)
[ 61.177530] console_flush_all (kernel/printk/printk.c:2883 kernel/printk/printk.c:2942)
[ 61.177536] console_unlock.part.0 (kernel/printk/printk.c:3017)
[ 61.177542] vprintk_emit (kernel/printk/printk.c:2317)
[ 61.177547] vprintk_default (kernel/printk/printk.c:2328)
[ 61.177553] vprintk (kernel/printk/printk_safe.c:50)
[ 61.177559] _printk (kernel/printk/printk.c:2341)
[ 61.177564] slab_bug (mm/slub.c:892)
[ 61.177567] check_bytes_and_report (mm/slub.c:1054)
[ 61.177571] check_object (mm/slub.c:1196 (discriminator 2))
[ 61.177575] alloc_debug_processing (mm/slub.c:1415 mm/slub.c:1425)
[ 61.177579] get_partial_node.part.0 (mm/slub.c:2146 mm/slub.c:2279)
[ 61.177583] ___slab_alloc (mm/slub.c:2268 mm/slub.c:2386 mm/slub.c:3188)
[ 61.177587] __slab_alloc.constprop.0 (mm/slub.c:3292)
[ 61.177592] __kmem_cache_alloc_node (mm/slub.c:3345 mm/slub.c:3442 mm/slub.c:3491)
[ 61.177596] kmalloc_trace (mm/slab_common.c:1064 (discriminator 4))
[ 61.177600] device_add (drivers/base/core.c:3436 drivers/base/core.c:3486)
[ 61.177603] platform_device_add (drivers/base/platform.c:717)
[ 61.177609] platform_device_register_full (drivers/base/platform.c:844)
[ 61.177615] gpio_mockup_register_chip+0x1ec/0x2b8 gpio_mockup
[ 61.177625] gpio_mockup_init+0xf0/0xd40 gpio_mockup
[ 61.177634] do_one_initcall (init/main.c:1306)
[ 61.177638] do_init_module (kernel/module/main.c:2457)
[ 61.177644] load_module (kernel/module/main.c:2859)
[ 61.177650] __do_sys_finit_module (kernel/module/main.c:2961)
[ 61.177656] __arm64_sys_finit_module (kernel/module/main.c:2928)
[ 61.177662] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:57)
[ 61.177668] el0_svc_common.constprop.0 (arch/arm64/kernel/syscall.c:149)
[ 61.177674] do_el0_svc (arch/arm64/kernel/syscall.c:194)
[ 61.177680] el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:638)
[ 61.177683] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:656)
[ 61.177686] el0t_64_sync (arch/arm64/kernel/entry.S:591)
[ 62.011685] BUG kmalloc-512 (Not tainted): Poison overwritten
[ 62.017513] -----------------------------------------------------------------------------
[ 62.017513]
[ 62.027300] 0xffff00004ecb7a38-0xffff00004ecb7a47 @offset=31288. First byte 0x6a instead of 0x6b
[ 62.036210] Allocated in swnode_register+0x40/0x218 age=808 cpu=3 pid=386
[ 62.043101] __kmem_cache_alloc_node (mm/slub.c:3345 mm/slub.c:3442 mm/slub.c:3491)
[ 62.047784] kmalloc_trace (mm/slab_common.c:1064 (discriminator 4))
[ 62.051406] swnode_register (drivers/base/swnode.c:776)
[ 62.055293] fwnode_create_software_node (drivers/base/swnode.c:934 (discriminator 4))
[ 62.060238] gpio_mockup_register_chip+0x1c4/0x2b8 gpio_mockup
[ 62.066337] gpio_mockup_init+0xf0/0xd40 gpio_mockup
[ 62.071551] do_one_initcall (init/main.c:1306)
[ 62.075437] do_init_module (kernel/module/main.c:2457)
[ 62.079238] load_module (kernel/module/main.c:2859)
[ 62.083037] __do_sys_finit_module (kernel/module/main.c:2961)
[ 62.087455] __arm64_sys_finit_module (kernel/module/main.c:2928)
[ 62.092048] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:57)
[ 62.095848] el0_svc_common.constprop.0 (arch/arm64/kernel/syscall.c:149)
[ 62.100793] do_el0_svc (arch/arm64/kernel/syscall.c:194)
[ 62.104151] el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:638)
[ 62.107244] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:656)
[ 62.111570] Freed in software_node_release+0xdc/0x108 age=632 cpu=0 pid=428
[ 62.118633] __kmem_cache_free (mm/slub.c:3732 mm/slub.c:3788 mm/slub.c:3800)
[ 62.122784] kfree (mm/slab_common.c:1020)
[ 62.125788] software_node_release (drivers/base/swnode.c:761)
[ 62.130204] kobject_put (lib/kobject.c:685 lib/kobject.c:712 include/linux/kref.h:65 lib/kobject.c:729)
[ 62.133739] software_node_notify_remove (drivers/base/swnode.c:1093)
[ 62.138597] device_del (drivers/base/core.c:2265 drivers/base/core.c:3778)
[ 62.142134] platform_device_del.part.0 (drivers/base/platform.c:753)
[ 62.146903] platform_device_unregister (drivers/base/platform.c:551 drivers/base/platform.c:794)
[ 62.151672] gpio_mockup_exit+0x54/0x280 gpio_mockup
[ 62.156888] __arm64_sys_delete_module (kernel/module/main.c:756 kernel/module/main.c:698 kernel/module/main.c:698)
[ 62.161745] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:57)
[ 62.165545] el0_svc_common.constprop.0 (arch/arm64/kernel/syscall.c:149)
[ 62.170490] do_el0_svc (arch/arm64/kernel/syscall.c:194)
[ 62.173850] el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:638)
[ 62.176941] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:656)
[ 62.181267] el0t_64_sync (arch/arm64/kernel/entry.S:591)
[ 62.184975] Slab 0xfffffc00013b2c00 objects=21 used=7 fp=0xffff00004ecb7400 flags=0x7fffc0000010200(slab|head|node=0|zone=1|lastcpupid=0xffff)
[ 62.197943] Object 0xffff00004ecb7a00 @offset=31232 fp=0xffff00004ecb7400
[ 62.197943]
[ 62.206325] Redzone ffff00004ecb7800: ...
[ 63.089597] CPU: 3 PID: 510 Comm: modprobe Not tainted 6.3.0-rc1-next-20230307 #1
[ 63.097186] Hardware name: Raspberry Pi 4 Model B (DT)
[ 63.102392] Call trace:
[ 63.104865] dump_backtrace (arch/arm64/kernel/stacktrace.c:160)
[ 63.108665] show_stack (arch/arm64/kernel/stacktrace.c:167)
[ 63.112021] dump_stack_lvl (lib/dump_stack.c:107)
[ 63.115734] dump_stack (lib/dump_stack.c:114)
[ 63.119093] print_trailer (mm/slub.c:953)
[ 63.122892] check_bytes_and_report (mm/slub.c:1058)
[ 63.127395] check_object (mm/slub.c:1196 (discriminator 2))
[ 63.131104] alloc_debug_processing (mm/slub.c:1415 mm/slub.c:1425)
[ 63.135606] get_partial_node.part.0 (mm/slub.c:2146 mm/slub.c:2279)
[ 63.140286] ___slab_alloc (mm/slub.c:2268 mm/slub.c:2386 mm/slub.c:3188)
[ 63.144084] __slab_alloc.constprop.0 (mm/slub.c:3292)
[ 63.148674] __kmem_cache_alloc_node (mm/slub.c:3345 mm/slub.c:3442 mm/slub.c:3491)
[ 63.153354] kmalloc_trace (mm/slab_common.c:1064 (discriminator 4))
[ 63.156974] device_add (drivers/base/core.c:3436
drivers/base/core.c:3486) [ 63.160508] platform_device_add (drivers/base/platform.c:717) [ 63.164837] platform_device_register_full (drivers/base/platform.c:844) [ 63.169959] gpio_mockup_register_chip+0x1ec/0x2b8 gpio_mockup [ 63.176057] gpio_mockup_init+0xf0/0xd40 gpio_mockup [ 63.181269] do_one_initcall (init/main.c:1306) [ 63.185155] do_init_module (kernel/module/main.c:2457) [ 63.188956] load_module (kernel/module/main.c:2859) [ 63.192755] __do_sys_finit_module (kernel/module/main.c:2961) [ 63.197171] __arm64_sys_finit_module (kernel/module/main.c:2928) [ 63.201765] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:57) [ 63.205565] el0_svc_common.constprop.0 (arch/arm64/kernel/syscall.c:149) [ 63.210510] do_el0_svc (arch/arm64/kernel/syscall.c:194) [ 63.213869] el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:638) [ 63.216961] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:656) [ 63.221287] el0t_64_sync (arch/arm64/kernel/entry.S:591) [ 63.224998] FIX kmalloc-512: Restoring Poison 0xffff00004ecb7a38-0xffff00004ecb7a47=0x6b [ 63.233202] FIX kmalloc-512: Marking all objects used [ 63.399213] ============================================================================= links to the crash: - https://lkft.validation.linaro.org/scheduler/job/6224830#L1291 - https://lkft.validation.linaro.org/scheduler/job/6224742#L1202 - https://lkft.validation.linaro.org/scheduler/job/6224784#L3415 - https://lkft.validation.linaro.org/scheduler/job/6224810#L2029 metadata: git_ref: master git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next git_sha: 709c6adf19dc558e44ab5c01659b09a16a2d3c82 git_describe: next-20230307 kernel_version: 6.3.0-rc1 kernel-config: https://storage.tuxsuite.com/public/linaro/lkft/builds/2MfXESbRAbSUj9oic6d8… build-url: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next/-/pipelines/798095907 
artifact-location: https://storage.tuxsuite.com/public/linaro/lkft/builds/2MfXESbRAbSUj9oic6d8…
toolchain: gcc-11

--
Linaro LKFT
https://lkft.linaro.org


[PATCH v6] selftests: rtc: Fixes rtctest error handling.

by Atul Kumar Pant

Adds a check to verify if the rtc device file is valid or not and prints a useful error message if the file is not accessible.

Signed-off-by: Atul Kumar Pant <atulpant.linux(a)gmail.com>
---
changes since v5:
    Updated error message to use strerror().
    If the rtc file is invalid, then skip the test.

changes since v4:
    Updated the commit message.

changes since v3:
    Added Linux-kselftest and Linux-kernel mailing lists.

changes since v2:
    Changed error message when rtc file does not exist.

changes since v1:
    Removed check for uid=0
    If rtc file is invalid, then exit the test.

 tools/testing/selftests/rtc/rtctest.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/rtc/rtctest.c b/tools/testing/selftests/rtc/rtctest.c
index 630fef735c7e..27b466111885 100644
--- a/tools/testing/selftests/rtc/rtctest.c
+++ b/tools/testing/selftests/rtc/rtctest.c
@@ -15,6 +15,7 @@
 #include <sys/types.h>
 #include <time.h>
 #include <unistd.h>
+#include <error.h>
 #include "../kselftest_harness.h"
 #include "../kselftest.h"
@@ -437,7 +438,7 @@ int main(int argc, char **argv)
 	if (access(rtc_file, F_OK) == 0)
 		ret = test_harness_run(argc, argv);
 	else
-		ksft_exit_fail_msg("[ERROR]: Cannot access rtc file %s - Exiting\n", rtc_file);
+		ksft_exit_skip("%s: %s\n", rtc_file, strerror(errno));
 	return ret;
 }
--
2.25.1
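The skip-instead-of-fail behaviour in the patch above can be tried outside the kselftest tree with a small standalone sketch. It is not the rtctest code: check_device_or_skip() is a hypothetical helper, fprintf() stands in for ksft_exit_skip(), and the function returns the status instead of exiting. The value 4 is the exit code kselftest uses for a skipped test. The sketch includes <errno.h> and <string.h>, the headers that declare errno and strerror().

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define KSFT_SKIP 4	/* exit code kselftest uses for a skipped test */

/*
 * Hypothetical helper, for illustration only.
 * Returns 0 when path exists. Otherwise prints "<path>: <reason>"
 * to stderr and returns KSFT_SKIP so the caller can skip the test.
 */
int check_device_or_skip(const char *path)
{
	assert(path != NULL);

	if (access(path, F_OK) == 0)
		return 0;

	/* access() failed and set errno; strerror() turns it into text. */
	fprintf(stderr, "%s: %s\n", path, strerror(errno));
	return KSFT_SKIP;
}
```

A test's main() could call the helper with its device path and return the result directly when it is non-zero, so that a missing device is reported as a skip and not as a failure.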


[PATCH v33 0/6] Implement IOCTL to get and optionally clear info about PTEs

by Muhammad Usama Anjum

*Changes in v33*:
- Add PAGE_IS_FILE support for THPs

*Changes in v31 and v32*:
- Minor updates

*Changes in v30*:
- Rebase on top of next-20230815
- Minor nitpicks

*Changes in v29*:
- Polish IOCTL and improve documentation

*Changes in v28*:
- Fix walk_end and add 17 test cases in selftests patch

*Changes in v27*:
- Handle review comments and minor improvements
- Add performance improvement patch on top with test for easy review

*Changes in v26*:
- Code restructuring and API changes in PAGEMAP_IOCTL

*Changes in v25*:
- Do proper filtering on hole as well (hole got missed earlier)

*Changes in v24*:
- Rebase on top of next-20230710
- Place WP markers in case of hole as well

*Changes in v23*:
- Set vec_buf_index in loop only when vec_buf_index is set
- Return -EFAULT instead of -EINVAL if vec is NULL
- Correctly return the walk ending address to the page granularity

*Changes in v22*:
- Interface change:
  - Replace [start, start + len) with [start, end)
  - Return the ending address of the address walk in start

*Changes in v21*:
- Abort walk instead of returning error if WP is to be performed on partial hugetlb

*Changes in v20*:
- Correct PAGE_IS_FILE and add PAGE_IS_PFNZERO

*Changes in v19*:
- Minor changes and interface updates

*Changes in v18*:
- Rebase on top of next-20230613
- Minor updates

*Changes in v17*:
- Rebase on top of next-20230606
- Minor improvements in PAGEMAP_SCAN IOCTL patch

*Changes in v16*:
- Fix a corner case
- Add exclusive PM_SCAN_OP_WP back

*Changes in v15*:
- Build fix (Add missed build fix in RESEND)

*Changes in v14*:
- Fix build error caused by #ifdef added at last minute in some configs

*Changes in v13*:
- Rebase on top of next-20230414
- Give-up on using uffd_wp_range() and write new helpers, flush tlb only once

*Changes in v12*:
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebase on top of next-20230406
- Review updates

*Changes in v11*:
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with UFFDIO_WRITEPROTECT

*Changes in v10*:
- Add specific condition to return error if hugetlb is used with wp async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation

*Changes in v9*:
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code

*Changes in v8*:
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation

*Changes in v7*:
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty flags

*Motivation*
The real motivation for adding the PAGEMAP_SCAN IOCTL is to emulate the Windows GetWriteWatch() and ResetWriteWatch() syscalls [1]. GetWriteWatch() retrieves the addresses of the pages that have been written to in a region of virtual memory. This syscall is used in Windows applications and games. It is currently emulated in userspace in a pretty slow manner. Our purpose is to enhance the kernel so that it can be emulated efficiently. Currently some out-of-tree hack patches are used to emulate it efficiently in some kernels. We intend to replace those with these patches, so that gaming on Linux as a whole can benefit. It means there would be tons of users of this code.

The CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.

Andrei defines the following uses of this code:
* It is more granular and allows us to track changed pages more effectively. The current interface can clear dirty bits for the entire process only.
  In addition, reading info about pages is a separate operation. It means we must freeze the process to read information about all its pages and reset the dirty bits, and only then can we start dumping pages. The information about pages becomes more and more outdated while we are processing pages. The new interface solves both of these downsides. First, it allows us to read PTE bits and clear the soft-dirty bit atomically. It means that CRIU will not need to freeze processes to pre-dump their memory. Second, it clears soft-dirty bits for a specified region of memory. It means CRIU will have up-to-date info about pages at the moment of dumping them.
* The new interface has to be much faster because basic page filtering happens in the kernel. With the old interface, we have to read pagemap for each page.

*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we felt that the kernel's soft-dirty feature could be used under the hood with some additions:
* reset the soft-dirty flag for only a specific region of memory instead of clearing the flag for the entire process
* get and clear the soft-dirty flag for a specific region atomically

So we decided to use an ioctl on the pagemap file to read and/or reset the soft-dirty flag. But with the soft-dirty flag we sometimes get extra pages which weren't even written. They had become soft-dirty because of VMA merging and the VM_SOFTDIRTY flag. This breaks the definition of GetWriteWatch(). We were able to bypass this shortcoming by ignoring VM_SOFTDIRTY, until David reported that mprotect etc. messes up the soft-dirty flag while ignoring VM_SOFTDIRTY [5]. This wasn't happening until [6] got introduced. We discussed whether we could revert these patches, but we could not reach any conclusion. So at this point, I made a couple of attempts to solve this whole VM_SOFTDIRTY issue by correcting the soft-dirty implementation:
* [7] Correct the bug fixed wrongly back in 2014. It had the potential to cause regressions. We left it behind.
* [8] Keep a list of the soft-dirty parts of a VMA across splits and merges. The reply I got was not to increase the size of the VMA by 8 bytes.

At this point, we left soft-dirty behind, considering it too delicate, and userfaultfd [9] seemed like the only way forward. From there onward, we have been basing soft-dirty emulation on the userfaultfd WP feature, where the kernel resolves the faults itself when the WP_ASYNC feature is used. It was straightforward to add the WP_ASYNC feature to userfaultfd. Now only the pages which have really been written are reported as dirty or written-to. (PS: another userfaultfd feature, WP_UNPOPULATED, is required to avoid pre-faulting memory before write-protecting [9].)

All the different masks were added at the request of the CRIU devs to make the interface more generic and better.

[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com

*Original cover letter from v8*
Hello,

Note: Soft-dirty pages and pages which have been written-to are synonyms. As the kernel already has a soft-dirty feature, which we have given up using, we use the written-to terminology while using UFFD async WP under the hood.

It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping

Some benchmarks can be seen here [1].

This series adds features that weren't present earlier:
- There is no atomic get soft-dirty/written-to status and clear operation in the kernel.
- The pages which have been written to cannot be found accurately. (The kernel's soft-dirty PTE bit + soft-dirty VMA bit show more soft-dirty pages than there actually are.)

Historically, soft-dirty PTE bit tracking has been used in the CRIU project. The procfs interface is enough for finding the soft-dirty bit status and clearing the soft-dirty bit of all the pages of a process. We have a use case where we need to track the soft-dirty PTE bit for only specific pages on demand. We need this tracking and clearing mechanism for a region of memory while the process is running, to emulate the GetWriteWatch() syscall of Windows.

*(Moved to using UFFD instead of soft-dirty feature to find pages which have been written-to from v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been written to. The flag is too delicate and inaccurate, as it shows more soft-dirty pages than there actually are. There is no interest in correcting it [2][3], as this is how the feature was written years ago and its behaviour shouldn't be changed. Peter Xu has suggested using the async version of the UFFD WP [4], as it is inherently based on the PTEs.

So in this patch series, I've added a new mode to UFFD which is an asynchronous version of write protect. When this variant of UFFD WP is used, page faults are resolved automatically by the kernel. The pages which have been written to can be found by reading the pagemap file (!PM_UFFD_WP). This feature can be used successfully to find which pages have been written to since the time the pages were write protected.
This works just like the soft-dirty flag without showing any extra pages which aren't soft-dirty in reality.

The information about whether a page is file mapped, present or swapped is required for the CRIU project [5][6]. The required mask, any mask, excluded mask and return mask are also required for the CRIU project [5].

The IOCTL returns the addresses of the pages which match the specified masks. The page addresses are returned in struct page_region in a compact form. max_pages is needed to support a use case where the user only wants to get a specific number of pages, so there is no need to find all the pages of interest in the range when max_pages is specified. The IOCTL returns when the maximum number of pages has been found. max_pages is optional. If max_pages is specified, it must be equal to or greater than vec_size. This restriction is needed to handle the worst case, when each page_region contains info about only one page and cannot be compacted. This is needed to emulate the Windows GetWriteWatch() syscall.

The patch series includes a detailed selftest which can be used as an example for the uffd async wp test and PAGEMAP_IOCTL. It shows the interface usage as well.
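The mask filtering and the compact page_region output described above can be illustrated with a small userspace model. This is only a sketch under assumptions, not the code from the series: the CAT_* bits, struct region, struct scan_masks and scan_pages() are hypothetical names, the pages are modelled as an array of per-page flag words, and the mask semantics (all required bits set, at least one any-of bit set, no excluded bit set) are my reading of the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical category bits, for illustration only. */
#define CAT_WRITTEN (1u << 0)
#define CAT_FILE    (1u << 1)
#define CAT_PRESENT (1u << 2)
#define CAT_SWAPPED (1u << 3)

/* Simplified stand-in for the series' struct page_region. */
struct region {
	size_t start;	/* index of the first page in the run */
	size_t npages;	/* number of consecutive pages in the run */
	uint32_t flags;	/* reported bits, identical for every page */
};

struct scan_masks {
	uint32_t required;	/* every bit here must be set */
	uint32_t anyof;		/* if non-zero, at least one bit must be set */
	uint32_t excluded;	/* no bit here may be set */
	uint32_t ret;		/* bits copied into the output */
};

int page_matches(uint32_t flags, const struct scan_masks *m)
{
	if ((flags & m->required) != m->required)
		return 0;
	if (m->anyof && !(flags & m->anyof))
		return 0;
	if (flags & m->excluded)
		return 0;
	return 1;
}

/*
 * Walk page_flags[0..npages) and record matching pages in vec. A page is
 * merged into the previous region when it directly follows it and reports
 * the same bits. The walk stops when vec is full or when max_pages pages
 * have been reported (0 means no limit). Returns the regions used.
 */
size_t scan_pages(const uint32_t *page_flags, size_t npages,
		  const struct scan_masks *m,
		  struct region *vec, size_t vec_len, size_t max_pages)
{
	size_t used = 0, found = 0;

	assert(vec != NULL || vec_len == 0);

	for (size_t i = 0; i < npages; i++) {
		uint32_t out;

		if (max_pages && found == max_pages)
			break;
		if (!page_matches(page_flags[i], m))
			continue;

		out = page_flags[i] & m->ret;
		if (used && vec[used - 1].start + vec[used - 1].npages == i &&
		    vec[used - 1].flags == out) {
			vec[used - 1].npages++;	/* extend the current run */
		} else {
			if (used == vec_len)
				break;		/* output vector is full */
			vec[used].start = i;
			vec[used].npages = 1;
			vec[used].flags = out;
			used++;
		}
		found++;
	}
	return used;
}
```

In this model the worst case is a run of matching pages that are never adjacent or never report the same bits: each page then occupies its own region, which is the situation the max_pages restriction above is meant to cover.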
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/

Regards,
Muhammad Usama Anjum

Muhammad Usama Anjum (5):
  fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs
  fs/proc/task_mmu: Add fast paths to get/clear PAGE_IS_WRITTEN flag
  tools headers UAPI: Update linux/fs.h with the kernel sources
  mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
  selftests: mm: add pagemap ioctl tests

Peter Xu (1):
  userfaultfd: UFFD_FEATURE_WP_ASYNC

 Documentation/admin-guide/mm/pagemap.rst | 89 +
 Documentation/admin-guide/mm/userfaultfd.rst | 35 +
 fs/proc/task_mmu.c | 722 ++++++++
 fs/userfaultfd.c | 26 +-
 include/linux/hugetlb.h | 1 +
 include/linux/userfaultfd_k.h | 28 +-
 include/uapi/linux/fs.h | 59 +
 include/uapi/linux/userfaultfd.h | 9 +-
 mm/hugetlb.c | 34 +-
 mm/memory.c | 28 +-
 tools/include/uapi/linux/fs.h | 59 +
 tools/testing/selftests/mm/.gitignore | 2 +
 tools/testing/selftests/mm/Makefile | 3 +-
 tools/testing/selftests/mm/config | 1 +
 tools/testing/selftests/mm/pagemap_ioctl.c | 1660 ++++++++++++++++++
 tools/testing/selftests/mm/run_vmtests.sh | 4 +
 16 files changed, 2736 insertions(+), 24 deletions(-)
 create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c

--
2.40.1


[PATCH v4 00/36] arm64/gcs: Provide support for GCS in userspace

by Mark Brown

The arm64 Guarded Control Stack (GCS) feature provides support for hardware protected stacks of return addresses, intended to provide hardening against return oriented programming (ROP) attacks and to make it easier to gather call stacks for applications such as profiling.

When GCS is active a secondary stack called the Guarded Control Stack is maintained, protected with a memory attribute which means that it can only be written with specific GCS operations. The current GCS pointer can not be directly written to by userspace. When a BL is executed the value stored in LR is also pushed onto the GCS, and when a RET is executed the top of the GCS is popped and compared to LR with a fault being raised if the values do not match. GCS operations may only be performed on GCS pages; a data abort is generated if they are not.

The combination of hardware enforcement and lack of extra instructions in the function entry and exit paths should result in something which has less overhead and is more difficult to attack than a purely software implementation like clang's shadow stacks.

This series implements support for use of GCS by userspace, along with support for use of GCS within KVM guests. It does not enable use of GCS by either EL1 or EL2; this will be implemented separately. Executables are started without GCS and must use a prctl() to enable it. It is expected that this will be done very early in application execution by the dynamic linker or other startup code.

x86 has an equivalent feature called shadow stacks. This series depends on the x86 patches for generic memory management support for the new guarded/shadow stack page type and shares APIs as much as possible. As there has been extensive discussion with the wider community around the ABI for shadow stacks I have as far as practical kept implementation decisions close to those for x86, anticipating that review would lead to similar conclusions in the absence of strong reasoning for divergence.
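The BL/RET behaviour described above can be modelled in a few lines of ordinary C. This is a toy software model for illustration only, not how the hardware or this series implements GCS: struct toy_gcs, toy_bl() and toy_ret() are hypothetical names, the stack depth is arbitrary, and a "fault" is represented by a -1 return value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_GCS_DEPTH 64	/* arbitrary depth for the model */

struct toy_gcs {
	uint64_t slot[TOY_GCS_DEPTH];
	size_t top;		/* number of entries in use */
};

/*
 * Model of BL: the return address placed in LR is also pushed onto
 * the guarded stack. Returns 0 on success, -1 if the stack is full.
 */
int toy_bl(struct toy_gcs *g, uint64_t lr)
{
	assert(g != NULL);

	if (g->top == TOY_GCS_DEPTH)
		return -1;
	g->slot[g->top++] = lr;
	return 0;
}

/*
 * Model of RET: pop the top entry and compare it with LR.
 * Returns 0 when they match, -1 (standing in for a fault) when they
 * differ or when the guarded stack is empty.
 */
int toy_ret(struct toy_gcs *g, uint64_t lr)
{
	assert(g != NULL);

	if (g->top == 0)
		return -1;
	return g->slot[--g->top] == lr ? 0 : -1;
}
```

In the model, an attacker who overwrites a saved return address on the normal stack changes the value that ends up in LR but not the copy held in slot[], so the comparison in toy_ret() fails. That mismatch is the case the hardware reports as a fault.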
The main divergence I am conscious of is that x86 allows shadow stack to be enabled and disabled repeatedly, freeing the shadow stack for the thread whenever disabled, while this implementation keeps the GCS allocated after disable but refuses to reenable it. This is to avoid races with things actively walking the GCS during a disable. We do anticipate that some systems will wish to disable GCS at runtime but are not aware of any demand for subsequently reenabling it.

x86 uses an arch_prctl() to manage enable and disable. Since only x86 and S/390 use arch_prctl(), a generic prctl() was proposed[1] as part of a patch set for the equivalent RISC-V zisslpcfi feature, which I initially adopted fairly directly but which has been revised quite a bit following review feedback.

There is an open issue with support for CRIU: on x86 this required the ability to set the GCS mode via ptrace. This series supports configuring mode bits other than enable/disable via ptrace but it needs to be confirmed if this is sufficient.

There's a few bits where I'm not convinced with where I've placed things, in particular the GCS write operation is in the GCS header not in uaccess.h. I wasn't sure what was clearest there and am probably too close to the code to have a clear opinion. The reporting of GCS in /proc/PID/smaps is also a bit awkward.

The series depends on the x86 shadow stack support:

   https://lore.kernel.org/lkml/20230227222957.24501-1-rick.p.edgecombe@intel.…

I've rebased this onto v6.5-rc4 but not included it in the series in order to avoid confusion with Rick's work and cut down the size of the series. You can see the branch at:

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc.git arm64-gcs

[1] https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/

Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v4:
- Implement flags for map_shadow_stack() allowing the cap and end of stack marker to be enabled independently or not at all.
- Relax size and alignment requirements for map_shadow_stack().
- Add more blurb explaining the advantages of hardware enforcement.
- Link to v3: https://lore.kernel.org/r/20230731-arm64-gcs-v3-0-cddf9f980d98@kernel.org

Changes in v3:
- Rebase onto v6.5-rc4.
- Add a GCS barrier on context switch.
- Add a GCS stress test.
- Link to v2: https://lore.kernel.org/r/20230724-arm64-gcs-v2-0-dc2c1d44c2eb@kernel.org

Changes in v2:
- Rebase onto v6.5-rc3.
- Rework prctl() interface to allow each bit to be locked independently.
- map_shadow_stack() now places the cap token based on the size requested by the caller not the actual space allocated.
- Mode changes other than enable via ptrace are now supported.
- Expand test coverage.
- Various smaller fixes and adjustments.
- Link to v1: https://lore.kernel.org/r/20230716-arm64-gcs-v1-0-bf567f93bba6@kernel.org

---
Mark Brown (36):
      prctl: arch-agnostic prctl for shadow stack
      arm64: Document boot requirements for Guarded Control Stacks
      arm64/gcs: Document the ABI for Guarded Control Stacks
      arm64/sysreg: Add new system registers for GCS
      arm64/sysreg: Add definitions for architected GCS caps
      arm64/gcs: Add manual encodings of GCS instructions
      arm64/gcs: Provide copy_to_user_gcs()
      arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS)
      arm64/mm: Allocate PIE slots for EL0 guarded control stack
      mm: Define VM_SHADOW_STACK for arm64 when we support GCS
      arm64/mm: Map pages for guarded control stack
      KVM: arm64: Manage GCS registers for guests
      arm64/gcs: Allow GCS usage at EL0 and EL1
      arm64/idreg: Add overrride for GCS
      arm64/hwcap: Add hwcap for GCS
      arm64/traps: Handle GCS exceptions
      arm64/mm: Handle GCS data aborts
      arm64/gcs: Context switch GCS state for EL0
      arm64/gcs: Allocate a new GCS for threads with GCS enabled
      arm64/gcs: Implement shadow stack prctl() interface
      arm64/mm: Implement map_shadow_stack()
      arm64/signal: Set up and restore the GCS context for signal handlers
      arm64/signal: Expose GCS state in signal frames
      arm64/ptrace: Expose GCS via ptrace and core files
      arm64: Add Kconfig for Guarded Control Stack (GCS)
      kselftest/arm64: Verify the GCS hwcap
      kselftest/arm64: Add GCS as a detected feature in the signal tests
      kselftest/arm64: Add framework support for GCS to signal handling tests
      kselftest/arm64: Allow signals tests to specify an expected si_code
      kselftest/arm64: Always run signals tests with GCS enabled
      kselftest/arm64: Add very basic GCS test program
      kselftest/arm64: Add a GCS test program built with the system libc
      kselftest/arm64: Add test coverage for GCS mode locking
      selftests/arm64: Add GCS signal tests
      kselftest/arm64: Add a GCS stress test
      kselftest/arm64: Enable GCS for the FP stress tests

 Documentation/admin-guide/kernel-parameters.txt | 3 +
 Documentation/arch/arm64/booting.rst | 22 +
 Documentation/arch/arm64/elf_hwcaps.rst | 3 +
 Documentation/arch/arm64/gcs.rst | 228 +++++++++
 Documentation/arch/arm64/index.rst | 1 +
 Documentation/filesystems/proc.rst | 2 +-
 arch/arm64/Kconfig | 19 +
 arch/arm64/include/asm/cpufeature.h | 6 +
 arch/arm64/include/asm/el2_setup.h | 17 +
 arch/arm64/include/asm/esr.h | 28 +-
 arch/arm64/include/asm/exception.h | 2 +
 arch/arm64/include/asm/gcs.h | 106 ++++
 arch/arm64/include/asm/hwcap.h | 1 +
 arch/arm64/include/asm/kvm_arm.h | 4 +-
 arch/arm64/include/asm/kvm_host.h | 12 +
 arch/arm64/include/asm/pgtable-prot.h | 14 +-
 arch/arm64/include/asm/processor.h | 7 +
 arch/arm64/include/asm/sysreg.h | 20 +
 arch/arm64/include/asm/uaccess.h | 42 ++
 arch/arm64/include/uapi/asm/hwcap.h | 1 +
 arch/arm64/include/uapi/asm/ptrace.h | 8 +
 arch/arm64/include/uapi/asm/sigcontext.h | 9 +
 arch/arm64/kernel/cpufeature.c | 19 +
 arch/arm64/kernel/cpuinfo.c | 1 +
 arch/arm64/kernel/entry-common.c | 23 +
 arch/arm64/kernel/idreg-override.c | 2 +
 arch/arm64/kernel/process.c | 85 ++++
 arch/arm64/kernel/ptrace.c | 59 +++
 arch/arm64/kernel/signal.c | 237 ++++++++-
 arch/arm64/kernel/traps.c | 11 +
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 17 +
 arch/arm64/kvm/sys_regs.c | 22 +
 arch/arm64/mm/Makefile | 1 +
 arch/arm64/mm/fault.c | 78 ++-
 arch/arm64/mm/gcs.c | 234 +++++++++
 arch/arm64/mm/mmap.c | 12 +-
 arch/arm64/tools/cpucaps | 1 +
 arch/arm64/tools/sysreg | 55 +++
 fs/proc/task_mmu.c | 3 +
 include/linux/mm.h | 16 +-
 include/linux/syscalls.h | 1 +
 include/uapi/asm-generic/unistd.h | 5 +-
 include/uapi/linux/elf.h | 1 +
 include/uapi/linux/prctl.h | 22 +
 kernel/sys.c | 30 ++
 kernel/sys_ni.c | 1 +
 tools/testing/selftests/arm64/Makefile | 2 +-
 tools/testing/selftests/arm64/abi/hwcap.c | 19 +
 tools/testing/selftests/arm64/fp/assembler.h | 15 +
 tools/testing/selftests/arm64/fp/fpsimd-test.S | 2 +
 tools/testing/selftests/arm64/fp/sve-test.S | 2 +
 tools/testing/selftests/arm64/fp/za-test.S | 2 +
 tools/testing/selftests/arm64/fp/zt-test.S | 2 +
 tools/testing/selftests/arm64/gcs/.gitignore | 5 +
 tools/testing/selftests/arm64/gcs/Makefile | 24 +
 tools/testing/selftests/arm64/gcs/asm-offsets.h | 0
 tools/testing/selftests/arm64/gcs/basic-gcs.c | 356 ++++++++++++++
 tools/testing/selftests/arm64/gcs/gcs-locking.c | 200 ++++++++
 .../selftests/arm64/gcs/gcs-stress-thread.S | 311 ++++++++++++
 tools/testing/selftests/arm64/gcs/gcs-stress.c | 532 +++++++++++++++++++++
 tools/testing/selftests/arm64/gcs/gcs-util.h | 87 ++++
 tools/testing/selftests/arm64/gcs/libc-gcs.c | 500 +++++++++++++++++++
 tools/testing/selftests/arm64/signal/.gitignore | 1 +
 .../testing/selftests/arm64/signal/test_signals.c | 17 +-
 .../testing/selftests/arm64/signal/test_signals.h | 6 +
 .../selftests/arm64/signal/test_signals_utils.c | 32 +-
 .../selftests/arm64/signal/test_signals_utils.h | 39 ++
 .../arm64/signal/testcases/gcs_exception_fault.c | 59 +++
 .../selftests/arm64/signal/testcases/gcs_frame.c | 78 +++
 .../arm64/signal/testcases/gcs_write_fault.c | 67 +++
 .../selftests/arm64/signal/testcases/testcases.c | 7 +
 .../selftests/arm64/signal/testcases/testcases.h | 1 +
 72 files changed, 3823 insertions(+), 34 deletions(-)
---
base-commit: ed0e1456f04be7a93c9a186e8e13aed78b555617
change-id: 20230303-arm64-gcs-e311ab0d8729

Best regards,
--
Mark Brown <broonie(a)kernel.org>

