- Linux-kselftest-mirror - lists.linaro.org

[PATCH 0/2] Restrict address space for sv39,sv48,sv57

by Charlie Jenkins

Make sv39 the default address space for mmap as some applications currently depend on this assumption. The RISC-V specification enforces that bits outside of the virtual address range are not used, so restricting the size of the default address space as such should be temporary. A hint address passed to mmap will cause the largest address space that fits entirely into the hint to be used. If the hint is less than or equal to 1<<38, a 39-bit address will be used. After an address space is completely full, the next smallest address space will be used. Documentation is also added to the RISC-V virtual memory section to explain these changes. Charlie Jenkins (2): RISC-V: mm: Restrict address space for sv39,sv48,sv57 RISC-V: mm: Update documentation and include test Documentation/riscv/vm-layout.rst | 20 ++++++++ arch/riscv/include/asm/elf.h | 2 +- arch/riscv/include/asm/pgtable.h | 21 ++++++-- arch/riscv/include/asm/processor.h | 41 +++++++++++++--- tools/testing/selftests/riscv/Makefile | 2 +- tools/testing/selftests/riscv/mm/Makefile | 22 +++++++++ .../selftests/riscv/mm/testcases/mmap.c | 49 +++++++++++++++++++ 7 files changed, 144 insertions(+), 13 deletions(-) create mode 100644 tools/testing/selftests/riscv/mm/Makefile create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap.c base-commit: eef509789cecdce895020682192d32e8bac790e8 -- 2.34.1

2 years

3
5
0 0

[PATCH v4 0/7] Optimize mremap during mutual alignment within PMD

by Joel Fernandes (Google)

Hello! Here is v4 of the mremap start address optimization / fix for exec warning. It took me a while to write a test that catches the issue me/Linus discussed in the last version. And I verified kernel crashes without the check. See below. The main changes in this series is: Care to be taken to move purely within a VMA, in other words this check in call_align_down(): if (vma->vm_start != addr_masked) return false; As an example of why this is needed: Consider the following range which is 2MB aligned and is a part of a larger 10MB range which is not shown. Each character is 256KB below making the source and destination 2MB each. The lower case letters are moved (s to d) and the upper case letters are not moved. |DDDDddddSSSSssss| If we align down 'ssss' to start from the 'SSSS', we will end up destroying SSSS. The above if statement prevents that and I verified it. I also added a test for this in the last patch. History of patches ================== v3->v4: 1. Make sure to check address to align is beginning of VMA 2. Add test to check this (test fails with a kernel crash if we don't do this). v2->v3: 1. Masked address was stored in int, fixed it to unsigned long to avoid truncation. 2. We now handle moves happening purely within a VMA, a new test is added to handle this. 3. More code comments. v1->v2: 1. Trigger the optimization for mremaps smaller than a PMD. I tested by tracing that it works correctly. 2. Fix issue with bogus return value found by Linus if we broke out of the above loop for the first PMD itself. v1: Initial RFC. Description of patches ====================== These patches optimizes the start addresses in move_page_tables() and tests the changes. It addresses a warning [1] that occurs due to a downward, overlapping move on a mutually-aligned offset within a PMD during exec. By initiating the copy process at the PMD level when such alignment is present, we can prevent this warning and speed up the copying process at the same time. Linus Torvalds suggested this idea. Please check the individual patches for more details. thanks, - Joel [1] https://lore.kernel.org/all/ZB2GTBD%2FLWTrkOiO@dhcp22.suse.cz/ Joel Fernandes (Google) (7): mm/mremap: Optimize the start addresses in move_page_tables() mm/mremap: Allow moves within the same VMA for stack selftests: mm: Fix failure case when new remap region was not found selftests: mm: Add a test for mutually aligned moves > PMD size selftests: mm: Add a test for remapping to area immediately after existing mapping selftests: mm: Add a test for remapping within a range selftests: mm: Add a test for moving from an offset from start of mapping fs/exec.c | 2 +- include/linux/mm.h | 2 +- mm/mremap.c | 63 ++++- tools/testing/selftests/mm/mremap_test.c | 301 +++++++++++++++++++---- 4 files changed, 319 insertions(+), 49 deletions(-) -- 2.41.0.rc2.161.g9c6817b8e7-goog

2 years

5
17
0 0

[linux-next:master] BUILD REGRESSION 53cdf865f90ba922a854c65ed05b519f9d728424

by kernel test robot

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master branch HEAD: 53cdf865f90ba922a854c65ed05b519f9d728424 Add linux-next specific files for 20230627 Error/Warning reports: https://lore.kernel.org/oe-kbuild-all/202306122223.HHER4zOo-lkp@intel.com https://lore.kernel.org/oe-kbuild-all/202306212119.fBNByIyn-lkp@intel.com https://lore.kernel.org/oe-kbuild-all/202306260401.qZlYQpV2-lkp@intel.com Error/Warning: (recently discovered and may have been fixed) arch/parisc/kernel/pdt.c:66:6: warning: no previous prototype for 'arch_report_meminfo' [-Wmissing-prototypes] drivers/char/mem.c:164:25: error: implicit declaration of function 'unxlate_dev_mem_ptr'; did you mean 'xlate_dev_mem_ptr'? [-Werror=implicit-function-declaration] drivers/gpu/drm/i915/soc/intel_gmch.c:41:13: warning: variable 'mchbar_addr' set but not used [-Wunused-but-set-variable] drivers/mfd/max77541.c:176:18: warning: cast to smaller integer type 'enum max7754x_ids' from 'const void *' [-Wvoid-pointer-to-enum-cast] lib/kunit/executor_test.c:138:4: warning: cast from 'void (*)(const void *)' to 'kunit_action_t *' (aka 'void (*)(void *)') converts to incompatible function type [-Wcast-function-type-strict] lib/kunit/test.c:775:38: warning: cast from 'void (*)(const void *)' to 'kunit_action_t *' (aka 'void (*)(void *)') converts to incompatible function type [-Wcast-function-type-strict] Unverified Error/Warning (likely false positive, please contact us if interested): drivers/usb/cdns3/cdns3-starfive.c:23: warning: expecting prototype for cdns3(). Prototype was for USB_STRAP_HOST() instead net/wireless/scan.c:373 cfg80211_gen_new_ie() warn: potential spectre issue 'sub->data' [r] net/wireless/scan.c:397 cfg80211_gen_new_ie() warn: possible spectre second half. 'ext_id' {standard input}: Error: local label `"2" (instance number 9 of a fb label)' is not defined {standard input}:1097: Error: pcrel too far {standard input}:13: Error: symbol `__export_symbol_alpha_mv' is already defined Error/Warning ids grouped by kconfigs: gcc_recent_errors |-- alpha-randconfig-r024-20230627 | `-- standard-input:Error:symbol-__export_symbol_alpha_mv-is-already-defined |-- i386-randconfig-m021-20230627 | |-- net-wireless-scan.c-cfg80211_gen_new_ie()-warn:possible-spectre-second-half.-ext_id | `-- net-wireless-scan.c-cfg80211_gen_new_ie()-warn:potential-spectre-issue-sub-data-r |-- parisc-allnoconfig | `-- arch-parisc-kernel-pdt.c:warning:no-previous-prototype-for-arch_report_meminfo |-- parisc-allyesconfig | `-- arch-parisc-kernel-pdt.c:warning:no-previous-prototype-for-arch_report_meminfo |-- parisc-defconfig | `-- arch-parisc-kernel-pdt.c:warning:no-previous-prototype-for-arch_report_meminfo |-- parisc-randconfig-r012-20230627 | `-- arch-parisc-kernel-pdt.c:warning:no-previous-prototype-for-arch_report_meminfo |-- parisc-randconfig-r013-20230627 | `-- arch-parisc-kernel-pdt.c:warning:no-previous-prototype-for-arch_report_meminfo |-- parisc-randconfig-r052-20230627 | `-- arch-parisc-kernel-pdt.c:warning:no-previous-prototype-for-arch_report_meminfo |-- parisc64-defconfig | `-- arch-parisc-kernel-pdt.c:warning:no-previous-prototype-for-arch_report_meminfo |-- riscv-allmodconfig | `-- drivers-usb-cdns3-cdns3-starfive.c:warning:expecting-prototype-for-cdns3().-Prototype-was-for-USB_STRAP_HOST()-instead |-- riscv-allyesconfig | `-- drivers-usb-cdns3-cdns3-starfive.c:warning:expecting-prototype-for-cdns3().-Prototype-was-for-USB_STRAP_HOST()-instead |-- sh-allmodconfig | |-- drivers-char-mem.c:error:implicit-declaration-of-function-unxlate_dev_mem_ptr | |-- standard-input:Error:local-label-(instance-number-of-a-fb-label)-is-not-defined | `-- standard-input:Error:pcrel-too-far |-- sh-j2_defconfig | `-- drivers-char-mem.c:error:implicit-declaration-of-function-unxlate_dev_mem_ptr |-- sh-randconfig-r015-20230627 | `-- drivers-char-mem.c:error:implicit-declaration-of-function-unxlate_dev_mem_ptr |-- sh-rsk7203_defconfig | `-- drivers-char-mem.c:error:implicit-declaration-of-function-unxlate_dev_mem_ptr `-- x86_64-buildonly-randconfig-r001-20230627 `-- drivers-gpu-drm-i915-soc-intel_gmch.c:warning:variable-mchbar_addr-set-but-not-used clang_recent_errors |-- arm-randconfig-r031-20230627 | `-- lib-kunit-test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type |-- hexagon-randconfig-r041-20230627 | |-- lib-kunit-executor_test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type | `-- lib-kunit-test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type |-- powerpc-randconfig-r023-20230627 | |-- lib-kunit-executor_test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type | `-- lib-kunit-test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type |-- riscv-randconfig-r042-20230627 | `-- lib-kunit-test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type `-- x86_64-randconfig-r011-20230627 `-- drivers-mfd-max77541.c:warning:cast-to-smaller-integer-type-enum-max7754x_ids-from-const-void elapsed time: 725m configs tested: 137 configs skipped: 7 tested configs: alpha allyesconfig gcc alpha defconfig gcc alpha randconfig-r004-20230627 gcc arc allyesconfig gcc arc defconfig gcc arc randconfig-r043-20230627 gcc arm allmodconfig gcc arm allyesconfig gcc arm assabet_defconfig gcc arm bcm2835_defconfig clang arm defconfig gcc arm lpc32xx_defconfig clang arm multi_v7_defconfig gcc arm pxa910_defconfig gcc arm randconfig-r031-20230627 clang arm randconfig-r035-20230627 clang arm randconfig-r046-20230627 gcc arm spear13xx_defconfig clang arm versatile_defconfig clang arm64 alldefconfig gcc arm64 allyesconfig gcc arm64 defconfig gcc arm64 randconfig-r003-20230627 gcc csky defconfig gcc hexagon randconfig-r041-20230627 clang hexagon randconfig-r045-20230627 clang i386 allyesconfig gcc i386 buildonly-randconfig-r004-20230627 gcc i386 buildonly-randconfig-r005-20230627 gcc i386 buildonly-randconfig-r006-20230627 gcc i386 debian-10.3 gcc i386 defconfig gcc i386 randconfig-i001-20230627 gcc i386 randconfig-i002-20230627 gcc i386 randconfig-i003-20230627 gcc i386 randconfig-i004-20230627 gcc i386 randconfig-i005-20230627 gcc i386 randconfig-i006-20230627 gcc i386 randconfig-i011-20230627 clang i386 randconfig-i012-20230627 clang i386 randconfig-i013-20230627 clang i386 randconfig-i014-20230627 clang i386 randconfig-i015-20230627 clang i386 randconfig-i016-20230627 clang loongarch allmodconfig gcc loongarch allnoconfig gcc loongarch defconfig gcc loongarch randconfig-r024-20230627 gcc loongarch randconfig-r032-20230627 gcc m68k allmodconfig gcc m68k allyesconfig gcc m68k defconfig gcc m68k stmark2_defconfig gcc m68k sun3x_defconfig gcc microblaze defconfig gcc mips allmodconfig gcc mips allyesconfig gcc mips mtx1_defconfig clang mips randconfig-r022-20230627 gcc mips xway_defconfig gcc nios2 defconfig gcc nios2 randconfig-r014-20230627 gcc nios2 randconfig-r026-20230627 gcc openrisc randconfig-r005-20230627 gcc openrisc randconfig-r016-20230627 gcc parisc allyesconfig gcc parisc defconfig gcc parisc randconfig-r012-20230627 gcc parisc randconfig-r013-20230627 gcc parisc64 defconfig gcc powerpc allmodconfig gcc powerpc allnoconfig gcc powerpc klondike_defconfig gcc powerpc makalu_defconfig gcc powerpc mpc8315_rdb_defconfig clang powerpc mpc834x_itxgp_defconfig clang powerpc mpc836x_rdk_defconfig clang powerpc mvme5100_defconfig clang powerpc pasemi_defconfig gcc powerpc randconfig-r023-20230627 clang powerpc tqm8540_defconfig clang powerpc tqm8548_defconfig gcc riscv allmodconfig gcc riscv allnoconfig gcc riscv allyesconfig gcc riscv defconfig gcc riscv randconfig-r042-20230627 clang riscv rv32_defconfig gcc s390 allmodconfig gcc s390 allyesconfig gcc s390 defconfig gcc s390 randconfig-r021-20230627 clang s390 randconfig-r044-20230627 clang sh allmodconfig gcc sh dreamcast_defconfig gcc sh j2_defconfig gcc sh r7780mp_defconfig gcc sh randconfig-r015-20230627 gcc sh rsk7203_defconfig gcc sh rts7751r2dplus_defconfig gcc sh sdk7780_defconfig gcc sh se7751_defconfig gcc sparc allyesconfig gcc sparc defconfig gcc sparc64 randconfig-r001-20230627 gcc sparc64 randconfig-r025-20230627 gcc um allmodconfig clang um allnoconfig clang um allyesconfig clang um defconfig gcc um i386_defconfig gcc um x86_64_defconfig gcc x86_64 allyesconfig gcc x86_64 buildonly-randconfig-r001-20230627 gcc x86_64 buildonly-randconfig-r002-20230627 gcc x86_64 buildonly-randconfig-r003-20230627 gcc x86_64 defconfig gcc x86_64 kexec gcc x86_64 randconfig-r011-20230627 clang x86_64 randconfig-x001-20230627 clang x86_64 randconfig-x002-20230627 clang x86_64 randconfig-x003-20230627 clang x86_64 randconfig-x004-20230627 clang x86_64 randconfig-x005-20230627 clang x86_64 randconfig-x006-20230627 clang x86_64 randconfig-x011-20230627 gcc x86_64 randconfig-x012-20230627 gcc x86_64 randconfig-x013-20230627 gcc x86_64 randconfig-x014-20230627 gcc x86_64 randconfig-x015-20230627 gcc x86_64 randconfig-x016-20230627 gcc x86_64 rhel-8.3-rust clang x86_64 rhel-8.3 gcc xtensa alldefconfig gcc xtensa common_defconfig gcc xtensa randconfig-r002-20230627 gcc xtensa randconfig-r006-20230627 gcc -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki

2 years

1
0
0 0

[GIT PULL] KUnit next update for Linux 6.5-rc1

by Shuah Khan

Hi Linus, Please pull the following KUnit next update for Linux 6.5-rc1. This KUnit update for Linux 6.5-rc1 consists of: - kunit_add_action() API to defer a call until test exit. - Update document to add kunit_add_action() usage notes. - Changes to always run cleanup from a test kthread. - Documentation updates to clarify cleanup usage - assertions should not be used in cleanup - Documentation update to clearly indicate that exit functions should run even if init fails - Several fixes and enhancements to existing tests. diff is attached. thanks, -- Shuah ---------------------------------------------------------------- The following changes since commit ac9a78681b921877518763ba0e89202254349d1b: Linux 6.4-rc1 (2023-05-07 13:34:35 -0700) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux-kselftest-kunit-6.5-rc1 for you to fetch changes up to 2e66833579ed759d7b7da1a8f07eb727ec6e80db: MAINTAINERS: Add source tree entry for kunit (2023-06-15 09:16:01 -0600) ---------------------------------------------------------------- linux-kselftest-kunit-6.5-rc1 This KUnit update for Linux 6.5-rc1 consists of: - kunit_add_action() API to defer a call until test exit. - Update document to add kunit_add_action() usage notes. - Changes to always run cleanup from a test kthread. - Documentation updates to clarify cleanup usage - assertions should not be used in cleanup - Documentation update to clearly indicate that exit functions should run even if init fails - Several fixes and enhancements to existing tests. ---------------------------------------------------------------- Daniel Latypov (1): kunit: tool: undo type subscripts for subprocess.Popen David Gow (11): kunit: Always run cleanup from a test kthread Documentation: kunit: Note that assertions should not be used in cleanup Documentation: kunit: Warn that exit functions run even if init fails kunit: example: Provide example exit functions kunit: Add kunit_add_action() to defer a call until test exit kunit: executor_test: Use kunit_add_action() kunit: kmalloc_array: Use kunit_add_action() Documentation: kunit: Add usage notes for kunit_add_action() kunit: Fix obsolete name in documentation headers (func->action) kunit: Move kunit_abort() call out of kunit_do_failed_assertion() Documentation: kunit: Rename references to kunit_abort() Geert Uytterhoeven (1): Documentation: kunit: Modular tests should not depend on KUNIT=y Michal Wajdeczko (3): kunit/test: Add example test showing parameterized testing kunit: Fix reporting of the skipped parameterized tests kunit: Update kunit_print_ok_not_ok function SeongJae Park (1): MAINTAINERS: Add source tree entry for kunit Takashi Sakamoto (1): Documentation: Kunit: add MODULE_LICENSE to sample code Documentation/dev-tools/kunit/architecture.rst | 4 +- Documentation/dev-tools/kunit/start.rst | 7 +- Documentation/dev-tools/kunit/usage.rst | 69 ++++++++++- MAINTAINERS | 2 + include/kunit/resource.h | 92 +++++++++++++++ include/kunit/test.h | 34 ++++-- lib/kunit/executor_test.c | 11 +- lib/kunit/kunit-example-test.c | 56 +++++++++ lib/kunit/kunit-test.c | 88 +++++++++++++- lib/kunit/resource.c | 99 ++++++++++++++++ lib/kunit/test.c | 157 ++++++++++++++----------- tools/testing/kunit/kunit_kernel.py | 6 +- tools/testing/kunit/mypy.ini | 6 + tools/testing/kunit/run_checks.py | 2 +- 14 files changed, 538 insertions(+), 95 deletions(-) create mode 100644 tools/testing/kunit/mypy.ini ----------------------------------------------------------------

2 years

2
1
0 0

[PATCH 0/4] rseq selftests updates

by Mathieu Desnoyers

Hi Shuah, This series contains updates to the rseq selftests. * A typo in the Makefile prevents the basic_percpu_ops_mm_cid_test to use the mm_cid field. * Fix load-acquire/store-release macros which were buggy on arm64. (this depends on commit "Implement rseq_unqual_scalar_typeof"). * The change "Use rseq_unqual_scalar_typeof in macros" is not a fix per se, but improves the assembler generated. Can you pick these in the selftests tree please ? Thanks, Mathieu Mathieu Desnoyers (4): selftests/rseq: Fix CID_ID typo in Makefile selftests/rseq: Implement rseq_unqual_scalar_typeof selftests/rseq: Fix arm64 buggy load-acquire/store-release macros selftests/rseq: Use rseq_unqual_scalar_typeof in macros tools/testing/selftests/rseq/Makefile | 2 +- tools/testing/selftests/rseq/compiler.h | 26 ++++++++++ tools/testing/selftests/rseq/rseq-arm.h | 4 +- tools/testing/selftests/rseq/rseq-arm64.h | 58 ++++++++++++----------- tools/testing/selftests/rseq/rseq-mips.h | 4 +- tools/testing/selftests/rseq/rseq-ppc.h | 4 +- tools/testing/selftests/rseq/rseq-riscv.h | 6 +-- tools/testing/selftests/rseq/rseq-s390.h | 4 +- tools/testing/selftests/rseq/rseq-x86.h | 4 +- 9 files changed, 70 insertions(+), 42 deletions(-) -- 2.25.1

2 years

1
4
0 0

[PATCH bpf-next v3 0/7] Add SO_REUSEPORT support for TC bpf_sk_assign

by Lorenz Bauer

We want to replace iptables TPROXY with a BPF program at TC ingress. To make this work in all cases we need to assign a SO_REUSEPORT socket to an skb, which is currently prohibited. This series adds support for such sockets to bpf_sk_assing. I did some refactoring to cut down on the amount of duplicate code. The key to this is to use INDIRECT_CALL in the reuseport helpers. To show that this approach is not just beneficial to TC sk_assign I removed duplicate code for bpf_sk_lookup as well. Changes from v1: - Correct commit abbrev length (Kuniyuki) - Reduce duplication (Kuniyuki) - Add checks on sk_state (Martin) - Split exporting inet[6]_lookup_reuseport into separate patch (Eric) Joint work with Daniel Borkmann. Signed-off-by: Lorenz Bauer <lmb(a)isovalent.com> --- Changes in v3: - Fix warning re udp_ehashfn and udp6_ehashfn (Simon) - Return higher scoring connected UDP reuseport sockets (Kuniyuki) - Fix ipv6 module builds - Link to v2: https://lore.kernel.org/r/20230613-so-reuseport-v2-0-b7c69a342613@isovalent… --- Daniel Borkmann (1): selftests/bpf: Test that SO_REUSEPORT can be used with sk_assign helper Lorenz Bauer (6): udp: re-score reuseport groups when connected sockets are present net: export inet_lookup_reuseport and inet6_lookup_reuseport net: document inet[6]_lookup_reuseport sk_state requirements net: remove duplicate reuseport_lookup functions net: remove duplicate sk_lookup helpers bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign include/net/inet6_hashtables.h | 84 ++++++++- include/net/inet_hashtables.h | 77 +++++++- include/net/sock.h | 7 +- include/net/udp.h | 8 + include/uapi/linux/bpf.h | 3 - net/core/filter.c | 2 - net/ipv4/inet_hashtables.c | 70 +++++--- net/ipv4/udp.c | 88 ++++----- net/ipv6/inet6_hashtables.c | 73 +++++--- net/ipv6/udp.c | 98 ++++------ tools/include/uapi/linux/bpf.h | 3 - tools/testing/selftests/bpf/network_helpers.c | 3 + .../selftests/bpf/prog_tests/assign_reuse.c | 197 +++++++++++++++++++++ .../selftests/bpf/progs/test_assign_reuse.c | 142 +++++++++++++++ 14 files changed, 676 insertions(+), 179 deletions(-) --- base-commit: 970308a7b544fa1c7ee98a2721faba3765be8dd8 change-id: 20230613-so-reuseport-e92c526173ee Best regards, -- Lorenz Bauer <lmb(a)isovalent.com>

2 years

3
16
0 0

[PATCH] selftests: timers: set-timer-lat: Remove unneeded semicolon

by baomingtong001＠208suo.com

./tools/testing/selftests/timers/set-timer-lat.c:83:2-3: Unneeded semicolon Signed-off-by: Mingtong Bao <baomingtong001(a)208suo.com> --- tools/testing/selftests/timers/set-timer-lat.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/timers/set-timer-lat.c b/tools/testing/selftests/timers/set-timer-lat.c index 50da45437daa..d60bbcad487f 100644 --- a/tools/testing/selftests/timers/set-timer-lat.c +++ b/tools/testing/selftests/timers/set-timer-lat.c @@ -80,7 +80,7 @@ char *clockstring(int clockid) return "CLOCK_BOOTTIME_ALARM"; case CLOCK_TAI: return "CLOCK_TAI"; - }; + } return "UNKNOWN_CLOCKID"; } -- 2.40.1

2 years

2
1
0 0

[PATCH v3 0/9] cgroup/cpuset: Support remote partitions

by Waiman Long

v3: - [v2] https://lore.kernel.org/lkml/20230531163405.2200292-1-longman@redhat.com/ - Change the new control file from root-only "cpuset.cpus.reserve" to non-root "cpuset.cpus.exclusive" which lists the set of exclusive CPUs distributed down the hierarchy. - Add a patch to restrict boot-time isolated CPUs to isolated partitions only. - Update the test_cpuset_prs.sh test script and documentation accordingly. v2: - [v1] https://lore.kernel.org/lkml/20230412153758.3088111-1-longman@redhat.com/ - Dropped the special "isolcpus" partition in v1 - Add the root only "cpuset.cpus.reserve" control file for reserving CPUs used for remote isolated partitions. - Update the test_cpuset_prs.sh test script and documentation accordingly. This patch series introduces a new cpuset control file "cpuset.cpus.exclusive" which must be a subset of "cpuset.cpus" and the parent's "cpuset.cpus.exclusive". This control file lists the exclusive CPUs to be distributed down the hierarchy. Any one of the exclusive CPUs can only be distributed to at most one child cpuset. Unlike "cpuset.cpus", invalid input to "cpuset.cpus.exclusive" will be rejected with an error. This new control file has no effect on the behavior of the cpuset until it turns into a partition root. At that point, its effective CPUs will be set to its exclusive CPUs unless some of them are offline. This patch series also introduces a new category of cpuset partition called remote partitions. The existing partition category where the partition roots have to be clustered around the root cgroup in a hierarchical way is now referred to as local partitions. A remote partition can be formed far from the root cgroup with no partition root parent. While local partitions can be created without touching "cpuset.cpus.exclusive" as it can be set automatically if a cpuset becomes a local partition root. Properly set "cpuset.cpus.exclusive" values down the hierarchy are required to create a remote partition. Both scheduling and isolated partitions can be formed in a remote partition. A local partition can be created under a remote partition. A remote partition, however, cannot be formed under a local partition for now. Modern container orchestration tools like Kubernetes use the cgroup hierarchy to manage different containers. And it is relying on other middleware like systemd to help managing it. If a container needs to use isolated CPUs, it is hard to get those with the local partitions as it will require the administrative parent cgroup to be a partition root too which tool like systemd may not be ready to manage. With this patch series, we allow the creation of remote partition far from the root. The container management tool can manage the "cpuset.cpus.exclusive" file without impacting the other cpuset files that are managed by other middlewares. Of course, invalid "cpuset.cpus.exclusive" values will be rejected and changes to "cpuset.cpus" can affect the value of "cpuset.cpus.exclusive" due to the requirement that it has to be a subset of the former control file. Waiman Long (9): cgroup/cpuset: Inherit parent's load balance state in v2 cgroup/cpuset: Extract out CS_CPU_EXCLUSIVE & CS_SCHED_LOAD_BALANCE handling cgroup/cpuset: Improve temporary cpumasks handling cgroup/cpuset: Allow suppression of sched domain rebuild in update_cpumasks_hier() cgroup/cpuset: Add cpuset.cpus.exclusive for v2 cgroup/cpuset: Introduce remote partition cgroup/cpuset: Check partition conflict with housekeeping setup cgroup/cpuset: Documentation update for partition cgroup/cpuset: Extend test_cpuset_prs.sh to test remote partition Documentation/admin-guide/cgroup-v2.rst | 100 +- kernel/cgroup/cpuset.c | 1352 ++++++++++++----- .../selftests/cgroup/test_cpuset_prs.sh | 398 +++-- 3 files changed, 1297 insertions(+), 553 deletions(-) -- 2.31.1

2 years

2
11
0 0

[linux-next:master] BUILD REGRESSION 60e7c4a25da68cd826719b685babbd23e73b85b0

by kernel test robot

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master branch HEAD: 60e7c4a25da68cd826719b685babbd23e73b85b0 Add linux-next specific files for 20230626 Error/Warning reports: https://lore.kernel.org/oe-kbuild-all/202306122223.HHER4zOo-lkp@intel.com https://lore.kernel.org/oe-kbuild-all/202306262024.HUr6WfyW-lkp@intel.com Error/Warning: (recently discovered and may have been fixed) arch/parisc/kernel/pdt.c:66:6: warning: no previous prototype for 'arch_report_meminfo' [-Wmissing-prototypes] drivers/char/mem.c:164:25: error: implicit declaration of function 'unxlate_dev_mem_ptr'; did you mean 'xlate_dev_mem_ptr'? [-Werror=implicit-function-declaration] drivers/gpu/drm/i915/display/intel_display_power.h:255:70: error: declaration of 'struct seq_file' will not be visible outside of this function [-Werror,-Wvisibility] drivers/gpu/drm/i915/display/intel_display_power.h:256:70: error: declaration of 'struct seq_file' will not be visible outside of this function [-Werror,-Wvisibility] lib/kunit/executor_test.c:138:4: warning: cast from 'void (*)(const void *)' to 'kunit_action_t *' (aka 'void (*)(void *)') converts to incompatible function type [-Wcast-function-type-strict] lib/kunit/test.c:775:38: warning: cast from 'void (*)(const void *)' to 'kunit_action_t *' (aka 'void (*)(void *)') converts to incompatible function type [-Wcast-function-type-strict] Unverified Error/Warning (likely false positive, please contact us if interested): drivers/usb/cdns3/cdns3-starfive.c:23: warning: expecting prototype for cdns3(). Prototype was for USB_STRAP_HOST() instead fs/btrfs/volumes.c:6407 btrfs_map_block() error: we previously assumed 'mirror_num_ret' could be null (see line 6244) lib/kunit/test.c:336 __kunit_abort() warn: ignoring unreachable code. {standard input}: Error: local label `"2" (instance number 9 of a fb label)' is not defined {standard input}:1097: Error: pcrel too far Error/Warning ids grouped by kconfigs: gcc_recent_errors |-- i386-randconfig-m021-20230625 | `-- fs-btrfs-volumes.c-btrfs_map_block()-error:we-previously-assumed-mirror_num_ret-could-be-null-(see-line-) |-- parisc-allyesconfig | `-- arch-parisc-kernel-pdt.c:warning:no-previous-prototype-for-arch_report_meminfo |-- parisc-defconfig | `-- arch-parisc-kernel-pdt.c:warning:no-previous-prototype-for-arch_report_meminfo |-- parisc-generic-64bit_defconfig | `-- arch-parisc-kernel-pdt.c:warning:no-previous-prototype-for-arch_report_meminfo |-- parisc-randconfig-r012-20230626 | `-- arch-parisc-kernel-pdt.c:warning:no-previous-prototype-for-arch_report_meminfo |-- parisc-randconfig-r026-20230626 | `-- arch-parisc-kernel-pdt.c:warning:no-previous-prototype-for-arch_report_meminfo |-- parisc-randconfig-r053-20230625 | `-- arch-parisc-kernel-pdt.c:warning:no-previous-prototype-for-arch_report_meminfo |-- parisc64-defconfig | `-- arch-parisc-kernel-pdt.c:warning:no-previous-prototype-for-arch_report_meminfo |-- powerpc-randconfig-m041-20230625 | `-- lib-kunit-test.c-__kunit_abort()-warn:ignoring-unreachable-code. |-- riscv-allmodconfig | `-- drivers-usb-cdns3-cdns3-starfive.c:warning:expecting-prototype-for-cdns3().-Prototype-was-for-USB_STRAP_HOST()-instead |-- riscv-allyesconfig | `-- drivers-usb-cdns3-cdns3-starfive.c:warning:expecting-prototype-for-cdns3().-Prototype-was-for-USB_STRAP_HOST()-instead |-- sh-allmodconfig | |-- drivers-char-mem.c:error:implicit-declaration-of-function-unxlate_dev_mem_ptr | |-- standard-input:Error:local-label-(instance-number-of-a-fb-label)-is-not-defined | `-- standard-input:Error:pcrel-too-far |-- sh-rsk7201_defconfig | `-- drivers-char-mem.c:error:implicit-declaration-of-function-unxlate_dev_mem_ptr |-- sh-se7619_defconfig | `-- drivers-char-mem.c:error:implicit-declaration-of-function-unxlate_dev_mem_ptr `-- sparc64-randconfig-r053-20230625 `-- net-bluetooth-hci_conn.c:WARNING-opportunity-for-kmemdup clang_recent_errors |-- arm-randconfig-r016-20230626 | |-- lib-kunit-executor_test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type | `-- lib-kunit-test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type |-- hexagon-randconfig-r041-20230626 | |-- lib-kunit-executor_test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type | `-- lib-kunit-test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type |-- i386-randconfig-i004-20230626 | `-- drivers-gpu-drm-i915-display-intel_display_power.h:error:declaration-of-struct-seq_file-will-not-be-visible-outside-of-this-function-Werror-Wvisibility |-- um-randconfig-r024-20230626 | |-- lib-kunit-executor_test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type | `-- lib-kunit-test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type `-- x86_64-randconfig-r004-20230626 `-- drivers-gpu-drm-i915-display-intel_display_power.h:error:declaration-of-struct-seq_file-will-not-be-visible-outside-of-this-function-Werror-Wvisibility elapsed time: 1076m configs tested: 124 configs skipped: 8 tested configs: alpha allyesconfig gcc alpha defconfig gcc alpha randconfig-r002-20230626 gcc alpha randconfig-r036-20230626 gcc arc allyesconfig gcc arc defconfig gcc arc nsim_700_defconfig gcc arc randconfig-r023-20230626 gcc arc randconfig-r043-20230626 gcc arm allmodconfig gcc arm allyesconfig gcc arm aspeed_g4_defconfig clang arm defconfig gcc arm footbridge_defconfig gcc arm h3600_defconfig gcc arm imx_v4_v5_defconfig clang arm ixp4xx_defconfig clang arm multi_v7_defconfig gcc arm mxs_defconfig clang arm netwinder_defconfig clang arm randconfig-r003-20230626 gcc arm randconfig-r006-20230626 gcc arm randconfig-r016-20230626 clang arm randconfig-r034-20230626 gcc arm randconfig-r046-20230626 clang arm s5pv210_defconfig clang arm spear3xx_defconfig clang arm64 allyesconfig gcc arm64 defconfig gcc arm64 randconfig-r031-20230626 clang csky defconfig gcc csky randconfig-r013-20230626 gcc hexagon randconfig-r041-20230626 clang hexagon randconfig-r045-20230626 clang i386 alldefconfig gcc i386 allyesconfig gcc i386 buildonly-randconfig-r004-20230626 clang i386 buildonly-randconfig-r005-20230626 clang i386 buildonly-randconfig-r006-20230626 clang i386 debian-10.3 gcc i386 defconfig gcc i386 randconfig-i001-20230626 clang i386 randconfig-i002-20230626 clang i386 randconfig-i003-20230626 clang i386 randconfig-i004-20230626 clang i386 randconfig-i005-20230626 clang i386 randconfig-i006-20230626 clang i386 randconfig-i011-20230626 gcc i386 randconfig-i012-20230626 gcc i386 randconfig-i013-20230626 gcc i386 randconfig-i014-20230626 gcc i386 randconfig-i015-20230626 gcc i386 randconfig-i016-20230626 gcc loongarch allmodconfig gcc loongarch allnoconfig gcc loongarch defconfig gcc m68k allmodconfig gcc m68k allyesconfig gcc m68k defconfig gcc m68k hp300_defconfig gcc m68k randconfig-r011-20230626 gcc m68k randconfig-r015-20230626 gcc microblaze randconfig-r005-20230626 gcc mips allmodconfig gcc mips allyesconfig gcc mips ath79_defconfig clang mips ip22_defconfig clang mips loongson2k_defconfig clang mips malta_defconfig clang mips malta_qemu_32r6_defconfig clang mips maltaaprp_defconfig clang mips randconfig-r035-20230626 gcc nios2 defconfig gcc nios2 randconfig-r032-20230626 gcc openrisc defconfig gcc openrisc simple_smp_defconfig gcc parisc allyesconfig gcc parisc defconfig gcc parisc generic-64bit_defconfig gcc parisc randconfig-r012-20230626 gcc parisc randconfig-r026-20230626 gcc parisc64 defconfig gcc powerpc allmodconfig gcc powerpc allnoconfig gcc powerpc arches_defconfig gcc powerpc currituck_defconfig gcc powerpc pcm030_defconfig gcc powerpc randconfig-r021-20230626 gcc powerpc socrates_defconfig clang riscv alldefconfig clang riscv allmodconfig gcc riscv allnoconfig gcc riscv allyesconfig gcc riscv defconfig gcc riscv randconfig-r042-20230626 gcc riscv rv32_defconfig gcc s390 allmodconfig gcc s390 allyesconfig gcc s390 defconfig gcc s390 randconfig-r044-20230626 gcc sh allmodconfig gcc sh defconfig gcc sh rsk7201_defconfig gcc sh se7619_defconfig gcc sh sh7785lcr_32bit_defconfig gcc sparc allyesconfig gcc sparc defconfig gcc um allmodconfig clang um allnoconfig clang um allyesconfig clang um defconfig gcc um i386_defconfig gcc um randconfig-r024-20230626 clang um x86_64_defconfig gcc x86_64 allyesconfig gcc x86_64 buildonly-randconfig-r001-20230626 clang x86_64 buildonly-randconfig-r002-20230626 clang x86_64 buildonly-randconfig-r003-20230626 clang x86_64 defconfig gcc x86_64 kexec gcc x86_64 randconfig-r022-20230626 gcc x86_64 rhel-8.3-rust clang x86_64 rhel-8.3 gcc xtensa generic_kc705_defconfig gcc -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki

2 years

1
0
0 0

arm64: fp-stress: BUG: KFENCE: memory corruption in fpsimd_release_task

by Naresh Kamboju

Following kernel BUG noticed while running selftests arm64 fp-stress running stable rc kernel versions 6.1.29-rc1 and 6.3.3-rc1. Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org> # selftests: arm64: fp-stress # TAP version 13 # 1..80 # # 8 CPUs, 3 SVE VLs, 3 SME VLs, SME2 absent # # Will run for 10s ... # # ZA-VL-32-4: PID: 1091 # # [ 263.834190] ================================================================== [ 263.834270] BUG: KFENCE: memory corruption in fpsimd_release_task+0x28/0x50 [ 263.834270] ZA-V[ 263.834419] Corrupted memory at 0x00000000d9c0a375 [ ! ! ! ! ! ! . . . . . . . . . . ] (in kfence-#158): L-64-[ 263.834929] fpsimd_release_task+0x28/0x50 [ 263.835074] arch_release_task_struct+0x1c/0x30 [ 263.835221] __put_task_struct+0x164/0x220 [ 263.835336] delayed_put_task_struct+0x60/0x128 4: [ 263.835484] rcu_core+0x318/0x950 [ 263.835632] rcu_core_si+0x1c/0x30 [ 263.835770] __do_softirq+0x110/0x3d8 Stre[ 263.835874] run_ksoftirqd+0x40/0xe0 [ 263.835994] smpboot_thread_fn+0x1d0/0x260 [ 263.836105] kthread+0xec/0x190 [ 263.836221] ret_from_fork+0x10/0x20 [ 263.836342] ami[ 263.836393] kfence-#158: 0x00000000c8819329-0x000000009e00cc22, size=546, cache=kmalloc-1k [ 263.836393] [ 263.836527] allocated by task 1112 on cpu 5 at 252.422888s: [ 263.836697] do_sme_acc+0xa8/0x230 ng m[ 263.836821] el0_sme_acc+0x40/0xa0 [ 263.836966] el0t_64_sync_handler+0xa8/0xf0 [ 263.837114] el0t_64_sync+0x190/0x198 [ 263.837224] ode[ 263.837275] freed by task 15 on cpu 0 at 263.833793s: [ 263.837500] fpsimd_release_task+0x28/0x50 [ 263.837629] arch_release_task_struct+0x1c/0x30 ve[ 263.837773] __put_task_struct+0x164/0x220 [ 263.837886] delayed_put_task_struct+0x60/0x128 [ 263.838032] rcu_core+0x318/0x950 cto[ 263.838176] rcu_core_si+0x1c/0x30 [ 263.838310] __do_softirq+0x110/0x3d8 [ 263.838417] run_ksoftirqd+0x40/0xe0 [ 263.838521] smpboot_thread_fn+0x1d0/0x260 [ 263.838626] kthread+0xec/0x190 [ 263.838742] ret_from_fork+0x10/0x20 [ 263.838861] [ 263.838913] CPU: 0 PID: 15 Comm: ksoftirqd/0 Not tainted 6.3.3-rc1 #1 [ 263.839037] Hardware name: FVP Base RevC (DT) [ 263.839111] ================================================================== r length: 512 bits # # ZA-VL-64-4: PID: 1089 # # SSVE-VL-64-4: Streaming mode Vector length: 512 bits # # SSVE-VL-64-4: PID: 1088 # # ZA-VL-16-4: Streaming mode vector length: 128 bits # # ZA-VL-16-4: PID: 1093 # # FPSIMD-5-0: Vector length: 128 bits # # FPSIMD-5-0: PID: 1094 # # SVE-VL-32-5: Vector length: 256 bits # # SVE-VL-32-5: PID: 1096 # # SSVE-VL-64-5: Streaming mode Vector length: 512 bits # # SVE-VL-64-5: Vector length: 512 bits # # SVE-VL-64-5: PID: 1095 # # SSVE-VL-64-5: PID: 1098 # # ZA-VL-64-5:[ 263.905145] ================================================================== [ 263.905299] BUG: KFENCE: memory corruption in fpsimd_release_task+0x28/0x50 [ 263.905299] Str[ 263.905444] Corrupted memory at 0x00000000e3d2342a [ ! ! ! ! ! ! . . . . . . . . . . ] (in kfence-#146): [ 263.905957] fpsimd_release_task+0x28/0x50 eam[ 263.906088] arch_release_task_struct+0x1c/0x30 [ 263.906236] __put_task_struct+0x164/0x220 [ 263.906348] delayed_put_task_struct+0x60/0x128 [ 263.906499] rcu_core+0x318/0x950 [ 263.906647] rcu_core_si+0x1c/0x30 in[ 263.906786] __do_softirq+0x110/0x3d8 [ 263.906892] ____do_softirq+0x1c/0x30 [ 263.907015] call_on_irq_stack+0x24/0x58 g mo[ 263.907139] do_softirq_own_stack+0x28/0x40 [ 263.907305] __irq_exit_rcu+0x94/0xf8 [ 263.907454] irq_exit_rcu+0x1c/0x40 de [ 263.907599] el0_interrupt+0x58/0x160 [ 263.907765] __el0_irq_handler_common+0x18/0x28 [ 263.907879] el0t_64_irq_handler+0x10/0x20 [ 263.907989] el0t_64_irq+0x190/0x198 [ 263.908098] vect[ 263.908149] kfence-#146: 0x000000005a8569e6-0x00000000c704c501, size=546, cache=kmalloc-1k [ 263.908149] [ 263.908282] allocated by task 1102 on cpu 0 at 251.030980s: [ 263.908452] do_sme_acc+0xa8/0x230 or l[ 263.908576] el0_sme_acc+0x40/0xa0 [ 263.908725] el0t_64_sync_handler+0xa8/0xf0 [ 263.908879] el0t_64_sync+0x190/0x198 [ 263.908986] eng[ 263.909036] freed by task 1 on cpu 3 at 263.904989s: [ 263.909311] fpsimd_release_task+0x28/0x50 [ 263.909439] arch_release_task_struct+0x1c/0x30 th:[ 263.909584] __put_task_struct+0x164/0x220 [ 263.909696] delayed_put_task_struct+0x60/0x128 [ 263.909842] rcu_core+0x318/0x950 512 [ 263.909986] rcu_core_si+0x1c/0x30 [ 263.910175] __do_softirq+0x110/0x3d8 [ 263.910279] ____do_softirq+0x1c/0x30 [ 263.910399] call_on_irq_stack+0x24/0x58 [ 263.910520] do_softirq_own_stack+0x28/0x40 [ 263.910645] __irq_exit_rcu+0x94/0xf8 bits[ 263.910792] irq_exit_rcu+0x1c/0x40 [ 263.910937] el0_interrupt+0x58/0x160 [ 263.911043] __el0_irq_handler_common+0x18/0x28 [ 263.911154] el0t_64_irq_handler+0x10/0x20 [ 263.911261] el0t_64_irq+0x190/0x198 # # [ 263.911387] [ 263.911448] CPU: 3 PID: 1 Comm: systemd Tainted: G B 6.3.3-rc1 #1 [ 263.911575] Hardware name: FVP Base RevC (DT) [ 263.911653] ================================================================== .. # ok 80 ZA-VL-16-7 # # Totals: pass:80 fail:0 xfail:0 xpass:0 skip:0 error:0 ok 32 selftests: arm64: fp-stress Steps to reproduce: ============ # To install tuxrun on your system globally: # sudo pip3 install -U tuxrun==0.42.0 # # See https://tuxrun.org/ for complete documentation. tuxrun \ --runtime podman \ --device fvp-aemva \ --boot-args rw \ --kernel https://storage.tuxsuite.com/public/linaro/lkft/builds/2Pq5NvLiBcWRMuy6lXft… \ --modules https://storage.tuxsuite.com/public/linaro/lkft/builds/2Pq5NvLiBcWRMuy6lXft… \ --rootfs https://storage.tuxboot.com/debian/bookworm/arm64/rootfs.ext4.xz \ --parameters SKIPFILE=skipfile-lkft.yaml \ --parameters KSELFTEST=https://storage.tuxsuite.com/public/linaro/lkft/builds/2Pq5NvLiBc… \ --image tuxrun:fvp \ --tests kselftest-arm64 \ --timeouts boot=60 kselftest-arm64=60 Test log links: ======== - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.1.y/build/v6.1.2… - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.1.y/build/v6.1.2… - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.1.y/build/v6.1.2… - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.3.y/build/v6.3.2… - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.3.y/build/v6.3.2… - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.3.y/build/v6.3.2… -- Linaro LKFT https://lkft.linaro.org

2 years

5
15
0 0

[PATCH v4 0/3] tracing/user_events: Fix incorrect return value for writes when events are disabled and add its tests

by sunliming

Now the writing operation return the count of writes regardless of whether events are enabled or disabled. Fix this by just return -EBADF when events are disabled. v3 -> v4: - Change the return value from zero to -EBADF v2 -> v3: - Change the return value from -ENOENT to zero v1 -> v2: - Change the return value from -EFAULT to -ENOENT sunliming (3): tracing/user_events: Fix incorrect return value for writing operation when events are disabled selftests/user_events: Enable the event before write_fault test in ftrace self-test selftests/user_events: Add test cases when event is disabled kernel/trace/trace_events_user.c | 3 ++- tools/testing/selftests/user_events/ftrace_test.c | 8 ++++++++ 2 files changed, 10 insertions(+), 1 deletion(-) -- 2.25.1

2 years

2
4
0 0

[PATCH net] selftests: rtnetlink: remove netdevsim device after ipsec offload test

by Sabrina Dubroca

On systems where netdevsim is built-in or loaded before the test starts, kci_test_ipsec_offload doesn't remove the netdevsim device it created during the test. Fixes: e05b2d141fef ("netdevsim: move netdev creation/destruction to dev probe") Signed-off-by: Sabrina Dubroca <sd(a)queasysnail.net> --- tools/testing/selftests/net/rtnetlink.sh | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh index 383ac6fc037d..ba286d680fd9 100755 --- a/tools/testing/selftests/net/rtnetlink.sh +++ b/tools/testing/selftests/net/rtnetlink.sh @@ -860,6 +860,7 @@ EOF fi # clean up any leftovers + echo 0 > /sys/bus/netdevsim/del_device $probed && rmmod netdevsim if [ $ret -ne 0 ]; then -- 2.40.1

2 years

4
3
0 0

[PATCH v20 0/5] Implement IOCTL to get and optionally clear info about PTEs

by Muhammad Usama Anjum

*Changes in v20* - Correct PAGE_IS_FILE and add PAGE_IS_PFNZERO *Changes in v19* - Minor changes and interface updates *Changes in v18* - Rebase on top of next-20230613 - Minor updates *Changes in v17* - Rebase on top of next-20230606 - Minor improvements in PAGEMAP_SCAN IOCTL patch *Changes in v16* - Fix a corner case - Add exclusive PM_SCAN_OP_WP back *Changes in v15* - Build fix (Add missed build fix in RESEND) *Changes in v14* - Fix build error caused by #ifdef added at last minute in some configs *Changes in v13* - Rebase on top of next-20230414 - Give-up on using uffd_wp_range() and write new helpers, flush tlb only once *Changes in v12* - Update and other memory types to UFFD_FEATURE_WP_ASYNC - Rebaase on top of next-20230406 - Review updates *Changes in v11* - Rebase on top of next-20230307 - Base patches on UFFD_FEATURE_WP_UNPOPULATED - Do a lot of cosmetic changes and review updates - Remove ENGAGE_WP + !GET operation as it can be performed with UFFDIO_WRITEPROTECT *Changes in v10* - Add specific condition to return error if hugetlb is used with wp async - Move changes in tools/include/uapi/linux/fs.h to separate patch - Add documentation *Changes in v9:* - Correct fault resolution for userfaultfd wp async - Fix build warnings and errors which were happening on some configs - Simplify pagemap ioctl's code *Changes in v8:* - Update uffd async wp implementation - Improve PAGEMAP_IOCTL implementation *Changes in v7:* - Add uffd wp async - Update the IOCTL to use uffd under the hood instead of soft-dirty flags *Motivation* The real motivation for adding PAGEMAP_SCAN IOCTL is to emulate Windows GetWriteWatch() syscall [1]. The GetWriteWatch{} retrieves the addresses of the pages that are written to in a region of virtual memory. This syscall is used in Windows applications and games etc. This syscall is being emulated in pretty slow manner in userspace. Our purpose is to enhance the kernel such that we translate it efficiently in a better way. Currently some out of tree hack patches are being used to efficiently emulate it in some kernels. We intend to replace those with these patches. So the whole gaming on Linux can effectively get benefit from this. It means there would be tons of users of this code. CRIU use case [2] was mentioned by Andrei and Danylo: > Use cases for migrating sparse VMAs are binaries sanitized with ASAN, > MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of > shadow memory [4]. Being able to migrate such binaries allows to highly > reduce the amount of work needed to identify and fix post-migration > crashes, which happen constantly. Andrei's defines the following uses of this code: * it is more granular and allows us to track changed pages more effectively. The current interface can clear dirty bits for the entire process only. In addition, reading info about pages is a separate operation. It means we must freeze the process to read information about all its pages, reset dirty bits, only then we can start dumping pages. The information about pages becomes more and more outdated, while we are processing pages. The new interface solves both these downsides. First, it allows us to read pte bits and clear the soft-dirty bit atomically. It means that CRIU will not need to freeze processes to pre-dump their memory. Second, it clears soft-dirty bits for a specified region of memory. It means CRIU will have actual info about pages to the moment of dumping them. * The new interface has to be much faster because basic page filtering is happening in the kernel. With the old interface, we have to read pagemap for each page. *Implementation Evolution (Short Summary)* From the definition of GetWriteWatch(), we feel like kernel's soft-dirty feature can be used under the hood with some additions like: * reset soft-dirty flag for only a specific region of memory instead of clearing the flag for the entire process * get and clear soft-dirty flag for a specific region atomically So we decided to use ioctl on pagemap file to read or/and reset soft-dirty flag. But using soft-dirty flag, sometimes we get extra pages which weren't even written. They had become soft-dirty because of VMA merging and VM_SOFTDIRTY flag. This breaks the definition of GetWriteWatch(). We were able to by-pass this short coming by ignoring VM_SOFTDIRTY until David reported that mprotect etc messes up the soft-dirty flag while ignoring VM_SOFTDIRTY [5]. This wasn't happening until [6] got introduced. We discussed if we can revert these patches. But we could not reach to any conclusion. So at this point, I made couple of tries to solve this whole VM_SOFTDIRTY issue by correcting the soft-dirty implementation: * [7] Correct the bug fixed wrongly back in 2014. It had potential to cause regression. We left it behind. * [8] Keep a list of soft-dirty part of a VMA across splits and merges. I got the reply don't increase the size of the VMA by 8 bytes. At this point, we left soft-dirty considering it is too much delicate and userfaultfd [9] seemed like the only way forward. From there onward, we have been basing soft-dirty emulation on userfaultfd wp feature where kernel resolves the faults itself when WP_ASYNC feature is used. It was straight forward to add WP_ASYNC feature in userfautlfd. Now we get only those pages dirty or written-to which are really written in reality. (PS There is another WP_UNPOPULATED userfautfd feature is required which is needed to avoid pre-faulting memory before write-protecting [9].) All the different masks were added on the request of CRIU devs to create interface more generic and better. [1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-… [2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com [3] https://github.com/google/sanitizers [4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit [5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com [6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/ [7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.… [8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.… [9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com [10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com * Original Cover letter from v8* Hello, Note: Soft-dirty pages and pages which have been written-to are synonyms. As kernel already has soft-dirty feature inside which we have given up to use, we are using written-to terminology while using UFFD async WP under the hood. This IOCTL, PAGEMAP_SCAN on pagemap file can be used to get and/or clear the info about page table entries. The following operations are supported in this ioctl: - Get the information if the pages have been written-to (PAGE_IS_WRITTEN), file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped (PAGE_IS_SWAPPED). - Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which pages have been written-to. - Find pages which have been written-to and write protect the pages (atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE) It is possible to find and clear soft-dirty pages entirely in userspace. But it isn't efficient: - The mprotect and SIGSEGV handler for bookkeeping - The userfaultfd wp (synchronous) with the handler for bookkeeping Some benchmarks can be seen here[1]. This series adds features that weren't present earlier: - There is no atomic get soft-dirty/Written-to status and clear present in the kernel. - The pages which have been written-to can not be found in accurate way. (Kernel's soft-dirty PTE bit + sof_dirty VMA bit shows more soft-dirty pages than there actually are.) Historically, soft-dirty PTE bit tracking has been used in the CRIU project. The procfs interface is enough for finding the soft-dirty bit status and clearing the soft-dirty bit of all the pages of a process. We have the use case where we need to track the soft-dirty PTE bit for only specific pages on-demand. We need this tracking and clear mechanism of a region of memory while the process is running to emulate the getWriteWatch() syscall of Windows. *(Moved to using UFFD instead of soft-dirtyi feature to find pages which have been written-to from v7 patch series)*: Stop using the soft-dirty flags for finding which pages have been written to. It is too delicate and wrong as it shows more soft-dirty pages than the actual soft-dirty pages. There is no interest in correcting it [2][3] as this is how the feature was written years ago. It shouldn't be updated to changed behaviour. Peter Xu has suggested using the async version of the UFFD WP [4] as it is based inherently on the PTEs. So in this patch series, I've added a new mode to the UFFD which is asynchronous version of the write protect. When this variant of the UFFD WP is used, the page faults are resolved automatically by the kernel. The pages which have been written-to can be found by reading pagemap file (!PM_UFFD_WP). This feature can be used successfully to find which pages have been written to from the time the pages were write protected. This works just like the soft-dirty flag without showing any extra pages which aren't soft-dirty in reality. The information related to pages if the page is file mapped, present and swapped is required for the CRIU project [5][6]. The addition of the required mask, any mask, excluded mask and return masks are also required for the CRIU project [5]. The IOCTL returns the addresses of the pages which match the specific masks. The page addresses are returned in struct page_region in a compact form. The max_pages is needed to support a use case where user only wants to get a specific number of pages. So there is no need to find all the pages of interest in the range when max_pages is specified. The IOCTL returns when the maximum number of the pages are found. The max_pages is optional. If max_pages is specified, it must be equal or greater than the vec_size. This restriction is needed to handle worse case when one page_region only contains info of one page and it cannot be compacted. This is needed to emulate the Windows getWriteWatch() syscall. The patch series include the detailed selftest which can be used as an example for the uffd async wp test and PAGEMAP_IOCTL. It shows the interface usages as well. [1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora… [2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.… [3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.… [4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n [5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/ [6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/ Regards, Muhammad Usama Anjum Muhammad Usama Anjum (4): fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs tools headers UAPI: Update linux/fs.h with the kernel sources mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL selftests: mm: add pagemap ioctl tests Peter Xu (1): userfaultfd: UFFD_FEATURE_WP_ASYNC Documentation/admin-guide/mm/pagemap.rst | 58 + Documentation/admin-guide/mm/userfaultfd.rst | 35 + fs/proc/task_mmu.c | 560 +++++++ fs/userfaultfd.c | 26 +- include/linux/hugetlb.h | 1 + include/linux/userfaultfd_k.h | 21 +- include/uapi/linux/fs.h | 54 + include/uapi/linux/userfaultfd.h | 9 +- mm/hugetlb.c | 34 +- mm/memory.c | 27 +- tools/include/uapi/linux/fs.h | 54 + tools/testing/selftests/mm/.gitignore | 2 + tools/testing/selftests/mm/Makefile | 3 +- tools/testing/selftests/mm/config | 1 + tools/testing/selftests/mm/pagemap_ioctl.c | 1464 ++++++++++++++++++ tools/testing/selftests/mm/run_vmtests.sh | 4 + 16 files changed, 2329 insertions(+), 24 deletions(-) create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh -- 2.39.2

2 years

3
9
0 0

Re: [PATCH v3 3/3] selftests/tdx: Test GetQuote TDX attestation feature

by Dan Williams

Erdem Aktas wrote: > On Mon, Jun 12, 2023 at 12:03 PM Dan Williams <dan.j.williams(a)intel.com> > wrote: > > > [ add David, Brijesh, and Atish] > > > > Kuppuswamy Sathyanarayanan wrote: > > > In TDX guest, the second stage of the attestation process is Quote > > > generation. This process is required to convert the locally generated > > > TDREPORT into a remotely verifiable Quote. It involves sending the > > > TDREPORT data to a Quoting Enclave (QE) which will verify the > > > integrity of the TDREPORT and sign it with an attestation key. > > > > > > Intel's TDX attestation driver exposes TDX_CMD_GET_QUOTE IOCTL to > > > allow the user agent to get the TD Quote. > > > > > > Add a kernel selftest module to verify the Quote generation feature. > > > > > > TD Quote generation involves following steps: > > > > > > * Get the TDREPORT data using TDX_CMD_GET_REPORT IOCTL. > > > * Embed the TDREPORT data in quote buffer and request for quote > > > generation via TDX_CMD_GET_QUOTE IOCTL request. > > > * Upon completion of the GetQuote request, check for non zero value > > > in the status field of Quote header to make sure the generated > > > quote is valid. > > > > What this cover letter does not say is that this is adding another > > instance of the similar pattern as SNP_GET_REPORT. > > > > Linux is best served when multiple vendors trying to do similar > > operations are brought together behind a common ABI. We see this in the > > history of wrangling SCSI vendors behind common interfaces. > > Compared to the number of SCSI vendors, I think the number of CPU vendors > for confidential computing seems manageable to me. Is this really a good > comparison? Fair enough, and prompted by this I talk a bit more about the motiviations and benefits of a Keys abstraction for attestation here: https://lore.kernel.org/all/64961c3baf8ce_142af829436@dwillia2-xfh.jf.intel… > > Now multiple > > confidential computing vendors trying to develop similar flows with > > differentiated formats where that differentiation need not leak over the > > ABI boundary. > > > > <Just my personal opinion below> > I agree with this statement in the high level but it is also somehow > surprising for me after all the discussion happened around this topic. > Honestly, I feel like there are multiple versions of "Intel" working in > different directions. This proposal was sent while firmly wearing my Linux community hat. I agree, the timing here is unfortunate. > If we want multiple vendors trying to do the similar things behind a common > ABI, it should start with the spec. Since this comment is coming from > Intel, I wonder if there is any plan to combine the GHCB and GHCI > interfaces under common ABI in the future or why it did not even happen in > the first place. Per above comment about firmly wearing my Linux hat I am coming at this purely from the perspective of what do we do now as a community that continues to see these implementations proliferate and grow more features. Common specs are great, but I agree with you, it is too late for that, but I hope that as Linux asserts "this is what it should look like" it starts to influence future IP innovation, and attestation service providers, to acommodate the kernel's ABI momentum. > What I see is that Intel has GETQUOTE TDVMCALL interface in its spec and > again Intel does not really want to provide support for it in linux. It > feels really frustrating. I am aware of how frustrating late feedback can be. I am also encouraged by some of the conversations and investigations that have already happened around how Keys fits what these attestation solutions need. > > My observation of SNP_GET_REPORT and TDX_CMD_GET_REPORT is that they are > > both passing blobs across the user/kernel and platform/kernel boundary > > for the purposes of unlocking other resources. To me that is a flow that > > the Keys subsystem has infrastructure to handle. It has the concept of > > upcalls and asynchronous population of blobs by handles and mechanisms > > to protect and cache those communications. Linux / the Keys subsystem > > could benefit from the enhancements it would need to cover these 2 > > cases. Specifically, the benefit that when ARM and RISC-V arrive with > > similar communications with platform TSMs (Trusted Security Module) they > > can build upon the same infrastructure. > > > > David, am I reaching with that association? My strawman mapping of > > TDX_CMD_GET_QUOTE to request_key() is something like: > > > > request_key(coco_quote, "description", "<uuencoded tdreport>") > > > > Where this is a common key_type for all vendors, but the description and > > arguments have room for vendor differentiation when doing the upcall to > > the platform TSM, but userspace never needs to contend with the > > different vendor formats, that is all handled internally to the kernel. > > > > I think the problem definition here is not accurate. With AMD SNP, guests > need to do a hypercall to KVM and KVM needs to issue > a SNP_GUEST_REQUEST(MSG_REPORT_REQ) to the SP firmware. In TDX, guests > need to do a TDCALL to TDXMODULE to get the TDREPORT and then it needs to > get that report delivered to the host userspace to get the TDQUOTE > generated by the SGX quoting enclave. Also TDQUOTE is designed to work > async while the SNP_GUEST_REQUESTS are blocking vmcalls. > > Those are completely different flows. Are you suggesting that intel should > also come down to a single call to get the TDQUOTE like AMD SNP? The Keys subsystem supports async instantiation of key material with usermode upcalls if necessary. So I do not see a problem supporting these flows behind a common key type. > The TDCALL interface asking for the TDREPORT is already there. AMD does not > need to ask the report and the quote separately. > > Here, the problem was that Intel (upstream) did not want to implement > hypercall for TDQUOTE which would be handled by the user space VMM. The > alternative implementation (using vsock) does not work for many use cases > including ours. I do not see how your suggestion addresses the problem that > this patch was trying to solve. Perhaps the strawman mockup makes it more clear: https://lore.kernel.org/all/64961c3baf8ce_142af829436@dwillia2-xfh.jf.intel… > So while I like the suggested direction, I am not sure how much it is > possible to come up with a common ABI even with just only for 2 vendors > (AMD and Intel) without doing spec changes which is a multi year effort > imho. I agree, hardware spec changes are out of scope for this effort, but Keys might require some additional flows to be built up in the kernel that could be previously handled in userspace. I.e. the "bottom half" that I reference in the mockup. This is something we went through with using "encrypted-keys" for nvdimm. Instead of an ioctl to inject a secret key over the user kernel boundary a key server need to store a serialized version of the encrypted key blob and pass that into the kernel.

2 years

1
0
0 0

[PATCH v2 0/2] arm64/signal: Fix handling of TPIDR2

by Mark Brown

The restoring of TPIDR2 signal context has been broken since it was merged, fix this and add a test case covering it. This is a result of TPIDR2 context management following a different flow to any of the other state that we provide and the fact that we don't expose TPIDR (which follows the same pattern) to signals. Signed-off-by: Mark Brown <broonie(a)kernel.org> --- Changes in v2: - Added a feature check for SME to the new test. - Link to v1: https://lore.kernel.org/r/20230621-arm64-fix-tpidr2-signal-restore-v1-0-b6d… --- Mark Brown (2): arm64/signal: Restore TPIDR2 register rather than memory state kselftest/arm64: Add a test case for TPIDR2 restore arch/arm64/kernel/signal.c | 2 +- tools/testing/selftests/arm64/signal/.gitignore | 2 +- .../arm64/signal/testcases/tpidr2_restore.c | 86 ++++++++++++++++++++++ 3 files changed, 88 insertions(+), 2 deletions(-) --- base-commit: 858fd168a95c5b9669aac8db6c14a9aeab446375 change-id: 20230621-arm64-fix-tpidr2-signal-restore-713d93798f99 Best regards, -- Mark Brown <broonie(a)kernel.org>

2 years

2
7
0 0

[PATCH bpf-next v3 0/2] Fix missing synack in BPF cgroup_skb filters

by Kui-Feng Lee

TCP SYN/ACK packets of connections from processes/sockets outside a cgroup on the same host are not received by the cgroup's installed cgroup_skb filters. There were two BPF cgroup_skb programs attached to a cgroup named "my_cgroup". SEC("cgroup_skb/ingress") int ingress(struct __sk_buff *skb) { /* .... process skb ... */ return 1; } SEC("cgroup_skb/egress") int egress(struct __sk_buff *skb) { /* .... process skb ... */ return 1; } We discovered that when running the command "nc -6 -l 8000" in "my_group" and connecting to it from outside of "my_cgroup" with the command "nc -6 localhost 8000", the egress filter did not detect the SYN/ACK packet. However, we did observe the SYN/ACK packet at the ingress when connecting from a socket in "my_cgroup" to a socket outside of it. We came across BPF_CGROUP_RUN_PROG_INET_EGRESS(). This macro is responsible for calling BPF programs that are attached to the egress hook of a cgroup and it skips programs if the sending socket is not the owner of the skb. Specifically, in our situation, the SYN/ACK skb is owned by a struct request_sock instance, but the sending socket is the listener socket we use to receive incoming connections. The request_sock is created to manage an incoming connection. It has been determined that checking the owner of a skb against the sending socket is not required. Removing this check will allow the filters to receive SYN/ACK packets. To ensure that cgroup_skb filters can receive all signaling packets, including SYN, SYN/ACK, ACK, FIN, and FIN/ACK. A new self-test has been added as well. Changes from v2: - Remove redundant blank lines. Changes from v1: - Check the number of observed packets instead of just sleeping. - Use ASSERT_XXX() instead of CHECK()/ [v1] https://lore.kernel.org/all/20230612191641.441774-1-kuifeng@meta.com/ [v2] https://lore.kernel.org/all/20230617052756.640916-2-kuifeng@meta.com/ Kui-Feng Lee (2): net: bpf: Always call BPF cgroup filters for egress. selftests/bpf: Verify that the cgroup_skb filters receive expected packets. include/linux/bpf-cgroup.h | 2 +- tools/testing/selftests/bpf/cgroup_helpers.c | 12 + tools/testing/selftests/bpf/cgroup_helpers.h | 1 + tools/testing/selftests/bpf/cgroup_tcp_skb.h | 35 ++ .../selftests/bpf/prog_tests/cgroup_tcp_skb.c | 399 ++++++++++++++++++ .../selftests/bpf/progs/cgroup_tcp_skb.c | 382 +++++++++++++++++ 6 files changed, 830 insertions(+), 1 deletion(-) create mode 100644 tools/testing/selftests/bpf/cgroup_tcp_skb.h create mode 100644 tools/testing/selftests/bpf/prog_tests/cgroup_tcp_skb.c create mode 100644 tools/testing/selftests/bpf/progs/cgroup_tcp_skb.c -- 2.34.1

2 years

4
12
0 0

[PATCH net-next 0/9] mptcp: expose more info and small improvements

by Matthieu Baerts

Patch 1-3/9 track and expose some aggregated data counters at the MPTCP level: the number of retransmissions and the bytes that have been transferred. The first patch prepares the work by moving where snd_una is updated for fallback sockets while the last patch adds some tests to cover the new code. Patch 4-6/9 introduce a new getsockopt for SOL_MPTCP: MPTCP_FULL_INFO. This new socket option allows to combine info from MPTCP_INFO, MPTCP_TCPINFO and MPTCP_SUBFLOW_ADDRS socket options into one. It can be needed to have all info in one because the path-manager can close and re-create subflows between getsockopt() and fooling the accounting. The first patch introduces a unique subflow ID to easily detect when subflows are being re-created with the same 5-tuple while the last patch adds some tests to cover the new code. Please note that patch 5/9 ("mptcp: introduce MPTCP_FULL_INFO getsockopt") can reveal a bug that were there for a bit of time, see [1]. A fix has recently been fixed to netdev for the -net tree: "mptcp: ensure listener is unhashed before updating the sk status", see [2]. There is no conflicts between the two patches but it might be better to apply this series after the one for -net and after having merged "net" into "net-next". Patch 7/9 is similar to commit 47867f0a7e83 ("selftests: mptcp: join: skip check if MIB counter not supported") recently applied in the -net tree but here it adapts the new code that is only in net-next (and it fixes a merge conflict resolution which didn't have any impact). Patch 8 and 9/9 are two simple refactoring. One to consolidate the transition to TCP_CLOSE in mptcp_do_fastclose() and avoid duplicated code. The other one reduces the scope of an argument passed to mptcp_pm_alloc_anno_list() function. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/407 [1] Link: https://lore.kernel.org/netdev/20230620-upstream-net-20230620-misc-fixes-fo… [2] Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net> --- Geliang Tang (1): mptcp: pass addr to mptcp_pm_alloc_anno_list Matthieu Baerts (1): selftests: mptcp: join: skip check if MIB counter not supported (part 2) Paolo Abeni (7): mptcp: move snd_una update earlier for fallback socket mptcp: track some aggregate data counters selftests: mptcp: explicitly tests aggregate counters mptcp: add subflow unique id mptcp: introduce MPTCP_FULL_INFO getsockopt selftests: mptcp: add MPTCP_FULL_INFO testcase mptcp: consolidate transition to TCP_CLOSE in mptcp_do_fastclose() include/uapi/linux/mptcp.h | 29 +++++ net/mptcp/options.c | 14 +- net/mptcp/pm_netlink.c | 8 +- net/mptcp/pm_userspace.c | 2 +- net/mptcp/protocol.c | 31 +++-- net/mptcp/protocol.h | 11 +- net/mptcp/sockopt.c | 152 +++++++++++++++++++++- net/mptcp/subflow.c | 2 + tools/testing/selftests/net/mptcp/mptcp_join.sh | 33 ++--- tools/testing/selftests/net/mptcp/mptcp_sockopt.c | 120 ++++++++++++++++- 10 files changed, 356 insertions(+), 46 deletions(-) --- base-commit: 712557f210723101717570844c95ac0913af74d7 change-id: 20230620-upstream-net-next-20230620-mptcp-expose-more-info-and-misc-6b4a3a415ec5 Best regards, -- Matthieu Baerts <matthieu.baerts(a)tessares.net>

2 years

5
13
0 0

[PATCH v19 0/5] Implement IOCTL to get and optionally clear info about PTEs

by Muhammad Usama Anjum

*Changes in v19* - Minor changes and interface updates *Changes in v18* - Rebase on top of next-20230613 - Minor updates *Changes in v17* - Rebase on top of next-20230606 - Minor improvements in PAGEMAP_SCAN IOCTL patch *Changes in v16* - Fix a corner case - Add exclusive PM_SCAN_OP_WP back *Changes in v15* - Build fix (Add missed build fix in RESEND) *Changes in v14* - Fix build error caused by #ifdef added at last minute in some configs *Changes in v13* - Rebase on top of next-20230414 - Give-up on using uffd_wp_range() and write new helpers, flush tlb only once *Changes in v12* - Update and other memory types to UFFD_FEATURE_WP_ASYNC - Rebaase on top of next-20230406 - Review updates *Changes in v11* - Rebase on top of next-20230307 - Base patches on UFFD_FEATURE_WP_UNPOPULATED - Do a lot of cosmetic changes and review updates - Remove ENGAGE_WP + !GET operation as it can be performed with UFFDIO_WRITEPROTECT *Changes in v10* - Add specific condition to return error if hugetlb is used with wp async - Move changes in tools/include/uapi/linux/fs.h to separate patch - Add documentation *Changes in v9:* - Correct fault resolution for userfaultfd wp async - Fix build warnings and errors which were happening on some configs - Simplify pagemap ioctl's code *Changes in v8:* - Update uffd async wp implementation - Improve PAGEMAP_IOCTL implementation *Changes in v7:* - Add uffd wp async - Update the IOCTL to use uffd under the hood instead of soft-dirty flags *Motivation* The real motivation for adding PAGEMAP_SCAN IOCTL is to emulate Windows GetWriteWatch() syscall [1]. The GetWriteWatch{} retrieves the addresses of the pages that are written to in a region of virtual memory. This syscall is used in Windows applications and games etc. This syscall is being emulated in pretty slow manner in userspace. Our purpose is to enhance the kernel such that we translate it efficiently in a better way. Currently some out of tree hack patches are being used to efficiently emulate it in some kernels. We intend to replace those with these patches. So the whole gaming on Linux can effectively get benefit from this. It means there would be tons of users of this code. CRIU use case [2] was mentioned by Andrei and Danylo: > Use cases for migrating sparse VMAs are binaries sanitized with ASAN, > MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of > shadow memory [4]. Being able to migrate such binaries allows to highly > reduce the amount of work needed to identify and fix post-migration > crashes, which happen constantly. Andrei's defines the following uses of this code: * it is more granular and allows us to track changed pages more effectively. The current interface can clear dirty bits for the entire process only. In addition, reading info about pages is a separate operation. It means we must freeze the process to read information about all its pages, reset dirty bits, only then we can start dumping pages. The information about pages becomes more and more outdated, while we are processing pages. The new interface solves both these downsides. First, it allows us to read pte bits and clear the soft-dirty bit atomically. It means that CRIU will not need to freeze processes to pre-dump their memory. Second, it clears soft-dirty bits for a specified region of memory. It means CRIU will have actual info about pages to the moment of dumping them. * The new interface has to be much faster because basic page filtering is happening in the kernel. With the old interface, we have to read pagemap for each page. *Implementation Evolution (Short Summary)* From the definition of GetWriteWatch(), we feel like kernel's soft-dirty feature can be used under the hood with some additions like: * reset soft-dirty flag for only a specific region of memory instead of clearing the flag for the entire process * get and clear soft-dirty flag for a specific region atomically So we decided to use ioctl on pagemap file to read or/and reset soft-dirty flag. But using soft-dirty flag, sometimes we get extra pages which weren't even written. They had become soft-dirty because of VMA merging and VM_SOFTDIRTY flag. This breaks the definition of GetWriteWatch(). We were able to by-pass this short coming by ignoring VM_SOFTDIRTY until David reported that mprotect etc messes up the soft-dirty flag while ignoring VM_SOFTDIRTY [5]. This wasn't happening until [6] got introduced. We discussed if we can revert these patches. But we could not reach to any conclusion. So at this point, I made couple of tries to solve this whole VM_SOFTDIRTY issue by correcting the soft-dirty implementation: * [7] Correct the bug fixed wrongly back in 2014. It had potential to cause regression. We left it behind. * [8] Keep a list of soft-dirty part of a VMA across splits and merges. I got the reply don't increase the size of the VMA by 8 bytes. At this point, we left soft-dirty considering it is too much delicate and userfaultfd [9] seemed like the only way forward. From there onward, we have been basing soft-dirty emulation on userfaultfd wp feature where kernel resolves the faults itself when WP_ASYNC feature is used. It was straight forward to add WP_ASYNC feature in userfautlfd. Now we get only those pages dirty or written-to which are really written in reality. (PS There is another WP_UNPOPULATED userfautfd feature is required which is needed to avoid pre-faulting memory before write-protecting [9].) All the different masks were added on the request of CRIU devs to create interface more generic and better. [1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-… [2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com [3] https://github.com/google/sanitizers [4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit [5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com [6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/ [7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.… [8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.… [9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com [10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com * Original Cover letter from v8* Hello, Note: Soft-dirty pages and pages which have been written-to are synonyms. As kernel already has soft-dirty feature inside which we have given up to use, we are using written-to terminology while using UFFD async WP under the hood. This IOCTL, PAGEMAP_SCAN on pagemap file can be used to get and/or clear the info about page table entries. The following operations are supported in this ioctl: - Get the information if the pages have been written-to (PAGE_IS_WRITTEN), file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped (PAGE_IS_SWAPPED). - Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which pages have been written-to. - Find pages which have been written-to and write protect the pages (atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE) It is possible to find and clear soft-dirty pages entirely in userspace. But it isn't efficient: - The mprotect and SIGSEGV handler for bookkeeping - The userfaultfd wp (synchronous) with the handler for bookkeeping Some benchmarks can be seen here[1]. This series adds features that weren't present earlier: - There is no atomic get soft-dirty/Written-to status and clear present in the kernel. - The pages which have been written-to can not be found in accurate way. (Kernel's soft-dirty PTE bit + sof_dirty VMA bit shows more soft-dirty pages than there actually are.) Historically, soft-dirty PTE bit tracking has been used in the CRIU project. The procfs interface is enough for finding the soft-dirty bit status and clearing the soft-dirty bit of all the pages of a process. We have the use case where we need to track the soft-dirty PTE bit for only specific pages on-demand. We need this tracking and clear mechanism of a region of memory while the process is running to emulate the getWriteWatch() syscall of Windows. *(Moved to using UFFD instead of soft-dirtyi feature to find pages which have been written-to from v7 patch series)*: Stop using the soft-dirty flags for finding which pages have been written to. It is too delicate and wrong as it shows more soft-dirty pages than the actual soft-dirty pages. There is no interest in correcting it [2][3] as this is how the feature was written years ago. It shouldn't be updated to changed behaviour. Peter Xu has suggested using the async version of the UFFD WP [4] as it is based inherently on the PTEs. So in this patch series, I've added a new mode to the UFFD which is asynchronous version of the write protect. When this variant of the UFFD WP is used, the page faults are resolved automatically by the kernel. The pages which have been written-to can be found by reading pagemap file (!PM_UFFD_WP). This feature can be used successfully to find which pages have been written to from the time the pages were write protected. This works just like the soft-dirty flag without showing any extra pages which aren't soft-dirty in reality. The information related to pages if the page is file mapped, present and swapped is required for the CRIU project [5][6]. The addition of the required mask, any mask, excluded mask and return masks are also required for the CRIU project [5]. The IOCTL returns the addresses of the pages which match the specific masks. The page addresses are returned in struct page_region in a compact form. The max_pages is needed to support a use case where user only wants to get a specific number of pages. So there is no need to find all the pages of interest in the range when max_pages is specified. The IOCTL returns when the maximum number of the pages are found. The max_pages is optional. If max_pages is specified, it must be equal or greater than the vec_size. This restriction is needed to handle worse case when one page_region only contains info of one page and it cannot be compacted. This is needed to emulate the Windows getWriteWatch() syscall. The patch series include the detailed selftest which can be used as an example for the uffd async wp test and PAGEMAP_IOCTL. It shows the interface usages as well. [1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora… [2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.… [3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.… [4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n [5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/ [6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/ Regards, Muhammad Usama Anjum Muhammad Usama Anjum (4): fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs tools headers UAPI: Update linux/fs.h with the kernel sources mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL selftests: mm: add pagemap ioctl tests Peter Xu (1): userfaultfd: UFFD_FEATURE_WP_ASYNC Documentation/admin-guide/mm/pagemap.rst | 58 + Documentation/admin-guide/mm/userfaultfd.rst | 35 + fs/proc/task_mmu.c | 526 +++++++ fs/userfaultfd.c | 26 +- include/linux/hugetlb.h | 1 + include/linux/userfaultfd_k.h | 21 +- include/uapi/linux/fs.h | 53 + include/uapi/linux/userfaultfd.h | 9 +- mm/hugetlb.c | 34 +- mm/memory.c | 27 +- tools/include/uapi/linux/fs.h | 53 + tools/testing/selftests/mm/.gitignore | 2 + tools/testing/selftests/mm/Makefile | 3 +- tools/testing/selftests/mm/config | 1 + tools/testing/selftests/mm/pagemap_ioctl.c | 1458 ++++++++++++++++++ tools/testing/selftests/mm/run_vmtests.sh | 4 + 16 files changed, 2287 insertions(+), 24 deletions(-) create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh -- 2.39.2

2 years

4
18
0 0

[PATCH v2] selftests/input: introduce a test for the EVIOCGLED ioctl

by Dana Elfassy

This patch introduces a specific test case for the EVIOCGLED ioctl. The test covers the case where len > maxlen in the EVIOCGLED(sizeof(all_leds)), all_leds) ioctl. Signed-off-by: Dana Elfassy <dangel101(a)gmail.com> --- Changes in v2: - Changed variable leds from an array to an int This patch depends on '[v3] selftests/input: Introduce basic tests for evdev ioctls' [1] sent to the ML. [1] https://patchwork.kernel.org/project/linux-input/patch/20230607153214.15933… tools/testing/selftests/input/evioc-test.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/tools/testing/selftests/input/evioc-test.c b/tools/testing/selftests/input/evioc-test.c index ad7b93fe39cf..378db2b4dd56 100644 --- a/tools/testing/selftests/input/evioc-test.c +++ b/tools/testing/selftests/input/evioc-test.c @@ -234,4 +234,21 @@ TEST(eviocsrep_set_repeat_settings) selftest_uinput_destroy(uidev); } +TEST(eviocgled_get_all_leds) +{ + struct selftest_uinput *uidev; + int leds = 0; + int rc; + + rc = selftest_uinput_create_device(&uidev, -1); + ASSERT_EQ(0, rc); + ASSERT_NE(NULL, uidev); + + /* ioctl to set the maxlen = 0 */ + rc = ioctl(uidev->evdev_fd, EVIOCGLED(0), leds); + ASSERT_EQ(0, rc); + + selftest_uinput_destroy(uidev); +} + TEST_HARNESS_MAIN -- 2.41.0

2 years

2
1
0 0

[PATCH v3] selftests/input: introduce a test for the EVIOCGKEY ioctl

by Dana Elfassy

This patch introduces a specific test case for the EVIOCGKEY ioctl. The test covers the case where len > maxlen in the EVIOCGKEY(sizeof(keystate)), keystate) ioctl. Signed-off-by: Dana Elfassy <dangel101(a)gmail.com> --- Changes in v3: - Edited commit's subject and description - Renamed variable rep_values to keystate - Added argument to selftest_uinput_create_device() - Removed memset Changes in v2: - Added following note about the patch's dependency This patch depends on '[v3] selftests/input: Introduce basic tests for evdev ioctls' [1] sent to the ML. [1] https://patchwork.kernel.org/project/linux-input/patch/20230607153214.15933… tools/testing/selftests/input/evioc-test.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/tools/testing/selftests/input/evioc-test.c b/tools/testing/selftests/input/evioc-test.c index ad7b93fe39cf..e0f69459f504 100644 --- a/tools/testing/selftests/input/evioc-test.c +++ b/tools/testing/selftests/input/evioc-test.c @@ -234,4 +234,21 @@ TEST(eviocsrep_set_repeat_settings) selftest_uinput_destroy(uidev); } +TEST(eviocgkey_get_global_key_state) +{ + struct selftest_uinput *uidev; + int keystate = 0; + int rc; + + rc = selftest_uinput_create_device(&uidev, -1); + ASSERT_EQ(0, rc); + ASSERT_NE(NULL, uidev); + + /* ioctl to create the scenario where len > maxlen in bits_to_user() */ + rc = ioctl(uidev->evdev_fd, EVIOCGKEY(0), keystate); + ASSERT_EQ(0, rc); + + selftest_uinput_destroy(uidev); +} + TEST_HARNESS_MAIN -- 2.41.0

2 years

2
1
0 0

[PATCH 00/14] mm/init/kernel: missing-prototypes warnings

by Arnd Bergmann

From: Arnd Bergmann <arnd(a)arndb.de> These are patches addressing -Wmissing-prototypes warnings in common kernel code and memory management code files that usually get merged through the -mm tree. Andrew, can you pick these up in the -mm tree? Arnd Arnd Bergmann (14): mm: percpu: unhide pcpu_embed_first_chunk prototype mm: page_poison: always declare __kernel_map_pages() function mm: sparse: mark populate_section_memmap() static audit: avoid missing-prototype warnings lib: devmem_is_allowed: include linux/io.h locking: add lockevent_read() prototype panic: hide unused global functions panic: make function declarations visible kunit: include debugfs header file suspend: add a arch_resume_nosmt() prototype init: consolidate prototypes in linux/init.h init: move cifs_root_data() prototype into linux/mount.h thread_info: move function declarations to linux/thread_info.h time_namespace: always provide arch_get_vdso_data() prototype for vdso arch/arm/include/asm/irq.h | 1 - arch/arm64/include/asm/thread_info.h | 4 ---- arch/microblaze/include/asm/setup.h | 2 -- arch/mips/include/asm/irq.h | 1 - arch/parisc/kernel/smp.c | 1 - arch/powerpc/include/asm/irq.h | 1 - arch/riscv/include/asm/irq.h | 2 -- arch/riscv/include/asm/timex.h | 2 -- arch/s390/include/asm/thread_info.h | 3 --- arch/s390/kernel/entry.h | 2 -- arch/sh/include/asm/irq.h | 1 - arch/sh/include/asm/rtc.h | 2 -- arch/sh/include/asm/thread_info.h | 3 --- arch/sparc/include/asm/irq_32.h | 1 - arch/sparc/include/asm/irq_64.h | 1 - arch/sparc/include/asm/timer_64.h | 1 - arch/sparc/kernel/kernel.h | 4 ---- arch/x86/include/asm/irq.h | 2 -- arch/x86/include/asm/mem_encrypt.h | 3 --- arch/x86/include/asm/thread_info.h | 3 --- arch/x86/include/asm/time.h | 1 - arch/x86/include/asm/tsc.h | 1 - include/asm-generic/bug.h | 5 +++-- include/linux/acpi.h | 3 ++- include/linux/audit.h | 2 -- include/linux/audit_arch.h | 2 ++ include/linux/delay.h | 1 + include/linux/init.h | 20 ++++++++++++++++++++ include/linux/mm.h | 3 +-- include/linux/mount.h | 2 ++ include/linux/panic.h | 3 +++ include/linux/percpu.h | 2 -- include/linux/suspend.h | 2 ++ include/linux/thread_info.h | 5 +++++ include/linux/time_namespace.h | 3 ++- init/do_mounts.c | 2 -- init/main.c | 18 ------------------ kernel/audit.h | 2 +- kernel/locking/lock_events.h | 4 ++++ kernel/panic.c | 3 +-- lib/devmem_is_allowed.c | 1 + lib/kunit/debugfs.c | 1 + mm/sparse.c | 2 +- 43 files changed, 52 insertions(+), 76 deletions(-) -- 2.39.2 Cc: Russell King <linux(a)armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: Will Deacon <will(a)kernel.org> Cc: Michal Simek <monstr(a)monstr.eu> Cc: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de> Cc: Helge Deller <deller(a)gmx.de> Cc: Michael Ellerman <mpe(a)ellerman.id.au> Cc: Palmer Dabbelt <palmer(a)dabbelt.com> Cc: Heiko Carstens <hca(a)linux.ibm.com> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: Ingo Molnar <mingo(a)redhat.com> Cc: x86(a)kernel.org Cc: "Rafael J. Wysocki" <rafael(a)kernel.org> Cc: Paul Moore <paul(a)paul-moore.com> Cc: Eric Paris <eparis(a)redhat.com> Cc: Andrew Morton <akpm(a)linux-foundation.org> Cc: Dennis Zhou <dennis(a)kernel.org> Cc: Tejun Heo <tj(a)kernel.org> Cc: Christoph Lameter <cl(a)linux.com> Cc: Pavel Machek <pavel(a)ucw.cz> Cc: Peter Zijlstra <peterz(a)infradead.org> Cc: Waiman Long <longman(a)redhat.com> Cc: Boqun Feng <boqun.feng(a)gmail.com> Cc: linux-arm-kernel(a)lists.infradead.org Cc: linux-kernel(a)vger.kernel.org Cc: linux-mips(a)vger.kernel.org Cc: linux-parisc(a)vger.kernel.org Cc: linuxppc-dev(a)lists.ozlabs.org Cc: linux-riscv(a)lists.infradead.org Cc: linux-s390(a)vger.kernel.org Cc: linux-sh(a)vger.kernel.org Cc: audit(a)vger.kernel.org Cc: linux-mm(a)kvack.org Cc: linux-pm(a)vger.kernel.org Cc: linux-kselftest(a)vger.kernel.org Cc: kunit-dev(a)googlegroups.com

2 years

7
23
0 0

[PATCH] selftests/input: introduce a test for the EVIOCGLED ioctl

by Dana Elfassy

This patch introduces a specific test case for the EVIOCGLED ioctl. The test covers the case where len > maxlen in the EVIOCGLED(sizeof(all_leds)), all_leds) ioctl. Signed-off-by: Dana Elfassy <dangel101(a)gmail.com> --- This patch depends on '[v3] selftests/input: Introduce basic tests for evdev ioctls' [1] sent to the ML. [1] https://patchwork.kernel.org/project/linux-input/patch/20230607153214.15933… tools/testing/selftests/input/evioc-test.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) diff --git a/tools/testing/selftests/input/evioc-test.c b/tools/testing/selftests/input/evioc-test.c index ad7b93fe39cf..2bf1b32ae01a 100644 --- a/tools/testing/selftests/input/evioc-test.c +++ b/tools/testing/selftests/input/evioc-test.c @@ -234,4 +234,21 @@ TEST(eviocsrep_set_repeat_settings) selftest_uinput_destroy(uidev); } +TEST(eviocgled_get_all_leds) +{ + struct selftest_uinput *uidev; + int leds[2]; + int rc; + + rc = selftest_uinput_create_device(&uidev, -1); + ASSERT_EQ(0, rc); + ASSERT_NE(NULL, uidev); + + /* ioctl to set the maxlen = 0 */ + rc = ioctl(uidev->evdev_fd, EVIOCGLED(0), leds); + ASSERT_EQ(0, rc); + + selftest_uinput_destroy(uidev); +} + TEST_HARNESS_MAIN -- 2.41.0

2 years

1
0
0 0

[PATCH v2 0/2] Unify uapi bitsperlong.h

by Tiezhu Yang

v2: -- Check __CHAR_BIT__ and __SIZEOF_LONG__ rather than __aarch64__, __riscv, __loongarch__, thanks Ruoyao -- Update the code comment and commit message v1: -- Rebase on 6.4-rc6 -- Only unify uapi bitsperlong.h for arm64, riscv and loongarch -- Remove uapi bitsperlong.h of hexagon and microblaze in a new patch Here is the RFC patch: https://lore.kernel.org/linux-arch/1683615903-10862-1-git-send-email-yangti… Tiezhu Yang (2): asm-generic: Unify uapi bitsperlong.h for arm64, riscv and loongarch tools arch: Remove uapi bitsperlong.h of hexagon and microblaze arch/arm64/include/uapi/asm/bitsperlong.h | 24 ------------------- arch/loongarch/include/uapi/asm/bitsperlong.h | 9 -------- arch/riscv/include/uapi/asm/bitsperlong.h | 14 ----------- include/uapi/asm-generic/bitsperlong.h | 11 +++++++++ tools/arch/arm64/include/uapi/asm/bitsperlong.h | 24 ------------------- tools/arch/hexagon/include/uapi/asm/bitsperlong.h | 27 ---------------------- .../arch/loongarch/include/uapi/asm/bitsperlong.h | 9 -------- .../arch/microblaze/include/uapi/asm/bitsperlong.h | 2 -- tools/arch/riscv/include/uapi/asm/bitsperlong.h | 14 ----------- tools/include/uapi/asm-generic/bitsperlong.h | 12 ++++++++++ tools/include/uapi/asm/bitsperlong.h | 6 ----- 11 files changed, 23 insertions(+), 129 deletions(-) delete mode 100644 arch/arm64/include/uapi/asm/bitsperlong.h delete mode 100644 arch/loongarch/include/uapi/asm/bitsperlong.h delete mode 100644 arch/riscv/include/uapi/asm/bitsperlong.h delete mode 100644 tools/arch/arm64/include/uapi/asm/bitsperlong.h delete mode 100644 tools/arch/hexagon/include/uapi/asm/bitsperlong.h delete mode 100644 tools/arch/loongarch/include/uapi/asm/bitsperlong.h delete mode 100644 tools/arch/microblaze/include/uapi/asm/bitsperlong.h delete mode 100644 tools/arch/riscv/include/uapi/asm/bitsperlong.h -- 2.1.0

2 years

2
4
0 0

[PATCH 0/2] arm64/signal: Fix handling of TPIDR2

by Mark Brown

The restoring of TPIDR2 signal context has been broken since it was merged, fix this and add a test case covering it. This is a result of TPIDR2 context management following a different flow to any of the other state that we provide and the fact that we don't expose TPIDR (which follows the same pattern) to signals. Signed-off-by: Mark Brown <broonie(a)kernel.org> --- Mark Brown (2): arm64/signal: Restore TPIDR2 register rather than memory state kselftest/arm64: Add a test case for TPIDR2 restore arch/arm64/kernel/signal.c | 2 +- tools/testing/selftests/arm64/signal/.gitignore | 2 +- .../arm64/signal/testcases/tpidr2_restore.c | 85 ++++++++++++++++++++++ 3 files changed, 87 insertions(+), 2 deletions(-) --- base-commit: 858fd168a95c5b9669aac8db6c14a9aeab446375 change-id: 20230621-arm64-fix-tpidr2-signal-restore-713d93798f99 Best regards, -- Mark Brown <broonie(a)kernel.org>

2 years

1
3
0 0

[PATCH v2] selftests/input: add test to cover len > maxlen in bits_to_user()

by Dana Elfassy

In order to cover this case, setting 'maxlen = 0', with the following explanation: EVIOCGKEY is executed from evdev_do_ioctl(), which is called from evdev_ioctl_handler(). evdev_ioctl_handler() is called from 2 functions, where by code coverage, only the first one is in use. ‘compat’ is given the value ‘0’ [1]. Thus, the condition [2] is always false. This means ‘len’ always equals a positive number [3] ‘maxlen’ in evdev_handle_get_val [4] is defined locally in evdev_do_ioctl() [5], and is sent in the variable 'size' [6] [1] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L1281 [2] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L705 [3] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L707 [4] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L886 [5] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L1155 [6] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L1141 Signed-off-by: Dana Elfassy <dangel101(a)gmail.com> --- Changes in v2: - Added following note about the patch's dependency This patch depends on '[v3] selftests/input: Introduce basic tests for evdev ioctls' [1] sent to the ML. [1] https://patchwork.kernel.org/project/linux-input/patch/20230607153214.15933… tools/testing/selftests/input/evioc-test.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/tools/testing/selftests/input/evioc-test.c b/tools/testing/selftests/input/evioc-test.c index ad7b93fe39cf..b94de2ee5596 100644 --- a/tools/testing/selftests/input/evioc-test.c +++ b/tools/testing/selftests/input/evioc-test.c @@ -234,4 +234,23 @@ TEST(eviocsrep_set_repeat_settings) selftest_uinput_destroy(uidev); } +TEST(eviocgkey_get_global_key_state) +{ + struct selftest_uinput *uidev; + int rep_values[2]; + int rc; + + memset(rep_values, 0, sizeof(rep_values)); + + rc = selftest_uinput_create_device(&uidev); + ASSERT_EQ(0, rc); + ASSERT_NE(NULL, uidev); + + /* ioctl to create the scenario where len > maxlen in bits_to_user() */ + rc = ioctl(uidev->evdev_fd, EVIOCGKEY(0), rep_values); + ASSERT_EQ(0, rc); + + selftest_uinput_destroy(uidev); +} + TEST_HARNESS_MAIN -- 2.41.0

2 years

2
1
0 0

[PATCH net] selftests: forwarding: Fix race condition in mirror installation

by Petr Machata

From: Danielle Ratson <danieller(a)nvidia.com> When mirroring to a gretap in hardware the device expects to be programmed with the egress port and all the encapsulating headers. This requires the driver to resolve the path the packet will take in the software data path and program the device accordingly. If the path cannot be resolved (in this case because of an unresolved neighbor), then mirror installation fails until the path is resolved. This results in a race that causes the test to sometimes fail. Fix this by setting the neighbor's state to permanent in a couple of tests, so that it is always valid. Fixes: 35c31d5c323f ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1d") Fixes: 239e754af854 ("selftests: forwarding: Test mirror-to-gretap w/ UL 802.1q") Signed-off-by: Danielle Ratson <danieller(a)nvidia.com> Reviewed-by: Petr Machata <petrm(a)nvidia.com> Signed-off-by: Petr Machata <petrm(a)nvidia.com> --- .../testing/selftests/net/forwarding/mirror_gre_bridge_1d.sh | 4 ++++ .../testing/selftests/net/forwarding/mirror_gre_bridge_1q.sh | 4 ++++ 2 files changed, 8 insertions(+) diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1d.sh b/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1d.sh index c5095da7f6bf..aec752a22e9e 100755 --- a/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1d.sh +++ b/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1d.sh @@ -93,12 +93,16 @@ cleanup() test_gretap() { + ip neigh replace 192.0.2.130 lladdr $(mac_get $h3) \ + nud permanent dev br2 full_test_span_gre_dir gt4 ingress 8 0 "mirror to gretap" full_test_span_gre_dir gt4 egress 0 8 "mirror to gretap" } test_ip6gretap() { + ip neigh replace 2001:db8:2::2 lladdr $(mac_get $h3) \ + nud permanent dev br2 full_test_span_gre_dir gt6 ingress 8 0 "mirror to ip6gretap" full_test_span_gre_dir gt6 egress 0 8 "mirror to ip6gretap" } diff --git a/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1q.sh b/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1q.sh index 9ff22f28032d..0cf4c47a46f9 100755 --- a/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1q.sh +++ b/tools/testing/selftests/net/forwarding/mirror_gre_bridge_1q.sh @@ -90,12 +90,16 @@ cleanup() test_gretap() { + ip neigh replace 192.0.2.130 lladdr $(mac_get $h3) \ + nud permanent dev br1 full_test_span_gre_dir gt4 ingress 8 0 "mirror to gretap" full_test_span_gre_dir gt4 egress 0 8 "mirror to gretap" } test_ip6gretap() { + ip neigh replace 2001:db8:2::2 lladdr $(mac_get $h3) \ + nud permanent dev br1 full_test_span_gre_dir gt6 ingress 8 0 "mirror to ip6gretap" full_test_span_gre_dir gt6 egress 0 8 "mirror to ip6gretap" } -- 2.40.1

2 years

2
1
0 0

[PATCH bpf,v6 0/4] Socket lookup BPF API from tc/xdp ingress does not respect VRF bindings.

by Gilad Sever

When calling socket lookup from L2 (tc, xdp), VRF boundaries aren't respected. This patchset fixes this by regarding the incoming device's VRF attachment when performing the socket lookups from tc/xdp. The first two patches are coding changes which factor out the tc helper's logic which was shared with cg/sk_skb (which operate correctly). This refactoring is needed in order to avoid affecting the cgroup/sk_skb flows as there does not seem to be a strict criteria for discerning which flow the helper is called from based on the net device or packet information. The third patch contains the actual bugfix. The fourth patch adds bpf tests for these lookup functions. --- v6: - Remove redundant IS_ENABLED as suggested by Daniel Borkmann - Declare net_device variable and use it as suggested by Daniel Borkmann v5: Use reverse xmas tree indentation v4: - Move dev_sdif() to include/linux/netdevice.h as suggested by Stanislav Fomichev - Remove SYS and SYS_NOFAIL duplicate definitions v3: - Rename bpf_l2_sdif() to dev_sdif() as suggested by Stanislav Fomichev - Added xdp tests as suggested by Daniel Borkmann - Use start_server() to avoid duplicate code as suggested by Stanislav Fomichev v2: Fixed uninitialized var in test patch (4). Gilad Sever (4): bpf: factor out socket lookup functions for the TC hookpoint. bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC hookpoint bpf: fix bpf socket lookup from tc/xdp to respect socket VRF bindings selftests/bpf: Add vrf_socket_lookup tests include/linux/netdevice.h | 9 + net/core/filter.c | 141 ++++++-- .../bpf/prog_tests/vrf_socket_lookup.c | 312 ++++++++++++++++++ .../selftests/bpf/progs/vrf_socket_lookup.c | 88 +++++ 4 files changed, 526 insertions(+), 24 deletions(-) create mode 100644 tools/testing/selftests/bpf/prog_tests/vrf_socket_lookup.c create mode 100644 tools/testing/selftests/bpf/progs/vrf_socket_lookup.c -- 2.34.1

2 years

2
5
0 0

[PATCH net-next 00/16] selftests: Preparations for out-of-order-operations patches in mlxsw

by Petr Machata

The mlxsw driver currently makes the assumption that the user applies configuration in a bottom-up manner. Thus netdevices need to be added to the bridge before IP addresses are configured on that bridge or SVI added on top of it. Enslaving a netdevice to another netdevice that already has uppers is in fact forbidden by mlxsw for this reason. Despite this safety, it is rather easy to get into situations where the offloaded configuration is just plain wrong. Over the course of the following several patchsets, mlxsw code is going to be adjusted to diminish the space of wrongly offloaded configurations. Ideally the offload state will reflect the actual state, regardless of the sequence of operation used to construct that state. Several selftests build configurations that will not be offloadable in the future on some systems. The reason is that what will get offloaded is the actual configuration, not the configuration steps. For example, when a port is added to a bridge that has an IP address, that bridge will get a RIF, which it would not have with the current code. But on Nvidia Spectrum-1 machines, MAC addresses of all RIFs need to have the same prefix, which the bridge will violate. The RIF thus couldn't be created, and the enslavement is therefore canceled, because it would lead to an unoffloadable configuration. This breaks some selftests. In this patchset, adjust selftests to avoid the configurations that mlxsw would be incapable of offloading, while maintaining relevance with regards to the feature that is being tested. There are generally two cases of fixes: - Disabling IPv6 autogen on bridges that do not participate in routing, either because of the abovementioned requirement to keep the same MAC prefix on all in-HW router interfaces, or, on 802.1ad bridges, because in-HW router interfaces are not supported at all. - Setting the bridge MAC address to what it will become after the first member port is attached, so that the in-HW router interface is created with a supported MAC address. The patchset is then split thus: - Patches #1-#7 adjust generic selftests - Patches #8-#16 adjust mlxsw-specific selftests Petr Machata (16): selftests: forwarding: q_in_vni: Disable IPv6 autogen on bridges selftests: forwarding: dual_vxlan_bridge: Disable IPv6 autogen on bridges selftests: forwarding: skbedit_priority: Disable IPv6 autogen on a bridge selftests: forwarding: pedit_dsfield: Disable IPv6 autogen on a bridge selftests: forwarding: mirror_gre_*: Disable IPv6 autogen on bridges selftests: forwarding: mirror_gre_*: Use port MAC for bridge address selftests: forwarding: router_bridge: Use port MAC for bridge address selftests: mlxsw: q_in_q_veto: Disable IPv6 autogen on bridges selftests: mlxsw: extack: Disable IPv6 autogen on bridges selftests: mlxsw: mirror_gre_scale: Disable IPv6 autogen on a bridge selftests: mlxsw: qos_dscp_bridge: Disable IPv6 autogen on a bridge selftests: mlxsw: qos_ets_strict: Disable IPv6 autogen on bridges selftests: mlxsw: qos_mc_aware: Disable IPv6 autogen on bridges selftests: mlxsw: spectrum: q_in_vni_veto: Disable IPv6 autogen on a bridge selftests: mlxsw: vxlan: Disable IPv6 autogen on bridges selftests: mlxsw: one_armed_router: Use port MAC for bridge address .../selftests/drivers/net/mlxsw/extack.sh | 24 ++++++++--- .../drivers/net/mlxsw/mirror_gre_scale.sh | 1 + .../drivers/net/mlxsw/one_armed_router.sh | 3 +- .../drivers/net/mlxsw/q_in_q_veto.sh | 8 ++++ .../drivers/net/mlxsw/qos_dscp_bridge.sh | 1 + .../drivers/net/mlxsw/qos_ets_strict.sh | 8 +++- .../drivers/net/mlxsw/qos_mc_aware.sh | 2 + .../net/mlxsw/spectrum/q_in_vni_veto.sh | 1 + .../selftests/drivers/net/mlxsw/vxlan.sh | 41 ++++++++++++++----- .../net/forwarding/dual_vxlan_bridge.sh | 1 + .../net/forwarding/mirror_gre_bound.sh | 1 + .../net/forwarding/mirror_gre_bridge_1d.sh | 3 +- .../forwarding/mirror_gre_bridge_1d_vlan.sh | 3 +- .../forwarding/mirror_gre_bridge_1q_lag.sh | 3 +- .../net/forwarding/mirror_topo_lib.sh | 1 + .../selftests/net/forwarding/pedit_dsfield.sh | 4 +- .../selftests/net/forwarding/q_in_vni.sh | 1 + .../selftests/net/forwarding/router_bridge.sh | 3 +- .../net/forwarding/skbedit_priority.sh | 4 +- 19 files changed, 88 insertions(+), 25 deletions(-) -- 2.40.1

2 years

2
17
0 0

[PATCH] kselftest/arm64: Log signal code and address for unexpected signals

by Mark Brown

If we get an unexpected signal during a signal test log a bit more data to aid diagnostics. Signed-off-by: Mark Brown <broonie(a)kernel.org> --- tools/testing/selftests/arm64/signal/test_signals_utils.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.c b/tools/testing/selftests/arm64/signal/test_signals_utils.c index 40be8443949d..0dc948db3a4a 100644 --- a/tools/testing/selftests/arm64/signal/test_signals_utils.c +++ b/tools/testing/selftests/arm64/signal/test_signals_utils.c @@ -249,7 +249,8 @@ static void default_handler(int signum, siginfo_t *si, void *uc) fprintf(stderr, "-- Timeout !\n"); } else { fprintf(stderr, - "-- RX UNEXPECTED SIGNAL: %d\n", signum); + "-- RX UNEXPECTED SIGNAL: %d code %d address %p\n", + signum, si->si_code, si->si_addr); } default_result(current, 1); } --- base-commit: 44c026a73be8038f03dbdeef028b642880cf1511 change-id: 20230620-arm64-selftest-log-wrong-signal-cd8c34ae5e4f Best regards, -- Mark Brown <broonie(a)kernel.org>

2 years

2
1
0 0

[PATCH 0/3] selftests: cgroup: add zswap test program

by Domenico Cerasuolo

This series adds 2 zswap related selftests that verify known and fixed issues. A new dedicated test program (test_zswap) is proposed since the test cases are specific to zswap and hosts specific helpers. The first patch adds the (empty) test program, while the other 2 add an actual test function each. Domenico Cerasuolo (3): selftests: cgroup: add test_zswap program selftests: cgroup: add test_zswap with no kmem bypass test selftests: cgroup: add zswap-memcg unwanted writeback test tools/testing/selftests/cgroup/.gitignore | 1 + tools/testing/selftests/cgroup/Makefile | 2 + tools/testing/selftests/cgroup/test_zswap.c | 286 ++++++++++++++++++++ 3 files changed, 289 insertions(+) create mode 100644 tools/testing/selftests/cgroup/test_zswap.c -- 2.34.1

2 years

1
3
0 0

[PATCH bpf-next v2 0/6] Add SO_REUSEPORT support for TC bpf_sk_assign

by Lorenz Bauer

We want to replace iptables TPROXY with a BPF program at TC ingress. To make this work in all cases we need to assign a SO_REUSEPORT socket to an skb, which is currently prohibited. This series adds support for such sockets to bpf_sk_assing. See patch 5 for details. I did some refactoring to cut down on the amount of duplicate code. The key to this is to use INDIRECT_CALL in the reuseport helpers. To show that this approach is not just beneficial to TC sk_assign I removed duplicate code for bpf_sk_lookup as well. Changes from v1: - Correct commit abbrev length (Kuniyuki) - Reduce duplication (Kuniyuki) - Add checks on sk_state (Martin) - Split exporting inet[6]_lookup_reuseport into separate patch (Eric) Joint work with Daniel Borkmann. Signed-off-by: Lorenz Bauer <lmb(a)isovalent.com> --- Daniel Borkmann (1): selftests/bpf: Test that SO_REUSEPORT can be used with sk_assign helper Lorenz Bauer (5): net: export inet_lookup_reuseport and inet6_lookup_reuseport net: document inet[6]_lookup_reuseport sk_state requirements net: remove duplicate reuseport_lookup functions net: remove duplicate sk_lookup helpers bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign include/net/inet6_hashtables.h | 84 ++++++++- include/net/inet_hashtables.h | 77 +++++++- include/net/sock.h | 7 +- include/uapi/linux/bpf.h | 3 - net/core/filter.c | 2 - net/ipv4/inet_hashtables.c | 69 +++++--- net/ipv4/udp.c | 73 +++----- net/ipv6/inet6_hashtables.c | 71 +++++--- net/ipv6/udp.c | 85 +++------ tools/include/uapi/linux/bpf.h | 3 - tools/testing/selftests/bpf/network_helpers.c | 3 + .../selftests/bpf/prog_tests/assign_reuse.c | 197 +++++++++++++++++++++ .../selftests/bpf/progs/test_assign_reuse.c | 142 +++++++++++++++ 13 files changed, 637 insertions(+), 179 deletions(-) --- base-commit: 25085b4e9251c77758964a8e8651338972353642 change-id: 20230613-so-reuseport-e92c526173ee Best regards, -- Lorenz Bauer <lmb(a)isovalent.com>

2 years

4
21
0 0

[PATCH v18 0/5] Implement IOCTL to get and optionally clear info about PTEs

by Muhammad Usama Anjum

*Changes in v18* - Rebase on top of next-20230613 - Minor updates *Changes in v17* - Rebase on top of next-20230606 - Minor improvements in PAGEMAP_SCAN IOCTL patch *Changes in v16* - Fix a corner case - Add exclusive PM_SCAN_OP_WP back *Changes in v15* - Build fix (Add missed build fix in RESEND) *Changes in v14* - Fix build error caused by #ifdef added at last minute in some configs *Changes in v13* - Rebase on top of next-20230414 - Give-up on using uffd_wp_range() and write new helpers, flush tlb only once *Changes in v12* - Update and other memory types to UFFD_FEATURE_WP_ASYNC - Rebaase on top of next-20230406 - Review updates *Changes in v11* - Rebase on top of next-20230307 - Base patches on UFFD_FEATURE_WP_UNPOPULATED - Do a lot of cosmetic changes and review updates - Remove ENGAGE_WP + !GET operation as it can be performed with UFFDIO_WRITEPROTECT *Changes in v10* - Add specific condition to return error if hugetlb is used with wp async - Move changes in tools/include/uapi/linux/fs.h to separate patch - Add documentation *Changes in v9:* - Correct fault resolution for userfaultfd wp async - Fix build warnings and errors which were happening on some configs - Simplify pagemap ioctl's code *Changes in v8:* - Update uffd async wp implementation - Improve PAGEMAP_IOCTL implementation *Changes in v7:* - Add uffd wp async - Update the IOCTL to use uffd under the hood instead of soft-dirty flags *Motivation* The real motivation for adding PAGEMAP_SCAN IOCTL is to emulate Windows GetWriteWatch() syscall [1]. The GetWriteWatch{} retrieves the addresses of the pages that are written to in a region of virtual memory. This syscall is used in Windows applications and games etc. This syscall is being emulated in pretty slow manner in userspace. Our purpose is to enhance the kernel such that we translate it efficiently in a better way. Currently some out of tree hack patches are being used to efficiently emulate it in some kernels. We intend to replace those with these patches. So the whole gaming on Linux can effectively get benefit from this. It means there would be tons of users of this code. CRIU use case [2] was mentioned by Andrei and Danylo: > Use cases for migrating sparse VMAs are binaries sanitized with ASAN, > MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of > shadow memory [4]. Being able to migrate such binaries allows to highly > reduce the amount of work needed to identify and fix post-migration > crashes, which happen constantly. Andrei's defines the following uses of this code: * it is more granular and allows us to track changed pages more effectively. The current interface can clear dirty bits for the entire process only. In addition, reading info about pages is a separate operation. It means we must freeze the process to read information about all its pages, reset dirty bits, only then we can start dumping pages. The information about pages becomes more and more outdated, while we are processing pages. The new interface solves both these downsides. First, it allows us to read pte bits and clear the soft-dirty bit atomically. It means that CRIU will not need to freeze processes to pre-dump their memory. Second, it clears soft-dirty bits for a specified region of memory. It means CRIU will have actual info about pages to the moment of dumping them. * The new interface has to be much faster because basic page filtering is happening in the kernel. With the old interface, we have to read pagemap for each page. *Implementation Evolution (Short Summary)* From the definition of GetWriteWatch(), we feel like kernel's soft-dirty feature can be used under the hood with some additions like: * reset soft-dirty flag for only a specific region of memory instead of clearing the flag for the entire process * get and clear soft-dirty flag for a specific region atomically So we decided to use ioctl on pagemap file to read or/and reset soft-dirty flag. But using soft-dirty flag, sometimes we get extra pages which weren't even written. They had become soft-dirty because of VMA merging and VM_SOFTDIRTY flag. This breaks the definition of GetWriteWatch(). We were able to by-pass this short coming by ignoring VM_SOFTDIRTY until David reported that mprotect etc messes up the soft-dirty flag while ignoring VM_SOFTDIRTY [5]. This wasn't happening until [6] got introduced. We discussed if we can revert these patches. But we could not reach to any conclusion. So at this point, I made couple of tries to solve this whole VM_SOFTDIRTY issue by correcting the soft-dirty implementation: * [7] Correct the bug fixed wrongly back in 2014. It had potential to cause regression. We left it behind. * [8] Keep a list of soft-dirty part of a VMA across splits and merges. I got the reply don't increase the size of the VMA by 8 bytes. At this point, we left soft-dirty considering it is too much delicate and userfaultfd [9] seemed like the only way forward. From there onward, we have been basing soft-dirty emulation on userfaultfd wp feature where kernel resolves the faults itself when WP_ASYNC feature is used. It was straight forward to add WP_ASYNC feature in userfautlfd. Now we get only those pages dirty or written-to which are really written in reality. (PS There is another WP_UNPOPULATED userfautfd feature is required which is needed to avoid pre-faulting memory before write-protecting [9].) All the different masks were added on the request of CRIU devs to create interface more generic and better. [1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-… [2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com [3] https://github.com/google/sanitizers [4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit [5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com [6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/ [7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.… [8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.… [9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com [10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com * Original Cover letter from v8* Hello, Note: Soft-dirty pages and pages which have been written-to are synonyms. As kernel already has soft-dirty feature inside which we have given up to use, we are using written-to terminology while using UFFD async WP under the hood. This IOCTL, PAGEMAP_SCAN on pagemap file can be used to get and/or clear the info about page table entries. The following operations are supported in this ioctl: - Get the information if the pages have been written-to (PAGE_IS_WRITTEN), file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped (PAGE_IS_SWAPPED). - Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which pages have been written-to. - Find pages which have been written-to and write protect the pages (atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE) It is possible to find and clear soft-dirty pages entirely in userspace. But it isn't efficient: - The mprotect and SIGSEGV handler for bookkeeping - The userfaultfd wp (synchronous) with the handler for bookkeeping Some benchmarks can be seen here[1]. This series adds features that weren't present earlier: - There is no atomic get soft-dirty/Written-to status and clear present in the kernel. - The pages which have been written-to can not be found in accurate way. (Kernel's soft-dirty PTE bit + sof_dirty VMA bit shows more soft-dirty pages than there actually are.) Historically, soft-dirty PTE bit tracking has been used in the CRIU project. The procfs interface is enough for finding the soft-dirty bit status and clearing the soft-dirty bit of all the pages of a process. We have the use case where we need to track the soft-dirty PTE bit for only specific pages on-demand. We need this tracking and clear mechanism of a region of memory while the process is running to emulate the getWriteWatch() syscall of Windows. *(Moved to using UFFD instead of soft-dirtyi feature to find pages which have been written-to from v7 patch series)*: Stop using the soft-dirty flags for finding which pages have been written to. It is too delicate and wrong as it shows more soft-dirty pages than the actual soft-dirty pages. There is no interest in correcting it [2][3] as this is how the feature was written years ago. It shouldn't be updated to changed behaviour. Peter Xu has suggested using the async version of the UFFD WP [4] as it is based inherently on the PTEs. So in this patch series, I've added a new mode to the UFFD which is asynchronous version of the write protect. When this variant of the UFFD WP is used, the page faults are resolved automatically by the kernel. The pages which have been written-to can be found by reading pagemap file (!PM_UFFD_WP). This feature can be used successfully to find which pages have been written to from the time the pages were write protected. This works just like the soft-dirty flag without showing any extra pages which aren't soft-dirty in reality. The information related to pages if the page is file mapped, present and swapped is required for the CRIU project [5][6]. The addition of the required mask, any mask, excluded mask and return masks are also required for the CRIU project [5]. The IOCTL returns the addresses of the pages which match the specific masks. The page addresses are returned in struct page_region in a compact form. The max_pages is needed to support a use case where user only wants to get a specific number of pages. So there is no need to find all the pages of interest in the range when max_pages is specified. The IOCTL returns when the maximum number of the pages are found. The max_pages is optional. If max_pages is specified, it must be equal or greater than the vec_size. This restriction is needed to handle worse case when one page_region only contains info of one page and it cannot be compacted. This is needed to emulate the Windows getWriteWatch() syscall. The patch series include the detailed selftest which can be used as an example for the uffd async wp test and PAGEMAP_IOCTL. It shows the interface usages as well. [1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora… [2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.… [3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.… [4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n [5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/ [6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/ Regards, Muhammad Usama Anjum Muhammad Usama Anjum (4): fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs tools headers UAPI: Update linux/fs.h with the kernel sources mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL selftests: mm: add pagemap ioctl tests Peter Xu (1): userfaultfd: UFFD_FEATURE_WP_ASYNC Documentation/admin-guide/mm/pagemap.rst | 58 + Documentation/admin-guide/mm/userfaultfd.rst | 35 + fs/proc/task_mmu.c | 513 ++++++ fs/userfaultfd.c | 26 +- include/linux/hugetlb.h | 1 + include/linux/userfaultfd_k.h | 21 +- include/uapi/linux/fs.h | 53 + include/uapi/linux/userfaultfd.h | 9 +- mm/hugetlb.c | 34 +- mm/memory.c | 27 +- tools/include/uapi/linux/fs.h | 53 + tools/testing/selftests/mm/.gitignore | 2 + tools/testing/selftests/mm/Makefile | 3 +- tools/testing/selftests/mm/config | 1 + tools/testing/selftests/mm/pagemap_ioctl.c | 1459 ++++++++++++++++++ tools/testing/selftests/mm/run_vmtests.sh | 4 + 16 files changed, 2275 insertions(+), 24 deletions(-) create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh -- 2.39.2

2 years

4
29
0 0

[PATCH 0/2] Modify vDSO selftests

by Tiezhu Yang

This patchset is based on the next branch of shuah/linux-kselftest.git Tiezhu Yang (2): selftests/vDSO: Add support for LoongArch selftests/vDSO: Get version and name for all archs tools/testing/selftests/vDSO/vdso_config.h | 6 ++++- tools/testing/selftests/vDSO/vdso_test_getcpu.c | 16 +++++-------- .../selftests/vDSO/vdso_test_gettimeofday.c | 26 ++++++---------------- 3 files changed, 18 insertions(+), 30 deletions(-) -- 2.1.0

2 years

1
3
0 0

[PATCH v5] selftests/clone3: Fix broken test under !CONFIG_TIME_NS

by Tiezhu Yang

When execute the following command to test clone3 on LoongArch: # cd tools/testing/selftests/clone3 && make && ./clone3 we can see the following error info: # [5719] Trying clone3() with flags 0x80 (size 0) # Invalid argument - Failed to create new process # [5719] clone3() with flags says: -22 expected 0 not ok 18 [5719] Result (-22) is different than expected (0) This is because if CONFIG_TIME_NS is not set, but the flag CLONE_NEWTIME (0x80) is used to clone a time namespace, it will return -EINVAL in copy_time_ns(). If kernel does not support CONFIG_TIME_NS, /proc/self/ns/time will be not exist, and then we should skip clone3() test with CLONE_NEWTIME. With this patch under !CONFIG_TIME_NS: # cd tools/testing/selftests/clone3 && make && ./clone3 ... # Time namespaces are not supported ok 18 # SKIP Skipping clone3() with CLONE_NEWTIME # Totals: pass:17 fail:0 xfail:0 xpass:0 skip:1 error:0 Fixes: 515bddf0ec41 ("selftests/clone3: test clone3 with CLONE_NEWTIME") Suggested-by: Thomas Gleixner <tglx(a)linutronix.de> Signed-off-by: Tiezhu Yang <yangtiezhu(a)loongson.cn> --- v5: -- Rebase on the next branch of shuah/linux-kselftest.git to avoid potential merge conflicts due to changes in the link: https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git/c… -- Update the commit message and send it as a single patch Here is the v4 patch: https://lore.kernel.org/loongarch/1685968410-5412-2-git-send-email-yangtiez… tools/testing/selftests/clone3/clone3.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c index e60cf4d..1c61e3c 100644 --- a/tools/testing/selftests/clone3/clone3.c +++ b/tools/testing/selftests/clone3/clone3.c @@ -196,7 +196,12 @@ int main(int argc, char *argv[]) CLONE3_ARGS_NO_TEST); /* Do a clone3() in a new time namespace */ - test_clone3(CLONE_NEWTIME, 0, 0, CLONE3_ARGS_NO_TEST); + if (access("/proc/self/ns/time", F_OK) == 0) { + test_clone3(CLONE_NEWTIME, 0, 0, CLONE3_ARGS_NO_TEST); + } else { + ksft_print_msg("Time namespaces are not supported\n"); + ksft_test_result_skip("Skipping clone3() with CLONE_NEWTIME\n"); + } /* Do a clone3() with exit signal (SIGCHLD) in flags */ test_clone3(SIGCHLD, 0, -EINVAL, CLONE3_ARGS_NO_TEST); -- 2.1.0

2 years

1
1
0 0

[RFC PATCH 00/19] hugetlb support for KVM guest_mem

by Ackerley Tng

Hello, This patchset builds upon a soon-to-be-published WIP patchset that Sean published at https://github.com/sean-jc/linux/tree/x86/kvm_gmem_solo, mentioned at [1]. The tree can be found at: https://github.com/googleprodkernel/linux-cc/tree/gmem-hugetlb-rfc-v1 In this patchset, hugetlb support for KVM's guest_mem (aka gmem) is introduced, allowing VM private memory (for confidential computing) to be backed by hugetlb pages. guest_mem provides userspace with a handle, with which userspace can allocate and deallocate memory for confidential VMs without mapping the memory into userspace. Why use hugetlb instead of introducing a new allocator, like gmem does for 4K and transparent hugepages? + hugetlb provides the following useful functionality, which would otherwise have to be reimplemented: + Allocation of hugetlb pages at boot time, including + Parsing of kernel boot parameters to configure hugetlb + Tracking of usage in hstate + gmem will share the same system-wide pool of hugetlb pages, so users don't have to have separate pools for hugetlb and gmem + Page accounting with subpools + hugetlb pages are tracked in subpools, which gmem uses to reserve pages from the global hstate + Memory charging + hugetlb provides code that charges memory to cgroups + Reporting: hugetlb usage and availability are available at /proc/meminfo, etc The first 11 patches in this patchset is a series of refactoring to decouple hugetlb and hugetlbfs. The central thread binding the refactoring is that some functions (like inode_resv_map(), inode_subpool(), inode_hstate(), etc) rely on a hugetlbfs concept, that the resv_map, subpool, hstate, are in a specific field in a hugetlb inode. Refactoring to parametrize functions by hstate, subpool, resv_map will allow hugetlb to be used by gmem and in other places where these data structures aren't necessarily stored in the same positions in the inode. The refactoring proposed here is just the minimum required to get a proof-of-concept working with gmem. I would like to get opinions on this approach before doing further refactoring. (See TODOs) TODOs: + hugetlb/hugetlbfs refactoring + remove_inode_hugepages() no longer needs to be exposed, it is hugetlbfs specific and used only in inode.c + remove_mapping_hugepages(), remove_inode_single_folio(), hugetlb_unreserve_pages() shouldn't need to take inode as a parameter + Updating inode->i_blocks can be refactored to a separate function and called from hugetlbfs and gmem + alloc_hugetlb_folio_from_subpool() shouldn't need to be parametrized by vma + hugetlb_reserve_pages() should be refactored to be symmetric with hugetlb_unreserve_pages() + It should be parametrized by resv_map + alloc_hugetlb_folio_from_subpool() could perhaps use hugetlb_reserve_pages()? + gmem + Figure out if resv_map should be used by gmem at all + Probably needs more refactoring to decouple resv_map from hugetlb functions Questions for the community: 1. In this patchset, every gmem file backed with hugetlb is given a new subpool. Is that desirable? + In hugetlbfs, a subpool always belongs to a mount, and hugetlbfs has one mount per hugetlb size (2M, 1G, etc) + memfd_create(MFD_HUGETLB) effectively returns a full hugetlbfs file, so it (rightfully) uses the hugetlbfs kernel mounts and their subpools + I gave each file a subpool mostly to speed up implementation and still be able to reserve hugetlb pages from the global hstate based on the gmem file size. + gmem, unlike hugetlbfs, isn't meant to be a full filesystem, so + Should there be multiple mounts, one for each hugetlb size? + Will the mounts be initialized on boot or on first gmem file creation? + Or is one subpool per gmem file fine? 2. Should resv_map be used for gmem at all, since gmem doesn't allow userspace reservations? [1] https://lore.kernel.org/lkml/ZEM5Zq8oo+xnApW9@google.com/ --- Ackerley Tng (19): mm: hugetlb: Expose get_hstate_idx() mm: hugetlb: Move and expose hugetlbfs_zero_partial_page mm: hugetlb: Expose remove_inode_hugepages mm: hugetlb: Decouple hstate, subpool from inode mm: hugetlb: Allow alloc_hugetlb_folio() to be parametrized by subpool and hstate mm: hugetlb: Provide hugetlb_filemap_add_folio() mm: hugetlb: Refactor vma_*_reservation functions mm: hugetlb: Refactor restore_reserve_on_error mm: hugetlb: Use restore_reserve_on_error directly in filesystems mm: hugetlb: Parametrize alloc_hugetlb_folio_from_subpool() by resv_map mm: hugetlb: Parametrize hugetlb functions by resv_map mm: truncate: Expose preparation steps for truncate_inode_pages_final KVM: guest_mem: Refactor kvm_gmem fd creation to be in layers KVM: guest_mem: Refactor cleanup to separate inode and file cleanup KVM: guest_mem: hugetlb: initialization and cleanup KVM: guest_mem: hugetlb: allocate and truncate from hugetlb KVM: selftests: Add basic selftests for hugetlbfs-backed guest_mem KVM: selftests: Support various types of backing sources for private memory KVM: selftests: Update test for various private memory backing source types fs/hugetlbfs/inode.c | 102 ++-- include/linux/hugetlb.h | 86 ++- include/linux/mm.h | 1 + include/uapi/linux/kvm.h | 25 + mm/hugetlb.c | 324 +++++++----- mm/truncate.c | 24 +- .../testing/selftests/kvm/guest_memfd_test.c | 33 +- .../testing/selftests/kvm/include/test_util.h | 14 + tools/testing/selftests/kvm/lib/test_util.c | 74 +++ .../kvm/x86_64/private_mem_conversions_test.c | 38 +- virt/kvm/guest_mem.c | 488 ++++++++++++++---- 11 files changed, 882 insertions(+), 327 deletions(-) -- 2.41.0.rc0.172.g3f132b7071-goog

2 years

4
22
0 0

[PATCH v3 00/10] RISCV: Add KVM_GET_REG_LIST API

by Haibo Xu

KVM_GET_REG_LIST will dump all register IDs that are available to KVM_GET/SET_ONE_REG and It's very useful to identify some platform regression issue during VM migration. Patch 1-7 re-structured the get-reg-list test in aarch64 to make some of the code as common test framework that can be shared by riscv. Patch 8 enabled the KVM_GET_REG_LIST API in riscv and patch 9-10 added the corresponding kselftest for checking possible register regressions. The get-reg-list kvm selftest was ported from aarch64 and tested with Linux 6.4-rc5 on a Qemu riscv64 virt machine. --- Changed since v2: * Rebase to Linux 6.4-rc5 * Filter out ZICBO* config and ISA_EXT registers report if the extensions were not supported in host * Enable AIA CSR test * Move vCPU extension check_supported() to finalize_vcpu() per Andrew's suggestion * Switch to use KVM_REG_SIZE_ULONG for most registers' definition --- Changed since v1: * rebase to Andrew's changes * fix coding style Andrew Jones (7): KVM: arm64: selftests: Replace str_with_index with strdup_printf KVM: arm64: selftests: Drop SVE cap check in print_reg KVM: arm64: selftests: Remove print_reg's dependency on vcpu_config KVM: arm64: selftests: Rename vcpu_config and add to kvm_util.h KVM: arm64: selftests: Delete core_reg_fixup KVM: arm64: selftests: Split get-reg-list test code KVM: arm64: selftests: Finish generalizing get-reg-list Haibo Xu (3): KVM: riscv: Add KVM_GET_REG_LIST API support KVM: riscv: selftests: Skip some registers set operation KVM: riscv: selftests: Add get-reg-list test Documentation/virt/kvm/api.rst | 2 +- arch/riscv/kvm/vcpu.c | 378 +++++++++++ tools/testing/selftests/kvm/Makefile | 11 +- .../selftests/kvm/aarch64/get-reg-list.c | 540 ++-------------- tools/testing/selftests/kvm/get-reg-list.c | 421 ++++++++++++ .../selftests/kvm/include/kvm_util_base.h | 16 + .../selftests/kvm/include/riscv/processor.h | 3 + .../testing/selftests/kvm/include/test_util.h | 2 + tools/testing/selftests/kvm/lib/test_util.c | 15 + .../selftests/kvm/riscv/get-reg-list.c | 611 ++++++++++++++++++ 10 files changed, 1499 insertions(+), 500 deletions(-) create mode 100644 tools/testing/selftests/kvm/get-reg-list.c create mode 100644 tools/testing/selftests/kvm/riscv/get-reg-list.c -- 2.34.1

2 years

3
25
0 0

[PATCH bpf,v5 0/4] Socket lookup BPF API from tc/xdp ingress does not respect VRF bindings.

by Gilad Sever

When calling socket lookup from L2 (tc, xdp), VRF boundaries aren't respected. This patchset fixes this by regarding the incoming device's VRF attachment when performing the socket lookups from tc/xdp. The first two patches are coding changes which factor out the tc helper's logic which was shared with cg/sk_skb (which operate correctly). This refactoring is needed in order to avoid affecting the cgroup/sk_skb flows as there does not seem to be a strict criteria for discerning which flow the helper is called from based on the net device or packet information. The third patch contains the actual bugfix. The fourth patch adds bpf tests for these lookup functions. --- v5: Use reverse xmas tree indentation v4: - Move dev_sdif() to include/linux/netdevice.h as suggested by Stanislav Fomichev - Remove SYS and SYS_NOFAIL duplicate definitions v3: - Rename bpf_l2_sdif() to dev_sdif() as suggested by Stanislav Fomichev - Added xdp tests as suggested by Daniel Borkmann - Use start_server() to avoid duplicate code as suggested by Stanislav Fomichev v2: Fixed uninitialized var in test patch (4). Gilad Sever (4): bpf: factor out socket lookup functions for the TC hookpoint. bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC hookpoint bpf: fix bpf socket lookup from tc/xdp to respect socket VRF bindings selftests/bpf: Add vrf_socket_lookup tests include/linux/netdevice.h | 9 + net/core/filter.c | 123 +++++-- .../bpf/prog_tests/vrf_socket_lookup.c | 312 ++++++++++++++++++ .../selftests/bpf/progs/vrf_socket_lookup.c | 88 +++++ 4 files changed, 511 insertions(+), 21 deletions(-) create mode 100644 tools/testing/selftests/bpf/prog_tests/vrf_socket_lookup.c create mode 100644 tools/testing/selftests/bpf/progs/vrf_socket_lookup.c -- 2.34.1

2 years

2
5
0 0

[PATCH net 0/2] selftests/ptp: Add support for new timestamp IOCTLs

by Alex Maftei

PTP_SYS_OFFSET_EXTENDED was added in November 2018 in 361800876f80 (" ptp: add PTP_SYS_OFFSET_EXTENDED ioctl") and PTP_SYS_OFFSET_PRECISE was added in February 2016 in 719f1aa4a671 ("ptp: Add PTP_SYS_OFFSET_PRECISE for driver crosstimestamping") The PTP selftest code is lacking support for these two IOCTLS. This short series of patches adds support for them. Alex Maftei (2): selftests/ptp: Add -x option for testing PTP_SYS_OFFSET_EXTENDED selftests/ptp: Add -X option for testing PTP_SYS_OFFSET_PRECISE tools/testing/selftests/ptp/testptp.c | 71 ++++++++++++++++++++++++++- 1 file changed, 69 insertions(+), 2 deletions(-) -- 2.28.0

2 years

5
7
0 0

[PATCH v2 0/3] tracing/user_events: Fix incorrect return value for

by sunliming

Now the writing operation return the count of writes whether events are enabled or disabled. Fix this by just return -ENOENT when events are disabled. v1 -> v2: - Change the returh vale from -EFAULT to -ENOENT sunliming (3): tracing/user_events: Fix incorrect return value for writing operation when events are disabled selftests/user_events: Enable the event before write_fault test in ftrace self-test selftests/user_events: Add test cases when event is disabled kernel/trace/trace_events_user.c | 3 ++- tools/testing/selftests/user_events/ftrace_test.c | 8 ++++++++ 2 files changed, 10 insertions(+), 1 deletion(-) -- 2.25.1

2 years

4
8
0 0

[PATCH 0/3] RFC: F_OFD_GETLK should provide more info

by Stas Sergeev

This patch-set implements 2 small extensions to the current F_OFD_GETLK, allowing it to gather more information than it currently returns. First extension allows to use F_UNLCK on query, which currently returns EINVAL. Instead it can be used to query the locks on a particular fd - something that is not currently possible. The basic idea is that on F_OFD_GETLK, F_UNLCK would "conflict" with (or query) any types of the lock on the same fd, and ignore any locks on other fds. Use-cases: 1. CRIU-alike scenario when you want to read the locking info from an fd for the later reconstruction. This can now be done by setting l_start and l_len to 0 to cover entire file range, and do F_OFD_GETLK. In the loop you need to advance l_start past the returned lock ranges, to eventually collect all locked ranges. 2. Implementing the lock checking/enforcing policy. Say you want to implement an "auditor" module in your program, that checks that the I/O is done only after the proper locking is applied on a file region. In this case you need to know if the particular region is locked on that fd, and if so - with what type of the lock. If you would do that currently (without this extension) then you can only check for the write locks, and for that you need to probe the lock on your fd and then open the same file via nother fd and probe there. That way you can identify the write lock on a particular fd, but such trick is non-atomic and complex. As for finding out the read lock on a particular fd - impossible. This extension allows to do such queries without any extra efforts. 3. Implementing the mandatory locking policy. Suppose you want to make a policy where the write lock inhibits any unlocked readers and writers. Currently you need to check if the write lock is present on some other fd, and if it is not there - allow the I/O operation. But because the write lock can appear at any moment, you need to do that under some global lock, which can be released only when the I/O operation is finished. With the proposed extension you can instead just check the write lock on your own fd first, and if it is there - allow the I/O operation on that fd without using any global lock. Only if there is no write lock on this fd, then you need to take global lock and check for a write lock on other fds. The second patch implements another extension. Currently F_OFD_GETLK returns -1 in the l_pid member. This patch removes the code that writes -1 there, so that the proper pid is returned. I am not sure why it was decided to deliberately hide the owner's pid. It may be needed in case you want to send some message to the offending locker, like eg SIGKILL. The third patch adds a test-case for OFD locks. It tests both the generic things and the proposed extensions. Stas Sergeev (3): fs/locks: F_UNLCK extension for F_OFD_GETLK fd/locks: allow get the lock owner by F_OFD_GETLK selftests: add OFD lock tests fs/locks.c | 25 +++- tools/testing/selftests/locking/Makefile | 2 + tools/testing/selftests/locking/ofdlocks.c | 135 +++++++++++++++++++++ 3 files changed, 157 insertions(+), 5 deletions(-) create mode 100644 tools/testing/selftests/locking/ofdlocks.c CC: Jeff Layton <jlayton(a)kernel.org> CC: Chuck Lever <chuck.lever(a)oracle.com> CC: Alexander Viro <viro(a)zeniv.linux.org.uk> CC: Christian Brauner <brauner(a)kernel.org> CC: linux-fsdevel(a)vger.kernel.org CC: linux-kernel(a)vger.kernel.org CC: Shuah Khan <shuah(a)kernel.org> CC: linux-kselftest(a)vger.kernel.org -- 2.39.2

2 years

2
2
0 0

[PATCH] selftests/input: add test to cover failure on dev->absinfo

by Dana Elfassy

Added coverage for the case where !dev->absinfo, and executing the normal flow as well. Signed-off-by: Dana Elfassy <dangel101(a)gmail.com> --- This patch depends on '[v3] selftests/input: Introduce basic tests for evdev ioctls' [1] sent to the ML. [1] https://patchwork.kernel.org/project/linux-input/patch/20230607153214.15933… tools/testing/selftests/input/evioc-test.c | 23 ++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/tools/testing/selftests/input/evioc-test.c b/tools/testing/selftests/input/evioc-test.c index ad7b93fe39cf..1a688002e1e7 100644 --- a/tools/testing/selftests/input/evioc-test.c +++ b/tools/testing/selftests/input/evioc-test.c @@ -234,4 +234,27 @@ TEST(eviocsrep_set_repeat_settings) selftest_uinput_destroy(uidev); } +TEST(eviocsabs_set_abs_value_limits) +{ + struct selftest_uinput *uidev; + struct input_absinfo absinfo; + int rc; + + // fail test on dev->absinfo + rc = selftest_uinput_create_device(&uidev); + ASSERT_EQ(0, rc); + ASSERT_NE(NULL, uidev); + rc = ioctl(uidev->evdev_fd, EVIOCSABS(0), &absinfo); + ASSERT_EQ(-1, rc); + selftest_uinput_destroy(uidev); + + // ioctl normal flow + rc = selftest_uinput_create_device(&uidev, EV_ABS, -1); + ASSERT_EQ(0, rc); + ASSERT_NE(NULL, uidev); + rc = ioctl(uidev->evdev_fd, EVIOCSABS(0), &absinfo); + ASSERT_EQ(0, rc); + selftest_uinput_destroy(uidev); +} + TEST_HARNESS_MAIN -- 2.41.0

2 years

1
0
0 0

[PATCH v3 00/10] Add Intel VT-d nested translation

by Yi Liu

This is to add Intel VT-d nested translation based on IOMMUFD nesting infrastructure. As the iommufd nesting infrastructure series[1], iommu core supports new ops to report iommu hardware information, allocate domains with user data and sync stage-1 IOTLB. The data required in the three paths are vendor-specific, so 1) IOMMU_HW_INFO_TYPE_INTEL_VTD and struct iommu_device_info_vtd are defined to report iommu hardware information for Intel VT-d . 2) IOMMU_HWPT_DATA_VTD_S1 is defined for the Intel VT-d stage-1 page table, it will be used in the stage-1 domain allocation and IOTLB syncing path. struct iommu_hwpt_intel_vtd is defined to pass user_data for the Intel VT-d stage-1 domain allocation. struct iommu_hwpt_invalidate_intel_vtd is defined to pass the data for the Intel VT-d stage-1 IOTLB invalidation. With above IOMMUFD extensions, the intel iommu driver implements the three paths to support nested translation. The first Intel platform supporting nested translation is Sapphire Rapids which, unfortunately, has a hardware errata [2] requiring special treatment. This errata happens when a stage-1 page table page (either level) is located in a stage-2 read-only region. In that case the IOMMU hardware may ignore the stage-2 RO permission and still set the A/D bit in stage-1 page table entries during page table walking. A flag IOMMU_HW_INFO_VTD_ERRATA_772415_SPR17 is introduced to report this errata to userspace. With that restriction the user should either disable nested translation to favor RO stage-2 mappings or ensure no RO stage-2 mapping to enable nested translation. Intel-iommu driver is armed with necessary checks to prevent such mix in patch10 of this series. Qemu currently does add RO mappings though. The vfio agent in Qemu simply maps all valid regions in the GPA address space which certainly includes RO regions e.g. vbios. In reality we don't know a usage relying on DMA reads from the BIOS region. Hence finding a way to allow user opt-out RO mappings in Qemu might be an acceptable tradeoff. But how to achieve it cleanly needs more discussion in Qemu community. For now we just hacked Qemu to test. Complete code can be found in [3], QEMU could can be found in [4]. base-commit: ce9b593b1f74ccd090edc5d2ad397da84baa9946 [1] https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.c… [2] https://www.intel.com/content/www/us/en/content-details/772415/content-deta… [3] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting [4] https://github.com/yiliu1765/qemu/tree/wip/iommufd_rfcv4.mig.reset.v4_var3%… Change log: v3: - Further split the patches into an order of adding helpers for nested domain, iotlb flush, nested domain attachment and nested domain allocation callback, then report the hw_info to userspace. - Add batch support in cache invalidation from userspace - Disallow nested translation usage if RO mappings exists in stage-2 domain due to errata on readonly mappings on Sapphire Rapids platform. v2: https://lore.kernel.org/linux-iommu/20230309082207.612346-1-yi.l.liu@intel.… - The iommufd infrastructure is split to be separate series. v1: https://lore.kernel.org/linux-iommu/20230209043153.14964-1-yi.l.liu@intel.c… Regards, Yi Liu Lu Baolu (5): iommu/vt-d: Extend dmar_domain to support nested domain iommu/vt-d: Add helper for nested domain allocation iommu/vt-d: Add helper to setup pasid nested translation iommu/vt-d: Add nested domain allocation iommu/vt-d: Disallow nesting on domains with read-only mappings Yi Liu (5): iommufd: Add data structure for Intel VT-d stage-1 domain allocation iommu/vt-d: Make domain attach helpers to be extern iommu/vt-d: Set the nested domain to a device iommu/vt-d: Add iotlb flush for nested domain iommu/vt-d: Implement hw_info for iommu capability query drivers/iommu/intel/Makefile | 2 +- drivers/iommu/intel/iommu.c | 78 ++++++++++++--- drivers/iommu/intel/iommu.h | 55 +++++++++-- drivers/iommu/intel/nested.c | 181 +++++++++++++++++++++++++++++++++++ drivers/iommu/intel/pasid.c | 151 +++++++++++++++++++++++++++++ drivers/iommu/intel/pasid.h | 2 + drivers/iommu/iommufd/main.c | 6 ++ include/linux/iommu.h | 1 + include/uapi/linux/iommufd.h | 149 ++++++++++++++++++++++++++++ 9 files changed, 603 insertions(+), 22 deletions(-) create mode 100644 drivers/iommu/intel/nested.c -- 2.34.1

2 years

7
39
0 0

[linux-next:master] BUILD REGRESSION 47045630bc409ce6606d97b790895210dd1d517d

by kernel test robot

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master branch HEAD: 47045630bc409ce6606d97b790895210dd1d517d Add linux-next specific files for 20230619 Error/Warning reports: https://lore.kernel.org/oe-kbuild-all/202306122223.HHER4zOo-lkp@intel.com https://lore.kernel.org/oe-kbuild-all/202306141934.UKmM9bFX-lkp@intel.com https://lore.kernel.org/oe-kbuild-all/202306151954.Rsz6HP7h-lkp@intel.com https://lore.kernel.org/oe-kbuild-all/202306191640.NE97q8G3-lkp@intel.com https://lore.kernel.org/oe-kbuild-all/202306200328.NX4JlBDv-lkp@intel.com Error/Warning: (recently discovered and may have been fixed) arch/parisc/kernel/pdt.c:65:6: warning: no previous prototype for 'arch_report_meminfo' [-Wmissing-prototypes] arch/riscv/kvm/aia_imsic.c:237:9: error: call to undeclared function 'arch_atomic_long_fetch_or'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] drivers/char/mem.c:164:25: error: implicit declaration of function 'unxlate_dev_mem_ptr'; did you mean 'xlate_dev_mem_ptr'? [-Werror=implicit-function-declaration] drivers/gpu/drm/i915/display/intel_display_power.h:255:70: error: declaration of 'struct seq_file' will not be visible outside of this function [-Werror,-Wvisibility] drivers/leds/leds-cht-wcove.c:144:21: warning: no previous prototype for 'cht_wc_leds_brightness_get' [-Wmissing-prototypes] lib/kunit/executor_test.c:138:4: warning: cast from 'void (*)(const void *)' to 'kunit_action_t *' (aka 'void (*)(void *)') converts to incompatible function type [-Wcast-function-type-strict] lib/kunit/test.c:775:38: warning: cast from 'void (*)(const void *)' to 'kunit_action_t *' (aka 'void (*)(void *)') converts to incompatible function type [-Wcast-function-type-strict] security/apparmor/policy_unpack.c:1173: warning: expecting prototype for verify_dfa_accept_xindex(). Prototype was for verify_dfa_accept_index() instead Unverified Error/Warning (likely false positive, please contact us if interested): drivers/staging/media/atomisp/pci/atomisp_fops.c:517 atomisp_open() warn: missing unwind goto? drivers/usb/cdns3/cdns3-starfive.c:23: warning: expecting prototype for cdns3(). Prototype was for USB_STRAP_HOST() instead fs/smb/client/cifsfs.c:984 cifs_smb3_do_mount() warn: possible memory leak of 'cifs_sb' fs/smb/client/cifssmb.c:4089 CIFSFindFirst() warn: missing error code? 'rc' fs/smb/client/cifssmb.c:4216 CIFSFindNext() warn: missing error code? 'rc' fs/smb/client/connect.c:2775 cifs_match_super() error: 'tlink' dereferencing possible ERR_PTR() fs/smb/client/connect.c:2974 generic_ip_connect() error: we previously assumed 'socket' could be null (see line 2962) {standard input}: Error: local label `"2" (instance number 9 of a fb label)' is not defined {standard input}:1097: Error: pcrel too far Error/Warning ids grouped by kconfigs: gcc_recent_errors |-- i386-allyesconfig | `-- drivers-leds-leds-cht-wcove.c:warning:no-previous-prototype-for-cht_wc_leds_brightness_get |-- i386-randconfig-m021-20230618 | |-- drivers-staging-media-atomisp-pci-atomisp_fops.c-atomisp_open()-warn:missing-unwind-goto | |-- fs-smb-client-cifsfs.c-cifs_smb3_do_mount()-warn:possible-memory-leak-of-cifs_sb | |-- fs-smb-client-cifssmb.c-CIFSFindFirst()-warn:missing-error-code-rc | |-- fs-smb-client-cifssmb.c-CIFSFindNext()-warn:missing-error-code-rc | |-- fs-smb-client-connect.c-cifs_match_super()-error:tlink-dereferencing-possible-ERR_PTR() | `-- fs-smb-client-connect.c-generic_ip_connect()-error:we-previously-assumed-socket-could-be-null-(see-line-) |-- parisc-allyesconfig | `-- arch-parisc-kernel-pdt.c:warning:no-previous-prototype-for-arch_report_meminfo |-- parisc-defconfig | `-- arch-parisc-kernel-pdt.c:warning:no-previous-prototype-for-arch_report_meminfo |-- parisc-randconfig-r001-20230619 | `-- arch-parisc-kernel-pdt.c:warning:no-previous-prototype-for-arch_report_meminfo |-- parisc-randconfig-s042-20230618 | `-- arch-parisc-kernel-pdt.c:warning:no-previous-prototype-for-arch_report_meminfo |-- parisc64-defconfig | `-- arch-parisc-kernel-pdt.c:warning:no-previous-prototype-for-arch_report_meminfo |-- riscv-allmodconfig | `-- drivers-usb-cdns3-cdns3-starfive.c:warning:expecting-prototype-for-cdns3().-Prototype-was-for-USB_STRAP_HOST()-instead |-- riscv-allyesconfig | `-- drivers-usb-cdns3-cdns3-starfive.c:warning:expecting-prototype-for-cdns3().-Prototype-was-for-USB_STRAP_HOST()-instead |-- riscv-randconfig-s051-20230618 | `-- arch-riscv-kernel-signal.c:sparse:sparse:incorrect-type-in-initializer-(different-address-spaces)-expected-void-__val-got-void-noderef-__user-assigned-datap |-- sh-allmodconfig | |-- drivers-char-mem.c:error:implicit-declaration-of-function-unxlate_dev_mem_ptr | |-- standard-input:Error:local-label-(instance-number-of-a-fb-label)-is-not-defined | `-- standard-input:Error:pcrel-too-far |-- sh-magicpanelr2_defconfig | `-- drivers-char-mem.c:error:implicit-declaration-of-function-unxlate_dev_mem_ptr |-- sh-rsk7269_defconfig | `-- drivers-char-mem.c:error:implicit-declaration-of-function-unxlate_dev_mem_ptr |-- x86_64-allyesconfig | `-- drivers-leds-leds-cht-wcove.c:warning:no-previous-prototype-for-cht_wc_leds_brightness_get `-- x86_64-randconfig-m001-20230618 |-- fs-smb-client-cifsfs.c-cifs_smb3_do_mount()-warn:possible-memory-leak-of-cifs_sb |-- fs-smb-client-cifssmb.c-CIFSFindFirst()-warn:missing-error-code-rc |-- fs-smb-client-cifssmb.c-CIFSFindNext()-warn:missing-error-code-rc |-- fs-smb-client-connect.c-cifs_match_super()-error:tlink-dereferencing-possible-ERR_PTR() `-- fs-smb-client-connect.c-generic_ip_connect()-error:we-previously-assumed-socket-could-be-null-(see-line-) clang_recent_errors |-- arm-randconfig-r002-20230619 | |-- lib-kunit-executor_test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type | |-- lib-kunit-test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type | `-- security-apparmor-policy_unpack.c:warning:expecting-prototype-for-verify_dfa_accept_xindex().-Prototype-was-for-verify_dfa_accept_index()-instead |-- hexagon-randconfig-r011-20230619 | `-- bin-bash:line:Segmentation-fault-LLVM_OBJCOPY-llvm-objcopy-pahole-J-btf_gen_floats-j-lang_exclude-rust-skip_encoding_btf_inconsistent_proto-btf_gen_optimized-btf_base-vmlinux-drivers-iio-adc-max1363.k |-- hexagon-randconfig-r041-20230619 | |-- lib-kunit-executor_test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type | `-- lib-kunit-test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type |-- hexagon-randconfig-r045-20230619 | |-- lib-kunit-executor_test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type | `-- lib-kunit-test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type |-- powerpc-randconfig-r012-20230619 | |-- lib-kunit-executor_test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type | `-- lib-kunit-test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type |-- riscv-randconfig-r042-20230619 | |-- arch-riscv-kvm-aia_imsic.c:error:call-to-undeclared-function-arch_atomic_long_fetch_or-ISO-C99-and-later-do-not-support-implicit-function-declarations | `-- lib-kunit-test.c:warning:cast-from-void-(-)(const-void-)-to-kunit_action_t-(aka-void-(-)(void-)-)-converts-to-incompatible-function-type |-- x86_64-randconfig-a012-20230619 | `-- security-apparmor-policy_unpack.c:warning:expecting-prototype-for-verify_dfa_accept_xindex().-Prototype-was-for-verify_dfa_accept_index()-instead |-- x86_64-randconfig-r015-20230619 | |-- drivers-gpu-drm-i915-display-intel_display_power.h:error:declaration-of-struct-seq_file-will-not-be-visible-outside-of-this-function-Werror-Wvisibility | `-- drivers-net-ethernet-jme.o:warning:objtool:jme_check_link()-falls-through-to-next-function-jme_powersave_phy() `-- x86_64-rhel-8.3-rust `-- security-apparmor-policy_unpack.c:warning:expecting-prototype-for-verify_dfa_accept_xindex().-Prototype-was-for-verify_dfa_accept_index()-instead elapsed time: 1330m configs tested: 126 configs skipped: 4 tested configs: alpha allyesconfig gcc alpha defconfig gcc alpha randconfig-r023-20230619 gcc arc allyesconfig gcc arc axs101_defconfig gcc arc defconfig gcc arc randconfig-r021-20230619 gcc arc randconfig-r043-20230619 gcc arm alldefconfig clang arm allmodconfig gcc arm allyesconfig gcc arm aspeed_g4_defconfig clang arm collie_defconfig clang arm defconfig gcc arm hisi_defconfig gcc arm imx_v6_v7_defconfig gcc arm jornada720_defconfig gcc arm neponset_defconfig clang arm pxa910_defconfig gcc arm randconfig-r002-20230619 clang arm randconfig-r046-20230619 gcc arm vexpress_defconfig clang arm vf610m4_defconfig gcc arm64 allyesconfig gcc arm64 defconfig gcc arm64 randconfig-r031-20230619 gcc csky defconfig gcc csky randconfig-r022-20230619 gcc hexagon randconfig-r011-20230619 clang hexagon randconfig-r041-20230619 clang hexagon randconfig-r045-20230619 clang i386 allyesconfig gcc i386 buildonly-randconfig-r004-20230619 gcc i386 buildonly-randconfig-r005-20230619 gcc i386 buildonly-randconfig-r006-20230619 gcc i386 debian-10.3 gcc i386 defconfig gcc i386 randconfig-i001-20230619 gcc i386 randconfig-i002-20230619 gcc i386 randconfig-i003-20230619 gcc i386 randconfig-i004-20230619 gcc i386 randconfig-i005-20230619 gcc i386 randconfig-i006-20230619 gcc i386 randconfig-i011-20230619 clang i386 randconfig-i012-20230619 clang i386 randconfig-i013-20230619 clang i386 randconfig-i014-20230619 clang i386 randconfig-i015-20230619 clang i386 randconfig-i016-20230619 clang i386 randconfig-r033-20230619 gcc loongarch allmodconfig gcc loongarch allnoconfig gcc loongarch defconfig gcc loongarch randconfig-r006-20230619 gcc m68k allmodconfig gcc m68k allyesconfig gcc m68k defconfig gcc m68k m5208evb_defconfig gcc m68k randconfig-r034-20230619 gcc m68k randconfig-r036-20230619 gcc microblaze randconfig-r024-20230619 gcc mips allmodconfig gcc mips allyesconfig gcc mips mtx1_defconfig clang nios2 defconfig gcc nios2 randconfig-r003-20230619 gcc nios2 randconfig-r004-20230619 gcc nios2 randconfig-r016-20230619 gcc nios2 randconfig-r032-20230619 gcc parisc allyesconfig gcc parisc defconfig gcc parisc randconfig-r001-20230619 gcc parisc64 defconfig gcc powerpc allmodconfig gcc powerpc allnoconfig gcc powerpc gamecube_defconfig clang powerpc randconfig-r012-20230619 clang riscv allmodconfig gcc riscv allnoconfig gcc riscv allyesconfig gcc riscv defconfig gcc riscv nommu_k210_defconfig gcc riscv nommu_virt_defconfig clang riscv randconfig-r042-20230619 clang riscv rv32_defconfig gcc s390 allmodconfig gcc s390 allyesconfig gcc s390 defconfig gcc s390 randconfig-r044-20230619 clang sh allmodconfig gcc sh apsh4ad0a_defconfig gcc sh landisk_defconfig gcc sh magicpanelr2_defconfig gcc sh rsk7269_defconfig gcc sh se7750_defconfig gcc sh shx3_defconfig gcc sparc allyesconfig gcc sparc defconfig gcc sparc64 randconfig-r005-20230619 gcc sparc64 randconfig-r014-20230619 gcc um defconfig gcc um i386_defconfig gcc um randconfig-r013-20230619 gcc um x86_64_defconfig gcc x86_64 allyesconfig gcc x86_64 buildonly-randconfig-r001-20230619 gcc x86_64 buildonly-randconfig-r002-20230619 gcc x86_64 buildonly-randconfig-r003-20230619 gcc x86_64 defconfig gcc x86_64 kexec gcc x86_64 randconfig-a001-20230619 gcc x86_64 randconfig-a002-20230619 gcc x86_64 randconfig-a003-20230619 gcc x86_64 randconfig-a004-20230619 gcc x86_64 randconfig-a005-20230619 gcc x86_64 randconfig-a006-20230619 gcc x86_64 randconfig-a011-20230619 clang x86_64 randconfig-a012-20230619 clang x86_64 randconfig-a013-20230619 clang x86_64 randconfig-a014-20230619 clang x86_64 randconfig-a015-20230619 clang x86_64 randconfig-a016-20230619 clang x86_64 randconfig-r015-20230619 clang x86_64 rhel-8.3-rust clang x86_64 rhel-8.3 gcc xtensa randconfig-r025-20230619 gcc -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki

2 years

1
0
0 0

[PATCH] selftests/input: add test to cover len > maxlen in bits_to_user()

by Dana Elfassy

In order to cover this case, setting 'maxlen = 0', with the following explanation: EVIOCGKEY is executed from evdev_do_ioctl(), which is called from evdev_ioctl_handler(). evdev_ioctl_handler() is called from 2 functions, where by code coverage, only the first one is in use. ‘compat’ is given the value ‘0’ [1]. Thus, the condition [2] is always false. This means ‘len’ always equals a positive number [3] ‘maxlen’ in evdev_handle_get_val [4] is defined locally in evdev_do_ioctl() [5], and is sent in the variable 'size' [6] [1] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L1281 [2] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L705 [3] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L707 [4] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L886 [5] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L1155 [6] https://elixir.bootlin.com/linux/v6.2/source/drivers/input/evdev.c#L1141 Signed-off-by: Dana Elfassy <dangel101(a)gmail.com> --- tools/testing/selftests/input/evioc-test.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/tools/testing/selftests/input/evioc-test.c b/tools/testing/selftests/input/evioc-test.c index ad7b93fe39cf..b94de2ee5596 100644 --- a/tools/testing/selftests/input/evioc-test.c +++ b/tools/testing/selftests/input/evioc-test.c @@ -234,4 +234,23 @@ TEST(eviocsrep_set_repeat_settings) selftest_uinput_destroy(uidev); } +TEST(eviocgkey_get_global_key_state) +{ + struct selftest_uinput *uidev; + int rep_values[2]; + int rc; + + memset(rep_values, 0, sizeof(rep_values)); + + rc = selftest_uinput_create_device(&uidev); + ASSERT_EQ(0, rc); + ASSERT_NE(NULL, uidev); + + /* ioctl to create the scenario where len > maxlen in bits_to_user() */ + rc = ioctl(uidev->evdev_fd, EVIOCGKEY(0), rep_values); + ASSERT_EQ(0, rc); + + selftest_uinput_destroy(uidev); +} + TEST_HARNESS_MAIN -- 2.41.0

2 years

1
0
0 0

[PATCH 5.4 01/64] test_firmware: fix a memory leak with reqs buffer

by Greg Kroah-Hartman

From: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr> [ Upstream commit be37bed754ed90b2655382f93f9724b3c1aae847 ] Dan Carpenter spotted that test_fw_config->reqs will be leaked if trigger_batched_requests_store() is called two or more times. The same appears with trigger_batched_requests_async_store(). This bug wasn't trigger by the tests, but observed by Dan's visual inspection of the code. The recommended workaround was to return -EBUSY if test_fw_config->reqs is already allocated. Fixes: 7feebfa487b92 ("test_firmware: add support for request_firmware_into_buf") Cc: Luis Chamberlain <mcgrof(a)kernel.org> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: Russ Weight <russell.h.weight(a)intel.com> Cc: Tianfei Zhang <tianfei.zhang(a)intel.com> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Colin Ian King <colin.i.king(a)gmail.com> Cc: Randy Dunlap <rdunlap(a)infradead.org> Cc: linux-kselftest(a)vger.kernel.org Cc: stable(a)vger.kernel.org # v5.4 Suggested-by: Dan Carpenter <error27(a)gmail.com> Suggested-by: Takashi Iwai <tiwai(a)suse.de> Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr> Reviewed-by: Dan Carpenter <dan.carpenter(a)linaro.org> Acked-by: Luis Chamberlain <mcgrof(a)kernel.org> Link: https://lore.kernel.org/r/20230509084746.48259-2-mirsad.todorovac@alu.unizg… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- lib/test_firmware.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/lib/test_firmware.c b/lib/test_firmware.c index 3146845f01562..38553944e9675 100644 --- a/lib/test_firmware.c +++ b/lib/test_firmware.c @@ -679,6 +679,11 @@ static ssize_t trigger_batched_requests_store(struct device *dev, mutex_lock(&test_fw_mutex); + if (test_fw_config->reqs) { + rc = -EBUSY; + goto out_bail; + } + test_fw_config->reqs = vzalloc(array3_size(sizeof(struct test_batched_req), test_fw_config->num_requests, 2)); @@ -778,6 +783,11 @@ ssize_t trigger_batched_requests_async_store(struct device *dev, mutex_lock(&test_fw_mutex); + if (test_fw_config->reqs) { + rc = -EBUSY; + goto out_bail; + } + test_fw_config->reqs = vzalloc(array3_size(sizeof(struct test_batched_req), test_fw_config->num_requests, 2)); -- 2.39.2

2 years

1
0
0 0

[PATCH 5.10 05/89] test_firmware: fix a memory leak with reqs buffer

by Greg Kroah-Hartman

From: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr> [ Upstream commit be37bed754ed90b2655382f93f9724b3c1aae847 ] Dan Carpenter spotted that test_fw_config->reqs will be leaked if trigger_batched_requests_store() is called two or more times. The same appears with trigger_batched_requests_async_store(). This bug wasn't trigger by the tests, but observed by Dan's visual inspection of the code. The recommended workaround was to return -EBUSY if test_fw_config->reqs is already allocated. Fixes: 7feebfa487b92 ("test_firmware: add support for request_firmware_into_buf") Cc: Luis Chamberlain <mcgrof(a)kernel.org> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: Russ Weight <russell.h.weight(a)intel.com> Cc: Tianfei Zhang <tianfei.zhang(a)intel.com> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Colin Ian King <colin.i.king(a)gmail.com> Cc: Randy Dunlap <rdunlap(a)infradead.org> Cc: linux-kselftest(a)vger.kernel.org Cc: stable(a)vger.kernel.org # v5.4 Suggested-by: Dan Carpenter <error27(a)gmail.com> Suggested-by: Takashi Iwai <tiwai(a)suse.de> Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr> Reviewed-by: Dan Carpenter <dan.carpenter(a)linaro.org> Acked-by: Luis Chamberlain <mcgrof(a)kernel.org> Link: https://lore.kernel.org/r/20230509084746.48259-2-mirsad.todorovac@alu.unizg… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- lib/test_firmware.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/lib/test_firmware.c b/lib/test_firmware.c index 4884057eb53f0..ed0455a9ded87 100644 --- a/lib/test_firmware.c +++ b/lib/test_firmware.c @@ -863,6 +863,11 @@ static ssize_t trigger_batched_requests_store(struct device *dev, mutex_lock(&test_fw_mutex); + if (test_fw_config->reqs) { + rc = -EBUSY; + goto out_bail; + } + test_fw_config->reqs = vzalloc(array3_size(sizeof(struct test_batched_req), test_fw_config->num_requests, 2)); @@ -962,6 +967,11 @@ ssize_t trigger_batched_requests_async_store(struct device *dev, mutex_lock(&test_fw_mutex); + if (test_fw_config->reqs) { + rc = -EBUSY; + goto out_bail; + } + test_fw_config->reqs = vzalloc(array3_size(sizeof(struct test_batched_req), test_fw_config->num_requests, 2)); -- 2.39.2

2 years

1
0
0 0

[PATCH 5.10 04/89] test_firmware: prevent race conditions by a correct implementation of locking

by Greg Kroah-Hartman

From: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr> [ Upstream commit 4acfe3dfde685a5a9eaec5555351918e2d7266a1 ] Dan Carpenter spotted a race condition in a couple of situations like these in the test_firmware driver: static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg) { u8 val; int ret; ret = kstrtou8(buf, 10, &val); if (ret) return ret; mutex_lock(&test_fw_mutex); *(u8 *)cfg = val; mutex_unlock(&test_fw_mutex); /* Always return full write size even if we didn't consume all */ return size; } static ssize_t config_num_requests_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { int rc; mutex_lock(&test_fw_mutex); if (test_fw_config->reqs) { pr_err("Must call release_all_firmware prior to changing config\n"); rc = -EINVAL; mutex_unlock(&test_fw_mutex); goto out; } mutex_unlock(&test_fw_mutex); rc = test_dev_config_update_u8(buf, count, &test_fw_config->num_requests); out: return rc; } static ssize_t config_read_fw_idx_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { return test_dev_config_update_u8(buf, count, &test_fw_config->read_fw_idx); } The function test_dev_config_update_u8() is called from both the locked and the unlocked context, function config_num_requests_store() and config_read_fw_idx_store() which can both be called asynchronously as they are driver's methods, while test_dev_config_update_u8() and siblings change their argument pointed to by u8 *cfg or similar pointer. To avoid deadlock on test_fw_mutex, the lock is dropped before calling test_dev_config_update_u8() and re-acquired within test_dev_config_update_u8() itself, but alas this creates a race condition. Having two locks wouldn't assure a race-proof mutual exclusion. This situation is best avoided by the introduction of a new, unlocked function __test_dev_config_update_u8() which can be called from the locked context and reducing test_dev_config_update_u8() to: static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg) { int ret; mutex_lock(&test_fw_mutex); ret = __test_dev_config_update_u8(buf, size, cfg); mutex_unlock(&test_fw_mutex); return ret; } doing the locking and calling the unlocked primitive, which enables both locked and unlocked versions without duplication of code. The similar approach was applied to all functions called from the locked and the unlocked context, which safely mitigates both deadlocks and race conditions in the driver. __test_dev_config_update_bool(), __test_dev_config_update_u8() and __test_dev_config_update_size_t() unlocked versions of the functions were introduced to be called from the locked contexts as a workaround without releasing the main driver's lock and thereof causing a race condition. The test_dev_config_update_bool(), test_dev_config_update_u8() and test_dev_config_update_size_t() locked versions of the functions are being called from driver methods without the unnecessary multiplying of the locking and unlocking code for each method, and complicating the code with saving of the return value across lock. Fixes: 7feebfa487b92 ("test_firmware: add support for request_firmware_into_buf") Cc: Luis Chamberlain <mcgrof(a)kernel.org> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: Russ Weight <russell.h.weight(a)intel.com> Cc: Takashi Iwai <tiwai(a)suse.de> Cc: Tianfei Zhang <tianfei.zhang(a)intel.com> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Colin Ian King <colin.i.king(a)gmail.com> Cc: Randy Dunlap <rdunlap(a)infradead.org> Cc: linux-kselftest(a)vger.kernel.org Cc: stable(a)vger.kernel.org # v5.4 Suggested-by: Dan Carpenter <error27(a)gmail.com> Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr> Link: https://lore.kernel.org/r/20230509084746.48259-1-mirsad.todorovac@alu.unizg… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- lib/test_firmware.c | 52 ++++++++++++++++++++++++++++++--------------- 1 file changed, 35 insertions(+), 17 deletions(-) diff --git a/lib/test_firmware.c b/lib/test_firmware.c index b99cf0a50a698..4884057eb53f0 100644 --- a/lib/test_firmware.c +++ b/lib/test_firmware.c @@ -321,16 +321,26 @@ static ssize_t config_test_show_str(char *dst, return len; } -static int test_dev_config_update_bool(const char *buf, size_t size, +static inline int __test_dev_config_update_bool(const char *buf, size_t size, bool *cfg) { int ret; - mutex_lock(&test_fw_mutex); if (kstrtobool(buf, cfg) < 0) ret = -EINVAL; else ret = size; + + return ret; +} + +static int test_dev_config_update_bool(const char *buf, size_t size, + bool *cfg) +{ + int ret; + + mutex_lock(&test_fw_mutex); + ret = __test_dev_config_update_bool(buf, size, cfg); mutex_unlock(&test_fw_mutex); return ret; @@ -341,7 +351,8 @@ static ssize_t test_dev_config_show_bool(char *buf, bool val) return snprintf(buf, PAGE_SIZE, "%d\n", val); } -static int test_dev_config_update_size_t(const char *buf, +static int __test_dev_config_update_size_t( + const char *buf, size_t size, size_t *cfg) { @@ -352,9 +363,7 @@ static int test_dev_config_update_size_t(const char *buf, if (ret) return ret; - mutex_lock(&test_fw_mutex); *(size_t *)cfg = new; - mutex_unlock(&test_fw_mutex); /* Always return full write size even if we didn't consume all */ return size; @@ -370,7 +379,7 @@ static ssize_t test_dev_config_show_int(char *buf, int val) return snprintf(buf, PAGE_SIZE, "%d\n", val); } -static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg) +static int __test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg) { u8 val; int ret; @@ -379,14 +388,23 @@ static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg) if (ret) return ret; - mutex_lock(&test_fw_mutex); *(u8 *)cfg = val; - mutex_unlock(&test_fw_mutex); /* Always return full write size even if we didn't consume all */ return size; } +static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg) +{ + int ret; + + mutex_lock(&test_fw_mutex); + ret = __test_dev_config_update_u8(buf, size, cfg); + mutex_unlock(&test_fw_mutex); + + return ret; +} + static ssize_t test_dev_config_show_u8(char *buf, u8 val) { return snprintf(buf, PAGE_SIZE, "%u\n", val); @@ -413,10 +431,10 @@ static ssize_t config_num_requests_store(struct device *dev, mutex_unlock(&test_fw_mutex); goto out; } - mutex_unlock(&test_fw_mutex); - rc = test_dev_config_update_u8(buf, count, - &test_fw_config->num_requests); + rc = __test_dev_config_update_u8(buf, count, + &test_fw_config->num_requests); + mutex_unlock(&test_fw_mutex); out: return rc; @@ -460,10 +478,10 @@ static ssize_t config_buf_size_store(struct device *dev, mutex_unlock(&test_fw_mutex); goto out; } - mutex_unlock(&test_fw_mutex); - rc = test_dev_config_update_size_t(buf, count, - &test_fw_config->buf_size); + rc = __test_dev_config_update_size_t(buf, count, + &test_fw_config->buf_size); + mutex_unlock(&test_fw_mutex); out: return rc; @@ -490,10 +508,10 @@ static ssize_t config_file_offset_store(struct device *dev, mutex_unlock(&test_fw_mutex); goto out; } - mutex_unlock(&test_fw_mutex); - rc = test_dev_config_update_size_t(buf, count, - &test_fw_config->file_offset); + rc = __test_dev_config_update_size_t(buf, count, + &test_fw_config->file_offset); + mutex_unlock(&test_fw_mutex); out: return rc; -- 2.39.2

2 years

1
0
0 0

[PATCH 5.15 003/107] test_firmware: fix a memory leak with reqs buffer

by Greg Kroah-Hartman

From: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr> [ Upstream commit be37bed754ed90b2655382f93f9724b3c1aae847 ] Dan Carpenter spotted that test_fw_config->reqs will be leaked if trigger_batched_requests_store() is called two or more times. The same appears with trigger_batched_requests_async_store(). This bug wasn't trigger by the tests, but observed by Dan's visual inspection of the code. The recommended workaround was to return -EBUSY if test_fw_config->reqs is already allocated. Fixes: 7feebfa487b92 ("test_firmware: add support for request_firmware_into_buf") Cc: Luis Chamberlain <mcgrof(a)kernel.org> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: Russ Weight <russell.h.weight(a)intel.com> Cc: Tianfei Zhang <tianfei.zhang(a)intel.com> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Colin Ian King <colin.i.king(a)gmail.com> Cc: Randy Dunlap <rdunlap(a)infradead.org> Cc: linux-kselftest(a)vger.kernel.org Cc: stable(a)vger.kernel.org # v5.4 Suggested-by: Dan Carpenter <error27(a)gmail.com> Suggested-by: Takashi Iwai <tiwai(a)suse.de> Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr> Reviewed-by: Dan Carpenter <dan.carpenter(a)linaro.org> Acked-by: Luis Chamberlain <mcgrof(a)kernel.org> Link: https://lore.kernel.org/r/20230509084746.48259-2-mirsad.todorovac@alu.unizg… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- lib/test_firmware.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/lib/test_firmware.c b/lib/test_firmware.c index 4ad01dbe7e729..2a4078946a3fd 100644 --- a/lib/test_firmware.c +++ b/lib/test_firmware.c @@ -863,6 +863,11 @@ static ssize_t trigger_batched_requests_store(struct device *dev, mutex_lock(&test_fw_mutex); + if (test_fw_config->reqs) { + rc = -EBUSY; + goto out_bail; + } + test_fw_config->reqs = vzalloc(array3_size(sizeof(struct test_batched_req), test_fw_config->num_requests, 2)); @@ -962,6 +967,11 @@ ssize_t trigger_batched_requests_async_store(struct device *dev, mutex_lock(&test_fw_mutex); + if (test_fw_config->reqs) { + rc = -EBUSY; + goto out_bail; + } + test_fw_config->reqs = vzalloc(array3_size(sizeof(struct test_batched_req), test_fw_config->num_requests, 2)); -- 2.39.2

2 years

1
0
0 0

[PATCH 5.15 002/107] test_firmware: prevent race conditions by a correct implementation of locking

by Greg Kroah-Hartman

From: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr> [ Upstream commit 4acfe3dfde685a5a9eaec5555351918e2d7266a1 ] Dan Carpenter spotted a race condition in a couple of situations like these in the test_firmware driver: static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg) { u8 val; int ret; ret = kstrtou8(buf, 10, &val); if (ret) return ret; mutex_lock(&test_fw_mutex); *(u8 *)cfg = val; mutex_unlock(&test_fw_mutex); /* Always return full write size even if we didn't consume all */ return size; } static ssize_t config_num_requests_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { int rc; mutex_lock(&test_fw_mutex); if (test_fw_config->reqs) { pr_err("Must call release_all_firmware prior to changing config\n"); rc = -EINVAL; mutex_unlock(&test_fw_mutex); goto out; } mutex_unlock(&test_fw_mutex); rc = test_dev_config_update_u8(buf, count, &test_fw_config->num_requests); out: return rc; } static ssize_t config_read_fw_idx_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { return test_dev_config_update_u8(buf, count, &test_fw_config->read_fw_idx); } The function test_dev_config_update_u8() is called from both the locked and the unlocked context, function config_num_requests_store() and config_read_fw_idx_store() which can both be called asynchronously as they are driver's methods, while test_dev_config_update_u8() and siblings change their argument pointed to by u8 *cfg or similar pointer. To avoid deadlock on test_fw_mutex, the lock is dropped before calling test_dev_config_update_u8() and re-acquired within test_dev_config_update_u8() itself, but alas this creates a race condition. Having two locks wouldn't assure a race-proof mutual exclusion. This situation is best avoided by the introduction of a new, unlocked function __test_dev_config_update_u8() which can be called from the locked context and reducing test_dev_config_update_u8() to: static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg) { int ret; mutex_lock(&test_fw_mutex); ret = __test_dev_config_update_u8(buf, size, cfg); mutex_unlock(&test_fw_mutex); return ret; } doing the locking and calling the unlocked primitive, which enables both locked and unlocked versions without duplication of code. The similar approach was applied to all functions called from the locked and the unlocked context, which safely mitigates both deadlocks and race conditions in the driver. __test_dev_config_update_bool(), __test_dev_config_update_u8() and __test_dev_config_update_size_t() unlocked versions of the functions were introduced to be called from the locked contexts as a workaround without releasing the main driver's lock and thereof causing a race condition. The test_dev_config_update_bool(), test_dev_config_update_u8() and test_dev_config_update_size_t() locked versions of the functions are being called from driver methods without the unnecessary multiplying of the locking and unlocking code for each method, and complicating the code with saving of the return value across lock. Fixes: 7feebfa487b92 ("test_firmware: add support for request_firmware_into_buf") Cc: Luis Chamberlain <mcgrof(a)kernel.org> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: Russ Weight <russell.h.weight(a)intel.com> Cc: Takashi Iwai <tiwai(a)suse.de> Cc: Tianfei Zhang <tianfei.zhang(a)intel.com> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Colin Ian King <colin.i.king(a)gmail.com> Cc: Randy Dunlap <rdunlap(a)infradead.org> Cc: linux-kselftest(a)vger.kernel.org Cc: stable(a)vger.kernel.org # v5.4 Suggested-by: Dan Carpenter <error27(a)gmail.com> Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr> Link: https://lore.kernel.org/r/20230509084746.48259-1-mirsad.todorovac@alu.unizg… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- lib/test_firmware.c | 52 ++++++++++++++++++++++++++++++--------------- 1 file changed, 35 insertions(+), 17 deletions(-) diff --git a/lib/test_firmware.c b/lib/test_firmware.c index 0b4e3de3f1748..4ad01dbe7e729 100644 --- a/lib/test_firmware.c +++ b/lib/test_firmware.c @@ -321,16 +321,26 @@ static ssize_t config_test_show_str(char *dst, return len; } -static int test_dev_config_update_bool(const char *buf, size_t size, +static inline int __test_dev_config_update_bool(const char *buf, size_t size, bool *cfg) { int ret; - mutex_lock(&test_fw_mutex); if (kstrtobool(buf, cfg) < 0) ret = -EINVAL; else ret = size; + + return ret; +} + +static int test_dev_config_update_bool(const char *buf, size_t size, + bool *cfg) +{ + int ret; + + mutex_lock(&test_fw_mutex); + ret = __test_dev_config_update_bool(buf, size, cfg); mutex_unlock(&test_fw_mutex); return ret; @@ -341,7 +351,8 @@ static ssize_t test_dev_config_show_bool(char *buf, bool val) return snprintf(buf, PAGE_SIZE, "%d\n", val); } -static int test_dev_config_update_size_t(const char *buf, +static int __test_dev_config_update_size_t( + const char *buf, size_t size, size_t *cfg) { @@ -352,9 +363,7 @@ static int test_dev_config_update_size_t(const char *buf, if (ret) return ret; - mutex_lock(&test_fw_mutex); *(size_t *)cfg = new; - mutex_unlock(&test_fw_mutex); /* Always return full write size even if we didn't consume all */ return size; @@ -370,7 +379,7 @@ static ssize_t test_dev_config_show_int(char *buf, int val) return snprintf(buf, PAGE_SIZE, "%d\n", val); } -static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg) +static int __test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg) { u8 val; int ret; @@ -379,14 +388,23 @@ static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg) if (ret) return ret; - mutex_lock(&test_fw_mutex); *(u8 *)cfg = val; - mutex_unlock(&test_fw_mutex); /* Always return full write size even if we didn't consume all */ return size; } +static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg) +{ + int ret; + + mutex_lock(&test_fw_mutex); + ret = __test_dev_config_update_u8(buf, size, cfg); + mutex_unlock(&test_fw_mutex); + + return ret; +} + static ssize_t test_dev_config_show_u8(char *buf, u8 val) { return snprintf(buf, PAGE_SIZE, "%u\n", val); @@ -413,10 +431,10 @@ static ssize_t config_num_requests_store(struct device *dev, mutex_unlock(&test_fw_mutex); goto out; } - mutex_unlock(&test_fw_mutex); - rc = test_dev_config_update_u8(buf, count, - &test_fw_config->num_requests); + rc = __test_dev_config_update_u8(buf, count, + &test_fw_config->num_requests); + mutex_unlock(&test_fw_mutex); out: return rc; @@ -460,10 +478,10 @@ static ssize_t config_buf_size_store(struct device *dev, mutex_unlock(&test_fw_mutex); goto out; } - mutex_unlock(&test_fw_mutex); - rc = test_dev_config_update_size_t(buf, count, - &test_fw_config->buf_size); + rc = __test_dev_config_update_size_t(buf, count, + &test_fw_config->buf_size); + mutex_unlock(&test_fw_mutex); out: return rc; @@ -490,10 +508,10 @@ static ssize_t config_file_offset_store(struct device *dev, mutex_unlock(&test_fw_mutex); goto out; } - mutex_unlock(&test_fw_mutex); - rc = test_dev_config_update_size_t(buf, count, - &test_fw_config->file_offset); + rc = __test_dev_config_update_size_t(buf, count, + &test_fw_config->file_offset); + mutex_unlock(&test_fw_mutex); out: return rc; -- 2.39.2

2 years

1
0
0 0

[PATCH 6.1 003/166] test_firmware: prevent race conditions by a correct implementation of locking

by Greg Kroah-Hartman

From: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr> [ Upstream commit 4acfe3dfde685a5a9eaec5555351918e2d7266a1 ] Dan Carpenter spotted a race condition in a couple of situations like these in the test_firmware driver: static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg) { u8 val; int ret; ret = kstrtou8(buf, 10, &val); if (ret) return ret; mutex_lock(&test_fw_mutex); *(u8 *)cfg = val; mutex_unlock(&test_fw_mutex); /* Always return full write size even if we didn't consume all */ return size; } static ssize_t config_num_requests_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { int rc; mutex_lock(&test_fw_mutex); if (test_fw_config->reqs) { pr_err("Must call release_all_firmware prior to changing config\n"); rc = -EINVAL; mutex_unlock(&test_fw_mutex); goto out; } mutex_unlock(&test_fw_mutex); rc = test_dev_config_update_u8(buf, count, &test_fw_config->num_requests); out: return rc; } static ssize_t config_read_fw_idx_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { return test_dev_config_update_u8(buf, count, &test_fw_config->read_fw_idx); } The function test_dev_config_update_u8() is called from both the locked and the unlocked context, function config_num_requests_store() and config_read_fw_idx_store() which can both be called asynchronously as they are driver's methods, while test_dev_config_update_u8() and siblings change their argument pointed to by u8 *cfg or similar pointer. To avoid deadlock on test_fw_mutex, the lock is dropped before calling test_dev_config_update_u8() and re-acquired within test_dev_config_update_u8() itself, but alas this creates a race condition. Having two locks wouldn't assure a race-proof mutual exclusion. This situation is best avoided by the introduction of a new, unlocked function __test_dev_config_update_u8() which can be called from the locked context and reducing test_dev_config_update_u8() to: static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg) { int ret; mutex_lock(&test_fw_mutex); ret = __test_dev_config_update_u8(buf, size, cfg); mutex_unlock(&test_fw_mutex); return ret; } doing the locking and calling the unlocked primitive, which enables both locked and unlocked versions without duplication of code. The similar approach was applied to all functions called from the locked and the unlocked context, which safely mitigates both deadlocks and race conditions in the driver. __test_dev_config_update_bool(), __test_dev_config_update_u8() and __test_dev_config_update_size_t() unlocked versions of the functions were introduced to be called from the locked contexts as a workaround without releasing the main driver's lock and thereof causing a race condition. The test_dev_config_update_bool(), test_dev_config_update_u8() and test_dev_config_update_size_t() locked versions of the functions are being called from driver methods without the unnecessary multiplying of the locking and unlocking code for each method, and complicating the code with saving of the return value across lock. Fixes: 7feebfa487b92 ("test_firmware: add support for request_firmware_into_buf") Cc: Luis Chamberlain <mcgrof(a)kernel.org> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: Russ Weight <russell.h.weight(a)intel.com> Cc: Takashi Iwai <tiwai(a)suse.de> Cc: Tianfei Zhang <tianfei.zhang(a)intel.com> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Colin Ian King <colin.i.king(a)gmail.com> Cc: Randy Dunlap <rdunlap(a)infradead.org> Cc: linux-kselftest(a)vger.kernel.org Cc: stable(a)vger.kernel.org # v5.4 Suggested-by: Dan Carpenter <error27(a)gmail.com> Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr> Link: https://lore.kernel.org/r/20230509084746.48259-1-mirsad.todorovac@alu.unizg… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- lib/test_firmware.c | 52 ++++++++++++++++++++++++++++++--------------- 1 file changed, 35 insertions(+), 17 deletions(-) diff --git a/lib/test_firmware.c b/lib/test_firmware.c index 6ef3e6926da8a..13d3fa6aa972c 100644 --- a/lib/test_firmware.c +++ b/lib/test_firmware.c @@ -360,16 +360,26 @@ static ssize_t config_test_show_str(char *dst, return len; } -static int test_dev_config_update_bool(const char *buf, size_t size, +static inline int __test_dev_config_update_bool(const char *buf, size_t size, bool *cfg) { int ret; - mutex_lock(&test_fw_mutex); if (kstrtobool(buf, cfg) < 0) ret = -EINVAL; else ret = size; + + return ret; +} + +static int test_dev_config_update_bool(const char *buf, size_t size, + bool *cfg) +{ + int ret; + + mutex_lock(&test_fw_mutex); + ret = __test_dev_config_update_bool(buf, size, cfg); mutex_unlock(&test_fw_mutex); return ret; @@ -380,7 +390,8 @@ static ssize_t test_dev_config_show_bool(char *buf, bool val) return snprintf(buf, PAGE_SIZE, "%d\n", val); } -static int test_dev_config_update_size_t(const char *buf, +static int __test_dev_config_update_size_t( + const char *buf, size_t size, size_t *cfg) { @@ -391,9 +402,7 @@ static int test_dev_config_update_size_t(const char *buf, if (ret) return ret; - mutex_lock(&test_fw_mutex); *(size_t *)cfg = new; - mutex_unlock(&test_fw_mutex); /* Always return full write size even if we didn't consume all */ return size; @@ -409,7 +418,7 @@ static ssize_t test_dev_config_show_int(char *buf, int val) return snprintf(buf, PAGE_SIZE, "%d\n", val); } -static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg) +static int __test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg) { u8 val; int ret; @@ -418,14 +427,23 @@ static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg) if (ret) return ret; - mutex_lock(&test_fw_mutex); *(u8 *)cfg = val; - mutex_unlock(&test_fw_mutex); /* Always return full write size even if we didn't consume all */ return size; } +static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg) +{ + int ret; + + mutex_lock(&test_fw_mutex); + ret = __test_dev_config_update_u8(buf, size, cfg); + mutex_unlock(&test_fw_mutex); + + return ret; +} + static ssize_t test_dev_config_show_u8(char *buf, u8 val) { return snprintf(buf, PAGE_SIZE, "%u\n", val); @@ -478,10 +496,10 @@ static ssize_t config_num_requests_store(struct device *dev, mutex_unlock(&test_fw_mutex); goto out; } - mutex_unlock(&test_fw_mutex); - rc = test_dev_config_update_u8(buf, count, - &test_fw_config->num_requests); + rc = __test_dev_config_update_u8(buf, count, + &test_fw_config->num_requests); + mutex_unlock(&test_fw_mutex); out: return rc; @@ -525,10 +543,10 @@ static ssize_t config_buf_size_store(struct device *dev, mutex_unlock(&test_fw_mutex); goto out; } - mutex_unlock(&test_fw_mutex); - rc = test_dev_config_update_size_t(buf, count, - &test_fw_config->buf_size); + rc = __test_dev_config_update_size_t(buf, count, + &test_fw_config->buf_size); + mutex_unlock(&test_fw_mutex); out: return rc; @@ -555,10 +573,10 @@ static ssize_t config_file_offset_store(struct device *dev, mutex_unlock(&test_fw_mutex); goto out; } - mutex_unlock(&test_fw_mutex); - rc = test_dev_config_update_size_t(buf, count, - &test_fw_config->file_offset); + rc = __test_dev_config_update_size_t(buf, count, + &test_fw_config->file_offset); + mutex_unlock(&test_fw_mutex); out: return rc; -- 2.39.2

2 years

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-kselftest-mirror