This commit removed an extra check for zero-length ranges, and folded it
into the common validate_range() helper used by all UFFD ioctls.
It failed to notice though that UFFDIO_COPY *only* called validate_range
on the dst range, not the src range. So removing this check actually let
us proceed with zero-length source ranges, eventually hitting a BUG
further down in the call stack.
The correct fix seems clear: call validate_range() on the src range too.
Other ioctls are not affected by this, as they only have one range, not
two (src + dst).
Reported-by: syzbot+42309678e0bc7b32f8e9(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=42309678e0bc7b32f8e9
Signed-off-by: Axel Rasmussen <axelrasmussen(a)google.com>
---
fs/userfaultfd.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 53a7220c4679..36d233759233 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1759,6 +1759,9 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
sizeof(uffdio_copy)-sizeof(__s64)))
goto out;
+ ret = validate_range(ctx->mm, uffdio_copy.src, uffdio_copy.len);
+ if (ret)
+ goto out;
ret = validate_range(ctx->mm, uffdio_copy.dst, uffdio_copy.len);
if (ret)
goto out;
--
2.41.0.255.g8b1d071c50-goog
The arm64 Guarded Control Stack (GCS) feature provides support for
hardware protected stacks of return addresses, intended to provide
hardening against return oriented programming (ROP) attacks and to make
it easier to gather call stacks for applications such as profiling.
When GCS is active a secondary stack called the Guarded Control Stack is
maintained, protected with a memory attribute which means that it can
only be written with specific GCS operations. When a BL is executed the
value stored in LR is also pushed onto the GCS, and when a RET is
executed the top of the GCS is popped and compared to LR with a fault
being raised if the values do not match. GCS operations may only be
performed on GCS pages, a data abort is generated if they are not.
This series implements support for use of GCS by userspace, along with
support for use of GCS within KVM guests. It does not enable use of GCS
by either EL1 or EL2. Executables are started without GCS and must use
a prctl() to enable it, it is expected that this will be done very early
in application execution by the dynamic linker or other startup code.
x86 has an equivalent feature called shadow stacks, this series depends
on the x86 patches for generic memory management support for the new
guarded/shadow stack page type and shares APIs as much as possible. As
there has been extensive discussion with the wider community around the
ABI for shadow stacks I have as far as practical kept implementation
decisions close to those for x86, anticipating that review would lead to
similar conclusions in the absence of strong reasoning for divergence.
The main divergence I am concious of is that x86 allows shadow stack to
be enabled and disabled repeatedly, freeing the shadow stack for the
thread whenever disabled, while this implementation keeps the GCS
allocated after disable but refuses to reenable it. This is to avoid
races with things actively walking the GCS during a disable, we do
anticipate that some systems will wish to disable GCS at runtime but are
not aware of any demand for subsequently reenabling it.
x86 uses an arch_prctl() to manage enable and disable, since only x86
and S/390 use arch_prctl() a generic prctl() was proposed[1] as part of a
patch set for the equivalent RISC-V zisslpcfi feature which I initially
adopted fairly directly but following review feedback has been reviewed
quite a bit.
There is an open issue with support for CRIU, on x86 this required the
ability to set the GCS mode via ptrace. This series supports
configuring mode bits other than enable/disable via ptrace but it needs
to be confirmed if this is sufficient.
There's a few bits where I'm not convinced with where I've placed
things, in particular the GCS write operation is in the GCS header not
in uaccess.h, I wasn't sure what was clearest there and am probably too
close to the code to have a clear opinion. The reporting of GCS in
/proc/PID/smaps is also a bit awkward.
The series depends on the x86 shadow stack support:
https://lore.kernel.org/lkml/20230227222957.24501-1-rick.p.edgecombe@intel.…
I've rebased this onto v6.5-rc3 but not included it in the series in
order to avoid confusion with Rick's work and cut down the size of the
series, you can see the branch at:
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc.git arm64-gcs
[1] https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v3:
- Rebase onto v6.5-rc4.
- Add a GCS barrier on context switch.
- Add a GCS stress test.
- Link to v2: https://lore.kernel.org/r/20230724-arm64-gcs-v2-0-dc2c1d44c2eb@kernel.org
Changes in v2:
- Rebase onto v6.5-rc3.
- Rework prctl() interface to allow each bit to be locked independently.
- map_shadow_stack() now places the cap token based on the size
requested by the caller not the actual space allocated.
- Mode changes other than enable via ptrace are now supported.
- Expand test coverage.
- Various smaller fixes and adjustments.
- Link to v1: https://lore.kernel.org/r/20230716-arm64-gcs-v1-0-bf567f93bba6@kernel.org
---
Mark Brown (36):
prctl: arch-agnostic prctl for shadow stack
arm64: Document boot requirements for Guarded Control Stacks
arm64/gcs: Document the ABI for Guarded Control Stacks
arm64/sysreg: Add new system registers for GCS
arm64/sysreg: Add definitions for architected GCS caps
arm64/gcs: Add manual encodings of GCS instructions
arm64/gcs: Provide copy_to_user_gcs()
arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS)
arm64/mm: Allocate PIE slots for EL0 guarded control stack
mm: Define VM_SHADOW_STACK for arm64 when we support GCS
arm64/mm: Map pages for guarded control stack
KVM: arm64: Manage GCS registers for guests
arm64/gcs: Allow GCS usage at EL0 and EL1
arm64/idreg: Add overrride for GCS
arm64/hwcap: Add hwcap for GCS
arm64/traps: Handle GCS exceptions
arm64/mm: Handle GCS data aborts
arm64/gcs: Context switch GCS state for EL0
arm64/gcs: Allocate a new GCS for threads with GCS enabled
arm64/gcs: Implement shadow stack prctl() interface
arm64/mm: Implement map_shadow_stack()
arm64/signal: Set up and restore the GCS context for signal handlers
arm64/signal: Expose GCS state in signal frames
arm64/ptrace: Expose GCS via ptrace and core files
arm64: Add Kconfig for Guarded Control Stack (GCS)
kselftest/arm64: Verify the GCS hwcap
kselftest/arm64: Add GCS as a detected feature in the signal tests
kselftest/arm64: Add framework support for GCS to signal handling tests
kselftest/arm64: Allow signals tests to specify an expected si_code
kselftest/arm64: Always run signals tests with GCS enabled
kselftest/arm64: Add very basic GCS test program
kselftest/arm64: Add a GCS test program built with the system libc
kselftest/arm64: Add test coverage for GCS mode locking
selftests/arm64: Add GCS signal tests
kselftest/arm64: Add a GCS stress test
kselftest/arm64: Enable GCS for the FP stress tests
Documentation/admin-guide/kernel-parameters.txt | 3 +
Documentation/arch/arm64/booting.rst | 22 +
Documentation/arch/arm64/elf_hwcaps.rst | 3 +
Documentation/arch/arm64/gcs.rst | 225 +++++++++
Documentation/arch/arm64/index.rst | 1 +
Documentation/filesystems/proc.rst | 2 +-
arch/arm64/Kconfig | 19 +
arch/arm64/include/asm/cpufeature.h | 6 +
arch/arm64/include/asm/el2_setup.h | 17 +
arch/arm64/include/asm/esr.h | 28 +-
arch/arm64/include/asm/exception.h | 2 +
arch/arm64/include/asm/gcs.h | 106 ++++
arch/arm64/include/asm/hwcap.h | 1 +
arch/arm64/include/asm/kvm_arm.h | 4 +-
arch/arm64/include/asm/kvm_host.h | 12 +
arch/arm64/include/asm/pgtable-prot.h | 14 +-
arch/arm64/include/asm/processor.h | 7 +
arch/arm64/include/asm/sysreg.h | 20 +
arch/arm64/include/asm/uaccess.h | 42 ++
arch/arm64/include/uapi/asm/hwcap.h | 1 +
arch/arm64/include/uapi/asm/ptrace.h | 8 +
arch/arm64/include/uapi/asm/sigcontext.h | 9 +
arch/arm64/kernel/cpufeature.c | 19 +
arch/arm64/kernel/cpuinfo.c | 1 +
arch/arm64/kernel/entry-common.c | 23 +
arch/arm64/kernel/idreg-override.c | 2 +
arch/arm64/kernel/process.c | 85 ++++
arch/arm64/kernel/ptrace.c | 59 +++
arch/arm64/kernel/signal.c | 237 ++++++++-
arch/arm64/kernel/traps.c | 11 +
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 17 +
arch/arm64/kvm/sys_regs.c | 22 +
arch/arm64/mm/Makefile | 1 +
arch/arm64/mm/fault.c | 78 ++-
arch/arm64/mm/gcs.c | 226 +++++++++
arch/arm64/mm/mmap.c | 17 +-
arch/arm64/tools/cpucaps | 1 +
arch/arm64/tools/sysreg | 55 +++
fs/proc/task_mmu.c | 3 +
include/linux/mm.h | 16 +-
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 5 +-
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 22 +
kernel/sys.c | 30 ++
kernel/sys_ni.c | 1 +
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/abi/hwcap.c | 19 +
tools/testing/selftests/arm64/fp/assembler.h | 15 +
tools/testing/selftests/arm64/fp/fpsimd-test.S | 2 +
tools/testing/selftests/arm64/fp/sve-test.S | 2 +
tools/testing/selftests/arm64/fp/za-test.S | 2 +
tools/testing/selftests/arm64/fp/zt-test.S | 2 +
tools/testing/selftests/arm64/gcs/.gitignore | 5 +
tools/testing/selftests/arm64/gcs/Makefile | 23 +
tools/testing/selftests/arm64/gcs/asm-offsets.h | 0
tools/testing/selftests/arm64/gcs/basic-gcs.c | 351 ++++++++++++++
tools/testing/selftests/arm64/gcs/gcs-locking.c | 200 ++++++++
.../selftests/arm64/gcs/gcs-stress-thread.S | 311 ++++++++++++
tools/testing/selftests/arm64/gcs/gcs-stress.c | 532 +++++++++++++++++++++
tools/testing/selftests/arm64/gcs/gcs-util.h | 87 ++++
tools/testing/selftests/arm64/gcs/libc-gcs.c | 372 ++++++++++++++
tools/testing/selftests/arm64/signal/.gitignore | 1 +
.../testing/selftests/arm64/signal/test_signals.c | 17 +-
.../testing/selftests/arm64/signal/test_signals.h | 6 +
.../selftests/arm64/signal/test_signals_utils.c | 32 +-
.../selftests/arm64/signal/test_signals_utils.h | 39 ++
.../arm64/signal/testcases/gcs_exception_fault.c | 59 +++
.../selftests/arm64/signal/testcases/gcs_frame.c | 78 +++
.../arm64/signal/testcases/gcs_write_fault.c | 67 +++
.../selftests/arm64/signal/testcases/testcases.c | 7 +
.../selftests/arm64/signal/testcases/testcases.h | 1 +
72 files changed, 3683 insertions(+), 34 deletions(-)
---
base-commit: 730a197c555893dfad0deebcace710d5c7425ba5
change-id: 20230303-arm64-gcs-e311ab0d8729
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Make sv48 the default address space for mmap as some applications
currently depend on this assumption. Users can now select a
desired address space using a non-zero hint address to mmap. Previously,
requesting the default address space from mmap by passing zero as the hint
address would result in using the largest address space possible. Some
applications depend on empty bits in the virtual address space, like Go and
Java, so this patch provides more flexibility for application developers.
-Charlie
---
v9:
- Raise the mmap_end default to STACK_TOP_MAX to allow the address space to grow
beyond the default of sv48 on sv57 machines as suggested by Alexandre
- Some of the mmap macros had unnecessary conditionals that I have removed
v8:
- Fix RV32 and the RV32 compat mode of RV64 (suggested by Conor)
- Extract out addr and base from the mmap macros (suggested by Alexandre)
v7:
- Changing RLIMIT_STACK inside of an executing program does not trigger
arch_pick_mmap_layout(), so rewrite tests to change RLIMIT_STACK from a
script before executing tests. RLIMIT_STACK of infinity forces bottomup
mmap allocation.
- Make arch_get_mmap_base macro more readible by extracting out the rnd
calculation.
- Use MMAP_MIN_VA_BITS in TASK_UNMAPPED_BASE to support case when mmap
attempts to allocate address smaller than DEFAULT_MAP_WINDOW.
- Fix incorrect wording in documentation.
v6:
- Rebase onto the correct base
v5:
- Minor wording change in documentation
- Change some parenthesis in arch_get_mmap_ macros
- Added case for addr==0 in arch_get_mmap_ because without this, programs would
crash if RLIMIT_STACK was modified before executing the program. This was
tested using the libhugetlbfs tests.
v4:
- Split testcases/document patch into test cases, in-code documentation, and
formal documentation patches
- Modified the mmap_base macro to be more legible and better represent memory
layout
- Fixed documentation to better reflect the implmentation
- Renamed DEFAULT_VA_BITS to MMAP_VA_BITS
- Added additional test case for rlimit changes
---
Charlie Jenkins (4):
RISC-V: mm: Restrict address space for sv39,sv48,sv57
RISC-V: mm: Add tests for RISC-V mm
RISC-V: mm: Update pgtable comment documentation
RISC-V: mm: Document mmap changes
Documentation/riscv/vm-layout.rst | 22 +++++++
arch/riscv/include/asm/elf.h | 2 +-
arch/riscv/include/asm/pgtable.h | 29 +++++++--
arch/riscv/include/asm/processor.h | 52 +++++++++++++--
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/mm/.gitignore | 2 +
tools/testing/selftests/riscv/mm/Makefile | 15 +++++
.../riscv/mm/testcases/mmap_bottomup.c | 35 ++++++++++
.../riscv/mm/testcases/mmap_default.c | 35 ++++++++++
.../selftests/riscv/mm/testcases/mmap_test.h | 64 +++++++++++++++++++
.../selftests/riscv/mm/testcases/run_mmap.sh | 12 ++++
11 files changed, 258 insertions(+), 12 deletions(-)
create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
create mode 100644 tools/testing/selftests/riscv/mm/Makefile
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_default.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_test.h
create mode 100755 tools/testing/selftests/riscv/mm/testcases/run_mmap.sh
--
2.34.1
Replace the original fixed-size log buffer with a dynamically-
extending log.
Patch 1 provides the basic implementation. The following patches
add test cases, support for logging long strings, and an optimization
to the string formatting that is now more thoroughly testable.
Richard Fitzgerald (6):
kunit: Replace fixed-size log with dynamically-extending buffer
kunit: kunit-test: Add test cases for extending log buffer
kunit: Handle logging of lines longer than the fragment buffer size
kunit: kunit-test: Add test cases for logging very long lines
kunit: kunit-test: Add test of logging only a newline
kunit: Don't waste first attempt to format string in
kunit_log_append()
include/kunit/test.h | 25 +++-
lib/kunit/debugfs.c | 65 +++++++--
lib/kunit/kunit-test.c | 321 ++++++++++++++++++++++++++++++++++++++++-
lib/kunit/test.c | 127 +++++++++++++---
4 files changed, 489 insertions(+), 49 deletions(-)
--
2.30.2
KVM_GET_REG_LIST will dump all register IDs that are available to
KVM_GET/SET_ONE_REG and It's very useful to identify some platform
regression issue during VM migration.
Patch 1-7 re-structured the get-reg-list test in aarch64 to make some
of the code as common test framework that can be shared by riscv.
Patch 8 move reject_set check logic to a function so as to check for
different errno for different registers.
Patch 9 move finalize_vcpu back to run_test so that riscv can implement
its specific operation.
Patch 10 change to do the get/set operation only on present-blessed list.
Patch 11 add the skip_set facilities so that riscv can skip set operation
on some registers.
Patch 12 enabled the KVM_GET_REG_LIST API in riscv.
patch 13 added the corresponding kselftest for checking possible
register regressions.
The get-reg-list kvm selftest was ported from aarch64 and tested with
Linux v6.5-rc3 on a Qemu riscv64 virt machine.
---
Changed since v5:
* Rebase to v6.5-rc3
* Minor fix for Andrew's comments
Andrew Jones (7):
KVM: arm64: selftests: Replace str_with_index with strdup_printf
KVM: arm64: selftests: Drop SVE cap check in print_reg
KVM: arm64: selftests: Remove print_reg's dependency on vcpu_config
KVM: arm64: selftests: Rename vcpu_config and add to kvm_util.h
KVM: arm64: selftests: Delete core_reg_fixup
KVM: arm64: selftests: Split get-reg-list test code
KVM: arm64: selftests: Finish generalizing get-reg-list
Haibo Xu (6):
KVM: arm64: selftests: Move reject_set check logic to a function
KVM: arm64: selftests: Move finalize_vcpu back to run_test
KVM: selftests: Only do get/set tests on present blessed list
KVM: selftests: Add skip_set facility to get_reg_list test
KVM: riscv: Add KVM_GET_REG_LIST API support
KVM: riscv: selftests: Add get-reg-list test
Documentation/virt/kvm/api.rst | 2 +-
arch/riscv/kvm/vcpu.c | 375 +++++++++
tools/testing/selftests/kvm/Makefile | 13 +-
.../selftests/kvm/aarch64/get-reg-list.c | 554 ++-----------
tools/testing/selftests/kvm/get-reg-list.c | 401 +++++++++
.../selftests/kvm/include/kvm_util_base.h | 21 +
.../selftests/kvm/include/riscv/processor.h | 3 +
.../testing/selftests/kvm/include/test_util.h | 2 +
tools/testing/selftests/kvm/lib/test_util.c | 15 +
.../selftests/kvm/riscv/get-reg-list.c | 780 ++++++++++++++++++
10 files changed, 1670 insertions(+), 496 deletions(-)
create mode 100644 tools/testing/selftests/kvm/get-reg-list.c
create mode 100644 tools/testing/selftests/kvm/riscv/get-reg-list.c
--
2.34.1
Hello,
This patch series adds a new x86 arch specific BPF helper, bpf_rdtsc()
which can be used for reading the hardware time stamp counter (TSC.)
Currently the same counter is directly accessible from userspace
(using RDTSC instruction), and kernel space using various rdtsc_*()
APIs, however eBPF lacks the support.
The main usage for the TSC counter is for various profiling and timing
purposes, getting accurate cycle counter values. The counter can be
currently read from BPF programs by using the existing perf subsystem
services (bpf_perf_event_read()), however its usage is cumbersome at
best. Additionally, the perf subsystem provides relative value only
for the counter, but absolute values are desired by some use cases
like Wult [1]. The absolute value of TSC can be read with BPF programs
currently via some kprobe / bpf_core_read() magic (see [2], [3], [4] for
example), but this relies on accessing kernel internals and is not
stable API, and is pretty cumbersome. Thus, this patch proposes a new
arch x86 specific BPF helper to avoid the above issues.
-Tero
[1] https://github.com/intel/wult
[2] https://github.com/intel/wult/blob/c92237c95b898498faf41e6644983102d1fe5156…
[3] https://github.com/intel/wult/blob/c92237c95b898498faf41e6644983102d1fe5156…
[4] https://github.com/intel/wult/blob/c92237c95b898498faf41e6644983102d1fe5156…
From: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr>
commit 4acfe3dfde685a5a9eaec5555351918e2d7266a1 upstream.
Dan Carpenter spotted a race condition in a couple of situations like
these in the test_firmware driver:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
u8 val;
int ret;
ret = kstrtou8(buf, 10, &val);
if (ret)
return ret;
mutex_lock(&test_fw_mutex);
*(u8 *)cfg = val;
mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
static ssize_t config_num_requests_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
int rc;
mutex_lock(&test_fw_mutex);
if (test_fw_config->reqs) {
pr_err("Must call release_all_firmware prior to changing config\n");
rc = -EINVAL;
mutex_unlock(&test_fw_mutex);
goto out;
}
mutex_unlock(&test_fw_mutex);
rc = test_dev_config_update_u8(buf, count,
&test_fw_config->num_requests);
out:
return rc;
}
static ssize_t config_read_fw_idx_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
return test_dev_config_update_u8(buf, count,
&test_fw_config->read_fw_idx);
}
The function test_dev_config_update_u8() is called from both the locked
and the unlocked context, function config_num_requests_store() and
config_read_fw_idx_store() which can both be called asynchronously as
they are driver's methods, while test_dev_config_update_u8() and siblings
change their argument pointed to by u8 *cfg or similar pointer.
To avoid deadlock on test_fw_mutex, the lock is dropped before calling
test_dev_config_update_u8() and re-acquired within test_dev_config_update_u8()
itself, but alas this creates a race condition.
Having two locks wouldn't assure a race-proof mutual exclusion.
This situation is best avoided by the introduction of a new, unlocked
function __test_dev_config_update_u8() which can be called from the locked
context and reducing test_dev_config_update_u8() to:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
mutex_lock(&test_fw_mutex);
ret = __test_dev_config_update_u8(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
}
doing the locking and calling the unlocked primitive, which enables both
locked and unlocked versions without duplication of code.
The similar approach was applied to all functions called from the locked
and the unlocked context, which safely mitigates both deadlocks and race
conditions in the driver.
__test_dev_config_update_bool(), __test_dev_config_update_u8() and
__test_dev_config_update_size_t() unlocked versions of the functions
were introduced to be called from the locked contexts as a workaround
without releasing the main driver's lock and thereof causing a race
condition.
The test_dev_config_update_bool(), test_dev_config_update_u8() and
test_dev_config_update_size_t() locked versions of the functions
are being called from driver methods without the unnecessary multiplying
of the locking and unlocking code for each method, and complicating
the code with saving of the return value across lock.
Fixes: 7feebfa487b92 ("test_firmware: add support for request_firmware_into_buf")
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Russ Weight <russell.h.weight(a)intel.com>
Cc: Takashi Iwai <tiwai(a)suse.de>
Cc: Tianfei Zhang <tianfei.zhang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Colin Ian King <colin.i.king(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v5.4
Suggested-by: Dan Carpenter <error27(a)gmail.com>
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr>
Link: https://lore.kernel.org/r/20230509084746.48259-1-mirsad.todorovac@alu.unizg…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
lib/test_firmware.c | 37 ++++++++++++++++++++++++++++---------
1 file changed, 28 insertions(+), 9 deletions(-)
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -301,16 +301,26 @@ static ssize_t config_test_show_str(char
return len;
}
-static int test_dev_config_update_bool(const char *buf, size_t size,
- bool *cfg)
+static inline int __test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
{
int ret;
- mutex_lock(&test_fw_mutex);
if (strtobool(buf, cfg) < 0)
ret = -EINVAL;
else
ret = size;
+
+ return ret;
+}
+
+static int test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_bool(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
@@ -340,7 +350,7 @@ static ssize_t test_dev_config_show_int(
return snprintf(buf, PAGE_SIZE, "%d\n", val);
}
-static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+static inline int __test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
long new;
@@ -352,14 +362,23 @@ static int test_dev_config_update_u8(con
if (new > U8_MAX)
return -EINVAL;
- mutex_lock(&test_fw_mutex);
*(u8 *)cfg = new;
- mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
+static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_u8(buf, size, cfg);
+ mutex_unlock(&test_fw_mutex);
+
+ return ret;
+}
+
static ssize_t test_dev_config_show_u8(char *buf, u8 cfg)
{
u8 val;
@@ -392,10 +411,10 @@ static ssize_t config_num_requests_store
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_u8(buf, count,
- &test_fw_config->num_requests);
+ rc = __test_dev_config_update_u8(buf, count,
+ &test_fw_config->num_requests);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
This extension allows to use F_UNLCK on query, which currently returns
EINVAL. Instead it can be used to query the locks on a particular fd -
something that is not currently possible. The basic idea is that on
F_OFD_GETLK, F_UNLCK would "conflict" with (or query) any types of the
lock on the same fd, and ignore any locks on other fds.
Use-cases:
1. CRIU-alike scenario when you want to read the locking info from an
fd for the later reconstruction. This can now be done by setting
l_start and l_len to 0 to cover entire file range, and do F_OFD_GETLK.
In the loop you need to advance l_start past the returned lock ranges,
to eventually collect all locked ranges.
2. Implementing the lock checking/enforcing policy.
Say you want to implement an "auditor" module in your program,
that checks that the I/O is done only after the proper locking is
applied on a file region. In this case you need to know if the
particular region is locked on that fd, and if so - with what type
of the lock. If you would do that currently (without this extension)
then you can only check for the write locks, and for that you need to
probe the lock on your fd and then open the same file via another fd and
probe there. That way you can identify the write lock on a particular
fd, but such trick is non-atomic and complex. As for finding out the
read lock on a particular fd - impossible.
This extension allows to do such queries without any extra efforts.
3. Implementing the mandatory locking policy.
Suppose you want to make a policy where the write lock inhibits any
unlocked readers and writers. Currently you need to check if the
write lock is present on some other fd, and if it is not there - allow
the I/O operation. But because the write lock can appear at any moment,
you need to do that under some global lock, which can be released only
when the I/O operation is finished.
With the proposed extension you can instead just check the write lock
on your own fd first, and if it is there - allow the I/O operation on
that fd without using any global lock. Only if there is no write lock
on this fd, then you need to take global lock and check for a write
lock on other fds.
The second patch adds a test-case for OFD locks.
It tests both the generic things and the proposed extension.
The third patch is a proposed man page update for fcntl(2)
(not for the linux source tree)
Changes in v2:
- Dropped the l_pid extension patch and updated test-case accordingly.
Stas Sergeev (2):
fs/locks: F_UNLCK extension for F_OFD_GETLK
selftests: add OFD lock tests
fs/locks.c | 23 +++-
tools/testing/selftests/locking/Makefile | 2 +
tools/testing/selftests/locking/ofdlocks.c | 132 +++++++++++++++++++++
3 files changed, 154 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/locking/ofdlocks.c
CC: Jeff Layton <jlayton(a)kernel.org>
CC: Chuck Lever <chuck.lever(a)oracle.com>
CC: Alexander Viro <viro(a)zeniv.linux.org.uk>
CC: Christian Brauner <brauner(a)kernel.org>
CC: linux-fsdevel(a)vger.kernel.org
CC: linux-kernel(a)vger.kernel.org
CC: Shuah Khan <shuah(a)kernel.org>
CC: linux-kselftest(a)vger.kernel.org
CC: linux-api(a)vger.kernel.org
--
2.39.2
*Changes in v27:*
- Handle review comments and minor improvements
- Add performance improvement patch on top with test for easy review
*Changes in v26:*
- Code re-structurring and API changes in PAGEMAP_IOCTL
*Changes in v25*:
- Do proper filtering on hole as well (hole got missed earlier)
*Changes in v24*:
- Rebase on top of next-20230710
- Place WP markers in case of hole as well
*Changes in v23*:
- Set vec_buf_index in loop only when vec_buf_index is set
- Return -EFAULT instead of -EINVAL if vec is NULL
- Correctly return the walk ending address to the page granularity
*Changes in v22*:
- Interface change:
- Replace [start start + len) with [start, end)
- Return the ending address of the address walk in start
*Changes in v21*:
- Abort walk instead of returning error if WP is to be performed on
partial hugetlb
*Changes in v20*
- Correct PAGE_IS_FILE and add PAGE_IS_PFNZERO
*Changes in v19*
- Minor changes and interface updates
*Changes in v18*
- Rebase on top of next-20230613
- Minor updates
*Changes in v17*
- Rebase on top of next-20230606
- Minor improvements in PAGEMAP_SCAN IOCTL patch
*Changes in v16*
- Fix a corner case
- Add exclusive PM_SCAN_OP_WP back
*Changes in v15*
- Build fix (Add missed build fix in RESEND)
*Changes in v14*
- Fix build error caused by #ifdef added at last minute in some configs
*Changes in v13*
- Rebase on top of next-20230414
- Give-up on using uffd_wp_range() and write new helpers, flush tlb only
once
*Changes in v12*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebaase on top of next-20230406
- Review updates
*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
UFFDIO_WRITEPROTECT
*Changes in v10*
- Add specific condition to return error if hugetlb is used with wp
async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
*Motivation*
The real motivation for adding PAGEMAP_SCAN IOCTL is to emulate Windows
GetWriteWatch() and ResetWriteWatch() syscalls [1]. The GetWriteWatch()
retrieves the addresses of the pages that are written to in a region of
virtual memory.
This syscall is used in Windows applications and games etc. This syscall is
being emulated in pretty slow manner in userspace. Our purpose is to
enhance the kernel such that we translate it efficiently in a better way.
Currently some out of tree hack patches are being used to efficiently
emulate it in some kernels. We intend to replace those with these patches.
So the whole gaming on Linux can effectively get benefit from this. It
means there would be tons of users of this code.
CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.
Andrei's defines the following uses of this code:
* it is more granular and allows us to track changed pages more
effectively. The current interface can clear dirty bits for the entire
process only. In addition, reading info about pages is a separate
operation. It means we must freeze the process to read information
about all its pages, reset dirty bits, only then we can start dumping
pages. The information about pages becomes more and more outdated,
while we are processing pages. The new interface solves both these
downsides. First, it allows us to read pte bits and clear the
soft-dirty bit atomically. It means that CRIU will not need to freeze
processes to pre-dump their memory. Second, it clears soft-dirty bits
for a specified region of memory. It means CRIU will have actual info
about pages to the moment of dumping them.
* The new interface has to be much faster because basic page filtering
is happening in the kernel. With the old interface, we have to read
pagemap for each page.
*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we feel like kernel's soft-dirty
feature can be used under the hood with some additions like:
* reset soft-dirty flag for only a specific region of memory instead of
clearing the flag for the entire process
* get and clear soft-dirty flag for a specific region atomically
So we decided to use ioctl on pagemap file to read or/and reset soft-dirty
flag. But using soft-dirty flag, sometimes we get extra pages which weren't
even written. They had become soft-dirty because of VMA merging and
VM_SOFTDIRTY flag. This breaks the definition of GetWriteWatch(). We were
able to by-pass this short coming by ignoring VM_SOFTDIRTY until David
reported that mprotect etc messes up the soft-dirty flag while ignoring
VM_SOFTDIRTY [5]. This wasn't happening until [6] got introduced. We
discussed if we can revert these patches. But we could not reach to any
conclusion. So at this point, I made couple of tries to solve this whole
VM_SOFTDIRTY issue by correcting the soft-dirty implementation:
* [7] Correct the bug fixed wrongly back in 2014. It had potential to cause
regression. We left it behind.
* [8] Keep a list of soft-dirty part of a VMA across splits and merges. I
got the reply don't increase the size of the VMA by 8 bytes.
At this point, we left soft-dirty considering it is too much delicate and
userfaultfd [9] seemed like the only way forward. From there onward, we
have been basing soft-dirty emulation on userfaultfd wp feature where
kernel resolves the faults itself when WP_ASYNC feature is used. It was
straight forward to add WP_ASYNC feature in userfautlfd. Now we get only
those pages dirty or written-to which are really written in reality. (PS
There is another WP_UNPOPULATED userfautfd feature is required which is
needed to avoid pre-faulting memory before write-protecting [9].)
All the different masks were added on the request of CRIU devs to create
interface more generic and better.
[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com
* Original Cover letter from v8*
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As
kernel already has soft-dirty feature inside which we have given up to
use, we are using written-to terminology while using UFFD async WP under
the hood.
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
Some benchmarks can be seen here[1]. This series adds features that weren't
present earlier:
- There is no atomic get soft-dirty/Written-to status and clear present in
the kernel.
- The pages which have been written-to can not be found in accurate way.
(Kernel's soft-dirty PTE bit + sof_dirty VMA bit shows more soft-dirty
pages than there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have the use case where we need to track the soft-dirty PTE bit for
only specific pages on-demand. We need this tracking and clear mechanism
of a region of memory while the process is running to emulate the
getWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of soft-dirty feature to find pages which
have been written-to from v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. It is too delicate and wrong as it shows more soft-dirty
pages than the actual soft-dirty pages. There is no interest in
correcting it [2][3] as this is how the feature was written years ago.
It shouldn't be updated to changed behaviour. Peter Xu has suggested
using the async version of the UFFD WP [4] as it is based inherently
on the PTEs.
So in this patch series, I've added a new mode to the UFFD which is
asynchronous version of the write protect. When this variant of the
UFFD WP is used, the page faults are resolved automatically by the
kernel. The pages which have been written-to can be found by reading
pagemap file (!PM_UFFD_WP). This feature can be used successfully to
find which pages have been written to from the time the pages were
write protected. This works just like the soft-dirty flag without
showing any extra pages which aren't soft-dirty in reality.
The information related to pages if the page is file mapped, present and
swapped is required for the CRIU project [5][6]. The addition of the
required mask, any mask, excluded mask and return masks are also required
for the CRIU project [5].
The IOCTL returns the addresses of the pages which match the specific
masks. The page addresses are returned in struct page_region in a compact
form. The max_pages is needed to support a use case where user only wants
to get a specific number of pages. So there is no need to find all the
pages of interest in the range when max_pages is specified. The IOCTL
returns when the maximum number of the pages are found. The max_pages is
optional. If max_pages is specified, it must be equal or greater than the
vec_size. This restriction is needed to handle worse case when one
page_region only contains info of one page and it cannot be compacted.
This is needed to emulate the Windows getWriteWatch() syscall.
The patch series include the detailed selftest which can be used as an
example for the uffd async wp test and PAGEMAP_IOCTL. It shows the
interface usages as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (5):
fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
about PTEs
fs/proc/task_mmu: Add fast paths to get/clear PAGE_IS_WRITTEN flag
tools headers UAPI: Update linux/fs.h with the kernel sources
mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
selftests: mm: add pagemap ioctl tests
Peter Xu (1):
userfaultfd: UFFD_FEATURE_WP_ASYNC
Documentation/admin-guide/mm/pagemap.rst | 64 +
Documentation/admin-guide/mm/userfaultfd.rst | 35 +
fs/proc/task_mmu.c | 715 +++++++++
fs/userfaultfd.c | 26 +-
include/linux/hugetlb.h | 1 +
include/linux/userfaultfd_k.h | 21 +-
include/uapi/linux/fs.h | 59 +
include/uapi/linux/userfaultfd.h | 9 +-
mm/hugetlb.c | 34 +-
mm/memory.c | 27 +-
tools/include/uapi/linux/fs.h | 59 +
tools/testing/selftests/mm/.gitignore | 2 +
tools/testing/selftests/mm/Makefile | 3 +-
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/pagemap_ioctl.c | 1491 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
16 files changed, 2527 insertions(+), 24 deletions(-)
create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
--
2.39.2
[ commit be37bed754ed90b2655382f93f9724b3c1aae847 upstream ]
Dan Carpenter spotted that test_fw_config->reqs will be leaked if
trigger_batched_requests_store() is called two or more times.
The same appears with trigger_batched_requests_async_store().
This bug wasn't triggered by the tests, but observed by Dan's visual
inspection of the code.
The recommended workaround was to return -EBUSY if test_fw_config->reqs
is already allocated.
Fixes: c92316bf8e94 ("test_firmware: add batched firmware tests")
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Russ Weight <russell.h.weight(a)intel.com>
Cc: Tianfei Zhang <tianfei.zhang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Colin Ian King <colin.i.king(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v4.14
Suggested-by: Dan Carpenter <error27(a)gmail.com>
Suggested-by: Takashi Iwai <tiwai(a)suse.de>
Link: https://lore.kernel.org/r/20230509084746.48259-2-mirsad.todorovac@alu.unizg…
Signed-off-by: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
[ This fix is applied against the 4.14 stable branch. There are no changes to the ]
[ fix in code when compared to the upstread, only the reformatting for backport. ]
---
v2 -> v3:
minor clarifications in the versioning for the patchwork. not change to commit.
v1 -> v2:
removed the Reviewed-by: and Acked-by tags, as this is a slightly different patch and
those need to be reacquired
lib/test_firmware.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index 1c5e5246bf10..5318c5e18acf 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -621,6 +621,11 @@ static ssize_t trigger_batched_requests_store(struct device *dev,
mutex_lock(&test_fw_mutex);
+ if (test_fw_config->reqs) {
+ rc = -EBUSY;
+ goto out_bail;
+ }
+
test_fw_config->reqs = vzalloc(sizeof(struct test_batched_req) *
test_fw_config->num_requests * 2);
if (!test_fw_config->reqs) {
@@ -723,6 +728,11 @@ ssize_t trigger_batched_requests_async_store(struct device *dev,
mutex_lock(&test_fw_mutex);
+ if (test_fw_config->reqs) {
+ rc = -EBUSY;
+ goto out_bail;
+ }
+
test_fw_config->reqs = vzalloc(sizeof(struct test_batched_req) *
test_fw_config->num_requests * 2);
if (!test_fw_config->reqs) {
--
2.34.1
A missing break in kms_tests leads to kselftest hang when the
parameter -s is used.
In current code flow because of missing break in -s, -t parses
args spilled from -s and as -t accepts only valid values as 0,1
so any arg in -s >1 or <0, gets in ksm_test failure
This went undetected since, before the addition of option -t,
the next case -M would immediately break out of the switch
statement but that is no longer the case
Add the missing break statement.
----Before----
./ksm_tests -H -s 100
Invalid merge type
----After----
./ksm_tests -H -s 100
Number of normal pages: 0
Number of huge pages: 50
Total size: 100 MiB
Total time: 0.401732682 s
Average speed: 248.922 MiB/s
Fixes: 07115fcc15b4 ("selftests/mm: add new selftests for KSM")
Signed-off-by: Ayush Jain <ayush.jain3(a)amd.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
---
v1 -> v2
collect Reviewed-by from David
Updated Fixes tag from commit 9e7cb94ca218 to 07115fcc15b4
tools/testing/selftests/mm/ksm_tests.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mm/ksm_tests.c b/tools/testing/selftests/mm/ksm_tests.c
index 435acebdc325..380b691d3eb9 100644
--- a/tools/testing/selftests/mm/ksm_tests.c
+++ b/tools/testing/selftests/mm/ksm_tests.c
@@ -831,6 +831,7 @@ int main(int argc, char *argv[])
printf("Size must be greater than 0\n");
return KSFT_FAIL;
}
+ break;
case 't':
{
int tmp = atoi(optarg);
--
2.34.1
We want to replace iptables TPROXY with a BPF program at TC ingress.
To make this work in all cases we need to assign a SO_REUSEPORT socket
to an skb, which is currently prohibited. This series adds support for
such sockets to bpf_sk_assing.
I did some refactoring to cut down on the amount of duplicate code. The
key to this is to use INDIRECT_CALL in the reuseport helpers. To show
that this approach is not just beneficial to TC sk_assign I removed
duplicate code for bpf_sk_lookup as well.
Joint work with Daniel Borkmann.
Signed-off-by: Lorenz Bauer <lmb(a)isovalent.com>
---
Changes in v6:
- Reject unhashed UDP sockets in bpf_sk_assign to avoid ref leak
- Link to v5: https://lore.kernel.org/r/20230613-so-reuseport-v5-0-f6686a0dbce0@isovalent…
Changes in v5:
- Drop reuse_sk == sk check in inet[6]_steal_stock (Kuniyuki)
- Link to v4: https://lore.kernel.org/r/20230613-so-reuseport-v4-0-4ece76708bba@isovalent…
Changes in v4:
- WARN_ON_ONCE if reuseport socket is refcounted (Kuniyuki)
- Use inet[6]_ehashfn_t to shorten function declarations (Kuniyuki)
- Shuffle documentation patch around (Kuniyuki)
- Update commit message to explain why IPv6 needs EXPORT_SYMBOL
- Link to v3: https://lore.kernel.org/r/20230613-so-reuseport-v3-0-907b4cbb7b99@isovalent…
Changes in v3:
- Fix warning re udp_ehashfn and udp6_ehashfn (Simon)
- Return higher scoring connected UDP reuseport sockets (Kuniyuki)
- Fix ipv6 module builds
- Link to v2: https://lore.kernel.org/r/20230613-so-reuseport-v2-0-b7c69a342613@isovalent…
Changes in v2:
- Correct commit abbrev length (Kuniyuki)
- Reduce duplication (Kuniyuki)
- Add checks on sk_state (Martin)
- Split exporting inet[6]_lookup_reuseport into separate patch (Eric)
---
Daniel Borkmann (1):
selftests/bpf: Test that SO_REUSEPORT can be used with sk_assign helper
Lorenz Bauer (7):
udp: re-score reuseport groups when connected sockets are present
bpf: reject unhashed sockets in bpf_sk_assign
net: export inet_lookup_reuseport and inet6_lookup_reuseport
net: remove duplicate reuseport_lookup functions
net: document inet[6]_lookup_reuseport sk_state requirements
net: remove duplicate sk_lookup helpers
bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign
include/net/inet6_hashtables.h | 81 ++++++++-
include/net/inet_hashtables.h | 74 +++++++-
include/net/sock.h | 7 +-
include/uapi/linux/bpf.h | 3 -
net/core/filter.c | 4 +-
net/ipv4/inet_hashtables.c | 68 ++++---
net/ipv4/udp.c | 88 ++++-----
net/ipv6/inet6_hashtables.c | 71 +++++---
net/ipv6/udp.c | 98 ++++------
tools/include/uapi/linux/bpf.h | 3 -
tools/testing/selftests/bpf/network_helpers.c | 3 +
.../selftests/bpf/prog_tests/assign_reuse.c | 197 +++++++++++++++++++++
.../selftests/bpf/progs/test_assign_reuse.c | 142 +++++++++++++++
13 files changed, 660 insertions(+), 179 deletions(-)
---
base-commit: 6f5a630d7c57cd79b1f526a95e757311e32d41e5
change-id: 20230613-so-reuseport-e92c526173ee
Best regards,
--
Lorenz Bauer <lmb(a)isovalent.com>
iommufd gives userspace the capability to manipulate iommu subsytem.
e.g. DMA map/unmap etc. In the near future, it will support iommu nested
translation. Different platform vendors have different implementation for
the nested translation. For example, Intel VT-d supports using guest I/O
page table as the stage-1 translation table. This requires guest I/O page
table be compatible with hardware IOMMU. So before set up nested translation,
userspace needs to know the hardware iommu information to understand the
nested translation requirements.
This series reports the iommu hardware information for a given device
which has been bound to iommufd. It is preparation work for userspace to
allocate hwpt for given device. Like the nested translation support[1].
This series introduces an iommu op to report the iommu hardware info,
and an ioctl IOMMU_GET_HW_INFO is added to report such hardware info to
user. enum iommu_hw_info_type is defined to differentiate the iommu hardware
info reported to user hence user can decode them. This series only adds the
framework for iommu hw info reporting, the complete reporting path needs vendor
specific definition and driver support. The full code is available in [1]
as well.
[1] https://github.com/yiliu1765/iommufd/tree/wip/iommufd_nesting_08032023-yi
(only the hw_info report path is the latest, other parts is wip)
Change log:
v5:
- Return hw_info_type in the .hw_info op, hence drop hw_info_type field in iommu_ops (Kevin)
- Add Jason's r-b for patch 01
- Address coding style comments from Jason and Kevin w.r.t. patch 02, 03 and 04
v4: https://lore.kernel.org/linux-iommu/20230724105936.107042-1-yi.l.liu@intel.…
- Rename ioctl to IOMMU_GET_HW_INFO and structure to iommu_hw_info
- Move the iommufd_get_hw_info handler to main.c
- Place iommu_hw_info prior to iommu_hwpt_alloc
- Update the function namings accordingly
- Update uapi kdocs
v3: https://lore.kernel.org/linux-iommu/20230511143024.19542-1-yi.l.liu@intel.c…
- Add r-b from Baolu
- Rename IOMMU_HW_INFO_TYPE_DEFAULT to be IOMMU_HW_INFO_TYPE_NONE to
better suit what it means
- Let IOMMU_DEVICE_GET_HW_INFO succeed even the underlying iommu driver
does not have driver-specific data to report per below remark.
https://lore.kernel.org/kvm/ZAcwJSK%2F9UVI9LXu@nvidia.com/
v2: https://lore.kernel.org/linux-iommu/20230309075358.571567-1-yi.l.liu@intel.…
- Drop patch 05 of v1 as it is already covered by other series
- Rename the capability info to be iommu hardware info
v1: https://lore.kernel.org/linux-iommu/20230209041642.9346-1-yi.l.liu@intel.co…
Regards,
Yi Liu
Lu Baolu (1):
iommu: Add new iommu op to get iommu hardware information
Nicolin Chen (1):
iommufd/selftest: Add coverage for IOMMU_GET_HW_INFO ioctl
Yi Liu (2):
iommu: Move dev_iommu_ops() to private header
iommufd: Add IOMMU_GET_HW_INFO
drivers/iommu/iommu-priv.h | 11 +++
drivers/iommu/iommufd/iommufd_test.h | 9 +++
drivers/iommu/iommufd/main.c | 79 +++++++++++++++++++
drivers/iommu/iommufd/selftest.c | 16 ++++
include/linux/iommu.h | 20 +++--
include/uapi/linux/iommufd.h | 44 +++++++++++
tools/testing/selftests/iommu/iommufd.c | 17 +++-
tools/testing/selftests/iommu/iommufd_utils.h | 26 ++++++
8 files changed, 210 insertions(+), 12 deletions(-)
--
2.34.1
From: Zi Yan <ziy(a)nvidia.com>
Hi all,
File folio supports any order and people would like to support flexible orders
for anonymous folio[1] too. Currently, split_huge_page() only splits a huge
page to order-0 pages, but splitting to orders higher than 0 is also useful.
This patchset adds support for splitting a huge page to any lower order pages
and uses it during folio truncate operations.
The patchset is on top of mm-everything-2023-03-27-21-20.
Changelog from v1
===
1. Changed split_page_memcg() and split_page_owner() parameter to use order
2. Used folio_test_pmd_mappable() in place of the equivalent code
Details
===
* Patch 1 changes split_page_memcg() to use order instead of nr_pages
* Patch 2 changes split_page_owner() to use order instead of nr_pages
* Patch 3 and 4 add new_order parameter split_page_memcg() and
split_page_owner() and prepare for upcoming changes.
* Patch 5 adds split_huge_page_to_list_to_order() to split a huge page
to any lower order. The original split_huge_page_to_list() calls
split_huge_page_to_list_to_order() with new_order = 0.
* Patch 6 uses split_huge_page_to_list_to_order() in large pagecache folio
truncation instead of split the large folio all the way down to order-0.
* Patch 7 adds a test API to debugfs and test cases in
split_huge_page_test selftests.
Comments and/or suggestions are welcome.
[1] https://lore.kernel.org/linux-mm/Y%2FblF0GIunm+pRIC@casper.infradead.org/
Zi Yan (7):
mm/memcg: use order instead of nr in split_page_memcg()
mm/page_owner: use order instead of nr in split_page_owner()
mm: memcg: make memcg huge page split support any order split.
mm: page_owner: add support for splitting to any order in split
page_owner.
mm: thp: split huge page to any lower order pages.
mm: truncate: split huge page cache page to a non-zero order if
possible.
mm: huge_memory: enable debugfs to split huge pages to any order.
include/linux/huge_mm.h | 10 +-
include/linux/memcontrol.h | 4 +-
include/linux/page_owner.h | 10 +-
mm/huge_memory.c | 137 ++++++++---
mm/memcontrol.c | 10 +-
mm/page_alloc.c | 8 +-
mm/page_owner.c | 10 +-
mm/truncate.c | 21 +-
.../selftests/mm/split_huge_page_test.c | 225 +++++++++++++++++-
9 files changed, 366 insertions(+), 69 deletions(-)
--
2.39.2
In the Segment Routing (SR) architecture a list of instructions, called
segments, can be added to the packet headers to influence the forwarding and
processing of the packets in an SR enabled network.
Considering the Segment Routing over IPv6 data plane (SRv6) [1], the segment
identifiers (SIDs) are IPv6 addresses (128 bits) and the segment list (SID
List) is carried in the Segment Routing Header (SRH). A segment may correspond
to a "behavior" that is executed by a node when the packet is received.
The Linux kernel currently supports a large subset of the behaviors described
in [2] (e.g., End, End.X, End.T and so on).
In some SRv6 scenarios, the number of segments carried by the SID List may
increase dramatically, reducing the MTU (Maximum Transfer Unit) size and/or
limiting the processing power of legacy hardware devices (due to longer IPv6
headers).
The NEXT-C-SID mechanism [3] extends the SRv6 architecture by providing several
ways to efficiently represent the SID List.
By leveraging the NEXT-C-SID, is it possible to encode several SRv6 segments
within a single 128 bit SID address (also referenced as Compressed SID
Container). In this way, the length of the SID List can be drastically reduced.
The NEXT-C-SID mechanism is built upon the "flavors" framework defined in [2].
This framework is already supported by the Linux SRv6 subsystem and is used to
modify and/or extend a subset of existing behaviors.
In this patchset, we extend the SRv6 End.X behavior in order to support the
NEXT-C-SID mechanism.
In details, the patchset is made of:
- patch 1/2: add NEXT-C-SID support for SRv6 End.X behavior;
- patch 2/2: add selftest for NEXT-C-SID in SRv6 End.X behavior.
From the user space perspective, we do not need to change the iproute2 code to
support the NEXT-C-SID flavor for the SRv6 End.X behavior. However, we will
update the man page considering the NEXT-C-SID flavor applied to the SRv6 End.X
behavior in a separate patch.
Comments, improvements and suggestions are always appreciated.
Thank you all,
Andrea
[1] - https://datatracker.ietf.org/doc/html/rfc8754
[2] - https://datatracker.ietf.org/doc/html/rfc8986
[3] - https://datatracker.ietf.org/doc/html/draft-ietf-spring-srv6-srh-compression
Andrea Mayer (1):
seg6: add NEXT-C-SID support for SRv6 End.X behavior
Paolo Lungaroni (1):
selftests: seg6: add selftest for NEXT-C-SID flavor in SRv6 End.X
behavior
net/ipv6/seg6_local.c | 108 +-
tools/testing/selftests/net/Makefile | 1 +
.../net/srv6_end_x_next_csid_l3vpn_test.sh | 1213 +++++++++++++++++
3 files changed, 1302 insertions(+), 20 deletions(-)
create mode 100755 tools/testing/selftests/net/srv6_end_x_next_csid_l3vpn_test.sh
--
2.20.1
Hi all:
The core frequency is subjected to the process variation in semiconductors.
Not all cores are able to reach the maximum frequency respecting the
infrastructure limits. Consequently, AMD has redefined the concept of
maximum frequency of a part. This means that a fraction of cores can reach
maximum frequency. To find the best process scheduling policy for a given
scenario, OS needs to know the core ordering informed by the platform through
highest performance capability register of the CPPC interface.
Earlier implementations of AMD Pstate Preferred Core only support a static
core ranking and targeted performance. Now it has the ability to dynamically
change the preferred core based on the workload and platform conditions and
accounting for thermals and aging.
AMD Pstate driver utilizes the functions and data structures provided by
the ITMT architecture to enable the scheduler to favor scheduling on cores
which can be get a higher frequency with lower voltage.
We call it AMD Pstate Preferrred Core.
Here sched_set_itmt_core_prio() is called to set priorities and
sched_set_itmt_support() is called to enable ITMT feature.
AMD Pstate driver uses the highest performance value to indicate
the priority of CPU. The higher value has a higher priority.
AMD Pstate driver will provide an initial core ordering at boot time.
It relies on the CPPC interface to communicate the core ranking to the
operating system and scheduler to make sure that OS is choosing the cores
with highest performance firstly for scheduling the process. When AMD Pstate
driver receives a message with the highest performance change, it will
update the core ranking.
Meng Li (6):
ACPI: CPPC: Add get the highest performance cppc control
cpufreq: amd-pstate: Enable AMD Pstate Preferred Core Supporting.
cpufreq: Add a notification message that the highest perf has changed
cpufreq: amd-pstate: Update AMD Pstate Preferred Core ranking
dynamically
Documentation: amd-pstate: introduce AMD Pstate Preferred Core
Documentation: introduce AMD Pstate Preferrd Core mode kernel command
line options
.../admin-guide/kernel-parameters.txt | 5 +
Documentation/admin-guide/pm/amd-pstate.rst | 55 ++++++
drivers/acpi/cppc_acpi.c | 13 ++
drivers/acpi/processor_driver.c | 6 +
drivers/cpufreq/amd-pstate.c | 181 ++++++++++++++++--
drivers/cpufreq/cpufreq.c | 13 ++
include/acpi/cppc_acpi.h | 5 +
include/linux/amd-pstate.h | 1 +
include/linux/cpufreq.h | 4 +
9 files changed, 267 insertions(+), 16 deletions(-)
--
2.34.1
Submit the top-level headers also from the kunit test module notifier
initialization callback, so external tools that are parsing dmesg for
kunit test output are able to tell how many test suites should be expected
and whether to continue parsing after complete output from the first test
suite is collected.
Extend kunit module notifier initialization callback with a processing
path for only listing the tests provided by a module if the kunit action
parameter is set to "list", so external tools can obtain a list of test
cases to be executed in advance and can make a better job on assigning
kernel messages interleaved with kunit output to specific tests.
Use test filtering functions in kunit module notifier callback functions,
so external tools are able to execute individual test cases from kunit
test modules in order to still better isolate their potential impact on
kernel messages that appear interleaved with output from other tests.
v5: Fix new name of a structure moved to kunit namespace not updated in
executor_test functions (lkp(a)intel.com),
- refresh on tpp of attributes filtering fix.
v4: Use kunit_exec_run_tests() (Mauro, Rae), but prevent it from
emitting the headers when called on load of non-test modules,
- don't use a different list format, use kunit_exec_list_tests() (Rae),
- refresh on top of newly introduced attributes patches, handle newly
introduced kunit.action=list_attr case (Rae).
v3: Fix CONFIG_GLOB, required by filtering functions, not selected when
building as a module (lkp(a)intel.com).
v2: Fix new name of a structure moved to kunit namespace not updated
across all uses (lkp(a)intel.com).
Janusz Krzysztofik (3):
kunit: Report the count of test suites in a module
kunit: Make 'list' action available to kunit test modules
kunit: Allow kunit test modules to use test filtering
include/kunit/test.h | 21 +++++++
lib/kunit/Kconfig | 2 +-
lib/kunit/executor.c | 115 ++++++++++++++++++++++----------------
lib/kunit/executor_test.c | 36 ++++++++----
lib/kunit/test.c | 37 +++++++++++-
5 files changed, 149 insertions(+), 62 deletions(-)
base-commit: 1c9fd080dffe5e5ad763527fbc2aa3f6f8c653e9
--
2.41.0
Make sv48 the default address space for mmap as some applications
currently depend on this assumption. Users can now select a
desired address space using a non-zero hint address to mmap. Previously,
requesting the default address space from mmap by passing zero as the hint
address would result in using the largest address space possible. Some
applications depend on empty bits in the virtual address space, like Go and
Java, so this patch provides more flexibility for application developers.
-Charlie
---
v8:
- Fix RV32 and the RV32 compat mode of RV64
- Extract out addr and base from the mmap macros
v7:
- Changing RLIMIT_STACK inside of an executing program does not trigger
arch_pick_mmap_layout(), so rewrite tests to change RLIMIT_STACK from a
script before executing tests. RLIMIT_STACK of infinity forces bottomup
mmap allocation.
- Make arch_get_mmap_base macro more readible by extracting out the rnd
calculation.
- Use MMAP_MIN_VA_BITS in TASK_UNMAPPED_BASE to support case when mmap
attempts to allocate address smaller than DEFAULT_MAP_WINDOW.
- Fix incorrect wording in documentation.
v6:
- Rebase onto the correct base
v5:
- Minor wording change in documentation
- Change some parenthesis in arch_get_mmap_ macros
- Added case for addr==0 in arch_get_mmap_ because without this, programs would
crash if RLIMIT_STACK was modified before executing the program. This was
tested using the libhugetlbfs tests.
v4:
- Split testcases/document patch into test cases, in-code documentation, and
formal documentation patches
- Modified the mmap_base macro to be more legible and better represent memory
layout
- Fixed documentation to better reflect the implmentation
- Renamed DEFAULT_VA_BITS to MMAP_VA_BITS
- Added additional test case for rlimit changes
---
Charlie Jenkins (4):
RISC-V: mm: Restrict address space for sv39,sv48,sv57
RISC-V: mm: Add tests for RISC-V mm
RISC-V: mm: Update pgtable comment documentation
RISC-V: mm: Document mmap changes
Documentation/riscv/vm-layout.rst | 22 +++++++
arch/riscv/include/asm/elf.h | 2 +-
arch/riscv/include/asm/pgtable.h | 28 ++++++--
arch/riscv/include/asm/processor.h | 52 +++++++++++++--
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/mm/.gitignore | 2 +
tools/testing/selftests/riscv/mm/Makefile | 15 +++++
.../riscv/mm/testcases/mmap_bottomup.c | 35 ++++++++++
.../riscv/mm/testcases/mmap_default.c | 35 ++++++++++
.../selftests/riscv/mm/testcases/mmap_test.h | 64 +++++++++++++++++++
.../selftests/riscv/mm/testcases/run_mmap.sh | 12 ++++
11 files changed, 257 insertions(+), 12 deletions(-)
create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
create mode 100644 tools/testing/selftests/riscv/mm/Makefile
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_default.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_test.h
create mode 100755 tools/testing/selftests/riscv/mm/testcases/run_mmap.sh
--
2.41.0
Here is a series with some fixes and cleanups to resctrl selftests.
v5:
- Improve changelogs
- Close fd_lm only in cat_val()
- Improve unmount error handling
v4:
- Move resctrlfs (unconditional) umount after resctrl fs support check
v3:
- Don't include rewritten CAT test into this series!
- Tweak wildcard style in Makefile
- Fix many changelog typos, remove some wrong claims, and generally
improve them.
- Add fix to PARENT_EXIT() to unmount resctrl FS
- Add unmounting resctrl FS before starting any tests
- Add fix for buf leak
- Add fix for perf fd closing
- Split mount/remount/umount patches differently
- Use size_t and %zu for span
- Keep MBM print as MB, only internally use span in bytes
- Drop start_buf global from fill_buf
v2 (was sent with CAT test rewrite which is no longer included in v3):
- Rebased on top of next to solve the conflicts
- Added 2 patches related to resctrl FS mount/umount (fix + cleanup)
- Consistently use "alloc" in cache_alloc_size()
- CAT test error handling tweaked
- Remove a spurious newline change from the CAT patch
- Small improvements to changelogs
Ilpo Järvinen (19):
selftests/resctrl: Add resctrl.h into build deps
selftests/resctrl: Don't leak buffer in fill_cache()
selftests/resctrl: Unmount resctrl FS if child fails to run benchmark
selftests/resctrl: Close perf value read fd on errors
selftests/resctrl: Unmount resctrl FS before starting the first test
selftests/resctrl: Move resctrl FS mount/umount to higher level
selftests/resctrl: Refactor remount_resctrl(bool mum_resctrlfs) to
mount_resctrl()
selftests/resctrl: Remove mum_resctrlfs from struct resctrl_val_param
selftests/resctrl: Convert span to size_t
selftests/resctrl: Express span internally in bytes
selftests/resctrl: Remove duplicated preparation for span arg
selftests/resctrl: Remove "malloc_and_init_memory" param from
run_fill_buf()
selftests/resctrl: Remove unnecessary startptr global from fill_buf
selftests/resctrl: Improve parameter consistency in fill_buf
selftests/resctrl: Don't pass test name to fill_buf
selftests/resctrl: Don't use variable argument list for ->setup()
selftests/resctrl: Move CAT/CMT test global vars to function they are
used in
selftests/resctrl: Pass the real number of tests to show_cache_info()
selftests/resctrl: Remove test type checks from cat_val()
tools/testing/selftests/resctrl/Makefile | 2 +-
tools/testing/selftests/resctrl/cache.c | 66 +++++++-------
tools/testing/selftests/resctrl/cat_test.c | 28 ++----
tools/testing/selftests/resctrl/cmt_test.c | 29 ++-----
tools/testing/selftests/resctrl/fill_buf.c | 87 +++++++------------
tools/testing/selftests/resctrl/mba_test.c | 9 +-
tools/testing/selftests/resctrl/mbm_test.c | 17 ++--
tools/testing/selftests/resctrl/resctrl.h | 17 ++--
.../testing/selftests/resctrl/resctrl_tests.c | 83 ++++++++++++------
tools/testing/selftests/resctrl/resctrl_val.c | 7 +-
tools/testing/selftests/resctrl/resctrlfs.c | 64 +++++++-------
11 files changed, 178 insertions(+), 231 deletions(-)
--
2.30.2
This is agains mm/mm-unstable, but everything except patch #6 and #7
should apply on current master. Especially patch #1 and #2 should go
upstream first, so we can let the other stuff mature a bit longer.
Handle the fallout of 474098edac26 ("mm/gup: replace FOLL_NUMA by
gup_can_follow_protnone()") where I accidentially missed that
follow_page() and smaps implicitly kept the FOLL_NUMA flag clear by not
setting it if FOLL_FORCE is absent, to not trigger faults on
PROT_NONE-mapped PTEs.
Patch #1 fixes the known issues by reintroducing FOLL_NUMA as
FOLL_HONOR_NUMA_FAULT and decoupling it from FOLL_FORCE.
Patch #2 is a cleanup that I think actually fixes some corner cases, so
I added a Fixes: tag.
Patch #3 makes KVM explicitly set FOLL_HONOR_NUMA_FAULT in the single
case where it is required, and documents the situation.
Patch #4 then stops implicitly setting FOLL_HONOR_NUMA_FAULT. But note that
for FOLL_WRITE we always implicitly honor NUMA hinting faults.
Patch #5 cleans up a comments.
Patch #6 improves the KVM functional tests such that patch #7 can
actually check for one of the known issues: KSM no longer working on
PROT_NONE mappings on x86-64 with CONFIG_NUMA_BALANCING.
v2 -> V3:
* "mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT"
-> Squash one comment removal
-> Adjust the KSM comment
* smaps: use vm_normal_page_pmd() instead of follow_trans_huge_pmd()
-> Move follow_trans_huge_pmd() to mm/internal.h
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: liubo <liubo254(a)huawei.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
David Hildenbrand (7):
mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
smaps: use vm_normal_page_pmd() instead of follow_trans_huge_pmd()
kvm: explicitly set FOLL_HONOR_NUMA_FAULT in hva_to_pfn_slow()
mm/gup: don't implicitly set FOLL_HONOR_NUMA_FAULT
pgtable: improve pte_protnone() comment
selftest/mm: ksm_functional_tests: test in mmap_and_merge_range() if
anything got merged
selftest/mm: ksm_functional_tests: Add PROT_NONE test
fs/proc/task_mmu.c | 3 +-
include/linux/huge_mm.h | 3 -
include/linux/mm.h | 21 +++-
include/linux/mm_types.h | 9 ++
include/linux/pgtable.h | 16 ++-
mm/gup.c | 23 +++-
mm/huge_memory.c | 3 +-
mm/internal.h | 7 ++
.../selftests/mm/ksm_functional_tests.c | 106 ++++++++++++++++--
virt/kvm/kvm_main.c | 13 ++-
10 files changed, 171 insertions(+), 33 deletions(-)
--
2.41.0
As reported and suggested by Willy, the inline __sysret() helper
introduces three types of conversions and increases the size:
(1) the "unsigned long" argument to __sysret() forces a sign extension
from all sys_* functions that used to return 'int'
(2) the comparison with the error range now has to be performed on a
'unsigned long' instead of an 'int'
(3) the return value from __sysret() is a 'long' (note, a signed long)
which then has to be turned back to an 'int' before being returned by the
caller to satisfy the caller's prototype.
To fix up this, firstly, let's use macro instead of inline function to
preserves the input type and avoids these useless conversions (1), (3).
Secondly, comparison to -MAX_ERRNO inflicts on all integer returns where
we could previously keep a simple sign comparison, let's use a new
is_signed_type() macro from include/linux/compiler.h to limit the
comparision to -MAX_ERRNO (2) only on demand and preserves a simple sign
comparision for most of the cases as before.
Thirdly, fix up the following warning by an explicit conversion and let
__sysret() be able to accept the (void *) type of argument and return
value with the same (void *) type:
sysroot/powerpc/include/sys.h: In function 'sbrk':
sysroot/powerpc/include/sys.h:104:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
104 | return (void *)__sysret(-ENOMEM);
Fourthly, to further workaround the argument type with 'const', must use
__auto_type for a new enough gcc versions and use 'long' for the old gcc
versions as before.
Here reports the size testing result with nolibc-test:
before:
// ppc64le
$ size nolibc-test
text data bss dec hex filename
27916 8 80 28004 6d64 nolibc-test
// mips
$ size nolibc-test
text data bss dec hex filename
23276 64 64 23404 5b6c nolibc-test
after:
// ppc64le
$ size nolibc-test
text data bss dec hex filename
27736 8 80 27824 6cb0 nolibc-test
// mips
$ size nolibc-test
text data bss dec hex filename
23036 64 64 23164 5a7c nolibc-test
Suggested-by: Willy Tarreau <w(a)1wt.eu>
Link: https://lore.kernel.org/lkml/20230806095846.GB10627@1wt.eu/
Link: https://lore.kernel.org/lkml/20230806134348.GA19145@1wt.eu/
Signed-off-by: Zhangjin Wu <falcon(a)tinylab.org>
---
Hi, Willy
To increase readability, v3 further defines a
__GXX_HAS_AUTO_TYPE_WITH_CONST_SUPPORT macro for gcc >= 11.0
(ABI_VERSION >= 1016) who has __auto_type with 'const' support.
When this macro is defined, provides a __sysret version with
__auto_type, otherwise, use a fixed 'long' type as a fallback.
Tested for all of the nolibc supported architectures with Arnd's
13.2.0 toolchains. and also for x86_64 with gcc-4.8 and gcc-9, no
compile failures, no compile warnings, no running failures.
Changes from v2 --> v3:
* define a __GXX_HAS_AUTO_TYPE_WITH_CONST_SUPPORT for gcc >= 11.0 (ABI_VERSION >= 1016)
* split __sysret() to two versions by the macro instead of a mixed unified and unreadable version
* use shorter __ret instead of __sysret_arg
Changes from v1 --> v2:
* fix up argument with 'const' in the type
* support "void *" argument
v2: https://lore.kernel.org/lkml/95fe3e732f455fab653fe1427118d905e4d04257.16913…
v1: https://lore.kernel.org/lkml/20230806131921.52453-1-falcon@tinylab.org/
---
tools/include/nolibc/sys.h | 66 +++++++++++++++++++++++++++++++-------
1 file changed, 55 insertions(+), 11 deletions(-)
diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h
index 56f63eb48a1b..b137f7771db9 100644
--- a/tools/include/nolibc/sys.h
+++ b/tools/include/nolibc/sys.h
@@ -35,15 +35,59 @@
* (src/internal/syscall_ret.c) and glibc (sysdeps/unix/sysv/linux/sysdep.h)
*/
-static __inline__ __attribute__((unused, always_inline))
-long __sysret(unsigned long ret)
-{
- if (ret >= (unsigned long)-MAX_ERRNO) {
- SET_ERRNO(-(long)ret);
- return -1;
- }
- return ret;
-}
+/*
+ * Whether 'type' is a signed type or an unsigned type. Supports scalar types,
+ * bool and also pointer types. (from include/linux/compiler.h)
+ */
+#define __is_signed_type(type) (((type)(-1)) < (type)1)
+
+/* __auto_type is used instead of __typeof__ to workaround the build error
+ * 'error: assignment of read-only variable' when the argument has 'const' in
+ * the type, but __auto_type is a new feature from newer gcc version and it
+ * only works with 'const' from gcc 11.0 (__GXX_ABI_VERSION = 1016)
+ * https://gcc.gnu.org/legacy-ml/gcc-patches/2013-11/msg01378.html
+ */
+
+#if __GXX_ABI_VERSION >= 1016
+#define __GXX_HAS_AUTO_TYPE_WITH_CONST_SUPPORT
+#endif
+
+#ifdef __GXX_HAS_AUTO_TYPE_WITH_CONST_SUPPORT
+#define __sysret(arg) \
+({ \
+ __auto_type __ret = (arg); \
+ if (__is_signed_type(__typeof__(arg))) { \
+ if (__ret < 0) { \
+ SET_ERRNO(-(long)__ret); \
+ __ret = (__typeof__(arg))(-1L); \
+ } \
+ } else { \
+ if ((unsigned long)__ret >= (unsigned long)-MAX_ERRNO) { \
+ SET_ERRNO(-(long)__ret); \
+ __ret = (__typeof__(arg))(-1L); \
+ } \
+ } \
+ __ret; \
+})
+
+#else /* ! __GXX_HAS_AUTO_TYPE_WITH_CONST_SUPPORT */
+#define __sysret(arg) \
+({ \
+ long __ret = (long)(arg); \
+ if (__is_signed_type(__typeof__(arg))) { \
+ if (__ret < 0) { \
+ SET_ERRNO(-__ret); \
+ __ret = -1L; \
+ } \
+ } else { \
+ if ((unsigned long)__ret >= (unsigned long)-MAX_ERRNO) { \
+ SET_ERRNO(-__ret); \
+ __ret = -1L; \
+ } \
+ } \
+ (__typeof__(arg))__ret; \
+})
+#endif /* ! __GXX_HAS_AUTO_TYPE_WITH_CONST_SUPPORT */
/* Functions in this file only describe syscalls. They're declared static so
* that the compiler usually decides to inline them while still being allowed
@@ -94,7 +138,7 @@ void *sbrk(intptr_t inc)
if (ret && sys_brk(ret + inc) == ret + inc)
return ret + inc;
- return (void *)__sysret(-ENOMEM);
+ return __sysret((void *)-ENOMEM);
}
@@ -682,7 +726,7 @@ void *sys_mmap(void *addr, size_t length, int prot, int flags, int fd,
static __attribute__((unused))
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)
{
- return (void *)__sysret((unsigned long)sys_mmap(addr, length, prot, flags, fd, offset));
+ return __sysret(sys_mmap(addr, length, prot, flags, fd, offset));
}
static __attribute__((unused))
--
2.25.1
Silence the following warnings reported by the new -Wall -Wextra options
with pure assembly code.
In file included from sysroot/powerpc/include/stdio.h:13,
from nolibc-test.c:13:
sysroot/powerpc/include/arch.h: In function '_start':
sysroot/powerpc/include/arch.h:192:32: warning: unused variable 'r2' [-Wunused-variable]
192 | register volatile long r2 __asm__ ("r2") = (void *)&TOC - (void *)_start;
| ^~
sysroot/powerpc/include/arch.h:187:97: warning: optimization may eliminate reads and/or writes to register variables [-Wvolatile-register-var]
187 | void __attribute__((weak, noreturn, optimize("Os", "omit-frame-pointer"))) __no_stack_protector _start(void)
| ^~~~~~
Since only elfv2 ABI requires to save the TOC/GOT pointer to r2
register, when using elfv1 ABI, the old C code is simply ignored by the
compiler, but the compiler can not ignore the inline assembly code and
will introduce build failure or running segfaults. So, let's further
only add the new assembly code for elfv2 ABI with the checking of
_CALL_ELF == 2.
Link: https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.pdf
Link: https://www.llvm.org/devmtg/2014-04/PDFs/Talks/Euro-LLVM-2014-Weigand.pdf
Signed-off-by: Zhangjin Wu <falcon(a)tinylab.org>
---
Hi, Willy
When rebase on latest 20230806-for-6.6-1 branch, -Wall -Wextra reported
the above warnings.
Here uses volatile inline assembly code instead of C code to silence the
unused and optimization warnings.
And since only elfv2 require to save TOC pointer to r2 register, this
further only add the assembly code for elfv2.
BR,
Zhangjin
---
tools/include/nolibc/arch-powerpc.h | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/tools/include/nolibc/arch-powerpc.h b/tools/include/nolibc/arch-powerpc.h
index 76c3784f9dc7..ac212e6185b2 100644
--- a/tools/include/nolibc/arch-powerpc.h
+++ b/tools/include/nolibc/arch-powerpc.h
@@ -187,9 +187,17 @@
void __attribute__((weak, noreturn, optimize("Os", "omit-frame-pointer"))) __no_stack_protector _start(void)
{
#ifdef __powerpc64__
- /* On 64-bit PowerPC, save TOC/GOT pointer to r2 */
- extern char TOC __asm__ (".TOC.");
- register volatile long r2 __asm__ ("r2") = (void *)&TOC - (void *)_start;
+#if _CALL_ELF == 2
+ /* with -mabi=elfv2, save TOC/GOT pointer to r2
+ * r12 is global entry pointer, we use it to compute TOC from r12
+ * https://www.llvm.org/devmtg/2014-04/PDFs/Talks/Euro-LLVM-2014-Weigand.pdf
+ * https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.pdf
+ */
+ __asm__ volatile (
+ "addis 2, 12, .TOC. - _start@ha\n"
+ "addi 2, 2, .TOC. - _start@l\n"
+ );
+#endif /* _CALL_ELF == 2 */
__asm__ volatile (
"mr 3, 1\n" /* save stack pointer to r3, as arg1 of _start_c */
--
2.25.1
Here is a series with some fixes and cleanups to resctrl selftests.
Only has a minor change in code ordering in main() compared with v3.
v4:
- Move resctrlfs (unconditional) umount after resctrl fs support check
v3:
- Don't include rewritten CAT test into this series!
- Tweak wildcard style in Makefile
- Fix many changelog typos, remove some wrong claims, and generally
improve them.
- Add fix to PARENT_EXIT() to unmount resctrl FS
- Add unmounting resctrl FS before starting any tests
- Add fix for buf leak
- Add fix for perf fd closing
- Split mount/remount/umount patches differently
- Use size_t and %zu for span
- Keep MBM print as MB, only internally use span in bytes
- Drop start_buf global from fill_buf
v2 (was sent with CAT test rewrite which is no longer included in v3):
- Rebased on top of next to solve the conflicts
- Added 2 patches related to resctrl FS mount/umount (fix + cleanup)
- Consistently use "alloc" in cache_alloc_size()
- CAT test error handling tweaked
- Remove a spurious newline change from the CAT patch
- Small improvements to changelogs
Ilpo Järvinen (19):
selftests/resctrl: Add resctrl.h into build deps
selftests/resctrl: Don't leak buffer in fill_cache()
selftests/resctrl: Unmount resctrl FS if child fails to run benchmark
selftests/resctrl: Close perf value read fd on errors
selftests/resctrl: Unmount resctrl FS before starting the first test
selftests/resctrl: Move resctrl FS mount/umount to higher level
selftests/resctrl: Refactor remount_resctrl(bool mum_resctrlfs) to
mount_resctrl()
selftests/resctrl: Remove mum_resctrlfs from struct resctrl_val_param
selftests/resctrl: Convert span to size_t
selftests/resctrl: Express span internally in bytes
selftests/resctrl: Remove duplicated preparation for span arg
selftests/resctrl: Remove "malloc_and_init_memory" param from
run_fill_buf()
selftests/resctrl: Remove unnecessary startptr global from fill_buf
selftests/resctrl: Improve parameter consistency in fill_buf
selftests/resctrl: Don't pass test name to fill_buf
selftests/resctrl: Don't use variable argument list for ->setup()
selftests/resctrl: Move CAT/CMT test global vars to function they are
used in
selftests/resctrl: Pass the real number of tests to show_cache_info()
selftests/resctrl: Remove test type checks from cat_val()
tools/testing/selftests/resctrl/Makefile | 2 +-
tools/testing/selftests/resctrl/cache.c | 64 +++++++-------
tools/testing/selftests/resctrl/cat_test.c | 28 ++----
tools/testing/selftests/resctrl/cmt_test.c | 29 ++-----
tools/testing/selftests/resctrl/fill_buf.c | 87 +++++++------------
tools/testing/selftests/resctrl/mba_test.c | 9 +-
tools/testing/selftests/resctrl/mbm_test.c | 17 ++--
tools/testing/selftests/resctrl/resctrl.h | 17 ++--
.../testing/selftests/resctrl/resctrl_tests.c | 82 +++++++++++------
tools/testing/selftests/resctrl/resctrl_val.c | 7 +-
tools/testing/selftests/resctrl/resctrlfs.c | 57 ++++++------
11 files changed, 169 insertions(+), 230 deletions(-)
--
2.30.2
Hi, Willy
Here is the v6 of the __sysret series [1], applies your suggestions.
additionally, the sbrk() also uses the __sysret helper.
These patches are tested (together with the coming v4 selftests/nolibc
patches) for all of the supported architectures:
arch/board | result
------------|------------
arm/vexpress-a9 | 142 test(s) passed, 1 skipped, 0 failed.
arm/virt | 142 test(s) passed, 1 skipped, 0 failed.
aarch64/virt | 142 test(s) passed, 1 skipped, 0 failed.
ppc/g3beige | not supported
ppc/ppce500 | not supported
i386/pc | 142 test(s) passed, 1 skipped, 0 failed.
x86_64/pc | 142 test(s) passed, 1 skipped, 0 failed.
mipsel/malta | 142 test(s) passed, 1 skipped, 0 failed.
loongarch64/virt | 142 test(s) passed, 1 skipped, 0 failed.
riscv64/virt | 142 test(s) passed, 1 skipped, 0 failed.
riscv32/virt | 0 test(s) passed, 0 skipped, 0 failed.
s390x/s390-ccw-virtio | 142 test(s) passed, 1 skipped, 0 failed.
Changes from v5 --> v6:
* tools/nolibc: arch-*.h: fix up code indent errors
toolc/nolibc: arch-*.h: clean up whitespaces after __asm__
Fix up the code indent errors and whitespaces between __asm__ and volatile.
The post-whitespaces are reserved as before.
* tools/nolibc: arch-loongarch.h: shrink with _NOLIBC_SYSCALL_CLOBBERLIST
tools/nolibc: arch-mips.h: shrink with _NOLIBC_SYSCALL_CLOBBERLIST
Add _NOLIBC_ prefix for SYSCALL_CLOBBERLIST.
* tools/nolibc: add missing my_syscall6() for mips
Use post-whitespaces instead of post-tab.
The above 4 patches are preparation for this one.
* tools/nolibc: __sysret: support syscalls who return a pointer
Add comments about the new errno range [-MAX_ERRNOR, -1], add ref to
the musl and glibc.
* tools/nolibc: clean up mmap() routine
Comment the MAP_FAILED return info.
* tools/nolibc: clean up sbrk() routine
New patch, applies __sysret() helper too and also fixes up an error
reported by scripts/checkpatch.pl.
* selftests/nolibc: export argv0 for some tests
selftests/nolibc: prepare: create /dev/zero
Prepare /dev/zero and argv0 for mmap test cases.
* selftests/nolibc: add EXPECT_PTREQ, EXPECT_PTRNE and EXPECT_PTRER
selftests/nolibc: add sbrk_0 to test current brk getting
No change.
* selftests/nolibc: add mmap_bad test case
selftests/nolibc: add munmap_bad test case
selftests/nolibc: add mmap_munmap_good test case
Split the first two out to standalone patches.
Add /dev/zero and argv0 to the file list and assigns a file_size
manually for /dev/zero.
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/lkml/cover.1687957589.git.falcon@tinylab.org/
Zhangjin Wu (15):
tools/nolibc: arch-*.h: fix up code indent errors
toolc/nolibc: arch-*.h: clean up whitespaces after __asm__
tools/nolibc: arch-loongarch.h: shrink with _NOLIBC_SYSCALL_CLOBBERLIST
tools/nolibc: arch-mips.h: shrink with _NOLIBC_SYSCALL_CLOBBERLIST
tools/nolibc: add missing my_syscall6() for mips
tools/nolibc: __sysret: support syscalls who return a pointer
tools/nolibc: clean up mmap() routine
tools/nolibc: clean up sbrk() routine
selftests/nolibc: export argv0 for some tests
selftests/nolibc: prepare: create /dev/zero
selftests/nolibc: add EXPECT_PTREQ, EXPECT_PTRNE and EXPECT_PTRER
selftests/nolibc: add sbrk_0 to test current brk getting
selftests/nolibc: add mmap_bad test case
selftests/nolibc: add munmap_bad test case
selftests/nolibc: add mmap_munmap_good test case
tools/include/nolibc/arch-aarch64.h | 28 ++--
tools/include/nolibc/arch-arm.h | 28 ++--
tools/include/nolibc/arch-i386.h | 24 ++--
tools/include/nolibc/arch-loongarch.h | 37 +++---
tools/include/nolibc/arch-mips.h | 73 +++++++----
tools/include/nolibc/arch-riscv.h | 14 +-
tools/include/nolibc/arch-s390.h | 14 +-
tools/include/nolibc/arch-x86_64.h | 28 ++--
tools/include/nolibc/nolibc.h | 9 +-
tools/include/nolibc/sys.h | 55 ++++----
tools/include/nolibc/types.h | 6 +
tools/testing/selftests/nolibc/nolibc-test.c | 129 ++++++++++++++++++-
12 files changed, 292 insertions(+), 153 deletions(-)
--
2.25.1
As reported and suggested by Willy, the inline __sysret() helper
introduces three types of conversions and increases the size:
(1) the "unsigned long" argument to __sysret() forces a sign extension
from all sys_* functions that used to return 'int'
(2) the comparison with the error range now has to be performed on a
'unsigned long' instead of an 'int'
(3) the return value from __sysret() is a 'long' (note, a signed long)
which then has to be turned back to an 'int' before being returned by the
caller to satisfy the caller's prototype.
To fix up this, firstly, let's use macro instead of inline function to
preserves the input type and avoids these useless conversions (1), (3).
Secondly, comparison to -MAX_ERRNO inflicts on all integer returns where
we could previously keep a simple sign comparison, let's use a new
is_signed_type() macro from include/linux/compiler.h to limit the
comparision to -MAX_ERRNO (2) only on demand and preserves a simple sign
comparision for most of the cases as before.
Thirdly, fix up the following warning by an explicit conversion and let
__sysret() be able to accept the (void *) type of argument:
sysroot/powerpc/include/sys.h: In function 'sbrk':
sysroot/powerpc/include/sys.h:104:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
104 | return (void *)__sysret(-ENOMEM);
Fourthly, to further workaround the argument type with 'const', must use
__auto_type in a new enough version or use 'long' as before.
Here reports the size testing result of nolibc-test with gcc 13.2.0:
before:
// ppc64le with powerpc64-linux-gcc
$ size nolibc-test
text data bss dec hex filename
28004 8 80 28092 6dbc nolibc-test
// mips with mips64-linux-gcc (CFLAGS="-mabi=32 -EL")
$ size nolibc-test
text data bss dec hex filename
23164 64 64 23292 5afc nolibc-test
after:
// ppc64le with powerpc64-linux-gcc
$ size nolibc-test
text data bss dec hex filename
27828 8 80 27916 6d0c nolibc-test
// mips with mips64-linux-gcc (CFLAGS="-mabi=32 -EL")
$ size nolibc-test
text data bss dec hex filename
22924 64 64 23052 5a0c nolibc-test
Suggested-by: Willy Tarreau <w(a)1wt.eu>
Link: https://lore.kernel.org/lkml/20230806095846.GB10627@1wt.eu/
Link: https://lore.kernel.org/lkml/20230806134348.GA19145@1wt.eu/
Signed-off-by: Zhangjin Wu <falcon(a)tinylab.org>
---
Hi, Willy
v4 rebases on latest 20230806-for-6.6-1 and fixes up a warning reported
by the new -Wall -Wextra options.
Changes from v3 --> v4:
* fix up a new warning about 'ret < 0' when the input arg type is (void *)
Changes from v2 --> v3:
* define a __GXX_HAS_AUTO_TYPE_WITH_CONST_SUPPORT for gcc >= 11.0 (ABI_VERSION >= 1016)
* split __sysret() to two versions by the macro instead of a mixed unified and unreadable version
* use shorter __ret instead of __sysret_arg
Changes from v1 --> v2:
* fix up argument with 'const' in the type
* support "void *" argument
v2: https://lore.kernel.org/lkml/95fe3e732f455fab653fe1427118d905e4d04257.16913…
v1: https://lore.kernel.org/lkml/20230806131921.52453-1-falcon@tinylab.org/
---
tools/include/nolibc/sys.h | 66 +++++++++++++++++++++++++++++++-------
1 file changed, 55 insertions(+), 11 deletions(-)
diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h
index 833d6c5e86dc..565b4a295c11 100644
--- a/tools/include/nolibc/sys.h
+++ b/tools/include/nolibc/sys.h
@@ -35,15 +35,59 @@
* (src/internal/syscall_ret.c) and glibc (sysdeps/unix/sysv/linux/sysdep.h)
*/
-static __inline__ __attribute__((unused, always_inline))
-long __sysret(unsigned long ret)
-{
- if (ret >= (unsigned long)-MAX_ERRNO) {
- SET_ERRNO(-(long)ret);
- return -1;
- }
- return ret;
-}
+/*
+ * Whether 'type' is a signed type or an unsigned type. Supports scalar types,
+ * bool and also pointer types. (from include/linux/compiler.h)
+ */
+#define __is_signed_type(type) (((type)(-1)) < (type)1)
+
+/* __auto_type is used instead of __typeof__ to workaround the build error
+ * 'error: assignment of read-only variable' when the argument has 'const' in
+ * the type, but __auto_type is a new feature from newer gcc version and it
+ * only works with 'const' from gcc 11.0 (__GXX_ABI_VERSION = 1016)
+ * https://gcc.gnu.org/legacy-ml/gcc-patches/2013-11/msg01378.html
+ */
+
+#if __GXX_ABI_VERSION >= 1016
+#define __GXX_HAS_AUTO_TYPE_WITH_CONST_SUPPORT
+#endif
+
+#ifdef __GXX_HAS_AUTO_TYPE_WITH_CONST_SUPPORT
+#define __sysret(arg) \
+({ \
+ __auto_type __ret = (arg); \
+ if (__is_signed_type(__typeof__(arg))) { \
+ if ((long)__ret < 0) { \
+ SET_ERRNO(-(long)__ret); \
+ __ret = (__typeof__(arg))(-1L); \
+ } \
+ } else { \
+ if ((unsigned long)__ret >= (unsigned long)-MAX_ERRNO) { \
+ SET_ERRNO(-(long)__ret); \
+ __ret = (__typeof__(arg))(-1L); \
+ } \
+ } \
+ __ret; \
+})
+
+#else /* ! __GXX_HAS_AUTO_TYPE_WITH_CONST_SUPPORT */
+#define __sysret(arg) \
+({ \
+ long __ret = (long)(arg); \
+ if (__is_signed_type(__typeof__(arg))) { \
+ if (__ret < 0) { \
+ SET_ERRNO(-__ret); \
+ __ret = -1L; \
+ } \
+ } else { \
+ if ((unsigned long)__ret >= (unsigned long)-MAX_ERRNO) { \
+ SET_ERRNO(-__ret); \
+ __ret = -1L; \
+ } \
+ } \
+ (__typeof__(arg))__ret; \
+})
+#endif /* ! __GXX_HAS_AUTO_TYPE_WITH_CONST_SUPPORT */
/* Functions in this file only describe syscalls. They're declared static so
* that the compiler usually decides to inline them while still being allowed
@@ -94,7 +138,7 @@ void *sbrk(intptr_t inc)
if (ret && sys_brk(ret + inc) == ret + inc)
return ret + inc;
- return (void *)__sysret(-ENOMEM);
+ return __sysret((void *)-ENOMEM);
}
@@ -682,7 +726,7 @@ void *sys_mmap(void *addr, size_t length, int prot, int flags, int fd,
static __attribute__((unused))
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)
{
- return (void *)__sysret((unsigned long)sys_mmap(addr, length, prot, flags, fd, offset));
+ return __sysret(sys_mmap(addr, length, prot, flags, fd, offset));
}
static __attribute__((unused))
--
2.25.1
As is described in the "How to use MPTCP?" section in MPTCP wiki [1]:
"Your app should create sockets with IPPROTO_MPTCP as the proto:
( socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP); ). Legacy apps can be
forced to create and use MPTCP sockets instead of TCP ones via the
mptcpize command bundled with the mptcpd daemon."
But the mptcpize (LD_PRELOAD technique) command has some limitations
[2]:
- it doesn't work if the application is not using libc (e.g. GoLang
apps)
- in some envs, it might not be easy to set env vars / change the way
apps are launched, e.g. on Android
- mptcpize needs to be launched with all apps that want MPTCP: we could
have more control from BPF to enable MPTCP only for some apps or all the
ones of a netns or a cgroup, etc.
- it is not in BPF, we cannot talk about it at netdev conf.
So this patchset attempts to use BPF to implement functions similer to
mptcpize.
The main idea is to add a hook in sys_socket() to change the protocol id
from IPPROTO_TCP (or 0) to IPPROTO_MPTCP.
[1]
https://github.com/multipath-tcp/mptcp_net-next/wiki
[2]
https://github.com/multipath-tcp/mptcp_net-next/issues/79
v12:
- update diag_* log of update_socket_protocol.
- add 'ip netns show' after 'ip netns del' to check if there is
a test did not clean up its netns.
- return libbpf_get_error() instead of -EIO for the error from
open_and_load().
- Use getsockopt(SOL_PROTOCOL) to verify mptcp protocol intead of
using 'ss -tOni'.
v11:
- add comments about outputs of 'ss' and 'nstat'.
- use "err = verify_mptcpify()" instead of using =+.
v10:
- drop "#ifdef CONFIG_BPF_JIT".
- include vmlinux.h and bpf_tracing_net.h to avoid defining some
macros.
- drop unneeded checks for mptcp.
v9:
- update comment for 'update_socket_protocol'.
v8:
- drop the additional checks on the 'protocol' value after the
'update_socket_protocol()' call.
v7:
- add __weak and __diag_* for update_socket_protocol.
v6:
- add update_socket_protocol.
v5:
- add bpf_mptcpify helper.
v4:
- use lsm_cgroup/socket_create
v3:
- patch 8: char cmd[128]; -> char cmd[256];
v2:
- Fix build selftests errors reported by CI
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/79
Geliang Tang (5):
bpf: Add update_socket_protocol hook
selftests/bpf: Use random netns name for mptcp
selftests/bpf: Add two mptcp netns helpers
selftests/bpf: Fix error checks of mptcp open_and_load
selftests/bpf: Add mptcpify test
net/mptcp/bpf.c | 15 ++
net/socket.c | 26 +++-
.../testing/selftests/bpf/prog_tests/mptcp.c | 146 +++++++++++++++---
tools/testing/selftests/bpf/progs/mptcpify.c | 20 +++
4 files changed, 186 insertions(+), 21 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/mptcpify.c
--
2.35.3
Hi all,
following bug is trying to workaround an error on ppc64le, where
zram01.sh LTP test (there is also kernel selftest
tools/testing/selftests/zram/zram01.sh, but LTP test got further
updates) has often mem_used_total 0 although zram is already filled.
Patch tries to repeatedly read /sys/block/zram*/mm_stat for 1 sec,
waiting for mem_used_total > 0. The question if this is expected and
should be workarounded or a bug which should be fixed.
REPRODUCE THE ISSUE
Quickest way to install only zram tests and their dependencies:
make autotools && ./configure && for i in testcases/lib/ testcases/kernel/device-drivers/zram/; do cd $i && make -j$(getconf _NPROCESSORS_ONLN) && make install && cd -; done
Run the test (only on vfat)
PATH="/opt/ltp/testcases/bin:$PATH" LTP_SINGLE_FS_TYPE=vfat zram01.sh
Petr Vorel (1):
zram01.sh: Workaround division by 0 on vfat on ppc64le
.../kernel/device-drivers/zram/zram01.sh | 27 +++++++++++++++++--
1 file changed, 25 insertions(+), 2 deletions(-)
--
2.38.0
*Changes in v26:*
- Code re-structurring and API changes in PAGEMAP_IOCTL
*Changes in v25*:
- Do proper filtering on hole as well (hole got missed earlier)
*Changes in v24*:
- Rebase on top of next-20230710
- Place WP markers in case of hole as well
*Changes in v23*:
- Set vec_buf_index in loop only when vec_buf_index is set
- Return -EFAULT instead of -EINVAL if vec is NULL
- Correctly return the walk ending address to the page granularity
*Changes in v22*:
- Interface change:
- Replace [start start + len) with [start, end)
- Return the ending address of the address walk in start
*Changes in v21*:
- Abort walk instead of returning error if WP is to be performed on
partial hugetlb
*Changes in v20*
- Correct PAGE_IS_FILE and add PAGE_IS_PFNZERO
*Changes in v19*
- Minor changes and interface updates
*Changes in v18*
- Rebase on top of next-20230613
- Minor updates
*Changes in v17*
- Rebase on top of next-20230606
- Minor improvements in PAGEMAP_SCAN IOCTL patch
*Changes in v16*
- Fix a corner case
- Add exclusive PM_SCAN_OP_WP back
*Changes in v15*
- Build fix (Add missed build fix in RESEND)
*Changes in v14*
- Fix build error caused by #ifdef added at last minute in some configs
*Changes in v13*
- Rebase on top of next-20230414
- Give-up on using uffd_wp_range() and write new helpers, flush tlb only
once
*Changes in v12*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebaase on top of next-20230406
- Review updates
*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
UFFDIO_WRITEPROTECT
*Changes in v10*
- Add specific condition to return error if hugetlb is used with wp
async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
*Motivation*
The real motivation for adding PAGEMAP_SCAN IOCTL is to emulate Windows
GetWriteWatch() and ResetWriteWatch() syscalls [1]. The GetWriteWatch()
retrieves the addresses of the pages that are written to in a region of
virtual memory.
This syscall is used in Windows applications and games etc. This syscall is
being emulated in pretty slow manner in userspace. Our purpose is to
enhance the kernel such that we translate it efficiently in a better way.
Currently some out of tree hack patches are being used to efficiently
emulate it in some kernels. We intend to replace those with these patches.
So the whole gaming on Linux can effectively get benefit from this. It
means there would be tons of users of this code.
CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.
Andrei's defines the following uses of this code:
* it is more granular and allows us to track changed pages more
effectively. The current interface can clear dirty bits for the entire
process only. In addition, reading info about pages is a separate
operation. It means we must freeze the process to read information
about all its pages, reset dirty bits, only then we can start dumping
pages. The information about pages becomes more and more outdated,
while we are processing pages. The new interface solves both these
downsides. First, it allows us to read pte bits and clear the
soft-dirty bit atomically. It means that CRIU will not need to freeze
processes to pre-dump their memory. Second, it clears soft-dirty bits
for a specified region of memory. It means CRIU will have actual info
about pages to the moment of dumping them.
* The new interface has to be much faster because basic page filtering
is happening in the kernel. With the old interface, we have to read
pagemap for each page.
*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we feel like kernel's soft-dirty
feature can be used under the hood with some additions like:
* reset soft-dirty flag for only a specific region of memory instead of
clearing the flag for the entire process
* get and clear soft-dirty flag for a specific region atomically
So we decided to use ioctl on pagemap file to read or/and reset soft-dirty
flag. But using soft-dirty flag, sometimes we get extra pages which weren't
even written. They had become soft-dirty because of VMA merging and
VM_SOFTDIRTY flag. This breaks the definition of GetWriteWatch(). We were
able to by-pass this short coming by ignoring VM_SOFTDIRTY until David
reported that mprotect etc messes up the soft-dirty flag while ignoring
VM_SOFTDIRTY [5]. This wasn't happening until [6] got introduced. We
discussed if we can revert these patches. But we could not reach to any
conclusion. So at this point, I made couple of tries to solve this whole
VM_SOFTDIRTY issue by correcting the soft-dirty implementation:
* [7] Correct the bug fixed wrongly back in 2014. It had potential to cause
regression. We left it behind.
* [8] Keep a list of soft-dirty part of a VMA across splits and merges. I
got the reply don't increase the size of the VMA by 8 bytes.
At this point, we left soft-dirty considering it is too much delicate and
userfaultfd [9] seemed like the only way forward. From there onward, we
have been basing soft-dirty emulation on userfaultfd wp feature where
kernel resolves the faults itself when WP_ASYNC feature is used. It was
straight forward to add WP_ASYNC feature in userfautlfd. Now we get only
those pages dirty or written-to which are really written in reality. (PS
There is another WP_UNPOPULATED userfautfd feature is required which is
needed to avoid pre-faulting memory before write-protecting [9].)
All the different masks were added on the request of CRIU devs to create
interface more generic and better.
[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com
* Original Cover letter from v8*
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As
kernel already has soft-dirty feature inside which we have given up to
use, we are using written-to terminology while using UFFD async WP under
the hood.
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
Some benchmarks can be seen here[1]. This series adds features that weren't
present earlier:
- There is no atomic get soft-dirty/Written-to status and clear present in
the kernel.
- The pages which have been written-to can not be found in accurate way.
(Kernel's soft-dirty PTE bit + sof_dirty VMA bit shows more soft-dirty
pages than there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have the use case where we need to track the soft-dirty PTE bit for
only specific pages on-demand. We need this tracking and clear mechanism
of a region of memory while the process is running to emulate the
getWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of soft-dirty feature to find pages which
have been written-to from v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. It is too delicate and wrong as it shows more soft-dirty
pages than the actual soft-dirty pages. There is no interest in
correcting it [2][3] as this is how the feature was written years ago.
It shouldn't be updated to changed behaviour. Peter Xu has suggested
using the async version of the UFFD WP [4] as it is based inherently
on the PTEs.
So in this patch series, I've added a new mode to the UFFD which is
asynchronous version of the write protect. When this variant of the
UFFD WP is used, the page faults are resolved automatically by the
kernel. The pages which have been written-to can be found by reading
pagemap file (!PM_UFFD_WP). This feature can be used successfully to
find which pages have been written to from the time the pages were
write protected. This works just like the soft-dirty flag without
showing any extra pages which aren't soft-dirty in reality.
The information related to pages if the page is file mapped, present and
swapped is required for the CRIU project [5][6]. The addition of the
required mask, any mask, excluded mask and return masks are also required
for the CRIU project [5].
The IOCTL returns the addresses of the pages which match the specific
masks. The page addresses are returned in struct page_region in a compact
form. The max_pages is needed to support a use case where user only wants
to get a specific number of pages. So there is no need to find all the
pages of interest in the range when max_pages is specified. The IOCTL
returns when the maximum number of the pages are found. The max_pages is
optional. If max_pages is specified, it must be equal or greater than the
vec_size. This restriction is needed to handle worse case when one
page_region only contains info of one page and it cannot be compacted.
This is needed to emulate the Windows getWriteWatch() syscall.
The patch series include the detailed selftest which can be used as an
example for the uffd async wp test and PAGEMAP_IOCTL. It shows the
interface usages as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (4):
fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
about PTEs
tools headers UAPI: Update linux/fs.h with the kernel sources
mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
selftests: mm: add pagemap ioctl tests
Peter Xu (1):
userfaultfd: UFFD_FEATURE_WP_ASYNC
Documentation/admin-guide/mm/pagemap.rst | 64 +
Documentation/admin-guide/mm/userfaultfd.rst | 35 +
fs/proc/task_mmu.c | 653 ++++++++
fs/userfaultfd.c | 26 +-
include/linux/hugetlb.h | 1 +
include/linux/userfaultfd_k.h | 21 +-
include/uapi/linux/fs.h | 58 +
include/uapi/linux/userfaultfd.h | 9 +-
mm/hugetlb.c | 34 +-
mm/memory.c | 27 +-
tools/include/uapi/linux/fs.h | 58 +
tools/testing/selftests/mm/.gitignore | 2 +
tools/testing/selftests/mm/Makefile | 3 +-
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/pagemap_ioctl.c | 1485 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
16 files changed, 2457 insertions(+), 24 deletions(-)
create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
--
2.39.2
As reported and suggested by Willy, the inline __sysret() helper
introduces three types of conversions and increases the size:
(1) the "unsigned long" argument to __sysret() forces a sign extension
from all sys_* functions that used to return 'int'
(2) the comparison with the error range now has to be performed on a
'unsigned long' instead of an 'int'
(3) the return value from __sysret() is a 'long' (note, a signed long)
which then has to be turned back to an 'int' before being returned by the
caller to satisfy the caller's prototype.
To fix up this, firstly, let's use macro instead of inline function to
preserves the input type and avoids these useless conversions (1), (3).
Secondly, comparison to -MAX_ERRNO inflicts on all integer returns where
we could previously keep a simple sign comparison, let's use a new
is_signed_type() macro from include/linux/compiler.h to limit the
comparision to -MAX_ERRNO (2) only on demand and preserves a simple sign
comparision for most of the cases as before.
Thirdly, fix up the following warning by an explicit conversion and let
__sysret() be able to accept the (void *) type of argument:
sysroot/powerpc/include/sys.h: In function 'sbrk':
sysroot/powerpc/include/sys.h:104:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
104 | return (void *)__sysret(-ENOMEM);
Fourthly, to further workaround the argument type with 'const', must use
__auto_type in a new enough version or use 'long' as before.
Here reports the size testing result with nolibc-test:
before:
// ppc64le
$ size nolibc-test
text data bss dec hex filename
27916 8 80 28004 6d64 nolibc-test
// mips
$ size nolibc-test
text data bss dec hex filename
23276 64 64 23404 5b6c nolibc-test
after:
// ppc64le
$ size nolibc-test
text data bss dec hex filename
27736 8 80 27824 6cb0 nolibc-test
// mips
$ size nolibc-test
text data bss dec hex filename
23036 64 64 23164 5a7c nolibc-test
Suggested-by: Willy Tarreau <w(a)1wt.eu>
Link: https://lore.kernel.org/lkml/20230806095846.GB10627@1wt.eu/
Link: https://lore.kernel.org/lkml/20230806134348.GA19145@1wt.eu/
Signed-off-by: Zhangjin Wu <falcon(a)tinylab.org>
---
v2 here is further fix up argument with 'const' in the type and also
support "void *" argument, v1 is [1].
Tested on many architectures (i386, x86_64, mips, ppc64) and gcc version
(from gcc 4.8-13.1.0), compiles well without any warning and errors and
also with smaller size.
[1]: https://lore.kernel.org/lkml/20230806131921.52453-1-falcon@tinylab.org/
---
tools/include/nolibc/sys.h | 52 ++++++++++++++++++++++++++++++--------
1 file changed, 41 insertions(+), 11 deletions(-)
diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h
index 56f63eb48a1b..9c7448ae19e2 100644
--- a/tools/include/nolibc/sys.h
+++ b/tools/include/nolibc/sys.h
@@ -35,15 +35,45 @@
* (src/internal/syscall_ret.c) and glibc (sysdeps/unix/sysv/linux/sysdep.h)
*/
-static __inline__ __attribute__((unused, always_inline))
-long __sysret(unsigned long ret)
-{
- if (ret >= (unsigned long)-MAX_ERRNO) {
- SET_ERRNO(-(long)ret);
- return -1;
- }
- return ret;
-}
+/*
+ * Whether 'type' is a signed type or an unsigned type. Supports scalar types,
+ * bool and also pointer types. (from include/linux/compiler.h)
+ */
+#define __is_signed_type(type) (((type)(-1)) < (type)1)
+
+/* __auto_type is used instead of __typeof__ to workaround the build error
+ * 'error: assignment of read-only variable' when the argument has 'const' in
+ * the type, but __auto_type is a new feature from newer version and it only
+ * work with 'const' from gcc 11.0 (__GXX_ABI_VERSION = 1016)
+ * https://gcc.gnu.org/legacy-ml/gcc-patches/2013-11/msg01378.html
+ */
+
+#if __GXX_ABI_VERSION < 1016
+#define __typeofdecl(arg) long
+#define __typeofconv1(arg) (long)
+#define __typeofconv2(arg) (long)
+#else
+#define __typeofdecl(arg) __auto_type
+#define __typeofconv1(arg)
+#define __typeofconv2(arg) (__typeof__(arg))
+#endif
+
+#define __sysret(arg) \
+({ \
+ __typeofdecl(arg) __sysret_arg = __typeofconv1(arg)(arg); \
+ if (__is_signed_type(__typeof__(arg))) { \
+ if (__sysret_arg < 0) { \
+ SET_ERRNO(-(long)__sysret_arg); \
+ __sysret_arg = __typeofconv2(arg)(-1L); \
+ } \
+ } else { \
+ if ((unsigned long)__sysret_arg >= (unsigned long)-MAX_ERRNO) { \
+ SET_ERRNO(-(long)__sysret_arg); \
+ __sysret_arg = __typeofconv2(arg)(-1L); \
+ } \
+ } \
+ (__typeof__(arg))__sysret_arg; \
+})
/* Functions in this file only describe syscalls. They're declared static so
* that the compiler usually decides to inline them while still being allowed
@@ -94,7 +124,7 @@ void *sbrk(intptr_t inc)
if (ret && sys_brk(ret + inc) == ret + inc)
return ret + inc;
- return (void *)__sysret(-ENOMEM);
+ return __sysret((void *)-ENOMEM);
}
@@ -682,7 +712,7 @@ void *sys_mmap(void *addr, size_t length, int prot, int flags, int fd,
static __attribute__((unused))
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)
{
- return (void *)__sysret((unsigned long)sys_mmap(addr, length, prot, flags, fd, offset));
+ return __sysret(sys_mmap(addr, length, prot, flags, fd, offset));
}
static __attribute__((unused))
--
2.25.1
As reported and suggested by Willy, the inline __sysret() helper
introduces three types of conversions and increases the size:
(1) the "unsigned long" argument to __sysret() forces a sign extension
from all sys_* functions that used to return 'int'
(2) the comparison with the error range now has to be performed on a
'unsigned long' instead of an 'int'
(3) the return value from __sysret() is a 'long' (note, a signed long)
which then has to be turned back to an 'int' before being returned by the
caller to satisfy the caller's prototype.
To fix up this, firstly, let's use macro instead of inline function to
preserves the input type and avoids these useless conversions (1), (3).
Secondly, comparison to -MAX_ERRNO inflicts on all integer returns where
we could previously keep a simple sign comparison, let's use a new
is_signed_type() macro from include/linux/compiler.h to limit the
comparision to -MAX_ERRNO (2) only on demand and preserves a simple sign
comparision for most of the cases as before.
Thirdly, fix up the following warning by an explicit conversion:
sysroot/powerpc/include/sys.h: In function 'sbrk':
sysroot/powerpc/include/sys.h:104:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
104 | return (void *)__sysret(-ENOMEM);
Here reports the size testing result with nolibc-test:
before:
// ppc64le
$ size nolibc-test
text data bss dec hex filename
27916 8 80 28004 6d64 nolibc-test
// mips
$ size nolibc-test
text data bss dec hex filename
23276 64 64 23404 5b6c nolibc-test
after:
// ppc64le
$ size nolibc-test
text data bss dec hex filename
27736 8 80 27824 6cb0 nolibc-test
// mips
$ size nolibc-test
text data bss dec hex filename
23036 64 64 23164 5a7c nolibc-test
Suggested-by: Willy Tarreau <w(a)1wt.eu>
Link: https://lore.kernel.org/lkml/20230806095846.GB10627@1wt.eu/#R
Signed-off-by: Zhangjin Wu <falcon(a)tinylab.org>
---
tools/include/nolibc/compiler.h | 9 +++++++++
tools/include/nolibc/sys.h | 27 +++++++++++++++++----------
2 files changed, 26 insertions(+), 10 deletions(-)
diff --git a/tools/include/nolibc/compiler.h b/tools/include/nolibc/compiler.h
index beddc3665d69..360dfc533814 100644
--- a/tools/include/nolibc/compiler.h
+++ b/tools/include/nolibc/compiler.h
@@ -22,4 +22,13 @@
# define __no_stack_protector __attribute__((__optimize__("-fno-stack-protector")))
#endif /* defined(__has_attribute) */
+/*
+ * from include/linux/compiler.h
+ *
+ * Whether 'type' is a signed type or an unsigned type. Supports scalar types,
+ * bool and also pointer types.
+ */
+#define is_signed_type(type) (((type)(-1)) < (type)1)
+#define is_unsigned_type(type) (!is_signed_type(type))
+
#endif /* _NOLIBC_COMPILER_H */
diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h
index 56f63eb48a1b..8271302f79c4 100644
--- a/tools/include/nolibc/sys.h
+++ b/tools/include/nolibc/sys.h
@@ -35,15 +35,22 @@
* (src/internal/syscall_ret.c) and glibc (sysdeps/unix/sysv/linux/sysdep.h)
*/
-static __inline__ __attribute__((unused, always_inline))
-long __sysret(unsigned long ret)
-{
- if (ret >= (unsigned long)-MAX_ERRNO) {
- SET_ERRNO(-(long)ret);
- return -1;
- }
- return ret;
-}
+#define __sysret(arg) \
+({ \
+ __typeof__(arg) __sysret_arg = (arg); \
+ if (is_signed_type(__typeof__(arg))) { \
+ if (__sysret_arg < 0) { \
+ SET_ERRNO(-(int)__sysret_arg); \
+ __sysret_arg = -1L; \
+ } \
+ } else { \
+ if ((unsigned long)__sysret_arg >= (unsigned long)-MAX_ERRNO) { \
+ SET_ERRNO(-(int)__sysret_arg); \
+ __sysret_arg = -1L; \
+ } \
+ } \
+ __sysret_arg; \
+})
/* Functions in this file only describe syscalls. They're declared static so
* that the compiler usually decides to inline them while still being allowed
@@ -94,7 +101,7 @@ void *sbrk(intptr_t inc)
if (ret && sys_brk(ret + inc) == ret + inc)
return ret + inc;
- return (void *)__sysret(-ENOMEM);
+ return (void *)__sysret((unsigned long)-ENOMEM);
}
--
2.25.1
Hi, Willy
Now, the dependent pmac32_defconfig patch has been merged into the
powerpc next-test branch [1] ;-)
v6 here with a clean up of the CFLAGS for ppc variants, removed the
redundant -Wl options and call cc-option to check the -mmultiple option
for llvm as kernel does. v5 is [2].
Tests run with local toolchains and latest toolchains.
$ for arch in ppc ppc64 ppc64le; do \
make run-user XARCH=$arch | grep "status: "; \
done
166 test(s): 158 passed, 8 skipped, 0 failed => status: warning
166 test(s): 158 passed, 8 skipped, 0 failed => status: warning
166 test(s): 158 passed, 8 skipped, 0 failed => status: warning
$ for arch in ppc ppc64 ppc64le; do \
make run-user XARCH=$arch CC=/labs/linux-lab/prebuilt/toolchains/ppc64/gcc-13.1.0-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc | grep "status: "; \
done
166 test(s): 158 passed, 8 skipped, 0 failed => status: warning
166 test(s): 158 passed, 8 skipped, 0 failed => status: warning
166 test(s): 158 passed, 8 skipped, 0 failed => status: warning
Changes from v5 --> v6:
* selftests/nolibc: add test support for ppc
selftests/nolibc: add test support for ppc64le
selftests/nolibc: add test support for ppc64
Removed the -Wl options.
As comment from arch/powerpc/Makefile, use -mmultiple with cc-option for llvm has no such options.
* tools/nolibc: add support for powerpc
tools/nolibc: add support for powerpc64
selftests/nolibc: add XARCH and ARCH mapping support
selftests/nolibc: allow customize CROSS_COMPILE by architecture
selftests/nolibc: customize CROSS_COMPILE for 32/64-bit powerpc
No changes.
BR,
Zhangjin Wu
---
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/commit/?h…
[2]: https://lore.kernel.org/lkml/cover.1691062722.git.falcon@tinylab.org/
Zhangjin Wu (8):
tools/nolibc: add support for powerpc
tools/nolibc: add support for powerpc64
selftests/nolibc: add XARCH and ARCH mapping support
selftests/nolibc: add test support for ppc
selftests/nolibc: add test support for ppc64le
selftests/nolibc: add test support for ppc64
selftests/nolibc: allow customize CROSS_COMPILE by architecture
selftests/nolibc: customize CROSS_COMPILE for 32/64-bit powerpc
tools/include/nolibc/arch-powerpc.h | 213 ++++++++++++++++++++++++
tools/include/nolibc/arch.h | 2 +
tools/testing/selftests/nolibc/Makefile | 74 ++++++--
3 files changed, 277 insertions(+), 12 deletions(-)
create mode 100644 tools/include/nolibc/arch-powerpc.h
--
2.25.1
Hi, Willy
Based on the CROSS_COMPILE customize support [1] from the last ppc
patchset, to further make run-user/run targets happy for all of the
nolibc supported architectures, let's customize CROSS_COMPILE for all of
them.
Beside loongarch, all of the other architectures have local toolchains.
let's use the one from [2] for loongarch, it has a different prefix.
And also, as suggested by you in our previous discuss, let's add some
notes for the toolchains and firmwares instead of automatically download
them.
Now, the test iteration becomes very simple and pretty:
$ ARCHS="i386 x86_64 arm64 arm mips ppc ppc64 ppc64le riscv s390"
$ for arch in ${ARCHS[@]}; do printf "%9s: " $arch; make run-user XARCH=$arch | grep status; done
i386: 165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
x86_64: 165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
arm64: 165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
arm: 165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
mips: 165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
ppc: 165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
ppc64: 165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
ppc64le: 165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
riscv: 165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
s390: 165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
(I have no qemu-user currently for loongarch, so, no test result above)
Best regards,
Zhangjin
---
[1] https://lore.kernel.org/lkml/cover.1691259983.git.falcon@tinylab.org/
[2] https://mirrors.edge.kernel.org/pub/tools/crosstool/
Zhangjin Wu (4):
selftests/nolibc: allow use x86_64 toolchain for i386
selftests/nolibc: customize CROSS_COMPILE for many architectures
selftests/nolibc: customize CROSS_COMPILE for loongarch
selftests/nolibc: add some notes about qemu tools
tools/testing/selftests/nolibc/Makefile | 32 ++++++++++++++++++++++++-
1 file changed, 31 insertions(+), 1 deletion(-)
--
2.25.1
To help the developers to avoid mistakes and keep the code smaller let's
enable compiler warnings.
I stuck with __attribute__((unused)) over __maybe_unused in
nolibc-test.c for consistency with nolibc proper.
If we want to add a define it needs to be added twice once for nolibc
proper and once for nolibc-test otherwise libc-test wouldn't build
anymore.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Changes in v3:
- Make getpagesize() return "int"
- Simplify validation of read() return value
- Don't make functions static that are to be used as breakpoints
- Drop -s from LDFLAGS
- Use proper types for read()/write() return values
- Fix unused parameter warnings in new setvbuf()
- Link to v2: https://lore.kernel.org/r/20230801-nolibc-warnings-v2-0-1ba5ca57bd9b@weisss…
Changes in v2:
- Don't drop unused test helpers, mark them as __attribute__((unused))
- Make some function in nolibc-test static
- Also handle -W and -Wextra
- Link to v1: https://lore.kernel.org/r/20230731-nolibc-warnings-v1-0-74973d2a52d7@weisss…
---
Thomas Weißschuh (14):
tools/nolibc: drop unused variables
tools/nolibc: fix return type of getpagesize()
tools/nolibc: setvbuf: avoid unused parameter warnings
tools/nolibc: sys: avoid implicit sign cast
tools/nolibc: stdint: use int for size_t on 32bit
selftests/nolibc: drop unused variables
selftests/nolibc: mark test helpers as potentially unused
selftests/nolibc: make functions static if possible
selftests/nolibc: avoid unused parameter warnings
selftests/nolibc: avoid sign-compare warnings
selftests/nolibc: use correct return type for read() and write()
selftests/nolibc: prevent out of bounds access in expect_vfprintf
selftests/nolibc: don't strip nolibc-test
selftests/nolibc: enable compiler warnings
tools/include/nolibc/stdint.h | 4 +
tools/include/nolibc/stdio.h | 5 +-
tools/include/nolibc/sys.h | 7 +-
tools/testing/selftests/nolibc/Makefile | 4 +-
tools/testing/selftests/nolibc/nolibc-test.c | 111 ++++++++++++++++-----------
5 files changed, 80 insertions(+), 51 deletions(-)
---
base-commit: bc87f9562af7b2b4cb07dcaceccfafcf05edaff8
change-id: 20230731-nolibc-warnings-c6e47284ac03
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Hi,
This is the v2 to fix cpu buffers unavailable problem after some
operations on file 'tracing_cpumask' and 'snapshot', also upload
its testcase. Changes show as below.
v2:
- Fix compile issue reported-by kernel test robot <lkp(a)intel.com> with
suggestion from Steve:
- Link: https://lore.kernel.org/all/202308050731.PQutr3r0-lkp@intel.com/
- Link: https://lore.kernel.org/all/20230804125107.41d6cdb1@gandalf.local.home/
- Add a step to set tracing_on in testcase (see patch 2) and add
descriptions of each step.
v1:
- Link: https://lore.kernel.org/all/20230804124549.2562977-1-zhengyejian1@huawei.co…
Zheng Yejian (2):
tracing: Fix cpu buffers unavailable due to 'record_disabled' messed
selftests/ftrace: Add a basic testcase for snapshot
kernel/trace/trace.c | 6 ++++
.../ftrace/test.d/00basic/snapshot1.tc | 31 +++++++++++++++++++
2 files changed, 37 insertions(+)
create mode 100644 tools/testing/selftests/ftrace/test.d/00basic/snapshot1.tc
--
2.25.1
Hi, steve,
after some operations on file 'tracing_cpumask' and 'snapshot', trace
ring buffer can no longer record anything. This series contain a patch
to fix it and a constrived testcase to reproduce it.
I think the reproduction testcase is useful to help others to check if
their version has this problem and verify the bugfix. However, currently
in "tools/testing/selftests/ftrace/test.d", there seems no appropriate
subdirectory to put this kind reproductions.
So I now put the testcase in "00basic" because it is basicly simple. Or
would there be a new directory to collect simple reproduction testcases?
Zheng Yejian (2):
tracing: Fix cpu buffers unavailable due to 'record_disabled' messed
selftests/ftrace: Add a basic testcase for snapshot
kernel/trace/trace.c | 2 ++
.../ftrace/test.d/00basic/snapshot1.tc | 17 +++++++++++++++++
2 files changed, 19 insertions(+)
create mode 100644 tools/testing/selftests/ftrace/test.d/00basic/snapshot1.tc
--
2.25.1
Here is a new batch of fixes related to MPTCP for v6.5 and older.
Patches 1 and 2 fix issues with MPTCP Join selftest when manually
launched with '-i' parameter to use 'ip mptcp' tool instead of the
dedicated one (pm_nl_ctl). The issues have been there since v5.18.
Thank you Andrea for your first contributions to MPTCP code in the
upstream kernel!
Patch 3 avoids corrupting the data stream when trying to reset
connections that have fallen back to TCP. This can happen from v6.1.
Patch 4 fixes a race when doing a disconnect() and an accept() in
parallel on a listener socket. The issue only happens in rare cases if
the user is really unlucky since a fix that landed in v6.3 but
backported up to v6.1.
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Andrea Claudi (2):
selftests: mptcp: join: fix 'delete and re-add' test
selftests: mptcp: join: fix 'implicit EP' test
Paolo Abeni (2):
mptcp: avoid bogus reset on fallback close
mptcp: fix disconnect vs accept race
net/mptcp/protocol.c | 2 +-
net/mptcp/protocol.h | 1 -
net/mptcp/subflow.c | 60 ++++++++++++-------------
tools/testing/selftests/net/mptcp/mptcp_join.sh | 6 ++-
4 files changed, 35 insertions(+), 34 deletions(-)
---
base-commit: 0f71c9caf26726efea674646f566984e735cc3b9
change-id: 20230803-upstream-net-20230803-misc-fixes-6-5-6046c6ca74b6
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
Submit the top-level headers also from the kunit test module notifier
initialization callback, so external tools that are parsing dmesg for
kunit test output are able to tell how many test suites should be expected
and whether to continue parsing after complete output from the first test
suite is collected.
Extend kunit module notifier initialization callback with a processing
path for only listing the tests provided by a module if the kunit action
parameter is set to "list", so external tools can obtain a list of test
cases to be executed in advance and can make a better job on assigning
kernel messages interleaved with kunit output to specific tests.
Use test filtering functions in kunit module notifier callback functions,
so external tools are able to execute individual test cases from kunit
test modules in order to still better isolate their potential impact on
kernel messages that appear interleaved with output from other tests.
v4: Use kunit_exec_run_tests() (Mauro, Rae), but prevent it from
emitting the headers when called on load of non-test modules,
- don't use a different list format, use kunit_exec_list_tests() (Rae),
- refresh on top of newly introduced attributes patches, handle newly
introduced kunit.action=list_attr case (Rae).
v3: Fix CONFIG_GLOB, required by filtering functions, not selected when
building as a module.
v2: Fix new name of a structure moved to kunit namespace not updated
across all uses.
Janusz Krzysztofik (3):
kunit: Report the count of test suites in a module
kunit: Make 'list' action available to kunit test modules
kunit: Allow kunit test modules to use test filtering
include/kunit/test.h | 21 ++++++++
lib/kunit/Kconfig | 2 +-
lib/kunit/executor.c | 115 +++++++++++++++++++++++++------------------
lib/kunit/test.c | 40 ++++++++++++++-
4 files changed, 128 insertions(+), 50 deletions(-)
base-commit: 5a175d369c702ce08c9feb630125c9fc7a9e1370
--
2.41.0
Commit 3bcbc20942db ("selftests/rseq: Play nice with binaries statically
linked against glibc 2.35+") which is now in Linus' tree introduced uses
of __weak but did nothing to ensure that a definition is provided for it
resulting in build failures for the rseq tests:
rseq.c:41:1: error: unknown type name '__weak'
__weak ptrdiff_t __rseq_offset;
^
rseq.c:41:17: error: expected ';' after top level declarator
__weak ptrdiff_t __rseq_offset;
^
;
rseq.c:42:1: error: unknown type name '__weak'
__weak unsigned int __rseq_size;
^
rseq.c:43:1: error: unknown type name '__weak'
__weak unsigned int __rseq_flags;
Fix this by using the definition from tools/include compiler.h.
Fixes: 3bcbc20942db ("selftests/rseq: Play nice with binaries statically linked against glibc 2.35+")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
It'd be good if the KVM testing could include builds of the rseq
selftests, the KVM tests pull in code from rseq but not the build system
which has resulted in multiple failures like this.
---
tools/testing/selftests/rseq/Makefile | 4 +++-
tools/testing/selftests/rseq/rseq.c | 2 ++
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index b357ba24af06..7a957c7d459a 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -4,8 +4,10 @@ ifneq ($(shell $(CC) --version 2>&1 | head -n 1 | grep clang),)
CLANG_FLAGS += -no-integrated-as
endif
+top_srcdir = ../../../..
+
CFLAGS += -O2 -Wall -g -I./ $(KHDR_INCLUDES) -L$(OUTPUT) -Wl,-rpath=./ \
- $(CLANG_FLAGS)
+ $(CLANG_FLAGS) -I$(top_srcdir)/tools/include
LDLIBS += -lpthread -ldl
# Own dependencies because we only want to build against 1st prerequisite, but
diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c
index a723da253244..96e812bdf8a4 100644
--- a/tools/testing/selftests/rseq/rseq.c
+++ b/tools/testing/selftests/rseq/rseq.c
@@ -31,6 +31,8 @@
#include <sys/auxv.h>
#include <linux/auxvec.h>
+#include <linux/compiler.h>
+
#include "../kselftest.h"
#include "rseq.h"
---
base-commit: 5d0c230f1de8c7515b6567d9afba1f196fb4e2f4
change-id: 20230804-kselftest-rseq-build-9d537942b1de
Best regards,
--
Mark Brown <broonie(a)kernel.org>
test_kmem_basic creates 100,000 negative dentries, with each one mapping
to a slab object. After memory.high is set, these are reclaimed through
the shrink_slab function call which reclaims all 100,000 entries. The
test passes the majority of the time because when slab1 or current is
calculated, it is often above 0, however, 0 is also an acceptable value.
Signed-off-by: Lucas Karpinski <lkarpins(a)redhat.com>
---
In the previous patch, I missed a change to the variable 'current' even
after some testing as the issue was so sporadic. Current takes the slab
size into account and can also face the same issue where it fails since
the reported value is 0, which is an acceptable value.
Drop: b4abfc19 in mm-unstable
V2: https://lore.kernel.org/all/ix6vzgjqay2x7bskle7pypoint4nj66fwq7odvd5hektatv…
tools/testing/selftests/cgroup/test_kmem.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c
index 1b2cec9d18a4..ed2e50bb1e76 100644
--- a/tools/testing/selftests/cgroup/test_kmem.c
+++ b/tools/testing/selftests/cgroup/test_kmem.c
@@ -75,11 +75,11 @@ static int test_kmem_basic(const char *root)
sleep(1);
slab1 = cg_read_key_long(cg, "memory.stat", "slab ");
- if (slab1 <= 0)
+ if (slab1 < 0)
goto cleanup;
current = cg_read_long(cg, "memory.current");
- if (current <= 0)
+ if (current < 0)
goto cleanup;
if (slab1 < slab0 / 2 && current < slab0 / 2)
--
2.41.0
Hi, Willy
Here is last 3 patches for v6.6 from me.
It includes two generic patches from the tinyconfig part1 series and one
static related patch derived from Thomas' series.
Best regards,
Zhangjin
Zhangjin Wu (3):
selftests/nolibc: allow report with existing test log
selftests/nolibc: fix up O= option support
tools/nolibc: stackprotector.h: make __stack_chk_init static
tools/include/nolibc/crt.h | 2 +-
tools/include/nolibc/stackprotector.h | 5 ++---
tools/testing/selftests/nolibc/Makefile | 11 +++++++++--
3 files changed, 12 insertions(+), 6 deletions(-)
--
2.25.1
Will noticed that with newer toolchains memcpy() ends up being
implemented with SVE instructions, breaking the signals tests when in
streaming mode. We fixed this by using an open coded version of
OPTIMZER_HIDE_VAR(), but in the process it was noticed that some of the
selftests are using the tools/include headers and it might be nice to
share things there. We also have a custom compiler.h in the BTI tests.
Update the tools/include headers to have what we need, pull them into
the arm64 selftests build and make use of them in the signals and BTI
tests. Since the resulting changes are a bit invasive for a fix we keep
an initial patch using the open coding, updating and replacing that
later.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v4:
- Roll in a refactoring to include and use the tools/include headers.
- Link to v3: https://lore.kernel.org/r/20230720-arm64-signal-memcpy-fix-v3-1-08aed2385d6…
Changes in v3:
- Open code OPTIMISER_HIDE_VAR() instead of the memory clobber.
- Link to v2: https://lore.kernel.org/r/20230712-arm64-signal-memcpy-fix-v2-1-494f7025caf…
Changes in v2:
- Rebase onto v6.5-rc1.
- Link to v1: https://lore.kernel.org/r/20230628-arm64-signal-memcpy-fix-v1-1-db3e0300829…
---
Mark Brown (6):
kselftest/arm64: Exit streaming mode after collecting signal context
tools compiler.h: Add OPTIMIZER_HIDE_VAR()
tools include: Add some common function attributes
kselftest/arm64: Make the tools/include headers available
kselftest/arm64: Use shared OPTIMZER_HIDE_VAR() definiton
kselftest/arm64: Use the tools/include compiler.h rather than our own
tools/include/linux/compiler.h | 18 +++++++++++++++
tools/testing/selftests/arm64/Makefile | 2 ++
tools/testing/selftests/arm64/bti/compiler.h | 21 -----------------
tools/testing/selftests/arm64/bti/system.c | 4 +---
tools/testing/selftests/arm64/bti/system.h | 4 ++--
tools/testing/selftests/arm64/bti/test.c | 1 -
.../selftests/arm64/signal/test_signals_utils.h | 27 +++++++++++++++++++++-
7 files changed, 49 insertions(+), 28 deletions(-)
---
base-commit: 6eaae198076080886b9e7d57f4ae06fa782f90ef
change-id: 20230628-arm64-signal-memcpy-fix-7de3b3c8fa10
Best regards,
--
Mark Brown <broonie(a)kernel.org>
[ Upstream commit 4acfe3dfde685a5a9eaec5555351918e2d7266a1 ]
Dan Carpenter spotted a race condition in a couple of situations like
these in the test_firmware driver:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
u8 val;
int ret;
ret = kstrtou8(buf, 10, &val);
if (ret)
return ret;
mutex_lock(&test_fw_mutex);
*(u8 *)cfg = val;
mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
static ssize_t config_num_requests_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
int rc;
mutex_lock(&test_fw_mutex);
if (test_fw_config->reqs) {
pr_err("Must call release_all_firmware prior to changing config\n");
rc = -EINVAL;
mutex_unlock(&test_fw_mutex);
goto out;
}
mutex_unlock(&test_fw_mutex);
// NOTE: HERE is the race!!! Function can be preempted!
// test_fw_config->reqs can change between the release of
// the lock about and acquire of the lock in the
// test_dev_config_update_u8()
rc = test_dev_config_update_u8(buf, count,
&test_fw_config->num_requests);
out:
return rc;
}
static ssize_t config_read_fw_idx_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
return test_dev_config_update_u8(buf, count,
&test_fw_config->read_fw_idx);
}
The function test_dev_config_update_u8() is called from both the locked
and the unlocked context, function config_num_requests_store() and
config_read_fw_idx_store() which can both be called asynchronously as
they are driver's methods, while test_dev_config_update_u8() and siblings
change their argument pointed to by u8 *cfg or similar pointer.
To avoid deadlock on test_fw_mutex, the lock is dropped before calling
test_dev_config_update_u8() and re-acquired within test_dev_config_update_u8()
itself, but alas this creates a race condition.
Having two locks wouldn't assure a race-proof mutual exclusion.
This situation is best avoided by the introduction of a new, unlocked
function __test_dev_config_update_u8() which can be called from the locked
context and reducing test_dev_config_update_u8() to:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
mutex_lock(&test_fw_mutex);
ret = __test_dev_config_update_u8(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
}
doing the locking and calling the unlocked primitive, which enables both
locked and unlocked versions without duplication of code.
Fixes: c92316bf8e948 ("test_firmware: add batched firmware tests")
Cc: Luis R. Rodriguez <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Russ Weight <russell.h.weight(a)intel.com>
Cc: Takashi Iwai <tiwai(a)suse.de>
Cc: Tianfei Zhang <tianfei.zhang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Colin Ian King <colin.i.king(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v5.4, 4.19, 4.14
Suggested-by: Dan Carpenter <error27(a)gmail.com>
Link: https://lore.kernel.org/r/20230509084746.48259-1-mirsad.todorovac@alu.unizg…
Signed-off-by: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
[ This is the patch to fix the racing condition in locking for the 5.4, ]
[ 4.19 and 4.14 stable branches. Not all the fixes from the upstream ]
[ commit apply, but those which do are verbatim equal to those in the ]
[ upstream commit. ]
---
v3:
minor bug fixes in the commit description. no change to the code.
5.4, 4.19 and 4.14 passed build, 5.4 and 4.19 passed kselftest.
unable to boot 4.14, should work (no changes to lib/test_firmware.c).
v2:
bundled locking and ENOSPC patches together.
tested on 5.4 and 4.19 stable.
lib/test_firmware.c | 37 ++++++++++++++++++++++++++++---------
1 file changed, 28 insertions(+), 9 deletions(-)
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index 38553944e967..92d7195d5b5b 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -301,16 +301,26 @@ static ssize_t config_test_show_str(char *dst,
return len;
}
-static int test_dev_config_update_bool(const char *buf, size_t size,
- bool *cfg)
+static inline int __test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
{
int ret;
- mutex_lock(&test_fw_mutex);
if (strtobool(buf, cfg) < 0)
ret = -EINVAL;
else
ret = size;
+
+ return ret;
+}
+
+static int test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_bool(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
@@ -340,7 +350,7 @@ static ssize_t test_dev_config_show_int(char *buf, int cfg)
return snprintf(buf, PAGE_SIZE, "%d\n", val);
}
-static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+static inline int __test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
long new;
@@ -352,14 +362,23 @@ static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
if (new > U8_MAX)
return -EINVAL;
- mutex_lock(&test_fw_mutex);
*(u8 *)cfg = new;
- mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
+static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_u8(buf, size, cfg);
+ mutex_unlock(&test_fw_mutex);
+
+ return ret;
+}
+
static ssize_t test_dev_config_show_u8(char *buf, u8 cfg)
{
u8 val;
@@ -392,10 +411,10 @@ static ssize_t config_num_requests_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_u8(buf, count,
- &test_fw_config->num_requests);
+ rc = __test_dev_config_update_u8(buf, count,
+ &test_fw_config->num_requests);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
--
2.34.1
[ Upstream commit 4acfe3dfde685a5a9eaec5555351918e2d7266a1 ]
Dan Carpenter spotted a race condition in a couple of situations like
these in the test_firmware driver:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
u8 val;
int ret;
ret = kstrtou8(buf, 10, &val);
if (ret)
return ret;
mutex_lock(&test_fw_mutex);
*(u8 *)cfg = val;
mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
static ssize_t config_num_requests_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
int rc;
mutex_lock(&test_fw_mutex);
if (test_fw_config->reqs) {
pr_err("Must call release_all_firmware prior to changing config\n");
rc = -EINVAL;
mutex_unlock(&test_fw_mutex);
goto out;
}
mutex_unlock(&test_fw_mutex);
// NOTE: HERE is the race!!! Function can be preempted!
// test_fw_config->reqs can change between the release of
// the lock about and acquire of the lock in the
// test_dev_config_update_u8()
rc = test_dev_config_update_u8(buf, count,
&test_fw_config->num_requests);
out:
return rc;
}
static ssize_t config_read_fw_idx_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
return test_dev_config_update_u8(buf, count,
&test_fw_config->read_fw_idx);
}
The function test_dev_config_update_u8() is called from both the locked
and the unlocked context, function config_num_requests_store() and
config_read_fw_idx_store() which can both be called asynchronously as
they are driver's methods, while test_dev_config_update_u8() and siblings
change their argument pointed to by u8 *cfg or similar pointer.
To avoid deadlock on test_fw_mutex, the lock is dropped before calling
test_dev_config_update_u8() and re-acquired within test_dev_config_update_u8()
itself, but alas this creates a race condition.
Having two locks wouldn't assure a race-proof mutual exclusion.
This situation is best avoided by the introduction of a new, unlocked
function __test_dev_config_update_u8() which can be called from the locked
context and reducing test_dev_config_update_u8() to:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
mutex_lock(&test_fw_mutex);
ret = __test_dev_config_update_u8(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
}
doing the locking and calling the unlocked primitive, which enables both
locked and unlocked versions without duplication of code.
Fixes: c92316bf8e948 ("test_firmware: add batched firmware tests")
Cc: Luis R. Rodriguez <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Russ Weight <russell.h.weight(a)intel.com>
Cc: Takashi Iwai <tiwai(a)suse.de>
Cc: Tianfei Zhang <tianfei.zhang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Colin Ian King <colin.i.king(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v5.4, 4.19, 4.14
Suggested-by: Dan Carpenter <error27(a)gmail.com>
Link: https://lore.kernel.org/r/20230509084746.48259-1-mirsad.todorovac@alu.unizg…
Signed-off-by: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
[ This is the patch to fix the racing condition in locking for the 5.4, ]
[ 4.19 and 4.14 stable branches. Not all the fixes from the upstream ]
[ commit apply, but those which do are verbatim equal to those in the ]
[ upstream commit. ]
---
v4:
minor versioning clarifications for the patchwork. no changes to the commit.
v3:
fixed a minor typo. no change to commit.
v2:
tested on 5.4 stable build.
lib/test_firmware.c | 37 ++++++++++++++++++++++++++++---------
1 file changed, 28 insertions(+), 9 deletions(-)
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index 38553944e967..92d7195d5b5b 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -301,16 +301,26 @@ static ssize_t config_test_show_str(char *dst,
return len;
}
-static int test_dev_config_update_bool(const char *buf, size_t size,
- bool *cfg)
+static inline int __test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
{
int ret;
- mutex_lock(&test_fw_mutex);
if (strtobool(buf, cfg) < 0)
ret = -EINVAL;
else
ret = size;
+
+ return ret;
+}
+
+static int test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_bool(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
@@ -340,7 +350,7 @@ static ssize_t test_dev_config_show_int(char *buf, int cfg)
return snprintf(buf, PAGE_SIZE, "%d\n", val);
}
-static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+static inline int __test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
long new;
@@ -352,14 +362,23 @@ static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
if (new > U8_MAX)
return -EINVAL;
- mutex_lock(&test_fw_mutex);
*(u8 *)cfg = new;
- mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
+static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_u8(buf, size, cfg);
+ mutex_unlock(&test_fw_mutex);
+
+ return ret;
+}
+
static ssize_t test_dev_config_show_u8(char *buf, u8 cfg)
{
u8 val;
@@ -392,10 +411,10 @@ static ssize_t config_num_requests_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_u8(buf, count,
- &test_fw_config->num_requests);
+ rc = __test_dev_config_update_u8(buf, count,
+ &test_fw_config->num_requests);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
--
2.34.1
[ commit be37bed754ed90b2655382f93f9724b3c1aae847 upstream ]
Dan Carpenter spotted that test_fw_config->reqs will be leaked if
trigger_batched_requests_store() is called two or more times.
The same appears with trigger_batched_requests_async_store().
This bug wasn't triggered by the tests, but observed by Dan's visual
inspection of the code.
The recommended workaround was to return -EBUSY if test_fw_config->reqs
is already allocated.
Fixes: c92316bf8e94 ("test_firmware: add batched firmware tests")
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Russ Weight <russell.h.weight(a)intel.com>
Cc: Tianfei Zhang <tianfei.zhang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Colin Ian King <colin.i.king(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v4.19
Suggested-by: Dan Carpenter <error27(a)gmail.com>
Suggested-by: Takashi Iwai <tiwai(a)suse.de>
Link: https://lore.kernel.org/r/20230509084746.48259-2-mirsad.todorovac@alu.unizg…
Signed-off-by: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
[ This is a backport to v4.19 stable branch without a change in code from the 5.4+ patch ]
---
v2:
no changes to commit. minor clarifications with versioning for the patchwork.
v1:
patch sumbmitted verbatim from the 5.4+ branch to 4.19
lib/test_firmware.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index f4cc874021da..e4688821eab8 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -618,6 +618,11 @@ static ssize_t trigger_batched_requests_store(struct device *dev,
mutex_lock(&test_fw_mutex);
+ if (test_fw_config->reqs) {
+ rc = -EBUSY;
+ goto out_bail;
+ }
+
test_fw_config->reqs =
vzalloc(array3_size(sizeof(struct test_batched_req),
test_fw_config->num_requests, 2));
@@ -721,6 +726,11 @@ ssize_t trigger_batched_requests_async_store(struct device *dev,
mutex_lock(&test_fw_mutex);
+ if (test_fw_config->reqs) {
+ rc = -EBUSY;
+ goto out_bail;
+ }
+
test_fw_config->reqs =
vzalloc(array3_size(sizeof(struct test_batched_req),
test_fw_config->num_requests, 2));
--
2.34.1
The openvswitch selftests currently contain a few cases for managing the
datapath, which includes creating datapath instances, adding interfaces,
and doing some basic feature / upcall tests. This is useful to validate
the control path.
Add the ability to program some of the more common flows with actions. This
can be improved overtime to include regression testing, etc.
v2->v3:
1. Dropped support for ipv6 in nat() case
2. Fixed a spelling mistake in 2/5 commit message.
v1->v2:
1. Fix issue when parsing ipv6 in the NAT action
2. Fix issue calculating length during ctact parsing
3. Fix error message when invalid bridge is passed
4. Fold in Adrian's patch to support key masks
Aaron Conole (4):
selftests: openvswitch: add an initial flow programming case
selftests: openvswitch: add a test for ipv4 forwarding
selftests: openvswitch: add basic ct test case parsing
selftests: openvswitch: add ct-nat test case with ipv4
Adrian Moreno (1):
selftests: openvswitch: support key masks
.../selftests/net/openvswitch/openvswitch.sh | 223 +++++++
.../selftests/net/openvswitch/ovs-dpctl.py | 588 +++++++++++++++++-
2 files changed, 787 insertions(+), 24 deletions(-)
--
2.40.1
Submit the top-level headers also from the kunit test module notifier
initialization callback, so external tools that are parsing dmesg for
kunit test output are able to tell how many test suites should be expected
and whether to continue parsing after complete output from the first test
suite is collected.
Extend kunit module notifier initialization callback with a processing
path for only listing the tests provided by a module if the kunit action
parameter is set to "list", so external tools can obtain a list of test
cases to be executed in advance and can make a better job on assigning
kernel messages interleaved with kunit output to specific tests.
Use test filtering functions in kunit module notifier callback functions,
so external tools are able to execute individual test cases from kunit
test modules in order to still better isolate their potential impact on
kernel messages that appear interleaved with output from other tests.
v3: Fix CONFIG_GLOB, required by filtering fuctions, not selected when
building as a module.
v2: Fix new name of a structure moved to kunit namespace not updated
across all uses.
Janusz Krzysztofik (3):
kunit: Report the count of test suites in a module
kunit: Make 'list' action available to kunit test modules
kunit: Allow kunit test modules to use test filtering
include/kunit/test.h | 14 +++++++++++
lib/kunit/Kconfig | 2 +-
lib/kunit/executor.c | 57 +++++++++++++++++++++++++-------------------
lib/kunit/test.c | 57 +++++++++++++++++++++++++++++++++++++++++++-
4 files changed, 104 insertions(+), 26 deletions(-)
--
2.41.0
NOTE: This patch is tested against 5.4 stable
NOTE: This is a patch for the 5.4 stable branch, not for the torvalds tree.
The torvalds tree, and stable tree 5.10, 5.15, 6.1 and 6.4 branches
were fixed in the separate
commit ID 4acfe3dfde68 ("test_firmware: prevent race conditions by a correct implementation of locking")
which was incompatible with 5.4
Dan Carpenter spotted a race condition in a couple of situations like
these in the test_firmware driver:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
u8 val;
int ret;
ret = kstrtou8(buf, 10, &val);
if (ret)
return ret;
mutex_lock(&test_fw_mutex);
*(u8 *)cfg = val;
mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
static ssize_t config_num_requests_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
int rc;
mutex_lock(&test_fw_mutex);
if (test_fw_config->reqs) {
pr_err("Must call release_all_firmware prior to changing config\n");
rc = -EINVAL;
mutex_unlock(&test_fw_mutex);
goto out;
}
mutex_unlock(&test_fw_mutex);
// NOTE: HERE is the race!!! Function can be preempted!
// test_fw_config->reqs can change between the release of
// the lock about and acquire of the lock in the
// test_dev_config_update_u8()
rc = test_dev_config_update_u8(buf, count,
&test_fw_config->num_requests);
out:
return rc;
}
static ssize_t config_read_fw_idx_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
return test_dev_config_update_u8(buf, count,
&test_fw_config->read_fw_idx);
}
The function test_dev_config_update_u8() is called from both the locked
and the unlocked context, function config_num_requests_store() and
config_read_fw_idx_store() which can both be called asynchronously as
they are driver's methods, while test_dev_config_update_u8() and siblings
change their argument pointed to by u8 *cfg or similar pointer.
To avoid deadlock on test_fw_mutex, the lock is dropped before calling
test_dev_config_update_u8() and re-acquired within test_dev_config_update_u8()
itself, but alas this creates a race condition.
Having two locks wouldn't assure a race-proof mutual exclusion.
This situation is best avoided by the introduction of a new, unlocked
function __test_dev_config_update_u8() which can be called from the locked
context and reducing test_dev_config_update_u8() to:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
mutex_lock(&test_fw_mutex);
ret = __test_dev_config_update_u8(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
}
doing the locking and calling the unlocked primitive, which enables both
locked and unlocked versions without duplication of code.
Fixes: c92316bf8e948 ("test_firmware: add batched firmware tests")
Cc: Luis R. Rodriguez <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Russ Weight <russell.h.weight(a)intel.com>
Cc: Takashi Iwai <tiwai(a)suse.de>
Cc: Tianfei Zhang <tianfei.zhang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Colin Ian King <colin.i.king(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v5.4
Suggested-by: Dan Carpenter <error27(a)gmail.com>
Link: https://lore.kernel.org/r/20230509084746.48259-1-mirsad.todorovac@alu.unizg…
Signed-off-by: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
---
lib/test_firmware.c | 37 ++++++++++++++++++++++++++++---------
1 file changed, 28 insertions(+), 9 deletions(-)
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index 38553944e967..92d7195d5b5b 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -301,16 +301,26 @@ static ssize_t config_test_show_str(char *dst,
return len;
}
-static int test_dev_config_update_bool(const char *buf, size_t size,
- bool *cfg)
+static inline int __test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
{
int ret;
- mutex_lock(&test_fw_mutex);
if (strtobool(buf, cfg) < 0)
ret = -EINVAL;
else
ret = size;
+
+ return ret;
+}
+
+static int test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_bool(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
@@ -340,7 +350,7 @@ static ssize_t test_dev_config_show_int(char *buf, int cfg)
return snprintf(buf, PAGE_SIZE, "%d\n", val);
}
-static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+static inline int __test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
long new;
@@ -352,14 +362,23 @@ static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
if (new > U8_MAX)
return -EINVAL;
- mutex_lock(&test_fw_mutex);
*(u8 *)cfg = new;
- mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
+static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_u8(buf, size, cfg);
+ mutex_unlock(&test_fw_mutex);
+
+ return ret;
+}
+
static ssize_t test_dev_config_show_u8(char *buf, u8 cfg)
{
u8 val;
@@ -392,10 +411,10 @@ static ssize_t config_num_requests_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_u8(buf, count,
- &test_fw_config->num_requests);
+ rc = __test_dev_config_update_u8(buf, count,
+ &test_fw_config->num_requests);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
--
2.34.1
test_kmem_basic creates 100,000 negative dentries, with each one mapping
to a slab object. After memory.high is set, these are reclaimed through
the shrink_slab function call which reclaims all 100,000 entries. The
test passes the majority of the time because when slab1 is calculated,
it is often above 0, however, 0 is also an acceptable value.
Signed-off-by: Lucas Karpinski <lkarpins(a)redhat.com>
---
v3: rebased on mm-unstable
tools/testing/selftests/cgroup/test_kmem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c
index 1b2cec9d18a4..67cc0182058d 100644
--- a/tools/testing/selftests/cgroup/test_kmem.c
+++ b/tools/testing/selftests/cgroup/test_kmem.c
@@ -75,7 +75,7 @@ static int test_kmem_basic(const char *root)
sleep(1);
slab1 = cg_read_key_long(cg, "memory.stat", "slab ");
- if (slab1 <= 0)
+ if (slab1 < 0)
goto cleanup;
current = cg_read_long(cg, "memory.current");
--
2.41.0
As is described in the "How to use MPTCP?" section in MPTCP wiki [1]:
"Your app should create sockets with IPPROTO_MPTCP as the proto:
( socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP); ). Legacy apps can be
forced to create and use MPTCP sockets instead of TCP ones via the
mptcpize command bundled with the mptcpd daemon."
But the mptcpize (LD_PRELOAD technique) command has some limitations
[2]:
- it doesn't work if the application is not using libc (e.g. GoLang
apps)
- in some envs, it might not be easy to set env vars / change the way
apps are launched, e.g. on Android
- mptcpize needs to be launched with all apps that want MPTCP: we could
have more control from BPF to enable MPTCP only for some apps or all the
ones of a netns or a cgroup, etc.
- it is not in BPF, we cannot talk about it at netdev conf.
So this patchset attempts to use BPF to implement functions similer to
mptcpize.
The main idea is to add a hook in sys_socket() to change the protocol id
from IPPROTO_TCP (or 0) to IPPROTO_MPTCP.
[1]
https://github.com/multipath-tcp/mptcp_net-next/wiki
[2]
https://github.com/multipath-tcp/mptcp_net-next/issues/79
v10:
- drop "#ifdef CONFIG_BPF_JIT".
- include vmlinux.h and bpf_tracing_net.h to avoid defining some
macros.
- drop unneeded checks for mptcp.
v9:
- update comment for 'update_socket_protocol'.
v8:
- drop the additional checks on the 'protocol' value after the
'update_socket_protocol()' call.
v7:
- add __weak and __diag_* for update_socket_protocol.
v6:
- add update_socket_protocol.
v5:
- add bpf_mptcpify helper.
v4:
- use lsm_cgroup/socket_create
v3:
- patch 8: char cmd[128]; -> char cmd[256];
v2:
- Fix build selftests errors reported by CI
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/79
Geliang Tang (5):
bpf: Add update_socket_protocol hook
selftests/bpf: Use random netns name for mptcp
selftests/bpf: Add two mptcp netns helpers
selftests/bpf: Drop unneeded checks for mptcp
selftests/bpf: Add mptcpify test
net/mptcp/bpf.c | 15 ++
net/socket.c | 24 ++++
.../testing/selftests/bpf/prog_tests/mptcp.c | 129 +++++++++++++++---
tools/testing/selftests/bpf/progs/mptcpify.c | 20 +++
4 files changed, 169 insertions(+), 19 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/mptcpify.c
--
2.35.3
As is described in the "How to use MPTCP?" section in MPTCP wiki [1]:
"Your app should create sockets with IPPROTO_MPTCP as the proto:
( socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP); ). Legacy apps can be
forced to create and use MPTCP sockets instead of TCP ones via the
mptcpize command bundled with the mptcpd daemon."
But the mptcpize (LD_PRELOAD technique) command has some limitations
[2]:
- it doesn't work if the application is not using libc (e.g. GoLang
apps)
- in some envs, it might not be easy to set env vars / change the way
apps are launched, e.g. on Android
- mptcpize needs to be launched with all apps that want MPTCP: we could
have more control from BPF to enable MPTCP only for some apps or all the
ones of a netns or a cgroup, etc.
- it is not in BPF, we cannot talk about it at netdev conf.
So this patchset attempts to use BPF to implement functions similer to
mptcpize.
The main idea is to add a hook in sys_socket() to change the protocol id
from IPPROTO_TCP (or 0) to IPPROTO_MPTCP.
[1]
https://github.com/multipath-tcp/mptcp_net-next/wiki
[2]
https://github.com/multipath-tcp/mptcp_net-next/issues/79
v9:
- update comment for 'update_socket_protocol'.
v8:
- drop the additional checks on the 'protocol' value after the
'update_socket_protocol()' call.
v7:
- add __weak and __diag_* for update_socket_protocol.
v6:
- add update_socket_protocol.
v5:
- add bpf_mptcpify helper.
v4:
- use lsm_cgroup/socket_create
v3:
- patch 8: char cmd[128]; -> char cmd[256];
v2:
- Fix build selftests errors reported by CI
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/79
Geliang Tang (4):
bpf: Add update_socket_protocol hook
selftests/bpf: Use random netns name for mptcp
selftests/bpf: Add two mptcp netns helpers
selftests/bpf: Add mptcpify test
net/mptcp/bpf.c | 17 +++
net/socket.c | 24 ++++
.../testing/selftests/bpf/prog_tests/mptcp.c | 125 ++++++++++++++++--
tools/testing/selftests/bpf/progs/mptcpify.c | 25 ++++
4 files changed, 182 insertions(+), 9 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/mptcpify.c
--
2.35.3
This test fails routinely in our prod testing environment, and I can
reproduce it locally as well.
The test allocates dcache inside a cgroup, then drops the memory limit
and checks that usage drops correspondingly. The reason it fails is
because dentries are freed with an RCU delay - a debugging sleep shows
that usage drops as expected shortly after.
Insert a 1s sleep after dropping the limit. This should be good
enough, assuming that machines running those tests are otherwise not
very busy.
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
---
tools/testing/selftests/cgroup/test_kmem.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c
index 258ddc565deb..1b2cec9d18a4 100644
--- a/tools/testing/selftests/cgroup/test_kmem.c
+++ b/tools/testing/selftests/cgroup/test_kmem.c
@@ -70,6 +70,10 @@ static int test_kmem_basic(const char *root)
goto cleanup;
cg_write(cg, "memory.high", "1M");
+
+ /* wait for RCU freeing */
+ sleep(1);
+
slab1 = cg_read_key_long(cg, "memory.stat", "slab ");
if (slab1 <= 0)
goto cleanup;
--
2.41.0
The riscv selftests (which were modeled after the arm64 selftests) are
improperly declaring the "emit_tests" target to depend upon the "all"
target. This approach, when combined with commit 9fc96c7c19df
("selftests: error out if kernel header files are not yet built"), has
caused build failures [1] on arm64, and is likely to cause similar
failures for riscv.
To fix this, simply remove the unnecessary "all" dependency from the
emit_tests target. The dependency is still effectively honored, because
again, invocation is via "install", which also depends upon "all".
An alternative approach would be to harden the emit_tests target so that
it can depend upon "all", but that's a lot more complicated and hard to
get right, and doesn't seem worth it, especially given that emit_tests
should probably not be overridden at all.
[1] https://lore.kernel.org/20230710-kselftest-fix-arm64-v1-1-48e872844f25@kern…
Fixes: 9fc96c7c19df ("selftests: error out if kernel header files are not yet built")
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
---
Andrew,
With this, and with my arm64 fix [2] that you've already put into
mm-unstable, you should be able to safely drop commit 819187ab8741
("selftests: fix arm64 test installation").
[2] https://lore.kernel.org/20230711005629.2547838-1-jhubbard@nvidia.com
thanks,
John Hubbard
tools/testing/selftests/riscv/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/riscv/Makefile b/tools/testing/selftests/riscv/Makefile
index 9dd629cc86aa..f4b3d5c9af5b 100644
--- a/tools/testing/selftests/riscv/Makefile
+++ b/tools/testing/selftests/riscv/Makefile
@@ -43,7 +43,7 @@ run_tests: all
done
# Avoid any output on non riscv on emit_tests
-emit_tests: all
+emit_tests:
@for DIR in $(RISCV_SUBTARGETS); do \
BUILD_TARGET=$(OUTPUT)/$$DIR; \
$(MAKE) OUTPUT=$$BUILD_TARGET -C $$DIR $@; \
base-commit: 3f01e9fed8454dcd89727016c3e5b2fbb8f8e50c
prerequisite-patch-id: 37c92f7425689ff069fb83996a25cd98e78d7242
--
2.41.0
Nested translation is a hardware feature that is supported by many modern
IOMMU hardwares. It has two stages (stage-1, stage-2) address translation
to get access to the physical address. stage-1 translation table is owned
by userspace (e.g. by a guest OS), while stage-2 is owned by kernel. Changes
to stage-1 translation table should be followed by an IOTLB invalidation.
Take Intel VT-d as an example, the stage-1 translation table is I/O page
table. As the below diagram shows, guest I/O page table pointer in GPA
(guest physical address) is passed to host and be used to perform the stage-1
address translation. Along with it, modifications to present mappings in the
guest I/O page table should be followed with an IOTLB invalidation.
.-------------. .---------------------------.
| vIOMMU | | Guest I/O page table |
| | '---------------------------'
.----------------/
| PASID Entry |--- PASID cache flush --+
'-------------' |
| | V
| | I/O page table pointer in GPA
'-------------'
Guest
------| Shadow |---------------------------|--------
v v v
Host
.-------------. .------------------------.
| pIOMMU | | FS for GIOVA->GPA |
| | '------------------------'
.----------------/ |
| PASID Entry | V (Nested xlate)
'----------------\.----------------------------------.
| | | SS for GPA->HPA, unmanaged domain|
| | '----------------------------------'
'-------------'
Where:
- FS = First stage page tables
- SS = Second stage page tables
<Intel VT-d Nested translation>
In IOMMUFD, all the translation tables are tracked by hw_pagetable (hwpt)
and each has an iommu_domain allocated from iommu driver. So in this series
hw_pagetable and iommu_domain means the same thing if no special note.
IOMMUFD has already supported allocating hw_pagetable that is linked with
an IOAS. However, nesting requires IOMMUFD to allow allocating hw_pagetable
with driver specific parameters and interface to sync stage-1 IOTLB as user
owns the stage-1 translation table.
This series is based on the iommu hw info reporting series [1]. It first
introduces new iommu op for allocating domains with user data and the op
for invalidate stage-1 IOTLB, and then extend the IOMMUFD internal infrastructure
to accept user_data and parent hwpt, then relay the data to iommu core to
allocate user iommu_domain. After it, extends the ioctl IOMMU_HWPT_ALLOC to
accept user data and stage-2 hwpt ID to allocate hwpt. Along with it, ioctl
IOMMU_HWPT_INVALIDATE is added to invalidate stage-1 IOTLB. This is needed
for user-managed hwpts. Selftest is added as well to cover the new ioctls.
Complete code can be found in [2], QEMU could can be found in [3].
At last, this is a team work together with Nicolin Chen, Lu Baolu. Thanks
them for the help. ^_^. Look forward to your feedbacks.
[1] https://lore.kernel.org/linux-iommu/20230724105936.107042-1-yi.l.liu@intel.…
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/wip/iommufd_rfcv4_nesting
Change log:
v3:
- Add new uAPI things in alphabetical order
- Pass in "enum iommu_hwpt_type hwpt_type" to op->domain_alloc_user for
sanity, replacing the previous op->domain_alloc_user_data_len solution
- Return ERR_PTR from domain_alloc_user instead of NULL
- Only add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation (Kevin)
- Add IOMMU_RESV_IOVA_RANGES to report resv iova ranges to userspace hence
userspace is able to exclude the ranges in the stage-1 HWPT (e.g. guest I/O
page table). (Kevin)
- Add selftest coverage for the new IOMMU_RESV_IOVA_RANGES ioctl
- Minor changes per Kevin's inputs
v2: https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.c…
- Add union iommu_domain_user_data to include all user data structures to avoid
passing void * in kernel APIs.
- Add iommu op to return user data length for user domain allocation
- Rename struct iommu_hwpt_alloc::data_type to be hwpt_type
- Store the invalidation data length in iommu_domain_ops::cache_invalidate_user_data_len
- Convert cache_invalidate_user op to be int instead of void
- Remove @data_type in struct iommu_hwpt_invalidate
- Remove out_hwpt_type_bitmap in struct iommu_hw_info hence drop patch 08 of v1
v1: https://lore.kernel.org/linux-iommu/20230309080910.607396-1-yi.l.liu@intel.…
Thanks,
Yi Liu
Lu Baolu (2):
iommu: Add new iommu op to create domains owned by userspace
iommu: Add nested domain support
Nicolin Chen (6):
iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
iommufd: Only enforce IOMMU_RESV_SW_MSI when attaching user-managed
HWPT
iommufd/selftest: Add domain_alloc_user() support in iommu mock
iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user data
iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
Yi Liu (9):
iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
iommufd: Pass in hwpt_type/parent/user_data to
iommufd_hw_pagetable_alloc()
iommufd: Add IOMMU_RESV_IOVA_RANGES
iommufd: IOMMU_HWPT_ALLOC allocation with user data
iommufd: Add IOMMU_HWPT_INVALIDATE
iommufd/selftest: Add a helper to get test device
iommufd/selftest: Add IOMMU_TEST_OP_DEV_[ADD|DEL]_RESERVED to add/del
reserved regions to selftest device
iommufd/selftest: Add .get_resv_regions() for mock_dev
iommufd/selftest: Add coverage for IOMMU_RESV_IOVA_RANGES
drivers/iommu/iommufd/device.c | 9 +-
drivers/iommu/iommufd/hw_pagetable.c | 181 +++++++++++-
drivers/iommu/iommufd/io_pagetable.c | 5 +-
drivers/iommu/iommufd/iommufd_private.h | 20 +-
drivers/iommu/iommufd/iommufd_test.h | 36 +++
drivers/iommu/iommufd/main.c | 59 +++-
drivers/iommu/iommufd/selftest.c | 266 ++++++++++++++++--
include/linux/iommu.h | 34 +++
include/uapi/linux/iommufd.h | 96 ++++++-
tools/testing/selftests/iommu/iommufd.c | 224 ++++++++++++++-
tools/testing/selftests/iommu/iommufd_utils.h | 70 +++++
11 files changed, 958 insertions(+), 42 deletions(-)
--
2.34.1
test_kmem_basic creates 100,000 negative dentries, with each one mapping
to a slab object. After memory.high is set, these are reclaimed through
the shrink_slab function call which reclaims all 100,000 entries. The
test passes the majority of the time because when slab1 is calculated,
it is often above 0, however, 0 is also an acceptable value.
Signed-off-by: Lucas Karpinski <lkarpins(a)redhat.com>
---
https://lore.kernel.org/all/m6jbt5hzq27ygt3l4xyiaxxb7i5auvb2lahbcj4yaxxigqz…
V2: Corrected title
tools/testing/selftests/cgroup/test_kmem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c
index 258ddc565deb..ba0a0bfc5a98 100644
--- a/tools/testing/selftests/cgroup/test_kmem.c
+++ b/tools/testing/selftests/cgroup/test_kmem.c
@@ -71,7 +71,7 @@ static int test_kmem_basic(const char *root)
cg_write(cg, "memory.high", "1M");
slab1 = cg_read_key_long(cg, "memory.stat", "slab ");
- if (slab1 <= 0)
+ if (slab1 < 0)
goto cleanup;
current = cg_read_long(cg, "memory.current");
--
2.41.0
The following error happens:
In file included from vstate_exec_nolibc.c:2:
/usr/include/riscv64-linux-gnu/sys/prctl.h:42:12: error: conflicting types for ‘prctl’; h
ave ‘int(int, ...)’
42 | extern int prctl (int __option, ...) __THROW;
| ^~~~~
In file included from ./../../../../include/nolibc/nolibc.h:99,
from <command-line>:
./../../../../include/nolibc/sys.h:892:5: note: previous definition of ‘prctl’ with type
‘int(int, long unsigned int, long unsigned int, long unsigned int, long unsigned int)
’
892 | int prctl(int option, unsigned long arg2, unsigned long arg3,
| ^~~~~
Fix this by not including <sys/prctl.h>, which is not needed here since
prctl syscall is directly called using its number.
Fixes: 7cf6198ce22d ("selftests: Test RISC-V Vector prctl interface")
Signed-off-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
---
tools/testing/selftests/riscv/vector/vstate_exec_nolibc.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/tools/testing/selftests/riscv/vector/vstate_exec_nolibc.c b/tools/testing/selftests/riscv/vector/vstate_exec_nolibc.c
index 5cbc392944a6..2c0d2b1126c1 100644
--- a/tools/testing/selftests/riscv/vector/vstate_exec_nolibc.c
+++ b/tools/testing/selftests/riscv/vector/vstate_exec_nolibc.c
@@ -1,6 +1,4 @@
// SPDX-License-Identifier: GPL-2.0-only
-#include <sys/prctl.h>
-
#define THIS_PROGRAM "./vstate_exec_nolibc"
int main(int argc, char **argv)
--
2.39.2
[ Upstream commit 4acfe3dfde685a5a9eaec5555351918e2d7266a1 ]
Dan Carpenter spotted a race condition in a couple of situations like
these in the test_firmware driver:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
u8 val;
int ret;
ret = kstrtou8(buf, 10, &val);
if (ret)
return ret;
mutex_lock(&test_fw_mutex);
*(u8 *)cfg = val;
mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
static ssize_t config_num_requests_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
int rc;
mutex_lock(&test_fw_mutex);
if (test_fw_config->reqs) {
pr_err("Must call release_all_firmware prior to changing config\n");
rc = -EINVAL;
mutex_unlock(&test_fw_mutex);
goto out;
}
mutex_unlock(&test_fw_mutex);
// NOTE: HERE is the race!!! Function can be preempted!
// test_fw_config->reqs can change between the release of
// the lock about and acquire of the lock in the
// test_dev_config_update_u8()
rc = test_dev_config_update_u8(buf, count,
&test_fw_config->num_requests);
out:
return rc;
}
static ssize_t config_read_fw_idx_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
return test_dev_config_update_u8(buf, count,
&test_fw_config->read_fw_idx);
}
The function test_dev_config_update_u8() is called from both the locked
and the unlocked context, function config_num_requests_store() and
config_read_fw_idx_store() which can both be called asynchronously as
they are driver's methods, while test_dev_config_update_u8() and siblings
change their argument pointed to by u8 *cfg or similar pointer.
To avoid deadlock on test_fw_mutex, the lock is dropped before calling
test_dev_config_update_u8() and re-acquired within test_dev_config_update_u8()
itself, but alas this creates a race condition.
Having two locks wouldn't assure a race-proof mutual exclusion.
This situation is best avoided by the introduction of a new, unlocked
function __test_dev_config_update_u8() which can be called from the locked
context and reducing test_dev_config_update_u8() to:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
mutex_lock(&test_fw_mutex);
ret = __test_dev_config_update_u8(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
}
doing the locking and calling the unlocked primitive, which enables both
locked and unlocked versions without duplication of code.
Fixes: c92316bf8e948 ("test_firmware: add batched firmware tests")
Cc: Luis R. Rodriguez <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Russ Weight <russell.h.weight(a)intel.com>
Cc: Takashi Iwai <tiwai(a)suse.de>
Cc: Tianfei Zhang <tianfei.zhang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Colin Ian King <colin.i.king(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v5.4, 4.19
Suggested-by: Dan Carpenter <error27(a)gmail.com>
Link: https://lore.kernel.org/r/20230509084746.48259-1-mirsad.todorovac@alu.unizg…
Signed-off-by: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
[ This is the patch to fix the racing condition in locking for the 5.4, ]
[ 4.19 and 4.4 stable branches. Not all the fixes from the upstream ]
[ commit apply, but those which do are verbatim equal to those in the ]
[ upstream commit. ]
---
v2:
bundled locking and ENOSPC patches together.
tested on 5.4 and 4.19 stable.
lib/test_firmware.c | 37 ++++++++++++++++++++++++++++---------
1 file changed, 28 insertions(+), 9 deletions(-)
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index 38553944e967..92d7195d5b5b 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -301,16 +301,26 @@ static ssize_t config_test_show_str(char *dst,
return len;
}
-static int test_dev_config_update_bool(const char *buf, size_t size,
- bool *cfg)
+static inline int __test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
{
int ret;
- mutex_lock(&test_fw_mutex);
if (strtobool(buf, cfg) < 0)
ret = -EINVAL;
else
ret = size;
+
+ return ret;
+}
+
+static int test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_bool(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
@@ -340,7 +350,7 @@ static ssize_t test_dev_config_show_int(char *buf, int cfg)
return snprintf(buf, PAGE_SIZE, "%d\n", val);
}
-static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+static inline int __test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
long new;
@@ -352,14 +362,23 @@ static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
if (new > U8_MAX)
return -EINVAL;
- mutex_lock(&test_fw_mutex);
*(u8 *)cfg = new;
- mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
+static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_u8(buf, size, cfg);
+ mutex_unlock(&test_fw_mutex);
+
+ return ret;
+}
+
static ssize_t test_dev_config_show_u8(char *buf, u8 cfg)
{
u8 val;
@@ -392,10 +411,10 @@ static ssize_t config_num_requests_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_u8(buf, count,
- &test_fw_config->num_requests);
+ rc = __test_dev_config_update_u8(buf, count,
+ &test_fw_config->num_requests);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
--
2.39.3
It seems that the most critical issue with vm.memfd_noexec=2 (the fact
that passing MFD_EXEC would bypass it entirely[1]) has been fixed in
Andrew's tree[2], but there are still some outstanding issues that need
to be addressed:
* The dmesg warnings are pr_warn_once, which on most systems means that
they will be used up by systemd or some other boot process and
userspace developers will never see it. The original patch posted to
the ML used pr_warn_ratelimited but the merged patch had it changed
(with a comment about it being "per review"), but given that the
current warnings are useless, pr_warn_ratelimited makes far more
sense.
* vm.memfd_noexec=2 shouldn't reject old-style memfd_create(2) syscalls
because it will make it far to difficult to ever migrate. Instead it
should imply MFD_EXEC.
* The racheting mechanism for vm.memfd_noexec doesn't make sense as a
security mechanism because a CAP_SYS_ADMIN capable user can create
executable binaries in a hidden tmpfs very easily, not to mention the
many other things they can do.
* The memfd selftests would not exit with a non-zero error code when
certain tests that ran in a forked process (specifically the ones
related to MFD_EXEC and MFD_NOEXEC_SEAL) failed.
(This patchset is based on top of Jeff Xu's patches[2] fixing the
MFD_EXEC bug in vm.memfd_noexec=2.)
[1]: https://lore.kernel.org/all/ZJwcsU0vI-nzgOB_@codewreck.org/
[2]: https://lore.kernel.org/all/20230705063315.3680666-1-jeffxu@google.com/
Aleksa Sarai (3):
memfd: cleanups for vm.memfd_noexec handling
memfd: remove racheting feature from vm.memfd_noexec
selftests: memfd: error out test process when child test fails
include/linux/pid_namespace.h | 16 +++------
kernel/pid_sysctl.h | 7 ----
mm/memfd.c | 32 +++++++----------
tools/testing/selftests/memfd/memfd_test.c | 41 ++++++++++++++++++----
4 files changed, 51 insertions(+), 45 deletions(-)
--
2.41.0
As is described in the "How to use MPTCP?" section in MPTCP wiki [1]:
"Your app should create sockets with IPPROTO_MPTCP as the proto:
( socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP); ). Legacy apps can be
forced to create and use MPTCP sockets instead of TCP ones via the
mptcpize command bundled with the mptcpd daemon."
But the mptcpize (LD_PRELOAD technique) command has some limitations
[2]:
- it doesn't work if the application is not using libc (e.g. GoLang
apps)
- in some envs, it might not be easy to set env vars / change the way
apps are launched, e.g. on Android
- mptcpize needs to be launched with all apps that want MPTCP: we could
have more control from BPF to enable MPTCP only for some apps or all the
ones of a netns or a cgroup, etc.
- it is not in BPF, we cannot talk about it at netdev conf.
So this patchset attempts to use BPF to implement functions similer to
mptcpize.
The main idea is to add a hook in sys_socket() to change the protocol id
from IPPROTO_TCP (or 0) to IPPROTO_MPTCP.
[1]
https://github.com/multipath-tcp/mptcp_net-next/wiki
[2]
https://github.com/multipath-tcp/mptcp_net-next/issues/79
v8:
- drop the additional checks on the 'protocol' value after the
'update_socket_protocol()' call.
v7:
- add __weak and __diag_* for update_socket_protocol.
v6:
- add update_socket_protocol.
v5:
- add bpf_mptcpify helper.
v4:
- use lsm_cgroup/socket_create
v3:
- patch 8: char cmd[128]; -> char cmd[256];
v2:
- Fix build selftests errors reported by CI
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/79
Geliang Tang (4):
bpf: Add update_socket_protocol hook
selftests/bpf: Use random netns name for mptcp
selftests/bpf: Add two mptcp netns helpers
selftests/bpf: Add mptcpify test
net/mptcp/bpf.c | 17 +++
net/socket.c | 25 ++++
.../testing/selftests/bpf/prog_tests/mptcp.c | 125 ++++++++++++++++--
tools/testing/selftests/bpf/progs/mptcpify.c | 25 ++++
4 files changed, 183 insertions(+), 9 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/mptcpify.c
--
2.35.3
Hi, Willy
Here is the v5, purely include the ppc parts, with two critical fixups
for the latest gcc 13.1.0 toolchain, now, both run and run-user pass.
Here is the run-user test report:
// with local toolchains
$ for arch in ppc ppc64 ppc64le; do make run-user XARCH=$arch | grep "status: "; done
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
// with latest toolchains
$ for arch in ppc ppc64 ppc64le; do make run-user XARCH=$arch CC=/path/to/gcc-13.1.0-nolibc/powerpc64-linux/bin/powerpc64-linux-gcc | grep status; file nolibc-test; done
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
nolibc-test: ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV), statically linked, stripped
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
nolibc-test: ELF 64-bit MSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, stripped
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
nolibc-test: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, stripped
Since the missing serial console enabling patch [1] for ppc32 has
already gotten a Reviewed-by line from the ppc maintainer, now, the ppc
defconfig aligns with the others', and it is able to simply move the
nolibc-test-config related stuff to the next tinyconfig series.
Based on v4 [2], beside removing several nolibc-test-config related
patches, two bugs with the latest gcc 13.1.0 have been fixed.
Changes from v4 --> v5:
* tools/nolibc: add support for powerpc64
selftests/nolibc: add XARCH and ARCH mapping support
selftests/nolibc: add test support for ppc64
selftests/nolibc: allow customize CROSS_COMPILE by architecture
selftests/nolibc: customize CROSS_COMPILE for 32/64-bit powerpc
Almost the same as v4.
* tools/nolibc: add support for powerpc
For 32-bit PowerPC, with newer gcc compilers (e.g. gcc 13.1.0),
"omit-frame-pointer" fails with __attribute__((no_stack_protector)) but
works with __attribute__((__optimize__("-fno-stack-protector")))
Using the later for ppc32 to workaround the issue.
* selftests/nolibc: add test support for ppc
Add default CFLAGS for ppc to allow build with the
latest powerpc64-linux-gcc toolchain from
https://mirrors.edge.kernel.org/pub/tools/crosstool/
* selftests/nolibc: add test support for ppc64le
Align with kernel, prefer elfv2 ABI to elfv1 ABI when the toolchain
support, otherwise, ABI mismatched binary will not run.
Best regards,
Zhangjin Wu
---
[1]: https://lore.kernel.org/lkml/bb7b5f9958b3e3a20f6573ff7ce7c5dc566e7e32.16909…
[2]: https://lore.kernel.org/lkml/cover.1690916314.git.falcon@tinylab.org/
Zhangjin Wu (8):
tools/nolibc: add support for powerpc
tools/nolibc: add support for powerpc64
selftests/nolibc: add XARCH and ARCH mapping support
selftests/nolibc: add test support for ppc
selftests/nolibc: add test support for ppc64le
selftests/nolibc: add test support for ppc64
selftests/nolibc: allow customize CROSS_COMPILE by architecture
selftests/nolibc: customize CROSS_COMPILE for 32/64-bit powerpc
tools/include/nolibc/arch-powerpc.h | 213 ++++++++++++++++++++++++
tools/include/nolibc/arch.h | 2 +
tools/testing/selftests/nolibc/Makefile | 74 ++++++--
3 files changed, 277 insertions(+), 12 deletions(-)
create mode 100644 tools/include/nolibc/arch-powerpc.h
--
2.25.1
Hi, Willy, Hi Thomas
v4 here is mainly with a new nolibc-test-config target from your
suggestions and with the reordering of some patches to make
nolibc-test-config be fast forward.
run-user tests for all of the powerpc variants:
$ for arch in ppc ppc64 ppc64le; do make run-user XARCH=$arch | grep status; done
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
and defconfig + run for ppc:
$ make nolibc-test-config XARCH=ppc
$ make run XARCH=ppc
165 test(s): 159 passed, 6 skipped, 0 failed => status: warning
* tools/nolibc: add support for powerpc
tools/nolibc: add support for powerpc64
No change.
* selftests/nolibc: fix up O= option support
selftests/nolibc: add macros to reduce duplicated changes
From tinyconfig-part1 patchset, required by our nolibc-test-config target
Let nolibc-test-config be able to use objtree and the kernel related
macros directly.
* selftests/nolibc: add XARCH and ARCH mapping support
Moved before nolibc-test-config, for the NOLIBC_TEST_CONFIG macro used by
nolibc-test-config target
Willy talked about this twice, let nolibc-test-config be able to use
nolibc-test-$(XARCH).config listed in NOLIBC_TEST_CONFIG directly.
* selftests/nolibc: add nolibc-test-config target
selftests/nolibc: add help for nolibc-test-config target
A new generic nolibc-test-config target is added, allows to enable
additional options for a top-level config target.
defconfig is reserved as an alias of nolibc-test-config.
As suggested by Thomas and Willy.
* selftests/nolibc: add test support for ppc
selftests/nolibc: add test support for ppc64le
selftests/nolibc: add test support for ppc64
Renamed from $(XARCH).config to nolibc-test-$(XARCH).config
As suggested by Willy.
* selftests/nolibc: allow customize CROSS_COMPILE by architecture
selftests/nolibc: customize CROSS_COMPILE for 32/64-bit powerpc
Moved here as suggested by Willy.
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/lkml/cover.1690468707.git.falcon@tinylab.org/
Zhangjin Wu (12):
tools/nolibc: add support for powerpc
tools/nolibc: add support for powerpc64
selftests/nolibc: fix up O= option support
selftests/nolibc: add macros to reduce duplicated changes
selftests/nolibc: add XARCH and ARCH mapping support
selftests/nolibc: add nolibc-test-config target
selftests/nolibc: add help for nolibc-test-config target
selftests/nolibc: add test support for ppc
selftests/nolibc: add test support for ppc64le
selftests/nolibc: add test support for ppc64
selftests/nolibc: allow customize CROSS_COMPILE by architecture
selftests/nolibc: customize CROSS_COMPILE for 32/64-bit powerpc
tools/include/nolibc/arch-powerpc.h | 202 ++++++++++++++++++
tools/include/nolibc/arch.h | 2 +
tools/testing/selftests/nolibc/Makefile | 157 ++++++++++----
.../nolibc/configs/nolibc-test-ppc.config | 3 +
4 files changed, 327 insertions(+), 37 deletions(-)
create mode 100644 tools/include/nolibc/arch-powerpc.h
create mode 100644 tools/testing/selftests/nolibc/configs/nolibc-test-ppc.config
--
2.25.1
As is described in the "How to use MPTCP?" section in MPTCP wiki [1]:
"Your app should create sockets with IPPROTO_MPTCP as the proto:
( socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP); ). Legacy apps can be
forced to create and use MPTCP sockets instead of TCP ones via the
mptcpize command bundled with the mptcpd daemon."
But the mptcpize (LD_PRELOAD technique) command has some limitations
[2]:
- it doesn't work if the application is not using libc (e.g. GoLang
apps)
- in some envs, it might not be easy to set env vars / change the way
apps are launched, e.g. on Android
- mptcpize needs to be launched with all apps that want MPTCP: we could
have more control from BPF to enable MPTCP only for some apps or all the
ones of a netns or a cgroup, etc.
- it is not in BPF, we cannot talk about it at netdev conf.
So this patchset attempts to use BPF to implement functions similer to
mptcpize.
The main idea is to add a hook in sys_socket() to change the protocol id
from IPPROTO_TCP (or 0) to IPPROTO_MPTCP.
[1]
https://github.com/multipath-tcp/mptcp_net-next/wiki
[2]
https://github.com/multipath-tcp/mptcp_net-next/issues/79
v7:
- add __weak and __diag_* for update_socket_protocol.
v6:
- add update_socket_protocol.
v5:
- add bpf_mptcpify helper.
v4:
- use lsm_cgroup/socket_create
v3:
- patch 8: char cmd[128]; -> char cmd[256];
v2:
- Fix build selftests errors reported by CI
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/79
Geliang Tang (6):
net: socket: add update_socket_protocol hook
bpf: Register mptcp modret set
selftests/bpf: Add mptcpify program
selftests/bpf: use random netns name for mptcp
selftests/bpf: add two mptcp netns helpers
selftests/bpf: Add mptcpify selftest
net/mptcp/bpf.c | 17 +++
net/socket.c | 26 ++++
.../testing/selftests/bpf/prog_tests/mptcp.c | 125 ++++++++++++++++--
tools/testing/selftests/bpf/progs/mptcpify.c | 25 ++++
4 files changed, 184 insertions(+), 9 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/mptcpify.c
--
2.35.3
Here's a follow-up from my RFC series last year:
https://lore.kernel.org/lkml/20221004093131.40392-1-thuth@redhat.com/T/
Basic idea of this series is now to use the kselftest_harness.h
framework to get TAP output in the tests, so that it is easier
for the user to see what is going on, and e.g. to be able to
detect whether a certain test is part of the test binary or not
(which is useful when tests get extended in the course of time).
Thomas Huth (4):
KVM: selftests: Rename the ASSERT_EQ macro
KVM: selftests: x86: Use TAP interface in the sync_regs test
KVM: selftests: x86: Use TAP interface in the fix_hypercall test
KVM: selftests: x86: Use TAP interface in the userspace_msr_exit test
.../selftests/kvm/aarch64/aarch32_id_regs.c | 8 +-
.../selftests/kvm/aarch64/page_fault_test.c | 10 +-
.../testing/selftests/kvm/include/test_util.h | 4 +-
tools/testing/selftests/kvm/lib/kvm_util.c | 2 +-
.../selftests/kvm/max_guest_memory_test.c | 2 +-
tools/testing/selftests/kvm/s390x/cmma_test.c | 62 +++++-----
tools/testing/selftests/kvm/s390x/memop.c | 6 +-
tools/testing/selftests/kvm/s390x/tprot.c | 4 +-
.../x86_64/dirty_log_page_splitting_test.c | 18 +--
.../x86_64/exit_on_emulation_failure_test.c | 2 +-
.../selftests/kvm/x86_64/fix_hypercall_test.c | 16 ++-
.../kvm/x86_64/nested_exceptions_test.c | 12 +-
.../kvm/x86_64/recalc_apic_map_test.c | 6 +-
.../selftests/kvm/x86_64/sync_regs_test.c | 113 +++++++++++++++---
.../selftests/kvm/x86_64/tsc_msrs_test.c | 32 ++---
.../kvm/x86_64/userspace_msr_exit_test.c | 19 +--
.../vmx_exception_with_invalid_guest_state.c | 2 +-
.../selftests/kvm/x86_64/vmx_pmu_caps_test.c | 3 +-
.../selftests/kvm/x86_64/xapic_state_test.c | 8 +-
.../selftests/kvm/x86_64/xen_vmcall_test.c | 20 ++--
20 files changed, 218 insertions(+), 131 deletions(-)
--
2.39.3
To help the developers to avoid mistakes and keep the code smaller let's
enable compiler warnings.
I stuck with __attribute__((unused)) over __maybe_unused in
nolibc-test.c for consistency with nolibc proper.
If we want to add a define it needs to be added twice once for nolibc
proper and once for nolibc-test otherwise libc-test wouldn't build
anymore.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Changes in v2:
- Don't drop unused test helpers, mark them as __attribute__((unused))
- Make some function in nolibc-test static
- Also handle -W and -Wextra
- Link to v1: https://lore.kernel.org/r/20230731-nolibc-warnings-v1-0-74973d2a52d7@weisss…
---
Thomas Weißschuh (10):
tools/nolibc: drop unused variables
tools/nolibc: sys: avoid implicit sign cast
tools/nolibc: stdint: use int for size_t on 32bit
selftests/nolibc: drop unused variables
selftests/nolibc: mark test helpers as potentially unused
selftests/nolibc: make functions static if possible
selftests/nolibc: avoid unused arguments warnings
selftests/nolibc: avoid sign-compare warnings
selftests/nolibc: test return value of read() in test_vfprintf
selftests/nolibc: enable compiler warnings
tools/include/nolibc/stdint.h | 4 +
tools/include/nolibc/sys.h | 3 +-
tools/testing/selftests/nolibc/Makefile | 2 +-
tools/testing/selftests/nolibc/nolibc-test.c | 108 +++++++++++++++++----------
4 files changed, 74 insertions(+), 43 deletions(-)
---
base-commit: dfef4fc45d5713eb23d87f0863aff9c33bd4bfaf
change-id: 20230731-nolibc-warnings-c6e47284ac03
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Changes from RFC[1]
- Rebase on latest mm-unstable
- Add base-commit
----
There are use cases that need to apply DAMOS schemes to specific address
ranges or DAMON monitoring targets. NUMA nodes in the physical address
space, special memory objects in the virtual address space, and
monitoring target specific efficient monitoring results snapshot
retrieval could be examples of such use cases. This patchset extends
DAMOS filters feature for such cases, by implementing two more filter
types, namely address ranges and DAMON monitoring types.
Patches sequence
----------------
The first seven patches are for the address ranges based DAMOS filter.
The first patch implements the filter feature and expose it via DAMON
kernel API. The second patch further expose the feature to users via
DAMON sysfs interface. The third and fourth patches implement unit
tests and selftests for the feature. Three patches (fifth to seventh)
updating the documents follow.
The following six patches are for the DAMON monitoring target based
DAMOS filter. The eighth patch implements the feature in the core layer
and expose it via DAMON's kernel API. The ninth patch further expose it
to users via DAMON sysfs interface. Tenth patch add a selftest, and two
patches (eleventh and twelfth) update documents.
[1] https://lore.kernel.org/damon/20230728203444.70703-1-sj@kernel.org/
SeongJae Park (13):
mm/damon/core: introduce address range type damos filter
mm/damon/sysfs-schemes: support address range type DAMOS filter
mm/damon/core-test: add a unit test for __damos_filter_out()
selftests/damon/sysfs: test address range damos filter
Docs/mm/damon/design: update for address range filters
Docs/ABI/damon: update for address range DAMOS filter
Docs/admin-guide/mm/damon/usage: update for address range type DAMOS
filter
mm/damon/core: implement target type damos filter
mm/damon/sysfs-schemes: support target damos filter
selftests/damon/sysfs: test damon_target filter
Docs/mm/damon/design: update for DAMON monitoring target type DAMOS
filter
Docs/ABI/damon: update for DAMON monitoring target type DAMOS filter
Docs/admin-guide/mm/damon/usage: update for DAMON monitoring target
type DAMOS filter
.../ABI/testing/sysfs-kernel-mm-damon | 27 +++++-
Documentation/admin-guide/mm/damon/usage.rst | 34 +++++---
Documentation/mm/damon/design.rst | 24 ++++--
include/linux/damon.h | 28 +++++--
mm/damon/core-test.h | 61 ++++++++++++++
mm/damon/core.c | 62 ++++++++++++++
mm/damon/sysfs-schemes.c | 83 +++++++++++++++++++
tools/testing/selftests/damon/sysfs.sh | 5 ++
8 files changed, 299 insertions(+), 25 deletions(-)
base-commit: 32f9db36a0031f99629b5910d795b3f13f284472
--
2.25.1
Changes from RFC[1]
- Rebase on latest mm-unstable
- Add base-commit
----
The tried_regions directory of DAMON sysfs interface is useful for
retrieving monitoring results snapshot or DAMOS debugging. However, for
common use case that need to monitor only the total size of the scheme
tried regions (e.g., monitoring working set size), the kernel overhead
for directory construction and user overhead for reading the content
could be high if the number of monitoring region is not small. This
patchset implements DAMON sysfs files for efficient support of the use
case.
The first patch implements the sysfs file to reduce the user space
overhead, and the second patch implements a command for reducing the
kernel space overhead.
The third patch adds a selftest for the new file, and following two
patches update documents.
[1] https://lore.kernel.org/damon/20230728201817.70602-1-sj@kernel.org/
SeongJae Park (5):
mm/damon/sysfs-schemes: implement DAMOS tried total bytes file
mm/damon/sysfs: implement a command for updating only schemes tried
total bytes
selftests/damon/sysfs: test tried_regions/total_bytes file
Docs/ABI/damon: update for tried_regions/total_bytes
Docs/admin-guide/mm/damon/usage: update for tried_regions/total_bytes
.../ABI/testing/sysfs-kernel-mm-damon | 13 +++++-
Documentation/admin-guide/mm/damon/usage.rst | 42 ++++++++++++-------
mm/damon/sysfs-common.h | 2 +-
mm/damon/sysfs-schemes.c | 24 ++++++++++-
mm/damon/sysfs.c | 26 +++++++++---
tools/testing/selftests/damon/sysfs.sh | 1 +
6 files changed, 83 insertions(+), 25 deletions(-)
base-commit: a57d8094e1946e9dbdba0dddf0e10f9f4dceae0d
--
2.25.1
With test case kvm_page_table_test, start time is acquired with
time type CLOCK_MONOTONIC_RAW, however end time in function timespec_elapsed
is acquired with time type CLOCK_MONOTONIC. This will cause
inaccurate elapsed time calculation on some platform such as LoongArch.
This patch modified test case kvm_page_table_test, and uses unified
time type CLOCK_MONOTONIC for start time.
Signed-off-by: Bibo Mao <maobibo(a)loongson.cn>
---
tools/testing/selftests/kvm/kvm_page_table_test.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/kvm_page_table_test.c b/tools/testing/selftests/kvm/kvm_page_table_test.c
index b3b00be1ef82..69f26d80c821 100644
--- a/tools/testing/selftests/kvm/kvm_page_table_test.c
+++ b/tools/testing/selftests/kvm/kvm_page_table_test.c
@@ -200,7 +200,7 @@ static void *vcpu_worker(void *data)
if (READ_ONCE(host_quit))
return NULL;
- clock_gettime(CLOCK_MONOTONIC_RAW, &start);
+ clock_gettime(CLOCK_MONOTONIC, &start);
ret = _vcpu_run(vcpu);
ts_diff = timespec_elapsed(start);
@@ -367,7 +367,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
/* Test the stage of KVM creating mappings */
*current_stage = KVM_CREATE_MAPPINGS;
- clock_gettime(CLOCK_MONOTONIC_RAW, &start);
+ clock_gettime(CLOCK_MONOTONIC, &start);
vcpus_complete_new_stage(*current_stage);
ts_diff = timespec_elapsed(start);
@@ -380,7 +380,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
*current_stage = KVM_UPDATE_MAPPINGS;
- clock_gettime(CLOCK_MONOTONIC_RAW, &start);
+ clock_gettime(CLOCK_MONOTONIC, &start);
vcpus_complete_new_stage(*current_stage);
ts_diff = timespec_elapsed(start);
@@ -392,7 +392,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
*current_stage = KVM_ADJUST_MAPPINGS;
- clock_gettime(CLOCK_MONOTONIC_RAW, &start);
+ clock_gettime(CLOCK_MONOTONIC, &start);
vcpus_complete_new_stage(*current_stage);
ts_diff = timespec_elapsed(start);
--
2.27.0
test_kmem_basic creates 100,000 negative dentries, with each one mapping
to a slab object. After memory.high is set, these are reclaimed through
the shrink_slab function call which reclaims all 100,000 entries. The test
passes the majority of the time because when slab1 is calculated, it is
often above 0, however, 0 is also an acceptable value.
Signed-off-by: Lucas Karpinski <lkarpins(a)redhat.com>
---
tools/testing/selftests/cgroup/test_kmem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c
index 258ddc565deb..ba0a0bfc5a98 100644
--- a/tools/testing/selftests/cgroup/test_kmem.c
+++ b/tools/testing/selftests/cgroup/test_kmem.c
@@ -71,7 +71,7 @@ static int test_kmem_basic(const char *root)
cg_write(cg, "memory.high", "1M");
slab1 = cg_read_key_long(cg, "memory.stat", "slab ");
- if (slab1 <= 0)
+ if (slab1 < 0)
goto cleanup;
current = cg_read_long(cg, "memory.current");
--
2.41.0
This is agains mm/mm-unstable, but everything except patch #7 and #8
should apply on current master. Especially patch #1 and #2 should go
upstream first, so we can let the other stuff mature a bit longer.
Next attempt to handle the fallout of 474098edac26
("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()") where I
accidentially missed that follow_page() and smaps implicitly kept the
FOLL_NUMA flag clear by not setting it if FOLL_FORCE is absent, to
not trigger faults on PROT_NONE-mapped PTEs.
Patch #1 fixes the known issues by reintroducing FOLL_NUMA as
FOLL_HONOR_NUMA_FAULT and decoupling it from FOLL_FORCE.
Patch #2 is a cleanup that I think actually fixes some corner cases, so
I added a Fixes: tag.
Patch #3 makes KVM explicitly set FOLL_HONOR_NUMA_FAULT in the single
case where it is required, and documents the situation.
Patch #4 then stops implicitly setting FOLL_HONOR_NUMA_FAULT. But note that
for FOLL_WRITE we always implicitly honor NUMA hinting faults.
Patch #5 and patch #6 cleanup some comments.
Patch #7 improves the KVM functional tests such that patch #8 can
actually check for one of the known issues: KSM no longer working on
PROT_NONE mappings on x86-64 with CONFIG_NUMA_BALANCING.
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: liubo <liubo254(a)huawei.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
David Hildenbrand (8):
mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
smaps: use vm_normal_page_pmd() instead of follow_trans_huge_pmd()
kvm: explicitly set FOLL_HONOR_NUMA_FAULT in hva_to_pfn_slow()
mm/gup: don't implicitly set FOLL_HONOR_NUMA_FAULT
pgtable: improve pte_protnone() comment
mm/huge_memory: remove stale NUMA hinting comment from
follow_trans_huge_pmd()
selftest/mm: ksm_functional_tests: test in mmap_and_merge_range() if
anything got merged
selftest/mm: ksm_functional_tests: Add PROT_NONE test
fs/proc/task_mmu.c | 3 +-
include/linux/mm.h | 21 +++-
include/linux/mm_types.h | 9 ++
include/linux/pgtable.h | 16 ++-
mm/gup.c | 23 +++-
mm/huge_memory.c | 3 +-
.../selftests/mm/ksm_functional_tests.c | 106 ++++++++++++++++--
virt/kvm/kvm_main.c | 13 ++-
8 files changed, 164 insertions(+), 30 deletions(-)
--
2.41.0
Add trap and cleanup for SIGTERM sent by timeout and SIGINT from
keyboard, for the test times out and leaves incoherent network stack.
Fixes: 511e8db54036c ("selftests: forwarding: Add test for custom multipath hash")
Cc: Ido Schimmel <idosch(a)nvidia.com>
Cc: netdev(a)vger.kernel.org
---
tools/testing/selftests/net/forwarding/custom_multipath_hash.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/forwarding/custom_multipath_hash.sh b/tools/testing/selftests/net/forwarding/custom_multipath_hash.sh
index 56eb83d1a3bd..c7ab883d2515 100755
--- a/tools/testing/selftests/net/forwarding/custom_multipath_hash.sh
+++ b/tools/testing/selftests/net/forwarding/custom_multipath_hash.sh
@@ -363,7 +363,7 @@ custom_hash()
custom_hash_v6
}
-trap cleanup EXIT
+trap cleanup INT TERM EXIT
setup_prepare
setup_wait
--
2.34.1
On Tue, 1 Aug 2023 at 15:11, Greg Kroah-Hartman
<gregkh(a)linuxfoundation.org> wrote:
>
> This is the start of the stable review cycle for the 6.4.8 release.
> There are 239 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu, 03 Aug 2023 09:18:38 +0000.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.8-rc1.…
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
Following kselftest build regression found,
selftests/rseq: Play nice with binaries statically linked against
glibc 2.35+
commit 3bcbc20942db5d738221cca31a928efc09827069 upstream.
To allow running rseq and KVM's rseq selftests as statically linked
binaries, initialize the various "trampoline" pointers to point directly
at the expect glibc symbols, and skip the dlysm() lookups if the rseq
size is non-zero, i.e. the binary is statically linked *and* the libc
registered its own rseq.
Define weak versions of the symbols so as not to break linking against
libc versions that don't support rseq in any capacity.
The KVM selftests in particular are often statically linked so that they
can be run on targets with very limited runtime environments, i.e. test
machines.
Fixes: 233e667e1ae3 ("selftests/rseq: Uplift rseq selftests for
compatibility with glibc-2.35")
Cc: Aaron Lewis <aaronlewis(a)google.com>
Cc: kvm(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20230721223352.2333911-1-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Build log:
----
x86_64-linux-gnu-gcc -O2 -Wall -g -I./ -isystem
/home/tuxbuild/.cache/tuxmake/builds/1/build/usr/include
-L/home/tuxbuild/.cache/tuxmake/builds/1/build/kselftest/rseq
-Wl,-rpath=./ -shared -fPIC rseq.c -lpthread -ldl -o
/home/tuxbuild/.cache/tuxmake/builds/1/build/kselftest/rseq/librseq.so
rseq.c:41:1: error: unknown type name '__weak'
41 | __weak ptrdiff_t __rseq_offset;
| ^~~~~~
rseq.c:41:18: error: expected '=', ',', ';', 'asm' or '__attribute__'
before '__rseq_offset'
41 | __weak ptrdiff_t __rseq_offset;
| ^~~~~~~~~~~~~
rseq.c:42:7: error: expected ';' before 'unsigned'
42 | __weak unsigned int __rseq_size;
| ^~~~~~~~~
| ;
rseq.c:43:7: error: expected ';' before 'unsigned'
43 | __weak unsigned int __rseq_flags;
| ^~~~~~~~~
| ;
rseq.c:45:47: error: '__rseq_offset' undeclared here (not in a
function); did you mean 'rseq_offset'?
45 | static const ptrdiff_t *libc_rseq_offset_p = &__rseq_offset;
| ^~~~~~~~~~~~~
| rseq_offset
make[3]: Leaving directory 'tools/testing/selftests/rseq'
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
Links:
- https://storage.tuxsuite.com/public/linaro/lkft/builds/2TNSVjRCfcIaJWQNkPwD…
- https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.4.y/build/v6.4.7…
--
Linaro LKFT
https://lkft.linaro.org
The lkdtm selftest config fragment enables CONFIG_UBSAN_TRAP to make the
ARRAY_BOUNDS test kill the calling process when an out-of-bound access
is detected by UBSAN. However, after this [1] commit, UBSAN is triggered
under many new scenarios that weren't detected before, such as in struct
definitions with fixed-size trailing arrays used as flexible arrays. As
a result, CONFIG_UBSAN_TRAP=y has become a very aggressive option to
enable except for specific situations.
`make kselftest-merge` applies CONFIG_UBSAN_TRAP=y to the kernel config
for all selftests, which makes many of them fail because of system hangs
during boot.
This change removes the config option from the lkdtm kselftest and also
the ARRAY_BOUNDS test to skip it rather than have it failing. If
out-of-bound array accesses need to be checked, there's
CONFIG_TEST_UBSAN for that.
[1] commit 2d47c6956ab3 ("ubsan: Tighten UBSAN_BOUNDS on GCC")'
Signed-off-by: Ricardo Cañuelo <ricardo.canuelo(a)collabora.com>
---
tools/testing/selftests/lkdtm/config | 1 -
tools/testing/selftests/lkdtm/tests.txt | 2 +-
2 files changed, 1 insertion(+), 2 deletions(-)
diff --git a/tools/testing/selftests/lkdtm/config b/tools/testing/selftests/lkdtm/config
index 5d52f64dfb43..7afe05e8c4d7 100644
--- a/tools/testing/selftests/lkdtm/config
+++ b/tools/testing/selftests/lkdtm/config
@@ -9,7 +9,6 @@ CONFIG_INIT_ON_FREE_DEFAULT_ON=y
CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
CONFIG_UBSAN=y
CONFIG_UBSAN_BOUNDS=y
-CONFIG_UBSAN_TRAP=y
CONFIG_STACKPROTECTOR_STRONG=y
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB_DEBUG_ON=y
diff --git a/tools/testing/selftests/lkdtm/tests.txt b/tools/testing/selftests/lkdtm/tests.txt
index 607b8d7e3ea3..6a49f2abbda8 100644
--- a/tools/testing/selftests/lkdtm/tests.txt
+++ b/tools/testing/selftests/lkdtm/tests.txt
@@ -7,7 +7,7 @@ EXCEPTION
#EXHAUST_STACK Corrupts memory on failure
#CORRUPT_STACK Crashes entire system on success
#CORRUPT_STACK_STRONG Crashes entire system on success
-ARRAY_BOUNDS
+#ARRAY_BOUNDS Needs CONFIG_UBSAN_TRAP=y, might cause unrelated system hangs
CORRUPT_LIST_ADD list_add corruption
CORRUPT_LIST_DEL list_del corruption
STACK_GUARD_PAGE_LEADING
--
2.25.1
[ commit be37bed754ed90b2655382f93f9724b3c1aae847 upstream ]
Dan Carpenter spotted that test_fw_config->reqs will be leaked if
trigger_batched_requests_store() is called two or more times.
The same appears with trigger_batched_requests_async_store().
This bug wasn't triggered by the tests, but observed by Dan's visual
inspection of the code.
The recommended workaround was to return -EBUSY if test_fw_config->reqs
is already allocated.
Fixes: c92316bf8e94 ("test_firmware: add batched firmware tests")
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Russ Weight <russell.h.weight(a)intel.com>
Cc: Tianfei Zhang <tianfei.zhang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Colin Ian King <colin.i.king(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v4.14
Suggested-by: Dan Carpenter <error27(a)gmail.com>
Suggested-by: Takashi Iwai <tiwai(a)suse.de>
Link: https://lore.kernel.org/r/20230509084746.48259-2-mirsad.todorovac@alu.unizg…
Signed-off-by: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
[ This fix is applied against the 4.14 stable branch. There are no changes to the ]
[ fix in code when compared to the upstread, only the reformatting for backport. ]
Signed-off-by: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
---
v1 -> v2:
removed the Reviewed-by: and Acked-by tags, as this is a slightly different patch and
those need to be reacquired
lib/test_firmware.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index 1c5e5246bf10..5318c5e18acf 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -621,6 +621,11 @@ static ssize_t trigger_batched_requests_store(struct device *dev,
mutex_lock(&test_fw_mutex);
+ if (test_fw_config->reqs) {
+ rc = -EBUSY;
+ goto out_bail;
+ }
+
test_fw_config->reqs = vzalloc(sizeof(struct test_batched_req) *
test_fw_config->num_requests * 2);
if (!test_fw_config->reqs) {
@@ -723,6 +728,11 @@ ssize_t trigger_batched_requests_async_store(struct device *dev,
mutex_lock(&test_fw_mutex);
+ if (test_fw_config->reqs) {
+ rc = -EBUSY;
+ goto out_bail;
+ }
+
test_fw_config->reqs = vzalloc(sizeof(struct test_batched_req) *
test_fw_config->num_requests * 2);
if (!test_fw_config->reqs) {
--
2.34.1
There are macros in kernel.h that can be used outside of that header.
Split them to args.h and replace open coded variants.
Test compiled with `make allmodconfig` for x86_64.
Test cross-compiled with `make multi_v7_defconfig` for arm.
(Note that positive diff statistics is due to documentation being
updated.)
In v4:
- fixed compilation error on arm (LKP, Stephen)
In v3:
- split to a series of patches
- fixed build issue on `make allmodconfig` for x86_64 (Andrew)
In v2:
- converted existing users at the same time (Andrew, Rasmus)
- documented how it does work (Andrew, Rasmus)
Andy Shevchenko (4):
kernel.h: Split out COUNT_ARGS() and CONCATENATE() to args.h
x86/asm: Replace custom COUNT_ARGS() & CONCATENATE() implementations
arm64: smccc: Replace custom COUNT_ARGS() & CONCATENATE()
implementations
genetlink: Replace custom CONCATENATE() implementation
arch/x86/include/asm/rmwcc.h | 11 ++---
include/kunit/test.h | 1 +
include/linux/args.h | 28 +++++++++++++
include/linux/arm-smccc.h | 69 ++++++++++++++-----------------
include/linux/genl_magic_func.h | 27 ++++++------
include/linux/genl_magic_struct.h | 8 ++--
include/linux/kernel.h | 7 ----
include/linux/pci.h | 2 +-
include/trace/bpf_probe.h | 2 +
9 files changed, 84 insertions(+), 71 deletions(-)
create mode 100644 include/linux/args.h
--
2.40.0.1.gaa8946217a0b
lwt xmit hook does not expect positive return values in function
ip_finish_output2 and ip6_finish_output2. However, BPF redirect programs
can return positive values such like NET_XMIT_DROP, NET_RX_DROP, and etc
as errors. Such return values can panic the kernel unexpectedly:
https://gist.github.com/zhaiyan920/8fbac245b261fe316a7ef04c9b1eba48
This patch fixes the return values from BPF redirect, so the error
handling would be consistent at xmit hook. It also adds a few test cases
to prevent future regressions.
v3: https://lore.kernel.org/bpf/cover.1690255889.git.yan@cloudflare.com/
v2: https://lore.kernel.org/netdev/ZLdY6JkWRccunvu0@debian.debian/
v1: https://lore.kernel.org/bpf/ZLbYdpWC8zt9EJtq@debian.debian/
changes since v3:
* minor change in commit message and changelogs
* tested by Jakub Sitnicki
changes since v2:
* subject name changed
* also covered redirect to ingress case
* added selftests
changes since v1:
* minor code style changes
Yan Zhai (2):
bpf: fix skb_do_redirect return values
bpf: selftests: add lwt redirect regression test cases
include/linux/netdevice.h | 2 +
net/core/filter.c | 9 +-
tools/testing/selftests/bpf/Makefile | 1 +
.../selftests/bpf/progs/test_lwt_redirect.c | 66 +++++++
.../selftests/bpf/test_lwt_redirect.sh | 174 ++++++++++++++++++
5 files changed, 250 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/test_lwt_redirect.c
create mode 100755 tools/testing/selftests/bpf/test_lwt_redirect.sh
--
2.30.2
Hi Willy and Thomas,
In v3, I have fixed all the problems you mentioned.
Welcome any other suggestion.
---
Changes in v3:
- Fix the missing return
- Fix __NR_pipe to __NR_pipe2
- Fix the missing static
- Test case works in one process
- Link to v2:
https://lore.kernel.org/all/cover.1690733545.git.tanyuan@tinylab.org
Changes in v2:
- Use sys_pipe2 to implement the pipe()
- Use memcmp() instead of strcmp()
- Link to v1:
https://lore.kernel.org/all/cover.1690307717.git.tanyuan@tinylab.org
---
Yuan Tan (2):
tools/nolibc: add pipe() and pipe2() support
selftests/nolibc: add testcase for pipe
tools/include/nolibc/sys.h | 24 ++++++++++++++++++++
tools/testing/selftests/nolibc/nolibc-test.c | 22 ++++++++++++++++++
2 files changed, 46 insertions(+)
--
2.34.1
[ commit be37bed754ed90b2655382f93f9724b3c1aae847 upstream ]
Dan Carpenter spotted that test_fw_config->reqs will be leaked if
trigger_batched_requests_store() is called two or more times.
The same appears with trigger_batched_requests_async_store().
This bug wasn't triggered by the tests, but observed by Dan's visual
inspection of the code.
The recommended workaround was to return -EBUSY if test_fw_config->reqs
is already allocated.
Fixes: c92316bf8e94 ("test_firmware: add batched firmware tests")
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Russ Weight <russell.h.weight(a)intel.com>
Cc: Tianfei Zhang <tianfei.zhang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Colin Ian King <colin.i.king(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v4.19
Suggested-by: Dan Carpenter <error27(a)gmail.com>
Suggested-by: Takashi Iwai <tiwai(a)suse.de>
Reviewed-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Acked-by: Luis Chamberlain <mcgrof(a)kernel.org>
Link: https://lore.kernel.org/r/20230509084746.48259-2-mirsad.todorovac@alu.unizg…
Signed-off-by: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
[ This is a backport to v4.19 stable branch without a change in code from the 5.4+ patch ]
---
v1:
patch sumbmitted verbatim from the 5.4+ branch to 4.19
lib/test_firmware.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index f4cc874021da..e4688821eab8 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -618,6 +618,11 @@ static ssize_t trigger_batched_requests_store(struct device *dev,
mutex_lock(&test_fw_mutex);
+ if (test_fw_config->reqs) {
+ rc = -EBUSY;
+ goto out_bail;
+ }
+
test_fw_config->reqs =
vzalloc(array3_size(sizeof(struct test_batched_req),
test_fw_config->num_requests, 2));
@@ -721,6 +726,11 @@ ssize_t trigger_batched_requests_async_store(struct device *dev,
mutex_lock(&test_fw_mutex);
+ if (test_fw_config->reqs) {
+ rc = -EBUSY;
+ goto out_bail;
+ }
+
test_fw_config->reqs =
vzalloc(array3_size(sizeof(struct test_batched_req),
test_fw_config->num_requests, 2));
--
2.34.1
Hi, Willy, Thomas,
Thanks to your advice and I really learned a lot from it.
V2 now uses pipe2() to wrap pipe(), and fixes the strcmp issue in test
case.
Best regards,
Yuan Tan
Yuan Tan (2):
tools/nolibc: add pipe() and pipe2() support
selftests/nolibc: add testcase for pipe
tools/include/nolibc/sys.h | 24 ++++++++++++++
tools/testing/selftests/nolibc/nolibc-test.c | 35 ++++++++++++++++++++
2 files changed, 59 insertions(+)
--
2.34.1
This series was originally written by José Expósito, and can be found
here:
https://github.com/Rust-for-Linux/linux/pull/950
Add support for writing KUnit tests in Rust. While Rust doctests are
already converted to KUnit tests and run, they're really better suited
for examples, rather than as first-class unit tests.
This series implements a series of direct Rust bindings for KUnit tests,
as well as a new macro which allows KUnit tests to be written using a
close variant of normal Rust unit test syntax. The only change required
is replacing '#[cfg(test)]' with '#[kunit_tests(kunit_test_suite_name)]'
An example test would look like:
#[kunit_tests(rust_kernel_hid_driver)]
mod tests {
use super::*;
use crate::{c_str, driver, hid, prelude::*};
use core::ptr;
struct SimpleTestDriver;
impl Driver for SimpleTestDriver {
type Data = ();
}
#[test]
fn rust_test_hid_driver_adapter() {
let mut hid = bindings::hid_driver::default();
let name = c_str!("SimpleTestDriver");
static MODULE: ThisModule = unsafe { ThisModule::from_ptr(ptr::null_mut()) };
let res = unsafe {
<hid::Adapter<SimpleTestDriver> as driver::DriverOps>::register(&mut hid, name, &MODULE)
};
assert_eq!(res, Err(ENODEV)); // The mock returns -19
}
}
Changes since the GitHub PR:
- Rebased on top of kselftest/kunit
- Add const_mut_refs feature
This may conflict with https://lore.kernel.org/lkml/20230503090708.2524310-6-nmi@metaspace.dk/
- Add rust/macros/kunit.rs to the KUnit MAINTAINERS entry
Signed-off-by: David Gow <davidgow(a)google.com>
---
José Expósito (3):
rust: kunit: add KUnit case and suite macros
rust: macros: add macro to easily run KUnit tests
rust: kunit: allow to know if we are in a test
MAINTAINERS | 1 +
rust/kernel/kunit.rs | 181 +++++++++++++++++++++++++++++++++++++++++++++++++++
rust/kernel/lib.rs | 1 +
rust/macros/kunit.rs | 149 ++++++++++++++++++++++++++++++++++++++++++
rust/macros/lib.rs | 29 +++++++++
5 files changed, 361 insertions(+)
---
base-commit: 64bd4641310c41a1ecf07c13c67bc0ed61045dfd
change-id: 20230720-rustbind-477964954da5
Best regards,
--
David Gow <davidgow(a)google.com>
[ commit be37bed754ed90b2655382f93f9724b3c1aae847 upstream ]
Dan Carpenter spotted that test_fw_config->reqs will be leaked if
trigger_batched_requests_store() is called two or more times.
The same appears with trigger_batched_requests_async_store().
This bug wasn't triggered by the tests, but observed by Dan's visual
inspection of the code.
The recommended workaround was to return -EBUSY if test_fw_config->reqs
is already allocated.
Fixes: c92316bf8e94 ("test_firmware: add batched firmware tests")
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Russ Weight <russell.h.weight(a)intel.com>
Cc: Tianfei Zhang <tianfei.zhang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Colin Ian King <colin.i.king(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v4.14
Suggested-by: Dan Carpenter <error27(a)gmail.com>
Suggested-by: Takashi Iwai <tiwai(a)suse.de>
Reviewed-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Acked-by: Luis Chamberlain <mcgrof(a)kernel.org>
Link: https://lore.kernel.org/r/20230509084746.48259-2-mirsad.todorovac@alu.unizg…
Signed-off-by: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
[ This fix is applied against the 4.14 stable branch. There are no changes to the ]
[ fix in code when compared to the upstread, only the reformatting for backport. ]
---
lib/test_firmware.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index 1c5e5246bf10..5318c5e18acf 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -621,6 +621,11 @@ static ssize_t trigger_batched_requests_store(struct device *dev,
mutex_lock(&test_fw_mutex);
+ if (test_fw_config->reqs) {
+ rc = -EBUSY;
+ goto out_bail;
+ }
+
test_fw_config->reqs = vzalloc(sizeof(struct test_batched_req) *
test_fw_config->num_requests * 2);
if (!test_fw_config->reqs) {
@@ -723,6 +728,11 @@ ssize_t trigger_batched_requests_async_store(struct device *dev,
mutex_lock(&test_fw_mutex);
+ if (test_fw_config->reqs) {
+ rc = -EBUSY;
+ goto out_bail;
+ }
+
test_fw_config->reqs = vzalloc(sizeof(struct test_batched_req) *
test_fw_config->num_requests * 2);
if (!test_fw_config->reqs) {
--
2.39.3
Hi, Willy, Thomas
Currently, since this part of the discussion is out of the original topic [1],
as suggested by Willy, open a new thread for this.
[1]: https://lore.kernel.org/lkml/20230731224929.GA18296@1wt.eu/#R
The original topic [1] tries to enable -Wall (or even -Wextra) to report some
issues (include the dead unused functions/data) found during compiling stage,
this further propose a method to enable '-ffunction-sections -fdata-sections
-Wl,--gc-sections,--print-gc-sections to' find the dead unused functions/data
during linking stage:
Just thought about gc-section, do we need to further remove dead code/data
in the binary? I don't think it is necessary for nolibc-test itself, but with
'-Wl,--gc-sections -Wl,--print-gc-sections' may be a good helper to show us
which ones should be dropped or which ones are wrongly declared as public?
Just found '-O3 + -Wl,--gc-section + -Wl,--print-gc-sections' did tell
us something as below:
removing unused section '.text.nolibc_raise'
removing unused section '.text.nolibc_memmove'
removing unused section '.text.nolibc_abort'
removing unused section '.text.nolibc_memcpy'
removing unused section '.text.__stack_chk_init'
removing unused section '.text.is_setting_valid'
These info may help us further add missing 'static' keyword or find
another method to to drop the wrongly used status of some functions from
the code side.
Let's continue the discussion as below.
> On Mon, Jul 31, 2023 at 08:36:05PM +0200, Thomas Wei�schuh wrote:
>
> [...]
>
> > > > > It is very easy to add the missing 'static' keyword for is_setting_valid(), but
> > > > > for __stack_chk_init(), is it ok for us to convert it to 'static' and remove
> > > > > the 'weak' attrbute and even the 'section' attribute? seems it is only used by
> > > > > our _start_c() currently.
> > > >
> > > > Making is_setting_valid(), __stack_chk_init() seems indeed useful.
> > > > Also all the run_foo() test functions.
> > >
> > > Most of them could theoretically be turned to static. *But* it causes a
> > > problem which is that it will multiply their occurrences in multi-unit
> > > programs, and that's in part why we've started to use weak instead. Also
> > > if you run through gdb and want to mark a break point, you won't have the
> > > symbol when it's static,
Willy, did I misunderstand something again? a simple test shows, seems this is
not really always like that, static mainly means 'local', the symbol is still
there if without -O2/-Os and is able to be set a breakpoint at:
// test.c: gcc -o test test.c
#include <stdio.h>
static int test (void)
{
printf("hello, world!\n");
}
int main(void)
{
test();
return 0;
}
Even with -Os/-O2, an additional '-g' is able to generate the 'test' symbol for
debug as we expect.
> and the code will appear at multiple locations,
> > > which is really painful. I'd instead really prefer to avoid static when
> > > we don't strictly want to inline the code, and prefer weak when possible
> > > because we know many of them will be dropped at link time (and that's
> > > the exact purpose).
For the empty __stack_chk_init() one (when the arch not support stackprotector)
we used, when with 'weak', it is not possible drop it during link time even
with -O3, the weak one will be dropped may be only when there is a global one
with the same name is used or the 'weak' one is never really used?
#include <stdio.h>
__attribute__((weak,unused,section(".text.nolibc_memset")))
int test (void)
{
printf("hello, world!\n");
}
int main(void)
{
test();
return 0;
}
0000000000001060 <main>:
1060: f3 0f 1e fa endbr64
1064: 48 83 ec 08 sub $0x8,%rsp
1068: e8 03 01 00 00 callq 1170 <test>
106d: 31 c0 xor %eax,%eax
106f: 48 83 c4 08 add $0x8,%rsp
1073: c3 retq
1074: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
107b: 00 00 00
107e: 66 90 xchg %ax,%ax
Seems it is either impossible to add a 'inline' keyword again with the 'weak'
attribute (warned by compiler), so, the _start_c (itself is always called by
_start) will always add an empty call to the weak empty __stack_chk_init(),
-Os/-O2/-O3 don't help. for such an empty function, in my opinion, as the size
we want to care about, the calling place should be simply removed by compiler.
Test also shows, with current __inline__ method, the calling place is removed,
but with c89, the __stack_chk_init() itself will not be droped automatically,
only if not with -std=c89, it will be dropped and not appear in the
--print-gc-sections result.
Even for a supported architecture, the shorter __stack_chk_init() may be better
to inlined to the _start_c()?
So, If my above test is ok, then, we'd better simply convert the whole
__stack_chk_init() to a static one as below (I didn't investigate this deeply
due to the warning about static and weak conflict at the first time):
diff --git a/tools/include/nolibc/crt.h b/tools/include/nolibc/crt.h
index 32e128b0fb62..a5f33fef1672 100644
--- a/tools/include/nolibc/crt.h
+++ b/tools/include/nolibc/crt.h
@@ -10,7 +10,7 @@
char **environ __attribute__((weak));
const unsigned long *_auxv __attribute__((weak));
-void __stack_chk_init(void);
+static void __stack_chk_init(void);
static void exit(int);
void _start_c(long *sp)
diff --git a/tools/include/nolibc/stackprotector.h b/tools/include/nolibc/stackprotector.h
index b620f2b9578d..13f1d0e60387 100644
--- a/tools/include/nolibc/stackprotector.h
+++ b/tools/include/nolibc/stackprotector.h
@@ -37,8 +37,7 @@ void __stack_chk_fail_local(void)
__attribute__((weak,section(".data.nolibc_stack_chk")))
uintptr_t __stack_chk_guard;
-__attribute__((weak,section(".text.nolibc_stack_chk"))) __no_stack_protector
-void __stack_chk_init(void)
+static __no_stack_protector void __stack_chk_init(void)
{
my_syscall3(__NR_getrandom, &__stack_chk_guard, sizeof(__stack_chk_guard), 0);
/* a bit more randomness in case getrandom() fails, ensure the guard is never 0 */
@@ -46,7 +45,7 @@ void __stack_chk_init(void)
__stack_chk_guard ^= (uintptr_t) &__stack_chk_guard;
}
#else /* !defined(_NOLIBC_STACKPROTECTOR) */
-__inline__ void __stack_chk_init(void) {}
+static void __stack_chk_init(void) {}
#endif /* defined(_NOLIBC_STACKPROTECTOR) */
#endif /* _NOLIBC_STACKPROTECTOR_H */
> >
> > Thanks for the clarification. I forgot about that completely!
> >
> > The stuff from nolibc-test.c itself (run_foo() and is_settings_valid())
> > should still be done.
>
> Yes, likely. Nolibc-test should be done just like users expect to use
> nolibc, and nolibc should be the most flexible possible.
For the 'static' keyword we tested above, '-g' may help the debug requirement,
so, is ok for us to apply 'static' for them safely now?
A further test shows, with 'static' on _start_c() doesn't help the size, for it
is always called from _start(), will never save move instructions, but we need
a more 'used' attribute to silence the error 'nolibc-test.c:(.text+0x38cd):
undefined reference to `_start_c'', so, reserve it as the before simpler 'void
_start_c(void)' may be better?
static __attribute__((used)) void _start_c(long *sp)
Thanks,
Zhangjin
>
> Cheers,
> Willy
[ Upstream commit 4acfe3dfde685a5a9eaec5555351918e2d7266a1 ]
Dan Carpenter spotted a race condition in a couple of situations like
these in the test_firmware driver:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
u8 val;
int ret;
ret = kstrtou8(buf, 10, &val);
if (ret)
return ret;
mutex_lock(&test_fw_mutex);
*(u8 *)cfg = val;
mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
static ssize_t config_num_requests_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
int rc;
mutex_lock(&test_fw_mutex);
if (test_fw_config->reqs) {
pr_err("Must call release_all_firmware prior to changing config\n");
rc = -EINVAL;
mutex_unlock(&test_fw_mutex);
goto out;
}
mutex_unlock(&test_fw_mutex);
// NOTE: HERE is the race!!! Function can be preempted!
// test_fw_config->reqs can change between the release of
// the lock about and acquire of the lock in the
// test_dev_config_update_u8()
rc = test_dev_config_update_u8(buf, count,
&test_fw_config->num_requests);
out:
return rc;
}
static ssize_t config_read_fw_idx_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
return test_dev_config_update_u8(buf, count,
&test_fw_config->read_fw_idx);
}
The function test_dev_config_update_u8() is called from both the locked
and the unlocked context, function config_num_requests_store() and
config_read_fw_idx_store() which can both be called asynchronously as
they are driver's methods, while test_dev_config_update_u8() and siblings
change their argument pointed to by u8 *cfg or similar pointer.
To avoid deadlock on test_fw_mutex, the lock is dropped before calling
test_dev_config_update_u8() and re-acquired within test_dev_config_update_u8()
itself, but alas this creates a race condition.
Having two locks wouldn't assure a race-proof mutual exclusion.
This situation is best avoided by the introduction of a new, unlocked
function __test_dev_config_update_u8() which can be called from the locked
context and reducing test_dev_config_update_u8() to:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
mutex_lock(&test_fw_mutex);
ret = __test_dev_config_update_u8(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
}
doing the locking and calling the unlocked primitive, which enables both
locked and unlocked versions without duplication of code.
Fixes: c92316bf8e948 ("test_firmware: add batched firmware tests")
Cc: Luis R. Rodriguez <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Russ Weight <russell.h.weight(a)intel.com>
Cc: Takashi Iwai <tiwai(a)suse.de>
Cc: Tianfei Zhang <tianfei.zhang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Colin Ian King <colin.i.king(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v5.4
Suggested-by: Dan Carpenter <error27(a)gmail.com>
Link: https://lore.kernel.org/r/20230509084746.48259-1-mirsad.todorovac@alu.unizg…
Signed-off-by: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
[ NOTE: This patch is tested against 5.4 stable. Some of the fixes from the upstream ]
[ patch to stable 5.10+ do not apply. ]
---
v2:
minor clarification and consistency improvements. no change to code.
lib/test_firmware.c | 37 ++++++++++++++++++++++++++++---------
1 file changed, 28 insertions(+), 9 deletions(-)
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index 38553944e967..92d7195d5b5b 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -301,16 +301,26 @@ static ssize_t config_test_show_str(char *dst,
return len;
}
-static int test_dev_config_update_bool(const char *buf, size_t size,
- bool *cfg)
+static inline int __test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
{
int ret;
- mutex_lock(&test_fw_mutex);
if (strtobool(buf, cfg) < 0)
ret = -EINVAL;
else
ret = size;
+
+ return ret;
+}
+
+static int test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_bool(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
@@ -340,7 +350,7 @@ static ssize_t test_dev_config_show_int(char *buf, int cfg)
return snprintf(buf, PAGE_SIZE, "%d\n", val);
}
-static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+static inline int __test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
long new;
@@ -352,14 +362,23 @@ static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
if (new > U8_MAX)
return -EINVAL;
- mutex_lock(&test_fw_mutex);
*(u8 *)cfg = new;
- mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
+static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_u8(buf, size, cfg);
+ mutex_unlock(&test_fw_mutex);
+
+ return ret;
+}
+
static ssize_t test_dev_config_show_u8(char *buf, u8 cfg)
{
u8 val;
@@ -392,10 +411,10 @@ static ssize_t config_num_requests_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_u8(buf, count,
- &test_fw_config->num_requests);
+ rc = __test_dev_config_update_u8(buf, count,
+ &test_fw_config->num_requests);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
--
2.34.1
iommufd gives userspace the capability to manipulate iommu subsytem.
e.g. DMA map/unmap etc. In the near future, it will support iommu nested
translation. Different platform vendors have different implementation for
the nested translation. For example, Intel VT-d supports using guest I/O
page table as the stage-1 translation table. This requires guest I/O page
table be compatible with hardware IOMMU. So before set up nested translation,
userspace needs to know the hardware iommu information to understand the
nested translation requirements.
This series reports the iommu hardware information for a given device
which has been bound to iommufd. It is preparation work for userspace to
allocate hwpt for given device. Like the nested translation support[1].
This series introduces an iommu op to report the iommu hardware info,
and an ioctl IOMMU_GET_HW_INFO is added to report such hardware info to
user. enum iommu_hw_info_type is defined to differentiate the iommu hardware
info reported to user hence user can decode them. This series only adds the
framework for iommu hw info reporting, the complete reporting path needs vendor
specific definition and driver support. The full code is available in [1]
as well.
[1] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
Change log:
v4:
- Rename ioctl to IOMMU_GET_HW_INFO and structure to iommu_hw_info
- Move the iommufd_get_hw_info handler to main.c
- Place iommu_hw_info prior to iommu_hwpt_alloc
- Update the function namings accordingly
- Update uapi kdocs
v3: https://lore.kernel.org/linux-iommu/20230511143024.19542-1-yi.l.liu@intel.c…
- Add r-b from Baolu
- Rename IOMMU_HW_INFO_TYPE_DEFAULT to be IOMMU_HW_INFO_TYPE_NONE to
better suit what it means
- Let IOMMU_DEVICE_GET_HW_INFO succeed even the underlying iommu driver
does not have driver-specific data to report per below remark.
https://lore.kernel.org/kvm/ZAcwJSK%2F9UVI9LXu@nvidia.com/
v2: https://lore.kernel.org/linux-iommu/20230309075358.571567-1-yi.l.liu@intel.…
- Drop patch 05 of v1 as it is already covered by other series
- Rename the capability info to be iommu hardware info
v1: https://lore.kernel.org/linux-iommu/20230209041642.9346-1-yi.l.liu@intel.co…
Regards,
Yi Liu
Lu Baolu (1):
iommu: Add new iommu op to get iommu hardware information
Nicolin Chen (1):
iommufd/selftest: Add coverage for IOMMU_GET_HW_INFO ioctl
Yi Liu (2):
iommu: Move dev_iommu_ops() to private header
iommufd: Add IOMMU_GET_HW_INFO
drivers/iommu/iommu-priv.h | 11 +++
drivers/iommu/iommufd/device.c | 1 +
drivers/iommu/iommufd/iommufd_test.h | 9 +++
drivers/iommu/iommufd/main.c | 76 +++++++++++++++++++
drivers/iommu/iommufd/selftest.c | 16 ++++
include/linux/iommu.h | 27 ++++---
include/uapi/linux/iommufd.h | 44 +++++++++++
tools/testing/selftests/iommu/iommufd.c | 17 ++++-
tools/testing/selftests/iommu/iommufd_utils.h | 26 +++++++
9 files changed, 215 insertions(+), 12 deletions(-)
--
2.34.1
This small series of 4 patches adds some improvements in MPTCP
selftests:
- Patch 1 reworks the detailed report of mptcp_join.sh selftest to
better display what went well or wrong per test.
- Patch 2 adds colours (if supported, forced and/or not disabled) in
mptcp_join.sh selftest output to help spotting issues.
- Patch 3 modifies an MPTCP selftest tool to interact with the
path-manager via Netlink to always look for errors if any. This makes
sure odd behaviours can be seen in the logs and errors can be caught
later if needed.
- Patch 4 removes stdout and stderr redirections to /dev/null when using
pm_nl_ctl if no errors are expected in order to log odd behaviours.
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Matthieu Baerts (4):
selftests: mptcp: join: rework detailed report
selftests: mptcp: join: colored results
selftests: mptcp: pm_nl_ctl: always look for errors
selftests: mptcp: userspace_pm: unmute unexpected errors
tools/testing/selftests/net/mptcp/mptcp_join.sh | 452 ++++++++++------------
tools/testing/selftests/net/mptcp/mptcp_lib.sh | 39 ++
tools/testing/selftests/net/mptcp/pm_netlink.sh | 6 +-
tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 33 +-
tools/testing/selftests/net/mptcp/userspace_pm.sh | 100 ++---
5 files changed, 329 insertions(+), 301 deletions(-)
---
base-commit: 64a37272fa5fb2d951ebd1a96fd42b045d64924c
change-id: 20230728-upstream-net-next-20230728-mptcp-selftests-misc-0190cfd69ef9
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
There is a warning reported by coccinelle:
./tools/testing/selftests/cgroup/test_zswap.c:211:6-18: WARNING:
Unsigned expression compared with zero: stored_pages < 0
The type of "stored_pages" is size_t, which always be an unsigned type,
so it is impossible less than zero. Drop the if statements to silence
the warning.
Signed-off-by: Li Zetao <lizetao1(a)huawei.com>
---
tools/testing/selftests/cgroup/test_zswap.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c
index 49def87a909b..dbad8d0cd090 100644
--- a/tools/testing/selftests/cgroup/test_zswap.c
+++ b/tools/testing/selftests/cgroup/test_zswap.c
@@ -208,8 +208,6 @@ static int test_no_kmem_bypass(const char *root)
free(trigger_allocation);
if (get_zswap_stored_pages(&stored_pages))
break;
- if (stored_pages < 0)
- break;
/* If memory was pushed to zswap, verify it belongs to memcg */
if (stored_pages > stored_pages_threshold) {
int zswapped = cg_read_key_long(test_group, "memory.stat", "zswapped ");
--
2.34.1
This 3 patch series consists of fixes to proc_filter test
found during linun-next testing.
The first patch fixes the LKFT reported compile error, second
one adds .gitignore and the third fixes error paths to skip
instead of fail (root check, and argument checks)
Shuah Khan (3):
selftests:connector: Fix Makefile to include KHDR_INCLUDES
selftests:connector: Add .gitignore and poupulate it with test
selftests:connector: Add root check and fix arg error paths to skip
tools/testing/selftests/connector/.gitignore | 1 +
tools/testing/selftests/connector/Makefile | 2 +-
tools/testing/selftests/connector/proc_filter.c | 9 +++++++--
3 files changed, 9 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/connector/.gitignore
--
2.39.2
The openvswitch selftests currently contain a few cases for managing the
datapath, which includes creating datapath instances, adding interfaces,
and doing some basic feature / upcall tests. This is useful to validate
the control path.
Add the ability to program some of the more common flows with actions. This
can be improved overtime to include regression testing, etc.
Changes from original:
1. Fix issue when parsing ipv6 in the NAT action
2. Fix issue calculating length during ctact parsing
3. Fix error message when invalid bridge is passed
4. Fold in Adrian's patch to support key masks
Aaron Conole (4):
selftests: openvswitch: add an initial flow programming case
selftests: openvswitch: add a test for ipv4 forwarding
selftests: openvswitch: add basic ct test case parsing
selftests: openvswitch: add ct-nat test case with ipv4
Adrian Moreno (1):
selftests: openvswitch: support key masks
.../selftests/net/openvswitch/openvswitch.sh | 223 +++++++
.../selftests/net/openvswitch/ovs-dpctl.py | 601 +++++++++++++++++-
2 files changed, 800 insertions(+), 24 deletions(-)
--
2.40.1
I checked and the Landlock ptrace test failed because Yama is enabled,
which is expected. You can check that with
/proc/sys/kernel/yama/ptrace_scope
Jeff Xu sent a patch to fix this case but it is not ready yet:
https://lore.kernel.org/r/20220628222941.2642917-1-jeffxu@google.com
Could you please send a new patch Jeff, and add Limin in Cc?
On 29/11/2022 12:26, limin wrote:
> cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-6.1.0-next-20221116
> root=UUID=a65b3a79-dc02-4728-8a0c-5cf24f4ae08b ro
> systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all
>
>
> config
>
> #
> # Automatically generated file; DO NOT EDIT.
> # Linux/x86 6.1.0-rc6 Kernel Configuration
> #
[...]
> CONFIG_SECURITY_YAMA=y
[...]
> CONFIG_LSM="landlock,lockdown,yama,integrity,apparmor"
[...]
>
> On 2022/11/29 19:03, Mickaël Salaün wrote:
>> I tested with next-20221116 and all tests are OK. Could you share your
>> kernel configuration with a link? What is the content of /proc/cmdline?
>>
>> On 29/11/2022 02:42, limin wrote:
>>> I run test on Linux ubuntu2204 6.1.0-next-20221116
>>>
>>> I did't use yama.
>>>
>>> you can reproduce by this step:
>>>
>>> cd kernel_src
>>>
>>> cd tools/testing/selftests/landlock/
>>> make
>>> ./ptrace_test
>>>
>>>
>>>
>>>
>>> On 2022/11/29 3:44, Mickaël Salaün wrote:
>>>> This patch changes the test semantic and then cannot work on my test
>>>> environment. On which kernel did you run test? Do you use Yama or
>>>> something similar?
>>>>
>>>> On 28/11/2022 03:04, limin wrote:
>>>>> Tests PTRACE_ATTACH and PTRACE_MODE_READ on the parent,
>>>>> trace parent return -1 when child== 0
>>>>> How to reproduce warning:
>>>>> $ make -C tools/testing/selftests TARGETS=landlock run_tests
>>>>>
>>>>> Signed-off-by: limin <limin100(a)huawei.com>
>>>>> ---
>>>>> tools/testing/selftests/landlock/ptrace_test.c | 5 ++---
>>>>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/tools/testing/selftests/landlock/ptrace_test.c
>>>>> b/tools/testing/selftests/landlock/ptrace_test.c
>>>>> index c28ef98ff3ac..88c4dc63eea0 100644
>>>>> --- a/tools/testing/selftests/landlock/ptrace_test.c
>>>>> +++ b/tools/testing/selftests/landlock/ptrace_test.c
>>>>> @@ -267,12 +267,11 @@ TEST_F(hierarchy, trace)
>>>>> /* Tests PTRACE_ATTACH and PTRACE_MODE_READ on the
>>>>> parent. */
>>>>> err_proc_read = test_ptrace_read(parent);
>>>>> ret = ptrace(PTRACE_ATTACH, parent, NULL, 0);
>>>>> + EXPECT_EQ(-1, ret);
>>>>> + EXPECT_EQ(EPERM, errno);
>>>>> if (variant->domain_child) {
>>>>> - EXPECT_EQ(-1, ret);
>>>>> - EXPECT_EQ(EPERM, errno);
>>>>> EXPECT_EQ(EACCES, err_proc_read);
>>>>> } else {
>>>>> - EXPECT_EQ(0, ret);
>>>>> EXPECT_EQ(0, err_proc_read);
>>>>> }
>>>>> if (ret == 0) {
Add the definition of the '__weak' attribute since it is not available in
GCC 11.3.0 and newer:
rseq.c:41:1: error: unknown type name ‘__weak’
41 | __weak ptrdiff_t __rseq_offset;
Fixes: 3bcbc20942db ("selftests/rseq: Play nice with binaries statically linked against glibc 2.35+")
Signed-off-by: Anh Tuan Phan <tuananhlfc(a)gmail.com>
---
tools/testing/selftests/rseq/rseq.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c
index a723da253244..584eec9b0930 100644
--- a/tools/testing/selftests/rseq/rseq.c
+++ b/tools/testing/selftests/rseq/rseq.c
@@ -34,6 +34,10 @@
#include "../kselftest.h"
#include "rseq.h"
+#ifndef __weak
+#define __weak __attribute__((weak))
+#endif
+
/*
* Define weak versions to play nice with binaries that are statically linked
* against a libc that doesn't support registering its own rseq.
--
2.34.1
Hi,
Since v6.5-rc1, kunit gained a devm/drmm-like mechanism that makes tests
resources much easier to cleanup.
This series converts the existing tests to use those new actions where
relevant.
Let me know what you think,
Maxime
Signed-off-by: Maxime Ripard <mripard(a)kernel.org>
---
Changes in v3:
- Fixed the build cast warnings by switching to wrapper functions
- Link to v2: https://lore.kernel.org/r/20230720-kms-kunit-actions-rework-v2-0-175017bd56…
Changes in v2:
- Fix some typos
- Use plaltform_device_del instead of removing the call to
platform_device_put after calling platform_device_add
- Link to v1: https://lore.kernel.org/r/20230710-kms-kunit-actions-rework-v1-0-722c58d72c…
---
Maxime Ripard (11):
drm/tests: helpers: Switch to kunit actions
drm/tests: client-modeset: Remove call to drm_kunit_helper_free_device()
drm/tests: modes: Remove call to drm_kunit_helper_free_device()
drm/tests: probe-helper: Remove call to drm_kunit_helper_free_device()
drm/tests: helpers: Create a helper to allocate a locking ctx
drm/tests: helpers: Create a helper to allocate an atomic state
drm/vc4: tests: pv-muxing: Remove call to drm_kunit_helper_free_device()
drm/vc4: tests: mock: Use a kunit action to unregister DRM device
drm/vc4: tests: pv-muxing: Switch to managed locking init
drm/vc4: tests: Switch to atomic state allocation helper
drm/vc4: tests: pv-muxing: Document test scenario
drivers/gpu/drm/tests/drm_client_modeset_test.c | 8 --
drivers/gpu/drm/tests/drm_kunit_helpers.c | 141 +++++++++++++++++++++++-
drivers/gpu/drm/tests/drm_modes_test.c | 8 --
drivers/gpu/drm/tests/drm_probe_helper_test.c | 8 --
drivers/gpu/drm/vc4/tests/vc4_mock.c | 12 ++
drivers/gpu/drm/vc4/tests/vc4_test_pv_muxing.c | 115 +++++++------------
include/drm/drm_kunit_helpers.h | 7 ++
7 files changed, 198 insertions(+), 101 deletions(-)
---
base-commit: d7b3af5a77e8d8da28f435f313e069aea5bcf172
change-id: 20230710-kms-kunit-actions-rework-5d163762c93b
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>
Hi,
Since v6.5-rc1, kunit gained a devm/drmm-like mechanism that makes tests
resources much easier to cleanup.
This series converts the existing tests to use those new actions where
relevant.
Let me know what you think,
Maxime
Signed-off-by: Maxime Ripard <mripard(a)kernel.org>
---
Changes in v2:
- Fix some typos
- Use plaltform_device_del instead of removing the call to
platform_device_put after calling platform_device_add
- Link to v1: https://lore.kernel.org/r/20230710-kms-kunit-actions-rework-v1-0-722c58d72c…
---
Maxime Ripard (11):
drm/tests: helpers: Switch to kunit actions
drm/tests: client-modeset: Remove call to drm_kunit_helper_free_device()
drm/tests: modes: Remove call to drm_kunit_helper_free_device()
drm/tests: probe-helper: Remove call to drm_kunit_helper_free_device()
drm/tests: helpers: Create a helper to allocate a locking ctx
drm/tests: helpers: Create a helper to allocate an atomic state
drm/vc4: tests: pv-muxing: Remove call to drm_kunit_helper_free_device()
drm/vc4: tests: mock: Use a kunit action to unregister DRM device
drm/vc4: tests: pv-muxing: Switch to managed locking init
drm/vc4: tests: Switch to atomic state allocation helper
drm/vc4: tests: pv-muxing: Document test scenario
drivers/gpu/drm/tests/drm_client_modeset_test.c | 8 --
drivers/gpu/drm/tests/drm_kunit_helpers.c | 108 +++++++++++++++++++++-
drivers/gpu/drm/tests/drm_modes_test.c | 8 --
drivers/gpu/drm/tests/drm_probe_helper_test.c | 8 --
drivers/gpu/drm/vc4/tests/vc4_mock.c | 5 ++
drivers/gpu/drm/vc4/tests/vc4_test_pv_muxing.c | 115 +++++++++---------------
include/drm/drm_kunit_helpers.h | 7 ++
7 files changed, 158 insertions(+), 101 deletions(-)
---
base-commit: c58c49dd89324b18a812762a2bfa5a0458e4f252
change-id: 20230710-kms-kunit-actions-rework-5d163762c93b
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>
Submit the top-level headers also from the kunit test module notifier
initialization callback, so external tools that are parsing dmesg for
kunit test output are able to tell how many test suites should be expected
and whether to continue parsing after complete output from the first test
suite is collected.
Extend kunit module notifier initialization callback with a processing
path for only listing the tests provided by a module if the kunit action
parameter is set to "list", so external tools can obtain a list of test
cases to be executed in advance and can make a better job on assigning
kernel messages interleaved with kunit output to specific tests.
Use test filtering functions in kunit module notifier callback functions,
so external tools are able to execute individual test cases from kunit
test modules in order to still better isolate their potential impact on
kernel messages that appear interleaved with output from other tests.
v2: Fix new name of a structure moved to kunit namespace not updated
across all uses.
Janusz Krzysztofik (3):
kunit: Report the count of test suites in a module
kunit: Make 'list' action available to kunit test modules
kunit: Allow kunit test modules to use test filtering
include/kunit/test.h | 14 +++++++++++
lib/kunit/executor.c | 57 +++++++++++++++++++++++++-------------------
lib/kunit/test.c | 57 +++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 103 insertions(+), 25 deletions(-)
--
2.41.0
Hi, Willy
Most of the suggestions of v2 [1] have been applied in this v3 revision,
except the local menuconfig and mrproper targets, as explained in [2].
A fresh run with tinyconfig for ppc, ppc64 and ppc64le:
$ for arch in ppc ppc64 ppc64le; do \
mkdir -p $PWD/kernel-$arch;
time make defconfig run DEFCONFIG=tinyconfig ARCH=$arch O=$PWD/kernel-$arch RUN_OUT=$PWD/run.$arch.out;
done
rerun for ppc, ppc64 and ppc64le:
$ for arch in ppc ppc64 ppc64le; do \
make rerun ARCH=$arch O=$PWD/kernel-$arch RUN_OUT=$PWD/run.$arch.out;
done
Running /labs/linux-lab/src/linux-stable/tools/testing/selftests/nolibc/kernel-ppc/vmlinux on qemu-system-ppc
>> [ppc] Kernel command line: console=ttyS0 panic=-1
printk: console [ttyS0] enabled
Run /init as init process
Running test 'startup'
Running test 'syscall'
Running test 'stdlib'
Running test 'vfprintf'
Running test 'protection'
Leaving init with final status: 0
reboot: Power down
powered off, test finish
qemu-system-ppc: terminating on signal 15 from pid 190248 ()
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
See all results in /labs/linux-lab/src/linux-stable/tools/testing/selftests/nolibc/run.ppc.out
Running /labs/linux-lab/src/linux-stable/tools/testing/selftests/nolibc/kernel-ppc64/vmlinux on qemu-system-ppc64
Linux version 6.4.0+ (ubuntu@linux-lab) (powerpc64le-linux-gnu-gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #2 SMP Fri Jul 28 01:40:55 CST 2023
Kernel command line: console=hvc0 panic=-1
printk: console [hvc0] enabled
printk: console [hvc0] enabled
Run /init as init process
Running test 'startup'
Running test 'syscall'
Running test 'stdlib'
Running test 'vfprintf'
Running test 'protection'
Leaving init with final status: 0
reboot: Power down
powered off, test finish
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
See all results in /labs/linux-lab/src/linux-stable/tools/testing/selftests/nolibc/run.ppc64.out
Running /labs/linux-lab/src/linux-stable/tools/testing/selftests/nolibc/kernel-ppc64le/arch/powerpc/boot/zImage on qemu-system-ppc64le
Linux version 6.4.0+ (ubuntu@linux-lab) (powerpc64le-linux-gnu-gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #2 SMP Fri Jul 28 01:41:12 CST 2023
Kernel command line: console=hvc0 panic=-1
Run /init as init process
Running test 'startup'
Running test 'syscall'
Running test 'stdlib'
Running test 'vfprintf'
Running test 'protection'
Leaving init with final status: 0
reboot: Power down
powered off, test finish
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
See all results in /labs/linux-lab/src/linux-stable/tools/testing/selftests/nolibc/run.ppc64le.out
A fast report on existing test logs:
$ for arch in ppc ppc64 ppc64le; do \
make report ARCH=$arch RUN_OUT=$PWD/run.$arch.out | grep status; \
done
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
Changes from v2 --> v3:
* selftests/nolibc: allow report with existing test log
selftests/nolibc: fix up O= option support
selftests/nolibc: allow customize CROSS_COMPILE by architecture
selftests/nolibc: customize CROSS_COMPILE for 32/64-bit powerpc
selftests/nolibc: tinyconfig: add extra common options
No Change.
* selftests/nolibc: add macros to reduce duplicated changes
Remove REPORT_RUN_OUT and LOG_OUT.
* selftests/nolibc: string the core targets
Removed extconfig target from our v3 powerpc patchset [3], the
operations have been merged into the defconfig target.
Let kernel depends on $(KERNEL_CONFIG) instead of the removed
extconfig.
* selftests/nolibc: add menuconfig and mrproper for development
like the other local nolibc targets, still require local menuconfig
and mrproper targets for consistent usage with the same ARCH and no -C
/path/to/srctree
Merge them together to reduce duplicated entries.
* selftests/nolibc: allow quit qemu-system when poweroff fails
Enhance timeout logic with more expected strings print and detection
about the booting of bios, kernel, init and test.
Add a default 10 seconds of QEMU_TIMEOUT for every architecture to
detect all of the potential boog hang or failed poweroff.
* selftests/nolibc: customize QEMU_TIMEOUT for ppc64/ppc64le
Reduce QEMU_TIMEOUT from 60 seconds to a more normal 15 and 20
seconds for ppc64 and ppc64le respectively. the main time cost is
the slow bios used.
* selftests/nolibc: tinyconfig: add support for 32/64-bit powerpc
Rename the file names to shorter ones as suggestions from the powerpc
patchset.
* selftests/nolibc: speed up some targets with multiple jobs
New to speed up with -j<N> by default.
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/lkml/cover.1689759351.git.falcon@tinylab.org/
[2]: https://lore.kernel.org/lkml/20230727132418.117924-1-falcon@tinylab.org/
[3]: https://lore.kernel.org/lkml/8e9e5ac6283c6ec2ecf10a70ce55b219028497c1.16904…
Zhangjin Wu (12):
selftests/nolibc: allow report with existing test log
selftests/nolibc: add macros to reduce duplicated changes
selftests/nolibc: fix up O= option support
selftests/nolibc: string the core targets
selftests/nolibc: allow customize CROSS_COMPILE by architecture
selftests/nolibc: customize CROSS_COMPILE for 32/64-bit powerpc
selftests/nolibc: add menuconfig and mrproper for development
selftests/nolibc: allow quit qemu-system when poweroff fails
selftests/nolibc: customize QEMU_TIMEOUT for ppc64/ppc64le
selftests/nolibc: tinyconfig: add extra common options
selftests/nolibc: tinyconfig: add support for 32/64-bit powerpc
selftests/nolibc: speed up some targets with multiple jobs
tools/testing/selftests/nolibc/Makefile | 102 ++++++++++++++----
.../selftests/nolibc/configs/common.config | 4 +
.../selftests/nolibc/configs/ppc.config | 3 +
.../selftests/nolibc/configs/ppc64.config | 3 +
.../selftests/nolibc/configs/ppc64le.config | 4 +
5 files changed, 98 insertions(+), 18 deletions(-)
create mode 100644 tools/testing/selftests/nolibc/configs/common.config
create mode 100644 tools/testing/selftests/nolibc/configs/ppc64.config
create mode 100644 tools/testing/selftests/nolibc/configs/ppc64le.config
--
2.25.1
Hi, Willy, Thomas
v3 here is to fix up two issues introduced in v2 powerpc patchset [1].
- One is restore the wrongly removed '\' for a '\$$ARCH'
- Another is add the missing $(ARCH).config for ppc, the default variant
for powerpc is renamed to ppc in v2 (as discussed with Willy in [2]), but
ppc.config is missing in v2 patchset, not sure why this happen, may a
'git clean -fdx .' is required to do a new test, just did it.
Btw, the v3 tinyconfig-part1 for powerpc is ready, I will send it out
soon.
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/lkml/cover.1690373704.git.falcon@tinylab.org/
[2]: https://lore.kernel.org/lkml/ZL9leVOI25S2+0+g@1wt.eu/
Zhangjin Wu (7):
tools/nolibc: add support for powerpc
tools/nolibc: add support for powerpc64
selftests/nolibc: add extra configs customize support
selftests/nolibc: add XARCH and ARCH mapping support
selftests/nolibc: add test support for ppc
selftests/nolibc: add test support for ppc64le
selftests/nolibc: add test support for ppc64
tools/include/nolibc/arch-powerpc.h | 202 ++++++++++++++++++
tools/include/nolibc/arch.h | 2 +
tools/testing/selftests/nolibc/Makefile | 46 +++-
.../selftests/nolibc/configs/ppc.config | 3 +
4 files changed, 246 insertions(+), 7 deletions(-)
create mode 100644 tools/include/nolibc/arch-powerpc.h
create mode 100644 tools/testing/selftests/nolibc/configs/ppc.config
--
2.25.1
Hi,
Zhangjin and I are working on a tiny shell with nolibc. This patch
enables the missing pipe() with its testcase.
Thanks.
Yuan Tan (2):
tools/nolibc: add pipe() support
selftests/nolibc: add testcase for pipe.
tools/include/nolibc/sys.h | 17 ++++++++++
tools/testing/selftests/nolibc/nolibc-test.c | 34 ++++++++++++++++++++
2 files changed, 51 insertions(+)
--
2.39.2
Hi,
This is the semi-friendly patch-bot of Greg Kroah-Hartman.
Markus, you seem to have sent a nonsensical or otherwise pointless
review comment to a patch submission on a Linux kernel developer mailing
list. I strongly suggest that you not do this anymore. Please do not
bother developers who are actively working to produce patches and
features with comments that, in the end, are a waste of time.
Patch submitter, please ignore Markus's suggestion; you do not need to
follow it at all. The person/bot/AI that sent it is being ignored by
almost all Linux kernel maintainers for having a persistent pattern of
behavior of producing distracting and pointless commentary, and
inability to adapt to feedback. Please feel free to also ignore emails
from them.
thanks,
greg k-h's patch email bot
damos_new_filter() is returning a damos_filter struct without
initializing its ->list field. And the users of the function uses the
struct without initializing the field. As a result, uninitialized
memory access error is possible. Actually, a kernel NULL pointer
dereference BUG can be triggered using DAMON user-space tool, like
below.
# damo start --damos_action stat --damos_filter anon matching
# damo tune --damos_action stat --damos_filter anon matching --damos_filter anon nomatching
# dmesg
[...]
[ 36.908136] BUG: kernel NULL pointer dereference, address: 0000000000000008
[ 36.910483] #PF: supervisor write access in kernel mode
[ 36.912238] #PF: error_code(0x0002) - not-present page
[ 36.913415] PGD 0 P4D 0
[ 36.913978] Oops: 0002 [#1] PREEMPT SMP PTI
[ 36.914878] CPU: 32 PID: 1335 Comm: kdamond.0 Not tainted 6.5.0-rc3-mm-unstable-damon+ #1
[ 36.916621] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 36.919051] RIP: 0010:damos_destroy_filter (include/linux/list.h:114 include/linux/list.h:137 include/linux/list.h:148 mm/damon/core.c:345 mm/damon/core.c:355)
[...]
[ 36.938247] Call Trace:
[ 36.938721] <TASK>
[...]
[ 36.950064] ? damos_destroy_filter (include/linux/list.h:114 include/linux/list.h:137 include/linux/list.h:148 mm/damon/core.c:345 mm/damon/core.c:355)
[ 36.950883] ? damon_sysfs_set_scheme_filters.isra.0 (mm/damon/sysfs-schemes.c:1573)
[ 36.952019] damon_sysfs_set_schemes (mm/damon/sysfs-schemes.c:1674 mm/damon/sysfs-schemes.c:1686)
[ 36.952875] damon_sysfs_apply_inputs (mm/damon/sysfs.c:1312 mm/damon/sysfs.c:1298)
[ 36.953757] ? damon_pa_check_accesses (mm/damon/paddr.c:168 mm/damon/paddr.c:179)
[ 36.954648] damon_sysfs_cmd_request_callback (mm/damon/sysfs.c:1329 mm/damon/sysfs.c:1359)
[...]
The first patch of this patchset fixes the bug by initializing the field in
damos_new_filter(). The second patch adds a unit test for the problem.
Note that the second patch Cc stable@ without Fixes: tag, since it would
be better to be ingested together for avoiding any future regression.
SeongJae Park (2):
mm/damon/core: initialize damo_filter->list from damos_new_filter()
mm/damon/core-test: add a test for damos_new_filter()
mm/damon/core-test.h | 13 +++++++++++++
mm/damon/core.c | 1 +
2 files changed, 14 insertions(+)
--
2.25.1
Hi, Willy, Thomas
Here is the first part of v2 of our tinyconfig support for nolibc-test
[1], the patchset subject is reserved as before.
As discussed in v1 thread [1], to easier the review progress, the whole
tinyconfig support is divided into several parts, mainly by
architecture, here is the first part, include basic preparation and
powerpc example.
This patchset should be applied after the 32/64-bit powerpc support [2],
exactly these two are required by us:
* selftests/nolibc: add extra config file customize support
* selftests/nolibc: add XARCH and ARCH mapping support
In this patchset, we firstly add some misc preparations and at last add
the tinyconfig target and use powerpc as the first example.
Tests:
// powerpc run-user
$ for arch in powerpc powerpc64 powerpc64le; do \
rm -rf $PWD/kernel-$arch; \
mkdir -p $PWD/kernel-$arch; \
make run-user XARCH=$arch O=$PWD/kernel-$arch RUN_OUT=$PWD/run.$arch.out | grep "status: "; \
done
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
// powerpc run
$ for arch in powerpc powerpc64 powerpc64le; do \
rm -rf $PWD/kernel-$arch; \
mkdir -p $PWD/kernel-$arch; \
make tinyconfig run XARCH=$arch O=$PWD/kernel-$arch RUN_OUT=$PWD/run.$arch.out; \
done
$ for arch in powerpc powerpc64 powerpc64le; do \
make report XARCH=$arch O=$PWD/kernel-$arch RUN_OUT=$PWD/run.$arch.out | grep "status: "; \
done
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
// the others, randomly choose some
$ make run-user XARCH=arm O=$PWD/kernel-arm RUN_OUT=$PWD/run.arm.out CROSS_COMPILE=arm-linux-gnueabi- | grep status:
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
$ make run-user XARCH=x86_64 O=$PWD/kernel-arm RUN_OUT=$PWD/run.x86_64.out CROSS_COMPILE=x86_64-linux-gnu- | grep status:
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
$ make run-libc-test | grep status:
165 test(s): 153 passed, 12 skipped, 0 failed => status: warning
// x86_64, require noapic kernel command line option for old qemu-system-x86_64 (v4.2.1)
$ make run XARCH=x86_64 O=$PWD/kernel-x86_64 RUN_OUT=$PWD/run.x86_64.out CROSS_COMPILE=x86_64-linux-gnu- | grep status
$ make rerun XARCH=x86_64 O=$PWD/kernel-x86_64 RUN_OUT=$PWD/run.x86_64.out CROSS_COMPILE=x86_64-linux-gnu- | grep status
165 test(s): 159 passed, 6 skipped, 0 failed => status: warning
tinyconfig mainly targets as a time-saver, the misc preparations service
for the same goal, let's take a look:
* selftests/nolibc: allow report with existing test log
Like rerun without rebuild, Add report (without rerun) to summarize
the existing test log, this may work perfectly with the 'grep status'
* selftests/nolibc: add macros to enhance maintainability
Several macros are added to dedup the shared code to shrink lines
and easier the maintainability
The macros are added just before the using area to avoid code change
conflicts in the future.
* selftests/nolibc: print running log to screen
Enable logging to let developers learn what is happening at the
first glance, without the need to edit the Makefile and rerun it.
These helps a lot when there is a long-time running, a failed
poweroff or even a forever hang.
For test summmary, the 'grep status' can be used together with the
standalone report target.
* selftests/nolibc: fix up O= option support
With objtree instead srctree for .config and IMAGE, now, O= works.
Using O=$PWD/kernel-$arch avoid the mrproer for every build.
* selftests/nolibc: add menuconfig for development
Allow manually tuning some options, mainly for a new architecture
porting.
* selftests/nolibc: add mrproper for development
selftests/nolibc: defconfig: remove mrproper target
Split the mrproper target out of defconfig, when with O=, mrproper is not
required by defconfig, but reserve it for the other use scenes.
* selftests/nolibc: string the core targets
Allow simply 'make run' instead of 'make defconfig; make extconfig;
make kernel; make run'.
* selftests/nolibc: allow quit qemu-system when poweroff fails
When poweroff fails, allow exit while detects the power off string
from output or the wait time is too long (specified by QEMU_TIMEOUT).
This helps the boards who have no poweroff support or the kernel not
enable the poweroff options (mainly for tinyconfig).
* selftests/nolibc: allow customize CROSS_COMPILE by architecture
* selftests/nolibc: customize CROSS_COMPILE for 32/64-bit powerpc
This further saves a CROSS_COMPILE option for 'make run', it is very
important when iterates all of the supported architectures and the
compilers are not just prefixed with the XARCH variable.
For example, binary of big endian powerpc64 can be compiled with
powerpc64le-linux-gnu-, but the prefix is powerpc64le.
Even if the pre-customized compiler not exist, we can configure
CROSS_COMPILE_<ARCH> before the test loop to use the code.
* selftests/nolibc: add tinyconfig target
selftests/nolibc: tinyconfig: add extra common options
selftests/nolibc: tinyconfig: add support for 32/64-bit powerpc
Here is the first architecture(and its variants) support tinyconfig.
powerpc is actually a very good architecture, for it has 'various'
variants for test.
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/lkml/cover.1687706332.git.falcon@tinylab.org/
[2]: https://lore.kernel.org/lkml/cover.1689713175.git.falcon@tinylab.org/
Zhangjin Wu (14):
selftests/nolibc: allow report with existing test log
selftests/nolibc: add macros to enhance maintainability
selftests/nolibc: print running log to screen
selftests/nolibc: fix up O= option support
selftests/nolibc: add menuconfig for development
selftests/nolibc: add mrproper for development
selftests/nolibc: defconfig: remove mrproper target
selftests/nolibc: string the core targets
selftests/nolibc: allow quit qemu-system when poweroff fails
selftests/nolibc: allow customize CROSS_COMPILE by architecture
selftests/nolibc: customize CROSS_COMPILE for 32/64-bit powerpc
selftests/nolibc: add tinyconfig target
selftests/nolibc: tinyconfig: add extra common options
selftests/nolibc: tinyconfig: add support for 32/64-bit powerpc
tools/testing/selftests/nolibc/Makefile | 102 ++++++++++++++----
.../selftests/nolibc/configs/common.config | 4 +
.../selftests/nolibc/configs/powerpc.config | 3 +
.../selftests/nolibc/configs/powerpc64.config | 3 +
.../nolibc/configs/powerpc64le.config | 4 +
5 files changed, 98 insertions(+), 18 deletions(-)
create mode 100644 tools/testing/selftests/nolibc/configs/common.config
create mode 100644 tools/testing/selftests/nolibc/configs/powerpc64.config
create mode 100644 tools/testing/selftests/nolibc/configs/powerpc64le.config
--
2.25.1
Submit the top-level headers also from the kunit test module notifier
initialization callback, so external tools that are parsing dmesg for
kunit test output are able to tell how many test suites should be expected
and whether to continue parsing after complete output from the first test
suite is collected.
Extend kunit module notifier initialization callback with a processing
path for only listing the tests provided by a module if the kunit action
parameter is set to "list", so external tools can obtain a list of test
cases to be executed in advance and can make a better job on assigning
kernel messages interleaved with kunit output to specific tests.
Use test filtering functions in kunit module notifier callback functions,
so external tools are able to execute individual test cases from kunit
test modules in order to still better isolate their potential impact on
kernel messages that appear interleaved with output from other tests.
Janusz Krzysztofik (3):
kunit: Report the count of test suites in a module
kunit: Make 'list' action available to kunit test modules
kunit: Allow kunit test modules to use test filtering
include/kunit/test.h | 14 +++++++++++
lib/kunit/executor.c | 51 ++++++++++++++++++++++-----------------
lib/kunit/test.c | 57 +++++++++++++++++++++++++++++++++++++++++++-
3 files changed, 99 insertions(+), 23 deletions(-)
--
2.41.0
Hi, Willy
Here is the powerpc support, includes 32-bit big-endian powerpc, 64-bit
little endian and big endian powerpc.
All of them passes run-user with the default powerpc toolchain from
Ubuntu 20.04:
$ make run-user DEFCONFIG=tinyconfig XARCH=powerpc CROSS_COMPILE=powerpc-linux-gnu- | grep status
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
$ make run-user DEFCONFIG=tinyconfig XARCH=powerpc64 CROSS_COMPILE=powerpc64le-linux-gnu- | grep status
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
$ make run-user DEFCONFIG=tinyconfig XARCH=powerpc64le CROSS_COMPILE=powerpc64le-linux-gnu- | grep status
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
For the slow 'run' target, I have run with defconfig before, and just
verified them via the fast tinyconfig + run with a new patch from next
patchset, all of them passes:
165 test(s): 156 passed, 9 skipped, 0 failed => status: warning
Note, the big endian crosstool powerpc64-linux-gcc from
https://mirrors.edge.kernel.org/pub/tools/crosstool/ has been used to
test both little endian and big endian powerpc64 too, both passed.
Here simply explain what they are:
* tools/nolibc: add support for powerpc
tools/nolibc: add support for powerpc64
32-bit & 64-bit powerpc support of nolibc.
* selftests/nolibc: select_null: fix up for big endian powerpc64
fix up a test case for big endian powerpc64.
* selftests/nolibc: add extra config file customize support
add extconfig target to allow enable extra config options via
configs/<ARCH>.config
applied suggestion from Thomas to use config files instead of config
lines.
* selftests/nolibc: add XARCH and ARCH mapping support
applied suggestions from Willy, use XARCH as the input of our nolibc
test, use ARCH as the pure kernel input, at last build the mapping
between XARCH and ARCH.
Customize the variables via the input XARCH.
* selftests/nolibc: add test support for powerpc
Require to use extconfig to enable the console options specified in
configs/powerpc.config
currently, we should manually run extconfig after defconfig, in next
patchset, we will do this automatically.
* selftests/nolibc: add test support for powerpc64le
selftests/nolibc: add test support for powerpc64
Very simple, but customize CFLAGS carefully to let them work with
powerpc64le-linux-gnu-gcc (from Linux distributions) and
powerpc64-linux-gcc (from mirrors.edge.kernel.org)
The next patchset will not be tinyconfig, but some prepare patches, will
be sent out soon.
Best regards,
Zhangjin
---
Zhangjin Wu (8):
tools/nolibc: add support for powerpc
tools/nolibc: add support for powerpc64
selftests/nolibc: select_null: fix up for big endian powerpc64
selftests/nolibc: add extra config file customize support
selftests/nolibc: add XARCH and ARCH mapping support
selftests/nolibc: add test support for powerpc
selftests/nolibc: add test support for powerpc64le
selftests/nolibc: add test support for powerpc64
tools/include/nolibc/arch-powerpc.h | 170 ++++++++++++++++++
tools/testing/selftests/nolibc/Makefile | 55 ++++--
.../selftests/nolibc/configs/powerpc.config | 3 +
tools/testing/selftests/nolibc/nolibc-test.c | 2 +-
4 files changed, 217 insertions(+), 13 deletions(-)
create mode 100644 tools/include/nolibc/arch-powerpc.h
create mode 100644 tools/testing/selftests/nolibc/configs/powerpc.config
--
2.25.1
If the test description is longer than the status alignment the
parameter 'n' to putcharn() would lead to a signed underflow that then
gets converted to a very large unsigned value.
This in turn leads out-of-bound writes in memset() crashing the
application.
The failure case of EXPECT_PTRER() used in "mmap_bad" exhibits this
exact behavior.
Fixes: 8a27526f49f9 ("selftests/nolibc: add EXPECT_PTREQ, EXPECT_PTRNE and EXPECT_PTRER")
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
tools/testing/selftests/nolibc/nolibc-test.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/nolibc/nolibc-test.c b/tools/testing/selftests/nolibc/nolibc-test.c
index 03b1d30f5507..9b76603e4ce3 100644
--- a/tools/testing/selftests/nolibc/nolibc-test.c
+++ b/tools/testing/selftests/nolibc/nolibc-test.c
@@ -151,7 +151,8 @@ static void result(int llen, enum RESULT r)
else
msg = "[FAIL]";
- putcharn(' ', 64 - llen);
+ if (llen < 64)
+ putcharn(' ', 64 - llen);
puts(msg);
}
---
base-commit: dfef4fc45d5713eb23d87f0863aff9c33bd4bfaf
change-id: 20230726-nolibc-result-width-1f4b0b4f3ca0
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
=== Context ===
In the context of a middlebox, fragmented packets are tricky to handle.
The full 5-tuple of a packet is often only available in the first
fragment which makes enforcing consistent policy difficult. There are
really only two stateless options, neither of which are very nice:
1. Enforce policy on first fragment and accept all subsequent fragments.
This works but may let in certain attacks or allow data exfiltration.
2. Enforce policy on first fragment and drop all subsequent fragments.
This does not really work b/c some protocols may rely on
fragmentation. For example, DNS may rely on oversized UDP packets for
large responses.
So stateful tracking is the only sane option. RFC 8900 [0] calls this
out as well in section 6.3:
Middleboxes [...] should process IP fragments in a manner that is
consistent with [RFC0791] and [RFC8200]. In many cases, middleboxes
must maintain state in order to achieve this goal.
=== BPF related bits ===
Policy has traditionally been enforced from XDP/TC hooks. Both hooks
run before kernel reassembly facilities. However, with the new
BPF_PROG_TYPE_NETFILTER, we can rather easily hook into existing
netfilter reassembly infra.
The basic idea is we bump a refcnt on the netfilter defrag module and
then run the bpf prog after the defrag module runs. This allows bpf
progs to transparently see full, reassembled packets. The nice thing
about this is that progs don't have to carry around logic to detect
fragments.
=== Changelog ===
Changes from v5:
* Fix defrag disable codepaths
Changes from v4:
* Refactor module handling code to not sleep in rcu_read_lock()
* Also unify the v4 and v6 hook structs so they can share codepaths
* Fixed some checkpatch.pl formatting warnings
Changes from v3:
* Correctly initialize `addrlen` stack var for recvmsg()
Changes from v2:
* module_put() if ->enable() fails
* Fix CI build errors
Changes from v1:
* Drop bpf_program__attach_netfilter() patches
* static -> static const where appropriate
* Fix callback assignment order during registration
* Only request_module() if callbacks are missing
* Fix retval when modprobe fails in userspace
* Fix v6 defrag module name (nf_defrag_ipv6_hooks -> nf_defrag_ipv6)
* Simplify priority checking code
* Add warning if module doesn't assign callbacks in the future
* Take refcnt on module while defrag link is active
[0]: https://datatracker.ietf.org/doc/html/rfc8900
Daniel Xu (5):
netfilter: defrag: Add glue hooks for enabling/disabling defrag
netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
bpf: selftests: Support not connecting client socket
bpf: selftests: Support custom type and proto for client sockets
bpf: selftests: Add defrag selftests
include/linux/netfilter.h | 10 +
include/uapi/linux/bpf.h | 5 +
net/ipv4/netfilter/nf_defrag_ipv4.c | 17 +-
net/ipv6/netfilter/nf_defrag_ipv6_hooks.c | 11 +
net/netfilter/core.c | 6 +
net/netfilter/nf_bpf_link.c | 123 +++++++-
tools/include/uapi/linux/bpf.h | 5 +
tools/testing/selftests/bpf/Makefile | 4 +-
.../selftests/bpf/generate_udp_fragments.py | 90 ++++++
.../selftests/bpf/ip_check_defrag_frags.h | 57 ++++
tools/testing/selftests/bpf/network_helpers.c | 26 +-
tools/testing/selftests/bpf/network_helpers.h | 3 +
.../bpf/prog_tests/ip_check_defrag.c | 283 ++++++++++++++++++
.../selftests/bpf/progs/ip_check_defrag.c | 104 +++++++
14 files changed, 718 insertions(+), 26 deletions(-)
create mode 100755 tools/testing/selftests/bpf/generate_udp_fragments.py
create mode 100644 tools/testing/selftests/bpf/ip_check_defrag_frags.h
create mode 100644 tools/testing/selftests/bpf/prog_tests/ip_check_defrag.c
create mode 100644 tools/testing/selftests/bpf/progs/ip_check_defrag.c
--
2.41.0
There are use cases that need to apply DAMOS schemes to specific address
ranges or DAMON monitoring targets. NUMA nodes in the physical address
space, special memory objects in the virtual address space, and
monitoring target specific efficient monitoring results snapshot
retrieval could be examples of such use cases. This patchset extends
DAMOS filters feature for such cases, by implementing two more filter
types, namely address ranges and DAMON monitoring types.
Patches sequence
----------------
The first seven patches are for the address ranges based DAMOS filter.
The first patch implements the filter feature and expose it via DAMON
kernel API. The second patch further expose the feature to users via
DAMON sysfs interface. The third and fourth patches implement unit
tests and selftests for the feature. Three patches (fifth to seventh)
updating the documents follow.
The following six patches are for the DAMON monitoring target based
DAMOS filter. The eighth patch implements the feature in the core layer
and expose it via DAMON's kernel API. The ninth patch further expose it
to users via DAMON sysfs interface. Tenth patch add a selftest, and two
patches (eleventh and twelfth) update documents.
SeongJae Park (13):
mm/damon/core: introduce address range type damos filter
mm/damon/sysfs-schemes: support address range type DAMOS filter
mm/damon/core-test: add a unit test for __damos_filter_out()
selftests/damon/sysfs: test address range damos filter
Docs/mm/damon/design: update for address range filters
Docs/ABI/damon: update for address range DAMOS filter
Docs/admin-guide/mm/damon/usage: update for address range type DAMOS
filter
mm/damon/core: implement target type damos filter
mm/damon/sysfs-schemes: support target damos filter
selftests/damon/sysfs: test damon_target filter
Docs/mm/damon/design: update for DAMON monitoring target type DAMOS
filter
Docs/ABI/damon: update for DAMON monitoring target type DAMOS filter
Docs/admin-guide/mm/damon/usage: update for DAMON monitoring target
type DAMOS filter
.../ABI/testing/sysfs-kernel-mm-damon | 27 +++++-
Documentation/admin-guide/mm/damon/usage.rst | 34 +++++---
Documentation/mm/damon/design.rst | 24 ++++--
include/linux/damon.h | 28 +++++--
mm/damon/core-test.h | 61 ++++++++++++++
mm/damon/core.c | 62 ++++++++++++++
mm/damon/sysfs-schemes.c | 83 +++++++++++++++++++
tools/testing/selftests/damon/sysfs.sh | 5 ++
8 files changed, 299 insertions(+), 25 deletions(-)
--
2.25.1
The tried_regions directory of DAMON sysfs interface is useful for
retrieving monitoring results snapshot or DAMOS debugging. However, for
common use case that need to monitor only the total size of the scheme
tried regions (e.g., monitoring working set size), the kernel overhead
for directory construction and user overhead for reading the content
could be high if the number of monitoring region is not small. This
patchset implements DAMON sysfs files for efficient support of the use
case.
The first patch implements the sysfs file to reduce the user space
overhead, and the second patch implements a command for reducing the
kernel space overhead.
The third patch adds a selftest for the new file, and following two
patches update documents.
SeongJae Park (5):
mm/damon/sysfs-schemes: implement DAMOS tried total bytes file
mm/damon/sysfs: implement a command for updating only schemes tried
total bytes
selftests/damon/sysfs: test tried_regions/total_bytes file
Docs/ABI/damon: update for tried_regions/total_bytes
Docs/admin-guide/mm/damon/usage: update for tried_regions/total_bytes
.../ABI/testing/sysfs-kernel-mm-damon | 13 +++++-
Documentation/admin-guide/mm/damon/usage.rst | 42 ++++++++++++-------
mm/damon/sysfs-common.h | 2 +-
mm/damon/sysfs-schemes.c | 24 ++++++++++-
mm/damon/sysfs.c | 26 +++++++++---
tools/testing/selftests/damon/sysfs.sh | 1 +
6 files changed, 83 insertions(+), 25 deletions(-)
--
2.25.1
Events Tracing infrastructure contains lot of files, directories
(internally in terms of inodes, dentries). And ends up by consuming
memory in MBs. We can have multiple events of Events Tracing, which
further requires more memory.
Instead of creating inodes/dentries, eventfs could keep meta-data and
skip the creation of inodes/dentries. As and when require, eventfs will
create the inodes/dentries only for required files/directories.
Also eventfs would delete the inodes/dentries once no more requires
but preserve the meta data.
Tracing events took ~9MB, with this approach it took ~4.5MB
for ~10K files/dir.
Diff from v5:
Patch 02: removed TRACEFS_EVENT_INODE enum.
Patch 04: added TRACEFS_EVENT_INODE enum.
Patch 06: removed WARN_ON_ONCE in eventfs_set_ef_status_free()
Patch 07: added WARN_ON_ONCE in create_dentry()
moved declaration of following to internal.h:
eventfs_start_creating()
eventfs_failed_creating()
eventfs_end_creating()
Patch 08: added WARN_ON_ONCE in eventfs_set_ef_status_free()
Diff from v4:
Patch 02: moved from v4 08/10
added fs/tracefs/internal.h
Patch 03: moved from v4 02/10
removed fs/tracefs/internal.h
Patch 04: moved from v4 03/10
moved out changes of fs/tracefs/internal.h
Patch 05: moved from v4 04/10
renamed eventfs_add_top_file() -> eventfs_add_events_file()
Patch 06: moved from v4 07/10
implemented create_dentry() helper function
added create_file(), create_dir() stub function
Patch 07: moved from v4 06/10
Patch 08: moved from v4 05/10
improved eventfs remove functionality
Patch 09: removed unwanted if conditions
Patch 10: added available_filter_functions check
Diff from v3:
Patch 3,4,5,7,9:
removed all the eventfs_rwsem code and replaced it with an srcu
lock for the readers, and a mutex to synchronize the writers of
the list.
Patch 2: moved 'tracefs_inode' and 'get_tracefs()' to v4 03/10
Patch 3: moved the struct eventfs_file and eventfs_inode into event_inode.c
as it really should not be exposed to all users.
Patch 5: added a recursion check to eventfs_remove_rec() as it is really
dangerous to have unchecked recursion in the kernel (we do have
a fixed size stack).
have the free use srcu callbacks. After the srcu grace periods
are done, it adds the eventfs_file onto a llist (lockless link
list) and wakes up a work queue. Then the work queue does the
freeing (this needs to be done in task/workqueue context, as
srcu callbacks are done in softirq context).
Patch 6: renamed:
eventfs_create_file() -> create_file()
eventfs_create_dir() -> create_dir()
Diff from v2:
Patch 01: new patch:'Require all trace events to have a TRACE_SYSTEM'
Patch 02: moved from v1 1/9
Patch 03: moved from v1 2/9
As suggested by Zheng Yejian, introduced eventfs_prepare_ef()
helper function to add files or directories to eventfs
fix WARNING reported by kernel test robot in v1 8/9
Patch 04: moved from v1 3/9
used eventfs_prepare_ef() to add files
fix WARNING reported by kernel test robot in v1 8/9
Patch 05: moved from v1 4/9
fix compiling warning reported by kernel test robot in v1 4/9
Patch 06: moved from v1 5/9
Patch 07: moved from v1 6/9
Patch 08: moved from v1 7/9
Patch 09: moved from v1 8/9
rebased because of v3 01/10
Patch 10: moved from v1 9/9
Diff from v1:
Patch 1: add header file
Patch 2: resolved kernel test robot issues
protecting eventfs lists using nested eventfs_rwsem
Patch 3: protecting eventfs lists using nested eventfs_rwsem
Patch 4: improve events cleanup code to fix crashes
Patch 5: resolved kernel test robot issues
removed d_instantiate_anon() calls
Patch 6: resolved kernel test robot issues
fix kprobe test in eventfs_root_lookup()
protecting eventfs lists using nested eventfs_rwsem
Patch 7: remove header file
Patch 8: pass eventfs_rwsem as argument to eventfs functions
called eventfs_remove_events_dir() instead of tracefs_remove()
from event_trace_del_tracer()
Patch 9: new patch to fix kprobe test case
fs/tracefs/Makefile | 1 +
fs/tracefs/event_inode.c | 801 ++++++++++++++++++
fs/tracefs/inode.c | 151 +++-
fs/tracefs/internal.h | 29 +
include/linux/trace_events.h | 1 +
include/linux/tracefs.h | 23 +
kernel/trace/trace.h | 2 +-
kernel/trace/trace_events.c | 76 +-
.../ftrace/test.d/kprobe/kprobe_args_char.tc | 9 +-
.../test.d/kprobe/kprobe_args_string.tc | 9 +-
10 files changed, 1050 insertions(+), 52 deletions(-)
create mode 100644 fs/tracefs/event_inode.c
create mode 100644 fs/tracefs/internal.h
--
2.39.0
[ This series depends on the VFIO device cdev series ]
Changelog
v11:
* Added Reviewed-by from Kevin
* Dropped 'rc' in iommufd_access_detach()
* Dropped a duplicated IS_ERR check at new_ioas
* Separate into a new patch the change in iommufd_access_destroy_object
v10:
https://lore.kernel.org/all/cover.1690488745.git.nicolinc@nvidia.com/
* Added Reviewed-by from Jason
* Replaced the iommufd_ref_to_user call with refcount_inc
* Added a wrapper iommufd_access_change_ioas_id and used it in the
iommufd_access_attach() and iommufd_access_replace() APIs
v9:
https://lore.kernel.org/all/cover.1690440730.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd for-next tree
* Added Reviewed-by from Jason and Alex
* Reworked the replace API patches
* Added a new patch allowing passing in to iopt_remove_access
* Added a new patch of a helper function following Jason's design,
mainly by blocking any concurrent detach/replace and keeping the
refcount_dec at the end of the function
* Added a call of the new helper in iommufd_access_destroy_object()
to reduce race condition
* Simplified the replace API patch
v8:
https://lore.kernel.org/all/cover.1690226015.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt series and then cdev v15 series:
https://lore.kernel.org/all/0-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.…https://lore.kernel.org/kvm/20230718135551.6592-1-yi.l.liu@intel.com/
* Changed the order of detach() and attach() in replace(), to fix a bug
v7:
https://lore.kernel.org/all/cover.1683593831.git.nicolinc@nvidia.com/
* Rebased on top of v6.4-rc1 and cdev v11 candidate
* Fixed a wrong file in replace() API patch
* Added Kevin's "Reviewed-by" to replace() API patch
v6:
https://lore.kernel.org/all/cover.1679939952.git.nicolinc@nvidia.com/
* Rebased on top of cdev v8 series
https://lore.kernel.org/kvm/20230327094047.47215-1-yi.l.liu@intel.com/
* Added "Reviewed-by" from Kevin to PATCH-4
* Squashed access->ioas updating lines into iommufd_access_change_pt(),
and changed function return type accordingly for simplification.
v5:
https://lore.kernel.org/all/cover.1679559476.git.nicolinc@nvidia.com/
* Kept the cmd->id in the iommufd_test_create_access() so the access can
be created with an ioas by default. Then, renamed the previous ioctl
IOMMU_TEST_OP_ACCESS_SET_IOAS to IOMMU_TEST_OP_ACCESS_REPLACE_IOAS, so
it would be used to replace an access->ioas pointer.
* Added iommufd_access_replace() API after the introductions of the other
two APIs iommufd_access_attach() and iommufd_access_detach().
* Since vdev->iommufd_attached is also set in emulated pathway too, call
iommufd_access_update(), similar to the physical pathway.
v4:
https://lore.kernel.org/all/cover.1678284812.git.nicolinc@nvidia.com/
* Rebased on top of Jason's series adding replace() and hwpt_alloc()
https://lore.kernel.org/all/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia…
* Rebased on top of cdev series v6
https://lore.kernel.org/kvm/20230308132903.465159-1-yi.l.liu@intel.com/
* Dropped the patch that's moved to cdev series.
* Added unmap function pointer sanity before calling it.
* Added "Reviewed-by" from Kevin and Yi.
* Added back the VFIO change updating the ATTACH uAPI.
v3:
https://lore.kernel.org/all/cover.1677288789.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt branch:
https://lore.kernel.org/all/0-v2-406f7ac07936+6a-iommufd_hwpt_jgg@nvidia.co…
* Dropped patches from this series accordingly. There were a couple of
VFIO patches that will be submitted after the VFIO cdev series. Also,
renamed the series to be "emulated".
* Moved dma_unmap sanity patch to the first in the series.
* Moved dma_unmap sanity to cover both VFIO and IOMMUFD pathways.
* Added Kevin's "Reviewed-by" to two of the patches.
* Fixed a NULL pointer bug in vfio_iommufd_emulated_bind().
* Moved unmap() call to the common place in iommufd_access_set_ioas().
v2:
https://lore.kernel.org/all/cover.1675802050.git.nicolinc@nvidia.com/
* Rebased on top of vfio_device cdev v2 series.
* Update the kdoc and commit message of iommu_group_replace_domain().
* Dropped revert-to-core-domain part in iommu_group_replace_domain().
* Dropped !ops->dma_unmap check in vfio_iommufd_emulated_attach_ioas().
* Added missing rc value in vfio_iommufd_emulated_attach_ioas() from the
iommufd_access_set_ioas() call.
* Added a new patch in vfio_main to deny vfio_pin/unpin_pages() calls if
vdev->ops->dma_unmap is not implemented.
* Added a __iommmufd_device_detach helper and let the replace routine do
a partial detach().
* Added restriction on auto_domains to use the replace feature.
* Added the patch "iommufd/device: Make hwpt_list list_add/del symmetric"
from the has_group removal series.
v1:
https://lore.kernel.org/all/cover.1675320212.git.nicolinc@nvidia.com/
Hi all,
The existing IOMMU APIs provide a pair of functions: iommu_attach_group()
for callers to attach a device from the default_domain (NULL if not being
supported) to a given iommu domain, and iommu_detach_group() for callers
to detach a device from a given domain to the default_domain. Internally,
the detach_dev op is deprecated for the newer drivers with default_domain.
This means that those drivers likely can switch an attaching domain to
another one, without stagging the device at a blocking or default domain,
for use cases such as:
1) vPASID mode, when a guest wants to replace a single pasid (PASID=0)
table with a larger table (PASID=N)
2) Nesting mode, when switching the attaching device from an S2 domain
to an S1 domain, or when switching between relevant S1 domains.
This series is rebased on top of Jason Gunthorpe's series that introduces
iommu_group_replace_domain API and IOMMUFD infrastructure for the IOMMUFD
"physical" devices. The IOMMUFD "emulated" deivces will need some extra
steps to replace the access->ioas object and its iopt pointer.
You can also find this series on Github:
https://github.com/nicolinc/iommufd/commits/iommu_group_replace_domain-v11
Thank you
Nicolin Chen
Nicolin Chen (7):
vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages()
iommufd: Allow passing in iopt_access_list_id to iopt_remove_access()
iommufd: Add iommufd_access_change_ioas(_id) helpers
iommufd: Use iommufd_access_change_ioas in
iommufd_access_destroy_object
iommufd: Add iommufd_access_replace() API
iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage
vfio: Support IO page table replacement
drivers/iommu/iommufd/device.c | 128 ++++++++++++------
drivers/iommu/iommufd/io_pagetable.c | 6 +-
drivers/iommu/iommufd/iommufd_private.h | 3 +-
drivers/iommu/iommufd/iommufd_test.h | 4 +
drivers/iommu/iommufd/selftest.c | 19 +++
drivers/vfio/iommufd.c | 11 +-
drivers/vfio/vfio_main.c | 4 +
include/linux/iommufd.h | 1 +
include/uapi/linux/vfio.h | 6 +
tools/testing/selftests/iommu/iommufd.c | 29 +++-
tools/testing/selftests/iommu/iommufd_utils.h | 19 +++
11 files changed, 179 insertions(+), 51 deletions(-)
--
2.41.0
A missing break in kms_tests leads to kselftest hang when the
parameter -s is used.
In current code flow because of missing break in -s, -t parses
args spilled from -s and as -t accepts only valid values as 0,1
so any arg in -s >1 or <0, gets in ksm_test failure
This went undetected since, before the addition of option -t,
the next case -M would immediately break out of the switch
statement but that is no longer the case
Add the missing break statement.
----Before----
./ksm_tests -H -s 100
Invalid merge type
----After----
./ksm_tests -H -s 100
Number of normal pages: 0
Number of huge pages: 50
Total size: 100 MiB
Total time: 0.401732682 s
Average speed: 248.922 MiB/s
Fixes: 9e7cb94ca218 ("selftests: vm: add KSM merging time test")
Signed-off-by: Ayush Jain <ayush.jain3(a)amd.com>
---
tools/testing/selftests/mm/ksm_tests.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mm/ksm_tests.c b/tools/testing/selftests/mm/ksm_tests.c
index 435acebdc325..380b691d3eb9 100644
--- a/tools/testing/selftests/mm/ksm_tests.c
+++ b/tools/testing/selftests/mm/ksm_tests.c
@@ -831,6 +831,7 @@ int main(int argc, char *argv[])
printf("Size must be greater than 0\n");
return KSFT_FAIL;
}
+ break;
case 't':
{
int tmp = atoi(optarg);
--
2.34.1
[ This series depends on the VFIO device cdev series ]
Changelog
v8:
* Rebased on top of Jason's iommufd_hwpt series and then cdev v15 series:
https://lore.kernel.org/all/0-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.…https://lore.kernel.org/kvm/20230718135551.6592-1-yi.l.liu@intel.com/
* Changed the order of detach() and attach() in replace(), to fix a bug
v7:
https://lore.kernel.org/all/cover.1683593831.git.nicolinc@nvidia.com/
* Rebased on top of v6.4-rc1 and cdev v11 candidate
* Fixed a wrong file in replace() API patch
* Added Kevin's "Reviewed-by" to replace() API patch
v6:
https://lore.kernel.org/all/cover.1679939952.git.nicolinc@nvidia.com/
* Rebased on top of cdev v8 series
https://lore.kernel.org/kvm/20230327094047.47215-1-yi.l.liu@intel.com/
* Added "Reviewed-by" from Kevin to PATCH-4
* Squashed access->ioas updating lines into iommufd_access_change_pt(),
and changed function return type accordingly for simplification.
v5:
https://lore.kernel.org/all/cover.1679559476.git.nicolinc@nvidia.com/
* Kept the cmd->id in the iommufd_test_create_access() so the access can
be created with an ioas by default. Then, renamed the previous ioctl
IOMMU_TEST_OP_ACCESS_SET_IOAS to IOMMU_TEST_OP_ACCESS_REPLACE_IOAS, so
it would be used to replace an access->ioas pointer.
* Added iommufd_access_replace() API after the introductions of the other
two APIs iommufd_access_attach() and iommufd_access_detach().
* Since vdev->iommufd_attached is also set in emulated pathway too, call
iommufd_access_update(), similar to the physical pathway.
v4:
https://lore.kernel.org/all/cover.1678284812.git.nicolinc@nvidia.com/
* Rebased on top of Jason's series adding replace() and hwpt_alloc()
https://lore.kernel.org/all/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia…
* Rebased on top of cdev series v6
https://lore.kernel.org/kvm/20230308132903.465159-1-yi.l.liu@intel.com/
* Dropped the patch that's moved to cdev series.
* Added unmap function pointer sanity before calling it.
* Added "Reviewed-by" from Kevin and Yi.
* Added back the VFIO change updating the ATTACH uAPI.
v3:
https://lore.kernel.org/all/cover.1677288789.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt branch:
https://lore.kernel.org/all/0-v2-406f7ac07936+6a-iommufd_hwpt_jgg@nvidia.co…
* Dropped patches from this series accordingly. There were a couple of
VFIO patches that will be submitted after the VFIO cdev series. Also,
renamed the series to be "emulated".
* Moved dma_unmap sanity patch to the first in the series.
* Moved dma_unmap sanity to cover both VFIO and IOMMUFD pathways.
* Added Kevin's "Reviewed-by" to two of the patches.
* Fixed a NULL pointer bug in vfio_iommufd_emulated_bind().
* Moved unmap() call to the common place in iommufd_access_set_ioas().
v2:
https://lore.kernel.org/all/cover.1675802050.git.nicolinc@nvidia.com/
* Rebased on top of vfio_device cdev v2 series.
* Update the kdoc and commit message of iommu_group_replace_domain().
* Dropped revert-to-core-domain part in iommu_group_replace_domain().
* Dropped !ops->dma_unmap check in vfio_iommufd_emulated_attach_ioas().
* Added missing rc value in vfio_iommufd_emulated_attach_ioas() from the
iommufd_access_set_ioas() call.
* Added a new patch in vfio_main to deny vfio_pin/unpin_pages() calls if
vdev->ops->dma_unmap is not implemented.
* Added a __iommmufd_device_detach helper and let the replace routine do
a partial detach().
* Added restriction on auto_domains to use the replace feature.
* Added the patch "iommufd/device: Make hwpt_list list_add/del symmetric"
from the has_group removal series.
v1:
https://lore.kernel.org/all/cover.1675320212.git.nicolinc@nvidia.com/
Hi all,
The existing IOMMU APIs provide a pair of functions: iommu_attach_group()
for callers to attach a device from the default_domain (NULL if not being
supported) to a given iommu domain, and iommu_detach_group() for callers
to detach a device from a given domain to the default_domain. Internally,
the detach_dev op is deprecated for the newer drivers with default_domain.
This means that those drivers likely can switch an attaching domain to
another one, without stagging the device at a blocking or default domain,
for use cases such as:
1) vPASID mode, when a guest wants to replace a single pasid (PASID=0)
table with a larger table (PASID=N)
2) Nesting mode, when switching the attaching device from an S2 domain
to an S1 domain, or when switching between relevant S1 domains.
This series is rebased on top of Jason Gunthorpe's series that introduces
iommu_group_replace_domain API and IOMMUFD infrastructure for the IOMMUFD
"physical" devices. The IOMMUFD "emulated" deivces will need some extra
steps to replace the access->ioas object and its iopt pointer.
You can also find this series on Github:
https://github.com/nicolinc/iommufd/commits/iommu_group_replace_domain-v8
Thank you
Nicolin Chen
Nicolin Chen (4):
vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages()
iommufd: Add iommufd_access_replace() API
iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage
vfio: Support IO page table replacement
drivers/iommu/iommufd/device.c | 72 ++++++++++++++-----
drivers/iommu/iommufd/iommufd_test.h | 4 ++
drivers/iommu/iommufd/selftest.c | 19 +++++
drivers/vfio/iommufd.c | 11 +--
drivers/vfio/vfio_main.c | 4 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/vfio.h | 6 ++
tools/testing/selftests/iommu/iommufd.c | 29 +++++++-
tools/testing/selftests/iommu/iommufd_utils.h | 19 +++++
9 files changed, 142 insertions(+), 23 deletions(-)
--
2.41.0
PTP_SYS_OFFSET_EXTENDED was added in November 2018 in
361800876f80 (" ptp: add PTP_SYS_OFFSET_EXTENDED ioctl")
and PTP_SYS_OFFSET_PRECISE was added in February 2016 in
719f1aa4a671 ("ptp: Add PTP_SYS_OFFSET_PRECISE for driver crosstimestamping")
The PTP selftest code is lacking support for these two IOCTLS.
This short series of patches adds support for them.
Changes in v2:
- Fixed rebase issues (v1 somehow ended up with patch 1 being from the
first manual split of my changes and patch 2 being from rebase 2 out
of 3)
- Rebased on top of net-next
Alex Maftei (2):
selftests/ptp: Add -x option for testing PTP_SYS_OFFSET_EXTENDED
selftests/ptp: Add -X option for testing PTP_SYS_OFFSET_PRECISE
tools/testing/selftests/ptp/testptp.c | 73 ++++++++++++++++++++++++++-
1 file changed, 71 insertions(+), 2 deletions(-)
--
2.25.1
[ This series depends on the VFIO device cdev series ]
Changelog
v10:
* Added Reviewed-by from Jason
* Replaced the iommufd_ref_to_user call with refcount_inc
* Added a wrapper iommufd_access_change_ioas_id and used it in the
iommufd_access_attach() and iommufd_access_replace() APIs
v9:
https://lore.kernel.org/linux-iommu/cover.1690440730.git.nicolinc@nvidia.co…
* Rebased on top of Jason's iommufd for-next tree
* Added Reviewed-by from Jason and Alex
* Reworked the replace API patches
* Added a new patch allowing passing in to iopt_remove_access
* Added a new patch of a helper function following Jason's design,
mainly by blocking any concurrent detach/replace and keeping the
refcount_dec at the end of the function
* Added a call of the new helper in iommufd_access_destroy_object()
to reduce race condition
* Simplified the replace API patch
v8:
https://lore.kernel.org/all/cover.1690226015.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt series and then cdev v15 series:
https://lore.kernel.org/all/0-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.…https://lore.kernel.org/kvm/20230718135551.6592-1-yi.l.liu@intel.com/
* Changed the order of detach() and attach() in replace(), to fix a bug
v7:
https://lore.kernel.org/all/cover.1683593831.git.nicolinc@nvidia.com/
* Rebased on top of v6.4-rc1 and cdev v11 candidate
* Fixed a wrong file in replace() API patch
* Added Kevin's "Reviewed-by" to replace() API patch
v6:
https://lore.kernel.org/all/cover.1679939952.git.nicolinc@nvidia.com/
* Rebased on top of cdev v8 series
https://lore.kernel.org/kvm/20230327094047.47215-1-yi.l.liu@intel.com/
* Added "Reviewed-by" from Kevin to PATCH-4
* Squashed access->ioas updating lines into iommufd_access_change_pt(),
and changed function return type accordingly for simplification.
v5:
https://lore.kernel.org/all/cover.1679559476.git.nicolinc@nvidia.com/
* Kept the cmd->id in the iommufd_test_create_access() so the access can
be created with an ioas by default. Then, renamed the previous ioctl
IOMMU_TEST_OP_ACCESS_SET_IOAS to IOMMU_TEST_OP_ACCESS_REPLACE_IOAS, so
it would be used to replace an access->ioas pointer.
* Added iommufd_access_replace() API after the introductions of the other
two APIs iommufd_access_attach() and iommufd_access_detach().
* Since vdev->iommufd_attached is also set in emulated pathway too, call
iommufd_access_update(), similar to the physical pathway.
v4:
https://lore.kernel.org/all/cover.1678284812.git.nicolinc@nvidia.com/
* Rebased on top of Jason's series adding replace() and hwpt_alloc()
https://lore.kernel.org/all/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia…
* Rebased on top of cdev series v6
https://lore.kernel.org/kvm/20230308132903.465159-1-yi.l.liu@intel.com/
* Dropped the patch that's moved to cdev series.
* Added unmap function pointer sanity before calling it.
* Added "Reviewed-by" from Kevin and Yi.
* Added back the VFIO change updating the ATTACH uAPI.
v3:
https://lore.kernel.org/all/cover.1677288789.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt branch:
https://lore.kernel.org/all/0-v2-406f7ac07936+6a-iommufd_hwpt_jgg@nvidia.co…
* Dropped patches from this series accordingly. There were a couple of
VFIO patches that will be submitted after the VFIO cdev series. Also,
renamed the series to be "emulated".
* Moved dma_unmap sanity patch to the first in the series.
* Moved dma_unmap sanity to cover both VFIO and IOMMUFD pathways.
* Added Kevin's "Reviewed-by" to two of the patches.
* Fixed a NULL pointer bug in vfio_iommufd_emulated_bind().
* Moved unmap() call to the common place in iommufd_access_set_ioas().
v2:
https://lore.kernel.org/all/cover.1675802050.git.nicolinc@nvidia.com/
* Rebased on top of vfio_device cdev v2 series.
* Update the kdoc and commit message of iommu_group_replace_domain().
* Dropped revert-to-core-domain part in iommu_group_replace_domain().
* Dropped !ops->dma_unmap check in vfio_iommufd_emulated_attach_ioas().
* Added missing rc value in vfio_iommufd_emulated_attach_ioas() from the
iommufd_access_set_ioas() call.
* Added a new patch in vfio_main to deny vfio_pin/unpin_pages() calls if
vdev->ops->dma_unmap is not implemented.
* Added a __iommmufd_device_detach helper and let the replace routine do
a partial detach().
* Added restriction on auto_domains to use the replace feature.
* Added the patch "iommufd/device: Make hwpt_list list_add/del symmetric"
from the has_group removal series.
v1:
https://lore.kernel.org/all/cover.1675320212.git.nicolinc@nvidia.com/
Hi all,
The existing IOMMU APIs provide a pair of functions: iommu_attach_group()
for callers to attach a device from the default_domain (NULL if not being
supported) to a given iommu domain, and iommu_detach_group() for callers
to detach a device from a given domain to the default_domain. Internally,
the detach_dev op is deprecated for the newer drivers with default_domain.
This means that those drivers likely can switch an attaching domain to
another one, without stagging the device at a blocking or default domain,
for use cases such as:
1) vPASID mode, when a guest wants to replace a single pasid (PASID=0)
table with a larger table (PASID=N)
2) Nesting mode, when switching the attaching device from an S2 domain
to an S1 domain, or when switching between relevant S1 domains.
This series is rebased on top of Jason Gunthorpe's series that introduces
iommu_group_replace_domain API and IOMMUFD infrastructure for the IOMMUFD
"physical" devices. The IOMMUFD "emulated" deivces will need some extra
steps to replace the access->ioas object and its iopt pointer.
You can also find this series on Github:
https://github.com/nicolinc/iommufd/commits/iommu_group_replace_domain-v10
Thank you
Nicolin Chen
Nicolin Chen (6):
vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages()
iommufd: Allow passing in iopt_access_list_id to iopt_remove_access()
iommufd: Add iommufd_access_change_ioas(_id) helpers
iommufd: Add iommufd_access_replace() API
iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage
vfio: Support IO page table replacement
drivers/iommu/iommufd/device.c | 132 ++++++++++++------
drivers/iommu/iommufd/io_pagetable.c | 6 +-
drivers/iommu/iommufd/iommufd_private.h | 3 +-
drivers/iommu/iommufd/iommufd_test.h | 4 +
drivers/iommu/iommufd/selftest.c | 19 +++
drivers/vfio/iommufd.c | 11 +-
drivers/vfio/vfio_main.c | 4 +
include/linux/iommufd.h | 1 +
include/uapi/linux/vfio.h | 6 +
tools/testing/selftests/iommu/iommufd.c | 29 +++-
tools/testing/selftests/iommu/iommufd_utils.h | 19 +++
11 files changed, 184 insertions(+), 50 deletions(-)
--
2.41.0
Make sv48 the default address space for mmap as some applications
currently depend on this assumption. Users can now select a
desired address space using a non-zero hint address to mmap. Previously,
requesting the default address space from mmap by passing zero as the hint
address would result in using the largest address space possible. Some
applications depend on empty bits in the virtual address space, like Go and
Java, so this patch provides more flexibility for application developers.
-Charlie
---
v7:
- Changing RLIMIT_STACK inside of an executing program does not trigger
arch_pick_mmap_layout(), so rewrite tests to change RLIMIT_STACK from a
script before executing tests. RLIMIT_STACK of infinity forces bottomup
mmap allocation.
- Make arch_get_mmap_base macro more readible by extracting out the rnd
calculation.
- Use MMAP_MIN_VA_BITS in TASK_UNMAPPED_BASE to support case when mmap
attempts to allocate address smaller than DEFAULT_MAP_WINDOW.
- Fix incorrect wording in documentation.
v6:
- Rebase onto the correct base
v5:
- Minor wording change in documentation
- Change some parenthesis in arch_get_mmap_ macros
- Added case for addr==0 in arch_get_mmap_ because without this, programs would
crash if RLIMIT_STACK was modified before executing the program. This was
tested using the libhugetlbfs tests.
v4:
- Split testcases/document patch into test cases, in-code documentation, and
formal documentation patches
- Modified the mmap_base macro to be more legible and better represent memory
layout
- Fixed documentation to better reflect the implmentation
- Renamed DEFAULT_VA_BITS to MMAP_VA_BITS
- Added additional test case for rlimit changes
---
Charlie Jenkins (4):
RISC-V: mm: Restrict address space for sv39,sv48,sv57
RISC-V: mm: Add tests for RISC-V mm
RISC-V: mm: Update pgtable comment documentation
RISC-V: mm: Document mmap changes
Documentation/riscv/vm-layout.rst | 22 +++++++
arch/riscv/include/asm/elf.h | 2 +-
arch/riscv/include/asm/pgtable.h | 21 ++++--
arch/riscv/include/asm/processor.h | 47 ++++++++++++--
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/mm/.gitignore | 2 +
tools/testing/selftests/riscv/mm/Makefile | 15 +++++
.../riscv/mm/testcases/mmap_bottomup.c | 35 ++++++++++
.../riscv/mm/testcases/mmap_default.c | 35 ++++++++++
.../selftests/riscv/mm/testcases/mmap_test.h | 64 +++++++++++++++++++
.../selftests/riscv/mm/testcases/run_mmap.sh | 12 ++++
11 files changed, 244 insertions(+), 13 deletions(-)
create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
create mode 100644 tools/testing/selftests/riscv/mm/Makefile
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_default.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_test.h
create mode 100755 tools/testing/selftests/riscv/mm/testcases/run_mmap.sh
--
2.41.0
[ This series depends on the VFIO device cdev series ]
Changelog
v9:
* Rebased on top of Jason's iommufd for-next tree
* Added Reviewed-by from Jason and Alex
* Reworked the replace API patches
* Added a new patch allowing passing in to iopt_remove_access
* Added a new patch of a helper function following Jason's design,
mainly by blocking any concurrent detach/replace and keeping the
refcount_dec at the end of the function
* Added a call of the new helper in iommufd_access_destroy_object()
to reduce race condition
* Simplified the replace API patch
v8:
https://lore.kernel.org/all/cover.1690226015.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt series and then cdev v15 series:
https://lore.kernel.org/all/0-v8-6659224517ea+532-iommufd_alloc_jgg@nvidia.…https://lore.kernel.org/kvm/20230718135551.6592-1-yi.l.liu@intel.com/
* Changed the order of detach() and attach() in replace(), to fix a bug
v7:
https://lore.kernel.org/all/cover.1683593831.git.nicolinc@nvidia.com/
* Rebased on top of v6.4-rc1 and cdev v11 candidate
* Fixed a wrong file in replace() API patch
* Added Kevin's "Reviewed-by" to replace() API patch
v6:
https://lore.kernel.org/all/cover.1679939952.git.nicolinc@nvidia.com/
* Rebased on top of cdev v8 series
https://lore.kernel.org/kvm/20230327094047.47215-1-yi.l.liu@intel.com/
* Added "Reviewed-by" from Kevin to PATCH-4
* Squashed access->ioas updating lines into iommufd_access_change_pt(),
and changed function return type accordingly for simplification.
v5:
https://lore.kernel.org/all/cover.1679559476.git.nicolinc@nvidia.com/
* Kept the cmd->id in the iommufd_test_create_access() so the access can
be created with an ioas by default. Then, renamed the previous ioctl
IOMMU_TEST_OP_ACCESS_SET_IOAS to IOMMU_TEST_OP_ACCESS_REPLACE_IOAS, so
it would be used to replace an access->ioas pointer.
* Added iommufd_access_replace() API after the introductions of the other
two APIs iommufd_access_attach() and iommufd_access_detach().
* Since vdev->iommufd_attached is also set in emulated pathway too, call
iommufd_access_update(), similar to the physical pathway.
v4:
https://lore.kernel.org/all/cover.1678284812.git.nicolinc@nvidia.com/
* Rebased on top of Jason's series adding replace() and hwpt_alloc()
https://lore.kernel.org/all/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia…
* Rebased on top of cdev series v6
https://lore.kernel.org/kvm/20230308132903.465159-1-yi.l.liu@intel.com/
* Dropped the patch that's moved to cdev series.
* Added unmap function pointer sanity before calling it.
* Added "Reviewed-by" from Kevin and Yi.
* Added back the VFIO change updating the ATTACH uAPI.
v3:
https://lore.kernel.org/all/cover.1677288789.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt branch:
https://lore.kernel.org/all/0-v2-406f7ac07936+6a-iommufd_hwpt_jgg@nvidia.co…
* Dropped patches from this series accordingly. There were a couple of
VFIO patches that will be submitted after the VFIO cdev series. Also,
renamed the series to be "emulated".
* Moved dma_unmap sanity patch to the first in the series.
* Moved dma_unmap sanity to cover both VFIO and IOMMUFD pathways.
* Added Kevin's "Reviewed-by" to two of the patches.
* Fixed a NULL pointer bug in vfio_iommufd_emulated_bind().
* Moved unmap() call to the common place in iommufd_access_set_ioas().
v2:
https://lore.kernel.org/all/cover.1675802050.git.nicolinc@nvidia.com/
* Rebased on top of vfio_device cdev v2 series.
* Update the kdoc and commit message of iommu_group_replace_domain().
* Dropped revert-to-core-domain part in iommu_group_replace_domain().
* Dropped !ops->dma_unmap check in vfio_iommufd_emulated_attach_ioas().
* Added missing rc value in vfio_iommufd_emulated_attach_ioas() from the
iommufd_access_set_ioas() call.
* Added a new patch in vfio_main to deny vfio_pin/unpin_pages() calls if
vdev->ops->dma_unmap is not implemented.
* Added a __iommmufd_device_detach helper and let the replace routine do
a partial detach().
* Added restriction on auto_domains to use the replace feature.
* Added the patch "iommufd/device: Make hwpt_list list_add/del symmetric"
from the has_group removal series.
v1:
https://lore.kernel.org/all/cover.1675320212.git.nicolinc@nvidia.com/
Hi all,
The existing IOMMU APIs provide a pair of functions: iommu_attach_group()
for callers to attach a device from the default_domain (NULL if not being
supported) to a given iommu domain, and iommu_detach_group() for callers
to detach a device from a given domain to the default_domain. Internally,
the detach_dev op is deprecated for the newer drivers with default_domain.
This means that those drivers likely can switch an attaching domain to
another one, without stagging the device at a blocking or default domain,
for use cases such as:
1) vPASID mode, when a guest wants to replace a single pasid (PASID=0)
table with a larger table (PASID=N)
2) Nesting mode, when switching the attaching device from an S2 domain
to an S1 domain, or when switching between relevant S1 domains.
This series is rebased on top of Jason Gunthorpe's series that introduces
iommu_group_replace_domain API and IOMMUFD infrastructure for the IOMMUFD
"physical" devices. The IOMMUFD "emulated" deivces will need some extra
steps to replace the access->ioas object and its iopt pointer.
You can also find this series on Github:
https://github.com/nicolinc/iommufd/commits/iommu_group_replace_domain-v9
Thank you
Nicolin Chen
Nicolin Chen (6):
vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages()
iommufd: Allow passing in iopt_access_list_id to iopt_remove_access()
iommufd: Add iommufd_access_change_ioas helper
iommufd: Add iommufd_access_replace() API
iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage
vfio: Support IO page table replacement
drivers/iommu/iommufd/device.c | 123 ++++++++++++------
drivers/iommu/iommufd/io_pagetable.c | 6 +-
drivers/iommu/iommufd/iommufd_private.h | 3 +-
drivers/iommu/iommufd/iommufd_test.h | 4 +
drivers/iommu/iommufd/selftest.c | 19 +++
drivers/vfio/iommufd.c | 11 +-
drivers/vfio/vfio_main.c | 4 +
include/linux/iommufd.h | 1 +
include/uapi/linux/vfio.h | 6 +
tools/testing/selftests/iommu/iommufd.c | 29 ++++-
tools/testing/selftests/iommu/iommufd_utils.h | 19 +++
11 files changed, 175 insertions(+), 50 deletions(-)
--
2.41.0
When we collect a signal context with one of the SME modes enabled we will
have enabled that mode behind the compiler and libc's back so they may
issue some instructions not valid in streaming mode, causing spurious
failures.
For the code prior to issuing the BRK to trigger signal handling we need to
stay in streaming mode if we were already there since that's a part of the
signal context the caller is trying to collect. Unfortunately this code
includes a memset() which is likely to be heavily optimised and is likely
to use FP instructions incompatible with streaming mode. We can avoid this
happening by open coding the memset(), inserting a volatile assembly
statement to avoid the compiler recognising what's being done and doing
something in optimisation. This code is not performance critical so the
inefficiency should not be an issue.
After collecting the context we can simply exit streaming mode, avoiding
these issues. Use a full SMSTOP for safety to prevent any issues appearing
with ZA.
Reported-by: Will Deacon <will(a)kernel.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v3:
- Open code OPTIMISER_HIDE_VAR() instead of the memory clobber.
- Link to v2: https://lore.kernel.org/r/20230712-arm64-signal-memcpy-fix-v2-1-494f7025caf…
Changes in v2:
- Rebase onto v6.5-rc1.
- Link to v1: https://lore.kernel.org/r/20230628-arm64-signal-memcpy-fix-v1-1-db3e0300829…
---
.../selftests/arm64/signal/test_signals_utils.h | 25 +++++++++++++++++++++-
1 file changed, 24 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.h b/tools/testing/selftests/arm64/signal/test_signals_utils.h
index 222093f51b67..c7f5627171dd 100644
--- a/tools/testing/selftests/arm64/signal/test_signals_utils.h
+++ b/tools/testing/selftests/arm64/signal/test_signals_utils.h
@@ -60,13 +60,25 @@ static __always_inline bool get_current_context(struct tdescr *td,
size_t dest_sz)
{
static volatile bool seen_already;
+ int i;
+ char *uc = (char *)dest_uc;
assert(td && dest_uc);
/* it's a genuine invocation..reinit */
seen_already = 0;
td->live_uc_valid = 0;
td->live_sz = dest_sz;
- memset(dest_uc, 0x00, td->live_sz);
+
+ /*
+ * This is a memset() but we don't want the compiler to
+ * optimise it into either instructions or a library call
+ * which might be incompatible with streaming mode.
+ */
+ for (i = 0; i < td->live_sz; i++) {
+ uc[i] = 0;
+ __asm__ ("" : "=r" (uc[i]) : "0" (uc[i]));
+ }
+
td->live_uc = dest_uc;
/*
* Grab ucontext_t triggering a SIGTRAP.
@@ -103,6 +115,17 @@ static __always_inline bool get_current_context(struct tdescr *td,
:
: "memory");
+ /*
+ * If we were grabbing a streaming mode context then we may
+ * have entered streaming mode behind the system's back and
+ * libc or compiler generated code might decide to do
+ * something invalid in streaming mode, or potentially even
+ * the state of ZA. Issue a SMSTOP to exit both now we have
+ * grabbed the state.
+ */
+ if (td->feats_supported & FEAT_SME)
+ asm volatile("msr S0_3_C4_C6_3, xzr");
+
/*
* If we get here with seen_already==1 it implies the td->live_uc
* context has been used to get back here....this probably means
---
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
change-id: 20230628-arm64-signal-memcpy-fix-7de3b3c8fa10
Best regards,
--
Mark Brown <broonie(a)kernel.org>
This series fixes an issue which David Spickett found where if we change
the SVE VL while SME is in use we can end up attempting to save state to
an unallocated buffer and adds testing coverage for that plus a bit more
coverage of VL changes, just for paranioa.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Always reallocate the SVE state.
- Rebase onto v6.5-rc2.
- Link to v1: https://lore.kernel.org/r/20230713-arm64-fix-sve-sme-vl-change-v1-0-129dd86…
---
Mark Brown (3):
arm64/fpsimd: Ensure SME storage is allocated after SVE VL changes
kselftest/arm64: Add a test case for SVE VL changes with SME active
kselftest/arm64: Validate that changing one VL type does not affect another
arch/arm64/kernel/fpsimd.c | 33 +++++--
tools/testing/selftests/arm64/fp/vec-syscfg.c | 127 +++++++++++++++++++++++++-
2 files changed, 148 insertions(+), 12 deletions(-)
---
base-commit: 06785562d1b99ff6dc1cd0af54be5e3ff999dc02
change-id: 20230713-arm64-fix-sve-sme-vl-change-60eb1fa6a707
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The openvswitch selftests currently contain a few cases for managing the
datapath, which includes creating datapath instances, adding interfaces,
and doing some basic feature / upcall tests. This is useful to validate
the control path.
Add the ability to program some of the more common flows with actions. This
can be improved overtime to include regression testing, etc.
Aaron Conole (4):
selftests: openvswitch: add an initial flow programming case
selftests: openvswitch: add a test for ipv4 forwarding
selftests: openvswitch: add basic ct test case parsing
selftests: openvswitch: add ct-nat test case with ipv4
.../selftests/net/openvswitch/openvswitch.sh | 223 ++++++++
.../selftests/net/openvswitch/ovs-dpctl.py | 507 ++++++++++++++++++
2 files changed, 730 insertions(+)
--
2.40.1
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large amount of data transfers have device memory as the source and/or
destination. Accelerators drastically increased the volume of such transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time, improving data transfer throughput & efficiency can help
improving GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs, much of this data does not require host processing.
Today, the majority of the Device-to-Device data transfers the network are
implemented as the following low level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strains on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
With this proposal we're able to reach ~96.6% line rate speeds with data sent
and received directly from/to device memory.
* Patch overview:
** Part 1: struct paged device memory
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, networking stack (skbs, drivers, and
page pool) operate on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to understand a new memory type.
This proposal implements option #1. We implement a small framework to generate
struct pages for an sg_table returned from dma_buf_map_attachment(). The support
added here should be generic and easily extended to other use cases interested
in struct paged device memory. We use this framework to generate pages that can
be used in the networking stack.
** Part 2: recvmsg() & sendmsg() APIs
We define user APIs for the user to send and receive these dmabuf pages.
** part 3: support for unreadable skb frags
Dmabuf pages are not accessible by the host; we implement changes throughput the
networking stack to correctly handle skbs with unreadable frags.
** part 4: page pool support
We piggy back on Jakub's page pool memory providers idea:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory TCP
changes from the driver.
This is not strictly necessary, the driver can choose to allocate dmabuf pages
and use them directly without going through the page pool (if acceptable to
their maintainers).
Not included with this RFC is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This RFC is built on top of v6.4-rc7 with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP require the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using dmabuf pages
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that show a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
Not included in this series is our devmem TCP benchmark, which
transfers data to/from GPU dmabufs directly.
With this implementation & benchmark we're able to reach ~96.6% line rate
speeds with 4 GPU/NIC pairs running bi-direction traffic, with all the
packet payloads going straight to the GPU memory (no host buffer bounce).
** Test Setup
Kernel: v6.4-rc7, with this RFC and Jakub's memory provider API
cherry-picked locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Benchmark: custom devmem TCP benchmark not yet open sourced.
Mina Almasry (10):
dma-buf: add support for paged attachment mappings
dma-buf: add support for NET_RX pages
dma-buf: add support for NET_TX pages
net: add support for skbs with unreadable frags
tcp: implement recvmsg() RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX pages
tcp: implement sendmsg() TX path for for devmem tcp
selftests: add ncdevmem, netcat for devmem TCP
memory-provider: updates core provider API for devmem TCP
memory-provider: add dmabuf devmem provider
drivers/dma-buf/dma-buf.c | 444 ++++++++++++++++
include/linux/dma-buf.h | 142 +++++
include/linux/netdevice.h | 1 +
include/linux/skbuff.h | 34 +-
include/linux/socket.h | 1 +
include/net/page_pool.h | 21 +
include/net/sock.h | 4 +
include/net/tcp.h | 6 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/dma-buf.h | 12 +
include/uapi/linux/uio.h | 10 +
net/core/datagram.c | 3 +
net/core/page_pool.c | 111 +++-
net/core/skbuff.c | 81 ++-
net/core/sock.c | 47 ++
net/ipv4/tcp.c | 262 +++++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 8 +
net/ipv4/tcp_output.c | 5 +-
net/packet/af_packet.c | 4 +-
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/ncdevmem.c | 693 +++++++++++++++++++++++++
23 files changed, 1868 insertions(+), 42 deletions(-)
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.41.0.390.g38632f3daf-goog
Events Tracing infrastructure contains lot of files, directories
(internally in terms of inodes, dentries). And ends up by consuming
memory in MBs. We can have multiple events of Events Tracing, which
further requires more memory.
Instead of creating inodes/dentries, eventfs could keep meta-data and
skip the creation of inodes/dentries. As and when require, eventfs will
create the inodes/dentries only for required files/directories.
Also eventfs would delete the inodes/dentries once no more requires
but preserve the meta data.
Tracing events took ~9MB, with this approach it took ~4.5MB
for ~10K files/dir.
Diff from v4:
Patch 02: moved from v4 08/10
added fs/tracefs/internal.h
Patch 03: moved from v4 02/10
removed fs/tracefs/internal.h
Patch 04: moved from v4 03/10
moved out changes of fs/tracefs/internal.h
Patch 05: moved from v4 04/10
renamed eventfs_add_top_file() -> eventfs_add_events_file()
Patch 06: moved from v4 07/10
implemented create_dentry() helper function
added create_file(), create_dir() stub function
Patch 07: moved from v4 06/10
Patch 08: moved from v4 05/10
improved eventfs remove functionality
Patch 09: removed unwanted if conditions
Patch 10: added available_filter_functions check
Diff from v3:
Patch 3,4,5,7,9:
removed all the eventfs_rwsem code and replaced it with an srcu
lock for the readers, and a mutex to synchronize the writers of
the list.
Patch 2: moved 'tracefs_inode' and 'get_tracefs()' to v4 03/10
Patch 3: moved the struct eventfs_file and eventfs_inode into event_inode.c
as it really should not be exposed to all users.
Patch 5: added a recursion check to eventfs_remove_rec() as it is really
dangerous to have unchecked recursion in the kernel (we do have
a fixed size stack).
have the free use srcu callbacks. After the srcu grace periods
are done, it adds the eventfs_file onto a llist (lockless link
list) and wakes up a work queue. Then the work queue does the
freeing (this needs to be done in task/workqueue context, as
srcu callbacks are done in softirq context).
Patch 6: renamed:
eventfs_create_file() -> create_file()
eventfs_create_dir() -> create_dir()
Diff from v2:
Patch 01: new patch:'Require all trace events to have a TRACE_SYSTEM'
Patch 02: moved from v1 1/9
Patch 03: moved from v1 2/9
As suggested by Zheng Yejian, introduced eventfs_prepare_ef()
helper function to add files or directories to eventfs
fix WARNING reported by kernel test robot in v1 8/9
Patch 04: moved from v1 3/9
used eventfs_prepare_ef() to add files
fix WARNING reported by kernel test robot in v1 8/9
Patch 05: moved from v1 4/9
fix compiling warning reported by kernel test robot in v1 4/9
Patch 06: moved from v1 5/9
Patch 07: moved from v1 6/9
Patch 08: moved from v1 7/9
Patch 09: moved from v1 8/9
rebased because of v3 01/10
Patch 10: moved from v1 9/9
Diff from v1:
Patch 1: add header file
Patch 2: resolved kernel test robot issues
protecting eventfs lists using nested eventfs_rwsem
Patch 3: protecting eventfs lists using nested eventfs_rwsem
Patch 4: improve events cleanup code to fix crashes
Patch 5: resolved kernel test robot issues
removed d_instantiate_anon() calls
Patch 6: resolved kernel test robot issues
fix kprobe test in eventfs_root_lookup()
protecting eventfs lists using nested eventfs_rwsem
Patch 7: remove header file
Patch 8: pass eventfs_rwsem as argument to eventfs functions
called eventfs_remove_events_dir() instead of tracefs_remove()
from event_trace_del_tracer()
Patch 9: new patch to fix kprobe test case
fs/tracefs/Makefile | 1 +
fs/tracefs/event_inode.c | 795 ++++++++++++++++++
fs/tracefs/inode.c | 151 +++-
fs/tracefs/internal.h | 26 +
include/linux/trace_events.h | 1 +
include/linux/tracefs.h | 30 +
kernel/trace/trace.h | 2 +-
kernel/trace/trace_events.c | 76 +-
.../ftrace/test.d/kprobe/kprobe_args_char.tc | 9 +-
.../test.d/kprobe/kprobe_args_string.tc | 9 +-
10 files changed, 1048 insertions(+), 52 deletions(-)
create mode 100644 fs/tracefs/event_inode.c
create mode 100644 fs/tracefs/internal.h
--
2.39.0
Hello,
This is v4 of the patch series for TDX selftests.
It has been updated for Intel’s v14 of the TDX host patches which was
proposed here:
https://lore.kernel.org/lkml/cover.1685333727.git.isaku.yamahata@intel.com/
The tree can be found at:
https://github.com/googleprodkernel/linux-cc/tree/tdx-selftests-rfc-v4
Changes from RFC v3:
In v14, TDX can only run with UPM enabled so the necessary changes were
made to handle that.
td_vcpu_run() was added to handle TdVmCalls that are now handled in
userspace.
The comments under the patch "KVM: selftests: Require GCC to realign
stacks on function entry" were addressed with the following patch:
https://lore.kernel.org/lkml/Y%2FfHLdvKHlK6D%2F1v@google.com/T/
And other minor tweaks were made to integrate the selftest
infrastructure onto v14.
In RFCv4, TDX selftest code is organized into:
+ headers in tools/testing/selftests/kvm/include/x86_64/tdx/
+ common code in tools/testing/selftests/kvm/lib/x86_64/tdx/
+ selftests in tools/testing/selftests/kvm/x86_64/tdx_*
Dependencies
+ Peter’s patches, which provide functions for the host to allocate
and track protected memory in the
guest. https://lore.kernel.org/lkml/20221018205845.770121-1-pgonda@google.com/T/
Further work for this patch series/TODOs
+ Sean’s comments for the non-confidential UPM selftests patch series
at https://lore.kernel.org/lkml/Y8dC8WDwEmYixJqt@google.com/T/#u apply
here as well
+ Add ucall support for TDX selftests
I would also like to acknowledge the following people, who helped
review or test patches in RFCv1, RFCv2, and RFCv3:
+ Sean Christopherson <seanjc(a)google.com>
+ Zhenzhong Duan <zhenzhong.duan(a)intel.com>
+ Peter Gonda <pgonda(a)google.com>
+ Andrew Jones <drjones(a)redhat.com>
+ Maxim Levitsky <mlevitsk(a)redhat.com>
+ Xiaoyao Li <xiaoyao.li(a)intel.com>
+ David Matlack <dmatlack(a)google.com>
+ Marc Orr <marcorr(a)google.com>
+ Isaku Yamahata <isaku.yamahata(a)gmail.com>
+ Maciej S. Szmigiero <maciej.szmigiero(a)oracle.com>
Links to earlier patch series
+ RFC v1: https://lore.kernel.org/lkml/20210726183816.1343022-1-erdemaktas@google.com…
+ RFC v2: https://lore.kernel.org/lkml/20220830222000.709028-1-sagis@google.com/T/#u
+ RFC v3: https://lore.kernel.org/lkml/20230121001542.2472357-1-ackerleytng@google.co…
Ackerley Tng (12):
KVM: selftests: Add function to allow one-to-one GVA to GPA mappings
KVM: selftests: Expose function that sets up sregs based on VM's mode
KVM: selftests: Store initial stack address in struct kvm_vcpu
KVM: selftests: Refactor steps in vCPU descriptor table initialization
KVM: selftests: TDX: Use KVM_TDX_CAPABILITIES to validate TDs'
attribute configuration
KVM: selftests: TDX: Update load_td_memory_region for VM memory backed
by guest memfd
KVM: selftests: Add functions to allow mapping as shared
KVM: selftests: Expose _vm_vaddr_alloc
KVM: selftests: TDX: Add support for TDG.MEM.PAGE.ACCEPT
KVM: selftests: TDX: Add support for TDG.VP.VEINFO.GET
KVM: selftests: TDX: Add TDX UPM selftest
KVM: selftests: TDX: Add TDX UPM selftests for implicit conversion
Erdem Aktas (3):
KVM: selftests: Add helper functions to create TDX VMs
KVM: selftests: TDX: Add TDX lifecycle test
KVM: selftests: TDX: Adding test case for TDX port IO
Roger Wang (1):
KVM: selftests: TDX: Add TDG.VP.INFO test
Ryan Afranji (2):
KVM: selftests: TDX: Verify the behavior when host consumes a TD
private memory
KVM: selftests: TDX: Add shared memory test
Sagi Shahar (10):
KVM: selftests: TDX: Add report_fatal_error test
KVM: selftests: TDX: Add basic TDX CPUID test
KVM: selftests: TDX: Add basic get_td_vmcall_info test
KVM: selftests: TDX: Add TDX IO writes test
KVM: selftests: TDX: Add TDX IO reads test
KVM: selftests: TDX: Add TDX MSR read/write tests
KVM: selftests: TDX: Add TDX HLT exit test
KVM: selftests: TDX: Add TDX MMIO reads test
KVM: selftests: TDX: Add TDX MMIO writes test
KVM: selftests: TDX: Add TDX CPUID TDVMCALL test
tools/testing/selftests/kvm/Makefile | 8 +
.../selftests/kvm/include/kvm_util_base.h | 35 +
.../selftests/kvm/include/x86_64/processor.h | 4 +
.../kvm/include/x86_64/tdx/td_boot.h | 82 +
.../kvm/include/x86_64/tdx/td_boot_asm.h | 16 +
.../selftests/kvm/include/x86_64/tdx/tdcall.h | 59 +
.../selftests/kvm/include/x86_64/tdx/tdx.h | 65 +
.../kvm/include/x86_64/tdx/tdx_util.h | 19 +
.../kvm/include/x86_64/tdx/test_util.h | 164 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 115 +-
.../selftests/kvm/lib/x86_64/processor.c | 77 +-
.../selftests/kvm/lib/x86_64/tdx/td_boot.S | 101 ++
.../selftests/kvm/lib/x86_64/tdx/tdcall.S | 158 ++
.../selftests/kvm/lib/x86_64/tdx/tdx.c | 262 ++++
.../selftests/kvm/lib/x86_64/tdx/tdx_util.c | 565 +++++++
.../selftests/kvm/lib/x86_64/tdx/test_util.c | 101 ++
.../kvm/x86_64/tdx_shared_mem_test.c | 134 ++
.../selftests/kvm/x86_64/tdx_upm_test.c | 469 ++++++
.../selftests/kvm/x86_64/tdx_vm_tests.c | 1322 +++++++++++++++++
19 files changed, 3730 insertions(+), 26 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/td_boot.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/td_boot_asm.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/tdcall.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/tdx.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/tdx_util.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/test_util.h
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/td_boot.S
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/tdcall.S
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/tdx.c
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/tdx_util.c
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/test_util.c
create mode 100644 tools/testing/selftests/kvm/x86_64/tdx_shared_mem_test.c
create mode 100644 tools/testing/selftests/kvm/x86_64/tdx_upm_test.c
create mode 100644 tools/testing/selftests/kvm/x86_64/tdx_vm_tests.c
--
2.41.0.487.g6d72f3e995-goog
[ Resending because claws-mail is messing with the Cc again. It doesn't like quotes :-p ]
On Fri, 21 Jul 2023 08:48:39 -0400
Steven Rostedt <rostedt(a)goodmis.org> wrote:
> diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
> index 4db048250cdb..2718de1533e6 100644
> --- a/fs/tracefs/event_inode.c
> +++ b/fs/tracefs/event_inode.c
> @@ -36,16 +36,36 @@ struct eventfs_file {
> const struct file_operations *fop;
> const struct inode_operations *iop;
> union {
> + struct list_head del_list;
> struct rcu_head rcu;
> - struct llist_node llist; /* For freeing after RCU */
> + unsigned long is_freed; /* Freed if one of the above is set */
I changed the freeing around. The dentries are freed before returning from
eventfs_remove_dir().
I also added a "is_freed" field that is part of the union and is set if
list elements have content. Note, since the union was criticized before, I
will state the entire purpose of doing this patch set is to save memory.
This structure will be used for every event file. What's the point of
getting rid of dentries if we are replacing it with something just as big?
Anyway, struct dentry does the exact same thing!
> };
> void *data;
> umode_t mode;
> - bool created;
> + unsigned int flags;
Bah, I forgot to remove flags (one iteration replaced the created with
flags to set both created and freed). I removed the freed with the above
"is_freed" and noticed that created is set if and only if ef->dentry is
set. So instead of using the created boolean, just test ef->dentry.
The flags isn't used and can be removed. I just forgot to do so.
> };
>
> static DEFINE_MUTEX(eventfs_mutex);
> DEFINE_STATIC_SRCU(eventfs_srcu);
> +
> +static struct dentry *eventfs_root_lookup(struct inode *dir,
> + struct dentry *dentry,
> + unsigned int flags);
> +static int dcache_dir_open_wrapper(struct inode *inode, struct file *file);
> +static int eventfs_release(struct inode *inode, struct file *file);
> +
> +static const struct inode_operations eventfs_root_dir_inode_operations = {
> + .lookup = eventfs_root_lookup,
> +};
> +
> +static const struct file_operations eventfs_file_operations = {
> + .open = dcache_dir_open_wrapper,
> + .read = generic_read_dir,
> + .iterate_shared = dcache_readdir,
> + .llseek = generic_file_llseek,
> + .release = eventfs_release,
> +};
> +
In preparing for getting rid of eventfs_file, I noticed that all
directories are set to the above ops. In create_dir() instead of passing in
ef->*ops, just use these directly. This does help with future work.
> /**
> * create_file - create a file in the tracefs filesystem
> * @name: the name of the file to create.
> @@ -123,17 +143,12 @@ static struct dentry *create_file(const char *name, umode_t mode,
> * If tracefs is not enabled in the kernel, the value -%ENODEV will be
> * returned.
> */
> -static struct dentry *create_dir(const char *name, umode_t mode,
> - struct dentry *parent, void *data,
> - const struct file_operations *fop,
> - const struct inode_operations *iop)
> +static struct dentry *create_dir(const char *name, struct dentry *parent, void *data)
> {
As stated, the directories always used the same *op values, so I just hard
coded it.
> struct tracefs_inode *ti;
> struct dentry *dentry;
> struct inode *inode;
>
> - WARN_ON(!S_ISDIR(mode));
> -
> dentry = eventfs_start_creating(name, parent);
> if (IS_ERR(dentry))
> return dentry;
> @@ -142,9 +157,9 @@ static struct dentry *create_dir(const char *name, umode_t mode,
> if (unlikely(!inode))
> return eventfs_failed_creating(dentry);
>
> - inode->i_mode = mode;
> - inode->i_op = iop;
> - inode->i_fop = fop;
> + inode->i_mode = S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO;
> + inode->i_op = &eventfs_root_dir_inode_operations;
> + inode->i_fop = &eventfs_file_operations;
> inode->i_private = data;
>
> ti = get_tracefs(inode);
> @@ -169,15 +184,27 @@ void eventfs_set_ef_status_free(struct dentry *dentry)
> struct tracefs_inode *ti_parent;
> struct eventfs_file *ef;
>
> + mutex_lock(&eventfs_mutex);
To synchronize with the removals, I needed to add locking here.
> ti_parent = get_tracefs(dentry->d_parent->d_inode);
> if (!ti_parent || !(ti_parent->flags & TRACEFS_EVENT_INODE))
> - return;
> + goto out;
>
> ef = dentry->d_fsdata;
> if (!ef)
> - return;
> - ef->created = false;
> + goto out;
> + /*
> + * If ef was freed, then the LSB bit is set for d_fsdata.
> + * But this should not happen, as it should still have a
> + * ref count that prevents it. Warn in case it does.
> + */
> + if (WARN_ON_ONCE((unsigned long)ef & 1))
> + goto out;
During the remove, a dget() is done to keep the dentry from freeing. To
make sure that it doesn't get freed, I added this test.
> +
> + dentry->d_fsdata = NULL;
> +
> ef->dentry = NULL;
> + out:
> + mutex_unlock(&eventfs_mutex);
> }
>
> /**
> @@ -202,6 +229,79 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
> ti->private = ef->ei;
> }
>
> +static struct dentry *
> +create_dentry(struct eventfs_file *ef, struct dentry *parent, bool lookup)
> +{
Because both the lookup and the dir_open_wrapper did basically the same
thing, I created a helper function so that I didn't have to update both
locations.
> + bool invalidate = false;
> + struct dentry *dentry;
> +
> + mutex_lock(&eventfs_mutex);
> + if (ef->is_freed) {
> + mutex_unlock(&eventfs_mutex);
> + return NULL;
> + }
Ignore if the ef is on its way to be freed.
> + if (ef->dentry) {
> + dentry = ef->dentry;
If the ef already has a dentry (created) then use it.
> + /* On dir open, up the ref count */
> + if (!lookup)
> + dget(dentry);
> + mutex_unlock(&eventfs_mutex);
> + return dentry;
> + }
> + mutex_unlock(&eventfs_mutex);
> +
> + if (!lookup)
> + inode_lock(parent->d_inode);
> +
> + if (ef->ei)
> + dentry = create_dir(ef->name, parent, ef->data);
> + else
> + dentry = create_file(ef->name, ef->mode, parent,
> + ef->data, ef->fop);
> +
> + if (!lookup)
> + inode_unlock(parent->d_inode);
> +
> + mutex_lock(&eventfs_mutex);
> + if (IS_ERR_OR_NULL(dentry)) {
With the lock dropped, the dentry could have been created causing it to
fail. Check if the ef->dentry exists, and if so, use it instead.
Note, if the ef is freed, it should not have a dentry.
> + /* If the ef was already updated get it */
> + dentry = ef->dentry;
> + if (dentry && !lookup)
> + dget(dentry);
> + mutex_unlock(&eventfs_mutex);
> + return dentry;
> + }
> +
> + if (!ef->dentry && !ef->is_freed) {
With the lock dropped, the dentry could have been filled too. If so, drop
the created dentry and use the one owned by the ef->dentry.
> + ef->dentry = dentry;
> + if (ef->ei)
> + eventfs_post_create_dir(ef);
> + dentry->d_fsdata = ef;
> + } else {
> + /* A race here, should try again (unless freed) */
> + invalidate = true;
I had a WARN_ON() once here. Probably could add a:
WARN_ON_ONCE(!ef->is_freed);
> + }
> + mutex_unlock(&eventfs_mutex);
> + if (invalidate)
> + d_invalidate(dentry);
> +
> + if (lookup || invalidate)
> + dput(dentry);
> +
> + return invalidate ? NULL : dentry;
> +}
> +
> +static bool match_event_file(struct eventfs_file *ef, const char *name)
> +{
A bit of a paranoid helper function. I wanted to make sure to synchronize
with the removals.
> + bool ret;
> +
> + mutex_lock(&eventfs_mutex);
> + ret = !ef->is_freed && strcmp(ef->name, name) == 0;
> + mutex_unlock(&eventfs_mutex);
> +
> + return ret;
> +}
> +
> /**
> * eventfs_root_lookup - lookup routine to create file/dir
> * @dir: directory in which lookup to be done
> @@ -211,7 +311,6 @@ static void eventfs_post_create_dir(struct eventfs_file *ef)
> * Used to create dynamic file/dir with-in @dir, search with-in ei
> * list, if @dentry found go ahead and create the file/dir
> */
> -
> static struct dentry *eventfs_root_lookup(struct inode *dir,
> struct dentry *dentry,
> unsigned int flags)
> @@ -230,30 +329,10 @@ static struct dentry *eventfs_root_lookup(struct inode *dir,
> idx = srcu_read_lock(&eventfs_srcu);
> list_for_each_entry_srcu(ef, &ei->e_top_files, list,
> srcu_read_lock_held(&eventfs_srcu)) {
> - if (strcmp(ef->name, dentry->d_name.name))
> + if (!match_event_file(ef, dentry->d_name.name))
> continue;
> ret = simple_lookup(dir, dentry, flags);
> - if (ef->created)
> - continue;
> - mutex_lock(&eventfs_mutex);
> - ef->created = true;
> - if (ef->ei)
> - ef->dentry = create_dir(ef->name, ef->mode, ef->d_parent,
> - ef->data, ef->fop, ef->iop);
> - else
> - ef->dentry = create_file(ef->name, ef->mode, ef->d_parent,
> - ef->data, ef->fop);
> -
> - if (IS_ERR_OR_NULL(ef->dentry)) {
> - ef->created = false;
> - mutex_unlock(&eventfs_mutex);
> - } else {
> - if (ef->ei)
> - eventfs_post_create_dir(ef);
> - ef->dentry->d_fsdata = ef;
> - mutex_unlock(&eventfs_mutex);
> - dput(ef->dentry);
> - }
> + create_dentry(ef, ef->d_parent, true);
> break;
> }
> srcu_read_unlock(&eventfs_srcu, idx);
> @@ -270,6 +349,7 @@ static int eventfs_release(struct inode *inode, struct file *file)
> struct tracefs_inode *ti;
> struct eventfs_inode *ei;
> struct eventfs_file *ef;
> + struct dentry *dentry;
> int idx;
>
> ti = get_tracefs(inode);
> @@ -280,8 +360,11 @@ static int eventfs_release(struct inode *inode, struct file *file)
> idx = srcu_read_lock(&eventfs_srcu);
> list_for_each_entry_srcu(ef, &ei->e_top_files, list,
> srcu_read_lock_held(&eventfs_srcu)) {
> - if (ef->created)
> - dput(ef->dentry);
> + mutex_lock(&eventfs_mutex);
> + dentry = ef->dentry;
> + mutex_unlock(&eventfs_mutex);
> + if (dentry)
> + dput(dentry);
> }
> srcu_read_unlock(&eventfs_srcu, idx);
> return dcache_dir_close(inode, file);
> @@ -312,47 +395,12 @@ static int dcache_dir_open_wrapper(struct inode *inode, struct file *file)
> ei = ti->private;
> idx = srcu_read_lock(&eventfs_srcu);
> list_for_each_entry_rcu(ef, &ei->e_top_files, list) {
> - if (ef->created) {
> - dget(ef->dentry);
> - continue;
> - }
> - mutex_lock(&eventfs_mutex);
> - ef->created = true;
> -
> - inode_lock(dentry->d_inode);
> - if (ef->ei)
> - ef->dentry = create_dir(ef->name, ef->mode, dentry,
> - ef->data, ef->fop, ef->iop);
> - else
> - ef->dentry = create_file(ef->name, ef->mode, dentry,
> - ef->data, ef->fop);
> - inode_unlock(dentry->d_inode);
> -
> - if (IS_ERR_OR_NULL(ef->dentry)) {
> - ef->created = false;
> - } else {
> - if (ef->ei)
> - eventfs_post_create_dir(ef);
> - ef->dentry->d_fsdata = ef;
> - }
> - mutex_unlock(&eventfs_mutex);
> + create_dentry(ef, dentry, false);
> }
> srcu_read_unlock(&eventfs_srcu, idx);
> return dcache_dir_open(inode, file);
> }
>
> -static const struct file_operations eventfs_file_operations = {
> - .open = dcache_dir_open_wrapper,
> - .read = generic_read_dir,
> - .iterate_shared = dcache_readdir,
> - .llseek = generic_file_llseek,
> - .release = eventfs_release,
> -};
> -
> -static const struct inode_operations eventfs_root_dir_inode_operations = {
> - .lookup = eventfs_root_lookup,
> -};
> -
> /**
> * eventfs_prepare_ef - helper function to prepare eventfs_file
> * @name: the name of the file/directory to create.
> @@ -470,11 +518,7 @@ struct eventfs_file *eventfs_add_subsystem_dir(const char *name,
> ti_parent = get_tracefs(parent->d_inode);
> ei_parent = ti_parent->private;
>
> - ef = eventfs_prepare_ef(name,
> - S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
> - &eventfs_file_operations,
> - &eventfs_root_dir_inode_operations, NULL);
> -
> + ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);
For directories, just use the hard coded values.
> if (IS_ERR(ef))
> return ef;
>
> @@ -502,11 +546,7 @@ struct eventfs_file *eventfs_add_dir(const char *name,
> if (!ef_parent)
> return ERR_PTR(-EINVAL);
>
> - ef = eventfs_prepare_ef(name,
> - S_IFDIR | S_IRWXU | S_IRUGO | S_IXUGO,
> - &eventfs_file_operations,
> - &eventfs_root_dir_inode_operations, NULL);
> -
> + ef = eventfs_prepare_ef(name, S_IFDIR, NULL, NULL, NULL);
ditto.
> if (IS_ERR(ef))
> return ef;
>
> @@ -601,37 +641,15 @@ int eventfs_add_file(const char *name, umode_t mode,
> return 0;
> }
>
> -static LLIST_HEAD(free_list);
> -
> -static void eventfs_workfn(struct work_struct *work)
> -{
> - struct eventfs_file *ef, *tmp;
> - struct llist_node *llnode;
> -
> - llnode = llist_del_all(&free_list);
> - llist_for_each_entry_safe(ef, tmp, llnode, llist) {
> - if (ef->created && ef->dentry)
> - dput(ef->dentry);
> - kfree(ef->name);
> - kfree(ef->ei);
> - kfree(ef);
> - }
> -}
> -
> -DECLARE_WORK(eventfs_work, eventfs_workfn);
> -
> static void free_ef(struct rcu_head *head)
> {
> struct eventfs_file *ef = container_of(head, struct eventfs_file, rcu);
>
> - if (!llist_add(&ef->llist, &free_list))
> - return;
> -
> - queue_work(system_unbound_wq, &eventfs_work);
> + kfree(ef->name);
> + kfree(ef->ei);
> + kfree(ef);
Since I did not do the dput() or d_invalidate() here I don't need call this
from task context. This simplifies the process.
> }
>
> -
> -
> /**
> * eventfs_remove_rec - remove eventfs dir or file from list
> * @ef: eventfs_file to be removed.
> @@ -639,7 +657,7 @@ static void free_ef(struct rcu_head *head)
> * This function recursively remove eventfs_file which
> * contains info of file or dir.
> */
> -static void eventfs_remove_rec(struct eventfs_file *ef, int level)
> +static void eventfs_remove_rec(struct eventfs_file *ef, struct list_head *head, int level)
> {
> struct eventfs_file *ef_child;
>
> @@ -659,15 +677,12 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level)
> /* search for nested folders or files */
> list_for_each_entry_srcu(ef_child, &ef->ei->e_top_files, list,
> lockdep_is_held(&eventfs_mutex)) {
> - eventfs_remove_rec(ef_child, level + 1);
> + eventfs_remove_rec(ef_child, head, level + 1);
> }
> }
>
> - if (ef->created && ef->dentry)
> - d_invalidate(ef->dentry);
> -
> list_del_rcu(&ef->list);
> - call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
> + list_add_tail(&ef->del_list, head);
Hold off on freeing the ef. Add it to a link list to do so later.
> }
>
> /**
> @@ -678,12 +693,62 @@ static void eventfs_remove_rec(struct eventfs_file *ef, int level)
> */
> void eventfs_remove(struct eventfs_file *ef)
> {
> + struct eventfs_file *tmp;
> + LIST_HEAD(ef_del_list);
> + struct dentry *dentry_list = NULL;
> + struct dentry *dentry;
> +
> if (!ef)
> return;
>
> mutex_lock(&eventfs_mutex);
> - eventfs_remove_rec(ef, 0);
> + eventfs_remove_rec(ef, &ef_del_list, 0);
The above returns back with ef_del_list holding all the ef's to be freed.
I probably could have just passed the dentry_list down instead, but I
wanted the below complexity done in a non recursive function.
> +
> + list_for_each_entry_safe(ef, tmp, &ef_del_list, del_list) {
> + if (ef->dentry) {
> + unsigned long ptr = (unsigned long)dentry_list;
> +
> + /* Keep the dentry from being freed yet */
> + dget(ef->dentry);
> +
> + /*
> + * Paranoid: The dget() above should prevent the dentry
> + * from being freed and calling eventfs_set_ef_status_free().
> + * But just in case, set the link list LSB pointer to 1
> + * and have eventfs_set_ef_status_free() check that to
> + * make sure that if it does happen, it will not think
> + * the d_fsdata is an event_file.
> + *
> + * For this to work, no event_file should be allocated
> + * on a odd space, as the ef should always be allocated
> + * to be at least word aligned. Check for that too.
> + */
> + WARN_ON_ONCE(ptr & 1);
> +
> + ef->dentry->d_fsdata = (void *)(ptr | 1);
Set the d_fsdata to be a link list. The comment above needs to say to say
struct eventfs_file and struct dentry should be word aligned. Anyway, while
the eventfs_mutex is held, set all the dentries belonging to eventfs_files
to the dentry_list and clear the ef->dentry.
> + dentry_list = ef->dentry;
> + ef->dentry = NULL;
> + }
> + call_srcu(&eventfs_srcu, &ef->rcu, free_ef);
> + }
> mutex_unlock(&eventfs_mutex);
> +
> + while (dentry_list) {
> + unsigned long ptr;
> +
> + dentry = dentry_list;
> + ptr = (unsigned long)dentry->d_fsdata & ~1UL;
> + dentry_list = (struct dentry *)ptr;
> + dentry->d_fsdata = NULL;
With the mutex released, it is safe to free the dentries here. This also
must be done before returning from this function, as when I had it done in
the workqueue, it was failing some tests that would remove a dynamic event
and still see that the directory was still around!
> + d_invalidate(dentry);
> + mutex_lock(&eventfs_mutex);
> + /* dentry should now have at least a single reference */
> + WARN_ONCE((int)d_count(dentry) < 1,
> + "dentry %px less than one reference (%d) after invalidate\n",
I did update the above to:
WARN_ONCE((int)d_count(dentry) < 1,
"dentry %px (%s) less than one reference (%d) after invalidate\n",
dentry, dentry->d_name.name, d_count(dentry));
To include the name of the dentry (my current work is triggering this still).
> + dentry, d_count(dentry));
> + mutex_unlock(&eventfs_mutex);
> + dput(dentry);
> + }
> }
>
> /**
> diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
> index c443a0c32a8c..1b880b5cd29d 100644
> --- a/fs/tracefs/internal.h
> +++ b/fs/tracefs/internal.h
> @@ -22,4 +22,6 @@ struct dentry *tracefs_end_creating(struct dentry *dentry);
> struct dentry *tracefs_failed_creating(struct dentry *dentry);
> struct inode *tracefs_get_inode(struct super_block *sb);
>
> +void eventfs_set_ef_status_free(struct dentry *dentry);
> +
> #endif /* _TRACEFS_INTERNAL_H */
> diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
> index 4d30b0cafc5f..47c1b4d21735 100644
> --- a/include/linux/tracefs.h
> +++ b/include/linux/tracefs.h
> @@ -51,8 +51,6 @@ void eventfs_remove(struct eventfs_file *ef);
>
> void eventfs_remove_events_dir(struct dentry *dentry);
>
> -void eventfs_set_ef_status_free(struct dentry *dentry);
> -
Oh, and eventfs_set_ef_status_free() should not be exported to outside the
tracefs system.
-- Steve
> struct dentry *tracefs_create_file(const char *name, umode_t mode,
> struct dentry *parent, void *data,
> const struct file_operations *fops);
Hi, Willy, Thomas
The suggestions of v1 nolibc powerpc patchset [1] from you have been applied,
here is v2.
Testing results:
- run with tinyconfig
arch/board | result
------------|------------
ppc/g3beige | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
ppc/ppce500 | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
ppc64le/pseries | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
ppc64le/powernv | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
ppc64/pseries | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
ppc64/powernv | 165 test(s): 158 passed, 7 skipped, 0 failed => status: warning.
- run-user
(Tested with -Os, -O0 and -O2)
// for 32-bit PowerPC
$ for arch in powerpc ppc; do make run-user ARCH=$arch CROSS_COMPILE=powerpc-linux-gnu- ; done | grep status
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
// for 64-bit big-endian PowerPC and 64-bit little-endian PowerPC
$ for arch in ppc64 ppc64le; do make run-user ARCH=$arch CROSS_COMPILE=powerpc64le-linux-gnu- ; done | grep status
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
165 test(s): 157 passed, 8 skipped, 0 failed => status: warning
Changes from v1 --> v2:
- tools/nolibc: add support for powerpc
Add missing arch-powerpc.h lines to arch.h
Align with the other arch-<ARCH>.h, naming the variables
with more meaningful words, such as _ret, _num, _arg1 ...
Clean up the syscall instructions
No line from musl now.
Suggestons from Thomas
* tools/nolibc: add support for pppc64
No change
* selftests/nolibc: add extra configs customize support
To reduce complexity, merge the commands from the new extraconfig
target to defconfig target and drop the extconfig target completely.
Derived from Willy's suggestion of the tinyconfig patchset
* selftests/nolibc: add XARCH and ARCH mapping support
To reduce complexity, let's use XARCH internally and only reserve
ARCH as the input variable.
Derived from Willy's suggestion
* selftests/nolibc: add test support for powerpc
Add ppc as the default 32-bit variant for powerpc target, allow pass
ARCH=ppc or ARCH=powerpc to test 32-bit powerpc
Derived from Willy's suggestion
* selftests/nolibc: add test support for pppc64le
Rename powerpc64le to ppc64le
Suggestion from Willy
* selftests/nolibc: add test support for pppc64
Rename powerpc64 to ppc64
Suggestion from Willy
Best regards,
Zhangjin
---
[1]: https://lore.kernel.org/lkml/cover.1689713175.git.falcon@tinylab.org/
Zhangjin Wu (7):
tools/nolibc: add support for powerpc
tools/nolibc: add support for powerpc64
selftests/nolibc: add extra configs customize support
selftests/nolibc: add XARCH and ARCH mapping support
selftests/nolibc: add test support for ppc
selftests/nolibc: add test support for ppc64le
selftests/nolibc: add test support for ppc64
tools/include/nolibc/arch-powerpc.h | 202 ++++++++++++++++++++++++
tools/include/nolibc/arch.h | 2 +
tools/testing/selftests/nolibc/Makefile | 48 +++++-
3 files changed, 244 insertions(+), 8 deletions(-)
create mode 100644 tools/include/nolibc/arch-powerpc.h
--
2.25.1
Make sv48 the default address space for mmap as some applications
currently depend on this assumption. Users can now select a
desired address space using a non-zero hint address to mmap. Previously,
requesting the default address space from mmap by passing zero as the hint
address would result in using the largest address space possible. Some
applications depend on empty bits in the virtual address space, like Go and
Java, so this patch provides more flexibility for application developers.
-Charlie
---
v6:
- Rebase onto the correct base
v5:
- Minor wording change in documentation
- Change some parenthesis in arch_get_mmap_ macros
- Added case for addr==0 in arch_get_mmap_ because without this, programs would
crash if RLIMIT_STACK was modified before executing the program. This was
tested using the libhugetlbfs tests.
v4:
- Split testcases/document patch into test cases, in-code documentation, and
formal documentation patches
- Modified the mmap_base macro to be more legible and better represent memory
layout
- Fixed documentation to better reflect the implmentation
- Renamed DEFAULT_VA_BITS to MMAP_VA_BITS
- Added additional test case for rlimit changes
---
Charlie Jenkins (4):
RISC-V: mm: Restrict address space for sv39,sv48,sv57
RISC-V: mm: Add tests for RISC-V mm
RISC-V: mm: Update pgtable comment documentation
RISC-V: mm: Document mmap changes
Documentation/riscv/vm-layout.rst | 22 +++
arch/riscv/include/asm/elf.h | 2 +-
arch/riscv/include/asm/pgtable.h | 20 ++-
arch/riscv/include/asm/processor.h | 46 +++++-
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/mm/.gitignore | 1 +
tools/testing/selftests/riscv/mm/Makefile | 21 +++
.../selftests/riscv/mm/testcases/mmap.c | 133 ++++++++++++++++++
8 files changed, 234 insertions(+), 13 deletions(-)
create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
create mode 100644 tools/testing/selftests/riscv/mm/Makefile
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap.c
--
2.41.0
This is the basic functionality for iommufd to support
iommufd_device_replace() and IOMMU_HWPT_ALLOC for physical devices.
iommufd_device_replace() allows changing the HWPT associated with the
device to a new IOAS or HWPT. Replace does this in way that failure leaves
things unchanged, and utilizes the iommu iommu_group_replace_domain() API
to allow the iommu driver to perform an optional non-disruptive change.
IOMMU_HWPT_ALLOC allows HWPTs to be explicitly allocated by the user and
used by attach or replace. At this point it isn't very useful since the
HWPT is the same as the automatically managed HWPT from the IOAS. However
a following series will allow userspace to customize the created HWPT.
The implementation is complicated because we have to introduce some
per-iommu_group memory in iommufd and redo how we think about multi-device
groups to be more explicit. This solves all the locking problems in the
prior attempts.
This series is infrastructure work for the following series which:
- Add replace for attach
- Expose replace through VFIO APIs
- Implement driver parameters for HWPT creation (nesting)
Once review of this is complete I will keep it on a side branch and
accumulate the following series when they are ready so we can have a
stable base and make more incremental progress. When we have all the parts
together to get a full implementation it can go to Linus.
This is on github: https://github.com/jgunthorpe/linux/commits/iommufd_hwpt
v8:
- Rebase to v6.5-rc2, update to new behavior of __iommu_group_set_domain()
v7: https://lore.kernel.org/r/0-v7-6c0fd698eda2+5e3-iommufd_alloc_jgg@nvidia.com
- Rebase to v6.4-rc2, update to new signature of iommufd_get_ioas()
v6: https://lore.kernel.org/r/0-v6-fdb604df649a+369-iommufd_alloc_jgg@nvidia.com
- Go back to the v4 locking arragnment with now both the attach/detach
igroup->locks inside the functions, Kevin says he needs this for a
followup series. This still fixes the syzkaller bug
- Fix two more error unwind locking bugs where
iommufd_object_abort_and_destroy(hwpt) would deadlock or be mislocked.
Make sure fail_nth will catch these mistakes
- Add a patch allowing objects to have different abort than destroy
function, it allows hwpt abort to require the caller to continue
to hold the lock and enforces this with lockdep.
v5: https://lore.kernel.org/r/0-v5-6716da355392+c5-iommufd_alloc_jgg@nvidia.com
- Go back to the v3 version of the code, keep the comment changes from
v4. Syzkaller says the group lock change in v4 didn't work.
- Adjust the fail_nth test to cover the path syzkaller found. We need to
have an ioas with a mapped page installed to inject a failure during
domain attachment.
v4: https://lore.kernel.org/r/0-v4-9cd79ad52ee8+13f5-iommufd_alloc_jgg@nvidia.c…
- Refine comments and commit messages
- Move the group lock into iommufd_hw_pagetable_attach()
- Fix error unwind in iommufd_device_do_replace()
v3: https://lore.kernel.org/r/0-v3-61d41fd9e13e+1f5-iommufd_alloc_jgg@nvidia.com
- Refine comments and commit messages
- Adjust the flow in iommufd_device_auto_get_domain() so pt_id is only
set on success
- Reject replace on non-attached devices
- Add missing __reserved check for IOMMU_HWPT_ALLOC
v2: https://lore.kernel.org/r/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia.c…
- Use WARN_ON for the igroup->group test and move that logic to a
function iommufd_group_try_get()
- Change igroup->devices to igroup->device list
Replace will need to iterate over all attached idevs
- Rename to iommufd_group_setup_msi()
- New patch to export iommu_get_resv_regions()
- New patch to use per-device reserved regions instead of per-group
regions
- Split out the reorganizing of iommufd_device_change_pt() from the
replace patch
- Replace uses the per-dev reserved regions
- Use stdev_id in a few more places in the selftest
- Fix error handling in IOMMU_HWPT_ALLOC
- Clarify comments
- Rebase on v6.3-rc1
v1: https://lore.kernel.org/all/0-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia…
Jason Gunthorpe (17):
iommufd: Move isolated msi enforcement to iommufd_device_bind()
iommufd: Add iommufd_group
iommufd: Replace the hwpt->devices list with iommufd_group
iommu: Export iommu_get_resv_regions()
iommufd: Keep track of each device's reserved regions instead of
groups
iommufd: Use the iommufd_group to avoid duplicate MSI setup
iommufd: Make sw_msi_start a group global
iommufd: Move putting a hwpt to a helper function
iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
iommufd: Allow a hwpt to be aborted after allocation
iommufd: Fix locking around hwpt allocation
iommufd: Reorganize iommufd_device_attach into
iommufd_device_change_pt
iommufd: Add iommufd_device_replace()
iommufd: Make destroy_rwsem use a lock class per object type
iommufd: Add IOMMU_HWPT_ALLOC
iommufd/selftest: Return the real idev id from selftest mock_domain
iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC
Nicolin Chen (2):
iommu: Introduce a new iommu_group_replace_domain() API
iommufd/selftest: Test iommufd_device_replace()
drivers/iommu/iommu-priv.h | 10 +
drivers/iommu/iommu.c | 38 +-
drivers/iommu/iommufd/device.c | 555 +++++++++++++-----
drivers/iommu/iommufd/hw_pagetable.c | 112 +++-
drivers/iommu/iommufd/io_pagetable.c | 32 +-
drivers/iommu/iommufd/iommufd_private.h | 52 +-
drivers/iommu/iommufd/iommufd_test.h | 6 +
drivers/iommu/iommufd/main.c | 24 +-
drivers/iommu/iommufd/selftest.c | 40 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/iommufd.h | 26 +
tools/testing/selftests/iommu/iommufd.c | 67 ++-
.../selftests/iommu/iommufd_fail_nth.c | 67 ++-
tools/testing/selftests/iommu/iommufd_utils.h | 63 +-
14 files changed, 867 insertions(+), 226 deletions(-)
create mode 100644 drivers/iommu/iommu-priv.h
base-commit: fdf0eaf11452d72945af31804e2a1048ee1b574c
--
2.41.0
Apologize for sending previous mail from a wrong app (not text mode).
Resending to keep the mailing list thread consistent.
On Wed, Jul 26, 2023 at 3:10 AM Markus Elfring <Markus.Elfring(a)web.de>
wrote:
>
> > Tests BPF redirect at the lwt xmit hook to ensure error handling are
> > safe, i.e. won't panic the kernel.
>
> Are imperative change descriptions still preferred?
Hi Markus,
I think you linked this to me yesterday that it should be described
imperatively:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Doc…
>
> See also:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Doc…
>
I don’t follow the purpose of this reference. This points to user impact
but this is a selftest, so I don’t see any user impact here. Or is there
anything I missed?
>
> Can remaining wording weaknesses be adjusted accordingly?
I am not following this question . Can you be more specific or provide an
example?
Yan
>
> Regards,
> Markus
>
Dzień dobry,
zapoznałem się z Państwa ofertą i z przyjemnością przyznaję, że przyciąga uwagę i zachęca do dalszych rozmów.
Pomyślałem, że może mógłbym mieć swój wkład w Państwa rozwój i pomóc dotrzeć z tą ofertą do większego grona odbiorców. Pozycjonuję strony www, dzięki czemu generują świetny ruch w sieci.
Możemy porozmawiać w najbliższym czasie?
Pozdrawiam
Adam Charachuta
Add specification for declaring test metadata to the KTAP v2 spec.
The purpose of test metadata is to allow for the declaration of essential
testing information in KTAP output. This information includes test
names, test configuration info, test attributes, and test files.
There have been similar ideas around the idea of test metadata such as test
prefixes and test name lines. However, I propose this specification as an
overall fix for these issues.
These test metadata lines are a form of diagnostic lines with the
format: "# <metadata_type>: <data>". As a type of diagnostic line, test
metadata lines are compliant with KTAP v1, which will help to not
interfere too much with current parsers.
Specifically the "# Subtest:" line is derived from the TAP 14 spec:
https://testanything.org/tap-version-14-specification.html.
The proposed location for test metadata is in the test header, between the
version line and the test plan line. Note including diagnostic lines in
the test header is a depature from KTAP v1.
This location provides two main benefits:
First, metadata will be printed prior to when subtests are run. Then if a
test fails, test metadata can help discern which test is causing the issue
and potentially why.
Second, this location ensures that the lines will not be accidentally
parsed as a subtest's diagnostic lines because the lines are bordered by
the version line and plan line.
Here is an example of test metadata:
KTAP version 2
# Config: CONFIG_TEST=y
1..1
KTAP version 2
# Subtest: test_suite
# File: /sys/kernel/...
# Attributes: slow
# Other: example_test
1..2
ok 1 test_1
ok 2 test_2
ok 1 test_suite
Here is a link to a version of the KUnit parser that is able to parse test
metadata lines for KTAP version 2. Note this includes test metadata
lines for the main level of KTAP.
Link: https://kunit-review.googlesource.com/c/linux/+/5809
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
Hi everyone,
I would like to use this proposal similar to an RFC to gather ideas on the
topic of test metadata. Let me know what you think.
I am also interested in brainstorming a list of recognized metadata types.
Providing recognized metadata types would be helpful in parsing and
displaying test metadata in a useful way.
Current ideas:
- "# Subtest: <test_name>" to indicate test name (name must match
corresponding result line)
- "# Attributes: <attributes list>" to indicate test attributes (list
separated by commas)
- "# File: <file_path>" to indicate file used in testing
Any other ideas?
Note this proposal replaces two of my previous proposals: "ktap_v2: add
recognized test name line" and "ktap_v2: allow prefix to KTAP lines."
Thanks!
-Rae
Note: this patch is based on Frank's ktap_spec_version_2 branch.
Documentation/dev-tools/ktap.rst | 51 ++++++++++++++++++++++++++++++--
1 file changed, 48 insertions(+), 3 deletions(-)
diff --git a/Documentation/dev-tools/ktap.rst b/Documentation/dev-tools/ktap.rst
index ff77f4aaa6ef..a2d0a196c115 100644
--- a/Documentation/dev-tools/ktap.rst
+++ b/Documentation/dev-tools/ktap.rst
@@ -17,7 +17,9 @@ KTAP test results describe a series of tests (which may be nested: i.e., test
can have subtests), each of which can contain both diagnostic data -- e.g., log
lines -- and a final result. The test structure and results are
machine-readable, whereas the diagnostic data is unstructured and is there to
-aid human debugging.
+aid human debugging. One exception to this is test metadata lines - a type
+of diagnostic lines. Test metadata is located between the version line and
+plan line of a test and can be machine-readable.
KTAP output is built from four different types of lines:
- Version lines
@@ -28,8 +30,7 @@ KTAP output is built from four different types of lines:
In general, valid KTAP output should also form valid TAP output, but some
information, in particular nested test results, may be lost. Also note that
there is a stagnant draft specification for TAP14, KTAP diverges from this in
-a couple of places (notably the "Subtest" header), which are described where
-relevant later in this document.
+a couple of places, which are described where relevant later in this document.
Version lines
-------------
@@ -166,6 +167,45 @@ even if they do not start with a "#": this is to capture any other useful
kernel output which may help debug the test. It is nevertheless recommended
that tests always prefix any diagnostic output they have with a "#" character.
+Test metadata lines
+-------------------
+
+Test metadata lines are a type of diagnostic lines used to the declare the
+name of a test and other helpful testing information in the test header.
+These lines are often helpful for parsing and for providing context during
+crashes.
+
+Test metadata lines must follow the format: "# <metadata_type>: <data>".
+These lines must be located between the version line and the plan line
+within a test header.
+
+There are a few currently recognized metadata types:
+- "# Subtest: <test_name>" to indicate test name (name must match
+ corresponding result line)
+- "# Attributes: <attributes list>" to indicate test attributes (list
+ separated by commas)
+- "# File: <file_path>" to indicate file used in testing
+
+As a rule, the "# Subtest:" line is generally first to declare the test
+name. Note that metadata lines do not necessarily need to use a
+recognized metadata type.
+
+An example of using metadata lines:
+
+::
+
+ KTAP version 2
+ 1..1
+ # File: /sys/kernel/...
+ KTAP version 2
+ # Subtest: example
+ # Attributes: slow, example_test
+ 1..1
+ ok 1 test_1
+ # example passed
+ ok 1 example
+
+
Unknown lines
-------------
@@ -206,6 +246,7 @@ An example of a test with two nested subtests:
KTAP version 2
1..1
KTAP version 2
+ # Subtest: example
1..2
ok 1 test_1
not ok 2 test_2
@@ -219,6 +260,7 @@ An example format with multiple levels of nested testing:
KTAP version 2
1..2
KTAP version 2
+ # Subtest: example_test_1
1..2
KTAP version 2
1..2
@@ -254,6 +296,7 @@ Example KTAP output
KTAP version 2
1..1
KTAP version 2
+ # Subtest: main_test
1..3
KTAP version 2
1..1
@@ -261,11 +304,13 @@ Example KTAP output
ok 1 test_1
ok 1 example_test_1
KTAP version 2
+ # Attributes: slow
1..2
ok 1 test_1 # SKIP test_1 skipped
ok 2 test_2
ok 2 example_test_2
KTAP version 2
+ # Subtest: example_test_3
1..3
ok 1 test_1
# test_2: FAIL
base-commit: 906f02e42adfbd5ae70d328ee71656ecb602aaf5
--
2.40.0.396.gfff15efe05-goog
lwt xmit hook does not expect positive return values in function
ip_finish_output2 and ip6_finish_output2. However, BPF redirect programs
can return positive values such like NET_XMIT_DROP, NET_RX_DROP, and etc
as errors. Such return values can panic the kernel unexpectedly:
https://gist.github.com/zhaiyan920/8fbac245b261fe316a7ef04c9b1eba48
This patch fixes the return values from BPF redirect, so the error
handling would be consistent at xmit hook. It also adds a few test cases
to prevent future regressions.
v2: https://lore.kernel.org/netdev/ZLdY6JkWRccunvu0@debian.debian/
v1: https://lore.kernel.org/bpf/ZLbYdpWC8zt9EJtq@debian.debian/
changes since v2:
* subject name changed
* also covered redirect to ingress case
* added selftests
changes since v1:
* minor code style changes
Yan Zhai (2):
bpf: fix skb_do_redirect return values
bpf: selftests: add lwt redirect regression test cases
net/core/filter.c | 12 +-
tools/testing/selftests/bpf/Makefile | 1 +
.../selftests/bpf/progs/test_lwt_redirect.c | 67 +++++++
.../selftests/bpf/test_lwt_redirect.sh | 165 ++++++++++++++++++
4 files changed, 244 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/bpf/progs/test_lwt_redirect.c
create mode 100755 tools/testing/selftests/bpf/test_lwt_redirect.sh
--
2.30.2
Remove clean target in Makefile to fix the following warning
and use the one in common lib.mk
Makefile:14: warning: overriding recipe for target 'clean'
../lib.mk:160: warning: ignoring old recipe for target 'clean'
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
---
tools/testing/selftests/prctl/Makefile | 2 --
1 file changed, 2 deletions(-)
diff --git a/tools/testing/selftests/prctl/Makefile b/tools/testing/selftests/prctl/Makefile
index cfc35d29fc2e..01dc90fbb509 100644
--- a/tools/testing/selftests/prctl/Makefile
+++ b/tools/testing/selftests/prctl/Makefile
@@ -10,7 +10,5 @@ all: $(TEST_PROGS)
include ../lib.mk
-clean:
- rm -fr $(TEST_PROGS)
endif
endif
--
2.39.2
On Tue, Jul 25, 2023 at 12:11 AM Markus Elfring <Markus.Elfring(a)web.de> wrote:
>
> > … unexpected problems. This change
> > converts the positive status code to proper error code.
>
> Please choose a corresponding imperative change suggestion.
>
> See also:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Doc…
>
>
> Did you provide sufficient justification for a possible addition of the tag “Fixes”?
>
>
> …
> > v2: code style change suggested by Stanislav Fomichev
> > ---
> > net/core/filter.c | 12 +++++++++++-
> …
>
> How do you think about to replace this marker by a line break?
>
> See also:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Doc…
>
> Regards,
> Markus
Hi Markus,
Thanks for the suggestions, those are what I could use more help with.
Will address these in the next version.
Yan
Hello everyone,
This patch series adds a test attributes framework to KUnit.
There has been interest in filtering out "slow" KUnit tests. Most notably,
a new config, CONFIG_MEMCPY_SLOW_KUNIT_TEST, has been added to exclude a
particularly slow memcpy test
(https://lore.kernel.org/all/20230118200653.give.574-kees@kernel.org/).
This attributes framework can be used to save and access test associated
data, including whether a test is slow. These attributes are reportable
(via KTAP and command line output) and are also filterable.
This framework is designed to allow for the addition of other attributes in
the future. These attributes could include whether the test can be run
concurrently, test file path, etc.
To try out the framework I suggest running:
"./tools/testing/kunit/kunit.py run --filter speed!=slow"
This patch series was originally sent out as an RFC. Here is a link to the
RFC v2:
https://lore.kernel.org/all/20230707210947.1208717-1-rmoar@google.com/
Thanks!
Rae
Rae Moar (9):
kunit: Add test attributes API structure
kunit: Add speed attribute
kunit: Add module attribute
kunit: Add ability to filter attributes
kunit: tool: Add command line interface to filter and report
attributes
kunit: memcpy: Mark tests as slow using test attributes
kunit: time: Mark test as slow using test attributes
kunit: add tests for filtering attributes
kunit: Add documentation of KUnit test attributes
.../dev-tools/kunit/running_tips.rst | 166 +++++++
include/kunit/attributes.h | 50 +++
include/kunit/test.h | 70 ++-
kernel/time/time_test.c | 2 +-
lib/Kconfig.debug | 3 +
lib/kunit/Makefile | 3 +-
lib/kunit/attributes.c | 418 ++++++++++++++++++
lib/kunit/executor.c | 115 ++++-
lib/kunit/executor_test.c | 128 +++++-
lib/kunit/kunit-example-test.c | 9 +
lib/kunit/test.c | 27 +-
lib/memcpy_kunit.c | 8 +-
tools/testing/kunit/kunit.py | 70 ++-
tools/testing/kunit/kunit_kernel.py | 8 +-
tools/testing/kunit/kunit_parser.py | 11 +-
tools/testing/kunit/kunit_tool_test.py | 39 +-
16 files changed, 1051 insertions(+), 76 deletions(-)
create mode 100644 include/kunit/attributes.h
create mode 100644 lib/kunit/attributes.c
base-commit: 64bd4641310c41a1ecf07c13c67bc0ed61045dfd
--
2.41.0.487.g6d72f3e995-goog
Hi All,
This is v3 of my series to clean up mm selftests so that they run correctly on
arm64. See [1] for full explanation.
Only patch 6 has changed vs v2. The rest are the same and already carry
reviewed/acked-bys. So I'm hoping I can get the final patch reviewed and this
series is hopefully then good enough to merge?
Changes Since v2 [2]
--------------------
- Patch 6: Change approach to cleaning up child processes; Use "parent death
signal", as suggested by David.
- Added Reviewed-by/Acked-by tags: thanks to David, Mark and Peter!
Changes Since v1 [1]
--------------------
- Patch 1: Explicitly set line buffer mode in ksft_print_header()
- Dropped v1 patch 2 (set execute permissions): Andrew has taken this into his
branch separately.
- Patch 2: Don't compile `soft-dirty` suite for arm64 instead of skipping it
at runtime.
- Patch 2: Declare fewer tests and skip all of test_softdirty() if soft-dirty
is not supported, rather than conditionally marking each check as skipped.
- Added Reviewed-by tags: thanks DavidH!
- Patch 8: Clarified commit message.
[1] https://lore.kernel.org/linux-mm/20230713135440.3651409-1-ryan.roberts@arm.…
[2] https://lore.kernel.org/linux-mm/20230717103152.202078-1-ryan.roberts@arm.c…
Thanks,
Ryan
Ryan Roberts (8):
selftests: Line buffer test program's stdout
selftests/mm: Skip soft-dirty tests on arm64
selftests/mm: Enable mrelease_test for arm64
selftests/mm: Fix thuge-gen test bugs
selftests/mm: va_high_addr_switch should skip unsupported arm64
configs
selftests/mm: Make migration test robust to failure
selftests/mm: Optionally pass duration to transhuge-stress
selftests/mm: Run all tests from run_vmtests.sh
tools/testing/selftests/kselftest.h | 9 ++
tools/testing/selftests/kselftest/runner.sh | 7 +-
tools/testing/selftests/mm/Makefile | 82 ++++++++++---------
tools/testing/selftests/mm/madv_populate.c | 26 +++++-
tools/testing/selftests/mm/migration.c | 12 ++-
tools/testing/selftests/mm/mrelease_test.c | 1 +
tools/testing/selftests/mm/run_vmtests.sh | 28 ++++++-
tools/testing/selftests/mm/settings | 2 +-
tools/testing/selftests/mm/thuge-gen.c | 4 +-
tools/testing/selftests/mm/transhuge-stress.c | 12 ++-
.../selftests/mm/va_high_addr_switch.c | 2 +-
11 files changed, 132 insertions(+), 53 deletions(-)
--
2.25.1
The arm64 Guarded Control Stack (GCS) feature provides support for
hardware protected stacks of return addresses, intended to provide
hardening against return oriented programming (ROP) attacks and to make
it easier to gather call stacks for applications such as profiling.
When GCS is active a secondary stack called the Guarded Control Stack is
maintained, protected with a memory attribute which means that it can
only be written with specific GCS operations. When a BL is executed the
value stored in LR is also pushed onto the GCS, and when a RET is
executed the top of the GCS is popped and compared to LR with a fault
being raised if the values do not match. GCS operations may only be
performed on GCS pages, a data abort is generated if they are not.
This series implements support for use of GCS by userspace, along with
support for use of GCS within KVM guests. It does not enable use of GCS
by either EL1 or EL2. Executables are started without GCS and must use
a prctl() to enable it, it is expected that this will be done very early
in application execution by the dynamic linker or other startup code.
x86 has an equivalent feature called shadow stacks, this series depends
on the x86 patches for generic memory management support for the new
guarded/shadow stack page type and shares APIs as much as possible. As
there has been extensive discussion with the wider community around the
ABI for shadow stacks I have as far as practical kept implementation
decisions close to those for x86, anticipating that review would lead to
similar conclusions in the absence of strong reasoning for divergence.
The main divergence I am concious of is that x86 allows shadow stack to
be enabled and disabled repeatedly, freeing the shadow stack for the
thread whenever disabled, while this implementation keeps the GCS
allocated after disable but refuses to reenable it. This is to avoid
races with things actively walking the GCS during a disable, we do
anticipate that some systems will wish to disable GCS at runtime but are
not aware of any demand for subsequently reenabling it.
x86 uses an arch_prctl() to manage enable and disable, since only x86
and S/390 use arch_prctl() a generic prctl() was proposed[1] as part of a
patch set for the equivalent RISC-V zisslpcfi feature which I initially
adopted fairly directly but following review feedback has been reviewed
quite a bit.
There is an open issue with support for CRIU, on x86 this required the
ability to set the GCS mode via ptrace. This series supports
configuring mode bits other than enable/disable via ptrace but it needs
to be confirmed if this is sufficient.
There's a few bits where I'm not convinced with where I've placed
things, in particular the GCS write operation is in the GCS header not
in uaccess.h, I wasn't sure what was clearest there and am probably too
close to the code to have a clear opinion. The reporting of GCS in
/proc/PID/smaps is also a bit awkward.
The series depends on the x86 shadow stack support:
https://lore.kernel.org/lkml/20230227222957.24501-1-rick.p.edgecombe@intel.…
I've rebased this onto v6.5-rc3 but not included it in the series in
order to avoid confusion with Rick's work and cut down the size of the
series, you can see the branch at:
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc.git arm64-gcs
[1] https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Rebase onto v6.5-rc3.
- Rework prctl() interface to allow each bit to be locked independently.
- map_shadow_stack() now places the cap token based on the size
requested by the caller not the actual space allocated.
- Mode changes other than enable via ptrace are now supported.
- Expand test coverage.
- Various smaller fixes and adjustments.
- Link to v1: https://lore.kernel.org/r/20230716-arm64-gcs-v1-0-bf567f93bba6@kernel.org
---
Mark Brown (35):
prctl: arch-agnostic prctl for shadow stack
arm64: Document boot requirements for Guarded Control Stacks
arm64/gcs: Document the ABI for Guarded Control Stacks
arm64/sysreg: Add new system registers for GCS
arm64/sysreg: Add definitions for architected GCS caps
arm64/gcs: Add manual encodings of GCS instructions
arm64/gcs: Provide copy_to_user_gcs()
arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS)
arm64/mm: Allocate PIE slots for EL0 guarded control stack
mm: Define VM_SHADOW_STACK for arm64 when we support GCS
arm64/mm: Map pages for guarded control stack
KVM: arm64: Manage GCS registers for guests
arm64/el2_setup: Allow GCS usage at EL0 and EL1
arm64/idreg: Add overrride for GCS
arm64/hwcap: Add hwcap for GCS
arm64/traps: Handle GCS exceptions
arm64/mm: Handle GCS data aborts
arm64/gcs: Context switch GCS registers for EL0
arm64/gcs: Allocate a new GCS for threads with GCS enabled
arm64/gcs: Implement shadow stack prctl() interface
arm64/mm: Implement map_shadow_stack()
arm64/signal: Set up and restore the GCS context for signal handlers
arm64/signal: Expose GCS state in signal frames
arm64/ptrace: Expose GCS via ptrace and core files
arm64: Add Kconfig for Guarded Control Stack (GCS)
kselftest/arm64: Verify the GCS hwcap
kselftest/arm64: Add GCS as a detected feature in the signal tests
kselftest/arm64: Add framework support for GCS to signal handling tests
kselftest/arm64: Allow signals tests to specify an expected si_code
kselftest/arm64: Always run signals tests with GCS enabled
kselftest/arm64: Add very basic GCS test program
kselftest/arm64: Add a GCS test program built with the system libc
kselftest/arm64: Add test coverage for GCS mode locking
selftests/arm64: Add GCS signal tests
kselftest/arm64: Enable GCS for the FP stress tests
Documentation/admin-guide/kernel-parameters.txt | 3 +
Documentation/arch/arm64/booting.rst | 22 ++
Documentation/arch/arm64/elf_hwcaps.rst | 3 +
Documentation/arch/arm64/gcs.rst | 225 +++++++++++++
Documentation/arch/arm64/index.rst | 1 +
Documentation/filesystems/proc.rst | 2 +-
arch/arm64/Kconfig | 19 ++
arch/arm64/include/asm/cpufeature.h | 6 +
arch/arm64/include/asm/el2_setup.h | 17 +
arch/arm64/include/asm/esr.h | 28 +-
arch/arm64/include/asm/exception.h | 2 +
arch/arm64/include/asm/gcs.h | 106 ++++++
arch/arm64/include/asm/hwcap.h | 1 +
arch/arm64/include/asm/kvm_host.h | 12 +
arch/arm64/include/asm/pgtable-prot.h | 14 +-
arch/arm64/include/asm/processor.h | 7 +
arch/arm64/include/asm/sysreg.h | 20 ++
arch/arm64/include/asm/uaccess.h | 42 +++
arch/arm64/include/uapi/asm/hwcap.h | 1 +
arch/arm64/include/uapi/asm/ptrace.h | 8 +
arch/arm64/include/uapi/asm/sigcontext.h | 9 +
arch/arm64/kernel/cpufeature.c | 19 ++
arch/arm64/kernel/cpuinfo.c | 1 +
arch/arm64/kernel/entry-common.c | 23 ++
arch/arm64/kernel/idreg-override.c | 2 +
arch/arm64/kernel/process.c | 78 +++++
arch/arm64/kernel/ptrace.c | 59 ++++
arch/arm64/kernel/signal.c | 237 ++++++++++++-
arch/arm64/kernel/traps.c | 11 +
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 17 +
arch/arm64/kvm/sys_regs.c | 22 ++
arch/arm64/mm/Makefile | 1 +
arch/arm64/mm/fault.c | 78 ++++-
arch/arm64/mm/gcs.c | 226 +++++++++++++
arch/arm64/mm/mmap.c | 17 +-
arch/arm64/tools/cpucaps | 1 +
arch/arm64/tools/sysreg | 55 +++
fs/proc/task_mmu.c | 3 +
include/linux/mm.h | 16 +-
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 5 +-
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 22 ++
kernel/sys.c | 30 ++
kernel/sys_ni.c | 1 +
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/abi/hwcap.c | 19 ++
tools/testing/selftests/arm64/fp/assembler.h | 15 +
tools/testing/selftests/arm64/fp/fpsimd-test.S | 2 +
tools/testing/selftests/arm64/fp/sve-test.S | 2 +
tools/testing/selftests/arm64/fp/za-test.S | 2 +
tools/testing/selftests/arm64/fp/zt-test.S | 2 +
tools/testing/selftests/arm64/gcs/.gitignore | 3 +
tools/testing/selftests/arm64/gcs/Makefile | 19 ++
tools/testing/selftests/arm64/gcs/basic-gcs.c | 351 +++++++++++++++++++
tools/testing/selftests/arm64/gcs/gcs-locking.c | 200 +++++++++++
tools/testing/selftests/arm64/gcs/gcs-util.h | 87 +++++
tools/testing/selftests/arm64/gcs/libc-gcs.c | 372 +++++++++++++++++++++
tools/testing/selftests/arm64/signal/.gitignore | 1 +
.../testing/selftests/arm64/signal/test_signals.c | 17 +-
.../testing/selftests/arm64/signal/test_signals.h | 6 +
.../selftests/arm64/signal/test_signals_utils.c | 32 +-
.../selftests/arm64/signal/test_signals_utils.h | 39 +++
.../arm64/signal/testcases/gcs_exception_fault.c | 59 ++++
.../selftests/arm64/signal/testcases/gcs_frame.c | 78 +++++
.../arm64/signal/testcases/gcs_write_fault.c | 67 ++++
.../selftests/arm64/signal/testcases/testcases.c | 7 +
.../selftests/arm64/signal/testcases/testcases.h | 1 +
68 files changed, 2825 insertions(+), 32 deletions(-)
---
base-commit: b8f2cc1100d85456f9a48243328b33ab0ce5caff
change-id: 20230303-arm64-gcs-e311ab0d8729
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Hi,
Using the same config for 6.5-rc2 on Ubuntu 22.04 LTS and 22.10, the execution
stop at the exact same line on both boxes (os I reckon it is more than an
accident):
# selftests: net/forwarding: sch_ets.sh
# TEST: ping vlan 10 [ OK ]
# TEST: ping vlan 11 [ OK ]
# TEST: ping vlan 12 [ OK ]
# Running in priomap mode
# Testing ets bands 3 strict 3, streams 0 1
# TEST: band 0 [ OK ]
# INFO: Expected ratio >95% Measured ratio 100.00
# TEST: band 1 [ OK ]
# INFO: Expected ratio <5% Measured ratio 0
# Testing ets bands 3 strict 3, streams 1 2
# TEST: band 1 [ OK ]
# INFO: Expected ratio >95% Measured ratio 100.00
# TEST: band 2 [ OK ]
# INFO: Expected ratio <5% Measured ratio 0
# Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
# TEST: band 0 [ OK ]
# INFO: Expected ratio >95% Measured ratio 100.00
# TEST: band 1 [ OK ]
# INFO: Expected ratio <5% Measured ratio 0
# Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 1 2
# TEST: bands 1:2 [ OK ]
# INFO: Expected ratio 2.00 Measured ratio 1.99
# Testing ets bands 3 quanta 3300 3300 3300, streams 0 1 2
# TEST: bands 0:1 [ OK ]
# INFO: Expected ratio 1.00 Measured ratio .99
# TEST: bands 0:2 [ OK ]
# INFO: Expected ratio 1.00 Measured ratio 1.00
# Testing ets bands 3 quanta 5000 3500 1500, streams 0 1 2
# TEST: bands 0:1 [ OK ]
# INFO: Expected ratio 1.42 Measured ratio 1.42
# TEST: bands 0:2 [ OK ]
# INFO: Expected ratio 3.33 Measured ratio 3.33
# Testing ets bands 3 quanta 5000 8000 1500, streams 0 1 2
# TEST: bands 0:1 [ OK ]
# INFO: Expected ratio 1.60 Measured ratio 1.59
# TEST: bands 0:2 [ OK ]
# INFO: Expected ratio 3.33 Measured ratio 3.33
# Testing ets bands 2 quanta 5000 2500, streams 0 1
# TEST: bands 0:1 [ OK ]
# INFO: Expected ratio 2.00 Measured ratio 1.99
# Running in classifier mode
# Testing ets bands 3 strict 3, streams 0 1
# TEST: band 0 [ OK ]
# INFO: Expected ratio >95% Measured ratio 100.00
# TEST: band 1 [ OK ]
# INFO: Expected ratio <5% Measured ratio 0
# Testing ets bands 3 strict 3, streams 1 2
# TEST: band 1 [ OK ]
# INFO: Expected ratio >95% Measured ratio 100.00
# TEST: band 2 [ OK ]
# INFO: Expected ratio <5% Measured ratio 0
# Testing ets bands 4 strict 1 quanta 5000 2500 1500, streams 0 1
I tried to run 'set -x' enabled version standalone, but that one finished
correctly (?).
It could be something previous scripts left, but right now I don't have a clue.
I can attempt to rerun all tests with sch_ets.sh bash 'set -x' enabled later today.
Best regards,
Mirsad Todorovac
Adds a check to verify if the rtc device file is valid or not
and prints a useful error message if the file is not accessible.
Signed-off-by: Atul Kumar Pant <atulpant.linux(a)gmail.com>
Acked-by: Alexandre Belloni <alexandre.belloni(a)bootlin.com>
---
changes since v4:
Updated the commit message.
changes since v3:
Added Linux-kselftest and Linux-kernel mailing lists.
changes since v2:
Changed error message when rtc file does not exist.
changes since v1:
Removed check for uid=0
If rtc file is invalid, then exit the test.
tools/testing/selftests/rtc/rtctest.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/rtc/rtctest.c b/tools/testing/selftests/rtc/rtctest.c
index 63ce02d1d5cc..630fef735c7e 100644
--- a/tools/testing/selftests/rtc/rtctest.c
+++ b/tools/testing/selftests/rtc/rtctest.c
@@ -17,6 +17,7 @@
#include <unistd.h>
#include "../kselftest_harness.h"
+#include "../kselftest.h"
#define NUM_UIE 3
#define ALARM_DELTA 3
@@ -419,6 +420,8 @@ __constructor_order_last(void)
int main(int argc, char **argv)
{
+ int ret = -1;
+
switch (argc) {
case 2:
rtc_file = argv[1];
@@ -430,5 +433,11 @@ int main(int argc, char **argv)
return 1;
}
- return test_harness_run(argc, argv);
+ // Run the test if rtc_file is valid
+ if (access(rtc_file, F_OK) == 0)
+ ret = test_harness_run(argc, argv);
+ else
+ ksft_exit_fail_msg("[ERROR]: Cannot access rtc file %s - Exiting\n", rtc_file);
+
+ return ret;
}
--
2.25.1
Hello everyone,
This patch series adds a test attributes framework to KUnit.
There has been interest in filtering out "slow" KUnit tests. Most notably,
a new config, CONFIG_MEMCPY_SLOW_KUNIT_TEST, has been added to exclude a
particularly slow memcpy test
(https://lore.kernel.org/all/20230118200653.give.574-kees@kernel.org/).
This attributes framework can be used to save and access test associated
data, including whether a test is slow. These attributes are reportable
(via KTAP and command line output) and are also filterable.
This framework is designed to allow for the addition of other attributes in
the future. These attributes could include whether the test can be run
concurrently, test file path, etc.
To try out the framework I suggest running:
"./tools/testing/kunit/kunit.py run --filter speed!=slow"
This patch series was originally sent out as an RFC. Here is a link to the
RFC v2:
https://lore.kernel.org/all/20230707210947.1208717-1-rmoar@google.com/
Thanks!
Rae
Rae Moar (9):
kunit: Add test attributes API structure
kunit: Add speed attribute
kunit: Add module attribute
kunit: Add ability to filter attributes
kunit: tool: Add command line interface to filter and report
attributes
kunit: memcpy: Mark tests as slow using test attributes
kunit: time: Mark test as slow using test attributes
kunit: add tests for filtering attributes
kunit: Add documentation of KUnit test attributes
.../dev-tools/kunit/running_tips.rst | 166 +++++++
include/kunit/attributes.h | 50 +++
include/kunit/test.h | 70 ++-
kernel/time/time_test.c | 2 +-
lib/Kconfig.debug | 3 +
lib/kunit/Makefile | 3 +-
lib/kunit/attributes.c | 421 ++++++++++++++++++
lib/kunit/executor.c | 115 ++++-
lib/kunit/executor_test.c | 128 +++++-
lib/kunit/kunit-example-test.c | 9 +
lib/kunit/test.c | 27 +-
lib/memcpy_kunit.c | 8 +-
tools/testing/kunit/kunit.py | 70 ++-
tools/testing/kunit/kunit_kernel.py | 8 +-
tools/testing/kunit/kunit_parser.py | 11 +-
tools/testing/kunit/kunit_tool_test.py | 39 +-
16 files changed, 1054 insertions(+), 76 deletions(-)
create mode 100644 include/kunit/attributes.h
create mode 100644 lib/kunit/attributes.c
base-commit: 64bd4641310c41a1ecf07c13c67bc0ed61045dfd
--
2.41.0.255.g8b1d071c50-goog
Hi,
There seems to be a problem with net/forwarding line of 6.5-rc2 kselftests,
vanilla Torvalds tree, commit fdf0eaf11452, on Ubuntu 22.04 LTS Jammy Jellyfish.
(Confirmed on Ubuntu 22.10 Kinetic Kudu.)
Tests fail with error message:
Command line is not complete. Try option "help"
Failed to create netif
The script
# tools/testing/seltests/net/forwarding/bridge_igmp.sh
bash `set -x` ends with an error:
++ create_netif_veth
++ local i
++ (( i = 1 ))
++ (( i <= NUM_NETIFS ))
++ local j=2
++ ip link show dev
++ [[ 255 -ne 0 ]]
++ ip link add type veth peer name
Command line is not complete. Try option "help"
++ [[ 255 -ne 0 ]]
++ echo 'Failed to create netif'
Failed to create netif
++ exit 1
The problem seems to be linked with this piece of code of "lib.sh":
create_netif_veth()
{
local i
for ((i = 1; i <= NUM_NETIFS; ++i)); do
local j=$((i+1))
ip link show dev ${NETIFS[p$i]} &> /dev/null
if [[ $? -ne 0 ]]; then
ip link add ${NETIFS[p$i]} type veth \
peer name ${NETIFS[p$j]}
if [[ $? -ne 0 ]]; then
echo "Failed to create netif"
exit 1
fi
fi
i=$j
done
}
Somehow, ${NETIFS[p$i]} is evaluated to an empty string?
However, I can't seem to see what is the expected result.
The problem was confirmed in the backlogs of 6.5-rc1 and 6.4 kselftests.
It is possible that I'm doing something terribly wrong, but it is basically
the default kselftest suite on a rather minimal Ubuntu.
Please find attached the bash output from `set -x`.
Version of iproute2 is:
ii iproute2 5.15.0-1ubuntu2 amd64 networking and traffic control tools
Hope this helps.
Best regards,
Mirsad Todorovac
Thank you Michał.
On 7/21/23 12:28 AM, Michał Mirosław wrote:
> b. rename match "flags" to 'page categories' everywhere - this makes
> it easier to differentiate the ioctl()s categorisation of pages
> from struct page flags;
> c. change {required + excluded} to {inverted + required}. This was
> rejected before, but I'd like to illustrate the difference.
> Old interface can be translated to the new by:
> categories_inverted = excluded_mask
> categories_mask = required_mask | excluded_mask
> categories_anyof_mask = anyof_mask
> The new way allows filtering by: A & (B | !C)
> categories_inverted = C
> categories_mask = A
> categories_anyof_mask = B | C
Andrei and Danylo,
Are you okay with these masks? It were you two who had proposed these.
--
BR,
Muhammad Usama Anjum
This series demonstrates how KTAP output can be used by nolibc-test to
make the test results better to read for people and machines.
Especially when running multiple invocations for different architectors
or build configurations we can make use of the kernels TAP parser to
automatically provide aggregated test reports.
The code is very hacky and incomplete and mostly meant to validate if
the output format is useful.
Start with the last patch of the series to actually see the generated
format, or run it for yourself.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (7):
selftests/nolibc: statically calculate number of testsuites
selftests/nolibc: use unsigned indices for testcases
selftests/nolibc: replace repetitive test structure with macro
selftests/nolibc: count subtests
kselftest: support KTAP format
kselftest: support skipping tests with testname
selftests/nolibc: proof of concept for TAP output
tools/testing/selftests/kselftest.h | 20 +++
tools/testing/selftests/nolibc/nolibc-test.c | 197 ++++++++++--------------
tools/testing/selftests/nolibc/run-all-tests.sh | 22 +++
3 files changed, 127 insertions(+), 112 deletions(-)
---
base-commit: dfef4fc45d5713eb23d87f0863aff9c33bd4bfaf
change-id: 20230718-nolibc-ktap-tmp-4408f505408d
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Hi Michał,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.5-rc2 next-20230721]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Micha-Miros-aw/Re-fs-proc-ta…
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/a0b5c6776b2ed91f78a7575649f8b100e58bd3a9.16898810…
patch subject: Re: fs/proc/task_mmu: Implement IOCTL for efficient page table scanning
config: powerpc-randconfig-r015-20230720 (https://download.01.org/0day-ci/archive/20230721/202307211507.xOl45LiR-lkp@…)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce: (https://download.01.org/0day-ci/archive/20230721/202307211507.xOl45LiR-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202307211507.xOl45LiR-lkp@intel.com/
All errors (new ones prefixed by >>):
>> fs/proc/task_mmu.c:1921:6: error: call to undeclared function 'userfaultfd_wp_async'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
1921 | if (userfaultfd_wp_async(vma) && userfaultfd_wp_use_markers(vma))
| ^
>> fs/proc/task_mmu.c:2200:12: error: call to undeclared function 'uffd_wp_range'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
2200 | int err = uffd_wp_range(vma, addr, end - addr, true);
| ^
2 errors generated.
vim +/userfaultfd_wp_async +1921 fs/proc/task_mmu.c
1913
1914 static int pagemap_scan_test_walk(unsigned long start, unsigned long end,
1915 struct mm_walk *walk)
1916 {
1917 struct pagemap_scan_private *p = walk->private;
1918 struct vm_area_struct *vma = walk->vma;
1919 unsigned long vma_category = 0;
1920
> 1921 if (userfaultfd_wp_async(vma) && userfaultfd_wp_use_markers(vma))
1922 vma_category |= PAGE_IS_WPASYNC;
1923 else if (p->arg.flags & PM_SCAN_CHECK_WPASYNC)
1924 return -EPERM;
1925
1926 if (vma->vm_flags & VM_PFNMAP)
1927 return 1;
1928
1929 if (!pagemap_scan_is_interesting_vma(vma_category, p))
1930 return 1;
1931
1932 p->cur_vma_category = vma_category;
1933 return 0;
1934 }
1935
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Hi Michał,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.5-rc2 next-20230720]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Micha-Miros-aw/Re-fs-proc-ta…
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/a0b5c6776b2ed91f78a7575649f8b100e58bd3a9.16898810…
patch subject: Re: fs/proc/task_mmu: Implement IOCTL for efficient page table scanning
config: i386-randconfig-r022-20230720 (https://download.01.org/0day-ci/archive/20230721/202307211337.5dwCMeHb-lkp@…)
compiler: clang version 15.0.7 (https://github.com/llvm/llvm-project.git 8dfdcc7b7bf66834a761bd8de445840ef68e4d1a)
reproduce: (https://download.01.org/0day-ci/archive/20230721/202307211337.5dwCMeHb-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202307211337.5dwCMeHb-lkp@intel.com/
All errors (new ones prefixed by >>):
>> fs/proc/task_mmu.c:1921:6: error: call to undeclared function 'userfaultfd_wp_async'; ISO C99 and later do not support implicit function declarations [-Werror,-Wimplicit-function-declaration]
if (userfaultfd_wp_async(vma) && userfaultfd_wp_use_markers(vma))
^
>> fs/proc/task_mmu.c:2200:12: error: call to undeclared function 'uffd_wp_range'; ISO C99 and later do not support implicit function declarations [-Werror,-Wimplicit-function-declaration]
int err = uffd_wp_range(vma, addr, end - addr, true);
^
2 errors generated.
vim +/userfaultfd_wp_async +1921 fs/proc/task_mmu.c
1913
1914 static int pagemap_scan_test_walk(unsigned long start, unsigned long end,
1915 struct mm_walk *walk)
1916 {
1917 struct pagemap_scan_private *p = walk->private;
1918 struct vm_area_struct *vma = walk->vma;
1919 unsigned long vma_category = 0;
1920
> 1921 if (userfaultfd_wp_async(vma) && userfaultfd_wp_use_markers(vma))
1922 vma_category |= PAGE_IS_WPASYNC;
1923 else if (p->arg.flags & PM_SCAN_CHECK_WPASYNC)
1924 return -EPERM;
1925
1926 if (vma->vm_flags & VM_PFNMAP)
1927 return 1;
1928
1929 if (!pagemap_scan_is_interesting_vma(vma_category, p))
1930 return 1;
1931
1932 p->cur_vma_category = vma_category;
1933 return 0;
1934 }
1935
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
In order to select a nexthop for multipath routes, fib_select_multipath()
is used with legacy nexthops and nexthop_select_path_hthr() is used with
nexthop objects. Those two functions perform a validity test on the
neighbor related to each nexthop but their logic is structured differently.
This causes a divergence in behavior and nexthop_select_path_hthr() may
return a nexthop that failed the neighbor validity test even if there was
one that passed.
Refactor nexthop_select_path_hthr() to make it more similar to
fib_select_multipath() and fix the problem mentioned above.
v2:
Removed unnecessary "first" variable in "nexthop: Do not return invalid
nexthop object during multipath selection".
v1:
https://lore.kernel.org/netdev/20230529201914.69828-1-bpoirier@nvidia.com/
---
Benjamin Poirier (4):
nexthop: Factor out hash threshold fdb nexthop selection
nexthop: Factor out neighbor validity check
nexthop: Do not return invalid nexthop object during multipath selection
selftests: net: Add test cases for nexthop groups with invalid neighbors
net/ipv4/nexthop.c | 61 +++++++++----
tools/testing/selftests/net/fib_nexthops.sh | 129 ++++++++++++++++++++++++++++
2 files changed, 171 insertions(+), 19 deletions(-)
---
base-commit: 36395b2efe905650cd179d67411ffee3b770268b
change-id: 20230719-nh_select-0303d55a1fb0
Best regards,
--
Benjamin Poirier <bpoirier(a)nvidia.com>
Hi Michał,
kernel test robot noticed the following build errors:
[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.5-rc2 next-20230720]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Micha-Miros-aw/Re-fs-proc-ta…
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/a0b5c6776b2ed91f78a7575649f8b100e58bd3a9.16898810…
patch subject: Re: fs/proc/task_mmu: Implement IOCTL for efficient page table scanning
config: arc-randconfig-r035-20230720 (https://download.01.org/0day-ci/archive/20230721/202307211030.2CJH6TkM-lkp@…)
compiler: arceb-elf-gcc (GCC) 12.3.0
reproduce: (https://download.01.org/0day-ci/archive/20230721/202307211030.2CJH6TkM-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202307211030.2CJH6TkM-lkp@intel.com/
All errors (new ones prefixed by >>):
fs/proc/task_mmu.c: In function 'pagemap_scan_test_walk':
fs/proc/task_mmu.c:1921:13: error: implicit declaration of function 'userfaultfd_wp_async'; did you mean 'userfaultfd_wp'? [-Werror=implicit-function-declaration]
1921 | if (userfaultfd_wp_async(vma) && userfaultfd_wp_use_markers(vma))
| ^~~~~~~~~~~~~~~~~~~~
| userfaultfd_wp
fs/proc/task_mmu.c: In function 'pagemap_scan_pte_hole':
>> fs/proc/task_mmu.c:2200:19: error: implicit declaration of function 'uffd_wp_range' [-Werror=implicit-function-declaration]
2200 | int err = uffd_wp_range(vma, addr, end - addr, true);
| ^~~~~~~~~~~~~
fs/proc/task_mmu.c: In function 'pagemap_scan_init_bounce_buffer':
fs/proc/task_mmu.c:2290:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
2290 | p->vec_out = (void __user *)p->arg.vec;
| ^
fs/proc/task_mmu.c: At top level:
fs/proc/task_mmu.c:1967:13: warning: 'pagemap_scan_backout_range' defined but not used [-Wunused-function]
1967 | static void pagemap_scan_backout_range(struct pagemap_scan_private *p,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
vim +/uffd_wp_range +2200 fs/proc/task_mmu.c
2182
2183 static int pagemap_scan_pte_hole(unsigned long addr, unsigned long end,
2184 int depth, struct mm_walk *walk)
2185 {
2186 struct pagemap_scan_private *p = walk->private;
2187 struct vm_area_struct *vma = walk->vma;
2188 int ret;
2189
2190 if (!vma || !pagemap_scan_is_interesting_page(p->cur_vma_category, p))
2191 return 0;
2192
2193 ret = pagemap_scan_output(p->cur_vma_category, p, addr, &end);
2194 if (addr == end)
2195 return ret;
2196
2197 if (~p->arg.flags & PM_SCAN_WP_MATCHING)
2198 return ret;
2199
> 2200 int err = uffd_wp_range(vma, addr, end - addr, true);
2201 if (err < 0)
2202 ret = err;
2203
2204 return ret;
2205 }
2206
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Hi Michał,
kernel test robot noticed the following build warnings:
[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on linus/master v6.5-rc2 next-20230720]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Micha-Miros-aw/Re-fs-proc-ta…
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/a0b5c6776b2ed91f78a7575649f8b100e58bd3a9.16898810…
patch subject: Re: fs/proc/task_mmu: Implement IOCTL for efficient page table scanning
config: mips-allyesconfig (https://download.01.org/0day-ci/archive/20230721/202307210528.2qgK1vwi-lkp@…)
compiler: mips-linux-gcc (GCC) 12.3.0
reproduce: (https://download.01.org/0day-ci/archive/20230721/202307210528.2qgK1vwi-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202307210528.2qgK1vwi-lkp@intel.com/
All warnings (new ones prefixed by >>):
fs/proc/task_mmu.c: In function 'pagemap_scan_test_walk':
fs/proc/task_mmu.c:1921:13: error: implicit declaration of function 'userfaultfd_wp_async'; did you mean 'userfaultfd_wp'? [-Werror=implicit-function-declaration]
1921 | if (userfaultfd_wp_async(vma) && userfaultfd_wp_use_markers(vma))
| ^~~~~~~~~~~~~~~~~~~~
| userfaultfd_wp
fs/proc/task_mmu.c: In function 'pagemap_scan_init_bounce_buffer':
>> fs/proc/task_mmu.c:2290:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
2290 | p->vec_out = (void __user *)p->arg.vec;
| ^
fs/proc/task_mmu.c: At top level:
fs/proc/task_mmu.c:1967:13: warning: 'pagemap_scan_backout_range' defined but not used [-Wunused-function]
1967 | static void pagemap_scan_backout_range(struct pagemap_scan_private *p,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
vim +2290 fs/proc/task_mmu.c
2264
2265 static int pagemap_scan_init_bounce_buffer(struct pagemap_scan_private *p)
2266 {
2267 if (!p->arg.vec_len) {
2268 /*
2269 * An arbitrary non-page-aligned sentinel value for
2270 * pagemap_scan_push_range().
2271 */
2272 p->cur_buf.start = p->cur_buf.end = ULLONG_MAX;
2273 return 0;
2274 }
2275
2276 /*
2277 * Allocate a smaller buffer to get output from inside the page
2278 * walk functions and walk the range in PAGEMAP_WALK_SIZE chunks.
2279 * The last range is always stored in p.cur_buf to allow coalescing
2280 * consecutive ranges that have the same categories returned across
2281 * walk_page_range() calls.
2282 */
2283 p->vec_buf_len = min_t(size_t, PAGEMAP_WALK_SIZE >> PAGE_SHIFT,
2284 p->arg.vec_len - 1);
2285 p->vec_buf = kmalloc_array(p->vec_buf_len, sizeof(*p->vec_buf),
2286 GFP_KERNEL);
2287 if (!p->vec_buf)
2288 return -ENOMEM;
2289
> 2290 p->vec_out = (void __user *)p->arg.vec;
2291
2292 return 0;
2293 }
2294
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
=== Context ===
In the context of a middlebox, fragmented packets are tricky to handle.
The full 5-tuple of a packet is often only available in the first
fragment which makes enforcing consistent policy difficult. There are
really only two stateless options, neither of which are very nice:
1. Enforce policy on first fragment and accept all subsequent fragments.
This works but may let in certain attacks or allow data exfiltration.
2. Enforce policy on first fragment and drop all subsequent fragments.
This does not really work b/c some protocols may rely on
fragmentation. For example, DNS may rely on oversized UDP packets for
large responses.
So stateful tracking is the only sane option. RFC 8900 [0] calls this
out as well in section 6.3:
Middleboxes [...] should process IP fragments in a manner that is
consistent with [RFC0791] and [RFC8200]. In many cases, middleboxes
must maintain state in order to achieve this goal.
=== BPF related bits ===
Policy has traditionally been enforced from XDP/TC hooks. Both hooks
run before kernel reassembly facilities. However, with the new
BPF_PROG_TYPE_NETFILTER, we can rather easily hook into existing
netfilter reassembly infra.
The basic idea is we bump a refcnt on the netfilter defrag module and
then run the bpf prog after the defrag module runs. This allows bpf
progs to transparently see full, reassembled packets. The nice thing
about this is that progs don't have to carry around logic to detect
fragments.
=== Changelog ===
Changes from v4:
* Refactor module handling code to not sleep in rcu_read_lock()
* Also unify the v4 and v6 hook structs so they can share codepaths
* Fixed some checkpatch.pl formatting warnings
Changes from v3:
* Correctly initialize `addrlen` stack var for recvmsg()
Changes from v2:
* module_put() if ->enable() fails
* Fix CI build errors
Changes from v1:
* Drop bpf_program__attach_netfilter() patches
* static -> static const where appropriate
* Fix callback assignment order during registration
* Only request_module() if callbacks are missing
* Fix retval when modprobe fails in userspace
* Fix v6 defrag module name (nf_defrag_ipv6_hooks -> nf_defrag_ipv6)
* Simplify priority checking code
* Add warning if module doesn't assign callbacks in the future
* Take refcnt on module while defrag link is active
[0]: https://datatracker.ietf.org/doc/html/rfc8900
Daniel Xu (5):
netfilter: defrag: Add glue hooks for enabling/disabling defrag
netfilter: bpf: Support BPF_F_NETFILTER_IP_DEFRAG in netfilter link
bpf: selftests: Support not connecting client socket
bpf: selftests: Support custom type and proto for client sockets
bpf: selftests: Add defrag selftests
include/linux/netfilter.h | 10 +
include/uapi/linux/bpf.h | 5 +
net/ipv4/netfilter/nf_defrag_ipv4.c | 17 +-
net/ipv6/netfilter/nf_defrag_ipv6_hooks.c | 11 +
net/netfilter/core.c | 6 +
net/netfilter/nf_bpf_link.c | 116 ++++++-
tools/include/uapi/linux/bpf.h | 5 +
tools/testing/selftests/bpf/Makefile | 4 +-
.../selftests/bpf/generate_udp_fragments.py | 90 ++++++
.../selftests/bpf/ip_check_defrag_frags.h | 57 ++++
tools/testing/selftests/bpf/network_helpers.c | 26 +-
tools/testing/selftests/bpf/network_helpers.h | 3 +
.../bpf/prog_tests/ip_check_defrag.c | 283 ++++++++++++++++++
.../selftests/bpf/progs/ip_check_defrag.c | 104 +++++++
14 files changed, 715 insertions(+), 22 deletions(-)
create mode 100755 tools/testing/selftests/bpf/generate_udp_fragments.py
create mode 100644 tools/testing/selftests/bpf/ip_check_defrag_frags.h
create mode 100644 tools/testing/selftests/bpf/prog_tests/ip_check_defrag.c
create mode 100644 tools/testing/selftests/bpf/progs/ip_check_defrag.c
--
2.41.0
On Thu, Jul 20, 2023 at 09:28:52PM +0200, Michał Mirosław wrote:
> This is a massaged version of patch by Muhammad Usama Anjum [1]
> to illustrate my review comments and hopefully push the implementation
> efforts closer to conclusion. The changes are:
[...]
> +static void pagemap_scan_backout_range(struct pagemap_scan_private *p,
> + unsigned long addr, unsigned long end)
> +{
> + struct page_region *cur_buf = &p->cur_buf;
> +
> + if (cur_buf->start != addr) {
> + cur_buf->end = addr;
> + } else {
> + cur_buf->start = cur_buf->end = 0;
> + }
> +
> + p->end_addr = 0;
Just noticed that this is missing:
p->found_pages -= (end - addr) / PAGE_SIZE;
> +}
[...]
Best Regards
Michał Mirosław
v1: https://lore.kernel.org/rust-for-linux/20230614180837.630180-1-ojeda@kernel…
v2:
- Rebased on top of v6.5-rc1, which requires a change from
`kunit_do_failed_assertion` to `__kunit_do_failed_assertion` (since
the former became a macro) and the addition of a call to
`__kunit_abort` (since previously the call was done by the old
function which we cannot use anymore since it is a macro). (David)
- Added prerequisite patch to KUnit header to include `stddef.h` to
support the `KUNIT=y` case. (Reported by Boqun)
- Added comment on the purpose of `trait FromErrno`. (Martin asked
about it)
- Simplify code to use `std::fs::write` instead of `write!`, which
improves code size too. (Suggested by Alice)
- Fix copy-paste type in docs from "error" to "info" and, to make it
proper English, copy the `printk` docs style, i.e. from "info"
to "info-level message" -- and same for the "error" one. (Miguel)
- Swap `FILE` and `LINE` `static`s to keep the same order as done
elsewhere. (Miguel)
- Rename config option from `RUST_KERNEL_KUNIT_TEST` to
`RUST_KERNEL_DOCTESTS` (and update its title), so that we can use
the former for the "normal" (i.e. non-doctests, e.g. `#[test]` ones)
tests in the future. (David)
- Follow the syntax proposed for declaring test metadata in the KTAP
v2 spec, which may also get used for the KUnit test attributes API.
Thus, instead of "# Doctest from line {line}", use
"# {test_name}.location: {file}.{line}", which ideally will allow to
migrate to a KUnit attribute later.
This is done in all cases, i.e. when the tests succeeds, because
it may be useful for users running KUnit manually, or when passing
`--raw_output` to `kunit.py`. (David)
David: I used "location" instead of your suggested "line" alone, in
order to have both in a single line, which looked nice and closer to
the "ASSERTION FAILURE" case/line, since now we do have the original
file (please see below).
- Figure out the original line. This is done by deploying an anchor
so that the difference in lines between the beginning of the test
and the assert, in the generated file, can be computed. Then, we
offset the line number of the original test, which is given by
`rustdoc`. (developed by Boqun)
- Figure out the original file. This is done by walking the
filesystem, checking directory prefixes to reduce the amount of
combinations to check, and it is only done once per file (rather
than per test).
Ambiguities are detected and reported. It does limit the filenames
(module names) we can use, but it is unlikely we will hit it soon
and this should be temporary anyway until `rustdoc` provides us
with the real path. (Miguel)
Tested with both in-tree and `O=` builds, but I would appreciate
extra testing on this one, including via the `kunit.py` script.
- The three last items combined means that we now see this output even
for successful cases:
# rust_doctest_kernel_sync_locked_by_rs_0.location: rust/kernel/sync/locked_by.rs:28
ok 53 rust_doctest_kernel_sync_locked_by_rs_0
Which basically gives the user all the information they need to go
back to the source code of the doctest, while keeping them fairly
stable for bisection, and while avoiding to require users to write
test names manually. (David + Boqun + Miguel)
David: from what I saw in v2 of the RFC for the test attributes API,
the syntax still contains the test name when it is not a suite, so
I followed that, but if you prefer to omit it, please feel free to
do so (for me either way it is fine, and if this is the expected
attribute syntax, I guess it is worth to follow it to make migration
easier later on):
# location: rust/kernel/sync/locked_by.rs:28
ok 53 rust_doctest_kernel_sync_locked_by_rs_0
- Collected `Reviewed-by`s and `Tested-by`s, except for the main
commit due to the substantial changes.
Miguel Ojeda (7):
kunit: test-bug.h: include `stddef.h` for `NULL`
rust: init: make doctests compilable/testable
rust: str: make doctests compilable/testable
rust: sync: make doctests compilable/testable
rust: types: make doctests compilable/testable
rust: support running Rust documentation tests as KUnit ones
MAINTAINERS: add Rust KUnit files to the KUnit entry
MAINTAINERS | 2 +
include/kunit/test-bug.h | 2 +
lib/Kconfig.debug | 13 ++
rust/.gitignore | 2 +
rust/Makefile | 29 ++++
rust/bindings/bindings_helper.h | 1 +
rust/helpers.c | 7 +
rust/kernel/init.rs | 26 +--
rust/kernel/kunit.rs | 163 +++++++++++++++++++
rust/kernel/lib.rs | 2 +
rust/kernel/str.rs | 4 +-
rust/kernel/sync/arc.rs | 9 +-
rust/kernel/sync/lock/mutex.rs | 1 +
rust/kernel/sync/lock/spinlock.rs | 1 +
rust/kernel/types.rs | 6 +-
scripts/.gitignore | 2 +
scripts/Makefile | 4 +
scripts/rustdoc_test_builder.rs | 72 +++++++++
scripts/rustdoc_test_gen.rs | 260 ++++++++++++++++++++++++++++++
19 files changed, 591 insertions(+), 15 deletions(-)
create mode 100644 rust/kernel/kunit.rs
create mode 100644 scripts/rustdoc_test_builder.rs
create mode 100644 scripts/rustdoc_test_gen.rs
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
--
2.41.0
This series fixes an issue which David Spickett found where if we change
the SVE VL while SME is in use we can end up attempting to save state to
an unallocated buffer and adds testing coverage for that plus a bit more
coverage of VL changes, just for paranioa.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (3):
arm64/fpsimd: Ensure SME storage is allocated after SVE VL changes
kselftest/arm64: Add a test case for SVE VL changes with SME active
kselftest/arm64: Validate that changing one VL type does not affect another
arch/arm64/kernel/fpsimd.c | 32 +++++--
tools/testing/selftests/arm64/fp/vec-syscfg.c | 127 +++++++++++++++++++++++++-
2 files changed, 148 insertions(+), 11 deletions(-)
---
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
change-id: 20230713-arm64-fix-sve-sme-vl-change-60eb1fa6a707
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Hi,
This follows the discussion here:
https://lore.kernel.org/linux-kselftest/20230324123157.bbwvfq4gsxnlnfwb@hou…
This shows a couple of inconsistencies with regard to how device-managed
resources are cleaned up. Basically, devm resources will only be cleaned up
if the device is attached to a bus and bound to a driver. Failing any of
these cases, a call to device_unregister will not end up in the devm
resources being released.
We had to work around it in DRM to provide helpers to create a device for
kunit tests, but the current discussion around creating similar, generic,
helpers for kunit resumed interest in fixing this.
This can be tested using the command:
./tools/testing/kunit/kunit.py run --kunitconfig=drivers/base/test/
I added the fix David suggested back in that discussion which does fix
the tests. The SoB is missing, since David didn't provide it back then.
Let me know what you think,
Maxime
Signed-off-by: Maxime Ripard <maxime(a)cerno.tech>
---
Changes in v2:
- Use an init function
- Document the tests
- Add a fix for the bugs
- Link to v1: https://lore.kernel.org/r/20230329-kunit-devm-inconsistencies-test-v1-0-c33…
---
David Gow (1):
drivers: base: Free devm resources when unregistering a device
Maxime Ripard (2):
drivers: base: Add basic devm tests for root devices
drivers: base: Add basic devm tests for platform devices
drivers/base/core.c | 11 ++
drivers/base/test/.kunitconfig | 2 +
drivers/base/test/Kconfig | 4 +
drivers/base/test/Makefile | 3 +
drivers/base/test/platform-device-test.c | 220 +++++++++++++++++++++++++++++++
drivers/base/test/root-device-test.c | 108 +++++++++++++++
6 files changed, 348 insertions(+)
---
base-commit: 53cdf865f90ba922a854c65ed05b519f9d728424
change-id: 20230329-kunit-devm-inconsistencies-test-5e5a7d01e60d
Best regards,
--
Maxime Ripard <mripard(a)kernel.org>
Hi,
This patch series aims to improve the PMU event filter settings with a cleaner
and more organized structure and adds several test cases related to PMU event
filters.
These changes help to ensure that KVM's PMU event filter functions as expected
in all supported use cases.
Any feedback or suggestions are greatly appreciated.
Sincerely,
Jinrong Liang
Changes log:
v5:
- Add more x86 properties for Intel PMU;
- Designated initializer instead of overwrite all members; (Isaku Yamahata)
- PMU event filter invalid flag modified to "KVM_PMU_EVENT_FLAGS_VALID_MASK << 1"; (Isaku Yamahata)
- sizeof(bitmap) is modified to "sizeof(bitmap) * 8" to represent the number of
bits that can be represented by the bitmap variable. (Isaku Yamahata)
Previous:
https://lore.kernel.org/kvm/20230717062343.3743-1-cloudliang@tencent.com/T/
Jinrong Liang (6):
KVM: selftests: Add x86 properties for Intel PMU in processor.h
KVM: selftests: Drop the return of remove_event()
KVM: selftests: Introduce __kvm_pmu_event_filter to improved event
filter settings
KVM: selftests: Add test cases for unsupported PMU event filter input
values
KVM: selftests: Test if event filter meets expectations on fixed
counters
KVM: selftests: Test gp event filters don't affect fixed event filters
.../selftests/kvm/include/x86_64/processor.h | 5 +
.../kvm/x86_64/pmu_event_filter_test.c | 317 ++++++++++++------
2 files changed, 228 insertions(+), 94 deletions(-)
base-commit: 88bb466c9dec4f70d682cf38c685324e7b1b3d60
--
2.39.3
When we collect a signal context with one of the SME modes enabled we will
have enabled that mode behind the compiler and libc's back so they may
issue some instructions not valid in streaming mode, causing spurious
failures.
For the code prior to issuing the BRK to trigger signal handling we need to
stay in streaming mode if we were already there since that's a part of the
signal context the caller is trying to collect. Unfortunately this code
includes a memset() which is likely to be heavily optimised and is likely
to use FP instructions incompatible with streaming mode. We can avoid this
happening by open coding the memset(), inserting a volatile assembly
statement to avoid the compiler recognising what's being done and doing
something in optimisation. This code is not performance critical so the
inefficiency should not be an issue.
After collecting the context we can simply exit streaming mode, avoiding
these issues. Use a full SMSTOP for safety to prevent any issues appearing
with ZA.
Reported-by: Will Deacon <will(a)kernel.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Rebase onto v6.5-rc1.
- Link to v1: https://lore.kernel.org/r/20230628-arm64-signal-memcpy-fix-v1-1-db3e0300829…
---
.../selftests/arm64/signal/test_signals_utils.h | 28 +++++++++++++++++++++-
1 file changed, 27 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/signal/test_signals_utils.h b/tools/testing/selftests/arm64/signal/test_signals_utils.h
index 222093f51b67..db28409fd44b 100644
--- a/tools/testing/selftests/arm64/signal/test_signals_utils.h
+++ b/tools/testing/selftests/arm64/signal/test_signals_utils.h
@@ -60,13 +60,28 @@ static __always_inline bool get_current_context(struct tdescr *td,
size_t dest_sz)
{
static volatile bool seen_already;
+ int i;
+ char *uc = (char *)dest_uc;
assert(td && dest_uc);
/* it's a genuine invocation..reinit */
seen_already = 0;
td->live_uc_valid = 0;
td->live_sz = dest_sz;
- memset(dest_uc, 0x00, td->live_sz);
+
+ /*
+ * This is a memset() but we don't want the compiler to
+ * optimise it into either instructions or a library call
+ * which might be incompatible with streaming mode.
+ */
+ for (i = 0; i < td->live_sz; i++) {
+ asm volatile("nop"
+ : "+m" (*dest_uc)
+ :
+ : "memory");
+ uc[i] = 0;
+ }
+
td->live_uc = dest_uc;
/*
* Grab ucontext_t triggering a SIGTRAP.
@@ -103,6 +118,17 @@ static __always_inline bool get_current_context(struct tdescr *td,
:
: "memory");
+ /*
+ * If we were grabbing a streaming mode context then we may
+ * have entered streaming mode behind the system's back and
+ * libc or compiler generated code might decide to do
+ * something invalid in streaming mode, or potentially even
+ * the state of ZA. Issue a SMSTOP to exit both now we have
+ * grabbed the state.
+ */
+ if (td->feats_supported & FEAT_SME)
+ asm volatile("msr S0_3_C4_C6_3, xzr");
+
/*
* If we get here with seen_already==1 it implies the td->live_uc
* context has been used to get back here....this probably means
---
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
change-id: 20230628-arm64-signal-memcpy-fix-7de3b3c8fa10
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Hi All,
This is v2 of my series to clean up mm selftests so that they run correctly on
arm64. See [1] for full explanation.
Changes Since v1 [1]
--------------------
- Patch 1: Explicitly set line buffer mode in ksft_print_header()
- Dropped v1 patch 2 (set execute permissions): Andrew has taken this into his
branch separately.
- Patch 2: Don't compile `soft-dirty` suite for arm64 instead of skipping it
at runtime.
- Patch 2: Declare fewer tests and skip all of test_softdirty() if soft-dirty
is not supported, rather than conditionally marking each check as skipped.
- Added Reviewed-by tags: thanks DavidH!
- Patch 8: Clarified commit message.
[1] https://lore.kernel.org/linux-mm/20230713135440.3651409-1-ryan.roberts@arm.…
Thanks,
Ryan
Ryan Roberts (8):
selftests: Line buffer test program's stdout
selftests/mm: Skip soft-dirty tests on arm64
selftests/mm: Enable mrelease_test for arm64
selftests/mm: Fix thuge-gen test bugs
selftests/mm: va_high_addr_switch should skip unsupported arm64
configs
selftests/mm: Make migration test robust to failure
selftests/mm: Optionally pass duration to transhuge-stress
selftests/mm: Run all tests from run_vmtests.sh
tools/testing/selftests/kselftest.h | 9 ++
tools/testing/selftests/kselftest/runner.sh | 7 +-
tools/testing/selftests/mm/Makefile | 82 ++++++++++---------
tools/testing/selftests/mm/madv_populate.c | 26 +++++-
tools/testing/selftests/mm/migration.c | 14 +++-
tools/testing/selftests/mm/mrelease_test.c | 1 +
tools/testing/selftests/mm/run_vmtests.sh | 28 ++++++-
tools/testing/selftests/mm/settings | 2 +-
tools/testing/selftests/mm/thuge-gen.c | 4 +-
tools/testing/selftests/mm/transhuge-stress.c | 12 ++-
.../selftests/mm/va_high_addr_switch.c | 2 +-
11 files changed, 133 insertions(+), 54 deletions(-)
--
2.25.1
Hi,
Consequential to the previous problem report, this one addresses almost the very
next test script.
The testing environment is the same: 6.5-rc2 vanilla Torvalds tree on Ubuntu 22.04 LTS.
The used config is the same, please find it with the bridge_mdb.sh normal and "set -x"
output on this link (too large to attach):
https://domac.alu.unizg.hr/~mtodorov/linux/selftests/net-forwarding/bridge_…
root@defiant:# ./bridge_mdb.sh
INFO: # Host entries configuration tests
TEST: Common host entries configuration tests (IPv4) [FAIL]
Managed to add IPv4 host entry with a filter mode
TEST: Common host entries configuration tests (IPv6) [FAIL]
Managed to add IPv6 host entry with a filter mode
TEST: Common host entries configuration tests (L2) [FAIL]
Managed to add L2 host entry with a filter mode
INFO: # Port group entries configuration tests - (*, G)
Command "replace" is unknown, try "bridge mdb help".
TEST: Common port group entries configuration tests (IPv4 (*, G)) [FAIL]
Failed to replace IPv4 (*, G) entry
Command "replace" is unknown, try "bridge mdb help".
TEST: Common port group entries configuration tests (IPv6 (*, G)) [FAIL]
Failed to replace IPv6 (*, G) entry
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
RTNETLINK answers: Invalid argument
Error: bridge: (*, G) group is already joined by port.
Error: bridge: (*, G) group is already joined by port.
TEST: IPv4 (*, G) port group entries configuration tests [FAIL]
(S, G) entry not created
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
RTNETLINK answers: Invalid argument
Error: bridge: (*, G) group is already joined by port.
Error: bridge: (*, G) group is already joined by port.
TEST: IPv6 (*, G) port group entries configuration tests [FAIL]
(S, G) entry not created
INFO: # Port group entries configuration tests - (S, G)
Command "replace" is unknown, try "bridge mdb help".
TEST: Common port group entries configuration tests (IPv4 (S, G)) [FAIL]
Failed to replace IPv4 (S, G) entry
Command "replace" is unknown, try "bridge mdb help".
TEST: Common port group entries configuration tests (IPv6 (S, G)) [FAIL]
Failed to replace IPv6 (S, G) entry
Error: bridge: (S, G) group is already joined by port.
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
TEST: IPv4 (S, G) port group entries configuration tests [FAIL]
Managed to add an entry with a filter mode
Error: bridge: (S, G) group is already joined by port.
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
Command "replace" is unknown, try "bridge mdb help".
TEST: IPv6 (S, G) port group entries configuration tests [FAIL]
"temp" entry has an unpending group timer
INFO: # Port group entries configuration tests - L2
Command "replace" is unknown, try "bridge mdb help".
TEST: Common port group entries configuration tests (L2 (*, G)) [FAIL]
Failed to replace L2 (*, G) entry
TEST: L2 (*, G) port group entries configuration tests [FAIL]
Managed to add an entry with a filter mode
INFO: # Large scale dump tests
TEST: IPv4 large scale dump tests [ OK ]
TEST: IPv6 large scale dump tests [ OK ]
TEST: L2 large scale dump tests [ OK ]
INFO: # Forwarding tests
Error: bridge: Group is already joined by host.
TEST: IPv4 host entries forwarding tests [FAIL]
Packet not locally received after adding a host entry
Error: bridge: Group is already joined by host.
TEST: IPv6 host entries forwarding tests [FAIL]
Packet locally received after flood
TEST: L2 host entries forwarding tests [FAIL]
Packet not locally received after flood
Command "replace" is unknown, try "bridge mdb help".
TEST: IPv4 port group "exclude" entries forwarding tests [FAIL]
Packet from valid source not received on H2 after adding entry
Command "replace" is unknown, try "bridge mdb help".
TEST: IPv6 port group "exclude" entries forwarding tests [FAIL]
Packet from invalid source received on H2 after adding entry
Command "replace" is unknown, try "bridge mdb help".
TEST: IPv4 port group "include" entries forwarding tests [FAIL]
Packet from valid source not received on H2 after adding entry
Command "replace" is unknown, try "bridge mdb help".
TEST: IPv6 port group "include" entries forwarding tests [FAIL]
Packet from invalid source received on H2 after adding entry
TEST: L2 port entries forwarding tests [ OK ]
INFO: # Control packets tests
Command "replace" is unknown, try "bridge mdb help".
TEST: IGMPv3 MODE_IS_INCLUDE tests [FAIL]
Source not add to source list
Command "replace" is unknown, try "bridge mdb help".
TEST: MLDv2 MODE_IS_INCLUDE tests [FAIL]
Source not add to source list
root@defiant:# bridge mdb show
root@defiant:#
NOTE that several "sleep 10" command looped in the script can easily exceed
the default timeout of 45 seconds, and SIGTERM to the script isn't processed,
so it leaves the system in an unpredictable state from which even
"systemctl restart networking" didn't bail out.
Setting tools/testing/selftests/net/forwarding/settings:timeout=150 seemed enough.
Best regards,
Mirsad Todorovac
This series is a follow up to the recent change [1] which added
per-cpu insert/delete statistics for maps. The bpf_map_sum_elem_count
kfunc presented in the original series was only available to tracing
programs, so let's make it available to all.
The first patch makes types listed in the reg2btf_ids[] array to be
considered trusted by kfuncs.
The second patch allows to treat CONST_PTR_TO_MAP as trusted pointers from
kfunc's point of view by adding it to the reg2btf_ids[] array.
The third patch adds missing const to the map argument of the
bpf_map_sum_elem_count kfunc.
The fourth patch registers the bpf_map_sum_elem_count for all programs,
and patches selftests correspondingly.
[1] https://lore.kernel.org/bpf/20230705160139.19967-1-aspsk@isovalent.com/
v1 -> v2:
* treat the whole reg2btf_ids array as trusted (Alexei)
Anton Protopopov (4):
bpf: consider types listed in reg2btf_ids as trusted
bpf: consider CONST_PTR_TO_MAP as trusted pointer to struct bpf_map
bpf: make an argument const in the bpf_map_sum_elem_count kfunc
bpf: allow any program to use the bpf_map_sum_elem_count kfunc
include/linux/btf_ids.h | 1 +
kernel/bpf/map_iter.c | 7 +++---
kernel/bpf/verifier.c | 22 +++++++++++--------
.../selftests/bpf/progs/map_ptr_kern.c | 5 +++++
4 files changed, 22 insertions(+), 13 deletions(-)
--
2.34.1
bpf_dynptr_slice(_rw) uses a user provided buffer if it can not provide
a pointer to a block of contiguous memory. This buffer is unused in the
case of local dynptrs, and may be unused in other cases as well. There
is no need to require the buffer, as the kfunc can just return NULL if
it was needed and not provided.
This adds another kfunc annotation, __opt, which combines with __sz and
__szk to allow the buffer associated with the size to be NULL. If the
buffer is NULL, the verifier does not check that the buffer is of
sufficient size.
Signed-off-by: Daniel Rosenberg <drosen(a)google.com>
---
Documentation/bpf/kfuncs.rst | 23 ++++++++++++++++++++++-
include/linux/skbuff.h | 2 +-
kernel/bpf/helpers.c | 30 ++++++++++++++++++------------
kernel/bpf/verifier.c | 17 +++++++++++++----
4 files changed, 54 insertions(+), 18 deletions(-)
diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index ea2516374d92..7a3d9de5f315 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -100,7 +100,7 @@ Hence, whenever a constant scalar argument is accepted by a kfunc which is not a
size parameter, and the value of the constant matters for program safety, __k
suffix should be used.
-2.2.2 __uninit Annotation
+2.2.3 __uninit Annotation
-------------------------
This annotation is used to indicate that the argument will be treated as
@@ -117,6 +117,27 @@ Here, the dynptr will be treated as an uninitialized dynptr. Without this
annotation, the verifier will reject the program if the dynptr passed in is
not initialized.
+2.2.4 __opt Annotation
+-------------------------
+
+This annotation is used to indicate that the buffer associated with an __sz or __szk
+argument may be null. If the function is passed a nullptr in place of the buffer,
+the verifier will not check that length is appropriate for the buffer. The kfunc is
+responsible for checking if this buffer is null before using it.
+
+An example is given below::
+
+ __bpf_kfunc void *bpf_dynptr_slice(..., void *buffer__opt, u32 buffer__szk)
+ {
+ ...
+ }
+
+Here, the buffer may be null. If buffer is not null, it at least of size buffer_szk.
+Either way, the returned buffer is either NULL, or of size buffer_szk. Without this
+annotation, the verifier will reject the program if a null pointer is passed in with
+a nonzero size.
+
+
.. _BPF_kfunc_nodef:
2.3 Using an existing kernel function
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 738776ab8838..8ddb4af1a501 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -4033,7 +4033,7 @@ __skb_header_pointer(const struct sk_buff *skb, int offset, int len,
if (likely(hlen - offset >= len))
return (void *)data + offset;
- if (!skb || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
+ if (!skb || !buffer || unlikely(skb_copy_bits(skb, offset, buffer, len) < 0))
return NULL;
return buffer;
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 8d368fa353f9..26efb6fbeab2 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2167,13 +2167,15 @@ __bpf_kfunc struct task_struct *bpf_task_from_pid(s32 pid)
* bpf_dynptr_slice() - Obtain a read-only pointer to the dynptr data.
* @ptr: The dynptr whose data slice to retrieve
* @offset: Offset into the dynptr
- * @buffer: User-provided buffer to copy contents into
- * @buffer__szk: Size (in bytes) of the buffer. This is the length of the
- * requested slice. This must be a constant.
+ * @buffer__opt: User-provided buffer to copy contents into. May be NULL
+ * @buffer__szk: Size (in bytes) of the buffer if present. This is the
+ * length of the requested slice. This must be a constant.
*
* For non-skb and non-xdp type dynptrs, there is no difference between
* bpf_dynptr_slice and bpf_dynptr_data.
*
+ * If buffer__opt is NULL, the call will fail if buffer_opt was needed.
+ *
* If the intention is to write to the data slice, please use
* bpf_dynptr_slice_rdwr.
*
@@ -2190,7 +2192,7 @@ __bpf_kfunc struct task_struct *bpf_task_from_pid(s32 pid)
* direct pointer)
*/
__bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset,
- void *buffer, u32 buffer__szk)
+ void *buffer__opt, u32 buffer__szk)
{
enum bpf_dynptr_type type;
u32 len = buffer__szk;
@@ -2210,15 +2212,17 @@ __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset
case BPF_DYNPTR_TYPE_RINGBUF:
return ptr->data + ptr->offset + offset;
case BPF_DYNPTR_TYPE_SKB:
- return skb_header_pointer(ptr->data, ptr->offset + offset, len, buffer);
+ return skb_header_pointer(ptr->data, ptr->offset + offset, len, buffer__opt);
case BPF_DYNPTR_TYPE_XDP:
{
void *xdp_ptr = bpf_xdp_pointer(ptr->data, ptr->offset + offset, len);
if (xdp_ptr)
return xdp_ptr;
- bpf_xdp_copy_buf(ptr->data, ptr->offset + offset, buffer, len, false);
- return buffer;
+ if (!buffer__opt)
+ return NULL;
+ bpf_xdp_copy_buf(ptr->data, ptr->offset + offset, buffer__opt, len, false);
+ return buffer__opt;
}
default:
WARN_ONCE(true, "unknown dynptr type %d\n", type);
@@ -2230,13 +2234,15 @@ __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset
* bpf_dynptr_slice_rdwr() - Obtain a writable pointer to the dynptr data.
* @ptr: The dynptr whose data slice to retrieve
* @offset: Offset into the dynptr
- * @buffer: User-provided buffer to copy contents into
- * @buffer__szk: Size (in bytes) of the buffer. This is the length of the
- * requested slice. This must be a constant.
+ * @buffer__opt: User-provided buffer to copy contents into. May be NULL
+ * @buffer__szk: Size (in bytes) of the buffer if present. This is the
+ * length of the requested slice. This must be a constant.
*
* For non-skb and non-xdp type dynptrs, there is no difference between
* bpf_dynptr_slice and bpf_dynptr_data.
*
+ * If buffer__opt is NULL, the call will fail if buffer_opt was needed.
+ *
* The returned pointer is writable and may point to either directly the dynptr
* data at the requested offset or to the buffer if unable to obtain a direct
* data pointer to (example: the requested slice is to the paged area of an skb
@@ -2267,7 +2273,7 @@ __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr_kern *ptr, u32 offset
* direct pointer)
*/
__bpf_kfunc void *bpf_dynptr_slice_rdwr(const struct bpf_dynptr_kern *ptr, u32 offset,
- void *buffer, u32 buffer__szk)
+ void *buffer__opt, u32 buffer__szk)
{
if (!ptr->data || bpf_dynptr_is_rdonly(ptr))
return NULL;
@@ -2294,7 +2300,7 @@ __bpf_kfunc void *bpf_dynptr_slice_rdwr(const struct bpf_dynptr_kern *ptr, u32 o
* will be copied out into the buffer and the user will need to call
* bpf_dynptr_write() to commit changes.
*/
- return bpf_dynptr_slice(ptr, offset, buffer, buffer__szk);
+ return bpf_dynptr_slice(ptr, offset, buffer__opt, buffer__szk);
}
__bpf_kfunc void *bpf_cast_to_kern_ctx(void *obj)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index fbcf5a4e2fcd..708ae7bca1fe 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -9398,6 +9398,11 @@ static bool is_kfunc_arg_const_mem_size(const struct btf *btf,
return __kfunc_param_match_suffix(btf, arg, "__szk");
}
+static bool is_kfunc_arg_optional(const struct btf *btf, const struct btf_param *arg)
+{
+ return __kfunc_param_match_suffix(btf, arg, "__opt");
+}
+
static bool is_kfunc_arg_constant(const struct btf *btf, const struct btf_param *arg)
{
return __kfunc_param_match_suffix(btf, arg, "__k");
@@ -10464,13 +10469,17 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
break;
case KF_ARG_PTR_TO_MEM_SIZE:
{
+ struct bpf_reg_state *buff_reg = ®s[regno];
+ const struct btf_param *buff_arg = &args[i];
struct bpf_reg_state *size_reg = ®s[regno + 1];
const struct btf_param *size_arg = &args[i + 1];
- ret = check_kfunc_mem_size_reg(env, size_reg, regno + 1);
- if (ret < 0) {
- verbose(env, "arg#%d arg#%d memory, len pair leads to invalid memory access\n", i, i + 1);
- return ret;
+ if (!register_is_null(buff_reg) || !is_kfunc_arg_optional(meta->btf, buff_arg)) {
+ ret = check_kfunc_mem_size_reg(env, size_reg, regno + 1);
+ if (ret < 0) {
+ verbose(env, "arg#%d arg#%d memory, len pair leads to invalid memory access\n", i, i + 1);
+ return ret;
+ }
}
if (is_kfunc_arg_const_mem_size(meta->btf, size_arg, size_reg)) {
base-commit: 6e98b09da931a00bf4e0477d0fa52748bf28fcce
--
2.40.1.495.gc816e09b53d-goog