From: angquan yu <angquan21(a)gmail.com>
In tools/testing/selftests/proc/proc-empty->because the return value
of a write call was being ignored. This call was part of a conditional
debugging block (if (0) { ... }), which meant it would never actually
execute.
This patch removes the unused debug write call. This cleanup resolves
the compiler warning about ignoring the result of write declared with
the warn_unused_result attribute.
Removing this code also improves the clarity and maintainability of
the function, as it eliminates a non-functional block of code.
This is the original warning:
proc-empty-vm.c: In function ‘test_proc_pid_statm’:
proc-empty-vm.c:385:17: warning: ignoring return value of ‘write’ declared with>
385 | write(1, buf, rv);
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/proc/proc-empty-vm.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/proc/proc-empty-vm.c b/tools/testing/selftests/proc/proc-empty-vm.c
index 5e7020630..d231e61e4 100644
--- a/tools/testing/selftests/proc/proc-empty-vm.c
+++ b/tools/testing/selftests/proc/proc-empty-vm.c
@@ -383,8 +383,10 @@ static int test_proc_pid_statm(pid_t pid)
assert(rv <= sizeof(buf));
if (0) {
ssize_t written = write(1, buf, rv);
+
if (written == -1) {
perror("write failed to /proc/${pid}");
+ return EXIT_FAILURE;
}
}
--
2.39.2
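For readers unfamiliar with the warn_unused_result warning discussed above, here is a minimal sketch of one way to consume the return value of write(). It is not part of the patch: the helper name write_all() is hypothetical, and the retry-on-EINTR policy is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not in proc-empty-vm.c): write the whole buffer,
 * retrying on short writes and on EINTR. Returns 0 on success, or -1 on
 * error with errno left as set by write().
 */
static int write_all(int fd, const char *buf, size_t len)
{
	assert(buf != NULL || len == 0);

	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}
```

A caller would check the result, for example "if (write_all(1, buf, rv) == -1) perror("write");", which is enough to satisfy the attribute because the return value is used.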
Hi all:
Core frequency is subject to process variation in semiconductors.
Not all cores are able to reach the maximum frequency while respecting the
infrastructure limits. Consequently, AMD has redefined the concept of
the maximum frequency of a part: only a fraction of cores can reach
the maximum frequency. To find the best process scheduling policy for a given
scenario, the OS needs to know the core ordering reported by the platform through
the highest performance capability register of the CPPC interface.
Earlier implementations of amd-pstate preferred core only supported a static
core ranking and targeted performance. Now the driver can dynamically
change the preferred core based on workload and platform conditions,
accounting for thermals and aging.
The amd-pstate driver uses the functions and data structures provided by
the ITMT architecture to enable the scheduler to favor scheduling on cores
which can reach a higher frequency with lower voltage.
We call this amd-pstate preferred core.
Here sched_set_itmt_core_prio() is called to set priorities and
sched_set_itmt_support() is called to enable the ITMT feature.
The amd-pstate driver uses the highest performance value to indicate
the priority of a CPU. A higher value means a higher priority.
The amd-pstate driver provides an initial core ordering at boot time.
It relies on the CPPC interface to communicate the core ranking to the
operating system and scheduler, so that the OS chooses the cores
with the highest performance first when scheduling a process. When the amd-pstate
driver receives a message that the highest performance has changed, it
updates the core ranking.
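To illustrate the "higher value has a higher priority" rule, here is a small userspace sketch that orders cores by their highest performance value. It is only an illustration: the struct core_rank type and the rank_cores() helper are hypothetical, and the driver itself hands priorities to the scheduler through sched_set_itmt_core_prio() rather than sorting an array like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-CPU record, for illustration only. */
struct core_rank {
	int cpu;
	int highest_perf;	/* value reported via the CPPC highest performance register */
};

/* Larger highest_perf sorts first; ties fall back to the lower CPU number. */
static int cmp_rank(const void *a, const void *b)
{
	const struct core_rank *x = a, *y = b;

	if (x->highest_perf != y->highest_perf)
		return (y->highest_perf > x->highest_perf) -
		       (y->highest_perf < x->highest_perf);
	return (x->cpu > y->cpu) - (x->cpu < y->cpu);
}

/* Reorder the array so that the preferred cores come first. */
static void rank_cores(struct core_rank *cores, size_t n)
{
	assert(cores != NULL && n > 0);
	qsort(cores, n, sizeof(*cores), cmp_rank);
}
```

When the platform reports a changed highest performance value, the same ordering step could simply be rerun on the updated values, which mirrors the dynamic ranking update described above.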
Changes from V10->V11:
- cpufreq: amd-pstate:
- - according to Perry's comments, replace the string with str_enabled_disable().
Changes from V9->V10:
- cpufreq: amd-pstate:
- - add judgement for highest_perf. When it is less than 255, the
preferred core feature is enabled and the priority is set.
- - delete "static u32 max_highest_perf" etc., because amd-pstate
preferred core does not require special processing for hotplug.
Changes from V8->V9:
- all:
- - pick up Tested-by tag added by Oleksandr.
- cpufreq: amd-pstate:
- - pick up Reviewed-by tag added by Wyes.
- - ignore modification of bug.
- - add an attribute of prefcore_ranking.
- - modify data type conversion from u32 to int.
- Documentation: amd-pstate:
- - pick up Reviewed-by tag added by Wyes.
Changes from V7->V8:
- all:
- - pick up Reviewed-by tag added by Mario and Ray.
- cpufreq: amd-pstate:
- - embed hw_prefcore into the cpudata structure.
- - delete preferred core init from cpu online/off.
Changes from V6->V7:
- x86:
- - Modify kconfig about X86_AMD_PSTATE.
- cpufreq: amd-pstate:
- - modify incorrect comments about scheduler_work().
- - convert highest_perf data type.
- - modify preferred core init when cpu init and online.
- acpi: cppc:
- - modify link of CPPC highest performance.
- cpufreq:
- - modify link of CPPC highest performance changed.
Changes from V5->V6:
- cpufreq: amd-pstate:
- - modify the wrong tag order.
- - modify warning about hw_prefcore sysfs attribute.
- - delete duplicate comments.
- - modify the variable name cppc_highest_perf to prefcore_ranking.
- - modify judgment conditions for setting highest_perf.
- - modify sysfs attribute for CPPC highest perf to pr_debug message.
- Documentation: amd-pstate:
- - modify warning: title underline too short.
Changes from V4->V5:
- cpufreq: amd-pstate:
- - modify sysfs attribute for CPPC highest perf.
- - modify warning about comments
- - rebase linux-next
- cpufreq:
- - Modify warning about function declarations.
- Documentation: amd-pstate:
- - align with ``amd-pstat``
Changes from V3->V4:
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
Changes from V2->V3:
- x86:
- - Modify kconfig and description.
- cpufreq: amd-pstate:
- - Add Co-developed-by tag in commit message.
- cpufreq:
- - Modify commit message.
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
Changes from V1->V2:
- acpi: cppc:
- - Add reference link.
- cpufreq:
- - Modify link error.
- cpufreq: amd-pstate:
- - Init the priorities of all online CPUs
- - Use a single variable to represent the status of preferred core.
- Documentation:
- - Default enabled preferred core.
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
- - Default enabled preferred core.
- - Use a single variable to represent the status of preferred core.
Meng Li (7):
x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for the expansion.
acpi: cppc: Add get the highest performance cppc control
cpufreq: amd-pstate: Enable amd-pstate preferred core supporting.
cpufreq: Add a notification message that the highest perf has changed
cpufreq: amd-pstate: Update amd-pstate preferred core ranking
dynamically
Documentation: amd-pstate: introduce amd-pstate preferred core
Documentation: introduce amd-pstate preferrd core mode kernel command
line options
.../admin-guide/kernel-parameters.txt | 5 +
Documentation/admin-guide/pm/amd-pstate.rst | 59 +++++-
arch/x86/Kconfig | 5 +-
drivers/acpi/cppc_acpi.c | 13 ++
drivers/acpi/processor_driver.c | 6 +
drivers/cpufreq/amd-pstate.c | 187 ++++++++++++++++--
drivers/cpufreq/cpufreq.c | 13 ++
include/acpi/cppc_acpi.h | 5 +
include/linux/amd-pstate.h | 10 +
include/linux/cpufreq.h | 5 +
10 files changed, 288 insertions(+), 20 deletions(-)
--
2.34.1
Passing a gfp_t to KUNIT_EXPECT_EQ() causes a cast warning:
lib/kunit/string-stream-test.c:73:9: sparse: sparse: incorrect type in
initializer (different base types) expected long long right_value
got restricted gfp_t const __right
Avoid this by testing stream->gfp for the expected value and passing the
boolean result of this comparison to KUNIT_EXPECT_TRUE(), as was already
done a few lines above in string_stream_managed_init_test().
Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
Fixes: d1a0d699bfc0 ("kunit: string-stream: Add tests for freeing resource-managed string_stream")
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202311181918.0mpCu2Xh-lkp@intel.com/
---
lib/kunit/string-stream-test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/kunit/string-stream-test.c b/lib/kunit/string-stream-test.c
index 06822766f29a..03fb511826f7 100644
--- a/lib/kunit/string-stream-test.c
+++ b/lib/kunit/string-stream-test.c
@@ -72,7 +72,7 @@ static void string_stream_unmanaged_init_test(struct kunit *test)
KUNIT_EXPECT_EQ(test, stream->length, 0);
KUNIT_EXPECT_TRUE(test, list_empty(&stream->fragments));
- KUNIT_EXPECT_EQ(test, stream->gfp, GFP_KERNEL);
+ KUNIT_EXPECT_TRUE(test, (stream->gfp == GFP_KERNEL));
KUNIT_EXPECT_FALSE(test, stream->append_newlines);
KUNIT_EXPECT_TRUE(test, string_stream_is_empty(stream));
--
2.30.2
From: David Woodhouse <dwmw(a)amazon.co.uk>
Using -MD without -MP causes build failures when a header file is deleted
or moved. With -MP, the compiler will emit phony targets for the header
files it lists as dependencies, and the Makefiles won't refuse to attempt
to rebuild a C unit which no longer includes the deleted header.
Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
---
tools/testing/selftests/kvm/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index a3bb36fb3cfc..20ea549da570 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -211,7 +211,7 @@ else
LINUX_TOOL_ARCH_INCLUDE = $(top_srcdir)/tools/arch/$(ARCH)/include
endif
CFLAGS += -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99 \
- -Wno-gnu-variable-sized-type-not-at-end -MD\
+ -Wno-gnu-variable-sized-type-not-at-end -MD -MP \
-fno-builtin-memcmp -fno-builtin-memcpy -fno-builtin-memset \
-fno-builtin-strnlen \
-fno-stack-protector -fno-PIE -I$(LINUX_TOOL_INCLUDE) \
--
2.41.0
Changelog:
v6:
* Rebase on top of latest mm-unstable.
* Fix/improve the in-code documentation of the new list_lru
manipulation functions (patch 1)
v5:
* Replace reference getting with an rcu_read_lock() section for
zswap lru modifications (suggested by Yosry)
* Add a new prep patch that allows mem_cgroup_iter() to return
online cgroup.
* Add a callback that updates pool->next_shrink when the cgroup is
offlined (suggested by Yosry Ahmed, Johannes Weiner)
v4:
* Rename list_lru_add to list_lru_add_obj and __list_lru_add to
list_lru_add (patch 1) (suggested by Johannes Weiner and
Yosry Ahmed)
* Some cleanups on the memcg aware LRU patch (patch 2)
(suggested by Yosry Ahmed)
* Use event interface for the new per-cgroup writeback counters.
(patch 3) (suggested by Yosry Ahmed)
* Abstract zswap's lruvec states and handling into
zswap_lruvec_state (patch 5) (suggested by Yosry Ahmed)
v3:
* Add a patch to export per-cgroup zswap writeback counters
* Add a patch to update zswap's kselftest
* Separate the new list_lru functions into its own prep patch
* Do not start from the top of the hierarchy when encounter a memcg
that is not online for the global limit zswap writeback (patch 2)
(suggested by Yosry Ahmed)
* Do not remove the swap entry from list_lru in
__read_swapcache_async() (patch 2) (suggested by Yosry Ahmed)
* Removed a redundant zswap pool getting (patch 2)
(reported by Ryan Roberts)
* Use atomic for the nr_zswap_protected (instead of lruvec's lock)
(patch 5) (suggested by Yosry Ahmed)
* Remove the per-cgroup zswap shrinker knob (patch 5)
(suggested by Yosry Ahmed)
v2:
* Fix loongarch compiler errors
* Use pool stats instead of memcg stats when !CONFIG_MEMCG_KMEM
There are currently several issues with zswap writeback:
1. There is only a single global LRU for zswap, making it impossible to
perform workload-specific shrinking - an memcg under memory pressure
cannot determine which pages in the pool it owns, and often ends up
writing pages from other memcgs. This issue has been previously
observed in practice and mitigated by simply disabling
memcg-initiated shrinking:
https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u
But this solution leaves a lot to be desired, as we still do not
have an avenue for an memcg to free up its own memory locked up in
the zswap pool.
2. We only shrink the zswap pool when the user-defined limit is hit.
This means that if we set the limit too high, cold data that are
unlikely to be used again will reside in the pool, wasting precious
memory. It is hard to predict how much zswap space will be needed
ahead of time, as this depends on the workload (specifically, on
factors such as memory access patterns and compressibility of the
memory pages).
This patch series solves these issues by separating the global zswap
LRU into per-memcg and per-NUMA LRUs, and performs workload-specific
(i.e., memcg- and NUMA-aware) zswap writeback under memory pressure. The
new shrinker does not have any parameter that must be tuned by the
user, and can be opted in or out on a per-memcg basis.
As a proof of concept, we ran the following synthetic benchmark:
build the Linux kernel in a memory-limited cgroup, and allocate some
cold data in tmpfs to see if the shrinker could write them out and
improve the overall performance. Depending on the amount of cold data
generated, we observe a 14% to 35% reduction in kernel CPU time used
in the kernel builds.
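The idea of splitting one global LRU into per-memcg, per-NUMA-node lists can be sketched in plain userspace C as below. This is a toy model under stated assumptions, not the list_lru or zswap code: all type and function names (zswap_lru_sketch, lru_add(), shrink_memcg()) and the fixed array sizes are hypothetical, and "writeback" is reduced to unlinking entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NR_MEMCGS 2	/* illustrative sizes, not kernel constants */
#define NR_NODES  2

struct entry_sketch {
	struct entry_sketch *next;
	int id;
};

struct list_sketch {
	struct entry_sketch *head;	/* oldest entry, reclaimed first */
	struct entry_sketch *tail;	/* newest entry */
};

/* One list per (memcg, node) pair instead of a single global list. */
struct zswap_lru_sketch {
	struct list_sketch lists[NR_MEMCGS][NR_NODES];
};

static void lru_init(struct zswap_lru_sketch *lru)
{
	memset(lru, 0, sizeof(*lru));
}

/* Append an entry to the list selected explicitly by memcg and node. */
static void lru_add(struct zswap_lru_sketch *lru, int memcg, int nid,
		    struct entry_sketch *e)
{
	struct list_sketch *l;

	assert(memcg >= 0 && memcg < NR_MEMCGS);
	assert(nid >= 0 && nid < NR_NODES);
	l = &lru->lists[memcg][nid];
	e->next = NULL;
	if (l->tail)
		l->tail->next = e;
	else
		l->head = e;
	l->tail = e;
}

/* Reclaim only from lists owned by one memcg; returns entries removed. */
static int shrink_memcg(struct zswap_lru_sketch *lru, int memcg)
{
	int nid, freed = 0;

	assert(memcg >= 0 && memcg < NR_MEMCGS);
	for (nid = 0; nid < NR_NODES; nid++) {
		struct list_sketch *l = &lru->lists[memcg][nid];

		while (l->head) {
			l->head = l->head->next;
			freed++;
		}
		l->tail = NULL;
	}
	return freed;
}
```

The point of the layout is that shrinking one memcg never touches entries owned by another, which is exactly what a single global list cannot guarantee.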
Domenico Cerasuolo (3):
zswap: make shrinking memcg-aware
mm: memcg: add per-memcg zswap writeback stat
selftests: cgroup: update per-memcg zswap writeback selftest
Nhat Pham (3):
list_lru: allows explicit memcg and NUMA node selection
memcontrol: allows mem_cgroup_iter() to check for onlineness
zswap: shrinks zswap pool based on memory pressure
Documentation/admin-guide/mm/zswap.rst | 7 +
drivers/android/binder_alloc.c | 5 +-
fs/dcache.c | 8 +-
fs/gfs2/quota.c | 6 +-
fs/inode.c | 4 +-
fs/nfs/nfs42xattr.c | 8 +-
fs/nfsd/filecache.c | 4 +-
fs/xfs/xfs_buf.c | 6 +-
fs/xfs/xfs_dquot.c | 2 +-
fs/xfs/xfs_qm.c | 2 +-
include/linux/list_lru.h | 54 ++-
include/linux/memcontrol.h | 9 +-
include/linux/mmzone.h | 2 +
include/linux/vm_event_item.h | 1 +
include/linux/zswap.h | 27 +-
mm/list_lru.c | 48 ++-
mm/memcontrol.c | 20 +-
mm/mmzone.c | 1 +
mm/shrinker.c | 4 +-
mm/swap.h | 3 +-
mm/swap_state.c | 26 +-
mm/vmscan.c | 26 +-
mm/vmstat.c | 1 +
mm/workingset.c | 4 +-
mm/zswap.c | 426 +++++++++++++++++---
tools/testing/selftests/cgroup/test_zswap.c | 74 ++--
26 files changed, 629 insertions(+), 149 deletions(-)
base-commit: 40b487ae2620fc9187fee68b09d2cb275de0d60e
--
2.34.1
The test is inspired by the pmu_event_filter_test implemented for x86. On
the arm64 platform, there is the same ability to set the pmu_event_filter
through the KVM_ARM_VCPU_PMU_V3_FILTER attribute, so add the test for arm64.
The series first moves some common pmu code from vpmu_counter_access to lib/
so that it can be used by pmu_event_filter_test, then implements the test itself.
Shaoqin Huang (3):
KVM: selftests: aarch64: Make the [create|destroy]_vpmu_vm() can be
reused
KVM: selftests: aarch64: Move the pmu helper function into lib/
KVM: selftests: aarch64: Introduce pmu_event_filter_test
tools/testing/selftests/kvm/Makefile | 2 +
.../kvm/aarch64/pmu_event_filter_test.c | 227 ++++++++++++++++++
.../kvm/aarch64/vpmu_counter_access.c | 218 ++---------------
.../selftests/kvm/include/aarch64/vpmu.h | 139 +++++++++++
.../testing/selftests/kvm/lib/aarch64/vpmu.c | 74 ++++++
5 files changed, 466 insertions(+), 194 deletions(-)
create mode 100644 tools/testing/selftests/kvm/aarch64/pmu_event_filter_test.c
create mode 100644 tools/testing/selftests/kvm/include/aarch64/vpmu.h
create mode 100644 tools/testing/selftests/kvm/lib/aarch64/vpmu.c
--
2.40.1
When linking statically, libraries may require other dependencies to be
added to the ld flags. In particular, libelf may require libzstd. Use
pkg-config to determine such dependencies.
V4 -> V5: Introduced variables LIBELF_CFLAGS and LIBELF_LIBS.
(Daniel Borkmann)
Added patch "selftests/bpf: Choose pkg-config for the target".
V3 -> V4: Added "2> /dev/null".
V2 -> V3: Added missing "echo".
V1 -> V2: Implemented fallback, referring to HOSTPKG_CONFIG.
Akihiko Odaki (3):
selftests/bpf: Choose pkg-config for the target
selftests/bpf: Override PKG_CONFIG for static builds
selftests/bpf: Use pkg-config for libelf
tools/testing/selftests/bpf/Makefile | 14 +++++++++-----
tools/testing/selftests/bpf/README.rst | 2 +-
2 files changed, 10 insertions(+), 6 deletions(-)
--
2.43.0
From: angquan yu <angquan21(a)gmail.com>
This commit addresses compiler warnings in lam.c related to the usage
of non-literal format strings without format arguments in the
'run_test' function.
Warnings fixed:
- Resolved warnings indicating that 'ksft_test_result_skip' and
'ksft_test_result' were called with 't->msg' as a format string without
accompanying format arguments.
Changes made:
- Modified the calls to 'ksft_test_result_skip' and 'ksft_test_result'
to explicitly include a format specifier ("%s") for 't->msg'.
- This ensures that the string is safely treated as a format argument,
adhering to safer coding practices and resolving the compiler warnings.
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/x86/lam.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/x86/lam.c b/tools/testing/selftests/x86/lam.c
index 8f9b06d9c..215b8150b 100644
--- a/tools/testing/selftests/x86/lam.c
+++ b/tools/testing/selftests/x86/lam.c
@@ -817,7 +817,7 @@ static void run_test(struct testcases *test, int count)
/* return 3 is not support LA57, the case should be skipped */
if (ret == 3) {
- ksft_test_result_skip(t->msg);
+ ksft_test_result_skip("%s", t->msg);
continue;
}
@@ -826,7 +826,7 @@ static void run_test(struct testcases *test, int count)
else
ret = !(t->expected);
- ksft_test_result(ret, t->msg);
+ ksft_test_result(ret, "%s", t->msg);
}
}
--
2.39.2
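As background for the "%s" change above, here is a minimal sketch of why a message should be passed as an argument rather than as the format itself. The report() function is a hypothetical stand-in for a printf-style helper such as ksft_test_result_skip(); it is not the kselftest implementation, and the format attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical printf-style reporting helper. The format attribute lets
 * the compiler check callers, which is what produces the
 * -Wformat-security warning when a non-literal is passed as fmt with no
 * further arguments.
 */
__attribute__((format(printf, 3, 4)))
static int report(char *out, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(out != NULL && size > 0);
	va_start(ap, fmt);
	n = vsnprintf(out, size, fmt, ap);
	va_end(ap);
	return n;
}
```

With report(buf, sizeof(buf), "%s", msg), any '%' characters inside msg are copied verbatim. With report(buf, sizeof(buf), msg), they would be interpreted as conversions that read arguments which were never passed.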
From: angquan yu <angquan21(a)gmail.com>
This commit fixes a compiler warning in the file
x86_64/nx_huge_pages_test.c, which was caused by improper
macro expansion of '__TEST_REQUIRE'.
Warning addressed:
- The warning was triggered by the expansion of the '__TEST_REQUIRE'
macro, indicating a potential issue in how the macro was being
used or expanded.
Changes made:
- Modified the usage of the '__TEST_REQUIRE' macro to ensure proper
expansion. This involved explicitly passing the expected magic token
(MAGIC_TOKEN) and a descriptive error message to the macro.
- The fix enhances clarity in the macro usage and ensures that
the compiler correctly interprets the intended logic, thereby
resolving the warning.
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
index 18ac5c195..323ede6b6 100644
--- a/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
+++ b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
@@ -259,7 +259,8 @@ int main(int argc, char **argv)
__TEST_REQUIRE(token == MAGIC_TOKEN,
"This test must be run with the magic token %d.\n"
"This is done by nx_huge_pages_test.sh, which\n"
- "also handles environment setup for the test.");
+ "also handles environment setup for the test.",
+ MAGIC_TOKEN);
run_test(reclaim_period_ms, false, reboot_permissions);
run_test(reclaim_period_ms, true, reboot_permissions);
--
2.39.2
From: angquan yu <angquan21(a)gmail.com>
This commit resolves a compiler warning regarding the
use of non-literal format strings in breakpoint_test.c.
The functions `ksft_test_result_pass` and `ksft_test_result_fail`
were previously called with a variable `msg` directly, which could
potentially lead to format string vulnerabilities.
Changes made:
- Modified the calls to `ksft_test_result_pass` and `ksft_test_result_fail`
by adding a "%s" format specifier. This explicitly declares `msg` as a
string argument, adhering to safer coding practices and resolving
the compiler warning.
This change does not affect the functional behavior of the code but ensures
better code safety and compliance with recommended C programming standards.
The previous warning is "breakpoint_test.c:287:17:
warning: format not a string literal and no format arguments
[-Wformat-security]
287 | ksft_test_result_pass(msg);
| ^~~~~~~~~~~~~~~~~~~~~
breakpoint_test.c:289:17: warning: format not a string literal
and no format arguments [-Wformat-security]
289 | ksft_test_result_fail(msg);
| "
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/breakpoints/breakpoint_test.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/breakpoints/breakpoint_test.c b/tools/testing/selftests/breakpoints/breakpoint_test.c
index 3266cc929..d46962a24 100644
--- a/tools/testing/selftests/breakpoints/breakpoint_test.c
+++ b/tools/testing/selftests/breakpoints/breakpoint_test.c
@@ -284,9 +284,9 @@ static void check_success(const char *msg)
nr_tests++;
if (ret)
- ksft_test_result_pass(msg);
+ ksft_test_result_pass("%s", msg);
else
- ksft_test_result_fail(msg);
+ ksft_test_result_fail("%s", msg);
}
static void launch_instruction_breakpoints(char *buf, int local, int global)
--
2.39.2
From: Eduard Zingerman <eddyz87(a)gmail.com>
[ Upstream commit f40bfd1679446b22d321e64a1fa98b7d07d2be08 ]
This is a preparatory change. A follow-up patch "bpf: verify callbacks
as if they are called unknown number of times" changes the logic for
callback handling. While previously callbacks were verified as a
single function call, the new scheme takes into account that callbacks
could be executed an unknown number of times.
This has dire implications for bpf_loop_bench:
SEC("fentry/" SYS_PREFIX "sys_getpgid")
int benchmark(void *ctx)
{
for (int i = 0; i < 1000; i++) {
bpf_loop(nr_loops, empty_callback, NULL, 0);
__sync_add_and_fetch(&hits, nr_loops);
}
return 0;
}
Without the callbacks change, the verifier sees it as 1000 calls to
empty_callback(). However, with the callbacks change things become
exponential:
- i=0: state exploring empty_callback is scheduled with i=0 (a);
- i=1: state exploring empty_callback is scheduled with i=1;
...
- i=999: state exploring empty_callback is scheduled with i=999;
- state (a) is popped from stack;
- i=1: state exploring empty_callback is scheduled with i=1;
...
Avoid this issue by rewriting outer loop as bpf_loop().
Unfortunately, this adds a function call to a loop at runtime, which
negatively affects performance:
         throughput               latency
before:  149.919 ± 0.168 M ops/s, 6.670 ns/op
after :  137.040 ± 0.187 M ops/s, 7.297 ns/op
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87(a)gmail.com>
Link: https://lore.kernel.org/r/20231121020701.26440-4-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/progs/bpf_loop_bench.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/bpf/progs/bpf_loop_bench.c b/tools/testing/selftests/bpf/progs/bpf_loop_bench.c
index 4ce76eb064c41..d461746fd3c1e 100644
--- a/tools/testing/selftests/bpf/progs/bpf_loop_bench.c
+++ b/tools/testing/selftests/bpf/progs/bpf_loop_bench.c
@@ -15,13 +15,16 @@ static int empty_callback(__u32 index, void *data)
return 0;
}
+static int outer_loop(__u32 index, void *data)
+{
+ bpf_loop(nr_loops, empty_callback, NULL, 0);
+ __sync_add_and_fetch(&hits, nr_loops);
+ return 0;
+}
+
SEC("fentry/" SYS_PREFIX "sys_getpgid")
int benchmark(void *ctx)
{
- for (int i = 0; i < 1000; i++) {
- bpf_loop(nr_loops, empty_callback, NULL, 0);
-
- __sync_add_and_fetch(&hits, nr_loops);
- }
+ bpf_loop(1000, outer_loop, NULL, 0);
return 0;
}
--
2.42.0
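The shape of the rewrite above can be tried outside BPF with a small userspace stand-in. This is a sketch under stated assumptions: loop_sketch() only imitates the calling convention of bpf_loop() (call the callback up to nr_loops times, stop when it returns non-zero, return the number of iterations performed) and drops the flags argument; it says nothing about how the verifier explores states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*loop_cb)(uint32_t index, void *ctx);

/* Userspace stand-in for bpf_loop(), for illustration only. */
static long loop_sketch(uint32_t nr_loops, loop_cb cb, void *ctx)
{
	uint32_t i;

	assert(cb != NULL);
	for (i = 0; i < nr_loops; i++)
		if (cb(i, ctx))
			return (long)i + 1;
	return nr_loops;
}

static uint32_t nr_loops = 10;
static uint64_t hits;

static int empty_callback(uint32_t index, void *data)
{
	(void)index;
	(void)data;
	return 0;
}

/* Body of the former for-loop, now itself a callback. */
static int outer_loop(uint32_t index, void *data)
{
	(void)index;
	(void)data;
	loop_sketch(nr_loops, empty_callback, NULL);
	hits += nr_loops;
	return 0;
}

/* The outer 1000-iteration loop expressed as a single helper call. */
static long benchmark_sketch(void)
{
	return loop_sketch(1000, outer_loop, NULL);
}
```

The extra indirect call per outer iteration is the runtime cost that the throughput numbers above attribute to the rewrite.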
Hi,
v1 [1] was discussed during Plumbers [2], where a lot of feedback was given. I
hope to justify the changes in v2 and address the feedback here.
One piece of feedback from Shuah was that keeping per-platform files with the USB/PCI
devices to test as part of the kselftest tree wasn't maintainable. One proposed
alternative was to generate a list of probed devices on a known-good kernel and
use that as a reference. However, you need someone to look at that generated
reference to be able to say it is a good one, and you need to save it to ensure
it will be reproducible later anyway, so that wouldn't actually solve the
problem. It is a matter of hand-crafting vs generating the test definitions, but
they will need to be vouched for by someone and stored somewhere in both cases.
So for this v2, in patch 2 I just have a sample test definition, and the
per-platform test definitions would be added to a separate repository.
The other feedback received was that the BIOS might reconfigure the PCI
topology (at least on x86), meaning that relying on a sequence of device and
function numbers (eg 1d.0/02.0/0.0) as a stable description of a device on the
platform is not possible. I couldn't verify whether this is really the case (if
you have any more insight into this, please let me know), but with that in mind,
here in v2 I have taken a different approach. Here I'm using the device's
properties which are used for driver matching (the same that show on modalias)
to identify a device in a stable way.
This approach has some drawbacks compared to the one in v1. For one, it doesn't
uniquely identify a device, so if there are multiple of the same device on a
platform they have to be checked as a group. Also, the test definition isn't as
human-readable.
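To make the "checked as a group" point concrete, here is a small sketch in C. It is not how the shell test is written: the struct dev_sketch type, its fields, and the expected-count parameter are all assumptions chosen for illustration, with vendor/product standing in for whichever match fields a bus exposes through modalias.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one enumerated device; field names are illustrative. */
struct dev_sketch {
	uint16_t vendor;	/* match field, as it would appear in modalias */
	uint16_t product;	/* match field, as it would appear in modalias */
	bool bound;		/* true if a driver is bound to the device */
};

/*
 * Match fields do not single out one device, so every device that matches
 * is checked together: the group passes only if the expected number of
 * devices is present and each of them is bound to a driver.
 */
static bool check_group(const struct dev_sketch *devs, size_t n,
			uint16_t vendor, uint16_t product, size_t expected)
{
	size_t found = 0, bound = 0;
	size_t i;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (devs[i].vendor != vendor || devs[i].product != product)
			continue;
		found++;
		if (devs[i].bound)
			bound++;
	}
	return found == expected && bound == expected;
}
```

If two identical devices are present and only one has probed, the group as a whole fails, and the sketch cannot say which of the two is the broken one. That is the loss of precision relative to the topology-based approach of v1.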
I'm adding in CC the people I recognized at the Plumbers session that were
interested in this work. Feel free to add anyone missing.
Thanks,
Nícolas
[1] https://lore.kernel.org/all/20231024211818.365844-1-nfraprado@collabora.com
[2] https://www.youtube.com/watch?v=oE73eVSyFXQ&t=9377s
Original cover letter:
This is part of an effort to improve detection of regressions impacting
device probe on all platforms. The recently merged DT kselftest [3]
detects probe issues for all devices described statically in the DT.
That leaves out devices discovered at run-time from discoverable busses.
This is where this test comes in. All of the devices that are connected
through discoverable busses (ie USB and PCI), and which are internal and
therefore always present, can be described in a per-platform file so
they can be checked for. The test will check that the device has been
instantiated and bound to a driver.
Patch 1 introduces the test. Patch 2 adds the test definitions for the
google,spherion machine (Acer Chromebook 514) as an example.
This is the sample output from the test running on Spherion:
TAP version 13
Using board file: boards/google,spherion
1..3
ok 1 usb.camera
ok 2 usb.bluetooth
ok 3 pci.wifi
Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
[3] https://lore.kernel.org/all/20230828211424.2964562-1-nfraprado@collabora.co…
Changes in v2:
- Changed approach of encoding stable device reference in test file from
HW topology to device match fields (the ones from modalias)
- Better documented test format
Nícolas F. R. A. Prado (2):
kselftest: Add test to verify probe of devices from discoverable
busses
kselftest: devices: Add sample board file for google,spherion
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/devices/.gitignore | 1 +
tools/testing/selftests/devices/Makefile | 8 +
.../selftests/devices/boards/google,spherion | 12 ++
.../devices/test_discoverable_devices.sh | 160 ++++++++++++++++++
5 files changed, 182 insertions(+)
create mode 100644 tools/testing/selftests/devices/.gitignore
create mode 100644 tools/testing/selftests/devices/Makefile
create mode 100644 tools/testing/selftests/devices/boards/google,spherion
create mode 100755 tools/testing/selftests/devices/test_discoverable_devices.sh
--
2.42.1
From: angquan yu <angquan21(a)gmail.com>
This commit resolves a compiler warning regarding the
use of non-literal format strings in breakpoint_test.c.
The functions `ksft_test_result_pass` and `ksft_test_result_fail`
were previously called with a variable `msg` directly, which could
potentially lead to format string vulnerabilities.
Changes made:
- Modified the calls to `ksft_test_result_pass` and `ksft_test_result_fail`
by adding a "%s" format specifier. This explicitly declares `msg` as a
string argument, adhering to safer coding practices and resolving
the compiler warning.
This change does not affect the functional behavior of the code but ensures
better code safety and compliance with recommended C programming standards.
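For illustration, here is a minimal user-space sketch of the same pattern. report_pass() is a hypothetical stand-in for a printf-style reporter such as ksft_test_result_pass(); it formats into a caller-supplied buffer so the result can be inspected. The message deliberately contains conversion specifiers to show that they are copied literally when the message is passed as an argument to "%s".

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical printf-style reporter (stand-in for the kselftest
 * helpers). Returns the number of characters that were formatted.
 */
static int report_pass(char *out, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(out, len, fmt, ap);
	va_end(ap);
	return n;
}

/*
 * Safe form: msg is data, not a format string, so any '%' it contains
 * is printed as-is. Calling report_pass(out, len, msg) instead would
 * make vsnprintf() interpret "%s" and "%n" with no matching arguments,
 * which is undefined behaviour.
 */
static int report_message(char *out, size_t len, const char *msg)
{
	return report_pass(out, len, "%s", msg);
}
```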
The previous warning is "breakpoint_test.c:287:17:
warning: format not a string literal and no format arguments
[-Wformat-security]
287 | ksft_test_result_pass(msg);
| ^~~~~~~~~~~~~~~~~~~~~
breakpoint_test.c:289:17: warning: format not a string literal
and no format arguments [-Wformat-security]
289 | ksft_test_result_fail(msg);
| "
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/breakpoints/breakpoint_test.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/breakpoints/breakpoint_test.c b/tools/testing/selftests/breakpoints/breakpoint_test.c
index 3266cc929..d46962a24 100644
--- a/tools/testing/selftests/breakpoints/breakpoint_test.c
+++ b/tools/testing/selftests/breakpoints/breakpoint_test.c
@@ -284,9 +284,9 @@ static void check_success(const char *msg)
nr_tests++;
if (ret)
- ksft_test_result_pass(msg);
+ ksft_test_result_pass("%s", msg);
else
- ksft_test_result_fail(msg);
+ ksft_test_result_fail("%s", msg);
}
static void launch_instruction_breakpoints(char *buf, int local, int global)
--
2.39.2
The root-only cpuset.cpus.isolated control file shows the current set
of isolated CPUs in isolated partitions. This control file is currently
exposed only with the cgroup_debug boot command line option which also
adds the ".__DEBUG__." prefix. This is actually a useful control file if
users want to find out which CPUs are currently in an isolated state by
the cpuset controller. Remove CFTYPE_DEBUG flag for this control file and
make it available by default without any prefix.
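As a rough illustration of how user space might consume this file, the sketch below counts the CPUs in a cpulist-style string such as "2-3,8". It assumes the file uses the usual kernel cpulist format; cpulist_count() is a hypothetical helper, not part of this patch, and reading the file itself (for example from a cgroup v2 mount point) is left to the caller.

```c
#include <stdlib.h>

/*
 * Count the CPUs in a cpulist string such as "2-3,8\n".
 * Returns 0 for an empty list (no isolated partition) and -1 if the
 * string is malformed.
 */
static int cpulist_count(const char *s)
{
	int count = 0;

	while (*s && *s != '\n') {
		char *end;
		long lo = strtol(s, &end, 10);
		long hi = lo;

		if (end == s || lo < 0)
			return -1;
		if (*end == '-') {
			/* Range "lo-hi": parse the upper bound. */
			s = end + 1;
			hi = strtol(s, &end, 10);
			if (end == s || hi < lo)
				return -1;
		}
		count += (int)(hi - lo + 1);
		if (*end == ',')
			end++;
		s = end;
	}
	return count;
}
```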
The test_cpuset_prs.sh test script and the cgroup-v2.rst documentation
file are also updated accordingly. Minor code change is also made in
test_cpuset_prs.sh to avoid false test failure when running on debug
kernel.
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
Documentation/admin-guide/cgroup-v2.rst | 7 ++++
kernel/cgroup/cpuset.c | 2 +-
.../selftests/cgroup/test_cpuset_prs.sh | 32 +++++++++++--------
3 files changed, 26 insertions(+), 15 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index cf5651a11df8..30f6ff2eba47 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2316,6 +2316,13 @@ Cpuset Interface Files
treated to have an implicit value of "cpuset.cpus" in the
formation of local partition.
+ cpuset.cpus.isolated
+ A read-only and root cgroup only multiple values file.
+
+ This file shows the set of all isolated CPUs used in existing
+ isolated partitions. It will be empty if no isolated partition
+ is created.
+
cpuset.cpus.partition
A read-write single value file which exists on non-root
cpuset-enabled cgroups. This flag is owned by the parent cgroup
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 1bad4007ff4b..2a16df86c55c 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3974,7 +3974,7 @@ static struct cftype dfl_files[] = {
.name = "cpus.isolated",
.seq_show = cpuset_common_seq_show,
.private = FILE_ISOLATED_CPULIST,
- .flags = CFTYPE_ONLY_ON_ROOT | CFTYPE_DEBUG,
+ .flags = CFTYPE_ONLY_ON_ROOT,
},
{ } /* terminate */
diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index 7b7c4c2b6d85..b5eb1be2248c 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -508,7 +508,7 @@ dump_states()
XECPUS=$DIR/cpuset.cpus.exclusive.effective
PRS=$DIR/cpuset.cpus.partition
PCPUS=$DIR/.__DEBUG__.cpuset.cpus.subpartitions
- ISCPUS=$DIR/.__DEBUG__.cpuset.cpus.isolated
+ ISCPUS=$DIR/cpuset.cpus.isolated
[[ -e $CPUS ]] && echo "$CPUS: $(cat $CPUS)"
[[ -e $XCPUS ]] && echo "$XCPUS: $(cat $XCPUS)"
[[ -e $ECPUS ]] && echo "$ECPUS: $(cat $ECPUS)"
@@ -593,17 +593,17 @@ check_cgroup_states()
#
# Get isolated (including offline) CPUs by looking at
-# /sys/kernel/debug/sched/domains and *cpuset.cpus.isolated control file,
+# /sys/kernel/debug/sched/domains and cpuset.cpus.isolated control file,
# if available, and compare that with the expected value.
#
# Note that isolated CPUs from the sched/domains context include offline
# CPUs as well as CPUs in non-isolated 1-CPU partition. Those CPUs may
-# not be included in the *cpuset.cpus.isolated control file which contains
+# not be included in the cpuset.cpus.isolated control file which contains
# only CPUs in isolated partitions.
#
# $1 - expected isolated cpu list(s) <isolcpus1>{,<isolcpus2>}
# <isolcpus1> - expected sched/domains value
-# <isolcpus2> - *cpuset.cpus.isolated value = <isolcpus1> if not defined
+# <isolcpus2> - cpuset.cpus.isolated value = <isolcpus1> if not defined
#
check_isolcpus()
{
@@ -611,7 +611,7 @@ check_isolcpus()
ISOLCPUS=
LASTISOLCPU=
SCHED_DOMAINS=/sys/kernel/debug/sched/domains
- ISCPUS=${CGROUP2}/.__DEBUG__.cpuset.cpus.isolated
+ ISCPUS=${CGROUP2}/cpuset.cpus.isolated
if [[ $EXPECT_VAL = . ]]
then
EXPECT_VAL=
@@ -692,14 +692,18 @@ test_fail()
null_isolcpus_check()
{
[[ $VERBOSE -gt 0 ]] || return 0
- pause 0.02
- check_isolcpus "."
- if [[ $? -ne 0 ]]
- then
- echo "Unexpected isolated CPUs: $ISOLCPUS"
- dump_states
- exit 1
- fi
+ # Retry a few times before printing error
+ RETRY=0
+ while [[ $RETRY -lt 5 ]]
+ do
+ pause 0.01
+ check_isolcpus "."
+ [[ $? -eq 0 ]] && return 0
+ ((RETRY++))
+ done
+ echo "Unexpected isolated CPUs: $ISOLCPUS"
+ dump_states
+ exit 1
}
#
@@ -776,7 +780,7 @@ run_state_test()
#
NEWLIST=$(cat cpuset.cpus.effective)
RETRY=0
- while [[ $NEWLIST != $CPULIST && $RETRY -lt 5 ]]
+ while [[ $NEWLIST != $CPULIST && $RETRY -lt 8 ]]
do
# Wait a bit longer & recheck a few times
pause 0.01
--
2.39.3
This patchset adds two kfunc helpers, bpf_xdp_get_xfrm_state() and
bpf_xdp_xfrm_state_release() that wrap xfrm_state_lookup() and
xfrm_state_put(). The intent is to support software RSS (via XDP) for
the ongoing/upcoming ipsec pcpu work [0]. Recent experiments performed
on (hopefully) reproducible AWS testbeds indicate that single tunnel
pcpu ipsec can reach line rate on 100G ENA nics.
Note this patchset only tests/shows generic xfrm_state access. The
"secret sauce" (if you can really even call it that) involves accessing
a soon-to-be-upstreamed pcpu_num field in xfrm_state. Early example is
available here [1].
[0]: https://datatracker.ietf.org/doc/draft-ietf-ipsecme-multi-sa-performance/03/
[1]: https://github.com/danobi/xdp-tools/blob/e89a1c617aba3b50d990f779357d6ce286…
Changes from v1:
* Move xfrm tunnel tests to test_progs
* Fix writing to opts->error when opts is invalid
* Use __bpf_kfunc_start_defs()
* Remove unused vxlanhdr definition
* Add and use BPF_CORE_WRITE_BITFIELD() macro
* Make series bisect clean
Changes from RFCv2:
* Rebased to ipsec-next
* Fix netns leak
Changes from RFCv1:
* Add Antony's commit tags
* Add KF_ACQUIRE and KF_RELEASE semantics
Daniel Xu (6):
bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc
bpf: xfrm: Add bpf_xdp_xfrm_state_release() kfunc
libbpf: Add BPF_CORE_WRITE_BITFIELD() macro
bpf: selftests: test_tunnel: Use vmlinux.h declarations
bpf: selftests: Move xfrm tunnel test to test_progs
bpf: xfrm: Add selftest for bpf_xdp_get_xfrm_state()
include/net/xfrm.h | 9 +
net/xfrm/Makefile | 1 +
net/xfrm/xfrm_policy.c | 2 +
net/xfrm/xfrm_state_bpf.c | 128 +++++++++++++++
tools/lib/bpf/bpf_core_read.h | 36 ++++
.../selftests/bpf/prog_tests/test_tunnel.c | 155 ++++++++++++++++++
.../selftests/bpf/progs/bpf_tracing_net.h | 1 +
.../selftests/bpf/progs/test_tunnel_kern.c | 138 +++++++++-------
tools/testing/selftests/bpf/test_tunnel.sh | 92 -----------
9 files changed, 412 insertions(+), 150 deletions(-)
create mode 100644 net/xfrm/xfrm_state_bpf.c
--
2.42.1
This patchset adds two kfunc helpers, bpf_xdp_get_xfrm_state() and
bpf_xdp_xfrm_state_release() that wrap xfrm_state_lookup() and
xfrm_state_put(). The intent is to support software RSS (via XDP) for
the ongoing/upcoming ipsec pcpu work [0]. Recent experiments performed
on (hopefully) reproducible AWS testbeds indicate that single tunnel
pcpu ipsec can reach line rate on 100G ENA nics.
Note this patchset only tests/shows generic xfrm_state access. The
"secret sauce" (if you can really even call it that) involves accessing
a soon-to-be-upstreamed pcpu_num field in xfrm_state. Early example is
available here [1].
[0]: https://datatracker.ietf.org/doc/draft-ietf-ipsecme-multi-sa-performance/03/
[1]: https://github.com/danobi/xdp-tools/blob/e89a1c617aba3b50d990f779357d6ce286…
Changes from RFCv2:
* Rebased to ipsec-next
* Fix netns leak
Changes from RFCv1:
* Add Antony's commit tags
* Add KF_ACQUIRE and KF_RELEASE semantics
Daniel Xu (7):
bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc
bpf: xfrm: Add bpf_xdp_xfrm_state_release() kfunc
bpf: selftests: test_tunnel: Use ping -6 over ping6
bpf: selftests: test_tunnel: Mount bpffs if necessary
bpf: selftests: test_tunnel: Use vmlinux.h declarations
bpf: selftests: test_tunnel: Disable CO-RE relocations
bpf: xfrm: Add selftest for bpf_xdp_get_xfrm_state()
include/net/xfrm.h | 9 ++
net/xfrm/Makefile | 1 +
net/xfrm/xfrm_policy.c | 2 +
net/xfrm/xfrm_state_bpf.c | 127 ++++++++++++++++++
.../selftests/bpf/progs/bpf_tracing_net.h | 1 +
.../selftests/bpf/progs/test_tunnel_kern.c | 98 ++++++++------
tools/testing/selftests/bpf/test_tunnel.sh | 43 ++++--
7 files changed, 227 insertions(+), 54 deletions(-)
create mode 100644 net/xfrm/xfrm_state_bpf.c
--
2.42.1
We have started printing more and more intentional stack traces, whether
it's to test that KASAN is able to detect use-after-frees or as part of a
kunit test.
These stack traces can be problematic. They suddenly show up as a new
failure. Now the test team has to contact the developers. A bunch of
people have to investigate the bug. We finally decide that it's
intentional so now the test team has to update their filter scripts to
mark it as intentional. These filters are ad-hoc because there is no
standard format for warnings.
A better way would be to mark it as intentional from the start.
Here, I have marked the beginning and the end of the trace. It's more
tricky for things like lkdtm_FORTIFY_MEM_MEMBER() where the flow doesn't
reach the end of the function. I guess I would print a different
warning for stack traces that can't have an
"Intentional warning finished\n" message at the end.
I haven't actually tested this patch... Daniel, do you have a
list of intentional stack traces we could annotate?
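To make the idea concrete, here is a sketch of how a log filter could use the two marker strings from this patch. It is only an assumption about what a test team's script might do: count_unexpected_warnings() is a hypothetical helper, and matching on the substring "WARNING" is a simplification of real stack trace detection.

```c
#include <stddef.h>
#include <string.h>

#define START_MARK "Triggering a stack trace"
#define END_MARK   "Intentional warning finished"

/*
 * Count log lines containing "WARNING" that are not enclosed by a
 * START_MARK / END_MARK pair. Lines between the markers are treated
 * as intentional and skipped.
 */
static size_t count_unexpected_warnings(const char *const *lines, size_t n)
{
	size_t unexpected = 0;
	int intentional = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (strstr(lines[i], START_MARK))
			intentional = 1;
		else if (strstr(lines[i], END_MARK))
			intentional = 0;
		else if (strstr(lines[i], "WARNING") && !intentional)
			unexpected++;
	}
	return unexpected;
}
```

A trace that never reaches the end marker would leave the filter in the "intentional" state, which is why a different start message would be needed for those cases.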
Signed-off-by: Dan Carpenter <dan.carpenter(a)linaro.org>
---
drivers/gpu/drm/tests/drm_rect_test.c | 2 ++
include/kunit/test.h | 3 +++
2 files changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/tests/drm_rect_test.c b/drivers/gpu/drm/tests/drm_rect_test.c
index 76332cd2ead8..367738254493 100644
--- a/drivers/gpu/drm/tests/drm_rect_test.c
+++ b/drivers/gpu/drm/tests/drm_rect_test.c
@@ -409,8 +409,10 @@ static void drm_test_rect_calc_hscale(struct kunit *test)
const struct drm_rect_scale_case *params = test->param_value;
int scaling_factor;
+ START_INTENTIONAL_WARNING();
scaling_factor = drm_rect_calc_hscale(&params->src, &params->dst,
params->min_range, params->max_range);
+ END_INTENTIONAL_WARNING();
KUNIT_EXPECT_EQ(test, scaling_factor, params->expected_scaling_factor);
}
diff --git a/include/kunit/test.h b/include/kunit/test.h
index 20ed9f9275c9..1f01d4c81055 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -337,6 +337,9 @@ void __kunit_test_suites_exit(struct kunit_suite **suites, int num_suites);
void kunit_exec_run_tests(struct kunit_suite_set *suite_set, bool builtin);
void kunit_exec_list_tests(struct kunit_suite_set *suite_set, bool include_attr);
+#define START_INTENTIONAL_WARNING() pr_info("Triggering a stack trace\n")
+#define END_INTENTIONAL_WARNING() pr_info("Intentional warning finished\n")
+
#if IS_BUILTIN(CONFIG_KUNIT)
int kunit_run_all_tests(void);
#else
--
2.42.0
KUnit's deferred action API accepts a void(*)(void *) function pointer
which is called when the test is exited. However, we very frequently
want to use existing functions which accept a single pointer, but which
may not be of type void*. While this is probably dodgy enough to be on
the wrong side of the C standard, it's been often used for similar
callbacks, and gcc's -Wcast-function-type seems to ignore cases where
the only difference is the type of the argument, assuming it's
compatible (i.e., they're both pointers to data).
However, clang 16 has introduced -Wcast-function-type-strict, which no
longer permits any deviation in function pointer type. This seems to be
because it'd break CFI, which validates the type of function calls.
This rather ruins our attempts to cast functions to defer them, and
leaves us with a few options. The one we've chosen is to implement a
macro which will generate a wrapper function which accepts a void*, and
casts the argument to the appropriate type.
For example, if you were trying to wrap:
void foo_close(struct foo *handle);
you could use:
KUNIT_DEFINE_ACTION_WRAPPER(kunit_action_foo_close,
foo_close,
struct foo *);
This would create a new kunit_action_foo_close() function, of type
kunit_action_t, which could be passed into kunit_add_action() and
similar functions.
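The mechanism can be tried outside the kernel. The sketch below is a user-space approximation: action_t, DEFINE_ACTION_WRAPPER(), struct deferred and run_deferred() are hypothetical stand-ins for kunit_action_t, KUNIT_DEFINE_ACTION_WRAPPER() and the deferred-action bookkeeping, and struct foo is just a placeholder type. The point is that the generated wrapper has exactly the void(*)(void *) type, so no function pointer cast is needed.

```c
#include <stddef.h>

/* Stand-in for kunit_action_t. */
typedef void (action_t)(void *);

/* Generate a void(*)(void *) wrapper around a single-pointer function. */
#define DEFINE_ACTION_WRAPPER(wrapper, orig, arg_type)	\
	static void wrapper(void *in)			\
	{						\
		arg_type arg = (arg_type)in;		\
		orig(arg);				\
	}

struct foo {
	int open;
};

static void foo_close(struct foo *handle)
{
	handle->open = 0;
}

/* Defines action_foo_close(void *), which calls foo_close(). */
DEFINE_ACTION_WRAPPER(action_foo_close, foo_close, struct foo *)

/* Hypothetical single-slot replacement for the deferred action list. */
struct deferred {
	action_t *fn;
	void *ctx;
};

static void run_deferred(struct deferred *d)
{
	if (d->fn)
		d->fn(d->ctx);
}
```

Because the call goes through a function whose real type matches the pointer type, a CFI type check on the indirect call would see a matching signature.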
In addition to defining this macro, update KUnit and its tests to use
it.
Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Reviewed-by: Nathan Chancellor <nathan(a)kernel.org>
Tested-by: Nathan Chancellor <nathan(a)kernel.org>
Acked-by: Daniel Vetter <daniel(a)ffwll.ch>
Reviewed-by: Maxime Ripard <mripard(a)kernel.org>
Signed-off-by: David Gow <davidgow(a)google.com>
---
Thanks everyone for testing v1 of this: this update only changes
documentation.
Changes since v1:
https://lore.kernel.org/linux-kselftest/20231110200830.1832556-1-davidgow@g…
- Update the usage.rst documentation (Thanks, Nathan)
- Add a better doc comment for KUNIT_DEFINE_ACTION_WRAPPER()
---
Documentation/dev-tools/kunit/usage.rst | 10 +++++++---
include/kunit/resource.h | 21 +++++++++++++++++++++
lib/kunit/kunit-test.c | 5 +----
lib/kunit/test.c | 6 ++++--
4 files changed, 33 insertions(+), 9 deletions(-)
diff --git a/Documentation/dev-tools/kunit/usage.rst b/Documentation/dev-tools/kunit/usage.rst
index c27e1646ecd9..9db12e91668e 100644
--- a/Documentation/dev-tools/kunit/usage.rst
+++ b/Documentation/dev-tools/kunit/usage.rst
@@ -651,12 +651,16 @@ For example:
}
Note that, for functions like device_unregister which only accept a single
-pointer-sized argument, it's possible to directly cast that function to
-a ``kunit_action_t`` rather than writing a wrapper function, for example:
+pointer-sized argument, it's possible to automatically generate a wrapper
+with the ``KUNIT_DEFINE_ACTION_WRAPPER()`` macro, for example:
.. code-block:: C
- kunit_add_action(test, (kunit_action_t *)&device_unregister, &dev);
+ KUNIT_DEFINE_ACTION_WRAPPER(device_unregister_wrapper, device_unregister, struct device *);
+ kunit_add_action(test, &device_unregister_wrapper, &dev);
+
+You should do this in preference to manually casting to the ``kunit_action_t`` type,
+as casting function pointers will break Control Flow Integrity (CFI).
``kunit_add_action`` can fail if, for example, the system is out of memory.
You can use ``kunit_add_action_or_reset`` instead which runs the action
diff --git a/include/kunit/resource.h b/include/kunit/resource.h
index c7383e90f5c9..4ad69a2642a5 100644
--- a/include/kunit/resource.h
+++ b/include/kunit/resource.h
@@ -390,6 +390,27 @@ void kunit_remove_resource(struct kunit *test, struct kunit_resource *res);
/* A 'deferred action' function to be used with kunit_add_action. */
typedef void (kunit_action_t)(void *);
+/**
+ * KUNIT_DEFINE_ACTION_WRAPPER() - Wrap a function for use as a deferred action.
+ *
+ * @wrapper: The name of the new wrapper function to define.
+ * @orig: The original function to wrap.
+ * @arg_type: The type of the argument accepted by @orig.
+ *
+ * Defines a wrapper for a function which accepts a single, pointer-sized
+ * argument. This wrapper can then be passed to kunit_add_action() and
+ * similar. This should be used in preference to casting a function
+ * directly to kunit_action_t, as casting function pointers will break
+ * control flow integrity (CFI), leading to crashes.
+ */
+#define KUNIT_DEFINE_ACTION_WRAPPER(wrapper, orig, arg_type) \
+ static void wrapper(void *in) \
+ { \
+ arg_type arg = (arg_type)in; \
+ orig(arg); \
+ }
+
+
/**
* kunit_add_action() - Call a function when the test ends.
* @test: Test case to associate the action with.
diff --git a/lib/kunit/kunit-test.c b/lib/kunit/kunit-test.c
index 99d2a3a528e1..3e9c5192d095 100644
--- a/lib/kunit/kunit-test.c
+++ b/lib/kunit/kunit-test.c
@@ -538,10 +538,7 @@ static struct kunit_suite kunit_resource_test_suite = {
#if IS_BUILTIN(CONFIG_KUNIT_TEST)
/* This avoids a cast warning if kfree() is passed direct to kunit_add_action(). */
-static void kfree_wrapper(void *p)
-{
- kfree(p);
-}
+KUNIT_DEFINE_ACTION_WRAPPER(kfree_wrapper, kfree, const void *);
static void kunit_log_test(struct kunit *test)
{
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index f2eb71f1a66c..0308865194bb 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -772,6 +772,8 @@ static struct notifier_block kunit_mod_nb = {
};
#endif
+KUNIT_DEFINE_ACTION_WRAPPER(kfree_action_wrapper, kfree, const void *)
+
void *kunit_kmalloc_array(struct kunit *test, size_t n, size_t size, gfp_t gfp)
{
void *data;
@@ -781,7 +783,7 @@ void *kunit_kmalloc_array(struct kunit *test, size_t n, size_t size, gfp_t gfp)
if (!data)
return NULL;
- if (kunit_add_action_or_reset(test, (kunit_action_t *)kfree, data) != 0)
+ if (kunit_add_action_or_reset(test, kfree_action_wrapper, data) != 0)
return NULL;
return data;
@@ -793,7 +795,7 @@ void kunit_kfree(struct kunit *test, const void *ptr)
if (!ptr)
return;
- kunit_release_action(test, (kunit_action_t *)kfree, (void *)ptr);
+ kunit_release_action(test, kfree_action_wrapper, (void *)ptr);
}
EXPORT_SYMBOL_GPL(kunit_kfree);
--
2.43.0.rc1.413.gea7ed67945-goog
From: Willem de Bruijn <willemb(a)google.com>
Observed a clang warning when backporting cmsg_sender.
Ran the same build against all the .c files under selftests/net.
This is clang-14 with -Wall, which is what
tools/testing/selftests/net/Makefile also enables.
Willem de Bruijn (4):
selftests/net: ipsec: fix constant out of range
selftests/net: fix a char signedness issue
selftests/net: unix: fix unused variable compiler warning
selftests/net: mptcp: fix uninitialized variable warnings
tools/testing/selftests/net/af_unix/diag_uid.c | 1 -
tools/testing/selftests/net/cmsg_sender.c | 2 +-
tools/testing/selftests/net/ipsec.c | 4 ++--
tools/testing/selftests/net/mptcp/mptcp_connect.c | 11 ++++-------
tools/testing/selftests/net/mptcp/mptcp_inq.c | 11 ++++-------
5 files changed, 11 insertions(+), 18 deletions(-)
--
2.43.0.rc1.413.gea7ed67945-goog
From: Christoph Müllner <christoph.muellner(a)vrull.eu>
The upcoming RISC-V Ssdtso specification introduces a bit in the senvcfg
CSR to switch the memory consistency model at run-time from RVWMO to TSO
(and back). The active consistency model can therefore be switched on a
per-hart basis and managed by the kernel on a per-process/thread basis.
This patch implements basic Ssdtso support and adds a prctl API on top
so that user-space processes can switch to a stronger memory consistency
model (than the kernel was written for) at run-time.
I am not sure if other architectures support switching the memory
consistency model at run-time, but designing the prctl API in an
arch-independent way allows reusing it in the future.
The patchset also comes with a short documentation of the prctl API.
This series is based on the second draft of the Ssdtso specification
which was published recently on an RVI list:
https://lists.riscv.org/g/tech-arch-review/message/183
Note that the Ssdtso specification is in development state
(i.e., not frozen or even ratified) which is also the reason
why I marked the series as RFC.
One aspect that is not covered in this patchset is virtualization.
It is planned to add virtualization support in a later version.
Hints/suggestions on how to implement this part are very much
appreciated.
Christoph Müllner (5):
RISC-V: Add basic Ssdtso support
RISC-V: Expose Ssdtso via hwprobe API
uapi: prctl: Add new prctl call to set/get the memory consistency
model
RISC-V: Implement prctl call to set/get the memory consistency model
RISC-V: selftests: Add DTSO tests
Documentation/arch/riscv/hwprobe.rst | 3 +
.../mm/dynamic-memory-consistency-model.rst | 76 ++++++++++++++++++
arch/riscv/Kconfig | 10 +++
arch/riscv/include/asm/csr.h | 1 +
arch/riscv/include/asm/dtso.h | 74 ++++++++++++++++++
arch/riscv/include/asm/hwcap.h | 1 +
arch/riscv/include/asm/processor.h | 8 ++
arch/riscv/include/asm/switch_to.h | 3 +
arch/riscv/include/uapi/asm/hwprobe.h | 1 +
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/cpufeature.c | 1 +
arch/riscv/kernel/dtso.c | 33 ++++++++
arch/riscv/kernel/process.c | 4 +
arch/riscv/kernel/sys_riscv.c | 1 +
include/uapi/linux/prctl.h | 5 ++
kernel/sys.c | 12 +++
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/dtso/.gitignore | 1 +
tools/testing/selftests/riscv/dtso/Makefile | 11 +++
tools/testing/selftests/riscv/dtso/dtso.c | 77 +++++++++++++++++++
20 files changed, 324 insertions(+), 1 deletion(-)
create mode 100644 Documentation/mm/dynamic-memory-consistency-model.rst
create mode 100644 arch/riscv/include/asm/dtso.h
create mode 100644 arch/riscv/kernel/dtso.c
create mode 100644 tools/testing/selftests/riscv/dtso/.gitignore
create mode 100644 tools/testing/selftests/riscv/dtso/Makefile
create mode 100644 tools/testing/selftests/riscv/dtso/dtso.c
--
2.41.0
As Guillaume pointed out, many selftests create namespaces with very common
names (like "client" or "server") or even run (partially) directly in init_net.
This makes these tests prone to failure if another namespace with the same
name already exists. It also makes it impossible to run several instances
of these tests in parallel.
This patch set converts all the net selftests to run in unique namespaces,
so we can update the selftest framework to run each test in its own namespace
in parallel. After the update, we only need to wait for the test that takes
the longest time.
]# per_test_logging=1 time ./run_kselftest.sh -n -c net
TAP version 13
# selftests: net: reuseport_bpf_numa
not ok 3 selftests: net: reuseport_bpf_numa # exit=1
# selftests: net: reuseport_bpf_cpu
not ok 2 selftests: net: reuseport_bpf_cpu # exit=1
# selftests: net: reuseport_dualstack
not ok 4 selftests: net: reuseport_dualstack # exit=1
# selftests: net: reuseaddr_conflict
ok 5 selftests: net: reuseaddr_conflict
...
# selftests: net: test_vxlan_mdb.sh
ok 90 selftests: net: test_vxlan_mdb.sh
# selftests: net: fib_nexthops.sh
not ok 41 selftests: net: fib_nexthops.sh # exit=1
# selftests: net: fcnal-test.sh
not ok 36 selftests: net: fcnal-test.sh # exit=1
real 55m1.238s
user 12m10.350s
sys 22m17.432s
Hangbin Liu (38):
selftests/net: add lib.sh
selftests/net: arp_ndisc_evict_nocarrier.sh convert to run test in
unique namespace
selftest: arp_ndisc_untracked_subnets.sh convert to run test in unique
namespace
selftests/net: convert cmsg tests to make them run in unique namespace
selftests/net: convert drop_monitor_tests.sh to run it in unique
namespace
selftests/net: convert fcnal-test.sh to run it in unique namespace
selftests/net: convert fib_nexthop_multiprefix to run it in unique
namespace
selftests/net: convert fib_nexthop_nongw.sh to run it in unique
namespace
selftests/net: convert fib_nexthops.sh to run it in unique namespace
selftests/net: convert fib-onlink-tests.sh to run it in unique
namespace
selftests/net: convert fib_rule_tests.sh to run it in unique namespace
selftests/net: convert fib_tests.sh to run it in unique namespace
selftests/net: convert gre_gso.sh to run it in unique namespace
selftests/net: convert icmp_redirect.sh to run it in unique namespace
sleftests/net: convert icmp.sh to run it in unique namespace
selftests/net: convert ioam6.sh to run it in unique namespace
selftests/net: convert l2tp.sh to run it in unique namespace
selftests/net: convert ndisc_unsolicited_na_test.sh to run it in
unique namespace
selftests/net: convert netns-name.sh to run it in unique namespace
selftests/net: convert fdb_flush.sh to run it in unique namespace
selftests/net: convert rtnetlink.sh to run it in unique namespace
selftests/net: convert sctp_vrf.sh to run it in unique namespace
selftests/net: use unique netns name for setup_loopback.sh
setup_veth.sh
selftests/net: convert stress_reuseport_listen.sh to run it in unique
namespace
selftests/net: convert test_bridge_backup_port.sh to run it in unique
namespace
selftests/net: convert test_bridge_neigh_suppress.sh to run it in
unique namespace
selftests/net: convert test_vxlan_mdb.sh to run it in unique namespace
selftests/net: convert test_vxlan_nolocalbypass.sh to run it in unique
namespace
selftests/net: convert test_vxlan_under_vrf.sh to run it in unique
namespace
selftests/net: convert test_vxlan_vnifiltering.sh to run it in unique
namespace
selftests/net: convert toeplitz.sh to run it in unique namespace
selftests/net: convert unicast_extensions.sh to run it in unique
namespace
selftests/net: convert vrf_route_leaking.sh to run it in unique
namespace
selftests/net: convert vrf_strict_mode_test.sh to run it in unique
namespace
selftests/net: convert vrf-xfrm-tests.sh to run it in unique namespace
selftests/net: convert traceroute.sh to run it in unique namespace
selftests/net: convert xfrm_policy.sh to run it in unique namespace
kselftest/runner.sh: add netns support
tools/testing/selftests/kselftest/runner.sh | 26 +-
tools/testing/selftests/net/Makefile | 2 +-
.../net/arp_ndisc_evict_nocarrier.sh | 46 +--
.../net/arp_ndisc_untracked_subnets.sh | 18 +-
tools/testing/selftests/net/cmsg_ipv6.sh | 10 +-
tools/testing/selftests/net/cmsg_so_mark.sh | 7 +-
tools/testing/selftests/net/cmsg_time.sh | 7 +-
.../selftests/net/drop_monitor_tests.sh | 21 +-
tools/testing/selftests/net/fcnal-test.sh | 30 +-
tools/testing/selftests/net/fdb_flush.sh | 11 +-
.../testing/selftests/net/fib-onlink-tests.sh | 7 +-
.../selftests/net/fib_nexthop_multiprefix.sh | 104 +++--
.../selftests/net/fib_nexthop_nongw.sh | 34 +-
tools/testing/selftests/net/fib_nexthops.sh | 142 ++++---
tools/testing/selftests/net/fib_rule_tests.sh | 36 +-
tools/testing/selftests/net/fib_tests.sh | 184 +++++----
tools/testing/selftests/net/gre_gso.sh | 18 +-
tools/testing/selftests/net/icmp.sh | 10 +-
tools/testing/selftests/net/icmp_redirect.sh | 182 +++++----
tools/testing/selftests/net/ioam6.sh | 247 ++++++------
tools/testing/selftests/net/l2tp.sh | 130 +++----
tools/testing/selftests/net/lib.sh | 98 +++++
.../net/ndisc_unsolicited_na_test.sh | 19 +-
tools/testing/selftests/net/netns-name.sh | 44 +--
tools/testing/selftests/net/rtnetlink.sh | 21 +-
tools/testing/selftests/net/sctp_vrf.sh | 12 +-
tools/testing/selftests/net/settings | 2 +-
tools/testing/selftests/net/setup_loopback.sh | 8 +-
tools/testing/selftests/net/setup_veth.sh | 9 +-
.../selftests/net/stress_reuseport_listen.sh | 6 +-
.../selftests/net/test_bridge_backup_port.sh | 368 +++++++++---------
.../net/test_bridge_neigh_suppress.sh | 333 ++++++++--------
tools/testing/selftests/net/test_vxlan_mdb.sh | 202 +++++-----
.../selftests/net/test_vxlan_nolocalbypass.sh | 48 ++-
.../selftests/net/test_vxlan_under_vrf.sh | 70 ++--
.../selftests/net/test_vxlan_vnifiltering.sh | 154 +++++---
tools/testing/selftests/net/toeplitz.sh | 16 +-
tools/testing/selftests/net/traceroute.sh | 82 ++--
.../selftests/net/unicast_extensions.sh | 99 +++--
tools/testing/selftests/net/vrf-xfrm-tests.sh | 77 ++--
.../selftests/net/vrf_route_leaking.sh | 201 +++++-----
.../selftests/net/vrf_strict_mode_test.sh | 47 ++-
tools/testing/selftests/net/xfrm_policy.sh | 138 +++----
tools/testing/selftests/run_kselftest.sh | 4 +
44 files changed, 1676 insertions(+), 1654 deletions(-)
create mode 100644 tools/testing/selftests/net/lib.sh
--
2.41.0
Hi,
On Mon, Nov 27, 2023 at 11:49:16AM +0000, Felix Huettner wrote:
> conntrack zones are heavily used by tools like openvswitch to run
> multiple virtual "routers" on a single machine. In this context each
> conntrack zone matches to a single router, thereby preventing
> overlapping IPs from becoming issues.
> In these systems it is common to operate on all conntrack entries of a
> given zone, e.g. to delete them when a router is deleted. Previously this
> required these tools to dump the full conntrack table and filter out the
> relevant entries in userspace potentially causing performance issues.
>
> To do this we reuse the existing CTA_ZONE attribute. This was previously
> parsed but not used during dump and flush requests. Now if CTA_ZONE is
> set we filter these operations based on the provided zone.
> However this means that users that previously passed CTA_ZONE will
> experience a difference in functionality.
>
> Alternatively CTA_FILTER could have been used for the same
> functionality. However it is not yet supported during flush requests and
> is only available when using AF_INET or AF_INET6.
You mean, AF_UNSPEC cannot be specified in CTA_FILTER?
Please extend libnetfilter_conntrack to support this feature,
there is a filter API that can be used for this purpose.
Thanks.
On Fri, Nov 24, 2023 at 12:04:09PM +0100, Jonas Oberhauser wrote:
> > I think ARM64 approached this problem by adding the
> > load-acquire/store-release instructions and for TSO based code,
> > translate into those (eg. x86 -> arm64 transpilers).
>
>
> Although those instructions have a bit more ordering constraints.
>
> I have heard rumors that the apple chips also have a register that can be
> set at runtime.
Oh, I thought they made do with the load-acquire/store-release thingies.
But to be fair, I haven't been paying *that* much attention to the apple
stuff.
I did read about how they fudged some of the x86 flags thing.
> And there are some IBM machines that have a setting, but not sure how it is
> controlled.
Cute, I'm assuming this is the Power series (s390 already being TSO)? I
wasn't aware they had this.
> > IIRC Risc-V actually has such instructions as well, so *why* are you
> > doing this?!?!
>
>
> Unfortunately, at least last time I checked RISC-V still hadn't gotten such
> instructions.
> What they have is the *semantics* of the instructions, but no actual opcodes
> to encode them.
Well, that sucks..
> I argued for them in the RISC-V memory group, but it was considered to be
> outside the scope of that group.
>
> Transpiling with sufficient DMB ISH to get the desired ordering is really
> bad for performance.
Ha!, quite dreadful I would imagine.
> That is not to say that linux should support this. Perhaps linux should
> pressure RISC-V into supporting implicit barriers instead.
I'm not sure I count for much in this regard, but yeah, that sounds like
a plan :-)
The series adds support for setrlimit/getrlimit.
Mainly to avoid spurious coredumps when running the tests under
qemu-user.
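As a rough userspace illustration of the idea (this is not the nolibc code
from the series), the sketch below uses Python's standard `resource` module
to set RLIMIT_CORE to zero, so that a crashing process leaves no core file.
The helper name `disable_coredumps` is made up for this example.

```python
import resource


def disable_coredumps():
    """Set the soft and hard core-file size limits to zero.

    Lowering a limit needs no privileges; raising the hard limit
    again afterwards would.
    """
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = disable_coredumps()
    print(f"RLIMIT_CORE soft={soft} hard={hard}")
```

A test harness would typically call this once before running any test that
is expected to die from a signal.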
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (3):
tools/nolibc: drop custom definition of struct rusage
tools/nolibc: add support for getrlimit/setrlimit
selftests/nolibc: disable coredump via setrlimit
tools/include/nolibc/sys.h | 38 ++++++++++++++++++++++++++++
tools/include/nolibc/types.h | 21 +--------------
tools/testing/selftests/nolibc/nolibc-test.c | 31 +++++++++++++++++++++++
3 files changed, 70 insertions(+), 20 deletions(-)
---
base-commit: 0dbd4651f3f80151910a36416fa0df28a10c3b0a
change-id: 20231122-nolibc-rlimit-bb5b1f264fc4
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
In a public cloud scenario, if the kdump service works abnormally,
users cannot get a vmcore. Without a vmcore, the user has no idea why
the kernel crashed. Meanwhile, there is no additional information
to find out why the kdump service is abnormal.
One way is to obtain console messages through VNC. The drawback
is that VNC is real-time: if the user misses the window to capture the
VNC output, the crash needs to be retriggered.
Another way is to enable the console frontend of pstore and record the
console messages to the pstore backend. However, the console
logs only contain kernel printk logs and do not cover
user-mode print logs. Although we can redirect user-mode logs to the
pmsg frontend provided by pstore, the user-mode information related to
booting and the kdump service comes from various sources (systemd,
kdump.sh, and so on), which makes redirection troublesome. So we added
a tty frontend that saves all logs of the tty driver to the pstore backend.
Another problem is that pstore currently only supports a single backend.
For debugging kdump problems, we want to save the console logs and tty
logs to the ramoops backend of pstore, as they will not be lost after
rebooting. If the user has enabled another backend, the ramoops backend
will not be registered. To this end, we add multi-backend support
so that multiple backends can be registered simultaneously.
Based on the above changes, we can enable pstore in the crashdump kernel
and save the console logs and tty logs to the ramoops backend of pstore.
After rebooting, we can view the relevant logs by mounting the pstore
file system.
Furthermore, we also modified kexec-tools, following the way crash-utils
reads memory, so that pstore ramoops information can be read without
enabling pstore in the first kernel. As we set the address and size of
ramoops, as well as the sizes of console and tty, we can infer the
physical addresses of the console logs and tty logs in memory. Using the
read method of crash-utils, the console logs and tty logs are read from
memory, so the user can get pstore debug information without affecting
the first kernel at all.
kexec-tools modification can be seen at
https://github.com/shuyuanmen/kexec-tools/blob/main/Add-pstore-segment.patch
Yuanhe Shu (5):
pstore: add tty frontend
pstore: add multi-backends support
pstore: add subdirs for multi-backends
pstore: remove the module parameter "backend"
tools/pstore: update pstore selftests
drivers/tty/n_tty.c | 1 +
fs/pstore/Kconfig | 23 ++
fs/pstore/Makefile | 2 +
fs/pstore/blk.c | 10 +
fs/pstore/ftrace.c | 22 +-
fs/pstore/inode.c | 86 ++++++-
fs/pstore/internal.h | 16 +-
fs/pstore/platform.c | 238 ++++++++++++--------
fs/pstore/pmsg.c | 23 +-
fs/pstore/ram.c | 40 +++-
fs/pstore/tty.c | 56 +++++
fs/pstore/zone.c | 42 +++-
include/linux/pstore.h | 33 +++
include/linux/pstore_blk.h | 3 +
include/linux/pstore_ram.h | 1 +
include/linux/pstore_zone.h | 2 +
include/linux/tty.h | 14 ++
tools/testing/selftests/pstore/common_tests | 4 -
18 files changed, 500 insertions(+), 116 deletions(-)
create mode 100644 fs/pstore/tty.c
--
2.39.3
Regressions that prevent a driver from probing a device can significantly
affect the functionality of a platform.
A kselftest to verify if devices on a DT-based platform are probed
correctly was recently introduced [1], but no such generic test is
available for ACPI platforms yet. bootrr [2] provides device probe
testing, but relies on a pre-defined list of the peripherals present on
each DUT.
On ACPI based hardware, a complete description of the platform is
provided to the OS by the system firmware. ACPI namespace objects are
mapped by the Linux ACPI subsystem into a device tree in
/sys/devices/LNXSYSTEM:00; the information in this subtree can be parsed
to build a list of the hw peripherals present on the DUT dynamically.
This series adds a test to verify if the devices declared in the ACPI
namespace and supported by the kernel are probed correctly.
This work follows a similar approach to [1], adapted for the ACPI use
case.
The first patch introduces a script that builds a list of all ACPI device
IDs supported by the kernel, by inspecting the acpi_device_id structs in
the sources. This list can be used to avoid testing ACPI-enumerated
devices that don't have a matching driver in the kernel. This script was
highly inspired by the dt-extract-compatibles script [3].
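A minimal sketch of the extraction idea, assuming the IDs appear as string
literals inside `struct acpi_device_id` tables. It is not the
acpi-extract-ids script from this series; the regular expressions, the
function name `extract_acpi_ids` and the sample table (including the
made-up ID "ABCD0001") are illustrative only, and entries that use macros
instead of string literals are not handled.

```python
import re

# Body of a "struct acpi_device_id name[] = { ... };" table.
TABLE_RE = re.compile(
    r"struct\s+acpi_device_id\s+\w+\s*\[\s*\]\s*=\s*\{(.*?)\}\s*;",
    re.DOTALL,
)
# First string literal of each '{ "ID", ... }' entry.
ENTRY_RE = re.compile(r'\{\s*"([^"]+)"')


def extract_acpi_ids(source: str) -> list[str]:
    """Return the sorted, de-duplicated IDs found in all tables."""
    ids = set()
    for table in TABLE_RE.finditer(source):
        ids.update(ENTRY_RE.findall(table.group(1)))
    return sorted(ids)


SAMPLE = """
static const struct acpi_device_id demo_acpi_ids[] = {
	{ "ABCD0001", 0 },
	{ "PNP0C09" },
	{ }
};
"""

if __name__ == "__main__":
    for acpi_id in extract_acpi_ids(SAMPLE):
        print(acpi_id)
```

The resulting list can then be used as a filter: a device whose ID is not
in it has no matching driver in the kernel and need not be tested.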
In the second patch, a new kselftest is added. It parses the
/sys/devices/LNXSYSTEM:00 tree to obtain a list of all platform
peripherals and verifies which of those, if supported, are correctly
bound to a driver.
Feedback is much appreciated,
Thank you,
Laura
[1] https://lore.kernel.org/all/20230828211424.2964562-1-nfraprado@collabora.co…
[2] https://github.com/kernelci/bootr
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scr…
Laura Nao (2):
acpi: Add script to extract ACPI device ids in the kernel
kselftest: Add test to detect unprobed devices on ACPI platforms
MAINTAINERS | 2 +
scripts/acpi/acpi-extract-ids | 60 +++++++++++++++
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/acpi/.gitignore | 2 +
tools/testing/selftests/acpi/Makefile | 23 ++++++
.../selftests/acpi/test_unprobed_devices.sh | 75 +++++++++++++++++++
6 files changed, 163 insertions(+)
create mode 100755 scripts/acpi/acpi-extract-ids
create mode 100644 tools/testing/selftests/acpi/.gitignore
create mode 100644 tools/testing/selftests/acpi/Makefile
create mode 100755 tools/testing/selftests/acpi/test_unprobed_devices.sh
--
2.30.2
The za-fork test does not output a newline when reporting the result of
the one test it runs, causing the counts printed by kselftest to be
included in the test name. Add the newline.
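The effect can be reproduced with a small model. The two helpers below are
simplified stand-ins written for this example, not the kselftest API, and
the totals line is shortened; they only mimic the fact that the result
helper prints the caller's string as-is, with no newline added.

```python
def report(name: str, passed: bool, test_num: int = 1) -> str:
    """Format one result line; the caller must supply the newline."""
    status = "ok" if passed else "not ok"
    return f"{status} {test_num} {name}"


def totals(passed: int, failed: int) -> str:
    """Format a (shortened) summary line."""
    return f"# Totals: pass:{passed} fail:{failed}\n"


# Without the newline the summary is glued onto the test name.
broken = report("fork_test", True) + totals(1, 0)
# With the newline each record sits on its own line.
fixed = report("fork_test\n", True) + totals(1, 0)

if __name__ == "__main__":
    print(broken.splitlines())
    print(fixed.splitlines())
```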
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/arm64/fp/za-fork.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/fp/za-fork.c b/tools/testing/selftests/arm64/fp/za-fork.c
index b86cb1049497..587b94648222 100644
--- a/tools/testing/selftests/arm64/fp/za-fork.c
+++ b/tools/testing/selftests/arm64/fp/za-fork.c
@@ -85,7 +85,7 @@ int main(int argc, char **argv)
*/
ret = open("/proc/sys/abi/sme_default_vector_length", O_RDONLY, 0);
if (ret >= 0) {
- ksft_test_result(fork_test(), "fork_test");
+ ksft_test_result(fork_test(), "fork_test\n");
} else {
ksft_print_msg("SME not supported\n");
---
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
change-id: 20231115-arm64-fix-za-fork-output-21cdd7a7195c
Best regards,
--
Mark Brown <broonie(a)kernel.org>
commit 05f1edac8009 ("selftests/mm: run all tests from run_vmtests.sh")
fixed the inconsistency caused by tests being defined as TEST_GEN_PROGS.
This issue was leading to tests not being executed via run_vmtests.sh and
furthermore some tests running twice due to the kselftests wrapper also
executing them.
Fix the definition of two tests (soft-dirty and pagemap_ioctl)
that are still incorrectly defined.
Signed-off-by: Nico Pache <npache(a)redhat.com>
---
tools/testing/selftests/mm/Makefile | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 78dfec8bc676..dede0bcf97a3 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -60,7 +60,7 @@ TEST_GEN_FILES += mrelease_test
TEST_GEN_FILES += mremap_dontunmap
TEST_GEN_FILES += mremap_test
TEST_GEN_FILES += on-fault-limit
-TEST_GEN_PROGS += pagemap_ioctl
+TEST_GEN_FILES += pagemap_ioctl
TEST_GEN_FILES += thuge-gen
TEST_GEN_FILES += transhuge-stress
TEST_GEN_FILES += uffd-stress
@@ -72,7 +72,7 @@ TEST_GEN_FILES += mdwe_test
TEST_GEN_FILES += hugetlb_fault_after_madv
ifneq ($(ARCH),arm64)
-TEST_GEN_PROGS += soft-dirty
+TEST_GEN_FILES += soft-dirty
endif
ifeq ($(ARCH),x86_64)
--
2.41.0
Intel SIOV allows creating virtual devices whose vRID is
represented by a pasid of a physical device. Such a device is called a
SIOV virtual device in this series. These devices can be bound to an
iommufd as physical devices are, and then later be attached to an
IOAS/hwpt using that pasid. Such a pasid is called the default pasid.
iommufd already supports pasid attach[1]. So a simple way to
support SIOV virtual device attachment is to let the device driver call
iommufd_device_pasid_attach() and pass in the default pasid for
the virtual device. This should work for now, but it may be a problem
if the iommufd core wants to differentiate default pasids from other
kinds of pasids (e.g. a pasid given by userspace). When page requests
are later forwarded to userspace, the default pasids are not supposed
to be sent to userspace, as they are mainly used by the SIOV device driver.
For the above reason, this series chooses to add a new API to bind the
default pasid to iommufd, and extends iommufd_device_attach() to
convert the attachment into a pasid attach with the default pasid. Device
drivers (e.g. VFIO) that support SIOV shall call the below APIs to
interact with iommufd:
- iommufd_device_bind_pasid(): Bind virtual device (a pasid of a device)
to iommufd;
- iommufd_device_attach(): Attach a SIOV virtual device to IOAS/HWPT;
- iommufd_device_replace(): Replace IOAS/HWPT of a SIOV virtual device;
- iommufd_device_detach(): Detach IOAS/HWPT of a SIOV virtual device;
- iommufd_device_unbind(): Unbind virtual device from iommufd;
For vfio devices, the device drivers that support SIOV should:
- use below API to register vdev for SIOV virtual device
vfio_register_pasid_iommu_dev()
- use below API to bind vdev to iommufd in .bind_iommufd() callback
iommufd_device_bind_pasid()
- allocate pasid by itself before calling iommufd_device_bind_pasid()
Complete code can be found at[2]
[1] https://lore.kernel.org/linux-iommu/20230926092651.17041-1-yi.l.liu@intel.c…
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_pasid_siov
Regards,
Yi Liu
Kevin Tian (5):
iommufd: Handle unsafe interrupts in a separate function
iommufd: Introduce iommufd_alloc_device()
iommufd: Add iommufd_device_bind_pasid()
iommufd: Support attach/replace for SIOV virtual device {dev, pasid}
vfio: Add vfio_register_pasid_iommu_dev()
Yi Liu (2):
iommufd/selftest: Extend IOMMU_TEST_OP_MOCK_DOMAIN to pass in pasid
iommufd/selftest: Add test coverage for SIOV virtual device
drivers/iommu/iommufd/device.c | 163 ++++++++++++++----
drivers/iommu/iommufd/iommufd_private.h | 7 +
drivers/iommu/iommufd/iommufd_test.h | 2 +
drivers/iommu/iommufd/selftest.c | 10 +-
drivers/vfio/group.c | 18 ++
drivers/vfio/vfio.h | 8 +
drivers/vfio/vfio_main.c | 10 ++
include/linux/iommufd.h | 3 +
include/linux/vfio.h | 1 +
tools/testing/selftests/iommu/iommufd.c | 75 ++++++--
.../selftests/iommu/iommufd_fail_nth.c | 42 ++++-
tools/testing/selftests/iommu/iommufd_utils.h | 21 ++-
12 files changed, 296 insertions(+), 64 deletions(-)
--
2.34.1
Multiple files/programs in `tools/testing/selftests/bpf/prog_tests/` still
heavily use the `CHECK` macro, even when better `ASSERT_` alternatives are
available.
As Yonghong Song already pointed out [1], in the bpf selftests the use
of the ASSERT_* series of macros is preferred over the CHECK macro.
This patchset replaces the usage of `CHECK(` macros with the equivalent `ASSERT_`
family of macros in the following prog_tests:
- bind_perm.c
- bpf_obj_id.c
- bpf_tcp_ca.c
- vmlinux.c
[1] https://lore.kernel.org/lkml/0a142924-633c-44e6-9a92-2dc019656bf2@linux.dev
Changes in v3:
- Addressed the following points mentioned by Yonghong Song
- Improved `bpf_map_lookup_elem` assertion in bpf_tcp_ca.
- Replaced assertion introduced in v2 with one that checks `thread_ret`
instead of `pthread_join`. This ensures that `server`'s return value
(thread_ret) is the one being checked, as opposed to `pthread_join`'s
return value, since the latter one is less likely to fail.
Changes in v2:
- Fixed pthread_join assertion that broke the previous test
Previous version:
v2 - https://lore.kernel.org/lkml/GV1PR10MB6563AECF8E94798A1E5B36A4E8B6A@GV1PR10…
v1 - https://lore.kernel.org/lkml/GV1PR10MB6563FCFF1C5DEBE84FEA985FE8B0A@GV1PR10…
Yuran Pereira (4):
Replaces the usage of CHECK calls for ASSERTs in bpf_tcp_ca
Replaces the usage of CHECK calls for ASSERTs in bind_perm
Replaces the usage of CHECK calls for ASSERTs in bpf_obj_id
selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in
vmlinux
.../selftests/bpf/prog_tests/bind_perm.c | 6 +-
.../selftests/bpf/prog_tests/bpf_obj_id.c | 204 +++++++-----------
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 48 ++---
.../selftests/bpf/prog_tests/vmlinux.c | 16 +-
4 files changed, 105 insertions(+), 169 deletions(-)
--
2.25.1
Changes from v1:
* Dropped some changes that were independently fixed[1]
* No longer separate the f strings to their own patch
* Use r strings when the value is a regular expression
* Updated verification script
In retrospect a script to find the instances and apply fixes isn't that
useful for review, so the attached script this time just looks for
differences in the AST. Apply the series and run the script, with
the two references to compare as arguments.
There are some intentional changes to the AST now though, as the r strings
turn '\t' from a single character tab into a backslash and 't' character
pair (similar for '\n'). This does not affect the correctness of the
regular expression though.
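A short check of that claim, using only the standard `re` module: the two
literals differ as strings, which is what shows up in the AST, but both
compile to a pattern that matches a tab (or newline) character. The helper
name `matches` is just for this example.

```python
import re

plain_tab = "\t"     # one character: an actual tab
raw_tab = r"\t"      # two characters: a backslash followed by 't'


def matches(pattern: str, text: str) -> bool:
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# The string literals differ, so the AST differs ...
assert len(plain_tab) == 1 and len(raw_tab) == 2
# ... but the regex engine expands the escape itself, so both match.
assert matches(plain_tab, "\t") and matches(raw_tab, "\t")
assert matches("\n", "\n") and matches(r"\n", "\n")
```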
v1: https://lore.kernel.org/all/20230814060704.79655-1-bgray@linux.ibm.com/
[1]: https://lore.kernel.org/all/20230816122133.1231599-1-vishalc@linux.ibm.com/
---
#!/usr/bin/env python3
"""
Verify Python syntax trees are equivalent between two references
"""
import argparse
import ast
from pathlib import Path
import subprocess as sp
def read_file(path: Path, ref: str) -> str:
    return sp.run(f"git show {ref}:{path}", stdout=sp.PIPE, shell=True, encoding="utf-8", check=True).stdout

parser = argparse.ArgumentParser("Compare Python ASTs between revisions")
parser.add_argument("ref1", type=str, help="First revision to use")
parser.add_argument("ref2", type=str, help="Second revision to use")
args = parser.parse_args()

for pyfile in Path(".").glob("**/*.py"):
    try:
        ref1_content = read_file(pyfile, args.ref1)
        ref2_content = read_file(pyfile, args.ref2)
    except Exception as e:
        print(f"ERROR:{pyfile}: Failed to read ({e})")
        continue
    try:
        ref1_syntax = ast.parse(ref1_content, filename=pyfile)
        ref2_syntax = ast.parse(ref2_content, filename=pyfile)
    except SyntaxError as e:
        print(f"ERROR:{pyfile}: Failed to parse, is it Python3? ({e})")
        continue
    if ast.dump(ref1_syntax) != ast.dump(ref2_syntax):
        print(f"ERROR:{pyfile}: Revisions have different AST")
        cmd = f"diff <(git show {args.ref1}:{pyfile} | python -m ast) <(git show {args.ref2}:{pyfile} | python -m ast)"
        print(cmd)
        sp.run(cmd, shell=True)
        continue
Benjamin Gray (7):
ia64: fix Python string escapes
Documentation/sphinx: fix Python string escapes
drivers/comedi: fix Python string escapes
scripts: fix Python string escapes
tools/perf: fix Python string escapes
tools/power: fix Python string escapes
selftests/bpf: fix Python string escapes
Documentation/sphinx/cdomain.py | 2 +-
Documentation/sphinx/kernel_abi.py | 2 +-
Documentation/sphinx/kernel_feat.py | 2 +-
Documentation/sphinx/kerneldoc.py | 2 +-
Documentation/sphinx/maintainers_include.py | 8 +++---
arch/ia64/scripts/unwcheck.py | 2 +-
.../ni_routing/tools/convert_csv_to_c.py | 2 +-
scripts/clang-tools/gen_compile_commands.py | 2 +-
scripts/gdb/linux/symbols.py | 2 +-
tools/perf/pmu-events/jevents.py | 2 +-
.../scripts/python/arm-cs-trace-disasm.py | 4 +--
tools/perf/scripts/python/compaction-times.py | 2 +-
.../scripts/python/exported-sql-viewer.py | 4 +--
tools/power/pm-graph/bootgraph.py | 12 ++++-----
.../selftests/bpf/test_bpftool_synctypes.py | 26 +++++++++----------
tools/testing/selftests/bpf/test_offload.py | 2 +-
16 files changed, 38 insertions(+), 38 deletions(-)
--
2.41.0
From: Willem de Bruijn <willemb(a)google.com>
Commit 29f834aa326e ("net_sched: sch_fq: add 3 bands and WRR
scheduling") introduces multiple traffic bands, and per-band maximum
packet count.
Per-band limits ensure that packets in one class cannot fill the
entire qdisc and so cause DoS to the traffic in the other classes.
Verify this behavior:
1. set the limit to 10 per band
2. send 20 pkts on band A: verify that 10 are queued, 10 dropped
3. send 20 pkts on band A: verify that 0 are queued, 20 dropped
4. send 20 pkts on band B: verify that 10 are queued, 10 dropped
Packets must remain queued for a period to trigger this behavior.
Use SO_TXTIME to store packets for 100 msec.
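The expected counts in the steps above can be worked through with a toy
model. The class below is a hypothetical stand-in for the qdisc, not the
sch_fq implementation: it only tracks how many packets each band holds and
how many were dropped, assuming nothing is dequeued between the sends.

```python
from collections import defaultdict


class BandLimitedQueue:
    """Toy qdisc with an independent packet limit per band."""

    def __init__(self, per_band_limit: int):
        self.per_band_limit = per_band_limit
        self.queued = defaultdict(int)   # band -> packets currently held
        self.dropped = 0                 # cumulative drops, all bands

    def send(self, band: str, count: int) -> int:
        """Enqueue up to count packets; return how many were accepted."""
        room = max(self.per_band_limit - self.queued[band], 0)
        accepted = min(count, room)
        self.queued[band] += accepted
        self.dropped += count - accepted
        return accepted


q = BandLimitedQueue(per_band_limit=10)
results = []
for band in ("A", "A", "B"):
    accepted = q.send(band, 20)
    results.append((accepted, q.dropped))   # (queued now, total dropped)

if __name__ == "__main__":
    print(results)
```

Note that the drop counter is cumulative, which is why the test script
checks for 10, 30 and 40 drops after the three sends.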
The test reuses existing upstream test infra. The script is a fork of
cmsg_time.sh. The scripts call cmsg_sender.
The test extends cmsg_sender with two arguments:
* '-P' SO_PRIORITY
There is a subtle difference between IPv4 and IPv6 stack behavior:
PF_INET/IP_TOS sets IP header bits and sk_priority
PF_INET6/IPV6_TCLASS sets IP header bits BUT NOT sk_priority
* '-n' num pkts
Send multiple packets in quick succession.
I first attempted a for loop in the script, but this is too slow in
virtualized environments, causing flakiness as the 100ms timeout is
reached and packets are dequeued.
Also do not wait for timestamps to be queued unless timestamps are
requested.
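For readers unfamiliar with the option behind '-P', here is a minimal
sketch of setting SO_PRIORITY on a UDP socket and reading it back, using
Python's `socket` module on Linux. The function name
`open_udp_with_priority` is invented for this example. It uses AF_INET and
priority 6 so that it runs unprivileged and without IPv6, whereas the test
itself uses IPv6 and priority 7.

```python
import socket


def open_udp_with_priority(priority: int,
                           family: int = socket.AF_INET) -> socket.socket:
    """Create a UDP socket and set its priority via SO_PRIORITY."""
    sock = socket.socket(family, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_PRIORITY, priority)
    return sock


if __name__ == "__main__":
    sock = open_udp_with_priority(6)
    try:
        # Read the value back to confirm the kernel stored it.
        print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_PRIORITY))
    finally:
        sock.close()
```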
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
---
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/cmsg_sender.c | 50 ++++++++++------
.../testing/selftests/net/fq_band_pktlimit.sh | 57 +++++++++++++++++++
3 files changed, 91 insertions(+), 17 deletions(-)
create mode 100755 tools/testing/selftests/net/fq_band_pktlimit.sh
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 5b2aca4c5f10..9274edfb76ff 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -91,6 +91,7 @@ TEST_PROGS += test_bridge_neigh_suppress.sh
TEST_PROGS += test_vxlan_nolocalbypass.sh
TEST_PROGS += test_bridge_backup_port.sh
TEST_PROGS += fdb_flush.sh
+TEST_PROGS += fq_band_pktlimit.sh
TEST_FILES := settings
diff --git a/tools/testing/selftests/net/cmsg_sender.c b/tools/testing/selftests/net/cmsg_sender.c
index 24b21b15ed3f..8d7575389f58 100644
--- a/tools/testing/selftests/net/cmsg_sender.c
+++ b/tools/testing/selftests/net/cmsg_sender.c
@@ -45,11 +45,13 @@ struct options {
const char *host;
const char *service;
unsigned int size;
+ unsigned int num_pkt;
struct {
unsigned int mark;
unsigned int dontfrag;
unsigned int tclass;
unsigned int hlimit;
+ unsigned int priority;
} sockopt;
struct {
unsigned int family;
@@ -72,6 +74,7 @@ struct options {
} v6;
} opt = {
.size = 13,
+ .num_pkt = 1,
.sock = {
.family = AF_UNSPEC,
.type = SOCK_DGRAM,
@@ -112,7 +115,7 @@ static void cs_parse_args(int argc, char *argv[])
{
int o;
- while ((o = getopt(argc, argv, "46sS:p:m:M:d:tf:F:c:C:l:L:H:")) != -1) {
+ while ((o = getopt(argc, argv, "46sS:p:P:m:M:n:d:tf:F:c:C:l:L:H:")) != -1) {
switch (o) {
case 's':
opt.silent_send = true;
@@ -138,7 +141,9 @@ static void cs_parse_args(int argc, char *argv[])
cs_usage(argv[0]);
}
break;
-
+ case 'P':
+ opt.sockopt.priority = atoi(optarg);
+ break;
case 'm':
opt.mark.ena = true;
opt.mark.val = atoi(optarg);
@@ -146,6 +151,9 @@ static void cs_parse_args(int argc, char *argv[])
case 'M':
opt.sockopt.mark = atoi(optarg);
break;
+ case 'n':
+ opt.num_pkt = atoi(optarg);
+ break;
case 'd':
opt.txtime.ena = true;
opt.txtime.delay = atoi(optarg);
@@ -410,6 +418,10 @@ static void ca_set_sockopts(int fd)
setsockopt(fd, SOL_IPV6, IPV6_UNICAST_HOPS,
&opt.sockopt.hlimit, sizeof(opt.sockopt.hlimit)))
error(ERN_SOCKOPT, errno, "setsockopt IPV6_HOPLIMIT");
+ if (opt.sockopt.priority &&
+ setsockopt(fd, SOL_SOCKET, SO_PRIORITY,
+ &opt.sockopt.priority, sizeof(opt.sockopt.priority)))
+ error(ERN_SOCKOPT, errno, "setsockopt SO_PRIORITY");
}
int main(int argc, char *argv[])
@@ -421,6 +433,7 @@ int main(int argc, char *argv[])
char *buf;
int err;
int fd;
+ int i;
cs_parse_args(argc, argv);
@@ -480,24 +493,27 @@ int main(int argc, char *argv[])
cs_write_cmsg(fd, &msg, cbuf, sizeof(cbuf));
- err = sendmsg(fd, &msg, 0);
- if (err < 0) {
- if (!opt.silent_send)
- fprintf(stderr, "send failed: %s\n", strerror(errno));
- err = ERN_SEND;
- goto err_out;
- } else if (err != (int)opt.size) {
- fprintf(stderr, "short send\n");
- err = ERN_SEND_SHORT;
- goto err_out;
- } else {
- err = ERN_SUCCESS;
+ for (i = 0; i < opt.num_pkt; i++) {
+ err = sendmsg(fd, &msg, 0);
+ if (err < 0) {
+ if (!opt.silent_send)
+ fprintf(stderr, "send failed: %s\n", strerror(errno));
+ err = ERN_SEND;
+ goto err_out;
+ } else if (err != (int)opt.size) {
+ fprintf(stderr, "short send\n");
+ err = ERN_SEND_SHORT;
+ goto err_out;
+ }
}
+ err = ERN_SUCCESS;
- /* Make sure all timestamps have time to loop back */
- usleep(opt.txtime.delay);
+ if (opt.ts.ena) {
+ /* Make sure all timestamps have time to loop back */
+ usleep(opt.txtime.delay);
- cs_read_cmsg(fd, &msg, cbuf, sizeof(cbuf));
+ cs_read_cmsg(fd, &msg, cbuf, sizeof(cbuf));
+ }
err_out:
close(fd);
diff --git a/tools/testing/selftests/net/fq_band_pktlimit.sh b/tools/testing/selftests/net/fq_band_pktlimit.sh
new file mode 100755
index 000000000000..24b77bdf41ff
--- /dev/null
+++ b/tools/testing/selftests/net/fq_band_pktlimit.sh
@@ -0,0 +1,57 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Verify that FQ has a packet limit per band:
+#
+# 1. set the limit to 10 per band
+# 2. send 20 pkts on band A: verify that 10 are queued, 10 dropped
+# 3. send 20 pkts on band A: verify that 0 are queued, 20 dropped
+# 4. send 20 pkts on band B: verify that 10 are queued, 10 dropped
+#
+# Send packets with a 100ms delay to ensure that previously sent
+# packets are still queued when later ones are sent.
+# Use SO_TXTIME for this.
+
+die() {
+ echo "$1"
+ exit 1
+}
+
+# run inside private netns
+if [[ $# -eq 0 ]]; then
+ ./in_netns.sh "$0" __subprocess
+ exit
+fi
+
+ip link add type dummy
+ip link set dev dummy0 up
+ip -6 addr add fdaa::1/128 dev dummy0
+ip -6 route add fdaa::/64 dev dummy0
+tc qdisc replace dev dummy0 root handle 1: fq quantum 1514 initial_quantum 1514 limit 10
+
+./cmsg_sender -6 -p u -d 100000 -n 20 fdaa::2 8000
+OUT1="$(tc -s qdisc show dev dummy0 | grep '^\ Sent')"
+
+./cmsg_sender -6 -p u -d 100000 -n 20 fdaa::2 8000
+OUT2="$(tc -s qdisc show dev dummy0 | grep '^\ Sent')"
+
+./cmsg_sender -6 -p u -d 100000 -n 20 -P 7 fdaa::2 8000
+OUT3="$(tc -s qdisc show dev dummy0 | grep '^\ Sent')"
+
+# Initial stats will report zero sent, as all packets are still
+# queued in FQ. Sleep for the delay period (100ms) and see that
+# twenty are now sent.
+sleep 0.1
+OUT4="$(tc -s qdisc show dev dummy0 | grep '^\ Sent')"
+
+# Log the output after the test
+echo "${OUT1}"
+echo "${OUT2}"
+echo "${OUT3}"
+echo "${OUT4}"
+
+# Test the output for expected values
+echo "${OUT1}" | grep -q '0\ pkt\ (dropped\ 10' || die "unexpected drop count at 1"
+echo "${OUT2}" | grep -q '0\ pkt\ (dropped\ 30' || die "unexpected drop count at 2"
+echo "${OUT3}" | grep -q '0\ pkt\ (dropped\ 40' || die "unexpected drop count at 3"
+echo "${OUT4}" | grep -q '20\ pkt\ (dropped\ 40' || die "unexpected accept count at 4"
--
2.43.0.rc1.413.gea7ed67945-goog
On Wed, Nov 15, 2023 at 01:17:06PM +0800, Liu, Jing2 wrote:
> This is the right way to approach it,
>
> I learned that there was discussion about using io_uring to get the
> page fault without eventfd notification in [1], and I am new at
> io_uring and studying the man page of liburing, but there're questions
> in my mind on how can QEMU get the coming page fault with a good
> performance.
>
> Since both QEMU and Kernel don't know when comes faults, after QEMU
> submits one read task to io_uring, we want kernel pending until fault
> comes. While based on hwpt_fault_fops_read() in [patch v2 4/6], it just
> returns 0 since there's now no fault, thus this round of read completes
> to CQ but it's not what we want. So I'm wondering how kernel pending on
> the read until fault comes. Does fops callback need special work to
Implement a fops with poll support that triggers when a new event is
pushed and everything will be fine. There are many examples in the
kernel. The ones in the mlx5 vfio driver spring to mind as a scheme I
recently looked at.
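Seen from userspace, the pattern looks roughly like the sketch below: the
consumer waits in poll() and only reads once the descriptor is reported
readable, instead of issuing a read that returns 0 when nothing is
pending. A pipe stands in for the fault file descriptor here, and the
names `wait_for_event` and `demo` are made up for the example.

```python
import os
import select
import threading
import time


def wait_for_event(fd: int, timeout_ms: int) -> bytes:
    """Block in poll() until fd is readable, then read the event."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    if not poller.poll(timeout_ms):
        return b""                      # timed out, nothing was pushed
    return os.read(fd, 4096)


def demo() -> bytes:
    read_fd, write_fd = os.pipe()

    def producer():
        time.sleep(0.05)                # the event arrives a little later
        os.write(write_fd, b"fault")

    worker = threading.Thread(target=producer)
    worker.start()
    try:
        return wait_for_event(read_fd, timeout_ms=2000)
    finally:
        worker.join()
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(demo())
```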
Jason
Multiple files/programs in `tools/testing/selftests/bpf/prog_tests/` still
heavily use the `CHECK` macro, even when better `ASSERT_` alternatives are
available.
As Yonghong Song already pointed out [1], in the bpf selftests the use
of the ASSERT_* series of macros is preferred over the CHECK macro.
This patchset replaces the usage of `CHECK(` macros with the equivalent `ASSERT_`
family of macros in the following prog_tests:
- bind_perm.c
- bpf_obj_id.c
- bpf_tcp_ca.c
- vmlinux.c
[1] https://lore.kernel.org/lkml/0a142924-633c-44e6-9a92-2dc019656bf2@linux.dev
Changes in v2:
- Fixed pthread_join assertion that broke the previous test
Previous version:
v1 - https://lore.kernel.org/lkml/GV1PR10MB6563FCFF1C5DEBE84FEA985FE8B0A@GV1PR10…
Yuran Pereira (4):
Replaces the usage of CHECK calls for ASSERTs in bpf_tcp_ca
Replaces the usage of CHECK calls for ASSERTs in bind_perm
Replaces the usage of CHECK calls for ASSERTs in bpf_obj_id
selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in
vmlinux
.../selftests/bpf/prog_tests/bind_perm.c | 6 +-
.../selftests/bpf/prog_tests/bpf_obj_id.c | 204 +++++++-----------
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 50 ++---
.../selftests/bpf/prog_tests/vmlinux.c | 16 +-
4 files changed, 106 insertions(+), 170 deletions(-)
--
2.25.1
v4:
- Update patch 1 to move apply_wqattrs_lock() and apply_wqattrs_unlock()
down into CONFIG_SYSFS block to avoid compilation warnings.
v3:
- Break out a separate patch to make workqueue_set_unbound_cpumask()
static and move it down to the CONFIG_SYSFS section.
- Remove the "__DEBUG__." prefix and the CFTYPE_DEBUG flag from the
new root only cpuset.cpus.isolated control files and update the
test accordingly.
v2:
- Add 2 read-only workqueue sysfs files to expose the user requested
cpumask as well as the isolated CPUs to be excluded from
wq_unbound_cpumask.
- Ensure that caller of the new workqueue_unbound_exclude_cpumask()
hold cpus_read_lock.
- Update the cpuset code to make sure the cpus_read_lock is held
whenever workqueue_unbound_exclude_cpumask() may be called.
An isolated cpuset partition can currently be created to contain an
exclusive set of CPUs not used in other cgroups and with load balancing
disabled to reduce interference from the scheduler.
The main purpose of this isolated partition type is to dynamically
emulate what can be done via the "isolcpus" boot command line option,
specifically the default domain flag. One effect of the "isolcpus" option
is to remove the isolated CPUs from the cpumasks of unbound workqueues
since running work functions in an isolated CPU can be a major source
of interference. Changing the unbound workqueue cpumasks can be done at
run time by writing an appropriate cpumask without the isolated CPUs to
/sys/devices/virtual/workqueue/cpumask. So one can set up an isolated
cpuset partition and then write to the cpumask sysfs file to achieve
similar level of CPU isolation. However, this manual process can be
error prone.
This patch series implements automatic exclusion of isolated CPUs from
unbound workqueue cpumasks when an isolated cpuset partition is created
and then adds those CPUs back when the isolated partition is destroyed.
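The mask arithmetic involved can be sketched as follows. This is a model
with plain integers as bitmasks, not the kernel code; the function names
are invented, and the fallback to the full requested mask when the
exclusion would leave no CPU at all is an assumption of this sketch.

```python
def cpus_to_mask(cpus) -> int:
    """Build a bitmask with one bit set per CPU number."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def exclude_isolated(requested: int, isolated: int) -> int:
    """Return requested minus isolated, or requested if that is empty."""
    remaining = requested & ~isolated
    return remaining if remaining else requested


requested = cpus_to_mask(range(8))      # user-requested mask, CPUs 0-7
isolated = cpus_to_mask([2, 3])         # CPUs in the isolated partition

# Partition created: CPUs 2 and 3 leave the unbound workqueue mask.
effective = exclude_isolated(requested, isolated)
# Partition destroyed: nothing is excluded, the CPUs come back.
restored = exclude_isolated(requested, 0)

if __name__ == "__main__":
    print(f"{effective:x} {restored:x}")
```

Keeping the user-requested mask separate from the effective one is what
lets the excluded CPUs be restored later.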
There are also other places in the kernel that look at the HK_FLAG_DOMAIN
cpumask or other HK_FLAG_* cpumasks and exclude the isolated CPUs from
certain actions to further reduce interference. CPUs in an isolated
cpuset partition will not be able to avoid those interferences yet. That
may change in the future as the need arises.
Waiman Long (5):
workqueue: Make workqueue_set_unbound_cpumask() static
workqueue: Add workqueue_unbound_exclude_cpumask() to exclude CPUs
from wq_unbound_cpumask
selftests/cgroup: Minor code cleanup and reorganization of
test_cpuset_prs.sh
cgroup/cpuset: Keep track of CPUs in isolated partitions
cgroup/cpuset: Take isolated CPUs out of workqueue unbound cpumask
Documentation/admin-guide/cgroup-v2.rst | 10 +-
include/linux/workqueue.h | 2 +-
kernel/cgroup/cpuset.c | 286 +++++++++++++-----
kernel/workqueue.c | 165 +++++++---
.../selftests/cgroup/test_cpuset_prs.sh | 216 ++++++++-----
5 files changed, 475 insertions(+), 204 deletions(-)
--
2.39.3
The kernel has recently added support for shadow stacks, currently
x86 only using their CET feature, but both arm64 and RISC-V have
equivalent features (GCS and Zisslpcfi respectively); I am actively
working on GCS[1]. With shadow stacks the hardware maintains an
additional stack containing only the return addresses for branch
instructions which is not generally writeable by userspace and ensures
that any returns are to the recorded addresses. This provides some
protection against ROP attacks and makes it easier to collect call
stacks. These shadow stacks are allocated in the address space of the
userspace process.
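The mechanism can be pictured with a toy model. The class below is purely
illustrative and does not describe how CET or GCS are implemented: calls
record the return address on both stacks, only the normal stack can be
overwritten by the program, and a return is refused when the two disagree.

```python
class ShadowStackViolation(Exception):
    """Raised when a return address does not match the recorded one."""


class ToyCpu:
    def __init__(self):
        self.stack = []     # normal stack, writable by the program
        self.shadow = []    # shadow stack, touched only by call/ret

    def call(self, return_addr: int) -> None:
        self.stack.append(return_addr)
        self.shadow.append(return_addr)

    def ret(self) -> int:
        addr = self.stack.pop()
        if addr != self.shadow.pop():
            raise ShadowStackViolation(hex(addr))
        return addr

    def backtrace(self) -> list[int]:
        """The shadow stack doubles as a cheap call-stack record."""
        return list(reversed(self.shadow))


def demo() -> str:
    cpu = ToyCpu()
    cpu.call(0x1000)
    cpu.call(0x2000)
    cpu.stack[-1] = 0xDEAD          # simulated overwrite, as in ROP
    try:
        cpu.ret()
    except ShadowStackViolation:
        return "blocked"
    return "returned"


if __name__ == "__main__":
    print(demo())
```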
Our API for shadow stacks does not currently offer userspace any
flexibility for managing the allocation of shadow stacks for newly
created threads. Instead the kernel allocates a new shadow stack with
the same size as the normal stack whenever a thread is created with the
feature enabled. The stacks allocated in this way are freed by the
kernel when the thread exits or shadow stacks are disabled for the
thread. This lack of flexibility and control isn't ideal: in the vast
majority of cases the shadow stack will be over allocated, and the
implicit allocation and deallocation is not consistent with other
interfaces. As far as I can tell the interface is done in this manner
mainly because the shadow stack patches were in development since before
clone3() was implemented.
Since clone3() is readily extensible let's add support for specifying a
shadow stack when creating a new thread or process in a similar manner
to how the normal stack is specified, keeping the current implicit
allocation behaviour if one is not specified either with clone3() or
through the use of clone(). Unlike normal stacks only the shadow stack
size is specified; similar issues to those that led to the creation of
map_shadow_stack() apply.
Please note that the x86 portions of this code are build tested only; I
don't appear to have a system that can run CET available to me. I have
done testing with an integration into my pending work for GCS. There is
some possibility that the arm64 implementation may require the use of
clone3() and explicit userspace allocation of shadow stacks; this is
still under discussion.
A new architecture feature Kconfig option for shadow stacks is added
here. This was suggested as part of the review comments for the arm64
GCS series, and since we need to detect if shadow stacks are supported it
seemed sensible to roll it in here.
[1] https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org/
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Rebase onto v6.7-rc1.
- Remove ability to provide preallocated shadow stack, just specify the
desired size.
- Link to v1: https://lore.kernel.org/r/20231023-clone3-shadow-stack-v1-0-d867d0b5d4d0@ke…
---
Mark Brown (5):
mm: Introduce ARCH_HAS_USER_SHADOW_STACK
fork: Add shadow stack support to clone3()
selftests/clone3: Factor more of main loop into test_clone3()
selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
kselftest/clone3: Test shadow stack support
arch/x86/Kconfig | 1 +
arch/x86/include/asm/shstk.h | 11 +-
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/shstk.c | 30 ++++-
fs/proc/task_mmu.c | 2 +-
include/linux/mm.h | 2 +-
include/linux/sched/task.h | 2 +
include/uapi/linux/sched.h | 4 +
kernel/fork.c | 24 +++-
mm/Kconfig | 6 +
tools/testing/selftests/clone3/clone3.c | 151 ++++++++++++++++------
tools/testing/selftests/clone3/clone3_selftests.h | 7 +
12 files changed, 188 insertions(+), 54 deletions(-)
---
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
change-id: 20231019-clone3-shadow-stack-15d40d2bf536
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Changelog:
v5:
* Replace reference getting with an rcu_read_lock() section for
zswap lru modifications (suggested by Yosry)
* Add a new prep patch that allows mem_cgroup_iter() to return
only online cgroups.
* Add a callback that updates pool->next_shrink when the cgroup is
offlined (suggested by Yosry Ahmed, Johannes Weiner)
v4:
* Rename list_lru_add to list_lru_add_obj and __list_lru_add to
list_lru_add (patch 1) (suggested by Johannes Weiner and
Yosry Ahmed)
* Some cleanups on the memcg aware LRU patch (patch 2)
(suggested by Yosry Ahmed)
* Use event interface for the new per-cgroup writeback counters.
(patch 3) (suggested by Yosry Ahmed)
* Abstract zswap's lruvec states and handling into
zswap_lruvec_state (patch 5) (suggested by Yosry Ahmed)
v3:
* Add a patch to export per-cgroup zswap writeback counters
* Add a patch to update zswap's kselftest
* Separate the new list_lru functions into their own prep patch
* Do not start from the top of the hierarchy when encountering a memcg
that is not online for the global limit zswap writeback (patch 2)
(suggested by Yosry Ahmed)
* Do not remove the swap entry from list_lru in
__read_swapcache_async() (patch 2) (suggested by Yosry Ahmed)
* Removed a redundant zswap pool getting (patch 2)
(reported by Ryan Roberts)
* Use atomic for the nr_zswap_protected (instead of lruvec's lock)
(patch 5) (suggested by Yosry Ahmed)
* Remove the per-cgroup zswap shrinker knob (patch 5)
(suggested by Yosry Ahmed)
v2:
* Fix loongarch compiler errors
* Use pool stats instead of memcg stats when !CONFIG_MEMCG_KMEM
There are currently several issues with zswap writeback:
1. There is only a single global LRU for zswap, making it impossible to
perform workload-specific shrinking - an memcg under memory pressure
cannot determine which pages in the pool it owns, and often ends up
writing pages from other memcgs. This issue has been previously
observed in practice and mitigated by simply disabling
memcg-initiated shrinking:
https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u
But this solution leaves a lot to be desired, as we still do not
have an avenue for an memcg to free up its own memory locked up in
the zswap pool.
2. We only shrink the zswap pool when the user-defined limit is hit.
This means that if we set the limit too high, cold data that are
unlikely to be used again will reside in the pool, wasting precious
memory. It is hard to predict how much zswap space will be needed
ahead of time, as this depends on the workload (specifically, on
factors such as memory access patterns and compressibility of the
memory pages).
This patch series solves these issues by separating the global zswap
LRU into per-memcg and per-NUMA LRUs, and performing workload-specific
(i.e. memcg- and NUMA-aware) zswap writeback under memory pressure. The
new shrinker does not have any parameter that must be tuned by the
user, and can be opted in or out on a per-memcg basis.
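The idea of splitting one global LRU into per-memcg, per-NUMA LRUs can be
sketched with a small userspace toy model. This is not the kernel's
list_lru API: every name below (toy_pool, toy_lru_add, toy_shrink_memcg)
and the fixed sizes are hypothetical, chosen only to show that shrinking
one memcg never touches entries owned by another.

```c
#include <assert.h>
#include <stddef.h>

#define NR_MEMCGS 2	/* hypothetical: two cgroups in the toy model */
#define NR_NODES  2	/* hypothetical: two NUMA nodes */
#define LRU_CAP   8	/* capacity of each toy LRU */

/* One LRU per (memcg, node) pair; index 0 is the coldest entry. */
struct toy_lru {
	int entries[LRU_CAP];
	size_t nr;
};

struct toy_pool {
	struct toy_lru lru[NR_MEMCGS][NR_NODES];
};

/* Add an entry at the hot end of the LRU selected by (memcg, node). */
int toy_lru_add(struct toy_pool *pool, int memcg, int node, int entry)
{
	struct toy_lru *l;

	if (memcg < 0 || memcg >= NR_MEMCGS || node < 0 || node >= NR_NODES)
		return -1;
	l = &pool->lru[memcg][node];
	if (l->nr == LRU_CAP)
		return -1;
	l->entries[l->nr++] = entry;
	return 0;
}

/*
 * "Write back" up to nr_to_scan of the coldest entries, walking only the
 * LRUs that belong to one memcg. Returns the number of entries removed.
 */
size_t toy_shrink_memcg(struct toy_pool *pool, int memcg, size_t nr_to_scan)
{
	size_t written = 0;

	assert(memcg >= 0 && memcg < NR_MEMCGS);
	for (int node = 0; node < NR_NODES && written < nr_to_scan; node++) {
		struct toy_lru *l = &pool->lru[memcg][node];
		size_t take = nr_to_scan - written;

		if (take > l->nr)
			take = l->nr;
		/* Drop the coldest entries by shifting the rest down. */
		for (size_t i = take; i < l->nr; i++)
			l->entries[i - take] = l->entries[i];
		l->nr -= take;
		written += take;
	}
	return written;
}
```

With a single global list, the shrink step would have no way to restrict
itself to the memcg under pressure; indexing by memcg and node is what makes
that selection possible in this model.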
As a proof of concept, we ran the following synthetic benchmark:
build the Linux kernel in a memory-limited cgroup, and allocate some
cold data in tmpfs to see if the shrinker could write them out and
improve the overall performance. Depending on the amount of cold data
generated, we observe a 14% to 35% reduction in kernel CPU time used
in the kernel builds.
Domenico Cerasuolo (3):
zswap: make shrinking memcg-aware
mm: memcg: add per-memcg zswap writeback stat
selftests: cgroup: update per-memcg zswap writeback selftest
Nhat Pham (3):
list_lru: allows explicit memcg and NUMA node selection
memcontrol: allows mem_cgroup_iter() to check for onlineness
zswap: shrinks zswap pool based on memory pressure
Documentation/admin-guide/mm/zswap.rst | 7 +
drivers/android/binder_alloc.c | 5 +-
fs/dcache.c | 8 +-
fs/gfs2/quota.c | 6 +-
fs/inode.c | 4 +-
fs/nfs/nfs42xattr.c | 8 +-
fs/nfsd/filecache.c | 4 +-
fs/xfs/xfs_buf.c | 6 +-
fs/xfs/xfs_dquot.c | 2 +-
fs/xfs/xfs_qm.c | 2 +-
include/linux/list_lru.h | 46 ++-
include/linux/memcontrol.h | 9 +-
include/linux/mmzone.h | 2 +
include/linux/vm_event_item.h | 1 +
include/linux/zswap.h | 27 +-
mm/list_lru.c | 48 ++-
mm/memcontrol.c | 20 +-
mm/mmzone.c | 1 +
mm/shrinker.c | 4 +-
mm/swap.h | 3 +-
mm/swap_state.c | 26 +-
mm/vmscan.c | 26 +-
mm/vmstat.c | 1 +
mm/workingset.c | 4 +-
mm/zswap.c | 430 +++++++++++++++++---
tools/testing/selftests/cgroup/test_zswap.c | 74 ++--
26 files changed, 625 insertions(+), 149 deletions(-)
--
2.34.1