November 2023 - Linux-kselftest-mirror

Re: [PATCH v2 0/6] IOMMUFD: Deliver IO page faults to user space

by Jason Gunthorpe

On Wed, Nov 15, 2023 at 01:17:06PM +0800, Liu, Jing2 wrote:

> This is the right way to approach it,
>
> I learned that there was discussion about using io_uring to get the
> page fault without eventfd notification in [1], and I am new at
> io_uring and studying the man page of liburing, but there're questions
> in my mind on how can QEMU get the coming page fault with a good
> performance.
>
> Since both QEMU and Kernel don't know when comes faults, after QEMU
> submits one read task to io_uring, we want kernel pending until fault
> comes. While based on hwpt_fault_fops_read() in [patch v2 4/6], it just
> returns 0 since there's now no fault, thus this round of read completes
> to CQ but it's not what we want. So I'm wondering how kernel pending on
> the read until fault comes. Does fops callback need special work to

Implement a fops with poll support that triggers when a new event is
pushed and everything will be fine.

There are many examples in the kernel. The ones in the mlx5 vfio driver
spring to mind as a scheme I recently looked at.

Jason
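For readers unfamiliar with the pattern, the consumer side of a poll-capable file descriptor can be sketched in plain userspace C. This is only an illustration under stated assumptions: a pipe stands in for the fault fd, and `struct fake_fault`, `wait_and_read()` and `demo_roundtrip()` are hypothetical names, not part of the series. It shows why poll support avoids the "read returns 0 immediately" problem: the reader sleeps in poll() until an event is queued.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical event record; the real fault layout is defined by the series. */
struct fake_fault {
	unsigned int id;
};

/*
 * Wait until fd is readable, then read one record from it.
 * Returns the number of bytes read, 0 on timeout, -1 on error.
 */
static ssize_t wait_and_read(int fd, void *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ready = poll(&pfd, 1, timeout_ms);

	if (ready < 0)
		return -1;
	if (ready == 0)
		return 0;	/* nothing queued within the timeout */
	return read(fd, buf, len);
}

/* Push one event through a pipe and return the id seen by the reader. */
static int demo_roundtrip(void)
{
	struct fake_fault in = { .id = 42 }, out = { .id = 0 };
	int fds[2], ret = -1;

	if (pipe(fds))
		return -1;

	/* No event yet: poll() times out instead of read() completing early. */
	if (wait_and_read(fds[0], &out, sizeof(out), 10) != 0)
		goto out;

	/* The producer pushes an event; the reader is now woken by POLLIN. */
	if (write(fds[1], &in, sizeof(in)) != (ssize_t)sizeof(in))
		goto out;
	if (wait_and_read(fds[0], &out, sizeof(out), 1000) != (ssize_t)sizeof(out))
		goto out;

	ret = (int)out.id;
out:
	close(fds[0]);
	close(fds[1]);
	return ret;
}
```

The same readiness signal is what lets an event loop (epoll, or an io_uring poll request) wait on such an fd without spinning on reads.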


[PATCH bpf-next v2 0/4] selftests/bpf: Update multiple prog_tests to use ASSERT_ macros

by Yuran Pereira

Multiple files/programs in `tools/testing/selftests/bpf/prog_tests/` still
heavily use the `CHECK` macro, even when better `ASSERT_` alternatives are
available.

As Yonghong Song already pointed out [1], the use of the ASSERT_* series of
macros is preferred over the CHECK macro in the bpf selftests.

This patchset replaces the usage of `CHECK(` macros with the equivalent
`ASSERT_` family of macros in the following prog_tests:
- bind_perm.c
- bpf_obj_id.c
- bpf_tcp_ca.c
- vmlinux.c

[1] https://lore.kernel.org/lkml/0a142924-633c-44e6-9a92-2dc019656bf2@linux.dev

Changes in v2:
- Fixed pthread_join assertion that broke the previous test

Previous version:
v1 - https://lore.kernel.org/lkml/GV1PR10MB6563FCFF1C5DEBE84FEA985FE8B0A@GV1PR10…

Yuran Pereira (4):
  Replaces the usage of CHECK calls for ASSERTs in bpf_tcp_ca
  Replaces the usage of CHECK calls for ASSERTs in bind_perm
  Replaces the usage of CHECK calls for ASSERTs in bpf_obj_id
  selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in vmlinux

 .../selftests/bpf/prog_tests/bind_perm.c      |   6 +-
 .../selftests/bpf/prog_tests/bpf_obj_id.c     | 204 +++++++-----------
 .../selftests/bpf/prog_tests/bpf_tcp_ca.c     |  50 ++---
 .../selftests/bpf/prog_tests/vmlinux.c        |  16 +-
 4 files changed, 106 insertions(+), 170 deletions(-)

--
2.25.1


[PATCH v4 0/5] cgroup/cpuset: Improve CPU isolation in isolated partitions

by Waiman Long

v4:
 - Update patch 1 to move apply_wqattrs_lock() and apply_wqattrs_unlock()
   down into CONFIG_SYSFS block to avoid compilation warnings.

v3:
 - Break out a separate patch to make workqueue_set_unbound_cpumask()
   static and move it down to the CONFIG_SYSFS section.
 - Remove the "__DEBUG__." prefix and the CFTYPE_DEBUG flag from the
   new root only cpuset.cpus.isolated control files and update the
   test accordingly.

v2:
 - Add 2 read-only workqueue sysfs files to expose the user requested
   cpumask as well as the isolated CPUs to be excluded from
   wq_unbound_cpumask.
 - Ensure that caller of the new workqueue_unbound_exclude_cpumask()
   hold cpus_read_lock.
 - Update the cpuset code to make sure the cpus_read_lock is held
   whenever workqueue_unbound_exclude_cpumask() may be called.

Isolated cpuset partition can currently be created to contain an
exclusive set of CPUs not used in other cgroups and with load balancing
disabled to reduce interference from the scheduler.

The main purpose of this isolated partition type is to dynamically
emulate what can be done via the "isolcpus" boot command line option,
specifically the default domain flag. One effect of the "isolcpus" option
is to remove the isolated CPUs from the cpumasks of unbound workqueues
since running work functions in an isolated CPU can be a major source
of interference. Changing the unbound workqueue cpumasks can be done at
run time by writing an appropriate cpumask without the isolated CPUs to
/sys/devices/virtual/workqueue/cpumask. So one can set up an isolated
cpuset partition and then write to the cpumask sysfs file to achieve
similar level of CPU isolation. However, this manual process can be
error prone.

This patch series implements automatic exclusion of isolated CPUs from
unbound workqueue cpumasks when an isolated cpuset partition is created
and then adds those CPUs back when the isolated partition is destroyed.

There are also other places in the kernel that look at the HK_FLAG_DOMAIN
cpumask or other HK_FLAG_* cpumasks and exclude the isolated CPUs from
certain actions to further reduce interference. CPUs in an isolated
cpuset partition will not be able to avoid those interferences yet. That
may change in the future as the need arises.

Waiman Long (5):
  workqueue: Make workqueue_set_unbound_cpumask() static
  workqueue: Add workqueue_unbound_exclude_cpumask() to exclude CPUs
    from wq_unbound_cpumask
  selftests/cgroup: Minor code cleanup and reorganization of
    test_cpuset_prs.sh
  cgroup/cpuset: Keep track of CPUs in isolated partitions
  cgroup/cpuset: Take isolated CPUs out of workqueue unbound cpumask

 Documentation/admin-guide/cgroup-v2.rst       |  10 +-
 include/linux/workqueue.h                     |   2 +-
 kernel/cgroup/cpuset.c                        | 286 +++++++++++++-----
 kernel/workqueue.c                            | 165 +++++++---
 .../selftests/cgroup/test_cpuset_prs.sh       | 216 ++++++++-----
 5 files changed, 475 insertions(+), 204 deletions(-)

--
2.39.3
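The mask arithmetic behind "exclude isolated CPUs from the unbound workqueue cpumask" can be illustrated with a small userspace sketch. It assumes a 64-bit integer as a stand-in for the kernel's struct cpumask; the names `cpumask_bits` and `unbound_mask_excluding()` are hypothetical, and the fallback to the requested mask when the result would be empty is an assumption of this sketch, not a statement about the series.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per CPU; a simplified stand-in for the kernel's struct cpumask. */
typedef uint64_t cpumask_bits;

/*
 * Compute the mask unbound work may run on: the user-requested mask with
 * the isolated CPUs removed.
 */
static cpumask_bits unbound_mask_excluding(cpumask_bits requested,
					   cpumask_bits isolated)
{
	cpumask_bits effective = requested & ~isolated;

	/* Assumption of this sketch: never leave unbound work with no CPU. */
	return effective ? effective : requested;
}
```

With CPUs 0-7 requested (0xff) and CPUs 2-3 isolated (0x0c), the effective mask is 0xf3. Destroying the partition corresponds to calling the helper again with those bits cleared from `isolated`, which restores 0xff.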


[PATCH RFC RFT v2 0/5] fork: Support shadow stacks in clone3()

by Mark Brown

The kernel has recently added support for shadow stacks, currently
x86 only using their CET feature but both arm64 and RISC-V have
equivalent features (GCS and Zisslpcfi respectively), I am actively
working on GCS[1]. With shadow stacks the hardware maintains an
additional stack containing only the return addresses for branch
instructions which is not generally writeable by userspace and ensures
that any returns are to the recorded addresses. This provides some
protection against ROP attacks and makes it easier to collect call
stacks. These shadow stacks are allocated in the address space of the
userspace process.

Our API for shadow stacks does not currently offer userspace any
flexibility for managing the allocation of shadow stacks for newly
created threads, instead the kernel allocates a new shadow stack with
the same size as the normal stack whenever a thread is created with the
feature enabled. The stacks allocated in this way are freed by the
kernel when the thread exits or shadow stacks are disabled for the
thread. This lack of flexibility and control isn't ideal, in the vast
majority of cases the shadow stack will be over allocated and the
implicit allocation and deallocation is not consistent with other
interfaces. As far as I can tell the interface is done in this manner
mainly because the shadow stack patches were in development since before
clone3() was implemented.

Since clone3() is readily extensible let's add support for specifying a
shadow stack when creating a new thread or process in a similar manner
to how the normal stack is specified, keeping the current implicit
allocation behaviour if one is not specified either with clone3() or
through the use of clone(). Unlike normal stacks only the shadow stack
size is specified, similar issues to those that lead to the creation of
map_shadow_stack() apply.

Please note that the x86 portions of this code are build tested only, I
don't appear to have a system that can run CET available to me, I have
done testing with an integration into my pending work for GCS. There is
some possibility that the arm64 implementation may require the use of
clone3() and explicit userspace allocation of shadow stacks, this is
still under discussion.

A new architecture feature Kconfig option for shadow stacks is added
here, this was suggested as part of the review comments for the arm64
GCS series and since we need to detect if shadow stacks are supported it
seemed sensible to roll it in here.

[1] https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org/

Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Rebase onto v6.7-rc1.
- Remove ability to provide preallocated shadow stack, just specify the
  desired size.
- Link to v1: https://lore.kernel.org/r/20231023-clone3-shadow-stack-v1-0-d867d0b5d4d0@ke…

---
Mark Brown (5):
      mm: Introduce ARCH_HAS_USER_SHADOW_STACK
      fork: Add shadow stack support to clone3()
      selftests/clone3: Factor more of main loop into test_clone3()
      selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
      kselftest/clone3: Test shadow stack support

 arch/x86/Kconfig                                  |   1 +
 arch/x86/include/asm/shstk.h                      |  11 +-
 arch/x86/kernel/process.c                         |   2 +-
 arch/x86/kernel/shstk.c                           |  30 ++++-
 fs/proc/task_mmu.c                                |   2 +-
 include/linux/mm.h                                |   2 +-
 include/linux/sched/task.h                        |   2 +
 include/uapi/linux/sched.h                        |   4 +
 kernel/fork.c                                     |  24 +++-
 mm/Kconfig                                        |   6 +
 tools/testing/selftests/clone3/clone3.c           | 151 ++++++++++++++++------
 tools/testing/selftests/clone3/clone3_selftests.h |   7 +
 12 files changed, 188 insertions(+), 54 deletions(-)
---
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
change-id: 20231019-clone3-shadow-stack-15d40d2bf536

Best regards,
--
Mark Brown <broonie(a)kernel.org>


[PATCH] selftests/resctrl: Add non-contiguous CBMs CAT test

by Maciej Wieczor-Retman

Non-contiguous CBM support for Intel CAT has been merged into the kernel
with Commit 0e3cd31f6e90 ("x86/resctrl: Enable non-contiguous CBMs in
Intel CAT") but there is no selftest that would validate if this feature
works correctly.

The selftest needs to verify if writing non-contiguous CBMs to the
schemata file behaves as expected in comparison to the information about
non-contiguous CBMs support.

Add tests for both L2 and L3 CAT to verify if the return values
generated by writing non-contiguous CBMs don't contradict the reported
non-contiguous support information.

Comparing the return value of write_schemata() with non-contiguous CBMs
support information can be simplified as a logical XOR operation. In
other words if non-contiguous CBMs are supported and if non-contiguous
write succeeds the test should succeed and if the write fails the test
should also fail. The opposite should happen if non-contiguous CBMs are
not supported.

Signed-off-by: Maciej Wieczor-Retman <maciej.wieczor-retman(a)intel.com>
---

This patch is based on a rework of resctrl selftests that's currently in
review [1]. The patch also implements a similar functionality presented
in the bash script included in the cover letter to the original
non-contiguous CBMs in Intel CAT series [2].

[1] https://lore.kernel.org/all/20231024092634.7122-1-ilpo.jarvinen@linux.intel…
[2] https://lore.kernel.org/all/cover.1696934091.git.maciej.wieczor-retman@inte…

 tools/testing/selftests/resctrl/cat_test.c    | 97 +++++++++++++++++++
 tools/testing/selftests/resctrl/resctrl.h     |  2 +
 .../testing/selftests/resctrl/resctrl_tests.c |  2 +
 3 files changed, 101 insertions(+)

diff --git a/tools/testing/selftests/resctrl/cat_test.c b/tools/testing/selftests/resctrl/cat_test.c
index bc88eb891f35..6a01a5da30b4 100644
--- a/tools/testing/selftests/resctrl/cat_test.c
+++ b/tools/testing/selftests/resctrl/cat_test.c
@@ -342,6 +342,87 @@ static int cat_run_test(const struct resctrl_test *test, const struct user_param
 	return ret;
 }
 
+static int noncont_cat_run_test(const struct resctrl_test *test,
+				const struct user_params *uparams)
+{
+	unsigned long full_cache_mask, cont_mask, noncont_mask;
+	unsigned int eax, ebx, ecx, edx, ret, sparse_masks;
+	char res_path[PATH_MAX];
+	char schemata[64];
+	int bit_center;
+	FILE *fp;
+
+	/* Check to compare sparse_masks content to cpuid output. */
+	snprintf(res_path, sizeof(res_path), "%s/%s/%s", INFO_PATH,
+		 test->resource, "sparse_masks");
+
+	fp = fopen(res_path, "r");
+	if (!fp) {
+		perror("# Error in opening file\n");
+		return errno;
+	}
+
+	if (fscanf(fp, "%u", &sparse_masks) <= 0) {
+		perror("Could not get sparse_masks contents\n");
+		fclose(fp);
+		return -1;
+	}
+
+	fclose(fp);
+
+	if (!strcmp(test->resource, "L3"))
+		__cpuid_count(0x10, 1, eax, ebx, ecx, edx);
+	else if (!strcmp(test->resource, "L2"))
+		__cpuid_count(0x10, 2, eax, ebx, ecx, edx);
+	else
+		return -EINVAL;
+
+	if (sparse_masks != ((ecx >> 3) & 1))
+		return -1;
+
+	/* Write checks initialization. */
+	ret = get_cbm_mask(test->resource, &full_cache_mask);
+	if (ret < 0)
+		return ret;
+	bit_center = count_bits(full_cache_mask) / 2;
+	cont_mask = full_cache_mask >> bit_center;
+
+	/* Contiguous mask write check. */
+	snprintf(schemata, sizeof(schemata), "%lx", cont_mask);
+	ret = write_schemata("", schemata, uparams->cpu, test->resource);
+	if (ret)
+		return ret;
+
+	/*
+	 * Non-contiguous mask write check. CBM has a 0xf hole approximately in the middle.
+	 * Output is compared with support information to catch any edge case errors.
+	 */
+	noncont_mask = ~(full_cache_mask & (0xf << bit_center)) & full_cache_mask;
+	snprintf(schemata, sizeof(schemata), "%lx", noncont_mask);
+	ret = write_schemata("", schemata, uparams->cpu, test->resource);
+	if (ret && sparse_masks)
+		ksft_print_msg("Non-contiguous CBMs supported but write failed\n");
+	else if (ret && !sparse_masks)
+		ksft_print_msg("Non-contiguous CBMs not supported and write failed as expected\n");
+	else if (!ret && !sparse_masks)
+		ksft_print_msg("Non-contiguous CBMs not supported but write succeeded\n");
+	return !ret == !sparse_masks;
+}
+
+static bool noncont_cat_feature_check(const struct resctrl_test *test)
+{
+	char res_path[PATH_MAX];
+	struct stat statbuf;
+
+	snprintf(res_path, sizeof(res_path), "%s/%s/%s", INFO_PATH,
+		 test->resource, "sparse_masks");
+
+	if (stat(res_path, &statbuf))
+		return false;
+
+	return test_resource_feature_check(test);
+}
+
 struct resctrl_test l3_cat_test = {
 	.name = "L3_CAT",
 	.group = "CAT",
@@ -357,3 +438,19 @@ struct resctrl_test l2_cat_test = {
 	.feature_check = test_resource_feature_check,
 	.run_test = cat_run_test,
 };
+
+struct resctrl_test l3_noncont_cat_test = {
+	.name = "L3_NONCONT_CAT",
+	.group = "NONCONT_CAT",
+	.resource = "L3",
+	.feature_check = noncont_cat_feature_check,
+	.run_test = noncont_cat_run_test,
+};
+
+struct resctrl_test l2_noncont_cat_test = {
+	.name = "L2_NONCONT_CAT",
+	.group = "NONCONT_CAT",
+	.resource = "L2",
+	.feature_check = noncont_cat_feature_check,
+	.run_test = noncont_cat_run_test,
+};
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
index fffeb442c173..51b8a6ff3a0d 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -184,5 +184,7 @@ extern struct resctrl_test mba_test;
 extern struct resctrl_test cmt_test;
 extern struct resctrl_test l3_cat_test;
 extern struct resctrl_test l2_cat_test;
+extern struct resctrl_test l3_noncont_cat_test;
+extern struct resctrl_test l2_noncont_cat_test;
 
 #endif /* RESCTRL_H */
diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
index 9e254bca6c25..fdeef82feb4e 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -16,6 +16,8 @@ static struct resctrl_test *resctrl_tests[] = {
 	&cmt_test,
 	&l3_cat_test,
 	&l2_cat_test,
+	&l3_noncont_cat_test,
+	&l2_noncont_cat_test,
 };
 
 static int detect_vendor(void)
--
2.42.1
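The two pieces of logic in the patch above, building a mask with a hole in the middle and comparing the write result against the reported support, can be tried in isolation. The following is a standalone sketch, not the selftest itself: `count_set_bits()`, `noncont_mask_from()` and `noncont_result_is_failure()` are hypothetical helper names, and the sketch uses `0xfUL` for the hole so the shift is done in `unsigned long` width.

```c
#include <assert.h>
#include <stdbool.h>

/* Count the bits set in mask (clears the lowest set bit each iteration). */
static int count_set_bits(unsigned long mask)
{
	int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Punch a 0xf hole roughly in the middle of a contiguous full mask. */
static unsigned long noncont_mask_from(unsigned long full_mask)
{
	int bit_center = count_set_bits(full_mask) / 2;
	unsigned long hole = 0xfUL << bit_center;

	return full_mask & ~hole;
}

/*
 * The test fails when the write result contradicts the reported support:
 * supported but the write failed, or unsupported but the write succeeded.
 * write_ret is 0 on success, non-zero on failure.
 */
static bool noncont_result_is_failure(int write_ret, unsigned int sparse_masks)
{
	return !write_ret == !sparse_masks;
}
```

For a 12-bit full mask 0xfff the center is bit 6, the hole is 0x3c0 and the resulting non-contiguous mask is 0xc3f.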


[PATCH v2] Kunit to check the longest symbol length

by Sergio González Collado

The longest length of a symbol (KSYM_NAME_LEN) was increased to 512
in the reference [1]. This patch adds a kunit test to check the longest
symbol length.

[1] https://lore.kernel.org/lkml/20220802015052.10452-6-ojeda@kernel.org/

Tested-by: Martin Rodriguez Reboredo <yakoyoku(a)gmail.com>
Signed-off-by: Sergio González Collado <sergio.collado(a)gmail.com>
- - -
V1 -> V2: corrected CI tests. Added fix proposed at [2]

[2] https://lore.kernel.org/lkml/Y9ES4UKl%2F+DtvAVS@gmail.com/T/#m3ef0e12bb834d…
---
 arch/x86/tools/insn_decoder_test.c |   3 +-
 lib/Kconfig.debug                  |   9 +++
 lib/Makefile                       |   2 +
 lib/longest_symbol_kunit.c         | 122 +++++++++++++++++++++++++++++
 4 files changed, 135 insertions(+), 1 deletion(-)
 create mode 100644 lib/longest_symbol_kunit.c

diff --git a/arch/x86/tools/insn_decoder_test.c b/arch/x86/tools/insn_decoder_test.c
index 472540aeabc2..6c2986d2ad11 100644
--- a/arch/x86/tools/insn_decoder_test.c
+++ b/arch/x86/tools/insn_decoder_test.c
@@ -10,6 +10,7 @@
 #include <assert.h>
 #include <unistd.h>
 #include <stdarg.h>
+#include <linux/kallsyms.h>
 
 #define unlikely(cond) (cond)
 
@@ -106,7 +107,7 @@ static void parse_args(int argc, char **argv)
 	}
 }
 
-#define BUFSIZE 256
+#define BUFSIZE (256 + KSYM_NAME_LEN)
 
 int main(int argc, char **argv)
 {
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index cc7d53d9dc01..a531abece0a7 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2769,6 +2769,15 @@ config FORTIFY_KUNIT_TEST
 	  by the str*() and mem*() family of functions. For testing runtime
 	  traps of FORTIFY_SOURCE, see LKDTM's "FORTIFY_*" tests.
 
+config LONGEST_SYM_KUNIT_TEST
+	tristate "Test the longest symbol possible" if !KUNIT_ALL_TESTS
+	depends on KUNIT && KPROBES
+	default KUNIT_ALL_TESTS
+	help
+	  Tests the longest symbol possible
+
+	  If unsure, say N.
+
 config HW_BREAKPOINT_KUNIT_TEST
 	bool "Test hw_breakpoint constraints accounting" if !KUNIT_ALL_TESTS
 	depends on HAVE_HW_BREAKPOINT
diff --git a/lib/Makefile b/lib/Makefile
index 6b09731d8e61..f72003d5869b 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -406,6 +406,8 @@ obj-$(CONFIG_FORTIFY_KUNIT_TEST) += fortify_kunit.o
 obj-$(CONFIG_STRCAT_KUNIT_TEST) += strcat_kunit.o
 obj-$(CONFIG_STRSCPY_KUNIT_TEST) += strscpy_kunit.o
 obj-$(CONFIG_SIPHASH_KUNIT_TEST) += siphash_kunit.o
+obj-$(CONFIG_LONGEST_SYM_KUNIT_TEST) += longest_symbol_kunit.o
+CFLAGS_longest_symbol_kunit.o += $(call cc-disable-warning, missing-prototypes)
 
 obj-$(CONFIG_GENERIC_LIB_DEVMEM_IS_ALLOWED) += devmem_is_allowed.o
 
diff --git a/lib/longest_symbol_kunit.c b/lib/longest_symbol_kunit.c
new file mode 100644
index 000000000000..998563018f7a
--- /dev/null
+++ b/lib/longest_symbol_kunit.c
@@ -0,0 +1,122 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Test the longest symbol length. Execute with:
+ *  ./tools/testing/kunit/kunit.py run longest-symbol
+ *  --arch=x86_64 --kconfig_add CONFIG_KPROBES=y --kconfig_add CONFIG_MODULES=y
+ *  --kconfig_add CONFIG_RETPOLINE=n
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <kunit/test.h>
+#include <linux/stringify.h>
+#include <linux/kprobes.h>
+#include <linux/kallsyms.h>
+
+#define DI(name) s##name##name
+#define DDI(name) DI(n##name##name)
+#define DDDI(name) DDI(n##name##name)
+#define DDDDI(name) DDDI(n##name##name)
+#define DDDDDI(name) DDDDI(n##name##name)
+
+#define PLUS1(name) __PASTE(name, e)
+
+/*Generate a symbol whose name length is 511 */
+#define LONGEST_SYM_NAME  DDDDDI(g1h2i3j4k5l6m7n)
+
+/*Generate a symbol whose name length is 512 */
+#define LONGEST_SYM_NAME_PLUS1 PLUS1(LONGEST_SYM_NAME)
+
+#define RETURN_LONGEST_SYM 0xAAAAA
+#define RETURN_LONGEST_SYM_PLUS1 0x55555
+
+noinline int LONGEST_SYM_NAME(void);
+noinline int LONGEST_SYM_NAME(void)
+{
+	return RETURN_LONGEST_SYM;
+}
+
+noinline int LONGEST_SYM_NAME_PLUS1(void);
+noinline int LONGEST_SYM_NAME_PLUS1(void)
+{
+	return RETURN_LONGEST_SYM_PLUS1;
+}
+
+_Static_assert(sizeof(__stringify(LONGEST_SYM_NAME)) == KSYM_NAME_LEN,
+"Incorrect symbol length found. Expected KSYM_NAME_LEN: "
+__stringify(KSYM_NAME) ", but found: "
+__stringify(sizeof(LONGEST_SYM_NAME)));
+
+static void test_longest_symbol(struct kunit *test)
+{
+	KUNIT_EXPECT_EQ(test, RETURN_LONGEST_SYM, LONGEST_SYM_NAME());
+};
+
+static void test_longest_symbol_kallsyms(struct kunit *test)
+{
+	unsigned long (*kallsyms_lookup_name)(const char *name);
+	static int (*longest_sym)(void);
+
+	struct kprobe kp = {
+		.symbol_name = "kallsyms_lookup_name",
+	};
+
+	if (register_kprobe(&kp) < 0) {
+		pr_info("%s: kprobe not registered\n", __func__);
+		KUNIT_FAIL(test, "test_longest_symbol kallsysms: kprobe not registered\n");
+		return;
+	}
+
+	kunit_warn(test, "test_longest_symbol kallsyms: kprobe registered\n");
+	kallsyms_lookup_name = (unsigned long (*)(const char *name))kp.addr;
+	unregister_kprobe(&kp);
+
+	longest_sym =
+		(void *) kallsyms_lookup_name(__stringify(LONGEST_SYM_NAME));
+	KUNIT_EXPECT_EQ(test, RETURN_LONGEST_SYM, longest_sym());
+};
+
+static void test_longest_symbol_plus1(struct kunit *test)
+{
+	KUNIT_EXPECT_EQ(test, RETURN_LONGEST_SYM_PLUS1, LONGEST_SYM_NAME_PLUS1());
+};
+
+static void test_longest_symbol_plus1_kallsyms(struct kunit *test)
+{
+	unsigned long (*kallsyms_lookup_name)(const char *name);
+	static int (*longest_sym_plus1)(void);
+
+	struct kprobe kp = {
+		.symbol_name = "kallsyms_lookup_name",
+	};
+
+	if (register_kprobe(&kp) < 0) {
+		pr_info("%s: kprobe not registered\n", __func__);
+		KUNIT_FAIL(test, "test_longest_symbol kallsysms: kprobe not registered\n");
+		return;
+	}
+
+	kunit_warn(test, "test_longest_symbol_plus1 kallsyms: kprobe registered\n");
+	kallsyms_lookup_name = (unsigned long (*)(const char *name))kp.addr;
+	unregister_kprobe(&kp);
+
+	longest_sym_plus1 =
+		(void *) kallsyms_lookup_name(__stringify(LONGEST_SYM_NAME_PLUS1));
+	KUNIT_EXPECT_NULL(test, longest_sym_plus1);
+};
+
+static struct kunit_case longest_symbol_test_cases[] = {
+	KUNIT_CASE(test_longest_symbol),
+	KUNIT_CASE(test_longest_symbol_kallsyms),
+	KUNIT_CASE(test_longest_symbol_plus1),
+	KUNIT_CASE(test_longest_symbol_plus1_kallsyms),
+	{}
+};
+
+static struct kunit_suite longest_symbol_test_suite = {
+	.name = "longest-symbol",
+	.test_cases = longest_symbol_test_cases,
+};
+kunit_test_suite(longest_symbol_test_suite);
+
+MODULE_LICENSE("GPL");

base-commit: 037266a5f7239ead1530266f7d7af153d2a867fa
--
2.39.2
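The doubling macros in the patch are easier to follow with the arithmetic spelled out: the 15-character seed grows to 31, 63, 127 and 255 characters (each level prepends `n` and doubles), and the final `DI` step prepends `s` and doubles once more to reach 511. The userspace sketch below reuses the same macro chain and checks the length by stringifying the result. `STR()`, `STR_()`, `longest_name` and `longest_name_len()` are local stand-ins introduced for this illustration; they are not kernel helpers.

```c
#include <assert.h>
#include <string.h>

/* Same doubling scheme as the patch: each level pastes a prefix and two copies. */
#define DI(name) s##name##name
#define DDI(name) DI(n##name##name)
#define DDDI(name) DDI(n##name##name)
#define DDDDI(name) DDDI(n##name##name)
#define DDDDDI(name) DDDDI(n##name##name)

/* Two-level stringification so the argument is macro-expanded first. */
#define STR_(x) #x
#define STR(x) STR_(x)

#define LONGEST_SYM_NAME DDDDDI(g1h2i3j4k5l6m7n)

/* The generated identifier as a string; sizeof() includes the trailing NUL. */
static const char longest_name[] = STR(LONGEST_SYM_NAME);

static size_t longest_name_len(void)
{
	return strlen(longest_name);
}
```

The 511 characters plus the terminating NUL give 512 bytes, which is the value the cover text quotes for KSYM_NAME_LEN.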


[PATCH 2/2] mm/damon/core-test: test damon_split_region_at()'s access rate copying

by SeongJae Park

damon_split_region_at() should set access rate related fields of the
resulting regions to the same values. It may be forgotten, and actually
there was the mistake before. Test it with the unit test case for the
function.

Signed-off-by: SeongJae Park <sj(a)kernel.org>
---
 mm/damon/core-test.h | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/mm/damon/core-test.h b/mm/damon/core-test.h
index 649adf91ebc5..e6a01ea2ec54 100644
--- a/mm/damon/core-test.h
+++ b/mm/damon/core-test.h
@@ -122,18 +122,25 @@ static void damon_test_split_at(struct kunit *test)
 {
 	struct damon_ctx *c = damon_new_ctx();
 	struct damon_target *t;
-	struct damon_region *r;
+	struct damon_region *r, *r_new;
 
 	t = damon_new_target();
 	r = damon_new_region(0, 100);
+	r->nr_accesses_bp = 420000;
+	r->nr_accesses = 42;
+	r->last_nr_accesses = 15;
 	damon_add_region(r, t);
 	damon_split_region_at(t, r, 25);
 	KUNIT_EXPECT_EQ(test, r->ar.start, 0ul);
 	KUNIT_EXPECT_EQ(test, r->ar.end, 25ul);
 
-	r = damon_next_region(r);
-	KUNIT_EXPECT_EQ(test, r->ar.start, 25ul);
-	KUNIT_EXPECT_EQ(test, r->ar.end, 100ul);
+	r_new = damon_next_region(r);
+	KUNIT_EXPECT_EQ(test, r_new->ar.start, 25ul);
+	KUNIT_EXPECT_EQ(test, r_new->ar.end, 100ul);
+
+	KUNIT_EXPECT_EQ(test, r->nr_accesses_bp, r_new->nr_accesses_bp);
+	KUNIT_EXPECT_EQ(test, r->nr_accesses, r_new->nr_accesses);
+	KUNIT_EXPECT_EQ(test, r->last_nr_accesses, r_new->last_nr_accesses);
 
 	damon_free_target(t);
 	damon_destroy_ctx(c);
--
2.34.1
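The property this test pins down can be shown with a simplified userspace model. This is a sketch only: `struct region` and `split_region_at()` are hypothetical stand-ins that borrow the field names from the test above, and they are not the DAMON implementation. The point is that a split must copy every access-rate field to the new right-hand region, which is exactly what is easy to forget when a new field is added.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified region; field names follow the kunit test above. */
struct region {
	unsigned long start, end;
	unsigned int nr_accesses_bp;
	unsigned int nr_accesses;
	unsigned int last_nr_accesses;
	struct region *next;
};

/*
 * Split r so that it keeps the first sz bytes; the remainder becomes a new
 * region linked right after r. Returns the new region, or NULL on failure.
 */
static struct region *split_region_at(struct region *r, unsigned long sz)
{
	struct region *n = malloc(sizeof(*n));

	if (!n)
		return NULL;

	n->start = r->start + sz;
	n->end = r->end;

	/* Both halves start out with the same access statistics. */
	n->nr_accesses_bp = r->nr_accesses_bp;
	n->nr_accesses = r->nr_accesses;
	n->last_nr_accesses = r->last_nr_accesses;

	n->next = r->next;
	r->next = n;
	r->end = n->start;
	return n;
}
```

Splitting a region [0, 100) at 25 leaves [0, 25) and [25, 100), both carrying the original counters.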


[PATCH v5 0/6] workload-specific and memory pressure-driven zswap writeback

by Nhat Pham

Changelog:
v5:
   * Replace reference getting with an rcu_read_lock() section for
     zswap lru modifications (suggested by Yosry)
   * Add a new prep patch that allows mem_cgroup_iter() to return
     online cgroup.
   * Add a callback that updates pool->next_shrink when the cgroup is
     offlined (suggested by Yosry Ahmed, Johannes Weiner)
v4:
   * Rename list_lru_add to list_lru_add_obj and __list_lru_add to
     list_lru_add (patch 1) (suggested by Johannes Weiner and
     Yosry Ahmed)
   * Some cleanups on the memcg aware LRU patch (patch 2)
     (suggested by Yosry Ahmed)
   * Use event interface for the new per-cgroup writeback counters.
     (patch 3) (suggested by Yosry Ahmed)
   * Abstract zswap's lruvec states and handling into
     zswap_lruvec_state (patch 5) (suggested by Yosry Ahmed)
v3:
   * Add a patch to export per-cgroup zswap writeback counters
   * Add a patch to update zswap's kselftest
   * Separate the new list_lru functions into its own prep patch
   * Do not start from the top of the hierarchy when encounter a memcg
     that is not online for the global limit zswap writeback (patch 2)
     (suggested by Yosry Ahmed)
   * Do not remove the swap entry from list_lru in
     __read_swapcache_async() (patch 2) (suggested by Yosry Ahmed)
   * Removed a redundant zswap pool getting (patch 2)
     (reported by Ryan Roberts)
   * Use atomic for the nr_zswap_protected (instead of lruvec's lock)
     (patch 5) (suggested by Yosry Ahmed)
   * Remove the per-cgroup zswap shrinker knob (patch 5)
     (suggested by Yosry Ahmed)
v2:
   * Fix loongarch compiler errors
   * Use pool stats instead of memcg stats when !CONFIG_MEMCG_KMEM

There are currently several issues with zswap writeback:

1. There is only a single global LRU for zswap, making it impossible to
   perform workload-specific shrinking - a memcg under memory pressure
   cannot determine which pages in the pool it owns, and often ends up
   writing pages from other memcgs. This issue has been previously
   observed in practice and mitigated by simply disabling
   memcg-initiated shrinking:

   https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u

   But this solution leaves a lot to be desired, as we still do not
   have an avenue for a memcg to free up its own memory locked up in
   the zswap pool.

2. We only shrink the zswap pool when the user-defined limit is hit.
   This means that if we set the limit too high, cold data that are
   unlikely to be used again will reside in the pool, wasting precious
   memory. It is hard to predict how much zswap space will be needed
   ahead of time, as this depends on the workload (specifically, on
   factors such as memory access patterns and compressibility of the
   memory pages).

This patch series solves these issues by separating the global zswap
LRU into per-memcg and per-NUMA LRUs, and performs workload-specific
(i.e. memcg- and NUMA-aware) zswap writeback under memory pressure. The
new shrinker does not have any parameter that must be tuned by the
user, and can be opted in or out on a per-memcg basis.

As a proof of concept, we ran the following synthetic benchmark:
build the linux kernel in a memory-limited cgroup, and allocate some
cold data in tmpfs to see if the shrinker could write them out and
improve the overall performance. Depending on the amount of cold data
generated, we observe from 14% to 35% reduction in kernel CPU time used
in the kernel builds.

Domenico Cerasuolo (3):
  zswap: make shrinking memcg-aware
  mm: memcg: add per-memcg zswap writeback stat
  selftests: cgroup: update per-memcg zswap writeback selftest

Nhat Pham (3):
  list_lru: allows explicit memcg and NUMA node selection
  memcontrol: allows mem_cgroup_iter() to check for onlineness
  zswap: shrinks zswap pool based on memory pressure

 Documentation/admin-guide/mm/zswap.rst      |   7 +
 drivers/android/binder_alloc.c              |   5 +-
 fs/dcache.c                                 |   8 +-
 fs/gfs2/quota.c                             |   6 +-
 fs/inode.c                                  |   4 +-
 fs/nfs/nfs42xattr.c                         |   8 +-
 fs/nfsd/filecache.c                         |   4 +-
 fs/xfs/xfs_buf.c                            |   6 +-
 fs/xfs/xfs_dquot.c                          |   2 +-
 fs/xfs/xfs_qm.c                             |   2 +-
 include/linux/list_lru.h                    |  46 ++-
 include/linux/memcontrol.h                  |   9 +-
 include/linux/mmzone.h                      |   2 +
 include/linux/vm_event_item.h               |   1 +
 include/linux/zswap.h                       |  27 +-
 mm/list_lru.c                               |  48 ++-
 mm/memcontrol.c                             |  20 +-
 mm/mmzone.c                                 |   1 +
 mm/shrinker.c                               |   4 +-
 mm/swap.h                                   |   3 +-
 mm/swap_state.c                             |  26 +-
 mm/vmscan.c                                 |  26 +-
 mm/vmstat.c                                 |   1 +
 mm/workingset.c                             |   4 +-
 mm/zswap.c                                  | 430 +++++++++++++++++---
 tools/testing/selftests/cgroup/test_zswap.c |  74 ++--
 26 files changed, 625 insertions(+), 149 deletions(-)

--
2.34.1


[PATCH] selftests:proc-empty-vm: Remove unused debug write call in the function test_proc_pid_statm

by angquan yu

From: angquan yu <angquan21(a)gmail.com>

In tools/testing/selftests/proc/proc-empty->because the return value of a
write call was being ignored. This call was part of a conditional
debugging block (if (0) { ... }), which meant it would never actually
execute.

This patch removes the unused debug write call. This cleanup resolves the
compi>warning about ignoring the result of write declared with the
warn_unused_result attribute.

Removing this code also improves the clarity and maintainability of the
function, as it eliminates a non-functional block of code.

This is the original warning:
proc-empty-vm.c: In function ‘test_proc_pid_statm’:
proc-empty-vm.c:385:17: warning: ignoring return value of ‘write’ declared with>
385 | write(1, buf, rv);|

Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
 tools/testing/selftests/proc/proc-empty-vm.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/proc/proc-empty-vm.c b/tools/testing/selftests/proc/proc-empty-vm.c
index 56198d4ca..74ef8627f 100644
--- a/tools/testing/selftests/proc/proc-empty-vm.c
+++ b/tools/testing/selftests/proc/proc-empty-vm.c
@@ -382,7 +382,10 @@ static int test_proc_pid_statm(pid_t pid)
 	assert(rv >= 0);
 	assert(rv <= sizeof(buf));
 	if (0) {
-		write(1, buf, rv);
+		ssize_t written = write(1, buf, rv);
+		if (written == -1) {
+			perror("write failed /proc/${pid}");
+		}
 	}
 
 	const char *p = buf;
--
2.39.2


