November 2023 - Linux-kselftest-mirror

[PATCH 0/9] mm/damon: let users feed and tame/auto-tune DAMOS

by SeongJae Park

Introduce Aim-oriented Feedback-driven DAMOS Aggressiveness Auto-tuning. It makes DAMOS self-tuned with periodic simple user feedback.

Patchset Changelog
==================

From RFC
(https://lore.kernel.org/damon/20231112194607.61399-1-sj@kernel.org/)
- Wordsmith commit messages and cover letter

Background: DAMOS Control Difficulty
====================================

DAMOS helps users easily implement access pattern aware system operations. However, controlling DAMOS in the wild is not that easy.

The basic way for DAMOS control is specifying the target access pattern. In this approach, the user is assumed to well understand the access pattern and the characteristics of the system and the workloads. Though there are useful tools for that, it takes time and effort depending on the complexity and the dynamicity of the system and the workloads. After all, the access pattern consists of three ranges, namely the size, the access rate, and the age of the regions. It means users need to tune six parameters, which is anyway not a simple task.

One of the worst cases would be DAMOS being too aggressive like a berserker, and therefore consuming too much system resource and making unwanted radical system operations. To let users avoid such cases, DAMOS allows users to set the upper-limit of the schemes' aggressiveness, namely DAMOS quota. DAMOS further provides its best-effort under the limit by prioritizing regions based on the access pattern of the regions. For example, users can ask DAMOS to page out up to 100 MiB of memory regions per second. Then DAMOS pages out regions that are not accessed for a longer time (colder) first under the limit. This allows users to set the target access pattern a bit naive with wider ranges, and focus on tuning only one parameter, the quota. In other words, the number of parameters to tune can be reduced from six to one.
Still, however, the optimum value for the quota depends on the system and the workloads' characteristics, so not that simple. The number of parameters to tune can also increase again if the user needs to run multiple schemes.

Aim-oriented Feedback-driven DAMOS Aggressiveness Auto Tuning
=============================================================

Users would use DAMOS since they want to achieve something with it. They will likely have measurable metrics representing the achievement and the target number of the metric like SLO, and continuously measure that anyway. While the additional cost of getting the information is nearly zero, it could be useful for DAMOS to understand how appropriate its current aggressiveness is set, and adjust it on its own to make the metric value closer to the target.

Based on this idea, we introduce a new way of tuning DAMOS with nearly zero additional effort, namely Aim-oriented Feedback-driven DAMOS Aggressiveness Auto Tuning. It asks users to provide feedback representing how well DAMOS is doing relative to the users' aim. Then DAMOS adjusts its aggressiveness, specifically the quota that provides the best effort result under the limit, based on the current level of the aggressiveness and the users' feedback.

Implementation
--------------

The implementation asks users to represent the feedback with score numbers. The scores could be anything, including user-space specific metrics such as latency and throughput of special user-space workloads, and system metrics such as free memory ratio, memory pressure stall time (PSI), and active to inactive LRU lists size ratio. The feedback scores and the aggressiveness of the given DAMOS scheme are assumed to be positively proportional, though. Selecting metrics that satisfy the assumption is the users' responsibility.
The core logic uses the below simple feedback loop algorithm to calculate the next aggressiveness level of the scheme from the current aggressiveness level and the current feedback (target_score and current_score). It calculates the compensation for next aggressiveness as a proportion of current aggressiveness and distance to the target score. As a result, it arrives at the near-goal state in a short time using big steps when it's far from the goal, but avoids making unnecessarily radical changes that could turn out to be a bad decision using small steps when it's near to the goal.

    f(n) = max(1, f(n - 1) * ((target_score - current_score) / target_score + 1))

Note that the compensation value becomes negative when it's over achieving the goal. That's why the feedback metric and the aggressiveness of the scheme should be positively proportional. The distance-adaptive speed manipulation is simply applied.

Example Use Cases
-----------------

If users want to reduce the memory footprint of the system as much as possible as long as the time spent for handling the resulting memory pressure is within a threshold, they could use a DAMOS scheme that reclaims cold memory regions aiming for a little level of memory pressure stall time.

If users want the active/inactive LRU lists well balanced to reduce the performance impact due to possible future memory pressure, they could use two schemes. The first one would be set to locate hot pages in the active LRU list, aiming for a specific active-to-inactive LRU list size ratio, say, 70%. The second one would be to locate cold pages in the inactive LRU list, aiming for a specific inactive-to-active LRU list size ratio, say, 30%. Then, DAMOS will balance the two schemes based on the goal and feedback.

This aim-oriented auto tuning could also be useful for general balancing-required access aware system operations such as system memory auto scaling[3] and tiered memory management[4].
These two example usages are not what current DAMOS implementation is already supporting, but require additional DAMOS action developments, though.

Evaluation: subtle memory pressure aiming proactive reclamation
---------------------------------------------------------------

To show if the implementation works as expected, we prepare four different system configurations on AWS i3.metal instances. The first setup (original) runs the workload without any DAMOS scheme. The second setup (not-tuned) runs the workload with a virtual address space-based proactive reclamation scheme that pages out memory regions that are not accessed for five seconds or more. The third setup (offline-tuned) runs the same proactive reclamation DAMOS scheme, but after making it tuned for each workload offline, using our previous user-space driven automatic tuning approach, namely DAMOOS[1]. The fourth and final setup (AFDAA) runs the scheme that is the same as that of 'not-tuned' setup, but aims to keep 0.5% of 'some' memory pressure stall time (PSI) for the last 10 seconds using the aim-oriented auto tuning.

For each setup, we run realistic workloads from PARSEC3 and SPLASH-2X benchmark suites. For each run, we measure RSS and runtime of the workload, and 'some' memory pressure stall time (PSI) of the system. We repeat the runs five times and use averaged measurements.

For simple comparison of the results, we normalize the measurements to those of 'original'. In the case of the PSI, though, the measurement for 'original' was zero, so we normalize the value to that of 'not-tuned' scheme's result. The normalized results are shown below.

                Not-tuned          Offline-tuned      AFDAA
    RSS         0.622688178226118  0.787950678944904  0.740093483278979
    runtime     1.11767826657912   1.0564674983585    1.0910833880499
    PSI         1                  0.727521443794069  0.308498846350299

The 'not-tuned' scheme achieves about 37.7% memory saving but incurs about 11.8% runtime slowdown.
The 'offline-tuned' scheme achieves about 21.2% memory saving with about 5.6% runtime slowdown. It also achieves about 27.2% memory pressure stall time saving. AFDAA achieves about 26% memory saving with about 9.1% runtime slowdown. It also achieves about 69.1% memory pressure stall time saving. We repeat this test multiple times, and get consistent results. AFDAA is now integrated in our daily DAMON performance test setup.

Apparently the aggressiveness of 'AFDAA' setup is somewhere between those of 'not-tuned' and 'offline-tuned' setup, since its memory saving and runtime overhead are between those of the other two setups. Actually we set the memory pressure stall time goal aiming for this middle aggressiveness. The difference in the two metrics is not significant, though. However, it shows significant saving of the memory pressure stall time, which was the goal of the auto-tuning, over the two variants. Hence, we conclude the automatic tuning is working as expected.

Please note that the AFDAA setup is only for the evaluation, and therefore intentionally set a bit aggressive. It might not be appropriate for production environments.

The test code is also available[2], so you could reproduce it on your system and workloads.

Patches Sequence
================

The first four patches implement the core logic and user interfaces for the auto tuning. The first patch implements the core logic for the auto tuning, and the API for DAMOS users in the kernel space. The second patch implements basic file operations of DAMON sysfs directories and files that will be used for setting the goals and providing the feedback. The third patch connects the quota goals files inputs to the DAMOS core logic. Finally the fourth patch implements a dedicated DAMOS sysfs command for efficiently committing the quota goals feedback.

Two patches for simple tests of the logic and interfaces follow. The fifth patch implements the core logic unit test.
The sixth patch implements a selftest for the DAMON Sysfs interface for the goals.

Finally, three patches for documentation follow. The seventh patch documents the design of the feature. The eighth patch updates the API doc for the new sysfs files. The final ninth patch updates the usage document for the features.

References
==========

[1] DAOS paper:
    https://www.amazon.science/publications/daos-data-access-aware-operating-sy…
[2] Evaluation code:
    https://github.com/damonitor/damon-tests/commit/3f884e61193f0166b8724554b6d…
[3] Memory auto scaling RFC idea:
    https://lore.kernel.org/damon/20231112195114.61474-1-sj@kernel.org/
[4] DAMON-based tiered memory management RFC idea:
    https://lore.kernel.org/damon/20231112195602.61525-1-sj@kernel.org/

SeongJae Park (9):
  mm/damon/core: implement goal-oriented feedback-driven quota auto-tuning
  mm/damon/sysfs-schemes: implement files for scheme quota goals setup
  mm/damon/sysfs-schemes: commit damos quota goals user input to DAMOS
  mm/damon/sysfs-schemes: implement a command for scheme quota goals only commit
  mm/damon/core-test: add a unit test for the feedback loop algorithm
  selftests/damon: test quota goals directory
  Docs/mm/damon/design: document DAMOS quota auto tuning
  Docs/ABI/damon: document DAMOS quota goals
  Docs/admin-guide/mm/damon/usage: document for quota goals

 .../ABI/testing/sysfs-kernel-mm-damon        |  33 ++-
 Documentation/admin-guide/mm/damon/usage.rst |  48 +++-
 Documentation/mm/damon/design.rst            |  13 +
 include/linux/damon.h                        |  19 ++
 mm/damon/core-test.h                         |  32 +++
 mm/damon/core.c                              |  68 ++++-
 mm/damon/sysfs-common.h                      |   3 +
 mm/damon/sysfs-schemes.c                     | 272 +++++++++++++++++-
 mm/damon/sysfs.c                             |  27 ++
 tools/testing/selftests/damon/sysfs.sh       |  27 ++
 10 files changed, 517 insertions(+), 25 deletions(-)

base-commit: b4e0245a831a402cae8634a4dc277a04830ff07a
--
2.34.1
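The feedback loop formula quoted in the cover letter above can be sketched in plain user-space C. This is only an illustration of the formula as written in the email, not the code from the patch: the function name `next_aggressiveness` is hypothetical, and it uses `double` for readability, which in-kernel code would avoid.

```c
/*
 * Illustrative sketch of the cover letter's formula:
 *
 *   f(n) = max(1, f(n - 1) * ((target_score - current_score) / target_score + 1))
 *
 * next_aggressiveness() is a hypothetical name; the real patch may
 * structure and scale this computation differently.
 */
static double next_aggressiveness(double prev, double target_score,
				  double current_score)
{
	/* Compensation is proportional to the distance from the target. */
	double next = prev * ((target_score - current_score) / target_score + 1.0);

	/* Never let the aggressiveness drop below 1. */
	return next < 1.0 ? 1.0 : next;
}
```

With a previous level of 100 and a target score of 10, a current score of 5 (under-achieving) raises the level to 150, a score of 10 keeps it at 100, and a score of 15 (over-achieving) lowers it to 50. A score far above the target drives the product negative, and the `max(1, ...)` clamp returns 1.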

[PATCH 0/3] kselftest/vDSO: Output formatting cleanups for vdso_test_abi

by Mark Brown

These patches update the output of the vdso_test_abi test program to bring it into line with expected KTAP usage, the main one being the first patch which ensures we log distinct test names for each reported result making it much easier for automated systems to track the status of the tests.

Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (3):
      kselftest/vDSO: Make test name reporting for vdso_abi_test tooling friendly
      kselftest/vDSO: Fix message formatting for clock_id logging
      kselftest/vDSO: Use ksft_print_msg() rather than printf in vdso_test_abi

 tools/testing/selftests/vDSO/vdso_test_abi.c | 72 +++++++++++++++-------------
 1 file changed, 39 insertions(+), 33 deletions(-)
---
base-commit: 98b1cc82c4affc16f5598d4fa14b1858671b2263
change-id: 20231122-kselftest-vdso-test-name-44fcc7e16a38

Best regards,
--
Mark Brown <broonie(a)kernel.org>

[PATCH] selftests:breakpoints: Fix Format String Warning in breakpoint_test

by angquan yu

From: angquan yu <angquan21(a)gmail.com>

This commit resolves a compiler warning regarding the use of non-literal format strings in breakpoint_test.c.

The functions `ksft_test_result_pass` and `ksft_test_result_fail` were previously called with a variable `msg` directly, which could potentially lead to format string vulnerabilities.

Changes made:
- Modified the calls to `ksft_test_result_pass` and `ksft_test_result_fail` by adding a "%s" format specifier. This explicitly declares `msg` as a string argument, adhering to safer coding practices and resolving the compiler warning.

This change does not affect the functional behavior of the code but ensures better code safety and compliance with recommended C programming standards.

The previous warning is "
breakpoint_test.c:287:17: warning: format not a string literal and no format arguments [-Wformat-security]
  287 |                 ksft_test_result_pass(msg);
      |                 ^~~~~~~~~~~~~~~~~~~~~
breakpoint_test.c:289:17: warning: format not a string literal and no format arguments [-Wformat-security]
  289 |                 ksft_test_result_fail(msg);
      |
"

Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
 tools/testing/selftests/breakpoints/breakpoint_test.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/breakpoints/breakpoint_test.c b/tools/testing/selftests/breakpoints/breakpoint_test.c
index 3266cc929..d46962a24 100644
--- a/tools/testing/selftests/breakpoints/breakpoint_test.c
+++ b/tools/testing/selftests/breakpoints/breakpoint_test.c
@@ -284,9 +284,9 @@ static void check_success(const char *msg)
 	nr_tests++;
 
 	if (ret)
-		ksft_test_result_pass(msg);
+		ksft_test_result_pass("%s", msg);
 	else
-		ksft_test_result_fail(msg);
+		ksft_test_result_fail("%s", msg);
 }
 
 static void launch_instruction_breakpoints(char *buf, int local, int global)
--
2.39.2
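To see why the "%s" form in the patch above is the safer one, here is a small stand-alone sketch using standard `snprintf()` instead of the kselftest helpers. The wrapper name `format_result` is hypothetical. Because the message is passed as an argument rather than as the format string, any `%` sequences inside it are copied verbatim instead of being interpreted as conversions.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of kselftest: copy a caller-supplied
 * message into out without ever treating it as a format string.
 * Returns what snprintf() returns (the length the full output needs).
 */
static int format_result(char *out, size_t len, const char *msg)
{
	/* "%s" is the only format string; msg is plain data. */
	return snprintf(out, len, "%s", msg);
}
```

Calling `snprintf(out, len, msg)` instead would make the compiler emit the same `-Wformat-security` warning quoted in the email, and a message containing something like `%d` would make the function read an argument that was never passed.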

[PATCH] selftests/breakpoints: Fix format specifier in ksft_print_msg in step_after_suspend_test.c

by angquan yu

From: angquan yu <angquan21(a)gmail.com>

In the function 'tools/testing/selftests/breakpoints/run_test' within step_after_suspend_test.c, the ksft_print_msg function call incorrectly used '$s' as a format specifier. This commit corrects this typo to use the proper '%s' format specifier, ensuring the error message from waitpid() is correctly displayed.

The issue manifested as a compilation warning (too many arguments for format [-Wformat-extra-args]), potentially obscuring actual runtime errors and complicating debugging processes.

This fix enhances the clarity of error messages during test failures and ensures compliance with standard C format string conventions.

Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
 tools/testing/selftests/breakpoints/step_after_suspend_test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/breakpoints/step_after_suspend_test.c b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
index 2cf6f10ab..b8703c499 100644
--- a/tools/testing/selftests/breakpoints/step_after_suspend_test.c
+++ b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
@@ -89,7 +89,7 @@ int run_test(int cpu)
 	wpid = waitpid(pid, &status, __WALL);
 	if (wpid != pid) {
-		ksft_print_msg("waitpid() failed: $s\n", strerror(errno));
+		ksft_print_msg("waitpid() failed: %s\n", strerror(errno));
 		return KSFT_FAIL;
 	}
 	if (WIFEXITED(status)) {
--
2.39.2

[PATCH RESEND] kunit: debugfs: Fix unchecked dereference in debugfs_print_results()

by Richard Fitzgerald

Move the call to kunit_suite_has_succeeded() after the check that the kunit_suite pointer is valid.

This was found by smatch:

 lib/kunit/debugfs.c:66 debugfs_print_results() warn: variable
 dereferenced before check 'suite' (see line 63)

Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Fixes: 38289a26e1b8 ("kunit: fix debugfs code to use enum kunit_status, not bool")
---
 lib/kunit/debugfs.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/lib/kunit/debugfs.c b/lib/kunit/debugfs.c
index 9d167adfa746..382706dfb47d 100644
--- a/lib/kunit/debugfs.c
+++ b/lib/kunit/debugfs.c
@@ -60,12 +60,14 @@ static void debugfs_print_result(struct seq_file *seq, struct string_stream *log
 static int debugfs_print_results(struct seq_file *seq, void *v)
 {
 	struct kunit_suite *suite = (struct kunit_suite *)seq->private;
-	enum kunit_status success = kunit_suite_has_succeeded(suite);
+	enum kunit_status success;
 	struct kunit_case *test_case;
 
 	if (!suite)
 		return 0;
 
+	success = kunit_suite_has_succeeded(suite);
+
 	/* Print KTAP header so the debugfs log can be parsed as valid KTAP. */
 	seq_puts(seq, "KTAP version 1\n");
 	seq_puts(seq, "1..1\n");
--
2.30.2
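The check-before-use ordering that the patch above restores can be shown with a minimal stand-alone sketch. The types and names here (`struct suite`, `suite_has_succeeded`, `print_results`) are hypothetical stand-ins, not the KUnit definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for a test suite; not the KUnit structure. */
struct suite {
	int failed_cases;
};

/* Dereferences s, so callers must pass a valid pointer. */
static int suite_has_succeeded(const struct suite *s)
{
	return s->failed_cases == 0;
}

/* Returns 0 for a missing suite, 1 on success, -1 on failure. */
static int print_results(const struct suite *s)
{
	int success;

	/* Validate the pointer first ... */
	if (!s)
		return 0;

	/* ... and only then call anything that dereferences it. */
	success = suite_has_succeeded(s);

	return success ? 1 : -1;
}
```

Initializing `success` in its declaration, as the pre-patch code did, would run the dereference before the NULL check, which is the pattern smatch reported.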

[PATCH] selftests:proc: Resolve 'Unused Result' Warning from proc-empty-vm.c

by angquan yu

From: angquan yu <angquan21(a)gmail.com>

In tools/testing/selftests/proc/proc-empty->because the return value of a write call was being ignored. This call was part of a conditional debugging block (if (0) { ... }), which meant it would never actually execute.

This patch removes the unused debug write call. This cleanup resolves the compi>warning about ignoring the result of write declared with the warn_unused_result attribute.

Removing this code also improves the clarity and maintainability of the function, as it eliminates a non-functional block of code.

This is the original warning:
proc-empty-vm.c: In function ‘test_proc_pid_statm’:
proc-empty-vm.c:385:17: warning: ignoring return value of ‘write’ declared with>
  385 |                 write(1, buf, rv);
      |

Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
 tools/testing/selftests/proc/proc-empty-vm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/testing/selftests/proc/proc-empty-vm.c b/tools/testing/selftests/proc/proc-empty-vm.c
index 5e7020630..d231e61e4 100644
--- a/tools/testing/selftests/proc/proc-empty-vm.c
+++ b/tools/testing/selftests/proc/proc-empty-vm.c
@@ -383,8 +383,10 @@ static int test_proc_pid_statm(pid_t pid)
 	assert(rv <= sizeof(buf));
 	if (0) {
 		ssize_t written = write(1, buf, rv);
+
 		if (written == -1) {
 			perror("write failed to /proc/${pid}");
+			return EXIT_FAILURE;
 		}
 	}
--
2.39.2
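For readers unfamiliar with the `warn_unused_result` warning discussed above, one common way to consume the return value of `write()` is a small retry loop. The helper below is a generic sketch with a hypothetical name (`write_all`); it is not taken from the selftest and is not what the patch does.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes from buf to fd, retrying on
 * short writes and EINTR. Returns 0 on success, -1 on error with
 * errno left as set by write().
 */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}
```

Because the result of every `write()` call is inspected, the compiler has nothing to warn about, and a failed write is reported to the caller instead of being silently dropped.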

[PATCH V11 0/7] amd-pstate preferred core

by Meng Li

Hi all:

The core frequency is subject to process variation in semiconductors. Not all cores are able to reach the maximum frequency respecting the infrastructure limits. Consequently, AMD has redefined the concept of maximum frequency of a part. This means that a fraction of cores can reach maximum frequency. To find the best process scheduling policy for a given scenario, the OS needs to know the core ordering informed by the platform through the highest performance capability register of the CPPC interface.

Earlier implementations of amd-pstate preferred core only support a static core ranking and targeted performance. Now it has the ability to dynamically change the preferred core based on the workload and platform conditions and accounting for thermals and aging.

Amd-pstate driver utilizes the functions and data structures provided by the ITMT architecture to enable the scheduler to favor scheduling on cores which can get a higher frequency with lower voltage. We call it amd-pstate preferred core.

Here sched_set_itmt_core_prio() is called to set priorities and sched_set_itmt_support() is called to enable ITMT feature. Amd-pstate driver uses the highest performance value to indicate the priority of CPU. The higher value has a higher priority.

Amd-pstate driver will provide an initial core ordering at boot time. It relies on the CPPC interface to communicate the core ranking to the operating system and scheduler to make sure that the OS chooses the cores with the highest performance first when scheduling a process. When amd-pstate driver receives a message with the highest performance change, it will update the core ranking.

Changes from V10->V11:
- cpufreq: amd-pstate:
- - according to Perry's comments, replace the string with str_enabled_disabled().

Changes from V9->V10:
- cpufreq: amd-pstate:
- - add judgement for highest_perf. When it is less than 255, the preferred core feature is enabled. And it will set the priority.
- - delete "static u32 max_highest_perf" etc, because amd-pstate preferred core does not require special process for hotplug.

Changes from V8->V9:
- all:
- - pick up Tested-By flag added by Oleksandr.
- cpufreq: amd-pstate:
- - pick up Review-By flag added by Wyes.
- - ignore modification of bug.
- - add an attribute of prefcore_ranking.
- - modify data type conversion from u32 to int.
- Documentation: amd-pstate:
- - pick up Review-By flag added by Wyes.

Changes from V7->V8:
- all:
- - pick up Review-By flag added by Mario and Ray.
- cpufreq: amd-pstate:
- - use hw_prefcore embeds into cpudata structure.
- - delete preferred core init from cpu online/off.

Changes from V6->V7:
- x86:
- - Modify kconfig about X86_AMD_PSTATE.
- cpufreq: amd-pstate:
- - modify incorrect comments about scheduler_work().
- - convert highest_perf data type.
- - modify preferred core init when cpu init and online.
- acpi: cppc:
- - modify link of CPPC highest performance.
- cpufreq:
- - modify link of CPPC highest performance changed.

Changes from V5->V6:
- cpufreq: amd-pstate:
- - modify the wrong tag order.
- - modify warning about hw_prefcore sysfs attribute.
- - delete duplicate comments.
- - modify the variable name cppc_highest_perf to prefcore_ranking.
- - modify judgment conditions for setting highest_perf.
- - modify sysfs attribute for CPPC highest perf to pr_debug message.
- Documentation: amd-pstate:
- - modify warning: title underline too short.

Changes from V4->V5:
- cpufreq: amd-pstate:
- - modify sysfs attribute for CPPC highest perf.
- - modify warning about comments
- - rebase linux-next
- cpufreq:
- - Modify warning about function declarations.
- Documentation: amd-pstate:
- - align with ``amd-pstate``

Changes from V3->V4:
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.

Changes from V2->V3:
- x86:
- - Modify kconfig and description.
- cpufreq: amd-pstate:
- - Add Co-developed-by tag in commit message.
- cpufreq:
- - Modify commit message.
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.

Changes from V1->V2:
- acpi: cppc:
- - Add reference link.
- cpufreq:
- - Modify link error.
- cpufreq: amd-pstate:
- - Init the priorities of all online CPUs
- - Use a single variable to represent the status of preferred core.
- Documentation:
- - Default enabled preferred core.
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
- - Default enabled preferred core.
- - Use a single variable to represent the status of preferred core.

Meng Li (7):
  x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for the expansion.
  acpi: cppc: Add get the highest performance cppc control
  cpufreq: amd-pstate: Enable amd-pstate preferred core supporting.
  cpufreq: Add a notification message that the highest perf has changed
  cpufreq: amd-pstate: Update amd-pstate preferred core ranking dynamically
  Documentation: amd-pstate: introduce amd-pstate preferred core
  Documentation: introduce amd-pstate preferrd core mode kernel command line options

 .../admin-guide/kernel-parameters.txt        |   5 +
 Documentation/admin-guide/pm/amd-pstate.rst  |  59 +++++-
 arch/x86/Kconfig                             |   5 +-
 drivers/acpi/cppc_acpi.c                     |  13 ++
 drivers/acpi/processor_driver.c              |   6 +
 drivers/cpufreq/amd-pstate.c                 | 187 ++++++++++++++++--
 drivers/cpufreq/cpufreq.c                    |  13 ++
 include/acpi/cppc_acpi.h                     |   5 +
 include/linux/amd-pstate.h                   |  10 +
 include/linux/cpufreq.h                      |   5 +
 10 files changed, 288 insertions(+), 20 deletions(-)

--
2.34.1
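The rule stated in the cover letter above, "the higher value has a higher priority", can be illustrated with a tiny stand-alone sketch. This is not driver code: the real series hands per-CPU priorities to the scheduler through sched_set_itmt_core_prio(), whereas the hypothetical `pick_preferred_core()` below only shows how a ranking array of highest-performance values maps to a preferred CPU index.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of the amd-pstate driver: return the
 * index of the CPU with the largest highest_perf value, or -1 if the
 * input is empty. Ties go to the lowest CPU index.
 */
static int pick_preferred_core(const unsigned int *highest_perf, size_t nr_cpus)
{
	size_t cpu, best = 0;

	if (!highest_perf || nr_cpus == 0)
		return -1;

	for (cpu = 1; cpu < nr_cpus; cpu++) {
		/* A strictly larger value means a higher priority. */
		if (highest_perf[cpu] > highest_perf[best])
			best = cpu;
	}
	return (int)best;
}
```

For the made-up ranking values {166, 196, 186, 196}, CPU 1 is selected: it shares the largest value with CPU 3 and has the lower index.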

[PATCH] kunit: string-stream-test: Avoid cast warning when testing gfp_t flags

by Richard Fitzgerald

Passing a gfp_t to KUNIT_EXPECT_EQ() causes a cast warning:

  lib/kunit/string-stream-test.c:73:9: sparse: sparse: incorrect type in
  initializer (different base types) expected long long right_value
  got restricted gfp_t const __right

Avoid this by testing stream->gfp for the expected value and passing the boolean result of this comparison to KUNIT_EXPECT_TRUE(), as was already done a few lines above in string_stream_managed_init_test().

Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
Fixes: d1a0d699bfc0 ("kunit: string-stream: Add tests for freeing resource-managed string_stream")
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202311181918.0mpCu2Xh-lkp@intel.com/
---
 lib/kunit/string-stream-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/kunit/string-stream-test.c b/lib/kunit/string-stream-test.c
index 06822766f29a..03fb511826f7 100644
--- a/lib/kunit/string-stream-test.c
+++ b/lib/kunit/string-stream-test.c
@@ -72,7 +72,7 @@ static void string_stream_unmanaged_init_test(struct kunit *test)
 	KUNIT_EXPECT_EQ(test, stream->length, 0);
 	KUNIT_EXPECT_TRUE(test, list_empty(&stream->fragments));
-	KUNIT_EXPECT_EQ(test, stream->gfp, GFP_KERNEL);
+	KUNIT_EXPECT_TRUE(test, (stream->gfp == GFP_KERNEL));
 	KUNIT_EXPECT_FALSE(test, stream->append_newlines);
 	KUNIT_EXPECT_TRUE(test, string_stream_is_empty(stream));
--
2.30.2

[PATCH] KVM: selftests: add -MP to CFLAGS

by David Woodhouse

From: David Woodhouse <dwmw(a)amazon.co.uk>

Using -MD without -MP causes build failures when a header file is deleted or moved. With -MP, the compiler will emit phony targets for the header files it lists as dependencies, and the Makefiles won't refuse to attempt to rebuild a C unit which no longer includes the deleted header.

Signed-off-by: David Woodhouse <dwmw(a)amazon.co.uk>
---
 tools/testing/selftests/kvm/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index a3bb36fb3cfc..20ea549da570 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -211,7 +211,7 @@ else
 LINUX_TOOL_ARCH_INCLUDE = $(top_srcdir)/tools/arch/$(ARCH)/include
 endif
 CFLAGS += -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99 \
-	-Wno-gnu-variable-sized-type-not-at-end -MD\
+	-Wno-gnu-variable-sized-type-not-at-end -MD -MP \
 	-fno-builtin-memcmp -fno-builtin-memcpy -fno-builtin-memset \
 	-fno-builtin-strnlen \
 	-fno-stack-protector -fno-PIE -I$(LINUX_TOOL_INCLUDE) \
--
2.41.0

[PATCH v6 0/6] workload-specific and memory pressure-driven zswap writeback

by Nhat Pham

Changelog:
v6:
   * Rebase on top of latest mm-unstable.
   * Fix/improve the in-code documentation of the new list_lru manipulation functions (patch 1)
v5:
   * Replace reference getting with an rcu_read_lock() section for zswap lru modifications (suggested by Yosry)
   * Add a new prep patch that allows mem_cgroup_iter() to return online cgroup.
   * Add a callback that updates pool->next_shrink when the cgroup is offlined (suggested by Yosry Ahmed, Johannes Weiner)
v4:
   * Rename list_lru_add to list_lru_add_obj and __list_lru_add to list_lru_add (patch 1) (suggested by Johannes Weiner and Yosry Ahmed)
   * Some cleanups on the memcg aware LRU patch (patch 2) (suggested by Yosry Ahmed)
   * Use event interface for the new per-cgroup writeback counters. (patch 3) (suggested by Yosry Ahmed)
   * Abstract zswap's lruvec states and handling into zswap_lruvec_state (patch 5) (suggested by Yosry Ahmed)
v3:
   * Add a patch to export per-cgroup zswap writeback counters
   * Add a patch to update zswap's kselftest
   * Separate the new list_lru functions into its own prep patch
   * Do not start from the top of the hierarchy when encountering a memcg that is not online for the global limit zswap writeback (patch 2) (suggested by Yosry Ahmed)
   * Do not remove the swap entry from list_lru in __read_swapcache_async() (patch 2) (suggested by Yosry Ahmed)
   * Removed a redundant zswap pool getting (patch 2) (reported by Ryan Roberts)
   * Use atomic for the nr_zswap_protected (instead of lruvec's lock) (patch 5) (suggested by Yosry Ahmed)
   * Remove the per-cgroup zswap shrinker knob (patch 5) (suggested by Yosry Ahmed)
v2:
   * Fix loongarch compiler errors
   * Use pool stats instead of memcg stats when !CONFIG_MEMCG_KMEM

There are currently several issues with zswap writeback:

1. There is only a single global LRU for zswap, making it impossible to perform workload-specific shrinking - a memcg under memory pressure cannot determine which pages in the pool it owns, and often ends up writing pages from other memcgs.
This issue has been previously observed in practice and mitigated by simply disabling memcg-initiated shrinking:

https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u

But this solution leaves a lot to be desired, as we still do not have an avenue for a memcg to free up its own memory locked up in the zswap pool.

2. We only shrink the zswap pool when the user-defined limit is hit. This means that if we set the limit too high, cold data that are unlikely to be used again will reside in the pool, wasting precious memory. It is hard to predict how much zswap space will be needed ahead of time, as this depends on the workload (specifically, on factors such as memory access patterns and compressibility of the memory pages).

This patch series solves these issues by separating the global zswap LRU into per-memcg and per-NUMA LRUs, and performs workload-specific (i.e. memcg- and NUMA-aware) zswap writeback under memory pressure. The new shrinker does not have any parameter that must be tuned by the user, and can be opted in or out on a per-memcg basis.

As a proof of concept, we ran the following synthetic benchmark: build the linux kernel in a memory-limited cgroup, and allocate some cold data in tmpfs to see if the shrinker could write them out and improve the overall performance. Depending on the amount of cold data generated, we observe from 14% to 35% reduction in kernel CPU time used in the kernel builds.
Domenico Cerasuolo (3):
  zswap: make shrinking memcg-aware
  mm: memcg: add per-memcg zswap writeback stat
  selftests: cgroup: update per-memcg zswap writeback selftest

Nhat Pham (3):
  list_lru: allows explicit memcg and NUMA node selection
  memcontrol: allows mem_cgroup_iter() to check for onlineness
  zswap: shrinks zswap pool based on memory pressure

 Documentation/admin-guide/mm/zswap.rst      |   7 +
 drivers/android/binder_alloc.c              |   5 +-
 fs/dcache.c                                 |   8 +-
 fs/gfs2/quota.c                             |   6 +-
 fs/inode.c                                  |   4 +-
 fs/nfs/nfs42xattr.c                         |   8 +-
 fs/nfsd/filecache.c                         |   4 +-
 fs/xfs/xfs_buf.c                            |   6 +-
 fs/xfs/xfs_dquot.c                          |   2 +-
 fs/xfs/xfs_qm.c                             |   2 +-
 include/linux/list_lru.h                    |  54 ++-
 include/linux/memcontrol.h                  |   9 +-
 include/linux/mmzone.h                      |   2 +
 include/linux/vm_event_item.h               |   1 +
 include/linux/zswap.h                       |  27 +-
 mm/list_lru.c                               |  48 ++-
 mm/memcontrol.c                             |  20 +-
 mm/mmzone.c                                 |   1 +
 mm/shrinker.c                               |   4 +-
 mm/swap.h                                   |   3 +-
 mm/swap_state.c                             |  26 +-
 mm/vmscan.c                                 |  26 +-
 mm/vmstat.c                                 |   1 +
 mm/workingset.c                             |   4 +-
 mm/zswap.c                                  | 426 +++++++++++++++++---
 tools/testing/selftests/cgroup/test_zswap.c |  74 ++--
 26 files changed, 629 insertions(+), 149 deletions(-)

base-commit: 40b487ae2620fc9187fee68b09d2cb275de0d60e
--
2.34.1
