v2:
- Fix test_cpuset_prs.sh problems reported by test robot
- Relax restriction imposed between cpuset.cpus.exclusive and
cpuset.cpus of sibling cpusets.
- Make cpuset.cpus.exclusive independent of cpuset.cpus.
- Update test_cpuset_prs.sh accordingly.
[v1] https://lore.kernel.org/lkml/20240605171858.1323464-1-longman@redhat.com/
This patchset attempts to address the following cpuset issues.
1) While reviewing the generate_sched_domains() function, I found a bug
in generating sched domains for remote non-isolating partitions.
2) Test robot had reported a test_cpuset_prs.sh test failure.
3) The current exclusivity test between cpuset.cpus.exclusive and
cpuset.cpus and the restriction that the set effective exclusive
CPUs has to be a subset of cpuset.cpus make it harder to preconfigure
the cgroup hierarchy to enable remote partition.
The test_cpuset_prs.sh script is updated to match changes made in this
patchset and was run to verify that the new code did not cause any
regression.
Waiman Long (5):
cgroup/cpuset: Fix remote root partition creation problem
selftest/cgroup: Fix test_cpuset_prs.sh problems reported by test
robot
cgroup/cpuset: Delay setting of CS_CPU_EXCLUSIVE until valid partition
cgroup/cpuset: Make cpuset.cpus.exclusive independent of cpuset.cpus
selftest/cgroup: Update test_cpuset_prs.sh to match changes
Documentation/admin-guide/cgroup-v2.rst | 12 +-
kernel/cgroup/cpuset.c | 158 +++++++++++++-----
.../selftests/cgroup/test_cpuset_prs.sh | 75 ++++++---
3 files changed, 180 insertions(+), 65 deletions(-)
--
2.39.3
From: Allison Henderson <allison.henderson(a)oracle.com>
Hi All,
This series is a new selftest that Vegard, Chuck and myself have been
working on to provide some test coverage for rds. I've made quite a few
updates since the rfc sent a few weeks ago:
I've added several knobs to the script to tune network turbulance, and
documented their usage in the README.txt. By default these options
are left off.
Added an extra flag to specify log location
I've also added a flag to the config.sh to skip gcov configurations if
the coverage report is not desired. run.sh has been adapted to skip
the report if the required configs are not present, or if the required
packages are not available
A time out has been added to prevent the test from hanging
indefinitely
The previous gcov issues have been resolved with an appropriate gcov
patch, as well as some extra logic to detect incompatible gcov and gcc
versions.
The shellcheck nits reported in the last review have been addressed
In order to return an appropriate exit code, the run.sh script has
been adapted to analyze the test.py strace, and determine if the test
passed, failed or timed out.
RDS specific GCOV configs have been documented under
Documentation/dev-tools/gcov.rst
Questions and comments appreciated. Thanks everyone!
Allison
Vegard Nossum (3):
.gitignore: add .gcda files
net: rds: add option for GCOV profiling
selftests: rds: add testing infrastructure
.gitignore | 1 +
Documentation/dev-tools/gcov.rst | 11 +
MAINTAINERS | 1 +
net/rds/Kconfig | 9 +
net/rds/Makefile | 5 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/net/rds/Makefile | 13 +
tools/testing/selftests/net/rds/README.txt | 41 ++++
tools/testing/selftests/net/rds/config.sh | 56 +++++
tools/testing/selftests/net/rds/init.sh | 69 ++++++
tools/testing/selftests/net/rds/run.sh | 271 +++++++++++++++++++++
tools/testing/selftests/net/rds/test.py | 251 +++++++++++++++++++
12 files changed, 729 insertions(+)
create mode 100644 tools/testing/selftests/net/rds/Makefile
create mode 100644 tools/testing/selftests/net/rds/README.txt
create mode 100755 tools/testing/selftests/net/rds/config.sh
create mode 100755 tools/testing/selftests/net/rds/init.sh
create mode 100755 tools/testing/selftests/net/rds/run.sh
create mode 100644 tools/testing/selftests/net/rds/test.py
--
2.25.1
To verify IFS (In Field Scan [1]) driver functionality, add the following 6
test cases:
1. Verify that IFS sysfs entries are created after loading the IFS module
2. Check if loading an invalid IFS test image fails and loading a valid
one succeeds
3. Perform IFS scan test on each CPU using all the available image files
4. Perform IFS scan with first test image file on a random CPU for 3
rounds
5. Perform IFS ARRAY BIST(Board Integrated System Test) test on each CPU
6. Perform IFS ARRAY BIST test on a random CPU for 3 rounds
These are not exhaustive, but some minimal test runs to check various
parts of the driver. Some negative tests are also included.
[1] https://docs.kernel.org/arch/x86/ifs.html
Pengfei Xu (4):
selftests: ifs: verify test interfaces are created by the driver
selftests: ifs: verify test image loading functionality
selftests: ifs: verify IFS scan test functionality
selftests: ifs: verify IFS ARRAY BIST functionality
MAINTAINERS | 1 +
tools/testing/selftests/Makefile | 1 +
.../drivers/platform/x86/intel/ifs/Makefile | 6 +
.../platform/x86/intel/ifs/test_ifs.sh | 494 ++++++++++++++++++
4 files changed, 502 insertions(+)
create mode 100644 tools/testing/selftests/drivers/platform/x86/intel/ifs/Makefile
create mode 100755 tools/testing/selftests/drivers/platform/x86/intel/ifs/test_ifs.sh
---
Changes:
v1 to v2:
- Rebase to v6.10 cycle kernel and resolve some code conflicts
- Improved checking of IFS ARRAY_BIST support by leveraging sysfs entry
methods (suggested by Ashok)
--
2.43.0
In this series, 4 tests are being conformed to TAP.
Muhammad Usama Anjum (4):
selftests: x86: check_initial_reg_state: conform test to TAP format
output
selftests: x86: corrupt_xstate_header: conform test to TAP format
output
selftests: fsgsbase_restore: conform test to TAP format output
selftests: entry_from_vm86: conform test to TAP format output
.../selftests/x86/check_initial_reg_state.c | 24 ++--
.../selftests/x86/corrupt_xstate_header.c | 30 +++--
tools/testing/selftests/x86/entry_from_vm86.c | 109 ++++++++--------
.../testing/selftests/x86/fsgsbase_restore.c | 117 +++++++++---------
4 files changed, 139 insertions(+), 141 deletions(-)
--
2.39.2
Don't print that 88 sub-tests are going to be executed. But then skip.
The error is printed that executed test was only 1 while 88 should have
run:
Old output:
TAP version 13
1..88
ok 2 # SKIP all tests require euid == 0
# Planned tests != run tests (88 != 1)
# Totals: pass:0 fail:0 xfail:0 xpass:0 skip:1 error:0
New and correct output:
TAP version 13
1..0 # SKIP all tests require euid == 0
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
tools/testing/selftests/openat2/resolve_test.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/openat2/resolve_test.c b/tools/testing/selftests/openat2/resolve_test.c
index bbafad440893c..5472ec478d227 100644
--- a/tools/testing/selftests/openat2/resolve_test.c
+++ b/tools/testing/selftests/openat2/resolve_test.c
@@ -508,12 +508,13 @@ void test_openat2_opath_tests(void)
int main(int argc, char **argv)
{
ksft_print_header();
- ksft_set_plan(NUM_TESTS);
/* NOTE: We should be checking for CAP_SYS_ADMIN here... */
- if (geteuid() != 0)
+ if (geteuid())
ksft_exit_skip("all tests require euid == 0\n");
+ ksft_set_plan(NUM_TESTS);
+
test_openat2_opath_tests();
if (ksft_get_fail_cnt() + ksft_get_error_cnt() > 0)
--
2.39.2
Changes from PATCH v1 -> v2:
- Updated selftest to use ksft_test_result_code instead of switch-case
(Muhammad Usama Anjum)
- Included more use cases in the cover letter
(Huang, Ying)
- Added documentation for sysfs and memcg interfaces
- Added an aging-specific struct lru_gen_mm_walk in struct pglist_data
to avoid allocating for each lruvec.
Changes from RFC v3 -> PATCH v1:
- Updated selftest to use ksft_print_msg instead of fprintf(stderr, ...)
(Muhammad Usama Anjum)
- Included more detail in patch skipping pmd_young with force_scan
(Huang, Ying)
- Deferred reaccess histogram as a followup
- Removed per-memcg page age interval configs for simplicity
Changes from RFC v2 -> RFC v3:
- Update to v6.8
- Added an aging kernel thread (gated behind config)
- Added basic selftests for sysfs interface files
- Track swapped out pages for reaccesses
- Refactoring and cleanup
- Dropped the virtio-balloon extension to make things manageable
Changes from RFC v1 -> RFC v2:
- Refactored the patchs into smaller pieces
- Renamed interfaces and functions from wss to wsr (Working Set Reporting)
- Fixed build errors when CONFIG_WSR is not set
- Changed working_set_num_bins to u8 for virtio-balloon
- Added support for per-NUMA node reporting for virtio-balloon
[rfc v1]
https://lore.kernel.org/linux-mm/20230509185419.1088297-1-yuanchu@google.co…
[rfc v2]
https://lore.kernel.org/linux-mm/20230621180454.973862-1-yuanchu@google.com/
[rfc v3]
https://lore.kernel.org/linux-mm/20240327213108.2384666-1-yuanchu@google.co…
This patch series provides workingset reporting of user pages in
lruvecs, of which coldness can be tracked by accessed bits and fd
references. However, the concept of workingset applies generically to
all types of memory, which could be kernel slab caches, discardable
userspace caches (databases), or CXL.mem. Therefore, data sources might
come from slab shrinkers, device drivers, or the userspace. IMO, the
kernel should provide a set of workingset interfaces that should be
generic enough to accommodate the various use cases, and be extensible
to potential future use cases. The current proposed interfaces are not
sufficient in that regard, but I would like to start somewhere, solicit
feedback, and iterate.
Use cases
==========
Job scheduling
On overcommitted hosts, workingset information allows the job scheduler
to right-size each job and land more jobs on the same host or NUMA node,
and in the case of a job with increasing workingset, policy decisions
can be made to migrate other jobs off the host/NUMA node, or oom-kill
the misbehaving job. If the job shape is very different from the machine
shape, knowing the workingset per-node can also help inform page
allocation policies.
Proactive reclaim
Workingset information allows the a container manager to proactively
reclaim memory while not impacting a job's performance. While PSI may
provide a reactive measure of when a proactive reclaim has reclaimed too
much, workingset reporting allows the policy to be more accurate and
flexible.
Ballooning (similar to proactive reclaim)
While this patch series does not extend the virtio-balloon device,
balloon policies benefit from workingset to more precisely determine
the size of the memory balloon. On desktops/laptops/mobile devices where
memory is scarce and overcommitted, the balloon sizing in multiple VMs
running on the same device can be orchestrated with workingset reports
from each one.
Promotion/Demotion
If different mechanisms are used for promition and demotion, workingset
information can help connect the two and avoid pages being migrated back
and forth.
For example, given a promotion hot page threshold defined in reaccess
distance of N seconds (promote pages accessed more often than every N
seconds). The threshold N should be set so that ~80% (e.g.) of pages on
the fast memory node passes the threshold. This calculation can be done
with workingset reports.
To be directly useful for promotion policies, the workingset report
interfaces need to be extended to report hotness and gather hotness
information from the devices[1].
[1]
https://www.opencompute.org/documents/ocp-cms-hotness-tracking-requirements…
Sysfs and Cgroup Interfaces
==========
The interfaces are detailed in the patches that introduce them. The main
idea here is we break down the workingset per-node per-memcg into time
intervals (ms), e.g.
1000 anon=137368 file=24530
20000 anon=34342 file=0
30000 anon=353232 file=333608
40000 anon=407198 file=206052
9223372036854775807 anon=4925624 file=892892
I realize this does not generalize well to hotness information, but I
lack the intuition for an abstraction that presents hotness in a useful
way. Based on a recent proposal for move_phys_pages[2], it seems like
userspace tiering software would like to move specific physical pages,
instead of informing the kernel "move x number of hot pages to y
device". Please advise.
[2]
https://lore.kernel.org/lkml/20240319172609.332900-1-gregory.price@memverge…
Implementation
==========
Currently, the reporting of user pages is based off of MGLRU, and
therefore requires CONFIG_LRU_GEN=y. We would benefit from more MGLRU
generations for a more fine-grained workingset report. I will make the
generation count configurable in the next version. The workingset
reporting mechanism is gated behind CONFIG_WORKINGSET_REPORT, and the
aging thread is behind CONFIG_WORKINGSET_REPORT_AGING.
Yuanchu Xie (8):
mm: multi-gen LRU: ignore non-leaf pmd_young for force_scan=true
mm: aggregate working set information into histograms
mm: use refresh interval to rate-limit workingset report aggregation
mm: report workingset during memory pressure driven scanning
mm: extend working set reporting to memcgs
mm: add kernel aging thread for workingset reporting
selftest: test system-wide workingset reporting
Docs/admin-guide/mm/workingset_report: document sysfs and memcg
interfaces
Documentation/admin-guide/mm/index.rst | 1 +
.../admin-guide/mm/workingset_report.rst | 105 ++++
drivers/base/node.c | 6 +
include/linux/memcontrol.h | 5 +
include/linux/mmzone.h | 9 +
include/linux/workingset_report.h | 97 +++
mm/Kconfig | 15 +
mm/Makefile | 2 +
mm/internal.h | 18 +
mm/memcontrol.c | 184 +++++-
mm/mm_init.c | 2 +
mm/mmzone.c | 2 +
mm/vmscan.c | 58 +-
mm/workingset_report.c | 561 ++++++++++++++++++
mm/workingset_report_aging.c | 127 ++++
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 3 +
tools/testing/selftests/mm/run_vmtests.sh | 5 +
.../testing/selftests/mm/workingset_report.c | 306 ++++++++++
.../testing/selftests/mm/workingset_report.h | 39 ++
.../selftests/mm/workingset_report_test.c | 329 ++++++++++
21 files changed, 1869 insertions(+), 6 deletions(-)
create mode 100644 Documentation/admin-guide/mm/workingset_report.rst
create mode 100644 include/linux/workingset_report.h
create mode 100644 mm/workingset_report.c
create mode 100644 mm/workingset_report_aging.c
create mode 100644 tools/testing/selftests/mm/workingset_report.c
create mode 100644 tools/testing/selftests/mm/workingset_report.h
create mode 100644 tools/testing/selftests/mm/workingset_report_test.c
--
2.45.1.467.gbab1589fc0-goog