Hi all:
The core frequency is subjected to the process variation in semiconductors.
Not all cores are able to reach the maximum frequency respecting the
infrastructure limits. Consequently, AMD has redefined the concept of
maximum frequency of a part. This means that a fraction of cores can reach
maximum frequency. To find the best process scheduling policy for a given
scenario, OS needs to know the core ordering informed by the platform through
highest performance capability register of the CPPC interface.
Earlier implementations of amd-pstate preferred core only support a static
core ranking and targeted performance. Now it has the ability to dynamically
change the preferred core based on the workload and platform conditions and
accounting for thermals and aging.
Amd-pstate driver utilizes the functions and data structures provided by
the ITMT architecture to enable the scheduler to favor scheduling on cores
which can be get a higher frequency with lower voltage.
We call it amd-pstate preferred core.
Here sched_set_itmt_core_prio() is called to set priorities and
sched_set_itmt_support() is called to enable ITMT feature.
Amd-pstate driver uses the highest performance value to indicate
the priority of CPU. The higher value has a higher priority.
Amd-pstate driver will provide an initial core ordering at boot time.
It relies on the CPPC interface to communicate the core ranking to the
operating system and scheduler to make sure that OS is choosing the cores
with highest performance firstly for scheduling the process. When amd-pstate
driver receives a message with the highest performance change, it will
update the core ranking.
Changes from V9->V10:
- cpufreq: amd-pstate:
- - add judgement for highest_perf. When it is less than 255, the
preferred core feature is enabled. And it will set the priority.
- - deleset "static u32 max_highest_perf" etc, because amd p-state
perferred coe does not require specail process for hotpulg.
Changes form V8->V9:
- all:
- - pick up Tested-By flag added by Oleksandr.
- cpufreq: amd-pstate:
- - pick up Review-By flag added by Wyes.
- - ignore modification of bug.
- - add a attribute of prefcore_ranking.
- - modify data type conversion from u32 to int.
- Documentation: amd-pstate:
- - pick up Review-By flag added by Wyes.
Changes form V7->V8:
- all:
- - pick up Review-By flag added by Mario and Ray.
- cpufreq: amd-pstate:
- - use hw_prefcore embeds into cpudata structure.
- - delete preferred core init from cpu online/off.
Changes form V6->V7:
- x86:
- - Modify kconfig about X86_AMD_PSTATE.
- cpufreq: amd-pstate:
- - modify incorrect comments about scheduler_work().
- - convert highest_perf data type.
- - modify preferred core init when cpu init and online.
- acpi: cppc:
- - modify link of CPPC highest performance.
- cpufreq:
- - modify link of CPPC highest performance changed.
Changes form V5->V6:
- cpufreq: amd-pstate:
- - modify the wrong tag order.
- - modify warning about hw_prefcore sysfs attribute.
- - delete duplicate comments.
- - modify the variable name cppc_highest_perf to prefcore_ranking.
- - modify judgment conditions for setting highest_perf.
- - modify sysfs attribute for CPPC highest perf to pr_debug message.
- Documentation: amd-pstate:
- - modify warning: title underline too short.
Changes form V4->V5:
- cpufreq: amd-pstate:
- - modify sysfs attribute for CPPC highest perf.
- - modify warning about comments
- - rebase linux-next
- cpufreq:
- - Moidfy warning about function declarations.
- Documentation: amd-pstate:
- - align with ``amd-pstat``
Changes form V3->V4:
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
Changes form V2->V3:
- x86:
- - Modify kconfig and description.
- cpufreq: amd-pstate:
- - Add Co-developed-by tag in commit message.
- cpufreq:
- - Modify commit message.
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
Changes form V1->V2:
- acpi: cppc:
- - Add reference link.
- cpufreq:
- - Moidfy link error.
- cpufreq: amd-pstate:
- - Init the priorities of all online CPUs
- - Use a single variable to represent the status of preferred core.
- Documentation:
- - Default enabled preferred core.
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
- - Default enabled preferred core.
- - Use a single variable to represent the status of preferred core.
Meng Li (7):
x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for the expansion.
acpi: cppc: Add get the highest performance cppc control
cpufreq: amd-pstate: Enable amd-pstate preferred core supporting.
cpufreq: Add a notification message that the highest perf has changed
cpufreq: amd-pstate: Update amd-pstate preferred core ranking
dynamically
Documentation: amd-pstate: introduce amd-pstate preferred core
Documentation: introduce amd-pstate preferrd core mode kernel command
line options
.../admin-guide/kernel-parameters.txt | 5 +
Documentation/admin-guide/pm/amd-pstate.rst | 59 +++++-
arch/x86/Kconfig | 5 +-
drivers/acpi/cppc_acpi.c | 13 ++
drivers/acpi/processor_driver.c | 6 +
drivers/cpufreq/amd-pstate.c | 187 ++++++++++++++++--
drivers/cpufreq/cpufreq.c | 13 ++
include/acpi/cppc_acpi.h | 5 +
include/linux/amd-pstate.h | 10 +
include/linux/cpufreq.h | 5 +
10 files changed, 288 insertions(+), 20 deletions(-)
--
2.34.1
When we dynamically generate a name for a configuration in get-reg-list
we use strcat() to append to a buffer allocated using malloc() but we
never initialise that buffer. Since malloc() offers no guarantees
regarding the contents of the memory it returns this can lead to us
corrupting, and likely overflowing, the buffer:
vregs: PASS
vregs+pmu: PASS
sve: PASS
sve+pmu: PASS
vregs+pauth_address+pauth_generic: PASS
X�vr+gspauth_addre+spauth_generi+pmu: PASS
Initialise the buffer to an empty string to avoid this.
Fixes: 2f9ace5d4557 ("KVM: arm64: selftests: get-reg-list: Introduce vcpu configs")
Reviewed-by: Andrew Jones <ajones(a)ventanamicro.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v3:
- Rebase this bugfix onto v6.7-rc1
- Link to v2: https://lore.kernel.org/r/20231017-kvm-get-reg-list-str-init-v2-1-ee30b1df3…
Changes in v2:
- Update Fixes: tag.
- Link to v1: https://lore.kernel.org/r/20231013-kvm-get-reg-list-str-init-v1-1-034f370ff…
---
tools/testing/selftests/kvm/get-reg-list.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/kvm/get-reg-list.c b/tools/testing/selftests/kvm/get-reg-list.c
index be7bf5224434..dd62a6976c0d 100644
--- a/tools/testing/selftests/kvm/get-reg-list.c
+++ b/tools/testing/selftests/kvm/get-reg-list.c
@@ -67,6 +67,7 @@ static const char *config_name(struct vcpu_reg_list *c)
c->name = malloc(len);
+ c->name[0] = '\0';
len = 0;
for_each_sublist(c, s) {
if (!strcmp(s->name, "base"))
---
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
change-id: 20231012-kvm-get-reg-list-str-init-76c8ed4e19d6
Best regards,
--
Mark Brown <broonie(a)kernel.org>
v2:
- Add 2 read-only workqueue sysfs files to expose the user requested
cpumask as well as the isolated CPUs to be excluded from
wq_unbound_cpumask.
- Ensure that caller of the new workqueue_unbound_exclude_cpumask()
hold cpus_read_lock.
- Update the cpuset code to make sure the cpus_read_lock is held
whenever workqueue_unbound_exclude_cpumask() may be called.
Isolated cpuset partition can currently be created to contain an
exclusive set of CPUs not used in other cgroups and with load balancing
disabled to reduce interference from the scheduler.
The main purpose of this isolated partition type is to dynamically
emulate what can be done via the "isolcpus" boot command line option,
specifically the default domain flag. One effect of the "isolcpus" option
is to remove the isolated CPUs from the cpumasks of unbound workqueues
since running work functions in an isolated CPU can be a major source
of interference. Changing the unbound workqueue cpumasks can be done at
run time by writing an appropriate cpumask without the isolated CPUs to
/sys/devices/virtual/workqueue/cpumask. So one can set up an isolated
cpuset partition and then write to the cpumask sysfs file to achieve
similar level of CPU isolation. However, this manual process can be
error prone.
This patch series implements automatic exclusion of isolated CPUs from
unbound workqueue cpumasks when an isolated cpuset partition is created
and then adds those CPUs back when the isolated partition is destroyed.
There are also other places in the kernel that look at the HK_FLAG_DOMAIN
cpumask or other HK_FLAG_* cpumasks and exclude the isolated CPUs from
certain actions to further reduce interference. CPUs in an isolated
cpuset partition will not be able to avoid those interferences yet. That
may change in the future as the need arises.
Waiman Long (4):
workqueue: Add workqueue_unbound_exclude_cpumask() to exclude CPUs
from wq_unbound_cpumask
selftests/cgroup: Minor code cleanup and reorganization of
test_cpuset_prs.sh
cgroup/cpuset: Keep track of CPUs in isolated partitions
cgroup/cpuset: Take isolated CPUs out of workqueue unbound cpumask
Documentation/admin-guide/cgroup-v2.rst | 10 +-
include/linux/workqueue.h | 2 +-
kernel/cgroup/cpuset.c | 286 +++++++++++++-----
kernel/workqueue.c | 91 +++++-
.../selftests/cgroup/test_cpuset_prs.sh | 216 ++++++++-----
5 files changed, 438 insertions(+), 167 deletions(-)
--
2.39.3
Another candidate of the subject was "Let users feed and tame DAMOS".
DAMOS Control Difficulty
========================
DAMOS helps users easily implementing effective access pattern aware
system operations. However, controlling DAMOS in wild is not that easy.
The basic way to control DAMON is specifying the target access pattern.
Hence, the user is assumed to know the access pattern of the system and
the workloads well. Though some good tools including DAMON can help
that, it requires time and resource, and the cost depends on the
complexity and the dynamicity of the system and workloads. After all,
the access pattern consist of three ranges, namely ranges of access
rate, age, and size of the regions. Tuning six parameters is already
complex. It is not doable for everyone.
To ease the control, DAMOS allows users to set the upper-limit of the
schemes's aggressiveness, namely DAMOS quota. Then DAMOS prioritizes
regions to apply the action under the limit based on the action and the
access pattern of the regions. For example, use can ask DAMOS to page
out up to 100 MiB of memory regions per second. Then DAMOS pages out
regions that not accessed for longer time first under the limit. This
allows users to set access pattern bit more naively, and focus on only
the one parameter, the quota. That is, the number of parameters to tune
with special care can be reduced from six to one.
Still, however, the optimal value for the quota depends on the system
and the workloads' characteristics, so not that simple. The number of
parameters to tune can also increase again if the user needs to run
multiple schemes, e.g., collapsing hot pages into THP while splitting
cold pages into regular pages.
In short, the existing approach asks users to find the perfect or
adapted tuning and instruct DAMOS how to work. It requires users to be
deligent. That's not a virtue of human, but the machine.
Aim-oriented Feedback-driven DAMOS Quota Auto Tuning
====================================================
Most users would start using DAMOS since there is something they want to
achieve with DAMOS. Having such goal metrics like SLO is common.
Hence, a better approach would be letting users inform DAMOS what they
aim to achieve, and how well DAMOS is doing that. Then DAMOS can
somehow make it. In detail, users provide feedback for each DAMOS
scheme. DAMOS then tune the quota of each scheme based on the users'
feedback and the current quota values.
This patchset implements the idea.
Implementation
--------------
The core logic implementation is in the first patch. In short, it uses
below simple feedback loop algorithm to get next aggressiveness from the
current aggressiveness and the feedback (target_core and current_score)
for the current aggressiveness.
f(n, target_score, current_score) =
max(f(n - 1) * ((target_score - current_score) / target_score + 1), 1)
Note that this algorithm assumes the aggressiveness and the score are
positively proportional. Making it true is the feedback provider's
responsibility.
Test Results
------------
To show if this provides the expected benefit, we extend the performance
tests of damon-tests suite to support virtual address space-based
proactive reclamation scheme that aims 0.5% last 10 seconds some memory
PSI. The test suite runs PARSEC3 and SPLASH-2X workloads with the
scheme and measure the runtime, the RSS, and the PSI for memory (some).
We do same with the same scheme but not having the goal, and yet another
variant of it that the target access patterns of the scheme is tuned for
each workload, in a offline-tuning approach named DAMOOS[1].
The results that normalized to the output that made without any scheme
are as below. The PSI for original run (without any scheme) was zero.
To avoid divide-by-zero, we normalize the value to that of Not-tuned
scheme's result.
xx Not-tuned Offline-tuned Online-tuned
RSS 0.622688178226118 0.787950678944904 0.740093483278979
runtime 1.11767826657912 1.0564674983585 1.0910833880499
PSI 1 0.727521443794069 0.308498846350299
The not-tuned scheme acheives about 38.7% memory saving but incur about
11.7% runtime slowdown. The offline-tuned scheme achieves about 22.2%
memory saving with about 5.5% runtiem slowdown. It also achieves about
28.2% PSI saving. The online-tuned scheme achieves about 26% memory
saving with about 9.1% runtime slowdown. It also achieves about 69.1%
PSI saving. Given the online-tuned version is using this RFC level
implementation and the goal (0.5% last-10 secs memory PSI) was made
after only a few experiments within a day, I think this results show
some potential of this feedback-driven auto tuning approach.
The test code is available[2], so you can reproduce on your system.
[1] https://www.amazon.science/publications/daos-data-access-aware-operating-sy…
[2] https://github.com/damonitor/damon-tests/commit/3f884e61193f0166b8724554b6d…
Patches Sequence
================
The first four patches implement the core logic and user interfaces for
the auto tuning. The first patch implements the core logic for the auto
tuning, and the API for DAMOS users in the kernel space. The second
patch implements basic file operations of DAMON sysfs directories and
files that will be used for setting the goals and providing the
feedback. The third patch connects the quota goals files inputs to the
DAMOS core logic. Finally the fourth patch implements a dedicated DAMOS
sysfs command for efficiently committing the quota goals feedback.
Two patches for simple test of the logic and interfaces follow. The
fifth patch implements the core logic unit test. The sixth patch
implements a selftest for the DAMON Sysfs interface for the goals.
Finally, two patches for documentation follows. The seventh patch
documents the design of the feature. The final eighth patch updates the
usage document for the features.
SeongJae Park (8):
mm/damon/core: implement goal-oriented feedback-driven quota
auto-tuning
mm/damon/sysfs-schemes: implement scheme quota goals directory
mm/damon/sysfs-schemes: commit damos quota goals user input to DAMOS
quota auto-tuning
mm/damon/sysfs-schemes: implement a command for scheme quota goals
only commit
mm/damon/core-test: add a unit test for the feedback loop algorithm
selftests/damon: test quota goals directory
Docs/mm/damon/design: Document DAMOS quota auto tuning
Docs/admin-guide/mm/damon/usage: update for quota goals
Documentation/admin-guide/mm/damon/usage.rst | 25 +-
Documentation/mm/damon/design.rst | 11 +
include/linux/damon.h | 19 ++
mm/damon/core-test.h | 32 +++
mm/damon/core.c | 65 ++++-
mm/damon/sysfs-common.h | 3 +
mm/damon/sysfs-schemes.c | 272 ++++++++++++++++++-
mm/damon/sysfs.c | 27 ++
tools/testing/selftests/damon/sysfs.sh | 27 ++
9 files changed, 463 insertions(+), 18 deletions(-)
base-commit: 4f26b84c39fbc6b03208674681bfde06e0bce25a
--
2.34.1
Adds a check to verify if the rtc device file is valid or not
and prints a useful error message if the file is not accessible.
Signed-off-by: Atul Kumar Pant <atulpant.linux(a)gmail.com>
---
changes since v5:
Updated error message to use strerror().
If the rtc file is invalid, the skip the test.
changes since v4:
Updated the commit message.
changes since v3:
Added Linux-kselftest and Linux-kernel mailing lists.
changes since v2:
Changed error message when rtc file does not exist.
changes since v1:
Removed check for uid=0
If rtc file is invalid, then exit the test.
tools/testing/selftests/rtc/rtctest.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/rtc/rtctest.c b/tools/testing/selftests/rtc/rtctest.c
index 630fef735c7e..27b466111885 100644
--- a/tools/testing/selftests/rtc/rtctest.c
+++ b/tools/testing/selftests/rtc/rtctest.c
@@ -15,6 +15,7 @@
#include <sys/types.h>
#include <time.h>
#include <unistd.h>
+#include <error.h>
#include "../kselftest_harness.h"
#include "../kselftest.h"
@@ -437,7 +438,7 @@ int main(int argc, char **argv)
if (access(rtc_file, F_OK) == 0)
ret = test_harness_run(argc, argv);
else
- ksft_exit_fail_msg("[ERROR]: Cannot access rtc file %s - Exiting\n", rtc_file);
+ ksft_exit_skip("%s: %s\n", rtc_file, strerror(errno));
return ret;
}
--
2.25.1
This series extends KVM RISC-V to allow Guest/VM discover and use
conditional operations related ISA extensions (namely XVentanaCondOps
and Zicond).
To try these patches, use KVMTOOL from riscv_zbx_zicntr_smstateen_condops_v1
branch at: https://github.com/avpatel/kvmtool.git
These patches are based upon the latest riscv_kvm_queue and can also be
found in the riscv_kvm_condops_v3 branch at:
https://github.com/avpatel/linux.git
Changes since v2:
- Dropped patch1, patch2, and patch5 since these patches don't meet
the requirements of patch acceptance policy.
Changes since v1:
- Rebased the series on riscv_kvm_queue
- Split PATCH1 and PATCH2 of v1 series into two patches
- Added separate test configs for XVentanaCondOps and Zicond in PATCH7
of v1 series.
Anup Patel (6):
dt-bindings: riscv: Add Zicond extension entry
RISC-V: Detect Zicond from ISA string
RISC-V: KVM: Allow Zicond extension for Guest/VM
KVM: riscv: selftests: Add senvcfg register to get-reg-list test
KVM: riscv: selftests: Add smstateen registers to get-reg-list test
KVM: riscv: selftests: Add condops extensions to get-reg-list test
.../devicetree/bindings/riscv/extensions.yaml | 6 +++
arch/riscv/include/asm/hwcap.h | 1 +
arch/riscv/include/uapi/asm/kvm.h | 1 +
arch/riscv/kernel/cpufeature.c | 1 +
arch/riscv/kvm/vcpu_onereg.c | 2 +
.../selftests/kvm/riscv/get-reg-list.c | 54 +++++++++++++++++++
6 files changed, 65 insertions(+)
--
2.34.1
Hello, all:
This list is being migrated to new vger infrastructure. No action is required
on your part and there will be no change in how you interact with this list
after the migration is completed.
There will be a short 30-minute delay to the list archives on lore.kernel.org.
Once the backend work is done, I will follow up with another message.
-K