The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 964209202ebe1569c858337441e87ef0f9d71416 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to 'stable@vger.kernel.org' --in-reply-to '2025070817-quaintly-lend-80a3@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 964209202ebe1569c858337441e87ef0f9d71416 Mon Sep 17 00:00:00 2001 From: Zhang Rui rui.zhang@intel.com Date: Thu, 19 Jun 2025 15:13:40 +0800 Subject: [PATCH] powercap: intel_rapl: Do not change CLAMPING bit if ENABLE bit cannot be changed
PL1 cannot be disabled on some platforms. The ENABLE bit is still set after software clears it. This behavior leads to a scenario where, upon user request to disable the Power Limit through the powercap sysfs, the ENABLE bit remains set while the CLAMPING bit is inadvertently cleared.
According to the Intel Software Developer's Manual, the CLAMPING bit, "When set, allows the processor to go below the OS requested P states in order to maintain the power below specified Platform Power Limit value."
Thus this means the system may operate at higher power levels than intended on such platforms.
Enhance the code to check ENABLE bit after writing to it, and stop further processing if ENABLE bit cannot be changed.
Reported-by: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Fixes: 2d281d8196e3 ("PowerCap: Introduce Intel RAPL power capping driver") Cc: All applicable stable@vger.kernel.org Signed-off-by: Zhang Rui rui.zhang@intel.com Link: https://patch.msgid.link/20250619071340.384782-1-rui.zhang@intel.com [ rjw: Use str_enabled_disabled() instead of open-coded equivalent ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com
diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c index e3be40adc0d7..faa0b6bc5b53 100644 --- a/drivers/powercap/intel_rapl_common.c +++ b/drivers/powercap/intel_rapl_common.c @@ -341,12 +341,28 @@ static int set_domain_enable(struct powercap_zone *power_zone, bool mode) { struct rapl_domain *rd = power_zone_to_rapl_domain(power_zone); struct rapl_defaults *defaults = get_defaults(rd->rp); + u64 val; int ret;
cpus_read_lock(); ret = rapl_write_pl_data(rd, POWER_LIMIT1, PL_ENABLE, mode); - if (!ret && defaults->set_floor_freq) + if (ret) + goto end; + + ret = rapl_read_pl_data(rd, POWER_LIMIT1, PL_ENABLE, false, &val); + if (ret) + goto end; + + if (mode != val) { + pr_debug("%s cannot be %s\n", power_zone->name, + str_enabled_disabled(mode)); + goto end; + } + + if (defaults->set_floor_freq) defaults->set_floor_freq(rd, mode); + +end: cpus_read_unlock();
return ret;
From: Zhang Rui rui.zhang@intel.com
[ Upstream commit e8e28c2af16b279b6c37d533e1e73effb197cf2e ]
rapl_defaults is Interface specific.
Although current MSR and MMIO Interface share the same rapl_defaults, new Interface like TPMI need its own rapl_defaults callbacks.
Save the rapl_defaults information in the Interface private structure.
No functional change.
Signed-off-by: Zhang Rui rui.zhang@intel.com Tested-by: Wang Wendy wendy.wang@intel.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Stable-dep-of: 964209202ebe ("powercap: intel_rapl: Do not change CLAMPING bit if ENABLE bit cannot be changed") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/powercap/intel_rapl_common.c | 46 ++++++++++++++++++++-------- include/linux/intel_rapl.h | 2 ++ 2 files changed, 35 insertions(+), 13 deletions(-)
diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c index 26d00b1853b42..6801eb2d299cc 100644 --- a/drivers/powercap/intel_rapl_common.c +++ b/drivers/powercap/intel_rapl_common.c @@ -115,6 +115,11 @@ struct rapl_defaults { }; static struct rapl_defaults *rapl_defaults;
+static struct rapl_defaults *get_defaults(struct rapl_package *rp) +{ + return rp->priv->defaults; +} + /* Sideband MBI registers */ #define IOSF_CPU_POWER_BUDGET_CTL_BYT (0x2) #define IOSF_CPU_POWER_BUDGET_CTL_TNG (0xdf) @@ -227,14 +232,15 @@ static int find_nr_power_limit(struct rapl_domain *rd) static int set_domain_enable(struct powercap_zone *power_zone, bool mode) { struct rapl_domain *rd = power_zone_to_rapl_domain(power_zone); + struct rapl_defaults *defaults = get_defaults(rd->rp);
if (rd->state & DOMAIN_STATE_BIOS_LOCKED) return -EACCES;
cpus_read_lock(); rapl_write_data_raw(rd, PL1_ENABLE, mode); - if (rapl_defaults->set_floor_freq) - rapl_defaults->set_floor_freq(rd, mode); + if (defaults->set_floor_freq) + defaults->set_floor_freq(rd, mode); cpus_read_unlock();
return 0; @@ -551,6 +557,7 @@ static void rapl_init_domains(struct rapl_package *rp) enum rapl_domain_type i; enum rapl_domain_reg_id j; struct rapl_domain *rd = rp->domains; + struct rapl_defaults *defaults = get_defaults(rp);
for (i = 0; i < RAPL_DOMAIN_MAX; i++) { unsigned int mask = rp->domain_map & (1 << i); @@ -592,14 +599,14 @@ static void rapl_init_domains(struct rapl_package *rp) switch (i) { case RAPL_DOMAIN_DRAM: rd->domain_energy_unit = - rapl_defaults->dram_domain_energy_unit; + defaults->dram_domain_energy_unit; if (rd->domain_energy_unit) pr_info("DRAM domain energy unit %dpj\n", rd->domain_energy_unit); break; case RAPL_DOMAIN_PLATFORM: rd->domain_energy_unit = - rapl_defaults->psys_domain_energy_unit; + defaults->psys_domain_energy_unit; if (rd->domain_energy_unit) pr_info("Platform domain energy unit %dpj\n", rd->domain_energy_unit); @@ -616,6 +623,7 @@ static u64 rapl_unit_xlate(struct rapl_domain *rd, enum unit_type type, { u64 units = 1; struct rapl_package *rp = rd->rp; + struct rapl_defaults *defaults = get_defaults(rp); u64 scale = 1;
switch (type) { @@ -631,7 +639,7 @@ static u64 rapl_unit_xlate(struct rapl_domain *rd, enum unit_type type, units = rp->energy_unit; break; case TIME_UNIT: - return rapl_defaults->compute_time_window(rp, value, to_raw); + return defaults->compute_time_window(rp, value, to_raw); case ARBITRARY_UNIT: default: return value; @@ -702,10 +710,18 @@ static struct rapl_primitive_info rpi[] = { {NULL, 0, 0, 0}, };
+static int rapl_config(struct rapl_package *rp) +{ + rp->priv->defaults = (void *)rapl_defaults; + return 0; +} + static enum rapl_primitives prim_fixups(struct rapl_domain *rd, enum rapl_primitives prim) { - if (!rapl_defaults->spr_psys_bits) + struct rapl_defaults *defaults = get_defaults(rd->rp); + + if (!defaults->spr_psys_bits) return prim;
if (rd->id != RAPL_DOMAIN_PLATFORM) @@ -960,16 +976,17 @@ static void set_floor_freq_default(struct rapl_domain *rd, bool mode) static void set_floor_freq_atom(struct rapl_domain *rd, bool enable) { static u32 power_ctrl_orig_val; + struct rapl_defaults *defaults = get_defaults(rd->rp); u32 mdata;
- if (!rapl_defaults->floor_freq_reg_addr) { + if (!defaults->floor_freq_reg_addr) { pr_err("Invalid floor frequency config register\n"); return; }
if (!power_ctrl_orig_val) iosf_mbi_read(BT_MBI_UNIT_PMC, MBI_CR_READ, - rapl_defaults->floor_freq_reg_addr, + defaults->floor_freq_reg_addr, &power_ctrl_orig_val); mdata = power_ctrl_orig_val; if (enable) { @@ -977,7 +994,7 @@ static void set_floor_freq_atom(struct rapl_domain *rd, bool enable) mdata |= 1 << 8; } iosf_mbi_write(BT_MBI_UNIT_PMC, MBI_CR_WRITE, - rapl_defaults->floor_freq_reg_addr, mdata); + defaults->floor_freq_reg_addr, mdata); }
static u64 rapl_compute_time_window_core(struct rapl_package *rp, u64 value, @@ -1374,11 +1391,9 @@ struct rapl_package *rapl_add_package(int cpu, struct rapl_if_priv *priv) { int id = topology_logical_die_id(cpu); struct rapl_package *rp; + struct rapl_defaults *defaults; int ret;
- if (!rapl_defaults) - return ERR_PTR(-ENODEV); - rp = kzalloc(sizeof(struct rapl_package), GFP_KERNEL); if (!rp) return ERR_PTR(-ENOMEM); @@ -1388,6 +1403,10 @@ struct rapl_package *rapl_add_package(int cpu, struct rapl_if_priv *priv) rp->lead_cpu = cpu; rp->priv = priv;
+ ret = rapl_config(rp); + if (ret) + goto err_free_package; + if (topology_max_die_per_package() > 1) snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH, "package-%d-die-%d", @@ -1396,8 +1415,9 @@ struct rapl_package *rapl_add_package(int cpu, struct rapl_if_priv *priv) snprintf(rp->name, PACKAGE_DOMAIN_NAME_LENGTH, "package-%d", topology_physical_package_id(cpu));
+ defaults = get_defaults(rp); /* check if the package contains valid domains */ - if (rapl_detect_domains(rp, cpu) || rapl_defaults->check_unit(rp, cpu)) { + if (rapl_detect_domains(rp, cpu) || defaults->check_unit(rp, cpu)) { ret = -ENODEV; goto err_free_package; } diff --git a/include/linux/intel_rapl.h b/include/linux/intel_rapl.h index 9f4b6f5b822f6..bc698f260796e 100644 --- a/include/linux/intel_rapl.h +++ b/include/linux/intel_rapl.h @@ -121,6 +121,7 @@ struct reg_action { * registers. * @write_raw: Callback for writing RAPL interface specific * registers. + * @defaults: internal pointer to interface default settings */ struct rapl_if_priv { struct powercap_control_type *control_type; @@ -131,6 +132,7 @@ struct rapl_if_priv { int limits[RAPL_DOMAIN_MAX]; int (*read_raw)(int cpu, struct reg_action *ra); int (*write_raw)(int cpu, struct reg_action *ra); + void *defaults; };
/* maximum rapl package domain name: package-%d-die-%d */
From: Zhang Rui rui.zhang@intel.com
[ Upstream commit 98ff639a7289067247b3ef9dd5d1e922361e7365 ]
RAPL primitive information is Interface specific.
Although current MSR and MMIO Interface share the same RAPL primitives, new Interface like TPMI has its own RAPL primitive information.
Save the primitive information in the Interface private structure.
Plus, using variant name "rp" for struct rapl_primitive_info is confusing because "rp" is also used for struct rapl_package. Use "rpi" as the variant name for struct rapl_primitive_info, and rename the previous rpi[] array to avoid conflict.
No functional change.
Signed-off-by: Zhang Rui rui.zhang@intel.com Tested-by: Wang Wendy wendy.wang@intel.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Stable-dep-of: 964209202ebe ("powercap: intel_rapl: Do not change CLAMPING bit if ENABLE bit cannot be changed") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/powercap/intel_rapl_common.c | 50 ++++++++++++++++++---------- include/linux/intel_rapl.h | 2 ++ 2 files changed, 35 insertions(+), 17 deletions(-)
diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c index 6801eb2d299cc..e99905ca271da 100644 --- a/drivers/powercap/intel_rapl_common.c +++ b/drivers/powercap/intel_rapl_common.c @@ -654,7 +654,7 @@ static u64 rapl_unit_xlate(struct rapl_domain *rd, enum unit_type type, }
/* in the order of enum rapl_primitives */ -static struct rapl_primitive_info rpi[] = { +static struct rapl_primitive_info rpi_default[] = { /* name, mask, shift, msr index, unit divisor */ PRIMITIVE_INFO_INIT(ENERGY_COUNTER, ENERGY_STATUS_MASK, 0, RAPL_DOMAIN_REG_STATUS, ENERGY_UNIT, 0), @@ -710,9 +710,20 @@ static struct rapl_primitive_info rpi[] = { {NULL, 0, 0, 0}, };
+static struct rapl_primitive_info *get_rpi(struct rapl_package *rp, int prim) +{ + struct rapl_primitive_info *rpi = rp->priv->rpi; + + if (prim < 0 || prim > NR_RAPL_PRIMITIVES || !rpi) + return NULL; + + return &rpi[prim]; +} + static int rapl_config(struct rapl_package *rp) { rp->priv->defaults = (void *)rapl_defaults; + rp->priv->rpi = (void *)rpi_default; return 0; }
@@ -763,14 +774,14 @@ static int rapl_read_data_raw(struct rapl_domain *rd, { u64 value; enum rapl_primitives prim_fixed = prim_fixups(rd, prim); - struct rapl_primitive_info *rp = &rpi[prim_fixed]; + struct rapl_primitive_info *rpi = get_rpi(rd->rp, prim_fixed); struct reg_action ra; int cpu;
- if (!rp->name || rp->flag & RAPL_PRIMITIVE_DUMMY) + if (!rpi || !rpi->name || rpi->flag & RAPL_PRIMITIVE_DUMMY) return -EINVAL;
- ra.reg = rd->regs[rp->id]; + ra.reg = rd->regs[rpi->id]; if (!ra.reg) return -EINVAL;
@@ -778,26 +789,26 @@ static int rapl_read_data_raw(struct rapl_domain *rd,
/* domain with 2 limits has different bit */ if (prim == FW_LOCK && rd->rp->priv->limits[rd->id] == 2) { - rp->mask = POWER_HIGH_LOCK; - rp->shift = 63; + rpi->mask = POWER_HIGH_LOCK; + rpi->shift = 63; } /* non-hardware data are collected by the polling thread */ - if (rp->flag & RAPL_PRIMITIVE_DERIVED) { + if (rpi->flag & RAPL_PRIMITIVE_DERIVED) { *data = rd->rdd.primitives[prim]; return 0; }
- ra.mask = rp->mask; + ra.mask = rpi->mask;
if (rd->rp->priv->read_raw(cpu, &ra)) { pr_debug("failed to read reg 0x%llx on cpu %d\n", ra.reg, cpu); return -EIO; }
- value = ra.value >> rp->shift; + value = ra.value >> rpi->shift;
if (xlate) - *data = rapl_unit_xlate(rd, rp->unit, value, 0); + *data = rapl_unit_xlate(rd, rpi->unit, value, 0); else *data = value;
@@ -810,21 +821,24 @@ static int rapl_write_data_raw(struct rapl_domain *rd, unsigned long long value) { enum rapl_primitives prim_fixed = prim_fixups(rd, prim); - struct rapl_primitive_info *rp = &rpi[prim_fixed]; + struct rapl_primitive_info *rpi = get_rpi(rd->rp, prim_fixed); int cpu; u64 bits; struct reg_action ra; int ret;
+ if (!rpi || !rpi->name || rpi->flag & RAPL_PRIMITIVE_DUMMY) + return -EINVAL; + cpu = rd->rp->lead_cpu; - bits = rapl_unit_xlate(rd, rp->unit, value, 1); - bits <<= rp->shift; - bits &= rp->mask; + bits = rapl_unit_xlate(rd, rpi->unit, value, 1); + bits <<= rpi->shift; + bits &= rpi->mask;
memset(&ra, 0, sizeof(ra));
- ra.reg = rd->regs[rp->id]; - ra.mask = rp->mask; + ra.reg = rd->regs[rpi->id]; + ra.mask = rpi->mask; ra.value = bits;
ret = rd->rp->priv->write_raw(cpu, &ra); @@ -1165,8 +1179,10 @@ static void rapl_update_domain_data(struct rapl_package *rp) rp->domains[dmn].name); /* exclude non-raw primitives */ for (prim = 0; prim < NR_RAW_PRIMITIVES; prim++) { + struct rapl_primitive_info *rpi = get_rpi(rp, prim); + if (!rapl_read_data_raw(&rp->domains[dmn], prim, - rpi[prim].unit, &val)) + rpi->unit, &val)) rp->domains[dmn].rdd.primitives[prim] = val; } } diff --git a/include/linux/intel_rapl.h b/include/linux/intel_rapl.h index bc698f260796e..b2456d97995ba 100644 --- a/include/linux/intel_rapl.h +++ b/include/linux/intel_rapl.h @@ -122,6 +122,7 @@ struct reg_action { * @write_raw: Callback for writing RAPL interface specific * registers. * @defaults: internal pointer to interface default settings + * @rpi: internal pointer to interface primitive info */ struct rapl_if_priv { struct powercap_control_type *control_type; @@ -133,6 +134,7 @@ struct rapl_if_priv { int (*read_raw)(int cpu, struct reg_action *ra); int (*write_raw)(int cpu, struct reg_action *ra); void *defaults; + void *rpi; };
/* maximum rapl package domain name: package-%d-die-%d */
From: Zhang Rui rui.zhang@intel.com
[ Upstream commit 11edbe5c66d624e2e1eec8929d3668d76a574c3b ]
Currently, the RAPL primitive information array is required to be initialized in the order of enum rapl_primitives. This can break easily, especially when different RAPL Interfaces may support different sets of primitives.
Convert the code to initialize the primitive information using array index explicitly.
No functional change.
Signed-off-by: Zhang Rui rui.zhang@intel.com Tested-by: Wang Wendy wendy.wang@intel.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Stable-dep-of: 964209202ebe ("powercap: intel_rapl: Do not change CLAMPING bit if ENABLE bit cannot be changed") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/powercap/intel_rapl_common.c | 54 ++++++++++++++-------------- 1 file changed, 26 insertions(+), 28 deletions(-)
diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c index e99905ca271da..169a08e691e8f 100644 --- a/drivers/powercap/intel_rapl_common.c +++ b/drivers/powercap/intel_rapl_common.c @@ -653,61 +653,59 @@ static u64 rapl_unit_xlate(struct rapl_domain *rd, enum unit_type type, return div64_u64(value, scale); }
-/* in the order of enum rapl_primitives */ -static struct rapl_primitive_info rpi_default[] = { +static struct rapl_primitive_info rpi_default[NR_RAPL_PRIMITIVES] = { /* name, mask, shift, msr index, unit divisor */ - PRIMITIVE_INFO_INIT(ENERGY_COUNTER, ENERGY_STATUS_MASK, 0, + [ENERGY_COUNTER] = PRIMITIVE_INFO_INIT(ENERGY_COUNTER, ENERGY_STATUS_MASK, 0, RAPL_DOMAIN_REG_STATUS, ENERGY_UNIT, 0), - PRIMITIVE_INFO_INIT(POWER_LIMIT1, POWER_LIMIT1_MASK, 0, + [POWER_LIMIT1] = PRIMITIVE_INFO_INIT(POWER_LIMIT1, POWER_LIMIT1_MASK, 0, RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0), - PRIMITIVE_INFO_INIT(POWER_LIMIT2, POWER_LIMIT2_MASK, 32, + [POWER_LIMIT2] = PRIMITIVE_INFO_INIT(POWER_LIMIT2, POWER_LIMIT2_MASK, 32, RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0), - PRIMITIVE_INFO_INIT(POWER_LIMIT4, POWER_LIMIT4_MASK, 0, + [POWER_LIMIT4] = PRIMITIVE_INFO_INIT(POWER_LIMIT4, POWER_LIMIT4_MASK, 0, RAPL_DOMAIN_REG_PL4, POWER_UNIT, 0), - PRIMITIVE_INFO_INIT(FW_LOCK, POWER_LOW_LOCK, 31, + [FW_LOCK] = PRIMITIVE_INFO_INIT(FW_LOCK, POWER_LOW_LOCK, 31, RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), - PRIMITIVE_INFO_INIT(PL1_ENABLE, POWER_LIMIT1_ENABLE, 15, + [PL1_ENABLE] = PRIMITIVE_INFO_INIT(PL1_ENABLE, POWER_LIMIT1_ENABLE, 15, RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), - PRIMITIVE_INFO_INIT(PL1_CLAMP, POWER_LIMIT1_CLAMP, 16, + [PL1_CLAMP] = PRIMITIVE_INFO_INIT(PL1_CLAMP, POWER_LIMIT1_CLAMP, 16, RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), - PRIMITIVE_INFO_INIT(PL2_ENABLE, POWER_LIMIT2_ENABLE, 47, + [PL2_ENABLE] = PRIMITIVE_INFO_INIT(PL2_ENABLE, POWER_LIMIT2_ENABLE, 47, RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), - PRIMITIVE_INFO_INIT(PL2_CLAMP, POWER_LIMIT2_CLAMP, 48, + [PL2_CLAMP] = PRIMITIVE_INFO_INIT(PL2_CLAMP, POWER_LIMIT2_CLAMP, 48, RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), - PRIMITIVE_INFO_INIT(PL4_ENABLE, POWER_LIMIT4_MASK, 0, + [PL4_ENABLE] = PRIMITIVE_INFO_INIT(PL4_ENABLE, POWER_LIMIT4_MASK, 0, RAPL_DOMAIN_REG_PL4, ARBITRARY_UNIT, 0), - PRIMITIVE_INFO_INIT(TIME_WINDOW1, TIME_WINDOW1_MASK, 17, + [TIME_WINDOW1] = PRIMITIVE_INFO_INIT(TIME_WINDOW1, TIME_WINDOW1_MASK, 17, RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0), - PRIMITIVE_INFO_INIT(TIME_WINDOW2, TIME_WINDOW2_MASK, 49, + [TIME_WINDOW2] = PRIMITIVE_INFO_INIT(TIME_WINDOW2, TIME_WINDOW2_MASK, 49, RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0), - PRIMITIVE_INFO_INIT(THERMAL_SPEC_POWER, POWER_INFO_THERMAL_SPEC_MASK, + [THERMAL_SPEC_POWER] = PRIMITIVE_INFO_INIT(THERMAL_SPEC_POWER, POWER_INFO_THERMAL_SPEC_MASK, 0, RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0), - PRIMITIVE_INFO_INIT(MAX_POWER, POWER_INFO_MAX_MASK, 32, + [MAX_POWER] = PRIMITIVE_INFO_INIT(MAX_POWER, POWER_INFO_MAX_MASK, 32, RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0), - PRIMITIVE_INFO_INIT(MIN_POWER, POWER_INFO_MIN_MASK, 16, + [MIN_POWER] = PRIMITIVE_INFO_INIT(MIN_POWER, POWER_INFO_MIN_MASK, 16, RAPL_DOMAIN_REG_INFO, POWER_UNIT, 0), - PRIMITIVE_INFO_INIT(MAX_TIME_WINDOW, POWER_INFO_MAX_TIME_WIN_MASK, 48, + [MAX_TIME_WINDOW] = PRIMITIVE_INFO_INIT(MAX_TIME_WINDOW, POWER_INFO_MAX_TIME_WIN_MASK, 48, RAPL_DOMAIN_REG_INFO, TIME_UNIT, 0), - PRIMITIVE_INFO_INIT(THROTTLED_TIME, PERF_STATUS_THROTTLE_TIME_MASK, 0, + [THROTTLED_TIME] = PRIMITIVE_INFO_INIT(THROTTLED_TIME, PERF_STATUS_THROTTLE_TIME_MASK, 0, RAPL_DOMAIN_REG_PERF, TIME_UNIT, 0), - PRIMITIVE_INFO_INIT(PRIORITY_LEVEL, PP_POLICY_MASK, 0, + [PRIORITY_LEVEL] = PRIMITIVE_INFO_INIT(PRIORITY_LEVEL, PP_POLICY_MASK, 0, RAPL_DOMAIN_REG_POLICY, ARBITRARY_UNIT, 0), - PRIMITIVE_INFO_INIT(PSYS_POWER_LIMIT1, PSYS_POWER_LIMIT1_MASK, 0, + [PSYS_POWER_LIMIT1] = PRIMITIVE_INFO_INIT(PSYS_POWER_LIMIT1, PSYS_POWER_LIMIT1_MASK, 0, RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0), - PRIMITIVE_INFO_INIT(PSYS_POWER_LIMIT2, PSYS_POWER_LIMIT2_MASK, 32, + [PSYS_POWER_LIMIT2] = PRIMITIVE_INFO_INIT(PSYS_POWER_LIMIT2, PSYS_POWER_LIMIT2_MASK, 32, RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0), - PRIMITIVE_INFO_INIT(PSYS_PL1_ENABLE, PSYS_POWER_LIMIT1_ENABLE, 17, + [PSYS_PL1_ENABLE] = PRIMITIVE_INFO_INIT(PSYS_PL1_ENABLE, PSYS_POWER_LIMIT1_ENABLE, 17, RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), - PRIMITIVE_INFO_INIT(PSYS_PL2_ENABLE, PSYS_POWER_LIMIT2_ENABLE, 49, + [PSYS_PL2_ENABLE] = PRIMITIVE_INFO_INIT(PSYS_PL2_ENABLE, PSYS_POWER_LIMIT2_ENABLE, 49, RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), - PRIMITIVE_INFO_INIT(PSYS_TIME_WINDOW1, PSYS_TIME_WINDOW1_MASK, 19, + [PSYS_TIME_WINDOW1] = PRIMITIVE_INFO_INIT(PSYS_TIME_WINDOW1, PSYS_TIME_WINDOW1_MASK, 19, RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0), - PRIMITIVE_INFO_INIT(PSYS_TIME_WINDOW2, PSYS_TIME_WINDOW2_MASK, 51, + [PSYS_TIME_WINDOW2] = PRIMITIVE_INFO_INIT(PSYS_TIME_WINDOW2, PSYS_TIME_WINDOW2_MASK, 51, RAPL_DOMAIN_REG_LIMIT, TIME_UNIT, 0), /* non-hardware */ - PRIMITIVE_INFO_INIT(AVERAGE_POWER, 0, 0, 0, POWER_UNIT, + [AVERAGE_POWER] = PRIMITIVE_INFO_INIT(AVERAGE_POWER, 0, 0, 0, POWER_UNIT, RAPL_PRIMITIVE_DERIVED), - {NULL, 0, 0, 0}, };
static struct rapl_primitive_info *get_rpi(struct rapl_package *rp, int prim)
From: Zhang Rui rui.zhang@intel.com
[ Upstream commit 045610c383bd6b740bb7e7c780d6f7729249e60d ]
The same set of operations are shared by different Powert Limits, including Power Limit get/set, Power Limit enable/disable, clamping enable/disable, time window get/set, and max power get/set, etc.
But the same operation for different Power Limit has different primitives because they use different registers/register bits.
A lot of dirty/duplicate code was introduced to handle this difference.
Instead of using hardcoded primitive name directly, using Power Limit id + operation type is much cleaner.
For this sense, move POWER_LIMIT1/POWER_LIMIT2/POWER_LIMIT4 to the beginning of enum rapl_primitives so that they can be reused as Power Limit ids.
No functional change.
Signed-off-by: Zhang Rui rui.zhang@intel.com Tested-by: Wang Wendy wendy.wang@intel.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Stable-dep-of: 964209202ebe ("powercap: intel_rapl: Do not change CLAMPING bit if ENABLE bit cannot be changed") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/powercap/intel_rapl_common.c | 4 ++-- include/linux/intel_rapl.h | 5 +++-- 2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c index 169a08e691e8f..87023d6aebf9c 100644 --- a/drivers/powercap/intel_rapl_common.c +++ b/drivers/powercap/intel_rapl_common.c @@ -655,14 +655,14 @@ static u64 rapl_unit_xlate(struct rapl_domain *rd, enum unit_type type,
static struct rapl_primitive_info rpi_default[NR_RAPL_PRIMITIVES] = { /* name, mask, shift, msr index, unit divisor */ - [ENERGY_COUNTER] = PRIMITIVE_INFO_INIT(ENERGY_COUNTER, ENERGY_STATUS_MASK, 0, - RAPL_DOMAIN_REG_STATUS, ENERGY_UNIT, 0), [POWER_LIMIT1] = PRIMITIVE_INFO_INIT(POWER_LIMIT1, POWER_LIMIT1_MASK, 0, RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0), [POWER_LIMIT2] = PRIMITIVE_INFO_INIT(POWER_LIMIT2, POWER_LIMIT2_MASK, 32, RAPL_DOMAIN_REG_LIMIT, POWER_UNIT, 0), [POWER_LIMIT4] = PRIMITIVE_INFO_INIT(POWER_LIMIT4, POWER_LIMIT4_MASK, 0, RAPL_DOMAIN_REG_PL4, POWER_UNIT, 0), + [ENERGY_COUNTER] = PRIMITIVE_INFO_INIT(ENERGY_COUNTER, ENERGY_STATUS_MASK, 0, + RAPL_DOMAIN_REG_STATUS, ENERGY_UNIT, 0), [FW_LOCK] = PRIMITIVE_INFO_INIT(FW_LOCK, POWER_LOW_LOCK, 31, RAPL_DOMAIN_REG_LIMIT, ARBITRARY_UNIT, 0), [PL1_ENABLE] = PRIMITIVE_INFO_INIT(PL1_ENABLE, POWER_LIMIT1_ENABLE, 15, diff --git a/include/linux/intel_rapl.h b/include/linux/intel_rapl.h index b2456d97995ba..50098e641181a 100644 --- a/include/linux/intel_rapl.h +++ b/include/linux/intel_rapl.h @@ -36,10 +36,10 @@ enum rapl_domain_reg_id { struct rapl_domain;
enum rapl_primitives { - ENERGY_COUNTER, POWER_LIMIT1, POWER_LIMIT2, POWER_LIMIT4, + ENERGY_COUNTER, FW_LOCK,
PL1_ENABLE, /* power limit 1, aka long term */ @@ -74,7 +74,8 @@ struct rapl_domain_data { unsigned long timestamp; };
-#define NR_POWER_LIMITS (3) +#define NR_POWER_LIMITS (POWER_LIMIT4 + 1) + struct rapl_power_limit { struct powercap_zone_constraint *constraint; int prim_id; /* primitive ID used to enable */
From: Zhang Rui rui.zhang@intel.com
[ Upstream commit a38f300bb23c896d2d132a4502086d4bfec2a25e ]
Currently, a RAPL package is registered with the number of Power Limits supported in each RAPL domain. But this doesn't tell which Power Limits are available. Using the number of Power Limits supported to guess the availability of each Power Limit is fragile.
Use bitmap to represent the availability of each Power Limit.
Note that PL1 is mandatory thus it does not need to be set explicitly by the RAPL Interface drivers.
No functional change intended.
Signed-off-by: Zhang Rui rui.zhang@intel.com Tested-by: Wang Wendy wendy.wang@intel.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Stable-dep-of: 964209202ebe ("powercap: intel_rapl: Do not change CLAMPING bit if ENABLE bit cannot be changed") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/powercap/intel_rapl_common.c | 14 ++++++-------- drivers/powercap/intel_rapl_msr.c | 6 +++--- .../intel/int340x_thermal/processor_thermal_rapl.c | 4 ++-- 3 files changed, 11 insertions(+), 13 deletions(-)
diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c index 87023d6aebf9c..db0016c188f5f 100644 --- a/drivers/powercap/intel_rapl_common.c +++ b/drivers/powercap/intel_rapl_common.c @@ -575,20 +575,18 @@ static void rapl_init_domains(struct rapl_package *rp) rapl_domain_names[i]);
rd->id = i; + + /* PL1 is supported by default */ + rp->priv->limits[i] |= BIT(POWER_LIMIT1); rd->rpl[0].prim_id = PL1_ENABLE; rd->rpl[0].name = pl1_name;
- /* - * The PL2 power domain is applicable for limits two - * and limits three - */ - if (rp->priv->limits[i] >= 2) { + if (rp->priv->limits[i] & BIT(POWER_LIMIT2)) { rd->rpl[1].prim_id = PL2_ENABLE; rd->rpl[1].name = pl2_name; }
- /* Enable PL4 domain if the total power limits are three */ - if (rp->priv->limits[i] == 3) { + if (rp->priv->limits[i] & BIT(POWER_LIMIT4)) { rd->rpl[2].prim_id = PL4_ENABLE; rd->rpl[2].name = pl4_name; } @@ -786,7 +784,7 @@ static int rapl_read_data_raw(struct rapl_domain *rd, cpu = rd->rp->lead_cpu;
/* domain with 2 limits has different bit */ - if (prim == FW_LOCK && rd->rp->priv->limits[rd->id] == 2) { + if (prim == FW_LOCK && (rd->rp->priv->limits[rd->id] & BIT(POWER_LIMIT2))) { rpi->mask = POWER_HIGH_LOCK; rpi->shift = 63; } diff --git a/drivers/powercap/intel_rapl_msr.c b/drivers/powercap/intel_rapl_msr.c index e46a7641e42f6..1094b8fed2365 100644 --- a/drivers/powercap/intel_rapl_msr.c +++ b/drivers/powercap/intel_rapl_msr.c @@ -44,8 +44,8 @@ static struct rapl_if_priv rapl_msr_priv_intel = { MSR_DRAM_POWER_LIMIT, MSR_DRAM_ENERGY_STATUS, MSR_DRAM_PERF_STATUS, 0, MSR_DRAM_POWER_INFO }, .regs[RAPL_DOMAIN_PLATFORM] = { MSR_PLATFORM_POWER_LIMIT, MSR_PLATFORM_ENERGY_STATUS, 0, 0, 0}, - .limits[RAPL_DOMAIN_PACKAGE] = 2, - .limits[RAPL_DOMAIN_PLATFORM] = 2, + .limits[RAPL_DOMAIN_PACKAGE] = BIT(POWER_LIMIT2), + .limits[RAPL_DOMAIN_PLATFORM] = BIT(POWER_LIMIT2), };
static struct rapl_if_priv rapl_msr_priv_amd = { @@ -166,7 +166,7 @@ static int rapl_msr_probe(struct platform_device *pdev) rapl_msr_priv->write_raw = rapl_msr_write_raw;
if (id) { - rapl_msr_priv->limits[RAPL_DOMAIN_PACKAGE] = 3; + rapl_msr_priv->limits[RAPL_DOMAIN_PACKAGE] |= BIT(POWER_LIMIT4); rapl_msr_priv->regs[RAPL_DOMAIN_PACKAGE][RAPL_DOMAIN_REG_PL4] = MSR_VR_CURRENT_CONFIG; pr_info("PL4 support detected.\n"); diff --git a/drivers/thermal/intel/int340x_thermal/processor_thermal_rapl.c b/drivers/thermal/intel/int340x_thermal/processor_thermal_rapl.c index a205221ec8df9..e070239106f58 100644 --- a/drivers/thermal/intel/int340x_thermal/processor_thermal_rapl.c +++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_rapl.c @@ -15,8 +15,8 @@ static const struct rapl_mmio_regs rapl_mmio_default = { .reg_unit = 0x5938, .regs[RAPL_DOMAIN_PACKAGE] = { 0x59a0, 0x593c, 0x58f0, 0, 0x5930}, .regs[RAPL_DOMAIN_DRAM] = { 0x58e0, 0x58e8, 0x58ec, 0, 0}, - .limits[RAPL_DOMAIN_PACKAGE] = 2, - .limits[RAPL_DOMAIN_DRAM] = 2, + .limits[RAPL_DOMAIN_PACKAGE] = BIT(POWER_LIMIT2), + .limits[RAPL_DOMAIN_DRAM] = BIT(POWER_LIMIT2), };
static int rapl_mmio_cpu_online(unsigned int cpu)
From: Zhang Rui rui.zhang@intel.com
[ Upstream commit 9050a9cd5e4c848e265915d6e7b1f731e6e1e0e6 ]
The same set of operations are shared by different Powert Limits, including Power Limit get/set, Power Limit enable/disable, clamping enable/disable, time window get/set, and max power get/set, etc.
But the same operation for different Power Limit has different primitives because they use different registers/register bits.
A lot of dirty/duplicate code was introduced to handle this difference.
Introduce a universal way to issue Power Limit operations. Instead of using hardcoded primitive name directly, use Power Limit id + operation type, and hide all the Power Limit difference details in a central place, get_pl_prim(). Two helpers, rapl_read_pl_data() and rapl_write_pl_data(), are introduced at the same time to simplify the code for issuing Power Limit operations.
Signed-off-by: Zhang Rui rui.zhang@intel.com Tested-by: Wang Wendy wendy.wang@intel.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Stable-dep-of: 964209202ebe ("powercap: intel_rapl: Do not change CLAMPING bit if ENABLE bit cannot be changed") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/powercap/intel_rapl_common.c | 343 ++++++++++++--------------- include/linux/intel_rapl.h | 1 - 2 files changed, 146 insertions(+), 198 deletions(-)
diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c index db0016c188f5f..cc51bd508ad15 100644 --- a/drivers/powercap/intel_rapl_common.c +++ b/drivers/powercap/intel_rapl_common.c @@ -96,9 +96,67 @@ enum unit_type { #define DOMAIN_STATE_POWER_LIMIT_SET BIT(1) #define DOMAIN_STATE_BIOS_LOCKED BIT(2)
-static const char pl1_name[] = "long_term"; -static const char pl2_name[] = "short_term"; -static const char pl4_name[] = "peak_power"; +static const char *pl_names[NR_POWER_LIMITS] = { + [POWER_LIMIT1] = "long_term", + [POWER_LIMIT2] = "short_term", + [POWER_LIMIT4] = "peak_power", +}; + +enum pl_prims { + PL_ENABLE, + PL_CLAMP, + PL_LIMIT, + PL_TIME_WINDOW, + PL_MAX_POWER, +}; + +static bool is_pl_valid(struct rapl_domain *rd, int pl) +{ + if (pl < POWER_LIMIT1 || pl > POWER_LIMIT4) + return false; + return rd->rpl[pl].name ? true : false; +} + +static int get_pl_prim(int pl, enum pl_prims prim) +{ + switch (pl) { + case POWER_LIMIT1: + if (prim == PL_ENABLE) + return PL1_ENABLE; + if (prim == PL_CLAMP) + return PL1_CLAMP; + if (prim == PL_LIMIT) + return POWER_LIMIT1; + if (prim == PL_TIME_WINDOW) + return TIME_WINDOW1; + if (prim == PL_MAX_POWER) + return THERMAL_SPEC_POWER; + return -EINVAL; + case POWER_LIMIT2: + if (prim == PL_ENABLE) + return PL2_ENABLE; + if (prim == PL_CLAMP) + return PL2_CLAMP; + if (prim == PL_LIMIT) + return POWER_LIMIT2; + if (prim == PL_TIME_WINDOW) + return TIME_WINDOW2; + if (prim == PL_MAX_POWER) + return MAX_POWER; + return -EINVAL; + case POWER_LIMIT4: + if (prim == PL_LIMIT) + return POWER_LIMIT4; + if (prim == PL_ENABLE) + return PL4_ENABLE; + /* PL4 would be around two times PL2, use same prim as PL2. */ + if (prim == PL_MAX_POWER) + return MAX_POWER; + return -EINVAL; + default: + return -EINVAL; + } +}
#define power_zone_to_rapl_domain(_zone) \ container_of(_zone, struct rapl_domain, power_zone) @@ -155,6 +213,12 @@ static int rapl_read_data_raw(struct rapl_domain *rd, static int rapl_write_data_raw(struct rapl_domain *rd, enum rapl_primitives prim, unsigned long long value); +static int rapl_read_pl_data(struct rapl_domain *rd, int pl, + enum pl_prims pl_prim, + bool xlate, u64 *data); +static int rapl_write_pl_data(struct rapl_domain *rd, int pl, + enum pl_prims pl_prim, + unsigned long long value); static u64 rapl_unit_xlate(struct rapl_domain *rd, enum unit_type type, u64 value, int to_raw); static void package_power_limit_irq_save(struct rapl_package *rp); @@ -222,7 +286,7 @@ static int find_nr_power_limit(struct rapl_domain *rd) int i, nr_pl = 0;
for (i = 0; i < NR_POWER_LIMITS; i++) { - if (rd->rpl[i].name) + if (is_pl_valid(rd, i)) nr_pl++; }
@@ -233,37 +297,34 @@ static int set_domain_enable(struct powercap_zone *power_zone, bool mode) { struct rapl_domain *rd = power_zone_to_rapl_domain(power_zone); struct rapl_defaults *defaults = get_defaults(rd->rp); - - if (rd->state & DOMAIN_STATE_BIOS_LOCKED) - return -EACCES; + int ret;
cpus_read_lock(); - rapl_write_data_raw(rd, PL1_ENABLE, mode); - if (defaults->set_floor_freq) + ret = rapl_write_pl_data(rd, POWER_LIMIT1, PL_ENABLE, mode); + if (!ret && defaults->set_floor_freq) defaults->set_floor_freq(rd, mode); cpus_read_unlock();
- return 0; + return ret; }
static int get_domain_enable(struct powercap_zone *power_zone, bool *mode) { struct rapl_domain *rd = power_zone_to_rapl_domain(power_zone); u64 val; + int ret;
if (rd->state & DOMAIN_STATE_BIOS_LOCKED) { *mode = false; return 0; } cpus_read_lock(); - if (rapl_read_data_raw(rd, PL1_ENABLE, true, &val)) { - cpus_read_unlock(); - return -EIO; - } - *mode = val; + ret = rapl_read_pl_data(rd, POWER_LIMIT1, PL_ENABLE, true, &val); + if (!ret) + *mode = val; cpus_read_unlock();
- return 0; + return ret; }
/* per RAPL domain ops, in the order of rapl_domain_type */ @@ -319,8 +380,8 @@ static int contraint_to_pl(struct rapl_domain *rd, int cid) { int i, j;
- for (i = 0, j = 0; i < NR_POWER_LIMITS; i++) { - if ((rd->rpl[i].name) && j++ == cid) { + for (i = POWER_LIMIT1, j = 0; i < NR_POWER_LIMITS; i++) { + if (is_pl_valid(rd, i) && j++ == cid) { pr_debug("%s: index %d\n", __func__, i); return i; } @@ -341,36 +402,11 @@ static int set_power_limit(struct powercap_zone *power_zone, int cid, cpus_read_lock(); rd = power_zone_to_rapl_domain(power_zone); id = contraint_to_pl(rd, cid); - if (id < 0) { - ret = id; - goto set_exit; - } - rp = rd->rp;
- if (rd->state & DOMAIN_STATE_BIOS_LOCKED) { - dev_warn(&power_zone->dev, - "%s locked by BIOS, monitoring only\n", rd->name); - ret = -EACCES; - goto set_exit; - } - - switch (rd->rpl[id].prim_id) { - case PL1_ENABLE: - rapl_write_data_raw(rd, POWER_LIMIT1, power_limit); - break; - case PL2_ENABLE: - rapl_write_data_raw(rd, POWER_LIMIT2, power_limit); - break; - case PL4_ENABLE: - rapl_write_data_raw(rd, POWER_LIMIT4, power_limit); - break; - default: - ret = -EINVAL; - } + ret = rapl_write_pl_data(rd, id, PL_LIMIT, power_limit); if (!ret) package_power_limit_irq_save(rp); -set_exit: cpus_read_unlock(); return ret; } @@ -380,38 +416,17 @@ static int get_current_power_limit(struct powercap_zone *power_zone, int cid, { struct rapl_domain *rd; u64 val; - int prim; int ret = 0; int id;
cpus_read_lock(); rd = power_zone_to_rapl_domain(power_zone); id = contraint_to_pl(rd, cid); - if (id < 0) { - ret = id; - goto get_exit; - }
- switch (rd->rpl[id].prim_id) { - case PL1_ENABLE: - prim = POWER_LIMIT1; - break; - case PL2_ENABLE: - prim = POWER_LIMIT2; - break; - case PL4_ENABLE: - prim = POWER_LIMIT4; - break; - default: - cpus_read_unlock(); - return -EINVAL; - } - if (rapl_read_data_raw(rd, prim, true, &val)) - ret = -EIO; - else + ret = rapl_read_pl_data(rd, id, PL_LIMIT, true, &val); + if (!ret) *data = val;
-get_exit: cpus_read_unlock();
return ret; @@ -427,23 +442,9 @@ static int set_time_window(struct powercap_zone *power_zone, int cid, cpus_read_lock(); rd = power_zone_to_rapl_domain(power_zone); id = contraint_to_pl(rd, cid); - if (id < 0) { - ret = id; - goto set_time_exit; - }
- switch (rd->rpl[id].prim_id) { - case PL1_ENABLE: - rapl_write_data_raw(rd, TIME_WINDOW1, window); - break; - case PL2_ENABLE: - rapl_write_data_raw(rd, TIME_WINDOW2, window); - break; - default: - ret = -EINVAL; - } + ret = rapl_write_pl_data(rd, id, PL_TIME_WINDOW, window);
-set_time_exit: cpus_read_unlock(); return ret; } @@ -459,33 +460,11 @@ static int get_time_window(struct powercap_zone *power_zone, int cid, cpus_read_lock(); rd = power_zone_to_rapl_domain(power_zone); id = contraint_to_pl(rd, cid); - if (id < 0) { - ret = id; - goto get_time_exit; - }
- switch (rd->rpl[id].prim_id) { - case PL1_ENABLE: - ret = rapl_read_data_raw(rd, TIME_WINDOW1, true, &val); - break; - case PL2_ENABLE: - ret = rapl_read_data_raw(rd, TIME_WINDOW2, true, &val); - break; - case PL4_ENABLE: - /* - * Time window parameter is not applicable for PL4 entry - * so assigining '0' as default value. - */ - val = 0; - break; - default: - cpus_read_unlock(); - return -EINVAL; - } + ret = rapl_read_pl_data(rd, id, PL_TIME_WINDOW, true, &val); if (!ret) *data = val;
-get_time_exit: cpus_read_unlock();
return ret; @@ -505,36 +484,23 @@ static const char *get_constraint_name(struct powercap_zone *power_zone, return NULL; }
-static int get_max_power(struct powercap_zone *power_zone, int id, u64 *data) +static int get_max_power(struct powercap_zone *power_zone, int cid, u64 *data) { struct rapl_domain *rd; u64 val; - int prim; int ret = 0; + int id;
cpus_read_lock(); rd = power_zone_to_rapl_domain(power_zone); - switch (rd->rpl[id].prim_id) { - case PL1_ENABLE: - prim = THERMAL_SPEC_POWER; - break; - case PL2_ENABLE: - prim = MAX_POWER; - break; - case PL4_ENABLE: - prim = MAX_POWER; - break; - default: - cpus_read_unlock(); - return -EINVAL; - } - if (rapl_read_data_raw(rd, prim, true, &val)) - ret = -EIO; - else + id = contraint_to_pl(rd, cid); + + ret = rapl_read_pl_data(rd, id, PL_MAX_POWER, true, &val); + if (!ret) *data = val;
/* As a generalization rule, PL4 would be around two times PL2. */ - if (rd->rpl[id].prim_id == PL4_ENABLE) + if (id == POWER_LIMIT4) *data = *data * 2;
cpus_read_unlock(); @@ -561,6 +527,7 @@ static void rapl_init_domains(struct rapl_package *rp)
for (i = 0; i < RAPL_DOMAIN_MAX; i++) { unsigned int mask = rp->domain_map & (1 << i); + int t;
if (!mask) continue; @@ -578,17 +545,10 @@ static void rapl_init_domains(struct rapl_package *rp)
/* PL1 is supported by default */ rp->priv->limits[i] |= BIT(POWER_LIMIT1); - rd->rpl[0].prim_id = PL1_ENABLE; - rd->rpl[0].name = pl1_name; - - if (rp->priv->limits[i] & BIT(POWER_LIMIT2)) { - rd->rpl[1].prim_id = PL2_ENABLE; - rd->rpl[1].name = pl2_name; - }
- if (rp->priv->limits[i] & BIT(POWER_LIMIT4)) { - rd->rpl[2].prim_id = PL4_ENABLE; - rd->rpl[2].name = pl4_name; + for (t = POWER_LIMIT1; t < NR_POWER_LIMITS; t++) { + if (rp->priv->limits[i] & BIT(t)) + rd->rpl[t].name = pl_names[t]; }
for (j = 0; j < RAPL_DOMAIN_REG_MAX; j++) @@ -842,6 +802,33 @@ static int rapl_write_data_raw(struct rapl_domain *rd, return ret; }
+static int rapl_read_pl_data(struct rapl_domain *rd, int pl, + enum pl_prims pl_prim, bool xlate, u64 *data) +{ + enum rapl_primitives prim = get_pl_prim(pl, pl_prim); + + if (!is_pl_valid(rd, pl)) + return -EINVAL; + + return rapl_read_data_raw(rd, prim, xlate, data); +} + +static int rapl_write_pl_data(struct rapl_domain *rd, int pl, + enum pl_prims pl_prim, + unsigned long long value) +{ + enum rapl_primitives prim = get_pl_prim(pl, pl_prim); + + if (!is_pl_valid(rd, pl)) + return -EINVAL; + + if (rd->state & DOMAIN_STATE_BIOS_LOCKED) { + pr_warn("%s:%s:%s locked by BIOS\n", rd->rp->name, rd->name, pl_names[pl]); + return -EACCES; + } + + return rapl_write_data_raw(rd, prim, value); +} /* * Raw RAPL data stored in MSRs are in certain scales. We need to * convert them into standard units based on the units reported in @@ -969,17 +956,16 @@ static void package_power_limit_irq_restore(struct rapl_package *rp)
static void set_floor_freq_default(struct rapl_domain *rd, bool mode) { - int nr_powerlimit = find_nr_power_limit(rd); + int i;
/* always enable clamp such that p-state can go below OS requested * range. power capping priority over guranteed frequency. */ - rapl_write_data_raw(rd, PL1_CLAMP, mode); + rapl_write_pl_data(rd, POWER_LIMIT1, PL_CLAMP, mode);
- /* some domains have pl2 */ - if (nr_powerlimit > 1) { - rapl_write_data_raw(rd, PL2_ENABLE, mode); - rapl_write_data_raw(rd, PL2_CLAMP, mode); + for (i = POWER_LIMIT2; i < NR_POWER_LIMITS; i++) { + rapl_write_pl_data(rd, i, PL_ENABLE, mode); + rapl_write_pl_data(rd, i, PL_CLAMP, mode); } }
@@ -1306,11 +1292,10 @@ static void rapl_detect_powerlimit(struct rapl_domain *rd) rd->state |= DOMAIN_STATE_BIOS_LOCKED; } } - /* check if power limit MSR exists, otherwise domain is monitoring only */ - for (i = 0; i < NR_POWER_LIMITS; i++) { - int prim = rd->rpl[i].prim_id;
- if (rapl_read_data_raw(rd, prim, false, &val64)) + /* check if power limit exists, otherwise domain is monitoring only */ + for (i = POWER_LIMIT1; i < NR_POWER_LIMITS; i++) { + if (rapl_read_pl_data(rd, i, PL_ENABLE, false, &val64)) rd->rpl[i].name = NULL; } } @@ -1358,13 +1343,13 @@ void rapl_remove_package(struct rapl_package *rp) package_power_limit_irq_restore(rp);
for (rd = rp->domains; rd < rp->domains + rp->nr_domains; rd++) { - rapl_write_data_raw(rd, PL1_ENABLE, 0); - rapl_write_data_raw(rd, PL1_CLAMP, 0); - if (find_nr_power_limit(rd) > 1) { - rapl_write_data_raw(rd, PL2_ENABLE, 0); - rapl_write_data_raw(rd, PL2_CLAMP, 0); - rapl_write_data_raw(rd, PL4_ENABLE, 0); + int i; + + for (i = POWER_LIMIT1; i < NR_POWER_LIMITS; i++) { + rapl_write_pl_data(rd, i, PL_ENABLE, 0); + rapl_write_pl_data(rd, i, PL_CLAMP, 0); } + if (rd->id == RAPL_DOMAIN_PACKAGE) { rd_package = rd; continue; @@ -1451,38 +1436,18 @@ static void power_limit_state_save(void) { struct rapl_package *rp; struct rapl_domain *rd; - int nr_pl, ret, i; + int ret, i;
cpus_read_lock(); list_for_each_entry(rp, &rapl_packages, plist) { if (!rp->power_zone) continue; rd = power_zone_to_rapl_domain(rp->power_zone); - nr_pl = find_nr_power_limit(rd); - for (i = 0; i < nr_pl; i++) { - switch (rd->rpl[i].prim_id) { - case PL1_ENABLE: - ret = rapl_read_data_raw(rd, - POWER_LIMIT1, true, + for (i = POWER_LIMIT1; i < NR_POWER_LIMITS; i++) { + ret = rapl_read_pl_data(rd, i, PL_LIMIT, true, &rd->rpl[i].last_power_limit); - if (ret) - rd->rpl[i].last_power_limit = 0; - break; - case PL2_ENABLE: - ret = rapl_read_data_raw(rd, - POWER_LIMIT2, true, - &rd->rpl[i].last_power_limit); - if (ret) - rd->rpl[i].last_power_limit = 0; - break; - case PL4_ENABLE: - ret = rapl_read_data_raw(rd, - POWER_LIMIT4, true, - &rd->rpl[i].last_power_limit); - if (ret) - rd->rpl[i].last_power_limit = 0; - break; - } + if (ret) + rd->rpl[i].last_power_limit = 0; } } cpus_read_unlock(); @@ -1492,33 +1457,17 @@ static void power_limit_state_restore(void) { struct rapl_package *rp; struct rapl_domain *rd; - int nr_pl, i; + int i;
cpus_read_lock(); list_for_each_entry(rp, &rapl_packages, plist) { if (!rp->power_zone) continue; rd = power_zone_to_rapl_domain(rp->power_zone); - nr_pl = find_nr_power_limit(rd); - for (i = 0; i < nr_pl; i++) { - switch (rd->rpl[i].prim_id) { - case PL1_ENABLE: - if (rd->rpl[i].last_power_limit) - rapl_write_data_raw(rd, POWER_LIMIT1, - rd->rpl[i].last_power_limit); - break; - case PL2_ENABLE: - if (rd->rpl[i].last_power_limit) - rapl_write_data_raw(rd, POWER_LIMIT2, - rd->rpl[i].last_power_limit); - break; - case PL4_ENABLE: - if (rd->rpl[i].last_power_limit) - rapl_write_data_raw(rd, POWER_LIMIT4, - rd->rpl[i].last_power_limit); - break; - } - } + for (i = POWER_LIMIT1; i < NR_POWER_LIMITS; i++) + if (rd->rpl[i].last_power_limit) + rapl_write_pl_data(rd, i, PL_LIMIT, + rd->rpl[i].last_power_limit); } cpus_read_unlock(); } diff --git a/include/linux/intel_rapl.h b/include/linux/intel_rapl.h index 50098e641181a..565992b2a079a 100644 --- a/include/linux/intel_rapl.h +++ b/include/linux/intel_rapl.h @@ -78,7 +78,6 @@ struct rapl_domain_data {
struct rapl_power_limit { struct powercap_zone_constraint *constraint; - int prim_id; /* primitive ID used to enable */ struct rapl_domain *domain; const char *name; u64 last_power_limit;
From: Zhang Rui rui.zhang@intel.com
[ Upstream commit 964209202ebe1569c858337441e87ef0f9d71416 ]
PL1 cannot be disabled on some platforms. The ENABLE bit is still set after software clears it. This behavior leads to a scenario where, upon user request to disable the Power Limit through the powercap sysfs, the ENABLE bit remains set while the CLAMPING bit is inadvertently cleared.
According to the Intel Software Developer's Manual, the CLAMPING bit, "When set, allows the processor to go below the OS requested P states in order to maintain the power below specified Platform Power Limit value."
Thus this means the system may operate at higher power levels than intended on such platforms.
Enhance the code to check ENABLE bit after writing to it, and stop further processing if ENABLE bit cannot be changed.
Reported-by: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Fixes: 2d281d8196e3 ("PowerCap: Introduce Intel RAPL power capping driver") Cc: All applicable stable@vger.kernel.org Signed-off-by: Zhang Rui rui.zhang@intel.com Link: https://patch.msgid.link/20250619071340.384782-1-rui.zhang@intel.com [ rjw: Use str_enabled_disabled() instead of open-coded equivalent ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/powercap/intel_rapl_common.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/drivers/powercap/intel_rapl_common.c b/drivers/powercap/intel_rapl_common.c index cc51bd508ad15..7550e8be488d0 100644 --- a/drivers/powercap/intel_rapl_common.c +++ b/drivers/powercap/intel_rapl_common.c @@ -297,12 +297,28 @@ static int set_domain_enable(struct powercap_zone *power_zone, bool mode) { struct rapl_domain *rd = power_zone_to_rapl_domain(power_zone); struct rapl_defaults *defaults = get_defaults(rd->rp); + u64 val; int ret;
cpus_read_lock(); ret = rapl_write_pl_data(rd, POWER_LIMIT1, PL_ENABLE, mode); - if (!ret && defaults->set_floor_freq) + if (ret) + goto end; + + ret = rapl_read_pl_data(rd, POWER_LIMIT1, PL_ENABLE, false, &val); + if (ret) + goto end; + + if (mode != val) { + pr_debug("%s cannot be %s\n", power_zone->name, + str_enabled_disabled(mode)); + goto end; + } + + if (defaults->set_floor_freq) defaults->set_floor_freq(rd, mode); + +end: cpus_read_unlock();
return ret;
On Sun, Jul 20, 2025 at 07:46:59PM -0400, Sasha Levin wrote:
From: Zhang Rui rui.zhang@intel.com
[ Upstream commit e8e28c2af16b279b6c37d533e1e73effb197cf2e ]
rapl_defaults is Interface specific.
Although current MSR and MMIO Interface share the same rapl_defaults, new Interface like TPMI need its own rapl_defaults callbacks.
Save the rapl_defaults information in the Interface private structure.
No functional change.
Signed-off-by: Zhang Rui rui.zhang@intel.com Tested-by: Wang Wendy wendy.wang@intel.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Stable-dep-of: 964209202ebe ("powercap: intel_rapl: Do not change CLAMPING bit if ENABLE bit cannot be changed") Signed-off-by: Sasha Levin sashal@kernel.org
drivers/powercap/intel_rapl_common.c | 46 ++++++++++++++++++++-------- include/linux/intel_rapl.h | 2 ++ 2 files changed, 35 insertions(+), 13 deletions(-)
Your patch series here missed 3 fixes for it, and they didn't backport cleanly: 081690e94118 ("powercap: intel_rapl: Fix invalid setting of Power Limit 4") a60ec4485f1c ("powercap: intel_rapl: Downgrade BIOS locked limits pr_warn() to pr_debug()") 95f6580352a7 ("powercap: intel_rapl: Fix off by one in get_rpi()")
So I'm going to drop this series from the queue now.
thanks,
greg k-h
linux-stable-mirror@lists.linaro.org