Processors based on the Zen microarchitecture support IOPORT based deeper
C-states. The idle driver reads the acpi_gbl_FADT.xpm_timer_block.address
in the IOPORT based C-state exit path which is claimed to be a
"Dummy wait op" and has been around since ACPI introduction to Linux
dating back to Andy Grover's Mar 14, 2002 posting [1].
The comment above the dummy operation was elaborated by Andreas Mohr back
in 2006 in commit b488f02156d3d ("ACPI: restore comment justifying 'extra'
P_LVLx access") [2] where the commit log claims:
"this dummy read was about: STPCLK# doesn't get asserted in time on
(some) chipsets, which is why we need to have a dummy I/O read to delay
further instruction processing until the CPU is fully stopped."
However, sampling certain workloads with IBS on AMD Zen3 system shows
that a significant amount of time is spent in the dummy op, which
incorrectly gets accounted as C-State residency. A large C-State
residency value can prime the cpuidle governor to recommend a deeper
C-State during the subsequent idle instances, starting a vicious cycle,
leading to performance degradation on workloads that rapidly switch
between busy and idle phases.
One such workload is tbench where a massive performance degradation can
be observed during certain runs. Following are some statistics gathered
by running tbench with 128 clients, on a dual socket (2 x 64C/128T) Zen3
system with the baseline kernel, baseline kernel keeping C2 disabled,
and baseline kernel with this patch applied keeping C2 enabled:
baseline kernel was tip:sched/core at
commit f3dd3f674555 ("sched: Remove the limitation of WF_ON_CPU on
wakelist if wakee cpu is idle")
Kernel : baseline baseline + C2 disabled baseline + patch
Min (MB/s) : 2215.06 33072.10 (+1393.05%) 33016.10 (+1390.52%)
Max (MB/s) : 32938.80 34399.10 34774.50
Median (MB/s) : 32191.80 33476.60 33805.70
AMean (MB/s) : 22448.55 33649.27 (+49.89%) 33865.43 (+50.85%)
AMean Stddev : 17526.70 680.14 880.72
AMean CoefVar : 78.07% 2.02% 2.60%
The data shows there are edge cases that can cause massive regressions
in case of tbench. Profiling the bad runs with IBS shows a significant
amount of time being spent in acpi_idle_do_entry method:
Overhead Command Shared Object Symbol
74.76% swapper [kernel.kallsyms] [k] acpi_idle_do_entry
0.71% tbench [kernel.kallsyms] [k] update_sd_lb_stats.constprop.0
0.69% tbench_srv [kernel.kallsyms] [k] update_sd_lb_stats.constprop.0
0.49% swapper [kernel.kallsyms] [k] psi_group_change
...
Annotation of acpi_idle_do_entry method reveals almost all the time in
acpi_idle_do_entry is spent on the port I/O in wait_for_freeze():
0.14 │ in (%dx),%al # <------ First "in" corresponding to inb(cx->address)
0.51 │ mov 0x144d64d(%rip),%rax
0.00 │ test $0x80000000,%eax
│ ↓ jne 62 # <------ Skip if running in guest
0.00 │ mov 0x19800c3(%rip),%rdx
99.33 │ in (%dx),%eax # <------ Second "in" corresponding to inl(acpi_gbl_FADT.xpm_timer_block.address)
0.00 │62: mov -0x8(%rbp),%r12
0.00 │ leave
0.00 │ ← ret
This overhead is reflected in the C2 residency on the test system where
C2 is an IOPORT based C-State. The total C-state residency reported by
"cpupower idle-info" on CPU0 for good and bad case over the 80s tbench
run is as follows (all numbers are in microseconds):
Good Run Bad Run
(Baseline)
POLL: 43338 6231 (-85.62%)
C1 (MWAIT Based): 23576156 363861 (-98.45%)
C2 (IOPORT Based): 10781218 77027280 (+614.45%)
The larger residency value in bad case leads to the system recommending
C2 state again for subsequent idle instances. The pattern lasts till the
end of the tbench run. Following is the breakdown of "entry_method"
passed to acpi_idle_do_entry during good run and bad run:
Good Run Bad Run
(Baseline)
Number of times acpi_idle_do_entry was called: 6149573 6149050 (-0.01%)
|-> Number of times entry_method was "ACPI_CSTATE_FFH": 6141494 88144 (-98.56%)
|-> Number of times entry_method was "ACPI_CSTATE_HALT": 0 0 (+0.00%)
|-> Number of times entry_method was "ACPI_CSTATE_SYSTEMIO": 8079 6060906 (+74920.49%)
For processors based on the Zen microarchitecture, this dummy wait op is
unnecessary and can be skipped when choosing IOPORT based C-States to
avoid polluting the C-state residency information.
Link: https://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux-fullhistory.git/c… [1]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?… [2]
Suggested-by: Calvin Ong <calvin.ong(a)amd.com>
Cc: stable(a)vger.kernel.org
Cc: regressions(a)lists.linux.dev
Signed-off-by: K Prateek Nayak <kprateek.nayak(a)amd.com>
---
drivers/acpi/processor_idle.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 16a1663d02d4..18850aa2b79b 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -528,8 +528,11 @@ static int acpi_idle_bm_check(void)
static void wait_for_freeze(void)
{
#ifdef CONFIG_X86
- /* No delay is needed if we are in guest */
- if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
+ /*
+ * No delay is needed if we are in guest or on a processor
+ * based on the Zen microarchitecture.
+ */
+ if (boot_cpu_has(X86_FEATURE_HYPERVISOR) || boot_cpu_has(X86_FEATURE_ZEN))
return;
#endif
/* Dummy wait op - must do something useless after P_LVL2 read
--
2.25.1
Good day,
Can I write you here? I have urgent information for you here, With
utmost good faith?, as you know that my country have been in deep
crisis due to the war,
Miss. Mariam Musa.
This reverts commit a3b884cef873 ("firmware: arm_scmi: Add clock management
to the SCMI power domain").
Using the GENPD_FLAG_PM_CLK tells genpd to gate/ungate the consumer
device's clock(s) during runtime suspend/resume through the PM clock API.
More precisely, in genpd_runtime_resume() the clock(s) for the consumer
device would become ungated prior to the driver-level ->runtime_resume()
callbacks gets invoked.
This behaviour isn't a good fit for all platforms/drivers. For example, a
driver may need to make some preparations of its device in its
->runtime_resume() callback, like calling clk_set_rate() before the
clock(s) should be ungated. In these cases, it's easier to let the clock(s)
to be managed solely by the driver, rather than at the PM domain level.
For these reasons, let's drop the use GENPD_FLAG_PM_CLK for the SCMI PM
domain, as to enable it to be more easily adopted across ARM platforms.
Fixes: a3b884cef873 ("firmware: arm_scmi: Add clock management to the SCMI power domain")
Cc: Nicolas Pitre <npitre(a)baylibre.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
---
To get some more background to $subject patch, please have a look at the
lore-link below.
https://lore.kernel.org/all/DU0PR04MB94173B45A2CFEE3BF1BD313A88409@DU0PR04M…
Kind regards
Ulf Hansson
---
drivers/firmware/arm_scmi/scmi_pm_domain.c | 26 ----------------------
1 file changed, 26 deletions(-)
diff --git a/drivers/firmware/arm_scmi/scmi_pm_domain.c b/drivers/firmware/arm_scmi/scmi_pm_domain.c
index 581d34c95769..d5dee625de78 100644
--- a/drivers/firmware/arm_scmi/scmi_pm_domain.c
+++ b/drivers/firmware/arm_scmi/scmi_pm_domain.c
@@ -8,7 +8,6 @@
#include <linux/err.h>
#include <linux/io.h>
#include <linux/module.h>
-#include <linux/pm_clock.h>
#include <linux/pm_domain.h>
#include <linux/scmi_protocol.h>
@@ -53,27 +52,6 @@ static int scmi_pd_power_off(struct generic_pm_domain *domain)
return scmi_pd_power(domain, false);
}
-static int scmi_pd_attach_dev(struct generic_pm_domain *pd, struct device *dev)
-{
- int ret;
-
- ret = pm_clk_create(dev);
- if (ret)
- return ret;
-
- ret = of_pm_clk_add_clks(dev);
- if (ret >= 0)
- return 0;
-
- pm_clk_destroy(dev);
- return ret;
-}
-
-static void scmi_pd_detach_dev(struct generic_pm_domain *pd, struct device *dev)
-{
- pm_clk_destroy(dev);
-}
-
static int scmi_pm_domain_probe(struct scmi_device *sdev)
{
int num_domains, i;
@@ -124,10 +102,6 @@ static int scmi_pm_domain_probe(struct scmi_device *sdev)
scmi_pd->genpd.name = scmi_pd->name;
scmi_pd->genpd.power_off = scmi_pd_power_off;
scmi_pd->genpd.power_on = scmi_pd_power_on;
- scmi_pd->genpd.attach_dev = scmi_pd_attach_dev;
- scmi_pd->genpd.detach_dev = scmi_pd_detach_dev;
- scmi_pd->genpd.flags = GENPD_FLAG_PM_CLK |
- GENPD_FLAG_ACTIVE_WAKEUP;
pm_genpd_init(&scmi_pd->genpd, NULL,
state == SCMI_POWER_STATE_GENERIC_OFF);
--
2.34.1
Device-managed resources allocated post component bind must be tied to
the lifetime of the aggregate DRM device or they will not necessarily be
released when binding of the aggregate device is deferred.
This is specifically true for the HDMI IRQ, which will otherwise remain
requested so that the next bind attempt fails when requesting the IRQ a
second time.
Fix this by tying the device-managed lifetime of the HDMI IRQ to the DRM
device so that it is released when bind fails.
Fixes: 067fef372c73 ("drm/msm/hdmi: refactor bind/init")
Cc: stable(a)vger.kernel.org # 3.19
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/gpu/drm/msm/hdmi/hdmi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/hdmi/hdmi.c b/drivers/gpu/drm/msm/hdmi/hdmi.c
index a0ed6aa8e4e1..f28fb21e3891 100644
--- a/drivers/gpu/drm/msm/hdmi/hdmi.c
+++ b/drivers/gpu/drm/msm/hdmi/hdmi.c
@@ -344,7 +344,7 @@ int msm_hdmi_modeset_init(struct hdmi *hdmi,
goto fail;
}
- ret = devm_request_irq(&pdev->dev, hdmi->irq,
+ ret = devm_request_irq(dev->dev, hdmi->irq,
msm_hdmi_irq, IRQF_TRIGGER_HIGH,
"hdmi_isr", hdmi);
if (ret < 0) {
--
2.35.1
Bienvenido a la FIRMA DE PRÉSTAMOS GARRY JONES. Ofrecemos préstamos
comerciales y personales seguros y confidenciales con tasas de interés
mínimas, tan bajas como el 2 % anual, con un período de reembolso de 1
a 10 años, y puede obtener financiamiento para necesidades personales,
empresas de pequeña y mediana escala (PYME) y grandes empresas.
negocios/corporaciones. Puede pedir prestado entre $ 10,000 a $
50,000,000 dólares.
Saludos,
Sr. Garry Jones.
CEO
FIRMA DE PRÉSTAMOS GARRY JONES
In the "Fast Short REP MOVSB" path of memmove, if we take the path where
the FSRM flag is enabled but the ERMS flag is not, there is no longer a
check for length >= 0x20 (both alternatives will be replaced with NOPs).
If a memmove() requiring a forward copy of less than 0x20 bytes happens
in this case, the `sub $0x20, %rdx` will cause the length to roll around
to a huge value and the copy will eventually hit a page fault.
This is not intended to happen, as the comment above the alternatives
mentions "FSRM implies ERMS".
However, there is a check in early_init_intel() that can disable ERMS,
so we should also be disabling FSRM in this path to maintain correctness
of the memmove() optimization.
Cc: stable(a)vger.kernel.org
Fixes: f444a5ff95dc ("x86/cpufeatures: Add support for fast short REP; MOVSB")
Signed-off-by: Daniel Verkamp <dverkamp(a)chromium.org>
---
arch/x86/kernel/cpu/intel.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 2d7ea5480ec3..71b412f820c7 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -328,6 +328,7 @@ static void early_init_intel(struct cpuinfo_x86 *c)
pr_info("Disabled fast string operations\n");
setup_clear_cpu_cap(X86_FEATURE_REP_GOOD);
setup_clear_cpu_cap(X86_FEATURE_ERMS);
+ setup_clear_cpu_cap(X86_FEATURE_FSRM);
}
}
--
2.37.3.998.g577e59143f-goog