The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
c0a454b9044f ("arm64/bti: Disable in kernel BTI when cross section thunks are broken")
8cdd23c23c3d ("arm64: Restrict ARM64_BTI_KERNEL to clang 12.0.0 and newer")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c0a454b9044fdc99486853aa424e5b3be2107078 Mon Sep 17 00:00:00 2001
From: Mark Brown <broonie(a)kernel.org>
Date: Mon, 5 Sep 2022 15:22:55 +0100
Subject: [PATCH] arm64/bti: Disable in kernel BTI when cross section thunks
are broken
GCC does not insert a `bti c` instruction at the beginning of a function
when it believes that all callers reach the function through a direct
branch[1]. Unfortunately the logic it uses to determine this is not
sufficiently robust: for example, it does not take account of functions
being placed in different sections which may be loaded separately, so we
may still see thunks being generated to these functions. If that happens,
the first instruction in the callee function will result in a Branch
Target Exception due to the missing landing pad.
While this has currently only been observed in the case of modules
having their main code loaded sufficiently far from their init section
to require thunks, it could potentially happen in other cases, so the
safest thing is to disable BTI for the kernel when building with an
affected toolchain.
[1]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671
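
A minimal C sketch of the code shape described above, for illustration
only. The names helper and demo_init and the section name
.text.demo_init are hypothetical and do not come from the kernel or from
the bug report. The callee is static and only ever called directly, which
is the situation in which the message says GCC may omit the `bti c`
landing pad; the caller lives in a different section, standing in for
module init code that may be loaded far away from the main text.

```c
/*
 * Callee: static and only called directly, so the compiler can see every
 * caller. Per the commit message, GCC may then skip the `bti c` landing
 * pad at the start of the function.
 */
static __attribute__((noinline)) int helper(int x)
{
	return x + 1;
}

/*
 * Caller: placed in a separate section (hypothetical name). If the two
 * sections end up too far apart for a direct branch, a thunk that
 * branches indirectly to helper() may be generated, and an indirect
 * branch to a function without a landing pad is what raises the Branch
 * Target Exception.
 */
__attribute__((noinline, section(".text.demo_init")))
int demo_init(int x)
{
	return helper(x);
}
```

In an ordinary userspace build both sections are linked close together,
so nothing faults here; the sketch only shows the layout. Building for
arm64 with -mbranch-protection=bti and disassembling helper() is one way
to check whether a given toolchain emits the landing pad.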
Reported-by: D Scott Phillips <scott(a)os.amperecomputing.com>
[Bits of the commit message are lifted from his report & workaround]
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Link: https://lore.kernel.org/r/20220905142255.591990-1-broonie@kernel.org
Cc: <stable(a)vger.kernel.org> # v5.10+
Signed-off-by: Will Deacon <will(a)kernel.org>
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 9fb9fff08c94..1ce7685ad5de 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1887,6 +1887,8 @@ config ARM64_BTI_KERNEL
depends on CC_HAS_BRANCH_PROT_PAC_RET_BTI
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94697
depends on !CC_IS_GCC || GCC_VERSION >= 100100
+ # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106671
+ depends on !CC_IS_GCC
# https://github.com/llvm/llvm-project/commit/a88c722e687e6780dcd6a58718350dc…
depends on !CC_IS_CLANG || CLANG_VERSION >= 120000
depends on (!FUNCTION_GRAPH_TRACER || DYNAMIC_FTRACE_WITH_REGS)
commit e89d120c4b720e232cc6a94f0fcbd59c15d41489 upstream.
The AMU counter AMEVCNTR01 (constant counter) should increment at the same
rate as the system counter. On affected Cortex-A510 cores, AMEVCNTR01
increments incorrectly giving a significantly higher output value. This
results in inaccurate task scheduler utilization tracking and incorrect
feedback on CPU frequency.
Work around this problem by returning 0 when reading the affected counter
in key locations, which prevents all users of this counter from using it
either for frequency invariance or as the FFH reference counter. The
effect is the same as firmware disabling the affected counters.
Details on how the two features are affected by this erratum:
- AMU counters will not be used for frequency invariance for affected
CPUs and CPUs in the same cpufreq policy. AMUs can still be used for
frequency invariance for unaffected CPUs in the system. Although
unlikely, if no alternative method can be found to support frequency
invariance for affected CPUs (cpufreq based or solution based on
platform counters) frequency invariance will be disabled. Please check
the chapter on frequency invariance at
Documentation/scheduler/sched-capacity.rst for details of its effect.
- Given that FFH can be used to fetch either the core or constant counter
values, restrictions are lifted regarding any of these counters
returning a valid (!0) value. Therefore FFH is considered supported
if there is at least one CPU that supports AMUs, independent of any
counters being disabled or affected by this erratum. Clarifying
comments are now added to the cpc_ffh_supported(), cpu_read_constcnt()
and cpu_read_corecnt() functions.
The above is achieved through adding a new erratum: ARM64_ERRATUM_2457168.
Signed-off-by: Ionela Voinescu <ionela.voinescu(a)arm.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: James Morse <james.morse(a)arm.com>
Link: https://lore.kernel.org/r/20220819103050.24211-1-ionela.voinescu@arm.com
---
Hi,
This is a backport to stable 5.15.67 of the upstream commit
e89d120c4b72 arm64: errata: add detection for AMEVCNTR01 incrementing incorrectly
This is sent separately as there were minor conflicts that needed resolving
when applying the mainline patch.
Thanks,
Ionela.
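
The behaviour described in the commit message above can be modelled in a
few lines of standalone C. This is only a sketch under stated
assumptions: struct cpu_model and every model_* function are hypothetical
stand-ins written for this note, not kernel code, and they only mirror
the rule that a counter read of 0 is treated as "unusable" by its users.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU state for this sketch; not a kernel structure. */
struct cpu_model {
	bool has_amu;          /* CPU implements the AMU extension */
	bool erratum_2457168;  /* CPU is an affected Cortex-A510 */
	uint64_t raw_constcnt; /* what the hardware reports for AMEVCNTR01 */
	uint64_t raw_corecnt;  /* what the hardware reports for the core counter */
};

/* Constant counter read: 0 tells every caller the counter is unusable. */
static uint64_t model_read_constcnt(const struct cpu_model *cpu)
{
	if (!cpu->has_amu || cpu->erratum_2457168)
		return 0;
	return cpu->raw_constcnt;
}

/* The core counter is not affected by the erratum. */
static uint64_t model_read_corecnt(const struct cpu_model *cpu)
{
	return cpu->has_amu ? cpu->raw_corecnt : 0;
}

/* Frequency invariance needs both counters to read non-zero. */
static bool model_freq_counters_valid(const struct cpu_model *cpu)
{
	return model_read_constcnt(cpu) != 0 && model_read_corecnt(cpu) != 0;
}

/* FFH counts as supported as soon as one CPU has an AMU. */
static bool model_ffh_supported(const struct cpu_model *cpus, int n)
{
	for (int i = 0; i < n; i++)
		if (cpus[i].has_amu)
			return true;
	return false;
}
```

With one affected and one unaffected CPU in the array, the affected CPU
fails model_freq_counters_valid() while model_ffh_supported() still
returns true, which matches the two bullet points in the commit message.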
Documentation/arm64/silicon-errata.rst | 2 ++
arch/arm64/Kconfig | 17 ++++++++++++++
arch/arm64/kernel/cpu_errata.c | 9 ++++++++
arch/arm64/kernel/cpufeature.c | 5 +++-
arch/arm64/kernel/topology.c | 32 ++++++++++++++++++++++++--
arch/arm64/tools/cpucaps | 1 +
6 files changed, 63 insertions(+), 3 deletions(-)
diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index 46644736e583..663001f69773 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -94,6 +94,8 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Cortex-A510 | #2441009 | ARM64_ERRATUM_2441009 |
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | Cortex-A510 | #2457168 | ARM64_ERRATUM_2457168 |
++----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 |
+----------------+-----------------+-----------------+-----------------------------+
| ARM | Neoverse-N1 | #1349291 | N/A |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 9d80c783142f..9f1d0ca2531d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -683,6 +683,23 @@ config ARM64_ERRATUM_2441009
If unsure, say Y.
+config ARM64_ERRATUM_2457168
+ bool "Cortex-A510: 2457168: workaround for AMEVCNTR01 incrementing incorrectly"
+ depends on ARM64_AMU_EXTN
+ default y
+ help
+ This option adds the workaround for ARM Cortex-A510 erratum 2457168.
+
+ The AMU counter AMEVCNTR01 (constant counter) should increment at the same rate
+ as the system counter. On affected Cortex-A510 cores AMEVCNTR01 increments
+ incorrectly giving a significantly higher output value.
+
+ Work around this problem by returning 0 when reading the affected counter in
+ key locations that results in disabling all users of this counter. This effect
+ is the same to firmware disabling affected counters.
+
+ If unsure, say Y.
+
config CAVIUM_ERRATUM_22375
bool "Cavium erratum 22375, 24313"
default y
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 23c57e0a7fd1..25c495f58f67 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -550,6 +550,15 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
.capability = ARM64_WORKAROUND_NVIDIA_CARMEL_CNP,
ERRATA_MIDR_ALL_VERSIONS(MIDR_NVIDIA_CARMEL),
},
+#endif
+#ifdef CONFIG_ARM64_ERRATUM_2457168
+ {
+ .desc = "ARM erratum 2457168",
+ .capability = ARM64_WORKAROUND_2457168,
+ .type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
+ /* Cortex-A510 r0p0-r1p1 */
+ CAP_MIDR_RANGE(MIDR_CORTEX_A510, 0, 0, 1, 1)
+ },
#endif
{
}
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 474aa55c2f68..3e52a9e8b50b 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -1736,7 +1736,10 @@ static void cpu_amu_enable(struct arm64_cpu_capabilities const *cap)
pr_info("detected CPU%d: Activity Monitors Unit (AMU)\n",
smp_processor_id());
cpumask_set_cpu(smp_processor_id(), &amu_cpus);
- update_freq_counters_refs();
+
+ /* 0 reference values signal broken/disabled counters */
+ if (!this_cpu_has_cap(ARM64_WORKAROUND_2457168))
+ update_freq_counters_refs();
}
}
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 4dd14a6620c1..acf67ef4c505 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -308,12 +308,25 @@ core_initcall(init_amu_fie);
static void cpu_read_corecnt(void *val)
{
+ /*
+ * A value of 0 can be returned if the current CPU does not support AMUs
+ * or if the counter is disabled for this CPU. A return value of 0 at
+ * counter read is properly handled as an error case by the users of the
+ * counter.
+ */
*(u64 *)val = read_corecnt();
}
static void cpu_read_constcnt(void *val)
{
- *(u64 *)val = read_constcnt();
+ /*
+ * Return 0 if the current CPU is affected by erratum 2457168. A value
+ * of 0 is also returned if the current CPU does not support AMUs or if
+ * the counter is disabled. A return value of 0 at counter read is
+ * properly handled as an error case by the users of the counter.
+ */
+ *(u64 *)val = this_cpu_has_cap(ARM64_WORKAROUND_2457168) ?
+ 0UL : read_constcnt();
}
static inline
@@ -340,7 +353,22 @@ int counters_read_on_cpu(int cpu, smp_call_func_t func, u64 *val)
*/
bool cpc_ffh_supported(void)
{
- return freq_counters_valid(get_cpu_with_amu_feat());
+ int cpu = get_cpu_with_amu_feat();
+
+ /*
+ * FFH is considered supported if there is at least one present CPU that
+ * supports AMUs. Using FFH to read core and reference counters for CPUs
+ * that do not support AMUs, have counters disabled or that are affected
+ * by errata, will result in a return value of 0.
+ *
+ * This is done to allow any enabled and valid counters to be read
+ * through FFH, knowing that potentially returning 0 as counter value is
+ * properly handled by the users of these counters.
+ */
+ if ((cpu >= nr_cpu_ids) || !cpumask_test_cpu(cpu, cpu_present_mask))
+ return false;
+
+ return true;
}
int cpc_read_ffh(int cpu, struct cpc_reg *reg, u64 *val)
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index b71c6cbb2309..cfaffd3c8289 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -54,6 +54,7 @@ WORKAROUND_1418040
WORKAROUND_1463225
WORKAROUND_1508412
WORKAROUND_1542419
+WORKAROUND_2457168
WORKAROUND_CAVIUM_23154
WORKAROUND_CAVIUM_27456
WORKAROUND_CAVIUM_30115
--
2.25.1
<Note - hope this works - moved to my more opensource friendly email
account>
On 2022-09-12 08:20, Mark Pearson wrote:
>
> --------------------------------------------------------------------------------
> *From:* Jason A. Donenfeld <Jason(a)zx2c4.com>
> *Sent:* September 12, 2022 6:56
> *To:* Sebastian Reichel <sebastian.reichel(a)collabora.com>; Mark Pearson
> <mpearson(a)lenovo.com>
> *Cc:* linux-pm(a)vger.kernel.org <linux-pm(a)vger.kernel.org>;
> stable(a)vger.kernel.org <stable(a)vger.kernel.org>; Rafael J . Wysocki
> <rafael(a)kernel.org>
> *Subject:* [External] Re: [PATCH RESEND] power: supply: avoid nullptr deref in
> __power_supply_is_system_supplied
> CC+ Mark Pearson from Lenovo
> Full thread is here:
> https://lore.kernel.org/all/YwDsy3ZUgTtlKH9r@zx2c4.com/
> On Mon, Sep 12, 2022 at 11:48 AM Jason A. Donenfeld <Jason(a)zx2c4.com> wrote:
>>
>> Ah another thing:
>>
>> On Mon, Sep 12, 2022 at 11:45 AM Jason A. Donenfeld <Jason(a)zx2c4.com> wrote:
>> > My machine went through three changes I know about between the threshold
>> > of "not crashing" and "crashing":
>> > - Upgraded to 5.19 and then 6.0-rc1.
>> > - I used my laptop on batteries for a prolonged period of time for the
>> > first time in a while.
>> > - I updated KDE, whose power management UI elements may or may not make
>> > frequent calls to this subsystem to update some visual representation.
>>
>> - Updated my BIOS.
>
> GASP! The plot thickens.
>
> It appears that the BIOS update I applied has been removed from
> https://pcsupport.lenovo.com/fr/en/downloads/ds551052-bios-update-utility-b…
> and now it only shows the 1.16 version. I updated from 1.16 to 1.18.
>
> The missing release notes are still online if you futz with the URL:
> https://download.lenovo.com/pccbbs/mobiles/n40ur14w.txt
> https://download.lenovo.com/pccbbs/mobiles/n40ur15w.txt
>
> One of the items for 1.17 says:
>> - (Fix) Fixed an issue where it took a long time to update the battery FW.
>
> So maybe something was happening here...
>
> I'm CC'ing Mark from Lenovo to see if he has any insight as to why
> this BIOS update was pulled.
>
> Maybe the battery was appearing and disappearing rapidly. If that's
> correct, then it'd indicate that this bandaid patch is *wrong* and
> what actually is needed is some kind of reference counting or RCU
> around that sysfs interface (and maybe others).
>
> Jason
Hi Jason,
I'll have to check with the FW team, but looking at the internal notes I
think the FW was pulled because of a graphics display regression.
Version 36W fixed a brightness control issue in discrete mode and
37W (not yet released) is fixing an external display issue, so my guess
is that something about the fix in 36W has a side effect.
More interesting are the EC FW updates. There isn't a new version posted,
but the previous version (EC 33W) includes a fix for a
'suspected EC-Battery communication transaction failure'. Is that
potentially related to this patch in some way? I can go and ask for more
details if we think it's related. I'll also see if I can repro on my
P1G4, but I hadn't seen any other reports so it might be HW specific.
Can you confirm which FW you have from the BIOS setup screen (F1 during
early boot)? BIOS and EC please.
Mark