On Mon, Sep 15, 2025 at 10:31:24AM -0400, Sean Anderson wrote:
> On 9/15/25 05:58, Leo Yan wrote:
> > On Fri, Sep 12, 2025 at 11:13:14AM -0400, Sean Anderson wrote:
> >> coresight_panic_cb is called with interrupts disabled during panics.
> >> However, bus_for_each_dev calls bus_to_subsys which takes
> >> bus_kset->list_lock without disabling IRQs. This may cause a deadlock.
> >
> > I would rephrase it to make it clearer for anyone reading it later:
> >
> > coresight_panic_cb() is called during panics, which can preempt a flow
> > that triggers exceptions (such as data or instruction aborts).
>
> I don't see what exceptions have to do with it. You can also panic
> during a regular interrupt.
The commit log's mention of "without disabling IRQs" gives the impression
that the deadlock is caused by IRQ-unsafe locking, which might leave
readers wondering why the issue cannot be fixed with IRQ-safe locking.
Regardless of whether IRQs are disabled, and regardless of the context
(interrupt, bottom-half, or normal thread), the conditions for the
deadlock are only about:
(a) The bus lock has been acquired;
(b) A panic is triggered to try to acquire the same lock.
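To make conditions (a) and (b) concrete, here is a minimal userspace
sketch, not kernel code. It assumes a toy non-recursive lock built on C11
atomic_flag; toy_lock, bus_lock and the helper names are hypothetical and
only stand in for bus_kset->list_lock. The panic path uses a trylock so the
sketch can report the would-be deadlock instead of hanging. IRQ state does
not appear anywhere in it, because it plays no role in the outcome.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a non-recursive kernel spinlock. */
struct toy_lock {
	atomic_flag held;
};

struct toy_lock bus_lock = { ATOMIC_FLAG_INIT };

/* Returns true if the lock was free and is now owned by the caller. */
static bool toy_trylock(struct toy_lock *l)
{
	return !atomic_flag_test_and_set(&l->held);
}

static void toy_unlock(struct toy_lock *l)
{
	atomic_flag_clear(&l->held);
}

/*
 * Models the panic path running on the CPU that may already hold the
 * lock. A real spin_lock() would spin forever in the failing case;
 * trylock lets the sketch report it instead.
 */
bool panic_path_can_take_lock(struct toy_lock *l)
{
	if (!toy_trylock(l))
		return false;	/* lock already held: would deadlock */
	toy_unlock(l);
	return true;
}

/* (a) a flow holds the lock, then (b) a panic tries to take it again. */
bool panic_while_lock_held(void)
{
	bool got, ok;

	got = toy_trylock(&bus_lock);			/* condition (a) */
	assert(got);
	(void)got;
	ok = panic_path_can_take_lock(&bus_lock);	/* condition (b) */
	toy_unlock(&bus_lock);
	return ok;
}
```

In this sketch panic_while_lock_held() reports that the panic path cannot
take the lock, while the same panic path succeeds once the lock is free.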
[...]
> > When I review this patch, I recognize we can consolidate panic notifier
> > in coresight-tmc-core.c, so we don't need to distribute the changes
> > into ETF and ETR drivers (sorry if I misled you in my previous reply).
>
> And this kind of thing is why I went with the straightforward fix
> initially. I do not want to bikeshed the extent that this gets removed.
> IMO the whole "panic ops" stuff should be done directly with the panic
> notifier, hence this patch. If you do not agree with that, then ack v2
> and send a follow up of your own to fix it how you see fit.
I would fix it in one go.
I agree with you that "the whole panic ops stuff should be done directly
with the panic notifier". The only difference between us is that I would
keep the `panic_ops` callback. To me, this encapsulates the panic callbacks
in their respective modules, which keeps the code more general.
Could you check if the drafted patch below looks good to you? If so, I
will send out a formal patch.
---8<---
From ea78dd22cbdd97f709c5991d5bd3be97be6e137e Mon Sep 17 00:00:00 2001
From: Sean Anderson <sean.anderson@linux.dev>
Date: Tue, 16 Sep 2025 16:03:58 +0100
Subject: [PATCH] coresight: Fix possible deadlock in coresight_panic_cb()
coresight_panic_cb() is called during a panic. It invokes
bus_for_each_dev(), which then calls bus_to_subsys() and takes the
'bus_kset->list_lock'. If a panic occurs after the lock has been
acquired, it can lead to a deadlock.
Instead of using a common panic notifier to iterate the bus, this commit
directly registers the TMC device's panic notifier. This avoids bus
iteration and effectively eliminates the race condition that could cause
the deadlock.
Fixes: 46006ceb5d02 ("coresight: core: Add provision for panic callbacks")
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
Signed-off-by: Leo Yan <leo.yan@arm.com>
---
drivers/hwtracing/coresight/coresight-core.c | 42 -------------------
.../hwtracing/coresight/coresight-tmc-core.c | 26 ++++++++++++
drivers/hwtracing/coresight/coresight-tmc.h | 2 +
3 files changed, 28 insertions(+), 42 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c
index 3267192f0c1c..cb0cc8d77056 100644
--- a/drivers/hwtracing/coresight/coresight-core.c
+++ b/drivers/hwtracing/coresight/coresight-core.c
@@ -21,7 +21,6 @@
#include <linux/property.h>
#include <linux/delay.h>
#include <linux/pm_runtime.h>
-#include <linux/panic_notifier.h>
#include "coresight-etm-perf.h"
#include "coresight-priv.h"
@@ -1566,36 +1565,6 @@ const struct bus_type coresight_bustype = {
.name = "coresight",
};
-static int coresight_panic_sync(struct device *dev, void *data)
-{
- int mode;
- struct coresight_device *csdev;
-
- /* Run through panic sync handlers for all enabled devices */
- csdev = container_of(dev, struct coresight_device, dev);
- mode = coresight_get_mode(csdev);
-
- if ((mode == CS_MODE_SYSFS) || (mode == CS_MODE_PERF)) {
- if (panic_ops(csdev))
- panic_ops(csdev)->sync(csdev);
- }
-
- return 0;
-}
-
-static int coresight_panic_cb(struct notifier_block *self,
- unsigned long v, void *p)
-{
- bus_for_each_dev(&coresight_bustype, NULL, NULL,
- coresight_panic_sync);
-
- return 0;
-}
-
-static struct notifier_block coresight_notifier = {
- .notifier_call = coresight_panic_cb,
-};
-
static int __init coresight_init(void)
{
int ret;
@@ -1608,20 +1577,11 @@ static int __init coresight_init(void)
if (ret)
goto exit_bus_unregister;
- /* Register function to be called for panic */
- ret = atomic_notifier_chain_register(&panic_notifier_list,
- &coresight_notifier);
- if (ret)
- goto exit_perf;
-
/* initialise the coresight syscfg API */
ret = cscfg_init();
if (!ret)
return 0;
- atomic_notifier_chain_unregister(&panic_notifier_list,
- &coresight_notifier);
-exit_perf:
etm_perf_exit();
exit_bus_unregister:
bus_unregister(&coresight_bustype);
@@ -1631,8 +1591,6 @@ static int __init coresight_init(void)
static void __exit coresight_exit(void)
{
cscfg_exit();
- atomic_notifier_chain_unregister(&panic_notifier_list,
- &coresight_notifier);
etm_perf_exit();
bus_unregister(&coresight_bustype);
}
diff --git a/drivers/hwtracing/coresight/coresight-tmc-core.c b/drivers/hwtracing/coresight/coresight-tmc-core.c
index 36599c431be6..108ed9daf56d 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-core.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-core.c
@@ -21,6 +21,7 @@
#include <linux/slab.h>
#include <linux/dma-mapping.h>
#include <linux/spinlock.h>
+#include <linux/panic_notifier.h>
#include <linux/pm_runtime.h>
#include <linux/of.h>
#include <linux/of_address.h>
@@ -769,6 +770,21 @@ static void register_crash_dev_interface(struct tmc_drvdata *drvdata,
"Valid crash tracedata found\n");
}
+static int tmc_panic_cb(struct notifier_block *nb, unsigned long v, void *p)
+{
+ struct tmc_drvdata *drvdata = container_of(nb, struct tmc_drvdata,
+ panic_notifier);
+ struct coresight_device *csdev = drvdata->csdev;
+
+ if (coresight_get_mode(csdev) == CS_MODE_DISABLED)
+ return 0;
+
+ if (panic_ops(csdev))
+ panic_ops(csdev)->sync(csdev);
+
+ return 0;
+}
+
static int __tmc_probe(struct device *dev, struct resource *res)
{
int ret = 0;
@@ -885,6 +901,12 @@ static int __tmc_probe(struct device *dev, struct resource *res)
goto out;
}
+ if (panic_ops(drvdata->csdev)) {
+ drvdata->panic_notifier.notifier_call = tmc_panic_cb;
+ atomic_notifier_chain_register(&panic_notifier_list,
+ &drvdata->panic_notifier);
+ }
+
out:
if (is_tmc_crashdata_valid(drvdata) &&
!tmc_prepare_crashdata(drvdata))
@@ -929,6 +951,10 @@ static void __tmc_remove(struct device *dev)
{
struct tmc_drvdata *drvdata = dev_get_drvdata(dev);
+ if (panic_ops(drvdata->csdev))
+ atomic_notifier_chain_unregister(&panic_notifier_list,
+ &drvdata->panic_notifier);
+
/*
* Since misc_open() holds a refcount on the f_ops, which is
* etb fops in this case, device is there until last file
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
index cbb4ba439158..873c5427673c 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -243,6 +243,7 @@ struct tmc_resrv_buf {
* (after crash) by default.
* @crash_mdata: Reserved memory for storing tmc crash metadata.
* Used by ETR/ETF.
+ * @panic_notifier: Notifier used to clean up during a panic
*/
struct tmc_drvdata {
struct clk *atclk;
@@ -273,6 +274,7 @@ struct tmc_drvdata {
struct etr_buf *perf_buf;
struct tmc_resrv_buf resrv_buf;
struct tmc_resrv_buf crash_mdata;
+ struct notifier_block panic_notifier;
};
struct etr_buf_operations {
--
2.34.1
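
The draft above embeds a notifier in the per-device driver data and
recovers the owner with container_of(). The following userspace sketch
illustrates only that pattern. toy_notifier, toy_drvdata and
toy_fire_panic are hypothetical names, and the container_of() macro is a
simplified re-creation of the kernel helper. It is not the driver code
itself.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace re-creation of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy notifier block; the kernel's struct notifier_block has more fields. */
struct toy_notifier {
	int (*call)(struct toy_notifier *nb, unsigned long event, void *data);
};

/* Hypothetical driver data with an embedded notifier. */
struct toy_drvdata {
	int enabled;	/* stands in for the coresight mode check */
	int synced;	/* set once the panic sync hook has run */
	struct toy_notifier panic_notifier;
};

/* Per-device callback: recover the owner from the notifier pointer. */
static int toy_panic_cb(struct toy_notifier *nb, unsigned long event,
			void *data)
{
	struct toy_drvdata *drvdata =
		container_of(nb, struct toy_drvdata, panic_notifier);

	(void)event;
	(void)data;

	if (!drvdata->enabled)
		return 0;	/* nothing to sync for a disabled device */

	drvdata->synced = 1;
	return 0;
}

/* Set up and fire one device's notifier, as a panic chain walk would. */
int toy_fire_panic(struct toy_drvdata *drvdata)
{
	drvdata->panic_notifier.call = toy_panic_cb;
	return drvdata->panic_notifier.call(&drvdata->panic_notifier, 0, NULL);
}
```

Because each callback reaches its own device through the notifier pointer,
no bus iteration and no device list lock is needed in this sketch.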
On Tue, Sep 16, 2025 at 12:14:40PM -0400, Sean Anderson wrote:
[...]
> > Could you check if the drafted patch below looks good to you? If so, I
>
> As stated above I disagree with a half-hearted removal. If you want to do that,
> then I will resend v2 done with an rcu list and you can make your own follow-up.
It is fine to disagree, but please don't resend v2 :)
We plan to refactor locking in the CoreSight driver, so I will try my
best to avoid adding a new lock unless there is a strong reason.
Thanks,
Leo
Change since V10:
1. Update kernel version to 6.18
V10 link: https://lkml.org/lkml/2025/8/6/520
Change since V9:
1. Replace scnprintf with sysfs_emit.
2. Update date in ABI files.
V9 link: https://lkml.org/lkml/2025/7/17/832
Change since V8:
1. Add label in all documentations of coresight components.
2. Add control of the visibility of the label sysfs attribute.
V8 link: https://lkml.org/lkml/2025/7/3/985
Change since V7:
1. Resolve the conflict when applying to coresight next.
2. Update the Date and version in ABI file.
V7 link: https://patchwork.kernel.org/project/linux-arm-kernel/patch/20250226121926.…
Change since V6:
1. Update the date and version in ABI file.
Change since V5:
1. Update the kernel version of ABI files.
2. Add link of different patch versions.
V5 link: https://patchwork.kernel.org/project/linux-arm-msm/cover/20241210122253.319…
Change since V4:
1. Add label in DT and add label sysfs node for each coresight device.
V4 link: https://patchwork.kernel.org/project/linux-arm-msm/cover/20240703122340.268…
Change since V3:
1. Change device-name to arm,cs-dev-name.
2. Add arm,cs-dev-name to only CTI and sources' dt-binding.
V3 link: https://patchwork.kernel.org/project/linux-arm-msm/cover/20240131082628.628…
Change since V2:
1. Fix the error in coresight core.
drivers/hwtracing/coresight/coresight-core.c:1775:7: error: assigning to 'char *' from 'const char *' discards qualifiers
2. Fix the warning when running the dt-binding check.
Documentation/devicetree/bindings/arm/arm,coresight-cpu-debug.yaml: device-name: missing type definition
V2 link: https://patchwork.kernel.org/project/linux-arm-msm/cover/20240115164252.265…
Change since V1:
1. Change coresight-name to device name.
2. Add the device-name in coresight dt bindings.
V1 link: https://patchwork.kernel.org/project/linux-arm-kernel/patch/20230208110716.…
Mao Jinlong (2):
dt-bindings: arm: Add label in the coresight components
coresight: Add label sysfs node support
.../testing/sysfs-bus-coresight-devices-cti | 6 ++
.../sysfs-bus-coresight-devices-dummy-source | 6 ++
.../testing/sysfs-bus-coresight-devices-etb10 | 6 ++
.../testing/sysfs-bus-coresight-devices-etm3x | 6 ++
.../testing/sysfs-bus-coresight-devices-etm4x | 6 ++
.../sysfs-bus-coresight-devices-funnel | 6 ++
.../testing/sysfs-bus-coresight-devices-stm | 6 ++
.../testing/sysfs-bus-coresight-devices-tmc | 6 ++
.../testing/sysfs-bus-coresight-devices-tpdm | 6 ++
.../testing/sysfs-bus-coresight-devices-trbe | 6 ++
.../bindings/arm/arm,coresight-cti.yaml | 4 ++
.../arm/arm,coresight-dummy-sink.yaml | 4 ++
.../arm/arm,coresight-dummy-source.yaml | 4 ++
.../arm/arm,coresight-dynamic-funnel.yaml | 4 ++
.../arm/arm,coresight-dynamic-replicator.yaml | 4 ++
.../bindings/arm/arm,coresight-etb10.yaml | 4 ++
.../bindings/arm/arm,coresight-etm.yaml | 4 ++
.../arm/arm,coresight-static-funnel.yaml | 4 ++
.../arm/arm,coresight-static-replicator.yaml | 4 ++
.../bindings/arm/arm,coresight-tmc.yaml | 4 ++
.../bindings/arm/arm,coresight-tpiu.yaml | 4 ++
.../bindings/arm/qcom,coresight-ctcu.yaml | 4 ++
.../arm/qcom,coresight-remote-etm.yaml | 4 ++
.../bindings/arm/qcom,coresight-tpda.yaml | 4 ++
.../bindings/arm/qcom,coresight-tpdm.yaml | 4 ++
drivers/hwtracing/coresight/coresight-sysfs.c | 71 ++++++++++++++++++-
26 files changed, 189 insertions(+), 2 deletions(-)
--
2.34.1
Hi Sean,
On Thu, Sep 11, 2025 at 11:33:15AM -0400, Sean Anderson wrote:
> coresight_panic_cb is called with interrupts disabled during panics.
> However, bus_for_each_dev calls bus_to_subsys which takes
> bus_kset->list_lock without disabling IRQs. This will cause a deadlock
> if a panic occurs while one of the other coresight functions that uses
> bus_for_each_dev is running.
The description is a bit misleading. Even when IRQs are disabled, if an
exception happens, a CPU can still be trapped into handling a kernel panic.
> Maintain a separate list of coresight devices to access during a panic.
Rather than maintaining a separate list and introducing a new spinlock,
I wonder whether we can simply register panic notifiers in the TMC ETR and
ETF drivers (see tmc_panic_sync_etr() and tmc_panic_sync_etf()).
If there is no dependency between CoreSight modules in the panic sync flow,
it is not necessary to maintain a list (and a lock) for these modules.
I have not been involved in the panic patches before, so I would like to
know the maintainers' opinion.
Thanks,
Leo
This series fixes device registration and unregistration.
The first patch fixes a resource that is not released properly when
device registration fails.
The second patch uses a mutex to protect the unregistration flow.
The last three patches are refactoring. Patch 03 explicitly uses
the parent device handler. Patch 04 separates the success and failure
flows for readability and easier maintenance. Patch 05 improves the
error handling by invoking specific functions for resource cleanup.
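As a rough illustration of the separated success and failure flows
described above, here is a userspace sketch of the usual goto-unwind
pattern. toy_dev, toy_register() and the fail_at test hook are
hypothetical and do not correspond to the actual CoreSight registration
code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical device owning two resources. */
struct toy_dev {
	int *id_map;	/* first resource */
	char *name;	/* second resource */
};

/*
 * fail_at is a test hook: 0 = no failure, 1 = first allocation fails,
 * 2 = second allocation fails.
 */
struct toy_dev *toy_register(int fail_at)
{
	struct toy_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;

	dev->id_map = (fail_at == 1) ? NULL : malloc(16 * sizeof(int));
	if (!dev->id_map)
		goto err_free_dev;

	dev->name = (fail_at == 2) ? NULL : malloc(16);
	if (!dev->name)
		goto err_free_map;

	/* Success flow ends here; nothing below runs on success. */
	return dev;

	/* Failure flow: release resources in reverse order of acquisition. */
err_free_map:
	free(dev->id_map);
err_free_dev:
	free(dev);
	return NULL;
}

/* Mirror of toy_register(): release everything it acquired. */
void toy_unregister(struct toy_dev *dev)
{
	if (!dev)
		return;
	free(dev->name);
	free(dev->id_map);
	free(dev);
}
```

Each failure label releases exactly what was acquired before the failing
step, so a partially registered device does not leak its earlier
resources.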
Leo Yan (5):
coresight: Correct sink ID map allocation failure handling
coresight: Protect unregistration with mutex
coresight: Explicitly use the parent device handler
coresight: Separate failure and success flows
coresight: Refine error handling for device registration
drivers/hwtracing/coresight/coresight-core.c | 67 +++++++++++---------
1 file changed, 37 insertions(+), 30 deletions(-)
--
2.34.1
This patchset builds upon Yicong's previous patches [1].
It fixes two race issues found when using TMC-ETR and CATU, and adds two
cleanups found while debugging those issues.
[1] https://lore.kernel.org/linux-arm-kernel/20241202092419.11777-1-yangyicong@…
---
Changes in v3:
- Patch 1: Additional comment for tmc_drvdata::etr_mode. Update the
comment for tmc_drvdata::reading with Jonathan's tag.
- Patch 2: Replace scoped_guard with guard with Jonathan's tag.
- Patch 2: Change spinlock to raw_spinlock, and refactor the code based
on Leo's suggested solution.
- Patch 3: Change the type of size to ssize_t and use max_t to simplify
the code with Leo's tag.
Link: https://lore.kernel.org/linux-arm-kernel/20250620075412.952934-1-hejunhao3@…
Changes in v2:
- Updated the commit of patch2.
- Rebase to v6.16-rc1
Junhao He (1):
coresight: tmc: refactor the tmc-etr mode setting to avoid race
conditions
Yicong Yang (2):
coresight: tmc: Add missing doc including reading and etr_mode of
struct tmc_drvdata
coresight: tmc: Decouple the perf buffer allocation from sysfs mode
.../hwtracing/coresight/coresight-tmc-etr.c | 110 ++++++++----------
drivers/hwtracing/coresight/coresight-tmc.h | 2 +
2 files changed, 53 insertions(+), 59 deletions(-)
--
2.33.0