This patchset builds upon Yicong's previous patches [1].
Patch 2 introducing fix race issues found by using TMC-ETR. Patch 1 & 3 introducing two cleanups found when debugging the issues.
[1] https://lore.kernel.org/linux-arm-kernel/20241202092419.11777-1-yangyicong@h...
--- Changes in v4: - a) Add comment at the context of set etr to sysfs mode. - b) Move the check on drvdata->read to the start of enable etr. - c) Add checks to prevent multiple sysfs processes from simultaneously competing to enable ETR. - d) Fix the issue with the guard used. Link: https://lore.kernel.org/linux-arm-kernel/20250818080600.418425-1-hejunhao3@h...
--- Changes in v3: - Patches 1: Additional comment for tmc_drvdata::etr_mode. Update comment for tmc_drvdata::reading with Jonathan's Tag. - Patches 2: Replace scoped_guard with guard with Jonathan's Tag. - Patches 2: Fix spinlock to raw_spinlock, and refactor this code based on Leo's suggested solution. - Patches 3: change the size's type to ssize_t and use max_t to simplify the code with Leo's Tag. Link: https://lore.kernel.org/linux-arm-kernel/20250620075412.952934-1-hejunhao3@h...
Changes in v2: - Updated the commit of patch2. - Rebase to v6.16-rc1
Junhao He (1): coresight: tmc: refactor the tmc-etr mode setting to avoid race conditions
Yicong Yang (2): coresight: tmc: Add missing doc including reading and etr_mode of struct tmc_drvdata coresight: tmc: Decouple the perf buffer allocation from sysfs mode
.../hwtracing/coresight/coresight-tmc-etr.c | 136 +++++++++--------- drivers/hwtracing/coresight/coresight-tmc.h | 2 + 2 files changed, 66 insertions(+), 72 deletions(-)
From: Yicong Yang yangyicong@hisilicon.com
tmc_drvdata::reading is used to indicate whether a reading process is performed through /dev/xyz.tmc. tmc_drvdata::etr_mode is used to store the Coresight TMC-ETR buffer mode selected by the user. Document them.
Reviewed-by: James Clark james.clark@linaro.org Signed-off-by: Yicong Yang yangyicong@hisilicon.com Signed-off-by: Junhao He hejunhao3@huawei.com Reviewed-by: Leo Yan leo.yan@arm.com Signed-off-by: Junhao He hejunhao3@h-partners.com --- drivers/hwtracing/coresight/coresight-tmc.h | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h index cbb4ba439158..dd6c80fbae00 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.h +++ b/drivers/hwtracing/coresight/coresight-tmc.h @@ -221,6 +221,7 @@ struct tmc_resrv_buf { * @pid: Process ID of the process that owns the session that is using * this component. For example this would be the pid of the Perf * process. + * @reading: buffer's in the reading through "/dev/xyz.tmc" entry * @stop_on_flush: Stop on flush trigger user configuration. * @buf: Snapshot of the trace data for ETF/ETB. * @etr_buf: details of buffer used in TMC-ETR @@ -233,6 +234,7 @@ struct tmc_resrv_buf { * @trigger_cntr: amount of words to store after a trigger. * @etr_caps: Bitmask of capabilities of the TMC ETR, inferred from the * device configuration register (DEVID) + * @etr_mode: User preferred mode of the ETR device, default auto mode. * @idr: Holds etr_bufs allocated for this ETR. * @idr_mutex: Access serialisation for idr. * @sysfs_buf: SYSFS buffer for ETR.
From: Junhao He hejunhao3@huawei.com
When trying to run perf and sysfs mode simultaneously, the WARN_ON() in tmc_etr_enable_hw() is triggered sometimes:
WARNING: CPU: 42 PID: 3911571 at drivers/hwtracing/coresight/coresight-tmc-etr.c:1060 tmc_etr_enable_hw+0xc0/0xd8 [coresight_tmc] [..snip..] Call trace: tmc_etr_enable_hw+0xc0/0xd8 [coresight_tmc] (P) tmc_enable_etr_sink+0x11c/0x250 [coresight_tmc] (L) tmc_enable_etr_sink+0x11c/0x250 [coresight_tmc] coresight_enable_path+0x1c8/0x218 [coresight] coresight_enable_sysfs+0xa4/0x228 [coresight] enable_source_store+0x58/0xa8 [coresight] dev_attr_store+0x20/0x40 sysfs_kf_write+0x4c/0x68 kernfs_fop_write_iter+0x120/0x1b8 vfs_write+0x2c8/0x388 ksys_write+0x74/0x108 __arm64_sys_write+0x24/0x38 el0_svc_common.constprop.0+0x64/0x148 do_el0_svc+0x24/0x38 el0_svc+0x3c/0x130 el0t_64_sync_handler+0xc8/0xd0 el0t_64_sync+0x1ac/0x1b0 ---[ end trace 0000000000000000 ]---
Since the sysfs buffer allocation and the hardware enablement is not in the same critical region, it's possible to race with the perf
mode: [sysfs mode] [perf mode] tmc_etr_get_sysfs_buffer() spin_lock(&drvdata->spinlock) [sysfs buffer allocation] spin_unlock(&drvdata->spinlock) spin_lock(&drvdata->spinlock) tmc_etr_enable_hw() drvdata->etr_buf = etr_perf->etr_buf spin_unlock(&drvdata->spinlock) spin_lock(&drvdata->spinlock) tmc_etr_enable_hw() WARN_ON(drvdata->etr_buf) // WARN sicne etr_buf initialized at the perf side spin_unlock(&drvdata->spinlock)
A race condition is introduced here, perf always prioritizes execution, and warnings can lead to concerns about potential hidden bugs, such as getting out of sync.
To fix this, configure the tmc-etr mode before invoking enable_etr_perf() or enable_etr_sysfs(), explicitly check if the tmc-etr sink is already enabled in a different mode, and simplily the setup and checks for "mode". To prevent race conditions between mode transitions.
Fixes: 296b01fd106e ("coresight: Refactor out buffer allocation function for ETR") Reported-by: Yicong Yang yangyicong@hisilicon.com Closes: https://lore.kernel.org/linux-arm-kernel/20241202092419.11777-2-yangyicong@h... Signed-off-by: Junhao He hejunhao3@huawei.com Tested-by: Yicong Yang yangyicong@hisilicon.com Signed-off-by: Junhao He hejunhao3@h-partners.com --- .../hwtracing/coresight/coresight-tmc-etr.c | 106 +++++++++--------- 1 file changed, 55 insertions(+), 51 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index b07fcdb3fe1a..fe1d5e8a0d2b 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -1263,11 +1263,6 @@ static struct etr_buf *tmc_etr_get_sysfs_buffer(struct coresight_device *csdev) raw_spin_lock_irqsave(&drvdata->spinlock, flags); }
- if (drvdata->reading || coresight_get_mode(csdev) == CS_MODE_PERF) { - ret = -EBUSY; - goto out; - } - /* * If we don't have a buffer or it doesn't match the requested size, * use the buffer allocated above. Otherwise reuse the existing buffer. @@ -1278,7 +1273,6 @@ static struct etr_buf *tmc_etr_get_sysfs_buffer(struct coresight_device *csdev) drvdata->sysfs_buf = new_buf; }
-out: raw_spin_unlock_irqrestore(&drvdata->spinlock, flags);
/* Free memory outside the spinlock if need be */ @@ -1289,7 +1283,7 @@ static struct etr_buf *tmc_etr_get_sysfs_buffer(struct coresight_device *csdev)
static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev) { - int ret = 0; + int ret; unsigned long flags; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); struct etr_buf *sysfs_buf = tmc_etr_get_sysfs_buffer(csdev); @@ -1299,23 +1293,10 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
raw_spin_lock_irqsave(&drvdata->spinlock, flags);
- /* - * In sysFS mode we can have multiple writers per sink. Since this - * sink is already enabled no memory is needed and the HW need not be - * touched, even if the buffer size has changed. - */ - if (coresight_get_mode(csdev) == CS_MODE_SYSFS) { - csdev->refcnt++; - goto out; - } - ret = tmc_etr_enable_hw(drvdata, sysfs_buf); - if (!ret) { - coresight_set_mode(csdev, CS_MODE_SYSFS); + if (!ret) csdev->refcnt++; - }
-out: raw_spin_unlock_irqrestore(&drvdata->spinlock, flags);
if (!ret) @@ -1729,39 +1710,24 @@ static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data) { int rc = 0; pid_t pid; - unsigned long flags; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); struct perf_output_handle *handle = data; struct etr_perf_buffer *etr_perf = etm_perf_sink_config(handle);
- raw_spin_lock_irqsave(&drvdata->spinlock, flags); - /* Don't use this sink if it is already claimed by sysFS */ - if (coresight_get_mode(csdev) == CS_MODE_SYSFS) { - rc = -EBUSY; - goto unlock_out; - } - - if (WARN_ON(!etr_perf || !etr_perf->etr_buf)) { - rc = -EINVAL; - goto unlock_out; - } + if (WARN_ON(!etr_perf || !etr_perf->etr_buf)) + return -EINVAL;
/* Get a handle on the pid of the session owner */ pid = etr_perf->pid;
/* Do not proceed if this device is associated with another session */ - if (drvdata->pid != -1 && drvdata->pid != pid) { - rc = -EBUSY; - goto unlock_out; - } + if (drvdata->pid != -1 && drvdata->pid != pid) + return -EBUSY;
- /* - * No HW configuration is needed if the sink is already in - * use for this session. - */ + /* The sink is already in use for this session */ if (drvdata->pid == pid) { csdev->refcnt++; - goto unlock_out; + return rc; }
rc = tmc_etr_enable_hw(drvdata, etr_perf->etr_buf); @@ -1773,22 +1739,60 @@ static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data) csdev->refcnt++; }
-unlock_out: - raw_spin_unlock_irqrestore(&drvdata->spinlock, flags); return rc; }
static int tmc_enable_etr_sink(struct coresight_device *csdev, enum cs_mode mode, void *data) { - switch (mode) { - case CS_MODE_SYSFS: - return tmc_enable_etr_sink_sysfs(csdev); - case CS_MODE_PERF: - return tmc_enable_etr_sink_perf(csdev, data); - default: - return -EINVAL; + struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); + int rc; + +retry: + scoped_guard(raw_spinlock_irqsave, &drvdata->spinlock) { + if (coresight_get_mode(csdev) != CS_MODE_DISABLED && + coresight_get_mode(csdev) != mode) + return -EBUSY; + + switch (mode) { + case CS_MODE_SYSFS: + if (drvdata->reading) + return -EBUSY; + + if (csdev->refcnt) { + /* The sink is already enabled via sysfs */ + csdev->refcnt++; + return 0; + } + + /* + * A sysfs session is currently enabling ETR, preventing + * a second sysfs process from repeatedly triggering the + * enable procedure. + */ + if (coresight_get_mode(csdev) == CS_MODE_SYSFS && !csdev->refcnt) + goto retry; + + /* + * Set ETR to sysfs mode before it is fully enabled, to + * prevent race conditions during mode transitions. + */ + coresight_set_mode(csdev, mode); + break; + case CS_MODE_PERF: + return tmc_enable_etr_sink_perf(csdev, data); + default: + return -EINVAL; + } + } + + rc = tmc_enable_etr_sink_sysfs(csdev); + if (rc) { + guard(raw_spinlock_irqsave)(&drvdata->spinlock); + coresight_set_mode(csdev, CS_MODE_DISABLED); } + + return rc; }
static int tmc_disable_etr_sink(struct coresight_device *csdev)
Hi Junhao,
While your patch fixes the problem it introduces imbalance in the way perf vs sysfs modes are handled.
On 11/11/2025 12:21, Junhao He wrote:
From: Junhao He hejunhao3@huawei.com
When trying to run perf and sysfs mode simultaneously, the WARN_ON() in tmc_etr_enable_hw() is triggered sometimes:
WARNING: CPU: 42 PID: 3911571 at drivers/hwtracing/coresight/coresight-tmc-etr.c:1060 tmc_etr_enable_hw+0xc0/0xd8 [coresight_tmc] [..snip..] Call trace: tmc_etr_enable_hw+0xc0/0xd8 [coresight_tmc] (P) tmc_enable_etr_sink+0x11c/0x250 [coresight_tmc] (L) tmc_enable_etr_sink+0x11c/0x250 [coresight_tmc] coresight_enable_path+0x1c8/0x218 [coresight] coresight_enable_sysfs+0xa4/0x228 [coresight] enable_source_store+0x58/0xa8 [coresight] dev_attr_store+0x20/0x40 sysfs_kf_write+0x4c/0x68 kernfs_fop_write_iter+0x120/0x1b8 vfs_write+0x2c8/0x388 ksys_write+0x74/0x108 __arm64_sys_write+0x24/0x38 el0_svc_common.constprop.0+0x64/0x148 do_el0_svc+0x24/0x38 el0_svc+0x3c/0x130 el0t_64_sync_handler+0xc8/0xd0 el0t_64_sync+0x1ac/0x1b0 ---[ end trace 0000000000000000 ]---
Since the sysfs buffer allocation and the hardware enablement is not in the same critical region, it's possible to race with the perf
mode: [sysfs mode] [perf mode] tmc_etr_get_sysfs_buffer() spin_lock(&drvdata->spinlock) [sysfs buffer allocation] spin_unlock(&drvdata->spinlock) spin_lock(&drvdata->spinlock) tmc_etr_enable_hw() drvdata->etr_buf = etr_perf->etr_buf spin_unlock(&drvdata->spinlock) spin_lock(&drvdata->spinlock)
^^ Could we not double check the mode here and break out ?
Something like this :
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index e0d83ee01b77..2097c57d19d0 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -1314,6 +1314,9 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev) if (coresight_get_mode(csdev) == CS_MODE_SYSFS) { csdev->refcnt++; goto out; + } else if (coresight_get_mode(csdev) != CS_MODE_DISABLED) { + ret = -EBUSY; + goto out; }
ret = tmc_etr_enable_hw(drvdata, sysfs_buf);
Suzuki
tmc_etr_enable_hw() WARN_ON(drvdata->etr_buf) // WARN sicne etr_buf initialized at the perf side spin_unlock(&drvdata->spinlock)
A race condition is introduced here, perf always prioritizes execution, and warnings can lead to concerns about potential hidden bugs, such as getting out of sync.
To fix this, configure the tmc-etr mode before invoking enable_etr_perf() or enable_etr_sysfs(), explicitly check if the tmc-etr sink is already enabled in a different mode, and simplily the setup and checks for "mode". To prevent race conditions between mode transitions.
Fixes: 296b01fd106e ("coresight: Refactor out buffer allocation function for ETR") Reported-by: Yicong Yang yangyicong@hisilicon.com Closes: https://lore.kernel.org/linux-arm-kernel/20241202092419.11777-2-yangyicong@h... Signed-off-by: Junhao He hejunhao3@huawei.com Tested-by: Yicong Yang yangyicong@hisilicon.com Signed-off-by: Junhao He hejunhao3@h-partners.com
.../hwtracing/coresight/coresight-tmc-etr.c | 106 +++++++++--------- 1 file changed, 55 insertions(+), 51 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index b07fcdb3fe1a..fe1d5e8a0d2b 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -1263,11 +1263,6 @@ static struct etr_buf *tmc_etr_get_sysfs_buffer(struct coresight_device *csdev) raw_spin_lock_irqsave(&drvdata->spinlock, flags); }
- if (drvdata->reading || coresight_get_mode(csdev) == CS_MODE_PERF) {
ret = -EBUSY;goto out;- }
- /*
- If we don't have a buffer or it doesn't match the requested size,
- use the buffer allocated above. Otherwise reuse the existing buffer.
@@ -1278,7 +1273,6 @@ static struct etr_buf *tmc_etr_get_sysfs_buffer(struct coresight_device *csdev) drvdata->sysfs_buf = new_buf; } -out: raw_spin_unlock_irqrestore(&drvdata->spinlock, flags); /* Free memory outside the spinlock if need be */ @@ -1289,7 +1283,7 @@ static struct etr_buf *tmc_etr_get_sysfs_buffer(struct coresight_device *csdev) static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev) {
- int ret = 0;
- int ret; unsigned long flags; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); struct etr_buf *sysfs_buf = tmc_etr_get_sysfs_buffer(csdev);
@@ -1299,23 +1293,10 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev) raw_spin_lock_irqsave(&drvdata->spinlock, flags);
- /*
* In sysFS mode we can have multiple writers per sink. Since this* sink is already enabled no memory is needed and the HW need not be* touched, even if the buffer size has changed.*/- if (coresight_get_mode(csdev) == CS_MODE_SYSFS) {
csdev->refcnt++;goto out;- }
- ret = tmc_etr_enable_hw(drvdata, sysfs_buf);
- if (!ret) {
coresight_set_mode(csdev, CS_MODE_SYSFS);
- if (!ret) csdev->refcnt++;
- }
-out: raw_spin_unlock_irqrestore(&drvdata->spinlock, flags); if (!ret) @@ -1729,39 +1710,24 @@ static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data) { int rc = 0; pid_t pid;
- unsigned long flags; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); struct perf_output_handle *handle = data; struct etr_perf_buffer *etr_perf = etm_perf_sink_config(handle);
- raw_spin_lock_irqsave(&drvdata->spinlock, flags);
/* Don't use this sink if it is already claimed by sysFS */- if (coresight_get_mode(csdev) == CS_MODE_SYSFS) {
rc = -EBUSY;goto unlock_out;- }
- if (WARN_ON(!etr_perf || !etr_perf->etr_buf)) {
rc = -EINVAL;goto unlock_out;- }
- if (WARN_ON(!etr_perf || !etr_perf->etr_buf))
return -EINVAL;/* Get a handle on the pid of the session owner */ pid = etr_perf->pid; /* Do not proceed if this device is associated with another session */
- if (drvdata->pid != -1 && drvdata->pid != pid) {
rc = -EBUSY;goto unlock_out;- }
- if (drvdata->pid != -1 && drvdata->pid != pid)
return -EBUSY;
- /*
* No HW configuration is needed if the sink is already in* use for this session.*/
- /* The sink is already in use for this session */ if (drvdata->pid == pid) { csdev->refcnt++;
goto unlock_out;
}return rc;rc = tmc_etr_enable_hw(drvdata, etr_perf->etr_buf); @@ -1773,22 +1739,60 @@ static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data) csdev->refcnt++; } -unlock_out:
- raw_spin_unlock_irqrestore(&drvdata->spinlock, flags); return rc; }
static int tmc_enable_etr_sink(struct coresight_device *csdev, enum cs_mode mode, void *data) {
- switch (mode) {
- case CS_MODE_SYSFS:
return tmc_enable_etr_sink_sysfs(csdev);- case CS_MODE_PERF:
return tmc_enable_etr_sink_perf(csdev, data);- default:
return -EINVAL;
- struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
- int rc;
+retry:
- scoped_guard(raw_spinlock_irqsave, &drvdata->spinlock) {
if (coresight_get_mode(csdev) != CS_MODE_DISABLED &&coresight_get_mode(csdev) != mode)return -EBUSY;switch (mode) {case CS_MODE_SYSFS:if (drvdata->reading)return -EBUSY;if (csdev->refcnt) {/* The sink is already enabled via sysfs */csdev->refcnt++;return 0;}/** A sysfs session is currently enabling ETR, preventing* a second sysfs process from repeatedly triggering the* enable procedure.*/if (coresight_get_mode(csdev) == CS_MODE_SYSFS && !csdev->refcnt)goto retry;/** Set ETR to sysfs mode before it is fully enabled, to* prevent race conditions during mode transitions.*/coresight_set_mode(csdev, mode);break;case CS_MODE_PERF:return tmc_enable_etr_sink_perf(csdev, data);default:return -EINVAL;}- }
- rc = tmc_enable_etr_sink_sysfs(csdev);
- if (rc) {
guard(raw_spinlock_irqsave)(&drvdata->spinlock); }coresight_set_mode(csdev, CS_MODE_DISABLED);- return rc; }
static int tmc_disable_etr_sink(struct coresight_device *csdev)
On 2025/11/13 23:02, Suzuki K Poulose wrote:
Hi Junhao,
While your patch fixes the problem it introduces imbalance in the way perf vs sysfs modes are handled.
Hi Suzuki,
Thank you for your comments.
What you mean is that, as long as tmc_etr_enable_hw() has not yet been called (i.e., ETR hasn't been fully enabled), both the perf and sysfs sessions have the opportunity to take ownership of ETR device?
Based on the following sysfs-based ETR enablement sequence, the spinlock is released at two points (denoted as points a and b). At either of these points, perf may acquire the lock and invoke tmc_etr_enable_hw() to complete the ETR initialization, resulting in ETR_mode == PERF. This would interrupt the ongoing sysfs enablement flow and effectively give precedence to perf over sysfs.
mode: [sysfs mode] tmc_enable_etr_sink_sysfs() tmc_etr_get_sysfs_buffer() spin_lock(&drvdata->spinlock) sysfs_buf = READ_ONCE(drvdata->sysfs_buf); if (!sysfs_buf || (sysfs_buf->size != drvdata->size)) spin_unlock(&drvdata->spinlock) <-- point a [sysfs buffer allocation] spin_lock(&drvdata->spinlock) drvdata->sysfs_buf = new_buf; spin_unlock(&drvdata->spinlock) <-- point b spin_lock(&drvdata->spinlock) tmc_etr_enable_hw() spin_unlock(&drvdata->spinlock)
[perf mode] tmc_enable_etr_sink_perf() spin_lock(&drvdata->spinlock) tmc_etr_enable_hw() spin_unlock(&drvdata->spinlock)
After the patch, both the perf and sysfs sessions contend fairly for the lock at the entry point of tmc_enable_etr_sink(). The first to acquire the lock gains exclusive control over the ETR sink for the remainder of the session.
On 11/11/2025 12:21, Junhao He wrote:
From: Junhao He hejunhao3@huawei.com
When trying to run perf and sysfs mode simultaneously, the WARN_ON() in tmc_etr_enable_hw() is triggered sometimes:
WARNING: CPU: 42 PID: 3911571 at drivers/hwtracing/coresight/coresight-tmc-etr.c:1060 tmc_etr_enable_hw+0xc0/0xd8 [coresight_tmc] [..snip..] Call trace: tmc_etr_enable_hw+0xc0/0xd8 [coresight_tmc] (P) tmc_enable_etr_sink+0x11c/0x250 [coresight_tmc] (L) tmc_enable_etr_sink+0x11c/0x250 [coresight_tmc] coresight_enable_path+0x1c8/0x218 [coresight] coresight_enable_sysfs+0xa4/0x228 [coresight] enable_source_store+0x58/0xa8 [coresight] dev_attr_store+0x20/0x40 sysfs_kf_write+0x4c/0x68 kernfs_fop_write_iter+0x120/0x1b8 vfs_write+0x2c8/0x388 ksys_write+0x74/0x108 __arm64_sys_write+0x24/0x38 el0_svc_common.constprop.0+0x64/0x148 do_el0_svc+0x24/0x38 el0_svc+0x3c/0x130 el0t_64_sync_handler+0xc8/0xd0 el0t_64_sync+0x1ac/0x1b0 ---[ end trace 0000000000000000 ]---
Since the sysfs buffer allocation and the hardware enablement is not in the same critical region, it's possible to race with the perf
mode: [sysfs mode] [perf mode] tmc_etr_get_sysfs_buffer() spin_lock(&drvdata->spinlock) [sysfs buffer allocation] spin_unlock(&drvdata->spinlock) spin_lock(&drvdata->spinlock) tmc_etr_enable_hw() drvdata->etr_buf = etr_perf->etr_buf spin_unlock(&drvdata->spinlock) spin_lock(&drvdata->spinlock)
^^ Could we not double check the mode here and break out ?
Something like this :
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index e0d83ee01b77..2097c57d19d0 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -1314,6 +1314,9 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev) if (coresight_get_mode(csdev) == CS_MODE_SYSFS) { csdev->refcnt++; goto out;
} else if (coresight_get_mode(csdev) != CS_MODE_DISABLED) {ret = -EBUSY;goto out; } ret = tmc_etr_enable_hw(drvdata, sysfs_buf);Suzuki
Indeed, this approach addresses only the race condition between the perf and sysfs components. It represents the simplest viable solution.
This patch also resolves the potential null pointer dereference of drvdata->sysfs_buf, which could occur when the buffer size is modified after ETR has already been enabled, followed by enabling a second source device [1].
If the refactored patch introduces additional issues, the implementation may be reworked based on Yicong's original patches [2] ?
[1]https://lore.kernel.org/linux-arm-kernel/20241202092419.11777-2-yangyicong@h... [2]https://lore.kernel.org/linux-arm-kernel/20241202092419.11777-4-yangyicong@h...
Best regards, Junhao.
tmc_etr_enable_hw() WARN_ON(drvdata->etr_buf) // WARN sicne etr_buf initialized at the perf side spin_unlock(&drvdata->spinlock)
A race condition is introduced here, perf always prioritizes execution, and warnings can lead to concerns about potential hidden bugs, such as getting out of sync.
To fix this, configure the tmc-etr mode before invoking enable_etr_perf() or enable_etr_sysfs(), explicitly check if the tmc-etr sink is already enabled in a different mode, and simplily the setup and checks for "mode". To prevent race conditions between mode transitions.
Fixes: 296b01fd106e ("coresight: Refactor out buffer allocation function for ETR") Reported-by: Yicong Yang yangyicong@hisilicon.com Closes: https://lore.kernel.org/linux-arm-kernel/20241202092419.11777-2-yangyicong@h... Signed-off-by: Junhao He hejunhao3@huawei.com Tested-by: Yicong Yang yangyicong@hisilicon.com Signed-off-by: Junhao He hejunhao3@h-partners.com
.../hwtracing/coresight/coresight-tmc-etr.c | 106 +++++++++--------- 1 file changed, 55 insertions(+), 51 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index b07fcdb3fe1a..fe1d5e8a0d2b 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -1263,11 +1263,6 @@ static struct etr_buf *tmc_etr_get_sysfs_buffer(struct coresight_device *csdev) raw_spin_lock_irqsave(&drvdata->spinlock, flags); }
- if (drvdata->reading || coresight_get_mode(csdev) ==
CS_MODE_PERF) {
ret = -EBUSY;goto out;- }
/* * If we don't have a buffer or it doesn't match the requestedsize, * use the buffer allocated above. Otherwise reuse the existing buffer. @@ -1278,7 +1273,6 @@ static struct etr_buf *tmc_etr_get_sysfs_buffer(struct coresight_device *csdev) drvdata->sysfs_buf = new_buf; } -out: raw_spin_unlock_irqrestore(&drvdata->spinlock, flags); /* Free memory outside the spinlock if need be */ @@ -1289,7 +1283,7 @@ static struct etr_buf *tmc_etr_get_sysfs_buffer(struct coresight_device *csdev) static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev) {
- int ret = 0;
- int ret; unsigned long flags; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); struct etr_buf *sysfs_buf = tmc_etr_get_sysfs_buffer(csdev);
@@ -1299,23 +1293,10 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev) raw_spin_lock_irqsave(&drvdata->spinlock, flags);
- /*
* In sysFS mode we can have multiple writers per sink. Since this* sink is already enabled no memory is needed and the HW neednot be
* touched, even if the buffer size has changed.*/- if (coresight_get_mode(csdev) == CS_MODE_SYSFS) {
csdev->refcnt++;goto out;- }
ret = tmc_etr_enable_hw(drvdata, sysfs_buf);- if (!ret) {
coresight_set_mode(csdev, CS_MODE_SYSFS);
- if (!ret) csdev->refcnt++;
- } -out: raw_spin_unlock_irqrestore(&drvdata->spinlock, flags); if (!ret)
@@ -1729,39 +1710,24 @@ static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data) { int rc = 0; pid_t pid;
- unsigned long flags; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); struct perf_output_handle *handle = data; struct etr_perf_buffer *etr_perf = etm_perf_sink_config(handle);
- raw_spin_lock_irqsave(&drvdata->spinlock, flags);
/* Don't use this sink if it is already claimed by sysFS */- if (coresight_get_mode(csdev) == CS_MODE_SYSFS) {
rc = -EBUSY;goto unlock_out;- }
- if (WARN_ON(!etr_perf || !etr_perf->etr_buf)) {
rc = -EINVAL;goto unlock_out;- }
- if (WARN_ON(!etr_perf || !etr_perf->etr_buf))
return -EINVAL; /* Get a handle on the pid of the session owner */ pid = etr_perf->pid; /* Do not proceed if this device is associated with anothersession */
- if (drvdata->pid != -1 && drvdata->pid != pid) {
rc = -EBUSY;goto unlock_out;- }
- if (drvdata->pid != -1 && drvdata->pid != pid)
return -EBUSY;
- /*
* No HW configuration is needed if the sink is already in* use for this session.*/
- /* The sink is already in use for this session */ if (drvdata->pid == pid) { csdev->refcnt++;
goto unlock_out;
return rc; } rc = tmc_etr_enable_hw(drvdata, etr_perf->etr_buf);@@ -1773,22 +1739,60 @@ static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data) csdev->refcnt++; } -unlock_out:
- raw_spin_unlock_irqrestore(&drvdata->spinlock, flags); return rc; } static int tmc_enable_etr_sink(struct coresight_device *csdev, enum cs_mode mode, void *data) {
- switch (mode) {
- case CS_MODE_SYSFS:
return tmc_enable_etr_sink_sysfs(csdev);- case CS_MODE_PERF:
return tmc_enable_etr_sink_perf(csdev, data);- default:
return -EINVAL;
- struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
- int rc;
+retry:
- scoped_guard(raw_spinlock_irqsave, &drvdata->spinlock) {
if (coresight_get_mode(csdev) != CS_MODE_DISABLED &&coresight_get_mode(csdev) != mode)return -EBUSY;switch (mode) {case CS_MODE_SYSFS:if (drvdata->reading)return -EBUSY;if (csdev->refcnt) {/* The sink is already enabled via sysfs */csdev->refcnt++;return 0;}/** A sysfs session is currently enabling ETR, preventing* a second sysfs process from repeatedly triggering the* enable procedure.*/if (coresight_get_mode(csdev) == CS_MODE_SYSFS &&!csdev->refcnt)
goto retry;/** Set ETR to sysfs mode before it is fully enabled, to* prevent race conditions during mode transitions.*/coresight_set_mode(csdev, mode);break;case CS_MODE_PERF:return tmc_enable_etr_sink_perf(csdev, data);default:return -EINVAL;}- }
- rc = tmc_enable_etr_sink_sysfs(csdev);
- if (rc) {
guard(raw_spinlock_irqsave)(&drvdata->spinlock);coresight_set_mode(csdev, CS_MODE_DISABLED); }- return rc; } static int tmc_disable_etr_sink(struct coresight_device *csdev)
.
From: Yicong Yang yangyicong@hisilicon.com
Currently the perf buffer allocation follows the below logic: - if the required AUX buffer size if larger, allocate the buffer with the required size - otherwise allocate the size reference to the sysfs buffer size
This is not useful as we only collect to one AUX data, so just try to allocate the buffer match the AUX buffer size.
Suggested-by: Suzuki K Poulose suzuki.poulose@arm.com Link: https://lore.kernel.org/linux-arm-kernel/df8967cd-2157-46a2-97d9-a1aea883cf6... Signed-off-by: Yicong Yang yangyicong@hisilicon.com Signed-off-by: Junhao He hejunhao3@huawei.com Signed-off-by: Junhao He hejunhao3@h-partners.com --- .../hwtracing/coresight/coresight-tmc-etr.c | 30 ++++++------------- 1 file changed, 9 insertions(+), 21 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index fe1d5e8a0d2b..c2ba004a25fe 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -1327,9 +1327,7 @@ EXPORT_SYMBOL_GPL(tmc_etr_get_buffer);
/* * alloc_etr_buf: Allocate ETR buffer for use by perf. - * The size of the hardware buffer is dependent on the size configured - * via sysfs and the perf ring buffer size. We prefer to allocate the - * largest possible size, scaling down the size by half until it + * Allocate the largest possible size, scaling down the size by half until it * reaches a minimum limit (1M), beyond which we give up. */ static struct etr_buf * @@ -1338,36 +1336,26 @@ alloc_etr_buf(struct tmc_drvdata *drvdata, struct perf_event *event, { int node; struct etr_buf *etr_buf; - unsigned long size; + ssize_t size;
node = (event->cpu == -1) ? NUMA_NO_NODE : cpu_to_node(event->cpu); - /* - * Try to match the perf ring buffer size if it is larger - * than the size requested via sysfs. - */ - if ((nr_pages << PAGE_SHIFT) > drvdata->size) { - etr_buf = tmc_alloc_etr_buf(drvdata, ((ssize_t)nr_pages << PAGE_SHIFT), - 0, node, NULL); - if (!IS_ERR(etr_buf)) - goto done; - } + + /* Use the minimum limit if the required size is smaller */ + size = nr_pages << PAGE_SHIFT; + size = max_t(ssize_t, size, TMC_ETR_PERF_MIN_BUF_SIZE);
/* - * Else switch to configured size for this ETR - * and scale down until we hit the minimum limit. + * Try to allocate the required size for this ETR, if failed scale + * down until we hit the minimum limit. */ - size = drvdata->size; do { etr_buf = tmc_alloc_etr_buf(drvdata, size, 0, node, NULL); if (!IS_ERR(etr_buf)) - goto done; + return etr_buf; size /= 2; } while (size >= TMC_ETR_PERF_MIN_BUF_SIZE);
return ERR_PTR(-ENOMEM); - -done: - return etr_buf; }
static struct etr_buf *