On 21/09/2023 14:29, Junhao He wrote:
The commmit [1] replace "source_ops(csdev)::enable()" with coresight_enable_source(). After this patch, when enable ETM through perf mode, the csdev::enable will be set to true. Then if we to disable the ETM by sysfs mode at the time, the resource will be released incorrectly, kernel will report the following call trace.
perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 & echo 1 > /sys/bus/coresight/devices/ultra_smb0/enable_sink echo 1 > /sys/bus/coresight/devices/etm0/enable_source echo 0 > /sys/bus/coresight/devices/etm0/enable_source Unable to handle kernel NULL pointer dereference at virtual address 00000000000001d0 Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP ... Call trace: etm4_disable+0x54/0x150 [coresight_etm4x] coresight_disable_source+0x6c/0x98 [coresight] coresight_disable+0x74/0x1c0 [coresight] enable_source_store+0x88/0xa0 [coresight] dev_attr_store+0x20/0x40 sysfs_kf_write+0x4c/0x68 kernfs_fop_write_iter+0x120/0x1b8 vfs_write+0x2dc/0x3b0 ksys_write+0x70/0x108 __arm64_sys_write+0x24/0x38 invoke_syscall+0x50/0x128 el0_svc_common.constprop.0+0x104/0x130 do_el0_svc+0x40/0xb8 el0_svc+0x2c/0xb8 el0t_64_sync_handler+0xc0/0xc8 el0t_64_sync+0x1a4/0x1a8 Code: d53cd042 91002000 b9402a81 b8626800 (f940ead5) ---[ end trace 0000000000000000 ]---
The csdev::enable specify the device is currently part of an active path, when enabling source via perf mode, csdev::enable also should be set to true. We can add mode check in etm4_disable() to fix that, if it's done, the sysfs cannot report busy when sysfs to enable ETM that has been enabled by perf.
Struct coresight_device add member of mode to check device busy in coresight_(enable/disable)_source function to fixes handle kernel NULL pointer, and sysfs report busy if perf session is already using the ETM. By the way, inperhaps another cleanup patch may be upload to remove the etmv4_drvdata::mode tmc_drvdata::mode or others.
Hi Junhao,
I definitely think we should do the cleanup at the same time as this change. Otherwise we risk leaving in the other modes which are duplicates of this and could easily cause confusion.
Although I saw there are a couple of modes that are not enum cs_mode, but a different kind of mode. And it's not always obvious because enum cs_mode is sometimes stored as an int...
Test: perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 & cat /sys/bus/coresight/devices/etm2/enable_source 1 echo 1 > /sys/bus/coresight/devices/etm2/enable_source -bash: echo: write error: Device or resource busy echo 0 > /sys/bus/coresight/devices/etm2/enable_source coresight etm2: Someone is already using the tracer cat /sys/bus/coresight/devices/etm2/enable_source 1
I tested the fix and it works, although I have a few comments below on whether we should do something else instead.
[1] "coresight: Enable and disable helper devices adjacent to the path"> Fixes: 6148652807ba ("coresight: Enable and disable helper devices
adjacent to the path")
What was the original behavior before that change? Maybe it didn't have exactly the same error but was it still possible to enable and disable a sink after starting a Perf session? Seems like on the disable the same thing would have happened or something else bad?
Signed-off-by: Junhao He hejunhao3@huawei.com
drivers/hwtracing/coresight/coresight-core.c | 18 ++++++++++++++++-- .../hwtracing/coresight/coresight-etm-perf.c | 6 +++++- drivers/hwtracing/coresight/coresight-priv.h | 3 ++- include/linux/coresight.h | 14 ++++++++------ 4 files changed, 31 insertions(+), 10 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c index 206f17826399..7b485fc2ed19 100644 --- a/drivers/hwtracing/coresight/coresight-core.c +++ b/drivers/hwtracing/coresight/coresight-core.c @@ -388,6 +388,7 @@ int coresight_enable_source(struct coresight_device *csdev, enum cs_mode mode, return ret; } csdev->enable = true;
}csdev->mode = mode;
atomic_inc(&csdev->refcnt); @@ -446,18 +447,26 @@ static void coresight_disable_helpers(struct coresight_device *csdev)
- the device if there are no users left.
- @csdev: The coresight device to disable
*/
- @mode: How the coresight device is being used, perf mode or sysfs mode.
- @data: Opaque data to pass on to the disable function of the source device.
For example in perf mode this is a pointer to the struct perf_event.
- Returns true if the device has been disabled.
-bool coresight_disable_source(struct coresight_device *csdev, void *data) +bool coresight_disable_source(struct coresight_device *csdev, enum cs_mode mode,
void *data)
{
- if (csdev->mode && csdev->mode != mode) {
Can you do "csdev->mode != DISABLED" instead of just the implicit check for non zero. Although probably only "csdev->mode != mode" is enough unless you expect coresight_disable_source() to be called with mode = DISABLED.
dev_err(&csdev->dev, "Someone is already using the tracer\n");
return false;
- }
- if (atomic_dec_return(&csdev->refcnt) == 0) { if (source_ops(csdev)->disable) source_ops(csdev)->disable(csdev, data); coresight_disable_helpers(csdev); csdev->enable = false;
} return !csdev->enable;csdev->mode = CS_MODE_DISABLED;
} @@ -1117,6 +1126,11 @@ int coresight_enable(struct coresight_device *csdev) if (ret) goto out;
- if (csdev->mode == CS_MODE_PERF) {
ret = -EBUSY;
goto out;
- }
I'm struggling to understand why a reference count wouldn't work, rather than blocking on whether there is a Perf session active or not. Because now you can do enable via sysfs first, then open the Perf session, but not the other way around.
Is there a reason for the behavior to not be symmetrical?
And refcounts are already used in the code for the individual devices.
Thanks James