The commmit [1] replace "source_ops(csdev)::enable()" with coresight_enable_source(). After this patch, when enable ETM through perf mode, the csdev::enable will be set to true. Then if we to disable the ETM by sysfs mode at the time, the resource will be released incorrectly, kernel will report the following call trace.
perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 & echo 1 > /sys/bus/coresight/devices/ultra_smb0/enable_sink echo 1 > /sys/bus/coresight/devices/etm0/enable_source echo 0 > /sys/bus/coresight/devices/etm0/enable_source Unable to handle kernel NULL pointer dereference at virtual address 00000000000001d0 Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP ... Call trace: etm4_disable+0x54/0x150 [coresight_etm4x] coresight_disable_source+0x6c/0x98 [coresight] coresight_disable+0x74/0x1c0 [coresight] enable_source_store+0x88/0xa0 [coresight] dev_attr_store+0x20/0x40 sysfs_kf_write+0x4c/0x68 kernfs_fop_write_iter+0x120/0x1b8 vfs_write+0x2dc/0x3b0 ksys_write+0x70/0x108 __arm64_sys_write+0x24/0x38 invoke_syscall+0x50/0x128 el0_svc_common.constprop.0+0x104/0x130 do_el0_svc+0x40/0xb8 el0_svc+0x2c/0xb8 el0t_64_sync_handler+0xc0/0xc8 el0t_64_sync+0x1a4/0x1a8 Code: d53cd042 91002000 b9402a81 b8626800 (f940ead5) ---[ end trace 0000000000000000 ]---
The csdev::enable specify the device is currently part of an active path, when enabling source via perf mode, csdev::enable also should be set to true. We can add mode check in etm4_disable() to fix that, if it's done, the sysfs cannot report busy when sysfs to enable ETM that has been enabled by perf.
Struct coresight_device add member of mode to check device busy in coresight_(enable/disable)_source function to fixes handle kernel NULL pointer, and sysfs report busy if perf session is already using the ETM. By the way, inperhaps another cleanup patch may be upload to remove the etmv4_drvdata::mode tmc_drvdata::mode or others.
Test: perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 & cat /sys/bus/coresight/devices/etm2/enable_source 1 echo 1 > /sys/bus/coresight/devices/etm2/enable_source -bash: echo: write error: Device or resource busy echo 0 > /sys/bus/coresight/devices/etm2/enable_source coresight etm2: Someone is already using the tracer cat /sys/bus/coresight/devices/etm2/enable_source 1
[1] "coresight: Enable and disable helper devices adjacent to the path" Fixes: 6148652807ba ("coresight: Enable and disable helper devices adjacent to the path") Signed-off-by: Junhao He hejunhao3@huawei.com --- drivers/hwtracing/coresight/coresight-core.c | 18 ++++++++++++++++-- .../hwtracing/coresight/coresight-etm-perf.c | 6 +++++- drivers/hwtracing/coresight/coresight-priv.h | 3 ++- include/linux/coresight.h | 14 ++++++++------ 4 files changed, 31 insertions(+), 10 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c index 206f17826399..7b485fc2ed19 100644 --- a/drivers/hwtracing/coresight/coresight-core.c +++ b/drivers/hwtracing/coresight/coresight-core.c @@ -388,6 +388,7 @@ int coresight_enable_source(struct coresight_device *csdev, enum cs_mode mode, return ret; } csdev->enable = true; + csdev->mode = mode; }
atomic_inc(&csdev->refcnt); @@ -446,18 +447,26 @@ static void coresight_disable_helpers(struct coresight_device *csdev) * the device if there are no users left. * * @csdev: The coresight device to disable + * @mode: How the coresight device is being used, perf mode or sysfs mode. * @data: Opaque data to pass on to the disable function of the source device. * For example in perf mode this is a pointer to the struct perf_event. * * Returns true if the device has been disabled. */ -bool coresight_disable_source(struct coresight_device *csdev, void *data) +bool coresight_disable_source(struct coresight_device *csdev, enum cs_mode mode, + void *data) { + if (csdev->mode && csdev->mode != mode) { + dev_err(&csdev->dev, "Someone is already using the tracer\n"); + return false; + } + if (atomic_dec_return(&csdev->refcnt) == 0) { if (source_ops(csdev)->disable) source_ops(csdev)->disable(csdev, data); coresight_disable_helpers(csdev); csdev->enable = false; + csdev->mode = CS_MODE_DISABLED; } return !csdev->enable; } @@ -1117,6 +1126,11 @@ int coresight_enable(struct coresight_device *csdev) if (ret) goto out;
+ if (csdev->mode == CS_MODE_PERF) { + ret = -EBUSY; + goto out; + } + if (csdev->enable) { /* * There could be multiple applications driving the software @@ -1202,7 +1216,7 @@ void coresight_disable(struct coresight_device *csdev) if (ret) goto out;
- if (!csdev->enable || !coresight_disable_source(csdev, NULL)) + if (!csdev->enable || !coresight_disable_source(csdev, CS_MODE_SYSFS, NULL)) goto out;
switch (csdev->subtype.source_subtype) { diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index 5ca6278baff4..015aea9f2b22 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -459,6 +459,10 @@ static void etm_event_start(struct perf_event *event, int flags) if (WARN_ON(ctxt->event_data)) goto fail;
+ /* Sysfs is already using the tracer */ + if (csdev->mode == CS_MODE_SYSFS) + goto fail; + /* * Deal with the ring buffer API and get a handle on the * session's information. @@ -587,7 +591,7 @@ static void etm_event_stop(struct perf_event *event, int mode) return;
/* stop tracer */ - coresight_disable_source(csdev, event); + coresight_disable_source(csdev, CS_MODE_PERF, event);
/* tell the core */ event->hw.state = PERF_HES_STOPPED; diff --git a/drivers/hwtracing/coresight/coresight-priv.h b/drivers/hwtracing/coresight/coresight-priv.h index 767076e07970..1f8b512ba5ac 100644 --- a/drivers/hwtracing/coresight/coresight-priv.h +++ b/drivers/hwtracing/coresight/coresight-priv.h @@ -233,6 +233,7 @@ void coresight_set_percpu_sink(int cpu, struct coresight_device *csdev); struct coresight_device *coresight_get_percpu_sink(int cpu); int coresight_enable_source(struct coresight_device *csdev, enum cs_mode mode, void *data); -bool coresight_disable_source(struct coresight_device *csdev, void *data); +bool coresight_disable_source(struct coresight_device *csdev, enum cs_mode mode, + void *data);
#endif diff --git a/include/linux/coresight.h b/include/linux/coresight.h index a269fffaf991..acbbedb9abf3 100644 --- a/include/linux/coresight.h +++ b/include/linux/coresight.h @@ -216,6 +216,12 @@ struct coresight_sysfs_link { const char *target_name; };
+enum cs_mode { + CS_MODE_DISABLED, + CS_MODE_SYSFS, + CS_MODE_PERF, +}; + /** * struct coresight_device - representation of a device as used by the framework * @pdata: Platform data with device connections associated to this device. @@ -228,6 +234,7 @@ struct coresight_sysfs_link { * @refcnt: keep track of what is in use. * @orphan: true if the component has connections that haven't been linked. * @enable: 'true' if component is currently part of an active path. + * @mode: This tracer's mode, i.e sysFS, Perf or disabled. * @activated: 'true' only if a _sink_ has been activated. A sink can be * activated but not yet enabled. Enabling for a _sink_ * happens when a source has been selected and a path is enabled @@ -252,6 +259,7 @@ struct coresight_device { atomic_t refcnt; bool orphan; bool enable; /* true only if configured as part of a path */ + enum cs_mode mode; /* sink specific fields */ bool activated; /* true only if a sink is part of a path */ struct dev_ext_attribute *ea; @@ -290,12 +298,6 @@ static struct coresight_dev_list (var) = { \
#define to_coresight_device(d) container_of(d, struct coresight_device, dev)
-enum cs_mode { - CS_MODE_DISABLED, - CS_MODE_SYSFS, - CS_MODE_PERF, -}; - #define source_ops(csdev) csdev->ops->source_ops #define sink_ops(csdev) csdev->ops->sink_ops #define link_ops(csdev) csdev->ops->link_ops
On 21/09/2023 14:29, Junhao He wrote:
The commmit [1] replace "source_ops(csdev)::enable()" with coresight_enable_source(). After this patch, when enable ETM through perf mode, the csdev::enable will be set to true. Then if we to disable the ETM by sysfs mode at the time, the resource will be released incorrectly, kernel will report the following call trace.
perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 & echo 1 > /sys/bus/coresight/devices/ultra_smb0/enable_sink echo 1 > /sys/bus/coresight/devices/etm0/enable_source echo 0 > /sys/bus/coresight/devices/etm0/enable_source Unable to handle kernel NULL pointer dereference at virtual address 00000000000001d0 Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP ... Call trace: etm4_disable+0x54/0x150 [coresight_etm4x] coresight_disable_source+0x6c/0x98 [coresight] coresight_disable+0x74/0x1c0 [coresight] enable_source_store+0x88/0xa0 [coresight] dev_attr_store+0x20/0x40 sysfs_kf_write+0x4c/0x68 kernfs_fop_write_iter+0x120/0x1b8 vfs_write+0x2dc/0x3b0 ksys_write+0x70/0x108 __arm64_sys_write+0x24/0x38 invoke_syscall+0x50/0x128 el0_svc_common.constprop.0+0x104/0x130 do_el0_svc+0x40/0xb8 el0_svc+0x2c/0xb8 el0t_64_sync_handler+0xc0/0xc8 el0t_64_sync+0x1a4/0x1a8 Code: d53cd042 91002000 b9402a81 b8626800 (f940ead5) ---[ end trace 0000000000000000 ]---
The csdev::enable specify the device is currently part of an active path, when enabling source via perf mode, csdev::enable also should be set to true. We can add mode check in etm4_disable() to fix that, if it's done, the sysfs cannot report busy when sysfs to enable ETM that has been enabled by perf.
Struct coresight_device add member of mode to check device busy in coresight_(enable/disable)_source function to fixes handle kernel NULL pointer, and sysfs report busy if perf session is already using the ETM. By the way, inperhaps another cleanup patch may be upload to remove the etmv4_drvdata::mode tmc_drvdata::mode or others.
Hi Junhao,
I definitely think we should do the cleanup at the same time as this change. Otherwise we risk leaving in the other modes which are duplicates of this and could easily cause confusion.
Although I saw there are a couple of modes that are not enum cs_mode, but a different kind of mode. And it's not always obvious because enum cs_mode is sometimes stored as an int...
Test: perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 & cat /sys/bus/coresight/devices/etm2/enable_source 1 echo 1 > /sys/bus/coresight/devices/etm2/enable_source -bash: echo: write error: Device or resource busy echo 0 > /sys/bus/coresight/devices/etm2/enable_source coresight etm2: Someone is already using the tracer cat /sys/bus/coresight/devices/etm2/enable_source 1
I tested the fix and it works, although I have a few comments below on whether we should do something else instead.
[1] "coresight: Enable and disable helper devices adjacent to the path"> Fixes: 6148652807ba ("coresight: Enable and disable helper devices
adjacent to the path")
What was the original behavior before that change? Maybe it didn't have exactly the same error but was it still possible to enable and disable a sink after starting a Perf session? Seems like on the disable the same thing would have happened or something else bad?
Signed-off-by: Junhao He hejunhao3@huawei.com
drivers/hwtracing/coresight/coresight-core.c | 18 ++++++++++++++++-- .../hwtracing/coresight/coresight-etm-perf.c | 6 +++++- drivers/hwtracing/coresight/coresight-priv.h | 3 ++- include/linux/coresight.h | 14 ++++++++------ 4 files changed, 31 insertions(+), 10 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c index 206f17826399..7b485fc2ed19 100644 --- a/drivers/hwtracing/coresight/coresight-core.c +++ b/drivers/hwtracing/coresight/coresight-core.c @@ -388,6 +388,7 @@ int coresight_enable_source(struct coresight_device *csdev, enum cs_mode mode, return ret; } csdev->enable = true;
}csdev->mode = mode;
atomic_inc(&csdev->refcnt); @@ -446,18 +447,26 @@ static void coresight_disable_helpers(struct coresight_device *csdev)
- the device if there are no users left.
- @csdev: The coresight device to disable
*/
- @mode: How the coresight device is being used, perf mode or sysfs mode.
- @data: Opaque data to pass on to the disable function of the source device.
For example in perf mode this is a pointer to the struct perf_event.
- Returns true if the device has been disabled.
-bool coresight_disable_source(struct coresight_device *csdev, void *data) +bool coresight_disable_source(struct coresight_device *csdev, enum cs_mode mode,
void *data)
{
- if (csdev->mode && csdev->mode != mode) {
Can you do "csdev->mode != DISABLED" instead of just the implicit check for non zero. Although probably only "csdev->mode != mode" is enough unless you expect coresight_disable_source() to be called with mode = DISABLED.
dev_err(&csdev->dev, "Someone is already using the tracer\n");
return false;
- }
- if (atomic_dec_return(&csdev->refcnt) == 0) { if (source_ops(csdev)->disable) source_ops(csdev)->disable(csdev, data); coresight_disable_helpers(csdev); csdev->enable = false;
} return !csdev->enable;csdev->mode = CS_MODE_DISABLED;
} @@ -1117,6 +1126,11 @@ int coresight_enable(struct coresight_device *csdev) if (ret) goto out;
- if (csdev->mode == CS_MODE_PERF) {
ret = -EBUSY;
goto out;
- }
I'm struggling to understand why a reference count wouldn't work, rather than blocking on whether there is a Perf session active or not. Because now you can do enable via sysfs first, then open the Perf session, but not the other way around.
Is there a reason for the behavior to not be symmetrical?
And refcounts are already used in the code for the individual devices.
Thanks James
On 21/09/2023 16:36, James Clark wrote:
On 21/09/2023 14:29, Junhao He wrote:
The commmit [1] replace "source_ops(csdev)::enable()" with coresight_enable_source(). After this patch, when enable ETM through perf mode, the csdev::enable will be set to true. Then if we to disable the ETM by sysfs mode at the time, the resource will be released incorrectly, kernel will report the following call trace.
perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 & echo 1 > /sys/bus/coresight/devices/ultra_smb0/enable_sink echo 1 > /sys/bus/coresight/devices/etm0/enable_source echo 0 > /sys/bus/coresight/devices/etm0/enable_source Unable to handle kernel NULL pointer dereference at virtual address 00000000000001d0 Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP ... Call trace: etm4_disable+0x54/0x150 [coresight_etm4x] coresight_disable_source+0x6c/0x98 [coresight] coresight_disable+0x74/0x1c0 [coresight] enable_source_store+0x88/0xa0 [coresight] dev_attr_store+0x20/0x40 sysfs_kf_write+0x4c/0x68 kernfs_fop_write_iter+0x120/0x1b8 vfs_write+0x2dc/0x3b0 ksys_write+0x70/0x108 __arm64_sys_write+0x24/0x38 invoke_syscall+0x50/0x128 el0_svc_common.constprop.0+0x104/0x130 do_el0_svc+0x40/0xb8 el0_svc+0x2c/0xb8 el0t_64_sync_handler+0xc0/0xc8 el0t_64_sync+0x1a4/0x1a8 Code: d53cd042 91002000 b9402a81 b8626800 (f940ead5) ---[ end trace 0000000000000000 ]---
The csdev::enable specify the device is currently part of an active path, when enabling source via perf mode, csdev::enable also should be set to true. We can add mode check in etm4_disable() to fix that, if it's done, the sysfs cannot report busy when sysfs to enable ETM that has been enabled by perf.
Struct coresight_device add member of mode to check device busy in coresight_(enable/disable)_source function to fixes handle kernel NULL pointer, and sysfs report busy if perf session is already using the ETM. By the way, inperhaps another cleanup patch may be upload to remove the etmv4_drvdata::mode tmc_drvdata::mode or others.
Hi Junhao,
I definitely think we should do the cleanup at the same time as this change. Otherwise we risk leaving in the other modes which are duplicates of this and could easily cause confusion.
Although I saw there are a couple of modes that are not enum cs_mode, but a different kind of mode. And it's not always obvious because enum cs_mode is sometimes stored as an int...
Test: perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 & cat /sys/bus/coresight/devices/etm2/enable_source 1 echo 1 > /sys/bus/coresight/devices/etm2/enable_source -bash: echo: write error: Device or resource busy echo 0 > /sys/bus/coresight/devices/etm2/enable_source coresight etm2: Someone is already using the tracer cat /sys/bus/coresight/devices/etm2/enable_source 1
I tested the fix and it works, although I have a few comments below on whether we should do something else instead.
[1] "coresight: Enable and disable helper devices adjacent to the path"> Fixes: 6148652807ba ("coresight: Enable and disable helper devices
adjacent to the path")
What was the original behavior before that change? Maybe it didn't have exactly the same error but was it still possible to enable and disable a sink after starting a Perf session? Seems like on the disable the same thing would have happened or something else bad?
Before that, the perf was using the source_ops->enable/disable directly. And thus the csdev->enable was not set, forcing the sysfs mode to try enabling it via the source_ops->enable, which would reject the request.
But with the switch to using enable_source, the sysfs mode request bails out early assuming that the "csdev->enable == true" set by the perf mode, to indicate the sysfs mode and goes ahead.
Ideally, we should rely on the backend driver to do the check ?
Does the following patch fix problems for you ?
diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c index 9fabe00a40d6..10e3609d8290 100644 --- a/drivers/hwtracing/coresight/coresight-core.c +++ b/drivers/hwtracing/coresight/coresight-core.c @@ -381,15 +381,12 @@ int coresight_enable_source(struct coresight_device *csdev, enum cs_mode mode, { int ret;
- if (!csdev->enable) { - if (source_ops(csdev)->enable) { - ret = source_ops(csdev)->enable(csdev, data, mode); - if (ret) - return ret; - } + if (source_ops(csdev)->enable) { + ret = source_ops(csdev)->enable(csdev, data, mode); + if (ret) + return ret; csdev->enable = true; } - atomic_inc(&csdev->refcnt);
return 0; @@ -453,7 +450,7 @@ static void coresight_disable_helpers(struct coresight_device *csdev) */ bool coresight_disable_source(struct coresight_device *csdev, void *data) { - if (atomic_dec_return(&csdev->refcnt) == 0) { + if (csdev->enable && atomic_dec_return(&csdev->refcnt) == 0) { if (source_ops(csdev)->disable) source_ops(csdev)->disable(csdev, data); coresight_disable_helpers(csdev); @@ -1202,7 +1199,7 @@ void coresight_disable(struct coresight_device *csdev) if (ret) goto out;
- if (!csdev->enable || !coresight_disable_source(csdev, NULL)) + if (!coresight_disable_source(csdev, NULL)) goto out;
switch (csdev->subtype.source_subtype) {
Suzuki
Hi, Suzuki
On 2023/9/22 17:20, Suzuki K Poulose wrote:
On 21/09/2023 16:36, James Clark wrote:
On 21/09/2023 14:29, Junhao He wrote:
The commmit [1] replace "source_ops(csdev)::enable()" with coresight_enable_source(). After this patch, when enable ETM through perf mode, the csdev::enable will be set to true. Then if we to disable the ETM by sysfs mode at the time, the resource will be released incorrectly, kernel will report the following call trace.
perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 & echo 1 > /sys/bus/coresight/devices/ultra_smb0/enable_sink echo 1 > /sys/bus/coresight/devices/etm0/enable_source echo 0 > /sys/bus/coresight/devices/etm0/enable_source Unable to handle kernel NULL pointer dereference at virtual address 00000000000001d0 Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP ... Call trace: etm4_disable+0x54/0x150 [coresight_etm4x] coresight_disable_source+0x6c/0x98 [coresight] coresight_disable+0x74/0x1c0 [coresight] enable_source_store+0x88/0xa0 [coresight] dev_attr_store+0x20/0x40 sysfs_kf_write+0x4c/0x68 kernfs_fop_write_iter+0x120/0x1b8 vfs_write+0x2dc/0x3b0 ksys_write+0x70/0x108 __arm64_sys_write+0x24/0x38 invoke_syscall+0x50/0x128 el0_svc_common.constprop.0+0x104/0x130 do_el0_svc+0x40/0xb8 el0_svc+0x2c/0xb8 el0t_64_sync_handler+0xc0/0xc8 el0t_64_sync+0x1a4/0x1a8 Code: d53cd042 91002000 b9402a81 b8626800 (f940ead5) ---[ end trace 0000000000000000 ]---
The csdev::enable specify the device is currently part of an active path, when enabling source via perf mode, csdev::enable also should be set to true. We can add mode check in etm4_disable() to fix that, if it's done, the sysfs cannot report busy when sysfs to enable ETM that has been enabled by perf.
Struct coresight_device add member of mode to check device busy in coresight_(enable/disable)_source function to fixes handle kernel NULL pointer, and sysfs report busy if perf session is already using the ETM. By the way, inperhaps another cleanup patch may be upload to remove the etmv4_drvdata::mode tmc_drvdata::mode or others.
Hi Junhao,
I definitely think we should do the cleanup at the same time as this change. Otherwise we risk leaving in the other modes which are duplicates of this and could easily cause confusion.
Although I saw there are a couple of modes that are not enum cs_mode, but a different kind of mode. And it's not always obvious because enum cs_mode is sometimes stored as an int...
Test: perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 & cat /sys/bus/coresight/devices/etm2/enable_source 1 echo 1 > /sys/bus/coresight/devices/etm2/enable_source -bash: echo: write error: Device or resource busy echo 0 > /sys/bus/coresight/devices/etm2/enable_source coresight etm2: Someone is already using the tracer cat /sys/bus/coresight/devices/etm2/enable_source 1
I tested the fix and it works, although I have a few comments below on whether we should do something else instead.
[1] "coresight: Enable and disable helper devices adjacent to the path"> Fixes: 6148652807ba ("coresight: Enable and disable helper devices
adjacent to the path")
What was the original behavior before that change? Maybe it didn't have exactly the same error but was it still possible to enable and disable a sink after starting a Perf session? Seems like on the disable the same thing would have happened or something else bad?
Before that, the perf was using the source_ops->enable/disable directly. And thus the csdev->enable was not set, forcing the sysfs mode to try enabling it via the source_ops->enable, which would reject the request.
But with the switch to using enable_source, the sysfs mode request bails out early assuming that the "csdev->enable == true" set by the perf mode, to indicate the sysfs mode and goes ahead.
Ideally, we should rely on the backend driver to do the check ?
As i said above. We can add mode check in etm4_disable() to fix that. But I didn't do this because I wanted sysfs to report a busy log. If the logs are unimportant, we do not need to add csdev->mode and can check the mode in etm4_disable().
Does the following patch fix problems for you ?
I tested on my platform and the problem still exists. If the ETM has been used by the perf session, coresight_disable_source() still responds to the disable csdev by sysfs.
# perf record -e /cs_etm/@ultra_smb0/ -C 22 -- sleep 30 & [2] 11157 # echo 1 > /sys/bus/coresight/devices/etm22/enable_source # echo 0 > /sys/bus/coresight/devices/etm22/enable_source [ 5308.708990] Unable to handle kernel NULL pointer dereference at virtual address 00000000000001d0 [ 5308.834767] CPU: 2 PID: 11058 Comm: bash Kdump: loaded Tainted: G O 6.5.0-rc4+ #1 [ 5308.843894] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDD, BIOS 2280-V2 CS V5.B221.01 12/09/2021 [ 5308.854030] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 5308.861227] pc : etm4_disable+0x54/0x150 [coresight_etm4x] [ 5308.866956] lr : coresight_disable_source+0x5c/0xb8 [coresight]
Best regards, Junhao.
diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c index 9fabe00a40d6..10e3609d8290 100644 --- a/drivers/hwtracing/coresight/coresight-core.c +++ b/drivers/hwtracing/coresight/coresight-core.c @@ -381,15 +381,12 @@ int coresight_enable_source(struct coresight_device *csdev, enum cs_mode mode, { int ret;
- if (!csdev->enable) {
if (source_ops(csdev)->enable) {
ret = source_ops(csdev)->enable(csdev, data, mode);
if (ret)
return ret;
}
- if (source_ops(csdev)->enable) {
ret = source_ops(csdev)->enable(csdev, data, mode);
if (ret)
}return ret; csdev->enable = true;
atomic_inc(&csdev->refcnt);
return 0;
@@ -453,7 +450,7 @@ static void coresight_disable_helpers(struct coresight_device *csdev) */ bool coresight_disable_source(struct coresight_device *csdev, void *data) {
- if (atomic_dec_return(&csdev->refcnt) == 0) {
- if (csdev->enable && atomic_dec_return(&csdev->refcnt) == 0) { if (source_ops(csdev)->disable) source_ops(csdev)->disable(csdev, data); coresight_disable_helpers(csdev);
@@ -1202,7 +1199,7 @@ void coresight_disable(struct coresight_device *csdev) if (ret) goto out;
- if (!csdev->enable || !coresight_disable_source(csdev, NULL))
if (!coresight_disable_source(csdev, NULL)) goto out;
switch (csdev->subtype.source_subtype) {
Suzuki
CoreSight mailing list -- coresight@lists.linaro.org To unsubscribe send an email to coresight-leave@lists.linaro.org
.
Hello
On 26/09/2023 12:43, hejunhao wrote:
Hi, Suzuki
On 2023/9/22 17:20, Suzuki K Poulose wrote:
On 21/09/2023 16:36, James Clark wrote:
On 21/09/2023 14:29, Junhao He wrote:
The commmit [1] replace "source_ops(csdev)::enable()" with coresight_enable_source(). After this patch, when enable ETM through perf mode, the csdev::enable will be set to true. Then if we to disable the ETM by sysfs mode at the time, the resource will be released incorrectly, kernel will report the following call trace.
perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 & echo 1 > /sys/bus/coresight/devices/ultra_smb0/enable_sink echo 1 > /sys/bus/coresight/devices/etm0/enable_source echo 0 > /sys/bus/coresight/devices/etm0/enable_source Unable to handle kernel NULL pointer dereference at virtual address 00000000000001d0 Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP ... Call trace: etm4_disable+0x54/0x150 [coresight_etm4x] coresight_disable_source+0x6c/0x98 [coresight] coresight_disable+0x74/0x1c0 [coresight] enable_source_store+0x88/0xa0 [coresight] dev_attr_store+0x20/0x40 sysfs_kf_write+0x4c/0x68 kernfs_fop_write_iter+0x120/0x1b8 vfs_write+0x2dc/0x3b0 ksys_write+0x70/0x108 __arm64_sys_write+0x24/0x38 invoke_syscall+0x50/0x128 el0_svc_common.constprop.0+0x104/0x130 do_el0_svc+0x40/0xb8 el0_svc+0x2c/0xb8 el0t_64_sync_handler+0xc0/0xc8 el0t_64_sync+0x1a4/0x1a8 Code: d53cd042 91002000 b9402a81 b8626800 (f940ead5) ---[ end trace 0000000000000000 ]---
The csdev::enable specify the device is currently part of an active path, when enabling source via perf mode, csdev::enable also should be set to true. We can add mode check in etm4_disable() to fix that, if it's done, the sysfs cannot report busy when sysfs to enable ETM that has been enabled by perf.
Struct coresight_device add member of mode to check device busy in coresight_(enable/disable)_source function to fixes handle kernel NULL pointer, and sysfs report busy if perf session is already using the ETM. By the way, inperhaps another cleanup patch may be upload to remove the etmv4_drvdata::mode tmc_drvdata::mode or others.
Hi Junhao,
I definitely think we should do the cleanup at the same time as this change. Otherwise we risk leaving in the other modes which are duplicates of this and could easily cause confusion.
Although I saw there are a couple of modes that are not enum cs_mode, but a different kind of mode. And it's not always obvious because enum cs_mode is sometimes stored as an int...
Test: perf record -e cs_etm/@ultra_smb0/ -C 2 sleep 30 & cat /sys/bus/coresight/devices/etm2/enable_source 1 echo 1 > /sys/bus/coresight/devices/etm2/enable_source -bash: echo: write error: Device or resource busy echo 0 > /sys/bus/coresight/devices/etm2/enable_source coresight etm2: Someone is already using the tracer cat /sys/bus/coresight/devices/etm2/enable_source 1
I tested the fix and it works, although I have a few comments below on whether we should do something else instead.
[1] "coresight: Enable and disable helper devices adjacent to the path"> Fixes: 6148652807ba ("coresight: Enable and disable helper devices
adjacent to the path")
What was the original behavior before that change? Maybe it didn't have exactly the same error but was it still possible to enable and disable a sink after starting a Perf session? Seems like on the disable the same thing would have happened or something else bad?
Before that, the perf was using the source_ops->enable/disable directly. And thus the csdev->enable was not set, forcing the sysfs mode to try enabling it via the source_ops->enable, which would reject the request.
But with the switch to using enable_source, the sysfs mode request bails out early assuming that the "csdev->enable == true" set by the perf mode, to indicate the sysfs mode and goes ahead.
Ideally, we should rely on the backend driver to do the check ?
As i said above. We can add mode check in etm4_disable() to fix that. But I didn't do this because I wanted sysfs to report a busy log. If the logs are unimportant, we do not need to add csdev->mode and can check the mode in etm4_disable().
Does the following patch fix problems for you ?
I tested on my platform and the problem still exists. If the ETM has been used by the perf session, coresight_disable_source() still responds to the disable csdev by sysfs.
# perf record -e /cs_etm/@ultra_smb0/ -C 22 -- sleep 30 & [2] 11157 # echo 1 > /sys/bus/coresight/devices/etm22/enable_source # echo 0 > /sys/bus/coresight/devices/etm22/enable_source [ 5308.708990] Unable to handle kernel NULL pointer dereference at virtual address 00000000000001d0 [ 5308.834767] CPU: 2 PID: 11058 Comm: bash Kdump: loaded Tainted: G O 6.5.0-rc4+ #1 [ 5308.843894] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDD, BIOS 2280-V2 CS V5.B221.01 12/09/2021 [ 5308.854030] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 5308.861227] pc : etm4_disable+0x54/0x150 [coresight_etm4x] [ 5308.866956] lr : coresight_disable_source+0x5c/0xb8 [coresight]
Best regards, Junhao.
diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c index 9fabe00a40d6..10e3609d8290 100644 --- a/drivers/hwtracing/coresight/coresight-core.c +++ b/drivers/hwtracing/coresight/coresight-core.c @@ -381,15 +381,12 @@ int coresight_enable_source(struct coresight_device *csdev, enum cs_mode mode, { int ret;
- if (!csdev->enable) { - if (source_ops(csdev)->enable) { - ret = source_ops(csdev)->enable(csdev, data, mode); - if (ret) - return ret; - } + if (source_ops(csdev)->enable) { + ret = source_ops(csdev)->enable(csdev, data, mode); + if (ret) + return ret; csdev->enable = true; }
atomic_inc(&csdev->refcnt);
return 0; @@ -453,7 +450,7 @@ static void coresight_disable_helpers(struct coresight_device *csdev) */ bool coresight_disable_source(struct coresight_device *csdev, void *data) { - if (atomic_dec_return(&csdev->refcnt) == 0) { + if (csdev->enable && atomic_dec_return(&csdev->refcnt) == 0) { if (source_ops(csdev)->disable) source_ops(csdev)->disable(csdev, data); coresight_disable_helpers(csdev); @@ -1202,7 +1199,7 @@ void coresight_disable(struct coresight_device *csdev) if (ret) goto out;
- if (!csdev->enable || !coresight_disable_source(csdev, NULL)) + if (!coresight_disable_source(csdev, NULL)) goto out;
switch (csdev->subtype.source_subtype) {
I am concerned about the "mode" changes without any synchronization. i.e. earlier, coresight_enable_source() was called under coresight_mutex from sysfs mode. But the perf code doesn't use any of that. So, it is still possible to race and overwrite the "mode".
I think we need to take a closer look at the locking to make sure this is clear.
Suzuki