This series adds thread-stack and synthesized callchain support for Arm
CoreSight, which comes from older series [1] but heavily rewritten.
CS ETM previously kept last-branch state in a per-trace-queue buffer.
That effectively makes the state per CPU, while the call/return history
belongs to a thread. This series moves branch tracking to the common
thread-stack code.
The series records CoreSight branches with thread_stack__event(), uses
thread_stack__br_sample() for last branch entries, flushes thread stacks
after decoder resets.
A decoder reset between AUX trace buffers is treated as a global trace
discontinuity, so all thread stacks are flushed, so avoids carrying
stale call/return history across a trace discontinuity.
One limitation remains for instructions emulated by the kernel. In that
case the exception return address may not match the return address
stored in the thread stack, because after exception return can be one
instruction ahead. The stack can still recover when a later return
matches an upper caller. Given emulated instructions are not the common
target for performance callchain analysis. Supporting this would require
extending the common thread-stack path to accept both the real target
address and an adjusted address for stack matching, so this series
leaves that extra complexity out.
The series has been tested on Orion6 board:
perf test 150 -vvv
150: Check Arm CoreSight synthesized callchain:
--- start ---
test child forked, pid 13528
Test callchain push: PASS
Test callchain pop: PASS
---- end(0) ----
150: Check Arm CoreSight synthesized callchain : Ok
perf script --itrace=g16i10il64
callchain_test 17468 [005] 1031003.229943: 10 instructions:
aaaac32507c4 main+0x8 (/home/kernel/leoy/test_cs_callchain/callchain_test)
ffff90bd225c __libc_start_call_main+0x7c (/usr/lib/aarch64-linux-gnu/libc.so.6)
ffff90bd233c call_init+0x9c (inlined)
ffff90bd233c __libc_start_main_impl+0x9c (inlined)
aaaac3250670 _start+0x30 (/home/kernel/leoy/test_cs_callchain/callchain_test)
callchain_test 17468 [005] 1031003.229943: 10 instructions:
aaaac3250774 do_svc+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
aaaac3250798 print+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
aaaac32507b0 foo+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
aaaac32507c8 main+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
ffff90bd225c __libc_start_call_main+0x7c (/usr/lib/aarch64-linux-gnu/libc.so.6)
ffff90bd233c call_init+0x9c (inlined)
ffff90bd233c __libc_start_main_impl+0x9c (inlined)
aaaac3250670 _start+0x30 (/home/kernel/leoy/test_cs_callchain/callchain_test)
callchain_test 17468 [005] 1031003.229944: 10 instructions:
ffff800080010c20 vectors+0x420 ([kernel.kallsyms])
aaaac3250784 do_svc+0x1c (/home/kernel/leoy/test_cs_callchain/callchain_test)
aaaac3250798 print+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
aaaac32507b0 foo+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
aaaac32507c8 main+0xc (/home/kernel/leoy/test_cs_callchain/callchain_test)
ffff90bd225c __libc_start_call_main+0x7c (/usr/lib/aarch64-linux-gnu/libc.so.6)
ffff90bd233c call_init+0x9c (inlined)
ffff90bd233c __libc_start_main_impl+0x9c (inlined)
aaaac3250670 _start+0x30 (/home/kernel/leoy/test_cs_callchain/callchain_test)
Note, the test fails on Juno board which is caused by many discontinuity
packets (mainly caused by NO_SYNC elem). This is likely caused by the
FIFO overflow on the path.
[1] https://lore.kernel.org/linux-arm-kernel/20200220052701.7754-1-leo.yan@lina…
Signed-off-by: Leo Yan <leo.yan(a)arm.com>
---
Leo Yan (8):
perf cs-etm: Decode ETE exception packets
perf cs-etm: Refactor instruction size handling
perf cs-etm: Use thread-stack for last branch entries
perf cs-etm: Flush thread stacks after decoder reset
perf cs-etm: Support call indentation
perf cs-etm: Filter synthesized branch samples
perf cs-etm: Synthesize callchains for instruction samples
perf test: Add Arm CoreSight callchain test
.../tests/shell/test_arm_coresight_callchain.sh | 235 ++++++++++++++++
tools/perf/util/cs-etm.c | 309 ++++++++++++---------
2 files changed, 408 insertions(+), 136 deletions(-)
---
base-commit: bd2a5be1fe731bc7548205dd148db75f1d588da2
change-id: 20260521-b4-arm_cs_callchain_support_v1-2c2a70719bcc
Best regards,
--
Leo Yan <leo.yan(a)arm.com>
Fix thread tracking when decoding Coresight trace and add a new test for
it.
The new test is added as a Perf test workload instead of a custom binary
with its own build system, but this requires a new feature in Perf test
to pass in control pipes which can enable and disable events. This
scopes the recording to just the workload and helps to reduce the amount
of data recorded in tracing tests.
With this new feature we can re-write all of the Coresight tests to make
use of it and remove the remaining binaries which fixes the following
issues:
* They didn't work in out of source builds
* A lot of the tests unnecessarily required root and didn't skip
without it
* They were mainly qualitative tests which didn't look for specific
behavior
Most importantly, the long build and runtime has been reduced. On a
Radxa Orion O6, unroll_loop_thread.c took 37s to compile which is longer
than the entire Perf build. Now the build time is negligible and the
before and after test runtimes for all the Coresight tests are:
| N1SDP | Orion O6
-----------------------------------
Before | 4m 0s | 14m 49s
After | 26s | 56s
-----------------------------------
Signed-off-by: James Clark <james.clark(a)linaro.org>
---
Changes in v2:
- Add --workload-ctl option to Perf test
- Re-write all the Coresight tests and speed them up
- Pass packet to memory access function so frontend can use either the
previous or current packet's EL
- Link to v1: https://lore.kernel.org/r/20260526-james-cs-context-tracking-fix-v1-0-ebd60…
---
James Clark (18):
perf cs-etm: Queue context packets for frontend
perf test: Add workload-ctl option
perf test: Add a workload that forces context switches
perf test cs-etm: Test process attribution
perf test: Add deterministic workload
perf test cs-etm: Replace unroll loop thread with deterministic decode test
perf test cs-etm: Remove asm_pure_loop test
perf test cs-etm: Replace memcpy test with raw dump stress test
perf test: Add named_threads workload
perf test cs-etm: Test decoding for concurrent threads test
perf test cs-etm: Remove duplicate branch tests
perf test cs-etm: Reduce snapshot size
perf test cs-etm: Speed up basic test
perf test cs-etm: Remove unused Coresight workloads
perf test cs-etm: Make disassembly test use kcore
perf test cs-etm: Add all branch instructions to test
perf test cs-etm: Speed up disassembly test
perf test cs-etm: Move existing tests to coresight folder
Documentation/trace/coresight/coresight-perf.rst | 78 +------
MAINTAINERS | 2 -
tools/perf/Documentation/perf-test.txt | 18 +-
tools/perf/Makefile.perf | 14 +-
tools/perf/scripts/python/arm-cs-trace-disasm.py | 20 +-
tools/perf/tests/builtin-test.c | 187 ++++++++++++++++-
tools/perf/tests/shell/coresight/Makefile | 29 ---
.../perf/tests/shell/coresight/Makefile.miniconfig | 14 --
tools/perf/tests/shell/coresight/asm_pure_loop.sh | 22 --
.../tests/shell/coresight/asm_pure_loop/.gitignore | 1 -
.../tests/shell/coresight/asm_pure_loop/Makefile | 34 ---
.../shell/coresight/asm_pure_loop/asm_pure_loop.S | 30 ---
.../tests/shell/coresight/concurrent_threads.sh | 45 ++++
.../tests/shell/coresight/context_switch_thread.sh | 69 ++++++
tools/perf/tests/shell/coresight/deterministic.sh | 71 +++++++
.../tests/shell/coresight/memcpy_thread/.gitignore | 1 -
.../tests/shell/coresight/memcpy_thread/Makefile | 33 ---
.../shell/coresight/memcpy_thread/memcpy_thread.c | 80 -------
.../tests/shell/coresight/memcpy_thread_16k_10.sh | 22 --
.../perf/tests/shell/coresight/raw_dump_stress.sh | 54 +++++
.../shell/{ => coresight}/test_arm_coresight.sh | 43 ++--
.../{ => coresight}/test_arm_coresight_disasm.sh | 17 +-
.../tests/shell/coresight/thread_loop/.gitignore | 1 -
.../tests/shell/coresight/thread_loop/Makefile | 33 ---
.../shell/coresight/thread_loop/thread_loop.c | 85 --------
.../shell/coresight/thread_loop_check_tid_10.sh | 23 --
.../shell/coresight/thread_loop_check_tid_2.sh | 23 --
.../shell/coresight/unroll_loop_thread/.gitignore | 1 -
.../shell/coresight/unroll_loop_thread/Makefile | 33 ---
.../unroll_loop_thread/unroll_loop_thread.c | 75 -------
.../tests/shell/coresight/unroll_loop_thread_10.sh | 22 --
tools/perf/tests/shell/lib/coresight.sh | 134 ------------
tools/perf/tests/tests.h | 3 +
tools/perf/tests/workloads/Build | 4 +
tools/perf/tests/workloads/context_switch_loop.c | 95 +++++++++
tools/perf/tests/workloads/deterministic.c | 39 ++++
tools/perf/tests/workloads/named_threads.c | 109 ++++++++++
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 21 +-
tools/perf/util/cs-etm.c | 233 +++++++++++++--------
tools/perf/util/cs-etm.h | 8 +-
40 files changed, 892 insertions(+), 934 deletions(-)
---
base-commit: 5f0ca6b80b12bab1ce06839cdffb6148bb650ff4
change-id: 20260515-james-cs-context-tracking-fix-754998bae7ed
Best regards,
--
James Clark <james.clark(a)linaro.org>
On Thu, 04 Jun 2026 15:34:25 +0800, Junrui Luo wrote:
> When the SMB sink is used as a perf AUX sink, smb_update_buffer() calls
> smb_sync_perf_buffer() to copy hardware trace data into the perf AUX ring
> buffer pages. It derives pg_idx = head >> PAGE_SHIFT from @head, which is
> handle->head, and indexes dst_pages[pg_idx]. The pg_idx %= nr_pages
> normalization is only applied after the first loop iteration.
>
> This leaves the initial page index underived from the buffer size, which
> can result in an out-of-bounds write past dst_pages[] when head exceeds
> the AUX buffer size.
>
> [...]
Applied, thanks!
[1/1] coresight: ultrasoc-smb: Fix OOB write in smb_sync_perf_buffer()
https://git.kernel.org/coresight/c/98495b5a4d77
Best regards,
--
Suzuki K Poulose <suzuki.poulose(a)arm.com>
Hi Amir,
On Wed, Jun 03, 2026 at 01:10:17PM -0700, Amir Ayupov wrote:
> Hi James,
>
> Thank you for picking it up.
>
> I tested the v2 patch series and it looks good. There was a minor
> difference in 2/39 tested perf data files: the number of brstack samples
> differs by one, however, there was no loss of binary profile. The resulting
> BOLT profile converted from the perf script output was identical, so I'm OK
> with v2 patch as-is.
Sorry jumping in as brstack is mentioned.
Now branch stack is maintained per-CPU wise, it mixes up branch stack
cross threads. The patch 03 in the series [1] refactors branch stack
per-thread by using common code.
Hope this can benefit a bit the profiling data quality and in case
you are interested in.
Thanks,
Leo
[1] https://lore.kernel.org/linux-perf-users/20260526-b4-arm_cs_callchain_suppo…
Fix thread tracking when decoding Coresight trace and add a new test for
it.
The new test is added as a Perf test workload instead of a custom binary
with its own build system, but this requires a new feature in Perf test
to pass in control pipes which can enable and disable events. This
scopes the recording to just the workload and helps to reduce the amount
of data recorded in tracing tests.
With this new feature we can re-write all of the Coresight tests to make
use of it and remove the remaining binaries which fixes the following
issues:
* They didn't work in out of source builds
* A lot of the tests unnecessarily required root and didn't skip
without it
* They were mainly qualitative tests which didn't look for specific
behavior
Most importantly, the long build and runtime has been reduced. On a
Radxa Orion O6, unroll_loop_thread.c took 37s to compile which is longer
than the entire Perf build. Now the build time is negligible and the
before and after test runtimes for all the Coresight tests are:
| N1SDP | Orion O6
-----------------------------------
Before | 4m 0s | 14m 49s
After | 26s | 56s
-----------------------------------
Signed-off-by: James Clark <james.clark(a)linaro.org>
---
Changes in v3:
- Minor sashiko comments
- Close some more pipes
- Fix warning messages
- Error handling improvements
- Pass packet into cs_etm__synth_instruction_sample()
- Fixup stale comment (Leo)
- Link to v2: https://lore.kernel.org/r/20260602-james-cs-context-tracking-fix-v2-0-85b5c…
Changes in v2:
- Add --workload-ctl option to Perf test
- Re-write all the Coresight tests and speed them up
- Pass packet to memory access function so frontend can use either the
previous or current packet's EL
- Link to v1: https://lore.kernel.org/r/20260526-james-cs-context-tracking-fix-v1-0-ebd60…
---
James Clark (19):
perf cs-etm: Queue context packets for frontend
perf test: Add workload-ctl option
perf test: Add a workload that forces context switches
perf test cs-etm: Test process attribution
perf test: Add deterministic workload
perf test cs-etm: Replace unroll loop thread with deterministic decode test
perf test cs-etm: Remove asm_pure_loop test
perf test cs-etm: Replace memcpy test with raw dump stress test
perf test: Add named_threads workload
perf test cs-etm: Test decoding for concurrent threads test
perf test cs-etm: Remove duplicate branch tests
perf test cs-etm: Skip if not root
perf test cs-etm: Reduce snapshot size
perf test cs-etm: Speed up basic test
perf test cs-etm: Remove unused Coresight workloads
perf test cs-etm: Make disassembly test use kcore
perf test cs-etm: Add all branch instructions to test
perf test cs-etm: Speed up disassembly test
perf test cs-etm: Move existing tests to coresight folder
Documentation/trace/coresight/coresight-perf.rst | 78 +------
MAINTAINERS | 2 -
tools/perf/Documentation/perf-test.txt | 18 +-
tools/perf/Makefile.perf | 14 +-
tools/perf/scripts/python/arm-cs-trace-disasm.py | 20 +-
tools/perf/tests/builtin-test.c | 187 +++++++++++++++-
tools/perf/tests/shell/coresight/Makefile | 29 ---
.../perf/tests/shell/coresight/Makefile.miniconfig | 14 --
tools/perf/tests/shell/coresight/asm_pure_loop.sh | 22 --
.../tests/shell/coresight/asm_pure_loop/.gitignore | 1 -
.../tests/shell/coresight/asm_pure_loop/Makefile | 34 ---
.../shell/coresight/asm_pure_loop/asm_pure_loop.S | 30 ---
.../tests/shell/coresight/concurrent_threads.sh | 45 ++++
.../tests/shell/coresight/context_switch_thread.sh | 69 ++++++
tools/perf/tests/shell/coresight/deterministic.sh | 71 +++++++
.../tests/shell/coresight/memcpy_thread/.gitignore | 1 -
.../tests/shell/coresight/memcpy_thread/Makefile | 33 ---
.../shell/coresight/memcpy_thread/memcpy_thread.c | 80 -------
.../tests/shell/coresight/memcpy_thread_16k_10.sh | 22 --
.../perf/tests/shell/coresight/raw_dump_stress.sh | 48 +++++
.../shell/{ => coresight}/test_arm_coresight.sh | 43 ++--
.../{ => coresight}/test_arm_coresight_disasm.sh | 17 +-
.../tests/shell/coresight/thread_loop/.gitignore | 1 -
.../tests/shell/coresight/thread_loop/Makefile | 33 ---
.../shell/coresight/thread_loop/thread_loop.c | 85 --------
.../shell/coresight/thread_loop_check_tid_10.sh | 23 --
.../shell/coresight/thread_loop_check_tid_2.sh | 23 --
.../shell/coresight/unroll_loop_thread/.gitignore | 1 -
.../shell/coresight/unroll_loop_thread/Makefile | 33 ---
.../unroll_loop_thread/unroll_loop_thread.c | 75 -------
.../tests/shell/coresight/unroll_loop_thread_10.sh | 22 --
tools/perf/tests/shell/lib/coresight.sh | 134 ------------
tools/perf/tests/tests.h | 3 +
tools/perf/tests/workloads/Build | 4 +
tools/perf/tests/workloads/context_switch_loop.c | 101 +++++++++
tools/perf/tests/workloads/deterministic.c | 39 ++++
tools/perf/tests/workloads/named_threads.c | 109 ++++++++++
tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 21 +-
tools/perf/util/cs-etm.c | 234 ++++++++++++---------
tools/perf/util/cs-etm.h | 8 +-
40 files changed, 889 insertions(+), 938 deletions(-)
---
base-commit: 5f0ca6b80b12bab1ce06839cdffb6148bb650ff4
change-id: 20260515-james-cs-context-tracking-fix-754998bae7ed
Best regards,
--
James Clark <james.clark(a)linaro.org>
On Tue, May 19, 2026 at 04:48:08PM +0100, Yeoreum Yun wrote:
> In the perf enable path, there are missing cases where
> cscfg_csdev_disable_active_config() is not called:
>
> - Branch broadcast is selected but not supported by the hardware
> - etm4_enable_hw() fails
>
> This can lead to a leak of config_desc->active_cnt.
> Fix this by properly calling cscfg_csdev_disable_active_config()
> in these error paths.
>
> Fixes: 810ac401db1f ("coresight: etm4x: Add complex configuration handlers to etmv4")
> Signed-off-by: Yeoreum Yun <yeoreum.yun(a)arm.com>
> ---
> .../hwtracing/coresight/coresight-etm4x-core.c | 18 ++++++++++++------
> 1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> index f0a9f2ef4b27..0889937811cb 100644
> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> @@ -899,6 +899,8 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
> * Missing BB support could cause silent decode errors
> * so fail to open if it's not supported.
> */
> + if (cfg_hash)
> + cscfg_csdev_disable_active_config(csdev);
> ret = -EINVAL;
> goto out;
> } else {
> @@ -915,6 +917,7 @@ static int etm4_enable_perf(struct coresight_device *csdev,
> struct coresight_path *path)
> {
> struct etmv4_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> + struct perf_event_attr *attr = &event->attr;
> int ret;
>
> if (WARN_ON_ONCE(drvdata->cpu != smp_processor_id()))
> @@ -926,7 +929,7 @@ static int etm4_enable_perf(struct coresight_device *csdev,
> /* Configure the tracer based on the session's specifics */
> ret = etm4_parse_event_config(csdev, event);
> if (ret)
> - goto out;
> + goto err;
>
> drvdata->trcid = path->trace_id;
>
> @@ -935,16 +938,19 @@ static int etm4_enable_perf(struct coresight_device *csdev,
>
> /* And enable it */
> ret = etm4_enable_hw(drvdata);
> -
> -out:
> - /* Failed to start tracer; roll back to DISABLED mode */
> if (ret) {
> - coresight_set_mode(csdev, CS_MODE_DISABLED);
> - return ret;
> + if (ATTR_CFG_GET_FLD(attr, configid))
> + cscfg_csdev_disable_active_config(csdev);
> + goto err;
> }
>
> csdev->path = path;
> return 0;
> +
> +err:
> + /* Failed to start tracer; roll back to DISABLED mode */
> + coresight_set_mode(csdev, CS_MODE_DISABLED);
> + return ret;
> }
Seems to me, it is not readable if spread error handling into both child
and parent functions. I would like we can stick to use the same function
for resource allocation and rollback for failures.
How about below change (I did not verify it yet):
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
index 0889937811cb..8347efb2069b 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
@@ -794,8 +794,7 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
.ATTR_CFG_FLD_timestamp_CFG = U64_MAX,
};
struct perf_event_attr *attr = &event->attr;
- unsigned long cfg_hash;
- int preset, cc_threshold;
+ int cc_threshold;
u8 ts_level;
/* Clear configuration from previous run */
@@ -882,16 +881,6 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
/* bit[12], Return stack enable bit */
config->cfg |= TRCCONFIGR_RS;
- /*
- * Set any selected configuration and preset. A zero configid means no
- * configuration active, preset = 0 means no preset selected.
- */
- cfg_hash = ATTR_CFG_GET_FLD(attr, configid);
- if (cfg_hash) {
- preset = ATTR_CFG_GET_FLD(attr, preset);
- ret = cscfg_csdev_enable_active_config(csdev, cfg_hash, preset);
- }
-
/* branch broadcast - enable if selected and supported */
if (ATTR_CFG_GET_FLD(attr, branch_broadcast)) {
if (!caps->trcbb) {
@@ -899,8 +888,6 @@ static int etm4_parse_event_config(struct coresight_device *csdev,
* Missing BB support could cause silent decode errors
* so fail to open if it's not supported.
*/
- if (cfg_hash)
- cscfg_csdev_disable_active_config(csdev);
ret = -EINVAL;
goto out;
} else {
@@ -918,7 +905,8 @@ static int etm4_enable_perf(struct coresight_device *csdev,
{
struct etmv4_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
struct perf_event_attr *attr = &event->attr;
- int ret;
+ unsigned long cfg_hash;
+ int preset, ret;
if (WARN_ON_ONCE(drvdata->cpu != smp_processor_id()))
return -EINVAL;
@@ -931,6 +919,18 @@ static int etm4_enable_perf(struct coresight_device *csdev,
if (ret)
goto err;
+ /*
+ * Set any selected configuration and preset. A zero configid means no
+ * configuration active, preset = 0 means no preset selected.
+ */
+ cfg_hash = ATTR_CFG_GET_FLD(attr, configid);
+ if (cfg_hash) {
+ preset = ATTR_CFG_GET_FLD(attr, preset);
+ ret = cscfg_csdev_enable_active_config(csdev, cfg_hash, preset);
+ if (ret)
+ goto err;
+ }
+
drvdata->trcid = path->trace_id;
/* Populate pause state */
@@ -938,15 +938,15 @@ static int etm4_enable_perf(struct coresight_device *csdev,
/* And enable it */
ret = etm4_enable_hw(drvdata);
- if (ret) {
- if (ATTR_CFG_GET_FLD(attr, configid))
- cscfg_csdev_disable_active_config(csdev);
- goto err;
- }
+ if (ret)
+ goto err_enable_hw;
csdev->path = path;
return 0;
+err_enable_hw:
+ if (cfg_hash)
+ cscfg_csdev_disable_active_config(csdev);
err:
/* Failed to start tracer; roll back to DISABLED mode */
coresight_set_mode(csdev, CS_MODE_DISABLED);
On Thu, May 28, 2026 at 03:26:57PM +0100, Yeoreum Yun wrote:
[...]
> > > @@ -47,7 +47,7 @@ static int etm4_cfg_map_reg_offset(struct etmv4_drvdata *drvdata,
> > > struct cscfg_regval_csdev *reg_csdev, u32 offset)
> > > {
> > > int err = -EINVAL, idx;
> > > - struct etmv4_config *drvcfg = &drvdata->config;
> > > + struct etmv4_config *drvcfg = &drvdata->active_config;
> >
> > Wouldn't it make more sense to keep using drvdata->config for cfgfs?
> > This would avoid cfgfs updating the active configuration while a session
> > is enabled.
> >
> > My understanding is after refactoring cfgfs, we will never expose
> > active_config or config anymore so this should not an issue. Before
> > that, maybe we just keep to use config so can avoid mess.
>
> I don't think so. since the cfgfs's config is updated right before
> enable and in this patchset, It's applied after "take the mode".
>
> If we do it into drvdta->config, then contention would be increased
> with the access of sysfs and that nullifies almost this patch purpose
> and because of we use "one config" this makes a corruption with perf
> and sysfs.
Here have two different contention: the contention between sysfs and
cfgfs, and the contention between cfgfs and a runtime config (active
config). My suggestion is to avoid the later contention, which is the
main idea in this series that avoid the runtime config is modified in
the middle of a session.
That said, I have no strong opinion.
> > > static void etm4_enable_sysfs_smp_call(void *info)
> > > {
> > > struct etm4_enable_arg *arg = info;
> > > + struct etmv4_drvdata *drvdata;
> > > struct coresight_device *csdev;
> > > + unsigned long cfg_hash;
> > > + int preset;
> > >
> > > if (WARN_ON(!arg))
> > > return;
> > >
> > > - csdev = arg->drvdata->csdev;
> > > + drvdata = arg->drvdata;
> > > + csdev = drvdata->csdev;
> > > if (!coresight_take_mode(csdev, CS_MODE_SYSFS)) {
> > > /* Someone is already using the tracer */
> > > arg->rc = -EBUSY;
> > > return;
> > > }
> > >
> > > - arg->rc = etm4_enable_hw(arg->drvdata);
> > > + /* enable any config activated by configfs */
> > > + cscfg_config_sysfs_get_active_cfg(&cfg_hash, &preset);
> > >
> > > - /* The tracer didn't start */
> > > + drvdata->active_config = arg->config;
> >
> > Can move this down to just before drvdata->trcid assignment so cscfg
> > operations can be put together?
>
> No. Copy the arg->config must be before cscfg_csdev_enable_active_config().
> If that's after, it overrides the "preset" and this is different from
> the former behavior.
Please copy config first, then do cscfg stuffs.
> > > + arg->rc = etm4_enable_hw(drvdata);
> > > if (arg->rc) {
> > > - coresight_set_mode(csdev, CS_MODE_DISABLED);
> > > - return;
> > > + cscfg_csdev_disable_active_config(csdev);
> > > + goto err;
> >
> > Add a new goto tag (like err_enable_hw) for the disable_active_config
> > rollback?
>
> This is only place where cscfg_csdev_disable_active_config().
> Why do we need to goto for cleanup where there is no duplication?
It is about to use a central place for handling errors.
> > > @@ -1449,7 +1458,7 @@ static void etm4_init_arch_data(void *info)
> > >
> > > /* EXLEVEL_S, bits[19:16] Secure state instruction tracing */
> > > caps->s_ex_level = FIELD_GET(TRCIDR3_EXLEVEL_S_MASK, etmidr3);
> > > - drvdata->config.s_ex_level = caps->s_ex_level;
> > > + config->s_ex_level = caps->s_ex_level;
> >
> > config->s_ex_level is redundant and not really a configuration item.
> > We should use caps->s_ex_level instead of config->s_ex_level wherever
> > it is used.
>
> Agree, but this includes it should change the omse function
> declations to pass "caps" argument:
> - etm4_set_comparator_filter()
> - etm4_set_start_stop_filter()
>
> If that's okay, I'll respin with it.
This is fine for me.
Thanks,
Leo