When multiple ETMs are enabled simultaneously, the time required
to complete a flush in the process of reading the TMC device node
may exceed the default wait time of 100us. If the TMC capture is
stopped while any ETM has not completed its flush, it can cause
the corresponding CPU to hang.
Fix the by checking the TMCReady bit after the flush. If TMCReady
bit is set, TraceCaptEn bit can be clear; otherwise, return directly
and stop the TMC read.
Signed-off-by: Yuanfang Zhang <quic_yuanfang(a)quicinc.com>
---
drivers/hwtracing/coresight/coresight-tmc-etf.c | 17 +++++++++++++++--
drivers/hwtracing/coresight/coresight-tmc-etr.c | 22 +++++++++++++++++-----
2 files changed, 32 insertions(+), 7 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index d4f641cd9de69488fe3d1c1dc9b5a9eafb55ed59..bded290c42891d782344d9a6e63ebdbed6719133 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -80,11 +80,21 @@ static void tmc_etb_dump_hw(struct tmc_drvdata *drvdata)
return;
}
-static void __tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
+static int __tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
{
+ int rc;
+
CS_UNLOCK(drvdata->base);
tmc_flush_and_stop(drvdata);
+
+ rc = tmc_wait_for_tmcready(drvdata);
+ if (rc) {
+ dev_err(&drvdata->csdev->dev,
+ "Failed to disable : TMC is not ready\n");
+ CS_LOCK(drvdata->base);
+ return rc;
+ }
/*
* When operating in sysFS mode the content of the buffer needs to be
* read before the TMC is disabled.
@@ -94,6 +104,7 @@ static void __tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
tmc_disable_hw(drvdata);
CS_LOCK(drvdata->base);
+ return 0;
}
static void tmc_etb_disable_hw(struct tmc_drvdata *drvdata)
@@ -650,7 +661,9 @@ int tmc_read_prepare_etb(struct tmc_drvdata *drvdata)
ret = -EINVAL;
goto out;
}
- __tmc_etb_disable_hw(drvdata);
+ ret = __tmc_etb_disable_hw(drvdata);
+ if (ret)
+ goto out;
}
drvdata->reading = true;
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index a48bb85d0e7f44a25b813f3c828cc3d705d16012..63a1f7501562fa0b5c2fe6ea53dce4d82842bec3 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -1135,11 +1135,21 @@ static void tmc_etr_sync_sysfs_buf(struct tmc_drvdata *drvdata)
}
}
-static void __tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
+static int __tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
{
+ int rc;
+
CS_UNLOCK(drvdata->base);
tmc_flush_and_stop(drvdata);
+
+ rc = tmc_wait_for_tmcready(drvdata);
+ if (rc) {
+ dev_err(&drvdata->csdev->dev,
+ "Failed to disable : TMC is not ready\n");
+ CS_LOCK(drvdata->base);
+ return rc;
+ }
/*
* When operating in sysFS mode the content of the buffer needs to be
* read before the TMC is disabled.
@@ -1150,7 +1160,7 @@ static void __tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
tmc_disable_hw(drvdata);
CS_LOCK(drvdata->base);
-
+ return 0;
}
void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
@@ -1779,9 +1789,11 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
}
/* Disable the TMC if we are trying to read from a running session. */
- if (coresight_get_mode(drvdata->csdev) == CS_MODE_SYSFS)
- __tmc_etr_disable_hw(drvdata);
-
+ if (coresight_get_mode(drvdata->csdev) == CS_MODE_SYSFS) {
+ ret = __tmc_etr_disable_hw(drvdata);
+ if (ret)
+ goto out;
+ }
drvdata->reading = true;
out:
spin_unlock_irqrestore(&drvdata->spinlock, flags);
---
base-commit: fac04efc5c793dccbd07e2d59af9f90b7fc0dca4
change-id: 20250103-fix_cpu_hung-b5a95179ada4
Best regards,
--
Yuanfang Zhang <quic_yuanfang(a)quicinc.com>
This patch series is rebased on coresight-next-v6.12.rc4
* Patches 1 & 2 adds support for allocation of trace buffer pages from
reserved RAM
* Patches 3 & 4 adds support for saving metadata at the time of kernel panic
* Patch 5 adds support for reading trace data captured at the time of panic
* Patches 6 & 7 adds support for disabling coresight blocks at the time of panic
* Patch 8: Gives the full description about this feature as part of documentation
v12 is posted here,
https://lore.kernel.org/linux-arm-kernel/20241129084714.3057080-1-lcherian@…
Changelog from v12:
* Fixed wrong buffer pointer passed to coresigh_insert_barrier_packet
* tmc_read_prepare/unprepare_crashdata need to be called only once and
hence removed from read path and added to tmc_probe
* tmc_read_prepare_crashdata renamed to tmc_prepare_crashdata and
avoid taking locks as its moved to probe function.
* Introduced read status flag, "reading" specific to reserved buffer to keep the
reserved buffer reading independent of the regular buffer.
* open/release ops for reserved buffer has to take care only about the
set/unset the "reading" status flag as the reserved buffer is prepared
during the probe time itself.
* Few other trivial changes
Changelog from v11:
Convert all commands to literal code blocks, that was missed out in v11.
No other code changes.
Changelog from v10:
* Converted all csdev_access_* to readl functions in tmc_panic_sync_*
* Added "tmc" prefix for register snapshots in struct tmc_crash_metadata
* Converted dev_info to dev_dbg in panic handlers
* Converted dsb to dmb in panic handlers
* Fixed marking metadata as invalid when a user is trying to use the
reserved buffer. Earlier this was wrongly set at the time of reading
reserved trace buffer.
* Moved common validation checks to is_tmc_crashdata_valid and minor
code rearrangements for efficiency
* Got rid of sink specific prepare/unprepare invocations
* Got rid of full from struct tmc_resrv_buf
* While reading crashdata, size is now calculated from metadata instead
of relying on reserved buffer size populated by dtb
* Minor documenation fixes
Changelog from v9:
* Add common helper function of_tmc_get_reserved_resource_by_name
for better code reuse
* Reserved buffer validity and crashdata validity has been separated to
avoid interdependence
* New fields added to crash metadata: version, ffcr, ffsr, mode
* Version checks added for metadata validation
* Special file /dev/crash_tmc_xxx would be available only when
crash metadata is valid
* Removed READ_CRASHDATA mode meant for special casing crashdata reads.
Instead, dedicated read function added for crashdata reads from reserved
buffer which is common for both ETR and ETF sinks as well.
* Documentation added to Documentation/tracing/coresight/panic.rst
Changelog from v8:
* Added missing exit path on error in __tmc_probe.
* Few whitespace fixes, checkpatch fixes.
* With perf sessions honouring stop_on_flush sysfs attribute,
removed redundant variable stop_on_flush_en.
Changelog from v7:
* Fixed breakage on perf test -vvvv "arm coresight".
No issues seen with and without "resrv" buffer mode
* Moved the crashdev registration into a separate function.
* Removed redundant variable in tmc_etr_setup_crashdata_buf
* Avoided a redundant memcpy in tmc_panic_sync_etf.
* Tested kernel panic with trace session started uisng perf.
Please see the title "Perf based testing" below for details.
For this, stop_on_flush sysfs attribute is taken into
consideration while starting perf sessions as well.
Changelog from v6:
* Added special device files for reading crashdata, so that
read_prevboot mode flag is removed.
* Added new sysfs TMC device attribute, stop_on_flush.
Stop on flush trigger event is disabled by default.
User need to explicitly enable this from sysfs for panic stop
to work.
* Address parameter for panicstop ETM configuration is
chosen as kernel "panic" address by default.
* Added missing tmc_wait_for_tmcready during panic handling
* Few other misc code rearrangements.
Changelog from v5:
* Fixed issues reported by CONFIG_DEBUG_ATOMIC_SLEEP
* Fixed a memory leak while reading data from /dev/tmc_etrx in
READ_PREVBOOT mode
* Tested reading trace data from crashdump kernel
Changelog from v4:
* Device tree binding
- Description is made more explicit on the usage of reserved memory
region
- Mismatch in memory region names in dts binding and driver fixed
- Removed "mem" suffix from the memory region names
* Rename "struct tmc_register_snapshot" -> "struct tmc_crash_metadata",
since it contains more than register snapshot.
Related variables are named accordingly.
* Rename struct tmc_drvdata members
resrv_buf -> crash_tbuf
metadata -> crash_mdata
* Size field in metadata refers to RSZ register and hence indicates the
size in 32 bit words. ETR metadata follows this convention, the same
has been extended to ETF metadata as well.
* Added crc32 for more robust metadata and tracedata validation.
* Added/modified dev_dbg messages during metadata validation
* Fixed a typo in patch 5 commit description
Changelog from v3:
* Converted the Coresight ETM driver change to a named configuration.
RFC tag has been removed with this change.
* Fixed yaml issues reported by "make dt_binding_check"
* Added names for reserved memory regions 0 and 1
* Added prevalidation checks for metadata processing
* Fixed a regression introduced in RFC v3
- TMC Status register was getting saved wrongly
* Reverted memremap attribute changes from _WB to _WC to match
with the dma map attributes
* Introduced reserved buffer mode specific .sync op.
This fixes a possible crash when reserved buffer mode was used in
normal trace capture, due to unwanted dma maintenance operations.
Linu Cherian (8):
dt-bindings: arm: coresight-tmc: Add "memory-region" property
coresight: tmc-etr: Add support to use reserved trace memory
coresight: core: Add provision for panic callbacks
coresight: tmc: Enable panic sync handling
coresight: tmc: Add support for reading crash data
coresight: tmc: Stop trace capture on FlIn
coresight: config: Add preloaded configuration
Documentation: coresight: Panic support
.../bindings/arm/arm,coresight-tmc.yaml | 26 ++
Documentation/trace/coresight/panic.rst | 362 ++++++++++++++++++
drivers/hwtracing/coresight/Makefile | 2 +-
.../coresight/coresight-cfg-preload.c | 2 +
.../coresight/coresight-cfg-preload.h | 2 +
.../hwtracing/coresight/coresight-cfg-pstop.c | 83 ++++
drivers/hwtracing/coresight/coresight-core.c | 42 ++
.../hwtracing/coresight/coresight-tmc-core.c | 308 ++++++++++++++-
.../hwtracing/coresight/coresight-tmc-etf.c | 92 ++++-
.../hwtracing/coresight/coresight-tmc-etr.c | 184 ++++++++-
drivers/hwtracing/coresight/coresight-tmc.h | 105 +++++
include/linux/coresight.h | 12 +
12 files changed, 1208 insertions(+), 12 deletions(-)
create mode 100644 Documentation/trace/coresight/panic.rst
create mode 100644 drivers/hwtracing/coresight/coresight-cfg-pstop.c
--
2.34.1
Hi Suzuki,
thanks for the reply! The CPUs of the boards I am using are all based on Arm-v8(.2), but I found the components' addresses in the manuals of the SoCs.
I managed to modify the Devicetree by writing my own .dtsi file (see attachment) and finally got the CoreSight devices in /sys/devices/.
However, dmesg shows the following:
[ 0.000000] Booting Linux on physical CPU 0x0
[ 0.000000] Linux version 4.9.253-coresight (user@user-desktop) (gcc version 7.5.0 (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) ) #1 SMP PREEMPT Wed Jan 1 18:45:04 CET 2025
[ 0.000000] Boot CPU: AArch64 Processor [411fd071]
(omitted 87 lines)
[ 0.212039] DTS File Name: /home/user/Downloads/Linux_for_Tegra/source/public/kernel/kernel-4.9/arch/arm64/boot/dts/../../../../../../hardware/nvidia/platform/t210/porg/kernel-dts/tegra210-p3448-0000-p3449-0000-a02.dts
[ 0.212045] DTB Build time: Jan 1 2025 16:04:45
(omitted 35 lines)
[ 0.420616] DTS File Name: /home/user/Downloads/Linux_for_Tegra/source/public/kernel/kernel-4.9/arch/arm64/boot/dts/../../../../../../hardware/nvidia/platform/t210/porg/kernel-dts/tegra210-p3448-0000-p3449-0000-a02.dts
[ 0.420622] DTB Build time: Jan 1 2025 16:04:45
(omitted 75 lines)
[ 0.524166] OF: amba_device_add() failed (-19) for /funnel_bccplex@73001000
(omitted 367 lines)
[ 1.330484] OF: graph: no port node found in /etf@72030000
[ 1.330757] OF: graph: no port node found in /etr@72050000
[ 1.330987] OF: graph: no port node found in /funnel_major@72010000
[ 1.331238] OF: graph: no port node found in /ptm0@73440000
[ 1.331451] coresight-etm4x 73440000.ptm0: CPU0: Cortex-A57 ETM v4.0 initialized
[ 1.331482] OF: graph: no port node found in /ptm1@73540000
[ 1.331689] coresight-etm4x 73540000.ptm1: CPU1: Cortex-A57 ETM v4.0 initialized
[ 1.331719] OF: graph: no port node found in /ptm2@73640000
[ 1.331938] coresight-etm4x 73640000.ptm2: CPU2: Cortex-A57 ETM v4.0 initialized
[ 1.331944] extcon-disp-state extcon:disp-state: cable 47 state 0
[ 1.331946] Extcon AUX1(HDMI) disable
[ 1.331976] OF: graph: no port node found in /ptm3@73740000
[ 1.332192] coresight-etm4x 73740000.ptm3: CPU3: Cortex-A57 ETM v4.0 initialized
[ 1.332250] OF: graph: no port node found in /replicator@72040000
[ 1.332305] coresight-replicator-qcom 72040000.replicator: REPLICATOR 1.0 initialized
[ 1.332350] OF: graph: no port node found in /stm@72070000
[ 1.332386] coresight-stm 72070000.stm: stm_register_device failed, probing deffered
(omitted 64 lines)
[ 1.411751] OF: graph: no port node found in /stm@72070000
[ 1.412025] coresight-stm 72070000.stm: STM32 initialized
(omitted 212 lines)
Do you have an idea what I did wrong? In the end, I want to be able to follow the steps described here:
https://docs.nvidia.com/jetson/archives/l4t-archived/l4t-3275/index.html#pa…
Best regards,
Vincent
(P.S. There was a problem sending this email a first time, but it should work now)
On 24/01/2025 7:25 am, Jie Gan wrote:
> Add 'struct coresight_path' to store the data that is needed by
> coresight_enable_path/coresight_disable_path. The structure
> will be transmitted to the helper and sink device to enable
> related funcationalities.
>
> Signed-off-by: Jie Gan <quic_jiegan(a)quicinc.com>
> ---
[...]
> /*
> * If we still have access to the event_data via handle,
> @@ -595,11 +599,11 @@ static void etm_event_stop(struct perf_event *event, int mode)
> if (!csdev)
> return;
>
> - path = etm_event_cpu_path(event_data, cpu);
> - if (!path)
> + cs_path = etm_event_cpu_path(event_data, cpu);
> + if (!cs_path)
I don't think renaming 'path' to 'cs_path' is worth the churn. It's in a
lot of places in this change, but I think path is already good enough.
> return;
>
> - sink = coresight_get_sink(path);
> + sink = coresight_get_sink(cs_path->path);
coresight_get_sink() is always called with cs_path->path, so we might as
well make it take a whole path struct. The same with any of the other
functions that operate on path like coresight_get_source().
Proof of concept to support CTCU device. Applies to Jie's patchset in
the parent email. I think this would be a good simplification, it
removes some code and makes things a bit clearer, and works for both the
old and new CTCU requirements. It will require merging into the parent
patchset somehow as it undoes some of those changes.
James Clark (3):
coresight: Don't save handle in path
coresight: Export coresight_get_sink()
coresight: Alloc trace ID after building the path
drivers/hwtracing/coresight/coresight-core.c | 107 +++++++++++++-----
drivers/hwtracing/coresight/coresight-dummy.c | 9 +-
drivers/hwtracing/coresight/coresight-etb10.c | 8 +-
.../hwtracing/coresight/coresight-etm-perf.c | 20 ++--
drivers/hwtracing/coresight/coresight-etm.h | 1 -
.../coresight/coresight-etm3x-core.c | 84 ++------------
.../coresight/coresight-etm3x-sysfs.c | 3 +-
.../coresight/coresight-etm4x-core.c | 83 ++------------
.../coresight/coresight-etm4x-sysfs.c | 4 +-
drivers/hwtracing/coresight/coresight-etm4x.h | 1 -
drivers/hwtracing/coresight/coresight-priv.h | 17 +--
drivers/hwtracing/coresight/coresight-stm.c | 5 +-
drivers/hwtracing/coresight/coresight-sysfs.c | 6 +-
.../hwtracing/coresight/coresight-tmc-etf.c | 9 +-
.../hwtracing/coresight/coresight-tmc-etr.c | 13 +--
drivers/hwtracing/coresight/coresight-tmc.h | 2 +-
drivers/hwtracing/coresight/coresight-tpda.c | 3 +-
drivers/hwtracing/coresight/coresight-tpdm.c | 3 +-
drivers/hwtracing/coresight/coresight-tpiu.c | 2 +-
drivers/hwtracing/coresight/coresight-trbe.c | 4 +-
drivers/hwtracing/coresight/ultrasoc-smb.c | 8 +-
include/linux/coresight.h | 25 +++-
22 files changed, 159 insertions(+), 258 deletions(-)
--
2.34.1
On 29/01/2025 1:02 pm, Jie Gan wrote:
>
>
> On 1/29/2025 6:35 PM, James Clark wrote:
>>
>>
>> On 29/01/2025 12:46 am, Jie Gan wrote:
>>>
>>>
>>> On 1/28/2025 7:55 PM, James Clark wrote:
>>>>
>>>>
>>>> On 24/01/2025 7:25 am, Jie Gan wrote:
>>>>> The Coresight TMC Control Unit hosts miscellaneous configuration
>>>>> registers
>>>>> which control various features related to TMC ETR sink.
>>>>>
>>>>> Based on the trace ID, which is programmed in the related CTCU ATID
>>>>> register of a specific ETR, trace data with that trace ID gets into
>>>>> the ETR buffer, while other trace data gets dropped.
>>>>>
>>>>> Enabling source device sets one bit of the ATID register based on
>>>>> source device's trace ID.
>>>>> Disabling source device resets the bit according to the source
>>>>> device's trace ID.
>>>>>
>>>>> Signed-off-by: Jie Gan <quic_jiegan(a)quicinc.com>
>>>>> ---
>>>>> drivers/hwtracing/coresight/Kconfig | 12 +
>>>>> drivers/hwtracing/coresight/Makefile | 1 +
>>>>> drivers/hwtracing/coresight/coresight-ctcu.c | 276 ++++++++++++++
>>>>> + ++++
>>>>> drivers/hwtracing/coresight/coresight-ctcu.h | 30 ++
>>>>> include/linux/coresight.h | 3 +-
>>>>> 5 files changed, 321 insertions(+), 1 deletion(-)
>>>>> create mode 100644 drivers/hwtracing/coresight/coresight-ctcu.c
>>>>> create mode 100644 drivers/hwtracing/coresight/coresight-ctcu.h
>>>> >
>>>>
>>>> [...]
>>>>
>>>>> +/*
>>>>> + * ctcu_set_etr_traceid: Retrieve the ATID offset and trace ID.
>>>>> + *
>>>>> + * Returns 0 indicates success. None-zero result means failure.
>>>>> + */
>>>>> +static int ctcu_set_etr_traceid(struct coresight_device *csdev,
>>>>> struct coresight_path *cs_path,
>>>>> + bool enable)
>>>>> +{
>>>>> + struct coresight_device *sink = coresight_get_sink(cs_path-
>>>>> >path);
>>>>> + struct ctcu_drvdata *drvdata = dev_get_drvdata(csdev-
>>>>> >dev.parent);
>>>>> + u8 trace_id = cs_path->trace_id;
>>>>> + int port_num;
>>>>> +
>>>>> + if ((sink == NULL) || !IS_VALID_CS_TRACE_ID(trace_id) ||
>>>>> IS_ERR_OR_NULL(drvdata)) {
>>>>> + dev_err(&csdev->dev, "Invalid parameters\n");
>>>>> + return -EINVAL;
>>>>> + }
>>>>> +
>>>>> + port_num = ctcu_get_active_port(sink, csdev);
>>>>> + if (port_num < 0)
>>>>> + return -EINVAL;
>>>>> +
>>>>> + /*
>>>>> + * Skip the disable session if more than one TPDM device that
>>>>> + * connected to the same TPDA device has been enabled.
>>>>> + */
>>>>> + if (enable)
>>>>> + atomic_inc(&drvdata->traceid_refcnt[port_num][trace_id]);
>>>>> + else {
>>>>> + if (atomic_dec_return(&drvdata->traceid_refcnt[port_num]
>>>>> [trace_id]) > 0) {
>>>>> + dev_dbg(&csdev->dev, "Skip the disable session\n");
>>>>> + return 0;
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + dev_dbg(&csdev->dev, "traceid is %d\n", cs_path->trace_id);
>>>>> +
>>>>> + return __ctcu_set_etr_traceid(csdev, trace_id, port_num, enable);
>>>>
>>>> Hi Jie,
>>>>
>>>> Using atomic_dec_return() here doesn't prevent
>>>> __ctcu_set_etr_traceid() from running concurrent enable and
>>>> disables. Once you pass the atomic_dec_return() a second call to
>>>> enable it will mess it up.
>>>>
>>>> I think you need a spinlock around the whole thing and then the
>>>> refcounts don't need to be atomics.
>>>>
>>> Hi, James
>>> Thanks for comment. I may not fully tested my codes here. What I was
>>> thinking is there's no way the refcnt could become a negative number
>>> under current framework. So I just added spinlock in
>>> __ctcu_set_etr_traceid() to ensure concurrent sessions correctly
>>> manipulate the register.
>>>
>>> As the trace_id related to the bit of the ATID register, I think the
>>> concurrent processes are working fine with spinlock around read/write
>>> register.
>>>
>>> I may not fully got your point here. Please help me to correct it.
>>>
>>> Thanks,
>>> Jie
>>>
>>>
>>
>> No it can't become negative, but the refcount can be a different state
>> to the one that was actually written:
>>
>>
>> CPU0 CPU1
>> ---- ----
>> ctcu_set_etr_traceid(enable)
>> ctcu_set_etr_traceid(disable)
>> atomic_inc()
>> recount == 1
>> atomic_dec()
>> recount == 0
>>
>> __ctcu_set_etr_traceid(disable)
>> Lock and write disable state to
>> device
>>
>> __ctcu_set_etr_traceid(enable)
>> Lock and write enable state to
>> device
>>
>>
>> As you can see this leaves the device in an enabled state but the
>> refcount is 0.
> Yes, you are right. I didnt consider this scenario. We definitely need
> spinlock here.
>
>>
>> This is also quite large if you use atomic types:
>>
>> /* refcnt for each traceid of each sink */
>> atomic_t traceid_refcnt[ATID_MAX_NUM][CORESIGHT_TRACE_ID_RES_TOP];
>>
>> Presumably you can't have the refcount for each ID be higher than the
>> max number of TPDMs connected? If you make the locked area a bit wider
>> you don't need atomic types and also solve the above problem. So you
>> could do u8, or DECLARE_BITMAP() and bitmap_read() etc to read 3 bit
>> values. Or however wide it needs to be.
> The original purpose of using atomic here is trying to narrow the locked
> area.
>
> I think u8 is ok here.
> u8 traceid_refcnt[ATID_MAX_NUM][CORESIGHT_TRACE_ID_RES_TOP] will cost
> 224 bytes, I think it's acceptable here.
>
> Thanks,
> Jie
>
Yep u8 sounds ok then
On 29/01/2025 12:46 am, Jie Gan wrote:
>
>
> On 1/28/2025 7:55 PM, James Clark wrote:
>>
>>
>> On 24/01/2025 7:25 am, Jie Gan wrote:
>>> The Coresight TMC Control Unit hosts miscellaneous configuration
>>> registers
>>> which control various features related to TMC ETR sink.
>>>
>>> Based on the trace ID, which is programmed in the related CTCU ATID
>>> register of a specific ETR, trace data with that trace ID gets into
>>> the ETR buffer, while other trace data gets dropped.
>>>
>>> Enabling source device sets one bit of the ATID register based on
>>> source device's trace ID.
>>> Disabling source device resets the bit according to the source
>>> device's trace ID.
>>>
>>> Signed-off-by: Jie Gan <quic_jiegan(a)quicinc.com>
>>> ---
>>> drivers/hwtracing/coresight/Kconfig | 12 +
>>> drivers/hwtracing/coresight/Makefile | 1 +
>>> drivers/hwtracing/coresight/coresight-ctcu.c | 276 +++++++++++++++++++
>>> drivers/hwtracing/coresight/coresight-ctcu.h | 30 ++
>>> include/linux/coresight.h | 3 +-
>>> 5 files changed, 321 insertions(+), 1 deletion(-)
>>> create mode 100644 drivers/hwtracing/coresight/coresight-ctcu.c
>>> create mode 100644 drivers/hwtracing/coresight/coresight-ctcu.h
>> >
>>
>> [...]
>>
>>> +/*
>>> + * ctcu_set_etr_traceid: Retrieve the ATID offset and trace ID.
>>> + *
>>> + * Returns 0 indicates success. None-zero result means failure.
>>> + */
>>> +static int ctcu_set_etr_traceid(struct coresight_device *csdev,
>>> struct coresight_path *cs_path,
>>> + bool enable)
>>> +{
>>> + struct coresight_device *sink = coresight_get_sink(cs_path->path);
>>> + struct ctcu_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
>>> + u8 trace_id = cs_path->trace_id;
>>> + int port_num;
>>> +
>>> + if ((sink == NULL) || !IS_VALID_CS_TRACE_ID(trace_id) ||
>>> IS_ERR_OR_NULL(drvdata)) {
>>> + dev_err(&csdev->dev, "Invalid parameters\n");
>>> + return -EINVAL;
>>> + }
>>> +
>>> + port_num = ctcu_get_active_port(sink, csdev);
>>> + if (port_num < 0)
>>> + return -EINVAL;
>>> +
>>> + /*
>>> + * Skip the disable session if more than one TPDM device that
>>> + * connected to the same TPDA device has been enabled.
>>> + */
>>> + if (enable)
>>> + atomic_inc(&drvdata->traceid_refcnt[port_num][trace_id]);
>>> + else {
>>> + if (atomic_dec_return(&drvdata->traceid_refcnt[port_num]
>>> [trace_id]) > 0) {
>>> + dev_dbg(&csdev->dev, "Skip the disable session\n");
>>> + return 0;
>>> + }
>>> + }
>>> +
>>> + dev_dbg(&csdev->dev, "traceid is %d\n", cs_path->trace_id);
>>> +
>>> + return __ctcu_set_etr_traceid(csdev, trace_id, port_num, enable);
>>
>> Hi Jie,
>>
>> Using atomic_dec_return() here doesn't prevent
>> __ctcu_set_etr_traceid() from running concurrent enable and disables.
>> Once you pass the atomic_dec_return() a second call to enable it will
>> mess it up.
>>
>> I think you need a spinlock around the whole thing and then the
>> refcounts don't need to be atomics.
>>
> Hi, James
> Thanks for comment. I may not fully tested my codes here. What I was
> thinking is there's no way the refcnt could become a negative number
> under current framework. So I just added spinlock in
> __ctcu_set_etr_traceid() to ensure concurrent sessions correctly
> manipulate the register.
>
> As the trace_id related to the bit of the ATID register, I think the
> concurrent processes are working fine with spinlock around read/write
> register.
>
> I may not fully got your point here. Please help me to correct it.
>
> Thanks,
> Jie
>
>
No it can't become negative, but the refcount can be a different state
to the one that was actually written:
CPU0 CPU1
---- ----
ctcu_set_etr_traceid(enable)
ctcu_set_etr_traceid(disable)
atomic_inc()
recount == 1
atomic_dec()
recount == 0
__ctcu_set_etr_traceid(disable)
Lock and write disable state to
device
__ctcu_set_etr_traceid(enable)
Lock and write enable state to
device
As you can see this leaves the device in an enabled state but the
refcount is 0.
This is also quite large if you use atomic types:
/* refcnt for each traceid of each sink */
atomic_t traceid_refcnt[ATID_MAX_NUM][CORESIGHT_TRACE_ID_RES_TOP];
Presumably you can't have the refcount for each ID be higher than the
max number of TPDMs connected? If you make the locked area a bit wider
you don't need atomic types and also solve the above problem. So you
could do u8, or DECLARE_BITMAP() and bitmap_read() etc to read 3 bit
values. Or however wide it needs to be.
On 24/01/2025 7:25 am, Jie Gan wrote:
> The Coresight TMC Control Unit hosts miscellaneous configuration registers
> which control various features related to TMC ETR sink.
>
> Based on the trace ID, which is programmed in the related CTCU ATID
> register of a specific ETR, trace data with that trace ID gets into
> the ETR buffer, while other trace data gets dropped.
>
> Enabling source device sets one bit of the ATID register based on
> source device's trace ID.
> Disabling source device resets the bit according to the source
> device's trace ID.
>
> Signed-off-by: Jie Gan <quic_jiegan(a)quicinc.com>
> ---
> drivers/hwtracing/coresight/Kconfig | 12 +
> drivers/hwtracing/coresight/Makefile | 1 +
> drivers/hwtracing/coresight/coresight-ctcu.c | 276 +++++++++++++++++++
> drivers/hwtracing/coresight/coresight-ctcu.h | 30 ++
> include/linux/coresight.h | 3 +-
> 5 files changed, 321 insertions(+), 1 deletion(-)
> create mode 100644 drivers/hwtracing/coresight/coresight-ctcu.c
> create mode 100644 drivers/hwtracing/coresight/coresight-ctcu.h
>
[...]
> +/*
> + * ctcu_set_etr_traceid: Retrieve the ATID offset and trace ID.
> + *
> + * Returns 0 indicates success. None-zero result means failure.
> + */
> +static int ctcu_set_etr_traceid(struct coresight_device *csdev, struct coresight_path *cs_path,
> + bool enable)
> +{
> + struct coresight_device *sink = coresight_get_sink(cs_path->path);
> + struct ctcu_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
> + u8 trace_id = cs_path->trace_id;
> + int port_num;
> +
> + if ((sink == NULL) || !IS_VALID_CS_TRACE_ID(trace_id) || IS_ERR_OR_NULL(drvdata)) {
> + dev_err(&csdev->dev, "Invalid parameters\n");
> + return -EINVAL;
> + }
> +
> + port_num = ctcu_get_active_port(sink, csdev);
> + if (port_num < 0)
> + return -EINVAL;
> +
> + /*
> + * Skip the disable session if more than one TPDM device that
> + * connected to the same TPDA device has been enabled.
> + */
> + if (enable)
> + atomic_inc(&drvdata->traceid_refcnt[port_num][trace_id]);
> + else {
> + if (atomic_dec_return(&drvdata->traceid_refcnt[port_num][trace_id]) > 0) {
> + dev_dbg(&csdev->dev, "Skip the disable session\n");
> + return 0;
> + }
> + }
> +
> + dev_dbg(&csdev->dev, "traceid is %d\n", cs_path->trace_id);
> +
> + return __ctcu_set_etr_traceid(csdev, trace_id, port_num, enable);
Hi Jie,
Using atomic_dec_return() here doesn't prevent __ctcu_set_etr_traceid()
from running concurrent enable and disables. Once you pass the
atomic_dec_return() a second call to enable it will mess it up.
I think you need a spinlock around the whole thing and then the
refcounts don't need to be atomics.
On 24/01/2025 7:25 am, Jie Gan wrote:
> Add 'struct coresight_path' to store the data that is needed by
> coresight_enable_path/coresight_disable_path. The structure
> will be transmitted to the helper and sink device to enable
> related funcationalities.
>
> Signed-off-by: Jie Gan <quic_jiegan(a)quicinc.com>
> ---
> drivers/hwtracing/coresight/coresight-core.c | 87 ++++++++++++++-----
> drivers/hwtracing/coresight/coresight-etb10.c | 3 +-
> .../hwtracing/coresight/coresight-etm-perf.c | 52 ++++++-----
> .../hwtracing/coresight/coresight-etm-perf.h | 2 +-
> drivers/hwtracing/coresight/coresight-priv.h | 21 +++--
> drivers/hwtracing/coresight/coresight-sysfs.c | 32 +++----
> .../hwtracing/coresight/coresight-tmc-etf.c | 3 +-
> .../hwtracing/coresight/coresight-tmc-etr.c | 6 +-
> drivers/hwtracing/coresight/coresight-trbe.c | 4 +-
> drivers/hwtracing/coresight/ultrasoc-smb.c | 3 +-
> 10 files changed, 137 insertions(+), 76 deletions(-)
>
[...]
> INIT_LIST_HEAD(path);
> + cs_path->path = path;
> + /*
> + * Since not all source devices have a defined trace_id function,
> + * make sure to check for it before use.
> + *
> + * Assert the mode is CS_MODE_SYSFS, the trace_id will be assigned
> + * again later if the mode is CS_MODE_PERF.
> + */
> + if (source_ops(source)->trace_id != NULL) {
> + rc = source_ops(source)->trace_id(source, CS_MODE_SYSFS, NULL);
I don't think we should do this. Doesn't this consume two trace IDs for
each session? And I'm not even sure if it's released properly if it's
overwritten.
It should be possible to consolidate the all the trace ID allocation to
a single step when building the path, or another function that gets
called just after the path is built. At the moment the ID can be
allocated from about 5 different places and it's quite hard to
understand, especially with these new changes. I have some of it coded
up, let me finish it off and I can share it.
> + if(IS_VALID_CS_TRACE_ID(rc))
> + cs_path->trace_id = rc;
> + else
> + cs_path->trace_id = 0;
> + }
> + else
> + cs_path->trace_id = 0;
[...]
> +/**
> + * struct coresight_path - data needed by enable/disable path
> + * @handle: perf aux handle for ETM.
> + * @path: path from source to sink.
> + * @trace_id: trace_id of the whole path.
> + */
> +struct coresight_path {
> + struct perf_output_handle *handle;
This is only needed to avoid adding *handle to the enable function call
signature, but having it here implies it needs to be stored. And then we
need to manage the lifecycle of it by nulling it on deletion. All of
this can be avoided by just adding handle to enable().
Unrelated to this patch, but I'm not sure why we were passing around
void* for handle either. It just makes the code hard to read and implies
some flexibility that doesn't exist. It's always "struct
perf_output_handle", so we can change void* to that in the enable
functions. I also have a patch for this that I'll share in a bit.
> + struct list_head *path;
> + u8 trace_id;
> +};
> +
> static inline void coresight_insert_barrier_packet(void *buf)
> {
> if (buf)
> @@ -132,16 +144,15 @@ static inline void CS_UNLOCK(void __iomem *addr)
> } while (0);
> }
>
> -void coresight_disable_path(struct list_head *path);
> -int coresight_enable_path(struct list_head *path, enum cs_mode mode,
> - void *sink_data);
> +void coresight_disable_path(struct coresight_path *cs_path);
> +int coresight_enable_path(struct coresight_path *cs_path, enum cs_mode mode);
> struct coresight_device *coresight_get_sink(struct list_head *path);
This needs to be exported otherwise the build fails because you use it
in a module in another commit. I assume you are building as static?
On 23/01/2025 6:28 am, Jie Gan wrote:
>
>
> On 1/13/2025 8:02 PM, James Clark wrote:
>>
>>
>> On 26/12/2024 1:10 am, Jie Gan wrote:
>>> Add 'trace_id' function pointer in ops. It's responsible for
>>> retrieving the device's trace ID.
>>>
>>> Add 'struct cs_sink_data' to store the data that is needed by
>>> coresight_enable_path/coresight_disable_path. The structure
>>> will be transmitted to the helper and sink device to enable
>>> related funcationalities.
>>>
>>
>> The new cs_sink_data struct is quite specific to this change. Can we
>> start passing the path around to enable/disable functions, that will
>> allow devices to gather anything they want in the future. Because we
>> already have coresight_get_sink(path), coresight_get_source(path) etc.
>>
>> And see below, but for this case we can also change the path struct to
>> contain the trace ID. Then all the new functions, allocations and
>> searches for the trace ID are unecessary. The CTCU will have access to
>> the path, and by the time its enable function is called the trace ID
>> is already assigned.
>>
>> It's also easier to understand at which point a trace ID is allocated,
>> rather than adding the trace_id() callbacks from everywhere which
>> could potentially either read or allocate. I suppose that's "safer"
>> because maybe it's not allocated, but I can't see what case it would
>> happen in reverse.
>>
>>> Signed-off-by: Jie Gan <quic_jiegan(a)quicinc.com>
>>> ---
>>> drivers/hwtracing/coresight/coresight-core.c | 59 +++++++++++++++----
>>> drivers/hwtracing/coresight/coresight-etb10.c | 3 +-
>>> .../hwtracing/coresight/coresight-etm-perf.c | 37 ++++++++++--
>>> .../coresight/coresight-etm3x-core.c | 30 ++++++++++
>>> .../coresight/coresight-etm4x-core.c | 29 +++++++++
>>> drivers/hwtracing/coresight/coresight-priv.h | 13 +++-
>>> drivers/hwtracing/coresight/coresight-stm.c | 22 +++++++
>>> drivers/hwtracing/coresight/coresight-sysfs.c | 24 +++++++-
>>> .../hwtracing/coresight/coresight-tmc-etf.c | 3 +-
>>> .../hwtracing/coresight/coresight-tmc-etr.c | 6 +-
>>> drivers/hwtracing/coresight/coresight-tpda.c | 20 +++++++
>>> drivers/hwtracing/coresight/coresight-trbe.c | 4 +-
>>> drivers/hwtracing/coresight/ultrasoc-smb.c | 3 +-
>>> include/linux/coresight.h | 6 ++
>>> 14 files changed, 234 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/
>>> hwtracing/coresight/coresight-core.c
>>> index 0a9380350fb5..2e560b425fd4 100644
>>> --- a/drivers/hwtracing/coresight/coresight-core.c
>>> +++ b/drivers/hwtracing/coresight/coresight-core.c
>>> @@ -23,6 +23,7 @@
>>> #include "coresight-etm-perf.h"
>>> #include "coresight-priv.h"
>>> #include "coresight-syscfg.h"
>>> +#include "coresight-trace-id.h"
>>> /*
>>> * Mutex used to lock all sysfs enable and disable actions and
>>> loading and
>>> @@ -331,12 +332,12 @@ static int coresight_enable_helper(struct
>>> coresight_device *csdev,
>>> return helper_ops(csdev)->enable(csdev, mode, data);
>>> }
>>> -static void coresight_disable_helper(struct coresight_device *csdev)
>>> +static void coresight_disable_helper(struct coresight_device *csdev,
>>> void *data)
>>> {
>>> - helper_ops(csdev)->disable(csdev, NULL);
>>> + helper_ops(csdev)->disable(csdev, data);
>>> }
>>> -static void coresight_disable_helpers(struct coresight_device *csdev)
>>> +static void coresight_disable_helpers(struct coresight_device
>>> *csdev, void *data)
>>> {
>>> int i;
>>> struct coresight_device *helper;
>>> @@ -344,7 +345,7 @@ static void coresight_disable_helpers(struct
>>> coresight_device *csdev)
>>> for (i = 0; i < csdev->pdata->nr_outconns; ++i) {
>>> helper = csdev->pdata->out_conns[i]->dest_dev;
>>> if (helper && coresight_is_helper(helper))
>>> - coresight_disable_helper(helper);
>>> + coresight_disable_helper(helper, data);
>>> }
>>> }
>>> @@ -361,7 +362,7 @@ static void coresight_disable_helpers(struct
>>> coresight_device *csdev)
>>> void coresight_disable_source(struct coresight_device *csdev, void
>>> *data)
>>> {
>>> source_ops(csdev)->disable(csdev, data);
>>> - coresight_disable_helpers(csdev);
>>> + coresight_disable_helpers(csdev, NULL);
>>> }
>>> EXPORT_SYMBOL_GPL(coresight_disable_source);
>>> @@ -371,7 +372,8 @@ EXPORT_SYMBOL_GPL(coresight_disable_source);
>>> * disabled.
>>> */
>>> static void coresight_disable_path_from(struct list_head *path,
>>> - struct coresight_node *nd)
>>> + struct coresight_node *nd,
>>> + void *sink_data)
>>> {
>>> u32 type;
>>> struct coresight_device *csdev, *parent, *child;
>>> @@ -417,13 +419,13 @@ static void coresight_disable_path_from(struct
>>> list_head *path,
>>> }
>>> /* Disable all helpers adjacent along the path last */
>>> - coresight_disable_helpers(csdev);
>>> + coresight_disable_helpers(csdev, sink_data);
>>> }
>>> }
>>> -void coresight_disable_path(struct list_head *path)
>>> +void coresight_disable_path(struct list_head *path, void *sink_data)
>>> {
>>> - coresight_disable_path_from(path, NULL);
>>> + coresight_disable_path_from(path, NULL, sink_data);
>>> }
>>> EXPORT_SYMBOL_GPL(coresight_disable_path);
>>> @@ -505,10 +507,47 @@ int coresight_enable_path(struct list_head
>>> *path, enum cs_mode mode,
>>> out:
>>> return ret;
>>> err:
>>> - coresight_disable_path_from(path, nd);
>>> + coresight_disable_path_from(path, nd, sink_data);
>>> goto out;
>>> }
>>> +int coresight_read_traceid(struct list_head *path, enum cs_mode mode,
>>> + struct coresight_trace_id_map *id_map)
>>> +{
>>> + int trace_id, type;
>>> + struct coresight_device *csdev;
>>> + struct coresight_node *nd;
>>> +
>>> + list_for_each_entry(nd, path, link) {
>>
>> What do you think about also changing the path to this:
>>
>> struct coresight_path {
>> struct list_head *path,
>> u8 trace_id
>> };
>>
>> That would avoid having to traverse the path on every enable and would
>> remove this function. You could also cache the trace ID in the CTCU
>> for a similar benefit, but it wouldn't remove the need to call this at
>> least once.
>>
>> The expensive part should be the create path part, after that enable
>> and disable should be cheap because they happen on schedule for Perf
>> mode. We should be avoiding allocations and searches.
>>
>>> + csdev = nd->csdev;
>>> + type = csdev->type;
>>> +
>>> + switch (type) {
>>> + case CORESIGHT_DEV_TYPE_SOURCE:
>>> + if (source_ops(csdev)->trace_id != NULL) {
>>> + trace_id = source_ops(csdev)->trace_id(csdev,
>>> + mode,
>>> + id_map);
>>> + if (IS_VALID_CS_TRACE_ID(trace_id))
>>> + goto out;
>>> + }
>>> + break;
>>> + case CORESIGHT_DEV_TYPE_LINK:
>>> + if (link_ops(csdev)->trace_id != NULL) {
>>> + trace_id = link_ops(csdev)->trace_id(csdev);
>>> + if (IS_VALID_CS_TRACE_ID(trace_id))
>>> + goto out;
>>> + }
>>> + break;
>>> + default:
>>> + break;
>>> + }
>>> + }
>>> + return -EINVAL;
>>> +out:
>>> + return trace_id;
>>> +}
>>> +
>>> struct coresight_device *coresight_get_sink(struct list_head *path)
>>> {
>>> struct coresight_device *csdev;
>>> diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/
>>> hwtracing/coresight/coresight-etb10.c
>>> index aea9ac9c4bd0..904b5531c256 100644
>>> --- a/drivers/hwtracing/coresight/coresight-etb10.c
>>> +++ b/drivers/hwtracing/coresight/coresight-etb10.c
>>> @@ -173,7 +173,8 @@ static int etb_enable_perf(struct
>>> coresight_device *csdev, void *data)
>>> pid_t pid;
>>> unsigned long flags;
>>> struct etb_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent);
>>> - struct perf_output_handle *handle = data;
>>> + struct cs_sink_data *sink_data = (struct cs_sink_data *)data;
>>> + struct perf_output_handle *handle = sink_data->handle;
>>> struct cs_buffers *buf = etm_perf_sink_config(handle);
>>> spin_lock_irqsave(&drvdata->spinlock, flags);
>>> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/
>>> drivers/hwtracing/coresight/coresight-etm-perf.c
>>> index ad6a8f4b70b6..e676edd42ddc 100644
>>> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
>>> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
>>> @@ -459,6 +459,7 @@ static void etm_event_start(struct perf_event
>>> *event, int flags)
>>> struct perf_output_handle *handle = &ctxt->handle;
>>> struct coresight_device *sink, *csdev = per_cpu(csdev_src, cpu);
>>> struct list_head *path;
>>> + struct cs_sink_data *sink_data = NULL;
>>> u64 hw_id;
>>> u8 trace_id;
>>> @@ -498,9 +499,20 @@ static void etm_event_start(struct perf_event
>>> *event, int flags)
>>> if (WARN_ON_ONCE(!sink))
>>> goto fail_end_stop;
>>> + sink_data = kzalloc(sizeof(*sink_data), GFP_KERNEL);
>>
>> kzalloc can't be called from here. Check dmesg for the warning. That's
>> another reason to do this change on the path. Because the path is
>> allocated on etm_setup_aux() where allocations are allowed.
>>
> Hi, James
> I just tried with following command and did not observe any warning info
> from dmesg, may I ask what's the issue may suffered here?
>
You might be missing some debugging configs like lockdep etc. The
warning is that etm_event_start() is a non-sleepable context and kzalloc
is sleepable. Even if it wasn't an error we still wouldn't want to do
it, etm_event_start() and stop are called too frequently.
> root@qemuarm64:/data# ./perf record -e cs_etm/@tmc_etr0/ --per-thread ls
> configs kernel.txt logs lost+found misc
> perf perf.data perf.data.old root time
> tzstorage weston
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.145 MB perf.data ]
>
> For the new patch version, I implemented an 8-bit hash table in the CTCU
> driver data to handle situations where multiple TPDMs are connected to
> the same TPDA device have been enabled. As we know, TPDMs share the
> trace_id of the TPDA device they are connected to. If we reset the bit
> based on the trace_id without checking the enabled refcount, it causes
> an issue where trace data from other enabled TPDM devices (sharing the
> same trace_id) cannot enter the ETR buffer, as it gets filtered out by
> the CTCU.
I think sharing the code or a diagram might be easier to follow here.
The mention of a refcount makes sense but I don't follow the need for a
hash table. There are other places where single devices are shared by
multiple paths, like funnels, and they're all done with refcounts.
> I need allocate memory when implement hash table(add/remove key entry)
> in coresight_enable_path flow, but you mentioned we cannot call kzalloc
> from here.
>
> Thanks,
> Jie
>
Why not allocate on setup_aux()? That's called by userspace before the
session starts, and then the path is fixed from that point onwards so
you shouldn't need to do any more allocations. That's how it's setup
currently anyway.