Hi Bhupesh,
On Thu, Aug 25, 2022 at 10:52:32AM +0530, Bhupesh Sharma wrote:
> Some Qualcomm ETM implementations require skipping powering up
> the trace unit, as the ETMs are in the same power domain as
> their CPU cores.
>
> Via commit 5214b563588e ("coresight: etm4x: Add support for
> sysreg only devices"), the setting of 'skip_power_up' flag was
> moved after the 'etm4_init_arch_data' function is called, whereas
> the flag value is itself used inside the function. This causes
> a crash when ETM mode 'Low-power state behavior override' is set
> on some Qualcomm parts.
>
> Fix the same.
>
> Fixes: 5214b563588e ("coresight: etm4x: Add support for sysreg only devices")
> Cc: Mike Leach <mike.leach(a)linaro.org>
> Cc: Suzuki K Poulose <suzuki.poulose(a)arm.com>
> Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> Signed-off-by: Bhupesh Sharma <bhupesh.sharma(a)linaro.org>
> ---
> - v1 can be seen here: https://lore.kernel.org/lkml/20220803191236.3037591-1-bhupesh.sharma@linaro…
> - Addressed the review comments from Suzuki.
> - Rebased on linux-next.
>
> drivers/hwtracing/coresight/coresight-etm4x-core.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> index d39660a3e50c..14c1c7869795 100644
> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> @@ -977,6 +977,16 @@ static bool etm4_init_sysreg_access(struct etmv4_drvdata *drvdata,
> if (!cpu_supports_sysreg_trace())
> return false;
>
> + /*
> + * Some Qualcomm implementations require skipping powering up the trace unit,
> + * as the ETMs are in the same power domain as their CPU cores.
> + *
> + * Since the 'skip_power_up' flag is used inside 'etm4_init_arch_data' function,
> + * initialize it before the function is called.
> + */
> + if (fwnode_property_present(dev_fwnode(dev), "qcom,skip-power-up"))
> + drvdata->skip_power_up = true;
> +
I personally think this check should be placed in etm4_probe(), just
before the SMP call to etm4_init_arch_data(), so that the DT property
"qcom,skip-power-up" is respected.
> /*
> * ETMs implementing sysreg access must implement TRCDEVARCH.
> */
> @@ -1951,8 +1961,7 @@ static int etm4_probe(struct device *dev, void __iomem *base, u32 etm_pid)
> return -EINVAL;
>
> /* TRCPDCR is not accessible with system instructions. */
> - if (!desc.access.io_mem ||
> - fwnode_property_present(dev_fwnode(dev), "qcom,skip-power-up"))
> + if (!desc.access.io_mem)
> drvdata->skip_power_up = true;
I would prefer to move the "desc.access.io_mem" check into
etm4_init_sysreg_access(), so that the skip_power_up flag is set
correctly based on whether system register access is used.

A side topic: in the mainline kernel I found that the value of
"desc.access.io_mem" is always zero (see how it is initialized in
etm4_probe() and etm4_init_sysreg_access()). Should we initialize
desc.access.io_mem to true in etm4_probe()?
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
index d39660a3e50c..cf2555c50abb 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
@@ -1939,6 +1939,7 @@ static int etm4_probe(struct device *dev, void __iomem *base, u32 etm_pid)
if (drvdata->cpu < 0)
return drvdata->cpu;
+ desc.access.io_mem = true;
init_arg.drvdata = drvdata;
init_arg.csa = &desc.access;
init_arg.pid = etm_pid;
Thanks,
Leo
> major = ETM_ARCH_MAJOR_VERSION(drvdata->arch);
> --
> 2.35.3
>
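The ordering problem discussed in this thread can be sketched in plain C.
This is only an illustration under stated assumptions, not the driver
code: struct demo_drvdata, demo_init_arch_data() and demo_probe() are
hypothetical stand-ins, and the two booleans model the
"qcom,skip-power-up" property and memory-mapped access being available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver data; only the flag matters here. */
struct demo_drvdata {
	bool skip_power_up;	/* decided from firmware properties */
	bool powered_up;	/* records what the init step did */
};

/*
 * Stand-in for the arch-data init step. It reads skip_power_up, so the
 * flag must already hold its final value when this runs.
 */
static void demo_init_arch_data(struct demo_drvdata *drvdata)
{
	assert(drvdata != NULL);
	if (!drvdata->skip_power_up)
		drvdata->powered_up = true;	/* would touch the power-up register */
}

/*
 * Probe-like sequence. has_skip_prop models the "qcom,skip-power-up"
 * property; io_mem models memory-mapped register access.
 */
struct demo_drvdata demo_probe(bool has_skip_prop, bool io_mem)
{
	struct demo_drvdata drvdata = { false, false };

	/* Decide the flag first ... */
	if (!io_mem || has_skip_prop)
		drvdata.skip_power_up = true;

	/* ... and only then run the step that consumes it. */
	demo_init_arch_data(&drvdata);
	return drvdata;
}
```

If the flag assignment were moved below the demo_init_arch_data() call,
the init step would always see skip_power_up == false, which is the
shape of the regression described above.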
Introduction of TPDM DSB subunit
The DSB subunit is responsible for creating a dataset element, and is
also optionally responsible for packing it so that multiple elements fit
on a single ATB transfer, if the configuration allows. The TPDM Core
Datapath requests that timestamps be stored by the TPDA and then delivers
ATB sized data (depending on ATB width and element size, this could
be smaller or larger than a dataset element) to the ATB Mast FSM.
The DSB subunit must be configured prior to enablement. This series
adds support for TPDM to configure the DSB subunit.
Once the patches in this series are applied, the new DSB nodes
should be visible under /sys/bus/coresight/devices/tpdm* for each TPDM
that supports the DSB subunit.
e.g.
/sys/devices/platform/soc@0/69d0000.tpdm/tpdm0# ls -l | grep dsb
-rw-r--r-- 1 root root 4096 Jan 1 00:01 dsb_edge_ctrl
-rw-r--r-- 1 root root 4096 Jan 1 00:01 dsb_edge_ctrl_mask
-rw-r--r-- 1 root root 4096 Jan 1 00:01 dsb_mode
-rw-r--r-- 1 root root 4096 Jan 1 00:01 dsb_patt_mask
-rw-r--r-- 1 root root 4096 Jan 1 00:01 dsb_patt_ts
-rw-r--r-- 1 root root 4096 Jan 1 00:01 dsb_patt_type
-rw-r--r-- 1 root root 4096 Jan 1 00:01 dsb_patt_val
-rw-r--r-- 1 root root 4096 Jan 1 00:01 dsb_trig_patt_mask
-rw-r--r-- 1 root root 4096 Jan 1 00:01 dsb_trig_patt_val
-rw-r--r-- 1 root root 4096 Jan 1 00:01 dsb_trig_ts
-rw-r--r-- 1 root root 4096 Jan 1 00:01 dsb_trig_type
Commands similar to the following can be used to configure the
TPDMs which support the DSB subunit. Enable the coresight sink first.
echo 1 > /sys/bus/coresight/devices/tmc_etf0/enable_sink
echo 1 > /sys/bus/coresight/devices/tpdm0/reset
echo 0x3 0x3 0x1 > /sys/bus/coresight/devices/tpdm0/dsb_edge_ctrl_mask
echo 0x6d 0x6d 0 > /sys/bus/coresight/devices/tpdm0/dsb_edge_ctrl
echo 1 > /sys/bus/coresight/devices/tpdm0/dsb_patt_ts
echo 1 > /sys/bus/coresight/devices/tpdm0/dsb_patt_type
echo 0 > /sys/bus/coresight/devices/tpdm0/dsb_trig_ts
echo 0 0xFFFFFFFF > /sys/bus/coresight/devices/tpdm0/dsb_patt_mask
echo 0 0xFFFFFFFF > /sys/bus/coresight/devices/tpdm0/dsb_trig_patt_val
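The same sequence could also be driven from a small C tool rather than
the shell. The helper below is a minimal sketch only: write_attr() is a
hypothetical name, it is not part of this series, and it simply writes a
string to an attribute file the way "echo value > path" would.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write a string value to a sysfs-style attribute file.
 * Returns 0 on success and -1 on failure.
 */
int write_attr(const char *path, const char *value)
{
	FILE *f;
	size_t len;
	int ok;

	assert(path != NULL && value != NULL);

	f = fopen(path, "w");
	if (!f)
		return -1;

	len = strlen(value);
	ok = (fwrite(value, 1, len, f) == len);

	/*
	 * stdio buffers the data, so it is written out at fclose(). A write
	 * error (for example a value the driver rejects) may only show up here.
	 */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

For example, write_attr("/sys/bus/coresight/devices/tpdm0/reset", "1")
would correspond to the second command above.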
This series applies to coresight/next
https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git?h=next
This patch series depends on patch series "[v12,0/9] Coresight: Add
support for TPDM and TPDA"
https://patchwork.kernel.org/project/linux-arm-kernel/cover/20220905065357.…
Tao Zhang (9):
dt-bindings: arm: Add support for DSB element
coresight-tpda: Add DSB dataset support
coresight-tpdm: Initialize DSB subunit configuration
coresight-tpdm: Add reset node to TPDM node
coresight-tpdm: Add nodes to set trigger timestamp and type
coresight-tpdm: Add node to set dsb programming mode
coresight-tpdm: Add nodes for dsb element creation
coresight-tpdm: Add nodes to configure pattern match output
coresight-tpdm: Add nodes for timestamp request
.../bindings/arm/qcom,coresight-tpda.yaml | 9 +
drivers/hwtracing/coresight/coresight-tpda.c | 62 ++
drivers/hwtracing/coresight/coresight-tpda.h | 4 +
drivers/hwtracing/coresight/coresight-tpdm.c | 625 ++++++++++++++++++++-
drivers/hwtracing/coresight/coresight-tpdm.h | 60 ++
5 files changed, 756 insertions(+), 4 deletions(-)
--
2.7.4
The current method for allocating trace source ID values to sources is
to use a fixed algorithm for CPU based sources of (cpu_num * 2 + 0x10).
The STM is allocated ID 0x1.
This fixed algorithm is used in both the CoreSight driver code, and by
perf when writing the trace metadata in the AUXTRACE_INFO record.
The method needs replacing as currently:
1. It is inefficient in using available IDs.
2. It does not scale to larger systems with many cores, and the algorithm
has no limits, so it will generate invalid trace IDs for cpu number > 44.
Additionally, requirements to allocate additional system IDs on some
systems have been seen.
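The fixed mapping described above can be written out as a short C sketch.
The function names are hypothetical, and the validity range is an
assumption labelled in the code (7-bit IDs with 0x00 and 0x70-0x7F
reserved); it is not taken from this cover letter.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Legacy fixed mapping: cpu_num * 2 + 0x10.
 * Only even values from 0x10 upwards are ever produced, so odd IDs and
 * most IDs below 0x10 go unused.
 */
int legacy_trace_id(int cpu)
{
	assert(cpu >= 0);
	return cpu * 2 + 0x10;
}

/*
 * Assumption: trace IDs are 7 bits wide, with 0x00 and 0x70-0x7F
 * reserved, which leaves 0x01-0x6F usable.
 */
bool trace_id_is_valid(int id)
{
	return id > 0x00 && id < 0x70;
}
```

Because legacy_trace_id() has no upper bound, nothing stops it from
returning a value that trace_id_is_valid() rejects on a system with
enough CPUs.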
This patch set introduces an API that allows the allocation of trace IDs
in a dynamic manner.
Architecturally reserved IDs are never allocated, and the system is
limited to allocating only valid IDs.
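As a rough illustration of the idea (not the API added by this patch
set), a dynamic allocator that never hands out reserved IDs could look
like the sketch below. All demo_* names are hypothetical, the reserved
range is an assumption (0x00 and 0x70-0x7F), and the "preferred" argument
is an illustrative extra for callers that would like a particular ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DEMO_ID_MAX	0x80	/* 7-bit ID space */
#define DEMO_ID_RES_TOP	0x70	/* assumed: 0x70-0x7F are reserved */

struct demo_id_map {
	bool used[DEMO_ID_MAX];
};

void demo_map_init(struct demo_id_map *map)
{
	int id;

	assert(map != NULL);
	memset(map, 0, sizeof(*map));

	/* Mark reserved IDs as used so they can never be handed out. */
	map->used[0] = true;
	for (id = DEMO_ID_RES_TOP; id < DEMO_ID_MAX; id++)
		map->used[id] = true;
}

/*
 * Allocate an ID. If 'preferred' is a free, valid ID it is returned;
 * otherwise the lowest free ID is used. Returns -1 when none is left.
 */
int demo_alloc_id(struct demo_id_map *map, int preferred)
{
	int id;

	assert(map != NULL);

	if (preferred > 0 && preferred < DEMO_ID_MAX && !map->used[preferred]) {
		map->used[preferred] = true;
		return preferred;
	}

	for (id = 1; id < DEMO_ID_RES_TOP; id++) {
		if (!map->used[id]) {
			map->used[id] = true;
			return id;
		}
	}
	return -1;
}

void demo_free_id(struct demo_id_map *map, int id)
{
	assert(map != NULL);

	/* Reserved entries stay marked as used. */
	if (id > 0 && id < DEMO_ID_RES_TOP)
		map->used[id] = false;
}
```

Marking the reserved entries as used at init time keeps the allocation
path simple: it only ever has to look for a clear entry.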
Each of the current trace sources ETM3.x, ETM4.x and STM is updated to use
the new API.
For the ETMx.x devices, IDs are allocated on certain events:
a) When using sysfs, an ID is allocated on hardware enable or on a read of
the sysfs TRCTRACEID register, and freed when the sysfs reset is written.
b) When using perf, an ID is allocated during setup of the AUX event and
freed on event free. IDs are communicated using the AUX_OUTPUT_HW_ID packet.
The ID allocator is notified when perf sessions start and stop
so CPU based IDs are kept constant throughout any perf session.
Note: This patchset breaks some backward compatibility for perf record and
perf report.
The version of the AUXTRACE_INFO has been updated to reflect the fact that
the trace source IDs are generated differently. This will
mean older versions of perf report cannot decode the newer file.
Applies to coresight/next [4d45bc82df66]
Tested on DB410c
Changes since v2:
1) Improved backward compatibility: (requested by James)
Using the new version of perf on an old kernel will generate a usable file:
legacy metadata values are set by the new perf and will be used if new
ID packets are not present in the file.
Using an older version of perf / simpleperf on an updated kernel may still
work. The trace ID allocator has been updated to use the legacy ID values
where possible, so generated file and used trace IDs will match up to the
point where the legacy algorithm is broken anyway.
2) Various changes to the ID allocator and ID packet format.
(suggested by Suzuki)
3) Per-CPU ID info in the allocator is now stored as an atomic type to allow a
passive read without taking the allocator spinlock. The perf flow now allocates and
releases ID values in setup_aux / free_event. Device enable and event enable use the
passive read to set the allocated values. This simplifies the locking mechanisms on
the perf run and fixes issues that arose with locking dependencies.
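The "passive read" idea in item 3 can be sketched with C11 atomics. This
is a user-space illustration under assumptions, not the kernel code: the
demo_* names are hypothetical, an atomic_flag stands in for the allocator
spinlock, and 0 is used to mean "no ID assigned".

```c
#include <assert.h>
#include <stdatomic.h>

#define DEMO_NR_CPUS 8

/* Stand-in for the allocator spinlock. */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;

/* Per-CPU ID storage; 0 means "no ID assigned". */
static atomic_int demo_cpu_id[DEMO_NR_CPUS];

static void demo_lock_acquire(void)
{
	while (atomic_flag_test_and_set(&demo_lock))
		;	/* spin until the flag is released */
}

static void demo_lock_release(void)
{
	atomic_flag_clear(&demo_lock);
}

/* Updates take the lock, as an allocate or release path would. */
void demo_set_cpu_id(int cpu, int id)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	demo_lock_acquire();
	atomic_store(&demo_cpu_id[cpu], id);
	demo_lock_release();
}

/* Passive read: no lock is taken, the atomic load is enough. */
int demo_read_cpu_id(int cpu)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
	return atomic_load(&demo_cpu_id[cpu]);
}
```

In this sketch, the enable paths would call only demo_read_cpu_id(), so
they never contend on the lock taken by allocation and release.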
Changes since v1:
(after feedback & discussion with Mathieu & Suzuki).
1) API has changed. The global trace ID map is managed internally, so it
is no longer passed in to the API functions.
2) perf record does not use sysfs to find the trace IDs. These are now
output as AUX_OUTPUT_HW_ID events. The drivers, perf record, and perf report
have been updated accordingly to generate and handle these events.
Mike Leach (13):
coresight: trace-id: Add API to dynamically assign Trace ID values
coresight: Remove obsolete Trace ID unniqueness checks
coresight: stm: Update STM driver to use Trace ID API
coresight: etm4x: Update ETM4 driver to use Trace ID API
coresight: etm3x: Update ETM3 driver to use Trace ID API
coresight: etmX.X: stm: Remove trace_id() callback
coresight: perf: traceid: Add perf notifiers for Trace ID
perf: cs-etm: Move mapping of Trace ID and cpu into helper function
perf: cs-etm: Update record event to use new Trace ID protocol
kernel: events: Export perf_report_aux_output_id()
perf: cs-etm: Handle PERF_RECORD_AUX_OUTPUT_HW_ID packet
coresight: events: PERF_RECORD_AUX_OUTPUT_HW_ID used for Trace ID
coresight: trace-id: Add debug & test macros to Trace ID allocation
drivers/hwtracing/coresight/Makefile | 2 +-
drivers/hwtracing/coresight/coresight-core.c | 49 +--
.../hwtracing/coresight/coresight-etm-perf.c | 23 ++
drivers/hwtracing/coresight/coresight-etm.h | 3 +-
.../coresight/coresight-etm3x-core.c | 92 +++--
.../coresight/coresight-etm3x-sysfs.c | 27 +-
.../coresight/coresight-etm4x-core.c | 79 ++++-
.../coresight/coresight-etm4x-sysfs.c | 27 +-
drivers/hwtracing/coresight/coresight-etm4x.h | 3 +
drivers/hwtracing/coresight/coresight-stm.c | 49 +--
.../hwtracing/coresight/coresight-trace-id.c | 266 ++++++++++++++
.../hwtracing/coresight/coresight-trace-id.h | 78 +++++
include/linux/coresight-pmu.h | 35 +-
include/linux/coresight.h | 3 -
kernel/events/core.c | 1 +
tools/include/linux/coresight-pmu.h | 48 ++-
tools/perf/arch/arm/util/cs-etm.c | 21 +-
.../perf/util/cs-etm-decoder/cs-etm-decoder.c | 7 +
tools/perf/util/cs-etm.c | 331 +++++++++++++++---
tools/perf/util/cs-etm.h | 14 +-
20 files changed, 933 insertions(+), 225 deletions(-)
create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.c
create mode 100644 drivers/hwtracing/coresight/coresight-trace-id.h
--
2.17.1
I'm still leaving out CONFIG_CORESIGHT_SOURCE_ETM4X because it depends
on the outcome of the investigation into CONFIG_PID_IN_CONTEXTIDR, but
I think we should enable these ones for now and start getting some of
the benefits sooner.
Changes since v1:
* Remove CONFIG_CORESIGHT_CTI_INTEGRATION_REGS=y which shouldn't be
enabled by default
-----
As suggested by Catalin here's the change to add Coresight to defconfig.
Unfortunately I don't think we should add CONFIG_CORESIGHT_SOURCE_ETM4X
which builds a few files until [1] is merged because of the overhead
of CONFIG_PID_IN_CONTEXTIDR.
[1]: https://lore.kernel.org/lkml/20211021134530.206216-1-leo.yan@linaro.org/T/
applies to arm64/for-next/core (e99db032d186)
James Clark (1):
arm64: defconfig: Add Coresight as module
arch/arm64/configs/defconfig | 8 ++++++++
1 file changed, 8 insertions(+)
--
2.28.0
On 22/09/2022 11:52, Catalin Marinas wrote:
> On Thu, Sep 22, 2022 at 10:34:45AM +0100, James Clark wrote:
>> On 21/09/2022 16:08, Catalin Marinas wrote:
>>> 2. Always on CONFIG_PID_IN_CONTEXTIDR (we might as well remove the
>>> Kconfig entry). This would write the root pid namespace value
>>> (task_pid_nr()).
>>
>> If we're not worried about the overhead after all, this would be the
>> easiest solution. And then SPE or Coresight already decide whether they
>> want to use the value or not, so no further changes are needed.
>>
>> From Leo's patch there is a table that shows a 1% overhead with it
>> enabled permanently, and I've heard a figure like that mentioned before.
>> So I could also resurrect that patch to use static keys? Although it's a
>> bit more complicated, that would be my preference. And then we can have
>> that mode always on.
>
> I don't think we should bother with static keys, just always enable it
> but try to reduce/group the ISBs from all the functions called on the
> __switch_to() path. We may actually get a speed-up.
>
OK, thanks, I will take a look at this.
On 21/09/2022 16:08, Catalin Marinas wrote:
> On Wed, Sep 21, 2022 at 03:05:34PM +0100, James Clark wrote:
>> As suggested by Catalin here's the change to add Coresight to defconfig.
>>
>> Unfortunately I don't think we should add CONFIG_CORESIGHT_SOURCE_ETM4X
>> which builds a few files until [1] is merged because of the overhead
>> of CONFIG_PID_IN_CONTEXTIDR.
>>
>> [1]: https://lore.kernel.org/lkml/20211021134530.206216-1-leo.yan@linaro.org/T/
>
> I thought the overhead wasn't the problem, it's mostly negligible. We
> can probably save a few more cycles on the __switch_to() path by
> replacing several isb()s in those functions with a single one just
> before cpu_switch_to().
>
> IIRC the issue is that unless a process runs in the root pid namespace,
> the actual pid written to contextidr is meaningless.
This is true, and Leo has recently disabled it in that scenario in
aab473867fed.
>
> Now that you reminded me of that thread, I see three options (sorry, not
> entirely related to the defconfig updates):
>
> 1. Remove CONFIG_PID_IN_CONTEXTIDR and corresponding code completely,
> find other events to correlate the task with the trace.
Unfortunately when tracing per core we would need kernel timestamps in
the trace to correlate to the switch records. At the moment Coresight is
using a different clock source so it's not possible and we're still
using the context ID to correlate samples.
With FEAT_TRF in v8.4 it will be possible to do this and we've started
working towards that here: 0f00b223ea22. But we'd still have to support
older hardware too, so CONFIG_PID_IN_CONTEXTIDR can't be removed completely.
For SPE it's not required because we already have the right timestamps
in the samples and we've added support for no context IDs in the decoder
here: 27d113cfe892
>
> 2. Always on CONFIG_PID_IN_CONTEXTIDR (we might as well remove the
> Kconfig entry). This would write the root pid namespace value
> (task_pid_nr()).
If we're not worried about the overhead after all, this would be the
easiest solution. And then SPE or Coresight already decide whether they
want to use the value or not, so no further changes are needed.
From Leo's patch there is a table that shows a 1% overhead with it
enabled permanently, and I've heard a figure like that mentioned before.
So I could also resurrect that patch to use static keys? Although it's a
bit more complicated, that would be my preference. And then we can have
that mode always on.
>
> 3. Similar to (2) but instead write task_pid_nr_ns(). An alternative
> here is to write -1 if the task is not in the root pid namespace.
>
> Strong preference for (1).
>
On 21/09/2022 17:46, Mathieu Poirier wrote:
> On Wed, Sep 21, 2022 at 04:26:59PM +0100, Mark Brown wrote:
>> On Wed, Sep 21, 2022 at 03:05:35PM +0100, James Clark wrote:
>>
>>> +CONFIG_CORESIGHT_CTI=m
>>> +CONFIG_CORESIGHT_CTI_INTEGRATION_REGS=y
>>
>
> I agree - integration registers should not be enabled by default.
>
>> Do we want this turned on by default? According to the
>> description it's a bit dangerous and it's exposed via sysfs
>> rather than debugfs.
>
>
Should I disable just CONFIG_CORESIGHT_CTI_INTEGRATION_REGS or
CONFIG_CORESIGHT_CTI as well? There are other writable registers exposed
via sysfs outside of these two options, so I wanted to check if it's
just the integration registers that are the issue.