Hi Steve
On 06/03/2023 05:54, Steve Clevenger wrote:
> An Ampere Computing design decision is MMIO reads are considered the
> same as an external debug access. If TRCOSLAR.OSLK is left set, the
> TRCIDR1 access results in a bus fault followed by a kernel panic. A
> TRCIDR1 read should be valid regardless of TRCOSLAR.OSLK provided MMIO
> access (now deprecated) is supported.
>
> The Ampere work around is to early clear ETM TRCOSLAR.OSLK prior to
> TRCIDR1 access. Do not distinguish between manufacturers.
> AC03_DEBUG_06 is described in the AmpereOne Developer Errata:
>
> Add Ampere ETM PID required for Coresight ETM driver support.
>
> https://solutions.amperecomputing.com/customer-connect/products/AmpereOne-d…
>
> Signed-off-by: Steve Clevenger <scclevenger(a)os.amperecomputing.com>
I went back to the CoreSight ETM4x architecture and this is not an
erratum at your end. This is actually a bug in the ETMv4 driver.
TRCIDR1 is part of the "Trace" and not "Management" registers of
ETMv4. Access to "Trace" registers without OSLK is going to
cause issues. (Un)Fortunately, we never hit this on existing
systems, but that doesn't mean it can stay there.
So we should really fix our detection and only rely on
accessing the TRCDEVARCH to detect the device to be ETMv4.
I have a sent out the fix here [0], are you able to test it :
[0]
https://lkml.kernel.org/r/20230317115728.1358368-1-suzuki.poulose@arm.com
Suzuki
On the HiSilicon platform, there is one ETM per logic core.
When setting up SMT, not every process has an ETM. So the path
".../cs_etm/cpux/trcidr/trcidr0" does not exist, the function
perf_pmu__scan_file() return an error. However, the command
'perf record' will returns silently and don't print.
After this patch, log a error when read fails that makes it easy
for users to debug.
root@localhost:/# perf record -e /cs_etm/@ultra_smb0/ -C 3 uname -a
cs_etm: can't read file cpu3/trcidr/trcidr0
root@localhost:/#
Signed-off-by: Junhao He <hejunhao3(a)huawei.com>
---
tools/perf/arch/arm/util/cs-etm.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index 7f71c8a237ff..19ea17430399 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -585,7 +585,6 @@ cs_etm_info_priv_size(struct auxtrace_record *itr __maybe_unused,
static bool cs_etm_is_etmv4(struct auxtrace_record *itr, int cpu)
{
- bool ret = false;
char path[PATH_MAX];
int scan;
unsigned int val;
@@ -600,9 +599,10 @@ static bool cs_etm_is_etmv4(struct auxtrace_record *itr, int cpu)
/* The file was read successfully, we have a winner */
if (scan == 1)
- ret = true;
+ return true;
- return ret;
+ pr_err("%s: can't read file %s\n", CORESIGHT_ETM_PMU_NAME, path);
+ return false;
}
static int cs_etm_get_ro(struct perf_pmu *pmu, int cpu, const char *path)
--
2.33.0
On 16/03/2023 03:20, Hao Zhang wrote:
> Introduction of Coresight Dummy subunit
> The Coresight Dummy subunit is for Coresight Dummy component, there are some
> specific Coresight devices that HLOS don't have permission to access. Such as
What is HLOS ?
> some TPDMs, they would be configured in NON-HLOS side, but it's necessary to
What is NON-HLOS ?
> build Coresight path for it to debug. So there need driver to register dummy
> devices as Coresight devices.
Build a path for who to debug ? If this is used by some privileged
software, shouldn't that do all of the work ?
Suzuki
>
> Commit link:
> https://git.codelinaro.org/clo/linux-kernel/coresight/-/tree/coresight-dummy
>
> Hao Zhang (3):
> Coresight: Add coresight dummy driver
> dt-bindings: arm: Add Coresight Dummy Trace YAML schema
> Documentation: trace: Add documentation for Coresight Dummy Trace
>
> .../bindings/arm/qcom,coresight-dummy.yaml | 129 +++++++++++++
> .../trace/coresight/coresight-dummy.rst | 58 ++++++
> drivers/hwtracing/coresight/Kconfig | 11 ++
> drivers/hwtracing/coresight/Makefile | 1 +
> drivers/hwtracing/coresight/coresight-dummy.c | 176 ++++++++++++++++++
> 5 files changed, 375 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/arm/qcom,coresight-dummy.yaml
> create mode 100644 Documentation/trace/coresight/coresight-dummy.rst
> create mode 100644 drivers/hwtracing/coresight/coresight-dummy.c
>
This is printed as a warning but it is normal behavior that users
shouldn't be expected to do anything about. Reduce the warning level to
debug3 so it's only seen in verbose mode to avoid confusion.
Signed-off-by: James Clark <james.clark(a)arm.com>
---
tools/perf/arch/arm/util/cs-etm.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index 7f71c8a237ff..59b50dd70330 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -694,8 +694,8 @@ static void cs_etm_save_etmv4_header(__u64 data[], struct auxtrace_record *itr,
data[CS_ETMV4_TS_SOURCE] = (__u64) cs_etm_get_ro_signed(cs_etm_pmu, cpu,
metadata_etmv4_ro[CS_ETMV4_TS_SOURCE]);
else {
- pr_warning("[%03d] pmu file 'ts_source' not found. Fallback to safe value (-1)\n",
- cpu);
+ pr_debug3("[%03d] pmu file 'ts_source' not found. Fallback to safe value (-1)\n",
+ cpu);
data[CS_ETMV4_TS_SOURCE] = (__u64) -1;
}
}
@@ -729,8 +729,8 @@ static void cs_etm_save_ete_header(__u64 data[], struct auxtrace_record *itr, in
data[CS_ETE_TS_SOURCE] = (__u64) cs_etm_get_ro_signed(cs_etm_pmu, cpu,
metadata_ete_ro[CS_ETE_TS_SOURCE]);
else {
- pr_warning("[%03d] pmu file 'ts_source' not found. Fallback to safe value (-1)\n",
- cpu);
+ pr_debug3("[%03d] pmu file 'ts_source' not found. Fallback to safe value (-1)\n",
+ cpu);
data[CS_ETE_TS_SOURCE] = (__u64) -1;
}
}
--
2.34.1
On Thu, Mar 09, 2023 at 10:06:41AM -0800, Yang Shi wrote:
[...]
> > I reviewed your shared dump, the bad and good perf data both contain the
> > dummy event with 'text_poke = 1'. Could you confirm the shared dump in
> > your previous email is correct or not?
>
> Oops, sorry. I pasted the wrong log. The good one looks like
> (generated by v5.19):
>
> # captured on : Wed Mar 8 18:02:58 2023
> # header version : 1
> # data offset : 408
> # data size : 22640
> # feat offset : 23048
> # hostname : fedora
> # os release : 6.2.0-coresight+
> # perf version : 5.19.g3d7cb6b04c3f
> # arch : aarch64
> # nrcpus online : 128
> # nrcpus avail : 128
> # cpuid : 0x00000000c00fac30
> # total memory : 2108862504 kB
> # cmdline : /home/yshi/linux/tools/perf/perf record -e
> cs_etm/@tmc_etf63/k --kcore --per-thread -- taskset --cpu-list 1 uname
> # event : name = cs_etm/@tmc_etf63/k, , id = { 3832 }, type = 9, size
> = 128, { sample_period, sample_freq } = 1, sample_type =
> IP|TID|IDENTIFIER, read_format = ID, d
> isabled = 1, exclude_user = 1, exclude_hv = 1, enable_on_exec = 1,
> sample_id_all = 1, { bp_len, config2 } = 0x12792918
> # event : name = dummy:u, , id = { 3833 }, type = 1, size = 128,
> config = 0x9, { sample_period, sample_freq } = 1, sample_type =
> IP|TID|IDENTIFIER, read_format = ID,
> disabled = 1, exclude_kernel = 1, exclude_hv = 1, mmap = 1, comm = 1,
> enable_on_exec = 1, task = 1, sample_id_all = 1, exclude_guest = 1,
> mmap2 = 1, comm_exec = 1,
> context_switch = 1, ksymbol = 1, bpf_event = 1
> # CPU_TOPOLOGY info available, use -I to display
> # NUMA_TOPOLOGY info available, use -I to display
> # pmu mappings: armv8_pmuv3_0 = 8, software = 1, arm_cmn_0 = 10,
> uprobe = 7, cs_etm = 9, breakpoint = 5, tracepoint = 2, arm_cmn_1 =
> 11, kprobe = 6
> # contains AUX area data (e.g. instruction trace)
> # CACHE info available, use -I to display
> # time of first sample : 18446744073.709551
> # time of last sample : 18446744073.709551
> # sample duration : 0.000 ms
> # MEM_TOPOLOGY info available, use -I to display
> # missing features: TRACING_DATA CPUDESC BRANCH_STACK GROUP_DESC STAT
> CLOCKID DIR_FORMAT COMPRESSED CPU_PMU_CAPS CLOCK_DATA HYBRID_TOPOLOGY
> HYBRID_CPU_PMU_CAPS
Thanks for confirmation.
Just a quick summary, here we have two issues:
- With command:
perf record -e cs_etm/@tmc_etf63/k --kcore --per-thread \
-- taskset --cpu-list 1 uname",
perf doesn't enable "text poke" attribution.
- With command:
perf record -e cs_etm/@tmc_etf63/k --kcore --per-thread \
-- taskset --cpu-list 1 true (in your previous email), or ...
perf record --kcore -e cs_etm/@tmc_etf63/k --per-thread \
-- taskset --cpu-list 1 uname (in your shared perf data file),
perf enables "text poke" attribution, in this case, perf fails to decode
Arm CoreSight trace data.
[...]
> > Do you mind to share the bad perf.data file with James and me?
>
> Please check the attachment out. Thanks for looking into this problem.
Thank you for sharing the data. We will look into it.
Leo
On Thu, Mar 09, 2023 at 12:46:55PM -0800, Steve Clevenger wrote:
[...]
> > #include <linux/io-64-nonatomic-hi-lo.h>
> >
> > static inline u64 etm4x_relaxed_read64(struct csdev_access *csa, unsigned int offset)
> > {
> > if (csa->io_mem) {
> > if (csa->no_quad_mmio)
> > return hi_lo_readq_relaxed(csa->base + offset);
> > else
> > return readq_relaxed(csa->base + offset);
> > } else {
> > return read_etm4x_sysreg_offset(offset, true);
> > }
> > }
> >
> > I'm not saying to play magic on {writeq|readq}_relaxed(), would it be
> > fine to directly call hi_lo_readq_relaxed()?
>
> I can do this, and it would work fine. Without compile protection, one
> concern is the silent inclusion of io-64-nonatomic-hi-lo.h definitions
> for readq_relaxed/writeq_relaxed if the include hierarchy gets changed.
> The only (minor) risk I see is to performance.
Yeah, we could include io.h ahead io-64-nonatomic-hi-lo.h:
#include <linux/io.h>
#include <linux/io-64-nonatomic-lo-hi.h>
Thus we can get the definitions for {readq|writeq}_relaxed() from
"io.h".
> Second, it seems to me the hi_lo and lo_hi macros were intended to deal
> with endianness. IMO, the implementation I offered is agnostic without a
> hint to endianness or atomicity.
Seems to me, kernel functions with explict endianness naming is not bad
thing :)
> The CoreSight maintainer audience is
> limited and already aware of these issues, so I'll make the change for you.
Thanks! Suzuki, Mike, I think it's good to get your opinion before
Steve can proceed for updating patch.
Leo
Hi Steve,
On Wed, Mar 08, 2023 at 11:08:34AM -0800, Steve Clevenger wrote:
[...]
> I'm familiar with this interface. The reason I chose not to use it is
> this solution is configured at compile time. The writeq_relaxed (used by
> etm4x_relaxed_write64) and readq_relaxed (used by etm4x_relaxed_read64)
> otherwise default to 64-bit access. I was uncertain how a compile time
> change would go over with the maintainers. Correct me, but compile time
> configurations in the kernel seem to be related to ARM64 erratum, versus
> manufacturer specific?
>
> My take (based on limited knowledge) of the changes necessary to support
> a compile time decision seemed to exceed the changes compared with the
> implementation I submitted upstream. If this becomes a sticking point,
> let me know.
I am suggesting something like below:
#include <linux/io-64-nonatomic-hi-lo.h>
static inline u64 etm4x_relaxed_read64(struct csdev_access *csa, unsigned int offset)
{
if (csa->io_mem) {
if (csa->no_quad_mmio)
return hi_lo_readq_relaxed(csa->base + offset);
else
return readq_relaxed(csa->base + offset);
} else {
return read_etm4x_sysreg_offset(offset, true);
}
}
I'm not saying to play magic on {writeq|readq}_relaxed(), would it be
fine to directly call hi_lo_readq_relaxed()?
Thanks,
Leo
Hi Yang,
On Wed, Mar 08, 2023 at 11:56:38AM -0800, Yang Shi wrote:
[...]
> > Dumping raw events could show the events from the bad data file. But
> > it has zero samples after event collapse.
> >
> > The only difference is --kcore inserted a new text_poke dummy event.
> > It seems coresight also inserted a dummy event with my command but
> > your command didn't. So it seems like the two dummy events confused
> > event collapse.
> >
> > The text_poke dummy event is added by commit
> > f42c0ce573df79d1b8bd169008c994dcdd43585a ("perf record: Always get
> > text_poke events with --kcore option"). If I reverted this commit,
> > then it works. But I'm not sure whether this is the right fix or real
> > root cause or not. Or coresight shouldn't insert its own dummy event?
>
> It seems like coresight needs to insert the dummy event if
> full_auxtrace is on IIUC. So it sounds like event collapse can't
> handle such a case?
I am struggling to understand the meaning "event collapse" :)
I reviewed your shared dump, the bad and good perf data both contain the
dummy event with 'text_poke = 1'. Could you confirm the shared dump in
your previous email is correct or not?
> I also tried v5.19 (before "perf record: Always
> get text_poke events with --kcore option", which was merged in v6.0),
> it works. So it seems like a regression.
Yeah, we need to fix it. I am not sure the Linux kernel for Arm64
supports text poke or not (kernel needs some specific handling when
alter instructions), the kernel change is the prerequisites.
On the other hand, in the current code cs-etm misses to handle the
event PERF_RECORD_TEXT_POKE in the function cs_etm__process_event().
This might be the cause for the failure.
Do you mind to share the bad perf.data file with James and me?
Thanks,
Leo
Currently there is a refcount leak in CTI when using system wide mode
or tracing multithreaded applications. See the last commit for a
reproducer. This prevents the module from being unloaded.
Historically there have been a few issues and fixes attempted around
here which have resulted in some extra logic and a member to keep
track of CTI being enabled 'struct coresight_device->ect_enabled'.
The fix in commit 665c157e0204 ("coresight: cti: Fix hang in
cti_disable_hw()") was also related to CTI having its own
enable/disable path which came later than other devices.
If we make CTI a helper device and enable helper devices adjacent to
the path we get very similar enable/disable behavior to now, but with
more reuse of the existing reference counting logic in the coresight
core code. This also affects CATU which can have a little bit of
its hard coded enable/disable code removed.
Enabling CATU on the generic path does require that input connections
are tracked so that it can get its associated ETR buffer.
Applies to coresight/next (669c4614236a7) but also requires the
realloc_array patch here [1].
Also available in full here [2].
[1]: https://lore.kernel.org/linux-arm-kernel/20230306152723.3090195-1-james.cla…
[2]: https://gitlab.arm.com/linux-arm/linux-jc/-/tree/james-cs-cti-module-refcou…
James Clark (8):
coresight: Use enum type for cs_mode wherever possible
coresight: Change name of pdata->conns
coresight: Rename nr_outports to nr_outconns
coresight: Dynamically add connections
coresight: Store in-connections as well as out-connections
coresight: Refactor out buffer allocation function for ETR
coresight: Enable and disable helper devices adjacent to the path
coresight: Fix CTI module refcount leak by making it a helper device
drivers/hwtracing/coresight/coresight-catu.c | 34 ++-
drivers/hwtracing/coresight/coresight-core.c | 258 +++++++++++-------
.../hwtracing/coresight/coresight-cti-core.c | 56 ++--
.../hwtracing/coresight/coresight-cti-sysfs.c | 4 +-
drivers/hwtracing/coresight/coresight-cti.h | 4 +-
drivers/hwtracing/coresight/coresight-etb10.c | 3 +-
.../coresight/coresight-etm3x-core.c | 6 +-
.../coresight/coresight-etm4x-core.c | 6 +-
.../hwtracing/coresight/coresight-platform.c | 168 +++++++++---
drivers/hwtracing/coresight/coresight-priv.h | 9 +-
drivers/hwtracing/coresight/coresight-stm.c | 6 +-
drivers/hwtracing/coresight/coresight-sysfs.c | 1 -
.../hwtracing/coresight/coresight-tmc-etf.c | 2 +-
.../hwtracing/coresight/coresight-tmc-etr.c | 88 +++---
drivers/hwtracing/coresight/coresight-tmc.h | 2 +
drivers/hwtracing/coresight/coresight-tpdm.c | 4 +-
drivers/hwtracing/coresight/coresight-tpiu.c | 3 +-
drivers/hwtracing/coresight/coresight-trbe.c | 3 +-
drivers/hwtracing/coresight/ultrasoc-smb.c | 3 +-
drivers/hwtracing/coresight/ultrasoc-smb.h | 2 +-
include/linux/coresight.h | 92 ++++---
21 files changed, 483 insertions(+), 271 deletions(-)
--
2.34.1