Hi Daniel,
On Tue, Jul 13, 2021 at 02:15:30PM +0200, Daniel Kiss wrote:
> Keep track of the perf handler that is registered by the first tracer.
> This will be used by the update call from polling.
>
> Signed-off-by: Daniel Kiss <daniel.kiss(a)arm.com>
> Signed-off-by: Branislav Rankov <Branislav.Rankov(a)arm.com>
> ---
> drivers/hwtracing/coresight/coresight-tmc-etr.c | 6 ++++--
> drivers/hwtracing/coresight/coresight-tmc.h | 2 ++
> 2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index 589bb2d56e802..55c9b5fd9f832 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -1503,8 +1503,8 @@ tmc_update_etr_buffer(struct coresight_device *csdev,
>
> spin_lock_irqsave(&drvdata->spinlock, flags);
>
> - /* Don't do anything if another tracer is using this sink */
> - if (atomic_read(csdev->refcnt) != 1) {
> + /* Serve only the tracer with the leading perf handler */
> + if (drvdata->perf_handle != handle) {
In CPU wide trace scenarios the first CPU to enable a sink is not
guaranteed to be the same as the last CPU to use it. As far as I understand the
above assumes the first and last CPUs to use a sink are the same.
> spin_unlock_irqrestore(&drvdata->spinlock, flags);
> goto out;
> }
> @@ -1619,6 +1619,7 @@ static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data)
> drvdata->pid = pid;
> drvdata->mode = CS_MODE_PERF;
> drvdata->perf_buf = etr_perf->etr_buf;
> + drvdata->perf_handle = handle;
> atomic_inc(csdev->refcnt);
> }
>
> @@ -1666,6 +1667,7 @@ static int tmc_disable_etr_sink(struct coresight_device *csdev)
> drvdata->mode = CS_MODE_DISABLED;
> /* Reset perf specific data */
> drvdata->perf_buf = NULL;
> + drvdata->perf_handle = NULL;
>
> spin_unlock_irqrestore(&drvdata->spinlock, flags);
>
> diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
> index b91ec7dde7bc9..81583ffb973dc 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc.h
> +++ b/drivers/hwtracing/coresight/coresight-tmc.h
> @@ -184,6 +184,7 @@ struct etr_buf {
> * @idr_mutex: Access serialisation for idr.
> * @sysfs_buf: SYSFS buffer for ETR.
> * @perf_buf: PERF buffer for ETR.
> + * @perf_handle: PERF handle for ETR.
> */
> struct tmc_drvdata {
> void __iomem *base;
> @@ -207,6 +208,7 @@ struct tmc_drvdata {
> struct mutex idr_mutex;
> struct etr_buf *sysfs_buf;
> struct etr_buf *perf_buf;
> + struct perf_output_handle *perf_handle;
> };
>
> struct etr_buf_operations {
> --
> 2.25.1
>
Hi Al and Mike,
> >
> > 16 (0xF) should work for all silicon, as AXI allows burst sizes up to 16.
> > So unless we've missed something, this is an implementation non-compliance
> > and we should not be penalising compliant implementations by reducing the
> > default burst size - the question is how we can enable the upstream kernel to
> > workaround the issue on this chip only, and to me that sounds like it needs
> > something that can be triggered by a setting in DT/ACPI.
> >
>
> Another possibility would be to introduce an errata workaround in
> Kconfig for your silicon.
> There are a number of these already in KConfig for PE issues e.g.
> CONFIG_ARM64_ERRATUM_826319,
> and we have introduced CONFIG_ETM4X_IMPDEF_FEATURE for silicon
> specific variants in the ETMv4 driver.
>
> The latter config compiles in implementation defined workarounds which
> operate on the basis of matching the AMBA ID for the silicon.
> This means they will operate only if configured in KConfig and only on
> silicon where the workaround is needed.
>
Thanks a lot for the suggestions.
We are thinking of using "part-number" and "designer" rather than a device
tree property. Because we use standard ARM core IP and Coresight SoC-600 IP,
we cannot differentiate our silicon from others using the ETR AMBA ID, PIDR
and CPU MIDR registers.
We are proposing to expose the Coresight ROM region to the driver and determine
the part number and designer by reading the following fields.
part_number = (PIDR1.PART_1 << 8) | PIDR0.PART_0;
designer = ((PIDR4.DES_2 & 0xf) << 7) |
           ((PIDR2.DES_1 & 0x7) << 4) |
           (PIDR1.DES_0 & 0xf);
Using a combination of part number and designer from ROM region would
help in identifying the Marvell implementation.
This option would be generic and could be helpful for other silicon with
similar issues and can be applied across Coresight components like ETF/ETR.
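
To make the field extraction concrete, here is a minimal sketch in C. The
struct and function names are hypothetical, and the bit positions of the
fields within each PIDR register (PART_1 in PIDR1[3:0], DES_0 in PIDR1[7:4],
DES_1 in PIDR2[2:0], DES_2 in PIDR4[3:0]) are assumed from the usual CoreSight
peripheral ID layout; they should be checked against the SoC-600 TRM.

```c
#include <stdint.h>

/* Hypothetical container for raw values read from the ROM region. */
struct cs_pidr {
	uint32_t pidr0;
	uint32_t pidr1;
	uint32_t pidr2;
	uint32_t pidr4;
};

/* 12-bit part number: PART_1 (assumed PIDR1[3:0]) above PART_0 (PIDR0[7:0]). */
static inline uint32_t cs_part_number(const struct cs_pidr *p)
{
	return ((p->pidr1 & 0xf) << 8) | (p->pidr0 & 0xff);
}

/*
 * 11-bit designer code: DES_2 (assumed PIDR4[3:0]) is shifted to bits [10:7],
 * DES_1 (assumed PIDR2[2:0]) to bits [6:4], DES_0 (assumed PIDR1[7:4]) stays
 * in bits [3:0]. Each field is masked before it is shifted.
 */
static inline uint32_t cs_designer(const struct cs_pidr *p)
{
	uint32_t des_2 = p->pidr4 & 0xf;
	uint32_t des_1 = p->pidr2 & 0x7;
	uint32_t des_0 = (p->pidr1 >> 4) & 0xf;

	return (des_2 << 7) | (des_1 << 4) | des_0;
}
```

With illustrative register values pidr0 = 0x61, pidr1 = 0xb9, pidr2 = 0x0b and
pidr4 = 0x04, the sketch yields part number 0x961 and designer 0x23b. A driver
could compare both results against a table of affected implementations.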
What are your thoughts on this approach?
With Regards,
Tanmay
> Regards
>
> Mike
>
>
> > Al
Adds a generic API to allow packet processors to count the number of bytes per channel
processed and not synced, plus any packet header or format errors.
The ETMv4 / ETE packet processor is updated to use this API.
The API adds the ocsd_decode_stats_t structure to contain the statistics (ocsd_if_types.h).
The C-API (opencsd_c_api.h) adds the functions:
ocsd_dt_get_decode_stats() - gets a pointer to the stats block.
ocsd_dt_reset_decode_stats() - resets the counts to zero. This function operates independently
of the main decoder reset.
This allows tools such as perf, which may reset the decoder multiple times per AUXTRACE_BUFFER,
to count stats for the entire buffer rather than each capture block.
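
The point of keeping the statistics reset separate from the decoder reset can
be illustrated with a small self-contained model. This is only a sketch: the
struct, field and function names below are hypothetical and are not the
OpenCSD API, and the byte accounting is deliberately simplified.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical per-channel counters, loosely modelled on the description. */
struct chan_stats {
	uint64_t bytes_total;     /* bytes processed on this channel */
	uint64_t bytes_unsynced;  /* bytes consumed while not synced */
	uint32_t bad_header_errs; /* packet header errors */
	uint32_t bad_format_errs; /* packet format errors */
};

struct pkt_proc {
	int synced;              /* decode state, cleared by a decoder reset */
	struct chan_stats stats; /* deliberately survives a decoder reset */
};

/* Main decoder reset: drops the sync state but keeps the counters. */
static void pkt_proc_reset(struct pkt_proc *p)
{
	p->synced = 0;
}

/* Independent statistics reset: zeroes the counters only. */
static void pkt_proc_reset_stats(struct pkt_proc *p)
{
	memset(&p->stats, 0, sizeof(p->stats));
}

/* Account for one block of trace bytes (simplified: whole block granularity). */
static void pkt_proc_account(struct pkt_proc *p, uint64_t nbytes, int found_sync)
{
	p->stats.bytes_total += nbytes;
	if (!p->synced && !found_sync)
		p->stats.bytes_unsynced += nbytes;
	if (found_sync)
		p->synced = 1;
}
```

In this model a caller can invoke pkt_proc_reset() once per capture block and
pkt_proc_reset_stats() once per buffer, so the counters cover the whole buffer.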
Mike Leach (4):
opencsd: Add decode statistics API to packet processor.
opencsd: ETMv4: ETE: Add packet processing stats to decoders.
opencsd: tests: Update test programs to use the packet decoder
statistics API
opencsd: Update readme and version info for v1.2.0
README.md | 5 ++-
decoder/include/common/ocsd_dcd_tree.h | 26 ++++++++++-
decoder/include/common/trc_pkt_proc_base.h | 44 ++++++++++++++++++-
decoder/include/opencsd/c_api/opencsd_c_api.h | 30 ++++++++++++-
decoder/include/opencsd/ocsd_if_types.h | 20 +++++++++
decoder/include/opencsd/ocsd_if_version.h | 6 +--
decoder/source/c_api/ocsd_c_api.cpp | 20 ++++++++-
decoder/source/etmv4/trc_pkt_proc_etmv4i.cpp | 10 ++++-
decoder/source/ocsd_dcd_tree.cpp | 39 ++++++++++++++++
decoder/tests/source/c_api_pkt_print_test.c | 37 +++++++++++++++-
decoder/tests/source/trc_pkt_lister.cpp | 37 +++++++++++++++-
11 files changed, 260 insertions(+), 14 deletions(-)
--
2.17.1
For better organisation and easier review, this patch series is extracted
from the patch set "perf: Refine barriers for AUX ring buffer". This
patch series needs to be applied on top of the patch series [1].
To support compat mode in the perf tool, patch 01 adds a new item
in "perf_env" to track whether the kernel is running in 64-bit mode. This
patch is a preparation for later changes.
Patch 02 introduces compat variant functions for accessing the AUX trace's
head and tail. These two functions are defined with the weak attribute, so
they can be called when an architecture cannot provide atomic 64-bit
accesses while perf is in compat mode.
Patch 03 supports compat_auxtrace_mmap__{read_head|write_tail} on the Arm
platform. On Arm in compat mode, the kernel runs in 64-bit mode and the
user space tool runs in 32-bit mode; the tool uses the instructions
"ldrd" and "strd" for 64-bit value atomicity.
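
As an illustration of the weak-function arrangement, here is a minimal sketch
of what a generic fallback could look like. The function name and the re-read
loop are assumptions made for this example and are not taken from the patches;
the loop relies on the head only ever increasing. An architecture file could
provide a non-weak definition with the same name (for example one built on
"ldrd") and the linker would pick that one instead.

```c
#include <stdint.h>

/*
 * Hypothetical generic fallback for reading a 64-bit head from 32-bit code.
 * A plain 64-bit load may be split into two 32-bit loads, so the value is
 * read three times: if the upper halves of the first and last reads match,
 * the upper half did not change while the middle read took place, and the
 * middle read is taken as consistent.
 */
__attribute__((weak)) uint64_t sketch_compat_read_head(const volatile uint64_t *head)
{
	const uint64_t hi_mask = (uint64_t)UINT32_MAX << 32;
	uint64_t first, second, last;

	do {
		first = *head;
		/* Keep the three reads ordered with respect to each other. */
		__atomic_thread_fence(__ATOMIC_ACQUIRE);
		second = *head;
		__atomic_thread_fence(__ATOMIC_ACQUIRE);
		last = *head;
	} while ((first & hi_mask) != (last & hi_mask));

	return second;
}
```

The weak attribute and the fence builtin are GCC/Clang extensions, which seems
acceptable here because the tool is built with arm-linux-gnueabihf-gcc.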
This patch set has been tested on the Arm64 Juno platform, with the perf
tool built using the arm-linux-gnueabihf-gcc compiler.
[1] https://lore.kernel.org/patchwork/cover/1473916/
Leo Yan (3):
perf env: Track kernel 64-bit mode in environment
perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail}
perf auxtrace arm: Support
compat_auxtrace_mmap__{read_head|write_tail}
tools/perf/arch/arm/util/auxtrace.c | 32 +++++++++++
tools/perf/util/auxtrace.c | 88 +++++++++++++++++++++++++++--
tools/perf/util/auxtrace.h | 22 +++++++-
tools/perf/util/env.c | 24 +++++++-
tools/perf/util/env.h | 3 +
5 files changed, 161 insertions(+), 8 deletions(-)
--
2.25.1
Hi Russell,
On Mon, Aug 23, 2021 at 02:39:18PM +0100, Russell King (Oracle) wrote:
> On Mon, Aug 23, 2021 at 09:30:43PM +0800, Leo Yan wrote:
> > On Mon, Aug 23, 2021 at 01:23:42PM +0100, James Clark wrote:
[...]
> > > For x86, it's possible to include tools/include/asm/atomic.h, but that doesn't
> > > include arch/arm/include/asm/atomic.h and there are some other #ifdefs that might
> > > make it not so easy for Arm. Just wondering if you considered trying to include the
> > > existing one? Or decided that it was easier to duplicate it?
> >
> > Good finding!
> >
> > With you reminding, I recognized that the atomic operations for
> > arm/arm64 should be improved for user space program. So far, perf tool
> > simply uses the compiler's atomic implementations (from
> > asm-generic/atomic-gcc.h) for arm/arm64; but for a more reliable
> > implementation, I think we should improve the user space program with
> > architecture's atomic instructions.
>
> No we should not. Sometimes, what's in the kernel is for the kernel's
> use only, and not for userspace's use. That may be because what works
> in kernel space does not work in userspace.
>
> For example, the ARMv6+ atomic operations can be executed in userspace
> _provided_ they are only used on memory which has an exclusive monitor.
> They can't be used on anything that is not "normal memory".
Okay, IIUC, the requirement for "normal memory" and exclusive monitor
should also apply on aarch64 for ldrex/strex, Load-Acquire and
Store-Release instructions, etc. Otherwise, it's heavily dependent on
the exclusive monitors outside the cache coherency domain (but this is
out of scope for the CPU).
The perf tool is very likely to map memory as "normal memory", but we
cannot say it's always true.
So I agree there is a risk in exporting the aarch32/aarch64 atomic
headers to user space.
> Prior to
> ARMv6, the atomic operations rely on disabling interrupts. That
> facility is simply not available to userspace, so these must not be
> made available to userspace.
>
> The same applies to bitops.
>
> We've been here before in the past, when the kernel headers were not
> separated from the user ABI headers, and people would write programs
> that included e.g. bitops.h on x86 because they had optimised bitops
> code. This made the userspace programs very non-portable - without
> re-implementing userspace versions of this stuff in every userspace
> program that did this stuff.
>
> So no, having experienced the effects of this kind of thing in the
> past, the kernel should _not_ export architecture specific code in
> header files to userspace.
>
> Also, it should be pointed out that by doing so, you create a licensing
> issue. If the code is GPLv2, and you build your program such that it
> incorporates GPLv2 code, then if the userspace program is not GPLv2
> compliant, you have a licensing problem, and in effect the program
> cannot be distributed.
>
> Please do not go down this route.
Thanks a lot for the suggestion and quick response.
Leo
Hi All,
The AXI write burst length is set to 0xF in TMC_AXICTL
register in the TMC ETR driver.
Definition in coresight-tmc.h
#define TMC_AXICTL_WR_BURST_16 0xF00
The Marvell CN10K chip uses Coresight SoC-600 IP. Since the write
burst length field is implementation defined, the maximum
value supported by our chip is 0x7.
We could not find a way to determine the maximum supported
value through any of the ETR registers. So can you please
recommend a way to choose the value 0x7 without affecting
other silicon?
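
For reference, a minimal sketch of how the field could be parameterised
instead of hard-coded. The 0xF00 definition above implies that the write burst
length occupies bits [11:8] of TMC_AXICTL; the macro and function names below
are hypothetical, and where the per-chip value would come from (DT/ACPI,
Kconfig, ID matching) is left open.

```c
#include <stdint.h>

/* Field position inferred from TMC_AXICTL_WR_BURST_16 == 0xF00. */
#define AXICTL_WR_BURST_SHIFT 8
#define AXICTL_WR_BURST_MASK  (0xfu << AXICTL_WR_BURST_SHIFT)

/* Replace the write burst length field in an AXICTL value. */
static inline uint32_t axictl_set_wr_burst(uint32_t axictl, uint32_t burst)
{
	axictl &= ~AXICTL_WR_BURST_MASK;
	axictl |= (burst << AXICTL_WR_BURST_SHIFT) & AXICTL_WR_BURST_MASK;
	return axictl;
}
```

Passing 0xF reproduces the current 0xF00 encoding, while passing 0x7 gives
0x700 and leaves the other AXICTL bits untouched.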
With Regards,
Tanmay
Changes since v2:
Add Leo's Reviewed-by tag that I missed
James Clark (1):
perf cs-etm: Add warnings for missing DSOs
tools/perf/util/cs-etm.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
--
2.23.0
I'm submitting this as an RFC because there are a few changes I'd
like to get feedback on. The two changes I wanted to make were the
last two for Coresight warnings, but I ended up making some perf-wide
changes along the way.
For #1 (perf tools: Add WARN_ONCE equivalent for UI warnings)
* Does it make sense to add warn once equivalents at all, or
should the once be re-done for each usage?
* Or should there be some kind of generic 'once' wrapper?
For #3 (perf tools: Add disassembly warnings for annotate --stdio)
* If the output is interpreted by any other tools, then adding
these warnings could be an issue, so maybe this change could
be dropped, but no error output at all isn't ideal.
For #4 (perf tools: Add flag for tracking warnings of missing DSOs)
* In theory I could re-use 'annotate_warned', but it might make sense
to rename it in that case, or just leave the new auxtrace_warned and
remove any confusion.
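
To make the generic 'once' wrapper question for #1 more concrete, here is a
minimal sketch of one possible shape. The macro and function names are
hypothetical and this is not the code in the series; it is also not thread
safe, since the flag is a plain static.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Run a statement the first time this call site is reached and never again.
 * Each expansion gets its own static flag.
 */
#define DO_ONCE(stmt)                 \
	do {                          \
		static bool done_;    \
		if (!done_) {         \
			done_ = true; \
			stmt;         \
		}                     \
	} while (0)

static int warnings_emitted;

/* Stand-in for a UI warning function. */
static void emit_warning(const char *msg)
{
	fprintf(stderr, "warning: %s\n", msg);
	warnings_emitted++;
}

/* Warns for the first missing DSO only, whatever its path. */
static void check_dso(const char *path)
{
	DO_ONCE(emit_warning(path));
}
```

With a wrapper like this, any existing warning call could be made one-shot at
the call site without adding a dedicated *_once variant for each function.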
This set applies to perf/core e73f0f0ee7541
Thanks
James
James Clark (6):
perf tools: Add WARN_ONCE equivalent for UI warnings
perf tools: Re-add annotate_warned functionality
perf tools: Add disassembly warnings for annotate --stdio
perf tools: Add flag for tracking warnings of missing DSOs
perf cs-etm: Improve Coresight zero timestamp warning
perf cs-etm: Add warnings for missing DSOs
tools/perf/ui/browsers/annotate.c | 1 +
tools/perf/ui/gtk/annotate.c | 1 +
tools/perf/util/annotate.c | 20 +++++++++++++++++--
.../perf/util/cs-etm-decoder/cs-etm-decoder.c | 7 +++++--
tools/perf/util/cs-etm.c | 10 +++++++++-
tools/perf/util/debug.h | 14 +++++++++++++
tools/perf/util/dso.h | 1 +
7 files changed, 49 insertions(+), 5 deletions(-)
--
2.28.0
Decoding ETE already works because it is a superset of
ETMv4, but if any new packet types are found then they will be
ignored by the decoder. This patchset creates an ETE decoder
which can output the new packets and saves a new register that
is required. No new packet types are handled by perf yet, as this
can be added in the future.
This set applies on top of "perf cs-etm: Support TRBE
(unformatted decoding)" on perf/core.
James Clark (6):
perf cs-etm: Refactor initialisation of decoder params.
perf cs-etm: Initialise architecture based on TRCIDR1
perf cs-etm: Save TRCDEVARCH register
perf cs-etm: Update OpenCSD decoder for ETE
perf cs-etm: Create ETE decoder
perf cs-etm: Print the decoder name
tools/build/feature/test-libopencsd.c | 4 +-
tools/perf/arch/arm/util/cs-etm.c | 13 +-
.../perf/util/cs-etm-decoder/cs-etm-decoder.c | 151 ++++++++----------
.../perf/util/cs-etm-decoder/cs-etm-decoder.h | 8 +
tools/perf/util/cs-etm.c | 54 ++++++-
tools/perf/util/cs-etm.h | 6 +-
6 files changed, 147 insertions(+), 89 deletions(-)
--
2.28.0