On Mon, Apr 21, 2025 at 02:58:18PM -0700, Yabin Cui wrote:
> The cs_etm PMU, regardless of the underlying trace sink (ETF, ETR or
> TRBE), doesn't require contiguous pages for its AUX buffer.
Though contiguous pages are not mandatory for TRBE, I would set the
PERF_PMU_CAP_AUX_NO_SG flag for it. This can potentially benefit
performance.
For non per CPU sinks, it is fine to allocate non-contiguous pages.
Thanks,
Leo
> This patch adds the PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES capability
> to the cs_etm PMU. This allows the kernel to allocate non-contiguous
> pages for the AUX buffer, reducing memory fragmentation when using
> cs_etm.
>
> Signed-off-by: Yabin Cui <yabinc(a)google.com>
> ---
> drivers/hwtracing/coresight/coresight-etm-perf.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> index f4cccd68e625..c98646eca7f8 100644
> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> @@ -899,7 +899,8 @@ int __init etm_perf_init(void)
> int ret;
>
> etm_pmu.capabilities = (PERF_PMU_CAP_EXCLUSIVE |
> - PERF_PMU_CAP_ITRACE);
> + PERF_PMU_CAP_ITRACE |
> + PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES);
>
> etm_pmu.attr_groups = etm_pmu_attr_groups;
> etm_pmu.task_ctx_nr = perf_sw_context;
> --
> 2.49.0.805.g082f7c87e0-goog
>
>
On 21/04/2025 10:58 pm, Yabin Cui wrote:
> For PMUs like ARM ETM/ETE, contiguous AUX buffers are unnecessary
> and increase memory fragmentation.
>
> This patch introduces PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES, allowing
> PMUs to request non-contiguous pages for their AUX buffers.
>
> Signed-off-by: Yabin Cui <yabinc(a)google.com>
> ---
> include/linux/perf_event.h | 1 +
> kernel/events/ring_buffer.c | 6 ++++++
> 2 files changed, 7 insertions(+)
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 0069ba6866a4..26ca35d6a9f2 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -301,6 +301,7 @@ struct perf_event_pmu_context;
> #define PERF_PMU_CAP_AUX_OUTPUT 0x0080
> #define PERF_PMU_CAP_EXTENDED_HW_TYPE 0x0100
> #define PERF_PMU_CAP_AUX_PAUSE 0x0200
> +#define PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES 0x0400
>
> /**
> * pmu::scope
> diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
> index 5130b119d0ae..87f42f4e8edc 100644
> --- a/kernel/events/ring_buffer.c
> +++ b/kernel/events/ring_buffer.c
> @@ -710,6 +710,12 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
> max_order = ilog2(nr_pages);
> watermark = 0;
> }
> + /*
> + * When the PMU doesn't prefer contiguous AUX buffer pages, favor
> + * low-order allocations to reduce memory fragmentation.
> + */
> + if (event->pmu->capabilities & PERF_PMU_CAP_AUX_NON_CONTIGUOUS_PAGES)
> + max_order = 0;
>
> /*
> * kcalloc_node() is unable to allocate buffer if the size is larger
Hi Yabin,
I was wondering if this is just the opposite of PERF_PMU_CAP_AUX_NO_SG,
and that order 0 should be used by default for all devices to solve the
issue you describe. Because we already have PERF_PMU_CAP_AUX_NO_SG for
devices that need contiguous pages. Then I found commit 5768402fd9c6
("perf/ring_buffer: Use high order allocations for AUX buffers
optimistically") that explains that the current allocation strategy is
an optimization.
Your change seems to decide that for certain devices we want to optimize
for fragmentation rather than performance. If these are rarely used
features specifically when looking at performance should we not continue
to optimize for performance? Or at least make it user configurable?
Thanks
James
This series fixes and improves clock usage in the Arm CoreSight drivers.
Based on the DT binding documents, the trace clock (atclk) is defined in
some CoreSight modules, but the corresponding drivers to support it are
absent. In most cases, the issue is hidden because atclk is shared by
multiple CoreSight modules and is enabled anyway by other drivers.
The first three patches address this issue.
The programming clock (pclk) management in CoreSight drivers does not
use the devm_XXX() variant APIs, so the drivers needs to manually
disable and release clocks for errors and for normal module exit.
However, the drivers miss to disable clocks during module exit.
Another issue is pclk might be enabled twice in init phase - once by
AMBA bus driver, and again by CoreSight drivers.
Patches 04 and 05 fix pclk issues.
The atclk may also not be disabled in CoreSight drivers during module
exit. This is fixed in patch 06.
Patches 07 to 09 refactor the clock related code. Patch 07 makes the
clock enabling sequence consistent. Patch 08 removes redundant
condition checks and adds error handling in runtime PM. Patch 09
consolidats the clock initialization into a central place.
This series is verified on Arm64 Hikey960 platform.
Leo Yan (9):
coresight: tmc: Support atclk
coresight: catu: Support atclk
coresight: etm4x: Support atclk
coresight: Disable programming clock properly
coresight: Avoid enable programming clock duplicately
coresight: Disable trace bus clock properly
coresight: Make clock sequence consistent
coresight: Refactor runtime PM
coresight: Consolidate clock enabling
drivers/hwtracing/coresight/coresight-catu.c | 53 ++++++++++++++++-----------------
drivers/hwtracing/coresight/coresight-catu.h | 1 +
drivers/hwtracing/coresight/coresight-cpu-debug.c | 41 +++++++++-----------------
drivers/hwtracing/coresight/coresight-ctcu-core.c | 24 +++++----------
drivers/hwtracing/coresight/coresight-etb10.c | 18 ++++--------
drivers/hwtracing/coresight/coresight-etm3x-core.c | 17 ++++-------
drivers/hwtracing/coresight/coresight-etm4x-core.c | 32 ++++++++++----------
drivers/hwtracing/coresight/coresight-etm4x.h | 4 ++-
drivers/hwtracing/coresight/coresight-funnel.c | 66 +++++++++++++++---------------------------
drivers/hwtracing/coresight/coresight-replicator.c | 63 ++++++++++++++--------------------------
drivers/hwtracing/coresight/coresight-stm.c | 34 +++++++++-------------
drivers/hwtracing/coresight/coresight-tmc-core.c | 48 +++++++++++++++---------------
drivers/hwtracing/coresight/coresight-tmc.h | 2 ++
drivers/hwtracing/coresight/coresight-tpiu.c | 36 ++++++++++-------------
include/linux/coresight.h | 47 ++++++++++++++++++------------
15 files changed, 206 insertions(+), 280 deletions(-)
--
2.34.1
On 08/04/2025 8:59 pm, Yabin Cui wrote:
> When enabling a SINK or LINK type coresight device fails, the
> associated helpers should be disabled.
>
> Signed-off-by: Yabin Cui <yabinc(a)google.com>
> Suggested-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
> ---
> drivers/hwtracing/coresight/coresight-core.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c
> index fb43ef6a3b1f..a56ba9087538 100644
> --- a/drivers/hwtracing/coresight/coresight-core.c
> +++ b/drivers/hwtracing/coresight/coresight-core.c
> @@ -486,8 +486,10 @@ int coresight_enable_path(struct coresight_path *path, enum cs_mode mode,
> * that need disabling. Disabling the path here
> * would mean we could disrupt an existing session.
> */
> - if (ret)
> + if (ret) {
> + coresight_disable_helpers(csdev);
Hi Yabin,
Unfortunately coresight_disable_helpers() takes a path pointer now so
this needs to be updated.
I tested with that change made and it works ok.
> goto out;
> + }
> break;
> case CORESIGHT_DEV_TYPE_SOURCE:
> /* sources are enabled from either sysFS or Perf */
> @@ -496,10 +498,13 @@ int coresight_enable_path(struct coresight_path *path, enum cs_mode mode,
> parent = list_prev_entry(nd, link)->csdev;
> child = list_next_entry(nd, link)->csdev;
> ret = coresight_enable_link(csdev, parent, child, source);
> - if (ret)
> + if (ret) {
> + coresight_disable_helpers(csdev);
> goto err;
> + }
> break;
> default:
> + coresight_disable_helpers(csdev);
Minor nit, you could collapse these last two into "goto
err_disable_helpers" and add another label before err: that disables
helpers before falling through to err:.
Other than that:
Reviewed-by: James Clark <james.clark(a)linaro.org>
> goto err;
> }
> }
On 08/04/2025 8:59 pm, Yabin Cui wrote:
> When tracing ETM data on multiple CPUs concurrently via the
> perf interface, the CATU device is shared across different CPU
> paths. This can lead to race conditions when multiple CPUs attempt
> to enable or disable the CATU device simultaneously.
>
> To address these race conditions, this patch introduces the
> following changes:
>
> 1. The enable and disable operations for the CATU device are not
> reentrant. Therefore, a spinlock is added to ensure that only
> one CPU can enable or disable a given CATU device at any point
> in time.
>
> 2. A reference counter is used to manage the enable/disable state
> of the CATU device. The device is enabled when the first CPU
> requires it and is only disabled when the last CPU finishes
> using it. This ensures the device remains active as long as at
> least one CPU needs it.
>
> Signed-off-by: Yabin Cui <yabinc(a)google.com>
> ---
> drivers/hwtracing/coresight/coresight-catu.c | 25 +++++++++++++-------
> drivers/hwtracing/coresight/coresight-catu.h | 1 +
> 2 files changed, 18 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-catu.c b/drivers/hwtracing/coresight/coresight-catu.c
> index fa170c966bc3..30b78b2f8adb 100644
> --- a/drivers/hwtracing/coresight/coresight-catu.c
> +++ b/drivers/hwtracing/coresight/coresight-catu.c
> @@ -458,12 +458,17 @@ static int catu_enable_hw(struct catu_drvdata *drvdata, enum cs_mode cs_mode,
> static int catu_enable(struct coresight_device *csdev, enum cs_mode mode,
> void *data)
> {
> - int rc;
> + int rc = 0;
> struct catu_drvdata *catu_drvdata = csdev_to_catu_drvdata(csdev);
> + guard(raw_spinlock_irqsave)(&catu_drvdata->spinlock);
>
Very minor nit only because you need to resend anyway, but there should
be a newline between the variable definitions and the code. Not sure why
checkpatch doesn't warn here.
> - CS_UNLOCK(catu_drvdata->base);
> - rc = catu_enable_hw(catu_drvdata, mode, data);
> - CS_LOCK(catu_drvdata->base);
> + if (csdev->refcnt == 0) {
> + CS_UNLOCK(catu_drvdata->base);
> + rc = catu_enable_hw(catu_drvdata, mode, data);
> + CS_LOCK(catu_drvdata->base);
> + }
> + if (!rc)
> + csdev->refcnt++;
> return rc;
> }
>
> @@ -486,12 +491,15 @@ static int catu_disable_hw(struct catu_drvdata *drvdata)
>
> static int catu_disable(struct coresight_device *csdev, void *__unused)
> {
> - int rc;
> + int rc = 0;
> struct catu_drvdata *catu_drvdata = csdev_to_catu_drvdata(csdev);
> + guard(raw_spinlock_irqsave)(&catu_drvdata->spinlock);
>
> - CS_UNLOCK(catu_drvdata->base);
> - rc = catu_disable_hw(catu_drvdata);
> - CS_LOCK(catu_drvdata->base);
> + if (--csdev->refcnt == 0) {
Hopefully this never underflows if disable is called again after a
failed enable. We could add a WARN_ON() but I think this is a general
case and not specific to these patches so is probably better to do later
as separate change.
Reviewed-by: James Clark <james.clark(a)linaro.org>
The Trace Network On Chip (TNOC) is an integration hierarchy which is a
hardware component that integrates the functionalities of TPDA and
funnels. It collects trace form subsystems and transfers to coresight
sink.
Signed-off-by: Yuanfang Zhang <quic_yuanfang(a)quicinc.com>
---
Changes in v4:
- Fix dt_binding warning.
- update mask of trace_noc amba_id.
- Modify driver comments.
- rename TRACE_NOC_SYN_VAL to TRACE_NOC_SYNC_INTERVAL.
- Link to v3: https://lore.kernel.org/r/20250411-trace-noc-v3-0-1f19ddf7699b@quicinc.com
Changes in v3:
- Remove unnecessary sysfs nodes.
- update commit messages.
- Use 'writel' instead of 'write_relaxed' when writing to the register for the last time.
- Add trace_id ops.
- Link to v2: https://lore.kernel.org/r/20250226-trace-noc-driver-v2-0-8afc6584afc5@quici…
Changes in v2:
- Modified the format of DT binging file.
- Fix compile warnings.
- Link to v1: https://lore.kernel.org/r/46643089-b88d-49dc-be05-7bf0bb21f847@quicinc.com
---
Yuanfang Zhang (2):
dt-bindings: arm: Add device Trace Network On Chip definition
coresight: add coresight Trace Network On Chip driver
.../bindings/arm/qcom,coresight-tnoc.yaml | 111 ++++++++++++
drivers/hwtracing/coresight/Kconfig | 13 ++
drivers/hwtracing/coresight/Makefile | 1 +
drivers/hwtracing/coresight/coresight-tnoc.c | 191 +++++++++++++++++++++
drivers/hwtracing/coresight/coresight-tnoc.h | 34 ++++
5 files changed, 350 insertions(+)
---
base-commit: a2cc6ff5ec8f91bc463fd3b0c26b61166a07eb11
change-id: 20250403-trace-noc-f8286b30408e
Best regards,
--
Yuanfang Zhang <quic_yuanfang(a)quicinc.com>
On 14/04/2025 11:06, Andy Shevchenko wrote:
> On Mon, Mar 31, 2025 at 02:40:51PM +0100, James Clark wrote:
>> On 31/03/2025 8:14 am, Andy Shevchenko wrote:
>
>> Reviewed-by: James Clark <james.clark(a)linaro.org>
>
> Thank you! Can it be applied now?
>
I will queue this via coresight tree, for v6.16
Cheers
Suzuki
The Trace Network On Chip (TNOC) is an integration hierarchy which is a
hardware component that integrates the functionalities of TPDA and
funnels. It collects trace form subsystems and transfers to coresight
sink.
Signed-off-by: Yuanfang Zhang <quic_yuanfang(a)quicinc.com>
---
Changes in v3:
- Remove unnecessary sysfs nodes.
- update commit messages.
- Use 'writel' instead of 'write_relaxed' when writing to the register for the last time.
- Add trace_id ops.
- Link to v2: https://lore.kernel.org/r/20250226-trace-noc-driver-v2-0-8afc6584afc5@quici…
Changes in v2:
- Modified the format of DT binging file.
- Fix compile warnings.
- Link to v1: https://lore.kernel.org/r/46643089-b88d-49dc-be05-7bf0bb21f847@quicinc.com
---
Yuanfang Zhang (2):
dt-bindings: arm: Add device Trace Network On Chip definition
coresight: add coresight Trace Network On Chip driver
.../bindings/arm/qcom,coresight-tnoc.yaml | 111 ++++++++++++
drivers/hwtracing/coresight/Kconfig | 13 ++
drivers/hwtracing/coresight/Makefile | 1 +
drivers/hwtracing/coresight/coresight-tnoc.c | 186 +++++++++++++++++++++
drivers/hwtracing/coresight/coresight-tnoc.h | 34 ++++
5 files changed, 345 insertions(+)
---
base-commit: a2cc6ff5ec8f91bc463fd3b0c26b61166a07eb11
change-id: 20250403-trace-noc-f8286b30408e
Best regards,
--
Yuanfang Zhang <quic_yuanfang(a)quicinc.com>