This series fixes the following issues with the TRBE and Self-Hosted trace for CoreSight.
The Self-hosted trace filter control registers are now save-restored across CPU PM event. And more importantly the Trace Filtering is now used to control per ETM session (rather than allowing the trace throughout the life time of the system). i.e, ETM configuration of the given run is used to enforce trace filtering (TRFCR) along with the Trace Exclusion controls in TRCVICTLR.
For the TRBE, we were using the TRUNCATED flag in the AUX buffer on every IRQ to indicate that we may have lost a few bytes of trace. But this causes the event to be disabled until the userspace re-enables it back, even when there is space left in the ring buffer. To make things worse, we were restarting the AUX handle, which would soon be disabled, potentially creating 0 sized records (without truncation), which the perf tool tends to ignore. This might cause the event to be disabled permanently. Also, sometimes we leave the buffer TRUNCATED, but delay the closing of the handle to event schedule out, which could cause significant black out in the trace capture. This was reported by Tamas Zsoldos.
This series removes the use of TRUNCATED flag for every IRQ. Instead, it is only used if we really run out of space in the buffer. And also we make sure the "handle" is closed immediately on TRUNCATED case, which triggers the userspace to take action. The core perf layer has been hardened to handle this case where a "handle" is closed out. Finally, we make sure that the CPU trace is prohibited, when the TRBE is left disabled. The ETE/ETM driver will program the Trace Filtering appropriately since we do this dynamically now with the first half of the series.
Changes since v1 [0]: - Moved TRFCR related accessors to a new header file - Following a discussion, dropped the TRUNCATED flag from the TRBE IRQ handler on WRAP. Instead mark COLLISION. - Added new patches to harden the ETM perf layer to handle an error in the sink driver. - Fix TRBE spurious IRQ handling - Cleanup TRBE driver to make the "TRUNCATE" cases managed at a central place.
[0] https://lkml.kernel.org/r/20210712113830.2803257-1-suzuki.poulose@arm.com
Suzuki K Poulose (10): coresight: etm4x: Save restore TRFCR_EL1 coresight: etm4x: Use Trace Filtering controls dynamically coresight: etm-pmu: Ensure the AUX handle is valid coresight: trbe: Ensure the format flag is set on truncation coresight: trbe: Drop duplicate TRUNCATE flags coresight: trbe: Fix handling of spurious interrupts coresight: trbe: Do not truncate buffer on IRQ coresight: trbe: Unify the enabling sequence coresight: trbe: End the AUX handle on truncation coresight: trbe: Prohibit trace before disabling TRBE
.../hwtracing/coresight/coresight-etm-perf.c | 27 ++++- .../coresight/coresight-etm4x-core.c | 98 ++++++++++++---- drivers/hwtracing/coresight/coresight-etm4x.h | 7 +- .../coresight/coresight-self-hosted-trace.h | 34 ++++++ drivers/hwtracing/coresight/coresight-trbe.c | 109 ++++++++++-------- 5 files changed, 197 insertions(+), 78 deletions(-) create mode 100644 drivers/hwtracing/coresight/coresight-self-hosted-trace.h
When the CPU enters a low power mode, the TRFCR_EL1 contents could be reset. Thus we need to save/restore the TRFCR_EL1 along with the ETM4x registers to allow the tracing.
The TRFCR related helpers are in a new header file, as we need to use them for TRBE in the later patches.
Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Mike Leach mike.leach@linaro.org Cc: Leo Yan leo.yan@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- Changes since v1: - Moved the TRFCR helpers in to a new header file --- .../coresight/coresight-etm4x-core.c | 43 +++++++++++++------ drivers/hwtracing/coresight/coresight-etm4x.h | 2 + .../coresight/coresight-self-hosted-trace.h | 25 +++++++++++ 3 files changed, 58 insertions(+), 12 deletions(-) create mode 100644 drivers/hwtracing/coresight/coresight-self-hosted-trace.h
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c index da27cd4a3c38..3e548dac9b05 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -39,6 +39,7 @@
#include "coresight-etm4x.h" #include "coresight-etm-perf.h" +#include "coresight-self-hosted-trace.h"
static int boot_enable; module_param(boot_enable, int, 0444); @@ -985,7 +986,7 @@ static void cpu_enable_tracing(struct etmv4_drvdata *drvdata) if (is_kernel_in_hyp_mode()) trfcr |= TRFCR_EL2_CX;
- write_sysreg_s(trfcr, SYS_TRFCR_EL1); + write_trfcr(trfcr); }
static void etm4_init_arch_data(void *info) @@ -1528,7 +1529,7 @@ static void etm4_init_trace_id(struct etmv4_drvdata *drvdata) drvdata->trcid = coresight_get_trace_id(drvdata->cpu); }
-static int etm4_cpu_save(struct etmv4_drvdata *drvdata) +static int __etm4_cpu_save(struct etmv4_drvdata *drvdata) { int i, ret = 0; struct etmv4_save_state *state; @@ -1667,7 +1668,23 @@ static int etm4_cpu_save(struct etmv4_drvdata *drvdata) return ret; }
-static void etm4_cpu_restore(struct etmv4_drvdata *drvdata) +static int etm4_cpu_save(struct etmv4_drvdata *drvdata) +{ + int ret = 0; + + /* Save the TRFCR irrespective of whether the ETM is ON */ + if (drvdata->trfc) + drvdata->save_trfcr = read_trfcr(); + /* + * Save and restore the ETM Trace registers only if + * the ETM is active. + */ + if (local_read(&drvdata->mode) && drvdata->save_state) + ret = __etm4_cpu_save(drvdata); + return ret; +} + +static void __etm4_cpu_restore(struct etmv4_drvdata *drvdata) { int i; struct etmv4_save_state *state = drvdata->save_state; @@ -1763,6 +1780,14 @@ static void etm4_cpu_restore(struct etmv4_drvdata *drvdata) etm4_cs_lock(drvdata, csa); }
+static void etm4_cpu_restore(struct etmv4_drvdata *drvdata) +{ + if (drvdata->trfc) + write_trfcr(drvdata->save_trfcr); + if (drvdata->state_needs_restore) + __etm4_cpu_restore(drvdata); +} + static int etm4_cpu_pm_notify(struct notifier_block *nb, unsigned long cmd, void *v) { @@ -1774,23 +1799,17 @@ static int etm4_cpu_pm_notify(struct notifier_block *nb, unsigned long cmd,
drvdata = etmdrvdata[cpu];
- if (!drvdata->save_state) - return NOTIFY_OK; - if (WARN_ON_ONCE(drvdata->cpu != cpu)) return NOTIFY_BAD;
switch (cmd) { case CPU_PM_ENTER: - /* save the state if self-hosted coresight is in use */ - if (local_read(&drvdata->mode)) - if (etm4_cpu_save(drvdata)) - return NOTIFY_BAD; + if (etm4_cpu_save(drvdata)) + return NOTIFY_BAD; break; case CPU_PM_EXIT: case CPU_PM_ENTER_FAILED: - if (drvdata->state_needs_restore) - etm4_cpu_restore(drvdata); + etm4_cpu_restore(drvdata); break; default: return NOTIFY_DONE; diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h b/drivers/hwtracing/coresight/coresight-etm4x.h index e5b79bdb9851..82cba16b73a6 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x.h +++ b/drivers/hwtracing/coresight/coresight-etm4x.h @@ -921,6 +921,7 @@ struct etmv4_save_state { * @lpoverride: If the implementation can support low-power state over. * @trfc: If the implementation supports Arm v8.4 trace filter controls. * @config: structure holding configuration parameters. + * @save_trfcr: Saved TRFCR_EL1 register during a CPU PM event. * @save_state: State to be preserved across power loss * @state_needs_restore: True when there is context to restore after PM exit * @skip_power_up: Indicates if an implementation can skip powering up @@ -973,6 +974,7 @@ struct etmv4_drvdata { bool lpoverride; bool trfc; struct etmv4_config config; + u64 save_trfcr; struct etmv4_save_state *save_state; bool state_needs_restore; bool skip_power_up; diff --git a/drivers/hwtracing/coresight/coresight-self-hosted-trace.h b/drivers/hwtracing/coresight/coresight-self-hosted-trace.h new file mode 100644 index 000000000000..53b35a28075e --- /dev/null +++ b/drivers/hwtracing/coresight/coresight-self-hosted-trace.h @@ -0,0 +1,25 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ + +/* + * Arm v8 Self-Hosted trace support. + * + * Copyright (C) 2021 ARM Ltd. + */ + +#ifndef __CORESIGHT_SELF_HOSTED_TRACE_H +#define __CORESIGHT_SELF_HOSTED_TRACE_H + +#include <asm/sysreg.h> + +static inline u64 read_trfcr(void) +{ + return read_sysreg_s(SYS_TRFCR_EL1); +} + +static inline void write_trfcr(u64 val) +{ + write_sysreg_s(val, SYS_TRFCR_EL1); + isb(); +} + +#endif /* __CORESIGHT_SELF_HOSTED_TRACE_H */
The Trace Filtering support (FEAT_TRF) ensures that the ETM can be prohibited from generating any trace for a given EL. This is much stricter knob, than the TRCVICTLR exception level masks. At the moment, we do a onetime enable trace at user and kernel and leave it untouched for the kernel life time.
This patch makes the switch dynamic, by honoring the filters set by the user and enforcing them in the TRFCR controls. We also rename the cpu_enable_tracing() appropriately to cpu_detect_trace_filtering() and the drvdata member trfc => trfcr to indicate the "value" of the TRFCR_EL1.
Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: Al Grant al.grant@arm.com Cc: Mike Leach mike.leach@linaro.org Cc: Leo Yan leo.yan@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- .../coresight/coresight-etm4x-core.c | 61 ++++++++++++++----- drivers/hwtracing/coresight/coresight-etm4x.h | 5 +- .../coresight/coresight-self-hosted-trace.h | 7 +++ 3 files changed, 55 insertions(+), 18 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c index 3e548dac9b05..adba84b29455 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -237,6 +237,43 @@ struct etm4_enable_arg { int rc; };
+/* + * etm4x_prohibit_trace - Prohibit the CPU from tracing at all ELs. + * When the CPU supports FEAT_TRF, we could move the ETM to a trace + * prohibited state by filtering the Exception levels via TRFCR_EL1. + */ +static void etm4x_prohibit_trace(struct etmv4_drvdata *drvdata) +{ + if (drvdata->trfcr) + cpu_prohibit_trace(); +} + +/* + * etm4x_allow_trace - Allow CPU tracing in the respective ELs, + * as configured by the drvdata->config.mode for the current + * session. Even though we have TRCVICTLR bits to filter the + * trace in the ELs, it doesn't prevent the ETM from generating + * a packet (e.g, TraceInfo) that might contain the addresses from + * the excluded levels. Thus we use the additional controls provided + * via the Trace Filtering controls (FEAT_TRF) to make sure no trace + * is generated for the excluded ELs. + */ +static void etm4x_allow_trace(struct etmv4_drvdata *drvdata) +{ + u64 trfcr = drvdata->trfcr; + + /* If the CPU doesn't support FEAT_TRF, nothing to do */ + if (!trfcr) + return; + + if (drvdata->config.mode & ETM_MODE_EXCL_KERN) + trfcr &= ~TRFCR_ELx_ExTRE; + if (drvdata->config.mode & ETM_MODE_EXCL_USER) + trfcr &= ~TRFCR_ELx_E0TRE; + + write_trfcr(trfcr); +} + #ifdef CONFIG_ETM4X_IMPDEF_FEATURE
#define HISI_HIP08_AMBA_ID 0x000b6d01 @@ -441,6 +478,7 @@ static int etm4_enable_hw(struct etmv4_drvdata *drvdata) if (etm4x_is_ete(drvdata)) etm4x_relaxed_write32(csa, TRCRSR_TA, TRCRSR);
+ etm4x_allow_trace(drvdata); /* Enable the trace unit */ etm4x_relaxed_write32(csa, 1, TRCPRGCTLR);
@@ -719,7 +757,6 @@ static int etm4_enable(struct coresight_device *csdev, static void etm4_disable_hw(void *info) { u32 control; - u64 trfcr; struct etmv4_drvdata *drvdata = info; struct etmv4_config *config = &drvdata->config; struct coresight_device *csdev = drvdata->csdev; @@ -746,12 +783,7 @@ static void etm4_disable_hw(void *info) * If the CPU supports v8.4 Trace filter Control, * set the ETM to trace prohibited region. */ - if (drvdata->trfc) { - trfcr = read_sysreg_s(SYS_TRFCR_EL1); - write_sysreg_s(trfcr & ~(TRFCR_ELx_ExTRE | TRFCR_ELx_E0TRE), - SYS_TRFCR_EL1); - isb(); - } + etm4x_prohibit_trace(drvdata); /* * Make sure everything completes before disabling, as recommended * by section 7.3.77 ("TRCVICTLR, ViewInst Main Control Register, @@ -767,9 +799,6 @@ static void etm4_disable_hw(void *info) if (coresight_timeout(csa, TRCSTATR, TRCSTATR_PMSTABLE_BIT, 1)) dev_err(etm_dev, "timeout while waiting for PM stable Trace Status\n"); - if (drvdata->trfc) - write_sysreg_s(trfcr, SYS_TRFCR_EL1); - /* read the status of the single shot comparators */ for (i = 0; i < drvdata->nr_ss_cmp; i++) { config->ss_status[i] = @@ -964,15 +993,15 @@ static bool etm4_init_csdev_access(struct etmv4_drvdata *drvdata, return false; }
-static void cpu_enable_tracing(struct etmv4_drvdata *drvdata) +static void cpu_detect_trace_filtering(struct etmv4_drvdata *drvdata) { u64 dfr0 = read_sysreg(id_aa64dfr0_el1); u64 trfcr;
+ drvdata->trfcr = 0; if (!cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_TRACE_FILT_SHIFT)) return;
- drvdata->trfc = true; /* * If the CPU supports v8.4 SelfHosted Tracing, enable * tracing at the kernel EL and EL0, forcing to use the @@ -986,7 +1015,7 @@ static void cpu_enable_tracing(struct etmv4_drvdata *drvdata) if (is_kernel_in_hyp_mode()) trfcr |= TRFCR_EL2_CX;
- write_trfcr(trfcr); + drvdata->trfcr = trfcr; }
static void etm4_init_arch_data(void *info) @@ -1177,7 +1206,7 @@ static void etm4_init_arch_data(void *info) /* NUMCNTR, bits[30:28] number of counters available for tracing */ drvdata->nr_cntr = BMVAL(etmidr5, 28, 30); etm4_cs_lock(drvdata, csa); - cpu_enable_tracing(drvdata); + cpu_detect_trace_filtering(drvdata); }
static inline u32 etm4_get_victlr_access_type(struct etmv4_config *config) @@ -1673,7 +1702,7 @@ static int etm4_cpu_save(struct etmv4_drvdata *drvdata) int ret = 0;
/* Save the TRFCR irrespective of whether the ETM is ON */ - if (drvdata->trfc) + if (drvdata->trfcr) drvdata->save_trfcr = read_trfcr(); /* * Save and restore the ETM Trace registers only if @@ -1782,7 +1811,7 @@ static void __etm4_cpu_restore(struct etmv4_drvdata *drvdata)
static void etm4_cpu_restore(struct etmv4_drvdata *drvdata) { - if (drvdata->trfc) + if (drvdata->trfcr) write_trfcr(drvdata->save_trfcr); if (drvdata->state_needs_restore) __etm4_cpu_restore(drvdata); diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h b/drivers/hwtracing/coresight/coresight-etm4x.h index 82cba16b73a6..724819592c2e 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x.h +++ b/drivers/hwtracing/coresight/coresight-etm4x.h @@ -919,7 +919,8 @@ struct etmv4_save_state { * @nooverflow: Indicate if overflow prevention is supported. * @atbtrig: If the implementation can support ATB triggers * @lpoverride: If the implementation can support low-power state over. - * @trfc: If the implementation supports Arm v8.4 trace filter controls. + * @trfcr: If the CPU supportfs FEAT_TRF, value of the TRFCR_ELx with + * trace allowed at user and kernel ELs. Otherwise, 0. * @config: structure holding configuration parameters. * @save_trfcr: Saved TRFCR_EL1 register during a CPU PM event. * @save_state: State to be preserved across power loss @@ -972,7 +973,7 @@ struct etmv4_drvdata { bool nooverflow; bool atbtrig; bool lpoverride; - bool trfc; + u64 trfcr; struct etmv4_config config; u64 save_trfcr; struct etmv4_save_state *save_state; diff --git a/drivers/hwtracing/coresight/coresight-self-hosted-trace.h b/drivers/hwtracing/coresight/coresight-self-hosted-trace.h index 53b35a28075e..586d26e0cba3 100644 --- a/drivers/hwtracing/coresight/coresight-self-hosted-trace.h +++ b/drivers/hwtracing/coresight/coresight-self-hosted-trace.h @@ -22,4 +22,11 @@ static inline void write_trfcr(u64 val) isb(); }
+static inline void cpu_prohibit_trace(void) +{ + u64 trfcr = read_trfcr(); + + /* Prohibit tracing at EL0 & the kernel EL */ + write_trfcr(trfcr & ~(TRFCR_ELx_ExTRE | TRFCR_ELx_E0TRE)); +} #endif /* __CORESIGHT_SELF_HOSTED_TRACE_H */
The ETM perf infrastructure closes out a handle during event_stop or on an error in starting the event. In either case, it is possible for a "sink" to update/close the handle, under certain circumstances. (e.g no space in ring buffer.). So, ensure that we handle this gracefully in the PMU driver by verifying the handle is still valid.
Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: Mike Leach mike.leach@linaro.org Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Leo Yan leo.yan@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- .../hwtracing/coresight/coresight-etm-perf.c | 27 ++++++++++++++++--- 1 file changed, 24 insertions(+), 3 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c index 6f398377fec9..a6ab603afee4 100644 --- a/drivers/hwtracing/coresight/coresight-etm-perf.c +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c @@ -450,8 +450,15 @@ static void etm_event_start(struct perf_event *event, int flags) fail_disable_path: coresight_disable_path(path); fail_end_stop: - perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED); - perf_aux_output_end(handle, 0); + /* + * Check if the handle is still associated with the event, + * to handle cases where if the sink failed to start the + * trace and TRUNCATED the handle already. + */ + if (READ_ONCE(handle->event)) { + perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED); + perf_aux_output_end(handle, 0); + } fail: event->hw.state = PERF_HES_STOPPED; goto out; @@ -519,7 +526,21 @@ static void etm_event_stop(struct perf_event *event, int mode)
size = sink_ops(sink)->update_buffer(sink, handle, event_data->snk_config); - perf_aux_output_end(handle, size); + /* + * Make sure the handle is still valid as the + * sink could have closed it from an IRQ. + * The sink driver must handle the race with + * update_buffer() and IRQ. Thus either we + * should get a valid handle and valid size + * (which may be 0). + * + * But we should never get a non-zero size with + * an invalid handle. + */ + if (READ_ONCE(handle->event)) + perf_aux_output_end(handle, size); + else + WARN_ON(size); }
/* Disabling the path make its elements available to other sessions */
When the TRBE is stopped on truncating an event, we may not set the FORMAT flag, even though the size of the record is 0. Let us be consistent and not confuse the user. Always set the format flag for TRBE generated records.
Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Leo Yan leo.yan@linaro.org Cc: Mike Leach mike.leach@linaro.org Cc: Mathieu Poirier mathieu.poirier@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- drivers/hwtracing/coresight/coresight-trbe.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c index 176868496879..446f080f8320 100644 --- a/drivers/hwtracing/coresight/coresight-trbe.c +++ b/drivers/hwtracing/coresight/coresight-trbe.c @@ -132,7 +132,8 @@ static void trbe_stop_and_truncate_event(struct perf_output_handle *handle) * the update_buffer() to return a 0 size. */ trbe_drain_and_disable_local(); - perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED); + perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED | + PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW); *this_cpu_ptr(buf->cpudata->drvdata->handle) = NULL; }
We mark the buffer as TRUNCATED when there is no space left in the buffer. But we do it at different points. __trbe_normal_offset() and also, at all the callers of the above function via compute_trbe_buffer_limit(), when the limit == base (i.e offset = 0 as returned by the __trbe_normal_offset()).
So, given that the callers already mark the buffer as TRUNCATED drop the caller inside the __trbe_normal_offset().
This is in preparation to moving the handling of TRUNCATED into a central place.
Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: Mike Leach mike.leach@linaro.org Cc: Leo Yan leo.yan@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- drivers/hwtracing/coresight/coresight-trbe.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c index 446f080f8320..62e1a08f73ff 100644 --- a/drivers/hwtracing/coresight/coresight-trbe.c +++ b/drivers/hwtracing/coresight/coresight-trbe.c @@ -253,13 +253,9 @@ static unsigned long __trbe_normal_offset(struct perf_output_handle *handle) * trbe_base trbe_base + nr_pages * * Perf aux buffer does not have any space for the driver to write into. - * Just communicate trace truncation event to the user space by marking - * it with PERF_AUX_FLAG_TRUNCATED. */ - if (!handle->size) { - perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED); + if (!handle->size) return 0; - }
/* Compute the tail and wakeup indices now that we've aligned head */ tail = PERF_IDX2OFF(handle->head + handle->size, buf); @@ -361,7 +357,6 @@ static unsigned long __trbe_normal_offset(struct perf_output_handle *handle) return limit;
trbe_pad_buf(handle, handle->size); - perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED); return 0; }
On a spurious IRQ, right now we disable the TRBE and then re-enable it back, resetting the "buffer" pointers(i.e BASE, LIMIT and more importantly WRITE) to the original pointers from the AUX handle. This implies that we overwrite any trace that was written so far, (by overwriting TRBPTR) while we should have ignored the IRQ.
This patch cleans the behavior, by only stopping the TRBE if the IRQ was indeed raised, as we can read the TRBSR without stopping the TRBE (Only writes to the TRBSR requires the TRBE disabled). And also, on detecting a spurious IRQ after examining the TRBSR, we simply re-enable the TRBE without touching the other parameters.
Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: Mike Leach mike.leach@linaro.org Cc: Leo Yan leo.yan@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- drivers/hwtracing/coresight/coresight-trbe.c | 29 ++++++++++---------- 1 file changed, 15 insertions(+), 14 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c index 62e1a08f73ff..503bea0137ae 100644 --- a/drivers/hwtracing/coresight/coresight-trbe.c +++ b/drivers/hwtracing/coresight/coresight-trbe.c @@ -679,15 +679,16 @@ static int arm_trbe_disable(struct coresight_device *csdev)
static void trbe_handle_spurious(struct perf_output_handle *handle) { - struct trbe_buf *buf = etm_perf_sink_config(handle); + u64 limitr = read_sysreg_s(SYS_TRBLIMITR_EL1);
- buf->trbe_limit = compute_trbe_buffer_limit(handle); - buf->trbe_write = buf->trbe_base + PERF_IDX2OFF(handle->head, buf); - if (buf->trbe_limit == buf->trbe_base) { - trbe_drain_and_disable_local(); - return; - } - trbe_enable_hw(buf); + /* + * If the IRQ was spurious, simply re-enable the TRBE + * back without modifiying the buffer parameters to + * retain the trace collected so far. + */ + limitr |= TRBLIMITR_ENABLE; + write_sysreg_s(limitr, SYS_TRBLIMITR_EL1); + isb(); }
static void trbe_handle_overflow(struct perf_output_handle *handle) @@ -760,12 +761,7 @@ static irqreturn_t arm_trbe_irq_handler(int irq, void *dev) enum trbe_fault_action act; u64 status;
- /* - * Ensure the trace is visible to the CPUs and - * any external aborts have been resolved. - */ - trbe_drain_and_disable_local(); - + /* Reads to TRBSR_EL1 is fine when TRBE is active */ status = read_sysreg_s(SYS_TRBSR_EL1); /* * If the pending IRQ was handled by update_buffer callback @@ -774,6 +770,11 @@ static irqreturn_t arm_trbe_irq_handler(int irq, void *dev) if (!is_trbe_irq(status)) return IRQ_NONE;
+ /* + * Ensure the trace is visible to the CPUs and + * any external aborts have been resolved. + */ + trbe_drain_and_disable_local(); clr_trbe_irq(); isb();
The TRBE driver marks the AUX buffer as TRUNCATED when we get an IRQ on FILL event. This has rather unwanted side-effect of the event being disabled when there may be more space in the ring buffer.
So, instead of TRUNCATE we need a different flag to indicate that the trace may have lost a few bytes (i.e from the point of generating the FILL event until the IRQ is consumed). Anyways, the userspace must use the size from RECORD_AUX headers to restrict the "trace" decoding.
Using PARTIAL flag causes the perf tool to generate the following warning:
Warning: AUX data had gaps in it XX times out of YY!
Are you running a KVM guest in the background?
which is pointlessly scary for a user. The other remaining options are : - COLLISION - Use by SPE to indicate samples collided - Add a new flag - Specifically for CoreSight, doesn't sound so good, if we can re-use something.
Given that we don't already use the "COLLISION" flag, the above behavior can be notified using this flag for CoreSight.
Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: James Clark james.clark@arm.com Cc: Mike Leach mike.leach@linaro.org Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Leo Yan leo.yan@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- drivers/hwtracing/coresight/coresight-trbe.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c index 503bea0137ae..d50f142e86d1 100644 --- a/drivers/hwtracing/coresight/coresight-trbe.c +++ b/drivers/hwtracing/coresight/coresight-trbe.c @@ -615,7 +615,7 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev, * for correct size. Also, mark the buffer truncated. */ write = get_trbe_limit_pointer(); - perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED); + perf_aux_output_flag(handle, PERF_AUX_FLAG_COLLISION); }
offset = write - base; @@ -708,7 +708,7 @@ static void trbe_handle_overflow(struct perf_output_handle *handle) * collection upon the WRAP event, without stopping the source. */ perf_aux_output_flag(handle, PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW | - PERF_AUX_FLAG_TRUNCATED); + PERF_AUX_FLAG_COLLISION); perf_aux_output_end(handle, size); event_data = perf_aux_output_begin(handle, event); if (!event_data) {
Hi Suzuki,
On Fri, 23 Jul 2021 at 13:46, Suzuki K Poulose suzuki.poulose@arm.com wrote:
The TRBE driver marks the AUX buffer as TRUNCATED when we get an IRQ on FILL event. This has rather unwanted side-effect of the event being disabled when there may be more space in the ring buffer.
So, instead of TRUNCATE we need a different flag to indicate that the trace may have lost a few bytes (i.e from the point of generating the FILL event until the IRQ is consumed). Anyways, the userspace must use the size from RECORD_AUX headers to restrict the "trace" decoding.
Using PARTIAL flag causes the perf tool to generate the following warning:
Warning: AUX data had gaps in it XX times out of YY!
Are you running a KVM guest in the background?
which is pointlessly scary for a user. The other remaining options are :
- COLLISION - Use by SPE to indicate samples collided
- Add a new flag - Specifically for CoreSight, doesn't sound so good, if we can re-use something.
What is the user visible behaviour when using COLLISION? The TRUNCATE warning is at least accurate - even if the KVM thing is something of a red herring. It is easier to explain a "scary" warning, than try to debug someones problems if perf is silent or misleading when using the COLLISION flag.
Regards
Mike
Given that we don't already use the "COLLISION" flag, the above behavior can be notified using this flag for CoreSight.
Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: James Clark james.clark@arm.com Cc: Mike Leach mike.leach@linaro.org Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Leo Yan leo.yan@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
drivers/hwtracing/coresight/coresight-trbe.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c index 503bea0137ae..d50f142e86d1 100644 --- a/drivers/hwtracing/coresight/coresight-trbe.c +++ b/drivers/hwtracing/coresight/coresight-trbe.c @@ -615,7 +615,7 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev, * for correct size. Also, mark the buffer truncated. */ write = get_trbe_limit_pointer();
perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
perf_aux_output_flag(handle, PERF_AUX_FLAG_COLLISION); } offset = write - base;
@@ -708,7 +708,7 @@ static void trbe_handle_overflow(struct perf_output_handle *handle) * collection upon the WRAP event, without stopping the source. */ perf_aux_output_flag(handle, PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW |
PERF_AUX_FLAG_TRUNCATED);
PERF_AUX_FLAG_COLLISION); perf_aux_output_end(handle, size); event_data = perf_aux_output_begin(handle, event); if (!event_data) {
-- 2.24.1
Hi Mike,
On 26/07/2021 13:34, Mike Leach wrote:
Hi Suzuki,
On Fri, 23 Jul 2021 at 13:46, Suzuki K Poulose suzuki.poulose@arm.com wrote:
The TRBE driver marks the AUX buffer as TRUNCATED when we get an IRQ on FILL event. This has rather unwanted side-effect of the event being disabled when there may be more space in the ring buffer.
So, instead of TRUNCATE we need a different flag to indicate that the trace may have lost a few bytes (i.e from the point of generating the FILL event until the IRQ is consumed). Anyways, the userspace must use the size from RECORD_AUX headers to restrict the "trace" decoding.
Using PARTIAL flag causes the perf tool to generate the following warning:
Warning: AUX data had gaps in it XX times out of YY!
Are you running a KVM guest in the background?
which is pointlessly scary for a user. The other remaining options are :
- COLLISION - Use by SPE to indicate samples collided
- Add a new flag - Specifically for CoreSight, doesn't sound so good, if we can re-use something.
What is the user visible behaviour when using COLLISION?
If you meant a Warning from the perf tool (similar to TRUNCATE or PARTIAL), the answer is none. We could add one in the perf tool if you think this is necessary.
The TRUNCATE warning is at least accurate - even if the KVM thing is something of a red herring.
It is easier to explain a "scary" warning, than try to debug someones problems if perf is silent or misleading when using the COLLISION flag.
The RECORD_AUX still has this flag. So, if someone really wanted to know how many times the TRBE fired the IRQ and thus potentially lost a few bytes of the trace, they could always look at this.
Definitely this is not something similar to "TRUNCATED", which we realized the hard way, nor the PARTIAL. But the perf tool could report something similar. Please remember that the perf tool always uses the "size" field from the RECORD_AUX to limit the trace decoding.
So, I am not sure how this could create new problems.
Suzuki
Regards
Mike
Given that we don't already use the "COLLISION" flag, the above behavior can be notified using this flag for CoreSight.
Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: James Clark james.clark@arm.com Cc: Mike Leach mike.leach@linaro.org Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Leo Yan leo.yan@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
drivers/hwtracing/coresight/coresight-trbe.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c index 503bea0137ae..d50f142e86d1 100644 --- a/drivers/hwtracing/coresight/coresight-trbe.c +++ b/drivers/hwtracing/coresight/coresight-trbe.c @@ -615,7 +615,7 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev, * for correct size. Also, mark the buffer truncated. */ write = get_trbe_limit_pointer();
perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
perf_aux_output_flag(handle, PERF_AUX_FLAG_COLLISION); } offset = write - base;
@@ -708,7 +708,7 @@ static void trbe_handle_overflow(struct perf_output_handle *handle) * collection upon the WRAP event, without stopping the source. */ perf_aux_output_flag(handle, PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW |
PERF_AUX_FLAG_TRUNCATED);
PERF_AUX_FLAG_COLLISION); perf_aux_output_end(handle, size); event_data = perf_aux_output_begin(handle, event); if (!event_data) {
-- 2.24.1
HI Suzuki,
On Mon, 26 Jul 2021 at 17:01, Suzuki K Poulose suzuki.poulose@arm.com wrote:
Hi Mike,
On 26/07/2021 13:34, Mike Leach wrote:
Hi Suzuki,
On Fri, 23 Jul 2021 at 13:46, Suzuki K Poulose suzuki.poulose@arm.com wrote:
The TRBE driver marks the AUX buffer as TRUNCATED when we get an IRQ on FILL event. This has rather unwanted side-effect of the event being disabled when there may be more space in the ring buffer.
So, instead of TRUNCATE we need a different flag to indicate that the trace may have lost a few bytes (i.e from the point of generating the FILL event until the IRQ is consumed). Anyways, the userspace must use the size from RECORD_AUX headers to restrict the "trace" decoding.
Using PARTIAL flag causes the perf tool to generate the following warning:
Warning: AUX data had gaps in it XX times out of YY!
Are you running a KVM guest in the background?
which is pointlessly scary for a user. The other remaining options are :
- COLLISION - Use by SPE to indicate samples collided
- Add a new flag - Specifically for CoreSight, doesn't sound so good, if we can re-use something.
What is the user visible behaviour when using COLLISION?
If you meant a Warning from the perf tool (similar to TRUNCATE or PARTIAL), the answer is none. We could add one in the perf tool if you think this is necessary.
I do - the problem is that we have replaced a visible warning with a silent failure.
While we agree that the side effects of TRUNCATE mean it unfeasible as a solution here - at least the PARTIAL message does give some indication. The average perf user is going to rely on the output from the tool - if there is no warning they will assume all is good, but they have possible non-contiguous trace and no indication of such.
Since we are using a collision flag in a particular context - i.e. coresight trace - we have the chance to provide an appropriate message for this context.
The TRUNCATE warning is at least accurate - even if the KVM thing is something of a red herring.
Sorry - I meant PARTIAL here - but the comment stands otherwise.
It is easier to explain a "scary" warning, than try to debug someones problems if perf is silent or misleading when using the COLLISION flag.
The RECORD_AUX still has this flag. So, if someone really wanted to know how many times the TRBE fired the IRQ and thus potentially lost a few bytes of the trace, they could always look at this.
They could - but how would they know that they needed to - what indicators would they have that the trace was not continuous? The point of the perf tool is that it presents an accurate picture to the user, based on the data collected. Most users aren't going to start digging into the intricacies of the perf data file formats and nor should they have to.
Definitely this is not something similar to "TRUNCATED", which we realized the hard way, nor the PARTIAL. But the perf tool could report something similar. Please remember that the perf tool always uses the "size" field from the RECORD_AUX to limit the trace decoding.
So, I am not sure how this could create new problems.
There is no issue with decode - but if a user is investigating a problem using trace, they need to be aware that some trace might be dropped. That way they can take mitigating action.
Regards
Mike
Suzuki
Regards
Mike
Given that we don't already use the "COLLISION" flag, the above behavior can be notified using this flag for CoreSight.
Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: James Clark james.clark@arm.com Cc: Mike Leach mike.leach@linaro.org Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Leo Yan leo.yan@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
drivers/hwtracing/coresight/coresight-trbe.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c index 503bea0137ae..d50f142e86d1 100644 --- a/drivers/hwtracing/coresight/coresight-trbe.c +++ b/drivers/hwtracing/coresight/coresight-trbe.c @@ -615,7 +615,7 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev, * for correct size. Also, mark the buffer truncated. */ write = get_trbe_limit_pointer();
perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
perf_aux_output_flag(handle, PERF_AUX_FLAG_COLLISION); } offset = write - base;
@@ -708,7 +708,7 @@ static void trbe_handle_overflow(struct perf_output_handle *handle) * collection upon the WRAP event, without stopping the source. */ perf_aux_output_flag(handle, PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW |
PERF_AUX_FLAG_TRUNCATED);
PERF_AUX_FLAG_COLLISION); perf_aux_output_end(handle, size); event_data = perf_aux_output_begin(handle, event); if (!event_data) {
-- 2.24.1
On 27/07/2021 11:46, Mike Leach wrote:
HI Suzuki,
On Mon, 26 Jul 2021 at 17:01, Suzuki K Poulose suzuki.poulose@arm.com wrote:
Hi Mike,
On 26/07/2021 13:34, Mike Leach wrote:
Hi Suzuki,
On Fri, 23 Jul 2021 at 13:46, Suzuki K Poulose suzuki.poulose@arm.com wrote:
The TRBE driver marks the AUX buffer as TRUNCATED when we get an IRQ on FILL event. This has rather unwanted side-effect of the event being disabled when there may be more space in the ring buffer.
So, instead of TRUNCATE we need a different flag to indicate that the trace may have lost a few bytes (i.e from the point of generating the FILL event until the IRQ is consumed). Anyways, the userspace must use the size from RECORD_AUX headers to restrict the "trace" decoding.
Using PARTIAL flag causes the perf tool to generate the following warning:
Warning: AUX data had gaps in it XX times out of YY! Are you running a KVM guest in the background?
which is pointlessly scary for a user. The other remaining options are : - COLLISION - Use by SPE to indicate samples collided - Add a new flag - Specifically for CoreSight, doesn't sound so good, if we can re-use something.
What is the user visible behaviour when using COLLISION?
If you meant a Warning from the perf tool (similar to TRUNCATE or PARTIAL), the answer is none. We could add one in the perf tool if you think this is necessary.
I do - the problem is that we have replaced a visible warning with a silent failure.
While we agree that the side effects of TRUNCATE mean it unfeasible as a solution here - at least the PARTIAL message does give some indication. The average perf user is going to rely on the output from the tool - if there is no warning they will assume all is good, but they have possible non-contiguous trace and no indication of such.
Since we are using a collision flag in a particular context - i.e. coresight trace - we have the chance to provide an appropriate message for this context.
The TRUNCATE warning is at least accurate - even if the KVM thing is something of a red herring.
Sorry - I meant PARTIAL here - but the comment stands otherwise.
It is easier to explain a "scary" warning, than try to debug someones problems if perf is silent or misleading when using the COLLISION flag.
The RECORD_AUX still has this flag. So, if someone really wanted to know how many times the TRBE fired the IRQ and thus potentially lost a few bytes of the trace, they could always look at this.
They could - but how would they know that they needed to - what indicators would they have that the trace was not continuous? The point of the perf tool is that it presents an accurate picture to the user, based on the data collected. Most users aren't going to start digging into the intricacies of the perf data file formats and nor should they have to.
Definitely this is not something similar to "TRUNCATED", which we realized the hard way, nor the PARTIAL. But the perf tool could report something similar. Please remember that the perf tool always uses the "size" field from the RECORD_AUX to limit the trace decoding.
So, I am not sure how this could create new problems.
There is no issue with decode - but if a user is investigating a problem using trace, they need to be aware that some trace might be dropped. That way they can take mitigating action.
Agreed. This is something that can be done by the perf tool.
Kind regards Suzuki
On 07/27/2021 02:06 PM, Suzuki K Poulose wrote:
On 27/07/2021 11:46, Mike Leach wrote:
HI Suzuki,
On Mon, 26 Jul 2021 at 17:01, Suzuki K Poulose suzuki.poulose@arm.com wrote:
Hi Mike,
On 26/07/2021 13:34, Mike Leach wrote:
Hi Suzuki,
On Fri, 23 Jul 2021 at 13:46, Suzuki K Poulose suzuki.poulose@arm.com wrote:
The TRBE driver marks the AUX buffer as TRUNCATED when we get an IRQ on FILL event. This has rather unwanted side-effect of the event being disabled when there may be more space in the ring buffer.
So, instead of TRUNCATE we need a different flag to indicate that the trace may have lost a few bytes (i.e from the point of generating the FILL event until the IRQ is consumed). Anyways, the userspace must use the size from RECORD_AUX headers to restrict the "trace" decoding.
Using PARTIAL flag causes the perf tool to generate the following warning:
Warning: AUX data had gaps in it XX times out of YY!
Are you running a KVM guest in the background?
which is pointlessly scary for a user. The other remaining options are : - COLLISION - Use by SPE to indicate samples collided - Add a new flag - Specifically for CoreSight, doesn't sound so good, if we can re-use something.
What is the user visible behaviour when using COLLISION?
If you meant a Warning from the perf tool (similar to TRUNCATE or PARTIAL), the answer is none. We could add one in the perf tool if you think this is necessary.
I do - the problem is that we have replaced a visible warning with a silent failure.
While we agree that the side effects of TRUNCATE mean it unfeasible as a solution here - at least the PARTIAL message does give some indication. The average perf user is going to rely on the output from the tool - if there is no warning they will assume all is good, but they have possible non-contiguous trace and no indication of such.
Since we are using a collision flag in a particular context - i.e. coresight trace - we have the chance to provide an appropriate message for this context.
The TRUNCATE warning is at least accurate - even if the KVM thing is something of a red herring.
Sorry - I meant PARTIAL here - but the comment stands otherwise.
It is easier to explain a "scary" warning, than try to debug someones problems if perf is silent or misleading when using the COLLISION flag.
The RECORD_AUX still has this flag. So, if someone really wanted to know how many times the TRBE fired the IRQ and thus potentially lost a few bytes of the trace, they could always look at this.
They could - but how would they know that they needed to - what indicators would they have that the trace was not continuous? The point of the perf tool is that it presents an accurate picture to the user, based on the data collected. Most users aren't going to start digging into the intricacies of the perf data file formats and nor should they have to.
Definitely this is not something similar to "TRUNCATED", which we realized the hard way, nor the PARTIAL. But the perf tool could report something similar. Please remember that the perf tool always uses the "size" field from the RECORD_AUX to limit the trace decoding.
So, I am not sure how this could create new problems.
There is no issue with decode - but if a user is investigating a problem using trace, they need to be aware that some trace might be dropped. That way they can take mitigating action.
Agreed. This is something that can be done by the perf tool.
fyi, I have posted a patch to do this here :
[0] https://lkml.kernel.org/r/20210728091219.527886-1-suzuki.poulose@arm.com
Suzuki
Kind regards Suzuki
Unify the sequence of enabling the TRBE. We do this from event_start and also from the TRBE IRQ handler. Lets move this to a common helper. The only minor functional change is returning an error when we fail to enable the TRBE. This should be handled already.
Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: Mike Leach mike.leach@linaro.org Cc: Leo Yan leo.yan@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- drivers/hwtracing/coresight/coresight-trbe.c | 34 +++++++++++--------- 1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c index d50f142e86d1..6d6aad171c72 100644 --- a/drivers/hwtracing/coresight/coresight-trbe.c +++ b/drivers/hwtracing/coresight/coresight-trbe.c @@ -632,6 +632,20 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev, return size; }
+static int __arm_trbe_enable(struct trbe_buf *buf, + struct perf_output_handle *handle) +{ + buf->trbe_limit = compute_trbe_buffer_limit(handle); + buf->trbe_write = buf->trbe_base + PERF_IDX2OFF(handle->head, buf); + if (buf->trbe_limit == buf->trbe_base) { + trbe_stop_and_truncate_event(handle); + return -ENOSPC; + } + *this_cpu_ptr(buf->cpudata->drvdata->handle) = handle; + trbe_enable_hw(buf); + return 0; +} + static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data) { struct trbe_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); @@ -648,14 +662,8 @@ static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data) cpudata->buf = buf; cpudata->mode = mode; buf->cpudata = cpudata; - buf->trbe_limit = compute_trbe_buffer_limit(handle); - buf->trbe_write = buf->trbe_base + PERF_IDX2OFF(handle->head, buf); - if (buf->trbe_limit == buf->trbe_base) { - trbe_stop_and_truncate_event(handle); - return 0; - } - trbe_enable_hw(buf); - return 0; + + return __arm_trbe_enable(buf, handle); }
static int arm_trbe_disable(struct coresight_device *csdev) @@ -722,14 +730,8 @@ static void trbe_handle_overflow(struct perf_output_handle *handle) *this_cpu_ptr(buf->cpudata->drvdata->handle) = NULL; return; } - buf->trbe_limit = compute_trbe_buffer_limit(handle); - buf->trbe_write = buf->trbe_base + PERF_IDX2OFF(handle->head, buf); - if (buf->trbe_limit == buf->trbe_base) { - trbe_stop_and_truncate_event(handle); - return; - } - *this_cpu_ptr(buf->cpudata->drvdata->handle) = handle; - trbe_enable_hw(buf); + + __arm_trbe_enable(buf, handle); }
static bool is_perf_trbe(struct perf_output_handle *handle)
When we detect that there isn't enough space left to start a meaningful session, we disable the TRBE, marking the buffer as TRUNCATED. But we delay the notification to the perf layer by perf_aux_output_end() until the event is scheduled out. This will cause significant black outs in the trace. Now that the CoreSight PMU layer can handle a closed "AUX" handle properly, we can close the handle as soon as we detect the case, allowing the userspace to collect and re-enable the event.
Also, while in the IRQ handler, move the irq_work_run() after we have updated the handle, to make sure the "TRUNCATED" flag causes the event to be disabled as soon as possible.
Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: Mike Leach mike.leach@linaro.org Cc: Leo Yan leo.yan@linaro.org Cc: Peter Zijlstra (Intel) peterz@infradead.org Cc: Will Deacon will@kernel.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- drivers/hwtracing/coresight/coresight-trbe.c | 25 ++++++++++++-------- 1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c index 6d6aad171c72..e7567727e8fc 100644 --- a/drivers/hwtracing/coresight/coresight-trbe.c +++ b/drivers/hwtracing/coresight/coresight-trbe.c @@ -134,6 +134,7 @@ static void trbe_stop_and_truncate_event(struct perf_output_handle *handle) trbe_drain_and_disable_local(); perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED | PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW); + perf_aux_output_end(handle, 0); *this_cpu_ptr(buf->cpudata->drvdata->handle) = NULL; }
@@ -699,7 +700,7 @@ static void trbe_handle_spurious(struct perf_output_handle *handle) isb(); }
-static void trbe_handle_overflow(struct perf_output_handle *handle) +static int trbe_handle_overflow(struct perf_output_handle *handle) { struct perf_event *event = handle->event; struct trbe_buf *buf = etm_perf_sink_config(handle); @@ -728,10 +729,10 @@ static void trbe_handle_overflow(struct perf_output_handle *handle) */ trbe_drain_and_disable_local(); *this_cpu_ptr(buf->cpudata->drvdata->handle) = NULL; - return; + return -EINVAL; }
- __arm_trbe_enable(buf, handle); + return __arm_trbe_enable(buf, handle); }
static bool is_perf_trbe(struct perf_output_handle *handle) @@ -762,6 +763,7 @@ static irqreturn_t arm_trbe_irq_handler(int irq, void *dev) struct perf_output_handle *handle = *handle_ptr; enum trbe_fault_action act; u64 status; + bool truncated = false;
/* Reads to TRBSR_EL1 is fine when TRBE is active */ status = read_sysreg_s(SYS_TRBSR_EL1); @@ -786,24 +788,27 @@ static irqreturn_t arm_trbe_irq_handler(int irq, void *dev) if (!is_perf_trbe(handle)) return IRQ_NONE;
- /* - * Ensure perf callbacks have completed, which may disable - * the trace buffer in response to a TRUNCATION flag. - */ - irq_work_run(); - act = trbe_get_fault_act(status); switch (act) { case TRBE_FAULT_ACT_WRAP: - trbe_handle_overflow(handle); + truncated = !!trbe_handle_overflow(handle); break; case TRBE_FAULT_ACT_SPURIOUS: trbe_handle_spurious(handle); break; case TRBE_FAULT_ACT_FATAL: trbe_stop_and_truncate_event(handle); + truncated = true; break; } + + /* + * If the buffer was truncated, ensure perf callbacks + * have completed, which will disable the event. + */ + if (truncated) + irq_work_run(); + return IRQ_HANDLED; }
We must prohibit the CPU from tracing before we disable the TRBE and only re-enable it when we are sure the TRBE has been enabled back. Otherwise, leave the CPU in prohibited state.
Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: Mike Leach mike.leach@linaro.org Cc: Leo Yan leo.yan@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- .../hwtracing/coresight/coresight-self-hosted-trace.h | 4 +++- drivers/hwtracing/coresight/coresight-trbe.c | 9 +++++++++ 2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/hwtracing/coresight/coresight-self-hosted-trace.h b/drivers/hwtracing/coresight/coresight-self-hosted-trace.h index 586d26e0cba3..34372482a3cd 100644 --- a/drivers/hwtracing/coresight/coresight-self-hosted-trace.h +++ b/drivers/hwtracing/coresight/coresight-self-hosted-trace.h @@ -22,11 +22,13 @@ static inline void write_trfcr(u64 val) isb(); }
-static inline void cpu_prohibit_trace(void) +static inline u64 cpu_prohibit_trace(void) { u64 trfcr = read_trfcr();
/* Prohibit tracing at EL0 & the kernel EL */ write_trfcr(trfcr & ~(TRFCR_ELx_ExTRE | TRFCR_ELx_E0TRE)); + /* Return the original value of the TRFCR */ + return trfcr; } #endif /* __CORESIGHT_SELF_HOSTED_TRACE_H */ diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c index e7567727e8fc..b8586c170889 100644 --- a/drivers/hwtracing/coresight/coresight-trbe.c +++ b/drivers/hwtracing/coresight/coresight-trbe.c @@ -16,6 +16,7 @@ #define pr_fmt(fmt) DRVNAME ": " fmt
#include <asm/barrier.h> +#include "coresight-self-hosted-trace.h" #include "coresight-trbe.h"
#define PERF_IDX2OFF(idx, buf) ((idx) % ((buf)->nr_pages << PAGE_SHIFT)) @@ -764,6 +765,7 @@ static irqreturn_t arm_trbe_irq_handler(int irq, void *dev) enum trbe_fault_action act; u64 status; bool truncated = false; + u64 trfcr;
/* Reads to TRBSR_EL1 is fine when TRBE is active */ status = read_sysreg_s(SYS_TRBSR_EL1); @@ -774,6 +776,8 @@ static irqreturn_t arm_trbe_irq_handler(int irq, void *dev) if (!is_trbe_irq(status)) return IRQ_NONE;
+ /* Prohibit the CPU from tracing before we disable the TRBE */ + trfcr = cpu_prohibit_trace(); /* * Ensure the trace is visible to the CPUs and * any external aborts have been resolved. @@ -805,9 +809,14 @@ static irqreturn_t arm_trbe_irq_handler(int irq, void *dev) /* * If the buffer was truncated, ensure perf callbacks * have completed, which will disable the event. + * + * Otherwise, restore the trace filter controls to + * allow the tracing. */ if (truncated) irq_work_run(); + else + write_trfcr(trfcr);
return IRQ_HANDLED; }
On 23/07/2021 13:46, Suzuki K Poulose wrote:
This series fixes the following issues with the TRBE and Self-Hosted trace for CoreSight.
The Self-hosted trace filter control registers are now save-restored across CPU PM event. And more importantly the Trace Filtering is now used to control per ETM session (rather than allowing the trace throughout the life time of the system). i.e, ETM configuration of the given run is used to enforce trace filtering (TRFCR) along with the Trace Exclusion controls in TRCVICTLR.
For the TRBE, we were using the TRUNCATED flag in the AUX buffer on every IRQ to indicate that we may have lost a few bytes of trace. But this causes the event to be disabled until the userspace re-enables it back, even when there is space left in the ring buffer. To make things worse, we were restarting the AUX handle, which would soon be disabled, potentially creating 0 sized records (without truncation), which the perf tool tends to ignore. This might cause the event to be disabled permanently. Also, sometimes we leave the buffer TRUNCATED, but delay the closing of the handle to event schedule out, which could cause significant black out in the trace capture. This was reported by Tamas Zsoldos.
This series removes the use of TRUNCATED flag for every IRQ. Instead, it is only used if we really run out of space in the buffer. And also we make sure the "handle" is closed immediately on TRUNCATED case, which triggers the userspace to take action. The core perf layer has been hardened to handle this case where a "handle" is closed out. Finally, we make sure that the CPU trace is prohibited, when the TRBE is left disabled. The ETE/ETM driver will program the Trace Filtering appropriately since we do this dynamically now with the first half of the series.
The series is also available here :
https://git.gitlab.arm.com/linux-arm/linux-skp.git coresight/trbe-self-hosted-fixes/v2
Suzuki