Hi Tao,
On Thu, Aug 19, 2021 at 05:29:37PM +0800, Tao Zhang wrote:
> The input parameter of the function pm_runtime_put() should be the
> same in the functions cti_enable_hw() and cti_disable_hw(). The
> correct parameter to use here is dev->parent.
>
> Signed-off-by: Tao Zhang <quic_taozha(a)quicinc.com>
> ---
> drivers/hwtracing/coresight/coresight-cti-core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-cti-core.c b/drivers/hwtracing/coresight/coresight-cti-core.c
> index e2a3620..8988b2e 100644
> --- a/drivers/hwtracing/coresight/coresight-cti-core.c
> +++ b/drivers/hwtracing/coresight/coresight-cti-core.c
> @@ -175,7 +175,7 @@ static int cti_disable_hw(struct cti_drvdata *drvdata)
> coresight_disclaim_device_unlocked(csdev);
> CS_LOCK(drvdata->base);
> spin_unlock(&drvdata->spinlock);
> - pm_runtime_put(dev);
> + pm_runtime_put(dev->parent);
coresight_register() allocates the data structure 'coresight_device'
and assigns the probed device to the field
'coresight_device::dev::parent'; thus afterwards we need to pass
'coresight_device::dev::parent' to the pm_runtime_xxx() functions.
This is not intuitive, so I am noting it here for the record.
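To make the pairing concrete, here is a minimal sketch of the pattern
(simplified from coresight-cti-core.c; cti_do_enable() is a
hypothetical helper standing in for the real programming sequence):

    static int cti_enable_hw(struct cti_drvdata *drvdata)
    {
            struct device *dev = &drvdata->csdev->dev;
            int rc;

            /*
             * dev is the coresight_device's embedded device; its
             * parent is the device that was actually probed and owns
             * the runtime-PM state, so every pm_runtime_*() call must
             * target dev->parent.
             */
            pm_runtime_get_sync(dev->parent);

            rc = cti_do_enable(drvdata);
            if (rc)
                    pm_runtime_put(dev->parent);   /* balance on failure */

            return rc;
    }

With this patch, cti_disable_hw() balances the get above with
pm_runtime_put(dev->parent) on the same struct device.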
For the patch:
Reviewed-by: Leo Yan <leo.yan(a)linaro.org>
We could wait for Mike to review as well.
Thanks,
Leo
> return 0;
>
> /* not disabled this call */
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
>
The AUX bounce buffer is allocated with the API dma_alloc_coherent();
the low-level architecture code, e.g. for Arm64, maps the memory with
the attribute "Normal non-cacheable", which can be concluded from the
definition of pgprot_dmacoherent() in arch/arm64/include/asm/pgtable.h.
Since the memory mapping is non-cacheable, later accesses to the AUX
bounce buffer are inefficient: every load instruction must fetch its
data all the way from DRAM.
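For reference, the arm64 definition (quoted from
arch/arm64/include/asm/pgtable.h as of the v5.x kernels; MT_NORMAL_NC
is the "Normal non-cacheable" memory type):

    #define pgprot_dmacoherent(prot) \
            __pgprot_modify(prot, PTE_ATTRINDX_MASK, \
                            PTE_ATTRINDX(MT_NORMAL_NC) | PTE_PXN | PTE_UXN)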
This patch changes the allocation to alloc_pages_node(), so the driver
can access the memory through a cacheable mapping in the kernel linear
virtual address range; because load instructions can then fetch data
from cache lines rather than always reading from DRAM, the driver's
memory copying performance improves. With the cacheable mapping, the
driver uses dma_sync_single_for_cpu() to invalidate the relevant cache
lines before reading the bounce buffer, so that it does not read stale
trace data.
Measuring the duration of the function tmc_update_etr_buffer() with the
ftrace function_graph tracer shows a significant performance
improvement when copying 4MiB of data from the bounce buffer:
# echo tmc_etr_get_data_flat_buf > set_graph_notrace // avoid noise
# echo tmc_update_etr_buffer > set_graph_function
# echo function_graph > current_tracer
before:
# CPU DURATION FUNCTION CALLS
# | | | | | | |
2) | tmc_update_etr_buffer() {
...
2) # 8148.320 us | }
after:
# CPU DURATION FUNCTION CALLS
# | | | | | | |
2) | tmc_update_etr_buffer() {
...
2) # 2463.980 us | }
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
---
Changes from v2:
Sync the entire buffer in one go when the tracing wraps around
(Suzuki);
Added Suzuki's review tag.
Changes from v1:
Set "flat_buf->daddr" to 0 when fails to map DMA region; and dropped the
unexpected if condition change in tmc_etr_free_flat_buf().
.../hwtracing/coresight/coresight-tmc-etr.c | 47 ++++++++++++++++---
1 file changed, 40 insertions(+), 7 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 13fd1fc730ed..ac37e9376d2b 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -21,6 +21,7 @@
struct etr_flat_buf {
struct device *dev;
+ struct page *pages;
dma_addr_t daddr;
void *vaddr;
size_t size;
@@ -600,6 +601,7 @@ static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata,
{
struct etr_flat_buf *flat_buf;
struct device *real_dev = drvdata->csdev->dev.parent;
+ ssize_t aligned_size;
/* We cannot reuse existing pages for flat buf */
if (pages)
@@ -609,11 +611,18 @@ static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata,
if (!flat_buf)
return -ENOMEM;
- flat_buf->vaddr = dma_alloc_coherent(real_dev, etr_buf->size,
- &flat_buf->daddr, GFP_KERNEL);
- if (!flat_buf->vaddr) {
- kfree(flat_buf);
- return -ENOMEM;
+ aligned_size = PAGE_ALIGN(etr_buf->size);
+ flat_buf->pages = alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO,
+ get_order(aligned_size));
+ if (!flat_buf->pages)
+ goto fail_alloc_pages;
+
+ flat_buf->vaddr = page_address(flat_buf->pages);
+ flat_buf->daddr = dma_map_page(real_dev, flat_buf->pages, 0,
+ aligned_size, DMA_FROM_DEVICE);
+ if (dma_mapping_error(real_dev, flat_buf->daddr)) {
+ flat_buf->daddr = 0;
+ goto fail_dma_map_page;
}
flat_buf->size = etr_buf->size;
@@ -622,6 +631,12 @@ static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata,
etr_buf->mode = ETR_MODE_FLAT;
etr_buf->private = flat_buf;
return 0;
+
+fail_dma_map_page:
+ __free_pages(flat_buf->pages, get_order(aligned_size));
+fail_alloc_pages:
+ kfree(flat_buf);
+ return -ENOMEM;
}
static void tmc_etr_free_flat_buf(struct etr_buf *etr_buf)
@@ -630,15 +645,20 @@ static void tmc_etr_free_flat_buf(struct etr_buf *etr_buf)
if (flat_buf && flat_buf->daddr) {
struct device *real_dev = flat_buf->dev->parent;
+ ssize_t aligned_size = PAGE_ALIGN(etr_buf->size);
- dma_free_coherent(real_dev, flat_buf->size,
- flat_buf->vaddr, flat_buf->daddr);
+ dma_unmap_page(real_dev, flat_buf->daddr, aligned_size,
+ DMA_FROM_DEVICE);
+ __free_pages(flat_buf->pages, get_order(aligned_size));
}
kfree(flat_buf);
}
static void tmc_etr_sync_flat_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
{
+ struct etr_flat_buf *flat_buf = etr_buf->private;
+ struct device *real_dev = flat_buf->dev->parent;
+
/*
* Adjust the buffer to point to the beginning of the trace data
* and update the available trace data.
@@ -648,6 +668,19 @@ static void tmc_etr_sync_flat_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp)
etr_buf->len = etr_buf->size;
else
etr_buf->len = rwp - rrp;
+
+ /*
+ * The driver always starts tracing at the beginning of the buffer,
+ * the only reason why we would get a wrap around is when the buffer
+ * is full. Sync the entire buffer in one go for this case.
+ */
+ if (etr_buf->offset + etr_buf->len > etr_buf->size)
+ dma_sync_single_for_cpu(real_dev, flat_buf->daddr,
+ etr_buf->size, DMA_FROM_DEVICE);
+ else
+ dma_sync_single_for_cpu(real_dev,
+ flat_buf->daddr + etr_buf->offset,
+ etr_buf->len, DMA_FROM_DEVICE);
}
static ssize_t tmc_etr_get_data_flat_buf(struct etr_buf *etr_buf,
--
2.25.1
Hi Al and Leo,
> Hi Al,
>
> On Tue, Aug 24, 2021 at 02:55:07PM +0000, Al Grant wrote:
>
> [...]
>
> > > > > > > > 16 (0xF) should work for all silicon, as AXI allows burst
> > > > > > > > sizes up to 16.
> > > > > > > > So unless we've missed something, this is an implementation
> > > > > > > > non-compliance and we should not be penalising compliant
> > > > > > > > implementations by reducing the default burst size - the
> > > > > > > > question is how we can enable the upstream kernel to work
> > > > > > > > around the issue on this chip only, and to me that sounds
> > > > > > > > like it needs something that can be triggered by a setting
> > > > > > > > in DT/ACPI.
>
> [...]
>
> > > we can add an optional property like below:
> > >
> > > * Optional property for TMC:
> > >
> > > * arm,burst-size: burst size initiated by the TMC on the AXI master
> > >   interface. The burst size can be in the range [0..15]; the setting
> > >   supports from one data transfer per burst up to a maximum of 16
> > >   data transfers per burst. If this property is not set, the driver
> > >   will use 15 (16 data transfers per burst) as the default value.
> > >
> > > I don't think it's right to use the CoreSight ROM to read out the
> > > part number and producer number, and to set the burst size based
> > > on these numbers. The main reason is that this would add
> > > SoC-specific code into the driver; the DT binding exists precisely
> > > to decouple platform specifics from the driver code, so with a DT
> > > binding the driver stays as generic as possible.
> > >
> > > Using an errata workaround is also not the right thing to do.
> > > Based on the TMC spec (ARM DDI 0461B), the burst size on the AXI
> > > bus can differ between SoCs.
> >
> > Hi Leo, where are you reading that this is a property of the AXI bus?
>
> No, I cannot find any description in the TMC spec (ARM DDI 0461B)
> saying that AXICTL.WrBurstLen is a property of the AXI bus.
>
> After reading the register description for AXICTL (DDI 0461B 3.3.15),
> I see that I mixed up the concepts: AXICTL.WrBurstLen is an attribute
> of the TMC specifying the maximum number of data transfers it
> initiates per burst; it is not a property of the AXI bus itself.
>
> Sorry I introduced confusion.
>
> > There is a constraint that the burst size must not be greater than the
> > TMC buffer size (see DDI 0461B 3.3.1), but buffer size is indicated by
> > the TMC's MEMWIDTH register and the driver can determine that.
>
> DEVID.MEMWIDTH: The width of the memory interface databus;
> for ETB/ETF, DEVID.MEMWIDTH = 2 * (ATB data width);
> for ETR, DEVID.MEMWIDTH = ATB data width.
>
> AXICTL.WrBurstLen: The maximum number of data transfers that can
> occur within each burst.
>
> RSZ.RSZ: The size of the local RAM buffer (in 32-bit words).
>
> And there is a requirement relating the burst size and the TMC
> buffer size:
>
>   2 ^ (DEVID.MEMWIDTH - 2) * AXICTL.WrBurstLen <= RSZ.RSZ
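>
> For example (illustrative numbers only): with DEVID.MEMWIDTH = 4
> (128-bit memory interface, i.e. 2 ^ (4 - 2) = 4 words per beat) and
> a burst length of 16 transfers (AXICTL.WrBurstLen programmed as 0xF),
> one burst moves 4 * 16 = 64 words (256 bytes), so RSZ.RSZ must be at
> least 64.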
>
> > The situation I thought we were dealing with is where the TMC
> > presents a burst that the AXI spec says is valid, but the downstream
> > component in this SoC can't accept a burst of that size. Our
> > understanding is that this would be a non-compliance in the
> > downstream component - i.e. a failure to handle a valid AXI
> > transaction - and it would be appropriate to treat this as a
> > SoC-specific issue triggering a SoC-specific workaround. (The
> > alternative would be to represent the AXI topology in DT/ACPI and
> > then drive the workaround from knowing what the downstream
> > component is.)
>
> Okay, I see: the downstream component (e.g. a memory controller) has
> a hardware constraint, and it imposes that limitation back on the TMC
> burst size.
>
> The downstream component can be a memory controller or any other
> component. I understand your suggestion to use a DT device node to
> bind the TMC to its downstream device, so that from a property of the
> downstream device we can learn the constraint on the burst size.
>
> But here it's hard to define a unified property for downstream
> components. I checked the latest Device Tree spec [1]: if we create a
> binding between the TMC device and a memory node ('memory' is a
> generic node in the DT binding), the 'memory' node doesn't provide a
> property for burst size.
>
> So it seems to me a feasible (and simple) way is still to add a
> property to the TMC: since the TMC is connected to its downstream
> component through the AXI bus, the developer for a platform has the
> knowledge of the constraint on the max burst size.
>
> * Optional property for TMC:
>
> * arm,max-burst-size: the max burst size initiated by the TMC on the
>   AXI master interface. The burst size can be in the range [0..15];
>   the setting supports from one data transfer per burst up to a
>   maximum of 16 data transfers per burst.
>
> When a platform has a specific constraint on the maximum burst size
> (e.g. one caused by its downstream component), it can set this
> property to that maximum; if this property isn't set, the driver will
> use 15 (16 data transfers per burst) as the default value.
>
> I hope I am not being arbitrary here, and I am curious about others'
> opinions - any thoughts?
Thanks for providing all the information and the possible ways to fix
this issue.
Based on the suggestions, would it be okay if we introduce the optional
DT property as "mrvl,max-burst-size", since it's a SoC-specific setting?
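For illustration, a minimal sketch of how the driver could consume such
a property (the property name and the helper are only from this
discussion, not from mainline; tmc_get_max_burst_size() is a
hypothetical name):

    #define TMC_AXICTL_WRBURSTLEN_MAX    0xf    /* 16 transfers/burst */

    static u32 tmc_get_max_burst_size(struct device *dev)
    {
            u32 burst_size;

            /* Fall back to the architectural maximum if unset */
            if (device_property_read_u32(dev, "mrvl,max-burst-size",
                                         &burst_size))
                    return TMC_AXICTL_WRBURSTLEN_MAX;

            /* Clamp out-of-range DT values */
            if (burst_size > TMC_AXICTL_WRBURSTLEN_MAX)
                    burst_size = TMC_AXICTL_WRBURSTLEN_MAX;

            return burst_size;
    }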
Thanks and Regards,
Tanmay
>
> Thanks,
> Leo
>
> [1] https://github.com/devicetree-org/devicetree-specification/releases/download/v0.3/devicetree-specification-v0.3.pdf
Hi Daniel,
On Tue, Jul 13, 2021 at 02:15:30PM +0200, Daniel Kiss wrote:
> Keep track of the perf handler that is registred by the first tracer.
> This will be used by the update call from polling.
>
> Signed-off-by: Daniel Kiss <daniel.kiss(a)arm.com>
> Signed-off-by: Branislav Rankov <Branislav.Rankov(a)arm.com>
> ---
> drivers/hwtracing/coresight/coresight-tmc-etr.c | 6 ++++--
> drivers/hwtracing/coresight/coresight-tmc.h | 2 ++
> 2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index 589bb2d56e802..55c9b5fd9f832 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -1503,8 +1503,8 @@ tmc_update_etr_buffer(struct coresight_device *csdev,
>
> spin_lock_irqsave(&drvdata->spinlock, flags);
>
> - /* Don't do anything if another tracer is using this sink */
> - if (atomic_read(csdev->refcnt) != 1) {
> + /* Serve only the tracer with the leading perf handler */
> + if (drvdata->perf_handle != handle) {
In CPU-wide trace scenarios, the first CPU to enable a sink is not
guaranteed to be the same as the last CPU to use it. As far as I
understand, the above assumes that the first and last CPUs to use a
sink are the same.
> spin_unlock_irqrestore(&drvdata->spinlock, flags);
> goto out;
> }
> @@ -1619,6 +1619,7 @@ static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data)
> drvdata->pid = pid;
> drvdata->mode = CS_MODE_PERF;
> drvdata->perf_buf = etr_perf->etr_buf;
> + drvdata->perf_handle = handle;
> atomic_inc(csdev->refcnt);
> }
>
> @@ -1666,6 +1667,7 @@ static int tmc_disable_etr_sink(struct coresight_device *csdev)
> drvdata->mode = CS_MODE_DISABLED;
> /* Reset perf specific data */
> drvdata->perf_buf = NULL;
> + drvdata->perf_handle = NULL;
>
> spin_unlock_irqrestore(&drvdata->spinlock, flags);
>
> diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
> index b91ec7dde7bc9..81583ffb973dc 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc.h
> +++ b/drivers/hwtracing/coresight/coresight-tmc.h
> @@ -184,6 +184,7 @@ struct etr_buf {
> * @idr_mutex: Access serialisation for idr.
> * @sysfs_buf: SYSFS buffer for ETR.
> * @perf_buf: PERF buffer for ETR.
> + * @perf_handle: PERF handle for ETR.
> */
> struct tmc_drvdata {
> void __iomem *base;
> @@ -207,6 +208,7 @@ struct tmc_drvdata {
> struct mutex idr_mutex;
> struct etr_buf *sysfs_buf;
> struct etr_buf *perf_buf;
> + struct perf_output_handle *perf_handle;
> };
>
> struct etr_buf_operations {
> --
> 2.25.1
>
Hi Al and Mike,
> >
> > 16 (0xF) should work for all silicon, as AXI allows burst sizes up to 16.
> > So unless we've missed something, this is an implementation non-compliance
> > and we should not be penalising compliant implementations by reducing the
> > default burst size - the question is how we can enable the upstream kernel
> > to work around the issue on this chip only, and to me that sounds like it
> > needs something that can be triggered by a setting in DT/ACPI.
> >
>
> Another possibility would be to introduce an errata workaround in
> Kconfig for your silicon.
> There are a number of these already in Kconfig for PE issues, e.g.
> CONFIG_ARM64_ERRATUM_826319, and we have introduced
> CONFIG_ETM4X_IMPDEF_FEATURE for silicon-specific variants in the
> ETMv4 driver.
>
> The latter config compiles in implementation-defined workarounds
> which operate on the basis of matching the AMBA ID of the silicon.
> This means they will operate only if configured in Kconfig, and only
> on silicon where the workaround is needed.
>
Thanks a lot for the suggestions.
We are thinking of using the "part number" and "designer" fields rather
than a device tree property. Since we use standard Arm core IP and
CoreSight SoC-600 IP, we cannot differentiate our silicon from others
using the ETR AMBA ID, PIDR and CPU MIDR registers.
We are proposing to expose the CoreSight ROM region to the driver and
to determine the part number and designer by reading the following
fields (masks applied before shifting):
part_number = (PIDR1.PART_1 << 8) | PIDR0.PART_0;
designer = ((PIDR4.DES_2 & 0xf) << 7) |
           ((PIDR2.DES_1 & 0x7) << 4) |
           (PIDR1.DES_0 & 0xf);
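For illustration, here is a sketch of that composition from raw ROM
table reads (register offsets per the CoreSight architecture spec;
'base' is an assumed ioremap of the ROM region, and the helper names
are hypothetical):

    #define PIDR0    0xfe0
    #define PIDR1    0xfe4
    #define PIDR2    0xfe8
    #define PIDR4    0xfd0

    static u16 rom_part_number(void __iomem *base)
    {
            u32 pidr0 = readl(base + PIDR0);
            u32 pidr1 = readl(base + PIDR1);

            /* PART_1 is PIDR1[3:0], PART_0 is PIDR0[7:0] */
            return ((pidr1 & 0xf) << 8) | (pidr0 & 0xff);
    }

    static u16 rom_designer(void __iomem *base)
    {
            u32 pidr1 = readl(base + PIDR1);
            u32 pidr2 = readl(base + PIDR2);
            u32 pidr4 = readl(base + PIDR4);

            /* JEP106: DES_2 in bits [10:7], DES_1 in [6:4], DES_0 in [3:0] */
            return ((pidr4 & 0xf) << 7) |          /* PIDR4.DES_2 */
                   ((pidr2 & 0x7) << 4) |          /* PIDR2.DES_1 */
                   ((pidr1 >> 4) & 0xf);           /* PIDR1.DES_0 */
    }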
Using a combination of the part number and designer read from the ROM
region would help in identifying the Marvell implementation.
This option would be generic and could be helpful for other silicon
with similar issues, and it can be applied across CoreSight components
like ETF/ETR.
What are your thoughts on this approach?
With Regards,
Tanmay
> Regards
>
> Mike
>
>
> > Al
Adds a generic API to allow packet processors to count, per channel,
the number of bytes processed and the number of bytes skipped while
unsynced, plus any packet header or format errors.
The ETMv4 / ETE packet processor is updated to use this API.
The API adds the ocsd_decode_stats_t structure to contain the
statistics (ocsd_if_types.h).
The C-API (opencsd_c_api.h) adds the functions:
ocsd_dt_get_decode_stats() - get a pointer to the stats block.
ocsd_dt_reset_decode_stats() - resets the counts to zero. This function
operates independently of the main decoder reset.
This allows tools such as perf, which may reset the decoder multiple
times per AUXTRACE_BUFFER, to count stats for the entire buffer rather
than for each capture block.
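A usage sketch of the new C-API (prototypes and field names as I read
them from the patches below; check opencsd_c_api.h and ocsd_if_types.h
in the release for the exact definitions):

    #include <stdio.h>
    #include <inttypes.h>
    #include <opencsd/c_api/opencsd_c_api.h>

    static void dump_and_reset_stats(dcd_tree_handle_t tree, uint8_t cs_id)
    {
            ocsd_decode_stats_t *stats = NULL;

            /* Returns a pointer into the decoder's stats block */
            if (ocsd_dt_get_decode_stats(tree, cs_id, &stats) == OCSD_OK &&
                stats != NULL)
                    printf("ID 0x%02x: %" PRIu64 " bytes processed\n",
                           cs_id, stats->channel_total);

            /* Zero the counters without resetting the decoder itself */
            ocsd_dt_reset_decode_stats(tree, cs_id);
    }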
Mike Leach (4):
opencsd: Add decode statistics API to packet processor.
opencsd: ETMv4: ETE: Add packet processing stats to decoders.
opencsd: tests: Update test programs to use the packet decoder
statistics API
opencsd: Update readme and version info for v1.2.0
README.md | 5 ++-
decoder/include/common/ocsd_dcd_tree.h | 26 ++++++++++-
decoder/include/common/trc_pkt_proc_base.h | 44 ++++++++++++++++++-
decoder/include/opencsd/c_api/opencsd_c_api.h | 30 ++++++++++++-
decoder/include/opencsd/ocsd_if_types.h | 20 +++++++++
decoder/include/opencsd/ocsd_if_version.h | 6 +--
decoder/source/c_api/ocsd_c_api.cpp | 20 ++++++++-
decoder/source/etmv4/trc_pkt_proc_etmv4i.cpp | 10 ++++-
decoder/source/ocsd_dcd_tree.cpp | 39 ++++++++++++++++
decoder/tests/source/c_api_pkt_print_test.c | 37 +++++++++++++++-
decoder/tests/source/trc_pkt_lister.cpp | 37 +++++++++++++++-
11 files changed, 260 insertions(+), 14 deletions(-)
--
2.17.1
For better organisation and easier review, this patch series has been
extracted from the patch set "perf: Refine barriers for AUX ring
buffer". This series needs to be applied on top of the patch series [1].
To support compat mode in the perf tool, patch 01 adds a new item in
"perf_env" to track whether the kernel is running in 64-bit mode. This
patch is a preparation for the later changes.
Patch 02 introduces compat variants of the functions for accessing the
AUX trace's head and tail. These two functions are defined with the
weak attribute, so an architecture can override them when the generic
code cannot provide atomic access to the 64-bit values while perf runs
in compat mode.
Patch 03 implements compat_auxtrace_mmap__{read_head|write_tail} for
the Arm platform. On an Arm platform in compat mode, the kernel runs
in 64-bit mode while the user space tool runs in 32-bit mode; the tool
uses the instructions "ldrd" and "strd" for 64-bit value atomicity.
This patch set has been tested on an Arm64 Juno platform, with the perf
tool built using the compiler arm-linux-gnueabihf-gcc.
[1] https://lore.kernel.org/patchwork/cover/1473916/
Leo Yan (3):
perf env: Track kernel 64-bit mode in environment
perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail}
perf auxtrace arm: Support
compat_auxtrace_mmap__{read_head|write_tail}
tools/perf/arch/arm/util/auxtrace.c | 32 +++++++++++
tools/perf/util/auxtrace.c | 88 +++++++++++++++++++++++++++--
tools/perf/util/auxtrace.h | 22 +++++++-
tools/perf/util/env.c | 24 +++++++-
tools/perf/util/env.h | 3 +
5 files changed, 161 insertions(+), 8 deletions(-)
--
2.25.1