This series is split of the Coresight ETR perf support patches posted here [0]. The CATU support and perf backend support will be posted as separate series for better management and review of the patches.
This series adds the support for TMC ETR Scatter-Gather mode to allow using physical non-contiguous buffer for holding the trace data. It also adds a layer to handle the buffer management in a transparent manner, independent of the underlying mode used by the TMC ETR. The layer chooses the ETR mode based on different parameters (size, re-using a set of pages, presence of an SMMU etc.).
Finally we add a sysfs parameter to tune the buffer size for ETR in sysfs-mode.
During the testing, we found out that if the TMC ETR is not properly connected to the memory subsystem, the ETR could lock-up the system while waiting for the "read" transactions to complete in scatter-gather mode. So, we do not use the mode on a system unless it is safe to do so. This is specified by a DT property "arm,scatter-gather".
Applies on coreisght-next tree from Mathieu
Changes since previous version [1]: - Rebased to Mathieu's coresight-next tree to resolve a conflict. - Added tags for DT changes from Rob and Mathieu - Split the SG mode backend support patch from the ETR-BUF patch. - Address other comments from Mathieu
Changes since splitted series [0] : - Split the series in [0] - Address comments on v2 - Rename DT property "scatter-gather" to "arm,scatter-gather" - Add ETM PID for Cortex-A35, use macros to make the listing easier
[0] - http://lists.infradead.org/pipermail/linux-arm-kernel/2018-May/574875.html [1] - http://lists.infradead.org/pipermail/linux-arm-kernel/2018-May/579135.html
Suzuki K Poulose (12): coresight: ETM: Add support for Arm Cortex-A73 and Cortex-A35 coresight: tmc: Hide trace buffer handling for file read coresight: tmc-etr: Do not clean trace buffer coresight: tmc-etr: Disallow perf mode coresight: Add helper for inserting synchronization packets dts: bindings: Restrict coresight tmc-etr scatter-gather mode dts: juno: Add scatter-gather support for all revisions coresight: Add generic TMC sg table framework coresight: Add support for TMC ETR SG unit coresight: tmc-etr: Add transparent buffer management coresight: tmc-etr buf: Add TMC scatter gather mode backend coresight: tmc: Add configuration support for trace buffer size
.../ABI/testing/sysfs-bus-coresight-devices-tmc | 8 + .../devicetree/bindings/arm/coresight.txt | 5 +- arch/arm64/boot/dts/arm/juno-base.dtsi | 1 + drivers/hwtracing/coresight/coresight-etb10.c | 12 +- drivers/hwtracing/coresight/coresight-etm4x.c | 31 +- drivers/hwtracing/coresight/coresight-priv.h | 10 +- drivers/hwtracing/coresight/coresight-tmc-etf.c | 45 +- drivers/hwtracing/coresight/coresight-tmc-etr.c | 1010 ++++++++++++++++++-- drivers/hwtracing/coresight/coresight-tmc.c | 83 +- drivers/hwtracing/coresight/coresight-tmc.h | 110 ++- drivers/hwtracing/coresight/coresight.c | 3 +- 11 files changed, 1144 insertions(+), 174 deletions(-)
Add ETM PIDs of the Arm cortex-A CPUs to the white list of ETMs. While at it add a helper macro to make it easier to add the new entries.
Cc: Mathieu Poirier mathieu.poirier@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- Changes since last version: - Use ETM's ID register to dump the version, rather than hard coding it for each CPU. --- drivers/hwtracing/coresight/coresight-etm4x.c | 31 ++++++++++++--------------- 1 file changed, 14 insertions(+), 17 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c index 9bc04c5..1d94ebe 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x.c +++ b/drivers/hwtracing/coresight/coresight-etm4x.c @@ -1027,7 +1027,8 @@ static int etm4_probe(struct amba_device *adev, const struct amba_id *id) }
pm_runtime_put(&adev->dev); - dev_info(dev, "%s initialized\n", (char *)id->data); + dev_info(dev, "CPU%d: ETM v%d.%d initialized\n", + drvdata->cpu, drvdata->arch >> 4, drvdata->arch & 0xf);
if (boot_enable) { coresight_enable(drvdata->csdev); @@ -1045,23 +1046,19 @@ static int etm4_probe(struct amba_device *adev, const struct amba_id *id) return ret; }
+#define ETM4x_AMBA_ID(pid) \ + { \ + .id = pid, \ + .mask = 0x000fffff, \ + } + static const struct amba_id etm4_ids[] = { - { /* ETM 4.0 - Cortex-A53 */ - .id = 0x000bb95d, - .mask = 0x000fffff, - .data = "ETM 4.0", - }, - { /* ETM 4.0 - Cortex-A57 */ - .id = 0x000bb95e, - .mask = 0x000fffff, - .data = "ETM 4.0", - }, - { /* ETM 4.0 - A72, Maia, HiSilicon */ - .id = 0x000bb95a, - .mask = 0x000fffff, - .data = "ETM 4.0", - }, - { 0, 0}, + ETM4x_AMBA_ID(0x000bb95d), /* Cortex-A53 */ + ETM4x_AMBA_ID(0x000bb95e), /* Cortex-A57 */ + ETM4x_AMBA_ID(0x000bb95a), /* Cortex-A72 */ + ETM4x_AMBA_ID(0x000bb959), /* Cortex-A73 */ + ETM4x_AMBA_ID(0x000bb9da), /* Cortex-A35 */ + {}, };
static struct amba_driver etm4x_driver = {
At the moment we adjust the buffer pointers for reading the trace data via misc device in the common code for ETF/ETB and ETR. Since we are going to change how we manage the buffer for ETR, let us move the buffer manipulation to the respective driver files, hiding it from the common code. We do so by adding type specific helpers for finding the length of data and the pointer to the buffer, for a given length at a file position.
Cc: Mathieu Poirier mathieu.poirier@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- drivers/hwtracing/coresight/coresight-tmc-etf.c | 18 +++++++++++ drivers/hwtracing/coresight/coresight-tmc-etr.c | 34 ++++++++++++++++++++ drivers/hwtracing/coresight/coresight-tmc.c | 41 ++++++++++++++----------- drivers/hwtracing/coresight/coresight-tmc.h | 4 +++ 4 files changed, 79 insertions(+), 18 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c index 61d849b..73160cd 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c @@ -109,6 +109,24 @@ static void tmc_etf_disable_hw(struct tmc_drvdata *drvdata) CS_LOCK(drvdata->base); }
+/* + * Return the available trace data in the buffer from @pos, with + * a maximum limit of @len, updating the @bufpp on where to + * find it. + */ +ssize_t tmc_etb_get_sysfs_trace(struct tmc_drvdata *drvdata, + loff_t pos, size_t len, char **bufpp) +{ + ssize_t actual = len; + + /* Adjust the len to available size @pos */ + if (pos + actual > drvdata->len) + actual = drvdata->len - pos; + if (actual > 0) + *bufpp = drvdata->buf + pos; + return actual; +} + static int tmc_enable_etf_sink_sysfs(struct coresight_device *csdev) { int ret = 0; diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index 02f747a..f88342d 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -58,6 +58,40 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata) CS_LOCK(drvdata->base); }
+/* + * Return the available trace data in the buffer @pos, with a maximum + * limit of @len, also updating the @bufpp on where to find it. + */ +ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata, + loff_t pos, size_t len, char **bufpp) +{ + ssize_t actual = len; + char *bufp = drvdata->buf + pos; + char *bufend = (char *)(drvdata->vaddr + drvdata->size); + + /* Adjust the len to available size @pos */ + if (pos + actual > drvdata->len) + actual = drvdata->len - pos; + + if (actual <= 0) + return actual; + + /* + * Since we use a circular buffer, with trace data starting + * @drvdata->buf, possibly anywhere in the buffer @drvdata->vaddr, + * wrap the current @pos to within the buffer. + */ + if (bufp >= bufend) + bufp -= drvdata->size; + /* + * For simplicity, avoid copying over a wrapped around buffer. + */ + if ((bufp + actual) > bufend) + actual = bufend - bufp; + *bufpp = bufp; + return actual; +} + static void tmc_etr_dump_hw(struct tmc_drvdata *drvdata) { const u32 *barrier; diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c index 456f122..bb57e7f 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.c +++ b/drivers/hwtracing/coresight/coresight-tmc.c @@ -123,35 +123,40 @@ static int tmc_open(struct inode *inode, struct file *file) return 0; }
+static inline ssize_t tmc_get_sysfs_trace(struct tmc_drvdata *drvdata, + loff_t pos, size_t len, char **bufpp) +{ + switch (drvdata->config_type) { + case TMC_CONFIG_TYPE_ETB: + case TMC_CONFIG_TYPE_ETF: + return tmc_etb_get_sysfs_trace(drvdata, pos, len, bufpp); + case TMC_CONFIG_TYPE_ETR: + return tmc_etr_get_sysfs_trace(drvdata, pos, len, bufpp); + } + + return -EINVAL; +} + static ssize_t tmc_read(struct file *file, char __user *data, size_t len, loff_t *ppos) { + char *bufp; + ssize_t actual; struct tmc_drvdata *drvdata = container_of(file->private_data, struct tmc_drvdata, miscdev); - char *bufp = drvdata->buf + *ppos; + actual = tmc_get_sysfs_trace(drvdata, *ppos, len, &bufp); + if (actual <= 0) + return 0;
- if (*ppos + len > drvdata->len) - len = drvdata->len - *ppos; - - if (drvdata->config_type == TMC_CONFIG_TYPE_ETR) { - if (bufp == (char *)(drvdata->vaddr + drvdata->size)) - bufp = drvdata->vaddr; - else if (bufp > (char *)(drvdata->vaddr + drvdata->size)) - bufp -= drvdata->size; - if ((bufp + len) > (char *)(drvdata->vaddr + drvdata->size)) - len = (char *)(drvdata->vaddr + drvdata->size) - bufp; - } - - if (copy_to_user(data, bufp, len)) { + if (copy_to_user(data, bufp, actual)) { dev_dbg(drvdata->dev, "%s: copy_to_user failed\n", __func__); return -EFAULT; }
- *ppos += len; + *ppos += actual; + dev_dbg(drvdata->dev, "%zu bytes copied\n", actual);
- dev_dbg(drvdata->dev, "%s: %zu bytes copied, %d bytes left\n", - __func__, len, (int)(drvdata->len - *ppos)); - return len; + return actual; }
static int tmc_release(struct inode *inode, struct file *file) diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h index dfaff07..1d7cd58 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.h +++ b/drivers/hwtracing/coresight/coresight-tmc.h @@ -172,10 +172,14 @@ int tmc_read_unprepare_etb(struct tmc_drvdata *drvdata); extern const struct coresight_ops tmc_etb_cs_ops; extern const struct coresight_ops tmc_etf_cs_ops;
+ssize_t tmc_etb_get_sysfs_trace(struct tmc_drvdata *drvdata, + loff_t pos, size_t len, char **bufpp); /* ETR functions */ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata); int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata); extern const struct coresight_ops tmc_etr_cs_ops; +ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata, + loff_t pos, size_t len, char **bufpp);
#define TMC_REG_PAIR(name, lo_off, hi_off) \
We zero out the entire trace buffer used for ETR before it is enabled, for helping with debugging. With the addition of scatter-gather mode, the buffer could be bigger and non-contiguous.
Get rid of this step; if someone wants to debug, they can always add it as and when needed.
Cc: Mathieu Poirier mathieu.poirier@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- drivers/hwtracing/coresight/coresight-tmc-etr.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index f88342d..1de05c9 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -13,9 +13,6 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata) { u32 axictl, sts;
- /* Zero out the memory to help with debug */ - memset(drvdata->vaddr, 0, drvdata->size); - CS_UNLOCK(drvdata->base);
/* Wait for TMCSReady bit to be set */ @@ -340,9 +337,8 @@ int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata) if (drvdata->mode == CS_MODE_SYSFS) { /* * The trace run will continue with the same allocated trace - * buffer. The trace buffer is cleared in tmc_etr_enable_hw(), - * so we don't have to explicitly clear it. Also, since the - * tracer is still enabled drvdata::buf can't be NULL. + * buffer. Since the tracer is still enabled drvdata::buf can't + * be NULL. */ tmc_etr_enable_hw(drvdata); } else {
We don't support ETR in perf mode yet. So, don't even try to enable the hardware, even by mistake.
Cc: Mathieu Poirier mathieu.poirier@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- drivers/hwtracing/coresight/coresight-tmc-etr.c | 28 ++----------------------- 1 file changed, 2 insertions(+), 26 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index 1de05c9..18c9a18 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -211,32 +211,8 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
static int tmc_enable_etr_sink_perf(struct coresight_device *csdev) { - int ret = 0; - unsigned long flags; - struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); - - spin_lock_irqsave(&drvdata->spinlock, flags); - if (drvdata->reading) { - ret = -EINVAL; - goto out; - } - - /* - * In Perf mode there can be only one writer per sink. There - * is also no need to continue if the ETR is already operated - * from sysFS. - */ - if (drvdata->mode != CS_MODE_DISABLED) { - ret = -EINVAL; - goto out; - } - - drvdata->mode = CS_MODE_PERF; - tmc_etr_enable_hw(drvdata); -out: - spin_unlock_irqrestore(&drvdata->spinlock, flags); - - return ret; + /* We don't support perf mode yet ! */ + return -EINVAL; }
static int tmc_enable_etr_sink(struct coresight_device *csdev, u32 mode)
Right now we open code filling the trace buffer with synchronization packets when the circular buffer wraps around in different drivers. Move this to a common place. While at it, clean up the barrier_pkt array to strip off the trailing '\0'.
Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: Mike Leach mike.leach@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- drivers/hwtracing/coresight/coresight-etb10.c | 12 ++++------- drivers/hwtracing/coresight/coresight-priv.h | 10 ++++++++- drivers/hwtracing/coresight/coresight-tmc-etf.c | 27 ++++++++----------------- drivers/hwtracing/coresight/coresight-tmc-etr.c | 13 +----------- drivers/hwtracing/coresight/coresight.c | 3 +-- 5 files changed, 23 insertions(+), 42 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c index 9b6c555..78e71bf 100644 --- a/drivers/hwtracing/coresight/coresight-etb10.c +++ b/drivers/hwtracing/coresight/coresight-etb10.c @@ -195,7 +195,6 @@ static void etb_dump_hw(struct etb_drvdata *drvdata) bool lost = false; int i; u8 *buf_ptr; - const u32 *barrier; u32 read_data, depth; u32 read_ptr, write_ptr; u32 frame_off, frame_endoff; @@ -226,19 +225,16 @@ static void etb_dump_hw(struct etb_drvdata *drvdata)
depth = drvdata->buffer_depth; buf_ptr = drvdata->buf; - barrier = barrier_pkt; for (i = 0; i < depth; i++) { read_data = readl_relaxed(drvdata->base + ETB_RAM_READ_DATA_REG); - if (lost && *barrier) { - read_data = *barrier; - barrier++; - } - *(u32 *)buf_ptr = read_data; buf_ptr += 4; }
+ if (lost) + coresight_insert_barrier_packet(drvdata->buf); + if (frame_off) { buf_ptr -= (frame_endoff * 4); for (i = 0; i < frame_endoff; i++) { @@ -447,7 +443,7 @@ static void etb_update_buffer(struct coresight_device *csdev, buf_ptr = buf->data_pages[cur] + offset; read_data = readl_relaxed(drvdata->base + ETB_RAM_READ_DATA_REG); - if (lost && *barrier) { + if (lost && i < CORESIGHT_BARRIER_PKT_SIZE) { read_data = *barrier; barrier++; } diff --git a/drivers/hwtracing/coresight/coresight-priv.h b/drivers/hwtracing/coresight/coresight-priv.h index 0e5a74d..1a6cf35 100644 --- a/drivers/hwtracing/coresight/coresight-priv.h +++ b/drivers/hwtracing/coresight/coresight-priv.h @@ -57,7 +57,8 @@ static DEVICE_ATTR_RO(name) #define coresight_simple_reg64(type, name, lo_off, hi_off) \ __coresight_simple_func(type, NULL, name, lo_off, hi_off)
-extern const u32 barrier_pkt[5]; +extern const u32 barrier_pkt[4]; +#define CORESIGHT_BARRIER_PKT_SIZE (sizeof(barrier_pkt))
enum etm_addr_type { ETM_ADDR_TYPE_NONE, @@ -91,6 +92,13 @@ struct cs_buffers { void **data_pages; };
+static inline void coresight_insert_barrier_packet(void *buf) +{ + if (buf) + memcpy(buf, barrier_pkt, CORESIGHT_BARRIER_PKT_SIZE); +} + + static inline void CS_LOCK(void __iomem *addr) { do { diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c index 73160cd..0549249 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etf.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c @@ -32,39 +32,28 @@ static void tmc_etb_enable_hw(struct tmc_drvdata *drvdata)
static void tmc_etb_dump_hw(struct tmc_drvdata *drvdata) { - bool lost = false; char *bufp; - const u32 *barrier; - u32 read_data, status; + u32 read_data, lost; int i;
- /* - * Get a hold of the status register and see if a wrap around - * has occurred. - */ - status = readl_relaxed(drvdata->base + TMC_STS); - if (status & TMC_STS_FULL) - lost = true; - + /* Check if the buffer wrapped around. */ + lost = readl_relaxed(drvdata->base + TMC_STS) & TMC_STS_FULL; bufp = drvdata->buf; drvdata->len = 0; - barrier = barrier_pkt; while (1) { for (i = 0; i < drvdata->memwidth; i++) { read_data = readl_relaxed(drvdata->base + TMC_RRD); if (read_data == 0xFFFFFFFF) - return; - - if (lost && *barrier) { - read_data = *barrier; - barrier++; - } - + goto done; memcpy(bufp, &read_data, 4); bufp += 4; drvdata->len += 4; } } +done: + if (lost) + coresight_insert_barrier_packet(drvdata->buf); + return; }
static void tmc_etb_disable_hw(struct tmc_drvdata *drvdata) diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index 18c9a18..04206ff 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -91,9 +91,7 @@ ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata,
static void tmc_etr_dump_hw(struct tmc_drvdata *drvdata) { - const u32 *barrier; u32 val; - u32 *temp; u64 rwp;
rwp = tmc_read_rwp(drvdata); @@ -106,16 +104,7 @@ static void tmc_etr_dump_hw(struct tmc_drvdata *drvdata) if (val & TMC_STS_FULL) { drvdata->buf = drvdata->vaddr + rwp - drvdata->paddr; drvdata->len = drvdata->size; - - barrier = barrier_pkt; - temp = (u32 *)drvdata->buf; - - while (*barrier) { - *temp = *barrier; - temp++; - barrier++; - } - + coresight_insert_barrier_packet(drvdata->buf); } else { drvdata->buf = drvdata->vaddr; drvdata->len = rwp - drvdata->paddr; diff --git a/drivers/hwtracing/coresight/coresight.c b/drivers/hwtracing/coresight/coresight.c index 29e834a..4969b32 100644 --- a/drivers/hwtracing/coresight/coresight.c +++ b/drivers/hwtracing/coresight/coresight.c @@ -51,8 +51,7 @@ static struct list_head *stm_path; * beginning of the data collected in a buffer. That way the decoder knows that * it needs to look for another sync sequence. */ -const u32 barrier_pkt[5] = {0x7fffffff, 0x7fffffff, - 0x7fffffff, 0x7fffffff, 0x0}; +const u32 barrier_pkt[4] = {0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff};
static int coresight_id_match(struct device *dev, void *data) {
We are about to add the support for ETR builtin scatter-gather mode for dealing with large amount of trace buffers. However, on some of the platforms, using the ETR SG mode can lock up the system due to the way the ETR is connected to the memory subsystem.
In SG mode, the ETR performs READ from the scatter-gather table to fetch the next page and regular WRITE of trace data. If the READ operation doesn't complete(due to the memory subsystem issues, which we have seen on a couple of platforms) the trace WRITE cannot proceed leading to issues. So, we by default do not use the SG mode, unless it is known to be safe on the platform. We define a DT property for the TMC node to specify whether we have a proper SG mode.
Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: Mike Leach mike.leach@linaro.org Cc: Mark Rutland mark.rutland@arm.com Cc: John Horley john.horley@arm.com Cc: Robert Walker robert.walker@arm.com Cc: devicetree@vger.kernel.org Cc: frowand.list@gmail.com Reviewed-by: Rob Herring robh@kernel.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- Documentation/devicetree/bindings/arm/coresight.txt | 2 ++ drivers/hwtracing/coresight/coresight-tmc.c | 9 ++++++++- 2 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/Documentation/devicetree/bindings/arm/coresight.txt b/Documentation/devicetree/bindings/arm/coresight.txt index 15ac8e8..603d3c6 100644 --- a/Documentation/devicetree/bindings/arm/coresight.txt +++ b/Documentation/devicetree/bindings/arm/coresight.txt @@ -86,6 +86,8 @@ its hardware characteristcs. * arm,buffer-size: size of contiguous buffer space for TMC ETR (embedded trace router)
+ * arm,scatter-gather: boolean. Indicates that the TMC-ETR can safely + use the SG mode on this system.
Example:
diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c index bb57e7f..bc8fc86 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.c +++ b/drivers/hwtracing/coresight/coresight-tmc.c @@ -12,6 +12,7 @@ #include <linux/err.h> #include <linux/fs.h> #include <linux/miscdevice.h> +#include <linux/property.h> #include <linux/uaccess.h> #include <linux/slab.h> #include <linux/dma-mapping.h> @@ -296,6 +297,12 @@ const struct attribute_group *coresight_tmc_groups[] = { NULL, };
+static inline bool tmc_etr_can_use_sg(struct tmc_drvdata *drvdata) +{ + return fwnode_property_present(drvdata->dev->fwnode, + "arm,scatter-gather"); +} + /* Detect and initialise the capabilities of a TMC ETR */ static int tmc_etr_setup_caps(struct tmc_drvdata *drvdata, u32 devid, void *dev_caps) @@ -305,7 +312,7 @@ static int tmc_etr_setup_caps(struct tmc_drvdata *drvdata, /* Set the unadvertised capabilities */ tmc_etr_init_caps(drvdata, (u32)(unsigned long)dev_caps);
- if (!(devid & TMC_DEVID_NOSCAT)) + if (!(devid & TMC_DEVID_NOSCAT) && tmc_etr_can_use_sg(drvdata)) tmc_etr_set_cap(drvdata, TMC_ETR_SG);
/* Check if the AXI address width is available */
Advertise that the scatter-gather is properly integrated on all revisions of Juno board.
Cc: Sudeep Holla sudeep.holla@arm.com Cc: Liviu Dudau liviu.dudau@arm.com Cc: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Acked-by: Mathieu Poirier mathieu.poirier@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- arch/arm64/boot/dts/arm/juno-base.dtsi | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/arm64/boot/dts/arm/juno-base.dtsi b/arch/arm64/boot/dts/arm/juno-base.dtsi index eb749c5..6ce9090 100644 --- a/arch/arm64/boot/dts/arm/juno-base.dtsi +++ b/arch/arm64/boot/dts/arm/juno-base.dtsi @@ -198,6 +198,7 @@ clocks = <&soc_smc50mhz>; clock-names = "apb_pclk"; power-domains = <&scpi_devpd 0>; + arm,scatter-gather; port { etr_in_port: endpoint { slave-mode;
This patch introduces a generic sg table data structure and associated operations. An SG table can be used to map a set of Data pages where the trace data could be stored by the TMC ETR. The information about the data pages could be stored in different formats, depending on the type of the underlying SG mechanism (e.g, TMC ETR SG vs Coresight CATU). The generic structure provides book keeping of the pages used for the data as well as the table contents. The table should be filled by the user of the infrastructure.
A table can be created by specifying the number of data pages as well as the number of table pages required to hold the pointers, where the latter could be different for different types of tables. The pages are mapped in the appropriate dma data direction mode (i.e, DMA_TO_DEVICE for table pages and DMA_FROM_DEVICE for data pages). The framework can optionally accept a set of allocated data pages (e.g, perf ring buffer) and map them accordingly. The table and data pages are vmap'ed to allow easier access by the drivers. The framework also provides helpers to sync the data written to the pages with appropriate directions.
This will be later used by the TMC ETR SG unit and CATU.
Cc: Mathieu Poirier mathieu.poirier@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- Changes since previous version: - Drop helper for table vaddr/paddr, data vaddr as we don't use them anyways. --- drivers/hwtracing/coresight/coresight-tmc-etr.c | 268 ++++++++++++++++++++++++ drivers/hwtracing/coresight/coresight-tmc.h | 50 +++++ 2 files changed, 318 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index 04206ff..402b061 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -6,9 +6,277 @@
#include <linux/coresight.h> #include <linux/dma-mapping.h> +#include <linux/slab.h> #include "coresight-priv.h" #include "coresight-tmc.h"
+/* + * tmc_pages_get_offset: Go through all the pages in the tmc_pages + * and map the device address @addr to an offset within the virtual + * contiguous buffer. + */ +static long +tmc_pages_get_offset(struct tmc_pages *tmc_pages, dma_addr_t addr) +{ + int i; + dma_addr_t page_start; + + for (i = 0; i < tmc_pages->nr_pages; i++) { + page_start = tmc_pages->daddrs[i]; + if (addr >= page_start && addr < (page_start + PAGE_SIZE)) + return i * PAGE_SIZE + (addr - page_start); + } + + return -EINVAL; +} + +/* + * tmc_pages_free : Unmap and free the pages used by tmc_pages. + * If the pages were not allocated in tmc_pages_alloc(), we would + * simply drop the refcount. + */ +static void tmc_pages_free(struct tmc_pages *tmc_pages, + struct device *dev, enum dma_data_direction dir) +{ + int i; + + for (i = 0; i < tmc_pages->nr_pages; i++) { + if (tmc_pages->daddrs && tmc_pages->daddrs[i]) + dma_unmap_page(dev, tmc_pages->daddrs[i], + PAGE_SIZE, dir); + if (tmc_pages->pages && tmc_pages->pages[i]) + __free_page(tmc_pages->pages[i]); + } + + kfree(tmc_pages->pages); + kfree(tmc_pages->daddrs); + tmc_pages->pages = NULL; + tmc_pages->daddrs = NULL; + tmc_pages->nr_pages = 0; +} + +/* + * tmc_pages_alloc : Allocate and map pages for a given @tmc_pages. + * If @pages is not NULL, the list of page virtual addresses are + * used as the data pages. The pages are then dma_map'ed for @dev + * with dma_direction @dir. + * + * Returns 0 upon success, else the error number. + */ +static int tmc_pages_alloc(struct tmc_pages *tmc_pages, + struct device *dev, int node, + enum dma_data_direction dir, void **pages) +{ + int i, nr_pages; + dma_addr_t paddr; + struct page *page; + + nr_pages = tmc_pages->nr_pages; + tmc_pages->daddrs = kcalloc(nr_pages, sizeof(*tmc_pages->daddrs), + GFP_KERNEL); + if (!tmc_pages->daddrs) + return -ENOMEM; + tmc_pages->pages = kcalloc(nr_pages, sizeof(*tmc_pages->pages), + GFP_KERNEL); + if (!tmc_pages->pages) { + kfree(tmc_pages->daddrs); + tmc_pages->daddrs = NULL; + return -ENOMEM; + } + + for (i = 0; i < nr_pages; i++) { + if (pages && pages[i]) { + page = virt_to_page(pages[i]); + /* Hold a refcount on the page */ + get_page(page); + } else { + page = alloc_pages_node(node, + GFP_KERNEL | __GFP_ZERO, 0); + } + paddr = dma_map_page(dev, page, 0, PAGE_SIZE, dir); + if (dma_mapping_error(dev, paddr)) + goto err; + tmc_pages->daddrs[i] = paddr; + tmc_pages->pages[i] = page; + } + return 0; +err: + tmc_pages_free(tmc_pages, dev, dir); + return -ENOMEM; +} + +static inline long +tmc_sg_get_data_page_offset(struct tmc_sg_table *sg_table, dma_addr_t addr) +{ + return tmc_pages_get_offset(&sg_table->data_pages, addr); +} + +static inline void tmc_free_table_pages(struct tmc_sg_table *sg_table) +{ + if (sg_table->table_vaddr) + vunmap(sg_table->table_vaddr); + tmc_pages_free(&sg_table->table_pages, sg_table->dev, DMA_TO_DEVICE); +} + +static void tmc_free_data_pages(struct tmc_sg_table *sg_table) +{ + if (sg_table->data_vaddr) + vunmap(sg_table->data_vaddr); + tmc_pages_free(&sg_table->data_pages, sg_table->dev, DMA_FROM_DEVICE); +} + +void tmc_free_sg_table(struct tmc_sg_table *sg_table) +{ + tmc_free_table_pages(sg_table); + tmc_free_data_pages(sg_table); +} + +/* + * Alloc pages for the table. Since this will be used by the device, + * allocate the pages closer to the device (i.e, dev_to_node(dev) + * rather than the CPU node). + */ +static int tmc_alloc_table_pages(struct tmc_sg_table *sg_table) +{ + int rc; + struct tmc_pages *table_pages = &sg_table->table_pages; + + rc = tmc_pages_alloc(table_pages, sg_table->dev, + dev_to_node(sg_table->dev), + DMA_TO_DEVICE, NULL); + if (rc) + return rc; + sg_table->table_vaddr = vmap(table_pages->pages, + table_pages->nr_pages, + VM_MAP, + PAGE_KERNEL); + if (!sg_table->table_vaddr) + rc = -ENOMEM; + else + sg_table->table_daddr = table_pages->daddrs[0]; + return rc; +} + +static int tmc_alloc_data_pages(struct tmc_sg_table *sg_table, void **pages) +{ + int rc; + + /* Allocate data pages on the node requested by the caller */ + rc = tmc_pages_alloc(&sg_table->data_pages, + sg_table->dev, sg_table->node, + DMA_FROM_DEVICE, pages); + if (!rc) { + sg_table->data_vaddr = vmap(sg_table->data_pages.pages, + sg_table->data_pages.nr_pages, + VM_MAP, + PAGE_KERNEL); + if (!sg_table->data_vaddr) + rc = -ENOMEM; + } + return rc; +} + +/* + * tmc_alloc_sg_table: Allocate and setup dma pages for the TMC SG table + * and data buffers. TMC writes to the data buffers and reads from the SG + * Table pages. + * + * @dev - Device to which page should be DMA mapped. + * @node - Numa node for mem allocations + * @nr_tpages - Number of pages for the table entries. + * @nr_dpages - Number of pages for Data buffer. + * @pages - Optional list of virtual address of pages. + */ +struct tmc_sg_table *tmc_alloc_sg_table(struct device *dev, + int node, + int nr_tpages, + int nr_dpages, + void **pages) +{ + long rc; + struct tmc_sg_table *sg_table; + + sg_table = kzalloc(sizeof(*sg_table), GFP_KERNEL); + if (!sg_table) + return ERR_PTR(-ENOMEM); + sg_table->data_pages.nr_pages = nr_dpages; + sg_table->table_pages.nr_pages = nr_tpages; + sg_table->node = node; + sg_table->dev = dev; + + rc = tmc_alloc_data_pages(sg_table, pages); + if (!rc) + rc = tmc_alloc_table_pages(sg_table); + if (rc) { + tmc_free_sg_table(sg_table); + kfree(sg_table); + return ERR_PTR(rc); + } + + return sg_table; +} + +/* + * tmc_sg_table_sync_data_range: Sync the data buffer written + * by the device from @offset upto a @size bytes. + */ +void tmc_sg_table_sync_data_range(struct tmc_sg_table *table, + u64 offset, u64 size) +{ + int i, index, start; + int npages = DIV_ROUND_UP(size, PAGE_SIZE); + struct device *dev = table->dev; + struct tmc_pages *data = &table->data_pages; + + start = offset >> PAGE_SHIFT; + for (i = start; i < (start + npages); i++) { + index = i % data->nr_pages; + dma_sync_single_for_cpu(dev, data->daddrs[index], + PAGE_SIZE, DMA_FROM_DEVICE); + } +} + +/* tmc_sg_sync_table: Sync the page table */ +void tmc_sg_table_sync_table(struct tmc_sg_table *sg_table) +{ + int i; + struct device *dev = sg_table->dev; + struct tmc_pages *table_pages = &sg_table->table_pages; + + for (i = 0; i < table_pages->nr_pages; i++) + dma_sync_single_for_device(dev, table_pages->daddrs[i], + PAGE_SIZE, DMA_TO_DEVICE); +} + +/* + * tmc_sg_table_get_data: Get the buffer pointer for data @offset + * in the SG buffer. The @bufpp is updated to point to the buffer. + * Returns : + * the length of linear data available at @offset. + * or + * <= 0 if no data is available. + */ +ssize_t tmc_sg_table_get_data(struct tmc_sg_table *sg_table, + u64 offset, size_t len, char **bufpp) +{ + size_t size; + int pg_idx = offset >> PAGE_SHIFT; + int pg_offset = offset & (PAGE_SIZE - 1); + struct tmc_pages *data_pages = &sg_table->data_pages; + + size = tmc_sg_table_buf_size(sg_table); + if (offset >= size) + return -EINVAL; + + /* Make sure we don't go beyond the end */ + len = (len < (size - offset)) ? len : size - offset; + /* Respect the page boundaries */ + len = (len < (PAGE_SIZE - pg_offset)) ? len : (PAGE_SIZE - pg_offset); + if (len > 0) + *bufpp = page_address(data_pages->pages[pg_idx]) + pg_offset; + return len; +} + static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata) { u32 axictl, sts; diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h index 1d7cd58..cdb668b 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.h +++ b/drivers/hwtracing/coresight/coresight-tmc.h @@ -7,6 +7,7 @@ #ifndef _CORESIGHT_TMC_H #define _CORESIGHT_TMC_H
+#include <linux/dma-mapping.h> #include <linux/miscdevice.h>
#define TMC_RSZ 0x004 @@ -160,6 +161,38 @@ struct tmc_drvdata { u32 etr_caps; };
+/** + * struct tmc_pages - Collection of pages used for SG. + * @nr_pages: Number of pages in the list. + * @daddrs: Array of DMA'able page address. + * @pages: Array pages for the buffer. + */ +struct tmc_pages { + int nr_pages; + dma_addr_t *daddrs; + struct page **pages; +}; + +/* + * struct tmc_sg_table - Generic SG table for TMC + * @dev: Device for DMA allocations + * @table_vaddr: Contiguous Virtual address for PageTable + * @data_vaddr: Contiguous Virtual address for Data Buffer + * @table_daddr: DMA address of the PageTable base + * @node: Node for Page allocations + * @table_pages: List of pages & dma address for Table + * @data_pages: List of pages & dma address for Data + */ +struct tmc_sg_table { + struct device *dev; + void *table_vaddr; + void *data_vaddr; + dma_addr_t table_daddr; + int node; + struct tmc_pages table_pages; + struct tmc_pages data_pages; +}; + /* Generic functions */ void tmc_wait_for_tmcready(struct tmc_drvdata *drvdata); void tmc_flush_and_stop(struct tmc_drvdata *drvdata); @@ -215,4 +248,21 @@ static inline bool tmc_etr_has_cap(struct tmc_drvdata *drvdata, u32 cap) return !!(drvdata->etr_caps & cap); }
+struct tmc_sg_table *tmc_alloc_sg_table(struct device *dev, + int node, + int nr_tpages, + int nr_dpages, + void **pages); +void tmc_free_sg_table(struct tmc_sg_table *sg_table); +void tmc_sg_table_sync_table(struct tmc_sg_table *sg_table); +void tmc_sg_table_sync_data_range(struct tmc_sg_table *table, + u64 offset, u64 size); +ssize_t tmc_sg_table_get_data(struct tmc_sg_table *sg_table, + u64 offset, size_t len, char **bufpp); +static inline unsigned long +tmc_sg_table_buf_size(struct tmc_sg_table *sg_table) +{ + return sg_table->data_pages.nr_pages << PAGE_SHIFT; +} + #endif
This patch adds support for setting up an SG table used by the TMC ETR inbuilt SG unit. The TMC ETR uses 4K page sized tables to hold pointers to the 4K data pages with the last entry in a table pointing to the next table with the entries, by kind of chaining. The 2 LSBs determine the type of the table entry, to one of :
Normal - Points to a 4KB data page. Last - Points to a 4KB data page, but is the last entry in the page table. Link - Points to another 4KB table page with pointers to data.
The code takes care of handling the system page size which could be different than 4K. So we could end up putting multiple ETR SG tables in a single system page, vice versa for the data pages.
Cc: Mathieu Poirier mathieu.poirier@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- drivers/hwtracing/coresight/coresight-tmc-etr.c | 263 ++++++++++++++++++++++++ 1 file changed, 263 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index 402b061..67b4117 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -11,6 +11,87 @@ #include "coresight-tmc.h"
/* + * The TMC ETR SG has a page size of 4K. The SG table contains pointers + * to 4KB buffers. However, the OS may use a PAGE_SIZE different from + * 4K (i.e, 16KB or 64KB). This implies that a single OS page could + * contain more than one SG buffer and tables. + * + * A table entry has the following format: + * + * ---Bit31------------Bit4-------Bit1-----Bit0-- + * | Address[39:12] | SBZ | Entry Type | + * ---------------------------------------------- + * + * Address: Bits [39:12] of a physical page address. Bits [11:0] are + * always zero. + * + * Entry type: + * b00 - Reserved. + * b01 - Last entry in the tables, points to 4K page buffer. + * b10 - Normal entry, points to 4K page buffer. + * b11 - Link. The address points to the base of next table. + */ + +typedef u32 sgte_t; + +#define ETR_SG_PAGE_SHIFT 12 +#define ETR_SG_PAGE_SIZE (1UL << ETR_SG_PAGE_SHIFT) +#define ETR_SG_PAGES_PER_SYSPAGE (PAGE_SIZE / ETR_SG_PAGE_SIZE) +#define ETR_SG_PTRS_PER_PAGE (ETR_SG_PAGE_SIZE / sizeof(sgte_t)) +#define ETR_SG_PTRS_PER_SYSPAGE (PAGE_SIZE / sizeof(sgte_t)) + +#define ETR_SG_ET_MASK 0x3 +#define ETR_SG_ET_LAST 0x1 +#define ETR_SG_ET_NORMAL 0x2 +#define ETR_SG_ET_LINK 0x3 + +#define ETR_SG_ADDR_SHIFT 4 + +#define ETR_SG_ENTRY(addr, type) \ + (sgte_t)((((addr) >> ETR_SG_PAGE_SHIFT) << ETR_SG_ADDR_SHIFT) | \ + (type & ETR_SG_ET_MASK)) + +#define ETR_SG_ADDR(entry) \ + (((dma_addr_t)(entry) >> ETR_SG_ADDR_SHIFT) << ETR_SG_PAGE_SHIFT) +#define ETR_SG_ET(entry) ((entry) & ETR_SG_ET_MASK) + +/* + * struct etr_sg_table : ETR SG Table + * @sg_table: Generic SG Table holding the data/table pages. + * @hwaddr: hwaddress used by the TMC, which is the base + * address of the table. + */ +struct etr_sg_table { + struct tmc_sg_table *sg_table; + dma_addr_t hwaddr; +}; + +/* + * tmc_etr_sg_table_entries: Total number of table entries required to map + * @nr_pages system pages. + * + * We need to map @nr_pages * ETR_SG_PAGES_PER_SYSPAGE data pages. + * Each TMC page can map (ETR_SG_PTRS_PER_PAGE - 1) buffer pointers, + * with the last entry pointing to another page of table entries. + * If we spill over to a new page for mapping 1 entry, we could as + * well replace the link entry of the previous page with the last entry. + */ +static inline unsigned long __attribute_const__ +tmc_etr_sg_table_entries(int nr_pages) +{ + unsigned long nr_sgpages = nr_pages * ETR_SG_PAGES_PER_SYSPAGE; + unsigned long nr_sglinks = nr_sgpages / (ETR_SG_PTRS_PER_PAGE - 1); + /* + * If we spill over to a new page for 1 entry, we could as well + * make it the LAST entry in the previous page, skipping the Link + * address. + */ + if (nr_sglinks && (nr_sgpages % (ETR_SG_PTRS_PER_PAGE - 1) < 2)) + nr_sglinks--; + return nr_sgpages + nr_sglinks; +} + +/* * tmc_pages_get_offset: Go through all the pages in the tmc_pages * and map the device address @addr to an offset within the virtual * contiguous buffer. @@ -277,6 +358,188 @@ ssize_t tmc_sg_table_get_data(struct tmc_sg_table *sg_table, return len; }
+#ifdef ETR_SG_DEBUG +/* Map a dma address to virtual address */ +static unsigned long +tmc_sg_daddr_to_vaddr(struct tmc_sg_table *sg_table, + dma_addr_t addr, bool table) +{ + long offset; + unsigned long base; + struct tmc_pages *tmc_pages; + + if (table) { + tmc_pages = &sg_table->table_pages; + base = (unsigned long)sg_table->table_vaddr; + } else { + tmc_pages = &sg_table->data_pages; + base = (unsigned long)sg_table->data_vaddr; + } + + offset = tmc_pages_get_offset(tmc_pages, addr); + if (offset < 0) + return 0; + return base + offset; +} + +/* Dump the given sg_table */ +static void tmc_etr_sg_table_dump(struct etr_sg_table *etr_table) +{ + sgte_t *ptr; + int i = 0; + dma_addr_t addr; + struct tmc_sg_table *sg_table = etr_table->sg_table; + + ptr = (sgte_t *)tmc_sg_daddr_to_vaddr(sg_table, + etr_table->hwaddr, true); + while (ptr) { + addr = ETR_SG_ADDR(*ptr); + switch (ETR_SG_ET(*ptr)) { + case ETR_SG_ET_NORMAL: + dev_dbg(sg_table->dev, + "%05d: %p\t:[N] 0x%llx\n", i, ptr, addr); + ptr++; + break; + case ETR_SG_ET_LINK: + dev_dbg(sg_table->dev, + "%05d: *** %p\t:{L} 0x%llx ***\n", + i, ptr, addr); + ptr = (sgte_t *)tmc_sg_daddr_to_vaddr(sg_table, + addr, true); + break; + case ETR_SG_ET_LAST: + dev_dbg(sg_table->dev, + "%05d: ### %p\t:[L] 0x%llx ###\n", + i, ptr, addr); + return; + default: + dev_dbg(sg_table->dev, + "%05d: xxx %p\t:[INVALID] 0x%llx xxx\n", + i, ptr, addr); + return; + } + i++; + } + dev_dbg(sg_table->dev, "******* End of Table *****\n"); +} +#else +static inline void tmc_etr_sg_table_dump(struct etr_sg_table *etr_table) {} +#endif + +/* + * Populate the SG Table page table entries from table/data + * pages allocated. Each Data page has ETR_SG_PAGES_PER_SYSPAGE SG pages. + * So does a Table page. So we keep track of indices of the tables + * in each system page and move the pointers accordingly. + */ +#define INC_IDX_ROUND(idx, size) ((idx) = ((idx) + 1) % (size)) +static void tmc_etr_sg_table_populate(struct etr_sg_table *etr_table) +{ + dma_addr_t paddr; + int i, type, nr_entries; + int tpidx = 0; /* index to the current system table_page */ + int sgtidx = 0; /* index to the sg_table within the current syspage */ + int sgtentry = 0; /* the entry within the sg_table */ + int dpidx = 0; /* index to the current system data_page */ + int spidx = 0; /* index to the SG page within the current data page */ + sgte_t *ptr; /* pointer to the table entry to fill */ + struct tmc_sg_table *sg_table = etr_table->sg_table; + dma_addr_t *table_daddrs = sg_table->table_pages.daddrs; + dma_addr_t *data_daddrs = sg_table->data_pages.daddrs; + + nr_entries = tmc_etr_sg_table_entries(sg_table->data_pages.nr_pages); + /* + * Use the contiguous virtual address of the table to update entries. + */ + ptr = sg_table->table_vaddr; + /* + * Fill all the entries, except the last entry to avoid special + * checks within the loop. + */ + for (i = 0; i < nr_entries - 1; i++) { + if (sgtentry == ETR_SG_PTRS_PER_PAGE - 1) { + /* + * Last entry in a sg_table page is a link address to + * the next table page. If this sg_table is the last + * one in the system page, it links to the first + * sg_table in the next system page. Otherwise, it + * links to the next sg_table page within the system + * page. + */ + if (sgtidx == ETR_SG_PAGES_PER_SYSPAGE - 1) { + paddr = table_daddrs[tpidx + 1]; + } else { + paddr = table_daddrs[tpidx] + + (ETR_SG_PAGE_SIZE * (sgtidx + 1)); + } + type = ETR_SG_ET_LINK; + } else { + /* + * Update the indices to the data_pages to point to the + * next sg_page in the data buffer. + */ + type = ETR_SG_ET_NORMAL; + paddr = data_daddrs[dpidx] + spidx * ETR_SG_PAGE_SIZE; + if (!INC_IDX_ROUND(spidx, ETR_SG_PAGES_PER_SYSPAGE)) + dpidx++; + } + *ptr++ = ETR_SG_ENTRY(paddr, type); + /* + * Move to the next table pointer, moving the table page index + * if necessary + */ + if (!INC_IDX_ROUND(sgtentry, ETR_SG_PTRS_PER_PAGE)) { + if (!INC_IDX_ROUND(sgtidx, ETR_SG_PAGES_PER_SYSPAGE)) + tpidx++; + } + } + + /* Set up the last entry, which is always a data pointer */ + paddr = data_daddrs[dpidx] + spidx * ETR_SG_PAGE_SIZE; + *ptr++ = ETR_SG_ENTRY(paddr, ETR_SG_ET_LAST); +} + +/* + * tmc_init_etr_sg_table: Allocate a TMC ETR SG table, data buffer of @size and + * populate the table. + * + * @dev - Device pointer for the TMC + * @node - NUMA node where the memory should be allocated + * @size - Total size of the data buffer + * @pages - Optional list of page virtual address + */ +static struct etr_sg_table __maybe_unused * +tmc_init_etr_sg_table(struct device *dev, int node, + unsigned long size, void **pages) +{ + int nr_entries, nr_tpages; + int nr_dpages = size >> PAGE_SHIFT; + struct tmc_sg_table *sg_table; + struct etr_sg_table *etr_table; + + etr_table = kzalloc(sizeof(*etr_table), GFP_KERNEL); + if (!etr_table) + return ERR_PTR(-ENOMEM); + nr_entries = tmc_etr_sg_table_entries(nr_dpages); + nr_tpages = DIV_ROUND_UP(nr_entries, ETR_SG_PTRS_PER_SYSPAGE); + + sg_table = tmc_alloc_sg_table(dev, node, nr_tpages, nr_dpages, pages); + if (IS_ERR(sg_table)) { + kfree(etr_table); + return ERR_PTR(PTR_ERR(sg_table)); + } + + etr_table->sg_table = sg_table; + /* TMC should use table base address for DBA */ + etr_table->hwaddr = sg_table->table_daddr; + tmc_etr_sg_table_populate(etr_table); + /* Sync the table pages for the HW */ + tmc_sg_table_sync_table(sg_table); + tmc_etr_sg_table_dump(etr_table); + + return etr_table; +} + static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata) { u32 axictl, sts;
The TMC-ETR can use the target trace buffer in two different modes. Normal physically contiguous mode and a discontiguous list pages in Scatter-Gather mode. Also we have dedicated Coresight component, CATU (Coresight Address Translation Unit) to provide improved scatter-gather mode in Coresight SoC-600. This complicates the management of the buffer used for trace, depending on the mode in which ETR is configured.
So, this patch adds a transparent layer for managing the ETR buffer which abstracts the basic operations on the buffer (alloc, free, sync and retrieve the data) and uses the mode specific helpers to do the actual operation. This also allows the ETR driver to choose the best mode for a given use case and adds the flexibility to fallback to a different mode, without duplicating the code.
The patch also adds the "normal" flat memory mode and switches the sysfs driver to use the new layer.
Cc: Mathieu Poirier mathieu.poirier@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- Changes since last version: - Split the SG mode support to a separate patch --- drivers/hwtracing/coresight/coresight-tmc-etr.c | 342 ++++++++++++++++++------ drivers/hwtracing/coresight/coresight-tmc.h | 55 +++- 2 files changed, 308 insertions(+), 89 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index 67b4117..a0e504a 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -6,10 +6,18 @@
#include <linux/coresight.h> #include <linux/dma-mapping.h> +#include <linux/iommu.h> #include <linux/slab.h> #include "coresight-priv.h" #include "coresight-tmc.h"
+struct etr_flat_buf { + struct device *dev; + dma_addr_t daddr; + void *vaddr; + size_t size; +}; + /* * The TMC ETR SG has a page size of 4K. The SG table contains pointers * to 4KB buffers. However, the OS may use a PAGE_SIZE different from @@ -540,16 +548,207 @@ tmc_init_etr_sg_table(struct device *dev, int node, return etr_table; }
+/* + * tmc_etr_alloc_flat_buf: Allocate a contiguous DMA buffer. + */ +static int tmc_etr_alloc_flat_buf(struct tmc_drvdata *drvdata, + struct etr_buf *etr_buf, int node, + void **pages) +{ + struct etr_flat_buf *flat_buf; + + /* We cannot reuse existing pages for flat buf */ + if (pages) + return -EINVAL; + + flat_buf = kzalloc(sizeof(*flat_buf), GFP_KERNEL); + if (!flat_buf) + return -ENOMEM; + + flat_buf->vaddr = dma_alloc_coherent(drvdata->dev, etr_buf->size, + &flat_buf->daddr, GFP_KERNEL); + if (!flat_buf->vaddr) { + kfree(flat_buf); + return -ENOMEM; + } + + flat_buf->size = etr_buf->size; + flat_buf->dev = drvdata->dev; + etr_buf->hwaddr = flat_buf->daddr; + etr_buf->mode = ETR_MODE_FLAT; + etr_buf->private = flat_buf; + return 0; +} + +static void tmc_etr_free_flat_buf(struct etr_buf *etr_buf) +{ + struct etr_flat_buf *flat_buf = etr_buf->private; + + if (flat_buf && flat_buf->daddr) + dma_free_coherent(flat_buf->dev, flat_buf->size, + flat_buf->vaddr, flat_buf->daddr); + kfree(flat_buf); +} + +static void tmc_etr_sync_flat_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp) +{ + /* + * Adjust the buffer to point to the beginning of the trace data + * and update the available trace data. + */ + etr_buf->offset = rrp - etr_buf->hwaddr; + if (etr_buf->full) + etr_buf->len = etr_buf->size; + else + etr_buf->len = rwp - rrp; +} + +static ssize_t tmc_etr_get_data_flat_buf(struct etr_buf *etr_buf, + u64 offset, size_t len, char **bufpp) +{ + struct etr_flat_buf *flat_buf = etr_buf->private; + + *bufpp = (char *)flat_buf->vaddr + offset; + /* + * tmc_etr_buf_get_data already adjusts the length to handle + * buffer wrapping around. + */ + return len; +} + +static const struct etr_buf_operations etr_flat_buf_ops = { + .alloc = tmc_etr_alloc_flat_buf, + .free = tmc_etr_free_flat_buf, + .sync = tmc_etr_sync_flat_buf, + .get_data = tmc_etr_get_data_flat_buf, +}; + +static const struct etr_buf_operations *etr_buf_ops[] = { + [ETR_MODE_FLAT] = &etr_flat_buf_ops, +}; + +static inline int tmc_etr_mode_alloc_buf(int mode, + struct tmc_drvdata *drvdata, + struct etr_buf *etr_buf, int node, + void **pages) +{ + int rc; + + switch (mode) { + case ETR_MODE_FLAT: + rc = etr_buf_ops[mode]->alloc(drvdata, etr_buf, node, pages); + if (!rc) + etr_buf->ops = etr_buf_ops[mode]; + return rc; + default: + return -EINVAL; + } +} + +/* + * tmc_alloc_etr_buf: Allocate a buffer use by ETR. + * @drvdata : ETR device details. + * @size : size of the requested buffer. + * @flags : Required properties for the buffer. + * @node : Node for memory allocations. + * @pages : An optional list of pages. + */ +static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata, + ssize_t size, int flags, + int node, void **pages) +{ + int rc = 0; + struct etr_buf *etr_buf; + + etr_buf = kzalloc(sizeof(*etr_buf), GFP_KERNEL); + if (!etr_buf) + return ERR_PTR(-ENOMEM); + + etr_buf->size = size; + + rc = tmc_etr_mode_alloc_buf(ETR_MODE_FLAT, drvdata, + etr_buf, node, pages); + if (rc) { + kfree(etr_buf); + return ERR_PTR(rc); + } + + return etr_buf; +} + +static void tmc_free_etr_buf(struct etr_buf *etr_buf) +{ + WARN_ON(!etr_buf->ops || !etr_buf->ops->free); + etr_buf->ops->free(etr_buf); + kfree(etr_buf); +} + +/* + * tmc_etr_buf_get_data: Get the pointer the trace data at @offset + * with a maximum of @len bytes. + * Returns: The size of the linear data available @pos, with *bufpp + * updated to point to the buffer. + */ +static ssize_t tmc_etr_buf_get_data(struct etr_buf *etr_buf, + u64 offset, size_t len, char **bufpp) +{ + /* Adjust the length to limit this transaction to end of buffer */ + len = (len < (etr_buf->size - offset)) ? len : etr_buf->size - offset; + + return etr_buf->ops->get_data(etr_buf, (u64)offset, len, bufpp); +} + +static inline s64 +tmc_etr_buf_insert_barrier_packet(struct etr_buf *etr_buf, u64 offset) +{ + ssize_t len; + char *bufp; + + len = tmc_etr_buf_get_data(etr_buf, offset, + CORESIGHT_BARRIER_PKT_SIZE, &bufp); + if (WARN_ON(len <= CORESIGHT_BARRIER_PKT_SIZE)) + return -EINVAL; + coresight_insert_barrier_packet(bufp); + return offset + CORESIGHT_BARRIER_PKT_SIZE; +} + +/* + * tmc_sync_etr_buf: Sync the trace buffer availability with drvdata. + * Makes sure the trace data is synced to the memory for consumption. + * @etr_buf->offset will hold the offset to the beginning of the trace data + * within the buffer, with @etr_buf->len bytes to consume. + */ +static void tmc_sync_etr_buf(struct tmc_drvdata *drvdata) +{ + struct etr_buf *etr_buf = drvdata->etr_buf; + u64 rrp, rwp; + u32 status; + + rrp = tmc_read_rrp(drvdata); + rwp = tmc_read_rwp(drvdata); + status = readl_relaxed(drvdata->base + TMC_STS); + etr_buf->full = status & TMC_STS_FULL; + + WARN_ON(!etr_buf->ops || !etr_buf->ops->sync); + + etr_buf->ops->sync(etr_buf, rrp, rwp); + + /* Insert barrier packets at the beginning, if there was an overflow */ + if (etr_buf->full) + tmc_etr_buf_insert_barrier_packet(etr_buf, etr_buf->offset); +} + static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata) { u32 axictl, sts; + struct etr_buf *etr_buf = drvdata->etr_buf;
CS_UNLOCK(drvdata->base);
/* Wait for TMCSReady bit to be set */ tmc_wait_for_tmcready(drvdata);
- writel_relaxed(drvdata->size / 4, drvdata->base + TMC_RSZ); + writel_relaxed(etr_buf->size / 4, drvdata->base + TMC_RSZ); writel_relaxed(TMC_MODE_CIRCULAR_BUFFER, drvdata->base + TMC_MODE);
axictl = readl_relaxed(drvdata->base + TMC_AXICTL); @@ -563,15 +762,15 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata) }
writel_relaxed(axictl, drvdata->base + TMC_AXICTL); - tmc_write_dba(drvdata, drvdata->paddr); + tmc_write_dba(drvdata, etr_buf->hwaddr); /* * If the TMC pointers must be programmed before the session, * we have to set it properly (i.e, RRP/RWP to base address and * STS to "not full"). */ if (tmc_etr_has_cap(drvdata, TMC_ETR_SAVE_RESTORE)) { - tmc_write_rrp(drvdata, drvdata->paddr); - tmc_write_rwp(drvdata, drvdata->paddr); + tmc_write_rrp(drvdata, etr_buf->hwaddr); + tmc_write_rwp(drvdata, etr_buf->hwaddr); sts = readl_relaxed(drvdata->base + TMC_STS) & ~TMC_STS_FULL; writel_relaxed(sts, drvdata->base + TMC_STS); } @@ -587,59 +786,48 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata) }
/* - * Return the available trace data in the buffer @pos, with a maximum - * limit of @len, also updating the @bufpp on where to find it. + * Return the available trace data in the buffer (starts at etr_buf->offset, + * limited by etr_buf->len) from @pos, with a maximum limit of @len, + * also updating the @bufpp on where to find it. Since the trace data + * starts at anywhere in the buffer, depending on the RRP, we adjust the + * @len returned to handle buffer wrapping around. */ ssize_t tmc_etr_get_sysfs_trace(struct tmc_drvdata *drvdata, loff_t pos, size_t len, char **bufpp) { + s64 offset; ssize_t actual = len; - char *bufp = drvdata->buf + pos; - char *bufend = (char *)(drvdata->vaddr + drvdata->size); - - /* Adjust the len to available size @pos */ - if (pos + actual > drvdata->len) - actual = drvdata->len - pos; + struct etr_buf *etr_buf = drvdata->etr_buf;
+ if (pos + actual > etr_buf->len) + actual = etr_buf->len - pos; if (actual <= 0) return actual;
- /* - * Since we use a circular buffer, with trace data starting - * @drvdata->buf, possibly anywhere in the buffer @drvdata->vaddr, - * wrap the current @pos to within the buffer. - */ - if (bufp >= bufend) - bufp -= drvdata->size; - /* - * For simplicity, avoid copying over a wrapped around buffer. - */ - if ((bufp + actual) > bufend) - actual = bufend - bufp; - *bufpp = bufp; - return actual; + /* Compute the offset from which we read the data */ + offset = etr_buf->offset + pos; + if (offset >= etr_buf->size) + offset -= etr_buf->size; + return tmc_etr_buf_get_data(etr_buf, offset, actual, bufpp); }
-static void tmc_etr_dump_hw(struct tmc_drvdata *drvdata) +static struct etr_buf * +tmc_etr_setup_sysfs_buf(struct tmc_drvdata *drvdata) { - u32 val; - u64 rwp; + return tmc_alloc_etr_buf(drvdata, drvdata->size, + 0, cpu_to_node(0), NULL); +}
- rwp = tmc_read_rwp(drvdata); - val = readl_relaxed(drvdata->base + TMC_STS); +static void +tmc_etr_free_sysfs_buf(struct etr_buf *buf) +{ + if (buf) + tmc_free_etr_buf(buf); +}
- /* - * Adjust the buffer to point to the beginning of the trace data - * and update the available trace data. - */ - if (val & TMC_STS_FULL) { - drvdata->buf = drvdata->vaddr + rwp - drvdata->paddr; - drvdata->len = drvdata->size; - coresight_insert_barrier_packet(drvdata->buf); - } else { - drvdata->buf = drvdata->vaddr; - drvdata->len = rwp - drvdata->paddr; - } +static void tmc_etr_sync_sysfs_buf(struct tmc_drvdata *drvdata) +{ + tmc_sync_etr_buf(drvdata); }
static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata) @@ -652,7 +840,8 @@ static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata) * read before the TMC is disabled. */ if (drvdata->mode == CS_MODE_SYSFS) - tmc_etr_dump_hw(drvdata); + tmc_etr_sync_sysfs_buf(drvdata); + tmc_disable_hw(drvdata);
CS_LOCK(drvdata->base); @@ -661,35 +850,32 @@ static void tmc_etr_disable_hw(struct tmc_drvdata *drvdata) static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev) { int ret = 0; - bool used = false; unsigned long flags; - void __iomem *vaddr = NULL; - dma_addr_t paddr = 0; struct tmc_drvdata *drvdata = dev_get_drvdata(csdev->dev.parent); + struct etr_buf *new_buf = NULL, *free_buf = NULL;
/* - * If we don't have a buffer release the lock and allocate memory. - * Otherwise keep the lock and move along. + * If we are enabling the ETR from disabled state, we need to make + * sure we have a buffer with the right size. The etr_buf is not reset + * immediately after we stop the tracing in SYSFS mode as we wait for + * the user to collect the data. We may be able to reuse the existing + * buffer, provided the size matches. Any allocation has to be done + * with the lock released. */ spin_lock_irqsave(&drvdata->spinlock, flags); - if (!drvdata->vaddr) { + if (!drvdata->etr_buf || (drvdata->etr_buf->size != drvdata->size)) { spin_unlock_irqrestore(&drvdata->spinlock, flags);
- /* - * Contiguous memory can't be allocated while a spinlock is - * held. As such allocate memory here and free it if a buffer - * has already been allocated (from a previous session). - */ - vaddr = dma_alloc_coherent(drvdata->dev, drvdata->size, - &paddr, GFP_KERNEL); - if (!vaddr) - return -ENOMEM; + /* Allocate memory with the locks released */ + free_buf = new_buf = tmc_etr_setup_sysfs_buf(drvdata); + if (IS_ERR(new_buf)) + return PTR_ERR(new_buf);
/* Let's try again */ spin_lock_irqsave(&drvdata->spinlock, flags); }
- if (drvdata->reading) { + if (drvdata->reading || drvdata->mode == CS_MODE_PERF) { ret = -EBUSY; goto out; } @@ -697,21 +883,19 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev) /* * In sysFS mode we can have multiple writers per sink. Since this * sink is already enabled no memory is needed and the HW need not be - * touched. + * touched, even if the buffer size has changed. */ if (drvdata->mode == CS_MODE_SYSFS) goto out;
/* - * If drvdata::vaddr == NULL, use the memory allocated above. - * Otherwise a buffer still exists from a previous session, so - * simply use that. + * If we don't have a buffer or it doesn't match the requested size, + * use the buffer allocated above. Otherwise reuse the existing buffer. */ - if (drvdata->vaddr == NULL) { - used = true; - drvdata->vaddr = vaddr; - drvdata->paddr = paddr; - drvdata->buf = drvdata->vaddr; + if (!drvdata->etr_buf || + (new_buf && drvdata->etr_buf->size != new_buf->size)) { + free_buf = drvdata->etr_buf; + drvdata->etr_buf = new_buf; }
drvdata->mode = CS_MODE_SYSFS; @@ -720,8 +904,8 @@ static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev) spin_unlock_irqrestore(&drvdata->spinlock, flags);
/* Free memory outside the spinlock if need be */ - if (!used && vaddr) - dma_free_coherent(drvdata->dev, drvdata->size, vaddr, paddr); + if (free_buf) + tmc_etr_free_sysfs_buf(free_buf);
if (!ret) dev_info(drvdata->dev, "TMC-ETR enabled\n"); @@ -800,8 +984,8 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata) goto out; }
- /* If drvdata::buf is NULL the trace data has been read already */ - if (drvdata->buf == NULL) { + /* If drvdata::etr_buf is NULL the trace data has been read already */ + if (drvdata->etr_buf == NULL) { ret = -EINVAL; goto out; } @@ -820,8 +1004,7 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata) int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata) { unsigned long flags; - dma_addr_t paddr; - void __iomem *vaddr = NULL; + struct etr_buf *etr_buf = NULL;
/* config types are set a boot time and never change */ if (WARN_ON_ONCE(drvdata->config_type != TMC_CONFIG_TYPE_ETR)) @@ -842,17 +1025,16 @@ int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata) * The ETR is not tracing and the buffer was just read. * As such prepare to free the trace buffer. */ - vaddr = drvdata->vaddr; - paddr = drvdata->paddr; - drvdata->buf = drvdata->vaddr = NULL; + etr_buf = drvdata->etr_buf; + drvdata->etr_buf = NULL; }
drvdata->reading = false; spin_unlock_irqrestore(&drvdata->spinlock, flags);
/* Free allocated memory out side of the spinlock */ - if (vaddr) - dma_free_coherent(drvdata->dev, drvdata->size, vaddr, paddr); + if (etr_buf) + tmc_free_etr_buf(etr_buf);
return 0; } diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h index cdb668b..39ed306 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.h +++ b/drivers/hwtracing/coresight/coresight-tmc.h @@ -123,6 +123,34 @@ enum tmc_mem_intf_width { #define CORESIGHT_SOC_600_ETR_CAPS \ (TMC_ETR_SAVE_RESTORE | TMC_ETR_AXI_ARCACHE)
+enum etr_mode { + ETR_MODE_FLAT, /* Uses contiguous flat buffer */ +}; + +struct etr_buf_operations; + +/** + * struct etr_buf - Details of the buffer used by ETR + * @mode : Mode of the ETR buffer, contiguous, Scatter Gather etc. + * @full : Trace data overflow + * @size : Size of the buffer. + * @hwaddr : Address to be programmed in the TMC:DBA{LO,HI} + * @offset : Offset of the trace data in the buffer for consumption. + * @len : Available trace data @buf (may round up to the beginning). + * @ops : ETR buffer operations for the mode. + * @private : Backend specific information for the buf + */ +struct etr_buf { + enum etr_mode mode; + bool full; + ssize_t size; + dma_addr_t hwaddr; + unsigned long offset; + s64 len; + const struct etr_buf_operations *ops; + void *private; +}; + /** * struct tmc_drvdata - specifics associated to an TMC component * @base: memory mapped base address for this component. @@ -130,11 +158,10 @@ enum tmc_mem_intf_width { * @csdev: component vitals needed by the framework. * @miscdev: specifics to handle "/dev/xyz.tmc" entry. * @spinlock: only one at a time pls. - * @buf: area of memory where trace data get sent. - * @paddr: DMA start location in RAM. - * @vaddr: virtual representation of @paddr. - * @size: trace buffer size. - * @len: size of the available trace. + * @buf: Snapshot of the trace data for ETF/ETB. + * @etr_buf: details of buffer used in TMC-ETR + * @len: size of the available trace for ETF/ETB. + * @size: trace buffer size for this TMC (common for all modes). * @mode: how this TMC is being used. * @config_type: TMC variant, must be of type @tmc_config_type. * @memwidth: width of the memory interface databus, in bytes. @@ -149,11 +176,12 @@ struct tmc_drvdata { struct miscdevice miscdev; spinlock_t spinlock; bool reading; - char *buf; - dma_addr_t paddr; - void __iomem *vaddr; - u32 size; + union { + char *buf; /* TMC ETB */ + struct etr_buf *etr_buf; /* TMC ETR */ + }; u32 len; + u32 size; u32 mode; enum tmc_config_type config_type; enum tmc_mem_intf_width memwidth; @@ -161,6 +189,15 @@ struct tmc_drvdata { u32 etr_caps; };
+struct etr_buf_operations { + int (*alloc)(struct tmc_drvdata *drvdata, struct etr_buf *etr_buf, + int node, void **pages); + void (*sync)(struct etr_buf *etr_buf, u64 rrp, u64 rwp); + ssize_t (*get_data)(struct etr_buf *etr_buf, u64 offset, size_t len, + char **bufpp); + void (*free)(struct etr_buf *etr_buf); +}; + /** * struct tmc_pages - Collection of pages used for SG. * @nr_pages: Number of pages in the list.
Add the support for Scatter-Gather mode to the etr-buf layer. Since we now have two different modes, we choose the backend based on a set of conditions, documented in the code.
Cc: Mathieu Poirier mathieu.poirier@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- Change since last version: - New in this version, splitted from the original patch. - No functional changes. --- drivers/hwtracing/coresight/coresight-tmc-etr.c | 114 +++++++++++++++++++++++- drivers/hwtracing/coresight/coresight-tmc.h | 1 + 2 files changed, 111 insertions(+), 4 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c index a0e504a..19955cf 100644 --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c @@ -516,7 +516,7 @@ static void tmc_etr_sg_table_populate(struct etr_sg_table *etr_table) * @size - Total size of the data buffer * @pages - Optional list of page virtual address */ -static struct etr_sg_table __maybe_unused * +static struct etr_sg_table * tmc_init_etr_sg_table(struct device *dev, int node, unsigned long size, void **pages) { @@ -623,8 +623,86 @@ static const struct etr_buf_operations etr_flat_buf_ops = { .get_data = tmc_etr_get_data_flat_buf, };
+/* + * tmc_etr_alloc_sg_buf: Allocate an SG buf @etr_buf. Setup the parameters + * appropriately. + */ +static int tmc_etr_alloc_sg_buf(struct tmc_drvdata *drvdata, + struct etr_buf *etr_buf, int node, + void **pages) +{ + struct etr_sg_table *etr_table; + + etr_table = tmc_init_etr_sg_table(drvdata->dev, node, + etr_buf->size, pages); + if (IS_ERR(etr_table)) + return -ENOMEM; + etr_buf->hwaddr = etr_table->hwaddr; + etr_buf->mode = ETR_MODE_ETR_SG; + etr_buf->private = etr_table; + return 0; +} + +static void tmc_etr_free_sg_buf(struct etr_buf *etr_buf) +{ + struct etr_sg_table *etr_table = etr_buf->private; + + if (etr_table) { + tmc_free_sg_table(etr_table->sg_table); + kfree(etr_table); + } +} + +static ssize_t tmc_etr_get_data_sg_buf(struct etr_buf *etr_buf, u64 offset, + size_t len, char **bufpp) +{ + struct etr_sg_table *etr_table = etr_buf->private; + + return tmc_sg_table_get_data(etr_table->sg_table, offset, len, bufpp); +} + +static void tmc_etr_sync_sg_buf(struct etr_buf *etr_buf, u64 rrp, u64 rwp) +{ + long r_offset, w_offset; + struct etr_sg_table *etr_table = etr_buf->private; + struct tmc_sg_table *table = etr_table->sg_table; + + /* Convert hw address to offset in the buffer */ + r_offset = tmc_sg_get_data_page_offset(table, rrp); + if (r_offset < 0) { + dev_warn(table->dev, + "Unable to map RRP %llx to offset\n", rrp); + etr_buf->len = 0; + return; + } + + w_offset = tmc_sg_get_data_page_offset(table, rwp); + if (w_offset < 0) { + dev_warn(table->dev, + "Unable to map RWP %llx to offset\n", rwp); + etr_buf->len = 0; + return; + } + + etr_buf->offset = r_offset; + if (etr_buf->full) + etr_buf->len = etr_buf->size; + else + etr_buf->len = ((w_offset < r_offset) ? etr_buf->size : 0) + + w_offset - r_offset; + tmc_sg_table_sync_data_range(table, r_offset, etr_buf->len); +} + +static const struct etr_buf_operations etr_sg_buf_ops = { + .alloc = tmc_etr_alloc_sg_buf, + .free = tmc_etr_free_sg_buf, + .sync = tmc_etr_sync_sg_buf, + .get_data = tmc_etr_get_data_sg_buf, +}; + static const struct etr_buf_operations *etr_buf_ops[] = { [ETR_MODE_FLAT] = &etr_flat_buf_ops, + [ETR_MODE_ETR_SG] = &etr_sg_buf_ops, };
static inline int tmc_etr_mode_alloc_buf(int mode, @@ -636,6 +714,7 @@ static inline int tmc_etr_mode_alloc_buf(int mode,
switch (mode) { case ETR_MODE_FLAT: + case ETR_MODE_ETR_SG: rc = etr_buf_ops[mode]->alloc(drvdata, etr_buf, node, pages); if (!rc) etr_buf->ops = etr_buf_ops[mode]; @@ -657,17 +736,38 @@ static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata, ssize_t size, int flags, int node, void **pages) { - int rc = 0; + int rc = -ENOMEM; + bool has_etr_sg, has_iommu; struct etr_buf *etr_buf;
+ has_etr_sg = tmc_etr_has_cap(drvdata, TMC_ETR_SG); + has_iommu = iommu_get_domain_for_dev(drvdata->dev); + etr_buf = kzalloc(sizeof(*etr_buf), GFP_KERNEL); if (!etr_buf) return ERR_PTR(-ENOMEM);
etr_buf->size = size;
- rc = tmc_etr_mode_alloc_buf(ETR_MODE_FLAT, drvdata, - etr_buf, node, pages); + /* + * If we have to use an existing list of pages, we cannot reliably + * use a contiguous DMA memory (even if we have an IOMMU). Otherwise, + * we use the contiguous DMA memory if at least one of the following + * conditions is true: + * a) The ETR cannot use Scatter-Gather. + * b) we have a backing IOMMU + * c) The requested memory size is smaller (< 1M). + * + * Fallback to available mechanisms. + * + */ + if (!pages && + (!has_etr_sg || has_iommu || size < SZ_1M)) + rc = tmc_etr_mode_alloc_buf(ETR_MODE_FLAT, drvdata, + etr_buf, node, pages); + if (rc && has_etr_sg) + rc = tmc_etr_mode_alloc_buf(ETR_MODE_ETR_SG, drvdata, + etr_buf, node, pages); if (rc) { kfree(etr_buf); return ERR_PTR(rc); @@ -761,6 +861,12 @@ static void tmc_etr_enable_hw(struct tmc_drvdata *drvdata) axictl |= TMC_AXICTL_ARCACHE_OS; }
+ if (etr_buf->mode == ETR_MODE_ETR_SG) { + if (WARN_ON(!tmc_etr_has_cap(drvdata, TMC_ETR_SG))) + return; + axictl |= TMC_AXICTL_SCT_GAT_MODE; + } + writel_relaxed(axictl, drvdata->base + TMC_AXICTL); tmc_write_dba(drvdata, etr_buf->hwaddr); /* diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h index 39ed306..eeeba48 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.h +++ b/drivers/hwtracing/coresight/coresight-tmc.h @@ -125,6 +125,7 @@ enum tmc_mem_intf_width {
enum etr_mode { ETR_MODE_FLAT, /* Uses contiguous flat buffer */ + ETR_MODE_ETR_SG, /* Uses in-built TMC ETR SG mechanism */ };
struct etr_buf_operations;
Now that we can dynamically switch between contiguous memory and SG table depending on the trace buffer size, provide the support for selecting an appropriate buffer size.
Cc: Mathieu Poirier mathieu.poirier@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- .../ABI/testing/sysfs-bus-coresight-devices-tmc | 8 ++++++ .../devicetree/bindings/arm/coresight.txt | 3 +- drivers/hwtracing/coresight/coresight-tmc.c | 33 ++++++++++++++++++++++ 3 files changed, 43 insertions(+), 1 deletion(-)
diff --git a/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc b/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc index 4fe677e..ea78714 100644 --- a/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc +++ b/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc @@ -83,3 +83,11 @@ KernelVersion: 4.7 Contact: Mathieu Poirier mathieu.poirier@linaro.org Description: (R) Indicates the capabilities of the Coresight TMC. The value is read directly from the DEVID register, 0xFC8, + +What: /sys/bus/coresight/devices/<memory_map>.tmc/buffer_size +Date: August 2018 +KernelVersion: 4.18 +Contact: Mathieu Poirier mathieu.poirier@linaro.org +Description: (RW) Size of the trace buffer for TMC-ETR when used in SYSFS + mode. Writable only for TMC-ETR configurations. The value + should be aligned to the kernel pagesize. diff --git a/Documentation/devicetree/bindings/arm/coresight.txt b/Documentation/devicetree/bindings/arm/coresight.txt index 603d3c6..9aa30a1 100644 --- a/Documentation/devicetree/bindings/arm/coresight.txt +++ b/Documentation/devicetree/bindings/arm/coresight.txt @@ -84,7 +84,8 @@ its hardware characteristcs. * Optional property for TMC:
* arm,buffer-size: size of contiguous buffer space for TMC ETR - (embedded trace router) + (embedded trace router). This property is obsolete. The buffer size + can be configured dynamically via buffer_size property in sysfs.
* arm,scatter-gather: boolean. Indicates that the TMC-ETR can safely use the SG mode on this system. diff --git a/drivers/hwtracing/coresight/coresight-tmc.c b/drivers/hwtracing/coresight/coresight-tmc.c index bc8fc86..1b817ec 100644 --- a/drivers/hwtracing/coresight/coresight-tmc.c +++ b/drivers/hwtracing/coresight/coresight-tmc.c @@ -277,8 +277,41 @@ static ssize_t trigger_cntr_store(struct device *dev, } static DEVICE_ATTR_RW(trigger_cntr);
+static ssize_t buffer_size_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct tmc_drvdata *drvdata = dev_get_drvdata(dev->parent); + + return sprintf(buf, "%#x\n", drvdata->size); +} + +static ssize_t buffer_size_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t size) +{ + int ret; + unsigned long val; + struct tmc_drvdata *drvdata = dev_get_drvdata(dev->parent); + + /* Only permitted for TMC-ETRs */ + if (drvdata->config_type != TMC_CONFIG_TYPE_ETR) + return -EPERM; + + ret = kstrtoul(buf, 0, &val); + if (ret) + return ret; + /* The buffer size should be page aligned */ + if (val & (PAGE_SIZE - 1)) + return -EINVAL; + drvdata->size = val; + return size; +} + +static DEVICE_ATTR_RW(buffer_size); + static struct attribute *coresight_tmc_attrs[] = { &dev_attr_trigger_cntr.attr, + &dev_attr_buffer_size.attr, NULL, };
On Tue, May 29, 2018 at 02:15:37PM +0100, Suzuki K Poulose wrote:
Now that we can dynamically switch between contiguous memory and SG table depending on the trace buffer size, provide the support for selecting an appropriate buffer size.
Cc: Mathieu Poirier mathieu.poirier@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
.../ABI/testing/sysfs-bus-coresight-devices-tmc | 8 ++++++ .../devicetree/bindings/arm/coresight.txt | 3 +-
Acked-by: Rob Herring robh@kernel.org
drivers/hwtracing/coresight/coresight-tmc.c | 33 ++++++++++++++++++++++ 3 files changed, 43 insertions(+), 1 deletion(-)
On 29 May 2018 at 07:15, Suzuki K Poulose suzuki.poulose@arm.com wrote:
This series is split of the Coresight ETR perf support patches posted here [0]. The CATU support and perf backend support will be posted as separate series for better management and review of the patches.
This series adds the support for TMC ETR Scatter-Gather mode to allow using physical non-contiguous buffer for holding the trace data. It also adds a layer to handle the buffer management in a transparent manner, independent of the underlying mode used by the TMC ETR. The layer chooses the ETR mode based on different parameters (size, re-using a set of pages, presence of an SMMU etc.).
Finally we add a sysfs parameter to tune the buffer size for ETR in sysfs-mode.
During the testing, we found out that if the TMC ETR is not properly connected to the memory subsystem, the ETR could lock-up the system while waiting for the "read" transactions to complete in scatter-gather mode. So, we do not use the mode on a system unless it is safe to do so. This is specified by a DT property "arm,scatter-gather".
Applies on coreisght-next tree from Mathieu
Changes since previous version [1]:
- Rebased to Mathieu's coresight-next tree to resolve a conflict.
- Added tags for DT changes from Rob and Mathieu
- Split the SG mode backend support patch from the ETR-BUF patch.
- Address other comments from Mathieu
Changes since splitted series [0] :
- Split the series in [0]
- Address comments on v2
- Rename DT property "scatter-gather" to "arm,scatter-gather"
- Add ETM PID for Cortex-A35, use macros to make the listing easier
[0] - http://lists.infradead.org/pipermail/linux-arm-kernel/2018-May/574875.html [1] - http://lists.infradead.org/pipermail/linux-arm-kernel/2018-May/579135.html
Suzuki K Poulose (12): coresight: ETM: Add support for Arm Cortex-A73 and Cortex-A35 coresight: tmc: Hide trace buffer handling for file read coresight: tmc-etr: Do not clean trace buffer coresight: tmc-etr: Disallow perf mode coresight: Add helper for inserting synchronization packets dts: bindings: Restrict coresight tmc-etr scatter-gather mode dts: juno: Add scatter-gather support for all revisions coresight: Add generic TMC sg table framework coresight: Add support for TMC ETR SG unit coresight: tmc-etr: Add transparent buffer management coresight: tmc-etr buf: Add TMC scatter gather mode backend coresight: tmc: Add configuration support for trace buffer size
.../ABI/testing/sysfs-bus-coresight-devices-tmc | 8 + .../devicetree/bindings/arm/coresight.txt | 5 +- arch/arm64/boot/dts/arm/juno-base.dtsi | 1 + drivers/hwtracing/coresight/coresight-etb10.c | 12 +- drivers/hwtracing/coresight/coresight-etm4x.c | 31 +- drivers/hwtracing/coresight/coresight-priv.h | 10 +- drivers/hwtracing/coresight/coresight-tmc-etf.c | 45 +- drivers/hwtracing/coresight/coresight-tmc-etr.c | 1010 ++++++++++++++++++-- drivers/hwtracing/coresight/coresight-tmc.c | 83 +- drivers/hwtracing/coresight/coresight-tmc.h | 110 ++- drivers/hwtracing/coresight/coresight.c | 3 +- 11 files changed, 1144 insertions(+), 174 deletions(-)
Applied after correcting indentation problems and rectifying the kernel version and date in "sysfs-bus-coresight-devices-tmc".
Mathieu
-- 2.7.4
On 31/05/18 16:36, Mathieu Poirier wrote:
On 29 May 2018 at 07:15, Suzuki K Poulose suzuki.poulose@arm.com wrote:
Suzuki K Poulose (12): coresight: ETM: Add support for Arm Cortex-A73 and Cortex-A35 coresight: tmc: Hide trace buffer handling for file read coresight: tmc-etr: Do not clean trace buffer coresight: tmc-etr: Disallow perf mode coresight: Add helper for inserting synchronization packets dts: bindings: Restrict coresight tmc-etr scatter-gather mode dts: juno: Add scatter-gather support for all revisions coresight: Add generic TMC sg table framework coresight: Add support for TMC ETR SG unit coresight: tmc-etr: Add transparent buffer management coresight: tmc-etr buf: Add TMC scatter gather mode backend coresight: tmc: Add configuration support for trace buffer size
.../ABI/testing/sysfs-bus-coresight-devices-tmc | 8 + .../devicetree/bindings/arm/coresight.txt | 5 +- arch/arm64/boot/dts/arm/juno-base.dtsi | 1 + drivers/hwtracing/coresight/coresight-etb10.c | 12 +- drivers/hwtracing/coresight/coresight-etm4x.c | 31 +- drivers/hwtracing/coresight/coresight-priv.h | 10 +- drivers/hwtracing/coresight/coresight-tmc-etf.c | 45 +- drivers/hwtracing/coresight/coresight-tmc-etr.c | 1010 ++++++++++++++++++-- drivers/hwtracing/coresight/coresight-tmc.c | 83 +- drivers/hwtracing/coresight/coresight-tmc.h | 110 ++- drivers/hwtracing/coresight/coresight.c | 3 +- 11 files changed, 1144 insertions(+), 174 deletions(-)
Applied after correcting indentation problems and rectifying the kernel version and date in "sysfs-bus-coresight-devices-tmc".
Mathieu,
Thanks for the fixups.
Suzuki