Hi,
Sharing here the design notes and work-in-progress patches to get some early feedback on the implementation approach.
Introduction
============
This RFC is about extending Linux coresight driver support to address kernel panic and watchdog reset scenarios. This would help coresight users to debug kernel panic and watchdog reset with the help of coresight trace data.
For simplicity, watchdog and kernel panic are addressed in separate sections.
Coresight trace capture: Kernel panic
-------------------------------------
From the coresight driver point of view, addressing the kernel panic situation has four main requirements.
a. Support for allocation of trace buffer pages from reserved memory area. Platform can advertise this using a new device tree property added to relevant coresight nodes.
b. Support for stopping coresight blocks at the time of panic
c. Saving required metadata in the specified format
d. Support for reading trace data captured at the time of panic
Allocation of trace buffer pages from reserved RAM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A new optional device tree property "memory-region" will be added to the ETR/ETF device nodes, that would give the base address and size of trace buffer.
Static allocation of trace buffers would ensure that both IOMMU enabled and disabled cases are handled. Also, platforms that support persistent RAM will allow users to read trace data in the subsequent boot without booting the crashdump kernel.
Note: For ETR sink devices, this reserved region will be used for both trace capture and trace data retrieval. For ETF sink devices, internal SRAM would be used for trace capture, and they would be synced to reserved region for retrieval.
Disabling coresight blocks at the time of panic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In order to avoid the situation of losing relevant trace data after a kernel panic, it would be desirable to stop the coresight blocks at the time of panic.
This can be achieved by configuring the comparator, CTI and sink devices as below,

	Comparator(triggers on kernel panic) --->External out --->CTI --
								      |
		 ETR/ETF stop <------External In <--------------
Note: No supporting patches are shared for this, since we are planning to make use of the System configuration manager to achieve the required configuration. This is a work in progress.
Saving metadata at the time of kernel panic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Coresight metadata covers all additional data, beyond the trace data itself, that is required for a successful trace decode. This includes the ETR/ETF and ETE register snapshots, etc.
A new optional device tree property "memory-region" will be added to the ETR/ETF/ETE device nodes for this.
Reading trace data captured at the time of panic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Trace data captured at the time of panic can be read from the rebooted kernel or from the crashdump kernel using the interface below.
Steps for reading trace data captured in previous boot
++++++++++++++++++++++++++++++++++++++++++++++++++++++
1. cd /sys/bus/coresight/devices/tmc_etrXX/
2. Switch to the special mode called read_prevboot.
#echo 1 > read_prevboot
3. Dump trace buffer data to a file,
#dd if=/dev/tmc_etrXX of=~/cstrace.bin
4. Reset back to normal mode
#echo 0 > read_prevboot
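For tooling that wants to script steps 2 to 4, a minimal userspace sketch in C follows. It only assumes what the steps above show: a writable read_prevboot attribute and a readable device node. The helper names (set_flag, copy_file, dump_prevboot_trace) are hypothetical and the paths are passed in by the caller, so nothing here is tied to a particular tmc_etrXX instance.

```c
#include <assert.h>
#include <stdio.h>

/* Write "1" or "0" to a sysfs-style attribute file. Returns 0 on success. */
static int set_flag(const char *attr_path, int on)
{
	FILE *f;
	int ok;

	assert(attr_path != NULL);
	f = fopen(attr_path, "w");
	if (!f)
		return -1;
	ok = fputs(on ? "1\n" : "0\n", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Copy everything readable from src_path to dst_path. Returns bytes or -1. */
static long copy_file(const char *src_path, const char *dst_path)
{
	unsigned char buf[4096];
	FILE *in, *out;
	size_t n;
	long total = 0;

	in = fopen(src_path, "rb");
	if (!in)
		return -1;
	out = fopen(dst_path, "wb");
	if (!out) {
		fclose(in);
		return -1;
	}
	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, n, out) != n) {
			total = -1;
			break;
		}
		total += (long)n;
	}
	if (ferror(in))
		total = -1;
	fclose(in);
	if (fclose(out) != 0)
		total = -1;
	return total;
}

/* Steps 2-4: enter read_prevboot mode, dump, and always leave the mode. */
static long dump_prevboot_trace(const char *attr_path, const char *dev_path,
				const char *out_path)
{
	long copied;

	if (set_flag(attr_path, 1) != 0)
		return -1;
	copied = copy_file(dev_path, out_path);
	if (set_flag(attr_path, 0) != 0)
		return -1;
	return copied;
}
```

The mode is reset even when the copy fails, so a failed dump should not leave the sink stuck in the special mode.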
General flow of trace capture and decode in case of kernel panic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1. Enable source and sink on all the cores using the sysfs interface. ETR sink will have trace buffers allocated from reserved memory.
2. Run relevant tests.
3. On a kernel panic, all coresight blocks are disabled and the necessary metadata is synced by the kernel panic handler.
The system then eventually reboots or boots a crashdump kernel.
4. For platforms that support a crashdump kernel, raw trace data can be dumped using the coresight sysfs interface from the crashdump kernel itself. Persistent RAM is not a requirement in this case.
5. For platforms that support persistent RAM, trace data can be dumped using the coresight sysfs interface in the subsequent Linux boot. A crashdump kernel is not a requirement in this case. Persistent RAM ensures that trace data is intact across reboot.
Coresight trace capture: Watchdog reset
---------------------------------------
The main differences between addressing the watchdog reset and kernel panic cases are listed below.
a. Saving the coresight metadata needs to be handled by the SCP (system control processor) firmware in the specified format, instead of by the kernel.
b. Reserved memory region given by firmware for trace buffer and metadata has to be in persistent RAM. Note: This is a requirement for watchdog reset case but optional in kernel panic case.
Watchdog reset can be supported only on platforms that meet the above two requirements.
Testing so far:
---------------
Watchdog reset has been tested on Marvell SoCs using the above approach on a 5.x kernel with the sysfs method.
Linu Cherian (5):
  dt-bindings: arm: coresight-tmc: Add "memory-region" property
  coresight: tmc-etr: Add support to use reserved trace memory
  coresight: core: Add provision for panic callbacks
  coresight: tmc: Add support for panic sync handling
  coresight: tmc: Add support for reading tracedata from previous boot
 .../bindings/arm/arm,coresight-tmc.yaml       |   9 ++
 drivers/hwtracing/coresight/coresight-core.c  |  31 +++++
 drivers/hwtracing/coresight/coresight-priv.h  |   1 +
 .../hwtracing/coresight/coresight-tmc-core.c  | 112 ++++++++++++++++++
 .../hwtracing/coresight/coresight-tmc-etf.c   |  69 +++++++++++
 .../hwtracing/coresight/coresight-tmc-etr.c   |  98 ++++++++++++++-
 drivers/hwtracing/coresight/coresight-tmc.h   |  32 +++++
 include/linux/coresight.h                     |  11 ++
 8 files changed, 362 insertions(+), 1 deletion(-)
memory-region 0: Reserved trace buffer memory
TMC ETR: When available, use this reserved memory region for trace data capture. Same region is used for trace data retention after a panic or watchdog reset.
TMC ETF: When available, use this reserved memory region for trace data retention synced from internal SRAM after a panic or watchdog reset.
memory-region 1: Reserved meta data memory
TMC ETR, ETF: When available, use this memory for register snapshot retention synced from hardware registers after a panic or watchdog reset.
Signed-off-by: Linu Cherian <lcherian@marvell.com>
---
 .../devicetree/bindings/arm/arm,coresight-tmc.yaml | 9 +++++++++
 1 file changed, 9 insertions(+)
diff --git a/Documentation/devicetree/bindings/arm/arm,coresight-tmc.yaml b/Documentation/devicetree/bindings/arm/arm,coresight-tmc.yaml
index cb8dceaca70e..10da9331c165 100644
--- a/Documentation/devicetree/bindings/arm/arm,coresight-tmc.yaml
+++ b/Documentation/devicetree/bindings/arm/arm,coresight-tmc.yaml
@@ -101,6 +101,13 @@ properties:
           and ETF configurations.
         $ref: /schemas/graph.yaml#/properties/port

+  memory-region:
+    items:
+      - description: Reserved trace buffer memory. Used for ETR and ETF
+          configurations.
+      - description: Reserved meta data memory. Used for ETR and ETF
+          configurations.
+
 required:
   - compatible
   - reg
@@ -115,6 +122,8 @@ examples:
     etr@20070000 {
         compatible = "arm,coresight-tmc", "arm,primecell";
         reg = <0x20070000 0x1000>;
+        memory-region = <&etr_trace_mem_reserved>,
+                       <&etr_mdata_mem_reserved>;

         clocks = <&oscclk6a>;
         clock-names = "apb_pclk";
Add support to use reserved memory for coresight ETR trace buffer.
Introduce a new ETR buffer mode called ETR_MODE_RESRV, which gets enabled when ETR device tree node is supplied with a valid reserved memory region.
Signed-off-by: Anil Kumar Reddy <areddy3@marvell.com>
Signed-off-by: Linu Cherian <lcherian@marvell.com>
---
 .../hwtracing/coresight/coresight-tmc-core.c  | 50 +++++++++++++++
 .../hwtracing/coresight/coresight-tmc-etr.c   | 63 ++++++++++++++++++-
 drivers/hwtracing/coresight/coresight-tmc.h   | 16 +++++
 3 files changed, 128 insertions(+), 1 deletion(-)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-core.c b/drivers/hwtracing/coresight/coresight-tmc-core.c
index c106d142e632..22d33a2233b8 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-core.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-core.c
@@ -21,6 +21,7 @@
 #include <linux/spinlock.h>
 #include <linux/pm_runtime.h>
 #include <linux/of.h>
+#include <linux/of_address.h>
 #include <linux/coresight.h>
 #include <linux/amba/bus.h>

@@ -362,6 +363,47 @@ static inline bool tmc_etr_has_non_secure_access(struct tmc_drvdata *drvdata)
 	return (auth & TMC_AUTH_NSID_MASK) == 0x3;
 }

+static int tmc_get_reserved_region(struct device *parent, void *dev_caps)
+{
+	struct tmc_drvdata *drvdata = dev_get_drvdata(parent);
+	struct device_node *node;
+	struct resource res;
+	int rc;
+
+	node = of_parse_phandle(parent->of_node, "memory-region", 0);
+	if (!node) {
+		dev_dbg(parent, "No memory-region specified\n");
+		goto out;
+	}
+
+	rc = of_address_to_resource(node, 0, &res);
+	of_node_put(node);
+	if (rc) {
+		dev_err(parent, "No address assigned to the memory-region\n");
+		goto out;
+	}
+
+	if (res.start != 0 && resource_size(&res) != 0) {
+		drvdata->tmc_resrv_buf.vaddr = memremap(res.start,
+							resource_size(&res),
+							MEMREMAP_WC);
+		if (!drvdata->tmc_resrv_buf.vaddr) {
+			dev_err(parent, "Failed to map destination address for reserved memory\n");
+			rc = -ENOMEM;
+			goto out;
+		}
+
+		drvdata->tmc_resrv_buf.paddr = res.start;
+		drvdata->tmc_resrv_buf.size  = resource_size(&res);
+		/* Size of contiguous buffer space for TMC ETR */
+		drvdata->size = drvdata->tmc_resrv_buf.size;
+		return 0;
+	}
+
+out:
+	return -ENOMEM;
+}
+
 /* Detect and initialise the capabilities of a TMC ETR */
 static int tmc_etr_setup_caps(struct device *parent, u32 devid, void *dev_caps)
 {
@@ -378,6 +420,13 @@ static int tmc_etr_setup_caps(struct device *parent, u32 devid, void *dev_caps)
 	if (!(devid & TMC_DEVID_NOSCAT) && tmc_etr_can_use_sg(parent))
 		tmc_etr_set_cap(drvdata, TMC_ETR_SG);

+	/* Get reserved memory region if specified and
+	 * set capability to use reserved memory for trace buffer.
+	 */
+	if (!tmc_get_reserved_region(parent, dev_caps))
+		tmc_etr_set_cap(drvdata, TMC_ETR_RESRV_MEM);
+
+
 	/* Check if the AXI address width is available */
 	if (devid & TMC_DEVID_AXIAW_VALID)
 		dma_mask = ((devid >> TMC_DEVID_AXIAW_SHIFT) &
@@ -402,6 +451,7 @@ static int tmc_etr_setup_caps(struct device *parent, u32 devid, void *dev_caps)
 	rc = dma_set_mask_and_coherent(parent, DMA_BIT_MASK(dma_mask));
 	if (rc)
 		dev_err(parent, "Failed to setup DMA mask: %d\n", rc);
+
 	return rc;
 }

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 918d461fcf4a..82d0e3840b50 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -686,6 +686,61 @@ static const struct etr_buf_operations etr_flat_buf_ops = {
 	.get_data = tmc_etr_get_data_flat_buf,
 };

+/*
+ * tmc_etr_alloc_resrv_buf: Allocate a contiguous DMA buffer from reserved region.
+ */
+static int tmc_etr_alloc_resrv_buf(struct tmc_drvdata *drvdata,
+				  struct etr_buf *etr_buf, int node,
+				  void **pages)
+{
+	struct etr_flat_buf *resrv_buf;
+	struct device *real_dev = drvdata->csdev->dev.parent;
+
+	/* We cannot reuse existing pages for resrv buf */
+	if (pages)
+		return -EINVAL;
+
+	resrv_buf = kzalloc(sizeof(*resrv_buf), GFP_KERNEL);
+	if (!resrv_buf)
+		return -ENOMEM;
+
+	resrv_buf->daddr = dma_map_resource(real_dev, drvdata->tmc_resrv_buf.paddr,
+					    etr_buf->size, DMA_FROM_DEVICE, 0);
+	if (dma_mapping_error(real_dev, resrv_buf->daddr)) {
+		dev_err(real_dev, "failed to map source buffer address\n");
+		kfree(resrv_buf);
+		return -ENOMEM;
+	}
+
+	resrv_buf->vaddr = drvdata->tmc_resrv_buf.vaddr;
+	resrv_buf->size = etr_buf->size;
+	resrv_buf->dev = &drvdata->csdev->dev;
+	etr_buf->hwaddr = resrv_buf->daddr;
+	etr_buf->mode = ETR_MODE_RESRV;
+	etr_buf->private = resrv_buf;
+	return 0;
+}
+
+static void tmc_etr_free_resrv_buf(struct etr_buf *etr_buf)
+{
+	struct etr_flat_buf *resrv_buf = etr_buf->private;
+
+	if (resrv_buf && resrv_buf->daddr) {
+		struct device *real_dev = resrv_buf->dev->parent;
+
+		dma_unmap_resource(real_dev, resrv_buf->daddr,
+				   resrv_buf->size, DMA_FROM_DEVICE, 0);
+	}
+	kfree(resrv_buf);
+}
+
+static const struct etr_buf_operations etr_resrv_buf_ops = {
+	.alloc = tmc_etr_alloc_resrv_buf,
+	.free = tmc_etr_free_resrv_buf,
+	.sync = tmc_etr_sync_flat_buf,
+	.get_data = tmc_etr_get_data_flat_buf,
+};
+
 /*
  * tmc_etr_alloc_sg_buf: Allocate an SG buf @etr_buf. Setup the parameters
  * appropriately.
@@ -813,6 +868,7 @@ static const struct etr_buf_operations *etr_buf_ops[] = {
 	[ETR_MODE_FLAT] = &etr_flat_buf_ops,
 	[ETR_MODE_ETR_SG] = &etr_sg_buf_ops,
 	[ETR_MODE_CATU] = NULL,
+	[ETR_MODE_RESRV] = &etr_resrv_buf_ops
 };

 void tmc_etr_set_catu_ops(const struct etr_buf_operations *catu)
@@ -838,6 +894,7 @@ static inline int tmc_etr_mode_alloc_buf(int mode,
 	case ETR_MODE_FLAT:
 	case ETR_MODE_ETR_SG:
 	case ETR_MODE_CATU:
+	case ETR_MODE_RESRV:
 		if (etr_buf_ops[mode] && etr_buf_ops[mode]->alloc)
 			rc = etr_buf_ops[mode]->alloc(drvdata, etr_buf,
 						      node, pages);
@@ -862,7 +919,7 @@ static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
 					 int node, void **pages)
 {
 	int rc = -ENOMEM;
-	bool has_etr_sg, has_iommu;
+	bool has_etr_sg, has_iommu, has_etr_resrv_mem;
 	bool has_sg, has_catu;
 	struct etr_buf *etr_buf;
 	struct device *dev = &drvdata->csdev->dev;
@@ -870,6 +927,7 @@ static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
 	has_etr_sg = tmc_etr_has_cap(drvdata, TMC_ETR_SG);
 	has_iommu = iommu_get_domain_for_dev(dev->parent);
 	has_catu = !!tmc_etr_get_catu_device(drvdata);
+	has_etr_resrv_mem = tmc_etr_has_cap(drvdata, TMC_ETR_RESRV_MEM);

 	has_sg = has_catu || has_etr_sg;

@@ -891,6 +949,9 @@ static struct etr_buf *tmc_alloc_etr_buf(struct tmc_drvdata *drvdata,
 	 * Fallback to available mechanisms.
 	 *
 	 */
+	if (has_etr_resrv_mem)
+		rc = tmc_etr_mode_alloc_buf(ETR_MODE_RESRV, drvdata,
+					    etr_buf, node, pages);
 	if (!pages &&
 	    (!has_sg || has_iommu || size < SZ_1M))
 		rc = tmc_etr_mode_alloc_buf(ETR_MODE_FLAT, drvdata,
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
index 01c0382a29c0..c96b53b5cf89 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -126,6 +126,8 @@ enum tmc_mem_intf_width {
  * so we have to rely on PID of the IP to detect the functionality.
  */
 #define TMC_ETR_SAVE_RESTORE		(0x1U << 2)
+/* TMC ETR to use reserved memory for trace buffer*/
+#define TMC_ETR_RESRV_MEM		(0x1U << 3)

 /* Coresight SoC-600 TMC-ETR unadvertised capabilities */
 #define CORESIGHT_SOC_600_ETR_CAPS	\
@@ -135,6 +137,7 @@ enum etr_mode {
 	ETR_MODE_FLAT,		/* Uses contiguous flat buffer */
 	ETR_MODE_ETR_SG,	/* Uses in-built TMC ETR SG mechanism */
 	ETR_MODE_CATU,		/* Use SG mechanism in CATU */
+	ETR_MODE_RESRV,		/* Use reserved region contiguous buffer */
 };

 struct etr_buf_operations;
@@ -163,6 +166,17 @@ struct etr_buf {
 	void				*private;
 };

+/**
+ * @paddr	: Start address of reserved memory region.
+ * @vaddr	: Corresponding CPU virtual address.
+ * @size	: Size of reserved memory region.
+ */
+struct resrv_buf {
+	phys_addr_t	paddr;
+	void		*vaddr;
+	size_t		size;
+};
+
 /**
  * struct tmc_drvdata - specifics associated to an TMC component
  * @base:	memory mapped base address for this component.
@@ -187,6 +201,7 @@ struct etr_buf {
  * @idr_mutex:	Access serialisation for idr.
  * @sysfs_buf:	SYSFS buffer for ETR.
  * @perf_buf:	PERF buffer for ETR.
+ * @tmc_resrv_buf: Reserved Memory for trace data buffer. Used by ETR/ETF.
  */
 struct tmc_drvdata {
 	void __iomem		*base;
@@ -211,6 +226,7 @@ struct tmc_drvdata {
 	struct mutex		idr_mutex;
 	struct etr_buf		*sysfs_buf;
 	struct etr_buf		*perf_buf;
+	struct resrv_buf	tmc_resrv_buf;
 };

 struct etr_buf_operations {
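One thing worth checking in the tmc_alloc_etr_buf() hunk above: as posted, the ETR_MODE_FLAT attempt that follows the new reserved-memory attempt does not appear to be guarded by rc, so a successful reserved allocation could be overwritten by a flat one. The standalone sketch below models the intended order with every later step guarded. It is a userspace model only: struct etr_caps, ETR_MODE_NONE, resrv_alloc_fails and pick_etr_mode() are hypothetical names, and every allocation other than the reserved one is assumed to succeed when tried.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SZ_1M (1024UL * 1024UL)

enum etr_mode {
	ETR_MODE_FLAT,
	ETR_MODE_ETR_SG,
	ETR_MODE_CATU,
	ETR_MODE_RESRV,
	ETR_MODE_NONE,	/* hypothetical: no buffer could be set up */
};

/* Hypothetical summary of what the driver learns from caps and helpers. */
struct etr_caps {
	bool has_resrv_mem;
	bool has_etr_sg;
	bool has_catu;
	bool has_iommu;
	bool resrv_alloc_fails;	/* test knob: simulate a failed mapping */
};

/* Return the mode that would be chosen, trying reserved memory first. */
static enum etr_mode pick_etr_mode(const struct etr_caps *caps, size_t size,
				   bool have_pages)
{
	enum etr_mode mode = ETR_MODE_NONE;
	bool has_sg;

	assert(caps != NULL);
	has_sg = caps->has_catu || caps->has_etr_sg;

	/* Reserved buffers cannot reuse caller-provided pages. */
	if (caps->has_resrv_mem && !have_pages && !caps->resrv_alloc_fails)
		mode = ETR_MODE_RESRV;
	/* Each later step runs only if nothing has succeeded so far. */
	if (mode == ETR_MODE_NONE && !have_pages &&
	    (!has_sg || caps->has_iommu || size < SZ_1M))
		mode = ETR_MODE_FLAT;
	if (mode == ETR_MODE_NONE && caps->has_etr_sg)
		mode = ETR_MODE_ETR_SG;
	if (mode == ETR_MODE_NONE && caps->has_catu)
		mode = ETR_MODE_CATU;
	return mode;
}
```

In the driver itself the equivalent change would presumably be an extra rc check on the flat-buffer condition, but that is for the patch author to confirm.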
Panic callback handlers allow coresight device drivers to sync relevant trace data and trace metadata to reserved memory regions so that they can be retrieved later in the subsequent boot or in the crashdump kernel.
Signed-off-by: Linu Cherian <lcherian@marvell.com>
---
 drivers/hwtracing/coresight/coresight-core.c | 31 ++++++++++++++++++++
 include/linux/coresight.h                    | 11 +++++++
 2 files changed, 42 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-core.c b/drivers/hwtracing/coresight/coresight-core.c
index d3bf82c0de1d..677c171d1e1d 100644
--- a/drivers/hwtracing/coresight/coresight-core.c
+++ b/drivers/hwtracing/coresight/coresight-core.c
@@ -19,6 +19,7 @@
 #include <linux/of_platform.h>
 #include <linux/delay.h>
 #include <linux/pm_runtime.h>
+#include <linux/panic_notifier.h>

 #include "coresight-etm-perf.h"
 #include "coresight-priv.h"
@@ -1765,6 +1766,31 @@ struct bus_type coresight_bustype = {
 	.name	= "coresight",
 };

+int coresight_panic_sync(struct device *dev, void *data)
+{
+
+	struct coresight_device *csdev = container_of(dev, struct coresight_device, dev);
+
+	/* Run through panic sync handlers for all enabled devices */
+	if (csdev->enable && panic_ops(csdev))
+		panic_ops(csdev)->sync(csdev);
+
+	return 0;
+}
+
+static int coresight_panic_cb(struct notifier_block *self,
+			       unsigned long v, void *p)
+{
+	bus_for_each_dev(&coresight_bustype, NULL, NULL,
+			 coresight_panic_sync);
+
+	return 0;
+}
+
+static struct notifier_block coresight_notifier = {
+	.notifier_call = coresight_panic_cb,
+};
+
 static int __init coresight_init(void)
 {
 	int ret;
@@ -1777,6 +1803,11 @@ static int __init coresight_init(void)
 	if (ret)
 		goto exit_bus_unregister;

+
+	/* Register function to be called for panic */
+	ret = atomic_notifier_chain_register(&panic_notifier_list,
+					     &coresight_notifier);
+
 	/* initialise the coresight syscfg API */
 	ret = cscfg_init();
 	if (!ret)
diff --git a/include/linux/coresight.h b/include/linux/coresight.h
index f19a47b9bb5a..8831df24733a 100644
--- a/include/linux/coresight.h
+++ b/include/linux/coresight.h
@@ -277,6 +277,7 @@ static struct coresight_dev_list (var) = {				\
 #define link_ops(csdev)	csdev->ops->link_ops
 #define helper_ops(csdev)	csdev->ops->helper_ops
 #define ect_ops(csdev)	csdev->ops->ect_ops
+#define panic_ops(csdev)	csdev->ops->panic_ops

 /**
  * struct coresight_ops_sink - basic operations for a sink
@@ -351,12 +352,22 @@ struct coresight_ops_ect {
 	int (*disable)(struct coresight_device *csdev);
 };

+/**
+ * struct coresight_ops_panic - Generic device ops for panic handling
+ *
+ * @sync	: Sync the device register state/trace data
+ */
+struct coresight_ops_panic {
+	int (*sync)(struct coresight_device *csdev);
+};
+
 struct coresight_ops {
 	const struct coresight_ops_sink *sink_ops;
 	const struct coresight_ops_link *link_ops;
 	const struct coresight_ops_source *source_ops;
 	const struct coresight_ops_helper *helper_ops;
 	const struct coresight_ops_ect *ect_ops;
+	const struct coresight_ops_panic *panic_ops;
 };

 #if IS_ENABLED(CONFIG_CORESIGHT)
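The dispatch added above can be modelled outside the kernel to see which devices end up being synced. The sketch below is a simplified userspace model, not the driver code: the cs_* types, mark_synced() and cs_panic_sync_all() are hypothetical stand-ins for coresight_device, a driver's sync handler and the bus_for_each_dev() walk. Unlike the posted callback, it also checks the sync pointer itself before calling it, which may be worth considering for drivers that register panic_ops without a handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct cs_device;

/* Simplified stand-ins for coresight_ops_panic / coresight_ops. */
struct cs_ops_panic {
	int (*sync)(struct cs_device *dev);
};

struct cs_ops {
	const struct cs_ops_panic *panic_ops;
};

struct cs_device {
	bool enable;
	const struct cs_ops *ops;
	int synced;	/* counts how often the sync handler ran */
};

/* Example handler: a real one would copy registers/trace to reserved RAM. */
static int mark_synced(struct cs_device *dev)
{
	dev->synced++;
	return 0;
}

/* Sync one device if it is enabled and provides a handler. */
static int cs_panic_sync_one(struct cs_device *dev)
{
	if (!dev || !dev->enable)
		return 0;
	if (!dev->ops || !dev->ops->panic_ops || !dev->ops->panic_ops->sync)
		return 0;
	return dev->ops->panic_ops->sync(dev);
}

/* Walk all devices; keep going on errors and return the failure count. */
static size_t cs_panic_sync_all(struct cs_device *devs, size_t n)
{
	size_t i, failures = 0;

	for (i = 0; i < n; i++) {
		if (cs_panic_sync_one(&devs[i]) != 0)
			failures++;
	}
	return failures;
}
```

Continuing past a failing device is a deliberate choice here: in a panic path it is usually better to save whatever can still be saved.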
- Get reserved region from device tree node for metadata
- Define metadata format for TMC
- Add TMC ETR panic sync handler that syncs register snapshot to metadata region
- Add TMC ETF panic sync handler that syncs register snapshot to metadata region and internal SRAM to reserved trace buffer region.
Signed-off-by: Anil Kumar Reddy <areddy3@marvell.com>
Signed-off-by: Tanmay Jagdale <tanmay@marvell.com>
Signed-off-by: Linu Cherian <lcherian@marvell.com>
---
 .../hwtracing/coresight/coresight-tmc-core.c  | 29 +++++++++++++++++++
 .../hwtracing/coresight/coresight-tmc-etf.c   | 19 ++++++++++++
 .../hwtracing/coresight/coresight-tmc-etr.c   | 12 ++++++++
 drivers/hwtracing/coresight/coresight-tmc.h   | 16 ++++++++++
 4 files changed, 76 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-core.c b/drivers/hwtracing/coresight/coresight-tmc-core.c
index 22d33a2233b8..0c1319851182 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-core.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-core.c
@@ -370,6 +370,7 @@ static int tmc_get_reserved_region(struct device *parent, void *dev_caps)
 	struct resource res;
 	int rc;

+	/* Trace buffer region */
 	node = of_parse_phandle(parent->of_node, "memory-region", 0);
 	if (!node) {
 		dev_dbg(parent, "No memory-region specified\n");
@@ -397,6 +398,34 @@ static int tmc_get_reserved_region(struct device *parent, void *dev_caps)
 		drvdata->tmc_resrv_buf.size  = resource_size(&res);
 		/* Size of contiguous buffer space for TMC ETR */
 		drvdata->size = drvdata->tmc_resrv_buf.size;
+	}
+
+	/* Metadata region */
+	node = of_parse_phandle(parent->of_node, "memory-region", 1);
+	if (!node) {
+		dev_dbg(parent, "No memory-region specified\n");
+		goto out;
+	}
+
+	rc = of_address_to_resource(node, 0, &res);
+	of_node_put(node);
+	if (rc) {
+		dev_err(parent, "No address assigned to the memory-region\n");
+		goto out;
+	}
+
+	if (res.start != 0 && resource_size(&res) != 0) {
+		drvdata->tmc_metadata.vaddr = memremap(res.start,
+					resource_size(&res), MEMREMAP_WC);
+		if (!drvdata->tmc_metadata.vaddr) {
+			dev_err(parent, "Failed to map destination address for reserved memory\n");
+			rc = -ENOMEM;
+			goto out;
+		}
+
+		drvdata->tmc_metadata.paddr = res.start;
+		drvdata->tmc_metadata.size  = resource_size(&res);
+		return 0;
 	}

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index 0ab1f73c2d06..6c84b9ca3318 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -586,6 +586,20 @@ static unsigned long tmc_update_etf_buffer(struct coresight_device *csdev,
 	return to_read;
 }

+static int tmc_sync_etf_sink(struct coresight_device *csdev)
+{
+	/*
+	 * TODO:
+	 * 1. Sync registers from hardware to metadata region
+	 */
+
+	/*
+	 * TODO:
+	 * 2. Sync Internal SRAM to reserved trace buffer region
+	 */
+	return 0;
+}
+
 static const struct coresight_ops_sink tmc_etf_sink_ops = {
 	.enable		= tmc_enable_etf_sink,
 	.disable	= tmc_disable_etf_sink,
@@ -599,6 +613,10 @@ static const struct coresight_ops_link tmc_etf_link_ops = {
 	.disable	= tmc_disable_etf_link,
 };

+static const struct coresight_ops_panic tmc_etf_sync_ops = {
+	.sync		= tmc_sync_etf_sink,
+};
+
 const struct coresight_ops tmc_etb_cs_ops = {
 	.sink_ops	= &tmc_etf_sink_ops,
 };
@@ -606,6 +624,7 @@ const struct coresight_ops tmc_etb_cs_ops = {
 const struct coresight_ops tmc_etf_cs_ops = {
 	.sink_ops	= &tmc_etf_sink_ops,
 	.link_ops	= &tmc_etf_link_ops,
+	.panic_ops	= &tmc_etf_sync_ops,
 };

 int tmc_read_prepare_etb(struct tmc_drvdata *drvdata)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 82d0e3840b50..dc6146012c8c 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -1780,10 +1780,22 @@ static const struct coresight_ops_sink tmc_etr_sink_ops = {
 	.free_buffer	= tmc_free_etr_buffer,
 };

+static int tmc_sync_etr_sink(struct coresight_device *csdev)
+{
+	/* TODO: Sync registers from hardware to metadata region */
+	return 0;
+}
+
+static const struct coresight_ops_panic tmc_etr_sync_ops = {
+	.sync		= tmc_sync_etr_sink,
+};
+
 const struct coresight_ops tmc_etr_cs_ops = {
 	.sink_ops	= &tmc_etr_sink_ops,
+	.panic_ops	= &tmc_etr_sync_ops,
 };

+
 int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
 {
 	int ret = 0;
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
index c96b53b5cf89..b5208af10c56 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -133,6 +133,20 @@ enum tmc_mem_intf_width {
 #define CORESIGHT_SOC_600_ETR_CAPS	\
 	(TMC_ETR_SAVE_RESTORE | TMC_ETR_AXI_ARCACHE)

+/* TMC metadata region for ETR and ETF configurations */
+struct tmc_register_snapshot {
+	uint32_t valid;		/* Indicate if this ETF/ETR was enabled */
+	uint32_t size;		/* Size of trace data */
+	uint32_t rrphi;		/* Read Pointer High Address bits */
+	uint32_t rrp;		/* Read Pointer */
+	uint32_t rwphi;		/* Write Pointer High Address bits */
+	uint32_t rwp;		/* Write Pointer */
+	uint32_t sts;		/* Status Register */
+	uint32_t trc_addrhi;	/* High Address bits of trace data in preserved region */
+	uint32_t trc_addr;	/* Address bits of trace data in preserved region */
+	uint32_t reserved[7];
+};
+
 enum etr_mode {
 	ETR_MODE_FLAT,		/* Uses contiguous flat buffer */
 	ETR_MODE_ETR_SG,	/* Uses in-built TMC ETR SG mechanism */
@@ -202,6 +216,7 @@ struct resrv_buf {
  * @sysfs_buf:	SYSFS buffer for ETR.
  * @perf_buf:	PERF buffer for ETR.
  * @tmc_resrv_buf: Reserved Memory for trace data buffer. Used by ETR/ETF.
+ * @tmc_metadata: Reserved memory for metadata. Used by ETR/ETF.
  */
 struct tmc_drvdata {
 	void __iomem		*base;
@@ -227,6 +242,7 @@ struct tmc_drvdata {
 	struct etr_buf		*sysfs_buf;
 	struct etr_buf		*perf_buf;
 	struct resrv_buf	tmc_resrv_buf;
+	struct resrv_buf	tmc_metadata;
 };

 struct etr_buf_operations {
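Since the snapshot layout is shared with SCP firmware in the watchdog case, it may help to pin the size down at compile time. The sketch below copies struct tmc_register_snapshot from the hunk above into a standalone C file, asserts that it is 16 words (64 bytes) long, and shows how the split hi/lo fields combine into 64-bit values. The helper names snapshot_trace_addr() and snapshot_rwp() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Field list copied from the patch; comments shortened. */
struct tmc_register_snapshot {
	uint32_t valid;		/* non-zero if this ETF/ETR was enabled */
	uint32_t size;		/* size of trace data */
	uint32_t rrphi;		/* read pointer, high bits */
	uint32_t rrp;		/* read pointer, low bits */
	uint32_t rwphi;		/* write pointer, high bits */
	uint32_t rwp;		/* write pointer, low bits */
	uint32_t sts;		/* status register */
	uint32_t trc_addrhi;	/* trace data address, high bits */
	uint32_t trc_addr;	/* trace data address, low bits */
	uint32_t reserved[7];
};

/* 9 named words + 7 reserved words: kernel and firmware must agree. */
static_assert(sizeof(struct tmc_register_snapshot) == 16 * sizeof(uint32_t),
	      "snapshot is expected to be 64 bytes");

/* Combine the split address fields into one 64-bit physical address. */
static uint64_t snapshot_trace_addr(const struct tmc_register_snapshot *s)
{
	return ((uint64_t)s->trc_addrhi << 32) | s->trc_addr;
}

/* Same idea for the write pointer. */
static uint64_t snapshot_rwp(const struct tmc_register_snapshot *s)
{
	return ((uint64_t)s->rwphi << 32) | s->rwp;
}
```

Because every member is a uint32_t there is no padding, so the reserved words leave room for later fields without changing the overall size.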
* Introduce a new mode CS_MODE_READ_PREVBOOT for reading tracedata captured in previous boot.
* Add special handlers for preparing ETR/ETF for this special mode
* User can read the trace data as below
For example, for reading trace data from tmc_etf sink
1. cd /sys/bus/coresight/devices/tmc_etfXX/
2. Change mode to READ_PREVBOOT
#echo 1 > read_prevboot
3. Dump trace buffer data to a file,
#dd if=/dev/tmc_etfXX of=~/cstrace.bin
4. Reset back to normal mode
#echo 0 > read_prevboot
Signed-off-by: Anil Kumar Reddy <areddy3@marvell.com>
Signed-off-by: Tanmay Jagdale <tanmay@marvell.com>
Signed-off-by: Linu Cherian <lcherian@marvell.com>
---
 drivers/hwtracing/coresight/coresight-priv.h  |  1 +
 .../hwtracing/coresight/coresight-tmc-core.c  | 35 ++++++++++++
 .../hwtracing/coresight/coresight-tmc-etf.c   | 57 +++++++++++++++++++
 .../hwtracing/coresight/coresight-tmc-etr.c   | 24 ++++++++
 4 files changed, 117 insertions(+)
diff --git a/drivers/hwtracing/coresight/coresight-priv.h b/drivers/hwtracing/coresight/coresight-priv.h
index 595ce5862056..2b8b83e72849 100644
--- a/drivers/hwtracing/coresight/coresight-priv.h
+++ b/drivers/hwtracing/coresight/coresight-priv.h
@@ -86,6 +86,7 @@ enum cs_mode {
 	CS_MODE_DISABLED,
 	CS_MODE_SYSFS,
 	CS_MODE_PERF,
+	CS_MODE_READ_PREVBOOT, /* Trace data from previous boot */
 };

 /**
diff --git a/drivers/hwtracing/coresight/coresight-tmc-core.c b/drivers/hwtracing/coresight/coresight-tmc-core.c
index 0c1319851182..0a2199423c2d 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-core.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-core.c
@@ -152,6 +152,10 @@ static int tmc_open(struct inode *inode, struct file *file)
 	struct tmc_drvdata *drvdata = container_of(file->private_data,
 						   struct tmc_drvdata, miscdev);

+	/* Advertise if we are opening with a special mode */
+	if (drvdata->mode == CS_MODE_READ_PREVBOOT)
+		dev_info(&drvdata->csdev->dev, "TMC read mode for previous boot\n");
+
 	ret = tmc_read_prepare(drvdata);
 	if (ret)
 		return ret;
@@ -330,9 +334,40 @@ static ssize_t buffer_size_store(struct device *dev,

 static DEVICE_ATTR_RW(buffer_size);

+static ssize_t read_prevboot_show(struct device *dev,
+				  struct device_attribute *attr, char *buf)
+{
+	struct tmc_drvdata *drvdata = dev_get_drvdata(dev->parent);
+
+	return sprintf(buf, "%#x\n", drvdata->size);
+}
+
+static ssize_t read_prevboot_store(struct device *dev,
+				   struct device_attribute *attr,
+				   const char *buf, size_t size)
+{
+	int ret;
+	unsigned long val;
+	struct tmc_drvdata *drvdata = dev_get_drvdata(dev->parent);
+
+	ret = kstrtoul(buf, 0, &val);
+	if (ret)
+		return ret;
+
+	if (val)
+		drvdata->mode = CS_MODE_READ_PREVBOOT;
+	else
+		drvdata->mode = CS_MODE_DISABLED;
+
+	return size;
+}
+
+static DEVICE_ATTR_RW(read_prevboot);
+
 static struct attribute *coresight_tmc_attrs[] = {
 	&dev_attr_trigger_cntr.attr,
 	&dev_attr_buffer_size.attr,
+	&dev_attr_read_prevboot.attr,
 	NULL,
 };

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index 6c84b9ca3318..92699610edcd 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -627,6 +627,42 @@ const struct coresight_ops tmc_etf_cs_ops = {
 	.panic_ops	= &tmc_etf_sync_ops,
 };

+/* Prepare for READ_PREVBOOT mode */
+int tmc_prepare_etb_prevboot(struct tmc_drvdata *drvdata)
+{
+	struct tmc_register_snapshot *reg_ptr;
+	unsigned long flags;
+	u64 trace_addr;
+	int ret = 0;
+
+	spin_lock_irqsave(&drvdata->spinlock, flags);
+
+	if (!drvdata->tmc_metadata.vaddr) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	reg_ptr = drvdata->tmc_metadata.vaddr;
+	trace_addr = reg_ptr->trc_addr | ((u64)reg_ptr->trc_addrhi << 32);
+
+	drvdata->buf = memremap(trace_addr, reg_ptr->size, MEMREMAP_WC);
+	if (!drvdata->buf) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	drvdata->len = reg_ptr->size;
+
+	if (reg_ptr->sts & 0x1)
+		coresight_insert_barrier_packet(drvdata->buf);
+	drvdata->reading = true;
+
+out:
+	spin_unlock_irqrestore(&drvdata->spinlock, flags);
+
+	return ret;
+}
+
 int tmc_read_prepare_etb(struct tmc_drvdata *drvdata)
 {
 	enum tmc_mode mode;
@@ -638,6 +674,9 @@ int tmc_read_prepare_etb(struct tmc_drvdata *drvdata)
 			 drvdata->config_type != TMC_CONFIG_TYPE_ETF))
 		return -EINVAL;

+	if (drvdata->mode == CS_MODE_READ_PREVBOOT)
+		return tmc_prepare_etb_prevboot(drvdata);
+
 	spin_lock_irqsave(&drvdata->spinlock, flags);

 	if (drvdata->reading) {
@@ -674,6 +713,21 @@ int tmc_read_prepare_etb(struct tmc_drvdata *drvdata)

 	return ret;
 }
+/* Handle READ_PREVBOOT mode */
+int tmc_unprepare_etb_prevboot(struct tmc_drvdata *drvdata)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&drvdata->spinlock, flags);
+
+	drvdata->reading = false;
+	memunmap(drvdata->buf);
+	drvdata->buf = NULL;
+
+	spin_unlock_irqrestore(&drvdata->spinlock, flags);
+
+	return 0;
+}

 int tmc_read_unprepare_etb(struct tmc_drvdata *drvdata)
 {
@@ -687,6 +741,9 @@ int tmc_read_unprepare_etb(struct tmc_drvdata *drvdata)
 			 drvdata->config_type != TMC_CONFIG_TYPE_ETF))
 		return -EINVAL;

+	if (drvdata->mode == CS_MODE_READ_PREVBOOT)
+		return tmc_unprepare_etb_prevboot(drvdata);
+
 	spin_lock_irqsave(&drvdata->spinlock, flags);

 	/* Re-enable the TMC if need be */
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index dc6146012c8c..1a53decd887b 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -1795,6 +1795,16 @@ const struct coresight_ops tmc_etr_cs_ops = {
 	.panic_ops	= &tmc_etr_sync_ops,
 };

+/* Prepare for READ_PREVBOOT mode */
+int tmc_prepare_etr_prevboot(struct tmc_drvdata *drvdata)
+{
+	/*
+	 * TODO : Initialize the ETR buffer and status for trace data read
+	 * using the metadata
+	 */
+
+	return 0;
+}

 int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
 {
@@ -1805,6 +1815,9 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
 	if (WARN_ON_ONCE(drvdata->config_type != TMC_CONFIG_TYPE_ETR))
 		return -EINVAL;

+	if (drvdata->mode == CS_MODE_READ_PREVBOOT)
+		return tmc_prepare_etr_prevboot(drvdata);
+
 	spin_lock_irqsave(&drvdata->spinlock, flags);
 	if (drvdata->reading) {
 		ret = -EBUSY;
@@ -1832,6 +1845,14 @@ int tmc_read_prepare_etr(struct tmc_drvdata *drvdata)
 	return ret;
 }

+/* Handle READ_PREVBOOT mode */
+int tmc_unprepare_etr_prevboot(struct tmc_drvdata *drvdata)
+{
+	/* TODO */
+
+	return 0;
+}
+
 int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
 {
 	unsigned long flags;
@@ -1841,6 +1862,9 @@ int tmc_read_unprepare_etr(struct tmc_drvdata *drvdata)
 	if (WARN_ON_ONCE(drvdata->config_type != TMC_CONFIG_TYPE_ETR))
 		return -EINVAL;

+	if (drvdata->mode == CS_MODE_READ_PREVBOOT)
+		return tmc_unprepare_etr_prevboot(drvdata);
+
 	spin_lock_irqsave(&drvdata->spinlock, flags);

 	/* RE-enable the TMC if need be */
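The READ_PREVBOOT prepare path maps whatever address and size the metadata region reports. After a crash or a cold boot that region may hold stale or garbage values, so a sanity check before mapping seems worthwhile. The sketch below is one possible check, written as standalone C so it can be tested in userspace. The function name prevboot_range_ok() is hypothetical, and the policy (valid flag set, non-zero size, range fully inside the reserved trace region) is an assumption rather than something the patches specify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if [trace_addr, trace_addr + size) lies entirely inside
 * the reserved region [region_base, region_base + region_size).
 * Written so that no intermediate sum can overflow.
 */
static bool prevboot_range_ok(uint32_t valid, uint64_t trace_addr,
			      uint32_t size, uint64_t region_base,
			      uint64_t region_size)
{
	/* The sink must have been enabled and must have captured data. */
	if (!valid || size == 0)
		return false;
	if (trace_addr < region_base)
		return false;
	if (size > region_size)
		return false;
	/* Offset into the region plus size must not exceed the region. */
	return (trace_addr - region_base) <= (region_size - size);
}
```

A caller would feed in the fields of the register snapshot together with the paddr and size recorded for the reserved trace buffer, and refuse to map the buffer when the check fails.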
On 02/05/2023 07:50, Linu Cherian wrote:
> Allocation of trace buffer pages from reserved RAM
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> A new optional device tree property "memory-region" will be added to the
> ETR/ETF device nodes, that would give the base address and size of trace
> buffer.
>
> Static allocation of trace buffers would ensure that both IOMMU enabled
> and disabled cases are handled. Also, platforms that support persistent
> RAM will allow users to read trace data in the subsequent boot without
> booting the crashdump kernel.
>
> Note:
> For ETR sink devices, this reserved region will be used for both trace
> capture and trace data retrieval.
> For ETF sink devices, internal SRAM would be used for trace capture,
> and they would be synced to reserved region for retrieval.
>
> Disabling coresight blocks at the time of panic
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> In order to avoid the situation of losing relevant trace data after a
> kernel panic, it would be desirable to stop the coresight blocks at the
> time of panic.
>
> This can be achieved by configuring the comparator, CTI and sink devices
> as below,
>
> Comparator(triggers on kernel panic) --->External out --->CTI --
>                                                                 |
>          ETR/ETF stop <------External In <-----------------------
>
> Note:
> No supporting patches are shared for this, since we are planning to make
> use of the System configuration manager to achieve the required
> configuration. This is a work in progress.
Hi Linu,
Do you have anything to share for this part yet? I'd like to test out the patches with this mode. Or do you just have the watchdog mode working?
> Saving metadata at the time of kernel panic
>
> Coresight metadata involves all additional data that are required for a
> successful trace decode in addition to the trace data. This involves
> ETR/ETF, ETE register snapshot etc.
>
> A new optional device property "memory-region" will be added to the
> ETR/ETF/ETE device nodes for this.
>
> Reading trace data captured at the time of panic
>
> Trace data captured at the time of panic, can be read from rebooted kernel
> or from crashdump kernel using the below mentioned interface.
>
> Steps for reading trace data captured in previous boot
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
> cd /sys/bus/coresight/devices/tmc_etrXX/
>
> Change to special mode called, read_prevboot.
>
>   #echo 1 > read_prevboot
>
> Dump trace buffer data to a file,
>
>   #dd if=/dev/tmc_etrXX of=~/cstrace.bin
>
> Reset back to normal mode
>
>   #echo 0 > read_prevboot
I didn't see a step mentioned about how to not overwrite the old data when booting again. Presumably if you want it recording from as early as possible you also need some flag to indicate whether the buffer contains a crash recording or not. I'm also not sure how this flag could be initialised?
> General flow of trace capture and decode incase of kernel panic
>
> 1. Enable source and sink on all the cores using the sysfs interface.
>    ETR sink will have trace buffers allocated from reserved memory.
>
> 2. Run relevant tests.
>
> 3. On a kernel panic, all coresight blocks are disabled, necessary
>    metadata is synced by kernel panic handler.
>
>    System would eventually reboot or boot a crashdump kernel.

[...]

> a. Saving coresight metadata need to be taken care by the
>    SCP(system control processor) firmware in the specified format,
>    instead of kernel.
Is it not possible to save the metadata at the beginning? That way it would work more easily with both panic and watchdog modes. Or is there something that can't be known ahead of time?
I can't really think of anything that couldn't be known ahead of time. Unless it's something like the last write pointer?
> b. Reserved memory region given by firmware for trace buffer and metadata
>    has to be in persistent RAM.
>    Note: This is a requirement for watchdog reset case but optional
>    in kernel panic case.
>
> Watchdog reset can be supported only on platforms that meet the above
> two requirements.
>
> Testing so far:
>
> Watchdog reset has been tested on Marvell SOCs using the above approach
> on 5.x kernel version with sysfs method.
>
> Linu Cherian (5):
>   dt-bindings: arm: coresight-tmc: Add "memory-region" property
>   coresight: tmc-etr: Add support to use reserved trace memory
>   coresight: core: Add provision for panic callbacks
>   coresight: tmc: Add support for panic sync handling
>   coresight: tmc: Add support for reading tracedata from previous boot
>
>  .../bindings/arm/arm,coresight-tmc.yaml       |   9 ++
>  drivers/hwtracing/coresight/coresight-core.c  |  31 +++++
>  drivers/hwtracing/coresight/coresight-priv.h  |   1 +
>  .../hwtracing/coresight/coresight-tmc-core.c  | 112 ++++++++++++++++++
>  .../hwtracing/coresight/coresight-tmc-etf.c   |  69 +++++++++++
>  .../hwtracing/coresight/coresight-tmc-etr.c   |  98 ++++++++++++++-
>  drivers/hwtracing/coresight/coresight-tmc.h   |  32 +++++
>  include/linux/coresight.h                     |  11 ++
>  8 files changed, 362 insertions(+), 1 deletion(-)
Hi James,
> -----Original Message-----
> From: James Clark <james.clark@arm.com>
> Sent: Tuesday, June 13, 2023 3:08 PM
> To: Linu Cherian <lcherian@marvell.com>; mathieu.poirier@linaro.org;
> suzuki.poulose@arm.com; mike.leach@linaro.org
> Cc: coresight@lists.linaro.org; Anil Kumar Reddy H <areddy3@marvell.com>;
> George Cherian <gcherian@marvell.com>
> Subject: [EXT] Re: [RFC PATCH 0/5]Extending Coresight for Kernel panic and
> watchdog reset
> Hi Linu,
>
> Do you have anything to share for this part yet? I'd like to test out
> the patches with this mode. Or do you just have the watchdog mode working?
Thanks for taking a look at this. Currently we do have the kernel panic case working as well but for a 5.x series.
Stopping Coresight trace at the time of panic mainly involves three changes:

1. ETM resource configuration to generate a trigger on "panic"
2. CTI configuration to send the trigger to all ETF/ETR devices (FlushIn trigger)
3. Configuring the ETF/ETR devices to flush and stop trace capture on the FlushIn trigger
The initial plan was to get #1 and #2 achieved using the Coresight System configuration manager and #3 would be a kernel change anyways. Since configuration manager support for CTI was work in progress, currently ETM configuration is achieved through a kernel driver change and CTI configuration is done using user space scripts.
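As a rough illustration of change #3, the sketch below builds a formatter and flush control (FFCR) value that asks a TMC sink to flush when FlushIn is asserted and to stop once that flush completes. The bit positions are written out locally and follow the TMC FFCR layout as I understand it; the helper name tmc_ffcr_stop_on_flushin() is hypothetical and is not part of the posted patches.

```c
#include <stdint.h>

/* FFCR bit positions, assumed from the TMC register layout. */
#define TMC_FFCR_EN_FMT        (1u << 0)   /* enable the formatter */
#define TMC_FFCR_EN_TI         (1u << 1)   /* embed triggers in the stream */
#define TMC_FFCR_FON_FLIN      (1u << 4)   /* start a flush on FlushIn */
#define TMC_FFCR_STOP_ON_FLUSH (1u << 12)  /* stop capture when the flush completes */

/*
 * Hypothetical helper: take the FFCR value the driver would normally
 * program and add the "flush and stop on FlushIn" behaviour to it.
 */
static uint32_t tmc_ffcr_stop_on_flushin(uint32_t ffcr)
{
	return ffcr | TMC_FFCR_FON_FLIN | TMC_FFCR_STOP_ON_FLUSH;
}
```

With the formatter and trigger insertion already enabled, the resulting value would be 0x1013. Whether the real change should OR these bits into the existing value or program them only in a dedicated panic mode is left open here.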
Just started working on rebasing these working patches to upstream kernel. Once ready I will send the complete set as V2.
> I didn't see a step mentioned about how to not overwrite the old data
> when booting again. Presumably if you want it recording from as early as
> possible you also need some flag to indicate whether the buffer contains
> a crash recording or not. I'm also not sure how this flag could be
> initialised?
Yeah, currently this scenario is not handled. Will look into this.
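One possible shape for such a flag, purely as a sketch: a small header at the start of the reserved metadata region that carries a magic value, a valid bit and a checksum. The magic and checksum are there because RAM contents after a cold boot are arbitrary, so a bare flag could not be trusted. Every name below (struct cs_crash_hdr, CS_CRASH_MAGIC and the helpers) is hypothetical and does not come from the patches.

```c
#include <stdbool.h>
#include <stdint.h>

#define CS_CRASH_MAGIC 0x43535452u /* hypothetical marker value */

/* Hypothetical header at the start of the reserved metadata region. */
struct cs_crash_hdr {
	uint32_t magic;     /* CS_CRASH_MAGIC once the header has been written */
	uint32_t valid;     /* 1 if a crash recording is present */
	uint32_t trace_len; /* bytes of trace data saved */
	uint32_t csum;      /* XOR of the three fields above */
};

static uint32_t cs_crash_csum(const struct cs_crash_hdr *h)
{
	return h->magic ^ h->valid ^ h->trace_len;
}

/* Would run in the panic path (or in firmware) after the trace is synced. */
static void cs_crash_mark(struct cs_crash_hdr *h, uint32_t trace_len)
{
	h->magic = CS_CRASH_MAGIC;
	h->valid = 1;
	h->trace_len = trace_len;
	h->csum = cs_crash_csum(h);
}

/* Would be checked at boot before the sink may reuse the buffer. */
static bool cs_crash_present(const struct cs_crash_hdr *h)
{
	return h->magic == CS_CRASH_MAGIC && h->valid == 1 &&
	       h->csum == cs_crash_csum(h);
}

/* Would run once the user has read the data out. */
static void cs_crash_clear(struct cs_crash_hdr *h)
{
	h->valid = 0;
	h->csum = cs_crash_csum(h);
}
```

Under this scheme the driver would refuse to enable the sink on the reserved buffer while cs_crash_present() is true, and leaving read_prevboot mode would be a natural place to call cs_crash_clear().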
>> General flow of trace capture and decode incase of kernel panic
>> ~~~~~
>>
>> 1. Enable source and sink on all the cores using the sysfs interface.
>>    ETR sink will have trace buffers allocated from reserved memory.
>>
>> 2. Run relevant tests.
>>
>> 3. On a kernel panic, all coresight blocks are disabled, necessary
>>    metadata is synced by kernel panic handler.
>>
>>    System would eventually reboot or boot a crashdump kernel.
>>
> [...]
>>
>> a. Saving coresight metadata need to be taken care by the
>>    SCP(system control processor) firmware in the specified format,
>>    instead of kernel.
>>
>
> Is it not possible to save the metadata at the beginning? That way it
> would work more easily with both panic and watchdog modes. Or is there
> something that can't be known ahead of time?
>
> I can't really think of anything that couldn't be known ahead of time.
> Unless it's something like the last write pointer?
Yeah, it's mainly the write pointer and status register which force us to take the snapshot after the panic/watchdog has triggered.
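To show why these two values matter, here is a minimal sketch of how a reader could turn the circular trace buffer into a linear stream once it knows the write pointer and whether the buffer has wrapped (the Full indication in the status register). The struct and function names are hypothetical, and the write pointer is assumed to have already been converted to an offset from the buffer base.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical snapshot of the sink state saved at panic/watchdog time. */
struct cs_sink_snapshot {
	size_t size;   /* trace buffer size in bytes */
	size_t wr_off; /* write pointer as an offset into the buffer */
	int    full;   /* non-zero if the buffer wrapped at least once */
};

/*
 * Copy the valid trace bytes, oldest first, from the circular buffer `buf`
 * into `out`, which must hold at least s->size bytes.
 * Returns the number of bytes copied.
 */
static size_t cs_linearize(const uint8_t *buf, const struct cs_sink_snapshot *s,
			   uint8_t *out)
{
	size_t tail;

	if (s->wr_off >= s->size)
		return 0; /* inconsistent snapshot, nothing to trust */

	if (!s->full) {
		/* No wrap: valid data is [0, wr_off). */
		memcpy(out, buf, s->wr_off);
		return s->wr_off;
	}

	/* Wrapped: the oldest data starts at wr_off and runs to the end. */
	tail = s->size - s->wr_off;
	memcpy(out, buf + s->wr_off, tail);
	memcpy(out + tail, buf, s->wr_off);
	return s->size;
}
```

Without the write pointer the reader cannot tell where the oldest data begins, and without the wrap indication it cannot tell whether the region past the write pointer holds trace or stale memory.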
On 14/06/2023 15:06, Linu Cherian wrote:
> Yeah, its mainly the write pointer and status register which forces us to
> take snapshot after the panic/watchdog has triggered.
If you are ok with timestamps being enabled then I wonder if the write pointer is redundant? You can just decode up until there is some kind of error or the timestamp goes backwards and then assume that is the last written point?
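A minimal sketch of that idea, assuming the decoder has already produced one timestamp per decoded chunk in buffer order, and that timestamps only ever increase while tracing (no counter wrap or reset). The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Given timestamps decoded in buffer order, return the index of the first
 * entry that is older than its predecessor, or n if there is none.
 * In a wrapped circular buffer the newest data sits immediately before
 * the oldest, so this index approximates the last written point.
 */
static size_t cs_find_ts_break(const uint64_t *ts, size_t n)
{
	size_t i;

	for (i = 1; i < n; i++) {
		if (ts[i] < ts[i - 1])
			return i;
	}
	return n;
}
```

The result is only as precise as the timestamp packet interval, so the exact byte offset of the last write would still be approximate compared with a saved write pointer.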
Also I've been looking at linux/Documentation/powerpc/firmware-assisted-dump.rst to see if there is anything in there that can be re-used. I don't know if you looked at that much? It seems like there is some precedent for involving the firmware, but if it can be avoided maybe that is simpler and easier to support. If not then maybe some things like the command line arguments can be re-used. Although I understand coredumps are a fairly different concept to trace so maybe we do need to come up with something completely new like the dtb changes you have.
Hi James,
> -----Original Message-----
> From: James Clark <james.clark@arm.com>
> Sent: Wednesday, June 14, 2023 8:34 PM
> To: Linu Cherian <lcherian@marvell.com>; mathieu.poirier@linaro.org;
> suzuki.poulose@arm.com; mike.leach@linaro.org
> Cc: coresight@lists.linaro.org; Anil Kumar Reddy H <areddy3@marvell.com>;
> George Cherian <gcherian@marvell.com>
> Subject: [EXT] Re: [RFC PATCH 0/5]Extending Coresight for Kernel panic and
> watchdog reset
>>>> General flow of trace capture and decode incase of kernel panic
>>>> ~~~~~
>>>>
>>>> 1. Enable source and sink on all the cores using the sysfs interface.
>>>>    ETR sink will have trace buffers allocated from reserved memory.
>>>>
>>>> 2. Run relevant tests.
>>>>
>>>> 3. On a kernel panic, all coresight blocks are disabled, necessary
>>>>    metadata is synced by kernel panic handler.
>>>>
>>>>    System would eventually reboot or boot a crashdump kernel.
>>>>
>>> [...]
>>>>
>>>> a. Saving coresight metadata need to be taken care by the
>>>>    SCP(system control processor) firmware in the specified format,
>>>>    instead of kernel.
>>>>
>>>
>>> Is it not possible to save the metadata at the beginning? That way it
>>> would work more easily with both panic and watchdog modes. Or is
>>> there something that can't be known ahead of time?
>>>
>>> I can't really think of anything that couldn't be known ahead of time.
>>> Unless it's something like the last write pointer?
>>>
>>
>> Yeah, its mainly the write pointer and status register which forces us to
>> take snapshot after the panic/watchdog has triggered.
>
> If you are ok with timestamps being enabled then I wonder if the write
> pointer is redundant? You can just decode up until there is some kind of
> error or the timestamp goes backwards and then assume that is the last
> written point?
By the way, we cannot do away with saving internal SRAM of ETB to DDR after panic or watchdog since they are not preserved across reset.
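For illustration, the copy from internal SRAM to the reserved DDR region could look roughly like the loop below: read the sink's RAM read data register one word at a time until it returns the all-ones end marker. This is a sketch under assumptions. The register access is abstracted behind a hypothetical callback, the end marker value is assumed to be 0xFFFFFFFF, and the mock source exists only so that the example can run without hardware.

```c
#include <stddef.h>
#include <stdint.h>

#define TMC_RRD_END 0xFFFFFFFFu /* value assumed to signal "no more data" */

/* Hypothetical accessor standing in for a read of the RAM read data register. */
typedef uint32_t (*cs_rrd_read_fn)(void *ctx);

/*
 * Drain the sink's internal SRAM one 32-bit word at a time into `dst`
 * (capacity `max_words`). Returns the number of words stored.
 */
static size_t cs_drain_sram(cs_rrd_read_fn rrd_read, void *ctx,
			    uint32_t *dst, size_t max_words)
{
	size_t n = 0;

	while (n < max_words) {
		uint32_t word = rrd_read(ctx);

		if (word == TMC_RRD_END)
			break;
		dst[n++] = word;
	}
	return n;
}

/* Mock source for illustration: yields `left` words, then the end marker. */
struct cs_mock_sram {
	uint32_t next;
	size_t left;
};

static uint32_t cs_mock_rrd_read(void *ctx)
{
	struct cs_mock_sram *m = ctx;

	if (m->left == 0)
		return TMC_RRD_END;
	m->left--;
	return m->next++;
}
```

In the watchdog case this loop would have to run in firmware rather than in the kernel, which is one reason the firmware dependency is hard to remove for ETF/ETB sinks.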
> Also I've been looking at
> linux/Documentation/powerpc/firmware-assisted-dump.rst to see if there is
> anything in there that can be re-used. I don't know if you looked at that
> much? It seems like there is some precedent for involving the firmware,
> but if it can be avoided maybe that is simpler and easier to support. If
> not then maybe some things like the command line arguments can be re-used.
Agree with your point of avoiding firmware dependency at any cost. Earlier we experimented with using the ELF core dump option to pass info from the primary kernel to the secondary kernel, but that doesn't work for the watchdog reset case, and it involves additional user space crash tool extensions as well.

Using a firmware-allocated memory region for trace data/metadata for each sink device allows both the kernel panic and watchdog cases to work seamlessly.
> Although I understand coredumps are a fairly different concept to trace
> so maybe we do need to come up with something completely new like the dtb
> changes you have.