Re: [RFC PATCH v2 0/7]Extending Coresight for Kernel panic and watchdog reset

1 Sep 2023

      On 13/07/2023 14:47, Linu Cherian wrote:
...
Changelog from v1:

* V2 is a complete patchset with kernel panic trace tested on Linux 6.4.
* Details on testing, with relevant console logs, have been added for
  reference.
* Two additional patches (patches 6 & 7) have been included to manage
  stopping of trace at the time of kernel panic.
* A few bug fixes.

TODO:

* Add support to prevent overwriting of trace data captured in the previous
  boot. (Suggested by James)
* DTS properties for reserved memory might need some refinements, since the
  Linux arm64 kernel has a limitation on the number of reserved regions it
  supports (i.e. 64).
* ETM & CTI configuration using the system configuration manager is a work
  in progress. Currently ETM configuration is done in the driver (patch 7)
  and CTI configuration is done using the CTI sysfs interface.
* Reading trace data from the crashdump kernel is not tested.
* Perf based trace capture is not tested.

Introduction
============
This RFC is about extending Linux coresight driver support to address
kernel panic and watchdog reset scenarios. This would help coresight
users debug kernel panics and watchdog resets with the help of coresight
trace data.

For simplicity, watchdog and kernel panic are addressed in separate
sections.

Coresight trace capture: Kernel panic
=====================================
From the coresight driver point of view, addressing the kernel panic
situation has four main requirements.

a. Support for allocation of trace buffer pages from a reserved memory area.
   The platform can advertise this using a new device tree property added to
   the relevant coresight nodes.
b. Support for stopping coresight blocks at the time of panic.
c. Saving the required metadata in the specified format.
d. Support for reading trace data captured at the time of panic.

Allocation of trace buffer pages from reserved RAM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A new optional device tree property "memory-region" is added to the
ETR/ETF device nodes, which gives the base address and size of the trace
buffer.

Static allocation of trace buffers would ensure that both the IOMMU enabled
and IOMMU disabled cases are handled. Also, platforms that support
persistent RAM will allow users to read trace data in the subsequent boot
without booting the crashdump kernel.

Note:
For ETR sink devices, this reserved region will be used for both trace
capture and trace data retrieval.
For ETF sink devices, internal SRAM would be used for trace capture,
and the captured data would be synced to the reserved region for retrieval.

Note: Patches 1 & 2 add support for this.

Disabling coresight blocks at the time of panic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To avoid losing relevant trace data after a kernel panic, it would be
desirable to stop the coresight blocks at the time of panic.

This can be achieved by configuring the comparator, CTI and sink
devices as below:

Comparator(triggers on kernel panic) ---> External out ---> CTI --
                                                                 |
        ETR/ETF stop <------ External In <------------------------

Note:

* Patch 6 provides the necessary ETR configuration.
* Patch 7 provides the necessary ETM configuration.
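
As a rough illustration of the CTI leg of the diagram above, the sketch
below attaches one trigger input and one trigger output to a shared
cross-trigger channel through the CTI sysfs interface. It assumes the
channels/trigin_attach, channels/trigout_attach and enable files described
in the kernel's CoreSight CTI documentation. The helper name cti_route and
its arguments are hypothetical, and the trigger and channel numbers are
platform specific, so they are left as parameters rather than filled in.

```shell
# cti_route SRC_CTI_DIR TRIGIN DST_CTI_DIR TRIGOUT CHANNEL
# (hypothetical helper, not part of the patch series)
# Attach trigger input TRIGIN on the source CTI and trigger output TRIGOUT
# on the destination CTI to the same channel, then enable both CTIs.
# SRC_CTI_DIR and DST_CTI_DIR may be the same directory.
cti_route() {
    src=$1; trigin=$2; dst=$3; trigout=$4; chan=$5

    # Refuse to continue if any of the assumed sysfs files is missing.
    for f in "$src/channels/trigin_attach" "$dst/channels/trigout_attach" \
             "$src/enable" "$dst/enable"; do
        [ -w "$f" ] || { echo "cti_route: $f is not writable" >&2; return 1; }
    done

    # Each attach file takes "<channel> <trigger index>".
    echo "$chan $trigin" > "$src/channels/trigin_attach" || return 1
    echo "$chan $trigout" > "$dst/channels/trigout_attach" || return 1
    echo 1 > "$src/enable" || return 1
    echo 1 > "$dst/enable"
}
```

Which trigger input carries the ETM external output, and which trigger
output drives the ETR/ETF stop, has to be taken from the platform
documentation or the CTI connection information exposed in sysfs.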

Saving metadata at the time of kernel panic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Coresight metadata comprises all the additional data, beyond the trace data
itself, that is required for a successful trace decode. This includes
ETR/ETF and ETE register snapshots, etc.

A new optional device tree property "memory-region" is added to
the ETR/ETF/ETE device nodes for this.

Note: Patches 3 & 4 add support for this.

Reading trace data captured at the time of panic
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Trace data captured at the time of panic can be read from the rebooted
kernel or from the crashdump kernel using the interface described below.

Note: Patch 5 adds support for this.

Steps for reading trace data captured in previous boot
++++++++++++++++++++++++++++++++++++++++++++++++++++++
1. cd /sys/bus/coresight/devices/tmc_etrXX/

2. Switch to the special mode called read_prevboot.

   #echo 1 > read_prevboot

3. Dump the trace buffer data to a file.

   #dd if=/dev/tmc_etrXX of=~/cstrace.bin
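
The three steps above can be wrapped in a small helper that checks for the
sysfs attribute and the device node before touching them. This is only a
sketch: the function name read_prevboot_trace and its argument order are
hypothetical, and it only guards against missing files in user space.

```shell
# read_prevboot_trace SYSFS_DIR DEV_NODE OUT_FILE
# (hypothetical helper wrapping the steps above)
# SYSFS_DIR is the sink's sysfs directory, DEV_NODE its character device,
# OUT_FILE the file that receives the dumped trace buffer.
read_prevboot_trace() {
    sysfs_dir=$1; dev_node=$2; out_file=$3

    # Step 1: make sure the sink exposes the read_prevboot attribute.
    [ -w "$sysfs_dir/read_prevboot" ] || {
        echo "read_prevboot_trace: no read_prevboot in $sysfs_dir" >&2
        return 1
    }
    [ -r "$dev_node" ] || {
        echo "read_prevboot_trace: cannot read $dev_node" >&2
        return 1
    }

    # Step 2: switch the sink to the read_prevboot mode.
    echo 1 > "$sysfs_dir/read_prevboot" || return 1

    # Step 3: dump the trace buffer data to a file.
    dd if="$dev_node" of="$out_file"
}
```

It would be called with the same paths as in the steps above:

   #read_prevboot_trace /sys/bus/coresight/devices/tmc_etrXX /dev/tmc_etrXX ~/cstrace.bin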

I made a reserved region, but when I run this command I get "Unable to
handle kernel paging request at virtual address 001f1921ed10ffae".
Is there an extra step involved if there was no trace captured from a
previous panic? I thought I'd just be able to read out uninitialised
data. Or is it the uninitialised metadata that's causing this issue?
Also that's without KASAN or lockdep turned on. If I have a kernel with
either of those things I get a different warning for each one. I expect
the lockdep one would happen even in the working scenario though?
