Re: [EXT] Re: [RFC PATCH v3 0/8] Coresight for Kernel panic and watchdog reset

19 Sep 2023

      On 19/09/2023 12:39, Linu Cherian wrote:
...
Hi James,
...
-----Original Message-----
From: James Clark james.clark@arm.com
Sent: Friday, September 15, 2023 7:20 PM
To: Linu Cherian lcherian@marvell.com; suzuki.poulose@arm.com;
mike.leach@linaro.org; leo.yan@linaro.org
Cc: linux-arm-kernel@lists.infradead.org; coresight@lists.linaro.org;
linux-kernel@vger.kernel.org; robh+dt@kernel.org;
krzysztof.kozlowski+dt@linaro.org; conor+dt@kernel.org;
devicetree@vger.kernel.org; Sunil Kovvuri Goutham
sgoutham@marvell.com; George Cherian gcherian@marvell.com
Subject: [EXT] Re: [RFC PATCH v3 0/8] Coresight for Kernel panic and
watchdog reset

On 04/09/2023 06:05, Linu Cherian wrote:
...
This RFC v3 patch series is rebased on v6.5-rc7 and is dependent on
the below two patches.
[...]
...
Steps for reading trace data captured in previous boot
++++++++++++++++++++++++++++++++++++++++++++++++++++++

cd /sys/bus/coresight/devices/tmc_etrXX/

Change to the special mode called read_prevboot.
#echo 1 > read_prevboot

Dump trace buffer data to a file,
#dd if=/dev/tmc_etrXX of=~/cstrace.bin

Hi Linu,
I left this comment on V2, but I tested it again and got the same result.
Instead of linking it I'll just re-paste it here:
I made a reserved region, but when I run this command I get "Unable to
handle kernel paging request at virtual address 001f1921ed10ffae".
Is there an extra step involved if there was no trace captured from a previous
panic? I thought I'd just be able to read out uninitialised data. Or is it the
uninitialised metadata that's causing this issue?
Also that's without KASAN or lockdep turned on. If I have a kernel with either
of those things I get a different warning for each one. I expect the lockdep
one would happen even in the working scenario though?
Somehow I missed this comment on V2.
I retried the above steps on my board and do not see issues with either KASAN-enabled or lockdep-enabled configs.
Please see logs below.
a. Lockdep enabled config
~# cd /sys/bus/coresight/devices/tmc_etr0
tmc_etr0# echo 1 > read_prevboot
tmc_etr0# dd if=/dev/tmc_etr0 of=~/cstrace.bin
12324+1 records in
12324+1 records out
6310032 bytes (6.3 MB, 6.0 MiB) copied, 0.122883 s, 51.3 MB/s
# zcat /proc/config.gz | grep LOCKDEP
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_LOCKDEP=y
CONFIG_LOCKDEP_BITS=15
CONFIG_LOCKDEP_CHAINS_BITS=16
CONFIG_LOCKDEP_STACK_TRACE_BITS=19
CONFIG_LOCKDEP_STACK_TRACE_HASH_BITS=14
CONFIG_LOCKDEP_CIRCULAR_QUEUE_BITS=12
# CONFIG_DEBUG_LOCKDEP is not set
b. KASAN enabled config
# cd /sys/bus/coresight/devices/tmc_etr0/
tmc_etr0# ls
buf_mode_preferred   connections  power          trigger_cntr
buf_modes_available  enable_sink  read_prevboot  uevent
buffer_size          mgmt         subsystem      waiting_for_supplier
tmc_etr0# echo 1 > read_prevboot
tmc_etr0# dd if=/dev/tmc_etr0 of=~/cstrace.bin
12324+1 records in
12324+1 records out
6310032 bytes (6.3 MB, 6.0 MiB) copied, 0.0940671 s, 67.1 MB/s
~# zcat /proc/config.gz | grep -i kasan
CONFIG_KASAN_SHADOW_OFFSET=0xdfff800000000000
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_SW_TAGS=y
CONFIG_HAVE_ARCH_KASAN_HW_TAGS=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_CC_HAS_KASAN_GENERIC=y
CONFIG_CC_HAS_KASAN_SW_TAGS=y
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
# CONFIG_KASAN_SW_TAGS is not set
# CONFIG_KASAN_HW_TAGS is not set
CONFIG_KASAN_OUTLINE=y
# CONFIG_KASAN_INLINE is not set
CONFIG_KASAN_STACK=y
CONFIG_KASAN_VMALLOC=y
# CONFIG_KASAN_MODULE_TEST is not set
However, I am able to trigger a kernel crash with bad metadata (corrupted rwp and rrp), with the stack trace below.
[  107.442991]  __arch_copy_to_user+0x180/0x240
[  107.447254]  vfs_read+0xc8/0x2a8
[  107.450476]  ksys_read+0x74/0x110
[  107.453783]  __arm64_sys_read+0x24/0x38
[  107.457611]  invoke_syscall.constprop.0+0x58/0xf8
[  107.462309]  do_el0_svc+0x6c/0x158
[  107.465704]  el0_svc+0x54/0x1c0
[  107.468839]  el0t_64_sync_handler+0x100/0x130
[  107.473188]  el0t_64_sync+0x190/0x198
[  107.476843] Code: d503201f d503201f d503201f d503201f (a8c12027)
Does your stack trace look similar? If so, it is very likely due to bad metadata.
If not, please share yours.
For example, bad values for rwp and rrp can corrupt the computed offset, resulting in the above crash.
I will add more validation checks while setting up the prevboot buffer in the next patch version,
so that bogus metadata values are not processed.
Thanks James for trying this out.
I think it must be bad metadata because I didn't try it with a previous
crash saved yet. I suppose we do need some kind of validation then if
it's possible for bad metadata to cause a crash.
I will try after filling in the metadata and see if that was the issue.
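
For illustration only, here is a minimal sketch of the kind of sanity check discussed above: reject the saved metadata unless rrp and rwp both fall inside the reserved buffer. The struct layout, field names and function names are assumptions made for this example and are not taken from the patch series; the actual driver may store and validate different fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical metadata layout, assumed for this sketch only. */
struct prevboot_metadata {
	uint64_t buf_base;	/* start address of the reserved trace buffer */
	uint64_t buf_size;	/* size of the reserved trace buffer in bytes */
	uint64_t rrp;		/* saved read pointer (absolute address) */
	uint64_t rwp;		/* saved write pointer (absolute address) */
};

/* True if ptr lies in [buf_base, buf_base + buf_size). */
static bool ptr_in_buf(const struct prevboot_metadata *md, uint64_t ptr)
{
	/* Subtract before comparing so the check itself cannot overflow. */
	return ptr >= md->buf_base && ptr - md->buf_base < md->buf_size;
}

/*
 * Return true only if the metadata describes a non-empty buffer that
 * does not wrap the address space and both pointers lie inside it.
 */
bool prevboot_metadata_valid(const struct prevboot_metadata *md)
{
	assert(md != NULL);

	if (md->buf_size == 0)
		return false;

	/* Reject a base + size that wraps around the 64-bit address space. */
	if (md->buf_base + md->buf_size < md->buf_base)
		return false;

	return ptr_in_buf(md, md->rrp) && ptr_in_buf(md, md->rwp);
}
```

With a check like this, a read in read_prevboot mode could fail early with an error instead of computing an offset from uninitialised rrp/rwp values.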
...
...
...

Reset back to normal mode
#echo 0 > read_prevboot
