Hi Mathieu,
On 2020-06-09 20:59, Mathieu Poirier wrote:
On Tue, 9 Jun 2020 at 08:20, Sai Prakash Ranjan saiprakash.ranjan@codeaurora.org wrote:
Hi Mike,
On 2020-06-09 16:08, Mike Leach wrote:
HI Mathieu,
On Mon, 8 Jun 2020 at 18:40, Mathieu Poirier mathieu.poirier@linaro.org wrote:
Hey Mike,
On Mon, 8 Jun 2020 at 09:38, Mike Leach mike.leach@linaro.org wrote:
Hi,
During some recent testing I happened to be using the ETF on the DB410 as a sink for a quick trace test.
Running my usual script which does the following sequence via sysfs:-
- enable sink.
- enable source.
- wait
4 .disable source 5. disable sink 6. read data from sink.
At step 6, the board hung, threw an abort and rebooted:- root@linaro-developer:~# [ 128.556367] Internal error: synchronous external abort: 96000010 [#1] SMP
A typical symptom of accessing registers of a device that isn't powered...
Closer investigation shows this was occurring in tmc_etf_read_unprepare() in coresight-tmc-etf.c.
A quick check though my archive of previous kernels, shows that a 5.6-rc3 build from 27/02/20 does not show the issue, a 5.6-rc6 build from 20/03/20 does show the problem.
A look through the log for coresight-tmc-etf.c shows only one recent change to this file, a patch from Sai - commit 347adb0d6385, on 20/05/20 - to tmc_etf_read_prepare() - which is a couple of months after the issue first arises. Interestingly, if I replicate the change made in this commit, to tmc_etf_read_unprepare(), then the problem disappears.
Very good, at least things are consistent.
Obviously I can submit a patch to the 5.8-rc1 tree once that appears, assuming that this would not be simply masking a problem elsewhere.
Right.
Looking at the code, I think adding a patch to do the same check in read_unprepare() is valid. That being said and in accordance to logic, things should currently crash on Sai's board without read_unprepare() fitted with the same check as in read_prepare().
The patch from Sai addressed the issue of users trying to read the ETF before enabling it, or a source. Now if someone was to use a read data -> disable sink sequence. which is valid & possible, the issue would not occur. I triggered it because of my disable sink -> read data sequence.
Yes my patch was addressing the issue with reading ETF before enabling it. I just tested your sequence of reading ETF after disabling ETF on SC7180 SoC, and as Mathieu guessed it crashed, I had not considered this scenario before when posting the other patch.
[ 128.578058] SError Interrupt on CPU6, code 0xbe000411 -- SError [ 128.578064] pstate: 80400089 (Nzcv daIf +PAN -UAO) [ 128.578065] pc : tmc_read_unprepare_etb+0x74/0x108 [ 128.578065] lr : tmc_read_unprepare_etb+0x54/0x108 [ 128.578066] sp : ffffff80d9507c30 [ 128.578067] x29: ffffff80d9507c30 x28: ffffff80b3569a0c [ 128.578071] x27: 0000000000000000 x26: 00000000000a0001 [ 128.578073] x25: ffffff80cbae9550 x24: 0000000000000010 [ 128.578075] x23: ffffffd07296b0f0 x22: ffffffd0109ee028 [ 128.578077] x21: 0000000000000000 x20: ffffff80d19e70e0 [ 128.578079] x19: ffffff80d19e7080 x18: 0000000000000000 [ 128.578081] x17: 0000000000000000 x16: 0000000000000000 [ 128.578083] x15: 0000000000000000 x14: 0000000000000000 [ 128.578086] x13: 0000000000000000 x12: 0000000000000000 [ 128.578088] x11: 0000000000000000 x10: dfffffd000000001 [ 128.578090] x9 : 0000000000000000 x8 : 0000000000000002 [ 128.578092] x7 : ffffffd071d0fe78 x6 : 0000000000000000 [ 128.578094] x5 : 0000000000000080 x4 : 0000000000000001 [ 128.578096] x3 : ffffffd071d0fe98 x2 : 0000000000000000 [ 128.578098] x1 : 0000000000000004 x0 : 0000000000000001 [ 128.578100] Kernel panic - not syncing: Asynchronous SError Interrupt [ 128.578327] SMP: stopping secondary CPUs [ 128.578329] Kernel Offset: 0x61400000 from 0xffffffd010000000 [ 128.578330] PHYS_OFFSET: 0xffffffea00000000 [ 128.578331] CPU features: 0x080006,2a80aa18 [ 128.578332] Memory Limit: none
I also wonder what happened between 5.6-rc3 and 5.6-rc6, a git bisect would tell us pretty quickly. I'm suspecting a change in the QC core drivers or DT.
I was thinking the same thing. Powering down disabled stuff seems to be perfectly reasonable to me.
Yes, this seems like the right thing to do. I tested this one out, i.e., having the same logic as tmc_read_prepare_etb() in tmc_read_unprepare_etb() as you tried and do not observe this crash.
When the ongoing merge window comes to and, please send a patch with a "Fixes" tag.
Sure.
Thanks, Sai