Hi Mathieu,
On Mon, 8 Jun 2020 at 18:40, Mathieu Poirier <mathieu.poirier@linaro.org> wrote:
Hey Mike,
On Mon, 8 Jun 2020 at 09:38, Mike Leach <mike.leach@linaro.org> wrote:
Hi,
During some recent testing I happened to be using the ETF on the DB410 as a sink for a quick trace test.
Running my usual script which does the following sequence via sysfs:-
1. enable sink.
2. enable source.
3. wait.
4. disable source.
5. disable sink.
6. read data from sink.
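For reference, the six-step sequence can be sketched in C as below. This is a minimal sketch, not the script used here: it assumes the standard CoreSight sysfs attributes `enable_sink` and `enable_source`, and that step 6 reads the sink's character device. The helper names (`sysfs_write`, `run_trace_test`) and all paths passed in are hypothetical and depend on the board's device tree.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write a short string to a sysfs attribute; returns 0 on success. */
static int sysfs_write(const char *dir, const char *attr, const char *val)
{
    char path[512];
    FILE *f;

    snprintf(path, sizeof(path), "%s/%s", dir, attr);
    f = fopen(path, "w");
    if (!f)
        return -1;
    if (fputs(val, f) == EOF) {
        fclose(f);
        return -1;
    }
    /* sysfs reports write errors when the buffer is flushed. */
    return fclose(f) == 0 ? 0 : -1;
}

/*
 * Hypothetical helper running steps 1-6: enable sink, enable source,
 * wait, disable source, disable sink, then read the captured trace.
 * Returns the number of bytes read into buf, or -1 on error.
 */
static long run_trace_test(const char *sink_dir, const char *source_dir,
                           const char *sink_dev, unsigned int wait_s,
                           char *buf, size_t len)
{
    FILE *dev;
    size_t n;

    if (sysfs_write(sink_dir, "enable_sink", "1\n") ||     /* 1 */
        sysfs_write(source_dir, "enable_source", "1\n"))   /* 2 */
        return -1;
    sleep(wait_s);                                          /* 3 */
    if (sysfs_write(source_dir, "enable_source", "0\n") || /* 4 */
        sysfs_write(sink_dir, "enable_sink", "0\n"))       /* 5 */
        return -1;

    dev = fopen(sink_dev, "rb");                            /* 6 */
    if (!dev)
        return -1;
    n = fread(buf, 1, len, dev);
    fclose(dev);
    return (long)n;
}
```

Step 6 runs after the sink has been disabled, which is the ordering that matters for the abort reported below.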
At step 6, the board hung, threw an abort and rebooted:-
root@linaro-developer:~# [ 128.556367] Internal error: synchronous external abort: 96000010 [#1] SMP
A typical symptom of accessing registers of a device that isn't powered...
Closer investigation shows this was occurring in tmc_etf_read_unprepare() in coresight-tmc-etf.c.
A quick check through my archive of previous kernels shows that a 5.6-rc3 build from 27/02/20 does not show the issue, while a 5.6-rc6 build from 20/03/20 does.
A look through the log for coresight-tmc-etf.c shows only one recent change to this file: a patch from Sai, commit 347adb0d6385 on 20/05/20, to tmc_etf_read_prepare(), which is a couple of months after the issue first arises. Interestingly, if I replicate the change made in this commit in tmc_etf_read_unprepare(), the problem disappears.
Very good, at least things are consistent.
Obviously I can submit a patch to the 5.8-rc1 tree once that appears, assuming this would not simply be masking a problem elsewhere.
Right.
Looking at the code, I think adding a patch to do the same check in read_unprepare() is valid. That being said, by the same logic, things should currently crash on Sai's board, since read_unprepare() does not have the same check as read_prepare().
The patch from Sai addressed the issue of users trying to read the ETF before enabling it or a source. If someone uses a read data -> disable sink sequence, which is valid and possible, the issue does not occur. I triggered it because of my disable sink -> read data sequence.
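To illustrate why only one ordering fails, here is a simplified user-space model, not the driver code. It assumes that disabling the sink lets the block power down, and counts any register access made while unpowered as a "fault" (standing in for the synchronous external abort). All names (`fake_tmc`, `reg_read`, `read_unprepare_guarded`, etc.) are hypothetical; the guarded variant only mirrors the kind of mode check discussed for tmc_etf_read_prepare().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's sink modes. */
enum sink_mode { MODE_DISABLED, MODE_SYSFS };

/* Simplified model of an ETF; not the kernel's driver data. */
struct fake_tmc {
    enum sink_mode mode;
    bool powered;        /* models the block's clocks/power domain */
    unsigned int faults; /* register accesses made while unpowered */
};

/* Stand-in for a register read; counts accesses to an unpowered block. */
static unsigned int reg_read(struct fake_tmc *t)
{
    if (!t->powered)
        t->faults++;
    return 0;
}

/* Assumption: disabling the sink lets the block power down. */
static void sink_enable(struct fake_tmc *t)
{
    t->powered = true;
    t->mode = MODE_SYSFS;
}

static void sink_disable(struct fake_tmc *t)
{
    t->mode = MODE_DISABLED;
    t->powered = false;
}

/* Unguarded: touches a register regardless of the sink state. */
static void read_unprepare_unguarded(struct fake_tmc *t)
{
    (void)reg_read(t);
}

/* Guarded: only touches hardware while the sink is still enabled. */
static void read_unprepare_guarded(struct fake_tmc *t)
{
    if (t->mode == MODE_SYSFS)
        (void)reg_read(t);
}

/* Returns the number of faulting register accesses for one sequence. */
static unsigned int run_sequence(bool disable_before_read, bool guarded)
{
    struct fake_tmc t = { MODE_DISABLED, false, 0 };

    sink_enable(&t);
    if (disable_before_read)
        sink_disable(&t);           /* disable sink -> read data */

    /* ...read the buffer, then tear the read down... */
    if (guarded)
        read_unprepare_guarded(&t);
    else
        read_unprepare_unguarded(&t);

    if (!disable_before_read)
        sink_disable(&t);           /* read data -> disable sink */
    return t.faults;
}
```

In this model, run_sequence(true, false) returns 1 (the disable sink -> read data case), while run_sequence(false, false) and run_sequence(true, true) both return 0.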
I also wonder what happened between 5.6-rc3 and 5.6-rc6, a git bisect would tell us pretty quickly. I'm suspecting a change in the QC core drivers or DT.
I was thinking the same thing. Powering down disabled stuff seems to be perfectly reasonable to me.
Mike
Thoughts? Anyone see similar issues?
Regards
Mike
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK