On Fri, 11 Nov 2022 at 09:45, Arnd Bergmann <arnd@arndb.de> wrote:
On Fri, Nov 11, 2022, at 07:28, Naresh Kamboju wrote:
On Thu, 10 Nov 2022 at 03:33, Arnd Bergmann <arnd@arndb.de> wrote:
One more idea I had concerns the unwinder: since this kernel is built with the frame-pointer unwinder, I expect the stack usage per function to be slightly larger than with the ARM unwinder.
Naresh, how hard is it to reproduce this bug intentionally? Can you check whether it still happens if you change the .config to use these options?
# CONFIG_FUNCTION_GRAPH_TRACER is not set
# CONFIG_UNWINDER_FRAME_POINTER is not set
CONFIG_UNWINDER_ARM=y
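To make the requested settings easy to confirm on a build before (and after) testing, here is a minimal sketch of a checker for a kernel .config file. It relies only on the standard .config conventions ("CONFIG_FOO=value" for a set option, "# CONFIG_FOO is not set" for a disabled one) and assumes that a symbol missing from the file counts as not set. The function names parse_config() and mismatches() are hypothetical helpers for illustration, not part of any kernel tooling.

```python
import re

# Settings requested for the experiment: graph tracer and frame-pointer
# unwinder disabled, ARM (EABI) unwinder enabled. None means "is not set".
WANTED = {
    "CONFIG_FUNCTION_GRAPH_TRACER": None,
    "CONFIG_UNWINDER_FRAME_POINTER": None,
    "CONFIG_UNWINDER_ARM": "y",
}

# "CONFIG_FOO=value" lines, and "# CONFIG_FOO is not set" lines.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each CONFIG_ symbol in `text` to its value, or None if explicitly unset."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            values[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            values[m.group(1)] = None
    return values


def mismatches(text, wanted=WANTED):
    """Return {symbol: actual_value} for every wanted option that differs.

    Assumption: a symbol absent from the file is treated as not set (None).
    An empty dict means the config matches all requested settings.
    """
    values = parse_config(text)
    return {
        sym: values.get(sym)
        for sym, want in wanted.items()
        if values.get(sym) != want
    }
```

Usage: pass the contents of the build's .config, e.g. mismatches(open(".config").read()); an empty result means all three options are set as requested.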
I have done this experiment, and the reported crash did not reproduce after eight rounds of testing [1].
[1] https://lkft.validation.linaro.org/scheduler/job/5835922#L1993
OK, good to hear. In that case, I see three possible ways to keep this from coming back on your system:
a) Use asynchronous probing for one or more of the drivers, as Dmitry suggested. This means fixing it upstream first and then backporting the fix to all stable kernels. We should probably do this anyway, but it will need more testing on your side.
b) Change your kernel config permanently with the options above, if LKFT does not actually rely on CONFIG_FUNCTION_GRAPH_TRACER. I don't know if it does.
c) Backport commit 41918ec82eb6 ("ARM: ftrace: enable the graph tracer with the EABI unwinder") from 5.17. This was part of a longer series from Ard, and while the patch itself looks simple enough to backport, I suspect we would have to backport the entire series, which is probably not realistic. Ard, any comments on this?
It at least needs the preceding patch, which tracks the location of LR on the stack when using CONFIG_UNWINDER_ARM.
But I'd take the whole series for good measure.