Hi Günter,
On Wed, Oct 25, 2023 at 8:39 PM Guenter Roeck <linux@roeck-us.net> wrote:
On 10/25/23 10:05, Geert Uytterhoeven wrote:
On Wed, Oct 25, 2023 at 2:35 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
On Wed, Oct 25, 2023 at 12:53 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
On Wed, Oct 25, 2023 at 12:47 PM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
On Tue, Oct 24, 2023 at 9:22 PM Pavel Machek <pavel@denx.de> wrote:
But we still have failures on Renesas with 5.10.199-rc2:
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/pipelines/10...
And they still happen during MMC init:
[ 2.638013] renesas_sdhi_internal_dmac ee100000.mmc: Got CD GPIO
[ 2.638846] INFO: trying to register non-static key.
[ 2.644192] ledtrig-cpu: registered to indicate activity on CPUs
[ 2.649066] The code is fine but needs lockdep annotation, or maybe
[ 2.649069] you didn't initialize this object before use?
[ 2.649071] turning off the locking correctness validator.
[ 2.649080] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.199-rc2-arm64-renesas-ge31b6513c43d #1
[ 2.649082] Hardware name: HopeRun HiHope RZ/G2M with sub board (DT)
[ 2.649086] Call trace:
[ 2.655106] SMCCC: SOC_ID: ARCH_SOC_ID not implemented, skipping ....
[ 2.661354] dump_backtrace+0x0/0x194
[ 2.661361] show_stack+0x14/0x20
[ 2.667430] usbcore: registered new interface driver usbhid
[ 2.672230] dump_stack+0xe8/0x130
[ 2.672238] register_lock_class+0x480/0x514
[ 2.672244] __lock_acquire+0x74/0x20ec
[ 2.681113] usbhid: USB HID core driver
[ 2.687450] lock_acquire+0x218/0x350
[ 2.687456] _raw_spin_lock+0x58/0x80
[ 2.687464] tmio_mmc_irq+0x410/0x9ac
[ 2.688556] renesas_sdhi_internal_dmac ee160000.mmc: mmc0 base at 0x00000000ee160000, max clock rate 200 MHz
[ 2.744936] __handle_irq_event_percpu+0xbc/0x340
[ 2.749635] handle_irq_event+0x60/0x100
[ 2.753553] handle_fasteoi_irq+0xa0/0x1ec
[ 2.757644] __handle_domain_irq+0x7c/0xdc
[ 2.761736] efi_header_end+0x4c/0xd0
[ 2.765393] el1_irq+0xcc/0x180
[ 2.768530] arch_cpu_idle+0x14/0x2c
[ 2.772100] default_idle_call+0x58/0xe4
[ 2.776019] do_idle+0x244/0x2c0
[ 2.779242] cpu_startup_entry+0x20/0x6c
[ 2.783160] rest_init+0x164/0x28c
[ 2.786561] arch_call_rest_init+0xc/0x14
[ 2.790565] start_kernel+0x4c4/0x4f8
[ 2.794233] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000014
[ 2.803011] Mem abort info:
from https://lava.ciplatform.org/scheduler/job/1025535 from https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/jobs/5360973... .
Is there something else missing?
I don't have a HopeRun HiHope RZ/G2M, but both v5.10.198 and v5.10.199 seem to work fine on Salvator-XS with R-Car H3 ES2.0 and Salvator-X with R-Car M3-W ES1.0, using a config based on latest renesas_defconfig.
Sorry, I looked at the wrong log on R-Car M3-W. I do see the issue with v5.10.198, but not with v5.10.199.
It seems to be an intermittent issue. Investigating...
After spending too much time on bisecting, the bad guy turns out to be commit 6d3745bbc3341d3b ("mmc: renesas_sdhi: register irqs before registering controller") in v5.10.198.
Adding debug information shows the lock is mmc_host.lock.
It is definitely initialized:
    renesas_sdhi_probe()
    {
            ...
            tmio_mmc_host_alloc()
                mmc_alloc_host
                    spin_lock_init(&host->lock);
            ...
            devm_request_irq()
            -> tmio_mmc_irq
                tmio_mmc_cmd_irq()
                    spin_lock(&host->lock);
            ...
    }
That leaves us with a missing lockdep annotation?
Is it possible that the lock initialization is overwritten? I seem to recall a recent case where this happened.
Also, there is spin_lock_init(&_host->lock); in tmio_mmc_host_probe(), and tmio_mmc_host_probe() is called after devm_request_irq().
Unless I am missing something, that is initializing tmio_mmc_host.lock, which is a different lock than mmc_host.lock?
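To make the two-lock question concrete, here is a minimal userspace model (not kernel code). It assumes only what is stated above: mmc_alloc_host() initializes mmc_host.lock, and tmio_mmc_host_probe() initializes tmio_mmc_host.lock after devm_request_irq(). All `model_*` names and struct layouts are hypothetical stand-ins, and which lock the interrupt handler actually takes is left open, as in the discussion.

```c
/* Userspace model only: names mirror the thread, layouts are simplified. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for spinlock_t: records whether spin_lock_init() has run. */
struct model_lock {
    bool initialized;
};

struct model_mmc_host {
    struct model_lock lock;        /* models mmc_host.lock */
};

struct model_tmio_mmc_host {
    struct model_mmc_host *mmc;
    struct model_lock lock;        /* models tmio_mmc_host.lock */
};

void model_lock_init(struct model_lock *l)
{
    l->initialized = true;
}

/* Models tmio_mmc_host_alloc() -> mmc_alloc_host(): only mmc->lock is set up. */
void model_host_alloc(struct model_tmio_mmc_host *host,
                      struct model_mmc_host *mmc)
{
    assert(host != NULL && mmc != NULL);
    mmc->lock.initialized = false;
    host->lock.initialized = false;
    host->mmc = mmc;
    model_lock_init(&mmc->lock);
}

/* Models tmio_mmc_host_probe(): sets up the tmio_mmc_host lock. */
void model_host_probe(struct model_tmio_mmc_host *host)
{
    assert(host != NULL);
    model_lock_init(&host->lock);
}

/* True if an interrupt arriving now would find this lock initialized. */
bool model_irq_lock_ready(const struct model_lock *l)
{
    return l->initialized;
}
```

In this model, between model_host_alloc() and model_host_probe() (the window in which the IRQ is already requested), `mmc->lock` is ready but `host->lock` is not. Whether that window matters on real hardware depends on which of the two locks the handler acquires.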
Also, how would lockdep annotation help with "Unable to handle kernel NULL pointer dereference at virtual address 0000000000000014" in the log above?
For the log from v5.10.198-rc1-g18c65c1b4996, that happened because it lacked commit 1e3d016a95067ab3 ("mmc: renesas_sdhi: only reset SCC when its pointer is populated"), according to earlier messages in this thread.
For the NULL pointer dereference in 5.10.199-rc2, I'm not sure. I didn't see that on R-Car M3-W...
According to my logs, I never saw this lockdep issue in MMC on mainline before, so it's a bit hard to guess what's missing...
Gr{oetje,eeting}s,
Geert
-- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds