arm64 dragonboard-410c boot failed while running linux next 20200915 due to a kernel crash.
metadata:
  git branch: master
  git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
  git describe: next-20200915
  make_kernelversion: 5.9.0-rc5
  kernel-config: https://builds.tuxbuild.com/J5oDTkph2aj855oeGOd45Q/kernel.config
crash log:
-------------
[ 3.517615] Synopsys Designware Multimedia Card Interface Driver
[ 3.524258] sdhci-pltfm: SDHCI platform and OF driver helper
[ 3.531302] Unable to handle kernel paging request at virtual address dead000000000108
[ 3.531396] Mem abort info:
[ 3.531460] sdhci_msm 7864900.sdhci: Got CD GPIO
[ 3.539182] ESR = 0x96000044
[ 3.541953] ledtrig-cpu: registered to indicate activity on CPUs
[ 3.546692] EC = 0x25: DABT (current EL), IL = 32 bits
[ 3.546701] SET = 0, FnV = 0
[ 3.555694] usbcore: registered new interface driver usbhid
[ 3.555703] usbhid: USB HID core driver
[ 3.561638] genirq: irq_chip msmgpio did not update eff. affinity mask of irq 75
[ 3.563908] EA = 0, S1PTW = 0
[ 3.580792] Data abort info:
[ 3.583673] ISV = 0, ISS = 0x00000044
[ 3.583900] NET: Registered protocol family 10
[ 3.586785] CM = 0, WnR = 1
[ 3.586794] [dead000000000108] address between user and kernel address ranges
[ 3.586806] Internal error: Oops: 96000044 [#1] PREEMPT SMP
[ 3.591869] Segment Routing with IPv6
[ 3.594829] Modules linked in:
[ 3.594841] CPU: 2 PID: 7 Comm: kworker/u8:0 Not tainted 5.9.0-rc5-next-20200915 #1
[ 3.594844] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 3.594862] Workqueue: events_unbound async_run_entry_fn
[ 3.597959] NET: Registered protocol family 17
[ 3.604991] pstate: 60000005 (nZCv daif -PAN -UAO BTYPE=--)
[ 3.605000] pc : __clk_put+0x40/0x140
[ 3.605009] lr : __clk_put+0x2c/0x140
[ 3.610613] 9pnet: Installing 9P2000 support
[ 3.614183] sp : ffff80001005bbe0
[ 3.614189] x29: ffff80001005bbe0
[ 3.617233] Key type dns_resolver registered
[ 3.624696] x28: 000000000000002e
[ 3.624701] x27: ffff00003b620a68 x26: ffff800011496568
[ 3.624708] x25: ffff00003fcfe8f8 x24: ffff00003d30c410
[ 3.632518] registered taskstats version 1
[ 3.636931] x23: ffff800011495cf8 x22: ffff00003b620a40
[ 3.636938] x21: ffff00003d30c400 x20: ffff00003b620580
[ 3.636945] x19: ffff00003b64f380 x18: 0000000007824000
[ 3.636951] x17: ffff00003b620a00 x16: ffff00003b6205d0
[ 3.636958] x15: ffff8000119929f8 x14: ffffffffffffffff
[ 3.636965] x13: ffff800012947000 x12: ffff800012947000
[ 3.636975] x11: 0000000000000020
[ 3.641233] Loading compiled-in X.509 certificates
[ 3.646650] x10: 0101010101010101
[ 3.646654] x9 : ffff8000107b4c84 x8 : 7f7f7f7f7f7f7f7f
[ 3.646661] x7 : ffff000009fe5880 x6 : 0000000000000000
[ 3.646668] x5 : 0000000000000000 x4 : ffff000009fe5880
[ 3.646674] x3 : ffff80001250d7a0 x2 : ffff000009fe5880
[ 3.746653] x1 : ffff00003b64f680 x0 : dead000000000100
[ 3.751949] Call trace:
[ 3.757243] __clk_put+0x40/0x140
[ 3.759413] clk_put+0x18/0x28
[ 3.762885] dev_pm_opp_put_clkname+0x30/0x58
[ 3.765837] sdhci_msm_probe+0x288/0x9a8
[ 3.770265] platform_drv_probe+0x5c/0xb0
[ 3.774258] really_probe+0xf0/0x4d8
[ 3.778163] driver_probe_device+0xfc/0x168
[ 3.781810] __driver_attach_async_helper+0xbc/0xc8
[ 3.785717] async_run_entry_fn+0x4c/0x1b0
[ 3.790575] process_one_work+0x1c8/0x498
[ 3.794741] worker_thread+0x54/0x428
[ 3.798822] kthread+0x120/0x158
[ 3.802467] ret_from_fork+0x10/0x30
[ 3.805771] Code: 35000720 a9438660 f9000020 b4000040 (f9000401)
[ 3.809334] ---[ end trace 1a607a5ea6891b9f ]---
full test log links:
https://lkft.validation.linaro.org/scheduler/job/1765840#L2014
https://lkft.validation.linaro.org/scheduler/job/1765842#L1960
On Tue, 15 Sep 2020 at 16:33, Naresh Kamboju naresh.kamboju@linaro.org wrote:
-- Linaro LKFT https://lkft.linaro.org
Naresh, thanks for reporting!
There have been regressions related to the opp library this cycle, so I am wondering if Viresh may have any ideas, before going into more details.
One thing that also changed from the sdhci-msm point of view is that we enabled async probe [1]. Could this be what triggers an untested error path in the probe?
Otherwise we can always try to revert "mmc: sdhci-msm: Unconditionally call dev_pm_opp_of_remove_table()", which I recently applied again after the earlier errors.
Kind regards Uffe
[1] "mmc: sdhci-msm: Unconditionally call dev_pm_opp_of_remove_table()" https://patchwork.kernel.org/patch/11752095/
On 15-09-20, 23:39, Ulf Hansson wrote:
Naresh, thanks for reporting!
There have been regressions related to the opp library this cycle, so I am wondering if Viresh may have any ideas, before going into more details.
I am really pissed at this. This is the exact bug we got earlier, which Naresh also confirmed as being fixed after the patches I proposed.
One thing that also changed from the sdhci-msm point of view is that we enabled async probe [1]. Could this be what triggers an untested error path in the probe?
Maybe, but I am not sure this would cause such an issue. At most, it should cause issues for other things that depend on sdhci.
Otherwise we can always try to revert "mmc: sdhci-msm: Unconditionally call dev_pm_opp_of_remove_table()", which I recently applied again after the earlier errors.
Yeah, that's the easiest of all.
I am trying to find someone with a local 410c who can help me fix it in real time. Let's see.
On Wed, 16 Sep 2020 at 07:22, Viresh Kumar viresh.kumar@linaro.org wrote:
On 15-09-20, 23:39, Ulf Hansson wrote:
Naresh, thanks for reporting!
There have been regressions related to the opp library this cycle, so I am wondering if Viresh may have any ideas, before going into more details.
I am really pissed at this. This is the exact bug we got earlier, which Naresh also confirmed as being fixed after the patches I proposed.
No worries, let's just fix it, again. :-)
One thing that also changed from the sdhci-msm point of view is that we enabled async probe [1]. Could this be what triggers an untested error path in the probe?
Maybe, but I am not sure this would cause such an issue. At most, it should cause issues for other things that depend on sdhci.
Otherwise we can always try to revert "mmc: sdhci-msm: Unconditionally call dev_pm_opp_of_remove_table()", which I recently applied again after the earlier errors.
Yeah, that's the easiest of all.
I am trying to find someone with a local 410c who can help me fix it in real time. Let's see.
I have the board as well. If you need some help with testing, just let me know.
In any case, I will try the revert and see how that changes things.
Kind regards Uffe
On 16-09-20, 09:37, Ulf Hansson wrote:
I have the board as well. If you need some help with testing, just let me know.
In any case, I will try the revert and see how that changes things.
I am currently testing this with Naresh's help and will try to report back today.
On 16-09-20, 13:37, Viresh Kumar wrote:
I am currently testing this with Naresh's help and will try to report back today.
I think I have found the issue and it is with a new patch from the opp tree (which isn't merged upstream yet):
commit 99f1c7ff37b0 ("opp: Handle multiple calls for same OPP table in _of_add_opp_table_v1()")
I have asked Naresh to run it again; let's see.
On 16-09-20, 13:50, Viresh Kumar wrote:
I think I have found the issue and it is with a new patch from the opp tree (which isn't merged upstream yet):
commit 99f1c7ff37b0 ("opp: Handle multiple calls for same OPP table in _of_add_opp_table_v1()")
I have asked Naresh to run it again; let's see.
https://lkft.validation.linaro.org/scheduler/job/1770973
Got fixed.
I will update my branch and push it.
Ulf, you don't need to do anything.
Naresh, thanks a lot for testing this.