next/master boot: 179 boots: 11 failed, 167 passed with 1 offline (next-20180731)
Full Boot Summary: https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20180731/
Full Build Summary: https://kernelci.org/build/next/branch/master/kernel/next-20180731/
Tree: next
Branch: master
Git Describe: next-20180731
Git Commit: 85eac382fa06ac72adf891d04bf4d08fb09d80fa
Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Tested: 66 unique boards, 26 SoC families, 21 builds out of 200
Boot Regressions Detected:
arm:
    bcm2835_defconfig:
        bcm2835-rpi-b:
            lab-baylibre-seattle: failing since 1 day (last pass: next-20180727 - first fail: next-20180730)
        bcm2837-rpi-3-b:
            lab-baylibre: failing since 1 day (last pass: next-20180727 - first fail: next-20180730)
    multi_v7_defconfig:
        qcom-apq8064-cm-qs600:
            lab-baylibre-seattle: new failure (last pass: next-20180730)
        qcom-apq8064-ifc6410:
            lab-baylibre-seattle: new failure (last pass: next-20180730)
    qcom_defconfig:
        qcom-apq8064-cm-qs600:
            lab-baylibre-seattle: new failure (last pass: next-20180730)
arm64:
    defconfig:
        apq8096-db820c:
            lab-bjorn: new failure (last pass: next-20180730)
        rk3399-puma-haikou:
            lab-theobroma-systems: new failure (last pass: next-20180730)
x86:
    x86_64_defconfig:
        qemu:
            lab-baylibre: failing since 1 day (last pass: next-20180727 - first fail: next-20180730)
            lab-mhart: failing since 1 day (last pass: next-20180727 - first fail: next-20180730)
            lab-linaro-lkft: failing since 1 day (last pass: next-20180727 - first fail: next-20180730)
Boot Failures Detected:
arm64:
    defconfig
        apq8096-db820c: 1 failed lab
        rk3399-puma-haikou: 1 failed lab
x86:
    x86_64_defconfig
        qemu: 4 failed labs
arm:
    qcom_defconfig
        qcom-apq8064-cm-qs600: 1 failed lab
    multi_v7_defconfig
        qcom-apq8064-cm-qs600: 1 failed lab
        qcom-apq8064-ifc6410: 1 failed lab
    bcm2835_defconfig
        bcm2835-rpi-b: 1 failed lab
        bcm2837-rpi-3-b: 1 failed lab
Offline Platforms:
arm:
qcom_defconfig: qcom-apq8064-ifc6410: 1 offline lab
--- For more info write to info@kernelci.org
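For readers who want to sanity-check the counters in a report like the one above, here is a minimal Python sketch. It assumes the subject line always follows the "N boots: F failed, P passed with O offline" shape seen here; the function names are hypothetical and are not part of any KernelCI tooling.

```python
import re

# Pattern for the subject-line format seen in this report. Other report
# formats are not handled (assumption).
SUMMARY_RE = re.compile(
    r"(?P<total>\d+) boots: (?P<failed>\d+) failed, "
    r"(?P<passed>\d+) passed(?: with (?P<offline>\d+) offline)?"
)


def parse_boot_summary(subject: str) -> dict:
    """Extract the boot counters from a summary line as integers."""
    match = SUMMARY_RE.search(subject)
    if match is None:
        raise ValueError("no boot summary found")
    # A missing "with N offline" clause is treated as zero.
    return {name: int(value or 0) for name, value in match.groupdict().items()}


def is_consistent(counts: dict) -> bool:
    """Check that failed + passed + offline adds up to the total."""
    return counts["failed"] + counts["passed"] + counts["offline"] == counts["total"]


subject = "next/master boot: 179 boots: 11 failed, 167 passed with 1 offline (next-20180731)"
counts = parse_boot_summary(subject)
print(counts, is_consistent(counts))
```

For this report the counters add up: 11 failed, 167 passed and 1 offline account for all 179 boots.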
On Tue, Jul 31, 2018 at 08:14:12AM -0700, kernelci.org bot wrote:
Today's -next fails to boot on a variety of Qualcomm 32 bit platforms:
multi_v7_defconfig:
    qcom-apq8064-cm-qs600:
        lab-baylibre-seattle: new failure (last pass: next-20180730)
    qcom-apq8064-ifc6410:
        lab-baylibre-seattle: new failure (last pass: next-20180730)
qcom_defconfig:
    qcom-apq8064-cm-qs600:
        lab-baylibre-seattle: new failure (last pass: next-20180730)
The logs are all somewhat similar, for example:
https://storage.kernelci.org/next/master/next-20180731/arm/multi_v7_defconfi...
detects a DMA problem during MMCI initialization:
[ 2.237566] mmci-pl18x 121c0000.sdcc: mmc2: PL180 manf 51 rev0 at 0x121c0000 irq 32,0 (pio)
[ 2.244790] mmci-pl18x 121c0000.sdcc: DMA channels RX dma2chan1, TX dma2chan2
[ 2.271722] mmci-pl18x 12400000.sdcc: error during DMA transfer!
[ 2.271757] mmci-pl18x 12400000.sdcc: buggy DMA detected. Taking evasive action.
[ 2.276798] ------------[ cut here ]------------
[ 2.284185] WARNING: CPU: 0 PID: 0 at ../include/linux/dma-mapping.h:551 bam_free_chan+0x2d8/0x2e0
[ 2.288772] Modules linked in:
[ 2.297534] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.18.0-rc7-next-20180731 #1
then panics:
[ 2.513796] ------------[ cut here ]------------
[ 2.518367] kernel BUG at ../mm/vmalloc.c:1608!
[ 2.522968] Internal error: Oops - BUG: 0 [#1] SMP ARM
trying to release the DMA channel. I've not done any bisection or anything but I do note 8bb2299d2d0b5cc (mmc: mmci: Add and implement a ->dma_setup() callback for qcom dml) and some related commits in the MMC tree.
More details for each of the failed boots at:
https://kernelci.org/boot/id/5b6054f559b5144b9396baa9/ https://kernelci.org/boot/id/5b60551259b5144abb96bab6/ https://kernelci.org/boot/id/5b6054e259b5144b1e96bab2/
including full logs, details of the build and so on.
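The kind of triage described in this message (spotting the warning and the BUG in a boot log) can be roughed out in Python. This is only a sketch: the marker list is taken from the excerpts quoted above and is not exhaustive, and `find_failures` is a hypothetical helper, not something KernelCI provides.

```python
import re

# Markers taken from the log excerpts quoted in this thread; the list is
# illustrative, not exhaustive.
FAILURE_MARKERS = (
    "WARNING: CPU:",
    "kernel BUG at",
    "Internal error: Oops",
)

# Matches a printk timestamp such as "[ 2.518367]" followed by the message.
TIMESTAMP_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def find_failures(log_text: str) -> list:
    """Return (timestamp, message) pairs for lines containing a failure marker."""
    hits = []
    for line in log_text.splitlines():
        match = TIMESTAMP_RE.match(line.strip())
        if match is None:
            continue  # Skip lines without a printk timestamp.
        timestamp, message = float(match.group(1)), match.group(2)
        if any(marker in message for marker in FAILURE_MARKERS):
            hits.append((timestamp, message))
    return hits


sample = """\
[ 2.271757] mmci-pl18x 12400000.sdcc: buggy DMA detected. Taking evasive action.
[ 2.284185] WARNING: CPU: 0 PID: 0 at ../include/linux/dma-mapping.h:551 bam_free_chan+0x2d8/0x2e0
[ 2.518367] kernel BUG at ../mm/vmalloc.c:1608!
[ 2.522968] Internal error: Oops - BUG: 0 [#1] SMP ARM
"""
for timestamp, message in find_failures(sample):
    print(f"{timestamp:.6f} {message}")
```

Driver-specific messages such as "buggy DMA detected" are deliberately not in the marker list, so they would still need a human eye.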
hi Mark, Ulf
Looking at the log, I think the attached patch could fix this issue, but as I don't have a qcom board I can't test whether it does :-(.
Ulf: would you prefer that I post the patch to the mailing list?
BR Ludo
On 07/31/2018 06:06 PM, Mark Brown wrote:
On Tue, Jul 31, 2018 at 08:14:12AM -0700, kernelci.org bot wrote:
Today's -next fails to boot on a variety of Qualcomm 32 bit platforms:
multi_v7_defconfig:
    qcom-apq8064-cm-qs600:
        lab-baylibre-seattle: new failure (last pass: next-20180730)
    qcom-apq8064-ifc6410:
        lab-baylibre-seattle: new failure (last pass: next-20180730)
qcom_defconfig:
    qcom-apq8064-cm-qs600:
        lab-baylibre-seattle: new failure (last pass: next-20180730)
The logs are all somewhat similar, for example:
https://storage.kernelci.org/next/master/next-20180731/arm/multi_v7_defconfig/lab-baylibre-seattle/boot-qcom-apq8064-cm-qs600.html
detects a DMA problem during MMCI initialization:
[ 2.237566] mmci-pl18x 121c0000.sdcc: mmc2: PL180 manf 51 rev0 at 0x121c0000 irq 32,0 (pio)
[ 2.244790] mmci-pl18x 121c0000.sdcc: DMA channels RX dma2chan1, TX dma2chan2
[ 2.271722] mmci-pl18x 12400000.sdcc: error during DMA transfer!
[ 2.271757] mmci-pl18x 12400000.sdcc: buggy DMA detected. Taking evasive action.
[ 2.276798] ------------[ cut here ]------------
[ 2.284185] WARNING: CPU: 0 PID: 0 at ../include/linux/dma-mapping.h:551 bam_free_chan+0x2d8/0x2e0
[ 2.288772] Modules linked in:
[ 2.297534] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.18.0-rc7-next-20180731 #1
then panics:
[ 2.513796] ------------[ cut here ]------------
[ 2.518367] kernel BUG at ../mm/vmalloc.c:1608!
[ 2.522968] Internal error: Oops - BUG: 0 [#1] SMP ARM
trying to release the DMA channel. I've not done any bisection or anything but I do note 8bb2299d2d0b5cc (mmc: mmci: Add and implement a ->dma_setup() callback for qcom dml) and some related commits in the MMC tree.
More details for each of the failed boots at:
https://kernelci.org/boot/id/5b6054f559b5144b9396baa9/ https://kernelci.org/boot/id/5b60551259b5144abb96bab6/ https://kernelci.org/boot/id/5b6054e259b5144b1e96bab2/
including full logs, details of the build and so on.
On 1 August 2018 at 10:19, Ludovic BARRE ludovic.barre@st.com wrote:
hi Mark, Ulf
Looking at the log, I think the attached patch could fix this issue, but as I don't have a qcom board I can't test whether it does :-(.
Ulf: would you prefer that I post the patch to the mailing list?
Thanks for looking into this.
However, there is no need to post a fix this time (your patch fixed the issue, but it should declare qcom_variant_init() in mmci.h).
I have already amended the patch, so no further action is needed.
[...]
Kind regards Uffe
On 31 July 2018 at 18:06, Mark Brown broonie@kernel.org wrote:
On Tue, Jul 31, 2018 at 08:14:12AM -0700, kernelci.org bot wrote:
Today's -next fails to boot on a variety of Qualcomm 32 bit platforms:
multi_v7_defconfig:
    qcom-apq8064-cm-qs600:
        lab-baylibre-seattle: new failure (last pass: next-20180730)
    qcom-apq8064-ifc6410:
        lab-baylibre-seattle: new failure (last pass: next-20180730)
qcom_defconfig:
    qcom-apq8064-cm-qs600:
        lab-baylibre-seattle: new failure (last pass: next-20180730)
The logs are all somewhat similar, for example:
https://storage.kernelci.org/next/master/next-20180731/arm/multi_v7_defconfi...
detects a DMA problem during MMCI initialization:
[ 2.237566] mmci-pl18x 121c0000.sdcc: mmc2: PL180 manf 51 rev0 at 0x121c0000 irq 32,0 (pio)
[ 2.244790] mmci-pl18x 121c0000.sdcc: DMA channels RX dma2chan1, TX dma2chan2
[ 2.271722] mmci-pl18x 12400000.sdcc: error during DMA transfer!
[ 2.271757] mmci-pl18x 12400000.sdcc: buggy DMA detected. Taking evasive action.
[ 2.276798] ------------[ cut here ]------------
[ 2.284185] WARNING: CPU: 0 PID: 0 at ../include/linux/dma-mapping.h:551 bam_free_chan+0x2d8/0x2e0
[ 2.288772] Modules linked in:
[ 2.297534] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.18.0-rc7-next-20180731 #1
then panics:
[ 2.513796] ------------[ cut here ]------------
[ 2.518367] kernel BUG at ../mm/vmalloc.c:1608!
[ 2.522968] Internal error: Oops - BUG: 0 [#1] SMP ARM
trying to release the DMA channel. I've not done any bisection or anything but I do note 8bb2299d2d0b5cc (mmc: mmci: Add and implement a ->dma_setup() callback for qcom dml) and some related commits in the MMC tree.
More details for each of the failed boots at:
https://kernelci.org/boot/id/5b6054f559b5144b9396baa9/ https://kernelci.org/boot/id/5b60551259b5144abb96bab6/ https://kernelci.org/boot/id/5b6054e259b5144b1e96bab2/
including full logs, details of the build and so on.
Mark, thanks for reporting.
The problem was a simple one-liner that should have been included in my patch "mmc: mmci: Add and implement a ->dma_setup() callback for qcom dml". The missing one-liner caused mmci to wrongly use DMA for the qcom variant.
I have amended the patch and published it; it should reach the next tree as of tomorrow. Apologies for the mess it created.
Kind regards Uffe
On Tue, Jul 31, 2018 at 08:14:12AM -0700, kernelci.org bot wrote:
Today's -next fails to boot on db820c:
arm64:
defconfig: apq8096-db820c: lab-bjorn: new failure (last pass: next-20180730)
There's nothing immediately obvious as the boot failure cause in the logs, the last output is a failure to load the ath10k_pci firmware:
04:02:53.750283 [ 4.503980] ath10k_pci 0000:01:00.0: Failed to find firmware-N.bin (N between 2 and 6) from ath10k/QCA6174/hw3.0: -2
04:02:53.756384 [ 4.504010] ath10k_pci 0000:01:00.0: could not fetch firmware files (-2)
04:02:53.760522 [ 4.513736] ath10k_pci 0000:01:00.0: could not probe fw (-2)
but I'm not sure that's the actual cause. More details, including the full boot log, here:
On Tue, Jul 31, 2018 at 05:11:14PM +0100, Mark Brown wrote:
On Tue, Jul 31, 2018 at 08:14:12AM -0700, kernelci.org bot wrote:
Today's -next fails to boot on db820c:
arm64:
defconfig: apq8096-db820c: lab-bjorn: new failure (last pass: next-20180730)
There's nothing immediately obvious as the boot failure cause in the logs, the last output is a failure to load the ath10k_pci firmware:
04:02:53.750283 [ 4.503980] ath10k_pci 0000:01:00.0: Failed to find firmware-N.bin (N between 2 and 6) from ath10k/QCA6174/hw3.0: -2
04:02:53.756384 [ 4.504010] ath10k_pci 0000:01:00.0: could not fetch firmware files (-2)
04:02:53.760522 [ 4.513736] ath10k_pci 0000:01:00.0: could not probe fw (-2)
but I'm not sure that's the actual cause. More details, including the full boot log, here:
I tried booting today's -next on db820c, using arm64 defconfig, and it booted correctly:
I also tried removing the ath10k firmware from my initrd, but it still booted correctly.
# cat /proc/version
Linux version 4.18.0-rc7-next-20180731-00001-g47055e3ba913 (nks@centauri) (gcc version 7.2.1 20171011 (Linaro GCC 7.2-2017.11)) #9 SMP PREEMPT Tue Jul 31 21:34:43 CEST 2018
I guess it could be a bug that does not trigger on every boot, or it could be a problem in the kernelci infrastructure.
Kind regards, Niklas
On Tue, Jul 31, 2018 at 09:50:37PM +0200, Niklas Cassel wrote:
I guess it could be a bug that does not trigger on every boot, or it could be a problem in the kernelci infrastructure.
Infrastructure bugs *tend* to manifest differently to this FWIW, though it can never be excluded.
On Wed 01 Aug 02:31 PDT 2018, Mark Brown wrote:
Infrastructure bugs *tend* to manifest differently to this FWIW, though it can never be excluded.
No, that's not an infrastructure issue.
The board did warn about not finding the ath10k firmware, which it always does, so that in itself is not the issue. Then nothing happened for 266 seconds, so my lab decided to terminate the agony.
So this is either an issue with the stability of next-20180731 or with the specific board.
PS. Today's next did boot successfully on the board.
Regards, Bjorn
On 31/07/18 16:14, kernelci.org bot wrote:
next/master boot: 179 boots: 11 failed, 167 passed with 1 offline (next-20180731)
Full Boot Summary: https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20180731/
Full Build Summary: https://kernelci.org/build/next/branch/master/kernel/next-20180731/
Tree: next
Branch: master
Git Describe: next-20180731
Git Commit: 85eac382fa06ac72adf891d04bf4d08fb09d80fa
Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Tested: 66 unique boards, 26 SoC families, 21 builds out of 200
Boot Regressions Detected:
[...]
x86:
    x86_64_defconfig:
        qemu:
            lab-baylibre: failing since 1 day (last pass: next-20180727 - first fail: next-20180730)
            lab-mhart: failing since 1 day (last pass: next-20180727 - first fail: next-20180730)
            lab-linaro-lkft: failing since 1 day (last pass: next-20180727 - first fail: next-20180730)
I've run a few automated bisections on kernelci.org. They initially landed on this merge commit:
ff719be3476a Merge remote-tracking branch 'scsi/for-next'
The 2 parent commits boot fine, but not the resulting merge. So I did another bisection based on the first branch while merging the incoming one in each iteration, and that landed on this commit:
commit d5038a13eca72fb216c07eb717169092e92284f1
Author: Johannes Thumshirn <jthumshirn@suse.de>
Date: Wed Jul 4 10:53:56 2018 +0200
    scsi: core: switch to scsi-mq by default
I then tried to revert it on top of next-20180731 and it did make it boot again. Now, I haven't looked much further in the code, it's entirely possible that the problem is on another incoming branch, in the code that this config enables. At least it seems to be narrowing down the scope for where to look for a problem.
Hope this helps!
Best wishes, Guillaume
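The second bisection described above (testing each candidate only after merging it with the other branch) can be modelled as an ordinary binary search with a different test predicate. The Python sketch below only simulates the search: the commit names are placeholders, and `boots_when_merged` stands in for "merge, build and boot", which in practice would be driven by git and a lab. It is not the KernelCI bisection implementation.

```python
from typing import Callable, Sequence


def first_bad_commit(
    commits: Sequence[str],
    boots_when_merged: Callable[[str], bool],
) -> str:
    """Binary-search a linear history for the first commit whose merge fails.

    `commits` is ordered oldest to newest. The oldest entry is assumed to
    boot and the newest is assumed to fail once merged with the other branch.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if boots_when_merged(commits[mid]):
            good = mid  # The failure was introduced later.
        else:
            bad = mid   # The failure was introduced here or earlier.
    return commits[bad]


# Hypothetical six-commit history in which "c3" is the culprit.
history = ["c0", "c1", "c2", "c3", "c4", "c5"]


def boots(commit: str) -> bool:
    # Stand-in for: merge the other branch into `commit`, build, boot.
    return history.index(commit) < history.index("c3")


print(first_bad_commit(history, boots))
```

The point of the merge step is that neither parent fails on its own, so a plain bisection of either branch would find nothing; only the combined tree exposes the regression.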
On Wed, Aug 01, 2018 at 11:05:36AM +0100, Guillaume Tucker wrote:
I've run a few automated bisections on kernelci.org. They initially landed on this merge commit:
ff719be3476a Merge remote-tracking branch 'scsi/for-next'
The 2 parent commits boot fine, but not the resulting merge. So I did another bisection based on the first branch while merging the incoming one in each iteration, and that landed on this commit:
commit d5038a13eca72fb216c07eb717169092e92284f1
Author: Johannes Thumshirn <jthumshirn@suse.de>
Date: Wed Jul 4 10:53:56 2018 +0200
    scsi: core: switch to scsi-mq by default
I then tried to revert it on top of next-20180731 and it did make it boot again. Now, I haven't looked much further in the code, it's entirely possible that the problem is on another incoming branch, in the code that this config enables. At least it seems to be narrowing down the scope for where to look for a problem.
Copying in everyone else who signed off/acked/reviewed that commit.
On Wed, Aug 01, 2018 at 11:33:03AM +0100, Mark Brown wrote:
Copying in everyone else who signed off/acked/reviewed that commit.
You may have to provide some clues, such as the dmesg log, boot disk, ...
I guess you don't use virtio-scsi/virtio-blk, since both run in blk-mq mode by default even without d5038a13eca72fb.
Thanks, Ming
On Wed, Aug 01, 2018 at 06:51:09PM +0800, Ming Lei wrote:
You may have to provide some clues, such as the dmesg log, boot disk, ...
I guess you don't use virtio-scsi/virtio-blk, since both run in blk-mq mode by default even without d5038a13eca72fb.
Boot logs and so on can be found here:
https://kernelci.org/boot/id/5b618c9f59b514931f96ba97/ https://kernelci.org/boot/id/5b618ca359b514904d96bac5/ https://kernelci.org/boot/id/5b618cbc59b51492e896baad/
(these are today's but the symptoms are the same.) The ramdisk is unfortunately not linked through the UI, though we don't get that far.
On 1 August 2018 at 11:59, Mark Brown broonie@kernel.org wrote:
Boot logs and so on can be found here:
https://kernelci.org/boot/id/5b618c9f59b514931f96ba97/ https://kernelci.org/boot/id/5b618ca359b514904d96bac5/ https://kernelci.org/boot/id/5b618cbc59b51492e896baad/
(these are today's but the symptoms are the same.) The ramdisk is unfortunately not linked through the UI, though we don't get that far.
And a full LAVA boot log from my lab http://lava.streamtester.net/scheduler/job/138067
QEMU command line here: http://lava.streamtester.net/scheduler/job/138067#L75
And a LAVA job definition, which includes the url of the ramdisk and kernel: http://lava.streamtester.net/scheduler/job/138067/definition#defline76
On Wed, Aug 01, 2018 at 12:24:00PM +0100, Matt Hart wrote:
And a full LAVA boot log from my lab http://lava.streamtester.net/scheduler/job/138067
QEMU command line here: http://lava.streamtester.net/scheduler/job/138067#L75
And a LAVA job definition, which includes the url of the ramdisk and kernel: http://lava.streamtester.net/scheduler/job/138067/definition#defline76
Thanks for sharing!
I can reproduce this issue with the above script/initrd/kernel config, and it looks like the issue disappears once 'scsi_mod.use_blk_mq=0' is passed.
I don't see the issue with the zero-day ktest config.
It looks a bit weird, given that SCSI_MQ has nothing to do with the ramdisk.
Thanks, Ming
On Wed, Aug 01, 2018 at 07:52:21PM +0800, Ming Lei wrote:
Thanks for sharing!
I can reproduce this issue with the above script/initrd/kernel config, and it looks like the issue disappears once 'scsi_mod.use_blk_mq=0' is passed.
I don't see the issue with the zero-day ktest config.
It looks a bit weird, given that SCSI_MQ has nothing to do with the ramdisk.
Ahm and:
qemu [...] -append "console=ttyS0,115200 root=/dev/ram0 debug verbose"
$ grep CONFIG_BLK_DEV_RAM .config
# CONFIG_BLK_DEV_RAM is not set
Something is fishy here.
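The mismatch pointed out here (a `root=/dev/ram0` argument with `CONFIG_BLK_DEV_RAM` unset) is easy to check mechanically. The Python sketch below reproduces only that check, with hypothetical helper names; whether the mismatch actually matters for this boot is not established in the thread.

```python
def config_state(config_text: str, option: str) -> str:
    """Return the value of a Kconfig option, or "n" when it is unset or absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "n"
        # The trailing "=" keeps CONFIG_FOO from matching CONFIG_FOO_BAR.
        if line.startswith(f"{option}="):
            return line.split("=", 1)[1]
    return "n"


def root_is_ramdisk_without_driver(cmdline: str, config_text: str) -> bool:
    """Flag a root=/dev/ramN argument when CONFIG_BLK_DEV_RAM is not enabled."""
    wants_ram_root = any(arg.startswith("root=/dev/ram") for arg in cmdline.split())
    return wants_ram_root and config_state(config_text, "CONFIG_BLK_DEV_RAM") == "n"


# Command line and config lines as quoted in this thread.
cmdline = "console=ttyS0,115200 root=/dev/ram0 debug verbose"
config = "# CONFIG_BLK_DEV_RAM is not set\nCONFIG_SCSI_MQ_DEFAULT=y\n"
print(root_is_ramdisk_without_driver(cmdline, config))
```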
On Wed, Aug 01, 2018 at 07:52:19PM +0800, Ming Lei wrote:
Thanks for sharing!
I can reproduce this issue with the above script/initrd/kernel config, and it looks like the issue disappears once 'scsi_mod.use_blk_mq=0' is passed.
I don't see the issue with the zero-day ktest config.
It looks a bit weird, given that SCSI_MQ has nothing to do with the ramdisk.
Seems related to sr:
1) with scsi-mq
[ 2.808204] ata2.01: NODEV after polling detection
[ 2.809807] ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
[ 2.827377] scsi 1:0:0:0: CD-ROM QEMU QEMU DVD-ROM 2.5+ PQ: 0 ANSI: 5
2) without scsi_mq
[ 5.549135] ata2.01: NODEV after polling detection
[ 5.554404] ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
[ 5.596143] scsi 1:0:0:0: CD-ROM QEMU QEMU DVD-ROM 2.5+ PQ: 0 ANSI: 5
[ 5.637126] sr 1:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
[ 5.637870] cdrom: Uniform CD-ROM driver Revision: 3.20
[ 5.648940] sr 1:0:0:0: Attached scsi CD-ROM sr0
[ 5.661605] sr 1:0:0:0: Attached scsi generic sg0 type 5
We may need to take a look at recent SCSI changes.
Thanks, Ming
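The comparison made in this message, finding which messages appear in the passing log but never in the failing one, can be sketched in Python by stripping the printk timestamps and diffing the remaining text. The helper names are hypothetical and the two samples are the excerpts quoted above.

```python
import re

# Strips a leading printk timestamp such as "[ 5.637126] ".
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def messages(log_text: str) -> list:
    """Return log messages without timestamps, preserving order."""
    return [
        TIMESTAMP_RE.sub("", line.strip())
        for line in log_text.splitlines()
        if line.strip()
    ]


def missing_messages(failing_log: str, passing_log: str) -> list:
    """List messages present in the passing log but absent from the failing one."""
    seen = set(messages(failing_log))
    return [msg for msg in messages(passing_log) if msg not in seen]


with_mq = """\
[ 2.808204] ata2.01: NODEV after polling detection
[ 2.809807] ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
[ 2.827377] scsi 1:0:0:0: CD-ROM QEMU QEMU DVD-ROM 2.5+ PQ: 0 ANSI: 5
"""
without_mq = """\
[ 5.549135] ata2.01: NODEV after polling detection
[ 5.554404] ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
[ 5.596143] scsi 1:0:0:0: CD-ROM QEMU QEMU DVD-ROM 2.5+ PQ: 0 ANSI: 5
[ 5.637126] sr 1:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
[ 5.637870] cdrom: Uniform CD-ROM driver Revision: 3.20
[ 5.648940] sr 1:0:0:0: Attached scsi CD-ROM sr0
[ 5.661605] sr 1:0:0:0: Attached scsi generic sg0 type 5
"""
for msg in missing_messages(with_mq, without_mq):
    print(msg)
```

On these excerpts the only missing lines are the sr and cdrom attach messages, which is what points the finger at sr.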
On Wed, Aug 01, 2018 at 08:00:44PM +0800, Ming Lei wrote:
Seems related to sr:
- with scsi-mq
[ 2.808204] ata2.01: NODEV after polling detection
[ 2.809807] ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
[ 2.827377] scsi 1:0:0:0: CD-ROM QEMU QEMU DVD-ROM 2.5+ PQ: 0 ANSI: 5
- without scsi_mq
[ 5.549135] ata2.01: NODEV after polling detection
[ 5.554404] ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
[ 5.596143] scsi 1:0:0:0: CD-ROM QEMU QEMU DVD-ROM 2.5+ PQ: 0 ANSI: 5
[ 5.637126] sr 1:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
[ 5.637870] cdrom: Uniform CD-ROM driver Revision: 3.20
[ 5.648940] sr 1:0:0:0: Attached scsi CD-ROM sr0
[ 5.661605] sr 1:0:0:0: Attached scsi generic sg0 type 5
We may need to take a look at recent SCSI changes.
[ 2.052168] ata2.01: NODEV after polling detection
[ 2.053690] ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
[ 2.072269] scsi 1:0:0:0: CD-ROM QEMU QEMU DVD-ROM 2.5+ P5
[ 2.107220] sr 1:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
[ 2.107675] cdrom: Uniform CD-ROM driver Revision: 3.20
[ 2.111851] sr 1:0:0:0: Attached scsi CD-ROM sr0
[ 2.123560] sr 1:0:0:0: Attached scsi generic sg0 type 5
# cat /proc/cmdline
console=ttyS0,115200 root=/dev/ram0 debug verbose
$ grep SCSI_MQ .config
CONFIG_SCSI_MQ_DEFAULT=y
on Martin's latest 4.19/scsi-queue.
On Wed, Aug 01, 2018 at 02:06:11PM +0200, Johannes Thumshirn wrote:
[ 2.052168] ata2.01: NODEV after polling detection
[ 2.053690] ata2.00: ATAPI: QEMU DVD-ROM, 2.5+, max UDMA/100
[ 2.072269] scsi 1:0:0:0: CD-ROM QEMU QEMU DVD-ROM 2.5+ P5
[ 2.107220] sr 1:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
[ 2.107675] cdrom: Uniform CD-ROM driver Revision: 3.20
[ 2.111851] sr 1:0:0:0: Attached scsi CD-ROM sr0
[ 2.123560] sr 1:0:0:0: Attached scsi generic sg0 type 5
# cat /proc/cmdline
console=ttyS0,115200 root=/dev/ram0 debug verbose
$ grep SCSI_MQ .config
CONFIG_SCSI_MQ_DEFAULT=y
on Martin's latest 4.19/scsi-queue.
I just checked my daily test log; it looks like this issue was first reported on next-20180731.
Thanks, Ming
On Thu, Aug 02, 2018 at 07:15:30AM +0800, Ming Lei wrote:
I just checked my daily test log; it looks like this issue was first reported on next-20180731.
From the diff between next-20180727 and next-20180731 in drivers/scsi nothing really sticks out.
$ PAGER= git diff --stat next-20180730..next-20180731 drivers/scsi/
 drivers/scsi/3w-9xxx.c | 6 +++++-
 drivers/scsi/3w-sas.c | 3 +++
 drivers/scsi/3w-xxxx.c | 2 ++
 drivers/scsi/a100u2w.c | 4 ++--
 drivers/scsi/atp870u.c | 16 ++++++++--------
 drivers/scsi/ibmvscsi/ibmvscsi.c | 6 ++----
 drivers/scsi/libiscsi.c | 2 ++
 drivers/scsi/lpfc/lpfc_nvmet.c | 8 +++-----
 drivers/scsi/scsi_debug.c | 47 ++++++++++++++++++++++++++++++++---------------
 9 files changed, 59 insertions(+), 35 deletions(-)
but drivers/ata has seen some (power management) related changes:
$ PAGER= git log --oneline --no-merges next-20180730..next-20180731 drivers/ata/
11c291461b6e ata: libahci: Allow reconfigure of DEVSLP register
2dbb3ec29a6c ata: libahci: Correct setting of DEVSLP register
b1a9585cc396 ata: ahci: Enable DEVSLP by default on x86 with SLP_S0
a5ec5a7bfd1f ata: ahci: Support state with min power but Partial low power state
$ PAGER= git diff --stat next-20180730..next-20180731 drivers/ata/
 drivers/ata/ahci.c | 38 +++++++++++++++++++++++++++++++++-----
 drivers/ata/libahci.c | 25 ++++++++++++++++---------
 drivers/ata/libata-core.c | 1 +
 drivers/ata/libata-scsi.c | 1 +
 4 files changed, 51 insertions(+), 14 deletions(-)
I'll be looking into it.
On Wed, Aug 01, 2018 at 12:24:00PM +0100, Matt Hart wrote:
And a full LAVA boot log from my lab http://lava.streamtester.net/scheduler/job/138067
QEMU command line here: http://lava.streamtester.net/scheduler/job/138067#L75
And a LAVA job definition, which includes the url of the ramdisk and kernel: http://lava.streamtester.net/scheduler/job/138067/definition#defline76
I grabbed your qemu cmdline and kernel config and will try to reproduce it locally.
Johannes
On Wed, Aug 01, 2018 at 11:59:19AM +0100, Mark Brown wrote:
Boot logs and so on can be found here:
https://kernelci.org/boot/id/5b618c9f59b514931f96ba97/ https://kernelci.org/boot/id/5b618ca359b514904d96bac5/ https://kernelci.org/boot/id/5b618cbc59b51492e896baad/
(these are today's but the symptoms are the same.) The ramdisk is unfortunately not linked through the UI, though we don't get that far.
Can you give us a bit more information about your test setup, such as the QEMU command line? From your kernel config I see you're not using virtio (as Ming already suggested). What medium are you booting from?
Thanks, Johannes
On 01/08/18 12:25, Johannes Thumshirn wrote:
Can you give us a bit more information about your test setup? Like the QEMU command line, etc.? From your kernel config I see you're not using virtio (as Ming already suggested). What medium are you booting off?
Sure, sorry I should have put that in my first email.
Here's a couple of LAVA boot tests, one passing with the commit reverted and one failing with plain next-20180731:
https://lava.collabora.co.uk/scheduler/job/1215154 https://lava.collabora.co.uk/scheduler/job/1215173
The qemu command line can be found in the log, copying it here:
/usr/bin/qemu-system-x86_64 -cpu host -enable-kvm -nographic -net nic,model=virtio,macaddr=DE:AD:BE:EF:AE:1B -net user -m 1024 -monitor none -kernel /var/lib/lava/dispatcher/tmp/1215173/deployimages-3p80s2zk/kernel/bzImage-85eac382fa06 -append "console=ttyS0,115200 root=/dev/ram0 debug verbose" -initrd /var/lib/lava/dispatcher/tmp/1215173/deployimages-3p80s2zk/ramdisk/rootfs.cpio.gz
The kernel images I used have the git revision in their file name to make it clear where they came from, they were built with x86_64_defconfig. The links to the kernel and ramdisks can be found in the job definition:
https://lava.collabora.co.uk/scheduler/job/1215154/definition
Please let me know if you need any more details. I'm happy to run other boot tests on that same setup if that helps verifying things (such as enabling some VIRTIO configs...).
Best wishes, Guillaume
On Wed, Aug 01, 2018 at 01:37:17PM +0100, Guillaume Tucker wrote:
The kernel images I used have the git revision in their file name to make it clear where they came from, they were built with x86_64_defconfig. The links to the kernel and ramdisks can be found in the job definition:
https://lava.collabora.co.uk/scheduler/job/1215154/definition
Please let me know if you need any more details. I'm happy to run other boot tests on that same setup if that helps verifying things (such as enabling some VIRTIO configs...).
Yes, can you please enable CONFIG_BLK_DEV_RAM? See my other mails in this thread for details.
Thanks, Johannes
On 01/08/18 13:40, Johannes Thumshirn wrote:
On Wed, Aug 01, 2018 at 01:37:17PM +0100, Guillaume Tucker wrote:
The kernel images I used have the git revision in their file name to make it clear where they came from, they were built with x86_64_defconfig. The links to the kernel and ramdisks can be found in the job definition:
https://lava.collabora.co.uk/scheduler/job/1215154/definition
Please let me know if you need any more details. I'm happy to run other boot tests on that same setup if that helps verifying things (such as enabling some VIRTIO configs...).
Yes, can you please enable CONFIG_BLK_DEV_RAM? See my other mails in this thread for details.
Sure, although it didn't make any apparent difference, still on next-20180731:
https://lava.collabora.co.uk/scheduler/job/1215417
The .config file I used is available here, with just CONFIG_BLK_DEV_RAM=y on top of defconfig:
https://people.collabora.com/~gtucker/lava/boot/debug/bzImage-85eac382fa06-b...
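For anyone wanting to double-check that an option really made it into a built kernel, a minimal sketch of such a check follows. It assumes the usual kernel .config text format (`CONFIG_FOO=y`, or `# CONFIG_FOO is not set`); the helper name `config_value` and the sample text are made up for illustration and are not taken from the file linked above.

```python
def config_value(config_text, option):
    """Return the value of `option` in a kernel .config, or None if unset."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return None
        # The trailing "=" keeps CONFIG_BLK_DEV_RAM from matching
        # CONFIG_BLK_DEV_RAM_COUNT.
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


# Hypothetical excerpt, not the real config from this thread.
sample = """\
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
# CONFIG_VIRTIO_BLK is not set
"""

print(config_value(sample, "CONFIG_BLK_DEV_RAM"))
print(config_value(sample, "CONFIG_VIRTIO_BLK"))
```

An option that is absent from the file is treated the same as one that is explicitly "not set", which is enough for a quick sanity check before submitting a boot job.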
Best wishes, Guillaume
On Wed, Aug 01, 2018 at 02:50:40PM +0100, Guillaume Tucker wrote:
Sure, although it didn't make any apparent difference, still on next-20180731:
https://lava.collabora.co.uk/scheduler/job/1215417
The .config file I used is available here, with just CONFIG_BLK_DEV_RAM=y on top of defconfig:
https://people.collabora.com/~gtucker/lava/boot/debug/bzImage-85eac382fa06-blk-dev.config
OK, this is a deviation from what I see here (on mkp's 4.19/scsi-queue not next though).
On 31/07/18 16:14, kernelci.org bot wrote:
next/master boot: 179 boots: 11 failed, 167 passed with 1 offline (next-20180731)
Full Boot Summary: https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20180731/ Full Build Summary: https://kernelci.org/build/next/branch/master/kernel/next-20180731/
Tree: next Branch: master Git Describe: next-20180731 Git Commit: 85eac382fa06ac72adf891d04bf4d08fb09d80fa Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Tested: 66 unique boards, 26 SoC families, 21 builds out of 200
Boot Regressions Detected:
arm:
[...]
bcm2837-rpi-3-b:
lab-baylibre: failing since 1 day (last pass: next-20180727 - first fail: next-20180730)
I've run an automated bisection on kernelci.org which found this "bad" commit:
bb4b894addb09a069c072a0a032f644cc470d17f is the first bad commit
commit bb4b894addb09a069c072a0a032f644cc470d17f
Author: Srinivas Kandagatla srinivas.kandagatla@linaro.org
Date: Fri Jul 13 16:36:28 2018 +0100

ASoC: core: add support to card re-bind using component framework

This patch aims at achieving dynamic behaviour of audio card when the dependent components disappear and reappear.

With this patch the card is removed if any of the dependent component is removed and card is added back if the dependent component comes back. All this is done using component framework and matching based on component name.

Signed-off-by: Srinivas Kandagatla srinivas.kandagatla@linaro.org
Reviewed-by: Vinod Koul vkoul@kernel.org
Signed-off-by: Mark Brown broonie@kernel.org
The boot tests can all be found here for each step of the bisection with extra checks at the end:
http://lava.baylibre.com:10080/scheduler/alljobs?length=25&search=bisect...
I haven't investigated any further, just ran this boot bisection between next-20180731 and its common base with mainline/master. I'm not sure if this is at all related to the issue hit in next-20180724, probably not.
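As a rough illustration of what such a boot bisection does, here is a minimal sketch. It assumes a linear list of commits where every commit before the culprit boots and every commit from the culprit onwards fails; real `git bisect` works on a commit graph, and the commit names and the `boots` callback here are hypothetical stand-ins for actual LAVA boot jobs.

```python
def first_bad_commit(commits, boots):
    """Binary-search `commits` (oldest first) for the first one that fails to boot.

    Assumes commits[0] is known good and commits[-1] is known bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if boots(commits[mid]):
            good = mid  # culprit is later than mid
        else:
            bad = mid   # culprit is mid or earlier
    return commits[bad]


# Hypothetical history: "c5" introduces the boot failure.
history = [f"c{i}" for i in range(10)]
tested = []


def boots(commit):
    """Pretend boot test; records which commits were tried."""
    tested.append(commit)
    return int(commit[1:]) < 5


print(first_bad_commit(history, boots), "after testing", tested)
```

A bisection like this is only as good as its pass/fail test: if the test kernels are built differently from the regular boots, the commit it lands on may not explain the original regression.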
Hope this helps!
Best wishes, Guillaume
[...]
For more info write to info@kernelci.org
Kernel-build-reports mailing list Kernel-build-reports@lists.linaro.org https://lists.linaro.org/mailman/listinfo/kernel-build-reports
On 01/08/18 16:44, Guillaume Tucker wrote:
On 31/07/18 16:14, kernelci.org bot wrote:
next/master boot: 179 boots: 11 failed, 167 passed with 1 offline (next-20180731)
Full Boot Summary: https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20180731/ Full Build Summary: https://kernelci.org/build/next/branch/master/kernel/next-20180731/
Tree: next Branch: master Git Describe: next-20180731 Git Commit: 85eac382fa06ac72adf891d04bf4d08fb09d80fa Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Tested: 66 unique boards, 26 SoC families, 21 builds out of 200
Boot Regressions Detected:
arm:
[...]> bcm2837-rpi-3-b:
lab-baylibre: failing since 1 day (last pass: next-20180727 - first fail: next-20180730)
I've run an automated bisection on kernelci.org which found this "bad" commit:
bb4b894addb09a069c072a0a032f644cc470d17f is the first bad commit commit bb4b894addb09a069c072a0a032f644cc470d17f Author: Srinivas Kandagatla srinivas.kandagatla@linaro.org Date: Fri Jul 13 16:36:28 2018 +0100
ASoC: core: add support to card re-bind using component framework This patch aims at achieving dynamic behaviour of audio card when the dependent components disappear and reappear. With this patch the card is removed if any of the dependent component is removed and card is added back if the dependent component comes back. All this is done using component framework and matching based on component name. Signed-off-by: Srinivas Kandagatla srinivas.kandagatla@linaro.org Reviewed-by: Vinod Koul vkoul@kernel.org Signed-off-by: Mark Brown broonie@kernel.org
The boot tests can all be found here for each step of the bisection with extra checks at the end:
http://lava.baylibre.com:10080/scheduler/alljobs?length=25&search=bisect...
Sorry, this is the correct link for this bisection:
http://lava.baylibre.com:10080/scheduler/alljobs?length=25&search=bisect...
I haven't investigated any further, just ran this boot bisection between next-20180731 and its common base with mainline/master. I'm not sure if this is at all related to the issue hit in next-20180724, probably not.
Hope this helps!
Best wishes, Guillaume
[...]
For more info write to info@kernelci.org
On Wed, Aug 01, 2018 at 04:44:03PM +0100, Guillaume Tucker wrote:
On 31/07/18 16:14, kernelci.org bot wrote:
[...]> bcm2837-rpi-3-b:
lab-baylibre: failing since 1 day (last pass: next-20180727 - first fail: next-20180730)
I've run an automated bisection on kernelci.org which found this "bad" commit:
bb4b894addb09a069c072a0a032f644cc470d17f is the first bad commit commit bb4b894addb09a069c072a0a032f644cc470d17f Author: Srinivas Kandagatla srinivas.kandagatla@linaro.org Date: Fri Jul 13 16:36:28 2018 +0100
ASoC: core: add support to card re-bind using component framework This patch aims at achieving dynamic behaviour of audio card when the dependent components disappear and reappear.
(something in your mail client or workflow mangled the commit log here).
I'm somewhat suspicious of this result given that that commit has been in -next rather longer than one day and the board doesn't visibly have a sound card in mainline (there are some I2S controllers in there but no card to use them).
On 01/08/18 17:02, Mark Brown wrote:
On Wed, Aug 01, 2018 at 04:44:03PM +0100, Guillaume Tucker wrote:
On 31/07/18 16:14, kernelci.org bot wrote:
[...]> bcm2837-rpi-3-b:
lab-baylibre: failing since 1 day (last pass: next-20180727 - first fail: next-20180730)
I've run an automated bisection on kernelci.org which found this "bad" commit:
bb4b894addb09a069c072a0a032f644cc470d17f is the first bad commit commit bb4b894addb09a069c072a0a032f644cc470d17f Author: Srinivas Kandagatla srinivas.kandagatla@linaro.org Date: Fri Jul 13 16:36:28 2018 +0100
ASoC: core: add support to card re-bind using component framework This patch aims at achieving dynamic behaviour of audio card when the dependent components disappear and reappear.
(something in your mail client or workflow mangled the commit log here).
I'm somewhat suspicious of this result given that that commit has been in -next rather longer than one day and the board doesn't visibly have a sound card in mainline (there are some I2S controllers in there but no card to use them).
Indeed, this bisection was run with CONFIG_MODULES disabled which may have caused this. So although it's a bit surprising that it failed to boot with modules disabled, please discard the bisection result as far as regular boot tests are concerned.
Best wishes, Guillaume
Hi,
Mark Brown broonie@kernel.org hat am 1. August 2018 um 18:02 geschrieben:
On Wed, Aug 01, 2018 at 04:44:03PM +0100, Guillaume Tucker wrote:
On 31/07/18 16:14, kernelci.org bot wrote:
[...]> bcm2837-rpi-3-b:
lab-baylibre: failing since 1 day (last pass: next-20180727 - first fail: next-20180730)
I've run an automated bisection on kernelci.org which found this "bad" commit:
bb4b894addb09a069c072a0a032f644cc470d17f is the first bad commit commit bb4b894addb09a069c072a0a032f644cc470d17f Author: Srinivas Kandagatla srinivas.kandagatla@linaro.org Date: Fri Jul 13 16:36:28 2018 +0100
ASoC: core: add support to card re-bind using component framework This patch aims at achieving dynamic behaviour of audio card when the dependent components disappear and reappear.
i can confirm this after a manual bisect.
(something in your mail client or workflow mangled the commit log here).
I'm somewhat suspicious of this result given that that commit has been in -next rather longer than one day
The boot issue exists in sound/for-next since 2018-07-18 [1], which is shortly after this commit.
and the board doesn't visibly have a sound card in mainline (there are some I2S controllers in there but no card to use them).
The BCM2835 has HDMI Audio from vc4_hdmi. After removing the dma-channel "audio-rx" from the DTS the RPi boots properly even with the patch above.
Any ideas to investigate the issue further?
Best regards Stefan
Hi Stefan,
On 01/08/18 20:11, Stefan Wahren wrote:
bb4b894addb09a069c072a0a032f644cc470d17f is the first bad commit commit bb4b894addb09a069c072a0a032f644cc470d17f Author: Srinivas Kandagatla srinivas.kandagatla@linaro.org Date: Fri Jul 13 16:36:28 2018 +0100 ASoC: core: add support to card re-bind using component framework This patch aims at achieving dynamic behaviour of audio card when the dependent components disappear and reappear.
i can confirm this after a manual bisect.
(something in your mail client or workflow mangled the commit log here).
I'm somewhat suspicious of this result given that that commit has been in -next rather longer than one day
The boot issue exists in sound/for-next since 2018-07-18 [1], which is shortly after this commit.
and the board doesn't visibly have a sound card in mainline (there are some I2S controllers in there but no card to use them).
The BCM2835 has HDMI Audio from vc4_hdmi. After removing the dma-channel "audio-rx" from the DTS the RPi boots properly even with the patch above.
Any ideas to investigate the issue further?
When it does not boot, does it hang/deadlock or do we have a crash report?
It is not obvious from the bootlog from https://storage.kernelci.org/next/master/next-20180731/arm/bcm2835_defconfig...
--srini
Best regards Stefan
Hi Srinivas,
Am 02.08.2018 um 09:28 schrieb Srinivas Kandagatla:
Hi Stefan,
On 01/08/18 20:11, Stefan Wahren wrote:
bb4b894addb09a069c072a0a032f644cc470d17f is the first bad commit commit bb4b894addb09a069c072a0a032f644cc470d17f Author: Srinivas Kandagatla srinivas.kandagatla@linaro.org Date: Fri Jul 13 16:36:28 2018 +0100 ASoC: core: add support to card re-bind using component framework This patch aims at achieving dynamic behaviour of audio card when the dependent components disappear and reappear.
i can confirm this after a manual bisect.
(something in your mail client or workflow mangled the commit log here).
I'm somewhat suspicious of this result given that that commit has been in -next rather longer than one day
The boot issue exists in sound/for-next since 2018-07-18 [1], which is shortly after this commit.
and the board doesn't visibly have a sound card in mainline (there are some I2S controllers in there but no card to use them).
The BCM2835 has HDMI Audio from vc4_hdmi. After removing the dma-channel "audio-rx" from the DTS the RPi boots properly even with the patch above.
Any ideas to investigate the issue further?
When it does not boot, does it hang/deadlock or do we have a crash report?
It hangs (deadlocks) endlessly. I didn't see any crash.
It is not obvious from the bootlog from https://storage.kernelci.org/next/master/next-20180731/arm/bcm2835_defconfig...
--srini
Best regards Stefan
Hi Stefan/Mark,
On 02/08/18 08:44, Stefan Wahren wrote:
Hi Srinivas,
Am 02.08.2018 um 09:28 schrieb Srinivas Kandagatla:
Hi Stefan,
On 01/08/18 20:11, Stefan Wahren wrote:
bb4b894addb09a069c072a0a032f644cc470d17f is the first bad commit commit bb4b894addb09a069c072a0a032f644cc470d17f Author: Srinivas Kandagatla srinivas.kandagatla@linaro.org Date: Fri Jul 13 16:36:28 2018 +0100 ASoC: core: add support to card re-bind using component framework This patch aims at achieving dynamic behaviour of audio card when the dependent components disappear and reappear.
i can confirm this after a manual bisect.
(something in your mail client or workflow mangled the commit log here).
I'm somewhat suspicious of this result given that that commit has been in -next rather longer than one day
The boot issue exists in sound/for-next since 2018-07-18 [1], which is shortly after this commit.
and the board doesn't visibly have a sound card in mainline (there are some I2S controllers in there but no card to use them).
The BCM2835 has HDMI Audio from vc4_hdmi. After removing the dma-channel "audio-rx" from the DTS the RPi boots properly even with the patch above.
Any ideas to investigate the issue further?
When it does not boot, does it hang/deadlock or do we have a crash report?
It hangs (deadlocks) endlessly. I didn't see any crash.
I can confirm that this is a deadlock caused by this patch. It happens because of a big mutex in the component framework, which is held whenever a component is added or removed, or a bind/unbind callback is invoked. It looks like the vc4_hdmi path is invoked as part of the component framework's component_bind_all() callback, which then, while still holding this big lock, calls devm_snd_soc_register_component(). That in turn calls component_add(), which tries to take the same mutex, and that is where it deadlocks.
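The locking pattern described above can be modelled in a few lines. This is a user-space sketch only, not kernel code: `component_lock`, `component_add`, `bind_callback` and `component_bind_all` below are simplified Python stand-ins named after the kernel functions, and a non-blocking acquire is used so that the script reports the re-entry instead of hanging the way the board does.

```python
import threading

# Stand-in for the single mutex guarding the component framework.
# threading.Lock is not re-entrant, like a kernel mutex.
component_lock = threading.Lock()


def component_add(name):
    """Register a component; needs the framework lock."""
    # A plain acquire() here would block forever if the caller already
    # holds the lock; blocking=False lets the sketch report it instead.
    if not component_lock.acquire(blocking=False):
        return f"deadlock: {name} needs a lock that is already held"
    try:
        return f"added {name}"
    finally:
        component_lock.release()


def bind_callback():
    """Models a driver bind that registers a sound component."""
    return component_add("hdmi-audio")


def component_bind_all():
    """Invoke bind callbacks with the framework lock held."""
    with component_lock:
        return bind_callback()


print(component_add("standalone"))
print(component_bind_all())
```

Called on its own, the registration succeeds; called from inside the bind path it finds the lock already taken by its own caller, which with a blocking acquire would never return.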
Mark, I am not sure what the best way forward is, but I would suggest removing this offending patch. I will rework it in the meantime, or see whether the component framework can be made more flexible to handle such use cases.
Sorry for not testing all the possible use cases, due to limited hardware!
Thanks, srini
It is not obvious from the bootlog from https://storage.kernelci.org/next/master/next-20180731/arm/bcm2835_defconfig...
--srini
Best regards Stefan
On Thu, Aug 02, 2018 at 03:26:40PM +0100, Srinivas Kandagatla wrote:
Mark, I am not sure what the best way forward is, but I would suggest removing this offending patch. I will rework it in the meantime, or see whether the component framework can be made more flexible to handle such use cases.
Yup, probably best to revert as we don't immediately have a fix (I'm drawing a blank too). Can you send me a patch for the revert with a description of the problem please? If not I'll pull one together but it might be tomorrow before I get to it.
On Wed, Aug 01, 2018 at 09:11:40PM +0200, Stefan Wahren wrote:
Mark Brown broonie@kernel.org hat am 1. August 2018 um 18:02 geschrieben:
(something in your mail client or workflow mangled the commit log here).
I'm somewhat suspicious of this result given that that commit has been in -next rather longer than one day
The boot issue exists in sound/for-next since 2018-07-18 [1], which is shortly after this commit.
So did something change in mainline that caused this to trigger? I note that there have been other DMA issues with these platforms and MMCI, so I'm wondering if there's something going on with the DMA here.
and the board doesn't visibly have a sound card in mainline (there are some I2S controllers in there but no card to use them).
The BCM2835 has HDMI Audio from vc4_hdmi. After removing the dma-channel "audio-rx" from the DTS the RPi boots properly even with the patch above.
Any ideas to investigate the issue further?
Testing with things like the lockup detector enabled might show where it's stalled out.
On 01/08/18 16:44, Guillaume Tucker wrote:
On 31/07/18 16:14, kernelci.org bot wrote:
next/master boot: 179 boots: 11 failed, 167 passed with 1 offline (next-20180731)
Full Boot Summary: https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20180731/
Full Build Summary: https://kernelci.org/build/next/branch/master/kernel/next-20180731/
Tree: next Branch: master Git Describe: next-20180731 Git Commit: 85eac382fa06ac72adf891d04bf4d08fb09d80fa Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git Tested: 66 unique boards, 26 SoC families, 21 builds out of 200
Boot Regressions Detected:
arm:
[...]
bcm2837-rpi-3-b:
lab-baylibre: failing since 1 day (last pass: next-20180727 - first fail: next-20180730)
Looking at the bootlog
https://storage.kernelci.org/next/master/next-20180731/arm/bcm2835_defconfig...
does not obviously show any regression, except for one warning!
Are you sure that bcm2837-rpi-3-b has a regression?
I've run an automated bisection on kernelci.org which found this "bad" commit:
bb4b894addb09a069c072a0a032f644cc470d17f is the first bad commit commit bb4b894addb09a069c072a0a032f644cc470d17f
This patch has been in next for more than 10-14 days.
--srini
Author: Srinivas Kandagatla srinivas.kandagatla@linaro.org Date: Fri Jul 13 16:36:28 2018 +0100
ASoC: core: add support to card re-bind using component framework This patch aims at achieving dynamic behaviour of audio card when