next/master boot: 254 boots: 16 failed, 231 passed with 4 offline, 1 untried/unknown, 2 conflicts (next-20190726)
Full Boot Summary: https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20190726/
Full Build Summary: https://kernelci.org/build/next/branch/master/kernel/next-20190726/

Tree: next
Branch: master
Git Describe: next-20190726
Git Commit: fde50b96be821ac9673a7e00847cc4605bd88f34
Git URL: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Tested: 81 unique boards, 27 SoC families, 21 builds out of 230
Boot Failures Detected:

arm:
    qcom_defconfig:
        gcc-8:
            qcom-apq8064-cm-qs600: 1 failed lab
            qcom-apq8064-ifc6410: 1 failed lab

    oxnas_v6_defconfig:
        gcc-8:
            ox820-cloudengines-pogoplug-series-3: 1 failed lab

    multi_v7_defconfig+CONFIG_CPU_BIG_ENDIAN=y:
        gcc-8:
            armada-xp-openblocks-ax3-4: 1 failed lab

arm64:
    defconfig+CONFIG_CPU_BIG_ENDIAN=y:
        gcc-8:
            meson-gxm-khadas-vim2: 1 failed lab

    defconfig:
        gcc-8:
            apq8096-db820c: 1 failed lab
            meson-gxm-khadas-vim2: 1 failed lab
            meson-gxm-nexbox-a1: 1 failed lab
            rk3399-firefly: 1 failed lab

    defconfig+CONFIG_RANDOMIZE_BASE=y:
        gcc-8:
            meson-gxl-s905x-nexbox-a95x: 1 failed lab
Offline Platforms:

arm64:
    defconfig+CONFIG_CPU_BIG_ENDIAN=y:
        gcc-8
            meson-gxbb-odroidc2: 1 offline lab

    defconfig:
        gcc-8
            meson-gxbb-odroidc2: 1 offline lab
            meson-gxl-s905x-nexbox-a95x: 1 offline lab

    defconfig+CONFIG_RANDOMIZE_BASE=y:
        gcc-8
            meson-gxbb-odroidc2: 1 offline lab
Conflicting Boot Failures Detected: (These likely are not failures as other labs are reporting PASS. Needs review.)

arm:
    multi_v7_defconfig+CONFIG_SMP=n:
        am57xx-beagle-x15:
            lab-linaro-lkft: FAIL (gcc-8)
            lab-drue: PASS (gcc-8)

    multi_v7_defconfig:
        am57xx-beagle-x15:
            lab-linaro-lkft: FAIL (gcc-8)
            lab-drue: PASS (gcc-8)
---
For more info write to info@kernelci.org
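As a quick sanity check on the subject line, the per-status tallies can be parsed and summed against the reported total. This is only a minimal sketch: the regular expressions and the helper name `parse_boot_summary` are illustrative assumptions, not part of any KernelCI tooling.

```python
import re

# Subject line copied from the report above.
SUBJECT = ("next/master boot: 254 boots: 16 failed, 231 passed with 4 offline, "
           "1 untried/unknown, 2 conflicts (next-20190726)")

def parse_boot_summary(subject):
    """Return (total, counts) parsed from a KernelCI-style boot subject line.

    Hypothetical helper: the status names are taken from this one report.
    """
    total = int(re.search(r"(\d+) boots:", subject).group(1))
    pairs = re.findall(
        r"(\d+) (failed|passed|offline|untried/unknown|conflicts)", subject)
    counts = {status: int(n) for n, status in pairs}
    return total, counts

total, counts = parse_boot_summary(SUBJECT)
# For this report the five tallies add up to the reported total of 254.
print(total, counts, sum(counts.values()) == total)
```

For this report the five categories account for every boot, so the conflicting results are counted separately from the plain failures.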
On Fri, Jul 26, 2019 at 05:18:01AM -0700, kernelci.org bot wrote:
The past few versions of -next failed to boot on apq8096-db820c:
defconfig: gcc-8: apq8096-db820c: 1 failed lab
with an RCU stall towards the end of boot:
00:03:40.521336 [ 18.487538] qcom_q6v5_pas adsp-pil: adsp-pil supply px not found, using dummy regulator
00:04:01.523104 [ 39.499613] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
00:04:01.533371 [ 39.499657] rcu: 2-...!: (0 ticks this GP) idle=9ca/1/0x4000000000000000 softirq=1450/1450 fqs=50
00:04:01.537544 [ 39.504689] (detected by 0, t=5252 jiffies, g=2425, q=619)
00:04:01.541727 [ 39.513539] Task dump for CPU 2:
00:04:01.547929 [ 39.519096] seq R running task 0 199 198 0x00000000
Full details and logs at:
https://kernelci.org/boot/id/5d3aa7ea59b5142ba868890f/
The last version that worked was from the 15th and there seem to be similar issues in mainline since -rc1.
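The stall report above can be picked apart mechanically when triaging many boot logs. The following is a rough sketch, not an established tool: the regular expressions and the helper name `parse_rcu_stall` are made up for illustration, and the tick rate of 250 Hz is an assumption, since the log does not state CONFIG_HZ. Under that assumption, 5252 jiffies is about 21 seconds, which lines up with the gap between the 18.49 and 39.50 kernel timestamps.

```python
import re

# RCU stall lines copied from the boot log quoted above.
LOG = """\
[ 39.499613] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 39.499657] rcu: 2-...!: (0 ticks this GP) idle=9ca/1/0x4000000000000000 softirq=1450/1450 fqs=50
[ 39.504689] (detected by 0, t=5252 jiffies, g=2425, q=619)
"""

ASSUMED_HZ = 250  # assumption: the kernel tick rate is not stated in the log

def parse_rcu_stall(log, hz=ASSUMED_HZ):
    """Extract stalled CPUs, the detecting CPU and the stall age from a log."""
    # Per-CPU lines look like "rcu: 2-...!: (...)": a CPU number, then flags.
    stalled = [int(cpu) for cpu in re.findall(r"rcu:\s+(\d+)-\S+:", log)]
    m = re.search(r"detected by (\d+), t=(\d+) jiffies", log)
    detector, jiffies = int(m.group(1)), int(m.group(2))
    return {
        "stalled_cpus": stalled,
        "detected_by": detector,
        "jiffies": jiffies,
        "seconds": jiffies / hz,  # only meaningful if ASSUMED_HZ is right
    }

info = parse_rcu_stall(LOG)
print(info)
```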
On Fri 26 Jul 06:48 PDT 2019, Mark Brown wrote:
The last version that worked was from the 15th and there seem to be similar issues in mainline since -rc1.
Thanks for the report, Mark. AFAICT the problem showed up in v5.3-rc1 as well.
I think the problem is that the regulator supplying the GPU power domain(s) isn't enabled, and I think there's a lack of agreement on how this should be controlled.
But we have a partial fix for this floating around, I will give it a spin.
Regards, Bjorn
On Fri 26 Jul 06:48 PDT 2019, Mark Brown wrote:
The last version that worked was from the 15th and there seem to be similar issues in mainline since -rc1.
As you might have seen, this problem has come and gone on the apq8096-db820c, and I've finally managed to narrow it down a little bit.
The problem first appears on next-20190701, with the introduction of CONFIG_RANDOMIZE_BASE in the defconfig, but after further efforts I've concluded that disabling kpti removes or hides the problem.
With kpti=no on the command line I've now successfully booted the db820c 100+ times without problems (a clear improvement from the 75% failure rate with kpti=yes).
Unfortunately I'm not yet certain why this is causing issues, and I'm also seeing the same RCU stall on SDA845 under certain (erroneous?) conditions where I don't expect it.
Regards, Bjorn
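To put the 100+ clean boots in context: if kpti=no had no effect and each boot still failed with probability 0.75, a run of 100 consecutive passes would be vanishingly unlikely. The sketch below works through that arithmetic. It assumes boots are independent trials, which the thread does not establish, and the helper names are illustrative only.

```python
import math

def prob_all_pass(failure_rate, boots):
    """Probability that `boots` independent boots all pass."""
    return (1.0 - failure_rate) ** boots

def boots_needed(failure_rate, significance=0.05):
    """Smallest run of clean boots whose chance under the old failure
    rate drops below `significance` (independence assumed)."""
    return math.ceil(math.log(significance) / math.log(1.0 - failure_rate))

p = prob_all_pass(0.75, 100)  # 0.25 ** 100, roughly 6e-61
n = boots_needed(0.75)        # 3 clean boots already fall below 5%
print(f"{p:.3e}", n)
```

Under these assumptions even a handful of clean boots is strong evidence that the failure rate changed, so the 100+ run with kpti=no is far more than chance would explain.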
kernel-build-reports@lists.linaro.org