next/master boot: 285 boots: 16 failed, 264 passed with 3 offline, 1 untried/unknown, 1 conflict (next-20190718)
Full Boot Summary: https://kernelci.org/boot/all/job/next/branch/master/kernel/next-20190718/
Full Build Summary: https://kernelci.org/build/next/branch/master/kernel/next-20190718/

Tree: next
Branch: master
Git Describe: next-20190718
Git Commit: 6d21a41b7b1f46d5d5c3ddc26b55c5c4a6a826b9
Git URL: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Tested: 90 unique boards, 27 SoC families, 22 builds out of 229
Boot Regressions Detected:
arm64:
    defconfig:
        gcc-8:
            sun50i-a64-bananapi-m64:
                lab-clabbe: new failure (last pass: next-20190717)
            sun50i-h6-orangepi-one-plus:
                lab-clabbe: new failure (last pass: next-20190717)
Boot Failures Detected:
arm:
    qcom_defconfig:
        gcc-8:
            qcom-apq8064-cm-qs600: 1 failed lab
            qcom-apq8064-ifc6410: 1 failed lab

    multi_v7_defconfig+CONFIG_EFI=y+CONFIG_ARM_LPAE=y:
        gcc-8:
            exynos4412-odroidx2: 1 failed lab

    oxnas_v6_defconfig:
        gcc-8:
            ox820-cloudengines-pogoplug-series-3: 1 failed lab

    multi_v7_defconfig+CONFIG_CPU_BIG_ENDIAN=y:
        gcc-8:
            armada-xp-openblocks-ax3-4: 1 failed lab

arm64:
    defconfig+CONFIG_CPU_BIG_ENDIAN=y:
        clang-8:
            meson-gxbb-nanopi-k2: 1 failed lab
            meson-gxl-s905x-khadas-vim: 1 failed lab
            meson-gxl-s905x-libretech-cc: 1 failed lab
            meson-gxm-khadas-vim2: 1 failed lab
            r8a7795-salvator-x: 1 failed lab

    defconfig:
        gcc-8:
            meson-gxm-khadas-vim2: 1 failed lab
            rk3399-firefly: 1 failed lab
            sun50i-a64-bananapi-m64: 1 failed lab
            sun50i-h6-orangepi-one-plus: 1 failed lab

    defconfig+CONFIG_RANDOMIZE_BASE=y:
        gcc-8:
            meson-gxl-s905x-nexbox-a95x: 1 failed lab
Offline Platforms:
arm64:
    defconfig+CONFIG_CPU_BIG_ENDIAN=y:
        gcc-8:
            meson-gxbb-odroidc2: 1 offline lab

    defconfig:
        gcc-8:
            meson-gxbb-odroidc2: 1 offline lab

    defconfig+CONFIG_RANDOMIZE_BASE=y:
        gcc-8:
            meson-gxbb-odroidc2: 1 offline lab
Conflicting Boot Failure Detected: (These likely are not failures as other labs are reporting PASS. Needs review.)

arm:
    multi_v7_defconfig+CONFIG_SMP=n:
        am57xx-beagle-x15:
            lab-linaro-lkft: FAIL (gcc-8)
            lab-drue: PASS (gcc-8)
---
For more info write to info@kernelci.org
On Thu, Jul 18, 2019 at 04:28:08AM -0700, kernelci.org bot wrote:
Today's -next started failing to boot defconfig on rk3399-firefly:
arm64:
defconfig: gcc-8: rk3399-firefly: 1 failed lab
It hits a BUG() trying to set up cpufreq:
[ 87.381606] cpufreq: cpufreq_online: CPU0: Running at unlisted freq: 200000 KHz
[ 87.393244] cpufreq: cpufreq_online: CPU0: Unlisted initial frequency changed to: 408000 KHz
[ 87.469777] cpufreq: cpufreq_online: CPU4: Running at unlisted freq: 12000 KHz
[ 87.488595] cpu cpu4: _generic_set_opp_clk_only: failed to set clock rate: -22
[ 87.491881] cpufreq: __target_index: Failed to change cpu frequency: -22
[ 87.495335] ------------[ cut here ]------------
[ 87.496821] kernel BUG at drivers/cpufreq/cpufreq.c:1438!
[ 87.498462] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
I'm struggling to see anything relevant in the diff from yesterday. The unlisted frequency warnings were in yesterday's logs too, but there was no oops, and I'm not seeing any changes in cpufreq, clk or anything else that looks relevant.
Full bootlog and other info can be found here:

https://kernelci.org/boot/id/5d302d8359b51498d049e983/
On Thu, Jul 18, 2019 at 05:20:05PM +0100, Mark Brown wrote:
This is still present in -next today. We no longer see the failure to change frequency, but boot still fails right after cpufreq:
https://kernelci.org/boot/id/5d51784259b514a021f12245/
https://kernelci.org/boot/id/5d51781559b514a007f12241/
Mark Brown broonie@kernel.org writes:
I confirm that disabling CPUfreq in the defconfig (CONFIG_CPU_FREQ=n) makes the firefly board start working again.
Note that the default defconfig enables the "performance" CPUfreq governor as the default governor, so during kernel boot, it will always switch to the max frequency.
For fun, I set the default governor to "userspace" so the kernel wouldn't make any OPP changes, and that leads to a slightly more informative splat[1].
There is still an OPP change happening because the detected OPP is not one that's listed in the table, so it tries to change to a listed OPP and fails in the bowels of clk_set_rate().
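As a rough illustration of that path, here is a minimal C model (a sketch, not the kernel code) of the initial-frequency check in cpufreq_online() as I understand it: if the current rate is not in the frequency table, cpufreq requests the lowest listed frequency at or above cur - 1 (CPUFREQ_RELATION_L), and a failure there is fatal via BUG_ON(). The names pick_relation_l(), initial_freq_check(), rate_ok() and rate_fails() are hypothetical, and any table you feed it is illustrative apart from the 408000 kHz entry visible in the log.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's clock-rate change. */
typedef int (*set_rate_fn)(unsigned int khz);

/* Approximates CPUFREQ_RELATION_L on an ascending table (kHz): the lowest
 * listed frequency at or above target, else the highest entry. */
unsigned int pick_relation_l(const unsigned int *table, size_t n,
                             unsigned int target)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        if (table[i] >= target)
            return table[i];
    return table[n - 1];
}

/* Models the "Running at unlisted freq" fix-up. Returns 0 when *cur is
 * listed or was moved to a listed frequency; otherwise returns the error
 * that the real cpufreq_online() hands to BUG_ON(). */
int initial_freq_check(const unsigned int *table, size_t n,
                       unsigned int *cur, set_rate_fn set_rate)
{
    for (size_t i = 0; i < n; i++)
        if (table[i] == *cur)
            return 0;                   /* already a listed OPP */

    unsigned int next = pick_relation_l(table, n, *cur - 1);
    int ret = set_rate(next);
    if (ret)
        return ret;                     /* kernel: BUG_ON(ret) */
    *cur = next;
    return 0;
}

/* Stub callbacks: one succeeds, one fails with -EINVAL (-22) like the log. */
int rate_ok(unsigned int khz) { (void)khz; return 0; }
int rate_fails(unsigned int khz) { (void)khz; return -EINVAL; }
```

With a table starting at 408000 kHz, an unlisted 200000 kHz resolves to 408000 kHz, matching the CPU0 lines above; a set-rate callback returning -22 is the case that ends in the BUG on CPU4.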
Kevin
[ resent with correct addr for linux-rockchip list ]
Hi,
On Tuesday, 13 August 2019, 19:35:31 CEST, Kevin Hilman wrote:
There is still an OPP change happening because the detected OPP is not one that's listed in the table, so it tries to change to a listed OPP and fails in the bowels of clk_set_rate()
Though I think that might only be a symptom as well. Both the PLL setting code and the actual cpu-clock implementation have been unchanged since 2017 (and run just fine on all boards in my farm).
A common source of these issues is the regulator supplying the CPU going haywire, i.e. the voltage not matching the OPP.
As it's CPU4 being set in this error case, it might be the big cluster, supplied by the external syr825 (fan53555 clone), that is acting up. In the Firefly-rk3399 case this is even stranger.
Different bootloader versions disagree on the "fcs,suspend-voltage-selector" (i.e. how the selection pin is set up), so the kernel might actually write its requested voltage to the wrong register: not the one for the actual voltage, but the second set used for the suspend voltage.
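To make the register mix-up concrete, here is a simplified C model (an assumption-laden sketch, not the fan53555 driver): the regulator has two voltage registers, VSEL0 and VSEL1, the VSEL pin picks which one drives the output, and the kernel writes the runtime voltage into the register not named by "fcs,suspend-voltage-selector". struct buck, runtime_reg(), set_runtime_voltage() and output_voltage() are hypothetical names, and any voltages used with it are made up.

```c
#include <assert.h>

/* Simplified two-register buck (FAN53555-style); values in microvolts. */
struct buck {
    unsigned int vsel_reg[2];   /* contents of VSEL0 and VSEL1 */
    int vsel_pin;               /* register currently selected by the VSEL pin */
};

/* Runtime voltage goes to the register NOT named by
 * fcs,suspend-voltage-selector (0 -> VSEL1, 1 -> VSEL0). */
int runtime_reg(int suspend_selector)
{
    assert(suspend_selector == 0 || suspend_selector == 1);
    return 1 - suspend_selector;
}

void set_runtime_voltage(struct buck *b, int suspend_selector, unsigned int uv)
{
    b->vsel_reg[runtime_reg(suspend_selector)] = uv;
}

/* The output follows whichever register the pin selects. */
unsigned int output_voltage(const struct buck *b)
{
    return b->vsel_reg[b->vsel_pin];
}
```

If the pin selects the register the kernel treats as the suspend one, the requested voltage never reaches the output, which is the kind of voltage/OPP mismatch described above.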
Did you by chance swap bootloaders at some point in the recent past?
I'd assume [2] might actually have been the same issue last year, though the CI logs no longer seem to be available.
Could you try setting the vdd_cpu_b regulator to disabled, so that cpufreq for this cluster defers, and see what happens?
I don't have a Firefly in my board farm, so I ran 5.3-rc on a Theobroma Puma, which has the same regulator setup as the Firefly, and everything, including the performance governor, ran nicely. So it really looks like some sort of Firefly-specific issue.
Heiko
Hi Heiko,
Heiko Stuebner heiko@sntech.de writes:
Did you by chance swap bootloaders at some point in the recent past?
No, I haven't touched the bootloader since I initially set up the board.
Could you try setting the vdd_cpu_b regulator to disabled, so that cpufreq for this cluster defers, and see what happens?
Yes, this change[1] definitely makes things boot reliably again, so something is clearly a bit unstable with this regulator, at least on this Firefly.
Kevin
[1]
diff --git a/arch/arm64/boot/dts/rockchip/rk3399-firefly.dts b/arch/arm64/boot/dts/rockchip/rk3399-firefly.dts
index c706db0ee9ec..6b70bdcc3328 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399-firefly.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3399-firefly.dts
@@ -454,6 +454,7 @@
 	vdd_cpu_b: regulator@40 {
 		compatible = "silergy,syr827";
+		status = "disabled";
 		reg = <0x40>;
 		fcs,suspend-voltage-selector = <0>;
 		regulator-name = "vdd_cpu_b";
Hi Kevin, Heiko,
On 2019/8/22 2:59 AM, Kevin Hilman wrote:
Did you by chance swap bootloaders at some point in the recent past?
No, I haven't touched the bootloader since I initially set up the board.
The CPU voltage should not be affected by the bootloader, since the kernel has its own OPP table;
the bootloader may only affect the center/logic power supply.
Yes, this change[1] definitely makes things boot reliably again, so something is clearly a bit unstable with this regulator, at least on this Firefly.
Is it possible to pin down which patch introduced this bug? This board has been working correctly
with upstream source code for a long time.
Thanks,
- Kever
Kever Yang kever.yang@rock-chips.com writes:
Is it possible to pin down which patch introduced this bug? This board has been working correctly with upstream source code for a long time.
Unfortunately, it seems to be a regular but intermittent failure, so bisection isn't producing anything reliable.
You can see that both in mainline[1] and in linux-next[2] there are periodic failures, but it's hard to see any patterns.
I'm starting to think that maybe the regulator on my particular board is just starting to fail, since disabling the regulator for that cluster prevents any voltage changes and makes things reliable again.
If we don't find a solution to this, I'll probably just have to retire this board from my kernelCI lab (of course, I'd be happy to replace it if someone wants to donate another one.) :)
Kevin
[1] https://kernelci.org/boot/rk3399-firefly/job/mainline/ [2] https://kernelci.org/boot/rk3399-firefly/job/next/
Kevin Hilman khilman@baylibre.com writes:
Unfortunately, it seems to be a regular, but intermittent failure, so bisection is not producing anything reliable.
You can see that both in mainline[1] and in linux-next[2] there are periodic failures, but it's hard to see any patterns.
Even worse, I (re)tested mainline for versions that were previously passing (v5.2, v5.3-rc5) and they are also failing now.
They work again if I disable that regulator as suggested by Heiko.
So this is increasingly pointing to failing hardware.
Kevin
Hi Kevin,
I'd like to test this on my board. I can get the Image and dtb from the job's link,
but how can I get the ramdisk named initrd-SDbyy2.cpio.gz?
Thanks,
- Kever
On 2019/8/24 1:03 AM, Kevin Hilman wrote:
Even worse, I (re)tested mainline for versions that were previously passing (v5.2, v5.3-rc5) and they are also failing now.
They work again if I disable that regulator as suggested by Heiko.
So this is increasingly pointing to failing hardware.
Hi Kever,
Kever Yang kever.yang@rock-chips.com writes:
Hi Kevin,
I'd like to test this on my board. I can get the Image and dtb from the job's link,
but how can I get the ramdisk named initrd-SDbyy2.cpio.gz?
The ramdisk images are here:
https://storage.kernelci.org/images/rootfs/buildroot/kci-2019.02/arm64/base/
In the kernelCI logs, the ramdisk is slightly modified because the kernel modules have been inserted into the cpio archive.
However, for the purposes of this test, you can just test with the unmodified rootfs.cpio.gz above.
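To check what actually differs between the stock rootfs.cpio.gz and the ramdisk used in a job, one option is to list the archive entries. The kernel expects initramfs images in the cpio "newc" format: a 110-byte ASCII header starting with the magic 070701, then the NUL-terminated name and the file data, each padded to a 4-byte boundary, ending with a TRAILER!!! entry. A minimal reader sketch with hypothetical helper names, not a kernelCI tool; it stops at the first trailer, so a second archive appended after it is not listed.

```python
import gzip

NEWC_MAGIC = b"070701"
HEADER_LEN = 110  # 6-byte magic + 13 fields of 8 hex digits


def _align4(offset: int) -> int:
    """Round an offset up to the next multiple of 4."""
    return (offset + 3) & ~3


def list_newc(data: bytes) -> list:
    """Return entry names from an uncompressed newc cpio archive."""
    names = []
    pos = 0
    while True:
        header = data[pos:pos + HEADER_LEN]
        if len(header) < HEADER_LEN or header[:6] != NEWC_MAGIC:
            raise ValueError(f"bad or truncated newc header at offset {pos}")
        filesize = int(header[54:62], 16)   # c_filesize, 7th field
        namesize = int(header[94:102], 16)  # c_namesize, includes the NUL
        name_start = pos + HEADER_LEN
        raw_name = data[name_start:name_start + namesize - 1]
        name = raw_name.decode("utf-8", errors="replace")
        if name == "TRAILER!!!":
            return names
        names.append(name)
        data_start = _align4(name_start + namesize)
        pos = _align4(data_start + filesize)


def list_gzipped_initrd(path: str) -> list:
    """List entries of a gzip-compressed newc image such as rootfs.cpio.gz."""
    with gzip.open(path, "rb") as f:
        return list_newc(f.read())
```

Running it on both images and comparing the name lists should show the module files that were added.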
Kevin
Thanks,
- Kever
On 2019/8/27 1:09 AM, Kevin Hilman wrote:
Hi Kever,
Kever Yang kever.yang@rock-chips.com writes:
Hi Kevin,
I want to test this with my board. I can get the Image and dtb from the link for the job, but how can I get the ramdisk named initrd-SDbyy2.cpio.gz?
The ramdisk images are here:
https://storage.kernelci.org/images/rootfs/buildroot/kci-2019.02/arm64/base/
In the kernelCI logs, the ramdisk is slightly modified because the kernel modules have been inserted into the cpio archive.
However, for the purposes of this test, you can just test with the unmodified rootfs.cpio.gz above.
I tried this ramdisk, and it hangs at fan53555 init, without getting into cpufreq.
Any suggestions?
My boot log:
https://paste.ubuntu.com/p/WYZKPWp7sk/
Thanks,
- Kever
Hi Kever,
On Tuesday, 27 August 2019, 03:54:26 CEST, Kever Yang wrote:
On 2019/8/27 1:09 AM, Kevin Hilman wrote:
Kever Yang kever.yang@rock-chips.com writes:
I want to test this with my board. I can get the Image and dtb from the link for the job, but how can I get the ramdisk named initrd-SDbyy2.cpio.gz?
The ramdisk images are here:
https://storage.kernelci.org/images/rootfs/buildroot/kci-2019.02/arm64/base/
In the kernelCI logs, the ramdisk is slightly modified because the kernel modules have been inserted into the cpio archive.
However, for the purposes of this test, you can just test with the unmodified rootfs.cpio.gz above.
I tried this ramdisk, and it hangs at fan53555 init, without getting into cpufreq.
Any suggestions?
My guess would be the fcs,suspend-voltage-selector, maybe?
I.e. old U-Boot versions somehow set up the voltage-select GPIO strangely, so you'd need fcs,suspend-voltage-selector = <0>, while newer U-Boot versions, I think, do configure the GPIO, needing a value of <1>.
So perhaps try swapping that number in the dts, for a start?
Heiko
My boot log:
https://paste.ubuntu.com/p/WYZKPWp7sk/
Thanks,
- Kever
Hi Heiko,
On 2019/8/27 10:14 AM, Heiko Stuebner wrote:
Hi Kever,
On Tuesday, 27 August 2019, 03:54:26 CEST, Kever Yang wrote:
On 2019/8/27 1:09 AM, Kevin Hilman wrote:
Kever Yang kever.yang@rock-chips.com writes:
I want to test this with my board. I can get the Image and dtb from the link for the job, but how can I get the ramdisk named initrd-SDbyy2.cpio.gz?
The ramdisk images are here:
https://storage.kernelci.org/images/rootfs/buildroot/kci-2019.02/arm64/base/
In the kernelCI logs, the ramdisk is slightly modified because the kernel modules have been inserted into the cpio archive.
However, for the purposes of this test, you can just test with the unmodified rootfs.cpio.gz above.
I tried this ramdisk, and it hangs at fan53555 init, without getting into cpufreq.
Any suggestions?
My guess would be the fcs,suspend-voltage-selector, maybe?
I.e. old U-Boot versions somehow set up the voltage-select GPIO strangely, so you'd need fcs,suspend-voltage-selector = <0>
Both the U-Boot and kernel dts still use '<0>' for this property, and this is the correct setting for cpu_b.
while newer U-Boot versions, I think, do configure the GPIO, needing a value of <1>.
There is no 'vsel-gpio' in either the upstream U-Boot or the kernel dts, while the Rockchip kernel 4.4 dts has "vsel-gpios = <&gpio1 18 GPIO_ACTIVE_HIGH>;". So I think the upstream code does not set up the GPIO at all?
And kernelCI's test case does not update the bootloader, only the kernel.
Thanks,
- Kever
So perhaps try swapping that number in the dts, for a start?
Heiko
My boot log:
https://paste.ubuntu.com/p/WYZKPWp7sk/
Thanks,
- Kever
Kevin Hilman khilman@baylibre.com writes:
This is now failing in the v5.2 stable tree.
Any suggestions on what to do? Otherwise, I'll just need to disable this board.
Or, if someone wants to donate a new rk3399-firefly for my lab, I'd be glad to try replacing it.
Kevin
Hi Kevin,
I will send a Firefly-rk3399 board to you.
Thanks,
- Kever
On 2019/9/27 6:51 AM, Kevin Hilman wrote:
This is now failing in the v5.2 stable tree.
Any suggestions on what to do? Otherwise, I'll just need to disable this board.
Or, if someone wants to donate a new rk3399-firefly for my lab, I'd be glad to try replacing it.
Kevin
kernel-build-reports@lists.linaro.org