Tree/Branch: mainline
Git describe: v3.16-rc1-2-gebe0618

Failed boot tests (console logs at the end)
===========================================
exynos5420-arndale-octa: FAIL: arm-exynos_defconfig
ste-snowball: FAIL: arm-u8500_defconfig

Full Report
===========

arm-davinci_all_defconfig
-------------------------
legacy,dm365evm               0 min 16.9 sec: PASS
da850-evm                     0 min 15.1 sec: PASS

arm-tegra_defconfig
-------------------
tegra124-jetson-tk1           0 min 17.6 sec: PASS
tegra30-beaver                0 min 22.7 sec: PASS

arm-multi_v7_defconfig+CONFIG_ARM_LPAE=y
----------------------------------------
tegra124-jetson-tk1           0 min 17.6 sec: PASS
armada-xp-openblocks-ax3-4    0 min 24.2 sec: PASS
omap5-uevm                    1 min 38.4 sec: PASS (Warnings: 1)
sun7i-a20-cubieboard2         0 min 12.9 sec: PASS

arm-mvebu_v7_defconfig+CONFIG_CPU_BIG_ENDIAN=y
----------------------------------------------
armada-xp-openblocks-ax3-4    0 min 23.7 sec: PASS
armada-370-mirabox            0 min 20.5 sec: PASS

arm-omap2plus_defconfig
-----------------------
legacy,3730xm                 0 min 41.9 sec: PASS
am335x-boneblack              0 min 22.5 sec: PASS
omap3-beagle-xm               0 min 49.6 sec: PASS
legacy,3530beagle             0 min 22.4 sec: PASS (Warnings: 1)
omap4-panda                   0 min 53.7 sec: PASS (Warnings: 1)
omap3-overo-tobi              0 min 22.7 sec: PASS
am335x-bone                   0 min 28.9 sec: PASS
omap3-overo-storm-tobi        0 min 22.0 sec: PASS
omap5-uevm                    0 min 58.2 sec: PASS (Warnings: 1)
omap3-n900                    0 min 17.0 sec: PASS
legacy,n900                   0 min 17.0 sec: PASS (Warnings: 1)
omap4-panda-es                0 min 51.5 sec: PASS (Warnings: 1)
legacy,3730storm              0 min 22.8 sec: PASS
legacy,3530overo              0 min 21.1 sec: PASS (Warnings: 1)

arm-multi_v7_defconfig
----------------------
imx6dl-wandboard,wand-solo    0 min 15.2 sec: PASS
am335x-boneblack              0 min 22.0 sec: PASS
sun7i-a20-cubieboard2         0 min 12.8 sec: PASS
sun4i-a10-cubieboard          0 min 17.4 sec: PASS
exynos5410-smdk5410           0 min 28.6 sec: PASS
am335x-bone                   0 min 28.7 sec: PASS
tegra124-jetson-tk1           0 min 17.5 sec: PASS
armada-370-mirabox            0 min 21.8 sec: PASS
omap4-panda                   0 min 51.7 sec: PASS (Warnings: 1)
imx6q-wandboard               0 min 13.8 sec: PASS
imx6dl-wandboard,wand-dual    0 min 15.2 sec: PASS
ste-snowball                  1 min 11.6 sec: PASS
tegra30-beaver                0 min 17.3 sec: PASS
omap3-n900                    0 min 15.2 sec: PASS
qcom-apq8074-dragonboard      0 min 17.9 sec: PASS
bcm28155-ap                   0 min 24.9 sec: PASS
omap3-overo-tobi              0 min 21.9 sec: PASS
omap3-overo-storm-tobi        0 min 23.9 sec: PASS
omap3-beagle-xm               0 min 44.8 sec: PASS
exynos5420-arndale-octa       0 min 40.7 sec: PASS
armada-xp-openblocks-ax3-4    0 min 25.1 sec: PASS
omap5-uevm                    1 min 38.0 sec: PASS (Warnings: 1)
omap4-panda-es                0 min 50.4 sec: PASS (Warnings: 1)
exynos5250-arndale            0 min 31.2 sec: PASS

arm-sunxi_defconfig
-------------------
sun7i-a20-cubieboard2         0 min 11.7 sec: PASS
sun4i-a10-cubieboard          0 min 11.5 sec: PASS

arm-qcom_defconfig
------------------
qcom-apq8074-dragonboard      0 min 17.4 sec: PASS

arm-bcm_defconfig
-----------------
bcm28155-ap                   0 min 22.8 sec: PASS

arm-exynos_defconfig
--------------------
exynos5420-arndale-octa       0 min 38.8 sec: FAIL
exynos5250-arndale            0 min 29.8 sec: PASS
exynos5410-smdk5410           0 min 27.1 sec: PASS

arm-imx_v6_v7_defconfig
-----------------------
imx6dl-wandboard,wand-dual    0 min 15.2 sec: PASS
imx6dl-wandboard,wand-solo    0 min 15.6 sec: PASS
imx6q-wandboard               0 min 14.0 sec: PASS

arm-u8500_defconfig
-------------------
ste-snowball                  1 min 53.5 sec: FAIL

arm-multi_v7_defconfig+CONFIG_CPU_BIG_ENDIAN=y
----------------------------------------------
armada-xp-openblocks-ax3-4    0 min 26.8 sec: PASS
armada-370-mirabox            0 min 22.5 sec: PASS

arm-mvebu_v7_defconfig
----------------------
armada-xp-openblocks-ax3-4    0 min 22.7 sec: PASS
armada-370-mirabox            0 min 20.2 sec: PASS

arm-sama5_defconfig
-------------------
sama5d35ek                    0 min 36.5 sec: PASS (Warnings: 1)
Console logs for failures
=========================

arm-exynos_defconfig
--------------------

exynos5420-arndale-octa: FAIL: last 40 lines of boot log:
---------------------------------------------------------

[ 16.242502] of_get_named_gpiod_flags exited with status 0
[ 16.246647] mmcblk0: p1 p2 p3 p4
[ 16.249817] Unhandled fault: external abort on non-linefetch (0x008) at 0xf00b8088
[ 16.257356] Internal error: : 8 [#2] PREEMPT SMP ARM
[ 16.262291] Modules linked in:
[ 16.265328] CPU: 0 PID: 6 Comm: kworker/u16:0 Tainted: G D 3.16.0-rc1-00002-gebe0618 #1
[ 16.274258] Workqueue: kmmcd mmc_rescan
[ 16.278064] task: ef014000 ti: ef0b4000 task.ti: ef0b4000
[ 16.283442] PC is at rescan_partitions+0xb4/0x290
[ 16.288116] LR is at 0xf00b8000
[ 16.291236] pc : [<c01afea8>] lr : [<f00b8000>] psr: 20000113
[ 16.291236] sp : ef0b5c80 ip : 00000000 fp : 00000000
[ 16.302673] r10: 00000000 r9 : ef22b040 r8 : ed623200
[ 16.307872] r7 : 00000000 r6 : eec24280 r5 : 00000000 r4 : 00000008
[ 16.314371] r3 : 00000013 r2 : 00000000 r1 : 00000000 r0 : 00000001
[ 16.320872] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 16.328150] Control: 10c5387d Table: 2000406a DAC: 00000015
[ 16.333869] Process kworker/u16:0 (pid: 6, stack limit = 0xef0b4240)
[ 16.340195] Stack: (0xef0b5c80 to 0xef0b6000)
[ 16.344530] 5c80: c0548484 00000001 ef0b4000 c03890b8 00000001 c05245b8 ed5ef598 c0524488
[ 16.352675] 5ca0: eec24380 ed5ef468 eec24330 eec24280 00000000 ed5ef468 00000000 00000000
[ 16.360820] 5cc0: 00000000 eec24280 ed623200 eec24290 00000001 00000000 00000000 c00e3ae0
[ 16.368966] 5ce0: c00e24e0 00000000 00000000 00000001 eec24280 00000000 ed623270 ed62320c
[ 16.377111] 5d00: 00000000 ed5ef380 00000003 c00e3d44 c0526794 00000001 00000003 c038a538
[ 16.385257] 5d20: c0526794 c00e2a58 ef0b5d34 c0222588 ed623240 00000001 eec24280 ed623200
[ 16.393402] 5d40: ed623270 ed62320c 00000000 ed5ef380 00000003 c01addc0 c01acc14 ed623200
[ 16.401548] 5d60: 00000080 0b300000 00000002 ed59bc00 ed59c400 ed59f000 ed59f000 ed59f2bc
[ 16.409693] 5d80: c02bb9f8 ed59c400 ed59f000 ed59c588 ef0b5dd6 c02bc8fc 00000056 ed59c46c
[ 16.417838] 5da0: 00000003 ed59bc00 ed59c400 c02bd52c 00000003 ef0b5dd6 00000008 ed6f1708
[ 16.425984] 5dc0: c04673bc ed59f298 00000000 34362e33 42694720 32310000 694b2038 c0000042
[ 16.434129] 5de0: 00000001 ed59c408 c058a11c 00000000 c054840c 00000001 00000000 c05480b8
[ 16.442275] 5e00: ef013c00 c02b1e7c c02b1e64 c022602c 00000000 ed59c408 c02261dc ed598c08
[ 16.450420] 5e20: 00000000 c0224808 ef365698 ed5bd3c4 ed59c408 ed59c43c c0547fd8 c0225f08
[ 16.458566] 5e40: ed59c408 ed59c408 c0547fd8 c0225678 ed59c408 00000000 ed59c410 c0223ce0
[ 16.466712] 5e60: ef013c00 c0383030 c0479de4 ef0b5e84 00061a80 ed59c400 ed59c408 c0479d90
[ 16.474857] 5e80: 00061a80 c03b3794 00000000 c02b229c c0479d90 c0451bc0 c0479d4c 00000001
[ 16.483003] 5ea0: ed598c00 00000000 ed598c00 c02b4ce8 00061a80 40ff8080 00000000 ed598de~$off
# PYBOOT: Exception: kernel: ERROR: failed to boot: Unhandled fault
# PYBOOT: Time: 38.81 seconds.
# PYBOOT: Result: FAIL
arm-u8500_defconfig
-------------------

ste-snowball: FAIL: last 40 lines of boot log:
----------------------------------------------

[ 3.493499] musb-hdrc musb-hdrc.20.auto: musb_init_controller failed with status -517
[ 3.501342] platform musb-hdrc.20.auto: Driver musb-hdrc requests probe deferral
[ 3.509277] pinctrl-nomadik soc:pinctrl: pin GPIO256_AF28 already requested by a03e0000.usb_per5; cannot claim for musb-hdrc.20.auto
[ 3.521209] pinctrl-nomadik soc:pinctrl: pin-256 (musb-hdrc.20.auto) status -22
[ 3.528503] pinctrl-nomadik soc:pinctrl: could not request pin 256 (GPIO256_AF28) from group usb_a_1 on device pinctrl-nomadik
[ 3.539978] musb-ux500 musb-hdrc.20.auto: Error applying setting, reverse things back
[ 3.548492] pinctrl-nomadik soc:pinctrl: pin GPIO256_AF28 already requested by a03e0000.usb_per5; cannot claim for musb-hdrc.21.auto
[ 3.560424] pinctrl-nomadik soc:pinctrl: pin-256 (musb-hdrc.21.auto) status -22
[ 3.567749] pinctrl-nomadik soc:pinctrl: could not request pin 256 (GPIO256_AF28) from group usb_a_1 on device pinctrl-nomadik
[ 3.579223] musb-hdrc musb-hdrc.21.auto: Error applying setting, reverse things back
[ 3.587036] HS USB OTG: no transceiver configured
[ 3.591766] musb-hdrc musb-hdrc.21.auto: musb_init_controller failed with status -517
[ 3.599609] platform musb-hdrc.21.auto: Driver musb-hdrc requests probe deferral
[ 3.607543] pinctrl-nomadik soc:pinctrl: pin GPIO256_AF28 already requested by a03e0000.usb_per5; cannot claim for musb-hdrc.21.auto
[ 3.619445] pinctrl-nomadik soc:pinctrl: pin-256 (musb-hdrc.21.auto) status -22
[ 3.626770] pinctrl-nomadik soc:pinctrl: could not request pin 256 (GPIO256_AF28) from group usb_a_1 on device pinctrl-nomadik
[ 3.638244] musb-ux500 musb-hdrc.21.auto: Error applying setting, reverse things back
[ 3.646759] pinctrl-nomadik soc:pinctrl: pin GPIO256_AF28 already requested by a03e0000.usb_per5; cannot claim for musb-hdrc.22.auto
[ 3.658691] pinctrl-nomadik soc:pinctrl: pin-256 (musb-hdrc.22.auto) status -22
[ 3.666015] pinctrl-nomadik soc:pinctrl: could not request pin 256 (GPIO256_AF28) from group usb_a_1 on device pinctrl-nomadik
[ 3.677490] musb-hdrc musb-hdrc.22.auto: Error applying setting, reverse things back
[ 3.685302] HS USB OTG: no transceiver configured
[ 3.690032] musb-hdrc musb-hdrc.22.auto: musb_init_controller failed with status -517
[ 3.697875] platform musb-hdrc.22.auto: Driver musb-hdrc requests probe deferral
[ 3.705810] pinctrl-nomadik soc:pinctrl: pin GPIO256_AF28 already requested by a03e0000.usb_per5; cannot claim for musb-hdrc.22.auto
[ 3.717712] pinctrl-nomadik soc:pinctrl: pin-256 (musb-hdrc.22.auto) status -22
[ 3.725036] pinctrl-nomadik soc:pinctrl: could not request pin 256 (GPIO256_AF28) from group usb_a_1 on device pinctrl-nomadik
[ 3.736511] musb-ux500 musb-hdrc.22.auto: Error applying setting, reverse things back
[ 3.745056] pinctrl-nomadik soc:pinctrl: pin GPIO256_AF28 already requested by a03e0000.usb_per5; cannot claim for musb-hdrc.23.auto
[ 3.756988] pinctrl-nomadik soc:pinctrl: pin-256 (musb-hdrc.23.auto) status -22
[ 3.764312] pinctrl-nomadik soc:pinctrl: could not request pin 256 (GPIO256_AF28) from group usb_a_1 on device pinctrl-nomadik
[ 3.775787] musb-hdrc musb-hdrc.23.auto: Error applying setting, reverse things back
[ 3.783599] HS USB OTG: no transceiver configured
[ 3.788299] musb-hdrc musb-hdrc.23.auto: musb_init_controller failed with status -517
[ 3.796142] platform musb-hdrc.23.auto: Driver musb-hdrc requests probe deferral
[ 3.804077] pinctrl-nomadik soc:pinctrl: pin GPIO256_AF28 already requested by a03e0000.usb_per5; cannot claim for musb-hdrc.23.auto
~$off
# PYBOOT: Exception: kernel: ERROR: failed to boot: <class 'pexpect.TIMEOUT'>
# PYBOOT: Time: 113.46 seconds.
# PYBOOT: Result: FAIL
Sachin,
On Mon, Jun 16, 2014 at 11:16 PM, Kevin's boot bot khilman@linaro.org wrote:
Tree/Branch: mainline Git describe: v3.16-rc1-2-gebe0618 Failed boot tests (console logs at the end) =========================================== exynos5420-arndale-octa: FAIL: arm-exynos_defconfig ste-snowball: FAIL: arm-u8500_defconfig
FYI... these failures are getting more consistent on my octa board, but still not failing every time.
Kevin
On 06/17/2014 10:23 PM, Kevin Hilman wrote:
FYI... these failures are getting more consistent on my octa board, but still not failing every time.
Kevin
Hi Kevin,
Same here.
Observation: if you soft-reset the board (through the jumpers) after hitting this problem, the problem keeps repeating. But if you hard-reset the board (by removing the power cord), the problem doesn't occur on the next iteration.
On Tue, Jun 17, 2014 at 8:26 PM, Tushar Behera trblinux@gmail.com wrote:
Hi Kevin,
Same here.
Observation: if you soft-reset the board (through the jumpers) after hitting this problem, the problem keeps repeating. But if you hard-reset the board (by removing the power cord), the problem doesn't occur on the next iteration.
I don't ever use the soft-reset, I only toggle the wall power. I don't ever actually remove the power cord though, I'm using a USB-controlled relay to toggle the wall power.
Kevin
On 06/18/2014 09:22 AM, Kevin Hilman wrote:
I don't ever use the soft-reset, I only toggle the wall power. I don't ever actually remove the power cord though, I'm using a USB-controlled relay to toggle the wall power.
Kevin
Laura,
We are getting the following kernel panic [1] (not always, but quite regularly) while booting the Arndale-Octa board (based on Samsung's Exynos5420) with an upstream kernel. I haven't observed this issue on other boards yet.
The issue shows up when I boot with uImage + dtb (within roughly 10 iterations).
There is no issue when I boot an appended zImage (zImage+dtb). I ran that for over 200 cycles without any failure.
'git bisect' points to this commit. commit 1c2f87c22566 "ARM: 8025/1: Get rid of meminfo"
Reverting this commit on top of v3.16-rc1-17-ge99cfa2, I tested for around 100 iterations of booting with uImage+dtb, without any failure.
[1] Kernel log
Unhandled fault: external abort on non-linefetch (0x008) at 0xffc00000
Internal error: : 8 [#1] PREEMPT SMP ARM
Modules linked in:
CPU: 0 PID: 1136 Comm: kworker/u16:0 Not tainted 3.15.0-rc1-00027-g1c8c3cf-dirty #5
task: ed0f5800 ti: eda52000 task.ti: eda52000
PC is at __copy_to_user_std+0x4c/0x3a8
LR is at copy_page_to_iter+0xb0/0x26c
pc : [<c01b858c>] lr : [<c00982c0>] psr: 60000113
sp : eda53de4 ip : 00000000 fp : ee103040
r10: ed9fb700 r9 : 00000080 r8 : eda53eb8
r7 : ffc00000 r6 : 00000000 r5 : 00000080 r4 : eda53e78
r3 : 00000000 r2 : 00000000 r1 : ffc00000 r0 : ed9fb700
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
Control: 10c5387d Table: 2000406a DAC: 00000015
Process kworker/u16:0 (pid: 1136, stack limit = 0xeda52240)
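A rough sanity check on the bisect result for an intermittent failure like this one. This is only a sketch: it assumes boots are independent and takes a per-boot failure rate of 0.1, which is an assumption read off "within roughly 10 iterations", not a measured value. Under those assumptions it estimates how likely 100 clean boots in a row would be if the revert had changed nothing:

```python
def prob_all_pass(fail_rate: float, boots: int) -> float:
    """Probability that `boots` independent boots all pass,
    given a fixed per-boot failure probability."""
    return (1.0 - fail_rate) ** boots

# Assumption: about one failure per ten uImage+dtb boots.
ASSUMED_FAIL_RATE = 0.1

# Chance of seeing 100 clean boots purely by luck if the bug were still present.
p_lucky = prob_all_pass(ASSUMED_FAIL_RATE, 100)
print(f"chance of 100 clean boots by luck: {p_lucky:.2e}")
```

With these assumed numbers the chance is well below one in ten thousand, so 100 clean boots after the revert is strong (though not conclusive) evidence that the bisect landed on the right commit.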
On 06/19/2014 03:02 PM, Tushar Behera wrote:
Some more information:
The boot logs are provided in pastebin, okay[2] and failed[3].
In the failing boots, vm_total_pages is higher (684424 in [3]). In successful boots on my board, it is always 521232 [2].
[2] http://pastebin.com/1iLaizuL [3] http://pastebin.com/5tdDt4GL
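To put the two vm_total_pages values in perspective, the small sketch below converts them to MiB. The 4 KiB page size is an assumption (it is not stated in the thread), and vm_total_pages is only an approximation of the usable RAM, so treat the result as an order-of-magnitude figure:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

GOOD_PAGES = 521232  # vm_total_pages reported for the successful boot [2]
BAD_PAGES = 684424   # vm_total_pages reported for the failed boot [3]

def pages_to_mib(pages: int, page_size: int = PAGE_SIZE) -> float:
    """Convert a page count to MiB."""
    return pages * page_size / (1024 * 1024)

extra_pages = BAD_PAGES - GOOD_PAGES
print(f"good boot: {pages_to_mib(GOOD_PAGES):.1f} MiB")
print(f"bad boot:  {pages_to_mib(BAD_PAGES):.1f} MiB")
print(f"extra:     {extra_pages} pages = {pages_to_mib(extra_pages):.1f} MiB")
```

Under that assumption the failing boot believes it has several hundred MiB more memory than the good boot, which fits a fault on an address that is not backed by real DRAM.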
Adding linux-samsung-soc and linux-arm-kernel ML for wider audience.
On 06/19/2014 04:12 PM, Tushar Behera wrote:
On Sun, Jun 22, 2014 at 8:56 PM, Tushar Behera trblinux@gmail.com wrote:
Adding linux-samsung-soc and linux-arm-kernel ML for wider audience.
I can confirm that reverting the "Get rid of meminfo" patch gets the Octa board booting reliably again for me also.
In case it helps, some boot logs for failures from the last couple of linux-next build/boot cycles can be seen here:
http://armcloud.us/kernel-ci/next/next-20140623/arm-exynos_defconfig/boot-ex...
http://armcloud.us/kernel-ci/next/next-20140620/arm-exynos_defconfig/boot-ex...
Kevin
On 6/23/2014 11:32 AM, Kevin Hilman wrote:
I can confirm that reverting the "Get rid of meminfo" patch gets the Octa board booting reliably again for me also.
In case it helps, some boot logs for failures from the last couple of linux-next build/boot cycles can be seen here:
http://armcloud.us/kernel-ci/next/next-20140623/arm-exynos_defconfig/boot-ex...
http://armcloud.us/kernel-ci/next/next-20140620/arm-exynos_defconfig/boot-ex...
Sorry, I missed this yesterday. I'm going to take a look.
Thanks, Laura
On 6/24/2014 10:47 AM, Laura Abbott wrote:
Sorry, I missed this yesterday. I'm going to take a look.
Were all of
http://pastebin.com/1iLaizuL http://pastebin.com/5tdDt4GL http://armcloud.us/kernel-ci/next/next-20140623/arm-exynos_defconfig/boot-ex... http://armcloud.us/kernel-ci/next/next-20140620/arm-exynos_defconfig/boot-ex...
collected on the same type of board with the same amount of DRAM? I'm seeing a different amount of total pages across all those logs. All the logs have the same lowmem limit so it seems like the upper bound was being calculated incorrectly for passing to free_area_init_node. Nothing is immediately jumping out at me so can you boot up with a small debug patch?
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 659c75d..88eac1f 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -187,6 +187,8 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
 	unsigned long zone_size[MAX_NR_ZONES], zhole_size[MAX_NR_ZONES];
 	struct memblock_region *reg;
 
+	pr_err("XXXXXXX min %lx max_low %lx max_high %lx\n", min, max_low, max_high);
+	__memblock_dump_all();
 	/*
 	 * initialise the zones.
 	 */
It would be helpful to do this across a few bootups to see if the values are actually consistent. I'll keep looking in the meantime.
Thanks, Laura
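Comparing the debug output across several boots can be scripted. The sketch below is one possible way to do it: it pulls the `XXXXXXX min ... max_low ... max_high ...` line out of each captured log and groups boots by the values seen. The function names and the sample log snippets are hypothetical, made up for illustration only; real captured console logs would be substituted.

```python
import re

# Matches the line printed by the pr_err() in the debug patch.
DEBUG_LINE = re.compile(
    r"XXXXXXX min ([0-9a-f]+) max_low ([0-9a-f]+) max_high ([0-9a-f]+)"
)

def parse_zone_bounds(log_text):
    """Return (min, max_low, max_high) as integers, or None if the
    debug line is missing from the log."""
    match = DEBUG_LINE.search(log_text)
    if match is None:
        return None
    return tuple(int(value, 16) for value in match.groups())

def group_by_bounds(logs):
    """Map each distinct bounds tuple to the list of boots that printed it."""
    groups = {}
    for name, text in logs.items():
        groups.setdefault(parse_zone_bounds(text), []).append(name)
    return groups

# Hypothetical sample data, for illustration only.
sample_logs = {
    "boot-1": "XXXXXXX min 20000 max_low 4f800 max_high 9fa00\n",
    "boot-2": "XXXXXXX min 20000 max_low 4f800 max_high fffff\n",
    "boot-3": "XXXXXXX min 20000 max_low 4f800 max_high 9fa00\n",
}

for bounds, boots in group_by_bounds(sample_logs).items():
    print([hex(v) for v in bounds], boots)
```

More than one group in the output means the bounds differ between boots of the same board, which is the inconsistency being looked for.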
On 06/25/2014 03:59 AM, Laura Abbott wrote:
Were all of
http://pastebin.com/1iLaizuL http://pastebin.com/5tdDt4GL http://armcloud.us/kernel-ci/next/next-20140623/arm-exynos_defconfig/boot-ex... http://armcloud.us/kernel-ci/next/next-20140620/arm-exynos_defconfig/boot-ex...
collected on the same type of board with the same amount of DRAM? I'm seeing a different amount of total pages across all those logs. All the logs have the same lowmem limit so it seems like the upper bound was being calculated incorrectly for passing to free_area_init_node. Nothing is immediately jumping out at me so can you boot up with a small debug patch?
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 659c75d..88eac1f 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -187,6 +187,8 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
 	unsigned long zone_size[MAX_NR_ZONES], zhole_size[MAX_NR_ZONES];
 	struct memblock_region *reg;
 
+	pr_err("XXXXXXX min %lx max_low %lx max_high %lx\n", min, max_low, max_high);
+	__memblock_dump_all();
 	/*
 	 * initialise the zones.
 	 */
It would be helpful to do this across a few bootups to see if the values are actually consistent. I'll keep looking in the meantime.
Thanks, Laura
Thanks Laura for the pointer. In case of error, I am getting some random memblock_add() calls from drivers/of/fdt.c:early_init_dt_scan_memory.
The issue seems to come from u-boot, which is not updating the memory subnode properly. I have a fix for u-boot, which I am testing right now. I will update tomorrow after I do some more tests.
Additional changes in kernel.

diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index c4cddf0..bca82b3 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -817,7 +817,7 @@ int __init early_init_dt_scan_memory(unsigned long node, const char *uname,
 
 	endp = reg + (l / sizeof(__be32));
 
-	pr_debug("memory scan node %s, reg size %d, data: %x %x %x %x,\n",
+	pr_err("memory scan node %s, reg size %d, data: %x %x %x %x,\n",
 	    uname, l, reg[0], reg[1], reg[2], reg[3]);
 
 	while ((endp - reg) >= (dt_root_addr_cells + dt_root_size_cells)) {
@@ -891,6 +891,7 @@ void __init __weak early_init_dt_add_memory_arch(u64 base, u64 size)
 		size -= phys_offset - base;
 		base = phys_offset;
 	}
+	printk("trb: memblock_add base (%llx) size(%llx)\n", base, size);
 	memblock_add(base, size);
 }
Kernel log:
memory scan node memory, reg size 96, data: 20 10 30 10,
trb: memblock_add base (20000000) size(10000000)
trb: memblock_add base (30000000) size(10000000)
trb: memblock_add base (40000000) size(10000000)
trb: memblock_add base (50000000) size(10000000)
trb: memblock_add base (60000000) size(10000000)
trb: memblock_add base (70000000) size(10000000)
trb: memblock_add base (80000000) size(10000000)
trb: memblock_add base (90000000) size(fa00000)
trb: memblock_add base (fffff000) size(fffff000)
trb: memblock_add base (ffeff000) size(fffff000)
trb: memblock_add base (fbfff000) size(fffff000)
trb: memblock_add base (fffff000) size(effff000)
Machine model: Insignal Arndale Octa evaluation board based on EXYNOS5420
bootconsole [earlycon0] enabled
Memory policy: Data cache writealloc
XXXXXXX min 20000 max_low 4f800 max_high fffff
MEMBLOCK configuration:
 memory size = 0x82a00fff reserved size = 0x75e947
 memory.cnt = 0x4
 memory[0x0] [0x00000020000000-0x00000042ffffff], 0x23000000 bytes flags: 0x0
 memory[0x1] [0x00000043800000-0x00000050ffffff], 0xd800000 bytes flags: 0x0
 memory[0x2] [0x00000051800000-0x0000009f9fffff], 0x4e200000 bytes flags: 0x0
 memory[0x3] [0x000000fbfff000-0x000000fffffffe], 0x4000fff bytes flags: 0x0
 reserved.cnt = 0x6
 reserved[0x0] [0x00000020004000-0x00000020007fff], 0x4000 bytes flags: 0x0
 reserved[0x1] [0x000000200082c0-0x0000002059cb7f], 0x5948c0 bytes flags: 0x0
 reserved[0x2] [0x0000002fe45000-0x0000002fe4fea7], 0xaea8 bytes flags: 0x0
 reserved[0x3] [0x0000002fe50000-0x0000002ffff09e], 0x1af09f bytes flags: 0x0
 reserved[0x4] [0x0000004f7f3000-0x0000004f7fbfff], 0x9000 bytes flags: 0x0
 reserved[0x5] [0x0000004f7fcec0-0x0000004f7fffff], 0x3140 bytes flags: 0x0
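The log above can be read with a little arithmetic. The sketch below assumes one address cell and one size cell of 32 bits each, which is not stated in the thread but is consistent with a 96-byte reg property producing 12 memblock_add calls. Both function names are hypothetical, and the 32-bit check is only an illustrative heuristic (it would be wrong on LPAE systems); it is not a check the kernel performs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of (base, size) pairs in a "reg" property of reg_len bytes,
 * where every cell is one 32-bit word. */
size_t reg_entry_count(size_t reg_len, unsigned int addr_cells,
		       unsigned int size_cells)
{
	assert(addr_cells + size_cells > 0);
	return (reg_len / sizeof(uint32_t)) / (addr_cells + size_cells);
}

/* Illustrative heuristic only: returns nonzero if the bank would end
 * above the 32-bit physical address space. */
int bank_exceeds_32bit(uint64_t base, uint64_t size)
{
	const uint64_t limit = UINT64_C(1) << 32;

	/* Written this way so that base + size is never computed and
	 * therefore cannot wrap around. */
	return base > limit || size > limit - base;
}
```

Under these assumptions the property describes 12 banks. The first eight entries in the log end at or below 0x9fa00000, while the last four (bases fffff000, ffeff000, fbfff000 and fffff000) all run past 4 GiB, which matches the "random" entries described above.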
On 6/25/2014 5:13 AM, Tushar Behera wrote:
On 06/25/2014 03:59 AM, Laura Abbott wrote:
On 6/24/2014 10:47 AM, Laura Abbott wrote:
On 6/23/2014 11:32 AM, Kevin Hilman wrote:
On Sun, Jun 22, 2014 at 8:56 PM, Tushar Behera trblinux@gmail.com wrote:
Adding linux-samsung-soc and linux-arm-kernel ML for wider audience.
Thanks Laura for the pointer. In case of error, I am getting some random memblock_add() calls from drivers/of/fdt.c:early_init_dt_scan_memory.
The issue seems to be from u-boot, where it is not updating the memory subnode properly. I have got a fix for the u-boot, which I am testing right now. I will update tomorrow after I do some more test.
I'm concerned my change can stay as is if this is exposing an issue in u-boot. Asking people to change bootloaders rarely ends well. Can you elaborate on what u-boot is doing that would be exposing this issue?
Thanks, Laura
On 06/26/2014 03:27 AM, Laura Abbott wrote:
I'm concerned my change can stay as is if this is exposing an issue in u-boot. Asking people to change bootloaders rarely ends well. Can you elaborate on what u-boot is doing that would be exposing this issue?
Thanks, Laura
Laura,
Here is my assessment of the current situation.
*Bug in the u-boot*
The current u-boot for the Arndale-Octa board defines NR_BANKS as 12, and the core uses a global structure (gd->bd) to maintain the start and size of the individual banks. Depending on the revision of the SoC used on the board, the board file [1] updates the start/size for either 8 or 12 banks. On the current revision of Arndale-Octa boards, the board file always updates start/size for 8 banks, leaving the start/size data for the remaining 4 banks uninitialized.

But the u-boot core [2] updates the values for all 12 banks, thus potentially passing on invalid data for the last 4 banks.
The issue can be fixed by resetting the start/size for unused memory banks to 0/0.[3]
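A minimal sketch of that kind of fix is shown here. It is not the actual u-boot change in [3]: struct dram_bank is a simplified stand-in for the per-bank start/size data kept in gd->bd, and clear_unused_banks is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one bank entry; the real u-boot structure
 * is not reproduced here. */
struct dram_bank {
	uint64_t start;
	uint64_t size;
};

/* Reset every bank from index 'used' up to 'total' to 0/0 so that no
 * uninitialized start/size values are handed to the kernel. */
void clear_unused_banks(struct dram_bank *banks, size_t total, size_t used)
{
	assert(used <= total);
	for (size_t i = used; i < total; i++) {
		banks[i].start = 0;
		banks[i].size = 0;
	}
}
```

For the board discussed here this would be called with total = 12 and used = 8, after the board file has filled in the first eight banks.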
*Before migration to memblock*
DRAM banks were added through [4]. For Exynos systems, NR_BANKS was defined as 8. The initial check that rejects any banks beyond NR_BANKS was enough to hide this issue. The bootlog [5] (with some debug messages) shows the invalid data, both in u-boot and in the kernel. Please grep for "NR_BANKS too low, ignoring memory" in the bootlog.

*After migration to memblock*
Now that the memory banks are added through [6], all the memory banks are added unconditionally, resulting in the panic.
IMO, the bug is in u-boot and we should fix that.
[1] https://github.com/tusharbehera/u-boot/blob/tracking-arndale-octa-v2012.07/b...
[2] https://github.com/tusharbehera/u-boot/blob/tracking-arndale-octa-v2012.07/a...
[3] https://github.com/tusharbehera/u-boot/commit/9be794e886603a80f2c8686a75187a...
[4] https://github.com/tusharbehera/linux/blob/v3.15-rc1/arch/arm/kernel/setup.c...
[5] http://pastebin.com/vLP2oG1mP
[6] https://github.com/tusharbehera/linux/blob/v3.16-rc1/drivers/of/fdt.c#L878
Hi Tushar,
Here is my assessment of the current situation.
Thanks for digging into this and the detailed diagnosis.
IMO, the bug is in u-boot and we should fix that.
I agree that the u-boot bug needs to be fixed, and FWIW, I updated my u-boot and haven't seen the boot failure yet after several boots with next-20140625.
That being said, since it's not always feasible/practical to update u-boot, and when it comes down to it, this is still a kernel regression, we should also fix the kernel to sanity check the values coming from u-boot, like it was doing before.
Could you (or Laura) come up with a way to recreate the sanity check that was detecting this problem before and ignoring those banks?
Thanks,
Kevin
On Thu, Jun 26, 2014 at 07:59:19AM -0700, Kevin Hilman wrote:
I agree that the u-boot bug needs to be fixed, and FWIW, I updated my u-boot and haven't seen the boot failure yet after several boots with next-20140625.
That being said, since it's not always feasible/practical to update u-boot, and when it comes down to it, this is still a kernel regression, we should also fix the kernel to sanity check the values coming from u-boot, like it was doing before.
It wasn't sanity checking the values (there is some sanity checking, but the sanity checking doesn't catch this).
What caught it was that the kernel was configured to only look at the first 8 of the 12 meminfo entries with ATAGs. Since we no longer have that limit, all meminfo entries are now looked at (since the kernel doesn't need the limit.)
We could add back a soft-limit on the number of meminfo entries, but this has to be platform specific. Another entry to go into the mach_info structures?
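The behaviour described above, where a fixed-size table quietly drops everything past its capacity, can be modelled in a few lines. This is a simplified illustration and not the kernel's former meminfo code; the struct layout and function names are hypothetical, and NR_BANKS is set to 8 only because that is the Exynos value quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_BANKS 8	/* Exynos value quoted in this thread */

struct membank {
	uint64_t start;
	uint64_t size;
};

struct meminfo {
	size_t nr_banks;
	struct membank bank[NR_BANKS];
};

/* Store one bank; returns 0 on success, -1 if the table is already
 * full and the entry is ignored. */
int meminfo_add(struct meminfo *mi, uint64_t start, uint64_t size)
{
	assert(mi != NULL);
	if (mi->nr_banks >= NR_BANKS)
		return -1;
	mi->bank[mi->nr_banks].start = start;
	mi->bank[mi->nr_banks].size = size;
	mi->nr_banks++;
	return 0;
}

/* Feed n bootloader entries through meminfo_add and return how many
 * were ignored. */
size_t meminfo_add_all(struct meminfo *mi, const struct membank *in, size_t n)
{
	size_t ignored = 0;

	for (size_t i = 0; i < n; i++)
		if (meminfo_add(mi, in[i].start, in[i].size) != 0)
			ignored++;
	return ignored;
}
```

With 12 entries from the bootloader and room for 8, the last 4 are dropped regardless of their contents, which is why the uninitialized banks never reached the memory setup before the limit was removed.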
On 6/26/2014 8:17 AM, Russell King - ARM Linux wrote:
We could add back a soft-limit on the number of meminfo entries, but this has to be platform specific. Another entry to go into the mach_info structures?
This is the least bad option I've come up with. It brings back early_init_dt_add_memory_arch so we can use arm_add_memory and stop adding memory if it reaches an upper threshold. I was debating setting the default at 12 or 8 but setting at 12 seems like it would involve the fewest platform changes.
Thanks, Laura
----8<----
From 1a5265fd178fea0da432fa9d49ce28e78bd25e04 Mon Sep 17 00:00:00 2001
From: Laura Abbott lauraa@codeaurora.org
Date: Thu, 26 Jun 2014 11:23:44 -0700
Subject: [PATCH] arm: Add back maximum bank limit

Commit 1c2f87c22566cd057bc8cde10c37ae9da1a1bb76 (ARM: 8025/1: Get rid of
meminfo) dropped the upper bound on the number of memory banks that can
be added as there was no technical need in the kernel. It turns out
though, some bootloaders (specifically the arndale-octa exynos boards)
may pass invalid memory information and rely on the kernel to not
parse this data. This is a bug in the bootloader but we still need
to work around this. Re-introduce a maximum bank limit per board to
prevent invalid banks from being passed to the kernel.

Signed-off-by: Laura Abbott lauraa@codeaurora.org
---
 arch/arm/include/asm/mach/arch.h |  8 ++++++--
 arch/arm/kernel/devtree.c        |  4 ++++
 arch/arm/kernel/setup.c          | 16 ++++++++++++++++
 arch/arm/mach-exynos/exynos.c    |  1 +
 4 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/mach/arch.h b/arch/arm/include/asm/mach/arch.h
index 060a75e..2a436ac 100644
--- a/arch/arm/include/asm/mach/arch.h
+++ b/arch/arm/include/asm/mach/arch.h
@@ -40,6 +40,8 @@ struct machine_desc {
 	unsigned int		video_start;	/* start of video RAM	*/
 	unsigned int		video_end;	/* end of video RAM	*/
 
+	unsigned int		bank_limit;	/* maximum number of memory
+						 * banks to add */
 	unsigned char		reserve_lp0 :1;	/* never has lp0	*/
 	unsigned char		reserve_lp1 :1;	/* never has lp1	*/
 	unsigned char		reserve_lp2 :1;	/* never has lp2	*/
@@ -85,7 +87,8 @@ static const struct machine_desc __mach_desc_##_type	\
  __used							\
  __attribute__((__section__(".arch.info.init"))) = {	\
 	.nr		= MACH_TYPE_##_type,		\
-	.name		= _name,
+	.name		= _name,			\
+	.bank_limit	= 12,
 
 #define MACHINE_END				\
 };
@@ -95,6 +98,7 @@ static const struct machine_desc __mach_desc_##_name	\
  __used							\
  __attribute__((__section__(".arch.info.init"))) = {	\
 	.nr		= ~0,				\
-	.name		= _namestr,
+	.name		= _namestr,			\
+	.bank_limit	= 12,
 
 #endif
diff --git a/arch/arm/kernel/devtree.c b/arch/arm/kernel/devtree.c
index e94a157..ea9ce92 100644
--- a/arch/arm/kernel/devtree.c
+++ b/arch/arm/kernel/devtree.c
@@ -27,6 +27,10 @@
 #include <asm/mach/arch.h>
 #include <asm/mach-types.h>
 
+void __init early_init_dt_add_memory_arch(u64 base, u64 size)
+{
+	arm_add_memory(base, size);
+}
 
 #ifdef CONFIG_SMP
 extern struct of_cpu_method __cpu_method_of_table[];
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 8a16ee5..3ab94d1 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -629,11 +629,26 @@ void __init dump_machine_table(void)
 		/* can't use cpu_relax() here as it may require MMU setup */;
 }
 
+static unsigned int bank_cnt;
+static unsigned int max_cnt;
+
 int __init arm_add_memory(u64 start, u64 size)
 {
 	u64 aligned_start;
 
 	/*
+	 * Some buggy bootloaders rely on the old meminfo behavior of not adding
+	 * more than n banks since anything past that may contain invalid data.
+	 */
+	if (bank_cnt >= max_cnt) {
+		pr_crit("Max banks too low, ignoring memory at 0x%08llx\n",
+			(long long)start);
+		return -EINVAL;
+	}
+
+	bank_cnt++;
+
+	/*
 	 * Ensure that start/size are aligned to a page boundary.
 	 * Size is appropriately rounded down, start is rounded up.
 	 */
@@ -879,6 +894,7 @@ void __init setup_arch(char **cmdline_p)
 	mdesc = setup_machine_tags(__atags_pointer, __machine_arch_type);
 	machine_desc = mdesc;
 	machine_name = mdesc->name;
+	max_cnt = mdesc->bank_limit;
 
 	if (mdesc->reboot_mode != REBOOT_HARD)
 		reboot_mode = mdesc->reboot_mode;
diff --git a/arch/arm/mach-exynos/exynos.c b/arch/arm/mach-exynos/exynos.c
index f38cf7c..91283fd 100644
--- a/arch/arm/mach-exynos/exynos.c
+++ b/arch/arm/mach-exynos/exynos.c
@@ -350,4 +350,5 @@ DT_MACHINE_START(EXYNOS_DT, "SAMSUNG EXYNOS (Flattened Device Tree)")
 	.dt_compat	= exynos_dt_compat,
 	.restart	= exynos_restart,
 	.reserve	= exynos_reserve,
+	.bank_limit	= 8,
 MACHINE_END
On 06/27/2014 01:12 AM, Laura Abbott wrote:
+static unsigned int bank_cnt;
+static unsigned int max_cnt;
+
 int __init arm_add_memory(u64 start, u64 size)
 {
 	u64 aligned_start;
 
 	/*
+	 * Some buggy bootloaders rely on the old meminfo behavior of not adding
+	 * more than n banks since anything past that may contain invalid data.
+	 */
+	if (bank_cnt >= max_cnt) {
+		pr_crit("Max banks too low, ignoring memory at 0x%08llx\n",
+			(long long)start);
+		return -EINVAL;
+	}
+
+	bank_cnt++;
+
+	/*
 	 * Ensure that start/size are aligned to a page boundary.
 	 * Size is appropriately rounded down, start is rounded up.
 	 */
@@ -879,6 +894,7 @@ void __init setup_arch(char **cmdline_p)
 	mdesc = setup_machine_tags(__atags_pointer, __machine_arch_type);
 	machine_desc = mdesc;
 	machine_name = mdesc->name;
+	max_cnt = mdesc->bank_limit;
arm_add_memory is getting called before this is being set, resulting in none of the memory banks getting added[1].
setup_machine_fdt -> early_init_dt_scan -> early_init_dt_scan_memory
Would it make sense to re-introduce the config option ARM_NR_BANKS and replace max_cnt with NR_BANKS?
[1] http://pastebin.com/MawYD7kb
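The ordering problem reported here can be reproduced in isolation. The sketch below mirrors the two counters from the quoted patch but is otherwise a hypothetical model: add_memory, set_bank_limit and ordering_demo are illustrative names, and no kernel code is called.

```c
#include <assert.h>
#include <stdint.h>

static unsigned int bank_cnt;
static unsigned int max_cnt;	/* static storage: starts at zero */

/* Same guard as in the quoted patch: reject once the limit is reached. */
int add_memory(uint64_t start, uint64_t size)
{
	(void)start;
	(void)size;
	if (bank_cnt >= max_cnt)
		return -1;
	bank_cnt++;
	return 0;
}

/* Stands in for the assignment that setup_arch() performs later. */
void set_bank_limit(unsigned int limit)
{
	max_cnt = limit;
}

/* While the limit is still zero, 0 >= 0 holds and the very first bank
 * is rejected; after the limit is set, banks are accepted. */
void ordering_demo(void)
{
	assert(add_memory(UINT64_C(0x20000000), UINT64_C(0x10000000)) == -1);
	set_bank_limit(8);
	assert(add_memory(UINT64_C(0x20000000), UINT64_C(0x10000000)) == 0);
}
```

Because the device tree memory scan runs before the limit is assigned, every call takes the rejecting branch, which matches the report that no banks were added at all.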
 
 	if (mdesc->reboot_mode != REBOOT_HARD)
 		reboot_mode = mdesc->reboot_mode;
diff --git a/arch/arm/mach-exynos/exynos.c b/arch/arm/mach-exynos/exynos.c
index f38cf7c..91283fd 100644
--- a/arch/arm/mach-exynos/exynos.c
+++ b/arch/arm/mach-exynos/exynos.c
@@ -350,4 +350,5 @@ DT_MACHINE_START(EXYNOS_DT, "SAMSUNG EXYNOS (Flattened Device Tree)")
 	.dt_compat	= exynos_dt_compat,
 	.restart	= exynos_restart,
 	.reserve	= exynos_reserve,
+	.bank_limit	= 8,
 MACHINE_END
On 6/26/2014 8:06 PM, Tushar Behera wrote:
arm_add_memory is getting called before this is being set, resulting in none of the memory banks getting added[1].
setup_machine_fdt -> early_init_dt_scan -> early_init_dt_scan_memory
Would it make sense to re-introduce the config option ARM_NR_BANKS and replace max_cnt with NR_BANKS?
I was hoping to avoid re-introducing the config option but that may be the case if we can't make the machine_info work. I'll take a better look tomorrow.
Thanks, Laura
On Fri, Jun 27, 2014 at 02:09:58AM -0700, Laura Abbott wrote:
On 6/26/2014 8:06 PM, Tushar Behera wrote:
arm_add_memory is getting called before this is being set, resulting in none of the memory banks getting added[1].
setup_machine_fdt -> early_init_dt_scan -> early_init_dt_scan_memory
Would it make sense to re-introduce the config option ARM_NR_BANKS and replace max_cnt with NR_BANKS?
I was hoping to avoid re-introducing the config option but that may be the case if we can't make the machine_info work. I'll take a better look tomorrow.
The problem with the config option is that it's not single zImage friendly.
Hi Kevin and Tushar,
On 26.06.2014 16:59, Kevin Hilman wrote:
IMO, the bug is in u-boot and we should fix that.
I agree that the u-boot bug needs to be fixed, and FWIW, I updated my u-boot and haven't seen the boot failure yet after several boots with next-20140625.
Could you clarify your test setup: Are you using the original InSignal SPL [1] with just your own u-boot.bin? Or do you have access to some newer Samsung-signed SPL?
That being said, since it's not always feasible/practical to update u-boot, and when it comes down to it, this is still a kernel regression, we should also fix the kernel to sanity check the values coming from u-boot, like it was doing before.
Sounds good.
Apart from this memory issue here, I noticed that CPUs don't appear to be in HYP mode for virtualization, which had required a signed SPL update for the ODROID-XU [2]. And to me it looks as if there's no Arndale Octa support in upstream U-Boot [3], no real maintenance on the InSignal fork [4] and a policy of not cooperating with others [5].
Thanks, Andreas
[1] http://forum.insignal.co.kr/viewtopic.php?f=6&t=3199
[2] http://forum.odroid.com/viewtopic.php?f=64&t=2778&start=40#p32581
[3] http://git.denx.de/?p=u-boot.git%3Ba=blob%3Bf=boards.cfg%3Bh=947f2bc5ba2794c...
[4] http://git.insignal.co.kr/insignal/arndale_octa-jb_mr1.1/u-boot/
[5] http://forum.insignal.co.kr/viewtopic.php?f=40&t=3613
On 06/26/2014 10:34 PM, Andreas Färber wrote:
Hi Kevin and Tushar,
On 26.06.2014 16:59, Kevin Hilman wrote:
IMO, the bug is in u-boot and we should fix that.
I agree that the u-boot bug needs to be fixed, and FWIW, I updated my u-boot and haven't seen the boot failure yet after several boots with next-20140625.
Could you clarify your test setup: Are you using the original InSignal SPL [1] with just your own u-boot.bin? Or do you have access to some newer Samsung-signed SPL?
The u-boot changes for Arndale-Octa were done as part of an activity within Linaro. Insignal had signed the SPL binary for us. You can extract the signed SPL binary from the following hwpack [6] (tar xfz and then look in the u_boot folder [7]).
The source code for this u-boot can be found here.[8]
Just in case, commands to flash u-boot binaries are listed here.[9]
That being said, since it's not always feasible/practical to update u-boot, and when it comes down to it, this is still a kernel regression, we should also fix the kernel to sanity check the values coming from u-boot, like it was doing before.
Sounds good.
Apart from this memory issue here, I noticed that CPUs don't appear to be in HYP mode for virtualization, which had required a signed SPL update for the ODROID-XU [2]. And to me it looks as if there's no Arndale Octa support in upstream U-Boot [3], no real maintenance on the InSignal fork [4] and a policy of not cooperating with others [5].
Adding Arndale-Octa support to upstream U-Boot was on a TODO list, but that didn't materialize because of some other reasons.
Thanks, Andreas
[1] http://forum.insignal.co.kr/viewtopic.php?f=6&t=3199
[2] http://forum.odroid.com/viewtopic.php?f=64&t=2778&start=40#p32581
[3] http://git.denx.de/?p=u-boot.git%3Ba=blob%3Bf=boards.cfg%3Bh=947f2bc5ba2794c...
[4] http://git.insignal.co.kr/insignal/arndale_octa-jb_mr1.1/u-boot/
[5] http://forum.insignal.co.kr/viewtopic.php?f=40&t=3613
[6] http://snapshots.linaro.org/kernel-hwpack/linux-linaro-tracking-ll-arndale-o...
[7] <path_to_extracted_folder>/u_boot/usr/lib/u-boot/arndale_octa
[8] git.linaro.org/landing-teams/working/samsung/u-boot.git/shortlog/refs/heads/tracking-arndale_octa
[9] http://pastebin.com/pfGF2giq
Thanks,
On 23.06.2014 20:32, Kevin Hilman wrote:
I can confirm that reverting the "Get rid of meminfo" patch gets the Octa board booting reliably again for me also.
Confirming that the revert [1] also fixes the issue I was reporting for my Arndale Octa. I'm using zImage + dtb and had been resetting via J10.
Regards, Andreas
[1] https://github.com/afaerber/linux/commits/arndale-octa-next
kernel-build-reports@lists.linaro.org