Currently it is difficult to tell the difference between an infrastructure problem in a device bootloader and a kernel failure.
If a kernel silently fails to boot, LAVA throws a bootloader-commands timeout because it hasn’t matched ‘Linux version’ to know the kernel has started. However, this timeout could also be caused by a real problem in the bootloader, such as a DHCP failure or a TFTP timeout. KernelCI would like to catch actual infrastructure problems in the bootloader, but can’t tell if the kernel just didn’t boot, or the commands actually timed out in the bootloader.
To fix this, we're going to:
- change the bootloader-commands action to finish when it has sent the last command
- have auto-login-action take over monitoring the kernel boot process
- extend bootloader-commands to match more infrastructure problems
- update uboot commands to execute the commands in order (like the other bootloader implementations), rather than building a script and then calling that as the last command
This work is scoped for the January 2018.1 LAVA release.
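To illustrate the distinction described above, here is a minimal Python sketch of how a captured serial log could be sorted into the three cases. It is not LAVA code: the function name, the result strings and the bootloader error patterns are all hypothetical, and real bootloader messages vary by device and version.

```python
import re

# 'Linux version' is the string mentioned above; the error patterns below
# are assumptions for illustration, not the patterns LAVA actually uses.
KERNEL_STARTED = re.compile(r"Linux version")
BOOTLOADER_ERRORS = {
    "dhcp": re.compile(r"DHCP.*(fail|timed out|retry count exceeded)", re.IGNORECASE),
    "tftp": re.compile(r"TFTP.*(error|timeout)", re.IGNORECASE),
}


def classify_boot_log(log: str) -> str:
    """Classify a serial log captured up to the point of a timeout."""
    if KERNEL_STARTED.search(log):
        return "kernel-started"
    # No kernel banner: look for a known bootloader-side failure.
    for name, pattern in BOOTLOADER_ERRORS.items():
        if pattern.search(log):
            return f"infrastructure-error:{name}"
    # Nothing matched, so the kernel most likely failed silently.
    return "kernel-failure-suspected"
```

The last case stays a suspicion rather than a verdict: a bootloader hang that prints nothing recognisable would land there too, which is why matching more infrastructure problems matters.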
On 23 November 2017 at 13:23, Matt Hart <matthew.hart@linaro.org> wrote:
- have auto-login-action take over monitoring the kernel boot process
Would it be possible to reinstate the v1 feature that measured kernel boot time up to the "Freeing unused kernel memory" message? Currently I see no way of measuring this interval. The only available measurement is auto-login-action, which completes only when a prompt is available.
milosz
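As a rough sketch of the measurement asked about here, assuming each serial line is available with a host-side timestamp in seconds, the interval could be computed as below. The function name and the input format are hypothetical and this is not how LAVA v1 implemented it; the start marker is only an assumption based on the 'Linux version' match mentioned in the original mail.

```python
from typing import Iterable, Optional, Tuple


def kernel_boot_seconds(
    lines: Iterable[Tuple[float, str]],
    start_marker: str = "Linux version",
    end_marker: str = "Freeing unused kernel memory",
) -> Optional[float]:
    """Return the seconds between the start and end markers, or None."""
    start = None
    for timestamp, text in lines:
        if start is None:
            # Wait for the kernel banner before looking for the end marker.
            if start_marker in text:
                start = timestamp
        elif end_marker in text:
            return timestamp - start
    # One of the markers never appeared, so there is nothing to report.
    return None
```

For example, a log with the banner at 10.5 s and the end marker at 12.75 s gives 2.25 s, regardless of how long the login prompt takes to appear afterwards.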
On 23/11/17 13:23, Matt Hart wrote:
Currently it is difficult to tell the difference between an infrastructure problem in a device bootloader and a kernel failure.
If a kernel silently fails to boot, LAVA throws a bootloader-commands timeout because it hasn’t matched ‘Linux version’ to know the kernel has started. However, this timeout could also be caused by a real problem in the bootloader, such as a DHCP failure or a TFTP timeout.
Most bootloaders have a message that they print just before starting the kernel, e.g. "Starting kernel". Would it be possible to also look for this message (if defined in the device config) and in case of timeout report an infrastructure error?
Thanks, Guillaume
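One possible reading of the "Starting kernel" suggestion, sketched in Python: if the device config defines a hand-off message, its presence in the log at timeout decides which side gets the blame. The function name, the `handoff_marker` parameter and the result strings are hypothetical and not part of LAVA.

```python
from typing import Optional


def classify_timeout(log: str, handoff_marker: Optional[str] = None) -> str:
    """Decide who to blame for a timeout, given the log captured so far."""
    if "Linux version" in log:
        return "kernel-started"
    if handoff_marker is None:
        # No marker defined for this device: the two cases stay indistinguishable.
        return "ambiguous"
    if handoff_marker in log:
        # The bootloader handed over, but the kernel never printed its banner.
        return "kernel-failure"
    # The bootloader never reached the hand-off point.
    return "infrastructure-error"
```

Keeping the marker optional matches the "if defined in the device config" condition: devices without one fall back to the current, ambiguous behaviour.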