Hi Amit -
On Thu, Mar 15, 2018 at 10:49:32AM +0530, Amit Kucheria wrote:
On Thu, Mar 15, 2018 at 10:12 AM, Linaro QA qa-reports@linaro.org wrote:
Summary
kernel: 4.16.0-rc5
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git branch: master
git commit: 3032f8c504d2b15d58e4c96060a96b47e215573c
git describe: v4.16-rc5-46-g3032f8c504d2
Test details: https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v4.16-rc5-46-g303...
No regressions (compared to build v4.16-rc5-4-gfc6eabbbf8ef)
Boards, architectures and test suites:
dragonboard-410c
- boot - fail: 39
We need to reconcile this failure with the subject line to allow better filtering of email.
While it is true that there are no regressions compared to a previous build from a few hours ago, a boot failure is a huge regression from the "baseline". We should be blaring horns and flashing lights to draw attention to it. :-) I know the QCLT is serious about tracking and fixing these as soon as possible.
We agree. The current email template was designed for a very specific use case and is not optimal for your needs. We plan to have a meeting to discuss this at Connect. There's also a related issue: https://github.com/Linaro/squad/issues/242.
Second, if you look at the latest mainline results at https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v4.16-rc5-60-g0aa..., dragonboard fails to boot something like 10% of the time due to some transient issue between lava, the lava job template, and the board itself (see https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v4.16-rc5-60-g0aa... for an example lava run). It is hard to blare sirens for a boot failure, when boot failures are a regular and acceptable occurrence. Granted, failing 100% of the time should be and is a big issue.
So here are a few proposals to improve the report:
- Convert these statistics to searchable keywords that I can set up
email filters on.
e.g. dragonboard-410c * boot - fail: 39
might become,
dragonboard-410c: boot: fail
(Is 39 the number of times a boot was attempted?)
Yes, 39 attempts. A regular run has about 20 separate lava runs per board, and qa-reports automatically re-submits some types of failed jobs.
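To make the first proposal concrete, here is a rough sketch of the conversion in Python. It assumes the report keeps its current shape (a board name on its own line, followed by result lines like "- boot - fail: 39"). The function name and the regular expression are hypothetical and are not part of qa-reports or SQUAD.

```python
import re

# Hypothetical pattern for a result line such as "- boot - fail: 39"
# (or "* boot - fail: 39"). This is an assumption about the report
# layout, not something the report generator guarantees.
RESULT_LINE = re.compile(
    r"^[-*]\s*(?P<suite>[\w-]+)\s*-\s*(?P<status>\w+):\s*(?P<count>\d+)$"
)


def to_keywords(report_lines):
    """Turn per-board statistics into 'board: suite: status' keywords."""
    keywords = []
    board = None
    for raw in report_lines:
        line = raw.strip()
        if not line:
            continue
        match = RESULT_LINE.match(line)
        if match is None:
            # Anything that is not a result line is treated as a board name.
            board = line
        elif board is not None:
            keywords.append(f"{board}: {match['suite']}: {match['status']}")
    return keywords


if __name__ == "__main__":
    for keyword in to_keywords(["dragonboard-410c", "- boot - fail: 39"]):
        print(keyword)
```

The attempt count is dropped on purpose, so that the resulting string stays identical from one report to the next and a plain-text mail filter can match on it.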
- Change the subject on this email to reflect that regressions still
exist but nothing changed from the previous build. If possible, point to the last build where this failure was NOT present.
We originally did just that, but it was removed from the LKFT template because other people did not like it. Really, the only way to solve this is with per-user email settings.
- Add some easy to search keywords in the email instead of resorting
to pattern matching. This removes the need for proposal 1.
I see from a previous report the following summary, so lkft did warn us but it got lost in the noise. Most times we'd like to only know if db410c fails. How can we achieve that? Perhaps adding a unique keyword e.g. DB410CBOOTFAIL somewhere in the email? Doesn't need to be in your nicely formatted summary. I can then safely ignore all "regression found" emails that don't have the keyword.
That's a good suggestion. It would be even better if you only received emails with db410c regressions.
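On the receiving side, a keyword filter could look roughly like the sketch below. It assumes the report body contains a keyword such as DB410CBOOTFAIL, which the reports do not emit today. The function name is made up for illustration; only the Python standard library `email` module is used.

```python
from email import message_from_string

# Hypothetical keyword taken from the suggestion above; no current
# report contains it.
KEYWORD = "DB410CBOOTFAIL"


def wants_attention(raw_email, keyword=KEYWORD):
    """Return True if any text/plain part of the email contains the keyword."""
    msg = message_from_string(raw_email)
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset name in the header: fall back to UTF-8.
            text = payload.decode("utf-8", errors="replace")
        if keyword in text:
            return True
    return False


if __name__ == "__main__":
    sample = "Subject: regressions found\n\ndragonboard-410c boot\nDB410CBOOTFAIL\n"
    print(wants_attention(sample))
```

Anything for which this returns False could be filed away unread, which is the behaviour asked for above.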
We've failed when we successfully detect a problem, but nobody notices due to the signal:noise ratio. We have to fix that.
This particular boot problem impacted multiple arm64 boards, so we (the LKFT triage team) didn't consider it a Qualcomm landing team issue and did not inform you about our findings (our mistake). Meanwhile, you were working to find the root cause while we already knew it.
Next time, we should think to CC you on our investigations related to db410c, and you should also consider asking us about things you notice either in #linaro-lkft or lkft-triage@lists.linaro.org. A lot of this is just learning to work together. We try to stay on top of these results and usually have done a preliminary investigation within a day or two of any regressions.
Regressions (compared to build v4.14.26)
dragonboard-410c:
  boot:
    * dragonboard-410c
    * test src: not informed
hi6220-hikey - arm64:
  boot:
    * hi6220-hikey
    * test src: not informed