On Wed, Oct 25, 2017 at 9:05 PM, Tom Gall tom.gall@linaro.org wrote:
On Oct 25, 2017, at 5:31 AM, Greg KH gregkh@google.com wrote:
On Tue, Oct 24, 2017 at 08:18:48AM +0100, Milosz Wasilewski wrote:
On 24 October 2017 at 08:14, Linaro QA qa-reports@linaro.org wrote:
This is the first time this test has ever failed. Here is the log:
perf_event_open02    0  TINFO  :  overall task clock: 42794146
perf_event_open02    0  TINFO  :  hw sum: 903650031, task clock sum: 128422221
perf_event_open02    0  TINFO  :  ratio: 3.000930
perf_event_open02    1  TFAIL  :  perf_event_open02.c:333: test failed (ratio was greater than )
Source here: https://github.com/linux-test-project/ltp/blob/20170929/testcases/kernel/sys...
So what does this mean? Did a patch in this release break something? If so, what patch, and why isn't it showing up in 4.13 stable and 4.14-rc?
Looking at mainline historical data (which includes 4.13 if you scroll down far enough): https://qa-reports.linaro.org/lkft/linux-mainline-oe/tests/ltp-syscalls-test...
Seems this one occasionally fails.
Looking at 4.9: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/tests/ltp-syscalls...
Only one failure in the history recorded thus far.
In summary, I don’t think we’re looking at a regression. I do think we’re looking at something worth a deeper look, though, especially given it’s failing across architectures. How reproducible that might be… that’ll be the fun part.
I looked at the source code and we are not hitting the failure that this is testing for, as that should be 100% reproducible. The test tries to see if the time measured by the performance counters matches the time as observed by the task timers.
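For illustration, here is a minimal standalone sketch (not the LTP test itself) of that kind of comparison: it counts PERF_COUNT_SW_TASK_CLOCK over a busy loop and checks the result against the task timer from clock_gettime(CLOCK_THREAD_CPUTIME_ID). The loop length and the 1% tolerance are arbitrary values chosen for the sketch:

/* Sketch only: compare a perf task-clock counter against the task timer.
 * Loop length and tolerance are arbitrary illustration values. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
			    int cpu, int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
	struct perf_event_attr attr;
	struct timespec start, end;
	uint64_t task_clock, timer_ns;
	volatile unsigned long i;
	double ratio;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_SOFTWARE;
	attr.size = sizeof(attr);
	attr.config = PERF_COUNT_SW_TASK_CLOCK;
	attr.disabled = 1;

	fd = perf_event_open(&attr, 0, -1, -1, 0); /* current task, any CPU */
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start);
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
	for (i = 0; i < 100000000UL; i++) /* burn some CPU time */
		;
	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
	clock_gettime(CLOCK_THREAD_CPUTIME_ID, &end);

	if (read(fd, &task_clock, sizeof(task_clock)) != sizeof(task_clock)) {
		perror("read");
		return 1;
	}

	timer_ns = (end.tv_sec - start.tv_sec) * 1000000000ULL
		 + (end.tv_nsec - start.tv_nsec);
	ratio = (double)task_clock / timer_ns;

	printf("perf task clock: %llu ns, task timer: %llu ns, ratio: %f\n",
	       (unsigned long long)task_clock,
	       (unsigned long long)timer_ns, ratio);

	/* for a single counter the two clocks should agree closely */
	return ratio > 1.01 || ratio < 0.99;
}

For a single counter the two clocks should agree closely; the actual test does the same kind of comparison across a whole group of hardware counters that have to be multiplexed.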
The ratio reported above is 3.000930; the successful case would be 'n', while the failure case would be 'n + 4', where 'n' is the number of performance counter registers in the CPU (which I don't know for hikey). We report failure for anything larger than 'n + 0.0001', so presumably 'n' is 3 here and we differ from it by 0.00093, which is significantly outside the expected range.
This could be the result of:

a) hardware specifics in the performance counters,
b) problems in the scheduler (e.g. with EAS),
c) problems with the timekeeping,
d) statistical inaccuracies, or
e) fundamental problems with the test case.
It would be helpful to run the perf_event_open02 test in verbose mode in a loop to get a little more statistical information (maximum, average, standard deviation) for that ratio number, to figure out whether we need to look for an actual bug in one of the areas above or whether we should just change the test case to allow a somewhat wider margin of error. This is probably low priority compared to the other failures we saw, so we could also ignore the result for now.
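As a rough starting point, something like the sketch below would do it. The './perf_event_open02 -v' invocation and the 100-run sample count are assumptions about the local setup, so adjust as needed; it scrapes the 'ratio:' TINFO line from the test output and prints max/average/standard deviation (compile with -lm):

/* Sketch: run perf_event_open02 repeatedly, scrape the "ratio:" TINFO
 * line from its output, and print max/average/standard deviation. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

#define RUNS 100			   /* arbitrary sample count */
#define CMD  "./perf_event_open02 -v 2>&1" /* assumed test invocation */

int main(void)
{
	double ratio[RUNS], sum = 0.0, max = 0.0, avg, var = 0.0;
	char line[512];
	int n = 0;

	for (int run = 0; run < RUNS; run++) {
		FILE *p = popen(CMD, "r");

		if (!p) {
			perror("popen");
			return 1;
		}
		while (fgets(line, sizeof(line), p)) {
			char *s = strstr(line, "ratio: ");

			if (s && n < RUNS)
				ratio[n++] = strtod(s + strlen("ratio: "), NULL);
		}
		pclose(p);
	}

	if (n == 0) {
		fprintf(stderr, "no ratio lines found\n");
		return 1;
	}

	for (int i = 0; i < n; i++) {
		sum += ratio[i];
		if (ratio[i] > max)
			max = ratio[i];
	}
	avg = sum / n;
	for (int i = 0; i < n; i++)
		var += (ratio[i] - avg) * (ratio[i] - avg);

	printf("%d samples: max %f, avg %f, stddev %f\n",
	       n, max, avg, sqrt(var / n));
	return 0;
}

If the average stays near 'n' but the maximum occasionally jumps well past the threshold, that would point more towards scheduling or statistical noise than a systematic bug.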
Arnd