On Wed, Oct 25, 2017 at 9:05 PM, Tom Gall tom.gall@linaro.org wrote:
On Oct 25, 2017, at 5:31 AM, Greg KH gregkh@google.com wrote:
On Tue, Oct 24, 2017 at 08:18:48AM +0100, Milosz Wasilewski wrote:
On 24 October 2017 at 08:14, Linaro QA qa-reports@linaro.org wrote:
This is the first time this test has ever failed. Here is the log:
perf_event_open02    0  TINFO  :  overall task clock: 42794146
perf_event_open02    0  TINFO  :  hw sum: 903650031, task clock sum: 128422221
perf_event_open02    0  TINFO  :  ratio: 3.000930
perf_event_open02    1  TFAIL  :  perf_event_open02.c:333: test failed (ratio was greater than )
Source here: https://github.com/linux-test-project/ltp/blob/20170929/testcases/kernel/sys...
So what does this mean? Did a patch in this release break something? If so, what patch, and why isn't it showing up in 4.13 stable and 4.14-rc?
Looking at mainline historical data (which includes 4.13 if you scroll down far enough): https://qa-reports.linaro.org/lkft/linux-mainline-oe/tests/ltp-syscalls-test...
Seems this one occasionally fails.
Looking at 4.9: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.9-oe/tests/ltp-syscalls...
Only one failure in the history recorded thus far.
In summary, I don’t think we’re looking at a regression. I do think we’re looking at something worth a deeper look, though, especially given it’s failing across architectures. How reproducible that might be… that’ll be the fun part.
I looked at the source code and we are not hitting the failure that this is testing for, as that should be 100% reproducible. The test tries to see if the time measured by the performance counters matches the time as observed by the task timers.
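For illustration, here is a minimal standalone sketch (not the LTP test itself) of that kind of comparison: it counts PERF_COUNT_SW_TASK_CLOCK over a busy loop and checks the result against the task timer from clock_gettime(CLOCK_THREAD_CPUTIME_ID). The loop length and the 1% tolerance are arbitrary values chosen for the sketch:

/* Sketch only: compare a perf task-clock counter against the task timer.
 * Loop length and tolerance are arbitrary illustration values. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
			    int cpu, int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
	struct perf_event_attr attr;
	struct timespec start, end;
	uint64_t task_clock, timer_ns;
	volatile unsigned long i;
	double ratio;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_SOFTWARE;
	attr.size = sizeof(attr);
	attr.config = PERF_COUNT_SW_TASK_CLOCK;
	attr.disabled = 1;

	fd = perf_event_open(&attr, 0, -1, -1, 0); /* current task, any CPU */
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	clock_gettime(CLOCK_THREAD_CPUTIME_ID, &start);
	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
	for (i = 0; i < 100000000UL; i++) /* burn some CPU time */
		;
	ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
	clock_gettime(CLOCK_THREAD_CPUTIME_ID, &end);

	if (read(fd, &task_clock, sizeof(task_clock)) != sizeof(task_clock)) {
		perror("read");
		return 1;
	}

	timer_ns = (end.tv_sec - start.tv_sec) * 1000000000ULL
		 + (end.tv_nsec - start.tv_nsec);
	ratio = (double)task_clock / timer_ns;

	printf("perf task clock: %llu ns, task timer: %llu ns, ratio: %f\n",
	       (unsigned long long)task_clock,
	       (unsigned long long)timer_ns, ratio);

	/* for a single counter the two clocks should agree closely */
	return ratio > 1.01 || ratio < 0.99;
}

For a single counter the two clocks should agree closely; the actual test does the same kind of comparison across a whole group of hardware counters that have to be multiplexed.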
The ratio reported above is 3.000930; the successful case would be 'n', while the failure case would be 'n + 4', where 'n' is the number of performance counter registers in the CPU (which I don't know for hikey). We report failure for anything larger than 'n + 0.0001', so presumably 'n' is 3 here and we differ from it by 0.00093, which is significantly outside the expected range.
This could be the result of:

a) hardware specifics in the performance counters,
b) problems in the scheduler (e.g. with EAS),
c) problems with the timekeeping,
d) statistical inaccuracies, or
e) fundamental problems with the test case.
It would be helpful to run the perf_event_open02 test in verbose mode in a loop to get a little more statistical information (maximum, average, standard deviation) for that ratio number, to figure out whether we need to look for an actual bug in one of the areas above or whether we should just change the test case to allow a somewhat wider margin of error. This is probably low priority compared to the other failures we saw, so we could also ignore the result for now.
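As a rough starting point, something like the sketch below would do it. The './perf_event_open02 -v' invocation and the 100-run sample count are assumptions about the local setup, so adjust as needed; it scrapes the 'ratio:' TINFO line from the test output and prints max/average/standard deviation (compile with -lm):

/* Sketch: run perf_event_open02 repeatedly, scrape the "ratio:" TINFO
 * line from its output, and print max/average/standard deviation. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

#define RUNS 100			   /* arbitrary sample count */
#define CMD  "./perf_event_open02 -v 2>&1" /* assumed test invocation */

int main(void)
{
	double ratio[RUNS], sum = 0.0, max = 0.0, avg, var = 0.0;
	char line[512];
	int n = 0;

	for (int run = 0; run < RUNS; run++) {
		FILE *p = popen(CMD, "r");

		if (!p) {
			perror("popen");
			return 1;
		}
		while (fgets(line, sizeof(line), p)) {
			char *s = strstr(line, "ratio: ");

			if (s && n < RUNS)
				ratio[n++] = strtod(s + strlen("ratio: "), NULL);
		}
		pclose(p);
	}

	if (n == 0) {
		fprintf(stderr, "no ratio lines found\n");
		return 1;
	}

	for (int i = 0; i < n; i++) {
		sum += ratio[i];
		if (ratio[i] > max)
			max = ratio[i];
	}
	avg = sum / n;
	for (int i = 0; i < n; i++)
		var += (ratio[i] - avg) * (ratio[i] - avg);

	printf("%d samples: max %f, avg %f, stddev %f\n",
	       n, max, avg, sqrt(var / n));
	return 0;
}

If the average stays near 'n' but the maximum occasionally jumps well past the threshold, that would point more towards scheduling or statistical noise than a systematic bug.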
Arnd