Re: stable-rc 4.4.147-rc1/db3e08ea00d0: regressions detected in project stable v4.4.y on OE

8 Aug 2018

On 8 August 2018 at 07:07, Rafael David Tinoco <rafael.tinoco@linaro.org> wrote:
...
Naresh,
This is not Greg's final push, but I thought it would be good to do
an initial check before the final results so that, if you face the
same issues, you have something by tomorrow.
On Tue, Aug 07, 2018 at 10:26:16PM +0000, Linaro QA wrote:
...
Summary
kernel: 4.4.147-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.4.y
git commit: db3e08ea00d093d99d1ccbd3cf069db67967de00
git describe: v4.4.146-12-gdb3e08ea00d0
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.146-12-...
Regressions (compared to build v4.4.146)
qemu_arm:
  ltp-fs-tests:
    * read_all_dev
    * runltp_fs
* test src: git://github.com/linux-test-project/ltp.git

read_all_dev:
tst_test.c:1015: INFO: Timeout per run is 0h 15m 00s
read_all.c:321: BROK: queue_push(workers[j].q, path) timed out
This test basically walks each mounted filesystem and checks whether
it should try to read it or not (ignore list). If it should read the
directory contents, it schedules a read callback function (which reads
just a chunk of each file) on a thread from a thread pool created at
the beginning of the test (1 per CPU).
This was a timeout (the stack frame was in queue_push, most likely
because that is where the code spends most of its time). The timeout
could have happened because reading some /dev/XXX (still to be
identified) caused one thread from the thread pool not to return from
the read() attempt on the device (char/block).
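To illustrate how one stuck worker turns into a "queue_push(...) timed out" on the producer side, here is a minimal sketch of a bounded per-worker queue with a deadline on push. It is hypothetical: `struct work_queue`, `wq_push_timed()` and `wq_pop()` are illustrative names, not LTP's read_all.c code, and the real queue may be implemented differently. The point is only that if the worker draining a queue never comes back from read(), that queue stays full and the next push fails once its deadline passes.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <string.h>
#include <time.h>

#define WQ_MAX 8   /* hypothetical upper bound on queue capacity */

/* Hypothetical bounded work queue (one per worker thread). */
struct work_queue {
    pthread_mutex_t lock;
    pthread_cond_t not_full;
    pthread_cond_t not_empty;
    const char *items[WQ_MAX];
    size_t cap, head, count;
};

void wq_init(struct work_queue *q, size_t cap)
{
    memset(q, 0, sizeof(*q));
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_full, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    q->cap = (cap == 0 || cap > WQ_MAX) ? WQ_MAX : cap;
}

/* Producer: enqueue a path, waiting at most timeout_ms for a free slot.
 * Returns 0 on success or ETIMEDOUT if the queue stayed full, which is
 * what happens when the consumer is blocked (e.g. in read() on a device). */
int wq_push_timed(struct work_queue *q, const char *path, long timeout_ms)
{
    struct timespec deadline;

    /* pthread_cond_timedwait() uses CLOCK_REALTIME by default. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&q->lock);
    while (q->count == q->cap) {
        int rc = pthread_cond_timedwait(&q->not_full, &q->lock, &deadline);
        if (rc != 0 && q->count == q->cap) {
            pthread_mutex_unlock(&q->lock);
            return rc;   /* ETIMEDOUT in the stuck-worker case */
        }
    }
    q->items[(q->head + q->count) % q->cap] = path;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Consumer: a worker blocks here until a path is available. */
const char *wq_pop(struct work_queue *q)
{
    const char *path;

    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    path = q->items[q->head];
    q->head = (q->head + 1) % q->cap;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return path;
}
```

With no consumer running, a queue of capacity 2 accepts two paths and the third push returns ETIMEDOUT, which mirrors a worker that never returned from read().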
If it continues in the next push, we should take a deeper look.
This is a new LTP test case, and this is not a new failure.
On the latest push this test case passed.
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/tests/ltp-fs-tests...
...
...
qemu_x86_64:
  ltp-syscalls-tests:
    * inotify06
    * runltp_syscalls
* test src: git://github.com/linux-test-project/ltp.git

Test case history shows it failed only once and passed on the latest push:
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/tests/ltp-syscalls...
...
safe_macros.c:225: BROK: inotify06.c:76: open(fname_1,66,0600) failed: ENOENT
tst_test.c:1015: INFO: Timeout per run is 0h 30m 00s
inotify06.c:87: BROK: inotify_init failed: EMFILE
OTOH, it failed only on qemu_x86_64 and passed on real hardware.
We have to investigate why it failed only on QEMU.
...

EMFILE: The user limit on the total number of inotify instances has been reached.
EMFILE: The per-process limit on the number of open file descriptors has been reached.
ENFILE: The system-wide limit on the total number of open files has been reached.
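Because the per-user inotify instance limit and the per-process file descriptor limit both surface as EMFILE, one way to start on "why only on QEMU" is to compare the relevant limits on the QEMU guest and on real hardware. The sketch below reads them; the procfs paths and RLIMIT_NOFILE are standard Linux interfaces, while `get_inotify_limits()` and `struct inotify_limits` are hypothetical helper names for illustration.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Read a single integer from a procfs sysctl file; -1 on error. */
static long read_proc_long(const char *path)
{
    FILE *f = fopen(path, "r");
    long v = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%ld", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}

struct inotify_limits {
    long max_user_instances;  /* per user: inotify_init() fails with EMFILE */
    long max_user_watches;    /* per user: inotify_add_watch() fails with ENOSPC */
    long nofile_soft;         /* per process fd limit (EMFILE); -1 if unlimited */
};

/* Hypothetical helper: collect the limits that can make inotify06 fail. */
int get_inotify_limits(struct inotify_limits *out)
{
    struct rlimit rl;

    out->max_user_instances =
        read_proc_long("/proc/sys/fs/inotify/max_user_instances");
    out->max_user_watches =
        read_proc_long("/proc/sys/fs/inotify/max_user_watches");
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    out->nofile_soft =
        rl.rlim_cur == RLIM_INFINITY ? -1 : (long)rl.rlim_cur;
    return 0;
}
```

If the guest shows noticeably lower values than the hardware boards, that would point at configuration rather than a kernel bug; identical values would keep the kernel-side hypothesis below open.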

This test exhausts inotify file descriptors and watchers by creating
groups of inotify file descriptors for 5 files and adding watchers to
them. At the same time, one child task keeps opening (O_CREAT), closing
and unlinking the same files that are being monitored by the inotify
logic. The test targets an old bug in 4.2 kernels, trying to trigger a
kernel crash through a race condition. In our case it is likely that
the test stressed the logic of opening/closing files (and adding
watchers to them). If the watcher logic got stuck and close() could
not clean up the file descriptor reference, then we could hit a file
descriptor limit (hence the errors above). There is also a chance that
the inotify watcher logic (when adding) caused open() to misbehave
(returning ENOENT even though the O_CREAT flag is given):
ENOENT: O_CREAT is not set (not our case) and the named file does not
exist. Or, a directory component in pathname does not exist or is a
dangling symbolic link.
ENOENT: pathname refers to a nonexistent directory, O_TMPFILE and one of
O_WRONLY or O_RDWR were specified in flags, but this kernel version
does not provide the O_TMPFILE functionality.
Neither of those seems to apply to this test.
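For reference, the flags value 66 in the log decodes to O_CREAT | O_RDWR on x86_64 (O_CREAT is 0100 octal, i.e. 64, and O_RDWR is 2), so O_CREAT was indeed set. The sketch below is a simplified, hypothetical model of the race described above, not the LTP inotify06 source: it churns a single file (the real test uses 5) in a child while the parent repeatedly creates an inotify instance, adds a watch and closes it. `inotify_race()` and `churn_file()` are illustrative names. On a healthy kernel it should complete with neither ENOENT from open() nor EMFILE from inotify_init1().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/inotify.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child side: create, close and unlink the watched path in a loop.
 * An ENOENT from this open() is the anomaly reported in the log. */
static int churn_file(const char *path, int iterations)
{
    for (int i = 0; i < iterations; i++) {
        int fd = open(path, O_CREAT | O_RDWR, 0600);
        if (fd < 0)
            return errno;
        close(fd);
        unlink(path);
    }
    return 0;
}

/* Parent side: create inotify instances and watches on the same path
 * while the child churns it. Returns 0 on success, an errno value if
 * fork() or inotify_init1() fails (e.g. EMFILE), or -1 if the child's
 * open() failed. */
int inotify_race(const char *path, int iterations)
{
    int rc = 0, status;
    pid_t pid = fork();

    if (pid < 0)
        return errno;
    if (pid == 0)
        _exit(churn_file(path, iterations) ? 1 : 0);

    for (int i = 0; i < iterations; i++) {
        int ifd = inotify_init1(IN_CLOEXEC);
        if (ifd < 0) {
            rc = errno;
            break;
        }
        /* May fail with ENOENT if the child just unlinked the file;
         * that is expected in this race and deliberately ignored. */
        inotify_add_watch(ifd, path, IN_ALL_EVENTS);
        /* Per the hypothesis above, if close() failed to release the
         * instance, leaked instances would eventually cause EMFILE. */
        close(ifd);
    }

    if (waitpid(pid, &status, 0) < 0)
        return errno;
    if (rc == 0 && !(WIFEXITED(status) && WEXITSTATUS(status) == 0))
        rc = -1;
    unlink(path);
    return rc;
}
```

Running something like this in a loop on both the QEMU guest and hardware could help separate a guest-specific limit problem from a kernel race, though it exercises far less than the real test.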
@Naresh, mind opening a bug and pasting this small comment on this
issue? (Reminder: we have many missing inotify/fanotify patches for
v4.4 and, if this turns out to be hard to fix, it will likely be a
won't fix.) My fear is that this is something related to open() and
inotify watchers. Checking whether this test has ever failed before
would also be beneficial.
This was not a failure earlier.
However, I re-submitted the test job to run inotify06 in a loop for 50
iterations. If this is reproducible, I will report a bug.
https://lkft.validation.linaro.org/scheduler/job/360271
...
...
x15 - arm:
  ltp-syscalls-tests:
    * poll02
    * runltp_syscalls
* test src: git://github.com/linux-test-project/ltp.git

tst_timer_test.c:318: INFO: min 1004586us, max 1005156us, median 1004586us, trunc mean 1004586.00us (discarded 1)
tst_timer_test.c:321: FAIL: poll() slept for too long
I think this is the one (this was hard to find in the output). I explained
the issue with this in another situation:
For QEMU:
    https://bugs.linaro.org/show_bug.cgi?id=3852#c4
For X15:
    https://bugs.linaro.org/show_bug.cgi?id=3852#c8
A paravirtualized clock source might help with this one as well
(https://projects.linaro.org/browse/KV-114).
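The log line above summarises repeated poll() sleeps (min, max, median, truncated mean) and FAILs when the measured sleep exceeds the allowed slack. The sketch below reproduces only the measuring idea so the oversleep can be compared between QEMU, the X15 and other boards. It is an assumption-laden illustration: `measure_poll_sleep()`, `oversleep_us()` and the "drop the largest samples" truncation rule are hypothetical and not taken from LTP's tst_timer_test.c, and the pass/fail threshold is left to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <stdlib.h>
#include <time.h>

static long long now_us(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000LL + ts.tv_nsec / 1000;
}

static int cmp_ll(const void *a, const void *b)
{
    long long x = *(const long long *)a, y = *(const long long *)b;

    return (x > y) - (x < y);
}

struct sleep_stats {
    long long min_us, max_us, median_us;  /* median: upper middle sample */
    double trunc_mean_us;                 /* mean without the `discard` largest */
};

/* Time n (1..64) calls of poll(NULL, 0, timeout_ms) and summarise them.
 * Returns 0 on success, -1 on invalid arguments. */
int measure_poll_sleep(int timeout_ms, int n, int discard, struct sleep_stats *st)
{
    long long s[64], sum = 0;

    if (n < 1 || n > 64 || discard < 0 || discard >= n)
        return -1;
    for (int i = 0; i < n; i++) {
        long long t0 = now_us();
        poll(NULL, 0, timeout_ms);   /* no fds: a pure timed sleep */
        s[i] = now_us() - t0;
    }
    qsort(s, n, sizeof(s[0]), cmp_ll);
    st->min_us = s[0];
    st->max_us = s[n - 1];
    st->median_us = s[n / 2];
    for (int i = 0; i < n - discard; i++)
        sum += s[i];
    st->trunc_mean_us = (double)sum / (n - discard);
    return 0;
}

/* How much longer than requested a measured sleep took. */
long long oversleep_us(long long measured_us, int timeout_ms)
{
    return measured_us - (long long)timeout_ms * 1000;
}
```

For example, if the requested sleep had been 1000 ms, the min of 1004586 us in the log would correspond to oversleep_us(1004586, 1000), i.e. 4586 us; whether that exceeds the limit depends on the threshold the real test applies.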
I have re-submitted the job to test poll02 for 50 iterations.
https://lkft.validation.linaro.org/scheduler/job/360274
PASSED on the latest push.
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/tests/ltp-syscalls...
Thank you.
- Naresh
...
Thanks o/

Rafael
