On 8 August 2018 at 07:07, Rafael David Tinoco <rafael.tinoco@linaro.org> wrote:
Naresh,
This is not Greg's last push, but I thought it would be good to do an initial check before the final results so that, if we face the same issues, you could have something by tomorrow.
On Tue, Aug 07, 2018 at 10:26:16PM +0000, Linaro QA wrote:
Summary
kernel: 4.4.147-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.4.y
git commit: db3e08ea00d093d99d1ccbd3cf069db67967de00
git describe: v4.4.146-12-gdb3e08ea00d0
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/build/v4.4.146-12-...
Regressions (compared to build v4.4.146)
qemu_arm:
  ltp-fs-tests:
    * read_all_dev
    * runltp_fs
    * test src: git://github.com/linux-test-project/ltp.git
read_all_dev:
tst_test.c:1015: INFO: Timeout per run is 0h 15m 00s
read_all.c:321: BROK: queue_push(workers[j].q, path) timed out
This test basically walks each mounted filesystem and decides, based on an ignore list, whether it should try to read it or not. If it is supposed to read the directory contents, it schedules a read callback (which reads just a chunk of each file) on a thread from a thread pool created at the beginning of the test (one thread per CPU).
This was a timeout, and the stack frame was in queue_push most likely because that is where the code spends most of its time. The timeout could have happened because trying to read some /dev/XXX node (to be discovered) caused one thread from the pool not to return from its read() attempt on the device (char/block).
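For illustration, the scheduling pattern described above can be sketched roughly as follows. This is not the LTP read_all.c code: it uses a single bounded queue shared by all workers rather than one queue per worker, and the names read_chunk, push, read_all and push_timeout are made up for the example. It only shows how workers that stop draining the queue (for instance because one is stuck in read()) make the producer's push time out.

```python
import queue
import threading


def read_chunk(path, chunk_size=1024):
    """Read at most chunk_size bytes; return the byte count, or None on error."""
    try:
        with open(path, "rb") as f:
            return len(f.read(chunk_size))
    except OSError:
        return None


def worker(q, results):
    """Pop paths until the None sentinel arrives; a blocked read() stalls this loop."""
    while True:
        path = q.get()
        if path is None:
            break
        results.append((path, read_chunk(path)))


def push(q, item, timeout):
    """Bounded push that reports a stalled consumer as a timeout."""
    try:
        q.put(item, timeout=timeout)
    except queue.Full:
        raise TimeoutError(f"queue push of {item!r} timed out") from None


def read_all(paths, nworkers=2, queue_size=4, push_timeout=1.0):
    """Hypothetical driver: feed paths to a small pool of reader threads."""
    q = queue.Queue(maxsize=queue_size)
    results = []
    threads = [
        threading.Thread(target=worker, args=(q, results), daemon=True)
        for _ in range(nworkers)
    ]
    for t in threads:
        t.start()
    for path in paths:
        push(q, path, push_timeout)
    for _ in threads:
        push(q, None, push_timeout)  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

Calling read_all() with nworkers=0 mimics a pool in which every worker is stuck: the queue fills up and the next push raises TimeoutError.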
If it continues in the next push, we should take a deeper look.
This is a new LTP test case and this is not a new failure. On the latest push this test case got PASS. https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/tests/ltp-fs-tests...
qemu_x86_64:
  ltp-syscalls-tests:
    * inotify06
    * runltp_syscalls
    * test src: git://github.com/linux-test-project/ltp.git
Test case history shows it failed only once and passed on the latest push: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/tests/ltp-syscalls...
safe_macros.c:225: BROK: inotify06.c:76: open(fname_1,66,0600) failed: ENOENT
tst_test.c:1015: INFO: Timeout per run is 0h 30m 00s
inotify06.c:87: BROK: inotify_init failed: EMFILE
OTOH, it failed only on qemu_x86_64 and passed on real hardware. We have to investigate why it failed only on qemu.
- EMFILE: The user limit on the total number of inotify instances has been reached.
- EMFILE: The per-process limit on the number of open file descriptors has been reached.
- ENFILE: The system-wide limit on the total number of open files has been reached.
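Each of the errors listed above maps to a different limit, so it can help to dump those limits on the failing target. The sketch below is only an illustration: the helper names read_int and fd_limits are made up, and it assumes a Linux system with procfs mounted at /proc. The max_user_watches value is included as well because inotify_add_watch() reports ENOSPC when that limit is reached.

```python
import os
import resource


def read_int(path):
    """Return the integer stored in a procfs-style file, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def fd_limits(proc_sys_fs="/proc/sys/fs"):
    """Collect the limits behind EMFILE/ENFILE (hypothetical helper)."""
    inotify_dir = os.path.join(proc_sys_fs, "inotify")
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return {
        # EMFILE from inotify_init(): per-user inotify instance limit
        "inotify_max_user_instances": read_int(
            os.path.join(inotify_dir, "max_user_instances")),
        # ENOSPC from inotify_add_watch(): per-user watch limit
        "inotify_max_user_watches": read_int(
            os.path.join(inotify_dir, "max_user_watches")),
        # EMFILE from open(): per-process open file descriptor limit
        "nofile_soft": soft,
        "nofile_hard": hard,
        # ENFILE: system-wide open file limit
        "file_max": read_int(os.path.join(proc_sys_fs, "file-max")),
    }


if __name__ == "__main__":
    for name, value in fd_limits().items():
        print(f"{name}: {value}")
```

Comparing this output between the qemu_x86_64 target and real hardware would show whether the two simply run with different limits.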
This test exhausts inotify file descriptors and watchers by creating groups of inotify file descriptors for 5 files and adding watchers to them. At the same time, one child task keeps opening (O_CREAT), closing and unlinking the same files that are being monitored by the inotify logic.
This test targets an old bug from 4.2 kernels, trying to cause a kernel crash through a race condition. In our case it is likely that the test has stressed the logic of opening/closing files (and adding watchers to them). If the watcher logic got stuck and close() could not clean up the file descriptor reference, then we could hit a file descriptor limit (thus the errors above). There is also a chance of the inotify watcher logic (when adding) causing open() to misbehave (returning ENOENT when the O_CREAT flag is given):
ENOENT: O_CREAT is not set (not our case) and the named file does not exist. Or, a directory component in pathname does not exist or is a dangling symbolic link.
ENOENT pathname refers to a nonexistent directory, O_TMPFILE and one of O_WRONLY or O_RDWR were specified in flags, but this kernel version does not provide the O_TMPFILE functionality.
Neither of those two seems to apply to this test.
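As a cross-check that O_CREAT really was passed, the numeric flags value in the failing call, open(fname_1,66,0600), can be decoded. The sketch below hardcodes the generic Linux flag values (as used on x86 and arm); the tables and the decode_open_flags name are only for this example and cover just the most common flags.

```python
# Generic Linux open(2) flag values (octal), hardcoded for illustration.
O_ACCMODE = 0o3
ACCESS_MODES = {0o0: "O_RDONLY", 0o1: "O_WRONLY", 0o2: "O_RDWR"}
FLAG_BITS = [
    (0o100, "O_CREAT"),
    (0o200, "O_EXCL"),
    (0o400, "O_NOCTTY"),
    (0o1000, "O_TRUNC"),
    (0o2000, "O_APPEND"),
    (0o4000, "O_NONBLOCK"),
]


def decode_open_flags(flags):
    """Split a numeric open() flags value into symbolic names."""
    names = [ACCESS_MODES.get(flags & O_ACCMODE, "O_ACCMODE(3)")]
    rest = flags & ~O_ACCMODE
    for bit, name in FLAG_BITS:
        if rest & bit:
            names.append(name)
            rest &= ~bit
    if rest:
        names.append(hex(rest))  # bits not covered by the table above
    return names


print(decode_open_flags(66))  # ['O_RDWR', 'O_CREAT']
```

So 66 is O_RDWR | O_CREAT, which matches the reading that the "O_CREAT is not set" cause of ENOENT does not apply here.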
@Naresh, mind opening a bug and pasting this small comment on this issue? (Reminder: we have many inotify/fanotify patches missing from v4.4 and, likely, if this is something hard to fix it will be a "won't fix".) My fear is that this is something related to open() and inotify watchers. Checking whether this test has ever failed before would also be beneficial.
This was not a failure earlier. However, I have re-submitted the test job to run inotify06 in a loop for 50 iterations. If it is reproducible, I will report a bug. https://lkft.validation.linaro.org/scheduler/job/360271
x15 - arm:
  ltp-syscalls-tests:
    * poll02
    * runltp_syscalls
    * test src: git://github.com/linux-test-project/ltp.git
tst_timer_test.c:318: INFO: min 1004586us, max 1005156us, median 1004586us, trunc mean 1004586.00us (discarded 1)
tst_timer_test.c:321: FAIL: poll() slept for too long
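To get comparable numbers outside LTP, the kind of statistics printed above can be reproduced with a small script. This is only a sketch: measure_poll_sleeps and summarize are made-up names, dropping the largest `discard` samples is an assumption about how the truncated mean is formed, and the LTP pass/fail threshold is not reproduced here because the log does not show it.

```python
import select
import statistics
import time


def measure_poll_sleeps(timeout_ms, nsamples):
    """Time poll() calls with no registered fds, which just sleep for the timeout."""
    poller = select.poll()
    samples_us = []
    for _ in range(nsamples):
        start = time.monotonic_ns()
        poller.poll(timeout_ms)
        samples_us.append((time.monotonic_ns() - start) // 1000)
    return samples_us


def summarize(samples_us, discard=1):
    """Return min, max, median and a mean that ignores the largest samples."""
    if len(samples_us) <= discard:
        raise ValueError("need more samples than are discarded")
    ordered = sorted(samples_us)
    kept = ordered[:len(ordered) - discard] if discard else ordered
    return {
        "min": ordered[0],
        "max": ordered[-1],
        "median": statistics.median(ordered),
        "trunc_mean": statistics.mean(kept),
    }


if __name__ == "__main__":
    # 20 polls of 10 ms each; the overshoot is what a slow clock source inflates.
    print(summarize(measure_poll_sleeps(10, 20)))
```

Running it on both the X15 and a QEMU guest gives a rough idea of how much each one oversleeps per call.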
I think this is the one (this was hard to find on the output). I explained the issue with this in another situation:
For QEMU: https://bugs.linaro.org/show_bug.cgi?id=3852#c4 For X15: https://bugs.linaro.org/show_bug.cgi?id=3852#c8
Having a paravirtualized clock source might help with this one as well (https://projects.linaro.org/browse/KV-114).
I have re-submitted the job to test poll02 for 50 iterations. https://lkft.validation.linaro.org/scheduler/job/360274
PASSED on the latest push. https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/tests/ltp-syscalls...
Thank you.
- Naresh
Thanks o/
Rafael