read_all_dev:
tst_test.c:1015: INFO: Timeout per run is 0h 15m 00s
read_all.c:321: BROK: queue_push(workers[j].q, path) timed out
This test basically walks each mounted filesystem and checks whether it is supposed to try to read it or not (ignore list). If it is supposed to read the directory contents, it schedules a read callback (which reads just a chunk of each file) on a thread from a thread pool created at the beginning of the test (one worker per CPU).
This was a timeout (and the stack frame was in queue_push most likely because that is where the code spends most of its time). The timeout could have happened because trying to read some /dev/XXX (to be discovered) caused one thread from the thread pool not to return from its read() attempt on the device (char/block).
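Not the actual read_all.c code, but a minimal standalone sketch of that producer/worker pattern (all names, sizes and timeouts here are made up): the main thread pushes device paths into a one-slot per-worker "queue" with a push timeout, and a worker that never returns from read() backs the pushes up until the push gives up, which is the same shape as the BROK above. Build with gcc -pthread:

/* Not read_all.c: a hypothetical, simplified model of the pattern described
 * above. One worker thread, a one-slot "queue", and a push with a timeout.
 * If the worker blocks forever in read(), the slot never frees up and the
 * next queue_push() times out. Build with: gcc -pthread -o sketch sketch.c */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t slot_free = PTHREAD_COND_INITIALIZER;
static pthread_cond_t slot_full = PTHREAD_COND_INITIALIZER;
static const char *slot;                /* one pending path, NULL = empty */

/* Hand a path to the worker; give up if the slot stays full for 10s. */
static int queue_push(const char *path)
{
    struct timespec deadline;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 10;

    pthread_mutex_lock(&lock);
    while (slot) {
        if (pthread_cond_timedwait(&slot_free, &lock, &deadline)) {
            pthread_mutex_unlock(&lock);
            return -1;                  /* "queue_push(...) timed out" */
        }
    }
    slot = path;
    pthread_cond_signal(&slot_full);
    pthread_mutex_unlock(&lock);
    return 0;
}

/* Worker: read a small chunk of every pushed path. A read() that never
 * returns (e.g. a quiet char device) keeps this thread stuck here. */
static void *worker(void *arg)
{
    char buf[4096];

    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (!slot)
            pthread_cond_wait(&slot_full, &lock);
        const char *path = slot;
        slot = NULL;
        pthread_cond_signal(&slot_free);
        pthread_mutex_unlock(&lock);

        int fd = open(path, O_RDONLY);
        if (fd >= 0) {
            if (read(fd, buf, sizeof(buf)) < 0)   /* may block forever */
                perror(path);
            close(fd);
        }
    }
    return NULL;
}

int main(void)
{
    const char *paths[] = { "/dev/null", "/dev/hvc0", "/dev/zero", "/dev/urandom" };
    pthread_t tid;

    pthread_create(&tid, NULL, worker, NULL);
    for (size_t i = 0; i < sizeof(paths) / sizeof(paths[0]); i++) {
        if (queue_push(paths[i]))
            fprintf(stderr, "queue_push(%s) timed out\n", paths[i]);
    }
    return 0;
}

If /dev/hvc0 exists and its read() blocks, the worker gets stuck on it, the next path fills the slot, and the push after that reports the timeout after 10 seconds instead of the whole program hanging.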
If it continues in the next push, we should take a deeper look.
This is a new LTP test case and this is not a new failure. On the latest push this test case got a PASS. https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/tests/ltp-fs-tests...
qemu_x86_64:
  ltp-syscalls-tests:
    * inotify06
    * runltp_syscalls
* test src: git://github.com/linux-test-project/ltp.git
Test case history shows it failed only once and got a PASS on the latest push: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/tests/ltp-syscalls...
safe_macros.c:225: BROK: inotify06.c:76: open(fname_1,66,0600) failed: ENOENT
tst_test.c:1015: INFO: Timeout per run is 0h 30m 00s
inotify06.c:87: BROK: inotify_init failed: EMFILE
OTOH, it failed only on qemu_x86_64 and PASSed on real hardware. We have to investigate why it fails only on qemu.
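For reference only (this is not an analysis of inotify06.c itself): EMFILE from inotify_init() normally means the per-user inotify instance limit (/proc/sys/fs/inotify/max_user_instances) or the process file descriptor limit was exhausted. A tiny hypothetical example that provokes exactly that error by leaking instances:

/* Hypothetical illustration, not inotify06.c: inotify_init() fails with
 * EMFILE once /proc/sys/fs/inotify/max_user_instances (or the process fd
 * limit) is exhausted, which this loop provokes by never closing the fds. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/inotify.h>

int main(void)
{
    for (int i = 0; ; i++) {
        if (inotify_init() < 0) {
            int err = errno;    /* save before printf can touch errno */

            printf("inotify_init #%d failed: %s\n", i, strerror(err));
            return err == EMFILE ? 0 : 1;
        }
    }
}

That does not explain why it only shows up on qemu, but it points at fd/instance exhaustion (leaks or too-low limits) as one thing to check.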
Yes.
FS test => read_all_dev: read_all -d /dev -p -q -r 10
You can simulate this test by doing:
root@readall:/dev$ find . -not -type d -exec echo {} \; -exec dd if={} of=/dev/null bs=1M count=1 \;
./vcsa5
0+1 records in
0+1 records out
4004 bytes (4.0 kB, 3.9 KiB) copied, 0.00404404 s, 990 kB/s
./vcs5
0+1 records in
0+1 records out
2000 bytes (2.0 kB, 2.0 KiB) copied, 0.00369915 s, 541 kB/s
...
./hvc3
dd: failed to open './hvc3': No such device
./hvc2
dd: failed to open './hvc2': No such device
./hvc1
dd: failed to open './hvc1': No such device
./hvc0
<stuck>
I'm stuck on /dev/hvc0 on a 4.17 kernel (from Debian). This is likely what is going on: we are stuck reading a char device (for example), the thread waiting on the read() call never finishes its job, and the timeout triggers. (In my case this comes from CONFIG_HVC_XEN in the default Debian config file.)
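A small standalone illustration (not what read_all does today) of how such a device can be probed without hanging: open it with O_NONBLOCK and poll() with a deadline before the read(), so a device that never produces data makes us give up instead of blocking forever.

/* Hypothetical probe: bounded read of a char device. The 5s deadline and
 * the default path are arbitrary choices for this sketch. */
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    const char *path = argc > 1 ? argv[1] : "/dev/hvc0";
    char buf[4096];

    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0) {
        perror(path);
        return 1;
    }

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, 5000);      /* wait at most 5 seconds */

    if (ready <= 0) {
        fprintf(stderr, "%s: no data within 5s, giving up\n", path);
    } else {
        ssize_t n = read(fd, buf, sizeof(buf));
        printf("%s: read %zd bytes\n", path, n);
    }

    close(fd);
    return 0;
}

Running it as ./probe /dev/hvc0 on a setup like the Debian 4.17 one above should print the "no data within 5s" message where dd just sits there.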
We should open a low-priority bug for this one, or else we will face this in the near future depending on which modules are loaded and whether the device being read gives our read() call something back or not.
Not an issue with this kernel anyway.
Tks for checking, Naresh!
@Naresh, mind opening a bug and pasting this small comment on this issue? (Reminder: we have many inotify/fanotify patches missing for v4.4 and, likely, if this is something hard to fix it will be a WONTFIX. My fear is that this is something related to open() and inotify() watchers. Checking if this test has ever failed before would also be beneficial.)
This was not a failure earlier. However, I have re-submitted the test job to run inotify06 in a loop for 50 iterations. If this is reproducible I will report a bug. https://lkft.validation.linaro.org/scheduler/job/360271
For QEMU: https://bugs.linaro.org/show_bug.cgi?id=3852#c4
For X15: https://bugs.linaro.org/show_bug.cgi?id=3852#c8
Having a paravirtualized clock source might help with this one as well (https://projects.linaro.org/browse/KV-114).
I have re-submitted the job to test poll02 for 50 iterations. https://lkft.validation.linaro.org/scheduler/job/360274
PASSED on the latest push. https://qa-reports.linaro.org/lkft/linux-stable-rc-4.4-oe/tests/ltp-syscalls...
Yep, this is timing related.
Thank you.
Tks for checking!! Cheers o/