On Mon, Jul 18, 2022 at 02:58:39AM +0000, lizhijian@fujitsu.com wrote:
0Day/LKP observed that the kselftest blocks forever because one of the pidfd_wait tests fails to terminate in about 1 of 30 runs. After digging into the source, we found that it blocks at:

  ASSERT_EQ(sys_waitid(P_PIDFD, pidfd, &info, WCONTINUED, NULL), 0);
wait_states has the following test flow:

  CHILD                 PARENT
  ---------------+--------------
1 STOP itself
2                   WAIT for CHILD STOPPED
3                   SIGNAL CHILD to CONT
4 CONT
5 STOP itself
5'                  WAIT for CHILD CONT
6                   WAIT for CHILD STOPPED
The problem is that the kernel cannot guarantee the order of 5 and 5'. If 5 goes first, the child is already stopped again when the parent waits for the continue event in 5', and the test fails.
We can reproduce it with:

  $ while true; do make run_tests -C pidfd; done
Introduce a blocking read in the child process so that the parent can check WCONTINUED before the child stops itself again.
CC: Philip Li <philip.li@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
I had almost forgotten this patch, since the previous version was posted over 6 months ago. This time I just rebased it and updated the comments.

V2: rewrite with pipe to avoid usleep
Thanks for sticking with this!

Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>