On Mon, Jun 03, 2024 at 06:22:32PM +0100, Mark Brown wrote:
On Mon, Jun 03, 2024 at 05:27:52PM +0100, Mark Brown wrote:
On Mon, May 27, 2024 at 08:07:40PM +0100, Mark Brown wrote:
This is now in mainline and appears to be causing several tests (at least the ptrace vmaccess global_attach test on arm64, possibly also some of the epoll tests) that previously were timed out by the harness to hang instead. A bisect seems to point at this patch in particular. There was a bunch of discussion of the fallout of these patches but I'm afraid I lost track of it; is there something in flight for this? -next is affected as well from the looks of it.
Thanks for the heads up. I warned about not being able to test everything when fixing kselftest last time, but nobody showed up. Is there an easy way to run most kselftests? We really need a (more accessible) CI...
FWIW I'm still seeing this on -rc2...
AFAICT this is due to the switch to using clone3() with CLONE_VFORK
I guess it started with the previous vfork() that was later replaced with CLONE_VFORK.
to start the test, which means we never even call alarm() to set up the timeout for the test, let alone have the signal for it delivered. I'm a bit confused about how this could ever work: with CLONE_VFORK the parent shouldn't run until the child execs (which won't happen here) or exits. Since we don't call alarm() until after we've started the child we never actually get that far, but even if we reorder things we won't get the signal for the alarm if the child messes up, since the parent is suspended.
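A minimal sketch of the blocking behaviour described above, not the harness code itself. It assumes Linux vfork() semantics (the same suspension that CLONE_VFORK gives) and uses a hypothetical helper, parent_blocked_ms(), whose child sleeps in place of a test body that never execs. Sleeping in a vfork() child is outside what POSIX allows and is only done here for illustration:

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Current time on the monotonic clock, in milliseconds. */
static long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long)ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/*
 * Hypothetical helper: start a child with vfork() that sleeps for
 * child_sleep_ms and then exits without exec'ing.  Returns how long
 * the parent was held up before it could run its next statement,
 * i.e. before it could have called alarm().  Returns -1 on error.
 */
static long parent_blocked_ms(unsigned int child_sleep_ms)
{
	long start = now_ms();
	long resumed;
	pid_t pid;

	pid = vfork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: stands in for a test body that never execs. */
		struct timespec delay = {
			.tv_sec = child_sleep_ms / 1000,
			.tv_nsec = (long)(child_sleep_ms % 1000) * 1000000L,
		};

		nanosleep(&delay, NULL);
		_exit(0);
	}

	/*
	 * Parent: only reached once the child has exited, so a timeout
	 * armed from here cannot interrupt a hung child.
	 */
	resumed = now_ms();
	waitpid(pid, NULL, 0);

	return resumed - start;
}
```

Under these assumptions parent_blocked_ms() should return roughly the child's sleep time, which is the point: an alarm() placed after the vfork() is armed only after the child is already gone.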
I'm not clear what the original race being fixed here was but it seems like we should revert this since the timeout functionality is pretty important?
It took me a while to fix all the previous issues and it would be much easier to just fix this issue too.
I'm working on it.