David Laight David.Laight@ACULAB.COM wrote:
From: Oleg Nesterov
Sent: 29 May 2019 17:12 Al, Linus, Eric, please help.
The previous discussion was very confusing, we simply can not understand each other.
To me everything looks very simple and clear, but perhaps I missed something obvious? Please correct me.
I think that the following code is correct
int interrupted = 0;
void sigint_handler(int sig) { interrupted = 1; }
int main(void) { sigset_t sigint, empty;
sigemptyset(&sigint); sigaddset(&sigint, SIGINT); sigprocmask(SIG_BLOCK, &sigint, NULL); signal(SIGINT, sigint_handler); sigemptyset(&empty); // so pselect() unblocks SIGINT ret = pselect(..., &empty);
^^^^^ sigint
if (ret >= 0) // sucess or timeout assert(!interrupted); if (interrupted) assert(ret == -EINTR);
}
IOW, if pselect(sigmask) temporary unblocks SIGINT according to sigmask, this signal should not be delivered if a ready fd was found or timeout. The signal handle should only run if ret == -EINTR.
Personally I think that is wrong. Given code like the above that has: while (!interrupted) { pselect(..., &sigint); // process available data }
You want the signal handler to be executed even if one of the fds always has available data. Otherwise you can't interrupt a process that is always busy.
Agreed... I believe cmogstored has always had a bug in the way it uses epoll_pwait because it failed to check interrupts if:
a) an FD is ready + interrupt b) epoll_pwait returns 0 on interrupt
The bug remains in userspace for a), which I will fix by adding an interrupt check when an FD is ready. The window is very small for a) and difficult to trigger, and also in a rare code path.
The b) case is the kernel bug introduced in 854a6ed56839a40f ("signal: Add restore_user_sigmask()").
I don't think there's any disagreement that b) is a kernel bug.
So the confusion is for a), and POSIX is not clear w.r.t. how pselect/poll works when there's both FD readiness and an interrupt.
Thus I'm inclined to believe *select/*poll/epoll_*wait should follow POSIX read() semantics:
If a read() is interrupted by a signal before it reads any data, it shall return −1 with errno set to [EINTR].
If a read() is interrupted by a signal after it has successfully read some data, it shall return the number of bytes read.
One option is to return -EINTR if a signal is pending when the mask is updated - before even looking at anything else.
Signals that happen later on (eg after a timeout) need not be reported (until the next time around the loop).
I'm not sure that's necessary and it would cause delays in signal handling.