On Fri, Oct 25, 2024 at 11:35:03PM +0100, Lorenzo Stoakes wrote:
On Fri, Oct 25, 2024 at 11:56:52PM +0200, Vlastimil Babka wrote:
On 10/25/24 19:12, Lorenzo Stoakes wrote:
On Wed, Oct 23, 2024 at 05:24:40PM +0100, Lorenzo Stoakes wrote:
Implement a new lightweight guard page feature, that is regions of userland virtual memory that, when accessed, cause a fatal signal to arise.
<snip>
Hi Andrew - Could you apply the below fix-patch? I realise we must handle fatal signals and conditional rescheduling in the vector_madvise() special case.
Thanks!
----8<---- From 546d7e1831c71599fc733d589e0d75f52e84826d Mon Sep 17 00:00:00 2001 From: Lorenzo Stoakes lorenzo.stoakes@oracle.com Date: Fri, 25 Oct 2024 18:05:48 +0100 Subject: [PATCH] mm: yield on fatal signal/cond_sched() in vector_madvise()
While we have to treat -ERESTARTNOINTR specially here as we are looping through a vector of operations and can't simply restart the entire operation, we mustn't hold up fatal signals or RT kernels.
For plain madvise() syscall returning -ERESTARTNOINTR does the right thing and checks fatal_signal_pending() before returning, right?
I believe so. But now you've caused me some doubt so let me double check and make absolutely sure :)
Uh actually can we be just returning -ERESTARTNOINTR or do we need to use restart_syscall()?
Yeah I was wondering about that, but restart_syscall() seems to set TIF_SIGPENDING, and I wondered if that was correct... but then I saw other places that seemed to use it direct so it seemed so.
Let's eliminiate doubt, will check this next week and make sure.
Yeah looks like we do actually, as the function is handled by arch_do_signal_or_restart():
do_syscall_64() -> sycall_exit_to_user_mode_work() -> __sycall_exit_to_user_mode_work() -> exit_to_user_mode_prepare() -> exit_to_user_mode_loop() -> arch_do_signal_or_restart() -> (possibly) get_signal()
And arch_do_signal_or_restart() is only invoked by exit_to_user_mode_loop() if _TIF_SIGPENDING or _TIF_NOTIFY_SIGNAL is set:
if (ti_work & (_TIF_SIGPENDING | _TIF_NOTIFY_SIGNAL)) arch_do_signal_or_restart(regs);
This is set by restart_syscall():
static inline int restart_syscall(void) { set_tsk_thread_flag(current, TIF_SIGPENDING); return -ERESTARTNOINTR; }
It's a nop if no signal is actually pending too, and it handily also deals with signals...
mm/madvise.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/mm/madvise.c b/mm/madvise.c index 48eba25e25fe..127aa5d86656 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -1713,8 +1713,14 @@ static ssize_t vector_madvise(struct mm_struct *mm, struct iov_iter *iter, * we have already rescinded locks, it should be no problem to * simply try again. */
if (ret == -ERESTARTNOINTR)
if (ret == -ERESTARTNOINTR) {
if (fatal_signal_pending(current)) {
ret = -EINTR;
break;
}
cond_resched();
Should be unnecessary as we're calling an operation that takes a rwsem so there are reschedule points already. And with lazy preempt hopefully cond_resched()s will become history, so let's not add more only to delete later.
Ack will remove on respin.
continue;
if (ret < 0) break; iov_iter_advance(iter, iter_iov_len(iter));}
-- 2.47.0
For simplicitly with your other comments too I think I'll respin this next week.
So will respin to use restart_syscall() correctly (+ fix up your other points too).
Have tested and confirmed that failing to use restart_syscall() causes the -ERESTARTINTR to be returned and no restart, but using it works.