New subject: [PATCH v4.9.y,v4.4.y v2] futex,rt_mutex: Restructure rt_mutex_finish_proxy_lock()

8 Mar 2019

From: Peter Zijlstra <peterz@infradead.org>

commit 38d589f2fd08f1296aea3ce62bebd185125c6d81 upstream

With the ultimate goal of keeping rt_mutex wait_list and futex_q waiters
consistent it's necessary to split 'rt_mutex_futex_lock()' into finer
parts, such that only the actual blocking can be done without hb->lock
held.

Split rt_mutex_finish_proxy_lock() into two parts, one that does the
blocking and one that does remove_waiter() when the lock acquire failed.

When the rtmutex was acquired successfully the waiter can be removed in the
acquisition path safely, since there is no concurrency on the lock owner.

This means that, except for futex_lock_pi(), all wait_list modifications
are done with both hb->lock and wait_lock held.

[bigeasy@linutronix.de: fix for futex_requeue_pi_signal_restart]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170322104152.001659630@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Zubin Mithra <zsm@chromium.org>
---
Syzkaller reported a GPF in rt_mutex_top_waiter when fuzzing a 4.4
kernel. The corresponding call trace is below:
Call Trace:
 [<ffffffff81275c10>] remove_waiter+0x1e/0x1c8 kernel/locking/rtmutex.c:1082
 [<ffffffff81276279>] rt_mutex_start_proxy_lock+0x95/0xb1 kernel/locking/rtmutex.c:1685
 [<ffffffff812d8690>] futex_requeue+0x929/0xbc3 kernel/futex.c:1944
 [<ffffffff812dc10a>] do_futex+0xecf/0xf9a kernel/futex.c:3249
 [<ffffffff812dc428>] SYSC_futex kernel/futex.c:3287 [inline]
 [<ffffffff812dc428>] SyS_futex+0x253/0x29e kernel/futex.c:3255
 [<ffffffff832cab7a>] entry_SYSCALL_64_fastpath+0x31/0xb3
Code: e0 2a 53 48 c1 ea 03 80 3c 02 00 74 05 e8 f5 54 1e 00 49 8b 5c 24 40 b8 ff ff 37 00 48 c1 e0 2a 48 8d 7b 38 48 89 fa 48 c1 ea 03 <80> 3c 02 00 74 05 e8 d1 54 1e 00 4c 39 63 38 74 02 0f 0b 48 89 
RIP  [<ffffffff81274d83>] rt_mutex_top_waiter+0x42/0x5d kernel/locking/rtmutex_common.h:53
 RSP <ffff8800b6687998>
---[ end trace ab9c561cca7592c2 ]---

The PoC triggers the crash on the mainline kernel at tag v4.4, on stable
4.4.y, and on the 4.4 kernel being fuzzed.

The following tests were run after applying this patch:
* LTP tests inside testcases/kernel/syscalls/futex
* Syzkaller Repro does not cause GPF with the backport
* Chrome OS tryjob tests
* Some tests from within glibc/nptl
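
For readers who want to see the wait/cleanup split in isolation, the
caller-side pattern from the futex_wait_requeue_pi() hunk can be modelled
in userspace roughly as follows. This is only a sketch under stated
assumptions: every toy_* name is hypothetical, there is no real locking or
blocking, and the "racing grant" step merely stands in for the previous
owner handing over the lock between the failed wait and the cleanup.

```c
/* Userspace toy model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_task { int id; };

struct toy_lock {
	struct toy_task *owner;   /* current owner, NULL if the lock is free */
	struct toy_task *waiter;  /* the single enqueued waiter, NULL if none */
};

/*
 * Models rt_mutex_wait_proxy_lock(): reports the result of blocking.
 * On success the acquisition path dequeues the waiter; on failure the
 * waiter is deliberately left enqueued for the cleanup step.
 */
static int toy_wait_proxy_lock(struct toy_lock *lock, struct toy_task *self,
			       int wait_result)
{
	if (wait_result == 0) {
		lock->owner = self;
		lock->waiter = NULL;
	}
	return wait_result;
}

/* Models the old owner granting the lock to the still-enqueued waiter. */
static void toy_grant_to_waiter(struct toy_lock *lock)
{
	if (lock->waiter) {
		lock->owner = lock->waiter;
		lock->waiter = NULL;
	}
}

/*
 * Models rt_mutex_cleanup_proxy_lock(): returns true if the waiter was
 * dequeued, false if we turned out to own the lock after all.
 */
static bool toy_cleanup_proxy_lock(struct toy_lock *lock, struct toy_task *self)
{
	if (lock->owner != self) {
		lock->waiter = NULL;
		return true;
	}
	return false;
}

/* Caller pattern: wait first, then clean up (in the kernel, under hb->lock). */
static int toy_caller(struct toy_lock *lock, struct toy_task *self,
		      int wait_result, bool racing_grant)
{
	int ret = toy_wait_proxy_lock(lock, self, wait_result);

	if (racing_grant)
		toy_grant_to_waiter(lock);

	/* The kernel takes spin_lock(q.lock_ptr) at this point. */
	if (ret && !toy_cleanup_proxy_lock(lock, self))
		ret = 0;	/* we own the lock: disregard the wait failure */

	return ret;
}
```

Calling toy_caller() with a timeout and no racing grant leaves the error
in place and the waiter dequeued. With a racing grant the error is
overridden with 0, because the caller ended up owning the lock.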
kernel/futex.c                  |  7 +++--
 kernel/locking/rtmutex.c        | 52 ++++++++++++++++++++++++++++-----
 kernel/locking/rtmutex_common.h |  8 +++--
 3 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index a26d217c99fe7..0c92c8d34ffa2 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2923,10 +2923,13 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
    	 */
    	WARN_ON(!q.pi_state);
    	pi_mutex = &q.pi_state->pi_mutex;
-		ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter);
-		debug_rt_mutex_free_waiter(&rt_waiter);
+		ret = rt_mutex_wait_proxy_lock(pi_mutex, to, &rt_waiter);
    	spin_lock(q.lock_ptr);
+		if (ret && !rt_mutex_cleanup_proxy_lock(pi_mutex, &rt_waiter))
+			ret = 0;
+
+		debug_rt_mutex_free_waiter(&rt_waiter);
    	/*
    	 * Fixup the pi_state owner and possibly acquire the lock if we
    	 * haven't already.
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index b066724d7a5be..dd173df9ee5e5 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1712,21 +1712,23 @@ struct task_struct *rt_mutex_next_owner(struct rt_mutex *lock)
 }
/**
- * rt_mutex_finish_proxy_lock() - Complete lock acquisition
+ * rt_mutex_wait_proxy_lock() - Wait for lock acquisition
  * @lock:		the rt_mutex we were woken on
  * @to:			the timeout, null if none. hrtimer should already have
  *			been started.
  * @waiter:		the pre-initialized rt_mutex_waiter
  *
- * Complete the lock acquisition started our behalf by another thread.
+ * Wait for the the lock acquisition started on our behalf by
+ * rt_mutex_start_proxy_lock(). Upon failure, the caller must call
+ * rt_mutex_cleanup_proxy_lock().
  *
  * Returns:
  *  0 - success
  * <0 - error, one of -EINTR, -ETIMEDOUT
  *
- * Special API call for PI-futex requeue support
+ * Special API call for PI-futex support
  */
-int rt_mutex_finish_proxy_lock(struct rt_mutex *lock,
+int rt_mutex_wait_proxy_lock(struct rt_mutex *lock,
    		       struct hrtimer_sleeper *to,
    		       struct rt_mutex_waiter *waiter)
 {
@@ -1739,9 +1741,6 @@ int rt_mutex_finish_proxy_lock(struct rt_mutex *lock,
    /* sleep on the mutex */
    ret = __rt_mutex_slowlock(lock, TASK_INTERRUPTIBLE, to, waiter);
-	if (unlikely(ret))
-		remove_waiter(lock, waiter);
-
    /*
     * try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
     * have to fix that up.
@@ -1752,3 +1751,42 @@ int rt_mutex_finish_proxy_lock(struct rt_mutex *lock,
    return ret;
 }
+
+/**
+ * rt_mutex_cleanup_proxy_lock() - Cleanup failed lock acquisition
+ * @lock:		the rt_mutex we were woken on
+ * @waiter:		the pre-initialized rt_mutex_waiter
+ *
+ * Attempt to clean up after a failed rt_mutex_wait_proxy_lock().
+ *
+ * Unless we acquired the lock; we're still enqueued on the wait-list and can
+ * in fact still be granted ownership until we're removed. Therefore we can
+ * find we are in fact the owner and must disregard the
+ * rt_mutex_wait_proxy_lock() failure.
+ *
+ * Returns:
+ *  true  - did the cleanup, we done.
+ *  false - we acquired the lock after rt_mutex_wait_proxy_lock() returned,
+ *          caller should disregards its return value.
+ *
+ * Special API call for PI-futex support
+ */
+bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
+				 struct rt_mutex_waiter *waiter)
+{
+	bool cleanup = false;
+
+	raw_spin_lock_irq(&lock->wait_lock);
+	/*
+	 * Unless we're the owner; we're still enqueued on the wait_list.
+	 * So check if we became owner, if not, take us off the wait_list.
+	 */
+	if (rt_mutex_owner(lock) != current) {
+		remove_waiter(lock, waiter);
+		fixup_rt_mutex_waiters(lock);
+		cleanup = true;
+	}
+	raw_spin_unlock_irq(&lock->wait_lock);
+
+	return cleanup;
+}
diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h
index e317e1cbb3eba..6f8f68edb700c 100644
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -106,9 +106,11 @@ extern void rt_mutex_proxy_unlock(struct rt_mutex *lock,
 extern int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
    			     struct rt_mutex_waiter *waiter,
    			     struct task_struct *task);
-extern int rt_mutex_finish_proxy_lock(struct rt_mutex *lock,
-				      struct hrtimer_sleeper *to,
-				      struct rt_mutex_waiter *waiter);
+extern int rt_mutex_wait_proxy_lock(struct rt_mutex *lock,
+			       struct hrtimer_sleeper *to,
+			       struct rt_mutex_waiter *waiter);
+extern bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock,
+				 struct rt_mutex_waiter *waiter);
 extern int rt_mutex_timed_futex_lock(struct rt_mutex *l, struct hrtimer_sleeper *to);
 extern bool rt_mutex_futex_unlock(struct rt_mutex *lock,
    			  struct wake_q_head *wqh);
-- 
2.21.0.360.g471c308f928-goog


    
