[RFC 7/9] backports: backport ww_mutex support - Linaro-mm-sig

28 Jul 2013

From: "Luis R. Rodriguez" mcgrof@do-not-panic.com
This backports the kernel's wound/wait style locks 040a0a371,
using the linux-stable v3.11-rc2 as a base for development.
Given the complexity to support debugging mutexes this backport
implementation is simplified by only making this feature availabe
if you to have DEBUG_MUTEXES and DEBUG_LOCK_ALLOC disabled. Given
that ww mutex is required for DRM this also means we must update
the kconfig for DRM and require you to also not be able to build
DRM if you have either of these options enabled. Support for
DEBUG_MUTEXES and DEBUG_LOCK_ALLOC can be added later by anyone
daring.
Part of the ww mutex addition to the kernel required modifying
the fast path mutex locking scheme by requiring you to deal
with the slow path alternatives on your own (refer to a41b56ef).
The reason for this change was that the mutex fastpath implementation
assumed your slowpath alternative can only be passed one argument
and the addition of ww mutexes requires dealing with the slow
path with a context passed.
It'd be painful to backport all asm for an optimized fastpath
implementation so we penalize the backport ww mutex fast path
by using the generic atomic_dec_return().
To backport a clean our own mutex_lock_common() with the least
amount of changes against upstream commits 2bd2c92c and 41fcb9f2
also needed to be backported. Commit 2bd2c92c dealt with adding
support for queue mutex spinners with an MCS lock, since this
cannot be backported for older kernels we provide empty inlines.
Commit 41fcb9f2 just removed SCHED_FEAT_OWNER_SPIN as it was an
early hack, the only thing required to backport this commit was
to provide an alternative declaration for mutex_spin_on_owner()
as it was declared non-inline for older kernels.
Finally c5491ea7 required backporting schedule_preempt_disabled()
as well but that just consisted of carrying over the original
implementation. Since its not exported we need to reimplement
it to make it available to our internal core ww mutex port.
mcgrof@frijol ~/linux-stable (git::master)$ git describe --contains 040a0a371
v3.11-rc1~147^2~5
mcgrof@frijol ~/linux-stable (git::master)$ git describe --contains a41b56ef
v3.11-rc1~147^2~6
mcgrof@frijol ~/linux-stable (git::master)$ git describe --contains 2bd2c92c
v3.10-rc1~200^2~3
mcgrof@frijol ~/linux-stable (git::master)$ git describe --contains 41fcb9f2
v3.10-rc1~200^2~5
mcgrof@frijol ~/linux-stable (git::master)$ git describe --contains c5491ea7
v3.4-rc1~3^2~27
commit 040a0a37100563754bb1fee6ff6427420bcfa609
Author: Maarten Lankhorst maarten.lankhorst@canonical.com
Date:   Mon Jun 24 10:30:04 2013 +0200
mutex: Add support for wound/wait style locks
Wound/wait mutexes are used when other multiple lock
    acquisitions of a similar type can be done in an arbitrary
    order. The deadlock handling used here is called wait/wound in
    the RDBMS literature: The older tasks waits until it can acquire
    the contended lock. The younger tasks needs to back off and drop
    all the locks it is currently holding, i.e. the younger task is
    wounded.
For full documentation please read Documentation/ww-mutex-design.txt.
References: https://lwn.net/Articles/548909/
    Signed-off-by: Maarten Lankhorst maarten.lankhorst@canonical.com
    Acked-by: Daniel Vetter daniel.vetter@ffwll.ch
    Acked-by: Rob Clark robdclark@gmail.com
    Acked-by: Peter Zijlstra a.p.zijlstra@chello.nl
    Cc: dri-devel@lists.freedesktop.org
    Cc: linaro-mm-sig@lists.linaro.org
    Cc: rostedt@goodmis.org
    Cc: daniel@ffwll.ch
    Cc: Linus Torvalds torvalds@linux-foundation.org
    Cc: Andrew Morton akpm@linux-foundation.org
    Cc: Thomas Gleixner tglx@linutronix.de
    Link: http://lkml.kernel.org/r/51C8038C.9000106@canonical.com
    Signed-off-by: Ingo Molnar mingo@kernel.org
commit a41b56efa70e060f650aeb54740aaf52044a1ead
Author: Maarten Lankhorst maarten.lankhorst@canonical.com
Date:   Thu Jun 20 13:31:05 2013 +0200
arch: Make __mutex_fastpath_lock_retval return whether fastpath succeeded or not
This will allow me to call functions that have multiple
    arguments if fastpath fails. This is required to support ticket
    mutexes, because they need to be able to pass an extra argument
    to the fail function.
Originally I duplicated the functions, by adding
    __mutex_fastpath_lock_retval_arg. This ended up being just a
    duplication of the existing function, so a way to test if
    fastpath was called ended up being better.
This also cleaned up the reservation mutex patch some by being
    able to call an atomic_set instead of atomic_xchg, and making it
    easier to detect if the wrong unlock function was previously
    used.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@canonical.com
    Acked-by: Peter Zijlstra a.p.zijlstra@chello.nl
    Cc: dri-devel@lists.freedesktop.org
    Cc: linaro-mm-sig@lists.linaro.org
    Cc: robclark@gmail.com
    Cc: rostedt@goodmis.org
    Cc: daniel@ffwll.ch
    Cc: Linus Torvalds torvalds@linux-foundation.org
    Cc: Andrew Morton akpm@linux-foundation.org
    Cc: Thomas Gleixner tglx@linutronix.de
    Link: http://lkml.kernel.org/r/20130620113105.4001.83929.stgit@patser
    Signed-off-by: Ingo Molnar mingo@kernel.org
commit 2bd2c92cf07cc4a373bf316c75b78ac465fefd35
Author: Waiman Long Waiman.Long@hp.com
Date:   Wed Apr 17 15:23:13 2013 -0400
mutex: Queue mutex spinners with MCS lock to reduce cacheline contention
<-- snip -->
commit 41fcb9f230bf773656d1768b73000ef720bf00c3
Author: Waiman Long Waiman.Long@hp.com
Date:   Wed Apr 17 15:23:11 2013 -0400
mutex: Move mutex spinning code from sched/core.c back to mutex.c
<-- snip -->
commit c5491ea779793f977d282754db478157cc409d82
Author: Thomas Gleixner tglx@linutronix.de
Date:   Mon Mar 21 12:09:35 2011 +0100
sched/rt: Add schedule_preempt_disabled()
<-- snip -->
Cc: maarten.lankhorst@canonical.com
Cc: Daniel Vetter daniel.vetter@ffwll.ch
Cc: Rob Clark robdclark@gmail.com
Cc: Peter Zijlstra a.p.zijlstra@chello.nl
Cc: dri-devel@lists.freedesktop.org
Cc: linaro-mm-sig@lists.linaro.org
Cc: rostedt@goodmis.org
Cc: daniel@ffwll.ch
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Andrew Morton akpm@linux-foundation.org
Cc: Thomas Gleixner tglx@linutronix.de
Signed-off-by: Luis R. Rodriguez mcgrof@do-not-panic.com
---
 backport/backport-include/linux/ww_mutex.h |  333 ++++++++++++++
 backport/compat/Kconfig                    |   11 +
 backport/compat/Makefile                   |    1 +
 backport/compat/kernel/ww_mutex.c          |  667 ++++++++++++++++++++++++++++
 4 files changed, 1012 insertions(+)
 create mode 100644 backport/backport-include/linux/ww_mutex.h
 create mode 100644 backport/compat/kernel/ww_mutex.c

diff --git a/backport/backport-include/linux/ww_mutex.h b/backport/backport-include/linux/ww_mutex.h
new file mode 100644
index 0000000..0953939
--- /dev/null
+++ b/backport/backport-include/linux/ww_mutex.h
@@ -0,0 +1,333 @@
+#ifndef __BACKPORT_LINUX_WW_MUTEX_H
+#define __BACKPORT_LINUX_WW_MUTEX_H
+
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,11,0)
+#include_next <linux/ww_mutex.h>
+#else
+#ifdef CPTCFG_BACKPORT_BUILD_WW_MUTEX
+/*
+ * Wound/Wait Mutexes: blocking mutual exclusion locks with deadlock avoidance
+ *
+ * Original mutex implementation started by Ingo Molnar:
+ *
+ *  Copyright (C) 2004, 2005, 2006 Red Hat, Inc., Ingo Molnar mingo@redhat.com
+ *
+ * Wound/wait implementation:
+ *  Copyright (C) 2013 Canonical Ltd.
+ *
+ * This file contains the main data structure and API definitions.
+ */
+
+#include <linux/mutex.h>
+
+struct ww_class {
+	atomic_long_t stamp;
+	struct lock_class_key acquire_key;
+	struct lock_class_key mutex_key;
+	const char *acquire_name;
+	const char *mutex_name;
+};
+
+struct ww_acquire_ctx {
+	struct task_struct *task;
+	unsigned long stamp;
+	unsigned acquired;
+};
+
+struct ww_mutex {
+	struct mutex base;
+	struct ww_acquire_ctx *ctx;
+};
+
+# define __WW_CLASS_MUTEX_INITIALIZER(lockname, ww_class)
+
+#define __WW_CLASS_INITIALIZER(ww_class) \
+		{ .stamp = ATOMIC_LONG_INIT(0) \
+		, .acquire_name = #ww_class "_acquire" \
+		, .mutex_name = #ww_class "_mutex" }
+
+#define __WW_MUTEX_INITIALIZER(lockname, class) \
+		{ .base = { __MUTEX_INITIALIZER(lockname) } \
+		__WW_CLASS_MUTEX_INITIALIZER(lockname, class) }
+
+#define DEFINE_WW_CLASS(classname) \
+	struct ww_class classname = __WW_CLASS_INITIALIZER(classname)
+
+#define DEFINE_WW_MUTEX(mutexname, ww_class) \
+	struct ww_mutex mutexname = __WW_MUTEX_INITIALIZER(mutexname, ww_class)
+
+/**
+ * ww_mutex_init - initialize the w/w mutex
+ * @lock: the mutex to be initialized
+ * @ww_class: the w/w class the mutex should belong to
+ *
+ * Initialize the w/w mutex to unlocked state and associate it with the given
+ * class.
+ *
+ * It is not allowed to initialize an already locked mutex.
+ */
+#define ww_mutex_init LINUX_BACKPORT(ww_mutex_init)
+static inline void ww_mutex_init(struct ww_mutex *lock,
+				 struct ww_class *ww_class)
+{
+	__mutex_init(&lock->base, ww_class->mutex_name, &ww_class->mutex_key);
+	lock->ctx = NULL;
+}
+
+/**
+ * ww_acquire_init - initialize a w/w acquire context
+ * @ctx: w/w acquire context to initialize
+ * @ww_class: w/w class of the context
+ *
+ * Initializes an context to acquire multiple mutexes of the given w/w class.
+ *
+ * Context-based w/w mutex acquiring can be done in any order whatsoever within
+ * a given lock class. Deadlocks will be detected and handled with the
+ * wait/wound logic.
+ *
+ * Mixing of context-based w/w mutex acquiring and single w/w mutex locking can
+ * result in undetected deadlocks and is so forbidden. Mixing different contexts
+ * for the same w/w class when acquiring mutexes can also result in undetected
+ * deadlocks, and is hence also forbidden. Both types of abuse will be caught by
+ * enabling CONFIG_PROVE_LOCKING.
+ *
+ * Nesting of acquire contexts for _different_ w/w classes is possible, subject
+ * to the usual locking rules between different lock classes.
+ *
+ * An acquire context must be released with ww_acquire_fini by the same task
+ * before the memory is freed. It is recommended to allocate the context itself
+ * on the stack.
+ */
+#define ww_acquire_init LINUX_BACKPORT(ww_acquire_init)
+static inline void ww_acquire_init(struct ww_acquire_ctx *ctx,
+				   struct ww_class *ww_class)
+{
+	ctx->task = current;
+	ctx->stamp = atomic_long_inc_return(&ww_class->stamp);
+	ctx->acquired = 0;
+}
+
+/**
+ * ww_acquire_done - marks the end of the acquire phase
+ * @ctx: the acquire context
+ *
+ * Marks the end of the acquire phase, any further w/w mutex lock calls using
+ * this context are forbidden.
+ *
+ * Calling this function is optional, it is just useful to document w/w mutex
+ * code and clearly designated the acquire phase from actually using the locked
+ * data structures.
+ */
+#define ww_acquire_done LINUX_BACKPORT(ww_acquire_done)
+static inline void ww_acquire_done(struct ww_acquire_ctx *ctx)
+{
+}
+
+/**
+ * ww_acquire_fini - releases a w/w acquire context
+ * @ctx: the acquire context to free
+ *
+ * Releases a w/w acquire context. This must be called _after_ all acquired w/w
+ * mutexes have been released with ww_mutex_unlock.
+ */
+#define ww_acquire_fini LINUX_BACKPORT(ww_acquire_fini)
+static inline void ww_acquire_fini(struct ww_acquire_ctx *ctx)
+{
+}
+
+#define __ww_mutex_lock LINUX_BACKPORT(__ww_mutex_lock)
+extern int __must_check __ww_mutex_lock(struct ww_mutex *lock,
+					struct ww_acquire_ctx *ctx);
+#define __ww_mutex_lock_interruptible LINUX_BACKPORT(__ww_mutex_lock_interruptible)
+extern int __must_check __ww_mutex_lock_interruptible(struct ww_mutex *lock,
+						      struct ww_acquire_ctx *ctx);
+
+/**
+ * ww_mutex_lock - acquire the w/w mutex
+ * @lock: the mutex to be acquired
+ * @ctx: w/w acquire context, or NULL to acquire only a single lock.
+ *
+ * Lock the w/w mutex exclusively for this task.
+ *
+ * Deadlocks within a given w/w class of locks are detected and handled with the
+ * wait/wound algorithm. If the lock isn't immediately avaiable this function
+ * will either sleep until it is (wait case). Or it selects the current context
+ * for backing off by returning -EDEADLK (wound case). Trying to acquire the
+ * same lock with the same context twice is also detected and signalled by
+ * returning -EALREADY. Returns 0 if the mutex was successfully acquired.
+ *
+ * In the wound case the caller must release all currently held w/w mutexes for
+ * the given context and then wait for this contending lock to be available by
+ * calling ww_mutex_lock_slow. Alternatively callers can opt to not acquire this
+ * lock and proceed with trying to acquire further w/w mutexes (e.g. when
+ * scanning through lru lists trying to free resources).
+ *
+ * The mutex must later on be released by the same task that
+ * acquired it. The task may not exit without first unlocking the mutex. Also,
+ * kernel memory where the mutex resides must not be freed with the mutex still
+ * locked. The mutex must first be initialized (or statically defined) before it
+ * can be locked. memset()-ing the mutex to 0 is not allowed. The mutex must be
+ * of the same w/w lock class as was used to initialize the acquire context.
+ *
+ * A mutex acquired with this function must be released with ww_mutex_unlock.
+ */
+#define ww_mutex_lock LINUX_BACKPORT(ww_mutex_lock)
+static inline int ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
+{
+	if (ctx)
+		return __ww_mutex_lock(lock, ctx);
+
+	mutex_lock(&lock->base);
+	return 0;
+}
+
+/**
+ * ww_mutex_lock_interruptible - acquire the w/w mutex, interruptible
+ * @lock: the mutex to be acquired
+ * @ctx: w/w acquire context
+ *
+ * Lock the w/w mutex exclusively for this task.
+ *
+ * Deadlocks within a given w/w class of locks are detected and handled with the
+ * wait/wound algorithm. If the lock isn't immediately avaiable this function
+ * will either sleep until it is (wait case). Or it selects the current context
+ * for backing off by returning -EDEADLK (wound case). Trying to acquire the
+ * same lock with the same context twice is also detected and signalled by
+ * returning -EALREADY. Returns 0 if the mutex was successfully acquired. If a
+ * signal arrives while waiting for the lock then this function returns -EINTR.
+ *
+ * In the wound case the caller must release all currently held w/w mutexes for
+ * the given context and then wait for this contending lock to be available by
+ * calling ww_mutex_lock_slow_interruptible. Alternatively callers can opt to
+ * not acquire this lock and proceed with trying to acquire further w/w mutexes
+ * (e.g. when scanning through lru lists trying to free resources).
+ *
+ * The mutex must later on be released by the same task that
+ * acquired it. The task may not exit without first unlocking the mutex. Also,
+ * kernel memory where the mutex resides must not be freed with the mutex still
+ * locked. The mutex must first be initialized (or statically defined) before it
+ * can be locked. memset()-ing the mutex to 0 is not allowed. The mutex must be
+ * of the same w/w lock class as was used to initialize the acquire context.
+ *
+ * A mutex acquired with this function must be released with ww_mutex_unlock.
+ */
+#define ww_mutex_lock_interruptible LINUX_BACKPORT(ww_mutex_lock_interruptible)
+static inline int __must_check ww_mutex_lock_interruptible(struct ww_mutex *lock,
+							   struct ww_acquire_ctx *ctx)
+{
+	if (ctx)
+		return __ww_mutex_lock_interruptible(lock, ctx);
+	else
+		return mutex_lock_interruptible(&lock->base);
+}
+
+/**
+ * ww_mutex_lock_slow - slowpath acquiring of the w/w mutex
+ * @lock: the mutex to be acquired
+ * @ctx: w/w acquire context
+ *
+ * Acquires a w/w mutex with the given context after a wound case. This function
+ * will sleep until the lock becomes available.
+ *
+ * The caller must have released all w/w mutexes already acquired with the
+ * context and then call this function on the contended lock.
+ *
+ * Afterwards the caller may continue to (re)acquire the other w/w mutexes it
+ * needs with ww_mutex_lock. Note that the -EALREADY return code from
+ * ww_mutex_lock can be used to avoid locking this contended mutex twice.
+ *
+ * It is forbidden to call this function with any other w/w mutexes associated
+ * with the context held. It is forbidden to call this on anything else than the
+ * contending mutex.
+ *
+ * Note that the slowpath lock acquiring can also be done by calling
+ * ww_mutex_lock directly. This function here is simply to help w/w mutex
+ * locking code readability by clearly denoting the slowpath.
+ */
+#define ww_mutex_lock_slow LINUX_BACKPORT(ww_mutex_lock_slow)
+static inline void
+ww_mutex_lock_slow(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
+{
+	int ret;
+	ret = ww_mutex_lock(lock, ctx);
+	(void)ret;
+}
+
+/**
+ * ww_mutex_lock_slow_interruptible - slowpath acquiring of the w/w mutex, interruptible
+ * @lock: the mutex to be acquired
+ * @ctx: w/w acquire context
+ *
+ * Acquires a w/w mutex with the given context after a wound case. This function
+ * will sleep until the lock becomes available and returns 0 when the lock has
+ * been acquired. If a signal arrives while waiting for the lock then this
+ * function returns -EINTR.
+ *
+ * The caller must have released all w/w mutexes already acquired with the
+ * context and then call this function on the contended lock.
+ *
+ * Afterwards the caller may continue to (re)acquire the other w/w mutexes it
+ * needs with ww_mutex_lock. Note that the -EALREADY return code from
+ * ww_mutex_lock can be used to avoid locking this contended mutex twice.
+ *
+ * It is forbidden to call this function with any other w/w mutexes associated
+ * with the given context held. It is forbidden to call this on anything else
+ * than the contending mutex.
+ *
+ * Note that the slowpath lock acquiring can also be done by calling
+ * ww_mutex_lock_interruptible directly. This function here is simply to help
+ * w/w mutex locking code readability by clearly denoting the slowpath.
+ */
+#define ww_mutex_lock_slow_interruptible LINUX_BACKPORT(ww_mutex_lock_slow_interruptible)
+static inline int __must_check
+ww_mutex_lock_slow_interruptible(struct ww_mutex *lock,
+				 struct ww_acquire_ctx *ctx)
+{
+	return ww_mutex_lock_interruptible(lock, ctx);
+}
+
+#define ww_mutex_unlock LINUX_BACKPORT(ww_mutex_unlock)
+extern void ww_mutex_unlock(struct ww_mutex *lock);
+
+/**
+ * ww_mutex_trylock - tries to acquire the w/w mutex without acquire context
+ * @lock: mutex to lock
+ *
+ * Trylocks a mutex without acquire context, so no deadlock detection is
+ * possible. Returns 1 if the mutex has been acquired successfully, 0 otherwise.
+ */
+#define ww_mutex_trylock LINUX_BACKPORT(ww_mutex_trylock)
+static inline int __must_check ww_mutex_trylock(struct ww_mutex *lock)
+{
+	return mutex_trylock(&lock->base);
+}
+
+/***
+ * ww_mutex_destroy - mark a w/w mutex unusable
+ * @lock: the mutex to be destroyed
+ *
+ * This function marks the mutex uninitialized, and any subsequent
+ * use of the mutex is forbidden. The mutex must not be locked when
+ * this function is called.
+ */
+#define ww_mutex_destroy LINUX_BACKPORT(ww_mutex_destroy)
+static inline void ww_mutex_destroy(struct ww_mutex *lock)
+{
+	mutex_destroy(&lock->base);
+}
+
+/**
+ * ww_mutex_is_locked - is the w/w mutex locked
+ * @lock: the mutex to be queried
+ *
+ * Returns 1 if the mutex is locked, 0 if unlocked.
+ */
+#define ww_mutex_is_locked LINUX_BACKPORT(ww_mutex_is_locked)
+static inline bool ww_mutex_is_locked(struct ww_mutex *lock)
+{
+	return mutex_is_locked(&lock->base);
+}
+
+#endif /* CPTCFG_BACKPORT_BUILD_WW_MUTEX */
+#endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(3,11,0) */
+#endif /* __BACKPORT_LINUX_WW_MUTEX_H */
diff --git a/backport/compat/Kconfig b/backport/compat/Kconfig
index e2f0cdd..f3c1ab3 100644
--- a/backport/compat/Kconfig
+++ b/backport/compat/Kconfig
@@ -185,6 +185,17 @@ config BACKPORT_LEDS_CLASS
 config BACKPORT_LEDS_TRIGGERS
    bool
+config BACKPORT_BUILD_WW_MUTEX
+	bool
+	# Build only if on kernels < 3.11
+	# For now only DRM drivers use ww mutexes.
+	depends on DRM && BACKPORT_KERNEL_3_11
+	default y if BACKPORT_USERSEL_BUILD_ALL
+	# probably a bad idea if you have these options given we
+	# ripped those options out.
+	depends on !DEBUG_MUTEXES
+	depends on !DEBUG_LOCK_ALLOC
+
 config BACKPORT_BUILD_RADIX_HELPERS
    bool
    # You have selected to build backported DRM drivers
diff --git a/backport/compat/Makefile b/backport/compat/Makefile
index 252290e..fec01c4 100644
--- a/backport/compat/Makefile
+++ b/backport/compat/Makefile
@@ -41,3 +41,4 @@ compat-$(CPTCFG_BACKPORT_BUILD_KFIFO) += kfifo.o
 compat-$(CPTCFG_BACKPORT_BUILD_GENERIC_ATOMIC64) += compat_atomic.o
 compat-$(CPTCFG_BACKPORT_BUILD_DMA_SHARED_HELPERS) += dma-shared-helpers.o
 compat-$(CPTCFG_BACKPORT_BUILD_RADIX_HELPERS) += lib-radix-tree-helpers.o
+compat-$(CPTCFG_BACKPORT_BUILD_WW_MUTEX) += kernel/ww_mutex.o
diff --git a/backport/compat/kernel/ww_mutex.c b/backport/compat/kernel/ww_mutex.c
new file mode 100644
index 0000000..257c2a4
--- /dev/null
+++ b/backport/compat/kernel/ww_mutex.c
@@ -0,0 +1,667 @@
+/*
+ * Copyright (c) 2013  Luis R. Rodriguez mcgrof@do-not-panic.com
+ *
+ * Backport ww mutex for older kernels. This is not supported when
+ * DEBUG_MUTEXES or DEBUG_LOCK_ALLOC is enabled.
+ *
+ * Taken from: kernel/mutex.c - via linux-stable v3.11-rc2
+ *
+ * Mutexes: blocking mutual exclusion locks
+ *
+ * Started by Ingo Molnar:
+ *
+ *  Copyright (C) 2004, 2005, 2006 Red Hat, Inc., Ingo Molnar mingo@redhat.com
+ *
+ * Many thanks to Arjan van de Ven, Thomas Gleixner, Steven Rostedt and
+ * David Howells for suggestions and improvements.
+ *
+ *  - Adaptive spinning for mutexes by Peter Zijlstra. (Ported to mainline
+ *    from the -rt tree, where it was originally implemented for rtmutexes
+ *    by Steven Rostedt, based on work by Gregory Haskins, Peter Morreale
+ *    and Sven Dietrich.
+ *
+ * Also see Documentation/mutex-design.txt.
+ */
+
+#include <linux/mutex.h>
+#include <linux/ww_mutex.h>
+#include <asm/mutex.h>
+#include <linux/sched.h>
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,9,0)
+#include <linux/sched/rt.h>
+#endif
+#include <linux/export.h>
+#include <linux/spinlock.h>
+#include <linux/interrupt.h>
+#include <linux/debug_locks.h>
+#include <linux/version.h>
+
+/*
+ * A negative mutex count indicates that waiters are sleeping waiting for the
+ * mutex.
+ */
+#define	MUTEX_SHOW_NO_WAITER(mutex)	(atomic_read(&(mutex)->count) >= 0)
+
+#define spin_lock_mutex(lock, flags) \
+	do { spin_lock(lock); (void)(flags); } while (0)
+#define spin_unlock_mutex(lock, flags) \
+	do { spin_unlock(lock); (void)(flags); } while (0)
+#define mutex_remove_waiter(lock, waiter, ti) \
+	__list_del((waiter)->list.prev, (waiter)->list.next)
+
+#ifdef CONFIG_SMP
+static inline void mutex_set_owner(struct mutex *lock)
+{
+	lock->owner = current;
+}
+
+static inline void mutex_clear_owner(struct mutex *lock)
+{
+	lock->owner = NULL;
+}
+#else
+static inline void mutex_set_owner(struct mutex *lock)
+{
+}
+
+static inline void mutex_clear_owner(struct mutex *lock)
+{
+}
+#endif
+
+
+#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,10,0) /* 2bd2c92c and 41fcb9f2 */
+/*
+ * In order to avoid a stampede of mutex spinners from acquiring the mutex
+ * more or less simultaneously, the spinners need to acquire a MCS lock
+ * first before spinning on the owner field.
+ *
+ * We don't inline mspin_lock() so that perf can correctly account for the
+ * time spent in this lock function.
+ */
+struct mspin_node {
+	struct mspin_node *next ;
+	int		  locked;	/* 1 if lock acquired */
+};
+#define	MLOCK(mutex)	((struct mspin_node **)&((mutex)->spin_mlock))
+
+static noinline
+void mspin_lock(struct mspin_node **lock, struct mspin_node *node)
+{
+	struct mspin_node *prev;
+
+	/* Init node */
+	node->locked = 0;
+	node->next   = NULL;
+
+	prev = xchg(lock, node);
+	if (likely(prev == NULL)) {
+		/* Lock acquired */
+		node->locked = 1;
+		return;
+	}
+	ACCESS_ONCE(prev->next) = node;
+	smp_wmb();
+	/* Wait until the lock holder passes the lock down */
+	while (!ACCESS_ONCE(node->locked))
+		arch_mutex_cpu_relax();
+}
+
+static void mspin_unlock(struct mspin_node **lock, struct mspin_node *node)
+{
+	struct mspin_node *next = ACCESS_ONCE(node->next);
+
+	if (likely(!next)) {
+		/*
+		 * Release the lock by setting it to NULL
+		 */
+		if (cmpxchg(lock, node, NULL) == node)
+			return;
+		/* Wait until the next pointer is set */
+		while (!(next = ACCESS_ONCE(node->next)))
+			arch_mutex_cpu_relax();
+	}
+	ACCESS_ONCE(next->locked) = 1;
+	smp_wmb();
+}
+
+/*
+ * Mutex spinning code migrated from kernel/sched/core.c
+ */
+
+static inline bool owner_running(struct mutex *lock, struct task_struct *owner)
+{
+	if (lock->owner != owner)
+		return false;
+
+	/*
+	 * Ensure we emit the owner->on_cpu, dereference _after_ checking
+	 * lock->owner still matches owner, if that fails, owner might
+	 * point to free()d memory, if it still matches, the rcu_read_lock()
+	 * ensures the memory stays valid.
+	 */
+	barrier();
+
+	return owner->on_cpu;
+}
+
+/*
+ * Look out! "owner" is an entirely speculative pointer
+ * access and not reliable.
+ */
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,10,0)
+static noinline
+#endif
+int mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
+{
+	rcu_read_lock();
+	while (owner_running(lock, owner)) {
+		if (need_resched())
+			break;
+
+		arch_mutex_cpu_relax();
+	}
+	rcu_read_unlock();
+
+	/*
+	 * We break out the loop above on need_resched() and when the
+	 * owner changed, which is a sign for heavy contention. Return
+	 * success only when lock->owner is NULL.
+	 */
+	return lock->owner == NULL;
+}
+
+/*
+ * Initial check for entering the mutex spinning loop
+ */
+static inline int mutex_can_spin_on_owner(struct mutex *lock)
+{
+	int retval = 1;
+
+	rcu_read_lock();
+	if (lock->owner)
+		retval = lock->owner->on_cpu;
+	rcu_read_unlock();
+	/*
+	 * if lock->owner is not set, the mutex owner may have just acquired
+	 * it and not set the owner yet or the mutex has been released.
+	 */
+	return retval;
+}
+#else /* Backport 2bd2c92c: help keep backport_mutex_lock_common() clean */
+
+struct mspin_node {
+};
+#define	MLOCK(mutex) NULL
+
+static noinline
+void mspin_lock(struct mspin_node **lock, struct mspin_node *node)
+{
+}
+
+static void mspin_unlock(struct mspin_node **lock, struct mspin_node *node)
+{
+}
+
+static inline bool owner_running(struct mutex *lock, struct task_struct *owner)
+{
+}
+
+int mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
+{
+}
+
+static inline int mutex_can_spin_on_owner(struct mutex *lock)
+{
+	return 1;
+}
+#endif /* LINUX_VERSION_CODE >= KERNEL_VERSION(3,10,0) */
+#endif /* CONFIG_MUTEX_SPIN_ON_OWNER */
+
+/*
+ * Release the lock, slowpath:
+ */
+static inline void
+__mutex_unlock_common_slowpath(atomic_t *lock_count, int nested)
+{
+	struct mutex *lock = container_of(lock_count, struct mutex, count);
+	unsigned long flags;
+
+	spin_lock_mutex(&lock->wait_lock, flags);
+	mutex_release(&lock->dep_map, nested, _RET_IP_);
+	/* debug_mutex_unlock(lock); */
+
+	/*
+	 * some architectures leave the lock unlocked in the fastpath failure
+	 * case, others need to leave it locked. In the later case we have to
+	 * unlock it here
+	 */
+	if (__mutex_slowpath_needs_to_unlock())
+		atomic_set(&lock->count, 1);
+
+	if (!list_empty(&lock->wait_list)) {
+		/* get the first entry from the wait-list: */
+		struct mutex_waiter *waiter =
+				list_entry(lock->wait_list.next,
+					   struct mutex_waiter, list);
+
+		/* debug_mutex_wake_waiter(lock, waiter); */
+
+		wake_up_process(waiter->task);
+	}
+
+	spin_unlock_mutex(&lock->wait_lock, flags);
+}
+
+/*
+ * Release the lock, slowpath:
+ */
+static __used noinline void
+__mutex_unlock_slowpath(atomic_t *lock_count)
+{
+	__mutex_unlock_common_slowpath(lock_count, 1);
+}
+
+/**
+ * ww_mutex_unlock - release the w/w mutex
+ * @lock: the mutex to be released
+ *
+ * Unlock a mutex that has been locked by this task previously with any of the
+ * ww_mutex_lock* functions (with or without an acquire context). It is
+ * forbidden to release the locks after releasing the acquire context.
+ *
+ * This function must not be used in interrupt context. Unlocking
+ * of a unlocked mutex is not allowed.
+ */
+void __sched ww_mutex_unlock(struct ww_mutex *lock)
+{
+	/*
+	 * The unlocking fastpath is the 0->1 transition from 'locked'
+	 * into 'unlocked' state:
+	 */
+	if (lock->ctx) {
+		if (lock->ctx->acquired > 0)
+			lock->ctx->acquired--;
+		lock->ctx = NULL;
+	}
+
+	__mutex_fastpath_unlock(&lock->base.count, __mutex_unlock_slowpath);
+}
+EXPORT_SYMBOL_GPL(ww_mutex_unlock);
+
+static inline int __sched
+__mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
+{
+	struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
+	struct ww_acquire_ctx *hold_ctx = ACCESS_ONCE(ww->ctx);
+
+	if (!hold_ctx)
+		return 0;
+
+	if (unlikely(ctx == hold_ctx))
+		return -EALREADY;
+
+	if (ctx->stamp - hold_ctx->stamp <= LONG_MAX &&
+	    (ctx->stamp != hold_ctx->stamp || ctx > hold_ctx)) {
+		return -EDEADLK;
+	}
+
+	return 0;
+}
+
+static __always_inline void ww_mutex_lock_acquired(struct ww_mutex *ww,
+						   struct ww_acquire_ctx *ww_ctx)
+{
+	ww_ctx->acquired++;
+}
+
+/*
+ * after acquiring lock with fastpath or when we lost out in contested
+ * slowpath, set ctx and wake up any waiters so they can recheck.
+ *
+ * This function is never called when CONFIG_DEBUG_LOCK_ALLOC is set,
+ * as the fastpath and opportunistic spinning are disabled in that case.
+ */
+static __always_inline void
+ww_mutex_set_context_fastpath(struct ww_mutex *lock,
+			       struct ww_acquire_ctx *ctx)
+{
+	unsigned long flags;
+	struct mutex_waiter *cur;
+
+	ww_mutex_lock_acquired(lock, ctx);
+
+	lock->ctx = ctx;
+
+	/*
+	 * The lock->ctx update should be visible on all cores before
+	 * the atomic read is done, otherwise contended waiters might be
+	 * missed. The contended waiters will either see ww_ctx == NULL
+	 * and keep spinning, or it will acquire wait_lock, add itself
+	 * to waiter list and sleep.
+	 */
+	smp_mb(); /* ^^^ */
+
+	/*
+	 * Check if lock is contended, if not there is nobody to wake up
+	 */
+	if (likely(atomic_read(&lock->base.count) == 0))
+		return;
+
+	/*
+	 * Uh oh, we raced in fastpath, wake up everyone in this case,
+	 * so they can see the new lock->ctx.
+	 */
+	spin_lock_mutex(&lock->base.wait_lock, flags);
+	list_for_each_entry(cur, &lock->base.wait_list, list) {
+		/* debug_mutex_wake_waiter(&lock->base, cur); */
+		wake_up_process(cur->task);
+	}
+	spin_unlock_mutex(&lock->base.wait_lock, flags);
+}
+
+/**
+ * backport_schedule_preempt_disabled - called with preemption disabled
+ *
+ * Backports c5491ea7. This is not exported so we leave it
+ * here as this is the only current core user on backports.
+ * Although available on >= 3.4 its only for in-kernel code so
+ * we provide our own.
+ *
+ * Returns with preemption disabled. Note: preempt_count must be 1
+ */
+static void __sched backport_schedule_preempt_disabled(void)
+{
+	preempt_enable_no_resched();
+	schedule();
+	preempt_disable();
+}
+
+/*
+ * Lock a mutex (possibly interruptible), slowpath:
+ */
+static __always_inline int __sched
+__backport_mutex_lock_common(struct mutex *lock, long state,
+			     unsigned int subclass,
+			     struct lockdep_map *nest_lock, unsigned long ip,
+			     struct ww_acquire_ctx *ww_ctx)
+{
+	struct task_struct *task = current;
+	struct mutex_waiter waiter;
+	unsigned long flags;
+	int ret;
+
+	preempt_disable();
+	mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
+
+#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
+	/*
+	 * Optimistic spinning.
+	 *
+	 * We try to spin for acquisition when we find that there are no
+	 * pending waiters and the lock owner is currently running on a
+	 * (different) CPU.
+	 *
+	 * The rationale is that if the lock owner is running, it is likely to
+	 * release the lock soon.
+	 *
+	 * Since this needs the lock owner, and this mutex implementation
+	 * doesn't track the owner atomically in the lock field, we need to
+	 * track it non-atomically.
+	 *
+	 * We can't do this for DEBUG_MUTEXES because that relies on wait_lock
+	 * to serialize everything.
+	 *
+	 * The mutex spinners are queued up using MCS lock so that only one
+	 * spinner can compete for the mutex. However, if mutex spinning isn't
+	 * going to happen, there is no point in going through the lock/unlock
+	 * overhead.
+	 */
+	if (!mutex_can_spin_on_owner(lock))
+		goto slowpath;
+
+	for (;;) {
+		struct task_struct *owner;
+		struct mspin_node  node;
+
+		if (!__builtin_constant_p(ww_ctx == NULL) && ww_ctx->acquired > 0) {
+			struct ww_mutex *ww;
+
+			ww = container_of(lock, struct ww_mutex, base);
+			/*
+			 * If ww->ctx is set the contents are undefined, only
+			 * by acquiring wait_lock there is a guarantee that
+			 * they are not invalid when reading.
+			 *
+			 * As such, when deadlock detection needs to be
+			 * performed the optimistic spinning cannot be done.
+			 */
+			if (ACCESS_ONCE(ww->ctx))
+				break;
+		}
+
+		/*
+		 * If there's an owner, wait for it to either
+		 * release the lock or go to sleep.
+		 */
+		mspin_lock(MLOCK(lock), &node);
+		owner = ACCESS_ONCE(lock->owner);
+		if (owner && !mutex_spin_on_owner(lock, owner)) {
+			mspin_unlock(MLOCK(lock), &node);
+			break;
+		}
+
+		if ((atomic_read(&lock->count) == 1) &&
+		    (atomic_cmpxchg(&lock->count, 1, 0) == 1)) {
+			lock_acquired(&lock->dep_map, ip);
+			if (!__builtin_constant_p(ww_ctx == NULL)) {
+				struct ww_mutex *ww;
+				ww = container_of(lock, struct ww_mutex, base);
+
+				ww_mutex_set_context_fastpath(ww, ww_ctx);
+			}
+
+			mutex_set_owner(lock);
+			mspin_unlock(MLOCK(lock), &node);
+			preempt_enable();
+			return 0;
+		}
+		mspin_unlock(MLOCK(lock), &node);
+
+		/*
+		 * When there's no owner, we might have preempted between the
+		 * owner acquiring the lock and setting the owner field. If
+		 * we're an RT task that will live-lock because we won't let
+		 * the owner complete.
+		 */
+		if (!owner && (need_resched() || rt_task(task)))
+			break;
+
+		/*
+		 * The cpu_relax() call is a compiler barrier which forces
+		 * everything in this loop to be re-loaded. We don't need
+		 * memory barriers as we'll eventually observe the right
+		 * values at the cost of a few extra spins.
+		 */
+		arch_mutex_cpu_relax();
+	}
+slowpath:
+#endif
+	spin_lock_mutex(&lock->wait_lock, flags);
+
+	/* We don't support DEBUG_MUTEXES on the backport */
+	/* debug_mutex_lock_common(lock, &waiter); */
+	/* debug_mutex_add_waiter(lock, &waiter, task_thread_info(task)); */
+
+	/* add waiting tasks to the end of the waitqueue (FIFO): */
+	list_add_tail(&waiter.list, &lock->wait_list);
+	waiter.task = task;
+
+	if (MUTEX_SHOW_NO_WAITER(lock) && (atomic_xchg(&lock->count, -1) == 1))
+		goto done;
+
+	lock_contended(&lock->dep_map, ip);
+
+	for (;;) {
+		/*
+		 * Lets try to take the lock again - this is needed even if
+		 * we get here for the first time (shortly after failing to
+		 * acquire the lock), to make sure that we get a wakeup once
+		 * it's unlocked. Later on, if we sleep, this is the
+		 * operation that gives us the lock. We xchg it to -1, so
+		 * that when we release the lock, we properly wake up the
+		 * other waiters:
+		 */
+		if (MUTEX_SHOW_NO_WAITER(lock) &&
+		   (atomic_xchg(&lock->count, -1) == 1))
+			break;
+
+		/*
+		 * got a signal? (This code gets eliminated in the
+		 * TASK_UNINTERRUPTIBLE case.)
+		 */
+		if (unlikely(signal_pending_state(state, task))) {
+			ret = -EINTR;
+			goto err;
+		}
+
+		if (!__builtin_constant_p(ww_ctx == NULL) && ww_ctx->acquired > 0) {
+			ret = __mutex_lock_check_stamp(lock, ww_ctx);
+			if (ret)
+				goto err;
+		}
+
+		__set_task_state(task, state);
+
+		/* didn't get the lock, go to sleep: */
+		spin_unlock_mutex(&lock->wait_lock, flags);
+		backport_schedule_preempt_disabled();
+		spin_lock_mutex(&lock->wait_lock, flags);
+	}
+
+done:
+	lock_acquired(&lock->dep_map, ip);
+	/* got the lock - rejoice! */
+	mutex_remove_waiter(lock, &waiter, current_thread_info());
+	mutex_set_owner(lock);
+
+	if (!__builtin_constant_p(ww_ctx == NULL)) {
+		struct ww_mutex *ww = container_of(lock,
+						      struct ww_mutex,
+						      base);
+		struct mutex_waiter *cur;
+
+		/*
+		 * This branch gets optimized out for the common case,
+		 * and is only important for ww_mutex_lock.
+		 */
+
+		ww_mutex_lock_acquired(ww, ww_ctx);
+		ww->ctx = ww_ctx;
+
+		/*
+		 * Give any possible sleeping processes the chance to wake up,
+		 * so they can recheck if they have to back off.
+		 */
+		list_for_each_entry(cur, &lock->wait_list, list) {
+			/* debug_mutex_wake_waiter(lock, cur); */
+			wake_up_process(cur->task);
+		}
+	}
+
+	/* set it to 0 if there are no waiters left: */
+	if (likely(list_empty(&lock->wait_list)))
+		atomic_set(&lock->count, 0);
+
+	spin_unlock_mutex(&lock->wait_lock, flags);
+
+	/* debug_mutex_free_waiter(&waiter); */
+	preempt_enable();
+
+	return 0;
+
+err:
+	mutex_remove_waiter(lock, &waiter, task_thread_info(task));
+	spin_unlock_mutex(&lock->wait_lock, flags);
+	/* debug_mutex_free_waiter(&waiter); */
+	mutex_release(&lock->dep_map, 1, ip);
+	preempt_enable();
+	return ret;
+}
+
+static noinline int __sched
+__ww_mutex_lock_slowpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
+{
+	return __backport_mutex_lock_common(&lock->base, TASK_UNINTERRUPTIBLE, 0,
+					    NULL, _RET_IP_, ctx);
+}
+
+static noinline int __sched
+__ww_mutex_lock_interruptible_slowpath(struct ww_mutex *lock,
+					    struct ww_acquire_ctx *ctx)
+{
+	return __backport_mutex_lock_common(&lock->base, TASK_INTERRUPTIBLE, 0,
+					    NULL, _RET_IP_, ctx);
+}
+
+/**
+ * __mutex_fastpath_lock_retval - try to take the lock by moving the count
+ *				  from 1 to a 0 value
+ * @count: pointer of type atomic_t
+ *
+ * For backporting purposes we can't use the older kernel's
+ * __mutex_fastpath_lock_retval() since upon failure of a fastpath
+ * lock we want to call our a failure routine with more than one argument, in
+ * this case the context for ww mutexes. Refer to commit a41b56ef the
+ * argument increase. It'd be painful to backport all asm code for the
+ * supported architectures so instead lets penalize the backport ww mutex
+ * fastpath lock with the not so efficient generic atomic_dec_return()
+ * implementation.
+ *
+ * Change the count from 1 to a value lower than 1. This function returns 0
+ * if the fastpath succeeds, or -1 otherwise.
+ */
+static inline int
+__backport_mutex_fastpath_lock_retval(atomic_t *count)
+{
+	if (unlikely(atomic_dec_return(count) < 0))
+		return -1;
+	return 0;
+}
+
+int __sched
+__ww_mutex_lock(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
+{
+	int ret;
+
+	might_sleep();
+
+	ret = __backport_mutex_fastpath_lock_retval(&lock->base.count);
+
+	if (likely(!ret)) {
+		ww_mutex_set_context_fastpath(lock, ctx);
+		mutex_set_owner(&lock->base);
+	} else
+		ret = __ww_mutex_lock_slowpath(lock, ctx);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(__ww_mutex_lock);
+
+int __sched
+__ww_mutex_lock_interruptible(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
+{
+	int ret;
+
+	might_sleep();
+
+	ret = __backport_mutex_fastpath_lock_retval(&lock->base.count);
+
+	if (likely(!ret)) {
+		ww_mutex_set_context_fastpath(lock, ctx);
+		mutex_set_owner(&lock->base);
+	} else
+		ret = __ww_mutex_lock_interruptible_slowpath(lock, ctx);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(__ww_mutex_lock_interruptible);
-- 
1.7.10.4