Documentation says that code requiring dma-buf should select it in Kconfig, so the inline fallbacks are never going to be used. A link error will make it obvious what went wrong, instead of silently doing nothing at runtime.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
 include/linux/dma-buf.h | 99 -----------------------------------------------
 1 file changed, 99 deletions(-)
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index eb48f38..bd2e52c 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -156,7 +156,6 @@ static inline void get_dma_buf(struct dma_buf *dmabuf) get_file(dmabuf->file); }
-#ifdef CONFIG_DMA_SHARED_BUFFER struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf, struct device *dev); void dma_buf_detach(struct dma_buf *dmabuf, @@ -184,103 +183,5 @@ int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *, unsigned long); void *dma_buf_vmap(struct dma_buf *); void dma_buf_vunmap(struct dma_buf *, void *vaddr); -#else - -static inline struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf, - struct device *dev) -{ - return ERR_PTR(-ENODEV); -} - -static inline void dma_buf_detach(struct dma_buf *dmabuf, - struct dma_buf_attachment *dmabuf_attach) -{ - return; -} - -static inline struct dma_buf *dma_buf_export(void *priv, - const struct dma_buf_ops *ops, - size_t size, int flags) -{ - return ERR_PTR(-ENODEV); -} - -static inline int dma_buf_fd(struct dma_buf *dmabuf, int flags) -{ - return -ENODEV; -} - -static inline struct dma_buf *dma_buf_get(int fd) -{ - return ERR_PTR(-ENODEV); -} - -static inline void dma_buf_put(struct dma_buf *dmabuf) -{ - return; -} - -static inline struct sg_table *dma_buf_map_attachment( - struct dma_buf_attachment *attach, enum dma_data_direction write) -{ - return ERR_PTR(-ENODEV); -} - -static inline void dma_buf_unmap_attachment(struct dma_buf_attachment *attach, - struct sg_table *sg, enum dma_data_direction dir) -{ - return; -} - -static inline int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, - size_t start, size_t len, - enum dma_data_direction dir) -{ - return -ENODEV; -} - -static inline void dma_buf_end_cpu_access(struct dma_buf *dmabuf, - size_t start, size_t len, - enum dma_data_direction dir) -{ -} - -static inline void *dma_buf_kmap_atomic(struct dma_buf *dmabuf, - unsigned long pnum) -{ - return NULL; -} - -static inline void dma_buf_kunmap_atomic(struct dma_buf *dmabuf, - unsigned long pnum, void *vaddr) -{ -} - -static inline void *dma_buf_kmap(struct dma_buf *dmabuf, unsigned long pnum) -{ - return NULL; -} - -static inline void dma_buf_kunmap(struct dma_buf *dmabuf, - unsigned long pnum, void *vaddr) -{ -} - -static inline int dma_buf_mmap(struct dma_buf *dmabuf, - struct vm_area_struct *vma, - unsigned long pgoff) -{ - return -ENODEV; -} - -static inline void *dma_buf_vmap(struct dma_buf *dmabuf) -{ - return NULL; -} - -static inline void dma_buf_vunmap(struct dma_buf *dmabuf, void *vaddr) -{ -} -#endif /* CONFIG_DMA_SHARED_BUFFER */
#endif /* __DMA_BUF_H__ */
A fence can be attached to a buffer which is being filled or consumed by hw, to allow userspace to pass the buffer to another device without waiting. For example, userspace can call the page_flip ioctl to display the next frame of graphics after kicking the GPU, but while the GPU is still rendering. The display device sharing the buffer with the GPU would attach a callback to get notified when the GPU's rendering-complete IRQ fires, to update the scan-out address of the display without having to wake up userspace.
A fence is a transient, one-shot deal. It is allocated and attached to one or more dma-bufs. When the one that attached it is done with the pending operation, it can signal the fence:

  fence_signal()
To get a rough indication of whether a fence has fired, call:

  fence_is_signaled()
The dma-buf-mgr handles tracking, and waiting on, the fences associated with a dma-buf.
The one waiting on the fence can add an async callback:

  fence_add_callback()
The callback can optionally be cancelled with:

  fence_remove_callback()
To wait synchronously, optionally with a timeout:

  fence_wait()
  fence_wait_timeout()
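As a rough sketch of how a driver would consume this sw-only API (everything except the fence_* calls is hypothetical):

	#include <linux/fence.h>
	#include <linux/workqueue.h>

	struct foo_flip {
		struct fence_cb cb;
		struct work_struct work;	/* does the actual scanout update */
	};

	static int foo_flip_cb(struct fence_cb *cb, void *priv)
	{
		struct foo_flip *flip = priv;

		/* runs in atomic context once the fence signals */
		schedule_work(&flip->work);
		return 0;
	}

	static int foo_queue_flip(struct foo_flip *flip, struct fence *fence)
	{
		int ret = fence_add_callback(fence, &flip->cb, foo_flip_cb, flip);

		if (ret == -ENOENT) {
			/* fence already signaled, flip right away */
			schedule_work(&flip->work);
			ret = 0;
		}
		return ret;
	}

The producer that attached the fence completes it later with fence_signal(), and drops its reference with fence_put() once it is done with the fence.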
A default software-only implementation is provided, which can be used by drivers attaching a fence to a buffer when they have no other means for hw sync. But a memory-backed fence is also envisioned, because it is common that GPUs can write to, or poll on, some memory location for synchronization. For example:
  fence = custom_get_fence(...);
  if ((seqno_fence = to_seqno_fence(fence)) != NULL) {
	dma_buf *fence_buf = seqno_fence->sync_buf;
	get_dma_buf(fence_buf);

	... tell the hw the memory location to wait ...
	custom_wait_on(fence_buf, seqno_fence->seqno_ofs, seqno_fence->seqno);
  } else {
	/* fall-back to sw sync */
	fence_add_callback(fence, my_cb);
  }
On SoC platforms, if some other hw mechanism is provided for synchronizing between IP blocks, it could be supported as an alternate implementation with its own fence ops, in a similar way.
To facilitate other non-sw implementations, the enable_signaling callback can be used to keep track of whether a device that does not support hw sync is waiting on the fence; in that case the fence implementation should arrange to call fence_signal() at some point after the condition has changed, to notify the other devices waiting on the fence. If there are no sw waiters, this can be skipped to avoid waking the CPU unnecessarily. The handler of the enable_signaling op should take a refcount until the fence is signaled, then release its ref.
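A minimal sketch of what that looks like on the driver side, with all foo_* names and the irq plumbing made up:

	struct foo_fence {
		struct fence base;
		struct foo_device *dev;
		u32 seqno;
	};

	static bool foo_fence_enable_signaling(struct fence *fence)
	{
		struct foo_fence *f = container_of(fence, struct foo_fence, base);

		if (foo_seqno_passed(f->dev, f->seqno))
			return false;	/* already passed, the core signals the fence */

		/* keep the fence alive until the irq handler has signaled it */
		fence_get(fence);
		foo_enable_completion_irq(f->dev, f->seqno);
		return true;
	}

The matching irq handler would then call fence_signal(&f->base) when the seqno is reached, and drop the extra reference with fence_put(&f->base).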
The intention is to provide a userspace interface (presumably via eventfd) later, to be used in conjunction with dma-buf's mmap support for sw access to buffers (or for userspace apps that would prefer to do their own synchronization).
v1: Original
v2: After discussion w/ danvet and mlankhorst on #dri-devel, we decided that dma-fence didn't need to care about the sw->hw signaling path (it can be handled the same as the sw->sw case), and therefore the fence->ops can be simplified and more handled in the core. So remove the signal, add_callback, cancel_callback, and wait ops, and replace with a simple enable_signaling() op which can be used to inform a fence supporting hw->hw signaling that one or more devices which do not support hw signaling are waiting (and therefore it should enable an irq or do whatever is necessary so that the CPU is notified when the fence is passed).
v3: Fix locking fail in attach_fence() and get_fence()
v4: Remove tie-in w/ dma-buf.. after discussion w/ danvet and mlankhorst we decided that we need to be able to attach one fence to N dma-bufs, so using the list_head in the dma-fence struct would be problematic.
v5: [ Maarten Lankhorst ] Updated for dma-bikeshed-fence and dma-buf-manager.
v6: [ Maarten Lankhorst ] I removed dma_fence_cancel_callback and some comments about checking if the fence fired or not. This is broken by design. waitqueue_active during destruction is now fatal, since the signaller should be holding a reference in enable_signalling until it signalled the fence. Pass the original dma_fence_cb along, and call __remove_wait in the dma_fence_callback handler, so that no cleanup needs to be performed.
v7: [ Maarten Lankhorst ] Set cb->func and only enable sw signaling if the fence wasn't signaled yet, for example for hardware fences that may choose to signal blindly.
v8: [ Maarten Lankhorst ] Tons of tiny fixes, moved __dma_fence_init to the header and fixed the include mess. dma-fence.h now includes dma-buf.h. All members are now initialized, so kmalloc can be used for allocating a dma-fence. More documentation added.
v9: Change compiler bitfields to flags, change return type of enable_signaling to bool. Rework dma_fence_wait. Added dma_fence_is_signaled and dma_fence_wait_timeout. s/dma// and change exports to non GPL. Added fence_is_signaled and fence_enable_sw_signaling calls, add ability to override the default wait operation.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
 Documentation/DocBook/device-drivers.tmpl |   2
 drivers/base/Makefile                     |   2
 drivers/base/fence.c                      | 337 +++++++++++++++++++++++++++++
 include/linux/fence.h                     | 234 ++++++++++++++++++++
 4 files changed, 574 insertions(+), 1 deletion(-)
 create mode 100644 drivers/base/fence.c
 create mode 100644 include/linux/fence.h
diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl index 7514dbf..6f53fc0 100644 --- a/Documentation/DocBook/device-drivers.tmpl +++ b/Documentation/DocBook/device-drivers.tmpl @@ -126,6 +126,8 @@ X!Edrivers/base/interface.c </sect1> <sect1><title>Device Drivers DMA Management</title> !Edrivers/base/dma-buf.c +!Edrivers/base/fence.c +!Iinclude/linux/fence.h !Edrivers/base/dma-coherent.c !Edrivers/base/dma-mapping.c </sect1> diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 5aa2d70..0026563 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -10,7 +10,7 @@ obj-$(CONFIG_CMA) += dma-contiguous.o obj-y += power/ obj-$(CONFIG_HAS_DMA) += dma-mapping.o obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o -obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o +obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o fence.o obj-$(CONFIG_ISA) += isa.o obj-$(CONFIG_FW_LOADER) += firmware_class.o obj-$(CONFIG_NUMA) += node.o diff --git a/drivers/base/fence.c b/drivers/base/fence.c new file mode 100644 index 0000000..b90a09e --- /dev/null +++ b/drivers/base/fence.c @@ -0,0 +1,337 @@ +/* + * Fence mechanism for dma-buf and to allow for asynchronous dma access + * + * Copyright (C) 2012 Canonical Ltd + * Copyright (C) 2012 Texas Instruments + * + * Authors: + * Rob Clark rob.clark@linaro.org + * Maarten Lankhorst maarten.lankhorst@canonical.com + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program. If not, see http://www.gnu.org/licenses/. + */ + +#include <linux/slab.h> +#include <linux/export.h> +#include <linux/fence.h> + +static int __fence_signal(struct fence *fence) +{ + if (fence->flags & FENCE_FLAG_SIGNALED) + return -EINVAL; + + fence->flags |= FENCE_FLAG_SIGNALED; + __wake_up_locked_key(&fence->event_queue, TASK_NORMAL, + &fence->event_queue); + return 0; +} + +/** + * fence_signal - signal completion of a fence + * @fence: the fence to signal + * + * Signal completion for software callbacks on a fence, this will unblock + * fence_wait() calls and run all the callbacks added with + * fence_add_callback(). Can be called multiple times, but since a fence + * can only go from unsignaled to signaled state, it will only be effective + * the first time. 
+ */ +int fence_signal(struct fence *fence) +{ + unsigned long flags; + int ret; + + if (WARN_ON(!fence)) + return -EINVAL; + + spin_lock_irqsave(&fence->event_queue.lock, flags); + ret = __fence_signal(fence); + spin_unlock_irqrestore(&fence->event_queue.lock, flags); + + return ret; +} +EXPORT_SYMBOL(fence_signal); + +static void release_fence(struct kref *kref) +{ + struct fence *fence = + container_of(kref, struct fence, refcount); + + BUG_ON(waitqueue_active(&fence->event_queue)); + + if (fence->ops->release) + fence->ops->release(fence); + + kfree(fence); +} + +/** + * fence_put - decreases refcount of the fence + * @fence: [in] fence to reduce refcount of + */ +void fence_put(struct fence *fence) +{ + if (WARN_ON(!fence)) + return; + kref_put(&fence->refcount, release_fence); +} +EXPORT_SYMBOL(fence_put); + +/** + * fence_get - increases refcount of the fence + * @fence: [in] fence to increase refcount of + */ +void fence_get(struct fence *fence) +{ + if (WARN_ON(!fence)) + return; + kref_get(&fence->refcount); +} +EXPORT_SYMBOL(fence_get); + +static int +__fence_wake_func(wait_queue_t *wait, unsigned mode, int flags, void *key) +{ + struct fence_cb *cb = + container_of(wait, struct fence_cb, base); + + __remove_wait_queue(key, wait); + return cb->func(cb, wait->private); +} + +/** + * fence_enable_sw_signaling - enable signaling on fence + * @fence: [in] the fence to enable + * + * this will request for sw signaling to be enabled, to make the fence + * complete as soon as possible + */ +void fence_enable_sw_signaling(struct fence *fence) +{ + unsigned long flags; + spin_lock_irqsave(&fence->event_queue.lock, flags); + + if (!(fence->flags & (FENCE_FLAG_SIGNALED | + FENCE_FLAG_NEED_SW_SIGNAL))) { + fence->flags |= FENCE_FLAG_NEED_SW_SIGNAL; + + spin_unlock_irqrestore(&fence->event_queue.lock, flags); + + if (!fence->ops->enable_signaling(fence)) + fence_signal(fence); + } else + spin_unlock_irqrestore(&fence->event_queue.lock, flags); +} +EXPORT_SYMBOL(fence_enable_sw_signaling); + +/** + * fence_add_callback - add a callback to be called when the fence + * is signaled + * @fence: [in] the fence to wait on + * @cb: [in] the callback to register + * @func: [in] the function to call + * @priv: [in] the argument to pass to function + * + * cb will be initialized by fence_add_callback, no initialization + * by the caller is required. Any number of callbacks can be registered + * to a fence, but a callback can only be registered to one fence at a time. + * + * Note that the callback can be called from an atomic context. If + * fence is already signaled, this function will return -ENOENT (and + * *not* call the callback) + * + * Add a software callback to the fence. Same restrictions apply to + * refcount as it does to fence_wait, however the caller doesn't need to + * keep a refcount to fence afterwards: when software access is enabled, + * the creator of the fence is required to keep the fence alive until + * after it signals with fence_signal. The callback itself can be called + * from irq context. 
+ * + */ +int fence_add_callback(struct fence *fence, struct fence_cb *cb, + fence_func_t func, void *priv) +{ + unsigned long flags; + int ret; + + if (WARN_ON(!fence || !func)) + return -EINVAL; + + spin_lock_irqsave(&fence->event_queue.lock, flags); + + if (!(fence->flags & (FENCE_FLAG_SIGNALED | + FENCE_FLAG_NEED_SW_SIGNAL))) { + bool enabled; + + fence->flags |= FENCE_FLAG_NEED_SW_SIGNAL; + + /* Drop lock here, else there might be lock inversion when + * enable_signaling takes a lock that's held when + * fence_signal is called. + */ + spin_unlock_irqrestore(&fence->event_queue.lock, flags); + enabled = fence->ops->enable_signaling(fence); + spin_lock_irqsave(&fence->event_queue.lock, flags); + + if (!enabled) + __fence_signal(fence); + } + + if (fence->flags & FENCE_FLAG_SIGNALED) + ret = -ENOENT; + else { + ret = 0; + + cb->base.flags = 0; + cb->base.func = __fence_wake_func; + cb->base.private = priv; + cb->fence = fence; + cb->func = func; + __add_wait_queue(&fence->event_queue, &cb->base); + } + spin_unlock_irqrestore(&fence->event_queue.lock, flags); + + return ret; +} +EXPORT_SYMBOL(fence_add_callback); + +/** + * fence_remove_callback - remove a callback from the signaling list + * @fence: [in] the fence to wait on + * @cb: [in] the callback to remove + * + * Remove a previously queued callback from the fence. This function returns + * true is the callback is succesfully removed, or false if the fence has + * already been signaled. + * + * *WARNING*: + * Cancelling a callback should only be done if you really know what you're + * doing, since deadlocks and race conditions could occur all too easily. For + * this reason, it should only ever be done on hardware lockup recovery, + * with a reference held to the fence. + */ +bool +fence_remove_callback(struct fence *fence, struct fence_cb *cb) +{ + unsigned long flags; + bool ret; + + spin_lock_irqsave(&fence->event_queue.lock, flags); + + ret = !(fence->flags & FENCE_FLAG_SIGNALED); + if (ret) + __remove_wait_queue(&fence->event_queue, &cb->base); + spin_unlock_irqrestore(&fence->event_queue.lock, flags); + + return ret; +} +EXPORT_SYMBOL(fence_remove_callback); + +/** + * fence_default_wait - default sleep until the fence gets signaled + * or until timeout elapses + * @fence: [in] the fence to wait on + * @intr: [in] if true, do an interruptible wait + * @timeout: [in] timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT + * + * Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or the + * remaining timeout in jiffies on success. 
+ */ +long +fence_default_wait(struct fence *fence, bool intr, signed long timeout) +{ + DECLARE_WAITQUEUE(wait, current); + unsigned long flags; + long ret = timeout; + + spin_lock_irqsave(&fence->event_queue.lock, flags); + + if (!(fence->flags & FENCE_FLAG_SIGNALED)) { + bool enable = false; + + if (intr && signal_pending(current)) { + ret = -ERESTARTSYS; + goto out; + } + + if (!(fence->flags & FENCE_FLAG_NEED_SW_SIGNAL)) { + fence->flags |= FENCE_FLAG_NEED_SW_SIGNAL; + enable = true; + } + + __add_wait_queue(&fence->event_queue, &wait); + while (!(fence->flags & FENCE_FLAG_SIGNALED) && ret > 0) { + if (intr) + set_current_state(TASK_INTERRUPTIBLE); + else + set_current_state(TASK_UNINTERRUPTIBLE); + spin_unlock_irqrestore(&fence->event_queue.lock, + flags); + + if (enable) { + if (!fence->ops->enable_signaling(fence)) + fence_signal(fence); + enable = false; + } + + ret = schedule_timeout(ret); + + spin_lock_irqsave(&fence->event_queue.lock, flags); + if (ret > 0 && intr && signal_pending(current)) + ret = -ERESTARTSYS; + } + __remove_wait_queue(&fence->event_queue, &wait); + __set_current_state(TASK_RUNNING); + } +out: + spin_unlock_irqrestore(&fence->event_queue.lock, flags); + return ret; +} +EXPORT_SYMBOL(fence_default_wait); + +static bool sw_enable_signaling(struct fence *fence) +{ + /* fence_create sets needs_sw_signal, + * so this should never be called + */ + WARN_ON_ONCE(1); + return true; +} + +static const struct fence_ops sw_fence_ops = { + .enable_signaling = sw_enable_signaling, +}; + +/** + * fence_create - create a simple sw-only fence + * @priv: [in] the value to use for the priv member + * + * This fence only supports signaling from/to CPU. Other implementations + * of fence can be used to support hardware to hardware signaling, if + * supported by the hardware, and use the fence_helper_* functions for + * compatibility with other devices that only support sw signaling. + */ +struct fence *fence_create(void *priv) +{ + struct fence *fence; + + fence = kmalloc(sizeof(struct fence), GFP_KERNEL); + if (!fence) + return NULL; + + __fence_init(fence, &sw_fence_ops, priv); + fence->flags |= FENCE_FLAG_NEED_SW_SIGNAL; + + return fence; +} +EXPORT_SYMBOL(fence_create); diff --git a/include/linux/fence.h b/include/linux/fence.h new file mode 100644 index 0000000..623af026 --- /dev/null +++ b/include/linux/fence.h @@ -0,0 +1,234 @@ +/* + * Fence mechanism for dma-buf to allow for asynchronous dma access + * + * Copyright (C) 2012 Canonical Ltd + * Copyright (C) 2012 Texas Instruments + * + * Authors: + * Rob Clark rob.clark@linaro.org + * Maarten Lankhorst maarten.lankhorst@canonical.com + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program. If not, see http://www.gnu.org/licenses/. 
+ */ + +#ifndef __FENCE_H__ +#define __FENCE_H__ + +#include <linux/err.h> +#include <linux/wait.h> +#include <linux/list.h> +#include <linux/bitops.h> +#include <linux/kref.h> +#include <linux/sched.h> + +struct fence; +struct fence_ops; +struct fence_cb; + +/** + * struct fence - software synchronization primitive + * @refcount: refcount for this fence + * @ops: fence_ops associated with this fence + * @event_queue: event queue used for signaling fence + * @priv: fence specific private data + * @flags: A mask of FENCE_FLAG_* defined below + * + * FENCE_FLAG_NEED_SW_SIGNAL - enable_signaling has been called + * FENCE_FLAG_SIGNALED - fence is already signaled + */ +struct fence { + struct kref refcount; + const struct fence_ops *ops; + wait_queue_head_t event_queue; + void *priv; + unsigned long flags; +}; +#define FENCE_FLAG_SIGNALED BIT(0) +#define FENCE_FLAG_NEED_SW_SIGNAL BIT(1) + +typedef int (*fence_func_t)(struct fence_cb *cb, void *priv); + +/** + * struct fence_cb - callback for fence_add_callback + * @base: wait_queue_t added to event_queue + * @func: fence_func_t to call + * @fence: fence this fence_cb was used on + * + * This struct will be initialized by fence_add_callback, additional + * data can be passed along by embedding fence_cb in another struct. + */ +struct fence_cb { + wait_queue_t base; + fence_func_t func; + struct fence *fence; +}; + +/** + * struct fence_ops - operations implemented for fence + * @enable_signaling: enable software signaling of fence + * @signaled: [optional] peek whether the fence is signaled + * @release: [optional] called on destruction of fence + * + * Notes on enable_signaling: + * For fence implementations that have the capability for hw->hw + * signaling, they can implement this op to enable the necessary + * irqs, or insert commands into cmdstream, etc. This is called + * in the first wait() or add_callback() path to let the fence + * implementation know that there is another driver waiting on + * the signal (ie. hw->sw case). + * + * This function can be called called from atomic context, but not + * from irq context, so normal spinlocks can be used. + * + * A return value of false indicates the fence already passed, + * or some failure occured that made it impossible to enable + * signaling. True indicates succesful enabling. + * + * Calling fence_signal before enable_signaling is called allows + * for a tiny race window in which enable_signaling is called during, + * before, or after fence_signal. To fight this, it is recommended + * that before enable_signaling returns true an extra reference is + * taken on the fence, to be released when the fence is signaled. + * This will mean fence_signal will still be called twice, but + * the second time will be a noop since it was already signaled. + * + * Notes on release: + * Can be NULL, this function allows additional commands to run on + * destruction of the fence. Can be called from irq context. + */ + +struct fence_ops { + bool (*enable_signaling)(struct fence *fence); + bool (*signaled)(struct fence *fence); + long (*wait)(struct fence *fence, bool intr, signed long); + void (*release)(struct fence *fence); +}; + +struct fence *fence_create(void *priv); + +/** + * __fence_init - Initialize a custom fence. 
+ * @fence: [in] the fence to initialize + * @ops: [in] the fence_ops for operations on this fence + * @priv: [in] the value to use for the priv member + * + * Initializes an allocated fence, the caller doesn't have to keep its + * refcount after committing with this fence, but it will need to hold a + * refcount again if fence_ops.enable_signaling gets called. This can + * be used for other implementing other types of fence. + */ +static inline void +__fence_init(struct fence *fence, const struct fence_ops *ops, void *priv) +{ + WARN_ON(!ops || !ops->enable_signaling || !ops->wait); + + kref_init(&fence->refcount); + fence->ops = ops; + fence->priv = priv; + fence->flags = 0UL; + init_waitqueue_head(&fence->event_queue); +} + +void fence_get(struct fence *fence); +void fence_put(struct fence *fence); + +int fence_signal(struct fence *fence); +long fence_default_wait(struct fence *fence, bool intr, signed long); +int fence_add_callback(struct fence *fence, struct fence_cb *cb, + fence_func_t func, void *priv); +bool fence_remove_callback(struct fence *fence, struct fence_cb *cb); +void fence_enable_sw_signaling(struct fence *fence); + +/** + * fence_is_signaled - Return an indication if the fence is signaled yet. + * @fence: [in] the fence to check + * + * Returns true if the fence was already signaled, false if not. Since this + * function doesn't enable signaling, it is not guaranteed to ever return true + * If fence_add_callback, fence_wait or fence_enable_sw_signaling + * haven't been called before. + * + * It's recommended for seqno fences to call fence_signal when the + * operation is complete, it makes it possible to prevent issues from + * wraparound between time of issue and time of use by checking the return + * value of this function before calling hardware-specific wait instructions. + */ +static inline bool +fence_is_signaled(struct fence *fence) +{ + rmb(); + + if (fence->flags & FENCE_FLAG_SIGNALED) + return true; + + if (fence->ops->signaled && fence->ops->signaled(fence)) { + fence_signal(fence); + return true; + } + + return false; +} + +/** + * fence_wait_timeout - sleep until the fence gets signaled + * or until timeout elapses + * @fence: [in] the fence to wait on + * @intr: [in] if true, do an interruptible wait + * @timeout: [in] timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT + * + * Returns -ERESTARTSYS if interrupted, 0 if the wait timed out, or the + * remaining timeout in jiffies on success. Other error values may be + * returned on custom implementations. + * + * Performs a synchronous wait on this fence. It is assumed the caller + * directly or indirectly (buf-mgr between reservation and committing) + * holds a reference to the fence, otherwise the fence might be + * freed before return, resulting in undefined behavior. + */ +static inline long +fence_wait_timeout(struct fence *fence, bool intr, signed long timeout) +{ + if (WARN_ON(timeout < 0)) + return -EINVAL; + + return fence->ops->wait(fence, intr, timeout); +} + +/** + * fence_wait - sleep until the fence gets signaled + * @fence: [in] the fence to wait on + * @intr: [in] if true, do an interruptible wait + * + * This function will return -ERESTARTSYS if interrupted by a signal, + * or 0 if the fence was signaled. Other error values may be + * returned on custom implementations. + * + * Performs a synchronous wait on this fence. 
It is assumed the caller + * directly or indirectly (buf-mgr between reservation and committing) + * holds a reference to the fence, otherwise the fence might be + * freed before return, resulting in undefined behavior. + */ +static inline long fence_wait(struct fence *fence, bool intr) +{ + long ret; + + /* Since fence_wait_timeout cannot timeout with + * MAX_SCHEDULE_TIMEOUT, only valid return values are + * -ERESTARTSYS and MAX_SCHEDULE_TIMEOUT. + */ + ret = fence_wait_timeout(fence, intr, MAX_SCHEDULE_TIMEOUT); + + return ret < 0 ? ret : 0; +} + +#endif /* __FENCE_H__ */
On 28-09-12 14:42, Maarten Lankhorst wrote:
A fence can be attached to a buffer which is being filled or consumed by hw, to allow userspace to pass the buffer to another device without waiting. For example, userspace can call the page_flip ioctl to display the next frame of graphics after kicking the GPU, but while the GPU is still rendering. The display device sharing the buffer with the GPU would attach a callback to get notified when the GPU's rendering-complete IRQ fires, to update the scan-out address of the display without having to wake up userspace.
A fence is a transient, one-shot deal. It is allocated and attached to one or more dma-bufs. When the one that attached it is done with the pending operation, it can signal the fence:
- fence_signal()
To get a rough indication of whether a fence has fired, call:
- fence_is_signaled()
The dma-buf-mgr handles tracking, and waiting on, the fences associated with a dma-buf.
The one pending on the fence can add an async callback:
- fence_add_callback()
The callback can optionally be cancelled with:
- fence_remove_callback()
To wait synchronously, optionally with a timeout:
- fence_wait()
- fence_wait_timeout()
...
Implementing this makes the locking feel weird. Instead of abusing fence->event_queue.lock, I think it would make sense to remove this part entirely and have a simple linked list plus a pointer to a spinlock.
enable_signaling should be called with this spinlock held. This way, the enable_signaling callback would be easier to implement and wouldn't have to double-check for races as much.
Also, __fence_signal should be exported; it would be the same as fence_signal, but without taking the spinlock as a separate step, so it can be called with the spinlock already held.
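Roughly the direction I have in mind (untested sketch, names only illustrative):

	struct fence {
		struct kref refcount;
		const struct fence_ops *ops;
		struct list_head cb_list;
		spinlock_t *lock;	/* supplied by the creator, protects cb_list */
		void *priv;
		unsigned long flags;
	};

	/* to be called with *fence->lock already held, e.g. from an irq handler */
	int __fence_signal(struct fence *fence);

	/* fence_add_callback() would then take *fence->lock itself and call
	 * ops->enable_signaling() with that lock still held, so the callback
	 * no longer has to re-check for a concurrent fence_signal(). */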
How do you feel about these proposed changes?
~Maarten
This type of fence can be used with hardware synchronization for simple hardware that can block execution until the condition (dma_buf[offset] - value) >= 0 has been met.
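In other words, the check boils down to a wraparound-safe seqno comparison, which is essentially what a driver's .signaled op would do in software. A sketch, where the foo_* mapping helper is made up and seqno_ofs is assumed to be a byte offset:

	static bool foo_seqno_signaled(struct fence *fence)
	{
		struct seqno_fence *sf = to_seqno_fence(fence);
		const u32 *page = foo_kmap_sync_buf(sf->sync_buf);	/* hypothetical */

		/* signed difference so seqno wraparound is handled correctly */
		return (s32)(page[sf->seqno_ofs / 4] - sf->seqno) >= 0;
	}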
A software fallback still has to be provided in case the fence is used with a device that doesn't support this mechanism. It is useful to expose this for graphics cards that have an op to support this.
Some cards like i915 can export those, but don't have an option to wait, so they need the software fallback.
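To give an idea of the intended use, a driver could set one of these up roughly as follows; the foo_* ops (the .signaled one sketched above, the rest the driver's own) and the surrounding code are made up:

	static const struct fence_ops foo_seqno_ops = {
		.enable_signaling = foo_seqno_enable_signaling,	/* sw fallback path */
		.signaled	  = foo_seqno_signaled,		/* optional peek */
		.wait		  = fence_default_wait,
	};

	struct seqno_fence *sf;

	sf = kmalloc(sizeof(*sf), GFP_KERNEL);
	if (!sf)
		return -ENOMEM;

	/* the gpu writes seqno into sync_buf at seqno_ofs when the job is done;
	 * seqno_fence_init() takes its own reference on sync_buf */
	seqno_fence_init(sf, sync_buf, seqno_ofs, ++dev->emitted_seqno,
			 NULL, &foo_seqno_ops);

Another device can then use to_seqno_fence() to detect this fence type and either program a hardware wait on sync_buf, or fall back to fence_add_callback()/fence_wait() as in the earlier example.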
I extended the original patch by Rob Clark.
v1: Original
v2: Renamed from bikeshed to seqno, moved into dma-fence.c since not much was left of the file. Lots of documentation added.
v3: Use fence_ops instead of custom callbacks. Moved to own file to avoid circular dependency between dma-buf.h and fence.h
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
 Documentation/DocBook/device-drivers.tmpl |   1
 drivers/base/fence.c                      |  36 ++++++++++
 include/linux/seqno-fence.h               | 107 +++++++++++++++++++++++++++++
 3 files changed, 144 insertions(+)
 create mode 100644 include/linux/seqno-fence.h
diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl index 6f53fc0..ad14396 100644 --- a/Documentation/DocBook/device-drivers.tmpl +++ b/Documentation/DocBook/device-drivers.tmpl @@ -128,6 +128,7 @@ X!Edrivers/base/interface.c !Edrivers/base/dma-buf.c !Edrivers/base/fence.c !Iinclude/linux/fence.h +!Iinclude/linux/seqno-fence.h !Edrivers/base/dma-coherent.c !Edrivers/base/dma-mapping.c </sect1> diff --git a/drivers/base/fence.c b/drivers/base/fence.c index b90a09e..0a184b2 100644 --- a/drivers/base/fence.c +++ b/drivers/base/fence.c @@ -24,6 +24,7 @@ #include <linux/slab.h> #include <linux/export.h> #include <linux/fence.h> +#include <linux/seqno-fence.h>
static int __fence_signal(struct fence *fence) { @@ -335,3 +336,38 @@ struct fence *fence_create(void *priv) return fence; } EXPORT_SYMBOL(fence_create); + +static bool seqno_enable_signaling(struct fence *fence) +{ + struct seqno_fence *seqno_fence = to_seqno_fence(fence); + return seqno_fence->ops->enable_signaling(fence); +} + +static bool seqno_signaled(struct fence *fence) +{ + struct seqno_fence *seqno_fence = to_seqno_fence(fence); + return seqno_fence->ops->signaled && seqno_fence->ops->signaled(fence); +} + +static void seqno_release(struct fence *fence) +{ + struct seqno_fence *f = to_seqno_fence(fence); + + if (f->ops->release) + f->ops->release(fence); + dma_buf_put(f->sync_buf); +} + +static long seqno_wait(struct fence *fence, bool intr, signed long timeout) +{ + struct seqno_fence *f = to_seqno_fence(fence); + return f->ops->wait(fence, intr, timeout); +} + +const struct fence_ops seqno_fence_ops = { + .enable_signaling = seqno_enable_signaling, + .signaled = seqno_signaled, + .wait = seqno_wait, + .release = seqno_release +}; +EXPORT_SYMBOL_GPL(seqno_fence_ops); diff --git a/include/linux/seqno-fence.h b/include/linux/seqno-fence.h new file mode 100644 index 0000000..971cebe --- /dev/null +++ b/include/linux/seqno-fence.h @@ -0,0 +1,107 @@ +/* + * seqno-fence, using a dma-buf to synchronize fencing + * + * Copyright (C) 2012 Texas Instruments + * Copyright (C) 2012 Canonical Ltd + * Authors: + * Rob Clark rob.clark@linaro.org + * Maarten Lankhorst maarten.lankhorst@canonical.com + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program. If not, see http://www.gnu.org/licenses/. + */ + +#ifndef __SEQNO_FENCE_H__ +#define __SEQNO_FENCE_H__ + +#include <linux/fence.h> +#include <linux/dma-buf.h> + +struct seqno_fence { + struct fence base; + + const struct fence_ops *ops; + struct dma_buf *sync_buf; + uint32_t seqno_ofs; + uint32_t seqno; +}; + +extern const struct fence_ops seqno_fence_ops; + +/** + * to_seqno_fence - cast a fence to a seqno_fence + * @fence: fence to cast to a seqno_fence + * + * Returns NULL if the fence is not a seqno_fence, + * or the seqno_fence otherwise. + */ +static inline struct seqno_fence * +to_seqno_fence(struct fence *fence) +{ + if (fence->ops != &seqno_fence_ops) + return NULL; + return container_of(fence, struct seqno_fence, base); +} + +/** + * seqno_fence_init - initialize a seqno fence + * @fence: seqno_fence to initialize + * @sync_buf: buffer containing the memory location to signal on + * @seqno_ofs: the offset within @sync_buf + * @seqno: the sequence # to signal on + * @priv: value of priv member + * @ops: the fence_ops for operations on this seqno fence + * + * This function initializes a struct seqno_fence with passed parameters, + * and takes a reference on sync_buf which is released on fence destruction. 
+ * + * A seqno_fence is a dma_fence which can complete in software when + * enable_signaling is called, but it also completes when + * (s32)((sync_buf)[seqno_ofs] - seqno) >= 0 is true + * + * The seqno_fence will take a refcount on the sync_buf until it's + * destroyed, but actual lifetime of sync_buf may be longer if one of the + * callers take a reference to it. + * + * Certain hardware have instructions to insert this type of wait condition + * in the command stream, so no intervention from software would be needed. + * This type of fence can be destroyed before completed, however a reference + * on the sync_buf dma-buf can be taken. It is encouraged to re-use the same + * dma-buf for sync_buf, since mapping or unmapping the sync_buf to the + * device's vm can be expensive. + * + * It is recommended for creators of seqno_fence to call fence_signal + * before destruction. This will prevent possible issues from wraparound at + * time of issue vs time of check, since users can check fence_is_signaled + * before submitting instructions for the hardware to wait on the fence. + * However, when ops.enable_signaling is not called, it doesn't have to be + * done as soon as possible, just before there's any real danger of seqno + * wraparound. + */ +static inline void +seqno_fence_init(struct seqno_fence *fence, + struct dma_buf *sync_buf, + uint32_t seqno_ofs, uint32_t seqno, void *priv, + const struct fence_ops *ops) +{ + BUG_ON(!fence || !sync_buf || !ops->enable_signaling || !ops->wait); + + __fence_init(&fence->base, &seqno_fence_ops, priv); + + get_dma_buf(sync_buf); + fence->ops = ops; + fence->sync_buf = sync_buf; + fence->seqno_ofs = seqno_ofs; + fence->seqno = seqno; +} + +#endif /* __SEQNO_FENCE_H__ */
This adds support for a generic reservations framework that can be hooked up to ttm and dma-buf and allows easy sharing of reservations across devices.
The idea is that a dma-buf and ttm object both will get a pointer to a struct reservation_object, which has to be reserved before anything is done with the buffer.
Some followup patches are needed in ttm so the lru_lock is no longer taken during the reservation step. This makes the lockdep annotation patch a lot more useful, and the assumption that the lru lock protects atomic removal off the lru list will fail soon, anyway.
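To illustrate the intended flow, a (hypothetical) submission path would look roughly like this; only the reservation_*/ticket_* calls come from this patch:

	struct reservation_ticket ticket;
	struct reservation_entry entry[2];
	LIST_HEAD(list);
	struct fence *fence;
	int i, ret;

	for (i = 0; i < 2; i++) {
		reservation_entry_init(&entry[i], &list, buf[i]->resv,
				       i == 1 /* second buffer is only read */);
		/* the helper as posted only packs the pointer and shared bit,
		 * so the list append is done explicitly here */
		list_add_tail(&entry[i].head, &list);
	}

	ret = ticket_reserve(&ticket, &list);	/* deadlock-avoiding multi-reserve */
	if (ret)
		return ret;

	fence = foo_submit(...);		/* hw work touching both buffers */
	if (IS_ERR(fence)) {
		ticket_backoff(&ticket, &list);	/* undo the reservations */
		return PTR_ERR(fence);
	}
	ticket_commit(&ticket, &list, fence);	/* attach fence and unreserve */

buf[i]->resv and foo_submit() are made up; the point is only the reserve/commit/backoff pairing.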
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
 Documentation/DocBook/device-drivers.tmpl |   2
 drivers/base/Makefile                     |   2
 drivers/base/reservation.c                | 285 +++++++++++++++++++++++++++++
 include/linux/reservation.h               | 179 ++++++++++++++++++
 4 files changed, 467 insertions(+), 1 deletion(-)
 create mode 100644 drivers/base/reservation.c
 create mode 100644 include/linux/reservation.h
diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl index ad14396..24e6e80 100644 --- a/Documentation/DocBook/device-drivers.tmpl +++ b/Documentation/DocBook/device-drivers.tmpl @@ -129,6 +129,8 @@ X!Edrivers/base/interface.c !Edrivers/base/fence.c !Iinclude/linux/fence.h !Iinclude/linux/seqno-fence.h +!Edrivers/base/reservation.c +!Iinclude/linux/reservation.h !Edrivers/base/dma-coherent.c !Edrivers/base/dma-mapping.c </sect1> diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 0026563..f6f731d 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -10,7 +10,7 @@ obj-$(CONFIG_CMA) += dma-contiguous.o obj-y += power/ obj-$(CONFIG_HAS_DMA) += dma-mapping.o obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o -obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o fence.o +obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o fence.o reservation.o obj-$(CONFIG_ISA) += isa.o obj-$(CONFIG_FW_LOADER) += firmware_class.o obj-$(CONFIG_NUMA) += node.o diff --git a/drivers/base/reservation.c b/drivers/base/reservation.c new file mode 100644 index 0000000..93e2d9f --- /dev/null +++ b/drivers/base/reservation.c @@ -0,0 +1,285 @@ +/* + * Copyright (C) 2012 Canonical Ltd + * + * Based on bo.c which bears the following copyright notice, + * but is dual licensed: + * + * Copyright (c) 2006-2009 VMware, Inc., Palo Alto, CA., USA + * All Rights Reserved. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the + * "Software"), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sub license, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice (including the + * next paragraph) shall be included in all copies or substantial portions + * of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, + * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR + * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE + * USE OR OTHER DEALINGS IN THE SOFTWARE. + * + **************************************************************************/ +/* + * Authors: Thomas Hellstrom <thellstrom-at-vmware-dot-com> + */ + +#include <linux/fence.h> +#include <linux/reservation.h> +#include <linux/export.h> +#include <linux/sched.h> +#include <linux/slab.h> + +atomic64_t reservation_counter = ATOMIC64_INIT(1); +EXPORT_SYMBOL(reservation_counter); + +int +object_reserve(struct reservation_object *obj, bool intr, bool no_wait, + reservation_ticket_t *ticket) +{ + int ret; + u64 sequence = ticket ? ticket->seqno : 1; + u64 oldseq; + + while (unlikely(oldseq = atomic64_cmpxchg(&obj->reserved, 0, sequence))) { + + /** + * Deadlock avoidance for multi-obj reserving. + */ + if (sequence > 1 && oldseq > 1) { + /** + * We've already reserved this one. + */ + if (unlikely(sequence == oldseq)) + return -EDEADLK; + /** + * Already reserved by a thread that will not back + * off for us. We need to back off. 
+ */ + if (unlikely(sequence - oldseq < (1ULL << 63))) + return -EAGAIN; + } + + if (no_wait) + return -EBUSY; + + ret = object_wait_unreserved(obj, intr); + + if (unlikely(ret)) + return ret; + } + + /** + * Wake up waiters that may need to recheck for deadlock, + * if we decreased the sequence number. + */ + wake_up_all(&obj->event_queue); + + return 0; +} +EXPORT_SYMBOL(object_reserve); + +int +object_wait_unreserved(struct reservation_object *obj, bool intr) +{ + if (intr) { + return wait_event_interruptible(obj->event_queue, + !object_is_reserved(obj)); + } else { + wait_event(obj->event_queue, + !object_is_reserved(obj)); + return 0; + } +} +EXPORT_SYMBOL(object_wait_unreserved); + +void +object_unreserve(struct reservation_object *obj, + reservation_ticket_t *ticket) +{ + smp_mb(); + atomic64_set(&obj->reserved, 0); + wake_up_all(&obj->event_queue); +} +EXPORT_SYMBOL(object_unreserve); + +/** + * ticket_backoff - cancel a reservation + * @ticket: [in] a reservation_ticket + * @entries: [in] the list list of reservation_entry entries to unreserve + * + * This function cancels a previous reservation done by + * ticket_reserve. This is useful in case something + * goes wrong between reservation and committing. + * + * This should only be called after ticket_reserve returns success. + */ +void +ticket_backoff(struct reservation_ticket *ticket, struct list_head *entries) +{ + struct list_head *cur; + + if (list_empty(entries)) + return; + + list_for_each(cur, entries) { + struct reservation_object *obj; + + reservation_entry_get(cur, &obj, NULL); + + object_unreserve(obj, ticket); + } + reservation_ticket_fini(ticket); +} +EXPORT_SYMBOL(ticket_backoff); + +static void +ticket_backoff_early(struct reservation_ticket *ticket, + struct list_head *list, + struct reservation_entry *entry) +{ + list_for_each_entry_continue_reverse(entry, list, head) { + struct reservation_object *obj; + + reservation_entry_get(&entry->head, &obj, NULL); + object_unreserve(obj, ticket); + } + reservation_ticket_fini(ticket); +} + +/** + * ticket_reserve - reserve a list of reservation_entry + * @ticket: [out] a reservation_ticket + * @entries: [in] a list of entries to reserve. + * + * Do not initialize ticket, it will be initialized by this function. + * + * XXX: Nuke rest + * The caller will have to queue waits on those fences before calling + * ufmgr_fence_buffer_objects, with either hardware specific methods, + * fence_add_callback will, or fence_wait. + * + * As such, by incrementing refcount on reservation_entry before calling + * fence_add_callback, and making the callback decrement refcount on + * reservation_entry, or releasing refcount if fence_add_callback + * failed, the reservation_entry will be freed when all the fences + * have been signaled, and only after the last ref is released, which should + * be after ufmgr_fence_buffer_objects. With proper locking, when the + * list_head holding the list of reservation_entry's becomes empty it + * indicates all fences for all bufs have been signaled. 
+ */ +int +ticket_reserve(struct reservation_ticket *ticket, + struct list_head *entries) +{ + struct list_head *cur; + int ret; + + if (list_empty(entries)) + return 0; + +retry: + reservation_ticket_init(ticket); + + list_for_each(cur, entries) { + struct reservation_entry *entry; + struct reservation_object *bo; + bool shared; + + entry = reservation_entry_get(cur, &bo, &shared); + + ret = object_reserve(bo, true, false, ticket); + switch (ret) { + case 0: + break; + case -EAGAIN: + ticket_backoff_early(ticket, entries, entry); + ret = object_wait_unreserved(bo, true); + if (unlikely(ret != 0)) + return ret; + goto retry; + default: + ticket_backoff_early(ticket, entries, entry); + return ret; + } + + if (shared && + bo->fence_shared_count == BUF_MAX_SHARED_FENCE) { + WARN_ON_ONCE(1); + ticket_backoff_early(ticket, entries, entry); + return -EINVAL; + } + } + + return 0; +} +EXPORT_SYMBOL(ticket_reserve); + +/** + * ticket_commit - commit a reservation with a new fence + * @ticket: [in] the reservation_ticket returned by + * ticket_reserve + * @entries: [in] a linked list of struct reservation_entry + * @fence: [in] the fence that indicates completion + * + * This function will call reservation_ticket_fini, no need + * to do it manually. + * + * This function should be called after a hardware command submission is + * completed succesfully. The fence is used to indicate completion of + * those commands. + */ +void +ticket_commit(struct reservation_ticket *ticket, + struct list_head *entries, struct fence *fence) +{ + struct list_head *cur; + + if (list_empty(entries)) + return; + + if (WARN_ON(!fence)) { + ticket_backoff(ticket, entries); + return; + } + + list_for_each(cur, entries) { + struct reservation_object *bo; + bool shared; + + reservation_entry_get(cur, &bo, &shared); + + if (!shared) { + int i; + for (i = 0; i < bo->fence_shared_count; ++i) { + fence_put(bo->fence_shared[i]); + bo->fence_shared[i] = NULL; + } + bo->fence_shared_count = 0; + if (bo->fence_excl) + fence_put(bo->fence_excl); + + bo->fence_excl = fence; + } else { + if (WARN_ON(bo->fence_shared_count >= + ARRAY_SIZE(bo->fence_shared))) { + continue; + } + + bo->fence_shared[bo->fence_shared_count++] = fence; + } + fence_get(fence); + + object_unreserve(bo, ticket); + } + reservation_ticket_fini(ticket); +} +EXPORT_SYMBOL(ticket_commit); diff --git a/include/linux/reservation.h b/include/linux/reservation.h new file mode 100644 index 0000000..93280af --- /dev/null +++ b/include/linux/reservation.h @@ -0,0 +1,179 @@ +/* + * Header file for reservations for dma-buf and ttm + * + * Copyright(C) 2011 Linaro Limited. All rights reserved. + * Copyright (C) 2012 Canonical Ltd + * Copyright (C) 2012 Texas Instruments + * + * Authors: + * Rob Clark rob.clark@linaro.org + * Maarten Lankhorst maarten.lankhorst@canonical.com + * Thomas Hellstrom <thellstrom-at-vmware-dot-com> + * + * Based on bo.c which bears the following copyright notice, + * but is dual licensed: + * + * Copyright (c) 2006-2009 VMware, Inc., Palo Alto, CA., USA + * All Rights Reserved. 
+ * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the + * "Software"), to deal in the Software without restriction, including + * without limitation the rights to use, copy, modify, merge, publish, + * distribute, sub license, and/or sell copies of the Software, and to + * permit persons to whom the Software is furnished to do so, subject to + * the following conditions: + * + * The above copyright notice and this permission notice (including the + * next paragraph) shall be included in all copies or substantial portions + * of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, + * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR + * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE + * USE OR OTHER DEALINGS IN THE SOFTWARE. + */ +#ifndef __RESERVATION_H__ +#define __RESERVATION_H__ + +#define BUF_MAX_SHARED_FENCE 8 + +#include <linux/fence.h> + +extern atomic64_t reservation_counter; + +struct reservation_object { + wait_queue_head_t event_queue; + + atomic64_t reserved; + + u32 fence_shared_count; + struct fence *fence_excl; + struct fence *fence_shared[BUF_MAX_SHARED_FENCE]; +}; + +typedef struct reservation_ticket { + u64 seqno; +} reservation_ticket_t; + +/** + * struct reservation_entry - reservation structure for a + * reservation_object + * @head: list entry + * @obj_shared: pointer to a reservation_object to reserve + * + * Bit 0 of obj_shared is set to bool shared, as such pointer has to be + * converted back, which can be done with reservation_entry_get. + */ +struct reservation_entry { + struct list_head head; + unsigned long obj_shared; +}; + + +static inline void +__reservation_object_init(struct reservation_object *obj) +{ + init_waitqueue_head(&obj->event_queue); +} + +static inline void +reservation_object_init(struct reservation_object *obj) +{ + memset(obj, 0, sizeof(*obj)); + __reservation_object_init(obj); +} + +static inline bool +object_is_reserved(struct reservation_object *obj) +{ + return !!atomic64_read(&obj->reserved); +} + +static inline void +reservation_object_fini(struct reservation_object *obj) +{ + int i; + + BUG_ON(waitqueue_active(&obj->event_queue)); + BUG_ON(object_is_reserved(obj)); + + if (obj->fence_excl) + fence_put(obj->fence_excl); + for (i = 0; i < obj->fence_shared_count; ++i) + fence_put(obj->fence_shared[i]); +} + +static inline void +reservation_ticket_init(struct reservation_ticket *t) +{ + do { + t->seqno = atomic64_inc_return(&reservation_counter); + } while (unlikely(t->seqno < 2)); +} + +/** + * reservation_ticket_fini - end a reservation ticket + * @t: [in] reservation_ticket that completed all reservations + * + * This currently does nothing, but should be called after all reservations + * made with this ticket have been unreserved. It is likely that in the future + * it will be hooked up to perf events, or aid in debugging in other ways. 
+ */ +static inline void +reservation_ticket_fini(struct reservation_ticket *t) +{ } + +/** + * reservation_entry_init - initialize and append a reservation_entry + * to the list + * @entry: entry to initialize + * @list: list to append to + * @obj: reservation_object to initialize the entry with + * @shared: whether shared or exclusive access is requested + */ +static inline void +reservation_entry_init(struct reservation_entry *entry, + struct list_head *list, + struct reservation_object *obj, bool shared) +{ + entry->obj_shared = (unsigned long)obj | !!shared; +} + +static inline struct reservation_entry * +reservation_entry_get(struct list_head *list, + struct reservation_object **obj, bool *shared) +{ + struct reservation_entry *e = container_of(list, struct reservation_entry, head); + unsigned long val = e->obj_shared; + + if (obj) + *obj = (struct reservation_object*)(val & ~1); + if (shared) + *shared = val & 1; + return e; +} + +extern int +object_reserve(struct reservation_object *obj, + bool intr, bool no_wait, + reservation_ticket_t *ticket); + +extern void +object_unreserve(struct reservation_object *, + reservation_ticket_t *ticket); + +extern int +object_wait_unreserved(struct reservation_object *, bool intr); + +extern int ticket_reserve(struct reservation_ticket *, + struct list_head *entries); +extern void ticket_backoff(struct reservation_ticket *, + struct list_head *entries); +extern void ticket_commit(struct reservation_ticket *, + struct list_head *entries, struct fence *); + +#endif /* __BUF_MGR_H__ */
On 9/28/12 2:43 PM, Maarten Lankhorst wrote:
This adds support for a generic reservations framework that can be hooked up to ttm and dma-buf and allows easy sharing of reservations across devices.
The idea is that a dma-buf and ttm object both will get a pointer to a struct reservation_object, which has to be reserved before anything is done with the buffer.
"Anything is done with the buffer" should probably be rephrased, as different members of the buffer struct may be protected by different locks. It may not be practical or even possible to protect all buffer members with reservation.
Some followup patches are needed in ttm so the lru_lock is no longer taken during the reservation step. This makes the lockdep annotation patch a lot more useful, and the assumption that the lru lock protects atomic removal off the lru list will fail soon, anyway.
As previously discussed, I'm unfortunately not prepared to accept removal of the reserve-lru atomicity from the TTM code at this point. The current code is based on this assumption, and removing it will end up with inefficiencies, breaking the delayed delete code and probably a locking nightmare when trying to write new TTM code.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@canonical.com
Documentation/DocBook/device-drivers.tmpl | 2 drivers/base/Makefile | 2 drivers/base/reservation.c | 285 +++++++++++++++++++++++++++++ include/linux/reservation.h | 179 ++++++++++++++++++ 4 files changed, 467 insertions(+), 1 deletion(-) create mode 100644 drivers/base/reservation.c create mode 100644 include/linux/reservation.h
diff --git a/Documentation/DocBook/device-drivers.tmpl b/Documentation/DocBook/device-drivers.tmpl index ad14396..24e6e80 100644 --- a/Documentation/DocBook/device-drivers.tmpl +++ b/Documentation/DocBook/device-drivers.tmpl @@ -129,6 +129,8 @@ X!Edrivers/base/interface.c !Edrivers/base/fence.c !Iinclude/linux/fence.h !Iinclude/linux/seqno-fence.h +!Edrivers/base/reservation.c +!Iinclude/linux/reservation.h !Edrivers/base/dma-coherent.c !Edrivers/base/dma-mapping.c </sect1> diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 0026563..f6f731d 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -10,7 +10,7 @@ obj-$(CONFIG_CMA) += dma-contiguous.o obj-y += power/ obj-$(CONFIG_HAS_DMA) += dma-mapping.o obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o -obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o fence.o +obj-$(CONFIG_DMA_SHARED_BUFFER) += dma-buf.o fence.o reservation.o obj-$(CONFIG_ISA) += isa.o obj-$(CONFIG_FW_LOADER) += firmware_class.o obj-$(CONFIG_NUMA) += node.o diff --git a/drivers/base/reservation.c b/drivers/base/reservation.c new file mode 100644 index 0000000..93e2d9f --- /dev/null +++ b/drivers/base/reservation.c @@ -0,0 +1,285 @@ +/*
- Copyright (C) 2012 Canonical Ltd
- Based on bo.c which bears the following copyright notice,
- but is dual licensed:
- Copyright (c) 2006-2009 VMware, Inc., Palo Alto, CA., USA
- All Rights Reserved.
- Permission is hereby granted, free of charge, to any person obtaining a
- copy of this software and associated documentation files (the
- "Software"), to deal in the Software without restriction, including
- without limitation the rights to use, copy, modify, merge, publish,
- distribute, sub license, and/or sell copies of the Software, and to
- permit persons to whom the Software is furnished to do so, subject to
- the following conditions:
- The above copyright notice and this permission notice (including the
- next paragraph) shall be included in all copies or substantial portions
- of the Software.
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
- THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM,
- DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
- OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
- USE OR OTHER DEALINGS IN THE SOFTWARE.
- **************************************************************************/
+/*
- Authors: Thomas Hellstrom <thellstrom-at-vmware-dot-com>
- */
+#include <linux/fence.h>
+#include <linux/reservation.h>
+#include <linux/export.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+atomic64_t reservation_counter = ATOMIC64_INIT(1);
+EXPORT_SYMBOL(reservation_counter);
+
+int
+object_reserve(struct reservation_object *obj, bool intr, bool no_wait,
+	       reservation_ticket_t *ticket)
+{
+	int ret;
+	u64 sequence = ticket ? ticket->seqno : 1;
+	u64 oldseq;
+
+	while (unlikely(oldseq = atomic64_cmpxchg(&obj->reserved, 0, sequence))) {
+		/**
+		 * Deadlock avoidance for multi-obj reserving.
+		 */
+		if (sequence > 1 && oldseq > 1) {
+			/**
+			 * We've already reserved this one.
+			 */
+			if (unlikely(sequence == oldseq))
+				return -EDEADLK;
+			/**
+			 * Already reserved by a thread that will not back
+			 * off for us. We need to back off.
+			 */
+			if (unlikely(sequence - oldseq < (1ULL << 63)))
+				return -EAGAIN;
+		}
+
+		if (no_wait)
+			return -EBUSY;
+
+		ret = object_wait_unreserved(obj, intr);
+
+		if (unlikely(ret))
+			return ret;
+	}
+
+	/**
+	 * Wake up waiters that may need to recheck for deadlock,
+	 * if we decreased the sequence number.
+	 */
+	wake_up_all(&obj->event_queue);
+
+	return 0;
+}
+EXPORT_SYMBOL(object_reserve);
+
+int
+object_wait_unreserved(struct reservation_object *obj, bool intr)
+{
+	if (intr) {
+		return wait_event_interruptible(obj->event_queue,
+						!object_is_reserved(obj));
+	} else {
+		wait_event(obj->event_queue,
+			   !object_is_reserved(obj));
+		return 0;
+	}
+}
+EXPORT_SYMBOL(object_wait_unreserved);
+
+void
+object_unreserve(struct reservation_object *obj,
+		 reservation_ticket_t *ticket)
+{
+	smp_mb();
+	atomic64_set(&obj->reserved, 0);
+	wake_up_all(&obj->event_queue);
+}
+EXPORT_SYMBOL(object_unreserve);
+
+/**
+ * ticket_backoff - cancel a reservation
+ * @ticket: [in] a reservation_ticket
+ * @entries: [in] the list of reservation_entry entries to unreserve
+ *
+ * This function cancels a previous reservation done by
+ * ticket_reserve. This is useful in case something
+ * goes wrong between reservation and committing.
+ * This should only be called after ticket_reserve returns success.
+ */
+void
+ticket_backoff(struct reservation_ticket *ticket, struct list_head *entries)
+{
+	struct list_head *cur;
+
+	if (list_empty(entries))
+		return;
+
+	list_for_each(cur, entries) {
+		struct reservation_object *obj;
+
+		reservation_entry_get(cur, &obj, NULL);
+		object_unreserve(obj, ticket);
+	}
+	reservation_ticket_fini(ticket);
+}
+EXPORT_SYMBOL(ticket_backoff);
+
+static void
+ticket_backoff_early(struct reservation_ticket *ticket,
+		     struct list_head *list,
+		     struct reservation_entry *entry)
+{
+	list_for_each_entry_continue_reverse(entry, list, head) {
+		struct reservation_object *obj;
+
+		reservation_entry_get(&entry->head, &obj, NULL);
+		object_unreserve(obj, ticket);
+	}
+	reservation_ticket_fini(ticket);
+}
+
+/**
+ * ticket_reserve - reserve a list of reservation_entry
+ * @ticket: [out] a reservation_ticket
+ * @entries: [in] a list of entries to reserve.
+ *
+ * Do not initialize ticket, it will be initialized by this function.
+ *
+ * XXX: Nuke rest
+ * The caller will have to queue waits on those fences before calling
+ * ufmgr_fence_buffer_objects, with either hardware specific methods,
+ * fence_add_callback, or fence_wait.
+ * As such, by incrementing refcount on reservation_entry before calling
+ * fence_add_callback, and making the callback decrement refcount on
+ * reservation_entry, or releasing refcount if fence_add_callback
+ * failed, the reservation_entry will be freed when all the fences
+ * have been signaled, and only after the last ref is released, which should
+ * be after ufmgr_fence_buffer_objects. With proper locking, when the
+ * list_head holding the list of reservation_entry's becomes empty it
+ * indicates all fences for all bufs have been signaled.
+ */
+int
+ticket_reserve(struct reservation_ticket *ticket,
+	       struct list_head *entries)
+{
+	struct list_head *cur;
+	int ret;
+
+	if (list_empty(entries))
+		return 0;
+
+retry:
+	reservation_ticket_init(ticket);
+
+	list_for_each(cur, entries) {
+		struct reservation_entry *entry;
+		struct reservation_object *bo;
+		bool shared;
+
+		entry = reservation_entry_get(cur, &bo, &shared);
+
+		ret = object_reserve(bo, true, false, ticket);
+		switch (ret) {
+		case 0:
+			break;
+		case -EAGAIN:
+			ticket_backoff_early(ticket, entries, entry);
+			ret = object_wait_unreserved(bo, true);
+			if (unlikely(ret != 0))
+				return ret;
+			goto retry;
+		default:
+			ticket_backoff_early(ticket, entries, entry);
+			return ret;
+		}
+
+		if (shared &&
+		    bo->fence_shared_count == BUF_MAX_SHARED_FENCE) {
+			WARN_ON_ONCE(1);
+			ticket_backoff_early(ticket, entries, entry);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(ticket_reserve);
+
+/**
+ * ticket_commit - commit a reservation with a new fence
+ * @ticket: [in] the reservation_ticket returned by
+ * ticket_reserve
+ * @entries: [in] a linked list of struct reservation_entry
+ * @fence: [in] the fence that indicates completion
+ *
+ * This function will call reservation_ticket_fini, no need
+ * to do it manually.
+ *
+ * This function should be called after a hardware command submission is
+ * completed successfully. The fence is used to indicate completion of
+ * those commands.
+ */
+void
+ticket_commit(struct reservation_ticket *ticket,
+	      struct list_head *entries, struct fence *fence)
+{
+	struct list_head *cur;
+
+	if (list_empty(entries))
+		return;
+
+	if (WARN_ON(!fence)) {
+		ticket_backoff(ticket, entries);
+		return;
+	}
+
+	list_for_each(cur, entries) {
+		struct reservation_object *bo;
+		bool shared;
+
+		reservation_entry_get(cur, &bo, &shared);
+
+		if (!shared) {
+			int i;
+			for (i = 0; i < bo->fence_shared_count; ++i) {
+				fence_put(bo->fence_shared[i]);
+				bo->fence_shared[i] = NULL;
+			}
+			bo->fence_shared_count = 0;
+			if (bo->fence_excl)
+				fence_put(bo->fence_excl);
+
+			bo->fence_excl = fence;
+		} else {
+			if (WARN_ON(bo->fence_shared_count >=
+				    ARRAY_SIZE(bo->fence_shared))) {
+				continue;
+			}
+
+			bo->fence_shared[bo->fence_shared_count++] = fence;
+		}
+		fence_get(fence);
+
+		object_unreserve(bo, ticket);
+	}
+	reservation_ticket_fini(ticket);
+}
+EXPORT_SYMBOL(ticket_commit);
diff --git a/include/linux/reservation.h b/include/linux/reservation.h
new file mode 100644
index 0000000..93280af
--- /dev/null
+++ b/include/linux/reservation.h
@@ -0,0 +1,179 @@
+/*
+ * Header file for reservations for dma-buf and ttm
+ *
+ * Copyright(C) 2011 Linaro Limited. All rights reserved.
+ * Copyright (C) 2012 Canonical Ltd
+ * Copyright (C) 2012 Texas Instruments
+ *
+ * Authors:
+ * Rob Clark rob.clark@linaro.org
+ * Maarten Lankhorst maarten.lankhorst@canonical.com
+ * Thomas Hellstrom <thellstrom-at-vmware-dot-com>
+ *
+ * Based on bo.c which bears the following copyright notice,
+ * but is dual licensed:
+ *
+ * Copyright (c) 2006-2009 VMware, Inc., Palo Alto, CA., USA
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDERS, AUTHORS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM,
+ * DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+ * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
+ * USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef __RESERVATION_H__
+#define __RESERVATION_H__
+
+#define BUF_MAX_SHARED_FENCE 8
+
+#include <linux/fence.h>
+
+extern atomic64_t reservation_counter;
+
+struct reservation_object {
+	wait_queue_head_t event_queue;
+
+	atomic64_t reserved;
+
+	u32 fence_shared_count;
+	struct fence *fence_excl;
+	struct fence *fence_shared[BUF_MAX_SHARED_FENCE];
+};
+
+typedef struct reservation_ticket {
+	u64 seqno;
+} reservation_ticket_t;
+
+/**
+ * struct reservation_entry - reservation structure for a
+ * reservation_object
+ * @head: list entry
+ * @obj_shared: pointer to a reservation_object to reserve
+ *
+ * Bit 0 of obj_shared is set to bool shared, as such pointer has to be
+ * converted back, which can be done with reservation_entry_get.
+ */
+struct reservation_entry {
+	struct list_head head;
+	unsigned long obj_shared;
+};
+
+static inline void
+__reservation_object_init(struct reservation_object *obj)
+{
+	init_waitqueue_head(&obj->event_queue);
+}
+
+static inline void
+reservation_object_init(struct reservation_object *obj)
+{
+	memset(obj, 0, sizeof(*obj));
+	__reservation_object_init(obj);
+}
+
+static inline bool
+object_is_reserved(struct reservation_object *obj)
+{
+	return !!atomic64_read(&obj->reserved);
+}
+
+static inline void
+reservation_object_fini(struct reservation_object *obj)
+{
+	int i;
+
+	BUG_ON(waitqueue_active(&obj->event_queue));
+	BUG_ON(object_is_reserved(obj));
+
+	if (obj->fence_excl)
+		fence_put(obj->fence_excl);
+	for (i = 0; i < obj->fence_shared_count; ++i)
+		fence_put(obj->fence_shared[i]);
+}
+
+static inline void
+reservation_ticket_init(struct reservation_ticket *t)
+{
+	do {
+		t->seqno = atomic64_inc_return(&reservation_counter);
+	} while (unlikely(t->seqno < 2));
+}
+
+/**
+ * reservation_ticket_fini - end a reservation ticket
+ * @t: [in] reservation_ticket that completed all reservations
+ *
+ * This currently does nothing, but should be called after all reservations
+ * made with this ticket have been unreserved. It is likely that in the future
+ * it will be hooked up to perf events, or aid in debugging in other ways.
+ */
+static inline void
+reservation_ticket_fini(struct reservation_ticket *t)
+{ }
+
+/**
+ * reservation_entry_init - initialize and append a reservation_entry
+ * to the list
+ * @entry: entry to initialize
+ * @list: list to append to
+ * @obj: reservation_object to initialize the entry with
+ * @shared: whether shared or exclusive access is requested
+ */
+static inline void
+reservation_entry_init(struct reservation_entry *entry,
+		       struct list_head *list,
+		       struct reservation_object *obj, bool shared)
+{
+	entry->obj_shared = (unsigned long)obj | !!shared;
+}
+
+static inline struct reservation_entry *
+reservation_entry_get(struct list_head *list,
+		      struct reservation_object **obj, bool *shared)
+{
+	struct reservation_entry *e = container_of(list, struct reservation_entry, head);
+	unsigned long val = e->obj_shared;
+
+	if (obj)
+		*obj = (struct reservation_object*)(val & ~1);
+	if (shared)
+		*shared = val & 1;
+
+	return e;
+}
+
+extern int
+object_reserve(struct reservation_object *obj,
+	       bool intr, bool no_wait,
+	       reservation_ticket_t *ticket);
+extern void
+object_unreserve(struct reservation_object *,
+		 reservation_ticket_t *ticket);
+extern int
+object_wait_unreserved(struct reservation_object *, bool intr);
+
+extern int ticket_reserve(struct reservation_ticket *,
+			  struct list_head *entries);
+extern void ticket_backoff(struct reservation_ticket *,
+			   struct list_head *entries);
+extern void ticket_commit(struct reservation_ticket *,
+			  struct list_head *entries, struct fence *);
+
+#endif /* __BUF_MGR_H__ */
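For readers new to the API, here is a minimal usage sketch of the flow the kerneldoc above describes. It is only an illustration against this revision of the patch: the submission function, the caller-owned entry array and the completion fence are hypothetical, and note that in this revision the caller links the entries onto the list itself.

/* Hypothetical driver submission path illustrating the
 * ticket_reserve / ticket_commit / ticket_backoff flow. */
#include <linux/fence.h>
#include <linux/list.h>
#include <linux/reservation.h>

static int example_submit(struct reservation_object **bufs, int count,
			  struct reservation_entry *entries,
			  struct fence *done_fence)
{
	struct reservation_ticket ticket;
	struct list_head head;
	int i, ret;

	INIT_LIST_HEAD(&head);
	for (i = 0; i < count; i++) {
		/* shared = false requests exclusive access */
		reservation_entry_init(&entries[i], &head, bufs[i], false);
		list_add_tail(&entries[i].head, &head);
	}

	/* Reserves every buffer, backing off and retrying on contention. */
	ret = ticket_reserve(&ticket, &head);
	if (ret)
		return ret;

	/* <submit work to the hardware here> */

	if (!done_fence) {
		/* Nothing to signal completion: undo the reservations. */
		ticket_backoff(&ticket, &head);
		return -EINVAL;
	}

	/* Attach the fence to every buffer and drop all reservations;
	 * ticket_commit takes its own reference on the fence per buffer. */
	ticket_commit(&ticket, &head, done_fence);
	return 0;
}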
On 28-09-12 17:29, Thomas Hellström wrote:
On 9/28/12 2:43 PM, Maarten Lankhorst wrote:
This adds support for a generic reservations framework that can be hooked up to ttm and dma-buf and allows easy sharing of reservations across devices.
The idea is that a dma-buf and ttm object both will get a pointer to a struct reservation_object, which has to be reserved before anything is done with the buffer.
"Anything is done with the buffer" should probably be rephrased, as different members of the buffer struct may be protected by different locks. It may not be practical or even possible to protect all buffer members with reservation.
Agreed.
Some followup patches are needed in ttm so the lru_lock is no longer taken during the reservation step. This makes the lockdep annotation patch a lot more useful, and the assumption that the lru lock protects atomic removal off the lru list will fail soon, anyway.
As previously discussed, I'm unfortunately not prepared to accept removal of the reserve-lru atomicity from the TTM code at this point. The current code is based on this assumption, and removing it will end up with inefficiencies, break the delayed delete code and probably cause a locking nightmare when trying to write new TTM code.
The lru lock removal patch fixed the delayed delete code; it really is not different from the current situation. In fact it is clearer without the guarantee what the various parts are trying to protect.
Nothing prevents you from holding the lru_lock while trylocking, leaving that guarantee intact for that part. Can you really just review the patch and tell me where it breaks and/or makes the code unreadable?
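As a concrete illustration of that point (a sketch only, assuming a reservation_object embedded in the TTM buffer object as the later patches do; bo->resv, glob and the surrounding names are hypothetical here):

/* Trylock the reservation while still holding the lru_lock, so removal
 * from the lru stays atomic for this path; illustration only. */
spin_lock(&glob->lru_lock);
if (object_reserve(&bo->resv, false, true /* no_wait: trylock */, NULL) == 0) {
	list_del_init(&bo->lru);
	spin_unlock(&glob->lru_lock);
	/* ... evict or clean up the buffer ... */
	object_unreserve(&bo->resv, NULL);
} else {
	/* Already reserved by someone else; skip it or try the next one. */
	spin_unlock(&glob->lru_lock);
}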
See my preemptive reply to patch 1/5 for details. I would prefer you follow up there. :-)
~Maarten
I took a quick look at the fencing and added some thoughts on shared fences:
On 09/28/2012 02:43 PM, Maarten Lankhorst wrote:
This adds support for a generic reservations framework that can be hooked up to ttm and dma-buf and allows easy sharing of reservations across devices.
The idea is that a dma-buf and ttm object both will get a pointer to a struct reservation_object, which has to be reserved before anything is done with the buffer.
Some followup patches are needed in ttm so the lru_lock is no longer taken during the reservation step. This makes the lockdep annotation patch a lot more useful, and the assumption that the lru lock protects atomic removal off the lru list will fail soon, anyway.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@canonical.com
+/**
+ * ticket_commit - commit a reservation with a new fence
+ * @ticket: [in] the reservation_ticket returned by
+ * ticket_reserve
+ * @entries: [in] a linked list of struct reservation_entry
+ * @fence: [in] the fence that indicates completion
+ *
+ * This function will call reservation_ticket_fini, no need
+ * to do it manually.
+ *
+ * This function should be called after a hardware command submission is
+ * completed successfully. The fence is used to indicate completion of
+ * those commands.
+ */
+void
+ticket_commit(struct reservation_ticket *ticket,
+	      struct list_head *entries, struct fence *fence)
+{
+	struct list_head *cur;
+
+	if (list_empty(entries))
+		return;
+
+	if (WARN_ON(!fence)) {
+		ticket_backoff(ticket, entries);
+		return;
+	}
+
+	list_for_each(cur, entries) {
+		struct reservation_object *bo;
+		bool shared;
+
+		reservation_entry_get(cur, &bo, &shared);
+
+		if (!shared) {
+			int i;
+			for (i = 0; i < bo->fence_shared_count; ++i) {
+				fence_put(bo->fence_shared[i]);
+				bo->fence_shared[i] = NULL;
+			}
+			bo->fence_shared_count = 0;
+			if (bo->fence_excl)
+				fence_put(bo->fence_excl);
+
+			bo->fence_excl = fence;
I assume here that the validation code has made sure that fences are either ordered or expired so that "fence" signals *after* all other fences have signaled.
+		} else {
+			if (WARN_ON(bo->fence_shared_count >=
+				    ARRAY_SIZE(bo->fence_shared))) {
+				continue;
+			}
This is bad. Failure to fence a buffer is a catastrophic error that can lead to pages being reused for other stuff while still being read by the GPU, and the caller must be informed with an error code and sync on the fence.
I guess this has been discussed previously, but I think it might be more appropriate to use a list of pointers to fences. There is an allocation overhead for this, but allocation from a mem cache should really be fast enough, and the list entries can be allocated during ticket_reserve to avoid errors in the commit code.
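A rough sketch of that suggestion, with hypothetical names (shared_fence_node, the kmem_cache and the per-buffer fence list are not part of this series); the point is that the allocation happens at reserve time, where an error is still easy to return, so the commit path cannot fail:

#include <linux/fence.h>
#include <linux/list.h>
#include <linux/slab.h>

/* Hypothetical shared-fence list node. */
struct shared_fence_node {
	struct list_head head;	/* links into a per-buffer fence list */
	struct fence *fence;
};

/* Called from ticket_reserve(): may fail, so errors are still easy to handle. */
static struct shared_fence_node *prealloc_shared_node(struct kmem_cache *cache)
{
	return kmem_cache_zalloc(cache, GFP_KERNEL);
}

/* Called from ticket_commit(): no allocation, so it cannot fail. */
static void attach_shared_fence(struct list_head *buf_fence_list,
				struct shared_fence_node *node,
				struct fence *fence)
{
	node->fence = fence;
	fence_get(fence);
	list_add_tail(&node->head, buf_fence_list);
}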
+			bo->fence_shared[bo->fence_shared_count++] = fence;
It might be good if this function had access to a light version of a cross-device struct fence *order_fences(struct fence *a, struct fence *b) function that can quickly check two fences and determine whether signaling one means that the other one is also signaled. In that case one or more of the shared fences can be unreferenced, putting less pressure on the fence_shared array. The lightweight version of order_fences is allowed to fail if there is no simple and quick way of ordering them. It could perhaps be added to the fence API.
And (even though not part of the reservation API) a heavyweight version of that cross-device function, int order_fence(struct fence *a, int gpu_engine), is needed for the validation code's exclusive fencing: it *makes sure* fence a has signaled before the current gpu_engine executes its commands. For some gpu-fence pairs the ordering is done implicitly since they share the same command stream, for some it's possible to insert barriers in the gpu_engine command stream (radeon and nouveau are doing that), and if there is no other way of doing it, the code will need to wait for the fence.
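A sketch of what the lightweight helper might look like; fence_is_signaled() and the ->context/->seqno fields are assumptions about the fence API made here for illustration, not something the posted fence patches necessarily provide:

/* Hypothetical: return whichever fence signals last if the two can be
 * ordered trivially, or NULL if no cheap ordering exists. */
static struct fence *order_fences(struct fence *a, struct fence *b)
{
	if (fence_is_signaled(a))
		return b;
	if (fence_is_signaled(b))
		return a;
	/* Assumed: fences from the same execution context are ordered
	 * by a monotonically increasing seqno. */
	if (a->context != b->context)
		return NULL;
	return (s64)(a->seqno - b->seqno) > 0 ? a : b;
}

ticket_commit() could then try to fold the new fence into an existing fence_shared slot with such a helper before consuming a new one.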
+		}
+		fence_get(fence);
Hmm. Perhaps a fence_get(fence, NUM) to avoid a huge number of atomic incs?
+		object_unreserve(bo, ticket);
+	}
+	reservation_ticket_fini(ticket);
+}
+EXPORT_SYMBOL(ticket_commit);
Thomas
Signed-off-by: Maarten Lankhorst maarten.lankhorst@canonical.com
---
The self-tests will fail if the commit "lockdep: Check if nested lock is actually held" from linux tip core/locking is not applied.
---
 drivers/base/reservation.c | 46 +++++-
 include/linux/reservation.h | 29 +++-
 lib/Kconfig.debug | 1
 lib/locking-selftest.c | 353 +++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 403 insertions(+), 26 deletions(-)
diff --git a/drivers/base/reservation.c b/drivers/base/reservation.c index 93e2d9f..b8d4f4d 100644 --- a/drivers/base/reservation.c +++ b/drivers/base/reservation.c @@ -41,6 +41,18 @@ atomic64_t reservation_counter = ATOMIC64_INIT(1); EXPORT_SYMBOL(reservation_counter);
+const char reservation_object_name[] = "reservation_object"; +EXPORT_SYMBOL(reservation_object_name); + +const char reservation_ticket_name[] = "reservation_ticket"; +EXPORT_SYMBOL(reservation_ticket_name); + +struct lock_class_key reservation_object_class; +EXPORT_SYMBOL(reservation_object_class); + +struct lock_class_key reservation_ticket_class; +EXPORT_SYMBOL(reservation_ticket_class); + int object_reserve(struct reservation_object *obj, bool intr, bool no_wait, reservation_ticket_t *ticket) @@ -49,6 +61,10 @@ object_reserve(struct reservation_object *obj, bool intr, bool no_wait, u64 sequence = ticket ? ticket->seqno : 1; u64 oldseq;
+ if (!no_wait) + mutex_acquire_nest(&obj->dep_map, 0, 0, + ticket ? &ticket->dep_map : NULL, _RET_IP_); + while (unlikely(oldseq = atomic64_cmpxchg(&obj->reserved, 0, sequence))) {
/** @@ -58,14 +74,18 @@ object_reserve(struct reservation_object *obj, bool intr, bool no_wait, /** * We've already reserved this one. */ - if (unlikely(sequence == oldseq)) - return -EDEADLK; + if (unlikely(sequence == oldseq)) { + ret = -EDEADLK; + goto fail; + } /** * Already reserved by a thread that will not back * off for us. We need to back off. */ - if (unlikely(sequence - oldseq < (1ULL << 63))) - return -EAGAIN; + if (unlikely(sequence - oldseq < (1ULL << 63))) { + ret = -EAGAIN; + goto fail; + } }
if (no_wait) @@ -74,9 +94,12 @@ object_reserve(struct reservation_object *obj, bool intr, bool no_wait, ret = object_wait_unreserved(obj, intr);
if (unlikely(ret)) - return ret; + goto fail; }
+ if (no_wait) + mutex_acquire(&obj->dep_map, 0, 1, _RET_IP_); + /** * Wake up waiters that may need to recheck for deadlock, * if we decreased the sequence number. @@ -84,6 +107,11 @@ object_reserve(struct reservation_object *obj, bool intr, bool no_wait, wake_up_all(&obj->event_queue);
return 0; + +fail: + if (!no_wait) + mutex_release(&obj->dep_map, 1, _RET_IP_); + return ret; } EXPORT_SYMBOL(object_reserve);
@@ -105,6 +133,14 @@ void object_unreserve(struct reservation_object *obj, reservation_ticket_t *ticket) { + mutex_release(&obj->dep_map, 1, _RET_IP_); + + if (!object_is_reserved(obj)) { +#ifndef CONFIG_DEBUG_LOCKING_API_SELFTESTS + WARN_ON(1); +#endif + return; + } smp_mb(); atomic64_set(&obj->reserved, 0); wake_up_all(&obj->event_queue); diff --git a/include/linux/reservation.h b/include/linux/reservation.h index 93280af..6fa0cdb 100644 --- a/include/linux/reservation.h +++ b/include/linux/reservation.h @@ -44,6 +44,10 @@ #include <linux/fence.h>
extern atomic64_t reservation_counter; +extern const char reservation_object_name[]; +extern struct lock_class_key reservation_object_class; +extern const char reservation_ticket_name[]; +extern struct lock_class_key reservation_ticket_class;
struct reservation_object { wait_queue_head_t event_queue; @@ -53,10 +57,17 @@ struct reservation_object { u32 fence_shared_count; struct fence *fence_excl; struct fence *fence_shared[BUF_MAX_SHARED_FENCE]; + +#ifdef CONFIG_DEBUG_LOCK_ALLOC + struct lockdep_map dep_map; +#endif };
typedef struct reservation_ticket { u64 seqno; +#ifdef CONFIG_DEBUG_LOCK_ALLOC + struct lockdep_map dep_map; +#endif } reservation_ticket_t;
/** @@ -73,11 +84,13 @@ struct reservation_entry { unsigned long obj_shared; };
- static inline void __reservation_object_init(struct reservation_object *obj) { init_waitqueue_head(&obj->event_queue); + + lockdep_init_map(&obj->dep_map, reservation_object_name, + &reservation_object_class, 0); }
static inline void @@ -110,6 +123,16 @@ reservation_object_fini(struct reservation_object *obj) static inline void reservation_ticket_init(struct reservation_ticket *t) { +#ifdef CONFIG_LOCKDEP + /* + * Make sure we are not reinitializing a held ticket: + */ + + debug_check_no_locks_freed((void *)t, sizeof(*t)); +#endif + lockdep_init_map(&t->dep_map, reservation_ticket_name, + &reservation_ticket_class, 0); + mutex_acquire(&t->dep_map, 0, 0, _RET_IP_); do { t->seqno = atomic64_inc_return(&reservation_counter); } while (unlikely(t->seqno < 2)); @@ -125,7 +148,9 @@ reservation_ticket_init(struct reservation_ticket *t) */ static inline void reservation_ticket_fini(struct reservation_ticket *t) -{ } +{ + mutex_release(&t->dep_map, 1, _RET_IP_); +}
/** * reservation_entry_init - initialize and append a reservation_entry diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 2403a63..3211730 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -700,6 +700,7 @@ config DEBUG_ATOMIC_SLEEP config DEBUG_LOCKING_API_SELFTESTS bool "Locking API boot-time self-tests" depends on DEBUG_KERNEL + select CONFIG_DMA_SHARED_BUFFER help Say Y here if you want the kernel to run a short self-test during bootup. The self-test checks whether common types of locking bugs diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c index 7aae0f2..fc4a45b 100644 --- a/lib/locking-selftest.c +++ b/lib/locking-selftest.c @@ -20,6 +20,7 @@ #include <linux/interrupt.h> #include <linux/debug_locks.h> #include <linux/irqflags.h> +#include <linux/reservation.h>
/* * Change this to 1 if you want to see the failure printouts: @@ -42,6 +43,7 @@ __setup("debug_locks_verbose=", setup_debug_locks_verbose); #define LOCKTYPE_RWLOCK 0x2 #define LOCKTYPE_MUTEX 0x4 #define LOCKTYPE_RWSEM 0x8 +#define LOCKTYPE_RESERVATION 0x10
/* * Normal standalone locks, for the circular and irq-context @@ -920,11 +922,17 @@ GENERATE_PERMUTATIONS_3_EVENTS(irq_read_recursion_soft) static void reset_locks(void) { local_irq_disable(); + lockdep_free_key_range(&reservation_object_class, 1); + lockdep_free_key_range(&reservation_ticket_class, 1); + I1(A); I1(B); I1(C); I1(D); I1(X1); I1(X2); I1(Y1); I1(Y2); I1(Z1); I1(Z2); lockdep_reset(); I2(A); I2(B); I2(C); I2(D); init_shared_classes(); + + memset(&reservation_object_class, 0, sizeof reservation_object_class); + memset(&reservation_ticket_class, 0, sizeof reservation_ticket_class); local_irq_enable(); }
@@ -938,7 +946,6 @@ static int unexpected_testcase_failures; static void dotest(void (*testcase_fn)(void), int expected, int lockclass_mask) { unsigned long saved_preempt_count = preempt_count(); - int expected_failure = 0;
WARN_ON(irqs_disabled());
@@ -946,26 +953,16 @@ static void dotest(void (*testcase_fn)(void), int expected, int lockclass_mask) /* * Filter out expected failures: */ + if (debug_locks != expected) { #ifndef CONFIG_PROVE_LOCKING - if ((lockclass_mask & LOCKTYPE_SPIN) && debug_locks != expected) - expected_failure = 1; - if ((lockclass_mask & LOCKTYPE_RWLOCK) && debug_locks != expected) - expected_failure = 1; - if ((lockclass_mask & LOCKTYPE_MUTEX) && debug_locks != expected) - expected_failure = 1; - if ((lockclass_mask & LOCKTYPE_RWSEM) && debug_locks != expected) - expected_failure = 1; + expected_testcase_failures++; + printk("failed|"); +#else + unexpected_testcase_failures++; + printk("FAILED|"); + + dump_stack(); #endif - if (debug_locks != expected) { - if (expected_failure) { - expected_testcase_failures++; - printk("failed|"); - } else { - unexpected_testcase_failures++; - - printk("FAILED|"); - dump_stack(); - } } else { testcase_successes++; printk(" ok |"); @@ -1108,6 +1105,322 @@ static inline void print_testname(const char *testname) DO_TESTCASE_6IRW(desc, name, 312); \ DO_TESTCASE_6IRW(desc, name, 321);
+static void reservation_test_fail_reserve(void) +{ + struct reservation_ticket t; + struct reservation_object o; + + reservation_object_init(&o); + reservation_ticket_init(&t); + t.seqno++; + + object_reserve(&o, false, false, &t); + /* No lockdep test, pure API */ + WARN_ON(object_reserve(&o, false, true, &t) != -EDEADLK); + t.seqno--; + WARN_ON(object_reserve(&o, false, true, &t) != -EBUSY); + t.seqno += 2; + WARN_ON(object_reserve(&o, false, true, &t) != -EAGAIN); + object_unreserve(&o, NULL); + + reservation_ticket_fini(&t); +} + +static void reservation_test_two_tickets(void) +{ + struct reservation_ticket t, t2; + + reservation_ticket_init(&t); + reservation_ticket_init(&t2); + + reservation_ticket_fini(&t2); + reservation_ticket_fini(&t); +} + +static void reservation_test_ticket_unreserve_twice(void) +{ + struct reservation_ticket t; + + reservation_ticket_init(&t); + reservation_ticket_fini(&t); + reservation_ticket_fini(&t); +} + +static void reservation_test_object_unreserve_twice(void) +{ + struct reservation_object o; + + reservation_object_init(&o); + object_reserve(&o, false, false, NULL); + object_unreserve(&o, NULL); + object_unreserve(&o, NULL); +} + +static void reservation_test_fence_nest_unreserved(void) +{ + struct reservation_object o; + + reservation_object_init(&o); + + spin_lock_nest_lock(&lock_A, &o); + spin_unlock(&lock_A); +} + +static void reservation_test_ticket_block(void) +{ + struct reservation_ticket t; + struct reservation_object o, o2; + + reservation_object_init(&o); + reservation_object_init(&o2); + reservation_ticket_init(&t); + + object_reserve(&o, false, false, &t); + object_reserve(&o2, false, false, NULL); + object_unreserve(&o2, NULL); + object_unreserve(&o, &t); + + reservation_ticket_fini(&t); +} + +static void reservation_test_ticket_try(void) +{ + struct reservation_ticket t; + struct reservation_object o, o2; + + reservation_object_init(&o); + reservation_object_init(&o2); + reservation_ticket_init(&t); + + object_reserve(&o, false, false, &t); + object_reserve(&o2, false, true, NULL); + object_unreserve(&o2, NULL); + object_unreserve(&o, &t); + + reservation_ticket_fini(&t); +} + +static void reservation_test_ticket_ticket(void) +{ + struct reservation_ticket t; + struct reservation_object o, o2; + + reservation_object_init(&o); + reservation_object_init(&o2); + reservation_ticket_init(&t); + + object_reserve(&o, false, false, &t); + object_reserve(&o2, false, false, &t); + object_unreserve(&o2, &t); + object_unreserve(&o, &t); + + reservation_ticket_fini(&t); +} + +static void reservation_test_try_block(void) +{ + struct reservation_object o, o2; + + reservation_object_init(&o); + reservation_object_init(&o2); + + object_reserve(&o, false, true, NULL); + object_reserve(&o2, false, false, NULL); + object_unreserve(&o2, NULL); + object_unreserve(&o, NULL); +} + +static void reservation_test_try_try(void) +{ + struct reservation_object o, o2; + + reservation_object_init(&o); + reservation_object_init(&o2); + + object_reserve(&o, false, true, NULL); + object_reserve(&o2, false, true, NULL); + object_unreserve(&o2, NULL); + object_unreserve(&o, NULL); +} + +static void reservation_test_try_ticket(void) +{ + struct reservation_ticket t; + struct reservation_object o, o2; + + reservation_object_init(&o); + reservation_object_init(&o2); + + object_reserve(&o, false, true, NULL); + reservation_ticket_init(&t); + + object_reserve(&o2, false, false, &t); + object_unreserve(&o2, &t); + object_unreserve(&o, NULL); + + reservation_ticket_fini(&t); +} 
+ +static void reservation_test_block_block(void) +{ + struct reservation_object o, o2; + + reservation_object_init(&o); + reservation_object_init(&o2); + + object_reserve(&o, false, false, NULL); + object_reserve(&o2, false, false, NULL); + object_unreserve(&o2, NULL); + object_unreserve(&o, NULL); +} + +static void reservation_test_block_try(void) +{ + struct reservation_object o, o2; + + reservation_object_init(&o); + reservation_object_init(&o2); + + object_reserve(&o, false, false, NULL); + object_reserve(&o2, false, true, NULL); + object_unreserve(&o2, NULL); + object_unreserve(&o, NULL); +} + +static void reservation_test_block_ticket(void) +{ + struct reservation_ticket t; + struct reservation_object o, o2; + + reservation_object_init(&o); + reservation_object_init(&o2); + + object_reserve(&o, false, false, NULL); + reservation_ticket_init(&t); + + object_reserve(&o2, false, false, &t); + object_unreserve(&o2, &t); + object_unreserve(&o, NULL); + + reservation_ticket_fini(&t); +} + +static void reservation_test_fence_block(void) +{ + struct reservation_object o; + + reservation_object_init(&o); + spin_lock(&lock_A); + spin_unlock(&lock_A); + + object_reserve(&o, false, false, NULL); + spin_lock(&lock_A); + spin_unlock(&lock_A); + object_unreserve(&o, NULL); + + spin_lock(&lock_A); + object_reserve(&o, false, false, NULL); + object_unreserve(&o, NULL); + spin_unlock(&lock_A); +} + +static void reservation_test_fence_try(void) +{ + struct reservation_object o; + + reservation_object_init(&o); + spin_lock(&lock_A); + spin_unlock(&lock_A); + + object_reserve(&o, false, true, NULL); + spin_lock(&lock_A); + spin_unlock(&lock_A); + object_unreserve(&o, NULL); + + spin_lock(&lock_A); + object_reserve(&o, false, true, NULL); + object_unreserve(&o, NULL); + spin_unlock(&lock_A); +} + +static void reservation_test_fence_ticket(void) +{ + struct reservation_ticket t; + struct reservation_object o; + + reservation_object_init(&o); + spin_lock(&lock_A); + spin_unlock(&lock_A); + + reservation_ticket_init(&t); + + object_reserve(&o, false, false, &t); + spin_lock(&lock_A); + spin_unlock(&lock_A); + object_unreserve(&o, &t); + + spin_lock(&lock_A); + object_reserve(&o, false, false, &t); + object_unreserve(&o, &t); + spin_unlock(&lock_A); + + reservation_ticket_fini(&t); +} + +static void reservation_tests(void) +{ + printk(" --------------------------------------------------------------------------\n"); + printk(" | Reservation tests |\n"); + printk(" ---------------------\n"); + + print_testname("reservation api failures"); + dotest(reservation_test_fail_reserve, SUCCESS, LOCKTYPE_RESERVATION); + printk("\n"); + + print_testname("reserving two tickets"); + dotest(reservation_test_two_tickets, FAILURE, LOCKTYPE_RESERVATION); + printk("\n"); + + print_testname("unreserve ticket twice"); + dotest(reservation_test_ticket_unreserve_twice, FAILURE, LOCKTYPE_RESERVATION); + printk("\n"); + + print_testname("unreserve object twice"); + dotest(reservation_test_object_unreserve_twice, FAILURE, LOCKTYPE_RESERVATION); + printk("\n"); + + print_testname("spinlock nest unreserved"); + dotest(reservation_test_fence_nest_unreserved, FAILURE, LOCKTYPE_RESERVATION); + printk("\n"); + + printk(" -----------------------------------------------------\n"); + printk(" |block | try |ticket|\n"); + printk(" -----------------------------------------------------\n"); + + print_testname("ticket"); + dotest(reservation_test_ticket_block, FAILURE, LOCKTYPE_RESERVATION); + dotest(reservation_test_ticket_try, SUCCESS, 
LOCKTYPE_RESERVATION); + dotest(reservation_test_ticket_ticket, SUCCESS, LOCKTYPE_RESERVATION); + printk("\n"); + + print_testname("try"); + dotest(reservation_test_try_block, FAILURE, LOCKTYPE_RESERVATION); + dotest(reservation_test_try_try, SUCCESS, LOCKTYPE_RESERVATION); + dotest(reservation_test_try_ticket, FAILURE, LOCKTYPE_RESERVATION); + printk("\n"); + + print_testname("block"); + dotest(reservation_test_block_block, FAILURE, LOCKTYPE_RESERVATION); + dotest(reservation_test_block_try, SUCCESS, LOCKTYPE_RESERVATION); + dotest(reservation_test_block_ticket, FAILURE, LOCKTYPE_RESERVATION); + printk("\n"); + + print_testname("spinlock"); + dotest(reservation_test_fence_block, FAILURE, LOCKTYPE_RESERVATION); + dotest(reservation_test_fence_try, SUCCESS, LOCKTYPE_RESERVATION); + dotest(reservation_test_fence_ticket, FAILURE, LOCKTYPE_RESERVATION); + printk("\n"); +}
void locking_selftest(void) { @@ -1188,6 +1501,8 @@ void locking_selftest(void) DO_TESTCASE_6x2("irq read-recursion", irq_read_recursion); // DO_TESTCASE_6x2B("irq read-recursion #2", irq_read_recursion2);
+ reservation_tests(); + if (unexpected_testcase_failures) { printk("-----------------------------------------------------------------\n"); debug_locks = 0;
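To make the lockdep mapping a bit more concrete, here is a sketch of the kind of misordering the annotation is meant to catch; it mirrors the reservation_test_ticket_block selftest above, which the test table expects to trigger a lockdep report:

static void example_bad_nesting(void)
{
	struct reservation_object a, b;
	struct reservation_ticket t;

	reservation_object_init(&a);
	reservation_object_init(&b);

	reservation_ticket_init(&t);		/* lockdep: acquire the ticket map */
	object_reserve(&a, false, false, &t);	/* nested under the ticket */
	object_reserve(&b, false, false, NULL);	/* blocking reserve without the ticket */
	object_unreserve(&b, NULL);
	object_unreserve(&a, &t);
	reservation_ticket_fini(&t);		/* lockdep: release the ticket map */
}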
On Fri, Sep 28, 2012 at 02:41:48PM +0200, Maarten Lankhorst wrote:
Documentation says that code requiring dma-buf should add it to select, so inline fallbacks are not going to be used. A link error will make it obvious what went wrong, instead of silently doing nothing at runtime.
Signed-off-by: Maarten Lankhorst maarten.lankhorst@canonical.com
Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch
I think it'd be good if we could merge this for 3.7 ... -Daniel
include/linux/dma-buf.h | 99 ----------------------------------------------- 1 file changed, 99 deletions(-)
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index eb48f38..bd2e52c 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -156,7 +156,6 @@ static inline void get_dma_buf(struct dma_buf *dmabuf) get_file(dmabuf->file); } -#ifdef CONFIG_DMA_SHARED_BUFFER struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf, struct device *dev); void dma_buf_detach(struct dma_buf *dmabuf, @@ -184,103 +183,5 @@ int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *, unsigned long); void *dma_buf_vmap(struct dma_buf *); void dma_buf_vunmap(struct dma_buf *, void *vaddr); -#else
-static inline struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
struct device *dev)
-{
- return ERR_PTR(-ENODEV);
-}
-static inline void dma_buf_detach(struct dma_buf *dmabuf,
struct dma_buf_attachment *dmabuf_attach)
-{
- return;
-}
-static inline struct dma_buf *dma_buf_export(void *priv,
const struct dma_buf_ops *ops,
size_t size, int flags)
-{
- return ERR_PTR(-ENODEV);
-}
-static inline int dma_buf_fd(struct dma_buf *dmabuf, int flags) -{
- return -ENODEV;
-}
-static inline struct dma_buf *dma_buf_get(int fd) -{
- return ERR_PTR(-ENODEV);
-}
-static inline void dma_buf_put(struct dma_buf *dmabuf) -{
- return;
-}
-static inline struct sg_table *dma_buf_map_attachment(
- struct dma_buf_attachment *attach, enum dma_data_direction write)
-{
- return ERR_PTR(-ENODEV);
-}
-static inline void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
struct sg_table *sg, enum dma_data_direction dir)
-{
- return;
-}
-static inline int dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
size_t start, size_t len,
enum dma_data_direction dir)
-{
- return -ENODEV;
-}
-static inline void dma_buf_end_cpu_access(struct dma_buf *dmabuf,
size_t start, size_t len,
enum dma_data_direction dir)
-{ -}
-static inline void *dma_buf_kmap_atomic(struct dma_buf *dmabuf,
unsigned long pnum)
-{
- return NULL;
-}
-static inline void dma_buf_kunmap_atomic(struct dma_buf *dmabuf,
unsigned long pnum, void *vaddr)
-{ -}
-static inline void *dma_buf_kmap(struct dma_buf *dmabuf, unsigned long pnum) -{
- return NULL;
-}
-static inline void dma_buf_kunmap(struct dma_buf *dmabuf,
unsigned long pnum, void *vaddr)
-{ -}
-static inline int dma_buf_mmap(struct dma_buf *dmabuf,
struct vm_area_struct *vma,
unsigned long pgoff)
-{
- return -ENODEV;
-}
-static inline void *dma_buf_vmap(struct dma_buf *dmabuf) -{
- return NULL;
-}
-static inline void dma_buf_vunmap(struct dma_buf *dmabuf, void *vaddr) -{ -} -#endif /* CONFIG_DMA_SHARED_BUFFER */ #endif /* __DMA_BUF_H__ */
Hey,
On 28-09-12 14:41, Maarten Lankhorst wrote:
Documentation says that code requiring dma-buf should add it to select, so inline fallbacks are not going to be used. A link error will make it obvious what went wrong, instead of silently doing nothing at runtime.
The whole patch series is in my tree, I use stg so things might move around, do not use for merging currently:
http://cgit.freedesktop.org/~mlankhorst/linux/log/?h=v10-wip
It contains everything in here plus the patches for ttm to make it work. I use an old snapshot of drm-next + merge of nouveau as base. Description of what the parts do:

Series to fix small api issues when moving over:
drm/ttm: Remove cpu_writers related code
drm/ttm: Add ttm_bo_is_reserved function
drm/radeon: Use ttm_bo_is_reserved
drm/vmwgfx: use ttm_bo_is_reserved
drm/vmwgfx: remove use of fence_obj_args
drm/ttm: remove sync_obj_arg
drm/ttm: remove sync_obj_arg from ttm_bo_move_accel_cleanup
drm/ttm: remove sync_arg entirely
drm/nouveau: unpin buffers before releasing to prevent lockdep warnings
drm/nouveau: add reservation to nouveau_bo_vma_del
drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep

Hey great, now we only have one user left for fence waiting before reserving; let's fix that and remove the fence lock. ttm_bo_cleanup_refs_or_queue and ttm_bo_cleanup_refs have to reserve before waiting, so let's do it in the squash commit so we don't have to throw the lock order around everywhere:
drm/ttm: remove fence_lock

-- Up to this point should be mergeable now

Then we start working on lru_lock removal slightly; this means the lru list is no longer guaranteed to be free of reserved buffers, it can contain reserved buffers too:
drm/ttm: do not check if list is empty in ttm_bo_force_list_clean
drm/ttm: move reservations for ttm_bo_cleanup_refs

-- Still mergeable up to this point, just fixes

Patch series from this email:
dma-buf: remove fallback for !CONFIG_DMA_SHARED_BUFFER
fence: dma-buf cross-device synchronization (v9)
seqno-fence: Hardware dma-buf implementation of fencing (v3)
reservation: cross-device reservation support
reservation: Add lockdep annotation and selftests

Now hook it up to drm/ttm in a few steps, usage around reservations:
drm/ttm: make ttm reservation calls behave like reservation calls
drm/ttm: use dma_reservation api
dma-buf: use reservations
drm/ttm: allow drivers to pass custom dma_reservation_objects for a bo

then kill off the lru lock around reservation:
drm/ttm: remove lru_lock around ttm_bo_reserve
drm/ttm: simplify ttm_eu_*

The lru_lock removal patch removes the lru_lock around the reservation; this will break the assumption that items on the lru list and swap list can always be reserved, and this gets patched up too. Is there any part in ttm you disagree with? I believe that this is all mergeable; the lru_lock removal patch could be moved to before the reservation parts, which might make merging easier, but I don't think any ttm part of the series is wrong on a conceptual level.
~Maarten
On 09/28/2012 04:14 PM, Maarten Lankhorst wrote:
Hey,
On 28-09-12 14:41, Maarten Lankhorst wrote:
Documentation says that code requiring dma-buf should add it to select, so inline fallbacks are not going to be used. A link error will make it obvious what went wrong, instead of silently doing nothing at runtime.
The whole patch series is in my tree, I use stg so things might move around, do not use for merging currently:
http://cgit.freedesktop.org/~mlankhorst/linux/log/?h=v10-wip
It contains everything in here plus the patches for ttm to make it work, I use a old snapshot of drm-next + merge of nouveau as base. Description of what the parts do:
Series to fix small api issues when moving over:
drm/ttm: Remove cpu_writers related code drm/ttm: Add ttm_bo_is_reserved function drm/radeon: Use ttm_bo_is_reserved drm/vmwgfx: use ttm_bo_is_reserved
drm/vmwgfx: remove use of fence_obj_args drm/ttm: remove sync_obj_arg drm/ttm: remove sync_obj_arg from ttm_bo_move_accel_cleanup drm/ttm: remove sync_arg entirely
drm/nouveau: unpin buffers before releasing to prevent lockdep warnings drm/nouveau: add reservation to nouveau_bo_vma_del drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep
Hey great, now we only have one user left for fence waiting before reserving, lets fix that and remove fence lock: ttm_bo_cleanup_refs_or_queue and ttm_bo_cleanup_refs have to reserve before waiting, lets do it in the squash commit so we don't have to throw lock order around everywhere:
drm/ttm: remove fence_lock
-- Up to this point should be mergeable now
Then we start working on lru_lock removal slightly, this means the lru list no longer is empty but can contain only reserved buffers:
drm/ttm: do not check if list is empty in ttm_bo_force_list_clean drm/ttm: move reservations for ttm_bo_cleanup_refs
-- Still mergeable up to this point, just fixes
Patch series from this email: dma-buf: remove fallback for !CONFIG_DMA_SHARED_BUFFER fence: dma-buf cross-device synchronization (v9) seqno-fence: Hardware dma-buf implementation of fencing (v3) reservation: cross-device reservation support reservation: Add lockdep annotation and selftests
Now hook it up to drm/ttm in a few steps: usage around reservations: drm/ttm: make ttm reservation calls behave like reservation calls drm/ttm: use dma_reservation api dma-buf: use reservations drm/ttm: allow drivers to pass custom dma_reservation_objects for a bo
then kill off the lru lock around reservation: drm/ttm: remove lru_lock around ttm_bo_reserve drm/ttm: simplify ttm_eu_*
The lru_lock removal patch removes the lock around lru_lock around the reservation, this will break the assumption that items on the lru list and swap list can always be reserved, and this gets patched up too. Is there any part in ttm you disagree with? I believe that this is all mergeable, the lru_lock removal patch could be moved to before the reservation parts, this might make merging easier, but I don't think there is any ttm part of the series that are wrong on a conceptual level.
~Maarten
....From another email
As previously discussed, I'm unfortunately not prepared to accept removal of the reserve-lru atomicity into the TTM code at this point. The current code is based on this assumption and removing it will end up with efficiencies, breaking the delayed delete code and probably a locking nightmare when trying to write new TTM code.
The lru lock removal patch fixed the delayed delete code, it really is not different from the current situation. In fact it is more clear without the guarantee what various parts are trying to protect.
Nothing prevents you from holding the lru_lock while trylocking,
[1] While this would not cause any deadlocks, any decent lockdep code would establish lru->reserve as the locking order once a lru->reserve trylock succeeds, but the locking order is really reserve->lru for obvious reasons, which means we will get a lot of lockdep errors? Yes, there are two reversals like these already in the TTM code, and I'm not very proud of them.
leaving that guarantee intact for that part. Can you really just review the patch and tell me where it breaks and/or makes the code unreadable?
OK. Now I'm looking at http://cgit.freedesktop.org/~mlankhorst/linux/tree/drivers/gpu/drm/ttm/ttm_b...
And let's start with a function that's seen some change, ttm_mem_evict_first:
*) Line 715: You're traversing a list using list_for_each() calling a function that may remove the list entry
*) Line 722: You're unlocking the lock protecting the list in the middle of list traversal
*) Line 507: WARN_ON_ONCE in a code path quite likely to get called?
*) Line 512: sleep while atomic
*) Line 729: Forgot to unreserve
*) Line 730: Forgot to lock lru
*) Line 735: Now you're restarting with the first item on the LRU list. Why the loop at line 715?
*) Line 740: Deadlocking reserve
*) Line 757: Calling TTM Bo evict, but there might have been another process already evicting the buffer while you released the lru_lock in line 739, before reserving the buffer.
And this is even before it starts to get interesting, like how you guarantee that when you release a buffer from the delayed delete list, you're the only process having a reference?
Now, it's probably possible to achieve what you're trying to do if we accept the lock reversal in [1], but I have newborn twins and about one hour of spare time a week, which I have now spent on this review, and I guess there are countless more hours needed before this can work. (These code paths were never tested, right?) One of the biggest TTM reworks was to introduce the atomicity assumption and remove a lot of code that was prone to deadlocks, races and buffer leaks. I'm not prepared to revert that work without an extremely good reason, and "It can be done" is not such a reason.
We *need* to carefully weigh it against any benefits you have in your work, and you need to test these codepaths in parallel cases subject to heavy aperture / vram thrashing and frequent signals causing interrupted waits.
And I think you need to present the gains in your work that can motivate the testing-and review time for this.
Thanks, Thomas
On 09/28/2012 09:42 PM, Thomas Hellstrom wrote:
On 09/28/2012 04:14 PM, Maarten Lankhorst wrote:
Hey,
On 28-09-12 14:41, Maarten Lankhorst wrote:
Documentation says that code requiring dma-buf should add it to select, so inline fallbacks are not going to be used. A link error will make it obvious what went wrong, instead of silently doing nothing at runtime.
The whole patch series is in my tree, I use stg so things might move around, do not use for merging currently:
http://cgit.freedesktop.org/~mlankhorst/linux/log/?h=v10-wip
It contains everything in here plus the patches for ttm to make it work, I use a old snapshot of drm-next + merge of nouveau as base. Description of what the parts do:
Series to fix small api issues when moving over:
drm/ttm: Remove cpu_writers related code drm/ttm: Add ttm_bo_is_reserved function drm/radeon: Use ttm_bo_is_reserved drm/vmwgfx: use ttm_bo_is_reserved
drm/vmwgfx: remove use of fence_obj_args drm/ttm: remove sync_obj_arg drm/ttm: remove sync_obj_arg from ttm_bo_move_accel_cleanup drm/ttm: remove sync_arg entirely
drm/nouveau: unpin buffers before releasing to prevent lockdep warnings drm/nouveau: add reservation to nouveau_bo_vma_del drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep
Hey great, now we only have one user left for fence waiting before reserving, lets fix that and remove fence lock: ttm_bo_cleanup_refs_or_queue and ttm_bo_cleanup_refs have to reserve before waiting, lets do it in the squash commit so we don't have to throw lock order around everywhere:
drm/ttm: remove fence_lock
-- Up to this point should be mergeable now
Then we start working on lru_lock removal slightly, this means the lru list no longer is empty but can contain only reserved buffers:
drm/ttm: do not check if list is empty in ttm_bo_force_list_clean drm/ttm: move reservations for ttm_bo_cleanup_refs
-- Still mergeable up to this point, just fixes
Patch series from this email: dma-buf: remove fallback for !CONFIG_DMA_SHARED_BUFFER fence: dma-buf cross-device synchronization (v9) seqno-fence: Hardware dma-buf implementation of fencing (v3) reservation: cross-device reservation support reservation: Add lockdep annotation and selftests
Now hook it up to drm/ttm in a few steps: usage around reservations: drm/ttm: make ttm reservation calls behave like reservation calls drm/ttm: use dma_reservation api dma-buf: use reservations drm/ttm: allow drivers to pass custom dma_reservation_objects for a bo
then kill off the lru lock around reservation: drm/ttm: remove lru_lock around ttm_bo_reserve drm/ttm: simplify ttm_eu_*
The lru_lock removal patch removes the lock around lru_lock around the reservation, this will break the assumption that items on the lru list and swap list can always be reserved, and this gets patched up too. Is there any part in ttm you disagree with? I believe that this is all mergeable, the lru_lock removal patch could be moved to before the reservation parts, this might make merging easier, but I don't think there is any ttm part of the series that are wrong on a conceptual level.
~Maarten
....From another email
As previously discussed, I'm unfortunately not prepared to accept removal of the reserve-lru atomicity into the TTM code at this point. The current code is based on this assumption and removing it will end up with efficiencies, breaking the delayed delete code and probably a locking nightmare when trying to write new TTM code.
The lru lock removal patch fixed the delayed delete code, it really is not different from the current situation. In fact it is more clear without the guarantee what various parts are trying to protect.
Nothing prevents you from holding the lru_lock while trylocking,
[1] While this would not cause any deadlocks, Any decent lockdep code would establish lru->reserve as the locking order once a lru- reserve trylock succeeds, but the locking order is really reserve->lru for obvious reasons, which means we will get a lot of lockdep errors? Yes, there are a two reversals like these already in the TTM code, and I'm not very proud of them.
leaving that guarantee intact for that part. Can you really just review the patch and tell me where it breaks and/or makes the code unreadable?
OK. Now I'm looking at http://cgit.freedesktop.org/~mlankhorst/linux/tree/drivers/gpu/drm/ttm/ttm_b...
And let's start with a function that's seen some change, ttm_mem_evict_first:
*) Line 715: You're traversing a list using list_for_each() calling a function that may remove the list entry
*) Line 722: You're unlocking the lock protecting the list in the middle of list traversal
*) Line 507: WARN_ON_ONCE in a code path quite likely to get called?
*) Line 512: sleep while atomic
*) Line 729: Forgot to unreserve
*) Line 730: Forgot to lock lru
*) Line 735: Now you're restarting with the first item on the LRU list. Why the loop at line 715?
*) Line 740: Deadlocking reserve
*) Line 757: Calling TTM Bo evict, but there might have been another process already evicting the buffer while you released the lru_lock in line 739, before reserving the buffer.
Actually, Lines 715, 722, 730 and 735 are OK, sorry about that, but the others should be valid. /Thomas
On 28-09-12 22:10, Thomas Hellstrom wrote:
On 09/28/2012 09:42 PM, Thomas Hellstrom wrote:
On 09/28/2012 04:14 PM, Maarten Lankhorst wrote:
Hey,
On 28-09-12 14:41, Maarten Lankhorst wrote:
Documentation says that code requiring dma-buf should add it to select, so inline fallbacks are not going to be used. A link error will make it obvious what went wrong, instead of silently doing nothing at runtime.
The whole patch series is in my tree, I use stg so things might move around, do not use for merging currently:
http://cgit.freedesktop.org/~mlankhorst/linux/log/?h=v10-wip
It contains everything in here plus the patches for ttm to make it work, I use a old snapshot of drm-next + merge of nouveau as base. Description of what the parts do:
Series to fix small api issues when moving over:
drm/ttm: Remove cpu_writers related code drm/ttm: Add ttm_bo_is_reserved function drm/radeon: Use ttm_bo_is_reserved drm/vmwgfx: use ttm_bo_is_reserved
drm/vmwgfx: remove use of fence_obj_args drm/ttm: remove sync_obj_arg drm/ttm: remove sync_obj_arg from ttm_bo_move_accel_cleanup drm/ttm: remove sync_arg entirely
drm/nouveau: unpin buffers before releasing to prevent lockdep warnings drm/nouveau: add reservation to nouveau_bo_vma_del drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep
Hey great, now we only have one user left for fence waiting before reserving, lets fix that and remove fence lock: ttm_bo_cleanup_refs_or_queue and ttm_bo_cleanup_refs have to reserve before waiting, lets do it in the squash commit so we don't have to throw lock order around everywhere:
drm/ttm: remove fence_lock
-- Up to this point should be mergeable now
Then we start working on lru_lock removal slightly, this means the lru list no longer is empty but can contain only reserved buffers:
drm/ttm: do not check if list is empty in ttm_bo_force_list_clean drm/ttm: move reservations for ttm_bo_cleanup_refs
-- Still mergeable up to this point, just fixes
Patch series from this email: dma-buf: remove fallback for !CONFIG_DMA_SHARED_BUFFER fence: dma-buf cross-device synchronization (v9) seqno-fence: Hardware dma-buf implementation of fencing (v3) reservation: cross-device reservation support reservation: Add lockdep annotation and selftests
Now hook it up to drm/ttm in a few steps: usage around reservations: drm/ttm: make ttm reservation calls behave like reservation calls drm/ttm: use dma_reservation api dma-buf: use reservations drm/ttm: allow drivers to pass custom dma_reservation_objects for a bo
then kill off the lru lock around reservation: drm/ttm: remove lru_lock around ttm_bo_reserve drm/ttm: simplify ttm_eu_*
The lru_lock removal patch removes the lock around lru_lock around the reservation, this will break the assumption that items on the lru list and swap list can always be reserved, and this gets patched up too. Is there any part in ttm you disagree with? I believe that this is all mergeable, the lru_lock removal patch could be moved to before the reservation parts, this might make merging easier, but I don't think there is any ttm part of the series that are wrong on a conceptual level.
~Maarten
....From another email
As previously discussed, I'm unfortunately not prepared to accept removal of the reserve-lru atomicity into the TTM code at this point. The current code is based on this assumption and removing it will end up with efficiencies, breaking the delayed delete code and probably a locking nightmare when trying to write new TTM code.
The lru lock removal patch fixed the delayed delete code, it really is not different from the current situation. In fact it is more clear without the guarantee what various parts are trying to protect.
Nothing prevents you from holding the lru_lock while trylocking,
[1] While this would not cause any deadlocks, Any decent lockdep code would establish lru->reserve as the locking order once a lru- reserve trylock succeeds, but the locking order is really reserve->lru for obvious reasons, which means we will get a lot of lockdep errors? Yes, there are a two reversals like these already in the TTM code, and I'm not very proud of them.
leaving that guarantee intact for that part. Can you really just review the patch and tell me where it breaks and/or makes the code unreadable?
OK. Now I'm looking at http://cgit.freedesktop.org/~mlankhorst/linux/tree/drivers/gpu/drm/ttm/ttm_b...
And let's start with a function that's seen some change, ttm_mem_evict_first:
*) Line 715: You're traversing a list using list_for_each() calling a function that may remove the list entry
*) Line 722: You're unlocking the lock protecting the list in the middle of list traversal
*) Line 507: WARN_ON_ONCE in a code path quite likely to get called?
When will it get called? ttm_bo_delayed_delete calls it only if the buffer is already on the delayed destroy list. ttm_mem_evict_first only calls it if the buffer is on that list too. ttm_bo_swapout won't call it either if the buffer is not on the list.
*) Line 512: sleep while atomic
Oops, the order is wrong: ttm_bo_wait has to be called before anything else. Thanks for catching it.
*) Line 729: Forgot to unreserve
Thanks for catching it.
*) Line 740: Deadlocking reserve
Agreed, I'll remove it and just give up if no buffer could be reserved from the lru list. I was just confused since that function passed on no_wait_reserve which doesn't make sense there.
*) Line 757: Calling TTM Bo evict, but there might have been another process already evicting the buffer while you released the lru_lock in line 739, before reserving the buffer.
Yeah I think this part of the code needs to be taken out, which would be more similar to the old behavior since before nothing on the lru list was going to be contended anyway. Removing the offending code and just returning -EBUSY regardless should be the correct fix, maintaining the current behavior.
I'll remove the no_wait_reserve argument too, since it was unused before and would just be confusing to keep.
Actually, Lines 715, 722, 730 and 735 are OK, sorry about that, but the others should be valid. /Thomas
Is there any point in keeping ttm_bo_move_ttm's no_wait_reserve and no_wait_gpu argument? Nothing seems to use it and it would allow removal of 1 or more of those parameters for a whole lot of functions:
ttm_bo_move*, ttm_bo_handle_move_mem, ttm_bo_evict (though keeping no_wait_gpu), ttm_mem_evict_first, ttm_bo_mem_force_space, ttm_bo_mem_space, ttm_bo_move_buffer, ttm_bo_validate.
This would be a separate prerequisite patch though, but it should make it clearer that there is no blocking on reservation in eviction.
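For illustration, the kind of prototype change being proposed; the "before" signature is quoted from memory of the TTM headers this series is based on, so treat the exact parameters as an approximation:

/* Before (approximate current prototype): */
int ttm_bo_validate(struct ttm_buffer_object *bo,
		    struct ttm_placement *placement,
		    bool interruptible,
		    bool no_wait_reserve, bool no_wait_gpu);

/* After dropping the unused no_wait_reserve argument: */
int ttm_bo_validate(struct ttm_buffer_object *bo,
		    struct ttm_placement *placement,
		    bool interruptible, bool no_wait_gpu);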
~Maarten
On 09/29/2012 05:16 PM, Maarten Lankhorst wrote:
Op 28-09-12 22:10, Thomas Hellstrom schreef:
On 09/28/2012 09:42 PM, Thomas Hellstrom wrote:
On 09/28/2012 04:14 PM, Maarten Lankhorst wrote:
Hey,
On 28-09-12 14:41, Maarten Lankhorst wrote:
Documentation says that code requiring dma-buf should add it to select, so inline fallbacks are not going to be used. A link error will make it obvious what went wrong, instead of silently doing nothing at runtime.
The whole patch series is in my tree, I use stg so things might move around, do not use for merging currently:
http://cgit.freedesktop.org/~mlankhorst/linux/log/?h=v10-wip
It contains everything in here plus the patches for ttm to make it work, I use a old snapshot of drm-next + merge of nouveau as base. Description of what the parts do:
Series to fix small api issues when moving over:
drm/ttm: Remove cpu_writers related code drm/ttm: Add ttm_bo_is_reserved function drm/radeon: Use ttm_bo_is_reserved drm/vmwgfx: use ttm_bo_is_reserved
drm/vmwgfx: remove use of fence_obj_args drm/ttm: remove sync_obj_arg drm/ttm: remove sync_obj_arg from ttm_bo_move_accel_cleanup drm/ttm: remove sync_arg entirely
drm/nouveau: unpin buffers before releasing to prevent lockdep warnings drm/nouveau: add reservation to nouveau_bo_vma_del drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep
Hey great, now we only have one user left for fence waiting before reserving, let's fix that and remove the fence lock: ttm_bo_cleanup_refs_or_queue and ttm_bo_cleanup_refs have to reserve before waiting, let's do it in the squash commit so we don't have to throw lock order around everywhere:
drm/ttm: remove fence_lock
-- Up to this point should be mergeable now
Then we start working slightly toward lru_lock removal; this means the lru list is no longer necessarily empty, but may contain only reserved buffers:
drm/ttm: do not check if list is empty in ttm_bo_force_list_clean drm/ttm: move reservations for ttm_bo_cleanup_refs
-- Still mergeable up to this point, just fixes
Patch series from this email: dma-buf: remove fallback for !CONFIG_DMA_SHARED_BUFFER fence: dma-buf cross-device synchronization (v9) seqno-fence: Hardware dma-buf implementation of fencing (v3) reservation: cross-device reservation support reservation: Add lockdep annotation and selftests
Now hook it up to drm/ttm in a few steps: usage around reservations: drm/ttm: make ttm reservation calls behave like reservation calls drm/ttm: use dma_reservation api dma-buf: use reservations drm/ttm: allow drivers to pass custom dma_reservation_objects for a bo
then kill off the lru lock around reservation: drm/ttm: remove lru_lock around ttm_bo_reserve drm/ttm: simplify ttm_eu_*
The lru_lock removal patch removes the lru_lock around the reservation; this will break the assumption that items on the lru list and swap list can always be reserved, and this gets patched up too. Is there any part in ttm you disagree with? I believe that this is all mergeable. The lru_lock removal patch could be moved to before the reservation parts, which might make merging easier, but I don't think there is any ttm part of the series that is wrong on a conceptual level.
~Maarten
....From another email
As previously discussed, I'm unfortunately not prepared to accept removal of the reserve-lru atomicity from the TTM code at this point. The current code is based on this assumption and removing it will end up with inefficiencies, breaking the delayed delete code and probably a locking nightmare when trying to write new TTM code.
The lru lock removal patch fixed the delayed delete code; it really is not different from the current situation. In fact it is clearer without the guarantee what the various parts are trying to protect.
Nothing prevents you from holding the lru_lock while trylocking,
[1] While this would not cause any deadlocks, any decent lockdep code would establish lru->reserve as the locking order once an lru->reserve trylock succeeds, but the locking order is really reserve->lru for obvious reasons, which means we will get a lot of lockdep errors? Yes, there are two reversals like these already in the TTM code, and I'm not very proud of them.
leaving that guarantee intact for that part. Can you really just review the patch and tell me where it breaks and/or makes the code unreadable?
OK. Now I'm looking at http://cgit.freedesktop.org/~mlankhorst/linux/tree/drivers/gpu/drm/ttm/ttm_b...
And let's start with a function that's seen some change, ttm_mem_evict_first:
*) Line 715: You're traversing a list using list_for_each(), calling a function that may remove the list entry.
*) Line 722: You're unlocking the lock protecting the list in the middle of list traversal.
*) Line 507: WARN_ON_ONCE in a code path quite likely to get called?
When will it get called? ttm_bo_delayed_delete calls it if it's already on the delayed destroy list. ttm_mem_evict_first only calls it if it's on that list too. ttm_bo_swapout won't call it either if it's not on the list.
Two threads calling ttm_bo_delayed_delete at the same time?
Anyway, in the swapout code there is an additional deadlock and no loop over reserved buffers.
And this is really my point: by removing the atomicity you get into this kind of deadlock and locking mess, and TTM goes from hard to maintain to unmaintainable.
Furthermore, I've requested four times now that we bring this up on the design level, but you consistently refuse to answer my questions, asking me to review the code instead. With that kind of argumentation, I could easily write a patch that changes your reservation objects to take a spinlock before reserving, plus a list of functions to be called with that lock held just after reserving, and ask you to review that. Then we could compare the amount of stable and tested code our patches touch that would need a thorough retest.
So, until we've agreed on a design, this is a firm NAK from my side when it comes to the TTM part.
Thanks, Thomas
Op 28-09-12 21:42, Thomas Hellstrom schreef:
....From another email
As previously discussed, I'm unfortunately not prepared to accept removal of the reserve-lru atomicity from the TTM code at this point. The current code is based on this assumption and removing it will end up with inefficiencies, breaking the delayed delete code and probably a locking nightmare when trying to write new TTM code.
The lru lock removal patch fixed the delayed delete code; it really is not different from the current situation. In fact it is clearer without the guarantee what the various parts are trying to protect.
Nothing prevents you from holding the lru_lock while trylocking,
[1] While this would not cause any deadlocks, any decent lockdep code would establish lru->reserve as the locking order once an lru->reserve trylock succeeds, but the locking order is really reserve->lru for obvious reasons, which means we will get a lot of lockdep errors? Yes, there are two reversals like these already in the TTM code, and I'm not very proud of them.
I was doing an evil hack where I 'released' the lru_lock to lockdep before doing the annotation for a blocking acquire, and left the trylock annotations as they were. This made lockdep do the right thing.
And this is even before it starts to get interesting, like how you guarantee that when you release a buffer from the delayed delete list, you're the only process having a reference?
I thought list_kref made sure of that? Even if not the only one with a reference, the list_empty check would make sure it would only run once. I'll fix it up again so it doesn't become a WARN_ON_ONCE; I didn't know it could run multiple times at the time, so I'll change it back to unlikely.
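A rough sketch of the list_kref/list_empty idea being discussed; the field names (ddestroy, lru_lock) are as in TTM at the time, but the helper itself is only an illustration:

/* Remove the bo from the delayed-destroy list at most once, no matter how
 * many callers race here, and report whether this caller did the removal
 * so that exactly one list reference gets dropped. */
static bool ddestroy_remove_once_sketch(struct ttm_bo_global *glob,
					struct ttm_buffer_object *bo)
{
	bool removed = false;

	spin_lock(&glob->lru_lock);
	if (!list_empty(&bo->ddestroy)) {
		list_del_init(&bo->ddestroy);
		removed = true;
	}
	spin_unlock(&glob->lru_lock);
	return removed;	/* caller drops the list's kref only if true */
}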
Now, it's probably possible to achieve what you're trying to do, if we accept the lock reversal in [1], but since I have newborn twins and about one hour of spare time a week, which I now spent on this review, I guess there are countless more hours before this can work. (These code paths were never tested, right?) One of the biggest TTM reworks was to introduce the atomicity assumption and remove a lot of code that was prone to deadlocks, races and buffer leaks. I'm not prepared to revert that work without an extremely good reason, and "It can be done" is not such a reason.
Deepest apologies, I only did a quick glance at the code part of this email and overlooked this part, since I was swamped with other things and meant to do a full reply on Monday. I didn't mean to make it sound like I only cared about blindly merging my code; I just wanted to find a good solution.
We *need* to carefully weigh it against any benefits you have in your work, and you need to test these codepaths in parallel cases subject to heavy aperture / vram thrashing and frequent signals causing interrupted waits.
Agreed, is there already a tester for this or should I write my own?
And I think you need to present the gains in your work that can motivate the testing and review time for this.
Agreed.
~Maarten
On 10/01/2012 11:47 AM, Maarten Lankhorst wrote:
I was doing an evil hack where I 'released' the lru_lock to lockdep before doing the annotation for a blocking acquire, and left the trylock annotations as they were. This made lockdep do the right thing.
I've never looked into how lockdep works. Is this something that can be done permanently or just for testing purposes? Although not related to this, is it possible to do something similar to the trylock reversal in the fault() code where mmap_sem() and reserve() change order using a reserve trylock?
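For reference, the fault()-path inversion mentioned here looks roughly like the following; the reserve helpers are hypothetical names for this sketch, and VM_FAULT_NOPAGE simply means "back off and let the fault be retried":

/* The core mm already holds mmap_sem when fault() runs, whereas in other
 * paths the reservation can be held while mmap_sem-protected user memory
 * is touched; only trylocking here avoids ever blocking in the reversed
 * order. */
static int bo_fault_sketch(struct ttm_buffer_object *bo)
{
	if (!bo_reserve_trylock(bo))	/* hypothetical non-blocking reserve */
		return VM_FAULT_NOPAGE;	/* back off; the fault is retried */

	/* ... fault in the pages backing the mapping ... */

	bo_unreserve(bo);		/* hypothetical */
	return 0;
}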
And this is even before it starts to get interesting, like how you guarantee that when you release a buffer from the delayed delete list, you're the only process having a reference?
I thought list_kref made sure of that? Even if not the only one with a reference, the list_empty check would make sure it would only run once. I'll fix it up again so it doesn't become a WARN_ON_ONCE; I didn't know it could run multiple times at the time, so I'll change it back to unlikely.
Yes, you're probably right. A case we've seen earlier (before the atomicity was introduced) was that one or more threads picked up a bo from the LRU list and prepared to reserve it, while the delayed delete function removed it from the ddestroy list. Then the first thread queued an accelerated eviction, adding a new fence, and the bo was left hanging. I don't think that can happen with the reserve trylocks within the lru spinlock, though.
Now, it's probably possible to achieve what you're trying to do, if we accept the lock reversal in [1], but since I have newborn twins and about one hour of spare time a week, which I now spent on this review, I guess there are countless more hours before this can work. (These code paths were never tested, right?) One of the biggest TTM reworks was to introduce the atomicity assumption and remove a lot of code that was prone to deadlocks, races and buffer leaks. I'm not prepared to revert that work without an extremely good reason, and "It can be done" is not such a reason.
Deepest apologies, I only did a quick glance at the code part of this email and overlooked this part, since I was swamped with other things and meant to do a full reply on Monday. I didn't mean to make it sound like I only cared about blindly merging my code; I just wanted to find a good solution.
We *need* to carefully weigh it against any benefits you have in your work, and you need to test these codepaths in parallel cases subject to heavy aperture / vram thrashing and frequent signals causing interrupted waits.
Agreed, is there already a tester for this or should I write my own?
Although I think it would be nice to have a highly parallel execbuf implementation on an extremely simple software GPU, what I typically do is to take an existing driver (none of them implements parallel reserve yet, but vmware is about to soon):
a) Use an application that frequently recycles buffers, so that the delayed-delete code gets busy (perhaps google-earth, panning over a landscape not too high above the earth).
b) Hack the driver's aperture / vram sizes to something small, so that you can see that the eviction code gets hit.
c) Adjust the memory limits in TTM sysfs memory accounting (you can write and change them on the fly), so that you can see that the swapping code gets hit.
d) Code up signal delivery so that every 20th time or so the eviction code is about to wait, it receives an -ERESTARTSYS with a harmless signal.
e) Start another instance of google-earth.
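Step (d) can be done with a trivial counter-based injector; this is only a sketch of the idea, not code from any tree:

/* Debug-only hack: make roughly every 20th eviction wait bail out with
 * -ERESTARTSYS, as if a harmless signal had arrived. */
static atomic_t fake_signal_counter = ATOMIC_INIT(0);

static int maybe_inject_erestartsys(void)
{
	if (atomic_inc_return(&fake_signal_counter) % 20 == 0)
		return -ERESTARTSYS;
	return 0;
}

Calling this at the top of the interruptible waits and propagating the error exercises the signal-restart paths without needing a real signal source.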
/Thomas
And I think you need to present the gains in your work that can motivate the testing and review time for this.
Agreed.
~Maarten
On Tue, Oct 02, 2012 at 08:46:32AM +0200, Thomas Hellstrom wrote:
On 10/01/2012 11:47 AM, Maarten Lankhorst wrote:
I was doing an evil hack where I 'released' the lru_lock to lockdep before doing the annotation for a blocking acquire, and left the trylock annotations as they were. This made lockdep do the right thing.
I've never looked into how lockdep works. Is this something that can be done permanently or just for testing purposes? Although not related to this, is it possible to do something similar to the trylock reversal in the fault() code where mmap_sem() and reserve() change order using a reserve trylock?
lockdep just requires a bunch of annotations, is a compile-time configuration option (CONFIG_PROVE_LOCKING) and, if disabled, has zero overhead. And it's rather awesome at detecting deadlocks and handling crazy locking schemes correctly:
- correctly handles trylocks
- correctly handles nested locking (i.e. grabbing a global lock, then grabbing subordinate locks in an unordered sequence, since the global lock ensures that no deadlocks can happen)
- any kinds of inversions with special contexts like hardirq, softirq
- same for page-reclaim, i.e. it will yell if you could (potentially) deadlock because your shrinker grabs a lock that you hold while calling kmalloc
- there are special annotations for various subsystems, e.g. to check for del_timer_sync vs. locks held by that timer, or the console_lock annotations I've just recently submitted
- all that with a really flexible set of annotation primitives that afaics should work for almost any insane locking scheme. The fact that Maarten could come up with proper reservation annotations without any changes to lockdep testifies to this (he only had to fix a tiny thing to make it a bit more strict in a corner case).
In short I think it's made of awesome. The only downside is that it lacks documentation, you have to read the code to understand it :(
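As a trivial illustration of what CONFIG_PROVE_LOCKING reports: an ABBA pattern like the one below gets flagged as soon as both paths have run once, even if the timing never actually produces the deadlock (toy example, not taken from any driver):

static DEFINE_MUTEX(lock_a);
static DEFINE_MUTEX(lock_b);

static void path_one(void)
{
	mutex_lock(&lock_a);
	mutex_lock(&lock_b);	/* records the order a -> b */
	mutex_unlock(&lock_b);
	mutex_unlock(&lock_a);
}

static void path_two(void)
{
	mutex_lock(&lock_b);
	mutex_lock(&lock_a);	/* b -> a: lockdep warns about the inversion */
	mutex_unlock(&lock_a);
	mutex_unlock(&lock_b);
}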
The reason I've suggested to Maarten to abolish the trylock_reservation within the lru_lock is that in that way lockdep only ever sees the trylock, and hence is less strict about complaining about deadlocks. But semantically it's an unconditional reserve. Maarten had some horrible hacks that leaked the lockdep annotations out of the new reservation code, which allowed ttm to be properly annotated. But those also reduced the usefulness for any other users of the reservation code, and so Maarten looked into whether he could remove that trylock dance in ttm.
Imo having excellent lockdep support for cross-device reservations is a requirement, and ending up with less strict annotations for either ttm-based drivers or other drivers is not good. And imo the ugly layering that Maarten had in his first proof-of-concept also indicates that something is amiss in the design.
[I'll refrain from comment on ttm specifics to not make a fool of me ;-)]
Although I think it would be nice to have a highly parallel execbuf implementation on an extremely simple software GPU, what I typically do is to take an existing driver (none of them implements parallel reserve yet, but vmware is about to soon):
a) Use an application that frequently recycles buffers, so that the delayed-delete code gets busy (perhaps google-earth, panning over a landscape not too high above the earth).
b) Hack the driver's aperture / vram sizes to something small, so that you can see that the eviction code gets hit.
c) Adjust the memory limits in TTM sysfs memory accounting (you can write and change them on the fly), so that you can see that the swapping code gets hit.
d) Code up signal delivery so that every 20th time or so the eviction code is about to wait, it receives an -ERESTARTSYS with a harmless signal.
e) Start another instance of google-earth.
tbh, this should be a simple testsuite that you can just run. Like we're (slowly) building up for drm/i915 in intel-gpu-tools. At least that'll be one of the merge requirements for i915.ko.
Cheers, Daniel
On 10/02/2012 10:03 AM, Daniel Vetter wrote:
On Tue, Oct 02, 2012 at 08:46:32AM +0200, Thomas Hellstrom wrote:
On 10/01/2012 11:47 AM, Maarten Lankhorst wrote:
The reason I've suggested to Maarten to abolish the trylock_reservation within the lru_lock is that in that way lockdep only ever sees the trylock, and hence is less strict about complaining about deadlocks. But semantically it's an unconditional reserve. Maarten had some horrible hacks that leaked the lockdep annotations out of the new reservation code, which allowed ttm to be properly annotated. But those also reduced the usefulness for any other users of the reservation code, and so Maarten looked into whether he could remove that trylock dance in ttm.
Imo having excellent lockdep support for cross-device reservations is a requirement, and ending up with less strict annotations for either ttm-based drivers or other drivers is not good. And imo the ugly layering that Maarten had in his first proof-of-concept also indicates that something is amiss in the design.
So if I understand you correctly, the reservation changes in TTM are motivated by the fact that otherwise, in the generic reservation code, lockdep can only be annotated for a trylock and not a waiting lock, when it *is* in fact a waiting lock.
I'm completely unfamiliar with setting up lockdep annotations, but the only place a deadlock might occur is if the trylock fails and we do a wait_for_unreserve(). Isn't it possible to annotate the call to wait_for_unreserve() just like an interruptible waiting lock (one that is always interrupted, but at least any deadlock will be caught)?
/Thomas
On Wed, Oct 3, 2012 at 9:45 AM, Thomas Hellstrom thellstrom@vmware.com wrote:
So if I understand you correctly, the reservation changes in TTM are motivated by the fact that otherwise, in the generic reservation code, lockdep can only be annotated for a trylock and not a waiting lock, when it *is* in fact a waiting lock.
I'm completely unfamiliar with setting up lockdep annotations, but the only place a deadlock might occur is if the trylock fails and we do a wait_for_unreserve(). Isn't it possible to annotate the call to wait_for_unreserve() just like an interruptible waiting lock (one that is always interrupted, but at least any deadlock will be caught)?
Hm, I have to admit that idea hasn't crossed my mind, but it's indeed a hole in our current reservation lockdep annotations - since we're blocking for the unreserve, other threads could potentially block waiting on us to release a lock we're already holding, resulting in a deadlock.
Since no other locking primitive that I know of has this wait_for_unlocked interface, I don't know how we could map this in lockdep. One idea is to grab the lock and release it again immediately (only in the annotations, not the real lock ofc). But I need to check the lockdep code to see whether that doesn't trip it up.
Cheers, Daniel
On 10/03/2012 09:54 AM, Daniel Vetter wrote:
Hm, I have to admit that idea hasn't crossed my mind, but it's indeed a hole in our current reservation lockdep annotations - since we're blocking for the unreserve, other threads could potentially block waiting on us to release a lock we're already holding, resulting in a deadlock.
Since no other locking primitive that I know of has this wait_for_unlocked interface, I don't know how we could map this in lockdep. One idea is to grab the lock and release it again immediately (only in the annotations, not the real lock ofc). But I need to check the lockdep code to see whether that doesn't trip it up.
I imagine doing the same as mutex_lock_interruptible() does in the interrupted path should work...
Cheers, Daniel
On Wed, Oct 3, 2012 at 10:37 AM, Thomas Hellstrom thellstrom@vmware.com wrote:
I imagine doing the same as mutex_lock_interruptible() does in the interrupted path should work...
It simply calls the unlock lockdep annotation function if it breaks out. So doing a lock/unlock cycle in wait_unreserve should do what we want.
And to properly annotate the ttm reserve paths we could just add an unconditional wait_unreserve call at the beginning like you suggested (maybe with #ifdef CONFIG_PROVE_LOCKING in case ppl freak out about the added atomic read in the uncontended case). -Daniel
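If the reservation were an ordinary mutex, the annotation trick being discussed is literally a blocking lock followed immediately by an unlock at the start of the reserve path; a sketch with a mutex stand-in (TTM's real reserve is atomic-based, so this is illustration only):

static void reserve_blocking_sketch(struct mutex *reserve)
{
#ifdef CONFIG_PROVE_LOCKING
	/* The unconditional lock/unlock cycle: lockdep always sees a full
	 * waiting acquire on this path, even when the real trylock below
	 * would have succeeded straight away. */
	mutex_lock(reserve);
	mutex_unlock(reserve);
#endif
	/* ... the real reserve (trylock + wait loop) would go here ... */
	mutex_lock(reserve);
}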
On 10/03/2012 10:53 AM, Daniel Vetter wrote:
It simply calls the unlock lockdep annotation function if it breaks out. So doing a lock/unlock cycle in wait_unreserve should do what we want.
And to properly annotate the ttm reserve paths we could just add an unconditional wait_unreserve call at the beginning like you suggested (maybe with #ifdef CONFIG_PROVE_LOCKING in case ppl freak out about the added atomic read in the uncontended case). -Daniel
I think atomic_read()s are cheap, at least on intel, as IIRC they don't require bus locking; still, I think we should keep it within CONFIG_PROVE_LOCKING.
Which btw reminds me: there's an optimization that can be done here - one should really only call atomic_cmpxchg() if a preceding atomic_read() hints that it will succeed.
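The optimization amounts to a cheap read before the cmpxchg in the trylock fast path; a sketch, with an atomic_t 'reserved' standing in for TTM's actual reservation state:

/* Only attempt the (more expensive, bus-locking) cmpxchg when a plain
 * read suggests it can actually succeed. */
static bool reserve_trylock_sketch(atomic_t *reserved)
{
	if (atomic_read(reserved) != 0)
		return false;			/* already reserved, skip the cmpxchg */
	return atomic_cmpxchg(reserved, 0, 1) == 0;
}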
Now, does this mean TTM can keep the atomic reserve <-> lru list removal?
Thanks, Thomas
Op 03-10-12 12:53, Thomas Hellstrom schreef:
Now, does this mean TTM can keep the atomic reserve <-> lru list removal?
I don't think it would be a good idea to keep this across devices, there's currently no callback to remove buffers off the lru list.
However, I am convinced that the current behavior, where swapout and eviction/destruction never ever do a blocking reserve, should be preserved. I looked into it more and it seems it could recurse quite a few times between all the related commands, and it wouldn't surprise me if that turned out to be the cause of the lockups before moving to the current code. The no_wait_reserve argument in those functions should be removed and always treated as true.
Atomic lru_lock + reserve can still be done in the places where it matters, though it might have to try multiple bo's on the list before it succeeds. As long as no blocking is done, the effective behavior would stay the same.
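That non-blocking behavior is roughly the following pattern; the trylock helper is hypothetical, and the real code would also have to handle the kref/put_count bookkeeping:

/* Walk the LRU under the spinlock and trylock-reserve entries until one
 * succeeds; never block while the spinlock is held. */
static struct ttm_buffer_object *
lru_reserve_first_sketch(struct ttm_bo_global *glob, struct list_head *head)
{
	struct ttm_buffer_object *bo;

	spin_lock(&glob->lru_lock);
	list_for_each_entry(bo, head, lru) {
		if (bo_reserve_trylock(bo)) {	/* hypothetical non-blocking reserve */
			spin_unlock(&glob->lru_lock);
			return bo;
		}
	}
	spin_unlock(&glob->lru_lock);
	return NULL;	/* everything currently reserved: caller returns -EBUSY */
}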
~Maarten
On 10/03/2012 02:46 PM, Maarten Lankhorst wrote:
I don't think it would be a good idea to keep this across devices,
Why?
there's currently no callback to remove buffers off the lru list.
So why don't we add one, and one to put them on the *correct* LRU list while unreserving?
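Such callbacks do not exist at this point; a purely hypothetical sketch of what is being asked for could look like this (struct and names made up):

struct reservation_lru_ops {
	/* take the bo off whatever LRU it is on, atomically with reserve */
	void (*del_from_lru)(struct ttm_buffer_object *bo);
	/* put it back on the *correct* LRU when it is unreserved */
	void (*add_to_lru)(struct ttm_buffer_object *bo);
};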
/Thomas
Hey,
Op 03-10-12 09:45, Thomas Hellstrom schreef:
So if I understand you correctly, the reservation changes in TTM are motivated by the fact that otherwise, in the generic reservation code, lockdep can only be annotated for a trylock and not a waiting lock, when it *is* in fact a waiting lock.
I'm completely unfamiliar with setting up lockdep annotations, but the only place a deadlock might occur is if the trylock fails and we do a wait_for_unreserve(). Isn't it possible to annotate the call to wait_for_unreserve() just like an interruptible waiting lock (one that is always interrupted, but at least any deadlock will be caught)?
That would not find all bugs; lockdep is meant to find even theoretical bugs, so annotating it as a waiting lock makes more sense. Otherwise lockdep will only barf when the initial trylock fails.
~Maarten
On 10/03/2012 09:57 AM, Maarten Lankhorst wrote:
That would not find all bugs; lockdep is meant to find even theoretical bugs, so annotating it as a waiting lock makes more sense. Otherwise lockdep will only barf when the initial trylock fails.
Really, starting a waiting reserve with a call to wait_for_unreserve() if CONFIG_LOCKDEP is set shouldn't be that bad :)? That would also catch the theoretical errors. In fact, it should suffice to just annotate for such a call?
/Thomas
~Maarten