We get 1 warning when building the kernel with W=1:
drivers/dma-buf/sw_sync.c:87:23: warning: no previous prototype for 'sync_timeline_create' [-Wmissing-prototypes]
In fact, this function is only used within the file in which it is
declared, so it does not need an external declaration and can be made
static. This patch marks it 'static'.
Signed-off-by: Baoyou Xie <baoyou.xie(a)linaro.org>
---
drivers/dma-buf/sw_sync.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
index 62e8e6d..6f16c85 100644
--- a/drivers/dma-buf/sw_sync.c
+++ b/drivers/dma-buf/sw_sync.c
@@ -84,7 +84,7 @@ static inline struct sync_pt *fence_to_sync_pt(struct fence *fence)
* Creates a new sync_timeline. Returns the sync_timeline object or NULL in
* case of error.
*/
-struct sync_timeline *sync_timeline_create(const char *name)
+static struct sync_timeline *sync_timeline_create(const char *name)
{
struct sync_timeline *obj;
--
2.7.4
Currently we install a callback for performing poll on a dma-buf
irrespective of the timeout. This involves taking a spinlock, as well as
doing unnecessary work, and greatly reduces the scaling of
poll(.timeout=0) across multiple threads.
We can query whether the poll will block prior to installing the
callback, making the busy-query fast.
Single thread: 60% faster
8 threads on 4 (+4 HT) cores: 600% faster
Still not quite the perfect scaling we get with a native busy ioctl, but
poll(dmabuf) is faster due to the quicker lookup of the object and
avoiding drm_ioctl().
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org
Reviewed-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
---
drivers/dma-buf/dma-buf.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index cf04d249a6a4..c7a7bc579941 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -156,6 +156,18 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
if (!events)
return 0;
+ if (poll_does_not_wait(poll)) {
+ if (events & POLLOUT &&
+ !reservation_object_test_signaled_rcu(resv, true))
+ events &= ~(POLLOUT | POLLIN);
+
+ if (events & POLLIN &&
+ !reservation_object_test_signaled_rcu(resv, false))
+ events &= ~POLLIN;
+
+ return events;
+ }
+
retry:
seq = read_seqcount_begin(&resv->seq);
rcu_read_lock();
--
2.9.3
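As a concrete illustration of the fast path above, userspace can busy-query a
dma-buf with a zero timeout. This is a minimal sketch (error handling elided;
the dma-buf fd is assumed to come from an exporter such as a DRM driver):

#include <poll.h>
#include <stdbool.h>

/*
 * Sketch: non-blocking busy-query on a dma-buf file descriptor.
 * A timeout of 0 means poll() never sleeps, so dma_buf_poll() takes
 * the poll_does_not_wait() path added above and skips installing the
 * fence callback entirely.
 */
static bool dmabuf_is_busy(int dmabuf_fd)
{
	struct pollfd pfd = {
		.fd = dmabuf_fd,
		.events = POLLOUT,	/* writable once all fences signal */
	};

	/* poll() returns 0 when no requested events are ready */
	return poll(&pfd, 1, 0) == 0;
}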
In order to be completely generic, we have to double-check the read
seqlock after acquiring a reference to the fence. If the driver is
allocating fences from a SLAB_DESTROY_BY_RCU cache, or a similar
freelist, then within an RCU grace period a fence may be freed and
reallocated. The RCU read-side critical section does not prevent this
reallocation; instead, we have to inspect the reservation's seqlock to
double-check whether the fences were reassigned while we were acquiring
our reference.
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
drivers/dma-buf/reservation.c | 30 ++++++++++--------------------
1 file changed, 10 insertions(+), 20 deletions(-)
diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 3369e4668e96..e74493e7332b 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -474,12 +474,13 @@ bool reservation_object_test_signaled_rcu(struct reservation_object *obj,
bool test_all)
{
unsigned seq, shared_count;
- int ret = true;
+ int ret;
+ rcu_read_lock();
retry:
+ ret = true;
shared_count = 0;
seq = read_seqcount_begin(&obj->seq);
- rcu_read_lock();
if (test_all) {
unsigned i;
@@ -490,46 +491,35 @@ retry:
if (fobj)
shared_count = fobj->shared_count;
- if (read_seqcount_retry(&obj->seq, seq))
- goto unlock_retry;
-
for (i = 0; i < shared_count; ++i) {
struct fence *fence = rcu_dereference(fobj->shared[i]);
ret = reservation_object_test_signaled_single(fence);
if (ret < 0)
- goto unlock_retry;
+ goto retry;
else if (!ret)
break;
}
- /*
- * There could be a read_seqcount_retry here, but nothing cares
- * about whether it's the old or newer fence pointers that are
- * signaled. That race could still have happened after checking
- * read_seqcount_retry. If you care, use ww_mutex_lock.
- */
+ if (read_seqcount_retry(&obj->seq, seq))
+ goto retry;
}
if (!shared_count) {
struct fence *fence_excl = rcu_dereference(obj->fence_excl);
- if (read_seqcount_retry(&obj->seq, seq))
- goto unlock_retry;
-
if (fence_excl) {
ret = reservation_object_test_signaled_single(
fence_excl);
if (ret < 0)
- goto unlock_retry;
+ goto retry;
+
+ if (read_seqcount_retry(&obj->seq, seq))
+ goto retry;
}
}
rcu_read_unlock();
return ret;
-
-unlock_retry:
- rcu_read_unlock();
- goto retry;
}
EXPORT_SYMBOL_GPL(reservation_object_test_signaled_rcu);
--
2.9.3
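The retry pattern this patch settles on generalizes beyond reservation
objects. As a condensed, illustrative sketch (not the exact code above): take
the seqcount, dereference and reference the fence under RCU, then re-check the
seqcount before trusting the reference:

/*
 * Illustrative sketch of the RCU + seqcount double-check used above.
 * If the fence allocator uses SLAB_DESTROY_BY_RCU, the object can be
 * freed and reallocated within our grace period; only the seqcount
 * re-check proves the pointer we referenced is still the right fence.
 */
static struct fence *get_excl_fence(struct reservation_object *obj)
{
	struct fence *fence;
	unsigned seq;

	rcu_read_lock();
retry:
	seq = read_seqcount_begin(&obj->seq);
	fence = rcu_dereference(obj->fence_excl);
	if (fence && !fence_get_rcu(fence))
		goto retry;	/* refcount already zero: being freed */

	if (fence && read_seqcount_retry(&obj->seq, seq)) {
		fence_put(fence);	/* wrong (reallocated) fence */
		goto retry;
	}
	rcu_read_unlock();

	return fence;	/* caller drops the reference with fence_put() */
}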
This variant of fence_get_rcu() takes an RCU-protected pointer to a
fence and carefully returns a reference to the fence, ensuring that it
is not reallocated as it does so. This is required when mixing fences
and SLAB_DESTROY_BY_RCU, although at the moment it serves a more
pedagogical function.
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
include/linux/fence.h | 56 ++++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 51 insertions(+), 5 deletions(-)
diff --git a/include/linux/fence.h b/include/linux/fence.h
index 0d763053f97a..c9c5ba98c302 100644
--- a/include/linux/fence.h
+++ b/include/linux/fence.h
@@ -183,6 +183,16 @@ void fence_release(struct kref *kref);
void fence_free(struct fence *fence);
/**
+ * fence_put - decreases refcount of the fence
+ * @fence: [in] fence to reduce refcount of
+ */
+static inline void fence_put(struct fence *fence)
+{
+ if (fence)
+ kref_put(&fence->refcount, fence_release);
+}
+
+/**
* fence_get - increases refcount of the fence
* @fence: [in] fence to increase refcount of
*
@@ -210,13 +220,49 @@ static inline struct fence *fence_get_rcu(struct fence *fence)
}
/**
- * fence_put - decreases refcount of the fence
- * @fence: [in] fence to reduce refcount of
+ * fence_get_rcu_safe - acquire a reference to an RCU tracked fence
+ * @fence: [in] pointer to fence to increase refcount of
+ *
+ * Function returns NULL if no refcount could be obtained, or the fence.
+ * This function handles acquiring a reference to a fence that may be
+ * reallocated within the RCU grace period (such as with SLAB_DESTROY_BY_RCU),
+ * so long as the caller is using RCU on the pointer to the fence.
+ *
+ * An alternative mechanism is to employ a seqlock to protect a bunch of
+ * fences, such as used by struct reservation_object. When using a seqlock,
+ * the seqlock must be taken before and checked after a reference to the
+ * fence is acquired (as shown here).
+ *
+ * The caller is required to hold the RCU read lock.
*/
-static inline void fence_put(struct fence *fence)
+static inline struct fence *fence_get_rcu_safe(struct fence * __rcu *fencep)
{
- if (fence)
- kref_put(&fence->refcount, fence_release);
+ do {
+ struct fence *fence;
+
+ fence = rcu_dereference(*fencep);
+ if (!fence || !fence_get_rcu(fence))
+ return NULL;
+
+ /* The atomic_inc_not_zero() inside fence_get_rcu()
+ * provides a full memory barrier upon success (such as now).
+ * This is paired with the write barrier from assigning
+ * to the __rcu protected fence pointer so that if that
+ * pointer still matches the current fence, we know we
+ * have successfully acquired a reference to it. If it no
+ * longer matches, we are holding a reference to some other
+ * reallocated pointer. This is possible if the allocator
+ * is using a freelist like SLAB_DESTROY_BY_RCU where the
+ * fence remains valid for the RCU grace period, but it
+ * may be reallocated. When using such allocators, we are
+ * responsible for ensuring the reference we get is to
+ * the right fence, as below.
+ */
+ if (fence == rcu_access_pointer(*fencep))
+ return rcu_pointer_handoff(fence);
+
+ fence_put(fence);
+ } while (1);
}
int fence_signal(struct fence *fence);
--
2.9.3
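A brief usage note: with the helper above, a reader no longer needs an
explicit seqcount around a lone __rcu fence pointer. A minimal sketch, using
reservation_object's fence_excl field as the example:

/*
 * Sketch: snapshot an RCU-managed fence pointer via fence_get_rcu_safe().
 * The helper loops internally until the reference it took still matches
 * the pointer, so the caller only needs to hold the RCU read lock.
 */
static struct fence *snapshot_excl_fence(struct reservation_object *obj)
{
	struct fence *fence;

	rcu_read_lock();
	fence = fence_get_rcu_safe(&obj->fence_excl);
	rcu_read_unlock();

	return fence;	/* NULL, or a reference the caller must fence_put() */
}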
In order to be completely generic, we have to double-check the read
seqlock after acquiring a reference to the fence. If the driver is
allocating fences from a SLAB_DESTROY_BY_RCU cache, or a similar
freelist, then within an RCU grace period a fence may be freed and
reallocated. The RCU read-side critical section does not prevent this
reallocation; instead, we have to inspect the reservation's seqlock to
double-check whether the fences were reassigned while we were acquiring
our reference.
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
drivers/dma-buf/reservation.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 10fd441dd4ed..3369e4668e96 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -388,9 +388,6 @@ retry:
if (fobj)
shared_count = fobj->shared_count;
- if (read_seqcount_retry(&obj->seq, seq))
- goto unlock_retry;
-
for (i = 0; i < shared_count; ++i) {
struct fence *lfence = rcu_dereference(fobj->shared[i]);
@@ -413,9 +410,6 @@ retry:
if (!shared_count) {
struct fence *fence_excl = rcu_dereference(obj->fence_excl);
- if (read_seqcount_retry(&obj->seq, seq))
- goto unlock_retry;
-
if (fence_excl &&
!test_bit(FENCE_FLAG_SIGNALED_BIT, &fence_excl->flags)) {
if (!fence_get_rcu(fence_excl))
@@ -430,6 +424,11 @@ retry:
rcu_read_unlock();
if (fence) {
+ if (read_seqcount_retry(&obj->seq, seq)) {
+ fence_put(fence);
+ goto retry;
+ }
+
ret = fence_wait_timeout(fence, intr, ret);
fence_put(fence);
if (ret > 0 && wait_all && (i + 1 < shared_count))
--
2.9.3
Hi,
This is v3 of the attempt to remove the misuse of the DMA cache APIs from Ion.
As before:
The APIs created are kernel_force_cache_clean and kernel_force_cache_invalidate.
They force a clean or an invalidate of the cache, respectively. The aim was to
take the semantics of dma_sync and turn them into something that isn't
dma_sync. This series includes a nominal implementation for arm/arm64, mostly
for demonstration purposes.
The major change from v2 is that the implementations no longer leverage the
DMA abstractions. Russell King noted that dma_map and dma_unmap just 'happen'
to do the right thing, but they aren't guaranteed to.
I'm hoping that by v3 there are no objections to the general concept, but if
there are, please express them.
Thanks,
Laura
[1]http://www.mail-archive.com/driverdev-devel@linuxdriverproject.org/msg494…
Laura Abbott (5):
Documentation: Introduce kernel_force_cache_* APIs
arm: Implement ARCH_HAS_FORCE_CACHE
arm64: Implement ARCH_HAS_FORCE_CACHE
staging: android: ion: Convert to the kernel_force_cache APIs
staging: ion: Add support for syncing with DMA_BUF_IOCTL_SYNC
Documentation/cachetlb.txt | 18 ++++++-
arch/arm/include/asm/cacheflush.h | 11 ++++
arch/arm/include/asm/glue-cache.h | 2 +
arch/arm/mm/Makefile | 2 +-
arch/arm/mm/cache-fa.S | 8 +++
arch/arm/mm/cache-nop.S | 6 +++
arch/arm/mm/cache-v4.S | 10 ++++
arch/arm/mm/cache-v4wb.S | 8 +++
arch/arm/mm/cache-v4wt.S | 8 +++
arch/arm/mm/cache-v6.S | 8 +++
arch/arm/mm/cache-v7.S | 13 +++++
arch/arm/mm/cacheflush.c | 71 +++++++++++++++++++++++++
arch/arm/mm/proc-arm920.S | 8 +++
arch/arm/mm/proc-arm922.S | 8 +++
arch/arm/mm/proc-arm925.S | 8 +++
arch/arm/mm/proc-arm926.S | 8 +++
arch/arm/mm/proc-feroceon.S | 11 ++++
arch/arm/mm/proc-macros.S | 2 +
arch/arm/mm/proc-xsc3.S | 9 ++++
arch/arm/mm/proc-xscale.S | 9 ++++
arch/arm64/include/asm/cacheflush.h | 8 +++
arch/arm64/mm/cache.S | 24 +++++++--
arch/arm64/mm/flush.c | 11 ++++
drivers/staging/android/ion/ion.c | 53 +++++++++++-------
drivers/staging/android/ion/ion_carveout_heap.c | 8 +--
drivers/staging/android/ion/ion_chunk_heap.c | 12 +++--
drivers/staging/android/ion/ion_page_pool.c | 7 +--
drivers/staging/android/ion/ion_priv.h | 11 ----
drivers/staging/android/ion/ion_system_heap.c | 6 +--
include/linux/cacheflush.h | 11 ++++
30 files changed, 330 insertions(+), 49 deletions(-)
create mode 100644 arch/arm/mm/cacheflush.c
create mode 100644 include/linux/cacheflush.h
--
2.7.4
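To make the semantics concrete, here is a hedged sketch of how a non-coherent
buffer handoff might use the new calls. The (page, size) signatures are an
assumption based on the series' naming, not a confirmed interface:

#include <linux/cacheflush.h>	/* kernel_force_cache_* (this series) */
#include <linux/mm_types.h>

/*
 * Sketch only; signatures are assumed, not taken from the posted
 * patches. A clean pushes dirty CPU cache lines out to memory before
 * the device reads; an invalidate discards stale lines after the
 * device writes, so the CPU re-reads from memory.
 */
static void buffer_to_device(struct page *page, size_t size)
{
	kernel_force_cache_clean(page, size);	/* CPU writes -> memory */
}

static void buffer_from_device(struct page *page, size_t size)
{
	kernel_force_cache_invalidate(page, size);	/* drop stale lines */
}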
Hi,
This is v3 of the previous series. The scope continues to shrink: the ABI
ioctl was dropped after discussion showed it creates more problems than it
actually solves. This is mostly a rebase onto staging-next, with some
refactoring now that the ABI ioctl is gone. There was some discussion about
ion_dummy cleanup, but I've decided to make that a separate patch.
Laura Abbott (2):
staging: android: ion: Pull out ion ioctls to a separate file
staging: android: ion: Add ioctl to query available heaps
drivers/staging/android/ion/Makefile | 3 +-
drivers/staging/android/ion/ion-ioctl.c | 177 +++++++++++++++++++++++++
drivers/staging/android/ion/ion.c | 227 ++++++--------------------------
drivers/staging/android/ion/ion_priv.h | 94 +++++++++++++
drivers/staging/android/uapi/ion.h | 39 ++++++
5 files changed, 349 insertions(+), 191 deletions(-)
create mode 100644 drivers/staging/android/ion/ion-ioctl.c
--
2.7.4
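For context on what the query ioctl provides, a hedged userspace sketch
follows. The two-call pattern and the ion_heap_query/ion_heap_data/
ION_IOC_HEAP_QUERY names are assumptions based on this series and may differ
in the final version:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include "ion.h"	/* UAPI header from this series (assumed names) */

/*
 * Sketch: enumerate Ion heaps. First call with heaps == 0 returns the
 * count; the second call fills a caller-allocated array.
 */
static int print_heaps(int ion_fd)
{
	struct ion_heap_query query;
	struct ion_heap_data *data;
	uint32_t i;

	memset(&query, 0, sizeof(query));
	if (ioctl(ion_fd, ION_IOC_HEAP_QUERY, &query) < 0)
		return -1;

	data = calloc(query.cnt, sizeof(*data));
	if (!data)
		return -1;

	query.heaps = (uint64_t)(uintptr_t)data;
	if (ioctl(ion_fd, ION_IOC_HEAP_QUERY, &query) < 0) {
		free(data);
		return -1;
	}

	for (i = 0; i < query.cnt; i++)
		printf("heap %u: %s (type %u)\n",
		       data[i].heap_id, data[i].name, data[i].type);

	free(data);
	return 0;
}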
Hi,
This is a follow-up to my previous series[1] for Ion ioctls. I've changed the
focus slightly based on the feedback. The ID remapping was less useful than I
originally thought, and without that addition there isn't much benefit to
having a new alloc ioctl. The ABI check and query interface still seem
beneficial.
There was some discussion on where exactly these types of ioctls would be
called; I expect the answer will depend on exactly how it's integrated.
Long term, I'd still like to fix the ABI so it isn't a checklist of
botched-up ioctls, but that work will come later.
Changes from v1:
- Rebased
- Dropped RFC
- Dropped ID remapping and dependent logic
- Changed query logic to only need one ioctl
- Fixed alignment of query ioctl structure
[1] http://www.mail-archive.com/driverdev-devel@linuxdriverproject.org/msg48036…
Laura Abbott (4):
staging: android: ion: Drop heap type masks
staging: android: ion: Pull out ion ioctls to a separate file
staging: android: ion: Add an ioctl for ABI checking
staging: android: ion: Add ioctl to query available heaps
drivers/staging/android/ion/Makefile | 3 +-
drivers/staging/android/ion/ion-ioctl.c | 188 ++++++++++++++++++++++++++
drivers/staging/android/ion/ion.c | 226 ++++++--------------------------
drivers/staging/android/ion/ion_priv.h | 94 +++++++++++++
drivers/staging/android/uapi/ion.h | 67 +++++++++-
5 files changed, 382 insertions(+), 196 deletions(-)
create mode 100644 drivers/staging/android/ion/ion-ioctl.c
--
2.7.4