We get 1 warning when building the kernel with W=1:
drivers/dma-buf/sw_sync.c:87:23: warning: no previous prototype for 'sync_timeline_create' [-Wmissing-prototypes]
In fact, this function is only used within the file in which it is
declared, so it does not need an external declaration and can be made
static. This patch marks it 'static'.
Signed-off-by: Baoyou Xie <baoyou.xie(a)linaro.org>
---
drivers/dma-buf/sw_sync.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
index 62e8e6d..6f16c85 100644
--- a/drivers/dma-buf/sw_sync.c
+++ b/drivers/dma-buf/sw_sync.c
@@ -84,7 +84,7 @@ static inline struct sync_pt *fence_to_sync_pt(struct fence *fence)
* Creates a new sync_timeline. Returns the sync_timeline object or NULL in
* case of error.
*/
-struct sync_timeline *sync_timeline_create(const char *name)
+static struct sync_timeline *sync_timeline_create(const char *name)
{
struct sync_timeline *obj;
--
2.7.4
Currently we install a callback for performing poll on a dma-buf
irrespective of the timeout. This involves taking a spinlock, as well as
doing unnecessary work, and greatly reduces the scaling of
poll(.timeout=0) across multiple threads.
We can query whether the poll will block prior to installing the
callback, making the busy-query fast.
Single thread: 60% faster
8 threads on 4 (+4 HT) cores: 600% faster
Still not quite the perfect scaling we get with a native busy ioctl, but
poll(dmabuf) is faster due to the quicker lookup of the object and
avoiding drm_ioctl().
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org
Reviewed-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
---
drivers/dma-buf/dma-buf.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index cf04d249a6a4..c7a7bc579941 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -156,6 +156,18 @@ static unsigned int dma_buf_poll(struct file *file, poll_table *poll)
if (!events)
return 0;
+ if (poll_does_not_wait(poll)) {
+ if (events & POLLOUT &&
+ !reservation_object_test_signaled_rcu(resv, true))
+ events &= ~(POLLOUT | POLLIN);
+
+ if (events & POLLIN &&
+ !reservation_object_test_signaled_rcu(resv, false))
+ events &= ~POLLIN;
+
+ return events;
+ }
+
retry:
seq = read_seqcount_begin(&resv->seq);
rcu_read_lock();
--
2.9.3
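As a concrete illustration of the fast path above, userspace can busy-query a
dma-buf with a zero timeout. This is a minimal sketch (error handling elided;
the dma-buf fd is assumed to come from an exporter such as a DRM driver):

#include <poll.h>
#include <stdbool.h>

/*
 * Sketch: non-blocking busy-query on a dma-buf file descriptor.
 * A timeout of 0 means poll() never sleeps, so dma_buf_poll() takes
 * the poll_does_not_wait() path added above and skips installing the
 * fence callback entirely.
 */
static bool dmabuf_is_busy(int dmabuf_fd)
{
	struct pollfd pfd = {
		.fd = dmabuf_fd,
		.events = POLLOUT,	/* writable once all fences signal */
	};

	/* poll() returns 0 when no requested events are ready */
	return poll(&pfd, 1, 0) == 0;
}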
In order to be completely generic, we have to double-check the read
seqlock after acquiring a reference to the fence. If the driver is
allocating fences from a SLAB_DESTROY_BY_RCU cache, or a similar
freelist, then within an RCU grace period a fence may be freed and
reallocated. The RCU read-side critical section does not prevent this
reallocation; instead, we have to inspect the reservation's seqlock to
double-check whether the fences were reassigned while we were acquiring
our reference.
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
drivers/dma-buf/reservation.c | 30 ++++++++++--------------------
1 file changed, 10 insertions(+), 20 deletions(-)
diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 3369e4668e96..e74493e7332b 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -474,12 +474,13 @@ bool reservation_object_test_signaled_rcu(struct reservation_object *obj,
bool test_all)
{
unsigned seq, shared_count;
- int ret = true;
+ int ret;
+ rcu_read_lock();
retry:
+ ret = true;
shared_count = 0;
seq = read_seqcount_begin(&obj->seq);
- rcu_read_lock();
if (test_all) {
unsigned i;
@@ -490,46 +491,35 @@ retry:
if (fobj)
shared_count = fobj->shared_count;
- if (read_seqcount_retry(&obj->seq, seq))
- goto unlock_retry;
-
for (i = 0; i < shared_count; ++i) {
struct fence *fence = rcu_dereference(fobj->shared[i]);
ret = reservation_object_test_signaled_single(fence);
if (ret < 0)
- goto unlock_retry;
+ goto retry;
else if (!ret)
break;
}
- /*
- * There could be a read_seqcount_retry here, but nothing cares
- * about whether it's the old or newer fence pointers that are
- * signaled. That race could still have happened after checking
- * read_seqcount_retry. If you care, use ww_mutex_lock.
- */
+ if (read_seqcount_retry(&obj->seq, seq))
+ goto retry;
}
if (!shared_count) {
struct fence *fence_excl = rcu_dereference(obj->fence_excl);
- if (read_seqcount_retry(&obj->seq, seq))
- goto unlock_retry;
-
if (fence_excl) {
ret = reservation_object_test_signaled_single(
fence_excl);
if (ret < 0)
- goto unlock_retry;
+ goto retry;
+
+ if (read_seqcount_retry(&obj->seq, seq))
+ goto retry;
}
}
rcu_read_unlock();
return ret;
-
-unlock_retry:
- rcu_read_unlock();
- goto retry;
}
EXPORT_SYMBOL_GPL(reservation_object_test_signaled_rcu);
--
2.9.3
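The retry pattern this patch settles on generalizes beyond reservation
objects. As a condensed, illustrative sketch (not the exact code above): take
the seqcount, dereference and reference the fence under RCU, then re-check the
seqcount before trusting the reference:

/*
 * Illustrative sketch of the RCU + seqcount double-check used above.
 * If the fence allocator uses SLAB_DESTROY_BY_RCU, the object can be
 * freed and reallocated within our grace period; only the seqcount
 * re-check proves the pointer we referenced is still the right fence.
 */
static struct fence *get_excl_fence(struct reservation_object *obj)
{
	struct fence *fence;
	unsigned seq;

	rcu_read_lock();
retry:
	seq = read_seqcount_begin(&obj->seq);
	fence = rcu_dereference(obj->fence_excl);
	if (fence && !fence_get_rcu(fence))
		goto retry;	/* refcount already zero: being freed */

	if (fence && read_seqcount_retry(&obj->seq, seq)) {
		fence_put(fence);	/* wrong (reallocated) fence */
		goto retry;
	}
	rcu_read_unlock();

	return fence;	/* caller drops the reference with fence_put() */
}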
This variant of fence_get_rcu() takes an RCU-protected pointer to a
fence and carefully returns a reference to the fence, ensuring that it
is not reallocated as it does so. This is required when mixing fences
and SLAB_DESTROY_BY_RCU, although at the moment it serves a more
pedagogical function.
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
include/linux/fence.h | 56 ++++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 51 insertions(+), 5 deletions(-)
diff --git a/include/linux/fence.h b/include/linux/fence.h
index 0d763053f97a..c9c5ba98c302 100644
--- a/include/linux/fence.h
+++ b/include/linux/fence.h
@@ -183,6 +183,16 @@ void fence_release(struct kref *kref);
void fence_free(struct fence *fence);
/**
+ * fence_put - decreases refcount of the fence
+ * @fence: [in] fence to reduce refcount of
+ */
+static inline void fence_put(struct fence *fence)
+{
+ if (fence)
+ kref_put(&fence->refcount, fence_release);
+}
+
+/**
* fence_get - increases refcount of the fence
* @fence: [in] fence to increase refcount of
*
@@ -210,13 +220,49 @@ static inline struct fence *fence_get_rcu(struct fence *fence)
}
/**
- * fence_put - decreases refcount of the fence
- * @fence: [in] fence to reduce refcount of
+ * fence_get_rcu_safe - acquire a reference to an RCU tracked fence
+ * @fence: [in] pointer to fence to increase refcount of
+ *
+ * Function returns NULL if no refcount could be obtained, or the fence.
+ * This function handles acquiring a reference to a fence that may be
+ * reallocated within the RCU grace period (such as with SLAB_DESTROY_BY_RCU),
+ * so long as the caller is using RCU on the pointer to the fence.
+ *
+ * An alternative mechanism is to employ a seqlock to protect a bunch of
+ * fences, such as used by struct reservation_object. When using a seqlock,
+ * the seqlock must be taken before and checked after a reference to the
+ * fence is acquired (as shown here).
+ *
+ * The caller is required to hold the RCU read lock.
*/
-static inline void fence_put(struct fence *fence)
+static inline struct fence *fence_get_rcu_safe(struct fence * __rcu *fencep)
{
- if (fence)
- kref_put(&fence->refcount, fence_release);
+ do {
+ struct fence *fence;
+
+ fence = rcu_dereference(*fencep);
+ if (!fence || !fence_get_rcu(fence))
+ return NULL;
+
+ /* The atomic_inc_not_zero() inside fence_get_rcu()
+ * provides a full memory barrier upon success (such as now).
+ * This is paired with the write barrier from assigning
+ * to the __rcu protected fence pointer so that if that
+ * pointer still matches the current fence, we know we
+ * have successfully acquired a reference to it. If it no
+ * longer matches, we are holding a reference to some other
+ * reallocated pointer. This is possible if the allocator
+ * is using a freelist like SLAB_DESTROY_BY_RCU where the
+ * fence remains valid for the RCU grace period, but it
+ * may be reallocated. When using such allocators, we are
+ * responsible for ensuring the reference we get is to
+ * the right fence, as below.
+ */
+ if (fence == rcu_access_pointer(*fencep))
+ return rcu_pointer_handoff(fence);
+
+ fence_put(fence);
+ } while (1);
}
int fence_signal(struct fence *fence);
--
2.9.3
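A brief usage note: with the helper above, a reader no longer needs an
explicit seqcount around a lone __rcu fence pointer. A minimal sketch, using
reservation_object's fence_excl field as the example:

/*
 * Sketch: snapshot an RCU-managed fence pointer via fence_get_rcu_safe().
 * The helper loops internally until the reference it took still matches
 * the pointer, so the caller only needs to hold the RCU read lock.
 */
static struct fence *snapshot_excl_fence(struct reservation_object *obj)
{
	struct fence *fence;

	rcu_read_lock();
	fence = fence_get_rcu_safe(&obj->fence_excl);
	rcu_read_unlock();

	return fence;	/* NULL, or a reference the caller must fence_put() */
}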
In order to be completely generic, we have to double-check the read
seqlock after acquiring a reference to the fence. If the driver is
allocating fences from a SLAB_DESTROY_BY_RCU cache, or a similar
freelist, then within an RCU grace period a fence may be freed and
reallocated. The RCU read-side critical section does not prevent this
reallocation; instead, we have to inspect the reservation's seqlock to
double-check whether the fences were reassigned while we were acquiring
our reference.
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org
---
drivers/dma-buf/reservation.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 10fd441dd4ed..3369e4668e96 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -388,9 +388,6 @@ retry:
if (fobj)
shared_count = fobj->shared_count;
- if (read_seqcount_retry(&obj->seq, seq))
- goto unlock_retry;
-
for (i = 0; i < shared_count; ++i) {
struct fence *lfence = rcu_dereference(fobj->shared[i]);
@@ -413,9 +410,6 @@ retry:
if (!shared_count) {
struct fence *fence_excl = rcu_dereference(obj->fence_excl);
- if (read_seqcount_retry(&obj->seq, seq))
- goto unlock_retry;
-
if (fence_excl &&
!test_bit(FENCE_FLAG_SIGNALED_BIT, &fence_excl->flags)) {
if (!fence_get_rcu(fence_excl))
@@ -430,6 +424,11 @@ retry:
rcu_read_unlock();
if (fence) {
+ if (read_seqcount_retry(&obj->seq, seq)) {
+ fence_put(fence);
+ goto retry;
+ }
+
ret = fence_wait_timeout(fence, intr, ret);
fence_put(fence);
if (ret > 0 && wait_all && (i + 1 < shared_count))
--
2.9.3
Hi,
This is v3 of the attempt to remove the misuse of the DMA cache APIs from Ion.
As before:
The APIs created are kernel_force_cache_clean and kernel_force_cache_invalidate.
They force a clean or an invalidate of the cache, respectively. The aim was to
take the semantics of dma_sync and turn them into something that isn't
dma_sync. This series includes a nominal implementation for arm/arm64, mostly
for demonstration purposes.
The major change from v2 is that the implementations no longer leverage the
DMA abstractions. Russell King noted that dma_map and dma_unmap just 'happen'
to do the right thing, but they aren't guaranteed to.
I'm hoping that by v3 there are no objections to the general concept, but if
there are, please express them.
Thanks,
Laura
[1]http://www.mail-archive.com/driverdev-devel@linuxdriverproject.org/msg494…
Laura Abbott (5):
Documentation: Introduce kernel_force_cache_* APIs
arm: Implement ARCH_HAS_FORCE_CACHE
arm64: Implement ARCH_HAS_FORCE_CACHE
staging: android: ion: Convert to the kernel_force_cache APIs
staging: ion: Add support for syncing with DMA_BUF_IOCTL_SYNC
Documentation/cachetlb.txt | 18 ++++++-
arch/arm/include/asm/cacheflush.h | 11 ++++
arch/arm/include/asm/glue-cache.h | 2 +
arch/arm/mm/Makefile | 2 +-
arch/arm/mm/cache-fa.S | 8 +++
arch/arm/mm/cache-nop.S | 6 +++
arch/arm/mm/cache-v4.S | 10 ++++
arch/arm/mm/cache-v4wb.S | 8 +++
arch/arm/mm/cache-v4wt.S | 8 +++
arch/arm/mm/cache-v6.S | 8 +++
arch/arm/mm/cache-v7.S | 13 +++++
arch/arm/mm/cacheflush.c | 71 +++++++++++++++++++++++++
arch/arm/mm/proc-arm920.S | 8 +++
arch/arm/mm/proc-arm922.S | 8 +++
arch/arm/mm/proc-arm925.S | 8 +++
arch/arm/mm/proc-arm926.S | 8 +++
arch/arm/mm/proc-feroceon.S | 11 ++++
arch/arm/mm/proc-macros.S | 2 +
arch/arm/mm/proc-xsc3.S | 9 ++++
arch/arm/mm/proc-xscale.S | 9 ++++
arch/arm64/include/asm/cacheflush.h | 8 +++
arch/arm64/mm/cache.S | 24 +++++++--
arch/arm64/mm/flush.c | 11 ++++
drivers/staging/android/ion/ion.c | 53 +++++++++++-------
drivers/staging/android/ion/ion_carveout_heap.c | 8 +--
drivers/staging/android/ion/ion_chunk_heap.c | 12 +++--
drivers/staging/android/ion/ion_page_pool.c | 7 +--
drivers/staging/android/ion/ion_priv.h | 11 ----
drivers/staging/android/ion/ion_system_heap.c | 6 +--
include/linux/cacheflush.h | 11 ++++
30 files changed, 330 insertions(+), 49 deletions(-)
create mode 100644 arch/arm/mm/cacheflush.c
create mode 100644 include/linux/cacheflush.h
--
2.7.4
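To make the semantics concrete, here is a hedged sketch of how a non-coherent
buffer handoff might use the new calls. The (page, size) signatures are an
assumption based on the series' naming, not a confirmed interface:

#include <linux/cacheflush.h>	/* kernel_force_cache_* (this series) */
#include <linux/mm_types.h>

/*
 * Sketch only; signatures are assumed, not taken from the posted
 * patches. A clean pushes dirty CPU cache lines out to memory before
 * the device reads; an invalidate discards stale lines after the
 * device writes, so the CPU re-reads from memory.
 */
static void buffer_to_device(struct page *page, size_t size)
{
	kernel_force_cache_clean(page, size);	/* CPU writes -> memory */
}

static void buffer_from_device(struct page *page, size_t size)
{
	kernel_force_cache_invalidate(page, size);	/* drop stale lines */
}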
Hi,
This is v3 of the previous series. The scope continues to shrink: the ABI
ioctl was dropped after discussion showed it creates more problems than it
actually solves. This is mostly a rebase onto staging-next, with some
refactoring now that the ABI ioctl is gone. There was some discussion about
ion_dummy cleanup, but I've decided to make that a separate patch.
Laura Abbott (2):
staging: android: ion: Pull out ion ioctls to a separate file
staging: android: ion: Add ioctl to query available heaps
drivers/staging/android/ion/Makefile | 3 +-
drivers/staging/android/ion/ion-ioctl.c | 177 +++++++++++++++++++++++++
drivers/staging/android/ion/ion.c | 227 ++++++--------------------------
drivers/staging/android/ion/ion_priv.h | 94 +++++++++++++
drivers/staging/android/uapi/ion.h | 39 ++++++
5 files changed, 349 insertions(+), 191 deletions(-)
create mode 100644 drivers/staging/android/ion/ion-ioctl.c
--
2.7.4
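For context on what the query ioctl provides, a hedged userspace sketch
follows. The two-call pattern and the ion_heap_query/ion_heap_data/
ION_IOC_HEAP_QUERY names are assumptions based on this series and may differ
in the final version:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include "ion.h"	/* UAPI header from this series (assumed names) */

/*
 * Sketch: enumerate Ion heaps. First call with heaps == 0 returns the
 * count; the second call fills a caller-allocated array.
 */
static int print_heaps(int ion_fd)
{
	struct ion_heap_query query;
	struct ion_heap_data *data;
	uint32_t i;

	memset(&query, 0, sizeof(query));
	if (ioctl(ion_fd, ION_IOC_HEAP_QUERY, &query) < 0)
		return -1;

	data = calloc(query.cnt, sizeof(*data));
	if (!data)
		return -1;

	query.heaps = (uint64_t)(uintptr_t)data;
	if (ioctl(ion_fd, ION_IOC_HEAP_QUERY, &query) < 0) {
		free(data);
		return -1;
	}

	for (i = 0; i < query.cnt; i++)
		printf("heap %u: %s (type %u)\n",
		       data[i].heap_id, data[i].name, data[i].type);

	free(data);
	return 0;
}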
Hi,
This is a follow-up to my previous series[1] for Ion ioctls. I've changed the
focus slightly based on the feedback. The ID remapping was less useful than I
originally thought, and without that addition there isn't much benefit to
having a new alloc ioctl. The ABI check and query interface still seem
beneficial.
There was some discussion on where exactly these types of ioctls would be
called; I expect the answer will depend on exactly how it's integrated.
Long term, I'd still like to fix the ABI so it isn't a checklist of
botched-up ioctls, but that work will come later.
Changes from v1:
- Rebased
- Dropped RFC
- Dropped ID remapping and dependent logic
- Changed query logic to only need one ioctl
- Fixed alignment of query ioctl structure
[1] http://www.mail-archive.com/driverdev-devel@linuxdriverproject.org/msg48036…
Laura Abbott (4):
staging: android: ion: Drop heap type masks
staging: android: ion: Pull out ion ioctls to a separate file
staging: android: ion: Add an ioctl for ABI checking
staging: android: ion: Add ioctl to query available heaps
drivers/staging/android/ion/Makefile | 3 +-
drivers/staging/android/ion/ion-ioctl.c | 188 ++++++++++++++++++++++++++
drivers/staging/android/ion/ion.c | 226 ++++++--------------------------
drivers/staging/android/ion/ion_priv.h | 94 +++++++++++++
drivers/staging/android/uapi/ion.h | 67 +++++++++-
5 files changed, 382 insertions(+), 196 deletions(-)
create mode 100644 drivers/staging/android/ion/ion-ioctl.c
--
2.7.4