Hi Daniel,
Just a gentle reminder that you wanted to take a look at this.
Not much changed compared to the last version, only a minor bugfix in the dma_resv_get_singleton error handling.
Regards, Christian.
This function allows replacing fences in the shared fence list when we can guarantee that the operation represented by the original fence has either finished, or no longer accesses the resources protected by the dma_resv object, by the time the new fence finishes.
Then use this function in the amdkfd code when BOs are unmapped from the process.
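Not part of the patch, just a minimal sketch of the intended call pattern (hypothetical helper name), mirroring the amdkfd hunk below:

	/* Sketch: drop every fence emitted by @ctx by replacing it with the
	 * always-signaled stub fence.  The caller must hold the dma_resv lock.
	 */
	static void drop_context_fences(struct dma_resv *resv, u64 ctx)
	{
		struct dma_fence *stub = dma_fence_get_stub();

		dma_resv_replace_fences(resv, ctx, stub);
		dma_fence_put(stub);	/* the resv now holds its own references */
	}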
Signed-off-by: Christian König christian.koenig@amd.com
---
 drivers/dma-buf/dma-resv.c                    | 43 ++++++++++++++++
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 49 +++----------------
 include/linux/dma-resv.h                      |  2 +
 3 files changed, 52 insertions(+), 42 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 4deea75c0b9c..a688dbded3d3 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -284,6 +284,49 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
 }
 EXPORT_SYMBOL(dma_resv_add_shared_fence);
 
+/**
+ * dma_resv_replace_fences - replace fences in the dma_resv obj
+ * @obj: the reservation object
+ * @context: the context of the fences to replace
+ * @replacement: the new fence to use instead
+ *
+ * Replace fences with a specified context with a new fence. Only valid if the
+ * operation represented by the original fences is completed or has no longer
+ * access to the resources protected by the dma_resv object when the new fence
+ * completes.
+ */
+void dma_resv_replace_fences(struct dma_resv *obj, uint64_t context,
+			     struct dma_fence *replacement)
+{
+	struct dma_resv_list *list;
+	struct dma_fence *old;
+	unsigned int i;
+
+	dma_resv_assert_held(obj);
+
+	write_seqcount_begin(&obj->seq);
+
+	old = dma_resv_excl_fence(obj);
+	if (old->context == context) {
+		RCU_INIT_POINTER(obj->fence_excl, dma_fence_get(replacement));
+		dma_fence_put(old);
+	}
+
+	list = dma_resv_shared_list(obj);
+	for (i = 0; list && i < list->shared_count; ++i) {
+		old = rcu_dereference_protected(list->shared[i],
+						dma_resv_held(obj));
+		if (old->context != context)
+			continue;
+
+		rcu_assign_pointer(list->shared[i], dma_fence_get(replacement));
+		dma_fence_put(old);
+	}
+
+	write_seqcount_end(&obj->seq);
+}
+EXPORT_SYMBOL(dma_resv_replace_fences);
+
 /**
  * dma_resv_add_excl_fence - Add an exclusive fence.
  * @obj: the reservation object
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 71acd577803e..b558ef0f8c4a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -236,53 +236,18 @@ void amdgpu_amdkfd_release_notify(struct amdgpu_bo *bo)
 static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
 					struct amdgpu_amdkfd_fence *ef)
 {
-	struct dma_resv *resv = bo->tbo.base.resv;
-	struct dma_resv_list *old, *new;
-	unsigned int i, j, k;
+	struct dma_fence *replacement;
 
 	if (!ef)
 		return -EINVAL;
 
-	old = dma_resv_shared_list(resv);
-	if (!old)
-		return 0;
-
-	new = kmalloc(struct_size(new, shared, old->shared_max), GFP_KERNEL);
-	if (!new)
-		return -ENOMEM;
-
-	/* Go through all the shared fences in the resevation object and sort
-	 * the interesting ones to the end of the list.
+	/* TODO: Instead of block before we should use the fence of the page
+	 * table update and TLB flush here directly.
 	 */
-	for (i = 0, j = old->shared_count, k = 0; i < old->shared_count; ++i) {
-		struct dma_fence *f;
-
-		f = rcu_dereference_protected(old->shared[i],
-					      dma_resv_held(resv));
-
-		if (f->context == ef->base.context)
-			RCU_INIT_POINTER(new->shared[--j], f);
-		else
-			RCU_INIT_POINTER(new->shared[k++], f);
-	}
-	new->shared_max = old->shared_max;
-	new->shared_count = k;
-
-	/* Install the new fence list, seqcount provides the barriers */
-	write_seqcount_begin(&resv->seq);
-	RCU_INIT_POINTER(resv->fence, new);
-	write_seqcount_end(&resv->seq);
-
-	/* Drop the references to the removed fences or move them to ef_list */
-	for (i = j; i < old->shared_count; ++i) {
-		struct dma_fence *f;
-
-		f = rcu_dereference_protected(new->shared[i],
-					      dma_resv_held(resv));
-		dma_fence_put(f);
-	}
-	kfree_rcu(old, rcu);
-
+	replacement = dma_fence_get_stub();
+	dma_resv_replace_fences(bo->tbo.base.resv, ef->base.context,
+				replacement);
+	dma_fence_put(replacement);
 	return 0;
 }
 
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index eebf04325b34..e0be34265eae 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -457,6 +457,8 @@ void dma_resv_init(struct dma_resv *obj);
 void dma_resv_fini(struct dma_resv *obj);
 int dma_resv_reserve_shared(struct dma_resv *obj, unsigned int num_fences);
 void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence);
+void dma_resv_replace_fences(struct dma_resv *obj, uint64_t context,
+			     struct dma_fence *fence);
 void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence);
 int dma_resv_get_fences(struct dma_resv *obj, struct dma_fence **pfence_excl,
 			unsigned *pshared_count, struct dma_fence ***pshared);
On Tue, Dec 07, 2021 at 01:33:48PM +0100, Christian König wrote:
This function allows replacing fences in the shared fence list when we can guarantee that the operation represented by the original fence has either finished, or no longer accesses the resources protected by the dma_resv object, by the time the new fence finishes.
Then use this function in the amdkfd code when BOs are unmapped from the process.
Signed-off-by: Christian König christian.koenig@amd.com
 drivers/dma-buf/dma-resv.c                    | 43 ++++++++++++++++
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 49 +++----------------
 include/linux/dma-resv.h                      |  2 +
 3 files changed, 52 insertions(+), 42 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 4deea75c0b9c..a688dbded3d3 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -284,6 +284,49 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
 }
 EXPORT_SYMBOL(dma_resv_add_shared_fence);
 
+/**
+ * dma_resv_replace_fences - replace fences in the dma_resv obj
+ * @obj: the reservation object
+ * @context: the context of the fences to replace
+ * @replacement: the new fence to use instead
+ *
+ * Replace fences with a specified context with a new fence. Only valid if the
+ * operation represented by the original fences is completed or has no longer
+ * access to the resources protected by the dma_resv object when the new fence
+ * completes.
+ */
+void dma_resv_replace_fences(struct dma_resv *obj, uint64_t context,
+			     struct dma_fence *replacement)
+{
+	struct dma_resv_list *list;
+	struct dma_fence *old;
+	unsigned int i;
+
+	dma_resv_assert_held(obj);
+
+	write_seqcount_begin(&obj->seq);
+
+	old = dma_resv_excl_fence(obj);
+	if (old->context == context) {
+		RCU_INIT_POINTER(obj->fence_excl, dma_fence_get(replacement));
+		dma_fence_put(old);
+	}
+
+	list = dma_resv_shared_list(obj);
+	for (i = 0; list && i < list->shared_count; ++i) {
+		old = rcu_dereference_protected(list->shared[i],
+						dma_resv_held(obj));
+		if (old->context != context)
+			continue;
+
+		rcu_assign_pointer(list->shared[i], dma_fence_get(replacement));
+		dma_fence_put(old);
Since the fences are all guaranteed to be from the same context, maybe we should have a WARN_ON(__dma_fence_is_later()); here just to be safe?
With that added:
Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch
On 22.12.21 at 22:05, Daniel Vetter wrote:
On Tue, Dec 07, 2021 at 01:33:48PM +0100, Christian König wrote:
This function allows replacing fences in the shared fence list when we can guarantee that the operation represented by the original fence has either finished, or no longer accesses the resources protected by the dma_resv object, by the time the new fence finishes.
Then use this function in the amdkfd code when BOs are unmapped from the process.
Signed-off-by: Christian König christian.koenig@amd.com
Since the fences are all guaranteed to be from the same context, maybe we should have a WARN_ON(__dma_fence_is_later()); here just to be safe?
First of all happy new year!
Then to answer your question, no :)
This here is the case where we replace a preemption fence with a VM page table update fence, so the two fences are not from the same context.
But since you ask, that means I somehow need to improve the documentation.
Regards, Christian.
On Mon, Jan 03, 2022 at 11:48:25AM +0100, Christian König wrote:
On 22.12.21 at 22:05, Daniel Vetter wrote:
On Tue, Dec 07, 2021 at 01:33:48PM +0100, Christian König wrote:
This function allows replacing fences in the shared fence list when we can guarantee that the operation represented by the original fence has either finished, or no longer accesses the resources protected by the dma_resv object, by the time the new fence finishes.
Then use this function in the amdkfd code when BOs are unmapped from the process.
Signed-off-by: Christian König christian.koenig@amd.com
Since the fences are all guaranteed to be from the same context, maybe we should have a WARN_ON(__dma_fence_is_later()); here just to be safe?
First of all happy new year!
Happy new year to you too!
Also I'm only still catching up.
Then to answer your question, no :)
This here is the case where we replace a preemption fence with a VM page table update fence, so the two fences are not from the same context.
But since you ask, that means I somehow need to improve the documentation.
Hm, yeah, then I'm confused, since right above you have the context check. And I thought if the contexts are equal, then the fences must be ordered, and since you're adding a new one it must be a later fence.
But now you're saying this is to replace a fence with one from a totally different context (which can totally make sense for the special fences compute-mode contexts all need), but then I honestly don't get why you even check for the context.
Maybe more docs help explain what's going on, or maybe we should have the is_later check only if the new fence is from the same context. amdkfd might not benefit, but this is a new generic interface and other drivers might horrendously screw this up :-) Plus then a big comment that if the new fence is from a different timeline context, the driver must guarantee that it signals only after anything we're replacing here.
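Roughly this in the replacement loop, perhaps (just a sketch of the suggestion, reusing the names from the patch; dma_fence_is_later() is only meaningful for fences on the same timeline):

		if (old->context != context)
			continue;

		/* Sketch: only enforce ordering when old and new share a timeline. */
		if (replacement->context == old->context)
			WARN_ON(dma_fence_is_later(old, replacement));

		rcu_assign_pointer(list->shared[i], dma_fence_get(replacement));
		dma_fence_put(old);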
I think it might also be good to just include the specific amdkfd use case with a short intro to wth are preempt-ctx and page table fences, to explain when this function is actually useful.
It's definitely a very special case function, and I'm worried driver authors might come up with creative abuses for it that cause trouble. -Daniel
Drivers should never touch this directly.
Signed-off-by: Christian König christian.koenig@amd.com
---
 drivers/dma-buf/dma-resv.c | 26 ++++++++++++++++++++++++++
 include/linux/dma-resv.h   | 26 +------------------------
 2 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index a688dbded3d3..a12a3a39f280 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -56,6 +56,19 @@ DEFINE_WD_CLASS(reservation_ww_class);
 EXPORT_SYMBOL(reservation_ww_class);
 
+/**
+ * struct dma_resv_list - a list of shared fences
+ * @rcu: for internal use
+ * @shared_count: table of shared fences
+ * @shared_max: for growing shared fence table
+ * @shared: shared fence table
+ */
+struct dma_resv_list {
+	struct rcu_head rcu;
+	u32 shared_count, shared_max;
+	struct dma_fence __rcu *shared[];
+};
+
 /**
  * dma_resv_list_alloc - allocate fence list
  * @shared_max: number of fences we need space for
@@ -133,6 +146,19 @@ void dma_resv_fini(struct dma_resv *obj)
 }
 EXPORT_SYMBOL(dma_resv_fini);
 
+/**
+ * dma_resv_shared_list - get the reservation object's shared fence list
+ * @obj: the reservation object
+ *
+ * Returns the shared fence list. Caller must either hold the objects
+ * through dma_resv_lock() or the RCU read side lock through rcu_read_lock(),
+ * or one of the variants of each
+ */
+static inline struct dma_resv_list *dma_resv_shared_list(struct dma_resv *obj)
+{
+	return rcu_dereference_check(obj->fence, dma_resv_held(obj));
+}
+
 /**
  * dma_resv_reserve_shared - Reserve space to add shared fences to
  * a dma_resv.
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index e0be34265eae..3baf2a4a9a0d 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -47,18 +47,7 @@
 
 extern struct ww_class reservation_ww_class;
 
-/**
- * struct dma_resv_list - a list of shared fences
- * @rcu: for internal use
- * @shared_count: table of shared fences
- * @shared_max: for growing shared fence table
- * @shared: shared fence table
- */
-struct dma_resv_list {
-	struct rcu_head rcu;
-	u32 shared_count, shared_max;
-	struct dma_fence __rcu *shared[];
-};
+struct dma_resv_list;
 
 /**
  * struct dma_resv - a reservation object manages fences for a buffer
@@ -440,19 +429,6 @@ dma_resv_excl_fence(struct dma_resv *obj)
 	return rcu_dereference_check(obj->fence_excl, dma_resv_held(obj));
 }
 
-/**
- * dma_resv_shared_list - get the reservation object's shared fence list
- * @obj: the reservation object
- *
- * Returns the shared fence list. Caller must either hold the objects
- * through dma_resv_lock() or the RCU read side lock through rcu_read_lock(),
- * or one of the variants of each
- */
-static inline struct dma_resv_list *dma_resv_shared_list(struct dma_resv *obj)
-{
-	return rcu_dereference_check(obj->fence, dma_resv_held(obj));
-}
-
 void dma_resv_init(struct dma_resv *obj);
 void dma_resv_fini(struct dma_resv *obj);
 int dma_resv_reserve_shared(struct dma_resv *obj, unsigned int num_fences);
On Tue, Dec 07, 2021 at 01:33:49PM +0100, Christian König wrote:
Drivers should never touch this directly.
Signed-off-by: Christian König christian.koenig@amd.com
 drivers/dma-buf/dma-resv.c | 26 ++++++++++++++++++++++++++
 include/linux/dma-resv.h   | 26 +------------------------
 2 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index a688dbded3d3..a12a3a39f280 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -56,6 +56,19 @@ DEFINE_WD_CLASS(reservation_ww_class);
 EXPORT_SYMBOL(reservation_ww_class);
 
+/**
+ * struct dma_resv_list - a list of shared fences
+ * @rcu: for internal use
+ * @shared_count: table of shared fences
+ * @shared_max: for growing shared fence table
+ * @shared: shared fence table
+ */
Imo drop the kerneldoc here and just put these comments before the right members if you feel like keeping them (a sketch of that follows at the end of this mail). Imo it's obvious enough what's going on that the comments aren't necessary, and we generally don't kerneldoc-document internals at all, only interfaces relevant to drivers and things outside of a subsystem.
+struct dma_resv_list {
+	struct rcu_head rcu;
+	u32 shared_count, shared_max;
+	struct dma_fence __rcu *shared[];
+};
+
 /**
  * dma_resv_list_alloc - allocate fence list
  * @shared_max: number of fences we need space for
@@ -133,6 +146,19 @@ void dma_resv_fini(struct dma_resv *obj)
 }
 EXPORT_SYMBOL(dma_resv_fini);
 
+/**
+ * dma_resv_shared_list - get the reservation object's shared fence list
+ * @obj: the reservation object
+ *
+ * Returns the shared fence list. Caller must either hold the objects
+ * through dma_resv_lock() or the RCU read side lock through rcu_read_lock(),
+ * or one of the variants of each
+ */
Same here. With that:
Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch
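For illustration, the member-comment style suggested above could look roughly like this (a sketch only, not a further patch):

	/* Internal container for the shared fence table, no longer driver-visible. */
	struct dma_resv_list {
		struct rcu_head rcu;		/* delayed free via kfree_rcu() */
		u32 shared_count;		/* fences currently in the table */
		u32 shared_max;			/* allocated size of the table */
		struct dma_fence __rcu *shared[];
	};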
Returning the exclusive fence separately is no longer used.
Instead add a write parameter to indicate the use case.
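For illustration, a hedged before/after of a typical caller (variable names assumed; the driver hunks below are the authoritative examples):

	/* Before: the exclusive fence was returned separately; passing NULL
	 * folded it into the shared array instead.
	 */
	r = dma_resv_get_fences(resv, NULL, &count, &fences);

	/* After: one array for everything; the write flag selects all fences
	 * (true) or only the exclusive fence (false).
	 */
	r = dma_resv_get_fences(resv, true, &count, &fences);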
Signed-off-by: Christian König christian.koenig@amd.com
---
 drivers/dma-buf/dma-resv.c                   | 48 ++++++++------------
 drivers/dma-buf/st-dma-resv.c                | 26 ++---------
 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c  |  6 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c      |  2 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c |  3 +-
 include/linux/dma-resv.h                     |  4 +-
 6 files changed, 31 insertions(+), 58 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index a12a3a39f280..480c305554a1 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -611,57 +611,45 @@ EXPORT_SYMBOL(dma_resv_copy_fences);
  * dma_resv_get_fences - Get an object's shared and exclusive
  * fences without update side lock held
  * @obj: the reservation object
- * @fence_excl: the returned exclusive fence (or NULL)
- * @shared_count: the number of shared fences returned
- * @shared: the array of shared fence ptrs returned (array is krealloc'd to
- * the required size, and must be freed by caller)
- *
- * Retrieve all fences from the reservation object. If the pointer for the
- * exclusive fence is not specified the fence is put into the array of the
- * shared fences as well. Returns either zero or -ENOMEM.
+ * @write: true if we should return all fences
+ * @num_fences: the number of fences returned
+ * @fences: the array of fence ptrs returned (array is krealloc'd to the
+ * required size, and must be freed by caller)
+ *
+ * Retrieve all fences from the reservation object.
+ * Returns either zero or -ENOMEM.
  */
-int dma_resv_get_fences(struct dma_resv *obj, struct dma_fence **fence_excl,
-			unsigned int *shared_count, struct dma_fence ***shared)
+int dma_resv_get_fences(struct dma_resv *obj, bool write,
+			unsigned int *num_fences, struct dma_fence ***fences)
 {
 	struct dma_resv_iter cursor;
 	struct dma_fence *fence;
 
-	*shared_count = 0;
-	*shared = NULL;
-
-	if (fence_excl)
-		*fence_excl = NULL;
+	*num_fences = 0;
+	*fences = NULL;
 
-	dma_resv_iter_begin(&cursor, obj, true);
+	dma_resv_iter_begin(&cursor, obj, write);
 	dma_resv_for_each_fence_unlocked(&cursor, fence) {
 
 		if (dma_resv_iter_is_restarted(&cursor)) {
 			unsigned int count;
 
-			while (*shared_count)
-				dma_fence_put((*shared)[--(*shared_count)]);
+			while (*num_fences)
+				dma_fence_put((*fences)[--(*num_fences)]);
 
-			if (fence_excl)
-				dma_fence_put(*fence_excl);
-
-			count = cursor.shared_count;
-			count += fence_excl ? 0 : 1;
+			count = cursor.shared_count + 1;
 
 			/* Eventually re-allocate the array */
-			*shared = krealloc_array(*shared, count,
+			*fences = krealloc_array(*fences, count,
						 sizeof(void *), GFP_KERNEL);
-			if (count && !*shared) {
+			if (count && !*fences) {
 				dma_resv_iter_end(&cursor);
 				return -ENOMEM;
 			}
 		}
 
-		dma_fence_get(fence);
-		if (dma_resv_iter_is_exclusive(&cursor) && fence_excl)
-			*fence_excl = fence;
-		else
-			(*shared)[(*shared_count)++] = fence;
+		(*fences)[(*num_fences)++] = dma_fence_get(fence);
 	}
 	dma_resv_iter_end(&cursor);
 
diff --git a/drivers/dma-buf/st-dma-resv.c b/drivers/dma-buf/st-dma-resv.c
index bc32b3eedcb6..cbe999c6e7a6 100644
--- a/drivers/dma-buf/st-dma-resv.c
+++ b/drivers/dma-buf/st-dma-resv.c
@@ -275,7 +275,7 @@ static int test_shared_for_each_unlocked(void *arg)
 
 static int test_get_fences(void *arg, bool shared)
 {
-	struct dma_fence *f, *excl = NULL, **fences = NULL;
+	struct dma_fence *f, **fences = NULL;
 	struct dma_resv resv;
 	int r, i;
 
@@ -304,35 +304,19 @@ static int test_get_fences(void *arg, bool shared)
 	}
 	dma_resv_unlock(&resv);
 
-	r = dma_resv_get_fences(&resv, &excl, &i, &fences);
+	r = dma_resv_get_fences(&resv, shared, &i, &fences);
 	if (r) {
 		pr_err("get_fences failed\n");
 		goto err_free;
 	}
 
-	if (shared) {
-		if (excl != NULL) {
-			pr_err("get_fences returned unexpected excl fence\n");
-			goto err_free;
-		}
-		if (i != 1 || fences[0] != f) {
-			pr_err("get_fences returned unexpected shared fence\n");
-			goto err_free;
-		}
-	} else {
-		if (excl != f) {
-			pr_err("get_fences returned unexpected excl fence\n");
-			goto err_free;
-		}
-		if (i != 0) {
-			pr_err("get_fences returned unexpected shared fence\n");
-			goto err_free;
-		}
+	if (i != 1 || fences[0] != f) {
+		pr_err("get_fences returned unexpected fence\n");
+		goto err_free;
 	}
 
 	dma_fence_signal(f);
 err_free:
-	dma_fence_put(excl);
 	while (i--)
 		dma_fence_put(fences[i]);
 	kfree(fences);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
index 68108f151dad..d17e1c911689 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -200,8 +200,10 @@ int amdgpu_display_crtc_page_flip_target(struct drm_crtc *crtc,
 		goto unpin;
 	}
 
-	r = dma_resv_get_fences(new_abo->tbo.base.resv, NULL,
-				&work->shared_count, &work->shared);
+	/* TODO: Unify this with other drivers */
+	r = dma_resv_get_fences(new_abo->tbo.base.resv, true,
+				&work->shared_count,
+				&work->shared);
 	if (unlikely(r != 0)) {
 		DRM_ERROR("failed to get fences for buffer\n");
 		goto unpin;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
index b7fb72bff2c1..be48487e2ca7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
@@ -112,7 +112,7 @@ void amdgpu_pasid_free_delayed(struct dma_resv *resv,
 	unsigned count;
 	int r;
 
-	r = dma_resv_get_fences(resv, NULL, &count, &fences);
+	r = dma_resv_get_fences(resv, true, &count, &fences);
 	if (r)
 		goto fallback;
 
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
index b5e8ce86dbe7..64c90ff348f2 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
@@ -189,8 +189,7 @@ static int submit_fence_sync(struct etnaviv_gem_submit *submit)
 			continue;
 
 		if (bo->flags & ETNA_SUBMIT_BO_WRITE) {
-			ret = dma_resv_get_fences(robj, NULL,
-						  &bo->nr_shared,
+			ret = dma_resv_get_fences(robj, true, &bo->nr_shared,
						  &bo->shared);
 			if (ret)
 				return ret;
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index 3baf2a4a9a0d..fa2002939b19 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -436,8 +436,8 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence);
 void dma_resv_replace_fences(struct dma_resv *obj, uint64_t context,
			     struct dma_fence *fence);
 void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence);
-int dma_resv_get_fences(struct dma_resv *obj, struct dma_fence **pfence_excl,
-			unsigned *pshared_count, struct dma_fence ***pshared);
+int dma_resv_get_fences(struct dma_resv *obj, bool write,
+			unsigned int *num_fences, struct dma_fence ***fences);
 int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src);
 long dma_resv_wait_timeout(struct dma_resv *obj, bool wait_all, bool intr,
			   unsigned long timeout);
On Tue, Dec 07, 2021 at 01:33:50PM +0100, Christian König wrote:
Returning the exclusive fence separately is no longer used.
Instead add a write parameter to indicate the use case.
Signed-off-by: Christian König christian.koenig@amd.com
 drivers/dma-buf/dma-resv.c                   | 48 ++++++++------------
 drivers/dma-buf/st-dma-resv.c                | 26 ++---------
 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c  |  6 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c      |  2 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c |  3 +-
 include/linux/dma-resv.h                     |  4 +-
 6 files changed, 31 insertions(+), 58 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index a12a3a39f280..480c305554a1 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -611,57 +611,45 @@ EXPORT_SYMBOL(dma_resv_copy_fences);
  * dma_resv_get_fences - Get an object's shared and exclusive
  * fences without update side lock held
  * @obj: the reservation object
- * @fence_excl: the returned exclusive fence (or NULL)
- * @shared_count: the number of shared fences returned
- * @shared: the array of shared fence ptrs returned (array is krealloc'd to
- * the required size, and must be freed by caller)
- *
- * Retrieve all fences from the reservation object. If the pointer for the
- * exclusive fence is not specified the fence is put into the array of the
- * shared fences as well. Returns either zero or -ENOMEM.
+ * @write: true if we should return all fences
I'm assuming that this will be properly documented later on in the series ...
Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch
Add a function to simplify getting a single fence for all the fences in the dma_resv object.
v2: fix ref leak in error handling
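Not part of the patch, just a hedged sketch of a typical caller (hypothetical helper; note the warning in the kerneldoc below about adding the result straight back into the same resv object):

	/* Sketch: collapse all fences on a reservation object into one and wait. */
	static int wait_resv_idle(struct dma_resv *resv, bool intr)
	{
		struct dma_fence *fence;
		long ret;
		int r;

		r = dma_resv_get_singleton(resv, true, &fence);
		if (r)
			return r;
		if (!fence)
			return 0;	/* nothing to wait for */

		ret = dma_fence_wait(fence, intr);
		dma_fence_put(fence);
		return ret;
	}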
Signed-off-by: Christian König christian.koenig@amd.com
---
 drivers/dma-buf/dma-resv.c | 52 ++++++++++++++++++++++++++++++++++++++
 include/linux/dma-resv.h   |  2 ++
 2 files changed, 54 insertions(+)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 480c305554a1..694716a3d66d 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -34,6 +34,7 @@
  */
 
 #include <linux/dma-resv.h>
+#include <linux/dma-fence-array.h>
 #include <linux/export.h>
 #include <linux/mm.h>
 #include <linux/sched/mm.h>
@@ -657,6 +658,57 @@ int dma_resv_get_fences(struct dma_resv *obj, bool write,
 }
 EXPORT_SYMBOL_GPL(dma_resv_get_fences);
 
+/**
+ * dma_resv_get_singleton - Get a single fence for all the fences
+ * @obj: the reservation object
+ * @write: true if we should return all fences
+ * @fence: the resulting fence
+ *
+ * Get a single fence representing all the fences inside the resv object.
+ * Returns either 0 for success or -ENOMEM.
+ *
+ * Warning: This can't be used like this when adding the fence back to the resv
+ * object since that can lead to stack corruption when finalizing the
+ * dma_fence_array.
+ */
+int dma_resv_get_singleton(struct dma_resv *obj, bool write,
+			   struct dma_fence **fence)
+{
+	struct dma_fence_array *array;
+	struct dma_fence **fences;
+	unsigned count;
+	int r;
+
+	r = dma_resv_get_fences(obj, write, &count, &fences);
+	if (r)
+		return r;
+
+	if (count == 0) {
+		*fence = NULL;
+		return 0;
+	}
+
+	if (count == 1) {
+		*fence = fences[0];
+		kfree(fences);
+		return 0;
+	}
+
+	array = dma_fence_array_create(count, fences,
+				       dma_fence_context_alloc(1),
+				       1, false);
+	if (!array) {
+		while (count--)
+			dma_fence_put(fences[count]);
+		kfree(fences);
+		return -ENOMEM;
+	}
+
+	*fence = &array->base;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(dma_resv_get_singleton);
+
 /**
  * dma_resv_wait_timeout - Wait on reservation's objects
  * shared and/or exclusive fences.
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index fa2002939b19..cdfbbda6f600 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -438,6 +438,8 @@ void dma_resv_replace_fences(struct dma_resv *obj, uint64_t context,
			     struct dma_fence *fence);
 void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence);
 int dma_resv_get_fences(struct dma_resv *obj, bool write,
			unsigned int *num_fences, struct dma_fence ***fences);
+int dma_resv_get_singleton(struct dma_resv *obj, bool write,
+			   struct dma_fence **fence);
 int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src);
 long dma_resv_wait_timeout(struct dma_resv *obj, bool wait_all, bool intr,
			   unsigned long timeout);
On Tue, Dec 07, 2021 at 01:33:51PM +0100, Christian König wrote:
Add a function to simplify getting a single fence for all the fences in the dma_resv object.
v2: fix ref leak in error handling
Signed-off-by: Christian König christian.koenig@amd.com
 drivers/dma-buf/dma-resv.c | 52 ++++++++++++++++++++++++++++++++++++++
 include/linux/dma-resv.h   |  2 ++
 2 files changed, 54 insertions(+)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 480c305554a1..694716a3d66d 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -34,6 +34,7 @@
  */
 
 #include <linux/dma-resv.h>
+#include <linux/dma-fence-array.h>
 #include <linux/export.h>
 #include <linux/mm.h>
 #include <linux/sched/mm.h>
@@ -657,6 +658,57 @@ int dma_resv_get_fences(struct dma_resv *obj, bool write,
 }
 EXPORT_SYMBOL_GPL(dma_resv_get_fences);
 
+/**
+ * dma_resv_get_singleton - Get a single fence for all the fences
+ * @obj: the reservation object
+ * @write: true if we should return all fences
+ * @fence: the resulting fence
+ *
+ * Get a single fence representing all the fences inside the resv object.
+ * Returns either 0 for success or -ENOMEM.
+ *
+ * Warning: This can't be used like this when adding the fence back to the resv
+ * object since that can lead to stack corruption when finalizing the
+ * dma_fence_array.
Uh I don't get this one? I thought the only problem with nested fences is the signalling recursion, which we work around with the irq_work?
Also if there's really an issue with dma_fence_array fences, then that warning should be on the dma_resv kerneldoc, not somewhere hidden like this. And finally I really don't see what can go wrong, sure we'll end up with the same fence once in the dma_resv_list and then once more in the fence array. But they're all refcounted, so really shouldn't matter.
The code itself looks correct, but me not understanding what even goes wrong here freaks me out a bit.
I guess something to figure out next year, I kinda hoped I could squeeze a review in before I disappear :-/ -Daniel
Am 22.12.21 um 22:21 schrieb Daniel Vetter:
On Tue, Dec 07, 2021 at 01:33:51PM +0100, Christian König wrote:
Add a function to simplify getting a single fence for all the fences in the dma_resv object.
v2: fix ref leak in error handling
Signed-off-by: Christian König christian.koenig@amd.com
+ * Warning: This can't be used like this when adding the fence back to the resv
+ * object since that can lead to stack corruption when finalizing the
+ * dma_fence_array.
Uh I don't get this one? I thought the only problem with nested fences is the signalling recursion, which we work around with the irq_work?
Nope, the main problem is finalizing the dma_fence_array.
E.g. imagine that you build up a chain of dma_fence_array objects like this: a<-b<-c<-d<-e<-f.....
With each one referencing the previous dma_fence_array. Then you call dma_fence_put() on the last one, which in turn causes a dma_fence_put() on the previous one, which in turn causes a dma_fence_put() on the one before that, and so on.
In other words you recurse, because each dma_fence_array instance drops the last reference to its predecessor.
What we could do is delegate dropping the references to the contained fences of a dma_fence_array as well, but that would require some changes to irq_work_run_list() to be halfway efficient.
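Roughly this pattern, as an illustration (a fragment only, error handling omitted):

	/* Build a chain of nested arrays a <- b <- c ...; each array holds the
	 * only reference to its predecessor, so the final dma_fence_put()
	 * releases the whole chain recursively, one stack frame per array.
	 */
	struct dma_fence *prev = dma_fence_get_stub();

	for (i = 0; i < depth; i++) {
		struct dma_fence **f = kmalloc(sizeof(*f), GFP_KERNEL);
		struct dma_fence_array *array;

		f[0] = prev;	/* the array takes over this reference */
		array = dma_fence_array_create(1, f,
					       dma_fence_context_alloc(1),
					       1, false);
		prev = &array->base;
	}
	dma_fence_put(prev);	/* unwinds one release per nested array */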
Also if there's really an issue with dma_fence_array fences, then that warning should be on the dma_resv kerneldoc, not somewhere hidden like this. And finally I really don't see what can go wrong, sure we'll end up with the same fence once in the dma_resv_list and then once more in the fence array. But they're all refcounted, so really shouldn't matter.
The code itself looks correct, but me not understanding what even goes wrong here freaks me out a bit.
Yeah, IIRC we already discussed that with Jason at length as well.
Essentially what you can't do is to put a dma_fence_array into another dma_fence_array without causing issues.
So I think we should maybe just add a WARN_ON() into dma_fence_array_init() to make sure that this never happens.
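E.g. something along these lines where the array is assembled (a sketch of the suggestion only):

	/* Refuse, or at least flag, containers nested inside a new container. */
	for (i = 0; i < num_fences; ++i)
		WARN_ON(dma_fence_is_array(fences[i]));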
Regards, Christian.
I guess something to figure out next year, I kinda hoped I could squeeze a review in before I disappear :-/ -Daniel
On Mon, Jan 03, 2022 at 12:13:41PM +0100, Christian König wrote:
On 22.12.21 at 22:21, Daniel Vetter wrote:
On Tue, Dec 07, 2021 at 01:33:51PM +0100, Christian König wrote:
Add a function to simplify getting a single fence for all the fences in the dma_resv object.
v2: fix ref leak in error handling
Signed-off-by: Christian König christian.koenig@amd.com
+ * Warning: This can't be used like this when adding the fence back to the resv
+ * object since that can lead to stack corruption when finalizing the
+ * dma_fence_array.
Uh I don't get this one? I thought the only problem with nested fences is the signalling recursion, which we work around with the irq_work?
Nope, the main problem is finalizing the dma_fence_array.
E.g. imagine that you build up a chain of dma_fence_array objects like this: a<-b<-c<-d<-e<-f.....
With each one referencing the previous dma_fence_array. Then you call dma_fence_put() on the last one, which in turn causes a dma_fence_put() on the previous one, which in turn causes a dma_fence_put() on the one before that, and so on.
In other words you recurse, because each dma_fence_array instance drops the last reference to its predecessor.
What we could do is delegate dropping the references to the contained fences of a dma_fence_array as well, but that would require some changes to irq_work_run_list() to be halfway efficient.
Also if there's really an issue with dma_fence_array fences, then that warning should be on the dma_resv kerneldoc, not somewhere hidden like this. And finally I really don't see what can go wrong, sure we'll end up with the same fence once in the dma_resv_list and then once more in the fence array. But they're all refcounted, so really shouldn't matter.
The code itself looks correct, but me not understanding what even goes wrong here freaks me out a bit.
Yeah, IIRC we already discussed that with Jason at length as well.
Essentially what you can't do is to put a dma_fence_array into another dma_fence_array without causing issues.
So I think we should maybe just add a WARN_ON() into dma_fence_array_init() to make sure that this never happens.
Yeah I think this would be much clearer instead of sprinkling half the story as a scary&confusing warning over all kinds of users which internally use dma fence arrays.
And then if it goes boom I guess we could fix it internally in dma_fence_array_init by flattening fences down again. But only if actually needed.
What confused me is why dma_resv is special, and from your reply it sounds like it really isn't. -Daniel
Regards, Christian.
I guess something to figure out next year, I kinda hoped I could squeeze a review in before I disappear :-/ -Daniel
- */
+int dma_resv_get_singleton(struct dma_resv *obj, bool write,
struct dma_fence **fence)
+{
- struct dma_fence_array *array;
- struct dma_fence **fences;
- unsigned count;
- int r;
- r = dma_resv_get_fences(obj, write, &count, &fences);
if (r)
return r;
- if (count == 0) {
*fence = NULL;
return 0;
- }
- if (count == 1) {
*fence = fences[0];
kfree(fences);
return 0;
- }
- array = dma_fence_array_create(count, fences,
dma_fence_context_alloc(1),
1, false);
- if (!array) {
while (count--)
dma_fence_put(fences[count]);
kfree(fences);
return -ENOMEM;
- }
- *fence = &array->base;
- return 0;
+} +EXPORT_SYMBOL_GPL(dma_resv_get_singleton);
- /**
- dma_resv_wait_timeout - Wait on reservation's objects
- shared and/or exclusive fences.
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index fa2002939b19..cdfbbda6f600 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -438,6 +438,8 @@ void dma_resv_replace_fences(struct dma_resv *obj, uint64_t context, void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence); int dma_resv_get_fences(struct dma_resv *obj, bool write, unsigned int *num_fences, struct dma_fence ***fences); +int dma_resv_get_singleton(struct dma_resv *obj, bool write, + struct dma_fence **fence);
int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src); long dma_resv_wait_timeout(struct dma_resv *obj, bool wait_all, bool intr, unsigned long timeout);
-- 2.25.1
Am 14.01.22 um 17:31 schrieb Daniel Vetter:
On Mon, Jan 03, 2022 at 12:13:41PM +0100, Christian König wrote:
Am 22.12.21 um 22:21 schrieb Daniel Vetter:
On Tue, Dec 07, 2021 at 01:33:51PM +0100, Christian König wrote:
Add a function to simplify getting a single fence for all the fences in the dma_resv object.
v2: fix ref leak in error handling
Signed-off-by: Christian König christian.koenig@amd.com
drivers/dma-buf/dma-resv.c | 52 ++++++++++++++++++++++++++++++++++++++ include/linux/dma-resv.h | 2 ++ 2 files changed, 54 insertions(+)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 480c305554a1..694716a3d66d 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -34,6 +34,7 @@ */ #include <linux/dma-resv.h> +#include <linux/dma-fence-array.h> #include <linux/export.h> #include <linux/mm.h> #include <linux/sched/mm.h> @@ -657,6 +658,57 @@ int dma_resv_get_fences(struct dma_resv *obj, bool write, } EXPORT_SYMBOL_GPL(dma_resv_get_fences); +/**
- dma_resv_get_singleton - Get a single fence for all the fences
- @obj: the reservation object
- @write: true if we should return all fences
- @fence: the resulting fence
- Get a single fence representing all the fences inside the resv object.
- Returns either 0 for success or -ENOMEM.
- Warning: This can't be used like this when adding the fence back to the resv
- object since that can lead to stack corruption when finalizing the
- dma_fence_array.
Uh I don't get this one? I thought the only problem with nested fences is the signalling recursion, which we work around with the irq_work?
Nope, the main problem is finalizing the dma_fence_array.
E.g. imagine that you build up a chain of dma_fence_array objects like this: a<-b<-c<-d<-e<-f.....
With each one referencing the previous dma_fence_array and then you call dma_fence_put() on the last one. That in turn will cause calling dma_fence_put() on the previous one, which in turn will cause dma_fence_put() on the one before the previous one etc....
In other words you recurse because each dma_fence_array instance drops the last reference of its predecessor.
What we could do is to delegate dropping the reference to the containing fences in a dma_fence_array as well, but that would require some changes to the irq_work_run_list() function to be halfway efficient.
Also if there's really an issue with dma_fence_array fences, then that warning should be on the dma_resv kerneldoc, not somewhere hidden like this. And finally I really don't see what can go wrong, sure we'll end up with the same fence once in the dma_resv_list and then once more in the fence array. But they're all refcounted, so really shouldn't matter.
The code itself looks correct, but me not understanding what even goes wrong here freaks me out a bit.
Yeah, IIRC we already discussed that with Jason at length as well.
Essentially what you can't do is to put a dma_fence_array into another dma_fence_array without causing issues.
So I think we should maybe just add a WARN_ON() into dma_fence_array_init() to make sure that this never happens.
Yeah I think this would be much clearer instead of sprinkling half the story as a scary&confusing warning over all kinds of users which internally use dma fence arrays.
And then if it goes boom I guess we could fix it internally in dma_fence_array_init by flattening fences down again. But only if actually needed.
Ok, going to do that first then.
What confused me is why dma_resv is special, and from your reply it sounds like it really isn't.
Well, it isn't special in any way. It's just something very obvious which could go wrong.
Regards, Christian.
-Daniel
Regards, Christian.
I guess something to figure out next year, I kinda hoped I could squeeze a review in before I disappear :-/ -Daniel
- */
+int dma_resv_get_singleton(struct dma_resv *obj, bool write,
struct dma_fence **fence)
+{
- struct dma_fence_array *array;
- struct dma_fence **fences;
- unsigned count;
- int r;
- r = dma_resv_get_fences(obj, write, &count, &fences);
if (r)
return r;
- if (count == 0) {
*fence = NULL;
return 0;
- }
- if (count == 1) {
*fence = fences[0];
kfree(fences);
return 0;
- }
- array = dma_fence_array_create(count, fences,
dma_fence_context_alloc(1),
1, false);
- if (!array) {
while (count--)
dma_fence_put(fences[count]);
kfree(fences);
return -ENOMEM;
- }
- *fence = &array->base;
- return 0;
+} +EXPORT_SYMBOL_GPL(dma_resv_get_singleton);
- /**
- dma_resv_wait_timeout - Wait on reservation's objects
- shared and/or exclusive fences.
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index fa2002939b19..cdfbbda6f600 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -438,6 +438,8 @@ void dma_resv_replace_fences(struct dma_resv *obj, uint64_t context, void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence); int dma_resv_get_fences(struct dma_resv *obj, bool write, unsigned int *num_fences, struct dma_fence ***fences); +int dma_resv_get_singleton(struct dma_resv *obj, bool write, + struct dma_fence **fence);
int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src); long dma_resv_wait_timeout(struct dma_resv *obj, bool wait_all, bool intr, unsigned long timeout);
-- 2.25.1
On Mon, Jan 17, 2022 at 5:26 AM Christian König < ckoenig.leichtzumerken@gmail.com> wrote:
Am 14.01.22 um 17:31 schrieb Daniel Vetter:
On Mon, Jan 03, 2022 at 12:13:41PM +0100, Christian König wrote:
Am 22.12.21 um 22:21 schrieb Daniel Vetter:
On Tue, Dec 07, 2021 at 01:33:51PM +0100, Christian König wrote:
Add a function to simplify getting a single fence for all the fences in the dma_resv object.
v2: fix ref leak in error handling
Signed-off-by: Christian König christian.koenig@amd.com
drivers/dma-buf/dma-resv.c | 52 ++++++++++++++++++++++++++++++++++++++ include/linux/dma-resv.h | 2 ++ 2 files changed, 54 insertions(+)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 480c305554a1..694716a3d66d 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -34,6 +34,7 @@ */ #include <linux/dma-resv.h> +#include <linux/dma-fence-array.h> #include <linux/export.h> #include <linux/mm.h> #include <linux/sched/mm.h> @@ -657,6 +658,57 @@ int dma_resv_get_fences(struct dma_resv *obj, bool write, } EXPORT_SYMBOL_GPL(dma_resv_get_fences); +/**
- dma_resv_get_singleton - Get a single fence for all the fences
- @obj: the reservation object
- @write: true if we should return all fences
- @fence: the resulting fence
- Get a single fence representing all the fences inside the resv object.
- Returns either 0 for success or -ENOMEM.
- Warning: This can't be used like this when adding the fence back to the resv
- object since that can lead to stack corruption when finalizing the
- dma_fence_array.
Uh I don't get this one? I thought the only problem with nested fences is the signalling recursion, which we work around with the irq_work?
Nope, the main problem is finalizing the dma_fence_array.
E.g. imagine that you build up a chain of dma_fence_array objects like this: a<-b<-c<-d<-e<-f.....
With each one referencing the previous dma_fence_array and then you call dma_fence_put() on the last one. That in turn will cause calling dma_fence_put() on the previous one, which in turn will cause dma_fence_put() on the one before the previous one etc....
In other words you recurse because each dma_fence_array instance drops the last reference of its predecessor.
What we could do is to delegate dropping the reference to the containing fences in a dma_fence_array as well, but that would require some changes to the irq_work_run_list() function to be halfway efficient.
Also if there's really an issue with dma_fence_array fences, then that warning should be on the dma_resv kerneldoc, not somewhere hidden like this. And finally I really don't see what can go wrong, sure we'll end up with the same fence once in the dma_resv_list and then once more in the fence array. But they're all refcounted, so really shouldn't matter.
The code itself looks correct, but me not understanding what even goes wrong here freaks me out a bit.
Yeah, IIRC we already discussed that with Jason at length as well.
Essentially what you can't do is to put a dma_fence_array into another dma_fence_array without causing issues.
So I think we should maybe just add a WARN_ON() into dma_fence_array_init() to make sure that this never happens.
Yeah I think this would be much clearer instead of sprinkling half the story as a scary&confusing warning over all kinds of users which internally use dma fence arrays.
Agreed. WARN_ON in dma_fence_array_init() would be better for everyone, I think.
And then if it goes boom I guess we could fix it internally in dma_fence_array_init by flattening fences down again. But only if actually needed.
Ok, going to do that first then.
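As a rough idea of what such a check could look like, here is a hedged sketch only, not the actual change; whether it ends up in dma_fence_array_init() or dma_fence_array_create(), and what it checks exactly, is still open:

/* Warn if any fence being wrapped is itself a dma_fence_array, since
 * nesting arrays is what leads to the recursive teardown described
 * above. */
static void warn_on_nested_array(struct dma_fence **fences,
				 unsigned int num_fences)
{
	unsigned int i;

	for (i = 0; i < num_fences; i++)
		WARN_ON(dma_fence_is_array(fences[i]));
}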
Sounds good. This patch looks pretty reasonable to me. I do have a bit of a concern with how it's being used to replace calls to dma_resv_excl_fence() in later patches, though. In particular, this may allocate memory whereas dma_resv_excl_fence() does not so we need to be really careful in each of the replacements that doing so is safe. That's a job for the per-driver reviewers but I thought I'd drop a note here so we're all aware of and watching for it.
--Jason
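To illustrate the caller-side care this implies, a minimal sketch of what a replacement site has to cope with; 'bo' is just a placeholder buffer object and the fallback mirrors what the nouveau patch later in this series does, it is not taken from any particular driver:

/* dma_resv_get_singleton() can fail with -ENOMEM, unlike the old
 * dma_resv_excl_fence() lookup, so the caller needs a fallback. */
struct dma_fence *fence;
int ret;

ret = dma_resv_get_singleton(bo->base.resv, false, &fence);
if (ret) {
	/* last resort: block for everything instead */
	dma_resv_wait_timeout(bo->base.resv, false, false,
			      MAX_SCHEDULE_TIMEOUT);
	fence = NULL;
}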
What confused me is why dma_resv is special, and from your reply it sounds like it really isn't.
Well, it isn't special in any way. It's just something very obvious which could go wrong.
Regards, Christian.
-Daniel
Regards, Christian.
I guess something to figure out next year, I kinda hoped I could squeeze a review in before I disappear :-/ -Daniel
- */
+int dma_resv_get_singleton(struct dma_resv *obj, bool write,
struct dma_fence **fence)
+{
- struct dma_fence_array *array;
- struct dma_fence **fences;
- unsigned count;
- int r;
- r = dma_resv_get_fences(obj, write, &count, &fences);
if (r)
return r;
- if (count == 0) {
*fence = NULL;
return 0;
- }
- if (count == 1) {
*fence = fences[0];
kfree(fences);
return 0;
- }
- array = dma_fence_array_create(count, fences,
dma_fence_context_alloc(1),
1, false);
- if (!array) {
while (count--)
dma_fence_put(fences[count]);
kfree(fences);
return -ENOMEM;
- }
- *fence = &array->base;
- return 0;
+} +EXPORT_SYMBOL_GPL(dma_resv_get_singleton);
- /**
- dma_resv_wait_timeout - Wait on reservation's objects
- shared and/or exclusive fences.
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index fa2002939b19..cdfbbda6f600 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -438,6 +438,8 @@ void dma_resv_replace_fences(struct dma_resv *obj, uint64_t context, void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence); int dma_resv_get_fences(struct dma_resv *obj, bool write, unsigned int *num_fences, struct dma_fence ***fences); +int dma_resv_get_singleton(struct dma_resv *obj, bool write, + struct dma_fence **fence);
int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src); long dma_resv_wait_timeout(struct dma_resv *obj, bool wait_all, bool intr, unsigned long timeout);
-- 2.25.1
Use dma_resv_wait() instead of extracting the exclusive fence and waiting on it manually.
Signed-off-by: Christian König christian.koenig@amd.com --- drivers/infiniband/core/umem_dmabuf.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c index f0760741f281..d32cd7538835 100644 --- a/drivers/infiniband/core/umem_dmabuf.c +++ b/drivers/infiniband/core/umem_dmabuf.c @@ -16,7 +16,6 @@ int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf *umem_dmabuf) { struct sg_table *sgt; struct scatterlist *sg; - struct dma_fence *fence; unsigned long start, end, cur = 0; unsigned int nmap = 0; int i; @@ -68,11 +67,8 @@ int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf *umem_dmabuf) * may be not up-to-date. Wait for the exporter to finish * the migration. */ - fence = dma_resv_excl_fence(umem_dmabuf->attach->dmabuf->resv); - if (fence) - return dma_fence_wait(fence, false); - - return 0; + return dma_resv_wait_timeout(umem_dmabuf->attach->dmabuf->resv, false, + false, MAX_SCHEDULE_TIMEOUT); } EXPORT_SYMBOL(ib_umem_dmabuf_map_pages);
On Tue, Dec 07, 2021 at 01:33:52PM +0100, Christian König wrote:
Use dma_resv_wait() instead of extracting the exclusive fence and waiting on it manually.
Signed-off-by: Christian König christian.koenig@amd.com
No rdma lists nor maintainers on cc, so no chances to get the ack you need to merge this through drm-misc-next.
drivers/infiniband/core/umem_dmabuf.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c index f0760741f281..d32cd7538835 100644 --- a/drivers/infiniband/core/umem_dmabuf.c +++ b/drivers/infiniband/core/umem_dmabuf.c @@ -16,7 +16,6 @@ int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf *umem_dmabuf) { struct sg_table *sgt; struct scatterlist *sg;
- struct dma_fence *fence; unsigned long start, end, cur = 0; unsigned int nmap = 0; int i;
@@ -68,11 +67,8 @@ int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf *umem_dmabuf) * may be not up-to-date. Wait for the exporter to finish * the migration. */
- fence = dma_resv_excl_fence(umem_dmabuf->attach->dmabuf->resv);
- if (fence)
return dma_fence_wait(fence, false);
- return 0;
- return dma_resv_wait_timeout(umem_dmabuf->attach->dmabuf->resv, false,
false, MAX_SCHEDULE_TIMEOUT);
I think a wrapper for dma_resv_wait() without timeout would be neat, which we lack. Either way:
Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch
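For reference, the wrapper Daniel mentions might look something like this; purely a sketch, no such helper exists in the tree at this point:

/* Convenience wrapper: wait for the reservation's fences without a
 * timeout. */
static inline long dma_resv_wait(struct dma_resv *obj, bool wait_all,
				 bool intr)
{
	return dma_resv_wait_timeout(obj, wait_all, intr,
				     MAX_SCHEDULE_TIMEOUT);
}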
} EXPORT_SYMBOL(ib_umem_dmabuf_map_pages); -- 2.25.1
We can get the excl fence together with the shared ones as well.
Signed-off-by: Christian König christian.koenig@amd.com --- drivers/gpu/drm/etnaviv/etnaviv_gem.h | 1 - drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 14 +++++--------- drivers/gpu/drm/etnaviv/etnaviv_sched.c | 10 ---------- 3 files changed, 5 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.h b/drivers/gpu/drm/etnaviv/etnaviv_gem.h index 98e60df882b6..f596d743baa3 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.h +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.h @@ -80,7 +80,6 @@ struct etnaviv_gem_submit_bo { u64 va; struct etnaviv_gem_object *obj; struct etnaviv_vram_mapping *mapping; - struct dma_fence *excl; unsigned int nr_shared; struct dma_fence **shared; }; diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c index 64c90ff348f2..4286dc93fdaa 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c @@ -188,15 +188,11 @@ static int submit_fence_sync(struct etnaviv_gem_submit *submit) if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT) continue;
- if (bo->flags & ETNA_SUBMIT_BO_WRITE) { - ret = dma_resv_get_fences(robj, true, &bo->nr_shared, - &bo->shared); - if (ret) - return ret; - } else { - bo->excl = dma_fence_get(dma_resv_excl_fence(robj)); - } - + ret = dma_resv_get_fences(robj, + !!(bo->flags & ETNA_SUBMIT_BO_WRITE), + &bo->nr_shared, &bo->shared); + if (ret) + return ret; }
return ret; diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c index 180bb633d5c5..8c038a363d15 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c @@ -39,16 +39,6 @@ etnaviv_sched_dependency(struct drm_sched_job *sched_job, struct etnaviv_gem_submit_bo *bo = &submit->bos[i]; int j;
- if (bo->excl) { - fence = bo->excl; - bo->excl = NULL; - - if (!dma_fence_is_signaled(fence)) - return fence; - - dma_fence_put(fence); - } - for (j = 0; j < bo->nr_shared; j++) { if (!bo->shared[j]) continue;
On Tue, Dec 07, 2021 at 01:33:53PM +0100, Christian König wrote:
We can get the excl fence together with the shared ones as well.
Signed-off-by: Christian König christian.koenig@amd.com
Pls cc driver maintainers.
dim add-missing-cc
is your friend if you're lazy, you can even combine that with git rebase -x. Same for all the other driver patches, some acks/testing would be good to avoid fallout (we had a bit much of that with all these I think).
drivers/gpu/drm/etnaviv/etnaviv_gem.h | 1 - drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 14 +++++--------- drivers/gpu/drm/etnaviv/etnaviv_sched.c | 10 ---------- 3 files changed, 5 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.h b/drivers/gpu/drm/etnaviv/etnaviv_gem.h index 98e60df882b6..f596d743baa3 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.h +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.h @@ -80,7 +80,6 @@ struct etnaviv_gem_submit_bo { u64 va; struct etnaviv_gem_object *obj; struct etnaviv_vram_mapping *mapping;
- struct dma_fence *excl; unsigned int nr_shared; struct dma_fence **shared;
}; diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c index 64c90ff348f2..4286dc93fdaa 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c @@ -188,15 +188,11 @@ static int submit_fence_sync(struct etnaviv_gem_submit *submit) if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT) continue;
if (bo->flags & ETNA_SUBMIT_BO_WRITE) {
ret = dma_resv_get_fences(robj, true, &bo->nr_shared,
&bo->shared);
if (ret)
return ret;
} else {
bo->excl = dma_fence_get(dma_resv_excl_fence(robj));
}
ret = dma_resv_get_fences(robj,
!!(bo->flags & ETNA_SUBMIT_BO_WRITE),
Afaik the cast to bool !! here is overkill, compiler will do that for you or something like that. With that dropped:
Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch
&bo->nr_shared, &bo->shared);
if (ret)
return ret;
}
return ret; diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c index 180bb633d5c5..8c038a363d15 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c @@ -39,16 +39,6 @@ etnaviv_sched_dependency(struct drm_sched_job *sched_job, struct etnaviv_gem_submit_bo *bo = &submit->bos[i]; int j;
if (bo->excl) {
fence = bo->excl;
bo->excl = NULL;
if (!dma_fence_is_signaled(fence))
return fence;
dma_fence_put(fence);
}
- for (j = 0; j < bo->nr_shared; j++) { if (!bo->shared[j]) continue;
-- 2.25.1
Instead use the new dma_resv_get_singleton function.
Signed-off-by: Christian König christian.koenig@amd.com --- drivers/gpu/drm/nouveau/nouveau_bo.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index fa73fe57f97b..74f8652d2bd3 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -959,7 +959,14 @@ nouveau_bo_vm_cleanup(struct ttm_buffer_object *bo, { struct nouveau_drm *drm = nouveau_bdev(bo->bdev); struct drm_device *dev = drm->dev; - struct dma_fence *fence = dma_resv_excl_fence(bo->base.resv); + struct dma_fence *fence; + int ret; + + /* TODO: This is actually a memory management dependency */ + ret = dma_resv_get_singleton(bo->base.resv, false, &fence); + if (ret) + dma_resv_wait_timeout(bo->base.resv, false, false, + MAX_SCHEDULE_TIMEOUT);
nv10_bo_put_tile_region(dev, *old_tile, fence); *old_tile = new_tile;
On Tue, Dec 07, 2021 at 01:33:54PM +0100, Christian König wrote:
Instead use the new dma_resv_get_singleton function.
Signed-off-by: Christian König christian.koenig@amd.com
drivers/gpu/drm/nouveau/nouveau_bo.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index fa73fe57f97b..74f8652d2bd3 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -959,7 +959,14 @@ nouveau_bo_vm_cleanup(struct ttm_buffer_object *bo, { struct nouveau_drm *drm = nouveau_bdev(bo->bdev); struct drm_device *dev = drm->dev;
- struct dma_fence *fence = dma_resv_excl_fence(bo->base.resv);
- struct dma_fence *fence;
- int ret;
- /* TODO: This is actually a memory management dependency */
- ret = dma_resv_get_singleton(bo->base.resv, false, &fence);
- if (ret)
dma_resv_wait_timeout(bo->base.resv, false, false,
MAX_SCHEDULE_TIMEOUT);
Needs ack from nouveau folks.
Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch
nv10_bo_put_tile_region(dev, *old_tile, fence);
*old_tile = new_tile;
2.25.1
Instead use the new dma_resv_get_singleton function.
Signed-off-by: Christian König christian.koenig@amd.com --- drivers/gpu/drm/vmwgfx/vmwgfx_resource.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c index 8d1e869cc196..23c3fc2cbf10 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c @@ -1168,8 +1168,10 @@ int vmw_resources_clean(struct vmw_buffer_object *vbo, pgoff_t start, vmw_bo_fence_single(bo, NULL); if (bo->moving) dma_fence_put(bo->moving); - bo->moving = dma_fence_get - (dma_resv_excl_fence(bo->base.resv)); + + /* TODO: This is actually a memory management dependency */ + return dma_resv_get_singleton(bo->base.resv, false, + &bo->moving); }
return 0;
On Tue, Dec 07, 2021 at 01:33:55PM +0100, Christian König wrote:
Instead use the new dma_resv_get_singleton function.
Signed-off-by: Christian König christian.koenig@amd.com
drivers/gpu/drm/vmwgfx/vmwgfx_resource.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c index 8d1e869cc196..23c3fc2cbf10 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c @@ -1168,8 +1168,10 @@ int vmw_resources_clean(struct vmw_buffer_object *vbo, pgoff_t start, vmw_bo_fence_single(bo, NULL); if (bo->moving) dma_fence_put(bo->moving);
bo->moving = dma_fence_get
(dma_resv_excl_fence(bo->base.resv));
/* TODO: This is actually a memory management dependency */
return dma_resv_get_singleton(bo->base.resv, false,
&bo->moving);
Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch
} return 0; -- 2.25.1
Instead use the new dma_resv_get_singleton function.
Signed-off-by: Christian König christian.koenig@amd.com --- drivers/gpu/drm/radeon/radeon_display.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c index 573154268d43..a6f875118f01 100644 --- a/drivers/gpu/drm/radeon/radeon_display.c +++ b/drivers/gpu/drm/radeon/radeon_display.c @@ -533,7 +533,12 @@ static int radeon_crtc_page_flip_target(struct drm_crtc *crtc, DRM_ERROR("failed to pin new rbo buffer before flip\n"); goto cleanup; } - work->fence = dma_fence_get(dma_resv_excl_fence(new_rbo->tbo.base.resv)); + r = dma_resv_get_singleton(new_rbo->tbo.base.resv, false, &work->fence); + if (r) { + radeon_bo_unreserve(new_rbo); + DRM_ERROR("failed to get new rbo buffer fences\n"); + goto cleanup; + } radeon_bo_get_tiling_flags(new_rbo, &tiling_flags, NULL); radeon_bo_unreserve(new_rbo);
On Tue, Dec 07, 2021 at 01:33:56PM +0100, Christian König wrote:
Instead use the new dma_resv_get_singleton function.
Signed-off-by: Christian König christian.koenig@amd.com
drivers/gpu/drm/radeon/radeon_display.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c index 573154268d43..a6f875118f01 100644 --- a/drivers/gpu/drm/radeon/radeon_display.c +++ b/drivers/gpu/drm/radeon/radeon_display.c @@ -533,7 +533,12 @@ static int radeon_crtc_page_flip_target(struct drm_crtc *crtc, DRM_ERROR("failed to pin new rbo buffer before flip\n"); goto cleanup; }
- work->fence = dma_fence_get(dma_resv_excl_fence(new_rbo->tbo.base.resv));
- r = dma_resv_get_singleton(new_rbo->tbo.base.resv, false, &work->fence);
- if (r) {
radeon_bo_unreserve(new_rbo);
DRM_ERROR("failed to get new rbo buffer fences\n");
goto cleanup;
- }
Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch
radeon_bo_get_tiling_flags(new_rbo, &tiling_flags, NULL); radeon_bo_unreserve(new_rbo); -- 2.25.1
This was added because of the now dropped shared on excl dependency.
Signed-off-by: Christian König christian.koenig@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 +---- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 6 ------ 2 files changed, 1 insertion(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 0311d799a010..53e407ea4c89 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -1275,14 +1275,11 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, /* * Work around dma_resv shortcommings by wrapping up the * submission in a dma_fence_chain and add it as exclusive - * fence, but first add the submission as shared fence to make - * sure that shared fences never signal before the exclusive - * one. + * fence. */ dma_fence_chain_init(chain, dma_resv_excl_fence(resv), dma_fence_get(p->fence), 1);
- dma_resv_add_shared_fence(resv, p->fence); rcu_assign_pointer(resv->fence_excl, &chain->base); e->chain = NULL; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index a1e63ba4c54a..85d31d85c384 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -226,12 +226,6 @@ static void amdgpu_gem_object_close(struct drm_gem_object *obj, if (!amdgpu_vm_ready(vm)) goto out_unlock;
- fence = dma_resv_excl_fence(bo->tbo.base.resv); - if (fence) { - amdgpu_bo_fence(bo, fence, true); - fence = NULL; - } - r = amdgpu_vm_clear_freed(adev, vm, &fence); if (r || !fence) goto out_unlock;
On Tue, Dec 07, 2021 at 01:33:57PM +0100, Christian König wrote:
This was added because of the now dropped shared on excl dependency.
Signed-off-by: Christian König christian.koenig@amd.com
I didn't do a full re-audit of whether you got them all; I think at the latest with the semantic change to allow more kinds of fence types with dma-resv we should catch them all.
Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 +---- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 6 ------ 2 files changed, 1 insertion(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 0311d799a010..53e407ea4c89 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -1275,14 +1275,11 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, /* * Work around dma_resv shortcommings by wrapping up the * submission in a dma_fence_chain and add it as exclusive
* fence, but first add the submission as shared fence to make
* sure that shared fences never signal before the exclusive
* one.
* fence.
*/ dma_fence_chain_init(chain, dma_resv_excl_fence(resv), dma_fence_get(p->fence), 1);
dma_resv_add_shared_fence(resv, p->fence);
rcu_assign_pointer(resv->fence_excl, &chain->base); e->chain = NULL; }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index a1e63ba4c54a..85d31d85c384 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -226,12 +226,6 @@ static void amdgpu_gem_object_close(struct drm_gem_object *obj, if (!amdgpu_vm_ready(vm)) goto out_unlock;
- fence = dma_resv_excl_fence(bo->tbo.base.resv);
- if (fence) {
amdgpu_bo_fence(bo, fence, true);
fence = NULL;
- }
- r = amdgpu_vm_clear_freed(adev, vm, &fence); if (r || !fence) goto out_unlock;
-- 2.25.1
Get the write fence using dma_resv_for_each_fence instead of accessing it manually.
Signed-off-by: Christian König christian.koenig@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 53e407ea4c89..7facd614e50a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -1268,6 +1268,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, amdgpu_bo_list_for_each_entry(e, p->bo_list) { struct dma_resv *resv = e->tv.bo->base.resv; struct dma_fence_chain *chain = e->chain; + struct dma_resv_iter cursor; + struct dma_fence *fence;
if (!chain) continue; @@ -1277,9 +1279,10 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, * submission in a dma_fence_chain and add it as exclusive * fence. */ - dma_fence_chain_init(chain, dma_resv_excl_fence(resv), - dma_fence_get(p->fence), 1); - + dma_resv_for_each_fence(&cursor, resv, false, fence) { + break; + } + dma_fence_chain_init(chain, fence, dma_fence_get(p->fence), 1); rcu_assign_pointer(resv->fence_excl, &chain->base); e->chain = NULL; }
On Tue, Dec 07, 2021 at 01:33:58PM +0100, Christian König wrote:
Get the write fence using dma_resv_for_each_fence instead of accessing it manually.
Signed-off-by: Christian König christian.koenig@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 53e407ea4c89..7facd614e50a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -1268,6 +1268,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, amdgpu_bo_list_for_each_entry(e, p->bo_list) { struct dma_resv *resv = e->tv.bo->base.resv; struct dma_fence_chain *chain = e->chain;
struct dma_resv_iter cursor;
struct dma_fence *fence;
if (!chain) continue; @@ -1277,9 +1279,10 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, * submission in a dma_fence_chain and add it as exclusive * fence. */
dma_fence_chain_init(chain, dma_resv_excl_fence(resv),
dma_fence_get(p->fence), 1);
dma_resv_for_each_fence(&cursor, resv, false, fence) {
break;
}
dma_fence_chain_init(chain, fence, dma_fence_get(p->fence), 1);
Uh this needs a TODO. I'm assuming you'll fix this up later on when there's more than one write fence, but in case of bisect or whatever this is a bit too clever. Like you just replace one "dig around in dma-resv implementation details" with one that's not even a documented interface :-)
With an adequately loud comment added interim:
Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch
rcu_assign_pointer(resv->fence_excl, &chain->base); e->chain = NULL;
}
2.25.1
Am 22.12.21 um 22:37 schrieb Daniel Vetter:
On Tue, Dec 07, 2021 at 01:33:58PM +0100, Christian König wrote:
Get the write fence using dma_resv_for_each_fence instead of accessing it manually.
Signed-off-by: Christian König christian.koenig@amd.com
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 53e407ea4c89..7facd614e50a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -1268,6 +1268,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, amdgpu_bo_list_for_each_entry(e, p->bo_list) { struct dma_resv *resv = e->tv.bo->base.resv; struct dma_fence_chain *chain = e->chain;
struct dma_resv_iter cursor;
struct dma_fence *fence;
if (!chain) continue; @@ -1277,9 +1279,10 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p, * submission in a dma_fence_chain and add it as exclusive * fence. */
dma_fence_chain_init(chain, dma_resv_excl_fence(resv),
dma_fence_get(p->fence), 1);
dma_resv_for_each_fence(&cursor, resv, false, fence) {
break;
}
dma_fence_chain_init(chain, fence, dma_fence_get(p->fence), 1);
Uh this needs a TODO. I'm assuming you'll fix this up later on when there's more than one write fence, but in case of bisect or whatever this is a bit too clever. Like you just replace one "dig around in dma-resv implementation details" with one that's not even a documented interface :-)
Ah, yes. There is a rather big TODO just above this, but I should probably make that even stronger.
With an adequately loud comment added interim:
Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch
Thanks, Christian.
rcu_assign_pointer(resv->fence_excl, &chain->base); e->chain = NULL;
}
2.25.1
Drivers should never touch this directly.
Signed-off-by: Christian König christian.koenig@amd.com --- drivers/dma-buf/dma-resv.c | 17 +++++++++++++++++ include/linux/dma-resv.h | 17 ----------------- 2 files changed, 17 insertions(+), 17 deletions(-)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 694716a3d66d..9acceabc9399 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -147,6 +147,23 @@ void dma_resv_fini(struct dma_resv *obj) } EXPORT_SYMBOL(dma_resv_fini);
+/** + * dma_resv_excl_fence - return the object's exclusive fence + * @obj: the reservation object + * + * Returns the exclusive fence (if any). Caller must either hold the objects + * through dma_resv_lock() or the RCU read side lock through rcu_read_lock(), + * or one of the variants of each + * + * RETURNS + * The exclusive fence or NULL + */ +static inline struct dma_fence * +dma_resv_excl_fence(struct dma_resv *obj) +{ + return rcu_dereference_check(obj->fence_excl, dma_resv_held(obj)); +} + /** * dma_resv_shared_list - get the reservation object's shared fence list * @obj: the reservation object diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index cdfbbda6f600..40ac9d486f8f 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -412,23 +412,6 @@ static inline void dma_resv_unlock(struct dma_resv *obj) ww_mutex_unlock(&obj->lock); }
-/** - * dma_resv_excl_fence - return the object's exclusive fence - * @obj: the reservation object - * - * Returns the exclusive fence (if any). Caller must either hold the objects - * through dma_resv_lock() or the RCU read side lock through rcu_read_lock(), - * or one of the variants of each - * - * RETURNS - * The exclusive fence or NULL - */ -static inline struct dma_fence * -dma_resv_excl_fence(struct dma_resv *obj) -{ - return rcu_dereference_check(obj->fence_excl, dma_resv_held(obj)); -} - void dma_resv_init(struct dma_resv *obj); void dma_resv_fini(struct dma_resv *obj); int dma_resv_reserve_shared(struct dma_resv *obj, unsigned int num_fences);
On Tue, Dec 07, 2021 at 01:33:59PM +0100, Christian König wrote:
Drivers should never touch this directly.
Signed-off-by: Christian König christian.koenig@amd.com
drivers/dma-buf/dma-resv.c | 17 +++++++++++++++++ include/linux/dma-resv.h | 17 ----------------- 2 files changed, 17 insertions(+), 17 deletions(-)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 694716a3d66d..9acceabc9399 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -147,6 +147,23 @@ void dma_resv_fini(struct dma_resv *obj) } EXPORT_SYMBOL(dma_resv_fini); +/**
- dma_resv_excl_fence - return the object's exclusive fence
- @obj: the reservation object
- Returns the exclusive fence (if any). Caller must either hold the objects
- through dma_resv_lock() or the RCU read side lock through rcu_read_lock(),
- or one of the variants of each
- RETURNS
- The exclusive fence or NULL
- */
Same thing with us not documenting internals, pls drop the comment outright, it doesn't really explain anything. With that:
Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch
+static inline struct dma_fence * +dma_resv_excl_fence(struct dma_resv *obj) +{
- return rcu_dereference_check(obj->fence_excl, dma_resv_held(obj));
+}
/**
- dma_resv_shared_list - get the reservation object's shared fence list
- @obj: the reservation object
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index cdfbbda6f600..40ac9d486f8f 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -412,23 +412,6 @@ static inline void dma_resv_unlock(struct dma_resv *obj) ww_mutex_unlock(&obj->lock); } -/**
- dma_resv_excl_fence - return the object's exclusive fence
- @obj: the reservation object
- Returns the exclusive fence (if any). Caller must either hold the objects
- through dma_resv_lock() or the RCU read side lock through rcu_read_lock(),
- or one of the variants of each
- RETURNS
- The exclusive fence or NULL
- */
-static inline struct dma_fence * -dma_resv_excl_fence(struct dma_resv *obj) -{
- return rcu_dereference_check(obj->fence_excl, dma_resv_held(obj));
-}
void dma_resv_init(struct dma_resv *obj); void dma_resv_fini(struct dma_resv *obj); int dma_resv_reserve_shared(struct dma_resv *obj, unsigned int num_fences); -- 2.25.1
So far we had the approach of using a directed acyclic graph with the dma_resv obj.
This turned out to have many downsides, especially it means that every single driver and user of this interface needs to be aware of this restriction when adding fences. If the rules for the DAG are not followed then we end up with potential hard to debug memory corruption, information leaks or even elephant big security holes because we allow userspace to access freed up memory.
Since we already took a step back from that by always looking at all fences we now go a step further and stop dropping the shared fences when a new exclusive one is added.
Signed-off-by: Christian König christian.koenig@amd.com --- drivers/dma-buf/dma-resv.c | 13 ------------- 1 file changed, 13 deletions(-)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 9acceabc9399..ecb2ff606bac 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -383,29 +383,16 @@ EXPORT_SYMBOL(dma_resv_replace_fences); void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence) { struct dma_fence *old_fence = dma_resv_excl_fence(obj); - struct dma_resv_list *old; - u32 i = 0;
dma_resv_assert_held(obj);
- old = dma_resv_shared_list(obj); - if (old) - i = old->shared_count; - dma_fence_get(fence);
write_seqcount_begin(&obj->seq); /* write_seqcount_begin provides the necessary memory barrier */ RCU_INIT_POINTER(obj->fence_excl, fence); - if (old) - old->shared_count = 0; write_seqcount_end(&obj->seq);
- /* inplace update, no shared fences */ - while (i--) - dma_fence_put(rcu_dereference_protected(old->shared[i], - dma_resv_held(obj))); - dma_fence_put(old_fence); } EXPORT_SYMBOL(dma_resv_add_excl_fence);
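A small sketch of the visible difference, with f1/f2 as placeholder fences and error handling omitted; this is only meant to illustrate the semantics and is not part of the patch:

struct dma_resv_iter cursor;
struct dma_fence *fence;

dma_resv_lock(resv, NULL);
dma_resv_reserve_shared(resv, 1);
dma_resv_add_shared_fence(resv, f1);
dma_resv_add_excl_fence(resv, f2);
dma_resv_unlock(resv);

/* Before this patch the shared list was emptied when f2 was added, so
 * only f2 was visible here.  With this patch applied both f1 and f2
 * are returned when iterating all fences. */
dma_resv_for_each_fence(&cursor, resv, true, fence)
	pr_info("fence context %llu\n", fence->context);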
On Tue, Dec 07, 2021 at 01:34:00PM +0100, Christian König wrote:
So far we had the approach of using a directed acyclic graph with the dma_resv obj.
This turned out to have many downsides, especially it means that every single driver and user of this interface needs to be aware of this restriction when adding fences. If the rules for the DAG are not followed then we end up with potential hard to debug memory corruption, information leaks or even elephant big security holes because we allow userspace to access freed up memory.
Since we already took a step back from that by always looking at all fences we now go a step further and stop dropping the shared fences when a new exclusive one is added.
Signed-off-by: Christian König christian.koenig@amd.com
drivers/dma-buf/dma-resv.c | 13 ------------- 1 file changed, 13 deletions(-)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 9acceabc9399..ecb2ff606bac 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c
No doc update at all!
I checked, we're not that shitty with docs. Minimally the DOC: section header and also the struct dma_resv kerneldoc need to be updated. Also there's maybe more references and stuff I've missed on a quick look, please check for them (e.g. dma_buf.resv kerneldoc is rather important to keep correct too).
Code itself does what it says in the commit message, but we really need the most accurate docs we can get for this stuff, or the confusion will persist :-/
Cheers, Daniel
@@ -383,29 +383,16 @@ EXPORT_SYMBOL(dma_resv_replace_fences); void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence) { struct dma_fence *old_fence = dma_resv_excl_fence(obj);
- struct dma_resv_list *old;
- u32 i = 0;
dma_resv_assert_held(obj);
- old = dma_resv_shared_list(obj);
- if (old)
i = old->shared_count;
- dma_fence_get(fence);
write_seqcount_begin(&obj->seq); /* write_seqcount_begin provides the necessary memory barrier */ RCU_INIT_POINTER(obj->fence_excl, fence);
- if (old)
old->shared_count = 0;
write_seqcount_end(&obj->seq);
- /* inplace update, no shared fences */
- while (i--)
dma_fence_put(rcu_dereference_protected(old->shared[i],
dma_resv_held(obj)));
- dma_fence_put(old_fence);
} EXPORT_SYMBOL(dma_resv_add_excl_fence); -- 2.25.1
Am 22.12.21 um 22:43 schrieb Daniel Vetter:
On Tue, Dec 07, 2021 at 01:34:00PM +0100, Christian König wrote:
So far we had the approach of using a directed acyclic graph with the dma_resv obj.
This turned out to have many downsides, especially it means that every single driver and user of this interface needs to be aware of this restriction when adding fences. If the rules for the DAG are not followed then we end up with potential hard to debug memory corruption, information leaks or even elephant big security holes because we allow userspace to access freed up memory.
Since we already took a step back from that by always looking at all fences we now go a step further and stop dropping the shared fences when a new exclusive one is added.
Signed-off-by: Christian König christian.koenig@amd.com
drivers/dma-buf/dma-resv.c | 13 ------------- 1 file changed, 13 deletions(-)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 9acceabc9399..ecb2ff606bac 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c
No doc update at all!
Scratching my head I'm pretty sure I've updated at least the kerneldoc for dma_resv_add_excl_fence(). Must have gone lost in some rebase.
I checked, we're not that shitty with docs,
Well I wouldn't say shitty, but they are not perfect either.
Minimally the DOC: section header and also the struct dma_resv kerneldoc. Also there's maybe more references and stuff I've missed on a quick look, please check for them (e.g. dma_buf.resv kerneldoc is rather important to keep correct too).
Code itself does what it says in the commit message, but we really need the most accurate docs we can get for this stuff, or the confusion will persist :-/
Yeah completely agree, going to fix that.
Thanks, Christian.
Cheers, Daniel
@@ -383,29 +383,16 @@ EXPORT_SYMBOL(dma_resv_replace_fences); void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence) { struct dma_fence *old_fence = dma_resv_excl_fence(obj);
- struct dma_resv_list *old;
- u32 i = 0;
dma_resv_assert_held(obj);
- old = dma_resv_shared_list(obj);
- if (old)
i = old->shared_count;
- dma_fence_get(fence);
write_seqcount_begin(&obj->seq); /* write_seqcount_begin provides the necessary memory barrier */ RCU_INIT_POINTER(obj->fence_excl, fence);
- if (old)
old->shared_count = 0;
write_seqcount_end(&obj->seq);
- /* inplace update, no shared fences */
- while (i--)
dma_fence_put(rcu_dereference_protected(old->shared[i],
dma_resv_held(obj)));
- dma_fence_put(old_fence); } EXPORT_SYMBOL(dma_resv_add_excl_fence);
-- 2.25.1
On Tue, Jan 04, 2022 at 04:08:20PM +0100, Christian König wrote:
Am 22.12.21 um 22:43 schrieb Daniel Vetter:
On Tue, Dec 07, 2021 at 01:34:00PM +0100, Christian König wrote:
So far we had the approach of using a directed acyclic graph with the dma_resv obj.
This turned out to have many downsides, especially it means that every single driver and user of this interface needs to be aware of this restriction when adding fences. If the rules for the DAG are not followed then we end up with potential hard to debug memory corruption, information leaks or even elephant big security holes because we allow userspace to access freed up memory.
Since we already took a step back from that by always looking at all fences we now go a step further and stop dropping the shared fences when a new exclusive one is added.
Signed-off-by: Christian König christian.koenig@amd.com
drivers/dma-buf/dma-resv.c | 13 ------------- 1 file changed, 13 deletions(-)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 9acceabc9399..ecb2ff606bac 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c
No doc update at all!
Scratching my head I'm pretty sure I've updated at least the kerneldoc for dma_resv_add_excl_fence(). Must have gone lost in some rebase.
I checked, we're not that shitty with docs,
Well I wouldn't say shitty, but they are not perfect either.
This was sarcasm, I meant to say that despite the struggles the docs in-tree are pretty good nowadays. Email just sucks sometimes for communication.
Minimally the DOC: section header and also the struct dma_resv kerneldoc. Also there's maybe more references and stuff I've missed on a quick look, please check for them (e.g. dma_buf.resv kerneldoc is rather important to keep correct too).
Code itself does what it says in the commit message, but we really need the most accurate docs we can get for this stuff, or the confusion will persist :-/
Yeah completely agree, going to fix that.
Awesome!
Cheers, Daniel
Thanks, Christian.
Cheers, Daniel
@@ -383,29 +383,16 @@ EXPORT_SYMBOL(dma_resv_replace_fences); void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence) { struct dma_fence *old_fence = dma_resv_excl_fence(obj);
- struct dma_resv_list *old;
- u32 i = 0; dma_resv_assert_held(obj);
- old = dma_resv_shared_list(obj);
- if (old)
i = old->shared_count;
- dma_fence_get(fence); write_seqcount_begin(&obj->seq); /* write_seqcount_begin provides the necessary memory barrier */ RCU_INIT_POINTER(obj->fence_excl, fence);
- if (old)
old->shared_count = 0;
write_seqcount_end(&obj->seq);
- /* inplace update, no shared fences */
- while (i--)
dma_fence_put(rcu_dereference_protected(old->shared[i],
dma_resv_held(obj)));
- dma_fence_put(old_fence); } EXPORT_SYMBOL(dma_resv_add_excl_fence);
-- 2.25.1
Audit all the users of dma_resv_add_excl_fence() and make sure they reserve a shared slot also when only trying to add an exclusive fence.
This is the next step towards handling the exclusive fence like a shared one.
v2: fix missed case in amdgpu
Signed-off-by: Christian König christian.koenig@amd.com --- drivers/dma-buf/st-dma-resv.c | 64 +++++++++---------- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 8 +++ drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 8 +-- drivers/gpu/drm/i915/gem/i915_gem_clflush.c | 3 +- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 8 +-- .../drm/i915/gem/selftests/i915_gem_migrate.c | 5 +- drivers/gpu/drm/i915/i915_vma.c | 6 ++ .../drm/i915/selftests/intel_memory_region.c | 7 ++ drivers/gpu/drm/lima/lima_gem.c | 10 ++- drivers/gpu/drm/msm/msm_gem_submit.c | 18 +++--- drivers/gpu/drm/nouveau/nouveau_fence.c | 9 +-- drivers/gpu/drm/panfrost/panfrost_job.c | 4 ++ drivers/gpu/drm/ttm/ttm_bo_util.c | 12 +++- drivers/gpu/drm/ttm/ttm_execbuf_util.c | 11 ++-- drivers/gpu/drm/v3d/v3d_gem.c | 15 +++-- drivers/gpu/drm/vgem/vgem_fence.c | 12 ++-- drivers/gpu/drm/virtio/virtgpu_gem.c | 9 +++ drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 16 +++-- 18 files changed, 133 insertions(+), 92 deletions(-)
diff --git a/drivers/dma-buf/st-dma-resv.c b/drivers/dma-buf/st-dma-resv.c index cbe999c6e7a6..f33bafc78693 100644 --- a/drivers/dma-buf/st-dma-resv.c +++ b/drivers/dma-buf/st-dma-resv.c @@ -75,17 +75,16 @@ static int test_signaling(void *arg, bool shared) goto err_free; }
- if (shared) { - r = dma_resv_reserve_shared(&resv, 1); - if (r) { - pr_err("Resv shared slot allocation failed\n"); - goto err_unlock; - } + r = dma_resv_reserve_shared(&resv, 1); + if (r) { + pr_err("Resv shared slot allocation failed\n"); + goto err_unlock; + }
+ if (shared) dma_resv_add_shared_fence(&resv, f); - } else { + else dma_resv_add_excl_fence(&resv, f); - }
if (dma_resv_test_signaled(&resv, shared)) { pr_err("Resv unexpectedly signaled\n"); @@ -134,17 +133,16 @@ static int test_for_each(void *arg, bool shared) goto err_free; }
- if (shared) { - r = dma_resv_reserve_shared(&resv, 1); - if (r) { - pr_err("Resv shared slot allocation failed\n"); - goto err_unlock; - } + r = dma_resv_reserve_shared(&resv, 1); + if (r) { + pr_err("Resv shared slot allocation failed\n"); + goto err_unlock; + }
+ if (shared) dma_resv_add_shared_fence(&resv, f); - } else { + else dma_resv_add_excl_fence(&resv, f); - }
r = -ENOENT; dma_resv_for_each_fence(&cursor, &resv, shared, fence) { @@ -206,18 +204,17 @@ static int test_for_each_unlocked(void *arg, bool shared) goto err_free; }
- if (shared) { - r = dma_resv_reserve_shared(&resv, 1); - if (r) { - pr_err("Resv shared slot allocation failed\n"); - dma_resv_unlock(&resv); - goto err_free; - } + r = dma_resv_reserve_shared(&resv, 1); + if (r) { + pr_err("Resv shared slot allocation failed\n"); + dma_resv_unlock(&resv); + goto err_free; + }
+ if (shared) dma_resv_add_shared_fence(&resv, f); - } else { + else dma_resv_add_excl_fence(&resv, f); - } dma_resv_unlock(&resv);
r = -ENOENT; @@ -290,18 +287,17 @@ static int test_get_fences(void *arg, bool shared) goto err_resv; }
- if (shared) { - r = dma_resv_reserve_shared(&resv, 1); - if (r) { - pr_err("Resv shared slot allocation failed\n"); - dma_resv_unlock(&resv); - goto err_resv; - } + r = dma_resv_reserve_shared(&resv, 1); + if (r) { + pr_err("Resv shared slot allocation failed\n"); + dma_resv_unlock(&resv); + goto err_resv; + }
+ if (shared) dma_resv_add_shared_fence(&resv, f); - } else { + else dma_resv_add_excl_fence(&resv, f); - } dma_resv_unlock(&resv);
r = dma_resv_get_fences(&resv, shared, &i, &fences); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 4fcfc2313b8c..24a6b88afcca 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -1367,6 +1367,14 @@ void amdgpu_bo_fence(struct amdgpu_bo *bo, struct dma_fence *fence, bool shared) { struct dma_resv *resv = bo->tbo.base.resv; + int r; + + r = dma_resv_reserve_shared(resv, 1); + if (r) { + /* As last resort on OOM we block for the fence */ + dma_fence_wait(fence, false); + return; + }
if (shared) dma_resv_add_shared_fence(resv, fence); diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c index 4286dc93fdaa..d4a7073190ec 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c @@ -179,11 +179,9 @@ static int submit_fence_sync(struct etnaviv_gem_submit *submit) struct etnaviv_gem_submit_bo *bo = &submit->bos[i]; struct dma_resv *robj = bo->obj->base.resv;
- if (!(bo->flags & ETNA_SUBMIT_BO_WRITE)) { - ret = dma_resv_reserve_shared(robj, 1); - if (ret) - return ret; - } + ret = dma_resv_reserve_shared(robj, 1); + if (ret) + return ret;
if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT) continue; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c index f0435c6feb68..fc57ab914b60 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c @@ -100,7 +100,8 @@ bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, trace_i915_gem_object_clflush(obj);
clflush = NULL; - if (!(flags & I915_CLFLUSH_SYNC)) + if (!(flags & I915_CLFLUSH_SYNC) && + dma_resv_reserve_shared(obj->base.resv, 1) == 0) clflush = clflush_work_create(obj); if (clflush) { i915_sw_fence_await_reservation(&clflush->base.chain, diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 4d7da07442f2..fc0e1625847c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -989,11 +989,9 @@ static int eb_validate_vmas(struct i915_execbuffer *eb) } }
- if (!(ev->flags & EXEC_OBJECT_WRITE)) { - err = dma_resv_reserve_shared(vma->resv, 1); - if (err) - return err; - } + err = dma_resv_reserve_shared(vma->resv, 1); + if (err) + return err;
GEM_BUG_ON(drm_mm_node_allocated(&vma->node) && eb_vma_misplaced(&eb->exec[i], vma, ev->flags)); diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c index 28a700f08b49..2bf491fd5cdf 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c @@ -179,7 +179,10 @@ static int igt_lmem_pages_migrate(void *arg) i915_gem_object_is_lmem(obj), 0xdeadbeaf, &rq); if (rq) { - dma_resv_add_excl_fence(obj->base.resv, &rq->fence); + err = dma_resv_reserve_shared(obj->base.resv, 1); + if (!err) + dma_resv_add_excl_fence(obj->base.resv, + &rq->fence); i915_request_put(rq); } if (err) diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index bef795e265a6..5ec87de63963 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -1255,6 +1255,12 @@ int _i915_vma_move_to_active(struct i915_vma *vma, intel_frontbuffer_put(front); }
+ if (!(flags & __EXEC_OBJECT_NO_RESERVE)) { + err = dma_resv_reserve_shared(vma->resv, 1); + if (unlikely(err)) + return err; + } + if (fence) { dma_resv_add_excl_fence(vma->resv, fence); obj->write_domain = I915_GEM_DOMAIN_RENDER; diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c index 418caae84759..b85af1672a7e 100644 --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c @@ -894,6 +894,13 @@ static int igt_lmem_write_cpu(void *arg) }
i915_gem_object_lock(obj, NULL); + + err = dma_resv_reserve_shared(obj->base.resv, 1); + if (err) { + i915_gem_object_unlock(obj); + goto out_put; + } + /* Put the pages into a known state -- from the gpu for added fun */ intel_engine_pm_get(engine); err = intel_context_migrate_clear(engine->gt->migrate.context, NULL, diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c index f9a9198ef198..b4846007463f 100644 --- a/drivers/gpu/drm/lima/lima_gem.c +++ b/drivers/gpu/drm/lima/lima_gem.c @@ -255,13 +255,11 @@ int lima_gem_get_info(struct drm_file *file, u32 handle, u32 *va, u64 *offset) static int lima_gem_sync_bo(struct lima_sched_task *task, struct lima_bo *bo, bool write, bool explicit) { - int err = 0; + int err;
- if (!write) { - err = dma_resv_reserve_shared(lima_bo_resv(bo), 1); - if (err) - return err; - } + err = dma_resv_reserve_shared(lima_bo_resv(bo), 1); + if (err) + return err;
/* explicit sync use user passed dep fence */ if (explicit) diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c index 3cb029f10925..e874d09b74ef 100644 --- a/drivers/gpu/drm/msm/msm_gem_submit.c +++ b/drivers/gpu/drm/msm/msm_gem_submit.c @@ -320,16 +320,14 @@ static int submit_fence_sync(struct msm_gem_submit *submit, bool no_implicit) struct drm_gem_object *obj = &submit->bos[i].obj->base; bool write = submit->bos[i].flags & MSM_SUBMIT_BO_WRITE;
- if (!write) { - /* NOTE: _reserve_shared() must happen before - * _add_shared_fence(), which makes this a slightly - * strange place to call it. OTOH this is a - * convenient can-fail point to hook it in. - */ - ret = dma_resv_reserve_shared(obj->resv, 1); - if (ret) - return ret; - } + /* NOTE: _reserve_shared() must happen before + * _add_shared_fence(), which makes this a slightly + * strange place to call it. OTOH this is a + * convenient can-fail point to hook it in. + */ + ret = dma_resv_reserve_shared(obj->resv, 1); + if (ret) + return ret;
/* exclusive fences must be ordered */ if (no_implicit && !write) diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c index 26f9299df881..cd6715bd6d6b 100644 --- a/drivers/gpu/drm/nouveau/nouveau_fence.c +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c @@ -349,12 +349,9 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, struct nouveau_fence *f; int ret;
- if (!exclusive) { - ret = dma_resv_reserve_shared(resv, 1); - - if (ret) - return ret; - } + ret = dma_resv_reserve_shared(resv, 1); + if (ret) + return ret;
dma_resv_for_each_fence(&cursor, resv, exclusive, fence) { struct nouveau_channel *prev = NULL; diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index 908d79520853..89c3fe389476 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -247,6 +247,10 @@ static int panfrost_acquire_object_fences(struct drm_gem_object **bos, int i, ret;
for (i = 0; i < bo_count; i++) { + ret = dma_resv_reserve_shared(bos[i]->resv, 1); + if (ret) + return ret; + /* panfrost always uses write mode in its current uapi */ ret = drm_sched_job_add_implicit_dependencies(job, bos[i], true); diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index 72a94301bc95..ea9eabcc0a0c 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -221,9 +221,6 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo,
fbo->base = *bo;
- ttm_bo_get(bo); - fbo->bo = bo; - /** * Fix up members that we shouldn't copy directly: * TODO: Explicit member copy would probably be better here. @@ -246,6 +243,15 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo, ret = dma_resv_trylock(&fbo->base.base._resv); WARN_ON(!ret);
+ ret = dma_resv_reserve_shared(&fbo->base.base._resv, 1); + if (ret) { + kfree(fbo); + return ret; + } + + ttm_bo_get(bo); + fbo->bo = bo; + ttm_bo_move_to_lru_tail_unlocked(&fbo->base);
*new_obj = &fbo->base; diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/ttm/ttm_execbuf_util.c index 071c48d672c6..5da922639d54 100644 --- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c +++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c @@ -90,6 +90,7 @@ int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket,
list_for_each_entry(entry, list, head) { struct ttm_buffer_object *bo = entry->bo; + unsigned int num_fences;
ret = ttm_bo_reserve(bo, intr, (ticket == NULL), ticket); if (ret == -EALREADY && dups) { @@ -100,12 +101,10 @@ int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket, continue; }
+ num_fences = min(entry->num_shared, 1u); if (!ret) { - if (!entry->num_shared) - continue; - ret = dma_resv_reserve_shared(bo->base.resv, - entry->num_shared); + num_fences); if (!ret) continue; } @@ -120,9 +119,9 @@ int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket, ret = ttm_bo_reserve_slowpath(bo, intr, ticket); }
- if (!ret && entry->num_shared) + if (!ret) ret = dma_resv_reserve_shared(bo->base.resv, - entry->num_shared); + num_fences);
if (unlikely(ret != 0)) { if (ticket) { diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c index c7ed2e1cbab6..1bea90e40ce1 100644 --- a/drivers/gpu/drm/v3d/v3d_gem.c +++ b/drivers/gpu/drm/v3d/v3d_gem.c @@ -259,16 +259,21 @@ v3d_lock_bo_reservations(struct v3d_job *job, return ret;
for (i = 0; i < job->bo_count; i++) { + ret = dma_resv_reserve_shared(job->bo[i]->resv, 1); + if (ret) + goto fail; + ret = drm_sched_job_add_implicit_dependencies(&job->base, job->bo[i], true); - if (ret) { - drm_gem_unlock_reservations(job->bo, job->bo_count, - acquire_ctx); - return ret; - } + if (ret) + goto fail; }
return 0; + +fail: + drm_gem_unlock_reservations(job->bo, job->bo_count, acquire_ctx); + return ret; }
/** diff --git a/drivers/gpu/drm/vgem/vgem_fence.c b/drivers/gpu/drm/vgem/vgem_fence.c index bd6f75285fd9..a4cb296d4fcd 100644 --- a/drivers/gpu/drm/vgem/vgem_fence.c +++ b/drivers/gpu/drm/vgem/vgem_fence.c @@ -157,12 +157,14 @@ int vgem_fence_attach_ioctl(struct drm_device *dev, }
/* Expose the fence via the dma-buf */ - ret = 0; dma_resv_lock(resv, NULL); - if (arg->flags & VGEM_FENCE_WRITE) - dma_resv_add_excl_fence(resv, fence); - else if ((ret = dma_resv_reserve_shared(resv, 1)) == 0) - dma_resv_add_shared_fence(resv, fence); + ret = dma_resv_reserve_shared(resv, 1); + if (!ret) { + if (arg->flags & VGEM_FENCE_WRITE) + dma_resv_add_excl_fence(resv, fence); + else + dma_resv_add_shared_fence(resv, fence); + } dma_resv_unlock(resv);
/* Record the fence in our idr for later signaling */ diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c b/drivers/gpu/drm/virtio/virtgpu_gem.c index 2de61b63ef91..aec105cdd64c 100644 --- a/drivers/gpu/drm/virtio/virtgpu_gem.c +++ b/drivers/gpu/drm/virtio/virtgpu_gem.c @@ -214,6 +214,7 @@ void virtio_gpu_array_add_obj(struct virtio_gpu_object_array *objs,
int virtio_gpu_array_lock_resv(struct virtio_gpu_object_array *objs) { + unsigned int i; int ret;
if (objs->nents == 1) { @@ -222,6 +223,14 @@ int virtio_gpu_array_lock_resv(struct virtio_gpu_object_array *objs) ret = drm_gem_lock_reservations(objs->objs, objs->nents, &objs->ticket); } + if (ret) + return ret; + + for (i = 0; i < objs->nents; ++i) { + ret = dma_resv_reserve_shared(objs->objs[i]->resv, 1); + if (ret) + return ret; + } return ret; }
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c index c97a3d5e90ce..6d0abc2b0beb 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c @@ -1053,16 +1053,22 @@ void vmw_bo_fence_single(struct ttm_buffer_object *bo, struct vmw_fence_obj *fence) { struct ttm_device *bdev = bo->bdev; - struct vmw_private *dev_priv = container_of(bdev, struct vmw_private, bdev); + int ret;
- if (fence == NULL) { + if (fence == NULL) vmw_execbuf_fence_commands(NULL, dev_priv, &fence, NULL); + else + dma_fence_get(&fence->base); + + ret = dma_resv_reserve_shared(bo->base.resv, 1); + if (!ret) dma_resv_add_excl_fence(bo->base.resv, &fence->base); - dma_fence_put(&fence->base); - } else - dma_resv_add_excl_fence(bo->base.resv, &fence->base); + else + /* Last resort fallback when we are OOM */ + dma_fence_wait(&fence->base, false); + dma_fence_put(&fence->base); }
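For reference, the calling pattern all of these conversions converge on looks roughly like this (illustrative helper, not part of the patch; the name and error handling are made up):

#include <linux/dma-resv.h>

/* Sketch of the rule this series enforces: always reserve a shared slot
 * before adding any fence, even an exclusive one, so the allocation can
 * fail cleanly with -ENOMEM before a fence is published.
 */
static int example_attach_fence(struct dma_resv *resv,
				struct dma_fence *fence, bool write)
{
	int ret;

	dma_resv_assert_held(resv);

	ret = dma_resv_reserve_shared(resv, 1);
	if (ret)
		return ret;

	if (write)
		dma_resv_add_excl_fence(resv, fence);
	else
		dma_resv_add_shared_fence(resv, fence);
	return 0;
}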
On Tue, Dec 07, 2021 at 01:34:01PM +0100, Christian König wrote:
Audit all the users of dma_resv_add_excl_fence() and make sure they reserve a shared slot even when only trying to add an exclusive fence.
This is the next step towards handling the exclusive fence like a shared one.
v2: fix missed case in amdgpu
Signed-off-by: Christian König christian.koenig@amd.com
Needs all the driver cc and also at least some acks/testing.
drivers/dma-buf/st-dma-resv.c | 64 +++++++++---------- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 8 +++ drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 8 +-- drivers/gpu/drm/i915/gem/i915_gem_clflush.c | 3 +- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 8 +-- .../drm/i915/gem/selftests/i915_gem_migrate.c | 5 +- drivers/gpu/drm/i915/i915_vma.c | 6 ++ .../drm/i915/selftests/intel_memory_region.c | 7 ++ drivers/gpu/drm/lima/lima_gem.c | 10 ++- drivers/gpu/drm/msm/msm_gem_submit.c | 18 +++--- drivers/gpu/drm/nouveau/nouveau_fence.c | 9 +-- drivers/gpu/drm/panfrost/panfrost_job.c | 4 ++ drivers/gpu/drm/ttm/ttm_bo_util.c | 12 +++- drivers/gpu/drm/ttm/ttm_execbuf_util.c | 11 ++--
vc4 seems missing?
Also I think I found one bug below in the conversions. -Daniel
drivers/gpu/drm/v3d/v3d_gem.c | 15 +++-- drivers/gpu/drm/vgem/vgem_fence.c | 12 ++-- drivers/gpu/drm/virtio/virtgpu_gem.c | 9 +++ drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 16 +++-- 18 files changed, 133 insertions(+), 92 deletions(-)
diff --git a/drivers/dma-buf/st-dma-resv.c b/drivers/dma-buf/st-dma-resv.c index cbe999c6e7a6..f33bafc78693 100644 --- a/drivers/dma-buf/st-dma-resv.c +++ b/drivers/dma-buf/st-dma-resv.c @@ -75,17 +75,16 @@ static int test_signaling(void *arg, bool shared) goto err_free; }
- if (shared) {
r = dma_resv_reserve_shared(&resv, 1);
if (r) {
pr_err("Resv shared slot allocation failed\n");
goto err_unlock;
}
- r = dma_resv_reserve_shared(&resv, 1);
- if (r) {
pr_err("Resv shared slot allocation failed\n");
goto err_unlock;
- }
- if (shared) dma_resv_add_shared_fence(&resv, f);
- } else {
- else dma_resv_add_excl_fence(&resv, f);
- }
if (dma_resv_test_signaled(&resv, shared)) { pr_err("Resv unexpectedly signaled\n"); @@ -134,17 +133,16 @@ static int test_for_each(void *arg, bool shared) goto err_free; }
- if (shared) {
r = dma_resv_reserve_shared(&resv, 1);
if (r) {
pr_err("Resv shared slot allocation failed\n");
goto err_unlock;
}
- r = dma_resv_reserve_shared(&resv, 1);
- if (r) {
pr_err("Resv shared slot allocation failed\n");
goto err_unlock;
- }
- if (shared) dma_resv_add_shared_fence(&resv, f);
- } else {
- else dma_resv_add_excl_fence(&resv, f);
- }
r = -ENOENT; dma_resv_for_each_fence(&cursor, &resv, shared, fence) { @@ -206,18 +204,17 @@ static int test_for_each_unlocked(void *arg, bool shared) goto err_free; }
- if (shared) {
r = dma_resv_reserve_shared(&resv, 1);
if (r) {
pr_err("Resv shared slot allocation failed\n");
dma_resv_unlock(&resv);
goto err_free;
}
- r = dma_resv_reserve_shared(&resv, 1);
- if (r) {
pr_err("Resv shared slot allocation failed\n");
dma_resv_unlock(&resv);
goto err_free;
- }
- if (shared) dma_resv_add_shared_fence(&resv, f);
- } else {
- else dma_resv_add_excl_fence(&resv, f);
- } dma_resv_unlock(&resv);
r = -ENOENT; @@ -290,18 +287,17 @@ static int test_get_fences(void *arg, bool shared) goto err_resv; }
- if (shared) {
r = dma_resv_reserve_shared(&resv, 1);
if (r) {
pr_err("Resv shared slot allocation failed\n");
dma_resv_unlock(&resv);
goto err_resv;
}
- r = dma_resv_reserve_shared(&resv, 1);
- if (r) {
pr_err("Resv shared slot allocation failed\n");
dma_resv_unlock(&resv);
goto err_resv;
- }
- if (shared) dma_resv_add_shared_fence(&resv, f);
- } else {
- else dma_resv_add_excl_fence(&resv, f);
- } dma_resv_unlock(&resv);
r = dma_resv_get_fences(&resv, shared, &i, &fences); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 4fcfc2313b8c..24a6b88afcca 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -1367,6 +1367,14 @@ void amdgpu_bo_fence(struct amdgpu_bo *bo, struct dma_fence *fence, bool shared) { struct dma_resv *resv = bo->tbo.base.resv;
- int r;
- r = dma_resv_reserve_shared(resv, 1);
- if (r) {
/* As last resort on OOM we block for the fence */
dma_fence_wait(fence, false);
return;
- }
if (shared) dma_resv_add_shared_fence(resv, fence); diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c index 4286dc93fdaa..d4a7073190ec 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c @@ -179,11 +179,9 @@ static int submit_fence_sync(struct etnaviv_gem_submit *submit) struct etnaviv_gem_submit_bo *bo = &submit->bos[i]; struct dma_resv *robj = bo->obj->base.resv;
if (!(bo->flags & ETNA_SUBMIT_BO_WRITE)) {
ret = dma_resv_reserve_shared(robj, 1);
if (ret)
return ret;
}
ret = dma_resv_reserve_shared(robj, 1);
if (ret)
return ret;
if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT) continue; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c index f0435c6feb68..fc57ab914b60 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c @@ -100,7 +100,8 @@ bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, trace_i915_gem_object_clflush(obj); clflush = NULL;
- if (!(flags & I915_CLFLUSH_SYNC))
- if (!(flags & I915_CLFLUSH_SYNC) &&
clflush = clflush_work_create(obj); if (clflush) { i915_sw_fence_await_reservation(&clflush->base.chain,dma_resv_reserve_shared(obj->base.resv, 1) == 0)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 4d7da07442f2..fc0e1625847c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -989,11 +989,9 @@ static int eb_validate_vmas(struct i915_execbuffer *eb) } }
if (!(ev->flags & EXEC_OBJECT_WRITE)) {
err = dma_resv_reserve_shared(vma->resv, 1);
if (err)
return err;
}
err = dma_resv_reserve_shared(vma->resv, 1);
if (err)
return err;
GEM_BUG_ON(drm_mm_node_allocated(&vma->node) && eb_vma_misplaced(&eb->exec[i], vma, ev->flags)); diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c index 28a700f08b49..2bf491fd5cdf 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c @@ -179,7 +179,10 @@ static int igt_lmem_pages_migrate(void *arg) i915_gem_object_is_lmem(obj), 0xdeadbeaf, &rq); if (rq) {
dma_resv_add_excl_fence(obj->base.resv, &rq->fence);
err = dma_resv_reserve_shared(obj->base.resv, 1);
if (!err)
dma_resv_add_excl_fence(obj->base.resv,
} if (err)&rq->fence); i915_request_put(rq);
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index bef795e265a6..5ec87de63963 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -1255,6 +1255,12 @@ int _i915_vma_move_to_active(struct i915_vma *vma, intel_frontbuffer_put(front); }
if (!(flags & __EXEC_OBJECT_NO_RESERVE)) {
err = dma_resv_reserve_shared(vma->resv, 1);
if (unlikely(err))
return err;
}
- if (fence) { dma_resv_add_excl_fence(vma->resv, fence); obj->write_domain = I915_GEM_DOMAIN_RENDER;
diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c index 418caae84759..b85af1672a7e 100644 --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c @@ -894,6 +894,13 @@ static int igt_lmem_write_cpu(void *arg) } i915_gem_object_lock(obj, NULL);
- err = dma_resv_reserve_shared(obj->base.resv, 1);
- if (err) {
i915_gem_object_unlock(obj);
goto out_put;
- }
- /* Put the pages into a known state -- from the gpu for added fun */ intel_engine_pm_get(engine); err = intel_context_migrate_clear(engine->gt->migrate.context, NULL,
diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c index f9a9198ef198..b4846007463f 100644 --- a/drivers/gpu/drm/lima/lima_gem.c +++ b/drivers/gpu/drm/lima/lima_gem.c @@ -255,13 +255,11 @@ int lima_gem_get_info(struct drm_file *file, u32 handle, u32 *va, u64 *offset) static int lima_gem_sync_bo(struct lima_sched_task *task, struct lima_bo *bo, bool write, bool explicit) {
- int err = 0;
- int err;
- if (!write) {
err = dma_resv_reserve_shared(lima_bo_resv(bo), 1);
if (err)
return err;
- }
- err = dma_resv_reserve_shared(lima_bo_resv(bo), 1);
- if (err)
return err;
/* explicit sync use user passed dep fence */ if (explicit) diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c index 3cb029f10925..e874d09b74ef 100644 --- a/drivers/gpu/drm/msm/msm_gem_submit.c +++ b/drivers/gpu/drm/msm/msm_gem_submit.c @@ -320,16 +320,14 @@ static int submit_fence_sync(struct msm_gem_submit *submit, bool no_implicit) struct drm_gem_object *obj = &submit->bos[i].obj->base; bool write = submit->bos[i].flags & MSM_SUBMIT_BO_WRITE;
if (!write) {
/* NOTE: _reserve_shared() must happen before
* _add_shared_fence(), which makes this a slightly
* strange place to call it. OTOH this is a
* convenient can-fail point to hook it in.
*/
ret = dma_resv_reserve_shared(obj->resv, 1);
if (ret)
return ret;
}
/* NOTE: _reserve_shared() must happen before
* _add_shared_fence(), which makes this a slightly
* strange place to call it. OTOH this is a
* convenient can-fail point to hook it in.
*/
ret = dma_resv_reserve_shared(obj->resv, 1);
if (ret)
return ret;
/* exclusive fences must be ordered */ if (no_implicit && !write) diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c index 26f9299df881..cd6715bd6d6b 100644 --- a/drivers/gpu/drm/nouveau/nouveau_fence.c +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c @@ -349,12 +349,9 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, struct nouveau_fence *f; int ret;
- if (!exclusive) {
ret = dma_resv_reserve_shared(resv, 1);
if (ret)
return ret;
- }
- ret = dma_resv_reserve_shared(resv, 1);
- if (ret)
return ret;
dma_resv_for_each_fence(&cursor, resv, exclusive, fence) { struct nouveau_channel *prev = NULL; diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c index 908d79520853..89c3fe389476 100644 --- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -247,6 +247,10 @@ static int panfrost_acquire_object_fences(struct drm_gem_object **bos, int i, ret; for (i = 0; i < bo_count; i++) {
ret = dma_resv_reserve_shared(bos[i]->resv, 1);
if (ret)
return ret;
- /* panfrost always uses write mode in its current uapi */ ret = drm_sched_job_add_implicit_dependencies(job, bos[i], true);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index 72a94301bc95..ea9eabcc0a0c 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -221,9 +221,6 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo, fbo->base = *bo;
- ttm_bo_get(bo);
- fbo->bo = bo;
- /**
- Fix up members that we shouldn't copy directly:
- TODO: Explicit member copy would probably be better here.
@@ -246,6 +243,15 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo, ret = dma_resv_trylock(&fbo->base.base._resv); WARN_ON(!ret);
- ret = dma_resv_reserve_shared(&fbo->base.base._resv, 1);
- if (ret) {
kfree(fbo);
return ret;
- }
- ttm_bo_get(bo);
- fbo->bo = bo;
- ttm_bo_move_to_lru_tail_unlocked(&fbo->base);
*new_obj = &fbo->base; diff --git a/drivers/gpu/drm/ttm/ttm_execbuf_util.c b/drivers/gpu/drm/ttm/ttm_execbuf_util.c index 071c48d672c6..5da922639d54 100644 --- a/drivers/gpu/drm/ttm/ttm_execbuf_util.c +++ b/drivers/gpu/drm/ttm/ttm_execbuf_util.c @@ -90,6 +90,7 @@ int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket, list_for_each_entry(entry, list, head) { struct ttm_buffer_object *bo = entry->bo;
unsigned int num_fences;
ret = ttm_bo_reserve(bo, intr, (ticket == NULL), ticket); if (ret == -EALREADY && dups) { @@ -100,12 +101,10 @@ int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket, continue; }
+		num_fences = min(entry->num_shared, 1u);
 		if (!ret) {
-			if (!entry->num_shared)
-				continue;
-			ret = dma_resv_reserve_shared(bo->base.resv,
-						      entry->num_shared);
+						      num_fences);
Needs to be at least one, otherwise you call this with 0 when we want to install a write fence and go boom?
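If that reading is right, something along these lines should do it (untested, just to illustrate the suggestion):

	/* Reserve at least one slot so a later write fence still fits,
	 * even when the caller asked for zero shared slots. */
	num_fences = max(entry->num_shared, 1u);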
if (!ret) continue; }
@@ -120,9 +119,9 @@ int ttm_eu_reserve_buffers(struct ww_acquire_ctx *ticket, ret = ttm_bo_reserve_slowpath(bo, intr, ticket); }
if (!ret && entry->num_shared)
if (!ret) ret = dma_resv_reserve_shared(bo->base.resv,
entry->num_shared);
num_fences);
if (unlikely(ret != 0)) { if (ticket) { diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c index c7ed2e1cbab6..1bea90e40ce1 100644 --- a/drivers/gpu/drm/v3d/v3d_gem.c +++ b/drivers/gpu/drm/v3d/v3d_gem.c @@ -259,16 +259,21 @@ v3d_lock_bo_reservations(struct v3d_job *job, return ret; for (i = 0; i < job->bo_count; i++) {
ret = dma_resv_reserve_shared(job->bo[i]->resv, 1);
if (ret)
goto fail;
- ret = drm_sched_job_add_implicit_dependencies(&job->base, job->bo[i], true);
if (ret) {
drm_gem_unlock_reservations(job->bo, job->bo_count,
acquire_ctx);
return ret;
}
if (ret)
}goto fail;
return 0;
+fail:
- drm_gem_unlock_reservations(job->bo, job->bo_count, acquire_ctx);
- return ret;
} /** diff --git a/drivers/gpu/drm/vgem/vgem_fence.c b/drivers/gpu/drm/vgem/vgem_fence.c index bd6f75285fd9..a4cb296d4fcd 100644 --- a/drivers/gpu/drm/vgem/vgem_fence.c +++ b/drivers/gpu/drm/vgem/vgem_fence.c @@ -157,12 +157,14 @@ int vgem_fence_attach_ioctl(struct drm_device *dev, } /* Expose the fence via the dma-buf */
- ret = 0; dma_resv_lock(resv, NULL);
- if (arg->flags & VGEM_FENCE_WRITE)
dma_resv_add_excl_fence(resv, fence);
- else if ((ret = dma_resv_reserve_shared(resv, 1)) == 0)
dma_resv_add_shared_fence(resv, fence);
- ret = dma_resv_reserve_shared(resv, 1);
- if (!ret) {
if (arg->flags & VGEM_FENCE_WRITE)
dma_resv_add_excl_fence(resv, fence);
else
dma_resv_add_shared_fence(resv, fence);
- } dma_resv_unlock(resv);
/* Record the fence in our idr for later signaling */ diff --git a/drivers/gpu/drm/virtio/virtgpu_gem.c b/drivers/gpu/drm/virtio/virtgpu_gem.c index 2de61b63ef91..aec105cdd64c 100644 --- a/drivers/gpu/drm/virtio/virtgpu_gem.c +++ b/drivers/gpu/drm/virtio/virtgpu_gem.c @@ -214,6 +214,7 @@ void virtio_gpu_array_add_obj(struct virtio_gpu_object_array *objs, int virtio_gpu_array_lock_resv(struct virtio_gpu_object_array *objs) {
- unsigned int i; int ret;
if (objs->nents == 1) { @@ -222,6 +223,14 @@ int virtio_gpu_array_lock_resv(struct virtio_gpu_object_array *objs) ret = drm_gem_lock_reservations(objs->objs, objs->nents, &objs->ticket); }
- if (ret)
return ret;
- for (i = 0; i < objs->nents; ++i) {
ret = dma_resv_reserve_shared(objs->objs[i]->resv, 1);
if (ret)
return ret;
- } return ret;
} diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c index c97a3d5e90ce..6d0abc2b0beb 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c @@ -1053,16 +1053,22 @@ void vmw_bo_fence_single(struct ttm_buffer_object *bo, struct vmw_fence_obj *fence) { struct ttm_device *bdev = bo->bdev;
- struct vmw_private *dev_priv = container_of(bdev, struct vmw_private, bdev);
- int ret;
- if (fence == NULL) {
- if (fence == NULL) vmw_execbuf_fence_commands(NULL, dev_priv, &fence, NULL);
- else
dma_fence_get(&fence->base);
- ret = dma_resv_reserve_shared(bo->base.resv, 1);
- if (!ret) dma_resv_add_excl_fence(bo->base.resv, &fence->base);
dma_fence_put(&fence->base);
- } else
dma_resv_add_excl_fence(bo->base.resv, &fence->base);
- else
/* Last resort fallback when we are OOM */
dma_fence_wait(&fence->base, false);
- dma_fence_put(&fence->base);
} -- 2.25.1
Use dma_resv_get_singleton() here to eventually get more than one write fence as a single fence.
Signed-off-by: Christian König christian.koenig@amd.com --- drivers/gpu/drm/drm_gem_atomic_helper.c | 18 +++++++----------- 1 file changed, 7 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem_atomic_helper.c b/drivers/gpu/drm/drm_gem_atomic_helper.c index c3189afe10cb..9338ddb7edff 100644 --- a/drivers/gpu/drm/drm_gem_atomic_helper.c +++ b/drivers/gpu/drm/drm_gem_atomic_helper.c @@ -143,25 +143,21 @@ */ int drm_gem_plane_helper_prepare_fb(struct drm_plane *plane, struct drm_plane_state *state) { - struct dma_resv_iter cursor; struct drm_gem_object *obj; struct dma_fence *fence; + int ret;
if (!state->fb) return 0;
obj = drm_gem_fb_get_obj(state->fb, 0); - dma_resv_iter_begin(&cursor, obj->resv, false); - dma_resv_for_each_fence_unlocked(&cursor, fence) { - /* TODO: Currently there should be only one write fence, so this - * here works fine. But drm_atomic_set_fence_for_plane() should - * be changed to be able to handle more fences in general for - * multiple BOs per fb anyway. */ - dma_fence_get(fence); - break; - } - dma_resv_iter_end(&cursor); + ret = dma_resv_get_singleton(obj->resv, false, &fence); + if (ret) + return ret;
+ /* TODO: drm_atomic_set_fence_for_plane() should be changed to be able + * to handle more fences in general for multiple BOs per fb. + */ drm_atomic_set_fence_for_plane(state, fence); return 0; }
On Tue, Dec 07, 2021 at 01:34:02PM +0100, Christian König wrote:
Use dma_resv_get_singleton() here to eventually get more than one write fence as a single fence.
Signed-off-by: Christian König christian.koenig@amd.com
Patch title should be drm/atomic-helper: prefix, not just drm:
With that nit:
Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch
drivers/gpu/drm/drm_gem_atomic_helper.c | 18 +++++++----------- 1 file changed, 7 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem_atomic_helper.c b/drivers/gpu/drm/drm_gem_atomic_helper.c index c3189afe10cb..9338ddb7edff 100644 --- a/drivers/gpu/drm/drm_gem_atomic_helper.c +++ b/drivers/gpu/drm/drm_gem_atomic_helper.c @@ -143,25 +143,21 @@ */ int drm_gem_plane_helper_prepare_fb(struct drm_plane *plane, struct drm_plane_state *state) {
- struct dma_resv_iter cursor; struct drm_gem_object *obj; struct dma_fence *fence;
- int ret;
if (!state->fb) return 0; obj = drm_gem_fb_get_obj(state->fb, 0);
- dma_resv_iter_begin(&cursor, obj->resv, false);
- dma_resv_for_each_fence_unlocked(&cursor, fence) {
/* TODO: Currently there should be only one write fence, so this
* here works fine. But drm_atomic_set_fence_for_plane() should
* be changed to be able to handle more fences in general for
* multiple BOs per fb anyway. */
dma_fence_get(fence);
break;
- }
- dma_resv_iter_end(&cursor);
- ret = dma_resv_get_singleton(obj->resv, false, &fence);
- if (ret)
return ret;
- /* TODO: drm_atomic_set_fence_for_plane() should be changed to be able
* to handle more fences in general for multiple BOs per fb.
drm_atomic_set_fence_for_plane(state, fence); return 0;*/
}
2.25.1
Use dma_resv_get_singleton() here to eventually get more than one write fence as a single fence.
Signed-off-by: Christian König christian.koenig@amd.com --- drivers/gpu/drm/nouveau/dispnv50/wndw.c | 14 +++++--------- 1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c b/drivers/gpu/drm/nouveau/dispnv50/wndw.c index 133c8736426a..b55a8a723581 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c +++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c @@ -536,8 +536,6 @@ nv50_wndw_prepare_fb(struct drm_plane *plane, struct drm_plane_state *state) struct nouveau_bo *nvbo; struct nv50_head_atom *asyh; struct nv50_wndw_ctxdma *ctxdma; - struct dma_resv_iter cursor; - struct dma_fence *fence; int ret;
NV_ATOMIC(drm, "%s prepare: %p\n", plane->name, fb); @@ -560,13 +558,11 @@ nv50_wndw_prepare_fb(struct drm_plane *plane, struct drm_plane_state *state) asyw->image.handle[0] = ctxdma->object.handle; }
- dma_resv_iter_begin(&cursor, nvbo->bo.base.resv, false); - dma_resv_for_each_fence_unlocked(&cursor, fence) { - /* TODO: We only use the first writer here */ - asyw->state.fence = dma_fence_get(fence); - break; - } - dma_resv_iter_end(&cursor); + ret = dma_resv_get_singleton(nvbo->bo.base.resv, false, + &asyw->state.fence); + if (ret) + return ret; + asyw->image.offset[0] = nvbo->offset;
if (wndw->func->prepare) {
On Tue, Dec 07, 2021 at 01:34:03PM +0100, Christian König wrote:
Use dma_resv_get_singleton() here to eventually get more than one write fence as a single fence.
Signed-off-by: Christian König christian.koenig@amd.com
drivers/gpu/drm/nouveau/dispnv50/wndw.c | 14 +++++--------- 1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c b/drivers/gpu/drm/nouveau/dispnv50/wndw.c index 133c8736426a..b55a8a723581 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c +++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c @@ -536,8 +536,6 @@ nv50_wndw_prepare_fb(struct drm_plane *plane, struct drm_plane_state *state) struct nouveau_bo *nvbo; struct nv50_head_atom *asyh; struct nv50_wndw_ctxdma *ctxdma;
- struct dma_resv_iter cursor;
- struct dma_fence *fence; int ret;
NV_ATOMIC(drm, "%s prepare: %p\n", plane->name, fb); @@ -560,13 +558,11 @@ nv50_wndw_prepare_fb(struct drm_plane *plane, struct drm_plane_state *state) asyw->image.handle[0] = ctxdma->object.handle; }
- dma_resv_iter_begin(&cursor, nvbo->bo.base.resv, false);
- dma_resv_for_each_fence_unlocked(&cursor, fence) {
/* TODO: We only use the first writer here */
asyw->state.fence = dma_fence_get(fence);
break;
- }
- dma_resv_iter_end(&cursor);
- ret = dma_resv_get_singleton(nvbo->bo.base.resv, false,
&asyw->state.fence);
Needs nouveau-ack, but otherwise lgtm.
Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch
- if (ret)
return ret;
- asyw->image.offset[0] = nvbo->offset;
if (wndw->func->prepare) { -- 2.25.1
Makes the code a bit simpler.
Signed-off-by: Christian König christian.koenig@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 23 +++-------------------- 1 file changed, 3 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c index be48487e2ca7..888d97143177 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c @@ -107,36 +107,19 @@ static void amdgpu_pasid_free_cb(struct dma_fence *fence, void amdgpu_pasid_free_delayed(struct dma_resv *resv, u32 pasid) { - struct dma_fence *fence, **fences; struct amdgpu_pasid_cb *cb; - unsigned count; + struct dma_fence *fence; int r;
- r = dma_resv_get_fences(resv, true, &count, &fences); + r = dma_resv_get_singleton(resv, true, &fence); if (r) goto fallback;
- if (count == 0) { + if (!fence) { amdgpu_pasid_free(pasid); return; }
- if (count == 1) { - fence = fences[0]; - kfree(fences); - } else { - uint64_t context = dma_fence_context_alloc(1); - struct dma_fence_array *array; - - array = dma_fence_array_create(count, fences, context, - 1, false); - if (!array) { - kfree(fences); - goto fallback; - } - fence = &array->base; - } - cb = kmalloc(sizeof(*cb), GFP_KERNEL); if (!cb) { /* Last resort when we are OOM */
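For readers unfamiliar with the helper: dma_resv_get_singleton() does roughly what the removed block above did by hand. A sketch of the idea (not the actual implementation, error handling simplified):

#include <linux/dma-fence.h>
#include <linux/dma-fence-array.h>
#include <linux/dma-resv.h>
#include <linux/slab.h>

/* Collect the fences, return NULL for none, the fence itself for exactly
 * one, or wrap them all in a dma_fence_array otherwise.
 */
static int sketch_get_singleton(struct dma_resv *resv, bool write,
				struct dma_fence **fence)
{
	struct dma_fence **fences;
	struct dma_fence_array *array;
	unsigned int count;
	int r;

	r = dma_resv_get_fences(resv, write, &count, &fences);
	if (r)
		return r;

	if (count == 0) {
		*fence = NULL;
		return 0;
	}

	if (count == 1) {
		*fence = fences[0];
		kfree(fences);
		return 0;
	}

	array = dma_fence_array_create(count, fences,
				       dma_fence_context_alloc(1),
				       1, false);
	if (!array) {
		kfree(fences);
		return -ENOMEM;
	}

	*fence = &array->base;
	return 0;
}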
On Tue, Dec 07, 2021 at 01:34:04PM +0100, Christian König wrote:
Makes the code a bit simpler.
Signed-off-by: Christian König christian.koenig@amd.com
Reviewed-by: Daniel Vetter daniel.vetter@ffwll.ch
drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 23 +++-------------------- 1 file changed, 3 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c index be48487e2ca7..888d97143177 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c @@ -107,36 +107,19 @@ static void amdgpu_pasid_free_cb(struct dma_fence *fence, void amdgpu_pasid_free_delayed(struct dma_resv *resv, u32 pasid) {
- struct dma_fence *fence, **fences; struct amdgpu_pasid_cb *cb;
- unsigned count;
- struct dma_fence *fence; int r;
- r = dma_resv_get_fences(resv, true, &count, &fences);
- r = dma_resv_get_singleton(resv, true, &fence); if (r) goto fallback;
- if (count == 0) {
- if (!fence) { amdgpu_pasid_free(pasid); return; }
- if (count == 1) {
fence = fences[0];
kfree(fences);
- } else {
uint64_t context = dma_fence_context_alloc(1);
struct dma_fence_array *array;
array = dma_fence_array_create(count, fences, context,
1, false);
if (!array) {
kfree(fences);
goto fallback;
}
fence = &array->base;
- }
- cb = kmalloc(sizeof(*cb), GFP_KERNEL); if (!cb) { /* Last resort when we are OOM */
-- 2.25.1
Add a usage for kernel submissions. Waiting for those is mandatory for dynamic DMA-bufs.
Signed-off-by: Christian König christian.koenig@amd.com --- drivers/dma-buf/st-dma-resv.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 2 +- drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 6 ++++-- drivers/gpu/drm/i915/gem/i915_gem_clflush.c | 2 +- drivers/gpu/drm/nouveau/nouveau_bo.c | 4 ++-- drivers/gpu/drm/radeon/radeon_uvd.c | 2 +- drivers/gpu/drm/ttm/ttm_bo.c | 2 +- drivers/gpu/drm/ttm/ttm_bo_util.c | 4 ++-- drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 2 +- drivers/gpu/drm/vmwgfx/vmwgfx_resource.c | 2 +- drivers/infiniband/core/umem_dmabuf.c | 2 +- include/linux/dma-resv.h | 22 ++++++++++++++++++++ 13 files changed, 39 insertions(+), 15 deletions(-)
diff --git a/drivers/dma-buf/st-dma-resv.c b/drivers/dma-buf/st-dma-resv.c index d0f7c2bfd4f0..062b57d63fa6 100644 --- a/drivers/dma-buf/st-dma-resv.c +++ b/drivers/dma-buf/st-dma-resv.c @@ -296,7 +296,7 @@ int dma_resv(void) int r;
spin_lock_init(&fence_lock); - for (usage = DMA_RESV_USAGE_WRITE; usage <= DMA_RESV_USAGE_READ; + for (usage = DMA_RESV_USAGE_KERNEL; usage <= DMA_RESV_USAGE_READ; ++usage) { r = subtests(tests, (void *)(unsigned long)usage); if (r) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index eaa19154551c..a40ede9bccd0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -764,7 +764,7 @@ int amdgpu_bo_kmap(struct amdgpu_bo *bo, void **ptr) return 0; }
- r = dma_resv_wait_timeout(bo->tbo.base.resv, DMA_RESV_USAGE_WRITE, + r = dma_resv_wait_timeout(bo->tbo.base.resv, DMA_RESV_USAGE_KERNEL, false, MAX_SCHEDULE_TIMEOUT); if (r < 0) return r; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c index 33deb0df62fd..9e102080dad9 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c @@ -1163,7 +1163,7 @@ static int amdgpu_uvd_send_msg(struct amdgpu_ring *ring, struct amdgpu_bo *bo,
if (direct) { r = dma_resv_wait_timeout(bo->tbo.base.resv, - DMA_RESV_USAGE_WRITE, false, + DMA_RESV_USAGE_KERNEL, false, msecs_to_jiffies(10)); if (r == 0) r = -ETIMEDOUT; diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c index 2d77e469ef3c..a2f627af3ce2 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c @@ -185,9 +185,11 @@ static int submit_fence_sync(struct etnaviv_gem_submit *submit) return ret;
if (submit->flags & ETNA_SUBMIT_NO_IMPLICIT) - continue; + usage = DMA_RESV_USAGE_KERNEL; + else + usage = dma_resv_usage_rw(bo->flags & + ETNA_SUBMIT_BO_WRITE);
- usage = dma_resv_usage_rw(bo->flags & ETNA_SUBMIT_BO_WRITE); ret = dma_resv_get_fences(robj, usage, &bo->nr_shared, &bo->shared); if (ret) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c index e70fb65bb54f..b9281ca96ece 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c @@ -109,7 +109,7 @@ bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, i915_fence_timeout(to_i915(obj->base.dev)), I915_FENCE_GFP); dma_resv_add_fence(obj->base.resv, &clflush->base.dma, - DMA_RESV_USAGE_WRITE); + DMA_RESV_USAGE_KERNEL); dma_fence_work_commit(&clflush->base); } else if (obj->mm.pages) { __do_clflush(obj); diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c index 05076e530e7d..13deb6c70ba6 100644 --- a/drivers/gpu/drm/nouveau/nouveau_bo.c +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c @@ -962,10 +962,10 @@ nouveau_bo_vm_cleanup(struct ttm_buffer_object *bo, struct dma_fence *fence; int ret;
- ret = dma_resv_get_singleton(bo->base.resv, DMA_RESV_USAGE_WRITE, + ret = dma_resv_get_singleton(bo->base.resv, DMA_RESV_USAGE_KERNEL, &fence); if (ret) - dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_WRITE, + dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_KERNEL, false, MAX_SCHEDULE_TIMEOUT);
nv10_bo_put_tile_region(dev, *old_tile, fence); diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c b/drivers/gpu/drm/radeon/radeon_uvd.c index 4000ad2f39ba..488e78889dd6 100644 --- a/drivers/gpu/drm/radeon/radeon_uvd.c +++ b/drivers/gpu/drm/radeon/radeon_uvd.c @@ -478,7 +478,7 @@ static int radeon_uvd_cs_msg(struct radeon_cs_parser *p, struct radeon_bo *bo, return -EINVAL; }
- r = dma_resv_wait_timeout(bo->tbo.base.resv, DMA_RESV_USAGE_WRITE, + r = dma_resv_wait_timeout(bo->tbo.base.resv, DMA_RESV_USAGE_KERNEL, false, MAX_SCHEDULE_TIMEOUT); if (r <= 0) { DRM_ERROR("Failed waiting for UVD message (%d)!\n", r); diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index f52b451e26dc..ad83f42fc9ee 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -762,7 +762,7 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, return ret; }
- dma_resv_add_fence(bo->base.resv, fence, DMA_RESV_USAGE_WRITE); + dma_resv_add_fence(bo->base.resv, fence, DMA_RESV_USAGE_KERNEL);
ret = dma_resv_reserve_fences(bo->base.resv, 1); if (unlikely(ret)) { diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index e56e16a7f886..b9cfb62c4b6e 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -504,7 +504,7 @@ static int ttm_bo_move_to_ghost(struct ttm_buffer_object *bo, return ret;
dma_resv_add_fence(&ghost_obj->base._resv, fence, - DMA_RESV_USAGE_WRITE); + DMA_RESV_USAGE_KERNEL);
/** * If we're not moving to fixed memory, the TTM object @@ -559,7 +559,7 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo, struct ttm_resource_manager *man = ttm_manager_type(bdev, new_mem->mem_type); int ret = 0;
- dma_resv_add_fence(bo->base.resv, fence, DMA_RESV_USAGE_WRITE); + dma_resv_add_fence(bo->base.resv, fence, DMA_RESV_USAGE_KERNEL); if (!evict) ret = ttm_bo_move_to_ghost(bo, fence, man->use_tt); else if (!from->use_tt && pipeline) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c index a77921a44a72..f871ddb62606 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c @@ -1065,7 +1065,7 @@ void vmw_bo_fence_single(struct ttm_buffer_object *bo, ret = dma_resv_reserve_fences(bo->base.resv, 1); if (!ret) dma_resv_add_fence(bo->base.resv, &fence->base, - DMA_RESV_USAGE_WRITE); + DMA_RESV_USAGE_KERNEL); else /* Last resort fallback when we are OOM */ dma_fence_wait(&fence->base, false); diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c index 29452e150424..9e3dcbb573e7 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c @@ -1170,7 +1170,7 @@ int vmw_resources_clean(struct vmw_buffer_object *vbo, pgoff_t start, dma_fence_put(bo->moving);
return dma_resv_get_singleton(bo->base.resv, - DMA_RESV_USAGE_WRITE, + DMA_RESV_USAGE_KERNEL, &bo->moving); }
diff --git a/drivers/infiniband/core/umem_dmabuf.c b/drivers/infiniband/core/umem_dmabuf.c index f9901d273b8e..fce80a4a5147 100644 --- a/drivers/infiniband/core/umem_dmabuf.c +++ b/drivers/infiniband/core/umem_dmabuf.c @@ -68,7 +68,7 @@ int ib_umem_dmabuf_map_pages(struct ib_umem_dmabuf *umem_dmabuf) * the migration. */ return dma_resv_wait_timeout(umem_dmabuf->attach->dmabuf->resv, - DMA_RESV_USAGE_WRITE, + DMA_RESV_USAGE_KERNEL, false, MAX_SCHEDULE_TIMEOUT); } EXPORT_SYMBOL(ib_umem_dmabuf_map_pages); diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index 4f3a6abf43c4..29d799991496 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -54,8 +54,30 @@ struct dma_resv_list; * * This enum describes the different use cases for a dma_resv object and * controls which fences are returned when queried. + * + * An important fact is that there is the order KERNEL<WRITE<READ and + * when the dma_resv object is asked for fences for one use case the fences + * for the lower use case are returned as well. + * + * For example when asking for WRITE fences then the KERNEL fences are returned + * as well. Similar when asked for READ fences then both WRITE and KERNEL + * fences are returned as well. */ enum dma_resv_usage { + /** + * @DMA_RESV_USAGE_KERNEL: For in kernel memory management only. + * + * This should only be used for things like copying or clearing memory + * with a DMA hardware engine for the purpose of kernel memory + * management. + * + * Drivers *always* need to wait for those fences before accessing the + * resource protected by the dma_resv object. The only exception for + * that is when the resource is known to be locked down in place by + * pinning it previously. + */ + DMA_RESV_USAGE_KERNEL, + /** * @DMA_RESV_USAGE_WRITE: Implicit write synchronization. *
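For illustration, what the new level means on the driver side (hypothetical helper, not part of the patch): CPU access that does not take part in implicit sync only has to wait for the kernel fences, e.g. before a kmap:

#include <linux/dma-resv.h>

/* Sketch: wait only for memory management fences before CPU access. */
static int example_wait_for_kernel_fences(struct dma_resv *resv)
{
	long r;

	r = dma_resv_wait_timeout(resv, DMA_RESV_USAGE_KERNEL,
				  false, MAX_SCHEDULE_TIMEOUT);
	return r < 0 ? r : 0;
}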
On Tue, Dec 07, 2021 at 01:34:07PM +0100, Christian König wrote:
Add a usage for kernel submissions. Waiting for those is mandatory for dynamic DMA-bufs.
Signed-off-by: Christian König christian.koenig@amd.com
Again just skipping to the doc bikeshedding; maybe with more people on cc others can help with some code review too.
EXPORT_SYMBOL(ib_umem_dmabuf_map_pages); diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index 4f3a6abf43c4..29d799991496 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -54,8 +54,30 @@ struct dma_resv_list;
- This enum describes the different use cases for a dma_resv object and
- controls which fences are returned when queried.
- An important fact is that there is the order KERNEL<WRITE<READ and
- when the dma_resv object is asked for fences for one use case the fences
- for the lower use case are returned as well.
- For example when asking for WRITE fences then the KERNEL fences are returned
- as well. Similar when asked for READ fences then both WRITE and KERNEL
*/
- fences are returned as well.
enum dma_resv_usage {
- /**
* @DMA_RESV_USAGE_KERNEL: For in kernel memory management only.
*
* This should only be used for things like copying or clearing memory
* with a DMA hardware engine for the purpose of kernel memory
* management.
*
* Drivers *always* need to wait for those fences before accessing the
s/need to/must/ to stay with usual RFC wording. It's a hard requirement or there's a security bug somewhere.
* resource protected by the dma_resv object. The only exception for
* that is when the resource is known to be locked down in place by
* pinning it previously.
Is this true? This sounds more confusing than helpful, because afaik in general our pin interfaces do not block for any kernel fences. dma_buf_pin doesn't do that for sure. And I don't think ttm does that either.
I think the only safe thing here is to state that it's safe if a) the resource is pinned down and b) the caller has previously waited for the kernel fences.
I also think we should put that wait for kernel fences into dma_buf_pin(), but that's maybe a later patch. -Daniel
*/
- DMA_RESV_USAGE_KERNEL,
- /**
- @DMA_RESV_USAGE_WRITE: Implicit write synchronization.
-- 2.25.1
On Wed, Dec 22, 2021 at 4:05 PM Daniel Vetter daniel@ffwll.ch wrote:
On Tue, Dec 07, 2021 at 01:34:07PM +0100, Christian König wrote:
Add a usage for kernel submissions. Waiting for those is mandatory for dynamic DMA-bufs.
Signed-off-by: Christian König christian.koenig@amd.com
Again just skipping to the doc bikeshedding; maybe with more people on cc others can help with some code review too.
EXPORT_SYMBOL(ib_umem_dmabuf_map_pages); diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index 4f3a6abf43c4..29d799991496 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -54,8 +54,30 @@ struct dma_resv_list;
- This enum describes the different use cases for a dma_resv object and
- controls which fences are returned when queried.
- An important fact is that there is the order KERNEL<WRITE<READ and
- when the dma_resv object is asked for fences for one use case the
fences
- for the lower use case are returned as well.
- For example when asking for WRITE fences then the KERNEL fences are
returned
- as well. Similar when asked for READ fences then both WRITE and
KERNEL
*/
- fences are returned as well.
enum dma_resv_usage {
/**
* @DMA_RESV_USAGE_KERNEL: For in kernel memory management only.
*
* This should only be used for things like copying or clearing
memory
* with a DMA hardware engine for the purpose of kernel memory
* management.
*
* Drivers *always* need to wait for those fences before
accessing the
super-nit: Your whitespace is wrong here.
s/need to/must/ to stay with usual RFC wording. It's a hard requirement or there's a security bug somewhere.
Yeah, probably. I like *must* but that's because that's what we use in the VK spec. Do whatever's usual for kernel docs.
Not sure where to put this comment but I feel like the way things are framed is a bit the wrong way around. Specifically, I don't think we should be talking about what fences you must wait on so much as what fences you can safely skip. In the previous model, the exclusive fence had to be waited on at all times and the shared fences could be skipped unless you were doing something that would result in a new exclusive fence. In this new world of "it's just a bucket of fences", we need to be very sure the waiting is happening on the right things. It sounds (I could be wrong) like USAGE_KERNEL is the new exclusive fence. If so, we need to make it virtually impossible to ignore.
Sorry if that's a bit of a ramble. I think what I'm saying is this: In whatever helpers or iterators we have, be that get_singleton or iter_begin or whatever, we need to be sure we specify things in terms of exclusion and not inclusion. "Give me everything except implicit sync read fences" rather than "give me implicit sync write fences". If having a single, well-ordered enum is sufficient for that, great. If we think we'll ever end up with something other than a strict ordering, we may need to re-think a bit.
Concerning well-ordering... I'm a bit surprised to only see three values here. I expected 4:
- kernel exclusive, used for memory moves and the like
- kernel shared, used for "I'm using this right now, don't yank it out from under me" which may not have any implicit sync implications whatsoever
- implicit sync write
- implicit sync read
If we had those four, I don't think the strict ordering works anymore. From the POV of implicit sync, they would look at the implicit sync read/write fences and maybe not even kernel exclusive. From the POV of someone doing a BO move, they'd look at all of them. From the POV of holding on to memory while Vulkan is using it, you want to set a kernel shared fence but it doesn't need to interact with implicit sync at all. Am I missing something obvious here?
--Jason
* resource protected by the dma_resv object. The only exception
for
* that is when the resource is known to be locked down in place by
* pinning it previously.
Is this true? This sounds more confusing than helpful, because afaik in general our pin interfaces do not block for any kernel fences. dma_buf_pin doesn't do that for sure. And I don't think ttm does that either.
I think the only safe thing here is to state that it's safe if a) the resource is pinned down and b) the caller has previously waited for the kernel fences.
I also think we should put that wait for kernel fences into dma_buf_pin(), but that's maybe a later patch. -Daniel
*/
DMA_RESV_USAGE_KERNEL,
/** * @DMA_RESV_USAGE_WRITE: Implicit write synchronization. *
-- 2.25.1
-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Am 02.03.22 um 19:11 schrieb Jason Ekstrand:
On Wed, Dec 22, 2021 at 4:05 PM Daniel Vetter daniel@ffwll.ch wrote:
On Tue, Dec 07, 2021 at 01:34:07PM +0100, Christian König wrote: > Add an usage for kernel submissions. Waiting for those > are mandatory for dynamic DMA-bufs. > > Signed-off-by: Christian König <christian.koenig@amd.com> Again just skipping to the doc bikeshedding, maybe with more cc others help with some code review too. > EXPORT_SYMBOL(ib_umem_dmabuf_map_pages); > diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h > index 4f3a6abf43c4..29d799991496 100644 > --- a/include/linux/dma-resv.h > +++ b/include/linux/dma-resv.h > @@ -54,8 +54,30 @@ struct dma_resv_list; > * > * This enum describes the different use cases for a dma_resv object and > * controls which fences are returned when queried. > + * > + * An important fact is that there is the order KERNEL<WRITE<READ and > + * when the dma_resv object is asked for fences for one use case the fences > + * for the lower use case are returned as well. > + * > + * For example when asking for WRITE fences then the KERNEL fences are returned > + * as well. Similar when asked for READ fences then both WRITE and KERNEL > + * fences are returned as well. > */ > enum dma_resv_usage { > + /** > + * @DMA_RESV_USAGE_KERNEL: For in kernel memory management only. > + * > + * This should only be used for things like copying or clearing memory > + * with a DMA hardware engine for the purpose of kernel memory > + * management. > + * > + * Drivers *always* need to wait for those fences before accessing the
super-nit: Your whitespace is wrong here.
Fixed, thanks.
s/need to/must/ to stay with usual RFC wording. It's a hard requirement or there's a security bug somewhere.
Yeah, probably. I like *must* but that's because that's what we use in the VK spec. Do whatever's usual for kernel docs.
I agree, must sounds better and is already fixed.
Not sure where to put this comment but I feel like the way things are framed is a bit the wrong way around. Specifically, I don't think we should be talking about what fences you must wait on so much as what fences you can safely skip. In the previous model, the exclusive fence had to be waited on at all times and the shared fences could be skipped unless you were doing something that would result in a new exclusive fence.
Well exactly that's what we unfortunately didn't do, as Daniel explained some drivers just ignored the exclusive fence sometimes.
In this new world of "it's just a bucket of fences", we need to be very sure the waiting is happening on the right things. It sounds (I could be wrong) like USAGE_KERNEL is the new exclusive fence. If so, we need to make it virtually impossible to ignore.
Yes, exactly that's the goal here.
Sorry if that's a bit of a ramble. I think what I'm saying is this: In whatever helpers or iterators we have, be that get_singleton or iter_begin or whatever, we need to be sure we specify things in terms of exclusion and not inclusion. "Give me everything except implicit sync read fences" rather than "give me implicit sync write fences".
Mhm, exactly that's what I tried to avoid. The basic idea here is that the driver and memory management components tell the framework what use case they have and the framework returns the appropriate fences for that.
So when the use case is, for example, mmap() of the buffer on the CPU without any further sync, you only get the kernel fences.

When the use case is adding a CS which is an implicit read, you get all kernel fences plus all writers (see dma_resv_usage_rw()).

When the use case is adding a CS which is an implicit write, you get all kernel fences, other writers as well as readers.

And last, when you are the memory management which wants to move a buffer around, you get everything.
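To make that concrete, a rough sketch of the CS case (struct my_job and add_job_dependency() are made up for illustration; dma_resv_usage_rw() is the helper mentioned above):

/* A new write must wait for existing readers and writers, a new read
 * only for existing writers; KERNEL fences are always included.
 */
static void example_gather_deps(struct my_job *job, struct dma_resv *resv,
				bool is_write)
{
	struct dma_resv_iter cursor;
	struct dma_fence *fence;

	dma_resv_for_each_fence(&cursor, resv, dma_resv_usage_rw(is_write),
				fence)
		add_job_dependency(job, fence);
}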
If having a single, well-ordered enum is sufficient for that, great. If we think we'll ever end up with something other than a strict ordering, we may need to re-think a bit.
I actually started with a matrix which gives you an indicator when to sync with what, but at least for now the well-ordered enum seems to get the job done as well and is far less complex.
Concerning well-ordering... I'm a bit surprised to only see three values here. I expected 4:
- kernel exclusive, used for memory moves and the like
- kernel shared, used for "I'm using this right now, don't yank it out from under me" which may not have any implicit sync implications whatsoever
- implicit sync write
- implicit sync read
See the follow up patch which adds DMA_RESV_USAGE_BOOKKEEP. That's the 4th one you are missing.
If we had those four, I don't think the strict ordering works anymore. From the POV of implicit sync, they would look at the implicit sync read/write fences and maybe not even kernel exclusive. From the POV of some doing a BO move, they'd look at all of them. From the POV of holding on to memory while Vulkan is using it, you want to set a kernel shared fence but it doesn't need to interact with implicit sync at all. Am I missing something obvious here?
Yeah, sounds like you didn't look at patch 21 :)
My thinking is more or less exactly the same. Only difference is that I've put the BOOKKEEP usage after the implicit read and write usages. This way you can keep the strict ordering since the implicit submissions won't ask for the BOOKKEEP usage.
The order is then KERNEL<WRITE<READ<BOOKKEEP. See the final documentation here as well:
 * An important fact is that there is the order KERNEL<WRITE<READ<BOOKKEEP and
 * when the dma_resv object is asked for fences for one use case the fences
 * for the lower use case are returned as well.
 *
 * For example when asking for WRITE fences then the KERNEL fences are returned
 * as well. Similarly, when asked for READ fences then both WRITE and KERNEL
 * fences are returned as well.
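Put differently, the rule boils down to a simple comparison when fences are collected, roughly (sketch, not the real iterator code):

/* A query for a given usage also returns all fences of lower usage. */
static bool example_usage_matches(enum dma_resv_usage fence_usage,
				  enum dma_resv_usage queried_usage)
{
	return fence_usage <= queried_usage;
}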
Regards, Christian.
--Jason
> + * resource protected by the dma_resv object. The only exception for > + * that is when the resource is known to be locked down in place by > + * pinning it previously. Is this true? This sounds more confusing than helpful, because afaik in general our pin interfaces do not block for any kernel fences. dma_buf_pin doesn't do that for sure. And I don't think ttm does that either. I think the only safe thing here is to state that it's safe if a) the resource is pinned down and b) the callers has previously waited for the kernel fences. I also think we should put that wait for kernel fences into dma_buf_pin(), but that's maybe a later patch. -Daniel > + */ > + DMA_RESV_USAGE_KERNEL, > + > /** > * @DMA_RESV_USAGE_WRITE: Implicit write synchronization. > * > -- > 2.25.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Add a usage for submissions independent of implicit sync but still interesting for memory management.
Signed-off-by: Christian König christian.koenig@amd.com --- drivers/dma-buf/dma-resv.c | 2 +- drivers/dma-buf/st-dma-resv.c | 2 +- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 +++--- drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 2 +- drivers/gpu/drm/qxl/qxl_debugfs.c | 2 +- drivers/gpu/drm/radeon/radeon_gem.c | 2 +- drivers/gpu/drm/radeon/radeon_mn.c | 2 +- drivers/gpu/drm/ttm/ttm_bo.c | 14 +++++++------- include/linux/dma-resv.h | 18 +++++++++++++++++- 15 files changed, 40 insertions(+), 24 deletions(-)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index a2a0b5b6c107..a058a3e805ab 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -548,7 +548,7 @@ int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src)
list = NULL;
- dma_resv_iter_begin(&cursor, src, DMA_RESV_USAGE_READ); + dma_resv_iter_begin(&cursor, src, DMA_RESV_USAGE_BOOKKEEP); dma_resv_for_each_fence_unlocked(&cursor, f) {
if (dma_resv_iter_is_restarted(&cursor)) { diff --git a/drivers/dma-buf/st-dma-resv.c b/drivers/dma-buf/st-dma-resv.c index 062b57d63fa6..8ace9e84c845 100644 --- a/drivers/dma-buf/st-dma-resv.c +++ b/drivers/dma-buf/st-dma-resv.c @@ -296,7 +296,7 @@ int dma_resv(void) int r;
spin_lock_init(&fence_lock); - for (usage = DMA_RESV_USAGE_KERNEL; usage <= DMA_RESV_USAGE_READ; + for (usage = DMA_RESV_USAGE_KERNEL; usage <= DMA_RESV_USAGE_BOOKKEEP; ++usage) { r = subtests(tests, (void *)(unsigned long)usage); if (r) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 4a469831afe3..bbfd7a1e42e8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -246,7 +246,7 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo, */ replacement = dma_fence_get_stub(); dma_resv_replace_fences(bo->tbo.base.resv, ef->base.context, - replacement, DMA_RESV_USAGE_READ); + replacement, DMA_RESV_USAGE_BOOKKEEP); dma_fence_put(replacement); return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c index 490d2a7a3e2b..ddf46802b1ff 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c @@ -111,7 +111,7 @@ void amdgpu_pasid_free_delayed(struct dma_resv *resv, struct dma_fence *fence; int r;
- r = dma_resv_get_singleton(resv, DMA_RESV_USAGE_READ, &fence); + r = dma_resv_get_singleton(resv, DMA_RESV_USAGE_BOOKKEEP, &fence); if (r) goto fallback;
@@ -139,7 +139,7 @@ void amdgpu_pasid_free_delayed(struct dma_resv *resv, /* Not enough memory for the delayed delete, as last resort * block for all the fences to complete. */ - dma_resv_wait_timeout(resv, DMA_RESV_USAGE_READ, + dma_resv_wait_timeout(resv, DMA_RESV_USAGE_BOOKKEEP, false, MAX_SCHEDULE_TIMEOUT); amdgpu_pasid_free(pasid); } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c index 86f5248676b0..b86c0b8252a5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c @@ -75,7 +75,7 @@ static bool amdgpu_mn_invalidate_gfx(struct mmu_interval_notifier *mni,
mmu_interval_set_seq(mni, cur_seq);
- r = dma_resv_wait_timeout(bo->tbo.base.resv, DMA_RESV_USAGE_READ, + r = dma_resv_wait_timeout(bo->tbo.base.resv, DMA_RESV_USAGE_BOOKKEEP, false, MAX_SCHEDULE_TIMEOUT); mutex_unlock(&adev->notifier_lock); if (r <= 0) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c index 183623806056..1447f009a957 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c @@ -260,7 +260,7 @@ int amdgpu_sync_resv(struct amdgpu_device *adev, struct amdgpu_sync *sync, return -EINVAL;
/* TODO: Use DMA_RESV_USAGE_READ here */ - dma_resv_for_each_fence(&cursor, resv, DMA_RESV_USAGE_READ, f) { + dma_resv_for_each_fence(&cursor, resv, DMA_RESV_USAGE_BOOKKEEP, f) { dma_fence_chain_for_each(f, f) { struct dma_fence_chain *chain = to_dma_fence_chain(f);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index fd339762f534..3740d6e788ef 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -1361,7 +1361,7 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo, * be resident to run successfully */ dma_resv_for_each_fence(&resv_cursor, bo->base.resv, - DMA_RESV_USAGE_READ, f) { + DMA_RESV_USAGE_BOOKKEEP, f) { if (amdkfd_fence_check_mm(f, current->mm)) return false; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 9eac1e783bbb..6a3ccd344a9f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -2105,7 +2105,7 @@ static void amdgpu_vm_prt_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm) struct dma_resv_iter cursor; struct dma_fence *fence;
- dma_resv_for_each_fence(&cursor, resv, DMA_RESV_USAGE_READ, fence) { + dma_resv_for_each_fence(&cursor, resv, DMA_RESV_USAGE_BOOKKEEP, fence) { /* Add a callback for each fence in the reservation object */ amdgpu_vm_prt_get(adev); amdgpu_vm_add_prt_cb(adev, fence); @@ -2707,7 +2707,7 @@ bool amdgpu_vm_evictable(struct amdgpu_bo *bo) return true;
/* Don't evict VM page tables while they are busy */ - if (!dma_resv_test_signaled(bo->tbo.base.resv, DMA_RESV_USAGE_READ)) + if (!dma_resv_test_signaled(bo->tbo.base.resv, DMA_RESV_USAGE_BOOKKEEP)) return false;
/* Try to block ongoing updates */ @@ -2888,7 +2888,7 @@ void amdgpu_vm_adjust_size(struct amdgpu_device *adev, uint32_t min_vm_size, long amdgpu_vm_wait_idle(struct amdgpu_vm *vm, long timeout) { timeout = dma_resv_wait_timeout(vm->root.bo->tbo.base.resv, - DMA_RESV_USAGE_READ, + DMA_RESV_USAGE_BOOKKEEP, true, timeout); if (timeout <= 0) return timeout; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c index a200d3e66573..4115a222a853 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c @@ -66,7 +66,7 @@ bool __i915_gem_object_is_lmem(struct drm_i915_gem_object *obj) struct intel_memory_region *mr = READ_ONCE(obj->mm.region);
#ifdef CONFIG_LOCKDEP - GEM_WARN_ON(dma_resv_test_signaled(obj->base.resv, DMA_RESV_USAGE_READ) && + GEM_WARN_ON(dma_resv_test_signaled(obj->base.resv, DMA_RESV_USAGE_BOOKKEEP) && i915_gem_object_evictable(obj)); #endif return mr && (mr->type == INTEL_MEMORY_LOCAL || diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 0ccb91385f84..67b1fa845f22 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -85,7 +85,7 @@ static bool i915_gem_userptr_invalidate(struct mmu_interval_notifier *mni, return true;
/* we will unbind on next submission, still have userptr pins */ - r = dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_READ, false, + r = dma_resv_wait_timeout(obj->base.resv, DMA_RESV_USAGE_BOOKKEEP, false, MAX_SCHEDULE_TIMEOUT); if (r <= 0) drm_err(&i915->drm, "(%ld) failed to wait for idle\n", r); diff --git a/drivers/gpu/drm/qxl/qxl_debugfs.c b/drivers/gpu/drm/qxl/qxl_debugfs.c index 33e5889d6608..2d9ed3b94574 100644 --- a/drivers/gpu/drm/qxl/qxl_debugfs.c +++ b/drivers/gpu/drm/qxl/qxl_debugfs.c @@ -62,7 +62,7 @@ qxl_debugfs_buffers_info(struct seq_file *m, void *data) int rel = 0;
dma_resv_iter_begin(&cursor, bo->tbo.base.resv, - DMA_RESV_USAGE_READ); + DMA_RESV_USAGE_BOOKKEEP); dma_resv_for_each_fence_unlocked(&cursor, fence) { if (dma_resv_iter_is_restarted(&cursor)) rel = 0; diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c index 71bf9299e45c..9587ab88bedd 100644 --- a/drivers/gpu/drm/radeon/radeon_gem.c +++ b/drivers/gpu/drm/radeon/radeon_gem.c @@ -162,7 +162,7 @@ static int radeon_gem_set_domain(struct drm_gem_object *gobj, if (domain == RADEON_GEM_DOMAIN_CPU) { /* Asking for cpu access wait for object idle */ r = dma_resv_wait_timeout(robj->tbo.base.resv, - DMA_RESV_USAGE_READ, + DMA_RESV_USAGE_BOOKKEEP, true, 30 * HZ); if (!r) r = -EBUSY; diff --git a/drivers/gpu/drm/radeon/radeon_mn.c b/drivers/gpu/drm/radeon/radeon_mn.c index 68ebeb1bdfff..29fe8423bd90 100644 --- a/drivers/gpu/drm/radeon/radeon_mn.c +++ b/drivers/gpu/drm/radeon/radeon_mn.c @@ -66,7 +66,7 @@ static bool radeon_mn_invalidate(struct mmu_interval_notifier *mn, return true; }
- r = dma_resv_wait_timeout(bo->tbo.base.resv, DMA_RESV_USAGE_READ, + r = dma_resv_wait_timeout(bo->tbo.base.resv, DMA_RESV_USAGE_BOOKKEEP, false, MAX_SCHEDULE_TIMEOUT); if (r <= 0) DRM_ERROR("(%ld) failed to wait for user bo\n", r); diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index ad83f42fc9ee..d3527d3f7b18 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -272,7 +272,7 @@ static void ttm_bo_flush_all_fences(struct ttm_buffer_object *bo) struct dma_resv_iter cursor; struct dma_fence *fence;
- dma_resv_iter_begin(&cursor, resv, DMA_RESV_USAGE_READ); + dma_resv_iter_begin(&cursor, resv, DMA_RESV_USAGE_BOOKKEEP); dma_resv_for_each_fence_unlocked(&cursor, fence) { if (!fence->ops->signaled) dma_fence_enable_sw_signaling(fence); @@ -301,7 +301,7 @@ static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo, struct dma_resv *resv = &bo->base._resv; int ret;
- if (dma_resv_test_signaled(resv, DMA_RESV_USAGE_READ)) + if (dma_resv_test_signaled(resv, DMA_RESV_USAGE_BOOKKEEP)) ret = 0; else ret = -EBUSY; @@ -313,7 +313,7 @@ static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo, dma_resv_unlock(bo->base.resv); spin_unlock(&bo->bdev->lru_lock);
- lret = dma_resv_wait_timeout(resv, DMA_RESV_USAGE_READ, + lret = dma_resv_wait_timeout(resv, DMA_RESV_USAGE_BOOKKEEP, interruptible, 30 * HZ);
@@ -418,7 +418,7 @@ static void ttm_bo_release(struct kref *kref) * fences block for the BO to become idle */ dma_resv_wait_timeout(bo->base.resv, - DMA_RESV_USAGE_READ, false, + DMA_RESV_USAGE_BOOKKEEP, false, 30 * HZ); }
@@ -429,7 +429,7 @@ static void ttm_bo_release(struct kref *kref) ttm_mem_io_free(bdev, bo->resource); }
- if (!dma_resv_test_signaled(bo->base.resv, DMA_RESV_USAGE_READ) || + if (!dma_resv_test_signaled(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP) || !dma_resv_trylock(bo->base.resv)) { /* The BO is not idle, resurrect it for delayed destroy */ ttm_bo_flush_all_fences(bo); @@ -1074,13 +1074,13 @@ int ttm_bo_wait(struct ttm_buffer_object *bo, long timeout = 15 * HZ;
if (no_wait) { - if (dma_resv_test_signaled(bo->base.resv, DMA_RESV_USAGE_READ)) + if (dma_resv_test_signaled(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP)) return 0; else return -EBUSY; }
- timeout = dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_READ, + timeout = dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP, interruptible, timeout); if (timeout < 0) return timeout; diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index 29d799991496..07ae5b00c1fa 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -55,7 +55,7 @@ struct dma_resv_list; * This enum describes the different use cases for a dma_resv object and * controls which fences are returned when queried. * - * An important fact is that there is the order KERNEL<WRITE<READ and + * An important fact is that there is the order KERNEL<WRITE<READ<BOOKKEEP and * when the dma_resv object is asked for fences for one use case the fences * for the lower use case are returned as well. * @@ -93,6 +93,22 @@ enum dma_resv_usage { * an implicit read dependency. */ DMA_RESV_USAGE_READ, + + /** + * @DMA_RESV_USAGE_BOOKKEEP: No implicit sync. + * + * This should be used by submissions which don't want to participate in + * implicit synchronization. + * + * The most common case are submissions with explicit synchronization, + * but also things like preemption fences as well as page table updates + * might use this. + * + * The kernel memory management *always* need to wait for those fences + * before moving or freeing the resource protected by the dma_resv + * object. + */ + DMA_RESV_USAGE_BOOKKEEP };
/**
On Tue, Dec 07, 2021 at 01:34:08PM +0100, Christian König wrote:
Add a usage for submissions that are independent of implicit sync but still relevant for memory management.
Signed-off-by: Christian König christian.koenig@amd.com
Focusing on the kerneldoc first to get semantics agreed.
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index 29d799991496..07ae5b00c1fa 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -55,7 +55,7 @@ struct dma_resv_list;
- This enum describes the different use cases for a dma_resv object and
- controls which fences are returned when queried.
- An important fact is that there is the order KERNEL<WRITE<READ and
- An important fact is that there is the order KERNEL<WRITE<READ<BOOKKEEP and
- when the dma_resv object is asked for fences for one use case the fences
- for the lower use case are returned as well.
@@ -93,6 +93,22 @@ enum dma_resv_usage { * an implicit read dependency. */ DMA_RESV_USAGE_READ,
- /**
* @DMA_RESV_USAGE_BOOKKEEP: No implicit sync.
*
* This should be used by submissions which don't want to participate in
* implicit synchronization.
Uh we might still have a disagreement, because that isn't really what drivers which added opt-in implicit sync have done thus far. Minimally we need a note that some drivers also use _READ for this.
*
* The most common case are submissions with explicit synchronization,
* but also things like preemption fences as well as page table updates
* might use this.
*
* The kernel memory management *always* need to wait for those fences
* before moving or freeing the resource protected by the dma_resv
* object.
Yeah this is the comment I wanted to see for READ, and which now is in bookkeeping (where it's correct in the end). I think we still should have something in the READ comment (and here) explaining that there could very well be writes hiding behind this, and that the kernel cannot assume anything about what's going on in general (maybe some drivers enforce read/write through command parsers).
Also all the text in dma_buf.resv needs to be updated to use the right constants instead of words. -Daniel
*/
- DMA_RESV_USAGE_BOOKKEEP
}; /** -- 2.25.1
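To illustrate the distinction discussed above (a sketch, not from the patch set): an implicit-sync command submission only collects fences up to READ, while eviction and destruction must cover BOOKKEEP as well; because of the KERNEL < WRITE < READ < BOOKKEEP ordering each query also returns the lower usages. The add_job_dependency() helper is hypothetical.

	struct dma_resv_iter cursor;
	struct dma_fence *fence;

	/* implicit sync: only implicit read/write dependencies matter */
	dma_resv_for_each_fence(&cursor, resv, DMA_RESV_USAGE_READ, fence)
		add_job_dependency(job, fence);	/* hypothetical helper */

	/* memory management: wait for everything before moving/freeing */
	dma_resv_wait_timeout(resv, DMA_RESV_USAGE_BOOKKEEP, false,
			      MAX_SCHEDULE_TIMEOUT);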
We have previously done this in the individual drivers, but it is more defensive to move it into the common code.
Dynamic attachments should wait for map operations to complete by themselves.
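Roughly what a dynamic importer is expected to do on its own after mapping, under the semantics of this patch (a sketch only; the blocking wait could equally be turned into a fence dependency of the importer's work, and the function name is illustrative):

static int example_dynamic_map(struct dma_buf_attachment *attach)
{
	struct sg_table *sgt;
	long ret;

	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	if (IS_ERR(sgt))
		return PTR_ERR(sgt);

	/* dynamic attachments sync against kernel moves/clears themselves */
	ret = dma_resv_wait_timeout(attach->dmabuf->resv,
				    DMA_RESV_USAGE_KERNEL, true,
				    MAX_SCHEDULE_TIMEOUT);
	if (ret < 0) {
		dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
		return ret;
	}
	return 0;
}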
Signed-off-by: Christian König christian.koenig@amd.com --- drivers/dma-buf/dma-buf.c | 18 +++++++++++++++--- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 14 +------------- drivers/gpu/drm/nouveau/nouveau_prime.c | 17 +---------------- drivers/gpu/drm/radeon/radeon_prime.c | 16 +++------------- 4 files changed, 20 insertions(+), 45 deletions(-)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 528983d3ba64..d3dd602c4753 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -660,12 +660,24 @@ static struct sg_table * __map_dma_buf(struct dma_buf_attachment *attach, enum dma_data_direction direction) { struct sg_table *sg_table; + signed long ret;
sg_table = attach->dmabuf->ops->map_dma_buf(attach, direction); + if (IS_ERR_OR_NULL(sg_table)) + return sg_table; + + if (!dma_buf_attachment_is_dynamic(attach)) { + ret = dma_resv_wait_timeout(attach->dmabuf->resv, + DMA_RESV_USAGE_KERNEL, true, + MAX_SCHEDULE_TIMEOUT); + if (ret < 0) { + attach->dmabuf->ops->unmap_dma_buf(attach, sg_table, + direction); + return ERR_PTR(ret); + } + }
- if (!IS_ERR_OR_NULL(sg_table)) - mangle_sg_table(sg_table); - + mangle_sg_table(sg_table); return sg_table; }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c index 4896c876ffec..33127bd56c64 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c @@ -102,21 +102,9 @@ static int amdgpu_dma_buf_pin(struct dma_buf_attachment *attach) { struct drm_gem_object *obj = attach->dmabuf->priv; struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj); - int r;
/* pin buffer into GTT */ - r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT); - if (r) - return r; - - if (bo->tbo.moving) { - r = dma_fence_wait(bo->tbo.moving, true); - if (r) { - amdgpu_bo_unpin(bo); - return r; - } - } - return 0; + return amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT); }
/** diff --git a/drivers/gpu/drm/nouveau/nouveau_prime.c b/drivers/gpu/drm/nouveau/nouveau_prime.c index 60019d0532fc..347488685f74 100644 --- a/drivers/gpu/drm/nouveau/nouveau_prime.c +++ b/drivers/gpu/drm/nouveau/nouveau_prime.c @@ -93,22 +93,7 @@ int nouveau_gem_prime_pin(struct drm_gem_object *obj) if (ret) return -EINVAL;
- ret = ttm_bo_reserve(&nvbo->bo, false, false, NULL); - if (ret) - goto error; - - if (nvbo->bo.moving) - ret = dma_fence_wait(nvbo->bo.moving, true); - - ttm_bo_unreserve(&nvbo->bo); - if (ret) - goto error; - - return ret; - -error: - nouveau_bo_unpin(nvbo); - return ret; + return 0; }
void nouveau_gem_prime_unpin(struct drm_gem_object *obj) diff --git a/drivers/gpu/drm/radeon/radeon_prime.c b/drivers/gpu/drm/radeon/radeon_prime.c index 4a90807351e7..42a87948e28c 100644 --- a/drivers/gpu/drm/radeon/radeon_prime.c +++ b/drivers/gpu/drm/radeon/radeon_prime.c @@ -77,19 +77,9 @@ int radeon_gem_prime_pin(struct drm_gem_object *obj)
/* pin buffer into GTT */ ret = radeon_bo_pin(bo, RADEON_GEM_DOMAIN_GTT, NULL); - if (unlikely(ret)) - goto error; - - if (bo->tbo.moving) { - ret = dma_fence_wait(bo->tbo.moving, false); - if (unlikely(ret)) { - radeon_bo_unpin(bo); - goto error; - } - } - - bo->prime_shared_count++; -error: + if (likely(ret == 0)) + bo->prime_shared_count++; + radeon_bo_unreserve(bo); return ret; }
On Tue, Dec 07, 2021 at 01:34:09PM +0100, Christian König wrote:
We have previously done this in the individual drivers, but it is more defensive to move it into the common code.
Dynamic attachments should wait for map operations to complete by themselves.
Signed-off-by: Christian König christian.koenig@amd.com
i915 should probably stop reinventing so much stuff here and align more ...
I do wonder whether we want the same for dma_buf_pin(), or at least document that for dynamic attachments, you still need to sync even if it's pinned. Especially since your kerneldoc for the usage flags suggests that waiting isn't needed, but after this patch waiting _is_ needed even for dynamic importers.
So there is a gap here I think, and I deleted the r-b tag I had already typed. Or am I missing something?
Minimally needs accurate docs, but I'm leaning towards an unconditional dma_resv_wait() in dma_buf_pin() for safety's sake.
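Something along the lines of what Daniel is leaning towards might look like this (a sketch only; whether KERNEL is the right usage level here and how errors should propagate are exactly the open questions):

int dma_buf_pin(struct dma_buf_attachment *attach)
{
	struct dma_buf *dmabuf = attach->dmabuf;
	long ret;

	dma_resv_assert_held(dmabuf->resv);

	ret = dmabuf->ops->pin(attach);
	if (ret)
		return ret;

	/* unconditionally wait for pending kernel moves/clears */
	ret = dma_resv_wait_timeout(dmabuf->resv, DMA_RESV_USAGE_KERNEL,
				    true, MAX_SCHEDULE_TIMEOUT);
	if (ret < 0) {
		dmabuf->ops->unpin(attach);
		return ret;
	}
	return 0;
}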
drivers/dma-buf/dma-buf.c | 18 +++++++++++++++--- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 14 +------------- drivers/gpu/drm/nouveau/nouveau_prime.c | 17 +---------------- drivers/gpu/drm/radeon/radeon_prime.c | 16 +++------------- 4 files changed, 20 insertions(+), 45 deletions(-)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 528983d3ba64..d3dd602c4753 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -660,12 +660,24 @@ static struct sg_table * __map_dma_buf(struct dma_buf_attachment *attach, enum dma_data_direction direction) { struct sg_table *sg_table;
- signed long ret;
sg_table = attach->dmabuf->ops->map_dma_buf(attach, direction);
- if (IS_ERR_OR_NULL(sg_table))
return sg_table;
- if (!dma_buf_attachment_is_dynamic(attach)) {
ret = dma_resv_wait_timeout(attach->dmabuf->resv,
Another place where this dma_resv_wait() wrapper would be good. I think we should have it :-)
Cheers, Daniel
DMA_RESV_USAGE_KERNEL, true,
MAX_SCHEDULE_TIMEOUT);
if (ret < 0) {
attach->dmabuf->ops->unmap_dma_buf(attach, sg_table,
direction);
return ERR_PTR(ret);
}
- }
- if (!IS_ERR_OR_NULL(sg_table))
mangle_sg_table(sg_table);
- mangle_sg_table(sg_table); return sg_table;
} diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c index 4896c876ffec..33127bd56c64 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c @@ -102,21 +102,9 @@ static int amdgpu_dma_buf_pin(struct dma_buf_attachment *attach) { struct drm_gem_object *obj = attach->dmabuf->priv; struct amdgpu_bo *bo = gem_to_amdgpu_bo(obj);
- int r;
/* pin buffer into GTT */
- r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT);
- if (r)
return r;
- if (bo->tbo.moving) {
r = dma_fence_wait(bo->tbo.moving, true);
if (r) {
amdgpu_bo_unpin(bo);
return r;
}
- }
- return 0;
- return amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT);
} /** diff --git a/drivers/gpu/drm/nouveau/nouveau_prime.c b/drivers/gpu/drm/nouveau/nouveau_prime.c index 60019d0532fc..347488685f74 100644 --- a/drivers/gpu/drm/nouveau/nouveau_prime.c +++ b/drivers/gpu/drm/nouveau/nouveau_prime.c @@ -93,22 +93,7 @@ int nouveau_gem_prime_pin(struct drm_gem_object *obj) if (ret) return -EINVAL;
- ret = ttm_bo_reserve(&nvbo->bo, false, false, NULL);
- if (ret)
goto error;
- if (nvbo->bo.moving)
ret = dma_fence_wait(nvbo->bo.moving, true);
- ttm_bo_unreserve(&nvbo->bo);
- if (ret)
goto error;
- return ret;
-error:
- nouveau_bo_unpin(nvbo);
- return ret;
- return 0;
} void nouveau_gem_prime_unpin(struct drm_gem_object *obj) diff --git a/drivers/gpu/drm/radeon/radeon_prime.c b/drivers/gpu/drm/radeon/radeon_prime.c index 4a90807351e7..42a87948e28c 100644 --- a/drivers/gpu/drm/radeon/radeon_prime.c +++ b/drivers/gpu/drm/radeon/radeon_prime.c @@ -77,19 +77,9 @@ int radeon_gem_prime_pin(struct drm_gem_object *obj) /* pin buffer into GTT */ ret = radeon_bo_pin(bo, RADEON_GEM_DOMAIN_GTT, NULL);
- if (unlikely(ret))
goto error;
- if (bo->tbo.moving) {
ret = dma_fence_wait(bo->tbo.moving, false);
if (unlikely(ret)) {
radeon_bo_unpin(bo);
goto error;
}
- }
- bo->prime_shared_count++;
-error:
- if (likely(ret == 0))
bo->prime_shared_count++;
- radeon_bo_unreserve(bo); return ret;
}
2.25.1
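The dma_resv_wait() convenience wrapper Daniel mentions above does not exist at this point in the series; a hypothetical minimal version would just hide the timeout boilerplate:

/* hypothetical helper, not an existing API */
static inline long dma_resv_wait(struct dma_resv *obj,
				 enum dma_resv_usage usage, bool intr)
{
	return dma_resv_wait_timeout(obj, usage, intr,
				     MAX_SCHEDULE_TIMEOUT);
}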
Not needed any more now that we have this inside the framework.
Signed-off-by: Christian König christian.koenig@amd.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h | 1 - drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 52 +++------------------ 2 files changed, 6 insertions(+), 47 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h index 044b41f0bfd9..529d52a204cf 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h @@ -34,7 +34,6 @@ struct amdgpu_fpriv; struct amdgpu_bo_list_entry { struct ttm_validate_buffer tv; struct amdgpu_bo_va *bo_va; - struct dma_fence_chain *chain; uint32_t priority; struct page **user_pages; bool user_invalidated; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 92091e800022..413606d10080 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -576,14 +576,6 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
e->bo_va = amdgpu_vm_bo_find(vm, bo); - - if (bo->tbo.base.dma_buf && !amdgpu_bo_explicit_sync(bo)) { - e->chain = dma_fence_chain_alloc(); - if (!e->chain) { - r = -ENOMEM; - goto error_validate; - } - } }
amdgpu_cs_get_threshold_for_moves(p->adev, &p->bytes_moved_threshold, @@ -634,13 +626,8 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, }
error_validate: - if (r) { - amdgpu_bo_list_for_each_entry(e, p->bo_list) { - dma_fence_chain_free(e->chain); - e->chain = NULL; - } + if (r) ttm_eu_backoff_reservation(&p->ticket, &p->validated); - } out: return r; } @@ -680,17 +667,9 @@ static void amdgpu_cs_parser_fini(struct amdgpu_cs_parser *parser, int error, { unsigned i;
- if (error && backoff) { - struct amdgpu_bo_list_entry *e; - - amdgpu_bo_list_for_each_entry(e, parser->bo_list) { - dma_fence_chain_free(e->chain); - e->chain = NULL; - } - + if (error && backoff) ttm_eu_backoff_reservation(&parser->ticket, &parser->validated); - }
for (i = 0; i < parser->num_post_deps; i++) { drm_syncobj_put(parser->post_deps[i].syncobj); @@ -1265,29 +1244,10 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm);
- amdgpu_bo_list_for_each_entry(e, p->bo_list) { - struct dma_resv *resv = e->tv.bo->base.resv; - struct dma_fence_chain *chain = e->chain; - struct dma_resv_iter cursor; - struct dma_fence *fence; - - if (!chain) - continue; - - /* - * Work around dma_resv shortcommings by wrapping up the - * submission in a dma_fence_chain and add it as exclusive - * fence. - */ - dma_resv_for_each_fence(&cursor, resv, - DMA_RESV_USAGE_WRITE, - fence) { - break; - } - dma_fence_chain_init(chain, fence, dma_fence_get(p->fence), 1); - dma_resv_add_fence(resv, &chain->base, DMA_RESV_USAGE_WRITE); - e->chain = NULL; - } + /* For now manually add the resulting fence as writer as well */ + amdgpu_bo_list_for_each_entry(e, p->bo_list) + dma_resv_add_fence(e->tv.bo->base.resv, p->fence, + DMA_RESV_USAGE_WRITE);
ttm_eu_fence_buffer_objects(&p->ticket, &p->validated, p->fence); mutex_unlock(&p->adev->notifier_lock);
This is now handled by the DMA-buf framework in the dma_resv obj.
Signed-off-by: Christian König christian.koenig@amd.com --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 13 ++++--- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 7 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c | 11 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 11 ++++-- drivers/gpu/drm/ttm/ttm_bo.c | 10 ++---- drivers/gpu/drm/ttm/ttm_bo_util.c | 7 ---- drivers/gpu/drm/ttm/ttm_bo_vm.c | 34 +++++++------------ drivers/gpu/drm/vmwgfx/vmwgfx_resource.c | 6 ---- include/drm/ttm/ttm_bo_api.h | 2 -- 9 files changed, 40 insertions(+), 61 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index bbfd7a1e42e8..7bd39e5d36dd 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -2330,6 +2330,8 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef) struct amdgpu_bo *bo = mem->bo; uint32_t domain = mem->domain; struct kfd_mem_attachment *attachment; + struct dma_resv_iter cursor; + struct dma_fence *fence;
total_size += amdgpu_bo_size(bo);
@@ -2344,10 +2346,13 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef) goto validate_map_fail; } } - ret = amdgpu_sync_fence(&sync_obj, bo->tbo.moving); - if (ret) { - pr_debug("Memory eviction: Sync BO fence failed. Try again\n"); - goto validate_map_fail; + dma_resv_for_each_fence(&cursor, bo->tbo.base.resv, + DMA_RESV_USAGE_KERNEL, fence) { + ret = amdgpu_sync_fence(&sync_obj, fence); + if (ret) { + pr_debug("Memory eviction: Sync BO fence failed. Try again\n"); + goto validate_map_fail; + } } list_for_each_entry(attachment, &mem->attachments, list) { if (!attachment->is_mapped) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index a40ede9bccd0..3881a503a7bf 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -608,9 +608,8 @@ int amdgpu_bo_create(struct amdgpu_device *adev, if (unlikely(r)) goto fail_unreserve;
- amdgpu_bo_fence(bo, fence, false); - dma_fence_put(bo->tbo.moving); - bo->tbo.moving = dma_fence_get(fence); + dma_resv_add_fence(bo->tbo.base.resv, fence, + DMA_RESV_USAGE_KERNEL); dma_fence_put(fence); } if (!bp->resv) @@ -1290,7 +1289,7 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
r = amdgpu_fill_buffer(abo, AMDGPU_POISON, bo->base.resv, &fence); if (!WARN_ON(r)) { - amdgpu_bo_fence(abo, fence, false); + dma_resv_add_fence(bo->base.resv, fence, DMA_RESV_USAGE_KERNEL); dma_fence_put(fence); }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c index e3fbf0f10add..31913ae86de6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c @@ -74,13 +74,12 @@ static int amdgpu_vm_cpu_update(struct amdgpu_vm_update_params *p, { unsigned int i; uint64_t value; - int r; + long r;
- if (vmbo->bo.tbo.moving) { - r = dma_fence_wait(vmbo->bo.tbo.moving, true); - if (r) - return r; - } + r = dma_resv_wait_timeout(vmbo->bo.tbo.base.resv, DMA_RESV_USAGE_KERNEL, + true, MAX_SCHEDULE_TIMEOUT); + if (r < 0) + return r;
pe += (unsigned long)amdgpu_bo_kptr(&vmbo->bo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c index dbb551762805..bdb44cee19d3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c @@ -204,14 +204,19 @@ static int amdgpu_vm_sdma_update(struct amdgpu_vm_update_params *p, struct amdgpu_bo *bo = &vmbo->bo; enum amdgpu_ib_pool_type pool = p->immediate ? AMDGPU_IB_POOL_IMMEDIATE : AMDGPU_IB_POOL_DELAYED; + struct dma_resv_iter cursor; unsigned int i, ndw, nptes; + struct dma_fence *fence; uint64_t *pte; int r;
/* Wait for PD/PT moves to be completed */ - r = amdgpu_sync_fence(&p->job->sync, bo->tbo.moving); - if (r) - return r; + dma_resv_for_each_fence(&cursor, bo->tbo.base.resv, + DMA_RESV_USAGE_KERNEL, fence) { + r = amdgpu_sync_fence(&p->job->sync, fence); + if (r) + return r; + }
do { ndw = p->num_dw_left; diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index d3527d3f7b18..7b9e0f46f121 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -468,7 +468,6 @@ static void ttm_bo_release(struct kref *kref) dma_resv_unlock(bo->base.resv);
atomic_dec(&ttm_glob.bo_count); - dma_fence_put(bo->moving); bo->destroy(bo); }
@@ -737,9 +736,8 @@ int ttm_mem_evict_first(struct ttm_device *bdev, }
/* - * Add the last move fence to the BO and reserve a new shared slot. We only use - * a shared slot to avoid unecessary sync and rely on the subsequent bo move to - * either stall or use an exclusive fence respectively set bo->moving. + * Add the last move fence to the BO as kernel dependency and reserve a new + * fence slot. */ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, struct ttm_resource_manager *man, @@ -769,9 +767,6 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, dma_fence_put(fence); return ret; } - - dma_fence_put(bo->moving); - bo->moving = fence; return 0; }
@@ -978,7 +973,6 @@ int ttm_bo_init_reserved(struct ttm_device *bdev, bo->bdev = bdev; bo->type = type; bo->page_alignment = page_alignment; - bo->moving = NULL; bo->pin_count = 0; bo->sg = sg; if (resv) { diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index b9cfb62c4b6e..95de2691ee7c 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -229,7 +229,6 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo, atomic_inc(&ttm_glob.bo_count); INIT_LIST_HEAD(&fbo->base.ddestroy); INIT_LIST_HEAD(&fbo->base.lru); - fbo->base.moving = NULL; drm_vma_node_reset(&fbo->base.base.vma_node);
kref_init(&fbo->base.kref); @@ -496,9 +495,6 @@ static int ttm_bo_move_to_ghost(struct ttm_buffer_object *bo, * operation has completed. */
- dma_fence_put(bo->moving); - bo->moving = dma_fence_get(fence); - ret = ttm_buffer_object_transfer(bo, &ghost_obj); if (ret) return ret; @@ -543,9 +539,6 @@ static void ttm_bo_move_pipeline_evict(struct ttm_buffer_object *bo, spin_unlock(&from->move_lock);
ttm_resource_free(bo, &bo->resource); - - dma_fence_put(bo->moving); - bo->moving = dma_fence_get(fence); }
int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo, diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c index 08ba083a80d2..5b324f245265 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c @@ -46,17 +46,13 @@ static vm_fault_t ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo, struct vm_fault *vmf) { - vm_fault_t ret = 0; - int err = 0; - - if (likely(!bo->moving)) - goto out_unlock; + long err = 0;
/* * Quick non-stalling check for idle. */ - if (dma_fence_is_signaled(bo->moving)) - goto out_clear; + if (dma_resv_test_signaled(bo->base.resv, DMA_RESV_USAGE_KERNEL)) + return 0;
/* * If possible, avoid waiting for GPU with mmap_lock @@ -64,34 +60,30 @@ static vm_fault_t ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo, * is the first attempt. */ if (fault_flag_allow_retry_first(vmf->flags)) { - ret = VM_FAULT_RETRY; if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT) - goto out_unlock; + return VM_FAULT_RETRY;
ttm_bo_get(bo); mmap_read_unlock(vmf->vma->vm_mm); - (void) dma_fence_wait(bo->moving, true); + (void)dma_resv_wait_timeout(bo->base.resv, + DMA_RESV_USAGE_KERNEL, true, + MAX_SCHEDULE_TIMEOUT); dma_resv_unlock(bo->base.resv); ttm_bo_put(bo); - goto out_unlock; + return VM_FAULT_RETRY; }
/* * Ordinary wait. */ - err = dma_fence_wait(bo->moving, true); - if (unlikely(err != 0)) { - ret = (err != -ERESTARTSYS) ? VM_FAULT_SIGBUS : + err = dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_KERNEL, true, + MAX_SCHEDULE_TIMEOUT); + if (unlikely(err < 0)) { + return (err != -ERESTARTSYS) ? VM_FAULT_SIGBUS : VM_FAULT_NOPAGE; - goto out_unlock; }
-out_clear: - dma_fence_put(bo->moving); - bo->moving = NULL; - -out_unlock: - return ret; + return 0; }
static unsigned long ttm_bo_io_mem_pfn(struct ttm_buffer_object *bo, diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c index 9e3dcbb573e7..40cc2c13e963 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c @@ -1166,12 +1166,6 @@ int vmw_resources_clean(struct vmw_buffer_object *vbo, pgoff_t start, *num_prefault = __KERNEL_DIV_ROUND_UP(last_cleaned - res_start, PAGE_SIZE); vmw_bo_fence_single(bo, NULL); - if (bo->moving) - dma_fence_put(bo->moving); - - return dma_resv_get_singleton(bo->base.resv, - DMA_RESV_USAGE_KERNEL, - &bo->moving); }
return 0; diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h index c17b2df9178b..4c7134550262 100644 --- a/include/drm/ttm/ttm_bo_api.h +++ b/include/drm/ttm/ttm_bo_api.h @@ -97,7 +97,6 @@ struct ttm_tt; * @lru: List head for the lru list. * @ddestroy: List head for the delayed destroy list. * @swap: List head for swap LRU list. - * @moving: Fence set when BO is moving * @offset: The current GPU offset, which can have different meanings * depending on the memory type. For SYSTEM type memory, it should be 0. * @cur_placement: Hint of current placement. @@ -150,7 +149,6 @@ struct ttm_buffer_object { * Members protected by a bo reservation. */
- struct dma_fence *moving; unsigned priority; unsigned pin_count;
Hi Daniel,
looks like this is going nowhere and you don't seem to have time to review.
What can we do?
Thanks, Christian.
Am 07.12.21 um 13:33 schrieb Christian König:
Hi Daniel,
just a gentle ping that you wanted to take a look at this.
Not much changed compared to the last version, only a minor bugfix in the dma_resv_get_singleton error handling.
Regards, Christian.
On Fri, Dec 17, 2021 at 03:39:52PM +0100, Christian König wrote:
Hi Daniel,
looks like this is going nowhere and you don't seem to have time to review.
What can we do?
cc more people, you didn't cc any of the driver folks :-)
Also I did find some time to review before I disappeared; I'm back on 10th Jan.
Cheers, Daniel
Thanks, Christian.
Am 07.12.21 um 13:33 schrieb Christian König:
Hi Daniel,
just a gentle ping that you wanted to take a look at this.
Not much changed compared to the last version, only a minor bugfix in the dma_resv_get_singleton error handling.
Regards, Christian.
Am 22.12.21 um 23:17 schrieb Daniel Vetter:
On Fri, Dec 17, 2021 at 03:39:52PM +0100, Christian König wrote:
Hi Daniel,
looks like this is going nowhere and you don't seem to have time to review.
What can we do?
cc more people, you didn't cc any of the driver folks :-)
Well I've CCed more people and lists on the first round of the patches. Just wanted to get some more comments from you first before widening the audience.
Also I did find some time to review before I disappeared; I'm back on 10th Jan.
Good, then I have at least something to do for the first week of January.
Happy holidays, Christian.
Cheers, Daniel
Thanks, Christian.
Am 07.12.21 um 13:33 schrieb Christian König:
Hi Daniel,
just a gentle ping that you wanted to take a look at this.
Not much changed compared to the last version, only a minor bugfix in the dma_resv_get_singleton error handling.
Regards, Christian.
On Thu, Dec 23, 2021 at 10:11:20AM +0100, Christian König wrote:
Am 22.12.21 um 23:17 schrieb Daniel Vetter:
On Fri, Dec 17, 2021 at 03:39:52PM +0100, Christian König wrote:
Hi Daniel,
looks like this is going nowhere and you don't seem to have time to review.
What can we do?
cc more people, you didn't cc any of the driver folks :-)
Well I've CCed more people and lists on the first round of the patches. Just wanted to get some more comments from you first before widening the audience.
Ime it's good to just always spam driver authors on big stuff like this; it increases the odds that more folks get involved. And in the end we need the entire subsystem to understand this (or at least not accidentally break the rules you roll out now, like we've done in the past).
Plus you'll get the driver acks faster that way :-) -Daniel
Also I did find some time to review before I disappeared; I'm back on 10th Jan.
Good, then I have at least something to do for the first week of January.
Happy holidays, Christian.
Cheers, Daniel
Thanks, Christian.
Am 07.12.21 um 13:33 schrieb Christian König:
Hi Daniel,
just a gentle ping that you wanted to take a look at this.
Not much changed compared to the last version, only a minor bugfix in the dma_resv_get_singleton error handling.
Regards, Christian.
linaro-mm-sig@lists.linaro.org