Linaro-mm-sig

linaro-mm-sig@lists.linaro.org


Re: [PATCH v7 00/11] vfio/pci: Allow MMIO regions to be exported through dma-buf

by Jason Gunthorpe

On Mon, Nov 17, 2025 at 08:36:20AM -0700, Alex Williamson wrote:
> On Tue, 11 Nov 2025 09:54:22 +0100
> Christian König <christian.koenig(a)amd.com> wrote:
>
> > On 11/10/25 21:42, Alex Williamson wrote:
> > > On Thu, 6 Nov 2025 16:16:45 +0200
> > > Leon Romanovsky <leon(a)kernel.org> wrote:
> > >
> > >> Changelog:
> > >> v7:
> > >> * Dropped restore_revoke flag and added vfio_pci_dma_buf_move
> > >> to reverse loop.
> > >> * Fixed spelling errors in documentation patch.
> > >> * Rebased on top of v6.18-rc3.
> > >> * Added include to stddef.h to vfio.h, to keep uapi header file independent.
> > >
> > > I think we're winding down on review comments. It'd be great to get
> > > p2pdma and dma-buf acks on this series. Otherwise it's been posted
> > > enough that we'll assume no objections. Thanks,
> >
> > Already have it on my TODO list to take a closer look, but no idea when that will be.
> >
> > This patch set is on place 4 or 5 on a rather long list of stuff to review/finish.
>
> Hi Christian,
>
> Gentle nudge. Leon posted v8[1] last week, which is not drawing any
> new comments. Do you foresee having time for review that I should
> still hold off merging for v6.19 a bit longer? Thanks,

I really want this merged this cycle, along with the iommufd part, which means it needs to go into your tree by very early next week on a shared branch so I can do the iommufd part on top. It is the last blocking kernel piece needed to complete the vIOMMU support rollout into QEMU for iommufd, which quite a lot of people have been working on for years now.

IMHO there is nothing profound in the dmabuf patch: it was written by the expert on the new DMA API operation and doesn't create any troublesome API contracts. It is also essentially the same code as v1 from July, just moved into dmabuf .c files instead of vfio .c files at Christoph's request.

My hope is that the DRM folks will pick up the baton and continue to improve this to move other drivers away from dma_map_resource(). Simona told me people have wanted DMA API improvements for ages; now we have them, and now is the time!

Any remarks after the fact can be addressed incrementally.

If there are no concrete technical remarks, please take it. Six months is long enough to wait for feedback.

Thanks,
Jason


Re: [PATCH 0/9] Initial DMABUF support for iommufd

by Jason Gunthorpe

On Thu, Nov 13, 2025 at 11:37:12AM -0700, Alex Williamson wrote:
> > The latest series for interconnect negotiation to exchange a phys_addr is:
> > https://lore.kernel.org/r/20251027044712.1676175-1-vivek.kasireddy@intel.com
>
> If this is in development, why are we pursuing a vfio specific
> temporary "private interconnect" here rather than building on that
> work? What are the gaps/barriers/timeline?

I don't expect to see agreement on the above for probably half a year, and I see no reason to hold this up for it. Many people are asking for this P2P support to be completed in iommufd.

Further, I think the above will be easier to work on once this is merged, as an example that consumes it in a different way. Right now it is too theoretical, IMHO.

> I don't see any uAPI changes here, is there any visibility to userspace
> whether IOMMUFD supports this feature or is it simply a try and fail
> approach?

So far we haven't done discoverability beyond try and fail. I'd be happy if the userspace folks doing libvirt or whatever came up with some requests/patches for discoverability. It is not just this feature but also things like nesting, IOMMU driver support and so on.

> The latter makes it difficult for management tools to select
> whether to choose a VM configuration based on IOMMUFD or legacy vfio if
> p2p DMA is a requirement. Thanks,

In a lot of cases it isn't really a choice, as you need iommufd to do an accelerated vIOMMU. But yes, it would be nice to eventually use iommufd automatically whenever possible.

Thanks,
Jason
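The "try and fail" discovery Jason describes can be sketched from a management tool's point of view. This is a minimal userspace sketch under stated assumptions: the probe callback, the backend enum, and treating `-EOPNOTSUPP`/`-ENOTTY` as "not supported" are all hypothetical and are not part of the iommufd uAPI; a real probe would issue an actual iommufd ioctl. The weakness Alex points at shows up directly: an unexpected errno such as `-EINVAL` cannot be told apart from a genuine failure, so the sketch conservatively falls back to legacy vfio when P2P is required but not confirmed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe: attempt the operation, return 0 or a negative errno. */
typedef int (*probe_fn)(void *ctx);

enum probe_result { PROBE_SUPPORTED, PROBE_UNSUPPORTED, PROBE_ERROR };
enum backend { BACKEND_LEGACY_VFIO, BACKEND_IOMMUFD };

/* Try-and-fail discovery: the error code is the only signal available. */
static enum probe_result probe_feature(probe_fn attempt, void *ctx)
{
    int r = attempt(ctx);

    if (r == 0)
        return PROBE_SUPPORTED;
    /* Assumption: these errnos mean "the kernel lacks the feature". */
    if (r == -EOPNOTSUPP || r == -ENOTTY)
        return PROBE_UNSUPPORTED;
    /* Anything else (e.g. -EINVAL) is ambiguous. */
    return PROBE_ERROR;
}

/* Prefer iommufd; fall back only when P2P is required and not confirmed. */
static enum backend choose_backend(bool need_p2p, probe_fn p2p_probe, void *ctx)
{
    if (!need_p2p)
        return BACKEND_IOMMUFD;
    return probe_feature(p2p_probe, ctx) == PROBE_SUPPORTED ?
           BACKEND_IOMMUFD : BACKEND_LEGACY_VFIO;
}

/* Stand-in probes for illustration only. */
static int fake_probe_supported(void *ctx) { (void)ctx; return 0; }
static int fake_probe_unsupported(void *ctx) { (void)ctx; return -EOPNOTSUPP; }
static int fake_probe_ambiguous(void *ctx) { (void)ctx; return -EINVAL; }
```

A discoverability uAPI of the kind Jason invites would replace `probe_feature()` with an explicit capability query, removing the ambiguous `PROBE_ERROR` case.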


Re: [PATCH 08/18] drm/sched: use inline locks for the drm-sched-fence

by Christian König

On 11/13/25 17:23, Philipp Stanner wrote:
> On Thu, 2025-11-13 at 15:51 +0100, Christian König wrote:
>> Using the inline lock is now the recommended way for dma_fence implementations.
>>
>> So use this approach for the scheduler fences as well just in case if
>> anybody uses this as blueprint for its own implementation.
>>
>> Also saves about 4 bytes for the external spinlock.
>
> So you changed your mind and want to keep this patch?

Actually, it was you who changed my mind.

If we want to document that using the internal lock is now the norm and that all implementations should switch to it where possible, we should push as much as possible for using it in the common driver code as well.

Regards,
Christian.

>
> P.
>
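The inline-lock pattern can be illustrated with a toy userspace model. This is a sketch, not the real `struct dma_fence` layout: `toy_fence`, `toy_spinlock` and the helpers are hypothetical, and a C11 atomic stands in for a kernel spinlock. The idea shown is that when the caller supplies no external lock, the fence uses a lock embedded in itself, so it no longer depends on separately allocated state owned by the issuer. For simplicity the toy always embeds the lock, so it does not reproduce the memory saving mentioned in the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy spinlock built on a C11 atomic (stand-in for spinlock_t). */
typedef struct { atomic_int held; } toy_spinlock;

static void toy_spin_init(toy_spinlock *l) { atomic_init(&l->held, 0); }

static void toy_spin_lock(toy_spinlock *l)
{
    /* Spin until we are the one who swapped 0 -> 1. */
    while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire))
        ;
}

static void toy_spin_unlock(toy_spinlock *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}

struct toy_fence {
    toy_spinlock *lock;       /* lock actually used for state changes */
    toy_spinlock inline_lock; /* embedded lock, used when none is supplied */
    int signaled;
};

/* external == NULL selects the self-contained (inline) lock. */
static void toy_fence_init(struct toy_fence *f, toy_spinlock *external)
{
    toy_spin_init(&f->inline_lock);
    f->lock = external ? external : &f->inline_lock;
    f->signaled = 0;
}

static void toy_fence_signal(struct toy_fence *f)
{
    toy_spin_lock(f->lock);
    f->signaled = 1;
    toy_spin_unlock(f->lock);
}
```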


Re: Independence for dma_fences! v3

by Christian König

On 11/13/25 17:20, Philipp Stanner wrote:
> On Thu, 2025-11-13 at 15:51 +0100, Christian König wrote:
>> Hi everyone,
>>
>> dma_fences have ever lived under the tyranny dictated by the module
>> lifetime of their issuer, leading to crashes should anybody still holding
>> a reference to a dma_fence when the module of the issuer was unloaded.
>>
>> The basic problem is that when buffer are shared between drivers
>> dma_fence objects can leak into external drivers and stay there even
>> after they are signaled. The dma_resv object for example only lazy releases
>> dma_fences.
>>
>> So what happens is that when the module who originally created the dma_fence
>> unloads the dma_fence_ops function table becomes unavailable as well and so
>> any attempt to release the fence crashes the system.
>>
>> Previously various approaches have been discussed, including changing the
>> locking semantics of the dma_fence callbacks (by me) as well as using the
>> drm scheduler as intermediate layer (by Sima) to disconnect dma_fences
>> from their actual users, but none of them are actually solving all problems.
>>
>> Tvrtko did some really nice prerequisite work by protecting the returned
>> strings of the dma_fence_ops by RCU. This way dma_fence creators were
>> able to just wait for an RCU grace period after fence signaling before
>> they could be safe to free those data structures.
>>
>> Now this patch set here goes a step further and protects the whole
>> dma_fence_ops structure by RCU, so that after the fence signals the
>> pointer to the dma_fence_ops is set to NULL when there is no wait nor
>> release callback given. All functionality which use the dma_fence_ops
>> reference are put inside an RCU critical section, except for the
>> deprecated issuer specific wait and of course the optional release
>> callback.
>>
>> Additional to the RCU changes the lock protecting the dma_fence state
>> previously had to be allocated external. This set here now changes the
>> functionality to make that external lock optional and allows dma_fences
>> to use an inline lock and be self contained.
>>
>> This patch set addressed all previous code review comments and is based
>> on drm-tip, includes my changes for amdgpu as well as Mathew's patches for XE.
>>
>> Going to push the core DMA-buf changes to drm-misc-next as soon as I get
>> the appropriate rb. The driver specific changes can go upstream through
>> the driver channels as necessary.
>
> No changelog? :(

On the cover letter? For dma-buf patches we usually put the changelog on the individual patches.

Christian.

>
> P.
>
>>
>> Please review and comment,
>> Christian.
>>
>>
>
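The ops-detaching idea from the cover letter can be modelled in userspace. This is a minimal sketch under stated assumptions: C11 atomics stand in for RCU (`rcu_dereference()`/`rcu_assign_pointer()`), and the grace-period wait that must happen before the issuing module frees its ops table is not modelled at all. All names, including the `"detached"` placeholder string, are hypothetical. A fence whose ops have neither a `wait` nor a `release` callback drops its ops pointer when it signals, and readers take a single snapshot of the pointer before using it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct toy_fence;

struct toy_fence_ops {
    const char *(*get_driver_name)(struct toy_fence *f);
    void (*wait)(struct toy_fence *f);    /* optional issuer-specific wait */
    void (*release)(struct toy_fence *f); /* optional custom release */
};

struct toy_fence {
    _Atomic(const struct toy_fence_ops *) ops; /* NULL once detached */
    atomic_bool signaled;
};

static void toy_fence_init(struct toy_fence *f, const struct toy_fence_ops *ops)
{
    atomic_init(&f->ops, ops);
    atomic_init(&f->signaled, false);
}

static void toy_fence_signal(struct toy_fence *f)
{
    const struct toy_fence_ops *ops = atomic_load(&f->ops);

    atomic_store(&f->signaled, true);
    /* Nothing needs the issuer's table after signaling: drop it. */
    if (ops && !ops->wait && !ops->release)
        atomic_store_explicit(&f->ops, NULL, memory_order_release);
}

static const char *toy_fence_driver_name(struct toy_fence *f)
{
    /* Stand-in for rcu_read_lock() + rcu_dereference(): one snapshot. */
    const struct toy_fence_ops *ops =
        atomic_load_explicit(&f->ops, memory_order_acquire);

    return ops ? ops->get_driver_name(f) : "detached";
}

/* Example issuer tables, for illustration only. */
static const char *demo_name(struct toy_fence *f) { (void)f; return "demo-driver"; }
static void demo_release(struct toy_fence *f) { (void)f; }
static const struct toy_fence_ops demo_ops = { .get_driver_name = demo_name };
static const struct toy_fence_ops demo_ops_with_release = {
    .get_driver_name = demo_name, .release = demo_release,
};
```

A fence with a custom `release` keeps its ops pointer, which matches the cover letter's exception for the optional release callback.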


Re: [PATCH v2 18/20] drm/amdgpu: rename amdgpu_fill_buffer as amdgpu_ttm_clear_buffer

by Christian König

On 11/13/25 17:05, Pierre-Eric Pelloux-Prayer wrote:
> This is the only use case for this function.
>
> ---
> v2: amdgpu_ttm_clear_buffer instead of amdgpu_clear_buffer
> ---
>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com>

Reviewed-by: Christian König <christian.koenig(a)amd.com>

> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 8 +++----
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 26 ++++++++++------------
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 15 ++++++-------
> 3 files changed, 23 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 4490b19752b8..4b9518097899 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -725,8 +725,8 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
> bo->tbo.resource->mem_type == TTM_PL_VRAM) {
> struct dma_fence *fence;
>
> - r = amdgpu_fill_buffer(NULL, bo, 0, NULL, &fence, NULL,
> - true, AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
> + r = amdgpu_ttm_clear_buffer(NULL, bo, NULL, &fence, NULL,
> + true, AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER);
> if (unlikely(r))
> goto fail_unreserve;
>
> @@ -1324,8 +1324,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
> if (r)
> goto out;
>
> - r = amdgpu_fill_buffer(NULL, abo, 0, &bo->base._resv, &fence, NULL,
> - false, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
> + r = amdgpu_ttm_clear_buffer(NULL, abo, &bo->base._resv, &fence, NULL,
> + false, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
> if (WARN_ON(r))
> goto out;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index df05768c3817..0a55bc4ea91f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -433,9 +433,9 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
> (abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) {
> struct dma_fence *wipe_fence = NULL;
>
> - r = amdgpu_fill_buffer(entity,
> - abo, 0, NULL, &wipe_fence, fence,
> - false, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
> + r = amdgpu_ttm_clear_buffer(entity,
> + abo, NULL, &wipe_fence, fence,
> + false, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
> if (r) {
> goto error;
> } else if (wipe_fence) {
> @@ -2418,11 +2418,10 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring,
> }
>
> /**
> - * amdgpu_fill_buffer - fill a buffer with a given value
> + * amdgpu_ttm_clear_buffer - fill a buffer with 0
> * @entity: optional entity to use. If NULL, the clearing entities will be
> * used to load-balance the partial clears
> * @bo: the bo to fill
> - * @src_data: the value to set
> * @resv: fences contained in this reservation will be used as dependencies.
> * @out_fence: the fence from the last clear will be stored here. It might be
> * NULL if no job was run.
> @@ -2432,14 +2431,13 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring,
> * @k_job_id: trace id
> *
> */
> -int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
> - struct amdgpu_bo *bo,
> - uint32_t src_data,
> - struct dma_resv *resv,
> - struct dma_fence **out_fence,
> - struct dma_fence *dependency,
> - bool consider_clear_status,
> - u64 k_job_id)
> +int amdgpu_ttm_clear_buffer(struct amdgpu_ttm_buffer_entity *entity,
> + struct amdgpu_bo *bo,
> + struct dma_resv *resv,
> + struct dma_fence **out_fence,
> + struct dma_fence *dependency,
> + bool consider_clear_status,
> + u64 k_job_id)
> {
> struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
> struct dma_fence *fence = NULL;
> @@ -2486,7 +2484,7 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
> goto error;
>
> r = amdgpu_ttm_fill_mem(ring, &entity->base,
> - src_data, to, cur_size, resv,
> + 0, to, cur_size, resv,
> &next, true, k_job_id);
> if (r)
> goto error;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> index e01c2173d79f..585aee9a173b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> @@ -181,14 +181,13 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring,
> struct dma_resv *resv,
> struct dma_fence **fence,
> bool vm_needs_flush, uint32_t copy_flags);
> -int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
> - struct amdgpu_bo *bo,
> - uint32_t src_data,
> - struct dma_resv *resv,
> - struct dma_fence **out_fence,
> - struct dma_fence *dependency,
> - bool consider_clear_status,
> - u64 k_job_id);
> +int amdgpu_ttm_clear_buffer(struct amdgpu_ttm_buffer_entity *entity,
> + struct amdgpu_bo *bo,
> + struct dma_resv *resv,
> + struct dma_fence **out_fence,
> + struct dma_fence *dependency,
> + bool consider_clear_status,
> + u64 k_job_id);
>
> int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);
> void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo);
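With the rename to `amdgpu_ttm_clear_buffer()`, the fill value passed down to `amdgpu_ttm_fill_mem()` becomes a constant 0. The chunked clearing loop can be sketched in plain C under these assumptions: the range is treated as one contiguous span (the real code walks an `amdgpu_res_cursor` over possibly discontiguous VRAM), each chunk is capped at 256 MiB following the amdgpu comment "Never fill more than 256MiB at once to avoid timeouts", and `fill_chunk_fn`, `clear_range` and `record_chunk` are hypothetical stand-ins for submitting one fill job.

```c
#include <assert.h>
#include <stdint.h>

#define CLEAR_CHUNK_MAX (256ULL << 20) /* 256 MiB cap per job */

/* Hypothetical callback standing in for one GPU fill job. */
typedef int (*fill_chunk_fn)(uint64_t offset, uint64_t size,
                             uint32_t value, void *ctx);

/* Clear [0, total) in chunks no larger than CLEAR_CHUNK_MAX. */
static int clear_range(uint64_t total, fill_chunk_fn fill, void *ctx)
{
    uint64_t off = 0;

    while (off < total) {
        uint64_t left = total - off;
        uint64_t cur = left < CLEAR_CHUNK_MAX ? left : CLEAR_CHUNK_MAX;
        /* Value is always 0: this is a clear, not a generic fill. */
        int r = fill(off, cur, 0, ctx);

        if (r)
            return r;
        off += cur;
    }
    return 0;
}

/* Example callback that records what would have been submitted. */
struct chunk_log { unsigned count; uint64_t last_size; uint64_t bytes; };

static int record_chunk(uint64_t offset, uint64_t size, uint32_t value, void *ctx)
{
    struct chunk_log *log = ctx;

    (void)offset;
    assert(value == 0);
    log->count++;
    log->last_size = size;
    log->bytes += size;
    return 0;
}
```

For a 600 MiB buffer this submits three jobs of 256, 256 and 88 MiB.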


Re: [PATCH v2 10/20] drm/admgpu: handle resv dependencies in amdgpu_ttm_map_buffer

by Christian König

On 11/13/25 17:05, Pierre-Eric Pelloux-Prayer wrote:
> If a resv object is passed, its fences are treated as a dependency
> for the amdgpu_ttm_map_buffer operation.
>
> This will be used by amdgpu_bo_release_notify through
> amdgpu_fill_buffer.

Why should updating the GART window depend on fences in a resv object?

Regards,
Christian.

>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 16 +++++++++++-----
> 1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index b13f0993dbf1..411997db70eb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -184,7 +184,8 @@ static int amdgpu_ttm_map_buffer(struct drm_sched_entity *entity,
> struct amdgpu_res_cursor *mm_cur,
> unsigned int window, struct amdgpu_ring *ring,
> bool tmz, uint64_t *size, uint64_t *addr,
> - struct dma_fence *dep)
> + struct dma_fence *dep,
> + struct dma_resv *resv)
> {
> struct amdgpu_device *adev = ring->adev;
> unsigned int offset, num_pages, num_dw, num_bytes;
> @@ -239,6 +240,10 @@ static int amdgpu_ttm_map_buffer(struct drm_sched_entity *entity,
> if (dep)
> drm_sched_job_add_dependency(&job->base, dma_fence_get(dep));
>
> + if (resv)
> + drm_sched_job_add_resv_dependencies(&job->base, resv,
> + DMA_RESV_USAGE_BOOKKEEP);
> +
> src_addr = num_dw * 4;
> src_addr += job->ibs[0].gpu_addr;
>
> @@ -332,14 +337,14 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
> r = amdgpu_ttm_map_buffer(&entity->base,
> src->bo, src->mem, &src_mm,
> entity->gart_window_id0, ring, tmz, &cur_size, &from,
> - NULL);
> + NULL, NULL);
> if (r)
> goto error;
>
> r = amdgpu_ttm_map_buffer(&entity->base,
> dst->bo, dst->mem, &dst_mm,
> entity->gart_window_id1, ring, tmz, &cur_size, &to,
> - NULL);
> + NULL, NULL);
> if (r)
> goto error;
>
> @@ -2451,7 +2456,7 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
> r = amdgpu_ttm_map_buffer(&entity->base,
> &bo->tbo, bo->tbo.resource, &cursor,
> entity->gart_window_id1, ring, false, &size, &addr,
> - NULL);
> + NULL, NULL);
> if (r)
> goto err;
>
> @@ -2506,7 +2511,8 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
> &bo->tbo, bo->tbo.resource, &dst,
> entity->gart_window_id1, ring, false,
> &cur_size, &to,
> - dependency);
> + dependency,
> + resv);
> if (r)
> goto error;
>
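Christian's question is easier to weigh with the `dma_resv` usage rule in mind: asking a reservation object for fences at a given `enum dma_resv_usage` level also returns the fences of every lower level, and `DMA_RESV_USAGE_BOOKKEEP` is the highest level, so the patch as written would make the GART window update wait for every fence in the object. The toy model below only illustrates that filtering rule; the types and helpers are hypothetical, and fences are reduced to integer ids.

```c
#include <assert.h>
#include <stddef.h>

/* Same ordering as the kernel's enum dma_resv_usage: a query at one
 * level also returns the fences of all lower levels. */
enum toy_resv_usage { USAGE_KERNEL, USAGE_WRITE, USAGE_READ, USAGE_BOOKKEEP };

struct toy_resv_entry { int fence_id; enum toy_resv_usage usage; };

struct toy_job { int deps[16]; size_t ndeps; };

static void job_add_dep(struct toy_job *job, int fence_id)
{
    assert(job->ndeps < sizeof(job->deps) / sizeof(job->deps[0]));
    job->deps[job->ndeps++] = fence_id;
}

/* Hypothetical analogue of adding all resv fences up to 'usage'. */
static void job_add_resv_deps(struct toy_job *job,
                              const struct toy_resv_entry *resv, size_t n,
                              enum toy_resv_usage usage)
{
    for (size_t i = 0; i < n; i++)
        if (resv[i].usage <= usage)
            job_add_dep(job, resv[i].fence_id);
}
```

Querying at `USAGE_BOOKKEEP` picks up all four fences in the example below, while `USAGE_WRITE` picks up only the kernel and write fences.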


Re: [PATCH v2 09/20] drm/amdgpu: pass optional dependency to amdgpu_fill_buffer

by Christian König

On 11/13/25 17:05, Pierre-Eric Pelloux-Prayer wrote:
> In case the fill job depends on a previous fence, the caller can
> now pass it to make sure the ordering of the jobs is correct.

I don't think you need that patch anymore.

>
> Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 22 ++++++++++++++++------
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 1 +
> 3 files changed, 18 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index e7b2cae031b3..be3532134e46 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -1322,7 +1322,7 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo)
> goto out;
>
> r = amdgpu_fill_buffer(&adev->mman.clear_entities[0], abo, 0, &bo->base._resv,
> - &fence, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
> + &fence, NULL, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE);
> if (WARN_ON(r))
> goto out;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index e1f0567fd2d5..b13f0993dbf1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -173,6 +173,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
> * @tmz: if we should setup a TMZ enabled mapping
> * @size: in number of bytes to map, out number of bytes mapped
> * @addr: resulting address inside the MC address space
> + * @dep: optional dependency
> *
> * Setup one of the GART windows to access a specific piece of memory or return
> * the physical address for local memory.
> @@ -182,7 +183,8 @@ static int amdgpu_ttm_map_buffer(struct drm_sched_entity *entity,
> struct ttm_resource *mem,
> struct amdgpu_res_cursor *mm_cur,
> unsigned int window, struct amdgpu_ring *ring,
> - bool tmz, uint64_t *size, uint64_t *addr)
> + bool tmz, uint64_t *size, uint64_t *addr,
> + struct dma_fence *dep)
> {
> struct amdgpu_device *adev = ring->adev;
> unsigned int offset, num_pages, num_dw, num_bytes;
> @@ -234,6 +236,9 @@ static int amdgpu_ttm_map_buffer(struct drm_sched_entity *entity,
> if (r)
> return r;
>
> + if (dep)
> + drm_sched_job_add_dependency(&job->base, dma_fence_get(dep));
> +
> src_addr = num_dw * 4;
> src_addr += job->ibs[0].gpu_addr;
>
> @@ -326,13 +331,15 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
> /* Map src to window 0 and dst to window 1. */
> r = amdgpu_ttm_map_buffer(&entity->base,
> src->bo, src->mem, &src_mm,
> - entity->gart_window_id0, ring, tmz, &cur_size, &from);
> + entity->gart_window_id0, ring, tmz, &cur_size, &from,
> + NULL);
> if (r)
> goto error;
>
> r = amdgpu_ttm_map_buffer(&entity->base,
> dst->bo, dst->mem, &dst_mm,
> - entity->gart_window_id1, ring, tmz, &cur_size, &to);
> + entity->gart_window_id1, ring, tmz, &cur_size, &to,
> + NULL);
> if (r)
> goto error;
>
> @@ -415,7 +422,7 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo,
> struct dma_fence *wipe_fence = NULL;
>
> r = amdgpu_fill_buffer(&adev->mman.move_entities[0],
> - abo, 0, NULL, &wipe_fence,
> + abo, 0, NULL, &wipe_fence, fence,
> AMDGPU_KERNEL_JOB_ID_MOVE_BLIT);
> if (r) {
> goto error;
> @@ -2443,7 +2450,8 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo,
>
> r = amdgpu_ttm_map_buffer(&entity->base,
> &bo->tbo, bo->tbo.resource, &cursor,
> - entity->gart_window_id1, ring, false, &size, &addr);
> + entity->gart_window_id1, ring, false, &size, &addr,
> + NULL);
> if (r)
> goto err;
>
> @@ -2469,6 +2477,7 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
> uint32_t src_data,
> struct dma_resv *resv,
> struct dma_fence **f,
> + struct dma_fence *dependency,
> u64 k_job_id)
> {
> struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
> @@ -2496,7 +2505,8 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
> r = amdgpu_ttm_map_buffer(&entity->base,
> &bo->tbo, bo->tbo.resource, &dst,
> entity->gart_window_id1, ring, false,
> - &cur_size, &to);
> + &cur_size, &to,
> + dependency);
> if (r)
> goto error;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> index 9d4891e86675..e8f8165f5bcf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> @@ -186,6 +186,7 @@ int amdgpu_fill_buffer(struct amdgpu_ttm_buffer_entity *entity,
> uint32_t src_data,
> struct dma_resv *resv,
> struct dma_fence **f,
> + struct dma_fence *dependency,
> u64 k_job_id);
>
> int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);
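The ordering guarantee the optional `dependency` parameter is meant to provide (the wipe job in `amdgpu_move_blit()` receiving the move `fence`) can be shown with a toy scheduler pass. Everything here is hypothetical and far simpler than `drm_sched`: a job is runnable only once its optional dependency fence has signaled, so a wipe job that depends on the move fence cannot run before the move, whatever its position in the queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_fence { bool signaled; };

struct toy_job {
    const struct toy_fence *dep; /* optional: NULL means no ordering constraint */
    struct toy_fence done;       /* signaled when the job has run */
};

static bool job_ready(const struct toy_job *job)
{
    return job->dep == NULL || job->dep->signaled;
}

/* One scheduler pass: run every pending job whose dependency is met.
 * Returns how many jobs ran. Not a model of drm_sched internals. */
static size_t run_ready(struct toy_job **jobs, size_t n)
{
    size_t ran = 0;

    for (size_t i = 0; i < n; i++) {
        if (!jobs[i]->done.signaled && job_ready(jobs[i])) {
            jobs[i]->done.signaled = true;
            ran++;
        }
    }
    return ran;
}
```

Queuing the wipe job ahead of the move job, the first pass runs only the move and the second pass runs the wipe.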


Re: [PATCH v2 05/20] drm/amdgpu: pass the entity to use to ttm functions

by Felix Kuehling

On 2025-11-13 11:05, Pierre-Eric Pelloux-Prayer wrote: > This way the caller can select the one it wants to use. > > Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com> I agree with Christian's comment to eliminate the ring parameter where it's implied by the entity. Other than that, the patch is Acked-by: Felix Kuehling <felix.kuehling(a)amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c | 3 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 4 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 75 +++++++++++-------- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 16 ++-- > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 +- > 5 files changed, 60 insertions(+), 41 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c > index 02c2479a8840..b59040a8771f 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c > @@ -38,7 +38,8 @@ static int amdgpu_benchmark_do_move(struct amdgpu_device *adev, unsigned size, > stime = ktime_get(); > for (i = 0; i < n; i++) { > struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring; > - r = amdgpu_copy_buffer(ring, saddr, daddr, size, NULL, &fence, > + r = amdgpu_copy_buffer(ring, &adev->mman.default_entity.base, > + saddr, daddr, size, NULL, &fence, > false, 0); > if (r) > goto exit_do_move; > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > index e08f58de4b17..c06c132a753c 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > @@ -1321,8 +1321,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo) > if (r) > goto out; > > - r = amdgpu_fill_buffer(abo, 0, &bo->base._resv, &fence, true, > - AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE); > + r = amdgpu_fill_buffer(&adev->mman.clear_entity, abo, 0, &bo->base._resv, > + &fence, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE); > if 
(WARN_ON(r)) > goto out; > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > index 42d448cd6a6d..c8d59ca2b3bd 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > @@ -164,6 +164,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, > > /** > * amdgpu_ttm_map_buffer - Map memory into the GART windows > + * @entity: entity to run the window setup job > * @bo: buffer object to map > * @mem: memory object to map > * @mm_cur: range to map > @@ -176,7 +177,8 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, > * Setup one of the GART windows to access a specific piece of memory or return > * the physical address for local memory. > */ > -static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo, > +static int amdgpu_ttm_map_buffer(struct drm_sched_entity *entity, > + struct ttm_buffer_object *bo, > struct ttm_resource *mem, > struct amdgpu_res_cursor *mm_cur, > unsigned int window, struct amdgpu_ring *ring, > @@ -224,7 +226,7 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo, > num_dw = ALIGN(adev->mman.buffer_funcs->copy_num_dw, 8); > num_bytes = num_pages * 8 * AMDGPU_GPU_PAGES_IN_CPU_PAGE; > > - r = amdgpu_job_alloc_with_ib(adev, &adev->mman.default_entity.base, > + r = amdgpu_job_alloc_with_ib(adev, entity, > AMDGPU_FENCE_OWNER_UNDEFINED, > num_dw * 4 + num_bytes, > AMDGPU_IB_POOL_DELAYED, &job, > @@ -274,6 +276,7 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo, > /** > * amdgpu_ttm_copy_mem_to_mem - Helper function for copy > * @adev: amdgpu device > + * @entity: entity to run the jobs > * @src: buffer/address where to read from > * @dst: buffer/address where to write to > * @size: number of bytes to copy > @@ -288,6 +291,7 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo, > */ > __attribute__((nonnull)) > static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev, > + struct 
drm_sched_entity *entity, > const struct amdgpu_copy_mem *src, > const struct amdgpu_copy_mem *dst, > uint64_t size, bool tmz, > @@ -320,12 +324,14 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev, > cur_size = min3(src_mm.size, dst_mm.size, 256ULL << 20); > > /* Map src to window 0 and dst to window 1. */ > - r = amdgpu_ttm_map_buffer(src->bo, src->mem, &src_mm, > + r = amdgpu_ttm_map_buffer(entity, > + src->bo, src->mem, &src_mm, > 0, ring, tmz, &cur_size, &from); > if (r) > goto error; > > - r = amdgpu_ttm_map_buffer(dst->bo, dst->mem, &dst_mm, > + r = amdgpu_ttm_map_buffer(entity, > + dst->bo, dst->mem, &dst_mm, > 1, ring, tmz, &cur_size, &to); > if (r) > goto error; > @@ -353,7 +359,7 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev, > write_compress_disable)); > } > > - r = amdgpu_copy_buffer(ring, from, to, cur_size, resv, > + r = amdgpu_copy_buffer(ring, entity, from, to, cur_size, resv, > &next, true, copy_flags); > if (r) > goto error; > @@ -394,7 +400,9 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo, > src.offset = 0; > dst.offset = 0; > > - r = amdgpu_ttm_copy_mem_to_mem(adev, &src, &dst, > + r = amdgpu_ttm_copy_mem_to_mem(adev, > + &adev->mman.move_entity.base, > + &src, &dst, > new_mem->size, > amdgpu_bo_encrypted(abo), > bo->base.resv, &fence); > @@ -406,8 +414,9 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo, > (abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) { > struct dma_fence *wipe_fence = NULL; > > - r = amdgpu_fill_buffer(abo, 0, NULL, &wipe_fence, > - false, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT); > + r = amdgpu_fill_buffer(&adev->mman.move_entity, > + abo, 0, NULL, &wipe_fence, > + AMDGPU_KERNEL_JOB_ID_MOVE_BLIT); > if (r) { > goto error; > } else if (wipe_fence) { > @@ -2223,16 +2232,15 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable) > } > > static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev, > + struct drm_sched_entity *entity, > 
unsigned int num_dw, > struct dma_resv *resv, > bool vm_needs_flush, > struct amdgpu_job **job, > - bool delayed, u64 k_job_id) > + u64 k_job_id) > { > enum amdgpu_ib_pool_type pool = AMDGPU_IB_POOL_DELAYED; > int r; > - struct drm_sched_entity *entity = delayed ? &adev->mman.clear_entity.base : > - &adev->mman.move_entity.base; > r = amdgpu_job_alloc_with_ib(adev, entity, > AMDGPU_FENCE_OWNER_UNDEFINED, > num_dw * 4, pool, job, k_job_id); > @@ -2252,7 +2260,9 @@ static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev, > DMA_RESV_USAGE_BOOKKEEP); > } > > -int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset, > +int amdgpu_copy_buffer(struct amdgpu_ring *ring, > + struct drm_sched_entity *entity, > + uint64_t src_offset, > uint64_t dst_offset, uint32_t byte_count, > struct dma_resv *resv, > struct dma_fence **fence, > @@ -2274,8 +2284,8 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset, > max_bytes = adev->mman.buffer_funcs->copy_max_bytes; > num_loops = DIV_ROUND_UP(byte_count, max_bytes); > num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->copy_num_dw, 8); > - r = amdgpu_ttm_prepare_job(adev, num_dw, > - resv, vm_needs_flush, &job, false, > + r = amdgpu_ttm_prepare_job(adev, entity, num_dw, > + resv, vm_needs_flush, &job, > AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER); > if (r) > return r; > @@ -2304,11 +2314,13 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset, > return r; > } > > -static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, uint32_t src_data, > +static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, > + struct drm_sched_entity *entity, > + uint32_t src_data, > uint64_t dst_addr, uint32_t byte_count, > struct dma_resv *resv, > struct dma_fence **fence, > - bool vm_needs_flush, bool delayed, > + bool vm_needs_flush, > u64 k_job_id) > { > struct amdgpu_device *adev = ring->adev; > @@ -2321,8 +2333,8 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, uint32_t src_data, > max_bytes = 
adev->mman.buffer_funcs->fill_max_bytes; > num_loops = DIV_ROUND_UP_ULL(byte_count, max_bytes); > num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->fill_num_dw, 8); > - r = amdgpu_ttm_prepare_job(adev, num_dw, resv, vm_needs_flush, > - &job, delayed, k_job_id); > + r = amdgpu_ttm_prepare_job(adev, entity, num_dw, resv, > + vm_needs_flush, &job, k_job_id); > if (r) > return r; > > @@ -2386,13 +2398,14 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo, > /* Never clear more than 256MiB at once to avoid timeouts */ > size = min(cursor.size, 256ULL << 20); > > - r = amdgpu_ttm_map_buffer(&bo->tbo, bo->tbo.resource, &cursor, > + r = amdgpu_ttm_map_buffer(&adev->mman.clear_entity.base, > + &bo->tbo, bo->tbo.resource, &cursor, > 1, ring, false, &size, &addr); > if (r) > goto err; > > - r = amdgpu_ttm_fill_mem(ring, 0, addr, size, resv, > - &next, true, true, > + r = amdgpu_ttm_fill_mem(ring, &adev->mman.clear_entity.base, 0, addr, size, resv, > + &next, true, > AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER); > if (r) > goto err; > @@ -2408,12 +2421,12 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo, > return r; > } > > -int amdgpu_fill_buffer(struct amdgpu_bo *bo, > - uint32_t src_data, > - struct dma_resv *resv, > - struct dma_fence **f, > - bool delayed, > - u64 k_job_id) > +int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, > + struct amdgpu_bo *bo, > + uint32_t src_data, > + struct dma_resv *resv, > + struct dma_fence **f, > + u64 k_job_id) > { > struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); > struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring; > @@ -2437,13 +2450,15 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo, > /* Never fill more than 256MiB at once to avoid timeouts */ > cur_size = min(dst.size, 256ULL << 20); > > - r = amdgpu_ttm_map_buffer(&bo->tbo, bo->tbo.resource, &dst, > + r = amdgpu_ttm_map_buffer(&entity->base, > + &bo->tbo, bo->tbo.resource, &dst, > 1, ring, false, &cur_size, &to); > if (r) > goto error; > > - r = 
amdgpu_ttm_fill_mem(ring, src_data, to, cur_size, resv, > - &next, true, delayed, k_job_id); > + r = amdgpu_ttm_fill_mem(ring, &entity->base, > + src_data, to, cur_size, resv, > + &next, true, k_job_id); > if (r) > goto error; > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h > index d2295d6c2b67..e1655f86a016 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h > @@ -167,7 +167,9 @@ int amdgpu_ttm_init(struct amdgpu_device *adev); > void amdgpu_ttm_fini(struct amdgpu_device *adev); > void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, > bool enable); > -int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset, > +int amdgpu_copy_buffer(struct amdgpu_ring *ring, > + struct drm_sched_entity *entity, > + uint64_t src_offset, > uint64_t dst_offset, uint32_t byte_count, > struct dma_resv *resv, > struct dma_fence **fence, > @@ -175,12 +177,12 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset, > int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo, > struct dma_resv *resv, > struct dma_fence **fence); > -int amdgpu_fill_buffer(struct amdgpu_bo *bo, > - uint32_t src_data, > - struct dma_resv *resv, > - struct dma_fence **fence, > - bool delayed, > - u64 k_job_id); > +int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, > + struct amdgpu_bo *bo, > + uint32_t src_data, > + struct dma_resv *resv, > + struct dma_fence **f, > + u64 k_job_id); > > int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo); > void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo); > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > index d74ff6e90590..09756132fa1b 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > @@ -157,7 +157,8 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, dma_addr_t *sys, > goto out_unlock; > } > > - r = 
amdgpu_copy_buffer(ring, gart_s, gart_d, size * PAGE_SIZE, > + r = amdgpu_copy_buffer(ring, &entity->base, > + gart_s, gart_d, size * PAGE_SIZE, > NULL, &next, true, 0); > if (r) { > dev_err(adev->dev, "fail %d to copy memory\n", r);
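The quoted fill path splits the buffer so that no single job covers more than 256 MiB (`cur_size = min(dst.size, 256ULL << 20)`). Below is a minimal userspace sketch of that chunking. It makes a few assumptions. It treats the buffer as a single contiguous range. The names `count_fill_chunks()` and `MAX_CHUNK_BYTES` are hypothetical. The real loop walks an `amdgpu_res_cursor` whose segments can be shorter than 256 MiB, so it may issue more jobs than this count.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cap mirroring the "Never fill more than 256MiB at once
 * to avoid timeouts" comment in the quoted patch. */
#define MAX_CHUNK_BYTES (256ULL << 20)

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Count how many fill jobs a contiguous range of total_bytes would be
 * split into if each job handled at most MAX_CHUNK_BYTES.
 * Simplification: the real code iterates over resource-cursor segments,
 * which may be smaller than the cap. */
static unsigned int count_fill_chunks(uint64_t total_bytes)
{
	unsigned int chunks = 0;

	while (total_bytes) {
		uint64_t cur_size = min_u64(total_bytes, MAX_CHUNK_BYTES);

		/* Invariant: no chunk exceeds the per-job cap. */
		assert(cur_size <= MAX_CHUNK_BYTES);

		/* The driver would map this chunk and submit one fill job here. */
		total_bytes -= cur_size;
		chunks++;
	}
	return chunks;
}
```

With this cap, a 1 GiB buffer would be filled in four jobs. In the quoted code, each iteration first calls `amdgpu_ttm_map_buffer()` for the chunk and then calls `amdgpu_ttm_fill_mem()`.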

3 days, 16 hours

Re: [PATCH v2 05/20] drm/amdgpu: pass the entity to use to ttm functions

by Christian König

On 11/13/25 17:05, Pierre-Eric Pelloux-Prayer wrote: > This way the caller can select the one it wants to use. > > Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c | 3 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 4 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 75 +++++++++++-------- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 16 ++-- > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 +- > 5 files changed, 60 insertions(+), 41 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c > index 02c2479a8840..b59040a8771f 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c > @@ -38,7 +38,8 @@ static int amdgpu_benchmark_do_move(struct amdgpu_device *adev, unsigned size, > stime = ktime_get(); > for (i = 0; i < n; i++) { > struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring; > - r = amdgpu_copy_buffer(ring, saddr, daddr, size, NULL, &fence, > + r = amdgpu_copy_buffer(ring, &adev->mman.default_entity.base, > + saddr, daddr, size, NULL, &fence, > false, 0); > if (r) > goto exit_do_move; > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > index e08f58de4b17..c06c132a753c 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > @@ -1321,8 +1321,8 @@ void amdgpu_bo_release_notify(struct ttm_buffer_object *bo) > if (r) > goto out; > > - r = amdgpu_fill_buffer(abo, 0, &bo->base._resv, &fence, true, > - AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE); > + r = amdgpu_fill_buffer(&adev->mman.clear_entity, abo, 0, &bo->base._resv, > + &fence, AMDGPU_KERNEL_JOB_ID_CLEAR_ON_RELEASE); > if (WARN_ON(r)) > goto out; > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > index 42d448cd6a6d..c8d59ca2b3bd 100644 > --- 
a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > @@ -164,6 +164,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, > > /** > * amdgpu_ttm_map_buffer - Map memory into the GART windows > + * @entity: entity to run the window setup job > * @bo: buffer object to map > * @mem: memory object to map > * @mm_cur: range to map > @@ -176,7 +177,8 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, > * Setup one of the GART windows to access a specific piece of memory or return > * the physical address for local memory. > */ > -static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo, > +static int amdgpu_ttm_map_buffer(struct drm_sched_entity *entity, > + struct ttm_buffer_object *bo, Probably better to split this patch into multiple patches. One which changes amdgpu_ttm_map_buffer() and then another one or two for the higher level copy_buffer and fill_buffer functions. > struct ttm_resource *mem, > struct amdgpu_res_cursor *mm_cur, > unsigned int window, struct amdgpu_ring *ring, > @@ -224,7 +226,7 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo, > num_dw = ALIGN(adev->mman.buffer_funcs->copy_num_dw, 8); > num_bytes = num_pages * 8 * AMDGPU_GPU_PAGES_IN_CPU_PAGE; > > - r = amdgpu_job_alloc_with_ib(adev, &adev->mman.default_entity.base, > + r = amdgpu_job_alloc_with_ib(adev, entity, > AMDGPU_FENCE_OWNER_UNDEFINED, > num_dw * 4 + num_bytes, > AMDGPU_IB_POOL_DELAYED, &job, > @@ -274,6 +276,7 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo, > /** > * amdgpu_ttm_copy_mem_to_mem - Helper function for copy > * @adev: amdgpu device > + * @entity: entity to run the jobs > * @src: buffer/address where to read from > * @dst: buffer/address where to write to > * @size: number of bytes to copy > @@ -288,6 +291,7 @@ static int amdgpu_ttm_map_buffer(struct ttm_buffer_object *bo, > */ > __attribute__((nonnull)) > static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev, > + 
struct drm_sched_entity *entity, > const struct amdgpu_copy_mem *src, > const struct amdgpu_copy_mem *dst, > uint64_t size, bool tmz, > @@ -320,12 +324,14 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev, > cur_size = min3(src_mm.size, dst_mm.size, 256ULL << 20); > > /* Map src to window 0 and dst to window 1. */ > - r = amdgpu_ttm_map_buffer(src->bo, src->mem, &src_mm, > + r = amdgpu_ttm_map_buffer(entity, > + src->bo, src->mem, &src_mm, > 0, ring, tmz, &cur_size, &from); > if (r) > goto error; > > - r = amdgpu_ttm_map_buffer(dst->bo, dst->mem, &dst_mm, > + r = amdgpu_ttm_map_buffer(entity, > + dst->bo, dst->mem, &dst_mm, > 1, ring, tmz, &cur_size, &to); > if (r) > goto error; > @@ -353,7 +359,7 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev, > write_compress_disable)); > } > > - r = amdgpu_copy_buffer(ring, from, to, cur_size, resv, > + r = amdgpu_copy_buffer(ring, entity, from, to, cur_size, resv, > &next, true, copy_flags); > if (r) > goto error; > @@ -394,7 +400,9 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo, > src.offset = 0; > dst.offset = 0; > > - r = amdgpu_ttm_copy_mem_to_mem(adev, &src, &dst, > + r = amdgpu_ttm_copy_mem_to_mem(adev, > + &adev->mman.move_entity.base, > + &src, &dst, > new_mem->size, > amdgpu_bo_encrypted(abo), > bo->base.resv, &fence); > @@ -406,8 +414,9 @@ static int amdgpu_move_blit(struct ttm_buffer_object *bo, > (abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) { > struct dma_fence *wipe_fence = NULL; > > - r = amdgpu_fill_buffer(abo, 0, NULL, &wipe_fence, > - false, AMDGPU_KERNEL_JOB_ID_MOVE_BLIT); > + r = amdgpu_fill_buffer(&adev->mman.move_entity, > + abo, 0, NULL, &wipe_fence, > + AMDGPU_KERNEL_JOB_ID_MOVE_BLIT); > if (r) { > goto error; > } else if (wipe_fence) { > @@ -2223,16 +2232,15 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable) > } > > static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev, > + struct drm_sched_entity 
*entity, > unsigned int num_dw, > struct dma_resv *resv, > bool vm_needs_flush, > struct amdgpu_job **job, > - bool delayed, u64 k_job_id) > + u64 k_job_id) > { > enum amdgpu_ib_pool_type pool = AMDGPU_IB_POOL_DELAYED; > int r; > - struct drm_sched_entity *entity = delayed ? &adev->mman.clear_entity.base : > - &adev->mman.move_entity.base; > r = amdgpu_job_alloc_with_ib(adev, entity, > AMDGPU_FENCE_OWNER_UNDEFINED, > num_dw * 4, pool, job, k_job_id); > @@ -2252,7 +2260,9 @@ static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev, > DMA_RESV_USAGE_BOOKKEEP); > } > > -int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset, > +int amdgpu_copy_buffer(struct amdgpu_ring *ring, > + struct drm_sched_entity *entity, > + uint64_t src_offset, > uint64_t dst_offset, uint32_t byte_count, > struct dma_resv *resv, > struct dma_fence **fence, > @@ -2274,8 +2284,8 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset, > max_bytes = adev->mman.buffer_funcs->copy_max_bytes; > num_loops = DIV_ROUND_UP(byte_count, max_bytes); > num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->copy_num_dw, 8); > - r = amdgpu_ttm_prepare_job(adev, num_dw, > - resv, vm_needs_flush, &job, false, > + r = amdgpu_ttm_prepare_job(adev, entity, num_dw, > + resv, vm_needs_flush, &job, > AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER); > if (r) > return r; > @@ -2304,11 +2314,13 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset, > return r; > } > > -static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, uint32_t src_data, > +static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, > + struct drm_sched_entity *entity, > + uint32_t src_data, > uint64_t dst_addr, uint32_t byte_count, > struct dma_resv *resv, > struct dma_fence **fence, > - bool vm_needs_flush, bool delayed, > + bool vm_needs_flush, > u64 k_job_id) > { > struct amdgpu_device *adev = ring->adev; > @@ -2321,8 +2333,8 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, uint32_t src_data, > 
max_bytes = adev->mman.buffer_funcs->fill_max_bytes; > num_loops = DIV_ROUND_UP_ULL(byte_count, max_bytes); > num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->fill_num_dw, 8); > - r = amdgpu_ttm_prepare_job(adev, num_dw, resv, vm_needs_flush, > - &job, delayed, k_job_id); > + r = amdgpu_ttm_prepare_job(adev, entity, num_dw, resv, > + vm_needs_flush, &job, k_job_id); > if (r) > return r; > > @@ -2386,13 +2398,14 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo, > /* Never clear more than 256MiB at once to avoid timeouts */ > size = min(cursor.size, 256ULL << 20); > > - r = amdgpu_ttm_map_buffer(&bo->tbo, bo->tbo.resource, &cursor, > + r = amdgpu_ttm_map_buffer(&adev->mman.clear_entity.base, > + &bo->tbo, bo->tbo.resource, &cursor, > 1, ring, false, &size, &addr); > if (r) > goto err; > > - r = amdgpu_ttm_fill_mem(ring, 0, addr, size, resv, > - &next, true, true, > + r = amdgpu_ttm_fill_mem(ring, &adev->mman.clear_entity.base, 0, addr, size, resv, > + &next, true, > AMDGPU_KERNEL_JOB_ID_TTM_CLEAR_BUFFER); > if (r) > goto err; > @@ -2408,12 +2421,12 @@ int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo, > return r; > } > > -int amdgpu_fill_buffer(struct amdgpu_bo *bo, > - uint32_t src_data, > - struct dma_resv *resv, > - struct dma_fence **f, > - bool delayed, > - u64 k_job_id) > +int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, > + struct amdgpu_bo *bo, > + uint32_t src_data, > + struct dma_resv *resv, > + struct dma_fence **f, > + u64 k_job_id) > { > struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); > struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring; > @@ -2437,13 +2450,15 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo, > /* Never fill more than 256MiB at once to avoid timeouts */ > cur_size = min(dst.size, 256ULL << 20); > > - r = amdgpu_ttm_map_buffer(&bo->tbo, bo->tbo.resource, &dst, > + r = amdgpu_ttm_map_buffer(&entity->base, > + &bo->tbo, bo->tbo.resource, &dst, > 1, ring, false, &cur_size, &to); > if (r) > goto error; > > - r = 
amdgpu_ttm_fill_mem(ring, src_data, to, cur_size, resv, > - &next, true, delayed, k_job_id); > + r = amdgpu_ttm_fill_mem(ring, &entity->base, > + src_data, to, cur_size, resv, > + &next, true, k_job_id); > if (r) > goto error; > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h > index d2295d6c2b67..e1655f86a016 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h > @@ -167,7 +167,9 @@ int amdgpu_ttm_init(struct amdgpu_device *adev); > void amdgpu_ttm_fini(struct amdgpu_device *adev); > void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, > bool enable); > -int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset, > +int amdgpu_copy_buffer(struct amdgpu_ring *ring, > + struct drm_sched_entity *entity, If I'm not completely mistaken you should be able to drop the ring argument since that can be determined from the entity. Apart from that looks rather good to me. Regards, Christian. 
> + uint64_t src_offset, > uint64_t dst_offset, uint32_t byte_count, > struct dma_resv *resv, > struct dma_fence **fence, > @@ -175,12 +177,12 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset, > int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo, > struct dma_resv *resv, > struct dma_fence **fence); > -int amdgpu_fill_buffer(struct amdgpu_bo *bo, > - uint32_t src_data, > - struct dma_resv *resv, > - struct dma_fence **fence, > - bool delayed, > - u64 k_job_id); > +int amdgpu_fill_buffer(struct amdgpu_ttm_entity *entity, > + struct amdgpu_bo *bo, > + uint32_t src_data, > + struct dma_resv *resv, > + struct dma_fence **f, > + u64 k_job_id); > > int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo); > void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo); > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > index d74ff6e90590..09756132fa1b 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > @@ -157,7 +157,8 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, dma_addr_t *sys, > goto out_unlock; > } > > - r = amdgpu_copy_buffer(ring, gart_s, gart_d, size * PAGE_SIZE, > + r = amdgpu_copy_buffer(ring, &entity->base, > + gart_s, gart_d, size * PAGE_SIZE, > NULL, &next, true, 0); > if (r) { > dev_err(adev->dev, "fail %d to copy memory\n", r);
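Christian's suggestion to drop the `ring` argument relies on the ring being recoverable from the entity. Below is a minimal sketch of the idea, under these assumptions:

- The structures are simplified stand-ins for `drm_sched_entity`, `drm_sched_rq`, `drm_gpu_scheduler` and `amdgpu_ring`. The kernel versions have many more fields.
- The entity is bound to a single scheduler.
- `ring_from_entity()` is a hypothetical helper.

The approach works because amdgpu embeds the scheduler in `struct amdgpu_ring`. That means `container_of()`, which is what amdgpu's `to_amdgpu_ring()` macro expands to, can walk back from the scheduler pointer to the enclosing ring.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (illustration only). */
struct drm_gpu_scheduler { const char *name; };
struct drm_sched_rq { struct drm_gpu_scheduler *sched; };
struct drm_sched_entity { struct drm_sched_rq *rq; };

struct amdgpu_ring {
	const char *name;
	struct drm_gpu_scheduler sched; /* embedded in the ring, as in amdgpu */
};

/* Userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Same shape as amdgpu's to_amdgpu_ring(): recover the ring that
 * embeds the given scheduler. */
#define to_amdgpu_ring(s) container_of((s), struct amdgpu_ring, sched)

/* Hypothetical helper: derive the ring from the entity instead of
 * passing both. Assumes the entity already has a run queue selected
 * and is tied to exactly one scheduler. */
static struct amdgpu_ring *ring_from_entity(struct drm_sched_entity *entity)
{
	assert(entity && entity->rq && entity->rq->sched);
	return to_amdgpu_ring(entity->rq->sched);
}
```

If an entity could be load-balanced across several schedulers, `entity->rq` would only identify one ring once a run queue has been chosen. This shortcut therefore assumes that the TTM entities are tied to the single buffer-functions ring.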

3 days, 23 hours

Re: [PATCH v2 03/20] drm/amdgpu: remove direct_submit arg from amdgpu_copy_buffer

by Christian König

On 11/13/25 17:05, Pierre-Eric Pelloux-Prayer wrote: > It was always false. > > Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer(a)amd.com> > Reviewed-by: Christian König <christian.koenig(a)amd.com> Please push to amd-staging-drm-next. > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 20 +++++++------------ > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 2 +- > drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- > 4 files changed, 10 insertions(+), 16 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c > index 199693369c7c..02c2479a8840 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c > @@ -39,7 +39,7 @@ static int amdgpu_benchmark_do_move(struct amdgpu_device *adev, unsigned size, > for (i = 0; i < n; i++) { > struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring; > r = amdgpu_copy_buffer(ring, saddr, daddr, size, NULL, &fence, > - false, false, 0); > + false, 0); > if (r) > goto exit_do_move; > r = dma_fence_wait(fence, false); > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > index 3b46a24a8c48..c985f57fa227 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > @@ -354,7 +354,7 @@ static int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev, > } > > r = amdgpu_copy_buffer(ring, from, to, cur_size, resv, > - &next, false, true, copy_flags); > + &next, true, copy_flags); > if (r) > goto error; > > @@ -2211,16 +2211,13 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, bool enable) > } > > static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev, > - bool direct_submit, > unsigned int num_dw, > struct dma_resv *resv, > bool vm_needs_flush, > struct amdgpu_job **job, > bool delayed, u64 k_job_id) > { > - enum amdgpu_ib_pool_type pool = 
direct_submit ? > - AMDGPU_IB_POOL_DIRECT : > - AMDGPU_IB_POOL_DELAYED; > + enum amdgpu_ib_pool_type pool = AMDGPU_IB_POOL_DELAYED; > int r; > struct drm_sched_entity *entity = delayed ? &adev->mman.low_pr : > &adev->mman.high_pr; > @@ -2246,7 +2243,7 @@ static int amdgpu_ttm_prepare_job(struct amdgpu_device *adev, > int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset, > uint64_t dst_offset, uint32_t byte_count, > struct dma_resv *resv, > - struct dma_fence **fence, bool direct_submit, > + struct dma_fence **fence, > bool vm_needs_flush, uint32_t copy_flags) > { > struct amdgpu_device *adev = ring->adev; > @@ -2256,7 +2253,7 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset, > unsigned int i; > int r; > > - if (!direct_submit && !ring->sched.ready) { > + if (!ring->sched.ready) { > dev_err(adev->dev, > "Trying to move memory with ring turned off.\n"); > return -EINVAL; > @@ -2265,7 +2262,7 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset, > max_bytes = adev->mman.buffer_funcs->copy_max_bytes; > num_loops = DIV_ROUND_UP(byte_count, max_bytes); > num_dw = ALIGN(num_loops * adev->mman.buffer_funcs->copy_num_dw, 8); > - r = amdgpu_ttm_prepare_job(adev, direct_submit, num_dw, > + r = amdgpu_ttm_prepare_job(adev, num_dw, > resv, vm_needs_flush, &job, false, > AMDGPU_KERNEL_JOB_ID_TTM_COPY_BUFFER); > if (r) > @@ -2283,10 +2280,7 @@ int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset, > > amdgpu_ring_pad_ib(ring, &job->ibs[0]); > WARN_ON(job->ibs[0].length_dw > num_dw); > - if (direct_submit) > - r = amdgpu_job_submit_direct(job, ring, fence); > - else > - *fence = amdgpu_job_submit(job); > + *fence = amdgpu_job_submit(job); > if (r) > goto error_free; > > @@ -2315,7 +2309,7 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, uint32_t src_data, > max_bytes = adev->mman.buffer_funcs->fill_max_bytes; > num_loops = DIV_ROUND_UP_ULL(byte_count, max_bytes); > num_dw = ALIGN(num_loops * 
adev->mman.buffer_funcs->fill_num_dw, 8); > - r = amdgpu_ttm_prepare_job(adev, false, num_dw, resv, vm_needs_flush, > + r = amdgpu_ttm_prepare_job(adev, num_dw, resv, vm_needs_flush, > &job, delayed, k_job_id); > if (r) > return r; > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h > index 577ee04ce0bf..50e40380fe95 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h > @@ -166,7 +166,7 @@ void amdgpu_ttm_set_buffer_funcs_status(struct amdgpu_device *adev, > int amdgpu_copy_buffer(struct amdgpu_ring *ring, uint64_t src_offset, > uint64_t dst_offset, uint32_t byte_count, > struct dma_resv *resv, > - struct dma_fence **fence, bool direct_submit, > + struct dma_fence **fence, > bool vm_needs_flush, uint32_t copy_flags); > int amdgpu_ttm_clear_buffer(struct amdgpu_bo *bo, > struct dma_resv *resv, > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > index 46c84fc60af1..378af0b2aaa9 100644 > --- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c > @@ -153,7 +153,7 @@ svm_migrate_copy_memory_gart(struct amdgpu_device *adev, dma_addr_t *sys, > } > > r = amdgpu_copy_buffer(ring, gart_s, gart_d, size * PAGE_SIZE, > - NULL, &next, false, true, 0); > + NULL, &next, true, 0); > if (r) { > dev_err(adev->dev, "fail %d to copy memory\n", r); > goto out_unlock;
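Both patches keep the same job-sizing arithmetic in `amdgpu_copy_buffer()`. The driver emits one copy packet per `copy_max_bytes` piece and pads the dword count to a multiple of 8. The IB is then allocated with `num_dw * 4` bytes in `amdgpu_ttm_prepare_job()`.

Below is a worked sketch of that arithmetic in userspace C. The `copy_limits` structure and `copy_ib_dwords()` are hypothetical. The real limits come from `adev->mman.buffer_funcs` and differ between SDMA generations.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace versions of the kernel helpers used in the quoted code.
 * ALIGN() assumes a power-of-two alignment, as the kernel's does. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define ALIGN(x, a) (((x) + ((a) - 1)) & ~((a) - 1))

/* Hypothetical per-engine limits (stand-in for buffer_funcs fields). */
struct copy_limits {
	uint32_t copy_max_bytes; /* bytes one copy packet can move */
	uint32_t copy_num_dw;    /* dwords one copy packet occupies */
};

/* Dwords to reserve in the IB for copying byte_count bytes: one packet
 * per copy_max_bytes-sized piece, padded up to a multiple of 8. */
static uint32_t copy_ib_dwords(const struct copy_limits *lim,
			       uint32_t byte_count)
{
	uint32_t num_loops;

	assert(lim->copy_max_bytes > 0);
	num_loops = DIV_ROUND_UP(byte_count, lim->copy_max_bytes);
	return ALIGN(num_loops * lim->copy_num_dw, 8);
}
```

For example, take a hypothetical 4 MiB packet limit and 7 dwords per packet. A 16 MiB copy then needs 4 packets, or 28 dwords. Rounded up to a multiple of 8, that becomes 32 dwords, which is a 128-byte IB.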

3 days, 23 hours
