On Mon, Nov 17, 2025 at 08:36:20AM -0700, Alex Williamson wrote:
> On Tue, 11 Nov 2025 09:54:22 +0100
> Christian König <christian.koenig(a)amd.com> wrote:
>
> > On 11/10/25 21:42, Alex Williamson wrote:
> > > On Thu, 6 Nov 2025 16:16:45 +0200
> > > Leon Romanovsky <leon(a)kernel.org> wrote:
> > >
> > >> Changelog:
> > >> v7:
> > >> * Dropped restore_revoke flag and added vfio_pci_dma_buf_move
> > >> to reverse loop.
> > >> * Fixed spelling errors in documentation patch.
> > >> * Rebased on top of v6.18-rc3.
> > >> * Added include to stddef.h to vfio.h, to keep uapi header file independent.
> > >
> > > I think we're winding down on review comments. It'd be great to get
> > > p2pdma and dma-buf acks on this series. Otherwise it's been posted
> > > enough that we'll assume no objections. Thanks,
> >
> > Already have it on my TODO list to take a closer look, but no idea when that will be.
> >
> > This patch set is at place 4 or 5 on a rather long list of stuff to review/finish.
>
> Hi Christian,
>
> Gentle nudge. Leon posted v8[1] last week, which is not drawing any
> new comments. Do you foresee having time to review it, such that I
> should hold off merging for v6.19 a bit longer? Thanks,
I really want this merged this cycle, along with the iommufd part,
which means it needs to go into your tree by very early next week on a
shared branch so I can do the iommufd part on top.
It is the last blocking kernel piece to conclude the viommu support
rollout into qemu for iommufd, which quite a lot of people have been
working on for years now.
IMHO there is nothing profound in the dmabuf patch; it was written by
the expert in the new DMA API operation and doesn't form any
troublesome API contracts. It is also basically the same code as the
v1 from July, just moved into dmabuf .c files instead of vfio .c files
at Christoph's request.
My hope is DRM folks will pick up the baton and continue to improve
this to move other drivers away from dma_map_resource(). Simona told
me people have wanted DMA API improvements for ages, now we have them,
now is the time!
Any remarks after the fact can be addressed incrementally.
If there are no concrete technical remarks, please take it. Six months
is long enough to wait for feedback.
Thanks,
Jason
On Tue, Nov 18, 2025 at 07:59:20AM +0000, Ankit Agrawal wrote:
> + if (nvdev->resmem.memlength && region_index == RESMEM_REGION_INDEX) {
> + /*
> +		 * The P2P properties of the non-BAR memory are the same as the
> + * BAR memory, so just use the provider for index 0. Someday
> + * when CXL gets P2P support we could create CXLish providers
> + * for the non-BAR memory.
> + */
> + mem_region = &nvdev->resmem;
> + } else if (region_index == USEMEM_REGION_INDEX) {
> + /*
> +		 * This is actually cacheable memory and isn't treated as P2P in
> +		 * the chip. For now we have no way to push cacheable memory
> + * through everything and the Grace HW doesn't care what caching
> + * attribute is programmed into the SMMU. So use BAR 0.
> + */
> + mem_region = &nvdev->usemem;
> + }
> +
>
> Can we replace this with nvgrace_gpu_memregion()?
Yes, looks like it.
But we need to preserve the comments above as well somehow.
Jason
On 11/14/25 11:50, Tvrtko Ursulin wrote:
>> @@ -569,12 +577,12 @@ void dma_fence_release(struct kref *kref)
>> spin_unlock_irqrestore(fence->lock, flags);
>> }
>> - rcu_read_unlock();
>> -
>> - if (fence->ops->release)
>> - fence->ops->release(fence);
>> + ops = rcu_dereference(fence->ops);
>> + if (ops->release)
>> + ops->release(fence);
>> else
>> dma_fence_free(fence);
>> + rcu_read_unlock();
>
> The risk is that a spinlock taken in the release callback will trigger a warning on PREEMPT_RT. But at least the current code base does not have anything like that AFAICS, so I guess it is okay.
I don't think that this is a problem. When PREEMPT_RT is enabled, both RCU and spinlocks become preemptible.
So as far as I know it is perfectly valid to grab a spinlock under an RCU read-side critical section.
>> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
>> index 64639e104110..77f07735f556 100644
>> --- a/include/linux/dma-fence.h
>> +++ b/include/linux/dma-fence.h
>> @@ -66,7 +66,7 @@ struct seq_file;
>> */
>> struct dma_fence {
>> spinlock_t *lock;
>> - const struct dma_fence_ops *ops;
>> + const struct dma_fence_ops __rcu *ops;
>> /*
>> * We clear the callback list on kref_put so that by the time we
>> * release the fence it is unused. No one should be adding to the
>> @@ -218,6 +218,10 @@ struct dma_fence_ops {
>> * timed out. Can also return other error values on custom implementations,
>> * which should be treated as if the fence is signaled. For example a hardware
>> * lockup could be reported like that.
>> + *
>> + * Implementing this callback prevents the BO from detaching after
>
> s/BO/fence/
>
>> + * signaling and so it is mandatory for the module providing the
>> + * dma_fence_ops to stay loaded as long as the dma_fence exists.
>> */
>> signed long (*wait)(struct dma_fence *fence,
>> bool intr, signed long timeout);
>> @@ -229,6 +233,13 @@ struct dma_fence_ops {
>> * Can be called from irq context. This callback is optional. If it is
>> * NULL, then dma_fence_free() is instead called as the default
>> * implementation.
>> + *
>> + * Implementing this callback prevents the BO from detaching after
>
> Ditto.
Both fixed, thanks.
>
>> + * signaling and so it is mandatory for the module providing the
>> + * dma_fence_ops to stay loaded as long as the dma_fence exists.
>> + *
>> + * If the callback is implemented the memory backing the dma_fence
>> + * object must be freed RCU safe.
>> */
>> void (*release)(struct dma_fence *fence);
>> @@ -418,13 +429,19 @@ const char __rcu *dma_fence_timeline_name(struct dma_fence *fence);
>> static inline bool
>> dma_fence_is_signaled_locked(struct dma_fence *fence)
>> {
>> + const struct dma_fence_ops *ops;
>> +
>> if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>> return true;
>> - if (fence->ops->signaled && fence->ops->signaled(fence)) {
>> + rcu_read_lock();
>> + ops = rcu_dereference(fence->ops);
>> + if (ops->signaled && ops->signaled(fence)) {
>> + rcu_read_unlock();
>> dma_fence_signal_locked(fence);
>> return true;
>> }
>> + rcu_read_unlock();
>> return false;
>> }
>> @@ -448,13 +465,19 @@ dma_fence_is_signaled_locked(struct dma_fence *fence)
>> static inline bool
>> dma_fence_is_signaled(struct dma_fence *fence)
>> {
>> + const struct dma_fence_ops *ops;
>> +
>> if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>> return true;
>> - if (fence->ops->signaled && fence->ops->signaled(fence)) {
>> + rcu_read_lock();
>> + ops = rcu_dereference(fence->ops);
>> + if (ops->signaled && ops->signaled(fence)) {
>> + rcu_read_unlock();
>
> With the unlocked version two threads could race and one could make the fence->lock go away just around here, before the dma_fence_signal below takes it. It seems it is only safe to rcu_read_unlock before signaling if using the embedded fence (later in the series). Can you think of a downside to holding the RCU read lock until after signaling? That would make it safe I think.
Well, it's good to talk about it, but I don't think it is necessary to protect the lock in this particular case.
See, the RCU protection only covers the fence->ops pointer, but the lock can be taken way after the fence has already signaled.
That's why I came up with the patch to move the lock into the fence in the first place.
Regards,
Christian.
>
> Regards,
>
> Tvrtko
>
>> dma_fence_signal(fence);
>> return true;
>> }
>> + rcu_read_unlock();
>> return false;
>> }
>
On Thu, Nov 13, 2025 at 11:37:12AM -0700, Alex Williamson wrote:
> > The latest series for interconnect negotiation to exchange a phys_addr is:
> > https://lore.kernel.org/r/20251027044712.1676175-1-vivek.kasireddy@intel.com
>
> If this is in development, why are we pursuing a vfio specific
> temporary "private interconnect" here rather than building on that
> work? What are the gaps/barriers/timeline?
I broadly don't expect to see an agreement on the above for probably
half a year, and I see no reason to hold this up for it. Many people
are asking for this P2P support to be completed in iommufd.
Further, I think the above will be easier to work on when we have this
merged as an example that can consume it in a different way. Right now
it is too theoretical, IMHO.
> I don't see any uAPI changes here, is there any visibility to userspace
> whether IOMMUFD supports this feature or is it simply a try and fail
> approach?
So far we haven't done anything for discoverability beyond try and fail.
I'd be happy if the userspace folks doing libvirt or whatever came up
with some requests/patches for discoverability. It is not just this
feature, but things like nesting and IOMMU driver support and so on.
> The latter makes it difficult for management tools to select
> whether to choose a VM configuration based on IOMMUFD or legacy vfio if
> p2p DMA is a requirement. Thanks,
In a lot of cases it isn't really a choice, as you need iommufd to do
an accelerated vIOMMU.
But yes, it would be nice to eventually automatically use iommufd
whenever possible.
Thanks,
Jason
On 11/13/25 17:23, Philipp Stanner wrote:
> On Thu, 2025-11-13 at 15:51 +0100, Christian König wrote:
>> Using the inline lock is now the recommended way for dma_fence implementations.
>>
>> So use this approach for the scheduler fences as well, in case
>> anybody uses this as a blueprint for their own implementation.
>>
>> Also saves about 4 bytes for the external spinlock.
>
> So you changed your mind and want to keep this patch?
Actually it was you who changed my mind.
When we want to document that using the internal lock is now the norm, and that all implementations should switch to it if possible, we should also push for using it in the common driver code as much as possible.
Regards,
Christian.
>
> P.
>
On 11/13/25 17:20, Philipp Stanner wrote:
> On Thu, 2025-11-13 at 15:51 +0100, Christian König wrote:
>> Hi everyone,
>>
>> dma_fences have always lived under the tyranny of the module
>> lifetime of their issuer, leading to crashes should anybody still hold
>> a reference to a dma_fence when the issuer's module is unloaded.
>>
>> The basic problem is that when buffers are shared between drivers,
>> dma_fence objects can leak into external drivers and stay there even
>> after they are signaled. The dma_resv object for example only lazily
>> releases dma_fences.
>>
>> So what happens is that when the module that originally created the
>> dma_fence unloads, the dma_fence_ops function table becomes unavailable
>> as well, and so any attempt to release the fence crashes the system.
>>
>> Previously various approaches have been discussed, including changing the
>> locking semantics of the dma_fence callbacks (by me) as well as using the
>> drm scheduler as intermediate layer (by Sima) to disconnect dma_fences
>> from their actual users, but none of them are actually solving all problems.
>>
>> Tvrtko did some really nice prerequisite work by protecting the
>> strings returned by the dma_fence_ops with RCU. This way dma_fence
>> creators were able to just wait for an RCU grace period after fence
>> signaling before it was safe to free those data structures.
>>
>> Now this patch set here goes a step further and protects the whole
>> dma_fence_ops structure by RCU, so that after the fence signals the
>> pointer to the dma_fence_ops is set to NULL when there is no wait nor
>> release callback given. All functionality which use the dma_fence_ops
>> reference are put inside an RCU critical section, except for the
>> deprecated issuer specific wait and of course the optional release
>> callback.
>>
>> In addition to the RCU changes, the lock protecting the dma_fence
>> state previously had to be allocated externally. This set now makes
>> that external lock optional and allows dma_fences to use an inline
>> lock and be self-contained.
>>
>> This patch set addresses all previous code review comments and is
>> based on drm-tip; it includes my changes for amdgpu as well as
>> Mathew's patches for XE.
>>
>> Going to push the core DMA-buf changes to drm-misc-next as soon as I get
>> the appropriate rb. The driver specific changes can go upstream through
>> the driver channels as necessary.
>
> No changelog? :(
On the cover letter? For dma-buf patches we usually do that on the individual patches.
Christian.
>
> P.
>
>>
>> Please review and comment,
>> Christian.
>>
>>
>