Hi, Pierre-Eric
On Thu, 2025-11-13 at 17:05 +0100, Pierre-Eric Pelloux-Prayer wrote:
> Until now ttm stored a single pipelined eviction fence, which means
> drivers had to use a single entity for these evictions.
>
> To lift this requirement, this commit allows up to 8 entities to
> be used.
>
> Ideally a dma_resv object would have been used as a container of
> the eviction fences, but the locking rules make it complex.
> dma_resv objects all have the same ww_class, which means "Attempting to
> lock more mutexes after ww_acquire_done" is an error.
>
> One alternative considered was to introduce a 2nd ww_class for
> specific resv to hold a single "transient" lock (= the resv lock
> would only be held for a short period, without taking any other
> locks).
Wouldn't it be possible to use lockdep_set_class_and_name() to modify
the resv lock class for these particular resv objects after they are
allocated? Reusing the resv code certainly sounds attractive.
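A rough sketch of that idea (untested; the exact lock member and key
handling are assumptions, not taken from the patch):

static struct lock_class_key transient_resv_key;

static void transient_resv_init(struct dma_resv *resv)
{
	dma_resv_init(resv);
	/* Give this resv its own lockdep class, separate from the shared
	 * dma_resv ww_class, so taking it "transiently" while other resv
	 * locks are held is not flagged by lockdep.
	 */
	lockdep_set_class_and_name(&resv->lock.base, &transient_resv_key,
				   "transient_dma_resv");
}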
Thanks,
Thomas
On Wed, Nov 19, 2025 at 12:02:02AM +0000, Tian, Kevin wrote:
> > From: Keith Busch <kbusch(a)kernel.org>
> > Sent: Wednesday, November 19, 2025 4:19 AM
> >
> > On Tue, Nov 18, 2025 at 07:18:36AM +0000, Tian, Kevin wrote:
> > > > From: Leon Romanovsky <leon(a)kernel.org>
> > > > Sent: Tuesday, November 11, 2025 5:58 PM
> > > >
> > > > From: Leon Romanovsky <leonro(a)nvidia.com>
> > >
> > > not required with only your own s-o-b
> >
> > That's automatically appended when the sender and signer don't match.
> > It's not uncommon for developers to send from a kernel.org email but
> > sign off with a corporate account, or the other way around.
>
> Good to know.
Yes. In addition, I like to separate code authorship from my open-source
activity. The code belongs to my employer, which is why the corporate
address is used for authorship, while all emails and communications come
from my kernel.org account.
Thanks
On Wed, Nov 19, 2025 at 05:54:55AM +0000, Tian, Kevin wrote:
> > From: Leon Romanovsky <leon(a)kernel.org>
> > Sent: Tuesday, November 11, 2025 5:58 PM
> > +
> > +	if (dma->state && dma_use_iova(dma->state)) {
> > +		WARN_ON_ONCE(mapped_len != size);
>
> then "goto err_unmap_dma".
It should never happen; there is no need to provide error unwind for a
case that cannot occur.
>
> Reviewed-by: Kevin Tian <kevin.tian(a)intel.com>
Thanks
On Tue, Nov 18, 2025 at 04:06:11PM -0800, Nicolin Chen wrote:
> On Tue, Nov 11, 2025 at 11:57:48AM +0200, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leonro(a)nvidia.com>
> >
> > Add dma_buf_map() and dma_buf_unmap() helpers to convert an array of
> > MMIO physical address ranges into scatter-gather tables with proper
> > DMA mapping.
> >
> > These common functions are a starting point and support any PCI
> > driver creating mappings from its BAR's MMIO addresses. VFIO is one
> > case, and RDMA will be another shortly. We can review existing DRM
> > drivers and refactor them separately. We hope this will evolve into
> > routines that help common DRM code which mixes CPU and MMIO mappings.
> >
> > Compared to the dma_map_resource() abuse this implementation handles
> > the complicated PCI P2P scenarios properly, especially when an IOMMU
> > is enabled:
> >
> > - Direct bus address mapping without IOVA allocation for
> > PCI_P2PDMA_MAP_BUS_ADDR, using pci_p2pdma_bus_addr_map(). This
> > happens if the IOMMU is enabled but the PCIe switch ACS flags allow
> > transactions to avoid the host bridge.
> >
> > Further, this handles the slightly obscure case of MMIO with a
> > phys_addr_t that is different from the physical BAR programming
> > (bus offset). The phys_addr_t is converted to a dma_addr_t that
> > accommodates this offset. This enables certain real systems to
> > work, especially on ARM platforms.
> >
> > - Mapping through host bridge with IOVA allocation and DMA_ATTR_MMIO
> > attribute for MMIO memory regions (PCI_P2PDMA_MAP_THRU_HOST_BRIDGE).
> > This happens when the IOMMU is enabled and the ACS flags are forcing
> > all traffic to the IOMMU - ie for virtualization systems.
> >
> > - Cases where P2P is not supported through the host bridge/CPU. The
> > P2P subsystem is the proper place to detect this and block it.
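For reference, the per-range decision covered by the three cases above is
roughly the following (simplified and illustrative only; the helper
signatures here are assumptions, not the actual patch code):

static int map_one_mmio_range(struct device *dev,
			      struct p2pdma_provider *provider,
			      enum pci_p2pdma_map_type type,
			      phys_addr_t phys, size_t size,
			      enum dma_data_direction dir,
			      dma_addr_t *dma_addr)
{
	switch (type) {
	case PCI_P2PDMA_MAP_BUS_ADDR:
		/* ACS lets the transaction stay below the host bridge:
		 * no IOVA allocation, translate phys to a bus address.
		 */
		*dma_addr = pci_p2pdma_bus_addr_map(provider, phys);
		return 0;
	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
		/* Traffic is forced through the IOMMU: allocate an IOVA
		 * and map the region as MMIO.
		 */
		*dma_addr = dma_map_phys(dev, phys, size, dir,
					 DMA_ATTR_MMIO);
		return dma_mapping_error(dev, *dma_addr) ? -ENOMEM : 0;
	default:
		/* P2P is not supported through this host bridge/CPU. */
		return -EINVAL;
	}
}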
> >
> > Helper functions fill_sg_entry() and calc_sg_nents() handle the
> > scatter-gather table construction, splitting large regions into
> > UINT_MAX-sized chunks to fit within sg->length field limits.
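The splitting itself is straightforward; roughly (illustrative only, the
names come from the commit text but the bodies here are a sketch):

static unsigned int calc_sg_nents(u64 length)
{
	return DIV_ROUND_UP(length, (u64)UINT_MAX);
}

static struct scatterlist *fill_sg_entry(struct scatterlist *sgl,
					 u64 length, dma_addr_t addr)
{
	while (length) {
		unsigned int len = min_t(u64, length, UINT_MAX);

		/* DMA-only entry: the CPU side (page/offset/length)
		 * stays zeroed, as described below.
		 */
		sg_dma_address(sgl) = addr;
		sg_dma_len(sgl) = len;

		addr += len;
		length -= len;
		sgl = sg_next(sgl);
	}
	return sgl;
}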
> >
> > Since the physical-address-based DMA API forbids use of the CPU list
> > of the scatterlist, this produces a mangled scatterlist whose CPU
> > side is fully zeroed out: all struct page pointers are NULL and all
> > CPU lengths are zero. This is stronger and more robust than the
> > existing mangle_sg_table() technique. It is a future project to
> > migrate DMABUF as a subsystem away from using scatterlist for this
> > data structure.
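In practice that means an importer can only walk the DMA side of such a
table, e.g. (illustrative only):

static void walk_dma_only_sgt(struct device *dev, struct sg_table *sgt)
{
	struct scatterlist *sgl;
	int i;

	for_each_sgtable_dma_sg(sgt, sgl, i) {
		dma_addr_t addr = sg_dma_address(sgl);
		unsigned int len = sg_dma_len(sgl);

		/* sg_page()/sg_virt() and sg->length are unusable here;
		 * program addr/len into the importing device instead.
		 */
		dev_dbg(dev, "dma chunk %pad + %u\n", &addr, len);
	}
}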
> >
> > Tested-by: Alex Mastro <amastro(a)fb.com>
> > Tested-by: Nicolin Chen <nicolinc(a)nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
>
> Reviewed-by: Nicolin Chen <nicolinc(a)nvidia.com>
>
> With a nit:
>
> > +err_unmap_dma:
> > +	if (!i || !dma->state) {
> > +		; /* Do nothing */
> > +	} else if (dma_use_iova(dma->state)) {
> > +		dma_iova_destroy(attach->dev, dma->state, mapped_len, dir,
> > +				 DMA_ATTR_MMIO);
> > +	} else {
> > +		for_each_sgtable_dma_sg(&dma->sgt, sgl, i)
> > +			dma_unmap_phys(attach->dev, sg_dma_address(sgl),
> > +				       sg_dma_len(sgl), dir, DMA_ATTR_MMIO);
>
> Would it be safer to skip dma_unmap_phys() the range [i, nents)?
The range [i, nents) is not supposed to be in the SG list we are iterating.
Thanks
On Thu, Nov 13, 2025 at 04:05:09PM -0800, Nicolin Chen wrote:
> > -struct iopt_pages *iopt_alloc_file_pages(struct file *file, unsigned long start,
> > +struct iopt_pages *iopt_alloc_file_pages(struct file *file,
> > + unsigned long start_byte,
> > + unsigned long start,
> > unsigned long length, bool writable);
>
> Passing in start_byte looks like a cleanup to me, aligning with
> what iopt_map_common() has.
> Since we are doing this cleanup, maybe we could follow the same
> sequence: xxx, start, length, start_byte, writable?
??
static int iopt_map_common(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
                           struct iopt_pages *pages, unsigned long *iova,
                           unsigned long length, unsigned long start_byte,
                           int iommu_prot, unsigned int flags)
Those aren't the same arguments; we don't pass both start and start_byte there?
Jason
On Tue, Nov 18, 2025 at 05:37:59AM +0000, Kasireddy, Vivek wrote:
> Hi Jason,
>
> > Subject: Re: [PATCH 0/9] Initial DMABUF support for iommufd
> >
> > On Thu, Nov 13, 2025 at 11:37:12AM -0700, Alex Williamson wrote:
> > > > The latest series for interconnect negotiation to exchange a phys_addr is:
> > > > https://lore.kernel.org/r/20251027044712.1676175-1-
> > vivek.kasireddy(a)intel.com
> > >
> > > If this is in development, why are we pursuing a vfio specific
> > > temporary "private interconnect" here rather than building on that
> > > work? What are the gaps/barriers/timeline?
> >
> > I broadly don't expect to see an agreement on the above for probably
> Are you planning to post your SGT mapping type patches soon, so that we
> can start discussion on the design?
It is on my list, but probably not soon enough :\
I wanted to address the remarks given and I still have to conclude
some urgent things for this merge window.
> I went ahead and tested your patches and did not notice any regressions
> with my test-cases (after adding some minor fixups). I have also added/tested
> support for IOV mapping type based on your design:
> https://gitlab.freedesktop.org/Vivek/drm-tip/-/commits/dmabuf_iov_v1
Wow, that's great!
Jason
On Mon, Nov 17, 2025 at 08:36:20AM -0700, Alex Williamson wrote:
> On Tue, 11 Nov 2025 09:54:22 +0100
> Christian König <christian.koenig(a)amd.com> wrote:
>
> > On 11/10/25 21:42, Alex Williamson wrote:
> > > On Thu, 6 Nov 2025 16:16:45 +0200
> > > Leon Romanovsky <leon(a)kernel.org> wrote:
> > >
> > >> Changelog:
> > >> v7:
> > >> * Dropped restore_revoke flag and added vfio_pci_dma_buf_move
> > >> to reverse loop.
> > >> * Fixed spelling errors in documentation patch.
> > >> * Rebased on top of v6.18-rc3.
> > >> * Added include to stddef.h to vfio.h, to keep uapi header file independent.
> > >
> > > I think we're winding down on review comments. It'd be great to get
> > > p2pdma and dma-buf acks on this series. Otherwise it's been posted
> > > enough that we'll assume no objections. Thanks,
> >
> > Already have it on my TODO list to take a closer look, but no idea when that will be.
> >
> > This patch set is at place 4 or 5 on a rather long list of stuff to review/finish.
>
> Hi Christian,
>
> Gentle nudge. Leon posted v8[1] last week, which is not drawing any
> new comments. Do you foresee having time for review such that I should
> still hold off merging for v6.19 a bit longer? Thanks,
I really want this merged this cycle, along with the iommufd part,
which means it needs to go into your tree by very early next week on a
shared branch so I can do the iommufd part on top.
It is the last blocking kernel piece needed to conclude the viommu support
roll-out into qemu for iommufd, which quite a lot of people have been
working on for years now.
IMHO there is nothing profound in the dmabuf patch; it was written by the
expert on the new DMA API and doesn't introduce any troublesome API
contracts. It is also basically the same code as the v1 from July, just
moved into dmabuf .c files instead of vfio .c files at Christoph's
request.
My hope is that DRM folks will pick up the baton and continue to improve
this, moving other drivers away from dma_map_resource(). Simona told me
people have wanted DMA API improvements for ages; now we have them, now
is the time!
Any remarks after the fact can be addressed incrementally.
If there are no concrete technical remarks please take it. 6 months is
long enough to wait for feedback.
Thanks,
Jason
On Tue, Nov 18, 2025 at 07:59:20AM +0000, Ankit Agrawal wrote:
> +	if (nvdev->resmem.memlength && region_index == RESMEM_REGION_INDEX) {
> +		/*
> +		 * The P2P properties of the non-BAR memory are the same as the
> +		 * BAR memory, so just use the provider for index 0. Someday
> +		 * when CXL gets P2P support we could create CXLish providers
> +		 * for the non-BAR memory.
> +		 */
> +		mem_region = &nvdev->resmem;
> +	} else if (region_index == USEMEM_REGION_INDEX) {
> +		/*
> +		 * This is actually cachable memory and isn't treated as P2P in
> +		 * the chip. For now we have no way to push cachable memory
> +		 * through everything and the Grace HW doesn't care what caching
> +		 * attribute is programmed into the SMMU. So use BAR 0.
> +		 */
> +		mem_region = &nvdev->usemem;
> +	}
> +
>
> Can we replace this with nvgrace_gpu_memregion()?
Yes, it looks like we can. But we need to preserve the comments above as
well somehow.
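Perhaps something like this (untested; assuming nvgrace_gpu_memregion()
looks up the region by index, and merging the two comments):

	/*
	 * The P2P properties of the non-BAR memory are the same as the
	 * BAR memory, so just use the provider for index 0. Someday when
	 * CXL gets P2P support we could create CXLish providers for the
	 * non-BAR memory. USEMEM is actually cachable memory and isn't
	 * treated as P2P in the chip, but the Grace HW doesn't care what
	 * caching attribute is programmed into the SMMU, so BAR 0 works
	 * for it too.
	 */
	mem_region = nvgrace_gpu_memregion(region_index, nvdev);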
Jason