On Tue, Jun 22, 2021 at 06:24:28PM +0300, Oded Gabbay wrote:
On Tue, Jun 22, 2021 at 6:11 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
On Tue, Jun 22, 2021 at 04:12:26PM +0300, Oded Gabbay wrote:
1. Setting sg_page to NULL
2. 'Mapping' pages for P2P DMA without going through the IOMMU
3. Allowing P2P DMA without using the P2P DMA API to validate that it can work at all in the first place
All of these result in functional bugs in certain system configurations.
Jason
Hi Jason, thanks for the feedback. Regarding point 1, why is that a problem if we disable the option to mmap the dma-buf from user space?
Userspace has nothing to do with needing struct pages or not.
Points 1 and 2 mostly go together: supporting the IOMMU is not nice if you don't have struct pages.
You should study Logan's patches I pointed you at as they are solving exactly this problem.
Yes, I do need to study them. I agree with you here. It appears I have a hole in my understanding: I'm missing the connection between IOMMU support (which, of course, I must have) and struct pages.
Christian explained what the AMD driver is doing by calling dma_map_resource().
Which is a hacky and slow way of achieving what Logan's series is doing.
No, the design of dma-buf requires the exporter to do the DMA mapping, so only the exporter is wrong to omit all the IOMMU and P2P logic.
RDMA is OK today only because nobody has implemented dma buf support in rxe/si - mainly because the only implementations of exporters don't
Can you please educate me, what is rxe/si?
Sorry, rxe/siw - these are the all-software implementations of RDMA and they require the struct page to do a SW memory copy. They can't implement dmabuf without it.
OK... so how come that patch set was merged into 5.12 if it's buggy?
We only implemented true DMA devices for RDMA DMABUF support, so it isn't buggy right now.
Yes, that's what I expect to see. But I want to see it with my own eyes and then figure out how to solve this.
It might be tricky to test because you have to ensure the IOMMU is turned on and has a non-identity page table. Basically, if it doesn't trigger an IOMMU failure, then the IOMMU isn't set up properly.
Jason
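To make the testing point concrete, here is a toy userspace model in C. It is not kernel code and uses no kernel API; every name in it (toy_iommu, toy_dma_map, toy_device_access, the IOVA window and the BAR address) is hypothetical. It only illustrates why an exporter that hands a raw physical address to the device, instead of mapping it, goes unnoticed under an identity/passthrough IOMMU and faults once the IOMMU uses a non-identity page table. It does not model real P2P routing constraints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: a tiny "IOMMU" with a fixed-size translation table.
 * None of these names are kernel APIs; they are hypothetical. */
#define TOY_MAX_MAPPINGS 8
#define TOY_IOVA_BASE 0x100000000ULL /* hypothetical IOVA window start */

struct toy_mapping {
    uint64_t iova; /* address the device puts on the bus */
    uint64_t phys; /* physical address it translates to */
    uint64_t size;
};

struct toy_iommu {
    bool identity; /* true: passthrough/identity, so iova == phys */
    struct toy_mapping map[TOY_MAX_MAPPINGS];
    size_t nr;
    uint64_t next_iova;
};

static void toy_iommu_init(struct toy_iommu *iommu, bool identity)
{
    iommu->identity = identity;
    iommu->nr = 0;
    iommu->next_iova = TOY_IOVA_BASE;
}

/* Stand-in for the exporter "doing the DMA map": returns the address the
 * device must use, or 0 if the toy table is full. */
static uint64_t toy_dma_map(struct toy_iommu *iommu, uint64_t phys, uint64_t size)
{
    assert(size > 0);
    if (iommu->identity)
        return phys;
    if (iommu->nr == TOY_MAX_MAPPINGS)
        return 0;
    struct toy_mapping *m = &iommu->map[iommu->nr++];
    m->iova = iommu->next_iova;
    m->phys = phys;
    m->size = size;
    iommu->next_iova += size;
    return m->iova;
}

/* What the toy IOMMU does on each device access: translate or fault.
 * Returns true and sets *phys on success, false on an IOMMU fault. */
static bool toy_device_access(const struct toy_iommu *iommu, uint64_t addr,
                              uint64_t *phys)
{
    if (iommu->identity) {
        *phys = addr;
        return true;
    }
    for (size_t i = 0; i < iommu->nr; i++) {
        const struct toy_mapping *m = &iommu->map[i];
        if (addr >= m->iova && addr - m->iova < m->size) {
            *phys = m->phys + (addr - m->iova);
            return true;
        }
    }
    return false; /* no translation for this address: IOMMU fault */
}
```

With identity set to true, toy_device_access() on the raw address succeeds, which is exactly the configuration where the bug stays hidden; with identity set to false, only the address returned by toy_dma_map() translates, and the raw address faults.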