On 12.03.20 at 15:19, Jason Gunthorpe wrote:
On Thu, Mar 12, 2020 at 03:47:29AM -0700, Christoph Hellwig wrote:
On Thu, Mar 12, 2020 at 11:31:35AM +0100, Christian König wrote:
But how should we then deal with all the existing interfaces which already take a scatterlist/sg_table?
The whole DMA-buf design and a lot of drivers are built around scatterlist/sg_table, and to me that actually makes quite a lot of sense.
Replace them with a saner interface that doesn't take a scatterlist. At the very least for new functionality like peer-to-peer DMA, but this code in particular would also benefit from a general move away from the scatterlist.
If dma buf can do P2P I'd like to see support for consuming a dmabuf in RDMA.
That would indeed be awesome.
Looking at how... there is an existing SGL-based path starting from get_user_pages, through DMA mapping, to the drivers (ib_umem).
I can replace the driver part with something else (dma_sg), but not until we get a way to DMA-map pages directly into that something else.
The non-page scatterlist is also a big concern for RDMA, as we have drivers that want the page list, so even if we did what this series contemplates I'd still have to split the drivers and create the notion of a DMA-only SGL.
Yeah that's my concern as well. For GPU drivers I don't think we need the struct pages anywhere, but that might not be true for others.
I mean we could come up with a new structure for this, but to me that just looks like reinventing the wheel. Especially since drivers need to be able to handle both I/O to system memory and I/O to PCIe BARs.
The structure for holding the struct page side of the scatterlist is called struct bio_vec, so far mostly used by the block and networking code.
I haven't used bio_vecs before; do they support chaining like SGL so they can be very big? RDMA DMA-maps gigabytes of memory.
The structure for holding dma addresses doesn't really exist in a generic form, but would be an array of these structures:
struct dma_sg { dma_addr_t addr; u32 len; };
Same question: RDMA needs to represent gigabytes of pages in a DMA list, so we will need some generic way to handle that. I suspect GPU has a similar need? Can it be accommodated in some generic dma_sg?
Yes, we easily have ranges of >1GB. So I would certainly say u64 for the len here.
So I'm guessing the path forward is something like
- Add some generic dma_sg data structure and helper
- Add dma mapping code to go from pages to dma_sg
- Rework RDMA to use dma_sg and the new dma mapping code
- Rework dmabuf to support dma mapping to a dma_sg
- Rework GPU drivers to use dma_sg
- Teach p2pdma to generate a dma_sg from a BAR page list
- This series
?
Sounds pretty much like a plan to me, but unfortunately like a rather huge one.
Because of this, and because I don't know whether all drivers can live with dma_sg, I wonder if we should keep the switch from scatterlist to dma_sg separate from this peer2peer work.
Christian.
Jason
linaro-mm-sig@lists.linaro.org