On Thu, Mar 28, 2024 at 12:31 AM Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Mar 26, 2024 at 01:19:20PM -0700, Mina Almasry wrote:
>> Are you envisioning that dmabuf support would be added to the block layer
>
> Yes.
>
>> (which I understand is part of the VFS and not driver specific),
>
> The block layer isn't really the VFS, it's just another core stack like the network stack.
>
>> or as part of the specific storage driver (like nvme for example)? If we can add dmabuf support to the block layer itself that sounds awesome. We may then be able to do devmem TCP on all/most storage devices without having to modify each individual driver.
>
> I suspect we'll still need to touch the drivers to understand it, but hopefully all the main infrastructure can live in the block layer.
>
>> In your estimation, is adding dmabuf support to the block layer something technically feasible & acceptable upstream? I notice you suggested it so I'm guessing yes to both, but I thought I'd confirm.
>
> I think so, and I know there has been quite some interest to at least pre-register userspace memory so that the iommu overhead can be pre-loaded. It also is a much better interface for Peer to Peer transfers than what we currently have.
This is positively thrilling news for me. I was worried that adding devmem TCP support to storage devices would involve using a non-dma-buf standard of buffer sharing like pci_p2pdma_* (drivers/pci/p2pdma.c), and that would require messy changes to pci_p2pdma_* that would get nacked. It would also require adding pci_p2pdma_* support to devmem TCP, which is a can of worms. If adding dma-buf support to storage devices is feasible and desirable, that's a much better approach IMO: (a) it may work with devmem TCP without any changes needed on the netdev side, and (b) dma-buf support may be generically useful and a good contribution even outside of devmem TCP.

I don't have a concrete user for devmem TCP on storage devices, but the use case is very similar to the GPU one, and I imagine the performance benefits could be significant in some setups.
Christoph, if you have any hints or a rough design in mind for how dma-buf support can be added to the block layer, please do let us know and we'll follow your hints to investigate. But I don't want to use up too much of your time. Marc and I can definitely read enough code to figure out how to do it ourselves :-)
Marc, please review and consider this thread and work; this could be a good project for you and me. I imagine the work would be:
1. Investigate how to add dma-buf support to the block layer (maybe write prototype code, and maybe even test it with devmem TCP).
2. Share a proposal, with or without code, on the netdev/fs/block layer mailing lists and try to work through concerns/nacks.
3. Finally, share an RFC and carry it through to merging, etc.
--
Thanks,
Mina