On Tue, Jul 16, 2024 at 02:07:20PM +0200, Christian König wrote:
On 16.07.24 at 11:31, Daniel Vetter wrote:
On Tue, Jul 16, 2024 at 10:48:40AM +0800, Huan Yang wrote:
I have just been looking into udmabuf; please correct me if I'm wrong.
On 2024/7/15 at 20:32, Christian König wrote:
On 15.07.24 at 11:11, Daniel Vetter wrote:
On Thu, Jul 11, 2024 at 11:00:02AM +0200, Christian König wrote:
On 11.07.24 at 09:42, Huan Yang wrote:
> Some user may need load file into dma-buf, current
> way is:
> 1. allocate a dma-buf, get dma-buf fd
> 2. mmap dma-buf fd into vaddr
> 3. read(file_fd, vaddr, fsz)
> This is too heavy if fsz reached to GB.
You need to describe a bit more why that is too heavy. I can only assume you need to save memory bandwidth and avoid the extra copy with the CPU.
> This patch implement a feature called DMA_HEAP_IOCTL_ALLOC_READ_FILE.
> User need to offer a file_fd which you want to load into
> dma-buf, then,
> it promise if you got a dma-buf fd, it will contains the file content.
Interesting idea, that has at least more potential than trying to enable direct I/O on mmap()ed DMA-bufs.
The approach with the new IOCTL might not work because it is a very specialized use case.
But IIRC there was a copy_file_range callback in the file_operations structure you could use for that. I'm just not sure when and how that's used with the copy_file_range() system call.
I'm not sure any of those help, because internally they're all still based on struct page (or maybe in the future on folios). And that's the thing dma-buf can't give you, at least without peeking behind the curtain.
I think an entirely different option would be malloc+udmabuf. That essentially handles the impedance mismatch between direct I/O and dma-buf on the dma-buf side. The downside is that it'll make the permanently pinned memory accounting and tracking issues even more apparent, but I guess eventually we do need to sort that one out.
Oh, very good idea! Just one minor correction: it's not malloc+udmabuf, but rather memfd_create()+udmabuf.
Hm right, it's memfd_create() + mmap(memfd) + udmabuf
And you need to complete your direct I/O before creating the udmabuf since that reference will prevent direct I/O from working.
udmabuf pins all of its pages, so once the fd has been returned you can no longer trigger direct I/O on them (same as with dma-buf). So the read must complete before the pages are pinned.
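For readers following along, the memfd_create() + mmap(memfd) + udmabuf ordering discussed above could look roughly like the userspace sketch below. It is only an illustration under stated assumptions, not code from the patch: the function names load_file_into_memfd() and memfd_to_udmabuf() are made up for this example, the caller is assumed to have opened file_fd itself (with O_DIRECT if direct I/O is wanted), and the whole file is read before the udmabuf is created, as suggested in the thread.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/udmabuf.h>

/*
 * Hypothetical helper: create a memfd big enough for the file, map it,
 * and read the file into the mapping. Returns the memfd (or -1) and
 * stores the page-aligned length in *padded_len.
 */
int load_file_into_memfd(int file_fd, size_t *padded_len)
{
    struct stat st;
    size_t page, len, off = 0;
    char *vaddr;
    int memfd;

    if (fstat(file_fd, &st) < 0 || st.st_size <= 0)
        return -1;

    /* udmabuf wants a page-aligned size, so round the file size up. */
    page = (size_t)sysconf(_SC_PAGESIZE);
    len = ((size_t)st.st_size + page - 1) / page * page;

    memfd = memfd_create("file-load", MFD_ALLOW_SEALING);
    if (memfd < 0)
        return -1;
    if (ftruncate(memfd, (off_t)len) < 0)
        goto fail;
    /* udmabuf only accepts memfds sealed against shrinking. */
    if (fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK) < 0)
        goto fail;

    vaddr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, memfd, 0);
    if (vaddr == MAP_FAILED)
        goto fail;

    /* Request the padded length so O_DIRECT reads keep an aligned size. */
    while (off < (size_t)st.st_size) {
        ssize_t n = read(file_fd, vaddr + off, len - off);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break;
        off += (size_t)n;
    }
    munmap(vaddr, len);
    if (off < (size_t)st.st_size)
        goto fail;

    *padded_len = len;
    return memfd;
fail:
    close(memfd);
    return -1;
}

/*
 * Hypothetical helper: wrap an already filled memfd in a udmabuf.
 * Returns the dma-buf fd, or -1 (for example if /dev/udmabuf is absent).
 */
int memfd_to_udmabuf(int memfd, size_t len)
{
    struct udmabuf_create create = {
        .memfd = (uint32_t)memfd,
        .flags = UDMABUF_FLAGS_CLOEXEC,
        .offset = 0,
        .size = len,
    };
    int dev, buf_fd;

    dev = open("/dev/udmabuf", O_RDWR | O_CLOEXEC);
    if (dev < 0)
        return -1;
    buf_fd = ioctl(dev, UDMABUF_CREATE, &create);
    close(dev);
    return buf_fd;
}
```

A caller would open the source file, call load_file_into_memfd(), and only then pass the returned memfd and length to memfd_to_udmabuf(). Whether direct I/O would still work after the udmabuf exists is exactly the open question in this thread, so the sketch simply avoids depending on the answer.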
Why does pinning prevent direct I/O? I haven't tested, but I'd expect the rdma folks would be really annoyed if that's the case ...
Pinning (or rather taking another page reference) prevents writes from using direct I/O because writes try to find all references and make them read only so that nobody modifies the content while the write is done.
Where do you see that happen? That's counter to my understanding of what pin_user_page() does, which is what direct I/O uses ...
As far as I know the same approach is used for NUMA migration and replacing small pages with big ones in THP. But for the read case here it should still work.
Yeah elevated refcount breaks migration, but that's entirely different from the direct I/O use-case. Count me somewhat confused. -Sima