On 5/21/25 06:17, wangtao wrote:
Reducing CPU overhead/power consumption is critical for mobile devices. We need simpler and more efficient dmabuf direct I/O support.
Since Christian evaluated sendfile performance based on your data, could you confirm whether the cache was cleared? If not, please share test data taken after clearing the cache. Thank you for your support.
Yes, sorry, I was out yesterday riding motorcycles. I did not clear the cache for the buffered reads; I didn't realize you had. The I/O plus the copy certainly explains the difference.
Your point about the unlikelihood of any of that data being in the cache also makes sense.
[wangtao] Thank you for testing and clarifying.
I'm not sure it changes anything about the ioctl approach though. Another way to do this would be to move the (optional) support for direct IO into the exporter via dma_buf_fops and dma_buf_ops. Then normal read() syscalls would just work for buffers that support them. I know that's more complicated, but at least it doesn't require inventing new uapi to do it.
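To picture that direction, here is a minimal sketch. It is entirely hypothetical: no read_iter hook exists in dma_buf_ops today, and the names and wiring are invented for illustration only.

#include <linux/dma-buf.h>
#include <linux/fs.h>
#include <linux/uio.h>

/* Hypothetical sketch: a read_iter hook on dma_buf_ops, forwarded from
 * the shared dma_buf file_operations, so that exporters able to do
 * direct I/O make plain read() work on the dmabuf fd.
 */
static ssize_t dma_buf_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
	struct dma_buf *dmabuf = iocb->ki_filp->private_data;

	/* Exporters that cannot support this simply leave the hook unset. */
	if (!dmabuf->ops->read_iter)
		return -EINVAL;

	return dmabuf->ops->read_iter(dmabuf, iocb, to);
}

static const struct file_operations dma_buf_fops = {
	/* ... existing release/mmap/poll/ioctl entries ... */
	.read_iter	= dma_buf_read_iter,
};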
[wangtao] Thank you for the discussion. I fully support any method that enables dmabuf direct I/O.
[wangtao] I understand that using sendfile/splice between regular files and a dmabuf adds an extra CPU copy, preventing zero-copy. For example, the sendfile path is:
[DISK] → DMA → [page cache] → CPU copy → [memory file]
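In user-space terms, that path corresponds to filling a udmabuf's backing memfd via sendfile, roughly as below (a minimal sketch; the function name, file path, and error handling are illustrative):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/sendfile.h>
#include <unistd.h>

/* Fill the memfd backing a udmabuf from a disk file. The data is DMA'd
 * into the page cache, then CPU-copied into the memfd pages by sendfile.
 */
static int fill_memfd_from_disk(int memfd, const char *path, size_t len)
{
	int back_fd = open(path, O_RDONLY);
	off_t off = 0;

	if (back_fd < 0)
		return -1;

	/* [DISK] -> DMA -> [page cache] -> CPU copy -> [memfd pages] */
	while (off < (off_t)len) {
		ssize_t n = sendfile(memfd, back_fd, &off, len - off);
		if (n <= 0)
			break;
	}

	close(back_fd);
	return off == (off_t)len ? 0 : -1;
}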
Yeah, but why can't you work on improving that?
The read() syscall can't take a second, regular-file fd as a parameter, so I added an ioctl command. While copy_file_range() supports two fds (fd_in/fd_out), it blocks cross-fs use. Even without that restriction, file_out->f_op->copy_file_range would only enable dmabuf direct reads from regular files, not writes.
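That cross-fs restriction is easy to demonstrate from user space, and a dmabuf fd lives on an anonymous inode, so pairing it with a disk file is always a cross-fs copy. A minimal sketch with two ordinary files on different mounts (paths are illustrative):

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* fd_in and fd_out sit on different filesystems, so the kernel rejects
 * the copy with EXDEV. Paths are illustrative.
 */
int main(void)
{
	int fd_in = open("/mnt/ext4/data.bin", O_RDONLY);
	int fd_out = open("/mnt/f2fs/copy.bin", O_WRONLY | O_CREAT, 0644);

	if (fd_in < 0 || fd_out < 0)
		return 1;

	if (copy_file_range(fd_in, NULL, fd_out, NULL, 1 << 20, 0) < 0 &&
	    errno == EXDEV)
		fprintf(stderr, "cross-fs copy_file_range rejected\n");

	close(fd_in);
	close(fd_out);
	return 0;
}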
Since dmabuf's direct I/O limitation comes from its unique attachment/map/fence model and lacks suitable syscalls, adding an ioctl seems necessary.
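For illustration, such an ioctl could look something like the sketch below. The command name, ioctl number, and struct layout here are invented for this sketch and do not describe the actual patch.

#include <linux/ioctl.h>
#include <linux/types.h>

/* Hypothetical uapi sketch only; not the real proposal. */
struct dma_buf_read_file {
	__s32 fd;          /* regular file to read from (opened O_DIRECT) */
	__u32 pad;         /* padding, must be zero */
	__u64 file_offset; /* byte offset into the regular file */
	__u64 buf_offset;  /* byte offset into the dmabuf */
	__u64 len;         /* number of bytes to transfer */
};

#define DMA_BUF_IOCTL_READ_FILE	_IOW('b', 0x10, struct dma_buf_read_file)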
I absolutely don't see that. Both splice and sendfile can take two regular file descriptors.
That the underlying fops currently can't do that is not a valid argument for adding new uAPI. It just means that you need to work on improving those fops.
As long as nobody proves to me that the existing uAPI isn't sufficient for this use case, I will systematically reject any approach to adding a new one.
Regards, Christian.
When a system exporter returns a duplicated sg_table via map_dma_buf (which the importer then uses exclusively, like a pages array), the exporter should retain control over that table.
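For context, that duplication looks roughly like the system heap's dup_sg_table() in drivers/dma-buf/heaps/system_heap.c, simplified here: the exporter copies its table page by page and hands the copy to the attachment while keeping the original.

#include <linux/err.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

/* Simplified sketch modeled on the system heap: duplicate the exporter's
 * sg_table so each attachment gets its own copy to map and unmap, while
 * the exporter retains the original.
 */
static struct sg_table *dup_sg_table(struct sg_table *table)
{
	struct sg_table *new_table;
	struct scatterlist *sg, *new_sg;
	int ret, i;

	new_table = kzalloc(sizeof(*new_table), GFP_KERNEL);
	if (!new_table)
		return ERR_PTR(-ENOMEM);

	ret = sg_alloc_table(new_table, table->orig_nents, GFP_KERNEL);
	if (ret) {
		kfree(new_table);
		return ERR_PTR(-ENOMEM);
	}

	new_sg = new_table->sgl;
	for_each_sgtable_sg(table, sg, i) {
		sg_set_page(new_sg, sg_page(sg), sg->length, sg->offset);
		new_sg = sg_next(new_sg);
	}

	return new_table;
}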
I welcome all solutions to achieve dmabuf direct I/O! Your feedback is greatly appreciated.
1G from ext4 on 6.12.20  | read/sendfile (ms) w/ 3 > drop_caches
-------------------------|--------------------------------------
udmabuf buffer read      | 1210
udmabuf direct read      |  671
udmabuf buffer sendfile  | 1096
udmabuf direct sendfile  | 2340
> dmabuf buffer read     |  51 | 1068 | 1118
> dmabuf direct read     |  52 |  297 |  349
>
> udmabuf sendfile test steps:
> 1. Open data file (1024MB), get back_fd
> 2. Create memfd (32MB)   # Loop steps 2-6
> 3. Allocate udmabuf with memfd
> 4. Call sendfile(memfd, back_fd)
> 5. Close memfd after sendfile
> 6. Close udmabuf
> 7. Close back_fd