On 6/9/25 11:32, wangtao wrote:
-----Original Message-----
From: Christoph Hellwig <hch@infradead.org>
Sent: Monday, June 9, 2025 12:35 PM
To: Christian König <christian.koenig@amd.com>
Cc: wangtao <tao.wangtao@honor.com>; Christoph Hellwig <hch@infradead.org>; sumit.semwal@linaro.org; kraxel@redhat.com; vivek.kasireddy@intel.com; viro@zeniv.linux.org.uk; brauner@kernel.org; hughd@google.com; akpm@linux-foundation.org; amir73il@gmail.com; benjamin.gaignard@collabora.com; Brian.Starkey@arm.com; jstultz@google.com; tjmercier@google.com; jack@suse.cz; baolin.wang@linux.alibaba.com; linux-media@vger.kernel.org; dri-devel@lists.freedesktop.org; linaro-mm-sig@lists.linaro.org; linux-kernel@vger.kernel.org; linux-fsdevel@vger.kernel.org; linux-mm@kvack.org; wangbintian(BintianWang) <bintian.wang@honor.com>; yipengxiang <yipengxiang@honor.com>; liulu 00013167 <liulu.liu@honor.com>; hanfeng 00012985 <feng.han@honor.com>
Subject: Re: [PATCH v4 0/4] Implement dmabuf direct I/O via copy_file_range
On Fri, Jun 06, 2025 at 01:20:48PM +0200, Christian König wrote:
dmabuf acts as a driver and shouldn't be handled by the VFS, so I had dmabuf implement copy_file_range callbacks to support zero-copy direct I/O. I'm open to both approaches. What's the preference of the VFS experts?
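For readers following the thread: the series essentially adds a copy_file_range handler on the dma-buf file. The sketch below only shows that rough shape under that assumption; dmabuf_direct_copy() is an invented placeholder for the exporter-specific copy path, not code from the actual patches.

#include <linux/dma-buf.h>
#include <linux/fs.h>

/*
 * Sketch only: where a copy_file_range hook would sit on the dma-buf
 * file.  dmabuf_direct_copy() is a hypothetical placeholder.
 */
static ssize_t dma_buf_copy_file_range(struct file *file_in, loff_t pos_in,
				       struct file *file_out, loff_t pos_out,
				       size_t len, unsigned int flags)
{
	struct dma_buf *dmabuf = file_out->private_data;

	/* Issue direct I/O from file_in straight into the exporter's memory. */
	return dmabuf_direct_copy(dmabuf, file_in, pos_in, pos_out, len);
}

static const struct file_operations dma_buf_fops_sketch = {
	/* ... existing dma-buf fops (mmap, ioctl, poll, ...) ... */
	.copy_file_range = dma_buf_copy_file_range,
};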
That would probably be illegal. Using the sg_table in the DMA-buf implementation turned out to be a mistake.
Two things here should not be directly conflated. Using the sg_table was a huge mistake, and we should try to switch dmabuf to a pure
I'm a bit confused: don't dmabuf importers need to traverse sg_table to access folios or dma_addr/len? Do you mean restricting sg_table access (e.g., only via iov_iter) or proposing alternative approaches?
No, accessing pages/folios inside the sg_table of a DMA-buf is strictly forbidden.
We have removed most use cases of that over the years and push back on generating new ones.
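To make the distinction concrete, here is a minimal sketch of the pattern importers are expected to follow: consume only the DMA addresses/lengths of a mapped attachment and never dereference the backing pages. hw_queue_segment() is an invented placeholder for whatever the importer programs its hardware with.

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

static int importer_program_dma(struct dma_buf_attachment *attach)
{
	struct sg_table *sgt;
	struct scatterlist *sg;
	int i;

	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	if (IS_ERR(sgt))
		return PTR_ERR(sgt);

	/* Walk only the DMA-mapped entries; never call sg_page() here. */
	for_each_sgtable_dma_sg(sgt, sg, i)
		hw_queue_segment(sg_dma_address(sg), sg_dma_len(sg)); /* placeholder */

	dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
	return 0;
}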
dma_addr_t/len array now that the new DMA API supporting that has been merged. Is there any chance the dma-buf maintainers could start to kick this off? I'm of course happy to assist.
Work on that has already been underway for some time.
Most GPU drivers already do the sg_table -> DMA array conversion; I need to push on the remaining ones to clean this up.
But there are also tons of other users of dma_buf_map_attachment() which need to be converted.
But that notwithstanding, dma-buf is THE buffer sharing mechanism in the kernel, and we should promote it instead of reinventing it badly. And there is a use case for having a fully DMA mapped buffer in the block layer and I/O path, especially on systems with an IOMMU. So having an iov_iter backed by a dma-buf would be extremely helpful. That's mostly lib/iov_iter.c code, not VFS, though.
Are you suggesting adding an ITER_DMABUF type to iov_iter, or implementing dmabuf-to-iov_bvec conversion within iov_iter?
That would be rather nice to have, yeah.
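For reference, the second option wangtao names ("dmabuf-to-iov_bvec conversion") would amount to building a bio_vec array and handing it to iov_iter_bvec(). The sketch below assumes the exporter already owns its pages (as udmabuf does); note that it still needs struct page pointers, which is exactly what importers are told not to pull out of the sg_table, and a hypothetical ITER_DMABUF type carrying dma_addr/len pairs would avoid that. No such iterator type exists upstream today.

#include <linux/bvec.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/uio.h>

/*
 * Sketch: wrap an exporter's own pages in a bvec-backed iov_iter.
 * Purely illustrative; the caller frees @bvec after the I/O completes.
 */
static int exporter_build_bvec_iter(struct iov_iter *iter, struct page **pages,
				    unsigned long nr_pages, size_t total)
{
	struct bio_vec *bvec;
	unsigned long i;

	bvec = kcalloc(nr_pages, sizeof(*bvec), GFP_KERNEL);
	if (!bvec)
		return -ENOMEM;

	for (i = 0; i < nr_pages; i++)
		bvec_set_page(&bvec[i], pages[i], PAGE_SIZE, 0);

	/* ITER_DEST: the iterator is the destination of a read. */
	iov_iter_bvec(iter, ITER_DEST, bvec, nr_pages, total);
	return 0;
}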
The question Christoph raised was rather why is your CPU so slow that walking the page tables has a significant overhead compared to the actual I/O?
Yes, that's really puzzling and should be addressed first.
With a fast CPU (e.g., 3 GHz), GUP (get_user_pages) overhead is relatively low, as the 3 GHz tests show.
Even on a low end CPU walking the page tables and grabbing references shouldn't be that much of an overhead.
There must be some reason why you see so much CPU overhead, e.g. compound pages being broken up or similar, which should not happen in the first place.
Regards, Christian.
3 GHz results:

| 32x32MB Read 1024MB        |Creat-ms|Close-ms| I/O-ms |I/O-MB/s| I/O% |
|----------------------------|--------|--------|--------|--------|------|
| 1) memfd direct R/W        |      1 |    118 |    312 |   3448 | 100% |
| 2) u+memfd direct R/W      |    196 |    123 |    295 |   3651 | 105% |
| 3) u+memfd direct sendfile |    175 |    102 |    976 |   1100 |  31% |
| 4) u+memfd direct splice   |    173 |    103 |    443 |   2428 |  70% |
| 5) udmabuf buffer R/W      |    183 |    100 |    453 |   2375 |  68% |
| 6) dmabuf buffer R/W       |     34 |      4 |    427 |   2519 |  73% |
| 7) udmabuf direct c_f_r    |    200 |    102 |    278 |   3874 | 112% |
| 8) dmabuf direct c_f_r     |     36 |      5 |    269 |   4002 | 116% |
With a slower CPU (e.g., 1 GHz), GUP overhead becomes much more significant, as the 1 GHz tests show:

| 32x32MB Read 1024MB        |Creat-ms|Close-ms| I/O-ms |I/O-MB/s| I/O% |
|----------------------------|--------|--------|--------|--------|------|
| 1) memfd direct R/W        |      2 |    393 |    969 |   1109 | 100% |
| 2) u+memfd direct R/W      |    592 |    424 |    570 |   1884 | 169% |
| 3) u+memfd direct sendfile |    587 |    356 |   2229 |    481 |  43% |
| 4) u+memfd direct splice   |    568 |    352 |    795 |   1350 | 121% |
| 5) udmabuf buffer R/W      |    597 |    343 |   1238 |    867 |  78% |
| 6) dmabuf buffer R/W       |     69 |     13 |   1128 |    952 |  85% |
| 7) udmabuf direct c_f_r    |    595 |    345 |    372 |   2889 | 260% |
| 8) dmabuf direct c_f_r     |     80 |     13 |    274 |   3929 | 354% |
Regards, Wangtao.
On Tue, Jun 10, 2025 at 12:52:18PM +0200, Christian König wrote:
dma_addr_t/len array now that the new DMA API supporting that has been merged. Is there any chance the dma-buf maintainers could start to kick this off? I'm of course happy to assist.
Work on that has already been underway for some time.
Most GPU drivers already do the sg_table -> DMA array conversion; I need to push on the remaining ones to clean this up.
Do you have a pointer?
Yes, that's really puzzling and should be addressed first.
With a fast CPU (e.g., 3 GHz), GUP (get_user_pages) overhead is relatively low, as the 3 GHz tests show.
Even on a low end CPU walking the page tables and grabbing references shouldn't be that much of an overhead.
Yes.
There must be some reason why you see so much CPU overhead, e.g. compound pages being broken up or similar, which should not happen in the first place.
pin_user_pages unfortunately outputs an array of PAGE_SIZE-sized (modulo the offset and a shorter last entry) struct page pointers. The block direct I/O code has fairly recently grown code to reassemble folios from them, which did speed up some workloads.
Is this test using the block device or iomap direct I/O code? What kernel version is it run on?
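To illustrate the folio reassembly mentioned above: pin_user_pages() hands back one struct page pointer per PAGE_SIZE chunk, and the block direct I/O path coalesces consecutive entries that belong to the same folio back into larger segments. The following is a simplified sketch of that idea only, not the actual bio_iov_iter_get_pages() code.

#include <linux/mm.h>

/*
 * Simplified sketch: merge the per-PAGE_SIZE entries that GUP returns
 * back into folio-sized runs, roughly what the block direct I/O path
 * does when building bios.  Returns the number of merged segments.
 */
static unsigned long coalesce_into_folios(struct page **pages,
					  unsigned long nr_pages)
{
	unsigned long i = 0, segs = 0;

	while (i < nr_pages) {
		struct folio *folio = page_folio(pages[i]);
		unsigned long run = 1;

		/* Count how many consecutive entries stay in this folio. */
		while (i + run < nr_pages &&
		       page_folio(pages[i + run]) == folio &&
		       pages[i + run] == pages[i] + run)
			run++;

		/* One segment covers 'run' pages of the same folio. */
		segs++;
		i += run;
	}
	return segs;
}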