On Tue, Jun 10, 2025 at 12:52:18PM +0200, Christian König wrote:
dma_addr_t/len array now that the new DMA API supporting that has been merged. Is there any chance the dma-buf maintainers could start to kick this off? I'm of course happy to assist.
Work on that has already been underway for some time.
Most GPU drivers already do the sg_table -> DMA array conversion; I need to push on the remaining ones to clean up.
Do you have a pointer?
Yes, that's really puzzling and should be addressed first.
On a high-performance CPU (e.g., 3GHz), GUP (get_user_pages) overhead is relatively low, as observed in the 3GHz tests.
Even on a low end CPU walking the page tables and grabbing references shouldn't be that much of an overhead.
Yes.
There must be some reason why you see so much CPU overhead, e.g. compound pages being broken up or similar, which should not happen in the first place.
pin_user_pages unfortunately outputs an array of struct pages, each covering PAGE_SIZE (modulo the offset and a shorter last length). The block direct I/O code has fairly recently grown code to reassemble folios from them, which did speed up some workloads.
Is this test using the block device or iomap direct I/O code? What kernel version is it run on?
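
To illustrate the per-page output mentioned above, here is a minimal userspace sketch of merging a per-page array back into larger contiguous runs. It is not the kernel's code: struct pfn_extent and coalesce_pfns() are hypothetical names, plain page frame numbers stand in for struct page pointers, and physical contiguity stands in for the "same folio" check that real code would make.

```c
#include <stddef.h>
#include <stdint.h>

/* One physically contiguous run of pages (hypothetical type). */
struct pfn_extent {
	uint64_t start_pfn;	/* first page frame number of the run */
	size_t nr_pages;	/* number of consecutive pages in the run */
};

/*
 * Merge a per-page PFN array into contiguous extents.
 *
 * Walks pfns[] once; a page that directly follows the previous run
 * extends it, anything else starts a new run. Stops early if out[]
 * is full. Returns the number of extents written to out[].
 */
size_t coalesce_pfns(const uint64_t *pfns, size_t nr,
		     struct pfn_extent *out, size_t max_out)
{
	size_t n = 0;

	for (size_t i = 0; i < nr; i++) {
		/* Does this page continue the current run? */
		if (n > 0 &&
		    out[n - 1].start_pfn + out[n - 1].nr_pages == pfns[i]) {
			out[n - 1].nr_pages++;
			continue;
		}

		if (n == max_out)
			break;

		/* Start a new run at this page. */
		out[n].start_pfn = pfns[i];
		out[n].nr_pages = 1;
		n++;
	}

	return n;
}
```

With a 2MiB compound page backing the buffer, the 512 per-page entries would collapse into a single extent in this model, so the later per-segment work is done once instead of 512 times.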