Linaro-mm-sig July 2024

linaro-mm-sig@lists.linaro.org

19 participants
47 discussions

Re: [PATCH 0/2] Support direct I/O read and write for memory allocated by dmabuf

by Christian König

On 10.07.24 at 15:57, Lei Liu wrote:
> Use vm_insert_page to establish a mapping for the memory allocated
> by dmabuf, thus supporting direct I/O read and write; and fix the
> issue of incorrect memory statistics after mapping dmabuf memory.

Well big NAK to that! Direct I/O is intentionally disabled on DMA-bufs.

We already discussed enforcing that in the DMA-buf framework and this
patch probably means that we should really do that.

Regards,
Christian.

>
> Lei Liu (2):
>   mm: dmabuf_direct_io: Support direct_io for memory allocated by dmabuf
>   mm: dmabuf_direct_io: Fix memory statistics error for dmabuf allocated
>     memory with direct_io support
>
>  drivers/dma-buf/heaps/system_heap.c |  5 +++--
>  fs/proc/task_mmu.c                  |  8 +++++++-
>  include/linux/mm.h                  |  1 +
>  mm/memory.c                         | 15 ++++++++++-----
>  mm/rmap.c                           |  9 +++++----
>  5 files changed, 26 insertions(+), 12 deletions(-)
>

1 year, 12 months

Re: [PATCH 1/2] dma-buf: heaps: DMA_HEAP_IOCTL_ALLOC_READ_FILE framework

by kernel test robot

Hi Huan,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 523b23f0bee3014a7a752c9bb9f5c54f0eddae88]

url:    https://github.com/intel-lab-lkp/linux/commits/Huan-Yang/dma-buf-heaps-DMA_…
base:   523b23f0bee3014a7a752c9bb9f5c54f0eddae88
patch link:    https://lore.kernel.org/r/20240711074221.459589-2-link%40vivo.com
patch subject: [PATCH 1/2] dma-buf: heaps: DMA_HEAP_IOCTL_ALLOC_READ_FILE framework
config: i386-buildonly-randconfig-002-20240713 (https://download.01.org/0day-ci/archive/20240713/202407131825.A44mFGu1-lkp@…)
compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240713/202407131825.A44mFGu1-lkp@…)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407131825.A44mFGu1-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/dma-buf/dma-heap.c:293:18: warning: format specifies type 'long' but the argument has type 'ssize_t' (aka 'int') [-Wformat]
     292 | pr_err("failed to use buffer kernel_read_file %s, err=%ld, [%ld, %ld], f_sz=%ld\n",
         |                                                       ~~~
         |                                                       %zd
     293 |        pathp, err, start, fsz, fsz);
         |               ^~~
   include/linux/printk.h:533:33: note: expanded from macro 'pr_err'
     533 | printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
         |                        ~~~     ^~~~~~~~~~~
   include/linux/printk.h:490:60: note: expanded from macro 'printk'
     490 | #define printk(fmt, ...) printk_index_wrap(_printk, fmt, ##__VA_ARGS__)
         |                                                     ~~~    ^~~~~~~~~~~
   include/linux/printk.h:462:19: note: expanded from macro 'printk_index_wrap'
     462 | _p_func(_fmt, ##__VA_ARGS__); \
         |         ~~~~    ^~~~~~~~~~~
>> drivers/dma-buf/dma-heap.c:293:23: warning: format specifies type 'long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
     292 | pr_err("failed to use buffer kernel_read_file %s, err=%ld, [%ld, %ld], f_sz=%ld\n",
         |                                                             ~~~
         |                                                             %zu
     293 |        pathp, err, start, fsz, fsz);
         |                    ^~~~~
   include/linux/printk.h:533:33: note: expanded from macro 'pr_err'
     533 | printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
         |                        ~~~     ^~~~~~~~~~~
   include/linux/printk.h:490:60: note: expanded from macro 'printk'
     490 | #define printk(fmt, ...) printk_index_wrap(_printk, fmt, ##__VA_ARGS__)
         |                                                     ~~~    ^~~~~~~~~~~
   include/linux/printk.h:462:19: note: expanded from macro 'printk_index_wrap'
     462 | _p_func(_fmt, ##__VA_ARGS__); \
         |         ~~~~    ^~~~~~~~~~~
   drivers/dma-buf/dma-heap.c:293:30: warning: format specifies type 'long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
     292 | pr_err("failed to use buffer kernel_read_file %s, err=%ld, [%ld, %ld], f_sz=%ld\n",
         |                                                                  ~~~
         |                                                                  %zu
     293 |        pathp, err, start, fsz, fsz);
         |                           ^~~
   include/linux/printk.h:533:33: note: expanded from macro 'pr_err'
     533 | printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
         |                        ~~~     ^~~~~~~~~~~
   include/linux/printk.h:490:60: note: expanded from macro 'printk'
     490 | #define printk(fmt, ...) printk_index_wrap(_printk, fmt, ##__VA_ARGS__)
         |                                                     ~~~    ^~~~~~~~~~~
   include/linux/printk.h:462:19: note: expanded from macro 'printk_index_wrap'
     462 | _p_func(_fmt, ##__VA_ARGS__); \
         |         ~~~~    ^~~~~~~~~~~
   drivers/dma-buf/dma-heap.c:293:35: warning: format specifies type 'long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
     292 | pr_err("failed to use buffer kernel_read_file %s, err=%ld, [%ld, %ld], f_sz=%ld\n",
         |                                                                             ~~~
         |                                                                             %zu
     293 |        pathp, err, start, fsz, fsz);
         |                                ^~~
   include/linux/printk.h:533:33: note: expanded from macro 'pr_err'
     533 | printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
         |                        ~~~     ^~~~~~~~~~~
   include/linux/printk.h:490:60: note: expanded from macro 'printk'
     490 | #define printk(fmt, ...) printk_index_wrap(_printk, fmt, ##__VA_ARGS__)
         |                                                     ~~~    ^~~~~~~~~~~
   include/linux/printk.h:462:19: note: expanded from macro 'printk_index_wrap'
     462 | _p_func(_fmt, ##__VA_ARGS__); \
         |         ~~~~    ^~~~~~~~~~~
   drivers/dma-buf/dma-heap.c:367:10: warning: format specifies type 'long' but the argument has type 'ssize_t' (aka 'int') [-Wformat]
     366 | pr_err("use kernel_read_file, err=%ld, [%ld, %ld], f_sz=%ld\n",
         |                                   ~~~
         |                                   %zd
     367 |        err, start, (start + size), heap_file->fsz);
         |        ^~~
   include/linux/printk.h:533:33: note: expanded from macro 'pr_err'
     533 | printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
         |                        ~~~     ^~~~~~~~~~~
   include/linux/printk.h:490:60: note: expanded from macro 'printk'
     490 | #define printk(fmt, ...) printk_index_wrap(_printk, fmt, ##__VA_ARGS__)
         |                                                     ~~~    ^~~~~~~~~~~
   include/linux/printk.h:462:19: note: expanded from macro 'printk_index_wrap'
     462 | _p_func(_fmt, ##__VA_ARGS__); \
         |         ~~~~    ^~~~~~~~~~~
   drivers/dma-buf/dma-heap.c:367:15: warning: format specifies type 'long' but the argument has type 'ssize_t' (aka 'int') [-Wformat]
     366 | pr_err("use kernel_read_file, err=%ld, [%ld, %ld], f_sz=%ld\n",
         |                                         ~~~
         |                                         %zd
     367 |        err, start, (start + size), heap_file->fsz);
         |             ^~~~~
   include/linux/printk.h:533:33: note: expanded from macro 'pr_err'
     533 | printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
         |                        ~~~     ^~~~~~~~~~~
   include/linux/printk.h:490:60: note: expanded from macro 'printk'
     490 | #define printk(fmt, ...) printk_index_wrap(_printk, fmt, ##__VA_ARGS__)
         |                                                     ~~~    ^~~~~~~~~~~
   include/linux/printk.h:462:19: note: expanded from macro 'printk_index_wrap'
     462 | _p_func(_fmt, ##__VA_ARGS__); \
         |         ~~~~    ^~~~~~~~~~~
   drivers/dma-buf/dma-heap.c:367:22: warning: format specifies type 'long' but the argument has type 'ssize_t' (aka 'int') [-Wformat]
     366 | pr_err("use kernel_read_file, err=%ld, [%ld, %ld], f_sz=%ld\n",
         |                                              ~~~
         |                                              %zd
     367 |        err, start, (start + size), heap_file->fsz);
         |                    ^~~~~~~~~~~~~~
   include/linux/printk.h:533:33: note: expanded from macro 'pr_err'
     533 | printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
         |                        ~~~     ^~~~~~~~~~~
   include/linux/printk.h:490:60: note: expanded from macro 'printk'
     490 | #define printk(fmt, ...) printk_index_wrap(_printk, fmt, ##__VA_ARGS__)
         |                                                     ~~~    ^~~~~~~~~~~
   include/linux/printk.h:462:19: note: expanded from macro 'printk_index_wrap'
     462 | _p_func(_fmt, ##__VA_ARGS__); \
         |         ~~~~    ^~~~~~~~~~~
   drivers/dma-buf/dma-heap.c:367:38: warning: format specifies type 'long' but the argument has type 'size_t' (aka 'unsigned int') [-Wformat]
     366 | pr_err("use kernel_read_file, err=%ld, [%ld, %ld], f_sz=%ld\n",
         |                                                         ~~~
         |                                                         %zu
     367 |        err, start, (start + size), heap_file->fsz);
         |                                    ^~~~~~~~~~~~~~
   include/linux/printk.h:533:33: note: expanded from macro 'pr_err'
     533 | printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
         |                        ~~~     ^~~~~~~~~~~
   include/linux/printk.h:490:60: note: expanded from macro 'printk'
     490 | #define printk(fmt, ...) printk_index_wrap(_printk, fmt, ##__VA_ARGS__)


vim +293 drivers/dma-buf/dma-heap.c

   239
   240  int dma_heap_submit_file_read(struct dma_heap_file_task *heap_ftask)
   241  {
   242          struct dma_heap_file_work *heap_fwork = init_file_work(heap_ftask);
   243          struct page *last = NULL;
   244          struct dma_heap_file *heap_file = heap_ftask->heap_file;
   245          size_t start = heap_ftask->roffset;
   246          struct file *file = heap_file->file;
   247          size_t fsz = heap_file->fsz;
   248
   249          if (unlikely(!heap_fwork))
   250                  return -ENOMEM;
   251
   252          /**
   253           * If file size is not page aligned, direct io can't process the tail.
   254           * So, if reach to tail, remain the last page use buffer read.
   255           */
   256          if (heap_file->direct && start + heap_ftask->rsize > fsz) {
   257                  heap_fwork->need_size -= PAGE_SIZE;
   258                  last = heap_ftask->parray[heap_ftask->pindex - 1];
   259          }
   260
   261          spin_lock(&heap_fctl->lock);
   262          list_add_tail(&heap_fwork->list, &heap_fctl->works);
   263          spin_unlock(&heap_fctl->lock);
   264          atomic_inc(&heap_fctl->nr_work);
   265
   266          wake_up(&heap_fctl->threadwq);
   267
   268          if (last) {
   269                  char *buf, *pathp;
   270                  ssize_t err;
   271                  void *buffer;
   272
   273                  buf = kmalloc(PATH_MAX, GFP_KERNEL);
   274                  if (unlikely(!buf))
   275                          return -ENOMEM;
   276
   277                  start = PAGE_ALIGN_DOWN(fsz);
   278
   279                  pathp = file_path(file, buf, PATH_MAX);
   280                  if (IS_ERR(pathp)) {
   281                          kfree(buf);
   282                          return PTR_ERR(pathp);
   283                  }
   284
   285                  buffer = kmap_local_page(last); // use page's kaddr.
   286                  err = kernel_read_file_from_path(pathp, start, &buffer,
   287                                                   fsz - start, &fsz,
   288                                                   READING_POLICY);
   289                  kunmap_local(buffer);
   290                  kfree(buf);
   291                  if (err < 0) {
   292                          pr_err("failed to use buffer kernel_read_file %s, err=%ld, [%ld, %ld], f_sz=%ld\n",
 > 293                                 pathp, err, start, fsz, fsz);
   294
   295                          return err;
   296                  }
   297          }
   298
   299          heap_ftask->roffset += heap_ftask->rsize;
   300          heap_ftask->rsize = 0;
   301          heap_ftask->pindex = 0;
   302          heap_ftask->rbatch = min_t(size_t,
   303                                     PAGE_ALIGN(fsz) - heap_ftask->roffset,
   304                                     heap_ftask->rbatch);
   305          return 0;
   306  }
   307

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
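
Editor's sketch: every warning in the report above comes from passing an `ssize_t` or `size_t` value to `%ld`, which only matches `long`. On the i386 config these types are 32-bit `int` and `unsigned int`, so the mismatch is real there even though it goes unnoticed on 64-bit builds. The `z` length modifier that clang proposes in its fix-it hints (`%zd` for `ssize_t`, `%zu` for `size_t`) is correct on both. The userspace example below illustrates this with `snprintf`; `format_read_error` is a hypothetical helper and not part of the patch, and the example assumes the kernel's `pr_err` accepts the same modifiers, as the compiler hints suggest.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>   /* ssize_t */

/*
 * Hypothetical helper (not from the patch): formats the same fields as the
 * pr_err() call at dma-heap.c:292, but with length modifiers that match the
 * argument types on 32-bit and 64-bit targets alike:
 *   %zd for ssize_t (err), %zu for size_t (start, fsz).
 * Returns the snprintf() result.
 */
static int format_read_error(char *out, size_t outsz, const char *path,
                             ssize_t err, size_t start, size_t fsz)
{
    assert(out != NULL && outsz > 0);
    return snprintf(out, outsz,
                    "failed to use buffer kernel_read_file %s, err=%zd, [%zu, %zu], f_sz=%zu",
                    path, err, start, fsz, fsz);
}
```

For example, calling `format_read_error(msg, sizeof(msg), "/data/model.txt", -5, 4096, 5000)` fills `msg` with `failed to use buffer kernel_read_file /data/model.txt, err=-5, [4096, 5000], f_sz=5000`. The second `pr_err` at line 366 would need `%zd` for `err`, `start` and `start + size` (all `ssize_t` in that function) and `%zu` for `heap_file->fsz`.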

2 years

Re: [PATCH 1/2] dma-buf: heaps: DMA_HEAP_IOCTL_ALLOC_READ_FILE framework

by Christian König

On 12.07.24 at 09:52, Huan Yang wrote:
>
> On 2024/7/12 15:41, Christian König wrote:
>> On 12.07.24 at 09:29, Huan Yang wrote:
>>> Hi Christian,
>>>
>>> On 2024/7/12 15:10, Christian König wrote:
>>>> On 12.07.24 at 04:14, Huan Yang wrote:
>>>>> On 2024/7/12 9:59, Huan Yang wrote:
>>>>>> Hi Christian,
>>>>>>
>>>>>> On 2024/7/11 19:39, Christian König wrote:
>>>>>>> On 11.07.24 at 11:18, Huan Yang wrote:
>>>>>>>> Hi Christian,
>>>>>>>>
>>>>>>>> Thanks for your reply.
>>>>>>>>
>>>>>>>> On 2024/7/11 17:00, Christian König wrote:
>>>>>>>>> On 11.07.24 at 09:42, Huan Yang wrote:
>>>>>>>>>> Some users may need to load a file into a dma-buf; the current
>>>>>>>>>> way is:
>>>>>>>>>>    1. allocate a dma-buf, get dma-buf fd
>>>>>>>>>>    2. mmap dma-buf fd into vaddr
>>>>>>>>>>    3. read(file_fd, vaddr, fsz)
>>>>>>>>>> This is too heavy if fsz reaches the GB range.
>>>>>>>>>
>>>>>>>>> You need to describe a bit more why that is too heavy. I can
>>>>>>>>> only assume you need to save memory bandwidth and avoid the
>>>>>>>>> extra copy with the CPU.
>>>>>>>>
>>>>>>>> Sorry for the oversimplified explanation. But, yes, you're
>>>>>>>> right, we want to avoid this.
>>>>>>>>
>>>>>>>> As we are dealing with embedded devices, the available memory
>>>>>>>> and computing power for users are usually limited. (The maximum
>>>>>>>> available memory is currently 24GB, typically ranging from
>>>>>>>> 8-12GB.)
>>>>>>>>
>>>>>>>> Also, the CPU computing power is usually in short supply,
>>>>>>>> due to limited battery capacity and limited heat dissipation
>>>>>>>> capabilities.
>>>>>>>>
>>>>>>>> So, we hope to avoid ineffective paths as much as possible.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> This patch implements a feature called
>>>>>>>>>> DMA_HEAP_IOCTL_ALLOC_READ_FILE.
>>>>>>>>>> The user needs to offer a file_fd to load into the
>>>>>>>>>> dma-buf; then,
>>>>>>>>>> it promises that if you get a dma-buf fd, it will contain the
>>>>>>>>>> file content.
>>>>>>>>>
>>>>>>>>> Interesting idea, that has at least more potential than trying
>>>>>>>>> to enable direct I/O on mmap()ed DMA-bufs.
>>>>>>>>>
>>>>>>>>> The approach with the new IOCTL might not work because it is a
>>>>>>>>> very specialized use case.
>>>>>>>>
>>>>>>>> Thank you for your advice. Maybe the "read file" behavior can
>>>>>>>> be attached to an existing allocation?
>>>>>>>
>>>>>>> The point is there are already system calls to do something like
>>>>>>> that.
>>>>>>>
>>>>>>> See copy_file_range()
>>>>>>> (https://man7.org/linux/man-pages/man2/copy_file_range.2.html)
>>>>>>> and sendfile()
>>>>>>> (https://man7.org/linux/man-pages/man2/sendfile.2.html).
>>>>>>
>>>>>> That's helpful to learn, thanks.
>>>>>>
>>>>>> In terms of only DMA-BUF supporting direct I/O,
>>>>>> copy_file_range/sendfile may help to achieve this functionality.
>>>>>>
>>>>>> However, my patchset also aims to achieve parallel copying of
>>>>>> file contents while allocating the DMA-BUF, which is something
>>>>>> that the current set of calls may not be able to accomplish.
>>>>
>>>> And exactly that is a no-go. Use the existing IOCTLs and system
>>>> calls instead; they should have similar performance when done right.
>>>
>>> Got it, but in my testing process, even without memory pressure, it
>>> takes about 60ms to allocate a 3GB DMA-BUF. When there is
>>> significant memory pressure, the allocation time for a 3GB
>>
>> Well exactly that doesn't make sense. Even if you read the content of
>> the DMA-buf from a file you still need to allocate it first.
>
> Yes, it needs to be allocated first, but in kernelspace there is no
> need to wait until all memory is allocated before triggering the file
> load.

That doesn't really make sense. Allocating a large bunch of memory is
more efficient than allocating less multiple times because of cache
locality for example.

You could of course hide latency caused by operations to reduce memory
pressure when you have a specific use case, but you don't need to use an
in-kernel implementation for that.

Question is do you have clear on allocation or clear on free enabled?

> This patchset uses `batch` to do this (default 128MB): every 128MB
> allocated, vmap and get vaddr, then trigger this vaddr to load the
> file's target pos content.

Again that sounds really not ideal to me. Creating the vmap alone is
completely unnecessary overhead.

>> So the question is why should reading and allocating it at the same
>> time be better in any way?
>
> Memory pressure will trigger reclaim, and it must wait (ms). Assume I
> already allocated 512MB (need 3G) without entering the slowpath.
>
> Even if I need to enter the slowpath to allocate the remaining memory,
> the already allocated memory is being used to load file content. (This
> saves time compared to allocating everything and then reading.)
>
> The time difference between them can be expressed by the formula:
>
> 1. Allocate dmabuf time + file load time -- for original
>
> 2. first prepare batch time + Max(file load time, allocate remain
> dma-buf time) + latest batch prepare time -- for new
>
> When the file reaches the gigabyte level, the significant difference
> between the two can be clearly observed.

I have strong doubts about that. The method you describe above is
actually really inefficient.

First of all you create a memory mapping just to load data, that is
superfluous and TLB flushes are usually extremely costly. Both for
userspace as well as kernel.

I strongly suggest trying copy_file_range() instead. But it could be
that copy_file_range() doesn't even work right now because of some
restrictions, never tried that on a DMA-buf.

When that works, as far as I can see what could still be saved on
overhead is the following:

1. Clearing of memory on allocation. That could potentially be done with
delayed allocation or clear on free instead.

2. CPU copy between the I/O target buffer and the DMA-buf backing pages.
In theory it should be possible to avoid that by implementing the
copy_file_range() callback, but I'm not 100% sure.

Regards,
Christian.

>
>>
>> Regards,
>> Christian.
>>
>>>
>>>
>>> DMA-BUF can increase to 300ms-1s. (The above test times can also
>>> demonstrate the difference.)
>>>
>>> But, talk is cheap, I agree to research using the existing way to
>>> implement it and give it a test.
>>>
>>> I'll show this when I'm done.
>>>
>>> Thanks for your suggestions.
>>>
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>
>>>>> You can see the cover letter; here is a comparison of the normal
>>>>> test and this IOCTL under memory pressure. Even buffered I/O in
>>>>> this ioctl can improve by 50% through parallelism.
>>>>>
>>>>> dd a 3GB file for test, 12G RAM phone, UFS4.0, stressapptest 4G
>>>>> memory pressure.
>>>>>
>>>>> 1. original
>>>>> ```shell
>>>>> # create a model file
>>>>> dd if=/dev/zero of=./model.txt bs=1M count=3072
>>>>> # drop page cache
>>>>> echo 3 > /proc/sys/vm/drop_caches
>>>>> ./dmabuf-heap-file-read mtk_mm-uncached normal
>>>>>
>>>>>> result is total cost 13087213847ns
>>>>>
>>>>> ```
>>>>>
>>>>> 2. DMA_HEAP_IOCTL_ALLOC_AND_READ O_DIRECT
>>>>> ```shell
>>>>> # create a model file
>>>>> dd if=/dev/zero of=./model.txt bs=1M count=3072
>>>>> # drop page cache
>>>>> echo 3 > /proc/sys/vm/drop_caches
>>>>> ./dmabuf-heap-file-read mtk_mm-uncached direct_io
>>>>>
>>>>>> result is total cost 2902386846ns
>>>>>
>>>>> # use direct_io_check to check whether the content matches the file.
>>>>> ```
>>>>>
>>>>> 3. DMA_HEAP_IOCTL_ALLOC_AND_READ BUFFER I/O
>>>>> ```shell
>>>>> # create a model file
>>>>> dd if=/dev/zero of=./model.txt bs=1M count=3072
>>>>> # drop page cache
>>>>> echo 3 > /proc/sys/vm/drop_caches
>>>>> ./dmabuf-heap-file-read mtk_mm-uncached normal_io
>>>>>
>>>>>> result is total cost 5735579385ns
>>>>>
>>>>> ```
>>>>>
>>>>>>
>>>>>> Perhaps simply returning the DMA-BUF file descriptor and then
>>>>>> implementing copy_file_range, while populating the memory and
>>>>>> content during the copy process, could achieve this? At present,
>>>>>> it seems that it will be quite complex - We need to ensure that
>>>>>> only the returned DMA-BUF file descriptor will fail in case the
>>>>>> memory is not filled, like mmap, vmap, attach, and so on.
>>>>>>
>>>>>>>
>>>>>>> What we probably could do is to internally optimize those.
>>>>>>>
>>>>>>>> I am currently creating a new ioctl to remind the user that
>>>>>>>> memory is being allocated and read, and I am also unsure
>>>>>>>> whether it is appropriate to add additional parameters to the
>>>>>>>> existing allocate behavior.
>>>>>>>>
>>>>>>>> Please give me more suggestions. Thanks.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> But IIRC there was a copy_file_range callback in the
>>>>>>>>> file_operations structure you could use for that. I'm just not
>>>>>>>>> sure when and how that's used with the copy_file_range()
>>>>>>>>> system call.
>>>>>>>>
>>>>>>>> Sorry, I'm not familiar with this, but I will look into it.
>>>>>>>> However, this type of callback function is not currently
>>>>>>>> implemented when exporting the dma_buf file, which means that
>>>>>>>> I need to implement the callback for it?
>>>>>>>
>>>>>>> If I'm not completely mistaken the copy_file_range, splice_read
>>>>>>> and splice_write callbacks on the struct file_operations
>>>>>>> (https://elixir.bootlin.com/linux/v6.10-rc7/source/include/linux/fs.h#L1999).
>>>>>>>
>>>>>>> Can be used to implement what you want to do.
>>>>>> Yes.
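
Editor's sketch: the thread repeatedly points at the existing `copy_file_range()` system call as the preferred interface. The userspace example below shows the usual calling pattern, a loop that tolerates short copies. It is only a sketch under stated assumptions: `copy_fd_range` and `copy_demo` are hypothetical names, the self-check runs on two ordinary temporary files in `/tmp`, and, as noted in the thread, it is not established that the call works when the destination is a DMA-buf file descriptor.

```c
#define _GNU_SOURCE          /* exposes copy_file_range() in <unistd.h> (glibc >= 2.27) */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy up to `len` bytes from in_fd to out_fd with
 * copy_file_range(2), advancing both descriptors' file offsets.
 * Returns the number of bytes copied, or -1 with errno set.
 */
static ssize_t copy_fd_range(int in_fd, int out_fd, size_t len)
{
    size_t done = 0;

    assert(in_fd >= 0 && out_fd >= 0);
    while (done < len) {
        ssize_t n = copy_file_range(in_fd, NULL, out_fd, NULL, len - done, 0);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;          /* e.g. EXDEV, EINVAL, ENOSYS, EOPNOTSUPP */
        }
        if (n == 0)             /* source exhausted before len bytes */
            break;
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/*
 * Self-check on two ordinary temporary files (NOT DMA-bufs).
 * Returns 0 if the copy round-trips, 1 if the kernel or filesystem
 * refuses copy_file_range(), -1 on any other failure.
 */
static int copy_demo(void)
{
    static const char payload[] = "model weights would go here";
    char src[] = "/tmp/cfr-src-XXXXXX", dst[] = "/tmp/cfr-dst-XXXXXX";
    char back[sizeof(payload)] = { 0 };
    int in_fd = mkstemp(src), out_fd = mkstemp(dst), ret = -1;
    ssize_t n;

    if (in_fd < 0 || out_fd < 0)
        goto out;
    if (write(in_fd, payload, sizeof(payload)) != (ssize_t)sizeof(payload) ||
        lseek(in_fd, 0, SEEK_SET) != 0)
        goto out;

    n = copy_fd_range(in_fd, out_fd, sizeof(payload));
    if (n < 0) {
        if (errno == ENOSYS || errno == EXDEV || errno == EINVAL ||
            errno == EOPNOTSUPP)
            ret = 1;
        goto out;
    }
    if (n == (ssize_t)sizeof(payload) &&
        pread(out_fd, back, sizeof(back), 0) == (ssize_t)sizeof(back) &&
        memcmp(back, payload, sizeof(payload)) == 0)
        ret = 0;
out:
    if (in_fd >= 0) { close(in_fd); unlink(src); }
    if (out_fd >= 0) { close(out_fd); unlink(dst); }
    return ret;
}
```

Passing `NULL` for both offset pointers makes the kernel use and advance each descriptor's own file position, which keeps the loop simple. Whether a DMA-buf exporter would need to implement the `copy_file_range` callback in `struct file_operations` for this to succeed is exactly the open question raised in the discussion.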
>>>>>>> >>>>>>> Regards, >>>>>>> Christian. >>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Regards, >>>>>>>>> Christian. >>>>>>>>> >>>>>>>>>> >>>>>>>>>> Notice, file_fd depends on user how to open this file. So, >>>>>>>>>> both buffer >>>>>>>>>> I/O and Direct I/O is supported. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Huan Yang <link(a)vivo.com> >>>>>>>>>> --- >>>>>>>>>> drivers/dma-buf/dma-heap.c | 525 >>>>>>>>>> +++++++++++++++++++++++++++++++++- >>>>>>>>>> include/linux/dma-heap.h | 57 +++- >>>>>>>>>> include/uapi/linux/dma-heap.h | 32 +++ >>>>>>>>>> 3 files changed, 611 insertions(+), 3 deletions(-) >>>>>>>>>> >>>>>>>>>> diff --git a/drivers/dma-buf/dma-heap.c >>>>>>>>>> b/drivers/dma-buf/dma-heap.c >>>>>>>>>> index 2298ca5e112e..abe17281adb8 100644 >>>>>>>>>> --- a/drivers/dma-buf/dma-heap.c >>>>>>>>>> +++ b/drivers/dma-buf/dma-heap.c >>>>>>>>>> @@ -15,9 +15,11 @@ >>>>>>>>>> #include <linux/list.h> >>>>>>>>>> #include <linux/slab.h> >>>>>>>>>> #include <linux/nospec.h> >>>>>>>>>> +#include <linux/highmem.h> >>>>>>>>>> #include <linux/uaccess.h> >>>>>>>>>> #include <linux/syscalls.h> >>>>>>>>>> #include <linux/dma-heap.h> >>>>>>>>>> +#include <linux/vmalloc.h> >>>>>>>>>> #include <uapi/linux/dma-heap.h> >>>>>>>>>> #define DEVNAME "dma_heap" >>>>>>>>>> @@ -43,12 +45,462 @@ struct dma_heap { >>>>>>>>>> struct cdev heap_cdev; >>>>>>>>>> }; >>>>>>>>>> +/** >>>>>>>>>> + * struct dma_heap_file - wrap the file, read task for >>>>>>>>>> dma_heap allocate use. >>>>>>>>>> + * @file: file to read from. >>>>>>>>>> + * >>>>>>>>>> + * @cred: kthread use, user cred copy to use for the >>>>>>>>>> read. >>>>>>>>>> + * >>>>>>>>>> + * @max_batch: maximum batch size to read, if collect >>>>>>>>>> match batch, >>>>>>>>>> + * trigger read, default 128MB, must below file >>>>>>>>>> size. >>>>>>>>>> + * >>>>>>>>>> + * @fsz: file size. >>>>>>>>>> + * >>>>>>>>>> + * @direct: use direct IO? 
>>>>>>>>>> + */ >>>>>>>>>> +struct dma_heap_file { >>>>>>>>>> + struct file *file; >>>>>>>>>> + struct cred *cred; >>>>>>>>>> + size_t max_batch; >>>>>>>>>> + size_t fsz; >>>>>>>>>> + bool direct; >>>>>>>>>> +}; >>>>>>>>>> + >>>>>>>>>> +/** >>>>>>>>>> + * struct dma_heap_file_work - represents a dma_heap file >>>>>>>>>> read real work. >>>>>>>>>> + * @vaddr: contigous virtual address alloc by vmap, >>>>>>>>>> file read need. >>>>>>>>>> + * >>>>>>>>>> + * @start_size: file read start offset, same to >>>>>>>>>> @dma_heap_file_task->roffset. >>>>>>>>>> + * >>>>>>>>>> + * @need_size: file read need size, same to >>>>>>>>>> @dma_heap_file_task->rsize. >>>>>>>>>> + * >>>>>>>>>> + * @heap_file: file wrapper. >>>>>>>>>> + * >>>>>>>>>> + * @list: child node of @dma_heap_file_control->works. >>>>>>>>>> + * >>>>>>>>>> + * @refp: same @dma_heap_file_task->ref, if end of >>>>>>>>>> read, put ref. >>>>>>>>>> + * >>>>>>>>>> + * @failp: if any work io failed, set it true, pointp >>>>>>>>>> @dma_heap_file_task->fail. >>>>>>>>>> + */ >>>>>>>>>> +struct dma_heap_file_work { >>>>>>>>>> + void *vaddr; >>>>>>>>>> + ssize_t start_size; >>>>>>>>>> + ssize_t need_size; >>>>>>>>>> + struct dma_heap_file *heap_file; >>>>>>>>>> + struct list_head list; >>>>>>>>>> + atomic_t *refp; >>>>>>>>>> + bool *failp; >>>>>>>>>> +}; >>>>>>>>>> + >>>>>>>>>> +/** >>>>>>>>>> + * struct dma_heap_file_task - represents a dma_heap file >>>>>>>>>> read process >>>>>>>>>> + * @ref: current file work counter, if zero, allocate >>>>>>>>>> and read >>>>>>>>>> + * done. >>>>>>>>>> + * >>>>>>>>>> + * @roffset: last read offset, current prepared work' >>>>>>>>>> begin file >>>>>>>>>> + * start offset. >>>>>>>>>> + * >>>>>>>>>> + * @rsize: current allocated page size use to read, >>>>>>>>>> if reach rbatch, >>>>>>>>>> + * trigger commit. >>>>>>>>>> + * >>>>>>>>>> + * @rbatch: current prepared work's batch, below >>>>>>>>>> @dma_heap_file's >>>>>>>>>> + * batch. 
>>>>>>>>>> + * >>>>>>>>>> + * @heap_file: current dma_heap_file >>>>>>>>>> + * >>>>>>>>>> + * @parray: used for vmap, size is @dma_heap_file's >>>>>>>>>> batch's number >>>>>>>>>> + * pages.(this is maximum). Due to single thread >>>>>>>>>> file read, >>>>>>>>>> + * one page array reuse each work prepare is OK. >>>>>>>>>> + * Each index in parray is PAGE_SIZE.(vmap need) >>>>>>>>>> + * >>>>>>>>>> + * @pindex: current allocated page filled in >>>>>>>>>> @parray's index. >>>>>>>>>> + * >>>>>>>>>> + * @fail: any work failed when file read? >>>>>>>>>> + * >>>>>>>>>> + * dma_heap_file_task is the production of file read, will >>>>>>>>>> prepare each work >>>>>>>>>> + * during allocate dma_buf pages, if match current batch, >>>>>>>>>> then trigger commit >>>>>>>>>> + * and prepare next work. After all batch queued, user going >>>>>>>>>> on prepare dma_buf >>>>>>>>>> + * and so on, but before return dma_buf fd, need to wait >>>>>>>>>> file read end and >>>>>>>>>> + * check read result. >>>>>>>>>> + */ >>>>>>>>>> +struct dma_heap_file_task { >>>>>>>>>> + atomic_t ref; >>>>>>>>>> + size_t roffset; >>>>>>>>>> + size_t rsize; >>>>>>>>>> + size_t rbatch; >>>>>>>>>> + struct dma_heap_file *heap_file; >>>>>>>>>> + struct page **parray; >>>>>>>>>> + unsigned int pindex; >>>>>>>>>> + bool fail; >>>>>>>>>> +}; >>>>>>>>>> + >>>>>>>>>> +/** >>>>>>>>>> + * struct dma_heap_file_control - global control of dma_heap >>>>>>>>>> file read. >>>>>>>>>> + * @works: @dma_heap_file_work's list head. >>>>>>>>>> + * >>>>>>>>>> + * @lock: only lock for @works. >>>>>>>>>> + * >>>>>>>>>> + * @threadwq: wait queue for @work_thread, if commit >>>>>>>>>> work, @work_thread >>>>>>>>>> + * wakeup and read this work's file contains. >>>>>>>>>> + * >>>>>>>>>> + * @workwq: used for main thread wait for file read >>>>>>>>>> end, if allocation >>>>>>>>>> + * end before file read. @dma_heap_file_task ref >>>>>>>>>> effect this. >>>>>>>>>> + * >>>>>>>>>> + * @work_thread: file read kthread. 
the >>>>>>>>>> dma_heap_file_task work's consumer. >>>>>>>>>> + * >>>>>>>>>> + * @heap_fwork_cachep: @dma_heap_file_work's cachep, it's >>>>>>>>>> alloc/free frequently. >>>>>>>>>> + * >>>>>>>>>> + * @nr_work: global number of how many work committed. >>>>>>>>>> + */ >>>>>>>>>> +struct dma_heap_file_control { >>>>>>>>>> + struct list_head works; >>>>>>>>>> + spinlock_t lock; >>>>>>>>>> + wait_queue_head_t threadwq; >>>>>>>>>> + wait_queue_head_t workwq; >>>>>>>>>> + struct task_struct *work_thread; >>>>>>>>>> + struct kmem_cache *heap_fwork_cachep; >>>>>>>>>> + atomic_t nr_work; >>>>>>>>>> +}; >>>>>>>>>> + >>>>>>>>>> +static struct dma_heap_file_control *heap_fctl; >>>>>>>>>> static LIST_HEAD(heap_list); >>>>>>>>>> static DEFINE_MUTEX(heap_list_lock); >>>>>>>>>> static dev_t dma_heap_devt; >>>>>>>>>> static struct class *dma_heap_class; >>>>>>>>>> static DEFINE_XARRAY_ALLOC(dma_heap_minors); >>>>>>>>>> +/** >>>>>>>>>> + * map_pages_to_vaddr - map each scatter page into >>>>>>>>>> contiguous virtual address. >>>>>>>>>> + * @heap_ftask: prepared and need to commit's work. >>>>>>>>>> + * >>>>>>>>>> + * Cached pages need to trigger file read, this function map >>>>>>>>>> each scatter page >>>>>>>>>> + * into contiguous virtual address, so that file read can >>>>>>>>>> easy use. >>>>>>>>>> + * Now that we get vaddr page, cached pages can return to >>>>>>>>>> original user, so we >>>>>>>>>> + * will not effect dma-buf export even if file read not end. 
>>>>>>>>>> + */ >>>>>>>>>> +static void *map_pages_to_vaddr(struct dma_heap_file_task >>>>>>>>>> *heap_ftask) >>>>>>>>>> +{ >>>>>>>>>> + return vmap(heap_ftask->parray, heap_ftask->pindex, VM_MAP, >>>>>>>>>> + PAGE_KERNEL); >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +bool dma_heap_prepare_file_read(struct dma_heap_file_task >>>>>>>>>> *heap_ftask, >>>>>>>>>> + struct page *page) >>>>>>>>>> +{ >>>>>>>>>> + struct page **array = heap_ftask->parray; >>>>>>>>>> + int index = heap_ftask->pindex; >>>>>>>>>> + int num = compound_nr(page), i; >>>>>>>>>> + unsigned long sz = page_size(page); >>>>>>>>>> + >>>>>>>>>> + heap_ftask->rsize += sz; >>>>>>>>>> + for (i = 0; i < num; ++i) >>>>>>>>>> + array[index++] = &page[i]; >>>>>>>>>> + heap_ftask->pindex = index; >>>>>>>>>> + >>>>>>>>>> + return heap_ftask->rsize >= heap_ftask->rbatch; >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +static struct dma_heap_file_work * >>>>>>>>>> +init_file_work(struct dma_heap_file_task *heap_ftask) >>>>>>>>>> +{ >>>>>>>>>> + struct dma_heap_file_work *heap_fwork; >>>>>>>>>> + struct dma_heap_file *heap_file = heap_ftask->heap_file; >>>>>>>>>> + >>>>>>>>>> + if (READ_ONCE(heap_ftask->fail)) >>>>>>>>>> + return NULL; >>>>>>>>>> + >>>>>>>>>> + heap_fwork = >>>>>>>>>> kmem_cache_alloc(heap_fctl->heap_fwork_cachep, GFP_KERNEL); >>>>>>>>>> + if (unlikely(!heap_fwork)) >>>>>>>>>> + return NULL; >>>>>>>>>> + >>>>>>>>>> + heap_fwork->vaddr = map_pages_to_vaddr(heap_ftask); >>>>>>>>>> + if (unlikely(!heap_fwork->vaddr)) { >>>>>>>>>> + kmem_cache_free(heap_fctl->heap_fwork_cachep, heap_fwork); >>>>>>>>>> + return NULL; >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + heap_fwork->heap_file = heap_file; >>>>>>>>>> + heap_fwork->start_size = heap_ftask->roffset; >>>>>>>>>> + heap_fwork->need_size = heap_ftask->rsize; >>>>>>>>>> + heap_fwork->refp = &heap_ftask->ref; >>>>>>>>>> + heap_fwork->failp = &heap_ftask->fail; >>>>>>>>>> + atomic_inc(&heap_ftask->ref); >>>>>>>>>> + return heap_fwork; >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> 
+static void destroy_file_work(struct dma_heap_file_work >>>>>>>>>> *heap_fwork) >>>>>>>>>> +{ >>>>>>>>>> + vunmap(heap_fwork->vaddr); >>>>>>>>>> + atomic_dec(heap_fwork->refp); >>>>>>>>>> + wake_up(&heap_fctl->workwq); >>>>>>>>>> + >>>>>>>>>> + kmem_cache_free(heap_fctl->heap_fwork_cachep, heap_fwork); >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +int dma_heap_submit_file_read(struct dma_heap_file_task >>>>>>>>>> *heap_ftask) >>>>>>>>>> +{ >>>>>>>>>> + struct dma_heap_file_work *heap_fwork = >>>>>>>>>> init_file_work(heap_ftask); >>>>>>>>>> + struct page *last = NULL; >>>>>>>>>> + struct dma_heap_file *heap_file = heap_ftask->heap_file; >>>>>>>>>> + size_t start = heap_ftask->roffset; >>>>>>>>>> + struct file *file = heap_file->file; >>>>>>>>>> + size_t fsz = heap_file->fsz; >>>>>>>>>> + >>>>>>>>>> + if (unlikely(!heap_fwork)) >>>>>>>>>> + return -ENOMEM; >>>>>>>>>> + >>>>>>>>>> + /** >>>>>>>>>> + * If file size is not page aligned, direct io can't >>>>>>>>>> process the tail. >>>>>>>>>> + * So, if reach to tail, remain the last page use buffer >>>>>>>>>> read. 
>>>>>>>>>> + */ >>>>>>>>>> + if (heap_file->direct && start + heap_ftask->rsize > fsz) { >>>>>>>>>> + heap_fwork->need_size -= PAGE_SIZE; >>>>>>>>>> + last = heap_ftask->parray[heap_ftask->pindex - 1]; >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + spin_lock(&heap_fctl->lock); >>>>>>>>>> + list_add_tail(&heap_fwork->list, &heap_fctl->works); >>>>>>>>>> + spin_unlock(&heap_fctl->lock); >>>>>>>>>> + atomic_inc(&heap_fctl->nr_work); >>>>>>>>>> + >>>>>>>>>> + wake_up(&heap_fctl->threadwq); >>>>>>>>>> + >>>>>>>>>> + if (last) { >>>>>>>>>> + char *buf, *pathp; >>>>>>>>>> + ssize_t err; >>>>>>>>>> + void *buffer; >>>>>>>>>> + >>>>>>>>>> + buf = kmalloc(PATH_MAX, GFP_KERNEL); >>>>>>>>>> + if (unlikely(!buf)) >>>>>>>>>> + return -ENOMEM; >>>>>>>>>> + >>>>>>>>>> + start = PAGE_ALIGN_DOWN(fsz); >>>>>>>>>> + >>>>>>>>>> + pathp = file_path(file, buf, PATH_MAX); >>>>>>>>>> + if (IS_ERR(pathp)) { >>>>>>>>>> + kfree(buf); >>>>>>>>>> + return PTR_ERR(pathp); >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + buffer = kmap_local_page(last); // use page's kaddr. 
>>>>>>>>>> + err = kernel_read_file_from_path(pathp, start, &buffer, >>>>>>>>>> + fsz - start, &fsz, >>>>>>>>>> + READING_POLICY); >>>>>>>>>> + kunmap_local(buffer); >>>>>>>>>> + kfree(buf); >>>>>>>>>> + if (err < 0) { >>>>>>>>>> + pr_err("failed to use buffer kernel_read_file >>>>>>>>>> %s, err=%ld, [%ld, %ld], f_sz=%ld\n", >>>>>>>>>> + pathp, err, start, fsz, fsz); >>>>>>>>>> + >>>>>>>>>> + return err; >>>>>>>>>> + } >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + heap_ftask->roffset += heap_ftask->rsize; >>>>>>>>>> + heap_ftask->rsize = 0; >>>>>>>>>> + heap_ftask->pindex = 0; >>>>>>>>>> + heap_ftask->rbatch = min_t(size_t, >>>>>>>>>> + PAGE_ALIGN(fsz) - heap_ftask->roffset, >>>>>>>>>> + heap_ftask->rbatch); >>>>>>>>>> + return 0; >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +bool dma_heap_wait_for_file_read(struct dma_heap_file_task >>>>>>>>>> *heap_ftask) >>>>>>>>>> +{ >>>>>>>>>> + wait_event_freezable(heap_fctl->workwq, >>>>>>>>>> + atomic_read(&heap_ftask->ref) == 0); >>>>>>>>>> + return heap_ftask->fail; >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +bool dma_heap_destroy_file_read(struct dma_heap_file_task >>>>>>>>>> *heap_ftask) >>>>>>>>>> +{ >>>>>>>>>> + bool fail; >>>>>>>>>> + >>>>>>>>>> + dma_heap_wait_for_file_read(heap_ftask); >>>>>>>>>> + fail = heap_ftask->fail; >>>>>>>>>> + kvfree(heap_ftask->parray); >>>>>>>>>> + kfree(heap_ftask); >>>>>>>>>> + return fail; >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +struct dma_heap_file_task * >>>>>>>>>> +dma_heap_declare_file_read(struct dma_heap_file *heap_file) >>>>>>>>>> +{ >>>>>>>>>> + struct dma_heap_file_task *heap_ftask = >>>>>>>>>> + kzalloc(sizeof(*heap_ftask), GFP_KERNEL); >>>>>>>>>> + if (unlikely(!heap_ftask)) >>>>>>>>>> + return NULL; >>>>>>>>>> + >>>>>>>>>> + /** >>>>>>>>>> + * Batch is the maximum size which we prepare work will >>>>>>>>>> meet. >>>>>>>>>> + * So, direct alloc this number's page array is OK. 
>>>>>>>>>> + */ >>>>>>>>>> + heap_ftask->parray = kvmalloc_array(heap_file->max_batch >>>>>>>>>> >> PAGE_SHIFT, >>>>>>>>>> + sizeof(struct page *), GFP_KERNEL); >>>>>>>>>> + if (unlikely(!heap_ftask->parray)) >>>>>>>>>> + goto put; >>>>>>>>>> + >>>>>>>>>> + heap_ftask->heap_file = heap_file; >>>>>>>>>> + heap_ftask->rbatch = heap_file->max_batch; >>>>>>>>>> + return heap_ftask; >>>>>>>>>> +put: >>>>>>>>>> + kfree(heap_ftask); >>>>>>>>>> + return NULL; >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +static void __work_this_io(struct dma_heap_file_work >>>>>>>>>> *heap_fwork) >>>>>>>>>> +{ >>>>>>>>>> + struct dma_heap_file *heap_file = heap_fwork->heap_file; >>>>>>>>>> + struct file *file = heap_file->file; >>>>>>>>>> + ssize_t start = heap_fwork->start_size; >>>>>>>>>> + ssize_t size = heap_fwork->need_size; >>>>>>>>>> + void *buffer = heap_fwork->vaddr; >>>>>>>>>> + const struct cred *old_cred; >>>>>>>>>> + ssize_t err; >>>>>>>>>> + >>>>>>>>>> + // use real task's cred to read this file. >>>>>>>>>> + old_cred = override_creds(heap_file->cred); >>>>>>>>>> + err = kernel_read_file(file, start, &buffer, size, >>>>>>>>>> &heap_file->fsz, >>>>>>>>>> + READING_POLICY); >>>>>>>>>> + if (err < 0) { >>>>>>>>>> + pr_err("use kernel_read_file, err=%ld, [%ld, %ld], >>>>>>>>>> f_sz=%ld\n", >>>>>>>>>> + err, start, (start + size), heap_file->fsz); >>>>>>>>>> + WRITE_ONCE(*heap_fwork->failp, true); >>>>>>>>>> + } >>>>>>>>>> + // recovery to my cred. 
>>>>>>>>>> + revert_creds(old_cred); >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +static int dma_heap_file_control_thread(void *data) >>>>>>>>>> +{ >>>>>>>>>> + struct dma_heap_file_control *heap_fctl = >>>>>>>>>> + (struct dma_heap_file_control *)data; >>>>>>>>>> + struct dma_heap_file_work *worker, *tmp; >>>>>>>>>> + int nr_work; >>>>>>>>>> + >>>>>>>>>> + LIST_HEAD(pages); >>>>>>>>>> + LIST_HEAD(workers); >>>>>>>>>> + >>>>>>>>>> + while (true) { >>>>>>>>>> + wait_event_freezable(heap_fctl->threadwq, >>>>>>>>>> + atomic_read(&heap_fctl->nr_work) > 0); >>>>>>>>>> +recheck: >>>>>>>>>> + spin_lock(&heap_fctl->lock); >>>>>>>>>> + list_splice_init(&heap_fctl->works, &workers); >>>>>>>>>> + spin_unlock(&heap_fctl->lock); >>>>>>>>>> + >>>>>>>>>> + if (unlikely(kthread_should_stop())) { >>>>>>>>>> + list_for_each_entry_safe(worker, tmp, &workers, >>>>>>>>>> list) { >>>>>>>>>> + list_del(&worker->list); >>>>>>>>>> + destroy_file_work(worker); >>>>>>>>>> + } >>>>>>>>>> + break; >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + nr_work = 0; >>>>>>>>>> + list_for_each_entry_safe(worker, tmp, &workers, list) { >>>>>>>>>> + ++nr_work; >>>>>>>>>> + list_del(&worker->list); >>>>>>>>>> + __work_this_io(worker); >>>>>>>>>> + >>>>>>>>>> + destroy_file_work(worker); >>>>>>>>>> + } >>>>>>>>>> + atomic_sub(nr_work, &heap_fctl->nr_work); >>>>>>>>>> + >>>>>>>>>> + if (atomic_read(&heap_fctl->nr_work) > 0) >>>>>>>>>> + goto recheck; >>>>>>>>>> + } >>>>>>>>>> + return 0; >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +size_t dma_heap_file_size(struct dma_heap_file *heap_file) >>>>>>>>>> +{ >>>>>>>>>> + return heap_file->fsz; >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +static int prepare_dma_heap_file(struct dma_heap_file >>>>>>>>>> *heap_file, int file_fd, >>>>>>>>>> + size_t batch) >>>>>>>>>> +{ >>>>>>>>>> + struct file *file; >>>>>>>>>> + size_t fsz; >>>>>>>>>> + int ret; >>>>>>>>>> + >>>>>>>>>> + file = fget(file_fd); >>>>>>>>>> + if (!file) >>>>>>>>>> + return -EINVAL; >>>>>>>>>> + >>>>>>>>>> + fsz = 
i_size_read(file_inode(file)); >>>>>>>>>> + if (fsz < batch) { >>>>>>>>>> + ret = -EINVAL; >>>>>>>>>> + goto err; >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + /** >>>>>>>>>> + * Selinux block our read, but actually we are reading >>>>>>>>>> the stand-in >>>>>>>>>> + * for this file. >>>>>>>>>> + * So save current's cred and when going to read, >>>>>>>>>> override mine, and >>>>>>>>>> + * end of read, revert. >>>>>>>>>> + */ >>>>>>>>>> + heap_file->cred = prepare_kernel_cred(current); >>>>>>>>>> + if (unlikely(!heap_file->cred)) { >>>>>>>>>> + ret = -ENOMEM; >>>>>>>>>> + goto err; >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + heap_file->file = file; >>>>>>>>>> + heap_file->max_batch = batch; >>>>>>>>>> + heap_file->fsz = fsz; >>>>>>>>>> + >>>>>>>>>> + heap_file->direct = file->f_flags & O_DIRECT; >>>>>>>>>> + >>>>>>>>>> +#define DMA_HEAP_SUGGEST_DIRECT_IO_SIZE (1UL << 30) >>>>>>>>>> + if (!heap_file->direct && fsz >= >>>>>>>>>> DMA_HEAP_SUGGEST_DIRECT_IO_SIZE) >>>>>>>>>> + pr_warn("alloc read file better to use O_DIRECT to >>>>>>>>>> read larget file\n"); >>>>>>>>>> + >>>>>>>>>> + return 0; >>>>>>>>>> + >>>>>>>>>> +err: >>>>>>>>>> + fput(file); >>>>>>>>>> + return ret; >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +static void destroy_dma_heap_file(struct dma_heap_file >>>>>>>>>> *heap_file) >>>>>>>>>> +{ >>>>>>>>>> + fput(heap_file->file); >>>>>>>>>> + put_cred(heap_file->cred); >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> +static int dma_heap_buffer_alloc_read_file(struct dma_heap >>>>>>>>>> *heap, int file_fd, >>>>>>>>>> + size_t batch, unsigned int fd_flags, >>>>>>>>>> + unsigned int heap_flags) >>>>>>>>>> +{ >>>>>>>>>> + struct dma_buf *dmabuf; >>>>>>>>>> + int fd; >>>>>>>>>> + struct dma_heap_file heap_file; >>>>>>>>>> + >>>>>>>>>> + fd = prepare_dma_heap_file(&heap_file, file_fd, batch); >>>>>>>>>> + if (fd) >>>>>>>>>> + goto error_file; >>>>>>>>>> + >>>>>>>>>> + dmabuf = heap->ops->allocate_read_file(heap, &heap_file, >>>>>>>>>> fd_flags, >>>>>>>>>> + heap_flags); >>>>>>>>>> + if 
(IS_ERR(dmabuf)) { >>>>>>>>>> + fd = PTR_ERR(dmabuf); >>>>>>>>>> + goto error; >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + fd = dma_buf_fd(dmabuf, fd_flags); >>>>>>>>>> + if (fd < 0) { >>>>>>>>>> + dma_buf_put(dmabuf); >>>>>>>>>> + /* just return, as put will call release and that >>>>>>>>>> will free */ >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> +error: >>>>>>>>>> + destroy_dma_heap_file(&heap_file); >>>>>>>>>> +error_file: >>>>>>>>>> + return fd; >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> static int dma_heap_buffer_alloc(struct dma_heap *heap, >>>>>>>>>> size_t len, >>>>>>>>>> u32 fd_flags, >>>>>>>>>> u64 heap_flags) >>>>>>>>>> @@ -93,6 +545,38 @@ static int dma_heap_open(struct inode >>>>>>>>>> *inode, struct file *file) >>>>>>>>>> return 0; >>>>>>>>>> } >>>>>>>>>> +static long dma_heap_ioctl_allocate_read_file(struct file >>>>>>>>>> *file, void *data) >>>>>>>>>> +{ >>>>>>>>>> + struct dma_heap_allocation_file_data >>>>>>>>>> *heap_allocation_file = data; >>>>>>>>>> + struct dma_heap *heap = file->private_data; >>>>>>>>>> + int fd; >>>>>>>>>> + >>>>>>>>>> + if (heap_allocation_file->fd || >>>>>>>>>> !heap_allocation_file->file_fd) >>>>>>>>>> + return -EINVAL; >>>>>>>>>> + >>>>>>>>>> + if (heap_allocation_file->fd_flags & >>>>>>>>>> ~DMA_HEAP_VALID_FD_FLAGS) >>>>>>>>>> + return -EINVAL; >>>>>>>>>> + >>>>>>>>>> + if (heap_allocation_file->heap_flags & >>>>>>>>>> ~DMA_HEAP_VALID_HEAP_FLAGS) >>>>>>>>>> + return -EINVAL; >>>>>>>>>> + >>>>>>>>>> + if (!heap->ops->allocate_read_file) >>>>>>>>>> + return -EINVAL; >>>>>>>>>> + >>>>>>>>>> + fd = dma_heap_buffer_alloc_read_file( >>>>>>>>>> + heap, heap_allocation_file->file_fd, >>>>>>>>>> + heap_allocation_file->batch ? 
>>>>>>>>>> + PAGE_ALIGN(heap_allocation_file->batch) : >>>>>>>>>> + DEFAULT_ADI_BATCH, >>>>>>>>>> + heap_allocation_file->fd_flags, >>>>>>>>>> + heap_allocation_file->heap_flags); >>>>>>>>>> + if (fd < 0) >>>>>>>>>> + return fd; >>>>>>>>>> + >>>>>>>>>> + heap_allocation_file->fd = fd; >>>>>>>>>> + return 0; >>>>>>>>>> +} >>>>>>>>>> + >>>>>>>>>> static long dma_heap_ioctl_allocate(struct file *file, void >>>>>>>>>> *data) >>>>>>>>>> { >>>>>>>>>> struct dma_heap_allocation_data *heap_allocation = data; >>>>>>>>>> @@ -121,6 +605,7 @@ static long >>>>>>>>>> dma_heap_ioctl_allocate(struct file *file, void *data) >>>>>>>>>> static unsigned int dma_heap_ioctl_cmds[] = { >>>>>>>>>> DMA_HEAP_IOCTL_ALLOC, >>>>>>>>>> + DMA_HEAP_IOCTL_ALLOC_AND_READ, >>>>>>>>>> }; >>>>>>>>>> static long dma_heap_ioctl(struct file *file, unsigned >>>>>>>>>> int ucmd, >>>>>>>>>> @@ -170,6 +655,9 @@ static long dma_heap_ioctl(struct file >>>>>>>>>> *file, unsigned int ucmd, >>>>>>>>>> case DMA_HEAP_IOCTL_ALLOC: >>>>>>>>>> ret = dma_heap_ioctl_allocate(file, kdata); >>>>>>>>>> break; >>>>>>>>>> + case DMA_HEAP_IOCTL_ALLOC_AND_READ: >>>>>>>>>> + ret = dma_heap_ioctl_allocate_read_file(file, kdata); >>>>>>>>>> + break; >>>>>>>>>> default: >>>>>>>>>> ret = -ENOTTY; >>>>>>>>>> goto err; >>>>>>>>>> @@ -316,11 +804,44 @@ static int dma_heap_init(void) >>>>>>>>>> dma_heap_class = class_create(DEVNAME); >>>>>>>>>> if (IS_ERR(dma_heap_class)) { >>>>>>>>>> - unregister_chrdev_region(dma_heap_devt, >>>>>>>>>> NUM_HEAP_MINORS); >>>>>>>>>> - return PTR_ERR(dma_heap_class); >>>>>>>>>> + ret = PTR_ERR(dma_heap_class); >>>>>>>>>> + goto fail_class; >>>>>>>>>> } >>>>>>>>>> dma_heap_class->devnode = dma_heap_devnode; >>>>>>>>>> + heap_fctl = kzalloc(sizeof(*heap_fctl), GFP_KERNEL); >>>>>>>>>> + if (unlikely(!heap_fctl)) { >>>>>>>>>> + ret = -ENOMEM; >>>>>>>>>> + goto fail_alloc; >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + INIT_LIST_HEAD(&heap_fctl->works); >>>>>>>>>> + init_waitqueue_head(&heap_fctl->threadwq); 
>>>>>>>>>> + init_waitqueue_head(&heap_fctl->workwq); >>>>>>>>>> + >>>>>>>>>> + heap_fctl->work_thread = >>>>>>>>>> kthread_run(dma_heap_file_control_thread, >>>>>>>>>> + heap_fctl, "heap_fwork_t"); >>>>>>>>>> + if (IS_ERR(heap_fctl->work_thread)) { >>>>>>>>>> + ret = -ENOMEM; >>>>>>>>>> + goto fail_thread; >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> + heap_fctl->heap_fwork_cachep = >>>>>>>>>> KMEM_CACHE(dma_heap_file_work, 0); >>>>>>>>>> + if (unlikely(!heap_fctl->heap_fwork_cachep)) { >>>>>>>>>> + ret = -ENOMEM; >>>>>>>>>> + goto fail_cache; >>>>>>>>>> + } >>>>>>>>>> + >>>>>>>>>> return 0; >>>>>>>>>> + >>>>>>>>>> +fail_cache: >>>>>>>>>> + kthread_stop(heap_fctl->work_thread); >>>>>>>>>> +fail_thread: >>>>>>>>>> + kfree(heap_fctl); >>>>>>>>>> +fail_alloc: >>>>>>>>>> + class_destroy(dma_heap_class); >>>>>>>>>> +fail_class: >>>>>>>>>> + unregister_chrdev_region(dma_heap_devt, NUM_HEAP_MINORS); >>>>>>>>>> + return ret; >>>>>>>>>> } >>>>>>>>>> subsys_initcall(dma_heap_init); >>>>>>>>>> diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h >>>>>>>>>> index 064bad725061..9c25383f816c 100644 >>>>>>>>>> --- a/include/linux/dma-heap.h >>>>>>>>>> +++ b/include/linux/dma-heap.h >>>>>>>>>> @@ -12,12 +12,17 @@ >>>>>>>>>> #include <linux/cdev.h> >>>>>>>>>> #include <linux/types.h> >>>>>>>>>> +#define DEFAULT_ADI_BATCH (128 << 20) >>>>>>>>>> + >>>>>>>>>> struct dma_heap; >>>>>>>>>> +struct dma_heap_file_task; >>>>>>>>>> +struct dma_heap_file; >>>>>>>>>> /** >>>>>>>>>> * struct dma_heap_ops - ops to operate on a given heap >>>>>>>>>> * @allocate: allocate dmabuf and return struct >>>>>>>>>> dma_buf ptr >>>>>>>>>> - * >>>>>>>>>> + * @allocate_read_file: allocate dmabuf and read file, then >>>>>>>>>> return struct >>>>>>>>>> + * dma_buf ptr. >>>>>>>>>> * allocate returns dmabuf on success, ERR_PTR(-errno) on >>>>>>>>>> error. 
>>>>>>>>>> */ >>>>>>>>>> struct dma_heap_ops { >>>>>>>>>> @@ -25,6 +30,11 @@ struct dma_heap_ops { >>>>>>>>>> unsigned long len, >>>>>>>>>> u32 fd_flags, >>>>>>>>>> u64 heap_flags); >>>>>>>>>> + >>>>>>>>>> + struct dma_buf *(*allocate_read_file)(struct dma_heap >>>>>>>>>> *heap, >>>>>>>>>> + struct dma_heap_file *heap_file, >>>>>>>>>> + u32 fd_flags, >>>>>>>>>> + u64 heap_flags); >>>>>>>>>> }; >>>>>>>>>> /** >>>>>>>>>> @@ -65,4 +75,49 @@ const char *dma_heap_get_name(struct >>>>>>>>>> dma_heap *heap); >>>>>>>>>> */ >>>>>>>>>> struct dma_heap *dma_heap_add(const struct >>>>>>>>>> dma_heap_export_info *exp_info); >>>>>>>>>> +/** >>>>>>>>>> + * dma_heap_destroy_file_read - waits for a file read to >>>>>>>>>> complete then destroy it >>>>>>>>>> + * Returns: true if the file read failed, false otherwise >>>>>>>>>> + */ >>>>>>>>>> +bool dma_heap_destroy_file_read(struct dma_heap_file_task >>>>>>>>>> *heap_ftask); >>>>>>>>>> + >>>>>>>>>> +/** >>>>>>>>>> + * dma_heap_wait_for_file_read - waits for a file read to >>>>>>>>>> complete >>>>>>>>>> + * Returns: true if the file read failed, false otherwise >>>>>>>>>> + */ >>>>>>>>>> +bool dma_heap_wait_for_file_read(struct dma_heap_file_task >>>>>>>>>> *heap_ftask); >>>>>>>>>> + >>>>>>>>>> +/** >>>>>>>>>> + * dma_heap_alloc_file_read - Declare a task to read file >>>>>>>>>> when allocate pages. >>>>>>>>>> + * @heap_file: target file to read >>>>>>>>>> + * >>>>>>>>>> + * Return NULL if failed, otherwise return a struct pointer. >>>>>>>>>> + */ >>>>>>>>>> +struct dma_heap_file_task * >>>>>>>>>> +dma_heap_declare_file_read(struct dma_heap_file *heap_file); >>>>>>>>>> + >>>>>>>>>> +/** >>>>>>>>>> + * dma_heap_prepare_file_read - cache each allocated page >>>>>>>>>> until we meet this batch. >>>>>>>>>> + * @heap_ftask: prepared and need to commit's work. >>>>>>>>>> + * @page: current allocated page. don't care which >>>>>>>>>> order. >>>>>>>>>> + * >>>>>>>>>> + * Returns true if reach to batch, false so go on prepare. 
>>>>>>>>>> + */ >>>>>>>>>> +bool dma_heap_prepare_file_read(struct dma_heap_file_task >>>>>>>>>> *heap_ftask, >>>>>>>>>> + struct page *page); >>>>>>>>>> + >>>>>>>>>> +/** >>>>>>>>>> + * dma_heap_commit_file_read - prepare collect enough >>>>>>>>>> memory, going to trigger IO >>>>>>>>>> + * @heap_ftask: info that current IO needs >>>>>>>>>> + * >>>>>>>>>> + * This commit will also check if reach to tail read. >>>>>>>>>> + * For direct I/O submissions, it is necessary to pay >>>>>>>>>> attention to file reads >>>>>>>>>> + * that are not page-aligned. For the unaligned portion of >>>>>>>>>> the read, buffer IO >>>>>>>>>> + * needs to be triggered. >>>>>>>>>> + * Returns: >>>>>>>>>> + * 0 if all right, -errno if something wrong >>>>>>>>>> + */ >>>>>>>>>> +int dma_heap_submit_file_read(struct dma_heap_file_task >>>>>>>>>> *heap_ftask); >>>>>>>>>> +size_t dma_heap_file_size(struct dma_heap_file *heap_file); >>>>>>>>>> + >>>>>>>>>> #endif /* _DMA_HEAPS_H */ >>>>>>>>>> diff --git a/include/uapi/linux/dma-heap.h >>>>>>>>>> b/include/uapi/linux/dma-heap.h >>>>>>>>>> index a4cf716a49fa..8c20e8b74eed 100644 >>>>>>>>>> --- a/include/uapi/linux/dma-heap.h >>>>>>>>>> +++ b/include/uapi/linux/dma-heap.h >>>>>>>>>> @@ -39,6 +39,27 @@ struct dma_heap_allocation_data { >>>>>>>>>> __u64 heap_flags; >>>>>>>>>> }; >>>>>>>>>> +/** >>>>>>>>>> + * struct dma_heap_allocation_file_data - metadata passed >>>>>>>>>> from userspace for >>>>>>>>>> + * allocations and read file >>>>>>>>>> + * @fd: will be populated with a fd which >>>>>>>>>> provides the >>>>>>>>>> + * handle to the allocated dma-buf >>>>>>>>>> + * @file_fd: file descriptor to read from(suggested >>>>>>>>>> to use O_DIRECT open file) >>>>>>>>>> + * @batch: how many memory alloced then file >>>>>>>>>> read(bytes), default 128MB >>>>>>>>>> + * will auto aligned to PAGE_SIZE >>>>>>>>>> + * @fd_flags: file descriptor flags used when allocating >>>>>>>>>> + * @heap_flags: flags passed to heap >>>>>>>>>> + * >>>>>>>>>> + * Provided 
by userspace as an argument to the ioctl >>>>>>>>>> + */ >>>>>>>>>> +struct dma_heap_allocation_file_data { >>>>>>>>>> + __u32 fd; >>>>>>>>>> + __u32 file_fd; >>>>>>>>>> + __u32 batch; >>>>>>>>>> + __u32 fd_flags; >>>>>>>>>> + __u64 heap_flags; >>>>>>>>>> +}; >>>>>>>>>> + >>>>>>>>>> #define DMA_HEAP_IOC_MAGIC 'H' >>>>>>>>>> /** >>>>>>>>>> @@ -50,4 +71,15 @@ struct dma_heap_allocation_data { >>>>>>>>>> #define DMA_HEAP_IOCTL_ALLOC _IOWR(DMA_HEAP_IOC_MAGIC, 0x0,\ >>>>>>>>>> struct dma_heap_allocation_data) >>>>>>>>>> +/** >>>>>>>>>> + * DOC: DMA_HEAP_IOCTL_ALLOC_AND_READ - allocate memory from >>>>>>>>>> pool and both >>>>>>>>>> + * read file when allocate memory. >>>>>>>>>> + * >>>>>>>>>> + * Takes a dma_heap_allocation_file_data struct and returns >>>>>>>>>> it with the fd field >>>>>>>>>> + * populated with the dmabuf handle of the allocation. When >>>>>>>>>> return, the dma-buf >>>>>>>>>> + * content is read from file. >>>>>>>>>> + */ >>>>>>>>>> +#define DMA_HEAP_IOCTL_ALLOC_AND_READ \ >>>>>>>>>> + _IOWR(DMA_HEAP_IOC_MAGIC, 0x1, struct >>>>>>>>>> dma_heap_allocation_file_data) >>>>>>>>>> + >>>>>>>>>> #endif /* _UAPI_LINUX_DMABUF_POOL_H */ >>>>>>>>> >>>>>>> >>>> >>

2 years
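
For readers following the patch quoted in the message above, the userspace side of the proposed ioctl can be sketched roughly as below. This is a minimal sketch and not code from the series: the struct layout, the magic, the command number and DEFAULT_ADI_BATCH are copied from the quoted hunks and redefined locally on the assumption that no installed header provides them, `resolve_batch()` and `fill_alloc_read_request()` are hypothetical helper names, and a 4 KiB page size is assumed where the kernel would use PAGE_SIZE.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>	/* _IOWR */

/* Local mirror of the struct in the quoted uapi/linux/dma-heap.h hunk. */
struct dma_heap_allocation_file_data {
	uint32_t fd;		/* out: dma-buf fd; must be 0 on entry */
	uint32_t file_fd;	/* in: file to read from; 0 is rejected */
	uint32_t batch;		/* in: bytes per read batch; 0 = default */
	uint32_t fd_flags;	/* in: flags for the returned dma-buf fd */
	uint64_t heap_flags;	/* in: flags passed to the heap */
};

#define DMA_HEAP_IOC_MAGIC 'H'
#define DMA_HEAP_IOCTL_ALLOC_AND_READ \
	_IOWR(DMA_HEAP_IOC_MAGIC, 0x1, struct dma_heap_allocation_file_data)

#define DEFAULT_ADI_BATCH	(128u << 20)	/* value from the patch */
#define EXAMPLE_PAGE_SIZE	4096u		/* assumption, not from the patch */

/* Hypothetical helper: the batch size the quoted handler would end up using. */
static uint32_t resolve_batch(uint32_t batch)
{
	if (batch == 0)
		return DEFAULT_ADI_BATCH;
	/* Round up to a page boundary, as PAGE_ALIGN() does. */
	return (batch + EXAMPLE_PAGE_SIZE - 1) & ~(EXAMPLE_PAGE_SIZE - 1);
}

/* Hypothetical helper: build a request that passes the handler's checks. */
static int fill_alloc_read_request(struct dma_heap_allocation_file_data *req,
				   int file_fd, uint32_t batch,
				   uint32_t fd_flags)
{
	assert(req != NULL);
	if (file_fd <= 0)
		return -1;	/* the handler returns -EINVAL for file_fd == 0 */

	memset(req, 0, sizeof(*req));	/* fd and heap_flags stay 0 */
	req->file_fd = (uint32_t)file_fd;
	req->batch = batch;
	req->fd_flags = fd_flags;
	return 0;
}
```

A caller would then issue `ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC_AND_READ, &req)` on an open heap device and read the new dma-buf fd from `req.fd`. Note that the quoted prepare_dma_heap_file() returns -EINVAL when the file is smaller than the resolved batch, so with the default batch a file under 128MB would be refused.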

[PATCH 0/8] dma-buf: heaps: Support carved-out heaps and ECC related-flags

by Maxime Ripard

Hi,

This series is the follow-up of the discussion that John and I had a few months ago here:

https://lore.kernel.org/all/CANDhNCquJn6bH3KxKf65BWiTYLVqSd9892-xtFDHHqqyrr…

The initial problem we were discussing was that I'm currently working on a platform which has a memory layout with ECC enabled. However, enabling the ECC has a number of drawbacks on that platform: lower performance, increased memory usage, etc. So for things like framebuffers, the trade-off isn't great and thus there's a memory region with ECC disabled to allocate from for such use cases.

After a suggestion from John, I chose to start using heap allocation flags to allow for userspace to ask for a particular ECC setup. This is then backed by a new heap type that runs from reserved memory chunks flagged as such, and the existing DT properties to specify the ECC properties.

We could also easily extend this mechanism to support more flags, or through a new ioctl to discover which flags a given heap supports.

I submitted a draft PR to the DT schema for the bindings used in this PR:

https://github.com/devicetree-org/dt-schema/pull/138

Let me know what you think,
Maxime

Signed-off-by: Maxime Ripard <mripard(a)kernel.org>
---
Maxime Ripard (8):
      dma-buf: heaps: Introduce a new heap for reserved memory
      of: Add helper to retrieve ECC memory bits
      dma-buf: heaps: Import uAPI header
      dma-buf: heaps: Add ECC protection flags
      dma-buf: heaps: system: Remove global variable
      dma-buf: heaps: system: Handle ECC flags
      dma-buf: heaps: cma: Handle ECC flags
      dma-buf: heaps: carveout: Handle ECC flags

 drivers/dma-buf/dma-heap.c            |   4 +
 drivers/dma-buf/heaps/Kconfig         |   8 +
 drivers/dma-buf/heaps/Makefile        |   1 +
 drivers/dma-buf/heaps/carveout_heap.c | 330 ++++++++++++++++++++++++++++++++++
 drivers/dma-buf/heaps/cma_heap.c      |  10 ++
 drivers/dma-buf/heaps/system_heap.c   |  29 ++-
 include/linux/dma-heap.h              |   2 +
 include/linux/of.h                    |  25 +++
 include/uapi/linux/dma-heap.h         |   5 +-
 9 files changed, 407 insertions(+), 7 deletions(-)
---
base-commit: a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6
change-id: 20240515-dma-buf-ecc-heap-28a311d2c94e

Best regards,
--
Maxime Ripard <mripard(a)kernel.org>

2 years

Re: [PATCH 1/2] dma-buf: heaps: DMA_HEAP_IOCTL_ALLOC_READ_FILE framework

by Christian König

On 12.07.24 at 09:29, Huan Yang wrote:
> Hi Christian,
>
> On 2024/7/12 15:10, Christian König wrote:
>> On 12.07.24 at 04:14, Huan Yang wrote:
>>> On 2024/7/12 9:59, Huan Yang wrote:
>>>> Hi Christian,
>>>>
>>>> On 2024/7/11 19:39, Christian König wrote:
>>>>> On 11.07.24 at 11:18, Huan Yang wrote:
>>>>>> Hi Christian,
>>>>>>
>>>>>> Thanks for your reply.
>>>>>>
>>>>>> On 2024/7/11 17:00, Christian König wrote:
>>>>>>> On 11.07.24 at 09:42, Huan Yang wrote:
>>>>>>>> Some users may need to load a file into a dma-buf. The current
>>>>>>>> way is:
>>>>>>>>   1. allocate a dma-buf, get dma-buf fd
>>>>>>>>   2. mmap dma-buf fd into vaddr
>>>>>>>>   3. read(file_fd, vaddr, fsz)
>>>>>>>> This is too heavy if fsz reaches the GB range.
>>>>>>>
>>>>>>> You need to describe a bit more why that is too heavy. I can only
>>>>>>> assume you need to save memory bandwidth and avoid the extra
>>>>>>> copy with the CPU.
>>>>>>
>>>>>> Sorry for the oversimplified explanation. But, yes, you're right,
>>>>>> we want to avoid this.
>>>>>>
>>>>>> As we are dealing with embedded devices, the available memory and
>>>>>> computing power for users are usually limited. (The maximum
>>>>>> available memory is currently 24GB, typically ranging from 8-12GB.)
>>>>>>
>>>>>> Also, CPU computing power is usually in short supply,
>>>>>> due to limited battery capacity and limited heat dissipation
>>>>>> capabilities.
>>>>>>
>>>>>> So, we hope to avoid ineffective paths as much as possible.
>>>>>>
>>>>>>>
>>>>>>>> This patch implements a feature called
>>>>>>>> DMA_HEAP_IOCTL_ALLOC_READ_FILE.
>>>>>>>> The user needs to offer a file_fd to load into the
>>>>>>>> dma-buf; then,
>>>>>>>> it promises that if you get a dma-buf fd, it will contain the file
>>>>>>>> content.
>>>>>>>
>>>>>>> Interesting idea, that has at least more potential than trying
>>>>>>> to enable direct I/O on mmap()ed DMA-bufs.
>>>>>>>
>>>>>>> The approach with the new IOCTL might not work because it is a
>>>>>>> very specialized use case.
>>>>>>
>>>>>> Thank you for your advice. Maybe the "read file" behavior can be
>>>>>> attached to an existing allocation?
>>>>>
>>>>> The point is there are already system calls to do something like
>>>>> that.
>>>>>
>>>>> See copy_file_range()
>>>>> (https://man7.org/linux/man-pages/man2/copy_file_range.2.html) and
>>>>> sendfile() (https://man7.org/linux/man-pages/man2/sendfile.2.html).
>>>>
>>>> That's helpful to learn, thanks.
>>>>
>>>> In terms of only DMA-BUF supporting direct I/O,
>>>> copy_file_range/sendfile may help to achieve this functionality.
>>>>
>>>> However, my patchset also aims to achieve parallel copying of file
>>>> contents while allocating the DMA-BUF, which is something that the
>>>> current set of calls may not be able to accomplish.
>>
>> And exactly that is a no-go. Use the existing IOCTLs and system calls
>> instead; they should have similar performance when done right.
>
> Got it, but in my testing, even without memory pressure, it
> takes about 60ms to allocate a 3GB DMA-BUF. When there is significant
> memory pressure, the allocation time for a 3GB

Well exactly that doesn't make sense. Even if you read the content of the DMA-buf from a file you still need to allocate it first.

So the question is why should reading and allocating it at the same time be better in any way?

Regards,
Christian.

>
>
> DMA-BUF can increase to 300ms-1s. (The above test times can also
> demonstrate the difference.)
>
> But, talk is cheap; I agree to research using the existing ways to
> implement it and to test it.
>
> I'll show the results when I'm done.
>
> Thanks for your suggestions.
>
>>
>> Regards,
>> Christian.
>>
>>>
>>> As the cover letter shows, here is a comparison of the normal path
>>> and this IOCTL under memory pressure; even buffered I/O in this ioctl
>>> can improve by 50% through parallelism.
>>>
>>> Test setup: a 3GB file created with dd, a phone with 12GB RAM and
>>> UFS 4.0, and 4GB of memory pressure from stressapptest.
>>>
>>> 1. original
>>> ```shell
>>> # create a model file
>>> dd if=/dev/zero of=./model.txt bs=1M count=3072
>>> # drop page cache
>>> echo 3 > /proc/sys/vm/drop_caches
>>> ./dmabuf-heap-file-read mtk_mm-uncached normal
>>>
>>> > result is total cost 13087213847ns
>>>
>>> ```
>>>
>>> 2. DMA_HEAP_IOCTL_ALLOC_AND_READ O_DIRECT
>>> ```shell
>>> # create a model file
>>> dd if=/dev/zero of=./model.txt bs=1M count=3072
>>> # drop page cache
>>> echo 3 > /proc/sys/vm/drop_caches
>>> ./dmabuf-heap-file-read mtk_mm-uncached direct_io
>>>
>>> > result is total cost 2902386846ns
>>>
>>> # direct_io_check can be used to verify that the content matches the file.
>>> ```
>>>
>>> 3. DMA_HEAP_IOCTL_ALLOC_AND_READ BUFFER I/O
>>> ```shell
>>> # create a model file
>>> dd if=/dev/zero of=./model.txt bs=1M count=3072
>>> # drop page cache
>>> echo 3 > /proc/sys/vm/drop_caches
>>> ./dmabuf-heap-file-read mtk_mm-uncached normal_io
>>>
>>> > result is total cost 5735579385ns
>>>
>>> ```
>>>
>>>>
>>>> Perhaps simply returning the DMA-BUF file descriptor and then
>>>> implementing copy_file_range, while populating the memory and
>>>> content during the copy process, could achieve this? At present, it
>>>> seems that it would be quite complex: we would need to ensure that
>>>> operations on the returned DMA-BUF file descriptor, such as mmap,
>>>> vmap, attach, and so on, fail while the memory is not yet filled.
>>>>
>>>>>
>>>>> What we probably could do is to internally optimize those.
>>>>>
>>>>>> I am currently creating a new ioctl to remind the user that
>>>>>> memory is being allocated and read, and I am also unsure
>>>>>> whether it is appropriate to add additional parameters to the
>>>>>> existing allocate behavior.
>>>>>>
>>>>>> Please, give me more suggestions. Thanks.
>>>>>>
>>>>>>>
>>>>>>> But IIRC there was a copy_file_range callback in the
>>>>>>> file_operations structure you could use for that. I'm just not
>>>>>>> sure when and how that's used with the copy_file_range() system
>>>>>>> call.
>>>>>>
>>>>>> Sorry, I'm not familiar with this, but I will look into it.
>>>>>> However, this type of callback function is not currently
>>>>>> implemented when exporting
>>>>>> the dma_buf file, which means that I need to implement the
>>>>>> callback for it?
>>>>>
>>>>> If I'm not completely mistaken the copy_file_range, splice_read
>>>>> and splice_write callbacks on the struct file_operations
>>>>> (https://elixir.bootlin.com/linux/v6.10-rc7/source/include/linux/fs.h#L1999).
>>>>>
>>>>> Can be used to implement what you want to do.
>>>> Yes.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>>>
>>>>>>>> Notice, file_fd depends on user how to open this file. So, both
>>>>>>>> buffer
>>>>>>>> I/O and Direct I/O is supported.
>>>>>>>>
>>>>>>>> Signed-off-by: Huan Yang <link(a)vivo.com>
>>>>>>>> ---
>>>>>>>>  drivers/dma-buf/dma-heap.c    | 525 +++++++++++++++++++++++++++++++++-
>>>>>>>>  include/linux/dma-heap.h      |  57 +++-
>>>>>>>>  include/uapi/linux/dma-heap.h |  32 +++
>>>>>>>>  3 files changed, 611 insertions(+), 3 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
>>>>>>>> index 2298ca5e112e..abe17281adb8 100644
>>>>>>>> --- a/drivers/dma-buf/dma-heap.c
>>>>>>>> +++ b/drivers/dma-buf/dma-heap.c
>>>>>>>> @@ -15,9 +15,11 @@
>>>>>>>>  #include <linux/list.h>
>>>>>>>>  #include <linux/slab.h>
>>>>>>>>  #include <linux/nospec.h>
>>>>>>>> +#include <linux/highmem.h>
>>>>>>>>  #include <linux/uaccess.h>
>>>>>>>>  #include <linux/syscalls.h>
>>>>>>>>  #include <linux/dma-heap.h>
>>>>>>>> +#include <linux/vmalloc.h>
>>>>>>>>  #include <uapi/linux/dma-heap.h>
>>>>>>>>  #define DEVNAME "dma_heap"
>>>>>>>> @@ -43,12 +45,462 @@ struct dma_heap {
>>>>>>>>  struct cdev heap_cdev;
>>>>>>>>  };
>>>>>>>> +/**
>>>>>>>> + * struct dma_heap_file - wrap the file, read task for dma_heap allocate use.
>>>>>>>> + * @file: file to read from.
>>>>>>>> + * >>>>>>>> + * @cred: kthread use, user cred copy to use for the read. >>>>>>>> + * >>>>>>>> + * @max_batch: maximum batch size to read, if collect >>>>>>>> match batch, >>>>>>>> + * trigger read, default 128MB, must below file size. >>>>>>>> + * >>>>>>>> + * @fsz: file size. >>>>>>>> + * >>>>>>>> + * @direct: use direct IO? >>>>>>>> + */ >>>>>>>> +struct dma_heap_file { >>>>>>>> + struct file *file; >>>>>>>> + struct cred *cred; >>>>>>>> + size_t max_batch; >>>>>>>> + size_t fsz; >>>>>>>> + bool direct; >>>>>>>> +}; >>>>>>>> + >>>>>>>> +/** >>>>>>>> + * struct dma_heap_file_work - represents a dma_heap file read >>>>>>>> real work. >>>>>>>> + * @vaddr: contigous virtual address alloc by vmap, >>>>>>>> file read need. >>>>>>>> + * >>>>>>>> + * @start_size: file read start offset, same to >>>>>>>> @dma_heap_file_task->roffset. >>>>>>>> + * >>>>>>>> + * @need_size: file read need size, same to >>>>>>>> @dma_heap_file_task->rsize. >>>>>>>> + * >>>>>>>> + * @heap_file: file wrapper. >>>>>>>> + * >>>>>>>> + * @list: child node of @dma_heap_file_control->works. >>>>>>>> + * >>>>>>>> + * @refp: same @dma_heap_file_task->ref, if end of >>>>>>>> read, put ref. >>>>>>>> + * >>>>>>>> + * @failp: if any work io failed, set it true, pointp >>>>>>>> @dma_heap_file_task->fail. >>>>>>>> + */ >>>>>>>> +struct dma_heap_file_work { >>>>>>>> + void *vaddr; >>>>>>>> + ssize_t start_size; >>>>>>>> + ssize_t need_size; >>>>>>>> + struct dma_heap_file *heap_file; >>>>>>>> + struct list_head list; >>>>>>>> + atomic_t *refp; >>>>>>>> + bool *failp; >>>>>>>> +}; >>>>>>>> + >>>>>>>> +/** >>>>>>>> + * struct dma_heap_file_task - represents a dma_heap file read >>>>>>>> process >>>>>>>> + * @ref: current file work counter, if zero, allocate >>>>>>>> and read >>>>>>>> + * done. >>>>>>>> + * >>>>>>>> + * @roffset: last read offset, current prepared work' >>>>>>>> begin file >>>>>>>> + * start offset. 
>>>>>>>> + * >>>>>>>> + * @rsize: current allocated page size use to read, if >>>>>>>> reach rbatch, >>>>>>>> + * trigger commit. >>>>>>>> + * >>>>>>>> + * @rbatch: current prepared work's batch, below >>>>>>>> @dma_heap_file's >>>>>>>> + * batch. >>>>>>>> + * >>>>>>>> + * @heap_file: current dma_heap_file >>>>>>>> + * >>>>>>>> + * @parray: used for vmap, size is @dma_heap_file's >>>>>>>> batch's number >>>>>>>> + * pages.(this is maximum). Due to single thread >>>>>>>> file read, >>>>>>>> + * one page array reuse each work prepare is OK. >>>>>>>> + * Each index in parray is PAGE_SIZE.(vmap need) >>>>>>>> + * >>>>>>>> + * @pindex: current allocated page filled in @parray's >>>>>>>> index. >>>>>>>> + * >>>>>>>> + * @fail: any work failed when file read? >>>>>>>> + * >>>>>>>> + * dma_heap_file_task is the production of file read, will >>>>>>>> prepare each work >>>>>>>> + * during allocate dma_buf pages, if match current batch, then >>>>>>>> trigger commit >>>>>>>> + * and prepare next work. After all batch queued, user going >>>>>>>> on prepare dma_buf >>>>>>>> + * and so on, but before return dma_buf fd, need to wait file >>>>>>>> read end and >>>>>>>> + * check read result. >>>>>>>> + */ >>>>>>>> +struct dma_heap_file_task { >>>>>>>> + atomic_t ref; >>>>>>>> + size_t roffset; >>>>>>>> + size_t rsize; >>>>>>>> + size_t rbatch; >>>>>>>> + struct dma_heap_file *heap_file; >>>>>>>> + struct page **parray; >>>>>>>> + unsigned int pindex; >>>>>>>> + bool fail; >>>>>>>> +}; >>>>>>>> + >>>>>>>> +/** >>>>>>>> + * struct dma_heap_file_control - global control of dma_heap >>>>>>>> file read. >>>>>>>> + * @works: @dma_heap_file_work's list head. >>>>>>>> + * >>>>>>>> + * @lock: only lock for @works. >>>>>>>> + * >>>>>>>> + * @threadwq: wait queue for @work_thread, if commit >>>>>>>> work, @work_thread >>>>>>>> + * wakeup and read this work's file contains. 
>>>>>>>> + * >>>>>>>> + * @workwq: used for main thread wait for file read >>>>>>>> end, if allocation >>>>>>>> + * end before file read. @dma_heap_file_task ref >>>>>>>> effect this. >>>>>>>> + * >>>>>>>> + * @work_thread: file read kthread. the dma_heap_file_task >>>>>>>> work's consumer. >>>>>>>> + * >>>>>>>> + * @heap_fwork_cachep: @dma_heap_file_work's cachep, it's >>>>>>>> alloc/free frequently. >>>>>>>> + * >>>>>>>> + * @nr_work: global number of how many work committed. >>>>>>>> + */ >>>>>>>> +struct dma_heap_file_control { >>>>>>>> + struct list_head works; >>>>>>>> + spinlock_t lock; >>>>>>>> + wait_queue_head_t threadwq; >>>>>>>> + wait_queue_head_t workwq; >>>>>>>> + struct task_struct *work_thread; >>>>>>>> + struct kmem_cache *heap_fwork_cachep; >>>>>>>> + atomic_t nr_work; >>>>>>>> +}; >>>>>>>> + >>>>>>>> +static struct dma_heap_file_control *heap_fctl; >>>>>>>> static LIST_HEAD(heap_list); >>>>>>>> static DEFINE_MUTEX(heap_list_lock); >>>>>>>> static dev_t dma_heap_devt; >>>>>>>> static struct class *dma_heap_class; >>>>>>>> static DEFINE_XARRAY_ALLOC(dma_heap_minors); >>>>>>>> +/** >>>>>>>> + * map_pages_to_vaddr - map each scatter page into contiguous >>>>>>>> virtual address. >>>>>>>> + * @heap_ftask: prepared and need to commit's work. >>>>>>>> + * >>>>>>>> + * Cached pages need to trigger file read, this function map >>>>>>>> each scatter page >>>>>>>> + * into contiguous virtual address, so that file read can easy >>>>>>>> use. >>>>>>>> + * Now that we get vaddr page, cached pages can return to >>>>>>>> original user, so we >>>>>>>> + * will not effect dma-buf export even if file read not end. 
>>>>>>>> + */ >>>>>>>> +static void *map_pages_to_vaddr(struct dma_heap_file_task >>>>>>>> *heap_ftask) >>>>>>>> +{ >>>>>>>> + return vmap(heap_ftask->parray, heap_ftask->pindex, VM_MAP, >>>>>>>> + PAGE_KERNEL); >>>>>>>> +} >>>>>>>> + >>>>>>>> +bool dma_heap_prepare_file_read(struct dma_heap_file_task >>>>>>>> *heap_ftask, >>>>>>>> + struct page *page) >>>>>>>> +{ >>>>>>>> + struct page **array = heap_ftask->parray; >>>>>>>> + int index = heap_ftask->pindex; >>>>>>>> + int num = compound_nr(page), i; >>>>>>>> + unsigned long sz = page_size(page); >>>>>>>> + >>>>>>>> + heap_ftask->rsize += sz; >>>>>>>> + for (i = 0; i < num; ++i) >>>>>>>> + array[index++] = &page[i]; >>>>>>>> + heap_ftask->pindex = index; >>>>>>>> + >>>>>>>> + return heap_ftask->rsize >= heap_ftask->rbatch; >>>>>>>> +} >>>>>>>> + >>>>>>>> +static struct dma_heap_file_work * >>>>>>>> +init_file_work(struct dma_heap_file_task *heap_ftask) >>>>>>>> +{ >>>>>>>> + struct dma_heap_file_work *heap_fwork; >>>>>>>> + struct dma_heap_file *heap_file = heap_ftask->heap_file; >>>>>>>> + >>>>>>>> + if (READ_ONCE(heap_ftask->fail)) >>>>>>>> + return NULL; >>>>>>>> + >>>>>>>> + heap_fwork = >>>>>>>> kmem_cache_alloc(heap_fctl->heap_fwork_cachep, GFP_KERNEL); >>>>>>>> + if (unlikely(!heap_fwork)) >>>>>>>> + return NULL; >>>>>>>> + >>>>>>>> + heap_fwork->vaddr = map_pages_to_vaddr(heap_ftask); >>>>>>>> + if (unlikely(!heap_fwork->vaddr)) { >>>>>>>> + kmem_cache_free(heap_fctl->heap_fwork_cachep, heap_fwork); >>>>>>>> + return NULL; >>>>>>>> + } >>>>>>>> + >>>>>>>> + heap_fwork->heap_file = heap_file; >>>>>>>> + heap_fwork->start_size = heap_ftask->roffset; >>>>>>>> + heap_fwork->need_size = heap_ftask->rsize; >>>>>>>> + heap_fwork->refp = &heap_ftask->ref; >>>>>>>> + heap_fwork->failp = &heap_ftask->fail; >>>>>>>> + atomic_inc(&heap_ftask->ref); >>>>>>>> + return heap_fwork; >>>>>>>> +} >>>>>>>> + >>>>>>>> +static void destroy_file_work(struct dma_heap_file_work >>>>>>>> *heap_fwork) >>>>>>>> +{ >>>>>>>> + 
vunmap(heap_fwork->vaddr); >>>>>>>> + atomic_dec(heap_fwork->refp); >>>>>>>> + wake_up(&heap_fctl->workwq); >>>>>>>> + >>>>>>>> + kmem_cache_free(heap_fctl->heap_fwork_cachep, heap_fwork); >>>>>>>> +} >>>>>>>> + >>>>>>>> +int dma_heap_submit_file_read(struct dma_heap_file_task >>>>>>>> *heap_ftask) >>>>>>>> +{ >>>>>>>> + struct dma_heap_file_work *heap_fwork = >>>>>>>> init_file_work(heap_ftask); >>>>>>>> + struct page *last = NULL; >>>>>>>> + struct dma_heap_file *heap_file = heap_ftask->heap_file; >>>>>>>> + size_t start = heap_ftask->roffset; >>>>>>>> + struct file *file = heap_file->file; >>>>>>>> + size_t fsz = heap_file->fsz; >>>>>>>> + >>>>>>>> + if (unlikely(!heap_fwork)) >>>>>>>> + return -ENOMEM; >>>>>>>> + >>>>>>>> + /** >>>>>>>> + * If file size is not page aligned, direct io can't >>>>>>>> process the tail. >>>>>>>> + * So, if reach to tail, remain the last page use buffer >>>>>>>> read. >>>>>>>> + */ >>>>>>>> + if (heap_file->direct && start + heap_ftask->rsize > fsz) { >>>>>>>> + heap_fwork->need_size -= PAGE_SIZE; >>>>>>>> + last = heap_ftask->parray[heap_ftask->pindex - 1]; >>>>>>>> + } >>>>>>>> + >>>>>>>> + spin_lock(&heap_fctl->lock); >>>>>>>> + list_add_tail(&heap_fwork->list, &heap_fctl->works); >>>>>>>> + spin_unlock(&heap_fctl->lock); >>>>>>>> + atomic_inc(&heap_fctl->nr_work); >>>>>>>> + >>>>>>>> + wake_up(&heap_fctl->threadwq); >>>>>>>> + >>>>>>>> + if (last) { >>>>>>>> + char *buf, *pathp; >>>>>>>> + ssize_t err; >>>>>>>> + void *buffer; >>>>>>>> + >>>>>>>> + buf = kmalloc(PATH_MAX, GFP_KERNEL); >>>>>>>> + if (unlikely(!buf)) >>>>>>>> + return -ENOMEM; >>>>>>>> + >>>>>>>> + start = PAGE_ALIGN_DOWN(fsz); >>>>>>>> + >>>>>>>> + pathp = file_path(file, buf, PATH_MAX); >>>>>>>> + if (IS_ERR(pathp)) { >>>>>>>> + kfree(buf); >>>>>>>> + return PTR_ERR(pathp); >>>>>>>> + } >>>>>>>> + >>>>>>>> + buffer = kmap_local_page(last); // use page's kaddr. 
>>>>>>>> + err = kernel_read_file_from_path(pathp, start, &buffer, >>>>>>>> + fsz - start, &fsz, >>>>>>>> + READING_POLICY); >>>>>>>> + kunmap_local(buffer); >>>>>>>> + kfree(buf); >>>>>>>> + if (err < 0) { >>>>>>>> + pr_err("failed to use buffer kernel_read_file %s, >>>>>>>> err=%ld, [%ld, %ld], f_sz=%ld\n", >>>>>>>> + pathp, err, start, fsz, fsz); >>>>>>>> + >>>>>>>> + return err; >>>>>>>> + } >>>>>>>> + } >>>>>>>> + >>>>>>>> + heap_ftask->roffset += heap_ftask->rsize; >>>>>>>> + heap_ftask->rsize = 0; >>>>>>>> + heap_ftask->pindex = 0; >>>>>>>> + heap_ftask->rbatch = min_t(size_t, >>>>>>>> + PAGE_ALIGN(fsz) - heap_ftask->roffset, >>>>>>>> + heap_ftask->rbatch); >>>>>>>> + return 0; >>>>>>>> +} >>>>>>>> + >>>>>>>> +bool dma_heap_wait_for_file_read(struct dma_heap_file_task >>>>>>>> *heap_ftask) >>>>>>>> +{ >>>>>>>> + wait_event_freezable(heap_fctl->workwq, >>>>>>>> + atomic_read(&heap_ftask->ref) == 0); >>>>>>>> + return heap_ftask->fail; >>>>>>>> +} >>>>>>>> + >>>>>>>> +bool dma_heap_destroy_file_read(struct dma_heap_file_task >>>>>>>> *heap_ftask) >>>>>>>> +{ >>>>>>>> + bool fail; >>>>>>>> + >>>>>>>> + dma_heap_wait_for_file_read(heap_ftask); >>>>>>>> + fail = heap_ftask->fail; >>>>>>>> + kvfree(heap_ftask->parray); >>>>>>>> + kfree(heap_ftask); >>>>>>>> + return fail; >>>>>>>> +} >>>>>>>> + >>>>>>>> +struct dma_heap_file_task * >>>>>>>> +dma_heap_declare_file_read(struct dma_heap_file *heap_file) >>>>>>>> +{ >>>>>>>> + struct dma_heap_file_task *heap_ftask = >>>>>>>> + kzalloc(sizeof(*heap_ftask), GFP_KERNEL); >>>>>>>> + if (unlikely(!heap_ftask)) >>>>>>>> + return NULL; >>>>>>>> + >>>>>>>> + /** >>>>>>>> + * Batch is the maximum size which we prepare work will meet. >>>>>>>> + * So, direct alloc this number's page array is OK. 
>>>>>>>> + */ >>>>>>>> + heap_ftask->parray = kvmalloc_array(heap_file->max_batch >>>>>>>> >> PAGE_SHIFT, >>>>>>>> + sizeof(struct page *), GFP_KERNEL); >>>>>>>> + if (unlikely(!heap_ftask->parray)) >>>>>>>> + goto put; >>>>>>>> + >>>>>>>> + heap_ftask->heap_file = heap_file; >>>>>>>> + heap_ftask->rbatch = heap_file->max_batch; >>>>>>>> + return heap_ftask; >>>>>>>> +put: >>>>>>>> + kfree(heap_ftask); >>>>>>>> + return NULL; >>>>>>>> +} >>>>>>>> + >>>>>>>> +static void __work_this_io(struct dma_heap_file_work *heap_fwork) >>>>>>>> +{ >>>>>>>> + struct dma_heap_file *heap_file = heap_fwork->heap_file; >>>>>>>> + struct file *file = heap_file->file; >>>>>>>> + ssize_t start = heap_fwork->start_size; >>>>>>>> + ssize_t size = heap_fwork->need_size; >>>>>>>> + void *buffer = heap_fwork->vaddr; >>>>>>>> + const struct cred *old_cred; >>>>>>>> + ssize_t err; >>>>>>>> + >>>>>>>> + // use real task's cred to read this file. >>>>>>>> + old_cred = override_creds(heap_file->cred); >>>>>>>> + err = kernel_read_file(file, start, &buffer, size, >>>>>>>> &heap_file->fsz, >>>>>>>> + READING_POLICY); >>>>>>>> + if (err < 0) { >>>>>>>> + pr_err("use kernel_read_file, err=%ld, [%ld, %ld], >>>>>>>> f_sz=%ld\n", >>>>>>>> + err, start, (start + size), heap_file->fsz); >>>>>>>> + WRITE_ONCE(*heap_fwork->failp, true); >>>>>>>> + } >>>>>>>> + // recovery to my cred. 
>>>>>>>> + revert_creds(old_cred); >>>>>>>> +} >>>>>>>> + >>>>>>>> +static int dma_heap_file_control_thread(void *data) >>>>>>>> +{ >>>>>>>> + struct dma_heap_file_control *heap_fctl = >>>>>>>> + (struct dma_heap_file_control *)data; >>>>>>>> + struct dma_heap_file_work *worker, *tmp; >>>>>>>> + int nr_work; >>>>>>>> + >>>>>>>> + LIST_HEAD(pages); >>>>>>>> + LIST_HEAD(workers); >>>>>>>> + >>>>>>>> + while (true) { >>>>>>>> + wait_event_freezable(heap_fctl->threadwq, >>>>>>>> + atomic_read(&heap_fctl->nr_work) > 0); >>>>>>>> +recheck: >>>>>>>> + spin_lock(&heap_fctl->lock); >>>>>>>> + list_splice_init(&heap_fctl->works, &workers); >>>>>>>> + spin_unlock(&heap_fctl->lock); >>>>>>>> + >>>>>>>> + if (unlikely(kthread_should_stop())) { >>>>>>>> + list_for_each_entry_safe(worker, tmp, &workers, >>>>>>>> list) { >>>>>>>> + list_del(&worker->list); >>>>>>>> + destroy_file_work(worker); >>>>>>>> + } >>>>>>>> + break; >>>>>>>> + } >>>>>>>> + >>>>>>>> + nr_work = 0; >>>>>>>> + list_for_each_entry_safe(worker, tmp, &workers, list) { >>>>>>>> + ++nr_work; >>>>>>>> + list_del(&worker->list); >>>>>>>> + __work_this_io(worker); >>>>>>>> + >>>>>>>> + destroy_file_work(worker); >>>>>>>> + } >>>>>>>> + atomic_sub(nr_work, &heap_fctl->nr_work); >>>>>>>> + >>>>>>>> + if (atomic_read(&heap_fctl->nr_work) > 0) >>>>>>>> + goto recheck; >>>>>>>> + } >>>>>>>> + return 0; >>>>>>>> +} >>>>>>>> + >>>>>>>> +size_t dma_heap_file_size(struct dma_heap_file *heap_file) >>>>>>>> +{ >>>>>>>> + return heap_file->fsz; >>>>>>>> +} >>>>>>>> + >>>>>>>> +static int prepare_dma_heap_file(struct dma_heap_file >>>>>>>> *heap_file, int file_fd, >>>>>>>> + size_t batch) >>>>>>>> +{ >>>>>>>> + struct file *file; >>>>>>>> + size_t fsz; >>>>>>>> + int ret; >>>>>>>> + >>>>>>>> + file = fget(file_fd); >>>>>>>> + if (!file) >>>>>>>> + return -EINVAL; >>>>>>>> + >>>>>>>> + fsz = i_size_read(file_inode(file)); >>>>>>>> + if (fsz < batch) { >>>>>>>> + ret = -EINVAL; >>>>>>>> + goto err; >>>>>>>> + } >>>>>>>> + >>>>>>>> 
+ /** >>>>>>>> + * Selinux block our read, but actually we are reading the >>>>>>>> stand-in >>>>>>>> + * for this file. >>>>>>>> + * So save current's cred and when going to read, override >>>>>>>> mine, and >>>>>>>> + * end of read, revert. >>>>>>>> + */ >>>>>>>> + heap_file->cred = prepare_kernel_cred(current); >>>>>>>> + if (unlikely(!heap_file->cred)) { >>>>>>>> + ret = -ENOMEM; >>>>>>>> + goto err; >>>>>>>> + } >>>>>>>> + >>>>>>>> + heap_file->file = file; >>>>>>>> + heap_file->max_batch = batch; >>>>>>>> + heap_file->fsz = fsz; >>>>>>>> + >>>>>>>> + heap_file->direct = file->f_flags & O_DIRECT; >>>>>>>> + >>>>>>>> +#define DMA_HEAP_SUGGEST_DIRECT_IO_SIZE (1UL << 30) >>>>>>>> + if (!heap_file->direct && fsz >= >>>>>>>> DMA_HEAP_SUGGEST_DIRECT_IO_SIZE) >>>>>>>> + pr_warn("alloc read file better to use O_DIRECT to >>>>>>>> read larget file\n"); >>>>>>>> + >>>>>>>> + return 0; >>>>>>>> + >>>>>>>> +err: >>>>>>>> + fput(file); >>>>>>>> + return ret; >>>>>>>> +} >>>>>>>> + >>>>>>>> +static void destroy_dma_heap_file(struct dma_heap_file >>>>>>>> *heap_file) >>>>>>>> +{ >>>>>>>> + fput(heap_file->file); >>>>>>>> + put_cred(heap_file->cred); >>>>>>>> +} >>>>>>>> + >>>>>>>> +static int dma_heap_buffer_alloc_read_file(struct dma_heap >>>>>>>> *heap, int file_fd, >>>>>>>> + size_t batch, unsigned int fd_flags, >>>>>>>> + unsigned int heap_flags) >>>>>>>> +{ >>>>>>>> + struct dma_buf *dmabuf; >>>>>>>> + int fd; >>>>>>>> + struct dma_heap_file heap_file; >>>>>>>> + >>>>>>>> + fd = prepare_dma_heap_file(&heap_file, file_fd, batch); >>>>>>>> + if (fd) >>>>>>>> + goto error_file; >>>>>>>> + >>>>>>>> + dmabuf = heap->ops->allocate_read_file(heap, &heap_file, >>>>>>>> fd_flags, >>>>>>>> + heap_flags); >>>>>>>> + if (IS_ERR(dmabuf)) { >>>>>>>> + fd = PTR_ERR(dmabuf); >>>>>>>> + goto error; >>>>>>>> + } >>>>>>>> + >>>>>>>> + fd = dma_buf_fd(dmabuf, fd_flags); >>>>>>>> + if (fd < 0) { >>>>>>>> + dma_buf_put(dmabuf); >>>>>>>> + /* just return, as put will call release and that 
will >>>>>>>> free */ >>>>>>>> + } >>>>>>>> + >>>>>>>> +error: >>>>>>>> + destroy_dma_heap_file(&heap_file); >>>>>>>> +error_file: >>>>>>>> + return fd; >>>>>>>> +} >>>>>>>> + >>>>>>>> static int dma_heap_buffer_alloc(struct dma_heap *heap, >>>>>>>> size_t len, >>>>>>>> u32 fd_flags, >>>>>>>> u64 heap_flags) >>>>>>>> @@ -93,6 +545,38 @@ static int dma_heap_open(struct inode >>>>>>>> *inode, struct file *file) >>>>>>>> return 0; >>>>>>>> } >>>>>>>> +static long dma_heap_ioctl_allocate_read_file(struct file >>>>>>>> *file, void *data) >>>>>>>> +{ >>>>>>>> + struct dma_heap_allocation_file_data *heap_allocation_file >>>>>>>> = data; >>>>>>>> + struct dma_heap *heap = file->private_data; >>>>>>>> + int fd; >>>>>>>> + >>>>>>>> + if (heap_allocation_file->fd || >>>>>>>> !heap_allocation_file->file_fd) >>>>>>>> + return -EINVAL; >>>>>>>> + >>>>>>>> + if (heap_allocation_file->fd_flags & >>>>>>>> ~DMA_HEAP_VALID_FD_FLAGS) >>>>>>>> + return -EINVAL; >>>>>>>> + >>>>>>>> + if (heap_allocation_file->heap_flags & >>>>>>>> ~DMA_HEAP_VALID_HEAP_FLAGS) >>>>>>>> + return -EINVAL; >>>>>>>> + >>>>>>>> + if (!heap->ops->allocate_read_file) >>>>>>>> + return -EINVAL; >>>>>>>> + >>>>>>>> + fd = dma_heap_buffer_alloc_read_file( >>>>>>>> + heap, heap_allocation_file->file_fd, >>>>>>>> + heap_allocation_file->batch ? 
>>>>>>>> + PAGE_ALIGN(heap_allocation_file->batch) : >>>>>>>> + DEFAULT_ADI_BATCH, >>>>>>>> + heap_allocation_file->fd_flags, >>>>>>>> + heap_allocation_file->heap_flags); >>>>>>>> + if (fd < 0) >>>>>>>> + return fd; >>>>>>>> + >>>>>>>> + heap_allocation_file->fd = fd; >>>>>>>> + return 0; >>>>>>>> +} >>>>>>>> + >>>>>>>> static long dma_heap_ioctl_allocate(struct file *file, void >>>>>>>> *data) >>>>>>>> { >>>>>>>> struct dma_heap_allocation_data *heap_allocation = data; >>>>>>>> @@ -121,6 +605,7 @@ static long dma_heap_ioctl_allocate(struct >>>>>>>> file *file, void *data) >>>>>>>> static unsigned int dma_heap_ioctl_cmds[] = { >>>>>>>> DMA_HEAP_IOCTL_ALLOC, >>>>>>>> + DMA_HEAP_IOCTL_ALLOC_AND_READ, >>>>>>>> }; >>>>>>>> static long dma_heap_ioctl(struct file *file, unsigned int >>>>>>>> ucmd, >>>>>>>> @@ -170,6 +655,9 @@ static long dma_heap_ioctl(struct file >>>>>>>> *file, unsigned int ucmd, >>>>>>>> case DMA_HEAP_IOCTL_ALLOC: >>>>>>>> ret = dma_heap_ioctl_allocate(file, kdata); >>>>>>>> break; >>>>>>>> + case DMA_HEAP_IOCTL_ALLOC_AND_READ: >>>>>>>> + ret = dma_heap_ioctl_allocate_read_file(file, kdata); >>>>>>>> + break; >>>>>>>> default: >>>>>>>> ret = -ENOTTY; >>>>>>>> goto err; >>>>>>>> @@ -316,11 +804,44 @@ static int dma_heap_init(void) >>>>>>>> dma_heap_class = class_create(DEVNAME); >>>>>>>> if (IS_ERR(dma_heap_class)) { >>>>>>>> - unregister_chrdev_region(dma_heap_devt, NUM_HEAP_MINORS); >>>>>>>> - return PTR_ERR(dma_heap_class); >>>>>>>> + ret = PTR_ERR(dma_heap_class); >>>>>>>> + goto fail_class; >>>>>>>> } >>>>>>>> dma_heap_class->devnode = dma_heap_devnode; >>>>>>>> + heap_fctl = kzalloc(sizeof(*heap_fctl), GFP_KERNEL); >>>>>>>> + if (unlikely(!heap_fctl)) { >>>>>>>> + ret = -ENOMEM; >>>>>>>> + goto fail_alloc; >>>>>>>> + } >>>>>>>> + >>>>>>>> + INIT_LIST_HEAD(&heap_fctl->works); >>>>>>>> + init_waitqueue_head(&heap_fctl->threadwq); >>>>>>>> + init_waitqueue_head(&heap_fctl->workwq); >>>>>>>> + >>>>>>>> + heap_fctl->work_thread = >>>>>>>> 
kthread_run(dma_heap_file_control_thread, >>>>>>>> + heap_fctl, "heap_fwork_t"); >>>>>>>> + if (IS_ERR(heap_fctl->work_thread)) { >>>>>>>> + ret = -ENOMEM; >>>>>>>> + goto fail_thread; >>>>>>>> + } >>>>>>>> + >>>>>>>> + heap_fctl->heap_fwork_cachep = >>>>>>>> KMEM_CACHE(dma_heap_file_work, 0); >>>>>>>> + if (unlikely(!heap_fctl->heap_fwork_cachep)) { >>>>>>>> + ret = -ENOMEM; >>>>>>>> + goto fail_cache; >>>>>>>> + } >>>>>>>> + >>>>>>>> return 0; >>>>>>>> + >>>>>>>> +fail_cache: >>>>>>>> + kthread_stop(heap_fctl->work_thread); >>>>>>>> +fail_thread: >>>>>>>> + kfree(heap_fctl); >>>>>>>> +fail_alloc: >>>>>>>> + class_destroy(dma_heap_class); >>>>>>>> +fail_class: >>>>>>>> + unregister_chrdev_region(dma_heap_devt, NUM_HEAP_MINORS); >>>>>>>> + return ret; >>>>>>>> } >>>>>>>> subsys_initcall(dma_heap_init); >>>>>>>> diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h >>>>>>>> index 064bad725061..9c25383f816c 100644 >>>>>>>> --- a/include/linux/dma-heap.h >>>>>>>> +++ b/include/linux/dma-heap.h >>>>>>>> @@ -12,12 +12,17 @@ >>>>>>>> #include <linux/cdev.h> >>>>>>>> #include <linux/types.h> >>>>>>>> +#define DEFAULT_ADI_BATCH (128 << 20) >>>>>>>> + >>>>>>>> struct dma_heap; >>>>>>>> +struct dma_heap_file_task; >>>>>>>> +struct dma_heap_file; >>>>>>>> /** >>>>>>>> * struct dma_heap_ops - ops to operate on a given heap >>>>>>>> * @allocate: allocate dmabuf and return struct >>>>>>>> dma_buf ptr >>>>>>>> - * >>>>>>>> + * @allocate_read_file: allocate dmabuf and read file, then >>>>>>>> return struct >>>>>>>> + * dma_buf ptr. >>>>>>>> * allocate returns dmabuf on success, ERR_PTR(-errno) on error. 
>>>>>>>> */ >>>>>>>> struct dma_heap_ops { >>>>>>>> @@ -25,6 +30,11 @@ struct dma_heap_ops { >>>>>>>> unsigned long len, >>>>>>>> u32 fd_flags, >>>>>>>> u64 heap_flags); >>>>>>>> + >>>>>>>> + struct dma_buf *(*allocate_read_file)(struct dma_heap *heap, >>>>>>>> + struct dma_heap_file *heap_file, >>>>>>>> + u32 fd_flags, >>>>>>>> + u64 heap_flags); >>>>>>>> }; >>>>>>>> /** >>>>>>>> @@ -65,4 +75,49 @@ const char *dma_heap_get_name(struct >>>>>>>> dma_heap *heap); >>>>>>>> */ >>>>>>>> struct dma_heap *dma_heap_add(const struct >>>>>>>> dma_heap_export_info *exp_info); >>>>>>>> +/** >>>>>>>> + * dma_heap_destroy_file_read - waits for a file read to >>>>>>>> complete then destroy it >>>>>>>> + * Returns: true if the file read failed, false otherwise >>>>>>>> + */ >>>>>>>> +bool dma_heap_destroy_file_read(struct dma_heap_file_task >>>>>>>> *heap_ftask); >>>>>>>> + >>>>>>>> +/** >>>>>>>> + * dma_heap_wait_for_file_read - waits for a file read to >>>>>>>> complete >>>>>>>> + * Returns: true if the file read failed, false otherwise >>>>>>>> + */ >>>>>>>> +bool dma_heap_wait_for_file_read(struct dma_heap_file_task >>>>>>>> *heap_ftask); >>>>>>>> + >>>>>>>> +/** >>>>>>>> + * dma_heap_alloc_file_read - Declare a task to read file when >>>>>>>> allocate pages. >>>>>>>> + * @heap_file: target file to read >>>>>>>> + * >>>>>>>> + * Return NULL if failed, otherwise return a struct pointer. >>>>>>>> + */ >>>>>>>> +struct dma_heap_file_task * >>>>>>>> +dma_heap_declare_file_read(struct dma_heap_file *heap_file); >>>>>>>> + >>>>>>>> +/** >>>>>>>> + * dma_heap_prepare_file_read - cache each allocated page >>>>>>>> until we meet this batch. >>>>>>>> + * @heap_ftask: prepared and need to commit's work. >>>>>>>> + * @page: current allocated page. don't care which order. >>>>>>>> + * >>>>>>>> + * Returns true if reach to batch, false so go on prepare. 
>>>>>>>> + */ >>>>>>>> +bool dma_heap_prepare_file_read(struct dma_heap_file_task >>>>>>>> *heap_ftask, >>>>>>>> + struct page *page); >>>>>>>> + >>>>>>>> +/** >>>>>>>> + * dma_heap_commit_file_read - prepare collect enough memory, >>>>>>>> going to trigger IO >>>>>>>> + * @heap_ftask: info that current IO needs >>>>>>>> + * >>>>>>>> + * This commit will also check if reach to tail read. >>>>>>>> + * For direct I/O submissions, it is necessary to pay >>>>>>>> attention to file reads >>>>>>>> + * that are not page-aligned. For the unaligned portion of the >>>>>>>> read, buffer IO >>>>>>>> + * needs to be triggered. >>>>>>>> + * Returns: >>>>>>>> + * 0 if all right, -errno if something wrong >>>>>>>> + */ >>>>>>>> +int dma_heap_submit_file_read(struct dma_heap_file_task >>>>>>>> *heap_ftask); >>>>>>>> +size_t dma_heap_file_size(struct dma_heap_file *heap_file); >>>>>>>> + >>>>>>>> #endif /* _DMA_HEAPS_H */ >>>>>>>> diff --git a/include/uapi/linux/dma-heap.h >>>>>>>> b/include/uapi/linux/dma-heap.h >>>>>>>> index a4cf716a49fa..8c20e8b74eed 100644 >>>>>>>> --- a/include/uapi/linux/dma-heap.h >>>>>>>> +++ b/include/uapi/linux/dma-heap.h >>>>>>>> @@ -39,6 +39,27 @@ struct dma_heap_allocation_data { >>>>>>>> __u64 heap_flags; >>>>>>>> }; >>>>>>>> +/** >>>>>>>> + * struct dma_heap_allocation_file_data - metadata passed from >>>>>>>> userspace for >>>>>>>> + * allocations and read file >>>>>>>> + * @fd: will be populated with a fd which provides the >>>>>>>> + * handle to the allocated dma-buf >>>>>>>> + * @file_fd: file descriptor to read from(suggested to >>>>>>>> use O_DIRECT open file) >>>>>>>> + * @batch: how many memory alloced then file >>>>>>>> read(bytes), default 128MB >>>>>>>> + * will auto aligned to PAGE_SIZE >>>>>>>> + * @fd_flags: file descriptor flags used when allocating >>>>>>>> + * @heap_flags: flags passed to heap >>>>>>>> + * >>>>>>>> + * Provided by userspace as an argument to the ioctl >>>>>>>> + */ >>>>>>>> +struct dma_heap_allocation_file_data { 
>>>>>>>> + __u32 fd; >>>>>>>> + __u32 file_fd; >>>>>>>> + __u32 batch; >>>>>>>> + __u32 fd_flags; >>>>>>>> + __u64 heap_flags; >>>>>>>> +}; >>>>>>>> + >>>>>>>> #define DMA_HEAP_IOC_MAGIC 'H' >>>>>>>> /** >>>>>>>> @@ -50,4 +71,15 @@ struct dma_heap_allocation_data { >>>>>>>> #define DMA_HEAP_IOCTL_ALLOC _IOWR(DMA_HEAP_IOC_MAGIC, 0x0,\ >>>>>>>> struct dma_heap_allocation_data) >>>>>>>> +/** >>>>>>>> + * DOC: DMA_HEAP_IOCTL_ALLOC_AND_READ - allocate memory from >>>>>>>> pool and both >>>>>>>> + * read file when allocate memory. >>>>>>>> + * >>>>>>>> + * Takes a dma_heap_allocation_file_data struct and returns it >>>>>>>> with the fd field >>>>>>>> + * populated with the dmabuf handle of the allocation. When >>>>>>>> return, the dma-buf >>>>>>>> + * content is read from file. >>>>>>>> + */ >>>>>>>> +#define DMA_HEAP_IOCTL_ALLOC_AND_READ \ >>>>>>>> + _IOWR(DMA_HEAP_IOC_MAGIC, 0x1, struct >>>>>>>> dma_heap_allocation_file_data) >>>>>>>> + >>>>>>>> #endif /* _UAPI_LINUX_DMABUF_POOL_H */ >>>>>>> >>>>> >>
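
For orientation, a userspace view of the uAPI quoted above might look like the sketch below. The struct layout and the ioctl number are copied from the patch, which was not accepted in this thread, so no mainline header is assumed to provide them. make_alloc_read_request() is a hypothetical helper added only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Layout copied from the proposed (unmerged) uAPI in the quoted patch. */
struct dma_heap_allocation_file_data {
	uint32_t fd;         /* out: dma-buf fd, must be 0 on entry */
	uint32_t file_fd;    /* in: descriptor of the file to read from */
	uint32_t batch;      /* in: bytes per read batch, 0 selects the default */
	uint32_t fd_flags;   /* in: flags for the returned dma-buf fd */
	uint64_t heap_flags; /* in: flags passed to the heap */
};

#define DMA_HEAP_IOC_MAGIC 'H'
#define DMA_HEAP_IOCTL_ALLOC_AND_READ \
	_IOWR(DMA_HEAP_IOC_MAGIC, 0x1, struct dma_heap_allocation_file_data)

/* Hypothetical helper: build a request the posted handler would accept. */
static struct dma_heap_allocation_file_data
make_alloc_read_request(int file_fd, uint32_t batch, uint32_t fd_flags)
{
	struct dma_heap_allocation_file_data req;

	/* The handler as posted returns -EINVAL when file_fd is 0. */
	assert(file_fd > 0);

	memset(&req, 0, sizeof(req)); /* leaves req.fd == 0, as required */
	req.file_fd = (uint32_t)file_fd;
	req.batch = batch;
	req.fd_flags = fd_flags;
	return req;
}
```

On a kernel carrying the patch, ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC_AND_READ, &req) would be expected to return 0 with req.fd holding the new dma-buf descriptor. Note that the handler as posted treats file_fd == 0 as invalid, although 0 is a legal descriptor number.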

2 years

Re: [PATCH 1/2] dma-buf: heaps: DMA_HEAP_IOCTL_ALLOC_READ_FILE framework

by Christian König

On 12.07.24 at 04:14, Huan Yang wrote:
> On 2024/7/12 9:59, Huan Yang wrote:
>> Hi Christian,
>>
>> On 2024/7/11 19:39, Christian König wrote:
>>> On 11.07.24 at 11:18, Huan Yang wrote:
>>>> Hi Christian,
>>>>
>>>> Thanks for your reply.
>>>>
>>>> On 2024/7/11 17:00, Christian König wrote:
>>>>> On 11.07.24 at 09:42, Huan Yang wrote:
>>>>>> Some user may need load file into dma-buf, current
>>>>>> way is:
>>>>>> 1. allocate a dma-buf, get dma-buf fd
>>>>>> 2. mmap dma-buf fd into vaddr
>>>>>> 3. read(file_fd, vaddr, fsz)
>>>>>> This is too heavy if fsz reached to GB.
>>>>>
>>>>> You need to describe a bit more why that is too heavy. I can only
>>>>> assume you need to save memory bandwidth and avoid the extra copy
>>>>> with the CPU.
>>>>
>>>> Sorry for the oversimplified explanation. But, yes, you're right,
>>>> we want to avoid this.
>>>>
>>>> As we are dealing with embedded devices, the available memory and
>>>> computing power for users are usually limited. (The maximum
>>>> available memory is currently 24GB, typically ranging from 8-12GB.)
>>>>
>>>> Also, the CPU computing power is also usually in short supply, due
>>>> to limited battery capacity and limited heat dissipation capabilities.
>>>>
>>>> So, we hope to avoid ineffective paths as much as possible.
>>>>
>>>>>
>>>>>> This patch implement a feature called
>>>>>> DMA_HEAP_IOCTL_ALLOC_READ_FILE.
>>>>>> User need to offer a file_fd which you want to load into dma-buf,
>>>>>> then,
>>>>>> it promise if you got a dma-buf fd, it will contains the file
>>>>>> content.
>>>>>
>>>>> Interesting idea, that has at least more potential than trying to
>>>>> enable direct I/O on mmap()ed DMA-bufs.
>>>>>
>>>>> The approach with the new IOCTL might not work because it is a
>>>>> very specialized use case.
>>>>
>>>> Thank you for your advice. Maybe the "read file" behavior can be
>>>> attached to an existing allocation?
>>>
>>> The point is there are already system calls to do something like that.
>>>
>>> See copy_file_range()
>>> (https://man7.org/linux/man-pages/man2/copy_file_range.2.html) and
>>> sendfile() (https://man7.org/linux/man-pages/man2/sendfile.2.html).
>>
>> That's helpful to learn, thanks.
>>
>> In terms of only DMA-BUF supporting direct I/O,
>> copy_file_range/sendfile may help to achieve this functionality.
>>
>> However, my patchset also aims to achieve parallel copying of file
>> contents while allocating the DMA-BUF, which is something that the
>> current set of calls may not be able to accomplish.

And exactly that is a no-go.

Use the existing IOCTLs and system calls instead; they should have similar performance when done right.

Regards,
Christian.

>
> You can see cover-letter, here are the normal test and this IOCTL's
> compare in memory pressure, even if buffered I/O in this ioctl can
> have 50% improve by parallel.
>
> dd a 3GB file for test, 12G RAM phone, UFS4.0, stressapptest 4G memory
> pressure.
>
> 1. original
> ```shell
> # create a model file
> dd if=/dev/zero of=./model.txt bs=1M count=3072
> # drop page cache
> echo 3 > /proc/sys/vm/drop_caches
> ./dmabuf-heap-file-read mtk_mm-uncached normal
>
> > result is total cost 13087213847ns
>
> ```
>
> 2. DMA_HEAP_IOCTL_ALLOC_AND_READ O_DIRECT
> ```shell
> # create a model file
> dd if=/dev/zero of=./model.txt bs=1M count=3072
> # drop page cache
> echo 3 > /proc/sys/vm/drop_caches
> ./dmabuf-heap-file-read mtk_mm-uncached direct_io
>
> > result is total cost 2902386846ns
>
> # use direct_io_check can check the content if is same to file.
> ```
>
> 3. DMA_HEAP_IOCTL_ALLOC_AND_READ BUFFER I/O
> ```shell
> # create a model file
> dd if=/dev/zero of=./model.txt bs=1M count=3072
> # drop page cache
> echo 3 > /proc/sys/vm/drop_caches
> ./dmabuf-heap-file-read mtk_mm-uncached normal_io
>
> > result is total cost 5735579385ns
>
> ```
>
>>
>> Perhaps simply returning the DMA-BUF file descriptor and then
>> implementing copy_file_range, while populating the memory and content
>> during the copy process, could achieve this? At present, it seems
>> that it will be quite complex - We need to ensure that only the
>> returned DMA-BUF file descriptor will fail in case of memory not
>> fill, like mmap, vmap, attach, and so on.
>>
>>>
>>> What we probably could do is to internally optimize those.
>>>
>>>> I am currently creating a new ioctl to remind the user that memory
>>>> is being allocated and read, and I am also unsure
>>>> whether it is appropriate to add additional parameters to the
>>>> existing allocate behavior.
>>>>
>>>> Please, give me more suggestion. Thanks.
>>>>
>>>>>
>>>>> But IIRC there was a copy_file_range callback in the
>>>>> file_operations structure you could use for that. I'm just not
>>>>> sure when and how that's used with the copy_file_range() system call.
>>>>
>>>> Sorry, I'm not familiar with this, but I will look into it.
>>>> However, this type of callback function is not currently
>>>> implemented when exporting
>>>> the dma_buf file, which means that I need to implement the callback
>>>> for it?
>>>
>>> If I'm not completely mistaken the copy_file_range, splice_read and
>>> splice_write callbacks on the struct file_operations
>>> (https://elixir.bootlin.com/linux/v6.10-rc7/source/include/linux/fs.h#L1999)
>>> can be used to implement what you want to do.
>> Yes.
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>>
>>>>>> Notice, file_fd depends on user how to open this file. So, both
>>>>>> buffer
>>>>>> I/O and Direct I/O is supported.
>>>>>> >>>>>> Signed-off-by: Huan Yang <link(a)vivo.com> >>>>>> --- >>>>>> drivers/dma-buf/dma-heap.c | 525 >>>>>> +++++++++++++++++++++++++++++++++- >>>>>> include/linux/dma-heap.h | 57 +++- >>>>>> include/uapi/linux/dma-heap.h | 32 +++ >>>>>> 3 files changed, 611 insertions(+), 3 deletions(-) >>>>>> >>>>>> diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c >>>>>> index 2298ca5e112e..abe17281adb8 100644 >>>>>> --- a/drivers/dma-buf/dma-heap.c >>>>>> +++ b/drivers/dma-buf/dma-heap.c >>>>>> @@ -15,9 +15,11 @@ >>>>>> #include <linux/list.h> >>>>>> #include <linux/slab.h> >>>>>> #include <linux/nospec.h> >>>>>> +#include <linux/highmem.h> >>>>>> #include <linux/uaccess.h> >>>>>> #include <linux/syscalls.h> >>>>>> #include <linux/dma-heap.h> >>>>>> +#include <linux/vmalloc.h> >>>>>> #include <uapi/linux/dma-heap.h> >>>>>> #define DEVNAME "dma_heap" >>>>>> @@ -43,12 +45,462 @@ struct dma_heap { >>>>>> struct cdev heap_cdev; >>>>>> }; >>>>>> +/** >>>>>> + * struct dma_heap_file - wrap the file, read task for dma_heap >>>>>> allocate use. >>>>>> + * @file: file to read from. >>>>>> + * >>>>>> + * @cred: kthread use, user cred copy to use for the read. >>>>>> + * >>>>>> + * @max_batch: maximum batch size to read, if collect >>>>>> match batch, >>>>>> + * trigger read, default 128MB, must below file size. >>>>>> + * >>>>>> + * @fsz: file size. >>>>>> + * >>>>>> + * @direct: use direct IO? >>>>>> + */ >>>>>> +struct dma_heap_file { >>>>>> + struct file *file; >>>>>> + struct cred *cred; >>>>>> + size_t max_batch; >>>>>> + size_t fsz; >>>>>> + bool direct; >>>>>> +}; >>>>>> + >>>>>> +/** >>>>>> + * struct dma_heap_file_work - represents a dma_heap file read >>>>>> real work. >>>>>> + * @vaddr: contigous virtual address alloc by vmap, file >>>>>> read need. >>>>>> + * >>>>>> + * @start_size: file read start offset, same to >>>>>> @dma_heap_file_task->roffset. >>>>>> + * >>>>>> + * @need_size: file read need size, same to >>>>>> @dma_heap_file_task->rsize. 
>>>>>> + * >>>>>> + * @heap_file: file wrapper. >>>>>> + * >>>>>> + * @list: child node of @dma_heap_file_control->works. >>>>>> + * >>>>>> + * @refp: same @dma_heap_file_task->ref, if end of read, >>>>>> put ref. >>>>>> + * >>>>>> + * @failp: if any work io failed, set it true, pointp >>>>>> @dma_heap_file_task->fail. >>>>>> + */ >>>>>> +struct dma_heap_file_work { >>>>>> + void *vaddr; >>>>>> + ssize_t start_size; >>>>>> + ssize_t need_size; >>>>>> + struct dma_heap_file *heap_file; >>>>>> + struct list_head list; >>>>>> + atomic_t *refp; >>>>>> + bool *failp; >>>>>> +}; >>>>>> + >>>>>> +/** >>>>>> + * struct dma_heap_file_task - represents a dma_heap file read >>>>>> process >>>>>> + * @ref: current file work counter, if zero, allocate and >>>>>> read >>>>>> + * done. >>>>>> + * >>>>>> + * @roffset: last read offset, current prepared work' >>>>>> begin file >>>>>> + * start offset. >>>>>> + * >>>>>> + * @rsize: current allocated page size use to read, if >>>>>> reach rbatch, >>>>>> + * trigger commit. >>>>>> + * >>>>>> + * @rbatch: current prepared work's batch, below >>>>>> @dma_heap_file's >>>>>> + * batch. >>>>>> + * >>>>>> + * @heap_file: current dma_heap_file >>>>>> + * >>>>>> + * @parray: used for vmap, size is @dma_heap_file's >>>>>> batch's number >>>>>> + * pages.(this is maximum). Due to single thread file >>>>>> read, >>>>>> + * one page array reuse each work prepare is OK. >>>>>> + * Each index in parray is PAGE_SIZE.(vmap need) >>>>>> + * >>>>>> + * @pindex: current allocated page filled in @parray's >>>>>> index. >>>>>> + * >>>>>> + * @fail: any work failed when file read? >>>>>> + * >>>>>> + * dma_heap_file_task is the production of file read, will >>>>>> prepare each work >>>>>> + * during allocate dma_buf pages, if match current batch, then >>>>>> trigger commit >>>>>> + * and prepare next work. 
After all batch queued, user going on >>>>>> prepare dma_buf >>>>>> + * and so on, but before return dma_buf fd, need to wait file >>>>>> read end and >>>>>> + * check read result. >>>>>> + */ >>>>>> +struct dma_heap_file_task { >>>>>> + atomic_t ref; >>>>>> + size_t roffset; >>>>>> + size_t rsize; >>>>>> + size_t rbatch; >>>>>> + struct dma_heap_file *heap_file; >>>>>> + struct page **parray; >>>>>> + unsigned int pindex; >>>>>> + bool fail; >>>>>> +}; >>>>>> + >>>>>> +/** >>>>>> + * struct dma_heap_file_control - global control of dma_heap >>>>>> file read. >>>>>> + * @works: @dma_heap_file_work's list head. >>>>>> + * >>>>>> + * @lock: only lock for @works. >>>>>> + * >>>>>> + * @threadwq: wait queue for @work_thread, if commit >>>>>> work, @work_thread >>>>>> + * wakeup and read this work's file contains. >>>>>> + * >>>>>> + * @workwq: used for main thread wait for file read end, >>>>>> if allocation >>>>>> + * end before file read. @dma_heap_file_task ref >>>>>> effect this. >>>>>> + * >>>>>> + * @work_thread: file read kthread. the dma_heap_file_task >>>>>> work's consumer. >>>>>> + * >>>>>> + * @heap_fwork_cachep: @dma_heap_file_work's cachep, it's >>>>>> alloc/free frequently. >>>>>> + * >>>>>> + * @nr_work: global number of how many work committed. >>>>>> + */ >>>>>> +struct dma_heap_file_control { >>>>>> + struct list_head works; >>>>>> + spinlock_t lock; >>>>>> + wait_queue_head_t threadwq; >>>>>> + wait_queue_head_t workwq; >>>>>> + struct task_struct *work_thread; >>>>>> + struct kmem_cache *heap_fwork_cachep; >>>>>> + atomic_t nr_work; >>>>>> +}; >>>>>> + >>>>>> +static struct dma_heap_file_control *heap_fctl; >>>>>> static LIST_HEAD(heap_list); >>>>>> static DEFINE_MUTEX(heap_list_lock); >>>>>> static dev_t dma_heap_devt; >>>>>> static struct class *dma_heap_class; >>>>>> static DEFINE_XARRAY_ALLOC(dma_heap_minors); >>>>>> +/** >>>>>> + * map_pages_to_vaddr - map each scatter page into contiguous >>>>>> virtual address. 
>>>>>> + * @heap_ftask: prepared and need to commit's work. >>>>>> + * >>>>>> + * Cached pages need to trigger file read, this function map >>>>>> each scatter page >>>>>> + * into contiguous virtual address, so that file read can easy use. >>>>>> + * Now that we get vaddr page, cached pages can return to >>>>>> original user, so we >>>>>> + * will not effect dma-buf export even if file read not end. >>>>>> + */ >>>>>> +static void *map_pages_to_vaddr(struct dma_heap_file_task >>>>>> *heap_ftask) >>>>>> +{ >>>>>> + return vmap(heap_ftask->parray, heap_ftask->pindex, VM_MAP, >>>>>> + PAGE_KERNEL); >>>>>> +} >>>>>> + >>>>>> +bool dma_heap_prepare_file_read(struct dma_heap_file_task >>>>>> *heap_ftask, >>>>>> + struct page *page) >>>>>> +{ >>>>>> + struct page **array = heap_ftask->parray; >>>>>> + int index = heap_ftask->pindex; >>>>>> + int num = compound_nr(page), i; >>>>>> + unsigned long sz = page_size(page); >>>>>> + >>>>>> + heap_ftask->rsize += sz; >>>>>> + for (i = 0; i < num; ++i) >>>>>> + array[index++] = &page[i]; >>>>>> + heap_ftask->pindex = index; >>>>>> + >>>>>> + return heap_ftask->rsize >= heap_ftask->rbatch; >>>>>> +} >>>>>> + >>>>>> +static struct dma_heap_file_work * >>>>>> +init_file_work(struct dma_heap_file_task *heap_ftask) >>>>>> +{ >>>>>> + struct dma_heap_file_work *heap_fwork; >>>>>> + struct dma_heap_file *heap_file = heap_ftask->heap_file; >>>>>> + >>>>>> + if (READ_ONCE(heap_ftask->fail)) >>>>>> + return NULL; >>>>>> + >>>>>> + heap_fwork = kmem_cache_alloc(heap_fctl->heap_fwork_cachep, >>>>>> GFP_KERNEL); >>>>>> + if (unlikely(!heap_fwork)) >>>>>> + return NULL; >>>>>> + >>>>>> + heap_fwork->vaddr = map_pages_to_vaddr(heap_ftask); >>>>>> + if (unlikely(!heap_fwork->vaddr)) { >>>>>> + kmem_cache_free(heap_fctl->heap_fwork_cachep, heap_fwork); >>>>>> + return NULL; >>>>>> + } >>>>>> + >>>>>> + heap_fwork->heap_file = heap_file; >>>>>> + heap_fwork->start_size = heap_ftask->roffset; >>>>>> + heap_fwork->need_size = heap_ftask->rsize; 
>>>>>> + heap_fwork->refp = &heap_ftask->ref; >>>>>> + heap_fwork->failp = &heap_ftask->fail; >>>>>> + atomic_inc(&heap_ftask->ref); >>>>>> + return heap_fwork; >>>>>> +} >>>>>> + >>>>>> +static void destroy_file_work(struct dma_heap_file_work >>>>>> *heap_fwork) >>>>>> +{ >>>>>> + vunmap(heap_fwork->vaddr); >>>>>> + atomic_dec(heap_fwork->refp); >>>>>> + wake_up(&heap_fctl->workwq); >>>>>> + >>>>>> + kmem_cache_free(heap_fctl->heap_fwork_cachep, heap_fwork); >>>>>> +} >>>>>> + >>>>>> +int dma_heap_submit_file_read(struct dma_heap_file_task >>>>>> *heap_ftask) >>>>>> +{ >>>>>> + struct dma_heap_file_work *heap_fwork = >>>>>> init_file_work(heap_ftask); >>>>>> + struct page *last = NULL; >>>>>> + struct dma_heap_file *heap_file = heap_ftask->heap_file; >>>>>> + size_t start = heap_ftask->roffset; >>>>>> + struct file *file = heap_file->file; >>>>>> + size_t fsz = heap_file->fsz; >>>>>> + >>>>>> + if (unlikely(!heap_fwork)) >>>>>> + return -ENOMEM; >>>>>> + >>>>>> + /** >>>>>> + * If file size is not page aligned, direct io can't process >>>>>> the tail. >>>>>> + * So, if reach to tail, remain the last page use buffer read. 
>>>>>> + */ >>>>>> + if (heap_file->direct && start + heap_ftask->rsize > fsz) { >>>>>> + heap_fwork->need_size -= PAGE_SIZE; >>>>>> + last = heap_ftask->parray[heap_ftask->pindex - 1]; >>>>>> + } >>>>>> + >>>>>> + spin_lock(&heap_fctl->lock); >>>>>> + list_add_tail(&heap_fwork->list, &heap_fctl->works); >>>>>> + spin_unlock(&heap_fctl->lock); >>>>>> + atomic_inc(&heap_fctl->nr_work); >>>>>> + >>>>>> + wake_up(&heap_fctl->threadwq); >>>>>> + >>>>>> + if (last) { >>>>>> + char *buf, *pathp; >>>>>> + ssize_t err; >>>>>> + void *buffer; >>>>>> + >>>>>> + buf = kmalloc(PATH_MAX, GFP_KERNEL); >>>>>> + if (unlikely(!buf)) >>>>>> + return -ENOMEM; >>>>>> + >>>>>> + start = PAGE_ALIGN_DOWN(fsz); >>>>>> + >>>>>> + pathp = file_path(file, buf, PATH_MAX); >>>>>> + if (IS_ERR(pathp)) { >>>>>> + kfree(buf); >>>>>> + return PTR_ERR(pathp); >>>>>> + } >>>>>> + >>>>>> + buffer = kmap_local_page(last); // use page's kaddr. >>>>>> + err = kernel_read_file_from_path(pathp, start, &buffer, >>>>>> + fsz - start, &fsz, >>>>>> + READING_POLICY); >>>>>> + kunmap_local(buffer); >>>>>> + kfree(buf); >>>>>> + if (err < 0) { >>>>>> + pr_err("failed to use buffer kernel_read_file %s, >>>>>> err=%ld, [%ld, %ld], f_sz=%ld\n", >>>>>> + pathp, err, start, fsz, fsz); >>>>>> + >>>>>> + return err; >>>>>> + } >>>>>> + } >>>>>> + >>>>>> + heap_ftask->roffset += heap_ftask->rsize; >>>>>> + heap_ftask->rsize = 0; >>>>>> + heap_ftask->pindex = 0; >>>>>> + heap_ftask->rbatch = min_t(size_t, >>>>>> + PAGE_ALIGN(fsz) - heap_ftask->roffset, >>>>>> + heap_ftask->rbatch); >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> +bool dma_heap_wait_for_file_read(struct dma_heap_file_task >>>>>> *heap_ftask) >>>>>> +{ >>>>>> + wait_event_freezable(heap_fctl->workwq, >>>>>> + atomic_read(&heap_ftask->ref) == 0); >>>>>> + return heap_ftask->fail; >>>>>> +} >>>>>> + >>>>>> +bool dma_heap_destroy_file_read(struct dma_heap_file_task >>>>>> *heap_ftask) >>>>>> +{ >>>>>> + bool fail; >>>>>> + >>>>>> + 
dma_heap_wait_for_file_read(heap_ftask); >>>>>> + fail = heap_ftask->fail; >>>>>> + kvfree(heap_ftask->parray); >>>>>> + kfree(heap_ftask); >>>>>> + return fail; >>>>>> +} >>>>>> + >>>>>> +struct dma_heap_file_task * >>>>>> +dma_heap_declare_file_read(struct dma_heap_file *heap_file) >>>>>> +{ >>>>>> + struct dma_heap_file_task *heap_ftask = >>>>>> + kzalloc(sizeof(*heap_ftask), GFP_KERNEL); >>>>>> + if (unlikely(!heap_ftask)) >>>>>> + return NULL; >>>>>> + >>>>>> + /** >>>>>> + * Batch is the maximum size which we prepare work will meet. >>>>>> + * So, direct alloc this number's page array is OK. >>>>>> + */ >>>>>> + heap_ftask->parray = kvmalloc_array(heap_file->max_batch >> >>>>>> PAGE_SHIFT, >>>>>> + sizeof(struct page *), GFP_KERNEL); >>>>>> + if (unlikely(!heap_ftask->parray)) >>>>>> + goto put; >>>>>> + >>>>>> + heap_ftask->heap_file = heap_file; >>>>>> + heap_ftask->rbatch = heap_file->max_batch; >>>>>> + return heap_ftask; >>>>>> +put: >>>>>> + kfree(heap_ftask); >>>>>> + return NULL; >>>>>> +} >>>>>> + >>>>>> +static void __work_this_io(struct dma_heap_file_work *heap_fwork) >>>>>> +{ >>>>>> + struct dma_heap_file *heap_file = heap_fwork->heap_file; >>>>>> + struct file *file = heap_file->file; >>>>>> + ssize_t start = heap_fwork->start_size; >>>>>> + ssize_t size = heap_fwork->need_size; >>>>>> + void *buffer = heap_fwork->vaddr; >>>>>> + const struct cred *old_cred; >>>>>> + ssize_t err; >>>>>> + >>>>>> + // use real task's cred to read this file. >>>>>> + old_cred = override_creds(heap_file->cred); >>>>>> + err = kernel_read_file(file, start, &buffer, size, >>>>>> &heap_file->fsz, >>>>>> + READING_POLICY); >>>>>> + if (err < 0) { >>>>>> + pr_err("use kernel_read_file, err=%ld, [%ld, %ld], >>>>>> f_sz=%ld\n", >>>>>> + err, start, (start + size), heap_file->fsz); >>>>>> + WRITE_ONCE(*heap_fwork->failp, true); >>>>>> + } >>>>>> + // recovery to my cred. 
>>>>>> + revert_creds(old_cred); >>>>>> +} >>>>>> + >>>>>> +static int dma_heap_file_control_thread(void *data) >>>>>> +{ >>>>>> + struct dma_heap_file_control *heap_fctl = >>>>>> + (struct dma_heap_file_control *)data; >>>>>> + struct dma_heap_file_work *worker, *tmp; >>>>>> + int nr_work; >>>>>> + >>>>>> + LIST_HEAD(pages); >>>>>> + LIST_HEAD(workers); >>>>>> + >>>>>> + while (true) { >>>>>> + wait_event_freezable(heap_fctl->threadwq, >>>>>> + atomic_read(&heap_fctl->nr_work) > 0); >>>>>> +recheck: >>>>>> + spin_lock(&heap_fctl->lock); >>>>>> + list_splice_init(&heap_fctl->works, &workers); >>>>>> + spin_unlock(&heap_fctl->lock); >>>>>> + >>>>>> + if (unlikely(kthread_should_stop())) { >>>>>> + list_for_each_entry_safe(worker, tmp, &workers, list) { >>>>>> + list_del(&worker->list); >>>>>> + destroy_file_work(worker); >>>>>> + } >>>>>> + break; >>>>>> + } >>>>>> + >>>>>> + nr_work = 0; >>>>>> + list_for_each_entry_safe(worker, tmp, &workers, list) { >>>>>> + ++nr_work; >>>>>> + list_del(&worker->list); >>>>>> + __work_this_io(worker); >>>>>> + >>>>>> + destroy_file_work(worker); >>>>>> + } >>>>>> + atomic_sub(nr_work, &heap_fctl->nr_work); >>>>>> + >>>>>> + if (atomic_read(&heap_fctl->nr_work) > 0) >>>>>> + goto recheck; >>>>>> + } >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> +size_t dma_heap_file_size(struct dma_heap_file *heap_file) >>>>>> +{ >>>>>> + return heap_file->fsz; >>>>>> +} >>>>>> + >>>>>> +static int prepare_dma_heap_file(struct dma_heap_file >>>>>> *heap_file, int file_fd, >>>>>> + size_t batch) >>>>>> +{ >>>>>> + struct file *file; >>>>>> + size_t fsz; >>>>>> + int ret; >>>>>> + >>>>>> + file = fget(file_fd); >>>>>> + if (!file) >>>>>> + return -EINVAL; >>>>>> + >>>>>> + fsz = i_size_read(file_inode(file)); >>>>>> + if (fsz < batch) { >>>>>> + ret = -EINVAL; >>>>>> + goto err; >>>>>> + } >>>>>> + >>>>>> + /** >>>>>> + * Selinux block our read, but actually we are reading the >>>>>> stand-in >>>>>> + * for this file. 
>>>>>> + * So save current's cred and when going to read, override >>>>>> mine, and >>>>>> + * end of read, revert. >>>>>> + */ >>>>>> + heap_file->cred = prepare_kernel_cred(current); >>>>>> + if (unlikely(!heap_file->cred)) { >>>>>> + ret = -ENOMEM; >>>>>> + goto err; >>>>>> + } >>>>>> + >>>>>> + heap_file->file = file; >>>>>> + heap_file->max_batch = batch; >>>>>> + heap_file->fsz = fsz; >>>>>> + >>>>>> + heap_file->direct = file->f_flags & O_DIRECT; >>>>>> + >>>>>> +#define DMA_HEAP_SUGGEST_DIRECT_IO_SIZE (1UL << 30) >>>>>> + if (!heap_file->direct && fsz >= >>>>>> DMA_HEAP_SUGGEST_DIRECT_IO_SIZE) >>>>>> + pr_warn("alloc read file better to use O_DIRECT to read >>>>>> larget file\n"); >>>>>> + >>>>>> + return 0; >>>>>> + >>>>>> +err: >>>>>> + fput(file); >>>>>> + return ret; >>>>>> +} >>>>>> + >>>>>> +static void destroy_dma_heap_file(struct dma_heap_file *heap_file) >>>>>> +{ >>>>>> + fput(heap_file->file); >>>>>> + put_cred(heap_file->cred); >>>>>> +} >>>>>> + >>>>>> +static int dma_heap_buffer_alloc_read_file(struct dma_heap >>>>>> *heap, int file_fd, >>>>>> + size_t batch, unsigned int fd_flags, >>>>>> + unsigned int heap_flags) >>>>>> +{ >>>>>> + struct dma_buf *dmabuf; >>>>>> + int fd; >>>>>> + struct dma_heap_file heap_file; >>>>>> + >>>>>> + fd = prepare_dma_heap_file(&heap_file, file_fd, batch); >>>>>> + if (fd) >>>>>> + goto error_file; >>>>>> + >>>>>> + dmabuf = heap->ops->allocate_read_file(heap, &heap_file, >>>>>> fd_flags, >>>>>> + heap_flags); >>>>>> + if (IS_ERR(dmabuf)) { >>>>>> + fd = PTR_ERR(dmabuf); >>>>>> + goto error; >>>>>> + } >>>>>> + >>>>>> + fd = dma_buf_fd(dmabuf, fd_flags); >>>>>> + if (fd < 0) { >>>>>> + dma_buf_put(dmabuf); >>>>>> + /* just return, as put will call release and that will >>>>>> free */ >>>>>> + } >>>>>> + >>>>>> +error: >>>>>> + destroy_dma_heap_file(&heap_file); >>>>>> +error_file: >>>>>> + return fd; >>>>>> +} >>>>>> + >>>>>> static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t >>>>>> len, >>>>>> u32 
fd_flags, >>>>>> u64 heap_flags) >>>>>> @@ -93,6 +545,38 @@ static int dma_heap_open(struct inode *inode, >>>>>> struct file *file) >>>>>> return 0; >>>>>> } >>>>>> +static long dma_heap_ioctl_allocate_read_file(struct file >>>>>> *file, void *data) >>>>>> +{ >>>>>> + struct dma_heap_allocation_file_data *heap_allocation_file = >>>>>> data; >>>>>> + struct dma_heap *heap = file->private_data; >>>>>> + int fd; >>>>>> + >>>>>> + if (heap_allocation_file->fd || !heap_allocation_file->file_fd) >>>>>> + return -EINVAL; >>>>>> + >>>>>> + if (heap_allocation_file->fd_flags & ~DMA_HEAP_VALID_FD_FLAGS) >>>>>> + return -EINVAL; >>>>>> + >>>>>> + if (heap_allocation_file->heap_flags & >>>>>> ~DMA_HEAP_VALID_HEAP_FLAGS) >>>>>> + return -EINVAL; >>>>>> + >>>>>> + if (!heap->ops->allocate_read_file) >>>>>> + return -EINVAL; >>>>>> + >>>>>> + fd = dma_heap_buffer_alloc_read_file( >>>>>> + heap, heap_allocation_file->file_fd, >>>>>> + heap_allocation_file->batch ? >>>>>> + PAGE_ALIGN(heap_allocation_file->batch) : >>>>>> + DEFAULT_ADI_BATCH, >>>>>> + heap_allocation_file->fd_flags, >>>>>> + heap_allocation_file->heap_flags); >>>>>> + if (fd < 0) >>>>>> + return fd; >>>>>> + >>>>>> + heap_allocation_file->fd = fd; >>>>>> + return 0; >>>>>> +} >>>>>> + >>>>>> static long dma_heap_ioctl_allocate(struct file *file, void *data) >>>>>> { >>>>>> struct dma_heap_allocation_data *heap_allocation = data; >>>>>> @@ -121,6 +605,7 @@ static long dma_heap_ioctl_allocate(struct >>>>>> file *file, void *data) >>>>>> static unsigned int dma_heap_ioctl_cmds[] = { >>>>>> DMA_HEAP_IOCTL_ALLOC, >>>>>> + DMA_HEAP_IOCTL_ALLOC_AND_READ, >>>>>> }; >>>>>> static long dma_heap_ioctl(struct file *file, unsigned int ucmd, >>>>>> @@ -170,6 +655,9 @@ static long dma_heap_ioctl(struct file *file, >>>>>> unsigned int ucmd, >>>>>> case DMA_HEAP_IOCTL_ALLOC: >>>>>> ret = dma_heap_ioctl_allocate(file, kdata); >>>>>> break; >>>>>> + case DMA_HEAP_IOCTL_ALLOC_AND_READ: >>>>>> + ret = 
dma_heap_ioctl_allocate_read_file(file, kdata); >>>>>> + break; >>>>>> default: >>>>>> ret = -ENOTTY; >>>>>> goto err; >>>>>> @@ -316,11 +804,44 @@ static int dma_heap_init(void) >>>>>> dma_heap_class = class_create(DEVNAME); >>>>>> if (IS_ERR(dma_heap_class)) { >>>>>> - unregister_chrdev_region(dma_heap_devt, NUM_HEAP_MINORS); >>>>>> - return PTR_ERR(dma_heap_class); >>>>>> + ret = PTR_ERR(dma_heap_class); >>>>>> + goto fail_class; >>>>>> } >>>>>> dma_heap_class->devnode = dma_heap_devnode; >>>>>> + heap_fctl = kzalloc(sizeof(*heap_fctl), GFP_KERNEL); >>>>>> + if (unlikely(!heap_fctl)) { >>>>>> + ret = -ENOMEM; >>>>>> + goto fail_alloc; >>>>>> + } >>>>>> + >>>>>> + INIT_LIST_HEAD(&heap_fctl->works); >>>>>> + init_waitqueue_head(&heap_fctl->threadwq); >>>>>> + init_waitqueue_head(&heap_fctl->workwq); >>>>>> + >>>>>> + heap_fctl->work_thread = >>>>>> kthread_run(dma_heap_file_control_thread, >>>>>> + heap_fctl, "heap_fwork_t"); >>>>>> + if (IS_ERR(heap_fctl->work_thread)) { >>>>>> + ret = -ENOMEM; >>>>>> + goto fail_thread; >>>>>> + } >>>>>> + >>>>>> + heap_fctl->heap_fwork_cachep = >>>>>> KMEM_CACHE(dma_heap_file_work, 0); >>>>>> + if (unlikely(!heap_fctl->heap_fwork_cachep)) { >>>>>> + ret = -ENOMEM; >>>>>> + goto fail_cache; >>>>>> + } >>>>>> + >>>>>> return 0; >>>>>> + >>>>>> +fail_cache: >>>>>> + kthread_stop(heap_fctl->work_thread); >>>>>> +fail_thread: >>>>>> + kfree(heap_fctl); >>>>>> +fail_alloc: >>>>>> + class_destroy(dma_heap_class); >>>>>> +fail_class: >>>>>> + unregister_chrdev_region(dma_heap_devt, NUM_HEAP_MINORS); >>>>>> + return ret; >>>>>> } >>>>>> subsys_initcall(dma_heap_init); >>>>>> diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h >>>>>> index 064bad725061..9c25383f816c 100644 >>>>>> --- a/include/linux/dma-heap.h >>>>>> +++ b/include/linux/dma-heap.h >>>>>> @@ -12,12 +12,17 @@ >>>>>> #include <linux/cdev.h> >>>>>> #include <linux/types.h> >>>>>> +#define DEFAULT_ADI_BATCH (128 << 20) >>>>>> + >>>>>> struct dma_heap; >>>>>> 
+struct dma_heap_file_task; >>>>>> +struct dma_heap_file; >>>>>> /** >>>>>> * struct dma_heap_ops - ops to operate on a given heap >>>>>> * @allocate: allocate dmabuf and return struct dma_buf ptr >>>>>> - * >>>>>> + * @allocate_read_file: allocate dmabuf and read file, then >>>>>> return struct >>>>>> + * dma_buf ptr. >>>>>> * allocate returns dmabuf on success, ERR_PTR(-errno) on error. >>>>>> */ >>>>>> struct dma_heap_ops { >>>>>> @@ -25,6 +30,11 @@ struct dma_heap_ops { >>>>>> unsigned long len, >>>>>> u32 fd_flags, >>>>>> u64 heap_flags); >>>>>> + >>>>>> + struct dma_buf *(*allocate_read_file)(struct dma_heap *heap, >>>>>> + struct dma_heap_file *heap_file, >>>>>> + u32 fd_flags, >>>>>> + u64 heap_flags); >>>>>> }; >>>>>> /** >>>>>> @@ -65,4 +75,49 @@ const char *dma_heap_get_name(struct dma_heap >>>>>> *heap); >>>>>> */ >>>>>> struct dma_heap *dma_heap_add(const struct dma_heap_export_info >>>>>> *exp_info); >>>>>> +/** >>>>>> + * dma_heap_destroy_file_read - waits for a file read to >>>>>> complete then destroy it >>>>>> + * Returns: true if the file read failed, false otherwise >>>>>> + */ >>>>>> +bool dma_heap_destroy_file_read(struct dma_heap_file_task >>>>>> *heap_ftask); >>>>>> + >>>>>> +/** >>>>>> + * dma_heap_wait_for_file_read - waits for a file read to complete >>>>>> + * Returns: true if the file read failed, false otherwise >>>>>> + */ >>>>>> +bool dma_heap_wait_for_file_read(struct dma_heap_file_task >>>>>> *heap_ftask); >>>>>> + >>>>>> +/** >>>>>> + * dma_heap_alloc_file_read - Declare a task to read file when >>>>>> allocate pages. >>>>>> + * @heap_file: target file to read >>>>>> + * >>>>>> + * Return NULL if failed, otherwise return a struct pointer. >>>>>> + */ >>>>>> +struct dma_heap_file_task * >>>>>> +dma_heap_declare_file_read(struct dma_heap_file *heap_file); >>>>>> + >>>>>> +/** >>>>>> + * dma_heap_prepare_file_read - cache each allocated page until >>>>>> we meet this batch. >>>>>> + * @heap_ftask: prepared and need to commit's work. 
>>>>>> + * @page: current allocated page. don't care which order. >>>>>> + * >>>>>> + * Returns true if reach to batch, false so go on prepare. >>>>>> + */ >>>>>> +bool dma_heap_prepare_file_read(struct dma_heap_file_task >>>>>> *heap_ftask, >>>>>> + struct page *page); >>>>>> + >>>>>> +/** >>>>>> + * dma_heap_commit_file_read - prepare collect enough memory, >>>>>> going to trigger IO >>>>>> + * @heap_ftask: info that current IO needs >>>>>> + * >>>>>> + * This commit will also check if reach to tail read. >>>>>> + * For direct I/O submissions, it is necessary to pay attention >>>>>> to file reads >>>>>> + * that are not page-aligned. For the unaligned portion of the >>>>>> read, buffer IO >>>>>> + * needs to be triggered. >>>>>> + * Returns: >>>>>> + * 0 if all right, -errno if something wrong >>>>>> + */ >>>>>> +int dma_heap_submit_file_read(struct dma_heap_file_task >>>>>> *heap_ftask); >>>>>> +size_t dma_heap_file_size(struct dma_heap_file *heap_file); >>>>>> + >>>>>> #endif /* _DMA_HEAPS_H */ >>>>>> diff --git a/include/uapi/linux/dma-heap.h >>>>>> b/include/uapi/linux/dma-heap.h >>>>>> index a4cf716a49fa..8c20e8b74eed 100644 >>>>>> --- a/include/uapi/linux/dma-heap.h >>>>>> +++ b/include/uapi/linux/dma-heap.h >>>>>> @@ -39,6 +39,27 @@ struct dma_heap_allocation_data { >>>>>> __u64 heap_flags; >>>>>> }; >>>>>> +/** >>>>>> + * struct dma_heap_allocation_file_data - metadata passed from >>>>>> userspace for >>>>>> + * allocations and read file >>>>>> + * @fd: will be populated with a fd which provides the >>>>>> + * handle to the allocated dma-buf >>>>>> + * @file_fd: file descriptor to read from(suggested to >>>>>> use O_DIRECT open file) >>>>>> + * @batch: how many memory alloced then file read(bytes), >>>>>> default 128MB >>>>>> + * will auto aligned to PAGE_SIZE >>>>>> + * @fd_flags: file descriptor flags used when allocating >>>>>> + * @heap_flags: flags passed to heap >>>>>> + * >>>>>> + * Provided by userspace as an argument to the ioctl >>>>>> + */ 
>>>>>> +struct dma_heap_allocation_file_data { >>>>>> + __u32 fd; >>>>>> + __u32 file_fd; >>>>>> + __u32 batch; >>>>>> + __u32 fd_flags; >>>>>> + __u64 heap_flags; >>>>>> +}; >>>>>> + >>>>>> #define DMA_HEAP_IOC_MAGIC 'H' >>>>>> /** >>>>>> @@ -50,4 +71,15 @@ struct dma_heap_allocation_data { >>>>>> #define DMA_HEAP_IOCTL_ALLOC _IOWR(DMA_HEAP_IOC_MAGIC, 0x0,\ >>>>>> struct dma_heap_allocation_data) >>>>>> +/** >>>>>> + * DOC: DMA_HEAP_IOCTL_ALLOC_AND_READ - allocate memory from >>>>>> pool and both >>>>>> + * read file when allocate memory. >>>>>> + * >>>>>> + * Takes a dma_heap_allocation_file_data struct and returns it >>>>>> with the fd field >>>>>> + * populated with the dmabuf handle of the allocation. When >>>>>> return, the dma-buf >>>>>> + * content is read from file. >>>>>> + */ >>>>>> +#define DMA_HEAP_IOCTL_ALLOC_AND_READ \ >>>>>> + _IOWR(DMA_HEAP_IOC_MAGIC, 0x1, struct >>>>>> dma_heap_allocation_file_data) >>>>>> + >>>>>> #endif /* _UAPI_LINUX_DMABUF_POOL_H */ >>>>> >>>

2 years

Re: [PATCH 0/2] Support direct I/O read and write for memory allocated by dmabuf

by T.J. Mercier

On Wed, Jul 10, 2024 at 8:08 AM Lei Liu <liulei.rjpt(a)vivo.com> wrote:
>
>
> on 2024/7/10 22:48, Christian König wrote:
> > On 10.07.24 at 16:35, Lei Liu wrote:
> >>
> >> on 2024/7/10 22:14, Christian König wrote:
> >>> On 10.07.24 at 15:57, Lei Liu wrote:
> >>>> Use vm_insert_page to establish a mapping for the memory allocated
> >>>> by dmabuf, thus supporting direct I/O read and write; and fix the
> >>>> issue of incorrect memory statistics after mapping dmabuf memory.
> >>>
> >>> Well big NAK to that! Direct I/O is intentionally disabled on DMA-bufs.
> >>
> >> Hello! Could you explain why direct_io is disabled on DMABUF? Is
> >> there any historical reason for this?
> >
> > It's basically one of the most fundamental design decision of DMA-Buf.
> > The attachment/map/fence model DMA-buf uses is not really compatible
> > with direct I/O on the underlying pages.
>
> Thank you! Is there any related documentation on this? I would like to
> understand and learn more about the fundamental reasons for the lack of
> support.

Hi Lei and Christian,

This is now the third request I've seen from three different companies
who are interested in this, but the others are not for reasons of read
performance that you mention in the commit message on your first patch.
Someone else at Google ran a comparison between a normal read() and a
direct I/O read() into a preallocated user buffer and found that with
large readahead (16 MB) the throughput can actually be slightly higher
than direct I/O. If you have concerns about read performance, have you
tried increasing the readahead size?

The other motivation is to load a gajillion byte file from disk into a
dmabuf without evicting the entire contents of pagecache while doing so.
Something like this (which does not currently work because read() tries
to GUP on the dmabuf memory as you mention):

static int dmabuf_heap_alloc(int heap_fd, size_t len)
{
    struct dma_heap_allocation_data data = {
        .len = len,
        .fd = 0,
        .fd_flags = O_RDWR | O_CLOEXEC,
        .heap_flags = 0,
    };
    int ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data);
    if (ret < 0)
        return ret;
    return data.fd;
}

int main(int, char **argv)
{
    const char *file_path = argv[1];
    printf("File: %s\n", file_path);
    int file_fd = open(file_path, O_RDONLY | O_DIRECT);

    struct stat st;
    stat(file_path, &st);
    ssize_t file_size = st.st_size;
    ssize_t aligned_size = (file_size + 4095) & ~4095;

    printf("File size: %zd Aligned size: %zd\n", file_size, aligned_size);
    int heap_fd = open("/dev/dma_heap/system", O_RDONLY);
    int dmabuf_fd = dmabuf_heap_alloc(heap_fd, aligned_size);

    void *vm = mmap(nullptr, aligned_size, PROT_READ | PROT_WRITE,
                    MAP_SHARED, dmabuf_fd, 0);
    printf("VM at 0x%lx\n", (unsigned long)vm);

    dma_buf_sync sync_flags { DMA_BUF_SYNC_START |
                              DMA_BUF_SYNC_READ |
                              DMA_BUF_SYNC_WRITE };
    ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync_flags);

    ssize_t rc = read(file_fd, vm, file_size);
    printf("Read: %zd %s\n", rc, rc < 0 ? strerror(errno) : "");

    sync_flags.flags = DMA_BUF_SYNC_END |
                       DMA_BUF_SYNC_READ |
                       DMA_BUF_SYNC_WRITE;
    ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync_flags);
}

Or replace the mmap() + read() with sendfile().

So I would also like to see the above code (or something else similar)
be able to work and I understand some of the reasons why it currently
does not, but I don't understand why we should actively prevent this
type of behavior entirely.

Best,
T.J.

> >
> >>>
> >>> We already discussed enforcing that in the DMA-buf framework and
> >>> this patch probably means that we should really do that.
> >>>
> >>> Regards,
> >>> Christian.
> >>
> >> Thank you for your response. With the application of AI large model
> >> edgeification, we urgently need support for direct_io on DMABUF to
> >> read some very large files. Do you have any new solutions or plans
> >> for this?
> >
> > We have seen similar projects over the years and all of those turned
> > out to be complete shipwrecks.
> >
> > There is currently a patch set under discussion to give the network
> > subsystem DMA-buf support. If you are interest in network direct I/O
> > that could help.
>
> Is there a related introduction link for this patch?
>
> >
> > Additional to that a lot of GPU drivers support userptr usages, e.g.
> > to import malloced memory into the GPU driver. You can then also do
> > direct I/O on that malloced memory and the kernel will enforce correct
> > handling with the GPU driver through MMU notifiers.
> >
> > But as far as I know a general DMA-buf based solution isn't possible.
>
> 1. The reason we need to use DMABUF memory here is that we need to share
> memory between the CPU and APU. Currently, only DMABUF memory is
> suitable for this purpose. Additionally, we need to read very large files.
>
> 2. Are there any other solutions for this? Also, do you have any plans
> to support direct_io for DMABUF memory in the future?
>
> >
> > Regards,
> > Christian.
> >
> >>
> >> Regards,
> >> Lei Liu.
> >>
> >>>
> >>>>
> >>>> Lei Liu (2):
> >>>>   mm: dmabuf_direct_io: Support direct_io for memory allocated by
> >>>>     dmabuf
> >>>>   mm: dmabuf_direct_io: Fix memory statistics error for dmabuf
> >>>>     allocated
> >>>>     memory with direct_io support
> >>>>
> >>>>  drivers/dma-buf/heaps/system_heap.c |  5 +++--
> >>>>  fs/proc/task_mmu.c                  |  8 +++++++-
> >>>>  include/linux/mm.h                  |  1 +
> >>>>  mm/memory.c                         | 15 ++++++++++-----
> >>>>  mm/rmap.c                           |  9 +++++----
> >>>>  5 files changed, 26 insertions(+), 12 deletions(-)
> >>>>
> >>>
> >
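
As an illustration of the sendfile() variant T.J. mentions, a minimal sketch of a copy loop follows. The helper name sendfile_all is hypothetical, and the assumption that out_fd could be a dma-buf fd is exactly the part the thread says does not work today; with two regular files the loop behaves as an ordinary in-kernel copy.

```c
#include <errno.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy len bytes from in_fd (starting at its
 * current file offset) to out_fd without bouncing through a userspace
 * buffer. Returns 0 on success, -1 with errno set on error or if
 * in_fd reaches EOF before len bytes were copied.
 */
static int sendfile_all(int out_fd, int in_fd, size_t len)
{
	while (len > 0) {
		/* NULL offset: read from, and advance, in_fd's own offset */
		ssize_t n = sendfile(out_fd, in_fd, NULL, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0) {
			/* short source file */
			errno = EIO;
			return -1;
		}
		len -= (size_t)n;
	}
	return 0;
}
```

sendfile() may transfer fewer bytes than requested, which is why the call sits in a loop rather than being issued once.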

2 years

Re: [PATCH 1/2] dma-buf: heaps: DMA_HEAP_IOCTL_ALLOC_READ_FILE framework

by Christian König

On 11.07.24 at 11:18, Huan Yang wrote:
> Hi Christian,
>
> Thanks for your reply.
>
> On 2024/7/11 17:00, Christian König wrote:
>> On 11.07.24 at 09:42, Huan Yang wrote:
>>> Some user may need load file into dma-buf, current
>>> way is:
>>>   1. allocate a dma-buf, get dma-buf fd
>>>   2. mmap dma-buf fd into vaddr
>>>   3. read(file_fd, vaddr, fsz)
>>> This is too heavy if fsz reached to GB.
>>
>> You need to describe a bit more why that is to heavy. I can only
>> assume you need to save memory bandwidth and avoid the extra copy
>> with the CPU.
>
> Sorry for the oversimplified explanation. But, yes, you're right, we
> want to avoid this.
>
> As we are dealing with embedded devices, the available memory and
> computing power for users are usually limited. (The maximum available
> memory is currently 24GB, typically ranging from 8-12GB.)
>
> Also, the CPU computing power is also usually in short supply, due to
> limited battery capacity and limited heat dissipation capabilities.
>
> So, we hope to avoid ineffective paths as much as possible.
>
>>
>>> This patch implement a feature called DMA_HEAP_IOCTL_ALLOC_READ_FILE.
>>> User need to offer a file_fd which you want to load into dma-buf, then,
>>> it promise if you got a dma-buf fd, it will contains the file content.
>>
>> Interesting idea, that has at least more potential than trying to
>> enable direct I/O on mmap()ed DMA-bufs.
>>
>> The approach with the new IOCTL might not work because it is a very
>> specialized use case.
>
> Thank you for your advice. maybe the "read file" behavior can be
> attached to an existing allocation?

The point is there are already system calls to do something like that.

See copy_file_range()
(https://man7.org/linux/man-pages/man2/copy_file_range.2.html) and
sendfile() (https://man7.org/linux/man-pages/man2/sendfile.2.html).

What we probably could do is to internally optimize those.

> I am currently creating a new ioctl to remind the user that memory is
> being allocated and read, and I am also unsure whether it is
> appropriate to add additional parameters to the existing allocate
> behavior.
>
> Please, give me more suggestion. Thanks.
>
>>
>> But IIRC there was a copy_file_range callback in the file_operations
>> structure you could use for that. I'm just not sure when and how
>> that's used with the copy_file_range() system call.
>
> Sorry, I'm not familiar with this, but I will look into it. However,
> this type of callback function is not currently implemented when
> exporting the dma_buf file, which means that I need to implement the
> callback for it?

If I'm not completely mistaken, the copy_file_range, splice_read and
splice_write callbacks on struct file_operations
(https://elixir.bootlin.com/linux/v6.10-rc7/source/include/linux/fs.h#L1999)
can be used to implement what you want to do.

Regards,
Christian.

>
>>
>> Regards,
>> Christian.
>>
>>>
>>> Notice, file_fd depends on user how to open this file. So, both buffer
>>> I/O and Direct I/O is supported.
>>> >>> Signed-off-by: Huan Yang <link(a)vivo.com> >>> --- >>> drivers/dma-buf/dma-heap.c | 525 >>> +++++++++++++++++++++++++++++++++- >>> include/linux/dma-heap.h | 57 +++- >>> include/uapi/linux/dma-heap.h | 32 +++ >>> 3 files changed, 611 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c >>> index 2298ca5e112e..abe17281adb8 100644 >>> --- a/drivers/dma-buf/dma-heap.c >>> +++ b/drivers/dma-buf/dma-heap.c >>> @@ -15,9 +15,11 @@ >>> #include <linux/list.h> >>> #include <linux/slab.h> >>> #include <linux/nospec.h> >>> +#include <linux/highmem.h> >>> #include <linux/uaccess.h> >>> #include <linux/syscalls.h> >>> #include <linux/dma-heap.h> >>> +#include <linux/vmalloc.h> >>> #include <uapi/linux/dma-heap.h> >>> #define DEVNAME "dma_heap" >>> @@ -43,12 +45,462 @@ struct dma_heap { >>> struct cdev heap_cdev; >>> }; >>> +/** >>> + * struct dma_heap_file - wrap the file, read task for dma_heap >>> allocate use. >>> + * @file: file to read from. >>> + * >>> + * @cred: kthread use, user cred copy to use for the read. >>> + * >>> + * @max_batch: maximum batch size to read, if collect match >>> batch, >>> + * trigger read, default 128MB, must below file size. >>> + * >>> + * @fsz: file size. >>> + * >>> + * @direct: use direct IO? >>> + */ >>> +struct dma_heap_file { >>> + struct file *file; >>> + struct cred *cred; >>> + size_t max_batch; >>> + size_t fsz; >>> + bool direct; >>> +}; >>> + >>> +/** >>> + * struct dma_heap_file_work - represents a dma_heap file read real >>> work. >>> + * @vaddr: contigous virtual address alloc by vmap, file >>> read need. >>> + * >>> + * @start_size: file read start offset, same to >>> @dma_heap_file_task->roffset. >>> + * >>> + * @need_size: file read need size, same to >>> @dma_heap_file_task->rsize. >>> + * >>> + * @heap_file: file wrapper. >>> + * >>> + * @list: child node of @dma_heap_file_control->works. 
>>> + * >>> + * @refp: same @dma_heap_file_task->ref, if end of read, put >>> ref. >>> + * >>> + * @failp: if any work io failed, set it true, pointp >>> @dma_heap_file_task->fail. >>> + */ >>> +struct dma_heap_file_work { >>> + void *vaddr; >>> + ssize_t start_size; >>> + ssize_t need_size; >>> + struct dma_heap_file *heap_file; >>> + struct list_head list; >>> + atomic_t *refp; >>> + bool *failp; >>> +}; >>> + >>> +/** >>> + * struct dma_heap_file_task - represents a dma_heap file read process >>> + * @ref: current file work counter, if zero, allocate and read >>> + * done. >>> + * >>> + * @roffset: last read offset, current prepared work' begin >>> file >>> + * start offset. >>> + * >>> + * @rsize: current allocated page size use to read, if reach >>> rbatch, >>> + * trigger commit. >>> + * >>> + * @rbatch: current prepared work's batch, below >>> @dma_heap_file's >>> + * batch. >>> + * >>> + * @heap_file: current dma_heap_file >>> + * >>> + * @parray: used for vmap, size is @dma_heap_file's batch's >>> number >>> + * pages.(this is maximum). Due to single thread file read, >>> + * one page array reuse each work prepare is OK. >>> + * Each index in parray is PAGE_SIZE.(vmap need) >>> + * >>> + * @pindex: current allocated page filled in @parray's index. >>> + * >>> + * @fail: any work failed when file read? >>> + * >>> + * dma_heap_file_task is the production of file read, will prepare >>> each work >>> + * during allocate dma_buf pages, if match current batch, then >>> trigger commit >>> + * and prepare next work. After all batch queued, user going on >>> prepare dma_buf >>> + * and so on, but before return dma_buf fd, need to wait file read >>> end and >>> + * check read result. 
>>> + */ >>> +struct dma_heap_file_task { >>> + atomic_t ref; >>> + size_t roffset; >>> + size_t rsize; >>> + size_t rbatch; >>> + struct dma_heap_file *heap_file; >>> + struct page **parray; >>> + unsigned int pindex; >>> + bool fail; >>> +}; >>> + >>> +/** >>> + * struct dma_heap_file_control - global control of dma_heap file >>> read. >>> + * @works: @dma_heap_file_work's list head. >>> + * >>> + * @lock: only lock for @works. >>> + * >>> + * @threadwq: wait queue for @work_thread, if commit work, >>> @work_thread >>> + * wakeup and read this work's file contains. >>> + * >>> + * @workwq: used for main thread wait for file read end, if >>> allocation >>> + * end before file read. @dma_heap_file_task ref effect >>> this. >>> + * >>> + * @work_thread: file read kthread. the dma_heap_file_task >>> work's consumer. >>> + * >>> + * @heap_fwork_cachep: @dma_heap_file_work's cachep, it's >>> alloc/free frequently. >>> + * >>> + * @nr_work: global number of how many work committed. >>> + */ >>> +struct dma_heap_file_control { >>> + struct list_head works; >>> + spinlock_t lock; >>> + wait_queue_head_t threadwq; >>> + wait_queue_head_t workwq; >>> + struct task_struct *work_thread; >>> + struct kmem_cache *heap_fwork_cachep; >>> + atomic_t nr_work; >>> +}; >>> + >>> +static struct dma_heap_file_control *heap_fctl; >>> static LIST_HEAD(heap_list); >>> static DEFINE_MUTEX(heap_list_lock); >>> static dev_t dma_heap_devt; >>> static struct class *dma_heap_class; >>> static DEFINE_XARRAY_ALLOC(dma_heap_minors); >>> +/** >>> + * map_pages_to_vaddr - map each scatter page into contiguous >>> virtual address. >>> + * @heap_ftask: prepared and need to commit's work. >>> + * >>> + * Cached pages need to trigger file read, this function map each >>> scatter page >>> + * into contiguous virtual address, so that file read can easy use. 
>>> + * Now that we get vaddr page, cached pages can return to original >>> user, so we >>> + * will not effect dma-buf export even if file read not end. >>> + */ >>> +static void *map_pages_to_vaddr(struct dma_heap_file_task *heap_ftask) >>> +{ >>> + return vmap(heap_ftask->parray, heap_ftask->pindex, VM_MAP, >>> + PAGE_KERNEL); >>> +} >>> + >>> +bool dma_heap_prepare_file_read(struct dma_heap_file_task *heap_ftask, >>> + struct page *page) >>> +{ >>> + struct page **array = heap_ftask->parray; >>> + int index = heap_ftask->pindex; >>> + int num = compound_nr(page), i; >>> + unsigned long sz = page_size(page); >>> + >>> + heap_ftask->rsize += sz; >>> + for (i = 0; i < num; ++i) >>> + array[index++] = &page[i]; >>> + heap_ftask->pindex = index; >>> + >>> + return heap_ftask->rsize >= heap_ftask->rbatch; >>> +} >>> + >>> +static struct dma_heap_file_work * >>> +init_file_work(struct dma_heap_file_task *heap_ftask) >>> +{ >>> + struct dma_heap_file_work *heap_fwork; >>> + struct dma_heap_file *heap_file = heap_ftask->heap_file; >>> + >>> + if (READ_ONCE(heap_ftask->fail)) >>> + return NULL; >>> + >>> + heap_fwork = kmem_cache_alloc(heap_fctl->heap_fwork_cachep, >>> GFP_KERNEL); >>> + if (unlikely(!heap_fwork)) >>> + return NULL; >>> + >>> + heap_fwork->vaddr = map_pages_to_vaddr(heap_ftask); >>> + if (unlikely(!heap_fwork->vaddr)) { >>> + kmem_cache_free(heap_fctl->heap_fwork_cachep, heap_fwork); >>> + return NULL; >>> + } >>> + >>> + heap_fwork->heap_file = heap_file; >>> + heap_fwork->start_size = heap_ftask->roffset; >>> + heap_fwork->need_size = heap_ftask->rsize; >>> + heap_fwork->refp = &heap_ftask->ref; >>> + heap_fwork->failp = &heap_ftask->fail; >>> + atomic_inc(&heap_ftask->ref); >>> + return heap_fwork; >>> +} >>> + >>> +static void destroy_file_work(struct dma_heap_file_work *heap_fwork) >>> +{ >>> + vunmap(heap_fwork->vaddr); >>> + atomic_dec(heap_fwork->refp); >>> + wake_up(&heap_fctl->workwq); >>> + >>> + kmem_cache_free(heap_fctl->heap_fwork_cachep, 
heap_fwork); >>> +} >>> + >>> +int dma_heap_submit_file_read(struct dma_heap_file_task *heap_ftask) >>> +{ >>> + struct dma_heap_file_work *heap_fwork = >>> init_file_work(heap_ftask); >>> + struct page *last = NULL; >>> + struct dma_heap_file *heap_file = heap_ftask->heap_file; >>> + size_t start = heap_ftask->roffset; >>> + struct file *file = heap_file->file; >>> + size_t fsz = heap_file->fsz; >>> + >>> + if (unlikely(!heap_fwork)) >>> + return -ENOMEM; >>> + >>> + /** >>> + * If file size is not page aligned, direct io can't process >>> the tail. >>> + * So, if reach to tail, remain the last page use buffer read. >>> + */ >>> + if (heap_file->direct && start + heap_ftask->rsize > fsz) { >>> + heap_fwork->need_size -= PAGE_SIZE; >>> + last = heap_ftask->parray[heap_ftask->pindex - 1]; >>> + } >>> + >>> + spin_lock(&heap_fctl->lock); >>> + list_add_tail(&heap_fwork->list, &heap_fctl->works); >>> + spin_unlock(&heap_fctl->lock); >>> + atomic_inc(&heap_fctl->nr_work); >>> + >>> + wake_up(&heap_fctl->threadwq); >>> + >>> + if (last) { >>> + char *buf, *pathp; >>> + ssize_t err; >>> + void *buffer; >>> + >>> + buf = kmalloc(PATH_MAX, GFP_KERNEL); >>> + if (unlikely(!buf)) >>> + return -ENOMEM; >>> + >>> + start = PAGE_ALIGN_DOWN(fsz); >>> + >>> + pathp = file_path(file, buf, PATH_MAX); >>> + if (IS_ERR(pathp)) { >>> + kfree(buf); >>> + return PTR_ERR(pathp); >>> + } >>> + >>> + buffer = kmap_local_page(last); // use page's kaddr. 
>>> + err = kernel_read_file_from_path(pathp, start, &buffer, >>> + fsz - start, &fsz, >>> + READING_POLICY); >>> + kunmap_local(buffer); >>> + kfree(buf); >>> + if (err < 0) { >>> + pr_err("failed to use buffer kernel_read_file %s, >>> err=%ld, [%ld, %ld], f_sz=%ld\n", >>> + pathp, err, start, fsz, fsz); >>> + >>> + return err; >>> + } >>> + } >>> + >>> + heap_ftask->roffset += heap_ftask->rsize; >>> + heap_ftask->rsize = 0; >>> + heap_ftask->pindex = 0; >>> + heap_ftask->rbatch = min_t(size_t, >>> + PAGE_ALIGN(fsz) - heap_ftask->roffset, >>> + heap_ftask->rbatch); >>> + return 0; >>> +} >>> + >>> +bool dma_heap_wait_for_file_read(struct dma_heap_file_task >>> *heap_ftask) >>> +{ >>> + wait_event_freezable(heap_fctl->workwq, >>> + atomic_read(&heap_ftask->ref) == 0); >>> + return heap_ftask->fail; >>> +} >>> + >>> +bool dma_heap_destroy_file_read(struct dma_heap_file_task *heap_ftask) >>> +{ >>> + bool fail; >>> + >>> + dma_heap_wait_for_file_read(heap_ftask); >>> + fail = heap_ftask->fail; >>> + kvfree(heap_ftask->parray); >>> + kfree(heap_ftask); >>> + return fail; >>> +} >>> + >>> +struct dma_heap_file_task * >>> +dma_heap_declare_file_read(struct dma_heap_file *heap_file) >>> +{ >>> + struct dma_heap_file_task *heap_ftask = >>> + kzalloc(sizeof(*heap_ftask), GFP_KERNEL); >>> + if (unlikely(!heap_ftask)) >>> + return NULL; >>> + >>> + /** >>> + * Batch is the maximum size which we prepare work will meet. >>> + * So, direct alloc this number's page array is OK. 
>>> + */ >>> + heap_ftask->parray = kvmalloc_array(heap_file->max_batch >> >>> PAGE_SHIFT, >>> + sizeof(struct page *), GFP_KERNEL); >>> + if (unlikely(!heap_ftask->parray)) >>> + goto put; >>> + >>> + heap_ftask->heap_file = heap_file; >>> + heap_ftask->rbatch = heap_file->max_batch; >>> + return heap_ftask; >>> +put: >>> + kfree(heap_ftask); >>> + return NULL; >>> +} >>> + >>> +static void __work_this_io(struct dma_heap_file_work *heap_fwork) >>> +{ >>> + struct dma_heap_file *heap_file = heap_fwork->heap_file; >>> + struct file *file = heap_file->file; >>> + ssize_t start = heap_fwork->start_size; >>> + ssize_t size = heap_fwork->need_size; >>> + void *buffer = heap_fwork->vaddr; >>> + const struct cred *old_cred; >>> + ssize_t err; >>> + >>> + // use real task's cred to read this file. >>> + old_cred = override_creds(heap_file->cred); >>> + err = kernel_read_file(file, start, &buffer, size, >>> &heap_file->fsz, >>> + READING_POLICY); >>> + if (err < 0) { >>> + pr_err("use kernel_read_file, err=%ld, [%ld, %ld], >>> f_sz=%ld\n", >>> + err, start, (start + size), heap_file->fsz); >>> + WRITE_ONCE(*heap_fwork->failp, true); >>> + } >>> + // recovery to my cred. 
>>> + revert_creds(old_cred); >>> +} >>> + >>> +static int dma_heap_file_control_thread(void *data) >>> +{ >>> + struct dma_heap_file_control *heap_fctl = >>> + (struct dma_heap_file_control *)data; >>> + struct dma_heap_file_work *worker, *tmp; >>> + int nr_work; >>> + >>> + LIST_HEAD(pages); >>> + LIST_HEAD(workers); >>> + >>> + while (true) { >>> + wait_event_freezable(heap_fctl->threadwq, >>> + atomic_read(&heap_fctl->nr_work) > 0); >>> +recheck: >>> + spin_lock(&heap_fctl->lock); >>> + list_splice_init(&heap_fctl->works, &workers); >>> + spin_unlock(&heap_fctl->lock); >>> + >>> + if (unlikely(kthread_should_stop())) { >>> + list_for_each_entry_safe(worker, tmp, &workers, list) { >>> + list_del(&worker->list); >>> + destroy_file_work(worker); >>> + } >>> + break; >>> + } >>> + >>> + nr_work = 0; >>> + list_for_each_entry_safe(worker, tmp, &workers, list) { >>> + ++nr_work; >>> + list_del(&worker->list); >>> + __work_this_io(worker); >>> + >>> + destroy_file_work(worker); >>> + } >>> + atomic_sub(nr_work, &heap_fctl->nr_work); >>> + >>> + if (atomic_read(&heap_fctl->nr_work) > 0) >>> + goto recheck; >>> + } >>> + return 0; >>> +} >>> + >>> +size_t dma_heap_file_size(struct dma_heap_file *heap_file) >>> +{ >>> + return heap_file->fsz; >>> +} >>> + >>> +static int prepare_dma_heap_file(struct dma_heap_file *heap_file, >>> int file_fd, >>> + size_t batch) >>> +{ >>> + struct file *file; >>> + size_t fsz; >>> + int ret; >>> + >>> + file = fget(file_fd); >>> + if (!file) >>> + return -EINVAL; >>> + >>> + fsz = i_size_read(file_inode(file)); >>> + if (fsz < batch) { >>> + ret = -EINVAL; >>> + goto err; >>> + } >>> + >>> + /** >>> + * Selinux block our read, but actually we are reading the >>> stand-in >>> + * for this file. >>> + * So save current's cred and when going to read, override >>> mine, and >>> + * end of read, revert. 
>>> + */ >>> + heap_file->cred = prepare_kernel_cred(current); >>> + if (unlikely(!heap_file->cred)) { >>> + ret = -ENOMEM; >>> + goto err; >>> + } >>> + >>> + heap_file->file = file; >>> + heap_file->max_batch = batch; >>> + heap_file->fsz = fsz; >>> + >>> + heap_file->direct = file->f_flags & O_DIRECT; >>> + >>> +#define DMA_HEAP_SUGGEST_DIRECT_IO_SIZE (1UL << 30) >>> + if (!heap_file->direct && fsz >= DMA_HEAP_SUGGEST_DIRECT_IO_SIZE) >>> + pr_warn("alloc read file better to use O_DIRECT to read >>> larget file\n"); >>> + >>> + return 0; >>> + >>> +err: >>> + fput(file); >>> + return ret; >>> +} >>> + >>> +static void destroy_dma_heap_file(struct dma_heap_file *heap_file) >>> +{ >>> + fput(heap_file->file); >>> + put_cred(heap_file->cred); >>> +} >>> + >>> +static int dma_heap_buffer_alloc_read_file(struct dma_heap *heap, >>> int file_fd, >>> + size_t batch, unsigned int fd_flags, >>> + unsigned int heap_flags) >>> +{ >>> + struct dma_buf *dmabuf; >>> + int fd; >>> + struct dma_heap_file heap_file; >>> + >>> + fd = prepare_dma_heap_file(&heap_file, file_fd, batch); >>> + if (fd) >>> + goto error_file; >>> + >>> + dmabuf = heap->ops->allocate_read_file(heap, &heap_file, fd_flags, >>> + heap_flags); >>> + if (IS_ERR(dmabuf)) { >>> + fd = PTR_ERR(dmabuf); >>> + goto error; >>> + } >>> + >>> + fd = dma_buf_fd(dmabuf, fd_flags); >>> + if (fd < 0) { >>> + dma_buf_put(dmabuf); >>> + /* just return, as put will call release and that will free */ >>> + } >>> + >>> +error: >>> + destroy_dma_heap_file(&heap_file); >>> +error_file: >>> + return fd; >>> +} >>> + >>> static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len, >>> u32 fd_flags, >>> u64 heap_flags) >>> @@ -93,6 +545,38 @@ static int dma_heap_open(struct inode *inode, >>> struct file *file) >>> return 0; >>> } >>> +static long dma_heap_ioctl_allocate_read_file(struct file *file, >>> void *data) >>> +{ >>> + struct dma_heap_allocation_file_data *heap_allocation_file = data; >>> + struct dma_heap *heap = 
file->private_data; >>> + int fd; >>> + >>> + if (heap_allocation_file->fd || !heap_allocation_file->file_fd) >>> + return -EINVAL; >>> + >>> + if (heap_allocation_file->fd_flags & ~DMA_HEAP_VALID_FD_FLAGS) >>> + return -EINVAL; >>> + >>> + if (heap_allocation_file->heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS) >>> + return -EINVAL; >>> + >>> + if (!heap->ops->allocate_read_file) >>> + return -EINVAL; >>> + >>> + fd = dma_heap_buffer_alloc_read_file( >>> + heap, heap_allocation_file->file_fd, >>> + heap_allocation_file->batch ? >>> + PAGE_ALIGN(heap_allocation_file->batch) : >>> + DEFAULT_ADI_BATCH, >>> + heap_allocation_file->fd_flags, >>> + heap_allocation_file->heap_flags); >>> + if (fd < 0) >>> + return fd; >>> + >>> + heap_allocation_file->fd = fd; >>> + return 0; >>> +} >>> + >>> static long dma_heap_ioctl_allocate(struct file *file, void *data) >>> { >>> struct dma_heap_allocation_data *heap_allocation = data; >>> @@ -121,6 +605,7 @@ static long dma_heap_ioctl_allocate(struct file >>> *file, void *data) >>> static unsigned int dma_heap_ioctl_cmds[] = { >>> DMA_HEAP_IOCTL_ALLOC, >>> + DMA_HEAP_IOCTL_ALLOC_AND_READ, >>> }; >>> static long dma_heap_ioctl(struct file *file, unsigned int ucmd, >>> @@ -170,6 +655,9 @@ static long dma_heap_ioctl(struct file *file, >>> unsigned int ucmd, >>> case DMA_HEAP_IOCTL_ALLOC: >>> ret = dma_heap_ioctl_allocate(file, kdata); >>> break; >>> + case DMA_HEAP_IOCTL_ALLOC_AND_READ: >>> + ret = dma_heap_ioctl_allocate_read_file(file, kdata); >>> + break; >>> default: >>> ret = -ENOTTY; >>> goto err; >>> @@ -316,11 +804,44 @@ static int dma_heap_init(void) >>> dma_heap_class = class_create(DEVNAME); >>> if (IS_ERR(dma_heap_class)) { >>> - unregister_chrdev_region(dma_heap_devt, NUM_HEAP_MINORS); >>> - return PTR_ERR(dma_heap_class); >>> + ret = PTR_ERR(dma_heap_class); >>> + goto fail_class; >>> } >>> dma_heap_class->devnode = dma_heap_devnode; >>> + heap_fctl = kzalloc(sizeof(*heap_fctl), GFP_KERNEL); >>> + if (unlikely(!heap_fctl)) { 
>>> + ret = -ENOMEM; >>> + goto fail_alloc; >>> + } >>> + >>> + INIT_LIST_HEAD(&heap_fctl->works); >>> + init_waitqueue_head(&heap_fctl->threadwq); >>> + init_waitqueue_head(&heap_fctl->workwq); >>> + >>> + heap_fctl->work_thread = kthread_run(dma_heap_file_control_thread, >>> + heap_fctl, "heap_fwork_t"); >>> + if (IS_ERR(heap_fctl->work_thread)) { >>> + ret = -ENOMEM; >>> + goto fail_thread; >>> + } >>> + >>> + heap_fctl->heap_fwork_cachep = KMEM_CACHE(dma_heap_file_work, 0); >>> + if (unlikely(!heap_fctl->heap_fwork_cachep)) { >>> + ret = -ENOMEM; >>> + goto fail_cache; >>> + } >>> + >>> return 0; >>> + >>> +fail_cache: >>> + kthread_stop(heap_fctl->work_thread); >>> +fail_thread: >>> + kfree(heap_fctl); >>> +fail_alloc: >>> + class_destroy(dma_heap_class); >>> +fail_class: >>> + unregister_chrdev_region(dma_heap_devt, NUM_HEAP_MINORS); >>> + return ret; >>> } >>> subsys_initcall(dma_heap_init); >>> diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h >>> index 064bad725061..9c25383f816c 100644 >>> --- a/include/linux/dma-heap.h >>> +++ b/include/linux/dma-heap.h >>> @@ -12,12 +12,17 @@ >>> #include <linux/cdev.h> >>> #include <linux/types.h> >>> +#define DEFAULT_ADI_BATCH (128 << 20) >>> + >>> struct dma_heap; >>> +struct dma_heap_file_task; >>> +struct dma_heap_file; >>> /** >>> * struct dma_heap_ops - ops to operate on a given heap >>> * @allocate: allocate dmabuf and return struct dma_buf ptr >>> - * >>> + * @allocate_read_file: allocate dmabuf and read file, then return >>> struct >>> + * dma_buf ptr. >>> * allocate returns dmabuf on success, ERR_PTR(-errno) on error. 
>>> */ >>> struct dma_heap_ops { >>> @@ -25,6 +30,11 @@ struct dma_heap_ops { >>> unsigned long len, >>> u32 fd_flags, >>> u64 heap_flags); >>> + >>> + struct dma_buf *(*allocate_read_file)(struct dma_heap *heap, >>> + struct dma_heap_file *heap_file, >>> + u32 fd_flags, >>> + u64 heap_flags); >>> }; >>> /** >>> @@ -65,4 +75,49 @@ const char *dma_heap_get_name(struct dma_heap >>> *heap); >>> */ >>> struct dma_heap *dma_heap_add(const struct dma_heap_export_info >>> *exp_info); >>> +/** >>> + * dma_heap_destroy_file_read - waits for a file read to complete >>> then destroy it >>> + * Returns: true if the file read failed, false otherwise >>> + */ >>> +bool dma_heap_destroy_file_read(struct dma_heap_file_task >>> *heap_ftask); >>> + >>> +/** >>> + * dma_heap_wait_for_file_read - waits for a file read to complete >>> + * Returns: true if the file read failed, false otherwise >>> + */ >>> +bool dma_heap_wait_for_file_read(struct dma_heap_file_task >>> *heap_ftask); >>> + >>> +/** >>> + * dma_heap_alloc_file_read - Declare a task to read file when >>> allocate pages. >>> + * @heap_file: target file to read >>> + * >>> + * Return NULL if failed, otherwise return a struct pointer. >>> + */ >>> +struct dma_heap_file_task * >>> +dma_heap_declare_file_read(struct dma_heap_file *heap_file); >>> + >>> +/** >>> + * dma_heap_prepare_file_read - cache each allocated page until we >>> meet this batch. >>> + * @heap_ftask: prepared and need to commit's work. >>> + * @page: current allocated page. don't care which order. >>> + * >>> + * Returns true if reach to batch, false so go on prepare. >>> + */ >>> +bool dma_heap_prepare_file_read(struct dma_heap_file_task *heap_ftask, >>> + struct page *page); >>> + >>> +/** >>> + * dma_heap_commit_file_read - prepare collect enough memory, >>> going to trigger IO >>> + * @heap_ftask: info that current IO needs >>> + * >>> + * This commit will also check if reach to tail read. 
>>> + * For direct I/O submissions, it is necessary to pay attention to >>> file reads >>> + * that are not page-aligned. For the unaligned portion of the >>> read, buffer IO >>> + * needs to be triggered. >>> + * Returns: >>> + * 0 if all right, -errno if something wrong >>> + */ >>> +int dma_heap_submit_file_read(struct dma_heap_file_task *heap_ftask); >>> +size_t dma_heap_file_size(struct dma_heap_file *heap_file); >>> + >>> #endif /* _DMA_HEAPS_H */ >>> diff --git a/include/uapi/linux/dma-heap.h >>> b/include/uapi/linux/dma-heap.h >>> index a4cf716a49fa..8c20e8b74eed 100644 >>> --- a/include/uapi/linux/dma-heap.h >>> +++ b/include/uapi/linux/dma-heap.h >>> @@ -39,6 +39,27 @@ struct dma_heap_allocation_data { >>> __u64 heap_flags; >>> }; >>> +/** >>> + * struct dma_heap_allocation_file_data - metadata passed from >>> userspace for >>> + * allocations and read file >>> + * @fd: will be populated with a fd which provides the >>> + * handle to the allocated dma-buf >>> + * @file_fd: file descriptor to read from(suggested to use >>> O_DIRECT open file) >>> + * @batch: how many memory alloced then file read(bytes), >>> default 128MB >>> + * will auto aligned to PAGE_SIZE >>> + * @fd_flags: file descriptor flags used when allocating >>> + * @heap_flags: flags passed to heap >>> + * >>> + * Provided by userspace as an argument to the ioctl >>> + */ >>> +struct dma_heap_allocation_file_data { >>> + __u32 fd; >>> + __u32 file_fd; >>> + __u32 batch; >>> + __u32 fd_flags; >>> + __u64 heap_flags; >>> +}; >>> + >>> #define DMA_HEAP_IOC_MAGIC 'H' >>> /** >>> @@ -50,4 +71,15 @@ struct dma_heap_allocation_data { >>> #define DMA_HEAP_IOCTL_ALLOC _IOWR(DMA_HEAP_IOC_MAGIC, 0x0,\ >>> struct dma_heap_allocation_data) >>> +/** >>> + * DOC: DMA_HEAP_IOCTL_ALLOC_AND_READ - allocate memory from pool >>> and both >>> + * read file when allocate memory. 
>>> + * >>> + * Takes a dma_heap_allocation_file_data struct and returns it with >>> the fd field >>> + * populated with the dmabuf handle of the allocation. When return, >>> the dma-buf >>> + * content is read from file. >>> + */ >>> +#define DMA_HEAP_IOCTL_ALLOC_AND_READ \ >>> + _IOWR(DMA_HEAP_IOC_MAGIC, 0x1, struct >>> dma_heap_allocation_file_data) >>> + >>> #endif /* _UAPI_LINUX_DMABUF_POOL_H */ >>
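The tail handling in dma_heap_submit_file_read() quoted above rests on one piece of arithmetic: everything up to the last page boundary can go through direct I/O, and whatever remains past that boundary needs a buffered read. A minimal userspace sketch of that split follows. The names struct read_split and split_for_direct_io() are hypothetical and do not appear in the patch, and the sketch assumes the page size is a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type for illustration; not part of the patch. */
struct read_split {
	size_t direct_len; /* bytes that a page-aligned direct read can cover */
	size_t tail_off;   /* file offset where the unaligned tail starts */
	size_t tail_len;   /* bytes left over for a buffered read */
};

/*
 * Split a file of file_size bytes at the last page boundary, mirroring
 * the patch's "start = PAGE_ALIGN_DOWN(fsz)" followed by a read of
 * "fsz - start" bytes. page_size is assumed to be a power of two.
 */
static struct read_split split_for_direct_io(size_t file_size, size_t page_size)
{
	struct read_split s;

	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	s.tail_off = file_size & ~(page_size - 1); /* round down to a page */
	s.direct_len = s.tail_off;
	s.tail_len = file_size - s.tail_off;
	return s;
}
```

With a 10000-byte file and 4096-byte pages, the first 8192 bytes are eligible for direct I/O and the remaining 1808 bytes form the tail. A file whose size is already page-aligned yields a tail length of zero, so no buffered read is needed.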

2 years

Re: [PATCH] media: videobuf2: sync caches for dmabuf memory

by Tomasz Figa

On Thu, Jun 20, 2024 at 3:52 PM Hans Verkuil <hverkuil-cisco(a)xs4all.nl> wrote:
>
> On 19/06/2024 06:19, Tomasz Figa wrote:
> > On Wed, Jun 19, 2024 at 1:24 AM Nicolas Dufresne <nicolas(a)ndufresne.ca> wrote:
> >>
> >> On Tuesday, 18 June 2024 at 16:47 +0900, Tomasz Figa wrote:
> >>> Hi TaoJiang,
> >>>
> >>> On Tue, Jun 18, 2024 at 4:30 PM TaoJiang <tao.jiang_2(a)nxp.com> wrote:
> >>>>
> >>>> From: Ming Qian <ming.qian(a)nxp.com>
> >>>>
> >>>> When the memory type is VB2_MEMORY_DMABUF, the v4l2 device can't know
> >>>> whether the dma buffer is coherent or synchronized.
> >>>>
> >>>> The videobuf2-core will skip cache syncs as it thinks the DMA exporter
> >>>> should take care of cache syncs
> >>>>
> >>>> But in fact it's likely that the client doesn't
> >>>> synchronize the dma buf before qbuf() or after dqbuf(). and it's
> >>>> difficult to find this type of error directly.
> >>>>
> >>>> I think it's helpful that videobuf2-core can call
> >>>> dma_buf_end_cpu_access() and dma_buf_begin_cpu_access() to handle the
> >>>> cache syncs.
> >>>>
> >>>> Signed-off-by: Ming Qian <ming.qian(a)nxp.com>
> >>>> Signed-off-by: TaoJiang <tao.jiang_2(a)nxp.com>
> >>>> ---
> >>>> .../media/common/videobuf2/videobuf2-core.c | 22 +++++++++++++++++++
> >>>> 1 file changed, 22 insertions(+)
> >>>>
> >>>
> >>> Sorry, that patch is incorrect. I believe you're misunderstanding the
> >>> way DMA-buf buffers should be managed in the userspace. It's the
> >>> userspace responsibility to call the DMA_BUF_IOCTL_SYNC ioctl [1] to
> >>> signal start and end of CPU access to the kernel and imply necessary
> >>> cache synchronization.
> >>>
> >>> [1] https://docs.kernel.org/driver-api/dma-buf.html#dma-buffer-ioctls
> >>>
> >>> So, really sorry, but it's a NAK.
> >>
> >>
> >>
> >> This patch *could* make sense if it was inside UVC Driver as an example, as this
> >> driver can import dmabuf, to CPU memcpy, and does omit the required sync calls
> >> (unless that got added recently, I can easily have missed it).
> >
> > Yeah, currently V4L2 drivers don't call the in-kernel
> > dma_buf_{begin,end}_cpu_access() when they need to access the buffers
> > from the CPU, while my quick grep [1] reveals that we have 68 files
> > retrieving plane vaddr by calling vb2_plane_vaddr() (not necessarily a
> > 100% guarantee of CPU access being done, but rather likely so).
> >
> > I also repeated the same thing with VB2_DMABUF [2] and tried to
> > attribute both lists to specific drivers (by retaining the path until
> > the first - or _ [3]; which seemed to be relatively accurate), leading
> > to the following drivers that claim support for DMABUF while also
> > retrieving plane vaddr (without proper synchronization - no drivers
> > currently call any begin/end CPU access):
> >
> > i2c/video
> > pci/bt8xx/bttv
> > pci/cobalt/cobalt
> > pci/cx18/cx18
> > pci/tw5864/tw5864
> > pci/tw686x/tw686x
> > platform/allegro
> > platform/amphion/vpu
> > platform/chips
> > platform/intel/pxa
> > platform/marvell/mcam
> > platform/mediatek/jpeg/mtk
> > platform/mediatek/vcodec/decoder/mtk
> > platform/mediatek/vcodec/encoder/mtk
> > platform/nuvoton/npcm
> > platform/nvidia/tegra
> > platform/nxp/imx
> > platform/renesas/rcar
> > platform/renesas/vsp1/vsp1
> > platform/rockchip/rkisp1/rkisp1
> > platform/samsung/exynos4
> > platform/samsung/s5p
> > platform/st/sti/delta/delta
> > platform/st/sti/hva/hva
> > platform/verisilicon/hantro
> > usb/au0828/au0828
> > usb/cx231xx/cx231xx
> > usb/dvb
> > usb/em28xx/em28xx
> > usb/gspca/gspca.c
> > usb/hackrf/hackrf.c
> > usb/stk1160/stk1160
> > usb/uvc/uvc
> >
> > which means we potentially have ~30 drivers which likely don't handle
> > imported DMABUFs correctly (there is still a chance that DMABUF is
> > advertised for one queue, while vaddr is used for another).
> >
> > I think we have two options:
> > 1) add vb2_{begin/end}_cpu_access() helpers, carefully audit each
> > driver and add calls to those
>
> I actually started on that 9 (!) years ago:
>
> https://git.linuxtv.org/hverkuil/media_tree.git/log/?h=vb2-cpu-access
>
> If memory serves, the main problem was that there were some drivers where
> it wasn't clear what should be done. In the end I never continued this
> work since nobody complained about it.
>
> This patch series adds vb2_plane_begin/end_cpu_access() functions,
> replaces all calls to vb2_plane_vaddr() in drivers to the new functions,
> and at the end removes vb2_plane_vaddr() altogether.
>
> > 2) take a heavy gun approach and just call vb2_begin_cpu_access()
> > whenever vb2_plane_vaddr() is called and then vb2_end_cpu_access()
> > whenever vb2_buffer_done() is called (if begin was called before).
> >
> > The latter has the disadvantage of drivers not having control over the
> > timing of the cache sync, so could end up with less than optimal
> > performance. Also there could be some more complex cases, where the
> > driver needs to mix DMA and CPU accesses to the buffer, so the fixed
> > sequence just wouldn't work for them. (But then they just wouldn't
> > work today either.)
> >
> > Hans, Marek, do you have any thoughts? (I'd personally just go with 2
> > and if any driver in the future needs something else, they could call
> > begin/end CPU access manually.)
>
> I prefer 1. If nothing else, that makes it easy to identify drivers
> that do such things.
>
> But perhaps a mix is possible: if a VB2 flag is set by the driver, then
> approach 2 is used. That might help with the drivers where it isn't clear
> what they should do. Although perhaps this can all be done in the driver
> itself: instead of vb2_plane_vaddr they call vb2_begin_cpu_access for the
> whole buffer, and at buffer_done time they call vb2_end_cpu_access. Should
> work just as well for the very few drivers that need this.

That's a good point. I guess we don't really need to dig so much into
those drivers in this case. Just mechanically do the same for all of
them (+/- maybe checking for some obvious corner cases which don't
need the extra calls).

Let me see if I can give it a stab.

Best,
Tomasz

>
> Regards,
>
> Hans
>
> >
> > [1] git grep vb2_plane_vaddr | cut -d":" -f 1 | sort | uniq
> > [2] git grep VB2_DMABUF | cut -d":" -f 1 | sort | uniq
> > [3] by running [1] and [2] through | cut -d"-" -f 1 | cut -d"_" -f 1 | uniq
> >
> > Best,
> > Tomasz
> >
> >>
> >> But generally speaking, bracketing all drivers with CPU access synchronization
> >> does not make sense indeed, so I second the rejection.
> >>
> >> Nicolas
> >>
> >>>
> >>> Best regards,
> >>> Tomasz
> >>>
> >>>> diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c
> >>>> index 358f1fe42975..4734ff9cf3ce 100644
> >>>> --- a/drivers/media/common/videobuf2/videobuf2-core.c
> >>>> +++ b/drivers/media/common/videobuf2/videobuf2-core.c
> >>>> @@ -340,6 +340,17 @@ static void __vb2_buf_mem_prepare(struct vb2_buffer *vb)
> >>>>  	vb->synced = 1;
> >>>>  	for (plane = 0; plane < vb->num_planes; ++plane)
> >>>>  		call_void_memop(vb, prepare, vb->planes[plane].mem_priv);
> >>>> +
> >>>> +	if (vb->memory != VB2_MEMORY_DMABUF)
> >>>> +		return;
> >>>> +	for (plane = 0; plane < vb->num_planes; ++plane) {
> >>>> +		struct dma_buf *dbuf = vb->planes[plane].dbuf;
> >>>> +
> >>>> +		if (!dbuf)
> >>>> +			continue;
> >>>> +
> >>>> +		dma_buf_end_cpu_access(dbuf, vb->vb2_queue->dma_dir);
> >>>> +	}
> >>>>  }
> >>>>
> >>>>  /*
> >>>> @@ -356,6 +367,17 @@ static void __vb2_buf_mem_finish(struct vb2_buffer *vb)
> >>>>  	vb->synced = 0;
> >>>>  	for (plane = 0; plane < vb->num_planes; ++plane)
> >>>>  		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
> >>>> +
> >>>> +	if (vb->memory != VB2_MEMORY_DMABUF)
> >>>> +		return;
> >>>> +	for (plane = 0; plane < vb->num_planes; ++plane) {
> >>>> +		struct dma_buf *dbuf = vb->planes[plane].dbuf;
> >>>> +
> >>>> +		if (!dbuf)
> >>>> +			continue;
> >>>> +
> >>>> +		dma_buf_begin_cpu_access(dbuf, vb->vb2_queue->dma_dir);
> >>>> +	}
> >>>>  }
> >>>>
> >>>>  /*
> >>>> --
> >>>> 2.43.0-rc1
> >>>>
> >>
> >
>
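For reference, the userspace bracketing that the NAK in this thread refers to can be sketched as follows. This is only a minimal sketch, assuming the application holds a dma-buf fd that it has mmap()ed. The helper names dmabuf_cpu_access(), dmabuf_begin_cpu_read() and dmabuf_end_cpu_read() are made up for illustration; DMA_BUF_IOCTL_SYNC, struct dma_buf_sync and the DMA_BUF_SYNC_* flags come from the <linux/dma-buf.h> uapi header.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/*
 * Hypothetical wrapper: issue DMA_BUF_IOCTL_SYNC with the given flags,
 * restarting the call if it is interrupted (EINTR) or asked to retry
 * (EAGAIN). Returns 0 on success, -1 with errno set on failure.
 */
static int dmabuf_cpu_access(int dmabuf_fd, uint64_t flags)
{
	struct dma_buf_sync sync = { .flags = flags };
	int ret;

	do {
		ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
	} while (ret == -1 && (errno == EINTR || errno == EAGAIN));

	return ret;
}

/* Call before the CPU reads the mapped buffer, e.g. after dqbuf(). */
static int dmabuf_begin_cpu_read(int dmabuf_fd)
{
	return dmabuf_cpu_access(dmabuf_fd,
				 DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ);
}

/* Call once the CPU is done, before handing the buffer back via qbuf(). */
static int dmabuf_end_cpu_read(int dmabuf_fd)
{
	return dmabuf_cpu_access(dmabuf_fd,
				 DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ);
}
```

A client that writes into the buffer would pass DMA_BUF_SYNC_WRITE or DMA_BUF_SYNC_RW instead of DMA_BUF_SYNC_READ. Where exactly these calls sit relative to qbuf() and dqbuf() depends on the application, so the placement noted in the comments is an assumption rather than a rule taken from the thread.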

2 years

Re: [PATCH 0/2] Support direct I/O read and write for memory allocated by dmabuf

by Christian König

On 10.07.24 at 16:35, Lei Liu wrote:
>
> On 2024/7/10 22:14, Christian König wrote:
>> On 10.07.24 at 15:57, Lei Liu wrote:
>>> Use vm_insert_page to establish a mapping for the memory allocated
>>> by dmabuf, thus supporting direct I/O read and write; and fix the
>>> issue of incorrect memory statistics after mapping dmabuf memory.
>>
>> Well big NAK to that! Direct I/O is intentionally disabled on DMA-bufs.
>
> Hello! Could you explain why direct_io is disabled on DMABUF? Is there
> any historical reason for this?

It's basically one of the most fundamental design decisions of DMA-buf.

The attachment/map/fence model DMA-buf uses is not really compatible
with direct I/O on the underlying pages.

>>
>> We already discussed enforcing that in the DMA-buf framework and this
>> patch probably means that we should really do that.
>>
>> Regards,
>> Christian.
>
> Thank you for your response. With the application of AI large model
> edgeification, we urgently need support for direct_io on DMABUF to
> read some very large files. Do you have any new solutions or plans for
> this?

We have seen similar projects over the years and all of those turned
out to be complete shipwrecks.

There is currently a patch set under discussion to give the network
subsystem DMA-buf support. If you are interested in network direct I/O,
that could help.

In addition to that, a lot of GPU drivers support userptr usages, e.g.
to import malloced memory into the GPU driver. You can then also do
direct I/O on that malloced memory and the kernel will enforce correct
handling with the GPU driver through MMU notifiers.

But as far as I know a general DMA-buf based solution isn't possible.

Regards,
Christian.

>
> Regards,
> Lei Liu.
>
>>
>>>
>>> Lei Liu (2):
>>> mm: dmabuf_direct_io: Support direct_io for memory allocated by
>>> dmabuf
>>> mm: dmabuf_direct_io: Fix memory statistics error for dmabuf
>>> allocated
>>> memory with direct_io support
>>>
>>> drivers/dma-buf/heaps/system_heap.c | 5 +++--
>>> fs/proc/task_mmu.c | 8 +++++++-
>>> include/linux/mm.h | 1 +
>>> mm/memory.c | 15 ++++++++++-----
>>> mm/rmap.c | 9 +++++----
>>> 5 files changed, 26 insertions(+), 12 deletions(-)
>>>
>>
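The userptr route mentioned in the reply above starts from ordinary process memory that satisfies direct I/O alignment rules. The sketch below covers only that half, under stated assumptions: read_into_aligned() is a hypothetical helper, the caller is assumed to have opened the fd with O_DIRECT, and align is assumed to be a suitable block or page size. Importing the resulting buffer into a GPU driver as a userptr is driver-specific and is not shown.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to len bytes from the start of fd into a
 * newly allocated buffer whose address and capacity are multiples of
 * align, which is what an fd opened with O_DIRECT typically requires.
 * Returns the number of bytes read (at most len), or -1 with errno set.
 * On success the caller owns *out and releases it with free().
 */
static ssize_t read_into_aligned(int fd, void **out, size_t len, size_t align)
{
	size_t cap, done = 0;
	void *buf;
	int err;

	if (align == 0) {
		errno = EINVAL;
		return -1;
	}

	/* Round the request up so every read length stays aligned. */
	cap = (len + align - 1) / align * align;
	if (cap == 0)
		cap = align;

	err = posix_memalign(&buf, align, cap);
	if (err) {
		errno = err;
		return -1;
	}

	while (done < cap) {
		ssize_t n = pread(fd, (char *)buf + done, cap - done, (off_t)done);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			free(buf);
			return -1;
		}
		if (n == 0)
			break; /* end of file */
		done += (size_t)n;
		if (done % align)
			break; /* unaligned short read: cannot continue aligned */
	}

	*out = buf;
	return (ssize_t)(done < len ? done : len);
}
```

The loop stops after an unaligned short read because the next request would start at an unaligned offset, which direct I/O usually rejects. For regular files that situation normally means the end of the file was reached.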

2 years
