On Thu, Aug 01, 2024 at 10:53:45AM +0800, Huan Yang wrote:
>
> > On 2024/8/1 4:46, Daniel Vetter wrote:
> > On Tue, Jul 30, 2024 at 08:04:04PM +0800, Huan Yang wrote:
> > > > On 2024/7/30 17:05, Huan Yang wrote:
> > > > > On 2024/7/30 16:56, Daniel Vetter wrote:
> > > > >
> > > > > On Tue, Jul 30, 2024 at 03:57:44PM +0800, Huan Yang wrote:
> > > > > > UDMA-BUF step:
> > > > > > 1. memfd_create
> > > > > > 2. open file(buffer/direct)
> > > > > > 3. udmabuf create
> > > > > > 4. mmap memfd
> > > > > > 5. read file into memfd vaddr
> > > > > Yeah this is really slow and the worst way to do it. You absolutely want
> > > > > to start _all_ the io before you start creating the dma-buf, ideally
> > > > > with
> > > > > everything running in parallel. But just starting the direct I/O with
> > > > > async and then creating the udmabuf should be a lot faster and avoid
> > > > That's great. Let me rephrase that, and please correct me if I'm wrong.
> > > >
> > > > UDMA-BUF step:
> > > > 1. memfd_create
> > > > 2. mmap memfd
> > > > 3. open file(buffer/direct)
> > > > 4. start thread to async read
> > > > 5. udmabuf create
> > > >
> > > > With this, performance can improve.
> > > I just tested it. The steps are:
> > >
> > > UDMA-BUF step:
> > > 1. memfd_create
> > > 2. mmap memfd
> > > 3. open file(buffer/direct)
> > > 4. start thread to async read
> > > 5. udmabuf create
> > >
> > > 6. join (wait for the read thread to finish)
> > >
> > > Reading a 3G file, all steps cost 1,527,103,431 ns. That's great.
> > Ok that's almost the throughput of your patch set, which I think is close
> > enough. The remaining difference is probably just the mmap overhead, not
> > sure whether/how we can do direct i/o to an fd directly ... in principle
> > it's possible for any file that uses the standard pagecache.
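To make the reordered flow above concrete, here is a minimal userspace sketch (error handling mostly omitted; it assumes the UDMABUF_CREATE uAPI from <linux/udmabuf.h>, the /dev/udmabuf node, and that the file size is suitably aligned for O_DIRECT):

#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/udmabuf.h>

struct read_job { int fd; void *buf; size_t len; };

/* Reader thread: pull the file contents into the mmap'd memfd region. */
static void *read_worker(void *arg)
{
        struct read_job *job = arg;
        size_t done = 0;
        ssize_t n;

        /* O_DIRECT needs block-aligned offsets/lengths; assumed here. */
        while (done < job->len &&
               (n = pread(job->fd, (char *)job->buf + done,
                          job->len - done, done)) > 0)
                done += n;
        return NULL;
}

/* Returns the dma-buf fd on success (error handling mostly omitted). */
int create_udmabuf_while_reading(const char *path, size_t size)
{
        int memfd, filefd, devfd, dmabuf_fd;
        pthread_t tid;
        void *vaddr;

        /* 1. memfd_create, size it, seal it (udmabuf requires F_SEAL_SHRINK) */
        memfd = memfd_create("payload", MFD_ALLOW_SEALING);
        ftruncate(memfd, size);
        fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

        /* 2. mmap the memfd so the reader thread has a destination */
        vaddr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, memfd, 0);

        /* 3. open the source file for direct I/O */
        filefd = open(path, O_RDONLY | O_DIRECT);

        /* 4. start the async read before touching udmabuf */
        struct read_job job = { .fd = filefd, .buf = vaddr, .len = size };
        pthread_create(&tid, NULL, read_worker, &job);

        /* 5. create the udmabuf while the read is still in flight */
        struct udmabuf_create create = {
                .memfd  = memfd,
                .flags  = UDMABUF_FLAGS_CLOEXEC,
                .offset = 0,
                .size   = size,
        };
        devfd = open("/dev/udmabuf", O_RDWR);
        dmabuf_fd = ioctl(devfd, UDMABUF_CREATE, &create);

        /* 6. join: make sure the data landed before handing the buffer off */
        pthread_join(tid, NULL);
        return dmabuf_fd;
}

The point is only the ordering: the read is in flight while the UDMABUF_CREATE ioctl runs, and the join at the end guarantees the data is present before the dma-buf is handed to a consumer.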
>
> Yes, for mmap, IMO: we already get and pin all the folios, which means all
> the PFNs are known once the udmabuf is created.
>
> So I think a page-fault-based mmap does not help save memory, it only adds
> cost to mmap access (at most it saves a little page table memory).
>
> I want to offer a patchset that removes it and makes the code better suited
> to folio operations (and removes the unpin list). It also contains some fix
> patches.
>
> I'll send it once my testing shows it works well.
>
>
> As for doing direct I/O through an fd, maybe use sendfile or copy_file_range?
>
> sendfile is based on a pipe buffer, and its performance was low when I tested it.
>
> copy_file_range can't work because the files are not on the same filesystem.
>
> So I can't find another way to do it. Can someone give some suggestions?
Yeah direct I/O to pagecache without an mmap might be too niche to be
supported. Maybe io_uring has something, but I guess as unlikely as
anything else.
-Sima
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
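For reference, the async reads above could also be driven with io_uring instead of a dedicated reader thread; a minimal liburing sketch follows. It still reads into the mmap'd memfd region, so it does not avoid the mmap question discussed above; chunk size and queue depth are arbitrary, and short reads are not handled.

#include <liburing.h>
#include <stddef.h>

/* Read 'len' bytes from 'filefd' into the mmap'd memfd region at 'vaddr'. */
static int read_file_with_io_uring(int filefd, void *vaddr, size_t len)
{
        const size_t chunk = 1 << 20;   /* 1 MiB per request, arbitrary */
        const unsigned int depth = 32;  /* queue depth, arbitrary */
        unsigned int inflight = 0;
        struct io_uring_cqe *cqe;
        struct io_uring ring;
        size_t off;

        if (io_uring_queue_init(depth, &ring, 0) < 0)
                return -1;

        for (off = 0; off < len; off += chunk) {
                size_t this = len - off < chunk ? len - off : chunk;
                struct io_uring_sqe *sqe;

                /* Keep at most 'depth' reads in flight; reap one if full. */
                if (inflight == depth) {
                        io_uring_wait_cqe(&ring, &cqe);
                        if (cqe->res < 0)
                                goto fail;
                        io_uring_cqe_seen(&ring, cqe);
                        inflight--;
                }

                sqe = io_uring_get_sqe(&ring);
                io_uring_prep_read(sqe, filefd, (char *)vaddr + off, this, off);
                io_uring_submit(&ring);
                inflight++;
        }

        /* Drain the remaining completions before the buffer is handed off. */
        while (inflight--) {
                io_uring_wait_cqe(&ring, &cqe);
                if (cqe->res < 0)
                        goto fail;
                io_uring_cqe_seen(&ring, cqe);
        }
        io_uring_queue_exit(&ring);
        return 0;

fail:
        io_uring_queue_exit(&ring);
        return -1;
}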
On 2024-08-01 20:32, Kasireddy, Vivek wrote:
> Hi Huan,
>
>> This patchset attempts to fix some errors in udmabuf and remove the
>> unpin_list structure.
>>
>> Some of these fixes just gather patches which I uploaded before.
>>
>> Patch1
>> ===
>> Try to remove the page-fault-based mmap and map the memory directly.
>> The current udmabuf has already obtained and pinned the folios by the time
>> creation completes. This means the physical memory has already been
>> acquired, rather than being accessed on demand. The current page fault
>> method only saves some page table memory.
>>
>> As a result, the page fault mechanism has lost its purpose as demand
>> paging. Because a page fault has to trap into kernel mode and fill in the
>> mapping when the corresponding virtual address in the mmap is accessed,
>> user-mode access to those virtual addresses always traps into kernel mode
>> first.
>>
>> Therefore, with a large udmabuf this represents considerable overhead.
> Just want to mention that for the main use case the udmabuf driver is designed
> for (sharing a Qemu guest FB with the host for GPU DMA), udmabufs are not created
> very frequently. And I think providing CPU access via mmap is just a backup,
> mainly intended for debugging purposes.
FYI, Mesa now uses udmabuf for supporting dma-bufs with software rendering.
--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and Xwayland developer
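As an illustration of the "map everything up front instead of faulting" idea in patch 1 above, the mmap handler could look roughly like the sketch below. This is not the posted patch: the dma_buf_ops mmap signature and vm_insert_page() are real kernel interfaces, but the offsets[] field and the exact insertion strategy are assumptions here.

/*
 * Illustrative sketch only, not the posted patch: insert every pinned page
 * at mmap() time instead of installing a vm_ops fault handler. Assumes the
 * existing udmabuf fields (pagecount, folios[]) plus a per-page offsets[]
 * array holding each page's byte offset inside its folio.
 */
static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma)
{
        struct udmabuf *ubuf = buf->priv;
        unsigned long addr = vma->vm_start;
        pgoff_t pgoff = vma->vm_pgoff;
        pgoff_t i;
        int ret;

        if (pgoff + vma_pages(vma) > ubuf->pagecount)
                return -EINVAL;

        for (i = 0; i < vma_pages(vma); i++, addr += PAGE_SIZE) {
                struct folio *folio = ubuf->folios[pgoff + i];
                struct page *page = folio_page(folio,
                                ubuf->offsets[pgoff + i] >> PAGE_SHIFT);

                /* populate the PTE now; no fault path is needed later */
                ret = vm_insert_page(vma, addr, page);
                if (ret)
                        return ret;
        }

        return 0;
}

Whether the up-front cost at mmap() time is worth it depends on how much of the buffer is actually touched through the CPU mapping.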
On Wed, Jul 31, 2024 at 11:34:49AM +0800, Huan Yang wrote:
> The current udmabuf_folio contains a list_head and the corresponding
> folio pointer, with a size of 24 bytes. udmabuf_folio uses kmalloc to
> allocate memory.
>
> However, kmalloc allocates from shared pools, starting from 64 bytes. This
> means that each udmabuf_folio allocation will waste 40 bytes.
>
> Considering that each udmabuf creates one udmabuf_folio per folio, the
> wasted memory can be significant when memory is fragmented (many small
> folios).
>
> Furthermore, if udmabuf is frequently used, the allocation and
> deallocation of udmabuf_folio will also be frequent.
>
> Therefore, this patch adds a kmem_cache dedicated to the allocation and
> deallocation of udmabuf_folio. This is expected to improve allocation and
> deallocation performance, while also avoiding memory waste.
>
> Signed-off-by: Huan Yang <link(a)vivo.com>
> ---
> drivers/dma-buf/udmabuf.c | 18 +++++++++++++++---
> 1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> index 047c3cd2ceff..db4de8c745ce 100644
> --- a/drivers/dma-buf/udmabuf.c
> +++ b/drivers/dma-buf/udmabuf.c
> @@ -24,6 +24,8 @@ static int size_limit_mb = 64;
> module_param(size_limit_mb, int, 0644);
> MODULE_PARM_DESC(size_limit_mb, "Max size of a dmabuf, in megabytes. Default is 64.");
>
> +static struct kmem_cache *udmabuf_folio_cachep;
> +
> struct udmabuf {
> pgoff_t pagecount;
> struct folio **folios;
> @@ -169,7 +171,7 @@ static void unpin_all_folios(struct list_head *unpin_list)
> unpin_folio(ubuf_folio->folio);
>
> list_del(&ubuf_folio->list);
> - kfree(ubuf_folio);
> + kmem_cache_free(udmabuf_folio_cachep, ubuf_folio);
> }
> }
>
> @@ -178,7 +180,7 @@ static int add_to_unpin_list(struct list_head *unpin_list,
> {
> struct udmabuf_folio *ubuf_folio;
>
> - ubuf_folio = kzalloc(sizeof(*ubuf_folio), GFP_KERNEL);
> + ubuf_folio = kmem_cache_alloc(udmabuf_folio_cachep, GFP_KERNEL);
> if (!ubuf_folio)
> return -ENOMEM;
>
> @@ -492,10 +494,20 @@ static int __init udmabuf_dev_init(void)
> if (ret < 0) {
> pr_err("Could not setup DMA mask for udmabuf device\n");
> misc_deregister(&udmabuf_misc);
misc_deregister() is now called twice in this error path, I think you've
forgotten to delete this line too?
Otherwise lgtm.
-Sima
> - return ret;
> + goto err;
> + }
> +
> + udmabuf_folio_cachep = KMEM_CACHE(udmabuf_folio, 0);
> + if (unlikely(!udmabuf_folio_cachep)) {
> + ret = -ENOMEM;
> + goto err;
> }
>
> return 0;
> +
> +err:
> + misc_deregister(&udmabuf_misc);
> + return ret;
> }
>
> static void __exit udmabuf_dev_exit(void)
>
> base-commit: cd19ac2f903276b820f5d0d89de0c896c27036ed
> --
> 2.45.2
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
On 31/07/2024 09:37, Huan Yang wrote:
> The current udmabuf_folio contains a list_head and the corresponding
> folio pointer, with a size of 24 bytes. udmabuf_folio uses kmalloc to
> allocate memory.
>
> However, kmalloc allocates from shared size-class pools, starting at 8, 16,
> 32 bytes and so on. If the requested size does not match a kmalloc size
> class, it is rounded up to the next one.
> This means that each udmabuf_folio allocation gets 32 bytes and wastes
> 8 bytes.
>
> Considering that each udmabuf creates one udmabuf_folio per folio, the
> wasted memory can be significant when memory is fragmented (many small
> folios).
>
> Furthermore, if udmabuf is frequently used, the allocation and
> deallocation of udmabuf_folio will also be frequent.
>
> Therefore, this patch adds a kmem_cache dedicated to the allocation and
> deallocation of udmabuf_folio. This is expected to improve allocation and
> deallocation performance, while also avoiding memory waste.
>
> Signed-off-by: Huan Yang <link(a)vivo.com>
> ---
> v3 -> v2: fix error description.
> v2 -> v1: fix double unregister, remove unlikely.
> drivers/dma-buf/udmabuf.c | 19 +++++++++++++++----
> 1 file changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
> index 047c3cd2ceff..c112c58ef09a 100644
> --- a/drivers/dma-buf/udmabuf.c
> +++ b/drivers/dma-buf/udmabuf.c
> @@ -24,6 +24,8 @@ static int size_limit_mb = 64;
> module_param(size_limit_mb, int, 0644);
> MODULE_PARM_DESC(size_limit_mb, "Max size of a dmabuf, in megabytes. Default is 64.");
>
> +static struct kmem_cache *udmabuf_folio_cachep;
> +
> struct udmabuf {
> pgoff_t pagecount;
> struct folio **folios;
> @@ -169,7 +171,7 @@ static void unpin_all_folios(struct list_head *unpin_list)
> unpin_folio(ubuf_folio->folio);
>
> list_del(&ubuf_folio->list);
> - kfree(ubuf_folio);
> + kmem_cache_free(udmabuf_folio_cachep, ubuf_folio);
> }
> }
>
> @@ -178,7 +180,7 @@ static int add_to_unpin_list(struct list_head *unpin_list,
> {
> struct udmabuf_folio *ubuf_folio;
>
> - ubuf_folio = kzalloc(sizeof(*ubuf_folio), GFP_KERNEL);
> + ubuf_folio = kmem_cache_alloc(udmabuf_folio_cachep, GFP_KERNEL);
> if (!ubuf_folio)
> return -ENOMEM;
>
> @@ -491,11 +493,20 @@ static int __init udmabuf_dev_init(void)
> DMA_BIT_MASK(64));
> if (ret < 0) {
> pr_err("Could not setup DMA mask for udmabuf device\n");
> - misc_deregister(&udmabuf_misc);
> - return ret;
> + goto err;
> + }
> +
> + udmabuf_folio_cachep = KMEM_CACHE(udmabuf_folio, 0);
> + if (!udmabuf_folio_cachep) {
> + ret = -ENOMEM;
> + goto err;
> }
>
> return 0;
> +
> +err:
> + misc_deregister(&udmabuf_misc);
> + return ret;
> }
>
> static void __exit udmabuf_dev_exit(void)
Hi,
should a kmem_cache_destroy() also be added in udmabuf_dev_exit()?
CJ
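A minimal sketch of the suggested cleanup, assuming the current udmabuf_dev_exit() only deregisters the misc device:

static void __exit udmabuf_dev_exit(void)
{
        misc_deregister(&udmabuf_misc);
        /* free the cache created in udmabuf_dev_init(), as suggested above */
        kmem_cache_destroy(udmabuf_folio_cachep);
}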
>
> base-commit: cd19ac2f903276b820f5d0d89de0c896c27036ed
Hi Huan,
kernel test robot noticed the following build warnings:
[auto build test WARNING on 931a3b3bccc96e7708c82b30b2b5fa82dfd04890]
url: https://github.com/intel-lab-lkp/linux/commits/Huan-Yang/dma-buf-heaps-Intr…
base: 931a3b3bccc96e7708c82b30b2b5fa82dfd04890
patch link: https://lore.kernel.org/r/20240730075755.10941-4-link%40vivo.com
patch subject: [PATCH v2 3/5] dma-buf: heaps: support alloc async read file
config: xtensa-allyesconfig (https://download.01.org/0day-ci/archive/20240731/202407312202.LhLTLEhX-lkp@…)
compiler: xtensa-linux-gcc (GCC) 14.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240731/202407312202.LhLTLEhX-lkp@…)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp(a)intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202407312202.LhLTLEhX-lkp@intel.com/
All warnings (new ones prefixed by >>):
drivers/dma-buf/dma-heap.c:45: warning: Function parameter or struct member 'priv' not described in 'dma_heap'
drivers/dma-buf/dma-heap.c:45: warning: Function parameter or struct member 'heap_devt' not described in 'dma_heap'
drivers/dma-buf/dma-heap.c:45: warning: Function parameter or struct member 'list' not described in 'dma_heap'
drivers/dma-buf/dma-heap.c:45: warning: Function parameter or struct member 'heap_cdev' not described in 'dma_heap'
>> drivers/dma-buf/dma-heap.c:158: warning: Function parameter or struct member 'lock' not described in 'dma_heap_file_control'
drivers/dma-buf/dma-heap.c:482: warning: expecting prototype for Trigger sync file read, read into dma(). Prototype was for dma_heap_read_file_sync() instead
vim +158 drivers/dma-buf/dma-heap.c
133
134 /**
135 * struct dma_heap_file_control - global control of dma_heap file read.
136 * @works: @dma_heap_file_work's list head.
137 *
138 * @threadwq: wait queue for @work_thread, if commit work, @work_thread
139 * wakeup and read this work's file contains.
140 *
141 * @workwq: used for main thread wait for file read end, if allocation
142 * end before file read. @dma_heap_file_task ref effect this.
143 *
144 * @work_thread: file read kthread. the dma_heap_file_task work's consumer.
145 *
146 * @heap_fwork_cachep: @dma_heap_file_work's cachep, it's alloc/free frequently.
147 *
148 * @nr_work: global number of how many work committed.
149 */
150 struct dma_heap_file_control {
151 struct list_head works;
152 spinlock_t lock; // only lock for @works.
153 wait_queue_head_t threadwq;
154 wait_queue_head_t workwq;
155 struct task_struct *work_thread;
156 struct kmem_cache *heap_fwork_cachep;
157 atomic_t nr_work;
> 158 };
159
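The new warning flags the undocumented @lock member; one possible fix is simply adding a line for it to the kerneldoc block (a sketch, with the other member descriptions left as they are):

/**
 * struct dma_heap_file_control - global control of dma_heap file read.
 * @works: @dma_heap_file_work's list head.
 * @lock: protects @works.
 * ...remaining member descriptions unchanged...
 */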
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki