Am 23.01.24 um 14:02 schrieb Paul Cercueil:
[SNIP]
That an exporter has to call extra functions to access his own buffers is a complete no-go for the design since this forces exporters into doing extra steps for allowing importers to access their data.
Then what about we add these dma_buf_{begin,end}_access(), with only implementations for "dumb" exporters e.g. udmabuf or the dmabuf heaps? And only importers (who cache the mapping and actually care about non- coherency) would have to call these.
No, the problem is still that you would have to change all importers to mandatory use dma_buf_begin/end.
But going a step back caching the mapping is irrelevant for coherency. Even if you don't cache the mapping you don't get coherency.
You actually do - at least with udmabuf, as in that case dma_buf_map_attachment() / dma_buf_unmap_attachment() will handle cache coherency when the SGs are mapped/unmapped.
Well I just double checked the source in 6.7.1 and I can't see udmabuf doing anything for cache coherency in map/unmap.
All it does is calling dma_map_sgtable() and dma_unmap_sgtable() to create and destroy the SG table and those are not supposed to sync anything to the CPU cache.
In other words drivers usually use DMA_ATTR_SKIP_CPU_SYNC here, it's just that this is missing from udmabuf.
The problem was then that dma_buf_unmap_attachment cannot be called before the dma_fence is signaled, and calling it after is already too late (because the fence would be signaled before the data is sync'd).
Well what sync are you talking about? CPU sync? In DMA-buf that is handled differently.
For importers it's mandatory that they can be coherent with the exporter. That usually means they can snoop the CPU cache if the exporter can snoop the CPU cache.
For exporters you can implement the begin/end CPU access functions which allows you to implement something even if your exporting device can't snoop the CPU cache.
Daniel / Sima suggested then that I cache the mapping and add new functions to ensure cache coherency, which is what these patches are about.
Yeah, I've now catched up on the latest mail. Sorry I haven't seen that before.
In other words exporters are not require to call sync_to_cpu or sync_to_device when you create a mapping.
What exactly is your use case here? And why does coherency matters?
My use-case is, I create DMABUFs with udmabuf, that I attach to USB/functionfs with the interface introduced by this patchset. I attach them to IIO with a similar interface (being upstreamed in parallel), and transfer data from USB to IIO and vice-versa in a zero-copy fashion.
This works perfectly fine as long as the USB and IIO hardware are coherent between themselves, which is the case on most of our boards. However I do have a board (with a Xilinx Ultrascale SoC) where it is not the case, and cache flushes/sync are needed. So I was trying to rework these new interfaces to work on that system too.
Yeah, that sounds strongly like one of the use cases we have rejected so far.
If this really is a no-no, then I am fine with the assumption that devices sharing a DMABUF must be coherent between themselves; but that's something that should probably be enforced rather than assumed.
(and I *think* there is a way to force coherency in the Ultrascale's interconnect - we're investigating it)
What you can do is that instead of using udmabuf or dma-heaps is that the device which can't provide coherency act as exporters of the buffers.
The exporter is allowed to call sync_for_cpu/sync_for_device on it's own buffers and also gets begin/end CPU access notfications. So you can then handle coherency between the exporter and the CPU.
If you really don't have coherency between devices then that would be a really new use case and we would need much more agreement on how to do this.
Regards, Christian.
Cheers, -Paul
At the very least, is there a way to check that "the data can be accessed coherently by the involved devices"? So that my importer can EPERM if there is no coherency vs. a device that's already attached.
Yeah, there is functionality for this in the DMA subsystem. I've once created prototype patches for enforcing the same coherency approach between importer and exporter, but we never got around to upstream them.
Cheers, -Paul
That in turn is pretty much un-testable unless you have every possible importer around while testing the exporter.
Regards, Christian.
Regards, Christian.
Cheers, -Paul
> Signed-off-by: Paul Cercueilpaul@crapouillou.net > > --- > v5: New patch > --- > drivers/dma-buf/dma-buf.c | 66 > +++++++++++++++++++++++++++++++++++++++ > include/linux/dma-buf.h | 37 ++++++++++++++++++++++ > 2 files changed, 103 insertions(+) > > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma- > buf/dma- > buf.c > index 8fe5aa67b167..a8bab6c18fcd 100644 > --- a/drivers/dma-buf/dma-buf.c > +++ b/drivers/dma-buf/dma-buf.c > @@ -830,6 +830,8 @@ static struct sg_table * > __map_dma_buf(struct > dma_buf_attachment *attach, > * - dma_buf_mmap() > * - dma_buf_begin_cpu_access() > * - dma_buf_end_cpu_access() > + * - dma_buf_begin_access() > + * - dma_buf_end_access() > * - dma_buf_map_attachment_unlocked() > * - dma_buf_unmap_attachment_unlocked() > * - dma_buf_vmap_unlocked() > @@ -1602,6 +1604,70 @@ void dma_buf_vunmap_unlocked(struct > dma_buf > *dmabuf, struct iosys_map *map) > } > EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap_unlocked, DMA_BUF); > > +/** > + * @dma_buf_begin_access - Call before any hardware access > from/to > the DMABUF > + * @attach: [in] attachment used for hardware > access > + * @sg_table: [in] scatterlist used for the DMA > transfer > + * @direction: [in] direction of DMA transfer > + */ > +int dma_buf_begin_access(struct dma_buf_attachment > *attach, > + struct sg_table *sgt, enum > dma_data_direction dir) > +{ > + struct dma_buf *dmabuf; > + bool cookie; > + int ret; > + > + if (WARN_ON(!attach)) > + return -EINVAL; > + > + dmabuf = attach->dmabuf; > + > + if (!dmabuf->ops->begin_access) > + return 0; > + > + cookie = dma_fence_begin_signalling(); > + ret = dmabuf->ops->begin_access(attach, sgt, dir); > + dma_fence_end_signalling(cookie); > + > + if (WARN_ON_ONCE(ret)) > + return ret; > + > + return 0; > +} > +EXPORT_SYMBOL_NS_GPL(dma_buf_begin_access, DMA_BUF); > + > +/** > + * @dma_buf_end_access - Call after any hardware access > from/to > the DMABUF > + * @attach: [in] attachment used for hardware > access > + * @sg_table: [in] scatterlist used for the DMA > transfer > + * @direction: [in] direction of DMA transfer > + */ > +int dma_buf_end_access(struct dma_buf_attachment *attach, > + struct sg_table *sgt, enum > dma_data_direction dir) > +{ > + struct dma_buf *dmabuf; > + bool cookie; > + int ret; > + > + if (WARN_ON(!attach)) > + return -EINVAL; > + > + dmabuf = attach->dmabuf; > + > + if (!dmabuf->ops->end_access) > + return 0; > + > + cookie = dma_fence_begin_signalling(); > + ret = dmabuf->ops->end_access(attach, sgt, dir); > + dma_fence_end_signalling(cookie); > + > + if (WARN_ON_ONCE(ret)) > + return ret; > + > + return 0; > +} > +EXPORT_SYMBOL_NS_GPL(dma_buf_end_access, DMA_BUF); > + > #ifdef CONFIG_DEBUG_FS > static int dma_buf_debug_show(struct seq_file *s, void > *unused) > { > diff --git a/include/linux/dma-buf.h b/include/linux/dma- > buf.h > index 8ff4add71f88..8ba612c7cc16 100644 > --- a/include/linux/dma-buf.h > +++ b/include/linux/dma-buf.h > @@ -246,6 +246,38 @@ struct dma_buf_ops { > */ > int (*end_cpu_access)(struct dma_buf *, enum > dma_data_direction); > > + /** > + * @begin_access: > + * > + * This is called from dma_buf_begin_access() when > a > device driver > + * wants to access the data of the DMABUF. The > exporter > can use this > + * to flush/sync the caches if needed. > + * > + * This callback is optional. > + * > + * Returns: > + * > + * 0 on success or a negative error code on > failure. > + */ > + int (*begin_access)(struct dma_buf_attachment *, > struct > sg_table *, > + enum dma_data_direction); > + > + /** > + * @end_access: > + * > + * This is called from dma_buf_end_access() when a > device > driver is > + * done accessing the data of the DMABUF. The > exporter > can > use this > + * to flush/sync the caches if needed. > + * > + * This callback is optional. > + * > + * Returns: > + * > + * 0 on success or a negative error code on > failure. > + */ > + int (*end_access)(struct dma_buf_attachment *, > struct > sg_table *, > + enum dma_data_direction); > + > /** > * @mmap: > * > @@ -606,6 +638,11 @@ void dma_buf_detach(struct dma_buf > *dmabuf, > int dma_buf_pin(struct dma_buf_attachment *attach); > void dma_buf_unpin(struct dma_buf_attachment *attach); > > +int dma_buf_begin_access(struct dma_buf_attachment > *attach, > + struct sg_table *sgt, enum > dma_data_direction dir); > +int dma_buf_end_access(struct dma_buf_attachment *attach, > + struct sg_table *sgt, enum > dma_data_direction dir); > + > struct dma_buf *dma_buf_export(const struct > dma_buf_export_info > *exp_info); > > int dma_buf_fd(struct dma_buf *dmabuf, int flags);