On 24.01.24 at 11:58, Paul Cercueil wrote:
[SNIP]
The problem was then that dma_buf_unmap_attachment cannot be called
before the dma_fence is signaled, and calling it after is already too
late (because the fence would be signaled before the data is sync'd).
Well what sync are you talking about? CPU sync? In DMA-buf that is
handled differently.
For importers it's mandatory that they can be coherent with the
exporter. That usually means they can snoop the CPU cache if the
exporter can snoop the CPU cache.
I seem to have such a system where one device can snoop the CPU cache
and the other cannot. Therefore if I want to support it properly, I do
need cache flush/sync. I don't actually try to access the data using
the CPU (and when I do, I call the sync start/end ioctls).
Usually that isn't a problem as long as you don't access the data
with the CPU.
[SNIP]
(and I *think* there is a way to force coherency in the Ultrascale's
interconnect - we're investigating it)
What you can do, instead of using udmabuf or dma-heaps, is have the
device which can't provide coherency act as the exporter of the
buffers.
The exporter is allowed to call sync_for_cpu/sync_for_device on its
own buffers and also gets begin/end CPU access notifications. So you
can then handle coherency between the exporter and the CPU.
But again that would only work if the importers would call
begin_cpu_access() / end_cpu_access(), which they don't, because they
don't actually access the data using the CPU.
Wow, that is a completely new use case then.
Neither DMA-buf nor the DMA subsystem in Linux actually supports
this as far as I can see.
Unless you mean that the exporter can call sync_for_cpu/sync_for_device
before/after every single DMA transfer so that the data appears
coherent to the importers, without them having to call
begin_cpu_access() / end_cpu_access().
Yeah, I mean the importers don't have to call begin_cpu_access() /
end_cpu_access() if they don't do CPU access :)
What you can still do as exporter is to call sync_for_device() and
sync_for_cpu() before and after each operation on your non-coherent
device. Paired with the fence signaling that should still work fine
then.
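To make that ordering concrete, here is a rough user-space model of it.
This is only a sketch: every name in it (op_log, mock_fence,
exporter_run_operation) is hypothetical and none of it is a kernel API.
The comments note which real call each step stands in for. The point is
just that the fence is signaled only after both syncs have run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space model only: all names here are hypothetical, not kernel APIs. */

enum step { SYNC_FOR_DEVICE, DMA_TRANSFER, SYNC_FOR_CPU, FENCE_SIGNALED };

#define MAX_STEPS 8

/* Records the order in which the exporter performs each step. */
struct op_log {
    enum step steps[MAX_STEPS];
    size_t count;
};

/* Stand-in for a dma_fence: importers wait until 'signaled' is true. */
struct mock_fence {
    bool signaled;
};

static void record(struct op_log *log, enum step s)
{
    assert(log->count < MAX_STEPS);
    log->steps[log->count++] = s;
}

/* Exporter-side wrapper around one operation on the non-coherent device. */
static void exporter_run_operation(struct op_log *log, struct mock_fence *fence)
{
    /* Stands in for sync_for_device(): make prior writes visible to the device. */
    record(log, SYNC_FOR_DEVICE);

    /* The non-coherent device performs its DMA transfer on the buffer. */
    record(log, DMA_TRANSFER);

    /* Stands in for sync_for_cpu(): make the device's writes visible again. */
    record(log, SYNC_FOR_CPU);

    /* Only now is the fence signaled, so waiters see synced data. */
    fence->signaled = true;
    record(log, FENCE_SIGNALED);
}
```

In a real exporter the two sync steps would be the DMA API sync calls on
the exporter's own mapping, and the last step would be the fence
signaling for that operation.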
But taking a step back, this use case is not something even the low
level DMA subsystem supports. That sync_for_cpu() does the right
thing is a coincidence, not proper engineering.
What you need is a sync_device_to_device() which does the
appropriate actions depending on which devices are involved.
In which case, this would still multiply the complexity; my
USB-functionfs interface here (and the IIO interface in the separate
patchset) are not device-specific, so I'd rather keep them as importers.
If you really don't have coherency between devices then that would
be a really new use case and we would need much more agreement on how
to do this.
[snip]
Agreed. Designing a good generic solution would be better.
With that said...
Let's keep it out of this USB-functionfs interface for now. The
interface does work perfectly fine on platforms that don't have
coherency problems. The coherency issue in itself really is a
tangential issue.
Yeah, completely agree.
So I will send a v6 where I don't try to force the cache coherency -
and instead assume that the attached devices are coherent between
themselves.
But it would be even better to have a way to detect non-coherency and
return an error on attach.
Take a look into the DMA subsystem. I'm pretty sure we already have
something like this in there.
If nothing else helps, you could check whether the coherent memory
access mask is non-zero or something like that.
Regards,
Christian.
Cheers,
-Paul