Hi Christian,
(Your email software is configured for HTML btw)
Le mardi 30 janvier 2024 à 10:23 +0100, Christian König a écrit :
Am 30.01.24 um 10:01 schrieb Daniel Vetter:
On Fri, Jan 26, 2024 at 05:42:50PM +0100, Christian König wrote:
[SNIP] Well I think we should have some solution, but I'm not sure if it should be part of DMA-buf.
Essentially a DMA-buf exports the buffers as he uses it and the importer (or the DMA-buf subsystem) is then the one who says ok I can use this or I can't use this or I need to call extra functions to use this or whatever.
It's not the job of the exporter to provide the coherency for the importer, cause otherwise we would have a lot of code in the exporter which can only be tested when you have the right importer around. And I strongly think that this is a no-go for having a reliable solution.
The trouble is, that if you have other memory than stuff allocated by the dma-api or mapped using the dma-api, then by necessity the exporter has to deal with this.
Yes, I was thinking about that as well.
Which is the exact same reason we also force the exporters to deal with the cpu cache flushing - you're argument that it's not great to replicate this everywhere holds there equally.
And I'm not really happy with that either.
The other thing is that right now the exporter is the only one who actually knows what kind of dma coherency rules apply for a certain piece of memory. E.g. on i915-gem even if it's dma_map_sg mapped the underlying i915-gem buffer might be non-coherent, and i915-gem makes it all work by doing the appropriate amount of clflush.
Yeah, exactly that's the reason why I think that this stuff doesn't belong into exporters/drivers. Looking at what kind of hacks and workarounds we have in both amdgpu as well as i915 it's pretty clear that we need to improve this design somehow.
Similar funky things happen in other cases.
So unless we add an interface which allows importers to figure out how much flushing is needed, currently the exporter is the only one who knows (because it can inspect the struct device at dma_buf_attach time).
We could flip this around, but it would be a rather serious depature from the dma-buf design approach thus far.
Well clients already give the DMA-direction to exporters when creating the mapping and get an appropriate sg_table in return. All we need to do is getting the information what flushing is needed into the object returned here so that the DMA API can work with it. Christoph Hellwig pretty much nailed it when he said that the problem with the sg_table is that it mixes input and output parameters of the DMA-API. I would extend that and say that we need a mapping object the DMA- API can work with so that it can know what needs to be done when devices request that data is made coherent between them or the CPU.
That's why I think the approach of having DMA-buf callbacks is most likely the wrong thing to do.
What should happen instead is that the DMA subsystem provides functionality which to devices which don't support coherency through it's connection to say I want to access this data, please make sure to flush the appropriate catches. But that's just a very very rough design idea.
This will become more with CXL at the horizon I think.
Yeah CXL will make this all even more fun, but we are firmly there already with devices deciding per-buffer (or sometimes even per-access with intel's MOCS stuff) what coherency mode to use for a buffer.
Also arm soc generally have both coherent and non-coherent device interconnects, and I think some devices can switch with runtime flags too which mode they use for a specific transition.
CXL just extends this to pcie devices.
So the mess is here, how do we deal with it?
I would say we start with the DMA-API by getting away from sg_tables to something cleaner and state oriented.
FYI I am already adding a 'dma_vec' object in my IIO DMABUF patchset, which is just a dead simple
struct dma_vec { dma_addr_t addr; size_t len; };
(The rationale for introducing it in the IIO DMABUF patchset was that the "scatterlist" wouldn't allow me to change the transfer size.)
So I believe a new "sg_table"-like could just be an array of struct dma_vec + flags.
Cheers, -Paul
My take is that the opt-in callback addition is far from great, but it's in line with how we extended dma-buf the past decade plus too. So unless someone's volunteering to pour some serious time into re- engineering this all (including testing all the different device/driver<-
device/driver
interactions) I think there's only really one other option: To not support these cases at all. And I don't really like that, because it means people will hack together something even worse in their drivers.
By adding it to dma-buf it'll stare us in our faces at least :-/
Yeah, it's the way of the least resistance. But with CXL at the horizon and more and more drivers using it I think it's predictable that this will sooner or later blow up. Cheers, Christian.
Cheers, Sima
Regards, Christian.
Cheers, Sima
_______________________________________________ Linaro-mm-sig mailing list -- linaro-mm-sig@lists.linaro.org To unsubscribe send an email to linaro-mm-sig- leave@lists.linaro.org