On Mon, Feb 15, 2021 at 09:58:08AM +0100, Christian König wrote:
Hi guys,
we are currently working on FreeSync and direct scanout from system memory on AMD APUs in A+A laptops.
One problem we stumbled over is that our display hardware needs to scan out from uncached system memory, and we currently don't have a way to communicate that through DMA-buf.
For our specific use case at hand we are going to implement something driver specific, but the question is should we have something more generic for this?
After all the system memory access pattern is a PCIe extension and as such something generic.
Yes, it's a problem, and it's a complete mess. So the de facto rules are:
1. importer has to do coherent transactions to snoop cpu caches.
This way both modes work:
- if the buffer is cached, we're fine
- if the buffer is not cached, but the exporter has flushed all the caches, you're mostly just wasting time on inefficient bus cycles. Also, this doesn't work if your CPU doesn't just drop clean cachelines. I think some ARM CPUs are prone to keeping them, no idea about AMD. Intel is guaranteed to drop them, which is why the uncached scanout for integrated Intel + discrete AMD "works".
2. The exporter picks the mode freely, and can even change it at runtime (i915 does this, since we don't have an "allocate for scanout" flag wired through consistently). This doesn't work on ARM; there the rule is "all devices in the same system must use the same mode".
3. This should be solved at the dma-buf layer, but the dma-api refuses to tell you this information (at least for dma_alloc_coherent). And I'm not going to deal with the bikeshed that would bring into my inbox. Or at least there's always been screaming that drivers shouldn't peek behind the abstraction.
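One way to read rule 1 is as a small truth table. The sketch below is a toy userspace C model of that reading, not kernel code: `classify()`, `enum share_result` and the three flags are made-up names, and it assumes the exporter has already flushed CPU caches whenever the buffer is uncached. The two cases with a non-snooping importer are not spelled out in the rules above; they are inferred from why rule 1 exists.

```c
/* Toy userspace model of rule 1; not kernel code, all names are made up. */
#include <assert.h>
#include <stdbool.h>

enum share_result {
	SHARE_OK,          /* importer sees current data */
	SHARE_OK_WASTEFUL, /* correct, but the snoop cycles buy nothing */
	SHARE_STALE_RISK,  /* importer may read stale data */
};

/*
 * Assumption: an exporter that switches a buffer to uncached has already
 * flushed the CPU caches for it, as in the second bullet of rule 1.
 */
enum share_result classify(bool buffer_cached, bool importer_snoops,
			   bool cpu_drops_clean_lines)
{
	if (buffer_cached)
		/* Inferred: without snooping, dirty CPU cachelines are missed. */
		return importer_snoops ? SHARE_OK : SHARE_STALE_RISK;

	if (!importer_snoops)
		/* Inferred: uncached and flushed, so memory is authoritative. */
		return SHARE_OK;

	/* Snooping an uncached buffer: leftover clean lines must not win. */
	return cpu_drops_clean_lines ? SHARE_OK_WASTEFUL : SHARE_STALE_RISK;
}

/* The three cases that rule 1 spells out explicitly. */
void rule1_examples(void)
{
	assert(classify(true, true, false) == SHARE_OK);
	assert(classify(false, true, true) == SHARE_OK_WASTEFUL);
	assert(classify(false, true, false) == SHARE_STALE_RISK);
}
```

In this model the integrated Intel + discrete AMD combination corresponds to `classify(false, true, true)`: it works, but only because clean cachelines are guaranteed to be dropped.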
So I think if AMD also guarantees to drop clean cachelines, just do the same thing we do right now for Intel integrated + discrete AMD, but in reverse. It's fragile, but it does work.
What we imo shouldn't do is driver-private interfaces here; that's just going to make the problem worse long term. Or at least not driver-private interfaces that span across drivers behind dma-buf, because imo this is really a problem that dma-buf should solve.
If you do want to solve this at the dma-buf level I can try and point you at the respective i915 and amdgpu code that makes the magic work - I've had to fix it a few times in the past. I'm not sure whether we'd need to pass the dynamic nature through though, i.e. whether we want to be able to scan out imported dma-buf and hence request they be used in uncached mode.
-Daniel