On Wed, Apr 25, 2018 at 11:54:43PM +0100, Russell King - ARM Linux wrote:
if the memory was previously dirty (iow, CPU has written), you need to flush the dirty cache lines _before_ the GPU writes happen, but you don't know whether the CPU has speculatively prefetched, so you need to flush any prefetched cache lines before reading from the cacheable memory _after_ the GPU has finished writing.
Also note that "flush" there can be "clean the cache", "clean and invalidate the cache" or "invalidate the cache" as appropriate - some CPUs are able to perform those three operations, and the appropriate one depends on not only where in the above sequence it's being used, but also on what the operations are.
Agreed on all these counts.
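To make the ordering above concrete, here is a small userspace toy model, not kernel code. Every name in it (toy_mem, cache_clean, cache_invalidate, gpu_write, hw_evict_dirty, run_sequence) is made up for illustration, and it assumes a simple write-back, write-allocate cache in which the hardware may write back a dirty line at any time while clean lines may linger.

```c
#include <stdbool.h>
#include <string.h>

#define LINE_SIZE 4   /* bytes per cache line in this toy model */
#define NUM_LINES 4

/* Toy model only: RAM plus a write-back CPU cache in front of it. */
struct toy_mem {
    unsigned char ram[NUM_LINES * LINE_SIZE];
    unsigned char cache[NUM_LINES * LINE_SIZE];
    bool valid[NUM_LINES];
    bool dirty[NUM_LINES];
};

/* CPU read: a miss fills the whole line from RAM. */
static unsigned char cpu_read(struct toy_mem *m, int addr)
{
    int line = addr / LINE_SIZE;

    if (!m->valid[line]) {
        memcpy(&m->cache[line * LINE_SIZE], &m->ram[line * LINE_SIZE], LINE_SIZE);
        m->valid[line] = true;
        m->dirty[line] = false;
    }
    return m->cache[addr];
}

/* CPU write: allocate the line, then mark it dirty. */
static void cpu_write(struct toy_mem *m, int addr, unsigned char v)
{
    (void)cpu_read(m, addr);
    m->cache[addr] = v;
    m->dirty[addr / LINE_SIZE] = true;
}

/* "Clean": write dirty lines back to RAM; the lines stay valid. */
static void cache_clean(struct toy_mem *m)
{
    for (int i = 0; i < NUM_LINES; i++) {
        if (m->valid[i] && m->dirty[i]) {
            memcpy(&m->ram[i * LINE_SIZE], &m->cache[i * LINE_SIZE], LINE_SIZE);
            m->dirty[i] = false;
        }
    }
}

/* "Invalidate": drop all lines without writing anything back. */
static void cache_invalidate(struct toy_mem *m)
{
    for (int i = 0; i < NUM_LINES; i++) {
        m->valid[i] = false;
        m->dirty[i] = false;
    }
}

/* The device writes straight to RAM and never sees the CPU cache. */
static void gpu_write(struct toy_mem *m, int addr, unsigned char v)
{
    m->ram[addr] = v;
}

/* Worst case: hardware evicts every dirty line, clean lines linger. */
static void hw_evict_dirty(struct toy_mem *m)
{
    for (int i = 0; i < NUM_LINES; i++) {
        if (m->valid[i] && m->dirty[i]) {
            memcpy(&m->ram[i * LINE_SIZE], &m->cache[i * LINE_SIZE], LINE_SIZE);
            m->valid[i] = false;
            m->dirty[i] = false;
        }
    }
}

/* Returns what the CPU finally reads at address 0. */
unsigned char run_sequence(bool clean_before, bool invalidate_after)
{
    struct toy_mem m;

    memset(&m, 0, sizeof(m));
    cpu_write(&m, 0, 0x11);      /* CPU dirties the line */
    if (clean_before)
        cache_clean(&m);         /* before the GPU writes */
    gpu_write(&m, 0, 0x22);      /* GPU writes behind the cache */
    hw_evict_dirty(&m);          /* may clobber the GPU's data */
    if (invalidate_after)
        cache_invalidate(&m);    /* after the GPU has finished */
    return cpu_read(&m, 0);
}
```

In this model only run_sequence(true, true) returns the GPU's 0x22. Skipping the invalidate leaves the CPU reading its stale 0x11 from the lingering line, and skipping the clean lets the dirty line be written back on top of the GPU's data.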
If we can agree a set of interfaces that allows _proper_ use of these facilities, one which can be used appropriately, then there shouldn't be a problem. The DMA API does that via its ideas about who owns a particular buffer (because of the above problem) and that's something which would need to be carried over to such a cache flushing API (it should be pretty obvious that having a GPU read or write memory while the cache for that memory is being cleaned will lead to unexpected results.)
I've been trying to come up with such an interface, for now only for internal use in a generic set of noncoherent ops. The API is basically a variant of the existing dma_sync_single_for_device/cpu calls:
http://git.infradead.org/users/hch/misc.git/commitdiff/044dae5f94509288f4655...
Review welcome!
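For readers unfamiliar with the ownership idea, the following is a toy userspace sketch of it. It is not the interface in the commit above and not the kernel DMA API: toy_dma_buf, toy_sync_for_device, toy_sync_for_cpu, toy_cpu_write and toy_device_write are hypothetical names, and the cache maintenance itself is reduced to comments. The only point is that each sync call hands the buffer to the other side, and an access by the non-owner is rejected.

```c
#include <stddef.h>

enum toy_owner { OWNER_CPU, OWNER_DEVICE };

/* Hypothetical buffer with an explicit owner, for illustration only. */
struct toy_dma_buf {
    enum toy_owner owner;
    unsigned char data[16];
};

/* Hand the buffer to the device; dirty lines would be cleaned here. */
int toy_sync_for_device(struct toy_dma_buf *buf)
{
    if (buf->owner != OWNER_CPU)
        return -1;               /* the CPU does not own it */
    buf->owner = OWNER_DEVICE;
    return 0;
}

/* Hand the buffer back to the CPU; stale lines would be invalidated here. */
int toy_sync_for_cpu(struct toy_dma_buf *buf)
{
    if (buf->owner != OWNER_DEVICE)
        return -1;               /* the device does not own it */
    buf->owner = OWNER_CPU;
    return 0;
}

/* CPU access is only allowed while the CPU owns the buffer. */
int toy_cpu_write(struct toy_dma_buf *buf, size_t idx, unsigned char v)
{
    if (buf->owner != OWNER_CPU || idx >= sizeof(buf->data))
        return -1;
    buf->data[idx] = v;
    return 0;
}

/* Device access is only allowed while the device owns the buffer. */
int toy_device_write(struct toy_dma_buf *buf, size_t idx, unsigned char v)
{
    if (buf->owner != OWNER_DEVICE || idx >= sizeof(buf->data))
        return -1;
    buf->data[idx] = v;
    return 0;
}
```

A CPU write after toy_sync_for_device() fails until toy_sync_for_cpu() has been called, which is the property a userspace-visible flushing API would also have to enforce somehow.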
The next issue, which I've brought up before, is exposing cache flushing to userspace on architectures where it isn't already exposed. As has been shown by Google Project Zero, this risks exposing those architectures to Spectre and Meltdown exploits where they weren't at such a risk before. (I've pretty much shown here that you _do_ need to control which cache lines get flushed to make these exploits work, and flushing the cache by reading lots of data in lieu of having the ability to explicitly flush bits of cache makes it very difficult to impossible for them to work.)
Extending DMA coherence to userspace is going to be the next major nightmare indeed. I'm not sure how much of that is actually still going on in the graphics world, but we'll need a coherent (pun intended) plan for how to deal with it.