On Thu, Apr 26, 2018 at 11:20:44AM +0200, Daniel Vetter wrote:
> The above is already what we're implementing in i915, at least conceptually (it all boils down to clflush instructions because those both invalidate and flush).
The clwb instruction, which just writes back dirty cache lines without invalidating them, might be very interesting for the x86 non-coherent DMA case. A lot of architectures use their equivalent to prepare for to-device transfers.
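To make the clflush side concrete, here is a rough userspace sketch of flushing a buffer one cache line at a time. It is an illustration only, not the i915 code: flush_range() and CACHE_LINE are names made up for this mail, the 64-byte line size is an assumption, and on builds without SSE2 the loop only counts lines and flushes nothing.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>          /* _mm_clflush(), _mm_mfence() */
#define HAVE_CLFLUSH 1
#endif

#define CACHE_LINE 64           /* assumed line size, not queried from CPUID */

/*
 * Hypothetical helper: write back and invalidate every cache line that
 * overlaps [addr, addr + len). Returns the number of lines visited.
 */
static size_t flush_range(const void *addr, size_t len)
{
    uintptr_t start, end, p;
    size_t lines = 0;

    if (len == 0)
        return 0;

    /* Round the start down to a line boundary so partial lines are covered. */
    start = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    end = (uintptr_t)addr + len;

#ifdef HAVE_CLFLUSH
    _mm_mfence();               /* order earlier stores before the flushes */
#endif
    for (p = start; p < end; p += CACHE_LINE) {
#ifdef HAVE_CLFLUSH
        _mm_clflush((const void *)p);   /* flush + invalidate this line */
#endif
        lines++;
    }
#ifdef HAVE_CLFLUSH
    _mm_mfence();               /* make sure the flushes have completed */
#endif
    return lines;
}
```

A write-back-only variant for to-device transfers would presumably swap _mm_clflush() for _mm_clwb() from <immintrin.h>, which needs both CPU support and a compiler flag such as -mclwb, so I have left it out of the sketch.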
> One architectural guarantee we're exploiting is that prefetched (and hence non-dirty) cachelines will never get written back, but dropped instead.
And to make this work you'll need exactly this guarantee.