On Thu, Apr 26, 2018 at 11:24 AM, Christoph Hellwig hch@infradead.org wrote:
On Thu, Apr 26, 2018 at 11:20:44AM +0200, Daniel Vetter wrote:
The above is already what we're implementing in i915, at least conceptually (it all boils down to clflush instructions because those both invalidate and flush).
The clwb instruction, which just writes back dirty cache lines without invalidating them, might be very interesting for the x86 non-coherent dma case. A lot of architectures use their equivalent to prepare for to-device transfers.
IIRC it didn't help much for i915 use-cases. Either data gets streamed between cpu and gpu, and then keeping the clean cacheline around doesn't buy you anything. Or we need to flush because the gpu really wants to use non-snooped transactions (faster, lower latency, and less power required for display because you can shut down the caches), and then there's also no benefit to keeping the cacheline around (no one will ever need it again).
I think clwb is more for persistent memory and stuff like that, not so much for gpus.
-Daniel
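
For readers following along, here is a minimal userspace sketch of what "it all boils down to clflush instructions" means for a buffer: walk every cache line that overlaps the range and apply a per-line operation. This is not the i915 code. The names for_each_cache_line, count_line and clflush_line are hypothetical, the 64-byte line size is an assumption (real code should query the CPU), and a real driver would also need memory barriers around the loop. The clflush wrapper uses the _mm_clflush intrinsic from <emmintrin.h>; a clwb variant would swap in _mm_clwb from <immintrin.h>, which needs compiler and CPU support for clwb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; real code should query the CPU instead. */
#define CACHE_LINE 64u

/* Hypothetical callback type: one call per cache line. */
typedef void (*line_op_fn)(const void *line, void *ctx);

/*
 * Apply op to every cache line overlapping [addr, addr + len).
 * Returns the number of lines visited.
 */
static inline size_t for_each_cache_line(const void *addr, size_t len,
                                         line_op_fn op, void *ctx)
{
    assert(op != NULL);
    if (len == 0)
        return 0;

    /* Round the start down to a line boundary. */
    uintptr_t start = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)addr + len; /* one past the last byte */
    size_t n = 0;

    for (uintptr_t p = start; p < end; p += CACHE_LINE) {
        op((const void *)p, ctx);
        n++;
    }
    return n;
}

/* Portable stand-in op: only counts lines, touches no cache state. */
static inline void count_line(const void *line, void *ctx)
{
    (void)line;
    (*(size_t *)ctx)++;
}

#if defined(__SSE2__)
#include <emmintrin.h>
/* x86 only: clflush writes back a dirty line and invalidates it. */
static inline void clflush_line(const void *line, void *ctx)
{
    (void)ctx;
    _mm_clflush(line);
}
#endif
```

On x86, calling for_each_cache_line(buf, len, clflush_line, NULL) would flush and invalidate every line backing buf. With a clwb-based op the lines would stay in the cache in a clean state, which is the difference being discussed above.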