On Wed, Mar 21, 2012 at 03:44:38PM -0700, Rebecca Schultz Zavin wrote:
> Couldn't this just as easily be handled by not having those mappings be mapped cached or write combine to userspace? They'd be coherent, just slow. I'm not sure we can actually say that all these CPU accesses are necessarily slow-path operations anyway. On android we do sometimes decide to software render things to eliminate the overhead of maintaining a hardware context for context switching the gpu. If you want cached or writecombine mappings you'd have to manage them explicitly. If you can't manage them explicitly you have to settle for slow. That seems reasonable to me.
Well the usual approach is writecombine, which doesn't need any explicit cache management.
> As far as I can tell with explicit operations I have to invalidate before touching from mmap and clean after. With these implicit ones, I still have to invalidate and clean, but now I also have to remap them before and after. I don't know what the performance hit of this remapping step is, but I'd like to know, if you have any insight.
We have a few inefficiencies in the drm/i915 fault path which make it slow, but generally pagefault performance should be rather quick (at least quicker than flushing the actual data). At least if your fault handler is somewhat clever and prefaults a few more pages in both x and y direction.
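To illustrate what prefaulting in both x and y direction could mean, here is a minimal sketch in plain userspace C. It assumes the buffer is a simple grid of pages; struct page_grid, prefault_window and the radius parameter are made up for this example and are not taken from i915 or any other driver. A real fault handler would insert the PTEs for the returned pages instead of just listing them.

```c
#include <stddef.h>

/* Hypothetical description of a 2D buffer laid out as a grid of pages.
 * Both dimensions are assumed to be non-zero. */
struct page_grid {
    size_t pages_x; /* pages per row */
    size_t pages_y; /* number of rows */
};

/*
 * Collect the indices of all pages in a (2*radius+1) x (2*radius+1) window
 * around the faulting page, clamped to the buffer edges. Indices are written
 * to out in row-major order. Returns the number of indices written, which is
 * never more than max_out.
 */
size_t prefault_window(const struct page_grid *g, size_t fault_page,
                       size_t radius, size_t *out, size_t max_out)
{
    size_t fx = fault_page % g->pages_x; /* column of the faulting page */
    size_t fy = fault_page / g->pages_x; /* row of the faulting page */
    size_t x0 = fx > radius ? fx - radius : 0;
    size_t y0 = fy > radius ? fy - radius : 0;
    size_t x1 = fx + radius < g->pages_x ? fx + radius : g->pages_x - 1;
    size_t y1 = fy + radius < g->pages_y ? fy + radius : g->pages_y - 1;
    size_t n = 0;

    for (size_t y = y0; y <= y1; y++)
        for (size_t x = x0; x <= x1; x++)
            if (n < max_out)
                out[n++] = y * g->pages_x + x;

    return n;
}
```

With a radius of 1, a single fault in the middle of the buffer maps nine pages at once, so a CPU access pattern that walks a small rectangle takes far fewer faults than one fault per page.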
But if that's too slow, I'm open to extending dma-buf later on to support more explicit cache management for userspace mmaps (like I've explained below in my previous mail). I just think we should have real benchmark results (and hence some real users of dma-buf) before we add this complexity. Atm I have no idea whether it's worth it. After all, as soon as we expect a lot of rendering/processing, some special dsp/gpu/whatever is likely to take over.
Imo the best way to enable cached mappings is to later on extend dma-buf (as soon as we have some actual exporters/importers in the mainline kernel) with an optional cached_mmap interface which requires explicit prepare_mmap_access/finish_mmap_access calls. Then if both exporter and importer support this, it could get used - otherwise the dma-buf layer could transparently fall back to coherent mappings.
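Roughly, the negotiation could look like the sketch below. This is plain C for illustration only: struct exporter_ops, enum mmap_mode, choose_mmap_mode and noop_access are hypothetical names, not existing dma-buf code, and only the two hook names come from the proposal above.

```c
#include <stdbool.h>
#include <stddef.h>

enum mmap_mode { MMAP_COHERENT, MMAP_CACHED };

/* Hypothetical exporter hooks; this is not the real struct dma_buf_ops. */
struct exporter_ops {
    int (*prepare_mmap_access)(void *buf); /* e.g. invalidate CPU caches */
    int (*finish_mmap_access)(void *buf);  /* e.g. clean CPU caches */
};

/* Stand-in hook for an exporter that has nothing to do in this example. */
static int noop_access(void *buf)
{
    (void)buf;
    return 0;
}

/* Cached mmap is only usable if the exporter provides both hooks. */
static bool exporter_supports_cached(const struct exporter_ops *ops)
{
    return ops && ops->prepare_mmap_access && ops->finish_mmap_access;
}

/*
 * Pick the mapping mode: cached only if the importer asks for it and the
 * exporter supports it, otherwise fall back to a coherent mapping.
 */
enum mmap_mode choose_mmap_mode(const struct exporter_ops *ops,
                                bool importer_wants_cached)
{
    if (importer_wants_cached && exporter_supports_cached(ops))
        return MMAP_CACHED;
    return MMAP_COHERENT;
}
```

The point of doing it this way is that an importer which never calls the prepare/finish hooks still gets a correct (if slower) coherent mapping, so nobody has to implement the cached path up front.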
Yours, Daniel