On Mon, Jun 13, 2011 at 11:54 AM, Jesse Barnes <jbarnes@virtuousgeek.org> wrote:
> Well only if things are really broken. sysfs exposes _wc resource files to allow userland drivers to map a given PCI BAR using write combining, if the underlying platform supports it.
Mmm, I hadn't spotted that; that is useful, at least as sample code. Doesn't do me any good directly, though; I'm not on a PCI device, I'm on a SoC. And what I need to do is to allocate normal memory through an uncacheable write-combining page table entry (with certainty that it is not aliased by a cacheable entry for the same physical memory), and use it for interchange of data (GPU assets, compressed video) with other on-chip cores. (Or with off-chip PCI devices which use DMA to transfer data to/from these buffers and then interrupt the CPU to notify it to rotate them.)
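For reference, the sysfs route mentioned above might look roughly like the sketch below. It assumes a PCI device whose BAR has a `resourceN_wc` file (the kernel only creates one for prefetchable BARs on platforms that support write combining); the helper names `wc_resource_path` and `map_bar_wc` are made up for illustration.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Build "/sys/bus/pci/devices/<bdf>/resource<bar>_wc" into buf.
 * Returns 0 on success, -1 if buf is too small. (Hypothetical helper.) */
static int wc_resource_path(char *buf, size_t len, const char *bdf, int bar)
{
    int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/resource%d_wc",
                     bdf, bar);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Map an entire BAR through its _wc resource file.
 * Returns the mapping and stores its size in *size_out, or returns NULL
 * if the file is missing, unreadable, or cannot be mapped. */
static void *map_bar_wc(const char *bdf, int bar, size_t *size_out)
{
    char path[256];
    struct stat st;
    void *p;
    int fd;

    if (wc_resource_path(path, sizeof path, bdf, bar) != 0)
        return NULL;

    fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;            /* no such device or no _wc file for this BAR */

    /* The sysfs resource file reports the BAR length as its size. */
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
             MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;

    *size_out = (size_t)st.st_size;
    return p;
}
```

A caller would pass the device address in domain:bus:device.function form, e.g. `map_bar_wc("0000:01:00.0", 0, &size)`, and `munmap` the result when done. Note that this maps device memory behind a BAR, not ordinary system RAM, which is why it does not cover the SoC case.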
What doesn't seem to be straightforward to do from userland is to allocate pages that are locked in physical memory and mapped for write-combining. The device driver shouldn't have to mediate their allocation; it should only need to map them to a physical address (or set up an IOMMU entry, I suppose) and pass that to the hardware that needs it. Typical userland code that could use such a mechanism would be the Qt/OpenGL back end (which needs to store decompressed images and other pre-rendered assets in GPU-ready buffers) and media pipelines.
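To make the gap concrete, here is a minimal sketch of what plain userland can do today, assuming a POSIX/Linux system: allocate anonymous pages and ask for them to stay resident with `mlock`. The names `pinned_buf`, `pinned_alloc`, and `pinned_free` are hypothetical. The pages come back with ordinary cacheable mappings, there is no flag here to request write-combining, and `mlock` only keeps pages resident; it does not by itself fix their physical address.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS under strict -std modes */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one resident buffer. */
struct pinned_buf {
    void  *addr;    /* start of the mapping, NULL when unused */
    size_t len;     /* length rounded up to a whole number of pages */
    int    locked;  /* 1 if mlock() succeeded, 0 otherwise */
};

/* Allocate at least len bytes of zeroed, page-aligned memory and try to
 * lock it into RAM. Returns 0 on success, -1 if the mapping failed.
 * A failed mlock() (e.g. RLIMIT_MEMLOCK too low) is reported via
 * b->locked rather than treated as fatal. */
static int pinned_alloc(struct pinned_buf *b, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t rounded = (len + page - 1) / page * page;
    void *p = mmap(NULL, rounded, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;

    b->addr = p;
    b->len = rounded;
    b->locked = (mlock(p, rounded) == 0);
    return 0;
}

/* Release the buffer; munmap() also drops any lock on the range. */
static void pinned_free(struct pinned_buf *b)
{
    if (b->addr != NULL)
        munmap(b->addr, b->len);
    b->addr = NULL;
    b->len = 0;
    b->locked = 0;
}
```

Everything beyond this (choosing the memory type, guaranteeing no cacheable alias, obtaining an address the hardware can use) currently has to come from a kernel driver, which is the part I would like to be generic.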
> Similarly, userland mappings of GEM objects through the GTT are supposed to be write combined, though I need to verify this (we've had trouble with it in the past).
Also a nice source of sample code; though, again, I don't want this to be driver-specific. I might want a stage in my media pipeline that uses the GPU to perform, say, lens distortion correction. I shouldn't have to go through contortions to use the same buffers from the GPU and the video capture device. The two devices are likely to have their own variants on scatter-gather DMA, with a circularly linked list of block descriptors with ownership bits and all that jazz; but the actual data buffers should be generic, and the userland pipeline setup code should just allocate them (presumably as contiguous regions in a write-combining hugepage) and feed them to the plumbing.
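As an illustration of the split I have in mind, the sketch below shows a generic descriptor ring with ownership bits that only ever refers to data buffers by address. Every name and field layout here (`struct desc`, `DESC_OWN_DEVICE`, `RING_SIZE`, and so on) is hypothetical; real devices define their own descriptor formats, and real code needs memory barriers and bus addresses obtained from the kernel.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE       4u      /* hypothetical; hardware dictates the real size */
#define DESC_OWN_DEVICE 0x1u    /* set: device owns the descriptor; clear: CPU */

/* One block descriptor. The layout is illustrative, not any real device's. */
struct desc {
    uint64_t buf_addr;  /* address of the data buffer as the device sees it */
    uint32_t len;       /* buffer length in bytes */
    uint32_t flags;     /* ownership bit lives here */
    uint32_t next;      /* index of the next descriptor (circular link) */
};

struct ring {
    struct desc d[RING_SIZE];
    uint32_t head;      /* next descriptor the CPU will hand to the device */
    uint32_t tail;      /* oldest descriptor not yet reclaimed by the CPU */
    uint32_t pending;   /* descriptors submitted but not yet reclaimed */
};

/* Link the descriptors into a circle, all owned by the CPU. */
static void ring_init(struct ring *r)
{
    for (uint32_t i = 0; i < RING_SIZE; i++) {
        r->d[i].buf_addr = 0;
        r->d[i].len = 0;
        r->d[i].flags = 0;
        r->d[i].next = (i + 1) % RING_SIZE;
    }
    r->head = r->tail = r->pending = 0;
}

/* Hand one buffer to the device. Returns -1 if the ring is full. */
static int ring_submit(struct ring *r, uint64_t addr, uint32_t len)
{
    struct desc *d = &r->d[r->head];

    if (r->pending == RING_SIZE)
        return -1;
    d->buf_addr = addr;
    d->len = len;
    /* Real code needs a write barrier here, before flipping ownership. */
    d->flags |= DESC_OWN_DEVICE;
    r->head = d->next;
    r->pending++;
    return 0;
}

/* Reclaim the oldest buffer once the device has cleared the ownership bit.
 * Returns -1 if nothing is pending or the device still owns it. */
static int ring_reap(struct ring *r, uint64_t *addr_out)
{
    struct desc *d = &r->d[r->tail];

    if (r->pending == 0 || (d->flags & DESC_OWN_DEVICE))
        return -1;
    *addr_out = d->buf_addr;
    r->tail = d->next;
    r->pending--;
    return 0;
}
```

The point of the sketch is that nothing in the ring cares who allocated the buffers: a capture device and a GPU could each keep their own ring in their own format while pointing at the same pool.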
Cheers,
- Michael