Re: [Linaro-mm-sig] [PATCH] RFC: dma-buf: userspace mmap support

19 Mar 2012


      ...
-----Original Message-----
From: Alan Cox [mailto:alan@lxorguk.ukuu.org.uk]
Sent: 19 March 2012 16:57
To: Tom Cooksey
Cc: 'Rob Clark'; linaro-mm-sig@lists.linaro.org; dri-
devel@lists.freedesktop.org; linux-media@vger.kernel.org;
rschultz@google.com; Rob Clark; sumit.semwal@linaro.org;
patches@linaro.org
Subject: Re: [PATCH] RFC: dma-buf: userspace mmap support
...
If the API was to also be used for synchronization it would have to
include an atomic "prepare multiple" ioctl which blocked until all
the buffers listed by the application were available. In the same
Too slow already. You are now serializing stuff while what we want to
do
really is
nobody_else_gets_buffers_next([list])
   on available(buffer)
   	dispatch_work(buffer)
so that you can maximise parallelism without allowing deadlocks. If
you've got a high memory bandwith and 8+ cores the 'stop everything'
model isn't great.
Yes, sorry I wasn't clear here. By atomic I meant that a job starts
using all buffers at the same time, once they are available. You are
right, a job waiting for a list of buffers to become available should
not prevent other jobs running or queuing new jobs (eughh). We actually
have the option of using asynchronous call-backs in KDS: A driver lists
all the buffers it needs when adding a job and that job gets added to
the FIFO of each buffer as an atomic operation. However, once the job
is added to all the FIFOs, that atomic operation is complete and another
job can be "queued" up. When a job completes, it is removed from each
buffer's FIFO. At that point, all the "next" jobs in each buffer's FIFO
are evaluated to see if they can run. If they can run, the job's
"start" call-back is called. There's also a synchronous mode of
operation where a blocked thread is "woken up" instead of calling a
call-back function. It is this synchronous mode I would imagine
would be used for user-space access.
...
...
This might be a good argument for keeping synchronization and cache
maintenance separate, though even ignoring synchronization I would
think being able to issue cache maintenance operations for multiple
buffers in a single ioctl might present some small efficiency gains.
However as Rob points out, CPU access is already in slow/legacy
territory.
Dangerous assumption. I do think they should be separate. For one it
makes the case of synchronization needed but hardware cache management
much easier to split cleanly. Assuming CPU access is slow/legacy
reflects a certain model of relatively slow CPU and accelerators
where falling off the acceleration path is bad. On a higher end
processor falling off the acceleration path isn't a performance
matter so much as a power concern.
On some GPU architectures, glReadPixels is a _very_ heavy-weight
operation, so is very much a performance issue and I think always
will be. However I think this might be a special case for certain
GPUs: Other GPU architectures or device-types might be able to
share data with the CPU without such a large impact to performance.
The example of writing subtitles onto a video frame decoded by
a v4l2 hardware codec seems a good example.
...
...
KDS we differentiated jobs which needed "exclusive access" to a
buffer and jobs which needed "shared access" to a buffer. Multiple
jobs could access a buffer at the same time if those jobs all
Makes sense as it's a reader/writer lock and it reflects MESI/MOESI
caching and cache policy in some hardware/software assists.
Actually, this got me thinking... Several ARM implementations rely
on CPU/NEON to perform X.Org's 2D operations and those tend to
operate directly on the framebuffer. So in that case, both the CPU
and display controller need to access the same buffer at the same
time, even though one of them is writing to the buffer. This is
the main reason we called it shared/exclusive access in KDS rather
than read-only/read-write access. In such scenarios, you'd still
want to do a CPU cache flush after CPU-based 2D drawing is complete
to make sure the display controller "saw" those changes. So yes,
perhaps there's actually a use-case where synchronization must be
kept separate to cache-maintenance? In which case, it is worth
making the proposed prepare/finish API more explicit in that it is
a CPU cache invalidate and CPU cache flush operation only? Or are
there other things one might want to do in prepare/finish?
Automatic cache domain tracking for example?
...
...
display controller will be reading the front buffer, but the GPU
might also need to read that front buffer. So perhaps adding
"read-only" & "read-write" access flags to prepare could also be
interpreted as shared & exclusive accesses, if we went down this
route for synchronization that is. :-)
mmap includes read/write info so probably using that works out. It also
means that you have the stuff mapped in a way that will bus error or
segfault anyone who goofs rather than give them the usual 'deep
weirdness' behaviour you get with mishandling of caching bits.
I think it might be possible to make the case to cache user-space
mappings. In which case, you might want to always mmap read-write
but sometimes do a read operation and sometimes a write. So I think
we'd prefer not to make the read-only/read-write decision at mmap
time.
Cheers,
Tom

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Re: [Linaro-mm-sig] [PATCH] RFC: dma-buf: userspace mmap support