On Fri, 18 Jan 2019, Andrew F. Davis wrote:
> On 1/18/19 12:37 PM, Liam Mark wrote:
> > The ION begin_cpu_access and end_cpu_access functions use the
> > dma_sync_sg_for_cpu and dma_sync_sg_for_device APIs to perform cache
> > maintenance.
> >
> > Currently it is possible to apply cache maintenance, via the
> > begin_cpu_access and end_cpu_access APIs, to ION buffers which are not
> > dma mapped.
> >
> > The dma sync sg APIs should not be called on sg lists which have not been
> > dma mapped as this can result in cache maintenance being applied to the
> > wrong address. If an sg list has not been dma mapped then its dma_address
> > field has not been populated, some dma ops such as the swiotlb_dma_ops ops
> > use the dma_address field to calculate the address onto which to apply
> > cache maintenance.
> >
> > Also I don’t think we want CMOs to be applied to a buffer which is not
> > dma mapped as the memory should already be coherent for access from the
> > CPU. Any CMOs required for device access taken care of in the
> > dma_buf_map_attachment and dma_buf_unmap_attachment calls.
> > So really it only makes sense for begin_cpu_access and end_cpu_access to
> > apply CMOs if the buffer is dma mapped.
> >
> > Fix the ION begin_cpu_access and end_cpu_access functions to only apply
> > cache maintenance to buffers which are dma mapped.
> >
> > Fixes: 2a55e7b5e544 ("staging: android: ion: Call dma_map_sg for syncing and mapping")
> > Signed-off-by: Liam Mark <lmark(a)codeaurora.org>
> > ---
> > drivers/staging/android/ion/ion.c | 26 +++++++++++++++++++++-----
> > 1 file changed, 21 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
> > index 6f5afab7c1a1..1fe633a7fdba 100644
> > --- a/drivers/staging/android/ion/ion.c
> > +++ b/drivers/staging/android/ion/ion.c
> > @@ -210,6 +210,7 @@ struct ion_dma_buf_attachment {
> > struct device *dev;
> > struct sg_table *table;
> > struct list_head list;
> > + bool dma_mapped;
> > };
> >
> > static int ion_dma_buf_attach(struct dma_buf *dmabuf,
> > @@ -231,6 +232,7 @@ static int ion_dma_buf_attach(struct dma_buf *dmabuf,
> >
> > a->table = table;
> > a->dev = attachment->dev;
> > + a->dma_mapped = false;
> > INIT_LIST_HEAD(&a->list);
> >
> > attachment->priv = a;
> > @@ -261,12 +263,18 @@ static struct sg_table *ion_map_dma_buf(struct dma_buf_attachment *attachment,
> > {
> > struct ion_dma_buf_attachment *a = attachment->priv;
> > struct sg_table *table;
> > + struct ion_buffer *buffer = attachment->dmabuf->priv;
> >
> > table = a->table;
> >
> > + mutex_lock(&buffer->lock);
> > if (!dma_map_sg(attachment->dev, table->sgl, table->nents,
> > - direction))
> > + direction)) {
> > + mutex_unlock(&buffer->lock);
> > return ERR_PTR(-ENOMEM);
> > + }
> > + a->dma_mapped = true;
> > + mutex_unlock(&buffer->lock);
> >
> > return table;
> > }
> > @@ -275,7 +283,13 @@ static void ion_unmap_dma_buf(struct dma_buf_attachment *attachment,
> > struct sg_table *table,
> > enum dma_data_direction direction)
> > {
> > + struct ion_dma_buf_attachment *a = attachment->priv;
> > + struct ion_buffer *buffer = attachment->dmabuf->priv;
> > +
> > + mutex_lock(&buffer->lock);
> > dma_unmap_sg(attachment->dev, table->sgl, table->nents, direction);
> > + a->dma_mapped = false;
> > + mutex_unlock(&buffer->lock);
> > }
> >
> > static int ion_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
> > @@ -346,8 +360,9 @@ static int ion_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
> >
> > mutex_lock(&buffer->lock);
> > list_for_each_entry(a, &buffer->attachments, list) {
>
> When no devices are attached then buffer->attachments is empty and the
> below does not run, so if I understand this patch correctly then what
> you are protecting against is CPU access in the window after
> dma_buf_attach but before dma_buf_map.
>
Yes
> This is the kind of thing that again makes me think a couple more
> ordering requirements on DMA-BUF ops are needed. DMA-BUFs do not require
> the backing memory to be allocated until map time, this is why the
> dma_address field would still be null as you note in the commit message.
> So why should the CPU be performing accesses on a buffer that is not
> actually backed yet?
>
> I can think of two solutions:
>
> 1) Only allow CPU access (mmap, kmap, {begin,end}_cpu_access) while at
> least one device is mapped.
>
Would be quite limiting to clients.
> 2) Treat the CPU access request like the a device map request and
> trigger the allocation of backing memory just like if a device map had
> come in.
>
Which is, as you mention pretty much what we have now (though the buffer
is allocated even earlier).
> I know the current Ion heaps (and most other DMA-BUF exporters) all do
> the allocation up front so the memory is already there, but DMA-BUF was
> designed with late allocation in mind. I have a use-case I'm working on
> that finally exercises this DMA-BUF functionality and I would like to
> have it export through ION. This patch doesn't prevent that, but seems
> like it is endorsing the the idea that buffers always need to be backed,
> even before device attach/map is has occurred.
>
I didn't interpret the DMA-buf contract as requiring the dma-map to be
called in order for a backing store to be provided, I interpreted it as
meaning there could be a backing store before the dma-map but at the
dma-map call the final backing store configuration would be decided
(perhaps involving migrating the memory to the final backing store).
I will let the dma-buf experts correct me on that.
Limiting userspace clients to not be able to access buffers until after
they are dma-mapped seems unfortuntate to me, dma-mapping usually means a
change of ownership of the memory from the CPU to the device. So generally
while a buffer is dma mapped you have the device access it (though of
course it is supported for CPU to access to the buffer while dma mapped)
and then once the buffer is dma-unmapped the CPU can access it. This is
how the DMA APIs are frequently used, and the changes above make ION align
more with the way the DMA APIs are used. Basically when the buffer is not
dma-mapped the CPU doesn't need to do any CMOs to access the buffer (and
ION ensures not CMOs are applied) but if the CPU does want to access the
buffer while it is dma mapped then ION ensures that the appropriate CMOs
are applied.
It seems like a legitimate uses case to me to allow clients to access the
buffer before (and after) dma-mapping, example post processing of buffers.
> Either of the above two solutions would need to target the DMA-BUF
> framework,
>
> Sumit,
>
> Any comment?
>
> Thanks,
> Andrew
>
> > - dma_sync_sg_for_cpu(a->dev, a->table->sgl, a->table->nents,
> > - direction);
> > + if (a->dma_mapped)
> > + dma_sync_sg_for_cpu(a->dev, a->table->sgl,
> > + a->table->nents, direction);
> > }
> >
> > unlock:
> > @@ -369,8 +384,9 @@ static int ion_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
> >
> > mutex_lock(&buffer->lock);
> > list_for_each_entry(a, &buffer->attachments, list) {
> > - dma_sync_sg_for_device(a->dev, a->table->sgl, a->table->nents,
> > - direction);
> > + if (a->dma_mapped)
> > + dma_sync_sg_for_device(a->dev, a->table->sgl,
> > + a->table->nents, direction);
> > }
> > mutex_unlock(&buffer->lock);
> >
> >
>
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
On Tue, 22 Jan 2019, Andrew F. Davis wrote:
> On 1/21/19 4:12 PM, Liam Mark wrote:
> > On Mon, 21 Jan 2019, Christoph Hellwig wrote:
> >
> >> On Mon, Jan 21, 2019 at 11:44:10AM -0800, Liam Mark wrote:
> >>> The main use case is for allowing clients to pass in
> >>> DMA_ATTR_SKIP_CPU_SYNC in order to skip the default cache maintenance
> >>> which happens in dma_buf_map_attachment and dma_buf_unmap_attachment. In
> >>> ION the buffers aren't usually accessed from the CPU so this allows
> >>> clients to often avoid doing unnecessary cache maintenance.
> >>
> >> This can't work. The cpu can still easily speculate into this area.
> >
> > Can you provide more detail on your concern here.
> > The use case I am thinking about here is a cached buffer which is accessed
> > by a non IO-coherent device (quite a common use case for ION).
> >
> > Guessing on your concern:
> > The speculative access can be an issue if you are going to access the
> > buffer from the CPU after the device has written to it, however if you
> > know you aren't going to do any CPU access before the buffer is again
> > returned to the device then I don't think the speculative access is a
> > concern.
> >
> >> Moreover in general these operations should be cheap if the addresses
> >> aren't cached.
> >>
> >
> > I am thinking of use cases with cached buffers here, so CMO isn't cheap.
> >
>
> These buffers are cacheable, not cached, if you haven't written anything
> the data wont actually be in cache.
That's true
> And in the case of speculative cache
> filling the lines are marked clean. In either case the only cost is the
> little 7 instruction loop calling the clean/invalidate instruction (dc
> civac for ARMv8) for the cache-lines. Unless that is the cost you are
> trying to avoid?
>
This is the cost I am trying to avoid and this comes back to our previous
discussion. We have a coherent system cache so if you are doing this for
every cache line on a large buffer it adds up with this work and the going
to the bus.
For example I believe 1080P buffers are 8MB, and 4K buffers are even
larger.
I also still think you would want to solve this properly such that
invalidates aren't being done unnecessarily.
> In that case if you are mapping and unmapping so much that the little
> CMO here is hurting performance then I would argue your usage is broken
> and needs to be re-worked a bit.
>
I am not sure I would say it is broken, the large buffers (example 1080P
buffers) are mapped and unmapped on every frame. I don't think there is
any clean way to avoid that in a pipelining framework, you could ask
clients to keep the buffers dma mapped but there isn't necessarily a good
time to tell them to unmap.
It would be unfortunate to not consider this something legitimate for
usespace to do in a pipelining use case.
Requiring devices to stay attached doesn't seem very clean to me as there
isn't necessarily a nice place to tell them when to detach.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
On Tue, 22 Jan 2019, Andrew F. Davis wrote:
> On 1/21/19 4:18 PM, Liam Mark wrote:
> > On Mon, 21 Jan 2019, Andrew F. Davis wrote:
> >
> >> On 1/21/19 2:20 PM, Liam Mark wrote:
> >>> On Mon, 21 Jan 2019, Andrew F. Davis wrote:
> >>>
> >>>> On 1/21/19 1:44 PM, Liam Mark wrote:
> >>>>> On Mon, 21 Jan 2019, Christoph Hellwig wrote:
> >>>>>
> >>>>>> On Sat, Jan 19, 2019 at 08:50:41AM -0800, Laura Abbott wrote:
> >>>>>>>> And who is going to decide which ones to pass? And who documents
> >>>>>>>> which ones are safe?
> >>>>>>>>
> >>>>>>>> I'd much rather have explicit, well documented dma-buf flags that
> >>>>>>>> might get translated to the DMA API flags, which are not error checked,
> >>>>>>>> not very well documented and way to easy to get wrong.
> >>>>>>>>
> >>>>>>>
> >>>>>>> I'm not sure having flags in dma-buf really solves anything
> >>>>>>> given drivers can use the attributes directly with dma_map
> >>>>>>> anyway, which is what we're looking to do. The intention
> >>>>>>> is for the driver creating the dma_buf attachment to have
> >>>>>>> the knowledge of which flags to use.
> >>>>>>
> >>>>>> Well, there are very few flags that you can simply use for all calls of
> >>>>>> dma_map*. And given how badly these flags are defined I just don't want
> >>>>>> people to add more places where they indirectly use these flags, as
> >>>>>> it will be more than enough work to clean up the current mess.
> >>>>>>
> >>>>>> What flag(s) do you want to pass this way, btw? Maybe that is where
> >>>>>> the problem is.
> >>>>>>
> >>>>>
> >>>>> The main use case is for allowing clients to pass in
> >>>>> DMA_ATTR_SKIP_CPU_SYNC in order to skip the default cache maintenance
> >>>>> which happens in dma_buf_map_attachment and dma_buf_unmap_attachment. In
> >>>>> ION the buffers aren't usually accessed from the CPU so this allows
> >>>>> clients to often avoid doing unnecessary cache maintenance.
> >>>>>
> >>>>
> >>>> How can a client know that no CPU access has occurred that needs to be
> >>>> flushed out?
> >>>>
> >>>
> >>> I have left this to clients, but if they own the buffer they can have the
> >>> knowledge as to whether CPU access is needed in that use case (example for
> >>> post-processing).
> >>>
> >>> For example with the previous version of ION we left all decisions of
> >>> whether cache maintenance was required up to the client, they would use
> >>> the ION cache maintenance IOCTL to force cache maintenance only when it
> >>> was required.
> >>> In these cases almost all of the access was being done by the device and
> >>> in the rare cases CPU access was required clients would initiate the
> >>> required cache maintenance before and after the CPU access.
> >>>
> >>
> >> I think we have different definitions of "client", I'm talking about the
> >> DMA-BUF client (the importer), that is who can set this flag. It seems
> >> you mean the userspace application, which has no control over this flag.
> >>
> >
> > I am also talking about dma-buf clients, I am referring to both the
> > userspace and kernel component of the client. For example our Camera ION
> > client has both a usersapce and kernel component and they have ION
> > buffers, which they control the access to, which may or may not be
> > accessed by the CPU in certain uses cases.
> >
>
> I know they often work together, but for this discussion it would be
> good to keep kernel clients and usperspace clients separate. There are
> three types of actors at play here, userspace clients, kernel clients,
> and exporters.
>
> DMA-BUF only provides the basic sync primitive + mmap directly to
> userspace,
Well dma-buf does provide dma_buf_kmap/dma_buf_begin_cpu_access which
allows the same fucntionality in the kernel, but I don't think that changes
your argument.
> both operations are fulfilled by the exporter. This patch is
> about adding more control to the kernel side clients. The kernel side
> clients cannot know what userspace or other kernel side clients have
> done with the buffer, *only* the exporter has the whole picture.
>
> Therefor neither type of client should be deciding if the CPU needs
> flushed or not, only the exporter, based on the type of buffer, the
> current set attachments, and previous actions (is this first attachment,
> CPU get access in-between, etc...) can make this decision.
>
> You goal seems to be to avoid unneeded CPU side CMOs when a device
> detaches and another attaches with no CPU access in-between, right?
> That's reasonable to me, but it must be the exporter who keeps track and
> skips the CMO. This patch allows the client to tell the exporter the CMO
> is not needed and that is not safe.
>
I agree it would be better have this logic in the exporter, but I just
haven't heard an upstreamable way to make that work.
But maybe to explore that a bit more.
If we consider having CPU access with no devices attached a legitimate use
case:
The pipelining use case I am thinking of is
1) dev 1 attach, map, access, unmap
2) dev 1 detach
3) (maybe) CPU access
4) dev 2 attach
5) dev 2 map, access
6) ...
It would be unfortunate to not consider this something legitimate for
userspace to do in a pipelining use case.
Requiring devices to stay attached doesn't seem very clean to me as there
isn't necessarily a nice place to tell them when to detach.
If we considered the above a supported use case I think we could support
it in dma-buf (based on past discussions) if we had 2 things
#1 if we tracked the state of the buffer (example if it has had a previous
cached/uncached write and no following CMO). Then when either the CPU or
a device was going to access a buffer it could decide, based on the
previous access if any CMO needs to be applied first.
#2 we had a non-architecture specific way to apply cache maintenance
without a device, so that in step #3 the begin_cpu_acess call could
successfully invalidate the buffer.
I think #1 is doable since we can tell tell if devices are IO coherent or
not and we know the direction of accesses in dma map and begin cpu access.
I think we would probably agree that #2 is a problem though, getting the
kernel to expose that API seems like a hard argument.
Liam
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
Some stability changes to improve ION robustness and a perf related
change to make it easier for clients to avoid unnecessary cache
maintenance, such as when buffers are clean and haven't had any CPU
access.
Liam Mark (4):
staging: android: ion: Support cpu access during dma_buf_detach
staging: android: ion: Restrict cache maintenance to dma mapped memory
dma-buf: add support for mapping with dma mapping attributes
staging: android: ion: Support for mapping with dma mapping attributes
drivers/staging/android/ion/ion.c | 33 +++++++++++++++++++++++++--------
include/linux/dma-buf.h | 3 +++
2 files changed, 28 insertions(+), 8 deletions(-)
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
On Mon, 21 Jan 2019, Andrew F. Davis wrote:
> On 1/21/19 2:20 PM, Liam Mark wrote:
> > On Mon, 21 Jan 2019, Andrew F. Davis wrote:
> >
> >> On 1/21/19 1:44 PM, Liam Mark wrote:
> >>> On Mon, 21 Jan 2019, Christoph Hellwig wrote:
> >>>
> >>>> On Sat, Jan 19, 2019 at 08:50:41AM -0800, Laura Abbott wrote:
> >>>>>> And who is going to decide which ones to pass? And who documents
> >>>>>> which ones are safe?
> >>>>>>
> >>>>>> I'd much rather have explicit, well documented dma-buf flags that
> >>>>>> might get translated to the DMA API flags, which are not error checked,
> >>>>>> not very well documented and way to easy to get wrong.
> >>>>>>
> >>>>>
> >>>>> I'm not sure having flags in dma-buf really solves anything
> >>>>> given drivers can use the attributes directly with dma_map
> >>>>> anyway, which is what we're looking to do. The intention
> >>>>> is for the driver creating the dma_buf attachment to have
> >>>>> the knowledge of which flags to use.
> >>>>
> >>>> Well, there are very few flags that you can simply use for all calls of
> >>>> dma_map*. And given how badly these flags are defined I just don't want
> >>>> people to add more places where they indirectly use these flags, as
> >>>> it will be more than enough work to clean up the current mess.
> >>>>
> >>>> What flag(s) do you want to pass this way, btw? Maybe that is where
> >>>> the problem is.
> >>>>
> >>>
> >>> The main use case is for allowing clients to pass in
> >>> DMA_ATTR_SKIP_CPU_SYNC in order to skip the default cache maintenance
> >>> which happens in dma_buf_map_attachment and dma_buf_unmap_attachment. In
> >>> ION the buffers aren't usually accessed from the CPU so this allows
> >>> clients to often avoid doing unnecessary cache maintenance.
> >>>
> >>
> >> How can a client know that no CPU access has occurred that needs to be
> >> flushed out?
> >>
> >
> > I have left this to clients, but if they own the buffer they can have the
> > knowledge as to whether CPU access is needed in that use case (example for
> > post-processing).
> >
> > For example with the previous version of ION we left all decisions of
> > whether cache maintenance was required up to the client, they would use
> > the ION cache maintenance IOCTL to force cache maintenance only when it
> > was required.
> > In these cases almost all of the access was being done by the device and
> > in the rare cases CPU access was required clients would initiate the
> > required cache maintenance before and after the CPU access.
> >
>
> I think we have different definitions of "client", I'm talking about the
> DMA-BUF client (the importer), that is who can set this flag. It seems
> you mean the userspace application, which has no control over this flag.
>
I am also talking about dma-buf clients, I am referring to both the
userspace and kernel component of the client. For example our Camera ION
client has both a usersapce and kernel component and they have ION
buffers, which they control the access to, which may or may not be
accessed by the CPU in certain uses cases.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
On Mon, 21 Jan 2019, Andrew F. Davis wrote:
> On 1/21/19 1:44 PM, Liam Mark wrote:
> > On Mon, 21 Jan 2019, Christoph Hellwig wrote:
> >
> >> On Sat, Jan 19, 2019 at 08:50:41AM -0800, Laura Abbott wrote:
> >>>> And who is going to decide which ones to pass? And who documents
> >>>> which ones are safe?
> >>>>
> >>>> I'd much rather have explicit, well documented dma-buf flags that
> >>>> might get translated to the DMA API flags, which are not error checked,
> >>>> not very well documented and way to easy to get wrong.
> >>>>
> >>>
> >>> I'm not sure having flags in dma-buf really solves anything
> >>> given drivers can use the attributes directly with dma_map
> >>> anyway, which is what we're looking to do. The intention
> >>> is for the driver creating the dma_buf attachment to have
> >>> the knowledge of which flags to use.
> >>
> >> Well, there are very few flags that you can simply use for all calls of
> >> dma_map*. And given how badly these flags are defined I just don't want
> >> people to add more places where they indirectly use these flags, as
> >> it will be more than enough work to clean up the current mess.
> >>
> >> What flag(s) do you want to pass this way, btw? Maybe that is where
> >> the problem is.
> >>
> >
> > The main use case is for allowing clients to pass in
> > DMA_ATTR_SKIP_CPU_SYNC in order to skip the default cache maintenance
> > which happens in dma_buf_map_attachment and dma_buf_unmap_attachment. In
> > ION the buffers aren't usually accessed from the CPU so this allows
> > clients to often avoid doing unnecessary cache maintenance.
> >
>
> How can a client know that no CPU access has occurred that needs to be
> flushed out?
>
I have left this to clients, but if they own the buffer they can have the
knowledge as to whether CPU access is needed in that use case (example for
post-processing).
For example with the previous version of ION we left all decisions of
whether cache maintenance was required up to the client, they would use
the ION cache maintenance IOCTL to force cache maintenance only when it
was required.
In these cases almost all of the access was being done by the device and
in the rare cases CPU access was required clients would initiate the
required cache maintenance before and after the CPU access.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
This is a small fallout from a work to allow batching WW mutex locks and
unlocks.
Our Wound-Wait mutexes actually don't use the Wound-Wait algorithm but
the Wait-Die algorithm. One could perhaps rename those mutexes tree-wide to
"Wait-Die mutexes" or "Deadlock Avoidance mutexes". Another approach suggested
here is to implement also the "Wound-Wait" algorithm as a per-WW-class
choice, as it has advantages in some cases. See for example
http://www.mathcs.emory.edu/~cheung/Courses/554/Syllabus/8-recv+serial/dead…
Now Wound-Wait is a preemptive algorithm, and the preemption is implemented
using a lazy scheme: If a wounded transaction is about to go to sleep on
a contended WW mutex, we return -EDEADLK. That is sufficient for deadlock
prevention. Since with WW mutexes we also require the aborted transaction to
sleep waiting to lock the WW mutex it was aborted on, this choice also provides
a suitable WW mutex to sleep on. If we were to return -EDEADLK on the first
WW mutex lock after the transaction was wounded whether the WW mutex was
contended or not, the transaction might frequently be restarted without a wait,
which is far from optimal. Note also that with the lazy preemption scheme,
contrary to Wait-Die there will be no rollbacks on lock contention of locks
held by a transaction that has completed its locking sequence.
The modeset locks are then changed from Wait-Die to Wound-Wait since the
typical locking pattern of those locks very well matches the criterion for
a substantial reduction in the number of rollbacks. For reservation objects,
the benefit is more unclear at this point and they remain using Wait-Die.
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Gustavo Padovan <gustavo(a)padovan.org>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Sean Paul <seanpaul(a)chromium.org>
Cc: David Airlie <airlied(a)linux.ie>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: "Paul E. McKenney" <paulmck(a)linux.vnet.ibm.com>
Cc: Josh Triplett <josh(a)joshtriplett.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Kate Stewart <kstewart(a)linuxfoundation.org>
Cc: Philippe Ombredanne <pombredanne(a)nexb.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: linux-doc(a)vger.kernel.org
Cc: linux-media(a)vger.kernel.org
Cc: linaro-mm-sig(a)lists.linaro.org
v2:
Updated DEFINE_WW_CLASS() API, minor code- and comment fixes as
detailed in each patch.
v3:
Included cleanup patch from Peter Zijlstra. Documentation fixes and
a small correctness fix.
v4:
Reworked the correctness fix.
On 1/3/19 5:42 PM, Zengtao (B) wrote:
>> -----Original Message-----
>> From: Laura Abbott [mailto:labbott@redhat.com]
>> Sent: Thursday, January 03, 2019 6:31 AM
>> To: Zengtao (B) <prime.zeng(a)hisilicon.com>; sumit.semwal(a)linaro.org
>> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>; Arve Hjønnevåg
>> <arve(a)android.com>; Todd Kjos <tkjos(a)android.com>; Martijn Coenen
>> <maco(a)android.com>; Joel Fernandes <joel(a)joelfernandes.org>;
>> devel(a)driverdev.osuosl.org; dri-devel(a)lists.freedesktop.org;
>> linaro-mm-sig(a)lists.linaro.org; linux-kernel(a)vger.kernel.org
>> Subject: Re: [PATCH] staging: android: ion: add buffer flag update ioctl
>>
>> On 12/23/18 6:47 PM, Zengtao (B) wrote:
>>> Hi laura:
>>>
>>>> -----Original Message-----
>>>> From: Laura Abbott [mailto:labbott@redhat.com]
>>>> Sent: Friday, December 21, 2018 4:50 AM
>>>> To: Zengtao (B) <prime.zeng(a)hisilicon.com>;
>> sumit.semwal(a)linaro.org
>>>> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>; Arve
>> Hjønnevåg
>>>> <arve(a)android.com>; Todd Kjos <tkjos(a)android.com>; Martijn
>> Coenen
>>>> <maco(a)android.com>; Joel Fernandes <joel(a)joelfernandes.org>;
>>>> devel(a)driverdev.osuosl.org; dri-devel(a)lists.freedesktop.org;
>>>> linaro-mm-sig(a)lists.linaro.org; linux-kernel(a)vger.kernel.org
>>>> Subject: Re: [PATCH] staging: android: ion: add buffer flag update
>>>> ioctl
>>>>
>>>> On 12/19/18 5:39 PM, Zengtao (B) wrote:
>>>>> Hi laura:
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Laura Abbott [mailto:labbott@redhat.com]
>>>>>> Sent: Thursday, December 20, 2018 2:10 AM
>>>>>> To: Zengtao (B) <prime.zeng(a)hisilicon.com>;
>>>> sumit.semwal(a)linaro.org
>>>>>> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>; Arve
>>>> Hjønnevåg
>>>>>> <arve(a)android.com>; Todd Kjos <tkjos(a)android.com>; Martijn
>>>> Coenen
>>>>>> <maco(a)android.com>; Joel Fernandes <joel(a)joelfernandes.org>;
>>>>>> devel(a)driverdev.osuosl.org; dri-devel(a)lists.freedesktop.org;
>>>>>> linaro-mm-sig(a)lists.linaro.org; linux-kernel(a)vger.kernel.org
>>>>>> Subject: Re: [PATCH] staging: android: ion: add buffer flag update
>>>>>> ioctl
>>>>>>
>>>>>> On 12/19/18 9:19 AM, Zeng Tao wrote:
>>>>>>> In some usecases, the buffer cached attribute is not determined at
>>>>>>> allocation time, it's determined just before the real cpu mapping.
>>>>>>> And from the memory view of point, a buffer should not have the
>>>>>> cached
>>>>>>> attribute util is really mapped by the cpu. So in this patch, we
>>>>>>> introduced the new ioctl command to target the requirement.
>>>>>>>
>>>>>>
>>>>>> This is racy and error prone. Can you explain more what problem
>> you
>>>>>> are trying to solve?
>>>>>
>>>>> My use case is like this:
>>>>> 1. There are two process A and B, A takes case of ion buffer
>>>>> allocation,
>>>> and
>>>>> pass the buffer fd to B, then B maps and uses it.
>>>>> 2. Process B need to map the buffer with different cached attribute
>>>>> for different use case, for example, if the buffer is used for pure
>>>>> software algorithm, then we need to map it as cached, otherwise
>>>>> non-cached, and B needs to deal with both cases.
>>>>> And unfortunately the mmap syscall takes no cached flags and we
>>>>> can't decide the cache attribute when we are doing the mmap, so I
>>>>> introduce new the ioctl even though I think the solution is not as
>> good.
>>>>>
>>>>>
>>>>
>>>> Thanks for the explanation, this was about the use case I expected.
>>>> I'm pretty sure I had this exact problem once upon a time and we
>>>> didn't come up with a solution. I'd still like to get rid of uncached
>>>> buffers in general and just use cached buffers (see
>>>> http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/201
>>>> 8-N
>>>> ovember/128842.html)
>>>> What's your usecase for uncached buffers?
>>>
>>> Some buffers are only used by HW, in this case, we tend to use
>> uncached buffers.
>>> But sometimes the SW need to read few buffer contents for debug
>>> purpose and no synchronization is needed.
>>>
>>
>> If this is primarily for debug, that's not really a compelling reason to
>> support uncached buffers. It's incredibly difficult to do uncached buffers
>> correctly I'd rather spend effort on making the cached use cases work
>> better.
>>
> No, not only for debug, I meant if the buffers are uncached, the SW will only mmap
> them for debug or even don't need to mmap them.
> So for the two kinds of buffers:
> 1. uncached: only the HW need to access it, the SW don't need to mmap it or just for debug.
> 2. cached: both the HW and the SW need to access it.
> The ION is designed as a generalized memory manager, I think we should cover as many
> requirements as we can in the ION design, so I 'd rather that we keep the uncached support.
> And I don’t quite understand the difficulty you mentioned to support the uncached buffers, would
> you explain a little more about it.
>
We end up with aliasing problems. Each kernel page still has a cached
mapping so it's very difficult to keep the two mappings in sync.
https://lore.kernel.org/lkml/20181120164636.jcw7li2uaa3cmwc3@DESKTOP-E1NTVV…
This thread does a better job of explaining the problem and the
objections to uncached buffers.
And if the software never touches the buffer, why does it
matter if the buffer is uncached?
Laura
> Thanks
> Zengtao
>
>>
>>>>
>>>>>>
>>>>>>> Signed-off-by: Zeng Tao <prime.zeng(a)hisilicon.com>
>>>>>>> ---
>>>>>>> drivers/staging/android/ion/ion-ioctl.c | 4 ++++
>>>>>>> drivers/staging/android/ion/ion.c | 17
>>>> +++++++++++++++++
>>>>>>> drivers/staging/android/ion/ion.h | 1 +
>>>>>>> drivers/staging/android/uapi/ion.h | 22
>>>>>> ++++++++++++++++++++++
>>>>>>> 4 files changed, 44 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/staging/android/ion/ion-ioctl.c
>>>>>>> b/drivers/staging/android/ion/ion-ioctl.c
>>>>>>> index a8d3cc4..60bb702 100644
>>>>>>> --- a/drivers/staging/android/ion/ion-ioctl.c
>>>>>>> +++ b/drivers/staging/android/ion/ion-ioctl.c
>>>>>>> @@ -12,6 +12,7 @@
>>>>>>>
>>>>>>> union ion_ioctl_arg {
>>>>>>> struct ion_allocation_data allocation;
>>>>>>> + struct ion_buffer_flag_data update;
>>>>>>> struct ion_heap_query query;
>>>>>>> };
>>>>>>>
>>>>>>> @@ -83,6 +84,9 @@ long ion_ioctl(struct file *filp, unsigned int
>>>>>>> cmd, unsigned long arg)
>>>>>>>
>>>>>>> break;
>>>>>>> }
>>>>>>> + case ION_IOC_BUFFER_UPDATE:
>>>>>>> + ret = ion_buffer_update(data.update.fd, data.update.flags);
>>>>>>> + break;
>>>>>>> case ION_IOC_HEAP_QUERY:
>>>>>>> ret = ion_query_heaps(&data.query);
>>>>>>> break;
>>>>>>> diff --git a/drivers/staging/android/ion/ion.c
>>>>>>> b/drivers/staging/android/ion/ion.c
>>>>>>> index 9907332..f1404dc 100644
>>>>>>> --- a/drivers/staging/android/ion/ion.c
>>>>>>> +++ b/drivers/staging/android/ion/ion.c
>>>>>>> @@ -436,6 +436,23 @@ int ion_alloc(size_t len, unsigned int
>>>>>> heap_id_mask, unsigned int flags)
>>>>>>> return fd;
>>>>>>> }
>>>>>>>
>>>>>>> +int ion_buffer_update(unsigned int fd, unsigned int flags) {
>>>>>>> + struct dma_buf *dmabuf;
>>>>>>> + struct ion_buffer *buffer;
>>>>>>> +
>>>>>>> + dmabuf = dma_buf_get(fd);
>>>>>>> +
>>>>>>> + if (!dmabuf)
>>>>>>> + return -EINVAL;
>>>>>>> +
>>>>>>> + buffer = dmabuf->priv;
>>>>>>> + buffer->flags = flags;
>>>>>>> + dma_buf_put(dmabuf);
>>>>>>> +
>>>>>>> + return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> int ion_query_heaps(struct ion_heap_query *query)
>>>>>>> {
>>>>>>> struct ion_device *dev = internal_dev; diff --git
>>>>>>> a/drivers/staging/android/ion/ion.h
>>>>>>> b/drivers/staging/android/ion/ion.h
>>>>>>> index c006fc1..99bf9ab 100644
>>>>>>> --- a/drivers/staging/android/ion/ion.h
>>>>>>> +++ b/drivers/staging/android/ion/ion.h
>>>>>>> @@ -199,6 +199,7 @@ int ion_heap_pages_zero(struct page
>> *page,
>>>>>> size_t size, pgprot_t pgprot);
>>>>>>> int ion_alloc(size_t len,
>>>>>>> unsigned int heap_id_mask,
>>>>>>> unsigned int flags);
>>>>>>> +int ion_buffer_update(unsigned int fd, unsigned int flags);
>>>>>>>
>>>>>>> /**
>>>>>>> * ion_heap_init_shrinker
>>>>>>> diff --git a/drivers/staging/android/uapi/ion.h
>>>>>>> b/drivers/staging/android/uapi/ion.h
>>>>>>> index 5d70098..99753fc 100644
>>>>>>> --- a/drivers/staging/android/uapi/ion.h
>>>>>>> +++ b/drivers/staging/android/uapi/ion.h
>>>>>>> @@ -74,6 +74,20 @@ struct ion_allocation_data {
>>>>>>> __u32 unused;
>>>>>>> };
>>>>>>>
>>>>>>> +/**
>>>>>>> + * struct ion_buffer_flag_data - metadata passed from userspace
>>>>>>> +for update
>>>>>>> + * buffer flags
>>>>>>> + * @fd: file descriptor of the buffer
>>>>>>> + * @flags: flags passed to the buffer
>>>>>>> + *
>>>>>>> + * Provided by userspace as an argument to the ioctl */
>>>>>>> +
>>>>>>> +struct ion_buffer_flag_data {
>>>>>>> + __u32 fd;
>>>>>>> + __u32 flags;
>>>>>>> +}
>>>>>>> +
>>>>>>> #define MAX_HEAP_NAME 32
>>>>>>>
>>>>>>> /**
>>>>>>> @@ -116,6 +130,14 @@ struct ion_heap_query {
>>>>>>> struct ion_allocation_data)
>>>>>>>
>>>>>>> /**
>>>>>>> + * DOC: ION_IOC_BUFFER_UPDATE - update the specified ion
>> buffer
>>>>>> flags
>>>>>>> + *
>>>>>>> + * Takes an ion_buffer_flag_data structure and returns the result
>>>>>>> +of the
>>>>>>> + * buffer flag update operation.
>>>>>>> + */
>>>>>>> +#define ION_IOC_BUFFER_UPDATE _IOWR(ION_IOC_MAGIC, 1, \
>>>>>>> + struct ion_buffer_flag_data)
>>>>>>> +/**
>>>>>>> * DOC: ION_IOC_HEAP_QUERY - information about available
>>>> heaps
>>>>>>> *
>>>>>>> * Takes an ion_heap_query structure and populates
>> information
>>>>>> about
>>>>>>>
>>>>>
>>>
>
On 12/23/18 6:47 PM, Zengtao (B) wrote:
> Hi laura:
>
>> -----Original Message-----
>> From: Laura Abbott [mailto:labbott@redhat.com]
>> Sent: Friday, December 21, 2018 4:50 AM
>> To: Zengtao (B) <prime.zeng(a)hisilicon.com>; sumit.semwal(a)linaro.org
>> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>; Arve Hjønnevåg
>> <arve(a)android.com>; Todd Kjos <tkjos(a)android.com>; Martijn Coenen
>> <maco(a)android.com>; Joel Fernandes <joel(a)joelfernandes.org>;
>> devel(a)driverdev.osuosl.org; dri-devel(a)lists.freedesktop.org;
>> linaro-mm-sig(a)lists.linaro.org; linux-kernel(a)vger.kernel.org
>> Subject: Re: [PATCH] staging: android: ion: add buffer flag update ioctl
>>
>> On 12/19/18 5:39 PM, Zengtao (B) wrote:
>>> Hi laura:
>>>
>>>> -----Original Message-----
>>>> From: Laura Abbott [mailto:labbott@redhat.com]
>>>> Sent: Thursday, December 20, 2018 2:10 AM
>>>> To: Zengtao (B) <prime.zeng(a)hisilicon.com>;
>> sumit.semwal(a)linaro.org
>>>> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>; Arve
>> Hjønnevåg
>>>> <arve(a)android.com>; Todd Kjos <tkjos(a)android.com>; Martijn
>> Coenen
>>>> <maco(a)android.com>; Joel Fernandes <joel(a)joelfernandes.org>;
>>>> devel(a)driverdev.osuosl.org; dri-devel(a)lists.freedesktop.org;
>>>> linaro-mm-sig(a)lists.linaro.org; linux-kernel(a)vger.kernel.org
>>>> Subject: Re: [PATCH] staging: android: ion: add buffer flag update
>>>> ioctl
>>>>
>>>> On 12/19/18 9:19 AM, Zeng Tao wrote:
>>>>> In some usecases, the buffer cached attribute is not determined at
>>>>> allocation time, it's determined just before the real cpu mapping.
>>>>> And from the memory view of point, a buffer should not have the
>>>> cached
>>>>> attribute util is really mapped by the cpu. So in this patch, we
>>>>> introduced the new ioctl command to target the requirement.
>>>>>
>>>>
>>>> This is racy and error prone. Can you explain more what problem you
>>>> are trying to solve?
>>>
>>> My use case is like this:
>>> 1. There are two process A and B, A takes case of ion buffer allocation,
>> and
>>> pass the buffer fd to B, then B maps and uses it.
>>> 2. Process B need to map the buffer with different cached attribute
>>> for different use case, for example, if the buffer is used for pure
>>> software algorithm, then we need to map it as cached, otherwise
>>> non-cached, and B needs to deal with both cases.
>>> And unfortunately the mmap syscall takes no cached flags and we can't
>>> decide the cache attribute when we are doing the mmap, so I introduce
>>> new the ioctl even though I think the solution is not as good.
>>>
>>>
>>
>> Thanks for the explanation, this was about the use case I expected.
>> I'm pretty sure I had this exact problem once upon a time and we didn't
>> come up with a solution. I'd still like to get rid of uncached buffers in
>> general and just use cached buffers (see
>> http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2018-N
>> ovember/128842.html)
>> What's your usecase for uncached buffers?
>
> Some buffers are only used by HW, in this case, we tend to use uncached buffers.
> But sometimes the SW need to read few buffer contents for debug purpose and
> no synchronization is needed.
>
If this is primarily for debug, that's not really a compelling reason to
support uncached buffers. It's incredibly difficult to do uncached
buffers correctly I'd rather spend effort on making the cached
use cases work better.
Thanks,
Laura
>>
>>>>
>>>>> Signed-off-by: Zeng Tao <prime.zeng(a)hisilicon.com>
>>>>> ---
>>>>> drivers/staging/android/ion/ion-ioctl.c | 4 ++++
>>>>> drivers/staging/android/ion/ion.c | 17
>> +++++++++++++++++
>>>>> drivers/staging/android/ion/ion.h | 1 +
>>>>> drivers/staging/android/uapi/ion.h | 22
>>>> ++++++++++++++++++++++
>>>>> 4 files changed, 44 insertions(+)
>>>>>
>>>>> diff --git a/drivers/staging/android/ion/ion-ioctl.c
>>>>> b/drivers/staging/android/ion/ion-ioctl.c
>>>>> index a8d3cc4..60bb702 100644
>>>>> --- a/drivers/staging/android/ion/ion-ioctl.c
>>>>> +++ b/drivers/staging/android/ion/ion-ioctl.c
>>>>> @@ -12,6 +12,7 @@
>>>>>
>>>>> union ion_ioctl_arg {
>>>>> struct ion_allocation_data allocation;
>>>>> + struct ion_buffer_flag_data update;
>>>>> struct ion_heap_query query;
>>>>> };
>>>>>
>>>>> @@ -83,6 +84,9 @@ long ion_ioctl(struct file *filp, unsigned int
>>>>> cmd, unsigned long arg)
>>>>>
>>>>> break;
>>>>> }
>>>>> + case ION_IOC_BUFFER_UPDATE:
>>>>> + ret = ion_buffer_update(data.update.fd, data.update.flags);
>>>>> + break;
>>>>> case ION_IOC_HEAP_QUERY:
>>>>> ret = ion_query_heaps(&data.query);
>>>>> break;
>>>>> diff --git a/drivers/staging/android/ion/ion.c
>>>>> b/drivers/staging/android/ion/ion.c
>>>>> index 9907332..f1404dc 100644
>>>>> --- a/drivers/staging/android/ion/ion.c
>>>>> +++ b/drivers/staging/android/ion/ion.c
>>>>> @@ -436,6 +436,23 @@ int ion_alloc(size_t len, unsigned int
>>>> heap_id_mask, unsigned int flags)
>>>>> return fd;
>>>>> }
>>>>>
>>>>> +int ion_buffer_update(unsigned int fd, unsigned int flags) {
>>>>> + struct dma_buf *dmabuf;
>>>>> + struct ion_buffer *buffer;
>>>>> +
>>>>> + dmabuf = dma_buf_get(fd);
>>>>> +
>>>>> + if (!dmabuf)
>>>>> + return -EINVAL;
>>>>> +
>>>>> + buffer = dmabuf->priv;
>>>>> + buffer->flags = flags;
>>>>> + dma_buf_put(dmabuf);
>>>>> +
>>>>> + return 0;
>>>>> +}
>>>>> +
>>>>> int ion_query_heaps(struct ion_heap_query *query)
>>>>> {
>>>>> struct ion_device *dev = internal_dev; diff --git
>>>>> a/drivers/staging/android/ion/ion.h
>>>>> b/drivers/staging/android/ion/ion.h
>>>>> index c006fc1..99bf9ab 100644
>>>>> --- a/drivers/staging/android/ion/ion.h
>>>>> +++ b/drivers/staging/android/ion/ion.h
>>>>> @@ -199,6 +199,7 @@ int ion_heap_pages_zero(struct page *page,
>>>> size_t size, pgprot_t pgprot);
>>>>> int ion_alloc(size_t len,
>>>>> unsigned int heap_id_mask,
>>>>> unsigned int flags);
>>>>> +int ion_buffer_update(unsigned int fd, unsigned int flags);
>>>>>
>>>>> /**
>>>>> * ion_heap_init_shrinker
>>>>> diff --git a/drivers/staging/android/uapi/ion.h
>>>>> b/drivers/staging/android/uapi/ion.h
>>>>> index 5d70098..99753fc 100644
>>>>> --- a/drivers/staging/android/uapi/ion.h
>>>>> +++ b/drivers/staging/android/uapi/ion.h
>>>>> @@ -74,6 +74,20 @@ struct ion_allocation_data {
>>>>> __u32 unused;
>>>>> };
>>>>>
>>>>> +/**
>>>>> + * struct ion_buffer_flag_data - metadata passed from userspace for
>>>>> +update
>>>>> + * buffer flags
>>>>> + * @fd: file descriptor of the buffer
>>>>> + * @flags: flags passed to the buffer
>>>>> + *
>>>>> + * Provided by userspace as an argument to the ioctl */
>>>>> +
>>>>> +struct ion_buffer_flag_data {
>>>>> + __u32 fd;
>>>>> + __u32 flags;
>>>>> +}
>>>>> +
>>>>> #define MAX_HEAP_NAME 32
>>>>>
>>>>> /**
>>>>> @@ -116,6 +130,14 @@ struct ion_heap_query {
>>>>> struct ion_allocation_data)
>>>>>
>>>>> /**
>>>>> + * DOC: ION_IOC_BUFFER_UPDATE - update the specified ion buffer
>>>> flags
>>>>> + *
>>>>> + * Takes an ion_buffer_flag_data structure and returns the result
>>>>> +of the
>>>>> + * buffer flag update operation.
>>>>> + */
>>>>> +#define ION_IOC_BUFFER_UPDATE _IOWR(ION_IOC_MAGIC, 1, \
>>>>> + struct ion_buffer_flag_data)
>>>>> +/**
>>>>> * DOC: ION_IOC_HEAP_QUERY - information about available
>> heaps
>>>>> *
>>>>> * Takes an ion_heap_query structure and populates information
>>>> about
>>>>>
>>>
>