On 3 June 2015 at 15:07, Hans Verkuil hverkuil@xs4all.nl wrote:
On 06/03/15 10:41, Russell King - ARM Linux wrote:
On Wed, Jun 03, 2015 at 08:39:55AM +0200, Hans Verkuil wrote:
Hi Sumit,
On 05/05/2015 04:41 PM, Sumit Semwal wrote:
Hi Russell, everyone,
First up, sincere apologies for being AWOL for some time; I had some personal / medical things to take care of, and then I thought I'd wait for the merge window to be over before beginning to discuss this again.
On 11 February 2015 at 21:53, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
On Wed, Feb 11, 2015 at 01:20:24PM +0100, Marek Szyprowski wrote:
Hello,
On 2015-02-11 12:12, Russell King - ARM Linux wrote:

Which is a damn good reason to NAK it - by that admission, it's a half-baked idea.

If all we want to know is whether the importer can accept only contiguous memory or not, make a flag to do that, and allow the exporter to test this flag. Don't over-engineer this to make it _seem_ like it can do something that it actually totally fails with.

As I've already pointed out, there's a major problem if you have already had a less restrictive attachment which has an active mapping, and a new more restrictive attachment comes along later.

It seems from Rob's descriptions that we also need another flag in the importer to indicate whether it wants to have a valid struct page in the scatter list, or whether it (correctly) uses the DMA accessors on the scatter list - so that exporters can reject importers which are buggy.
Okay, but the flag-based approach also has limitations.
Yes, the flag-based approach doesn't let you describe in detail what the importer can accept - which, given the issues that I've raised, is a *good* thing. We won't be misleading anyone into thinking that we can do something that's really half-baked, and which we have no present requirement for.
This is precisely what Linus talks about when he says "don't over-engineer" - if we over-engineer this, we end up with something that sort-of works, and that's a bad thing.
The Keep It Simple approach here makes total sense. What are our current requirements? To be able to say that an importer can only accept:
- contiguous memory rather than a scatterlist
- scatterlists with struct page pointers
Does solving that need us to compare all the constraints of each and every importer, possibly ending up with constraints which can't be satisfied? No. Does the flag approach satisfy the requirements? Yes.
So, for basic constraint-sharing, we'll just go with the flag based approach, with a flag (best place for it is still dev->dma_params I suppose) for denoting contiguous or scatterlist. Is that agreed, then? Also, with this idea, of course, there won't be any helpers for trying to calculate constraints; it would be totally the exporter's responsibility to handle it via the attach() dma_buf_op if it wishes to.
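To make that concrete, here is a minimal sketch of the flag idea. Everything here is invented for illustration - no such flag exists in device_dma_parameters today, and the my_* names are placeholders - the point is only "one importer flag, tested by the exporter at attach time":

/*
 * Illustrative sketch only: MY_DMA_WANTS_CONTIGUOUS is hypothetical;
 * imagine it living next to max_segment_size in the importing device's
 * struct device_dma_parameters.
 */
#include <linux/errno.h>
#include <linux/types.h>

#define MY_DMA_WANTS_CONTIGUOUS	0x1

struct my_exported_buffer {
	bool contiguous;	/* how this buffer is (or will be) allocated */
};

/*
 * Exporter-side check, called from its dma_buf attach callback;
 * importer_flags would come from the attaching device's dma parameters.
 */
static int my_exporter_check_attach(unsigned long importer_flags,
				    struct my_exported_buffer *buf)
{
	if ((importer_flags & MY_DMA_WANTS_CONTIGUOUS) && !buf->contiguous)
		return -EINVAL;	/* cannot satisfy this importer, reject */

	return 0;
}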
What's wrong with the proposed max_segment_count? Many media devices do have a limited max_segment_count and that should be taken into account.
So what happens if you have a dma_buf exporter, and several dma_buf importers. One dma_buf importer attaches to the exporter, asks for the buffer, and starts making use of it. The exported buffer has many scatterlist segments.
Another dma_buf importer attaches to the same buffer, and now asks for the buffer, but the number of scatterlist segments exceeds its requirement.
So, in the midst of all the various directions this discussion has taken, I seem to have neglected to reiterate the base premise of this suggestion [1]: that we can use this information to implement deferred allocation, so that all the importers attach first and the exporter does the actual allocation on the first map() call. This is also in line with the prescribed dma_buf_attach() / dma_buf_map_attachment() sequence - ideally, all participating 'importers' of a dma_buf should only attach first, and then map() at a 'later' time, usually right before actually using the buffer. Note: at present, both the DRI and V4L subsystems don't do that; while proposing this RFC I deliberately kept that separate, as it is a related but orthogonal problem to solve. I guess I should address that in parallel.
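As a rough illustration of what I mean (all the my_* types and the two allocators are made up for this sketch; a real exporter would hang this off its dma_buf_ops), attach() only records and merges constraints, and the first map does the allocation:

/*
 * Sketch only: my_alloc_contiguous()/my_alloc_scattered() stand in for
 * whatever allocator the exporter actually uses.
 */
#include <linux/mutex.h>
#include <linux/errno.h>
#include <linux/types.h>

void *my_alloc_contiguous(size_t size);	/* hypothetical */
void *my_alloc_scattered(size_t size);	/* hypothetical */

struct my_deferred_buffer {
	struct mutex lock;
	void *mem;		/* NULL until the first map */
	bool need_contiguous;	/* merged constraints of attached importers */
	size_t size;
};

/* attach: record the importer's constraint, allocate nothing yet */
static int my_attach(struct my_deferred_buffer *buf, bool wants_contiguous)
{
	int ret = 0;

	mutex_lock(&buf->lock);
	if (buf->mem && wants_contiguous && !buf->need_contiguous)
		ret = -EINVAL;	/* already allocated with weaker constraints */
	else
		buf->need_contiguous |= wants_contiguous;
	mutex_unlock(&buf->lock);
	return ret;
}

/* first map: allocate with the most restrictive constraints seen so far */
static int my_map(struct my_deferred_buffer *buf)
{
	int ret = 0;

	mutex_lock(&buf->lock);
	if (!buf->mem)
		buf->mem = buf->need_contiguous ?
			my_alloc_contiguous(buf->size) :
			my_alloc_scattered(buf->size);
	if (!buf->mem)
		ret = -ENOMEM;
	mutex_unlock(&buf->lock);
	return ret;
}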
You can't reallocate the buffer because it's in-use by another importer. There is no way to revoke the buffer from the other importer. So there is no way to satisfy this importer's requirements.
You're right; but with a deferred allocation mechanism, this constraint-sharing can at least help decide on the most restrictive allocation at the time of the first map(), and if an importer with more relaxed constraints attaches later, it can still use the same buffer. A more restrictive importer would still not be allowed, but in that case the exporter can refuse that importer's attach(), and feedback to userspace is possible.
What I'm showing is that the idea that exporting these parameters fixes some problem is just an illusion - it may work for the single importer case, but doesn't for the multiple importer case.
Importers really have two choices here: either they accept what the exporter is giving them, or they reject it.
I agree completely with that.
In the non-deferred allocation case, these constraints have no meaning, since the allocation will always depend on the first subsystem to attach, irrespective of the exporter's ability to allocate from different allocators with different constraints; for example, if a subsystem with relaxed constraints attaches first, a later attach() request with more restrictive constraints will fail, even though the exporter might have been able to allocate with the more restricted constraints.
With deferred allocation, on the other hand, the exporter at least gets the ability to choose the allocation mechanism based on the most restrictive of the attached importers' constraints, so the order of attach() ceases to matter.
The other issue here is that DMA scatterlists are _not_ really that determinable in terms of number of entries when it comes to systems with system IOMMUs. System IOMMUs, which should be integrated into the DMA API, are permitted to coalesce entries in the physical page range. For example:
	nsg = 128;
	n = dma_map_sg(dev, sg, nsg, DMA_TO_DEVICE);
Here, n might be 4 if the system IOMMU has been able to coalesce the 128 entries down to 4 IOMMU entries - and that means for DMA purposes, only the first four scatterlist entries should be walked (this is why dma_map_sg() returns a positive integer when mapping.)
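In other words, a correct caller walks the returned count, not the original one. A small sketch (the function name and the pr_info() are just for illustration):

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>
#include <linux/printk.h>

static void my_dma_map_and_walk(struct device *dev, struct scatterlist *sgl,
				int nsg)
{
	struct scatterlist *sg;
	int i, n;

	n = dma_map_sg(dev, sgl, nsg, DMA_TO_DEVICE);
	if (n == 0)
		return;		/* mapping failed */

	/*
	 * Walk only the 'n' segments the DMA API reports: the IOMMU may
	 * have coalesced the original 'nsg' entries into fewer segments.
	 */
	for_each_sg(sgl, sg, n, i)
		pr_info("seg %d: dma 0x%llx len %u\n", i,
			(unsigned long long)sg_dma_address(sg),
			sg_dma_len(sg));

	/* unmap still takes the original number of entries */
	dma_unmap_sg(dev, sgl, nsg, DMA_TO_DEVICE);
}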
Each struct device has a set of parameters which control how the IOMMU entries are coalesced:
struct device_dma_parameters {
	/*
	 * a low level driver may set these to teach IOMMU code about
	 * sg limitations.
	 */
	unsigned int max_segment_size;
	unsigned long segment_boundary_mask;
};
and this is independent of the dma_buf API. It doesn't indicate the maximum number of segments, and as I've shown above, there is no way to say "I want a scatterlist for this memory with only 32 segments", so it's totally unclear how an exporter would limit that.
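For reference, this is how a driver typically feeds those two parameters to the DMA API (the limit values below are arbitrary examples, and dma_parms would normally live in the driver's private data rather than a static):

#include <linux/device.h>
#include <linux/dma-mapping.h>

static struct device_dma_parameters my_dma_parms;

static int my_set_dma_limits(struct device *dev)
{
	int ret;

	/* dma_set_*() store into dev->dma_parms, so hook it up first */
	dev->dma_parms = &my_dma_parms;

	/* example: segments must not cross a 4 GiB boundary ... */
	ret = dma_set_seg_boundary(dev, 0xffffffffUL);
	if (ret)
		return ret;

	/* ... and must not be larger than 64 KiB */
	return dma_set_max_seg_size(dev, 0x10000);
}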
The only thing an exporter could do would be to fail if the buffer didn't end up with fewer than the requested number of scatterlist entries, which is something the importer can check too.
You're right in that, after the allocation is done, an exporter can only fail a more restrictive attach request, but I don't think the importer has any way of knowing about the current allocation? Unless I misunderstood something.
Right.
One of the main problems end-users are faced with today is that they do not know which device should be the exporter of buffers and which should be the importer. This depends on the constraints and right now applications have no way of knowing this. It's nuts that this hasn't been addressed yet since it is the main complaint I am getting.
One of the ways to try and solve this is via the deferred allocation mechanism described above; I hope it makes sense to you all, but if it doesn't, may I request you to please help me understand why not?
It's nuts that we've ended up in this situation in the first place. This was bound to happen as soon as dma_buf sharing was introduced, because it immediately introduced this problem. I don't think there is any easy solution to it, and what's being proposed with flags and other stuff is just trying to paper over the problem.
This was the first thing raised in the initial discussions. My suggestion at the time was to give userspace limited information about the buffer restrictions: Physically contiguous, scatter-gather and 'weird'. But obviously you need segment_boundary_mask and max_segment_size as well.
And the application can decide based on that info which device has the most restrictive requirements and make that the exporter.
I am not sure whether there is any sense in exporting the max_segment_count to userspace (probably not), but I see no reason why we can't set it internally. That said, a single flag is OK for me as well.
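If it were exposed, the minimum set of information being discussed here is small. Purely as a hypothetical sketch (no such uAPI exists or is being proposed here), it would be something like:

/* Hypothetical sketch only - not an existing or proposed uAPI. */
#include <linux/types.h>

enum my_buf_layout {
	MY_BUF_CONTIGUOUS,	/* one physically contiguous chunk */
	MY_BUF_SCATTER_GATHER,	/* a plain scatterlist is fine */
	MY_BUF_WEIRD,		/* anything more exotic */
};

struct my_buf_constraints {
	__u32 layout;			/* enum my_buf_layout */
	__u32 max_segment_size;		/* mirrors device_dma_parameters */
	__u64 segment_boundary_mask;	/* mirrors device_dma_parameters */
};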
What you're actually asking is that each dmabuf exporting subsystem needs to publish their DMA parameters to userspace, and userspace then gets to decide which dmabuf exporter should be used.
Yes, that was the initial plan.
In the absence of deferred allocation, that could be the other option. With deferred allocation, we could try to keep this internal to the kernel.
That's not a dmabuf problem, that's a subsystem problem,
Well, yes, but it doesn't hurt to sync which DMA parameters are exposed with what dma-buf uses.
but even so, we don't have a standardised way to export that information (and I'd suspect that it would be very difficult to get agreements between subsystems on a standard ioctl and/or data structure.) In my experience, getting cross-subsystem agreement in the kernel with anything is very difficult, you normally end up with 60% of people agreeing, and the other 40% going off and doing something completely different because they object to it (figures vary, 90% of all statistics are made up on the spot!)
I don't care which ioctl or other mechanism a subsystem uses to expose the information. Each subsystem should design their own method for that. Imposing a standard API for that generally doesn't work for the reasons you give.
But deciding *which* minimum set of information is exposed, that is another matter and that should be synced. And the easiest starting point for that is that the device will store that information internally in device_dma_parameters.
The various subsystems can then make APIs to expose that info.
Regards,
Hans
Best regards, Sumit.