On Wed, Jun 03, 2015 at 08:39:55AM +0200, Hans Verkuil wrote:
Hi Sumit,
On 05/05/2015 04:41 PM, Sumit Semwal wrote:
Hi Russell, everyone,
First up, sincere apologies for being awol for sometime; had some personal / medical things to take care of, and then I thought I'd wait for the merge window to get over before beginning to discuss this again.
On 11 February 2015 at 21:53, Russell King - ARM Linux linux@arm.linux.org.uk wrote:
On Wed, Feb 11, 2015 at 01:20:24PM +0100, Marek Szyprowski wrote:
Hello,
On 2015-02-11 12:12, Russell King - ARM Linux wrote:
Which is a damn good reason to NAK it - by that admission, it's a half-baked idea.
If all we want to know is whether the importer can accept only contiguous memory or not, make a flag to do that, and allow the exporter to test this flag. Don't over-engineer this to make it _seem_ like it can do something that it actually totally fails with.
As I've already pointed out, there's a major problem if you have already had a less restrictive attachment which has an active mapping, and a new more restrictive attachment comes along later.
It seems from Rob's descriptions that we also need another flag in the importer to indicate whether it wants to have a valid struct page in the scatter list, or whether it (correctly) uses the DMA accessors on the scatter list - so that exporters can reject importers which are buggy.
Okay, but flag-based approach also have limitations.
Yes, the flag-based approach doesn't let you describe in detail what the importer can accept - which, given the issues that I've raised is a *good* thing. We won't be misleading anyone into thinking that we can do something that's really half-baked, and which we have no present requirement for.
This is precisely what Linus talks about when he says "don't over- engineer" - if we over-engineer this, we end up with something that sort-of works, and that's a bad thing.
The Keep It Simple approach here makes total sense - what are our current requirements - to be able to say that an importer can only accept:
- contiguous memory rather than a scatterlist
- scatterlists with struct page pointers
Does solving that need us to compare all the constraints of each and every importer, possibly ending up with constraints which can't be satisfied? No. Does the flag approach satisfy the requirements? Yes.
So, for basic constraint-sharing, we'll just go with the flag based approach, with a flag (best place for it is still dev->dma_params I suppose) for denoting contiguous or scatterlist. Is that agreed, then? Also, with this idea, of course, there won't be any helpers for trying to calculate constraints; it would be totally the exporter's responsibility to handle it via the attach() dma_buf_op if it wishes to.
What's wrong with the proposed max_segment_count? Many media devices do have a limited max_segment_count and that should be taken into account.
So what happens if you have a dma_buf exporter, and several dma_buf importers. One dma_buf importer attaches to the exporter, and asks for the buffer, and starts making use of the buffer. This export has many scatterlist segments.
Another dma_buf importer attaches to the same buffer, and now asks for the buffer, but the number of scatterlist segments exceeds it's requirement.
You can't reallocate the buffer because it's in-use by another importer. There is no way to revoke the buffer from the other importer. So there is no way to satisfy this importer's requirements.
What I'm showing is that the idea that exporting these parameters fixes some problem is just an illusion - it may work for the single importer case, but doesn't for the multiple importer case.
Importers really have two choices here: either they accept what the exporter is giving them, or they reject it.
The other issue here is that DMA scatterlists are _not_ really that determinable in terms of number of entries when it comes to systems with system IOMMUs. System IOMMUs, which should be integrated into the DMA API, are permitted to coalesce entries in the physical page range. For example:
nsg = 128; n = dma_map_sg(dev, sg, nsg, DMA_TO_DEVICE);
Here, n might be 4 if the system IOMMU has been able to coalesce the 128 entries down to 4 IOMMU entries - and that means for DMA purposes, only the first four scatterlist entries should be walked (this is why dma_map_sg() returns a positive integer when mapping.)
Each struct device has a set of parameters which control how the IOMMU entries are coalesced:
struct device_dma_parameters { /* * a low level driver may set these to teach IOMMU code about * sg limitations. */ unsigned int max_segment_size; unsigned long segment_boundary_mask; };
and this is independent of the dma_buf API. This doesn't indicate the maximum number of segments, but as I've shown above, it's not something that you can say "I want a scatterlist for this memory with only 32 segments" so it's totally unclear how an exporter would limit that.
The only thing an exporter could do would be to fail the export if the buffer didn't end up having fewer than the requested scatterlist entries, which is something the importer can do too.
One of the main problems end-users are faced with today is that they do not know which device should be the exporter of buffers and which should be the importer. This depends on the constraints and right now applications have no way of knowing this. It's nuts that this hasn't been addressed yet since it is the main complaint I am getting.
IT's nuts that we've ended up in this situation in the first place. This was bound to happen as soon as the dma_buf sharing was introduced, because it immediately introduced this problem. I don't think there is any easy solution to it, and what's being proposed with flags and other stuff is just trying to paper over the problem.
What you're actually asking is that each dmabuf exporting subsystem needs to publish their DMA parameters to userspace, and userspace then gets to decide which dmabuf exporter should be used.
That's not a dmabuf problem, that's a subsystem problem, but even so, we don't have a standardised way to export that information (and I'd suspect that it would be very difficult to get agreements between subsystems on a standard ioctl and/or data structure.) In my experience, getting cross- subsystem agreement in the kernel with anything is very difficult, you normally end up with 60% of people agreeing, and the other 40% going off and doing something completely different because they object to it (figures vary, 90% of all statistics are made up on the spot!)