On Thu, Jan 08, 2026 at 10:19:18AM +0800, Ming Lei wrote:
> > The feature is in no way nvme specific. nvme is just the initial
> > underlying driver. It makes total sense to support this for any high
> > performance block device, and to pass it through file systems.
>
> But why does the FS care about the dma buffer attachment? The high
> performance host controller is exactly the dma buffer attachment point.
I can't parse what you're trying to say here.
> If the callback is added in `struct file_operations` for wiring the dma buffer
> and the importer (host controller), you will see it is hard to make it cross
> device mapper/raid or other stackable block devices.
Why?
But even when not stacking, the registration still needs to go
through the file system for a single device, never mind multiple
devices controlled by the file system.
On 1/4/26 02:42, Ming Lei wrote:
> On Thu, Dec 04, 2025 at 02:10:25PM +0100, Christoph Hellwig wrote:
>> On Thu, Dec 04, 2025 at 12:09:46PM +0100, Christian König wrote:
>>>> I find the naming pretty confusing as well. But what this does is to
>>>> tell the file system/driver that it should expect a future
>>>> read_iter/write_iter operation that takes data from / puts data into
>>>> the dmabuf passed to this operation.
>>>
>>> That explanation makes much more sense.
>>>
>>> The remaining question is why does the underlying file system / driver
>>> need to know that it will get addresses from a DMA-buf?
>>
>> This eventually ends up calling dma_buf_dynamic_attach and provides
>> a way to find the dma_buf_attachment later in the I/O path.
>
> Maybe it can be named ->dma_buf_attach()? For wiring the dma-buf and the
> importer side (nvme).
Yeah, that would make it much cleaner.
Also, some higher-level documentation would certainly help.
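
For illustration, a registration hook of the kind discussed above might look
roughly like the sketch below. Only dma_buf_get()/dma_buf_put(),
dma_buf_dynamic_attach() and struct dma_buf_attach_ops are the existing
dma-buf API; the op name, example_get_dma_device() and struct
example_dmabuf_reg are invented placeholders, not anything in the posted
patches.

/*
 * Hypothetical sketch only: a file_operations-level hook that wires a
 * dma-buf to the importing device (e.g. the nvme controller behind the
 * file).  Everything named example_* is made up for illustration.
 */
#include <linux/dma-buf.h>
#include <linux/fs.h>

struct example_dmabuf_reg {
	struct dma_buf			*dmabuf;
	struct dma_buf_attachment	*attach;
};

/* importer callbacks; move_notify is mandatory for dynamic attachments */
static void example_move_notify(struct dma_buf_attachment *attach)
{
	/* invalidate any cached sg_table / DMA addresses here */
}

static const struct dma_buf_attach_ops example_attach_ops = {
	.allow_peer2peer	= true,
	.move_notify		= example_move_notify,
};

/* would be hooked up as the ->dma_buf_attach() file operation */
static int example_dma_buf_attach(struct file *file, int dmabuf_fd,
				  struct example_dmabuf_reg *reg)
{
	/* hypothetical helper resolving the importing device, e.g. nvme ctrl */
	struct device *dma_dev = example_get_dma_device(file);
	struct dma_buf *dmabuf;
	struct dma_buf_attachment *attach;

	dmabuf = dma_buf_get(dmabuf_fd);
	if (IS_ERR(dmabuf))
		return PTR_ERR(dmabuf);

	attach = dma_buf_dynamic_attach(dmabuf, dma_dev,
					&example_attach_ops, reg);
	if (IS_ERR(attach)) {
		dma_buf_put(dmabuf);
		return PTR_ERR(attach);
	}

	reg->dmabuf = dmabuf;
	reg->attach = attach;	/* looked up again in the read/write path */
	return 0;
}
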
> But I am wondering why not make it a subsystem interface, such as an nvme
> ioctl; then the whole implementation can be simplified a lot. It is reasonable
> because the subsystem is exactly the side consuming/importing the dma-buf.
Yeah, it occurred to me as well that it might be better if it were more nvme specific.
Regards,
Christian.
>
>
> Thanks,
> Ming
>
On 12/19/25 16:58, Maxime Ripard wrote:
> On Fri, Dec 19, 2025 at 02:50:50PM +0100, Christian König wrote:
>> On 12/19/25 11:25, Maxime Ripard wrote:
>>> On Mon, Dec 15, 2025 at 03:53:22PM +0100, Christian König wrote:
>>>> On 12/15/25 14:59, Maxime Ripard wrote:
>> ...
>>>>>>> The shared ownership is indeed broken, but it's not more or less broken
>>>>>>> than, say, memfd + udmabuf, and I'm sure plenty of others.
>>>>>>>
>>>>>>> So we really improve the common case, but only make the "advanced" cases
>>>>>>> slightly more broken than they already are.
>>>>>>>
>>>>>>> Would you disagree?
>>>>>>
>>>>>> I strongly disagree. As far as I can see there is a huge chance we
>>>>>> break existing use cases with that.
>>>>>
>>>>> Which ones? And what about the ones that are already broken?
>>>>
>>>> Well, everybody who expects that driver resources are *not* accounted to memcg.
>>>
>>> Which is a thing only because these buffers have never been accounted
>>> for in the first place.
>>
>> Yeah, completely agree. By not accounting it for such a long time we
>> ended up with people depending on this behavior.
>>
>> Not nice, but that's what it is.
>>
>>> So I guess the conclusion is that we shouldn't
>>> even try to do memory accounting, because someone somewhere might not
>>> expect that one of their applications would take too much RAM in the
>>> system?
>>
>> Well, we do need some kind of solution to the problem: either having
>> some setting where you say "this memcg limit is inclusive/exclusive
>> of device driver allocated memory", or having a completely separate
>> limit for device driver allocated memory.
>
> A device driver memory specific limit sounds like a good idea because it
> would make it easier to bridge the gap with dmem.
Completely agree, but that approach was rejected by the cgroups people.
I mean, we can already use udmabuf to allocate memcg-accounted system memory, which can then be imported into device drivers.
So I don't see much reason why we should account dma-buf heaps and driver interfaces to memcg as well; we just need some way to limit them.
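
As a concrete illustration of that udmabuf route, a minimal userspace sketch
(error handling trimmed, size assumed to be a multiple of the page size)
could look like this; struct udmabuf_create, UDMABUF_CREATE and
UDMABUF_FLAGS_CLOEXEC are the existing /dev/udmabuf uapi:

/* Sketch: memfd-backed, memcg-accounted memory exported as a dma-buf. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/udmabuf.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int export_memfd_as_dmabuf(size_t size)
{
	int memfd, udmabuf_dev, buf_fd;
	struct udmabuf_create create = { 0 };

	/* charged to the caller's memcg like any other shmem allocation */
	memfd = memfd_create("example-buf", MFD_ALLOW_SEALING);
	ftruncate(memfd, size);
	/* udmabuf requires the memfd size to be sealed */
	fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

	udmabuf_dev = open("/dev/udmabuf", O_RDWR);

	create.memfd  = memfd;
	create.flags  = UDMABUF_FLAGS_CLOEXEC;
	create.offset = 0;
	create.size   = size;

	/* the returned fd is a dma-buf that device drivers can import */
	buf_fd = ioctl(udmabuf_dev, UDMABUF_CREATE, &create);

	close(udmabuf_dev);
	close(memfd);
	return buf_fd;
}
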
Regards,
Christian.
>
> Happy holidays,
> Maxime
On Tue, Jan 06, 2026 at 07:51:12PM +0000, Pavel Begunkov wrote:
>> But I am wondering why not make it a subsystem interface, such as an nvme
>> ioctl; then the whole implementation can be simplified a lot. It is reasonable
>> because the subsystem is exactly the side consuming/importing the dma-buf.
>
> It's not an nvme specific interface, and so a file op was much more
> convenient.
It is the much better abstraction. Also, the nvme subsystem is not
an actor, and registering things to the subsystem does not work.
The nvme controller is the entity that does the dma mapping, and this
interface works very well for that.
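
To make that last point concrete, an attachment created against the
controller's struct device at registration time would later be mapped for DMA
in the I/O path. A rough sketch using only the stock dma-buf helpers follows;
the example_* names are placeholders, and the dynamic-importer locking /
move_notify interplay is deliberately elided:

/*
 * Rough sketch of the mapping step in the I/O path: the attachment was
 * created against the nvme controller's struct device at registration
 * time, so mapping it yields DMA addresses valid for that controller.
 */
#include <linux/dma-buf.h>
#include <linux/scatterlist.h>

static int example_map_for_io(struct dma_buf_attachment *attach,
			      enum dma_data_direction dir,
			      struct sg_table **sgt_out)
{
	struct sg_table *sgt;

	/* the *_unlocked variant takes the dma-buf reservation lock itself */
	sgt = dma_buf_map_attachment_unlocked(attach, dir);
	if (IS_ERR(sgt))
		return PTR_ERR(sgt);

	/* sg_dma_address()/sg_dma_len() entries then feed the PRPs/SGLs */
	*sgt_out = sgt;
	return 0;
}

static void example_unmap_after_io(struct dma_buf_attachment *attach,
				   struct sg_table *sgt,
				   enum dma_data_direction dir)
{
	dma_buf_unmap_attachment_unlocked(attach, sgt, dir);
}
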