On Sun, Mar 17, 2024 at 7:03 PM Christoph Hellwig hch@infradead.org wrote:
On Mon, Mar 04, 2024 at 06:01:37PM -0800, Mina Almasry wrote:
From: Jakub Kicinski kuba@kernel.org
The page providers which try to reuse the same pages will need to hold onto the ref, even if page gets released from the pool - as in releasing the page from the pp just transfers the "ownership" reference from pp to the provider, and provider will wait for other references to be gone before feeding this page back into the pool.
The word hook always rings a giant warning bell for me, and looking into this series I am concerned indeed.
The only provider provided here is the dma-buf one, and that basically is the only sensible one for the documented design.
Sorry, I don't mean to argue, but as David mentioned, there are some plans, both in the works and not yet in the works, to extend this to other memory types. David mentioned io_uring & Jakub's huge page use cases, which may want to re-use this design. I have an additional one in mind, which is extending devmem TCP to storage devices. Currently storage devices do not support dmabuf, and my understanding is that it's very hard to add; NVMe uses pci_p2pdma instead. I wonder if it's possible to extend devmem TCP in the future to support pci_p2pdma, and through it NVMe devices.
Additionally, I've been thinking about a use case of limiting the amount of memory the net stack can use. Currently the page pool is free to allocate as much memory as it wants from the buddy allocator. This may be undesirable in very low memory setups such as overcommitted VMs. We can imagine a memory provider that allows allocation only if the page_pool is below a certain limit. We can also imagine a memory provider that preallocates memory and only uses that pinned pool. None of these are in the works at the moment, but they are examples of how this can be (reasonably?) extended.
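To make the "allocate only below a certain limit" idea concrete, here is a minimal user-space sketch. It is only a model of the concept, not kernel code: `capped_provider`, `capped_alloc_page` and `capped_free_page` are hypothetical names invented for illustration, `malloc` stands in for the buddy allocator, and the limit is counted in pages by assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space model of a capped memory provider.
 * None of these names are real page_pool or kernel APIs. */
struct capped_provider {
	size_t limit;     /* maximum number of pages outstanding at once */
	size_t in_use;    /* pages currently handed out */
	size_t page_size; /* size of each backing buffer in bytes */
};

static void capped_provider_init(struct capped_provider *p,
				 size_t limit, size_t page_size)
{
	assert(page_size > 0);
	p->limit = limit;
	p->in_use = 0;
	p->page_size = page_size;
}

/* Returns a buffer, or NULL when the cap is reached or malloc fails. */
static void *capped_alloc_page(struct capped_provider *p)
{
	void *page;

	if (p->in_use >= p->limit)
		return NULL; /* refuse instead of growing past the cap */

	page = malloc(p->page_size); /* stand-in for the real allocator */
	if (page)
		p->in_use++;
	return page;
}

/* Gives a buffer back and frees up one slot under the cap. */
static void capped_free_page(struct capped_provider *p, void *page)
{
	if (!page)
		return;
	assert(p->in_use > 0);
	free(page);
	p->in_use--;
}
```

With a limit of 2, the third allocation returns NULL until one of the first two buffers is freed. A preallocating variant would fill a fixed array in the init function and hand out entries from it instead of calling the allocator on demand.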
So instead of adding hooks that random proprietary crap can hook into,
To be completely honest, I'm unsure how one would design hooks for proprietary code to hook into. I think that would be done by way of EXPORT_SYMBOL? We do not export these hooks, nor do we plan to at the moment.
why not hard code the dma-buf provider and just use a flag? That'll also avoid expensive indirect calls.
Thankfully the indirect calls do not seem to be an issue. We've been able to hit 95% line rate with devmem TCP, and I think the remaining 5% is due to a bottleneck unrelated to the indirect calls. Page_pool benchmarks show a very minor degradation in the fast path, so small it may be just noise in the measurement (may!):
https://lore.kernel.org/netdev/20240305020153.2787423-1-almasrymina@google.c...
This is because the code path that makes the indirect calls to allocate is the slow path. The page_pool recycles netmem aggressively, so the fast path rarely needs to go to the provider.
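The slow-path argument can be illustrated with a toy user-space model. This is a sketch under stated assumptions, not the actual page_pool code: `toy_pool`, `toy_provider_ops` and the 8-slot recycle cache are all hypothetical, and `malloc` stands in for whatever the provider really allocates. The point it shows is only that the indirect call through the ops table happens on a cache miss, while recycled buffers are served without it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_SLOTS 8 /* arbitrary size chosen for the illustration */

struct toy_pool;

/* Hypothetical provider hooks; not the real kernel ops structure. */
struct toy_provider_ops {
	void *(*alloc)(struct toy_pool *pool);
	void (*release)(struct toy_pool *pool, void *buf);
};

struct toy_pool {
	const struct toy_provider_ops *ops;
	void *cache[CACHE_SLOTS];  /* recycled buffers ready for reuse */
	size_t cached;             /* number of valid entries in cache[] */
	unsigned long slow_allocs; /* how often the indirect alloc ran */
};

static void *toy_malloc_alloc(struct toy_pool *pool)
{
	(void)pool;
	return malloc(64);
}

static void toy_malloc_release(struct toy_pool *pool, void *buf)
{
	(void)pool;
	free(buf);
}

static const struct toy_provider_ops toy_malloc_ops = {
	.alloc = toy_malloc_alloc,
	.release = toy_malloc_release,
};

static void toy_pool_init(struct toy_pool *pool,
			  const struct toy_provider_ops *ops)
{
	assert(ops && ops->alloc && ops->release);
	pool->ops = ops;
	pool->cached = 0;
	pool->slow_allocs = 0;
}

static void *toy_pool_get(struct toy_pool *pool)
{
	if (pool->cached > 0)
		return pool->cache[--pool->cached]; /* fast path: no indirect call */

	pool->slow_allocs++;
	return pool->ops->alloc(pool); /* slow path: indirect call */
}

static void toy_pool_put(struct toy_pool *pool, void *buf)
{
	if (!buf)
		return;
	if (pool->cached < CACHE_SLOTS) {
		pool->cache[pool->cached++] = buf; /* recycle for the next get */
		return;
	}
	pool->ops->release(pool, buf); /* cache full: hand back to provider */
}

static void toy_pool_destroy(struct toy_pool *pool)
{
	while (pool->cached > 0)
		pool->ops->release(pool, pool->cache[--pool->cached]);
}
```

In this model, a loop of 1000 get/put pairs takes the indirect `alloc` call exactly once; every later get is served from the recycle cache. How closely that ratio matches a real workload depends on how often buffers actually come back to the pool.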