Sorry for the late reply.
On Wed, May 1, 2024 at 12:55 AM Christoph Hellwig hch@infradead.org wrote:
Still NAK to creating aⅺbitrary hooks here.
Is the concern still that folks may be able to hook proprietary stuff into this like you mentioned before[1]?
I don't see how that can be done as currently written. The page_pool grabs the memory_provider_ops from the netdev_rx_queue struct managed by core net stack and not really overridable by external modules. When the netdev creates the page_pool, it gets the core-managed netdev_rx_queue via something like __netif_get_rx_queue() and passes that to page_pool_create().
We could make the memory_provider_ops even more opaque by only allowing the device to only pass in the netdev + queue num to the page_pool_create, and have the page_pool_create query the netdev_rx_queue struct, to make sure we're getting the one managed by core.
Long story short is that as currently written I think it's pretty much impossible for someone to plug in a proprietary out-of-tree memory provider using these hooks, and if desired I can change the code slightly to make it even more difficult (but maybe that's pointless, I don't think it's possible even in the current iteration). The only way to get a memory_provider_ops in is to seek to merge it as part of the kernel with community approval. Is there something I'm missing here?
This should be a page or dmabuf pool and not an indirect call abstraction allowing random crap to hook into it.
What is the suggested fix here? I do something like:
cp net/core/page_pool.c net/core/dmabuf_pool.c
and then modify it such that the net stack maintains 2 page_pools? There are a lot of cons to that:
1. Code duplication/maintenance (page_pool.c + dmabuf_pool.c will look very similar).
2. The hooks enable more use cases than dmabuf_pool + standard pages. In addition to those, I'm thinking of (but not working on): a. Limited memory pools. I.e. a page_pool limited to a certain amount of memory (for overcommited VMs). b. dmabuf pools with GPU virtual addresses. Currently we seek to support dmabuf memory where the virtual address is an offset into the dmabuf for CPU access. For GPU memory accessible to the GPU we need dmabuf memory where the virtual address is the GPU virtual address.
3. Support for multiple page_pools is actually more proprietary friendly IMO. Currently the page_pool is internal to core. If we start adding additional pools we need to have some uniform behavior between all the pools so core can operate on memory that originated from any one of them. In that case it becomes actually easier for someone to develop an out of tree pool and use it from their out-of-tree driver and as long as their out of tree page_pool behaves similarly enough to the decided uniform behavior, it may be able to fool core into thinking it's an in-tree pool...
[1] https://lore.kernel.org/linux-kernel/ZfegzB341oNc_Ocz@infradead.org/
-- Thanks, Mina