On Tue, May 28, 2024 at 7:42 PM Pavel Begunkov asml.silence@gmail.com wrote:
On 5/28/24 18:36, Mina Almasry wrote:
On Wed, May 22, 2024 at 11:02 PM David Wei dw@davidwei.uk wrote:
...
*/
if (!skb_frag_net_iov(frag)) {
net_err_ratelimited("Found non-dmabuf skb with net_iov");
err = -ENODEV;
goto out;
}
niov = skb_frag_net_iov(frag);
Sorry if we've already discussed this.
We have this additional hunk:
- if (niov->pp->mp_ops != &dmabuf_devmem_ops) {
err = -ENODEV;
goto out;
- }
In case one of our skbs end up here, skb_frag_is_net_iov() and !skb_frags_readable(). Does this even matter? And if so then is there a better way to distinguish between our two types of net_iovs?
Thanks for bringing this up, yes, maybe we do need a way to distinguish, but it's not 100% critical, no? It's mostly for debug checking?
Not really. io_uring definitely wouldn't want the devmem completion path taking an iov and basically stashing it into a socket (via refcount), that's a lifetime problem. Nor we'd have all the binding/chunk_owner parts you have and probably use there.
Same the other way around, you don't want io_uring grabbing your iov and locking it up, it won't even be possible to return it back. We also may want to have access to backing pages for different fallback purposes, for which we need to know the iov came from this particular ring.
It shouldn't happen for a behaving user, but most of it would likely be exploitable one way or another.
I would say add a helper, like net_iov_is_dmabuf() or net_iov_is_io_uring().
We're verifying that the context the iov bound to is the current context (e.g. io_uring instance) we're executing from. If we can agree that mp_priv should be a valid pointer, the check would look like:
if (pp->mp_priv == io_uring_ifq)
Checking for niov->pp->mp_ops seems a bit hacky to me, and may be outright broken. IIRC niov's can be disconnected from the page_pool via page_pool_clear_pp_info(), and niov->pp may be null. Abstractly
It's called in the release path like page_pool_return_page(), I can't imagine someone can sanely clear it while inflight ...
Ah, yes, I wasn't sure what happens to the inflight pages when the pp gets destroyed. I thought maybe the pp would return the inflight pages, but it looks to me like the pp just returns the free pages in the alloc cache and the ptr_ring, and the pp stays alive until all the inflight pages are freed. So indeed niov->pp should always be valid while it's in flight. I still prefer to have the memory type to be part of the niov itself, but I don't feel strongly at this point; up to you.
speaking the niov type maybe should be a property of the niov itself, and not the pp the niov is attached to.
... but I can just stash all that in niov->owner, struct dmabuf_genpool_chunk_owner you have. That might be even cleaner. And regardless of it I'll be making some minor changes to the structure to make it generic.
It is not immediately obvious to me what the best thing to do here is, maybe it's best to add a flag to niov or to use niov->pp_magic for this.
I would humbly ask that your follow up patchset takes care of this bit, if possible. I think mine is doing quite a bit of heavy lifting as is (and I think may be close to ready?), when it comes to concerns of devmem + io_uring coexisting if you're able to take care, awesome, if not, I can look into squashing some fix.
Let it be this way then. It's not a problem while there is only one such a provider.
Thank you!