On Thu, Jun 27, 2024 at 04:32:42PM -0700, Andrew Morton wrote:
On Thu, 27 Jun 2024 19:19:40 -0400 Peter Xu peterx@redhat.com wrote:
Yang,
On Thu, Jun 27, 2024 at 03:14:13PM -0700, Yang Shi wrote:
The try_grab_folio() is supposed to be used in fast path and it elevates folio refcount by using add ref unless zero. We are guaranteed to have at least one stable reference in slow path, so the simple atomic add could be used. The performance difference should be trivial, but the misuse may be confusing and misleading.
This first paragraph is IMHO misleading itself..
I think we should mention upfront the important bit, on the user impact.
Here IMO the user impact should be: Linux may fail longterm pin in some releavnt paths when applied over CMA reserved blocks. And if to extend a bit, that include not only slow-gup but also the new memfd pinning, because both of them used try_grab_folio() which used to be only for fast-gup.
It's still unclear how users will be affected. What do the *users* see? If it's a slight slowdown, do we need to backport this at all?
The user will see the pin fails, for gpu-slow it further triggers the WARN right below that failure (as in the original report):
folio = try_grab_folio(page, page_increm - 1, foll_flags); if (WARN_ON_ONCE(!folio)) { <------------------------ here /* * Release the 1st page ref if the * folio is problematic, fail hard. */ gup_put_folio(page_folio(page), 1, foll_flags); ret = -EFAULT; goto out; }
For memfd pin and hugepd paths, they should just observe GUP failure on those longterm pins, and it'll be the caller context to decide what user can see, I think.
The patch itself looks mostly ok to me.
There's still some "cleanup" part mangled together, e.g., the real meat should be avoiding the folio_is_longterm_pinnable() check in relevant paths. The rest (e.g. switch slow-gup / memfd pin to use folio_ref_add() not try_get_folio(), and renames) could be good cleanups.
So a smaller fix might be doable, but again I don't have a strong opinion here.
The smaller the better for backporting, of course.
I think a smaller version might be yangge's patch, plus Yang's hugepd "fast" parameter for the hugepd stack, then hugepd can also use try_grab_page(). memfd-pin change can be a separate small patch perhaps squashed.
I'll leave how to move on to Yang.
Thanks,