On Tue 12-11-19 20:26:56, John Hubbard wrote:
Introduce pin_user_pages*() variations of get_user_pages*() calls, and also pin_longterm_pages*() variations.
These variants all set FOLL_PIN, which is also introduced, and thoroughly documented.
The pin_longterm*() variants also set FOLL_LONGTERM, in addition to FOLL_PIN:
    pin_user_pages()
    pin_user_pages_remote()
    pin_user_pages_fast()

    pin_longterm_pages()
    pin_longterm_pages_remote()
    pin_longterm_pages_fast()
All pages that are pinned via the above calls, must be unpinned via put_user_page().
The underlying rules are:
- These are gup-internal flags, so the call sites should not directly
set FOLL_PIN nor FOLL_LONGTERM. That behavior is enforced with assertions, for the new FOLL_PIN flag. However, for the pre-existing FOLL_LONGTERM flag, which has some call sites that still directly set FOLL_LONGTERM, there is no assertion yet.
- Call sites that want to indicate that they are going to do DirectIO ("DIO") or something with similar characteristics, should call a get_user_pages()-like wrapper call that sets FOLL_PIN. These wrappers will:
  * Start with "pin_user_pages" instead of "get_user_pages". That makes it easy to find and audit the call sites.
  * Set FOLL_PIN
- For pages that are received via FOLL_PIN, those pages must be returned via put_user_page().
Thanks to Jan Kara and Vlastimil Babka for explaining the 4 cases in this documentation. (I've reworded it and expanded upon it.)
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>  # Documentation
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Thanks for the documentation. It looks great!
diff --git a/mm/gup.c b/mm/gup.c
index 83702b2e86c8..4409e84dff51 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -201,6 +201,10 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	spinlock_t *ptl;
 	pte_t *ptep, pte;
 
+	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
+	if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
+			 (FOLL_PIN | FOLL_GET)))
+		return ERR_PTR(-EINVAL);
 retry:
 	if (unlikely(pmd_bad(*pmd)))
 		return no_page_table(vma, flags);
How does FOLL_PIN result in grabbing a page reference (at least a normal one, for now)? I didn't find that anywhere in this patch, but it is a prerequisite to converting any user to the pin_user_pages() interface, right?
+/**
+ * pin_user_pages_fast() - pin user pages in memory without taking locks
+ *
+ * Nearly the same as get_user_pages_fast(), except that FOLL_PIN is set. See
+ * get_user_pages_fast() for documentation on the function arguments, because
+ * the arguments here are identical.
+ *
+ * FOLL_PIN means that the pages must be released via put_user_page(). Please
+ * see Documentation/vm/pin_user_pages.rst for further details.
+ *
+ * This is intended for Case 1 (DIO) in Documentation/vm/pin_user_pages.rst. It
+ * is NOT intended for Case 2 (RDMA: long-term pins).
+ */
+int pin_user_pages_fast(unsigned long start, int nr_pages,
+			unsigned int gup_flags, struct page **pages)
+{
+	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
+	if (WARN_ON_ONCE(gup_flags & FOLL_GET))
+		return -EINVAL;
+
+	gup_flags |= FOLL_PIN;
+	return internal_get_user_pages_fast(start, nr_pages, gup_flags, pages);
+}
+EXPORT_SYMBOL_GPL(pin_user_pages_fast);
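To make the flag handling of the wrapper above concrete, here is a minimal userspace sketch of it. This is not kernel code: the flag values are placeholders chosen for illustration, and the model_* names (including the stand-in for internal_get_user_pages_fast()) are hypothetical. It only models the "reject FOLL_GET, then add FOLL_PIN" step; it does not model page references at all.

```c
#include <assert.h>
#include <errno.h>

/* Placeholder values for illustration only, not taken from the patch. */
#define FOLL_GET 0x04u
#define FOLL_PIN 0x40000u

/* Flags most recently seen by the hypothetical internal helper below. */
static unsigned int model_last_flags;

/*
 * Hypothetical stand-in for internal_get_user_pages_fast(): it records the
 * flags it was given and pretends that all nr_pages pages were pinned.
 */
static int model_internal_gup_fast(int nr_pages, unsigned int gup_flags)
{
	model_last_flags = gup_flags;
	return nr_pages;
}

/* Models the wrapper: FOLL_GET from the caller is an error, FOLL_PIN is added. */
static int model_pin_user_pages_fast(int nr_pages, unsigned int gup_flags)
{
	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
	if (gup_flags & FOLL_GET)
		return -EINVAL;

	gup_flags |= FOLL_PIN;
	return model_internal_gup_fast(nr_pages, gup_flags);
}

/* Small usage example exercising both the accepted and the rejected path. */
static void model_demo(void)
{
	assert(model_pin_user_pages_fast(4, 0) == 4);
	assert(model_last_flags & FOLL_PIN);
	assert(model_pin_user_pages_fast(4, FOLL_GET) == -EINVAL);
}
```

Under these assumptions the caller never passes FOLL_PIN itself; the wrapper is the only place that sets it, which is what makes the call sites easy to audit.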
I was somewhat wondering about the number of functions you add here. So we have:
    pin_user_pages()
    pin_user_pages_fast()
    pin_user_pages_remote()

and then longterm variants:

    pin_longterm_pages()
    pin_longterm_pages_fast()
    pin_longterm_pages_remote()

and obviously we have gup like:

    get_user_pages()
    get_user_pages_fast()
    get_user_pages_remote()
    ... and some other gup variants ...
I think we really should have pin_* vs get_* variants, as they are very different in terms of guarantees, and after conversion any use of a get_* variant in non-mm code should be closely scrutinized. OTOH the pin_longterm_* variants don't look *that* useful to me; just using pin_* with the FOLL_LONGTERM flag would look OK to me and would somewhat reduce the number of functions, which is already large enough. What do people think? I don't feel too strongly about this but wanted to bring it up.
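As a rough illustration of the two API shapes being compared, here is a userspace sketch (not kernel code). The flag values are placeholders and the sketch_* helpers are hypothetical; each one only computes the flags that would reach the gup internals, assuming a pin_* wrapper that lets the caller pass FOLL_LONGTERM through.

```c
#include <assert.h>

/* Placeholder values for illustration only, not taken from the patch. */
#define FOLL_LONGTERM 0x10000u
#define FOLL_PIN      0x40000u

/* Shape A: a dedicated longterm wrapper sets both flags itself. */
static unsigned int sketch_pin_longterm_pages_flags(unsigned int gup_flags)
{
	return gup_flags | FOLL_PIN | FOLL_LONGTERM;
}

/* Shape B: a single pin wrapper; the caller passes FOLL_LONGTERM if needed. */
static unsigned int sketch_pin_user_pages_flags(unsigned int gup_flags)
{
	return gup_flags | FOLL_PIN;
}

/* Both shapes hand the same flags to the internals for a longterm pin. */
static void sketch_compare(void)
{
	unsigned int a = sketch_pin_longterm_pages_flags(0);
	unsigned int b = sketch_pin_user_pages_flags(FOLL_LONGTERM);

	assert(a == b);
	assert(a == (FOLL_PIN | FOLL_LONGTERM));
}
```

The tradeoff in this sketch is that shape B needs three fewer exported functions, while shape A keeps FOLL_LONGTERM out of the call sites.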
Honza