On 11/13/19 2:00 PM, Dan Williams wrote: ...
Ugh, when did all this HMM-specific manipulation sneak into the generic ZONE_DEVICE path? It used to be gated by pgmap type with its own put_zone_device_private_page(). For example, it's certainly unnecessary and might be broken (would need to check) to call mem_cgroup_uncharge() on a DAX page. ZONE_DEVICE users are not a monolith and the HMM use case leaks pages into code paths that DAX explicitly avoids.
It's been this way for a while and I did not react previously, apologies for that. I think __ClearPageActive(), __ClearPageWaiters(), and mem_cgroup_uncharge() belong behind a device-private conditional. The history here is:
Move some, but not all HMM specifics to hmm_devmem_free():
    2fa147bdbf67 mm, dev_pagemap: Do not clear ->mapping on final put
Remove the clearing of mapping since no upstream consumers needed it:
    b7a523109fb5 mm: don't clear ->mapping in hmm_devmem_free
Add it back in once an upstream consumer arrived:
    7ab0ad0e74f8 mm/hmm: fix ZONE_DEVICE anon page mapping reuse
We're now almost entirely free of ->page_free() callbacks except for that weird nouveau case. Can that FIXME in nouveau_dmem_page_free() also result in killing the ->page_free() callback altogether? In the meantime I'm proposing a cleanup like this:
OK, assuming this is acceptable (no obvious problems jump out at me, and we can also test it with HMM), then how would you like to proceed, as far as patches go: add such a patch as part of this series here, or as a stand-alone patch either before or after this series? Or something else? And did you plan on sending it out as such?
Also, the diffs didn't quite make it through intact to my "git apply", so I'm re-posting the diff in hopes that this time it survives:
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index f9f76f6ba07b..21db1ce8c0ae 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -338,13 +338,7 @@ static void pmem_release_disk(void *__pmem)
 	put_disk(pmem->disk);
 }
 
-static void pmem_pagemap_page_free(struct page *page)
-{
-	wake_up_var(&page->_refcount);
-}
-
 static const struct dev_pagemap_ops fsdax_pagemap_ops = {
-	.page_free		= pmem_pagemap_page_free,
 	.kill			= pmem_pagemap_kill,
 	.cleanup		= pmem_pagemap_cleanup,
 };
diff --git a/mm/memremap.c b/mm/memremap.c
index 03ccbdfeb697..157edb8f7cf8 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -419,12 +419,6 @@ void __put_devmap_managed_page(struct page *page)
 	 * holds a reference on the page.
 	 */
 	if (count == 1) {
-		/* Clear Active bit in case of parallel mark_page_accessed */
-		__ClearPageActive(page);
-		__ClearPageWaiters(page);
-
-		mem_cgroup_uncharge(page);
-
 		/*
 		 * When a device_private page is freed, the page->mapping field
 		 * may still contain a (stale) mapping value. For example, the
@@ -446,10 +440,17 @@ void __put_devmap_managed_page(struct page *page)
 		 * handled differently or not done at all, so there is no need
 		 * to clear page->mapping.
 		 */
-		if (is_device_private_page(page))
-			page->mapping = NULL;
+		if (is_device_private_page(page)) {
+			/* Clear Active bit in case of parallel mark_page_accessed */
+			__ClearPageActive(page);
+			__ClearPageWaiters(page);
 
-		page->pgmap->ops->page_free(page);
+			mem_cgroup_uncharge(page);
+
+			page->mapping = NULL;
+			page->pgmap->ops->page_free(page);
+		} else
+			wake_up_var(&page->_refcount);
 	} else if (!count)
 		__put_page(page);
 }
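In case it helps review, here is a small userspace model of the control flow that the diff leaves behind in __put_devmap_managed_page(). It is only a sketch: struct mock_page, enum mock_pgmap_type, and mock_put_devmap_managed_page() are hypothetical stand-ins invented for illustration, not kernel APIs, and the counters merely record where the real code would call ->page_free() or wake_up_var().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the pgmap type; not the kernel's enum. */
enum mock_pgmap_type { MOCK_DEVICE_PRIVATE, MOCK_FS_DAX };

/* Hypothetical stand-in for struct page; tracks only what the diff touches. */
struct mock_page {
	enum mock_pgmap_type type;
	int refcount;		/* 1-based: a count of 1 means the page is idle */
	bool active;		/* models PageActive */
	bool waiters;		/* models PageWaiters */
	bool charged;		/* models a memcg charge */
	void *mapping;		/* models page->mapping */
	int page_free_calls;	/* counts modelled ->page_free() calls */
	int wakeups;		/* counts modelled wake_up_var() calls */
};

/* Models the post-patch shape of __put_devmap_managed_page(). */
static void mock_put_devmap_managed_page(struct mock_page *page)
{
	int count;

	assert(page->refcount > 0);
	count = --page->refcount;

	if (count == 1) {
		if (page->type == MOCK_DEVICE_PRIVATE) {
			/* Device-private only: flag clearing, uncharge, ->page_free() */
			page->active = false;
			page->waiters = false;
			page->charged = false;
			page->mapping = NULL;
			page->page_free_calls++;
		} else {
			/* Everything else: just wake waiters on the refcount */
			page->wakeups++;
		}
	} else if (count == 0) {
		/* The kernel calls __put_page() here; nothing to model. */
	}
}
```

With this model, dropping a non-device-private page to a count of 1 leaves its flags, charge, and mapping untouched and only records a wakeup, while a device-private page gets the full teardown plus the modelled ->page_free() call.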