On Wed, Jan 29, 2025 at 03:55:07AM +0900, Hyeonggon Yoo wrote:
> Commit b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()")
> skips charging any zswapped base pages when it fails to zswap the entire
> folio.
>
> However, when some base pages are zswapped but zswapping the entire folio
> fails, the zswap operation is rolled back. When freeing zswap entries for
> those pages, zswap_entry_free() uncharges pages that were never charged,
> so zswap charging becomes inconsistent.
>
> This inconsistency triggers two warnings with the following steps:
>
>   # On a machine with 64GiB of RAM and 36GiB of zswap
>   $ stress-ng --bigheap 2
>   # wait until the OOM-killer kills stress-ng
>   $ sudo reboot
>
> The two warnings are:
>
>   in mm/memcontrol.c:163, function obj_cgroup_release():
>     WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1));
>
>   in mm/page_counter.c:60, function page_counter_cancel():
>     if (WARN_ONCE(new < 0, "page_counter underflow: %ld nr_pages=%lu\n",
>                   new, nr_pages))
>
> While objcg events should only be accounted for when the entire folio is
> zswapped, objcg charging should be performed regardless. Fix accordingly.
>
> After resolving the inconsistency, these warnings disappear.
>
> Fixes: b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()")
> Cc: stable@vger.kernel.org
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
> ---
> v1->v2:
> - Fixed objcg events being accounted for on zswap failure.
> - Fixed the incorrect description: I had assumed the zswapped base pages
>   would remain stored in zswap, but their zswap entries are freed
>   immediately.
> - Added a comment on why it charges pages that are going to be removed
>   from zswap.
>
>  mm/zswap.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 6504174fbc6a..10b30ac46deb 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1568,20 +1568,26 @@ bool zswap_store(struct folio *folio)
>
>  		bytes = zswap_store_page(page, objcg, pool);
>  		if (bytes < 0)
> -			goto put_pool;
> +			goto charge_zswap;
>  		compressed_bytes += bytes;
>  	}
>
> -	if (objcg) {
> -		obj_cgroup_charge_zswap(objcg, compressed_bytes);
> +	if (objcg)
>  		count_objcg_events(objcg, ZSWPOUT, nr_pages);
> -	}
>
>  	atomic_long_add(nr_pages, &zswap_stored_pages);
>  	count_vm_events(ZSWPOUT, nr_pages);
>
>  	ret = true;
>
> +charge_zswap:
> +	/*
> +	 * Charge zswapped pages even when it failed to zswap the entire folio,
> +	 * because zswap_entry_free() will uncharge them anyway.
> +	 * Otherwise zswap charging will become inconsistent.
> +	 */
> +	if (objcg)
> +		obj_cgroup_charge_zswap(objcg, compressed_bytes);
Thanks for fixing this!
Having to charge just to uncharge right after is annoying. Ideally we'd just clear entry->objcg if we fail before charging, but we don't have a direct reference to the entries here and another tree lookup is not ideal either.
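
For reference, the uncharge in question happens in zswap_entry_free(); a
simplified excerpt (from memory, unrelated teardown elided) showing why a
cleared entry->objcg would skip it:

	static void zswap_entry_free(struct zswap_entry *entry)
	{
		/* ... LRU removal and zpool_free() elided ... */
		if (entry->objcg) {
			/* uncharge the compressed size charged at store time */
			obj_cgroup_uncharge_zswap(entry->objcg, entry->length);
			obj_cgroup_put(entry->objcg);
		}
		/* ... entry freeing and stats updates elided ... */
	}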
I guess we may be able to improve this handling once [1] lands, as we can move the charging logic into zswap_store_folio() where we'd have access to the entries.
For now, would the control flow be easier if we moved the charge ahead of the zswap_store_page() loop instead? There is an existing if (objcg) block there as well. (Rough sketch below.)
[1] https://lore.kernel.org/linux-mm/20241221063119.29140-12-kanchana.p.sridhar@...
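
To make that concrete: since compressed_bytes is only known after the loop
has run, the charge cannot literally precede it, so what I am picturing is
hoisting it ahead of the success-only accounting. A rough, untested sketch
(identifiers as in the quoted diff):

	ssize_t bytes = 0;	/* last zswap_store_page() result */

	for (index = 0; index < nr_pages; ++index) {
		struct page *page = folio_page(folio, index);

		bytes = zswap_store_page(page, objcg, pool);
		if (bytes < 0)
			break;
		compressed_bytes += bytes;
	}

	/*
	 * Charge even a partial store; entries already in the tree are
	 * uncharged again by zswap_entry_free() during rollback.
	 */
	if (objcg)
		obj_cgroup_charge_zswap(objcg, compressed_bytes);

	if (bytes < 0)
		goto put_pool;

	if (objcg)
		count_objcg_events(objcg, ZSWPOUT, nr_pages);

This keeps the failure path a straight fall-through instead of jumping
over the success-only block.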
>  put_pool:
>  	zswap_pool_put(pool);
>  put_objcg:
> --
> 2.47.1