Hi Hyeonggon,
-----Original Message-----
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Sent: Tuesday, January 28, 2025 10:55 AM
To: Sridhar, Kanchana P <kanchana.p.sridhar@intel.com>; Johannes Weiner <hannes@cmpxchg.org>; Yosry Ahmed <yosryahmed@google.com>; Nhat Pham <nphamcs@gmail.com>; Chengming Zhou <chengming.zhou@linux.dev>; Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org; Hyeonggon Yoo <42.hyeyoo@gmail.com>; stable@vger.kernel.org
Subject: [PATCH v2 mm-hotfixes] mm/zswap: fix inconsistent charging when zswap_store_page() fails
Commit b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()") skips charging any zswapped base pages when it fails to zswap the entire folio.
However, when some base pages have already been zswapped and it then fails to zswap the entire folio, the zswap operation is rolled back. When freeing the zswap entries for those pages, zswap_entry_free() uncharges pages that were never charged, causing zswap charging to become inconsistent.
This inconsistency triggers two warnings with the following steps:

  # On a machine with 64GiB of RAM and 36GiB of zswap
  $ stress-ng --bigheap 2
  # wait until the OOM-killer kills stress-ng
  $ sudo reboot
The two warnings are:

in mm/memcontrol.c:163, function obj_cgroup_release():
  WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1));

in mm/page_counter.c:60, function page_counter_cancel():
  if (WARN_ONCE(new < 0, "page_counter underflow: %ld nr_pages=%lu\n",
		new, nr_pages))
While objcg events should only be accounted for when the entire folio is zswapped, objcg charging should be performed regardless. Fix accordingly.
After resolving the inconsistency, these warnings disappear.
Fixes: b7c0ccdfbafd ("mm: zswap: support large folios in zswap_store()")
Cc: stable@vger.kernel.org
Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
v1->v2:
- Fixed objcg events being accounted for on zswap failure.
- Fixed the incorrect description: I had misunderstood that the base pages
  would remain stored in zswap, when in fact their zswap entries are freed
  immediately.
- Added a comment explaining why pages that are about to be removed from
  zswap are still charged.
 mm/zswap.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/mm/zswap.c b/mm/zswap.c
index 6504174fbc6a..10b30ac46deb 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1568,20 +1568,26 @@ bool zswap_store(struct folio *folio)
 		bytes = zswap_store_page(page, objcg, pool);
 		if (bytes < 0)
-			goto put_pool;
+			goto charge_zswap;
 		compressed_bytes += bytes;
 	}
 
-	if (objcg) {
-		obj_cgroup_charge_zswap(objcg, compressed_bytes);
+	if (objcg)
 		count_objcg_events(objcg, ZSWPOUT, nr_pages);
-	}
 
 	atomic_long_add(nr_pages, &zswap_stored_pages);
 	count_vm_events(ZSWPOUT, nr_pages);
 
 	ret = true;
 
+charge_zswap:
+	/*
+	 * Charge zswapped pages even when it failed to zswap the entire folio,
+	 * because zswap_entry_free() will uncharge them anyway.
+	 * Otherwise zswap charging will become inconsistent.
+	 */
+	if (objcg)
+		obj_cgroup_charge_zswap(objcg, compressed_bytes);
Thanks for finding this bug! I am thinking it might make sense to charge and increment the zswap_stored_pages counter in zswap_store_page(). Something like:
diff --git a/mm/zswap.c b/mm/zswap.c
index b84c20d889b1..fd2a72598a8a 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1504,11 +1504,14 @@ static ssize_t zswap_store_page(struct page *page,
 	entry->pool = pool;
 	entry->swpentry = page_swpentry;
 	entry->objcg = objcg;
+	if (objcg)
+		obj_cgroup_charge_zswap(objcg, entry->length);
 	entry->referenced = true;
 	if (entry->length) {
 		INIT_LIST_HEAD(&entry->lru);
 		zswap_lru_add(&zswap_list_lru, entry);
 	}
+	atomic_long_inc(&zswap_stored_pages);
 
 	return entry->length;

@@ -1526,7 +1529,6 @@ bool zswap_store(struct folio *folio)
 	struct obj_cgroup *objcg = NULL;
 	struct mem_cgroup *memcg = NULL;
 	struct zswap_pool *pool;
-	size_t compressed_bytes = 0;
 	bool ret = false;
 	long index;

@@ -1569,15 +1571,11 @@ bool zswap_store(struct folio *folio)
 		bytes = zswap_store_page(page, objcg, pool);
 		if (bytes < 0)
 			goto put_pool;
-		compressed_bytes += bytes;
 	}
 
-	if (objcg) {
-		obj_cgroup_charge_zswap(objcg, compressed_bytes);
+	if (objcg)
 		count_objcg_events(objcg, ZSWPOUT, nr_pages);
-	}
 
-	atomic_long_add(nr_pages, &zswap_stored_pages);
 	count_vm_events(ZSWPOUT, nr_pages);
 
 	ret = true;
What do you think?
Yosry, Nhat, Johannes, please let me know if this would be a cleaner approach. If so, I don't think we would lose much performance by giving up the one-time charge per folio, but please share your thoughts as well.
Thanks, Kanchana
 put_pool:
 	zswap_pool_put(pool);
 put_objcg:

--
2.47.1