A filesystem writeback performance issue was discovered by repeatedly running CPU hotplug operations while a process in a cgroup with memory and io controllers enabled wrote to an ext4 file in a loop.
When a CPU is offlined, the memcg_hotplug_cpu_dead() callback function flushes per-cpu vmstats counters. However, instead of applying a per-cpu counter once to each cgroup in the heirarchy, the per-cpu counter is applied repeatedly just to the nested cgroup. Under certain conditions, the per-cpu NR_FILE_DIRTY counter is routinely positive during hotplug events and the dirty file count artifically inflates. Once the dirty file count grows past the dirty_freerun_ceiling(), balance_dirty_pages() starts a backgroup writeback each time a file page is marked dirty within the nested cgroup.
This change fixes memcg_hotplug_cpu_dead() so that the per-cpu vmstats and vmevents counters are applied once to each cgroup in the heirarchy, similar to __mod_memcg_state() and __count_memcg_events().
Fixes: 42a300353577 ("mm: memcontrol: fix recursive statistics correctness & scalabilty") Signed-off-by: Andrew Guerrero ajgja@amazon.com Reviewed-by: Gunnar Kudrjavets gunnarku@amazon.com --- Hey all,
This patch is intended for the 5.10 longterm release branch. It will not apply cleanly to mainline and is inadvertantly fixed by a larger series of changes in later release branches: a3d4c05a4474 ("mm: memcontrol: fix cpuhotplug statistics flushing").
In 5.15, the counter flushing code is completely removed. This may be another viable option here too, though it's a larger change.
Thanks! --- mm/memcontrol.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 142b4d5e08fe..8e085a4f45b7 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -2394,7 +2394,7 @@ static int memcg_hotplug_cpu_dead(unsigned int cpu) x = this_cpu_xchg(memcg->vmstats_percpu->stat[i], 0); if (x) for (mi = memcg; mi; mi = parent_mem_cgroup(mi)) - atomic_long_add(x, &memcg->vmstats[i]); + atomic_long_add(x, &mi->vmstats[i]);
if (i >= NR_VM_NODE_STAT_ITEMS) continue; @@ -2417,7 +2417,7 @@ static int memcg_hotplug_cpu_dead(unsigned int cpu) x = this_cpu_xchg(memcg->vmstats_percpu->events[i], 0); if (x) for (mi = memcg; mi; mi = parent_mem_cgroup(mi)) - atomic_long_add(x, &memcg->vmevents[i]); + atomic_long_add(x, &mi->vmevents[i]); } }
base-commit: c30b4019ea89633d790f0bfcbb03234f0d006f87
Hi,
Thanks for your patch.
FYI: kernel test robot notices the stable kernel rule is not satisfied.
The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#opti...
Rule: add the tag "Cc: stable@vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree. Subject: [PATCH] mm: memcontrol: fix memcg accounting during cpu hotplug Link: https://lore.kernel.org/stable/20250906032108.30539-1-ajgja%40amazon.com
On Sat, Sep 06, 2025 at 03:21:08AM +0000, Andrew Guerrero wrote:
A filesystem writeback performance issue was discovered by repeatedly running CPU hotplug operations while a process in a cgroup with memory and io controllers enabled wrote to an ext4 file in a loop.
When a CPU is offlined, the memcg_hotplug_cpu_dead() callback function flushes per-cpu vmstats counters. However, instead of applying a per-cpu counter once to each cgroup in the heirarchy, the per-cpu counter is applied repeatedly just to the nested cgroup. Under certain conditions, the per-cpu NR_FILE_DIRTY counter is routinely positive during hotplug events and the dirty file count artifically inflates. Once the dirty file count grows past the dirty_freerun_ceiling(), balance_dirty_pages() starts a backgroup writeback each time a file page is marked dirty within the nested cgroup.
This change fixes memcg_hotplug_cpu_dead() so that the per-cpu vmstats and vmevents counters are applied once to each cgroup in the heirarchy, similar to __mod_memcg_state() and __count_memcg_events().
Fixes: 42a300353577 ("mm: memcontrol: fix recursive statistics correctness & scalabilty") Signed-off-by: Andrew Guerrero ajgja@amazon.com Reviewed-by: Gunnar Kudrjavets gunnarku@amazon.com
Hey all,
This patch is intended for the 5.10 longterm release branch. It will not apply cleanly to mainline and is inadvertantly fixed by a larger series of changes in later release branches: a3d4c05a4474 ("mm: memcontrol: fix cpuhotplug statistics flushing").
Why can't we take those instead?
In 5.15, the counter flushing code is completely removed. This may be another viable option here too, though it's a larger change.
If it's not needed anymore, why not just remove it with the upstream commits as well?
thanks,
greg k-h
On 2025-09-07 13:10 UTC, Greg KH wrote:
On Sat, Sep 06, 2025 at 03:21:08AM +0000, Andrew Guerrero wrote:
This patch is intended for the 5.10 longterm release branch. It will not apply cleanly to mainline and is inadvertantly fixed by a larger series of changes in later release branches: a3d4c05a4474 ("mm: memcontrol: fix cpuhotplug statistics flushing").
Why can't we take those instead?
In 5.15, the counter flushing code is completely removed. This may be another viable option here too, though it's a larger change.
If it's not needed anymore, why not just remove it with the upstream commits as well?
Yeah, my understanding is the typical flow is to pull commits from upstream into stable branches. However, I'm not confident I know the the answer to "which upstream commits?" To get started,
`git log -L :memcg_hotplug_cpu_dead:mm/memcontrol.c linux-5.10.y..linux-5.15.y`
tells me that the upstream changes to pull are:
- https://lore.kernel.org/all/20210209163304.77088-1-hannes@cmpxchg.org/T/#u - https://lore.kernel.org/all/20210716212137.1391164-1-shakeelb@google.com/T/#...
However, these are substantial features that "fix" the issue indirectly by transitioning the memcg accounting system over to rstats. I can pick these 10 upstream commits, but I'm worried I may overlook some additional patches from 5.15.y that need to go along with them. I may need some guidance if we go this route.
Another reasonable option is to take neither route. We can maintain this patch internally and then drop it once we upgrade to a new kernel version.
Let me know how you would like to proceed.
Thanks! Andrew
On Mon, Sep 08, 2025 at 09:09:00PM +0000, Andrew Guerrero wrote:
On 2025-09-07 13:10 UTC, Greg KH wrote:
On Sat, Sep 06, 2025 at 03:21:08AM +0000, Andrew Guerrero wrote:
This patch is intended for the 5.10 longterm release branch. It will not apply cleanly to mainline and is inadvertantly fixed by a larger series of changes in later release branches: a3d4c05a4474 ("mm: memcontrol: fix cpuhotplug statistics flushing").
Why can't we take those instead?
In 5.15, the counter flushing code is completely removed. This may be another viable option here too, though it's a larger change.
If it's not needed anymore, why not just remove it with the upstream commits as well?
Yeah, my understanding is the typical flow is to pull commits from upstream into stable branches. However, I'm not confident I know the the answer to "which upstream commits?" To get started,
`git log -L :memcg_hotplug_cpu_dead:mm/memcontrol.c linux-5.10.y..linux-5.15.y`
tells me that the upstream changes to pull are:
- https://lore.kernel.org/all/20210209163304.77088-1-hannes@cmpxchg.org/T/#u
- https://lore.kernel.org/all/20210716212137.1391164-1-shakeelb@google.com/T/#...
However, these are substantial features that "fix" the issue indirectly by transitioning the memcg accounting system over to rstats. I can pick these 10 upstream commits, but I'm worried I may overlook some additional patches from 5.15.y that need to go along with them. I may need some guidance if we go this route.
Testing is key :)
Another reasonable option is to take neither route. We can maintain this patch internally and then drop it once we upgrade to a new kernel version.
Perhaps just do that for now if you all are hitting this issue? It seems to be the only report I've seen so far.
thanks,
greg k-h
linux-stable-mirror@lists.linaro.org