On Sun, Sep 17, 2023 at 09:12:40PM +0200, Greg Kroah-Hartman wrote:
6.1-stable review patch. If anyone has any objections, please let me know.
Hi Greg/Michal,
This commit breaks userspace, which makes it a bad commit for mainline and an even worse commit for stable.
We ingested 6.1.54 into our nightly testing and found that runc fails to gather cgroup statistics (when reading kmem.limit_in_bytes). The same code is vendored into kubelet and kubelet fails to start if this operation fails. 6.1.53 is fine.
Address this by wiping out the file completely and effectively get back to pre 4.5 era and CONFIG_MEMCG_KMEM=n configuration.
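On reads, the runc code detects MEMCG_KMEM=n by checking for kmem.usage_in_bytes. If that file is present, runc expects the other kmem cgroup files to be there as well (including kmem.limit_in_bytes). With this patch kmem.usage_in_bytes still exists while kmem.limit_in_bytes does not, so the result is not effectively the same as CONFIG_MEMCG_KMEM=n.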
Here's a link to the PR that would be needed to handle this change in userspace (not merged yet and would need to be propagated through the ecosystem):
https://github.com/opencontainers/runc/pull/4018.
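To make the read-side problem concrete, here is a minimal C sketch of the detection logic described above. It is not runc's code (runc is written in Go); the enum, the helper names and the 4096-byte path buffer are all made up for illustration. It only shows that "usage file present, limit file absent" is a third state that the old "usage file present means everything is present" assumption does not cover.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not taken from runc. */
enum kmem_probe {
	KMEM_DISABLED,      /* kmem.usage_in_bytes missing: treat as MEMCG_KMEM=n */
	KMEM_NO_LIMIT_FILE, /* usage file present, limit file gone (as on 6.1.54) */
	KMEM_FULL,          /* both files present (as on 6.1.53) */
	KMEM_ERROR          /* path too long or unexpected access() failure */
};

/* Return 1 if dir/name exists, 0 if it does not (ENOENT), -1 otherwise. */
static int kmem_file_exists(const char *dir, const char *name)
{
	char path[4096];
	int n = snprintf(path, sizeof(path), "%s/%s", dir, name);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	if (access(path, F_OK) == 0)
		return 1;
	return errno == ENOENT ? 0 : -1;
}

/* Probe a cgroup v1 memory controller directory for the kmem files. */
static enum kmem_probe probe_kmem(const char *cgroup_dir)
{
	int usage = kmem_file_exists(cgroup_dir, "memory.kmem.usage_in_bytes");
	int limit;

	if (usage < 0)
		return KMEM_ERROR;
	if (usage == 0)
		return KMEM_DISABLED;

	/* Old assumption: usage present implies limit present. Check instead. */
	limit = kmem_file_exists(cgroup_dir, "memory.kmem.limit_in_bytes");
	if (limit < 0)
		return KMEM_ERROR;
	return limit ? KMEM_FULL : KMEM_NO_LIMIT_FILE;
}
```

A caller that only distinguishes KMEM_DISABLED from everything else will go on to read kmem.limit_in_bytes and fail on the patched kernel; handling KMEM_NO_LIMIT_FILE explicitly is roughly what a userspace fix would have to do.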
Jeremi
From: Michal Hocko <mhocko@suse.com>
commit 86327e8eb94c52eca4f93cfece2e29d1bf52acbf upstream.
kmem.limit_in_bytes (v1 way to limit kernel memory usage) has been deprecated since 58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes") merged in 5.16. We haven't heard about any serious users since then but it seems that the mere presence of the file is causing more harm than good. We (SUSE) have had several bug reports from customers where Docker-based containers started to fail because a write to kmem.limit_in_bytes has failed.
This was unexpected because runc code only expects ENOENT (kmem disabled) or EBUSY (tasks already running within cgroup). So a new error code was unexpected and the whole container startup failed. This was later addressed by https://github.com/opencontainers/runc/commit/52390d68040637dfc77f9fda6bbe70... so current Docker runtimes do not suffer from the problem anymore. There are still older versions of Docker in use and they are likely hard to get rid of completely.
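As an illustration of the write-side failure described above, the following C sketch models a caller that knows only the two errno values named in this message. It is not runc's code; the helper name is hypothetical. Any other value, such as the EOPNOTSUPP returned by the deprecated knob, lands in the default branch and would be treated as fatal.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: returns true only for the errno values the
 * older caller is described as expecting from a write to
 * kmem.limit_in_bytes.
 */
static bool kmem_write_errno_expected(int err)
{
	switch (err) {
	case ENOENT:	/* kmem disabled, file absent */
	case EBUSY:	/* tasks already running within the cgroup */
		return true;
	default:	/* anything else, e.g. EOPNOTSUPP, is unexpected */
		return false;
	}
}
```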
Address this by wiping out the file completely and effectively get back to pre 4.5 era and CONFIG_MEMCG_KMEM=n configuration.
I would recommend backporting to stable trees which have picked up 58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes").
[mhocko@suse.com: restore _KMEM switch case]
  Link: https://lkml.kernel.org/r/ZKe5wxdbvPi5Cwd7@dhcp22.suse.cz
Link: https://lkml.kernel.org/r/20230704115240.14672-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/admin-guide/cgroup-v1/memory.rst |    2 --
 mm/memcontrol.c                                |   10 ----------
 2 files changed, 12 deletions(-)

--- a/Documentation/admin-guide/cgroup-v1/memory.rst
+++ b/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -91,8 +91,6 @@ Brief summary of control files.
  memory.oom_control		     set/show oom controls.
  memory.numa_stat		     show the number of memory usage per numa
 				     node
- memory.kmem.limit_in_bytes          This knob is deprecated and writing to
-                                     it will return -ENOTSUPP.
  memory.kmem.usage_in_bytes          show current kernel memory allocation
  memory.kmem.failcnt                 show the number of kernel memory usage
 				     hits limits
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3841,10 +3841,6 @@ static ssize_t mem_cgroup_write(struct k
 		case _MEMSWAP:
 			ret = mem_cgroup_resize_max(memcg, nr_pages, true);
 			break;
-		case _KMEM:
-			/* kmem.limit_in_bytes is deprecated. */
-			ret = -EOPNOTSUPP;
-			break;
 		case _TCP:
 			ret = memcg_update_tcp_max(memcg, nr_pages);
 			break;
@@ -5056,12 +5052,6 @@ static struct cftype mem_cgroup_legacy_f
 	},
 #endif
 	{
-		.name = "kmem.limit_in_bytes",
-		.private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT),
-		.write = mem_cgroup_write,
-		.read_u64 = mem_cgroup_read_u64,
-	},
-	{
 		.name = "kmem.usage_in_bytes",
 		.private = MEMFILE_PRIVATE(_KMEM, RES_USAGE),
 		.read_u64 = mem_cgroup_read_u64,