On November 20, 2025 at 11:04, "Roman Gushchin" <roman.gushchin@linux.dev> wrote:
Hui Zhu <hui.zhu@linux.dev> writes:
From: Hui Zhu <zhuhui@kylinos.cn>
This series proposes adding eBPF support to the Linux memory controller, enabling dynamic and extensible memory management policies at runtime.
Background
The memory controller (memcg) currently provides fixed memory accounting and reclamation policies through static kernel code. This limits flexibility for specialized workloads and use cases that require custom memory management strategies.
By enabling eBPF programs to hook into key memory control operations, administrators can implement custom policies without recompiling the kernel, while maintaining the safety guarantees provided by the BPF verifier.
Use Cases
- Custom memory reclamation strategies for specialized workloads
- Dynamic memory pressure monitoring and telemetry
- Memory accounting adjustments based on runtime conditions
- Integration with container orchestration systems for intelligent resource management
- Research and experimentation with novel memory management algorithms
Design Overview
This series introduces:
- A new BPF struct ops type (`memcg_ops`) that allows eBPF programs to implement custom behavior for memory charging operations.
- A hook point in the `try_charge_memcg()` fast path that invokes registered eBPF programs to determine if custom memory management should be applied.
- The eBPF handler can inspect memory cgroup context and optionally modify certain parameters (e.g., `nr_pages` for reclamation size).
- A reference counting mechanism using `percpu_ref` to safely manage the lifecycle of registered eBPF struct ops instances.
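The dispatch pattern described in this overview can be modeled in plain user-space C. This is only a sketch under stated assumptions: `charge_ctx`, `policy_ops`, `register_policy()`, `charge()`, and `boost_reclaim()` are hypothetical names invented for illustration, not the structures or fields defined by the series, and the `percpu_ref` lifecycle handling is left out entirely.

```c
#include <stddef.h>

/* Hypothetical user-space model; NOT the struct or field names of the series. */
struct charge_ctx {
    unsigned long usage_pages;  /* pages currently charged to the group */
    unsigned long high_pages;   /* soft limit, in pages */
    unsigned int nr_pages;      /* request size; the handler may change it */
};

/* Table of callbacks, standing in for a struct ops instance. */
struct policy_ops {
    /* Returns nonzero when the handler applied custom handling. */
    int (*on_charge)(struct charge_ctx *ctx);
};

static const struct policy_ops *active_ops; /* NULL when nothing is registered */

/* Only one policy may be registered at a time in this model. */
static int register_policy(const struct policy_ops *ops)
{
    if (active_ops || !ops || !ops->on_charge)
        return -1;
    active_ops = ops;
    return 0;
}

static void unregister_policy(void)
{
    active_ops = NULL;
}

/* Stand-in for the hook point in the charge path. */
static int charge(struct charge_ctx *ctx)
{
    if (active_ops)
        return active_ops->on_charge(ctx);
    return 0; /* default behavior: nothing customized */
}

/* Example policy: above the soft limit, quadruple nr_pages, capped at 1024. */
static int boost_reclaim(struct charge_ctx *ctx)
{
    if (ctx->usage_pages <= ctx->high_pages)
        return 0;
    ctx->nr_pages = ctx->nr_pages * 4 > 1024 ? 1024 : ctx->nr_pages * 4;
    return 1;
}

static const struct policy_ops boost_policy = { .on_charge = boost_reclaim };
```

With `boost_policy` registered, a call to `charge()` on a group above its soft limit returns nonzero and leaves the enlarged `nr_pages` in the context; with nothing registered, the context is untouched.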
Can you please describe how these hooks will be used in practice? What problem can you solve with them that you can't solve without them?
I generally agree with the idea of using BPF for various memcg-related policies, but I'm not sure how these specific callbacks can be used in practice.
Hi Roman,
Here are some ideas for how eBPF support in memcg could be used:
Priority-Based Reclaim and Limits in Multi-Tenant Environments: On a single machine shared by multiple tenants, namespaces, or containers, static policies baked into the kernel make it hard to decide who should be squeezed first under memory pressure. With a BPF profile assigned to each tenant's memcg, BPF can decide under high global pressure:
- which memcgs' memory.high should be raised (delaying reclaim), and
- which memcgs should be scanned and reclaimed more aggressively.
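One way to picture the "who should be squeezed first" decision is the selection function below, written as user-space C. It is a sketch only: the `tenant` struct, the convention that a higher `priority` means more important, and the tie-break on excess usage are all assumptions made for illustration, not behavior taken from the series.

```c
#include <stddef.h>

/* Hypothetical per-tenant view; field names are illustrative only. */
struct tenant {
    int priority;               /* assumed: higher value = more important */
    unsigned long usage_pages;  /* pages currently charged */
    unsigned long min_pages;    /* usage at or below this is left alone */
};

/*
 * Pick the tenant to reclaim from first: the lowest priority wins, and
 * among equal priorities the one furthest above its protected minimum.
 * Returns the index, or -1 if no tenant has reclaimable memory.
 */
static int pick_reclaim_target(const struct tenant *t, size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        if (t[i].usage_pages <= t[i].min_pages)
            continue; /* nothing to take from this tenant */
        if (best < 0) {
            best = (int)i;
            continue;
        }
        unsigned long excess = t[i].usage_pages - t[i].min_pages;
        unsigned long best_excess = t[best].usage_pages - t[best].min_pages;
        if (t[i].priority < t[best].priority ||
            (t[i].priority == t[best].priority && excess > best_excess))
            best = (int)i;
    }
    return best;
}
```

A real policy would likely weigh more signals (reclaim cost, recent refaults), but the shape is the same: a small, replaceable ranking function instead of a fixed kernel heuristic.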
Online Profiling / Diagnosing Memory Hotspots: A cgroup's memory keeps growing, but without patching the kernel it is difficult to obtain fine-grained information. With BPF attached to the memcg charge/uncharge path, a program can record large allocations (greater than N KB) together with their call stacks and owning file/module, and send them to user space via a BPF ring buffer. From the sampled data, user space can generate:
- "Top N memory allocation stacks in this container over the last 10 minutes"
- Reports of which objects / call paths are growing fastest
This makes it possible to pinpoint the root cause of host memory anomalies without changing application code, which is very useful in operations scenarios.
SLO-Driven Auto Throttling / Scale-In/Out Signals: eBPF can observe the memory usage slope, frequent reclaim, or near-OOM behavior within a memcg. When the program decides that OOM is imminent, instead of just killing tasks or raising limits, it can emit a signal to a control-plane component. For example, it can send an event to a user-space agent to trigger automatic scaling, QPS adjustment, or throttling.
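The "usage slope" check can be sketched as a small classifier over evenly spaced usage samples, shown here as user-space C. Everything in it is an assumption for illustration: the `slo_signal` levels, the `classify_growth()` name, the linear extrapolation, and the `warn`/`urgent` thresholds (expressed in sampling intervals) do not come from the series.

```c
#include <stddef.h>

enum slo_signal { SLO_OK, SLO_WARN, SLO_URGENT };

/*
 * Estimate how many sampling intervals remain before usage reaches the
 * limit, assuming growth continues at the average rate seen across the
 * samples, and map that estimate to a signal level.
 */
static enum slo_signal classify_growth(const unsigned long *usage, size_t n,
                                       unsigned long limit,
                                       unsigned long warn,
                                       unsigned long urgent)
{
    if (n < 2)
        return SLO_OK; /* need at least two samples to estimate a slope */

    unsigned long first = usage[0];
    unsigned long last = usage[n - 1];

    if (last >= limit)
        return SLO_URGENT; /* already at or over the limit */
    if (last <= first)
        return SLO_OK;     /* flat or shrinking */

    unsigned long growth = last - first;
    /* (headroom / growth-per-interval), kept in integer arithmetic */
    unsigned long to_limit = (limit - last) * (n - 1) / growth;

    if (to_limit <= urgent)
        return SLO_URGENT;
    if (to_limit <= warn)
        return SLO_WARN;
    return SLO_OK;
}
```

In the scenario described above, `SLO_WARN` might map to a throttling request and `SLO_URGENT` to a scale-out event sent to the user-space agent; that mapping is a policy choice, not something this sketch prescribes.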
Preventing Large-Scale fork+malloc Attacks from a cgroup: During memcg charge, BPF checks per-uid or per-cgroup allocation behavior over the last few seconds.
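A check over "the last few seconds" could take the form of a per-cgroup sliding window of charged pages, sketched below in user-space C. The names `charge_window` and `charge_allowed()`, the four-second window, and the page budget are hypothetical; in an eBPF program this state would more plausibly live in a BPF map keyed by cgroup or uid, which is also an assumption here.

```c
#define WINDOW_SECS 4

/* One slot per second; each slot remembers which second it belongs to. */
struct charge_window {
    unsigned long bucket[WINDOW_SECS]; /* pages charged in that second */
    unsigned long stamp[WINDOW_SECS];  /* timestamp (seconds) of the slot */
};

/* Sum the pages charged during the last WINDOW_SECS seconds. */
static unsigned long window_total(const struct charge_window *w,
                                  unsigned long now)
{
    unsigned long sum = 0;

    for (int i = 0; i < WINDOW_SECS; i++)
        if (now - w->stamp[i] < WINDOW_SECS)
            sum += w->bucket[i];
    return sum;
}

/*
 * Returns 1 and records the charge if it fits within the budget for the
 * current window; returns 0 (charge refused) otherwise.
 */
static int charge_allowed(struct charge_window *w, unsigned long now,
                          unsigned long nr_pages, unsigned long budget)
{
    unsigned long slot = now % WINDOW_SECS;

    if (w->stamp[slot] != now) { /* slot holds an older second: recycle it */
        w->stamp[slot] = now;
        w->bucket[slot] = 0;
    }
    if (window_total(w, now) + nr_pages > budget)
        return 0;
    w->bucket[slot] += nr_pages;
    return 1;
}
```

A burst that exhausts the budget is refused until older seconds age out of the window, which bounds how fast a fork+malloc storm can charge memory without penalizing steady allocators.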
I also maintain a software project, https://github.com/teawater/mem-agent, for specialized memory management and related functions. However, I found that implementing certain memory QoS categories for memcg solely from user space is rather inefficient, because it requires frequent access to values within memcg. This is why I want memcg to support eBPF: it would let me place custom memory management logic directly in the kernel, greatly improving efficiency.
Best, Hui
Thanks!