On Thu 20-11-25 09:29:52, hui.zhu@linux.dev wrote: [...]
I generally agree with an idea to use BPF for various memcg-related policies, but I'm not sure how specific callbacks can be used in practice.
Hi Roman,
Here are some ideas for how eBPF memcg hooks could be used:
Priority‑Based Reclaim and Limits in Multi‑Tenant Environments:
  On a single machine with multiple tenants / namespaces / containers, under memory pressure it’s hard to decide “who should be squeezed first” with static policies baked into the kernel.
  Assign a BPF profile to each tenant’s memcg. Under high global pressure, BPF can decide:
  - which memcgs’ memory.high should be raised (delaying reclaim),
  - which memcgs should be scanned and reclaimed more aggressively.
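As a rough illustration of the "who should be squeezed first" decision, here is a plain user-space C sketch of the selection logic only. It is not BPF and not tied to any hook in this series; struct tenant_memcg, its fields, and pick_reclaim_target() are all hypothetical names made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-tenant view; none of these names exist in the kernel. */
struct tenant_memcg {
	const char *name;
	int priority;             /* higher value = more important tenant */
	uint64_t usage_bytes;     /* current charged memory */
	uint64_t protected_bytes; /* usage at or below this is never targeted */
};

/*
 * Return the index of the memcg to reclaim from first, or -1 if every
 * memcg is at or below its protected amount. The lowest priority wins;
 * ties go to the memcg with the most memory above its protection.
 */
static int pick_reclaim_target(const struct tenant_memcg *cgs, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (cgs[i].usage_bytes <= cgs[i].protected_bytes)
			continue; /* nothing reclaimable under this policy */

		uint64_t excess = cgs[i].usage_bytes - cgs[i].protected_bytes;

		if (best < 0 || cgs[i].priority < cgs[best].priority ||
		    (cgs[i].priority == cgs[best].priority &&
		     excess > cgs[best].usage_bytes - cgs[best].protected_bytes))
			best = (int)i;
	}
	return best;
}
```

A real policy would presumably need more inputs (reclaimability, refault rates, and so on); the sketch only shows that the ordering itself is a small amount of logic.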
Online Profiling / Diagnosing Memory Hotspots:
  A cgroup’s memory keeps growing, but without patching the kernel it’s difficult to obtain fine‑grained information.
  Attach BPF to the memcg charge/uncharge path: record large allocations (greater than N KB) with call stacks and owning file/module, and send them to user space via a BPF ring buffer.
  Based on sampled data, generate:
  - “Top N memory allocation stacks in this container over the last 10 minutes,”
  - reports of which objects / call paths are growing fastest.
  This makes it possible to pinpoint the root cause of host memory anomalies without changing application code, which is very useful in operations scenarios.
SLO‑Driven Auto Throttling / Scale‑In/Out Signals:
  Use eBPF to observe memory usage slope, frequent reclaim, or near‑OOM behavior within a memcg. When the BPF program decides “OOM is imminent,” instead of just killing tasks or raising limits, it can emit a signal to a control‑plane component. For example, it can send an event to a user‑space agent to trigger automatic scaling, QPS adjustment, or throttling.
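One way the "OOM is imminent" decision could be made is by linear extrapolation of the usage slope between two samples. The following is a user-space C sketch of that arithmetic only, under the assumption that two timestamped usage samples and the memcg limit are available; struct usage_sample, seconds_to_limit() and oom_imminent() are hypothetical names, not part of any existing API.

```c
#include <math.h>    /* INFINITY */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sample of a memcg's usage at a point in time. */
struct usage_sample {
	uint64_t time_ns;
	uint64_t usage_bytes;
};

/*
 * Estimate how many seconds remain until usage reaches limit_bytes,
 * assuming it keeps growing at the rate seen between prev and cur.
 * Returns 0 if the limit is already reached and INFINITY if usage is
 * not growing (or the samples are out of order).
 */
static double seconds_to_limit(struct usage_sample prev,
			       struct usage_sample cur,
			       uint64_t limit_bytes)
{
	if (cur.usage_bytes >= limit_bytes)
		return 0.0;
	if (cur.time_ns <= prev.time_ns || cur.usage_bytes <= prev.usage_bytes)
		return INFINITY;

	double dt_s = (double)(cur.time_ns - prev.time_ns) / 1e9;
	double slope = (double)(cur.usage_bytes - prev.usage_bytes) / dt_s;

	return (double)(limit_bytes - cur.usage_bytes) / slope;
}

/* True if the extrapolated time to the limit is within horizon_s seconds. */
static bool oom_imminent(struct usage_sample prev, struct usage_sample cur,
			 uint64_t limit_bytes, double horizon_s)
{
	return seconds_to_limit(prev, cur, limit_bytes) <= horizon_s;
}
```

Floating point is used here only to keep the sketch short; a BPF program would need an integer formulation, and a real detector would likely smooth over more than two samples.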
Prevent a cgroup from launching a large‑scale fork+malloc attack:
  At memcg charge time, a BPF program checks per‑uid or per‑cgroup allocation behavior over the last few seconds.
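The "allocation behavior over the last few seconds" check could be as simple as a fixed-window byte budget per cgroup. Below is a user-space C sketch of that accounting, assuming a monotonic timestamp is available at charge time; struct charge_window, WINDOW_NS, BUDGET_BYTES and charge_within_budget() are hypothetical, and the window and budget values are arbitrary.

```c
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary example values, not taken from the patch series. */
#define WINDOW_NS    (2ULL * 1000 * 1000 * 1000) /* 2 second window */
#define BUDGET_BYTES (256ULL << 20)              /* 256 MiB per window */

/* Hypothetical per-cgroup (or per-uid) accounting state. */
struct charge_window {
	uint64_t window_start_ns;
	uint64_t charged_bytes; /* never exceeds BUDGET_BYTES */
};

/*
 * Account a charge of 'bytes' at time 'now_ns' (assumed monotonic).
 * Returns true if the charge fits in the current window's budget and
 * false if it would exceed it; a rejected charge is not accounted.
 */
static bool charge_within_budget(struct charge_window *w,
				 uint64_t now_ns, uint64_t bytes)
{
	if (now_ns - w->window_start_ns >= WINDOW_NS) {
		/* Window expired: start a fresh one at this charge. */
		w->window_start_ns = now_ns;
		w->charged_bytes = 0;
	}
	if (bytes > BUDGET_BYTES - w->charged_bytes)
		return false;

	w->charged_bytes += bytes;
	return true;
}
```

What to do on a false return (throttle, fail the charge, notify user space) is a separate policy question that the sketch does not address.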
AFAIU, these are just very high level ideas rather than anything you are trying to target with this patch series, right?
All I can see is that you add a reclaim hook but it is not really clear to me how feasible it is to actually implement a real memory reclaim strategy this way.
In principle I am not really opposed, but memory reclaim is a rather involved process, and I would really like to see that there is something real to be done without exporting all the MM code to BPF for any practical use. Is there any POC out there?