Re: [RFC PATCH v2 bpf-next 0/3] bpf: cgroup: support writing and freezing cgroups from BPF

19 Aug 2025

      On 8/20/25 00:31, Djalal Harouni wrote:
...
On 8/18/25 18:32, Tejun Heo wrote:
...
On Mon, Aug 18, 2025 at 10:04:21AM +0100, Djalal Harouni wrote:
...
This patch series add support to write cgroup interfaces from BPF.
It is useful to freeze a cgroup hierarchy on suspicious activity for
a more thorough analysis before killing it. Planned users of this
feature are: systemd and BPF tools where the cgroup hierarchy could
be a system service, user session, k8s pod or a container.
The writing happens via kernfs nodes and the cgroup must be on the
default hierarchy. It implements the requests and feedback from v1 [1]
where now we use a unified path for cgroup user space and BPF writing.
So I want to validate that this is the right approach first.
I don't see any reason to object to the feature but the way it's 
constructed
seems rather odd to me. If it's going to need per-feature code, might as
well bypass the write part and implement a simpler interface - ie.
bpf_cgroup_freeze().
Approach 1:
First RFC months ago was something like that 
"bpf_task_freeze_cgroup" [1], can make it bpf_cgroup_freeze() as a 
proper kfunc, so resurrect approach 1?
Internally it used an ugly path to workaround kernfs active reference 
since we don't hold a kernfs_open_file coming from userspace
kernfs->write path.
I can improve it, but let's discuss please approach (2) since you
suggested it ;-)
Approach 2:
Per the old suggestions from you and Alexei [2] [3] you wanted something
like:
s32 bpf_kernfs_knob_write(struct kernfs_node *dir,
                             const char *knob, char *buf);
I didn't make it generic for kernfs, since don't know yet about sysfs 
use cases and named it "bpf_cgroup_write_interface" to focus on cgroup 
base interfaces.
Doing something that generic now including sysfs without a proper valid 
use cases seems a bit too much. Also we have some cgroup kfunc to 
acquire and release that integrate well, so I kept it focused.
Alexei suggested to refactor the cgroup_base_file[] [4][5] to take 
"kernfs_node" as argument instead of "kernfs_open_file", which will open 
other possibilities for BPF.
However, instead of going full change on cgroup_base_files[], I added a 
minimalist: cgroup_kn_cftype kn_cfts[] that for now hold only 
"cgroup.freeze".
I see three possibilities here:
A. Minimal change with approach presented here:
    add dedicated array cgroup_kn_cftype kn_cfts[] with only
    "cgroup.freeze" and later try to unify it inside cgroup_base_file[].
B. Add "->bpf_write()" handler to cgroup_base_file[] and start only with
    "cgroup.freeze".
C. Refactor all cgroup_base_file[] to take a kernfs_node directly
    instead of kernfs_open_file as suggested.
I took (C) the simple one since I wanted to do cgroup freeze first. You
Sorry here I meant (A).
Thanks.
...
also suggested maybe in future "cgroup.kill" well if we have it, we 
definitely will start using it. Not sure if we are allowed to BPF sleep 
that path, however we can also start doing "cgroup.freeze" from BPF and 
kill from user space as a first step. But we definitly want more BPF 
operations on cgroup interfaces, I can think of a companion cgroup where 
we migrate tasks there on specific signals...
So more or less current proposed approach (2) followed the suggestions, 
but focused only on writing cgroup kernfs knobs.
Thoughts, did I miss something?
...
Otherwise, can't it actually write to kernfs files so
that we don't need to add code per enabled feature?
I'm not sure how we would write to kernfs files? As pointed by Alexei 
[6] it is more involved if we want to open files...
About "that we don't need to add code per enabled feature?" well if we
go the path of "bpf_cgroup_write_interface" or "bpf_cgroup_write_knob" 
adding a new write interface will involve theoretically:

Check if program can sleep or/and the calling context.
Add the new "cgroup.x" either in cgroup_base_file[] or in the new

array.
3. New handlers or refactor old ones to take a "kernfs_node" instead of
    "kernfs_open_file".
Compared to having multiple bpf_cgroup_(freeze|kill|...) kfunc seems 
fair too, and not that much code from BPF part.
BTW current patches contain a bug, after testing. In normal writes from 
user context we break kernfs active protection to avoid nesting cgroup 
locking under.
Forget this part. In next series since we don't grab the kernfs_node 
active protection like userspace kernfs_write, then no need to break 
it... will add a parameter to check like the revalidate one that checks 
things is cgroup on dfl and not root, things that are automatically 
handled from normal userspace ->write.
Thank you!
[1] https://lore.kernel.org/bpf/20240327225334.58474-3-tixxdz@gmail.com/
[2] https://lore.kernel.org/bpf/ZgXMww9kJiKi4Vmd@slm.duckdns.org/
[3] https://lore.kernel.org/bpf/Zgc1BZnYCS9OSSTw@slm.duckdns.org/
[4] https://lore.kernel.org/bpf/ 
CAADnVQK970_Nx3918V41ue031RkGs+WsteOAm6EJOY7oSwzS1A@mail.gmail.com/
[5] https://lore.kernel.org/bpf/ 
CAADnVQ+WmaPG1WOaSDbjxNPVzVape_JfG_CNSRy188ni076Mog@mail.gmail.com/
[6] https://lore.kernel.org/bpf/CAADnVQLhWDcX-7XCdo- 
W=jthU=9iPqODwrE6c9fvU8sfAJ5ARg@mail.gmail.com/
...
Thanks.

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [RFC PATCH v2 bpf-next 0/3] bpf: cgroup: support writing and freezing cgroups from BPF