In [1] it was reported that the acct(2) system call can be used to trigger a NULL deref in cases where it is set to write to a file that triggers an internal lookup.
This can happen, e.g., when pointing acct(2) to /sys/power/resume. At the point where the write to this file happens, the calling task has already exited and called exit_fs(), which leaves current->fs set to NULL, but an internal lookup might still be triggered through lookup_bdev(). This may trigger a NULL deref when accessing current->fs.
This series does two things:
- Reorganize the code so that the final write happens from the workqueue but with the caller's credentials. This preserves the (strange) permission model and has almost no regression risk.
- Block access to kernel internal filesystems as well as procfs and sysfs in the first place.
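The second point can be illustrated from userspace. What follows is only a sketch of an analogous pre-check and not the mechanism used in kernel/acct.c, which this cover letter does not describe. It assumes that comparing the statfs(2) f_type against PROC_SUPER_MAGIC and SYSFS_MAGIC is good enough to recognise procfs and sysfs; the helper names acct_magic_is_blocked() and acct_target_is_blocked() are hypothetical.

```c
#include <stdbool.h>
#include <sys/vfs.h>      /* statfs(2), struct statfs */
#include <linux/magic.h>  /* PROC_SUPER_MAGIC, SYSFS_MAGIC */

/*
 * Hypothetical helper: true if the filesystem magic number belongs to
 * procfs or sysfs. Kernel-internal filesystems that cannot be reached
 * by path from userspace are not covered by this sketch.
 */
static bool acct_magic_is_blocked(long f_type)
{
	return f_type == PROC_SUPER_MAGIC || f_type == SYSFS_MAGIC;
}

/*
 * Hypothetical helper: returns 1 if @path lives on procfs or sysfs,
 * 0 if it does not, and -1 (with errno set by statfs) on error.
 */
static int acct_target_is_blocked(const char *path)
{
	struct statfs sfs;

	if (statfs(path, &sfs) != 0)
		return -1;

	return acct_magic_is_blocked((long)sfs.f_type) ? 1 : 0;
}
```

A caller could run acct_target_is_blocked() on the intended accounting file and refuse to pass it to acct(2) when the result is 1. This is advisory only; the in-kernel check added by this series is what actually rejects such files.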
This API should cease to exist, imho.
Link: https://lore.kernel.org/r/20250127091811.3183623-1-quzicheng@huawei.com [1]
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
Christian Brauner (2):
      acct: perform last write from workqueue
      acct: block access to kernel internal filesystems

 kernel/acct.c | 134 ++++++++++++++++++++++++++++++++++++----------------------
 1 file changed, 84 insertions(+), 50 deletions(-)
---
base-commit: af69e27b3c8240f7889b6c457d71084458984d8e
change-id: 20250211-work-acct-a6d8e92a5fe0