On Wed, Feb 12, 2025 at 12:16:44PM +0100, Christian Brauner wrote:
On Tue, Feb 11, 2025 at 01:56:41PM -0500, Jeff Layton wrote:
On Tue, 2025-02-11 at 18:15 +0100, Christian Brauner wrote:
In [1] it was reported that the acct(2) system call can be used to trigger a NULL deref in cases where it is set to write to a file that triggers an internal lookup.
This can happen, e.g., when pointing acct(2) at /sys/power/resume. At the point where the write to this file happens, the calling task has already exited and called exit_fs(), but an internal lookup might still be triggered through lookup_bdev(). This may trigger a NULL deref when accessing current->fs.
This series does two things:
Reorganize the code so that the final write happens from the workqueue but with the caller's credentials. This preserves the (strange) permission model and has almost no regression risk.
Block access to kernel internal filesystems as well as procfs and sysfs in the first place.
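For illustration only, here is a rough userspace analogue of the second point: refusing a target path that lives on procfs or sysfs. This is a sketch under stated assumptions, not the check from the series. The kernel-side change operates on the file's superblock and is not shown in this thread. The helper names fs_magic_is_blocked() and path_on_blocked_fs() are hypothetical; the magic values are the ones from include/uapi/linux/magic.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/vfs.h>   /* statfs(2), Linux-specific */

/* Filesystem magic numbers, as in include/uapi/linux/magic.h. */
#ifndef PROC_SUPER_MAGIC
#define PROC_SUPER_MAGIC 0x9fa0
#endif
#ifndef SYSFS_MAGIC
#define SYSFS_MAGIC 0x62656572
#endif

/* Hypothetical helper: true if the magic identifies procfs or sysfs. */
bool fs_magic_is_blocked(unsigned long magic)
{
    return magic == PROC_SUPER_MAGIC || magic == SYSFS_MAGIC;
}

/*
 * Hypothetical helper: returns 1 if path is on procfs/sysfs,
 * 0 if it is on some other filesystem, -1 if statfs() fails.
 */
int path_on_blocked_fs(const char *path)
{
    struct statfs sfs;

    assert(path != NULL);
    if (statfs(path, &sfs) != 0)
        return -1;
    return fs_magic_is_blocked((unsigned long)sfs.f_type) ? 1 : 0;
}
```

A privileged wrapper could call path_on_blocked_fs() on the accounting file before passing it to acct(2) and bail out on any non-zero result. Note that this only covers procfs and sysfs by magic number; it does not identify other kernel-internal filesystems, and it is racy in a way an in-kernel check is not.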
This API should cease to exist, imho.
I wonder who uses it these days, and what would we suggest they replace it with? Maybe syscall auditing?
Someone pointed me to atop, but that also works without it. Since this is a privileged API, I think the natural candidate to replace all of this is bpf. I'm pretty sure that it's relatively straightforward to get a lot more information out of it than with acct(2), and it will probably be more performant too.
As it stands, without any limitations, acct(2) can quite easily lock up the system when pointed at various things in sysfs, and I'm sure it can be abused in other ways. So I wouldn't enable it.
And I totally forgot about taskstats via Netlink:
https://www.kernel.org/doc/Documentation/accounting/taskstats.txt
include/uapi/linux/taskstats.h