On Fri, Aug 26, 2022 at 5:11 AM Ignat Korchagin <ignat@cloudflare.com> wrote:
> I would also add here that seccomp allows more flexibility than just delivering SIGSYS to a violating application. We can program seccomp BPF to:
> - deliver a signal
> - return a CUSTOM error code (and BTW somehow this does not trigger any requirement to change the userspace API or document it in the man pages: in my toy example in [1] I'm returning ENETDOWN from a uname(2) system call, which is not documented in the man pages but is totally valid from a seccomp usage perspective)
> - do nothing, but log the action
>
> So I would say the seccomp reference supports the current approach more than the alternative of delivering SIGSYS: technically, an LSM implementation of the hook (at least an in-kernel one) can choose to deliver a signal to a task via the kernel API, but BPF-LSM (and others) can also return custom error codes and log the actions.
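For readers less familiar with seccomp mode 2, the "custom error code" action above can be sketched as a classic-BPF filter. This is a minimal sketch, not the toy example referenced as [1]; it assumes Linux on x86-64 or arm64, and `install_uname_filter()` and `FILTER_AUDIT_ARCH` are hypothetical names used only for illustration. The filter makes uname(2) fail with ENETDOWN and allows everything else; swapping `SECCOMP_RET_ERRNO` for `SECCOMP_RET_TRAP` would deliver SIGSYS instead, and `SECCOMP_RET_LOG` would let the call proceed while logging it.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/utsname.h> /* struct utsname, for callers of uname(2) */
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Assumption: this sketch only handles x86-64 and arm64. */
#if defined(__x86_64__)
#define FILTER_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define FILTER_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "unsupported architecture for this sketch"
#endif

/* Hypothetical helper: install a filter on the calling thread (inherited by
 * its future children) that makes uname(2) fail with ENETDOWN. */
static int install_uname_filter(void)
{
    struct sock_filter filter[] = {
        /* Kill the process if the syscall comes from an unexpected ABI. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, FILTER_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* uname(2): return a custom errno instead of running the syscall. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_uname, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (ENETDOWN & SECCOMP_RET_DATA)),
        /* Every other syscall is allowed. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    /* Required for an unprivileged process to install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
        return -1;
    return 0;
}
```

Once `install_uname_filter()` returns 0, a call such as `uname(&u)` returns -1 with `errno` set to ENETDOWN, while other system calls behave normally.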
I agree that seccomp mode 2 allows for more flexibility than was mentioned earlier; however, seccomp filtering has some limitations in this particular case which can be an issue for some. The first, and perhaps most important, is that some of the information a seccomp filter might want to inspect is effectively hidden by the clone3(2) syscall: its arguments, including the namespace flags, are passed in a struct clone_args in user memory, which a seccomp filter cannot dereference. This makes it difficult for a seccomp filter to identify namespace-related operations. The second issue is that a seccomp mode 2 based approach requires the applications themselves to "Do The Right Thing" and ensure that the proper seccomp filter is loaded into the kernel before the target fork()/clone()/unshare() call is executed; an LSM which implements a proper mandatory access control mechanism does not rely on the application, as it enforces the system's security policy regardless of what actions userspace performs.
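The clone3(2) limitation can be illustrated with a sketch, assuming a little-endian x86-64 or arm64 Linux host; `install_userns_filter()`, `ARG0_LO` and `FILTER_AUDIT_ARCH` are hypothetical names, and the fallback value 435 for `__NR_clone3` only covers older headers. A filter can test the flags of unshare(2) and clone(2) directly, because they are passed by value in `args[0]`, but for clone3(2) `args[0]` is just a pointer. A workaround some container runtimes use is to fail clone3 with ENOSYS so that libc falls back to clone(2); that is a policy choice, not a way of seeing the flags.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>       /* CLONE_NEWUSER, unshare() */
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>      /* syscall(), e.g. to invoke clone3 directly */
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#ifndef __NR_clone3
#define __NR_clone3 435  /* assumption: headers predating Linux 5.3 */
#endif

#if defined(__x86_64__)
#define FILTER_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define FILTER_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "unsupported architecture for this sketch"
#endif

/* Low 32 bits of args[0]; correct only on little-endian hosts. */
#define ARG0_LO offsetof(struct seccomp_data, args)

/* Hypothetical helper: deny new user namespaces where the flags are visible. */
static int install_userns_filter(void)
{
    struct sock_filter filter[] = {
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, FILTER_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* clone3: args[0] points to struct clone_args, which the filter
         * cannot read, so refuse the call with ENOSYS. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_clone3, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | ENOSYS),
        /* unshare and clone carry their flags by value in args[0]. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_unshare, 2, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_clone, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, ARG0_LO),
        BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, CLONE_NEWUSER, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | EPERM),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };

    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) == 0 ? 0 : -1;
}
```

Note that, like any seccomp filter, this one only constrains the process that installs it and its descendants, which is exactly the dependency on the application described above; an LSM hook applies regardless of what the process chooses to load.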