Jeff Xu wrote on Wed, Jun 28, 2023 at 09:33:27PM -0700:
BTW I find the current behaviour rather hard to use: setting this to 2 should still set NOEXEC by default in my opinion, just refuse anything that explicitly requested EXEC.
And I just noticed it's not possible to lower the value despite having CAP_SYS_ADMIN: what the heck?! I have never seen such a sysctl, and it just forced me to reboot because I willy-nilly tested it in the init pid namespace, and quite a few applications that don't require exec broke exactly as I described below.
If the user has CAP_SYS_ADMIN, there are more container escape methods than I can count; this is basically a free pass to root in the main namespace anyway, so you're not protecting anything. Please let people set the sysctl to what they want.
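To make the proposal for mode 2 concrete, here is a minimal sketch of the suggested semantics written as a pure function. The name effective_memfd_flags() is hypothetical and this is not the kernel's code. The flag values match MFD_NOEXEC_SEAL and MFD_EXEC in include/uapi/linux/memfd.h, and returning EACCES for a refused request is an assumption.

```c
#include <errno.h>

/* Flag values from include/uapi/linux/memfd.h (Linux 6.3+). */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif
#ifndef MFD_EXEC
#define MFD_EXEC 0x0010U
#endif

/*
 * Hypothetical model of the proposed vm.memfd_noexec semantics:
 *   0: no exec flag given -> behave as MFD_EXEC (legacy default)
 *   1: no exec flag given -> behave as MFD_NOEXEC_SEAL
 *   2: same default as 1, but refuse an explicit MFD_EXEC request
 * Returns 0 and stores the effective flags in *out, or a negative errno.
 * EACCES as the refusal error is an assumption.
 */
int effective_memfd_flags(int mode, unsigned int flags, unsigned int *out)
{
    const unsigned int both = MFD_EXEC | MFD_NOEXEC_SEAL;

    if ((flags & both) == both)
        return -EINVAL;             /* contradictory request */

    if (!(flags & both))            /* caller did not choose: apply the default */
        flags |= (mode == 0) ? MFD_EXEC : MFD_NOEXEC_SEAL;

    if (mode == 2 && (flags & MFD_EXEC))
        return -EACCES;             /* only explicit exec requests are refused */

    *out = flags;
    return 0;
}
```

Under this model, an unmodified application that passes neither flag keeps working at mode 2 and simply gets a non-executable memfd.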
Yama has a similar setting: for example, 3 (YAMA_SCOPE_NO_ATTACH) does not allow downgrading at runtime.
Since this is a security feature, not allowing downgrading at runtime is part of the security consideration. I hope you understand.
I didn't remember that Yama had this stuck bit; it still strikes me as unusual, and if you require a custom LSM rule for memfd anyway, I don't see why that rule couldn't enforce that the sysctl stays unchanged. But sure.
Please, though:
- I have a hard time thinking of 1 as a security flag in general (even if I agree a sloppy LSM rule could require it); I would only lock 2.
- Please document it clearly: I don't see any entry about memfd_noexec in the sysctl documentation[1]. There should be one, and you can copy the wording from Yama's doc[2]: "Once set, this sysctl value cannot be changed"

[1] Documentation/admin-guide/sysctl/vm.rst
[2] Documentation/admin-guide/LSM/Yama.rst
Either way, as it stands I still don't think one can expect most userspace applications to be converted until some libc wrapper takes care of the retry logic and a couple of years have passed, so I'll look for another way of filtering this (and eventually set this to 1) as you suggested. I'll leave the follow-up to you and won't bother you further.
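The retry logic such a libc wrapper would need might look roughly like the minimal sketch below. memfd_create_compat() is a hypothetical helper, not an existing glibc function. It assumes that EINVAL from memfd_create() means the running kernel predates MFD_EXEC/MFD_NOEXEC_SEAL (added in Linux 6.3). That is a simplification, since EINVAL has other causes too; in that case the retry simply fails again.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>   /* memfd_create(), MFD_CLOEXEC (glibc 2.27+) */
#include <unistd.h>

/* Flag values from include/uapi/linux/memfd.h, for older userspace headers. */
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif
#ifndef MFD_EXEC
#define MFD_EXEC 0x0010U
#endif

/*
 * Hypothetical wrapper: ask for a non-executable memfd unless the caller
 * explicitly requested MFD_EXEC, and fall back to the legacy call on
 * kernels that reject the new flags with EINVAL.
 */
int memfd_create_compat(const char *name, unsigned int flags)
{
    unsigned int want = (flags & MFD_EXEC) ? flags : (flags | MFD_NOEXEC_SEAL);
    int fd = memfd_create(name, want);

    if (fd >= 0 || errno != EINVAL)
        return fd;

    /* Assumed pre-6.3 kernel: retry without either exec-related flag.
     * Such kernels create executable memfds by default. */
    return memfd_create(name, flags & ~(MFD_EXEC | MFD_NOEXEC_SEAL));
}
```

One caveat of this design: on newer kernels MFD_NOEXEC_SEAL also implies MFD_ALLOW_SEALING, so a caller relying on seals should pass MFD_ALLOW_SEALING explicitly to get the same behaviour after the fallback.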
Thanks,