Hello!
On Wed, Jun 28, 2023 at 12:31 PM Dominique Martinet <asmadeus@codewreck.org> wrote:
Dominique Martinet wrote on Wed, Jun 28, 2023 at 08:42:41PM +0900:
If flags already has either MFD_EXEC or MFD_NOEXEC_SEAL, you don't check the sysctl at all. [...repro snipped..]
What am I missing?
(Perhaps the intent is just to force people to use the flag so it is easier to check for memfd_create in seccomp or other LSM? But I don't see why such a check couldn't consider the absence of a flag as well, so I don't see the point.)
Yes. Part of the intent is to motivate app developers to migrate their code to the new MFD_EXEC/MFD_NOEXEC_SEAL flags for memfd_create, if that answers your question.
BTW I find the current behaviour rather hard to use: setting this to 2 should still set NOEXEC by default in my opinion, just refuse anything that explicitly requested EXEC.
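For reference, the difference between the behaviour described above and the one suggested here can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions, not the kernel code: the function names, the `SKETCH_`/`SCOPE_` identifiers and the choice of -EINVAL as the error are made up for illustration, and the flag values are copied from the uapi header as I understand it.

```c
#include <errno.h>

/* Flag values believed to match <linux/memfd.h>; redefined under
 * hypothetical names so the sketch builds with older headers. */
#define SKETCH_MFD_NOEXEC_SEAL 0x0008U
#define SKETCH_MFD_EXEC        0x0010U

/* Hypothetical names for the sysctl values 0, 1 and 2. */
enum sketch_scope {
    SCOPE_EXEC = 0,            /* no flag given: default to exec */
    SCOPE_NOEXEC_SEAL = 1,     /* no flag given: default to noexec + seal */
    SCOPE_NOEXEC_ENFORCED = 2  /* strictest setting */
};

/* Behaviour as described in this thread: the sysctl is consulted only
 * when the caller passed neither flag, so at 2 an unflagged call fails
 * while an explicit MFD_EXEC goes through. */
static int resolve_flags_described(unsigned int flags, int scope,
                                   unsigned int *out)
{
    if (!(flags & (SKETCH_MFD_EXEC | SKETCH_MFD_NOEXEC_SEAL))) {
        switch (scope) {
        case SCOPE_EXEC:
            flags |= SKETCH_MFD_EXEC;
            break;
        case SCOPE_NOEXEC_SEAL:
            flags |= SKETCH_MFD_NOEXEC_SEAL;
            break;
        default:
            return -EINVAL; /* error code is an assumption */
        }
    }
    *out = flags;
    return 0;
}

/* Behaviour suggested above: at 2, refuse only an explicit MFD_EXEC and
 * default everything else to noexec. */
static int resolve_flags_proposed(unsigned int flags, int scope,
                                  unsigned int *out)
{
    if (scope >= SCOPE_NOEXEC_ENFORCED && (flags & SKETCH_MFD_EXEC))
        return -EINVAL; /* error code is an assumption */
    if (!(flags & (SKETCH_MFD_EXEC | SKETCH_MFD_NOEXEC_SEAL)))
        flags |= (scope == SCOPE_EXEC) ? SKETCH_MFD_EXEC
                                       : SKETCH_MFD_NOEXEC_SEAL;
    *out = flags;
    return 0;
}
```

In this model, a legacy caller that passes no flag fails under the described behaviour at 2 but succeeds with a sealed, non-executable memfd under the suggested one; only the explicit MFD_EXEC caller is turned away.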
And I just noticed it's not possible to lower the value despite having CAP_SYS_ADMIN: what the heck?! I have never seen such a sysctl and it just forced me to reboot because I willy-nilly tested in the init pid namespace, and quite a few applications that don't require exec broke exactly as I described below.
If the user has CAP_SYS_ADMIN there are more container escape methods than I can count; this is basically a free pass to root on the main namespace anyway, so you're not protecting anything. Please let people set the sysctl to what they want.
Yama has a similar setting: for example, once ptrace_scope is set to 3 (YAMA_SCOPE_NO_ATTACH), it cannot be lowered at runtime.
Since this is a security feature, not allowing downgrades at runtime is part of the security consideration. I hope you understand.
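To illustrate the raise-only rule being discussed, here is a minimal userspace model in C. It is a sketch, not the actual sysctl handler: `struct sketch_ns`, `sketch_set_scope`, the valid range 0..2 and the specific error codes are all assumptions made for the example.

```c
#include <errno.h>

/* Hypothetical stand-in for the per-namespace setting. */
struct sketch_ns {
    int scope; /* current sysctl value, assumed to range over 0..2 */
};

/* Accept a write only if it keeps or raises the current value.
 * A writer without the capability is rejected outright; a writer
 * with it still cannot lower the value. */
static int sketch_set_scope(struct sketch_ns *ns, int has_cap_sys_admin,
                            int new_scope)
{
    if (!has_cap_sys_admin)
        return -EPERM;
    if (new_scope < 0 || new_scope > 2)
        return -EINVAL;
    if (new_scope < ns->scope)
        return -EINVAL; /* downgrade refused; error code is an assumption */
    ns->scope = new_scope;
    return 0;
}
```

With this rule, a value raised to 2 for testing stays at 2 until the namespace goes away, which for the init pid namespace means a reboot, as reported above.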
-- Dominique Martinet | Asmadeus
Thanks! -Jeff