On Fri, Dec 06, 2024 at 09:14:58PM +0000, Lorenzo Stoakes wrote:
On Fri, Dec 06, 2024 at 12:48:09PM -0800, Isaac Manjarres wrote:
On Fri, Dec 06, 2024 at 06:19:49PM +0000, Lorenzo Stoakes wrote:
On Thu, Dec 05, 2024 at 05:09:22PM -0800, Isaac J. Manjarres wrote:
diff --git a/mm/mmap.c b/mm/mmap.c
index b1b2a24ef82e..c7b96b057fda 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -375,6 +375,17 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	if (!file_mmap_ok(file, inode, pgoff, len))
 		return -EOVERFLOW;
I'm maybe not in favour of _where_ in the logic we check this, and definitely not in favour of expanding this do_mmap() stuff much further.
See comment at bottom though... I have a cunning plan :)
+	if (is_exec_sealed(seals)) {
Are we intentionally disallowing execution of a MAP_PRIVATE memfd mapping? I've not tested this scenario, so I don't know if we somehow disallow it in another way, but note that for the write checks we only care about shared mappings.
I mean, one could argue that the MAP_PRIVATE situation is the same as copying the data into an anon buffer and doing what you want with it; you could argue the same here...
So probably we should only care about VM_SHARED?
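Something like this, perhaps (a totally untested sketch against do_mmap(), reusing the is_exec_sealed() helper from your patch; exactly where it'd sit in the function is hand-waving):

	/* Only enforce the exec seal for shared mappings, mirroring
	 * what the write seals do. */
	if ((flags & MAP_SHARED) && is_exec_sealed(seals)) {
		if (prot & PROT_EXEC)
			return -EPERM;
		/* Also prevent gaining PROT_EXEC via mprotect() later. */
		vm_flags &= ~VM_MAYEXEC;
	}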
Thanks for taking a look at this!
I'd originally implemented it for just the VM_SHARED case, but after discussing it with Kalesh, I changed it to disallow executable mappings for both MAP_SHARED and MAP_PRIVATE.
Our thought was that write sealing exempts the MAP_PRIVATE case only in order to support COW. There's nothing similar to COW for execution, so I decided to prevent it in both cases; this also retains the same behavior as the ashmem driver.
Hm, yeah, I'm not sure that's really justified. What's to stop a caller from just mapping their own memory executable, copying the data over, and executing it?
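Roughly this, assuming the attacker already has the sealed buffer mapped readable somewhere (sealed_buf and len are placeholders):

	#include <string.h>
	#include <sys/mman.h>

	static void exec_despite_seal(const void *sealed_buf, size_t len)
	{
		/* Fresh anonymous mapping, nothing to do with the memfd. */
		void *copy = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (copy == MAP_FAILED)
			return;
		memcpy(copy, sealed_buf, len);	/* the seal protects none of this */
		((void (*)(void))copy)();
	}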
That's a fair point. In that case, I think it makes sense to enforce the seal only when the mapping is shared.
The case I'm trying to address is when a process (A) allocates a memfd that is meant to be read and written by itself and another process (B). A shares the buffer with B, but B injects code into the buffer, and compromises A such that A maps the buffer with PROT_EXEC and runs the code that B injected into it.
If A used F_SEAL_FUTURE_EXEC prior to sharing the buffer, then it could reduce the attack surface on itself in this scenario.
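For reference, the usage I have in mind on A's side looks something like this (F_SEAL_FUTURE_EXEC being the new seal from this series; the fallback define and its value are purely illustrative):

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <sys/mman.h>
	#include <unistd.h>

	#ifndef F_SEAL_FUTURE_EXEC
	#define F_SEAL_FUTURE_EXEC 0x0040	/* illustrative value only */
	#endif

	static int create_shared_buf(size_t size)
	{
		int fd = memfd_create("shared-buf", MFD_ALLOW_SEALING);

		if (fd < 0)
			return -1;
		if (ftruncate(fd, size) < 0 ||
		    fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_EXEC) < 0) {
			close(fd);
			return -1;
		}
		/* fd can now be shared with B; any later attempt to map it
		 * with PROT_EXEC should fail, even if A is compromised. */
		return fd;
	}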
There's also further concerns around execution restriction for instance in memfd_add_seals():
	/*
	 * SEAL_EXEC implys SEAL_WRITE, making W^X from the start.
	 */
	if (seals & F_SEAL_EXEC && inode->i_mode & 0111)
		seals |= F_SEAL_SHRINK|F_SEAL_GROW|F_SEAL_WRITE|F_SEAL_FUTURE_WRITE;
So you probably want to change this to include F_SEAL_FUTURE_EXEC, and note
Do you mean adding a case where, if F_SEAL_FUTURE_EXEC is among the seals, we should clear the X bits of the file and apply F_SEAL_EXEC as well?
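i.e., is it something like this that you're suggesting (a sketch against memfd_add_seals(), not something the current patch does)?

	if (seals & F_SEAL_FUTURE_EXEC) {
		inode->i_mode &= ~0111;	/* drop the exec bits... */
		seals |= F_SEAL_EXEC;	/* ...and lock that in with SEAL_EXEC */
	}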
I don't think the case in the if condition should imply F_SEAL_FUTURE_EXEC, since the file is still executable in this case?
your proposal interacts negatively with the whole MFD_NOEXEC_SCOPE_NOEXEC_ENFORCED mode set via vm.memfd_noexec - any system with this set to '2' will literally not allow you to do what you want.
See https://origin.kernel.org/doc/html/latest/userspace-api/mfd_noexec.html
Sorry, I didn't follow how this would impact MFD_NOEXEC_SCOPE_NOEXEC_ENFORCED. Could you please clarify that?
Thanks again for reviewing these patches! Happy that I was able to get the gears turning :)
I'm really interested in helping with this, so is there any forum you'd like to use for collaborating on this or any way I can help?
I'm also more than happy to test out any patches that you'd like!
Thanks, I'm just going to post to the mailing list, this is the discussion forum I'm making use of for this :)
I will cc you on my patch, and I'd definitely appreciate testing, thanks!
But yeah, to be clear, I'm not done with reviewing this - I need more time to digest what you're trying to do here - but you definitely need to think about the exec limitations.
Thanks for sending out the patch. I took a look and tested it out and it definitely makes implementing this a lot nicer!
Thanks, Isaac