On Thu, Feb 1, 2024 at 3:15 PM Linus Torvalds torvalds@linux-foundation.org wrote:
On Thu, 1 Feb 2024 at 14:54, Theo de Raadt deraadt@openbsd.org wrote:
Linus, you are in for a shock when the proposal doesn't work for glibc and all the applications!
Heh. I've enjoyed seeing your argumentative style that made you so famous back in the days. Maybe it's always been there, but I haven't seen the BSD people in so long that I'd forgotten all about it.
That said, famously argumentative or not, I think Theo is right, and I do think the MAP_SEALABLE bit is nonsensical.
If somebody wants to mseal() a memory region, why would they need to express that ahead of time?
I like to look at things from the point of view of average Linux userspace developers. They might not have the same level of expertise as the other folks on this email list, or they might not have the time and mileage for those details.
To me, the most important thing is to deliver a feature that's easy to use and works well. I don't want users to mess things up, so if I'm the one giving them the tools, I'm going to make sure they have all the information they need and that there are safeguards in place.
For example, consider the following use case:

1> Security-sensitive data is allocated from the heap, using malloc, by software component A, and filled with information.
2> Software component B then uses mprotect to change it to RO, and seals it using mseal().

Yes, we could choose to allow it. But there are complications:

1> Is this the right pattern? Why doesn't component A seal it already if it thinks it is important?
2> Why the heap? Why not mmap() a new memory mapping for that security data?
3> free() will not respect whether the memory is sealed or not. How would a new developer know they should probably never free sealed memory?
4> brk-shrink will never be able to get past the VMA that gets split out by mseal(); there are memory footprint implications for the process.
5> What if the security-sensitive data happens to be in the first or last VMA of the heap? Will sealing the first/last VMA cause any issue there, since they might carry important VMA flags? (I don't know enough about brk.)
6> If we ever support sealing the heap in its entirety (making it not executable), and still want to support other brk behaviors such as shrink/grow, would that conflict with the current mseal() if we allow it on the heap from the beginning?

With questions like these lacking clear answers, I think it is premature to let developers start using mseal() on the heap.
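To make question 2 in the list above concrete, here is a minimal sketch of the alternative it hints at: keep the sensitive data in its own anonymous mapping instead of on the heap, then drop write permission. The helper name secret_alloc_ro() is hypothetical and only for illustration; the sketch stops at mprotect() and does not seal anything.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy `len` bytes of sensitive data into its own
 * anonymous mapping and make that mapping read-only. Returns the mapping
 * (page-aligned, rounded up to whole pages) or NULL on failure. */
static void *secret_alloc_ro(const void *src, size_t len, size_t *map_len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0)
        return NULL;

    /* Round up so the mapping covers whole pages. */
    size_t total = (len + (size_t)page - 1) / (size_t)page * (size_t)page;

    void *p = mmap(NULL, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    memcpy(p, src, len);

    /* Drop write permission. Sealing, where the kernel offers it,
     * would follow this step. */
    if (mprotect(p, total, PROT_READ) != 0) {
        munmap(p, total);
        return NULL;
    }

    *map_len = total;
    return p;
}
```

Because the region is a separate mapping, malloc/free and brk never touch it, which sidesteps questions 3 to 5 for this particular pattern. It does not answer them for data that really has to live on the heap.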
And even if we have all the answers for heap, how about stack, or other types of virtual memory ?
Again, I don't have enough knowledge to produce a complete list of what shouldn't be sealed. Theo's input is that there is nothing I should worry about. However, it is clearly not nothing to me: besides the heap mentioned above, there is also aio/shm.
So MAP_SEALABLE is a conservative approach to limit the scope to *** two known use cases *** that I want to work on (libc and chrome) and to give us the time needed to answer those questions. It is like a claim: only mappings marked with MAP_SEALABLE support sealing at this point in time.
And MAP_SEALABLE is reversible: e.g. a sysctl could be added to make all memory sealable in the future, or we could obsolete it entirely when the time comes, and for an application that already passes MAP_SEALABLE the flag can then be treated as a no-op. However, if all memory were allowed to be sealable from the beginning, reversing that decision would be hard.
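The opt-in check and the "sysctl makes everything sealable" escape hatch can be sketched as a toy userspace model. This is not kernel code: struct toy_vma, TOY_VM_SEALABLE, toy_all_sealable and toy_mseal() are all hypothetical names invented for this illustration, and the real patch may structure the check differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit recorded on a mapping created with MAP_SEALABLE. */
#define TOY_VM_SEALABLE 0x1u

/* Toy stand-in for a mapping; not the kernel's vm_area_struct. */
struct toy_vma {
    unsigned int flags;
    bool sealed;
};

/* Hypothetical global switch, modelling the sysctl idea. */
static bool toy_all_sealable = false;

/* Returns 0 on success, -1 if the mapping may not be sealed. */
static int toy_mseal(struct toy_vma *vma)
{
    if (!toy_all_sealable && !(vma->flags & TOY_VM_SEALABLE))
        return -1;   /* not opted in, and the switch is off */
    vma->sealed = true;
    return 0;
}
```

In this model, flipping toy_all_sealable to true turns the per-mapping flag into a no-op: callers that already pass it keep working, and everything else becomes sealable. Going the other way (starting with everything sealable and later restricting it) would break callers that already seal unmarked mappings.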
After those considerations, if you still do not prefer MAP_SEALABLE, then I have the following options for you to choose from:
1. Add MAP_NOT_SEALABLE to mmap(), and I will use it for the heap/aio/shm cases. This basically says Linux does not officially support sealing on those; until we support them, we discourage sealing those mappings.
2. Make MAP_NOT_SEALABLE a flag visible only to the kernel, so applications won't be able to use it.
3. Open it for all, and list as many details as possible in the documentation. If we choose this route, I would like to have more discussion on the heap/stack, so that at least Linux developers will learn from those discussions.
So the part I think is sane is the mseal() system call itself, in that it allows *potential* future expansion of the semantics.
But hopefully said future expansion isn't even needed, and all users want the base experience, which is why I think PROT_SEAL (both to mmap and to mprotect) makes sense as an alternative form.
So yes, to my mind
    mprotect(addr, len, PROT_READ);
    mseal(addr, len, 0);
should basically give identical results to
    mprotect(addr, len, PROT_READ | PROT_SEAL);
and using PROT_SEAL at mmap() time is similarly the same obvious notion of "map this, and then seal that mapping".
The reason for having "mseal()" as a separate call at all from the PROT_SEAL bit is that it does allow possible future expansion (while PROT_SEAL is just a single bit, and it won't change semantics) but also so that you can do whatever prep-work in stages if you want to, and then just go "now we seal it all".
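A minimal sketch of the two-step "prepare, then seal" form, under stated assumptions: it only calls mseal() when the build headers define SYS_mseal (the call was still a proposal in this thread, so that is an assumption about the build environment), and it does not use PROT_SEAL at all, since that bit is only being discussed here. The names try_mseal() and make_ro_and_seal() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: call mseal() through syscall(2) when the build
 * headers define SYS_mseal; otherwise report ENOSYS. */
static int try_mseal(void *addr, size_t len)
{
#ifdef SYS_mseal
    return (int)syscall(SYS_mseal, addr, len, 0UL);
#else
    (void)addr;
    (void)len;
    errno = ENOSYS;
    return -1;
#endif
}

/* Two-step form: change protection, then seal.
 * Returns 1 if sealed, 0 if the region is read-only but was not sealed
 * (for example, the kernel lacks mseal), -1 if mprotect() itself failed. */
static int make_ro_and_seal(void *addr, size_t len)
{
    if (mprotect(addr, len, PROT_READ) != 0)
        return -1;
    return try_mseal(addr, len) == 0 ? 1 : 0;
}
```

The staging point is that any amount of setup (filling the region, adjusting protections on several ranges) can happen before the final sealing call. A single-bit PROT_SEAL variant would fold the last two steps into one mprotect() call.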
To clarify: do you mean to have the following ?
    mmap(PROT_READ|PROT_SEAL)
    mseal(addr,len,0)
    mprotect(addr,len,PROT_READ|PROT_SEAL) ?
I have to think about the mprotect() case.
For mmap(PROT_READ|PROT_SEAL), I might have a use case already:
fs/binfmt_elf.c

    if (current->personality & MMAP_PAGE_ZERO) {
        /* Why this, you ask???  Well SVr4 maps page 0 as read-only,
           and some applications "depend" upon this behavior.
           Since we do not have the power to recompile these, we
           emulate the SVr4 behavior. Sigh. */
        error = vm_mmap(NULL, 0, PAGE_SIZE,
                        PROT_READ | PROT_EXEC,    <-- add PROT_SEAL
                        MAP_FIXED | MAP_PRIVATE, 0);
    }
I don't see the benefit of an RWX page 0, which might let a null-pointer error end up executing code for some programs.
Best Regards, -Jeff
Linus