Linus Torvalds torvalds@linux-foundation.org wrote:
On Tue, 17 Oct 2023 at 02:08, Jeff Xu jeffxu@google.com wrote:
It is probably worth noting that I choose to check one and only one sealing type per syscall. i.e. munmap(2) checks MM_SEAL_MUNMAP only.
Yeah, this is wrong.
It's wrong exactly because other system calls will unmap things too.
Using mmap() to over-map something will unmap the old one.
Same goes for mremap() to move over an existing mapping.
So the whole "do things by the name of the system call" is not workable.
All that matters is what the system calls *do*, not what their name is.
I agree completely...
mseal() is a clone of mimmutable(2), but with an extremely over-complicated API based upon dubious arguments.
I designed mimmutable(2) [1] in OpenBSD, it took about a year to get all the components working correctly. There were many intermediate API during development, but in the end the API is simply:
int mimmutable(void *addr, size_t len);
The kernel code for mimmutable() traverses the specified VA range. In that range, it will find unmapped sub-regions (which are are ignored) and mapped sub-regions. For these mapped regions, it does not care what the permissions are, it just marks each sub-region as immutable.
Later on, when any VM operation request upon a VA range attempts to (1) change the permissions (2) to re-map on top (3) or dispose of the mapping, that operation is refused with errno EPERM. We don't care where the request comes from (ie. what system call). It is a behaviour of the VM system, when asked to act upon a VA sub-range mapping.
Very simple semantics.
The only case where the immutable marker is ignored is during address space teardown as a result of process termination.
In his submission of this API, Jeff Xu makes three claims I find dubious;
Also, Chrome wants to adopt this feature for their CFI work [2] and this patchset has been designed to be compatible with the Chrome use case.
I specifically designed mimmutable(2) with chrome in mind, and the chrome binary running on OpenBSD is full of immutable mappings. All the library regions automatically become immutable because ld.so can infer it and do the mimmutable calls for the right subregions.
So this chrome work has already been done by OpenBSD, and it is dead simple. During early development I thought mimmutable(2) would be called by applications or libraries, but I was dead wrong: 99.9% of calls are from ld.so, and no applications need to call it, these are the two exceptions:
In OpenBSD, mimmutable() is used in libc malloc() to lock-down some data structures at initialization time, so they canoot be attacked to create an invariant for use in ROP return-to-libc style methods.
In Chrome, there is a v8_flags variable rounded out to a full page, and placed in .data. Chrome initialized this variable, and wants to mprotect PROT_READ, but .data has been made immutable by ld.so. So we force this page into a new ELF section called "openbsd.mutable" which also behaves RW like .data. Where chrome does the mprotect PROT_READ, it now also performs mimmutable() on that page.
Having a seal type per syscall type helps to add the feature incrementally.
Yet, somehow OpenBSD didn't do it per syscall, and we managed to make our entire base operating system and 10,000+ applications automatically receive the benefits. In one year's effort. The only application which cared about it was chrome, described in the previous paragraph.
I think Jeff's idea here is super dangerous. What will actually happen is people will add a few mseal() sub-operations and think the job is done. It isn't done. They need all the mseal() requests, or the mapping are not safe.
It is very counterproductive to provide developers a complex API that has insecure suboperations.
Applications also know exactly what is sealed.
Actually applicatins won't know because there is no tooling to inspect this -- but I will argue further that applications don't need to know. Immutable marking is a system decision, not a program decision.
I'll close by asking for a new look at the mimmutable(2) API we settled on for OpenBSD. I think there is nothing wrong with it. I'm willing to help guide glibc / ld.so / musl teams through the problems they may find along the way, I know where the skeletons are buried. Two in particular: -znow RELRO already today, and xonly-text in the future.