Re: [RFC PATCH v1 0/8] Introduce mseal() syscall

20 Oct 2023

Stephen Röttger <sroettger@google.com> wrote:
...
...
...
IMO: The approaches mimmutable() and mseal() took are different, but
we all want to seal the memory from attackers and make the linux
application safer.
I think you are building mseal for chrome, and chrome alone.
I do not think this will work out for the rest of the application space
because

- it is too complicated
- experience with mimmutable() says that applications don't do any of it
  themselves, it is all in execve(), libc initialization, and ld.so.
You don't strike me as an execve, libc, or ld.so developer.

We do want to build this in a way that it can be applied automatically by ld.so
and we appreciate all your feedback on this.
Hi Stephen,
I am pretty sure your mechanism will be usable by ld.so.
What bothers me is the complex many-bits approach may encourage people
to set only a subset of the bits, and then believe they have a security
primitive.
Partial sealing is not safe.  I define partial sealing as "blocking munmap,
but not mprotect".  Or "blocking mprotect, but not madvise or mmap".
In Message-id: ZS/3GCKvNn5qzhC4@casper.infradead.org Matthew stated
that there are two aspects being locked: which object is mapped, and the
permission of that mapping.  When additional system calls msync() and madvise()
are included in the picture, there are 3 actions being prevented:
- Can someone replace the object
- Can someone change the permission
- Can someone throw away the cached pages, reverting to original
  content of the object (that is the madvise / msync)
In Message-id: CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426FkcgnfUGLvA@mail.gmail.com
Jan reminded us of this piece.  I'm taking this as a long-standing security
hole in some sub-operations of msync/madvise which can write to data regions
that aren't actually writeable.  Sub-operations with this problem are MADV_FREE,
MADV_DONTNEED, POSIX_MADV_DONTNEED, MS_INVALIDATE, on Linux MADV_WIPEONFORK,
and probably a whole bunch of others.  I am testing OpenBSD changes which
demand PROT_WRITE permission for these sub-operations.  Perhaps some systems
are already careful.
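
For illustration, the "throw away the cached pages" action can be observed from
user space.  What follows is a minimal Python sketch, assuming Linux semantics
for MADV_DONTNEED on a private, file-backed mapping.  The function name is
hypothetical and nothing here is taken from the mseal() or mimmutable() code.
It only shows the revert-to-original-content effect on a writable mapping; it
does not first drop PROT_WRITE with mprotect().

```python
import mmap
import tempfile


def dontneed_reverts_private_mapping():
    """Return (before, after): mapping contents around MADV_DONTNEED.

    Assumes Linux, where MADV_DONTNEED on a MAP_PRIVATE file mapping
    discards the private copy and later reads come from the file again.
    """
    page = mmap.PAGESIZE
    with tempfile.TemporaryFile() as f:
        f.write(b"A" * page)
        f.flush()
        # ACCESS_COPY requests a private, copy-on-write mapping of the file.
        with mmap.mmap(f.fileno(), page, access=mmap.ACCESS_COPY) as m:
            m[:4] = b"EVIL"                # private change, file is untouched
            before = bytes(m[:4])
            m.madvise(mmap.MADV_DONTNEED)  # discard the private pages
            after = bytes(m[:4])           # faulted back in from the file
    return before, after
```

On such a system the modified bytes are gone after the madvise() call, which
is a change of contents made without any store to the region.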
If you leave any of these operators available, the object is not actually sealed
against abuse.  I believe an attacker will simply switch to a different operator
(mmap, munmap, mprotect, madvise, msync) to achieve a similar objective of
damaging the permission or contents.
Since mseal() is designed to create partial sealings, the name of the proposed
system call really smells.
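
The "switch to a different operator" argument can be modeled as a set of
per-operation bits.  This is a toy Python sketch, not the proposed mseal()
interface: the Seal names and both helper functions are hypothetical, chosen
only to mirror the operators listed above.

```python
from enum import Flag, auto


class Seal(Flag):
    # Hypothetical per-operation seal bits (illustrative names only).
    MPROTECT = auto()
    MUNMAP = auto()
    MMAP = auto()
    MADVISE = auto()
    MSYNC = auto()


ALL_SEALS = Seal.MPROTECT | Seal.MUNMAP | Seal.MMAP | Seal.MADVISE | Seal.MSYNC


def open_operators(seals):
    """List the operators still usable against a mapping with these seals."""
    return [s.name.lower() for s in Seal if s not in seals]


def is_sealed(seals):
    """In this model a mapping counts as sealed only if every bit is set."""
    return seals == ALL_SEALS
```

With only Seal.MUNMAP | Seal.MPROTECT set, open_operators() still reports mmap,
madvise and msync, and is_sealed() is false.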
...
The intention of splitting the sealing by syscall was to provide
flexibility while still allowing ld.so to seal all operations.
Yes, you will have ld.so set all the bits, and the same in C runtime
initialization.  If you convince glibc to stop making the stack executable
in dlopen(), the kernel could automatically do it.  With Linux backwards
compat management, getting there would be an extremely long long long
roadmap.  But anyways the idea would be "set all the bits".  Because otherwise
the object or data isn't safe.
...
Does Linus' proposal to just split munmap / mprotect sealing address your
complexity concerns? ld.so would always use both flags which should then behave
similar to mimmutable().
No, I think it is weak, because it isn't sealed.
A separate mail in the thread from you says this is about chrome wanting
to use PKU on RWX objects.  I think that's the reason for wanting to
separate the sealing (I haven't heard of other applications wanting that).
How about we explore that in the other subthread.
