On 11/02/2020 4:40 PM Mike Rapoport firstname.lastname@example.org wrote:
Isn't memfd_secret currently *unnecessarily* designed to be a "one task feature"? memfd_secret provides exactly two (generic) features:
- address space isolation from the kernel (aka SECRET_EXCLUSIVE, not in the kernel's direct map) - hidden from the kernel, great
- disabling the processor's memory caches against speculative-execution vulnerabilities (Spectre and friends, aka SECRET_UNCACHED), also great
But what about the following use case: implementing a hardened IPC mechanism where even the kernel is not aware of any data and, optionally via SECRET_UNCACHED, even the hardware caches are bypassed! With the patches we are so close to achieving this.
How? Through shared, SECRET_EXCLUSIVE and SECRET_UNCACHED mmapped pages; the tasks involved in the IPC need to know this mapping (and the memfd_secret fd). After the IPC is done, tasks can copy sensitive data from the IPC pages into memfd_secret() pages, while non-sensitive data can be used or copied anywhere.
As long as the tasks share the file descriptor, they can share the secretmem pages, pretty much like normal memfd.
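To make the "pretty much like normal memfd" comparison concrete, here is a minimal sketch of two tasks sharing pages through one file descriptor. It is an illustration under stated assumptions, not the secretmem interface: memfd_create() stands in for memfd_secret() so that the example runs on kernels without the patches, and the helper name share_via_fd() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: pass `msg` from a child task to the parent through
 * pages backed by a shared file descriptor. memfd_create() stands in for
 * memfd_secret() here (an assumption, so the sketch runs without the patches). */
static int share_via_fd(const char *msg, char *out, size_t outlen)
{
    const size_t len = 4096;
    assert(outlen > 0 && strlen(msg) < len);

    int fd = memfd_create("ipc-demo", MFD_CLOEXEC);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: map the inherited fd and write the message. */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            _exit(1);
        strcpy(p, msg);
        _exit(0);
    }

    /* Parent: wait for the child, then map the same fd and read it back. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }
    char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    strncpy(out, p, outlen - 1);
    out[outlen - 1] = '\0';
    munmap(p, len);
    close(fd);
    return 0;
}
```

The fd is inherited across fork() here; unrelated tasks would have to pass it explicitly, for example over a UNIX domain socket with SCM_RIGHTS.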
Including process_vm_readv() and process_vm_writev()? Let's take a hypothetical "dbus-daemon-secure" service that receives data from process A and wants to copy/distribute it to data areas of N other processes. Much like dbus, but with a direct copy into secretmem/mmap pages (ring buffer) rather than SOCK_DGRAM. Should be possible, right?
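For reference, this is roughly what the direct-copy call looks like from user space. The sketch runs on ordinary memory; whether the same call would be permitted on secretmem pages is exactly the open question above. The helper names copy_from_task() and self_copy_demo() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: copy `len` bytes from address `remote` in process
 * `pid` into `local`. Returns the number of bytes copied, or -1 with
 * errno set (e.g. EPERM if ptrace-style access is denied). */
static ssize_t copy_from_task(pid_t pid, const void *remote,
                              void *local, size_t len)
{
    assert(len > 0);
    struct iovec liov = { .iov_base = local, .iov_len = len };
    struct iovec riov = { .iov_base = (void *)remote, .iov_len = len };
    return process_vm_readv(pid, &liov, 1, &riov, 1, 0);
}

/* Demo on ordinary (non-secret) memory of the calling process itself.
 * Returns 0 on success, 1 if the kernel or a sandbox refuses the call,
 * -1 if the copied data does not match. */
static int self_copy_demo(void)
{
    char src[] = "ring";
    char dst[sizeof src] = { 0 };
    ssize_t n = copy_from_task(getpid(), src, dst, sizeof src);
    if (n < 0)
        return 1;
    return (n == (ssize_t)sizeof src &&
            memcmp(src, dst, sizeof src) == 0) ? 0 : -1;
}
```

A "dbus-daemon-secure" style service would use process_vm_writev() in the same way for the distribution step, with the local and remote iovec roles swapped.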
One missing piece is still the secure zeroization of the page(s) when the mapping is closed by the last process, to guarantee a secure cleanup. This can probably be done as a general mmap feature, not coupled to memfd_secret(), and can be done independently ("reverse" MAP_UNINITIALIZED feature).
There are "init_on_alloc" and "init_on_free" kernel parameters that enable zeroing of the pages on alloc and on free globally. Anyway, I'll add zeroing of the freed memory to secretmem.
Great, this allows page-specific (and thus runtime-performance-optimized) zeroing of secured pages. init_on_free lowers performance too much and is not precise enough.
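Until zeroing on free is done by the kernel, cooperating tasks can only approximate it from user space. A minimal sketch, assuming glibc's explicit_bzero() and again using memfd_create() as a stand-in for memfd_secret(); the helper names are hypothetical. Unlike the kernel-side zeroing discussed above, it only works if every task calls it voluntarily.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: zero a shared mapping, then unmap it.
 * explicit_bzero() is used so the compiler cannot elide the wipe. */
static int wipe_and_unmap(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);
    explicit_bzero(addr, len);
    return munmap(addr, len);
}

/* Demo: fill a memfd-backed page with a pattern, wipe and unmap it, then
 * map the same fd again and count the bytes that are still non-zero.
 * Returns that count, or -1 on setup failure. */
static long leftover_bytes_after_wipe(void)
{
    const size_t len = 4096;
    int fd = memfd_create("wipe-demo", MFD_CLOEXEC); /* stand-in for memfd_secret() */
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    memset(p, 0xA5, len);            /* pretend this is sensitive data */
    if (wipe_and_unmap(p, len) < 0) {
        close(fd);
        return -1;
    }
    const unsigned char *q = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (q == MAP_FAILED) {
        close(fd);
        return -1;
    }
    long leftover = 0;
    for (size_t i = 0; i < len; i++)
        leftover += (q[i] != 0);
    munmap((void *)q, len);
    close(fd);
    return leftover;
}
```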