On 8/24/22 02:41, Chao Peng wrote:
On Tue, Aug 23, 2022 at 04:05:27PM +0000, Sean Christopherson wrote:
On Tue, Aug 23, 2022, David Hildenbrand wrote:
On 19.08.22 05:38, Hugh Dickins wrote:
On Fri, 19 Aug 2022, Sean Christopherson wrote:
On Thu, Aug 18, 2022, Kirill A . Shutemov wrote:
On Wed, Aug 17, 2022 at 10:40:12PM -0700, Hugh Dickins wrote:
> On Wed, 6 Jul 2022, Chao Peng wrote:
> But since then, TDX in particular has forced an effort into preventing
> (by flags, seals, notifiers) almost everything that makes it shmem/tmpfs.
>
> Are any of the shmem.c mods useful to existing users of shmem.c? No.
> Is MFD_INACCESSIBLE useful or comprehensible to memfd_create() users? No.
But QEMU and other VMMs are users of shmem and memfd. The new features certainly aren't useful for _all_ existing users, but I don't think it's fair to say that they're not useful for _any_ existing users.
Okay, I stand corrected: there exist some users of memfd_create() who will also have use for "INACCESSIBLE" memory.
As raised in reply to the relevant patch, I'm not sure we really have to, or want to, expose MFD_INACCESSIBLE to user space. I feel like this is a requirement of specific memfd_notifier (memfile_notifier) implementations -- such as TDX, which will convert the memory and MCE-kill the machine on ordinary write access. We might be able to set/enforce this internally when registering a notifier instead, and fail notifier registration if a condition isn't met (e.g., an existing mmap).
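A minimal sketch of the registration-time enforcement described above, written as a plain userspace C model rather than kernel code. All names here (struct model_file, model_register_notifier(), model_mmap()) are hypothetical and do not appear in the patch series; the error codes are likewise only an assumption about what such a check might return.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a memory-backed file; not a kernel structure. */
struct model_file {
    size_t nr_mappings;  /* live userspace mappings of the file */
    bool inaccessible;   /* set internally once a notifier demands it */
};

/*
 * Hypothetical notifier registration. If the registering backend needs
 * the memory to be inaccessible to userspace, registration fails when a
 * mapping already exists; otherwise the property is set internally, with
 * no flag passed in from userspace at creation time.
 */
static int model_register_notifier(struct model_file *f,
                                   bool needs_inaccessible)
{
    if (needs_inaccessible) {
        if (f->nr_mappings != 0)
            return -EBUSY;      /* assumed error code */
        f->inaccessible = true;
    }
    return 0;
}

/* Hypothetical mmap path: refused once the file is inaccessible. */
static int model_mmap(struct model_file *f)
{
    if (f->inaccessible)
        return -EPERM;          /* assumed error code */
    f->nr_mappings++;
    return 0;
}
```

In this model, the order of operations decides the outcome: registering first blocks later mappings, while mapping first makes registration fail.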
So I'd be curious, which other users of shmem/memfd would benefit from (MMU)-"INACCESSIBLE" memory obtained via memfd_create()?
I agree that there's no need to expose the inaccessible behavior via uAPI. Making it a kernel-internal thing that's negotiated/resolved when KVM binds to the fd would align INACCESSIBLE with the UNMOVABLE and UNRECLAIMABLE flags (and any other flags that get added in the future).
AFAICT, the user-visible flag is a holdover from the early RFCs and doesn't provide any unique functionality.
That's also what I'm thinking. And I don't see an immediate problem if the user has already populated the fd at binding time. Actually, that looks like an advantage for the previously discussed guest payload pre-loading.
I think this gets awkward. Trying to define sensible semantics for what happens if a shmem or similar fd gets used as secret guest memory and that fd isn't utterly and completely empty can get quite nasty. For example:
- If there are already mmaps, then TDX (much more so than SEV) really doesn't want to also use it as guest memory.
- If there is already data in the fd, then maybe some technologies can use this for pre-population, but TDX needs explicit instructions in order to get the guest's hash right.
In general, it seems like it will be much more likely to actually work well if the user (uAPI) is required to declare to the kernel exactly what the fd is for (e.g. TDX secret memory, software-only secret memory, etc) before doing anything at all with it other than binding it to KVM.
INACCESSIBLE is a way to achieve this. Maybe it's not the prettiest in the world -- I personally would rather see an explicit request for, say, TDX or SEV memory or maybe the memory that works for a particular KVM instance instead of something generic like INACCESSIBLE, but this is a pretty weak preference. But I think that just starting with a plain memfd is a can of worms.
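To make the "declare the purpose up front" alternative concrete, here is a small userspace C model, offered only as a sketch. The enum values, struct, and function names are all hypothetical; the thread names TDX, SEV, and software-only secret memory merely as examples, and no such interface is defined by the patches. The error codes are assumptions too.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical purposes an fd could declare at creation time. */
enum demo_purpose { DEMO_PLAIN, DEMO_TDX_SECRET, DEMO_SW_SECRET };

/* Hypothetical model of the fd state; not a kernel structure. */
struct demo_fd {
    enum demo_purpose purpose;
    bool bound;  /* true once bound to a (modelled) KVM instance */
};

static struct demo_fd demo_create(enum demo_purpose purpose)
{
    struct demo_fd fd = { .purpose = purpose, .bound = false };
    return fd;
}

/* Ordinary userspace access is only allowed on a plain fd. */
static int demo_write(const struct demo_fd *fd)
{
    return fd->purpose == DEMO_PLAIN ? 0 : -EPERM;
}

static int demo_mmap(const struct demo_fd *fd)
{
    return fd->purpose == DEMO_PLAIN ? 0 : -EPERM;
}

/* Binding is the only operation allowed on a secret-memory fd. */
static int demo_bind(struct demo_fd *fd)
{
    if (fd->purpose == DEMO_PLAIN)
        return -EINVAL;  /* purpose was never declared */
    if (fd->bound)
        return -EBUSY;   /* already bound once */
    fd->bound = true;
    return 0;
}
```

Because the purpose is fixed at creation, the model never has to define what happens to pre-existing mappings or data at bind time: a secret-memory fd cannot have acquired any.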