On Fri, Sep 09, 2022 at 12:11:05PM -0700, Andy Lutomirski wrote:
On Fri, Sep 9, 2022, at 7:32 AM, Kirill A. Shutemov wrote:
On Thu, Sep 08, 2022 at 09:48:35PM -0700, Andy Lutomirski wrote:
On 8/19/22 17:27, Kirill A. Shutemov wrote:
On Thu, Aug 18, 2022 at 08:00:41PM -0700, Hugh Dickins wrote:
On Thu, 18 Aug 2022, Kirill A. Shutemov wrote:
On Wed, Aug 17, 2022 at 10:40:12PM -0700, Hugh Dickins wrote:

If your memory could be swapped, that would be enough of a good reason to make use of shmem.c: but it cannot be swapped; and although there are some references in the mailthreads to it perhaps being swappable in future, I get the impression that will not happen soon if ever.

If your memory could be migrated, that would be some reason to use filesystem page cache (because page migration happens to understand that type of memory): but it cannot be migrated.
Migration support is in the pipeline; it is part of TDX 1.5 [1]. Swapping is theoretically possible, but I'm not aware of any plans for it as of now.
[1] https://www.intel.com/content/www/us/en/developer/articles/technical/intel-t...
I always forget, migration means different things to different audiences. As an mm person, I was meaning page migration, whereas a virtualization person thinks VM live migration (which that reference appears to be about), a scheduler person task migration, an ornithologist bird migration, etc.
But you're an mm person too: you may have cited that reference in the knowledge that TDX 1.5 Live Migration will entail page migration of the kind I'm thinking of. (Anyway, it's not important to clarify that here.)
TDX 1.5 brings both.
In TDX speak, mm migration is called relocation. See TDH.MEM.PAGE.RELOCATE.
This seems to be a pretty bad fit for the way that the core mm migrates pages. The core mm unmaps the page, then moves (in software) the contents to a new address, then faults it in. TDH.MEM.PAGE.RELOCATE doesn't fit into that workflow very well. I'm not saying it can't be done, but it won't just work.
Hm. From what I see, we have all the necessary infrastructure in place.

Unmapping is a NOP for inaccessible pages, as they are never mapped, and we have the mapping->a_ops->migrate_folio() callback that allows replacing the software copy with whatever is needed, like TDH.MEM.PAGE.RELOCATE.

What am I missing?
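Roughly like this -- a sketch only; tdx_relocate_page() is a made-up placeholder for whatever wrapper ends up issuing the SEAMCALL, not code from the series:

#include <linux/migrate.h>
#include <linux/pagemap.h>

/* Hypothetical wrapper around the TDH.MEM.PAGE.RELOCATE SEAMCALL. */
int tdx_relocate_page(struct page *src, struct page *dst);

static int inaccessible_migrate_folio(struct address_space *mapping,
				      struct folio *dst, struct folio *src,
				      enum migrate_mode mode)
{
	int rc;

	/*
	 * Transfer the mapping's reference from src to dst in the
	 * xarray. There are no PTEs for inaccessible memory, so there
	 * is nothing to unmap beforehand.
	 */
	rc = folio_migrate_mapping(mapping, dst, src, 0);
	if (rc != MIGRATEPAGE_SUCCESS)
		return rc;

	/*
	 * Let the TDX module move the contents instead of doing a
	 * software copy. A real implementation has to deal with
	 * failure here (the mapping is already switched) or reorder
	 * the steps.
	 */
	if (tdx_relocate_page(folio_page(src, 0), folio_page(dst, 0)))
		return -EIO;

	folio_migrate_flags(dst, src);
	return MIGRATEPAGE_SUCCESS;
}

static const struct address_space_operations inaccessible_aops = {
	.migrate_folio	= inaccessible_migrate_folio,
};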
Hmm, maybe this isn't as bad as I thought.
Right now, unless I've missed something, the migration workflow is to unmap (via try_to_migrate) all mappings, then migrate the backing store (with ->migrate_folio(), although it seems like most callers expect the actual copy to happen outside of ->migrate_folio()),
Most? I guess you are talking about MIGRATE_SYNC_NO_COPY, right? AFAICS, it is an HMM thing, not something common.
and then make new mappings. With the *current* (vma-based, not fd-based) model for KVM memory, this won't work -- we can't unmap before calling TDH.MEM.PAGE.RELOCATE.
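That workflow, as a heavily simplified sketch (locking, refcounting and error handling omitted; not the literal mm/migrate.c code):

#include <linux/migrate.h>
#include <linux/rmap.h>

static int unmap_and_move_sketch(struct folio *src, struct folio *dst)
{
	int rc;

	/* 1. Unmap: replace every PTE mapping src with a migration entry. */
	if (folio_mapped(src))
		try_to_migrate(src, 0);

	/* 2. Move the backing store, via the fs callback if there is one. */
	if (src->mapping && src->mapping->a_ops->migrate_folio)
		rc = src->mapping->a_ops->migrate_folio(src->mapping, dst,
							src, MIGRATE_SYNC);
	else
		rc = migrate_folio(src->mapping, dst, src, MIGRATE_SYNC);

	/* 3. "Fault back in": rewrite migration entries to point at dst. */
	remove_migration_ptes(src, rc == MIGRATEPAGE_SUCCESS ? dst : src,
			      false);
	return rc;
}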
We don't need to unmap. The page is not mapped from the core-mm PoV.
But maybe it's actually okay with some care or maybe mild modifications with the fd-based model. We don't have any mmaps, per se, to unmap for secret / INACCESSIBLE memory. So maybe we can get all the way to ->migrate_folio() without zapping anything in the secure EPT and just call TDH.MEM.PAGE.RELOCATE from inside migrate_folio(). And there will be nothing to fault back in. From the core code's perspective, it's like migrating a memfd that doesn't happen to have any mappings at the time.
Modifications are needed if we want to initiate migration from userspace. IIRC, we don't have any API that can initiate page migration for file ranges without mapping the file.
But the kernel can do it fine for its own housekeeping: compaction, for instance, doesn't need any VMA. And we need compaction working for the long-term stability of the system.
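Something along these lines is all compaction-style migration needs -- again a sketch; alloc_migration_dst() is only an illustrative callback, and the page list is assumed to be already isolated:

#include <linux/gfp.h>
#include <linux/migrate.h>

/* Illustrative allocation callback, not an existing helper. */
static struct page *alloc_migration_dst(struct page *page,
					unsigned long private)
{
	return alloc_page(GFP_KERNEL | __GFP_NOWARN);
}

/*
 * Migrate a list of already-isolated pages. Note that everything
 * operates on struct page directly: no VMA, no mmap, no userspace
 * involvement anywhere.
 */
static void migrate_isolated_pages(struct list_head *pagelist)
{
	unsigned int nr_succeeded = 0;

	migrate_pages(pagelist, alloc_migration_dst, NULL, 0,
		      MIGRATE_SYNC, MR_COMPACTION, &nr_succeeded);
}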