On 02.08.23 17:27, Mel Gorman wrote:
On Tue, Aug 01, 2023 at 02:48:39PM +0200, David Hildenbrand wrote:
KVM is *the* case we know that really wants to honor NUMA hinting faults. As we want to stop setting FOLL_HONOR_NUMA_FAULT implicitly, set FOLL_HONOR_NUMA_FAULT whenever we might obtain pages on behalf of a VCPU to map them into a secondary MMU, and add a comment why.
Do that unconditionally in hva_to_pfn_slow() when calling get_user_pages_unlocked().
kvmppc_book3s_instantiate_page(), hva_to_pfn_fast() and gfn_to_page_many_atomic() are similarly used to map pages into a secondary MMU. However, FOLL_WRITE and get_user_page_fast_only() always implicitly honor NUMA hinting faults -- as documented for FOLL_HONOR_NUMA_FAULT -- so we can limit this change to a single location for now.
Don't set it in check_user_page_hwpoison(), where we really only want to check if the mapped page is HW-poisoned.
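To illustrate the split described above, here is a minimal userspace sketch, not the actual patch. The bit values are placeholders rather than the kernel's definitions, both helper functions are hypothetical names invented for this example, and FOLL_HWPOISON is assumed to be the flag both paths already pass; any other flags the real functions set are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values for illustration only; NOT the kernel's definitions. */
#define FOLL_WRITE            (1u << 0)
#define FOLL_HWPOISON         (1u << 1)
#define FOLL_HONOR_NUMA_FAULT (1u << 2)

/*
 * Hypothetical helper modeling the slow path: a page is looked up on
 * behalf of a VCPU to map it into a secondary MMU, so NUMA hinting
 * faults are always honored, for reads and writes alike.
 */
static inline unsigned int slow_path_gup_flags(bool write_fault)
{
	unsigned int flags = FOLL_HWPOISON | FOLL_HONOR_NUMA_FAULT;

	if (write_fault)
		flags |= FOLL_WRITE;
	return flags;
}

/*
 * Hypothetical helper modeling the HW-poison check: it only asks whether
 * the mapped page is poisoned, so FOLL_HONOR_NUMA_FAULT is not requested.
 * (Simplified: other flags are left out of this sketch.)
 */
static inline unsigned int hwpoison_check_gup_flags(void)
{
	return FOLL_HWPOISON;
}

/* Small self-check of the two properties the commit message states. */
static inline void sketch_self_check(void)
{
	assert(slow_path_gup_flags(false) & FOLL_HONOR_NUMA_FAULT);
	assert(!(hwpoison_check_gup_flags() & FOLL_HONOR_NUMA_FAULT));
}
```

The point of the sketch is only that the flag is set unconditionally in the slow path, independent of write_fault, and is deliberately absent from the poison check.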
We won't set it for other KVM users of get_user_pages()/pin_user_pages():
- arch/powerpc/kvm/book3s_64_mmu_hv.c: not used to map pages into a secondary MMU.
- arch/powerpc/kvm/e500_mmu.c: only used on shared TLB pages with userspace.
- arch/s390/kvm/*: s390x only supports a single NUMA node either way.
- arch/x86/kvm/svm/sev.c: not used to map pages into a secondary MMU.
This is a preparation for making FOLL_HONOR_NUMA_FAULT no longer implicitly be set by get_user_pages() and friends.
Signed-off-by: David Hildenbrand <david@redhat.com>
Seems sane but I don't know KVM well enough to know if this is the only relevant case so didn't ack.
Makes sense, some careful eyes from KVM people would be appreciated.
At least from kvm_main.c POV, I'm pretty confident that that's it.