On Mon, Jul 10, 2023 at 10:19 AM Axel Rasmussen axelrasmussen@google.com wrote:
On Sat, Jul 8, 2023 at 6:08 PM Andrew Morton akpm@linux-foundation.org wrote:
On Fri, 7 Jul 2023 14:55:33 -0700 Axel Rasmussen axelrasmussen@google.com wrote:
Future patches will re-use PTE_MARKER_SWAPIN_ERROR to implement UFFDIO_POISON, so make some various preparations for that:
First, rename it to just PTE_MARKER_POISONED. The "SWAPIN" can be confusing since we're going to re-use it for something not really related to swap. This can be particularly confusing for things like hugetlbfs, which doesn't support swap whatsoever. Also rename some various helper functions.
Next, fix pte marker copying for hugetlbfs. Previously, it would WARN on seeing a PTE_MARKER_SWAPIN_ERROR, since hugetlbfs doesn't support swap. But, since we're going to re-use it, we want it to go ahead and copy it just like non-hugetlbfs memory does today. Since the code to do this is more complicated now, pull it out into a helper which can be re-used in both places. While we're at it, also make it slightly more explicit in its handling of e.g. uffd wp markers.
For non-hugetlbfs page faults, instead of returning VM_FAULT_SIGBUS for an error entry, return VM_FAULT_HWPOISON. For most cases this change doesn't matter, e.g. a userspace program would receive a SIGBUS either way. But for UFFDIO_POISON, this change will let KVM guests get an MCE out of the box, instead of giving a SIGBUS to the hypervisor and requiring it to somehow inject an MCE.
Finally, for hugetlbfs faults, handle PTE_MARKER_POISONED, and return VM_FAULT_HWPOISON_LARGE in such cases. Note that this can't happen today because the lack of swap support means we'll never end up with such a PTE anyway, but this behavior will be needed once such entries *can* show up via UFFDIO_POISON.
--- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -523,6 +523,25 @@ static inline bool mm_tlb_flush_nested(struct mm_struct *mm) return atomic_read(&mm->tlb_flush_pending) > 1; }
+/*
- Computes the pte marker to copy from the given source entry into dst_vma.
- If no marker should be copied, returns 0.
- The caller should insert a new pte created with make_pte_marker().
- */
+static inline pte_marker copy_pte_marker(
swp_entry_t entry, struct vm_area_struct *dst_vma)
+{
pte_marker srcm = pte_marker_get(entry);
/* Always copy error entries. */
pte_marker dstm = srcm & PTE_MARKER_POISONED;
/* Only copy PTE markers if UFFD register matches. */
if ((srcm & PTE_MARKER_UFFD_WP) && userfaultfd_wp(dst_vma))
dstm |= PTE_MARKER_UFFD_WP;
return dstm;
+}
Breaks the build with CONFIG_MMU=n (arm allnoconfig). pte_marker isn't defined.
I'll slap #ifdef CONFIG_MMU around this function, but probably somethng more fine-grained could be used, like CONFIG_PTE_MARKER_UFFD_WP. Please consider.
Whoops, sorry about this. This function "ought" to be in include/linux/swapops.h where it would be inside a #ifdef CONFIG_MMU anyway, but it can't be because it uses userfaultfd_wp() so there'd be a circular include. I think just wrapping it in CONFIG_MMU is the right way.
But, this has also made me realize we need to not advertise UFFDIO_POISON as supported unless we have CONFIG_MMU. I don't want HAVE_ARCH_USERFAULTFD_WP for that, because it's only enabled on x86_64, whereas I want to support at least arm64 as well. I don't see a strong reason not to just use CONFIG_MMU for this too; this feature depends on the API in swapops.h, which uses that ifdef, so I don't see a lot of value out of creating a new but equivalent config option.
Actually, I'm being silly. CONFIG_USERFAULTFD depends on CONFIG_MMU, so we don't need to worry about most of this.
Andrew's fix to just wrap the helper in CONFIG_MMU is enough.
I'll make the needed changes (and also address Peter's comment above) and send out a v5.
btw, both copy_pte_marker() and pte_install_uffd_wp_if_needed() look far too large to justify inlining. Please review the desirability of this.
As far as inlining goes, I'm not opposed to un-inlining this, I was mainly copying that pattern from existing helpers in swapops.h.
One question is, if it weren't inline, where should it go? There is no mm/swapops.c which I would say is otherwise the proper place for it. I don't see any other good place for the functions to go. The one I'm introducing isn't userfaultfd-specific so userfaultfd.c seems wrong.