On Fri, 7 Jul 2023 14:55:33 -0700 Axel Rasmussen <axelrasmussen@google.com> wrote:
Future patches will re-use PTE_MARKER_SWAPIN_ERROR to implement UFFDIO_POISON, so make some preparations for that:
First, rename it to just PTE_MARKER_POISONED. The "SWAPIN" can be confusing since we're going to re-use it for something not really related to swap. This can be particularly confusing for things like hugetlbfs, which doesn't support swap whatsoever. Also rename the related helper functions.
Next, fix pte marker copying for hugetlbfs. Previously, it would WARN on seeing a PTE_MARKER_SWAPIN_ERROR, since hugetlbfs doesn't support swap. But, since we're going to re-use it, we want it to go ahead and copy it just like non-hugetlbfs memory does today. Since the code to do this is more complicated now, pull it out into a helper which can be re-used in both places. While we're at it, also make it slightly more explicit in its handling of e.g. uffd wp markers.
For non-hugetlbfs page faults, instead of returning VM_FAULT_SIGBUS for an error entry, return VM_FAULT_HWPOISON. For most cases this change doesn't matter, e.g. a userspace program would receive a SIGBUS either way. But for UFFDIO_POISON, this change will let KVM guests get an MCE out of the box, instead of giving a SIGBUS to the hypervisor and requiring it to somehow inject an MCE.
Finally, for hugetlbfs faults, handle PTE_MARKER_POISONED, and return VM_FAULT_HWPOISON_LARGE in such cases. Note that this can't happen today because the lack of swap support means we'll never end up with such a PTE anyway, but this behavior will be needed once such entries *can* show up via UFFDIO_POISON.
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -523,6 +523,25 @@ static inline bool mm_tlb_flush_nested(struct mm_struct *mm)
 	return atomic_read(&mm->tlb_flush_pending) > 1;
 }
 
+/*
+ * Computes the pte marker to copy from the given source entry into dst_vma.
+ * If no marker should be copied, returns 0.
+ * The caller should insert a new pte created with make_pte_marker().
+ */
+static inline pte_marker copy_pte_marker(
+		swp_entry_t entry, struct vm_area_struct *dst_vma)
+{
+	pte_marker srcm = pte_marker_get(entry);
+	/* Always copy error entries. */
+	pte_marker dstm = srcm & PTE_MARKER_POISONED;
+
+	/* Only copy PTE markers if UFFD register matches. */
+	if ((srcm & PTE_MARKER_UFFD_WP) && userfaultfd_wp(dst_vma))
+		dstm |= PTE_MARKER_UFFD_WP;
+
+	return dstm;
+}
Breaks the build with CONFIG_MMU=n (arm allnoconfig). pte_marker isn't defined.
I'll slap #ifdef CONFIG_MMU around this function, but probably something more fine-grained could be used, like CONFIG_PTE_MARKER_UFFD_WP. Please consider.
btw, both copy_pte_marker() and pte_install_uffd_wp_if_needed() look far too large to justify inlining. Please review the desirability of this.
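For reference, the marker-selection logic and the CONFIG_MMU guard discussed above can be modeled in userspace roughly as follows. This is only a sketch under stated assumptions: the pte_marker typedef, the bit values, struct vma_model, and the _model/_selfcheck function names are hypothetical stand-ins invented for illustration, not the kernel's definitions, and CONFIG_MMU is defined by hand here instead of coming from Kconfig.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the Kconfig symbol; defined by hand so the sketch compiles. */
#define CONFIG_MMU 1

#ifdef CONFIG_MMU
/* Hypothetical userspace stand-ins; the bit values are illustrative only. */
typedef unsigned long pte_marker;
#define PTE_MARKER_UFFD_WP  (1UL << 0)
#define PTE_MARKER_POISONED (1UL << 1)

/* Models only the one VMA property the helper looks at. */
struct vma_model {
	bool uffd_wp;	/* plays the role of userfaultfd_wp(dst_vma) */
};

/*
 * Mirrors the quoted helper: the poison bit is always carried over,
 * the uffd-wp bit only if the destination VMA is registered for uffd-wp.
 */
static inline pte_marker copy_pte_marker_model(pte_marker srcm,
					       const struct vma_model *dst_vma)
{
	pte_marker dstm = srcm & PTE_MARKER_POISONED;

	if ((srcm & PTE_MARKER_UFFD_WP) && dst_vma->uffd_wp)
		dstm |= PTE_MARKER_UFFD_WP;

	return dstm;
}

/* Small self-check covering the three interesting cases. */
static inline void copy_pte_marker_model_selfcheck(void)
{
	struct vma_model wp = { .uffd_wp = true };
	struct vma_model nowp = { .uffd_wp = false };

	/* Poison markers survive regardless of uffd-wp registration. */
	assert(copy_pte_marker_model(PTE_MARKER_POISONED, &nowp) ==
	       PTE_MARKER_POISONED);
	/* A uffd-wp marker is dropped when the destination is not registered. */
	assert(copy_pte_marker_model(PTE_MARKER_UFFD_WP, &nowp) == 0);
	/* Both bits are kept when the destination is registered. */
	assert(copy_pte_marker_model(PTE_MARKER_UFFD_WP | PTE_MARKER_POISONED,
				     &wp) ==
	       (PTE_MARKER_UFFD_WP | PTE_MARKER_POISONED));
}
#endif /* CONFIG_MMU */
```

With the guard in place, a configuration that leaves the symbol undefined never sees the function or the types it depends on, which is the effect the #ifdef is meant to have for the CONFIG_MMU=n build.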