On Tue, Oct 28, 2025 at 12:09:52PM +0530, Dev Jain wrote:
Currently mremap folio pte batch ignores the writable bit during figuring out a set of similar ptes mapping the same folio. Suppose that the first pte of the batch is writable while the others are not - set_ptes will end up setting the writable bit on the other ptes, which is a violation of mremap semantics. Therefore, use FPB_RESPECT_WRITE to check the writable bit while determining the pte batch.
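To make the hazard above concrete, here is a minimal userspace sketch. It is a toy model, not kernel code: `struct toy_pte`, `toy_pte_batch()`, `toy_set_ptes()` and `toy_move()` are hypothetical names, and the model assumes that installing a batch gives every entry the first pte's permission bits, as the description says set_ptes ends up doing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a PTE; this is NOT the kernel's pte_t. */
struct toy_pte {
	unsigned long pfn;
	bool write;
};

/*
 * Count how many entries starting at ptes[0] map consecutive pfns.
 * With respect_write == false the writable bit is ignored when
 * deciding where the batch ends.
 */
static size_t toy_pte_batch(const struct toy_pte *ptes, size_t max_nr,
			    bool respect_write)
{
	size_t nr = 1;

	assert(max_nr >= 1);
	while (nr < max_nr) {
		if (ptes[nr].pfn != ptes[0].pfn + nr)
			break;
		if (respect_write && ptes[nr].write != ptes[0].write)
			break;
		nr++;
	}
	return nr;
}

/*
 * Toy model of installing a batch: every entry takes the first pte's
 * writable bit, only the pfn advances.
 */
static void toy_set_ptes(struct toy_pte *dst, struct toy_pte first, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		dst[i].pfn = first.pfn + i;
		dst[i].write = first.write;
	}
}

/*
 * Move n entries from src to dst batch by batch and return how many
 * destination entries became writable although the source was not.
 */
static size_t toy_move(const struct toy_pte *src, struct toy_pte *dst,
		       size_t n, bool respect_write)
{
	size_t gained = 0;

	for (size_t i = 0; i < n; ) {
		size_t nr = toy_pte_batch(src + i, n - i, respect_write);

		toy_set_ptes(dst + i, src[i], nr);
		i += nr;
	}
	for (size_t i = 0; i < n; i++)
		if (dst[i].write && !src[i].write)
			gained++;
	return gained;
}
```

In this model, a writable first entry followed by read-only entries of the same folio is moved as one batch when the writable bit is ignored, so the read-only entries come out writable. When the bit is respected, the batch stops at the first mismatch and no entry gains write permission.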
Hmm, it seems to me like we're doing the wrong thing by default here? I must admit I haven't followed the contpte work as much as I would've liked, but it doesn't make much sense to me why FPB_RESPECT_WRITE would be an option you have to explicitly pass, and where folio_pte_batch (the "simple" interface) doesn't Just Do The Right Thing for naive callers.
Auditing all callers:
- khugepaged clears a variable number of ptes
- memory.c clears a variable number of ptes
- mempolicy.c grabs folios for migrations
- mlock.c steps over nr_ptes - 1 ptes, speeding up traversal
- mremap is borked since we're remapping nr_ptes ptes
- rmap.c TTU unmaps nr_ptes ptes for a given folio
so while the vast majority of callers don't seem to care, it would make sense that folio_pte_batch() works conservatively by default, and folio_pte_batch_flags() would allow for further batching (or maybe we would add a separate folio_pte_batch_clear() or folio_pte_batch_greedy() or whatnot).
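Roughly the API shape I have in mind, as a userspace toy rather than a patch. All names here are hypothetical (`toy_batch()`, `toy_batch_flags()`, `TOY_FPB_IGNORE_WRITE`), and the flag deliberately inverts the sense of the existing FPB_RESPECT_WRITE so that ignoring the writable bit becomes the opt-in:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a PTE; this is NOT the kernel's pte_t. */
struct toy_pte {
	unsigned long pfn;
	bool write;
};

/* Hypothetical opt-in flag: the caller asks to ignore the writable bit. */
#define TOY_FPB_IGNORE_WRITE	(1u << 0)

/*
 * Flags variant: batches entries with consecutive pfns, and only
 * batches across a writable-bit mismatch when explicitly told to.
 */
static size_t toy_batch_flags(const struct toy_pte *ptes, size_t max_nr,
			      unsigned int flags)
{
	size_t nr = 1;

	assert(max_nr >= 1);
	while (nr < max_nr) {
		if (ptes[nr].pfn != ptes[0].pfn + nr)
			break;
		if (!(flags & TOY_FPB_IGNORE_WRITE) &&
		    ptes[nr].write != ptes[0].write)
			break;
		nr++;
	}
	return nr;
}

/* "Simple" interface: conservative, never merges differing writable bits. */
static size_t toy_batch(const struct toy_pte *ptes, size_t max_nr)
{
	return toy_batch_flags(ptes, max_nr, 0);
}
```

With this shape a naive caller that remaps ptes gets the safe answer from the simple interface, while callers that only clear or step over ptes can pass the flag to keep the larger batches.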
Cc: stable@vger.kernel.org #6.17
Fixes: f822a9a81a31 ("mm: optimize mremap() by PTE batching")
Reported-by: David Hildenbrand <david@redhat.com>
Debugged-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
But the solution itself looks okay to me. so, fwiw:
Acked-by: Pedro Falcato <pfalcato@suse.de>