On Sat, Oct 29, 2022 at 05:54:44PM -0700, Nadav Amit wrote:
> On Oct 29, 2022, at 5:15 PM, Mike Kravetz <mike.kravetz@oracle.com> wrote:
> > zap_page_range() is a bit confusing. It appears that the passed
> > range can span multiple vmas; otherwise, there would be no do-while
> > loop. Yet there is only one mmu_notifier_range_init() call, and it
> > specifies only the passed vma.
> >
> > It appears all callers pass a range entirely within a single vma.
> >
> > The modifications above would work for a range within a single vma.
> > However, things would be more complicated if the range could indeed
> > span multiple vmas; for a multi-vma range we would need to check the
> > first and last vmas for pmd sharing.
> >
> > Does anyone know more about this seemingly confusing behavior?
> > Perhaps the multi-vma handling is left over from earlier code?
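
For reference, the loop Mike is describing looks roughly like this in
the current tree (paraphrased from memory of mm/memory.c on 6.1-rc;
the details may be slightly off, so don't take it as verbatim):

void zap_page_range(struct vm_area_struct *vma, unsigned long start,
		unsigned long size)
{
	struct maple_tree *mt = &vma->vm_mm->mm_mt;
	unsigned long end = start + size;
	struct mmu_notifier_range range;
	struct mmu_gather tlb;
	MA_STATE(mas, mt, vma->vm_end, vma->vm_end);

	lru_add_drain();
	/* One notifier init, bound to the first vma only... */
	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
				start, start + size);
	tlb_gather_mmu(&tlb, vma->vm_mm);
	update_hiwater_rss(vma->vm_mm);
	mmu_notifier_invalidate_range_start(&range);
	/* ...yet the unmap loop happily walks into any later vmas. */
	do {
		unmap_single_vma(&tlb, vma, start, range.end, NULL);
	} while ((vma = mas_find(&mas, end - 1)) != NULL);
	mmu_notifier_invalidate_range_end(&range);
	tlb_finish_mmu(&tlb);
}
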
> I don't have personal knowledge, but I noticed that it does not make
> much sense, at least for MADV_DONTNEED. I tried to batch the TLB
> flushes across VMAs for madvise() [1].
The loop comes from commit 7e027b14d53e ("vm: simplify unmap_vmas()
calling convention", 2012-05-06), where a call to unmap_vmas() was
replaced with zap_page_range() so that the zap details pointer could be
dropped from unmap_vmas(), which makes sense.
I didn't check the old code, but from what I can tell (and as Mike
pointed out) I don't think zap_page_range() in the latest code base is
ever used on a multi-vma range at all. Otherwise the mmu notifier would
already be broken: see mmu_notifier_range_init(), where the vma pointer
is also part of the notification.
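
If I remember the layout right (paraphrasing include/linux/mmu_notifier.h;
take it as a sketch rather than verbatim), the init helper records
exactly one vma in the range that every notifier user later sees:

static inline void mmu_notifier_range_init(struct mmu_notifier_range *range,
					   enum mmu_notifier_event event,
					   unsigned int flags,
					   struct vm_area_struct *vma,
					   struct mm_struct *mm,
					   unsigned long start,
					   unsigned long end)
{
	range->vma = vma;	/* only one vma can ever be recorded */
	range->event = event;
	range->mm = mm;
	range->start = start;
	range->end = end;
	range->flags = flags;
}

So if the do-while really did cross into a second vma, the pages zapped
there would be notified against the wrong vma.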
Perhaps we should just remove the loop?
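
Something like the below, completely untested, and assuming the
single-vma observation above really does hold for every caller; the
WARN_ON_ONCE is hypothetical, only there to catch an offender we may
have missed:

-	struct maple_tree *mt = &vma->vm_mm->mm_mt;
-	unsigned long end = start + size;
 	struct mmu_notifier_range range;
 	struct mmu_gather tlb;
-	MA_STATE(mas, mt, vma->vm_end, vma->vm_end);
+
+	WARN_ON_ONCE(start + size > vma->vm_end);
 ...
-	do {
-		unmap_single_vma(&tlb, vma, start, range.end, NULL);
-	} while ((vma = mas_find(&mas, end - 1)) != NULL);
+	unmap_single_vma(&tlb, vma, start, range.end, NULL);
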
Need to get to it sometime.
> [1] https://lore.kernel.org/lkml/20210926161259.238054-7-namit@vmware.com/