HAVE_MOVE_PUD enables remapping pages at the PUD level if both the source and destination addresses are PUD-aligned.
With HAVE_MOVE_PUD enabled it can be inferred that there is approximately a 19x improvement in performance on arm64. (See data below).
------- Test Results ---------
The following results were obtained using a 5.4 kernel, by remapping a PUD-aligned, 1GB sized region to a PUD-aligned destination. The results from 10 iterations of the test are given below:
Total mremap times for 1GB data on arm64. All times are in nanoseconds.
Control HAVE_MOVE_PUD
1247761 74271 1219896 46771 1094792 59687 1227760 48385 1043698 76666 1101771 50365 1159896 52500 1143594 75261 1025833 61354 1078125 48697
1134312.6 59395.7 <-- Mean time in nanoseconds
A 1GB mremap completion time drops from ~1.1 milliseconds to ~59 microseconds on arm64. (~19x speed up).
Signed-off-by: Kalesh Singh kaleshsingh@google.com Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will@kernel.org Cc: Andrew Morton akpm@linux-foundation.org --- Changes in v3: - Add set_pud_at() macro - Used by move_normal_pud().
Changes in v4: - Add Kirill's Acked-by.
arch/arm64/Kconfig | 1 + arch/arm64/include/asm/pgtable.h | 1 + 2 files changed, 2 insertions(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 434d6791e869..7191a79fb44d 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -124,6 +124,7 @@ config ARM64 select HANDLE_DOMAIN_IRQ select HARDIRQS_SW_RESEND select HAVE_MOVE_PMD + select HAVE_MOVE_PUD select HAVE_PCI select HAVE_ACPI_APEI if (ACPI && EFI) select HAVE_ALIGNED_STRUCT_PAGE if SLUB diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index a11bf52e0c38..0b0b36974757 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -454,6 +454,7 @@ static inline pmd_t pmd_mkdevmap(pmd_t pmd) #define pfn_pud(pfn,prot) __pud(__phys_to_pud_val((phys_addr_t)(pfn) << PAGE_SHIFT) | pgprot_val(prot))
#define set_pmd_at(mm, addr, pmdp, pmd) set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd)) +#define set_pud_at(mm, addr, pudp, pud) set_pte_at(mm, addr, (pte_t *)pudp, pud_pte(pud))
#define __p4d_to_phys(p4d) __pte_to_phys(p4d_pte(p4d)) #define __phys_to_p4d_val(phys) __phys_to_pte_val(phys)