Hello,
This patch series implements a new ioctl on the pagemap procfs file to get the soft-dirty PTE bit, clear it, or atomically get and clear it over a specified range of memory.

The soft-dirty PTE bit of memory pages can be read through the pagemap procfs file. The soft-dirty PTE bit for the whole address space of a process can be cleared by writing to the clear_refs file. This series addresses two limitations of that interface:
- There is no atomic operation to get the soft-dirty PTE bit status and clear it.
- The soft-dirty PTE bit cannot be cleared for only a part of memory.
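For reference, the existing procfs interface described above can be exercised roughly as follows. This is a minimal sketch, not code from this series: the helper names are made up for illustration, while the bit position (bit 55 of each 64-bit pagemap entry) and the value written to clear_refs ("4") are the ones given in the kernel's soft-dirty documentation.

```c
#define _FILE_OFFSET_BITS 64	/* pagemap offsets can exceed 32 bits */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Bit 55 of a pagemap entry is the soft-dirty bit (see soft-dirty.rst). */
#define PM_SOFT_DIRTY_BIT 55

/* Hypothetical helper: extract the soft-dirty bit from one pagemap entry. */
static int pagemap_entry_soft_dirty(uint64_t entry)
{
	return (int)((entry >> PM_SOFT_DIRTY_BIT) & 1);
}

/*
 * Hypothetical helper: read the pagemap entry covering addr.
 * pagemap_fd is an open descriptor for /proc/<pid>/pagemap.
 * Returns 1 if soft-dirty, 0 if not, -1 on read error.
 */
static int page_is_soft_dirty(int pagemap_fd, const void *addr, long page_size)
{
	uint64_t entry;
	off_t off;

	assert(page_size > 0);
	/* One 64-bit entry per virtual page, indexed by page frame number. */
	off = (off_t)((uintptr_t)addr / (uintptr_t)page_size) * (off_t)sizeof(entry);
	if (pread(pagemap_fd, &entry, sizeof(entry), off) != (ssize_t)sizeof(entry))
		return -1;
	return pagemap_entry_soft_dirty(entry);
}

/*
 * Hypothetical helper: clear the soft-dirty bits of the whole calling
 * process. There is no way to restrict this to a range, which is one of
 * the two limitations listed above.
 */
static int clear_soft_dirty_whole_process(void)
{
	int fd = open("/proc/self/clear_refs", O_WRONLY);
	int ret = 0;

	if (fd < 0)
		return -1;
	if (write(fd, "4", 1) != 1)
		ret = -1;
	close(fd);
	return ret;
}
```

Note that reading the bit and clearing it are two separate steps here, so a write landing between them is lost; that is the race the atomic get+clear operation is meant to close.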
Historically, soft-dirty PTE bit tracking has been used in the CRIU project. The procfs interface is enough for that as I think the process is frozen. Our use case requires tracking the soft-dirty PTE bit of running processes. We need this tracking and clear mechanism for a region of memory while the process is running to emulate the GetWriteWatch() call of Windows. Games use this call to keep track of dirty pages and to process only the dirty pages. This new ioctl can be used by the CRIU project and other applications which require soft-dirty PTE bit information.
The current kernel offers no way to clear the soft-dirty bits for only a part of memory (rather than for the entire process), and the get+clear operation cannot be performed atomically. The alternatives mimic this information entirely in userspace, with poor performance:
- The mprotect syscall and a SIGSEGV handler for bookkeeping
- The userfaultfd syscall with a handler for bookkeeping

Some benchmarks can be seen in [1].

The following operations are supported in this ioctl:
- Get the pages that are soft-dirty.
- Clear the pages which are soft-dirty.
- An optional flag to ignore VM_SOFTDIRTY and track only the per-page soft-dirty PTE bit.
Two decisions have been taken about how to return the output from the ioctl:
- Return offsets of the pages from the start in the vec
- Stop execution when vec is filled with dirty pages

These two choices don't follow the mincore() philosophy, where the output array corresponds to the address range in a one-to-one fashion; hence the output buffer length isn't passed and only a flag is set if the page is present. This makes mincore() easy to use, with less control. We pass the size of the output array and store the return data consecutively, as offsets of the dirty pages from the start. The user can easily convert these offsets back into the dirty page addresses. Suppose the user wants to get the first 10 dirty pages from a total memory of 100 pages. They allocate an output buffer of size 10 and the ioctl stops after finding 10 dirty pages. This behaviour is needed to support Windows' GetWriteWatch(). Behaviour like mincore() can be achieved by passing an output buffer of size 100. This interface can be used for any desired behaviour.
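The offset conversion and the mincore()-like view mentioned above can be sketched in userspace as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, int64_t stands in for the loff_t element type of vec, and the offsets are taken to be byte offsets from the start of the range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: turn one returned offset back into the address of
 * the dirty page. range_start is the start address passed to the ioctl.
 */
static void *dirty_offset_to_addr(void *range_start, int64_t offset)
{
	assert(offset >= 0);
	return (char *)range_start + offset;
}

/*
 * Hypothetical helper: expand the packed offset list into a mincore()-style
 * array with one byte per page (1 = dirty, 0 = clean).
 * Returns the number of distinct pages marked dirty.
 */
static size_t offsets_to_page_flags(const int64_t *vec, size_t n_dirty,
				    size_t page_size,
				    unsigned char *flags, size_t n_pages)
{
	size_t i, marked = 0;

	assert(page_size > 0);
	memset(flags, 0, n_pages);
	for (i = 0; i < n_dirty; i++) {
		size_t idx = (size_t)vec[i] / page_size;

		/* Ignore offsets that fall outside the described range. */
		if (idx < n_pages && !flags[idx]) {
			flags[idx] = 1;
			marked++;
		}
	}
	return marked;
}
```

With a 4096-byte page size, a returned vec of { 0, 8192 } over a four-page range expands to the flags 1, 0, 1, 0.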
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora....
Regards, Muhammad Usama Anjum
Cc: Gabriel Krisman Bertazi <krisman@collabora.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Peter Enderborg <peter.enderborg@sony.com>

Muhammad Usama Anjum (4):
  fs/proc/task_mmu: update functions to clear the soft-dirty bit
  fs/proc/task_mmu: Implement IOCTL to get and clear soft dirty PTE bit
  selftests: vm: add pagemap ioctl tests
  mm: add documentation of the new ioctl on pagemap

 Documentation/admin-guide/mm/soft-dirty.rst |  42 +-
 fs/proc/task_mmu.c                          | 337 ++++++++++-
 include/uapi/linux/fs.h                     |  13 +
 tools/include/uapi/linux/fs.h               |  13 +
 tools/testing/selftests/vm/.gitignore       |   1 +
 tools/testing/selftests/vm/Makefile         |   2 +
 tools/testing/selftests/vm/pagemap_ioctl.c  | 629 ++++++++++++++++++++
 7 files changed, 1005 insertions(+), 32 deletions(-)
 create mode 100644 tools/testing/selftests/vm/pagemap_ioctl.c
Update clear_soft_dirty() and clear_soft_dirty_pmd() to optionally clear the soft-dirty bit and to return whether the page is dirty. The functions are renamed to check_soft_dirty() and check_soft_dirty_pmd() accordingly.

Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
---
Changes in v2:
- Move the functions back to their original file
---
 fs/proc/task_mmu.c | 82 ++++++++++++++++++++++++++++------------------
 1 file changed, 51 insertions(+), 31 deletions(-)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 8b4f3073f8f5..f66674033207 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1095,8 +1095,8 @@ static inline bool pte_is_pinned(struct vm_area_struct *vma, unsigned long addr,
 	return page_maybe_dma_pinned(page);
 }

-static inline void clear_soft_dirty(struct vm_area_struct *vma,
-		unsigned long addr, pte_t *pte)
+static inline bool check_soft_dirty(struct vm_area_struct *vma,
+				    unsigned long addr, pte_t *pte, bool clear)
 {
 	/*
 	 * The soft-dirty tracker uses #PF-s to catch writes
@@ -1105,55 +1105,75 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma,
 	 * of how soft-dirty works.
 	 */
 	pte_t ptent = *pte;
+	int dirty = 0;

 	if (pte_present(ptent)) {
 		pte_t old_pte;

-		if (pte_is_pinned(vma, addr, ptent))
-			return;
-		old_pte = ptep_modify_prot_start(vma, addr, pte);
-		ptent = pte_wrprotect(old_pte);
-		ptent = pte_clear_soft_dirty(ptent);
-		ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent);
+		dirty = pte_soft_dirty(ptent);
+
+		if (dirty && clear && !pte_is_pinned(vma, addr, ptent)) {
+			old_pte = ptep_modify_prot_start(vma, addr, pte);
+			ptent = pte_wrprotect(old_pte);
+			ptent = pte_clear_soft_dirty(ptent);
+			ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent);
+		}
 	} else if (is_swap_pte(ptent)) {
-		ptent = pte_swp_clear_soft_dirty(ptent);
-		set_pte_at(vma->vm_mm, addr, pte, ptent);
+		dirty = pte_swp_soft_dirty(ptent);
+
+		if (dirty && clear) {
+			ptent = pte_swp_clear_soft_dirty(ptent);
+			set_pte_at(vma->vm_mm, addr, pte, ptent);
+		}
 	}
+
+	return !!dirty;
 }
 #else
-static inline void clear_soft_dirty(struct vm_area_struct *vma,
-		unsigned long addr, pte_t *pte)
+static inline bool check_soft_dirty(struct vm_area_struct *vma,
+				    unsigned long addr, pte_t *pte, bool clear)
 {
+	return false;
 }
 #endif

 #if defined(CONFIG_MEM_SOFT_DIRTY) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
-static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
-		unsigned long addr, pmd_t *pmdp)
+static inline bool check_soft_dirty_pmd(struct vm_area_struct *vma,
+					unsigned long addr, pmd_t *pmdp, bool clear)
 {
 	pmd_t old, pmd = *pmdp;
+	int dirty = 0;

 	if (pmd_present(pmd)) {
-		/* See comment in change_huge_pmd() */
-		old = pmdp_invalidate(vma, addr, pmdp);
-		if (pmd_dirty(old))
-			pmd = pmd_mkdirty(pmd);
-		if (pmd_young(old))
-			pmd = pmd_mkyoung(pmd);
-
-		pmd = pmd_wrprotect(pmd);
-		pmd = pmd_clear_soft_dirty(pmd);
-
-		set_pmd_at(vma->vm_mm, addr, pmdp, pmd);
+		dirty = pmd_soft_dirty(pmd);
+		if (dirty && clear) {
+			/* See comment in change_huge_pmd() */
+			old = pmdp_invalidate(vma, addr, pmdp);
+			if (pmd_dirty(old))
+				pmd = pmd_mkdirty(pmd);
+			if (pmd_young(old))
+				pmd = pmd_mkyoung(pmd);
+
+			pmd = pmd_wrprotect(pmd);
+			pmd = pmd_clear_soft_dirty(pmd);
+
+			set_pmd_at(vma->vm_mm, addr, pmdp, pmd);
+		}
 	} else if (is_migration_entry(pmd_to_swp_entry(pmd))) {
-		pmd = pmd_swp_clear_soft_dirty(pmd);
-		set_pmd_at(vma->vm_mm, addr, pmdp, pmd);
+		dirty = pmd_swp_soft_dirty(pmd);
+
+		if (dirty && clear) {
+			pmd = pmd_swp_clear_soft_dirty(pmd);
+			set_pmd_at(vma->vm_mm, addr, pmdp, pmd);
+		}
 	}
+	return !!dirty;
 }
 #else
-static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
-		unsigned long addr, pmd_t *pmdp)
+static inline bool check_soft_dirty_pmd(struct vm_area_struct *vma,
+					unsigned long addr, pmd_t *pmdp, bool clear)
 {
+	return false;
 }
 #endif

@@ -1169,7 +1189,7 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (ptl) {
 		if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
-			clear_soft_dirty_pmd(vma, addr, pmd);
+			check_soft_dirty_pmd(vma, addr, pmd, true);
 			goto out;
 		}

@@ -1195,7 +1215,7 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
 		ptent = *pte;

 		if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
-			clear_soft_dirty(vma, addr, pte);
+			check_soft_dirty(vma, addr, pte, true);
 			continue;
 		}
This ioctl can be used to watch the process's memory and perform atomic operations which aren't possible through procfs. Three operations have been implemented:
- PAGEMAP_SD_GET gets the soft-dirty pages in an address range.
- PAGEMAP_SD_CLEAR clears the soft-dirty bit from dirty pages in an address range.
- PAGEMAP_SD_GET_AND_CLEAR gets and clears the soft-dirty bit in an address range.

struct pagemap_sd_args is used as the argument of the ioctl. In this struct:
- The range is specified through start and len.
- The output buffer and its size are specified as vec and vec_len.
- The flags can be specified in the flags field. Currently only one flag, PAGEMAP_SD_NO_REUSED_REGIONS, is supported; it can be specified to ignore the VMA dirty flags.
This is based on a patch from Gabriel Krisman Bertazi.
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
---
Changes in v2:
- Convert the interface from syscall to ioctl
- Remove pidfd support as it doesn't make sense in ioctl
---
 fs/proc/task_mmu.c            | 255 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/fs.h       |  13 ++
 tools/include/uapi/linux/fs.h |  13 ++
 3 files changed, 281 insertions(+)
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f66674033207..bd09dcd52db2 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -19,6 +19,8 @@
 #include <linux/shmem_fs.h>
 #include <linux/uaccess.h>
 #include <linux/pkeys.h>
+#include <uapi/linux/fs.h>
+#include <linux/vmalloc.h>

 #include <asm/elf.h>
 #include <asm/tlb.h>
@@ -1775,11 +1777,264 @@ static int pagemap_release(struct inode *inode, struct file *file)
 	return 0;
 }

+#ifdef CONFIG_MEM_SOFT_DIRTY
+#define IS_CLEAR_SD_OP(op) (op == PAGEMAP_SD_CLEAR || op == PAGEMAP_SD_GET_AND_CLEAR)
+#define IS_GET_SD_OP(op) (op == PAGEMAP_SD_GET || op == PAGEMAP_SD_GET_AND_CLEAR)
+
+struct pagemap_sd_private {
+	unsigned long start;
+	unsigned int op;
+	unsigned int flags;
+	unsigned int index;
+	unsigned int vec_len;
+	unsigned long *vec;
+};
+
+static int pagemap_sd_pmd_entry(pmd_t *pmd, unsigned long addr,
+				unsigned long end, struct mm_walk *walk)
+{
+	struct pagemap_sd_private *p = walk->private;
+	struct vm_area_struct *vma = walk->vma;
+	unsigned long start = addr;
+	spinlock_t *ptl;
+	pte_t *pte;
+	int dirty;
+	bool dirty_vma = (p->flags & PAGEMAP_SD_NO_REUSED_REGIONS) ? 0 :
+			 (vma->vm_flags & VM_SOFTDIRTY);
+
+	end = min(end, walk->vma->vm_end);
+	ptl = pmd_trans_huge_lock(pmd, vma);
+	if (ptl) {
+		if (dirty_vma || check_soft_dirty_pmd(vma, addr, pmd, false)) {
+			/*
+			 * Break huge page into small pages if operation needs to be performed is
+			 * on a portion of the huge page or the return buffer cannot store complete
+			 * data. Then process this PMD as having normal pages.
+			 */
+			if ((IS_CLEAR_SD_OP(p->op) && (end - addr < HPAGE_SIZE)) ||
+			    (IS_GET_SD_OP(p->op) && (p->index + HPAGE_SIZE/PAGE_SIZE > p->vec_len))) {
+				spin_unlock(ptl);
+				split_huge_pmd(vma, pmd, addr);
+				goto process_smaller_pages;
+			} else {
+				dirty = check_soft_dirty_pmd(vma, addr, pmd, IS_CLEAR_SD_OP(p->op));
+				if (IS_GET_SD_OP(p->op) && (dirty_vma || dirty)) {
+					for (; addr != end && p->index < p->vec_len;
+					     addr += PAGE_SIZE)
+						p->vec[p->index++] = addr - p->start;
+				}
+			}
+		}
+		spin_unlock(ptl);
+		return 0;
+	}
+
+process_smaller_pages:
+	if (pmd_trans_unstable(pmd))
+		return 0;
+
+	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	for (; addr != end; pte++, addr += PAGE_SIZE) {
+		dirty = check_soft_dirty(vma, addr, pte, IS_CLEAR_SD_OP(p->op));
+
+		if (IS_GET_SD_OP(p->op) && (dirty_vma || dirty)) {
+			p->vec[p->index++] = addr - p->start;
+			WARN_ON(p->index > p->vec_len);
+		}
+	}
+	pte_unmap_unlock(pte - 1, ptl);
+	cond_resched();
+
+	if (IS_CLEAR_SD_OP(p->op))
+		flush_tlb_mm_range(vma->vm_mm, start, end, PAGE_SHIFT, false);
+
+	return 0;
+}
+
+static int pagemap_sd_pte_hole(unsigned long addr, unsigned long end, int depth,
+			       struct mm_walk *walk)
+{
+	struct pagemap_sd_private *p = walk->private;
+	struct vm_area_struct *vma = walk->vma;
+
+	if (p->flags & PAGEMAP_SD_NO_REUSED_REGIONS)
+		return 0;
+
+	if (vma && (vma->vm_flags & VM_SOFTDIRTY) && IS_GET_SD_OP(p->op)) {
+		for (; addr != end && p->index < p->vec_len; addr += PAGE_SIZE)
+			p->vec[p->index++] = addr - p->start;
+	}
+
+	return 0;
+}
+
+static int pagemap_sd_pre_vma(unsigned long start, unsigned long end, struct mm_walk *walk)
+{
+	struct pagemap_sd_private *p = walk->private;
+	struct vm_area_struct *vma = walk->vma;
+	int ret;
+	unsigned long end_cut = end;
+
+	if (p->flags & PAGEMAP_SD_NO_REUSED_REGIONS)
+		return 0;
+
+	if (IS_CLEAR_SD_OP(p->op) && (vma->vm_flags & VM_SOFTDIRTY)) {
+		if (vma->vm_start < start) {
+			ret = split_vma(vma->vm_mm, vma, start, 1);
+			if (ret)
+				return ret;
+		}
+
+		if (IS_GET_SD_OP(p->op))
+			end_cut = min(start + p->vec_len * PAGE_SIZE, end);
+
+		if (vma->vm_end > end_cut) {
+			ret = split_vma(vma->vm_mm, vma, end_cut, 0);
+			if (ret)
+				return ret;
+		}
+	}
+
+	return 0;
+}
+
+static void pagemap_sd_post_vma(struct mm_walk *walk)
+{
+	struct pagemap_sd_private *p = walk->private;
+	struct vm_area_struct *vma = walk->vma;
+
+	if (p->flags & PAGEMAP_SD_NO_REUSED_REGIONS)
+		return;
+
+	if (IS_CLEAR_SD_OP(p->op) && (vma->vm_flags & VM_SOFTDIRTY)) {
+		vma->vm_flags &= ~VM_SOFTDIRTY;
+		vma_set_page_prot(vma);
+	}
+}
+
+static int pagemap_sd_pmd_test_walk(unsigned long start, unsigned long end,
+				    struct mm_walk *walk)
+{
+	struct pagemap_sd_private *p = walk->private;
+	struct vm_area_struct *vma = walk->vma;
+
+	if (IS_GET_SD_OP(p->op) && (p->index == p->vec_len))
+		return -1;
+
+	if (vma->vm_flags & VM_PFNMAP)
+		return 1;
+
+	return 0;
+}
+
+static const struct mm_walk_ops pagemap_sd_ops = {
+	.test_walk = pagemap_sd_pmd_test_walk,
+	.pre_vma = pagemap_sd_pre_vma,
+	.pmd_entry = pagemap_sd_pmd_entry,
+	.pte_hole = pagemap_sd_pte_hole,
+	.post_vma = pagemap_sd_post_vma,
+};
+
+static long do_pagemap_sd_cmd(struct mm_struct *mm, unsigned int cmd, struct pagemap_sd_args *arg)
+{
+	struct pagemap_sd_private sd_data;
+	struct mmu_notifier_range range;
+	unsigned long start, end;
+	int ret;
+
+	start = (unsigned long)untagged_addr(arg->start);
+	if ((!IS_ALIGNED(start, PAGE_SIZE)) || !access_ok((void __user *)start, arg->len))
+		return -EINVAL;
+
+	if (IS_GET_SD_OP(cmd) &&
+	    ((arg->vec_len == 0) || (!arg->vec) || !access_ok(arg->vec, arg->vec_len)))
+		return -EINVAL;
+
+	end = start + arg->len;
+	sd_data.start = start;
+	sd_data.op = cmd;
+	sd_data.flags = arg->flags;
+	sd_data.index = 0;
+	sd_data.vec_len = arg->vec_len;
+
+	if (IS_GET_SD_OP(cmd)) {
+		sd_data.vec = vzalloc(arg->vec_len * sizeof(loff_t));
+		if (!sd_data.vec)
+			return -ENOMEM;
+	}
+
+	if (IS_CLEAR_SD_OP(cmd)) {
+		mmap_write_lock(mm);
+
+		mmu_notifier_range_init(&range, MMU_NOTIFY_SOFT_DIRTY, 0, NULL,
+					mm, start, end);
+		mmu_notifier_invalidate_range_start(&range);
+		inc_tlb_flush_pending(mm);
+	} else {
+		mmap_read_lock(mm);
+	}
+
+	ret = walk_page_range(mm, start, end, &pagemap_sd_ops, &sd_data);
+
+	if (IS_CLEAR_SD_OP(cmd)) {
+		mmu_notifier_invalidate_range_end(&range);
+		dec_tlb_flush_pending(mm);
+
+		mmap_write_unlock(mm);
+	} else {
+		mmap_read_unlock(mm);
+	}
+
+	if (ret < 0)
+		goto free_sd_data;
+
+	if (IS_GET_SD_OP(cmd)) {
+		ret = copy_to_user(arg->vec, sd_data.vec, sd_data.index * sizeof(loff_t));
+		if (ret) {
+			ret = -EIO;
+			goto free_sd_data;
+		}
+		ret = sd_data.index;
+	} else {
+		ret = 0;
+	}
+
+free_sd_data:
+	if (IS_GET_SD_OP(cmd))
+		vfree(sd_data.vec);
+
+	return ret;
+}
+
+static long pagemap_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	struct pagemap_sd_args __user *uarg = (struct pagemap_sd_args __user *)arg;
+	struct mm_struct *mm = file->private_data;
+	struct pagemap_sd_args arguments;
+
+	switch (cmd) {
+	case PAGEMAP_SD_GET:
+		fallthrough;
+	case PAGEMAP_SD_CLEAR:
+		fallthrough;
+	case PAGEMAP_SD_GET_AND_CLEAR:
+		if (copy_from_user(&arguments, uarg, sizeof(struct pagemap_sd_args)))
+			return -EFAULT;
+		return do_pagemap_sd_cmd(mm, cmd, &arguments);
+	default:
+		return -EINVAL;
+	}
+}
+#endif /* CONFIG_MEM_SOFT_DIRTY */
+
 const struct file_operations proc_pagemap_operations = {
 	.llseek		= mem_lseek, /* borrow this */
 	.read		= pagemap_read,
 	.open		= pagemap_open,
 	.release	= pagemap_release,
+#ifdef CONFIG_MEM_SOFT_DIRTY
+	.unlocked_ioctl = pagemap_ioctl,
+#endif /* CONFIG_MEM_SOFT_DIRTY */
 };
 #endif /* CONFIG_PROC_PAGE_MONITOR */
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index b7b56871029c..a7e48ba9457b 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -305,4 +305,17 @@ typedef int __bitwise __kernel_rwf_t;
 #define RWF_SUPPORTED	(RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
 			 RWF_APPEND)

+struct pagemap_sd_args {
+	void __user *start;
+	int len;
+	loff_t __user *vec;
+	int vec_len;
+	int flags;
+};
+
+#define PAGEMAP_SD_GET			_IOWR('f', 16, struct pagemap_sd_args)
+#define PAGEMAP_SD_CLEAR		_IOWR('f', 17, struct pagemap_sd_args)
+#define PAGEMAP_SD_GET_AND_CLEAR	_IOWR('f', 18, struct pagemap_sd_args)
+#define PAGEMAP_SD_NO_REUSED_REGIONS	0x1
+
 #endif /* _UAPI_LINUX_FS_H */
diff --git a/tools/include/uapi/linux/fs.h b/tools/include/uapi/linux/fs.h
index b7b56871029c..a7e48ba9457b 100644
--- a/tools/include/uapi/linux/fs.h
+++ b/tools/include/uapi/linux/fs.h
@@ -305,4 +305,17 @@ typedef int __bitwise __kernel_rwf_t;
 #define RWF_SUPPORTED	(RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
 			 RWF_APPEND)

+struct pagemap_sd_args {
+	void __user *start;
+	int len;
+	loff_t __user *vec;
+	int vec_len;
+	int flags;
+};
+
+#define PAGEMAP_SD_GET			_IOWR('f', 16, struct pagemap_sd_args)
+#define PAGEMAP_SD_CLEAR		_IOWR('f', 17, struct pagemap_sd_args)
+#define PAGEMAP_SD_GET_AND_CLEAR	_IOWR('f', 18, struct pagemap_sd_args)
+#define PAGEMAP_SD_NO_REUSED_REGIONS	0x1
+
 #endif /* _UAPI_LINUX_FS_H */
On Thu, Aug 25, 2022 at 12:09:24PM +0500, Muhammad Usama Anjum wrote:
> diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
> index b7b56871029c..a7e48ba9457b 100644
> --- a/include/uapi/linux/fs.h
> +++ b/include/uapi/linux/fs.h
> @@ -305,4 +305,17 @@ typedef int __bitwise __kernel_rwf_t;
>  #define RWF_SUPPORTED	(RWF_HIPRI | RWF_DSYNC | RWF_SYNC | RWF_NOWAIT |\
>  			 RWF_APPEND)
>
> +struct pagemap_sd_args {
> +	void __user *start;
> +	int len;

"int" is not a valid type to cross the user/kernel boundary, sorry. Please be explicit here (__u64? __u32?)

> +	loff_t __user *vec;
> +	int vec_len;
> +	int flags;

Same with these.
thanks,
greg k-h
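One way the request above could be met is sketched below. This is a hypothetical layout, not the struct from any posted revision: the struct name, the field widths and the helper are assumptions, and uint64_t/uint32_t from <stdint.h> stand in for the __u64/__u32 types a UAPI header would take from <linux/types.h>. Pointers are carried as 64-bit integers so that 32-bit and 64-bit userspace see the same layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-width variant of struct pagemap_sd_args. */
struct pagemap_sd_args_fixed {
	uint64_t start;		/* start address of the range */
	uint64_t len;		/* length of the range in bytes */
	uint64_t vec;		/* userspace pointer to the output buffer */
	uint32_t vec_len;	/* number of entries in the output buffer */
	uint32_t flags;		/* PAGEMAP_SD_* flags */
};

/* The layout must not depend on compiler-inserted padding. */
static_assert(sizeof(struct pagemap_sd_args_fixed) == 32,
	      "unexpected padding in pagemap_sd_args_fixed");
static_assert(offsetof(struct pagemap_sd_args_fixed, vec_len) == 24,
	      "unexpected offset of vec_len");

/* Hypothetical helper: store a userspace pointer in a 64-bit field. */
static uint64_t user_ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}
```

Ordering the 64-bit members first and pairing the two 32-bit members keeps the struct free of implicit padding on common ABIs, which is what the static assertions check.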
On Thu, Aug 25, 2022 at 12:09:24PM +0500, Muhammad Usama Anjum wrote:
> - The flags can be specified in the flags field. Currently only one PAGEMAP_SD_NO_REUSED_REGIONS is supported which can be specified to ignore the VMA dirty flags.

You forgot to check that all other bits in that flag are set to 0 properly; otherwise you can never add a new bit to the field, as you can't expect that userspace got it right and did not accidentally set it already.
There is kernel documentation on how to add a new ioctl, you might want to read up on that first before resending this.
thanks,
greg k-h
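The check being asked for can be sketched as follows. This is an illustration only: PAGEMAP_SD_NO_REUSED_REGIONS and its value come from the patch, while PAGEMAP_SD_VALID_FLAGS and pagemap_sd_check_flags() are hypothetical names, and the code is written as plain userspace C rather than as the kernel-side change.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PAGEMAP_SD_NO_REUSED_REGIONS	0x1	/* value from the patch */

/* Hypothetical mask of every flag bit that is currently defined. */
#define PAGEMAP_SD_VALID_FLAGS		(PAGEMAP_SD_NO_REUSED_REGIONS)

static_assert(PAGEMAP_SD_VALID_FLAGS != 0, "at least one flag is defined");

/*
 * Hypothetical helper: reject any bit outside the known mask, so that
 * unused bits stay available for future flags.
 * Returns 0 if flags is acceptable, -EINVAL otherwise.
 */
static int pagemap_sd_check_flags(uint32_t flags)
{
	if (flags & ~(uint32_t)PAGEMAP_SD_VALID_FLAGS)
		return -EINVAL;
	return 0;
}
```

Because unknown bits are rejected from the first release, a later kernel can assign them a meaning without changing the behaviour of existing callers.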
On 8/25/22 12:22 PM, Greg KH wrote:
> On Thu, Aug 25, 2022 at 12:09:24PM +0500, Muhammad Usama Anjum wrote:
>> - The flags can be specified in the flags field. Currently only one PAGEMAP_SD_NO_REUSED_REGIONS is supported which can be specified to ignore the VMA dirty flags.
>
> You forgot to check that all other bits in that flag are set to 0 properly, otherwise you can never add a new bit to the field ever as you can't expect userspace got it right and did not accidentaly set it already.
>
> There is kernel documentation on how to add a new ioctl, you might want to read up on that first before resending this.

Thank you so much for the review. I'll revisit and resend a v3.

> thanks,
>
> greg k-h
Add pagemap ioctl tests. Add several different types of tests to verify the correctness of the interface.
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
---
TAP version 13
1..42
ok 1 sanity_tests no flag specified
ok 2 sanity_tests wrong flag specified
ok 3 sanity_tests mixture of correct and wrong flags
ok 4 sanity_tests Clear area with larger vec size
ok 5 Page testing: all new pages must be soft dirty
ok 6 Page testing: all pages must not be soft dirty
ok 7 Page testing: all pages dirty other than first and the last one
ok 8 Page testing: only middle page dirty
ok 9 Page testing: only two middle pages dirty
ok 10 Page testing: only get 2 dirty pages and clear them as well
ok 11 Page testing: Range clear only
ok 12 Large Page testing: all new pages must be soft dirty
ok 13 Large Page testing: all pages must not be soft dirty
ok 14 Large Page testing: all pages dirty other than first and the last one
ok 15 Large Page testing: only middle page dirty
ok 16 Large Page testing: only two middle pages dirty
ok 17 Large Page testing: only get 2 dirty pages and clear them as well
ok 18 Large Page testing: Range clear only
ok 19 Huge page testing: all new pages must be soft dirty
ok 20 Huge page testing: all pages must not be soft dirty
ok 21 Huge page testing: all pages dirty other than first and the last one
ok 22 Huge page testing: only middle page dirty
ok 23 Huge page testing: only two middle pages dirty
ok 24 Huge page testing: only get 2 dirty pages and clear them as well
ok 25 Huge page testing: Range clear only
ok 26 Performance Page testing: page isn't dirty
ok 27 Performance Page testing: all pages must not be soft dirty
ok 28 Performance Page testing: all pages dirty other than first and the last one
ok 29 Performance Page testing: only middle page dirty
ok 30 Performance Page testing: only two middle pages dirty
ok 31 Performance Page testing: only get 2 dirty pages and clear them as well
ok 32 Performance Page testing: Range clear only
ok 33 hpage_unit_tests all new huge page must be dirty
ok 34 hpage_unit_tests all the huge page must not be dirty
ok 35 hpage_unit_tests all the huge page must be dirty and clear
ok 36 hpage_unit_tests only middle page dirty
ok 37 hpage_unit_tests clear first half of huge page
ok 38 hpage_unit_tests clear first half of huge page with limited buffer
ok 39 hpage_unit_tests clear second half huge page
ok 40 unmapped_region_tests Get dirty pages
ok 41 unmapped_region_tests Get dirty pages
ok 42 Test test_simple
# Totals: pass:42 fail:0 xfail:0 xpass:0 skip:0 error:0

Changes in v2:
- Update the tests to use the ioctl interface instead of syscall
---
 tools/testing/selftests/vm/.gitignore      |   1 +
 tools/testing/selftests/vm/Makefile        |   2 +
 tools/testing/selftests/vm/pagemap_ioctl.c | 629 +++++++++++++++++++++
 3 files changed, 632 insertions(+)
 create mode 100644 tools/testing/selftests/vm/pagemap_ioctl.c
diff --git a/tools/testing/selftests/vm/.gitignore b/tools/testing/selftests/vm/.gitignore
index 31e5eea2a9b9..334fde556499 100644
--- a/tools/testing/selftests/vm/.gitignore
+++ b/tools/testing/selftests/vm/.gitignore
@@ -16,6 +16,7 @@ mremap_dontunmap
 mremap_test
 on-fault-limit
 transhuge-stress
+pagemap_ioctl
 protection_keys
 protection_keys_32
 protection_keys_64
diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 266e965e724c..4296c3268f64 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -51,6 +51,7 @@ TEST_GEN_FILES += on-fault-limit
 TEST_GEN_FILES += thuge-gen
 TEST_GEN_FILES += transhuge-stress
 TEST_GEN_FILES += userfaultfd
+TEST_GEN_PROGS += pagemap_ioctl
 TEST_GEN_PROGS += soft-dirty
 TEST_GEN_PROGS += split_huge_page_test
 TEST_GEN_FILES += ksm_tests
@@ -98,6 +99,7 @@ TEST_FILES += va_128TBswitch.sh
 include ../lib.mk

 $(OUTPUT)/madv_populate: vm_util.c
+$(OUTPUT)/pagemap_ioctl: vm_util.c
 $(OUTPUT)/soft-dirty: vm_util.c
 $(OUTPUT)/split_huge_page_test: vm_util.c
diff --git a/tools/testing/selftests/vm/pagemap_ioctl.c b/tools/testing/selftests/vm/pagemap_ioctl.c
new file mode 100644
index 000000000000..4cd0a8d03012
--- /dev/null
+++ b/tools/testing/selftests/vm/pagemap_ioctl.c
@@ -0,0 +1,629 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <stdio.h>
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/mman.h>
+#include <errno.h>
+#include <malloc.h>
+#include <asm-generic/unistd.h>
+#include "vm_util.h"
+#include "../kselftest.h"
+#include <linux/types.h>
+#include <linux/fs.h>
+#include <sys/ioctl.h>
+
+#define TEST_ITERATIONS 10000
+#define PAGEMAP "/proc/self/pagemap"
+int pagemap_fd;
+
+static long pagemap_ioctl(void *start, int len, unsigned int cmd, loff_t *vec,
+			  int vec_len, int flag)
+{
+	struct pagemap_sd_args arg;
+	int ret;
+
+	arg.start = start;
+	arg.len = len;
+	arg.vec = vec;
+	arg.vec_len = vec_len;
+	arg.flags = flag;
+
+	ret = ioctl(pagemap_fd, cmd, &arg);
+
+	if (ret < 0)
+		return ret;
+
+	return ret;
+}
+
+int sanity_tests(int page_size)
+{
+	char *mem;
+	int mem_size, vec_size, ret;
+	loff_t *vec;
+
+	/* 1. wrong operation */
+	vec_size = 100;
+	mem_size = page_size;
+
+	vec = malloc(sizeof(loff_t) * vec_size);
+	mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
+	if (!mem || !vec)
+		ksft_exit_fail_msg("error nomem\n");
+
+	ksft_test_result(pagemap_ioctl(mem, mem_size, 0, vec, vec_size, 0) < 0,
+			 "%s no flag specified\n", __func__);
+	ksft_test_result(pagemap_ioctl(mem, mem_size, 0x01000000, vec, vec_size, 0) < 0,
+			 "%s wrong flag specified\n", __func__);
+	ksft_test_result(pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET | 0xFF,
+				       vec, vec_size, 0) < 0,
+			 "%s mixture of correct and wrong flags\n", __func__);
+
+	/* 2. Clear area with larger vec size */
+	ret = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET_AND_CLEAR,
+			    vec, vec_size, 0);
+	ksft_test_result(ret >= 0, "%s Clear area with larger vec size\n", __func__);
+
+	free(vec);
+	munmap(mem, mem_size);
+	return 0;
+}
+
+void *gethugepage(int map_size)
+{
+	int ret;
+	char *map;
+	size_t hpage_len = read_pmd_pagesize();
+
+	map = memalign(hpage_len, map_size);
+	if (!map)
+		ksft_exit_fail_msg("memalign failed %d %s\n", errno, strerror(errno));
+
+	ret = madvise(map, map_size, MADV_HUGEPAGE);
+	if (ret)
+		ksft_exit_fail_msg("madvise failed %d %d %s\n", ret, errno, strerror(errno));
+
+	memset(map, 0, map_size);
+
+	if (check_huge(map))
+		return map;
+
+	free(map);
+	return NULL;
+
+}
+
+int hpage_unit_tests(int page_size)
+{
+	char *map;
+	int i, ret;
+	size_t hpage_len = read_pmd_pagesize();
+	size_t num_pages = 1;
+	int map_size = hpage_len * num_pages;
+	int vec_size = map_size/page_size;
+	loff_t *vec, *vec2;
+
+	vec = malloc(sizeof(loff_t) * vec_size);
+	vec2 = malloc(sizeof(loff_t) * vec_size);
+	if (!vec || !vec2)
+		ksft_exit_fail_msg("malloc failed\n");
+
+	map = gethugepage(map_size);
+	if (map) {
+		// 1. all new huge page must be dirty
+		ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET_AND_CLEAR, vec, vec_size, 0);
+		if (ret < 0)
+			ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno));
+
+		for (i = 0; i < vec_size; i++)
+			if (vec[i] != i * page_size)
+				break;
+
+		ksft_test_result(i == vec_size, "%s all new huge page must be dirty\n", __func__);
+
+		// 2. all the huge page must not be dirty
+		ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET, vec, vec_size, 0);
+		if (ret < 0)
+			ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno));
+
+		ksft_test_result(ret == 0, "%s all the huge page must not be dirty\n", __func__);
+
+		// 3. all the huge page must be dirty and clear dirty as well
+		memset(map, -1, map_size);
+		ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET_AND_CLEAR, vec, vec_size, 0);
+		if (ret < 0)
+			ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno));
+
+		for (i = 0; i < vec_size; i++)
+			if (vec[i] != i * page_size)
+				break;
+
+		ksft_test_result(ret == vec_size && i == vec_size,
+				 "%s all the huge page must be dirty and clear\n", __func__);
+
+		// 4. only middle page dirty
+		free(map);
+		map = gethugepage(map_size);
+		clear_softdirty();
+		map[vec_size/2 * page_size]++;
+
+		ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET, vec, vec_size, 0);
+		if (ret < 0)
+			ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno));
+
+		for (i = 0; i < vec_size; i++) {
+			if (vec[i] == vec_size/2 * page_size)
+				break;
+		}
+		ksft_test_result(vec[i] == vec_size/2 * page_size,
+				 "%s only middle page dirty\n", __func__);
+
+		free(map);
+	} else {
+		ksft_test_result_skip("all new huge page must be dirty\n");
+		ksft_test_result_skip("all the huge page must not be dirty\n");
+		ksft_test_result_skip("all the huge page must be dirty and clear\n");
+		ksft_test_result_skip("only middle page dirty\n");
+	}
+
+	// 5. clear first half of huge page
+	map = gethugepage(map_size);
+	if (map) {
+		ret = pagemap_ioctl(map, map_size/2, PAGEMAP_SD_CLEAR, NULL, 0, 0);
+		if (ret < 0)
+			ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno));
+
+		ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET, vec, vec_size, 0);
+		if (ret < 0)
+			ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno));
+
+		for (i = 0; i < vec_size/2; i++)
+			if (vec[i] != (i + vec_size/2) * page_size)
+				break;
+
+		ksft_test_result(i == vec_size/2 && ret == vec_size/2,
+				 "%s clear first half of huge page\n", __func__);
+		free(map);
+	} else {
+		ksft_test_result_skip("clear first half of huge page\n");
+	}
+
+	// 6. clear first half of huge page with limited buffer
+	map = gethugepage(map_size);
+	if (map) {
+		ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET_AND_CLEAR, vec, vec_size/2, 0);
+		if (ret < 0)
+			ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno));
+
+		ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET, vec, vec_size, 0);
+		if (ret < 0)
+			ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno));
+
+		for (i = 0; i < vec_size/2; i++)
+			if (vec[i] != (i + vec_size/2) * page_size)
+				break;
+
+		ksft_test_result(i == vec_size/2 && ret == vec_size/2,
+				 "%s clear first half of huge page with limited buffer\n",
+				 __func__);
+		free(map);
+	} else {
+		ksft_test_result_skip("clear first half of huge page with limited buffer\n");
+	}
+
+	// 7. clear second half of huge page
+	map = gethugepage(map_size);
+	if (map) {
+		memset(map, -1, map_size);
+		ret = pagemap_ioctl(map + map_size/2, map_size/2, PAGEMAP_SD_CLEAR, NULL, 0, 0);
+		if (ret < 0)
+			ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno));
+
+		ret = pagemap_ioctl(map, map_size, PAGEMAP_SD_GET, vec, vec_size, 0);
+		if (ret < 0)
+			ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno));
+
+		for (i = 0; i < vec_size/2; i++)
+			if (vec[i] != i * page_size)
+				break;
+
+		ksft_test_result(i == vec_size/2, "%s clear second half huge page\n", __func__);
+		free(map);
+	} else {
+		ksft_test_result_skip("clear second half huge page\n");
+	}
+
+	free(vec);
+	free(vec2);
+	return 0;
+}
+
+int base_tests(char *prefix, char *mem, int mem_size, int page_size, int skip)
+{
+	int vec_size, i, j, ret, dirty_pages, dirty_pages2;
+	loff_t *vec, *vec2;
+
+	if (skip) {
+		ksft_test_result_skip("%s all new pages must be soft dirty\n", prefix);
+		ksft_test_result_skip("%s all pages must not be soft dirty\n", prefix);
+		ksft_test_result_skip("%s all pages dirty other than first and the last one\n",
+				      prefix);
+		ksft_test_result_skip("%s only middle page dirty\n", prefix);
+		ksft_test_result_skip("%s only two middle pages dirty\n", prefix);
+		ksft_test_result_skip("%s only get 2 dirty pages and clear them as well\n", prefix);
+		ksft_test_result_skip("%s Range clear only\n", prefix);
+		return 0;
+	}
+
+	vec_size = mem_size/page_size;
+	vec = malloc(sizeof(loff_t) * vec_size);
+	vec2 = malloc(sizeof(loff_t) * vec_size);
+
+	/* 1. all new pages must be soft dirty and clear the range for next test */
+	dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET_AND_CLEAR, vec, vec_size - 2, 0);
+	if (dirty_pages < 0)
+		ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno));
+
+	dirty_pages2 = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET_AND_CLEAR, vec2, vec_size, 0);
+	if (dirty_pages2 < 0)
+		ksft_exit_fail_msg("error %d %d %s\n", dirty_pages2, errno, strerror(errno));
+
+	for (i = 0; i < dirty_pages; i++)
+		if (vec[i] != i * page_size)
+			break;
+	for (j = 0; j < dirty_pages2; j++)
+		if (vec2[j] != (j + vec_size - 2) * page_size)
+			break;
+
+	ksft_test_result(dirty_pages == vec_size - 2 && i == dirty_pages &&
+			 dirty_pages2 == 2 && j == dirty_pages2,
+			 "%s all new pages must be soft dirty\n", prefix);
+
+	// 2. all pages must not be soft dirty
+	dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, vec, vec_size, 0);
+	if (dirty_pages < 0)
+		ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno));
+
+	ksft_test_result(dirty_pages == 0, "%s all pages must not be soft dirty\n", prefix);
+
+	// 3. all pages dirty other than first and the last one
+	memset(mem + page_size, -1, (mem_size - (2 * page_size)));
+
+	dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, vec, vec_size, 0);
+	if (dirty_pages < 0)
+		ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno));
+
+	ksft_test_result(dirty_pages >= vec_size - 2 && dirty_pages <= vec_size,
+			 "%s all pages dirty other than first and the last one\n", prefix);
+
+	// 4. only middle page dirty
+	clear_softdirty();
+	mem[vec_size/2 * page_size]++;
+
+	dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, vec, vec_size, 0);
+	if (dirty_pages < 0)
+		ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno));
+
+	for (i = 0; i < vec_size; i++) {
+		if (vec[i] == vec_size/2 * page_size)
+			break;
+	}
+	ksft_test_result(vec[i] == vec_size/2 * page_size,
+			 "%s only middle page dirty\n", prefix);
+
+	// 5. only two middle pages dirty and walk over only middle pages
+	clear_softdirty();
+	mem[vec_size/2 * page_size]++;
+	mem[(vec_size/2 + 1) * page_size]++;
+
+	dirty_pages = pagemap_ioctl(&mem[vec_size/2 * page_size], 2 * page_size,
+				    PAGEMAP_SD_GET, vec, vec_size, 0);
+	if (dirty_pages < 0)
+		ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno));
+
+	ksft_test_result(dirty_pages == 2 && vec[0] == 0 && vec[1] == page_size,
+			 "%s only two middle pages dirty\n", prefix);
+
+	/* 6. only get 2 dirty pages and clear them as well */
+	memset(mem, -1, mem_size);
+
+	/* get and clear second and third pages */
+	ret = pagemap_ioctl(mem + page_size, 2 * page_size, PAGEMAP_SD_GET_AND_CLEAR, vec, 2, 0);
+	if (ret < 0)
+		ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno));
+
+	dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, vec2, vec_size, 0);
+	if (dirty_pages < 0)
+		ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno));
+
+	for (i = 0; i < vec_size - 2; i++) {
+		if (i == 0 && (vec[i] != 0 || vec2[i] != 0))
+			break;
+		else if (i == 1 && (vec[i] != page_size || vec2[i] != (i + 2) * page_size))
+			break;
+		else if (i > 1 && (vec2[i] != (i + 2) * page_size))
+			break;
+	}
+
+	ksft_test_result(dirty_pages == vec_size - 2 && i == vec_size - 2,
+			 "%s only get 2 dirty pages and clear them as well\n", prefix);
+	/* 7. Range clear only */
+	memset(mem, -1, mem_size);
+	dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_CLEAR, NULL, 0, 0);
+	if (dirty_pages < 0)
+		ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno));
+
+	dirty_pages2 = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, vec, vec_size, 0);
+	if (dirty_pages2 < 0)
+		ksft_exit_fail_msg("error %d %d %s\n", dirty_pages2, errno, strerror(errno));
+
+	ksft_test_result(dirty_pages == 0 && dirty_pages2 == 0, "%s Range clear only\n",
+			 prefix);
+
+	free(vec);
+	free(vec2);
+	return 0;
+}
+
+int performance_base_tests(char *prefix, char *mem, int mem_size, int page_size, int skip)
+{
+	int vec_size, i, ret, dirty_pages, dirty_pages2;
+	loff_t *vec, *vec2;
+
+	if (skip) {
+		ksft_test_result_skip("%s all new pages must be soft dirty\n", prefix);
+		ksft_test_result_skip("%s all pages must not be soft dirty\n", prefix);
+		ksft_test_result_skip("%s all pages dirty other than first and the last one\n",
+				      prefix);
+		ksft_test_result_skip("%s only middle page dirty\n", prefix);
+		ksft_test_result_skip("%s only two middle pages dirty\n", prefix);
+		ksft_test_result_skip("%s only get 2 dirty pages and clear them as well\n", prefix);
+		ksft_test_result_skip("%s Range clear only\n", prefix);
+		return 0;
+	}
+
+	vec_size = mem_size/page_size;
+	vec = malloc(sizeof(loff_t) * vec_size);
+	vec2 = malloc(sizeof(loff_t) * vec_size);
+
+	/* 1.
all new pages must be soft dirty and clear the range for next test */ + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET_AND_CLEAR, + vec, vec_size - 2, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + dirty_pages2 = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET_AND_CLEAR, + vec2, vec_size, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages2 < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages2, errno, strerror(errno)); + + ksft_test_result(dirty_pages == 0 && dirty_pages2 == 0, + "%s page isn't dirty\n", prefix); + + // 2. all pages must not be soft dirty + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, + vec, vec_size, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + ksft_test_result(dirty_pages == 0, "%s all pages must not be soft dirty\n", prefix); + + // 3. all pages dirty other than first and the last one + memset(mem + page_size, -1, (mem_size - 2 * page_size)); + + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, + vec, vec_size, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + for (i = 0; i < dirty_pages; i++) { + if (vec[i] != (i + 1) * page_size) + break; + } + + ksft_test_result(dirty_pages == vec_size - 2 && i == vec_size - 2, + "%s all pages dirty other than first and the last one\n", prefix); + + // 4. 
only middle page dirty + clear_softdirty(); + mem[vec_size/2 * page_size]++; + + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, + vec, vec_size, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + for (i = 0; i < vec_size; i++) { + if (vec[i] == vec_size/2 * page_size) + break; + } + ksft_test_result(vec[i] == vec_size/2 * page_size, + "%s only middle page dirty\n", prefix); + + // 5. only two middle pages dirty and walk over only middle pages + clear_softdirty(); + mem[vec_size/2 * page_size]++; + mem[(vec_size/2 + 1) * page_size]++; + + dirty_pages = pagemap_ioctl(&mem[vec_size/2 * page_size], 2 * page_size, + PAGEMAP_SD_GET, vec, vec_size, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + ksft_test_result(dirty_pages == 2 && vec[0] == 0 && vec[1] == page_size, + "%s only two middle pages dirty\n", prefix); + + /* 6. only get 2 dirty pages and clear them as well */ + memset(mem, -1, mem_size); + + /* get and clear second and third pages */ + ret = pagemap_ioctl(mem + page_size, 2 * page_size, PAGEMAP_SD_GET_AND_CLEAR, + vec, 2, PAGEMAP_SD_NO_REUSED_REGIONS); + if (ret < 0) + ksft_exit_fail_msg("error %d %d %s\n", ret, errno, strerror(errno)); + + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, + vec2, vec_size, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + for (i = 0; i < vec_size - 2; i++) { + if (i == 0 && (vec[i] != 0 || vec2[i] != 0)) + break; + else if (i == 1 && (vec[i] != page_size || vec2[i] != (i + 2) * page_size)) + break; + else if (i > 1 && (vec2[i] != (i + 2) * page_size)) + break; + } + + ksft_test_result(dirty_pages == vec_size - 2 && i == vec_size - 2, + "%s only get 2 dirty pages and clear them as well\n", prefix); + + /* 7. 
Range clear only */ + memset(mem, -1, mem_size); + dirty_pages = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_CLEAR, + NULL, 0, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + dirty_pages2 = pagemap_ioctl(mem, mem_size, PAGEMAP_SD_GET, + vec, vec_size, PAGEMAP_SD_NO_REUSED_REGIONS); + if (dirty_pages2 < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages2, errno, strerror(errno)); + + ksft_test_result(dirty_pages == 0 && dirty_pages2 == 0, "%s Range clear only\n", + prefix); + + free(vec); + free(vec2); + return 0; +} + +int unmapped_region_tests(int page_size) +{ + void *start = (void *)0x10000000; + int dirty_pages, len = 0x00040000; + int vec_size = len / page_size; + loff_t *vec = malloc(sizeof(loff_t) * vec_size); + + /* 1. Get dirty pages */ + dirty_pages = pagemap_ioctl(start, len, PAGEMAP_SD_GET, vec, vec_size, 0); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + ksft_test_result(dirty_pages >= 0, "%s Get dirty pages\n", __func__); + + /* 2. 
Clear dirty bit of whole address space */ + dirty_pages = pagemap_ioctl(0, 0x7FFFFFFF, PAGEMAP_SD_CLEAR, NULL, 0, 0); + if (dirty_pages < 0) + ksft_exit_fail_msg("error %d %d %s\n", dirty_pages, errno, strerror(errno)); + + ksft_test_result(dirty_pages == 0, "%s Get dirty pages\n", __func__); + + free(vec); + return 0; +} + +static void test_simple(int page_size) +{ + int i; + char *map; + loff_t *vec = NULL; + + map = aligned_alloc(page_size, page_size); + if (!map) + ksft_exit_fail_msg("mmap failed\n"); + + clear_softdirty(); + + for (i = 0 ; i < TEST_ITERATIONS; i++) { + if (pagemap_ioctl(map, page_size, PAGEMAP_SD_GET, vec, 1, 0) == 1) { + ksft_print_msg("dirty bit was 1, but should be 0 (i=%d)\n", i); + break; + } + + clear_softdirty(); + // Write something to the page to get the dirty bit enabled on the page + map[0]++; + + if (pagemap_ioctl(map, page_size, PAGEMAP_SD_GET, vec, 1, 0) == 0) { + ksft_print_msg("dirty bit was 0, but should be 1 (i=%d)\n", i); + break; + } + + clear_softdirty(); + } + free(map); + + ksft_test_result(i == TEST_ITERATIONS, "Test %s\n", __func__); +} + +int main(int argc, char **argv) +{ + int page_size = getpagesize(); + size_t hpage_len = read_pmd_pagesize(); + char *mem, *map; + int mem_size; + + ksft_print_header(); + ksft_set_plan(42); + + pagemap_fd = open(PAGEMAP, O_RDWR); + if (pagemap_fd < 0) + return -EINVAL; + + /* 1. Sanity testing */ + sanity_tests(page_size); + + /* 2. Normal page testing */ + mem_size = 10 * page_size; + mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); + if (!mem) + ksft_exit_fail_msg("error nomem\n"); + + base_tests("Page testing:", mem, mem_size, page_size, 0); + + munmap(mem, mem_size); + + /* 3. 
Large page testing */ + mem_size = 512 * 10 * page_size; + mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); + if (!mem) + ksft_exit_fail_msg("error nomem\n"); + + base_tests("Large Page testing:", mem, mem_size, page_size, 0); + + munmap(mem, mem_size); + + /* 4. Huge page testing */ + map = gethugepage(hpage_len); + if (check_huge(map)) + base_tests("Huge page testing:", map, hpage_len, page_size, 0); + else + base_tests("Huge page testing:", NULL, 0, 0, 1); + + free(map); + + /* 5. Normal page testing */ + mem_size = 10 * page_size; + mem = mmap(NULL, mem_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); + if (!mem) + ksft_exit_fail_msg("error nomem\n"); + + performance_base_tests("Performance Page testing:", mem, mem_size, page_size, 0); + + munmap(mem, mem_size); + + /* 6. Huge page tests */ + hpage_unit_tests(page_size); + + /* 7. Unmapped address test */ + unmapped_region_tests(page_size); + + /* 8. Iterative test */ + test_simple(page_size); + + close(pagemap_fd); + return ksft_exit_pass(); +}
Add an explanation of the new ioctl on the pagemap file. It can be used to get the soft-dirty PTE bits of the specified range, to clear them, or to do both at the same time.
Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
---
Changes in v2:
- Update documentation to mention ioctl instead of the syscall
---
 Documentation/admin-guide/mm/soft-dirty.rst | 42 ++++++++++++++++++++-
 1 file changed, 41 insertions(+), 1 deletion(-)
diff --git a/Documentation/admin-guide/mm/soft-dirty.rst b/Documentation/admin-guide/mm/soft-dirty.rst
index cb0cfd6672fa..d3d33e63a965 100644
--- a/Documentation/admin-guide/mm/soft-dirty.rst
+++ b/Documentation/admin-guide/mm/soft-dirty.rst
@@ -5,7 +5,12 @@ Soft-Dirty PTEs
 ===============
 
 The soft-dirty is a bit on a PTE which helps to track which pages a task
-writes to. In order to do this tracking one should
+writes to.
+
+Using Proc FS
+-------------
+
+In order to do this tracking one should
1. Clear soft-dirty bits from the task's PTEs.
@@ -20,6 +25,41 @@ writes to. In order to do this tracking one should
    64-bit qword is the soft-dirty one. If set, the respective PTE was
    written to since step 1.
+Using IOCTL
+-----------
+
+The IOCTL on the ``/proc/PID/pagemap`` file can be used to find the dirty pages
+atomically. The following commands are supported::
+
+	PAGEMAP_SD_GET
+		Get the page offsets which are soft dirty.
+
+	PAGEMAP_SD_CLEAR
+		Clear the pages which are soft dirty.
+
+	PAGEMAP_SD_GET_AND_CLEAR
+		Get and clear the pages which are soft dirty.
+
+The struct :c:type:`pagemap_sd_args` is used as the argument. In this struct:
+
+  1. The range is specified through start and len. The len argument need not be
+     a multiple of the page size, but since the information is returned for
+     whole pages, len is effectively rounded up to the next multiple of the page
+     size.
+
+  2. The output buffer and its size are specified in vec and vec_len. The offsets
+     of the dirty pages from start are returned in vec. The ioctl returns when the
+     whole range has been searched or vec is completely filled. If vec fills up
+     first, the rest of the range is not cleared.
+
+  3. The flags can be specified in the flags field. Currently only one flag,
+     PAGEMAP_SD_NO_REUSED_REGIONS, is supported. It can be specified to ignore
+     the VMA dirty flags for better performance. With this flag, only pages which
+     have been written to by the user are reported as dirty; newly allocated pages
+     are not reported as dirty.
+
+Explanation
+-----------
Internally, to do this tracking, the writable bit is cleared from PTEs when the soft-dirty bit is cleared. So, after this, when the task tries to
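For readers following the vec semantics described in the documentation hunk above, here is a minimal userspace sketch of how the output buffer is filled. It is only a model under stated assumptions: it works on a plain per-page flag array and does not call the ioctl or reflect the kernel implementation. The helper names pages_in_range(), model_get_and_clear() and offset_to_addr() are hypothetical and are not part of this series.

```c
#include <stddef.h>

/*
 * Hypothetical model, not kernel code: number of pages covered by len,
 * with len rounded up to the next multiple of the page size.
 */
static size_t pages_in_range(size_t len, size_t page_size)
{
	return (len + page_size - 1) / page_size;
}

/*
 * Hypothetical model of a get-and-clear pass. dirty[] holds one flag per
 * page (nonzero means soft-dirty). Byte offsets of dirty pages, measured
 * from the start of the range, are stored consecutively in vec. The walk
 * stops when the range ends or vec is full; only the pages that were
 * reported are cleared, so the rest stay dirty for a later call.
 * Returns the number of entries written to vec.
 */
static int model_get_and_clear(unsigned char *dirty, size_t len, size_t page_size,
			       long long *vec, int vec_len)
{
	size_t i, npages = pages_in_range(len, page_size);
	int found = 0;

	for (i = 0; i < npages && found < vec_len; i++) {
		if (!dirty[i])
			continue;
		vec[found++] = (long long)(i * page_size);
		dirty[i] = 0;
	}
	return found;
}

/* Convert a returned offset back into the address of the dirty page. */
static void *offset_to_addr(void *start, long long offset)
{
	return (char *)start + offset;
}
```

In this model, a range of 10 dirty pages walked with a vec of 8 entries reports the first 8 offsets, and a second walk reports the remaining 2. That is the same pattern test 1 of base_tests() checks with vec_size - 2 followed by vec_size.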